a technical dive into defensive trickery

A Technical Dive Into Defensive Trickery

Dan Kaminsky

Chief Scientist

White Ops

Hi!

• I’m Dan Kaminsky

• Been fixing things for almost two decades

• Broke a big thing

• People only remember that

Mission of this talk

• You may think things are impossible.

• You may think some of these specific things are impossible.

• I want to challenge your assumptions.

• DEATH TO NIHILISM

• With the healing power of surprising data

• Also I’ve given quite a few high level keynotes as of late and I’d like to actually discuss the nerdery that’s consumed me this year.

• LET’S DANCE

Security Is Hard

• Denial of Service Attacks: DDoS is hard to remediate

• Cryptography: TLS is hard to deploy

• Data Loss Prevention: Attacks are hard to survive

• Code Safety: Not getting owned is hard

Make Security Easy: What we’re doing about it

• Denial of Service Attacks: DDoS is hard to remediate

• Overflowd: Let the victims of network flows learn from Netflow

• Cryptography: TLS is hard to deploy

• JFE: Launch one daemon, all networking is TLS secured w/ valid cert

• Data Loss Prevention: Attacks are hard to survive

• Ratelock: Make the cloud enforce security policies, including hard rate limits

• Code Safety: Not getting owned is hard

• Autoclave: Run entire operating systems in tighter sandboxes than Chrome

We can do better

• We did do better, at the first O’Reilly Security Hackathon

• Led by White Ops Labs (me)

• Hosted at Code for America (awesome)

• Thanks!

• Overflowd: +Cosmo Mielke, Jeff Ward

• JFE: +David Strauss of Pantheon

• Ratelock: +Andy McMurry of getmedal.com, Mark Shlimovich

• Moar!

• Stay tuned.

Denial of Service Attacks: DDoS is hard to remediate

Someday, systems will not get hacked

• That day is not today.

• Mirai vs. Dyn == Parts of the Internet actually went down

• No defense survives 10M nodes flooding you

• When things go wrong, what can we do?

• Step 1: Communicate

• Step 0: Figure out who we’re supposed to communicate with

The Nocmonkey Curse (Besides being called monkeys)

• 1) Spoofed Traffic

• Attackers lie about where they are on the network

• This will always be possible

• 2) Asymmetrically Routed Traffic

• Traceroute just shows how to reach your attacker

• It doesn’t show how their traffic is reaching you

• These are the problematic packets!

• 3) Bad Contact Data

• IP address ranges are large, “Autonomous systems” aren’t, contact data is stale

• Attacks are usually remediated, but it’s hard, slow, unreliable, not scaling

• Literally the opposite of what the Net is supposed to be

• Can we do better?

The Two Great Hopes

• Attacker networks hit victim networks.

• They’re not directly connected – many parties in the middle.

• 1) Everyone monitors their networks

• At least for traffic management and capacity planning

• Generally use Netflow – provides source/dest metrics with light protocol analysis

• 2) Not everyone on the Internet is a jerk

• And even if they are, getting abuse calls is annoying, and the big floods are bad for business

• Many would act, if the benefit was incremental and the risk was low

Netflow usually just goes to a network’s own operators, and mass aggregators.

Maybe just a little should flow to the networks being affected.

If they already knew, why do we have to call them?

Overflowd: Stochastic Traffic Factoring Utility

1/1M packets cause anti-abuse metadata to be sent to source and dest, by Netflow infrastructure.

https://github.com/dakami/overflowd
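The one-in-a-million idea can be sketched in a few lines of Python. This is an illustration only, not overflowd's actual code: `should_emit`, `build_notice`, and the `abuse@example.net` address are hypothetical names, the field names are borrowed from the demo output, and signing is left out entirely.

```python
import random
import time

SAMPLE_RATE = 1_000_000  # one packet in a million, per the slide

# Flow fields taken from the demo output; treating exactly these as the
# whitelist is an assumption for illustration.
WHITELISTED_FIELDS = ("bcount", "protocol", "tos", "etime", "daddr", "pcount")


def should_emit(rng: random.Random, rate: int = SAMPLE_RATE) -> bool:
    """Return True for roughly one call in `rate`."""
    return rng.randrange(rate) == 0


def build_notice(flow: dict, contact_email: str) -> dict:
    """Build an unsigned notice carrying only whitelisted flow metadata."""
    return {
        "data": {k: flow[k] for k in WHITELISTED_FIELDS if k in flow},
        "metadata": {
            "info": "FLOWSEEN",
            "class": "INFORMATIONAL",
            "time": time.time(),
        },
        "contact": {"email": contact_email},
    }


def maybe_notify(flow: dict, rng: random.Random, rate: int = SAMPLE_RATE):
    """Return a notice for this flow record, or None most of the time."""
    if should_emit(rng, rate):
        return build_notice(flow, "abuse@example.net")  # hypothetical contact
    return None
```

Because the sampling is per packet, a flood that dominates a link also dominates the tracers, which is the property the victim wants.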

Demo

• {'data': {'bcount': 682512, 'protocol': 6, 'tos': 0, 'etime': 1325314888, 'daddr': '122.166.77.74', 'pcount': 17001…

• Whitelisted flow metadata, so recipient can match

• 'signature': {'key': 'd52b9644ba6ffd2bdaa6505e649fd80ca… 'signature': 'z5yMEHH0pYe++uOiNhWzLkCyXsT…

• NaCl Signatures, unchained for now

• “Oh, somebody’s spoofing? OK, what signature have I been seeing all year, on other networks”

• 'metadata': {'info': 'FLOWSEEN', 'class': 'INFORMATIONAL', 'time': 1477778027.138109}}

• Could also have MACHINE_SUSPICIOUS, HUMAN_SUSPICIOUS, HUMAN_CONFIRMED_PLEASE_CONTACT, etc

• 'contact': {'email': '[email protected]'}

Still Deciding On Channels

• 65535/udp

• The end

• Doesn’t require acknowledgement, does need fragmentation

• ICMP

• Would follow packets further along route, maybe

• Might get dropped earlier too

• HTTP/HTTPS

• Many networks have an easier time picking up .well-known web paths

• Can’t just be passively received

• TODO

Explicit Plan

• We have no idea how precisely this data would be, or should be, consumed

• We do know we don’t want to share much more data than a legitimate person should already know

• Not sending raw netflow, not sending at high rates

• May send faster on known badness – badness and packet count are not equal!

• We think interesting and useful things would be built in the presence of overflowd

Cryptography: TLS is hard to deploy

Crypto is hard.

That’s just one service. Here’s more.

Has Anyone Ever Not Seen This?

Well, at least nobody’s judging you for a not entirely perfect TLS suite…

Those are secure configurations. Here’s the insecure one.

Reality

• TLS required certificate authorities

• Certificate authorities required bizdudes

• Software vendors couldn’t automate bizdudes

• Software vendors couldn’t automate TLS

• Software vendors could and did automate listening on standard ports

• Just not with security

• The TLS mess chains back to the devops non-viability of automatically acquiring certificates

We Live In The (Near) Future

• Let’s Encrypt

• Free Certificate Authority

• Allows Automatic Certificate Provisioning using open ACME protocol

• Services can in fact autoprovision certificates now!

• Caddy

• HAProxy

• Nginx

Why Not Both Everything?

JFE: Jump to Full Encryption

https://github.com/dakami/jfe

# ./jfe -D

# curl http://163.jfe.example
hello worl

# curl https://163.jfe.example
hello worl

# curl https://163.jfe.example:40080
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

What’s Going On Here (That you didn’t know existed)

• iptables -t mangle -A PREROUTING -p tcp --dport 23:65535 ! -d 127.0.0.1 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 1

• Grab all traffic from port 23 through 65K, send it to port 1

• self.sock.setsockopt(socket.SOL_IP, 19, 1) # IP_TRANSPARENT

• Allow listener on Port 1 to receive traffic from other IPs and Ports

• sniff = client.recv(128, socket.MSG_PEEK).split("\n")[0]

• Sniff the first 128 bytes on the socket, without actually “draining” from it

• ctx.set_servername_callback(on_sni)

• Do things (like get a new cert) during initial handshaking

• cert=free_tls_certificates.certbotClient.issue_certificate(…)

• Get cert from Let’s Encrypt (with a little help)
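The MSG_PEEK step is the piece most people have not seen, so here is a minimal sketch of it in isolation. It is not JFE's code: `peek_protocol` and its simple classification rules are assumptions made for illustration. The only protocol fact relied on is that a TLS connection opens with a handshake record (content type 0x16, major version 0x03).

```python
import socket

TLS_HANDSHAKE = 0x16  # TLS record content type for handshake messages
HTTP_METHODS = (b"GET", b"POST", b"HEAD", b"PUT", b"DELETE", b"OPTIONS")


def peek_protocol(conn: socket.socket, nbytes: int = 128) -> str:
    """Guess what the client is speaking without consuming any bytes.

    MSG_PEEK leaves the data in the kernel buffer, so whichever handler
    takes over the socket next still sees the full stream.
    """
    head = conn.recv(nbytes, socket.MSG_PEEK)
    if len(head) >= 2 and head[0] == TLS_HANDSHAKE and head[1] == 0x03:
        return "tls"
    first_line = head.split(b"\n", 1)[0]
    if first_line.split(b" ", 1)[0] in HTTP_METHODS:
        return "http"
    return "unknown"
```

A dispatcher built on this can wrap plaintext connections in TLS and pass already-encrypted ones through, which is roughly the decision the sniff line above is making.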

JFE Just Works

Full system TLS! Fully patched!

Could support other protocols/wrappers!

Bugs! We got ‘em!

• Trusts the client for the name to acquire

• Zero configuration == Attacker configuration

• Some efforts at validation but incomplete for now

• Rate limits at Let’s Encrypt can be problematic

• Low Performance

• Threading model only thing that survives blocking network in free_tls_certs

• Other languages have problems missing setsockopt or MSG_PEEK or or or…

• Localhost

• Connections appear to come from localhost (not great)

• Connections are routed to localhost (actually bad, things that bind to 127.0.0.1 are still exposed)

Fixing Localhost: The Plans

• IPTables TPROXY is janky and clearly nobody else has fixed this either

• Squid, HAProxy, various SSL MITM attack tools (lol) all get stuck here, try to just be an intercepting proxy to another host downwire

• NFTables clearly the approach to take

• New firewalling subsystem in Linux

• Could gate packet redirection with IP Address Aliases (eth0:1)

• Could gate packet redirection with cgroups (as per containers)

• Full system is powerful, full container might be easier

• More aligned with how software is generally being deployed nowadays

Also JFE TODO

• Would need to find a way to query wrapped sockets for metadata

• Should figure out how client socket wrapping might work

• Must be mandatory

• I have plans here

• Could support/detect encrypted backends

• Doesn’t matter if backend has janky crypto if it’s wrapped with something better

• Could integrate with clouds

• Open socket on client == provisioned socket on ELB w/ provisioned cert

• Amazon does do all this, other clouds do too

• DTLS? IPSec? Websockets? SSH?

• Yes, DNSSEC/DANE plays into this. Of course it does.

• Many useful things to help on.

TCP IS NOT HARD TO DEPLOY. WHY SHOULD TLS BE?

Data Loss Prevention: Attacks are hard to survive

Risk Management Is Not All Or Nothing

• There’s $20 in the Gas Station Cash Register

• Not all corporate payroll for the month of July

• But we assume if they can get any of our data, they probably got all of our data

• Why?

They probably got all of our data.

Our Designs Are Often “All or Nothing” Affairs

• Classical JBOS (Just a Bunch Of Servers) design

• Shared credentials

• Complex services

• Full mutual trust – root on one is root on all

• Rate limits for a database would be useless in the event of a hack

• If you can steal some data…

• …you can disable the rate limits…

• …and steal all the data.

• This is why you’re supposed to salt and stretch stored password hashes

• “After your data is lost, make it hard for an attacker to convert it back to passwords”

What is this “After” malarky?

Ratelock: Restricting Data Loss with Serverless Cloud Enforcement

https://github.com/dakami/ratelock

The Clouds are not JBOS. They provide services with authenticated semantics.

Somebody else’s problem (label repeated across the diagram)

./ratelock.py add foo bar
true

(Password stored in DynamoDB, proxied through Lambda)

./ratelock.py check foo bar
true

./ratelock.py check foo wrong
false

Both checks against DynamoDB, proxied.

Lambda “invoke” right against function “ratelock” only thing required.

# while [ 1 ]; do ./ratelock.py check foo bar; sleep 0.25; done

true ... true ... true ... true ... false ... false ... false

The proxy starts providing false errors. The caller doesn’t have the ability to directly bypass the proxy.

The complex server can get completely compromised. The simple policy survives.
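The policy the proxy enforces is small enough to sketch. What follows is an in-process stand-in, not ratelock's Lambda code: `RateLockedChecker`, its sliding-window budget, and the PBKDF2 storage are all assumptions chosen to show the behavior from the demo, where a correct password starts returning false once the budget is spent.

```python
import hashlib
import hmac
import os
import time
from collections import deque


class RateLockedChecker:
    """Password checker that answers False once its rate budget is spent."""

    def __init__(self, max_checks, per_seconds, clock=time.monotonic):
        self._max_checks = max_checks
        self._per_seconds = per_seconds
        self._clock = clock          # injectable so the policy can be tested
        self._recent = deque()       # timestamps of checks inside the window
        self._users = {}             # user -> (salt, derived key)

    @staticmethod
    def _derive(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add(self, user, password):
        salt = os.urandom(16)
        self._users[user] = (salt, self._derive(password, salt))
        return True

    def check(self, user, password):
        now = self._clock()
        # Forget checks that have aged out of the window.
        while self._recent and now - self._recent[0] >= self._per_seconds:
            self._recent.popleft()
        if len(self._recent) >= self._max_checks:
            return False             # over budget: even a correct password fails
        self._recent.append(now)
        if user not in self._users:
            return False
        salt, stored = self._users[user]
        return hmac.compare_digest(stored, self._derive(password, salt))
```

The point of hosting this in a cloud function rather than in-process is that a compromised caller holds only the invoke right, so it cannot edit the budget.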

“What if you can’t trust Lambda?”

Here’s a string Amazon will verify, but never leak, even to you. USEFUL!

$ ./walliam.py add demouser 1234567

$ cat authdb.json
{"demouser":"BvL40myloWAo39hbIpRpKOy4Skdtswcaa7WJUzWf"}

We actually create an IAM user “demouser” under a special path. We just create the user, we don’t grant privileges. But we do get a secret key…which that isn’t.

add_user
aes = (CTR, sha256(userpw))
raw = b64decode(aws_secret)
enc = aes.encrypt(raw)
saved_pw = b64encode(enc)

The secret key is first base64 decoded, and then encrypted with the user’s password. We save that. Why decode?

check_user
enc = b64decode(saved_pw)
aes = (CTR, sha256(userpw))
raw = aes.decrypt(enc)
aws_secret = b64encode(raw)

To invert the process, we decrypt the saved value with what is supposed to be the user’s password, and base64 encode.

aws_secret can’t be checked offline.

They have to ask IAM. Online.

Good luck doing that 100M times.

If there’s one thing Amazon is going to keep online, it’s IAM.

If we didn’t b64decode the Secret Key, there’d be a simple offline attack – post-decrypt, is it Base64?

This is why we aren’t using PyNaCl – we need encryption without integrity, for maybe the first time ever!
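A runnable sketch shows why the missing integrity check matters. To stay within the standard library it swaps AES-CTR for a toy keystream built from SHA-256 in counter mode; that substitution, the `keystream_xor` helper, and the sample secret are assumptions for illustration and not walliam's code.

```python
import base64
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES-CTR: XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def add_user(userpw: str, aws_secret: str) -> str:
    """Decode the secret to raw bytes, encrypt under the password, re-encode."""
    key = hashlib.sha256(userpw.encode()).digest()
    raw = base64.b64decode(aws_secret)
    return base64.b64encode(keystream_xor(key, raw)).decode()


def candidate_secret(userpw: str, saved_pw: str) -> str:
    """Invert add_user; a wrong password still yields a plausible secret."""
    key = hashlib.sha256(userpw.encode()).digest()
    enc = base64.b64decode(saved_pw)
    return base64.b64encode(keystream_xor(key, enc)).decode()


# Hypothetical 40-character secret (30 raw bytes), not a real credential.
SAMPLE_SECRET = base64.b64encode(bytes(range(30))).decode()
```

Every password guess produces a well-formed 40-character string, so nothing in the stored value tells the attacker whether a guess was right; only asking IAM does.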

Some Notes

• One of the largest e-commerce sites in the world provided required rates for their password server

• 7/sec

• Yahoo 500M / 7 per sec = 2.26 years

• Who are we building instadump for, anyway?

• Backups can go to an asymmetric key – encrypt online, decrypt offline

• Not just for passwords, this can rate limit any sort of data loss

• Working on this

• Not just for rate limits, can apply any policy

• Notification, delay, extra approvals

• What else can we factor out to the cloud functions?

• OpenSSL Engine?
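The 2.26-year figure in the notes above is simple division, reproduced here so other record counts and rates can be tried. `years_to_dump` is a hypothetical helper name, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_dump(records: int, checks_per_second: float) -> float:
    """Years needed to walk every record at a hard rate limit."""
    return records / checks_per_second / SECONDS_PER_YEAR


# 500M accounts at 7 checks per second, as on the slide.
yahoo_years = years_to_dump(500_000_000, 7)
```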

Many server breaches. No known Lambda breaches. No known IAM breaches.

Nice table, is it…actuarial?

#NotJustAmazon

Somebody at Google App Engine is one of us.

But what if we can’t trust the cloud?

(There have been breaches, there are many clouds, even at single providers…)

Code Safety: Not getting owned is hard.

“If only users would stop running dangerous code.”

This PDF must be read. By somebody. That is their job.

Stop Victim Shaming. It’s not helping.

“Why isn’t everything run in a sandbox? Or at least AV?”

Have you ever tried to find documentation on sandboxing?

Chrome Source Code doesn’t count.

What about Containers? What about Docker?

docker run -it --privileged -p80:80 dakami/guachrome

GREAT FOR DEVELOPERS

Security? Is it easy?

There’s just a lot that containers need to secure: That Chrome instance needs 98 syscalls from the host.

• accept access arch_prctl bind brk capset chdir chmod clone close connect creat dup epoll_create epoll_ctl epoll_wait execve exit exit_group fchmod fchown fcntl fdatasync fstat ftruncate futex getcwd getdents getegid geteuid getgid getpeername getpid getpriority getrlimit getsockname getsockopt gettid getuid ioctl kill listen lseek lstat madvise mkdir mmap mount mprotect mremap munmap nanosleep newfstatat open openat pipe poll ppoll prctl pread pwrite read readlink recvfrom recvmsg rename rt_sigaction rt_sigprocmask sched_getaffinity sched_setscheduler sched_yield select sendmsg sendto setfsgid setfsuid setitimer setpriority setrlimit set_robust_list setsockopt shmat shmctl shmget shutdown signaldeliver sigreturn socket socketpair stat statfs times umask uname unlink wait4 write writev

1) Why it’s 122 pages

2) How it’s not easy (for anyone)

Same code, hosted slightly differently…

All of Chrome, Docker, Linux, Java…13 syscalls.

• futex ioctl ppoll read recvfrom recvmsg sendto write rt_sigaction rt_sigreturn readv writev close

• (Yes, shared memory maps and open files are minimal as well.)

• It is much easier to secure 13 syscalls than 98. In fact…

Actually, it looks like this. (Plus a bit of goop to further lockdown ioctl.)

It could probably be smaller.
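To make the size of a 13-syscall policy concrete, here is a sketch that writes the list out as a deny-by-default allowlist. It uses the JSON seccomp profile layout that Docker accepts; that format is an assumption chosen for familiarity, since the slides do not say how Autoclave encodes its rules, and the extra ioctl lockdown mentioned above is not modeled.

```python
import json

# The 13 syscalls listed on the slide.
ALLOWED_SYSCALLS = [
    "futex", "ioctl", "ppoll", "read", "recvfrom", "recvmsg", "sendto",
    "write", "rt_sigaction", "rt_sigreturn", "readv", "writev", "close",
]


def build_profile(allowed):
    """Return a deny-by-default seccomp profile that allows only `allowed`."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",   # everything else fails with an errno
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def profile_json(allowed=ALLOWED_SYSCALLS) -> str:
    """Serialize the profile for use with a seccomp-aware launcher."""
    return json.dumps(build_profile(allowed), indent=2)
```

A policy this short can be read and audited in one sitting, which is the argument for 13 over 98.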

AutoClave: Syscall Firewalls for VM Isolation

https://github.com/dakami/autoclave

WARNING: Lots of stuff hasn’t been pushed to master. I prioritized the code other people helped with, and I’d do it again.

https://github.com/dakami/autoclave

Live Demo? Sure, go to https://autoclave.run

You’ll see:

Linux and Windows running fine under extreme syscall firewalls. Fully ephemeral, fully repeatable. (Slightly wider ruleset than just described)

If you’d like to try to break out, here’s hypervisor root (Ctrl-F2)

Who wants to have a PDF parsing party!

(They’re even more fun than crypto parties)

What’s going on?

• VMs have always required less of the host than containers

• Easier to secure kernel-to-kernel than userspace-to-kernel

• VMs require many more syscalls to start up, than to continue running

• Syscall firewall is thus delayed as long as possible – until VNC/network/explicit post-boot activation

• Probably the one significant security contribution here

• VMs can be restored from memory, I mean, they actually can

• Linux does not really allow process freeze/restore

• CRIU tries. Oh, does it try.

Bypass-shared-memory

• Patch from hyper.sh crew

• I was trying to do this myself, but they actually manage a qemu fork

• When restoring from memory, the big part is system memory. It’s read() in during restore, not fast

• Better method: Generate memory image incrementally with mmap/MAP_SHARED, execute new restorations with mmap/MAP_PRIVATE

• Means 100 instances share the “template state” via Copy on Write

• It’s fine, we block madvise

• (Well, now we do)

• Restores move from 5s to <250ms
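The MAP_SHARED / MAP_PRIVATE trick can be demonstrated on an ordinary file. This is a toy analogue of the qemu patch rather than the patch itself: `build_template`, `restore_instance`, and the tiny "memory image" are hypothetical, and the sketch assumes a Unix-like system where Python's `mmap` exposes these flags.

```python
import mmap
import os
import tempfile

IMAGE_SIZE = 4 * mmap.PAGESIZE
RW = mmap.PROT_READ | mmap.PROT_WRITE


def build_template(path: str) -> None:
    """Write the template image through a MAP_SHARED mapping (hits the file)."""
    with open(path, "wb") as f:
        f.truncate(IMAGE_SIZE)
    with open(path, "r+b") as f:
        shared = mmap.mmap(f.fileno(), IMAGE_SIZE, flags=mmap.MAP_SHARED, prot=RW)
        shared[:8] = b"TEMPLATE"
        shared.flush()
        shared.close()


def restore_instance(path: str) -> mmap.mmap:
    """Map the template MAP_PRIVATE: pages are shared until an instance writes."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        return mmap.mmap(f.fileno(), size, flags=mmap.MAP_PRIVATE, prot=RW)


def demo():
    """Return (instance A bytes, instance B bytes, on-disk bytes) after A writes."""
    path = os.path.join(tempfile.mkdtemp(), "template.img")
    build_template(path)
    a, b = restore_instance(path), restore_instance(path)
    a[:8] = b"INSTANCE"              # copy-on-write: only A's private page changes
    with open(path, "rb") as f:
        on_disk = f.read(8)
    return bytes(a[:8]), bytes(b[:8]), on_disk
```

No bulk read() happens at restore time; pages are faulted in on demand and copied only when written, which is consistent with the drop in restore time reported above.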

Bugs

• Need to actually lock user input until system is sufficiently booted

• Fails closed, but still fails

• Need to integrate lots of usability tweaks

• Need to support slightly different syscall firewalls depending on enabled features

• Need to integrate with hyper.sh/clear containers

• Both want to use virtfs, which requires all the syscalls, both could use virtfs-proxy-helper, not clear fs calls are entirely proxied

• Perf, perf, perf – VMs bleed for every bit of it

• Need a solution that doesn’t require bare metal.

• This is an actual good reason a) for nested virt and b) for making nested virt performant (it’s not)

• Add more VMs, figure out how to host this at scale!

Maybe we don’t need unikernels to give every incoming connection a completely fresh/ephemeral VM

We like to cheat

We like we like to cheat

Security gets a syscall firewall. Performance gets instant boot. Developers get free rein as root.

This is not a zero-sum game! Developer Ergonomics is the best phrase.

Let’s Make Security Easy

• Finding an abuse contact was hard. Now you just look for the tracers amongst the noise. Easy.

• TLS was hard. Now you run a daemon, and it’s just there. Easy.

• Surviving a breach was hard. Now you design your systems to lose an amount you can live with. Easy.

• Running dangerous code was…ok, it was always easy. But now not getting infected by that code is also easy.

#MakeSecurityEasy

Not just a hashtag. We can do this.

• HALP

• I can’t write it all!

• https://github.com/dakami

• https://labs.whiteops.com

• Another hackathon in the very near future is likely, talk to me about interest

• [email protected]
