NetflixOSS Meetup Season 3 Episode 1
TRANSCRIPT
Season 3 Episode 1, Feb 11, 2015
Ruslan Meshenberg - @rusmeshenberg
Introduction
Agenda - Lightning Talks
● Eight new projects
  ○ Atlas
  ○ Prana
  ○ Raigad
  ○ Genie 2
  ○ Inviso
  ○ Dynomite
  ○ Nicobar
  ○ MSL
● One new way to evaluate
  ○ Zero To Docker
● Three community users
  ○ IBM Watson
  ○ Nike Digital
  ○ Pivotal
Atlas
Roy Rapoport - @royrapoport
In-House Telemetry? Inconceivable!
● Crowded OSS field!
  ○ Cacti, InfluxDB, OpenTSDB, Nagios, Icinga, NeDi, Zabbix, Observium, Sensu, Zenoss, OpenNMS, Bosun, Prometheus, etc.
● Not to mention commercial products
● Some shortcomings ...
In-House Telemetry? Inconceivable!
● Agility Mismatch
● Cloud (and Netflix Ecosystem) Integration
● Multiple Data Sources
● Scale
● No, seriously. Scale
  ○ 2011: 10M/minute
  ○ ~2x Increase per quarter
  ○ Now up to 1.3B/minute
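As a rough check on the growth figures above, the two endpoints quoted on the slide (10M/minute in 2011 and 1.3B/minute now) can be converted into a number of doublings. This is a minimal Java sketch; the class and method names are illustrative and nothing here comes from Atlas itself.

```java
final class GrowthDoublings {
    /** Number of doublings needed to grow from start to end. */
    static double doublings(double start, double end) {
        return Math.log(end / start) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Endpoints taken from the slide: 10M/minute and 1.3B/minute.
        double n = doublings(10e6, 1.3e9);
        System.out.println(Math.round(n)); // prints 7
    }
}
```

The two figures are about seven doublings apart.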
If You Build It …
Also …
● Decent UIs
● Alerting
  ○ And alert threshold analysis and recommendations
● Real-Time Analytics
● Integration with Hive and EMR
● Dashboard frameworks
Also …
● Composable
● So we can change our minds later …
For now …
● Query layer
● Back end
Soon …
● Improved deployment
● Publish client
● Alerting
● Better UI
Prana
Diptanu Choudhury - @diptanu
Motivations
● The Netflix Platform stack is JVM-based
● Platform features are provided to developers via client libraries
  ○ Service Discovery
  ○ Client Side Load Balancing
  ○ Monitoring and alerting client libraries
The Netflix Ecosystem
Meet Prana
● Prana provides the same set of features to software that is not JVM-based or not built on the Netflix platform
● It allows applications to gel with the Netflix Ecosystem
Prana Features
● Easy-to-use HTTP-based API
  ○ Load Balancing via Ribbon
  ○ Service discovery via Eureka client
  ○ Monitoring via Atlas Client
● Extensible via a plugin framework
● Highly Configurable
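To make the sidecar idea concrete, here is a minimal Java sketch of how an application might build a request to a local sidecar that proxies a call to another service. The base address `http://localhost:8078`, the `/proxy` path, and the `vip` and `path` query parameters are assumptions made for illustration and are not taken from the slides; check the Prana documentation for the actual endpoints.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SidecarProxyUrl {
    /**
     * Builds the URL an application would call on its local sidecar.
     * The endpoint shape ("/proxy?vip=...&path=...") is hypothetical.
     */
    static URI proxyUri(String sidecarBase, String vip, String path) {
        String query = "vip=" + URLEncoder.encode(vip, StandardCharsets.UTF_8)
                + "&path=" + URLEncoder.encode(path, StandardCharsets.UTF_8);
        return URI.create(sidecarBase + "/proxy?" + query);
    }

    public static void main(String[] args) {
        // The application only talks to localhost; discovery and load
        // balancing are left to the sidecar.
        System.out.println(proxyUri("http://localhost:8078", "doge-service", "/doges/1"));
    }
}
```

The point of the design is that a non-JVM process needs only an HTTP client, not the platform's client libraries.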
Raigad
Sagar Loke - @sagar_loke
Raigad - Motivation
● Elasticsearch sidecar: a co-process that runs alongside the ES process
● Helps to automate ES deployment
  ○ ~50 clusters in test, ~180TB data
  ○ ~45 clusters in prod, ~780TB data
● Node Discovery and Tracking
● Automatic Index Management
● Scheduled Backup and Restore
● Geared towards running in an AWS environment
Auto ES Deployments
● Tunes the elasticsearch.yml file based on configuration parameters
● Multi-region support
● Currently follows a dedicated Master-Data-Search deployment based on ASG names
Node Discovery and Tracking
● Sample implementation using Cassandra
● C* keeps track of metadata information of ES clusters
● ES instance reads C* to discover other nodes during bootstrap
● Storing metadata in C* helps in multi-region deployments
Auto Index Management
● Provides configuration properties for Auto Index Management
● Based on a specific index date suffix (YYYYMMDD), old indices are cleaned and new indices are created
● Before running Index Manager
Auto Index Management … continued
● After running Index Manager
● Index Manager job can be scheduled
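The date-suffix cleanup described above can be sketched in a few lines of Java. This is an illustration of the idea only, not Raigad's implementation: the class name, the `prefix` and `retentionDays` parameters, and the rule "older than N days is expired" are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

final class IndexRetention {
    // BASIC_ISO_DATE parses the YYYYMMDD suffix, e.g. "20150211".
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns the indices named prefix + YYYYMMDD that are older than retentionDays. */
    static List<String> expired(List<String> indices, String prefix,
                                LocalDate today, int retentionDays) {
        List<String> result = new ArrayList<>();
        for (String name : indices) {
            if (!name.startsWith(prefix) || name.length() != prefix.length() + 8) {
                continue; // not managed under this prefix
            }
            LocalDate day;
            try {
                day = LocalDate.parse(name.substring(prefix.length()), SUFFIX);
            } catch (DateTimeParseException e) {
                continue; // suffix is not a valid date, leave the index alone
            }
            if (ChronoUnit.DAYS.between(day, today) > retentionDays) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = java.util.Arrays.asList("logs20150101", "logs20150210", "other20140101");
        System.out.println(expired(all, "logs", LocalDate.of(2015, 2, 11), 30));
    }
}
```

Indices whose names do not match the expected pattern are skipped rather than deleted, which is the conservative choice for an automated job.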
Running in AWS
● Automatic updates to Security Groups when new nodes are added or removed
● Supports IAM Credentials
● Scheduled Snapshot Backup to S3 (uses the elasticsearch-cloud-aws plugin)
● Publish ES Metrics to Servo - Centralized Monitoring System
Genie 2
Tom Gianos
Our Current Architecture
[Architecture diagram. Labels: Data Warehouse; Processing Clusters (Prod, VPC, Bonus, Query, Prod, Test); Service; Clients; Tools; CLIs]
Goals For Genie 2
● Develop a generic data model, which would let jobs run on any multi-tenant distributed processing cluster.
● Implement a flexible cluster and command selection algorithm for running a job.
● Provide richer API support.
● Implement a more flexible, extensible and robust codebase.
{
  "user": "tgianos",
  "name": "PrestoJob.1421807841069",
  "commandArgs": "-f script.presto ",
  "clusterCriterias": [
    { "tags": [ "presto", "prod" ] },
    { "tags": [ "adhoc" ] }
  ],
  "commandCriteria": [ "presto" ],
  "tags": [ "headers", "presto", "BigDataPortal" ]
}
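One way to read the job request above is as a tag-based selection: each entry in `clusterCriterias` is a set of tags, and a cluster qualifies when it carries all of them. The Java sketch below assumes the entries are tried in order and the first matching cluster wins; that ordering rule, the class name, and the cluster names are assumptions for illustration, not Genie's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class ClusterSelector {
    /** Returns the first cluster whose tags contain every tag of the earliest matching criterion. */
    static Optional<String> select(List<Set<String>> criteriaInPriorityOrder,
                                   Map<String, Set<String>> clusterTags) {
        for (Set<String> required : criteriaInPriorityOrder) {
            for (Map.Entry<String, Set<String>> cluster : clusterTags.entrySet()) {
                if (cluster.getValue().containsAll(required)) {
                    return Optional.of(cluster.getKey());
                }
            }
        }
        return Optional.empty();
    }

    static Set<String> tags(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        // Hypothetical clusters; only the tag names come from the job request.
        Map<String, Set<String>> clusters = new LinkedHashMap<>();
        clusters.put("adhoc-1", tags("adhoc", "hadoop"));
        clusters.put("presto-prod", tags("presto", "prod"));
        List<Set<String>> criteria = Arrays.asList(tags("presto", "prod"), tags("adhoc"));
        System.out.println(select(criteria, clusters)); // prints Optional[presto-prod]
    }
}
```

Under this reading, the second criterion (`adhoc`) acts as a fallback when no cluster tagged both `presto` and `prod` is available.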
Our Current Deployment
● 19 i2.2xlarge nodes in prod cluster
  ○ Configured to allow room to scale up as needed
● 34 max jobs per node
● ~17,000 Jobs Per Day
Daniel Weeks
Dynomite
Minh Do - @timiblossom
What is Dynomite?
● Dynamo layer on top of a non-distributed system (Redis/Memcache)
  ○ Peer-to-peer
  ○ Replication
  ○ Sharding
  ○ Gossiping
  ○ Multi-datacenter and rack awareness
  ○ Encryption
  ○ Linear scale
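Sharding in a Dynamo-style layer is commonly done with a token ring: each node owns a token, and a key is routed to the node whose token is the first one at or after the key's hash, wrapping around at the end. The Java sketch below illustrates that general technique; the CRC32 hash, the ownership convention, and all names are assumptions for illustration and are not taken from Dynomite.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Illustrative hash only; a real system would pick its own function. */
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** The owner is the node with the first token >= the given token, wrapping to the lowest token. */
    String ownerOfToken(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    String ownerOf(String key) {
        return ownerOfToken(hash(key));
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(1000L, "node-a");
        ring.addNode(2000L, "node-b");
        ring.addNode(3000L, "node-c");
        System.out.println(ring.ownerOfToken(1500L)); // prints node-b
        System.out.println(ring.ownerOfToken(3500L)); // prints node-a (wraps around)
    }
}
```

Because every peer can compute the owner locally, any node can accept a request and forward it, which fits the peer-to-peer model listed above.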
Dynomite Node
Network Topology
Operation features
● Florida - sidecar application to manage Dynomite clusters (like Priam for Cassandra)
● Data backup (Redis only)
● Node replacement
● Data warm-up (Redis only)
● Client failover strategy - Dyno (our Java client)
● Atlas/Servo integration for operation metrics
Incoming features
● Higher read/write consistencies
● Data reconciliation or data repair
● Other data stores besides Redis and Memcache
● Better/more generic warm-up method
● Spark driver integration
● and others
Performance
● AWS:
  ○ 126 nodes total in us-east-1, us-west-2, eu-west-1
  ○ r3.xlarge
  ○ 1K data payload
● 250K Write RPS, 250K Read RPS
● Client observed latencies
  ○ average less than 1ms
  ○ 99th at ~1.5ms
  ○ 99.5th at ~2.5ms
Nicobar
Dynamic Scripting Library for Java
Vasanth Asokan
What is Nicobar?
Mainly, two things:
1. A Pluggable, Dynamic Scripting Framework for Java
(powered by)
2. A Modular Classloading System
Traditional Java Classloader Hierarchy
Powered by JBoss Modules
Nicobar Module Classloader Hierarchy
Putting it all together
MSL
Mitch Zollinger
What is MSL?
MSL = Message Security Layer
MSL is a modern security protocol which enables arbitrary application protocols to be secured over arbitrary transport protocols.
Performance
● sub-second playback start
● MSL messaging stacked with app protocol
  ○ request can have: device authentication, user authentication, key exchange & application message
  ○ response can have: key exchange, authentication renewal, application message
● Netflix streaming should start faster than changing channels on your cable box!
Reliability
● We need 4-5 “9s” of reliability
● MSL has automatic error recovery
● We had to remove reliance on 3rd party PKI
● Client time: not needed by MSL
Modern Protocol Design
● Human readable JSON vs. complex binary format
  ○ ASN.1 security issues go away
● Multiple implementations: Java, JS, C#, …
  ○ JS: updateable in-field
Flexibility
● Pluggable
  ○ authentication
  ○ crypto algorithms
● Standard porting API
  ○ Can use W3C WebCrypto, for example
Deployment Models
● Trusted Services Network
  ○ All servers share a common master key, allowing the same level of trust across the network
● Peer-to-Peer
  ○ Every pair of entities shares connection-specific keys & credentials
Security / Feature List
● encrypt / decrypt
● device authentication
● user authentication
● integrity protection
● key exchange
● anti-replay protection
● compression
● chunked messaging
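As an illustration of what "integrity protection" means in the list above, the Java sketch below tags a message with an HMAC under a shared key and rejects any message whose tag does not match. This shows the general technique using the standard `javax.crypto` API; it is not MSL's message format or API, and the choice of HMAC-SHA256 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class IntegrityCheck {
    /** Computes an HMAC-SHA256 tag over the message with the shared key. */
    static byte[] sign(byte[] key, byte[] message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Recomputes the tag and compares it with the received one. */
    static boolean verify(byte[] key, byte[] message, byte[] tag) {
        return MessageDigest.isEqual(sign(key, message), tag);
    }

    public static void main(String[] args) {
        byte[] key = "shared-key".getBytes(StandardCharsets.UTF_8);
        byte[] message = "{\"play\":true}".getBytes(StandardCharsets.UTF_8);
        byte[] tag = sign(key, message);
        System.out.println(verify(key, message, tag)); // prints true
        message[0] ^= 1; // simulate tampering in transit
        System.out.println(verify(key, message, tag)); // prints false
    }
}
```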
Zero To Docker
Andrew Spyker - @aspyker
● Up and running in minutes
● Before - Documented technology that we expected you to assemble
● Now - Running technology that we assembled and validated
● Not - Production Ready. Examples only, not run this way at Netflix (security, HA, monitoring, etc.)
Netflix OSS on your laptop
[Diagram: Docker Host (ex. VirtualBox on OS X) running Ubuntu 14.04 with a single kernel. Containers shown: Container #1 (filesystem + process), Eureka container, Zuul container, another container, ...]
Trusted and Transparent Builds
● Start with the Docker Hub registry
  ○ Pull images that you know were built securely
  ○ All you need if you just want to run them
● Inspect the linked GitHub Dockerfile
  ○ Want to know how NetflixOSS was configured?
  ○ Want to know how NetflixOSS code was built?
  ○ All code and configuration explicitly documented
What is available?
From https://hub.docker.com/u/netflixoss/
● asgard
● eureka
● edda
● sketchy
● security monkey
● exhibitor
● sample karyon application
● zuul
● atlas
Nike Digital
Alan Scherger - @flyinprogrammer
Where we started...
> Datacenter 2 Cloud (AWS)
> Cloud native architecture using microservices
> Defined a Cloud Blueprint
> Pioneered a REST application bootstrap
> Maintain a boilerplate to define transitive dependencies.
How do we do metrics across billions (or 100s) of microservices?
● Instrumentation to JMX
● Graphite Observer to capture metrics
Observer Modifications
● Use Eureka to find a Graphite node
● Use a healthcheck to time out the TCP socket
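For context, Graphite's Carbon listener accepts a plaintext protocol in which each sample is one line of the form `<metric path> <value> <timestamp>`. The Java sketch below formats such a line and sends it over a socket with explicit timeouts so that an unresponsive node cannot block the caller. The class name, the metric name, and passing the host as a plain parameter are assumptions for illustration; in the setup described above the host would come from a Eureka lookup, which is not shown.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class GraphiteLine {
    /** Formats one sample in Carbon's plaintext form: "<path> <value> <epoch seconds>\n". */
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Sends one line; the timeout bounds the connect and any later read on the socket. */
    static void send(String host, int port, String line, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(line.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    public static void main(String[] args) {
        // Hypothetical metric name; printing only, no network call is made here.
        System.out.print(format("doge.requests.count", 42.0, 1423612800L));
    }
}
```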
How are we going to store all of these metrics?
● Graphite Carbon compliant● Cassandra metric storage● Elasticsearch metric search● C* and ES cross-region replication
enable a global view of the metrics
Cyanite
https://github.com/pyr/cyanite
How do we make these tools Blueprint compliant?
Sidecars to the rescue!
Priam + Cassandra = Done
Raigad + Elasticsearch ; Prana + Cyanite
So those didn’t exist - sour.
Generic Sidecar
● Application daemon
● Convention over configuration groovy scripts.
configure.groovy
Generate 3 config files off eureka data.
Add the ingredients that produce code.
Atlas Jr.
Kevin Haverlock
Aroop Pandya
Kelly Abuelsaad
Diamond
IBM Watson Developer Cloud
…
http://www.msnbc.com/msnbc/how-supercomputer-sees-the-state-the-union
Visual Recognition: Image/Video recognition and classification service to provide assessment of a user from their images
Extract information from text: People, Organizations, Locations, Events, and the relationships between them
User Modeling: Improved understanding of people's preferences to help engage users on their own terms
Language Identification
Machine Translation
Concept Expansion
Message Resonance
Question and Answer
Relationship Extraction
Text to Speech: The conversion of text to an output audio stream
Speech to Text: Converts speech into text
Tradeoff Analytics: Helps people make better choices while taking into account multiple, often conflicting, goals that matter
Concept Analytics: Links documents that you provide with a pre-existing graph of concepts
Joshua Long - @starbuxman
Pivotal
“Bootiful” microservices with Spring Cloud & Netflix OSS
@Grab("spring-boot-starter-actuator")
@RestController
class GreetingsController {

    @RequestMapping("/hi/{name}")
    def hi(@PathVariable String name) {
        [ greeting: "Hello, " + name + "!" ]
    }
}
> spring run greeting.groovy
> spring jar greeting.jar greeting.groovy
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@SpringBootApplication
@EnableConfigServer
public class ConfigurationServerApplication {

    public static void main(String[] args) throws Exception {
        SpringApplication.run(ConfigurationServerApplication.class, args);
    }
}
spring:
  cloud:
    config:
      server:
        uri: ${MY_CONF:https://github.com/some/git-repository}
@Value("${some.property}")
private String someProperty;
@SpringBootApplication
@EnableEurekaClient
public class DogeApplication { // …

# src/main/resources/bootstrap.yml
spring:
  application:
    name: doge-service
@Component
class ReliableClient {

    @HystrixCommand(fallbackMethod = "defaultDogeLink")
    public Link buildDogeLink() {
        // insert volatile service-to-service call here
    }
}
@SpringBootApplication
@EnableHystrixDashboard
public class HystrixApplication {

    public static void main(String[] args) {
        SpringApplication.run(HystrixApplication.class, args);
    }
}
zuul:
  proxy:
    mapping: /api
    addProxyHeaders: true
    route:
      account-service: /accounts
      doge-service: /doges
Resulting routes: /api/accounts, /api/doges
spring:
  oauth2:
    client:
      clientId: acme
      clientSecret: acmesecret
    resource:
      tokenInfoUri: http://localhost:8002/auth/oauth/check_token
      id: openid
      serviceId: resource
@SpringBootApplication
@RestController
@EnableOAuth2Resource
public class SsoResourceApplication {

    public static void main(String[] args) {
        SpringApplication.run(SsoResourceApplication.class, args);
    }

    @RequestMapping("/hi")
    String hi(@RequestParam Optional<String> name) {
        return "Hello" + name.map(n -> ", " + n).orElse("") + "! ";
    }
}
Josh Long (龙之春)
@starbuxman
@springcentral
github.com/joshlong
References
spring.io/guides
github.com/spring-cloud/
github.com/spring-cloud-samples/
github.com/joshlong/spring-doge
github.com/joshlong/spring-doge-microservice
docs.spring.io/spring-boot/
Questions?
Please join us next door for mingling, drinks and food!
@netflixoss
Thank you!