Apache Eagle Dublin Hadoop Summit 2016
TRANSCRIPT
Apache Eagle: Monitor Hadoop in Real Time
Yong Zhang | Senior Architect | [email protected]
Manoharan | Senior Product Manager | @lycos_86
Big Data @ eBay
800M Listings*
159M Global Active Buyers *
*Q3 2015 data
7 Hadoop Clusters*
800M HDFS operations (single cluster)*
120 PB Data*
Hadoop @ eBay
HADOOP SECURITY
Authorization & Access Control
Perimeter Security
Data Classification
Activity Monitoring
Security MDR
• Perimeter Security
• Authorization & Access Control
• Discovery
• Activity Monitoring
Security for Hadoop
Who is accessing the data?
What data are they accessing?
Is someone trying to access data that they don’t have access to?
Are there any anomalous access patterns?
Is there a security threat?
How can we monitor and get notified before or while an anomalous event occurs?
Motivation
Apache Eagle
Apache Eagle: Monitor Hadoop in Real Time
Apache Eagle is an open-source monitoring platform for the Hadoop ecosystem that started with monitoring data activity in Hadoop. It can instantly identify access to sensitive data, recognize attacks and malicious activity, and block access in real time.
In conjunction with components such as Ranger, Sentry, Knox, DgSecure, and Splunk, Eagle provides a comprehensive solution to secure sensitive data stored in Hadoop.
Apache Eagle Composition
Integrations:
1. Data Activity Monitoring: HDFS audit, Hive query, HBase audit, Cassandra audit, MapR audit
2. Hadoop Performance Metrics: Namenode JMX metrics, Datanode JMX metrics, RM JMX metrics, system metrics
3. M/R Job Performance Metrics: history job metrics, running job metrics
4. Spark Job Performance Metrics: Spark job metrics, queue metrics

Alert Engine:
1. Policy Store
2. Metadata API
3. Scalability
4. Extensibility

[Domains] [Applications]
More Integrations
• Cassandra
• MapR
• MongoDB
• Job
• Queue
Extensibility
Ranger
• As remediation engine
• As generic data source

DgSecure
• Source of truth for data classification

Splunk
• Syslog format output
• Eagle alert output is the 1st abstraction of analytics; Splunk is the 2nd abstraction
Eagle Architecture
Highlights
1. Turn-key integration: after installation, the user only defines rules.
2. Comprehensive rules on high-volume data: Eagle solves problems unique to Hadoop.
3. Hot-deployable rules: instead of providing many charts, Eagle lets users write ad-hoc rules and hot deploy them.
4. Metadata driven: here metadata includes policies, event schemas, UI components, etc.
5. Extensibility: Eagle cannot succeed alone; it has to integrate with other systems, e.g. for data classification and policy enforcement.
6. Monolithic Storm topology: application pre-processing runs together with the alert engine.
Example 1: Integration with HDFS AUDIT log
• Ingestion: KafkaLog4jAppender + Kafka, or Logstash + Kafka
• Partition: by user
• Pre-processing: sensitivity join, command re-assembler

[Diagram: Namenode audit events flow into Kafka Partition_1..N, are read by a Storm Kafka spout, and are routed by user so that each user's events (User1, User2, ...) always land on the same alert executor among Alert Executor_1..K]
Data Classification - HDFS
• Browse the HDFS file system
• Batch import sensitivity metadata through the Eagle API
• Manually mark sensitivity in the Eagle UI
One user command generates multiple HDFS audit events; Eagle reverse-engineers the original user command from them. Example:

COPYFROMLOCAL_PATTERN = "every a = eventStream[cmd=='getfileinfo'] "
  + "-> b = eventStream[cmd=='getfileinfo' and user==a.user and src==str:concat(a.src,'._COPYING_')] "
  + "-> c = eventStream[cmd=='create' and user==a.user and src==b.src] "
  + "-> d = eventStream[cmd=='getfileinfo' and user==a.user and src==b.src] "
  + "-> e = eventStream[cmd=='delete' and user==a.user and src==a.src] "
  + "-> f = eventStream[cmd=='rename' and user==a.user and src==b.src and dst==a.src]"

2015-11-20 00:06:47,090 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,185 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc
2015-11-20 00:06:47,254 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=create src=/tmp/private._COPYING_ dst=null perm=root:hdfs:rw-r--r-- proto=rpc
2015-11-20 00:06:47,289 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo src=/tmp/private._COPYING_ dst=null perm=null proto=rpc
2015-11-20 00:06:47,609 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=delete src=/tmp/private dst=null perm=null proto=rpc
2015-11-20 00:06:47,624 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/10.0.2.15 cmd=rename src=/tmp/private._COPYING_ dst=/tmp/private perm=root:hdfs:rw-r--r-- proto=rpc
User Command Re-assembly
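The six-step copyFromLocal pattern can be sketched as a plain Python check over an ordered list of audit events. This only illustrates the matching logic — Eagle itself runs the Siddhi pattern shown above — and the function and field names here are hypothetical:

```python
# Illustrative sketch of reassembling a copyFromLocal command from HDFS
# audit events, mirroring the Siddhi pattern a..f above. NOT Eagle's
# implementation; event fields follow the audit log format.

def is_copy_from_local(events):
    """Return True if six audit events (dicts with 'cmd', 'user', 'src',
    'dst') match the copyFromLocal sequence for a single user."""
    if len(events) != 6:
        return False
    a, b, c, d, e, f = events
    same_user = all(ev["user"] == a["user"] for ev in events)
    tmp = a["src"] + "._COPYING_"          # the temporary copy target
    return (same_user
            and a["cmd"] == "getfileinfo"
            and b["cmd"] == "getfileinfo" and b["src"] == tmp
            and c["cmd"] == "create" and c["src"] == tmp
            and d["cmd"] == "getfileinfo" and d["src"] == tmp
            and e["cmd"] == "delete" and e["src"] == a["src"]
            and f["cmd"] == "rename" and f["src"] == tmp
            and f["dst"] == a["src"])

events = [
    {"user": "root", "cmd": "getfileinfo", "src": "/tmp/private", "dst": None},
    {"user": "root", "cmd": "getfileinfo", "src": "/tmp/private._COPYING_", "dst": None},
    {"user": "root", "cmd": "create", "src": "/tmp/private._COPYING_", "dst": None},
    {"user": "root", "cmd": "getfileinfo", "src": "/tmp/private._COPYING_", "dst": None},
    {"user": "root", "cmd": "delete", "src": "/tmp/private", "dst": None},
    {"user": "root", "cmd": "rename", "src": "/tmp/private._COPYING_", "dst": "/tmp/private"},
]
print(is_copy_from_local(events))  # True
```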
• Policy evaluation is stateful (one user's data has to go to one physical bolt)
• Partition by user all the way (hash)
• User traffic is not balanced at all
• Greedy algorithm: https://en.wikipedia.org/wiki/Partition_problem#The_greedy_algorithm
Data Skew Problem
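The greedy heuristic for the partition problem referenced above can be sketched as follows: sort users by traffic descending, then always hand the next user to the currently least-loaded bolt. Purely illustrative — the user names and loads are made up:

```python
import heapq

def greedy_assign(user_load, num_bolts):
    """Greedy number-partitioning: process users in descending order of
    event volume, assigning each to the least-loaded bolt so far."""
    heap = [(0, i) for i in range(num_bolts)]  # (current load, bolt id)
    heapq.heapify(heap)
    assignment = {}
    for user, load in sorted(user_load.items(), key=lambda kv: -kv[1]):
        bolt_load, bolt_id = heapq.heappop(heap)
        assignment[user] = bolt_id
        heapq.heappush(heap, (bolt_load + load, bolt_id))
    return assignment

# Hypothetical per-user event volumes (events/sec)
loads = {"u1": 90, "u2": 50, "u3": 40, "u4": 30, "u5": 20, "u6": 10}
print(greedy_assign(loads, 2))
```

The heavy user `u1` ends up alone-ish on one bolt while the lighter users fill the other, which is exactly the skew mitigation the slide describes.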
Policy weight is uneven:
• Regex policies are CPU intensive
• Window-based policies are memory intensive
Computation Skew Problem
Example 2: Integration with Hive
• Ingestion: Yarn API
• Partition: by user
• Pre-processing: sensitivity join, Hive SQL parser
Data Classification - Hive
• Browse Hive databases/tables/columns
• Batch import sensitivity metadata through the Eagle API
• Manually mark sensitivity in the Eagle UI
Eagle Alert Engine Overview
1. Runs a CEP engine on Apache Storm
• Uses the CEP engine as a library (Siddhi CEP)
• Evaluates policies on streamed data
• Rules are hot deployable

2. Inject policies dynamically
• API
• Intuitive UI

3. Scalability
• Computation: # of policies (policy placement)
• Storage: # of events (event partition)

4. Extensibility for policy enforcement
• Post-alert processing with plugins
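The "post-alert processing with plugins" idea can be sketched as a minimal dispatcher that hands every alert to a list of pluggable handlers. This is a hypothetical interface for illustration only, not Eagle's actual plugin API:

```python
# Hypothetical sketch of post-alert plugin dispatch. Class and field
# names are made up; Eagle's real plugin contract may differ.

class AlertPlugin:
    """Base class: a plugin reacts to one alert at a time."""
    def on_alert(self, alert: dict) -> None:
        raise NotImplementedError

class ConsoleNotifier(AlertPlugin):
    """Example plugin that records and prints alerts."""
    def __init__(self):
        self.seen = []
    def on_alert(self, alert):
        self.seen.append(alert)
        print(f"ALERT [{alert['policyId']}] user={alert['user']}")

class AlertDispatcher:
    """Fans each alert out to every registered plugin (e.g. email,
    Splunk forwarder, Ranger remediation)."""
    def __init__(self, plugins):
        self.plugins = plugins
    def dispatch(self, alert):
        for p in self.plugins:
            p.on_alert(alert)

notifier = ConsoleNotifier()
AlertDispatcher([notifier]).dispatch(
    {"policyId": "viewPrivate", "user": "user_tom"})
```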
Run CEP Engine on Storm
[Diagram: each Storm bolt hosts multiple CEP workers; event1 fans out to the workers, with each worker evaluating a subset of the policies (e.g. policy1, policy2, policy3 on one bolt; policy4, policy5, policy6 on another). A policy check thread in each bolt pulls policies and event schemas from the Policy Store through the Metadata API.]
Primitives – event, policy, alert
Raw Event:
2015-10-11 01:00:00,014 INFO FSNamesystem.audit: allowed=true [email protected] (auth:KERBEROS) ip=/10.0.0.1 cmd=getfileinfo src=/tmp/private dst=null perm=null

Alert Event:
Timestamp, cmd, src, dst, ugi, sensitivityType, securityZone

Policy:
viewPrivate: from hdfsAuditLogEventStream[(cmd=='getfileinfo') and (src=='/tmp/private')]

Alert:
2015-10-11 01:00:09[UTC] hdfsAuditLog viewPrivate user_tom /10.0.0.1 The Policy "viewPrivate" has been detected with the below information: timestamp="1445993770932" allowed="true" cmd="getfileinfo" host="/10.0.0.1" sensitivityType="PRIVATE" securityZone="NA" src="/tmp/private" dst="NA" user="user_tom"
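Turning a raw audit line into the structured alert-event fields can be sketched with a regular expression. Illustrative only — the field layout follows the HDFS audit log format shown on this slide, and the `ugi` value below is a made-up placeholder (the original is redacted):

```python
import re

# Sketch: parse a raw HDFS audit line into structured fields for
# policy evaluation. Pattern is illustrative, not Eagle's parser.
AUDIT_RE = re.compile(
    r"(?P<ts>\S+ \S+) INFO FSNamesystem\.audit: "
    r"allowed=(?P<allowed>\S+)\s+ugi=(?P<ugi>\S+).*?"
    r"ip=(?P<ip>\S+)\s+cmd=(?P<cmd>\S+)\s+"
    r"src=(?P<src>\S+)\s+dst=(?P<dst>\S+)\s+perm=(?P<perm>\S+)")

line = ("2015-10-11 01:00:00,014 INFO FSNamesystem.audit: allowed=true "
        "ugi=tom@EXAMPLE.COM (auth:KERBEROS) ip=/10.0.0.1 cmd=getfileinfo "
        "src=/tmp/private dst=null perm=null")

event = AUDIT_RE.search(line).groupdict()
print(event["cmd"], event["src"])  # getfileinfo /tmp/private
```

Downstream, Eagle enriches such events with `sensitivityType` and `securityZone` via the sensitivity join before policy evaluation.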
Event Schema
• Modeling event
Policy Capabilities

1. Single event evaluation
• Threshold checks with various conditions

2. Event window based evaluation
• Various window semantics (time/length, sliding/batch windows)
• Comprehensive aggregation support

3. Correlation across multiple event streams
• SQL-like join

4. Pattern match and sequence
• e.g. a happens, followed by b

Powered by Siddhi 3.0.5; Eagle adds dynamic capabilities and an intuitive API/UI.
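Capability 2 (window-based evaluation) can be illustrated with a hand-rolled sliding count window: alert when one user issues more than a threshold number of commands within the window. Eagle expresses this declaratively in Siddhi; this sketch only shows the semantics, and all names and numbers are made up:

```python
from collections import deque

class SlidingCountPolicy:
    """Alert when more than `threshold` events arrive within the last
    `window_ms` milliseconds for one partition (e.g. one user)."""
    def __init__(self, threshold, window_ms):
        self.threshold = threshold
        self.window_ms = window_ms
        self.events = deque()  # event timestamps still inside the window

    def on_event(self, ts_ms):
        self.events.append(ts_ms)
        # Evict timestamps that have fallen out of the sliding window.
        while self.events and self.events[0] <= ts_ms - self.window_ms:
            self.events.popleft()
        return len(self.events) > self.threshold  # True => raise alert

p = SlidingCountPolicy(threshold=3, window_ms=60_000)
alerts = [p.on_event(t) for t in (0, 10_000, 20_000, 30_000, 120_000)]
print(alerts)  # [False, False, False, True, False]
```

The fourth event is the only one that sees more than three events inside its 60-second window; by the fifth, the earlier events have expired.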
Some policy examples

1. Namenode master/slave lag
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.journaltransaction.lastappliedorwrittentxid"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host != a.host and (max(convert(a.value, "long")) + 100) <= max(convert(value, "long"))] within 5 min select a.host as hostA, a.value as transactIdA, b.host as hostB, b.value as transactIdB insert into tmp;

2. Namenode last checkpoint time
from hadoopJmxMetricEventStream[metric == "hadoop.namenode.dfs.lastcheckpointtime" and (convert(value, "long") + 18000000) < timestamp] select metric, host, value, timestamp, component, site insert into tmp;

3. Namenode HA state change
from every a = hadoopJmxMetricEventStream[metric=="hadoop.namenode.hastate.active.count"] -> b = hadoopJmxMetricEventStream[metric==a.metric and b.host == a.host and (convert(a.value, "long") != convert(value, "long"))] within 10 min select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as site insert into tmp;
Define policy in UI and API
1. Create policy using the API:

curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST -H 'Content-Type:application/json' \
  "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-service/rest/entities?serviceName=AlertDefinitionService" \
  -d '[{
    "prefix": "alertdef",
    "tags": {
      "site": "sandbox",
      "application": "hadoopJmxMetricDataSource",
      "policyId": "capacityUsedPolicy",
      "alertExecutorId": "hadoopJmxMetricAlertExecutor",
      "policyType": "siddhiCEPEngine"
    },
    "description": "jmx metric",
    "policyDef": "{\"expression\":\"from hadoopJmxMetricEventStream[metric == \\\"hadoop.namenode.fsnamesystemstate.capacityused\\\" and convert(value, \\\"long\\\") > 0] select metric, host, value, timestamp, component, site insert into tmp; \",\"type\":\"siddhiCEPEngine\"}",
    "enabled": true,
    "dedupeDef": "{\"alertDedupIntervalMin\":10,\"emailDedupIntervalMin\":10}",
    "notificationDef": "[{\"sender\":\"[email protected]\",\"recipients\":\"[email protected]\",\"subject\":\"missing block found.\",\"flavor\":\"email\",\"id\":\"email_1\",\"tplFileName\":\"\"}]"
  }]'

2. Create policy using the UI
Scalability
• Scale with # of events
• Scale with # of policies
Eagle Service
As of 0.3.0, Eagle stores metadata and statistics in HBase and supports Druid as a metric store.

1. Data to be stored
Metadata
• Policy
• Event schema
• Site/Application/UI features
Statistics
• # of events evaluated per second
• Audit trail for policy changes
Raw data
• Druid for metrics
• HBase for M/R job/task data, etc.
• ES for logs (future)

2. Storage
HBase
• Store metrics
• Store M/R job/task data
• Rowkey design for time-series data
• HBase coprocessor
Druid
• Consume data from Kafka

3. API/UI
HBase
• filter, groupby, sort, top
Druid
• Druid query API
• Dashboards in Eagle
Alert Engine Limitations in Eagle 0.3
1. High cost of integration
• Coding required to onboard a new data source; even a trivial source means writing and deploying a Storm topology
• Monolithic topology for pre-processing and alerting

2. Not multi-tenant
• Alert engine is embedded in the application
• Many separate Storm topologies, one even for a trivial data source

3. Policy capability restricted by event partitioning
• No ad-hoc group-by policy expressions (e.g. switching from group-by user to group-by cmd): if traffic is partitioned by user, policies can only express user-based group-bys

4. Correlation is not declarative
• Coding required to correlate existing data sources; correlations across multiple metrics cannot be declared

5. Stateful policy evaluation
• Failover when a bolt goes down: how to replay a week of history data after a node failure
Eagle Next Releases
Eagle 0.4
• Improve user experience
  - Remote start of Storm topologies
  - Metadata stored in an RDBMS

Eagle 0.5
• Alert engine as a platform
  - No monolithic topology
  - Declarative data source onboarding
  - Easy correlation
  - Support policies with any group-by field
  - Elastic capacity management
USER PROFILE ALGORITHMS: Eigenvalue Decomposition
• Compute mean and variance
• Compute eigenvectors and determine principal components
• Normal data points lie near the first few principal components
• Abnormal data points lie farther from the first few principal components and closer to later components
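The steps above can be sketched for the 2-D case: fit the principal axis of per-user feature vectors, then score each point by its distance from that axis. Purely illustrative — this uses the closed-form eigen decomposition of a 2x2 covariance matrix, and the feature values are made up:

```python
import math

def principal_axis(points):
    """Return (mean, unit first principal component) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # 2x2 covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    # Largest eigenvalue of [[a, b], [b, c]], closed form
    lam = ((a + c) + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
    # Eigenvector (b, lam - a); assumes b != 0, i.e. correlated features
    vx, vy = (b, lam - a) if b != 0 else (1.0, 0.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)

def anomaly_score(point, mean, axis):
    """Distance of `point` from the line through `mean` along `axis`."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    proj = dx * axis[0] + dy * axis[1]
    return math.hypot(dx - proj * axis[0], dy - proj * axis[1])

# Hypothetical per-user features clustered along y ~= 2x
normal = [(i, 2 * i + 0.1 * (-1) ** i) for i in range(10)]
mean, axis = principal_axis(normal)
print(anomaly_score((5, 10), mean, axis) < anomaly_score((5, 30), mean, axis))  # True
```

A point near the fitted axis (normal behavior) scores low; a point far off-axis (anomalous behavior) scores high, matching the intuition on the slide.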
USER PROFILE ARCHITECTURE
Website: http://eagle.incubator.apache.org
GitHub: https://github.com/apache/incubator-eagle
Dev mail list
Twitter: @TheApacheEagle

Contributors are welcome in Apache Eagle!
Q & A