From Zero to Data Flow in Hours with Apache NiFi
TRANSCRIPT
Copyright © 2016, Schlumberger, All rights reserved.
From Zero to Data Flow in Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera, Schlumberger
Agenda
• Why is composable data flow important to the drilling industry?
• Current state of the system
• The breaking point that led to the new system
• An unexpected workflow in testing
• How we are using it today
• What's next
Legal Notices This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN this presentation. This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals. Copyright © 2016, Schlumberger, All rights reserved.
Introduction
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
Chris Herrera, Schlumberger
The Major Components of a Drilling Project
• Wireline
• Measurement / Logging While Drilling
• Mud logging
• Fluids
• Completions
• Cementing
• Rig

• Several contractors are brought in to develop and complete the well
• Can comprise one or, most of the time, many companies
• Each brings its own system, often without a central repository of data
• Can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
Where Does This Data Need to Go?
• RT Server
• Operational Support
• Client Monitoring
• Processing and Print Centers
Workflow of Data During and Post Operations
[Diagram: data moves from Acquisition through a Data Server to the Processing Center. Stages include Classification & Labelling, Quality Control, Hosting, QC & Labelling, Conversion, Data Delivery, and KPI & Reporting. Roles involved: Field Engineer, Data Processor, Manager, Customer, and Sales (sales and job planning, client data delivery).]
What Does This Mean in a Data Sense?
Input formats:
• DLIS
• LAS 1.2 / 2.0 / 3.0
• WITS Level 0 / Level 1 / Level 2
• CSV
• Profibus / Modbus
Output formats:
• CSV
• PDS
• LAS 1.2 / 2.0 / 3.0
• DLIS
• RT Server
What Does This Mean in a Volume Sense?
• ~9,000 users / month
• ~10 files / minute
• ~480 data queries / sec
• ~3050 wells / month
A Quick(ish) Note on the Importance of Data Provenance
[Diagram: context and fidelity over time, from acquisition in the field to interpretation in the office.]
• Need to retain the fidelity throughout the flow.
Typical Data Problems and Concerns
• What is the time zone of the data we are receiving? One day it's UTC...
• "Ah, I see you did not implement that part of the standard..."
• Wait, why are you sending data at 5 times the sampling rate of the sensor?
• I did not get the memo that you were changing your data model today...
• Governmental / client data residency concerns
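The first and third concerns above are classic cleansing steps. A minimal sketch of how they might be handled (field names and the assumed rig time-zone offset are hypothetical, not from the talk):

```python
from datetime import datetime, timezone, timedelta

def normalize_to_utc(ts: datetime, assumed_tz: timezone) -> datetime:
    """Attach the rig's known offset to naive timestamps, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assumed_tz)
    return ts.astimezone(timezone.utc)

def drop_oversampled(samples, min_interval: timedelta):
    """Keep only samples spaced at least min_interval apart,
    discarding points sent faster than the sensor's true rate."""
    kept = []
    for t, value in sorted(samples):
        if not kept or t - kept[-1][0] >= min_interval:
            kept.append((t, value))
    return kept
```

For example, a naive timestamp from a rig assumed to run at UTC-6 becomes an unambiguous UTC instant, and a stream resent at 5x the sensor rate collapses back to one point per true sampling interval.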
Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data

Input formats: DLIS, LAS 1.2 / 2.0 / 3.0, WITS Level 0 / 1 / 2, CSV, Profibus / Modbus
Output formats: CSV, PDS, LAS 1.2 / 2.0 / 3.0, DLIS, RT Server
We Needed a Simpler, Maintainable Solution…
The Original Plan…
[Diagram: an ETP endpoint and parsers (DLIS, LAS) connected through RabbitMQ to a data writer persisting to a database, plus an event publisher, built on Node JS.]
What about:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (estimated 6 man-months)
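The plan above is a classic queue-decoupled pipeline: parsers consume raw messages from a broker and hand parsed records to a writer. A minimal sketch of that shape, with Python's stdlib `queue` standing in for RabbitMQ and a list standing in for the database (all names here are illustrative, not from the actual system):

```python
import queue
import threading

def parser_worker(inbox: queue.Queue, outbox: queue.Queue, parse):
    """Pull raw messages, parse them, and hand results to the data writer."""
    while True:
        raw = inbox.get()
        if raw is None:          # sentinel: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(parse(raw))

def writer_worker(outbox: queue.Queue, store: list):
    """Persist parsed records (a list stands in for the database)."""
    while True:
        record = outbox.get()
        if record is None:
            break
        store.append(record)

def run_pipeline(raw_messages, parse):
    """Wire parser and writer together and drain all messages through them."""
    inbox, outbox, store = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=parser_worker, args=(inbox, outbox, parse)),
        threading.Thread(target=writer_worker, args=(outbox, store)),
    ]
    for t in threads:
        t.start()
    for raw in raw_messages:
        inbox.put(raw)
    inbox.put(None)
    for t in threads:
        t.join()
    return store
```

Note what this shape lacks, which is exactly the slide's point: no cleansing, no routing, and no visibility into a message once it disappears into a queue.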
How Does NiFi Fit into the Equation?
• Knowing where data came from is crucial to real-time decision making (and often missing)
• The ability to visualize the data flow at a granular level aids troubleshooting and operational understanding
• With several processors already available, there is a low barrier to entry for data flow creation
Enter NiFi…
• Processor creation (ETP, WITSML 1.3.1.1 / 1.4.1.1, LAS 1.2 / 2.0): 10 man-hours
• Data flow creation: 1 man-day
• Play…
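To give a feel for why a LAS processor is a matter of hours rather than weeks: LAS files are line-oriented, with `~`-prefixed section headers and a whitespace-delimited data block. A minimal sketch of a LAS 2.0 reader (this is an illustration of the format, not the processor built for the talk):

```python
def parse_las(text: str) -> dict:
    """Split a LAS 2.0 file into header sections and the ~ASCII data rows.

    Sections are keyed by the first letter of their header line
    (V=version, W=well, C=curve, A=ascii data). Header sections keep
    just the curve/parameter mnemonics; the ~A section keeps float rows.
    """
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # blank lines and comments
            continue
        if line.startswith("~"):
            current = line[1].upper()
            sections.setdefault(current, [])
        elif current == "A":
            sections["A"].append([float(v) for v in line.split()])
        elif current:
            mnemonic = line.split(".", 1)[0].strip()
            sections[current].append(mnemonic)
    return sections
```

Wrapping logic like this in a NiFi processor then mostly consists of reading the incoming flow file's content and writing the parsed result back out.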
Prototype Setup (2 man-days)
[Diagram: data source → processor input → data cleansing → data enrichment → data storage (put data to repo).]
• Process group: GetUpdate
• Process group: Fix Time Zone / Remove Absent Indexes (data cleansing and routing)
• Enrichment: append well name, client name, run name, and pass name
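The enrichment step is essentially an attribute stamp, similar in spirit to NiFi's UpdateAttribute processor: every record leaving the flow carries its well, client, run, and pass context. A minimal sketch (the dictionary field names are hypothetical):

```python
def enrich(record: dict, context: dict) -> dict:
    """Stamp contextual attributes onto a record before storage.

    Mirrors the prototype's enrichment step: well, client, run, and
    pass names are appended so downstream consumers need no lookups.
    Existing values on the record are not overwritten.
    """
    enriched = dict(record)
    for key in ("well_name", "client_name", "run_name", "pass_name"):
        enriched.setdefault(key, context.get(key, "unknown"))
    return enriched
```

Doing this once, early in the flow, is what lets later stages (routing, QC, delivery) stay stateless.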
What About Testing?
Testing Landscape Today
2.2 TB of test data:
• 22 applications
• 14 different data formats
• Data of questionable quality
• Stored on a file share
Effort:
• 0.5 man-effort / sprint on maintenance
• 2 weeks to perform a full test
Step 1: Data Set Curation – Creating the Reference Set
[Diagram: 2.2 TB of test data (LAS 1.2 / 2.0 / 3.0, WITS Level 0 / 1 / 2, CSV) curated down to a clean test data set in 6 hours.]
Step 2: Immediate Test Harness (Docker)
From 2 weeks to set up a test, down to:
• Step 1: Get the clean test data set
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add a put processor
• Step 4: Start the dataflow
Step 3: Immediate Live Data Testing (Docker)
[Diagram: production RT system → processor input → anonymize data → testing processor group.]
• Significantly cuts down the time to test an application against real data, especially in brownfield applications
• Brings a level of confidence to the project that would otherwise be missing
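The anonymization stage is the part that makes tapping production data acceptable. One common approach, shown here as a sketch (the field names and salting scheme are assumptions, not the talk's implementation), is to replace identifying fields with salted, truncated digests. Deterministic hashing means records from the same well still group together after anonymization:

```python
import hashlib

# Hypothetical identifying fields to scrub before data reaches the test flow.
SENSITIVE_FIELDS = ("well_name", "client_name")

def anonymize(record: dict, salt: str) -> dict:
    """Replace identifying fields with salted, truncated SHA-256 digests.

    The same input always maps to the same token, preserving joins and
    grouping in the test environment without exposing real names.
    """
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in out:
            digest = hashlib.sha256((salt + str(out[field])).encode()).hexdigest()
            out[field] = digest[:12]
    return out
```

Keeping the salt secret and out of the test environment prevents trivial dictionary reversal of short well names.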
Next Steps
Use Cases to Be Explored for MiNiFi – Rig Data Ingestion with Provenance
[Diagram: rig data ingestion into the RT Server.]
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
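One way to picture the chain-of-custody idea is as a hash-linked log of lineage events, where each event records what happened to the data and is bound to its predecessor; NiFi's provenance repository serves this role in practice, and the sketch below (field names and structure are illustrative, not NiFi's format) shows why a linked chain makes tampering detectable:

```python
import hashlib
import json

def add_provenance_event(chain: list, event: dict) -> list:
    """Append a lineage event linked to its predecessor by hash."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    payload = json.dumps(event, sort_keys=True) + prev_hash
    entry = dict(event, prev=prev_hash,
                 hash=hashlib.sha256(payload.encode()).hexdigest())
    return chain + [entry]

def verify_chain(chain: list) -> bool:
    """Recompute every link; False means an event was altered or dropped."""
    prev_hash = "genesis"
    for entry in chain:
        event = {k: v for k, v in entry.items() if k not in ("prev", "hash")}
        payload = json.dumps(event, sort_keys=True) + prev_hash
        if entry["prev"] != prev_hash or \
           entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

With MiNiFi agents emitting events like these at the rig, the RT Server could verify end-to-end that what a user sees is exactly what the sensor produced.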
Thank You! Questions?