From Zero to Data Flow in Hours with Apache NiFi
TRANSCRIPT
Copyright © 2016, Schlumberger, All rights reserved.
From Zero to Data Flow in Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera, Schlumberger
Agenda
• Why is composable data flow important to the drilling industry?
• Current state of the system
• The breaking point that led to the new system
• An unexpected workflow in testing
• How we are using it today
• What's next
Legal Notices This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN this presentation. This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals. Copyright © 2016, Schlumberger, All rights reserved.
Introduction
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
Chris Herrera, Schlumberger
The Major Components of a Drilling Project
• Wireline
• Measurement / Logging While Drilling
• Mud logging
• Fluids
• Completions
• Cementing
• Rig

• Several contractors are brought in to develop and complete the well
• Can comprise one or, most of the time, many companies
• Each brings its own system, often without a central repository of data
• Can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
Where Does This Data Need to Go?
• RT Server
• Operational Support
• Client Monitoring
• Processing and Print Centers
Workflow of Data During and Post Operations
[Diagram: data moves from Acquisition through a Data Server to the Processing Center. Stages include Classification & Labelling, Quality Control, Hosting, QC & Labelling, Conversion, Data Delivery, and KPI & Reporting. Roles involved: Field Engineer, Data Processor, Manager, Customer, and Sales (sales and job planning, client data delivery).]
What Does This Mean in a Data Sense?
Input formats:
• DLIS
• LAS 1.2 / 2.0 / 3.0
• WITS Level 0 / Level 1 / Level 2
• CSV
• Profibus / Modbus
Output formats:
• CSV
• PDS
• LAS 1.2 / 2.0 / 3.0
• DLIS
• RT Server
What Does This Mean in a Volume Sense?
• ~9,000 users / month
• ~10 files / minute
• ~480 data queries / sec
• ~3050 wells / month
A Quick(ish) Note on the Importance of Data Provenance
[Diagram: context and fidelity over time, from acquisition in the field to interpretation in the office.]
• Need to retain the fidelity throughout the flow.
Typical Data Problems and Concerns
• What is the time zone of the data we are receiving? One day it's UTC...
• "Ah, I see you did not implement that part of the standard..."
• Wait, why are you sending data at 5 times the sampling rate of the sensor?
• I did not get the memo that you were changing your data model today...
• Governmental / client data residency concerns
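The first and third concerns above are classic cleansing steps. A minimal sketch of how they might be handled (field names and the assumed rig time-zone offset are hypothetical, not from the talk):

```python
from datetime import datetime, timezone, timedelta

def normalize_to_utc(ts: datetime, assumed_tz: timezone) -> datetime:
    """Attach the rig's known offset to naive timestamps, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assumed_tz)
    return ts.astimezone(timezone.utc)

def drop_oversampled(samples, min_interval: timedelta):
    """Keep only samples spaced at least min_interval apart,
    discarding points sent faster than the sensor's true rate."""
    kept = []
    for t, value in sorted(samples):
        if not kept or t - kept[-1][0] >= min_interval:
            kept.append((t, value))
    return kept
```

For example, a naive timestamp from a rig assumed to run at UTC-6 becomes an unambiguous UTC instant, and a stream resent at 5x the sensor rate collapses back to one point per true sampling interval.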
Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data

Input formats: DLIS, LAS 1.2 / 2.0 / 3.0, WITS Level 0 / 1 / 2, CSV, Profibus / Modbus
Output formats: CSV, PDS, LAS 1.2 / 2.0 / 3.0, DLIS, RT Server
We Needed a Simpler, Maintainable Solution…
The Original Plan…
[Diagram: an ETP endpoint and parsers (DLIS, LAS) connected through RabbitMQ to a data writer persisting to a database, plus an event publisher, built on Node JS.]
What about:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (estimated 6 man-months)
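The plan above is a classic queue-decoupled pipeline: parsers consume raw messages from a broker and hand parsed records to a writer. A minimal sketch of that shape, with Python's stdlib `queue` standing in for RabbitMQ and a list standing in for the database (all names here are illustrative, not from the actual system):

```python
import queue
import threading

def parser_worker(inbox: queue.Queue, outbox: queue.Queue, parse):
    """Pull raw messages, parse them, and hand results to the data writer."""
    while True:
        raw = inbox.get()
        if raw is None:          # sentinel: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(parse(raw))

def writer_worker(outbox: queue.Queue, store: list):
    """Persist parsed records (a list stands in for the database)."""
    while True:
        record = outbox.get()
        if record is None:
            break
        store.append(record)

def run_pipeline(raw_messages, parse):
    """Wire parser and writer together and drain all messages through them."""
    inbox, outbox, store = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=parser_worker, args=(inbox, outbox, parse)),
        threading.Thread(target=writer_worker, args=(outbox, store)),
    ]
    for t in threads:
        t.start()
    for raw in raw_messages:
        inbox.put(raw)
    inbox.put(None)
    for t in threads:
        t.join()
    return store
```

Note what this shape lacks, which is exactly the slide's point: no cleansing, no routing, and no visibility into a message once it disappears into a queue.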
How Does NiFi Fit into the Equation?
• Knowing where data came from is crucial to real-time decision making (and often missing)
• The ability to visualize the data flow at a granular level aids troubleshooting and operational understanding
• With several processors already available, there is a low barrier to entry for data flow creation
Enter NiFi…
• Processor creation (ETP, WITSML 1.3.1.1 / 1.4.1.1, LAS 1.2 / 2.0): 10 man-hours
• Data flow creation: 1 man-day
• Play…
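To give a feel for why a LAS processor is a matter of hours rather than weeks: LAS files are line-oriented, with `~`-prefixed section headers and a whitespace-delimited data block. A minimal sketch of a LAS 2.0 reader (this is an illustration of the format, not the processor built for the talk):

```python
def parse_las(text: str) -> dict:
    """Split a LAS 2.0 file into header sections and the ~ASCII data rows.

    Sections are keyed by the first letter of their header line
    (V=version, W=well, C=curve, A=ascii data). Header sections keep
    just the curve/parameter mnemonics; the ~A section keeps float rows.
    """
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # blank lines and comments
            continue
        if line.startswith("~"):
            current = line[1].upper()
            sections.setdefault(current, [])
        elif current == "A":
            sections["A"].append([float(v) for v in line.split()])
        elif current:
            mnemonic = line.split(".", 1)[0].strip()
            sections[current].append(mnemonic)
    return sections
```

Wrapping logic like this in a NiFi processor then mostly consists of reading the incoming flow file's content and writing the parsed result back out.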
Prototype Setup (2 man-days)
[Diagram: data source → processor input → data cleansing → data enrichment → data storage (put data to repo).]
• Process group: GetUpdate
• Process group: Fix Time Zone / Remove Absent Indexes (data cleansing and routing)
• Enrichment: append well name, client name, run name, and pass name
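The enrichment step is essentially an attribute stamp, similar in spirit to NiFi's UpdateAttribute processor: every record leaving the flow carries its well, client, run, and pass context. A minimal sketch (the dictionary field names are hypothetical):

```python
def enrich(record: dict, context: dict) -> dict:
    """Stamp contextual attributes onto a record before storage.

    Mirrors the prototype's enrichment step: well, client, run, and
    pass names are appended so downstream consumers need no lookups.
    Existing values on the record are not overwritten.
    """
    enriched = dict(record)
    for key in ("well_name", "client_name", "run_name", "pass_name"):
        enriched.setdefault(key, context.get(key, "unknown"))
    return enriched
```

Doing this once, early in the flow, is what lets later stages (routing, QC, delivery) stay stateless.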
What About Testing?
Testing Landscape Today
2.2 TB of test data:
• 22 applications
• 14 different data formats
• Data of questionable quality
• Stored on a file share
Effort:
• 0.5 man-effort / sprint on maintenance
• 2 weeks to perform a full test
Step 1: Data Set Curation – Creating the Reference Set
[Diagram: 2.2 TB of test data (LAS 1.2 / 2.0 / 3.0, WITS Level 0 / 1 / 2, CSV) curated down to a clean test data set in 6 hours.]
Step 2: Immediate Test Harness (Docker)
From 2 weeks to set up a test, down to:
• Step 1: Get the clean test data set
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add a put processor
• Step 4: Start the dataflow
Step 3: Immediate Live Data Testing (Docker)
[Diagram: production RT system → processor input → anonymize data → testing processor group.]
• Significantly cuts down the time to test an application against real data, especially in brownfield applications
• Brings a level of confidence to the project that would otherwise be missing
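The anonymization stage is the part that makes tapping production data acceptable. One common approach, shown here as a sketch (the field names and salting scheme are assumptions, not the talk's implementation), is to replace identifying fields with salted, truncated digests. Deterministic hashing means records from the same well still group together after anonymization:

```python
import hashlib

# Hypothetical identifying fields to scrub before data reaches the test flow.
SENSITIVE_FIELDS = ("well_name", "client_name")

def anonymize(record: dict, salt: str) -> dict:
    """Replace identifying fields with salted, truncated SHA-256 digests.

    The same input always maps to the same token, preserving joins and
    grouping in the test environment without exposing real names.
    """
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in out:
            digest = hashlib.sha256((salt + str(out[field])).encode()).hexdigest()
            out[field] = digest[:12]
    return out
```

Keeping the salt secret and out of the test environment prevents trivial dictionary reversal of short well names.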
Next Steps
Use Cases to Be Explored for MiNiFi – Rig Data Ingestion with Provenance
[Diagram: rig data ingestion into the RT Server.]
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
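One way to picture the chain-of-custody idea is as a hash-linked log of lineage events, where each event records what happened to the data and is bound to its predecessor; NiFi's provenance repository serves this role in practice, and the sketch below (field names and structure are illustrative, not NiFi's format) shows why a linked chain makes tampering detectable:

```python
import hashlib
import json

def add_provenance_event(chain: list, event: dict) -> list:
    """Append a lineage event linked to its predecessor by hash."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    payload = json.dumps(event, sort_keys=True) + prev_hash
    entry = dict(event, prev=prev_hash,
                 hash=hashlib.sha256(payload.encode()).hexdigest())
    return chain + [entry]

def verify_chain(chain: list) -> bool:
    """Recompute every link; False means an event was altered or dropped."""
    prev_hash = "genesis"
    for entry in chain:
        event = {k: v for k, v in entry.items() if k not in ("prev", "hash")}
        payload = json.dumps(event, sort_keys=True) + prev_hash
        if entry["prev"] != prev_hash or \
           entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

With MiNiFi agents emitting events like these at the rig, the RT Server could verify end-to-end that what a user sees is exactly what the sensor produced.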
Thank You! Questions?