HPCC Systems - ECL for Programmers - Big Data - Data Scientist

By Fujio Turner (@FujioTurner). HPCC Systems - ECL Intro: Big Data Querying Made EZ. Enterprise Control Language explained for Programmers.

Upload: fujio-turner

Post on 10-Jul-2015


TRANSCRIPT

Page 1: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

By Fujio Turner

HPCC Systems - ECL Intro Big Data Querying Made EZ

Enterprise Control Language explained for Programmers

@FujioTurner

Page 2: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.

LexisNexis has been in business since 1977 with over 30,000 employees worldwide. 

What is HPCC Systems? Who is LexisNexis?

LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.

Page 3: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Comparison

JAVA C++

Petabytes

1-80,000 Jobs/day

Since 2005

Exabytes

Non-Indexed 4X-13X

Since 2000

Indexed: 2K-3K Jobs/sec

? ? ? ? ? ?

Thor Roxie

Block Based File Based

Page 4: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

What Is ECL?

ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are very simple and easy to learn.

Note - ECL is very similar to Hadoop's Pig, but more expressive and feature rich.

Page 5: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Comparing ECL to General Programming

In this presentation you will see how, in ECL, loading and querying data is just like reading and finding data in a plain text file.

General programming (general common logic) vs. ECL

General: General Code HERE

ECL: ECL Code HERE

Page 6: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Example Text File

Customer Data May 2010

Name   State  Age
Kevin  CA     45
Mark   MI     27
Sara   FL     64

~/cdata_2010.txt = example file name

~hpcc::cdata_2010 = ECL example file distributed in the HPCC cluster

Page 8: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Opening File: general programming vs ECL

General:

d = fopen('~/cdata_2010.txt')

ECL:

d := DATASET('~hpcc::cdata_2010', cs, THOR);

Callouts: Open File Function (fopen in the general code, DATASET in ECL); File Location (the quoted path)
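The general-programming side of this slide can be made concrete in Python. This is only a minimal sketch: it assumes the three sample rows from the Example Text File slide, and it writes them to a temporary file that stands in for ~/cdata_2010.txt. The names SAMPLE and read_text are hypothetical and do not come from the deck.

```python
import os
import tempfile

# Assumed sample content, copied from the Example Text File slide.
SAMPLE = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"


def read_text(path):
    """Open the file at `path` and return its whole content as one string."""
    # newline="" keeps the \r\n line endings exactly as they are stored.
    with open(path, newline="") as f:
        return f.read()


# Write the sample to a temporary file that stands in for ~/cdata_2010.txt.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, newline=""
) as tmp:
    tmp.write(SAMPLE)

d = read_text(tmp.name)  # plays the role of d = fopen(...) on the slide
os.remove(tmp.name)      # clean up the stand-in file
print(d)
```

At this point d is one undivided string, which is why the next slide splits it into rows.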

Page 10: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Organizing: general programming vs ECL

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")

Split Data (d) by Row:

Kevin CA 45
Mark MI 27
Sara FL 64

ECL:

cs := RECORD
  STRING20 Name;
  STRING2 State;
  INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);

Use This Schema on this file to Give Structure to Data
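To illustrate what "giving structure to data" means on the general-programming side, here is a hedged Python sketch. The Customer type is a hypothetical counterpart of the cs RECORD layout (name, state, age), and parse_rows is an illustrative helper, not something from the deck. The input string is assumed to match the sample file.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """Hypothetical Python counterpart of the cs RECORD layout."""
    name: str
    state: str
    age: int


def parse_rows(text):
    """Split raw text into rows, then give each row the Customer structure."""
    rows = []
    for line in text.splitlines():  # handles both \r\n and \n endings
        if not line.strip():
            continue                # skip blank lines
        name, state, age = line.split()
        rows.append(Customer(name, state, int(age)))
    return rows


d = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"  # assumed file content
new_d = parse_rows(d)
print(new_d[2])
```

In the general approach the parsing code carries the schema; in ECL the RECORD definition states it once and DATASET applies it to the whole file.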

Page 15: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Find "Sara": general programming vs ECL

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")
for (x = 0; x < 3; x++) {
  row = new_d[x]
  new_row = split(row, " ")
  if (new_row[0] == 'Sara') {
    print "Found Sara"
  }
}

0      1   2
Kevin  CA  45
Mark   MI  27
Sara   FL  64

ECL:

cs := RECORD
  STRING20 Name;
  STRING2 State;
  INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);
sara := d(Name = 'Sara');
OUTPUT(sara);

Callouts: Split Data by Column; Filter Data By; Output
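The loop-and-compare pattern above can be written as a small runnable Python sketch. It assumes the rows have already been split into columns (ROWS is hard-coded sample data, and find_by_name is a hypothetical helper). It returns the matching rows instead of printing a message, which is closer to what the ECL filter followed by OUTPUT produces.

```python
# Assumed sample rows, already split into columns (index 0 = name,
# 1 = state, 2 = age), matching the slide's example data.
ROWS = [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)]


def find_by_name(rows, name):
    """Return every row whose first column equals `name`."""
    return [row for row in rows if row[0] == name]


sara = find_by_name(ROWS, "Sara")
print(sara)
```

The list comprehension plays the same role as the ECL record filter: it describes which rows to keep rather than how to step through them.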

Page 16: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Find "Sara" & Older than 50: general programming vs ECL

General:

d = fopen('~/cdata_2010.txt')
new_d = split(d, "\r\n")
for (x = 0; x < 3; x++) {
  row = new_d[x]
  new_row = split(row, " ")
  if (new_row[0] == 'Sara' and new_row[2] > 50) {
    print "Found Sara"
  }
}

0      1   2
Kevin  CA  45
Mark   MI  27
Sara   FL  64

ECL:

cs := RECORD
  STRING20 Name;
  STRING2 State;
  INTEGER3 Age;
END;
d := DATASET('~hpcc::cdata_2010', cs, THOR);
sara := d(Name = 'Sara' AND Age > 50);
OUTPUT(sara);
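One detail the pseudocode glosses over is that a column read from a text file is a string, so the age has to be converted before it can be compared with 50. The following Python sketch makes that step explicit. It assumes the same sample content, and the names RAW and find_older are hypothetical.

```python
# Assumed raw file content, matching the slide's example data.
RAW = "Kevin CA 45\r\nMark MI 27\r\nSara FL 64"


def find_older(text, name, min_age):
    """Return the split rows where column 0 is `name` and column 2 > min_age."""
    matches = []
    for line in text.splitlines():
        fields = line.split(" ")
        # Column 2 is text in the file, so convert it before comparing.
        if fields[0] == name and int(fields[2]) > min_age:
            matches.append(fields)
    return matches


print(find_older(RAW, "Sara", 50))
```

In the ECL version the conversion is not needed in the filter, because the record layout already declares Age as an integer field.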

Page 17: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

ECL is EZ

• Make your own functions & libraries in ECL.
• Modularize your code with "Import": reuse old code

Machine Learning Built-in

http://hpccsystems.com/ml

Page 18: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

ECL Plugin for Eclipse IDE

http://hpccsystems.com/products-and-services/products/plugins/eclipse-ide

Page 19: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

ECL + Other Languages

ECL is C++ based, so your C/C++ code can be used in ECL.

Use other languages and methods like below to query too.

Page 20: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

ECL GUIDE

http://hpccsystems.com/download/docs/ecl-language-reference

JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ...

Page 21: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Query with Plain SQL

http://www.slideshare.net/FujioTurner/meet-up-sqldemopp

For More HPCC “How To’s” Go to

http://www.slideshare.net/hpccsystems/jdbc-hpcc

SQL TO ECL or

Page 22: HPCC Systems - ECL for Programmers - Big Data - Data Scientist

Watch how to install HPCC Systems in 5 Minutes

http://www.youtube.com/watch?v=8SV43DCUqJg

Download HPCC Systems Open Source Community Edition

http://hpccsystems.com/download/

or

Source Code

https://github.com/hpcc-systems