[NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017
TRANSCRIPT
![Page 1: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/1.jpg)
This slide blank on purpose
![Page 2: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/2.jpg)
Riding the Distributed Streams
#nyjavasig #hazelcastjet #java8
![Page 3: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/3.jpg)
> whoami
• Solutions Architect @Hazelcast
• Hang out with awesome people
• @gamussa in internetz
Please follow me on Twitter. I’m very interesting ©
![Page 4: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/4.jpg)
Agenda
• Refreshing knowledge on Java 8 Streams
• Distribute and Conquer
• Distributed Data
• Distributed Streams
• How we did all this
![Page 5: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/5.jpg)
Java 8 Streams
![Page 6: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/6.jpg)
Java 8 Streams…
• An abstraction that represents a sequence of elements
• Is not a data structure
• Conveys elements from a source through a pipeline of operations
• Operations don’t modify the source
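A minimal sketch of the points above, using only `java.util.stream`. The class name `StreamBasics`, the method `shoutLongNames`, and the sample names are made up for illustration. It shows a source, a pipeline of operations, and the source left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamBasics {

    // Builds a pipeline over the source list and returns a NEW list.
    // The source list itself is never changed by the stream.
    static List<String> shoutLongNames(List<String> names) {
        return names.stream()                    // source
                .filter(n -> n.length() > 4)     // keep names longer than 4 characters
                .map(String::toUpperCase)        // transform each remaining element
                .collect(Collectors.toList());   // gather the results into a new list
    }

    public static void main(String[] args) {
        List<String> names =
                new ArrayList<>(Arrays.asList("Pierre", "Andrew", "Anna", "Natasha"));
        System.out.println(shoutLongNames(names)); // the transformed copy
        System.out.println(names);                 // the original list, unchanged
    }
}
```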
![Page 7: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/7.jpg)
Why should I care about the Stream API?
• You’re a Java developer
![Page 8: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/8.jpg)
What does a regular Java developer think about Scala?
advanced
![Page 9: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/9.jpg)
Why should I care about the Stream API?
• You’re a Java developer
• Many Java developers know Java
• It’s all about data processing
![Page 10: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/10.jpg)
java.util.stream operations
• map(), flatMap(), filter(): intermediate operations
• reduce(), collect(): terminal operations
• sorted(): blocking operation
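The three kinds of operation can be observed with a small trace. This is a sketch with hypothetical names (`OperationKinds`, `trace`, `sideEffectsWithoutTerminal`). It assumes a sequential stream, where elements normally flow one at a time through the pipeline and `sorted()` has to see all of its input before it emits anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OperationKinds {

    // Intermediate operations are lazy: without a terminal operation
    // the mapping function below is never invoked.
    static int sideEffectsWithoutTerminal(List<Integer> input) {
        List<String> log = new ArrayList<>();
        input.stream().map(i -> { log.add("map:" + i); return i * 2; }); // no terminal op
        return log.size();
    }

    // Records the order in which two pipeline stages see each element,
    // optionally with a sorted() stage between them.
    static List<String> trace(List<Integer> input, boolean withSort) {
        List<String> log = new ArrayList<>();
        Stream<Integer> s = input.stream()
                .peek(i -> log.add("before:" + i)); // intermediate, lazy
        if (withSort) {
            s = s.sorted();                          // stateful: buffers all input first
        }
        s.peek(i -> log.add("after:" + i))
         .collect(Collectors.toList());              // terminal: starts the traversal
        return log;
    }
}
```

Without `sorted()` the log interleaves `before` and `after` entries per element. With `sorted()` every `before` entry appears ahead of the first `after` entry, which is why the slide singles it out as blocking.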
![Page 11: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/11.jpg)
![Page 12: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/12.jpg)
![Page 13: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/13.jpg)
![Page 14: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/14.jpg)
Problem
• One does not simply put all Big Data in one machine
![Page 15: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/15.jpg)
Problem
• Data doesn’t fit on just one machine
ONE DOES NOT SIMPLY
FIT BIG DATA IN ONE MACHINE
![Page 16: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/16.jpg)
Problem
• One does not simply put all Big Data in one machine
• Data is too important to keep on only one machine
![Page 17: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/17.jpg)
EXCUSE ME,
COULD YOU SPARE A MOMENT TO TALK ABOUT
DATA DISTRIBUTION
![Page 18: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/18.jpg)
CACHES
REPLICATION
SHARDING
![Page 19: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/19.jpg)
Replication or Sharding?
http://book.mixu.net/distsys/single-page.html
![Page 20: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/20.jpg)
Solution
• Use Distributed Map aka IMap
![Page 21: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/21.jpg)
What’s Hazelcast IMDG?
• In-memory Data Grid
• Apache v2 Licensed
• Distributed
  • Caches (IMap, JCache)
  • Java Collections (IList, ISet, IQueue)
  • Messaging (Topic, RingBuffer)
  • Computation (ExecutorService, M-R)
![Page 22: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/22.jpg)
Scale-Out Computing
![Page 23: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/23.jpg)
Scale-Up Computing
![Page 24: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/24.jpg)
I DON’T ALWAYS BACKUP THE DATA
BUT WHEN I DO I BACKUP IT IN-MEMORY
![Page 25: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/25.jpg)
Green Primary
Green Backup
Green Shard Data
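A rough sketch of the idea behind primary and backup shards, in plain Java. It is not Hazelcast's actual partitioning algorithm. The class `ShardingSketch`, the partition count, the member names, and the "next member holds the backup" rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ShardingSketch {

    // Assumption: an arbitrary, fixed number of partitions (shards).
    static final int PARTITION_COUNT = 271;

    // Every key maps deterministically to one partition.
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // The primary copy of a partition lives on one member...
    static String primaryOf(int partition, List<String> members) {
        return members.get(partition % members.size());
    }

    // ...and the backup copy on a different member (needs at least 2 members).
    static String backupOf(int partition, List<String> members) {
        return members.get((partition + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("node-1", "node-2", "node-3"); // hypothetical
        int p = partitionOf("Pierre");
        System.out.println("partition=" + p
                + " primary=" + primaryOf(p, members)
                + " backup=" + backupOf(p, members));
    }
}
```

Losing one member then loses at most one copy of any partition, which is the point of keeping the backup in memory on another machine.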
![Page 26: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/26.jpg)
Can I haz some code?
![Page 27: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/27.jpg)
Problem
• Lambda serialization
![Page 28: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/28.jpg)
![Page 29: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/29.jpg)
Solution
• Serializable versions of the functional interfaces
• Introducing DistributedStream
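Why this matters can be shown with the JDK alone. A lambda assigned to a plain `java.util.function.Function` cannot be written to an `ObjectOutputStream`, so it cannot be shipped to another node, while the same lambda assigned to an interface that also extends `Serializable` can. The interface `SerializableFunction` below is a hypothetical stand-in, not the type Jet ships.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

class LambdaSerialization {

    // Hypothetical interface: a Function that is also Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable { }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // A lambda targeting plain Function fails Java serialization.
    static boolean plainLambdaSerializes() {
        Function<String, Integer> plain = s -> s.length();
        try {
            toBytes(plain);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The same lambda targeting the serializable interface survives a round trip.
    @SuppressWarnings("unchecked")
    static int lengthViaRoundTrip(String text) {
        SerializableFunction<String, Integer> f = s -> s.length();
        try {
            Function<String, Integer> copy = (Function<String, Integer>) fromBytes(toBytes(f));
            return copy.apply(text);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip here stays inside one JVM. On a real cluster the receiving node also needs the class that defines the lambda on its classpath.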
![Page 30: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/30.jpg)
![Page 31: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/31.jpg)
Can I haz some code?
![Page 32: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/32.jpg)
Jet Streams
![Page 33: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/33.jpg)
![Page 34: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/34.jpg)
What’s Hazelcast Jet?
• General purpose distributed data processing framework
• Based on a Directed Acyclic Graph (DAG) to model data flow
• Built on top of Hazelcast IMDG
• Comparable to Apache Spark or Apache Flink
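To make the DAG idea concrete, here is a toy model in plain Java. It is not Jet's API: the class `DagSketch`, its methods, and the vertex names are invented for this sketch. Vertices stand for processing steps, edges for data flowing between them, and a topological sort (Kahn's algorithm) confirms that the graph has no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DagSketch {

    // vertex name -> names of its downstream vertices, in insertion order
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices with no unprocessed inputs.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) inDegree.put(v, 0);
        for (List<String> targets : edges.values())
            for (String t : targets) inDegree.merge(t, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> { if (d == 0) ready.add(v); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v))
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (order.size() != edges.size())
            throw new IllegalStateException("graph has a cycle");
        return order;
    }

    // A word-count job modelled as four steps connected in a line.
    static DagSketch wordCount() {
        return new DagSketch()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
    }
}
```

A real engine would run the vertices concurrently and stream items along the edges. The ordering here only illustrates the dependency structure.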
![Page 35: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/35.jpg)
![Page 36: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/36.jpg)
DAG
![Page 37: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/37.jpg)
Job Execution
![Page 38: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/38.jpg)
![Page 39: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/39.jpg)
Future (It’s bright!)
• Memory module for processing big data
• Higher level streaming and batching APIs
• Reactive Streams
• Distributed Classloading
• Integrations (HDFS/Yarn/Mesos)
![Page 40: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/40.jpg)
Your fuel, our Jet Engine
• Public release – Feb 7th.
• Developer Preview today - yay!
• http://hazelcast.org/jet-signup
• Send me a note [email protected]
• Follow @hazelcast and @gamussa (duh!!)
• Your questions #hazelcast #hazelcastjet
![Page 41: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/41.jpg)
Conclusion
• Java Stream API provides a very wide range of data processing tools
• War and Peace is a big book (a lot of data)!
• Now we’re pretty sure that Andrew and Pierre are the main characters
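The kind of word-frequency analysis behind that last bullet can be sketched with plain `java.util.stream`. This is not the demo code from the talk. The class `WordCountSketch`, the simple ASCII-oriented tokenizer, and the sample lines in the usage note are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCountSketch {

    // Splits each line into lower-case words and counts the occurrences.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())   // split can yield a leading empty token
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent words; ties are broken alphabetically.
    static List<String> topWords(List<String> lines, int n) {
        return wordCounts(lines).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Called with the made-up lines "Pierre met Andrew", "Andrew and Pierre spoke" and "Pierre left", `topWords(lines, 2)` returns `pierre` followed by `andrew`.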
![Page 42: [NYJavaSig] Riding the Distributed Streams - Feb 2nd, 2017](https://reader035.vdocuments.us/reader035/viewer/2022070523/58ed3dd71a28ab8f478b45ff/html5/thumbnails/42.jpg)
#nyjavasig #hazelcastjet #java8
@gamussa
http://bit.ly/jet-streams-code
http://hazelcast.org/jet-signup