Big Data Scala by the Bay: Interactive Spark in your Browser
GOAL OF HUE
WEB INTERFACE FOR ANALYZING DATA WITH APACHE HADOOP
SIMPLIFY AND INTEGRATE
FREE AND OPEN SOURCE
—> WEB “EXCEL” FOR HADOOP
VIEW FROM 30K FEET
Hadoop Web Server
You, your colleagues and even that friend that uses IE9 ;)
WHY SPARK?
SIMPLER (PYTHON, STREAMING, INTERACTIVE…)
OPENS UP DATA TO SCIENCE
SPARK —> MR
Apache Spark
Spark Streaming
MLlib (machine learning)
GraphX (graph)
Spark SQL
WHY IN HUE?
MARRIED WITH FULL HADOOP ECOSYSTEM (Hive Tables, HDFS, Job Browser…)
WHY IN HUE?
Multi user, YARN, Impersonation/Security
Not yet-another-app-to-install
...
HISTORY
V1: OOZIE
THE GOOD
• It works
THE BAD
• Submit through Oozie
• Slow
HISTORY
V2: SPARK IGNITER
THE GOOD
• It works better
THE BAD
• Compiler Jar
• Batch
HISTORY
V3: NOTEBOOK
THE GOOD
• It works even better
• Scala / Python / R shells
• Jar / Py batches
• Notebook UI
• YARN
THE BAD
• Still new
GENERAL ARCHITECTURE
[Diagram: a web part and a backend part; labels: Livy, Spark (three instances), YARN]
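WEB ARCHITECTURE
[Diagram: a notebook with snippets (Spark, Scala, Pig, Hive) calls the server over AJAX through a common API; the server reaches each backend through its specific API]
Common API (AJAX): create_session(), execute(), …
Specific APIs:
• Scala: Livy over REST: /session, /sessions/{sessionId}/statements
• Hive: HS2 over Thrift: OpenSession(), ExecuteStatement()
• …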
LIVY SPARK SERVER
• REST Web server in Scala
• Interactive Spark Sessions and Batch Jobs
• Type Introspection for Visualization
• Running sessions in YARN or local mode
• Backends: Scala, Python, R
• Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java
LIVY WEB SERVER ARCHITECTURE
[Diagram: Livy Server (Scalatra, Session Manager, Session); YARN Master (Spark Client); one YARN Node running the Spark Interpreter and SparkContext; two YARN Nodes each running a Spark Worker]
[Slides 18 to 24 repeat the diagram above as an animation, adding step markers 1 through 7 one at a time.]
SESSION CREATION AND EXECUTION
% curl -XPOST localhost:8998/sessions \
    -d '{"kind": "spark"}'
{
  "id": 0,
  "kind": "spark",
  "log": [...],
  "state": "idle"
}

% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{
  "id": 0,
  "output": {
    "data": { "text/plain": "res0: Int = 2" },
    "execution_count": 0,
    "status": "ok"
  },
  "state": "available"
}
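The two curl calls above could also be built from Scala. The following is a minimal sketch, assuming JDK 11 or later for `java.net.http`. The names `LivyRequestSketch`, `createSession` and `runStatement` are made up for illustration, and the `Content-Type` header is an assumption that the curl calls on the slide do not set. The sketch only builds the requests; nothing is sent.

```scala
import java.net.URI
import java.net.http.HttpRequest

object LivyRequestSketch {
  // Host and port taken from the curl calls on the slide; adjust as needed.
  val base = "http://localhost:8998"

  // Builds (but does not send) a JSON POST request.
  // The Content-Type header is an assumption of this sketch.
  private def postJson(path: String, json: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(base + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()

  // POST /sessions with the session kind ("spark" on the slide).
  def createSession(kind: String): HttpRequest =
    postJson("/sessions", s"""{"kind": "$kind"}""")

  // POST /sessions/{id}/statements with the code to run.
  // The code is not JSON-escaped: fine for "1+1", not for arbitrary input.
  def runStatement(sessionId: Int, code: String): HttpRequest =
    postJson(s"/sessions/$sessionId/statements", s"""{"code": "$code"}""")
}
```

To send one of these against a running Livy server, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.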
LIVY INTERPRETERS
Scala, Python, R…
INTERPRETERS
• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• “Swappable” with other kernels (python, spark…)
[Diagram: println(1 + 1) is sent to the Interpreter, which runs it in the shell (> println(1 + 1)) and returns the output 2]
INTERPRETER FLOW
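[Diagram: a client (curl or Hue) sends 1+1 to the Livy Server, which passes it through the Livy Session to the Interpreter; the result 2 travels back the same way and reaches the client as { "data": { "application/json": "2" } }]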
INTERPRETER FLOW CHART
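[Flow chart, as far as the extracted labels allow: Receive lines, then Split lines. While lines are left, check whether the next line is a magic line. If yes, run the magic; if no, execute the line. On Success, continue with the remaining lines; on Incomplete, merge with the next line and execute again; on Error, or when no lines are left, send the output to the server.]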
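The flow chart above can be sketched as a small loop. This is a minimal sketch, not Livy's actual code: `InterpreterFlowSketch`, `Outcome`, `runLines` and the toy `interpret` function (which treats a line with unclosed parentheses as incomplete) are hypothetical names made up for illustration, and stopping at the first error is one reading of the chart.

```scala
object InterpreterFlowSketch {
  // Hypothetical result type, loosely mirroring Success / Incomplete / Error.
  sealed trait Outcome
  final case class Success(output: String) extends Outcome
  case object Incomplete extends Outcome
  final case class Failure(message: String) extends Outcome

  // Toy stand-in for a real interpreter: code is "incomplete" while it has
  // more opening than closing parentheses; otherwise it is echoed back.
  def interpret(code: String): Outcome = {
    val depth = code.count(_ == '(') - code.count(_ == ')')
    if (depth > 0) Incomplete
    else if (depth < 0) Failure(s"unbalanced: $code")
    else Success(s"ran: $code")
  }

  // Split the code into lines and execute them one at a time. A magic line
  // (starting with %) goes to the magic handler; an incomplete line is merged
  // with the next one and retried; an error stops the loop.
  def runLines(code: String, magic: String => Outcome): List[Outcome] = {
    @annotation.tailrec
    def loop(lines: List[String], acc: List[Outcome]): List[Outcome] = lines match {
      case Nil => acc.reverse
      case line :: rest =>
        val outcome = if (line.startsWith("%")) magic(line) else interpret(line)
        outcome match {
          case Incomplete => rest match {
            case next :: tail => loop((line + "\n" + next) :: tail, acc)
            case Nil => (Incomplete :: acc).reverse
          }
          case f: Failure => (f :: acc).reverse
          case s => loop(rest, s :: acc)
        }
    }
    loop(code.split("\n").toList, Nil)
  }
}
```

Returning one outcome per executed statement keeps the sketch easy to inspect; a real server would more likely fold these into a single response.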
LIVY INTERPRETERS
trait Interpreter {
  def state: State
  def execute(code: String): Future[JValue]
  def close(): Unit
}

sealed trait State
case class NotStarted() extends State
case class Starting() extends State
case class Idle() extends State
case class Running() extends State
case class Busy() extends State
case class Error() extends State
case class ShuttingDown() extends State
case class Dead() extends State
SPARK INTERPRETER
class SparkInterpreter extends Interpreter {
  …
  private var _state: State = NotStarted()
  // Everything the embedded REPL prints is collected here.
  private val outputStream = new ByteArrayOutputStream()
  private var sparkIMain: SparkIMain = _

  def start() = {
    ...
    _state = Starting()
    // Embed the Spark REPL and point its output at outputStream.
    sparkIMain = new SparkIMain(new Settings(), new JPrintWriter(outputStream, true))
    sparkIMain.initializeSynchronous()
    ...
SPARK INTERPRETER
private var sparkContext: SparkContext = _

def start() = {
  ...
  val sparkConf = new SparkConf(true)
  sparkContext = new SparkContext(sparkConf)

  // Expose the context to user code as `sc`, without echoing the binding.
  sparkIMain.beQuietDuring {
    sparkIMain.bind("sc", "org.apache.spark.SparkContext",
      sparkContext, List("""@transient"""))
  }

  _state = Idle()
}
EXECUTING SPARK
private def executeLine(code: String): ExecuteResult = {
  code match {
    // Lines such as "%json counts" are handled as magic commands.
    case MAGIC_REGEX(magic, rest) =>
      executeMagic(magic, rest)
    // Everything else goes to the embedded REPL, with stdout captured.
    case _ =>
      scala.Console.withOut(outputStream) {
        sparkIMain.interpret(code) match {
          case Results.Success => ExecuteComplete(readStdout())
          case Results.Incomplete => ExecuteIncomplete(readStdout())
          case Results.Error => ExecuteError(readStdout())
        }
        ...
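The `scala.Console.withOut(outputStream)` call above is what lets the server read back whatever the user code printed. A minimal, self-contained sketch of that capture step follows; `CaptureStdoutSketch` and `capture` are hypothetical names for illustration, and no Spark is involved.

```scala
import java.io.ByteArrayOutputStream

object CaptureStdoutSketch {
  // Runs `body` with Scala's Console output redirected into a buffer and
  // returns what was printed together with the body's result.
  def capture[A](body: => A): (String, A) = {
    val buffer = new ByteArrayOutputStream()
    val result = Console.withOut(buffer)(body)
    (buffer.toString("UTF-8"), result)
  }
}
```

Note that `Console.withOut` only redirects output written through Scala's `Console` (`print`, `println`); code that writes directly to `System.out` is not captured this way.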
INTERPRETER MAGIC
private val MAGIC_REGEX = "^%(\\w+)\\W*(.*)".r

private def executeMagic(magic: String, rest: String): ExecuteResponse = {
  magic match {
    case "json" => executeJsonMagic(rest)
    case "table" => executeTableMagic(rest)
    case _ => ExecuteError(f"Unknown magic command $magic")
  }
}
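To see what the magic pattern above extracts, here is a minimal sketch that applies the same regular expression on its own. `MagicRegexSketch` and `parse` are hypothetical names for illustration.

```scala
object MagicRegexSketch {
  // Same pattern as on the slide: '%', a word (the magic name), optional
  // non-word separators, then the rest of the line.
  private val MagicRegex = "^%(\\w+)\\W*(.*)".r

  // Returns Some((magic, rest)) for a magic line, None for ordinary code.
  def parse(line: String): Option[(String, String)] = line match {
    case MagicRegex(magic, rest) => Some((magic, rest))
    case _ => None
  }
}
```

For `%table counts` this yields the magic name `table` and the argument `counts`, which is the variable name the magic handler then looks up.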
INTERPRETER MAGIC
private def executeJsonMagic(name: String): ExecuteResponse = {
  sparkIMain.valueOfTerm(name) match {
    // For an RDD, only the first 10 elements are returned.
    case Some(value: RDD[_]) =>
      ExecuteMagic(Extraction.decompose(Map(
        "application/json" -> value.asInstanceOf[RDD[_]].take(10))))

    // Any other value is serialized as-is.
    case Some(value) =>
      ExecuteMagic(Extraction.decompose(Map(
        "application/json" -> value)))

    case None =>
      ExecuteError(f"Value $name does not exist")
  }
}
TABLE MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%table counts

"application/vnd.livy.table.v1+json": {
  "headers": [
    { "name": "count", "type": "BIGINT_TYPE" },
    { "name": "name", "type": "STRING_TYPE" }
  ],
  "data": [
    [ 23407, "the" ],
    [ 19540, "I" ],
    [ 18358, "and" ],
    ...
  ]
}
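The table output above pairs typed headers with rows of values. The slides do not show how `executeTableMagic` builds it, so the following is only a sketch of one possible approach, under stated assumptions: rows arrive as plain Scala maps, column names come from the keys of the first row, and types are guessed from the first row's values. `TableMagicSketch`, `Header`, `Table`, `typeName` and `toTable` are hypothetical names; only `BIGINT_TYPE` and `STRING_TYPE` appear on the slide, the other type names are placeholders.

```scala
object TableMagicSketch {
  final case class Header(name: String, columnType: String)
  final case class Table(headers: List[Header], data: List[List[Any]])

  // Maps a runtime value to a column type name. BIGINT_TYPE and STRING_TYPE
  // are from the slide; DOUBLE_TYPE and UNKNOWN_TYPE are placeholders.
  def typeName(value: Any): String = value match {
    case _: Int | _: Long => "BIGINT_TYPE"
    case _: String => "STRING_TYPE"
    case _: Float | _: Double => "DOUBLE_TYPE"
    case _ => "UNKNOWN_TYPE"
  }

  // Takes column names and types from the first row (sorted by name for a
  // stable order), then lays every row out in that column order.
  def toTable(rows: List[Map[String, Any]]): Table = rows match {
    case Nil => Table(Nil, Nil)
    case first :: _ =>
      val names = first.keys.toList.sorted
      val headers = names.map(n => Header(n, typeName(first(n))))
      Table(headers, rows.map(row => names.map(n => row.getOrElse(n, null))))
  }
}
```

Inspecting only the first row keeps the sketch short; it would mislabel a column whose later rows hold a different type.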
JSON MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%json counts

{
  "id": 0,
  "output": {
    "application/json": [
      { "count": 506610, "word": "" },
      { "count": 23407, "word": "the" },
      { "count": 19540, "word": "I" },
      ...
    ]
  ...
}
COMING SOON
• Stability and Scaling
• Security
• iPython/Jupyter backends and file format
DEMO TIME
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
THANKS!