Webinar: MongoDB Connector for Spark
TRANSCRIPT
Tweet using #MongoDBWebinar | Follow @blimpyacht & @mongodb
MongoDB Connector for Spark (@blimpyacht)
[Diagram build: the distributed-data ecosystem]
- Distributed data: HDFS
- Distributed resources: YARN, Mesos, Spark Standalone
- Distributed processing: Hadoop, Spark
- Domain-specific languages and tools: Hive, Pig, Spark SQL, Spark Shell, Spark Streaming
[Diagram: Spark cluster topology. A Driver Application uses the Spark Connector to talk to the Master, which schedules executors on Worker Nodes.]
[Diagram build: input data is parallelized across partitions, and a transformation is applied to each partition.]
Transformations: filter( func ) | union( set ) | intersection( set ) | distinct( n ) | map( func )
[Diagram build: parallelize → transform → transform → action, applied per partition.]
Actions: collect() | count() | first() | take( n ) | reduce( function )
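The key point of the transformation/action split is laziness: transformations only record what to do, and nothing executes until an action asks for a result. A toy plain-Python stand-in (not the Spark API — `ToyRDD` is a hypothetical class for illustration) sketches that model:

```python
# A toy stand-in for Spark's RDD model (not the Spark API): transformations
# build up a lazy pipeline; nothing runs until an action is called.
class ToyRDD:
    def __init__(self, data):
        self._data = data          # the "parallelized" input
        self._ops = []             # recorded transformations (the lineage)

    def map(self, func):           # transformation: recorded, not executed
        rdd = ToyRDD(self._data)
        rdd._ops = self._ops + [("map", func)]
        return rdd

    def filter(self, func):        # transformation: recorded, not executed
        rdd = ToyRDD(self._data)
        rdd._ops = self._ops + [("filter", func)]
        return rdd

    def _evaluate(self):
        items = iter(self._data)
        for kind, func in self._ops:
            items = map(func, items) if kind == "map" else filter(func, items)
        return items

    def collect(self):             # action: triggers evaluation
        return list(self._evaluate())

    def count(self):               # action: triggers evaluation
        return sum(1 for _ in self._evaluate())

ratings = ToyRDD([3, 5, 1, 4, 2])
high = ratings.map(lambda r: r + 1).filter(lambda r: r >= 4)
print(high.collect())  # [4, 6, 5]
print(high.count())    # 3
```

Because `high` only stores the recipe, calling `collect()` and then `count()` replays the same pipeline twice — which is also why Spark can recompute a lost partition from its lineage.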
[Diagram build: parallelize → transform → action → result. The recorded chain of transformations is the RDD's Lineage, from which partitions can be recomputed.]
Using the Connector
https://github.com/mongodb/mongo-spark
http://spark.apache.org/docs/latest/
{
"_id" : ObjectId("578be1fe1fe699f2deb80807"),
"user_id" : 196,
"movie_id" : 242,
"rating" : 3,
"timestamp" : 881250949
}
./bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/movies.movie_ratings" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/movies.user_recommendations" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document

val rdd = sc.loadFromMongoDB()
for ( doc <- rdd.take( 10 ) ) println( doc )
ReadConfig / WriteConfig
Aggregation Filters: $match | $project | $group
val aggRdd = rdd.withPipeline( Seq( Document.parse( "{ $match: { Country: \"USA\" } }" ) ) )
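`withPipeline` pushes the aggregation stages down to MongoDB, so non-matching documents are filtered server-side and never cross the wire to Spark. A plain-Python sketch of what a `$match` stage on equality criteria selects (the documents and the `match` helper are hypothetical, not the connector's API):

```python
# Plain-Python sketch of the semantics of { $match: { Country: "USA" } }.
# In the real connector this filtering happens inside MongoDB; the data
# below is made up for illustration.
documents = [
    {"_id": 1, "Country": "USA", "rating": 3},
    {"_id": 2, "Country": "France", "rating": 5},
    {"_id": 3, "Country": "USA", "rating": 4},
]

def match(docs, criteria):
    """Keep documents whose fields equal every key/value in criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

usa_docs = match(documents, {"Country": "USA"})
print([d["_id"] for d in usa_docs])  # [1, 3]
```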
Spark SQL + DataFrames
RDD + Schema = DataFrame
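A DataFrame needs a schema, but MongoDB documents are schemaless, so the connector infers one by sampling documents. A plain-Python sketch of that idea — a hypothetical `infer_schema` over sample documents, not the connector's actual algorithm:

```python
# Sketch of schema inference by sampling (not the connector's real code):
# scan a sample of documents and record the first type seen for each field.
sample = [
    {"user_id": 196, "movie_id": 242, "rating": 3},
    {"user_id": 186, "movie_id": 302, "rating": 3, "timestamp": 891717742},
]

def infer_schema(docs):
    schema = {}
    for doc in docs:
        for field, value in doc.items():
            # Fields missing from some documents still end up in the schema.
            schema.setdefault(field, type(value).__name__)
    return schema

print(infer_schema(sample))
# {'user_id': 'int', 'movie_id': 'int', 'rating': 'int', 'timestamp': 'int'}
```

Note how `timestamp` appears in only one sampled document yet still lands in the inferred schema; in a real DataFrame such a field would simply be null for documents that lack it.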
[Diagram: the DataFrame schema is inferred by sampling documents with the $sample aggregation stage.]
Data Locality (mongos)
Courses and Resources
https://university.mongodb.com/courses/M233/about
https://databricks.com/blog
MongoDB Connector Highlights
DataSource
Use Cases 1: Real-Time
Use Cases 2: Batch
THANKS! @blimpyacht