building an an ai startup with mongodb at x.ai
TRANSCRIPT
x.ai a personal assistant who schedules meetings for you
MONGO WORLD 2015 NEW YORK CITY VISIT X.AI TO JOIN THE WAITLIST
Using MongoDB at x.ai
varun vijayaraghavan@xdotai
Magically Schedule Meetings
Just Cc: [email protected]
Pain Solution Jane John Jane [email protected] John
CC: Amy @ x.ai“Amy, please set something up for Jane and I next week.”
Product Characteristics
● Need quick response
● Supervised Learning requires large training data set
● # meetings scale linearly with # users
● 1 user meets with N people
● People share meeting, places, times and company
Technical challenges
● Natural language understanding with extremely high accuracy
● Natural conversation over email with people
● Complex data relationship
● Optimize for sparse data
● Speed of development and change
Stack
Queue Based Architecture
Modeling Meetings
Meetings
People
Places Times
1:N and N:N relationships across various collections
Modeling Meetings{
host : Participant,guests : [Participant],time : { start : Date,
end: Date, recurring: String},
timezone : String,duration : Number,locations : [Location],timeInitiated : Date,timeRescheduled: [Date],timeCompleted: Date,status : String,landmarks: [Landmark],activities: [Activity]…...
}
Embedding vs. Referencing
{ host : { name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String], …... }, travelTime : String, status : String, timezone : String, duration : Number, …...}
{ host : Participant, travelTime : String, status : String, timezone : String, duration : Number, …...}
Participant { name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String], …... },
Embedding ReferencingConsiderations
● Query patterns
● Access to embedded doc
● # references to a doc
● Application level join
● 1-way or 2-way referencing
Assistant is a PERSON Assistant is an Attribute of PERSON
Assistant is a PROFILE, a separate and smaller
entity
Modeling someone’s assistant1st try 2nd try 3rd try
{ name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String] …...}
{ name : { first : String, last: String }, primaryEmail : String }
{ name : {.....}, nicknames : [String], phones : [{Type: String}] primaryEmail : String, secondaryEmails : [String], title : String, signatures: [String], assistant : { name : {.....}, primaryEmail : String } …...}
Dealing with schema changes
Issues
● Inconsistent character offsets
● Inconsistent time representation
● Improper sent date (yr 2026)
● Key info not saved
Fixes
● Recalculate character offsets
● Reconstruct time entities
● Recalculate timezone based on context
● Filter out unsalvageable data
Schemaless vs. Typesafe
● Discrepancy between models in Scala versus stored data in mongo.
○ MongoDB stores free form nested structures
○ Scala relies on strict, type safed models for data
● Models in flux
○ Data extraction (write) vs. decision making (read) formats
○ We are continuously learning new edge case about scheduling meetings
Mongoose
● A very nice object modeling and querying library for mongodb with node.js.
● Client side schema layer
● Useful features
○ Data validation
○ “Virtual” fields
○ Nested documents with automatic _id reference
○ Client side joins across collections and databases
Interactive Applications Architecture
Internal API powered by express-restify-mongo provides a plug-n-play API layer for mongoose and MongoDB
Model evolution with mongoose
{ host : Participant guests : [Participant] ... activity: String ...}
Meeting Format in Mongo
Activity Model
New Scala Model for Meeting
case class Meeting (host: Participantguests:
List[Participant]...activities:
List[Activity]...
)
case class Meeting (host: Participantguests:
List[Participant]...activity: String...
)
case class Activity (id: String,activity: String
)
Initial Scala Model for Meeting
Ease transition with Virtual field
MeetingSchema.virtual( "activities" ).get( function() { var virtualActivity = { id: randomString(), activity: this.activity } return [virtualActivity];});
Feeding data science
ML training architecture
Using Mongo for ML: What we need
● Heavy on experimentation○ Need to train on different parameters, algorithms, data sets
● Nested data sets that are dependent on the specific experiment○ Emails have several feature types, each of which have several
parameters
● Mongo models to work well with Scala models
Using Mongo for ML: The beginning
{"messageId":"<somerandommessage@xxxx>",
"classes" : ["A", "B", "C"],
"featureVector" : [ [
"OneGrams", [ { "work" : 3 }, { "that" : 1 }, … ] ],
[ "TwoGrams",
[ { "that work" : 1 }, { "you then" : 1 }, ]],...
}
case class Features (messageId: String,classes: List[String],featureVector:
List[Tuple2[String, List[Map[String, Integer]]]])
Scala
Mongo
Mongo for ML: Make it work with Scala
{ "messageId" : "<somerandommessage@xxxx>", "classifierLabels" : [ { "classifier" : "MeetingClassifier", "class" : "A" } ], "featureVector" : [ { "featureType" : "TaggedOneGram", "featureKeys" : [ { "featureKey" : "work", "featureValue" : 3 }, { "featureKey" : "that", "featureValue" : 1 }, ... ] }, ... ]}
case class Features (messageId: String,classes: List[Classes],featureVector:
List[FeatureVectors])
case class Classes (classifier: String,class: String
)
case class FeatureVectors(featureType: String,featureKeys: List[FeatureKeys]
)
case class FeatureKey(featureKey: String,featureValue: 1
)
ScalaMongo
varun @ human.x.aibackend engineer
25 Broadway. 9th FloorNew York, 10004 NY
E: [email protected]: @xdotai
Visit x.ai to join the waitlist