adaptive blue java nyc meetup

AdaptiveBlue @ Java NYC Meetup April 20, 2009 Alex Iskold, Founder/CEO http://getglue.com

Upload: alex-iskold

Post on 10-May-2015


Category:

Technology



DESCRIPTION

Presentation of Glue, http://getglue.com, a browser add-on made by AdaptiveBlue. In-depth discussion of how we use Amazon Web Services and semantic algorithms.

TRANSCRIPT

Page 1: Adaptive Blue Java Nyc Meetup

AdaptiveBlue @ Java NYC Meetup

April 20, 2009

Alex Iskold, Founder/CEO

http://getglue.com

Page 2: Adaptive Blue Java Nyc Meetup

Agenda

About AdaptiveBlue

Glue: The Network of People and Things

Glue: Building on Amazon Web Services

Glue: Semantic Technology Stack

Page 3: Adaptive Blue Java Nyc Meetup

About AdaptiveBlue

Founded in 2006, based in New York

Funded by USV and RRE

Focuses on enhancing browsing experience

Launched BlueOrganizer and Glue add-ons for Firefox and SmartLinks Widgets for blogs

Page 4: Adaptive Blue Java Nyc Meetup

Get Glue. The Network That Sticks With You.

http://getglue.com

Page 5: Adaptive Blue Java Nyc Meetup

What is Glue?

Glue is a contextual network that uses semantic technology to automatically connect people around everyday things - books, music, movies, stars, artists, stocks, wine, restaurants and more.

Page 6: Adaptive Blue Java Nyc Meetup

1. Contextual: Glue is distributed and appears when it makes sense on popular sites.

2. Automatic: Users participate in Glue just by browsing their favorite sites.

3. Simple: Glue removes the friction involved in networking - the network comes to you.

Page 7: Adaptive Blue Java Nyc Meetup

Glue Demo

Page 8: Adaptive Blue Java Nyc Meetup

Glue: Building on Amazon Web Services

Page 9: Adaptive Blue Java Nyc Meetup

AWS-based Architecture

Client Layer: Browser Add-Ons, Widgets, iPhones, Facebook Apps, API Clients

Load Balancer Layer: Round Robin DNS in front of Load Balancer 1 and Load Balancer 2

Web Service Layer: Host 1 (EC2) … Host N (EC2), each running the Glue Web Service and Batch Services

Database Layer:

Amazon SimpleDB: Interactions between People and Things

Amazon S3: Object Database / People Profiles

Rackspace MySQL: User accounts, Analytics

Page 10: Adaptive Blue Java Nyc Meetup

AdaptiveBlue AWS Stack

Relating People and Things (SimpleDB)

Records of people’s interactions around things are stored in SimpleDB domains, duplicated for fast access.

Transactional and Batch Support (EC2)

Web service requests and batches are distributed across EC2 instances.

Storing Object Metadata (S3)

XML representations of millions of books, music, movies, etc. are stored in Amazon S3.

Page 11: Adaptive Blue Java Nyc Meetup

Amazon SimpleDB in a Nutshell

Idea:

Create flat database with auto-indexed tables.

Main Features:

Each attribute is indexed.

Record structure is flexible.

Basic operators in queries.

Supports sorting.

Diagram: a Client issues Put record, Get record and Query records against a SimpleDB Domain.

Record 1: Key1, Attributes: A1, A2…

Record N: Key2, Attributes: A1, A2…
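The put/get/query model on this slide can be sketched with a small in-memory stand-in. This is not the SimpleDB API: the class `FlatDomain` and its methods `put`, `get` and `queryEquals` are hypothetical names chosen for illustration, and the sketch simplifies each attribute to a single string value. It only shows the idea of a flat store where every attribute is indexed automatically and records need not share a structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a flat, auto-indexed domain (not the AWS SDK).
class FlatDomain {
    // record key -> attributes of that record
    private final Map<String, Map<String, String>> records = new HashMap<>();
    // attribute name -> attribute value -> keys of the records holding that value
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Stores a record and indexes every one of its attributes.
    void put(String key, Map<String, String> attributes) {
        remove(key); // drop any previous version so the index stays consistent
        records.put(key, new HashMap<>(attributes));
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                 .add(key);
        }
    }

    // Returns the attributes stored under the key, or null if there is no such record.
    Map<String, String> get(String key) {
        Map<String, String> record = records.get(key);
        return record == null ? null : Collections.unmodifiableMap(record);
    }

    // Returns the keys of all records whose attribute equals the value, sorted by key.
    List<String> queryEquals(String attribute, String value) {
        Set<String> keys = index
                .getOrDefault(attribute, Collections.<String, Set<String>>emptyMap())
                .getOrDefault(value, Collections.<String>emptySet());
        return new ArrayList<>(keys);
    }

    private void remove(String key) {
        Map<String, String> old = records.remove(key);
        if (old == null) {
            return;
        }
        for (Map.Entry<String, String> e : old.entrySet()) {
            index.get(e.getKey()).get(e.getValue()).remove(key);
        }
    }

    public static void main(String[] args) {
        FlatDomain domain = new FlatDomain();
        Map<String, String> attrs = new HashMap<>();
        attrs.put("type", "book");
        attrs.put("action", "liked");
        domain.put("Key1", attrs);
        System.out.println(domain.queryEquals("type", "book"));
    }
}
```

Because the index is maintained on every put, a query by any attribute avoids scanning all records, which is the property the slide summarizes as "each attribute is indexed".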

Page 12: Adaptive Blue Java Nyc Meetup

How Glue uses SimpleDB

Interaction Record: Key1, Attributes: A1, A2…

Each record is duplicated into an Object Domain and a Person Domain.

The key is a combination of USER_ID and OBJECT_KEY.

A djb2 hash is used to calculate the domain for each record.

All records for a given USER reside in the same People Domain, and all records for a given OBJECT reside in the same Object Domain.

Object Domains: OD1, OD2, … ODN

People Domains: PD1, PD2, … PDN
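A minimal sketch of the domain routing described on this slide, under stated assumptions. The djb2 function itself is the well-known string hash (start at 5381, multiply by 33, add each character). Everything else is hypothetical: the class name `DomainRouter`, the domain count, the "PD"/"OD" naming, the `|` separator in the record key, and the choice to hash the USER_ID for People Domains and the OBJECT_KEY for Object Domains. That last choice is one way to get the stated property that all records for one user, or for one object, land in the same domain.

```java
// Hypothetical router: picks a domain for a record by hashing part of its key with djb2.
class DomainRouter {
    private final int domainCount;

    DomainRouter(int domainCount) {
        if (domainCount <= 0) {
            throw new IllegalArgumentException("domainCount must be positive");
        }
        this.domainCount = domainCount;
    }

    // Classic djb2: hash = hash * 33 + c, starting from 5381 (int overflow wraps around).
    static int djb2(String s) {
        int hash = 5381;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * 33 + s.charAt(i);
        }
        return hash;
    }

    // Maps a string to a domain index in [0, domainCount); floorMod handles negative hashes.
    int domainIndex(String partitionKey) {
        return Math.floorMod(djb2(partitionKey), domainCount);
    }

    // Assumed naming scheme, following the PD1..PDN and OD1..ODN labels on the slide.
    String personDomain(String userId) {
        return "PD" + (domainIndex(userId) + 1);
    }

    String objectDomain(String objectKey) {
        return "OD" + (domainIndex(objectKey) + 1);
    }

    // Assumed composite key format; the slide only says USER_ID and OBJECT_KEY are combined.
    static String recordKey(String userId, String objectKey) {
        return userId + "|" + objectKey;
    }

    public static void main(String[] args) {
        DomainRouter router = new DomainRouter(16);
        String user = "user123";
        String object = "books/kite_runner/khaled_hosseini";
        // The same interaction record is written twice, once per domain family.
        System.out.println(recordKey(user, object) + " -> " + router.personDomain(user));
        System.out.println(recordKey(user, object) + " -> " + router.objectDomain(object));
    }
}
```

Writing each record twice costs storage but lets both "everything this user did" and "everyone who touched this object" be answered from a single domain.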

Page 13: Adaptive Blue Java Nyc Meetup

Amazon S3 in a Nutshell

Idea:

Put/Get objects into buckets based on unique keys.

Main Features:

Public/Private access.

Support for large objects.

Diagram: a Client issues Put object and Get object against Amazon S3, which holds Bucket 1 … Bucket N.

Page 14: Adaptive Blue Java Nyc Meetup

How Glue Uses S3

Object Bucket: XML files with object information

People Bucket: XML files with user and friends info

XML is serialized as a string and written to S3.

Each file has a unique key: OBJECT_ID or USER_ID/profile, etc.
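The storage pattern on this slide can be sketched as follows. This is an illustration only: a `HashMap` stands in for the People Bucket so the example runs without AWS, and the class `ProfileStore`, the XML element names and the sample values are all hypothetical. What comes from the slide is the shape of the flow: build XML, serialize it to a string, and write it under a key of the form USER_ID/profile.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: a map plays the role of the People Bucket.
class ProfileStore {
    private final Map<String, String> peopleBucket = new HashMap<>();

    // Key format taken from the slide: USER_ID/profile.
    static String profileKey(String userId) {
        return userId + "/profile";
    }

    // Builds a tiny XML document and serializes it to a string (element names are assumptions).
    static String toXml(String userId, String displayName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element user = doc.createElement("user");
            user.setAttribute("id", userId);
            Element name = doc.createElement("name");
            name.setTextContent(displayName);
            user.appendChild(name);
            doc.appendChild(user);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize profile for " + userId, e);
        }
    }

    // "Put object": store the serialized XML under the user's profile key.
    void putProfile(String userId, String displayName) {
        peopleBucket.put(profileKey(userId), toXml(userId, displayName));
    }

    // "Get object": returns the stored XML string, or null if absent.
    String getProfile(String userId) {
        return peopleBucket.get(profileKey(userId));
    }

    public static void main(String[] args) {
        ProfileStore store = new ProfileStore();
        store.putProfile("u42", "Alex");
        System.out.println(profileKey("u42") + " -> " + store.getProfile("u42"));
    }
}
```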

Page 15: Adaptive Blue Java Nyc Meetup

Amazon EC2 in a Nutshell

Machine Image (OS + Apps)

Usage:

Create Machine Image

Deploy the image to S3

Start 1 or more instances

Use it as regular machine(s)

Main Options:

Dynamic/Static IPs

Choose cores

Choose locations

Persistence via EBS

Page 16: Adaptive Blue Java Nyc Meetup

How Glue uses EC2

Round Robin DNS in front of Load Balancer 1 and Load Balancer 2

Host 1 (EC2/Rackspace) … Host N (EC2/Rackspace), each running the Glue Web Service and Batch Services

The Web Service processes transactional requests.

Batch Services are time-based and run on sets of USERS and OBJECTS.

The system scales by equally partitioning Data and Requests.
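The slide does not say how the sets of USERS and OBJECTS are divided among batch services, so the following is only one plausible sketch of "equal partitioning": splitting a work list into contiguous slices whose sizes differ by at most one. The class `BatchPartitioner` and the sample user IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: divides a batch evenly among a fixed number of workers.
class BatchPartitioner {
    // Splits items into `parts` contiguous slices whose sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> items, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<T>> slices = new ArrayList<>();
        int base = items.size() / parts;   // minimum slice size
        int extra = items.size() % parts;  // the first `extra` slices get one more item
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("u1", "u2", "u3", "u4", "u5", "u6", "u7");
        // Each slice would be handed to the batch service on one host.
        System.out.println(partition(users, 3));
    }
}
```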

Page 17: Adaptive Blue Java Nyc Meetup

Glue: Semantic Technology Stack

Page 18: Adaptive Blue Java Nyc Meetup

Semantic Technology Stack

Concept Definition

Server-based XML schemas for things (nouns): books, music, movies, stocks, wines, recipes, etc.

Recognition Algorithms

Recognition of things in Pages, Links and Text

Identity Algorithms

Correlation of the same thing from different pages across the web.

Page 19: Adaptive Blue Java Nyc Meetup

Semantic Technology Stack: Concept Definitions

1. XML-based: A schema file resides on the server for each type.

2. Data Composition: Each type has attributes (e.g. a book has an author, etc.)

3. Extensible: New types can be plugged into the engine dynamically.
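The schema file format is not shown in the slides, so the sketch below skips XML parsing and models only what a loaded set of concept definitions might look like: each type has a list of attributes, and new types can be registered at runtime. The class `ConceptRegistry`, its methods and the sample types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of concept types and the attributes each type declares.
class ConceptRegistry {
    // type name -> declared attribute names, in declaration order
    private final Map<String, List<String>> types = new LinkedHashMap<>();

    // Plugs in a new type at runtime (the "extensible" property on the slide).
    void register(String typeName, String... attributes) {
        types.put(typeName,
                Collections.unmodifiableList(new ArrayList<>(Arrays.asList(attributes))));
    }

    boolean isKnown(String typeName) {
        return types.containsKey(typeName);
    }

    List<String> attributesOf(String typeName) {
        List<String> attributes = types.get(typeName);
        if (attributes == null) {
            throw new IllegalArgumentException("Unknown type: " + typeName);
        }
        return attributes;
    }

    // Lists declared attributes that a candidate object does not provide.
    List<String> missingAttributes(String typeName, Map<String, String> candidate) {
        List<String> missing = new ArrayList<>();
        for (String attribute : attributesOf(typeName)) {
            if (!candidate.containsKey(attribute)) {
                missing.add(attribute);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ConceptRegistry registry = new ConceptRegistry();
        registry.register("book", "title", "author");
        registry.register("wine", "name", "vintage"); // added later, no code change needed

        Map<String, String> candidate = new HashMap<>();
        candidate.put("title", "The Kite Runner");
        System.out.println(registry.missingAttributes("book", candidate));
    }
}
```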

Page 20: Adaptive Blue Java Nyc Meetup

Semantic Technology Stack: Identity Algorithms

1. Key-based: Each object in the system has a unique key, depending on its type: books/kite_runner/khaled_hosseini

2. Attribute-based: Keys are based on the combination of attributes (e.g. title/author)

3. Normalized: Multiple transformations and validations are applied to raw text to generate the keys.
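A minimal sketch of how such a key could be generated. The slides give only the key format and one example; the specific transformations here (accent stripping, lowercasing, dropping a leading article, collapsing punctuation to underscores) and the class `ObjectKeys` are assumptions chosen so that the example key from the slide comes out as shown. The actual rules used by Glue are not described.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical key builder: normalizes raw attribute text and joins it into a typed key.
class ObjectKeys {
    // Applies a fixed sequence of transformations to raw text (all rules are assumptions).
    static String normalize(String raw) {
        return Normalizer.normalize(raw, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "")          // strip accents left by decomposition
                .toLowerCase(Locale.ROOT)
                .trim()
                .replaceAll("^(the|a|an)\\s+", "")  // drop a leading English article
                .replaceAll("[^a-z0-9]+", "_")      // punctuation and spaces become underscores
                .replaceAll("^_+|_+$", "");         // trim stray underscores at the ends
    }

    // Builds a book key from title and author, validating that both survive normalization.
    static String bookKey(String title, String author) {
        String t = normalize(title);
        String a = normalize(author);
        if (t.isEmpty() || a.isEmpty()) {
            throw new IllegalArgumentException("title and author must contain letters or digits");
        }
        return "books/" + t + "/" + a;
    }

    public static void main(String[] args) {
        // Different spellings found on different pages map to the same key.
        System.out.println(bookKey("The Kite Runner", "Khaled Hosseini"));
        System.out.println(bookKey("  the kite runner ", "KHALED HOSSEINI"));
    }
}
```

Both calls in `main` print `books/kite_runner/khaled_hosseini`, which is the point of normalization: the same thing seen on different pages resolves to one identity.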

Page 21: Adaptive Blue Java Nyc Meetup

Semantic Technology Stack: Recognition Algorithms

1. Extraction: The first phase of recognition processes elements of the page, using an XML-based framework for parsing the DOM that is shared by the Java backend and the JavaScript client.

2. Cleaning: The second phase of recognition is an asynchronous query of multiple web services/APIs. For books we query Amazon, for movies Netflix, etc., and then normalize and merge the results.

3. Caching: Clean objects are cached. Misses/false-positives are patched manually.
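The cleaning and caching phases can be sketched with `CompletableFuture`. This is an illustration, not Glue's implementation: the class `CleaningStep` is hypothetical, each external service is represented by a plain `Function` instead of a real Amazon or Netflix client, and the merge rule (earlier sources win on conflicts) and the decision not to cache empty results are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cleaning step: queries several lookup services in parallel, merges, caches.
class CleaningStep {
    // Each source maps a raw query to attributes; real services would be HTTP clients.
    private final List<Function<String, Map<String, String>>> sources;
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    CleaningStep(List<Function<String, Map<String, String>>> sources) {
        this.sources = new ArrayList<>(sources);
    }

    Map<String, String> clean(String rawQuery) {
        Map<String, String> cached = cache.get(rawQuery);
        if (cached != null) {
            return cached; // clean objects are served from the cache
        }

        // Start every lookup without waiting; a failing source contributes nothing.
        List<CompletableFuture<Map<String, String>>> pending = new ArrayList<>();
        for (Function<String, Map<String, String>> source : sources) {
            pending.add(CompletableFuture
                    .supplyAsync(() -> source.apply(rawQuery))
                    .exceptionally(ex -> Collections.<String, String>emptyMap()));
        }

        // Merge in source order: the first source to supply an attribute wins (assumed rule).
        Map<String, String> merged = new TreeMap<>();
        for (CompletableFuture<Map<String, String>> future : pending) {
            for (Map.Entry<String, String> e : future.join().entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }

        Map<String, String> result = Collections.unmodifiableMap(merged);
        if (!result.isEmpty()) {
            cache.put(rawQuery, result); // misses are left uncached so they can be fixed later
        }
        return result;
    }

    public static void main(String[] args) {
        List<Function<String, Map<String, String>>> sources = new ArrayList<>();
        sources.add(q -> Collections.singletonMap("title", "The Kite Runner"));
        sources.add(q -> Collections.singletonMap("author", "Khaled Hosseini"));
        System.out.println(new CleaningStep(sources).clean("kite runner"));
    }
}
```

Launching all lookups before joining any of them means the total wait is roughly that of the slowest service, not the sum of all of them.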

Page 22: Adaptive Blue Java Nyc Meetup

http://getglue.com

http://twitter/[email protected]