wannaeat: a computer vision-based, multi-platform restaurant lookup app

Oct. 28, 2017

Steve, Cedric, Liboon

Rakuten, Inc.

Upload: rakuten-inc

Post on 21-Jan-2018



Stevche Radevski

MSc Software Engineering

Working on internal business support/analytics tools.

Mainly working with JavaScript (React, React Native, Node.js)

Cedric Konan

MSc Ubiquitous Computing

Working on Dining services tools.

Mainly working with Java, but a PHP lover

Tan Li Boon

BSc Computer Science

Working on big data with DevOps.

Mainly working with Java (Spark, Couchbase, Hadoop)


Given 3 months to do anything we want!

Every Friday.

Any topic.

Work however we want.

FREEDOM!


Play around with Machine Learning

Do something differently

Utilize everyone’s specialized skills and learn from each other

Multiplatform mobile app development


Fast Development

Multi-platform

Easily-interfaceable

Plug and play


React Native: a JavaScript-based framework for building native, multi-platform mobile applications

1487 official contributors on GitHub

1000+ mobile apps created with it


Easy to use (especially if you know React)

Based on JavaScript

Uses markup language syntax

Reusable Components

Very well supported (documentation, active community)


App Result list

Star Rating Component

Row component

List View


Recognize what a restaurant sells based on its food pictures!


No billing surprises

Demonstration of expertise

Specialized Models

A technology company should maintain in-house infrastructure


UEC FOOD-256

256 food categories, 100 images per category

Reference: Yoshiyuki Kawano and Keiji Yanai, The University of Electro-Communications, Tokyo, Japan


Inception-v3 is a convolutional neural network (ConvNet).

It takes 2 to 3 weeks on multiple GPUs to train a ConvNet from scratch!

If you need to tweak the network parameters, you have to re-train the whole thing.

Clearly this is not reasonable.

Enter Transfer Learning…


Use the outputs of another trained network as generic image feature detectors, and train a new shallow model using these outputs.

[Diagram: two networks side by side, labeled "Original Target" and "Pre-trained". Each shows images and tags, conv1, conv2, softmax, and loss.]
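
The transfer learning idea above can be sketched in a few lines. This is only an illustration under loud assumptions: `frozen_features` is a hypothetical stand-in for the pre-trained network (it is not Inception-v3), the "images" are toy lists of floats, and the shallow model is a plain softmax classifier trained with stochastic gradient descent. It is not the code used in the project.

```python
import math
import random


def make_toy_image(label, rng):
    """Toy data (assumption): class 0 is 'dark', class 1 is 'bright'."""
    base = 0.2 if label == 0 else 0.8
    return [base + rng.uniform(-0.1, 0.1) for _ in range(16)]


def frozen_features(image):
    """Hypothetical stand-in for a pre-trained network's feature output.

    It is fixed and never updated during training, which is the point
    of transfer learning: only the small model on top is trained.
    """
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread, 1.0]  # the trailing 1.0 acts as a bias input


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    """One linear score per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_shallow_model(features, labels, num_classes, epochs=500, lr=0.5):
    """Train a softmax classifier on top of the frozen features."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, x))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights


def predict(weights, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [i % 2 for i in range(20)]
    feats = [frozen_features(make_toy_image(y, rng)) for y in labels]
    model = train_shallow_model(feats, labels, num_classes=2)
    print([predict(model, f) for f in feats])
```

Because only the small weight matrix is trained, changing the classifier does not require re-training the feature extractor.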


[Diagram: Mine Restaurants Data. Components: Google Vision API, Backend API, Vision API, WannaEat Vision, Restaurants.json. Numbered steps (1 to 8) include: get 1000 restaurants from Google Maps, get tags per image per restaurant, generate a tags list per image, build an image-tag dictionary, and store reduced tags per restaurant.]
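
The mining steps named above (tags per image, then reduced tags per restaurant) could look roughly like the sketch below. Everything here is an assumption made for illustration: `tag_image` is a hypothetical stand-in for a vision API call, the "reduction" is simply keeping the most frequent tags, and the data is made up. The slides do not say how the reduction was actually done.

```python
from collections import Counter


def tag_image(image_id, image_tags):
    """Hypothetical stand-in for a vision API call.

    A real call would send the image to a tagging service; here the
    tags are looked up in a prepared image-tag dictionary.
    """
    return image_tags.get(image_id, [])


def reduce_tags(image_ids, image_tags, top_n=3):
    """Keep the tags that appear in the most images of one restaurant."""
    counts = Counter()
    for image_id in image_ids:
        # Count each tag once per image so one picture cannot dominate.
        counts.update(set(tag_image(image_id, image_tags)))
    # Sort by frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_n]]


def build_index(restaurants, image_tags, top_n=3):
    """Map each restaurant name to its reduced tag list."""
    return {
        name: reduce_tags(image_ids, image_tags, top_n)
        for name, image_ids in restaurants.items()
    }


if __name__ == "__main__":
    # Made-up sample data, not taken from the project.
    image_tags = {
        "a1": ["ramen", "noodle"],
        "a2": ["ramen", "soup"],
        "a3": ["ramen", "noodle", "bowl"],
        "b1": ["sushi"],
        "b2": ["sushi", "rice"],
    }
    restaurants = {"Noodle House": ["a1", "a2", "a3"], "Sushi Bar": ["b1", "b2"]}
    print(build_index(restaurants, image_tags, top_n=2))
```

Storing only the reduced tags keeps the per-restaurant record small, which matters if it is served as a single JSON file.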


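[Diagram: Mobile lookup flow. Components: Mobile, Google Vision API, Backend API, Vision API, WannaEat Vision, Restaurants.json. The mobile app sends a user-uploaded image or a tag, the top tag for the image is determined, and matching restaurants are returned.]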
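
The lookup flow above (image or tag in, matching restaurants out) can be sketched as follows. The function names, the `(tag, score)` input shape, and the sample data are assumptions for illustration only; the slides do not show the backend's actual interface.

```python
def top_tag(scored_tags):
    """Pick the highest-scoring tag from (tag, score) pairs, or None."""
    if not scored_tags:
        return None
    return max(scored_tags, key=lambda pair: pair[1])[0]


def find_restaurants(query_tag, restaurant_tags):
    """Return restaurant names whose stored tags include the query tag."""
    if query_tag is None:
        return []
    wanted = query_tag.lower()
    return sorted(
        name
        for name, tags in restaurant_tags.items()
        if wanted in (tag.lower() for tag in tags)
    )


def lookup(restaurant_tags, scored_tags=None, tag=None):
    """Accept either vision output for an uploaded image or a plain tag."""
    query = tag if tag is not None else top_tag(scored_tags or [])
    return find_restaurants(query, restaurant_tags)


if __name__ == "__main__":
    # Made-up sample data, not taken from the project.
    restaurant_tags = {
        "Noodle House": ["ramen", "noodle"],
        "Sushi Bar": ["sushi", "rice"],
        "Ramen Ya": ["Ramen"],
    }
    # Image path: the vision step is assumed to return scored tags.
    print(lookup(restaurant_tags, scored_tags=[("soup", 0.4), ("ramen", 0.9)]))
    # Tag path: the user typed or picked a tag directly.
    print(lookup(restaurant_tags, tag="Sushi"))
```

Matching on a single top tag keeps the lookup simple; ranking restaurants by several tags would be a natural extension.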


Co-location is great (no communication friction)

Freedom: Possibility to choose topic and tech stack.

Small team, so no need for long meetings, tickets, wikis (Trello and p2p talk).

Clear system boundaries with clear interfaces between each boundary.

Well-defined responsibilities (but still helping each other)


Nothing, we are that good!


Took some time to remember what we did the previous week

Collecting data for restaurants was time-consuming

Vision processing took equally long.

Spent half the time figuring out what problem to tackle


Of course we did!

Work in small teams!

Have clearly defined responsibilities

Do one thing at a time (switching between projects takes mental effort)

Machine Learning is not difficult anymore (many API-as-a-service providers)

Rethink best practices (both development and UI/UX)