smart snap - report

Smart Glass

Submitted in partial fulfillment of the requirements

of the degree of

Bachelor of Engineering

by

Jay Shah, 60003100048

Pooja Shah, 60003100043

Tapan Desai, 60003100012

Supervisors:

Prof. Vinaya Sawant

Prof. Anuja Nagare

Information Technology

Dwarkadas J. Sanghvi College of Engineering, University Of Mumbai

2013-2014

Project Report Approval for B. E.

This project report entitled “Smart Glass” by Jay Shah, Pooja Shah and

Tapan Desai is approved for the degree of Information Technology.

Internal Guide (Prof. Vinaya Sawant)

Internal Examiner External Examiner

Vice Principal (Acad) and HOD, IT Dept. Principal

(Dr. A. R. Joshi) (Dr. Hari Vasudevan)

Declaration

We declare that this written submission represents our ideas in our own words and

where others' ideas or words have been included, we have adequately cited and

referenced the original sources. We also declare that we have adhered to all principles

of academic honesty and integrity and have not misrepresented or fabricated or

falsified any idea/data/fact/source in our submission. We understand that any violation

of the above will be cause for disciplinary action by the Institute and can also evoke

penal action from the sources which have thus not been properly cited or from whom

proper permission has not been taken when needed.

Jay Shah, 60003100048

----------------------------------------- -----------------------------------------

Pooja Shah, 60003100043

----------------------------------------- -----------------------------------------

Tapan Desai, 60003100012

----------------------------------------- -----------------------------------------

(Name of student and Roll No) (Signature)

Date: 28/04/2014

ACKNOWLEDGEMENTS

We are highly indebted to Dwarkadas J. Sanghvi College of Engineering for their

guidance and constant supervision as well as for providing necessary information

regarding the project & also for their support in completing the project.

We would like to express our heartfelt gratitude towards our guide Prof. Vinaya Sawant and our co-guide Prof. Anuja Nagare for their kind co-operation and encouragement, which helped us in the completion of this Synopsis.

We would like to express our special gratitude and thanks to the faculty of the Information Technology Department for giving us such attention and time. Our thanks and appreciation also go to our colleagues who helped in developing the project and to the people who willingly helped us with their abilities.

Jay Shah

Pooja Shah

Tapan Desai

Table of Contents

1. Introduction
2. Literature Review
3. Problem Definition
4. Proposed System
5. Project Management
5.1 Schedule
5.2 Project Resources
5.3 Project Estimates
5.4 Risk Mitigation Strategy
6. Project Design
6.1 System Architecture
6.2 Module/Component Description
6.3 User Interface Design
7. Implementation
7.1 Modules/Component Description
7.2 Module-wise Algorithm
8. Experiments and Project Testing
8.1 Test plan
8.2 Test cases
8.3 Methods used
8.4 Test Results
9. Maintenance
9.1 User Manual
9.2 Constraints
10. Conclusion and Future Scope
11. References and Bibliography

List of Figures

Figure 2.1: Screenshot for existing system
Figure 3.1: Use your mobile
Figure 3.2: Snap a picture
Figure 3.3: Get related content
Figure 4.1: Architecture of Proposed System
Figure 5.1: Gantt Chart for Project Schedule
Figure 6.1: Actual System Architecture based on modules
Figure 6.2: Use Case Diagram
Figure 6.3: Activity Diagram
Figure 6.4: State transition to capture image
Figure 6.5: State transition for text conversion
Figure 6.6: Deployment Diagram
Figure 6.7: Data Flow Diagram
Figure 6.8: Splash Screen
Figure 6.9: Smart Snap of Play Store
Figure 6.10: Main Menu
Figure 6.11: Slide Menu

List of Tables

Table 1: COCOMO Model
Table 2: Risk Mitigation Strategy
Table 3: Risk Sheet
Table 4: Test Case

Project Report: Smart Glass

Chapter 1

1. Introduction

Mobile photo + Image Recognition = Identification

Many existing systems provide text-based results for news, and e-commerce portals let users buy products through text search. What both kinds of system have in common is text-based search. Our project aims to eliminate text-based search and return results based on visual search instead.

Our project adds context to images. Its image recognition features connect images to relevant information: the user snaps a picture, and the application tells the user what is in it.

Now that communication has moved from text to images, technology needs to keep pace. The project is based on visual search technology. The application is an Android application that identifies objects from images. Once objects are identified, it can provide additional information about them, e.g. related web pages, descriptions and current statistics.

The application allows the user to click an image and search for it over the web using our web engine, or simply execute the actions they have specified in their own spreadsheet. The application is therefore customizable to user requirements. Its main features are Document Capture, News Aggregator, Real-Time Information and Visual Commerce.



Chapter 2

2. Literature Review

This section presents the research conducted by the team. It gives a brief overview of the existing systems and their drawbacks, and an idea of the potential frameworks that could have been used in building this application.

2.1 Existing Systems

Google Goggles:

Scan barcodes using Goggles to get product information

Scan QR codes using Goggles to extract information

Recognize famous landmarks

Translate by taking a picture of foreign language text

Add Contacts by scanning business cards or QR codes

Scan text using Optical Character Recognition (OCR)

Recognize paintings, books, DVDs, CDs, and just about any 2D image

Solve Sudoku puzzles

Find similar products



Smart Glance:

Support for secured cloud hosted service & on-premises Installation

Supports multiple mobile platforms

View Data in Rich Graphical formats

Analyse by Charts – trend & Column, Pie & Donut

Zoom In/Out on charts for better view

Tools to compare two elements instantly

Tools to compare against target/benchmark

Support for offline access

Localization in multiple languages – Spanish, German, Italian, French, Chinese &

Japanese.

Figure 1: Screenshots of existing system

2.2 Pitfalls

Google Goggles:

Redirects to the Google search engine while scanning an image instead of giving the

results on the website.

No visual commerce present. Scans the image and provides the web link for the product.

Users cannot customize the search based on their requirements. It gives all the information, instead of providing only the link which the user actually needs.

Smart Glance:



Provides information for all the machines instead of narrowing the search down to the

machine required.

Doesn’t include any other features.



Chapter 3

3. Problem Definition


Smart Glass is an Android application that takes in images using the mobile camera. The application uses optical character recognition and image processing to scan the images and returns results based on the search. The application has a number of features: Visual Commerce, Visual Search, real-time object statistics, language translator, user-specified spreadsheets, form processing and document capture.

The application will run on all versions of the Android operating system. The user can get information based on the image they have searched, from a dedicated database.

Document Capture:

This feature converts a printed document to editable text. Scanned documents can be exported to various formats. It saves time and resources, as the user doesn’t have to transfer and type content from images.

Visual Commerce:



Users can purchase products from eBay and Amazon directly within the application. The user just has to click a picture of the product they want to purchase and the application will give them results.

News Aggregator:

The application will give results based on the images the user clicks. The application will retrieve

information from the dedicated databases which will get populated automatically with the latest content

every day.

Real-Time Information:

This feature will give real-time results for the captured products. This information is retrieved from a user-defined spreadsheet or database. The system can be used to retrieve information from ERP systems as well. This feature gives users complete control over the results.

The basic idea of the application is:

Figure 2: Use your mobile
Figure 3: Snap a picture
Figure 4: Get related content

The user clicks a photo from his/her smartphone. The image is scanned using an OCR engine. The results from the OCR engine are passed as a query to our database. This database is populated every day using crawlers. The data matched in the database is passed back to the mobile device and displayed to the user.
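The flow described above (OCR text in, matching database rows out) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: an in-memory SQLite table stands in for the crawler-populated database, and the table name, column names, sample rows and the simple keyword-count matching are all hypothetical rather than the project's actual schema or search logic.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory table standing in for the crawler-populated database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, url TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            ("Android update released", "http://example.com/a",
             "A new Android version ships today."),
            ("Cricket final tonight", "http://example.com/b",
             "The final will be played in Mumbai."),
        ],
    )
    return conn


def search_articles(conn, ocr_text):
    """Match the words recognised by the OCR engine against stored articles."""
    # Very short tokens are dropped because OCR noise often produces them.
    keywords = [w.lower() for w in ocr_text.split() if len(w) > 2]
    results = []
    for title, url, body in conn.execute("SELECT title, url, body FROM articles"):
        haystack = (title + " " + body).lower()
        hits = sum(1 for k in keywords if k in haystack)
        if hits:
            results.append((hits, title, url))
    # Articles matching more keywords come first.
    results.sort(key=lambda r: r[0], reverse=True)
    return [(title, url) for _, title, url in results]


conn = build_demo_database()
matches = search_articles(conn, "ANDROID update")
print(matches)
```

In the real system the `ocr_text` argument would come from the OCR engine on the device and the query would run on the server; the sketch keeps both in one process so it can be run on its own.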



Chapter 4

4. Proposed System

4.1 Draft of Proposed System

The project plan is the proposed approach to creating the application.

The basic concept of the application is that the user won’t have to type everything they want to search for. The user can just click an image of the object they want to search for, and the application will give results directly.

This software can provide information such as machine specifications, maintenance statistics, production capacity and effectiveness just by pointing the camera at the machine. It can also help detect real-time machine speed or the temperature of a furnace by connecting to the factory’s ERP system.

The application can be used by anyone. The interface is user-friendly and the user gets all the results directly in the application, without being redirected to a web browser. Just pointing the camera at a brand name retrieves information about the brand, including its specialization, web pages and branches.

Major technologies to be used:



• OCR Engine

• MATLAB

• PHP

• MySQL

• Eclipse

Today, more images than text are uploaded to the Internet. This application makes use of images to retrieve news and other important data according to the user’s requirements. It helps the user manage their own spreadsheet through the app: the user can specify the action to be performed when a particular object is clicked. After capturing an object, the user can retrieve the latest news about it. The user can also change the file format of a document using this application, and can even buy products online through it.

4.2 Expected Modules:

The basic project plan is dividing the application into four modules:

Document Capture: This module will allow the user to capture documents and edit them, and to save them in PDF format. This is done using an OCR engine. The text read from the document is sent to the server, which sends back an edited form of the document.

Visual Search: This module will also be built on an OCR engine. It will allow the user to click photos of nearby text and retrieve results for that text from the database. The project plan includes creating our own database, which will be populated using crawlers.

Visual Commerce: The basic idea of this module is to allow the user to shop using our application. The application will be connected to APIs from top e-commerce portals. Whenever the user clicks a photo of a nearby object they want to buy, the application will display matching products from those portals.

Real-Time Information: This module will provide real-time information from a connected spreadsheet. It will make the application customisable by the user.
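As an illustration of the Real-Time Information module, the lookup of OCR text against a connected spreadsheet might look like the sketch below. It assumes the spreadsheet has been exported as CSV; the column names (`label`, `speed_rpm`, `temperature_c`), the sample rows and the substring matching rule are hypothetical and are not taken from the project.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are illustrative only.
SHEET_CSV = """label,speed_rpm,temperature_c
LATHE-01,1200,45
FURNACE-02,0,870
"""


def load_sheet(csv_text):
    """Index the spreadsheet rows by their upper-cased label column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["label"].strip().upper(): row for row in reader}


def lookup(sheet, ocr_text):
    """Return the row whose label appears in the OCR output, or None."""
    normalised = ocr_text.upper()
    for label, row in sheet.items():
        if label in normalised:
            return row
    return None


sheet = load_sheet(SHEET_CSV)
row = lookup(sheet, "Machine furnace-02 (Bay 3)")
print(row)
```

Because the user controls the spreadsheet contents, the same lookup works for any data the user chooses to enter; only the label column needs to match text that the camera can read.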



4.3 Architecture for Proposed System:

Figure 5: Architecture of Proposed System

The proposed architecture is based on a mobile device which is the central unit of the

application. The mobile device would include the OCR engine and the required Camera

Activity. The application then connects to the services of individual modules.

Visual Search: For this module the application is connected to the search engine which is

integrated with the application database.

Document Capture: This uses the Document Translator API.

Real-Time Information: The application is connected to the web services which are then

connected to the Spreadsheet and the ERP system.

4.4 Advantages of Proposed System

The application is an all-in-one system. Each module has its own advantage for the user.

They are as follows:

Document Capture: The user doesn’t have to type or re-type an entire document. The user only has to click an image of a printed document to get results based on the recognized text. The main advantage is that the user saves a lot of time. The application also converts the new document into PDF format, and the document can be shared on all major social networking platforms.

Visual Search: The user doesn’t have to open a web browser and search for the news. The main advantage of this module is that it lets the user retrieve news within seconds. All the news is aggregated and stored in a database, which again saves the user time.

Visual Commerce: This module will allow the user to shop online without searching for an item across all the e-commerce portals. The top e-commerce portals are aggregated in this one small application, so it acts as a one-stop shopping solution. The main feature is visual search in e-commerce instead of traditional text-based search: the user just clicks a photo of the object they want to acquire and gets the product price and availability from e-commerce portals.

Real-Time Information: This module will allow the user to customise the application to their requirements. The application can be linked to user-created spreadsheets or an ERP system. The major advantage is that the user gets real-time information from the application.
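The aggregation step of the Visual Commerce module, in which offers from several portals are merged and shown with price and availability, could be sketched as follows. The portal names, the offer fields and the choice to show only in-stock items sorted by price are assumptions made for illustration; the real portal APIs and their response formats are not described in this report.

```python
# Hypothetical offers, as if already returned by two portals' product-search APIs.
PORTAL_RESULTS = {
    "PortalA": [{"title": "Blue mug", "price": 249.0, "in_stock": True}],
    "PortalB": [
        {"title": "Blue mug", "price": 199.0, "in_stock": True},
        {"title": "Blue mug XL", "price": 299.0, "in_stock": False},
    ],
}


def aggregate_offers(results_by_portal):
    """Merge per-portal offers into one list of available items, cheapest first."""
    offers = []
    for portal, items in results_by_portal.items():
        for item in items:
            if item["in_stock"]:
                # Remember which portal each offer came from.
                offers.append({"portal": portal, **item})
    return sorted(offers, key=lambda offer: offer["price"])


offers = aggregate_offers(PORTAL_RESULTS)
for offer in offers:
    print(offer["portal"], offer["title"], offer["price"])
```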



Chapter 5

5. Project Management

Project management is the discipline of planning, organizing, securing and managing resources to

achieve specific goals.

Schedule

The schedule helped us track the project's milestones, activities and deliverables, with start and finish dates.

The project scope was defined and the appropriate methods for completing the project were determined. The durations of the various tasks necessary to complete the work are listed.

The Gantt chart for our project is as follows:


Figure 6: Gantt Chart for Project Schedule

Project Resources

The project will require a limited amount of resources.

Hardware:

A mobile device running on the Android platform. The device should have a working camera.

A computer which could serve as a server for the database. The server needs to run 24x7.

Software:

Android SDK for development along with an Android IDE.

Google API

APIs from e-commerce portals to link them to the application.

Google Spreadsheets.

Other Requirements:

The mobile network should have an active internet service.

The mobile device should support Google Play Store.



Project Estimates

Estimation is the process of identifying and acquiring the resources, such as equipment, materials and manpower, required to accomplish the project successfully. The estimation techniques used for our project are as follows:

Lines of Code

Lines of code (LOC) is a software metric used to measure the size of a computer program by

counting the number of lines in the text of the program's source code. It is typically used to

predict the amount of effort that will be required to develop a program, as well as to estimate

programming productivity or maintainability once the software is produced.

For our project, the estimated lines of code = 6.5 K

The above mentioned Lines of Code (LOC) include the following:

• Authentication to access the tool

• Code for adding/deleting questions and updating keywords of the questions

• Logic for highlighting the important sentences in the document

• Frequency calculation of the keywords in the document for report preparation

• Graph generation code to display the overall performance of the class

COCOMO Estimation Model

The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model

that computes software development effort and cost as a function of program size. Program

size is expressed in estimated thousands of source lines of code (SLOC). Basic COCOMO is

good for quick estimate of software cost.

COCOMO applies to three classes of software projects:

Organic projects: "small" teams with "good" experience working with "less than rigid"

requirements



Semi-detached projects: "medium" teams with mixed experience working with a mix of rigid

and less than rigid requirements

Embedded projects: developed within a set of "tight" constraints. It is also a combination of organic and semi-detached projects.

The basic COCOMO equations take the form:

Effort applied = a × (KLOC)^b [man-months]

Development time = c × (Effort applied)^d [months]

People required = Effort applied / Development time [count]

where KLOC is the estimated number of delivered lines of code for the project, expressed in thousands.

The coefficients a, b, c and d are given in the following table:

Software Project    a      b      c      d
Organic             2.4    1.05   2.5    0.38
Semi-detached       3.0    1.12   2.5    0.35
Embedded            3.6    1.20   2.5    0.32

Table 1: COCOMO Model

Estimates of Effort, Cost, Duration:

E = a × (KLOC)^b = 3.0 × (6.5)^1.12 = 24.411 man-months

D = c × (E)^d = 2.5 × (24.411)^0.35 = 7.65 months

P = E / D = 24.411 / 7.65 ≈ 3 people
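The estimate above can be reproduced with a short script. The sketch below simply encodes the basic COCOMO equations and the coefficients from Table 1; the function and variable names are illustrative, and the semi-detached row is used because its coefficients (3.0, 1.12, 2.5, 0.35) are the ones that appear in the calculation.

```python
# Basic COCOMO coefficients (a, b, c, d) as listed in Table 1.
COEFFICIENTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, project_class):
    """Return (effort in man-months, duration in months, people) for a size in KLOC."""
    a, b, c, d = COEFFICIENTS[project_class]
    effort = a * kloc ** b          # E = a * (KLOC)^b
    duration = c * effort ** d      # D = c * (E)^d
    people = effort / duration      # P = E / D
    return effort, duration, people


effort, duration, people = basic_cocomo(6.5, "semi-detached")
print(f"E = {effort:.2f} man-months, D = {duration:.2f} months, P = {people:.2f}")
```

Changing the project class or the KLOC figure shows how sensitive the staffing estimate is to the size estimate, which is useful when the line count is itself only a projection.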

Risk Mitigation Strategy

RISK                                      CATEGORY   PROBABILITY   IMPACT
Insufficient Accuracy                     PS         High          1
Inadequate knowledge about application    BU         Low           3

Table 2: Risk Mitigation Strategy

Impact Values:

1-catastrophic 2-critical

3-marginal 4-negligible

A project team begins by listing all risks in the first column of the table. Each risk is categorized in the second column (PS-Project Risk, DE-Development Risk, BU-Business Risk, and TE-Technical Risk). The probability of occurrence of each risk is entered in the next column of the table.

RMMM Plan for each risk:

Risk Information Sheet

Project Name : Smart Glass

Risk Id:- 001 Date :- 4/8/2013 Probability :- High Impact :- catastrophic

Origin :- Jay Shah Assigned To :- Tapan Desai

Description :-

The chance of loss due to this risk could be 65%. Blurred or tilted images reduce accuracy and lead to incorrect results, and may cause the application to crash.

Mitigation/Monitoring :-

The user can improve accuracy by using a camera of the specified megapixel resolution on their Android smartphone and by capturing images where there is sufficient light.

Management :-

Once the risk becomes active, we will let the user edit the text if he or she feels that the converted text is incorrect.

Status :- Still left to implement.

Approval :- Vinaya Sawant Closing Date :- 17-10-2013

Table 3: Risk Sheet



Risk Information Sheet

Project Name : Smart Glass

Risk Id:- 002 Date :- 15/8/2013 Probability :- Low Impact :- serious

Origin :- Tapan Desai Assigned To :- Pooja Shah

Description :-

The chance of loss due to this risk is 5%. If the user does not have prior knowledge of how the application works, he or she may switch to some other application if one is available.

Mitigation/Monitoring :-

The GUI (graphical user interface) of the application will be simple enough for the user to understand the basic operations needed to use the application successfully.

Management :-

Once the risk becomes active, we will provide the user with a Help menu listing all the steps to be carried out to use the application efficiently.

Status :- Still left to implement.

Approval :- Vinaya Sawant Closing Date :- 17-10-2013

Table 4: Risk Sheet 2



Chapter 6

6. Project Design

System Architecture

The main aim of the system is to retrieve data from the captured image. The data is then passed

on to the servers. The servers then check for relevant information of the searched module. This

information is then passed on to the mobile device which the user can read.

The architecture is based on an OCR engine and Camera Activity running on a mobile device.

The architecture is divided into units, one for each module:

a. Visual Search: The Visual Search module is based on database technology, crawlers and an OCR engine. The user clicks an image of the text he/she wants to search for. The OCR engine reads this text, which is then passed as a query to the server. The server hosts a MySQL database that is populated daily by two crawlers. The database is searched for the query, and the results are sent back to the mobile device. The response is quick: results are displayed within 3 seconds. The main feature of Visual Search is that it does not use any ready-made search engine but aggregates news in our own database and displays results from it.

b. Document Capture: The text in this module is read by the OCR engine. The recognized text is then passed to the server, where a second OCR engine improves the text accuracy. The retrieved text is passed back to the user's mobile device in an editable format. After the text is edited, the document is converted into PDF format using Document Translator. The saved document can also be shared on all the social networking platforms.

c. Real Time Information: This feature makes the application customisable. The application can be linked to user-defined spreadsheets or an on-site ERP system. This module again uses the OCR engine to read the text from the image. The text is then matched against the linked spreadsheet or ERP system and the results are displayed to the user. The spreadsheet can contain any data the user chooses to fill in. The application sends a query to the spreadsheet and retrieves information from it. Real-time information works well for large factories with heavy machinery, as information about the machines can be retrieved quickly.

d. Visual Commerce [6]: Visual Commerce is a unique blend of Visual Search and E-Commerce. This module is implemented using the Viola-Jones algorithm. In this module the user doesn’t need to click a photo of the required object; they just have to hover their camera over it. The object is recognized from a data set using the algorithm. Once the object is recognized, it is searched for in the top e-commerce portals using their APIs. These APIs are linked with the application beforehand to give easy access to the portals. The matching products found in the portals are displayed to the user in an aggregated form, with their price and availability. The data set can be expanded over time using various machine learning algorithms.

e. Database: The database is a MySQL database populated via crawlers. It is created as an aggregation of news articles. Two crawlers run daily on the database. The first crawler runs through RSS links and stores the URLs in the database. The other crawler then runs through those links and stores all the news articles. The search is based on a few parameters, which include relevancy, precision, date of the article and priority.

f. OCR Engine: The OCR engine used is Tesseract [2]. Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006. Tesseract is considered one of the most accurate open source OCR engines currently available. If Tesseract is used to process right-to-left text such as Arabic or Hebrew, the results are ordered as though it were left-to-right text.

Tesseract is suitable for use as a backend engine.
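The two-stage crawling described under the Database unit above (one crawler collecting URLs from RSS links, a second fetching the articles behind them) can be sketched as follows. This is a minimal sketch, not the project's crawler: the sample feed, the function names and the `fake_fetch` stand-in for an HTTP download are all hypothetical, and storage in MySQL is replaced by plain Python objects so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real crawler would download this from an RSS URL.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Story one</title><link>http://example.com/1</link></item>
<item><title>Story two</title><link>http://example.com/2</link></item>
</channel></rss>"""


def crawl_feed(rss_xml):
    """Stage 1: collect the article URLs listed in an RSS document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link") for item in root.iter("item")]


def crawl_articles(urls, fetch):
    """Stage 2: fetch every stored URL and keep the article text."""
    return {url: fetch(url) for url in urls}


def fake_fetch(url):
    """Stand-in for an HTTP download so the sketch runs offline."""
    return "article body for " + url


urls = crawl_feed(SAMPLE_RSS)
articles = crawl_articles(urls, fake_fetch)
print(urls)
```

Passing the fetch function in as a parameter keeps the two stages independent, so the first crawler can run and store URLs even when the second one is not running.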

Module wise System Architecture:


Figure 7: Actual System Architecture based on modules

Module/Component Description

UML is the de facto standard notation for software design. It can be used for drawing diagrams and also to generate code, apply design patterns, mine requirements and perform impact analysis. UML is flexible and UML models are portable. UML is a well-known visual language that can capture much of the information that one needs to communicate about a system.

Use Case Diagram: A use case diagram at its simplest is a representation of a user's interaction

with the system and depicting the specifications of a use case. A use case diagram can portray the

different types of users of a system and the various ways that they interact with the system. This

type of diagram is typically used in conjunction with the textual use case and will often be

accompanied by other types of diagrams as well.


Figure 8: Use Case Diagram

Activity Diagram: Activity diagrams are graphical representations of workflows of stepwise

activities and actions with support for choice, iteration and concurrency. In the Unified Modelling

Language, activity diagrams are intended to model both computational and organisational

processes (i.e. workflows). Activity diagrams show the overall flow of control.


Figure 9: Activity Diagram

State Transition Diagram: A state diagram is a type of diagram used in computer science and

related fields to describe the behavior of systems. State diagrams require that the system described

is composed of a finite number of states; sometimes, this is indeed the case, while at other times

this is a reasonable abstraction. Many forms of state diagrams exist, which differ slightly and

have different semantics.


Figure 10: State transition to capture image

Figure 11: State transition for text conversion

Deployment Diagram: Deployment diagrams are used to visualize the topology of the physical components of a system where the software components are deployed.

Deployment diagrams are therefore used to describe the static deployment view of a system. They consist of nodes and their relationships.



Figure 12: Deployment Diagram

Class Diagram: The class diagram is a static diagram. It represents the static view of an

application. Class diagram is not only used for visualizing, describing and documenting different

aspects of a system but also for constructing executable code of the software application.

The class diagram describes the attributes and operations of a class and also the constraints

imposed on the system. The class diagrams are widely used in the modelling of object oriented

systems because they are the only UML diagrams which can be mapped directly with object

oriented languages.

Figure 15: Class Diagram

Component Diagram: Component diagrams differ from the other diagrams in nature and behaviour. They are used to model the physical aspects of a system.

Physical aspects are elements such as executables, libraries, files and documents that reside in a node.

Component diagrams are therefore used to visualize the organization of, and relationships among, the components in a system. These diagrams are also used to make executable systems.

Figure 16: Component Diagram

Data Flow Diagram: A data flow diagram (DFD) is a graphical representation of the "flow" of

data through an information system, modelling its process aspects. Often they are a preliminary

step used to create an overview of the system which can later be elaborated. DFDs can also be

used for the visualization of data processing (structured design).

A DFD shows what kinds of information will be input to and output from the system, where the

data will come from and go to, and where the data will be stored. It does not show information

about the timing of processes, or information about whether processes will operate in sequence

or in parallel (which is shown on a flowchart).


Figure 13: Data Flow Diagram



User Interface Design

Figure 14: Splash Screen Figure 15: Smart Snap of Play Store

Figure 16: Main Menu Figure 17: Slide Menu



Figure 18: Working of Visual Search. The selection screen.

Fig. 19: Visual Search results Fig. 20: News articles based on Visual search



Fig. 21 Share option in the application Fig. 22 Loading Screen for Document Capture

Fig. 23 Real-Time Information Fig. 24 Results for Real-Time Information



Chapter 7

7. Implementation

Modules/ Component Description:

Mobile Device:

This module contains three sub-modules and covers all the features that are present on the user's mobile phone. It contains the GUI and the OCR; the GUI contains one more module. The Mobile Device module is the connection between the user's input and the backend.

GUI:

The GUI is the interface with which the user interacts. It is designed for a high-end Android device and includes a slide menu. Visual Search consists of a UI which connects to the database. Document Capture is designed to connect to the server and includes a loading screen. Real Time Information connects to the user-defined spreadsheets.

OCR:

This module recognises the data in the image. It retrieves the data from the image and sends it back to be displayed to the user. This module works on the mobile device of the user. There is also an OCR engine running on the server for the Document Capture module. The OCR engine used in the project is Tesseract.

Capture Image:

This module helps the user to capture images through the mobile device. The images captured

through this module are used for further processing. This module provides the input to the

whole system. The OCR module works on the captured image.

Image Processing [1]:

This module processes the image captured by the user and retrieves the relevant images from it. The images extracted by this module help in retrieving the information requested by the user. The application uses the Viola-Jones algorithm to process the image from the Capture Image module in Visual Commerce.

Visual Search:

In Visual Search, the images extracted or captured by the user are used for buying the products in that image from sites like Amazon and eBay. This module retrieves the information about those products from the shopping sites from which the user can buy them.

Search Engine:

This module is a search engine from which the data or information is extracted based on the user's request. It is connected to a database which consists of images, information and news about products. The database is created in MySQL and is populated by two crawlers [5]. It holds all the files and details about the products, websites and RSS feeds regarding various fields, and data is retrieved from it according to the user's request.

Web Service:

The web service establishes a link between the data source and the Android application. Built on a PHP framework, it fires queries at the MySQL server and retrieves the query results, which are then

encoded in the JSON format and passed to the application.
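
The query-then-encode flow of the web service can be illustrated with a minimal sketch. It is written in Python rather than PHP, and it uses an in-memory SQLite database as a stand-in for the MySQL server. The table name `articles`, its columns and the helper names are assumptions made for the example only and are not taken from the project code.

```python
import json
import sqlite3


def make_demo_db():
    """Create a small in-memory database (stand-in for the MySQL server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [
            ("Android phone review", "http://example.com/a"),
            ("Tablet sales rise", "http://example.com/b"),
        ],
    )
    return conn


def fetch_articles_json(conn, keyword):
    """Run a parameterised query and return the matching rows as a JSON string."""
    cur = conn.execute(
        "SELECT title, url FROM articles WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    )
    rows = [{"title": title, "url": url} for title, url in cur.fetchall()]
    # The device decodes this JSON string and renders the results.
    return json.dumps({"count": len(rows), "results": rows})
```

Passing the keyword as a query parameter, instead of concatenating it into the SQL string, keeps text scanned from an image from being interpreted as SQL.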



Spreadsheets:

Instead of hard-coding the actions for specific keywords, the application lets the user assign his own actions to specific keywords by configuring his spreadsheet. To configure it, the user has to specify the web publish URL provided by Google Spreadsheet. The

spreadsheet has columns like products, action and a url belonging to that particular action.
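
As a rough illustration of how such a spreadsheet could drive the actions, the sketch below parses CSV text with the three columns described above and looks up the action for a scanned keyword. It assumes the published sheet is available as CSV with a header row named `product,action,url`; the header names, sample rows and function names are hypothetical.

```python
import csv
import io

# Hypothetical sample of a published spreadsheet exported as CSV.
SAMPLE_CSV = """product,action,url
Printer,Open manual,http://example.com/printer-manual
Projector,Book slot,http://example.com/projector-booking
"""


def load_actions(csv_text):
    """Map each keyword (lower-cased product name) to its (action, url) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    actions = {}
    for row in reader:
        key = row["product"].strip().lower()
        actions[key] = (row["action"].strip(), row["url"].strip())
    return actions


def action_for(actions, scanned_word):
    """Return the (action, url) pair for a scanned word, or None if not configured."""
    return actions.get(scanned_word.strip().lower())
```

Because the mapping lives in the sheet, the user can add or change a keyword without any change to the application.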

Document Translator

This provides a convenient way to process an image and segregate it into text and images. It saves the time and energy of retyping and also provides a convenient way to change the fonts and resize the images.

Module-wise Algorithm:

Visual Search

1. Scan the word content

2. Extract the text content from the image

3. Spell check the retrieved content

4. Text sent to the server

5. Server Side Programming

a. Split the sentence into words

b. Remove stop words

c. Search for relevant news articles based on scanned words

d. JSON encode the retrieved articles and pass it to the device

6. Decode the JSON response on the device

7. Display results.
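
The server-side part of the Visual Search steps (5a to 5d) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's PHP code: the stop-word list, the in-memory `ARTICLES` list standing in for the database, and the simple title-overlap ranking are all hypothetical choices.

```python
import json
import re

# Small illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "on", "and", "to", "for"}

# Stand-in for the news table in the database.
ARTICLES = [
    {"title": "New Android tablet launched", "url": "http://example.com/1"},
    {"title": "Cricket season opens", "url": "http://example.com/2"},
    {"title": "Android update for the tablet market", "url": "http://example.com/3"},
]


def extract_keywords(text):
    """Steps 5a and 5b: split the sentence into words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def search_articles(keywords, articles):
    """Step 5c: keep articles whose title shares a keyword, best match first."""
    scored = []
    for article in articles:
        title_words = set(re.findall(r"[a-z0-9]+", article["title"].lower()))
        score = sum(1 for k in set(keywords) if k in title_words)
        if score:
            scored.append((score, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]


def handle_request(text, articles):
    """Step 5d: JSON encode the retrieved articles for the device."""
    keywords = extract_keywords(text)
    return json.dumps(
        {"keywords": keywords, "articles": search_articles(keywords, articles)}
    )
```

On the device, step 6 then reduces to decoding this JSON string and binding the `articles` list to the results screen.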

Visual Commerce



3. Spell check the retrieved content

c. Aggregate deals from leading e-commerce portals

d. Standardize the results into a uniform format

e. JSON encode the result and pass it to the device


7. Display results
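
Steps c to e, aggregating deals and standardizing them into a uniform format, can be sketched as below. The two portal response shapes (`portal_a` with a formatted price string, `portal_b` with a price in paise) are invented for the illustration and do not describe the real response format of any shopping site.

```python
import json


def normalise_portal_a(item):
    """Hypothetical portal A: price arrives as a string such as 'Rs. 1,299'."""
    price = float(item["price"].replace("Rs.", "").replace(",", "").strip())
    return {"source": "portal_a", "title": item["name"], "price": price, "url": item["link"]}


def normalise_portal_b(item):
    """Hypothetical portal B: price arrives as an integer number of paise."""
    return {
        "source": "portal_b",
        "title": item["title"],
        "price": item["price_paise"] / 100,
        "url": item["url"],
    }


def aggregate_deals(portal_a_items, portal_b_items):
    """Merge both result lists into one uniform, price-sorted JSON array."""
    deals = [normalise_portal_a(i) for i in portal_a_items]
    deals += [normalise_portal_b(i) for i in portal_b_items]
    deals.sort(key=lambda d: d["price"])
    return json.dumps(deals)
```

Converting every source to the same keys and the same price unit before encoding means the device needs only one parser, whichever portal a deal came from.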

Real Time Information



3. Spell check the retrieved content





c. Connect to the ERP / Excel / Spreadsheet from which the content is to be monitored

d. Filter out the rows matching with the searched content

e. Standardize the results into a uniform format as per the required information

f. JSON encode the result and pass it to the device


7. Display results
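
Steps d to f can be sketched in the same way. In the example below the rows are assumed to have already been read from the monitored source as dictionaries; the column names `item`, `status` and `location` are hypothetical and only show how the output can be limited to the required information.

```python
import json


def filter_rows(rows, keywords, fields=("item", "status", "location")):
    """Keep rows containing any searched word and return them as uniform JSON."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for row in rows:
        # Collect every word that appears in any cell of the row.
        cell_words = set()
        for value in row.values():
            cell_words.update(str(value).lower().split())
        if wanted & cell_words:
            # Standardise: expose only the required columns, in a fixed shape.
            matches.append({f: row.get(f, "") for f in fields})
    return json.dumps(matches)
```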



Chapter 8

8. Experiments and Project Testing

8.1 Test plan

The objective of the plan is to break the product down into distinct areas and identify features

of the Smart Snap Application that are to be tested. The test plan approach that has been used

in our project includes the following:

1. Design verification or Compliance test:

These stages of testing have been performed during the development or approval stage of

the product, typically on a small sample of units.

2. Test Coverage

The design verification tests have been performed at the point of reaching every milestone.

Test areas include testing of various features such as line segmentation, line and edge

detection, Binary image conversion, symbol detection, palm colour detection, etc.

3. Test Methods

Testing of diverse features has been performed in “Smart Snap”. For each module,

corresponding outputs were checked. For testing each module, the output produced from

running the code was checked with the test data set.


4. Test Responsibility

The team members working on their respective features performed the testing of those features. Test responsibilities also include the data collected and how that data was used and reported.

8.2 Test cases

A test case is a set of conditions or variables under which we will determine whether the Smart

Snap application is working correctly or not. We have used many test cases to determine that

the system is sufficiently scrutinized.

Test Case ID | Case Description | Expected Result | Actual Result | Pass/Fail

1 | User takes picture of a document | The image should be converted into editable text and displayed. | The image is converted into editable text and displayed. | Pass

2 | The user takes a picture of a text | The OCR should read the text and display the related news articles to the user. | The OCR reads the text correctly and displays it to the user. | Pass

3 | The user clicks a photo in portrait mode instead of landscape mode. | An error message is displayed and the user is prompted to click the image back in landscape mode. | The error message is displayed prompting user to click the photo in landscape mode. | Pass

4 | The user clicks the share button | The application should display all the available platforms on which the user can share the results | The application displays all the sharable platforms to the user. | Pass

5 | The user clicks the create PDF option. | A PDF should be created for the user in the Micro SD card. | A PDF is created for the user in the memory card. | Pass

Table 4: Test Case

8.3 Methods used

The methods used by us for testing are as given below:

1. Unit Testing

Unit testing is a method by which individual units of source code, that is, sets of one or more program modules, are tested to determine whether they are fit for use. In our application, we considered each module as one unit and tested these units with the help of the test cases and test plan developed. Unit testing was carried out on each module and on every function within the module. The output of each unit was assessed for accuracy and, if found incorrect, appropriate corrections were made.

2. Integration Testing

Integration testing is a phase in software testing in which individual software modules are

combined and tested as a group. The purpose of integration testing is to detect any

inconsistencies between the software units that are integrated together. The modules of our

application were integrated together in order to verify that they provide the required

functionalities appropriately. The various modules were tested together to check for their

accuracy and compatibility.

3. System Testing

System testing of software or hardware is testing conducted on a complete, integrated

system to evaluate the system's compliance with its specified requirements. As a rule,

system testing takes, as its input, all of the "integrated" software components that have

successfully passed integration testing and also the software system itself integrated with

any applicable hardware system. We performed system testing after integration testing to ensure proper functioning of the project as a whole.

4. Acceptance Testing


Acceptance testing will be conducted to determine whether the requirements and specifications are met. It may involve performance tests. Acceptance testing performed by the customer is known as user acceptance testing (UAT), end-user testing, site (acceptance) testing, or field (acceptance) testing. Acceptance testing generally involves running a suite of tests on the completed system. Each individual test, known as a case, exercises a particular operating condition of the user's environment or feature of the system, and results in a pass or fail (Boolean) outcome. Here the end user will use the Smart Snap Application first in a test environment and then in the environment of his/her own home.

5. Usability testing

Usability testing is a technique used to evaluate a product by testing it on users. Usability

testing focuses on measuring a human-made product's capacity to meet its intended

purpose. Usability testing measures the usability, or ease of use, of a specific object or set

of objects. The results of this review will help improve the end-user interaction of the

software. The purpose of usability testing is to ensure that the Smart Snap Application will

function in a manner that is acceptable to the user.

6. Performance testing

Performance testing is performed to determine how fast some aspect of a system performs under a particular workload. It can also serve to validate and verify other quality attributes of the system, such as scalability, reliability and resource usage. In the Smart Snap Application, these tests ensure that the system provides acceptable response times: the response time should not exceed 10 seconds once the user has finished loading the image and

has clicked on the ‘Output’ button.
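
A response-time check of this kind can be sketched as a small timing helper. The 10-second limit comes from the requirement above; `fake_pipeline` is a hypothetical placeholder for the real capture-to-output processing, which is not reproduced here.

```python
import time

# Limit taken from the performance requirement stated above.
RESPONSE_LIMIT_SECONDS = 10.0


def timed_call(func, *args, **kwargs):
    """Run func and return its result together with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_limit(elapsed, limit=RESPONSE_LIMIT_SECONDS):
    """Return True when the measured response time meets the requirement."""
    return elapsed <= limit


def fake_pipeline(text):
    """Hypothetical stand-in for the real processing step."""
    time.sleep(0.01)
    return text.upper()
```

Running `timed_call(fake_pipeline, "smart snap")` returns the result and the elapsed time, which `within_limit` then compares against the limit to give a pass or fail outcome.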

8.4 Test Results

Preliminary tests showed promising results. The tests included taking a number of photographs at different angles to check the readability of the data. The information read by the OCR engine was good: the data was read even at an angle of 30 degrees. However, the data could not be read when the photo was blurred. The range of angles handled by the OCR engine can be improved.

Further tests were conducted on the retrieval of data from the servers. For Visual Search, the data was retrieved within 3 seconds. Earlier results were considerably worse because the entire data set was being loaded into the local database and then searched. The algorithms were improved so that only the searched data is sent, which improved the speed significantly.

Chapter 9

9. Maintenance

9.1 User Manual

Prerequisites

The user must have an Android Phone

The user must have an active data plan or any sort of internet connectivity

How to use

The user must start the application.

Select one of the various modules which are Document Capture, Visual Search, Visual

Commerce and Real Time Information.

The user then must click a photo for which he wants the results

The results are displayed which can be shared on various social networking platforms

9.2 Constraints

The user must have internet connectivity.



Visual commerce does not work in portrait mode. The user must click the photo in

landscape mode only.

Document Capture works only in portrait mode. The image shouldn’t be clicked in

landscape mode.

While clicking the image the hand should be stable and the image clicked should not have

any disruptions. The environment should not be too dark.
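
The orientation constraints above can be expressed as a simple check on the dimensions of the captured image. This is a minimal sketch: the module keys, the message text and the decision to treat a square image as portrait are assumptions made for the example.

```python
# Orientation each module requires, as listed in the constraints above.
REQUIRED_ORIENTATION = {
    "visual_commerce": "landscape",
    "document_capture": "portrait",
}


def orientation_of(width, height):
    """Classify an image by its pixel dimensions (square counts as portrait here)."""
    return "landscape" if width > height else "portrait"


def check_orientation(module, width, height):
    """Return an error message if the image violates the module's constraint, else None."""
    required = REQUIRED_ORIENTATION.get(module)
    if required is None or orientation_of(width, height) == required:
        return None
    return f"Please retake the photo in {required} mode."
```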



Chapter 10

10. Conclusion and Future Scope

After working on the project for a span of 6 months, we have proposed a new system of search based on visual search instead of text-based search. Text-based search is the traditional approach, and a new way of searching that is more precise and optimized was required.

Based on Visual Search, Image Recognition and Database Technology, a more optimized search and conversion tool was created which helps the user save a lot of time. The features are closely integrated and help the user in converting images to text, buying new products and getting the latest news.

The application has considerable future scope. The precision of the application needs to be improved for better results, and the database can be optimized to give quicker search results. The future scope also includes adding a number of features, such as a Sudoku Solver, Location-based Navigation and Voice-based Search.

Overall, we conclude that the application brings in new technology which is user-friendly and saves a lot of time for the user.



References and Bibliography

A

Android: It is an operating system for Mobile devices and tablets being developed by Google.

D

Document Capture: A feature that will allow users to convert images into editable texts. These texts will be copied to the device's clipboard.

F

Forward-When the recipient of an email message sends it on to someone he or she thinks might find

it interesting or benefit from.

I

Image Recognition: Image recognition recognises relevant information from the clicked image using various available technologies.

M

MATLAB: Developed by MathWorks, MATLAB allows matrix manipulations, plotting

of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with

programs written in other languages, including C, C++, Java, and Fortran.

N

News Aggregator: News aggregator is a feature which pulls in all the relevant information about a

topic from the net using custom search.

O

Optical Character Recognition: Optical character recognition, usually abbreviated to OCR, is

the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed

text into machine-encoded text.

P

Page Rank: It assigns a weight to the web page depending on the ranking of the users.

Q

QR Codes: QR code (abbreviated from Quick Response Code) is the trademark for a type of matrix

barcode

S

Spreadsheets: Spreadsheets are sheets available online from Google. The user can create tables and files within a spreadsheet and store custom information.


T

Test-An action taken to ensure an email will perform properly before it is sent. A test message is sent

to several “testing” accounts and allows marketers to identify problems such as broken links or

images and rectify them before sending the email to an entire list as well as a means of comparing

the results of different versions of an email.

V

Visual Commerce: A feature that allows users to purchase online not by typing but just by clicking

an image of the object they want to purchase.

Recommended sites, newsletters, blogs and books.

Websites

1. The website has all the details about image processing basics.

http://www.idi.ntnu.no/~blake/gbimpdet.htm

2. Google Codes-This is the place to find explanations related to Tesseract OCR.

http://code.google.com/p/tesseract-ocr/

3. http://seomojo.net/how_seo.htm

4. http://framework.zend.com/

5. http://visual.ly/google%E2%80%99s-hummingbird-algorithm-%E2%80%93-what%E2%80%99s-it-all-about

6. http://www.mathworks.in/help/vision/ref/vision.cascadeobjectdetectorclass.html

Books

Document image analysis: A Primer

Rangachar Kasturi, Department of Computer Science & Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Lawrence O'Gorman, Avaya Labs, Room 1B04, 233 Mt. Airy Road, Basking Ridge, NJ 07920, USA

Venu Govindaraju, CEDAR, State University of New York at Buffalo, Amherst, NY 14228, USA
