Mobility Assistant for Visually Impaired (MAVI) on Cloud
A thesis submitted in partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY & MASTER OF TECHNOLOGY
in
Computer Science & Engineering
by
Akhil Soni
2013CS50300
Under the guidance of
Prof. M. Balakrishnan
Dr. Chetan Arora
Department of Computer Science and Engineering,
Indian Institute of Technology Delhi.
June 2018
Certificate
This is to certify that the thesis titled Mobility Assistant for Visually
Impaired (MAVI) on Cloud being submitted by Akhil Soni for the award
of Bachelor of Technology & Master of Technology in Computer
Science & Engineering is a record of bona fide work carried out by him
under my guidance and supervision at the Department of Computer
Science & Engineering. The work presented in this thesis has not been
submitted elsewhere, either in part or in full, for the award of any other
degree or diploma.
Prof. M. Balakrishnan
Department of Computer Science and Engineering
Indian Institute of Technology, Delhi
Dr. Chetan Arora
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi
Abstract
Mobility Assistant for Visually Impaired (MAVI) is a device which aims to
improve the lives of visually impaired people in terms of safety, social
inclusion and navigation. This thesis implements a cloud-based online
solution for MAVI. Initially, the available cloud services were surveyed and
the best-fit cloud service was chosen. A simple prototype was developed on
my local system, and then a prototype of MAVI on cloud was developed on
Raspberry Pi.
Network latency analysis was performed in order to improve the solution.
A detailed survey was done in order to estimate the cloud run times. The
accuracies of all the modules of MAVI that were ported to cloud (Face
Detection, Cow and Dog Detection and OCR) were reported. The energy
consumption was also measured.
In the final phases, a comparative study of the batch processing of images
was performed to show the effect of batch size on the total run time. The
PivotHead smart camera and a USB webcam were compared to identify the
better fit for our application.
Finally, a fully functional end-to-end MAVI on Cloud prototype was
developed, comprising a Raspberry Pi, a USB webcam and the Android
app.
Acknowledgments
I would like to thank my supervisor, Prof. M. Balakrishnan for providing
me with the opportunity to work on this interesting project as my M.Tech
Project. His unfailing support, guidance and help have been invaluable dur-
ing the course of this project. I am grateful for all the help I received from
him.
I would also like to thank Dr. Chetan Arora for his valuable insights and
assistance regarding the subject of Computer Vision.
Mr. Rajesh Kedia and Mr. Anupam Sobti played an integral role in my
project. I sincerely thank them for all their insights, experience, coding
expertise and efforts that have really helped me.
I also extend my thanks to my friends Deepanker Mishra (2013CS50282),
Garvit Jain (2013CS50284) and Akhil Masa (2013MT60602) for the insights
and coding expertise that I have received from them at various instances.
Special thanks to Mr. S. D. Sharma for providing me with all the lab
equipment and support. This is not just the result of my efforts but an
outcome of the efforts of several individuals.
Akhil Soni
Contents
1 Prelude
1.1 Introduction
1.2 Motivation and Objective
1.3 Thesis Contribution
1.4 Thesis Outline
2 Survey of Cloud Services
2.1 Introduction
2.2 Google Vision API
2.3 Microsoft Computer Vision API
2.4 Amazon Rekognition
2.5 IBM Watson Visual Recognition
2.6 SkyBiometry
2.7 Comparison and Conclusion
3 Setup and Testbed for the evaluation
3.1 Introduction
3.1.1 Getting Started
3.1.2 Virtual Machine on Google Cloud
3.1.3 Connecting to the Server
3.2 Phase1 : On PC
3.2.1 Installing Pre-Requisites
3.2.2 Running the Vision API
3.3 Phase2 : On Raspberry Pi
3.3.1 Installing Pre-Requisites
3.3.2 Running the Vision API
© 2018, Indian Institute of Technology Delhi
4 Experimental Setup
4.1 Network Latency
4.1.1 General Methodology
4.1.2 Face Detection
4.1.3 OCR
4.1.4 Animal Detection
4.1.5 Diverse Images
4.1.6 Experiment Details
4.2 Energy Consumption
4.2.1 Current Measurements
4.2.2 Energy Measurements
5 Results
5.1 Network Latency Analysis
5.1.1 Face Detection
5.1.2 Animal Detection
5.1.3 OCR
5.1.4 Diverse Images
5.1.5 Mean and Standard Deviation
5.2 Cloud Run Time Analysis
5.2.1 Face Detection
5.2.2 Animal Detection
5.2.3 OCR
5.2.4 Diverse Images
5.2.5 Mean and Standard Deviation
5.3 Accuracy
5.3.1 Face Detection
5.3.2 Animal Detection
5.3.3 OCR
5.4 Energy Consumption Analysis
5.4.1 Current Readings
5.4.2 Energy Readings
6 Batch Processing
6.1 Introduction
6.2 Experimental Setup
6.3 Results
7 Prototype
7.1 Capturing Images
7.1.1 PivotHead Smart Camera
7.1.2 USB Web Camera
7.2 Prototype1 - Using PivotHead
7.3 Prototype2 - Using USB Web Camera
7.4 PivotHead Vs WebCam
8 Conclusions
8.1 Cloud Service Used
8.2 Network Latency
8.3 Cloud Run Time
8.4 Energy Consumption
8.5 Final Prototype
Bibliography
Chapter 1
Prelude
1.1 Introduction
Mobility Assistant for Visually Impaired (MAVI)
MAVI is an ambitious project aimed at enabling mobility for visually
impaired individuals, especially in India. The three major problems that MAVI
tackles are Safety, Social Inclusion and Navigation. The overview of the
system is shown in the figure below: [6]
Figure 1.1: MAVI Overview
1.2 Motivation and Objective
Being blessed with all the basic sensory organs, we do not realize the
importance of any of them unless we come across someone who is deprived of
one or more of the basic senses. The eyes are among the most important
sensory organs that a human needs to function, and MAVI aims to help
visually impaired people perceive as much of their surroundings as they can.
Being a visually impaired person is really challenging. Just imagine a
scenario in which one has to live one's entire life with eyes closed. It is a
nightmare. Hence, visual impairment is one of the most severe types of
disability a person can endure.
With MAVI, we are trying to solve serious problems which can be solved
using current technologies. With the advancement in cloud technologies,
a lot of computation can be done on the cloud, which might save time
compared to running the same algorithms locally. Also, with the growing
number of people gaining access to Internet connectivity every day, cloud
computation becomes an increasingly practical option. For example, the face
detection service provided by cloud service providers might be significantly
faster than an algorithm developed for performing face detection locally.
Hence, the objective of this thesis is to implement and develop a cloud
solution for MAVI, perform network latency analysis and compute the
accuracy of the cloud services.
1.3 Thesis Contribution
The thesis is focused on implementing a cloud solution for MAVI and
performing an in-depth analysis of the entire system. Initially, the various
cloud services available were explored and the best fit for our application
(MAVI) was identified. Once the cloud service was identified, the system was
set up on my PC, wherein an image is uploaded from my PC to the cloud and
the cloud returns a result. Once the PC setup was working correctly,
the system was ported to Raspberry Pi. Initially the images were stored
locally on the memory card of the Raspberry Pi, but later the system
captured images dynamically from the camera and an end-to-end MAVI on
Cloud system was developed. The final part of the thesis focused on the
analysis of the system developed. An in-depth network analysis was done on
three different networks: 3G, 4G and IIT Delhi campus WiFi.
Batch processing of images was also performed and analyzed.
Finally, the accuracy of the system was measured.
1.4 Thesis Outline
The thesis comprises 8 chapters including this one. Chapter 2 surveys and
compares the various cloud services available. Chapter 3 describes the
setup on the local PC and on the Raspberry Pi. Chapter 4 describes the
experimental setup, detailing how the experiments were carried out, the
datasets used for analysis, the tools used, etc. Chapter 5 shows the results
for accuracy, latency and cloud run time analysis. Chapter 6 describes the
batch processing experiment. Chapter 7 presents the prototype developed.
Chapter 8 concludes and summarizes my entire work, with the references
attached at the end.
Chapter 2
Survey of Cloud Services
2.1 Introduction
In this chapter, the various cloud services available online are explored and
compared with each other. Tech giants like Google, Amazon, Microsoft
and IBM provide vision APIs built on their own computer vision
algorithms, which incorporate features such as Face Detection,
Optical Character Recognition (OCR), Logo Detection and several others.
There are also some cloud services which cater only to specific
features like Face Detection and Recognition. SkyBiometry is one such
service, which pertains only to face detection and recognition. Let us now
analyze the various cloud services available.
2.2 Google Vision API
Google's Vision API is one of the best among the various cloud service
providers. It provides various features such as Face Detection, Label
Detection and Logo Detection. Apart from these, it also provides Landmark
Detection and Explicit Content Detection. Optical Character Recognition
(OCR) is also supported by Google's Vision API, with text detection
available in a total of 56 languages.[8] These 56 languages
include English and various Indian languages such as Marathi, Tamil,
Sanskrit and Bengali. The modules of MAVI which could be ported to cloud by
using Google's Vision API are Face Detection, OCR in Sign Board Detection
and Animal Detection (Cow and Dog) in Label Detection.
2.3 Microsoft Computer Vision API
Microsoft's Vision API provides features such as categorizing images,
identifying image types, flagging adult content, Optical Character
Recognition (OCR) and several other features such as generating thumbnails
and perceiving color schemes.[7] Microsoft's Vision API supports a total of
25 languages in OCR, which include English but, unlike Google's Vision API,
none of the Indian languages. This is considerably fewer languages than
Google's Vision API supports. The image categorization feature of
Microsoft's Vision API is analogous to Google's Label Detection. It
classifies an image into several categories, including animals like cow and
dog. The MAVI modules which could be ported to cloud by using Microsoft's
Vision API are Face Detection, OCR in Sign Board Detection (just English)
and Animal Detection (Cow and Dog) in Categorizing Images.
2.4 Amazon Rekognition
Amazon Rekognition provides features such as Object and Scene Detection,
Facial Recognition, Facial Analysis, Face Comparison, Unsafe Image
Detection, Celebrity Detection and Text in Image (OCR). For text detection,
Amazon Rekognition supports text in most Latin scripts. It recognizes up to
50 sequences of characters per image and lists them as words and lines. Also,
this feature recognizes only text oriented within +/- 30 degrees of the
horizontal. Facial Recognition finds similar faces in a large collection of
images.[10] The modules of MAVI that can be ported to cloud using Amazon
Rekognition are Face Detection, Face Recognition, Animal Detection (Cow and
Dog) in Object and Scene Detection and Optical Character Recognition (OCR)
in Text in Image.
2.5 IBM Watson Visual Recognition
IBM Watson Visual Recognition provides Image Classification and Face
Detection. Image Classification classifies the image into various classes such
as bench, dog, bush, swings, text and a few others.[4] IBM Watson Visual
Recognition supports just 8 languages: English, French,
Spanish, German, Korean, Japanese, Italian and Arabic. There is no
support for any Indian language. The MAVI modules that can be ported to
cloud using IBM Watson Visual Recognition are Face Detection, Text
Detection and Animal Detection (using Image Classification).
2.6 SkyBiometry
SkyBiometry is a specialized tool meant specifically for Facial Detection
and Recognition. No other feature is available.
2.7 Comparison and Conclusion
Table 2.1: Cloud Services Comparison
Service       FD    FR    AD    OCR
Google        Yes   No    Yes   Yes (EN, HI)
Microsoft     Yes   No    Yes   Yes (EN)
Amazon        Yes   Yes   Yes   Yes (EN)
IBM           Yes   No    Yes   Yes (EN)
SkyBiometry   Yes   Yes   No    No
FD - Face Detection
FR - Face Recognition
AD - Animal Detection
The above table summarizes the various cloud services available with
respect to the MAVI modules. Based on the table, Google's
Vision API was chosen over the others as it provides OCR support in both
English and Hindi. Also, there is extensive support available on
the web for Google's Vision API. Hence, for these two reasons, I
chose to build MAVI on Cloud with cloud services taken from Google's
Vision API. The features used from Google's Vision API
are Face Detection, Label Detection and OCR, for detecting faces, detecting
animals (cows and dogs) and detecting the text in SignBoards respectively.
Chapter 3
Setup and Testbed for the evaluation
3.1 Introduction
Once the cloud service is finalized, the next step is to set up the system and
get it working. This was done in two phases. In the first phase, the setup
was done on my local PC, while in the second phase, the setup was ported to
Raspberry Pi.
3.1.1 Getting Started
Google provides a free trial of its cloud services in which it offers free
credits worth $300 with a validity of one year. For billing purposes, Google
defines a term called units. Google charges for each feature that
is applied to an image, and each feature applied to an
image is termed a billable unit. For example, if one applies
Face Detection and Label Detection to the same image, the user is billed for
1 unit of Label Detection and 1 unit of Face Detection. The pricing
system of Google is as depicted in the figure below:
Figure 3.1: Google Pricing as of 19/06/18 [3]
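The unit-based billing rule described above can be illustrated with a small sketch. It only counts units; no prices are included, since the pricing figure is not reproduced here. The function name billable_units and the feature strings are illustrative choices, not part of the thesis code.

```python
def billable_units(num_images, features):
    """Return a dict mapping each feature to the number of units billed.

    One unit is billed per feature applied to each image, as described
    in the text above.
    """
    return {feature: num_images for feature in features}


# One image with Face Detection and Label Detection applied to it.
units = billable_units(1, ["FACE_DETECTION", "LABEL_DETECTION"])
print(units)
print("total units:", sum(units.values()))
```

Applying all three features used in this thesis to a batch of images would, under the same rule, bill three units per image.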
3.1.2 Virtual Machine on Google Cloud
The next step was to create a Virtual Machine (VM) on the Google Cloud.
This VM is the host for all of our cloud computations: all the
cloud processing takes place on this VM. The VM that
was created has the following specifications:
Zone - us-central1-a
1 shared vCPU, 0.6 GB memory
Operating System - Ubuntu 16.04
SSD Persistent Disk - 15 GB
Overall Rent/month - $4.34
The above specifications were sufficient for the purposes of the project and
were chosen with the aim of minimizing the monthly VM rent.
3.1.3 Connecting to the Server
Once the VM is set up and running, the next step is to establish a connection
to the VM from the local machine or the Raspberry Pi. SSH keys are used
in order to connect to the server. SSH keys can be generated in numerous
ways; here, PuttyGen has been used to generate a private key
and public key pair. This key pair is used to connect
to the server on the Google Cloud. Now that the VM is set up, the next
step is to run the Vision API.
3.2 Phase1 : On PC
In the first phase, the aim was to run the system successfully on the local
PC. Putty has been used to connect to the VM that is hosted on the Google
Cloud and FileZilla to transfer files across the systems. The following
prerequisite software and packages needed to be installed on the VM
before one can actually run the Vision API.
3.2.1 Installing Pre-Requisites
The prerequisites are Python pip, the Google Cloud library for Python and
the credentials for the Vision API. The credentials for the Vision API are
created from the console GUI of Google Cloud. The page looks like this:
Figure 3.2: Credentials[1]
Clicking on the Create credentials button creates a JSON file which is
to be uploaded to the VM. The sequential execution of the following
commands ensures a successful setup of all the prerequisites.
sudo apt-get install python-pip
sudo pip install google-cloud
pip install --upgrade pip
nano $HOME/.profile
export GOOGLE_APPLICATION_CREDENTIALS=<JSON_PATH>
source $HOME/.bashrc
<JSON_PATH> is a placeholder for the path where the JSON file is stored
3.2.2 Running the Vision API
In this setup, the images were stored on my local system. Once all the
prerequisites were met, a generalized workflow was written which takes an
image as input and produces the desired output. This workflow comprises
three scripts. The first runs on the VM: it runs the Vision API and
writes the output to a txt file on the VM itself. The second script
uploads the image from the local system to the VM, and the third script
brings the generated txt file back to the local system.
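The three-step workflow above (upload, run on the VM, download) can be sketched as follows. This is not the original set of scripts: it assumes the OpenSSH command line tools scp and ssh in place of Putty and FileZilla, and the remote directory ~/mavi, the script name run_vision.py and the host string are hypothetical placeholders.

```python
import subprocess


def build_commands(image_path, host, key_file,
                   remote_dir="~/mavi", remote_script="run_vision.py"):
    """Return the three command lines: upload, run on the VM, download.

    remote_dir and remote_script are hypothetical names used only for
    illustration.
    """
    image_name = image_path.replace("\\", "/").split("/")[-1]
    result_name = image_name.rsplit(".", 1)[0] + ".txt"
    # Step 1: copy the image from the local system to the VM.
    upload = ["scp", "-i", key_file, image_path, f"{host}:{remote_dir}/"]
    # Step 2: run the Vision API script on the VM, writing a txt file there.
    run = ["ssh", "-i", key_file, host,
           f"cd {remote_dir} && python {remote_script} {image_name} > {result_name}"]
    # Step 3: bring the generated txt file back to the local system.
    download = ["scp", "-i", key_file,
                f"{host}:{remote_dir}/{result_name}", "."]
    return [upload, run, download]


def process_image(image_path, host, key_file):
    """Execute the three steps in order, stopping at the first failure."""
    for command in build_commands(image_path, host, key_file):
        subprocess.run(command, check=True)


for command in build_commands("images/face1.jpg", "user@vm-address", "key.pem"):
    print(" ".join(command))
```

Keeping the command construction separate from the execution makes it possible to inspect the three steps without a live VM.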
3.3 Phase2 : On Raspberry Pi
3.3.1 Installing Pre-Requisites
The procedure to install the prerequisites on Raspberry Pi is very similar to
that discussed earlier. The prerequisites remain the same: installing
Python pip, installing the Google Cloud library for Python and loading the
credentials for the Vision API. The sequential execution of the following
commands ensures a smooth and successful installation of all the
prerequisites:[2]
sudo apt-get install python-pip
sudo pip install --upgrade pip
sudo apt-get install libjpeg8-dev
sudo pip install --upgrade google-api-python-client
sudo pip install --upgrade Pillow
sudo su
sudo nano $HOME/.bashrc
export GOOGLE_APPLICATION_CREDENTIALS=<JSON_PATH>
source $HOME/.bashrc
<JSON_PATH> is a placeholder for the path where the JSON file is stored
3.3.2 Running the Vision API
In this case, once the prerequisites are installed, there is no need
to upload the image to the VM separately. Hence, a single script suffices.
This script takes as input an image stored on the SD card of the
Raspberry Pi, uploads it to the cloud and stores the received output on the
SD card. The code of the script on the Raspberry Pi is not the same as that
of the scripts used on the local system. The scripts differ in the part where
the images are uploaded to the cloud. In the case of Raspberry Pi, a single
request serves the purpose of uploading the image to the cloud as well as
downloading the JSON output from the Google Cloud, whereas in the case of
the PC, separate scripts were used for uploading the image, running the
Vision API and downloading the JSON object.
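The single request described above can be sketched by building the JSON body that the Vision API REST endpoint images:annotate accepts: the image is sent base64-encoded together with the list of requested features. This is a minimal illustration, not the thesis script; the function name build_annotate_request and the dummy image bytes are assumptions. With google-api-python-client, a body of this shape is typically passed to the annotate method of the images resource, which is not shown here because it needs credentials and network access.

```python
import base64
import json


def build_annotate_request(image_bytes, feature_types):
    """Build the JSON body for a single images:annotate request."""
    return {
        "requests": [{
            # The image content travels inside the request, base64-encoded.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            # One entry per Vision API feature to run on the image.
            "features": [{"type": t} for t in feature_types],
        }]
    }


# Dummy bytes stand in for an image read from the SD card.
body = build_annotate_request(
    b"not a real image",
    ["FACE_DETECTION", "LABEL_DETECTION", "TEXT_DETECTION"],
)
print(json.dumps(body, indent=2))
```

Because the image and the feature list go out in one request and the annotations come back in its response, no separate upload or download step is needed.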
Chapter 4
Experimental Setup
In this chapter, the experimental setup is described along with how the
various experiments are conducted in order to perform the Network Latency
and Energy Consumption analysis. The setup for Network Latency is described
first, followed by that for Energy Consumption.
4.1 Network Latency
4.1.1 General Methodology
The default hotspot SSID to which the Raspberry Pi connects is
MAVI hotspot and the password for the same is Mavi@123. The images are
stored on the memory card which is inserted in the Raspberry Pi. The datasets
used in all the experiments are the MAVI datasets, namely the
MAVI Face Detection dataset, the MAVI Cow and Dog dataset and the MAVI
SignBoard dataset, for analyzing Face images, Animal images and OCR images
respectively. To analyze the network behavior during the experiments,
Tshark, the command line tool for Wireshark, is used. The Tshark
output is directed to a txt file. A parser is written in Python to
parse the txt file and extract meaningful information from it, such as the
Image Upload time, the Cloud Run Time and the JSON Download time.
This data is then transferred to an Excel sheet. The data is further
processed and then individual Matlab scripts are written to plot graphs for
Face Detection, Animal Detection and OCR respectively.
The output of tshark looks like the figure given below:
Figure 4.1: Tshark Output
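A parser of the kind described above can be sketched as follows. The original parser and the exact Tshark output format are not reproduced in the text, so this sketch makes two loud assumptions: (1) the capture was exported as tab-separated fields in the order frame.time_relative, ip.src, ip.dst, tcp.len, and (2) the three phases are defined as first-to-last outgoing payload packet (upload), last outgoing to first incoming payload packet (cloud run) and first-to-last incoming payload packet (download). The function name parse_capture and the sample addresses are hypothetical.

```python
def parse_capture(lines, pi_ip):
    """Split one upload/response exchange into three durations (seconds).

    Each line is assumed to hold four tab-separated fields:
    relative time, source IP, destination IP, TCP payload length.
    """
    sent, received = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or not fields[3].isdigit():
            continue  # skip malformed rows and rows without a TCP length
        timestamp, src, dst, payload = fields
        if int(payload) == 0:
            continue  # skip packets that carry no payload, such as pure ACKs
        if src == pi_ip:
            sent.append(float(timestamp))
        elif dst == pi_ip:
            received.append(float(timestamp))
    if not sent:
        raise ValueError("no outgoing payload packets found")
    last_sent = max(sent)
    # Only packets arriving after the upload finished count as the response.
    response = [t for t in received if t > last_sent]
    if not response:
        raise ValueError("no response packets found after the upload")
    return {
        "upload": last_sent - min(sent),
        "cloud_run": min(response) - last_sent,
        "download": max(response) - min(response),
    }


sample = [
    "0.000\t10.0.0.2\t1.2.3.4\t1400",
    "0.500\t10.0.0.2\t1.2.3.4\t900",
    "0.520\t1.2.3.4\t10.0.0.2\t0",
    "1.300\t1.2.3.4\t10.0.0.2\t1200",
    "1.310\t1.2.3.4\t10.0.0.2\t300",
]
print(parse_capture(sample, "10.0.0.2"))
```

A real capture also contains connection setup traffic, which this simplified split would count as part of the upload phase; a production parser would need to filter it out.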
4.1.2 Face Detection
The images are categorized into three classes:
• Class 1 - Images containing 1 face.
• Class 2 - Images containing 2 faces.
• Class 3 - Images containing 4 faces.
There are a total of 25 images chosen from each class. Also there are two
types of experiments done:
• Experiment 1 - Where only the Face Detection algorithm of the Vision
API is running (on all the three classes of images).
• Experiment 2 - Where all algorithms (Face Detection, Label Detection
and Text Detection) of the Vision API are running (on all the three
classes of images).
Hence, there are two sets of experiments which are performed over 75 images
each. These images are chosen such that the required number of
faces is detected in each of them. This keeps the Cloud Run Time
measurements consistent within a class.
There is a field named Face Annotations in the response JSON which provides
information such as how many faces are detected and the bounding box
coordinates of each of the faces. I look for this field in the response JSON.
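Reading the face information out of the response can be sketched as below. In the REST response of the Vision API the field appears under the key faceAnnotations, with each face carrying a boundingPoly made of vertices. The function name summarize_faces and the sample response are illustrative only; treating a missing coordinate as 0 is an assumption made for robustness.

```python
def summarize_faces(response_json):
    """Return the number of detected faces and their bounding-box vertices."""
    # The field is absent from the response when no face is detected.
    faces = response_json["responses"][0].get("faceAnnotations", [])
    boxes = [
        # A coordinate missing from a vertex is treated as 0 here.
        [(v.get("x", 0), v.get("y", 0)) for v in face["boundingPoly"]["vertices"]]
        for face in faces
    ]
    return len(faces), boxes


# Hand-written sample in the shape of a response with one face.
sample_response = {
    "responses": [{
        "faceAnnotations": [{
            "boundingPoly": {"vertices": [
                {"x": 10, "y": 20}, {"x": 110, "y": 20},
                {"x": 110, "y": 140}, {"x": 10, "y": 140},
            ]}
        }]
    }]
}
count, boxes = summarize_faces(sample_response)
print(count, boxes)
```

The face count obtained this way can be checked against the class of the image (1, 2 or 4 faces) when selecting images for the experiments.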
4.1.3 OCR
The images are classified into two classes:
• Class 1 - Images containing SignBoards with just English text.
• Class 2 - Images containing SignBoards with both English and Hindi
text.
There are a total of 25 images chosen from each class. Also there are two
types of experiments done:
• Experiment 1 - Where only the Text Detection algorithm of the Vision
API is running (on both the classes of images).
• Experiment 2 - Where all algorithms (Face Detection, Label Detection
and Text Detection) of the Vision API are running (on both the classes
of images).
Hence, there are two sets of experiments which are performed over 50 images
each. There is a field named Text Annotations in the response JSON which
provides the text detected and the bounding
box coordinates of that text. I look for this field in the
response JSON.
4.1.4 Animal Detection
There is only one class of images here, containing cows and dogs. There are
a total of 25 images. There are two types of experiments done:
• Experiment 1 - Where only the Label Detection algorithm of the Vision
API is running.
• Experiment 2 - Where all algorithms (Face Detection, Label Detection
and Text Detection) of the Vision API are running.
Hence, there are two sets of experiments which are performed over 25 images
each. There is a field named Label Annotations in the response JSON which
lists the labels that are detected. I
look for this field in the response JSON.
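Checking the detected labels for cows and dogs can be sketched as follows. In the REST response the field appears under the key labelAnnotations, where each entry has a description and a score. The function name detected_animals and the sample response are illustrative; matching the description exactly against "cow" and "dog" is a simplifying assumption, since the service may return related labels instead.

```python
def detected_animals(response_json, wanted=("cow", "dog")):
    """Return a dict of wanted labels found in the response and their scores."""
    labels = response_json["responses"][0].get("labelAnnotations", [])
    found = {}
    for label in labels:
        name = label["description"].lower()
        if name in wanted:
            found[name] = label.get("score", 0.0)
    return found


# Hand-written sample in the shape of a Label Detection response.
sample_response = {
    "responses": [{
        "labelAnnotations": [
            {"description": "Dog", "score": 0.97},
            {"description": "Grass", "score": 0.88},
        ]
    }]
}
print(detected_animals(sample_response))
```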
4.1.5 Diverse Images
25 images of diverse quality, captured with the PivotHead camera, were
collected. These images were a mix: some contained
faces, some contained animals and some contained SignBoards. Another
experiment was performed on them in which all the algorithms of the Vision
API were running (Face Detection, Text Detection and Label Detection).
4.1.6 Experiment Details
All the above sets of experiments were performed at 6 different times
of day and on multiple days. To ensure the network behavior is captured
correctly, all the above mentioned experiments were conducted on three
different networks - 3G, 4G and IIT Delhi Campus WiFi.
4.2 Energy Consumption
4.2.1 Current Measurements
An external device is used to measure the current drawn
by the Raspberry Pi when the prototype is running. Initially, the current is
measured when there are no external components connected to the Raspberry
Pi apart from the power source. Then, the current is measured after
connecting components such as a keyboard, mouse and monitor for the GUI
output of the Raspberry Pi. This gives the base readings for comparison.
There are two sets of experiments that are performed:
• Set 1 - Images are stored on the Raspberry Pi itself. The code is run.
• Set 2 - Images are dynamically captured from the USB Webcam and
the full prototype is run, including the MAVI app.
Each of the above sets of experiments has two sub-cases:
one in which Internet connectivity is provided using a WiFi
hotspot and the other in which Internet connectivity is provided using USB
tethering.
4.2.2 Energy Measurements
The same external device which was used to measure current is used to
measure energy too. In addition, while measuring energy, a 2600 mAh power
bank is also used. Here again, two sets of experiments were performed:
• Set 1 - The external USB device is used to measure the energy.
This set comprises two experiments, one with
USB tethering and one with WiFi hotspot.
• Set 2 - The 2600 mAh power bank is used to power the Raspberry
Pi, and the time taken to completely discharge the power
bank is recorded, from which the energy used is calculated. This set too
comprises two experiments, one with USB tethering and one with
WiFi hotspot.
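The calculation behind Set 2 can be sketched as below: the energy stored in the power bank is its capacity multiplied by its voltage, and dividing by the discharge time gives the average power drawn. The text gives only the 2600 mAh capacity; the 3.7 V nominal voltage and the 2 hour discharge time used here are hypothetical values chosen for illustration, and conversion losses in the power bank are ignored.

```python
def energy_from_discharge(capacity_mah, voltage_v, discharge_hours):
    """Estimate energy used and average power from a full discharge.

    Ignores conversion losses, so the result is an upper bound on the
    energy actually delivered to the Raspberry Pi.
    """
    energy_wh = capacity_mah / 1000.0 * voltage_v   # Ah times V gives Wh
    return {
        "energy_wh": energy_wh,
        "energy_j": energy_wh * 3600.0,             # 1 Wh equals 3600 J
        "avg_power_w": energy_wh / discharge_hours,
    }


# 2600 mAh is from the text; 3.7 V and 2.0 h are assumed example values.
result = energy_from_discharge(2600, 3.7, 2.0)
print(result)
```

Comparing the average power obtained for USB tethering and WiFi hotspot in this way would show which connectivity option drains the power bank faster.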
Chapter 5
Results
5.1 Network Latency Analysis
As described earlier, the experiments were performed under three different
network conditions, namely 3G, 4G and IITD WiFi. The experiments were
performed at 6 different times of the day: 7 A.M., 10 A.M., 1:30
P.M., 4:30 P.M., 7:30 P.M. and 12 A.M. Also, the network latency is assumed
to be the same as the latency involved in uploading an image, because the
latency in downloading the JSON is negligible (of the order of 10
milliseconds). A comparative analysis of these 3 networks at these 6
different times of day is described below:
5.1.1 Face Detection
There are a total of 150 images plotted in the upcoming graphs,
since there were 75 images in each of the 2 types of experiments
discussed earlier.
In all the subsequent graphs, the X axis denotes the time in milliseconds and
the Y axis denotes the fraction of images. A point on the graph denotes what
fraction of images (y coordinate) has been uploaded by the corresponding
time (x coordinate). The following graph is a cumulative distribution
function (cdf) which shows the variation of the upload time of images under
the 3G network:
[Plot: Upload Time for Face Images - 3G. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. Curves: 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.]
Figure 5.1: Upload Time for face images - 3G
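The cumulative distribution plotted in these graphs can be computed from the measured upload times as in the following sketch. The thesis plots were produced with Matlab scripts; this Python version and the sample upload times are illustrative only.

```python
def empirical_cdf(upload_times_ms):
    """Return (time, fraction) points of the empirical CDF.

    For each measured time t, the fraction is the share of images whose
    upload completed by time t.
    """
    ordered = sorted(upload_times_ms)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]


# Hypothetical upload times in milliseconds for four images.
for time_ms, fraction in empirical_cdf([3000, 1000, 2000, 4000]):
    print(time_ms, fraction)
```

A curve that rises to 1 at a smaller time indicates that all images were uploaded sooner, which is how the different times of day can be compared in the graphs.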
The following graph is a cdf which shows the variation of the upload time of
images under 4G network:
[Plot: Upload Time for Face Images - Airtel 4G. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. Curves: 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.]
Figure 5.2: Upload Time for face images - 4G
The following graph is a cdf which shows the variation of the upload time of
images under IITD Wifi network:
[Plot: Upload Time for Face Images - IITD Wifi. X axis: Time (in ms), 0 to 8000. Y axis: Fraction of Images, 0 to 1. Curves: 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M., 12 A.M.]
Figure 5.3: Upload Time for face images - IITD Wifi
5.1.2 Animal Detection
There are a total of 50 images which are plotted in the upcoming graphs since
there were 25 images each belonging to the 2 different types of experiments
as discussed earlier.
The following graph is a cdf which shows the variation of the upload time of
images under 3G network:
[Figure: cdf plot "Upload Time for animal Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.4: Upload Time for animal images - 3G
The following graph is a cdf which shows the variation of the upload time of
images under 4G network:
[Figure: cdf plot "Upload Time for animal Images - airtel"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.5: Upload Time for animal images - 4G
The following graph is a cdf which shows the variation of the upload time of
images under IITD Wifi network:
[Figure: cdf plot "Upload Time for animal Images - wifi"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.6: Upload Time for animal images - IITD Wifi
5.1.3 OCR
There are a total of 100 images which are plotted in the upcoming graphs
since there were 50 images each belonging to the 2 different types of exper-
iments as discussed earlier. The following graph is a cdf which shows the
variation of the upload time of images under 3G network:
[Figure: cdf plot "Upload Time for OCR Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.7: Upload Time for Sign Board Images - 3G
The following graph is a cdf which shows the variation of the upload time of
images under 4G network:
[Figure: cdf plot "Upload Time for OCR Images - Airtel4G"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.8: Upload Time for Sign Board Images - 4G
The following graph is a cdf which shows the variation of the upload time of
images under IITD Wifi network:
[Figure: cdf plot "Upload Time for OCR Images - wifi"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.9: Upload Time for Sign Board Images - IITD Wifi
5.1.4 Diverse Images
There are a total of 25 images plotted in the upcoming graphs, since there
were 25 images and only one type of experiment, as discussed earlier.
The following graphs are cdfs which show the variation of the upload time
of images under 3G, 4G and IITD Wifi networks respectively:
[Figure: cdf plot "Upload Time for mixed Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.10: Upload Time for diverse images - 3G
[Figure: cdf plot "Upload Time for mixed Images - Airtel4G"; same axes and curves as above]
Figure 5.11: Upload Time for diverse images - 4G
[Figure: cdf plot "Upload Time for mixed Images - IITDWifi"; same axes and curves as above]
Figure 5.12: Upload Time for diverse images - IITD Wifi
5.1.5 Mean and Standard Deviation
The mean and standard deviation of the upload times under 3G, 4G and
IITD Wifi networks are captured in the tables below:
Table 5.1: Mean and Standard Deviation - 3G
                    Face Images   Animal Images   OCR Images   Diverse Images
Mean                1487.47ms     640.53ms        938.21ms     527.40ms
Standard Deviation  1368.08ms     346.22ms        910.52ms     275.31ms
Table 5.2: Mean and Standard Deviation - 4G
                    Face Images   Animal Images   OCR Images   Diverse Images
Mean                734.12ms      370.52ms        563.12ms     393.89ms
Standard Deviation  839.37ms      702.84ms        645.56ms     362.58ms
Table 5.3: Mean and Standard Deviation - IITD Wifi
                    Face Images   Animal Images   OCR Images   Diverse Images
Mean                249.57ms      298.38ms        264.09ms     181.94ms
Standard Deviation  336.32ms      636.23ms        554.28ms     439.32ms
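The two statistics reported in these tables can be computed from a list of per-image times as sketched below. This is only an illustration: the helper name `summarize` and the sample values are assumptions, and the thesis does not state whether the sample or the population form of the standard deviation was used (the sketch uses the sample form).

```python
import statistics


def summarize(times_ms):
    """Return (mean, sample standard deviation) of the times in ms."""
    mean = statistics.mean(times_ms)
    # statistics.stdev is the sample (n - 1) standard deviation; which form
    # the thesis used is not stated, so this choice is an assumption.
    std = statistics.stdev(times_ms)
    return mean, std


# Hypothetical upload times in milliseconds (not measured data).
sample_times = [200.0, 400.0, 600.0, 800.0]
mean, std = summarize(sample_times)
print(f"Mean = {mean:.2f}ms, Standard Deviation = {std:.2f}ms")
```

In several columns of the tables above the standard deviation is comparable to or larger than the mean, which indicates that individual upload times vary widely.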
5.2 Cloud Run Time Analysis
5.2.1 Face Detection
There are a total of 75 images plotted in the upcoming graphs, since there
were 25 images for each of three different image characteristics, namely
images containing 1 face, 2 faces and 4 faces respectively, as discussed earlier.
The average over the 75 images is plotted in the upcoming graphs, assuming the
same cloud run time for 1 face, 2 faces and 4 faces. This assumption was made
after observing no significant difference in their run times. The following
graph is a cdf which shows the variation of the Cloud Run Time of images under
3G network:
[Figure: cdf plot "Algo Time for Face Images - 3G"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.13: Cloud Run Time for face detection - 3G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under 4G network:
[Figure: cdf plot "Algo Time for Face Images - Airtel4G"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.14: Cloud Run Time for face detection - 4G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under IITD Wifi network:
[Figure: cdf plot "Algo Time for Face Images - IITDWifi"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.15: Cloud Run Time for face detection - IITD Wifi
5.2.2 Animal Detection
There are a total of 25 images which are plotted in the upcoming graphs
since there was just one class of images containing 25 images as discussed
earlier.
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under 3G network:
[Figure: cdf plot "Algo Time for animal Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.16: Cloud Run Time for animal detection - 3G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under 4G network:
[Figure: cdf plot "Algo Time for animal Images - Airtel4G"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.17: Cloud Run Time for animal detection - 4G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under IITD Wifi network:
[Figure: cdf plot "Algo Time for animal Images - IITDWifi"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.18: Cloud Run Time for animal detection - IITD Wifi
5.2.3 OCR
There are a total of 25 images which are plotted in the upcoming graphs since
there are 2 different characteristics of images, one on which just English text
detection algorithm is run and one on which both English and Hindi text
detection algorithm is run. The following graph is a cdf which shows the
variation of the Cloud Run Time of images under 3G network:
[Figure: cdf plot "Algo Time for ocr Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; two curves (English, Both) for each of 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.19: Cloud Run Time for OCR - 3G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under 4G network:
[Figure: cdf plot "Algo Time for ocr Images - Airtel4G"; x-axis: Time (in ms); y-axis: Fraction of Images; two curves (English, Both) for each of 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.20: Cloud Run Time for OCR - 4G
The following graph is a cdf which shows the variation of the Cloud Run
Time of images under IITD Wifi network:
[Figure: cdf plot "Algo Time for ocr Images - IITDWifi"; x-axis: Time (in ms); y-axis: Fraction of Images; two curves (English, Both) for each of 7:00 A.M., 10:00 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.21: Cloud Run Time for OCR - IITD Wifi
5.2.4 Diverse Images
There are a total of 25 images plotted in the upcoming graphs, since there
were 25 images with only one image characteristic, as discussed earlier.
The following graphs are cdfs which show the variation of the Cloud Run
Time of images under 3G, 4G and IITD Wifi networks respectively:
[Figure: cdf plot "Algo Time for mixed Images - 3g"; x-axis: Time (in ms); y-axis: Fraction of Images; curves for 7 A.M., 10 A.M., 1:30 P.M., 4:30 P.M., 7:30 P.M. and 12 A.M.]
Figure 5.22: Cloud Run Time for all algorithms combined - 3G
[Figure: cdf plot "Algo Time for mixed Images - airtel"; same axes and curves as above]
Figure 5.23: Cloud Run Time for all algorithms combined - 4G
[Figure: cdf plot "Algo Time for mixed Images - IITDWifi"; same axes and curves as above]
Figure 5.24: Cloud Run Time for all algorithms combined - IITD Wifi
5.2.5 Mean and Standard Deviation
The mean and standard deviation of the Cloud Run Times under 3G, 4G
and IITD Wifi networks are captured in the tables below:
Table 5.4: Mean and Standard Deviation - 3G
                    Face       Animal     OCR English   OCR Both    All Algorithms
Mean                708.25ms   865.02ms   835.16ms      2090.28ms   835.8ms
Standard Deviation  208.61ms   140.11ms   215.35ms      962.38ms    113.89ms
Table 5.5: Mean and Standard Deviation - 4G
                    Face       Animal     OCR English   OCR Both    All Algorithms
Mean                766ms      928.45ms   909.76ms      1719.21ms   952.64ms
Standard Deviation  133.63ms   145.23ms   238.05ms      421.86ms    127.19ms
Table 5.6: Mean and Standard Deviation - IITD Wifi
                    Face       Animal     OCR English   OCR Both    All Algorithms
Mean                970.33ms   986.58ms   1015.80ms     1812.47ms   1089.53ms
Standard Deviation  162.60ms   335.14ms   261.57ms      485.02ms    223.96ms
5.3 Accuracy
In this section, I will discuss the accuracy of the various algorithms
that are run on the cloud, along with the specific cases in which the test
results are positive and negative.
5.3.1 Face Detection
There are a total of 731 images in the MAVI Face Detection dataset. They
are classified under two illumination conditions 1 and 2. The results are as
depicted in the graph below:
[Figure: 3D plot; horizontal axes: Width Of The Bounding Box and Height Of The Bounding Box; vertical axis from -1 to 1]
Figure 5.25: Face Detection Accuracy
In the above figure, the X-axis depicts the width of the face in the image
and the Y-axis depicts the height of the face in the image. On the Z-axis, -1
depicts that a face is not detected and +1 depicts that the face is detected.
As we can see from the graph, most of the faces which are not detected
lie in the box with dimensions less than 50x50. A detailed description of the
variation of accuracy with face size is shown in the table below:
Table 5.7: Variation of Accuracy with Face Size
Size      Images   Detected Images   Accuracy
30x30     89       0                 0%
40x40     179      1                 0.56%
50x50     114      58                50.87%
60x60     79       76                96.20%
60x60 +   270      269               99.62%
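The accuracy column is the ratio of detected images to total images in each size bucket. A minimal sketch that recomputes it from the counts in Table 5.7 is given below; the helper name `accuracy_percent` is an assumption, and the overall figure printed at the end is derived here from the counts rather than reported in the thesis.

```python
# (size bucket, number of images, number of detected images) from Table 5.7
rows = [
    ("30x30", 89, 0),
    ("40x40", 179, 1),
    ("50x50", 114, 58),
    ("60x60", 79, 76),
    ("60x60 +", 270, 269),
]


def accuracy_percent(detected, total):
    """Detection accuracy as a percentage of the images in a bucket."""
    return 100.0 * detected / total


for size, total, detected in rows:
    print(f"{size:8s} {accuracy_percent(detected, total):6.2f}%")

# The bucket counts add up to the full dataset size.
total_images = sum(total for _, total, _ in rows)
total_detected = sum(detected for _, _, detected in rows)
overall = accuracy_percent(total_detected, total_images)
print(f"overall  {overall:6.2f}% of {total_images} images")
```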
In the above table, the number of face images corresponding to size AxB denotes
the number of images whose dimensions are smaller than AxB but bigger
than the size in the previous row. For example, 40x40 denotes the number
of images whose dimensions are less than 40x40 but bigger than 30x30 and
so on. The variation of accuracy with illumination conditions is shown in the
graph below:
Figure 5.26: Illumination Vs Accuracy
The distribution of image sizes under the two illumination conditions is
shown in the graphs below:
[Figure: 3D plot; horizontal axes: Width Of The Bounding Box and Height Of The Bounding Box; vertical axis from -1 to 1]
Figure 5.27: Illumination1 Vs Accuracy
[Figure: 3D plot; horizontal axes: Width Of The Bounding Box and Height Of The Bounding Box; vertical axis from -1 to 1]
Figure 5.28: Illumination2 Vs Accuracy
5.3.2 Animal Detection
Cow
There are a total of 1599 images of Cows in the MAVI Cow Dataset. These
images are classified according to two domains. Domain 1 classifies images
as Standing and Sitting Cows while Domain 2 classifies images as Front, Side
and Back poses of Cows.
Figure 5.29: Accuracy - Cow Detection
The graph above shows the accuracies in all of these different categories. The
table below gives the number of images and the detection accuracy for each
pose of cows.
Table 5.8: Variation of Accuracy with Cow Poses
Pose       Images   Detected Images   Accuracy
Standing   1552     695               44.78%
Sitting    44       13                29.54%
Front      161      72                44.72%
Side       1157     600               51.85%
Back       280      36                12.85%
Dog
There are a total of 1557 images of Dogs in the MAVI Dog Dataset. These
images are classified according to two domains. Domain 1 classifies images
as Standing and Sitting Dogs while Domain 2 classifies images as Front,
Side and Back poses of Dogs. The table below gives the number of images
and the detection accuracy for each pose of dogs.
Table 5.9: Variation of Accuracy with Dog Poses
Pose       Images   Detected Images   Accuracy
Standing   667      273               40.93%
Sitting    889      343               38.58%
Front      322      129               40.06%
Side       997      458               45.93%
Back       237      29                12.23%
The graph below shows the accuracies of Dog detection in all the different
poses of Dogs:
Figure 5.30: Accuracy - Dog Detection
5.3.3 OCR
The OCR accuracy is divided into two classes:
• Class 1 - English Accuracy, in which only the English language is specified
in the request to the Vision API
• Class 2 - English and Hindi Accuracy, in which both English and Hindi
are specified in the request to the Vision API.
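The two classes differ only in the languages named in the request. The sketch below builds a request body for each class, assuming the JSON format of the Cloud Vision REST method `images:annotate`, with the languages passed as `languageHints` inside `imageContext`. The thesis does not show its request code, so the function name `build_ocr_request`, the choice of the `TEXT_DETECTION` feature and the placeholder image bytes are assumptions.

```python
import base64
import json


def build_ocr_request(image_bytes, language_hints):
    """Build the JSON body for one text-detection request.

    language_hints is a list of language codes, e.g. ["en"] for Class 1
    or ["en", "hi"] for Class 2.
    """
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


# Placeholder bytes stand in for a real JPEG file read from disk.
fake_image = b"\xff\xd8\xff\xe0 placeholder"
class1 = build_ocr_request(fake_image, ["en"])
class2 = build_ocr_request(fake_image, ["en", "hi"])
print(json.dumps(class2, indent=2))
```

The body would then be sent in an authenticated POST request; authentication and the network call are left out so that the sketch stays self-contained.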
The dataset used is the MAVI OCR dataset as updated on 25th June, 2018.
The accuracy is computed character by character. A total of 1428 images
are taken into consideration in Class 1 and a total of 1409 images are taken
into consideration in Class 2.
The accuracy in Class 1 is 90.26% while in Class 2 it is just 60.58%.
Hence, the accuracy is much higher when just the English language is
specified than when both English and Hindi are specified. Also, the accuracy
of English text detection itself decreases when both English and Hindi are
specified as compared to when just English is specified.
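One way a character-by-character accuracy could be computed is sketched below. The thesis does not define the exact metric, so this is an assumption: the sketch counts the ground-truth characters that can be matched in order against the OCR output, using `difflib.SequenceMatcher` from the Python standard library. The function name `char_accuracy` and the example strings are hypothetical.

```python
from difflib import SequenceMatcher


def char_accuracy(ground_truth, recognized):
    """Fraction of ground-truth characters matched, in order, by the OCR output."""
    if not ground_truth:
        return 1.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, ground_truth, recognized, autojunk=False)
    # Each matching block is a run of characters common to both strings.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ground_truth)


# Hypothetical sign-board text (not taken from the MAVI OCR dataset).
truth = "LIBRARY"
ocr_output = "LIBRAAY"
print(f"{100 * char_accuracy(truth, ocr_output):.2f}%")
```

A dataset-level figure could then be obtained by summing matched characters and ground-truth lengths over all images, which is again an assumption about how the reported percentages were aggregated.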
5.4 Energy Consumption Analysis
5.4.1 Current Readings
Base Current is the current drawn by the Raspberry Pi when there are no
external components connected apart from the power source.
Base Current - 0.31A
The current grows to 0.72A when external components like mouse, keyboard
and monitor are connected and the Internet connectivity is provided through
USB tethering.
The following table shows the readings of the two sets of experiments that
were performed:
Table 5.10: Current Consumption
                                                WiFi Hotspot    USB Tethering
Images stored on Raspberry Pi                   0.33A - 0.35A   0.77A - 0.84A
Entire Prototype (images captured from camera   0.45A - 0.55A   0.82A - 0.88A
and the output sent to app via Bluetooth)
In the second experiment, where the images are taken from the camera, the
current consumption increases only when a new image is captured.
5.4.2 Energy Readings
One iteration of the following experiments consisted of 13 different
experiments, namely diverse images, 1 face, 1 face all, 2 face, 2 face all,
4 face, 4 face all, animal, animal all, ocr English, ocr English all, ocr Both
and ocr Both all, each of which consisted of 25 images.
1/2/4 face - Images containing 1/2/4 faces and just the face detection algorithm
is run
1/2/4 face all - Images containing 1/2/4 faces and all the algorithms are run
A similar convention is followed for all the remaining names too.
USB Tethering
The readings obtained through the power bank are as follows:
Power Bank Capacity at 3.7V = 2600mAh
Power Bank Capacity at 5V = 3.7/5 * 2600 = 1924mAh
Total Run Time = 2 hrs 20 mins
10.5 iterations of the experiment done
3425 images computed (10*13*25 + 1*7*25)
Average Current Drawn = 824.5mA (1924*3/7)
Measurements through USB Device:
Initially = 460mAh
After 1 iteration = 624mAh
1 iteration cost = 164mAh
10.5 iterations cost = 1726mAh
Hence, the device has roughly 10% error, since the actual total charge used
is 1924mAh whereas the total computed from the USB device readings is
1726mAh.
WiFi Hotspot
The readings obtained through the power bank are as follows:
Power Bank Capacity at 3.7V = 2600mAh
Power Bank Capacity at 5V = 3.7/5 * 2600 = 1924mAh
Total Run Time = 5 hrs 20 mins
30.38 iterations of the experiment done
9875 images computed (30*13*25 + 1*5*25)
Average Current Drawn = 360.75mA (1924*3/16)
Measurements through USB Device:
Initially = 1329mAh
After 1 iteration = 1389mAh
1 iteration cost = 60mAh
30.38 iterations cost = 1823mAh
Hence, the device has roughly 5% error, since the actual total charge used
is 1924mAh whereas the total computed from the USB device readings is
1823mAh.
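The arithmetic in the two lists above can be reproduced with the short sketch below. The function names are assumptions introduced for illustration, and, like the calculation in the text, the sketch ignores conversion losses in the power bank.

```python
def capacity_at(voltage_out, rated_mah=2600.0, rated_voltage=3.7):
    """Power bank capacity (mAh) when delivered at voltage_out."""
    return rated_voltage / voltage_out * rated_mah


def average_current_ma(capacity_mah, hours, minutes):
    """Average current (mA) that drains capacity_mah in the given run time."""
    return capacity_mah / (hours + minutes / 60.0)


def relative_error(actual_mah, measured_mah):
    """Fraction by which the USB meter total falls short of the actual value."""
    return (actual_mah - measured_mah) / actual_mah


capacity = capacity_at(5.0)                          # about 1924 mAh
usb_current = average_current_ma(capacity, 2, 20)    # USB tethering run
wifi_current = average_current_ma(capacity, 5, 20)   # WiFi hotspot run
usb_error = relative_error(capacity, 1726.0)
wifi_error = relative_error(capacity, 1823.0)

print(f"USB tethering: {usb_current:.1f} mA, meter error {usb_error:.1%}")
print(f"WiFi hotspot:  {wifi_current:.1f} mA, meter error {wifi_error:.1%}")
```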
Chapter 6
Batch Processing
6.1 Introduction
In this chapter, the batch processing of images on the cloud is described.
Here, a batch of images is evaluated together on the cloud and the
corresponding output is also received in batches. The flow of this
chapter is as follows: I will start with the Experimental Setup, followed
by the Results.
6.2 Experimental Setup
There are a total of 60 images comprising a blend of diverse images like
SignBoard Images, Animal Images and Face Images. There are two sets of
experiments performed on these images:
• Class 1 - Sequential execution of algorithms on images, which means
images are processed one after another
• Class 2 - Processing a batch of images together. The batch size was
varied from 5 to 10 to 15.
All the above mentioned experiments were conducted under three networks,
namely 3G, 4G and IITD Wifi, as were the earlier experiments.
Google Cloud doesn't support a batch size of more than 16 as of 21/06/2018.
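The grouping of images into batches described above can be sketched as follows. The helper name `make_batches` and the file names are hypothetical, and the limit of 16 is the one reported in the text for 21/06/2018; the current limit may differ.

```python
MAX_BATCH_SIZE = 16  # limit reported in the text as of 21/06/2018


def make_batches(images, batch_size):
    """Split images into consecutive batches of at most batch_size items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return [images[i:i + batch_size] for i in range(0, len(images), batch_size)]


# Hypothetical file names standing in for the 60 test images.
image_names = [f"image_{i:02d}.jpg" for i in range(60)]
for size in (1, 5, 10, 15):
    batches = make_batches(image_names, size)
    print(f"batch size {size:2d}: {len(batches)} requests")
```

With 60 images, a larger batch size means fewer requests to the cloud, so the fixed per-request overhead is paid fewer times.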
6.3 Results
In the subsequent tables, batch size 1 is equivalent to processing one image
after another sequentially. The following table shows the results of batch
processing of images under 3G Network:
Table 6.1: Batch Processing Of Images - 3G
Batch Size   Mean     Standard Deviation
1            137.4s   1.48s
5            79.54s   3.38s
10           84.07s   2.12s
15           77.28s   1.62s
The following table shows the variation of run time with batch size under 4G
network:
Table 6.2: Batch Processing Of Images - 4G
Batch Size   Mean     Standard Deviation
1            95.64s   4.90s
5            40.19s   10.15s
10           38.93s   8.84s
15           33.91s   5.46s
The following table shows the variation of run time with batch size under
IITD Wifi network:
Table 6.3: Batch Processing Of Images - IITD Wifi
Batch Size   Mean    Standard Deviation
1            96s     2s
5            54s     2s
10           32.5s   2.5s
15           30s     2s
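To make the effect of batching easier to compare across networks, the sketch below derives the speedup over sequential processing from the means in Tables 6.1 to 6.3. It assumes that each mean is the total time for all 60 images, which the tables do not state explicitly; the per-image times and speedups are derived here and are not reported in the thesis.

```python
# Mean time (seconds) by batch size, taken from Tables 6.1 to 6.3
mean_time_s = {
    "3G": {1: 137.4, 5: 79.54, 10: 84.07, 15: 77.28},
    "4G": {1: 95.64, 5: 40.19, 10: 38.93, 15: 33.91},
    "IITD Wifi": {1: 96.0, 5: 54.0, 10: 32.5, 15: 30.0},
}
NUM_IMAGES = 60  # assumption: each mean covers the whole 60-image set


def speedup(times, batch_size):
    """Ratio of sequential (batch size 1) time to the batched time."""
    return times[1] / times[batch_size]


for network, times in mean_time_s.items():
    for batch_size, total in times.items():
        per_image = total / NUM_IMAGES
        print(f"{network:9s} batch {batch_size:2d}: "
              f"{per_image:.2f}s per image, speedup {speedup(times, batch_size):.2f}x")
```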
Chapter 7
Prototype
7.1 Capturing Images
In this chapter, the MAVI on Cloud prototypes are described. As the MAVI
system requires a continuous stream of images to process, we need a camera
too. There are two sources available to capture images: the PivotHead smart
camera and the USB Web Camera. The following sections describe both of
these cameras in detail.
7.1.1 PivotHead Smart Camera
Figure 7.1: PivotHead wearable smart camera[9]
PivotHead is a wearable device like a pair of spectacles with a camera attached
between the two lenses. It has various modes like the live streaming mode
and the image capture mode. It generates a wifi hotspot over which the
images are transferred. Thus, in order to receive the captured images during
live streaming from the PivotHead camera, one needs to connect to the wifi
hotspot generated by it. It also has a provision for inserting a memory
card where the captured images can be stored. For more details, one can have
a look at the extensive documentation of the PivotHead camera available
online.
7.1.2 USB Web Camera
Another way of capturing images is using a USB webcam. We have used
the Logitech USB webcam. For capturing images, the fswebcam tool is
installed on the Raspberry Pi. The images are taken on the Raspberry Pi using
the following command:
fswebcam --no-banner -r 640x480 image.jpg
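Since the prototype needs a stream of images rather than a single frame, the command above would typically be invoked from a script. The sketch below wraps it with Python's `subprocess` module; the function names `build_capture_command` and `capture_image` are assumptions for illustration and are not taken from the prototype's code.

```python
import shutil
import subprocess


def build_capture_command(output_path, resolution="640x480"):
    """Return the fswebcam command line shown above as an argument list."""
    return ["fswebcam", "--no-banner", "-r", resolution, output_path]


def capture_image(output_path, resolution="640x480"):
    """Capture one frame; returns True if fswebcam ran successfully."""
    if shutil.which("fswebcam") is None:
        return False  # fswebcam is not installed on this machine
    result = subprocess.run(build_capture_command(output_path, resolution))
    return result.returncode == 0


if __name__ == "__main__":
    print("captured" if capture_image("image.jpg") else "capture failed")
```

Calling `capture_image` in a loop with a new file name each time would produce the image stream that is then uploaded to the cloud.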
The Logitech USB webcam that is used looks like the figure below:
Figure 7.2: Logitech USB Web Camera[5]
7.2 Prototype1 - Using PivotHead
In this prototype, the PivotHead smart camera is used for capturing images.
Since this is a MAVI on Cloud prototype, a common portable wifi hotspot is
generated. The images from the PivotHead camera are transferred to the
Raspberry Pi over this hotspot, and the same hotspot is also used to provide
Internet connectivity so that the Google Vision API can run. The latency
in this prototype is large since there are two processes running on the same
network: the images are transferred from the camera and, at the same time,
uploaded to the cloud. Hence, the latency per image is roughly 4 to 6
seconds. Also, the PivotHead camera live streaming stops automatically after
some time due to overheating. Hence, there are a few drawbacks to this
prototype.
The visually impaired person will wear the PivotHead smart camera and
the Raspberry Pi will be mounted in a case. As the person walks, the
PivotHead camera captures images, the images are sent to the Google Cloud
for processing and finally the MAVI android app speaks out the information
detected in the image.
7.3 Prototype2 - Using USB Web Camera
In this prototype, the Logitech USB Web Camera is used to capture
images. The latency in this prototype is lower than in the earlier one.
However, the quality of images captured by the webcam is not as good as
that of the ones captured by PivotHead. This prototype works extremely
well in indoor settings but performs poorly in outdoor settings. Thus,
this prototype has a few drawbacks too. The following section presents a
comparison of the two prototypes discussed, with the pros and cons of both.
7.4 PivotHead Vs WebCam
As discussed in the earlier two sections, both the prototypes have their own
pros and cons. The following table summarizes the advantages and
disadvantages of both the prototypes.
Table 7.1: Prototypes Comparison
Prototype1 - PivotHead                  Prototype2 - Webcam
Large amounts of latency (4s to 6s)     Latency around 2.5s to 3.5s
Unreliable source stream                Reliable source stream
Touch sensor is disabled due to         No touch sensors
overheating of the device
Good quality of images                  Image quality not as good
Performs equally well in both indoor    Performs great in indoor settings but
and outdoor settings                    performs poorly in outdoor settings
Chapter 8
Conclusions
8.1 Cloud Service Used
As discussed in chapter 2, after analyzing and comparing the features of
various cloud services available online, Google Cloud was finally chosen for
the purpose. The Google Cloud Vision API provides both English and Hindi
language support for OCR while none of the other cloud services provide
the same. Also, the documentation of the Google Cloud Vision API is more
extensive than that of the others, and there is a lot of online support
available for it too. Hence, Google Cloud best suits the MAVI application.
8.2 Network Latency
In chapter 5, the results of Network Latency were presented under three
network conditions: 3G, 4G and IITD Wifi. The total run time is
computed as the sum of the upload time, cloud run time and the download
time. The upload time is the governing factor in the total run time as it
makes up the major chunk. The upload time is lowest in IITD Wifi and
highest in 3G, which is expected as the bandwidth decreases as we go
from IITD Wifi to 4G to 3G networks.
8.3 Cloud Run Time
In chapter 5, the results of Cloud Run Time were presented under three
network conditions: 3G, 4G and IITD Wifi.
As we can see from subsection 5.2.5, the Cloud Run Time remains more or
less the same with change in network, as is expected. The OCR algorithm,
when run only for the English language, takes less time than when
it is run for both English and Hindi. The standard deviation
in the cloud run time is much smaller than that in the upload time. This
can be attributed to the fact that the upload time is primarily dependent
on the network speed and bandwidth available, while the cloud run time is
independent of such factors, leading to small standard deviations.
8.4 Energy Consumption
As seen in the results in section 5.4, the current drawn by the Raspberry Pi
is low when the Internet connectivity is provided via WiFi hotspot, whereas
it is significantly higher when the Internet connectivity is provided through
USB tethering. This is because the mobile phone also draws current to get
charged during USB tethering. There is just a slight increase
in current when an image is captured by the USB camera. There is no
significant increase in current if the images are stored on the Pi itself and
not taken dynamically from the camera.
8.5 Final Prototype
As discussed in chapter 7, two prototypes were developed. Finally, the USB
webcam prototype was chosen for the demo on the Open House Day due to its
lower latency and its reliable source stream of images. Since the Open House
Day was conducted in outdoor settings, initially the prototype didn't perform
well as the images that were captured were all of bad quality. Once this was
realized, the USB webcam was placed in such a way that the captured images
were appropriate, that is, neither too bright nor too dark.
Bibliography
[1] Google cloud console home page. https://console.cloud.google.com/home
[2] Google cloud vision on raspberry pi. https://www.dexterindustries.com/howto/use-google-cloud-vision-on-the-raspberry-pi
[3] Google vision api pricing. https://cloud.google.com/vision/pricing
[4] Ibm watson visual recognition. https://www.ibm.com/watson/developercloud/visual-recognition/api/v3/curl.html?curl
[5] Logitech web cam. https://www.logitech.com/en-roeu/product/webcam-c170
[6] Mavi overview. http://www.cse.iitd.ac.in/mavi/
[7] Microsoft vision api. https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home
[8] Ocr language support - google. https://cloud.google.com/vision/docs/languages
[9] Pivothead. https://www.techrepublic.com/article/pivothead-debuts-next-generation-smartglass-at-wearable-tech-expo/
[10] Amazon Rekognition. https://aws.amazon.com/rekognition/image-features/