MAEER’S MIT-SOM College, Pune
MIT-SOM College Journal on “Innovation in IT” 1
MAEER’S
MIT-SOM COLLEGE | PUNE | INDIA
Affiliated to Savitribai Phule Pune University & Accredited by NAAC with “A” Grade
INNOVATION IN IT
Research journal on Information Technology
From the Editor’s desk
It is heartening to see that research and writing on varied aspects of information technology in the Indian environment are growing. We at MITSOM College are delighted to launch our IT research journal, “Innovation in IT”, as part of this movement.
We are grateful to the many authors and institutes contributing to our endeavour to promote research in information technology. MITSOM College upholds and preserves the quest for academic enrichment and interpersonal development, and research is at the core of curricular excellence. Research fosters positive thinking and the motivation to excel, which in turn builds self-esteem and confidence.
This journal aims to support and promote research in many fields of IT, such as Computer Engineering, Computer Science, Computer Technology, Cybercrime, E-Business, Engineering Management, Engineering Technology, Industrial Technology, Information Systems and many more.
I would like to congratulate the team that strived for the grand success of this journal, and we look forward to continued interest in and contributions towards it.
Dr. R.M. Chitnis
Principal
INDEX
Research Paper
1) A Study of Cloud Based Technology for Professional Education in India
Prof. Rahul K. Mahakal and Dr. Shivaji D. Mundhe
2) Commerce Technology
Prof. Nidhi Satavlekar
3) Dial M for E Commerce
Prof. Shrinivas Kulkarni
4) Green Computing – Development of Industry and Prevention of Environment
Mrs. Poonam A. Lalwani, Ms. Aparna N. Kulkarni and Dr. Abhishek V. Jain
5) Challenges and Opportunities with Big Data
Prof. Seema Rawat
Research Article
1) Contribution of India's IT Industry to Economic Progress
Aditya Kurane, S.Y. M.C.A. (Comm.)
2) Current Trends in Information Technology
Monika Wicks, S.Y. M.C.A. (Comm.)
3) Windows 8
Kuldeep Jain, S.Y. M.C.A. (Comm.)
A STUDY OF CLOUD BASED TECHNOLOGY FOR PROFESSIONAL
EDUCATION IN INDIA
Prof. Rahul K. Mahakal
Assistant Professor,
MITSOM College, Pune
Dr. Shivaji D. Mundhe
Director – MCA
Sinhgad IMCA, Narhe, Pune
ABSTRACT:
Cloud computing provides a shared pool of computing resources that can be provisioned and released on the user’s demand, serving a wide and constantly expanding range of information-processing needs according to the necessity and elasticity of demand. Owing to its huge benefits, the technology is growing rapidly and is being adopted in various domains such as business, education and government. In this paper, we study how cloud computing can benefit professional education in India. We also discuss the cloud computing educational environment and explore how universities and institutions may take advantage of clouds not only in terms of cost but also in terms of efficiency, reliability, portability, flexibility, and security.
KEYWORDS:
Cloud Computing, Web-based Learning,
Education System, Professional Education
System
I. INTRODUCTION:
Education is the most important pillar for developing countries, through which the growth of society can be achieved. The population of India is very large, so providing education to every individual is very difficult in practice; we therefore need a paradigm through which we can achieve it. The best such paradigm in education is e-learning, which commonly refers to the intentional use of networked information and communications technology (ICT) in teaching and learning. It can also be described as a new way of learning through terms such as online learning, virtual learning, distributed learning, and network- or web-based learning. Over the last few years we have observed growing demand for e-learning-based applications in many directions. E-learning-based education can be useful for both distance education programs and residential campus-based education programs. For distance education programs, e-learning acts as a logical extension of their distance-education activities; for residential campus-based educational organizations, it is a way of improving access to their various programs and a path to capturing a position in growing niche markets [11-14].
II. PROBLEMS IN E-LEARNING:
E-learning-based education has tremendous opportunity in a country like India. Through this educational mode we can provide the same kind of education to all the people of the country. The growth of e-learning applications is directly connected to increased access to ICT and to cost reduction in education. One of the reasons for the growing interest in e-learning is the use of multimedia resources in the teaching and learning process. Nowadays most teachers make use of ICT in their teaching sessions, and most educational organizations offer their programs through this mode so that they can reach the maximum number of students. ICT has given all aspirants the opportunity to learn from anywhere, at any time.
In spite of the popularity of e-learning, it has various constraints and limitations. The main obstacle to the growth of e-learning is the access problem due to poor infrastructure, without which there can be no e-learning. The other limitations are cost-related: although software and hardware costs are falling, e-learning applications still incur deployment costs, support and maintenance costs, and the cost of trained staff. [13]
III. CLOUD COMPUTING:
Cloud computing is a fast-growing area which attracts users from many disciplines, and it has brought a new paradigm shift to the field of education. Cloud computing delivers services on the demand of the user and provides broad network access, a shared data-resource environment and efficient flexibility. The technology enables more efficient and cost-effective computing by centralizing the storage, memory and computing capacity of PCs and servers. The benefits of cloud computing can help educational institutions resolve some common challenges such as cost reduction, quick and effective communication, security, privacy, flexibility and accessibility [1,2,4,6,7,8].
The National Institute of Standards and Technology (NIST) defines five essential characteristics of cloud computing [17-19]:
On-demand Self-Service
Broad Network Access
Resource Pooling
Rapid Elasticity
Measured Service
Cloud-computing-based applications provide various services in fields such as banking, healthcare and government. Cloud computing services can be delivered through the following service models:
IaaS (Infrastructure as a Service): Abstraction and virtualization [20-23] are used to provide infrastructure services over the Internet with high scalability, high throughput, quality of service and high computing power; this is known as Infrastructure as a Service (IaaS).
SaaS (Software as a Service): Cloud computing providers deliver common online services which are accessed on the Internet through a web browser. These services have long been referred to as Software as a Service (SaaS).
PaaS (Platform as a Service): The cloud allows consumers not only to deploy but also to design, model, develop and test
applications directly on the cloud. It supports group work on collaborative projects whose team members are geographically distributed; this is known as Platform as a Service (PaaS).
The cloud can be used by the general public (public cloud), a single organization (private cloud), or several organizations that share the same interests and policies (community cloud). It can also be a mixture of public and private clouds (hybrid cloud) [55,56].
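As a compact summary, the service and deployment models above can be expressed as a small classification, here sketched in Python (an illustrative model of our own; the class and attribute names are not from any cloud SDK):

```python
from dataclasses import dataclass
from enum import Enum

class ServiceModel(Enum):
    IAAS = "Infrastructure as a Service"  # virtualized compute, storage, network
    PAAS = "Platform as a Service"        # design, develop, test and deploy apps
    SAAS = "Software as a Service"        # ready-made apps used via a web browser

class DeploymentModel(Enum):
    PUBLIC = "public"        # open to the general public
    PRIVATE = "private"      # a single organization
    COMMUNITY = "community"  # organizations sharing interests and policies
    HYBRID = "hybrid"        # mixture of public and private clouds

@dataclass
class CloudService:
    name: str
    service_model: ServiceModel
    deployment_model: DeploymentModel

# Example: a campus e-mail service consumed as SaaS from a public cloud.
mail = CloudService("campus-mail", ServiceModel.SAAS, DeploymentModel.PUBLIC)
print(mail.service_model.value)  # Software as a Service
```

Any real deployment choice is, of course, a combination of one service model and one deployment model, which is exactly what the `CloudService` record captures.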
IV. CLOUD BASED FRAMEWORK
FOR EDUCATION:
A cloud-based framework will be a better solution to overcome the problems associated with e-learning. Based on this study, we propose an architecture model which can be implemented at the university level; it will benefit all colleges and institutes affiliated with the university. The architecture is based on four layers (ISAU for Education):
I : Implementation Layer
S : Service Layer
A : Access Layer
U : User Layer
Figure – Layers of EDU-CLOUD
A. IMPLEMENTATION LAYER:
In this layer the cloud is implemented as per the needs of the system. It can be a public cloud, private cloud, community cloud or hybrid cloud.
B. SERVICE LAYER:
In this layer services are provided as per the needs of the system’s users. They can be Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) or Infrastructure-as-a-Service (IaaS).
C. ACCESS LAYER:
In this layer services are accessed through devices such as desktops, smartphones or laptops.
D. USER LAYER:
This last but important layer specifies the users of the cloud. These users can be students, teachers, research scholars, management, principals, parents, government or control bodies.
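A minimal sketch of the proposed ISAU layering as a data structure (the dictionary keys and example values are illustrative, chosen by us to mirror the four layers; this is not a prescribed implementation):

```python
# Illustrative sketch of the four ISAU layers; keys and values are ours.
EDU_CLOUD = {
    "Implementation": {"public", "private", "community", "hybrid"},
    "Service":        {"SaaS", "PaaS", "IaaS"},
    "Access":         {"desktop", "smartphone", "laptop"},
    "User":           {"student", "teacher", "research scholar", "management",
                       "principal", "parent", "government"},
}

def route_request(user: str, device: str, service: str, deployment: str) -> bool:
    """Validate a request at every layer, from User down to Implementation."""
    return (user in EDU_CLOUD["User"]
            and device in EDU_CLOUD["Access"]
            and service in EDU_CLOUD["Service"]
            and deployment in EDU_CLOUD["Implementation"])

print(route_request("student", "smartphone", "SaaS", "private"))  # True
```

The point of the layering is that every request can be checked top-down: who is asking, from which device, for which service, on which cloud.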
Figure – Framework for EDU-CLOUD
V. BENEFITS OF CLOUD BASED
FRAMEWORK:
The following are some of the benefits of a successful implementation of the EDU-CLOUD model:
It can help universities keep pace with ever-growing resource requirements and energy costs.
It creates huge opportunities for faster research.
Faculty benefit from efficient access and flexibility when integrating technology into their classes.
Technology enhancement can be done at a single end only.
Researchers get instant access to high-performance computing services without the responsibility of managing a large server and storage farm.
The same kind of education becomes available to all students.
It can provide important gains by offering direct access to a wide range of academic resources, research applications and educational tools.
Various users of the system can connect to the campus through their devices.
Parents can easily check the progress of their wards through the system.
It also promises a variety of services that will be very useful to faculty, staff and students.
In addition, universities can open their technology infrastructure to the private and public sectors for research advancement.
VI. CONCLUSION:
Cloud-based technology is growing rapidly and is being adopted in various domains such as business, education and government. In this paper we have highlighted the cloud computing educational environment through the EDU-CLOUD framework and explored how universities and institutions may take advantage of clouds not only in terms of cost but also in terms of efficiency, reliability, portability, flexibility, and security. In conclusion, an educational cloud computing environment offers a wide range of services at the application, platform and infrastructure levels to students, faculty, researchers, and academic staff.
VII. REFERENCES:
[1] Justin, C., Ivan, B., Arvind, K. and Tom, A., “Seattle: A Platform for Educational Cloud Computing”, SIGCSE ’09, March 2009, Chattanooga, Tennessee, USA.
[2] Shanthi Bala, P., “Intensification of educational cloud computing and crisis of data security in public clouds”, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 03, 2010, pp. 741-745.
[3] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing”, UC Berkeley Reliable Adaptive Distributed Systems Laboratory, 2009.
[4] Al Noor, S., Mustafa, G., Chowdhury, S., Hossain, Z. and Jaigirdar, F., “A Proposed Architecture of Cloud Computing for Education System in Bangladesh and the Impact on Current Education System”, International Journal of Computer Science and Network Security (IJCSNS), Vol. 10, No. 10, 2010.
[5] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A Break in the Clouds: Towards a Cloud Definition”, ACM SIGCOMM Computer Communication Review, Vol. 39, Issue 1, pp. 50-55, January 2009.
[6] Cloud Computing Articles, Cloud Computing Education. http://www.code2cloud.com/cloudcomputing-education/
[7] Cloud Computing Articles, SaaS+PaaS+IaaS, Free Cloud Apps for Educational Institutes: Schools, Colleges, Universities. http://www.techno-pulse.com/2010/08/free-cloud-apps-educational-institutes.html
[8] Thomas, P., “Cloud Computing: A potential paradigm for practising the scholarship of teaching and learning”, The Electronic Library, Vol. 29, Iss. 2, pp. 214-224, 2011.
[9] Sultan, N., “Cloud computing for education: A new dawn?”, International Journal of Information Management, Vol. 30 (2010), pp. 109-116.
[10] HP CloudSystem: A single platform for private, public, and hybrid clouds. Hewlett-Packard Development Company, 2011. http://www.hp.com/hpinfo/newsroom/press_kits/2011/EBcloudcomputing2011/fs_Cloud_CloudSystem.pdf
[11] Ellen Wagner, “Delivering on the promise of e-learning”, white paper. http://www.adobe.com/education/pdf/elearning/Promise_of_eLearning_wp_final.pdf
[12] Luciana Carabaneanu, Romica Trandafir, and Ion Mierlus-Mazilu, “Trends in e-learning”. http://www.codewitz.net/papers/MMT_106-111_Trends_in_E-Learning.pdf
[13] Som Naidu, “E-learning: A guidebook of principles, procedures and practices”, CEMCA, 2006.
[14] “What is Electronic Learning”. http://www.mup.com.au/uploads/files/pdf/978-0-522-85130-4.pdf
[15] Michael Miller, “Cloud Computing Pros and Cons for End Users”, 2009. http://www.informit.com/articles/article.aspx?p=1324280
[16] http://en.wikipedia.org/wiki/Cloud_computing
[17] GTSI Group, “Cloud Computing - Building a Framework for Successful Transition”, White Paper, GTSI Corporation, 2009.
[18] T. Dillon, C. Wu and E. Chang, “Cloud Computing: Issues and Challenges”, 24th IEEE International Conference on Advanced Information Networking and Applications, 2010.
[19] P. Mell and T. Grance, “The NIST Definition of Cloud Computing”, Recommendation of NIST, 2011.
[20] Cloud Computing vs. Virtualization. http://www.learncomputer.com/cloud-computing-vsvirtualization/
[21] Wikipedia, http://en.wikipedia.org/wiki/Virtualization
[22] Y. Luo, “Network I/O Virtualization for Cloud Computing”, IEEE Computer Society, Oct. 2010.
[23] V. Sarathy, P. Narayan, and R. Mikkilineni, “Next Generation Cloud Computing Architecture”, 2nd International IEEE Workshop on Collaboration & Cloud Computing, 2010.
[24] N. Robinson, L. Valeri, J. Cave, T. Starkey, H. Graux, S. Creese and P. Hopkins, “The Cloud: Understanding the Security, Privacy and Trust Challenges”, RAND Corporation, 2011.
[25] W. Jansen and T. Grance, “Guidelines on Security and Privacy in Public Cloud Computing”, NIST Draft Special Publication 800-144, 2011.
[26] Mirza, A., “Is E-Learning Finally Gaining Legitimacy in Saudi Arabia?”, Saudi Computer Journal, Vol. 6, No. 2, 2007.
Commerce Technology
Prof. Nidhi Satavlekar
Assistant Professor,
MITSOM College, Pune
Abstract:
Mobile technology is growing at a tremendous speed and the Internet has become a vital resource. Broadly, m-commerce involves transactions of financial value carried out over a mobile device. This paper discusses the concept of mobile commerce, now considered a major force in international business. It focuses on the basic functional platform of m-commerce applications as a route to understanding m-commerce, and projects the building blocks of m-commerce applications. The paper also highlights definitions of m-commerce scope and market, and the security of transactions in this growing technology. It looks at how the technology of mobile devices has captured horizontal and vertical markets.
M-commerce: A mobile user is one who can readily use information from a mobile or wireless device and synchronize information onto a mobile or wireless device across wireless networks.
The Fundamental Functional Platform of M-commerce Applications
M-commerce services are classified into five functional units: wireless messaging services, wireless web access services, voice-activated services, location-based services and digital content services. Let us look at these five units:
1. Messaging Services: In today’s scenario people have become more and more techno-savvy; email and messaging have become daily activities. We can now send and receive messages using wireless media, for example through Yahoo! IM. SMS also provides a wide variety of information services, including weather reports, traffic information, and entertainment information such as theatre, cinema and concert listings. SMS also provides financial information such as stock quotes, brokerage services and directory assistance. Factors that may further popularize mobile email include providing users with the needed software, installing readers in their phones to view more popular file types, and offering the ability to forward mobile email or mobile content to user groups.
2. Web Access Services: These services format any web site for display on a mobile device screen. In the PDA market, this service helps synchronize the user’s desktop and PDA so that both devices stay updated. Content can also be reformatted and sent to a mobile browser using a variety of other
technologies such as the Wireless Markup Language (WML).
3. Voice-Activated Services: These offer services like reading out received email, speech recognition, and spoken driving directions alongside graphical maps. There are many such voice portals, e.g. Mapquest.com. A voice interface driven by predefined questions and commands, such as “which is the next scheduled flight to <someplace>?”, that m-commerce services can recognize and respond to will definitely help users in their daily activities.
4. Location-Based Services: These could lead to a suite of valuable location-based applications and services, such as driving directions, making hotel reservations based on the location of the user, and finding and booking a good restaurant. However, location-based services also raise privacy concerns.
5. Digital Content Services: Despite the low-bandwidth limitations of wireless networks, several technologies in development aim to offer video on PDAs. Amazon.com was an originator of commerce applications, most popularly in e-books: users can now access e-books on their handheld devices and read them. Along with e-books, e-music has now covered the market for mobile consumers.
Building Blocks for M-commerce Applications: M-commerce services consist of corporate servers, the network, the device setup process and software components.
Client Service Setup: Currently mobile consumers face a long setup process, entering a number of parameters to establish a connection. GPRS has lessened the need to dial up every time a user wants to access services, and CDPD is easing the connectivity situation.
Network: In mobile systems, data propagates from the content server through the GSM network to mobile devices. The service provider depends upon web connectivity. The components called upon during an interaction are generally the base stations, the home location register, the mobile switching centers and the visiting location register.
Server Software Components: This software takes into consideration the appearance of the information being displayed. A service provider’s or company’s application server needs to recognize different client types in order to serve appropriate content to each.
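Such client-type recognition can be sketched, for example, by keying on the HTTP User-Agent header (a hypothetical illustration; the substring rules and file names are our assumptions, not a standard mapping):

```python
# Hypothetical client-type detection from a User-Agent string; the
# substring rules below are illustrative, not an exhaustive real-world list.
CLIENT_RULES = [
    ("iPhone", "mobile"),
    ("Android", "mobile"),
    ("Mobile", "mobile"),   # generic mobile browsers
]

def client_type(user_agent: str) -> str:
    """Return 'mobile' or 'desktop' based on simple substring rules."""
    for needle, kind in CLIENT_RULES:
        if needle in user_agent:
            return kind
    return "desktop"

def select_content(user_agent: str) -> str:
    """Serve a reformatted page to mobile clients, the full page otherwise."""
    if client_type(user_agent) == "mobile":
        return "page-mobile.html"
    return "page-full.html"

print(select_content("Mozilla/5.0 (iPhone; ...)"))      # page-mobile.html
print(select_content("Mozilla/5.0 (Windows NT 10.0)"))  # page-full.html
```

In practice a server would use a maintained device-detection library rather than hand-written rules, but the content-negotiation idea is the same.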
Applications of m-commerce:
1. Mobile Advertising: Mobile advertising raises major privacy issues, since much of it is now based on the physical location of the user. An additional implication is maintaining the integrity of online mobile coupons: fake coupons can be created and cashed in. One scheme for protecting against fake redemption is the use of unique, random-looking codes created as a hash of the promotional code, date, time and some unique factors.
2. Mobile Banking: Wireless banking services are on the verge of becoming a significant market for several reasons: people like to manage their bank accounts continuously, they receive an SMS whenever they perform a bank transaction, and users can ask for a balance enquiry.
3. Retail: Handheld terminals can be used to download sales and inventory information for stock replenishment.
4. Education: In the education sector, m-commerce technologies are used for managing and accessing homework, attendance, extra-curricular activities and reference material, and for demonstrating science applications. These technologies offer library access to students and faculty on their handheld computers, and allow researchers to access and monitor the results of tests and surveys over a wireless network using handheld devices.
5. Healthcare: Patient data can be transmitted from an ambulance to the emergency room (ER), where doctors analyze the data and send advice back to the ambulance.
6. Travel: Mobile devices can now be used by floor and maintenance personnel for accessing ticketing information, handling baggage and tracking lost items.
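The coupon-protection scheme described under mobile advertising above can be sketched with a standard hash function (a minimal illustration; the field layout and the choice of SHA-256 are our assumptions, not part of any deployed scheme):

```python
import hashlib

def coupon_code(promo_code: str, date_time: str, unique_factor: str) -> str:
    """Derive a short redemption code as a hash of the coupon's fields."""
    payload = f"{promo_code}|{date_time}|{unique_factor}".encode()
    return hashlib.sha256(payload).hexdigest()[:12]  # short, hard-to-forge code

def verify(code: str, promo_code: str, date_time: str, unique_factor: str) -> bool:
    """The server recomputes the hash; a forged coupon will not match."""
    return code == coupon_code(promo_code, date_time, unique_factor)

c = coupon_code("DIWALI20", "2024-11-01T10:00", "user-4711")
print(verify(c, "DIWALI20", "2024-11-01T10:00", "user-4711"))  # True
print(verify(c, "DIWALI50", "2024-11-01T10:00", "user-4711"))  # False
```

Because the code is derived from the coupon's own fields, a redeemer cannot invent valid codes without knowing the inputs, and the server can verify a redemption without storing every issued code.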
Security issues:
As we learned with the commercialization of the Internet, security is a standard issue that has to be managed. Security for horizontal-market applications includes privacy, integrity and non-repudiation. Privacy in mobile commerce generally centers on the physical movements and activities of individuals. Integrity ensures data has not been modified in transit. Non-repudiation is similar to the wired or physical world in that we must prove with reasonable effort that a particular party has willfully conducted a particular transaction. Security solutions are restricted in the mobile world mainly due to the size and mobility requirements of mobile devices.
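The integrity property above can be illustrated with a keyed hash (HMAC); a minimal sketch, assuming a secret shared between the mobile device and the server:

```python
import hashlib
import hmac

SECRET = b"shared-device-server-key"  # illustrative shared secret

def tag(message: bytes) -> str:
    """Sender attaches an HMAC tag so tampering in transit is detectable."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def intact(message: bytes, received_tag: str) -> bool:
    """Receiver recomputes the tag; any modification changes it."""
    return hmac.compare_digest(tag(message), received_tag)

msg = b"transfer 500 INR to account 1234"
t = tag(msg)
print(intact(msg, t))                                    # True
print(intact(b"transfer 9500 INR to account 1234", t))   # False
```

A plain hash would not suffice here, since an attacker who alters the message could recompute it; the shared key is what ties the tag to the two parties.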
Conclusions: As m-commerce applications and wireless devices emerge at a rapid pace, each will push the other towards greater innovation, versatility and power. There are a number of business opportunities, and challenges, in bringing forth robust wireless technologies to fulfil mobile users’ requirements. With 4G systems, more security, more speed and trendy display mobile devices, mobile applications will survive and dominate the market.
References:
1. Kapil Raina and Anurag Harsh, “Mcommerce Security – A Beginner’s Guide”.
2. http://www.peterindia.net/M-CommerceOverview.html
3. http://en.wikipedia.org/wiki/Mobile_computing
4. http://www.wapforum.org
Dial M for E Commerce
Prof. Shrinivas Kulkarni
Assistant Professor,
MIT-SOM College, Pune
Introduction:
The recently released Google Trends report reiterated the inflection point of online retail in India. Online shopping in India saw 128 per cent growth in consumer interest from 2011 to 2012, compared with only 40 per cent growth from 2010 to 2011, making 2012 the tipping point for online shopping in India.
Data released by the Internet and Mobile Association of India (IAMAI) pegs the total Indian e-commerce market at around INR 50,000 crore (USD 12 billion), of which 80 per cent is transacted through travel e-commerce; retail e-commerce has just 20 per cent of the pie. However, experts believe that by 2025 the total e-commerce market will reach at least INR 4,00,000 crore (USD 96 billion), and that retail will account for half of that.
A considerably large number of shoppers are buying products such as cameras, mobiles, computers and accessories, apparel, jewellery, home and kitchen appliances, toys and gift items online. Until about five years ago, books and music were the largest-selling categories online, but not anymore. With the number of Internet users growing at a fast pace, online retail is bound to see a revolution. A closer look at the market shows five big trends that will shape marketing strategies in the online retail environment in India.
The E-Commerce Overview:
India's e-commerce market grew at a staggering 88 per cent in 2013 to $16 billion, riding on booming online retail trends and defying slower economic growth and spiralling inflation, according to a survey by industry body Assocham. Increasing Internet penetration and the availability of more payment options boosted the e-commerce industry in 2013.
“Besides electronic gadgets, apparel and jewellery, home and kitchen appliances, lifestyle accessories like watches, books, beauty products and perfumes, baby products witnessed significant upward movement in the last one year,” the survey noted.
According to the survey, India's e-commerce market, which stood at $2.5 billion in 2009, reached $8.5 billion in 2012 and rose 88 per cent to touch $16 billion in 2013. The survey estimates the country's e-commerce market will reach $56 billion by 2023, driven by rising online retail. Online shopping grew at a rapid pace in 2013 due to aggressive online discounts, rising fuel prices and the availability of abundant online options.
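The figures quoted above are internally consistent, as a quick arithmetic check shows (the compound annual growth rate is our own derivation, not a figure from the survey):

```python
# Check the quoted figures: $8.5B (2012) growing 88% should give ~$16B (2013).
v2012, growth_2013 = 8.5, 0.88
v2013 = v2012 * (1 + growth_2013)
print(round(v2013, 1))  # 16.0 (billion USD), matching the quoted figure

# Implied compound annual growth rate from $2.5B (2009) to $16B (2013).
v2009, years = 2.5, 4
cagr = (16 / v2009) ** (1 / years) - 1
print(f"{cagr:.0%}")  # 59%
```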
Among the cities, Mumbai topped the list of online shoppers, followed by Delhi, while Kolkata ranked third. Age-wise analysis revealed that 35 per cent of online shoppers are aged between 18 and 25 years, 55 per cent between 26 and 35 years, 8 per cent in the age group of 36-45 years, while only 2 per cent are in the age group of 45-60 years. Besides, 65 per cent of online shoppers are male and 35 per cent female.
To make the most of increasing online shopping trends, more companies are collaborating with daily-deal and discount sites, the survey pointed out. Customers are looking for a width of options and choices, so online retail will soon no longer be differentiated by deals and discounts. Online retailing is also being considered a serious channel by sellers, competing closely with their mainstream selling options.
The products that sell most are in the tech and fashion categories, including mobile phones, iPads, accessories, MP3 players, digital cameras and jewellery, among others, the survey found.
India had an Internet base of around 150 million as of August 2013; even at close to 10 per cent Internet penetration, this offers a very big opportunity for online retailers to grow and expand, as the future of the Internet in India looks very bright.
Those who are reluctant to shop online cited reasons such as a preference only to research products and services online (30 per cent), finding delivery costs too high (20 per cent), fear of sharing personal financial information online (25 per cent) and lack of trust in whether products would be delivered in good condition (15 per cent), while 10 per cent do not have a credit or debit card.
Drivers & Challenges:
Drivers:
With a mobile customer base of 951 million Indians, m-commerce will expand rapidly in the near future. With 3G and 4G LTE services being launched by Airtel and Reliance Jio, the transaction experience will enhance customer satisfaction and may even lead to customer delight. This is one of the major differentiators for consumers, especially during peak holidays (Dussehra, Diwali, Christmas, Valentine's Day, etc.), when customer satisfaction suffers. With the ever-rising cost of fuel and the perennial parking problems faced in large metros, the comfort of e-commerce will only grow in leaps and bounds in the coming years. The latest trend of most mobile users buying smartphones will expand the user base for m-commerce, and this will accelerate as smartphone prices drop under INR 4,000 in FY 2014. Reports have indicated that 87 million Indians prefer accessing online shopping through their smartphones. This has prompted most m-commerce retailers to launch user-friendly apps, which offer around 2 per cent conversions. There are many tech start-ups that will jump into the fray, investing in app development that will make these mobile transactions smoother.
The growth will also be fuelled by launching apps on 2G phones through SMS, which would enhance the potential user base to close to 95 per cent, or over 800 million mobile users in India.
The urban and semi-urban markets are witnessing explosive growth in the issuance of debit and credit cards, which will be one of the major growth drivers. The hinterland is also preparing to participate through the RBI's financial-inclusion initiative, which will enhance the population under the banking net to over 50 per cent from the current 23 per cent. According to a report by the Internet and Mobile Association of India (IAMAI) and IMRB, India is expected to have close to 165 million mobile Internet users by March 2014, up from 87.1 million in December 2012, as more people access the web through mobile devices and dongles.
The peculiarity behind the success of online shopping in Tier II and III markets is attributed to the fact that accessibility to bigger brands is low but aspiration levels are high. Most e-retailers agree that around 60 per cent of their orders are placed from the top 10 cities and as much as 40 per cent come from smaller towns; this ratio was 80:20 five years back. On-time delivery acts as a major differentiator for smaller cities. Around 50-60 per cent of orders are from Tier II and III towns.
India is touted to be one of the biggest e-commerce markets globally, which is one of the reasons for the likes of Amazon to set up shop in the country. It is also believed that while players are mushrooming in the sector at the moment, the future will see consolidation and the emergence of clear leaders.
Challenges:
Barriers to purchase:
Trust
Fear of losing sensitive credit/debit card details
Supply chain deficiencies, leading to delays and breakages
Warranty and other obligations
Difference in item ordered vs. delivered, mainly in terms of size, colour, model no., packaging and accessories
Limitations of retailers' websites in navigational ease and comfort
Poor 3G network, making access difficult
Limited access to banking products and services, mainly in semi-urban and rural areas
E-Commerce Players in India
- Snapdeal
- Myntra.com
- Flipkart
- Yebhi.com
- Times Shopping
- Jabong.com
- Many others
Strategic Recommendations
Customer acquisition to remain focus
area
As online retail is still a new phenomenon in
the country, acquiring customers still
happens to be the major focus area of
marketers. "There are two kinds of users that
visit online retail sites: ones who browse but
have not yet made their first purchase online,
and 'fence buyers', people who have
experimented with online retailing, largely
for ticketing transactions. The strategy ahead
is to convert the latter into active online
shoppers."
First shopping experience key to building
customer retention
It is one of the major challenges in the retail
industry to convert a store into a brand, so
that the customer attaches loyalty to the
online retail brand and not just the products
the portal stocks. It is the experience that
counts in building loyalty in this competitive
space. "Whether a shopper returns to a site
will depend entirely on the first shopping
experience. Providing high-resolution
images and videos, investing in logistics
operations, offering a stress-free return
policy and personalizing the entire
transaction will be the game changers in
this industry." Beyond loyalty, even the
ticket size per transaction increases with
return shoppers: shoppers start transacting
with a lower Average Order Value (AOV)
and, once their experience of the website,
product and delivery proves good, move to
higher AOVs.
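The AOV figure the text refers to is simply total order revenue divided by the number of orders. A minimal sketch, with made-up basket values purely for illustration:

```python
# Hypothetical sketch: Average Order Value (AOV) is total revenue divided
# by the number of orders; per the article, repeat shoppers tend to have
# larger baskets, and hence a higher AOV.

def average_order_value(order_totals):
    """Compute AOV for a list of order amounts (e.g., in rupees)."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)

first_time = [499, 350, 799]          # illustrative first-purchase baskets
returning = [1299, 999, 1850, 1450]   # illustrative repeat-purchase baskets

print(round(average_order_value(first_time), 2))
print(round(average_order_value(returning), 2))
```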
Tier II & III cities to drive growth
The success of online shopping in Tier II
and III markets is attributed to the fact that
accessibility to bigger brands is low while
aspiration levels are high. With higher GDP
growth, the purchasing power of hinterland
consumers will increase over the next few
years, and they will use e-commerce to
purchase aspirational brands that may not be
available in their cities and towns.
Proliferation of Shopping Applications
In the last few years, a lot of shopping
applications have appeared, from eBay and
Amazon to Best Buy and Macy's, on the iOS
and Android platforms. These applications
take advantage of mobile features such as
context (location, time and people), which
make them easier for consumers to use. As a
result, Forrester estimated that in 2011 over
24% of iPhone users and 21% of Android
users used a shopping application.
New and Diverse Mobile Advertising
Formats
One of the key developments in the last few
years has been the maturing of mobile
advertising, which has moved from banner
ads to coupons, real-time calls to action,
last-minute deals and more. Real-time access
to a potential customer makes the advertising
meaningful, and an interested audience can
react spontaneously to the deal. Mobile
coupons have a redemption rate of 15% to
40%; compare this to traditional print
coupons, which are redeemed at less than
2%. Imagine a cinema hall giving last-minute
discounts by broadcasting a deal to people in
the vicinity if it finds the hall half full, or
mobile ads that recognize the user's location
and show how far away the nearest
McDonald's is.
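The "how far away is the nearest store" idea above rests on a simple great-circle distance calculation. A minimal sketch, not from the article, using the standard haversine formula; the store names and coordinates are made up:

```python
import math

# Illustrative sketch of location-aware ad targeting: compute the
# distance from a user's position to nearby (fictional) store locations
# and pick the nearest. Coordinates below are placeholders.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

stores = {"Store A": (18.5204, 73.8567), "Store B": (18.5600, 73.8100)}
user = (18.5300, 73.8500)

nearest = min(stores, key=lambda s: haversine_km(*user, *stores[s]))
print(nearest)
```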
Growing use of social media/ social
commerce
There is overwhelming evidence that people
trust their friends more than the claims made
by the retailers selling them and hence
integration of social media is a definite boon
to mobile commerce. People look for
reviews and recommendations from their
friends and other consumers and in
exchange are willing to share their network
with the retailer. IBM recently released a
study showing that consumers are more than
willing to share with retailers through social
networks. In exchange for a better, more
personalized shopping experience, consumers
will tell all about their media consumption
(75%); age, race, gender and income (73%);
name and address (61%); and lifestyle details
such as hobbies and other interests (59%).
Social commerce and mobile have been the
biggest trends in e-commerce.
Growing popularity of Local Commerce
Social-Location-Mobile are the buzz words
these days. Services like Foursquare and
Facebook Places are offering deals based on
loyalty by using the check-in functionality.
If a person visits a particular bar often, the
bartender can gauge loyalty from the number
of check-ins that person has made there and
can therefore extend a suitable offer.
Group-buying sites like Groupon and
deal-coupon services are also contributing to
the popularity of local commerce. Almost all
the popular deal sites have mobile
applications, and they have seen a steady
increase in traffic from mobile phones.
Increasing use of Mobile Phones to get
Product Reviews/Information
Mobile phones are helping the consumers to
make informed decisions while they are in a
store. Last year nearly half the consumers in
the US used their mobiles to look up product
reviews, and many used barcode readers to
get product information.
Proliferation of Price Comparison
Applications
Price comparison applications from eBay,
Amazon and The Find are becoming a real
threat to physical retailers. Many customers,
when they walk into a store, use these
applications to find the price at an online
store or even at other physical stores in the
vicinity. The retailer loses the sale if the
customer finds a better deal elsewhere;
earlier, in the absence of such information,
the conversion of a walk-in customer was
much higher. The chart below from
comScore shows that pricing is important to
consumers and that stores are losing sales to
price-comparison applications.
Online shopping no longer a price war
Success in e-retailing will come from the
emergence of newer categories, going
beyond apparel and electronics. "Local
commerce will emerge in a big way,
complemented by social media
recommendations. Curated and
differentiated deals which offer a unique
experience to customers will see a surge."
The next big thing to watch in this segment
is the option of buying meals online; it is an
opportunity area with great potential.
Aggressive marketing to create Brand:
With most Indian e-tailers flush with their
latest rounds of angel funding, they are
becoming aggressive on marketing. The
recent ad blitz on TV, press, out-of-home
and social networking will create huge
awareness going forward. This initiative
should also include setting up exclusive
brand showrooms (EBSRs) in high-street
malls, where customers can visit and touch
and feel the items sold online. Consider the
fact that most large retailers, such as Big
Bazaar, Reliance and Croma, have
e-commerce portals that complement their
traditional retailing; e-tailers can adopt a
similar strategy to expand their online sales.
The EBSR will enhance brand equity and
can serve as a face-to-face contact point for
new or first-time customers.
Social networking
Unhappy customers venting their frustration
on social networking sites need to be
monitored carefully. Online reputation
management will be a must for most
retailers. A gift coupon or discount coupon
for a delayed delivery or damaged product
could help a long way.
Creating new Apps, which will enhance
user experience
App development is key to improving
conversion ratios. E-tailers can invest their
own capital or look at strategic alliances
based on revenue-sharing models, which are
very common in the mobile VAS (MVAS)
space.
Create Customer Loyalty programs
Offer loyalty plans and programs that give
additional benefits to existing customers:
bonus points, air-mile rewards and a variety
of other options that build a long-term loyal
consumer base. Privilege card holders can
influence other fence sitters and should be
rewarded for their referral efforts. Amazon,
for example, offers a scheme for a $79
annual fee with a guaranteed two-day
delivery promise.
Attractive portal and strong/robust
Payment system
Many existing retailers use third-party
payment gateways; this trend will slowly
shift towards building their own. Stronger
portal cyber security and easy access over
high-bandwidth connections will drive
traffic and conversion.
Supply chain management
The most important aspect will be sourcing
of products and speedier delivery to
customers. The back end IT/SCM needs to
be strengthened, as most customers will
demand guaranteed delivery times. This is
further complicated because in many buyer
households both spouses work, so deliveries
get staggered to weekends or holidays.
Investment in IT, bar coding, online tracking
systems and all other aspects of delivery-time
and QoS compliance will become more
stringent. The bar on IT/SCM has already
been raised by MNCs such as Amazon and
eBay, and Indian e-tailers will need to
compete with them.
Cash on Delivery and/or Free delivery
India is a major market where the lack of
credit and debit cards can be overcome
through cash-on-delivery (COD)
transactions. COD has proven to be a master
stroke and has resolved many other
challenges from the customer's perspective.
However, customer authentication and
verification need to be strengthened to
minimize wastage.
Launch a prepaid Credit/Debit card
E-tailers can also launch their own prepaid
debit or credit card, which can come with a
free initial top-up and can be used by
customers who do not have a credit or debit
card. This would also be an ideal solution
for parents whose wards are staying away
for studies and can use the card for routine
purchases.
Customer care:
Many novice customers may have a lot of
queries, which can be resolved through a
strong customer care mechanism, whether
an online avatar or a live person; this can
improve conversion rates dramatically.
Focus on in-house brands
Most organized retailers have in-house
brands, which contribute higher margins
than branded items. This will help e-tailers
break even and become profitable in a short
period of time.
Conclusions:
India, with a population of 1.2 billion and
one of the fastest-growing middle classes, is
a most potent market for m-commerce. The
explosion of mobile phones and easier
access to organized payment systems will
add the required stimulus to e-commerce.
The projected figure of USD 100 billion is
definitely achievable, as customers see
value in online purchases, given the
problems associated with traditional
purchasing options.
Green Computing – Development of Industry and Preservation of the
Environment
Mrs. Poonam A. Lalwani, Asst. Professor, MITSOM College, Pune
Ms. Aparna N. Kulkarni, Managing Director, Digixe Core IT & Multimedia Services, Pune
Dr. Abhishek V. Jain, Asst. Professor, NBN Sinhgad SCS, Pune
ABSTRACT
As computers play an ever-larger role
in our lives, energy demands, costs,
and waste are escalating dramatically.
Green Computing is now under the
attention of not only environmental
organizations, but also businesses from
other industries. In recent years,
companies in the computer industry
have come to realize that going green is
in their best interest, both in terms of
public relations and reduced costs.
However, the IT department is usually
the department that uses the most power,
which in turn is an excessive overhead
for a business as well as a source of
toxic waste. Making IT "green" can not
only save money but also help save our
world, making it a better place by
reducing or eliminating wasteful
practices and using non-toxic materials.
I. INTRODUCTION
Green Computing is the study and
practice of designing, manufacturing,
using, and disposing of computers,
servers, and associated subsystems such
as monitors, printers, storage devices,
and networking and communications
systems efficiently and effectively with
minimal or no impact on the
environment. The goals are similar to
those of green chemistry: reduce the use
of hazardous materials, maximize
energy efficiency during the product's
lifetime, and promote the recyclability
or biodegradability of defunct products
and factory waste. Sustainable IT
services require the integration of green
computing practices such as power
management, virtualization, improved
cooling technology, recycling,
electronic waste disposal, and
optimization of the IT infrastructure to
meet sustainability requirements. The
lifecycle of computers has a direct
impact on our environment, including
pollution, the use of heavy metals and
toxic products, and significant energy
consumption. The IT sector alone
accounts for 2% of CO2 emissions
worldwide, and IT companies' data
centers are responsible for nearly one
quarter of the total carbon emissions
produced by the sector. Green
computing also strives to achieve
economic viability and improved system
performance and use, while abiding by
our social and ethical responsibilities.
II. NEED OF GREEN
COMPUTING
We have great machines and equipment
to accomplish our tasks; great gadgets
with royal looks and features make our
lives more impressive and smooth.
Today almost every field, whether IT,
medicine, transportation or agriculture,
uses machines that indirectly require
large amounts of power and money to
function effectively. In the IT
department, it is observed that much of
the energy computers consume is
wasted, because we leave the computer
on even when it is not in use: the CPU
and fan keep drawing power, and screen
savers consume power even when the
system is idle. Insufficient power and
cooling capacity can also result in loss
of energy, and most data centers are
observed to lack sufficient cooling
capacity. All of this contributes to
environmental pollution.
It is the need of the hour to educate
people about the "green" use of ICT. To
promote these ideas and create standards
and regulations, various organizations
have been formed, and many technology
companies belong to several of them to
further their goal of becoming more
green.
III. AREA OF FOCUS
It is important to understand the life
cycle of a computer while applying the
concept of green IT, as explained in
Figure 1.
FIGURE 1: Life Cycle Approach for Green IT
From the view of a user in an
organization, the following are common
computer myths:

Myth: You should never turn off your
computer.
Your computer is designed to handle
40,000 on/off cycles. If you are an
average user, that is significantly more
cycles than you will initiate in the
computer's five-to-seven-year life. When
you turn your computer off, you not only
reduce energy use, you also lower heat
stress and wear on the system.

Myth: Turning your computer off and
then back on uses more energy than
leaving it on.
The surge of power used by a CPU to
boot up is far less than the energy your
computer uses when left on for more
than three minutes.

Myth: Screen savers save energy.
This is a common misconception. Screen
savers were originally designed to help
prolong the life of monochrome
monitors, which are now technologically
obsolete. Screen savers save energy only
if they actually turn off the screen or, on
laptops, turn off the backlight.

Myth: Network connections are lost
when computers go into low-power or
sleep mode.
Newer computers are designed to sleep
on networks without loss of data or
connection. CPUs with Wake-on-LAN
(WOL) technology can be left in sleep
mode overnight and wake to receive data
packets sent to the unit.
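The Wake-on-LAN mechanism mentioned above works by broadcasting a "magic packet": six 0xFF bytes followed by the target machine's MAC address repeated sixteen times, usually over UDP. A minimal sketch; the MAC address below is a placeholder:

```python
import socket

# Minimal Wake-on-LAN sketch: build the magic packet (6 x 0xFF, then the
# 6-byte MAC repeated 16 times, 102 bytes total) and broadcast it over
# UDP. The MAC used in the example call is a placeholder.

def magic_packet(mac: str) -> bytes:
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

print(len(magic_packet("00:11:22:33:44:55")))
# wake("00:11:22:33:44:55")  # placeholder MAC; uncomment to actually send
```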
IV. WHAT CAN WE DO TO GO
GREEN
1. Turn off your computer at night so it
runs only eight hours a day; you will
reduce your energy use by about 810
kWh per year, a 67 per cent annual
saving.
2. Purchase flat-screen monitors; they
use significantly less energy and are
not as hard on your eyes as CRTs.
3. Unplug electronics when not in use.
4. Consider a smaller monitor: a 14-inch
display uses 40 per cent less energy
than a 17-inch one.
5. Purchase an Energy Star-compliant
computer, and note that laptop models
use much less energy than desktop
units.
6. Save paper when printing: print
duplex, print to PDF, preview before
printing, and avoid printing hundreds
of copies of an email forward to
plaster around the office.
7. Recycle electronic waste. Recycling is
the process of turning used materials
into new, useful materials with the
aim of reducing environmental
pollution; it is more environmentally
friendly than making new products
because it reduces the use of new raw
materials, land degradation, pollution,
energy usage and greenhouse gases
[2].
8. Use e-mail communications as an
alternative to paper memos and faxed
documents.
9. Plan your computer-related activities
so you can do them all at once,
keeping the computer off at other
times.
V. CONCLUSION
The good news is, by embracing simple,
everyday green computing practices you
can improve energy management,
increase energy efficiency, reduce e-
waste, and save money in the process!
The main purpose of adopting an eco-
friendly lifestyle and making conscious
decisions is to reduce harm to the planet
and create conditions for the
environment to flourish and thrive.
Going green is not a fad or a fashion; it
is a way of life, a conscious effort to
make a personal contribution to
improving the Earth's health. Make your
entire organization green in every way
possible: understand the life cycle of IT
products, reduce paper use as much as
possible and recycle what you can. The
time has come to think about using
computers, and the non-renewable
resources behind them, efficiently.
VI. REFERENCES
[1] Baroudi, Hill, Reinhold, and Senxian (2009). Green IT for Dummies.
[2] Climate Savers Computing Initiative (2011). Retrieved from http://www.climatesaverscomputing.org/
[3] Energy Star Program (2010). Retrieved from http://www.energystar.gov/
[4] http://www.theglobalwarmingstatistics.org/global-warming-essays
[5] Microsoft: Green IT – Taking the First Step (2010). Retrieved from http://www.microsoft.com/environment/our_commitment/articles/green_guide.aspx
[6] Recycle-it America (2011). Retrieved from http://www.recycleitamerica.com/
[7] San Murugesan, "Harnessing Green IT: Principles and Practices," IEEE IT Professional, January-February 2008.
[8] The Green Grid (2010). Retrieved from http://www.uh.edu/infotech/news/story.php?story_id=130
[9] Ryan, John C. & Durning, Alan T. (1997). Stuff: The Secret Lives of Everyday Things.
CHALLENGES AND OPPORTUNITIES
WITH BIG DATA
Prof. SEEMA RAWAT
Assistant Professor,
MIT-SOM College, Pune
Executive Summary
The promise of data-driven decision-
making is now being recognized broadly,
and there is growing enthusiasm for the
notion of "Big Data." While the promise of
Big Data is real -- for example, it is
estimated that Google alone contributed 54
billion dollars to the US economy in 2009 --
there is currently a wide gap between its
potential and its realization.
Heterogeneity, scale, timeliness,
complexity, and privacy problems with Big
Data impede progress at all phases of the
pipeline that can create value from data. The
problems start right away during data
acquisition, when the data tsunami requires
us to make decisions, currently in an ad hoc
manner, about what data to keep and what to
discard, and how to store what we keep
reliably with the right metadata. Much data
today is not natively in structured format;
for example, tweets and blogs are weakly
structured pieces of text, while images and
video are structured for storage and display,
but not for semantic content and search:
transforming such content into a structured
format for later analysis is a major
challenge. The value of data
explodes when it can be linked with other
data, thus data integration is a major creator
of value. Since
most data is directly generated in
digital format today, we have the
opportunity and the challenge both to
influence the creation to facilitate later
linkage and to automatically link previously
created data. Data analysis, organization,
retrieval, and modeling are other
foundational challenges. Data analysis is a
clear bottleneck in many applications, both
due to lack of scalability of the underlying
algorithms and due to the complexity of the
data that needs to be analyzed. Finally,
presentation of the results and its
interpretation by non-technical domain
experts is crucial to extracting actionable
knowledge.
During the last 35 years, data
management principles such as physical and
logical data independence, declarative
querying and cost-based optimization have
led to a multi-billion-dollar industry. More
importantly, these technical
advances have enabled the first round of
business intelligence applications and laid
the foundation for managing and analyzing
Big Data today. The many novel challenges
and opportunities associated with Big Data
necessitate rethinking many aspects of these
data management platforms, while retaining
other desirable aspects. We believe that
appropriate investment in Big Data will lead
to a new wave of fundamental technological
advances that will be embodied in the next
generations of Big Data management and
analysis platforms, products, and systems.
We believe that these research
problems are not only timely, but also
having the potential to create huge economic
value in the US economy for years to come.
However, they are also hard, requiring us to
rethink data analysis systems in fundamental
ways. A major investment in Big Data,
properly directed, can result not only in
major scientific advances, but also lay the
foundation for the next generation of
advances in science, medicine, and business.
Challenges and Opportunities with Big
Data
1. Introduction
We are awash in a flood of data
today. In a broad range of application areas,
data is being collected at unprecedented
scale. Decisions that previously were based
on guesswork, or on painstakingly
constructed models of reality, can now be
made based on the data itself. Such Big Data
analysis now drives nearly every aspect of
our modern society, including mobile
services, retail, manufacturing, financial
services, life sciences, and physical sciences.
Scientific research has been
revolutionized by Big Data. The Sloan
Digital Sky Survey has today become a
central resource for astronomers the world
over. The field of Astronomy is being
transformed from one where taking pictures
of the sky was a large part of an
astronomer's job to one where the pictures
are all in a database already and the
astronomer's task is to find interesting
objects and phenomena in the database. In
the biological sciences, there is now a well-
established tradition of depositing scientific
data into a public repository, and also of
creating public databases for use by other
scientists. In fact, there is an entire
discipline of bioinformatics that is largely
devoted to the curation and analysis of such
data. As technology advances, particularly
with the advent of Next Generation
Sequencing, the size and number of
experimental data sets available is
increasing exponentially.
Big Data has the potential to
revolutionize not just research, but also
education. A recent detailed quantitative
comparison of different approaches taken by
35 charter schools in NYC has found that
one of the top five policies correlated with
measurable academic effectiveness was the
use of data to guide instruction. Imagine a
world in which we have access to a huge
database where we collect every detailed
measure of every student's academic
performance. This data could be used to
design the most effective approaches to
education, starting from reading, writing,
and math, to advanced, college-level,
courses. We are far from having access to
such data, but there are powerful trends in
this direction. In particular, there is a strong
trend for massive Web deployment of
educational activities, and this will generate
an increasingly large amount of detailed data
about students' performance.
It is widely believed that the use of
information technology can reduce the cost
of healthcare while improving its quality, by
making care more preventive and
personalized and basing it on more extensive
(home-based) continuous monitoring.
McKinsey estimates a savings of 300 billion
dollars every year in the US alone.
In 2010, enterprises and users stored
more than 13 exabytes of new data; this is
over 50,000 times the data in the Library of
Congress. The potential value of global
personal location data is estimated to be
$700 billion to end users, and it can result in
an up to 50% decrease in product
development and assembly costs, according
to a recent McKinsey report. McKinsey
predicts an equally great effect of Big Data
in employment, where 140,000-190,000
workers with "deep analytical" experience
will be needed in the US; furthermore, 1.5
million managers will need to become
data-literate. Not surprisingly, the recent
PCAST report on Networking and IT R&D
identified Big Data as a "research frontier"
that can "accelerate progress across a broad
range of priorities." Even popular news
media now appreciates the value of Big Data
as evidenced by coverage in the Economist
[Eco2011], the New York Times, and
National Public Radio.
While the potential benefits of Big
Data are real and significant, and some
initial successes have already been achieved
(such as the Sloan Digital Sky Survey), there
remain many technical challenges that must
be addressed to fully realize this potential.
The sheer size of the data, of course, is a
major challenge, and is the one that is most
easily recognized. However, there are
others. Industry analysis companies like to
point out that there are challenges not just in
Volume, but also in Variety and Velocity,
and that companies should not focus on just
the first of these. By Variety, they usually
mean heterogeneity of data types,
representation, and semantic interpretation.
By Velocity, they mean both the rate at
which data arrive and the time in which it
must be acted upon. While these three are
important, this short list fails to include
additional important requirements such as
privacy and usability.
The analysis of Big Data involves
multiple distinct phases as shown in the
figure below, each of which introduces
challenges. Many people unfortunately
focus just on the analysis/modeling phase:
while that phase is crucial, it is of little use
without the other phases of the data analysis
pipeline. Even in the analysis phase, which
has received much attention, there are
poorly understood complexities in the
context of multi-tenanted clusters where
several users' programs run concurrently.
Many significant challenges extend beyond
the analysis phase. For example, Big Data
has to be managed in context, which may be
noisy, heterogeneous and not include an
upfront model. Doing so raises the need to
track provenance and to handle uncertainty
and error: topics that are crucial to success,
and yet rarely mentioned in the same breath
as Big Data. Similarly, the questions to the
data analysis pipeline will typically not all
be laid out in advance. We may need to
figure out good questions based on the data.
Doing this will require smarter systems and
also better support for user interaction with
the analysis pipeline. In fact, we currently
have a major bottleneck in the number of
people empowered to ask questions of the
data and analyze it. We can drastically
increase this number by
supporting many levels of engagement with
the data, not all requiring deep database
expertise. Solutions to problems such as this
will not come from incremental
improvements to business as usual such as
industry may make on its own. Rather, they
require us to fundamentally rethink how we
manage data analysis.
Fortunately, existing computational
techniques can be applied, either as is or
with some extensions, to at least some
aspects of the Big Data problem. For
example, relational databases rely on the
notion of logical data independence:
users can think about what they want
to compute, while the system (with skilled
engineers designing those systems)
determines how to compute it efficiently.
Similarly, the SQL standard and the
relational data model provide a uniform,
powerful language to express many query
needs and, in principle, allows customers to
choose between vendors, increasing
competition. The challenge ahead of us is to
combine these healthy features of prior
systems as we devise novel solutions to the
many new challenges of Big Data.
In this paper, we consider each of the boxes
in the figure above, and discuss both what
has already been done and what challenges
remain as we seek to exploit Big Data. We
begin by considering the five stages in the
pipeline, then move on to the five
cross-cutting challenges, and end with a
discussion of the architecture of the overall
system that combines all these functions.
2. Phases in the Processing Pipeline
2.1 Data Acquisition and Recording
Big Data does not arise out of a
vacuum: it is recorded from some data
generating source. For example, consider
our ability to sense and observe the world
around us, from the heart rate of an elderly
citizen and the presence of toxins in the air
we breathe, to the planned Square Kilometre
Array telescope, which will produce up to 1
million terabytes of raw data per day.
Similarly, scientific experiments and
simulations can easily produce petabytes of
data today.
Much of this data is of no interest,
and it can be filtered and compressed by
orders of magnitude. One challenge is to
define these filters in such a way that they
do not discard useful information. For
example, suppose one sensor reading differs
substantially from the rest: it is likely to be
due to the sensor being faulty, but how can
we be sure that it is not an artifact that
deserves attention? In addition, the data
collected by these sensors most often are
spatially and temporally correlated (e.g.,
traffic sensors on the same road segment).
We need research in the science of data
reduction that can intelligently process this
raw data to a size that its users can handle
while not missing the needle in the haystack.
Furthermore, we require "on-line" analysis
techniques that can process such streaming
data on the fly, since we cannot afford to
store first and reduce afterward.
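The paragraph above warns that a reading which "differs substantially from the rest" should be flagged for attention rather than silently dropped. One possible on-line sketch of that idea, not a method from the paper, uses Welford's running mean and variance with a z-score threshold:

```python
import math

# Illustrative sketch of "filter without discarding": an online z-score
# detector that flags, rather than drops, readings far from the stream's
# running statistics (Welford's algorithm keeps mean/variance in O(1)).

class StreamFilter:
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold = threshold

    def update(self, x):
        """Return True if x looks anomalous relative to the stream so far."""
        flagged = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        # Welford update of running mean and sum of squared deviations
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
        return flagged

f = StreamFilter()
readings = [20.1, 20.3, 19.9, 20.2, 20.0, 95.0, 20.1]
flags = [f.update(r) for r in readings]
print(flags)  # only the 95.0 reading is flagged for attention
```

Whether the 95.0 reading is a faulty sensor or a genuine event is exactly the judgment call the text describes; flagging keeps that decision open.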
The second big challenge is to
automatically generate the right metadata to
describe what data is recorded and how it is
recorded and measured. For example, in
scientific experiments, considerable detail
regarding specific experimental conditions
and procedures may be required to be able to
interpret the results correctly, and it is
important that such metadata be recorded
with observational data. Metadata
acquisition systems can minimize the human
burden in recording metadata. Another
important issue here is data provenance.
Recording information about the data at its
birth is not useful unless this information
can be interpreted and carried along through
the data analysis pipeline. For example, a
processing error at one step can render
subsequent analysis useless; with suitable
provenance, we can easily identify all
subsequent processing that depended on
this step. Thus we need research both into
generating suitable metadata and into data
systems that carry the provenance of data
and its metadata through data analysis
pipelines.
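One way to picture a data system that carries provenance through a pipeline is a thin wrapper that records, alongside each value, the steps that produced it; identifying everything downstream of a faulty step then becomes a lineage lookup. The class and step names below are invented for illustration.

```python
class Tracked:
    """A value plus the lineage of processing steps that produced it."""

    def __init__(self, value, lineage=None):
        self.value = value
        self.lineage = lineage or []

    def apply(self, step_name, fn):
        # Each transformation returns a new value with an extended lineage,
        # so provenance travels with the data through the pipeline.
        return Tracked(fn(self.value), self.lineage + [step_name])

raw = Tracked([3, 1, 2, None, 5])
clean = raw.apply("drop_missing", lambda xs: [x for x in xs if x is not None])
result = clean.apply("mean", lambda xs: sum(xs) / len(xs))

print(result.value)    # 2.75
print(result.lineage)  # ['drop_missing', 'mean']
```

If "drop_missing" later turns out to be buggy, any result whose lineage contains that step is known to be suspect.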
2.2 Information Extraction and Cleaning
Frequently, the information collected
will not be in a format ready for analysis.
For example, consider the collection of
electronic health records in a hospital,
comprising transcribed dictations from
several physicians, structured data from
sensors and measurements (possibly with
some associated uncertainty), and image
data such as x-rays. We cannot leave the
data in this form and still effectively analyze
it. Rather we require an information extraction
process that pulls out the required
information from the underlying sources and
expresses it in a structured form suitable for
analysis. Doing this correctly and
completely is a continuing technical
challenge. Note that this data also includes
images and will in the future include video;
such extraction is often highly application
dependent (e.g., what you want to pull out of
an MRI is very different from what you
would pull out of a picture of the stars, or a
surveillance photo). In addition, due to the
ubiquity of surveillance cameras and
popularity of GPS-enabled mobile phones,
cameras, and other portable devices, rich
and high fidelity location and trajectory (i.e.,
movement in space) data can also be
extracted.
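At its simplest, the extraction step above turns fragments of free text into structured fields suitable for analysis. The note format, field names, and pattern below are invented for illustration; real clinical extraction is far more involved and, as noted, highly application dependent.

```python
import re

# A hypothetical transcribed dictation fragment.
note = "Patient: Jane Doe. BP: 128/84 mmHg. Dictated by Dr. A. Rao."

# Pull the blood-pressure reading out of the free text into a
# structured record that downstream analysis can consume.
record = {}
m = re.search(r"BP:\s*(\d+)/(\d+)", note)
if m:
    record["systolic"] = int(m.group(1))
    record["diastolic"] = int(m.group(2))

print(record)  # {'systolic': 128, 'diastolic': 84}
```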
We are used to thinking of Big Data
as always telling us the truth, but this is
actually far from reality. For example,
patients may choose to hide risky behavior
and caregivers may sometimes mis-diagnose
a condition; patients may also inaccurately
recall the name of a drug or even that they
ever took it, leading to missing information
in (the history portion of) their medical
record. Existing work on data cleaning
assumes well-recognized constraints on
valid data or well-understood error models;
for many emerging Big Data domains these
do not exist.

2.3 Data Integration, Aggregation, and
Representation
Given the heterogeneity of the flood
of data, it is not enough merely to record it
and throw it into a repository. Consider, for
example, data from a range of scientific
experiments. If we just have a bunch of data
sets in a repository, it is unlikely anyone will
ever be able to find, let alone reuse, any of
this data. With adequate metadata, there is
some hope, but even so, challenges will
remain due to differences in experimental
details and in data record structure.
Data analysis is considerably more
challenging than simply locating,
identifying, understanding, and citing data.
For effective large-scale analysis all of this
has to happen in a completely automated
manner. This requires differences in data
structure and semantics to be expressed in
forms that are computer understandable, and
then "robotically" resolvable. There is a
strong body of work in data integration that
can provide some of the answers. However,
considerable additional work is required to
achieve automated error-free difference
resolution.
Even for simpler analyses that
depend on only one data set, there remains
an important question of suitable database
design. Usually, there will be many
alternative ways in which to store the same
information. Certain designs will have
advantages over others for certain purposes,
and possibly drawbacks for other purposes.
Witness, for instance, the tremendous
variety in the structure of bioinformatics
databases with information regarding
substantially similar entities, such as genes.
Database design is today an art, and is
carefully executed in the enterprise context
by highly-paid professionals. We must
enable other professionals, such as domain
scientists, to create effective database
designs, either through devising tools to
assist them in the design process or through
forgoing the design process completely and
developing techniques so that databases can
be used effectively in the absence of
intelligent database design.
2.4 Query Processing, Data Modeling, and
Analysis
Methods for querying and mining
Big Data are fundamentally different from
traditional statistical analysis on small
samples. Big Data is often noisy, dynamic,
heterogeneous, inter-related and
untrustworthy. Nevertheless, even noisy Big
Data could be more valuable than tiny
samples because general statistics obtained
from frequent patterns and correlation
analysis usually overpower individual
fluctuations and often disclose more reliable
hidden patterns and knowledge. Further,
interconnected Big Data forms large
heterogeneous information networks, with
which information redundancy can be
explored to compensate for missing data, to
crosscheck conflicting cases, to validate
trustworthy relationships, to disclose
inherent clusters, and to uncover hidden
relationships and models.
Mining requires integrated, cleaned,
trustworthy, and efficiently accessible data,
declarative query and mining interfaces,
scalable mining algorithms, and big-data
computing environments. At the same time,
data mining itself can also be used to help
improve the quality and trustworthiness of
the data, understand its semantics, and
provide intelligent querying functions. As
noted previously, real-life medical records
have errors, are heterogeneous, and
frequently are distributed across multiple
systems. The value of Big Data analysis in
health care, to take just one example
application domain, can only be realized if it
can be applied robustly under these difficult
conditions. On the flip side, knowledge
developed from data can help in correcting
errors and removing ambiguity. For
example, a physician may write "DVT" as
the diagnosis for a patient. This abbreviation
is commonly used for both "deep vein
thrombosis" and "diverticulitis," two very
different medical conditions. A knowledge
base constructed from related data can use
associated symptoms or medications to
determine which of the two the physician
meant.
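The disambiguation idea can be sketched with a tiny hand-built knowledge base that maps each expansion of the abbreviation to medications commonly associated with it; the observed medication list then decides. The drug associations here are simplified assumptions for illustration only, not clinical fact.

```python
# Hypothetical knowledge base: condition -> associated medications.
KNOWLEDGE_BASE = {
    "deep vein thrombosis": {"heparin", "warfarin"},
    "diverticulitis": {"ciprofloxacin", "metronidazole"},
}

def expand(candidates, observed_medications):
    """Pick the expansion whose associated drugs best overlap
    with the medications actually recorded for the patient."""
    scores = {
        condition: len(drugs & observed_medications)
        for condition, drugs in candidates.items()
    }
    return max(scores, key=scores.get)

meds = {"warfarin", "acetaminophen"}
print(expand(KNOWLEDGE_BASE, meds))  # deep vein thrombosis
```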
Big Data is also enabling the next
generation of interactive data analysis with
real-time answers. In the future, queries
towards Big Data will be automatically
generated for content creation on websites,
to populate hot-lists or recommendations,
and to provide an ad hoc analysis of the
value of a data set to decide whether to store
or to discard it. Scaling complex query
processing techniques to terabytes while
enabling interactive response times is a
major open research problem today.
A problem with current Big Data
analysis is the lack of coordination between
database systems, which host the data and
provide SQL querying, with analytics
packages that perform various forms of non-
SQL processing, such as data mining and
statistical analyses. Today's analysts are
impeded by a tedious process of exporting
data from the database, performing a non-
SQL process and bringing the data back.
This is an obstacle to carrying over the
interactive elegance of the first generation of
SQL-driven OLAP systems into the data
mining type of analysis that is in increasing
demand. A tight coupling between
declarative query languages and the
functions of such packages will benefit both
expressiveness and performance of the
analysis.

2.5 Interpretation

Having the ability to analyze Big Data is of
limited value if users cannot understand the
analysis. Ultimately, a decision-maker,
provided with the result of analysis, has to
interpret these results. This interpretation
cannot happen in a vacuum. Usually, it
involves examining all the assumptions
made and retracing the analysis.
Furthermore, as we saw above, there are
many possible sources of error: computer
systems can have bugs, models almost
always have assumptions, and results can be
based on erroneous data. For all of these
reasons, no responsible user will cede
authority to the computer system. Rather she
will try to understand, and verify, the results
produced by the computer. The computer
system must make it easy for her to do so.
This is particularly a challenge with Big
Data due to its complexity. There are often
crucial assumptions behind the data
recorded. Analytical pipelines can often
involve multiple steps, again with
assumptions built in. The recent mortgage-
related shock to the financial system
dramatically underscored the need for such
decision-maker diligence -- rather than
accept the stated solvency of a financial
institution at face value, a decision-maker
has to examine critically the many
assumptions at multiple stages of analysis.
In short, it is rarely enough to
provide just the results. Rather, one must
provide supplementary information that
explains how each result was derived, and
based upon precisely what inputs. Such
supplementary information is called the
provenance of the (result) data. By studying
how best to capture, store, and query
provenance, in conjunction with techniques
to capture adequate metadata, we can create
an infrastructure to provide users with the
ability both to interpret analytical results
obtained and to repeat the analysis with
different assumptions, parameters, or data
sets.
Systems with a rich palette of
visualizations become important in
conveying to the users the results of the
queries in a way that is best understood in
the particular domain. Whereas early
business intelligence systems' users were
content with tabular presentations, today's
analysts need to pack and present results in
powerful visualizations that assist
interpretation, and support user
collaboration as discussed in Sec. 3.5.
Furthermore, with a few clicks the
user should be able to drill down into each
piece of data that she sees and understand its
provenance, which is a key feature to
understanding the data. That is, users need
to be able to see not just the results, but also
understand why they are seeing those
results. However, raw provenance,
particularly regarding the phases in the
analytics pipeline, is likely to be too
technical for many users to grasp
completely. One alternative is to enable the
users to "play" with the steps in the
analysis – make small changes to the
pipeline, for example, or modify values for
some parameters. The users can then view
the results of these incremental changes. By
these means, users can develop an intuitive
feeling for the analysis and also verify that it
performs as expected in corner cases.
Accomplishing this requires the system to
provide convenient facilities for the user to
specify analyses. Declarative specification,
discussed in Sec. 4, is one component of
such a system.
3. Challenges in Big Data Analysis
Having described the multiple phases
in the Big Data analysis pipeline, we now
turn to some common challenges that
underlie many, and sometimes all, of these
phases. These are shown as five boxes in the
second row of Fig. 1.
3.1 Heterogeneity and Incompleteness
When humans consume information,
a great deal of heterogeneity is comfortably
tolerated. In fact, the nuance and richness of
natural language can provide valuable depth.
However, machine analysis algorithms
expect homogeneous data, and cannot
understand nuance. In consequence, data
must be carefully structured as a first step in
(or prior to) data analysis. Consider, for
example, a patient who has multiple medical
procedures at a hospital. We could create
one record per medical procedure or
laboratory test, one record for the entire
hospital stay, or one record for all lifetime
hospital interactions of this patient. With
anything other than the first design, the
number of medical procedures and lab tests
per record would be different for each
patient. The three design choices listed have
successively less structure and, conversely,
successively greater variety. Greater
structure is likely to be required by many
(traditional) data analysis systems. However,
the less structured design is likely to be
more effective for many purposes – for
example questions relating to disease
progression over time will require an
expensive join operation with the first two
designs, but can be avoided with the third.
However, computer systems work most
efficiently if they can store multiple items
that are all identical in size and structure.
Efficient representation, access, and analysis
of semi-structured data require further work.
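The trade-off between the per-procedure and per-patient designs can be made concrete with a small sketch. The records and field names are invented for illustration: the flat design is uniform in structure, but a disease-progression question requires grouping by patient (the in-memory analogue of the expensive join), while the nested per-patient design answers it directly.

```python
flat_rows = [  # one record per procedure: identical size and structure
    {"patient": "p1", "date": "2020-01-05", "procedure": "x-ray"},
    {"patient": "p2", "date": "2020-02-01", "procedure": "mri"},
    {"patient": "p1", "date": "2021-03-10", "procedure": "ct"},
]

nested = {  # one record per patient: variable length, but join-free
    "p1": [("2020-01-05", "x-ray"), ("2021-03-10", "ct")],
    "p2": [("2020-02-01", "mri")],
}

# Timeline for p1 from the flat design: filter, group, then sort.
timeline_flat = sorted(
    (r["date"], r["procedure"]) for r in flat_rows if r["patient"] == "p1"
)
# The nested design already holds the same timeline in one record.
assert timeline_flat == nested["p1"]
print(timeline_flat)
```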
Consider an electronic health record
database design that has fields for birth date,
occupation, and blood type for each patient.
What do we do if one or more of these
pieces of information is not provided by a
patient? Obviously, the health record is still
placed in the database, but with the
corresponding attribute values being set to
NULL. A data analysis that looks to classify
patients by, say, occupation, must take into
account patients for which this information
is not known. Worse, these patients with
unknown occupations can be ignored in the
analysis only if we have reason to believe
that they are otherwise statistically similar to
the patients with known occupation for the
analysis performed. For example, if
unemployed patients are more likely to hide
their employment status, analysis results
may be skewed: they would reflect a more
employed population mix than actually exists, and
hence potentially one with different
occupation-related health profiles.
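The skew described above is easy to see numerically. All of the counts below are made up for illustration: if the patients with NULL occupation are disproportionately unemployed, the "drop the NULLs" estimate overstates the employed share.

```python
# 50 employed, 20 unemployed, 30 unrecorded (NULL occupation).
patients = (
    [{"occupation": "employed"}] * 50
    + [{"occupation": "unemployed"}] * 20
    + [{"occupation": None}] * 30  # suppose most of these are unemployed
)

# Naive analysis: ignore patients with unknown occupation.
known = [p for p in patients if p["occupation"] is not None]
employed_share_naive = sum(
    p["occupation"] == "employed" for p in known
) / len(known)

print(round(employed_share_naive, 2))  # 0.71 among known records...
# ...but if, say, 25 of the 30 NULLs are actually unemployed,
# the true employed share is 50 / 100 = 0.50.
```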
Even after data cleaning and error
correction, some incompleteness and some
errors in data are likely to remain. This
incompleteness and these errors must be
managed during data analysis. Doing this
correctly is a challenge. Recent work on
managing probabilistic data suggests one
way to make progress.

3.2 Scale
Of course, the first thing anyone
thinks of with Big Data is its size. After all,
the word "big" is there in the very name.
Managing large and rapidly increasing
volumes of data has been a challenging issue
for many decades. In the past, this challenge
was mitigated by processors getting faster,
following Moore's law, to provide us with the resources needed to cope with increasing
volumes of data. But, there is a fundamental
shift underway now: data volume is scaling faster than compute resources, and CPU
speeds are static.
First, over the last five years the
processor technology has made a dramatic
shift - rather than processors doubling their
clock cycle frequency every 18-24 months,
now, due to power constraints, clock speeds
have largely stalled and processors are being
built with increasing numbers of cores. In
the past, large data processing systems had
to worry about parallelism across nodes in a
cluster; now, one has to deal with
parallelism within a single node.
Unfortunately, parallel data processing
techniques that were applied in the past for
processing data across nodes don't directly
apply for intra-node parallelism, since the
architecture looks very different; for
example, there are many more hardware
resources such as processor caches and
processor memory channels that are shared
across cores in a single node. Furthermore,
the move towards packing multiple sockets
(each with 10s of cores) adds another level
of complexity for intra-node parallelism.
Finally, with predictions of "dark silicon",
namely that power considerations will likely
in the future prohibit us from using all of the
hardware in the system continuously, data
processing systems will likely have to
actively manage the power consumption of
the processor. These unprecedented changes
require us to rethink how we design, build
and operate data processing components.
The second dramatic shift that is
underway is the move towards cloud
computing, which now aggregates multiple
disparate workloads with varying
performance goals (e.g. interactive services
demand that the data processing engine
return an answer within a fixed
response time cap) into very large clusters.
This level of sharing of resources on
expensive and large clusters requires new
ways of determining how to run and execute
data processing jobs so that we can meet the
goals of each workload cost-effectively, and
to deal with system failures, which occur
more frequently as we operate on larger and
larger clusters (that are required to deal with
the rapid growth in data volumes). This
places a premium on declarative approaches
to expressing programs, even those doing
complex machine learning tasks, since
global optimization across multiple users'
programs is necessary for good overall
performance. Reliance on user-driven
program optimizations is likely to lead to
poor cluster utilization, since users are
unaware of other users' programs. System-
driven holistic optimization requires
programs to be sufficiently transparent, e.g.,
as in relational database systems, where
declarative query languages are designed
with this in mind.
A third dramatic shift that is
underway is the transformative change of
the traditional I/O subsystem. For many
decades, hard disk drives (HDDs) were used
to store persistent data. HDDs had far slower
random IO performance than sequential IO
performance, and data processing engines
formatted their data and designed their query
processing methods to "work around" this
limitation. But, HDDs are increasingly being
replaced by solid state drives today, and
other technologies such as Phase Change
Memory are around the corner. These newer
storage technologies do not have the same
large spread in performance between the
sequential and random I/O performance,
which requires a rethinking of how we
design storage subsystems for data
processing systems. Implications of this
changing storage subsystem potentially
touch every aspect of data processing,
including query processing algorithms,
query scheduling, database design,
concurrency control methods and recovery
methods.
3.3 Timeliness
The flip side of size is speed. The
larger the data set to be processed, the
longer it will take to analyze. The design of
a system that effectively deals with size is
likely also to result in a system that can
process a given size of data set faster.
However, it is not just this speed that is
usually meant when one speaks of Velocity
in the context of Big Data. Rather, there is
an acquisition rate challenge as described in
Sec. 2.1, and a timeliness challenge
described next.
There are many situations in which
the result of the analysis is required
immediately. For example, if a fraudulent
credit card transaction is suspected, it should
ideally be flagged before the transaction is
completed – potentially preventing the
transaction from taking place at all.
Obviously, a full analysis of a user's
purchase history is not likely to be feasible
in real-time. Rather, we need to develop
partial results in advance so that a small
amount of incremental computation with
new data can be used to arrive at a quick
determination.
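The "partial results in advance" idea can be sketched as a tiny running profile per card (count, sum, and sum of squares of amounts), so that a new transaction is scored in O(1) time without rescanning the purchase history. The 3-sigma rule, the minimum-history cutoff, and the field names are all illustrative assumptions.

```python
profiles = {}  # card_id -> (n, sum_of_amounts, sum_of_squares)

def observe(card, amount):
    """Incrementally fold a completed transaction into the profile."""
    n, s, sq = profiles.get(card, (0, 0.0, 0.0))
    profiles[card] = (n + 1, s + amount, sq + amount * amount)

def suspicious(card, amount, k=3.0):
    """Score a new transaction against the precomputed profile in O(1)."""
    n, s, sq = profiles.get(card, (0, 0.0, 0.0))
    if n < 5:
        return False  # too little history to judge
    mean = s / n
    var = max(sq / n - mean * mean, 0.0)
    return abs(amount - mean) > k * (var ** 0.5 + 1e-9)

for amt in [20.0, 25.0, 18.0, 22.0, 30.0]:
    observe("card-1", amt)

print(suspicious("card-1", 24.0))   # False: in line with history
print(suspicious("card-1", 950.0))  # True: flag before completing
```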
Given a large data set, it is often
necessary to find elements in it that meet a
specified criterion. In the course of data
analysis, this sort of search is likely to occur
repeatedly. Scanning the entire data set to
find suitable elements is obviously
impractical. Rather, index structures are
created in advance to permit finding
qualifying elements quickly. The problem is
that each index structure is designed to
support only some classes of criteria. With
new analyses desired using Big Data, there
are new types of criteria specified, and a
need to devise new index structures to
support such criteria. For example, consider
a traffic management system with
information regarding thousands of vehicles
and local hot spots on roadways. The system
may need to predict potential congestion
points along a route chosen by a user, and
suggest alternatives. Doing so requires
evaluating multiple spatial proximity queries
working with the trajectories of moving
objects. New index structures are required to
support such queries. Designing such
structures becomes particularly challenging
when the data volume is growing rapidly
and the queries have tight response time
limits.

3.4 Privacy
The privacy of data is another huge
concern, and one that increases in the
context of Big Data. For electronic health
records, there are strict laws governing what
can and cannot be done. For other data,
regulations, particularly in the US, are less
forceful. However, there is great public fear
regarding the inappropriate use of personal
data, particularly through linking of data
from multiple sources. Managing privacy is
effectively both a technical and a
sociological problem, which must be
addressed jointly from both perspectives to
realize the promise of big data.
Consider, for example, data gleaned
from location-based services. These new
architectures require a user to share his/her
location with the service provider, resulting
in obvious privacy concerns. Note that
hiding the user's identity alone
without hiding her location would not
properly address these privacy concerns. An
attacker or a (potentially malicious)
location-based server can infer the identity
of the query source from its (subsequent)
location information. For example, a user's
location information can be tracked through
several stationary connection points (e.g.,
cell towers). After a while, the user leaves
"a trail of packet crumbs" which could be
associated to a certain residence or office
location and thereby used to determine the
user's identity. Several other types of
surprisingly private information such as
health issues (e.g., presence in a cancer
treatment center) or religious preferences
(e.g., presence in a church) can also be
revealed by just observing anonymous
users' movement and usage pattern over
time. In general, Barabási et al. showed that
there is a close correlation between people's
identities and their movement patterns
[Gon2008]. Note that hiding a user location
is much more challenging than hiding
his/her identity. This is because with
location-based services, the location of the
user is needed for a successful data access or
data collection, while the identity of the user
is not necessary.
There are many additional
challenging research problems. For example,
we do not know yet how to share private
data while limiting disclosure and ensuring
sufficient data utility in the shared data. The
existing paradigm of differential privacy is a
very important step in the right direction, but
it unfortunately reduces information content
too far in order to be useful in most practical
cases. In addition, real data is not static but
gets larger and changes over time; none of
the prevailing techniques results in any
useful content being released in this
scenario. Yet another very important
direction is to rethink security for
information sharing in Big Data use cases.
Many online services today require us to
share private information (think of Facebook
applications), but beyond record-level
access control we do not understand what it
means to share data, how the shared data can
be linked, and how to give users fine-grained
control over this sharing.

3.5 Human Collaboration
In spite of the tremendous advances
made in computational analysis, there
remain many patterns that humans can easily
detect but computer algorithms have a hard
time finding. Indeed, CAPTCHAs exploit
precisely this fact to tell human web users
apart from computer programs. Ideally,
analytics for Big Data will not be all
computational – rather it will be designed
explicitly to have a human in the loop. The
new sub-field of visual analytics is
attempting to do this, at least with respect to
the modeling and analysis phase in the
pipeline. There is similar value to human
input at all stages of the analysis pipeline.
In today's complex world, it often
takes multiple experts from different
domains to really understand what is going
on. A Big Data analysis system must support
input from multiple human experts, and
shared exploration of results. These multiple
experts may be separated in space and time
when it is too expensive to assemble an
entire team together in one room. The data
system has to accept this distributed expert
input, and support their collaboration.
A popular new method of harnessing
human ingenuity to solve problems is
through crowd-sourcing. Wikipedia, the
online encyclopedia, is perhaps the best
known example of crowd-sourced data. We
are relying upon information provided by
unvetted strangers. Most often, what they
say is correct. However, we should expect
there to be individuals who have other
motives and abilities – some may have a
reason to provide false information in an
intentional attempt to mislead. While most
such errors will be detected and corrected by
others in the crowd, we need technologies to
facilitate this. We also need a framework to
use in analysis of such crowd-sourced data
with conflicting statements. As humans, we
can look at reviews of a restaurant, some of
which are positive and others critical, and
come up with a summary assessment based
on which we can decide whether to try
eating there. We need computers to be able
to do the equivalent. The issues of
uncertainty and error become even more
pronounced in a specific type of crowd-
sourcing, termed participatory-sensing. In
this case, every person with a mobile phone
can act as a multi-modal sensor collecting
various types of data instantaneously (e.g.,
picture, video, audio, location, time, speed,
direction, acceleration). The extra challenge
here is the inherent uncertainty of the data
collection devices. The fact that collected
data are probably spatially and temporally
correlated can be exploited to better assess
their correctness. When crowd-sourced data
is obtained for hire, such as with
"Mechanical Turks," much of the data
created may be with a primary objective of
getting it done quickly rather than correctly.
This is yet another error model, which must
be planned for explicitly when it applies.

4. System Architecture
Companies today already use, and
appreciate the value of, business
intelligence. Business data is analyzed for
many purposes: a company may perform
system log analytics and social media
analytics for risk assessment, customer
retention, brand management, and so on.
Typically, such varied tasks have been
handled by separate systems, even if each
system includes common steps of
information extraction, data cleaning,
relational-like processing (joins, group-by,
aggregation), statistical and predictive
modeling, and appropriate exploration and
visualization tools as shown in Fig. 1.
With Big Data, the use of separate
systems in this fashion becomes
prohibitively expensive given the large size
of the data sets. The expense is due not only
to the cost of the systems themselves, but
also the time to load the data into multiple
systems. In consequence, Big Data has made
it necessary to run heterogeneous workloads
on a single infrastructure that is sufficiently
flexible to handle all these workloads. The
challenge here is not to build a system that is
ideally suited for all processing tasks.
Instead, the need is for the underlying
system architecture to be flexible enough
that the components built on top of it for
expressing the various kinds of processing
tasks can tune it to efficiently run these
different workloads. The effects of scale on
the physical architecture were considered in
Sec 3.2. In this section, we focus on the
programmability requirements.
If users are to compose and build
complex analytical pipelines over Big Data,
it is essential that they have appropriate
high-level primitives to specify their needs
in such flexible systems. The Map-Reduce
framework has been tremendously valuable,
but is only a first step. Even declarative
languages that exploit it, such as Pig Latin,
are at a rather low level when it comes to
complex analysis tasks. Similar declarative
specifications are required at higher levels to
meet the programmability and composition
needs of these analysis pipelines. Besides
the basic technical need, there is a strong
business imperative as well. Businesses
typically will outsource Big Data
processing, or many aspects of it.
Declarative specifications are required to
enable technically meaningful service level
agreements, since the point of the outsourcing is to
specify precisely what task will be
performed without going into details of how to do it.
Declarative specification is needed
not just for the pipeline composition, but
also for the individual operations
themselves. Each operation (cleaning,
extraction, modeling etc.) potentially runs
on a very large data set. Furthermore, each
operation itself is sufficiently complex that
there are many choices and optimizations
possible in how it is implemented. In
databases, there is considerable work on
optimizing individual operations, such as
joins. It is well-known that there can be
multiple orders of magnitude difference in
the cost of two different ways to execute the
same query. Fortunately, the user does not
have to make this choice – the database
system makes it for her. In the case of Big
Data, these optimizations may be more
complex because not all operations will be
I/O intensive as in databases. Some
operations may be, but others may be CPU
intensive, or a mix. So standard database
optimization techniques cannot directly be
used. However, it should be possible to
develop new techniques for Big Data
operations inspired by database techniques.
The very fact that Big Data analysis
typically involves multiple phases highlights
a challenge that arises routinely in practice:
production systems must run complex
analytic pipelines, or workflows, at routine
intervals, e.g., hourly or daily. New data
must be incrementally accounted for, taking
into account the results of prior analysis and
pre-existing data. And of course, provenance
must be preserved, and must include the
phases in the analytic pipeline. Current
systems offer little to no support for such
Big Data pipelines, and this is in itself a
challenging objective.

5. Conclusion
We have entered an era of Big Data.
Through better analysis of the large volumes
of data that are becoming available, there is
the potential for making faster advances in
many scientific disciplines and improving
the profitability and success of many
enterprises. However, many technical
challenges described in this paper must be
addressed before this potential can be
realized fully. The challenges include not
just the obvious issues of scale, but also
heterogeneity, lack of structure, error-
handling, privacy, timeliness, provenance,
and visualization, at all stages of the analysis
pipeline from data acquisition to result
interpretation. These technical challenges
are common across a large variety of
application domains, and therefore not cost-
effective to address in the context of one
domain alone. Furthermore, these challenges
will require transformative solutions, and
will not be addressed naturally by the next
generation of industrial products. We must
support and encourage fundamental research
towards addressing these technical
challenges if we are to achieve the promised
benefits of Big Data.
Contribution of India's IT Industry to Economic Progress
Aditya Kurane
[S.Y.M.C.A.]
The contribution of India's IT industry to
economic progress has been quite
significant. The rapidly expanding socio-
economic infrastructure has proved to be of
great use in supporting the growth of Indian
information technology industry. The
flourishing Indian economy has helped the IT sector maintain its competitiveness in the global market. The IT
and IT enabled services industry in India has
recorded a growth rate of 22.4% in the last
fiscal year. The total revenue from this
sector was valued at 2.46 trillion Indian
rupees in the fiscal year 2007. Out of this figure, the domestic IT market in India accounted for 900 billion rupees, with the remainder coming from exports. The IT sector in India has thus played a major role in drawing foreign funds into the domestic market.
The growth and prosperity of India's IT
industry depends on some crucial factors.
These factors are as follows:
- India is home to a large number of IT professionals who have the necessary skill and expertise to meet the demands and expectations of the global IT industry.

- The cost of the skilled Indian workforce is reasonably low compared to that in developed nations. This makes Indian IT services highly cost-efficient, and it is also the reason IT-enabled services such as business process outsourcing and knowledge process outsourcing have expanded significantly in the Indian job market.

- India has a huge pool of English-speaking IT professionals. This is why English-speaking countries such as the US and the UK depend on the Indian IT industry for outsourcing their business processes. Also, the Indian accent, which is relatively neutral, plays a major role and enables effective client-professional communication.
The emergence of the Indian information technology sector has brought about a sea change in the Indian job market. The IT
sector of India offers a host of
opportunities of employment. With IT
giants like Infosys, Cognizant, Wipro,
Tata Consultancy Services, Accenture
and several other IT firms operating in
some of the major Indian cities, there is
no dearth of job opportunities for the
Indian software professionals. The IT
enabled sector of India absorbs a large
number of graduates from general
stream into BPO and KPO firms. All of this has eased India's unemployment problem to a great extent. The
average purchasing power of the
common people of India has improved
substantially. The consumption spending
has recorded an all-time high. The
aggregate demand has increased as a
result. All these have improved the gross
production of goods and services in the
Indian economy. So in conclusion it can
be said that the growth of India's IT
industry has been instrumental in
facilitating the economic progress of
India.
Current trends in Information Technology
Monika Wicks
SY MCA
The current world is more techno-centric than ever. The rapidly expanding
information sector has left a huge disparity
between where the world is heading and the
approaches businesses are employing to run
their operations. The challenges to
businesses are therefore phenomenal, especially considering the fact that the IT industry is undergoing a tectonic shift
in technology. Different aspects of the computing landscape are changing at the same time, including communication, delivery platforms and collaboration channels. Within the information technology sector, technological innovations are short-lived, as they change frequently over time.
Some of the latest trends that have brought a revolution to the current IT industry are:
Cloud Computing
It is certainly one of the most
sophisticated of the latest trends in
information technology. Cloud computing provides software, computation, data access and storage services without the end user needing to know the physical location or configuration of the system that provides the service. It is especially effective in cutting running costs for businesses, particularly data storage and other operating costs. Data centers are now being downsized to pave the way for cloud storage. Cloud computing also has built-in scalability and elasticity features that can efficiently support the growth of businesses.
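The elasticity idea mentioned above can be illustrated with a toy autoscaling rule: capacity follows load. This is a minimal sketch only; the function name, the per-instance capacity figure and the fleet limits below are invented for illustration and are not taken from any particular cloud provider.

```python
# Toy autoscaling rule illustrating cloud elasticity: capacity follows load.
# All names, the per-instance capacity and the fleet limits are invented
# for illustration; real providers expose much richer scaling policies.
import math

def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float = 100.0,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many instances the current load calls for."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# As traffic grows, the (virtual) fleet grows with it; as traffic falls,
# instances are released and the business stops paying for them.
print(desired_instances(30))    # light load -> 1
print(desired_instances(250))   # moderate load -> 3
print(desired_instances(5000))  # heavy load, capped -> 20
```

Because the fleet shrinks as well as grows, the business pays only for the capacity it actually uses, which is the cost advantage described above.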
Consumerization of Information Technology
Technological innovation is actually driven
by the consumer world. More applications are being built specifically for mobile users rather than as replacements for desktop applications. The days of monolithic suites are slowly fading away; they are being supplanted by applications designed specifically for tablets and smartphones.
Big data/analytics and patterns
As companies continue to drown in unstructured data which they hardly access, innovations in the software development life cycle (SDLC) are being incorporated in order to manage data. There are different kinds of SDLC, which include the waterfall model and the Agile Development Methodology. Some of the features of ADM include continuous integration, pair programming, spike solutions and refactoring. The waterfall model is more traditional but is fast being replaced by Agile Development Methodology. Other effective approaches to data management include technologies such as in-line deduplication, flash or solid-state drives, and automated tiering of data.
Resource management
Servers are being virtualized, which helps businesses reduce workload-management overhead. Data centers are moving towards smaller sizes but with greater density of data storage, i.e. the creation of virtually limitless data centers. Virtualization enables data centers to be scaled vertically; its use optimizes server performance, freeing floor space and saving energy.

New development platforms: these include Java and .NET. Some of the features and benefits of .NET include a fast turnaround time, a simpler AJAX implementation, and a single framework that handles a variety of operations.
There is therefore no need for multiple
frameworks from different vendors in order
to perform different functionalities. It is also better funded, thus enabling new features to come out at the fastest pace possible. Some
of the features integrated into the platform
include LINQ, AJAX, the Unit Testing
Framework, Performance Profiler, and
Client Side Reporting among various other
features. Java is quite similar to .NET in
features and benefits.
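To give a flavour of the declarative query style that LINQ popularized, here is an illustrative sketch in Python: a comprehension stands in for the LINQ query shown in the comment, and the data and field names are made up for the example.

```python
# Illustrative sketch of the declarative-query idea behind LINQ, written
# in Python purely for comparison (the data and field names are made up).
# The roughly equivalent C# LINQ query expression would read:
#   from o in orders where o.Total > 100 select o.Id
orders = [
    {"id": 1, "total": 250.0},
    {"id": 2, "total": 40.0},
    {"id": 3, "total": 120.0},
]

# Filter and project in a single declarative expression, as LINQ does,
# instead of writing an explicit loop with mutable state.
big_order_ids = [o["id"] for o in orders if o["total"] > 100]
print(big_order_ids)  # -> [1, 3]
```

The point in both languages is the same: the query says *what* to select, and the runtime decides *how* to iterate.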
Fabrics
This is the vertical integration of server, network and storage systems, along with components that have element-level management software, laying a foundation on which shared data resources can be optimized effectively and dynamically. Vendors incorporating this approach include Cisco and HP, which use it to unify network control.
WINDOWS 8
Kuldeep Jain
S.Y. MCA (Com.)
Windows 8 was released to manufacturing on August 1, 2012, and became generally available on October 26, 2012.
Windows 8 introduced major changes to the
operating system's platform and user
interface to improve its user experience
on tablets, where Windows was now
competing with mobile operating systems,
including Android and iOS. In particular,
these changes included a touch-
optimized Windows shell based on
Microsoft's "Metro" design language,
the Start screen (which displays programs
and dynamically updated content on a grid
of tiles), a new platform for
developing apps with an emphasis
on touchscreen input, integration with online
services (including the ability to sync apps
and settings between devices), and Windows
Store, an online store for downloading and
purchasing new software. Windows 8 added
support for USB 3.0, Advanced Format hard
drives, near field communications,
and cloud computing. Additional security
features were introduced, such as built-
in antivirus software, integration with the Microsoft SmartScreen phishing-filtering service, and support for UEFI
Secure Boot on supported devices
with UEFI firmware, to
prevent malware from infecting the boot
process.
Windows 8 hasn't done fantastically well in terms of public reception, even leading some at Microsoft to say that the company's "Start Screen first" mentality was wide of
the mark. Sales of the software
also struggled at first, but after 90 days Microsoft indicated it had shifted enough licenses to match the pace of Windows 7. More than 100 million Windows 8 licenses have now been sold by Microsoft.
On October 17, 2013, Microsoft released the
first major update to the operating
system, Windows 8.1. The update addresses
some aspects of Windows 8 that were
criticized by reviewers and early
adopters and incorporates additional
improvements to various aspects of the
operating system.