MAEER’S MIT-SOM College, Pune
MIT-SOM College Journal on “Innovation in IT” 1
MAEER’S
MIT-SOM COLLEGE | PUNE | INDIA
Affiliated to Savitribai Phule Pune University & Accredited by NAAC with “A” Grade
INNOVATION IN IT
Research journal on Information Technology
From the Editor’s desk
It is heartening to see that research and writing on varied aspects of information technology in the Indian environment are growing. We at MITSOM College are delighted to launch our IT research journal, “Innovation in IT”, as part of this movement.
We are grateful to the many authors and institutes contributing to our endeavour to promote research in information technology. MITSOM College upholds and preserves the quest for academic enrichment and interpersonal development, and research is at the core of curricular excellence. Research fosters positive thinking and the motivation to excel, which in turn builds self-esteem and confidence.
This journal aims to support and promote research in many fields of IT, such as Computer Engineering, Computer Science, Computer Technology, Cybercrime, E-Business, Engineering Management, Engineering Technology, Industrial Technology, Information Systems and many more.
I would like to congratulate the team that strived for the grand success of this journal, and we look forward to continued interest in and contributions towards it.
Dr. R.M. Chitnis
Principal
INDEX
Research Paper
1) A Study of Cloud Based Technology for Professional Education in India
Prof. Rahul K. Mahakal and Dr. Shivaji D. Mundhe
2) Commerce Technology
Prof. Nidhi Satavlekar
3) Dial M for E Commerce
Prof. Shrinivas Kulkarni
4) Green Computing – Development of Industry and Prevention of Environment
Mrs. Poonam A. Lalwani, Ms. Aparna N. Kulkarni and Dr. Abhishek V. Jain
5) Challenges and Opportunities with Big Data
Prof. Seema Rawat
Research Article
1) Contribution of India's IT Industry to Economic Progress
Aditya Kurane, S.Y. M.C.A. (Comm.)
2) Current Trends in Information Technology
Monika Wicks, S.Y. M.C.A. (Comm.)
3) Windows 8
Kuldeep Jain, S.Y. M.C.A. (Comm.)
A STUDY OF CLOUD BASED TECHNOLOGY FOR PROFESSIONAL
EDUCATION IN INDIA
Prof. Rahul K. Mahakal
Assistant Professor,
MITSOM College, Pune
Dr. Shivaji D. Mundhe
Director – MCA
Sinhgad IMCA, Narhe, Pune
ABSTRACT:
Cloud computing provides a shared pool of computing resources that can be provisioned and released on the user’s demand, serving a wide and constantly expanding range of information-processing needs according to the necessity and elasticity of demand. Owing to its huge benefits, the technology is growing rapidly and is being adopted in various domains such as business, education and government. In this paper, we study how cloud computing can benefit professional education in India. We also discuss the cloud computing educational environment and explore how universities and institutions may take advantage of clouds not only in terms of cost but also in terms of efficiency, reliability, portability, flexibility, and security.
KEYWORDS:
Cloud Computing, Web-based Learning,
Education System, Professional Education
System
I. INTRODUCTION:
Education is the most important pillar for developing countries, through which the growth of society can be achieved. The population of India is very large, so providing education to every individual is very difficult in practice; we therefore need a paradigm through which we can achieve it. The best such paradigm in education is e-learning, which commonly refers to the intentional use of networked information and communications technology (ICT) in teaching and learning. It can also be described as a new way of learning through terms such as online learning, virtual learning, distributed learning, and network- or web-based learning. Over the last few years we have observed growing demand for e-learning-based applications in many directions. E-learning-based education can be useful for both distance education programs and residential campus-based education programs. For distance education programs, e-learning acts as a logical extension of their distance-education activities; for residential campus-based educational organizations, it is a way of improving access to their various programs and a path to capturing a position in growing niche markets [11-14].
II. PROBLEMS IN E-LEARNING:
E-learning-based education has tremendous opportunity in a country like India. Through this educational mode we can provide the same kind of education to all the people of the country. The growth of e-learning applications is directly connected to increased access to ICT and to cost reduction in education. One of the reasons for the growing interest in e-learning is the use of multimedia resources in the teaching and learning process. Nowadays most teachers make use of ICT in their teaching sessions, and most educational organizations offer their programs through this mode so that they can reach the maximum number of students. ICT has given all aspirants the opportunity to learn from anywhere, at any time.
In spite of the popularity of e-learning, it has various constraints and limitations. The main obstacle to the growth of e-learning is the access problem due to poor infrastructure, without which there can be no e-learning. The other limitations are cost-related: although software and hardware costs are falling, e-learning applications still incur deployment costs, support and maintenance costs, and the cost of trained staff. [13]
III. CLOUD COMPUTING:
Cloud computing is a fast-growing area which attracts users from many disciplines, and it has brought a new paradigm shift to the field of education. Cloud computing delivers services on the demand of the user and provides broad network access, a shared data-resource environment and efficient flexibility. The technology enables more efficient and cost-effective computing by centralizing the storage, memory and computing capacity of PCs and servers. The benefits of cloud computing can help educational institutions resolve some common challenges such as cost reduction, quick and effective communication, security, privacy, flexibility and accessibility [1,2,4,6,7,8].
The National Institute of Standards and Technology (NIST) defines five essential characteristics of cloud computing [17-19]:
On-demand Self-Service
Broad Network Access
Resource Pooling
Rapid Elasticity
Measured Service
Cloud-computing-based applications provide various services in fields such as banking, healthcare and government. Cloud computing services can be delivered through the following service models:
IaaS (Infrastructure as a Service): Abstraction and virtualization [20-23] are used to provide infrastructure services over the Internet with high scalability, high throughput, quality of service and high computing power; this is known as Infrastructure as a Service (IaaS).
SaaS (Software as a Service): Cloud computing providers deliver common online services which are accessed on the Internet through a web browser. These services have long been referred to as Software as a Service (SaaS).
PaaS (Platform as a Service): The cloud allows consumers not only to deploy but also to design, model, develop and test
applications directly on the cloud. It supports group work on collaborative projects whose team members are geographically distributed; this is known as Platform as a Service (PaaS).
The cloud can be used by the general public (public cloud), a single organization (private cloud), or several organizations that share the same interests and policies (community cloud). It can also be a mixture of public and private clouds (hybrid cloud) [55,56].
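As a compact summary, the service and deployment models above can be expressed as a small classification, here sketched in Python (an illustrative model of our own; the class and attribute names are not from any cloud SDK):

```python
from dataclasses import dataclass
from enum import Enum

class ServiceModel(Enum):
    IAAS = "Infrastructure as a Service"  # virtualized compute, storage, network
    PAAS = "Platform as a Service"        # design, develop, test and deploy apps
    SAAS = "Software as a Service"        # ready-made apps used via a web browser

class DeploymentModel(Enum):
    PUBLIC = "public"        # open to the general public
    PRIVATE = "private"      # a single organization
    COMMUNITY = "community"  # organizations sharing interests and policies
    HYBRID = "hybrid"        # mixture of public and private clouds

@dataclass
class CloudService:
    name: str
    service_model: ServiceModel
    deployment_model: DeploymentModel

# Example: a campus e-mail service consumed as SaaS from a public cloud.
mail = CloudService("campus-mail", ServiceModel.SAAS, DeploymentModel.PUBLIC)
print(mail.service_model.value)  # Software as a Service
```

Any real deployment choice is, of course, a combination of one service model and one deployment model, which is exactly what the `CloudService` record captures.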
IV. CLOUD BASED FRAMEWORK
FOR EDUCATION:
A cloud-based framework will be a better solution to overcome the problems associated with e-learning. Based on this study, we propose an architecture model which can be implemented at the university level; it will benefit all colleges and institutes affiliated with the university. The architecture is based on four layers (ISAU for Education):
I : Implementation Layer
S : Service Layer
A : Access Layer
U : User Layer
Figure – Layers of EDU-CLOUD
A. IMPLEMENTATION LAYER:
In this layer the cloud is implemented as per the needs of the system. It can be a public cloud, private cloud, community cloud or hybrid cloud.
B. SERVICE LAYER:
In this layer services are provided as per the needs of the system’s users. They can be Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) or Infrastructure-as-a-Service (IaaS).
C. ACCESS LAYER:
In this layer services are accessed through devices such as desktops, smartphones or laptops.
D. USER LAYER:
This last but important layer specifies the users of the cloud. These users can be students, teachers, research scholars, management, principals, parents, government or control bodies.
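A minimal sketch of the proposed ISAU layering as a data structure (the dictionary keys and example values are illustrative, chosen by us to mirror the four layers; this is not a prescribed implementation):

```python
# Illustrative sketch of the four ISAU layers; keys and values are ours.
EDU_CLOUD = {
    "Implementation": {"public", "private", "community", "hybrid"},
    "Service":        {"SaaS", "PaaS", "IaaS"},
    "Access":         {"desktop", "smartphone", "laptop"},
    "User":           {"student", "teacher", "research scholar", "management",
                       "principal", "parent", "government"},
}

def route_request(user: str, device: str, service: str, deployment: str) -> bool:
    """Validate a request at every layer, from User down to Implementation."""
    return (user in EDU_CLOUD["User"]
            and device in EDU_CLOUD["Access"]
            and service in EDU_CLOUD["Service"]
            and deployment in EDU_CLOUD["Implementation"])

print(route_request("student", "smartphone", "SaaS", "private"))  # True
```

The point of the layering is that every request can be checked top-down: who is asking, from which device, for which service, on which cloud.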
Figure – Framework for EDU-CLOUD
V. BENEFITS OF CLOUD BASED
FRAMEWORK:
The following are some of the benefits of a successful implementation of the EDU-CLOUD model:
It can help universities keep pace with ever-growing resource requirements and energy costs.
It creates huge opportunities for faster research.
Faculty benefit from efficient access and flexibility when integrating technology into their classes.
Technology enhancement can be done at a single end only.
Researchers get instant access to high-performance computing services without the responsibility of managing a large server and storage farm.
The same kind of education becomes available to all students.
It can provide important gains by offering direct access to a wide range of academic resources, research applications and educational tools.
Various users of the system can connect to the campus through their devices.
Parents can easily check the progress of their wards through the system.
It also promises a variety of services that will be very useful to faculty, staff and students.
In addition, universities can open their technology infrastructure to the private and public sectors for research advancement.
VI. CONCLUSION:
Cloud-based technology is growing rapidly and is being adopted in various domains such as business, education and government. In this paper we have highlighted the cloud computing educational environment through the EDU-CLOUD framework and explored how universities and institutions may take advantage of clouds not only in terms of cost but also in terms of efficiency, reliability, portability, flexibility, and security. In conclusion, an educational cloud computing environment offers a wide range of services at the application, platform and infrastructure levels to students, faculty, researchers, and academic staff.
VII. REFERENCES:
[1] Justin, C., Ivan, B., Arvind, K. and Tom, A., “Seattle: A Platform for Educational Cloud Computing”, SIGCSE ’09, March 2009, Chattanooga, Tennessee, USA.
[2] Shanthi Bala, P., “Intensification of educational cloud computing and crisis of data security in public clouds”, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 03, 2010, pp. 741-745.
[3] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the Clouds: A Berkeley View of Cloud Computing”, UC Berkeley Reliable Adaptive Distributed Systems Laboratory, 2009.
[4] Al Noor, S., Mustafa, G., Chowdhury, S., Hossain, Z. and Jaigirdar, F., “A Proposed Architecture of Cloud Computing for Education System in Bangladesh and the Impact on Current Education System”, International Journal of Computer Science and Network Security (IJCSNS), Vol. 10, No. 10, 2010.
[5] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A Break in the Clouds: Towards a Cloud Definition”, ACM SIGCOMM Computer Communication Review, Vol. 39, Issue 1, pp. 50-55, January 2009.
[6] Cloud Computing Articles, Cloud Computing Education. http://www.code2cloud.com/cloudcomputing-education/
[7] Cloud Computing Articles, SaaS+PaaS+IaaS, Free Cloud Apps for Educational Institutes: Schools, Colleges, Universities. http://www.techno-pulse.com/2010/08/free-cloud-apps-educational-institutes.html
[8] Thomas, P., “Cloud Computing: A potential paradigm for practising the scholarship of teaching and learning”, The Electronic Library, Vol. 29, Iss. 2, pp. 214-224, 2011.
[9] Sultan, N., “Cloud computing for education: A new dawn?”, International Journal of Information Management, Vol. 30 (2010), pp. 109-116.
[10] HP CloudSystem: A single platform for private, public, and hybrid clouds. Hewlett-Packard Development Company, 2011. http://www.hp.com/hpinfo/newsroom/press_kits/2011/EBcloudcomputing2011/fs_Cloud_CloudSystem.pdf
[11] Ellen Wagner, “Delivering on the promise of e-learning”, white paper. http://www.adobe.com/education/pdf/elearning/Promise_of_eLearning_wp_final.pdf
[12] Luciana Carabaneanu, Romica Trandafir, and Ion Mierlus-Mazilu, “Trends in e-learning”. http://www.codewitz.net/papers/MMT_106-111_Trends_in_E-Learning.pdf
[13] Som Naidu, “E-learning: A guidebook of principles, procedures and practices”, CEMCA, 2006.
[14] “What is Electronic Learning”. http://www.mup.com.au/uploads/files/pdf/978-0-522-85130-4.pdf
[15] Michael Miller, “Cloud Computing Pros and Cons for End Users”, 2009. http://www.informit.com/articles/article.aspx?p=1324280
[16] http://en.wikipedia.org/wiki/Cloud_computing
[17] GTSI Group, “Cloud Computing - Building a Framework for Successful Transition”, White Paper, GTSI Corporation, 2009.
[18] T. Dillon, C. Wu and E. Chang, “Cloud Computing: Issues and Challenges”, 24th IEEE International Conference on Advanced Information Networking and Applications, 2010.
[19] P. Mell and T. Grance, “The NIST Definition of Cloud Computing”, Recommendation of NIST, 2011.
[20] Cloud Computing vs. Virtualization. http://www.learncomputer.com/cloud-computing-vsvirtualization/
[21] Wikipedia, http://en.wikipedia.org/wiki/Virtualization
[22] Y. Luo, “Network I/O Virtualization for Cloud Computing”, IEEE Computer Society, Oct. 2010.
[23] V. Sarathy, P. Narayan, and R. Mikkilineni, “Next Generation Cloud Computing Architecture”, 2nd International IEEE Workshop on Collaboration & Cloud Computing, 2010.
[24] N. Robinson, L. Valeri, J. Cave, T. Starkey, H. Graux, S. Creese and P. Hopkins, “The Cloud: Understanding the Security, Privacy and Trust Challenges”, RAND Corporation, 2011.
[25] W. Jansen and T. Grance, “Guidelines on Security and Privacy in Public Cloud Computing”, NIST Draft Special Publication 800-144, 2011.
[26] Mirza, A., “Is E-Learning Finally Gaining Legitimacy in Saudi Arabia?”, Saudi Computer Journal, Vol. 6, No. 2, 2007.
Commerce Technology
Prof. Nidhi Satavlekar
Assistant Professor,
MITSOM College, Pune
Abstract:
Mobile technology is growing at a tremendous speed and the Internet has become a vital resource. Broadly, m-commerce involves transactions of financial value carried out over a mobile device. This paper discusses the concept of mobile commerce, now considered a major force in international business. It focuses on the basic functional platform of m-commerce applications as a route to understanding m-commerce, and projects the building blocks of m-commerce applications. The paper also highlights definitions of m-commerce scope and market, and the security of transactions in this growing technology. It looks at how the technology of mobile devices has captured horizontal and vertical markets.
M-commerce: A mobile user is one who can readily use information from a mobile or wireless device and synchronize information onto a mobile or wireless device across wireless networks.
The Fundamental Functional Platform of M-commerce Applications
M-commerce services are classified into five functional units: wireless messaging services, wireless web access services, voice-activated services, location-based services and digital content services. Let us look at these five units:
1. Messaging Services: In today’s scenario people have become more and more techno-savvy; email and messaging have become daily activities. We can now send and receive messages using wireless media, for example through Yahoo! IM. SMS also provides a wide variety of information services, including weather reports, traffic information, and entertainment information such as theatre, cinema and concert listings. SMS also provides financial information such as stock quotes, brokerage services and directory assistance. Factors that may further popularize mobile email include providing users with the needed software, installing readers in their phones to view more popular file types, and offering the ability to forward mobile email or mobile content to user groups.
2. Web Access Services: These services format any web site for display on a mobile device screen. In the PDA market, this service helps synchronize the user’s desktop and PDA so that both devices stay updated. Content can also be reformatted and sent to a mobile browser using a variety of other
technologies such as the Wireless Markup Language (WML).
3. Voice-Activated Services: These offer services like reading out received email, speech recognition, and spoken driving directions alongside graphical maps. There are many such voice portals, e.g. Mapquest.com. A voice interface driven by predefined questions and commands, such as “which is the next scheduled flight to <someplace>?”, that m-commerce services can recognize and respond to will definitely help users in their daily activities.
4. Location-Based Services: These could lead to a suite of valuable location-based applications and services, such as driving directions, making hotel reservations based on the location of the user, and finding and booking a good restaurant. However, location-based services also raise privacy concerns.
5. Digital Content Services: Despite the low-bandwidth limitations of wireless networks, several technologies in development aim to offer video on PDAs. Amazon.com was an originator of commerce applications, most popularly in e-books: users can now access e-books on their handheld devices and read them. Along with e-books, e-music has now covered the market for mobile consumers.
Building Blocks for M-commerce Applications: M-commerce services consist of corporate servers, the network, the device setup process and software components.
Client Service Setup: Currently mobile consumers face a long setup process, entering a number of parameters to establish a connection. GPRS has lessened the need to dial up every time a user wants to access services, and CDPD is easing the connectivity situation.
Network: In mobile systems, data propagates from the content server through the GSM network to mobile devices. The service provider depends upon web connectivity. The components called upon during an interaction are generally the base stations, the home location register, the mobile switching centers and the visiting location register.
Server Software Components: This software takes into consideration the appearance of the information being displayed. A service provider’s or company’s application server needs to recognize different client types in order to serve appropriate content to each.
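Such client-type recognition can be sketched, for example, by keying on the HTTP User-Agent header (a hypothetical illustration; the substring rules and file names are our assumptions, not a standard mapping):

```python
# Hypothetical client-type detection from a User-Agent string; the
# substring rules below are illustrative, not an exhaustive real-world list.
CLIENT_RULES = [
    ("iPhone", "mobile"),
    ("Android", "mobile"),
    ("Mobile", "mobile"),   # generic mobile browsers
]

def client_type(user_agent: str) -> str:
    """Return 'mobile' or 'desktop' based on simple substring rules."""
    for needle, kind in CLIENT_RULES:
        if needle in user_agent:
            return kind
    return "desktop"

def select_content(user_agent: str) -> str:
    """Serve a reformatted page to mobile clients, the full page otherwise."""
    if client_type(user_agent) == "mobile":
        return "page-mobile.html"
    return "page-full.html"

print(select_content("Mozilla/5.0 (iPhone; ...)"))      # page-mobile.html
print(select_content("Mozilla/5.0 (Windows NT 10.0)"))  # page-full.html
```

In practice a server would use a maintained device-detection library rather than hand-written rules, but the content-negotiation idea is the same.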
Applications of m-commerce:
1. Mobile Advertising: Mobile advertising raises major privacy issues, since much of it is now based on the physical location of the user. An additional implication is maintaining the integrity of online mobile coupons: fake coupons can be created and cashed in. One scheme for protecting against fake redemption is the use of unique, random-looking codes created as a hash of the promotional code, date, time and some unique factors.
2. Mobile Banking: Wireless banking services are on the verge of becoming a significant market for several reasons: people like to manage their bank accounts continuously, they receive an SMS whenever they perform a bank transaction, and users can ask for a balance enquiry.
3. Retail: Handheld terminals can be used to download sales and inventory information for stock replenishment.
4. Education: In the education sector, m-commerce technologies are used for managing and accessing homework, attendance, extra-curricular activities and reference material, and for demonstrating science applications. These technologies offer library access to students and faculty on their handheld computers, and allow researchers to access and monitor the results of tests and surveys over a wireless network using handheld devices.
5. Healthcare: Patient data can be transmitted from an ambulance to the emergency room (ER), where doctors analyze the data and send advice back to the ambulance.
6. Travel: Mobile devices can now be used by floor and maintenance personnel for accessing ticketing information, handling baggage and tracking lost items.
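The coupon-protection scheme described under mobile advertising above can be sketched with a standard hash function (a minimal illustration; the field layout and the choice of SHA-256 are our assumptions, not part of any deployed scheme):

```python
import hashlib

def coupon_code(promo_code: str, date_time: str, unique_factor: str) -> str:
    """Derive a short redemption code as a hash of the coupon's fields."""
    payload = f"{promo_code}|{date_time}|{unique_factor}".encode()
    return hashlib.sha256(payload).hexdigest()[:12]  # short, hard-to-forge code

def verify(code: str, promo_code: str, date_time: str, unique_factor: str) -> bool:
    """The server recomputes the hash; a forged coupon will not match."""
    return code == coupon_code(promo_code, date_time, unique_factor)

c = coupon_code("DIWALI20", "2024-11-01T10:00", "user-4711")
print(verify(c, "DIWALI20", "2024-11-01T10:00", "user-4711"))  # True
print(verify(c, "DIWALI50", "2024-11-01T10:00", "user-4711"))  # False
```

Because the code is derived from the coupon's own fields, a redeemer cannot invent valid codes without knowing the inputs, and the server can verify a redemption without storing every issued code.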
Security issues:
As we learned with the commercialization of the Internet, security is a standard issue that has to be managed. Security for horizontal-market applications includes privacy, integrity and non-repudiation. Privacy in mobile commerce generally centers on the physical movements and activities of individuals. Integrity ensures data has not been modified in transit. Non-repudiation is similar to the wired or physical world in that we must prove with reasonable effort that a particular party has willfully conducted a particular transaction. Security solutions are restricted in the mobile world mainly due to the size and mobility requirements of mobile devices.
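The integrity property above can be illustrated with a keyed hash (HMAC); a minimal sketch, assuming a secret shared between the mobile device and the server:

```python
import hashlib
import hmac

SECRET = b"shared-device-server-key"  # illustrative shared secret

def tag(message: bytes) -> str:
    """Sender attaches an HMAC tag so tampering in transit is detectable."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def intact(message: bytes, received_tag: str) -> bool:
    """Receiver recomputes the tag; any modification changes it."""
    return hmac.compare_digest(tag(message), received_tag)

msg = b"transfer 500 INR to account 1234"
t = tag(msg)
print(intact(msg, t))                                    # True
print(intact(b"transfer 9500 INR to account 1234", t))   # False
```

A plain hash would not suffice here, since an attacker who alters the message could recompute it; the shared key is what ties the tag to the two parties.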
Conclusions: As m-commerce applications and wireless devices emerge at a rapid pace, each will push the other towards greater innovation, versatility and power. There are a number of business opportunities, and challenges, in bringing forth robust wireless technologies to fulfil mobile users’ requirements. With 4G systems, more security, more speed and trendy display mobile devices, mobile applications will survive and dominate the market.
References:
1. Kapil Raina and Anurag Harsh, “Mcommerce Security – A Beginner’s Guide”.
2. http://www.peterindia.net/M-CommerceOverview.html
3. http://en.wikipedia.org/wiki/Mobile_computing
4. http://www.wapforum.org
Dial M for E Commerce
Prof. Shrinivas Kulkarni
Assistant Professor,
MIT-SOM College, Pune
Introduction:
The recently released Google Trends report reiterated the inflection point of online retail in India. Online shopping in India saw 128 per cent growth in consumer interest from 2011 to 2012, compared with only 40 per cent growth from 2010 to 2011, making 2012 the tipping point for online shopping in India.
Data released by the Internet and Mobile Association of India (IAMAI) pegs the total Indian e-commerce market at around INR 50,000 crore (USD 12 billion), of which 80 per cent is transacted through travel e-commerce; retail e-commerce has just 20 per cent of the pie. However, experts believe that by 2025 the total e-commerce market will reach at least INR 4,00,000 crore (USD 96 billion), and that retail will account for half of that.
A considerably large number of shoppers are buying products such as cameras, mobiles, computers and accessories, apparel, jewellery, home and kitchen appliances, toys and gift items online. Until about five years ago, books and music were the largest-selling categories online, but not anymore. With the number of Internet users growing at a fast pace, online retail is bound to see a revolution. A closer look at the market shows five big trends that will shape marketing strategies in the online retail environment in India.
The E-Commerce Overview:
India's e-commerce market grew at a staggering 88 per cent in 2013 to $16 billion, riding on booming online retail trends and defying slower economic growth and spiralling inflation, according to a survey by industry body Assocham. Increasing Internet penetration and the availability of more payment options boosted the e-commerce industry in 2013.
“Besides electronic gadgets, apparel and jewellery, home and kitchen appliances, lifestyle accessories like watches, books, beauty products and perfumes, baby products witnessed significant upward movement in the last one year,” the survey noted.
According to the survey, India's e-commerce market, which stood at $2.5 billion in 2009, reached $8.5 billion in 2012 and rose 88 per cent to touch $16 billion in 2013. The survey estimates the country's e-commerce market will reach $56 billion by 2023, driven by rising online retail. Online shopping grew at a rapid pace in 2013 due to aggressive online discounts, rising fuel prices and the availability of abundant online options.
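The figures quoted above are internally consistent, as a quick arithmetic check shows (the compound annual growth rate is our own derivation, not a figure from the survey):

```python
# Check the quoted figures: $8.5B (2012) growing 88% should give ~$16B (2013).
v2012, growth_2013 = 8.5, 0.88
v2013 = v2012 * (1 + growth_2013)
print(round(v2013, 1))  # 16.0 (billion USD), matching the quoted figure

# Implied compound annual growth rate from $2.5B (2009) to $16B (2013).
v2009, years = 2.5, 4
cagr = (16 / v2009) ** (1 / years) - 1
print(f"{cagr:.0%}")  # 59%
```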
Among the cities, Mumbai topped the list of online shoppers, followed by Delhi, while Kolkata ranked third. Age-wise analysis revealed that 35 per cent of online shoppers are aged between 18 and 25 years, 55 per cent between 26 and 35 years, 8 per cent in the age group of 36-45 years, while only 2 per cent are in the age group of 45-60 years. Besides, 65 per cent of online shoppers are male and 35 per cent female.
To make the most of increasing online shopping trends, more companies are collaborating with daily-deal and discount sites, the survey pointed out. Customers are looking for a width of options and choices, so online retail will soon no longer be differentiated by deals and discounts. Online retailing is also being considered a serious channel by sellers, competing closely with their mainstream selling options.
The products that sell most are in the tech and fashion categories, including mobile phones, iPads, accessories, MP3 players, digital cameras and jewellery, among others, the survey found.
India had an Internet base of around 150 million as of August 2013; even at close to 10 per cent Internet penetration, this offers a very big opportunity for online retailers to grow and expand, as the future of the Internet in India looks very bright.
Those who are reluctant to shop online cited reasons such as a preference only to research products and services online (30 per cent), finding delivery costs too high (20 per cent), fear of sharing personal financial information online (25 per cent) and lack of trust in whether products would be delivered in good condition (15 per cent), while 10 per cent do not have a credit or debit card.
Drivers & Challenges:
Drivers:
With a mobile customer base of 951 million Indians, m-commerce will expand rapidly in the near future. With 3G and 4G LTE services being launched by Airtel and Reliance Jio, the transaction experience will enhance customer satisfaction and may even lead to customer delight. This is one of the major differentiators for consumers, especially during peak holidays (Dussehra, Diwali, Christmas, Valentine's Day, etc.), when customer satisfaction suffers. With the ever-rising cost of fuel and the perennial parking problems faced in large metros, the comfort of e-commerce will only grow in leaps and bounds in the coming years. The latest trend of most mobile users buying smartphones will expand the user base for m-commerce, and this will accelerate as smartphone prices drop under INR 4,000 in FY 2014. Reports have indicated that 87 million Indians prefer accessing online shopping through their smartphones. This has prompted most m-commerce retailers to launch user-friendly apps, which offer around 2 per cent conversions. There are many tech start-ups that will jump into the fray, investing in app development that will make these mobile transactions smoother.
The growth will also be fuelled by launching apps on 2G phones through SMS, which would enhance the potential user base to close to 95 per cent, or over 800 million mobile users in India.
The urban and semi-urban markets are witnessing explosive growth in the issuance of debit and credit cards, which will be one of the major growth drivers. The hinterland is also preparing to participate through the RBI's financial-inclusion initiative, which will enhance the population under the banking net to over 50 per cent from the current 23 per cent. According to a report by the Internet and Mobile Association of India (IAMAI) and IMRB, India is expected to have close to 165 million mobile Internet users by March 2014, up from 87.1 million in December 2012, as more people access the web through mobile devices and dongles.
The peculiarity behind the success of online shopping in Tier II and III markets is attributed to the fact that accessibility to bigger brands is low but aspiration levels are high. Most e-retailers agree that around 60 per cent of their orders are placed from the top 10 cities and as much as 40 per cent come from smaller towns; this ratio was 80:20 five years back. On-time delivery acts as a major differentiator for smaller cities. Around 50-60 per cent of orders are from Tier II and III towns.
India is touted to be one of the biggest e-commerce markets globally, which is one of the reasons for the likes of Amazon to set up shop in the country. It is also believed that while players are mushrooming in the sector at the moment, the future will see consolidation and the emergence of clear leaders.
Challenges:
Barriers to purchase:
Trust
Fear of losing sensitive credit/debit card details
Supply chain deficiencies, leading to delays and breakages
Warranty and other obligations
Difference in item ordered vs. delivered, mainly in terms of size, colour, model no., packaging and accessories
Limitations of retailers' websites in navigational ease and comfort
Poor 3G network, making access difficult
Limited access to banking products and services, mainly in semi-urban and rural areas
E-Commerce Players in India
- Snapdeal
- Myntra.com
- Flipkart
- Yebhi.com
- Times Shopping
- Jabong.com
- Many others
Strategic Recommendations
Customer acquisition to remain focus
area
As online retail is still a new phenomenon in
the country, acquiring customers still
happens to be the major focus area of
marketers. "There are two kinds of users that
visit online retail sites: ones who browse but
have not yet made their first purchase online,
and 'fence buyers', people who have
experimented with online retailing, largely
for ticketing transactions. The strategy ahead
is to convert the latter into active online
shoppers."
First shopping experience key to building
customer retention
It is one of the major challenges in the retail
industry to convert a store into a brand, so
that the customer attaches loyalty to the
online retail brand and not just the products
the portal stocks. It is the experience that
counts in building loyalty in this competitive
space. "Whether a shopper returns to a site
will depend entirely on the first shopping
experience. Providing high-resolution
images and videos, investing in logistics
operations, offering a stress-free return
policy and personalizing the entire
transaction will be the game changers in
this industry." Beyond loyalty, even the
ticket size per transaction increases with
return shoppers: shoppers start transacting
with a lower Average Order Value (AOV)
and, once their experience of the website,
product and delivery proves good, move to
higher AOVs.
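The AOV figure the text refers to is simply total order revenue divided by the number of orders. A minimal sketch, with made-up basket values purely for illustration:

```python
# Hypothetical sketch: Average Order Value (AOV) is total revenue divided
# by the number of orders; per the article, repeat shoppers tend to have
# larger baskets, and hence a higher AOV.

def average_order_value(order_totals):
    """Compute AOV for a list of order amounts (e.g., in rupees)."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)

first_time = [499, 350, 799]          # illustrative first-purchase baskets
returning = [1299, 999, 1850, 1450]   # illustrative repeat-purchase baskets

print(round(average_order_value(first_time), 2))
print(round(average_order_value(returning), 2))
```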
Tier II & III cities to drive growth
The success of online shopping in Tier II
and III markets is attributed to the fact that
accessibility to bigger brands is low while
aspiration levels are high. With higher GDP
growth, the purchasing power of hinterland
consumers will increase over the next few
years, and they will use e-commerce to
purchase aspirational brands that may not be
available in their cities and towns.
Proliferation of Shopping Applications
In the last few years, a lot of shopping
applications have appeared, from eBay and
Amazon to Best Buy and Macy's, on the iOS
and Android platforms. These applications
take advantage of mobile features such as
context (location, time and people), which
make them easier for consumers to use. As a
result, Forrester estimated that in 2011 over
24% of iPhone users and 21% of Android
users used a shopping application.
New and Diverse Mobile Advertising
Formats
One of the key developments in the last few
years has been the maturing of mobile
advertising, which has moved from banner
ads to coupons, real-time calls to action,
last-minute deals and more. Real-time access
to a potential customer makes the advertising
meaningful, and an interested audience can
react spontaneously to the deal. Mobile
coupons have a redemption rate of 15% to
40%; compare this to traditional print
coupons, which are redeemed at less than
2%. Imagine a cinema hall giving last-minute
discounts by broadcasting a deal to people in
the vicinity if it finds the hall half full, or
mobile ads that recognize the user's location
and show how far away the nearest
McDonald's is.
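The "how far away is the nearest store" idea above rests on a simple great-circle distance calculation. A minimal sketch, not from the article, using the standard haversine formula; the store names and coordinates are made up:

```python
import math

# Illustrative sketch of location-aware ad targeting: compute the
# distance from a user's position to nearby (fictional) store locations
# and pick the nearest. Coordinates below are placeholders.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

stores = {"Store A": (18.5204, 73.8567), "Store B": (18.5600, 73.8100)}
user = (18.5300, 73.8500)

nearest = min(stores, key=lambda s: haversine_km(*user, *stores[s]))
print(nearest)
```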
Growing use of social media/ social
commerce
There is overwhelming evidence that people
trust their friends more than the claims made
by the retailers selling them and hence
integration of social media is a definite boon
to mobile commerce. People look for
reviews and recommendations from their
friends and other consumers and in
exchange are willing to share their network
with the retailer. IBM recently released a
study showing that consumers are more than
willing to share with retailers through social
networks. In exchange for a better, more
personalized shopping experience, consumers
will tell all about their media consumption
(75%); age, race, gender and income (73%);
name and address (61%); and lifestyle details
such as hobbies and other interests (59%).
Social commerce and mobile have been the
biggest trends in e-commerce.
Growing popularity of Local Commerce
Social-Location-Mobile are the buzz words
these days. Services like Foursquare and
Facebook Places are offering deals based on
loyalty by using the check-in functionality.
If a person visits a particular bar often, the
bartender can gauge loyalty from the number
of check-ins that person has made there and
can therefore extend a suitable offer.
Group-buying sites like Groupon and
deal-coupon services are also contributing to
the popularity of local commerce. Almost all
the popular deal sites have mobile
applications, and they have seen a steady
increase in traffic from mobile phones.
Increasing use of Mobile Phones to get
Product Reviews/Information
Mobile phones are helping the consumers to
make informed decisions while they are in a
store. Last year nearly half the consumers in
the US used their mobiles to look up product
reviews, and many used barcode readers to
get product information.
Proliferation of Price Comparison
Applications
Price comparison applications from eBay,
Amazon and The Find are becoming a real
threat to physical retailers. Many customers,
when they walk into a store, use these
applications to find the price at an online
store or even at other physical stores in the
vicinity. The retailer loses the sale if the
customer finds a better deal elsewhere;
earlier, in the absence of such information,
the conversion of a walk-in customer was
much higher. The chart below from
comScore shows that pricing is important to
consumers and that stores are losing sales to
price-comparison applications.
Online shopping no longer a price war
Success in e-retailing will come from the
emergence of newer categories, going
beyond apparel and electronics. "Local
commerce will emerge in a big way,
complemented by social media
recommendations. Curated and
differentiated deals which offer a unique
experience to customers will see a surge."
The next big thing to watch in this segment
is the option of buying meals online; it is an
opportunity area with great potential.
Aggressive marketing to create Brand:
With most Indian e-tailers flush with their
latest rounds of angel funding, they are
becoming aggressive on marketing. The
recent ad blitz on TV, press, out-of-home
and social networking will create huge
awareness going forward. This initiative
should also include setting up exclusive
brand showrooms (EBSRs) in high-street
malls, where customers can visit and touch
and feel the items sold online. Consider the
fact that most large retailers, such as Big
Bazaar, Reliance and Croma, have
e-commerce portals that complement their
traditional retailing; e-tailers can adopt a
similar strategy to expand their online sales.
The EBSR will enhance brand equity and
can serve as a face-to-face contact point for
new or first-time customers.
Social networking
Unhappy customers venting their frustration
on social networking sites need to be
monitored carefully. Online reputation
management will be a must for most
retailers. A gift coupon or discount coupon
for a delayed delivery or damaged product
could help a long way.
Creating new Apps, which will enhance
user experience
App development is key to improving
conversion ratios. E-tailers can invest their
own capital or look at strategic alliances
based on revenue-sharing models, which are
very common in the mobile VAS (MVAS)
space.
Create Customer Loyalty programs
Offer loyalty plans and programs that give
additional benefits to existing customers:
bonus points, air-mile rewards and a variety
of other options that build a long-term loyal
consumer base. Privilege card holders can
influence other fence sitters and should be
rewarded for their referral efforts. Amazon,
for example, offers a scheme for a $79
annual fee with a guaranteed two-day
delivery promise.
Attractive portal and strong/robust
Payment system
Many existing retailers use third-party
payment gateways; this trend will slowly
shift towards building their own. Stronger
portal cyber security and easy access over
high-bandwidth connections will drive
traffic and conversion.
Supply chain management
The most important aspect will be sourcing
of products and speedier delivery to
customers. The back end IT/SCM needs to
be strengthened, as most customers will
demand guaranteed delivery times. This is
further complicated because in many buyer
households both spouses work, so deliveries
get staggered to weekends or holidays.
Investment in IT, bar coding, online tracking
systems and all other aspects of delivery-time
and QoS compliance will become more
stringent. The bar on IT/SCM has already
been raised by MNCs such as Amazon and
eBay, and Indian e-tailers will need to
compete with them.
Cash on Delivery and/or Free delivery
India is a major market where the lack of
credit and debit cards can be overcome
through cash-on-delivery (COD)
transactions. COD has proven to be a master
stroke and has resolved many other
challenges from the customer's perspective.
However, customer authentication and
verification need to be strengthened to
minimize wastage.
Launch a prepaid Credit/Debit card
E-tailers can also launch their own prepaid
debit or credit card, which can come with a
free initial top-up and can be used by
customers who do not have a credit or debit
card. This would also be an ideal solution
for parents whose wards are staying away
for studies and can use the card for routine
purchases.
Customer care:
Many novice customers may have a lot of
queries, which can be resolved through a
strong customer care mechanism, whether
an online avatar or a live person; this can
improve conversion rates dramatically.
Focus on in-house brands
Most organized retailers have in-house
brands, which contribute higher margins
than branded items. This will help e-tailers
break even and become profitable in a short
period of time.
Conclusions:
India, with a population of 1.2 billion and
one of the fastest-growing middle classes, is
a most potent market for m-commerce. The
explosion of mobile phones and easier
access to organized payment systems will
add the required stimulus to e-commerce.
The projected figure of USD 100 billion is
definitely achievable, as customers see
value in online purchases, given the
problems associated with traditional
purchasing options.
Green Computing – Development of Industry and Preservation of the
Environment
Mrs. Poonam A. Lalwani, Asst. Professor, MITSOM College, Pune
Ms. Aparna N. Kulkarni, Managing Director, Digixe Core IT & Multimedia Services, Pune
Dr. Abhishek V. Jain, Asst. Professor, NBN Sinhgad SCS, Pune
ABSTRACT
As computers play an ever-larger role
in our lives, energy demands, costs,
and waste are escalating dramatically.
Green Computing is now under the
attention of not only environmental
organizations, but also businesses from
other industries. In recent years,
companies in the computer industry
have come to realize that going green is
in their best interest, both in terms of
public relations and reduced costs.
However, the IT department is usually
the department that uses the most power,
which in turn is an excessive overhead
for a business as well as a source of
toxic waste. Making IT "green" can not
only save money but also help save our
world, making it a better place by
reducing or eliminating wasteful
practices and using non-toxic materials.
I. INTRODUCTION
Green Computing is the study and
practice of designing, manufacturing,
using, and disposing of computers,
servers, and associated subsystems such
as monitors, printers, storage devices,
and networking and communications
systems efficiently and effectively with
minimal or no impact on the
environment. The goals are similar to
those of green chemistry: reduce the use
of hazardous materials, maximize
energy efficiency during the product's
lifetime, and promote the recyclability
or biodegradability of defunct products
and factory waste. Sustainable IT
services require the integration of green
computing practices such as power
management, virtualization, improved
cooling technology, recycling,
electronic waste disposal, and
optimization of the IT infrastructure to
meet sustainability requirements. The
lifecycle of computers has a direct
impact on our environment, including
pollution, the use of heavy metals and
toxic products, and significant energy
consumption. The IT sector alone
accounts for 2% of CO2 emissions
worldwide, and IT companies' data
centers are responsible for nearly one
quarter of the total carbon emissions
produced by the sector. Green
computing also strives to achieve
economic viability and improved system
performance and use, while abiding by
our social and ethical responsibilities.
II. NEED OF GREEN
COMPUTING
We have great machines and equipment
to accomplish our tasks; great gadgets
with royal looks and features make our
lives more impressive and smooth.
Today almost every field, whether IT,
medicine, transportation or agriculture,
uses machines that indirectly require
large amounts of power and money to
function effectively. In the IT
department, it is observed that much of
the energy computers consume is
wasted, because we leave the computer
on even when it is not in use: the CPU
and fan keep drawing power, and screen
savers consume power even when the
system is idle. Insufficient power and
cooling capacity can also result in loss
of energy, and most data centers are
observed to lack sufficient cooling
capacity. All of this contributes to
environmental pollution.
It is the need of the hour to educate
people about the "green" use of ICT. To
promote these ideas and create standards
and regulations, various organizations
have been formed, and many technology
companies belong to several of them to
further their goal of becoming more
green.
III. AREA OF FOCUS
It is important to understand the life
cycle of a computer while applying the
concept of green IT, as explained in
Figure 1.
FIGURE 1: Life Cycle Approach for Green IT
From the view of a user in an
organization, the following are common
computer myths:

Myth: You should never turn off your
computer.
Your computer is designed to handle
40,000 on/off cycles. If you are an
average user, that is significantly more
cycles than you will initiate in the
computer's five-to-seven-year life. When
you turn your computer off, you not only
reduce energy use, you also lower heat
stress and wear on the system.

Myth: Turning your computer off and
then back on uses more energy than
leaving it on.
The surge of power used by a CPU to
boot up is far less than the energy your
computer uses when left on for more
than three minutes.

Myth: Screen savers save energy.
This is a common misconception. Screen
savers were originally designed to help
prolong the life of monochrome
monitors, which are now technologically
obsolete. Screen savers save energy only
if they actually turn off the screen or, on
laptops, turn off the backlight.

Myth: Network connections are lost
when computers go into low-power or
sleep mode.
Newer computers are designed to sleep
on networks without loss of data or
connection. CPUs with Wake-on-LAN
(WOL) technology can be left in sleep
mode overnight and wake to receive data
packets sent to the unit.
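The Wake-on-LAN mechanism mentioned above works by broadcasting a "magic packet": six 0xFF bytes followed by the target machine's MAC address repeated sixteen times, usually over UDP. A minimal sketch; the MAC address below is a placeholder:

```python
import socket

# Minimal Wake-on-LAN sketch: build the magic packet (6 x 0xFF, then the
# 6-byte MAC repeated 16 times, 102 bytes total) and broadcast it over
# UDP. The MAC used in the example call is a placeholder.

def magic_packet(mac: str) -> bytes:
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

print(len(magic_packet("00:11:22:33:44:55")))
# wake("00:11:22:33:44:55")  # placeholder MAC; uncomment to actually send
```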
IV. WHAT CAN WE DO TO GO
GREEN
1. Turn off your computer at night so it
runs only eight hours a day; you will
reduce your energy use by about 810
kWh per year, a 67 per cent annual
saving.
2. Purchase flat-screen monitors; they
use significantly less energy and are
not as hard on your eyes as CRTs.
3. Unplug electronics when not in use.
4. Consider a smaller monitor: a 14-inch
display uses 40 per cent less energy
than a 17-inch one.
5. Purchase an Energy Star-compliant
computer, and note that laptop models
use much less energy than desktop
units.
6. Save paper when printing: print
duplex, print to PDF, preview before
printing, and avoid printing hundreds
of copies of an email forward to
plaster around the office.
7. Recycle electronic waste. Recycling is
the process of turning used materials
into new, useful materials with the
aim of reducing environmental
pollution; it is more environmentally
friendly than making new products
because it reduces the use of new raw
materials, land degradation, pollution,
energy usage and greenhouse gases
[2].
8. Use e-mail communications as an
alternative to paper memos and faxed
documents.
9. Plan your computer-related activities
so you can do them all at once,
keeping the computer off at other
times.
V. CONCLUSION
The good news is, by embracing simple,
everyday green computing practices you
can improve energy management,
increase energy efficiency, reduce e-
waste, and save money in the process!
The main purpose of adopting an eco-
friendly lifestyle and making conscious
decisions is to reduce harm to the planet
and create conditions for the
environment to flourish and thrive.
Going green is not a fad or a fashion; it
is a way of life, a conscious effort to
make a personal contribution to
improving the Earth's health. Make your
entire organization green in every way
possible: understand the life cycle of IT
products, reduce paper use as much as
possible and recycle what you can. The
time has come to think about using
computers, and the non-renewable
resources behind them, efficiently.
VI. REFERENCES
[1] Baroudi, Hill, Reinhold, and Senxian (2009). Green IT for Dummies.
[2] Climate Savers Computing Initiative (2011). Retrieved from http://www.climatesaverscomputing.org/
[3] Energy Star Program (2010). Retrieved from http://www.energystar.gov/
[4] http://www.theglobalwarmingstatistics.org/global-warming-essays
[5] Microsoft: Green IT – Taking the First Step (2010). Retrieved from http://www.microsoft.com/environment/our_commitment/articles/green_guide.aspx
[6] Recycle-it America (2011). Retrieved from http://www.recycleitamerica.com/
[7] San Murugesan, "Harnessing Green IT: Principles and Practices," IEEE IT Professional, January-February 2008.
[8] The Green Grid (2010). Retrieved from http://www.uh.edu/infotech/news/story.php?story_id=130
[9] Ryan, John C. & Durning, Alan T. (1997). Stuff: The Secret Lives of Everyday Things.
CHALLENGES AND OPPORTUNITIES
WITH BIG DATA
Prof. SEEMA RAWAT
Assistant Professor,
MIT-SOM College, Pune
Executive Summary
The promise of data-driven decision-
making is now being recognized broadly,
and there is growing enthusiasm for the
notion of "Big Data." While the promise of
Big Data is real -- for example, it is
estimated that Google alone contributed 54
billion dollars to the US economy in 2009 --
there is currently a wide gap between its
potential and its realization.
Heterogeneity, scale, timeliness,
complexity, and privacy problems with Big
Data impede progress at all phases of the
pipeline that can create value from data. The
problems start right away during data
acquisition, when the data tsunami requires
us to make decisions, currently in an ad hoc
manner, about what data to keep and what to
discard, and how to store what we keep
reliably with the right metadata. Much data
today is not natively in structured format;
for example, tweets and blogs are weakly
structured pieces of text, while images and
video are structured for storage and display,
but not for semantic content and search:
transforming such content into a structured
format for later analysis is a major
challenge. The value of data
explodes when it can be linked with other
data, thus data integration is a major creator
of value. Since
most data is directly generated in
digital format today, we have the
opportunity and the challenge both to
influence the creation to facilitate later
linkage and to automatically link previously
created data. Data analysis, organization,
retrieval, and modeling are other
foundational challenges. Data analysis is a
clear bottleneck in many applications, both
due to lack of scalability of the underlying
algorithms and due to the complexity of the
data that needs to be analyzed. Finally,
presentation of the results and its
interpretation by non-technical domain
experts is crucial to extracting actionable
knowledge.
During the last 35 years, data
management principles such as physical and
logical data independence, declarative
querying and cost-based optimization have
led to a multi-billion-dollar industry. More
importantly, these technical
advances have enabled the first round of
business intelligence applications and laid
the foundation for managing and analyzing
Big Data today. The many novel challenges
and opportunities associated with Big Data
necessitate rethinking many aspects of these
data management platforms, while retaining
other desirable aspects. We believe that
appropriate investment in Big Data will lead
to a new wave of fundamental technological
advances that will be embodied in the next
generations of Big Data management and
analysis platforms, products, and systems.
We believe that these research
problems are not only timely, but also
having the potential to create huge economic
value in the US economy for years to come.
However, they are also hard, requiring us to
rethink data analysis systems in fundamental
ways. A major investment in Big Data,
properly directed, can result not only in
major scientific advances, but also lay the
foundation for the next generation of
advances in science, medicine, and business.
Challenges and Opportunities with Big
Data
1. Introduction
We are awash in a flood of data
today. In a broad range of application areas,
data is being collected at unprecedented
scale. Decisions that previously were based
on guesswork, or on painstakingly
constructed models of reality, can now be
made based on the data itself. Such Big Data
analysis now drives nearly every aspect of
our modern society, including mobile
services, retail, manufacturing, financial
services, life sciences, and physical sciences.
Scientific research has been
revolutionized by Big Data. The Sloan
Digital Sky Survey has today become a
central resource for astronomers the world
over. The field of Astronomy is being
transformed from one where taking pictures
of the sky was a large part of an
astronomer's job to one where the pictures
are all in a database already and the
astronomer's task is to find interesting
objects and phenomena in the database. In
the biological sciences, there is now a well-
established tradition of depositing scientific
data into a public repository, and also of
creating public databases for use by other
scientists. In fact, there is an entire
discipline of bioinformatics that is largely
devoted to the curation and analysis of such
data. As technology advances, particularly
with the advent of Next Generation
Sequencing, the size and number of
experimental data sets available is
increasing exponentially.
Big Data has the potential to
revolutionize not just research, but also
education. A recent detailed quantitative
comparison of different approaches taken by
35 charter schools in NYC has found that
one of the top five policies correlated with
measurable academic effectiveness was the
use of data to guide instruction. Imagine a
world in which we have access to a huge
database where we collect every detailed
measure of every student's academic
performance. This data could be used to
design the most effective approaches to
education, starting from reading, writing,
and math, to advanced, college-level,
courses. We are far from having access to
such data, but there are powerful trends in
this direction. In particular, there is a strong
trend for massive Web deployment of
educational activities, and this will generate
an increasingly large amount of detailed data
about students' performance.
It is widely believed that the use of
information technology can reduce the cost
of healthcare while improving its quality, by
making care more preventive and
personalized and basing it on more extensive
(home-based) continuous monitoring.
McKinsey estimates a savings of 300 billion
dollars every year in the US alone.
In 2010, enterprises and users stored
more than 13 exabytes of new data; this is
over 50,000 times the data in the Library of
Congress. The potential value of global
personal location data is estimated to be
$700 billion to end users, and it can result in
an up to 50% decrease in product
development and assembly costs, according
to a recent McKinsey report. McKinsey
predicts an equally great effect of Big Data
in employment, where 140,000-190,000
workers with "deep analytical" experience
will be needed in the US; furthermore, 1.5
million managers will need to become
data-literate. Not surprisingly, the recent
PCAST report on Networking and IT R&D
identified Big Data as a "research frontier"
that can "accelerate progress across a broad
range of priorities." Even popular news
media now appreciates the value of Big Data
as evidenced by coverage in the Economist
[Eco2011], the New York Times, and
National Public Radio.
While the potential benefits of Big
Data are real and significant, and some
initial successes have already been achieved
(such as the Sloan Digital Sky Survey), there
remain many technical challenges that must
be addressed to fully realize this potential.
The sheer size of the data, of course, is a
major challenge, and is the one that is most
easily recognized. However, there are
others. Industry analysis companies like to
point out that there are challenges not just in
Volume, but also in Variety and Velocity,
and that companies should not focus on just
the first of these. By Variety, they usually
mean heterogeneity of data types,
representation, and semantic interpretation.
By Velocity, they mean both the rate at
which data arrive and the time in which it
must be acted upon. While these three are
important, this short list fails to include
additional important requirements such as
privacy and usability.
The analysis of Big Data involves
multiple distinct phases as shown in the
figure below, each of which introduces
challenges. Many people unfortunately
focus just on the analysis/modeling phase:
while that phase is crucial, it is of little use
without the other phases of the data analysis
pipeline. Even in the analysis phase, which
has received much attention, there are
poorly understood complexities in the
context of multi-tenanted clusters where
several users' programs run concurrently.
Many significant challenges extend beyond
the analysis phase. For example, Big Data
has to be managed in context, which may be
noisy, heterogeneous and not include an
upfront model. Doing so raises the need to
track provenance and to handle uncertainty
and error: topics that are crucial to success,
and yet rarely mentioned in the same breath
as Big Data. Similarly, the questions to the
data analysis pipeline will typically not all
be laid out in advance. We may need to
figure out good questions based on the data.
Doing this will require smarter systems and
also better support for user interaction with
the analysis pipeline. In fact, we currently
have a major bottleneck in the number of
people empowered to ask questions of the
data and analyze it. We can drastically
increase this number by
supporting many levels of engagement with
the data, not all requiring deep database
expertise. Solutions to problems such as this
will not come from incremental
improvements to business as usual such as
industry may make on its own. Rather, they
require us to fundamentally rethink how we
manage data analysis.
Fortunately, existing computational
techniques can be applied, either as is or
with some extensions, to at least some
aspects of the Big Data problem. For
example, relational databases rely on the
notion of logical data independence:
users can think about what they want
to compute, while the system (with skilled
engineers designing those systems)
determines how to compute it efficiently.
Similarly, the SQL standard and the
relational data model provide a uniform,
powerful language to express many query
needs and, in principle, allows customers to
choose between vendors, increasing
competition. The challenge ahead of us is to
combine these healthy features of prior
systems as we devise novel solutions to the
many new challenges of Big Data.
In this paper, we consider each of the boxes
in the figure above, and discuss both what
has already been done and what challenges
remain as we seek to exploit Big Data. We
begin by considering the five stages in the
pipeline, then move on to the five
cross-cutting challenges, and end with a
discussion of the architecture of the overall
system that combines all these functions.
2. Phases in the Processing Pipeline
2.1 Data Acquisition and Recording
Big Data does not arise out of a
vacuum: it is recorded from some data
generating source. For example, consider
our ability to sense and observe the world
around us, from the heart rate of an elderly
citizen and the presence of toxins in the air
we breathe, to the planned Square Kilometre
Array telescope, which will produce up to 1
million terabytes of raw data per day.
Similarly, scientific experiments and
simulations can easily produce petabytes of
data today.
Much of this data is of no interest,
and it can be filtered and compressed by
orders of magnitude. One challenge is to
define these filters in such a way that they
do not discard useful information. For
example, suppose one sensor reading differs
substantially from the rest: it is likely to be
due to the sensor being faulty, but how can
we be sure that it is not an artifact that
deserves attention? In addition, the data
collected by these sensors most often are
spatially and temporally correlated (e.g.,
traffic sensors on the same road segment).
We need research in the science of data
reduction that can intelligently process this
raw data to a size that its users can handle
while not missing the needle in the haystack.
Furthermore, we require "on-line" analysis
techniques that can process such streaming
data on the fly, since we cannot afford to
store first and reduce afterward.
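The paragraph above warns that a reading which "differs substantially from the rest" should be flagged for attention rather than silently dropped. One possible on-line sketch of that idea, not a method from the paper, uses Welford's running mean and variance with a z-score threshold:

```python
import math

# Illustrative sketch of "filter without discarding": an online z-score
# detector that flags, rather than drops, readings far from the stream's
# running statistics (Welford's algorithm keeps mean/variance in O(1)).

class StreamFilter:
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold = threshold

    def update(self, x):
        """Return True if x looks anomalous relative to the stream so far."""
        flagged = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        # Welford update of running mean and sum of squared deviations
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
        return flagged

f = StreamFilter()
readings = [20.1, 20.3, 19.9, 20.2, 20.0, 95.0, 20.1]
flags = [f.update(r) for r in readings]
print(flags)  # only the 95.0 reading is flagged for attention
```

Whether the 95.0 reading is a faulty sensor or a genuine event is exactly the judgment call the text describes; flagging keeps that decision open.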
The second big challenge is to
automatically generate the right metadata to
describe what data is recorded and how it is
recorded and measured. For example, in
scientific experiments, considerable detail
regarding specific experimental conditions
and procedures may be required to be able to
interpret the results correctly, and it is
important that such metadata be recorded
with observational data. Metadata
acquisition systems can minimize the human
burden in recording metadata. Another
important issue here is data provenance.
Recording information about the data at its
birth is not useful unless this information
can be interpreted and carried along through
the data analysis pipeline. For example, a
processing error at one step can render
subsequent analysis useless; with suitable
provenance, we can easily identify all
subsequent processing that depended on
this step. Thus we need research both into
generating suitable metadata and into data
systems that carry the provenance of data
and its metadata through data analysis
pipelines.
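One way to picture a data system that carries provenance through a pipeline is a thin wrapper that records, alongside each value, the steps that produced it; identifying everything downstream of a faulty step then becomes a lineage lookup. The class and step names below are invented for illustration.

```python
class Tracked:
    """A value plus the lineage of processing steps that produced it."""

    def __init__(self, value, lineage=None):
        self.value = value
        self.lineage = lineage or []

    def apply(self, step_name, fn):
        # Each transformation returns a new value with an extended lineage,
        # so provenance travels with the data through the pipeline.
        return Tracked(fn(self.value), self.lineage + [step_name])

raw = Tracked([3, 1, 2, None, 5])
clean = raw.apply("drop_missing", lambda xs: [x for x in xs if x is not None])
result = clean.apply("mean", lambda xs: sum(xs) / len(xs))

print(result.value)    # 2.75
print(result.lineage)  # ['drop_missing', 'mean']
```

If "drop_missing" later turns out to be buggy, any result whose lineage contains that step is known to be suspect.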
2.2 Information Extraction and Cleaning
Frequently, the information collected
will not be in a format ready for analysis.
For example, consider the collection of
electronic health records in a hospital,
comprising transcribed dictations from
several physicians, structured data from
sensors and measurements (possibly with
some associated uncertainty), and image
data such as x-rays. We cannot leave the
data in this form and still effectively analyze
it. Rather we require an information extraction
process that pulls out the required
information from the underlying sources and
expresses it in a structured form suitable for
analysis. Doing this correctly and
completely is a continuing technical
challenge. Note that this data also includes
images and will in the future include video;
such extraction is often highly application
dependent (e.g., what you want to pull out of
an MRI is very different from what you
would pull out of a picture of the stars, or a
surveillance photo). In addition, due to the
ubiquity of surveillance cameras and
popularity of GPS-enabled mobile phones,
cameras, and other portable devices, rich
and high fidelity location and trajectory (i.e.,
movement in space) data can also be
extracted.
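At its simplest, the extraction step above turns fragments of free text into structured fields suitable for analysis. The note format, field names, and pattern below are invented for illustration; real clinical extraction is far more involved and, as noted, highly application dependent.

```python
import re

# A hypothetical transcribed dictation fragment.
note = "Patient: Jane Doe. BP: 128/84 mmHg. Dictated by Dr. A. Rao."

# Pull the blood-pressure reading out of the free text into a
# structured record that downstream analysis can consume.
record = {}
m = re.search(r"BP:\s*(\d+)/(\d+)", note)
if m:
    record["systolic"] = int(m.group(1))
    record["diastolic"] = int(m.group(2))

print(record)  # {'systolic': 128, 'diastolic': 84}
```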
We are used to thinking of Big Data
as always telling us the truth, but this is
actually far from reality. For example,
patients may choose to hide risky behavior
and caregivers may sometimes mis-diagnose
a condition; patients may also inaccurately
recall the name of a drug or even that they
ever took it, leading to missing information
in (the history portion of) their medical
record. Existing work on data cleaning
assumes well-recognized constraints on
valid data or well-understood error models;
for many emerging Big Data domains these
do not exist.

2.3 Data Integration, Aggregation, and
Representation
Given the heterogeneity of the flood
of data, it is not enough merely to record it
and throw it into a repository. Consider, for
example, data from a range of scientific
experiments. If we just have a bunch of data
sets in a repository, it is unlikely anyone will
ever be able to find, let alone reuse, any of
this data. With adequate metadata, there is
some hope, but even so, challenges will
remain due to differences in experimental
details and in data record structure.
Data analysis is considerably more
challenging than simply locating,
identifying, understanding, and citing data.
For effective large-scale analysis all of this
has to happen in a completely automated
manner. This requires differences in data
structure and semantics to be expressed in
forms that are computer understandable, and
then "robotically" resolvable. There is a
strong body of work in data integration that
can provide some of the answers. However,
considerable additional work is required to
achieve automated error-free difference
resolution.
Even for simpler analyses that
depend on only one data set, there remains
an important question of suitable database
design. Usually, there will be many
alternative ways in which to store the same
information. Certain designs will have
advantages over others for certain purposes,
and possibly drawbacks for other purposes.
Witness, for instance, the tremendous
variety in the structure of bioinformatics
databases with information regarding
substantially similar entities, such as genes.
Database design is today an art, and is
carefully executed in the enterprise context
by highly-paid professionals. We must
enable other professionals, such as domain
scientists, to create effective database
designs, either through devising tools to
assist them in the design process or through
forgoing the design process completely and
developing techniques so that databases can
be used effectively in the absence of
intelligent database design.
2.4 Query Processing, Data Modeling, and
Analysis
Methods for querying and mining
Big Data are fundamentally different from
traditional statistical analysis on small
samples. Big Data is often noisy, dynamic,
heterogeneous, inter-related and
untrustworthy. Nevertheless, even noisy Big
Data could be more valuable than tiny
samples because general statistics obtained
from frequent patterns and correlation
analysis usually overpower individual
fluctuations and often disclose more reliable
hidden patterns and knowledge. Further,
interconnected Big Data forms large
heterogeneous information networks, with
which information redundancy can be
explored to compensate for missing data, to
crosscheck conflicting cases, to validate
trustworthy relationships, to disclose
inherent clusters, and to uncover hidden
relationships and models.
Mining requires integrated, cleaned,
trustworthy, and efficiently accessible data,
declarative query and mining interfaces,
scalable mining algorithms, and big-data
computing environments. At the same time,
data mining itself can also be used to help
improve the quality and trustworthiness of
the data, understand its semantics, and
provide intelligent querying functions. As
noted previously, real-life medical records
have errors, are heterogeneous, and
frequently are distributed across multiple
systems. The value of Big Data analysis in
health care, to take just one example
application domain, can only be realized if it
can be applied robustly under these difficult
conditions. On the flip side, knowledge
developed from data can help in correcting
errors and removing ambiguity. For
example, a physician may write "DVT" as
the diagnosis for a patient. This abbreviation
is commonly used for both "deep vein
thrombosis" and "diverticulitis," two very
different medical conditions. A knowledge
base constructed from related data can use
associated symptoms or medications to
determine which of the two the physician
meant.
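The disambiguation idea can be sketched with a tiny hand-built knowledge base that maps each expansion of the abbreviation to medications commonly associated with it; the observed medication list then decides. The drug associations here are simplified assumptions for illustration only, not clinical fact.

```python
# Hypothetical knowledge base: condition -> associated medications.
KNOWLEDGE_BASE = {
    "deep vein thrombosis": {"heparin", "warfarin"},
    "diverticulitis": {"ciprofloxacin", "metronidazole"},
}

def expand(candidates, observed_medications):
    """Pick the expansion whose associated drugs best overlap
    with the medications actually recorded for the patient."""
    scores = {
        condition: len(drugs & observed_medications)
        for condition, drugs in candidates.items()
    }
    return max(scores, key=scores.get)

meds = {"warfarin", "acetaminophen"}
print(expand(KNOWLEDGE_BASE, meds))  # deep vein thrombosis
```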
Big Data is also enabling the next
generation of interactive data analysis with
real-time answers. In the future, queries
towards Big Data will be automatically
generated for content creation on websites,
to populate hot-lists or recommendations,
and to provide an ad hoc analysis of the
value of a data set to decide whether to store
or to discard it. Scaling complex query
processing techniques to terabytes while
enabling interactive response times is a
major open research problem today.
A problem with current Big Data
analysis is the lack of coordination between
database systems, which host the data and
provide SQL querying, with analytics
packages that perform various forms of non-
SQL processing, such as data mining and
statistical analyses. Today's analysts are
impeded by a tedious process of exporting
data from the database, performing a non-
SQL process and bringing the data back.
This is an obstacle to carrying over the
interactive elegance of the first generation of
SQL-driven OLAP systems into the data
mining type of analysis that is in increasing
demand. A tight coupling between
declarative query languages and the
functions of such packages will benefit both
expressiveness and performance of the
analysis.

2.5 Interpretation

Having the ability to analyze Big Data is of
limited value if users cannot understand the
analysis. Ultimately, a decision-maker,
provided with the result of analysis, has to
interpret these results. This interpretation
cannot happen in a vacuum. Usually, it
involves examining all the assumptions
made and retracing the analysis.
Furthermore, as we saw above, there are
many possible sources of error: computer
systems can have bugs, models almost
always have assumptions, and results can be
based on erroneous data. For all of these
reasons, no responsible user will cede
authority to the computer system. Rather she
will try to understand, and verify, the results
produced by the computer. The computer
system must make it easy for her to do so.
This is particularly a challenge with Big
Data due to its complexity. There are often
crucial assumptions behind the data
recorded. Analytical pipelines can often
involve multiple steps, again with
assumptions built in. The recent mortgage-
related shock to the financial system
dramatically underscored the need for such
decision-maker diligence -- rather than
accept the stated solvency of a financial
institution at face value, a decision-maker
has to examine critically the many
assumptions at multiple stages of analysis.
In short, it is rarely enough to
provide just the results. Rather, one must
provide supplementary information that
explains how each result was derived, and
based upon precisely what inputs. Such
supplementary information is called the
provenance of the (result) data. By studying
how best to capture, store, and query
provenance, in conjunction with techniques
to capture adequate metadata, we can create
an infrastructure to provide users with the
ability both to interpret analytical results
obtained and to repeat the analysis with
different assumptions, parameters, or data
sets.
Systems with a rich palette of
visualizations become important in
conveying to the users the results of the
queries in a way that is best understood in
the particular domain. Whereas early
business intelligence systems' users were
content with tabular presentations, today's
analysts need to pack and present results in
powerful visualizations that assist
interpretation, and support user
collaboration as discussed in Sec. 3.5.
Furthermore, with a few clicks the
user should be able to drill down into each
piece of data that she sees and understand its
provenance, which is a key feature to
understanding the data. That is, users need
to be able to see not just the results, but also
understand why they are seeing those
results. However, raw provenance,
particularly regarding the phases in the
analytics pipeline, is likely to be too
technical for many users to grasp
completely. One alternative is to enable the
users to "play" with the steps in the
analysis – make small changes to the
pipeline, for example, or modify values for
some parameters. The users can then view
the results of these incremental changes. By
these means, users can develop an intuitive
feeling for the analysis and also verify that it
performs as expected in corner cases.
Accomplishing this requires the system to
provide convenient facilities for the user to
specify analyses. Declarative specification,
discussed in Sec. 4, is one component of
such a system.
3. Challenges in Big Data Analysis
Having described the multiple phases
in the Big Data analysis pipeline, we now
turn to some common challenges that
underlie many, and sometimes all, of these
phases. These are shown as five boxes in the
second row of Fig. 1.
3.1 Heterogeneity and Incompleteness
When humans consume information,
a great deal of heterogeneity is comfortably
tolerated. In fact, the nuance and richness of
natural language can provide valuable depth.
However, machine analysis algorithms
expect homogeneous data, and cannot
understand nuance. In consequence, data
must be carefully structured as a first step in
(or prior to) data analysis. Consider, for
example, a patient who has multiple medical
procedures at a hospital. We could create
one record per medical procedure or
laboratory test, one record for the entire
hospital stay, or one record for all lifetime
hospital interactions of this patient. With
anything other than the first design, the
number of medical procedures and lab tests
per record would be different for each
patient. The three design choices listed have
successively less structure and, conversely,
successively greater variety. Greater
structure is likely to be required by many
(traditional) data analysis systems. However,
the less structured design is likely to be
more effective for many purposes – for
example questions relating to disease
progression over time will require an
expensive join operation with the first two
designs, but can be avoided with the third.
However, computer systems work most
efficiently if they can store multiple items
that are all identical in size and structure.
Efficient representation, access, and analysis
of semi-structured data require further work.
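The trade-off between the per-procedure and per-patient designs can be made concrete with a small sketch. The records and field names are invented for illustration: the flat design is uniform in structure, but a disease-progression question requires grouping by patient (the in-memory analogue of the expensive join), while the nested per-patient design answers it directly.

```python
flat_rows = [  # one record per procedure: identical size and structure
    {"patient": "p1", "date": "2020-01-05", "procedure": "x-ray"},
    {"patient": "p2", "date": "2020-02-01", "procedure": "mri"},
    {"patient": "p1", "date": "2021-03-10", "procedure": "ct"},
]

nested = {  # one record per patient: variable length, but join-free
    "p1": [("2020-01-05", "x-ray"), ("2021-03-10", "ct")],
    "p2": [("2020-02-01", "mri")],
}

# Timeline for p1 from the flat design: filter, group, then sort.
timeline_flat = sorted(
    (r["date"], r["procedure"]) for r in flat_rows if r["patient"] == "p1"
)
# The nested design already holds the same timeline in one record.
assert timeline_flat == nested["p1"]
print(timeline_flat)
```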
Consider an electronic health record
database design that has fields for birth date,
occupation, and blood type for each patient.
What do we do if one or more of these
pieces of information is not provided by a
patient? Obviously, the health record is still
placed in the database, but with the
corresponding attribute values being set to
NULL. A data analysis that looks to classify
patients by, say, occupation, must take into
account patients for which this information
is not known. Worse, these patients with
unknown occupations can be ignored in the
analysis only if we have reason to believe
that they are otherwise statistically similar to
the patients with known occupation for the
analysis performed. For example, if
unemployed patients are more likely to hide
their employment status, analysis results
may be skewed: they would reflect a more
employed population mix than actually exists, and
hence potentially one with different
occupation-related health profiles.
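The skew described above is easy to see numerically. All of the counts below are made up for illustration: if the patients with NULL occupation are disproportionately unemployed, the "drop the NULLs" estimate overstates the employed share.

```python
# 50 employed, 20 unemployed, 30 unrecorded (NULL occupation).
patients = (
    [{"occupation": "employed"}] * 50
    + [{"occupation": "unemployed"}] * 20
    + [{"occupation": None}] * 30  # suppose most of these are unemployed
)

# Naive analysis: ignore patients with unknown occupation.
known = [p for p in patients if p["occupation"] is not None]
employed_share_naive = sum(
    p["occupation"] == "employed" for p in known
) / len(known)

print(round(employed_share_naive, 2))  # 0.71 among known records...
# ...but if, say, 25 of the 30 NULLs are actually unemployed,
# the true employed share is 50 / 100 = 0.50.
```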
Even after data cleaning and error
correction, some incompleteness and some
errors in data are likely to remain. This
incompleteness and these errors must be
managed during data analysis. Doing this
correctly is a challenge. Recent work on
managing probabilistic data suggests one
way to make progress.

3.2 Scale
Of course, the first thing anyone
thinks of with Big Data is its size. After all,
the word "big" is there in the very name.
Managing large and rapidly increasing
volumes of data has been a challenging issue
for many decades. In the past, this challenge
was mitigated by processors getting faster,
following Moore's law, to provide us with the resources needed to cope with increasing
volumes of data. But, there is a fundamental
shift underway now: data volume is scaling faster than compute resources, and CPU
speeds are static.
First, over the last five years the
processor technology has made a dramatic
shift - rather than processors doubling their
clock cycle frequency every 18-24 months,
now, due to power constraints, clock speeds
have largely stalled and processors are being
built with increasing numbers of cores. In
the past, large data processing systems had
to worry about parallelism across nodes in a
cluster; now, one has to deal with
parallelism within a single node.
Unfortunately, parallel data processing
techniques that were applied in the past for
processing data across nodes don't directly
apply for intra-node parallelism, since the
architecture looks very different; for
example, there are many more hardware
resources such as processor caches and
processor memory channels that are shared
across cores in a single node. Furthermore,
the move towards packing multiple sockets
(each with 10s of cores) adds another level
of complexity for intra-node parallelism.
Finally, with predictions of "dark silicon",
namely that power considerations will likely
in the future prohibit us from using all of the
hardware in the system continuously, data
processing systems will likely have to
actively manage the power consumption of
the processor. These unprecedented changes
require us to rethink how we design, build
and operate data processing components.
The second dramatic shift that is
underway is the move towards cloud
computing, which now aggregates multiple
disparate workloads with varying
performance goals (e.g. interactive services
demand that the data processing engine
return an answer within a fixed
response time cap) into very large clusters.
This level of sharing of resources on
expensive and large clusters requires new
ways of determining how to run and execute
data processing jobs so that we can meet the
goals of each workload cost-effectively, and
to deal with system failures, which occur
more frequently as we operate on larger and
larger clusters (that are required to deal with
the rapid growth in data volumes). This
places a premium on declarative approaches
to expressing programs, even those doing
complex machine learning tasks, since
global optimization across multiple users'
programs is necessary for good overall
performance. Reliance on user-driven
program optimizations is likely to lead to
poor cluster utilization, since users are
unaware of other users' programs. System-
driven holistic optimization requires
programs to be sufficiently transparent, e.g.,
as in relational database systems, where
declarative query languages are designed
with this in mind.
A third dramatic shift that is
underway is the transformative change of
the traditional I/O subsystem. For many
decades, hard disk drives (HDDs) were used
to store persistent data. HDDs had far slower
random IO performance than sequential IO
performance, and data processing engines
formatted their data and designed their query
processing methods to "work around" this
limitation. But, HDDs are increasingly being
replaced by solid state drives today, and
other technologies such as Phase Change
Memory are around the corner. These newer
storage technologies do not have the same
large spread in performance between the
sequential and random I/O performance,
which requires a rethinking of how we
design storage subsystems for data
processing systems. Implications of this
changing storage subsystem potentially
touch every aspect of data processing,
including query processing algorithms,
query scheduling, database design,
concurrency control methods and recovery
methods.
3.3 Timeliness
The flip side of size is speed. The
larger the data set to be processed, the
longer it will take to analyze. The design of
a system that effectively deals with size is
likely also to result in a system that can
process a given size of data set faster.
However, it is not just this speed that is
usually meant when one speaks of Velocity
in the context of Big Data. Rather, there is
an acquisition rate challenge as described in
Sec. 2.1, and a timeliness challenge
described next.
There are many situations in which
the result of the analysis is required
immediately. For example, if a fraudulent
credit card transaction is suspected, it should
ideally be flagged before the transaction is
completed – potentially preventing the
transaction from taking place at all.
Obviously, a full analysis of a user's
purchase history is not likely to be feasible
in real-time. Rather, we need to develop
partial results in advance so that a small
amount of incremental computation with
new data can be used to arrive at a quick
determination.
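The "partial results in advance" idea can be sketched as a tiny running profile per card (count, sum, and sum of squares of amounts), so that a new transaction is scored in O(1) time without rescanning the purchase history. The 3-sigma rule, the minimum-history cutoff, and the field names are all illustrative assumptions.

```python
profiles = {}  # card_id -> (n, sum_of_amounts, sum_of_squares)

def observe(card, amount):
    """Incrementally fold a completed transaction into the profile."""
    n, s, sq = profiles.get(card, (0, 0.0, 0.0))
    profiles[card] = (n + 1, s + amount, sq + amount * amount)

def suspicious(card, amount, k=3.0):
    """Score a new transaction against the precomputed profile in O(1)."""
    n, s, sq = profiles.get(card, (0, 0.0, 0.0))
    if n < 5:
        return False  # too little history to judge
    mean = s / n
    var = max(sq / n - mean * mean, 0.0)
    return abs(amount - mean) > k * (var ** 0.5 + 1e-9)

for amt in [20.0, 25.0, 18.0, 22.0, 30.0]:
    observe("card-1", amt)

print(suspicious("card-1", 24.0))   # False: in line with history
print(suspicious("card-1", 950.0))  # True: flag before completing
```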
Given a large data set, it is often
necessary to find elements in it that meet a
specified criterion. In the course of data
analysis, this sort of search is likely to occur
repeatedly. Scanning the entire data set to
find suitable elements is obviously
impractical. Rather, index structures are
created in advance to permit finding
qualifying elements quickly. The problem is
that each index structure is designed to
support only some classes of criteria. With
new analyses desired using Big Data, there
are new types of criteria specified, and a
need to devise new index structures to
support such criteria. For example, consider
a traffic management system with
information regarding thousands of vehicles
and local hot spots on roadways. The system
may need to predict potential congestion
points along a route chosen by a user, and
suggest alternatives. Doing so requires
evaluating multiple spatial proximity queries
working with the trajectories of moving
objects. New index structures are required to
support such queries. Designing such
structures becomes particularly challenging
when the data volume is growing rapidly
and the queries have tight response time
limits.

3.4 Privacy
The privacy of data is another huge
concern, and one that increases in the
context of Big Data. For electronic health
records, there are strict laws governing what
can and cannot be done. For other data,
regulations, particularly in the US, are less
forceful. However, there is great public fear
regarding the inappropriate use of personal
data, particularly through linking of data
from multiple sources. Managing privacy is
effectively both a technical and a
sociological problem, which must be
addressed jointly from both perspectives to
realize the promise of big data.
Consider, for example, data gleaned
from location-based services. These new
architectures require a user to share his/her
location with the service provider, resulting
in obvious privacy concerns. Note that
hiding the user's identity alone
without hiding her location would not
properly address these privacy concerns. An
attacker or a (potentially malicious)
location-based server can infer the identity
of the query source from its (subsequent)
location information. For example, a user's
location information can be tracked through
several stationary connection points (e.g.,
cell towers). After a while, the user leaves
"a trail of packet crumbs" which could be
associated to a certain residence or office
location and thereby used to determine the
user's identity. Several other types of
surprisingly private information such as
health issues (e.g., presence in a cancer
treatment center) or religious preferences
(e.g., presence in a church) can also be
revealed by just observing anonymous
users' movement and usage pattern over
time. In general, Barabási et al. showed that
there is a close correlation between people's
identities and their movement patterns
[Gon2008]. Note that hiding a user location
is much more challenging than hiding
his/her identity. This is because with
location-based services, the location of the
user is needed for a successful data access or
data collection, while the identity of the user
is not necessary.
There are many additional
challenging research problems. For example,
we do not know yet how to share private
data while limiting disclosure and ensuring
sufficient data utility in the shared data. The
existing paradigm of differential privacy is a
very important step in the right direction, but
it unfortunately reduces information content
too far in order to be useful in most practical
cases. In addition, real data is not static but
gets larger and changes over time; none of
the prevailing techniques results in any
useful content being released in this
scenario. Yet another very important
direction is to rethink security for
information sharing in Big Data use cases.
Many online services today require us to
share private information (think of Facebook
applications), but beyond record-level
access control we do not understand what it
means to share data, how the shared data can
be linked, and how to give users fine-grained
control over this sharing.

3.5 Human Collaboration
In spite of the tremendous advances
made in computational analysis, there
remain many patterns that humans can easily
detect but computer algorithms have a hard
time finding. Indeed, CAPTCHAs exploit
precisely this fact to tell human web users
apart from computer programs. Ideally,
analytics for Big Data will not be all
computational – rather it will be designed
explicitly to have a human in the loop. The
new sub-field of visual analytics is
attempting to do this, at least with respect to
the modeling and analysis phase in the
pipeline. There is similar value to human
input at all stages of the analysis pipeline.
In today's complex world, it often
takes multiple experts from different
domains to really understand what is going
on. A Big Data analysis system must support
input from multiple human experts, and
shared exploration of results. These multiple
experts may be separated in space and time
when it is too expensive to assemble an
entire team together in one room. The data
system has to accept this distributed expert
input, and support their collaboration.
A popular new method of harnessing
human ingenuity to solve problems is
through crowd-sourcing. Wikipedia, the
online encyclopedia, is perhaps the best
known example of crowd-sourced data. We
are relying upon information provided by
unvetted strangers. Most often, what they
say is correct. However, we should expect
there to be individuals who have other
motives and abilities – some may have a
reason to provide false information in an
intentional attempt to mislead. While most
such errors will be detected and corrected by
others in the crowd, we need technologies to
facilitate this. We also need a framework to
use in analysis of such crowd-sourced data
with conflicting statements. As humans, we
can look at reviews of a restaurant, some of
which are positive and others critical, and
come up with a summary assessment based
on which we can decide whether to try
eating there. We need computers to be able
to do the equivalent. The issues of
uncertainty and error become even more
pronounced in a specific type of crowd-
sourcing, termed participatory-sensing. In
this case, every person with a mobile phone
can act as a multi-modal sensor collecting
various types of data instantaneously (e.g.,
picture, video, audio, location, time, speed,
direction, acceleration). The extra challenge
here is the inherent uncertainty of the data
collection devices. The fact that collected
data are probably spatially and temporally
correlated can be exploited to better assess
their correctness. When crowd-sourced data
is obtained for hire, such as with
"Mechanical Turks," much of the data
created may be with a primary objective of
getting it done quickly rather than correctly.
This is yet another error model, which must
be planned for explicitly when it applies.

4. System Architecture
Companies today already use, and
appreciate the value of, business
intelligence. Business data is analyzed for
many purposes: a company may perform
system log analytics and social media
analytics for risk assessment, customer
retention, brand management, and so on.
Typically, such varied tasks have been
handled by separate systems, even if each
system includes common steps of
information extraction, data cleaning,
relational-like processing (joins, group-by,
aggregation), statistical and predictive
modeling, and appropriate exploration and
visualization tools as shown in Fig. 1.
With Big Data, the use of separate
systems in this fashion becomes
prohibitively expensive given the large size
of the data sets. The expense is due not only
to the cost of the systems themselves, but
also the time to load the data into multiple
systems. In consequence, Big Data has made
it necessary to run heterogeneous workloads
on a single infrastructure that is sufficiently
flexible to handle all these workloads. The
challenge here is not to build a system that is
ideally suited for all processing tasks.
Instead, the need is for the underlying
system architecture to be flexible enough
that the components built on top of it for
expressing the various kinds of processing
tasks can tune it to efficiently run these
different workloads. The effects of scale on
the physical architecture were considered in
Sec 3.2. In this section, we focus on the
programmability requirements.
If users are to compose and build
complex analytical pipelines over Big Data,
it is essential that they have appropriate
high-level primitives to specify their needs
in such flexible systems. The Map-Reduce
framework has been tremendously valuable,
but is only a first step. Even declarative
languages that exploit it, such as Pig Latin,
are at a rather low level when it comes to
complex analysis tasks. Similar declarative
specifications are required at higher levels to
meet the programmability and composition
needs of these analysis pipelines. Besides
the basic technical need, there is a strong
business imperative as well. Businesses
typically will outsource Big Data
processing, or many aspects of it.
Declarative specifications are required to
enable technically meaningful service level
agreements, since the point of the outsourcing is to
specify precisely what task will be
performed without going into details of how to do it.
Declarative specification is needed
not just for the pipeline composition, but
also for the individual operations
themselves. Each operation (cleaning,
extraction, modeling etc.) potentially runs
on a very large data set. Furthermore, each
operation itself is sufficiently complex that
there are many choices and optimizations
possible in how it is implemented. In
databases, there is considerable work on
optimizing individual operations, such as
joins. It is well-known that there can be
multiple orders of magnitude difference in
the cost of two different ways to execute the
same query. Fortunately, the user does not
have to make this choice – the database
system makes it for her. In the case of Big
Data, these optimizations may be more
complex because not all operations will be
I/O intensive as in databases. Some
operations may be, but others may be CPU
intensive, or a mix. So standard database
optimization techniques cannot directly be
used. However, it should be possible to
develop new techniques for Big Data
operations inspired by database techniques.
The very fact that Big Data analysis
typically involves multiple phases highlights
a challenge that arises routinely in practice:
production systems must run complex
analytic pipelines, or workflows, at routine
intervals, e.g., hourly or daily. New data
must be incrementally accounted for, taking
into account the results of prior analysis and
pre-existing data. And of course, provenance
must be preserved, and must include the
phases in the analytic pipeline. Current
systems offer little to no support for such
Big Data pipelines, and this is in itself a
challenging objective.

5. Conclusion
We have entered an era of Big Data.
Through better analysis of the large volumes
of data that are becoming available, there is
the potential for making faster advances in
many scientific disciplines and improving
the profitability and success of many
enterprises. However, many technical
challenges described in this paper must be
addressed before this potential can be
realized fully. The challenges include not
just the obvious issues of scale, but also
heterogeneity, lack of structure, error-
handling, privacy, timeliness, provenance,
and visualization, at all stages of the analysis
pipeline from data acquisition to result
interpretation. These technical challenges
are common across a large variety of
application domains, and therefore not cost-
effective to address in the context of one
domain alone. Furthermore, these challenges
will require transformative solutions, and
will not be addressed naturally by the next
generation of industrial products. We must
support and encourage fundamental research
towards addressing these technical
challenges if we are to achieve the promised
benefits of Big Data.
Contribution of India's IT Industry to Economic Progress
Aditya Kurane
[S.Y.M.C.A.]
The contribution of India's IT industry to
economic progress has been quite
significant. The rapidly expanding socio-
economic infrastructure has proved to be of
great use in supporting the growth of Indian
information technology industry. The
flourishing Indian economy has helped the IT sector maintain its competitiveness in the global market. The IT
and IT enabled services industry in India has
recorded a growth rate of 22.4% in the last
fiscal year. The total revenue from this
sector was valued at 2.46 trillion Indian
rupees in the fiscal year 2007. Out of this figure, the domestic IT market in India accounted for 900 billion rupees, with the remainder coming from exports. The IT sector in India has thus played a major role in drawing foreign funds into the domestic market.
The growth and prosperity of India's IT
industry depends on some crucial factors.
These factors are as follows:
- India is home to a large number of IT professionals who have the necessary skill and expertise to meet the demands and expectations of the global IT industry.

- The cost of the skilled Indian workforce is reasonably low compared to that in developed nations. This makes Indian IT services highly cost-efficient, and it is also the reason IT-enabled services such as business process outsourcing and knowledge process outsourcing have expanded significantly in the Indian job market.

- India has a huge pool of English-speaking IT professionals. This is why English-speaking countries such as the US and the UK depend on the Indian IT industry for outsourcing their business processes. Also, the Indian accent, which is relatively neutral, plays a major role and enables effective client-professional communication.
The emergence of the Indian information technology sector has brought about a sea change in the Indian job market. The IT
sector of India offers a host of
opportunities of employment. With IT
giants like Infosys, Cognizant, Wipro,
Tata Consultancy Services, Accenture
and several other IT firms operating in
some of the major Indian cities, there is
no dearth of job opportunities for the
Indian software professionals. The IT
enabled sector of India absorbs a large
number of graduates from general
stream into BPO and KPO firms. All of this has eased India's unemployment problem to a great extent. The
average purchasing power of the
common people of India has improved
substantially. The consumption spending
has recorded an all-time high. The
aggregate demand has increased as a
result. All these have improved the gross
production of goods and services in the
Indian economy. So in conclusion it can
be said that the growth of India's IT
industry has been instrumental in
facilitating the economic progress of
India.
Current trends in Information Technology
Monika Wicks
SY MCA
The current world is more techno-centric than ever. The rapidly expanding
information sector has left a huge disparity
between where the world is heading and the
approaches businesses are employing to run
their operations. The challenges to
businesses are therefore phenomenal, especially considering the fact that the IT industry is undergoing a tectonic shift
in technology. Different aspects of the computing landscape are changing at the same time, including communication, delivery platforms and collaboration channels. Within the information technology sector, technological innovations are short-lived, as they change frequently over time.
Some of the latest trends that have brought a revolution to the current IT industry are:
Cloud Computing
It is certainly one of the most
sophisticated of the latest trends in
information technology. Cloud computing provides software, computation, data access and storage services without the end user needing to know the physical location or configuration of the system that provides the service. It is especially effective in cutting running costs for businesses, particularly data storage and other operating costs. Data centers are now being downsized to pave the way for cloud storage. Cloud computing also has built-in scalability and elasticity features that can efficiently support the growth of businesses.
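The elasticity idea mentioned above can be illustrated with a toy autoscaling rule: capacity follows load. This is a minimal sketch only; the function name, the per-instance capacity figure and the fleet limits below are invented for illustration and are not taken from any particular cloud provider.

```python
# Toy autoscaling rule illustrating cloud elasticity: capacity follows load.
# All names, the per-instance capacity and the fleet limits are invented
# for illustration; real providers expose much richer scaling policies.
import math

def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float = 100.0,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many instances the current load calls for."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# As traffic grows, the (virtual) fleet grows with it; as traffic falls,
# instances are released and the business stops paying for them.
print(desired_instances(30))    # light load -> 1
print(desired_instances(250))   # moderate load -> 3
print(desired_instances(5000))  # heavy load, capped -> 20
```

Because the fleet shrinks as well as grows, the business pays only for the capacity it actually uses, which is the cost advantage described above.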
Consumerization of Information Technology
Technological innovation is actually driven
by the consumer world. More applications are being built specifically for mobile users rather than as replacements for desktop applications. The days of monolithic suites are slowly fading away; they are being supplanted by applications designed specifically for tablets and smartphones.
Big data/analytics and patterns
As companies continue to drown in unstructured data which they hardly access, innovations in the software development life cycle (SDLC) are being incorporated in order to manage data. There are different kinds of SDLC, which include the waterfall model and the Agile Development Methodology. Some of the features of ADM include continuous integration, pair programming, spike solutions and refactoring. The waterfall model is more traditional but is fast being replaced by Agile Development Methodology. Other effective approaches to data management include technologies such as in-line deduplication, flash or solid-state drives, and automated tiering of data.
Resource management
Servers are being virtualized, which helps businesses reduce workload-management overhead. Data centers are moving towards smaller sizes but with greater density of data storage, i.e. the creation of virtually limitless data centers. Virtualization enables data centers to be scaled vertically; its use optimizes server performance, freeing floor space and saving energy.

New development platforms: these include Java and .NET. Some of the features and benefits of .NET include a fast turnaround time, a simpler AJAX implementation, and a single framework that handles a variety of operations.
There is therefore no need for multiple
frameworks from different vendors in order
to perform different functionalities. It is also better funded, thus enabling new features to come out at the fastest pace possible. Some
of the features integrated into the platform
include LINQ, AJAX, the Unit Testing
Framework, Performance Profiler, and
Client Side Reporting among various other
features. Java is quite similar to .NET in
features and benefits.
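To give a flavour of the declarative query style that LINQ popularized, here is an illustrative sketch in Python: a comprehension stands in for the LINQ query shown in the comment, and the data and field names are made up for the example.

```python
# Illustrative sketch of the declarative-query idea behind LINQ, written
# in Python purely for comparison (the data and field names are made up).
# The roughly equivalent C# LINQ query expression would read:
#   from o in orders where o.Total > 100 select o.Id
orders = [
    {"id": 1, "total": 250.0},
    {"id": 2, "total": 40.0},
    {"id": 3, "total": 120.0},
]

# Filter and project in a single declarative expression, as LINQ does,
# instead of writing an explicit loop with mutable state.
big_order_ids = [o["id"] for o in orders if o["total"] > 100]
print(big_order_ids)  # -> [1, 3]
```

The point in both languages is the same: the query says *what* to select, and the runtime decides *how* to iterate.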
Fabrics
This is the vertical integration of server, network and storage systems, along with components that have element-level management software, laying a foundation on which shared data resources can be optimized effectively and dynamically. Vendors incorporating this approach include Cisco and HP, which use it to unify network control.
WINDOWS 8
Kuldeep Jain
S.Y. MCA (Com.)
Windows 8 was released to manufacturing on August 1, 2012, and became generally available on October 26, 2012.
Windows 8 introduced major changes to the
operating system's platform and user
interface to improve its user experience
on tablets, where Windows was now
competing with mobile operating systems,
including Android and iOS. In particular,
these changes included a touch-
optimized Windows shell based on
Microsoft's "Metro" design language,
the Start screen (which displays programs
and dynamically updated content on a grid
of tiles), a new platform for
developing apps with an emphasis
on touchscreen input, integration with online
services (including the ability to sync apps
and settings between devices), and Windows
Store, an online store for downloading and
purchasing new software. Windows 8 added
support for USB 3.0, Advanced Format hard
drives, near field communications,
and cloud computing. Additional security
features were introduced, such as built-
in antivirus software, integration with the Microsoft SmartScreen phishing-filtering service, and support for UEFI
Secure Boot on supported devices
with UEFI firmware, to
prevent malware from infecting the boot
process.
Windows 8 hasn't done fantastically well in terms of public reception, even leading some at Microsoft to say that the company's "Start Screen first" mentality was wide of
the mark. Sales of the software
also struggled at first, but after 90 days Microsoft indicated it had shifted enough licenses to match the pace of Windows 7. More than 100 million Windows 8 licenses have now been sold by Microsoft.
On October 17, 2013, Microsoft released the
first major update to the operating
system, Windows 8.1. The update addresses
some aspects of Windows 8 that were
criticized by reviewers and early
adopters and incorporates additional
improvements to various aspects of the
operating system.