big data social network analysis


Post on 16-Jul-2015

Big Data Social Network Analysis

by

Chamin Nalinda
(Registration No: 2011/CS/005, Index No: 11000058)

[email protected]
+94 772416604

SCS 3017 Literature Survey

Supervised by

Dr. H. A. Caldera
BSc (Colombo), PGDip (Colombo), MSc (Colombo), PhD (Western Sydney)

University of Colombo School of Computing
Colombo 7
SRI LANKA

TexMaker | Mendeley Desktop | Harvard Style Referencing | Word Count = 5466


Declaration

I hereby declare that this literature survey report was written by Chamin Nalinda. A great deal of analysis was carried out in preparing this report, and the bibliography reflects the key reference materials. Self-learned knowledge is also included. References have been cited without reproducing the owners' exact content (paragraphs, sentences, etc.).

Name of Candidate: L.G.H.C. Nalinda

Signature: ............................... Date: December 12, 2014


Abstract

Big Data Social Network Analysis (BDSNA) is the computational and graphical study of powerful techniques that can be used to identify clusters, patterns and hidden structures, and to generate business intelligence, from social relationships within social networks in terms of network theory. Social Network Analysis (SNA) has a diversified set of applications and research areas such as Health care, Travel and Tourism, Defence and Security, and the Internet of Things (IoT). With the boom of the internet, Web 2.0 and handheld devices, there is explosive growth in the size, complexity and variety of unstructured data; analysis and information extraction are therefore of great value, and adapting the Big Data concept to SNA is vital.

This literature survey aims to investigate the usefulness of SNA in the “Big Data (BD)” arena. It reviews major research studies that have proposed business strategies and BD approaches for generating predictive models while addressing contemporary challenges that have arisen from SNA.


Acknowledgements

I would like to offer my heartfelt thanks to Dr. H. A. Caldera, my supervisor for the Literature Survey, for his immense support and continuous feedback during the course of the survey and for guiding me with valuable ideas.

Further, my sincere gratitude goes to all the lecturers, assistant lecturers and the entire UCSC family.

Special thanks to my parents, brother and sister, who have always given me strength through the journey of my life.

Chamin Nalinda, December 12, 2014


Contents

Acknowledgements
List of Figures
Acronyms

1 Introduction
  1.1 Approach
  1.2 Motivation
  1.3 History
  1.4 Current status
  1.5 Chapter summary

2 Big Data Social Network Analysis Domains
  2.1 Health care
    2.1.1 Challenges and Future
  2.2 Defence and Security
    2.2.1 Identifying key players in network
    2.2.2 Use cases from recent history
    2.2.3 Challenges and Future
  2.3 Travel and Tourism
    2.3.1 Web 2.0 forms Tourism 2.0
    2.3.2 Tourism 2.0 Destination Management
    2.3.3 Challenges and Future
  2.4 Web 2.0 and IoT
    2.4.1 Challenges and Future
  2.5 Chapter summary

3 BDSNA Tools and Technologies
  3.1 Major Concerns in BDSNA
  3.2 Real Time Analysis
  3.3 Lambda Architecture
    3.3.1 Batch layer
    3.3.2 Serving layer
    3.3.3 Speed Layer
  3.4 Recommendation systems
  3.5 Web 2.0 IoT Architecture
  3.6 Chapter summary

4 Conclusion, Challenges and Future Directions
  4.1 Conclusion
  4.2 Challenges and Future Directions

Bibliography

List of Figures

2.1 Sources Used to Find or Access Health and Wellness Related Information in 2008, in United States of America (USA)
2.2 9/11 attackers having weak ties with others
2.3 Decentralized terrorist network
2.4 PISTA ontology
2.5 Most consulted Social Networks (SNs) in cybertravelling
2.6 Traveller recommendation system
2.7 TAM to gain loyalty
2.8 Tweeting trees

3.1 Expected growth in real time analytics by 2015
3.2 Capabilities of Operational Intelligence
3.3 Overview of Lambda Architecture
3.4 Architecture for Social Internet of Things (SIoT) Client Side and Server Side

4.1 Hosting data on cloud and challenges

Acronyms

AMA American Medical Association

ANN Artificial Neural Network

API Application Programming Interface

BD Big Data

BDA Big Data Analytics

BDSNA Big Data Social Network Analysis

BI Business Intelligence

BP Batch Processing

CC Cloud Computing

CO Cognitive Objects

DARPA Defense Advanced Research Projects Agency

DB Database

DD Deep Data

DM Data Mining

DMO Destination Management Organizations

DT Decision Tree

DW Data Warehousing

eWOM e-word-of-mouth

FB Facebook

FC Fog Computing

IoT Internet of Things


LA Lambda Architecture

NLP Natural Language Processing

NSA National Security Agency

OI Operational Intelligence

OM Opinion Mining

RFID Radio Frequency Identification

ROS Robotic Operating System

RTA Real Time Analysis

RTBDA Real Time Big Data Analytics

SIoT Social Internet of Things

SM Social Media

SNA Social Network Analysis

SNs Social Networks

SP Stream Processing

SW Software

TAM Technology Acceptance Model

TM Text Mining

TPA Technosocial Predictive Analytics

UGC User Generated Content

US United States

USA United States of America

WSNs Wireless Sensor Networks

WWW World Wide Web


Chapter 1

Introduction

This literature survey is based on key domain areas in which "Social Network Analysis" plays a vital role through the use of "Big Data" technologies. The discovered knowledge can be utilized to advance the current state of the respective domains. This chapter highlights, in a nutshell, the importance, history and growth potential of the survey topic.

1.1 Approach

Social Networks (SNs) connect people with different ideas, education, status, backgrounds, geographies, etc. The focal idea of Social Network Analysis (SNA) is identifying relationships within the network. Information diffusion is the key behind relationship formation. Within SNs, a variety of interests addressing different domains are shared, and this forms complex relationships. With the World Wide Web (WWW) and Web 2.0, SNs have gained a new shift and focus. Online SNs are massive data repositories. Visitors to SNs leave a digital footprint once they are logged in, and hence all activities of logged-in users can be examined in online SNs. Data scientists found the importance of translating these technological opportunities into revenue, competitive advantages and useful discoveries to redefine human interaction [6] and day-to-day life. Otherwise the data would have remained in data tombs and the opportunities would have been ignored.

BD analytical approaches are a trusted technique for analysing SNs. Data Mining (DM) techniques are heavily used to dig deeper into data in SNs. Big Data Analytics (BDA) is a proven method of defining new storage, access, query and scaling mechanisms for data and of developing new approaches to sentiment analysis, predictive modeling, Natural Language Processing (NLP), click stream pattern recognition, etc.

BDSNA is a fast-growing research area. There are quite a number of algorithms, software tools and analytic engines that are optimized [39] for BDSNA. These tools are capable of gathering data, processing and analysing it, and presenting results visually for a particular domain. This literature survey gives an overview of the BDSNA topic as published in research papers, journals, web articles, books, etc.

1.2 Motivation

“Connectivity” is the concept behind the formation of SNs. The capabilities offered by SNs, sensors and online networks make them rich data sources. People spend a substantial amount of time in online networks. Therefore SNs generate a high volume of User Generated Content (UGC) in different varieties at a rapid velocity [40]. This UGC is a true reflection of human behaviour in SNs, hence UGC in SNs is of high commercial value. But its enormity and unstructured nature present multiple challenges, so storage, access, analytics and high computational performance need to be considered. As a result, “BD” technology has been mixed with SNA to discover new dimensions of knowledge.

Facebook (FB)¹, Twitter², LinkedIn³, Google+⁴, TripAdvisor⁵, Blogger⁶ and Instagram⁷ are the leading SNs with vast user engagement in today's context. UGC in SNs takes the form of text, emoticons, images, ratings, likes, etc., and addresses many domains such as travel and tourism [33], defence and security [4], and healthcare and medicine. The nature and characteristics of these SNs differ; however, there are similarities which can be aggregated when addressing domains. UGC poses many business opportunities. Discovering the knowledge that resides in UGC, while analysing attributes that are unique to each domain, will create more opportunities for both the private and government sectors. Another big wave in the coming decade is IoT. This will create even more UGC of a semi-structured and unstructured nature, and more varieties of SNs. As a result, “BD” will move to the “Deep Data (DD)” concept while “Cloud Computing (CC)” will move to “Fog Computing (FC)”.

¹ https://www.facebook.com/
² https://twitter.com/
³ https://www.linkedin.com/
⁴ https://plus.google.com/
⁵ tripadvisor.com/
⁶ https://www.blogger.com/
⁷ http://instagram.com/

Deducing business intelligence by connecting the dots using Operational Intelligence (OI), and comparing and applying the discovered knowledge in the modern and future societal context using classification, sentiment analysis and other techniques in the BD and DM paradigm, are blooming research topics. In these research areas, it is essential to determine which BDSNA algorithms and techniques can accommodate growth in size, scalability, quantification issues, pattern recognition issues and real-time analytics in SNA application areas. Data scientists and other researchers are also seeking novel ways of redesigning the infrastructure to facilitate BDSNA given the rapid growth in IoT.

1.3 History

Various arguments exist over who initiated SNA, but an experiment carried out by Stanley Milgram in 1967 provides proper grounding for it. He came up with the “six degrees of separation” [37] concept, stating that most people are connected through six acquaintances. SixDegrees.com was the first recognised online social network. The BDSNA research arena boomed with Web 2.0, which came to light in 1999 [46].
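The idea of "degrees of separation" can be made concrete as the number of hops on the shortest path between two people in an acquaintance graph. The following is a minimal sketch in Python using breadth-first search; the function name `degrees_of_separation` and the toy network are hypothetical illustrations, not data or code from Milgram's experiment or from [37].

```python
from collections import deque


def degrees_of_separation(graph, source):
    """Return the hop count from `source` to every reachable person (BFS)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in distance:  # first visit is the shortest path
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


# Hypothetical toy acquaintance network (undirected, stored as adjacency sets).
acquaintances = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cara"},
    "Cara": {"Bob", "Dev"},
    "Dev": {"Cara"},
}
hops = degrees_of_separation(acquaintances, "Ann")
```

In this toy chain, `hops["Dev"]` is 3: Ann reaches Dev through Bob and Cara. On a real SN the same search would run over millions of nodes, which is where BD infrastructure becomes relevant.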

Low availability of internet facilities and the lack of Software (SW) tools to meet BD requirements were major reasons for BDSNA staying out of sight in the early days.

1.4 Current status

Today millions of people are connected with social networks in many different ways [28]. Social networks are in a neck-and-neck fight to keep their current users while attracting new users. This leads to semi-structured and unstructured data being generated at a rapid pace.

BDSNA is an aggressive and lucrative research area in modern computer science. Public and private sector organizations have opened up their data repositories for research purposes [13] and have encouraged data scientists to engage actively in more research areas in BDSNA. Tech giants like Google⁸, Microsoft⁹, FB, Amazon¹⁰ and IBM¹¹ are investing in start-up companies that operate in BDSNA because of its lucrative nature and growth potential. The demand for business intelligence tools is erupting [10]. High performance, low latency, parallel distributed processing, real-time processing, scalability and migration are factors that are continuously optimized in such tools. Further, with IoT a new era has been born in which trees tweet about their condition [12][15].

1.5 Chapter summary

In the early days BDSNA was not popular, for various reasons, and it emerged with Web 2.0 technology. SNs connect people with different views and opinions. The UGC data repositories of SNs are huge and come in different varieties and variations. BDSNA helps in analysing UGC in SNs and thereby discovering knowledge. This knowledge has high commercial value as well. Today, there are different forms of online SNs that address different user groups (FB, LinkedIn, TripAdvisor, etc.). Classification, sentiment analysis, clustering, Real Time Analysis (RTA) and various other BD and DM techniques are widely used in SNA. Advances in technology have spread to SNA, and tech giants and data scientists are now looking for novel approaches to accommodate the needs of SNA, such as storing, querying, accessing and analysing UGC, with much improved technologies.

The next chapter gives a detailed illustration of the four major SNA domains with which this literature survey is mainly concerned. Examples and use cases from survey reports, articles and journals have been included in order to explore the importance of BDSNA to the respective domains.

⁸ https://www.google.com/
⁹ http://www.microsoft.com/
¹⁰ http://www.amazon.com/
¹¹ http://www.research.ibm.com/

Chapter 2

Big Data Social Network Analysis Domains

In this chapter, SNA domains in health care, defence and security, travel and tourism, and Web 2.0 and IoT are discussed. Examples illustrate how BDSNA has been used to address stakeholder intentions and expectations. Further, this chapter exposes specializations in each domain that emerged as a result of BDSNA.

2.1 Health care

As shown in Figure 2.1, there is a strong likelihood that people use the internet as a source of health and wellness related information, and people are likely to spend much time in SNs in their day-to-day lives. Web 2.0 attracts users of all age groups. Discussions, information diffusion and collaboration over SNs are growing rapidly in the healthcare space. Recent research has identified that healthcare professionals are willing to use SNs as a means of addressing their patients and monitoring patients' health conditions. Further, patients who have recovered are also interested in sharing their success stories in SNs in the form of blogging, photo sharing, video uploads and articles. This information is publicly available to a vast variety of people. As of now, we are in Health 2.0, “the use of social software and its ability to promote collaboration between patients, their caregivers, medical professionals, and other stakeholders in health” [17].

Figure 2.1: Sources Used to Find or Access Health and Wellness Related Information in 2008, in USA

It is “Collective Wisdom” that acts as the driving force for people to increasingly use SNs to find information relevant to their health matters. There are specifically developed SNs like PatientsLikeMe¹, OrganizedWisdom², ICYou³, Google Health Groups, Sermo⁴ and DailyStrength⁵ that bridge the knowledge and experience of patients and the expertise of health care professionals [17]. The American Medical Association (AMA) emphasizes to physicians, neurosurgeons and other professionals the importance of adhering to professionalism when publishing content over SNs, in order to safeguard their career status in healthcare [9]. Even though there are challenges in collecting data, healthcare data in SNs is reported to be over 99.7% accurate [20].

¹ http://www.patientslikeme.com/
² http://www.organizedwisdom.com/Home
³ http://icyouhealth.tumblr.com/
⁴ https://www.sermo.com/
⁵ http://www.dailystrength.org/

Research on cancer patients' social behaviour on FB, conducted by the University of Texas M.D. Anderson Cancer Center, has enabled the center to provide better service to its patients. The UGC consisted of posters and text. This technique is called “Telemedicine”. Just like Health 2.0, Medicine 2.0 is another concept that evolves with high user participation over SNs to communicate and collaborate on health care. The Twitter network is also widely popular among patients and healthcare professionals as a medium of communication [20]. However, patients' willingness to communicate openly over SNs is mandatory; otherwise regulations will mark it as a violation of patients' rights.

Videos, articles, comments, chats, images and other forms of healthcare-related UGC available on SNs represent a gold mine of opportunities [20][17]. Sophisticated applications have been developed integrating both DM and BD techniques. TrialX⁶ is one such application that patients can use. Once a patient tweets, TrialX sends a response tailored to the patient's past health history [20][32].

Gene engineering, drug research, disease research and public health domains utilize UGC on SNs to discover knowledge and thereby develop models to improve people's health. Twitter hashtags are quite useful when determining disease- and drug-related effects [8]. An automated filtering system developed by the US Food and Drug Administration found that 98% of tweets are bogus; however, the true information is of great value [23].
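As a rough illustration of how hashtags can serve as a first signal for drug- or disease-related discussion, the sketch below counts in how many tweets each hashtag appears. It is a simplified example under stated assumptions: the tweets and the hashtag `#DrugX` are invented, and the approach is not the filtering system described in [23] or the method of [8].

```python
import re
from collections import Counter

# A hashtag is taken to be '#' followed by word characters (an assumption;
# real platforms apply more elaborate tokenisation rules).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count the number of tweets in which each hashtag occurs (case-insensitive)."""
    counts = Counter()
    for text in tweets:
        # Use a set so a hashtag repeated inside one tweet is counted once.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts


# Hypothetical sample tweets.
sample_tweets = [
    "Feeling dizzy after #DrugX #sideeffects",
    "#drugx worked fine for me",
    "Nothing to report today",
]
counts = hashtag_counts(sample_tweets)
```

Here `counts["drugx"]` is 2 and `counts["sideeffects"]` is 1. A production pipeline would still need to filter spam and bogus tweets before such counts carry any meaning.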

Information extraction is critical. Digital documents migrated from paperwork and SN data are huge repositories. An automated surveillance system would be very effective in extracting information, analysing data and recognizing patterns. One such system, implemented at the University of Alabama, has proven results. It has been successful in identifying high-risk patients, short-term health issues and adverse effects from drugs. The use of big data has made it possible to deliver tailored prescriptions for patients [30].

A significant number of BD applications exist in the healthcare domain today. A SW tool similar to Asthmapolis⁷ that draws on SN data repositories would be meaningful to implement. Mobility is expected of big data tools, hence tools enabled for mobile platforms will have a lot of growth potential. Ginger.io⁸ and mHealthCoach⁹ are the leading tools [18] at present, but these two have been unable to incorporate the SN domain into their applications, and the need for such tools persists.

⁶ http://trialx.com/enablers/
⁷ http://propellerhealth.com/

2.1.1 Challenges and Future

UGC that appears on anonymous blogs and in spam comments is an unreliable source. Efficient NLP and Text Mining (TM) techniques need to be utilized when developing BD tools and applications. Strong rules and regulations exist in the healthcare domain. This is a barrier to obtaining useful information from SNs. Mere sentiments are not enough to develop solid algorithms and models; patient information and other related information would add much value to research. "Privacy" concerns are another barrier. People might not want others to use what they share on SNs.

Web 2.0 will evolve to Web 3.0, and eventually Health 2.0 and Medicine 2.0 will evolve to Health 3.0 and Medicine 3.0 [17]. With the rise of IoT, BD wearables will take priority in healthcare [7]. The concept of BD wearables connected to SNs will redefine human interaction with healthcare matters.

2.2 Defence and Security

With the 9/11 massacre in the United States (US), the National Security Agency (NSA) invested a huge amount of resources in countering terror networks. “Networks and Netwars” by John Arquilla and David Ronfeldt, written prior to the 9/11 massacre, highlighted the behavioural patterns of criminal networks. Modern war network structures are leaderless and extremely quick, hence novel approaches are needed to counter terror threats. Valdis Krebs mapped the Al-Qaeda network responsible for 9/11 [37]. More and more importance was given to SNA for tracing terror networks, and today SNA plays a key role in dismantling them [11]. Technosocial Predictive Analytics (TPA) methods for web DM and social web tools are needed to capture and query UGC in SNs [22].

⁸ https://ginger.io/
⁹ http://www.mhealthcoach.com/

National security is the main concern. The defence domain differs from other SN domains in many ways, since key players are not openly active. Weakly tied parties are somewhat open in SNs, but even they hardly communicate. SNA in defence requires two major parties: data collectors and data modellers. Data collectors have a cumbersome time gathering data for the above reason. The University of Arizona Artificial Intelligence Center¹⁰ offers a large repository of terror-related newspaper articles, web pages and social network data. Clustering techniques have been used to segregate possible terror networks, and researchers have managed to represent pictorially the diffused networks linked by weak ties (Figure 2.2) in the network of the 9/11 attackers [37].

Figure 2.2: 9/11 attackers having weak ties with others
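The role of weak ties in holding a diffused network together can be illustrated with a small Python sketch. It is a stand-in for the clustering reported in [37], not a reproduction of it: it assumes a hypothetical edge list in which each tie has a numeric strength, drops ties below a chosen threshold, and reports the groups that remain connected. The names `strong_tie_clusters`, `ties` and the strength values are all invented for illustration.

```python
def strong_tie_clusters(edges, min_strength):
    """Group nodes into connected components using only ties >= min_strength."""
    adjacency = {}
    for a, b, strength in edges:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if strength >= min_strength:  # weak ties are ignored
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for node in adjacency:
        if node in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, cluster = [node], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(adjacency[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical ties: (person, person, strength of tie).
ties = [("a", "b", 5), ("b", "c", 1), ("c", "d", 4)]
clusters = strong_tie_clusters(ties, min_strength=3)
```

With the weak tie between `b` and `c` removed, the toy network splits into the cells `{a, b}` and `{c, d}`; lowering `min_strength` to 1 merges them into one network, which is the bridging effect the weak ties have.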

2.2.1 Identifying key players in network

¹⁰ http://ai.arizona.edu/research/terror

The two main focuses of analysing SNs in the defence domain are to identify the structure of possible networks and to recognize key players. With the 9/11 attack, the structure became decentralized (although both centralized and decentralized networks still exist). Understanding the key players helps in taking control of the entire network. Though it sounds easy, factors such as incompleteness, fuzzy boundaries and dynamics make it a tough task. In a decentralized network, players exist to handle financial assistance and other supplies while the leader plays a silent role in managing [4] (Figure 2.3). BDSNA is used to identify the financial manager and thereby recognize key roles. Twitter BD analytic techniques are most likely to be used in recognizing key players [31].

Figure 2.3: Decentralized terrorist network
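One common way to surface such key players is betweenness centrality, which scores a node by how many shortest paths between other nodes pass through it. The survey does not state which measure the cited works use, so the following Python sketch is only an assumed illustration: it implements Brandes' algorithm for an unweighted, undirected graph, and the toy network (a silent leader, a financier and three cells) is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical decentralized network: every node must appear as a key.
network = {
    "leader": {"financier"},
    "financier": {"leader", "cell_a", "cell_b", "cell_c"},
    "cell_a": {"financier"},
    "cell_b": {"financier"},
    "cell_c": {"financier"},
}
scores = betweenness(network)
key_player = max(scores, key=scores.get)
```

In this toy graph the financier, not the leader, receives the highest score, which mirrors the observation above that tracing the financial manager helps in recognizing key roles.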

As shown in Figure 2.4, the PISTA architecture is quite useful in filling major loopholes in the national security domain. At the moment, however, this architecture has few applications that integrate UGC from SNs. It is highly recommended to invest in extending the functionality of the PISTA architecture to support BDSNA in security domains, since most SNs have video sharing and geo-location features [42].

Figure 2.4: PISTA ontology


2.2.2 Use cases from recent history

In recent history there have been several major incidents around the globe connected with Web 2.0 initiatives. This section highlights some major incidents and the BDSNA technologies used in those situations.

• The 2008 Egyptian Revolution started through the initiation of an FB group. It revealed the importance of paying attention to SNA [14].

• The 2009 reinstatement of the Chief Justice of Pakistan was brought about purely by the influence of SNs. The Government banned private media, yet people raised social awareness through SNs, so the Government had to reinstate the Chief Justice in his position [14].

• ISIS is a technologically sophisticated terror group that actively engages in SNs. ISIS uses strong encryption techniques when communicating over SNs. Due to this barrier, BDSNA approaches like NLP, graph databases (to determine hierarchies and identities) and cognitive computing platforms cannot be used on their own. Project Minerva, by the US Department of Defense, utilizes high-end algorithms to detect terror activities from data pulled from Twitter [47].

• FIFA World Cup 2014 can be considered an event that used BDSNA to keep the peace around grounds and nearby cities. Brazilian security forces used real-time Twitter feeds, FB feeds and other SN UGC and analysed semantics to determine where to send troops to control riots. Security agencies used a powerful BD solution, Oracle Complex Event Processor¹¹, to run real-time queries on SN feeds.
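Real-time querying of SN feeds, as in the last use case, typically relies on windowed operations over an event stream. The Python sketch below shows the general idea with a sliding time window that raises an alert when enough matching posts arrive within a given interval. It is a minimal illustration only: the class `SlidingWindowCounter`, the window length and the threshold are hypothetical, and none of it reflects the API of Oracle Complex Event Processor.

```python
from collections import deque


class SlidingWindowCounter:
    """Flag a spike when `threshold` events fall within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # timestamps of events still inside the window

    def observe(self, timestamp):
        """Record one event; return True if the window now holds a spike."""
        self.events.append(timestamp)
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold


# Hypothetical timestamps (in seconds) of posts mentioning a disturbance.
counter = SlidingWindowCounter(window_seconds=60, threshold=3)
alerts = [counter.observe(t) for t in (0, 10, 100, 110, 120)]
```

Only the last observation triggers an alert, because it is the first moment at which three posts (at 100, 110 and 120 seconds) fall inside the same 60-second window.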

2.2.3 Challenges and Future

BD analytics in the defence sector provides meaningful insight to Governments. The director of the Defense Advanced Research Projects Agency (DARPA)¹² in the US emphasizes the importance of algorithm optimization in discovering useful intelligence. “e-harassment”, “cyberbullying” and “hacking” are major areas of investigation. The adoption of SN data is still at an early stage, but considering recent history it is apparent that it is essential to take SN data into consideration when discussing the security domain. The big argument against BDSNA in defence is violation of privacy. People share their thoughts on FB, Twitter and other SNs because they have a right to, and do not expect them to be used for other purposes. Recent whistle-blowing incidents involving Julian Assange and Wikileaks, and PRISM and Edward Snowden, are such examples. It is apparent that Governments try to hide this information from public view [11]. To obtain successful results there should be a balance between Government policy towards SNs and users' attitudes.

¹¹ http://www.oracle.com/technetwork/middleware/complex-event-processing/documentation/index.html
¹² http://www.darpa.mil/default.aspx

2.3 Travel and Tourism

Tourism has always been a networked industry. Web 2.0 redefined tourism and all related industries. This phenomenon is Tourism 2.0 [26]. In tourist networks, the stakeholders (tourists, travel agents, accommodation providers, restaurants, etc.) [41] fall into two major types: tourists and service providers. Different views have been taken of BDSNA in the domain of tourism. Two broad views are, first, using SNs as a tool in determining tourist destinations [33][26] and, second, processing SN data to discover interesting patterns and applying the derived knowledge to tourism [34][16].

2.3.1 Web 2.0 forms Tourism 2.0

SNs are powerful tools that draw on the Technology Acceptance Model (TAM) and e-word-of-mouth (eWOM). TAM illustrates users' willingness to adapt to technologies, while eWOM is content sharing on SNs in the form of text, images, videos, etc. TAM and eWOM provide the primary source of information for cybertravellers. Cybertravellers' behaviour depends on what other people say about destinations (Figure 2.5). This approach highlights the need for a new framework to address destination governance. Service providers need to adapt their networks by embedding SN features that support searching, visualization and interactivity, which would trigger a positive attitude towards travelling. Travel 2.0 SNs (TripAdvisor, WAYN¹³, Tripwolf¹⁴, Travelblog¹⁵, Trivago¹⁶) offer SN features that address cybertraveller expectations. Here the focus is more on leisure travellers than on business travellers [33][26][35][36].

¹³ http://www.wayn.com/
¹⁴ http://www.tripwolf.com/
¹⁵ https://www.travelblog.org/
¹⁶ http://www.trivago.com/

Figure 2.5: Most consulted SNs in cybertravelling

2.3.2 Tourism 2.0 Destination Management

Tourist attitudes, behaviour and psychology have a huge impact when determining destinations to explore. Different market segments have different demands. eMarketeers use tailored strategies to attract potential tourists. Destination Management Organizations (DMO) utilize DM and BDSNA techniques (clustering, Artificial Neural Network (ANN), Decision Tree (DT)) to determine customer intentions from the mixture of facts and opinions in UGC on SNs [43].

Travel 2.0 benefits from BDSNA in demand/sales forecasting, inventory management, multichannel marketing campaign organization, etc. SNA methods are quite important for removing noise and discovering meaningful knowledge from SNs [1]. RapidMiner¹⁷ analyses traveller patterns and renders dynamic, personalized suggestions based on past results as well as results from other linked networks (predicting air ticket prices, hotel charges, etc.) [44].

¹⁷ https://rapidminer.com/

Figure 2.6: Traveller recommendation system

At the time of decision making, a traveller is in a state of switching from one option to another depending on reviews. FB pages provide great insights about destinations and hotels. However, research has shown that travellers are very likely to tweet or post on FB if they have had a bad experience with a service provider. Twitter users are more likely to retweet negative reviews than positive reviews. This highlights the importance of monitoring UGC on the SN pages of service providers. Once travellers have selected their preferred hotel or travel service, they are very likely to visit the brand website of the hotel or travel agency. It is vital to integrate TAM features that let customers explore more about the services offered, in order to win customer loyalty [36]. Such strategies ultimately help DMO to boost their revenue and gain a competitive advantage over peers that ignore BDSNA.
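Monitoring UGC for negative reviews can be sketched with a very simple lexicon-based polarity score. The Python example below is an assumed illustration rather than a technique taken from the cited studies: the word lists `NEGATIVE` and `POSITIVE`, the sample posts and the function names are hypothetical, and a real system would use a trained sentiment model with far better handling of negation and sarcasm.

```python
import re

# Hypothetical miniature sentiment lexicons.
NEGATIVE = {"dirty", "rude", "delayed", "terrible"}
POSITIVE = {"lovely", "friendly", "clean", "great"}


def polarity(text):
    """Return (#positive words - #negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def flag_negative(posts):
    """Return the posts whose polarity is below zero, in their original order."""
    return [post for post in posts if polarity(post) < 0]


# Hypothetical posts from a service provider's SN page.
posts = [
    "The room was dirty and staff were rude",
    "Lovely view, friendly staff",
    "Flight delayed again",
]
flagged = flag_negative(posts)
```

The first and third posts are flagged. Surfacing such posts early matters because, as noted above, negative reviews are the ones most likely to be retweeted.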

2.3.3 Challenges and Future

This section describes the prevailing restrictions on BDSNA in the tourism and travel domain and what the future may hold.

• Currently, most DBs are relational. The tourism and travel sector needs new infrastructure tools to get the maximum out of BDSNA.

• User opinions are subjective. Algorithms should support a generalized view of travellers' opinions and should not be affected by outliers.

• Content shared on FB, Twitter and other SNs has a direct influence on DMO, travel agents and hotels, so a strong monitoring mechanism needs to be incorporated into Travel 2.0 websites.

• Airline service providers can benefit from real-time data analytics on flight delays; UGC from SNs and sensor data (weather patterns, etc.) serve greatly when optimizing operations.
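The point above about outliers can be shown with a small worked example. The Python sketch below compares the mean and the median of a set of hypothetical star ratings; the median is one simple outlier-resistant summary, chosen here for illustration, and is not prescribed by the surveyed literature. The function name `summarise` and the ratings are invented.

```python
from statistics import mean, median


def summarise(ratings):
    """Return the mean and the outlier-resistant median of the ratings."""
    return {"mean": mean(ratings), "median": median(ratings)}


# Hypothetical 1-5 star ratings for one hotel; the single 1 is an outlier.
ratings = [4, 5, 4, 4, 5, 1, 4]
summary = summarise(ratings)
```

The single one-star review pulls the mean down to about 3.86, while the median stays at 4, which better reflects the generalized opinion of the travellers.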


Figure 2.7: TAM to gain loyalty

2.4 Web 2.0 and IoT

In an era where Web 2.0 evolves into Web 3.0 (Ubiquitous Computing), hardware-embedded software takes the lead in the daily routines of mankind, and this will have a huge influence on current SN practices as well. Today it is mostly humans that are connected to SNs. With advances in IoT, Cognitive Objects (CO), or smart objects, are capable of sharing UGC over SNs. Tweeting trees (Figure 2.8) and tweeting washing machines send real time content to humans[24][27]. Two broad kinds of SNs exist: human-to-CO SNs and CO-to-CO SNs (SIoT)[19].

Figure 2.8: Tweeting trees


Developers integrate SN capability into every smart device because SNs play an important role in personal life. Google Glass18, Samsung Galaxy Gear watch19, Apple iWatch20 and many other wearable technologies have integrated SN capability. Lewis Robinson, in his article on SocialMediaToday21, stated that “iWatch will check in for you via Facebook when you arrive at an event. Your oven will take a photo of the cake you just baked and post it directly to Instagram”[38]. It is evident that automated, interconnected smart devices can act without human intervention.

There will be more data than ever before. BDSNA will be able to provide more personalized information to all stakeholder groups, and advanced Business Intelligence (BI) can be derived using sophisticated analytical approaches. The concept of “Smart Cities” is an example of advanced data analytics that utilizes UGC from IoT devices connected to SNs and other CO. Waze22 is one such real time traffic application, connecting mobile devices with other CO (traffic lights, street signs etc...).

2.4.1 Challenges and Future

“Privacy” is again a major concern in this arena. Since devices are capable of generating and sharing content on SNs automatically, this could violate the privacy of individuals. However, Lawrence Ampofo, in his recent article on Business2Community23, emphasizes that the “conception of privacy become more sophisticated”, where people are more likely to openly communicate their personal life through social networks, and expects “data to be more liberated from wall gardens making available to all platforms”[2].

It is predicted that by the end of 2020 the number of IoT devices will rise above 50 billion[38]. The potential for new-concept SNs is massive. The amount of unstructured data generated from IoT devices will be so huge that even current BD technologies cannot accommodate its size, growth and scalability. The concepts of “Deep Data” and “Fog Computing” need to be utilized effectively to accommodate infrastructure requirements.

18 https://www.google.com/glass/start/
19 http://www.samsung.com/uk/consumer/mobile-devices/wearables/gear/
20 https://www.apple.com/watch/
21 http://www.socialmediatoday.com/
22 https://www.waze.com/
23 http://www.business2community.com/


2.5 Chapter summary

Health 2.0 and Medicine 2.0 approaches have evolved as a result of Web 2.0 because the potential of SNs for the health care industry was identified as massive. SNs are the fastest method of communication between patients and health care professionals such as nurses, doctors and specialists. PatientsLikeMe, Google health groups and various other SNs that focus specifically on health care share the knowledge and experiences of all parties related to health care. Sophisticated SW tools such as TrialX utilize BDSNA methods to send tailored responses to patients and doctors by analysing related party data reflected on SNs. Specialized research areas such as drug research and disease research make heavy use of BDSNA approaches like TM, sentiment analysis, clustering and RTA.

The Defence and Security domain is very different from other domains in BDSNA. Finding a reliable data repository is a major challenge because terror groups hardly reveal any data, although the recent ISIS scenario is totally different. Today, government agencies and authorities use BDSNA to establish security in their territory. RTA plays an important role in analysing UGC on SNs. Highly sophisticated models and predictive algorithms have been developed using BDSNA mechanisms.

With Web 2.0, Travel 2.0 evolved. UGC in the form of text, video, images etc... is a useful resource for discovering traveller psychology and behaviour. Business models like TAM were developed as a result of BDSNA. Hotel owners and travel agencies use BDSNA approaches in addressing their customers' requirements. RTA and recommendations are heavily used in Travel 2.0.

IoT has paved the way for living things like trees and non-living objects such as washing machines to share their status over SNs. As a result of smart devices being part of SNs, the amount of unstructured and semi-structured data generated is enormous. This pushes data scientists to explore new technologies like FC and DD and to integrate them into BDA.

The third chapter focuses on core BDA technologies in SNA. Network visualization, data storage, processing and access, recommendation systems and RTA, which were discussed in the SN domains above, are illustrated both technically and theoretically.


Chapter 3

BDSNA Tools and Technologies

In this Web 2.0 era, data generation is exploding exponentially, and data scientists and IT professionals are highly ambitious in turning BI into an asset in their business domain. This chapter illustrates the key concerns, core technologies and tools in BDSNA.

3.1 Major Concerns in BDSNA

This section summarizes, in a nutshell, the issues identified in the previous chapter.

• Security and Privacy: Most UGC on SNs reflects moments of people’s personal lives. All scenarios considered in the last chapter highlight security and privacy as a major concern[11][2].

• Explosive growth rate: The growth of the Internet and IoT will generate more UGC. Infrastructures should be able to capture, store, process and analyse new sources of semi-structured and unstructured data from all SNs. FB uses Apache Hadoop1 and Apache Hive2 for storage because their hardware scalability is high, and Scribe3 for log collection[45].

1 http://hadoop.apache.org/
2 https://hive.apache.org/
3 https://github.com/facebookarchive/scribe


• Extract valid UGC by removing noise: TM, NLP and other DM techniques need to be optimized to determine the validity of data[2].

• Real time analytics: The need for Stream Processing (SP) is growing rapidly. User data gathered over a period goes through a Batch Processing (BP) mechanism to develop models, which are then used to check and analyse incoming events in real time.

• Sophisticated analytics tools and SW: Low latency and more visualization are expected from BDSNA tools and SW. The Lincoln Laboratory is currently engaged in research projects to develop sophisticated algorithms and software tools that generate networks from unstructured/semi-structured data[10].

A wide range of tools and SW is available in the marketplace to represent different user groups in SNs. When selecting the right tool, factors such as intended goal, ease of use, operating platform and cost effectiveness need to be taken into consideration. Of all these, “visualization” capability is vital. Strengths of network ties, user group structures and dynamics can be viewed using these tools[29].

Tool / SW     Description
Gephi4        Platform independent SW distributed under an open source licence. A good tool for visualizing networks and their relationships.
NetLogo5      Free, platform independent software. Helps in visualizing the dynamics of network formation. Network behaviour can be studied using this tool.
iGraph6       Free SW that can be used to perform heavy calculations.
Pajek7        Another free SW, which runs only on the Windows platform. Supports network formation, dynamics, information diffusion and many other inbuilt features.
UCINet8       Commercial SW that supports only the Windows platform.
NodeXL9       Fairly new to the market. Integrates SNA with Excel. Free SW, for the moment only available for the Windows platform.
NetworkX10    A good tool from a programming perspective. Developed using C and Fortran libraries. Optimized to scale to large matrices.

4 http://gephi.github.io
5 https://ccl.northwestern.edu/netlogo/
6 http://igraph.org/
7 http://pajek.imfm.si/doku.php
8 https://sites.google.com/site/ucinetsoftware/home
9 http://nodexl.codeplex.com/

Neuro Productions 5K Twitter browser and Neoformix Twitter Stream Graph are advanced visualization tools that can be used to analyse UGC from Twitter[24].

3.2 Real Time Analysis

FB, Twitter, LinkedIn, Google+, TripAdvisor and all leading SNs provide real time visibility into what their users prefer. The Intel BD Research Center forecasted that the use cases for Real Time Big Data Analytics (RTBDA) in SNA will grow faster than those for BP, yet BP will still act as the core of RTBDA. Real time analytics is based on SP. Operational Intelligence (OI)[39] and Lambda Architecture (LA) are the core BDA technologies that SNs mainly use for RTBDA.

RTA explanation

RTBDA is an advanced technique for making better decisions and taking meaningful actions at precisely the right time. There are two important aspects of RTA. First, real time actions are treated as “streams of events”. To determine the action to perform when an event arrives, the system needs to capture, process and analyse the parameters and attributes of the incoming event stream, and determine the corresponding stream category or group with regard to the application domain. The categorized stream is then matched with an action determined by a pre-defined model, so it is important to develop this “model” in the first phase of RTA. Second, RTA engines are stateless: they do not need to keep previous incoming streams in order to determine the action for the current stream[25].
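The event flow described above can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems: the keyword sets, the category names and the `ACTIONS` table are hypothetical stand-ins for a model that would normally be built in the first phase of RTA. The sketch only shows that `handle` looks at the current event alone, which is what makes the engine stateless.

```python
from typing import Callable, Dict

# Hypothetical pre-defined model: maps an event category to an action.
ACTIONS: Dict[str, Callable[[dict], str]] = {
    "complaint": lambda e: f"alert support team about user {e['user']}",
    "praise": lambda e: f"thank user {e['user']}",
    "other": lambda e: "ignore",
}

# Assumed keyword lists; a real model would be learned from batch data.
NEGATIVE = {"bad", "delay", "dirty", "rude"}
POSITIVE = {"great", "clean", "friendly"}


def categorize(event: dict) -> str:
    """Assign the incoming event to a category using only its own attributes."""
    words = set(event["text"].lower().split())
    if words & NEGATIVE:
        return "complaint"
    if words & POSITIVE:
        return "praise"
    return "other"


def handle(event: dict) -> str:
    """Stateless: the result depends only on the current event and the model."""
    return ACTIONS[categorize(event)](event)


stream = [
    {"user": "u1", "text": "Flight delay again"},
    {"user": "u2", "text": "Great hotel and friendly staff"},
    {"user": "u3", "text": "Landed in Colombo"},
]
for event in stream:
    print(handle(event))
```

Because no state is carried between events, such a handler could in principle be replicated across many workers, which is one reason stream processing engines favour this style.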

10 http://networkx.lanl.gov/


Figure 3.1: Expected growth in real time analytics by 2015

Figure 3.2: Capabilities of Operational Intelligence

FB, Twitter and other SNs use data records collected over a long period of time. The model is developed by considering the nature of the application domain (i.e. tourism, healthcare etc...), not the individual records that reside in data repositories. OI and LA are the core technical approaches to designing and developing RTA engines.


3.3 Lambda Architecture

LA, developed by Nathan Marz, achieves real time processing by decomposing the problem into three layers: batch layer, serving layer and speed layer. Everything starts from the equation query = function(all data)[5]. It is computationally too expensive to evaluate this function for every event on the fly. Instead, a precomputed batch view is used to look up the result of the query. The precomputed view is indexed so that it can be accessed quickly with a few random reads.

Figure 3.3: Overview of Lambda Architecture

3.3.1 Batch layer

The batch layer acts as the master: it stores the master data set (HDFS) and computes arbitrary batch views on it (MapReduce)[25]. The master data set can contain either historical data only or historical data together with current data (depending on the business domain and key stakeholder interest). Apache Hadoop is used to process the master data set and develop the required model.

Simplest pseudo code for the batch layer[25]:

function runBatchLayer():
    while (true):    // repeatedly recompute batch views from the beginning
        recomputeBatchViews()

3.3.2 Serving layer

Real time querying is supported by the serving layer. It loads and indexes the batch views so that queries arriving in real time can be answered with low latency and the corresponding action can be triggered. Apache Drill11 and Cloudera Impala12 are query engines that are used to implement serving layer functions[25].

3.3.3 Speed Layer

BP has substantial latency, and its impact is compensated for by distributed SP in the speed layer, which processes the data that has arrived since the last batch run. Apache Storm13 and Apache S414 are used to implement this layer[25].
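The interplay of the three layers can be sketched in Python, assuming a toy query that counts hashtags. This is an illustrative sketch only, not how Hadoop, Storm or any SN implements LA; `recompute_batch_view`, `SpeedLayer` and `query` are hypothetical names introduced here.

```python
from collections import Counter
from typing import Iterable, List


def recompute_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from scratch over the whole master data set."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, hashtag: str) -> None:
        self.realtime_view[hashtag] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed the recent events.
        self.realtime_view.clear()


def query(batch_view: Counter, realtime_view: Counter, hashtag: str) -> int:
    """Serving side: merge the precomputed batch view with the real-time view."""
    return batch_view[hashtag] + realtime_view[hashtag]


master: List[str] = ["#travel", "#hotel", "#travel"]
batch_view = recompute_batch_view(master)

speed = SpeedLayer()
for tag in ["#travel", "#flight"]:
    speed.on_event(tag)   # visible to queries immediately
    master.append(tag)    # also appended to the master data set for the next batch run

print(query(batch_view, speed.realtime_view, "#travel"))  # prints 3 (2 batch + 1 real time)
```

When the next batch run finishes, the new batch view already contains the recent events, so the real-time view can be reset without changing query results.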

3.4 Recommendation systems

FB, Twitter, LinkedIn, Google+ and all leading SNs use recommendation systems. These systems apply knowledge discovery techniques to the problem of making personalized recommendations during a live interaction[21].

ex: Consider a scenario where you add a friend on FB and FB automatically suggests similar people to add (a generalized recommendation system).

The recommendation engine analyses the people who added the same person that you added, people(1), and then determines the other people, people(2), who were added by people(1). The system presents people(2) as recommended people to add, expanding your network.

11 https://github.com/apache/drill
12 http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
13 https://storm.apache.org/
14 http://incubator.apache.org/s4/


SN recommendations are determined by the number of Likes, clicks, user ratings and emoticons. The algorithms fall mainly into two categories: content-based algorithms and collaborative filtering algorithms. Content-based algorithms check the similarity of the target (recommended) item to items the user has already shown interest in. Collaborative filtering uses previous, similar recommendations based on the clicks, ratings etc... of other users. Additionally, a time window technique is adopted to give recommendations for particular time durations (Google+ and Twitter trends etc...).
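As a rough illustration of collaborative filtering, the following Python sketch scores unseen items by the similarity between users' sets of Likes. It assumes Jaccard similarity and a user-based scheme, which are common textbook choices rather than anything the surveyed SNs are documented to use; the `likes` data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two like-sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_items(likes: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """User-based collaborative filtering over Likes."""
    scores: Dict[str, float] = defaultdict(float)
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        # Items the similar user liked but the target user has not seen yet
        for item in items - likes[user]:
            scores[item] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:top_n]


likes = {
    "ann": {"beach", "museum"},
    "ben": {"beach", "museum", "hiking"},
    "cat": {"beach", "nightlife"},
    "dan": {"skiing"},
}
print(recommend_items(likes, "ann"))  # prints ['hiking', 'nightlife']
```

A content-based variant would instead compare item attributes (for example destination tags) with the items the user already liked, and a time window could be added by only counting Likes from a recent period.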

3.5 Web 2.0 IoT Architecture

Distributed Wireless Sensor Networks (WSNs) for sharing data, the Robot Operating System (ROS) as a middleware platform and Radio Frequency Identification (RFID) as an identification technology provide the core architectural infrastructure for CO to recognize activities and, at the same time, incorporate knowledge into smart objects. The Pachube platform15 provides the fundamental API groundings for developers to develop SIoT.

Figure 3.4: Architecture for SIoT Client Side and Server Side

It is important to understand SIoT network characteristics and relationships when designing and developing smart environments. Four main types of relationships exist in SIoT networks[3]:

• parental object relationship: This is a family-like structure based on the belief that CO share similar characteristics with devices developed during the same time period (the argument here is that technology changes rapidly).

15 http://datahub.io/dataset/pachube


• co-location object relationship: Object relationships need to be established during the design and development of a smart environment, based on location information.

• ownership object relationship: One person can be the owner of several CO. This ownership information is vital when interacting with SNs of CO.

• social object relationship: Devices with similar characteristics can share best practices to solve issues. The “Cloud-of-cloud” concept is a broader view that shares the same idea. This idea can also be related to edge computing with IoT devices.
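The four relationship types listed above can be modelled with a small Python data structure. This is only a sketch of how they might be derived from device attributes: the `CognitiveObject` fields and the matching rules (same build year, same location, same owner, same kind of device) are simplifying assumptions made here, not definitions from the SIoT literature.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Relationship(Enum):
    PARENTAL = "parental"
    CO_LOCATION = "co-location"
    OWNERSHIP = "ownership"
    SOCIAL = "social"


@dataclass(frozen=True)
class CognitiveObject:
    # All fields are hypothetical attributes chosen for illustration.
    name: str
    kind: str
    build_year: int
    location: str
    owner: str


def relationships(a: CognitiveObject, b: CognitiveObject) -> Set[Relationship]:
    """Return the relationship types that hold between two CO."""
    found: Set[Relationship] = set()
    if a.build_year == b.build_year:   # developed during the same time period
        found.add(Relationship.PARENTAL)
    if a.location == b.location:       # placed in the same smart environment
        found.add(Relationship.CO_LOCATION)
    if a.owner == b.owner:             # belong to the same person
        found.add(Relationship.OWNERSHIP)
    if a.kind == b.kind:               # similar characteristics
        found.add(Relationship.SOCIAL)
    return found


washer = CognitiveObject("washer", "washing machine", 2014, "home", "alice")
oven = CognitiveObject("oven", "oven", 2014, "home", "alice")
tree = CognitiveObject("tree", "plant sensor", 2010, "garden", "bob")

print(sorted(r.value for r in relationships(washer, oven)))
print(sorted(r.value for r in relationships(washer, tree)))
```

Treating relationships as a set per pair of objects keeps the types independent, so a pair of devices can be co-located and co-owned at the same time.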

3.6 Chapter summary

Security and privacy need to be given a great deal of attention when designing and developing BDA SW and tools as well as when developing algorithms for SN domains. In SNA, “visualization” is an important aspect to consider when designing SW that can analyse different user groups, and the Gephi tool outperforms other network visualization tools.

Lambda Architecture, which uses BP and SP, is utilized for RTA in BDSNA. OI is used to develop models that can be used in RTA. Recommendation systems differ from domain to domain and use a number of user actions such as clicks, likes and ratings in designing algorithms.

The next chapter is the final chapter. It summarizes the literature survey and gives future directions for BDSNA domains, taking the current status to a new level.


Chapter 4

Conclusion, Challenges and Future Directions

This chapter summarises the overall survey and provides insight into future directions for SNs in applying the gathered knowledge in practice.

4.1 Conclusion

Today even tech-averse and less tech-savvy people have an understanding of SNs (like FB), but they are hardly aware of what search engines can do. Children, youngsters, adults and even old people are making their presence felt in SNs. People are eager to share their personal life stories, and on the other hand people like to peep into other people’s affairs. An interesting fact is that not only humans but also other living and non-living objects are becoming users of SNs. The highly dynamic UGC on SNs reflects user perspectives and feedback. UGC is not restricted to a particular domain; it spreads across a vast variety of fields, and BDSNA helps in addressing a wider range of stakeholder groups with more accurate BI.

This survey focused on four major domains in SNs (Healthcare, Defence and Security, Travel and Tourism, Web 2.0 with IoT). To derive useful knowledge and recognize hidden patterns from user activities on SNs, it is important to differentiate


what is exciting and interesting among all activities. BDSNA is the solution. BDSNA has taken these sectors to a new dimension, making them worthwhile for all interested parties. Qualitative and quantitative results have been obtained through BDSNA to give a better service to users of SNs. Business strategies and models are being created to satisfy the demands of users. Predictive models, recommendation systems and real time analytics play a major role in today’s BDSNA.

Modern day BDSNA has been identified as one of the best approaches for many business domains. BDSNA has become an essential part of developing highly sophisticated intelligence tools and SW.

4.2 Challenges and Future Directions

SNs like Facebook are considering cloud storage as a solution to accommodate their growing data storage needs. As shown in Figure 4.1, the biggest challenge in adopting cloud storage, identified by all organizations, is security and privacy violations. Even though a private cloud can provide additional security mechanisms, cyber attackers are smart enough to identify loopholes and thereby spoil data on a cloud. It is evident that as of yet there is no 100% guarantee that cloud technology can be used as a trusted service.

Figure 4.1: Hosting data on cloud and challenges

Most UGC on SNs is irrelevant to the domain under consideration. Incomplete text information, multilingual content and bogus user feedback are difficult to cater


to in doing genuine analytics. Deriving algorithms and strategies based on the user group of a particular geography is not sufficient. Data scientists need to give more attention to these factors when doing SNA. Also, TM and NLP currently work best on micro blogging content (Tweets are limited to a maximum of 140 characters). These techniques need to improve to a level where they can analyse much more text content. A mechanism similar to YouTube real time translation would be quite beneficial in the SN context to spread awareness to a wide range of users.

SNs have a huge impact on human behaviour and intentions, and they have challenged the conventional behavioural patterns of humans over recent years. FB can be used to find a friend or relation, and LinkedIn is a place to find professionals. It is apparent that SNs play the role of a “search engine”. Integrating proper indexing methodologies would enhance the search function of SNs and give their users more accurate results. Further, companies advertise their products and services on SNs. In the near future, users will find it more compelling and attractive to use SNs for their online shopping experiences. This highlights a big business opportunity for SNs like FB, but on the other hand, the possibility that users stay away from SNs may arise. The need for shopping pattern analytics in SNs will also arise in the future. Just as we now have different types of SNs for different purposes (FB and LinkedIn), there will be more categories of SNs in the future. IoT will be a driving factor in diversifying SNs.


Bibliography

[1] Rajendra Akerkar. Big Data & Tourism: To promote innovation and increase. 2012.

[2] Lawrence Ampofo. 5 ways the internet of things will change social media, October 2014. URL http://www.business2community.com/social-media/5-ways-internet-things-will-change-social-media-01047822. Accessed November 2014.

[3] Luigi Atzori, Antonio Iera, and Giacomo Morabito. SIoT: Giving a Social Structure to the Internet of Things. 15(11):1193–1195, 2011.

[4] Ala Berzinji. Detecting Key Players in Terrorist Networks. 2011.

[5] Nathan Bijnens. A real-time Lambda Architecture using Hadoop & Storm. NoSQL Matters Cologne 2014. 2014.

[6] Jaap Bloem, Sander Duivestein, and Thomas Van Manen. Big Social: Predicting behavior with Big Data.

[7] BloombergTV. Can wearables and big data cure disease?, August 2014. URL http://www.bloomberg.com/video/parkinson-s-disease-new-ways-to-study-illness-VSlDSZNASRa2zh1WZ sXhg.html. Accessed November 2014.

[8] David Bollier and Charles M Firestone. The Promise and Peril of Big Data. 2010. ISBN 0898435161.

[9] Jeff Cain. Social media in health care: the case for organizational policy and employee education. American journal of health-system pharmacy: AJHP: official journal of the American Society of Health-System Pharmacists, 68(11):1036–40, June 2011. ISSN 1535-2900. doi: 10.2146/ajhp100589. URL http://www.ncbi.nlm.nih.gov/pubmed/21593233.

[10] William M Campbell, Charlie K Dagli, and Clifford J Weinstein. with Content and Graphs. 20(1), 2013.

[11] Neil Couch and Bill Robins. BIG DATA FOR DEFENCE AND SECURITY.

[12] Paul M. Davis. A tree that tweets, September 2010. URL http://www.shareable.net/blog/a-tree-that-tweets. Accessed October 2014.

[13] Yoree Koh and Don Clark. IBM and Twitter forge partnership on data analytics, 2014. URL http://online.wsj.com/articles/ibm-and-twitter-forge-partnership-on-data-analytics-1414601963. Accessed October 2014.

[14] Mark Drapeau and Linton Wells II. Social Software and National Security: An Initial Net Assessment. (April), 2009.

[15] Rob Faludi. New york times on botanicalls, again!, April 2013. URL http://www.botanicalls.com/. Accessed October 2014.

[16] Roberta Floris and Michele Campagna. Social Media Data in Tourism Planning: Analysing Tourists’ Satisfaction in Space and Time. 8(May):997–1003, 2014.

[17] California Healthcare Foundation. The Wisdom of Patients: Health Care Meets Online Social Media. (April), 2008.

[18] Peter Groves and David Knott. The ‘big data’ revolution in healthcare. (January), 2013.

[19] Dominique Guinard, Vlad Trifa, Friedemann Mattern, and Erik Wilde. From the internet of things to the web of things: Resource-oriented architecture and best practices. In Architecting the Internet of Things, pages 97–129. Springer, 2011.

[20] Carissa Hilliard. Social media for healthcare: A content analysis of md anderson’s facebook presence and its contribution to cancer support systems. of Undergraduate Research in Communications, page 23.


[21] Jianming and Wesley W Chu. A Social Network-Based Recommender System (SNRS).

[22] Maged N Kamel Boulos, Antonio P Sanfilippo, Courtney D Corley, and Steve Wheeler. Social Web mining and exploitation for serious applications: Technosocial Predictive Analytics and related technologies for public health, environmental and national security surveillance. Computer methods and programs in biomedicine, 100(1):16–23, October 2010. ISSN 1872-7565. doi: 10.1016/j.cmpb.2010.02.007. URL http://www.ncbi.nlm.nih.gov/pubmed/20236725.

[23] Deborah Kotz. Using twitter as tool to track side effects from drugs, April 2014. URL http://www.bostonglobe.com/lifestyle/health-wellness/2014/04/30/using-twitter-tool-track-side-effects-from-drugs/92zR3e6NBDGiiWl3n4feLK/story.html. Accessed November 2014.

[24] Matthias Kranz, Luis Roalter, and Florian Michahelles. Things That Twitter: Social Networks and the Internet of Things.

[25] Nathan Marz and James Warren. Big Data: principles and practices of scalable real time systems.

[26] Roberta Milano. The effects of online social media on tourism websites. 2011.

[27] Mark Million. Washing machine twitters when clothes are done, January 2009. URL http://latimesblogs.latimes.com/technology/2009/01/twitter-washing.html. Accessed November 2014.

[28] Alan Mislove, Hema Swetha Koppula, Krishna P Gummadi, Peter Druschel, and Bobby Bhattacharjee. Growth of the flickr social network. In Proceedings of the first workshop on Online social networks, pages 25–30. ACM, 2008.

[29] Chamin Nalinda. Social network analysis tools and softwares, October 2014. URL http://techspiro.blogspot.com/2014/10/social-network-analysis-tools-softwares.html. Accessed October 2014.

[30] Mary K Obenshain. Application of Data Mining Techniques to Healthcare Data. (August):690–695, 2004.


[31] Onook Oh, Manish Agrawal, and H Raghav Rao. Information control and terrorism: Tracking the mumbai terrorist attack through twitter. Information Systems Frontiers, 13(1):33–43, 2011.

[32] Chintan Patel. Now you can talk to twitter and find clinical trials on trialx, December 2012. URL http://trialx.com/enablers/2009/03/now-you-can-talk-to-twitter-and-find-clinical-trials-on-trialx/. Accessed November 2014.

[33] Loredana Di Pietro, Francesca Di Virgilio, and Eleonora Pantano. Social network for the choice of tourist destination: attitude and behavioural intention. Journal of Hospitality and Tourism Technology, 3(1):60–76, 2012. ISSN 1757-9880. doi: 10.1108/17579881211206543. URL http://www.emeraldinsight.com/10.1108/17579881211206543.

[34] Angelo Presenza and Maria Cipollina. Analysis of links and features of tourism destination’s stakeholders. an empirical investigation of a south italian region. 2009.

[35] Pslulfdo and Ehwzhhq. An Empirical Study on the Relationship between Twitter Sentiment and influence in Tourism Domain. 2012.

[36] Laura McCarthy, Debra Stock, and Rohit Verma. How Travelers Use Online and Social Media Channels to Make Hotel-choice Decisions. Cornell Hospitality Report, 10(18), 2010.

[37] Steve Ressler. Social network analysis as an approach to combat terrorism: past, present, and future research. Homeland Security Affairs, 2006. URL http://www.hsaj.org/?download&mode=dl&h&w&drm=resources%2Fvolume2%2Fissue2%2Fpdfs%2F&f=2.2.8.pdf&altf=2.2.8.pdf.

[38] Lewis Robinson. A tweet from your toaster: How the internet of things will affect social media, May 2014. URL http://www.socialmediatoday.com/content/tweet-your-toaster-how-internet-things-will-affect-social-media. Accessed November 2014.

[39] Philip Russom. TDWI Checklist Report: Operational Intelligence: Real-Time Business Analytics from Big Data.

[40] Philip Russom. TDWI RESEARCH BIG DATA. 2011.


[41] Series, Chris Cooper, C Michael Hall, New Zealand, Noel Scott, and Rodolfo Baggio. Network Analysis and Tourism: From Theory to Practice.

[42] Amit Sheth, Boanerges Aleman-meza, I Budak Arpinar, Chris Halaschek, and Cartic Ramakrishnan. Semantic Association Identification and Knowledge Discovery for National Security Applications. 16(March):1–16, 2005.

[43] Sung-bum and Dae-young Kim. TRAVEL INFORMATION SEARCH BEHAVIOR AND SOCIAL NETWORKING.

[44] Sarawut Supattranuwong and Sukree Sinthupinyo. Applying Data Mining to Analyze Travel Pattern in Searching Travel Destination Choices. pages 38–44, 2013.

[45] Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba Borthakur, Namit Jain, Joydeep Sen Sarma, Raghotham Murthy, and Hao Liu. Data warehousing and analytics infrastructure at facebook. Proceedings of the 2010 international conference on Management of data - SIGMOD ’10, page 1013, 2010. doi: 10.1145/1807167.1807278. URL http://portal.acm.org/citation.cfm?doid=1807167.1807278.

[46] Tim O’Reilly. What Is Web 2.0. URL http://oreilly.com/web2/archive/what-is-web-20.html.

[47] Alex Woodie. How big data analytics can help fight isis, October 2014. URL http://www.datanami.com/2014/10/14/big-data-analytics-can-help-fight-isis/. Accessed November 2014.
