whitepaper: extract value from facebook data - happiest minds

Extract Value from Facebook Data

Contents

Abstract
Introduction
Building Blocks
• Configuration files
Parameterized Map/Reduce Program
• Parameters
• Extraction Process
Conclusion
About the Author

© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved

Abstract

In present times, any marketing or customer strategy is incomplete without a social media presence. With customers depending ever more on social media channels to access and disseminate information and reviews, it becomes all the more important for organizations to tap those channels for actionable insights. For analytic engines that churn out the insights required for quick and intelligent decisions, social media is a key channel that needs to be explored on a consistent basis.

Organizations are increasingly looking towards accelerators and frameworks that enable them to get the required intelligence from social media channels. Having the right accelerator enables an organization to make intelligent decisions regarding its customers’ behaviour.

Extraction Process and Flow

The process and cornerstones of the accelerator are based on the understanding that Facebook exposes its data in the form of a structured Facebook schema, which can be accessed via Graph APIs.

Introduction

Modern organisations lay a lot of emphasis on offering customized services to their customers. In such a situation, information about the customer’s social profile and behaviour plays a crucial role. Most organisations have an analytic pattern that is customer centric, descriptive, predictive as well as prescriptive. Organisations have been making large investments to get the required view from their customer data and expect a quick return on those investments.

When the need of the hour is a deliverable system that is astute as well as fast and reliable, organisations need to look at quick plug and play accelerators that allow them to access the required information quickly, in real time. The main benefit for an organisation is that it gains the time it needs to concentrate on analytic problem statements, which give more importance to data. The accelerator should not only be quick but also effective and enterprising, able to adapt to changing conditions, and able to make the best use of the available resources.

Facebook Accelerator

With the amount of time the current generation spends on social media, it is natural that most enterprises are now trying to keep in touch with their customers through social channels. It is no surprise that the top social media channels like Facebook, LinkedIn and Twitter serve as sources of data in current times.


• Hosted on an open source big data framework
• Leverages the power of disruptive technology and ensures that data is available in near real time
• Since the accelerator is powered by a metadata file, it allows changes to be made, including version upgrades of the Facebook schema, without altering the code.

Facebook has a mechanism called Facebook Query Language (FQL) that allows data to be queried from the entire Facebook schema. The complete schema can be found at the URL [https://developers.facebook.com/docs/reference/fql/]. A project named “RestFB”, which covers a subset of the FQL schema, provides third-party classes for the accelerator.
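FQL statements follow a SQL-like SELECT ... FROM ... WHERE form. A minimal sketch of how a query for one table might be assembled from a table name and a column list is shown here. The helper name `build_fql_query` and the page id value are hypothetical; the `stream` table and its columns are taken from the FQL table schema listed in this paper.

```python
def build_fql_query(table, columns, where):
    """Return an FQL SELECT statement for a single table.

    Illustrative helper only; it is not part of RestFB or of the accelerator.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"


# Hypothetical page id used purely for illustration.
query = build_fql_query(
    "stream",
    ["post_id", "message", "created_time"],
    "source_id = 123456789",
)
print(query)
# prints: SELECT post_id, message, created_time FROM stream WHERE source_id = 123456789
```

Keeping the column list as data rather than hard-coding it in the query string is what would let a metadata file drive the extraction without code changes.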

Building Blocks

Configuration files


• Mandatory configuration file: The tables and columns in this file are required before other tables can gather data. These tables are independent, whereas the tables in the optional configuration file are dependent.
• Optional configuration file: The tables in this file, and their corresponding columns, depend on the tables in the mandatory configuration file.

FQL Table Schema

• EVENT: eid, name, nid, pic, host, description, event_type, event_subtype, start_time, end_time, creator, update_time, location, venue
• STREAM: post_id, app_id, source_id, updated_time, created_time, actor_id, target_id, message, action_links, attachment, comments, likes, privacy
• PAGE: page_id, pic, page_url, type, company_overview, location, bio, fan_count
• LIKE: object_id, object_id_cursor, object_type, post_id, Likecol, user_id
• USER: uid, first_name, last_name, name, pic, birthday, sex, relationship_status, current_location, interests, about_me, profile_url, family
• COMMENT: xid, post_id, fromid, time, text, id, username, reply_xid
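A configuration file of this kind could be loaded into an insertion-ordered mapping so that tables are processed in the order in which they are listed. The sketch below is only an illustration under stated assumptions: the `table: columns` file format, the function name `parse_config`, and the split of tables between the mandatory and optional files are all hypothetical, since the paper does not publish the actual file layout.

```python
# Hypothetical file contents; the real configuration format is not given in the paper.
MANDATORY_CONFIG = """\
# table: comma-separated columns
stream: post_id, source_id, created_time, message
event: eid, name, start_time, end_time
"""

OPTIONAL_CONFIG = """\
comment: post_id, fromid, time, text
like: post_id, user_id
"""


def parse_config(text):
    """Parse 'table: col, col' lines into a dict that keeps file order."""
    tables = {}  # Python dicts preserve insertion order (3.7+)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        table, _, columns = line.partition(":")
        tables[table.strip()] = [c.strip() for c in columns.split(",") if c.strip()]
    return tables


mandatory = parse_config(MANDATORY_CONFIG)
optional = parse_config(OPTIONAL_CONFIG)
print(list(mandatory))  # table order follows the file
print(list(optional))
```

Because the table and column names live in the files, adding a column or a table would mean editing the configuration rather than the program.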


Parameterized Map/Reduce Program

This program is devised to distil data from Facebook and load it into HDFS.

Parameters:
• Page id
• App id + secure key
• Configuration files
• Desired database name in Hive
• Number of machines (the number of reducers to be launched by Hadoop) can also be specified

Metadata files make use of linked hash maps to ensure that they retain the order of the existing tables. Optional configuration files are made available to all reducers through the distributed cache.
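The parameters listed above could be collected by a small launcher script before the job is submitted. The following is a minimal sketch, assuming a command-line interface; every flag name, default and sample value is hypothetical and is not taken from the accelerator itself.

```python
import argparse


def positive_int(value):
    """Accept only integers >= 1 for the reducer count."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be at least 1")
    return number


def build_parser():
    """Build a parser for the job parameters described in the paper."""
    parser = argparse.ArgumentParser(
        description="Launch a Facebook extraction job (illustrative sketch)."
    )
    parser.add_argument("--page-id", required=True, help="Facebook page id or name")
    parser.add_argument("--app-id", required=True, help="Facebook app id")
    parser.add_argument("--app-secret", required=True, help="secure key for the app")
    parser.add_argument("--mandatory-config", required=True, help="path to mandatory configuration file")
    parser.add_argument("--optional-config", required=True, help="path to optional configuration file")
    parser.add_argument("--hive-database", required=True, help="target Hive database name")
    parser.add_argument("--reducers", type=positive_int, default=1, help="number of reducers to launch")
    return parser


# Sample invocation with made-up values.
args = build_parser().parse_args([
    "--page-id", "examplepage",
    "--app-id", "1234",
    "--app-secret", "not-a-real-secret",
    "--mandatory-config", "mandatory.conf",
    "--optional-config", "optional.conf",
    "--hive-database", "facebook_data",
    "--reducers", "4",
])
print(args.page_id, args.hive_database, args.reducers)
```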

Extraction process involves the following steps:
• Configuration files are updated, if any changes are required.
• The job is launched with the correct Facebook page id as an argument.

Inside the mapper:
• The mandatory configuration file is processed, and data is collated from the stream and event tables.
• Output files for the stream and event tables are written to the HDFS folder.
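One way to give every reducer an even share of work is for the mapper to tag each collected post ID with a reducer number, so that the reducer number acts as the key and the post IDs as the values. The sketch below illustrates that idea in plain Python rather than Hadoop code. The round-robin assignment, the function names and the sample IDs are assumptions made for illustration; the paper only states that the load is divided as total post IDs over the number of reducers.

```python
from collections import defaultdict


def key_post_ids(post_ids, num_reducers):
    """Yield (reducer_number, post_id) pairs in round-robin order."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    for index, post_id in enumerate(post_ids):
        yield index % num_reducers, post_id


def group_by_reducer(pairs):
    """Group values by key, mimicking what the shuffle hands to each reducer."""
    groups = defaultdict(list)
    for reducer_number, post_id in pairs:
        groups[reducer_number].append(post_id)
    return dict(groups)


post_ids = [f"post_{n}" for n in range(10)]  # hypothetical post IDs
shares = group_by_reducer(key_post_ids(post_ids, num_reducers=3))
for reducer_number, ids in sorted(shares.items()):
    print(reducer_number, len(ids), ids)
```

With 10 post IDs and 3 reducers, the share sizes differ by at most one, which is as even as the division can be when the total is not a multiple of the reducer count.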

Inside the reducer:
• The reducers hold the post IDs from the stream and event tables. The number of IDs processed by each reducer is calculated as the total number of post IDs divided by the number of reducers fired up. In this way the reducers have an even distribution of load. The key is the reducer number, while the values are the post IDs.
• Each reducer writes its own files, which correspond to the tables in the optional configuration file. The number of reducers can be provided as a parameter while submitting the job.

Post Map-Reduce phase:
• A Hive script creates the database and tables according to the names specified.
• Data from HDFS is copied into the appropriate tables in the database created in the previous step.
• The data is now available in a tabular format, and teams requiring it can connect to the Hive database and work on it.

At the end of the job, the program will have collated data that gives information on the post, likes on the post, comments made on the post, number of likes, users who have engaged with the post and basic user information. The data is pushed into the appropriate tables of the Hive database specified by the user. The table names and columns are in accordance with those specified in the configuration files.

[Figure: Data Access from Analytics programs. Stages: Load (Configuration File + ID and Access Token; FB name or ID of the brand page), Fetch Data (script calls a Map Reduce job to fetch data in parallel), Access. Stores shown: HDFS, Hive.]

Conclusion

Using a plug and play accelerator, teams get access to almost all the data in near real time, which lets them do the actual work of analytics rather than data collection and data cleansing. This helps organizations avoid the excess time required for mundane activities and focus on the more relevant analytics that drive customer insights and revenue growth.

About the Author

Bhawna Manchanda, Big Data Architect
Bhawna Manchanda is a Big Data Architect. She plays a key role in conceptualizing and implementing Big Data solutions, frameworks and strategies at Happiest Minds. She has also worked extensively with leading banks in the BIDW space.

Sunny Malik, Big Data Technologies and Algorithm Development
Sunny Malik has a Master’s Degree in Computer Science from the University of Southern California (USC). He has worked extensively on application development using open-source technologies and is currently focused on Big Data technologies and algorithm development.

Skanda Bhargav, Hadoop developer
Skanda Bhargav is a Cloudera Certified Hadoop developer. He is a Computer Science graduate from Visvesvaraya Technological University, Belgaum, popularly known as VTU. He has contributed to 3 books on Big Data, which were published by Packt (http://www.packtpub.com/). His interests are Hadoop, Hive, Map Reduce and Sqoop.


Happiest Minds

Happiest Minds is focused on helping customers build Smart, Secure and Connected experiences by leveraging disruptive technologies like mobility, analytics, security, cloud computing, social computing and unified communications. Enterprises are embracing these technologies to implement omni-channel strategies, manage structured and unstructured data and make real-time decisions based on actionable insights, while ensuring security for data and infrastructure. Happiest Minds also offers a high degree of skills, IPs and domain expertise across a set of focused areas that include IT Services, Product Engineering Services, Infrastructure Management, Security, Testing and Consulting.

Headquartered in Bangalore, India, Happiest Minds has operations in the US, UK, Singapore and Australia. It secured a $45 million Series-A funding led by Canaan Partners, Intel Capital and Ashok Soota.

© 2014 Happiest Minds. All Rights Reserved.
E-mail: [email protected]
Visit us: www.happiestminds.com

Follow us on:
• https://www.facebook.com/happiestminds
• https://twitter.com/happiestminds
• https://www.linkedin.com/company/happiest-minds-technologies
• https://www.youtube.com/user/HappiestMinds
• https://plus.google.com/u/0/100708608550362684247/posts

This document is an exclusive property of Happiest Minds Technologies Pvt. Ltd.
