Revenue & Employment Analysis of International Students in USA Using PyHive

Post on 16-Apr-2017


Category:

Data & Analytics


Revenue & Employment Analysis of International Students in USA

Team Members: Priyanka Kale, Apekshit Bhingardive, Aditya Verma

Guide: Dr. Jongwook Woo

24th Annual Student Symposium, CSULA

26th February 2016

What is Big Data?

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.

It's not the amount of data that's important. It's what we do with the data that matters.

Machine Learning: big data often doesn't ask why and simply detects patterns.

Digital footprint: big data is often a cost-free byproduct of digital interaction.

Purpose of Analysis

To develop a system that helps us determine the revenue generated by international students.

To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.

Continued..

To understand the effects of increased international student enrollment on net revenue generation in the US

Find out the income from universities

Predict the impact of international students on revenue generation

Predict employment opportunities in the US

• Basic formula for calculating economic Benefit

Analysis is done using:

Large data sets are stored and analyzed on the Hadoop Distributed File System (HDFS)

Hadoop environment using the Hortonworks Sandbox on Azure

Python and Hive [PyHive] in an IPython Notebook

HUE

Google Fusion Tables

WEKA Framework

Loading data into HDFS: the file has been uploaded using the Hadoop command line interface
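The upload step can be sketched from Python as a thin wrapper around the standard `hdfs dfs` commands. This is only an illustration: the helper names and any paths passed to them are hypothetical, and the slides do not show the exact command the team ran.

```python
import subprocess


def build_hdfs_put_command(local_path, hdfs_dir):
    """Return the argument list for `hdfs dfs -put` (-f overwrites an existing file)."""
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def upload_to_hdfs(local_path, hdfs_dir):
    """Create the target directory if needed, then copy the local file into HDFS.

    Requires the `hdfs` client on PATH, for example inside the sandbox VM.
    """
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(build_hdfs_put_command(local_path, hdfs_dir), check=True)
```

Calling `upload_to_hdfs("students.csv", "/user/hue/students")` would run both commands; the file and directory names here are placeholders, not the project's actual paths.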

Hortonworks Sandbox configuration

Number of nodes: 3; size: Basic A4 with 8 cores and 14 GB memory

Creating tables in HUE from existing data

Connecting Hive through Python: using an IPython notebook for writing the Python code

Embedding HiveQL inside Python code.

Executing the Hive script from Python code:
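A minimal sketch of what embedding HiveQL in Python with PyHive can look like is given here. The original slide showed a screenshot that did not survive extraction, so the table name `student_fees`, the columns `state` and `tuition_fees`, and the connection parameters are all assumptions made for illustration.

```python
# Hypothetical query: table and column names are assumptions, not the team's schema.
FEES_BY_STATE_HQL = """
SELECT state, SUM(tuition_fees) AS total_fees
FROM student_fees
GROUP BY state
ORDER BY total_fees DESC
"""


def open_hive_cursor(host, username, port=10000):
    """Open a HiveServer2 connection with PyHive and return a DB-API cursor.

    PyHive is imported here so the rest of the module loads without it.
    Port 10000 is the usual HiveServer2 default.
    """
    from pyhive import hive

    connection = hive.Connection(host=host, port=port, username=username)
    return connection.cursor()


def total_fees_by_state(cursor):
    """Run the embedded HiveQL and return a list of (state, total_fees) rows."""
    cursor.execute(FEES_BY_STATE_HQL)
    return cursor.fetchall()
```

In a notebook this might be used as `rows = total_fees_by_state(open_hive_cursor("sandbox-host", "hue"))`, where the host and user names are placeholders.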

Visualizing data with Graphs

[Bar chart: Total Earning from Fees, by state. X-axis: US states and territories (Alabama through Texas); y-axis: $0 to $25,000,000,000.]

Major earning states (percentage of total income)

California: 9.55%

New York: 10.84%

Pennsylvania: 7.36%
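The percentage-of-total figures can be derived from per-state totals with a small helper like the following sketch. The function name is hypothetical, and the numbers in the usage note are placeholders, not the study's data.

```python
def percentage_shares(totals):
    """Map each key to its share of the grand total, in percent (2 decimals)."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("grand total is zero; shares are undefined")
    return {
        state: round(100.0 * value / grand_total, 2)
        for state, value in totals.items()
    }
```

For example, `percentage_shares({"State A": 50, "State B": 30, "State C": 20})` gives 50.0, 30.0 and 20.0 percent respectively.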

Visualizing Data in Google Fusion Tables

Supervised Learning using Classification:

The WEKA framework was used to classify the states by their total earnings.

The UserClassifier algorithm provided by WEKA was used to generate the classification graph.

The final output of the Hive script executed in Python was processed with this algorithm.
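One plausible way to hand the Hive output to WEKA is to write it out as a CSV file with a header row, which WEKA can load. The slides do not say how the hand-off was done, so this is an assumption, and the column names are hypothetical.

```python
import csv
import io


def rows_to_weka_csv(rows, header=("state", "total_fees")):
    """Serialize (state, total_fees) rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be saved to a `.csv` file and opened in the WEKA Explorer before running the classifier.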

Continued.. The class color differentiates the states into categories. For instance, New York lies in the orange zone, being among the top revenue-generating states.

Value Proposition:

International student mobility trends: By 2017, the global middle class is projected to increase its spending on educational products and services by nearly 50 percent.

Institutions can take this growth into consideration!

The United States as a more welcoming nation!

Predictive Modelling:

Employment Analysis: How? Finding data on where international students work after graduation

Based on the number of students employed in current and past years

Number of employers hiring international students in every field of graduate study [job positions]
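The slides do not name the predictive model. As one simple possibility, yearly employment counts could be extrapolated with an ordinary least-squares line, sketched here. The function names are hypothetical and any inputs would need to come from real employment data.

```python
def linear_trend(years, counts):
    """Fit counts = slope * year + intercept by ordinary least squares."""
    n = len(years)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two (year, count) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_count(year, slope, intercept):
    """Extrapolate the fitted line to a given year."""
    return slope * year + intercept
```

A straight line is only a first approximation; with few years of data the extrapolation should be read as a rough trend, not a forecast.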

Thank You!
