Geolocation Analysis Using HiveQL
![Page 1: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/1.jpg)
Geolocation Data Analysis for Safe Residence using HiveQL
TEAM: PRIYANKA KALE, PRIYAL MISTRY, HITESH JAGTAP
GUIDE: DR. JONGWOOK WOO
24th Annual Student Symposium, CSULA
26th February 2016
![Page 2: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/2.jpg)
Table of Contents
1. Introduction
2. Big Data
3. Flowchart
4. Specifications
5. Implementation
6. Visualization
7. GitHub
8. Business Perspective
9. References
![Page 3: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/3.jpg)
Introduction
Goal: To determine whether a location is safe by analyzing a large crime dataset (1.3 GB) for the city of Chicago, IL, collected from 2001 to the present (November 2015).
This is a study of a real dataset provided by the government of the United States of America, using Big Data analytics and related tools.
Query output is visualized using different graphs and maps for better interpretation.
![Page 4: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/4.jpg)
Big Data
Volume
Complexity
Variety
Variability
![Page 5: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/5.jpg)
Flowchart
Download Dataset
Upload data into HDFS
Trigger Hive Queries
Result Tables
Output visualization
![Page 6: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/6.jpg)
Specifications
• Microsoft Azure Hortonworks sandbox:
1. Linux system
2. No. of nodes: 4
3. 8 cores
4. Size: 14 GB
![Page 7: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/7.jpg)
Implementation
Hue is a web application for browsing HDFS and for working with Hive queries, Cloudera Impala queries, and MapReduce jobs.
![Page 8: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/8.jpg)
Creation of tables in HCatalog:
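The slide shows the table being created through the HCatalog interface. A rough HiveQL equivalent might look like the sketch below. Only `iucr`, `primary_type`, `location_description`, and `address` appear in the queries of this deck; the `id` column, the column order, the HDFS path, and the table property are assumptions for illustration, not the team's actual definition.

```sql
-- Hypothetical schema: only iucr, primary_type, location_description and
-- address are taken from the deck's queries; the rest is assumed.
CREATE TABLE IF NOT EXISTS crime (
  id                   STRING,  -- assumed record identifier
  iucr                 STRING,  -- crime code counted in the queries
  primary_type         STRING,  -- crime category
  location_description STRING,  -- kind of place (street, residence, ...)
  address              STRING   -- block-level address used for grouping
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count' = '1');  -- assumes a CSV header row

-- Assumed HDFS location of the uploaded file; replace with the real path.
LOAD DATA INPATH '/user/hue/crime/crimes.csv' INTO TABLE crime;
```

In a delimited text table the columns must be declared in the same order as they occur in the file, so a real definition would list every column of the dataset. If fields contain commas inside quotes, a CSV-aware SerDe such as Hive's `OpenCSVSerde` may be a better fit than a plain comma delimiter.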
![Page 9: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/9.jpg)
Hive and Beeswax
Hive is an infrastructure built on top of Hadoop for data summarization, querying, and analysis.
Beeswax is an application for running Hive queries.
![Page 10: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/10.jpg)
Processing in Beeswax:
![Page 11: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/11.jpg)
Queries and Visualization
Total number and rank of each crime type:
select primary_type, count(iucr), rank() over (ORDER BY count(iucr) desc) from crime group by primary_type limit 100;
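Depending on the Hive version, an aggregate such as `count(iucr)` may not be accepted inside the `OVER` clause. A possible workaround, sketched here under that assumption, is to compute the counts in a subquery and rank them in an outer query. The aliases `total_crimes`, `crime_rank`, and `type_counts` are illustrative names, not part of the original query.

```sql
-- Count crimes per type first, then rank the counts in the outer query.
SELECT primary_type,
       total_crimes,
       rank() OVER (ORDER BY total_crimes DESC) AS crime_rank
FROM (
  SELECT primary_type,
         count(iucr) AS total_crimes
  FROM crime
  GROUP BY primary_type
) type_counts
ORDER BY crime_rank   -- explicit ordering so LIMIT keeps the top-ranked types
LIMIT 100;
```

The explicit `ORDER BY` also makes the rows kept by `LIMIT` predictable, which the single-query form does not guarantee.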
![Page 12: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/12.jpg)
Number of crimes per location type for a given area:
select location_description, count(iucr) from crime where address = '008XX N MICHIGAN AVE' group by location_description limit 100;
[Chart: total crime count per location type; axis range 0 to 1200]
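For charting, it can help to have the most frequent location types come first. A minimal variant of the per-location query, assuming the same `crime` table, adds an alias and a descending sort; `total_crimes` is an illustrative alias.

```sql
-- Same per-location count, sorted so the most frequent location types lead.
SELECT location_description,
       count(iucr) AS total_crimes
FROM crime
WHERE address = '008XX N MICHIGAN AVE'
GROUP BY location_description
ORDER BY total_crimes DESC
LIMIT 100;
```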
![Page 13: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/13.jpg)
Final Outcome of Analysis:
CREATE TABLE UnsafeArea row format delimited fields terminated by ',' STORED AS RCFile AS select address, count(iucr) AS total_crimes, rank() over (ORDER BY count(iucr) desc) AS rank from crime GROUP BY address;
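Once `UnsafeArea` exists, it could be queried to judge a specific residence. The sketch below assumes the column names produced by the statement above (`address`, `total_crimes`, `rank`); the cutoff of 10 is an arbitrary illustrative threshold. The `rank` column is quoted with backticks because it shares its name with a built-in function.

```sql
-- Ten addresses with the most recorded crimes (threshold chosen arbitrarily).
SELECT address, total_crimes, `rank`
FROM UnsafeArea
WHERE `rank` <= 10
ORDER BY `rank`;

-- Look up where one particular address falls in the ranking.
SELECT address, total_crimes, `rank`
FROM UnsafeArea
WHERE address = '008XX N MICHIGAN AVE';
```

A lower `rank` value means more recorded crimes at that address, so a residence far down the ranking would count as comparatively safer under this measure.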
![Page 14: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/14.jpg)
GitHub
URL: https://github.com/priya708/Project-520
![Page 15: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/15.jpg)
Business Perspective
Get better advertisement
Predictive Policing for Police department: The future of Law enforcement?
• Reducing Random Gunfire
• Connecting Burglaries and Code Violations
![Page 16: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/16.jpg)
References
https://catalog.data.gov
https://cwiki.apache.org/confluence/display/Hive/Tutorial
https://hortonworks.com/tutorials
![Page 17: Geolocation analysis using HiveQL](https://reader035.vdocuments.us/reader035/viewer/2022062902/58f9a93c760da3da068b6c86/html5/thumbnails/17.jpg)
THANK YOU