PDF data loading without MR using Pig

Uploaded by rajesh-kumar-mandal on 11-Feb-2017

Unstructured data (PDF) conversion and loading into HDFS

Objective : We received data from a client in PDF format. We have to convert the PDF data to plain text (.txt) and load it into HDFS so that we can generate reports based on the client's data.

Data Sample :

Step 1 :

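Copy the PDF file (InputData.pdf) into any folder on the Linux machine and run the commands below from that folder. pdftotext -layout keeps the original column layout, -nopgbrk stops it from inserting page-break characters between pages, and tr -s " " squeezes each run of spaces into a single space so that the columns end up separated by exactly one space: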

hadoop@hadoop:~/Testing$ pdftotext -layout -nopgbrk InputData.pdf
hadoop@hadoop:~/Testing$ cat InputData.txt | tr -s " " > Input1.txt
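The tr -s " " step collapses runs of spaces, but if a line starts or ends with spaces, one space is left there. With PigStorage(' ') in Step 4, a leading space would produce an empty first field and shift every column by one. Below is a minimal sketch that also trims those edge spaces, assuming a POSIX shell with tr, sed and poppler's pdftotext on the PATH; the function names squeeze_spaces and pdf_to_pig_input are hypothetical, not part of the original walkthrough.

```shell
# Hypothetical helper: squeeze runs of spaces to one (as in Step 1), then trim
# a single leading or trailing space that PigStorage(' ') would otherwise
# turn into an empty field.
squeeze_spaces() {
  tr -s ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical wrapper for the whole of Step 1. pdftotext writes the text to
# stdout when the output file name is "-", so no intermediate file is needed.
pdf_to_pig_input() {
  pdftotext -layout -nopgbrk "$1" - | squeeze_spaces > "$2"
}
```

Usage, under the same assumptions: pdf_to_pig_input InputData.pdf Input1.txt produces the same kind of file as the two commands above, minus any leading or trailing space on each line.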

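Step 2 : Copy Input1.txt into HDFS. The /rajesh directory must exist first; otherwise the file is saved as a file named /rajesh instead of /rajesh/Input1.txt, which is the path Step 4 loads.

hadoop@hadoop:~/Testing$ hadoop fs -mkdir /rajesh
hadoop@hadoop:~/Testing$ hadoop fs -copyFromLocal Input1.txt /rajesh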
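The LOAD statement in Step 4 reads Input1.txt with PigStorage(' ') and a seven-field schema in which Sid, OBal, CBal and Frate are int. Rows with a different number of fields (for example a column-title line that pdftotext kept) or with non-numeric text in an int column will not load as intended; Pig typically fills such values with nulls and logs a conversion warning instead of failing the job. A minimal local check before relying on the HDFS copy, assuming awk is available; check_records is a hypothetical helper name:

```shell
# Hypothetical helper: report lines of Input1.txt that do not fit the
# 7-column schema used by the Pig LOAD statement
# (Sid, Sname, Ttrading, Sloc, OBal, CBal, Frate).
# -F'[ ]' splits on every single space, like PigStorage(' '), so a leading
# or doubled space shows up as an extra (empty) field.
check_records() {
  awk -F'[ ]' '
    function isint(s) { return (s ~ /^-?[0-9]+$/) }
    NF != 7 {
      printf "line %d: expected 7 fields, got %d\n", NR, NF
      bad++
      next
    }
    # Sid, OBal, CBal and Frate are declared as int in the schema.
    !(isint($1) && isint($5) && isint($6) && isint($7)) {
      printf "line %d: non-integer value in an int column\n", NR
      bad++
    }
    # Exit status 1 if any line was reported, 0 otherwise.
    END { exit (bad > 0) }
  ' "$1"
}
```

Running check_records Input1.txt prints nothing and exits 0 when every line matches the schema; otherwise it lists the offending line numbers, which can then be fixed or filtered out before the Pig job runs.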

Step 3 : View the file in HDFS

hadoop@hadoop:~/Testing$ hadoop fs -cat /rajesh/Input1.txt

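Step 4 : Run Pig in MapReduce (HDFS) mode, for example by starting the Grunt shell with pig -x mapreduce, and enter the following statements: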

grunt> A = LOAD '/rajesh/Input1.txt' using PigStorage(' ') as (Sid:int, Sname:chararray, Ttrading:chararray, Sloc:chararray, OBal:int, CBal:int, Frate:int);
grunt> disHM = DISTINCT A;
grunt> orHM = ORDER disHM by Sid;
grunt> STORE orHM INTO '/rajesh/pigoutput' using PigStorage(',');
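To sanity-check what the Pig job should write, the three statements can be approximated locally with coreutils: DISTINCT becomes sort -u, ORDER ... by Sid becomes a numeric sort on the first field, and PigStorage(',') becomes replacing the single-space delimiter with a comma. This is only an approximation under stated assumptions: the input must already use exactly one space between fields, Pig compares typed values (an int column holding 007 and 7 is a duplicate for Pig but not for sort -u), rows with equal Sid may come out in a different order, and Pig writes one or more part files into a directory rather than a single stream. emulate_pig_job is a hypothetical helper name.

```shell
# Hypothetical local approximation of the Step 4 Pig statements, for
# sanity-checking what should end up in /rajesh/pigoutput.
# Assumes Input1.txt already uses exactly one space between fields.
emulate_pig_job() {
  # DISTINCT A         -> drop duplicate lines (byte-wise comparison)
  # ORDER disHM by Sid -> numeric sort on the first field (Sid is an int)
  # PigStorage(',')    -> write fields separated by commas
  LC_ALL=C sort -u "$1" | LC_ALL=C sort -t ' ' -k1,1n | tr ' ' ','
}
```

For example, emulate_pig_job Input1.txt > expected.csv gives a local file to compare against the Pig output; the numeric sort matters because a plain text sort would put Sid 10 before Sid 9.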

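To view the output generated by Pig (STORE writes it as part files under /rajesh/pigoutput):

hadoop@hadoop:~/Testing$ hadoop fs -cat '/rajesh/pigoutput/part-*'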
