python vs big data - unimi.itmarchi.ricerca.di.unimi.it/teaching/bigdata2018/l1/introduction...

18
Python vs Big Data "Python?? Why Python?" h$ps://www.youtube.com/watch?v=Vru9xOEtOM

Upload: others

Post on 01-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

PythonvsBigData"Python??WhyPython?"h$ps://www.youtube.com/watch?v=Vru9xOEtOM

Page 2: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

WhatIsPython•  Pythonisahigh-level,interpretedandgeneral-purposedynamicprogramminglanguagethatfocusesoncodereadability.

•  ThePythoniswidelyusedandhavealargeandac9veprogrammercommunity.

•  Ithasacomprehensiveandlargestandardlibrarythathasautoma9cmemorymanagementanddynamicfeatures.

•  Iteasilyextensiblebyotherprogramminglanguage•  h$ps://www.python.org/•  h$ps://en.wikipedia.org/wiki/Python_(programming_language)

Page 3: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

WhyPython...somestepback...It's a dirty job, but someone have to do it

•  Peopleneedstoelaboratedatainordertoextractresults

data resultTransforma9on

data

program

result

Digitaliza0on Rendering

computer

Page 4: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Datacoding•  Digitalcomputerscanhandleonlybinarysignals:sequencesof0and1(bit=binarydigit)

•  Inordertotransformdatabydigitalcomputers,itneedstodigitalizedata,i.e.transformrealsamples(images,sound,etc.)intosequencesofbits,packedfortechnologicalandhostoricalreasonsintogroupof8bit,calledbytes.

•  Themeaningofasequenceisgivenbytheformatusedtocodeandinterpreterthesequence,eg.ASCII,bitmap,mp3.

0100.00100011.11000100.0010 ASCIIcodes666066 Characters"B<B"

BITMAP3x8

h5ps://en.wikipedia.org/wiki/ASCII h5ps://en.wikipedia.org/wiki/BMP_file_format

Page 5: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

ComputersathardwarelevelAveryschema9candsimplifieddraYofadigitacomputer

CPU

RAM

IO

Keyboard

Printer

Network.....

program

program

data

data

Executor

Workingarea

Storingarea

Page 6: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Codingtransformations•  Aclassicaldigitalcomputertransformsdigitaldatabyfollowingaprogram,i.e.asequenceofcommandsthatdescribesthetrasforma9onstobeappliedtodata.

•  Aprogramcanbewri$enusingvariouseHi-Levelprogramminglanguages,i.e.languageforhumans,eg.ADA,C,C++,Perl,Python,Java,Pascal,Basic.

•  Computers,athardwarelevel,understandonlyaverytrivialsetofcommands,theAssembly,aLow-Levelprogramminglanguage,alanguageforCPUs.

Page 7: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Hi-LevellanguagesBASIC:10INPUT"Yourname?:",NAME$20PRINT"Hello";NAME$

C:#include<stdio.h>char*name[100];intmain(){prinj("Yourname?:");scanf("%s",name);prinj("Hello%s\n",name);return0;}

Python:name=input("Yourname?:")print("Hello",name)

Java:packagestringvariables;importjava.u9l.Scanner;publicclassStringVariables{Scanneruser_input=newScanner(System.in);Stringname;System.out.print("Yourname?");name=user_input.next();System.out.print("Hello"+name);}

Page 8: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Assembly

h$ps://www.researchgate.net/figure/Assembly-instruc9ons-of-an-x86-example-op9mizing-frequently-executed-pieces-of-code_fig2_3881320

instruc9oninmemoryusedbyCPU instruc9ontransliteratedforhumans

Page 9: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

data

program

result

Usecomputers?startproblems!https://www.youtube.com/watch?v=tiq6v39YliQ

•  Datamanagement•  Portability

•  Codereadibility•  Codemaintenance•  Codeestensibility

•  speed?•  cost!

Page 10: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

DevelopeCode:ajobforteams• Codeshould(must?)be:•  readable:projectspassthorughmanyhandsandmaylive,fromchangetochange,formanyyears•  easytodevelope:•  easysyntaxàfastlearning•  noterror-prone:syntaxshouldaidgoodprogramming

•  withalotofalreadymadewheels:awidelibrarycollec9onofgoodfunc9onsaidtobuildupgoodcoderapidly(dontreinventthewheel)•  Cool:alargeconnectedcommunityofgeeksthatcodewithyourprogramminglanguageprobablyhavealreadysolvedallofyourpossibleproblems.

Page 11: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Speed•  Speedgenerallyconflictswithcodemaintenance.•  Fastcodesinordertofullcontroltheflowoftheistruc9ons(usually):•  iscodedusinga"raw"programminglanguage(eg.C,C++)thusitresultoYenunreadable.

•  itdon'tuse"abstrac9ons"forimplemen9ngalgorithmandmanagingdatathusitbacameeasytomakemistakesandbugs

•  librariesareimplementedfromscratchinordertoop9mizecodeorremoveunusedpartofcode,thus"newcode,newbugs".

"Don'trunifthereisnotneeds"

Page 12: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

InterpretervsCompiler•  TheprocessoftranslatefromHItoLowLevelcanbemadeintwoway:translatetheprogramwithacompileroexecutetheprogramwithaninterpreter

•  Compilers:•  takealotof9meforcompilephasebuttheresult,theexecutable,runfastonCPU.

•  Anynewreleaseofthecodehavetobecompiledagain•  therenoeasywaystorunthecodestepbystepfortest(youhavetouseadebugger)

•  Interpreters:•  designedforinterac9vemode:easytodebugcode•  codeisexecutedbyanagent,notdirectlybyCPU

•  easytoporttonewkindofcomputer

•  Notsofast:eachlinehavetobetranslatedany9meisexecuted

Page 13: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Speed

Cchar*aword=malloc(typeof(char)*10);scanf("%s",aword);for(i=0;strlen(aword);i++){

prinj("%c\n",aword[i]);}free(aword);+fast:compiledfortherunningCPU+smallbinary-unreadable-memorymgmtisourduty-easytomakemistakesonsyntax

pythonaword=input()forcinaword:

print(c)+easytoundestand+easytofinderrors+memorymgmtisdelagatedtosystem-notsofast:managingobjectrequiresabackgroundprocessthatsinksomecpu9me,itisinterpreted.

Page 14: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Speedconstrains•  Speeddependsmainly:•  datamanagement:

•  howobjectsfordataarecreateand,moreimportants,destroyed.•  howaccesstodataismaderespecttothelayeredcachedmemory

•  CPUparallelism:•  modernCPUsaresuperscalar:candomanystepsatthesame9me,concurrently,ifthecodepermitsit.

Page 15: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Datamanagementado-it-yourselfview(Cstyle)

INPUT"ABC",1

ABC

RAM

1.  createaword2.  createanumber3.  createaXtype

4.  put"ABC"inthefirstword

5.  put1inthefirstnumber

6.  destroytheword7.  destroythenumber

wordtype

numbertype

Xtype

1

garbageuncollectedwastememory

Page 16: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

WN

Xtype(dead)

Datamanagementadata-as-serviceview(Javastyle)

INPUT"ABC",1

ABC

RAM

1.  IneedawordWfor"ABC"

2.  IneedanumberNfor1

1

Xtype

garbagecollec9on9medservice

Page 17: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Pythonspec•  Generalpurpouselanguage•  Focusedonreadability•  Interpreted•  Modular•  Dynamic•  Object-oriented•  Portable•  ExtensibleinC++&C

Page 18: Python vs Big Data - unimi.itmarchi.ricerca.di.unimi.it/Teaching/BigData2018/L1/Introduction to... · Interpreter vs Compiler • The process of translate from HI to Low Level can

Snakify•  Snakifyisaplajormfore-LearningPython3•  Connecttoh$ps://snakify.org/•  Signupusing

•  [email protected](dontuseyourprivateemail,ifpossible)

•  apasswordDIFFERENTfromtheonausedforemail

•  flagtheop9on"Ihaveateacher"•  put"[email protected]"inthefield"Teacher'semail"