Advanced geoprocessing with Python
DESCRIPTION
4-hour short course given at the Mid-America GIS Consortium Biennial meeting, April 2012, Kansas City, MO.
TRANSCRIPT
Advanced geoprocessing with…
MAGIC 2012
Chad Cooper – [email protected]
Center for Advanced Spatial Technologies
University of Arkansas, Fayetteville
Intros
• your name
• what you do/where you work
• used Python much?
  – any formal training?
  – what do you use it for?
• know any other languages?
Objectives
• informal class
  – expect tangents
  – code as we go
• not geared totally to ArcGIS
• THINK – oddball and out-of-the-ordinary applications will make you want more…
Outline
• data types review
• functions
• procedural vs. OOP
• geometries
• rasters
• spatial references
• error handling/logging
• documentation
• 3rd party modules
• module installation
• the web
  – fetching
  – scraping
  – email
  – FTP
• files
Strings
• ordered collections of characters
• immutable – can't change it in place
• raw strings: path = r"C:\temp\chad" (a raw string cannot end in a single backslash)
• indexing: fruit[0] >> 'b'
• slicing: fruit[1:3] >> 'an'
• iteration/membership:
    for each in fruit
    'f' in fruit
Strings
• string formatting: 'a %s parrot' % 'dead' >> 'a dead parrot'
• useful string formatting:
    import arcpy
    f = "string"
    arcpy.CalculateField_management(fc, "some_field", '"%s"' % f)
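The string bullets above can be tried without ArcGIS. This is a minimal sketch; fruit = "banana" is an assumption (it matches the 'b' and 'an' results on the slide), and print() is used so the snippet also runs on Python 3.

```python
fruit = "banana"            # assumed value; the slides never define fruit

first = fruit[0]            # indexing: a single character
middle = fruit[1:3]         # slicing: characters at offsets 1 and 2
has_f = "f" in fruit        # membership test

# Strings are immutable: item assignment raises TypeError
try:
    fruit[0] = "B"
    changed = True
except TypeError:
    changed = False

# Build a new string instead of modifying the old one
capitalized = "B" + fruit[1:]

# Raw strings keep backslashes literal, which suits Windows paths
path = r"C:\temp\chad"

# %s formatting wrapped in double quotes, as in the CalculateField call
expression = '"%s"' % "string"

print(capitalized)
print(expression)
```

The last line shows why the slide wraps %s in double quotes: the field calculator needs to receive a quoted string literal, not a bare word.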
Lists
• list – ordered collection of arbitrary objects
    list1 = [0,1,2,3]
    list2 = ['zero','one','two','three']
    list3 = [0,'zero',1,'one',2,'two',3,'three']
• ordered
    list2.sort()                  >> ['one','three',...]
    list2.sort(reverse=True)      >> ['zero','two',...]
• mutable – you can change it
    list1.append(4)               >> [0,1,2,3,4]
    list1.reverse()               >> [4,3,2,1,0]
    list2.insert(0,'one-half')    >> ['one-half','zero',…]
    list2.extend(['four','five']) <- extend concatenates lists
Lists…
• iterable – very important!
    for l in list3
    0
    zero
    ...
• membership
    3 in list3 --> True
• nestable – 2D array/matrix
    list4 = [[0,1,2], [3,4,5], [6,7,8]]
• access by index – zero based
    list4[1]    >> [3,4,5]
    list4[1][2] >> 5
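One detail the list slides gloss over: sort() and reverse() change the list in place and return None, while sorted() returns a new list. A small sketch using the same sample lists:

```python
list2 = ["zero", "one", "two", "three"]

ordered_copy = sorted(list2)     # new list; list2 is untouched
result = list2.sort()            # sorts list2 in place and returns None

list4 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
middle_row = list4[1]            # second row of the nested list
cell = list4[1][2]               # row 1, column 2
first_column = [row[0] for row in list4]

print(ordered_copy)
print(first_column)
```

Writing list2 = list2.sort() is a common slip: it replaces the list with None.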
Dictionaries
• unordered collection of arbitrary objects
    d = {1:'foo', 2:'bar'}
• key/value pairs – think hash/lookup table (keys don't have to be numbers)
    d.keys()   >> [1, 2]
    d.values() >> ['foo','bar']
• nestable, mutable
    d[3] = 'spam'
    del d[key]
• access by key, not offset
    d[2] >> 'bar'
Dictionaries
• iterable
    >>> d.iteritems()
    <dictionary-itemiterator object at 0x1D2D8330>
    >>> for k, v in d.iteritems():
    ...     print k, v
    ...
    1 foo
    2 bar
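A short sketch of the dictionary operations above. It uses items() rather than iteritems() because items() exists on both Python 2 and Python 3; the nested lookup table is illustrative sample data, not part of the course material.

```python
d = {1: "foo", 2: "bar"}
d[3] = "spam"        # add a key/value pair
del d[1]             # remove an entry by key

# Sort the pairs so the loop order is predictable
pairs = sorted(d.items())
for key, value in pairs:
    print("%s %s" % (key, value))

# get() returns a default instead of raising KeyError for a missing key
missing = d.get(99, "not found")

# Nested dictionaries make handy lookup tables (sample values)
fips = {"AR": {"name": "Arkansas", "code": "05"},
        "MO": {"name": "Missouri", "code": "29"}}
missouri_code = fips["MO"]["code"]
```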
Tuples
• ordered collection of arbitrary objects
• immutable – cannot add, remove, or replace items
• access by offset
• basically an unchangeable list
    (1,2,'three',4,…)
• so what's the purpose?
  – FAST – great for iterating over a constant set of values
  – SAFE – you can't change it
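To make the "SAFE" point concrete, here is a small sketch. The coordinate pair is borrowed from the data list used later in the course; the "site A" label is made up for illustration.

```python
point = (33.09500, -93.90389)    # (lat, lon) pair, fixed once created
lat, lon = point                 # unpack by position

# Item assignment on a tuple raises TypeError
try:
    point[0] = 0.0
    mutated = True
except TypeError:
    mutated = False

# Tuples are hashable, so they can be dictionary keys; lists cannot
site_names = {point: "site A"}
name = site_names[(33.09500, -93.90389)]

print(name)
```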
List comprehensions
• Map one list to another by applying a function to each of the list elements
• Original list goes unchanged
    >>> L = [2,4,6,8]
    >>> J = [elem * 2 for elem in L]
    >>> J
    [4, 8, 12, 16]
Sets
• unordered collections of objects
• like mathematical sets – collection of distinct objects – NO DUPLICATES
• example – get rid of dups in a list via list comp
    L1=[2,2,3,4,5,5,3]
    L2=[]
    [L2.append(x) for x in L1 if x not in L2]
    >>> L2
    [2, 3, 4, 5]
Sets
• get rid of dups via set:
    >>> L1=[2,2,3,4,5,5,3]
    >>> set(L1)
    set([2, 3, 4, 5])
    >>> L1 = list(set(L1))
    >>> L1
    [2, 3, 4, 5]
• union:
    >>> L2 = [4,5,6,7]
    >>> L1 + L2
    [2, 3, 4, 5, 4, 5, 6, 7]
    >>> list(set(L1).union(set(L2)))
    [2, 3, 4, 5, 6, 7]
Sets
• intersection – data are the same
    >>> set(L1).intersection(set(L2))
    set([4, 5])
• symmetrical difference – data are not the same
    >>> set(L1).symmetric_difference(set(L2))
    set([2, 3, 6, 7])
    >>> L1
    [2, 2, 3, 4, 5, 5, 3]
    >>> L2
    [4, 5, 6, 7]
• difference – data in first set but not second
    >>> set(L1).difference(set(L2))
    set([2, 3])
    >>> set(L2).difference(set(L1))
    set([6, 7])
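The same four operations are also available as operators, which some find easier to read. The sketch below repeats the slide's L1 and L2, and adds an order-preserving dedupe, since a set makes no promise about the order in which it returns items.

```python
L1 = [2, 2, 3, 4, 5, 5, 3]
L2 = [4, 5, 6, 7]
s1, s2 = set(L1), set(L2)

union = sorted(s1 | s2)          # in either set
common = sorted(s1 & s2)         # in both sets
either = sorted(s1 ^ s2)         # in exactly one set
only_first = sorted(s1 - s2)     # in s1 but not s2

# Dedupe while keeping first-seen order
seen = set()
unique_in_order = []
for x in [5, 2, 5, 3, 2]:
    if x not in seen:
        seen.add(x)
        unique_in_order.append(x)

print(union)
print(unique_in_order)
```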
Programming paradigms: big blob of code
• OK on a small scale for GP scripts
• gets out of hand quickly
• hard to follow
• think ModelBuilder-exported code
Programming paradigms: procedural programming
• basically a list of instructions
• program is built from one or more procedures (functions) – reusable chunks
• procedures called at any time, anywhere in program
• focus is to break task into collection of variables, data structures, subroutines
• natural style, easy to understand
• strict separation between code and data
Functions
• portion of code within a larger program that performs a specific task
• can be called anytime, anyplace
• can accept arguments
• should return a value
• keeps code neat
• promotes smooth flow
    >>> def foo(bar):
    ...     print bar
    ...
    >>> foo("yo")
    yo
Functions
    import arcpy

    def get_raster_props(in_raster):
        """Get properties of a raster, return as dict"""
        # Cast input layer to a Raster
        r = arcpy.Raster(in_raster)
        raster_props = {} # Create empty dictionary to put props in below
        raster_props["x_center"] = r.extent.XMin + (r.extent.XMax - r.extent.XMin)/2
        raster_props["y_center"] = r.extent.YMin + (r.extent.YMax - r.extent.YMin)/2
        raster_props["max_elev"] = r.maximum
        raster_props["min_elev"] = r.minimum
        raster_props["no_data"] = r.noDataValue
        raster_props["terr_width"] = r.width
        raster_props["terr_height"] = r.height
        raster_props["terr_cell_res"] = r.meanCellHeight
        # Return the dictionary of properties
        return raster_props
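The "should return a value" bullet is easier to appreciate with a function that can be tested without ArcGIS. As a sketch, the center calculation from get_raster_props can be pulled into a plain function; extent_center is a hypothetical helper name, not part of arcpy.

```python
def extent_center(xmin, ymin, xmax, ymax):
    """Return the (x, y) center of a bounding box."""
    # Divide by 2.0 so integer inputs do not truncate on Python 2
    x_center = xmin + (xmax - xmin) / 2.0
    y_center = ymin + (ymax - ymin) / 2.0
    return x_center, y_center


center = extent_center(0, 0, 10, 4)
print(center)
```

Because the function returns its result instead of printing it, the caller decides what to do with the value: store it in a dictionary, log it, or pass it to another tool.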
Programming paradigms: Procedural example
    import arcpy

    def add_field(in_fc="Magic.gdb/Fields", in_fields=[("Distance", "Float", "0"), ("Name", "Text", 50)]):
        """Add fields to FC"""
        for in_field in in_fields:
            if in_field[1] == 'Text':
                arcpy.AddField_management(in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
            else:
                arcpy.AddField_management(in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

    add_field()
Programming paradigms: Object-oriented programming (OOP)
• break program down into data types (classes) that associate behavior (methods) with data (members or attributes)
• code becomes more abstract
• data and functions for dealing with it are bound together in one object
Programming paradigms: Object-oriented programming (OOP)
    import arcpy

    class Fields(object):
        """Class for working with fields"""
        # __init__ --> method signature
        def __init__(self, in_fc="Magic.gdb/Fields", in_fields=[("Distance", "Float", "0"), ("Name", "Text", 50)]):
            self.in_fc = in_fc
            self.in_fields = in_fields

        def add_field(self):
            """Add fields to FC"""
            for in_field in self.in_fields:
                if in_field[1] == "Text":
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
                else:
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

    if __name__ == "__main__":
        # Instantiate the Fields class
        f = Fields()
        # Call the add_field method
        f.add_field()
        print f.in_fields
        print f.in_fc
Programming paradigms: Object-oriented programming (OOP)
• objects let you wrap complex processes, but present a simple interface to them
• methods and attributes are encapsulated inside the object
• methods and attributes are exposed to users
• you can then update the object without breaking the interface
• you can pass objects around - carefully
Programming paradigms: OOP - Inheritance
• classes can inherit attributes and methods
• allows you to reuse and customize existing code inside a new class
• you can override methods
• you can add new methods to a class without modifying the existing class
Programming paradigms: OOP - Inheritance
    import arcpy

    class Fields(object):
        """Class for working with fields"""
        def __init__(self, in_fc="Magic.gdb/Fields", in_fields=[("Distance", "Float", "0"), ("Name", "Text", 50)]):
            self.in_fc = in_fc
            self.in_fields = in_fields

        def add_field(self):
            """Add fields to FC"""
            for in_field in self.in_fields:
                if in_field[1] == "Text":
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
                else:
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

    class MyFields(Fields):
        """Customized fields class"""
        def add_field(self):
            """Add fields to FC"""
            for in_field in self.in_fields:
                # Test to see if in_field exists already in featureclass
                if in_field[0] in [f.name for f in arcpy.ListFields(self.in_fc)]:
                    # If field exists, delete it
                    arcpy.DeleteField_management(self.in_fc, in_field[0])
                    print in_field[0], "deleted"
                if in_field[1] == "Text":
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
                else:
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

    if __name__ == "__main__":
        # Instantiate MyFields class, which inherits from the Fields class
        f = MyFields()
        # Call add_field method
        f.add_field()
Programming paradigms: OOP - Inheritance
    import arcpy

    class Fields(object):
        """Class for working with fields"""
        def __init__(self, in_fc="Magic.gdb/Fields", in_fields=[("Distance", "Float", "0"), ("Name", "Text", 50)]):
            self.in_fc = in_fc
            self.in_fields = in_fields

        def add_field(self):
            """Add fields to FC"""
            for in_field in self.in_fields:
                if in_field[1] == "Text":
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
                else:
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

        def get_field_props(self):
            desc = arcpy.Describe(self.in_fc)
            for field in desc.fields:
                print field.name, "-->", field.type

    class MyFields(Fields):
        """Customized fields class"""
        def add_field(self):
            """Add fields to FC"""
            for in_field in self.in_fields:
                if in_field[0] in [f.name for f in arcpy.ListFields(self.in_fc)]:
                    arcpy.DeleteField_management(self.in_fc, in_field[0])
                    print in_field[0], "deleted"
                if in_field[1] == "Text":
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", in_field[2], "#", "NULLABLE", "NON_REQUIRED", "#")
                else:
                    arcpy.AddField_management(self.in_fc, in_field[0], in_field[1], "#", "#", "#", "#", "NULLABLE", "NON_REQUIRED", "#")

    if __name__ == "__main__":
        # Instantiate MyFields class
        f = MyFields()
        # Call add_field method
        f.add_field()
        print f.in_fields
        # See, we really do inherit everything from the Fields class
        f.get_field_props()
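In the examples above, MyFields.add_field copies the parent's AddField logic. One alternative is to do only the extra work in the subclass and then hand off to the parent with super(). The sketch below is a stand-in that records actions in a list instead of calling arcpy, so it runs anywhere; the log attribute and the existing argument are hypothetical additions for illustration.

```python
class Fields(object):
    """Stand-in for the Fields class; records actions instead of calling arcpy."""

    def __init__(self, in_fc="Magic.gdb/Fields", in_fields=None):
        self.in_fc = in_fc
        self.in_fields = in_fields or [("Distance", "Float", "0"),
                                       ("Name", "Text", 50)]
        self.log = []

    def add_field(self):
        for name, field_type, _ in self.in_fields:
            self.log.append("add %s (%s)" % (name, field_type))


class MyFields(Fields):
    """Deletes fields that already exist, then reuses the parent's add_field."""

    def __init__(self, existing=(), **kwargs):
        super(MyFields, self).__init__(**kwargs)
        self.existing = set(existing)    # pretend these are already in the FC

    def add_field(self):
        for name, _, _ in self.in_fields:
            if name in self.existing:
                self.log.append("delete %s" % name)
        # Hand off to the parent rather than copying its code
        super(MyFields, self).add_field()


f = MyFields(existing=["Name"])
f.add_field()
print(f.log)
```

With this layout, a fix to the parent's add_field is picked up by every subclass automatically.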
Modularizing code
• I'm lazy, so I want to reuse code
• import statement – call functionality in another module
• Have one custom module (a .py file) with code you use all the time
• Great way to package up helper functions
• ESRI does this with ConversionUtils.py
    C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts
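A minimal sketch of the helper-module idea. To stay self-contained it writes a tiny module to a temporary folder and then imports it; in practice the .py file would simply live next to your scripts or somewhere on the Python path. The module name gp_helpers and the function full_path are hypothetical.

```python
import os
import sys
import tempfile

# Write a one-function helper module into a scratch folder
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "gp_helpers.py"), "w") as f:
    f.write('def full_path(workspace, name):\n'
            '    """Join a workspace and a dataset name."""\n'
            '    return workspace + "/" + name\n')

# Make the folder importable, then use the module like any other
sys.path.insert(0, module_dir)
import gp_helpers

fc = gp_helpers.full_path("Magic.gdb", "Fields")
print(fc)
```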
Geometries
• hierarchy:
  – feature class is made of features
  – feature is made of parts
  – part is made of points
• hierarchy in Pythonic terms:
  – part: [[pnt, pnt, pnt, ...]]
  – multipart polygon: [[pnt, pnt, pnt, ...], [pnt, pnt, pnt, ...]]
  – single part polygon with hole: [[pnt, pnt, pnt, None, pnt, pnt, pnt]] (a null point separates the rings)
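The nested-list picture can be exercised in plain Python. In this sketch, points are (x, y) tuples rather than arcpy.Point objects, and a None entry marks the start of an interior ring, which is how the geometry-reading code later in the course treats a null point. split_rings is a hypothetical helper.

```python
def split_rings(part):
    """Split one part into rings, using None as the ring separator."""
    rings = [[]]
    for pnt in part:
        if pnt is None:
            rings.append([])         # start a new (interior) ring
        else:
            rings[-1].append(pnt)
    return rings


# One feature with a single part: an outer ring plus one hole
feature = [[(0, 0), (0, 10), (10, 10), (10, 0), (0, 0),
            None,
            (2, 2), (4, 2), (4, 4), (2, 2)]]

rings = split_rings(feature[0])
print(len(rings))
```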
Reading geometry
• accessed through the geometry object of a feature
• example: describe_geometry_arcmap.py
  1. open up SearchCursor
  2. loop through rows
  3. get geometry
  4. print out X, Y
    import arcpy
    desc = arcpy.Describe("Points")
    sfn = desc.ShapeFieldName
    rows = arcpy.SearchCursor("Points")
    for row in rows:
        geom = row.getValue(sfn)
        pnt = geom.getPart()
        print pnt.X, pnt.Y
Reading geometry
    import arcpy

    infc = "Magic.gdb/Polygons"

    # Identify the geometry field
    desc = arcpy.Describe(infc)
    shapefieldname = desc.ShapeFieldName
    # Create search cursor
    rows = arcpy.SearchCursor(infc)

    # Enter for loop for each feature/row
    for row in rows:
        # Create the geometry object
        feat = row.getValue(shapefieldname)
        # Print the current feature's ID
        print "Feature %i:" % row.getValue(desc.OIDFieldName)
        partnum = 0
        # Step through each part of the feature
        for part in feat:
            # Print the part number
            print "Part %i:" % partnum
            # Step through each vertex in the feature
            for pnt in feat.getPart(partnum):
                if pnt:
                    # Print x,y coordinates of current point
                    print pnt.X, pnt.Y
                else:
                    # If pnt is None, this represents an interior ring
                    print "Interior Ring:"
            partnum += 1
Reading geometry
    import arcpy

    infc = "Magic.gdb/Polygons"

    desc = arcpy.Describe(infc)
    shapefieldname = desc.ShapeFieldName

    rows = arcpy.SearchCursor(infc)

    for row in rows:
        feat = row.getValue(shapefieldname)
        print "\tFeature %i:" % row.getValue(desc.OIDFieldName)
        partnum = 0

        for part in feat:
            parts = []
            print "Part %i:" % partnum

            for pnt in feat.getPart(partnum):
                if pnt:
                    parts.append([pnt.X, pnt.Y])
                else:
                    parts.append(" ")
            partnum += 1
            print parts
Writing geometry
• arcpy.Point
• point features are point objects, lines and polygons are arrays of point objects
  – arcpy.Polyline, arcpy.Polygon
• Geometry objects can be created using the Geometry, Multipoint, PointGeometry, Polygon, or Polyline classes
Writing geometry
    data_list = [[33.09500,-93.90389], [33.03194,-93.89806], [34.34111,-93.50056],
                 [34.24917,-93.67667], [34.22500,-93.89500], [33.76833,-92.48500],
                 [33.74500,-92.47667], [33.68000,-92.46667], [35.05425,-94.12711],
                 [35.03472,-94.12233], [35.03333,-94.12236], [35.01500,-94.12108],
                 [35.00392,-94.12033]]

    import arcpy
    import time

    def PushNbiToFeatureclass(inFc, inList):
        """ Take a list of NBI data and push it directly to a FGDB point FC """
        try:
            cur = arcpy.InsertCursor(inFc)
            for line in inList:
                t = 0
                feat = cur.newRow()
                feat.shape = arcpy.Point(line[1], line[0])
                feat.setValue("Timestamp", time.strftime("%m/%d/%Y %H:%M:%S", time.localtime()))
                cur.insertRow(feat)
            del cur
        except Exception as e:
            print e.message

    PushNbiToFeatureclass(r"path to fc", data_list)
Writing geometry
    import arcpy

    arcpy.env.overwriteOutput = 1

    # A list of features and coordinate pairs
    coordList = [[[1,2], [2,4], [3,7]],
                 [[6,8], [5,7], [7,2], [9,18]]]

    # Create empty Point and Array objects
    point = arcpy.Point()
    array = arcpy.Array()
    # A list that will hold each of the Polygon objects
    featureList = []

    for feature in coordList:
        # For each coordinate pair, set the x,y properties and add to the
        # Array object
        for coordPair in feature:
            point.X = coordPair[0]
            point.Y = coordPair[1]
            array.add(point)
        # Add the first point of the array in to close off the polygon
        array.add(array.getObject(0))
        # Create a Polygon object based on the array of points
        polygon = arcpy.Polygon(array)
        # Clear the array for future use
        array.removeAll()
        # Append to the list of Polygon objects
        featureList.append(polygon)

    # Copy Polygon object to a featureclass
    arcpy.CopyFeatures_management(featureList, "d:/temp/polygons.shp")
Rasters
• arcpy.Raster class
  – raster object: variable that references a raster dataset
  – gives access to raster props
• raster calculations – Map Algebra
  – outras = Slope("in_raster")
  – can cast to Raster object for calculations
Rasters
    import arcpy

    def get_raster_props(in_raster):
        """Get properties of a raster, return as dict"""
        # Cast input layer to a Raster
        r = arcpy.Raster(in_raster)
        raster_props = {} # Create empty dictionary to put props in below
        raster_props["x_center"] = r.extent.XMin + (r.extent.XMax - r.extent.XMin)/2
        raster_props["y_center"] = r.extent.YMin + (r.extent.YMax - r.extent.YMin)/2
        raster_props["max_elev"] = r.maximum
        raster_props["min_elev"] = r.minimum
        raster_props["no_data"] = r.noDataValue
        raster_props["terr_width"] = r.width
        raster_props["terr_height"] = r.height
        raster_props["terr_cell_res"] = r.meanCellHeight
        # Return the dictionary of properties
        return raster_props
Spatial references
• can get properties from arcpy.Describe
    >>> sr = arcpy.Describe(fc).spatialReference
    >>> sr.type
    u'Projected' or u'Geographic'
• arcpy.SpatialReference class
• methods to create/edit spatial refs
Spatial references
    >>> sr_utm = arcpy.SpatialReference()
    >>> sr_utm.factoryCode = 26915
    >>> sr_utm.create()
    >>> sr_utm.name
    ...
ERRORS
Exception Handling
• It's necessary, stuff fails
• Useful error reporting
• Proper application cleanup
• Combine it with logging
    try:
        do something...
    except:
        handle error...
    finally:
        clean up...
Exception handling – try/except
• most basic form of error handling
• wrap whole program or portions of code
• use optional finally clause for cleanup
  – close open files
  – close database connections
  – check extensions back in
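The "close open files" case makes a handy stand-alone illustration of try/except/finally. A minimal sketch, using a scratch file so it runs without ArcGIS; read_first_line and the events list are hypothetical names added to show the order in which the clauses run.

```python
import os
import tempfile

events = []

def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be opened."""
    f = None
    try:
        f = open(path)
        return f.readline().strip()
    except IOError:
        events.append("error")
        return None
    finally:
        # Runs on success and on failure, even though both branches return
        if f is not None:
            f.close()
        events.append("cleanup")


good_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(good_path, "w") as out:
    out.write("first line\nsecond line\n")

first = read_first_line(good_path)
missing = read_first_line(good_path + ".does_not_exist")
print(events)
```

Note the guard inside finally: cleanup code should only release what was actually acquired, which is also worth remembering when checking extensions back in.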
Exception handling
    import arcpy
    try:
        arcpy.Buffer_analysis("Observer")
    except Exception as e:
        print e.message
Exception handling
    import arcpy

    try:
        if arcpy.CheckExtension("3D") == "Available":
            arcpy.CheckOutExtension("3D")
        arcpy.Slope_3d("Magic.gdb/NWA10mNED", "Magic.gdb/SlopeNWA")
    except:
        print arcpy.GetMessages(2)
    finally:
        # Check in the 3D Analyst extension
        arcpy.CheckInExtension("3D")
Exception handling - raise
• allows you to force an exception to occur• can be used to alert of conditions
Exception handling - raise
    import arcpy

    class LicenseError(Exception):
        pass

    try:
        if arcpy.CheckExtension("3D") == "Available":
            arcpy.CheckOutExtension("3D")
        else:
            raise LicenseError
        arcpy.Slope_3d("NWA10mNED", "SlopeNWA")
    except LicenseError:
        print "3D Analyst license unavailable"
    except:
        print arcpy.GetMessages(2)
    finally:
        # Check in the 3D Analyst extension
        arcpy.CheckInExtension("3D")
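The raise pattern can be tried without a license server. In the sketch below, check_out is a hypothetical stand-in for the CheckExtension/CheckOutExtension pair, and the actions list stands in for the real geoprocessing calls. It also tracks whether the checkout succeeded, so the finally clause only checks in what was checked out.

```python
class LicenseError(Exception):
    """Raised when a required extension cannot be checked out."""
    pass


def check_out(extension, available):
    """Hypothetical stand-in for checking an extension out."""
    if extension not in available:
        raise LicenseError("%s license unavailable" % extension)


def run_slope(available):
    actions = []
    checked_out = False
    try:
        check_out("3D", available)
        checked_out = True
        actions.append("slope")           # the real work would go here
    except LicenseError as e:
        actions.append(str(e))
    finally:
        if checked_out:
            actions.append("check in")    # cleanup only if needed
    return actions


print(run_slope(["3D"]))
print(run_slope([]))
```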
Exception handling – AddError and traceback
• AddError – returns GP-specific errors
• traceback – prints stack trace; determines precise location of error
  – good for larger, more complex programs
Exception handling – AddError and traceback
    import arcpy
    import sys
    import traceback

    arcpy.env.workspace = r"C:\Student\Code\MAGIC.gdb"
    try:
        # Your code goes here
        float("a string")

    except:
        # Get the traceback object
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        # Concatenate information together concerning the error into a message string
        # tbinfo: where error occurred
        # sys.exc_info: 3-tuple of type, value, traceback
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        msgs = "ArcPy ERRORS:\n" + arcpy.GetMessages(2) + "\n"
        # Return python error messages for use in script tool or Python Window
        arcpy.AddError(pymsg)
        if arcpy.GetMessages(2):
            arcpy.AddError(msgs)
            print msgs
        # Print Python error messages for use in Python / Python Window
        print pymsg + "\n"
Logging
• logging module
• logging levels:
  – DEBUG: detailed; for troubleshooting
  – INFO: normal operation, statuses
  – WARNING: still working, but unexpected behavior
  – ERROR: more serious, some function not working
  – CRITICAL: program cannot continue
Super-basic logging
    import logging
    logging.warning("Look out!")
    logging.info("Does this print?")
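The answer to "Does this print?" is no: unless configured otherwise, the root logger's level is WARNING, so INFO and DEBUG messages are dropped. The sketch below shows the same behavior on a named logger with a small handler that collects messages in a list. The logger name, the ListHandler class, and the explicit setLevel(WARNING) call (which mirrors the default so the demo does not depend on earlier configuration) are all illustrative choices.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("magic_demo")     # hypothetical logger name
logger.propagate = False                     # keep the demo quiet on stderr
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.WARNING)             # same as the root default
logger.warning("Look out!")
logger.info("Does this print?")              # dropped: INFO < WARNING

logger.setLevel(logging.INFO)                # lower the threshold
logger.info("Now it does")

print(handler.messages)
```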
Super-basic logging to a log file
    import logging
    logging.basicConfig(filename='log_example.log', level=logging.DEBUG)
    logging.debug('This message should get logged')
    logging.info('So should this')
    logging.warning('And this, too')
Meaningful logging
• "customize" the logger
• add in info-level message(s) to get logged
• log our errors to log file
• can get much more advanced, see the docs
    import arcpy
    import sys
    import traceback
    import logging
    import datetime

    log_file = "meaningful_log_%s.log" % datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S")

    arcpy.env.workspace = r"C:\Student\Code\MAGIC.gdb"

    # Setup logger
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(levelname)-8s %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S',
                        filename=log_file,
                        filemode='w')
    logging.info(': START LOGGING')

    try:
        # Your code goes here
        float("lfkjdlk")

        logging.info(": DONE")

    except:
        # Get the traceback object
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        # Concatenate information together concerning the error into a message string
        # tbinfo: where error occurred
        # sys.exc_info: 3-tuple of type, value, traceback
        pymsg = "PYTHON ERRORS:\nTraceback info:\n" + tbinfo + "\nError Info:\n" + str(sys.exc_info()[1])
        msgs = "ArcPy ERRORS:\n" + arcpy.GetMessages(2) + "\n"
        # Return python error messages for use in script tool or Python Window
        arcpy.AddError(pymsg)
        if arcpy.GetMessages(2):
            arcpy.AddError(msgs)
            logging.error(": %s" % msgs)
        # Log Python error messages for use in Python / Python Window
        logging.error(": %s" % pymsg + "\n")
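For the pure-Python part of the error, the logging module can capture the traceback itself: logger.exception() logs at ERROR level and appends the full stack trace. A sketch without arcpy, writing to a scratch log file; the logger name and file name are made up for the example.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "meaningful.log")

logger = logging.getLogger("magic_traceback_demo")   # hypothetical name
logger.setLevel(logging.DEBUG)
logger.propagate = False
file_handler = logging.FileHandler(log_path, mode="w")
file_handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)-8s %(message)s", "%Y-%m-%d %H:%M:%S"))
logger.addHandler(file_handler)

logger.info("START LOGGING")
try:
    float("lfkjdlk")             # same failing call as the slide
    logger.info("DONE")          # never reached
except ValueError:
    # Logs the message at ERROR level plus the full traceback
    logger.exception("Conversion failed")

# Flush and release the file before reading it back
logger.removeHandler(file_handler)
file_handler.close()

with open(log_path) as f:
    log_text = f.read()
print(log_text)
```

This does not replace arcpy.GetMessages(2) for geoprocessing errors; it only shortens the Python side of the handler.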
Code documentation
• Pythonic standards covered in PEPs 8 and 257
• help()
• comments need to be worth it
• name items well
• be precise and compact
• comments may be for you
Creating documentation
• pydoc – built-in; used by help()
  – generate HTML on any module
  – kinda plain
• epydoc – old, rumored to be dead
  – produces nicely formatted HTML
  – easy to install and use
• Sphinx framework
  – "intelligent and beautiful documentation"
  – all the cool kids are using it (docs.python.org)
  – more involved to setup and use
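All of these tools start from docstrings, so a quick sketch of what they read may help. buffer_distance is a made-up example function; inspect.getdoc returns the cleaned docstring, and pydoc.render_doc produces the same text that help() displays.

```python
import inspect
import pydoc


def buffer_distance(miles):
    """Convert a distance in miles to meters.

    miles -- distance in statute miles (number)
    Returns the distance in meters as a float.
    """
    return miles * 1609.344


# First docstring line: the one-line summary PEP 257 asks for
summary = inspect.getdoc(buffer_distance).splitlines()[0]

# The text help(buffer_distance) would show, with bold markup stripped
help_text = pydoc.plain(pydoc.render_doc(buffer_distance))

print(summary)
```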
Branching out
Installing packages
Installing packages (on Windows)
• Windows executables
• Python eggs
  – .zip file with metadata, renamed .egg
  – distributes code as a bundle
  – need easy_install
• pip
  – tool for installing and managing Python packages
  – replacement for easy_install
pip
• can take care of dependencies for you
• uninstallation!
• install via easy_install, ironically
    C:\> pip search "kml"
    C:\> pip install BeautifulSoup
    C:\> pip install --upgrade pykml
    C:\> pip uninstall BeautifulSoup
virtualenv
• a tool to create isolated Python environments
• manage dependencies on a per-project basis, rather than globally installing
• test modules without installing into site-packages
• avoid unintentional upgrades
virtualenv
• install via pip, easy_install, or by
    C:\> python virtualenv.py
• create the env
    C:\dir> virtualenv <env>
• activate the env
    C:\dir\<env>\Scripts> activate
• use the env
    (<env>) C:\dir\<env>\Scripts> python
    >>>
virtualenv
• installs Python where you tell it, modifies system path to point there
  – good only while the env is activated
• use yolk to list installed packages in env
    (test) C:\dir> yolk -l
• But can this work in ArcMap Python prompt?
virtualenv
• YES, with a little work...
    >>> execfile(r'C:\<env>\Scripts\activate_this.py', {'__file__': r'C:\<env>\Scripts\activate_this.py'})
• tells ArcMap to use Python interpreter in our virtualenv
  – kill ArcMap, back to using default interpreter
The web
• Infinite source of information
• Right-click and "Save as" is so lame (and too much work)
• Python can help you exploit the web
  – ftplib, http (urllib), mechanize, scraping (Beautiful Soup), send email (smtplib)
Fetching data
• Built-in libraries for ftp and http
• ftplib – log in, nav to directory, retrieve files
• urllib/urllib2 – pass in the url you want, get it back
• wget – GNU commandline tool
  – Can call with os.system()
    import urllib
    urllib.urlretrieve("http://www.fhwa.dot.gov/bridge/nbi/2011/RI11.txt", "C:/temp/RI11.txt")
Scraping
• Scrape data from a web page
• Well-structured content is a HUGE help, as is valid markup, which isn't always there
• BeautifulSoup 3rd party module
  – Built in methods and regex's help out
  – Great for getting at tables of data
Scraping addresses
http://www.phillypal.com/pal_locations.php
    import BeautifulSoup as bs
    import urllib2

    url = "http://www.phillypal.com/pal_locations.php"

    # Open the URL
    response = urllib2.urlopen(url)
    # Slurp all the HTML code into memory
    html = response.read()
    # Feed it into BS parser
    soup = bs.BeautifulSoup(html)
    # Find all the table cells whose width=37%
    addresses = soup.findAll("td", {"width":"37%"})

    print len(addresses)

    for address in addresses:
        # Print out just the text
        print address.find(text=True)
Scraping addresses
1845 N. 23rd Street, 19121
3301 Tasker Street, 19145
5801 Media Street, 19131
250 S. 63rd Street, 19139
732 N. 17th Street, 19130
631 Snyder Avenue, 19148
6901 Rising Sun Avenue, 19111
851 E. Tioga Street, 19134
720 W. Cumberland St., 19133
3890 N. 10th Street, 19140
4550 Haverford Avenue, 19139
1100 W. Rockland St., 19141
1500 W. Ontario Street, 19140
2423 N. 27th Street, 19132
1267 E. Cheltenham Ave., 245330 Germantown Ave., 19144
1599 Wharton Street, 19146
4253 Frankford Avenue, 19124
2524 E. Clearfield St., 19134
6300 Garnet Street, 19126
5900 Elmwood Street, 19143
4301 Wayne Avenue, 19140
4401 Aldine Street, 19136
4614 Woodland Avenue, 19143
4419 Comly Street, 19135
2251 N. 54th Street, 19131
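If BeautifulSoup is not installed, the same "find every td whose width is 37%" idea can be sketched with the standard library. This is a stand-in, not the course's method: it uses Python 3's html.parser module (named HTMLParser on Python 2), and the inline HTML is a made-up fragment that only imitates the table layout, reusing two addresses from the output above.

```python
from html.parser import HTMLParser


class CellGrabber(HTMLParser):
    """Collect the text of every <td> whose width attribute is 37%."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_cell = False
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("width") == "37%":
            self.in_cell = True
            self.addresses.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.addresses[-1] += data


# Made-up fragment standing in for the downloaded page
html = ('<table>'
        '<tr><td width="20%">Center</td>'
        '<td width="37%">1845 N. 23rd Street, 19121</td></tr>'
        '<tr><td width="20%">Center</td>'
        '<td width="37%">3301 Tasker Street, 19145</td></tr>'
        '</table>')

parser = CellGrabber()
parser.feed(html)
parser.close()
print(parser.addresses)
```

BeautifulSoup is far more forgiving of broken markup, which is the main reason to prefer it for real pages.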
Emailing
• smtplib built-in library
• best if you have IP of your email server
• port blocking can be an issue
    import smtplib
    server = smtplib.SMTP(email_server_ip)
    msg = 'All TPS reports need new cover sheets'
    server.sendmail('[email protected]', '[email protected]', msg)
    server.quit()
• there’s always Gmail too…
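A bare string sent through sendmail arrives without a subject line. One common approach is to build the message with the standard email package first. The sketch below only constructs the message; the addresses are placeholders on example.com, the subject is made up, and the actual send is left as a comment because it needs a reachable mail server.

```python
from email.mime.text import MIMEText

msg = MIMEText("All TPS reports need new cover sheets")
msg["Subject"] = "TPS reports"              # made-up subject
msg["From"] = "sender@example.com"          # placeholder address
msg["To"] = "recipient@example.com"         # placeholder address

wire_text = msg.as_string()
print(wire_text)

# Sending would then look like this (requires a real server):
# import smtplib
# server = smtplib.SMTP(email_server_ip)
# server.sendmail(msg["From"], [msg["To"]], wire_text)
# server.quit()
```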
Files
• built in open function – slurp entire file into memory – OK except for huge files
    data = open(file).read().splitlines()
• iterate over the lines
    for line in data:
        do something
• CSV module
    reader = csv.reader(open('C:/file.csv','rb'))
    for line in reader:
        do something
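A runnable version of both patterns, as a sketch. It writes a small scratch CSV first so nothing outside the temp folder is touched (the two coordinate rows come from the data list used earlier; the file name and id column are made up), and it uses with blocks so the files are closed automatically.

```python
import csv
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), "bridges.csv")
with open(csv_path, "w") as f:
    f.write("id,lat,lon\n"
            "1,33.09500,-93.90389\n"
            "2,33.03194,-93.89806\n")

# Slurp the whole file into a list of lines
with open(csv_path) as f:
    lines = f.read().splitlines()

# Let the csv module split the fields, converting types as we go
rows = []
with open(csv_path) as f:
    reader = csv.reader(f)
    header = next(reader)            # first row holds the column names
    for row in reader:
        rows.append((int(row[0]), float(row[1]), float(row[2])))

print(header)
print(rows)
```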
Excel
• love, hate, love
• many modules out there
  – xlrd (read) / xlwt (write) – only .xls
  – openPyXL – read/write .xlsx
• uses
  – Push text data to Excel file
  – Push featureclass data to Excel programmatically
  – Read someone else's "database"
Reading Excel
    import xlrd

    # Open the workbook
    wb = xlrd.open_workbook('Employees.xls')
    wb.sheet_names()

    # Get first sheet
    sh = wb.sheet_by_index(0)
    # Print out the rows
    for row in range(sh.nrows):
        print sh.row_values(row)

    # Get a single cell
    cell_b2 = sh.cell(1,1).value
    print "\n", cell_b2
Writing Excel
    # Write an XLS file with a single worksheet, containing
    # a heading row and some rows of data.
    import xlwt
    import datetime
    import bs_scrape as bs
    import nbi_data_processing as nbi

    ezxf = xlwt.easyxf

    def write_xls(file_name, sheet_name, headings, data, heading_xf, data_xfs):
        book = xlwt.Workbook()
        sheet = book.add_sheet(sheet_name)
        rowx = 0
        for colx, value in enumerate(headings):
            sheet.write(rowx, colx, value, heading_xf)
        sheet.set_panes_frozen(True) # frozen headings instead of split panes
        sheet.set_horz_split_pos(rowx+1) # in general, freeze after last heading row
        sheet.set_remove_splits(True) # if user does unfreeze, don't leave a split there
        for row in data:
            rowx += 1
            for colx, value in enumerate(row):
                sheet.write(rowx, colx, value, data_xfs[colx])
        book.save(file_name)

    if __name__ == "__main__":
        import sys
        files = ["RI","HI"]
        all_data = []
        stateDict = bs.FetchFipsCodes()
        for f in files:
            k = nbi.ParseNbiFile('C:/student/inputs/' + f + '11.txt', stateDict)
            all_data.extend(k)
        hdngs = ["Structure","State","Facility carried","Lat","Lon","Year built"]
        kinds = "text text text double double yr".split()
        data = []
        for each_row in all_data:
            data.extend([each_row])
        # Format the headers
        heading_xf = ezxf("font: bold on; align: wrap on, vert centre, horiz center")
        # Set the data type formats
        kind_to_xf_map = {
            "date": ezxf(num_format_str="yyyy-mm-dd"),
            "int": ezxf(num_format_str="#,##0"),
            "money": ezxf("font: italic on; pattern: pattern solid, fore-colour grey25",
                          num_format_str="$#,##0.00"),
            "price": ezxf(num_format_str="#0.000000"),
            "double": ezxf(num_format_str="00.00000"),
            "text": ezxf(),
            "yr": ezxf(num_format_str="0000")
        }
        data_xfs = [kind_to_xf_map[k] for k in kinds]
        write_xls("NBI_Data_To_Excel.xls", "NBI", hdngs, data, heading_xf, data_xfs)
Databases
• You can connect to pretty much ANY database
• Is there one true solution??
• pyodbc – Access, SQL Server, MySQL
• Oracle – cx_Oracle
• Others – pymssql, _mssql, MySQLdb
• Execute SQL statements through a connection
    conn = library.connect(driver/user/pwd)
    cursor = conn.cursor()
    for row in cursor.execute(sql):
        do something
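The connect / cursor / execute pattern can be tried with sqlite3, which ships with Python and follows the same DB-API shape as the modules listed above. It is used here only as a stand-in; the table and the two sample rows are made up.

```python
import sqlite3

# An in-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

cursor.execute("CREATE TABLE bridges (structure TEXT, year_built INTEGER)")
cursor.executemany("INSERT INTO bridges VALUES (?, ?)",
                   [("A-1", 1956), ("B-2", 1987)])   # sample rows
conn.commit()

# Use ? placeholders rather than building SQL with string formatting
sql = "SELECT structure FROM bridges WHERE year_built < ? ORDER BY structure"
old_bridges = [row[0] for row in cursor.execute(sql, (1970,))]

conn.close()
print(old_bridges)
```

Placeholder syntax differs between drivers (some use %s instead of ?), so check the documentation of whichever module you connect with.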
Resources - FREE
• Dive into Python
• Python Cookbook
• Think Python
• Python docs
• gis.stackexchange.com
• Google is your friend (as always)
• Python community is HUGE and GIVING
Conferences
• pyArkansas – annually in Conway
  – pyar2 list on python.org
• PyCon – THE national US Python conference
• FOSS4G – international open source for GIS
• ESRI Developer Summit – major dork-fest, but great learning opportunity and Palm Springs in March
IDEs and editors
• Wing – different license levels, good people
• PyScripter – open source, code completion
• Komodo – free version also available
• Notepad2 – ol' standby editor
• Notepad++ – people swear by it
• PythonWin – another standby, but barebones
• …dozens (at least) more editors out there…
More reading
• http://www.voidspace.org.uk/python/articles/OOP.shtml - great OOP article (which I used a lot)