pygrametl
7/28/13 pygrametl
www.pygrametl.org
The code begins with a few function definitions at the top. After the functions, the pygrametl Dimension, FactTable, and Source objects are created. Using these objects, the main function needs only 10 lines of code to load the data warehouse (DW). Note how easy it is to fill the page dimension, which is both slowly changing and snowflaked.
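To make "slowly changing" concrete, here is a minimal in-memory sketch of the idea behind filling such a dimension: reuse the current version of a member if nothing changed, otherwise close it and add a new version. This is not pygrametl's implementation. The function `scd_ensure`, the list-of-dicts "table", and the column names `id`, `version`, `validfrom`, and `validto` are all hypothetical names chosen for illustration.

```python
def scd_ensure(versions, row, lookup_att, tracked_atts, today):
    """Return the key of the current version for row, adding a version if needed.

    versions is a plain list of dicts standing in for the dimension table
    (an assumption for this sketch, not how pygrametl stores data).
    """
    current = [v for v in versions
               if v[lookup_att] == row[lookup_att] and v['validto'] is None]
    if current:
        cur = current[0]
        if all(cur[att] == row[att] for att in tracked_atts):
            return cur['id']          # Nothing changed: reuse the current version
        cur['validto'] = today        # Close the outdated version
        version = cur['version'] + 1
    else:
        version = 1                   # First time this member is seen
    new = {'id': len(versions) + 1, lookup_att: row[lookup_att],
           'version': version, 'validfrom': today, 'validto': None}
    for att in tracked_atts:
        new[att] = row[att]
    versions.append(new)
    return new['id']


pages = []
k1 = scd_ensure(pages, {'url': 'http://example.org/', 'size': 100},
                'url', ['size'], '2013-07-01')
k2 = scd_ensure(pages, {'url': 'http://example.org/', 'size': 100},
                'url', ['size'], '2013-07-02')
k3 = scd_ensure(pages, {'url': 'http://example.org/', 'size': 120},
                'url', ['size'], '2013-07-03')
```

In the sketch, the second call returns the same key as the first because the tracked attribute is unchanged, while the third call creates version 2 and closes version 1.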
    lookupatts=['testname'],
    prefill=True,
    defaultidvalue=-1)

datedim = CachedDimension(
    name='date',
    key='dateid',
    attributes=['date', 'day', 'month', 'year', 'week', 'weekyear'],
    lookupatts=['date'],
    rowexpander=datehandling)

facttbl = BulkFactTable(
    name='testresults',
    keyrefs=['pageid', 'testid', 'dateid'],
    measures=['errors'],
    bulkloader=pgcopybulkloader,
    bulksize=5000000)

# Data sources - change the path if you have your files somewhere else
downloadlog = CSVSource(open('./DownloadLog.csv', 'r', 16384), delimiter='\t')
testresults = CSVSource(open('./TestResults.csv', 'r', 16384), delimiter='\t')

# Join the two sources on their 'localfile' values
inputdata = MergeJoiningSource(downloadlog, 'localfile', testresults, 'localfile')

def main():
    for row in inputdata:
        extractdomaininfo(row)
        extractserverinfo(row)
        row['size'] = pygrametl.getint(row['size'])  # Convert to an int
        # Add the data to the dimension tables and the fact table
        row['pageid'] = pagesf.scdensure(row)
        row['dateid'] = datedim.ensure(row, {'date':'downloaddate'})
        row['testid'] = testdim.lookup(row, {'testname':'test'})
        facttbl.insert(row)
    connection.commit()

if __name__ == '__main__':
    main()
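
The input rows above come from joining the two CSV sources on 'localfile'. The following is a rough, standard-library-only sketch of how a merge join over two row sources can work. It is not pygrametl's code: the function name `merge_join` is hypothetical, and the sketch assumes that both inputs are sorted ascending on the join attribute and that each key value occurs at most once per input.

```python
def merge_join(left, left_key, right, right_key):
    """Yield merged dict rows from two iterables of dicts.

    Assumes (for this sketch) that both inputs are sorted ascending on
    their key and that key values are unique within each input.
    """
    left_iter, right_iter = iter(left), iter(right)
    lrow = next(left_iter, None)
    rrow = next(right_iter, None)
    while lrow is not None and rrow is not None:
        lval, rval = lrow[left_key], rrow[right_key]
        if lval == rval:
            merged = dict(lrow)       # Copy so the input row is not modified
            merged.update(rrow)
            yield merged
            lrow = next(left_iter, None)
            rrow = next(right_iter, None)
        elif lval < rval:
            lrow = next(left_iter, None)   # No partner for this left row
        else:
            rrow = next(right_iter, None)  # No partner for this right row


downloads = [{'localfile': 'a.html', 'size': '10'},
             {'localfile': 'b.html', 'size': '20'}]
results = [{'localfile': 'b.html', 'errors': '3'},
           {'localfile': 'c.html', 'errors': '1'}]
joined = list(merge_join(downloads, 'localfile', results, 'localfile'))
```

Because each input is read once and in order, this kind of join never needs to hold a whole file in memory, which is presumably why it suits large log files like the ones used here.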