Noah Callaway Zac Fleischmann Zak Nelson Brandon Zahl Apartment Cloud

Post on 24-Dec-2015


Noah Callaway, Zac Fleischmann, Zak Nelson, Brandon Zahl

Apartment Cloud

Aspirations / Reality

Aspirations:

• Aggregate apartment listings from all across the internet to create a simple, one-stop apartment search.

Reality:

• Aggregate apartment listings from top sites (Washington state only).

• …mostly one-stop apartment search.

• …mostly simple.

Building It

Brandon – Site-specific extractors, statistics

Noah – Server configuration, front-end development

Zac – Site-specific extractors, advanced search

Zak – Crawler / aggregator, commute distance feature

Page Extraction Statistics

Extractor Name        Files Crawled  Listings Found  Extraction Errors  % error-free
Rent.com                       4907             325                  0         100
ApartmentRatings.com           7855             723                 11          98.4
Craigslist.com                 9794            2773                 91          96.6
MyNewPlace.com                 7392             901                 70          91.6

Extraction Accuracy Statistics

Extractor Name         TP   TN   FP   FN  Precision  Recall  F-score
Rent.com              281  135   12    0      0.959   1.000    0.979
ApartmentRatings.com   39    0    1    0      0.975   1.000    0.987
Craigslist             63  147    3    9      0.955   0.875    0.913
MyNewPlace.com        186  186   10   44      0.949   0.809    0.873
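
For readers who want to check the accuracy figures, the sketch below recomputes precision, recall and F-score from the TP, FP and FN counts in the table, using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-score = harmonic mean of the two). The function and variable names are illustrative assumptions, not taken from the project's code.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-score) from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# (TP, FP, FN) per extractor, copied from the accuracy table.
# TN is not needed for these three metrics.
COUNTS = {
    "Rent.com": (281, 12, 0),
    "ApartmentRatings.com": (39, 1, 0),
    "Craigslist": (63, 3, 9),
    "MyNewPlace.com": (186, 10, 44),
}

if __name__ == "__main__":
    for name, (tp, fp, fn) in COUNTS.items():
        p, r, f = precision_recall_f(tp, fp, fn)
        print(f"{name:22s} P={p:.3f} R={r:.3f} F={f:.3f}")
```

Rounded to three decimals, the results agree with the values reported in the table.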

Experiment Conclusion

• Much higher accuracy on the structured pages than on unstructured Craigslist

• Craigslist is a candidate for machine learning

• Machine learning would likely perform worse on the other sites

What we learned

• How to configure Amazon Web Services with a LAMP stack

• How to create a web application with AJAX

• How to use Jobo and Nutch for web crawling

• How to parse HTML for pertinent data

• The considerations of starting a web business

Unexpected Outcomes

• Amazon Web Services was slower than a $7/month virtual server

• Most of the large listing sites were surprisingly easy to extract data from

• Aggregating information from the web is legally tricky

Things We’d Do Differently

• Better version control

• More pre-coding design

• More quality control and testing

• More extensible extractors (maybe built on an existing HTML parser)
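
As one illustration of the last point, the sketch below builds site-specific extractors on top of an existing HTML parser (Python's standard `html.parser`). It is not the project's implementation: the class names, the `field_classes` mapping, and the CSS class names `price` and `addr` are all hypothetical. The idea is that the shared base class does the parsing, and each site only declares which CSS classes hold which listing fields.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored for depth tracking.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ListingExtractor(HTMLParser):
    """Collects the text inside elements whose CSS class maps to a field."""

    # Hypothetical mapping: CSS class name -> listing field name.
    field_classes = {}

    def __init__(self):
        super().__init__()
        self.listing = {}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # open-element depth inside that field

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.field_classes:
                self._field = self.field_classes[name]
                self._depth = 1
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is not None and text:
            previous = self.listing.get(self._field, "")
            self.listing[self._field] = (previous + " " + text).strip()


class ExampleSiteExtractor(ListingExtractor):
    """A site-specific extractor only needs to supply its own mapping."""
    field_classes = {"price": "rent", "addr": "address"}


def extract(extractor_cls, html):
    """Run one extractor class over an HTML string and return its fields."""
    parser = extractor_cls()
    parser.feed(html)
    parser.close()
    return parser.listing
```

Under this design, supporting a new listing site would mean adding another small subclass with its own mapping rather than writing a new parser, which is one way to read "more extensible".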

Demo