Noah Callaway, Zac Fleischmann, Zak Nelson, Brandon Zahl
Apartment Cloud
Aspirations / Reality
Aspiration: Aggregate apartment listings from all across the internet to create a simple, one-stop apartment search.
Reality: Aggregate apartment listings from top sites (Washington state only) to create a mostly simple, mostly one-stop apartment search.
Building It
Brandon – Site-specific extractors, statistics
Noah – Server configuration, front-end development
Zac – Site-specific extractors, advanced search
Zak – Crawler / aggregator, commute distance feature
Page Extraction Statistics

| Extractor Name | Files Crawled | Listings Found | Extraction Errors | % error-free |
|---|---|---|---|---|
| Rent.com | 4907 | 325 | 0 | 100 |
| ApartmentRatings.com | 7855 | 723 | 11 | 98.4 |
| Craigslist.com | 9794 | 2773 | 91 | 96.6 |
| MyNewPlace.com | 7392 | 901 | 70 | 91.6 |
Extraction Accuracy Statistics

| Extractor Name | TP | TN | FP | FN | Precision | Recall | F-score |
|---|---|---|---|---|---|---|---|
| Rent.com | 281 | 135 | 12 | 0 | 0.959 | 1.000 | 0.979 |
| ApartmentRatings.com | 39 | 0 | 1 | 0 | 0.975 | 1.000 | 0.987 |
| Craigslist | 63 | 147 | 3 | 9 | 0.955 | 0.875 | 0.913 |
| MyNewPlace.com | 186 | 186 | 10 | 44 | 0.949 | 0.809 | 0.873 |
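
The precision, recall, and F-score columns can be recomputed from the TP, FP, and FN counts with the standard definitions. The sketch below is only an illustration of that arithmetic, not the project's own code; the function name `precision_recall_f1` and the `COUNTS` dictionary are made up for this example, and the counts are copied from the table.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score here is the harmonic mean of precision and recall (F1).
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (TP, FP, FN) per extractor, copied from the table above.
COUNTS = {
    "Rent.com": (281, 12, 0),
    "ApartmentRatings.com": (39, 1, 0),
    "Craigslist": (63, 3, 9),
    "MyNewPlace.com": (186, 10, 44),
}

for name, (tp, fp, fn) in COUNTS.items():
    p, r, f = precision_recall_f1(tp, fp, fn)
    print(f"{name:22s} precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
```

TN does not enter any of the three metrics, which is why the ApartmentRatings.com row can still be scored with TN = 0.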
Experiment Conclusion
• Much higher accuracy on the structured pages than on unstructured Craigslist
• Craigslist is a candidate for machine learning
• Machine learning would likely do worse on the other sites
What we learned
• How to configure Amazon Web Services with a LAMP stack
• How to create a web application with AJAX
• How to use Jobo and Nutch for web crawling
• How to parse HTML for pertinent data
• The considerations of starting a web business
Unexpected Outcomes
• Amazon Web Services was slower than a $7/month virtual server
• Most of the large listing sites were surprisingly easy to extract data from
• Aggregating information from the web is legally tricky
Things We’d Do Differently
• Better version control
• More pre-coding design
• More quality control and testing
• More extensible extractors (maybe built on an existing HTML parser)
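
As a rough illustration of the last point, one way to make extractors more extensible is to keep a single generic parser and describe each site with a small table of rules. The sketch below uses Python's standard `html.parser` module; it is not the project's code, and the class name `ClassTextExtractor`, the `SITE_RULES` mapping, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose CSS class maps to a listing field."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field  # e.g. {"price": "rent"}
        self.fields = {}
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.class_to_field:
                self._current = self.class_to_field[cls]
                self.fields[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture, so text inside
        # nested child elements is only partly collected.
        self._current = None


def extract_listing(html, class_to_field):
    """Run the generic parser with one site's rules and tidy the results."""
    parser = ClassTextExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    return {field: text.strip() for field, text in parser.fields.items()}


# Hypothetical per-site rules and markup; real listing sites differ.
SITE_RULES = {"price": "rent", "beds": "bedrooms"}
SAMPLE = '<div><span class="price">$950</span><span class="beds">2 bd</span></div>'

listing = extract_listing(SAMPLE, SITE_RULES)
print(listing)
```

Under this design, supporting another site would mean adding another rules dictionary rather than writing a new extractor, at the cost of handling only simple class-based markup.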
Demo