![Page 1: WebRecorder.io Building a new archiving service for everyone!](https://reader035.vdocuments.us/reader035/viewer/2022081514/56649d345503460f94a0b6d9/html5/thumbnails/1.jpg)
WebRecorder.io
Building a new archiving service for everyone!

What is WebRecorder.io?
On-Demand Archiving through the browser
What you see is what you archive (WYSIWYA)
Available to anyone!
“Quality over Quantity” - High-Fidelity Replay of Web Content

Current Service
Proof-of-Concept
Users can record a page and browse
Users can download the WARC after browsing
Users can upload any WARC and replay it
No content stored, WARCs deleted after 30 mins.
Created last year as an experiment
You can use it at: https://webrecorder.io

New WebRecorder.io Service
First new version up at beta.webrecorder.io for demo
Initially invite only to monitor capacity
User registration, login, individual collections
Collections available at beta.webrecorder.io/<user>/<coll>
Collections can be private, public, or shared privately (coming soon).

Live Demo!

Privacy Concerns
User responsible for their own archive, has full control
Collections private by default, but users may choose what to make public
For now, WARCs downloadable only by owner, though this may change.
May have additional access levels: share read-only, share for recording, etc.
Cookies: Cookies are recorded, but not replayed
Looking for ideas/better ways to address privacy. Suggestions welcome!
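
One way to read "cookies are recorded, but not replayed" is that the stored response keeps its `Set-Cookie` headers while the replay path filters them out before sending the response to the browser. The sketch below illustrates that reading only; `strip_cookies_for_replay` and the header list are hypothetical and not taken from the service's code.

```python
# Hypothetical helper: the slides do not show how the service handles headers.
def strip_cookies_for_replay(headers):
    """Return the recorded headers without any cookie-setting entries."""
    blocked = {"set-cookie", "set-cookie2"}
    return [(name, value) for name, value in headers
            if name.lower() not in blocked]


# Headers as they might be stored in a recorded response (made-up values).
recorded = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "session=abc123; Path=/"),
]

# The recorded list is left untouched; only the replayed copy is filtered.
replayed = strip_cookies_for_replay(recorded)
print(replayed)
```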

Goals/Features
Provide a flexible archiving service for high-fidelity web archiving.
Customizable UI, metadata and annotation support.
On-Demand Full-Text Search.
Multiple privacy options, custom sharing settings.
Multiple backends for storage.
A version that can also be hosted on custom hardware, not in “the cloud”

Tools Used in WebRecorder.io
Built with open-source tools
pywb (python wayback) – https://github.com/ikreymer/pywb
Embedded in the web app, front end web service, handles url rewriting w/ custom rules, WARC reading, live web fetching.
warcprox – https://github.com/internetarchive/warcprox
Created by Noah Levitt of IA, HTTP/S proxy which records HTTP traffic to WARCs

Help Wanted!
Looking for collaborators, developers, UI designers, archivists
If you ever wanted to participate in building an archiving service, here is your chance.
Sign up for the mailing list on webrecorder.io or request an invite at beta.webrecorder.io
You can also email [email protected]

Addendum: How It Works
Symmetrical Archiving – server and client side url rewriting for record and replay follow same path
Easy part: HTML url rewriting
Hard part: JavaScript
Attempt to emulate original JS env as much as possible, customizable client-side hooks
Far from foolproof, Flash, Java applets still problematic.

“Symmetrical Archiving”
User browses page through /record/ path → Page is recorded to WARC and indexed
User browses page through /replay/ path → Page is replayed from WARC using index
Attempt symmetry in capture and replay as much as possible.
Assumption: Dynamic content generated for /record/ = Dynamic content generated for /replay/
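
The record/replay symmetry can be sketched in a few lines. This is a toy model under stated assumptions, not the service's implementation: a plain dict stands in for the WARC file plus index, and `SymmetricArchive` and `fake_fetch` are hypothetical names invented for the illustration.

```python
from typing import Callable, Dict


class SymmetricArchive:
    """Toy model: /record/ and /replay/ share one code path, differing only in the data source."""

    def __init__(self, fetch_live: Callable[[str], bytes]):
        self.fetch_live = fetch_live        # stand-in for a live web fetch
        self.store: Dict[str, bytes] = {}   # stand-in for WARC file + index

    def handle(self, path: str) -> bytes:
        # Split "/record/http://example.com/" into mode and target url.
        mode, _, url = path.lstrip("/").partition("/")
        if mode == "record":
            body = self.fetch_live(url)
            self.store[url] = body          # "recorded to WARC and indexed"
            return body
        if mode == "replay":
            return self.store[url]          # KeyError if the url was never recorded
        raise ValueError("unknown mode: " + mode)


def fake_fetch(url: str) -> bytes:
    # Hypothetical live fetch so the sketch runs without network access.
    return ("<html>" + url + "</html>").encode()


archive = SymmetricArchive(fake_fetch)
live = archive.handle("/record/http://example.com/")
replayed = archive.handle("/replay/http://example.com/")
print(live == replayed)
```

Because the stored bytes are served back through the same handler, content is replayable as soon as it has been recorded.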

“Symmetrical Archiving”
/<coll>/record/ path ↔ url rewriting system ↔ fetch HTTP data ↔ recording proxy writes WARCs ↔ live web
/<coll>/ path ↔ url rewriting system ↔ fetch HTTP data ↔ read from WARC
Attempt symmetry in capture and replay as much as possible. Recorded content is instantly replayable.
Url rewriting is the hard part! Actually more like “emulating original page context” when running through a proxy/recording.
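
For the HTML side (the "easy part"), url rewriting amounts to resolving each link against the original page url and prepending the archive path. The sketch below shows that idea with a regular expression; it is not pywb's rewriter, and `rewrite_html` and the `/mycoll/` prefix are assumptions made for the example.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with either quote style.
ATTR_RE = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)

# Urls that should stay as they are.
SKIP_PREFIXES = ("#", "data:", "javascript:")


def rewrite_html(html, page_url, prefix):
    """Point every href/src at the archive by prefixing the absolute url."""
    def repl(match):
        attr, quote, url = match.groups()
        if url.startswith(SKIP_PREFIXES):
            return match.group(0)
        absolute = urljoin(page_url, url)   # resolve relative links first
        return f"{attr}{quote}{prefix}{absolute}{quote}"
    return ATTR_RE.sub(repl, html)


html = '<a href="/about">About</a> <img src="logo.png">'
rewritten = rewrite_html(html, "http://example.com/dir/page.html", "/mycoll/")
print(rewritten)
```

A regex is enough for a sketch; real pages also carry urls in CSS, inline scripts and other attributes, which is where the hard part starts.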

“When symmetry breaks”
Urls change based on timestamp, or date, e.g. ?_=<timestamp>
Possible Solution: Override Date(), server-side “fuzzy matching” ignoring certain query params
Flash video in a custom flash SWF
Possible Solution: may be able to force html5, otherwise youtube-dl may download flash version, and replace with custom player (FlowPlayer)
General black-box Flash content with hard-coded links.
Possible Solution: No good one so far! Maybe shumway.js, a javascript flash player from Mozilla?
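
Server-side "fuzzy matching" can be pictured as reducing a url to a lookup key that drops the volatile query params, so a request made at replay time still finds the recorded response. A minimal sketch, assuming the ignored names are known in advance; `fuzzy_key` and `IGNORED_PARAMS` are hypothetical and not part of pywb.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this sketch: params that only carry a timestamp / cache-buster.
IGNORED_PARAMS = {"_", "timestamp"}


def fuzzy_key(url, ignored=IGNORED_PARAMS):
    """Build an index key that ignores volatile params and param order."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


recorded = "http://example.com/api?id=7&_=1400000000000"
requested = "http://example.com/api?_=1400000099999&id=7"

# Both urls collapse to the same key, so the replay lookup succeeds.
print(fuzzy_key(recorded) == fuzzy_key(requested))
```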

“When symmetry breaks”
JavaScript generated content, “leaks” to live web
Possible Solution: Extensive client side url-rewriting
Checks for window.location or window.top
Possible Solution: Rewrite window.location → WB_wombat_location, window.top → WB_wombat_top
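
The server-side half of this rewrite can be approximated as a text substitution over the script source. The sketch below is an illustration, not pywb's actual rule set: `rewrite_js` is a hypothetical name, and keeping the `window.` qualifier in front of the new name is a choice made for this example.

```python
import re

# Only touch the exact property names, not longer identifiers such as window.topology.
WINDOW_PROP_RE = re.compile(r"\bwindow\.(location|top)\b")


def rewrite_js(source):
    """Redirect window.location / window.top checks to the emulated objects."""
    return WINDOW_PROP_RE.sub(lambda m: "window.WB_wombat_" + m.group(1), source)


script = "if (window.top != window) { window.location.href = '/x'; }"
rewritten_script = rewrite_js(script)
print(rewritten_script)
```

On the client, the `WB_wombat_*` objects would then report the original page's values instead of those of the archive frame.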
![Page 48: WebRecorder.io Building a new archiving service for everyone!](https://reader035.vdocuments.us/reader035/viewer/2022081514/56649d345503460f94a0b6d9/html5/thumbnails/48.jpg)
wombat.js rewriting libraryThe following are some of the possible overrides by wombat.js: AJAX (XmlHTTPRequest.open) window.open History.pushState / replaceState Object.defineProperty() overrides on: document.domain, document.cookie WB_wombat_location emulates to window.location with rewriting (with server-side
rewriting) WB_wombat_top emulate window.top but hides container frame (with server-side
rewriting) Window postMessage() Date() constructor Seed Math.random with capture time document.write() setAttribute() / or mutation observers appendChild() / replaceChild() / insertChild()
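
wombat.js itself is JavaScript; the reason for overriding Date() and seeding Math.random with the capture time can still be shown in Python. If a page builds urls from the clock and a random number, pinning both to the capture time makes replay generate the same urls as the recording. `CaptureClock` and `cache_buster_url` are hypothetical names for this sketch.

```python
import random
from datetime import datetime, timezone


class CaptureClock:
    """Clock and random source that are both pinned to the capture time."""

    def __init__(self, capture_time):
        self.capture_time = capture_time
        # Seeding with the capture time makes the random sequence repeatable.
        self._rng = random.Random(int(capture_time.timestamp()))

    def now(self):
        return self.capture_time

    def random(self):
        return self._rng.random()


def cache_buster_url(clock):
    # Imitates a script that appends a timestamp and a random number to a url.
    millis = int(clock.now().timestamp() * 1000)
    nonce = int(clock.random() * 1_000_000)
    return "http://example.com/data?_=%d&r=%d" % (millis, nonce)


capture = datetime(2014, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
url_at_record = cache_buster_url(CaptureClock(capture))
url_at_replay = cache_buster_url(CaptureClock(capture))
print(url_at_record == url_at_replay)
```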

pywb
wombat.js is part of pywb, a new open source python “wayback machine” implementation
Optional custom rules can be specified for any site by prefix or regex, specified in yaml file.
Fuzzy matching rules: Specify significant query params
No config file required!
Out-of-the-box simple collection management tools for running an archive
More details at: https://github.com/ikreymer/pywb
Future updates will include improvements to rule customization.
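
Per-site rules selected by prefix or regex, each naming its significant query params, could look roughly like the sketch below. The rule structure is invented for illustration and is not pywb's yaml schema; `RULES`, `find_rule` and `significant_query` are hypothetical names.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rules: each matches by prefix or by regex.
RULES = [
    {"prefix": "http://example.com/api/", "significant_params": ["id"]},
    {"regex": r"^https?://[^/]*\.example\.org/video", "significant_params": ["v"]},
]


def find_rule(url, rules=RULES):
    """Return the first rule whose prefix or regex matches the url, else None."""
    for rule in rules:
        if "prefix" in rule and url.startswith(rule["prefix"]):
            return rule
        if "regex" in rule and re.search(rule["regex"], url):
            return rule
    return None


def significant_query(url, rules=RULES):
    """Keep only the params the matching rule marks as significant."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    rule = find_rule(url, rules)
    if rule is None:
        return sorted(params)               # no rule: every param counts
    keep = set(rule["significant_params"])
    return sorted((name, value) for name, value in params if name in keep)


print(significant_query("http://example.com/api/item?id=3&_=99"))
```

Urls without a matching rule keep all their params, which mirrors the "no config file required" default.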