A Billion Words to Remember

A Billion Words to Remember The Lifetime Reader George Nagy Rensselaer Polytechnic Institute

Post on 07-Jan-2022

Page 1: A Billion Word to Remember

A Billion Words to Remember

The Lifetime Reader

George Nagy

Rensselaer Polytechnic Institute

Page 2: A Billion Word to Remember

What would it take to

record, remember, and retrieve

all text read or seen or heard

during one’s lifetime?

2/1/2017 A billion words to remember 2

Page 3: A Billion Word to Remember

read

Page 4: A Billion Word to Remember

or heard

Page 5: A Billion Word to Remember

Not a new idea

In 1945, Vannevar Bush proposed the Memex:

The camera hound of the future wears on his forehead a lump a little larger than a walnut. It takes pictures 3 millimeters square … only a factor of 10 beyond present practice. … Wholly new forms of encyclopedias … with a mesh of associative trails … The entire material of the Britannica in reduced microfilm form would go on a sheet eight and one-half by eleven inches. …

Page 6: A Billion Word to Remember

What will it take today?

http://pngimg.com/upload/laptop_PNG5940.png

1. A Sensor Module

2. A Host Computer

Page 7: A Billion Word to Remember

Sensor Module

Camera:
• 1 frame per second (FPS)
• 20 megapixels, RGB
• 60° field of view (FOV)
• Autofocus, 25 cm to ∞
• < 10 g

MIC

GPS (or link)

Onboard processor:
• Text detection
• Text-image compression
• Log (time and space stamp)
• Encryption?
• 20 GB memory (images)

Bluetooth or Wi-Fi

Page 8: A Billion Word to Remember

Camera-based OCR in 1960 (20 x 20 pixel camera)

Page 9: A Billion Word to Remember

Text Detection and Recognition in Imagery: A Survey

Qixiang Ye, Member, IEEE and David Doermann, Fellow, IEEE

7.3 Remaining Problems

• Processing multilingual text.

• Processing incidental text.

• Real-time detection and recognition.

• End-to-end recognition.

• Open vocabulary recognition.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 7, JULY 2015

Over 200 citations!

Page 10: A Billion Word to Remember

Host module: laptop, tablet, or smartphone

Store:
• Duplicate detection
• Reading order
• OCR
• Index
• Text compression
• ~10 GB (text)

Retrieve:
• Browser & digilib search tools
• Inverted index
• Temporal and spatial proximity
• User model
• Pattern matching
• Vector-space model
• Perfect hashing
• Signature files
• Latent semantic indexing
• Graph algorithms
• Relevance feedback
• …
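Of the retrieval tools listed for the host module, the inverted index is the easiest to illustrate. The following is only a minimal sketch, not the Lifetime Reader's design: the function names (`tokenize`, `build_index`, `search`), the snippet IDs, and the sample texts are all hypothetical, and a real index would also have to tolerate OCR errors.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(snippets):
    """Map each token to the set of snippet IDs that contain it.

    `snippets` is a dict {snippet_id: text}; the IDs are hypothetical.
    """
    index = defaultdict(set)
    for snippet_id, text in snippets.items():
        for token in tokenize(text):
            index[token].add(snippet_id)
    return index


def search(index, query):
    """Return the IDs of snippets that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up snippets standing in for recognized text fragments.
snippets = {
    1: "Gate B12 boarding at 9:40",
    2: "Boarding pass required",
    3: "Soup of the day",
}
index = build_index(snippets)
print(search(index, "boarding"))       # IDs of snippets mentioning "boarding"
print(search(index, "boarding gate"))  # IDs of snippets containing both words
```

Queries are treated as a conjunction of tokens; ranking, proximity, and relevance feedback from the same list would sit on top of a structure like this.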

Page 11: A Billion Word to Remember

Page 12: A Billion Word to Remember

Volume Calculations

text-image volume:
4K x 5K pixels x 3 B/pixel x 1 fps x 3600 s/h x 8 h / 100x compression = 17 GB/day

audio volume:
4 KB/s x 3600 s/h x 8 h = 115 MB/day
(estimates vary from 300 B/s for a vocoder to 1.4 Mb/s, about 176 KB/s, for high-fidelity stereo CD audio books)

image text volume:
2 B/char x 5 chars/word x 300 words/min x 60 min/h x 8 h / 5x = 300 KB/day
300 KB/day x 365 days x 100 years = 10 GB/lifetime

audio text volume: same
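The arithmetic above can be checked with a short script. This is a sketch that uses only the figures quoted on the slide and assumes decimal units (1 KB = 1000 B, "4K x 5K" = 4000 x 5000 pixels); the function names are made up for illustration, and the slide's "/ 5x" factor is taken as given.

```python
HOURS_PER_DAY = 8                       # active hours per day, from the slide
SECONDS_PER_DAY = 3600 * HOURS_PER_DAY  # 28,800 s


def text_image_bytes_per_day():
    """4K x 5K pixels x 3 B/pixel at 1 fps, after 100x compression."""
    frame_bytes = 4_000 * 5_000 * 3
    return frame_bytes * 1 * SECONDS_PER_DAY // 100


def audio_bytes_per_day():
    """4 KB/s of audio for the same 8 hours."""
    return 4_000 * SECONDS_PER_DAY


def text_bytes_per_day():
    """2 B/char x 5 chars/word x 300 words/min, reduced by the slide's 5x factor."""
    return 2 * 5 * 300 * 60 * HOURS_PER_DAY // 5


def text_bytes_per_lifetime(years=100):
    """Daily text volume accumulated over 365 days x `years`."""
    return text_bytes_per_day() * 365 * years


def words_per_lifetime(years=100):
    """Words read at 300 words/min, 8 h/day, over `years`."""
    return 300 * 60 * HOURS_PER_DAY * 365 * years


print(f"text images: {text_image_bytes_per_day() / 1e9:.2f} GB/day")
print(f"audio:       {audio_bytes_per_day() / 1e6:.1f} MB/day")
print(f"text:        {text_bytes_per_day() / 1e3:.0f} KB/day")
print(f"text:        {text_bytes_per_lifetime() / 1e9:.1f} GB/lifetime")
print(f"words:       {words_per_lifetime() / 1e9:.2f} billion/lifetime")
```

The unrounded results are 17.28 GB/day, 115.2 MB/day, 288 KB/day, and about 10.5 GB per lifetime, in line with the rounded slide values. At the assumed reading rate the word count comes to roughly 5 billion over 100 years.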

Page 13: A Billion Word to Remember

Three advantages of searching a personal collection compared to web search

1. Total lifetime volume only 10 GB compared to millions of times as much on WWW

2. Desired items already familiar and therefore easier to identify from top returns

3. Fractured prose & OCR errors not bothersome because we won’t re-broadcast found items

Page 14: A Billion Word to Remember

Some underlying research problems

Image acquisition

Text-image analysis

Information retrieval

Ethical and legal issues

Page 15: A Billion Word to Remember

Image acquisition problems

• Text detection in spatial context: at home, at work, in local venues, in transit, abroad

• Mosaicking required by head and body motion

• Lazy compression of text images

• Optional hands-free annotation (via mic)

• Optional gestural annotation, e.g. by tracing a phrase on a printed page or computer screen with a designated finger

• Long-lasting or self-charging power supply

Page 16: A Billion Word to Remember

Text-image analysis problems

• Perspective-invariant recognition instead of rectification

• Reading-order (no gaze tracking)

• Duplicate detection from consecutive frames and after interruptions

• Retention policy for undecipherable and unindexable fragments of text, and for near-duplicates

• Adaptation to predictable reading material like the newspaper, mags, the rest of the Jack Aubrey series, IJDAR, Python v2.7.6 documentation
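One conventional way to approach the duplicate-detection item in the list above is to compare the recognized text of two frames by their word shingles. The sketch below uses Jaccard similarity over 3-word shingles; this is a stand-in technique chosen for illustration, not a method the slides propose, and the function names and the 0.8 threshold are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (tuples of consecutive words)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, k=3, threshold=0.8):
    """Flag two recognized texts as near-duplicates.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


frame_1 = "the quick brown fox jumps over the lazy dog"
frame_2 = "the quick brown fox jumps over the lazy dig"  # one OCR error
print(jaccard(shingles(frame_1), shingles(frame_2)))
print(is_near_duplicate(frame_1, frame_2, threshold=0.7))
```

A single misread word at the end of a nine-word line already lowers the similarity to 0.75, which shows why the retention policy for near-duplicates and the choice of threshold are open questions rather than details.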

Page 17: A Billion Word to Remember

Information retrieval problems

• Retrieval strategies that mesh with our own mental recall

• Personalization: scripts and languages— reading speed—reading postures—computer display settings—work, leisure, shopping and napping habits

• Selective, topic-, time-, or location-specific summarization

• Logging queries, responses, and user reactions for improving retrieval
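Time- and location-specific retrieval presupposes that every stored snippet carries the time and space stamp logged by the sensor module. A minimal sketch of such a filter follows; the `Snippet` record, the `nearby` function, the default limits, and the sample coordinates are all hypothetical, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snippet:
    """A recognized text fragment with its time and space stamp (hypothetical record)."""
    text: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearby(snippets, when, lat, lon, max_hours=24.0, max_km=1.0):
    """Keep snippets logged within max_hours of `when` and max_km of (lat, lon)."""
    window = timedelta(hours=max_hours)
    return [
        s for s in snippets
        if abs(s.when - when) <= window
        and haversine_km(s.lat, s.lon, lat, lon) <= max_km
    ]


# Made-up log entries; the coordinates are illustrative only.
t0 = datetime(2017, 2, 1, 9, 0)
log = [
    Snippet("seminar room schedule", t0, 42.73, -73.68),
    Snippet("same place, two days later", t0 + timedelta(days=2), 42.73, -73.68),
    Snippet("same morning, another city", t0 + timedelta(hours=1), 40.71, -74.01),
]
for s in nearby(log, t0, 42.73, -73.68):
    print(s.text)
```

A filter like this narrows the candidates before any text matching; how wide the time and distance windows should be for a given user is part of the personalization problem listed above.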

Page 18: A Billion Word to Remember

Ethical and legal issues

• Security and privacy: what do these mean over a lifetime?

• What is the legal difference between deliberately acquired information, as with a smartphone or camera, and autonomously acquired information?

• Where must the owner of a Lifetime Reader not look (and record)?

• What responsibility does delayed discovery of a crime entail (for instance, reading an airplane seat neighbor’s laptop screen that one glanced at two years ago)?

• What are the social and marketing implications of lifetime text logging?

Page 19: A Billion Word to Remember

Product announcement expected on or about April 1, 2021.

Thank you for your interest and support!

Page 20: A Billion Word to Remember

Page 21: A Billion Word to Remember

No, I won’t wear it!
