The Future of Information
Chris Pal
Assistant Professor, Computer Science University of Rochester
What Comes to Your Mind?
For the words:
• Picture
• Book
• Library
• Newspaper
• Radio
• Television
• Telephone
• Computer
Now, let’s consider some recent developments…
Electronic Storage of All Human Knowledge is Within Reach
The Internet Archive - Brewster Kahle
• The Wayback Machine
• Archive of the Internet from 1996-Present
• Size, 2 petabytes of data
• Currently growing at 20 terabytes per month.
• ‘Eclipses the amount of text contained in the world's largest libraries, including the Library of Congress’
How is this Possible?
Storage Technology
• 1 Terabyte of hard disk, approx. $500
• A Petabyte - on the order of $1 million (8 racks)
Digitization Efforts
• Books: 10 cents/page, about $30/book (150,000 books)
• Audio: $10/hour for archival (100,000 items)
• Video: $15/hour to digitize (50,000 videos)
[Image: 120 TB Rack]
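As a rough check on the figures above, the arithmetic can be sketched in Python. The prices and the 120 TB rack capacity are the slide's own; the decimal convention of 1,000 TB per petabyte and all variable names are assumptions made for this illustration.

```python
# Figures taken from the slide
COST_PER_TB_USD = 500        # 1 terabyte of hard disk, approx. $500
RACK_CAPACITY_TB = 120       # capacity of one rack
COST_PER_PAGE_CENTS = 10     # book digitization: 10 cents/page
COST_PER_BOOK_CENTS = 3000   # book digitization: about $30/book

# Assumption: decimal units, 1 petabyte = 1,000 terabytes
TB_PER_PB = 1000

# Cost of the raw disks alone for one petabyte
raw_disk_cost_per_pb = COST_PER_TB_USD * TB_PER_PB

# Number of 120 TB racks needed to hold one petabyte
racks_per_pb = TB_PER_PB / RACK_CAPACITY_TB

# Pages per book implied by the two digitization prices
pages_per_book = COST_PER_BOOK_CENTS // COST_PER_PAGE_CENTS

print(raw_disk_cost_per_pb, round(racks_per_pb, 1), pages_per_book)
```

Under these assumptions the raw disk alone comes to $500,000 per petabyte, within a factor of two of the slide's "on the order of $1 million", and a petabyte fills a little over eight racks. The two book prices together imply a book of about 300 pages.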
Also, Consider that Memory for Small Devices
• Has reached a critical point for many applications – importantly it is read/write
• Smaller and less expensive each year
$34 Retail $18 Retail
Concrete Examples
• kilo 10^3
• mega 10^6
• giga 10^9
• tera 10^12
• peta 10^15
• exa 10^18
• zetta 10^21
• yotta 10^24
• 1 song as an MP3, 5 MB
• 400 songs on 2 GB Card
• 200,000 songs on a PC
• 200 Million songs in a room
‘We’ are here now
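The song counts above follow directly from the 5 MB song size and the decimal prefixes. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and using helper names invented for this illustration:

```python
# Decimal prefixes, as in the list above
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

SONG_BYTES = 5 * MB  # 1 song as an MP3, 5 MB (from the slide)


def songs_that_fit(capacity_bytes):
    """Number of whole 5 MB songs that fit in the given capacity."""
    return capacity_bytes // SONG_BYTES


def bytes_needed(song_count):
    """Storage needed for the given number of 5 MB songs."""
    return song_count * SONG_BYTES


card_songs = songs_that_fit(2 * GB)    # songs on a 2 GB card
pc_bytes = bytes_needed(200_000)       # storage for 200,000 songs
room_bytes = bytes_needed(200_000_000) # storage for 200 million songs
```

With these assumptions a 2 GB card holds 400 songs, 200,000 songs need 1 terabyte (a PC), and 200 million songs need 1 petabyte (a room of racks).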
Physical Transportation of Information can be Effective
• Data over radio is also being used in Mali
• Locally processed and re-distributed via broadcast radio
• Local access via local wireless or flash cards also possible
• Similar to a North American video store
From: S. Keshav, U. Waterloo
What Should We Store First?
• There are still many choices
1999 estimate of the world’s production of storable information: 1.5 exabytes
• Smallholders - Agricultural Information
Examples: Core Historical Literature of Agriculture (CHLA)
Digitization and Information Extraction
Information Extraction
• Allows us to create structured databases from unstructured text (e.g. monster.com)
ID    Crop/Animal   Location        Issue        Remedial Measures
130   Sweet Corn    Long Island     Disease      Resistant strains
129   Wheat         Monroe County   Insect       Pesticide A
128   Dairy Cow     Ithaca          Milk Yield   etc.
• From the database we can:
(1) enable better indexing and search
(2) generate user tailored summaries & digests
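To make the idea of turning unstructured text into database records concrete, here is a minimal dictionary-lookup sketch in Python. It is not the method behind the slide: real information extraction systems typically use statistical models, and the vocabularies, the `Record` type, the function names, and the input sentence below are all hypothetical, seeded with values from the example table.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies, seeded with values from the example table
CROPS = ["sweet corn", "wheat", "dairy cow"]
LOCATIONS = ["long island", "monroe county", "ithaca"]
ISSUES = ["disease", "insect", "milk yield"]


@dataclass
class Record:
    crop: Optional[str]
    location: Optional[str]
    issue: Optional[str]


def first_match(text: str, vocabulary: List[str]) -> Optional[str]:
    """Return the first vocabulary term that occurs in text as a whole phrase."""
    lowered = text.lower()
    for term in vocabulary:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            return term
    return None


def extract_record(text: str) -> Record:
    """Fill one structured record from a piece of free text."""
    return Record(
        crop=first_match(text, CROPS),
        location=first_match(text, LOCATIONS),
        issue=first_match(text, ISSUES),
    )


# Made-up input sentence, for illustration only
note = "Growers on Long Island report a disease affecting sweet corn this season."
record = extract_record(note)
print(record)
```

Fields that cannot be found are left as `None`, so a later step (or a human editor) can see which parts of a record still need to be filled in.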
Ways to Impact Small Landholders
• Create archival information sources
- Digitize existing general knowledge and past experimental information & extract DB records
- Obtain and include local information sources
- Create image to text for local languages
• Mediate the flow of current information
- New technologies: seeds, fertilizers, etc.
- Alerts about diseases
- Market information, access to inputs, capital
• Filter Information, Process and Distribute
- How do we go from raw information to the small landholder?
- Challenges: literacy, infrastructure
Paper, Subscriptions & Customization
• Traditional paper formats are still powerful (e.g. Classical Newsletters, BMPs, Spore, etc…)
• We can learn from magazine subscription models - market based implicit sustainability
• Information processing allows us to create user-customized digests in both electronic (e.g. text, audio) and paper formats: a custom ‘newspaper’
• User customized search and feedback are active areas of research
• What if information could search for you?
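One simple way to picture a user-customized digest is as a filter over structured records. The sketch below assumes records shaped like the rows of the extraction example earlier in the talk; the field names, the `build_digest` function, and the idea of a user profile as a list of followed crops and locations are assumptions made for illustration.

```python
# Records shaped like the rows of the extraction example table
RECORDS = [
    {"id": 130, "crop": "Sweet Corn", "location": "Long Island", "issue": "Disease"},
    {"id": 129, "crop": "Wheat", "location": "Monroe County", "issue": "Insect"},
    {"id": 128, "crop": "Dairy Cow", "location": "Ithaca", "issue": "Milk Yield"},
]


def build_digest(records, interests):
    """Return one text line per record whose crop or location the user follows."""
    wanted = {term.lower() for term in interests}
    lines = []
    for rec in records:
        if rec["crop"].lower() in wanted or rec["location"].lower() in wanted:
            lines.append(f"{rec['crop']} ({rec['location']}): {rec['issue']}")
    return "\n".join(lines)


# A hypothetical user who follows wheat
print(build_digest(RECORDS, ["wheat"]))
```

The same text lines could then be printed on paper, sent as a text message, or passed to text to speech, which is what makes the digest format-independent.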
Broadcast Radio
• Radio is comparatively low cost for information delivery to non-literate people
• Already used effectively for education in Africa, e.g. Education and Development Center in Africa reaches 80,000 children
• Can effectively reach women
• Low power receivers: solar power and hand cranked generators available for many years
Radio Today and Tomorrow
1. Radio receivers can now be fully integrated into small, low power recording devices and cell phones
- Allows for time shifting of broadcasts
2. New technologies for data broadcast using radio allow large areas to be covered with low infrastructure costs
- Market information, weather, local information, audio metadata for indexing
Indexing and Organizing Multimedia
Feedback Mechanisms
• Online retail sites have already deployed techniques for rating products and media
• Spectrum of feedback:
- Simple numerical rating
- Detailed product reviews
- Complete online discussions and debates
• ‘Easy’ to implement extensions of these ideas
• What about interactivity with low bandwidth communication, such as text messaging?
Question Answering Forum
From Krithi R., IIT Bombay. Thousands of posts, serving all of India.
Both a web interface and cell phone based SMS interaction.
Leveraging Q&A Databases
• Consider Google 411 or Microsoft’s variant (demo)
• Here, we can create methods to identify if an answer already exists in the database
• Given the archive, we can hire people to translate questions and answers into local languages
• We can then use this corpus as an excellent test bed to develop automated translation techniques
• Speech recognition and synthesis techniques could be developed / tailored for these scenarios
• Resulting technology could then be applied to augment other information sources, e.g.
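Checking whether an answer already exists in the database can be sketched as a similarity search over stored questions. The example below uses word-overlap (Jaccard) similarity, which is only one simple choice and not necessarily what any deployed system uses; the function names, the 0.5 threshold, and the placeholder question and answer pairs are all assumptions.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_existing_answer(question, qa_pairs, threshold=0.5):
    """Return the stored answer whose question is most similar, or None."""
    best_score, best_answer = 0.0, None
    for stored_question, stored_answer in qa_pairs:
        score = jaccard(question, stored_question)
        if score > best_score:
            best_score, best_answer = score, stored_answer
    return best_answer if best_score >= threshold else None


# Placeholder archive, for illustration only
QA_PAIRS = [
    ("How do I control insects on wheat?", "stored answer A"),
    ("When should sweet corn be planted?", "stored answer B"),
]

print(find_existing_answer("how to control insects on wheat", QA_PAIRS))
```

A question that matches nothing above the threshold returns `None`, which is the case that would be routed to a human expert and then added to the archive.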
Community Generated Content
• Associate information with maps, by hand or with extraction
• We could easily fit the text of an agricultural Wikipedia / Agpedia / WikiGIS on a flash memory card
• Distribute information formatted for cell phones
• Use text to speech to give access to non-literate users
WikiMedia vs. Wikipedia
But, What is a Television?
7” Digital ‘Picture Frame’: $60
Cell Phone: $40
[All you need to do to create a computer for the developing world is to connect a phone to a television] – C. Pal, C. Mundie and B. Gates :)
What is Important
• Devices with no moving parts
• Low power consumption
• For LCDs – better text fidelity
We can think about
• Television programs as files (100s of MB), and radio programs as files
• New, inexpensive, low power chips that process these files
Implications and Ideas
• Parts of the developing world may skip the era of CRTs and broadcast TVs.
• Interactive radio may be a concrete first step.
• 5 year horizon: multimedia, Wikimedia-like agricultural portals tailored to new device formats.
Highlights: How Technology Can Enhance Information Value ‘Chains’
• Read/write data storage now very inexpensive
• Digitization and modern information extraction can help organize information on a massive scale
• Language technologies: translation, speech recognition, speech synthesis – almost mature
• Support the construction of high quality print, radio, and multimedia productions by giving communicators greater access to information
• Low cost ‘personal’ and shared devices can be used to interact with structured multimedia
• Immediate, medium and long term solutions.
the Ecosystem