Personal Information Systems and Personal Semantics
Gregory Grefenstette
CLEF 2015
September 8, 2015
• Information is moving from the Web to Apps
• Each person generates a lot of data
• Two communities use it now
• Search in one’s own data is the future
• Four ways to search
• We need personal facets
2015 CLEF 2015 Grefenstette - 3
http://www.statista.com/statistics/263795/number-of-available-apps-in-the-apple-app-store/
Apple announced that 100 billion apps had been downloaded from its App Store (June 2015)
Another trend
Smart Glasses
http://en.wikipedia.org/wiki/File:A_Google_Glass_wearer.jpg
http://en.wikipedia.org/wiki/File:Aimoneyetap.jpg
http://en.wikipedia.org/wiki/File:Golden-i_3.8_Headset_Computer.png
Sony US Patent Application 20130069850
Microsoft US Patent Application 20120293548
https://www.youtube.com/watch?v=b7I7JuQXttw
Okay … Apps, Quantified Self, Smart Glasses
Step back to NOW
Personal Big Data
Email sent
Email received
Social network posts
IP address location
SMS, chats
Search history
Web pages visited
Media viewed
Credit card purchases
Call data
GPS locations
Vital signs
Activity/inactivity
Lifestyle
Conversations
Reading
People seen
Noises heard
Who uses this data today?
Surely, each person should have the same access to their own data
Impediments to using our own data
• Data Silos
• Ownership
• Privacy
• Big Data Problems
  • Variety
  • Volume
  • Merging -- Semantics
Supposing we could get all our data back into our own hands, how could we search it?
Short course on 4 types of search
Search Engines – Cranfield/SMART Model
ftp://ftp.cs.cornell.edu/pub/smart/cran
queries
.I 6
.W ventricular septal defect occurring in association with aortic regurgitation
.I 7
.W radioisotopes in heart scanning. mainly used in diagnosis of pericardial effusions. also used to study tumors, heart enlargement, aneurysms and pericardial thickening. technetium, rihsa, radioactive hippurate, cholegraffin are used.
.I 8
.W the effects of drugs on the bone marrow of man and animals, …

qrels
5 332
5 333
6 112
6 115
6 116
6 118
6 122
6 238
6 239
6 242
6 260
6 309
6 320
6 321
6 323
7 92
7 121
7 189
7 389
7 390
7 391
7 392
7 393
8 52
8 60

documents
conditions .
.I 237
cisternal fluid oxygen ...
using a beckman micro-oxyg..
tension simultaneously in the..
and in arterial blood under..
that the cisternal oxygen..
oxygen tension of the surroun.
the available free oxygen...
duration in the cerebral...
.I 238
ventricular septal defect obstruction .
a case of ventricular...
lesion and infundibular...
coronary cusp of the aortic..
septal defect, was demonstra..
as a polyp-like mass in the...
catheterization and angiocard ventricular outflow obstr...
.I 239
functional adaptations of the congenital heart disease ....
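The Cranfield/SMART setup pairs each query with the documents judged relevant to it (the qrels), so a system's ranked output can be scored. The following is a minimal sketch, assuming the `.I`/`.W` record layout and the "query-id document-id" pair layout shown on the slide. The function names and the ranked list for query 6 are hypothetical illustrations, not part of the SMART distribution.

```python
from collections import defaultdict


def parse_smart_records(text):
    """Parse '.I <id>' / '.W <text>' records into {id: text}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(".I "):
            current_id = int(line[3:])
            records[current_id] = ""
        elif line.startswith(".W"):
            if current_id is not None:
                records[current_id] = line[2:].strip()
        elif line and current_id is not None:
            # Continuation line belonging to the current record.
            records[current_id] = (records[current_id] + " " + line).strip()
    return records


def parse_qrels(text):
    """Parse whitespace-separated 'query_id doc_id' pairs into {query_id: {doc_id, ...}}."""
    tokens = text.split()
    qrels = defaultdict(set)
    for query_id, doc_id in zip(tokens[0::2], tokens[1::2]):
        qrels[int(query_id)].add(int(doc_id))
    return qrels


def precision_recall(ranked_doc_ids, relevant):
    """Set-based precision and recall of a retrieved list against the relevant set."""
    hits = sum(1 for doc_id in ranked_doc_ids if doc_id in relevant)
    precision = hits / len(ranked_doc_ids) if ranked_doc_ids else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


queries = parse_smart_records(
    ".I 6\n"
    ".W ventricular septal defect occurring in association with aortic regurgitation\n"
    ".I 7\n"
    ".W radioisotopes in heart scanning."
)
qrels = parse_qrels(
    "5 332 5 333 6 112 6 115 6 116 6 118 6 122 6 238 6 239 6 242 6 260 "
    "6 309 6 320 6 321 6 323 7 92 7 121 7 189 7 389 7 390 7 391 7 392 "
    "7 393 8 52 8 60"
)

# Hypothetical system output for query 6 (document 999 is invented).
ranked = [112, 238, 999, 116]
precision, recall = precision_recall(ranked, qrels[6])
```

Three of the four retrieved documents are in the relevant set for query 6, which gives a precision of 0.75 and a recall of 3 out of 13.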
Schedules
3 Economics, Education, Society
33 Economics and Management
338 Industries, Products
338.1 – 338.4 Specific kinds of industries
338.4 Secondary Industries and Services
338.47 Goods and Services
Built from
338.471 – 338.479 Subdivisions for Goods and Services
Schedules
338.476 Technology
338.4767 Manufacturing
338.47677 Textiles
338.476772 Textiles of Seed hair fibres
338.4767721 Cotton
Built from
338.47677210 Facet Indicator for Standard Subdivision
Table 1
338.476772109 Historical, geographic, persons treatment
Built from
338.4767721094 Europe Western Europe
Table 2
338.47677210942 England and Wales
338.476772109427 Northwestern England and Isle of Man
338.4767721094276 Lancashire

“The Lancashire cotton industry : a study in economic development”
Assigned DDC Code: 338.4767721094276
Search Engines – Dewey Decimal Faceted Model
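In a scheme built this way, each longer code narrows the one before it, so searching on a class prefix retrieves everything classified beneath that class. Here is a minimal sketch, assuming codes can be compared as digit strings once the decimal point is removed. The function names and the second catalogue entry are hypothetical; only the Lancashire title and its code come from the slide.

```python
def ddc_digits(code):
    """Return the DDC code as a bare digit string (decimal point removed)."""
    return code.replace(".", "")


def ddc_ancestors(code):
    """List every prefix of a DDC code, from the broadest class to the code itself."""
    digits = ddc_digits(code)
    chain = []
    for length in range(1, len(digits) + 1):
        prefix = digits[:length]
        # DDC writes a decimal point after the third digit.
        chain.append(prefix if length <= 3 else prefix[:3] + "." + prefix[3:])
    return chain


def search_by_class(catalogue, class_code):
    """Return titles whose DDC code falls under the given class prefix."""
    wanted = ddc_digits(class_code)
    return [title for title, code in catalogue if ddc_digits(code).startswith(wanted)]


catalogue = [
    ("The Lancashire cotton industry : a study in economic development",
     "338.4767721094276"),
    ("A hypothetical general economics title", "330"),  # invented entry
]

hits = search_by_class(catalogue, "338.47677")   # everything under Textiles
chain = ddc_ancestors("338.4767721")             # path from 3 down to Cotton
```

The `chain` list starts with `3`, `33`, `338`, mirroring the schedule levels shown above. Steps that merely add a facet indicator digit also appear in the chain, since the sketch does not know which prefixes are meaningful classes.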
Two Other Search Models: Maps, Time Intervals
Past Attempts
MyLifeBits
Gemmell, Jim, Gordon Bell, and Roger Lueder. "MyLifeBits: a personal database for everything." Communications of the ACM 49.1 (2006): 88-95.
"But even with convenient classifications and labels ready to apply, we are still asking the user to become a filing clerk – manually annotating every document, email, photo, or conversation."
LifeLog
"…The user can order the life-log agent to add retrieval keys (annotation) with an arbitrary name by simple operations on his cellular phone while the agent is capturing a life-log video. This enables the agent to identify a scene that the user wants to remember throughout his life, and thus the user can access easily to the videos that were captured during precious experiences"
Aizawa, Kiyoharu, Tetsuro Hori, Shinya Kawasaki, and Takayuki Ishikawa. "Capture and efficient retrieval of life log." In Pervasive 2004 Workshop on Memory and Sharing Experiences, pp. 15-20. 2004.
Stuff I’ve Seen
…Research in cognitive psychology has found that people remember information, particularly older information, not in terms of exact time, but in terms of key episodes, such as a child’s birthday, exotic travel,…
Cutrell, Edward, Susan T. Dumais, and Jaime Teevan. "Searching to eliminate personal information management." Communications of the ACM 49.1 (2006): 58-64
PERSON
…we define the general category for user’s activity in advance, such as ordinary activity and extra-ordinary activity. In ordinary activity is related to the activity in home or office. Generally, the activities occurred outside of those area, they are classified as extraordinary activities. In addition to these pre-defined activities, users can add their own activity through our learning based structure… For some duration, we record whole activities of user. For the repeated activities at same time, in same place with similar objects, our activity engine will register as user defined activities by asking in which category those can be included.
Kim, Ig-Jae, et al. "PERSON: personalized experience recoding and searching on networked environment." Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences. ACM, 2006.
Personal Data Prototype
…Landmarks of tags are defined by the frequency of tags that are assigned to each item of personal data. A tag that has been in heavy use during a period of time is a candidate for a landmark. A tag that has rarely been used during a long period of time is also a candidate for a landmark. Outliers are candidates for landmarks in time-series data, such as home energy use, the number of steps walked, and histories of body weight. Data that exceed pre-defined or user-defined thresholds are also candidates. Other landmarks are public landmarks, which include shocking public news, bestsellers, blockbuster films, and annual rankings of top Web-search words. We can recall our own experiences on those days from these landmarks.
Teraoka, Teruhiko. "Organization and exploration of heterogeneous personal data collected in daily life." Human-Centric Computing and Information Sciences 2.1 (2012): 1-15.
Dublin City University
Qiu, Zhengwei. "A lifelogging system supporting multimodal access." PhD diss., Dublin City University, 2013.
Wang, Peng, and Alan F. Smeaton. "Aggregating semantic concepts for event representation in lifelogging." Proceedings of the International Workshop on Semantic Web Information Management. ACM, 2011.
Okay, we’ve seen
-- Apps / QS
-- Personal Big Data
-- Some early attempts
Everyone says
Time is important
Maps are important
String search is important
but…
Facets, what are our personal facets?
How can we automate them?
2015 PTraces Grefenstette - 27
swimming
• (my) people involved in something about swimming
• things I’ve bought involving swimming
• (my) photos and Facebook posts related to swimming
• emails about swimming things
• places I’ve been involving swimming
• days involving swimming things
• phone calls about swimming things…
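The slides above break one query, "swimming", into facets by kind of personal data. As a rough sketch of that idea, assume each item in a personal archive is a record with a source kind and some text. The `Item` fields, the sample archive, and the naive substring match are all hypothetical stand-ins for whatever a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    kind: str   # hypothetical source label: "email", "purchase", "photo", "place", ...
    text: str   # hypothetical textual description of the item


def group_by_facet(items, query):
    """Group the items whose text mentions the query by their kind."""
    groups = defaultdict(list)
    needle = query.lower()
    for item in items:
        if needle in item.text.lower():
            groups[item.kind].append(item)
    return dict(groups)


# Invented sample archive, for illustration only.
archive = [
    Item("email", "Swimming club schedule for next week"),
    Item("purchase", "Mirrored swimming goggles"),
    Item("photo", "Birthday dinner"),
    Item("place", "Swimming pool, South Hills"),
]

results = group_by_facet(archive, "swimming")
summary = {kind: len(found) for kind, found in results.items()}
```

Here `summary` counts one email, one purchase and one place, and the unrelated photo is left out. Matching on the literal string is the weak point, which is why the talk goes on to ask for semantic facets.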
Rather Self-Centred, no?
Personal Information System
Personal archives
Induction of semantic dimensions
Personal semantic hierarchies
Crowdsourced semantic hierarchies (e.g. Wikipedia)
Expert semantic hierarchies (e.g. MeSH)
Ingest/Annotate/Merge
swimming
Knitting
poker
Painting
…
Expert (ontology) >>> Crowdsourcing (folksonomy) >>> Personal (models)
Knitting>Knitting_methods_for_shaping>Short_row_(knitting)
Knitting>Knitting_stitches
Knitting>Knitting_stitches>List_of_knitting_stitches
Knitting>Knitting_stitches>Basic_knitted_fabrics
Knitting>Knitting_stitches>Decrease_(knitting)
Knitting>Knitting_stitches>Dip_stitch
Knitting>Knitting_stitches>Drop-stitch_knitting
Knitting>Knitting_stitches>Elongated_stitch
Knitting>Knitting_stitches>Fair_Isle_(technique)
Knitting>Knitting_stitches>Grafting_(knitting)
Knitting>Knitting_stitches>Loop_knitting
Knitting>Knitting_stitches>Pick_up_stitches_(knitting)
Knitting>Knitting_stitches>Plaited_stitch_(knitting)
Knitting>Knitting_stitches>Slip-stitch_knitting
Knitting>Knitting_stitches>Yarn_over
Knitting>Knitting_tools_and_materials
Knitting>Knitting_tools_and_materials>Eisaku_Noro_Company
Knitting>Knitting_tools_and_materials>Hank_(textile)
Knitting>Knitting_tools_and_materials>Knitting_machine
Knitting>Knitting_tools_and_materials>Knitting_Nancy
Knitting>Knitting_tools_and_materials>Knitting_needle
Knitting>Knitting_tools_and_materials>Knitting_needle_cap
Knitting>Knitting_tools_and_materials>Lazy_Kate
Knitting>Knitting_tools_and_materials>Liaghra
Knitting>Knitting_tools_and_materials>Nostepinne
Knitting>Knitting_tools_and_materials>Row_counter_(hand_knitting)
Knitting>Knitting_tools_and_materials>Stitch_holder
Knitting>Knitting_tools_and_materials>Stocking_frame
Knitting>Knitting_tools_and_materials>Variegated_yarn
Knitting>Knitting_tools_and_materials>Yarn
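Category paths like these can be folded into a tree so that a facet browser can list the children of any node. A minimal sketch, assuming `>` separates the levels as in the listing above; the function names are hypothetical and only a handful of the paths are used.

```python
def build_tree(paths, sep=">"):
    """Fold 'A>B>C' style paths into a nested dict of {name: subtree}."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split(sep):
            node = node.setdefault(part, {})
    return tree


def children(tree, path, sep=">"):
    """Return the sorted child names found under the given path."""
    node = tree
    for part in path.split(sep):
        node = node[part]
    return sorted(node)


paths = [
    "Knitting>Knitting_stitches>Dip_stitch",
    "Knitting>Knitting_stitches>Yarn_over",
    "Knitting>Knitting_tools_and_materials>Knitting_needle",
    "Knitting>Knitting_tools_and_materials>Yarn",
]

tree = build_tree(paths)
top_level = children(tree, "Knitting")
stitches = children(tree, "Knitting>Knitting_stitches")
```

A nested dict keeps the sketch short. A real category graph such as Wikipedia's is not a strict tree, so a node can have several parents and a production version would need a graph structure instead.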
Well, no….
Tweet
Less than 12 hours until I am in the pool crying... thankful for mirrored goggles
facets: Swimming>pool, Swimming>goggles
I’d want this …
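The annotation wished for here, tagging the tweet with `Swimming>pool` and `Swimming>goggles`, can be sketched as a vocabulary lookup. This assumes a hand-made mapping from trigger words to facet paths. The `FACETS` table and the `annotate` function are hypothetical, and the `laps` entry is an added example rather than something shown on this slide.

```python
import re

# Hypothetical personal vocabulary: trigger word -> facet path.
FACETS = {
    "pool": "Swimming>pool",
    "goggles": "Swimming>goggles",
    "laps": "Swimming>laps",  # assumed extra entry, not triggered by this tweet
}


def annotate(text, facets=FACETS):
    """Return the facet paths triggered by words in the text, in order of appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for word in words:
        facet = facets.get(word)
        if facet and facet not in found:
            found.append(facet)
    return found


tweet = ("Less than 12 hours until I am in the pool crying... "
         "thankful for mirrored goggles")
labels = annotate(tweet)
```

For this tweet the lookup yields the two facets from the slide. The hard part the talk points to is where such a vocabulary comes from, since exact word matching will miss anything not already in the table.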
swimming -- weightlifting, cycling, gymnastics, judo, table, volleyball, archery, rowing, badminton, track, water, taekwondo, tennis, field, diving, handball, boxing, softball, karate, pentathlon, fencing, athletics, triathlon, wrestling, soccer
http://webdocs.cs.ualberta.ca/~lindek/downloads.htm
Distributional Semantics, 1.5 billion words
WordNet
Existing taxonomies are for societal exchanges
• Do you want to buy this?
• What famous person did this when?
• What can we make for this?
We are missing a description of what is related to us, doing something…
specific vocabularies, loose taxonomies … facets
Something like…
Sports/swimming/backstroke
Sports/swimming/on my back
Sports/swimming/breaststroke
Sports/swimming/fins
Sports/swimming/goggles
Sports/swimming/fast lane
Sports/swimming/slow lane
Sports/swimming/laps
Sports/swimming/lifeguard
Sports/swimming/pool
Sports/swimming/lake
Sports/swimming/ocean
Sports/swimming/Neuilly Nautic Centre
Sports/swimming/South Hills Pool
Sports/swimming/towel
Sports/swimming/25m
Sports/swimming/cap
Sports/swimming/swim suit
Not just swimming!
http://www.notsoboringlife.com/list-of-hobbies/
Conclusion on Personal facets
There is a lot of work to do
• for predictable needs (hobbies, pastimes, sports), we do not have the basic facets we need
• for personal information (family, friends, familiar places), we have very little
• And this should be multilingual, too
Conclusion: Searching Personal Big Data
• Information is moving from the Web into Apps
• People are generating information in these siloed Apps
• People generate more digital information every day
• Wearable computing will create even more
• At some point, people will want their information back
• When you have too much information, you need facets
• The facets for organizing personal information will be needed and do not yet exist
• There are billions of cell phone users. They will all want this. You should start working on it.
Thank you!
www.inria.fr
Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. (2014) LifeLogging: personal big data. Foundations and Trends in Information Retrieval, 8 (1). pp. 1-125. ISSN 1554-0677
Content type            Per day           Volume per day   Volume per year
Video                   16 hours          90 GB            33 TB
Autographer Camera      3000 images       1.3 GB           480 GB
Audio                   16 hours          630 MB           230 GB
Microsoft Sensecam      4500 images       82 MB            30 GB
Accelerometer           58,000 readings   138 KB           50 MB
Locations               10,000 readings   27 KB            10 MB
Bluetooth Interactions  400 (estimated)   5 MB             2 GB
Words heard or read     100,000           700 KB           255 MB
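The yearly figures can be checked against the daily ones with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a 365-day year, neither of which the table states. The daily volumes are copied from the table; the function names are hypothetical.

```python
# Assumed decimal units; the source table does not say which convention it uses.
UNIT = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}

# Daily volumes as (amount, unit), copied from the table.
daily = {
    "Video": (90, "GB"),
    "Autographer Camera": (1.3, "GB"),
    "Audio": (630, "MB"),
    "Microsoft Sensecam": (82, "MB"),
    "Accelerometer": (138, "KB"),
    "Locations": (27, "KB"),
    "Bluetooth Interactions": (5, "MB"),
    "Words heard or read": (700, "KB"),
}


def yearly_bytes(amount, unit, days=365):
    """Scale a daily volume to a yearly volume in bytes."""
    return amount * UNIT[unit] * days


def humanize(n_bytes):
    """Format a byte count using the largest unit that fits."""
    for unit in ("TB", "GB", "MB", "KB"):
        if n_bytes >= UNIT[unit]:
            return f"{n_bytes / UNIT[unit]:.1f} {unit}"
    return f"{n_bytes:.0f} B"


yearly = {name: yearly_bytes(amount, unit) for name, (amount, unit) in daily.items()}
total = sum(yearly.values())

for name, n_bytes in yearly.items():
    print(f"{name}: {humanize(n_bytes)} per year")
print(f"Total: {humanize(total)} per year")
```

Under these assumptions the computed values land close to the rounded "Volume per year" column, and video accounts for nearly all of the roughly 33.6 TB yearly total.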