semantic web and information graph

Upload: lucas-shen

Post on 02-Dec-2014

Category:

Engineering


DESCRIPTION

my note on semantic web and info graph

TRANSCRIPT

Page 1: Semantic web and information graph

Semantic Web and Graphical Representation of Information

Chao-Hsuan Shen

Winter 2014

Audience

To begin, I want you to ask yourself one important question: What’s the point of all these connections and data, anyway?

Table of Contents

Audience

Table of Contents

Introduction

Semantic Web

What does semantic mean?

What is the semantic web?

How to build a semantic web?

Difference between the original Web 2.0 and the semantic web

Applications

Conclusion and the future

Bibliography

Introduction

In his book “Connectome”, Sebastian Seung, an MIT professor of computational neuroscience, describes a trend that has haunted me since I first read it. The vision is as follows. In the early development of science, mathematics, physics and chemistry shaped a materialist world view, in which we interpret things as “a bunch of atoms.” Then, with advances in mechanics, biology and neuroscience, we marveled at the intricate machinery of working systems, and a mechanistic world view formed: we started to interpret things as “machines.” Now, shaped by advances in computer science and the Internet, things are “a bunch of information.”

Nowadays we are inundated with information. Vast amounts of data are generated, processed, transferred, and stored, all made possible by advances in computing power and the development of Internet infrastructure and protocols. To move forward, we have to change the way we use the Internet. The classic Web is a “web of documents”; the semantic web is a “web of data.” The ultimate goal of the web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. What is it, really? And how can we make it happen?

Semantic Web

What does semantic mean?

“Semantic: of or relating to meaning in language” (Merriam-Webster). It is all about meaning. What is meaning, and how does something come to have one? In spoken language, meaning is defined as “the idea that is represented by a word, phrase, etc.” First we have vocabulary, but words alone are not sufficient to represent intricate ideas. That is why we need grammar to combine words into sentences, and within this framework we can keep building more complex ideas. This is how ideas are represented in spoken language. We could say that meaning is defined by combining entities according to predefined syntactic rules. To elaborate, let’s run a thought experiment. Assume I don’t know much English and all I have is a Merriam-Webster dictionary at hand. Now I am looking for the meaning of an unknown word. I open the dictionary, find the word I am looking for, and its explanation pops out. But if I am analytical and skeptical, I am confused again, because the meaning of the word is defined by a combination of other words, none of which I really understand. So I look up the words in the explanation and end up finding even more unknown words. The process could go on indefinitely, and finally I give up. Yet most people do fine with a dictionary, so why am I having such trouble? This is a classic philosophical problem: can anyone learn a wholly new language from a dictionary alone? The answer is no. If a dictionary can’t give us the meaning of a word, what is meaning anyway? In the thought experiment, a dictionary is meaningless to a pure novice, yet the dictionary itself represents the foundation of meaning in a language. What is a dictionary? In it, each word is recursively defined by others. Graphically speaking, a dictionary is a graph that connects words and represents the relationships between them. A dictionary is a web of relationships. A word’s meaning is defined by its relationships to other words and its position in the graph. Each language has its own graph, and the graphs of different languages are like parallel universes. That is why we can’t learn another language from a dictionary alone unless we somehow possess enough mappings between our native language and the new one. The mapping between two language systems is the bridge between parallel universes. To have meaning, we need a graph. A graph is a prerequisite of meaning; without it, nothing could be semantic. This is the ongoing revolution in how the next generation will organize the world’s information. We want to make the Internet more semantic so we can do things we couldn’t even have imagined before. In the context of the Internet, we already have a graph. The next question is how to make it semantic.
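To make the dictionary-as-graph idea concrete, here is a minimal Python sketch. The toy dictionary is entirely hypothetical (a handful of made-up, simplified definitions): each word is a vertex, and an edge points from a word to each word used in its definition. A breadth-first search collects every word a novice would have to look up, and the final check shows that the search never leaves the graph, because every word is defined only by other words in it.

```python
from collections import deque

# A toy, hypothetical dictionary: each word maps to the words used in its definition.
TOY_DICTIONARY: dict[str, list[str]] = {
    "dolphin": ["mammal", "sea"],
    "mammal": ["animal", "milk"],
    "animal": ["living", "thing"],
    "milk": ["liquid", "mammal"],
    "sea": ["water"],
    "water": ["liquid"],
    "liquid": ["thing"],
    "living": ["thing"],
    "thing": ["animal"],  # definitions eventually loop back into the graph
}


def words_to_look_up(word: str, dictionary: dict[str, list[str]]) -> set[str]:
    """Breadth-first search: every word a novice must look up to 'understand' `word`."""
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        for neighbour in dictionary.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {word}


needed = words_to_look_up("dolphin", TOY_DICTIONARY)
# Every word reached is itself defined only by other dictionary words,
# so the search never "escapes" the graph to something already understood.
closed = all(w in TOY_DICTIONARY for w in needed)
```

Each lookup only leads to more entries of the same graph, which is the novice’s predicament described above; meaning comes from a word’s position among the others.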

What is the semantic web?

Definition of the semantic web:

“A set of formats and languages that find and analyze data on the World Wide Web, allowing consumers and businesses to understand all kinds of useful online information.”

To understand what the semantic web is, we have to keep the analogy to human spoken language in mind. A graph is a combination of vertices and edges. In spoken language, the vertices are words. What are the vertices of the World Wide Web? Over the past two decades, the WWW has developed most rapidly as a medium of documents for people. The classic example is Wikipedia, which stores millions of articles. However, vertices alone can’t form a graph. We need to connect those documents in a meaningful way so that each one can find its meaning on the Web.

What does connection mean?

Someone may argue that on the Internet we are already well connected, so why would we need to connect documents again? Well, connection exists in different forms. What the Internet does is connect nodes around the world. Nodes are machines, but we don’t really care about machines; machines represent the ability to process data. The truth is that we care about data, because only data has meaning to humans. I am not saying that physically connecting hosts is unimportant: the semantic web has to be built on a physically connected architecture, and the last two decades have laid the foundation for future development. The connections we need to create for the future semantic web are logical connections between pieces of information on the Web.

Why do we want to build a semantic web?

Most of the Web’s content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing, but in general they have no reliable way to process the semantics. The semantic web will bring structure to the meaningful content of Web pages, creating an environment where software roaming from page to page can readily carry out sophisticated tasks for users. The semantic web is not a separate web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. Making it an extension means we don’t throw away the last 20 years of work and start all over again: by tweaking the existing Web, we can unleash its real potential. Tweaking the old system is only the beginning, though; the purpose is to enhance people’s lives.

Only through cooperation between humans and machines can we efficiently enhance people’s lives. Imagine you are making a travel plan for your family. You have two extreme options for accomplishing the task:

1. Delegate the task to a travel agent, who will bring the package to you later.
2. Design and build a super-powerful AI and use it.

The first option will solve the task, but we can do better. First, a good agent can be efficient but doesn’t scale well: the size of the workload doesn’t shrink. Moreover, a travel agent may know your budget and possible timetable, but not your lifestyle, your family’s lifestyle, your preferences and so on, and these criteria are essential for a wonderful travel experience. The second option seems to take humans out of the decision loop so the whole process could be completed automatically, but for now it is even less realistic than the first. Today’s expert systems and AI simulations can only run on supercomputers, with carefully calibrated applications maintained by a crew of experienced professionals. That costs money, a great deal of money. Instead, with carefully structured documents and websites, an authorized program could check your family’s schedule, roam through travel, airline and hotel websites, and leave the hard decisions to you. For a computer, quantitative problems are easy and qualitative problems are hard; for you, it happens to be the opposite. Forcing a computer to make qualitative decisions would be very inefficient for now, because current computer architectures are built for mathematical computation, which is why pure AI may not be an efficient way to solve human problems. In this instance, neither an AI nor a real travel agent could answer a question for your family like “Which is better, Greece or France?” The semantic web aims to provide a way to connect discrete documents on the Web meaningfully and to enable efficient cooperation between humans and computers. In an era of information explosion, we really do need some help: manually parsing data, finding patterns, understanding and reacting seem outmoded in this generation.
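The division of labor described above can be sketched in a few lines of Python. This is a hypothetical illustration, not part of any real travel API: it assumes travel sites publish offers as structured records (the `TripOffer` class, its fields and the sample offers are invented here). The program handles the quantitative part, budget and dates, and hands a short list back to the family for the qualitative choice.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TripOffer:
    """A hypothetical structured offer that a travel site might publish."""
    destination: str
    start: date
    end: date
    price: float  # total price for the whole family


def shortlist(offers, budget, free_from, free_until, limit=3):
    """The quantitative part: drop offers over budget or outside the free dates, cheapest first."""
    fitting = [
        o for o in offers
        if o.price <= budget and free_from <= o.start and o.end <= free_until
    ]
    return sorted(fitting, key=lambda o: o.price)[:limit]


OFFERS = [
    TripOffer("Greece", date(2015, 7, 3), date(2015, 7, 10), 4200.0),
    TripOffer("France", date(2015, 7, 12), date(2015, 7, 19), 3900.0),
    TripOffer("Japan", date(2015, 7, 5), date(2015, 7, 15), 6500.0),  # over budget
    TripOffer("Greece", date(2015, 8, 1), date(2015, 8, 8), 3500.0),  # outside free dates
]

picks = shortlist(OFFERS, budget=5000, free_from=date(2015, 7, 1), free_until=date(2015, 7, 20))
# The qualitative question ("Greece or France?") is left to the family.
```

The program never decides between Greece and France; it only removes options that are clearly infeasible, which is the part computers do well.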

How to build a semantic web?

Building a semantic web is similar to building a language. Every language has three distinctive building blocks:

1. basic elements
2. ways of combination
3. ways of abstraction

The basic elements of the semantic web are the documents on countless websites, and we already have plenty of them today. Let me elaborate on the ways of combination, which are the essence of transforming documents into a web of useful data.

Ways of combination

To formalize how documents can connect with each other, what we are really trying to do is build a language that describes the relationship between any two documents, or more generally between any two information entities. The overall goal is the same as building a human language, minus the emotional part. Why do we need a new language? In the human world, to cooperate well you have to take the interests of both parties into consideration. The same principle applies here: we need computers to do more for us so that humans can skip the boring searching and parsing and focus on what we really care about and on making crucial decisions. To achieve this, we need a new language designed with machines in mind. The World Wide Web Consortium (W3C) has been working hard to promote and standardize this “unified semantic web language.” Its architecture is a three-layer hierarchy, listed here from the lowest level to the highest:

• RDF Format
• Ontology Languages
• Inference Engines

RDF - Resource Description Framework

The most fundamental building block is the Resource Description Framework (RDF), a format for describing information on the Web. Each piece of data, and any link that connects two pieces of data, is identified by a unique name called a Uniform Resource Identifier, or URI. (URLs, the common Web addresses we all use, are a special form of URI.) In the RDF scheme, two pieces of information and the relation linking them, typically each identified by a URI, are grouped together into what is called a triple: subject, predicate and object. URIs can be agreed on by standards organizations or communities, or assigned by individuals. The relation “is a”, for example, is so generally useful that the consortium has published a standard URI to represent it. The URI “http://en.wikipedia.org/wiki/Dolphin” could be used by anyone working with RDF to represent the concept of a dolphin. In this way, different people working with different sets of information can nonetheless share their data about dolphins and television animals, and people everywhere can merge knowledge bases on a large scale.
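As a minimal sketch of triples and merging, the Python below represents each triple as a plain `(subject, predicate, object)` tuple of strings rather than using an RDF library. The `rdf:type` URI is the real W3C term for the “is a” relation and the dolphin URI comes from the text; the `http://example.org/` names and all the facts are hypothetical. Because both data sets use the same URI for the dolphin concept, a fact published by one party becomes reachable from the other party’s data after a simple union.

```python
# Real W3C term: the standard URI for the "is a" (type) relation.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Shared concept URI, as suggested in the text.
DOLPHIN = "http://en.wikipedia.org/wiki/Dolphin"
# Hypothetical namespace for everything else.
EX = "http://example.org/"

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Two independently maintained knowledge bases (hypothetical data).
tv_fan_data: set[Triple] = {
    (EX + "Flipper", RDF_TYPE, DOLPHIN),
    (EX + "Flipper", EX + "starsIn", EX + "FlipperTVSeries"),
}
biologist_data: set[Triple] = {
    (DOLPHIN, EX + "breathes", EX + "Air"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """For graphs without blank nodes, merging is just a set union of their triples."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def facts_about_types(subject: str, graph: set[Triple]) -> list[Triple]:
    """Facts stated about the classes `subject` is a member of, whoever stated them."""
    classes = {o for s, p, o in graph if s == subject and p == RDF_TYPE}
    return sorted(t for t in graph if t[0] in classes)


combined = merge(tv_fan_data, biologist_data)
```

Real RDF toolkits also handle blank nodes, typed literals and serialization formats, none of which this sketch attempts.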

Ontology Language

Individuals or groups may want to define the terms and data they frequently use, as well as the relations among those items. This set of definitions is called an ontology. Ontologies can be very complex (with thousands of terms) or very simple. The Web Ontology Language (OWL) is one standard that can be used to define ontologies in a way that is compatible with RDF.

Inference Engines

Ontologies can be imagined as operating one level above RDF. Inference engines operate one level above the ontologies. Software programs examine different ontologies to find new relations among terms and data in them. For example, an inference engine would examine the three RDF triples below and deduce that Flipper is a mammal. Finding relations among different sources is an important step toward revealing the “meaning” of information.
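An inference engine can be sketched as forward chaining: repeatedly apply rules to the known triples until nothing new appears. The Python below is a minimal illustration, not any particular engine, and it implements only two RDFS entailment rules: `rdfs:subClassOf` is transitive, and a member of a class is also a member of its superclasses. The three triples are hypothetical examples of our own; the two `subClassOf` statements play the role of a tiny ontology.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

# Instance data plus a tiny ontology (the subClassOf statements); all hypothetical.
FACTS = {
    (EX + "Flipper", RDF_TYPE, EX + "Dolphin"),
    (EX + "Dolphin", SUBCLASS_OF, EX + "Mammal"),
    (EX + "Mammal", SUBCLASS_OF, EX + "Animal"),
}


def infer(triples):
    """Forward-chain two RDFS rules until no new triple appears (a fixed point)."""
    graph = set(triples)
    while True:
        # Current (subclass, superclass) pairs, including ones inferred so far.
        superclasses = {(s, o) for s, p, o in graph if p == SUBCLASS_OF}
        new = set()
        for s, p, o in graph:
            for sub, sup in superclasses:
                if o != sub:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, sup))  # subClassOf is transitive
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, sup))  # members of a class belong to its superclasses
        if new <= graph:
            return graph
        graph |= new


inferred = infer(FACTS)
```

Production reasoners support many more rules (OWL adds much richer semantics) and use indexes instead of this quadratic scan, but the fixed-point idea is the same.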

In general, then, RDF names each item and the relations among items in a way that allows computers and software to interchange the information automatically. Additional power comes from ontologies and other technologies that create, query, classify and reason about those relations. For example:

• SPARQL, a query language that allows applications to search for specific information within RDF data.

• GRDDL, which allows people to publish data in their traditional formats, such as HTML or XML, and specifies how these data can be translated into RDF.
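SPARQL queries are built around basic graph patterns: triple patterns in which some positions are variables (written `?name`), where a solution is any assignment of values to the variables that makes every pattern match a triple in the data. The Python below sketches only that matching step; it is not a SPARQL engine, and the data and `http://example.org/` names are hypothetical. The comment near the end shows roughly the equivalent SPARQL query, with prefix declarations omitted.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace

GRAPH = {
    (EX + "Flipper", RDF_TYPE, EX + "Dolphin"),
    (EX + "Flipper", EX + "starsIn", EX + "FlipperTVSeries"),
    (EX + "Skippy", RDF_TYPE, EX + "Kangaroo"),
    (EX + "Skippy", EX + "starsIn", EX + "SkippyTVSeries"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable such as ?animal
            if extended.get(term, value) != value:
                return None  # already bound to a different value
            extended[term] = value
        elif term != value:  # a constant URI that must match exactly
            return None
    return extended


def select(patterns, graph):
    """All variable bindings that satisfy every triple pattern (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?animal ?show WHERE { ?animal rdf:type ex:Dolphin . ?animal ex:starsIn ?show . }
results = select(
    [("?animal", RDF_TYPE, EX + "Dolphin"), ("?animal", EX + "starsIn", "?show")],
    GRAPH,
)
```

The shared variable `?animal` is what joins the two patterns: only bindings consistent across both survive.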

A more refined hierarchy can be represented as follows:

Difference between the original Web 2.0 and the semantic web

Just as HTML and XML made the original Web robust, RDF and the various ontologies based on it are maturing. Here are some applications through which we can start to appreciate the power of semantic web technology.

Applications

Knowledge graph

If you want to understand the state of ongoing neural network simulation around the world, chances are the information is distributed across many different websites. So you google one keyword, find something, and google more keywords. After many iterations of searching and re-searching, you may start to have a clearer picture. Can we do better? Like people, pieces of data relate to each other. Try Google’s Knowledge Graph: when you search for something, it returns information drawn from a graph of linked data. This is much more efficient than repeatedly searching and re-searching, and the graph can give you a grand view of the topic.
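The advantage of a linked-data graph over repeated keyword searches can be sketched as a neighbourhood query: starting from one entity, collect every fact within a few links of it in a single pass. In the Python below, all project, university and simulator names are hypothetical placeholders rather than real data, and edges are followed in both directions.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace; every fact below is made up

RESEARCH_FACTS = {
    (EX + "ProjectA", EX + "simulates", EX + "Cortex"),
    (EX + "ProjectA", EX + "hostedBy", EX + "UniversityX"),
    (EX + "ProjectB", EX + "simulates", EX + "Cortex"),
    (EX + "ProjectB", EX + "uses", EX + "SimulatorY"),
    (EX + "UniversityX", EX + "locatedIn", EX + "CountryZ"),
}


def neighbourhood(entity, triples, hops=2):
    """Collect every triple within `hops` links of `entity`, following edges both ways."""
    seen = {entity}
    frontier = deque([(entity, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested radius
        for s, p, o in triples:
            if node in (s, o):
                found.add((s, p, o))
                other = o if s == node else s
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return found
```

For example, `neighbourhood(EX + "Cortex", RESEARCH_FACTS, hops=2)` returns the four facts about both projects, their host and their simulator in one query, which a keyword search would need several rounds to assemble.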

Using the ideas and technology of the semantic web, users don’t have to depend on Google to build a graph, and developers don’t have to wait for Google to open an API. On openstreetmap.org, anyone can contribute to building the map, and if everyone does their little part, the aggregated map need not be inferior to Google Maps. In the same way, we could create our own knowledge graph and enjoy the power of the semantic web simply by having every participant do his or her little part.

Information Verification

The process of verifying information could be improved, too. How do we verify the basic correctness of information online? Most of the time, we don’t: we choose to believe, relying heavily and blindly on big brands. Even if you do want to verify the information, the cost is very high. In the context of the semantic web, if most users calibrate their ontologies and inference engines carefully, verification no longer has to be a one-to-one, peer-to-peer affair. We could verify information across a network, much as papers go through peer review before they are published. Such distributed verification could be more efficient than in Web 2.0.

Social network

Nowadays, everyone seems to have joined one or more social networks. I personally use Facebook, Twitter, Google+, Weibo, WeChat, Line, Instagram and so on. When you post pictures, chat with friends and share information, your personality seems to be divided by the walls between social networks, because data in different networks cannot be integrated and connected. All the data you create with any social network service is an image of your online personality, so why should it be divided? The Friend of a Friend (FOAF) project is a good example of a data language and ontologies being put to use. It created a vocabulary for describing personal information, with which users can decide what to publish and find common interests with each other. The basic idea behind FOAF is simple: the Web is all about making connections between things. FOAF provides some basic machinery to help us tell the Web about the connections between the things that matter to us. Thousands of people already do this on the Web by describing themselves and their lives on their home pages. Using FOAF, you can help machines understand your home page, and by doing so, programs can learn about the relationships that connect the people, places and things described on the Web. FOAF uses W3C’s RDF technology to integrate information from your home page with that of your friends, and the friends of your friends.

Drug Discovery

This is a very good example of how humans can work with machines to deliver better results in the personalization of drugs. There are two challenges:

1. Each person’s unique information:
• Genes
• Physical environment
• Emotional environment

2. Rapidly changing medical knowledge, divided among separate databases. How do we meld a bewildering array of data sets, such as historic and current medical records for each person, scientific reports on numerous drugs, drug tests, and the potential side effects and outcomes observed in other patients? This is a decision about a person’s health, so all of the data above has to be considered, and this is why personalized medication is not yet a reality. How could semantic web technology help?

A research team at Cincinnati Children’s Hospital Medical Center is leveraging semantic capabilities to find the underlying genetic causes of cardiovascular diseases. The team began by downloading into a workstation the databases that held relevant information, which came from different origins and in incompatible formats. These databases included Gene Ontology (containing data on genes and gene products), MeSH (focused on diseases and symptoms), Entrez Gene (gene-centered information) and OMIM (human genes and genetic disorders). The investigators translated the formats into RDF and stored the information in a Semantic Web database. They then used Protégé and Jena, freely available Semantic Web software from Stanford University and HP Labs, respectively, to integrate the knowledge. The researchers then prioritized the hundreds of genes that might be involved with cardiac function by applying a ranking algorithm somewhat similar to the one Google uses to rank Web pages of search results. They found candidate genes that could potentially play a causative role in dilated cardiomyopathy, a weakening of the heart’s pumping ability. The team instructed the software to evaluate the ranking information, as well as the genes’ relations to the characteristics and symptoms of the condition and similar diseases. The software identified four genes with a strong connection to a chromosomal region implicated in dilated cardiomyopathy. The researchers are now investigating the effects of these genes’ mutations as possible targets for new therapeutic treatments. This job used to be done only by humans. In the traditional research process, computers and databases provide very limited query functions. Because each database is isolated by its own format, researchers have to pore through four or five databases trying to discern possible candidates, an obviously painstaking task. With the help of semantic web technology, searching and cross-referencing databases can be done by computers.
Together, we can move and grow faster and more efficiently.
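The ranking step can be illustrated with PageRank, the link-analysis algorithm Google originally became known for, computed here by power iteration. This is a sketch of the general kind of link-based ranking the paragraph refers to, not the team’s actual method: their algorithm and data are not described here, so the small graph below is entirely hypothetical. An edge from a condition or symptom to a gene stands for “is related to”, and a gene that many well-ranked nodes point to rises to the top. The damping factor 0.85 is the value commonly used for PageRank.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power iteration: each node repeatedly shares its score among the nodes it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline ("random jump") share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its score evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical relations: an edge X -> gene means "X is related to this gene".
LINKS = {
    "disease": ["gene_A", "gene_B"],
    "symptom": ["gene_A"],
    "gene_B": ["gene_A"],
    "gene_C": ["gene_B"],
}

scores = pagerank(LINKS)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `gene_A` ends up first (about 0.44) because the disease, the symptom and the already well-ranked `gene_B` all point to it; the scores always sum to 1.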

Conclusion and the future

To grasp the gist of the semantic web, we need one more thought experiment.

Imagine we live in a world with millions of people but no spoken or written language. People still have daily routines, see and touch different things, develop tools to facilitate their work, get inspired and have ideas. Without language, however, we can communicate only through voice, visuals and sequences of behavior. We don’t learn by reading, but by experiencing, doing, and seeing real things. Now we install spoken and written language into this world. What changes? Before we have language, if I want to express the idea of “apple” to you, I have no choice but to find a real apple and show it to you, even though we both know what an apple is by heart: the first time we saw an apple, or perhaps tasted one, we sampled the idea of “apple” in our brains. In the pre-language world, the real apple acts as a pointer to the idea of an apple in our minds. Communication means I want you to feel what I felt, see what I saw, and experience what I experienced, at the level of ideas. Without language, we are severely limited by our sources of pointers, because finding the real thing or replaying what just happened may not be feasible. Language gives people power because it provides a more efficient source of pointers and ways to combine them to represent ideas. When you see the word “apple”, I don’t have to show you a real apple, and you still know what I am talking about. To communicate, compute and process data in a humanly semantic way, the mere existence of information is not sufficient; we need an efficient way to link different pieces of information together. Language is the answer. The semantic web tries to formalize a language through which we can relate and connect information on countless websites, but this time we are not doing it purely for humans: we are doing it for machines, so that they can in turn help us achieve things we couldn’t achieve without linked data.

The point goes beyond merely linking documents on websites. Human culture is reaching a tipping point at which we are expanding computers’ role in our decision-making. We need them to do more for us, and more actively. By creating a unified semantic web language, we can bridge the human knowledge system and the still-growing computing power around the world, and the power of this synergy is about to explode. Connections are not limited to documents or databases; they can reach physical things in our world. In the next two decades, everything from light bulbs to your glasses may be connected to the Internet, and the semantic web movement will shape these connections so that computers have a role in them. Applications? Beyond imagination right now. Convenience? Maybe. Ethical, privacy and control issues? Definitely. Grand visions rarely progress exactly as planned, but the semantic web is indeed emerging and making online information more useful than ever. Let’s go back to the question: What’s the point of all these connections and data, anyway? If we couldn’t make people’s lives easier and better, why would we need engineering? We can choose to believe that, with this approach, humans can do better. Moreover, we can actively participate and make sure things really do get better.

Bibliography

• http://en.wikipedia.org/wiki/Semantic_Web

• http://www.w3.org/standards/semanticweb/

• http://www.w3.org/DesignIssues/LinkedData.html

• http://rdfa.info/

• http://www.foaf-project.org/

• http://en.wikipedia.org/wiki/Lingustics

• http://microformats.org/

• Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). “The Semantic Web”. Scientific American.

• Sowa, John F. (1991). Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, San Mateo, CA. ISBN 1-55860-088-4.

• Flake, G. W.; D. M. Pennock and D. C. Fain (July/Aug. 2003). “The Self-Organized Web: The Yin to the Semantic Web’s Yang”. IEEE Intelligent Systems, pp. 72–86.
