How structured data (Linked Data) help in Big Data Analysis --- Expand Patent Data with Linked Data Cloud Lishan Zhang Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2013-96 http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-96.html May 17, 2013


How structured data (Linked Data) help in Big Data Analysis --- Expand Patent Data with Linked Data Cloud

Lishan Zhang

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2013-96

http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-96.html

May 17, 2013


Copyright © 2013, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.


 

How structured data (Linked Data) help in Big Data Analysis --- Expand Patent Data with Linked Data Cloud

M.Eng Program

Lishan Zhang

24106243

 


   

Outline

Abstract
Introduction
Literature Review
    Unveil the underlying information among Big data
        Previous solutions
        Approaches
        Conclusion
Methodology
    SPARQL: query language for RDF data
    SPARQL Endpoint query
    HTTP request
    User Interface design
Discussion
    Results
        Explanation of Results
        What is different
        Limitation of this approach
    Evaluation
        User Study
        Heuristic Evaluation
    Future Work
Conclusions or Impact Statement
Bibliography
Appendix

 

 


Abstract  

Big Data is currently a prominent topic. The term commonly describes data that exceeds the processing capacity of on-hand database management tools. Its characteristics are often summarized as the 4Vs: Volume, Variety, Velocity and Value. Big Data can be structured or unstructured, and it carries potential value. It is vitally important to extract and analyze the valuable information in Big Data.

Linked Data, on the other hand, is a new concept for most people. Linked Data refers to a collection of interrelated datasets that can be published and shared on the web. Unlike Big Data, Linked Data is highly structured. It is used to build the Semantic Web, in which a huge amount of data on the web is available in a standard format. These technologies enable people to answer more advanced analytical questions by querying the data and drawing inferences using vocabularies.

In our project, we explore the potential use of Linked Data in analyzing Big Data. We build a search engine that combines information from Linked Data with patent data to see whether we can uncover more information about each patent. A huge Linked Data cloud already exists and contains a large amount of published open data, and we see the potential to connect this public data with patent data to answer advanced questions. When a user searches for an inventor name or a certain patent in the search interface, we query the Linked Data cloud and the patent database separately and return the results. In this way, we can combine the patent itself with the inventor information from DBpedia.


Introduction  

We are now generating far more data than at any earlier point in history. The explosion of data is driven by two particular sources: social networks sharing information about our activities, and a variety of sensors collating information on our environment. [1]

Needless to say, there could be priceless value hidden in this booming data. If we make good use of it, we may uncover valuable information and patterns inside the data. However, it will also become a threat if we cannot handle this ever-increasing amount of data.

Big Data is a commonly used term for data that exceeds the processing capacity of conventional database systems. [2] Big Data is often characterized by four main attributes: Volume, Velocity, Variety and Value. It can be structured or unstructured, and it carries potential value. The McKinsey Global Institute describes Big Data as “The next frontier for innovation, competition and productivity.” [3] But processing these big raw datasets poses challenges in both data management and algorithms. It is vitally important to extract and analyze the valuable information in Big Data.

The major difficulties in processing Big Data include capture, storage, search, sharing, analytics and visualization. [4] There are already several approaches to analyzing Big Data. For example, MapReduce is a programming model and an implementation for processing and generating large data sets. It runs on a large cluster of machines and is highly scalable. In addition, NoSQL systems employ non-relational data storage to process unstructured and semi-structured Big Data. Some institutes and companies have also developed their own mathematical models and algorithms to dig useful information out of Big Data.

This thesis focuses mainly on the variety of data. Variety means that Big Data has different types of data and various degrees of structure that do not fit into neat relational structures. It is a mix of structured, semi-structured and unstructured data such as text, sensor data, video, log files and more. Such data cannot be integrated into an application directly. [2]

The current approaches to Big Data, such as MapReduce and NoSQL, emphasize the ability to deal with volume and velocity. In this paper, we take a different approach: we are concerned with the variety of Big Data. Since most data is unstructured, it is hard to interlink different datasets and create valuable context from them. We see potential value in linking different datasets and expanding the value of each individual dataset with the help of Linked Data.

Linked Data is used to organize and publish highly structured data with globally unique identifiers, which make it easy to combine various datasets. Richard Cyganiak and Anja Jentzsch created the Linking Open Data cloud diagram, which shows the datasets that have been published on the web. [5] As the Linked Data cloud grows constantly, data integration is becoming more important in this field.


 

Fig 1: The Linking Open Data cloud diagram

In this paper, we explore the potential use of Linked Data in Big Data analysis by building a prototype of our concept. We use the U.S. utility patent dataset and link it with the public Linked Data cloud. We build a search engine for Patent Graph search that queries Linked Data cloud endpoints such as DBpedia and Freebase, simultaneously queries the SQL data from the patent datasets, and shows the combined results in the interface. The diagram below illustrates the querying process:

 


 

Fig 2: The querying process of Patent Search Engine

In this way we can add more related information about a patent and even provide recommendations for patent search. We expect this interconnection to create considerable value, and we expect Linked Data to become increasingly valued in Big Data analysis.
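The querying process described above (one lookup against the patent database, one against the Linked Data cloud, then a merge) can be sketched roughly as follows. This is only an illustration in Python, not the prototype's code: the table schema, the sample rows, and the function names `make_patent_db`, `query_patents`, `query_linked_data` and `search` are all assumptions, and the Linked Data lookup is replaced by canned data so that the sketch runs without network access.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_patent_db():
    """Build a tiny in-memory stand-in for the patent database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE patents (number TEXT, title TEXT, inventor TEXT)")
    conn.executemany(
        "INSERT INTO patents VALUES (?, ?, ?)",
        [
            ("0000001", "Example storage array", "Jane Doe"),
            ("0000002", "Example cache design", "Jane Doe"),
            ("0000003", "Example antenna", "John Roe"),
        ],
    )
    return conn


def query_patents(conn, inventor):
    """SQL side: fetch the patents recorded under one inventor name."""
    rows = conn.execute(
        "SELECT number, title FROM patents WHERE inventor = ? ORDER BY number",
        (inventor,),
    )
    return [{"number": number, "title": title} for number, title in rows]


def query_linked_data(inventor):
    """Linked Data side: placeholder for a SPARQL endpoint lookup (canned data)."""
    canned = {"Jane Doe": {"almaMater": "Example University"}}
    return canned.get(inventor, {})


def search(conn, inventor):
    """Run both lookups concurrently and merge them into one result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        patents = pool.submit(query_patents, conn, inventor)
        profile = pool.submit(query_linked_data, inventor)
        return {
            "inventor": inventor,
            "patents": patents.result(),
            "profile": profile.result(),
        }
```

Calling `search(make_patent_db(), "Jane Doe")` returns a dictionary holding both the inventor's patents and the profile fields, which is the combined view the search interface would render.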


Literature Review

Unveil the underlying information among Big data

Big Data has become one of the hottest topics in the industry. In this data-booming world, some traditional technologies can no longer serve the need to analyze large volumes of data. New approaches must be introduced in order to keep up with the pace of Big Data. The Linked Data concept is a useful way to unveil useful information, especially in data on the Internet.

Big Data is a commonly used term for data that exceeds the processing capacity of conventional database systems. We are generating much more data than before with the boom in social networks and media, mobile devices, Internet transactions, and networked devices and sensors.

Big Data is too big, moves too fast and does not fit conventional database architectures. Given this nature, the first question we need to answer is whether we can find an alternative way to process the data. More importantly, can we dig useful information out of Big Data?

Big Data requires exceptional technologies to process large quantities of data efficiently. A huge number of valuable patterns and a great deal of information are hidden in Big Data and need to be extracted. Usually, four aspects come up when dealing with Big Data: Volume, Velocity, Variety and Value (4V) [6].

 


Volume and Velocity

In this data-booming world, data grows exponentially. In particular, with the increasing popularity of social media, user-generated content has started to dominate. For example, roughly 60 hours of video are uploaded to YouTube every minute [7]. It is also astonishing that over 340 million tweets were generated daily in May 2012 [8]. To put this in perspective, the amount of information in the world doubles every five years [9]. There is more information in the daily edition of The New York Times than an individual man or woman in the 16th century had to process in their whole life.

Such huge amounts of data require tremendous storage space and extremely fast processing. This has always been challenging for any company, government or individual.

 

Variety and Value

Big Data relates not just to new information sources: it is equally applicable to gaining new insights from data that was previously inaccessible and to accelerating and easing existing analytical processes [10]. In fact, most Big Data is of low value until it is rolled up and analyzed, at which point it becomes valuable.

This is challenging because of Big Data's variety. Big Data comes in different structures and shapes, which makes it very difficult to analyze with traditional technologies such as MySQL or Oracle. Integrating these data sources is a very expensive operation [11]. In addition, correlating different pieces of data and reconnecting them to make them more valuable, readable and accessible has always been an interesting problem.

 

Previous solutions

There are several existing ways to process and analyze Big Data. Usually, they use advanced hardware and parallel processing techniques to break the speed bottleneck. Others employ non-relational data storage systems to deal with unstructured and semi-structured Big Data. Meanwhile, many companies have been trying to apply unique mathematical models, advanced analytics and data visualization technology to extract insights from Big Data.

 

Approaches

MapReduce

MapReduce is a breakthrough concept announced by Google. It is a programming model and an implementation for processing and generating large data sets [12]. It runs on a large cluster of machines and is highly scalable.

MapReduce is not only successful at Google; an open-source implementation is also available to the public under the name Hadoop, a highly scalable compute and storage platform [13]. Hadoop breaks a huge chunk of data into pieces and processes and analyzes them at the same time.
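To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example task. It is written in Python for illustration and does not use Hadoop or any cluster; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each map call would run on a different machine;
    # here they run one after another in a single process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each map call only sees its own split and each reduce call only sees the values for one key, both phases can be distributed across machines without the calls needing to coordinate with each other.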


NoSQL

The term NoSQL was first used by Carlo Strozzi for a database that did not expose the standard SQL interface [14]. NoSQL databases work in conjunction with Hadoop to serve up discrete data stored among large volumes of multi-structured data to end users and automated Big Data applications [15].

 

Digging useful information

Various companies have taken action to dig useful information out of the varied data on the web. For example, Splunk is a small company that has been in the business for less than 5 years. Splunk's mission is to make ambiguous Big Data more readable, useful and valuable to everyone. For example, one of its partners, Amazon, is asking Splunk to find out the habits of its customers.

Another company, Jive, is a software company in the social business software industry. It is also trying to help its customers consolidate the Big Data they are dealing with. One example is price information for merchandise: what price should be set in order to be the best price.

 

Downsides

However, none of these approaches is perfect. For example, Hadoop is a very young technology and is still developing. A Hadoop system is very hard to manage, and it does not support real-time data processing and analysis.

The downside of NoSQL, on the other hand, is that most NoSQL databases trade ACID (atomicity, consistency, isolation, durability) compliance for performance and scalability. NoSQL also suffers from its ‘youth’: it has no mature management and monitoring tools.

 

Conclusion

Key results

Big Data holds tremendous value, and it is beneficial to understand what it really means. Many new technologies, such as MapReduce and NoSQL, have been applied to this problem. However, it is never safe to say that we already have the perfect tools for the job. As data continues to grow exponentially, new technologies such as Linked Data are likely to be key to the next generation of analytics platforms and data management systems.

Shortcomings

Linked Data applications usually follow different architectures and patterns. For instance, one pattern requires the data to be replicated, so applications may work with stale data. Another pattern, the On-The-Fly Dereferencing pattern, works very slowly when dealing with complex operations.

 

Additional Work

Because Linked Data is a recently introduced concept that is still undergoing many improvements, some of its disadvantages will be addressed soon. However, the fact that we are in a data-exploding era cannot be reversed. More and more data is coming to us, and the technology must keep evolving in order to keep up with the pace.

Artificial intelligence can be applied when dealing with Big Data. A ‘databot’ that can crawl Linked Data, infer relationships, and figure out what information can be extracted would be useful.

   


Methodology    

For this thesis, we build a use case to explore the potential use of Linked Data with patent data. More specifically, we build a search engine named “Patent Graph”. When people type a certain patent number or an inventor name, we show them relevant information such as a picture of the inventor, their workplace, alma mater, doctoral advisor and biography. This information is obtained from DBpedia, which provides Wikipedia content in a structured data format. DBpedia makes this information available on the web so that people can easily link to the data. In addition, users can start a new search around a result simply by clicking the related information on the page. For example, if we are interested in a co-worker or the advisor of an inventor on the patent we searched for, we can just click the name, and the engine returns a new search around that person and their patents. We can also provide recommendations based on the search results. If time allows, we would also like to convert the patent data into RDF format and publish it on the web so that more people can benefit from it. In this way, Linked Data helps us analyze the patent data by expanding our patent datasets with related data and finding more useful information.

The patent data that we use is the Patent Inventor Database from the Fung Institute. The database disambiguates all inventor names in the U.S. utility patent database from 1979 to 2010. The Linked Data we use is DBpedia. The DBpedia dataset extracts structured content from the information created in Wikipedia, and it can be accessed online through a SPARQL query endpoint.

 


Since we are building a search engine that extracts information from both the Linked Data cloud and a relational database, we build a web service around it and use a Model-View-Controller (MVC) software architecture.

My part of the work includes implementing the search interface and querying the Linked Data cloud. The techniques involved are SPARQL endpoint queries, HTTP requests and User Interface design.

 

SPARQL: query language for RDF data

Resource Description Framework (RDF) is a directed, labeled graph data format for describing resources on the web. It is designed to be read and understood by computers rather than people. Most RDF documents are written in XML, which can easily be exchanged between different computers and platforms. RDF is also a part of the Semantic Web, a set of standards and best practices for sharing data and the semantics of that data over the web for use by applications. [16] Rather than just putting data on the web, the Semantic Web is about making links so that a person or machine can explore the web of data. [17]

An RDF statement is a triple of the form (Subject, Predicate, Object) and uses uniform resource identifiers (URIs) to name the data objects. For example, to express “Tom is a man”, we write Tom (Subject), sex (Predicate), man (Object). The data stored in the Linked Data cloud is RDF data.
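As a small illustration of the triple form, the “Tom is a man” statement can be written as a three-field record whose terms are URIs and then serialized as one N-Triples line. The sketch below is in Python; the namespace `http://example.org/` and the names `Triple`, `EX`, `tom_is_a_man` and `to_ntriples` are hypothetical and chosen only for this example.

```python
from collections import namedtuple

# One RDF statement: three terms, each named by a URI.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/"  # hypothetical namespace, for illustration only

tom_is_a_man = Triple(EX + "Tom", EX + "sex", EX + "man")


def to_ntriples(triple):
    """Serialize one triple of URIs as a line of N-Triples."""
    return "<{}> <{}> <{}> .".format(*triple)
```

Because every term is a globally unique URI, triples published by different sources can be merged into one graph simply by collecting them in the same set.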

SPARQL stands for SPARQL Protocol and RDF Query Language. It is a standard query language designed for querying RDF databases. There are four forms of query in SPARQL: SELECT, ASK, DESCRIBE and CONSTRUCT; we use the SELECT form most of the time. [18] The main idea of SPARQL is pattern matching, so it is easy to traverse relationships by querying collections of triples. The syntax of SPARQL is quite similar to SQL. A simple SPARQL query looks as follows:

PREFIX dbont: <http://dbpedia.org/ontology/>

SELECT ?musician ?place
WHERE {
  ?musician dbont:birthPlace ?place .
}

 

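First we declare a namespace prefix, in this case dbont: for http://dbpedia.org/ontology/. The query then matches every resource that has a dbont:birthPlace and returns the resource as ?musician together with its birth place as ?place. Note that the variable name alone does not restrict the matches to musicians; a further triple pattern would be needed for that. A partial result is shown below. We can type the example query into the DBpedia endpoint to get the full list.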

musician    place
http://dbpedia.org/resource/Federico_Garc%C3%ADa_Lorca    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Trinidad_Jim%C3%A9nez    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Ibn_Tufail    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Fran_Perea    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Ver%C3%B3nica_S%C3%A1nchez    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Berni_Rodr%C3%ADguez    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Jos%C3%A9_Celestino_Mutis    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Pepe_Marchena    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Antonio_de_Olivares    http://dbpedia.org/resource/Andalusia
http://dbpedia.org/resource/Tanya_Anne_Crosby    http://dbpedia.org/resource/Andalusia
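The pattern-matching idea behind a result like this can be sketched with a toy matcher for a single triple pattern. This is an illustration in Python, not how a SPARQL engine is implemented: the `match` function and the `TRIPLES` sample are assumptions, the resource names are shortened from the URIs in the listing, and the `label` triple is invented to show a non-matching statement.

```python
# A few statements; the "label" triple is made up to show a non-match.
TRIPLES = [
    ("Pepe_Marchena", "birthPlace", "Andalusia"),
    ("Pepe_Marchena", "label", "Pepe Marchena"),
    ("Ibn_Tufail", "birthPlace", "Andalusia"),
]


def match(pattern, triples):
    """Return one variable binding per triple that fits the pattern.

    Terms starting with "?" are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results
```

Here `match(("?musician", "birthPlace", "?place"), TRIPLES)` yields one binding for each of the two `birthPlace` statements and skips the `label` statement, mirroring how the query's single triple pattern produces one result row per matching triple.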

 

SPARQL Endpoint query

An endpoint is an association between a fully specified interface binding and a network address, specified by a URI. It is used to communicate with an instance of a Web Service. An endpoint indicates a specific location for accessing a Web Service using a specific protocol and data format. [19] A SPARQL endpoint enables users to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats such as XML or JSON. For simplicity, we can say that a SPARQL endpoint is the place where you send your SPARQL query and receive the result.

The commonly used SPARQL endpoints are listed below (SparqlEndpoints, 2013):

Data Source    Endpoint Address
DBpedia        http://dbpedia.org/sparql
U.S. Census    http://www.rdfabout.com/sparql
FactForge      http://factforge.net/sparql
data.gov.uk    http://data.gov.uk/sparql

In our project, we need to query the biographical information of a patent inventor from DBpedia through its SPARQL endpoint. The information about a person is the same as what we see in Wikipedia, but in a different format. For example, for our professor David A. Patterson, the Wikipedia page and the DBpedia page are shown below. They represent the same information quite differently. In DBpedia the data is machine-readable: each property is listed on the left side with its value next to it. We only need to select the properties we want in the SPARQL query to get the corresponding values conveniently.
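As a rough illustration of selecting properties in a SPARQL query, the sketch below assembles a query string in JavaScript. The helper name buildInventorQuery, the resource URI and the dbo: property names are assumptions made for this example and are not taken from our implementation; the properties a given DBpedia page actually exposes should be checked on that page.

```javascript
// Illustrative sketch only: the resource URI and property names are assumptions.
function buildInventorQuery(resourceUri, properties) {
  // One SPARQL variable per requested property, e.g. "?abstract ?almaMater".
  const vars = properties.map((p) => "?" + p.name).join(" ");
  // Each property becomes an OPTIONAL pattern so that a missing value
  // does not eliminate the whole result row.
  const patterns = properties
    .map((p) => "  OPTIONAL { <" + resourceUri + "> " + p.iri + " ?" + p.name + " . }")
    .join("\n");
  return (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n" +
    "SELECT " + vars + " WHERE {\n" + patterns + "\n}"
  );
}

const inventorQuery = buildInventorQuery(
  "http://dbpedia.org/resource/David_Patterson_(computer_scientist)", // assumed URI
  [
    { name: "abstract", iri: "dbo:abstract" },
    { name: "almaMater", iri: "dbo:almaMater" },
    { name: "doctoralAdvisor", iri: "dbo:doctoralAdvisor" },
  ]
);
```

Each property is wrapped in OPTIONAL so that a person whose page lacks one property still yields a row with the remaining values.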


 Fig  3:  Screenshot  of  an  example  of  Wikipedia  

   

   Fig  4:  Screenshot  of  an  example  of  DBpedia  

   


HTTP  request  

The Hypertext Transfer Protocol (HTTP) works as a request-response protocol between a client and a server. An HTTP request consists of a request method, a request URL, header fields and a body. The request methods include GET, HEAD, POST, PUT, DELETE, OPTIONS and TRACE. [20] The two most commonly used methods are GET and POST. Although both can carry a query to a server, GET requests data from a specified resource, while POST submits data to be processed by a specified resource. We use the POST method here to avoid caching.

In our case, the client is the Search Interface, which uses JavaScript to submit an HTTP request containing the SPARQL query to the server endpoint. The server then returns a response to the client. The response contains status and content information about the request. Because JavaScript is not well suited to handling RDF data directly, we set the return format to JSON.
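The request described here could be assembled roughly as follows. This is a sketch under assumptions: buildSparqlRequest is a hypothetical helper and the sample query is made up. The returned object is only a plain description of the request and is not sent anywhere.

```javascript
// Hypothetical helper: describes a SPARQL POST request without sending it.
function buildSparqlRequest(endpoint, query) {
  return {
    url: endpoint,
    method: "POST",
    headers: {
      // The query travels in a form-encoded body.
      "Content-Type": "application/x-www-form-urlencoded",
      // Ask the endpoint for SPARQL results serialized as JSON.
      "Accept": "application/sparql-results+json",
    },
    // encodeURIComponent escapes "&", "+", "#" and "=" inside the query,
    // which would otherwise be misread as form syntax.
    body: "query=" + encodeURIComponent(query),
  };
}

// Made-up query, chosen because it contains "&&".
const sampleQuery = "SELECT ?s WHERE { ?s ?p ?o . FILTER(?o > 1 && ?o < 5) } LIMIT 1";
const sparqlRequest = buildSparqlRequest("http://dbpedia.org/sparql", sampleQuery);
```

The object can be handed to whatever HTTP client is available, for example XMLHttpRequest as in the Appendix.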

 

User  Interface  design  

The User Interface (UI) of our prototype is simple and clean. It looks like a simplified Wikipedia page. We query both the Patent Data and the Linked Data Cloud and display the output in the interface. The interface shows the patent information surrounded by information about the inventor of the patent. A screenshot is shown below:


 

Fig 5: Screenshot of Patent Search Interface

The left side contains the inventor's basic information, including profile picture, workplace, alma mater and doctoral advisor. The upper right side is a biography of the inventor, followed by the patent information retrieved from the relational database. Clicking a link on the left side leads to the corresponding Wikipedia page for more information. The UI design emphasizes the patent part and places the relevant information around it.

 

The  procedure    

The procedure works as follows:

On the client side, when a user searches for a keyword, an HTTP request message is sent to the DBpedia web server. We wrote a wrapper class, “SPARQLWrapper.js”, in JavaScript that is similar to the SPARQL Endpoint interface to Python. [21]


The SPARQL endpoint address is http://dbpedia.org/sparql. We send the request with the searched title and some properties, such as abstract and workplaces, to the server endpoint. By default it returns an HTML page, which is not what we need, so we set the Accept field in the request header to specify the return data type, in our case JSON. As noted above, we send the SPARQL query with the POST method.

The web server then returns a response message to the client. The response message is read by JavaScript, written into HTML and displayed in the User Interface.
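A sketch of this last step, under assumptions: the response is taken to follow the standard SPARQL JSON results layout (results.bindings with a value per variable), the helpers flattenBindings, escapeHtml and renderInventor are hypothetical names, and the sample response is hand-written rather than real DBpedia output.

```javascript
// Turn the SPARQL JSON results text into an array of plain {variable: value} rows.
function flattenBindings(sparqlJsonText) {
  const parsed = JSON.parse(sparqlJsonText);
  return parsed.results.bindings.map((binding) => {
    const row = {};
    for (const name of Object.keys(binding)) {
      row[name] = binding[name].value;
    }
    return row;
  });
}

// Escape text before placing it into HTML, since values come from a remote source.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one row as a definition list: property name, then value.
function renderInventor(row) {
  const items = Object.keys(row).map(
    (name) => "<dt>" + escapeHtml(name) + "</dt><dd>" + escapeHtml(row[name]) + "</dd>"
  );
  return "<dl>" + items.join("") + "</dl>";
}

// Hand-written sample response (not real DBpedia data).
const sampleResponse = JSON.stringify({
  head: { vars: ["name"] },
  results: { bindings: [{ name: { type: "literal", value: "Example <Inventor>" } }] },
});
const inventorRows = flattenBindings(sampleResponse);
const inventorHtml = renderInventor(inventorRows[0]);
```

In a browser, the resulting string would then be assigned to the element that holds the inventor panel.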

For the Patent Data part, there are two main approaches. One approach is to keep the Patent Data in a relational database and query it locally. The other approach is to convert it to RDF format and store it in a triple store, or even publish it on the web. The first approach is efficient because we only need to look up the patent information for the search keyword, which is convenient with a relational database. The bottleneck is how to store the data: the whole dataset could be saved locally or uploaded to Google Datastore.

The second approach is more complex because we need to pre-process the whole dataset and convert it to RDF format. Since the Patent Data is quite large, many existing tools such as Google Refine cannot handle such a large amount of data. The advantage of the second approach is that the Patent Data can be interlinked with other Linked Data, which makes the Patent Data more widely available.
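To give an idea of what the conversion in the second approach involves, the sketch below turns one relational row into N-Triples lines. The column names, the http://example.org/patent/ namespace, the choice of the Dublin Core title and creator properties and the sample row are all assumptions for illustration; a real conversion would need an agreed vocabulary and would have to stream the whole dataset.

```javascript
// Escape a string for use inside an N-Triples literal.
function escapeLiteral(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Hypothetical mapping: one patent row becomes two triples.
function patentRowToTriples(row) {
  // example.org is a placeholder namespace, not a published dataset.
  const subject = "<http://example.org/patent/" + encodeURIComponent(row.patentNumber) + ">";
  return [
    subject + ' <http://purl.org/dc/terms/title> "' + escapeLiteral(row.title) + '" .',
    subject + ' <http://purl.org/dc/terms/creator> "' + escapeLiteral(row.inventor) + '" .',
  ];
}

// Made-up sample row.
const sampleTriples = patentRowToTriples({
  patentNumber: "0000001",
  title: 'A "sample" title',
  inventor: "Jane Doe",
});
```

Interlinking with other datasets would additionally require replacing the inventor literal with a URI, such as the inventor's DBpedia resource, which is the harder part of the conversion.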

Since the large amount of data is always a problem, we begin with a small subset and go from there. For example, we can first use the Patent Data of Berkeley professors.


Discussion  

In this section, I will mainly discuss the use case in which we bring Linked Data into patent search. I will also discuss how Linked Data helped in patent search, what the limitations are and how Linked Data can be used in a broader context. I also evaluate the search interface and test it with real users.

 

Results        

 

Explanation  of  Results  

For our Capstone Project, we would like to explore the potential use of Linked Data to help Big Data analysis, and we are therefore building a patent search engine based on these two concepts. Linked Data has many advantages: it is highly structured, machine-readable and interlinked across different data sources. We take advantage of the structured format of Linked Data and use it to expand the patent search result and add more value to it. Our prototype supports the hypothesis that Linked Data works in this situation, and this has implications for many other applications.

 

Here is our User Interface after searching for a certain patent:


 

Fig 6: Screenshot of Patent Search Result

From the screenshot we can see that associated information is added to the patent search result. Here we add some Wikipedia information about the inventor. In this way users can easily identify the exact inventor by looking at the biography or related information such as workplace, alma mater and doctoral advisor. This helps to disambiguate patents, since many people share the same name but work in different areas and hold entirely different patents.

In addition, users can search for the patents of coworkers by clicking their names on the page. If users are interested in the workplace or alma mater, they can click the link, which leads to the Wikipedia page of that item.


With the help of Open Linked Data, we have a new kind of patent association search that disambiguates the patent search and provides a broader context for patent-related information.

 

What  is  different  

Our implementation differs from our initial ideas in several ways. First, we retain the patent data in its relational database format and query it with SQL rather than converting it into RDF format. We did build a small prototype that converts the data using Google Refine, but this becomes very complex with a large amount of data, and converting the data format is not necessary for our use case. So we decided to query the relational database directly and combine the result with inventor information from Linked Data.
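The combination step can be pictured as a small merge on the client side. In this sketch the field names (inventor, number, title) and the helper combineResults are assumptions, and the sample rows are made up; the profile is allowed to be missing because not every inventor has a DBpedia entry.

```javascript
// Hypothetical merge of the two query results into one object for display.
function combineResults(inventorName, patentRows, inventorProfile) {
  return {
    inventor: inventorName,
    // null signals that the Linked Data lookup returned nothing.
    profile: inventorProfile || null,
    // Keep only the patents whose inventor field matches the searched name exactly.
    patents: patentRows.filter((row) => row.inventor === inventorName),
  };
}

// Made-up rows standing in for the relational database result.
const samplePatentRows = [
  { number: "0000001", title: "Sample patent A", inventor: "Jane Doe" },
  { number: "0000002", title: "Sample patent B", inventor: "John Roe" },
];

const combined = combineResults("Jane Doe", samplePatentRows, { almaMater: "Example University" });
const withoutProfile = combineResults("John Roe", samplePatentRows, undefined);
```

The exact string comparison in the filter mirrors the full-name matching the prototype relies on.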

We also decided to store the patent data locally and use PHP to query the relational database and send the result back to the client side in JSON format. We found this to be the most efficient way at this stage. If time permits, we would probably move the data to a cloud server so that the search engine can run remotely.

 

Limitation of this approach

There are also some limitations of our patent search.

Firstly, we assume that the inventor has a Wikipedia page so that we can find the corresponding information in DBpedia. However, this is not always the case. Although more and more people have their own Wikipedia page, not everyone who holds a patent has one. In such cases we cannot find their information in the Linked Data Cloud, and the search result cannot be expanded with inventor information.

Secondly, the user needs to type the full name of the inventor in order to match the name in DBpedia with the inventor name in the patent database. Compared with Google Patent Search this is limiting, because Google can return relevant information based on ranking even if we do not type the full name.

Thirdly, we keep the patent data in its original format and run two queries, one against DBpedia and one against the relational database. This does not make the best use of Linked Data, whose advantage over other formats is that data share the same format and different datasets can be interlinked. It would be better to convert the patent data into RDF format and even publish it to the Open Linked Data Cloud. The patent data would then be interlinked with all the other data sources in the cloud and make better use of the Linked Data concept.

 

Evaluation  

In the evaluation part, I will mainly discuss the User Interface we built for patent search and the effectiveness and convenience of the search experience for real users.

 

User  Study  

We asked people from different fields to try the search engine in usability experiments and made some changes based on their feedback.


Most of them think that the patent association search result is better than that of the traditional approach. When they search for patents, they are often unsure whether they have found the right one. With our prototype they can easily get information about the inventor and therefore gain a correct and comprehensive understanding of the information they retrieve.

They think that our patent search gives clear output with the associated information and that it can also run related searches. But they also point out a limitation of the approach: we only have basic information about the patent itself. If users would like to know more details of the patent, we cannot provide them because that information is not in our Patent Database.

 

Heuristic  Evaluation  

We examine our User Interface with the well-known 10 Usability Heuristics introduced by Jakob Nielsen. Heuristic evaluation is a usability engineering method for finding the usability problems in a user interface design. [22] A small set of evaluators examines the interface against the recognized usability principles, scoring each from one to ten, and we combine the results of the evaluation.

We asked our users to go through a set of tasks we designed in our search interface, told the evaluators the goals of the system and allowed them to perform their own tasks. After that, they filled out the Heuristic Evaluation sheet.

 

The Heuristic Evaluation Sheet is designed as follows:


Heuristic Evaluation principles                            Points (1-10)   Comments
Visibility of system status
Match between system and the real world
User control and freedom
Consistency and standards
Error prevention
Recognition rather than recall
Flexibility and efficiency of use
Aesthetic and minimalist design
Help users recognize, diagnose, and recover from errors
Help and documentation

 

We analyzed the results the users provided and explain the evaluation result below. A principle is rated Good if its average score is more than 6 out of 10; otherwise it needs to improve.

(1). Visibility of system status: Good (8.7)

Our interface has a clear layout, and different components do not overlap when displayed. Users can easily see whether they have obtained the search result and what the information looks like.

(2). Match between system and the real world: Good (8.2)

The interface resembles the Wikipedia format for showing biographical information and puts the patent at the front of the page, so it is easy to understand.


(3). User control and freedom: Good (7.1)

Users can search for a new patent using the textbox in the upper left corner or simply click the information on the page.

(4). Consistency and standards: Need to improve (5.8)

The search textbox only supports searching by existing patent numbers and some inventor information, so users may be confused at first about what they should enter.

(5). Error prevention: Need to improve (5.0)

We did not build auto-completion or auto-correction, so users need to type correctly in order to get a result.

(6). Recognition rather than recall: Good (7.5)

We have minimized the user’s memory load by making objects and actions visible. Users do not have to remember information but can simply click in the previous result.

(7). Flexibility and efficiency of use: Good (7.2)

The difference between novice and expert users is small because the search feature requires no complicated actions.

(8). Aesthetic and minimalist design: Good (6.8)

The interface contains the most relevant and needed information and gives extra information low visibility.


(9). Help users recognize, diagnose, and recover from errors: Need to improve (5.7)

If users type a name that does not exist in Wikipedia or make a typo, there is no error message that indicates the problem precisely.

(10). Help and documentation: Need to improve (5.5)

We did not implement documentation to help users understand the functionality of the search engine. Normally people will understand it because the interface looks like other search engines.
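The rating rule is simple arithmetic and can be checked against the ten averages reported above. In the sketch below, the function names are our own for illustration; the individual evaluator points are not listed in this report, so only the reported averages are used.

```javascript
// A principle is rated Good only if its average is strictly more than 6 out of 10.
const GOOD_THRESHOLD = 6;

// Average of the points given by the evaluators for one principle.
function averagePoints(points) {
  return points.reduce((sum, p) => sum + p, 0) / points.length;
}

function rate(average) {
  return average > GOOD_THRESHOLD ? "Good" : "Need to improve";
}

// Averages for principles (1) to (10), as reported in the evaluation.
const reportedAverages = [8.7, 8.2, 7.1, 5.8, 5.0, 7.5, 7.2, 6.8, 5.7, 5.5];
const ratings = reportedAverages.map(rate);
```

Applied to the reported averages, the rule marks principles (4), (5), (9) and (10) as needing improvement, which matches the ratings listed in the evaluation.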

 

Future  Work      

Enriching  the  functionality  of  the  Patent  Search  

For now we focus only on how to combine Linked Data and relational data to make patent search more convenient, so we use only limited information collected from a single source in the Open Linked Data Cloud. There is much more we can do to enrich the functionality of the Patent Search. For example, we can take the geographic information in the Patent Data and visualize it with the GeoNames data from the Linked Data Cloud. We can also visualize a Patent Search Graph to show the relationships between different inventors and their patents more explicitly.


Querying  a  Collection  of  Datasets  in  Linked  Data  

We query data only from DBpedia for this project. But since Linked Data is interlinked, we may be able to query a collection of datasets through an existing SPARQL endpoint that provides access to copies of the relevant datasets. For example, OpenLink SW hosts a majority of the datasets from the LOD cloud behind a SPARQL endpoint. [23]

 

Applying  the  concept  to  other  topics  

Currently we combine the patent data with DBpedia in the Linked Data Cloud. There are many other sources in the Linked Data Cloud that we may use, such as GeoNames data, IMDB data, BBC Music and so on. We may make use of these sources to find other applications. For example, we can search for a certain singer and get the relevant biographical information along with their albums and songs from different data sources.

 

                                                                                                     

   


Conclusions  or  Impact  Statement                          

Our capstone project is a research project that explores the potential use of Linked Data in Big Data. We did some research on Big Data to understand the existing approaches to analyzing Big Data and their strengths and weaknesses. We concluded that highly structured Linked Data might be a potential solution for unstructured Big Data analytics and could uncover more value in Big Data.

Based on that, we are building a search engine to demonstrate how Linked Data helps in Big Data analysis by expanding the Patent Data with the Open Linked Data Cloud. In this way, we can find patent association information through the Linked Data Cloud and combine it with the patent search to get a comprehensive answer.

Although we have learned a lot about the mechanism of Linked Data and used it in our prototype, much remains to be learned. For example, we currently query a single source from the Linked Data Cloud; we may explore multiple queries against different sources, or directly convert the Patent Data into RDF format and publish it in the Linked Data Cloud.

The strength of Linked Data is its structured and uniform format, which allows information to be shared among different datasets and read automatically by computers. Yet we still need to address drawbacks such as complicated pre-processing procedures and the question of how to protect data that is available on the web.

Our prototype shows that Linked Data has many advantages and can be used for data analysis in different situations. We see a bright future for making better use of Linked Data and the Semantic Web to help in Big Data analysis.


Bibliography

1. Ian Mitchell, Mark Wilson. Linked Data: Connecting and exploiting big data. London : Fujitsu UK, 2012.

2. Dumbill, Edd. What is big data? An introduction to the big data landscape. [Online] January 11, 2012. http://strata.oreilly.com/2012/01/what-is-big-data.html.

3. James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers. Big Data: The next frontier for innovation, competition, and productivity. s.l. : McKinsey Global Institute, 2011. http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation.

4. Roebuck, Kevin. Big Data: High-impact Strategies – What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors. s.l. : Lightning Source Incorporated, 2011.

5. Richard Cyganiak, Anja Jentzsch. Linking Open Data cloud diagram. [Online] 2011. http://lod-cloud.net/.

6. Hopkins, Brian and Evelson, Boris. Expand Your Digital Horizon with Big Data. s.l. : Forrester, 2011.

7. Oreskovic, Alexei. YouTube, Google Inc's video website, is streaming 4 billion online videos every day, a 25 percent increase in the past eight months, according to the company. [Online] Jan. 23, 2012. [Cited: Nov. 30, 2012.] http://www.reuters.com/article/2012/01/23/us-google-youtube-idUSTRE80M0TS20120123.

8. twittersearch. The Engineering Behind Twitter’s New Search Experience. [Online] May 31, 2011. [Cited: Nov. 30, 2012.] http://engineering.twitter.com/2011/05/engineering-behind-twitters-new-search.html.

9. O'Brien, Kevin. Why Media Literacy? A Catholic Reflection. [Online] [Cited: Nov. 30, 2012.] http://www.medialit.org/reading-room/why-media-literacy-catholic-reflection.

10. IDC European Software Predictions. Woodward, Alys, et al. 2012, IDC.

11. IDC Worldwide Big Data Taxonomy. Woo, Benjamin, et al. 2011.

12. Jeffrey Dean, Sanjay Ghemawat. 2004, OSDI, p. 13.

13. Jablonski, Joey. Introduction to Hadoop. Fremont : Dell Inc., 2011.

14. Adam Lith, Jakob Mattsson. Investigating storage solutions for large data. Goteborg : Chalmers University of Technology, 2010.

15. Kelly, Jeff. Big Data: Hadoop, Business Analytics and Beyond. Nov. 8, 2012.


16. DuCharme, Bob. Learning SPARQL. s.l. : O'Reilly, 2011.

17. Berners-Lee, Tim. Linked Data Design Issues. [Online] 06 18, 2009. http://www.w3.org/DesignIssues/LinkedData.html.

18. Matthews, Andrew. Understanding SPARQL. [Online] 2008. http://www.ibm.com/developerworks/xml/tutorials/x-sparql/section3.html.

19. SPARQL endpoint. [Online] 2011. http://semanticweb.org/wiki/SPARQL_endpoint.

20. HTTP Requests. [Online] http://docs.oracle.com/javaee/1.4/tutorial/doc/HTTP2.html.

21. Ivan Herman, Sergio Fernandez, Carlos Tejo. SPARQL Endpoint interface to Python. [Online] 2008. http://sparql-wrapper.sourceforge.net/.

22. Nielsen, Jakob. 10 Usability Heuristics for User Interface Design. [Online] 1995. http://www.nngroup.com/articles/ten-usability-heuristics/.

23. Hartig, Olaf. Querying Linked Data with SPARQL. [Online] 2009. http://www.slideshare.net/olafhartig/querying-linked-data-with-sparql.

24. Public Data Sets on AWS. [Online] http://aws.amazon.com/publicdatasets.

25. SparqlEndpoints. [Online] 2013. http://esw.w3.org/topic/SparqlEndpoints.

 

     


Appendix

Here I list the code snippets described in the methodology.

SPARQLWrapper.js

(function (root, factory) {
  if (typeof define === "function") {
    define("SPARQLWrapper", factory); // AMD || CMD
  } else {
    root.SPARQLWrapper = factory(); // <script>
  }
}(this, function () {
  'use strict';

  function SPARQLWrapper(endpoint) {
    this.endpoint = endpoint;
    this.queryPart = "";
    this.type = "json";
  }

  SPARQLWrapper.prototype = {
    constructor: SPARQLWrapper,

    // Store the query as a form-encoded request body.
    setQuery: function (query) {
      // encodeURIComponent (rather than encodeURI) so that "&", "+", "#"
      // and "=" inside the query are escaped in the form body.
      this.queryPart = "query=" + encodeURIComponent(query);
    },

    setType: function (type) {
      this.type = type.toLowerCase();
    },

    // Usage: query(callback) or query(type, callback).
    query: function (type, callback) {
      if (callback === undefined) {
        callback = type;
      } else {
        this.setType(type);
      }

      var xhr = new XMLHttpRequest();
      xhr.open('POST', this.endpoint, true);
      xhr.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');

      // Map the short type name to the MIME type sent in the Accept header.
      var accept;
      switch (this.type) {
        case "xml":
          accept = "application/sparql-results+xml";
          break;
        case "html":
          accept = "text/html";
          break;
        case "json":
        default:
          accept = "application/sparql-results+json";
          break;
      }
      xhr.setRequestHeader("Accept", accept);

      xhr.onreadystatechange = function () {
        if (xhr.readyState == 4) {
          var sta = xhr.status;
          if (sta == 200 || sta == 304) {
            callback(xhr.responseText);
          } else if (typeof console !== "undefined") {
            console.error("Sparql query error: " + xhr.status + " " + xhr.responseText);
          }
          // Release the handler and the request object once finished.
          window.setTimeout(function () {
            xhr.onreadystatechange = new Function();
            xhr = null;
          }, 0);
        }
      };

      xhr.send(this.queryPart);
    }
  };

  return SPARQLWrapper;
}));