big data brighton | big data in academia | jan 2013

25 January 2013 at University of Brighton, http://meetup.com/Big-Data-Brighton

DESCRIPTION

Four talks about Big Data in Academia at Big Data Brighton Jan 2013. Two of the talks' slides are here. I'll upload Miltos' slides when I receive them. Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters. Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.

TRANSCRIPT

Page 1: Big Data Brighton | Big Data in Academia | Jan 2013

January 2013 at

University of Brighton

http://meetup.com/Big-Data-Brighton

Page 2: Big Data Brighton | Big Data in Academia | Jan 2013

Agenda

• Miltos Petridis, Professor of Computer Science, University of Brighton

• Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters.

• Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.

• Kevin Long, Teradata - Summary and Business context

Page 3: Big Data Brighton | Big Data in Academia | Jan 2013
Page 4: Big Data Brighton | Big Data in Academia | Jan 2013

Big Data

“A new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-speed capture, discovery and/or analysis” [1]

New investment initiatives are coming, such as in the US in 2012:

“more than $200 million in new funding through six agencies and departments to improve the nation’s ability to extract knowledge and insights from large and complex collections of digital data” [2]

Page 5: Big Data Brighton | Big Data in Academia | Jan 2013

Knowledge and insights... hmm

Before companies rush to use the technologies they should be asking some questions:

• Can we make any assumptions about the quality of the data we are using?

• Is there a significant difference between structured and unstructured data?

• Can the underlying structure of the data affect what you can do with it?

Page 6: Big Data Brighton | Big Data in Academia | Jan 2013

In this brief talk, I will be examining these questions with reference to my research and recent trends

Page 7: Big Data Brighton | Big Data in Academia | Jan 2013

Can we make any assumptions about the quality of the data we are using?

• One of the problems with the recent explosion in the amount of data is that some data (particularly data collected from social networking sites) is of dubious quality

– A straw poll of my students found that 1 in 5 deliberately enter incorrect data about themselves online to protect their identity

• We might not have any assurance that the data is true or that it is correctly linked to metadata

– Is data typed?

– Is the data related to other data? How is it related?

– Are relationships between data and its meaning being lost?

Page 8: Big Data Brighton | Big Data in Academia | Jan 2013

A view of different data models [3]

Page 9: Big Data Brighton | Big Data in Academia | Jan 2013

Is there a significant difference between structured and unstructured data?

• How is data structured?

• Does the underlying data model matter?

• What are the options for a data model?

• Over the years many models of data have evolved and most are still in use

• Data models used give insights into assumptions about the semantics of the data

Page 10: Big Data Brighton | Big Data in Academia | Jan 2013

Finding meaning from ‘flat’ data

• A problem with ‘flat’ or unstructured data representations is that they have traditionally been difficult to aggregate and present to users in a way that users can understand

• In contrast, structured data can be summarised easily and its structure represents the meaning of data within an organization

• Data analytics are changing this by presenting accessible information from ‘flat’ data
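
The contrast on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the records, the field names (`department`, `students`) and both helper functions are hypothetical. It shows that structured records can be totalled directly from their fields, while the same facts held as 'flat' text need a heuristic and some outside knowledge (here, a list of department names) before they can be summarised.

```python
from collections import defaultdict

# Hypothetical structured records: every value sits in a named, typed field.
structured = [
    {"department": "Computing", "students": 120},
    {"department": "Computing", "students": 80},
    {"department": "Business", "students": 200},
]

# The same facts as 'flat' free text, with no declared fields.
flat = [
    "Computing has 120 students on the first course",
    "another 80 students joined Computing",
    "200 students are in Business",
]


def summarise_structured(records):
    """Total students per department, relying on the known field names."""
    totals = defaultdict(int)
    for record in records:
        totals[record["department"]] += record["students"]
    return dict(totals)


def summarise_flat(lines, known_departments):
    """Naive heuristic: pair the first number in a line with the first
    known department name found in the same line."""
    totals = defaultdict(int)
    for line in lines:
        words = line.split()
        numbers = [int(w) for w in words if w.isdigit()]
        departments = [w for w in words if w in known_departments]
        if numbers and departments:
            totals[departments[0]] += numbers[0]
    return dict(totals)
```

On this toy input both functions reach the same totals, but the flat version only works because the caller supplies the department names and because each line happens to contain exactly one number. Real unstructured text offers no such guarantee.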

Page 11: Big Data Brighton | Big Data in Academia | Jan 2013

Can the underlying structure of the data affect what you can do with it?

• The short answer from my research is ‘YES’

• How it affects what you can do with the data is the long answer

– It is really easy to store a piece of data but retrieving it (intact with its meaning and its relationships to other data) is more difficult

– When ‘Big Data’ technologies are used to extract knowledge and insights from the data we should be sure that the technology is not introducing new problems

Page 12: Big Data Brighton | Big Data in Academia | Jan 2013

Impedance mismatch problems

• Moving data from one paradigm to another often causes the meaning to be lost

• Can cause problems for developers who move data from one paradigm to another

• Also a problem for end users who may lose the connections
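
One way to picture the loss of meaning described here is a small Python sketch. It is an illustration under assumed data, not the speaker's research code: the `students` and `enrolments` tables and the two helper functions are hypothetical. Keyed, relational-style records are exported to a flat CSV that keeps only names, and reading the file back can no longer tell two different people apart.

```python
import csv
import io

# Hypothetical relational-style data: each enrolment refers to a student by key.
# The two students are different people who happen to share a name.
students = {1: {"name": "Ann"}, 2: {"name": "Ann"}}
enrolments = [
    {"student_id": 1, "module": "Databases"},
    {"student_id": 2, "module": "Linguistics"},
]


def export_flat(students, enrolments):
    """Write enrolments as CSV text, replacing the key with the student's name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "module"])
    for enrolment in enrolments:
        name = students[enrolment["student_id"]]["name"]
        writer.writerow([name, enrolment["module"]])
    return buffer.getvalue()


def import_flat(text):
    """Read the CSV back, grouping modules by the only identifier left: the name."""
    modules_by_name = {}
    for row in csv.DictReader(io.StringIO(text)):
        modules_by_name.setdefault(row["name"], []).append(row["module"])
    return modules_by_name


# Two students went in; after the round trip they look like one person
# enrolled on two modules, because the key was dropped on export.
round_trip = import_flat(export_flat(students, enrolments))
```

Nothing in the export step fails or raises an error, which is part of the problem: the relationship disappears silently, and the developer or end user only notices when the data is read back.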

Page 13: Big Data Brighton | Big Data in Academia | Jan 2013

A way forward

• Working out goals in your data management

• Understanding the structure of the data you are using, wherever it comes from

• Getting assurance about the quality of the data

• Then having confidence that the knowledge and insights are based on firm foundations

Page 14: Big Data Brighton | Big Data in Academia | Jan 2013

Thank you

Any questions?

Page 15: Big Data Brighton | Big Data in Academia | Jan 2013

References

1. Carter, P. (2011). Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO. SAS white paper, IDC Go-to-Market Services.

2. Gianchandani, E. (2012). Obama administration unveils $200m big data R&D initiative. The Computing Community Consortium (CCC) Blog.

3. Angles, R. and Gutierrez, C. (2008). Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008).

Page 16: Big Data Brighton | Big Data in Academia | Jan 2013

Event Detection on Twitter

Simon Wibberley

Text Analytics Group

University of Sussex


Page 17: Big Data Brighton | Big Data in Academia | Jan 2013

What are Events? We just don’t know.

Page 18: Big Data Brighton | Big Data in Academia | Jan 2013

Event Categories

Constrained Unconstrained

Well Reported

Poorly Reported

Interesting

Relatively Easy Interesting

Very Tricky

Page 19: Big Data Brighton | Big Data in Academia | Jan 2013

Algorithms

• Query Driven

– Volume / rate analysis of matching data

– Addresses constrained event type

• Data Driven

– Mine stream for interesting data

– Addresses unconstrained event type
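
The query-driven approach (volume / rate analysis of matching data) might be sketched as follows. This is a minimal illustration, not the Text Analytics Group's algorithm: the function names, the fixed 60-second window and the mean-plus-standard-deviation threshold are all assumptions chosen for the example. Tweets are assumed to arrive as `(timestamp_in_seconds, text)` pairs.

```python
from collections import Counter
from statistics import mean, pstdev


def matching_counts(tweets, query, window_seconds=60):
    """Count tweets containing the query string in each fixed time window."""
    counts = Counter()
    for timestamp, text in tweets:
        if query.lower() in text.lower():
            counts[int(timestamp // window_seconds)] += 1
    return counts


def burst_windows(counts, threshold=2.0):
    """Return the windows whose count exceeds mean + threshold * std deviation.

    Windows between the first and last match that contain no matching
    tweets are included in the baseline with a count of zero.
    """
    if not counts:
        return []
    windows = range(min(counts), max(counts) + 1)
    values = [counts[w] for w in windows]  # a Counter returns 0 for missing keys
    limit = mean(values) + threshold * pstdev(values)
    return [w for w in windows if counts[w] > limit]
```

A caller would run `burst_windows(matching_counts(tweets, "dressage gold"))` and treat each returned window as a candidate event. Because the query fixes what is being looked for, this only addresses the constrained event type; the data-driven case needs a way to find interesting terms without a query.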

Page 20: Big Data Brighton | Big Data in Academia | Jan 2013

GB Dressage Gold

Page 21: Big Data Brighton | Big Data in Academia | Jan 2013

London Riots

Page 22: Big Data Brighton | Big Data in Academia | Jan 2013

London Riots

Page 23: Big Data Brighton | Big Data in Academia | Jan 2013

Event Characterisation

• Fill in unknowns

• Self explanatory for (very) constrained events

• Select representative / well formed Tweet[s]

• Term relevance / clustering

• Topic analysis

• Geo-location / Entity extraction
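
As a rough illustration of selecting a representative Tweet by term relevance, the sketch below scores each tweet by how widely its terms are shared across the other tweets in the same detected event. The stopword list, the tokeniser and the scoring rule are hypothetical simplifications for the example, not the method used in the talk.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "in", "of", "to", "and", "is", "for", "on", "at"}


def tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,!?#@:;\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def term_weights(tweets):
    """Document frequency: in how many tweets does each term appear?"""
    weights = Counter()
    for tweet in tweets:
        weights.update(set(tokens(tweet)))
    return weights


def most_representative(tweets):
    """Pick the tweet whose distinct terms have the highest total document
    frequency, i.e. the one sharing the most vocabulary with the rest."""
    weights = term_weights(tweets)

    def score(tweet):
        return sum(weights[term] for term in set(tokens(tweet)))

    return max(tweets, key=score)
```

Summing the weights favours longer tweets, while averaging them would favour very short ones such as a single repeated word, so a practical scorer would need some form of length normalisation and a check that the chosen tweet is well formed.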

Page 24: Big Data Brighton | Big Data in Academia | Jan 2013

CASM

• Centre for the Analysis of Social Media

• Collaboration between DEMOS and TAG

• Applying text analytics to social media to answer sociological questions

• OSI funded EU sentiment analysis pilot project

http://www.demos.co.uk/projects/casm/

Page 25: Big Data Brighton | Big Data in Academia | Jan 2013

Ethics

Narrow Broad

Anonymous

Identity Preserving

Stasi

Judiciary

Me!

Social Science

Reffin, J (2012)