Post on 09-Jul-2020

DISA

A Calculus of Culture | Circumventing the Black Box of Culture Analytics, Nanning, 21-22 March 2017

Koraljka Golub, koraljka.golub@lnu.se

Based on presentation by Welf Löwe, welf.lowe@lnu.se

Linnaeus University Center of Excellence for Data Intensive Sciences and Applications

https://lnu.se/disa

Big Data, Information, and Knowledge

Big Data: Analyzing data sets and streams to gain knowledge about technical, scientific, sociological or economical phenomena.

• Data: symbols, signals, bits & bytes, words, numbers, tokens…

• Information: data interpreted in a context, i.e. meaning

• Knowledge: actionable information, i.e. insights allow for controlling processes and giving predictions

Creating knowledge from data: Collaborative effort

Credit: EU CROS portal

What is “Big” today and tomorrow?

• Challenging quantities that go beyond the capability of humans and commonly used software tools

• A constantly moving target

• Ranging from a few dozen terabytes (10^12 bytes) to many petabytes (10^15 bytes) today

• Challenging qualities, as data come in varying and complex formats, consisting of all types of structured and unstructured data

• My computer has ½ terabyte = 512 gigabytes of disk space = 1 year of music, 20 days of video, 150,000 photos, 10 days of movies

Turn data into knowledge to create value

Credit: PLOS Biology, 7 July 2015

Projection for 2025, 1 petabyte = 1024 terabytes
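The figures on these slides mix decimal prefixes (10^12, 10^15) with binary ones (512, 1024). The arithmetic can be checked with a minimal Python sketch. It assumes the binary convention implied by "½ terabyte = 512 gigabytes" and "1 petabyte = 1024 terabytes". The constant and function names are illustrative only and do not come from the presentation.

```python
# Binary storage units, as in "1 petabyte = 1024 terabytes".
GIB = 1024 ** 3   # bytes in one (binary) gigabyte
TIB = 1024 ** 4   # bytes in one (binary) terabyte
PIB = 1024 ** 5   # bytes in one (binary) petabyte


def to_gigabytes(n_bytes):
    """Convert a byte count to binary gigabytes."""
    return n_bytes / GIB


half_terabyte = TIB // 2
print(to_gigabytes(half_terabyte))   # 512.0
print(PIB // TIB)                    # 1024

# A decimal terabyte (10**12 bytes) is slightly smaller than a binary one.
print(round(10 ** 12 / TIB, 2))      # 0.91
```

The two conventions differ by roughly 9% at the terabyte scale, which is negligible for the order-of-magnitude statements made here.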

Big Data – profile of LNU

• DISA@LNU:

- Multidisciplinary fundamental and applied Big Data research

- Linnaeus University Center (ca 70 MSEK from LNU alone until 2021)

• DISA@IEC:

- Applied Big Data research and innovation with industry

- IEC is an ICT cluster organizing collaborations of academia and industry (ca 6 MSEK from EU, LNU, Regions of Kalmar and Kronoberg, Tunga fordon, TEC until 2018)

• iSchools:

- Effort in Big Data related courses and education programs

IT Core – Foundations and Technologies

Transforming Big Data to Information to Knowledge

• Signal processing: signal analysis and theories with application to direct and inverse problems

• Statistical analysis: collection, analysis, interpretation, presentation, and organization of (sparse, high-dimensional) data

• Machine learning: classification, estimation, prediction based on data

• Visual analytics: analytical reasoning about data facilitated by interactive visual interfaces

Coping with variety, velocity, and volume of Big Data

• Composition, adaptation: building systems from (adapted) components

• Self-adaptation: let systems adapt based on observations of their state and environment

• Future Internet: the Internet as a distributed, mobile data collection and computing platform (Cloud Computing, Internet of Things)

• Parallel & high-performance computing: scale with data processing and storage

Big Data Application Areas

1. Astrophysics

2. Wood and Building Technologies

3. Engineering of Smarter Systems

4. Software and Information Quality

5. Digital Humanities: English Linguistics, Media & Journalism, Library and Information Science

6. Computational Social Sciences

7. eHealth

eHealth: Improving our health systems

What if we could gain actionable knowledge from all European health registers?

Credit: BigApple Horizon 2020 proposal.

Accelerating DNA analysis

What if we could …

• Reduce time for pediatricians to scan and analyze the entire genome of a critically ill infant?

• Explore differences and similarities of all organisms and their evolutionary relationship?

GenBank dataset

• 188,372,017 DNA sequences (Oct 2015)

• One human DNA sequence ~3 gigabytes

Our DNA sequence analysis application*

• Finds patterns in large-scale DNA sequences

• Speedup 10x compared to baseline on regular PC

Credit: http://www.biol.unt.edu/

* Suejb Memeti, Sabri Pllana: Accelerating DNA Sequence Analysis using Intel Xeon Phi, ISPA 2015 IEEE.
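To make "finds patterns in DNA sequences" concrete, here is a minimal Python sketch of exact pattern search over a sequence string. It is a plain, sequential illustration and not the accelerated method of the cited paper. The function name and the toy sequence are made up for this example.

```python
def find_pattern(sequence, pattern):
    """Return all 0-based start positions where pattern occurs in sequence.

    Overlapping occurrences are included.
    """
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Continue one character further so overlapping matches are found.
        start = sequence.find(pattern, start + 1)
    return positions


dna = "ACGTACGTGACG"  # made-up toy sequence, not real data
print(find_pattern(dna, "ACG"))  # [0, 4, 9]
```

At the scale mentioned on the slide (gigabytes per sequence, many patterns at once), a scan like this becomes the bottleneck, which is what motivates the parallel hardware.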

Understanding language

What if we could …

• Analyze for instance what people in Sweden talk about in public?

• Explore the relationship between language and thought: public sentiment, consumer trends, opinions, …

• Look in real-time through a window to the world

The Nordic Tweet Stream initiative (NTS)

• A robot to monitor the geocoded Tweet stream in five Nordic countries

• Strict ethical guidelines: focus on macro-level patterns, not individuals

• Text and data mining tools to analyze big and complex data sets

• User-generated content: what do we talk about on a really big scale?

• A consortium of linguists and computer scientists
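The Nordic Tweet Stream idea, keeping only geocoded tweets from the five Nordic countries and looking at macro-level patterns rather than individuals, can be sketched in a few lines of Python. The record layout (a dict with `country` and `text` keys), the function name, and the sample data are all hypothetical. The actual NTS pipeline is not described in the slides.

```python
from collections import Counter

# ISO country codes for Denmark, Finland, Iceland, Norway, Sweden.
NORDIC = {"DK", "FI", "IS", "NO", "SE"}


def word_counts_by_country(tweets):
    """Aggregate word frequencies per country; user identity is never read."""
    counts = {}
    for tweet in tweets:
        country = tweet.get("country")
        if country not in NORDIC:
            continue  # skip tweets without a geocode or from other countries
        words = tweet["text"].lower().split()
        counts.setdefault(country, Counter()).update(words)
    return counts


# Hypothetical sample records, for illustration only.
sample = [
    {"country": "SE", "text": "Snow in Växjö today"},
    {"country": "SE", "text": "snow again"},
    {"country": "DE", "text": "not Nordic"},
    {"text": "no geocode"},
]
result = word_counts_by_country(sample)
print(result["SE"]["snow"])  # 2
print(sorted(result))        # ['SE']
```

Aggregating per country before any analysis is one simple way to keep the focus on macro-level patterns.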

[Figure: the IT Core (signal processing, statistical analysis, machine learning, visual analytics, composition & adaptation, self-adaptation, distributed computing, Future Internet, parallel & high-performance computing) surrounded by applications in engineering, sciences & humanities: astrophysics, wood/building technologies, software & information quality, media & journalism, English linguistics, library & information science, computational social sciences, eHealth]

Ongoing Big Data collaborations

Collaboration with industry

• Common research projects with Atlas Copco, Ericsson, IBM, Intel, Meltwater, NVIDIA, Sigma Technology, Telia, Vattenfall, Yaskawa, …

• Interests across industries

Collaboration with branch organizations

• IT industry: IEC, SwedSoft

• Building and construction: Smart Housing Småland, GodaHus

• Heavy vehicle industry: Tunga fordon

• Manufacturing: LTC, TEC

Collaboration with public sector

• Active project involvement of and financial support from the regional municipalities

• eHälsomyndigheten established in Kalmar largely due to the activities at the eHealth Institute

• Collaboration with Strålsäkerhetsmyndigheten

New Big Data collaborations

• Offer towards industry, the public sector, and interested LNU researchers

• Various research competences that we can contribute with

• Many ways of collaboration

- High-Performance Computing Center

- Data and Text Streaming Platform (social media and web data)

- Thesis and seed projects

- Digitalization courses and seminars for industry (with IKEA)

- Development of and contribution to educational programs

- Workshops for developing realistic research and innovation projects with external funding

- Research and innovation projects together with LNU

- …

High-Performance Computing Center (HPCC)

• The HPCC will be a high-performance computing platform providing advanced computing and storage infrastructure and knowledgeable scientific and technical staff.

• It will provide services to scientists and to the regional industry.

• The platform complements the large-scale computing and storage infrastructures that are available at the national level, e.g., the Swedish National Infrastructure for Computing (SNIC).

• Core exists: cluster with 20 servers, two accelerated single-node systems; suitable for experimentation with limited-size problems.

• DISA invests ca. 3 MSEK, but more funding will be needed

• Contact Sabri.Pllana@lnu.se

Thesis and Seed projects

• Collaborative innovation and research in Big Data with Master’s and PhD students and senior researchers involved

• Explore ideas and jump start collaboration projects

• Multidisciplinary collaboration between industry, the public sector and LNU

• Short start up time, 3-6 month activities, budget of max 200 KSEK

• Why

- Idea testing and prototyping

- Explorative study

- Get in contact with students as (future) customers

- Get in contact with teachers as part of the marketing strategy

- Assessment of students for hiring

- IT support for internal development and innovation

- Pilot for a larger research project

• Contact Diana.UnanderNordle@lnu.se

Research and innovation projects together with LNU

• Requirements as for seed projects (multidisciplinary and/or industry-academia)

• Longer planning and decision time (ca. 1 year)

• Longer project time (3-5 years)

• Higher funding (3-30 MSEK) but usually in-kind contributions required

• Why

- LNU: keep or get back control over your research subject when there is a paradigm shift towards data-driven scientific approaches

- Industry: do not miss benefitting from the data you have access to and getting necessary R&D resources for free

• Contact Welf.Lowe@lnu.se
