cise overview and big data - science automation technologies · • critical to transforming how...

46
CISE Overview and Big Data Suzi Iacono CISE Directorate National Science Foundation SI^2 Workshop January 17, 2013 Image Credit: Exploratorium.

Upload: others

Post on 28-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

CISE Overview and Big Data

Suzi Iacono CISE Directorate

National Science Foundation SI^2 Workshop January 17, 2013

Image&Credit:&Exploratorium.&

Page 2: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Economic Impact of IT •  Growth of IT industry coupled with

productivity gains across the entire economy have had enormous impact.

•  IT industries accounted for 25% of US economic growth since 1995. –  In 2010, IT industries grew 16% and

contributed 5% to overall US GDP

•  Use and production of IT accounted for ~2/3 of the post-1995 growth in labor productivity.

•  IT sector generates jobs: IT jobs have grown 125x faster than employment as a whole between 2001 and 2011, and in 2011, IT workers earned 74% more than the average worker.

•  IT diversifies regional economies to include idea-driven “creative” industries.

Sources: NRC (2009). Assessing the Impacts of Changes in the IT R&D Ecosystem.; NRC (2012). Continuing Innovation in Information Technology.; ITIF (2012). Looking for Jobs? Look to IT in 2010 and Beyond.

Page 3: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

30%&

CISE&Directorate&&

Computing and Communication

Foundations (CCF) Susanne Hambrusch

Algorithmic+Founda1ons+

Communica1on+and+Informa1on+Founda1ons+

So7ware+and+Hardware+

Founda1ons+

Computer and Network Systems (CNS) Keith Marzullo

Computer+Systems+Research+

Networking+Technology+and+

Systems+

Information and Intelligent Systems

(IIS) Howard Wactlar

HumanACentered+Compu1ng+

Informa1on+Integra1on+and+Informa1cs+

Robust+Intelligence+

70%&

CISE&Cross<Cu=ng&Programs&

CISE&Core&Programs&

Cross<FoundaAon&Programs&

Page 4: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

30%&

CISE&Directorate&&

Computing and Communication

Foundations (CCF) Susanne Hambrusch

Algorithmic+Founda1ons+

Communica1on+and+Informa1on+Founda1ons+

So7ware+and+Hardware+

Founda1ons+

Computer and Network Systems (CNS) Keith Marzullo

Computer+Systems+Research+

Networking+Technology+and+

Systems+

Information and Intelligent Systems

(IIS) Howard Wactlar

HumanACentered+Compu1ng+

Informa1on+Integra1on+and+Informa1cs+

Robust+Intelligence+

Office of Cyberinfrastructure

(OCI) Alan Blatecky

70%&

CISE&Cross<Cu=ng&Programs&

CISE&Core&Programs&

Cross<FoundaAon&Programs&

Page 5: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

!"Word"Cloud"created"from"CISE"FY"2011"award"9tles."

Page 6: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science
Page 7: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Computer+Science+&+Informa1on+Science+&+Computer+Engineering+(CISE),+65%+

Engineering+(excluding+Computer+

Engineering),+11%+

Interdisciplinary+Centers,+3%+ Sciences+&+

Humani1es,+21%+

PI&and&Co<PI&Departments&for&FY&2011&Awards&Funded&by&CISE&

Who is the CISE community?

Page 8: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Research Frontiers

Data&Explosion&Smart&Systems:&

Sensing,&Analysis&and&Decision&

Expanding&the&Limits&of&ComputaAon&

Secure&Cyberspace& Universal&ConnecAvity& AugmenAng&Human&CapabiliAes&

Page 9: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Image+Credit:+CCC+and+SIGACT+CATCS+

Advances in information technologies are transforming the

fabric of our society and data represents a transformative new

currency for science, engineering, education and commerce.

Page 10: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Where+do+the+data+come+from?+++Why+do+we+have+a+na1onal+ini1a1ve?+

Page 11: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

The Big Data Landscape I: Big Science

•  Science gathers data at an ever-increasing rate across all scales and complexities of natural phenomena

•  Sloan Digital Sky Survey in 2000 collected more data in its 1st few weeks than had been amassed in the entire history of astronomy –  Within a decade, over 140 terabytes of information

collected •  Large Hadron Collider generates scores of petabytes a year •  The proposed Large Synoptic Survey Telescope (3.3 gigapixel

digital camera) will generate 40 terabytes of data nightly •  By 2015, the world will generate the equivalent of

approximately 93 million Libraries of Congress

Page 12: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Source: Sajal Das, Keith Marzullo

Personal+Sensing+

Public+Sensing+

Social+Sensing+

People<Centric&Sensing&

Actions (controllers)

Percepts (sensors)

Agent (Reasoning)

Smart&Health&Care&

Situation Awareness: Humans as sensors feed multi-modal data streams

Pervasive&&&&&CompuAng&&

Social&&&&&&&&&&&InformaAcs&&

Sense+

Iden1fy+

Assess+

Intervene+

Evaluate+

Emergency&Response&Environment&Sensing&

The Big Data Landscape II: Smart Sensing, Reasoning and Decision-making

Credit:+Photo+by+US+Geological+Survey++

Page 13: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

The Big Data Landscape III: New Paradigms for Communications

1988$Remarkable

Pace of Innovation

EMAIL VOIP BLOGS

Today$

MOBILE SOCIAL NETWORKS

VIDEO

Page 14: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

The Big Data Landscape IV: The Long Tail of Science

•  Hundreds of thousands of scientists and engineers work individually or in small, distributed, disconnected groups – all generating data that collectively represent an enormous, largely untapped scientific resource

–  From running simulations, experiments, etc.

•  Making heterogeneous data across many areas of science more homogeneous could give way to breakthroughs across all areas of science and engineering

•  Estimated 40 exabytes of unique new information generated worldwide in 2010

•  Only 5% of the information created is “structured,” however, in a standard format of words or numbers; the rest are unstructured text, voice, images, etc.

Page 15: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

How Big is Big?

•  “Big Data”: “Datasets whose size are beyond the ability of typical database software tools to capture, store, manage, and analyze”

-McKinsey Global Institute, Big data: the next frontier for innovation, competition, and productivity, May 2011.

Image+Credit:+Sigrid"Knemeyer+

Page 16: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

…Not Just Volumes of Data

•  The science of big data is not just about volumes and velocity of data, but also – Heterogeneity and diversity

•  Levels of granularity •  Media formats •  Scientific disciplines

– Complexity •  Uncertainty •  Incompleteness •  Representation types

Page 17: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Why is Big Data Important? •  Critical to transforming how science is done and to

accelerating the pace of discovery in almost every science and engineering discipline

•  Transformative implications for commerce and economy •  Potential for addressing some of society’s most pressing

challenges

Image+Credit:+Chi"Birmingham+

Page 18: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Paradigm Shift: from Hypothesis-driven to Data-driven Discovery

hVp://www.sciencemag.org/site/special/data/+ hVp://www.economist.com/node/15579717++hVp://research.microso7.com/enAus/collabora1on/fourthparadigm/++

"""

The"Fourth"Paradigm:"Data!Intensive"Scien9fic"

Discovery"(2009,"Microso7+Corpora1on).+

+++++++

"""

The"Economist,"The+data+deluge+and+how+to+handle+it:+A+14Apage+special+report+(Feb+25,+

2010).++++++++

Page 19: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

The Age of Data: From Data to Knowledge to Action

•  Data-driven discovery is revolutionizing scientific exploration and engineering innovations

•  Automatic extraction of new knowledge about the physical, biological and cyber world continues to accelerate

•  Multi-cores, concurrent and parallel algorithms, virtualization and advanced server architectures will enable data mining and machine learning, and discovery and visualization of Big Data

Page 20: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Potential for Transformational Science & Engineering:

From Data to Knowledge to Action

•  Integration of discipline (or media format…) specific data, examine for relationships – Disaster informatics

•  3D toxic fume images •  Simulations of gas spread •  Maps of census concentrations •  First responder on-the- ground findings •  Evacuation routing

Page 21: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Examples of Research Challenges

•  More data are being collected than we can store •  Analyze the data as it becomes available •  Decide what to archive and what to discard

•  Many data sets are too large to download •  Analyze the data wherever it resides

•  Many data sets are too poorly organized to be usable •  Better organize and retrieve data

•  Many data sets are heterogeneous in type, structure, semantics, organization, granularity, accessibility … •  Integrate and customize access to federate data

•  Utility of data is limited by our ability to interpret and use it •  Extract and visualize actionable knowledge •  Evaluate results

•  Large and linked datasets may be exploited to identify individuals •  Design management and analysis with built-in privacy

preserving characteristics

Page 22: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

A National Imperative

Source: PCAST (December 2010), “Report to the President and Congress: Designing a Digital Future…”– a periodic congressionally-mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program.

•  PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing the increasing quantities of data.

•  Furthermore, PCAST observed that the potential to gain new insights … to move from data to knowledge to action has tremendous potential to transform all areas of national priority.

Page 23: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Administration’s Big Data Research and Development Initiative

•  Big Data Senior Steering Group – chartered in spring 2011 under the Networking and Information Technology R&D (NITRD) Program –  Members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA, NASA, NIST, NOAA, NSA, OFR, USGS, etc. –  Co-chaired by NSF (and NIH) –  Initial charge was to come up with a plan, a strategy

23+

Image+Credit:+Fuqing"Zhang"and"Yonghui"Weng,"Pennsylvania"State"University;"Frank"Marks,"NOAA;"Gregory"P."Johnson,"Romy"Schneider,"John"Cazes,"Karl"Schulz,"Bill"Barth,"The"University"of"Texas"at"Aus9n+

Page 24: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Big Data Membership Biven&& Laura&& DOE&Science&& [email protected]&&

Blatecki&& Alan&& NSFNational&Science&Foundation&& [email protected]&&

Collica&& Leslie&& NISTNational&Institute&of&Standards&and&Technology&& [email protected]&&

Deift&& Abby&& NSFNational&Science&Foundation&& [email protected]&&

Downing&& Gregory&& HHSDepartment&of&Health&and&Human&Services&& [email protected]&&

Espina&& Pedro&&OSTPWhite&House&Office&of&Science&and&Technology&Policy&&

[email protected]&&

Gerr&& Neil&& DARPADefense&Advanced&Research&Projects&Agency&& [email protected]&&

Gundersen&& Linda&& USGSU.S.&Geological&Survey&& [email protected]&&

Hall&& Alan&&NOAANational&Oceanic&and&Atmospheric&Administration&&

[email protected]&&

Iacono&& Suzanne&& NSFNational&Science&Foundation&& [email protected]&&

Jakubek&& David&& OSDOffice&of&the&Secretary&of&Defense&PATL&& [email protected]&&

Kaufman&& Daniel&& DARPADefense&Advanced&Research&Projects&Agency&& [email protected]&&

Larson&& Phillip&P.&&OSTPWhite&House&Office&of&Science&and&Technology&Policy&&

[email protected]&&

Lee&& Tsengdar&&NASANational&Aeronautics&and&Space&Administration&& [email protected]&&

Lipman&& David&&NIHNational&Institutes&of&Health&/NLMNIH’s&National&Library&of&Medicine&/NCBI&&

[email protected]&&

Little&& Michael&& NASANational&Aeronautics&and&Space&Administration&& [email protected]&&

Luker&& Mark&&NCONational&Coordination&Office&for&NITRD&/NITRDNetworking&and&Information&Technology&Research&and&Development&&

[email protected]&&

Marth&& Lisa&& NISTNational&Institute&of&Standards&and&Technology&& [email protected]&&

Muoio&&Patricia&A.&&

DNI&& [email protected]&&

Pantula&& Sastry&& NSFNational&Science&Foundation&& [email protected]&&

Preuss&& Don&&NIHNational&Institutes&of&Health&/NLMNIH’s&National&Library&of&Medicine&/NCBI&&

[email protected]&&

Quade&& Brittany&& NSFNational&Science&Foundation&& [email protected]&&

Romine&& Charles&& NISTNational&Institute&of&Standards&and&Technology&& [email protected]&&

Smith&& Darren&&NOAANational&Oceanic&and&Atmospheric&Administration&&

[email protected]&&

Spengler&& Sylvia&& NSFNational&Science&Foundation&& [email protected]&&

Statler&& Tom&& NSFNational&Science&Foundation&& [email protected]&&

Strawn&& George&&NCONational&Coordination&Office&for&NITRD&/NITRDNetworking&and&Information&Technology&Research&and&Development&&

[email protected]&&

Suskin&& Mark&& NSFNational&Science&Foundation&& [email protected]&&

Villani&& Jennifer&& NIHNational&Institutes&of&Health&/NIGMS&& [email protected]&&

Wigen&& Wendy&&NCONational&Coordination&Office&for&NITRD&/NITRDNetworking&and&Information&Technology&Research&and&Development&&

[email protected]&&

Zhao&& Fen&& NSFNational&Science&Foundation&& [email protected]&&

Nowell&& Lucy&& DOE&Science&& [email protected]&&

Page 25: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Big Data Membership

Bristol(( Sky(( USGSU.S.(Geological(Survey( ( [email protected]((

Kielman(( Joseph(( DHSDepartment(of(Homeland(Security(( [email protected]((

Petters(( Jonathan((DOE(Science(( [email protected]((

Carver(( Doris(( NSFNational(Science(Foundation(( [email protected]((

Adolfie(( Laura(( OSDOffice(of(the(Secretary(of(Defense( ( [email protected]((

Chadduck(( Robert(( NSFNational(Science(Foundation(( [email protected]((

Crowder(( Grace(( NSANational(Security(Agency(( [email protected]((

Dunn(( Michelle((NIHNational(Institutes(of(Health(( [email protected]( (

Florance(( Valerie(( NIHNational(Institutes(of(Health(( [email protected]((

Frehill(( Lisa(( OSDOffice(of(the(Secretary(of(Defense( ( [email protected]((

Helland(( Barbara(( DOEDepartment(of(Energy(QScience(( [email protected]((

Hoang(( Thuc(( DOEDepartment(of(Energy(QNNSA(( [email protected]((

Kannan(( Nandini(( NSFNational(Science(Foundation(( [email protected]((

Warnow(( Tandy(( NSFNational(Science(Foundation(( [email protected]((

Dean(( David(( DOE(Science(( [email protected]((

Lyster(( Peter(( NIHNational(Institutes(of(Health(( [email protected]((

Millemaci(( John(( OSDOffice(of(the(Secretary(of(Defense(( [email protected]((

Pearce(( Claudia(( NSANational(Security(Agency(( [email protected]( (

Allen(( Marc(( NASANational(Aeronautics(and(Space(Administration(([email protected]((

Pearl(( Jennifer(( NSFNational(Science(Foundation(( [email protected]((

Blaszkowsky((David(( TreasuryDepartment(of(the(Treasury(OFR(( [email protected]( (

Szykman(( James(( EPAEnvironmental(Protection(Agency(( [email protected]((

Tompkins(( Jerry(( NSANational(Security(Agency(( [email protected]((

Flood(( Mark(( TreasuryDepartment(of(the(Treasury(OFR(( [email protected]((

Holm(( Jeanne(( Data.gov(( jeanne.m.holm.jpl.nasa.gov((

Misawa(( Eduardo((NSFNational(Science(Foundation(( [email protected]((

(

Page 26: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Big Data Launch

•  Federal Big Data R&D Initiative launched by White House OSTP on March 29, 2012 at AAAS

•  Federal Announcements:

•  NSF – Subra Suresh •  NIH – Francis Collins •  USGS – Marcia McNutt •  DoD – Zach Lemnios •  DARPA - Ken Gabriel •  DOE – William Brinkman

•  Panel Discussion: •  Moderator - Steve Lohr, New York Times •  Daphne Koller, Stanford University •  James Manyika, McKinsey & Company •  Lucila Ohno-Machado, UC San Diego •  Alex Szalay, Johns Hopkins University

Image+Credit:+Na9onal"Science"Founda9on+

More information available at: http://nsf.gov/news/news_summ.jsp?org=CISE&cntn_id=123607&preview=false

Page 27: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Strategy to Address Big Data

FoundaAonal&research&to&develop&new&techniques&and&

technologies&to&derive&knowledge&from&data&

New&cyberinfrastructure&to&manage,&curate,&and&serve&data&to&

research&communiAes&

New&approaches&for&educaAon&and&workforce&development&

New&types&of&inter<disciplinary&

collaboraAons,&grand&challenges,&and&compeAAons&

Policy&

Page 28: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIG DATA)

Program Solicitation: NSF 12-499 Foundational research to extract knowledge from data

Foundational research to advance the core techniques and technologies for managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets.

CrossADirectorate+Program:+NSF+Wide+

Mul1Aagency+Commitment:+NSF+and+NIH+

Image+Credit:+Jurgen"Schulze,"Calit2,"UC!San"Diego+

Page 29: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

BIG DATA Research Thrusts CollecAon,&Storage,&and&

Management&of&“Big&Data”+

• 3 awards

• Foundations of big data management

• Mitigating tradeoffs among speed of data ingestion, quicker answers and the freshness of data through the design of new storage devices with extreme capacities

• Databridge – linking data, human interactions and usage practices for the long-tail of science

Data&AnalyAcs&&

•  4 awards

• Novel machine learning where multi-dimensional vector data points are replaced by distributions

• Design and test mathematical and statistical techniques for large-scale heterogeneous data in DNA repositories

• Data analytics problems in next generation sequencing

•  Theory and algorithms fro couples tensors and associated software toolkits to make analysis possible

Research&in&Data&Sharing&and&CollaboraAon&

•  1 award (+1 shared with data collection)

• Open source tools for infrastructure for improving discovery through use of social analytic data

• Databridge– linking data, human interactions, and usage practices for the long-tail of science

Credit:+Fermilab+Photo+

Eight+midAscale+(up+to+$1M+a+year)+awards+out+of+over+136+projects+announced+on+Oct.+3.+

Page 30: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Award Citations

•  DCM: –  Dan Suciu – University of WA

•  A formal foundation for big data management –  Michael Bender – SUNY at Stony Brook & Martin Farach-Colton – Rutgers University

•  Eliminating the data ingestion bottleneck in big data applications

–  Arcot Rajasekar – University of North Carolina, Chapel Hill & Gary King – Harvard University & Justin Zhan – North Carolina Agriculture & Technical State University

•  Databridge – A sociometric system for long-tail science data collections

30+

Page 31: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Award Citations •  Data Analytics

– Eli Upfal – Brown University •  Analytic approaches to massive data computation with

applications to genomics – Aarti Singh – Carnegie-Mellon University

•  Distribution-based machine learning for high dimensional datasets

– Srinvas Aluru – Iowa State University & Wuchun Feng – Virginia Polytechnic Institute & State University & Oyekunie Olukotun – Stanford University

•  Genomes Galore – Core techniques, libraries, and domain specific languages for high throughput DNA sequencing

Page 32: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Award Citations

•  Data Analytics (continued) – Christos Faloutsos – Carnegie Mellon

University & Nikolaos Sidiropoulos – University of Minnesota – Twin Cities

•  Big Tensor Mining: Theory Scalable Algorithms and Applications

32+

Page 33: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Award Citations

•  E-Science Collaboration Environments – Thorsten Joachim – Cornell University & Paul

Kantor – Rutgers University •  Discovery and social analysis for large-scale

scientific literature

33+

Page 34: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Ideation Contest Launch

•  Opportunity to expand the innovation ecosystem •  Joint among NASA, NSF and DOE Office of

Science •  A contest focused on “How to make

heterogeneous data seem more homogeneous?” •  5 judges •  5 criteria •  Launched on Challenge.gov and the Top Coder

platform on Oct. 3 with a two week window –  http://challenge.gov/NASA/425-big-data-challenge-

series –  http://community.topcoder.com/coeci/nitrd/

Page 35: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Ongoing Big Data Programs at NSF

•  Dear Colleague Letters: –  Encourage CIF21 IGERTs to educate and support a new

generation of researchers able to address fundamental Big Data challenges: http://www.nsf.gov/pubs/2012/nsf12555/nsf12555.htm

–  Data-Intensive Education-Related Research Funding Opportunities announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments: http://www.nsf.gov/pubs/2012/nsf12060/nsf12060.jsp

–  Data Citation to the Geosciences Community to encourage transparency and increased opportunities for the use and analysis of data sets: http://www.nsf.gov/pubs/2012/nsf12058/nsf12058.jsp

Page 36: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Earthcube: GEO Science Infrastructure •  EAGER awards announced as part of White House Big Data Launch •  Integrates geosciences data and high-performance computing

technologies in an open, adaptable and sustainable framework to enable transformative research and education in Earth System Science

•  Innovative Model: Community designed, community owned, community governed

•  Interdisciplinary research: –  Building and sustaining “new” communities –  Workshops to bring together (GEO, SBE, CISE) communities –  EAGER awards to seed new research

Page 37: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

A Complex Policy Setting

•  Researchers want data. •  Public policy requires access to data. •  Public policy also requires protection of privacy and

intellectual property and other sensitive information. •  Much more to be done: Policy on data management

and data access.

Page 38: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Data Privacy

•  Never more important than today •  However, not all data contain people’s identities (as in data

landscape III) –  Not a Big Brother scenario –  Government (NSF) invests in privacy research

•  Values in design research community: –  Identity cloaking, anonymization –  Do-not-track cookie management –  Obfuscation, blurring –  Privacy preserving data mining, search, payment –  Just-in-time crypto –  Secure data distribution –  ….

Privacy in technology; privacy inspired technology

Page 39: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Emerging Frontiers

Data&Explosion& Smart&Systems:&Sensing,&Analysis&and&Decision&

Expanding&the&Limits&of&ComputaAon&

Secure&Cyberspace& Universal&ConnecAvity& AugmenAng&Human&CapabiliAes&

Page 40: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

10

100

1,000

10,000

100,000

1,000,000

1985 1990 1995 2000 2005 2010 2015 2020Year of Introduction

Processor Performance Plateaued Around the Year 2004

Credit:&Graph&reprinted&with&permission&from&The$Future$of$Compu4ng$Performance:$Game$Over$or$Next$Level?&NaAonal&Academy&of&Sciences&(2011).&

The Expectation Gap

Microprocessor Performance “Expectation Gap” over Time (1985-2020 projected)

Image+Credit:+USC"BMES"ERC+

Page 41: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Impact of Single-Processor Performance Plateau

Accentuated+by+emergence+of+massive&data&sets,+scien1sts+have+an+increasing+appe1te+and+need+for+speed+and+performance.+++

Important+new+science+ques1ons+in+physics,&materials,&biology,&&health&and&medicine,&and&climate&&change&require+increased+processing+power.+

&

Support&of&naAonal&defense&and&intelligence&community&will+need+increasingly+more+processing+power.+

Applica1ons+include+training+simula1ons,+autonomous+robo1c+vehicles,+airport+security,+surveillance,+video+analy1cs,+infrastructure+defense+against+cyber+aVacks,+and+data+analysis+for+intelligence.++

+

Both+consumer&and&enterprise&needs&are+increasing.+

Applica1ons+include+search+and+data+mining,+realA1me+decisionAmaking,++web+services,+digital+content+crea1on,+speech+recogni1on,+and+simula1on+and+modeling+for+product+design.++

Page 42: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Research to Expand the Limits of Computation

Happening&now+• Architectural+innova1ons+with+mul1Acore+and+manyAcore+

• DomainAspecific+integrated+circuits+• EnergyAefficient+compu1ng+and++new+processor+architectures+

Mid<term&soluAons&• Research+agenda+based+on+parallelism,+concurrency,+and+scalability+

• Algorithmic+innova1ons+exploi1ng+parallelism+

• So7ware+systems+leading+to+improved+performance+

Long<term&soluAons&&&• New+materials+(e.g.,+carbon+nanoAtubes,+graphene+based+devices)&

• NonAcharge+transfer+devices;+(e.g.,+electron+spin)++

• Bio,+nano,+and+quantum+devices&

ExploiAng&Parallelism&and&Scalability:&XPS&(NSF&13<507)&

Page 43: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Computing Research Agenda on Parallelism, Concurrency, and Scalability

•  Computational models and programming

languages to enable new ways of “thinking parallel” and expression of parallelism at every scale.

•  Algorithms and algorithmic paradigms that allow reasoning about parallel performance and scalability.

•  Software systems capable of handling both small and extreme-scale data systems and aware of communication and energy use.

•  Synthesis tools that generate efficient parallel codes from high-level descriptions.

•  Scalable and energy-efficient architectures ranging from sensors to clouds while addressing programmability, reliability, and security.

•  A new cross-layer approach integrating both software and hardware through new programming languages, models, algorithms, compilers, runtime systems and architectures.

hVp://www.nap.edu/catalog.php?record_id=12980++

Page 44: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Advanced Computational Infrastructure

XSEDE&

•  Anticipate and invest in diverse and innovative national scale shared resources, outreach and education complementing campus and other national investments

•  Leverage and invest in collaborative flexible “fabrics” dynamically connecting scientific communities with computational resources and services at all scales (campus, regional, national, international)

CIPRES+–+Cyberinfrastructure+for+Phylogenic+Research+

Page 45: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Opportunities for the Future

•  Our investments in research and education have already returned exceptional dividends to the Nation.

•  Many of tomorrow’s breakthroughs will occur as a result of new techniques and technologies for advancing computing science and engineering.

•  In turn, scientific discovery and technological innovation

are at the core of our response to national and societal challenges – from environment, energy, transportation, sustainability, and healthcare to cyber security and national defense.

Page 46: CISE Overview and Big Data - Science Automation Technologies · • Critical to transforming how science is done and to accelerating the pace of discovery in almost every science

Thanks!

[email protected]