Fran Berman, Data and Society, CSCI 4370/6370 Data and Society Lecture 7: Data Infrastructure 3/30/18


Page 1: Data and Society Lecture 7: Data Infrastructurebermaf/Data Course 2018/Lecture 7... · 2018-03-30 · Fran Berman, Data and Society, CSCI 4370/6370 Also in 1993: The Top500 List created

Fran Berman, Data and Society, CSCI 4370/6370

Data and Society

Lecture 7: Data Infrastructure

3/30/18


Announcements 3/30

• Make sure you sign up and do 2 presentations by the end of the semester.

• Check what you think your grades are (attendance, op-ed, and presentation scores) with Fran during office hours. You are responsible for being sure that these are accurate.


Wednesday Section | Friday Lecture: First Half of Class | Friday Lecture: Second Half of Class | Assignments

January 17: NO class | January 19 L1: CLASS INTRO AND LOGISTICS | Presentation Model / Op-Ed Instructions | Op-Ed instructions

January 24: NO class | January 26 L2: BIG DATA 1 | 4 Presentations | -

January 31: NO class | February 2 L3: BIG DATA 2 -- IoT | 4 Presentations | -

February 7: NO class | February 9 L4: DATA AND SCIENCE | 4 Presentations | Op-Ed due Feb. 9

February 14: 5 Presentations | February 16 L5: DATA AND HEALTH / LESLIE McINTOSH GUEST SPEAKER | 4 Presentations | Op-Ed drafts returned Feb. 21

February 21: 5 Presentations | February 23 L6: DATA STEWARDSHIP AND PRESERVATION | 4 Presentations | Research Paper instructions

February 28: 5 Presentations | March 2: CLASS CANCELED DUE TO SNOW | - | -

March 7: 5 Presentations | March 9: NO CLASS / PAPER PREPARATION | - | Op-Ed Final due March 7

March 14: Spring Break | March 16: SPRING BREAK | - | -

March 21: NO class | March 23: NO CLASS / PAPER PREPARATION | - | -

March 28: 4 Presentations | March 30 L7: INFRASTRUCTURE | 4 Presentations | Research Paper due March 28

April 4: NO class | April 6 L8: DATA RIGHTS, POLICY, REGULATION | 4 Presentations | -

April 11: 4 Presentations | April 13 L9: DATA AND ETHICS | 4 Presentations | -

April 18: 4 Presentations | April 20 L10: DATA AND COMMUNICATION | 4 Presentations | -

April 25: 4 Presentations | April 27 L11: DATA FUTURES | 4 Presentations | -


Today (3/30/18)

• Lecture 7: Data Infrastructure

– Technology-driven science

– Data and the LHC

– Data and Entertainment

• Break

• 4 Student Presentations


Technology-driven Science: Many different kinds of infrastructure (cyberinfrastructure, e-infrastructure) needed for modern science

[Figure: application classes positioned along three infrastructure axes: COMPUTE (more FLOPS), DATA (more BYTES), NETWORK (more BW)]

• Home, Lab, Campus, Desktop Applications

• Compute-intensive HPC Applications

• Data-intensive and Compute-intensive HPC applications

• Compute-intensive Grid, Distributed, and Cloud Applications

• Data-oriented Grid, Distributed and Cloud Applications

• Data-intensive applications

Longitudinal evolution:

• ‘80’s, 90’s +: Computational Science, first national networks

• ‘90’s, 00’s +: Development of integrated cyberinfrastructure, emerging focus on data

• 00’s, 10’s +: Increasing integrated cyberinfrastructure workflows, emergence of data science


Cyberinfrastructure evolution: Technology-driven science increasing focus in the 80’s and 90’s (data issues often in the background …)

• Many reports in 80’s and early 90’s focused on the potential of information technologies (primarily computers and high-speed networks) to address key scientific and societal challenges

• First federal “Blue Book” in 1992 focused on key computational problems including

– Weather forecasting

– Cancer genes

– Predicting new superconductors

– Aerospace vehicle design

– Air pollution

– Energy conservation and turbulent combustion

– Microsystems design and packaging

– Earth’s biosphere

– Broader education resources


In the beginning … The Branscomb Pyramid, circa 1993

Branscomb Pyramid provides a framework to associate computational power with community use.

Original Branscomb Committee Report (“From Desktop to TeraFlop”) at http://www.csci.psu.edu/docs/branscomb.txt


The Branscomb Pyramid, circa 2018

Pyramid levels, from base to apex, with typical compute scale:

• Small-scale devices and personal computers: MF, GF

• Small-scale Campus/Commercial Clusters: TF

• Large-scale campus/commercial resources, Center supercomputers: TF, PF

• Leadership Class: PF, EF

Opportunities for Innovation at all levels …

Prefixes:

Kilo 10^3

Mega 10^6

Giga 10^9

Tera 10^12

Peta 10^15

Exa 10^18

Zetta 10^21

Yotta 10^24
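The prefix list above maps directly onto how performance figures in this lecture are written (Gflop/s, Tflop/s, Pflop/s). As a minimal sketch, the helper below rescales a raw value to the largest prefix that fits. The function name `with_prefix` and the default unit string are illustrative assumptions, not something defined in the lecture.

```python
# Illustrative helper (an assumption of this sketch, not from the lecture):
# express a raw value using the decimal prefixes listed on the slide.
PREFIXES = [
    ("Yotta", 10**24), ("Zetta", 10**21), ("Exa", 10**18), ("Peta", 10**15),
    ("Tera", 10**12), ("Giga", 10**9), ("Mega", 10**6), ("Kilo", 10**3),
]

def with_prefix(value, unit="flop/s"):
    """Return value scaled to the largest prefix that keeps the number >= 1."""
    for name, factor in PREFIXES:
        if value >= factor:
            return f"{value / factor:g} {name}{unit}"
    return f"{value:g} {unit}"

print(with_prefix(12e9))      # 12 Gigaflop/s
print(with_prefix(10.5e15))   # 10.5 Petaflop/s
```

The same helper works for storage by passing `unit="bytes"`, since the pyramid's prefixes apply equally to FLOPS and bytes.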


Also in 1993: The Top500 List created to rank supercomputers

• TOP500 list ranks and details the 500 most powerful supercomputers in the world

• Most powerful = best performance on the LINPACK benchmark.

• Rankings provide invaluable statistics on supercomputer trends by country, vendor, sector, processor characteristics, etc.

• List compiled by Hans Meuer of University of Mannheim, Jack Dongarra of University of Tennessee, and Erich Strohmaier and Horst Simon of NERSC / LBNL. List comes out in November and June each year.

http://top500.org/


Top500 List for November 2017


What the Top500 List measures

Rmax and Rpeak values are in TFlops

• Computers assessed based on their performance on the LINPACK Benchmark – calculating the solution to a dense system of linear equations.

– User may scale the size of the problem and optimize the software in order to achieve the best performance for a given machine

– Algorithm used must conform to LU factorization with partial pivoting (operation count for the algorithm must be 2/3 n^3 + O(n^2) double precision floating point operations).

• Rpeak values calculated using the advertised clock rate of the CPU. (theoretical performance)

• Rmax = maximal LINPACK performance achieved (actual performance)
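The operation count quoted above can be checked on a small problem. The following is a minimal pure-Python sketch of solving a dense linear system by LU-style elimination with partial pivoting while counting floating point operations. It is not the LINPACK or HPL benchmark code; the function name `lu_solve` and the way operations are tallied are assumptions made for illustration.

```python
import random

def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination (LU) with partial pivoting.

    a is a list of rows and b a list; both are copied, not modified.
    Returns (x, flops), where flops counts adds, multiplies and divides.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    flops = 0
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]          # multiplier (an entry of L)
            flops += 1
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]     # update the trailing submatrix (U)
                flops += 2
            b[i] -= m * b[k]
            flops += 2
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, n):
            s -= a[i][j] * x[j]
            flops += 2
        x[i] = s / a[i][i]
        flops += 1
    return x, flops

# Usage: a random dense system whose exact solution is all ones.
n = 100
rng = random.Random(0)
a = [[rng.random() for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]
x, flops = lu_solve(a, b)
print("max error:", max(abs(xi - 1.0) for xi in x))
print("counted flops:", flops, "leading term 2/3 n^3:", 2 * n**3 / 3)
```

Dividing such an operation count by the measured run time gives a flop/s rate, which is the kind of figure Rmax reports for the real benchmark.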

Rensselaer CCI Blue Gene Q on current Top500 list (November 2017):

• 229th most powerful supercomputer in the world

• 45th most powerful Academic supercomputer in the world

• 6th most powerful Academic supercomputer in the US (of 15 on Top500 list)

[Table columns: Rank, System, Cores, Rmax, Rpeak, Power]


Performance Development (Slide courtesy of Jack Dongarra)

[Figure: Top500 performance over time, 1993 to 2011, on a log scale from 100 Mflop/s to 100 Pflop/s. Series: SUM, N=1, N=500.]

• SUM: 1.17 TFlop/s (1993) to 74 PFlop/s (2011)

• N=1: 59.7 GFlop/s (1993) to 10.5 PFlop/s (2011)

• N=500: 400 MFlop/s (1993) to 51 TFlop/s (2011)

• Annotation: 6-8 years (horizontal gap between the N=1 and N=500 curves)

• Reference points: Jack’s Laptop (12 Gflop/s); Jack’s iPad2 & iPhone 4s (1.02 Gflop/s)
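The endpoint values labeled on the performance chart are enough to estimate how fast each curve grew. The sketch below computes an average annual growth factor and a doubling time from two endpoints. Pairing each label with a curve and with the years 1993 and 2011 is an assumption of this sketch, and the function names are illustrative.

```python
import math

# Endpoint values taken from the chart labels, in flop/s.
# Assumption: the first value belongs to 1993 and the second to 2011.
SERIES = {
    "SUM": (1.17e12, 74e15),
    "N=1": (59.7e9, 10.5e15),
    "N=500": (400e6, 51e12),
}
YEARS = 2011 - 1993

def annual_growth(start, end, years):
    """Average multiplicative growth per year between two endpoints."""
    return (end / start) ** (1 / years)

def doubling_time_years(start, end, years):
    """Years needed to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(end / start)

for name, (start, end) in SERIES.items():
    g = annual_growth(start, end, YEARS)
    d = doubling_time_years(start, end, YEARS)
    print(f"{name}: x{g:.2f} per year, doubling every {d:.2f} years")
```

Because the chart is drawn on a log scale, steady exponential growth of this kind appears as a straight line.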


Cyberinfrastructure evolution: Broader national cyberinfrastructure development

• Publication of the Atkins report from NSF’s Blue Ribbon Task Force on Cyberinfrastructure accelerated CI as a critical national focus within federal R&D investments and especially at NSF

• Report and follow-on programs and projects evolved existing efforts and provided the seed for a new era of Cyberinfrastructure innovations in the research community whose impact can still be seen today

– NSF Partnerships for Advanced Computing Infrastructure

– NSF TeraGrid, XSEDE

– DOE Science Grid

– NIH Big Data to Knowledge, etc.

Atkins Report: http://www.nsf.gov/cise/sci/reports/atkins.pdf


Data and the LHC


The Large Hadron Collider (LHC)

• LHC is the world’s most powerful particle collider.

• LHC’s goal is to allow physicists to test the predictions of different theories of particle physics and high-energy physics, in particular the properties of the Higgs boson and the large family of new particles predicted by supersymmetric theories.

• LHC contains seven detectors, each designed for a different kind of research.

• LHC was built near Geneva between 1998 and 2008 in collaboration with over 10,000 scientists and engineers from over 100 countries.

• LHC lies in a tunnel 17 miles in circumference beneath the France-Switzerland border.

• LHC collisions produce 10’s of PBs of data per year.

– Subset of data analyzed by a distributed grid of 170+ computing centers in 36 countries

A collider is a type of particle accelerator with two directed beams of particles.

In particle physics, colliders are used as a research tool: they accelerate particles to very high kinetic energies and let them impact other particles.

Analysis of the byproducts of these collisions gives scientists good evidence of the structure of the subatomic world and the laws of nature governing it.

Many of these byproducts are produced only by high energy collisions, and they decay after very short periods of time. Thus many of them are hard or nearly impossible to study in other ways.

Information from Jamie Shiers and Wikipedia


What happens at CERN?

• Accelerators create particle collisions

– Protons circulate at close to the speed of light

– 10’s of millions of collisions every second

– Collisions recreate the conditions of the first moments of the universe

• Detectors study collisions and the thousands of particles emerging from them.

• Worldwide network of computers filters, records and processes the data from the collisions

– LHC computing grid processes PBs of data each year

• Physicists throughout the world analyze the data

Information from http://home.cern/
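The "filters, records and processes" step in the list above can be pictured as a threshold test applied to each event before anything is written to storage. The sketch below is a toy model under loud assumptions: each event is reduced to a single made-up number, the threshold is arbitrary, and the names `simulate_events` and `trigger` are hypothetical. Real LHC filtering is a multi-stage hardware and software system that these slides do not describe.

```python
import random

def simulate_events(n, seed=0):
    """Hypothetical events: one number standing in for how 'interesting' a collision is."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n)]

def trigger(events, threshold):
    """Keep only events above the threshold; everything else is never stored."""
    return [e for e in events if e > threshold]

events = simulate_events(100_000)
kept = trigger(events, threshold=6.0)   # arbitrary illustrative threshold
print(f"recorded {len(kept)} of {len(events)} events")
```

The point of the toy model is the ratio: only a small fraction of what the detectors see ever reaches the computing grid.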

CERN's current and future accelerators

• Linear accelerator 2: Linac 2 is the starting point for the protons used in physics experiments at CERN

• Linear accelerator 3: Linac 3 is the starting point for the ions used in physics experiments at CERN

• Linear accelerator 4: Linac 4 boosts negative hydrogen ions to high energies. It will become the source of proton beams for the Large Hadron Collider in 2020

• The Antiproton Decelerator: Not all accelerators increase a particle's speed. The AD slows down antiprotons so they can be used to study antimatter

• The Large Hadron Collider: The 27-kilometre LHC is the world's largest particle accelerator. It collides protons or lead ions at speeds approaching the speed of light

• The Low Energy Ion Ring: LEIR takes long pulses of lead ions from Linac 3 and transforms them into the short, dense bunches suitable for injection to the Large Hadron Collider

• The Proton Synchrotron: A workhorse of CERN's accelerator complex, the Proton Synchrotron has juggled many types of particle since it was first switched on in 1959

• The Proton Synchrotron Booster: Four superimposed synchrotron rings receive protons from the linear accelerator, boost them to 800 MeV and inject them into the Proton Synchrotron

• The Super Proton Synchrotron: The second-largest machine in CERN’s accelerator complex provides a stepping stone between the Proton Synchrotron and the LHC


Worldwide LHC Computing Grid

Image from http://wlcg.web.cern.ch/


LHC – Stewardship and Preservation Challenges

• Significant volumes of high energy physics data are thrown away “at birth” – i.e. via very strict filters (aka triggers) before writing to storage. To a first approximation, all remaining data needs to be preserved for a few decades.

– LHC data particularly valuable as reproducing the experiments is tremendously expensive and almost impossible to achieve

• Tier 0 and 1 sites currently provide bit preservation at scale

– Data more usable and accessible when services coupled with bit preservation

– In the process of “self certification” of the Tier 0 and Tier 1 sites according to ISO 16363.
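Bit preservation, as mentioned in the list above, is commonly supported by fixity checks: a digest is recorded when a file is ingested and recomputed later to confirm the bits have not changed. The sketch below shows that idea with SHA-256 from Python's standard library. The function names and the demo file name are hypothetical, and nothing here is claimed to be the tooling used at the Tier 0 or Tier 1 sites.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, recorded_digest):
    """True if the file's bits still match the digest recorded at ingest time."""
    return sha256_of(path) == recorded_digest

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "events.dat")    # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"\x00\x01\x02" * 1000)
    recorded = sha256_of(path)
    print(verify(path, recorded))            # True: bits unchanged
    with open(path, "r+b") as f:             # overwrite the first byte to simulate corruption
        f.write(b"\xff")
    print(verify(path, recorded))            # False: corruption detected
```

A fixity check only says whether the bits are intact. Keeping the data usable also requires the formats, software and documentation discussed on the following slides.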

Slide adapted from Jamie Shiers, CERN 2016


Post-collision

David South | Data Preservation and Long Term Analysis in HEP | CHEP 2012, May 21-25 2012 | Page 6

After the collisions have stopped

• Finish the analyses! But then what do you do with the data?

– Until recently, there was no clear policy on this in the HEP community

– It’s possible that older HEP experiments have in fact simply lost the data

• Data preservation, including long term access, is generally not part of the planning, software design or budget of an experiment

– So far, HEP data preservation initiatives have in the main not been planned by the original collaborations, but have rather been the effort of a few knowledgeable people

• The conservation of tapes is not equivalent to data preservation!

– “We cannot ensure data is stored in file formats appropriate for long term preservation”

– “The software for exploiting the data is under the control of the experiments”

– “We are sure most of the data are not easily accessible!”

Slide adapted from Jamie Shiers, CERN 2016


Data: Outlook for HL-LHC

• The LHC, including all foreseen upgrades, will run until circa 2040. By that time between 10 and 100 EB of data will have been gathered.

• These data (the uninteresting stuff has already been discarded) should be preserved for a number of decades.

• Very rough estimate of new RAW data per year of running, using a simple extrapolation of current data volume scaled by the output rates.

• To be added: derived data (ESD, AOD), simulation, user data…

At least 0.5 EB / year (x 10 years of data taking)

[Figure: bar chart of RAW data volume in PB (axis from 0 to 450) for Run 1, Run 2, Run 3 and Run 4, by experiment: CMS, ATLAS, ALICE, LHCb. Annotation: “We are here!”]

Slide adapted from Jamie Shiers, CERN in 2016


Digitally-enabled Movies


Data Stewardship and Preservation especially important as the Arts become more digitally-enabled

• Consumers must move (migrate) downloaded digital music to new media players when old players are too full, sometimes requiring re-registration of Digital Rights Management authorization to ensure they do not lose access to favorite songs

• Authors must find applications interoperable with old word processing SW to read manuscripts written with obsolete SW

• Digital photos recorded on floppies can’t be accessed on modern computers without floppy disk drives

• Old video games may only run on obsolete game systems


Digital movies

• Most movies are not shot on film but recorded through digital media

– More than 80% of the movie theaters in the U.S. no longer handle film …

• Many digital technologies used in film-making:

– Image capture

– Visual effects

– Mastering and final color grading

– Sound capture

– Sound effects

– Sound editing and mixing

– Digital distribution to theaters and other platforms, etc.

• Film industry has been adopting digital technologies in piecemeal fashion over the last 25+ years

The Girl with the Dragon Tattoo was produced entirely in digital format


Many components of movie process archived beyond the film itself

Images from http://spectrum.ieee.org/consumer-electronics/standards/will-todays-digital-movies-exist-in-100-years#


Avatar: Digital tour-de-force

• Film released in 2009 and distributed by 20th Century Fox

• Directed and written by James Cameron, produced by James Cameron and Jon Landau

• Became highest grossing film of all time (>$2B)

• Won Academy Awards for Best Art Direction, Best Cinematography and Best Visual Effects

• Sequel coming

Avatar image from Film Education http://www.filmeducation.org/resources/film_library/getfilm.php?film=2037


Avatar both data-intensive and compute-intensive

• Avatar technologies developed by Weta Digital Ltd and partners.

– Weta Digital Ltd. Data Center in New Zealand

– (Weta Digital also responsible for computer-rendered scenes in Lord of the Rings Trilogy, King Kong, etc.)

• Avatar IT was equal parts computing power in the data center (creating the visual effects) and data management of artistic processes (driving the film experience)

• Every minute of Avatar represents 17.28 GB of data (~ 3TB in all)

• Avatar used 1 PB of storage space for rendering
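The per-minute and total figures above are consistent with a simple frame-size calculation. The sketch below works through that arithmetic. The frame size, frame rate and running time are assumptions that do not appear on the slide, chosen because they reproduce the quoted 17.28 GB per minute.

```python
# Assumed inputs (not stated on the slide).
MB_PER_FRAME = 12          # assumption: data per rendered frame
FRAMES_PER_SECOND = 24     # assumption: standard cinema frame rate
RUNTIME_MINUTES = 162      # assumption: theatrical running time

def gb_per_minute(mb_per_frame, fps):
    """Data per minute of film, in decimal units (1 GB = 1000 MB)."""
    return mb_per_frame * fps * 60 / 1000

def total_tb(gb_min, minutes):
    """Data for the whole film, in decimal units (1 TB = 1000 GB)."""
    return gb_min * minutes / 1000

per_minute = gb_per_minute(MB_PER_FRAME, FRAMES_PER_SECOND)
print(per_minute)                                          # 17.28
print(round(total_tb(per_minute, RUNTIME_MINUTES), 2))     # 2.8
```

Under these assumptions the finished film is a little under 3 TB, far smaller than the 1 PB of working storage used for rendering.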


Avatar -- Innovative IT

• Technological innovations included:

– Performance capture process: actors wore special gear and cameras that translated live action into realistic animation in real-time

– 3D Fusion Camera: 2 high-def cameras in a single camera body to create depth perception

– Virtual camera system: shows actors’ virtual counterparts in their digital surroundings in real time

– Motion capture stage, etc.

Avatar image from Wikipedia article with caption “Cameron pioneered a specially designed camera built into a 6-inch boom that allowed the facial expressions of the actors to be captured and digitally recorded for the animators to use later.”

http://en.wikipedia.org/wiki/Avatar_(2009_film)


Avatar Technologies

http://www.youtube.com/watch?v=OJ1JzYPjcj0

(8:39)

CGI = Computer-generated imagery


Weta Digital IT Environment

• Computing core included 40K processors and 104TB of RAM

– 10K square foot server farm with 34 racks of 32 HP Blade servers each

– Center uses water-cooled racks and leverages chilly climate of New Zealand

– Interconnected by 10 gigabit network so that storage seems local

• Data storage leveraged partnerships with NetApp and Fujitsu to develop storage system which

– reduced the amount of manual data management in the process of rendering files

– balanced the throughput requirements of the renderwall (compute) to maximize access to commonly used files

• Digital Asset Management System “Gaia” developed by Microsoft


Avatar (Weta Digital) computers occupied spots 193-197 on the Top500 List in November 2009


Great reads …


Lecture 7 Sources

• Atkins Report: http://www.nsf.gov/cise/sci/reports/atkins.pdf

• LHC, www.wikipedia.com

• Worldwide LHC Computing Grid website, http://wlcg-public.web.cern.ch/tier-centres

• The Digital Dilemma, Strategic Issues in Archiving and Accessing Digital Motion Picture Materials, Science and Technology Council of the Academy of Motion Picture Arts and Sciences, http://www.scribd.com/doc/55498058/The-Digital-Dilemma

• Processing Avatar, Information Management http://www.information-management.com/newsletters/avatar_data_processing-10016774-1.html

• Data Plays a Supporting Role in Avatar, ComputerWorld http://www.computerworld.com/s/article/346361/Data_center_plays_supporting_role_in_i_Avatar_i_

• Wikipedia article on Avatar http://en.wikipedia.org/wiki/Avatar_(2009_film)

• “Will Today’s Digital Movies Exist in 100 Years”, IEEE Spectrum, http://spectrum.ieee.org/consumer-electronics/standards/will-todays-digital-movies-exist-in-100-years#

• “The Afterlife is Expensive for Digital Movies”, The New York Times, http://www.nytimes.com/2007/12/23/business/media/23steal.html?pagewanted=all


Discussion article for Today

• “Is big data racist? Why policing by data isn’t necessarily objective”, Ars Technica, https://arstechnica.com/tech-policy/2017/12/is-big-data-racist-why-policing-by-data-isnt-necessarily-objective/2/


Presentations


Presentation Articles for April 6

• “Embedding a tweet could be copyright infringement, says new court ruling”, The Verge, https://www.theverge.com/2018/2/16/17020278/tweet-embed-copyright-infringement-justin-goldman-tom-brady-photo-ruling [Ben H]

• “Hatch introduces bipartisan bill to clarify cross-border data policies”, The Hill, http://thehill.com/policy/technology/372637-hatch-introduces-bipartisan-bill-to-clarify-cross-border-data-policies [Alex C]

• “Canada’s Privacy Commissioner contemplates new online erasure, data protection rules”, Reuters, https://www.reuters.com/article/bc-finreg-data-protection-rules-canada/canadas-privacy-commissioner-contemplates-new-online-erasure-data-protection-rules-idUSKCN1GD66F [Ethan S]

• “How a fight over Star Wars download codes could reshape copyright law,” Ars Technica, https://arstechnica.com/tech-policy/2018/02/judge-slaps-down-disney-effort-to-stop-resale-of-star-wars-download-codes/ [Yishan D]


Presentation articles for April 11

• “Click here to kill everyone,” New York Magazine, http://nymag.com/selectall/2017/01/the-internet-of-things-dangerous-future-bruce-schneier.html [Ethan G]

• “Where does blockchain fit into digital rights management,” IPWatchdog, http://www.ipwatchdog.com/2018/02/06/blockchain-fit-digital-rights-management/id=93024/ [Lindsay Z]

• “French news site L'Express exposed reader data online, weeks before GDPR deadline”, Zdnet, http://www.zdnet.com/article/french-magazine-lexpress-exposed-reader-data/ [Peter K]

• “Everything you need to know about Led Zeppelin’s “Stairway to Heaven” copyright trial”, LA Times, http://www.latimes.com/entertainment/music/la-et-ms-led-zeppelin-copyright-trial-info-20160614-snap-story.html [Kayla C]


Presentation articles for April 13

• “The Follower Factory,” New York Times, https://www.nytimes.com/interactive/2018/01/27/technology/social-media-bots.html [Wei P.]

• “Is it too late for big data ethics?” Forbes, https://www.forbes.com/sites/kalevleetaru/2017/10/16/is-it-too-late-for-big-data-ethics/#4fd4e33f3a6d [Daniel C]

• “Your Roomba already maps your home. Now the CEO plans to sell that map,” USA Today, https://www.usatoday.com/story/tech/nation-now/2017/07/25/roomba-plans-sell-maps-users-homes/508578001/ [Michelle H]

• “Racist, sexist AI could be a bigger problem than lost jobs,” Forbes, https://www.forbes.com/sites/parmyolson/2018/02/26/artificial-intelligence-ai-bias-google/#fd91bbf1a015 [Halley F]


Presentation articles for Today

• “The world’s most valuable resource is no longer oil but data,” The Economist, https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource [Zimo X]

• “Data is infrastructure; how is data transforming UK construction and infrastructure?”, Lexology, https://www.lexology.com/library/detail.aspx?g=f29218ee-9027-44b2-8d93-df3b5fa06e5e [Sarah M]

• “America’s digital infrastructure is crumbling, too” Bloomberg View, https://www.bloomberg.com/view/articles/2018-02-01/america-s-digital-infrastructure-is-crumbling-too [Diego C]

• “The quest for digital equity”, Gov Tech, http://www.govtech.com/civic/The-Quest-for-Digital-Equity.html [Trulee]