Data Science for Big Data Application and Analytics MOOC Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Virginia-Big-Data-Meetup/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup March 2, 2015

Upload: rafe-lyons

Post on 12-Jan-2016


TRANSCRIPT


Data Science for Big Data Application and Analytics MOOC

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community

http://semanticommunity.info/
http://www.meetup.com/Virginia-Big-Data-Meetup/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

March 2, 2015


Introduction

• Welcome:
  – Federal Big Data Working Group Meetup
  – Virginia Big Data Meetup
  – Lotico Northern Virginia Semantic Web
  – NEW: Natural Medicines for Health and Wellness


Federal Big Data Working Group Meetup

• Federal: Supports the Federal Big Data Initiative, but not endorsed by the Federal Government or its Agencies;

• Big Data: Supports the Federal Digital Government Strategy which is "treating all content as data", so big data = all your content;

• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products; and

• Meetup: The world's largest network of local groups, which aims to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House


The Profit and Data Enterprises

• Marcus Lemonis (born November 16, 1973) is a Lebanese-born American businessman, investor, television personality and philanthropist. He is currently the chairman and CEO of Camping World and Good Sam Enterprises, and the star of The Profit, a CNBC reality show about saving small businesses through People, Process, and Products.
  – http://en.wikipedia.org/wiki/Marcus_Lemonis

• The Federal Big Data Working Group Meetup is also about helping government agencies develop:
  – People – Data Scientists
  – Process – Data Infrastructure
  – Products – Data Publications

• Some examples:
  – EPA
  – FDA
  – NOAA
  – HHS
  – Eastern Foundry

• And provide MOOCs for training and networking. (Massive Open Online Courses)


Top 5 MOOCs for Data Science

COURSE | ORGANIZATION | NOTES
Machine Learning | Coursera (Stanford) | One of the first MOOCs
Intro to Data Science | Coursera (U of Washington) | Starts in April 2013
Intro to Statistics: Making Decisions Based on Data | Udacity | Enroll anytime
Introduction to Infographics and Data Visualization | Knight Center @ U of Texas | Starts January 12, 2013
Learning From Data | CalTech | Starts Jan 8, 2013

http://101.datascience.community/2012/12/26/top-5-moocs-for-data-science/


Five MOOCs for Big Data Applications and Analytics

• Practical Data Science for Data Scientists by Niemann Based on Schutt and O’Neil Book

• Data Science for Data Mining by Niemann Based on North Book and Borne Class

• Federal Big Data Working Group Meetups by Niemann and Goodier

• Tackling the Challenges of Big Data, MIT ProfessionalX Online Course by Niemann Based on Rus and Madden MOOC

• Data Science for Big Data Application and Analytics MOOC by Niemann Based on Geoffrey Fox MOOC

• Data Science for Mining of Massive Datasets by Niemann Based on Stanford MOOC (IN PROCESS)

See: Top 5 MOOCs for Data Science


Calendar

• NITRD FASTER Bigdata at NSF, February 17, 2015:
  – To be rescheduled
• Mission Source Consulting Launch Party, February 28:
  – Pre-launch of Natural Medicines for Health and Wellness Meetup
• 5th Annual Government Big Data Forum, March 12, 2015
• USDA CIO and ACDO on Open Data Plan and Roundtable, March 16, 2015
• Government Technology & Innovation Incubator for Big Data Analytics II, TBA. Week of March 23, Need Sponsor
• Data Science for HealthData.gov Developers & Family Caregivers. April 6, 2015
  – David Portnoy out of the country – working on replacement
• The Wharton DC Alumni Innovation Summit, April 28-29, 2015
• President's Chief Data Scientist and EPA Big Data Analytics, April 20, 2015
  – David Portnoy helping organize
• Data Science for Natural Medicines and Epigenetics, May 4, 2015
  – Dr. Joel D. Wallach: Epigenetics


Data Science for MyFamilySearch.org and FamilyTree DNA, February 16th

• January 13, 2015: Family Search Launches New App Gallery (more than 50 apps)

• February 12–14, 2015: RootsTech 2015 Developer Challenge in Salt Lake City, Utah

• My Entry: Big Data from Everywhere for Families and Community Service

• My Partner Work: Data Science for MyFamilySearch.org

• Syed Ali’s App: National Geographic Genographic Project and Big Data

• You could be a partner and develop apps (e.g. A Billion Person Family Tree with MongoDB by Randall Wilson, Family Tree of Data: Provenance and Neo4, etc.)


FamilySearch.org

• “FamilySearch is a great resource, but FamilySearch alone can’t do everything. That is why we work with partners to provide complementary tools and resources and why the FamilySearch App Gallery is so important,” said Dennis Brimhall, FamilySearch CEO.

• “We’ve had partners for many years, and now we want to make it easier for our patrons to know about them and to find the apps they need.”


MyTableBox of MyFamily Tree

http://semanticommunity.info/MyFamilySearch.org#MyTableBox_of_MyFamily_Tree


Person Template for Brand Lee Niemann

http://semanticommunity.info/MyFamilySearch.org#Person_Template_for_Brand_Lee_Niemann


Mini-Tutorial: Sony Camcorder and Camtasia Video to YouTube Video

• How is the data collected?
  – Sony Camcorder and PowerPoint Slides.

• Where is the data stored?
  – Hard drive and DVD in MP4 format.

• What are the results?
  – MP4 files converted and uploaded to YouTube.

• Why should we believe the results?
  – Because I and others have done it successfully many times.


Big Data Symposium at the National Research Council, October 23, 2014

• Symposium on the Interagency Strategic Plan for Big Data: Focus on R&D
• 8:45 Session chair: Clifford Lynch, CNI (Conflict)
• Overview Presentation: Allen Dearry, NIH (North Carolina) and Five Strategic Areas:
  – 1. Technologies, Howard Wactlar, NSF
  – 2. Knowledge to action, Peter Lyster, NIH (Previous Meetup)
  – 3. Sustainability, Sky Bristol, USGS (by videoconference)
  – 4. Education, Michelle Dunn, NIH
  – 5. Gateways, Kamie Roberts, NIST
• 10:15 Break
• 10:30 Session chair: Alexa McCray, Harvard Medical School (Conflict)
• Panel discussion: Response to Five Strategic Areas and Comments
  – Keith Clarke, UC Santa Barbara
  – Kirk Borne, George Mason University (Previous Meetup)
  – Jane Snowdon, IBM Federal

http://sites.nationalacademies.org/PGA/brdi/PGA_152373

Note: This is where I started to organize a Meetup, but now we have gone beyond this.


Big Data Application and Analytics MOOC: Email

• The email said: Check Out Professor Geoffrey Fox's MOOC Starting December 1st.

• Folks, Here is the link to the "Big Data Applications and Analytics" MOOC:
  – https://bigdatacourse.appspot.com/preview

• For big data novices, this is a gentle introduction.

• For big data experts, this exposes Professor Fox's perspectives and insights and source references without all the details and mathematical models.


Data Science for Big Data Application and Analytics MOOC

• I downloaded the ZIP file of Course Syllabus (PDF), Slides (PDF) and Python Files and explored them. Then I mined and structured the content into MindTouch to build a Knowledge Base of the essence of Professor Geoffrey Fox's MOOC so I could make it part of the Federal Big Data Working Group Meetup MOOC.

• I asked Professor Fox: Are there data sets used in the course? His reply was: Only a few small sample datasets and simple Monte Carlo sets. So I set about to find them and reuse them in Spotfire Statistics and Visualizations.


Data Science for Big Data Application and Analytics MOOC: Knowledge Base

Data Science for Big Data Application and Analytics MOOC


Big Data Application and Analytics MOOC

• Section 1: Introduction
• Section 2: Overview of Data Science: What is Big Data, Data Analytics and X-Informatics? See Next Slide
• Section 3: Technology Training
• Section 4: Physics Case Study
• Section 5: Technology Training
• Section 6: e-Commerce and LifeStyle Case Study
• Section 7: Infrastructure and Technologies for Big Data X-Informatics
• Section 8: Web Search Informatics
• Section 9: Technology for X-Informatics
• Section 10: Health Informatics
• Section 11: Sensor Informatics
• Section 12: Radar Informatics
• Section 13: Spotfire: Spotfire Recommendations for Analytic Data Publications (Note: My Section)


Semantic Web and Big Data: Features of Data Deluge

• Semantic Web/Grid Versus Big Data:
  – Original vision of Semantic Web was that one would annotate (curate) web pages by extra "meta-data" (data about data) to tell web browser (machine, person) the "real meaning" of page
  – The success of Google Search is "Big Data" approach; one mines the text on page to find "real meaning"
  – Obviously combination is powerful, but the pure "Big Data" method is more powerful than expected 15 years ago
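
A toy illustration of the contrast above, not taken from Professor Fox's course materials: the sketch below assumes a made-up page (`PAGE_TEXT`), a hand-written annotation (`PAGE_METADATA`), and a small stopword list, all hypothetical. One function reads the curated "meta-data"; the other guesses the topic by counting words in the page text, which is a very simplified stand-in for text mining and not how any real search engine works.

```python
import re
from collections import Counter

# Hypothetical page content and hand-curated annotation (both invented here).
PAGE_TEXT = (
    "Big data tools mine text to find meaning. "
    "Search engines mine the text of every page, and big data methods "
    "rank pages by what the text says."
)
PAGE_METADATA = {"subject": "big data", "type": "lecture slide"}

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "to", "of", "and", "by", "what", "every", "a", "is"}


def meaning_from_metadata(metadata):
    """Semantic Web style: trust the curated annotation attached to the page."""
    return metadata.get("subject")


def meaning_from_text(text, top_n=3):
    """'Big Data' style: infer likely topics from word frequencies in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    print("From metadata:", meaning_from_metadata(PAGE_METADATA))
    print("From text:    ", meaning_from_text(PAGE_TEXT))
```

The metadata route needs someone to write the annotation for every page; the text route needs no curation but only approximates the topic, which is why the slide calls the combination powerful.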



Agenda

• 6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring). Slides: Big Data Symposium at the National Research Council, October 23, 2014, Slides and Data Science for Big Data Application and Analytics MOOC. Also See: Top 5 MOOCs for Data Science and Spotfire Recommendations for Analytic Data Publications
• 7:10 p.m. Brief Member Introductions
• 7:15 p.m. Professor Geoffrey Fox (remote), Director of the Digital Science Center and Associate Dean for Research and Graduate Studies at the School of Informatics and Computing, Big Data Applications and Analytics MOOC. Also See: Community Grids Laboratory Web Sites and Future Systems
• 8:30 p.m. Open Discussion
• 8:45 p.m. Networking
• 9:00 p.m. Depart

http://www.meetup.com/Federal-Big-Data-Working-Group/events/218925651/