data scientist enablement dse 400 week 3 roadmap


Upload: dr-mohan-k-bavirisetty

Post on 07-May-2015


Page 1: Data scientist enablement   dse 400   week 3 roadmap

Data Scientist Enablement
DSE 400 - Fast Track to Data Science

Week 3 Roadmap

Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others

Content of this document is under Creative Commons Licence CC BY 4.0

Page 2: Data scientist enablement   dse 400   week 3 roadmap

Agenda

You can always find the latest version of this document at http://bit.ly/1dILgbT

Recap
Week 3 Overview
Discussions on SONO
Learning Path
Activities and Practice Assignment
Submission
Looking ahead
References
Citation

It is not in the stars to hold our destiny but in ourselves. - William Shakespeare

Page 3: Data scientist enablement   dse 400   week 3 roadmap

Recap

During weeks 1-2 we covered the following areas:
Data Science and its Landscape
Play with datasets in R-Studio
Employ R packages
Basic Statistical Concepts
Visually describing the datasets
Explored SONO and participated in Discussions
...

Page 4: Data scientist enablement   dse 400   week 3 roadmap

DSE 400 - Week 3 at a glance

Discussions: Big Data in 2014. Netflix $1M Prize case study. Optional Q&A.

Learning plan: Read R for Machine Learning by Allison Chang.

Activities: Explore Amazon. Survey ML in your industry. Apply for the Schmidt Fellowship ...

Assignment 3: Download the Mushroom dataset from the MIT OCW Prediction Dataset page, import it into your R-Studio environment, and apply the Apriori algorithm.

Page 5: Data scientist enablement   dse 400   week 3 roadmap

Social Engagement on SONO - Week 3
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1003

Discussion 1: Read Big Data In 2014: 6 Bold Predictions and share your thoughts on how impactful these predictions are going to be in your industry or area of focus. If you don’t have a preferred industry, focus on either the Healthcare or the Education sector.

Discussion 2: Research the Netflix $1M Prize and the BellKor solution. Discuss how the BellKor solution benefited Netflix by improving its recommendations. Can this algorithm or technique be applied elsewhere? Share your thoughts.

These discussions are required for everyone who already has access to SONO > DSE 400. There will also be an Optional Q&A.

Please do not create additional threads in weekly KCs.

Page 6: Data scientist enablement   dse 400   week 3 roadmap

SONO Tweaks

SONO or SOKNO (Social Knowledge platform) was chosen for the DSE program to enable Social Engagement, Collaboration and Knowledge Dissemination, all of which are important to an Open initiative like this.

To make navigation easier, here are some tweaks you can use to reach the right destination. To enter a Knowledge Cell (KC), log in first, then use the full URL to enter the right KC. For week 3, use this link: http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1003

The weekly KCs DSE 400 Week 1, Week 2, etc. map to knocell numbers 1001, 1002 and so on in these URLs. Once you are in a KC, click the Threads link in the left panel to go to the current discussions. We appreciate your patience during this transitory phase.
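The week-to-knocell mapping described above can be sketched as a small R helper. This is only an illustration: the function name `kc_url` is hypothetical, and it assumes the pattern 1000 + week number continues to hold for later weeks, as "and so on" suggests.

```r
# Hypothetical helper: build the Knowledge Cell URL for a given DSE 400 week.
# Assumes week n maps to knocell number 1000 + n, as stated for weeks 1-3.
kc_url <- function(week) {
  base <- "http://getsokno.com/redvinef/controllers/cell.php?user_knocell="
  paste0(base, 1000 + week)
}

print(kc_url(3))
```

You still need to log in to SONO before opening the generated link.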

Page 7: Data scientist enablement   dse 400   week 3 roadmap

Recommended Learning Plan

Read R for Machine Learning by Allison Chang (Sections 4.1 - 4.5, page 7).
Look up and research recommended ML algorithms and associated R packages.
Also refer to the blog post Machine Learning for Beginners and the presentation Machine Learning With R by David Chiu.

<Optional> Watch Machine Learning: The Basics by Ron Bekkerman
<Optional> Watch Introduction to R for Data Mining by Joseph Rickert

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. - Tom Mitchell, Machine Learning, 1997

Page 8: Data scientist enablement   dse 400   week 3 roadmap

Activities

<Practice> Visit Data Science Central. Examine “Visualization of the day”.

<Practice> Using publicly available resources, investigate what algorithmic techniques Amazon employs to recommend related products when you search for one. Do not employ private intellectual capital (iCap).

<Practice> Survey the Machine Learning algorithmic techniques your organization or industry employs. List the top 10 of these with their use cases. Briefly discuss the outcomes. Do not access or disclose any iCap.

<Optional> Explore State of World Children 2014 in Numbers. Where do the poorest children live? What is being done to improve their lives? What are the systemic problems that still need to be solved?

Page 9: Data scientist enablement   dse 400   week 3 roadmap

Activities - contd ...

<Optional> Check out The Eric and Wendy Schmidt 'Data Science for Social Good' Summer Fellowship. If interested, apply to this fellowship.

<Optional> Eminent Economist and Nobel Laureate Amartya Sen of Harvard University has a theory that effectively says, “poverty and famines are caused artificially by the inefficiency inherent in the economic system, not the result of natural forces.” Research Prof. Sen’s methodologies and examine what data he employs to reach these rather remarkable conclusions.

Need more? Reach out to our Research Fellow Ms. Rachel Fleming <[email protected]> and ask for advanced activities, challenges and research topics.

Page 10: Data scientist enablement   dse 400   week 3 roadmap

Assignment 3 - Submission Required

Download the Mushroom dataset from the MIT OCW Prediction Dataset page. Import this dataset into your R-Studio environment. Apply the Apriori algorithm to this dataset. You will need the arules package to apply this algorithm.
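A minimal sketch of what the Apriori step could look like with arules. It is not the official solution. The file name `mushroom.csv`, its layout (no header row, one column per attribute), and the support and confidence thresholds are all assumptions chosen for illustration. A tiny stand-in data frame is used so the sketch runs without the download.

```r
# Requires the arules package: install.packages("arules")
library(arules)

# Assumption: the downloaded file is a CSV with one column per mushroom
# attribute. "mushroom.csv" is a placeholder name, not the actual file name.
# mushrooms <- read.csv("mushroom.csv", header = FALSE, colClasses = "factor")

# Stand-in data (made up for illustration) so the sketch runs as-is.
mushrooms <- data.frame(
  class = factor(c("edible", "edible", "poisonous", "edible", "poisonous", "edible")),
  odor  = factor(c("none", "none", "foul", "none", "foul", "almond")),
  gill  = factor(c("broad", "broad", "narrow", "broad", "narrow", "broad"))
)

# Each row becomes a transaction; each attribute=value pair becomes an item.
trans <- as(mushrooms, "transactions")

# Mine association rules. The thresholds are illustrative, not prescribed.
rules <- apriori(
  trans,
  parameter = list(support = 0.3, confidence = 0.9, minlen = 2),
  control = list(verbose = FALSE)
)

# Show the five rules with the highest lift.
inspect(head(sort(rules, by = "lift"), 5))
```

On the full dataset you will likely need to adjust the support and confidence thresholds, and possibly `maxlen`, to keep the number of rules manageable.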

<Help On Demand> You may reach out to our Research Fellow Ms. Rachel Fleming <[email protected]> if you have any difficulties with this assignment.

Page 11: Data scientist enablement   dse 400   week 3 roadmap

Submission Deadline Saturday, 11:59 PM your local time.

Mail Assignment 3 to <[email protected]>. Notice the change in email address.

Submit a single PDF document showing screenshots of your R-Studio workspace and the output from your Apriori analysis. Name the document using this convention: DSE 400 > Assignment 3 > Your Full Name. Put DSE 400 > Assignment 3 in the subject line.

Do not send document links, and do not use formats other than PDF.

Page 12: Data scientist enablement   dse 400   week 3 roadmap

DSE 400 - Weeks 4-8 ahead

Week 4: Machine Learning - contd … Refer to R for Machine Learning by Allison Chang.

Week 5: Visualizations. Submit your research: Data Visualization Tools - A Comparative Study.

Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.

Week 8: Ethics, Privacy and Building Data Products.

Page 13: Data scientist enablement   dse 400   week 3 roadmap

References, Resources and Additional Reading

[MIT OCW] R for Machine Learning by Allison Chang
An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
Machine Learning. Tom Mitchell, McGraw-Hill Publishers, 1997
Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, Cambridge University Press, 2011
[MIT OCW] Prediction: Machine Learning and Statistics
Stanford University Machine Learning Video Collection
Caltech Machine Learning Video Collection

Page 14: Data scientist enablement   dse 400   week 3 roadmap

Citation The dataset titled Mushroom (agaricus-lepiota) Data used here for Assignment 3, is drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf. Donor: Jeff Schlimmer [email protected]. Date: 27 April 1987.

R for Machine Learning by Allison Chang is recommended by the MIT course Prediction: Machine Learning and Statistics from the Sloan School of Management. It is adopted in DSE 400 as per OCW guidelines.

Only content that appears as is in this document is under Creative Commons License CC BY 4.0. This license does not necessarily apply to other material referenced in this document.

Page 15: Data scientist enablement   dse 400   week 3 roadmap

For More Information

Week 3 discussions take place during this week on SONO DSE 400 Week 3

<Help On Demand> You may reach out to our Research Scholar Ms. Rachel Fleming <[email protected]> if you have any difficulties with the assignments.

We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <[email protected]>

You can always find the latest version of this document at http://bit.ly/1dILgbT

Page 16: Data scientist enablement   dse 400   week 3 roadmap

Fun@Work

Page 17: Data scientist enablement   dse 400   week 3 roadmap

Richard Feynman was awarded the Nobel Prize in Physics in 1965, along with Sin-Itiro Tomonaga and Julian Schwinger, "for their fundamental work in quantum electrodynamics, with deep-ploughing consequences for the physics of elementary particles".

Thank You