Data scientology starter pack, Сергей Казаков
Rostov IT Community
Data Science Meetup
March 4, 2017
#dsmt61
Data scientology starter pack
Казаков Сергей, [email protected]
Episode 1
Anaconda: https://www.continuum.io/
● Linux, macOS, Windows (!!!)
● Python 2.7, 3.4, 3.5, 3.6
● conda package manager
  ○ conda install package-name
  ○ > 100 pre-built and tested scientific and analytic Python packages
  ○ > 620 more packages are available: https://repo.continuum.io/pkgs/
● Jupyter/IPython, Spyder, Visual Studio
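After installing Anaconda it is worth checking that the scientific stack actually imports. Below is a minimal sketch; the package list is our own assumption, not something the slide prescribes. It only reports what can be imported and does not call conda itself. Anything reported missing can then be installed with `conda install package-name` as above (note that scikit-learn is imported as `sklearn`).

```python
import importlib
import sys

# Assumed list of stack packages to check; adjust to taste.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]


def installed_versions(names):
    """Return {package: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Some modules do not expose __version__; report "unknown" for those.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12} {version or 'not installed'}")
```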
The “whales” (pillars) of Python data analysis
SciPy Ecosystem: https://www.scipy.org/
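The packages of the SciPy ecosystem share the NumPy array as their common data type, so data built with NumPy can be passed straight into SciPy routines. The small illustration below uses a synthetic line y = 2x + 1 with a little noise, and the choice of `scipy.stats.linregress` is ours. It is a sketch of that hand-off, not a tour of the ecosystem.

```python
import numpy as np
from scipy import stats

# Synthetic data (illustrative only): y = 2x + 1 plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# NumPy holds the data; SciPy supplies the statistical routine.
result = stats.linregress(x, y)
print(f"slope={result.slope:.3f} intercept={result.intercept:.3f} r={result.rvalue:.4f}")
```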
Pandas
● built on NumPy
● IO tools (text, SQL, HDF5, JSON, …)
● Series, DataFrame, Panel
● filter, reshape, groupby
● aggregate, vectorized, rolling, expanding operations
● merge, join, concatenate, whatever
● plotting (matplotlib, seaborn)
● …
cheat sheet: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
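Several of the operations listed above (groupby with aggregation, vectorized arithmetic, rolling and expanding windows, merge) fit into a few lines. The sketch below uses a small hypothetical sales table invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales table, made up for this example.
sales = pd.DataFrame({
    "day": pd.date_range("2017-03-01", periods=6, freq="D"),
    "store": ["A", "B", "A", "B", "A", "B"],
    "revenue": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
# Hypothetical lookup table to join against.
stores = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})

# groupby + aggregate: total and mean revenue per store
per_store = sales.groupby("store")["revenue"].agg(["sum", "mean"])

# vectorized column arithmetic, no Python loop
sales["revenue_k"] = sales["revenue"] / 1000

# rolling (last 2 rows) and expanding (all rows so far) windows
sales["rolling_mean_2"] = sales["revenue"].rolling(window=2).mean()
sales["running_total"] = sales["revenue"].expanding().sum()

# merge: SQL-style left join on the "store" column
merged = sales.merge(stores, on="store", how="left")

print(per_store)
print(merged.tail(3))
```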
Jupyter Notebook
● Jupyter
● JupyterHub
● JupyterLab
Machine Learning
Tasks
● Classification
  ○ Binary
  ○ Multiclass
    ■ non-overlapping classes
    ■ overlapping classes (multi-label)
● Regression
● Clustering
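Each of the main tasks above maps onto a scikit-learn pattern. The key difference between them is the shape of the target passed to `fit`. This is a sketch under our own assumptions: the synthetic datasets and the model choices (logistic regression, linear regression, k-means) are illustrative picks, not the talk's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import (make_blobs, make_classification,
                              make_multilabel_classification, make_regression)
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Binary classification: y takes exactly two values
Xb, yb = make_classification(n_samples=200, n_classes=2, random_state=0)
binary = LogisticRegression(max_iter=1000).fit(Xb, yb)

# Multiclass, non-overlapping classes: one label per object
Xm, ym = make_classification(n_samples=300, n_classes=3, n_informative=4,
                             random_state=0)
multiclass = LogisticRegression(max_iter=1000).fit(Xm, ym)

# Overlapping classes (multi-label): Y is a 0/1 matrix, one column per class
Xl, Yl = make_multilabel_classification(n_samples=200, n_classes=4,
                                        random_state=0)
multilabel = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xl, Yl)

# Regression: real-valued target
Xr, yr = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
regressor = LinearRegression().fit(Xr, yr)

# Clustering: no labels are given to the algorithm
Xc, _ = make_blobs(n_samples=300, centers=3, random_state=0)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xc)

print(binary.predict(Xb[:5]), multiclass.predict(Xm[:5]))
print(multilabel.predict(Xl[:2]))  # one row of 0/1 flags per object
print(regressor.score(Xr, yr), np.bincount(clusters))
```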
Everything else
● Ranking
● Anomaly detection
● Reinforcement learning
● Dimensionality reduction
● ...
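Two items from the “everything else” list, anomaly detection and dimensionality reduction, also have ready-made scikit-learn estimators. The sketch below uses `IsolationForest` and `PCA` as examples of each. The data (Gaussian noise plus a handful of planted far-away points) is hypothetical, and ranking and reinforcement learning are not covered here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical data: 500 "normal" points in 10-D plus 5 planted outliers
normal = rng.normal(size=(500, 10))
outliers = rng.normal(loc=8.0, size=(5, 10))
X = np.vstack([normal, outliers])

# Anomaly detection: predict() returns +1 for inliers and -1 for outliers
detector = IsolationForest(random_state=0).fit(X)
labels = detector.predict(X)
print("flagged as anomalies:", np.flatnonzero(labels == -1))

# Dimensionality reduction: project 10 features onto 2 principal components
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(X2.shape, pca.explained_variance_ratio_.round(3))
```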
“Hello, data science world!”
Scikit-learn: http://scikit-learn.org/
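from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 10,000 points with 10 features drawn around 100 centers; each center is a class
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
clf = RandomForestClassifier(random_state=0)
# make_blobs shuffles by default, so a plain slice gives an 80/20 hold-out split
X_train, X_test = X[:8000], X[8000:]
y_train, y_test = y[:8000], y[8000:]
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))          # accuracy on the hold-out 20%
print(cross_val_score(clf, X, y).mean())  # mean accuracy across CV folds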
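To see what `cross_val_score` does under the hood, here is a hand-written equivalent. This is a minimal sketch. It assumes a recent scikit-learn (`sklearn.model_selection`) and uses a smaller dataset and forest than the slide so it runs quickly. For a classifier, an integer `cv` makes scikit-learn use stratified, unshuffled folds, so the manual loop should reproduce the library's scores.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_blobs(n_samples=2000, n_features=10, centers=20, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# By hand: 5 stratified folds; fit a fresh copy on 4 folds, score on the 5th.
scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(clf).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

manual = np.mean(scores)
auto = cross_val_score(clf, X, y, cv=5).mean()
print(manual, auto)
```

Every sample lands in a test fold exactly once, so the averaged score uses all the data for evaluation, unlike a single fixed hold-out slice.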
Machine learning competitions
● data
  ○ train
  ○ test
    ■ public
    ■ private
● metrics
● leaderboard
● where
  ○ https://www.kaggle.com/
  ○ http://www.image-net.org/
  ○ https://www.numer.ai/
● Andrew Ng
● ШАД (Yandex School of Data Analysis) course “Machine Learning”
  K.V. Vorontsov
  “Компьютерные науки” (“Computer Science”) channel on YouTube
● A.G. Dyakonov
  ○ https://alexanderdyakonov.wordpress.com/
  ○ http://www.machinelearning.ru/
Udacity Deep Learning Online Course
● CNN
● RNN
● LSTM
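Of the architectures above, the LSTM is the easiest to get wrong from a diagram, so here is one cell step written out in plain NumPy. This is a sketch of the standard LSTM equations, not Keras or TensorFlow code. The stacked gate order (i, f, o, g), the weight shapes and the random untrained weights are our own assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x: (input_dim,); h_prev, c_prev: (hidden,)
    W: (4*hidden, input_dim); U: (4*hidden, hidden); b: (4*hidden,)
    The gate order inside the stacked weights (i, f, o, g) is our convention.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate cell state
    c = f * c_prev + i * g                 # keep part of old memory, add new
    h = o * np.tanh(c)                     # hidden state passed to next step
    return h, c


# Run a random, untrained cell over a short hypothetical input sequence.
rng = np.random.default_rng(0)
input_dim, hidden, steps = 3, 5, 7
W = rng.normal(scale=0.5, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(steps, input_dim)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

The additive update of `c` (forget part of the old state, add a gated candidate) is what lets an LSTM carry information across many steps better than a plain RNN.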
Let me google that for you:
● Keras
  ○ Theano
  ○ TensorFlow
● MXNet
● Torch
● Caffe