Mining SciPy Lectures
DESCRIPTION
Lecture given at SciPy 2011, Austin, TX, about mining the SciPy lectures: visualization and clustering.
TRANSCRIPT
Mining Lectures
Marcel Caraciolo - @marcelcaraciolo
1
Who am I? Marcel Pinheiro Caraciolo
Brazilian, lover of crabs
M.Sc. candidate in data mining and recommender systems
Current moderator of the local Python User Group in Pernambuco
Interested in machine learning, recommender systems and mobile computing
Blogging about machine learning with Python since 2008 http://aimotion.blogspot.com
Young apprentice in Python programming since 2008.
Director of R&D at Orygens, a Brazilian startup
2
How did I start this analysis?
24 hours ago...
3
Question
How were the topics distributed across the SciPy Conference general sessions?
4
Scraping the SciPy Conference website
A small web crawler for extracting the approved lectures
urllib2, re, BeautifulSoup...
5
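The crawler itself is not shown on the slide. As a rough sketch of the extraction step, the code below sticks to the Python 3 standard library: urllib.request stands in for urllib2, and html.parser stands in for BeautifulSoup, so it runs without extra packages. The `lecture-title` CSS class and every function name here are hypothetical; the real conference page markup will almost certainly differ.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class LectureTitleParser(HTMLParser):
    """Collect the text of every element whose class list contains css_class.

    The default class name 'lecture-title' is a hypothetical markup convention.
    Assumes title elements are not nested inside one another.
    """

    def __init__(self, css_class="lecture-title"):
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._open_tag = None  # tag name of the title element we are inside
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.css_class in classes:
            self._open_tag = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Collapse runs of whitespace left over from the page layout.
            text = re.sub(r"\s+", " ", "".join(self._buf)).strip()
            if text:
                self.titles.append(text)
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buf.append(data)


def extract_titles(html, css_class="lecture-title"):
    """Parse an HTML string and return the lecture titles found in it."""
    parser = LectureTitleParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.titles


def fetch_titles(url):
    """Download a page (hypothetical URL) and extract its lecture titles."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_titles(html)
```

Separating parsing from fetching keeps the extraction logic testable on a saved HTML snippet without touching the network.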
Summary
Lectures: 41
Total length: 820 minutes
6
It means...
≈ 4100 tweets posted.
7
Or watch...
Star Wars Trilogy
2x
8
Or finish a Super Mario game...
82x!
9
Or open Eclipse...
2x!
Now in our own language...
Open Eclipse twice!
10
Most popular authors (number of lectures)
Dharhas Pothina - 3
Wes McKinney - 2
All the others - 1
11
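The slide only shows the resulting counts. A minimal sketch of how such a ranking could be computed from scraped data, assuming each lecture is stored as a (title, [authors]) pair; that data shape and the function name are hypothetical, not taken from the original code.

```python
from collections import Counter


def author_counts(lectures):
    """Rank authors by the number of lectures they appear on.

    `lectures` is assumed to be a list of (title, [author, ...]) pairs.
    """
    counts = Counter()
    for _title, authors in lectures:
        # dict.fromkeys drops a name repeated within one lecture, keeping order;
        # list() makes Counter treat it as an iterable of names, not a mapping.
        counts.update(list(dict.fromkeys(authors)))
    # Highest count first; ties keep the order in which authors were first seen.
    return counts.most_common()
```

Counting each author at most once per lecture matters for co-authored talks, where a name could otherwise be listed twice by a sloppy scrape.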
Playing with the text...
The most frequent words at the conference
nltk, re
12
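The slide names nltk and re as the tools for finding frequent words. Below is a hedged stand-in that uses only re and collections.Counter (nltk.FreqDist offers similar counting); the tiny stop-word list and the token pattern are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hand-picked stop words (an assumption; NLTK's stopwords corpus is fuller).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "using", "with"}


def top_words(texts, n=10):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case tokens of 2+ characters that start with a letter.
        tokens = re.findall(r"[a-z][a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)
```

Feeding it the list of lecture titles (or abstracts) gives (word, count) pairs ordered from most to least frequent.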
But let’s take a deeper look: I clustered the lectures with the K-Means algorithm.
Visualization tool: Ubigraph
13
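The slide does not say which K-Means implementation was used; the repository linked at the end is the authority on that. As a self-contained sketch of the technique in plain Python, each lecture text becomes an L2-normalised term-frequency vector, and Lloyd's algorithm then alternates an assignment step and a centroid-update step. The tokenisation, the seeded random initialisation and all function names are assumptions; a real analysis would more likely use TF-IDF weights and scipy.cluster.vq.kmeans2 or scikit-learn.

```python
import math
import random
import re
from collections import Counter


def tf_vectors(texts):
    """Turn each text into an L2-normalised term-frequency vector."""
    docs = [Counter(re.findall(r"[a-z]+", t.lower())) for t in texts]
    vocab = sorted(set().union(*docs))
    vectors = []
    for doc in docs:
        v = [float(doc[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors chosen at random.
    centroids = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, kmeans(tf_vectors(abstracts), k) would return one cluster label per abstract, where `abstracts` is a hypothetical list of lecture texts and k is a free parameter chosen by hand.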
Distribution of the Lectures
Basic frameworks: matplotlib, ipython
Parallelism: performance, gpu, statistical
Building frameworks: performance, models, web services
Visualization: Numpy
Toolkits using Numpy: data analysis, statistical
14
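Cluster descriptions like the ones on this slide are commonly read off each centroid's highest-weighted terms; the slide does not say whether these labels were derived that way or assigned by hand. A hedged sketch of that step, assuming you already have per-lecture cluster labels, the centroid vectors, and the vocabulary in the same order as the vector dimensions; the function name and inputs are hypothetical.

```python
from collections import Counter


def describe_clusters(labels, centroids, vocab, n_terms=3):
    """Return (cluster size, top terms) for every centroid, in cluster order."""
    sizes = Counter(labels)
    summary = []
    for c, centroid in enumerate(centroids):
        # Indices sorted by descending centroid weight; sorted() is stable,
        # so tied weights keep vocabulary order.
        top = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
        summary.append((sizes[c], [vocab[i] for i in top[:n_terms]]))
    return summary
```

The size column gives the distribution of lectures across clusters, and the term lists suggest a human-readable name for each one.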
To sum up...
Mining English text is so much easier!!!
Submit your work too!
Spread scientific Python throughout the community
I hope to be back at SciPy next year!
15
Mining Lectures
Marcel Caraciolo - @marcelcaraciolo
https://github.com/marcelcaraciolo/clustering_scipy
16