Mining SciPy Lectures

Mining Lectures
Marcel Caraciolo - @marcelcaraciolo

Uploaded by: marcel-caraciolo

Posted on 13-May-2015


DESCRIPTION

Lecture given at SciPy 2011, Austin, TX, about mining the SciPy lectures: visualization and clustering

TRANSCRIPT

Page 1: Mining Scipy Lectures

Mining Lectures
Marcel Caraciolo - @marcelcaraciolo


Page 2: Mining Scipy Lectures

Who am I? Marcel Pinheiro Caraciolo

Brazilian, lover of crabs

M.Sc. candidate in Data Mining and Recommender Systems

Current moderator of the local Python User Group in Pernambuco

Interested in machine learning, recommender systems and mobile computing

Blogging about machine learning with Python since 2008: http://aimotion.blogspot.com

Young apprentice in Python programming since 2008.

Director of R&D at the Brazilian startup Orygens


Page 3: Mining Scipy Lectures

How did I start this analysis?

24 hours ago...


Page 4: Mining Scipy Lectures

Question

How were the topics distributed across the SciPy Conference General Sessions?


Page 5: Mining Scipy Lectures

Scraping the SciPy Conference site

A small web crawler for extracting the accepted lectures

urllib2, re, BeautifulSoup...
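The crawler itself is not shown on the slide. As a rough sketch of the idea, the snippet below pulls title and author pairs out of a page. It swaps in the standard-library `html.parser` for BeautifulSoup and parses an inline sample instead of fetching a URL, so it runs offline. The markup, the class names (`talk`, `title`, `author`) and the `TalkParser` helper are all assumptions made for illustration; the real conference page will differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: the structure of the real schedule page is not
# given in the slides, so this sample only illustrates the approach.
SAMPLE_HTML = """
<ul>
  <li class="talk"><span class="title">Talk A</span> <span class="author">Author A</span></li>
  <li class="talk"><span class="title">Talk B</span> <span class="author">Author B</span></li>
</ul>
"""


class TalkParser(HTMLParser):
    """Collect one dict per <li class="talk"> with its title and author."""

    def __init__(self):
        super().__init__()
        self.talks = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "talk":
            self.talks.append({})
        elif tag == "span" and cls in ("title", "author") and self.talks:
            self._field = cls

    def handle_data(self, data):
        # Accumulate text only while inside a title or author span.
        if self._field is not None:
            talk = self.talks[-1]
            talk[self._field] = talk.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = TalkParser()
parser.feed(SAMPLE_HTML)
talks = parser.talks
print(talks)
```

In a real run, the HTML string would come from an HTTP request to the schedule page rather than from a constant.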

Page 6: Mining Scipy Lectures

Summary

41 lectures

820 minutes in total


Page 7: Mining Scipy Lectures

It means...

≈ 4100 tweets posted.
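The slides do not say how the tweet figure was derived. A quick check of the numbers suggests it corresponds to about five tweets per minute of lecture time. That rate is an inference from the arithmetic, not something stated in the talk.

```python
# Figures taken from the slides.
lectures = 41
total_minutes = 820
tweets = 4100  # approximate value from the slide

# Derived values (inferred, not stated by the speaker).
minutes_per_lecture = total_minutes / lectures
tweets_per_minute = tweets / total_minutes

print(minutes_per_lecture, tweets_per_minute)
```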


Page 8: Mining Scipy Lectures

Or watch...

Star Wars Trilogy

2x


Page 9: Mining Scipy Lectures

Or finish the Super Mario game...

82x!


Page 10: Mining Scipy Lectures

Or open Eclipse

2x!

Now in our own language...

Open Eclipse 2 times!


Page 11: Mining Scipy Lectures

Most popular authors

Dharhas Pothina - 3

Wes McKinney - 2

All the others - 1
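A ranking like this can be produced with `collections.Counter`. The sketch below is illustrative only: the two named authors and their counts come from the slide, while the `Other author N` entries are placeholders standing in for the remaining single-lecture presenters.

```python
from collections import Counter

# One entry per lecture, naming its presenter.
# Named authors and counts are from the slide; the rest are placeholders.
lecture_authors = (
    ["Dharhas Pothina"] * 3
    + ["Wes McKinney"] * 2
    + ["Other author 1", "Other author 2"]
)

author_counts = Counter(lecture_authors)
top_two = author_counts.most_common(2)
print(top_two)
```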


Page 12: Mining Scipy Lectures

Playing with the text...

The most frequent words at the conference

nltk, re
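A minimal sketch of a word-frequency count, assuming the lecture abstracts are available as plain strings. It uses only `re` and `collections.Counter`; the small hand-written `STOPWORDS` set stands in for the stopword list and tokenizer that nltk would provide, and the sample abstracts are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; nltk ships a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "with", "to", "on"}

# Hypothetical abstracts, not taken from the conference.
abstracts = [
    "Data analysis with Python and NumPy",
    "Visualization of data in Python",
    "Parallel performance for Python on the GPU",
]


def word_frequencies(texts):
    """Count lowercase alphabetic tokens, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


freqs = word_frequencies(abstracts)
print(freqs.most_common(3))
```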


Page 13: Mining Scipy Lectures

But let’s take a deeper look. I used the K-Means clustering algorithm.

Visualization tool: Ubigraph
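The slides do not show how K-Means was applied. As a hedged sketch, the code below turns a few made-up lecture descriptions into term-count vectors and clusters them with a plain-Python K-Means. The sample texts, the helper names and the choice of initial centroids are assumptions for illustration, and the Ubigraph visualization step is left out.

```python
import re
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(vectors, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical lecture descriptions.
docs = [
    "plotting figures with matplotlib",
    "matplotlib figures and plotting tips",
    "gpu parallel performance",
    "parallel performance on gpu clusters",
]
vocab, vectors = vectorize(docs)
# Fixed starting centroids keep the example deterministic.
labels, _ = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

Real runs usually pick the starting centroids at random and weight terms (for example with TF-IDF) rather than using raw counts.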


Page 14: Mining Scipy Lectures

Distribution of the Lectures

Basic frameworks: matplotlib, ipython

Parallelism: performance, gpu, statistical

Building frameworks: performance, models, web services

Visualization: toolkits using NumPy

NumPy: data analysis, statistical


Page 15: Mining Scipy Lectures

To sum up...

Mining English text is so much easier!!!

Submit your work too!

Spread scientific Python throughout the community

I hope to be back at SciPy next year!


Page 16: Mining Scipy Lectures

Mining Lectures
Marcel Caraciolo - @marcelcaraciolo

https://github.com/marcelcaraciolo/clustering_scipy
