mining scipy lectures

Post on 13-May-2015

DESCRIPTION

Lecture given at SciPy 2011 in Austin, TX, about mining the SciPy lectures: visualization and clustering

TRANSCRIPT

Mining Lectures

Marcel Caraciolo - @marcelcaraciolo

1

Who am I?

Marcel Pinheiro Caraciolo

Brazilian, lover of crabs

M.Sc. candidate in Data Mining and Recommender Systems

Current moderator of the local Python user group in Pernambuco

Interested in machine learning, recommender systems and mobile computing

Blogging about machine learning with Python since 2008 http://aimotion.blogspot.com

Young apprentice in Python programming since 2008.

Director of R&D at the Brazilian startup Orygens

2

How did I start this analysis?

24 hours ago...

3

Question

How were the topics distributed across the SciPy Conference General Sessions?

4

Scraping the SciPy Conference

Small web crawler for extracting the approved lectures

urllib2, re, BeautifulSoup...

5
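The slides do not show the crawler itself, so the following is only a minimal sketch of the idea. It assumes Python 3, where `urllib.request` replaces `urllib2`, and it uses the standard-library `html.parser` in place of BeautifulSoup so that it runs without extra packages. The markup in `SAMPLE_HTML`, the class names `talk`, `title` and `author`, and the helper names `fetch`, `TalkParser` and `extract_talks` are all hypothetical; the real conference page layout is not given in the talk.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical markup: the real schedule page layout is not shown in the slides.
SAMPLE_HTML = """
<ul class="schedule">
  <li class="talk"><span class="title">Talk A</span> <span class="author">Jane Doe</span></li>
  <li class="talk"><span class="title">Talk  B</span> <span class="author">John Roe</span></li>
</ul>
"""


def fetch(url):
    """Download a page and return it as text (not called in this sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TalkParser(HTMLParser):
    """Collect the title and author of every <li class="talk"> element."""

    def __init__(self):
        super().__init__()
        self.talks = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "talk":
            self.talks.append({"title": "", "author": ""})
        elif tag == "span" and css_class in ("title", "author") and self.talks:
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.talks:
            self.talks[-1][self._field] += data


def extract_talks(html):
    """Parse the page and collapse runs of whitespace in each field."""
    parser = TalkParser()
    parser.feed(html)
    parser.close()
    return [
        {key: re.sub(r"\s+", " ", value).strip() for key, value in talk.items()}
        for talk in parser.talks
    ]


talks = extract_talks(SAMPLE_HTML)
for talk in talks:
    print(talk["title"], "-", talk["author"])
```

To run it against a live page, the result of `fetch(url)` would be passed to `extract_talks`, with the tag and class checks adapted to the actual markup.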

Summary

Lectures: 41

Total length: 820 minutes
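As a quick check of these two figures, the arithmetic below derives the average lecture length and the total running time in hours. Only the numbers 41 and 820 come from the slide; the derived values are not stated in the talk.

```python
lectures = 41
total_minutes = 820

# Average length of one lecture, in minutes.
average_minutes = total_minutes / lectures

# Total running time expressed as whole hours plus remaining minutes.
hours, minutes = divmod(total_minutes, 60)

print(f"Average lecture length: {average_minutes:.0f} minutes")
print(f"Total: {hours} h {minutes} min")
```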

6

It means...

≈ 4100 tweets posted.

7

Or watch...

Star Wars Trilogy

2x

8

Or finish the Super Mario game...

82x!

9

Or open Eclipse

2x!

Or, in Portuguese: "Abrir o Eclipse 2 vezes!" (Open Eclipse twice!)
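Each comparison implies a per-unit figure once it is set against the 820 minutes of lectures. The sketch below only works those figures out; the slides give the totals and multipliers, and the per-unit values are derived here rather than stated by the speaker.

```python
total_minutes = 820

# "≈ 4100 tweets posted" implies this many tweets per minute of lecture.
tweets_per_minute = 4100 / total_minutes

# "Star Wars Trilogy 2x" implies this many minutes per trilogy viewing.
minutes_per_trilogy = total_minutes / 2

# "Super Mario 82x" implies this many minutes per playthrough.
minutes_per_mario_run = total_minutes / 82

print(tweets_per_minute, minutes_per_trilogy, minutes_per_mario_run)
```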

10

Most popular Authors

Dharhas Pothina - 3

Wes McKinney - 2

All the others - 1

11

Playing with the text...

The most frequent words at the conference

nltk, re
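A word-frequency count of this kind can be sketched with `re` and `collections.Counter`. The talk lists nltk, but the exact calls are not shown, so this version is an assumption: a small hand-written `STOPWORDS` set stands in for a proper stopword list, and the three abstracts are invented sample data, not lectures from the conference.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "with", "to", "on", "is"}


def word_frequencies(texts):
    """Count lowercase alphabetic tokens, skipping stopwords and very short words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts


# Invented sample abstracts, for illustration only.
abstracts = [
    "Data analysis with Python and NumPy",
    "Parallel computing in Python on the GPU",
    "Visualization of data with matplotlib",
]

freq = word_frequencies(abstracts)
for word, count in freq.most_common(5):
    print(word, count)
```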

12

But let’s take a deeper look.

I used the clustering algorithm K-Means

Tool used for visualization: Ubigraph
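The slides name K-Means but do not say which implementation or text features were used. The following is a minimal pure-Python sketch of the algorithm under stated assumptions: each text becomes a length-normalized bag-of-words vector, distances are squared Euclidean, and the initial centroids are sampled with a fixed seed. The function names and the four sample texts are hypothetical, and the Ubigraph visualization step is not covered.

```python
import math
import random
import re
from collections import Counter


def vectorize(texts):
    """Turn texts into unit-length bag-of-words vectors over a shared vocabulary."""
    token_lists = [re.findall(r"[a-z]+", text.lower()) for text in texts]
    vocab = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain K-Means: assign each vector to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented sample texts, for illustration only.
texts = [
    "gpu parallel performance",
    "parallel gpu computing",
    "plotting figures matplotlib",
    "matplotlib plotting charts",
]
vocab, vectors = vectorize(texts)
labels = kmeans(vectors, k=2)
print(labels)
```

K-Means depends on its starting centroids, so on real abstracts it is common to run it with several seeds and compare the groupings.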

13

Distribution of the Lectures

Basic frameworks: matplotlib, ipython

Parallelism: performance, gpu, statistical

Building frameworks: performance, models, web services

Visualization

Numpy

toolkits using Numpy

data analysis, statistical
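Keyword labels like the ones on this slide can be obtained by listing the most frequent terms in each cluster. The sketch below shows one way to do that; it is an assumption about the method, not the speaker's code. The function name `top_terms_per_cluster`, the sample texts and the cluster labels are hypothetical.

```python
import re
from collections import Counter, defaultdict


def top_terms_per_cluster(texts, labels, n=3):
    """Return the n most frequent words among the texts assigned to each cluster."""
    by_cluster = defaultdict(Counter)
    for text, label in zip(texts, labels):
        by_cluster[label].update(re.findall(r"[a-z]+", text.lower()))
    return {
        label: [word for word, _ in counts.most_common(n)]
        for label, counts in by_cluster.items()
    }


# Invented texts and cluster assignments, for illustration only.
sample_texts = [
    "gpu performance",
    "gpu parallel gpu",
    "matplotlib plots",
    "matplotlib ipython matplotlib",
]
sample_labels = [0, 0, 1, 1]

keywords = top_terms_per_cluster(sample_texts, sample_labels, n=1)
print(keywords)
```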

14

To sum up...

Mining English text is so much easier!!!

Submit your work also!

Spread scientific Python throughout the community

I hope to be back at SciPy next year!

15

Mining Lectures

Marcel Caraciolo - @marcelcaraciolo

https://github.com/marcelcaraciolo/clustering_scipy

16
