mining%of%massive%datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... ·...

Mining of Massive Datasets Leskovec, Rajaraman, and Ullman Stanford University

Upload: others

Post on 04-Jul-2020

1 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

Mining of Massive Datasets Leskovec, Rajaraman, and Ullman Stanford University

Page 2: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

¡  Assumes Euclidean space/distance

¡  Start by picking k, the number of clusters

¡  Ini<alize clusters by picking one point per cluster §  For the moment, assume we pick the k points at random

Page 3: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

¡  1) For each point, place it in the cluster whose current centroid it is nearest

¡  2) AAer all points are assigned, update the loca<ons of centroids of the k clusters

¡  3) Reassign all points to their closest centroid §  Some<mes moves points between clusters

¡  Repeat 2 and 3 un.l convergence §  Convergence: Points don’t move between clusters and centroids stabilize

Page 4: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x

x … data point … centroid

Round 1

Page 5: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x

x … data point … centroid

Round 2

Page 6: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x

x … data point … centroid

Round 3

Page 7: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

How to select k? ¡  Try different k, looking at the change in the average distance to centroid, as k increases.

Page 8: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x x x x x x x x x x x x

x x

x xx x x x

x x x x

x x x x x x x x x

Too few; many long distances to centroid.

Page 9: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x x x x x x x x x x x x

x x

x xx x x x

x x x x

x x x x x x x x x

Just right; distances rather short.

Page 10: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

x x x x x x x x x x x x x

x x

x xx x x x

x x x x

x x x x x x x x x

Too many; little improvement in average distance.

Page 11: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

Average falls rapidly un<l right k, then falls much more slowly

Average distance to

centroid

Best value of k

Page 12: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

¡  Approach 1: Sampling §  Cluster a sample of the data using hierarchical clustering, to obtain k clusters

§  Pick a point from each cluster (e.g. point closest to centroid)

§  Sample fits in main memory

¡  Approach 2: Pick “dispersed” set of points §  Pick first point at random §  Pick the next point to be the one whose minimum distance from the selected points is as large as possible

§  Repeat un<l we have k points 12

Page 13: Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-26 · 1) For&each&point,&place&itin&the&cluster&whose& currentcentroid&itis&nearest!

¡  In each round, we have to examine each input point exactly once to find closest centroid

¡  Each round is O(kN) for N points, k clusters

¡  But the number of rounds to convergence can be very large!

¡  Can we cluster in a single pass over the data?

ITIN Federal & StateTax return assembly Guide-2012(1).ppt

INDIVIDUAL TAXPAYER IDENTIFICATION NUMBER (ITIN)

ITIN & NICHE Guidelines

Experience Australia Itin Family Tropical Getaway

C:AHFGNew folderRLS AMBTR(1) 3 · 3/20/2017 · AMBULANCE WRITE-UP BENCH ELGP-109.1 ELGP-123.1 ITIN-026 MGAS-063 MGAS-063 MGAS-023 MGAS-043 MGAS-043 ELGP-109.1 ELGP-123.1 ITIN-026

Pro40064 Ta Itin Flyers_rva_euro

Pro40064 Ta Itin Flyers Nau Reg Euro

Itin nwb3-businessmodel

i itin - Embroidery Online · i itin by Urban Elementz / #80139 / 42 Designs 80139-01 All Aboard Single Stitch 7.41 X 2.89 in. 188.21 X 73.41 mm 1,184 St. 80139-02 Alphabet Soup 1

Introduc)on*à Map/Reduce*lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/...Sources • Apache*Hadoop* • Yahoo!*Developer*Network* • Hortonworks* • Cloudera* • Prac)cal*Problem*Solving*with*Hadoop*and*Pig*

Frequent Pattern Mining - imaglig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/...k=1 tk. Christian Borgelt Frequent Pattern Mining 5 Frequent Item Set Mining: Basic Notions

Galicia and Way of Saint James Accessible Tourism Plan · Galicia and Way of Saint James Accessible Tourism Plan 8 55% 31% 14% Acceso 80% 14% 6% Itin. horizontal 5% 4% 91% Itin. vertical

Publication 1915 (Rev. 1-2010) - efile.com · publication). The IRS will send your ITIN in the form of an assignment letter. An ITIN does not change your immigration status or grant

Itin Power Point

Family Itin

Pro40064 Ta Itin Flyers_mna_world

INDIVIDUAL TAXPAYER IDENTIFICATION NUMBERS: ITIN ... · Problems Most Litigated Issues Appendices Case Advocacy A second major problem is that with the ITIN unit’s increased focus

ITIN FILERS CHECK OUT - California

Introduc)on to Map-Reduce - imaglig-membres.imag.fr › leroyv › wp-content › uploads › sites › 125 › 201… · • Hortonworks • Cloudera • Prac)cal Problem Solving

Alternative IDs, ITIN Mortgages, and/media/publications/profitwise-news... · Alternative IDs, ITIN Mortgages, and Emerging Latino Markets ... bill payment history, ... ,

NE RRIAL FURNITURE ITIN UERE REN REEN ITENRE ...34 2017 FURNITURE 35 NE RRIAL FURNITURE ITIN UERE REN REEN ITENRE FURNITURE TATINERY The DULPEDIA 2017 THE DULPEDIA 2017 …

Mining%of%Massive%Datasets%lig-membres.imag.fr/leroyv/wp-content/uploads/sites/125/... · 2015-11-24 · sim(A,B))= cos(r A, r B)! sim(A,B))=0.38,) sim(A,C))=0.32)! sim(A,B))< sim(A,C),)butnotby)much)!

TPS ED1401-ITIN - thibidiphanan.com1).pdf1 TPS_ED1401-ITIN. TPS_ED1401-ENIT 2. 3 TPS_ED1401-ENIT Index Indice Applications Applicazioni 4 General Description Descrizione Generale 5

TABLE OF CONTENTS - Apply for ITIN Number | W7 Form

Rockport Bonanza ITIN 2019 - assets.ventbird.com

Distributed Transaction Management - LIG Membreslig-membres.imag.fr/leroyv/wp-content/uploads/sites/12… · · 2016-10-26Distributed Transaction Management ... Two-Phase Locking-based

Isla isabela itin

SSN, EIN and ITIN (1)ttsmedia.ttstrain.com/SSNlp042815.pdfUse Form W‐7 to apply for an IRS individual taxpayer identification number (ITIN). An ITIN is a nine‐digit number issued

Introduction à Javalig-membres.imag.fr/.../cours/supportsPDF/IntroCourte.pdfApplet • Classe principale ne posséde pas de méthode main() • Hérite de java.awt.Appletou javax.swing.JApplet

How to get an ITIN - Tax Season Resources · For more information refer to our ITIN pamphlet, available at any Community Tax Center location or at CTC at Highland Mall 6001 Ai rpo

A-Anchor ITIN SSTES CONCRETE ACCESSORIES · PDF file10 A-Anchor ITIN SSTES CONCRETE ACCESSORIES Lifting System The CONAC A-Anchor Lifting System is versatile and economical, easy to

ITIN - Acra Lending

What is an ITIN? - Internal Revenue Service · PDF fileWhat is an ITIN? Presented by ... Private delivery service, UPS, FedEx. Internal Revenue Service. ITIN Operations. Mail Stop

ITIN Guide for Tax Volunteers1040 and W7(s) to the ITIN Unit for final processing U.S. Mail to the ITIN Unit Personal delivery to the IRS Office Community Tax Centers Certifying

Itin Aa-caa Online Pre-Applicationtraining