saurabh kumar, shana moothedath & madhu n. belur ... · why pandas? pandas: simpli es data...

24
Use of Pandas for data-analysis in allocation problems Saurabh Kumar, Shana Moothedath & Madhu N. Belur, Department of Electrical Engineering, Indian Institute of Technology Bombay SciPy, 15-Dec-15 Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 1 / 13

Upload: others

Post on 10-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Use of Pandas for data-analysis in allocation problems

Saurabh Kumar, Shana Moothedath & Madhu N. Belur,

Department of Electrical Engineering,

Indian Institute of Technology Bombay

SciPy, 15-Dec-15

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 1 / 13

Page 2: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Outline

1 Two typical allotment problems

2 Two separate groups to be allotted

Course to TA (Teaching Assistant)

Exam-centre to invigilator

3 Allotment versus data-analysis

4 Pandas

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 2 / 13

Page 3: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Allocation/Matching Problems

Given two sets U and V : find a matching between U and V .

Perfect matching: each node is matched

(requires U and V to be same size)

Often: many perfect matchings: find ‘best’ in some sense

1

2

3

a

b

c

U V

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13

Page 4: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Allocation/Matching Problems

Given two sets U and V : find a matching between U and V .

Perfect matching: each node is matched

(requires U and V to be same size)

Often: many perfect matchings: find ‘best’ in some sense

1

2

3

a

b

c

U V

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13

Page 5: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Allocation/Matching Problems

Given two sets U and V : find a matching between U and V .

Perfect matching: each node is matched

(requires U and V to be same size)

Often: many perfect matchings: find ‘best’ in some sense

1

2

3

a

b

c

U V

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 3 / 13

Page 6: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

U and V have no common node. Each edge: one node in U and one in V

‘Bipartite graph’

Other examples: Hospital-patient allocation, student-college admission

But not: room-partner allotment in hostels (non-bipartite, more-complex)

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13

Page 7: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

U and V have no common node. Each edge: one node in U and one in V

‘Bipartite graph’

Other examples: Hospital-patient allocation, student-college admission

But not: room-partner allotment in hostels (non-bipartite, more-complex)

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13

Page 8: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

U and V have no common node. Each edge: one node in U and one in V

‘Bipartite graph’

Other examples: Hospital-patient allocation, student-college admission

But not: room-partner allotment in hostels (non-bipartite, more-complex)

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13

Page 9: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

U and V have no common node. Each edge: one node in U and one in V

‘Bipartite graph’

Other examples: Hospital-patient allocation, student-college admission

But not: room-partner allotment in hostels (non-bipartite, more-complex)

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13

Page 10: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

U and V have no common node. Each edge: one node in U and one in V

‘Bipartite graph’

Other examples: Hospital-patient allocation, student-college admission

But not: room-partner allotment in hostels (non-bipartite, more-complex)

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 4 / 13

Page 11: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

a b c

1

2

3

5 10 15

10 5 35

15 40 10

Edges have weights: preferences, eligibilities

Find maximum ‘total weight match’:

Allotter’s decision versus fairness

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 5 / 13

Page 12: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1

2

3

a

b

c

U V

a b c

1

2

3

5 10 15

10 5 35

15 40 10

Maximum total for 1-a, 2-c and 3-b, even though 1-a value is least

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 6 / 13

Page 13: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1st Allocation Problem: TA (Teaching Assistant) to course

Given two sets: students S and courses C : allot S and C

1 Data: students: course preferences, their score

2 Courses: number of slots: number of TA positions

Eligibility (student performance/specialization)

3 parties: students, course instructors, HoD (administrator)

Student: each wants 1st choice

Course instructor: wants high score student as TA

Administrator ensure: all students/slots are allotted

students/courses are ‘satisfied’ most

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 7 / 13

Page 14: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

1st Allocation Problem: TA (Teaching Assistant) to course

Given two sets: students S and courses C : allot S and C

1 Data: students: course preferences, their score

2 Courses: number of slots: number of TA positions

Eligibility (student performance/specialization)

3 parties: students, course instructors, HoD (administrator)

Student: each wants 1st choice

Course instructor: wants high score student as TA

Administrator ensure: all students/slots are allotted

students/courses are ‘satisfied’ most

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 7 / 13

Page 15: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

2nd allocation Problem: invigilator/centre

GATE: national level competitive exam: critical for IITs

Two sets: invigilators I and examination centres C : allot I to C

1 Data: invigilators give preference cities.

Some invigilators are more experienced/senior.

2 Centre: number of invigilators required

Centres have different experience

Three parties: invigilators, centres, Allotter

1 Invigilators want favorite city: Goa?

2 Centres want experienced invigilators

3 Allotter: No goof-ups in exam: top priority

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13

Page 16: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

2nd allocation Problem: invigilator/centre

GATE: national level competitive exam: critical for IITs

Two sets: invigilators I and examination centres C : allot I to C

1 Data: invigilators give preference cities.

Some invigilators are more experienced/senior.

2 Centre: number of invigilators required

Centres have different experience

Three parties: invigilators, centres, Allotter

1 Invigilators want favorite city: Goa?

2 Centres want experienced invigilators

3 Allotter: No goof-ups in exam: top priority

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13

Page 17: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

2nd allocation Problem: invigilator/centre

GATE: national level competitive exam: critical for IITs

Two sets: invigilators I and examination centres C : allot I to C

1 Data: invigilators give preference cities.

Some invigilators are more experienced/senior.

2 Centre: number of invigilators required

Centres have different experience

Three parties: invigilators, centres, Allotter

1 Invigilators want favorite city: Goa?

2 Centres want experienced invigilators

3 Allotter: No goof-ups in exam: top priority

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13

Page 18: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

2nd allocation Problem: invigilator/centre

GATE: national level competitive exam: critical for IITs

Two sets: invigilators I and examination centres C : allot I to C

1 Data: invigilators give preference cities.

Some invigilators are more experienced/senior.

2 Centre: number of invigilators required

Centres have different experience

Three parties: invigilators, centres, Allotter

1 Invigilators want favorite city: Goa?

2 Centres want experienced invigilators

3 Allotter: No goof-ups in exam: top priority

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13

Page 19: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

2nd allocation Problem: invigilator/centre

GATE: national level competitive exam: critical for IITs

Two sets: invigilators I and examination centres C : allot I to C

1 Data: invigilators give preference cities.

Some invigilators are more experienced/senior.

2 Centre: number of invigilators required

Centres have different experience

Three parties: invigilators, centres, Allotter

1 Invigilators want favorite city: Goa?

2 Centres want experienced invigilators

3 Allotter: No goof-ups in exam: top priority

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 8 / 13

Page 20: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Allotter’s requirement

Exam conduct smooth!

1 Equalize experience within centre

2 Balance: Centre experience + invigilator experiences

3 Balance work across people and weekends

1 Allot senior invigilator’s higher choices

2 Minimize average choice number in an allotment

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 9 / 13

Page 21: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Allotment method

1 Construct weights carefully: student-score (seniority), preferences

Maximum weight matching: Hungarian method:

Munkres in Python

max weight matching using NetworkX

2 Markov-Chain-Monte-Carlo (MCMC):

pymc in Python

Allotment: outside scope of today’s talk

Today: data analysis, using Pandas

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 10 / 13

Page 22: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Why analyze data: just allot?

Allotter’s dilemma

1 Is an allotment good?

2 Choices perhaps skewed

3 Seek more choices?

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 11 / 13

Page 23: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Data decides how good an allotment can be

Typical questions:

1 Did people give sufficiently different choices?

2 Some course (city) too popular, some course (city) unwanted

3 1st weekend for GATE more sought than 2nd weekend (for 2016)

4 If data is skewed, allotment inevitably bad

5 Allot some manually: critical courses/cities

6 Quickly extract specific type of choosers

7 Obtain ‘breakups’: pie-charts, bar-diagrams

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 12 / 13

Page 24: Saurabh Kumar, Shana Moothedath & Madhu N. Belur ... · Why Pandas? Pandas: simpli es data processing tasks: 1 Reading and writing les 2 Data extraction 3 Data manipulations 4 Getting

Why Pandas?

Pandas: simplifies data processing tasks:

1 Reading and writing files

2 Data extraction

3 Data manipulations

4 Getting statistics on data

5 Plotting and visualization

Saurabh, Shana, Madhu (EE, IITB) Pandas for data-analysis SciPy-15 13 / 13