handling panel data...prog1.rating, weighted on prog1.rating and filtered to prog1.date(01010),...

48
Copyright © 2007. All rights reserved Handling Panel Data, page 1 of 48 Handling Panel Data Copyright © 2007. Protected by International Copyright law. All rights reserved worldwide. Version: 15 May 2007 This document remains the property of Red Centre Software Pty Ltd and may only be used by explicitly authorised individuals who are responsible for its safe-keeping and return upon request. No part of this document may be reproduced or distributed in any form or by any means - graphic, electronic, or mechanical, including, but not limited to, photocopying, recording, taping, email or information storage and retrieval systems - without the prior written permission of Red Centre Software Pty Ltd.

Upload: others

Post on 26-May-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 1 of 48

Handling Panel Data Copyright © 2007. Protected by International Copyright law. All rights reserved worldwide.

Version: 15 May 2007

This document remains the property of Red Centre Software

Pty Ltd and may only be used by explicitly authorised

individuals who are responsible for its safe-keeping and

return upon request.

No part of this document may be reproduced or distributed

in any form or by any means - graphic, electronic, or

mechanical, including, but not limited to, photocopying,

recording, taping, email or information storage and retrieval

systems - without the prior written permission of Red Centre

Software Pty Ltd.

Page 2: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 2 of 48

Handling Panel Data

This document examines the various issues regarding handling panel data, and gives

some examples of various processing and analytical approaches using various sets of

fictitious data.

THE NATURE OF PANEL DATA ........................................................................................................... 3

Single Case Example ........................................................................................................................... 3

Confirming the Data Representation by Tables ................................................................................... 5

Some Example Tables ......................................................................................................................... 6

Favourite Program ............................................................................................ 6

Compare 1Jan01 Survey with 1Jun01 Survey ..................................................................................... 7

Number of Programs Watched by Day as a Chart .................................................. 8

Sum of Ratings and Mean Rating by Day............................................................ 10

Daily Mean without Regard to Program.............................................................. 12

Compare Rating for Prog11 to Average Rating and Calculate Anomaly ................... 12

A Multi-Case Example ........................................................................................................................ 13

Basic Histograms ............................................................................................ 13

Time Series.................................................................................................... 16

Period Comparisons ........................................................................................ 17

1000 Cases over 1000 Days over 1000 TV Programs....................................................................... 21

Frequency and Ratings Chart for Prog1.............................................................. 21

Total and Average Viewing Occasions ................................................................ 22

Weekdays and Weekends................................................................................. 23

Most Popular Program Report ........................................................................... 25

What Else did those who Watched Prog1 more than 5 Times in Year1 Watch in Year 2

and 3?........................................................................................................... 27

What Else did those who Rated Prog1 4 or 5 in Year1 Watch in Year 2 and 3? ........ 30

Combinations ................................................................................................. 31

10 Cases, 20 Programs, 14 Days....................................................................................................... 34

Basic Analysis................................................................................................. 35

Most Popular Program Report ........................................................................... 39

Isolating Weekdays from Weekends .................................................................. 41

Comparing Behaviour Across Periods................................................................. 45

Page 3: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 3 of 48

TTTTHE HE HE HE NNNNATURE OF ATURE OF ATURE OF ATURE OF PPPPANEL ANEL ANEL ANEL DDDDATAATAATAATA

The primary difference between panel data and normal survey data is that for a panel the

respondent is not anonymous, and may respond to the same questionnaire multiple

times. Multiplicity allows tracking at the individual level – for example, it should be

possible to calculate the average time between purchases by identifying respondents who

have purchased more than once, deriving the gap from the dates of purchase, and then

averaging the gaps.

Structurally, this requires that a case is a panel member rather than a questionnaire, and

that subsequent instances of the same questionnaire by the same respondent append to

the existing case for that respondent.

Single Case Example

To make this clear, here is some sample data for a TV ratings panel. It is possible to

watch the same program more than once on a single day. Programs are coded as two

digits, range 10 to 19. Ratings are 1 to 5.

Date: 1Jan01

Respondent Panel ID: 1

Mon Tues Wed Thur Fri Sat Sun

Program 11 12 10 11 18 11 16

Rating 3 1 3 5 3 2 2

Program 12 11 13 14 19 17 13

Rating 4 1 4 3 3 3 4

Program 19 14 18 19 11 19 15

Rating 2 3 2 2 4 2 5

Program 15 19 11 15

Rating 3 3 4 5

For each program watched, the data storage needs to capture the program code, the

rating and the date. This requires a hierarchy – for each program there is a rating, and

for each rating there is a date. Coding the dates as YMMDD, the data for Monday 1Jan01

program 11 rating 3 can be represented in Ruby format (patent pending) as

a11b3c10101

Continuing on for the same day, program 12 rating 4 and program 19 rating 2 are

respectively

a12b4c10101

a19b2c10101

Putting these data segments together for all of Monday 1Jan01 gives

Monday: a11b3c10101a12b4c10101a19b2c10101

Similarly, the other days are

Tuesday: a12b1c10102a11b1c10102a14b3c10102a15b3c10102

Page 4: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 4 of 48

Wednesday: a10b3c10103a13b4c10103a18b2c10103

Thursday: a11b5c10104a14b3c10104a19b2c10104a19b3c10104

Friday: a18b3c10105a19b3c10105a11b4c10105a11b4c10105

Saturday: a11b2c10106a17b3c10106a19b2c10106

Sunday: a16b2c10107a13b4c10107a15b5c10107a15b5c10107

Putting all the days together, to make a single case, gives

a11b3c10101a12b4c10101a19b2c10101a12b1c10102a11b1c10102a14b3c10102a15b3c101

02a10b3c10103a13b4c10103a18b2c10103a11b5c10104a14b3c10104a19b2c10104a19b3c1

0104a18b3c10105a19b3c10105a11b4c10105a11b4c10105a11b2c10106a17b3c10106a19b2

c10106a16b2c10107a13b4c10107a15b5c10107a15b5c10107

Six months later, Respondent ID #1 participates again, with data as

Date: 1Jun01

Respondent Panel ID: 1

Mon Tues Wed Thur Fri Sat Sun

Program 10 15 11 11 12 11 18

Rating 2 4 5 5 5 5 3

Program 13 11 17 17 16 13 17

Rating 2 4 3 2 2 4 2

Program 11 16 19 16 12 12 11

Rating 4 2 1 3 5 1 5

Program 17 12 16

Rating 2 1 5

Program 13 12

Rating 1 5

In Ruby format,

Monday: a10b2c010601a13b2c10601a11b4c10601

Tuesday: a15b4c10602a11b4c10602a16b2c10602a17b2c10602a13b1c10602

Wednesday: a11b5c10603a17b3c10603a19b1c10603

Thursday: a11b5c10604a17b2c10604a16b3c10604

Friday: a12b5c10605a16b2c10605a12b5c10605

Saturday: a11b5c10606a13b4c10606a12b1c10606a12b1c10606

Sunday: a18b3c10607a17b2c10607a11b5c10607a16b5c10607a12b5c10607

Putting all the days together gives

a10b2c010601a13b2c10601a11b4c10601a15b4c10602a11b4c10602a16b2c10602a17b2c10

602a13b1c10602a11b5c10603a17b3c10603a19b1c10603a11b5c10604a17b2c10604a16b3c

10604a12b5c10605a16b2c10605a12b5c10605a11b5c10606a13b4c10606a12b1c10606a12b

1c10606a18b3c10607a17b2c10607a11b5c10607a16b5c10607a12b5c10607

Finally, the response strings for the weeks of 1Jan01 and 1Jun01 are concatenated to a

single string, which is the entire case for Respondent #1.

a11b3c10101a12b4c10101a19b2c10101a12b1c10102a11b1c10102a14b3c10102a15b3c101

02a10b3c10103a13b4c10103a18b2c10103a11b5c10104a14b3c10104a19b2c10104a19b3c1

0104a18b3c10105a19b3c10105a11b4c10105a11b4c10105a11b2c10106a17b3c10106a19b2

c10106a16b2c10107a13b4c10107a15b5c10107a15b5c10107a10b2c010601a13b2c10601a1

1b4c10601a15b4c10602a11b4c10602a16b2c10602a17b2c10602a13b1c10602a11b5c10603

a17b3c10603a19b1c10603a11b5c10604a17b2c10604a16b3c10604a12b5c10605a16b2c106

05a12b5c10605a11b5c10606a13b4c10606a12b1c10606a12b1c10606a18b3c10607a17b2c1

0607a11b5c10607a16b5c10607a12b5c10607

Page 5: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 5 of 48

Confirming the Data Representation by Tables

Calling the example case and variable Prog1 (‘1’ for 1 case), each day of the original grid

for week commencing 1Jan01 can be obtained by the table Prog1.Program by

Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010),

Prog1.Date(10102), etc.

Mon Tues Wed Thur Fri Sat Sun

Program 11 12 10 11 18 11 16

Rating 3 1 3 5 3 2 2

Program 12 11 13 14 19 17 13

Rating 4 1 4 3 3 3 4

Program 19 14 18 19 11 19 15

Rating 2 3 2 2 4 2 5

Program 15 19 11 15

Rating 3 3 4 5

For days where the same program was watched more than once (enscribed in red), the

cell value shows the sum of ratings (by virtue of weighting on Rating), and the code

mean in the bottom row shows the average rating.

Similarly, for the week commencing 1Jun06, the data (as above) is

Mon Tues Wed Thur Fri Sat Sun

Program 10 15 11 11 12 11 18

Rating 2 4 5 5 5 5 3

Program 13 11 17 17 16 13 17

Rating 2 4 3 2 2 4 2

Program 11 16 19 16 12 12 11

Rating 4 2 1 3 5 1 5

Program 17 12 16

Rating 2 1 5

Program 13 12

Rating 1 5

Page 6: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 6 of 48

The table is

Some Example Tables

Favourite Program

Specification is

Sorting on the Code Mean row puts the highest ratings to the left.

Page 7: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 7 of 48

The favourite program is prog15, with a rating of 4.2, although prog11, at 3.9, is

watched more often. Sorting on the Total row instead gives

Compare 1Jan01 Survey with 1Jun01 Survey

This is done by filtering the Top axis to the date ranges for each.

Specification is

Page 8: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 8 of 48

The table is

This shows that in terms of ratings prog13 has dropped off from 4.0 to 2.3, and that

prog15 at 4.3 in week of 1Jan01 has been surplanted by prog11 at 4.7 in the week of

1Jun01.

Number of Programs Watched by Day as a Chart

Specification is

Page 9: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 9 of 48

Chart as stacked bars is

Page 10: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 10 of 48

Three programs including duplicates were watched on 7 days, four on 5 days, and five on

2 days.

Sum of Ratings and Mean Rating by Day

The sum of ratings is achieved by weighting Program on Rating.

Specification is

Chart is

Page 11: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 11 of 48

The sum of ratings for 7Jun01 (date code 10607) is 20. The rightmost fragment of the

daily breakdown table above is

5+5+5+2+3=20, for the stacked bars, and 20/5 = 4, for the code mean.

Page 12: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 12 of 48

Daily Mean without Regard to Program

The bottom row here is the same as the black line series in the chart above.

Compare Rating for Prog11 to Average Rating and Calculate Anomaly

The specification is

Note how the anomaly expression (v1-v2)/(v2/v2) forces days with no rating for prog11

to be excluded completely.

The table is

Page 13: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 13 of 48

The first row is the average rating for all programs. The second row is the average for

prog11 only. The third row is the anomaly, calculated as row1 – row2 for columns with a

rating for prog11 only.

A Multi-Case Example

The value of working with a single case is that the processing can be traced through and

confirmed. Here is a more realistic example, using 1000 programs over 10,000 cases

from 1Jan2001 to 28Dec2006. To facilitate creating the fictitious data I have limited

days in month to 28.

To prevent a potential combinatorial explosion, the level B and C of the hierarchy are

uncoded. If required by a specification, Ruby codes the requested levels on the fly. Level

A, for the actual TV programs, must be explicitly coded, otherwise there would be no

indication of program on the final output. The ratings at 1 to 5 are self-coding, and the

dates in YMMDD format are also self-coding, and are actually easier to enter directly into

filters than it would be to locate and click or drag if using the mouse.

Random dates are assigned over all cases. This simulates a worst-case panel where the

same respondent can provide more than one set of responses at different times.

Basic Histograms

The data was initiated as random numbers in Excel, and then manually adjusted to plant

a few ‘results’ which can then be ‘discovered’.

Page 14: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 14 of 48

Count of Programs:

The programs are on the X axis in coded order. I could not think up 1000 meaningful TV

program names, so you have to imagine that prog1 = MASH, prog2= ER, prog3 =

Surviver, or whatever, etc.

The chart shows that I have spiked prog1, 333, 2, plus several others. Sorting the X

axis and zooming to the left shows the most frequently viewed programs.

Page 15: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 15 of 48

Count of Dates

The Dates at Level C self-code on placement as the Top axis (to make the columns).

This sorts them in date order, since the format YMMDD if commencing post 2000 will be

in numeric order. The trend line shows that there are on average a bit under 58 program

ratings per day.

Count of Ratings

This one is easier as a table. It shows that no one rating is any more likely than any

other. Our respondents are a very even-minded group.

Page 16: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 16 of 48

Time Series

Most Popular Programs

Average Ratings

The mean rating is 3, as would be expected from randomly generated ratings.

Smoothing daily data at MA365 gives the appearance of some movement, but note that

it is within a very narrow band.

Page 17: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 17 of 48

Period Comparisons

This section looks at queries of respondent behaviour with respect to and in comparison

with different periods. This easiest approach is to construct a multi-response variable for

each program of interest. For Prog1, the filters to separate the viewing occasions across

the six years are

The variable is named Prog1Years. The data for the first several cases is

This shows that case#1, being the respondent with ID=1, watched prog1 three times in

2002, and once each in 2003, 2004 and 2005. Case#11 watched prog1 once in 2006.

Case#1 was manually fiddled to give the six instances of prog1.

Distribution of Prog1 Across the Years

Page 18: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 18 of 48

In 2001, 111 respondents watched prog1 113 times, and in 2002, 86 respondents

watched 89 times, etc. 111+86+95+100+92+89+9440 = 10013.

Showing the breakdown by case and sorting on the column for 2002 brings case#1 to the

top.

In 2002, case#1 saw prog1 twice extra, and case#6835 once extra, and 2+1=3, and 89-

86=3.

Since any row total >1 indicates prog1 was watched more than once, the full account of

multiple viewings can be obtained by sorting on the Total column.

Page 19: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 19 of 48

12 cases viewed one extra, two cases (#6362 and #6572) viewed two extra, and one

case (#1) viewed 5 extra, giving 1+1+1+1+1+1+1+1+1+1+1+1+2+2+5=21, which

accounts for the Code1 total of 10,021.

Those who saw Prog1 Three or More Times in 2002

The number of cases could be very large, precluding a table using CaseNum as an axis.

The table immediately above shows that only a single case fits the criterion. The filter

expression is

cvl_Prog1Years(2)>=3

This says ‘return true if the count of values in Prog1Years where the value is code 2 is

greater than or equal to 3’. The table Prog1Years by Count under this filter should

therefore reproduce the first line of Prog1Years by CaseNum sorted on Total, as above.

The table is

Page 20: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 20 of 48

The filter isolates the single respondent (case#1 as we know from the above table), who

watched prog1 three times in 2002, and once in each of 2003, 2004 and 2005.

Considering those who watched prog1 two or more times in 2002, the table Prog1Years

by CaseNum indicates that the cases are #1 and #6835. #1 watched three times, and

#6835 watched twice, so the table filtered to cvl_Prog1Years(2)>=2 should show 5

instances by 2 respondents.

Page 21: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 21 of 48

1000 Cases over 1000 Days over 1000 TV Programs

This is probably a bit more realistic, except that since the data was generated randomly,

all programs are virtually equally popular.

The data has been structured differently this time, with Day as the top level. The

reasoning is that Day is the biggest container – it contains the TV program watching

occasions – and so it should be level A. For each day a respondent can watch one or

more TV programs, and for each program, a rating is given.

Frequency and Ratings Chart for Prog1

Unlooping on the fly at 1000*1000*5 = 5 million vectors is clearly unrealistic, but each

level can be used in its own right with relative comfort. Day by Rating filtered to Prog1

with Code Mean takes a bit over 10 seconds to generate and display.

Page 22: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 22 of 48

This chart shows that fewer respondents watched prog1 around the middle of the period,

but that the rating did not diminish – those who watched continued to like it, and viewer

numbers recovered towards day1000.

Total and Average Viewing Occasions

A count of ratings shows the total number of program viewing occasions with a rating as

3,501,689.

The average number of programs viewed per day can be achieved by .Day by

avg#.Program

Page 23: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 23 of 48

So, respondents watched about 3.5 programs per day.

Weekdays and Weekends

Weekdays and weekends can be distinguished by a construction which uses modulus

arithmetic. The filters are

This produces a code 1 for days 1/5, and a code 2 for days 6/7. It also produces a lot of

b0 segments. The output showing just two weekend sections across the first four cases

is

This works OK as a filter against Programs1k1k.Day. The chart of Day by Programs

filtered to Weekdays (zoomed to far left) is

Page 24: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 24 of 48

All days with codes which are multiples of 7 or multiples of 7 minus 1 are empty. The

assumption of course is that days 1/5 are Monday/Friday, 6/7 are Saturday/Sunday, and

likewise without gap thereafter (and if a day(s) is out of field, its code(s) must be

skipped).

The 1000 respondents watch about 3,500 programs per weekday, verifying the 3.5 per

day chart above.

It is a shame about all the wasted space for the b0 segments, however. They could be

simply cleared out using search=”b0” replace=””, but it seemed more useful and natural

to make levelB carry some useful information, so I coded it as year, where

year1=day(1/365), etc

Page 25: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 25 of 48

This now gives a b1, b2 or b3 for year1, 2 or 3. This is not much use, however, because

the only matching structures are Programs1k1k.Day or Periods1k1k.DayType.

It might then actually be safer to kill the b0 segments and keep levelB as a dummy.

Most Popular Program Report

The several different ways of defining ‘most popular’ can all be accommodated by sorting

columns on a single table.

Program with Highest number of viewings:

prog998 has data recorded against it 3682 times.

Page 26: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 26 of 48

Top: Count1k1k, Programs1k1k Rating

Side: Programs1k1k Program

SideSort: by column 1 decreasing

Programs1k1k Program

prog988

prog441

prog835

prog760

prog932

prog604

prog468

prog511

prog89

prog109

Count Programs1k1k Rating

Code 1Cases

WFrat1 rat2 rat3 rat4 rat5

Code

Mean

3,682 972 770 742 720 702 748 2.98

3,674 977 755 717 704 754 744 3.00

3,673 979 708 721 804 718 722 3.01

3,660 982 750 792 687 731 700 2.96

3,654 968 751 744 705 684 770 2.99

3,648 972 697 726 678 760 787 3.06

3,643 968 753 731 766 706 687 2.96

3,640 980 714 770 732 718 706 2.98

3,638 973 746 751 732 713 696 2.96

3,635 977 761 664 727 749 734 3.01

FrequenciesFrequencies

Program with Highest Number of Viewers:

prog853 was viewed by 986 respondents.

Top: Count1k1k, Programs1k1k Rating

Side: Programs1k1k Program

SideSort: by column 2 decreasing

Programs1k1k Program

prog853

prog697

prog90

prog317

prog292

prog454

prog694

prog760

prog27

prog32

Count Programs1k1k Rating

Code 1Cases

WFrat1 rat2 rat3 rat4 rat5

Code

Mean

3,603 986 733 707 724 740 699 2.99

3,517 985 739 700 704 674 700 2.97

3,610 984 717 693 695 764 741 3.03

3,545 983 683 705 698 695 764 3.04

3,463 982 684 669 715 675 720 3.02

3,595 982 723 696 737 703 736 3.01

3,522 982 680 673 739 692 738 3.04

3,660 982 750 792 687 731 700 2.96

3,580 981 702 691 703 722 762 3.04

3,487 981 732 684 742 668 661 2.95

FrequenciesFrequencies

Program with Highest Number of Rating 5s:

prog65 was rating a 5 804 times.

Page 27: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 27 of 48

Top: Count1k1k, Programs1k1k Rating

Side: Programs1k1k Program

SideSort: by column 7 decreasing

Programs1k1k Program

prog65

prog729

prog604

prog905

prog330

prog708

prog932

prog228

prog85

prog740

Count Programs1k1k Rating

Code 1Cases

WFrat1 rat2 rat3 rat4 rat5

Code

Mean

3,620 972 727 736 678 675 804 3.03

3,611 976 731 679 671 735 795 3.05

3,648 972 697 726 678 760 787 3.06

3,599 973 726 694 692 705 782 3.03

3,592 971 682 708 702 724 776 3.06

3,563 967 659 720 680 728 776 3.07

3,654 968 751 744 705 684 770 2.99

3,559 975 661 710 706 714 768 3.06

3,593 978 696 714 720 697 766 3.03

3,612 975 720 709 685 732 766 3.03

FrequenciesFrequencies

Program with Highest Code Mean:

prog719, 375 and 708 all get a mean rating of 3.07.

Top: Count1k1k, Programs1k1k Rating

Side: Programs1k1k Program

SideSort: by column 8 decreasing

Programs1k1k Program

prog719

prog375

prog708

prog744

prog12

prog936

prog228

prog58

prog536

prog218

Count Programs1k1k Rating

Code 1Cases

WFrat1 rat2 rat3 rat4 rat5

Code

Mean

3,436 965 641 649 692 723 731 3.07

3,420 964 629 657 707 695 732 3.07

3,563 967 659 720 680 728 776 3.07

3,433 968 638 690 654 713 738 3.06

3,492 963 641 699 692 722 738 3.06

3,458 977 620 708 686 728 716 3.06

3,559 975 661 710 706 714 768 3.06

3,440 970 639 688 667 716 730 3.06

3,447 963 649 660 691 728 719 3.06

3,509 975 645 688 714 736 726 3.06

FrequenciesFrequencies

What Else did those who Watched Prog1 more than 5 Times in Year1 Watch in Year 2 and 3?

It is not possible to count code instances within a period inside a hierarchic structure at

crosstab time because filters would be needed, and since hierarchic, all non-filtered

periods disappear. The likely contender for an on-the-fly filter expression would be

Programs1k1k.Day(1/365)&(cvl_Programs1k1k.Program(1)>3)

Page 28: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 28 of 48

But this delivers an empty table. Cvl_ Programs1k1k.Program(1)>3 will return true only

for something like b1;1;1;1, and Programs1k1k.Day(1/365) excludes all days from code

366 onwards.

(A handy function would be cvf_, which counts across segments and returns a single

integer (‘f’ for ‘flat).)

Thus an intermediate construction is required. This filter writes a code 1 for each true

hierarchic node.

Case#4 and case#12 viewed prog1 3 times in year1, etc. This is done as a general

template so that anytime a similar construction is required – say for a given week – or

for a different program – then simply change the Day range and the Program code.

Sorting ProgCounter allows prediction of what should happen.

Page 29: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 29 of 48

Cvl_ >6 should produce an empty chart. Cvl_ >5 should have a base count of N=2, but

because the data is random over 1000 TV programs over 1000 days, quite a few TV

programs will be passed by the filter. Cvl_ >4 will allow more, etc.

The specification for the final chart is .Day by .Program filtered to cvl_ProgCounter>5.

On a 90 day roll, the chart is

Note the Base: N=2 at the bottom of the chart. This is as expected.

Cvl_ >6 should be empty.

Page 30: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 30 of 48

Note the Base: N=0 at the bottom of the chart. This is as expected.

What Else did those who Rated Prog1 4 or 5 in Year1 Watch in Year 2 and 3?

The ProgCounter construction can be extended to carry a variety of different counts.

Adding a code 2 defined like this

delivers the data as

Page 31: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 31 of 48

The code 1s are now interspersed with code 2s. The chart specification is .Day by

.Programs filtered to ProgCounter(2).

Combinations

Any type of combination is now easy to specify. The Side variable could be .Ratings, and

the filter could be any AND or OR of any number of code 1s in ProgCounter(1) and

ProgCounter(2), such as

Page 32: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 32 of 48

Ratings in Years 2 and 3 for those who saw Prog1 > 5 AND rated Prog1 a 4 or 5 in year 1.

Page 33: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 33 of 48

Ratings in Years 2 and 3 for Prog1 for those who saw Prog1 > 3 AND rated Prog1 a 4 or 5 in year 1.

The X axis has no hidden points, so that the greater density in Year1 is apparent. Rating

5s dominate in years 2 and 3.

Page 34: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 34 of 48

10 Cases, 20 Programs, 14 Days

This is an artificial data set for examining various approaches to analysing a panel for TV

program ratings. The basic data is organised hierarchically as

Level A: For each day

Level B: For each program viewed on that day

Level C: Give a rating from 1 to 5

Showing part of the first and second days only, the structure is

The data is small so that generation can be traced through the individual cases.

The Ruby variable which represents this structure is called Program10 (‘10’ for 10 cases).

All variables for this exercise have a trailing ‘10’ in the name.

Page 35: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 35 of 48

Basic Analysis

1. When and how often programs were watched

This is given by a grid table of Day by Prog.

Page 36: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 36 of 48

Prog1 was watched once on day1, three times on day2, etc. Prog18 was watched by 7

respondents, and prog20 was watched by 8 respondents, etc. All respondents submitted

a set of ratings for each of the 14 days.

2. Identifying Respondents

The 7 respondents who watched prog18 and the 8 who watched prog20 can be identified

by a table of Program10.Prog(18;20) by Case10.

Page 37: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 37 of 48

The 7 cases for prog18 are 1, 2, 3, 4, 5, 6, 8. The 8 cases for prog20 are 1, 2, 3, 4, 6, 7,

9, 10.

3. High and Low Ratings

The table Rating by Day with Totals (for number of times a program was rated) and

sorted code mean (for average rating) shows which days favoured which ratings.

Looking at just the rating 5s, days 3, 7 and 9 must have had some popular programs,

with 10 5s. Day13, with only 2 rating 5s, fewest viewing occasions at 29, and an

average rating of 2.59, looks like a loser.

Page 38: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 38 of 48

4. Popularity Specification

The table Rating by Program with columns for Count, Total and Code Mean is useful for

analysing popularity from several different points of view. The specification is

The table (top part only) with column% is

Page 39: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 39 of 48

Most Popular Program Report

The several different ways of defining ‘most popular’ can all be accommodated by sorting

columns on the above table. Percents are not needed.

5. Viewed Daily by Everyone:

Programs 1, 2, 3, 5, 8, 14, 15 and 16.

Page 40: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 40 of 48

6. Program with Highest number of viewings:

Prog16, at 35 times. This could have been equivalently obtained from the Total column –

the base values are different, but the frequency counts are the same.

7. Program with Highest Number of Rating 5s:

Prog15, with 11 rating 5s.

8. Program with Highest Code Mean:

Prog15 again, at 3.52.

Page 41: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 41 of 48

Isolating Weekdays from Weekends

There are several ways this could be done. In this example, since the days are coded

from 1, then assuming 1/5 and 8/14 are the weekdays, modulus arithmetic (remainder

after division, Ruby operator is ?) can be used to make a code 1 for Monday to Friday,

and a code 2 for Saturday and Sunday. This must be a hierarchic construction so that

the structure matches Program10. Level B is not needed, and so padded with a code

zero.

9. Time Series Chart of Program by Weekday

A chart of Day by Program filtered to DayType(Weekday) shows that the weekends have

been skipped.

Page 42: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 42 of 48

Page 43: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 43 of 48

10. Chart Comparing Mean Rating across Weekday/Weekend

The specification is made by

• Drag Program10.Prog to Top (bent red line)

• Select DayType10.Type and click the top Nest button (Lavender rectangles)

• Right-click in Top and select Spacer, move to between the two filters

• Drag Program10.Rating to Side (straight red line), remove all codes

• Drag Code Mean to Side and drop on top of Program10.Rating (blue line)

• Select Chart as the output type and click Run (black rectangles).

Page 44: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 44 of 48

Note the Spacer which separates the nested filters for Top, and that for Side the only

item is the Code Mean. The chart, using a Nested display style for the X axis, is

Page 45: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 45 of 48

This shows that prog7 and prog15 both rate better on weekends, and progs 8, 9 and 17

rate worse.

Comparing Behaviour Across Periods

11. What programs were watched in week2 by those who saw prog1 more than twice in week1?

This sort of query requires constructing a flat variable for filtering across all cases and

periods. The next chart will be of those who rated prog1 a 4 to 5, so the construction

ProgCounter10 does both at once.

The filter for the constructed code 1 says ‘Day was a 1 to 7 AND the program was prog1’.

Page 46: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 46 of 48

The filter for code 2 additionally specifies the rating as 4 to 5.

The Variable Data window says that case#1, with data of 1;2;1;2;1, saw prog1 three

times (there are three 1s) and rated it a 4 or 5 twice (there are two 2s), etc. More

program counter criteria for filtering can be defined and added as required.

The chart specification is Day by Program filtered to count of code 1s in

ProgCounter10>2. As can be seen from the Variable Data window immediately above,

this is true for cases 1 and 8.

The specification is made by

• Drag Program10.Day to Top (red line)

• Drag Program10.Prog to Side (red line)

• Check Case (blue rectangle)

• Select the function cvl (Count of Values, blue rectangle)

• Drag Watched Prog1 in Week1 to Filter (blue rectangle and line)

• Type in the condition as >2 (blue underline)

Page 47: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 47 of 48

Base: N=2 at the bottom of the chart means cases 1 and 8.

12. What programs were watched in week2 by those who rated prog1 a 4 or 5 in week1?

This is the above chart with the filter instead as ProgCounter10(2). This is true of 5

cases. The chart is

Page 48: Handling Panel Data...Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010), Prog1.Date(10102), etc. Mon Tues Wed Thur Fri Sat Sun Program 11 12 10 11 18 11 16 Rating

Copyright © 2007. All rights reserved Handling Panel Data, page 48 of 48

Similarly, it is easy to create charts of ratings filtered to those who watched, etc.

[end of document]