handling panel data...prog1.rating, weighted on prog1.rating and filtered to prog1.date(01010),...
TRANSCRIPT
Copyright © 2007. All rights reserved Handling Panel Data, page 1 of 48
Handling Panel Data Copyright © 2007. Protected by International Copyright law. All rights reserved worldwide.
Version: 15 May 2007
This document remains the property of Red Centre Software
Pty Ltd and may only be used by explicitly authorised
individuals who are responsible for its safe-keeping and
return upon request.
No part of this document may be reproduced or distributed
in any form or by any means - graphic, electronic, or
mechanical, including, but not limited to, photocopying,
recording, taping, email or information storage and retrieval
systems - without the prior written permission of Red Centre
Software Pty Ltd.
Copyright © 2007. All rights reserved Handling Panel Data, page 2 of 48
Handling Panel Data
This document examines the various issues regarding handling panel data, and gives
some examples of various processing and analytical approaches using various sets of
fictitious data.
THE NATURE OF PANEL DATA ........................................................................................................... 3
Single Case Example ........................................................................................................................... 3
Confirming the Data Representation by Tables ................................................................................... 5
Some Example Tables ......................................................................................................................... 6
Favourite Program ............................................................................................ 6
Compare 1Jan01 Survey with 1Jun01 Survey ..................................................................................... 7
Number of Programs Watched by Day as a Chart .................................................. 8
Sum of Ratings and Mean Rating by Day............................................................ 10
Daily Mean without Regard to Program.............................................................. 12
Compare Rating for Prog11 to Average Rating and Calculate Anomaly ................... 12
A Multi-Case Example ........................................................................................................................ 13
Basic Histograms ............................................................................................ 13
Time Series.................................................................................................... 16
Period Comparisons ........................................................................................ 17
1000 Cases over 1000 Days over 1000 TV Programs....................................................................... 21
Frequency and Ratings Chart for Prog1.............................................................. 21
Total and Average Viewing Occasions ................................................................ 22
Weekdays and Weekends................................................................................. 23
Most Popular Program Report ........................................................................... 25
What Else did those who Watched Prog1 more than 5 Times in Year1 Watch in Year 2
and 3?........................................................................................................... 27
What Else did those who Rated Prog1 4 or 5 in Year1 Watch in Year 2 and 3? ........ 30
Combinations ................................................................................................. 31
10 Cases, 20 Programs, 14 Days....................................................................................................... 34
Basic Analysis................................................................................................. 35
Most Popular Program Report ........................................................................... 39
Isolating Weekdays from Weekends .................................................................. 41
Comparing Behaviour Across Periods................................................................. 45
Copyright © 2007. All rights reserved Handling Panel Data, page 3 of 48
TTTTHE HE HE HE NNNNATURE OF ATURE OF ATURE OF ATURE OF PPPPANEL ANEL ANEL ANEL DDDDATAATAATAATA
The primary difference between panel data and normal survey data is that for a panel the
respondent is not anonymous, and may respond to the same questionnaire multiple
times. Multiplicity allows tracking at the individual level – for example, it should be
possible to calculate the average time between purchases by identifying respondents who
have purchased more than once, deriving the gap from the dates of purchase, and then
averaging the gaps.
Structurally, this requires that a case is a panel member rather than a questionnaire, and
that subsequent instances of the same questionnaire by the same respondent append to
the existing case for that respondent.
Single Case Example
To make this clear, here is some sample data for a TV ratings panel. It is possible to
watch the same program more than once on a single day. Programs are coded as two
digits, range 10 to 19. Ratings are 1 to 5.
Date: 1Jan01
Respondent Panel ID: 1
Mon Tues Wed Thur Fri Sat Sun
Program 11 12 10 11 18 11 16
Rating 3 1 3 5 3 2 2
Program 12 11 13 14 19 17 13
Rating 4 1 4 3 3 3 4
Program 19 14 18 19 11 19 15
Rating 2 3 2 2 4 2 5
Program 15 19 11 15
Rating 3 3 4 5
For each program watched, the data storage needs to capture the program code, the
rating and the date. This requires a hierarchy – for each program there is a rating, and
for each rating there is a date. Coding the dates as YMMDD, the data for Monday 1Jan01
program 11 rating 3 can be represented in Ruby format (patent pending) as
a11b3c10101
Continuing on for the same day, program 12 rating 4 and program 19 rating 2 are
respectively
a12b4c10101
a19b2c10101
Putting these data segments together for all of Monday 1Jan01 gives
Monday: a11b3c10101a12b4c10101a19b2c10101
Similarly, the other days are
Tuesday: a12b1c10102a11b1c10102a14b3c10102a15b3c10102
Copyright © 2007. All rights reserved Handling Panel Data, page 4 of 48
Wednesday: a10b3c10103a13b4c10103a18b2c10103
Thursday: a11b5c10104a14b3c10104a19b2c10104a19b3c10104
Friday: a18b3c10105a19b3c10105a11b4c10105a11b4c10105
Saturday: a11b2c10106a17b3c10106a19b2c10106
Sunday: a16b2c10107a13b4c10107a15b5c10107a15b5c10107
Putting all the days together, to make a single case, gives
a11b3c10101a12b4c10101a19b2c10101a12b1c10102a11b1c10102a14b3c10102a15b3c101
02a10b3c10103a13b4c10103a18b2c10103a11b5c10104a14b3c10104a19b2c10104a19b3c1
0104a18b3c10105a19b3c10105a11b4c10105a11b4c10105a11b2c10106a17b3c10106a19b2
c10106a16b2c10107a13b4c10107a15b5c10107a15b5c10107
Six months later, Respondent ID #1 participates again, with data as
Date: 1Jun01
Respondent Panel ID: 1
Mon Tues Wed Thur Fri Sat Sun
Program 10 15 11 11 12 11 18
Rating 2 4 5 5 5 5 3
Program 13 11 17 17 16 13 17
Rating 2 4 3 2 2 4 2
Program 11 16 19 16 12 12 11
Rating 4 2 1 3 5 1 5
Program 17 12 16
Rating 2 1 5
Program 13 12
Rating 1 5
In Ruby format,
Monday: a10b2c010601a13b2c10601a11b4c10601
Tuesday: a15b4c10602a11b4c10602a16b2c10602a17b2c10602a13b1c10602
Wednesday: a11b5c10603a17b3c10603a19b1c10603
Thursday: a11b5c10604a17b2c10604a16b3c10604
Friday: a12b5c10605a16b2c10605a12b5c10605
Saturday: a11b5c10606a13b4c10606a12b1c10606a12b1c10606
Sunday: a18b3c10607a17b2c10607a11b5c10607a16b5c10607a12b5c10607
Putting all the days together gives
a10b2c010601a13b2c10601a11b4c10601a15b4c10602a11b4c10602a16b2c10602a17b2c10
602a13b1c10602a11b5c10603a17b3c10603a19b1c10603a11b5c10604a17b2c10604a16b3c
10604a12b5c10605a16b2c10605a12b5c10605a11b5c10606a13b4c10606a12b1c10606a12b
1c10606a18b3c10607a17b2c10607a11b5c10607a16b5c10607a12b5c10607
Finally, the response strings for the weeks of 1Jan01 and 1Jun01 are concatenated to a
single string, which is the entire case for Respondent #1.
a11b3c10101a12b4c10101a19b2c10101a12b1c10102a11b1c10102a14b3c10102a15b3c101
02a10b3c10103a13b4c10103a18b2c10103a11b5c10104a14b3c10104a19b2c10104a19b3c1
0104a18b3c10105a19b3c10105a11b4c10105a11b4c10105a11b2c10106a17b3c10106a19b2
c10106a16b2c10107a13b4c10107a15b5c10107a15b5c10107a10b2c010601a13b2c10601a1
1b4c10601a15b4c10602a11b4c10602a16b2c10602a17b2c10602a13b1c10602a11b5c10603
a17b3c10603a19b1c10603a11b5c10604a17b2c10604a16b3c10604a12b5c10605a16b2c106
05a12b5c10605a11b5c10606a13b4c10606a12b1c10606a12b1c10606a18b3c10607a17b2c1
0607a11b5c10607a16b5c10607a12b5c10607
Copyright © 2007. All rights reserved Handling Panel Data, page 5 of 48
Confirming the Data Representation by Tables
Calling the example case and variable Prog1 (‘1’ for 1 case), each day of the original grid
for week commencing 1Jan01 can be obtained by the table Prog1.Program by
Prog1.Rating, weighted on Prog1.Rating and filtered to Prog1.Date(01010),
Prog1.Date(10102), etc.
Mon Tues Wed Thur Fri Sat Sun
Program 11 12 10 11 18 11 16
Rating 3 1 3 5 3 2 2
Program 12 11 13 14 19 17 13
Rating 4 1 4 3 3 3 4
Program 19 14 18 19 11 19 15
Rating 2 3 2 2 4 2 5
Program 15 19 11 15
Rating 3 3 4 5
For days where the same program was watched more than once (enscribed in red), the
cell value shows the sum of ratings (by virtue of weighting on Rating), and the code
mean in the bottom row shows the average rating.
Similarly, for the week commencing 1Jun06, the data (as above) is
Mon Tues Wed Thur Fri Sat Sun
Program 10 15 11 11 12 11 18
Rating 2 4 5 5 5 5 3
Program 13 11 17 17 16 13 17
Rating 2 4 3 2 2 4 2
Program 11 16 19 16 12 12 11
Rating 4 2 1 3 5 1 5
Program 17 12 16
Rating 2 1 5
Program 13 12
Rating 1 5
Copyright © 2007. All rights reserved Handling Panel Data, page 6 of 48
The table is
Some Example Tables
Favourite Program
Specification is
Sorting on the Code Mean row puts the highest ratings to the left.
Copyright © 2007. All rights reserved Handling Panel Data, page 7 of 48
The favourite program is prog15, with a rating of 4.2, although prog11, at 3.9, is
watched more often. Sorting on the Total row instead gives
Compare 1Jan01 Survey with 1Jun01 Survey
This is done by filtering the Top axis to the date ranges for each.
Specification is
Copyright © 2007. All rights reserved Handling Panel Data, page 8 of 48
The table is
This shows that in terms of ratings prog13 has dropped off from 4.0 to 2.3, and that
prog15 at 4.3 in week of 1Jan01 has been surplanted by prog11 at 4.7 in the week of
1Jun01.
Number of Programs Watched by Day as a Chart
Specification is
Copyright © 2007. All rights reserved Handling Panel Data, page 9 of 48
Chart as stacked bars is
Copyright © 2007. All rights reserved Handling Panel Data, page 10 of 48
Three programs including duplicates were watched on 7 days, four on 5 days, and five on
2 days.
Sum of Ratings and Mean Rating by Day
The sum of ratings is achieved by weighting Program on Rating.
Specification is
Chart is
Copyright © 2007. All rights reserved Handling Panel Data, page 11 of 48
The sum of ratings for 7Jun01 (date code 10607) is 20. The rightmost fragment of the
daily breakdown table above is
5+5+5+2+3=20, for the stacked bars, and 20/5 = 4, for the code mean.
Copyright © 2007. All rights reserved Handling Panel Data, page 12 of 48
Daily Mean without Regard to Program
The bottom row here is the same as the black line series in the chart above.
Compare Rating for Prog11 to Average Rating and Calculate Anomaly
The specification is
Note how the anomaly expression (v1-v2)/(v2/v2) forces days with no rating for prog11
to be excluded completely.
The table is
Copyright © 2007. All rights reserved Handling Panel Data, page 13 of 48
The first row is the average rating for all programs. The second row is the average for
prog11 only. The third row is the anomaly, calculated as row1 – row2 for columns with a
rating for prog11 only.
A Multi-Case Example
The value of working with a single case is that the processing can be traced through and
confirmed. Here is a more realistic example, using 1000 programs over 10,000 cases
from 1Jan2001 to 28Dec2006. To facilitate creating the fictitious data I have limited
days in month to 28.
To prevent a potential combinatorial explosion, the level B and C of the hierarchy are
uncoded. If required by a specification, Ruby codes the requested levels on the fly. Level
A, for the actual TV programs, must be explicitly coded, otherwise there would be no
indication of program on the final output. The ratings at 1 to 5 are self-coding, and the
dates in YMMDD format are also self-coding, and are actually easier to enter directly into
filters than it would be to locate and click or drag if using the mouse.
Random dates are assigned over all cases. This simulates a worst-case panel where the
same respondent can provide more than one set of responses at different times.
Basic Histograms
The data was initiated as random numbers in Excel, and then manually adjusted to plant
a few ‘results’ which can then be ‘discovered’.
Copyright © 2007. All rights reserved Handling Panel Data, page 14 of 48
Count of Programs:
The programs are on the X axis in coded order. I could not think up 1000 meaningful TV
program names, so you have to imagine that prog1 = MASH, prog2= ER, prog3 =
Surviver, or whatever, etc.
The chart shows that I have spiked prog1, 333, 2, plus several others. Sorting the X
axis and zooming to the left shows the most frequently viewed programs.
Copyright © 2007. All rights reserved Handling Panel Data, page 15 of 48
Count of Dates
The Dates at Level C self-code on placement as the Top axis (to make the columns).
This sorts them in date order, since the format YMMDD if commencing post 2000 will be
in numeric order. The trend line shows that there are on average a bit under 58 program
ratings per day.
Count of Ratings
This one is easier as a table. It shows that no one rating is any more likely than any
other. Our respondents are a very even-minded group.
Copyright © 2007. All rights reserved Handling Panel Data, page 16 of 48
Time Series
Most Popular Programs
Average Ratings
The mean rating is 3, as would be expected from randomly generated ratings.
Smoothing daily data at MA365 gives the appearance of some movement, but note that
it is within a very narrow band.
Copyright © 2007. All rights reserved Handling Panel Data, page 17 of 48
Period Comparisons
This section looks at queries of respondent behaviour with respect to and in comparison
with different periods. This easiest approach is to construct a multi-response variable for
each program of interest. For Prog1, the filters to separate the viewing occasions across
the six years are
The variable is named Prog1Years. The data for the first several cases is
This shows that case#1, being the respondent with ID=1, watched prog1 three times in
2002, and once each in 2003, 2004 and 2005. Case#11 watched prog1 once in 2006.
Case#1 was manually fiddled to give the six instances of prog1.
Distribution of Prog1 Across the Years
Copyright © 2007. All rights reserved Handling Panel Data, page 18 of 48
In 2001, 111 respondents watched prog1 113 times, and in 2002, 86 respondents
watched 89 times, etc. 111+86+95+100+92+89+9440 = 10013.
Showing the breakdown by case and sorting on the column for 2002 brings case#1 to the
top.
In 2002, case#1 saw prog1 twice extra, and case#6835 once extra, and 2+1=3, and 89-
86=3.
Since any row total >1 indicates prog1 was watched more than once, the full account of
multiple viewings can be obtained by sorting on the Total column.
Copyright © 2007. All rights reserved Handling Panel Data, page 19 of 48
12 cases viewed one extra, two cases (#6362 and #6572) viewed two extra, and one
case (#1) viewed 5 extra, giving 1+1+1+1+1+1+1+1+1+1+1+1+2+2+5=21, which
accounts for the Code1 total of 10,021.
Those who saw Prog1 Three or More Times in 2002
The number of cases could be very large, precluding a table using CaseNum as an axis.
The table immediately above shows that only a single case fits the criterion. The filter
expression is
cvl_Prog1Years(2)>=3
This says ‘return true if the count of values in Prog1Years where the value is code 2 is
greater than or equal to 3’. The table Prog1Years by Count under this filter should
therefore reproduce the first line of Prog1Years by CaseNum sorted on Total, as above.
The table is
Copyright © 2007. All rights reserved Handling Panel Data, page 20 of 48
The filter isolates the single respondent (case#1 as we know from the above table), who
watched prog1 three times in 2002, and once in each of 2003, 2004 and 2005.
Considering those who watched prog1 two or more times in 2002, the table Prog1Years
by CaseNum indicates that the cases are #1 and #6835. #1 watched three times, and
#6835 watched twice, so the table filtered to cvl_Prog1Years(2)>=2 should show 5
instances by 2 respondents.
Copyright © 2007. All rights reserved Handling Panel Data, page 21 of 48
1000 Cases over 1000 Days over 1000 TV Programs
This is probably a bit more realistic, except that since the data was generated randomly,
all programs are virtually equally popular.
The data has been structured differently this time, with Day as the top level. The
reasoning is that Day is the biggest container – it contains the TV program watching
occasions – and so it should be level A. For each day a respondent can watch one or
more TV programs, and for each program, a rating is given.
Frequency and Ratings Chart for Prog1
Unlooping on the fly at 1000*1000*5 = 5 million vectors is clearly unrealistic, but each
level can be used in its own right with relative comfort. Day by Rating filtered to Prog1
with Code Mean takes a bit over 10 seconds to generate and display.
Copyright © 2007. All rights reserved Handling Panel Data, page 22 of 48
This chart shows that fewer respondents watched prog1 around the middle of the period,
but that the rating did not diminish – those who watched continued to like it, and viewer
numbers recovered towards day1000.
Total and Average Viewing Occasions
A count of ratings shows the total number of program viewing occasions with a rating as
3,501,689.
The average number of programs viewed per day can be achieved by .Day by
avg#.Program
Copyright © 2007. All rights reserved Handling Panel Data, page 23 of 48
So, respondents watched about 3.5 programs per day.
Weekdays and Weekends
Weekdays and weekends can be distinguished by a construction which uses modulus
arithmetic. The filters are
This produces a code 1 for days 1/5, and a code 2 for days 6/7. It also produces a lot of
b0 segments. The output showing just two weekend sections across the first four cases
is
This works OK as a filter against Programs1k1k.Day. The chart of Day by Programs
filtered to Weekdays (zoomed to far left) is
Copyright © 2007. All rights reserved Handling Panel Data, page 24 of 48
All days with codes which are multiples of 7 or multiples of 7 minus 1 are empty. The
assumption of course is that days 1/5 are Monday/Friday, 6/7 are Saturday/Sunday, and
likewise without gap thereafter (and if a day(s) is out of field, its code(s) must be
skipped).
The 1000 respondents watch about 3,500 programs per weekday, verifying the 3.5 per
day chart above.
It is a shame about all the wasted space for the b0 segments, however. They could be
simply cleared out using search=”b0” replace=””, but it seemed more useful and natural
to make levelB carry some useful information, so I coded it as year, where
year1=day(1/365), etc
Copyright © 2007. All rights reserved Handling Panel Data, page 25 of 48
This now gives a b1, b2 or b3 for year1, 2 or 3. This is not much use, however, because
the only matching structures are Programs1k1k.Day or Periods1k1k.DayType.
It might then actually be safer to kill the b0 segments and keep levelB as a dummy.
Most Popular Program Report
The several different ways of defining ‘most popular’ can all be accommodated by sorting
columns on a single table.
Program with Highest number of viewings:
prog998 has data recorded against it 3682 times.
Copyright © 2007. All rights reserved Handling Panel Data, page 26 of 48
Top: Count1k1k, Programs1k1k Rating
Side: Programs1k1k Program
SideSort: by column 1 decreasing
Programs1k1k Program
prog988
prog441
prog835
prog760
prog932
prog604
prog468
prog511
prog89
prog109
Count Programs1k1k Rating
Code 1Cases
WFrat1 rat2 rat3 rat4 rat5
Code
Mean
3,682 972 770 742 720 702 748 2.98
3,674 977 755 717 704 754 744 3.00
3,673 979 708 721 804 718 722 3.01
3,660 982 750 792 687 731 700 2.96
3,654 968 751 744 705 684 770 2.99
3,648 972 697 726 678 760 787 3.06
3,643 968 753 731 766 706 687 2.96
3,640 980 714 770 732 718 706 2.98
3,638 973 746 751 732 713 696 2.96
3,635 977 761 664 727 749 734 3.01
FrequenciesFrequencies
Program with Highest Number of Viewers:
prog853 was viewed by 986 respondents.
Top: Count1k1k, Programs1k1k Rating
Side: Programs1k1k Program
SideSort: by column 2 decreasing
Programs1k1k Program
prog853
prog697
prog90
prog317
prog292
prog454
prog694
prog760
prog27
prog32
Count Programs1k1k Rating
Code 1Cases
WFrat1 rat2 rat3 rat4 rat5
Code
Mean
3,603 986 733 707 724 740 699 2.99
3,517 985 739 700 704 674 700 2.97
3,610 984 717 693 695 764 741 3.03
3,545 983 683 705 698 695 764 3.04
3,463 982 684 669 715 675 720 3.02
3,595 982 723 696 737 703 736 3.01
3,522 982 680 673 739 692 738 3.04
3,660 982 750 792 687 731 700 2.96
3,580 981 702 691 703 722 762 3.04
3,487 981 732 684 742 668 661 2.95
FrequenciesFrequencies
Program with Highest Number of Rating 5s:
prog65 was rating a 5 804 times.
Copyright © 2007. All rights reserved Handling Panel Data, page 27 of 48
Top: Count1k1k, Programs1k1k Rating
Side: Programs1k1k Program
SideSort: by column 7 decreasing
Programs1k1k Program
prog65
prog729
prog604
prog905
prog330
prog708
prog932
prog228
prog85
prog740
Count Programs1k1k Rating
Code 1Cases
WFrat1 rat2 rat3 rat4 rat5
Code
Mean
3,620 972 727 736 678 675 804 3.03
3,611 976 731 679 671 735 795 3.05
3,648 972 697 726 678 760 787 3.06
3,599 973 726 694 692 705 782 3.03
3,592 971 682 708 702 724 776 3.06
3,563 967 659 720 680 728 776 3.07
3,654 968 751 744 705 684 770 2.99
3,559 975 661 710 706 714 768 3.06
3,593 978 696 714 720 697 766 3.03
3,612 975 720 709 685 732 766 3.03
FrequenciesFrequencies
Program with Highest Code Mean:
prog719, 375 and 708 all get a mean rating of 3.07.
Top: Count1k1k, Programs1k1k Rating
Side: Programs1k1k Program
SideSort: by column 8 decreasing
Programs1k1k Program
prog719
prog375
prog708
prog744
prog12
prog936
prog228
prog58
prog536
prog218
Count Programs1k1k Rating
Code 1Cases
WFrat1 rat2 rat3 rat4 rat5
Code
Mean
3,436 965 641 649 692 723 731 3.07
3,420 964 629 657 707 695 732 3.07
3,563 967 659 720 680 728 776 3.07
3,433 968 638 690 654 713 738 3.06
3,492 963 641 699 692 722 738 3.06
3,458 977 620 708 686 728 716 3.06
3,559 975 661 710 706 714 768 3.06
3,440 970 639 688 667 716 730 3.06
3,447 963 649 660 691 728 719 3.06
3,509 975 645 688 714 736 726 3.06
FrequenciesFrequencies
What Else did those who Watched Prog1 more than 5 Times in Year1 Watch in Year 2 and 3?
It is not possible to count code instances within a period inside a hierarchic structure at
crosstab time because filters would be needed, and since hierarchic, all non-filtered
periods disappear. The likely contender for an on-the-fly filter expression would be
Programs1k1k.Day(1/365)&(cvl_Programs1k1k.Program(1)>3)
Copyright © 2007. All rights reserved Handling Panel Data, page 28 of 48
But this delivers an empty table. Cvl_ Programs1k1k.Program(1)>3 will return true only
for something like b1;1;1;1, and Programs1k1k.Day(1/365) excludes all days from code
366 onwards.
(A handy function would be cvf_, which counts across segments and returns a single
integer (‘f’ for ‘flat).)
Thus an intermediate construction is required. This filter writes a code 1 for each true
hierarchic node.
Case#4 and case#12 viewed prog1 3 times in year1, etc. This is done as a general
template so that anytime a similar construction is required – say for a given week – or
for a different program – then simply change the Day range and the Program code.
Sorting ProgCounter allows prediction of what should happen.
Copyright © 2007. All rights reserved Handling Panel Data, page 29 of 48
Cvl_ >6 should produce an empty chart. Cvl_ >5 should have a base count of N=2, but
because the data is random over 1000 TV programs over 1000 days, quite a few TV
programs will be passed by the filter. Cvl_ >4 will allow more, etc.
The specification for the final chart is .Day by .Program filtered to cvl_ProgCounter>5.
On a 90 day roll, the chart is
Note the Base: N=2 at the bottom of the chart. This is as expected.
Cvl_ >6 should be empty.
Copyright © 2007. All rights reserved Handling Panel Data, page 30 of 48
Note the Base: N=0 at the bottom of the chart. This is as expected.
What Else did those who Rated Prog1 4 or 5 in Year1 Watch in Year 2 and 3?
The ProgCounter construction can be extended to carry a variety of different counts.
Adding a code 2 defined like this
delivers the data as
Copyright © 2007. All rights reserved Handling Panel Data, page 31 of 48
The code 1s are now interspersed with code 2s. The chart specification is .Day by
.Programs filtered to ProgCounter(2).
Combinations
Any type of combination is now easy to specify. The Side variable could be .Ratings, and
the filter could be any AND or OR of any number of code 1s in ProgCounter(1) and
ProgCounter(2), such as
Copyright © 2007. All rights reserved Handling Panel Data, page 32 of 48
Ratings in Years 2 and 3 for those who saw Prog1 > 5 AND rated Prog1 a 4 or 5 in year 1.
Copyright © 2007. All rights reserved Handling Panel Data, page 33 of 48
Ratings in Years 2 and 3 for Prog1 for those who saw Prog1 > 3 AND rated Prog1 a 4 or 5 in year 1.
The X axis has no hidden points, so that the greater density in Year1 is apparent. Rating
5s dominate in years 2 and 3.
Copyright © 2007. All rights reserved Handling Panel Data, page 34 of 48
10 Cases, 20 Programs, 14 Days
This is an artificial data set for examining various approaches to analysing a panel for TV
program ratings. The basic data is organised hierarchically as
Level A: For each day
Level B: For each program viewed on that day
Level C: Give a rating from 1 to 5
Showing part of the first and second days only, the structure is
The data is small so that generation can be traced through the individual cases.
The Ruby variable which represents this structure is called Program10 (‘10’ for 10 cases).
All variables for this exercise have a trailing ‘10’ in the name.
Copyright © 2007. All rights reserved Handling Panel Data, page 35 of 48
Basic Analysis
1. When and how often programs were watched
This is given by a grid table of Day by Prog.
Copyright © 2007. All rights reserved Handling Panel Data, page 36 of 48
Prog1 was watched once on day1, three times on day2, etc. Prog18 was watched by 7
respondents, and prog20 was watched by 8 respondents, etc. All respondents submitted
a set of ratings for each of the 14 days.
2. Identifying Respondents
The 7 respondents who watched prog18 and the 8 who watched prog20 can be identified
by a table of Program10.Prog(18;20) by Case10.
Copyright © 2007. All rights reserved Handling Panel Data, page 37 of 48
The 7 cases for prog18 are 1, 2, 3, 4, 5, 6, 8. The 8 cases for prog20 are 1, 2, 3, 4, 6, 7,
9, 10.
3. High and Low Ratings
The table Rating by Day with Totals (for number of times a program was rated) and
sorted code mean (for average rating) shows which days favoured which ratings.
Looking at just the rating 5s, days 3, 7 and 9 must have had some popular programs,
with 10 5s. Day13, with only 2 rating 5s, fewest viewing occasions at 29, and an
average rating of 2.59, looks like a loser.
Copyright © 2007. All rights reserved Handling Panel Data, page 38 of 48
4. Popularity Specification
The table Rating by Program with columns for Count, Total and Code Mean is useful for
analysing popularity from several different points of view. The specification is
The table (top part only) with column% is
Copyright © 2007. All rights reserved Handling Panel Data, page 39 of 48
Most Popular Program Report
The several different ways of defining ‘most popular’ can all be accommodated by sorting
columns on the above table. Percents are not needed.
5. Viewed Daily by Everyone:
Programs 1, 2, 3, 5, 8, 14, 15 and 16.
Copyright © 2007. All rights reserved Handling Panel Data, page 40 of 48
6. Program with Highest number of viewings:
Prog16, at 35 times. This could have been equivalently obtained from the Total column –
the base values are different, but the frequency counts are the same.
7. Program with Highest Number of Rating 5s:
Prog15, with 11 rating 5s.
8. Program with Highest Code Mean:
Prog15 again, at 3.52.
Copyright © 2007. All rights reserved Handling Panel Data, page 41 of 48
Isolating Weekdays from Weekends
There are several ways this could be done. In this example, since the days are coded
from 1, then assuming 1/5 and 8/14 are the weekdays, modulus arithmetic (remainder
after division, Ruby operator is ?) can be used to make a code 1 for Monday to Friday,
and a code 2 for Saturday and Sunday. This must be a hierarchic construction so that
the structure matches Program10. Level B is not needed, and so padded with a code
zero.
9. Time Series Chart of Program by Weekday
A chart of Day by Program filtered to DayType(Weekday) shows that the weekends have
been skipped.
Copyright © 2007. All rights reserved Handling Panel Data, page 42 of 48
Copyright © 2007. All rights reserved Handling Panel Data, page 43 of 48
10. Chart Comparing Mean Rating across Weekday/Weekend
The specification is made by
• Drag Program10.Prog to Top (bent red line)
• Select DayType10.Type and click the top Nest button (Lavender rectangles)
• Right-click in Top and select Spacer, move to between the two filters
• Drag Program10.Rating to Side (straight red line), remove all codes
• Drag Code Mean to Side and drop on top of Program10.Rating (blue line)
• Select Chart as the output type and click Run (black rectangles).
Copyright © 2007. All rights reserved Handling Panel Data, page 44 of 48
Note the Spacer which separates the nested filters for Top, and that for Side the only
item is the Code Mean. The chart, using a Nested display style for the X axis, is
Copyright © 2007. All rights reserved Handling Panel Data, page 45 of 48
This shows that prog7 and prog15 both rate better on weekends, and progs 8, 9 and 17
rate worse.
Comparing Behaviour Across Periods
11. What programs were watched in week2 by those who saw prog1 more than twice in week1?
This sort of query requires constructing a flat variable for filtering across all cases and
periods. The next chart will be of those who rated prog1 a 4 to 5, so the construction
ProgCounter10 does both at once.
The filter for the constructed code 1 says ‘Day was a 1 to 7 AND the program was prog1’.
Copyright © 2007. All rights reserved Handling Panel Data, page 46 of 48
The filter for code 2 additionally specifies the rating as 4 to 5.
The Variable Data window says that case#1, with data of 1;2;1;2;1, saw prog1 three
times (there are three 1s) and rated it a 4 or 5 twice (there are two 2s), etc. More
program counter criteria for filtering can be defined and added as required.
The chart specification is Day by Program filtered to count of code 1s in
ProgCounter10>2. As can be seen from the Variable Data window immediately above,
this is true for cases 1 and 8.
The specification is made by
• Drag Program10.Day to Top (red line)
• Drag Program10.Prog to Side (red line)
• Check Case (blue rectangle)
• Select the function cvl (Count of Values, blue rectangle)
• Drag Watched Prog1 in Week1 to Filter (blue rectangle and line)
• Type in the condition as >2 (blue underline)
Copyright © 2007. All rights reserved Handling Panel Data, page 47 of 48
Base: N=2 at the bottom of the chart means cases 1 and 8.
12. What programs were watched in week2 by those who rated prog1 a 4 or 5 in week1?
This is the above chart with the filter instead as ProgCounter10(2). This is true of 5
cases. The chart is
Copyright © 2007. All rights reserved Handling Panel Data, page 48 of 48
Similarly, it is easy to create charts of ratings filtered to those who watched, etc.
[end of document]