On InfoQ and PSE:
A brief introduction
Ron S. Kenett
KPA Ltd., Raanana, Israel and University of Torino, Torino, Italy
Introduction
This presentation is about doing the right research, with statistical methods, in the right way. We call this Quality Research. Research is a critical activity that leads to knowledge acquisition and to the formulation of policies and management decisions.
By effective research we mean research that produces an impact, as intended by decision makers. One measure of effective research is Information Quality (InfoQ), an approach developed by Kenett and Shmueli (2009). Another is Practical Statistical Efficiency (PSE), which assesses the level of implementation of the research recommendations (Kenett, Coleman and Stewardson, 2003).
Information Quality (InfoQ)
Are we doing the right research?
Information Quality (InfoQ)
Primary Data: Experimental, Observational
Secondary Data: Experimental, Observational
Data Quality
Information Quality
Analysis Quality
Knowledge
Goals
Kenett, R. and Shmueli, G., “On Information Quality”, http://ssrn.com/abstract=1464444, 2009.
Practical Statistical Efficiency (PSE)
Are our research recommendations having an impact?
Practical Statistical Efficiency (PSE)
PSE = E{R} × T{I} × P{I} × V{PS} × P{S} × V{P} × V{M} × V{D}
• V{D} = value of the data actually collected
• V{M} = value of the statistical method employed
• V{P} = value of the problem to be solved
• P{S} = probability that the problem actually gets solved
• V{PS} = value of the problem being solved
• P{I} = probability the solution is actually implemented
• T{I} = time the solution stays implemented
• E{R} = expected number of replications
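The product form of PSE can be illustrated with a minimal Python sketch. The class name `PSEFactors`, the function `pse`, and all numeric scores below are hypothetical; the slides do not fix units or scales for the eight factors, so the values are for illustration only.

```python
from dataclasses import dataclass, fields
from math import prod


@dataclass
class PSEFactors:
    """The eight PSE factors, named after the symbols in the formula."""
    v_d: float   # V{D}: value of the data actually collected
    v_m: float   # V{M}: value of the statistical method employed
    v_p: float   # V{P}: value of the problem to be solved
    p_s: float   # P{S}: probability that the problem actually gets solved
    v_ps: float  # V{PS}: value of the problem being solved
    p_i: float   # P{I}: probability the solution is actually implemented
    t_i: float   # T{I}: time the solution stays implemented
    e_r: float   # E{R}: expected number of replications


def pse(factors: PSEFactors) -> float:
    """Multiply the eight factors together, as in the PSE formula."""
    return prod(getattr(factors, f.name) for f in fields(factors))


# Hypothetical scores for one project (assumed scales, illustration only).
project = PSEFactors(v_d=0.8, v_m=0.9, v_p=1.0, p_s=0.5,
                     v_ps=1.0, p_i=0.5, t_i=2.0, e_r=3.0)
print(pse(project))
```

Because the factors are multiplied, a weak link anywhere in the chain (for example a solution that is never implemented, P{I} = 0) drives the whole PSE to zero regardless of how good the data and methods are.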
Kenett, R.S., Coleman, S.Y. and Stewardson, D. (2003), “Statistical Efficiency: The Practical Perspective”, Quality and Reliability Engineering International, 19: 265-272.
Information Quality (InfoQ)
Data Quality
Information Quality
Analysis Quality
Knowledge
Goals
Information Quality (InfoQ)
1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Sampling bias
6. Chronology of data and goal
7. Concept operationalization
8. Communication and data visualization
The InfoQ Swiss Cheese Model
[Figure: Swiss cheese model; the layers are labeled with the eight InfoQ dimensions: data resolution, data structure, data integration, temporal relevance, sampling bias, chronology of data and goal, concept operationalization, and communication and data visualization.]
InfoQ1: Data Resolution
• Two aspects of data resolution are measurement scale and data aggregation.
• The measurement scale of the data must be adequate for the purpose of the study.
• The level of aggregation of the data must be appropriate for the task at hand. For example, consider data on daily purchases of over-the-counter medications at a large pharmacy. If the goal of the analysis is to forecast future inventory levels of different medications, and re-stocking is done on a weekly basis, then we would prefer weekly aggregate data to daily aggregate data.
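As a rough illustration of matching aggregation level to the goal, the pharmacy example can be sketched in Python by rolling daily purchase counts up to ISO weeks. The function name `weekly_totals` and the sales figures are hypothetical; a real analysis would also have to decide which weekday the re-stocking week starts on.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily):
    """Sum daily counts into ISO weeks, keyed by (ISO year, ISO week)."""
    totals = defaultdict(int)
    for day, units in daily.items():
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        totals[(iso[0], iso[1])] += units
    return dict(totals)


# Hypothetical daily unit sales of one medication over two weeks,
# starting on Monday 1 January 2024.
start = date(2024, 1, 1)
daily_sales = {start + timedelta(days=i): 10 + i for i in range(14)}

weekly_sales = weekly_totals(daily_sales)
print(weekly_sales)
```

The weekly series has the same resolution as the re-stocking decision, which is the point of the slide: finer data are not automatically more informative for a given goal.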
InfoQ2: Data Structure
• The data can combine structured quantitative data with unstructured, semantic-based data.
• For example, in assessing the reputation of an organization one might combine data derived from balance sheets with data mined from text such as newspaper archives or press reports.
InfoQ3: Data Integration
• Knowledge is often spread out across multiple data sources.
• Hence, identifying the different relevant sources, collecting the relevant data, and integrating the data all directly affect information quality.
InfoQ4: Temporal Relevance
• A data set contains information collected during a certain period of time. The degree of relevance of the data to the current goal at hand must be assessed.
• For instance, in order to learn about current online shopping behaviors, a dataset that records online purchase behavior (such as Comscore data, www.comscore.com) can be irrelevant if it is even several years old, because of the fast-changing online shopping environment.
InfoQ5: Chronology of Data and Goal
• A data set contains daily weather information for a particular city for a certain period as well as information on the Air Quality Index (AQI) on those days.
• For the United States such data are publicly available from the National Oceanic and Atmospheric Administration website (http://www.noaa.gov). To assess the quality of the information contained in this data set, we must consider the purpose of the analysis.
• Although AQI is widely used (for instance, for issuing a “code red” day), how it is computed is not easy to figure out. One analysis goal might therefore be to find out how AQI is computed from weather data (by reverse-engineering). For such a purpose, this data is likely to contain high-quality information. In contrast, if the goal is to predict future AQI levels, then the data on past temperatures contains low-quality information.
InfoQ6: Sampling Bias
• A clear definition of the population of interest and how the sample relates to that population is necessary in both primary and secondary analyses.
• Dealing with sampling bias can be proactive or retroactive. In studies where there is control over the design (e.g., surveys), sampling schemes are selected to reduce bias. Such methods do not apply to retrospective studies. However, retroactive measures such as post-stratification weighting, which are often used in survey analysis, can be useful in secondary studies as well.
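Post-stratification weighting, mentioned above as a retroactive measure, can be sketched as follows. This is a minimal illustration that assumes the population share of each stratum is known; the function names, strata, and numbers are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / share observed in the sample."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}


def weighted_mean(values, strata, weights):
    """Mean of `values`, with each unit weighted by its stratum weight."""
    total_weight = sum(weights[s] for s in strata)
    return sum(weights[s] * v for v, s in zip(values, strata)) / total_weight


# Hypothetical sample in which "young" units are over-represented:
# 60% of the sample, but only 40% of the population of interest.
strata = ["young"] * 6 + ["old"] * 4
values = [1.0] * 6 + [3.0] * 4
population_shares = {"young": 0.4, "old": 0.6}

weights = poststratification_weights(strata, population_shares)
unadjusted = sum(values) / len(values)
adjusted = weighted_mean(values, strata, weights)
print(unadjusted, adjusted)
```

The adjustment only corrects for imbalance on the stratifying variable; it does not remove bias from variables that were not used to form the strata.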
InfoQ7: Concept Operationalization
• Observable data are an operationalization of underlying concepts. “Anger” can be measured via a questionnaire or by measuring blood pressure; “economic prosperity” can be measured via income or by unemployment rate; and “length” can be measured in centimeters or in inches.
• The role of concept operationalization is different for explanatory, predictive, and descriptive goals.
InfoQ8: Communication and Data Visualization
• If crucial information does not reach the right person at the right time, then the quality of information becomes poor.
• Data visualization is also directly related to the quality of information. Poor visualization can lead to degradation of the information contained in the data.
The InfoQ Score
For each measure Yi(x), a univariate desirability function di(Yi) assigns numbers between 0 and 1 to the possible values of Yi, with di(Yi) = 0 representing a completely undesirable value of Yi and di(Yi) = 1 representing a completely desirable or ideal response value. The individual desirabilities are then combined into an overall desirability index using the geometric mean of the individual desirabilities:
$$\text{Desirability Function} = \left[ d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_k(Y_k) \right]^{1/k}$$
with k denoting the number of measures. Notice that if any response Yi is completely undesirable (di(Yi) = 0), then the overall desirability is zero.
We use the Desirability Function to compute an InfoQ Score based on an assessment of indicators reflecting the 8 InfoQ dimensions.
Derringer, G. and Suich, R. (1980), "Simultaneous Optimization of Several Response Variables", Journal of Quality Technology, 12(4): 214-219.
Harrington, E. C. (1965), "The Desirability Function", Industrial Quality Control, 21: 494-498.
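The geometric-mean combination described on this slide can be sketched in a few lines of Python. The function name `overall_desirability` and the eight ratings are hypothetical; how the individual desirabilities are elicited for each InfoQ dimension is not specified here.

```python
from math import prod


def overall_desirability(desirabilities):
    """Geometric mean of individual desirabilities, each in [0, 1]."""
    d = list(desirabilities)
    if not d:
        raise ValueError("need at least one desirability")
    if any(not 0.0 <= x <= 1.0 for x in d):
        raise ValueError("each desirability must lie in [0, 1]")
    return prod(d) ** (1.0 / len(d))


# Hypothetical desirabilities for the eight InfoQ dimensions.
ratings = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.9]
infoq_score = overall_desirability(ratings)
print(round(infoq_score, 3))

# A single completely undesirable dimension drives the score to zero.
print(overall_desirability([0.9, 0.8, 0.0, 0.9, 0.6, 0.8, 0.7, 0.9]))
```

Using a geometric rather than an arithmetic mean is what makes the score non-compensatory: strong dimensions cannot offset a dimension rated zero.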
The InfoQ Score
$$\text{InfoQ Score} = \left[ d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_8(Y_8) \right]^{1/8}$$
1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Sampling bias
6. Chronology of data and goal
7. Concept operationalization
8. Communication and data visualization
[Figure: desirability function shapes for the eight dimensions, labeled "The lower the better", "The higher the better" and "On target".]
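The three desirability shapes named on this slide can be sketched as piecewise-linear functions. The slide gives no formulas, so the linear forms below are an assumption (they correspond to Derringer and Suich style one-sided and two-sided desirabilities with shape exponent 1), and the function names and limits are hypothetical.

```python
def higher_is_better(y, low, high):
    """0 at or below `low`, 1 at or above `high`, linear in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def lower_is_better(y, low, high):
    """1 at or below `low`, 0 at or above `high`, linear in between."""
    return 1.0 - higher_is_better(y, low, high)


def on_target(y, low, target, high):
    """1 at `target`, falling linearly to 0 at `low` and at `high`."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)


# Hypothetical indicator rated on a 1-5 scale.
print(higher_is_better(4, 1, 5))   # rating where larger values are preferred
print(lower_is_better(4, 1, 5))    # rating where smaller values are preferred
print(on_target(4, 1, 3, 5))       # rating with an ideal value of 3
```

Each function maps an indicator onto the 0 to 1 desirability scale, so its output can be fed directly into the geometric mean that defines the InfoQ Score.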
Practical Statistical Efficiency (PSE)
PSE = E{R} × T{I} × P{I} × V{PS} × P{S} × V{P} × V{M} × V{D}
• V{D} = value of the data actually collected
• V{M} = value of the statistical method employed
• V{P} = value of the problem to be solved
• P{S} = probability that the problem actually gets solved
• V{PS} = value of the problem being solved
• P{I} = probability the solution is actually implemented
• T{I} = time the solution stays implemented
• E{R} = expected number of replications
V{D} = value of the data actually collected
Readily accessible data are like observations made under the lamppost, where there is light, which is not necessarily where you lost your key or where the answer to your problem lies.
V{M} = value of the statistical method employed
A mathematical definition of statistical efficiency is given by:
Relative Efficiency of Test A versus Test B = ratio of the sample size for Test A to the sample size for Test B, where the sample sizes are determined so that both tests reach a given power against the same alternative.
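To make the sample-size ratio concrete, here is a minimal sketch comparing a one-sided z-test on the mean (Test A) with a one-sided sign test on the median (Test B) for normally distributed data. The choice of tests, the normal approximations for the sample sizes, and the values of the shift, significance level and power are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_z_test(delta, sigma, alpha=0.05, power=0.8):
    """Sample size for a one-sided z-test to detect a mean shift `delta`."""
    z_a = Z.inv_cdf(1 - alpha)
    z_b = Z.inv_cdf(power)
    return ceil(((z_a + z_b) * sigma / delta) ** 2)


def n_sign_test(delta, sigma, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a one-sided sign test."""
    # Probability that an observation exceeds the null median
    # when the true shift is `delta` and the data are normal.
    p = Z.cdf(delta / sigma)
    z_a = Z.inv_cdf(1 - alpha)
    z_b = Z.inv_cdf(power)
    return ceil(((0.5 * z_a + sqrt(p * (1 - p)) * z_b) / (p - 0.5)) ** 2)


# Hypothetical alternative: a shift of half a standard deviation.
n_a = n_z_test(delta=0.5, sigma=1.0)
n_b = n_sign_test(delta=0.5, sigma=1.0)

# Ratio as defined on the slide: sample size for Test A over Test B.
relative_efficiency = n_a / n_b
print(n_a, n_b, relative_efficiency)
```

With the ratio defined this way, a value below 1 means Test A needs fewer observations than Test B to reach the same power against the same alternative.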
V{P} = value of the problem to be solved
Statisticians too often forget this part of the equation. We frequently choose problems to be solved on the basis of their statistical interest rather than the value of solving them.
P{S} = probability that the problem actually gets solved
Usually no one method or attempt actually solves the entire problem, only part of it. So this part of the equation could be expressed as a fraction.
V{PS} = value of the problem being solved
This is both a statistical question and a management question. Did the method work and lead to a solution that worked, and were the data, information and resources available to solve the problem?
P{I} = probability the solution is actually implemented
Here is the non-statistical part of the equation that is often the most difficult to evaluate. Implementing the solution may be far harder than just coming up with the solution.
T{I} = time the solution stays implemented
Problems have the tendency not to stay solved. This is why we need to put much emphasis on holding the gains in any process improvement.
E{R} = expected number of replications
This is the part most often missed in companies. If the basic idea of the solution could be replicated in other areas of the company, the savings could be enormous.
The Quality Ladder: Matching Management Approach with Statistical Methods
Management approach: Statistical method
Quality by Design: Design of Experiments
Process Improvement: Statistical Process Control
Inspection: Sampling
Fire Fighting: Data Accumulation
Kenett, R. and Zacks, S., Modern Industrial Statistics: Design and Control of Quality and Reliability, Duxbury Press, San Francisco, 1998; Spanish edition 2002, 2nd paperback edition 2002, Chinese edition 2004.
The Statistical Efficiency Conjecture
Let PSE = PSE of a specific project and L = the maturity level of an organisation on the Quality Ladder (L = 1, …, 4).
PSE is a random variable with specific realisations for individual projects.
E{PSE} = the expected value of PSE in a given organisation over all projects.
The Statistical Efficiency Conjecture links Expected Practical Statistical Efficiency with the maturity of an organisation on the Quality Ladder.
In more formal terms it is stated as:
Conditioned on the right variable, E{ PSE } is an increasing function of L
Kenett, R., De Frenne, A., Tort-Martorell, X and McCollin, C., The Statistical Efficiency Conjecture, Chapter 4 in Applying Statistical Methods in Business and Industry – the state of the art , Coleman S., Greenfield, T. and Montgomery, D. (editors), John Wiley and Sons, 2008.
We partially demonstrated this with 21 case studies