Six Sigma Reference

Six Sigma Reference Tool, rev. 2.0b. Author: R. Chapin

Tool: p Chart
Data Type: Defectives Y / Continuous & Discrete X
P < .05 Indicates: N/A

Definition: The word defective describes an entire unit that fails to meet acceptance criteria, regardless of the number of defects within the unit. A unit may be defective because of one or more defects.

What does it do? The p chart is a graphical tool that allows you to view the proportion of defectives and detect the presence of special causes. The p chart is used to understand the ratio of nonconforming units to the total number of units in a sample.

Why use it? The p chart is a tool that will help you determine if your process is in control by determining whether special causes are present. The presence of special cause variation indicates that factors are influencing the output of your process. Eliminating the influence of these factors will improve the performance of your process and bring your process into control.

When to use? You will use a p chart in the Control phase to verify that the process remains in control after the sources of special cause variation have been removed. The p chart is used for processes that generate discrete data. The sample size for the p chart can vary but usually consists of 100 or more units.
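To illustrate the ratio the p chart tracks, here is a minimal sketch in Python (not from the original reference; the defect counts and sample size are made up) that computes the center line and 3-sigma control limits for a p chart:

```python
import numpy as np

# Hypothetical data: defectives found in 10 samples of 100 units each
defectives = np.array([4, 6, 3, 7, 5, 4, 8, 2, 5, 6])
n = 100  # units per sample

p_bar = defectives.sum() / (defectives.size * n)  # center line: overall proportion defective

# 3-sigma control limits for a p chart; the LCL is floored at 0
sigma_p = np.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma_p
lcl = max(0.0, p_bar - 3 * sigma_p)

print(f"CL={p_bar:.3f}  UCL={ucl:.3f}  LCL={lcl:.3f}")
# A sample proportion outside [LCL, UCL] signals possible special cause variation.
```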

Six Sigma 12 Step Process

Step | Description | Focus | Deliverable
0 | Project Selection | — | Identify project CTQs, develop team charter, define high-level process map
1 | Select CTQ characteristics | Y | Identify and measure customer CTQs
2 | Define Performance Standards | Y | Define and confirm specifications for the Y
3 | Measurement System Analysis | Y | Measurement system is adequate to measure Y
4 | Establish Process Capability | Y | Baseline current process; normality test
5 | Define Performance Objectives | Y | Statistically define goal of project
6 | Identify Variation Sources | X | List of statistically significant X's based on analysis of historical data
7 | Screen Potential Causes | X | Determine vital few X's that cause changes to your Y
8 | Discover Variable Relationships | X | Determine transfer function between Y and vital few X's; determine optimal settings for vital few X's; perform confirmation runs
9 | Establish Operating Tolerances | Y, X | Specify tolerances on the vital few X's
10 | Define and Validate Measurement System on X's in actual application | Y, X | Measurement system is adequate to measure X's
11 | Determine Process Capability | Y, X | Determine post-improvement capability and performance
12 | Implement Process Control | X | Develop and implement process control plan


Definitions


1-Sample sign test: Tests the probability of a sample median being equal to a hypothesized value.

Accuracy: Accuracy refers to the variation between a measurement and what actually exists. It is the difference between an individual's average measurements and that of a known standard, or accepted "truth."

Alpha risk: Alpha risk is defined as the risk of accepting the alternate hypothesis when, in fact, the null hypothesis is true; in other words, stating a difference exists where actually there is none. Alpha risk is stated in terms of probability (such as 0.05 or 5%). The acceptable level of alpha risk is determined by an organization or individual and is based on the nature of the decision being made. For decisions with high consequences (such as those involving risk to human life), an alpha risk of less than 1% would be expected. If the decision involves minimal time or money, an alpha risk of 10% may be appropriate. In general, an alpha risk of 5% is considered the norm in decision making. Sometimes alpha risk is expressed as its inverse, which is confidence level; in other words, an alpha risk of 5% also could be expressed as a 95% confidence level.

Alternative hypothesis (Ha): The alternate hypothesis (Ha) is a statement that the observed difference or relationship between two populations is real and not due to chance or sampling error. The alternate hypothesis is the opposite of the null hypothesis; it states that a dependency exists between two or more factors (P < 0.05).

Analysis of variance (ANOVA): Analysis of variance is a statistical technique for analyzing data that tests for a difference between two or more means. See the tool 1-Way ANOVA.

Anderson-Darling Normality Test: P-value < 0.05 = data is not normal.

Attribute Data: See discrete data.

Bar chart: A bar chart is a graphical comparison of several quantities in which the lengths of the horizontal or vertical bars represent the relative magnitude of the values.

Benchmarking: Benchmarking is an improvement tool whereby a company measures its performance or process against other companies' best practices, determines how those companies achieved their performance levels, and uses the information to improve its own performance. See the tool Benchmarking.

Beta risk: Beta risk is defined as the risk of accepting the null hypothesis when, in fact, the alternate hypothesis is true; in other words, stating no difference exists when there is an actual difference. A statistical test should be capable of detecting differences that are important to you, and beta risk is the probability (such as 0.10 or 10%) that it will not. Beta risk is determined by an organization or individual and is based on the nature of the decision being made. Beta risk depends on the magnitude of the difference between sample means and is managed by increasing test sample size. In general, a beta risk of 10% is considered acceptable in decision making.

Bias: Bias in a sample is the presence or influence of any factor that causes the population or process being sampled to appear different from what it actually is. Bias is introduced into a sample when data is collected without regard to key factors that may influence the population or process.

Blocking: Blocking neutralizes background variables that cannot be eliminated by randomizing. It does so by spreading them across the experiment.

Boxplot: A box plot, also known as a box-and-whisker diagram, is a basic graphing tool that displays the centering, spread, and distribution of a continuous data set.

CAP Includes/Excludes: CAP Includes/Excludes is a tool that can help your team define the boundaries of your project, facilitate discussion about issues related to your project scope, and challenge you to agree on what is included and excluded within the scope of your work. See the tool CAP Includes/Excludes.

CAP Stakeholder Analysis: CAP Stakeholder Analysis is a tool to identify and enlist support from stakeholders. It provides a visual means of identifying stakeholder support so that you can develop an action plan for your project. See the tool CAP Stakeholder Analysis.

Capability Analysis: Capability analysis is a Minitab tool that visually compares actual process performance to the performance standards. See the tool Capability Analysis.

Cause: A factor (X) that has an impact on a response variable (Y); a source of variation in a process or product.

Cause and Effect Diagram: A cause and effect diagram is a visual tool used to logically organize possible causes for a specific problem or effect by graphically displaying them in increasing detail. It helps to identify root causes and ensures common understanding of the causes that lead to the problem. Because of its fishbone shape, it is sometimes called a "fishbone diagram." See the tool Cause and Effect Diagram.

Center: The center of a process is the average value of its data. It is equivalent to the mean and is one measure of central tendency.

Center points: A center point is a run performed with all factors set halfway between their low and high levels. Each factor must be continuous to have a logical halfway point. For example, there are no logical center points for the factors vendor, machine, or location (such as city); however, there are logical center points for the factors temperature, speed, and length.

Central Limit Theorem: The central limit theorem states that, given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.
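A quick way to see the theorem in action is to sample from a clearly non-normal population and watch the variance of the sample mean shrink toward σ²/N; a minimal Python sketch (illustrative data only, not from the original reference):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # strongly right-skewed, not normal

for n in (2, 10, 50):
    # Draw 5,000 samples of size n and record each sample mean
    means = rng.choice(population, size=(5_000, n)).mean(axis=1)
    print(f"n={n:2d}: variance of sample means={means.var():.4f}  "
          f"(population variance / n = {population.var() / n:.4f})")
```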

Characteristic: A characteristic is a definable or measurable feature of a process, product, or variable.


Chi Square test: A chi square test, also called "test of association," is a statistical test of association between discrete variables. It is based on a mathematical comparison of the number of observed counts with the number of expected counts to determine if there is a difference in output counts based on the input category. See the tool Chi Square - Test of Independence. Used with defects data (counts) and defectives data (how many good or bad). Critical chi-square is the chi-squared value where p = .05.

Common cause variability: Common cause variability is a source of variation caused by unknown factors that result in a steady but random distribution of output around the average of the data. Common cause variation is a measure of the process's potential, or how well the process can perform when special cause variation is removed; therefore, it is a measure of the process technology. Common cause variation is also called random variation, noise, noncontrollable variation, within-group variation, or inherent variation. Example: many X's with a small impact.

Confidence band (or interval): Measurement of the certainty of the shape of the fitted regression line. A 95% confidence band implies a 95% chance that the true regression line fits within the confidence bands. Measurement of certainty.

Confounding: Factors or interactions are said to be confounded when the effect of one factor is combined with that of another; in other words, their effects cannot be analyzed independently.

Consumers Risk: Concluding something is bad when it is actually good (Type II error).

Continuous Data: Continuous data is information that can be measured on a continuum or scale. Continuous data can have almost any numeric value and can be meaningfully subdivided into finer and finer increments, depending upon the precision of the measurement system. Examples of continuous data include measurements of time, temperature, weight, and size. For example, time can be measured in days, hours, minutes, seconds, and in even smaller units. Continuous data is also called quantitative data.

Control limits: Control limits define the area three standard deviations on either side of the centerline, or mean, of data plotted on a control chart. Do not confuse control limits with specification limits. Control limits reflect the expected variation in the data and are based on the distribution of the data points. Minitab calculates control limits using collected data. Specification limits are established based on customer or regulatory requirements. Specification limits change only if the customer or regulatory body so requests.

Correlation: Correlation is the degree or extent of the relationship between two variables. If the value of one variable increases when the value of the other increases, they are said to be positively correlated. If the value of one variable decreases when the value of the other increases, they are said to be negatively correlated. The degree of linear association between two variables is quantified by the correlation coefficient.

Correlation coefficient (r): The correlation coefficient quantifies the degree of linear association between two variables. It is typically denoted by r and will have a value ranging between negative 1 and positive 1.

Critical element: A critical element is an X that does not necessarily have different levels of a specific scale but can be configured according to a variety of independent alternatives. For example, a critical element may be the routing path for an incoming call or an item request form in an order-taking process. In these cases the critical element must be specified correctly before you can create a viable solution; however, numerous alternatives may be considered as possible solutions.

CTQ: CTQs (Critical to Quality) are the key measurable characteristics of a product or process whose performance standards, or specification limits, must be met in order to satisfy the customer. They align improvement or design efforts with critical issues that affect customer satisfaction. CTQs are defined early in any Six Sigma project, based on Voice of the Customer (VOC) data.

Cycle time: Cycle time is the total time from the beginning to the end of your process, as defined by you and your customer. Cycle time includes process time, during which a unit is acted upon to bring it closer to an output, and delay time, during which a unit of work waits to be processed.

Dashboard: A dashboard is a tool used for collecting and reporting information about vital customer requirements and your business's performance for key customers. Dashboards provide a quick summary of process performance.

Data: Data is factual information used as a basis for reasoning, discussion, or calculation; often this term refers to quantitative information.

Defect: A defect is any nonconformity in a product or process; it is any event that does not meet the performance standards of a Y.

Defective: The word defective describes an entire unit that fails to meet acceptance criteria, regardless of the number of defects within the unit. A unit may be defective because of one or more defects.

Descriptive statistics: Descriptive statistics is a method of statistical analysis of numeric data, discrete or continuous, that provides information about centering, spread, and normality. Results of the analysis can be in tabular or graphic format.

Design Risk Assessment: A design risk assessment is the act of determining potential risk in a design process, either in a concept design or a detailed design. It provides a broader evaluation of your design beyond just CTQs, and will enable you to eliminate possible failures and reduce the impact of potential failures. This ensures a rigorous, systematic examination of the reliability of the design and allows you to capture system-level risk.

Detectable Effect Size: When you are deciding what factors and interactions you want to get information about, you also need to determine the smallest effect you will consider significant enough to improve your process. This minimum size is known as the detectable effect size, or DES. Large effects are easier to detect than small effects. A design of experiment compares the total variability in the experiment to the variation caused by a factor. The smaller the effect you are interested in, the more runs you will need to overcome the variability in your experimentation.

DF (degrees of freedom): Equal to (#rows - 1)(#cols - 1).
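The DF formula above is the one a chi square test of association uses. A minimal sketch with scipy (the contingency counts are hypothetical, not from the original reference):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: defective/good counts by shift
observed = [[20, 15, 25],
            [30, 25, 35]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square={chi2:.2f}, p={p:.3f}, dof={dof}")  # dof = (2-1)*(3-1) = 2
# p < 0.05 would indicate the counts depend on the input category (here, shift).
```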


Discrete Data: Discrete data is information that can be categorized into a classification. Discrete data is based on counts. Only a finite number of values is possible, and the values cannot be subdivided meaningfully. For example, the number of parts damaged in shipment produces discrete data because parts are either damaged or not damaged.

Distribution: Distribution refers to the behavior of a process described by plotting the number of times a variable displays a specific value or range of values rather than by plotting the value itself.

DMADV: DMADV is GE Company's data-driven quality strategy for designing products and processes, and it is an integral part of GE's Six Sigma Quality Initiative. DMADV consists of five interconnected phases: Define, Measure, Analyze, Design, and Verify.

DMAIC: DMAIC refers to General Electric's data-driven quality strategy for improving processes, and is an integral part of the company's Six Sigma Quality Initiative. DMAIC is an acronym for five interconnected phases: Define, Measure, Analyze, Improve, and Control.

DOE: A design of experiment is a structured, organized method for determining the relationship between factors (Xs) affecting a process and the output of that process.

DPMO: Defects per million opportunities (DPMO) is the number of defects observed during a standard production run divided by the number of opportunities to make a defect during that run, multiplied by one million.

DPO: Defects per opportunity (DPO) represents total defects divided by total opportunities. DPO is a preliminary calculation to help you calculate DPMO (defects per million opportunities). Multiply DPO by one million to calculate DPMO.

DPU: Defects per unit (DPU) represents the number of defects divided by the number of products.
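The three defect metrics above chain together; a worked example in Python with made-up counts:

```python
units = 500          # units produced
opportunities = 10   # defect opportunities per unit
defects = 37         # defects observed

dpu = defects / units                    # 0.074 defects per unit
dpo = defects / (units * opportunities)  # 0.0074 defects per opportunity
dpmo = dpo * 1_000_000                   # 7,400 defects per million opportunities
print(dpu, dpo, dpmo)
```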

Dunnett's (1-way ANOVA): Check to obtain a two-sided confidence interval for the difference between each treatment mean and a control mean. Specify a family error rate between 0.5 and 0.001. Values greater than or equal to 1.0 are interpreted as percentages. The default error rate is 0.05.

Effect: An effect is that which is produced by a cause; the impact a factor (X) has on a response variable (Y).

Entitlement: As good as a process can get without capital investment.

Error: Error, also called residual error, refers to variation in observations made under identical test conditions, or the amount of variation that cannot be attributed to the variables included in the experiment.

Error (type I): Error that concludes that someone is guilty when, in fact, they really are not (Ho is true, but it was rejected in favor of Ha). Alpha.

Error (type II): Error that concludes that someone is not guilty when, in fact, they really are (Ha is true, but Ho was accepted). Beta.

Factor: A factor is an independent variable; an X.

Failure Mode and Effect Analysis: Failure mode and effects analysis (FMEA) is a disciplined approach used to identify possible failures of a product or service and then determine the frequency and impact of the failure. See the tool Failure Mode and Effects Analysis.

Fisher's (1-way ANOVA): Check to obtain confidence intervals for all pairwise differences between level means using Fisher's LSD procedure. Specify an individual error rate between 0.5 and 0.001. Values greater than or equal to 1.0 are interpreted as percentages. The default error rate is 0.05.

Fits: Predicted values of Y calculated using the regression equation for each value of X.

Fitted value: A fitted value is the Y output value that is predicted by a regression equation.

Fractional factorial DOE: A fractional factorial design of experiment (DOE) includes selected combinations of factors and levels. It is a carefully prescribed and representative subset of a full factorial design. A fractional factorial DOE is useful when the number of potential factors is relatively large, because it reduces the total number of runs required. By reducing the number of runs, a fractional factorial DOE will not be able to evaluate the impact of some of the factors independently. In general, higher-order interactions are confounded with main effects or lower-order interactions. Because higher-order interactions are rare, usually you can assume that their effect is minimal and that the observed effect is caused by the main effect or lower-order interaction.

Frequency plot: A frequency plot is a graphical display of how often data values occur.

Full factorial DOE: A full factorial design of experiment (DOE) measures the response of every possible combination of factors and factor levels. These responses are analyzed to provide information about every main effect and every interaction effect. A full factorial DOE is practical when fewer than five factors are being investigated. Testing all combinations of factor levels becomes too expensive and time-consuming with five or more factors.

F-value (ANOVA): Measurement of distance between individual distributions. As F goes up, P goes down (i.e., more confidence in there being a difference between two means). To calculate: (Mean Square of X) / (Mean Square of Error).
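A minimal 1-way ANOVA sketch showing the F and p relationship described above (scipy; the cycle-time data is hypothetical):

```python
from scipy.stats import f_oneway

# Hypothetical cycle times (hours) at three levels of a factor
level_a = [8.1, 7.9, 8.4, 8.0, 8.2]
level_b = [8.6, 8.8, 8.5, 8.9, 8.7]
level_c = [8.0, 8.3, 8.1, 8.2, 7.9]

f_stat, p_value = f_oneway(level_a, level_b, level_c)
print(f"F={f_stat:.2f}, p={p_value:.4f}")
# As F (between-level variation relative to within-level variation) goes up, p goes down;
# p < 0.05 suggests at least one level mean differs.
```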

Gage R&R: Gage R&R, which stands for gage repeatability and reproducibility, is a statistical tool that measures the amount of variation in the measurement system arising from the measurement device and the people taking the measurement. See Gage R&R tools.

Gantt Chart: A Gantt chart is a visual project planning device used for production scheduling. A Gantt chart graphically displays the time needed to complete tasks.

Goodman-Kruskal Gamma: Term used to describe the percent of variation explained by X.

GRPI: GRPI stands for four critical and interrelated aspects of teamwork: goals, roles, processes, and interpersonal relationships, and it is a tool used to assess them. See the tool GRPI.

Histogram: A histogram is a basic graphing tool that displays the relative frequency or occurrence of continuous data values, showing which values occur most and least frequently. A histogram illustrates the shape, centering, and spread of data distribution and indicates whether there are any outliers. See the tool Histogram.

Homogeneity of variance: Homogeneity of variance is a test used to determine if the variances of two or more samples are different. See the tool Homogeneity of Variance.


Hypothesis testing: Hypothesis testing refers to the process of using statistical analysis to determine if the observed differences between two or more samples are due to random chance (as stated in the null hypothesis) or to true differences in the samples (as stated in the alternate hypothesis). A null hypothesis (H0) is a stated assumption that there is no difference in parameters (mean, variance, DPMO) for two or more populations. The alternate hypothesis (Ha) is a statement that the observed difference or relationship between two populations is real and not the result of chance or an error in sampling. Hypothesis testing is the process of using a variety of statistical tools to analyze data and, ultimately, to accept or reject the null hypothesis. From a practical point of view, finding statistical evidence that the null hypothesis is false allows you to reject the null hypothesis and accept the alternate hypothesis.

I-MR Chart: An I-MR chart, or individual and moving range chart, is a graphical tool that displays process variation over time. It signals when a process may be going out of control and shows where to look for sources of special cause variation. See the tool I-MR Control.

In control: In control refers to a process unaffected by special causes. A process that is in control is affected only by common causes. A process that is out of control is affected by special causes in addition to the common causes affecting the mean and/or variance of a process.

Independent variable: An independent variable is an input or process variable (X) that can be set directly to achieve a desired output.

Intangible benefits: Intangible benefits, also called soft benefits, are the gains attributable to your improvement project that are not reportable for formal accounting purposes. These benefits are not included in the financial calculations because they are nonmonetary or are difficult to attribute directly to quality. Examples of intangible benefits include cost avoidance, customer satisfaction and retention, and increased employee morale.

Interaction: An interaction occurs when the response achieved by one factor depends on the level of the other factor. On an interaction plot, when lines are not parallel, there is an interaction.

Interrelationship digraph: An interrelationship digraph is a visual display that maps out the cause and effect links among complex, multivariable problems or desired outcomes.

IQR: Interquartile range (from a box plot), representing the range between the 25th and 75th percentiles.

Kano Analysis: Kano analysis is a quality measurement used to prioritize customer requirements.

Kruskal-Wallis: Kruskal-Wallis performs a hypothesis test of the equality of population medians for a one-way design (two or more populations). This test is a generalization of the procedure used by the Mann-Whitney test and, like Mood's median test, offers a nonparametric alternative to the one-way analysis of variance. The Kruskal-Wallis test looks for differences among the population medians. The Kruskal-Wallis test is more powerful (the confidence interval is narrower, on average) than Mood's median test for analyzing data from many distributions, including data from the normal distribution, but is less robust against outliers.

Kurtosis: Kurtosis is a measure of how peaked or flat a curve's distribution is.

L1 Spreadsheet: An L1 spreadsheet calculates defects per million opportunities (DPMO) and a process Z value for discrete data.

L2 Spreadsheet: An L2 spreadsheet calculates the short-term and long-term Z values for continuous data sets.

Leptokurtic Distribution: A leptokurtic distribution is symmetrical in shape, similar to a normal distribution, but the center peak is much higher; that is, there is a higher frequency of values near the mean. In addition, a leptokurtic distribution has a higher frequency of data in the tail area.

Levels: Levels are the different settings a factor can have. For example, if you are trying to determine how the response (speed of data transmittal) is affected by the factor (connection type), you would need to set the factor at different levels (modem and LAN), then measure the change in response.

Linearity: Linearity is the variation between a known standard, or "truth," across the low and high end of the gage. It is the difference between an individual's measurements and that of a known standard or truth over the full range of expected values.

LSL: A lower specification limit, also known as a lower spec limit or LSL, is a value above which performance of a product or process is acceptable.

Lurking variable: A lurking variable is an unknown, uncontrolled variable that influences the output of an experiment.

Main Effect: A main effect is a measurement of the average change in the output when a factor is changed from its low level to its high level. It is calculated as the average output when a factor is at its high level minus the average output when the factor is at its low level.

Mallows Statistic (C-p): Statistic within Regression → Best Fits which is used as a measure of bias (i.e., when the predicted value is different from truth). Should equal (#vars + 1).

Mann-Whitney: Mann-Whitney performs a hypothesis test of the equality of two population medians and calculates the corresponding point estimate and confidence interval. Use this test as a nonparametric alternative to the two-sample t-test.

Mean: The mean is the average data point value within a data set. To calculate the mean, add all of the individual data points, then divide that figure by the total number of data points.

Measurement system analysis: Measurement system analysis is a mathematical method of determining how much the variation within the measurement process contributes to overall process variability.

Median: The median is the middle point of a data set; 50% of the values are below this point, and 50% are above this point.

Mode: The most often occurring value in the data set.


Moods Median: Mood's median test can be used to test the equality of medians from two or more populations and, like the Kruskal-Wallis test, provides a nonparametric alternative to the one-way analysis of variance. Mood's median test is sometimes called a median test or sign scores test. Mood's median test tests H0: the population medians are all equal, versus H1: the medians are not all equal. An assumption of Mood's median test is that the data from each population are independent random samples and the population distributions have the same shape. Mood's median test is robust against outliers and errors in data and is particularly appropriate in the preliminary stages of analysis. Mood's median test is more robust than the Kruskal-Wallis test against outliers, but is less powerful for data from many distributions, including the normal.

Multicollinearity: Multicollinearity is the degree of correlation between Xs. It is an important consideration when using multiple regression on data that has been collected without the aid of a design of experiment (DOE). A high degree of multicollinearity may lead to regression coefficients that are too large or are headed in the wrong direction from what you had expected based on your knowledge of the process. High correlations between Xs also may result in a large p-value for an X that changes when the intercorrelated X is dropped from the equation. The variance inflation factor provides a measure of the degree of multicollinearity.

Multiple regression: Multiple regression is a method of determining the relationship between a continuous process output (Y) and several factors (Xs).

Multi-vari chart: A multi-vari chart is a tool that graphically displays patterns of variation. It is used to identify possible Xs or families of variation, such as variation within a subgroup, between subgroups, or over time. See the tool Multi-Vari Chart.

Noise: Process input that consistently causes variation in the output measurement that is random and expected and, therefore, not controlled is called noise. Noise also is referred to as white noise, random variation, common cause variation, noncontrollable variation, and within-group variation.

Nominal: Refers to the value that you estimate in a design process that approximates your real CTQ (Y) target value based on the design element capacity. Nominals are usually referred to as point estimates and are related to the y-hat model.

Non-parametric: Set of tools that avoids assuming a particular distribution.

Normal Distribution: Normal distribution is the spread of information (such as product performance or demographics) where the most frequently occurring value is in the middle of the range and other probabilities tail off symmetrically in both directions. Normal distribution is graphically characterized by a bell-shaped curve, also known as a Gaussian distribution. For normally distributed data, the mean and median are very close and may be identical.

Normal probability: Used to check whether observations follow a normal distribution. P > 0.05 = data is normal.

Normality test: A normality test is a statistical process used to determine if a sample or any group of data fits a standard normal distribution. A normality test can be performed mathematically or graphically. See the tool Normality Test.

Null Hypothesis (Ho): A null hypothesis (H0) is a stated assumption that there is no difference in parameters (mean, variance, DPMO) for two or more populations. According to the null hypothesis, any observed difference in samples is due to chance or sampling error. It is written mathematically as H0: μ1 = μ2 or H0: σ1 = σ2. Defines what you expect to observe (e.g., all means are the same or independent). (P > 0.05)

Opportunity: An opportunity is anything that you inspect, measure, or test on a unit that provides a chance of allowing a defect.

Outlier: An outlier is a data point that is located far from the rest of the data. Given a mean and standard deviation, a statistical distribution expects data points to fall within a specific range. Those that do not are called outliers and should be investigated to ensure that the data is correct. If the data is correct, you have witnessed a rare event or your process has changed. In either case, you need to understand what caused the outliers to occur.

Percent of tolerance: Percent of tolerance is calculated by taking the measurement error of interest, such as repeatability and/or reproducibility, dividing by the total tolerance range, then multiplying the result by 100 to express the result as a percentage.

Platykurtic Distribution: A platykurtic distribution is one in which most of the values share about the same frequency of occurrence. As a result, the curve is very flat, or plateau-like. Uniform distributions are platykurtic.

Pooled Standard Deviation: Pooled standard deviation is the standard deviation remaining after removing the effect of special cause variation, such as geographic location or time of year. It is the average variation of your subgroups.

Prediction Band (or interval): Measurement of the certainty of the scatter about a certain regression line. A 95% prediction band indicates that, in general, 95% of the points will be contained within the bands.

Probability: Probability refers to the chance of something happening, or the fraction of occurrences over a large number of trials. Probability can range from 0 (no chance) to 1 (full certainty).

Probability of Defect: Probability of defect is the statistical chance that a product or process will not meet performance specifications or lie within the defined upper and lower specification limits. It is the ratio of expected defects to the total output and is expressed as p(d). Process capability can be determined from the probability of defect.

Process Capability: Process capability refers to the ability of a process to produce a defect-free product or service. Various indicators are used; some address overall performance, some address potential performance.

Producers Risk: Concluding something is good when it is actually bad (Type I error).


p-value: The p-value represents the probability of concluding (incorrectly) that there is a difference in your samples when no true difference exists. It is a statistic calculated by comparing the distribution of given sample data and an expected distribution (normal, F, t, etc.) and is dependent upon the statistical test being performed. For example, if two samples are being compared in a t-test, a p-value of 0.05 means that there is only a 5% chance of arriving at the calculated t value if the samples were not different (from the same population). In other words, a p-value of 0.05 means there is only a 5% chance that you would be wrong in concluding the populations are different. P-value < 0.05 = safe to conclude there's a difference. P-value = risk of wasting time investigating further.
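A minimal sketch of the two-sample scenario described above (scipy; the before/after values are made up):

```python
from scipy.stats import ttest_ind

# Hypothetical measurements before and after a process improvement
before = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9]
after  = [11.2, 11.5, 11.1, 11.4, 11.3, 11.6]

t_stat, p_value = ttest_ind(before, after)
print(f"t={t_stat:.2f}, p={p_value:.4f}")
# p < 0.05: safe to conclude the two population means differ.
```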

Q1: 25th percentile (from box plot).

Q3: 75th percentile (from box plot).

Qualitative data: Discrete data.

Quality Function Deployment: Quality function deployment (QFD) is a structured methodology used to identify customers' requirements and translate them into key process deliverables. In Six Sigma, QFD helps you focus on ways to improve your process or product to meet customers' expectations. See the tool Quality Function Deployment.

Quantitative data: Continuous data.

Radar Chart: A radar chart is a graphical display of the differences between actual and ideal performance. It is useful for defining performance and identifying strengths and weaknesses.

Randomization: Running experiments in a random order, not the standard order in the test layout. Helps to eliminate the effect of "lurking variables," uncontrolled factors which might vary over the length of the experiment.

Rational Subgroup: A rational subgroup is a subset of data defined by a specific factor such as a stratifying factor or a time period. Rational subgrouping identifies and separates special cause variation (variation between subgroups caused by specific, identifiable factors) from common cause variation (unexplained, random variation caused by factors that cannot be pinpointed or controlled). A rational subgroup should exhibit only common cause variation.

Regression analysis: Regression analysis is a method of analysis that enables you to quantify the relationship between two or more variables (X) and (Y) by fitting a line or plane through all the points such that they are evenly distributed about the line or plane. Visually, the best-fit line is represented on a scatter plot by a line or plane. Mathematically, the line or plane is represented by a formula that is referred to as the regression equation. The regression equation is used to model process performance (Y) based on a given value or values of the process variable (X).

Repeatability: Repeatability is the variation in measurements obtained when one person takes multiple measurements using the same techniques on the same parts or items.

Replicates: Number of times you ran each corner. Example: 2 replicates means you ran one corner twice.

Replication: Replication occurs when an experimental treatment is set up and conducted more than once. If you collect two data points at each treatment, you have two replications. In general, plan on making between two and five replications for each treatment. Replicating an experiment allows you to estimate the residual or experimental error. This is the variation from sources other than the changes in factor levels. A replication is not two measurements of the same data point but a measurement of two data points under the same treatment conditions. For example, to make a replication, you would not have two persons time the response of a call from the northeast region during the night shift. Instead, you would time two calls into the northeast region's help desk during the night shift.

Reproducibility: Reproducibility is the variation in average measurements obtained when two or more people measure the same parts or items using the same measuring technique.

Residual: A residual is the difference between the actual Y output value and the Y output value predicted by the regression equation. The residuals in a regression model can be analyzed to reveal inadequacies in the model. Also called "errors."

Resolution: Resolution is a measure of the degree of confounding among effects. Roman numerals are used to denote resolution. The resolution of your design defines the amount of information that can be provided by the design of experiment. As with a computer screen, the higher the resolution of your design, the more detailed the information you will see. The lowest resolution you can have is resolution III.

Robust Process: A robust process is one that is operating at 6 sigma and is therefore resistant to defects. Robust processes exhibit very good short-term process capability (high short-term Z values) and a small Z shift value. In a robust process, the critical elements usually have been designed to prevent or eliminate opportunities for defects; this effort ensures sustainability of the process. Continual monitoring of robust processes is not usually needed, although you may wish to set up periodic audits as a safeguard.

Rolled Throughput Yield: Rolled throughput yield is the probability that a single unit can pass through a series of process steps free of defects.
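Because the unit must pass every step defect-free, rolled throughput yield is the product of the individual step yields; a worked example with assumed yields:

```python
from math import prod

step_yields = [0.98, 0.95, 0.99, 0.97]  # hypothetical first-pass yield of each step
rty = prod(step_yields)
print(f"RTY = {rty:.3f}")  # ~0.894: about 89.4% of units pass all four steps defect-free
```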

R-squared: A mathematical term describing how much variation is being explained by the X. Formula: R-sq = SS(regression) / SS(total). Answers the question of how much of the total variation is explained by X. Caution: R-sq increases as the number of data points increases.

R-squared (adj): Unlike R-squared, R-squared adjusted takes into account the number of X's and the number of data points. Formula: R-sq (adj) = 1 - [(SS(error)/DF(error)) / (SS(total)/DF(total))]. Also answers how much of the total variation is explained by X.
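A minimal sketch computing both statistics from the formulas above (illustrative data; one X, so the number of X's is 1):

```python
import numpy as np

# Hypothetical (x, y) data and a least-squares line fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

m, b = np.polyfit(x, y, 1)
y_hat = m * x + b

n, p = len(y), 1  # number of data points, number of X's
ss_total = ((y - y.mean()) ** 2).sum()
ss_error = ((y - y_hat) ** 2).sum()
ss_regression = ss_total - ss_error

r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))
print(f"R-sq={r_sq:.4f}, R-sq(adj)={r_sq_adj:.4f}")
```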

Sample: A portion or subset of units taken from the population whose characteristics are actually measured.

Sample Size Calc.: The sample size calculator is a spreadsheet tool used to determine the number of data points, or sample size, needed to estimate the properties of a population. See the tool Sample Size Calculator.

Sampling: Sampling is the practice of gathering a subset of the total data available from a process or a population.


Scatter plot: A scatter plot, also called a scatter diagram or a scattergram, is a basic graphic tool that illustrates the relationship between two variables. The dots on the scatter plot represent data points. See the tool Scatter Plot.

Scorecard: A scorecard is an evaluation device, usually in the form of a questionnaire, that specifies the criteria your customers will use to rate your business's performance in satisfying their requirements.

Screening DOE: A screening design of experiment (DOE) is a specific type of fractional factorial DOE. A screening design is a resolution III design, which minimizes the number of runs required in an experiment. A screening DOE is practical when you can assume that all interactions are negligible compared to main effects. Use a screening DOE when your experiment contains five or more factors. Once you have screened out the unimportant factors, you may want to perform a fractional or full factorial DOE.

Segmentation: Segmentation is a process used to divide a large group into smaller, logical categories for analysis. Some commonly segmented entities are customers, data sets, or markets.

S-hat model: Describes the relationship between output variance and input nominals.

Sigma: The Greek letter σ (sigma) refers to the standard deviation of a population. Sigma, or standard deviation, is used as a scaling factor to convert upper and lower specification limits to Z. Therefore, a process with three standard deviations between its mean and a spec limit would have a Z value of 3 and commonly would be referred to as a 3 sigma process.

Simple linear regression: Simple linear regression is a method that enables you to determine the relationship between a continuous process output (Y) and one factor (X). The relationship is typically expressed in terms of a mathematical equation such as Y = b + mX.

SIPOC: SIPOC stands for suppliers, inputs, process, output, and customers. You obtain inputs from suppliers, add value through your process, and provide an output that meets or exceeds your customer's requirements.

Skewness: Most often, the median is used as a measure of central tendency when data sets are skewed. The metric that indicates the degree of asymmetry is called, simply, skewness. Skewness often results in situations when a natural boundary is present. Normal distributions will have a skewness value of approximately zero. Right-skewed distributions will have a positive skewness value; left-skewed distributions will have a negative skewness value. Typically, the skewness value will range from negative 3 to positive 3. Two examples of skewed data sets are salaries within an organization and monthly prices of homes for sale in a particular area.
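A small scipy sketch illustrating the sign convention: a lognormal sample (salary-like, right-skewed) shows a positive skewness value, while a normal sample sits near zero (illustrative data only):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
salaries = rng.lognormal(mean=11.0, sigma=0.5, size=10_000)  # right-skewed, salary-like

print(f"skewness of salaries = {skew(salaries):.2f}")               # > 0: right-skewed
print(f"skewness of a normal sample = {skew(rng.normal(size=10_000)):.2f}")  # ~0
```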

Span: A measure of variation for "S-shaped" fulfillment Y's.

Special cause variability: Unlike common cause variability, special cause variation is caused by known factors that result in a non-random distribution of output. Also referred to as "exceptional" or "assignable" variation. Example: few X's with big impact.

Spread: The spread of a process represents how far data points are distributed away from the mean, or center. Standard deviation is a measure of spread.

Six Sigma Process Report: The Six Sigma process report is a Minitab tool that calculates process capability and provides visuals of process performance. See the tool Six Sigma Process Report.

Six Sigma Product Report: The Six Sigma product report is a Minitab tool that calculates the DPMO and short-term capability of your process. See the tool Six Sigma Product Report.

Stability: Stability represents variation due to elapsed time. It is the difference between an individual's measurements taken of the same parts after an extended period of time using the same techniques.

Standard deviation: Standard deviation is a measure of the spread of data in relation to the mean. It is the most common measure of the variability of a set of data. If the standard deviation is based on a sampling, it is referred to as "s." If the entire data population is used, standard deviation is represented by the Greek letter σ (sigma). The standard deviation (together with the mean) is used to measure the degree to which the product or process falls within specifications. The lower the standard deviation, the more likely the product or service falls within spec. When the standard deviation is calculated in relation to the mean of all the data points, the result is an overall standard deviation. When the standard deviation is calculated in relation to the means of subgroups, the result is a pooled standard deviation. Together with the mean, both overall and pooled standard deviations can help you determine your degree of control over the product or process.
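A minimal sketch contrasting the overall and pooled standard deviations described above (the subgroup data is hypothetical):

```python
import numpy as np

# Hypothetical subgroups, e.g., measurements from three time periods
subgroups = [np.array([10.1, 10.3, 9.9, 10.2]),
             np.array([10.6, 10.4, 10.7]),
             np.array([9.8, 10.0, 10.1, 9.9, 10.2])]

# Pooled standard deviation: subgroup variances weighted by their degrees of freedom
num = sum((g.size - 1) * g.var(ddof=1) for g in subgroups)
den = sum(g.size - 1 for g in subgroups)
pooled_sd = np.sqrt(num / den)

overall_sd = np.std(np.concatenate(subgroups), ddof=1)
print(f"pooled s = {pooled_sd:.3f}, overall s = {overall_sd:.3f}")
# A pooled s well below the overall s suggests shifts between subgroups (special causes).
```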

Standard order: Design of experiment (DOE) treatments often are presented in a standard order. In a standard order, the first factor alternates between the low and high setting for each treatment. The second factor alternates between low and high settings every two treatments. The third factor alternates between low and high settings every four treatments. Note that each time a factor is added, the design doubles in size to provide all combinations for each level of the new factor.

Statistic: Any number calculated from sample data; describes a sample characteristic.

Statistical process control: Statistical process control is the application of statistical methods to analyze and control the variation of a process.

Stratification: A stratifying factor, also referred to as stratification or a stratifier, is a factor that can be used to separate data into subgroups. This is done to investigate whether that factor is a significant special cause factor.

Stretch goal: Measurement of where you can get.

Tolerance range: Tolerance range is the difference between the upper specification limit and the lower specification limit.

Total observed variation: Total observed variation is the combined variation from all sources, including the process and the measurement system.

Total probability of defect: The total probability of defect is equal to the sum of the probability of defect above the upper spec limit, p(d) upper, and the probability of defect below the lower spec limit, p(d) lower.

Transfer function: A transfer function describes the relationship between lower-level requirements and higher-level requirements. If it describes the relationship between the nominal values, then it is called a y-hat model. If it describes the relationship between the variations, then it is called an s-hat model.


Transformations: Used to make non-normal data look more normal.

Trivial many: The trivial many refers to the variables that are least likely responsible for variation in a process, product, or service.

T-test: A t-test is a statistical tool used to determine whether a significant difference exists between the means of two distributions or the mean of one distribution and a target value. See the t-test tools.

Tukey's (1-way ANOVA): Check to obtain confidence intervals for all pairwise differences between level means using Tukey's method (also called Tukey's HSD or Tukey-Kramer method). Specify a family error rate between 0.5 and 0.001. Values greater than or equal to 1.0 are interpreted as percentages. The default error rate is 0.05.

Unexplained Variation (S): Regression statistical output that shows the unexplained variation in the data. S = sqrt(sum((yi - ŷi)^2) / (n - 2)), where ŷi is the fitted value for each observation.

Unit: A unit is any item that is produced or processed.

USL: An upper specification limit, also known as an upper spec limit, or USL, is a value below which performance of a product or process is acceptable.

Variation: Variation is the fluctuation in process output. It is quantified by standard deviation, a measure of the average spread of the data around the mean. Variation is sometimes called noise. Variance is squared standard deviation.

Variation (common cause): Common cause variation is fluctuation caused by unknown factors resulting in a steady but random distribution of output around the average of the data. It is a measure of the process potential, or how well the process can perform when special cause variation is removed; therefore, it is a measure of the process's technology. Also called inherent variation.

Variation (special cause): Special cause variation is a shift in output caused by a specific factor such as environmental conditions or process input parameters. It can be accounted for directly and potentially removed and is a measure of process control, or how well the process is performing compared to its potential. Also called non-random variation.

Whisker: From a box plot: displays the minimum and maximum observations within 1.5 IQR (75th-25th percentile span) of the 25th or 75th percentile. Outliers are those that fall outside of the 1.5 range.

Yield: Yield is the percentage of a process that is free of defects.

Z: A Z value is a data point's position between the mean and another location as measured by the number of standard deviations. Z is a universal measurement because it can be applied to any unit of measure. Z is a measure of process capability and corresponds to the process sigma value that is reported by the businesses. For example, a 3 sigma process means that three standard deviations lie between the mean and the nearest specification limit. Three is the Z value.

Z bench: Z bench is the Z value that corresponds to the total probability of a defect.

Z lt: Z long term (ZLT) is the Z bench calculated from the overall standard deviation and the average output of the current process. Used with continuous data, ZLT represents the overall process capability and can be used to determine the probability of making out-of-spec parts within the current process.

Z shift: Z shift is the difference between ZST and ZLT. The larger the Z shift, the more you are able to improve the control of the special factors identified in the subgroups.

Z st: ZST represents the process capability when special factors are removed and the process is properly centered. ZST is the metric by which processes are compared.
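A minimal sketch tying the Z definitions together (scipy; the mean, sigma, and spec limits are made up):

```python
from scipy.stats import norm

mean, sigma, lsl, usl = 10.0, 0.5, 8.5, 12.0

# Total probability of defect: tail areas beyond each spec limit
p_upper = norm.sf((usl - mean) / sigma)   # beyond the upper spec
p_lower = norm.cdf((lsl - mean) / sigma)  # beyond the lower spec
p_defect = p_upper + p_lower

z_bench = norm.isf(p_defect)  # Z bench: the Z value for the total probability of defect
print(f"p(defect)={p_defect:.2e}, Z bench={z_bench:.2f}")
# Z shift = Z_st - Z_lt; by convention it is often assumed to be 1.5 when not measured.
```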


    Tool What does it do? Why use? Wh

    1-Sample t-Test Compares mean to target

    The 1-sample t-test is useful in identifying

    a significant difference between a sample

    mean and a specified value when thedifference is not readily apparent from

    graphical tools. Using the 1-sample t-test

    to compare data gathered before process

    improvements and after is a way to prove

    that the mean has actually shifted.

    Thedata

    mea

    you

    bas

    1-Way ANOVA

    ANOVA tests to see if the difference between the means

    of each level is significantly more than the variation

    within each level. 1-way ANOVA is used when two or

    more means (a single factor with three or more levels)

    must be compared with each other.

    One-way ANOVA is useful for identifying a

    statistically significant difference between

    means of three or more levels of a factor.

    Use

    thre

    or m

    tota

    fact

    2-Sample t-TestA statistical test used to detect differences between

    means of two populations.

    The 2-sample t-test is useful for identifying

    a significant difference between means of

    two levels (subgroups) of a factor. I t is

    also extremely useful for identifying

    important Xs for a project Y.

    Whe

    data

    from

    two

    ANOVA GLM

    ANOVA General Linear Model (GLM) is a statistical tool

    used to test for differences in means. ANOVA tests to

    see if the difference between the means of each level is

    significantly more than the variation within each level.

    ANOVA GLM is used to test the effect of two or morefactors with multiple levels, alone and in combination, on

    a dependent variable.

    The General Linear Model allows you to

    learn one form of ANOVA that can be used

    for all tests of mean differences involving

    two or more factors or levels. Because

    ANOVA GLM is useful for identifying the

    effect of two or more factors (independent

    variables) on a dependent variable, it is

    also extremely useful for identifyingimportant Xs for a project Y. ANOVA GLM

    also yields a percent contribution that

    quantifies the variation in the response

    (dependent variable) due to the individual

    factors and combinations of factors.

    You

    to id

    the

    or m

    in co

    to qresp

    facto

Tool: Benchmarking
What does it do? Benchmarking is an improvement tool whereby a company: measures its performance or process against other companies' best-in-class practices, determines how those companies achieved their performance levels, and uses the information to improve its own performance.
Why use it? Benchmarking is an important tool in the improvement of your process for several reasons. First, it allows you to compare your relative position for this product or service against industry leaders or other companies outside your industry who perform similar functions. Second, it helps you identify potential Xs by comparing your process to the benchmarked process. Third, it may encourage innovative or direct applications of solutions from other businesses to your product or process. And finally, benchmarking can help to build acceptance for your project's results when they are compared to benchmark data obtained from industry leaders.

Tool: Best Subsets
What does it do? Tells you the best X to use when you're comparing multiple X's in regression assessment.
Why use it? Best Subsets is an efficient way to select a group of "best subsets" for further analysis by selecting the smallest subset that fulfills certain statistical criteria. The subset model may actually estimate the regression coefficients and predict future responses with smaller variance than the full model.
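A rough best-subsets-style sketch in Python (not Minitab's exact algorithm): fit every subset of the candidate X's and rank them by adjusted R-squared. The data are simulated.

    # Exhaustive subset search over three candidate predictors.
    from itertools import combinations
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 3))                 # three candidate X's
    y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=30)

    results = []
    for k in range(1, 4):
        for subset in combinations(range(3), k):
            model = sm.OLS(y, sm.add_constant(X[:, list(subset)])).fit()
            results.append((model.rsquared_adj, subset))
    for adj_r2, subset in sorted(results, reverse=True):
        print(f"X's {list(subset)}: adjusted R^2 = {adj_r2:.3f}")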

Tool Summary: choosing tools by the data type of the Y's (responses) and the X's

Continuous Y's, continuous X's: Regression, Fitted line, Stepwise Regression, Best Subsets, Scatter plot, Matrix Plot, Time series plots, Multi-Vari plot, Histogram, General Linear Model, DOE, Factor

Attribute Y's, continuous X's: Logistic regression, Time series plot, C chart, P chart, N chart, NP chart, ImR, X-bar R

Continuous Y's, attribute X's: ANOVA (more than 2), Kruskal-Wallis, T-test (for 2), Box plots, Dot plots, MV plot, Histogram, DOE, Homogeneity of variance, General linear model, Matrix plot

Attribute Y's, attribute X's: Chi Square, Pareto, Logistic Regression
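Several of the attribute-Y tools in this matrix are control charts. As one concrete instance, the p chart's control limits can be computed directly from the standard formula p-bar +/- 3*sqrt(p-bar*(1 - p-bar)/n); below is a minimal sketch with made-up counts, assuming a constant sample size n:

    # p-chart control limits: p-bar +/- 3*sqrt(p-bar*(1 - p-bar)/n)
    import math

    defectives = [6, 4, 8, 5, 9, 3, 7, 6]   # defective units per sample (made up)
    n = 100                                  # constant sample size
    p_bar = sum(defectives) / (len(defectives) * n)
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    ucl = p_bar + 3 * sigma
    lcl = max(0.0, p_bar - 3 * sigma)
    print(f"p-bar = {p_bar:.3f}, UCL = {ucl:.3f}, LCL = {lcl:.3f}")
    # sample proportions outside [LCL, UCL] signal special-cause variation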


Data types (Units = continuous, aka quantitative data; Ordinal, Nominal, and Binary = discrete, aka qualitative/categorical/attribute data):

Measurement | Units (example) | Ordinal (example) | Nominal (example) | Binary (example)
Time of day | Hours, minutes, seconds | 1, 2, 3, etc. | N/A | a.m./p.m.
Date | Month, date, year | Jan., Feb., Mar., etc. | N/A | Before/after
Cycle time | Hours, minutes, seconds, month, date, year | 10, 20, 30, etc. | N/A | Before/after
Speed | Miles per hour/centimeters per second | 10, 20, 30, etc. | N/A | Fast/slow
Brightness | Lumens | Light, medium, dark | N/A | On/off
Temperature | Degrees C or F | 10, 20, 30, etc. | N/A | Hot/cold
Number of things (hospital beds) | N/A | 10, 20, 30, etc. | N/A | Large/small hospital
Test scores | Percent, number correct | F, D, C, B, A | N/A | Pass/Fail
Defects | N/A | Number of cracks | N/A | Good/bad
Defects | N/A | N/A | Cracked, burned, missing | Good/bad
Color | N/A | N/A | Red, blue, green, yellow | N/A
Location | N/A | N/A | Site A, site B, site C | Domestic/international
Groups | N/A | N/A | HR, legal, IT, engineering | Exempt/nonexempt
Anything | Percent | 10, 20, 30, etc. | N/A | Above/below


Tool quick reference: use when, example, Minitab format, data format, Y, Xs, and what p < 0.05 indicates.

Tool: ANOVA
Use when: Determine if the average of a group of data is different from the average of other (multiple) groups of data.
Example: Compare multiple fixtures to determine if one or more performs differently.
Minitab: Stat > ANOVA > Oneway. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: At least one group of data is different than at least one other group.

Tool: Box & Whisker Plot
Use when: Compare median and variation between groups of data; also identifies outliers.
Example: Compare turbine blade weights using different scales.
Minitab: Graph > Boxplot. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: Cause & Effect Diagram / Fishbone
Use when: Brainstorming possible sources of variation for a particular effect.
Example: Potential sources of variation in gage R&R.
Minitab: Stat > Quality Tools > Cause and Effect. Input ideas under the proper column heading for the main branches of the fishbone; type the effect in the pulldown window.
Y: All. Xs: All.
p < 0.05 indicates: N/A.

Tool: Chi-Square
Use when: Determine if one set of defectives data is different from other sets of defectives data (see the Python sketch after this table).
Example: Compare DPUs between GE90 and CF6.
Minitab: Stat > Tables > Chi-square Test. Input two columns: one containing the number of non-defectives, the other the number of defectives.
Y: Discrete. Xs: Discrete.
p < 0.05 indicates: At least one group is statistically different.

Tool: Dot Plot
Use when: Quick graphical comparison of two or more processes' variation or spread.
Example: Compare length of service of GE90 technicians to CF6 technicians.
Minitab: Graph > Character Graphs > Dotplot. Input multiple columns of data of equal length.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: General Linear Models
Use when: Determine if differences in categorical data between groups are real when taking into account other variable Xs.
Example: Determine if height and weight are significant variables between two groups when looking at pay.
Minitab: Stat > ANOVA > General Linear Model. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column; other variables must be stacked in separate columns.
Y: Variable. Xs: Attribute/Variable.
p < 0.05 indicates: At least one group of data is different than at least one other group.

Tool: Histogram
Use when: View the distribution of data (spread, mean, mode, outliers, etc.).
Example: View the distribution of Y.
Minitab: Graph > Histogram, or Stat > Quality Tools > Process Capability. Input one column of data.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: Homogeneity of Variance
Use when: Determine if the variation in one group of data is different from the variation in other (multiple) groups of data (see the Python sketch after this table).
Example: Compare the variation between teams.
Minitab: Stat > ANOVA > Homogeneity of Variance. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: (Use Levene's Test) At least one group of data is different than at least one other group.

Tool: Kruskal-Wallis Test
Use when: Determine if the means of non-normal data are different (see the Python sketch after this table).
Example: Compare the means of cycle time for different delivery methods.
Minitab: Stat > Nonparametrics > Kruskal-Wallis. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: At least one mean is different.

Tool: Multi-Vari Analysis (see also Run Chart / Time Series Plot)
Use when: Helps identify the most important types or families of variation.
Example: Compare within-piece, piece-to-piece, or time-to-time variation in airfoil leading edge thickness.
Minitab: Graph > Interval Plot. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column, in time order.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: Notched Box Plot
Use when: Compare the median of a given confidence interval and variation between groups of data.
Example: Compare different hole drilling patterns to see if the median and spread of the diameters are the same.
Minitab: Graph > Character Graphs > Boxplot. Response data must be stacked in one column and the individual points must be tagged (numerically) in another column.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: One-sample t-test
Use when: Determine if the average of a group of data is statistically equal to a specific target.
Example: A manufacturer claims the average number of cookies in a 1 lb. package is 250. You sample 10 packages and find that the average is 235; use this test to disprove the manufacturer's claim.
Minitab: Stat > Basic Statistics > 1 Sample t. Input one column of data.
Y: Variable. Xs: N/A.
p < 0.05 indicates: Not equal.

Tool: Pareto
Use when: Compare how frequently different causes occur.
Example: Determine which defect occurs most often for a particular engine program.
Minitab: Stat > Quality Tools > Pareto Chart. Input two columns of equal length.
Y: Variable. Xs: Attribute.
p < 0.05 indicates: N/A.

Tool: Process Mapping
Use when: Create a visual aid of each step in the process being evaluated.
Example: Map engine horizontal area with all rework loops and inspection points.
Minitab: N/A. Use rectangles for process steps and diamonds for decision points.
Y: N/A. Xs: N/A.
p < 0.05 indicates: N/A.

Tool: Regression
Use when: Determine if a group of data incrementally changes with another group (see the Python sketch after this table).
Example: Determine if a runout changes with temperature.
Minitab: Stat > Regression > Regression. Input two columns of equal length.
Y: Variable. Xs: Variable.
p < 0.05 indicates: A correlation is detected.

Tool: Run Chart / Time Series Plot
Use when: Look for trends, outliers, oscillations, etc.
Example: View runout values over time.
Minitab: Stat > Quality Tools > Run Chart, or Graph > Time Series Plot. Input one column of data; must also input a subgroup size (1 will show all points).
Y: Variable. Xs: N/A.
p < 0.05 indicates: N/A.

Tool: Scatter Plot
Use when: Look for correlations between groups of variable data.
Example: Determine if rotor blade length varies with home position.
Minitab: Graph > Plot, Graph > Marginal Plot, or Graph > Matrix Plot (multiples). Input two or more groups of data of equal length.
Y: Variable. Xs: Variable.
p < 0.05 indicates: N/A.

Tool: Two-sample t-test
Use when: Determine if the average of one group of data is greater than (or less than) the average of another group of data.
Example: Determine if the average radius produced by one grinder is different from the average radius produced by another grinder.
Minitab: Stat > Basic Statistics > 2 Sample t. Input two columns of equal length.
Y: Variable. Xs: Variable.
p < 0.05 indicates: There is a difference in the means.
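For the Chi-Square entry above, a minimal Python sketch (hypothetical defect counts for two engine programs; scipy takes the counts as a contingency table):

    # Chi-square test on defectives: do two programs differ in defect rate?
    from scipy.stats import chi2_contingency

    table = [
        [12, 488],   # program 1: 12 defective, 488 non-defective (made up)
        [27, 473],   # program 2: 27 defective, 473 non-defective (made up)
    ]
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # p < 0.05: defect rates differ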
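For the Homogeneity of Variance and Kruskal-Wallis entries, a combined sketch (made-up cycle times; note scipy takes one sequence per group rather than Minitab's stacked/tagged layout):

    # Levene's test (variance) and Kruskal-Wallis (location) on three teams.
    from scipy.stats import levene, kruskal

    team_a = [4.1, 4.3, 3.9, 4.2, 4.0]
    team_b = [3.5, 4.9, 4.4, 3.2, 5.1]
    team_c = [4.6, 4.8, 4.5, 4.7, 4.9]

    w_stat, p_var = levene(team_a, team_b, team_c)
    h_stat, p_loc = kruskal(team_a, team_b, team_c)
    print(f"Levene p = {p_var:.4f}")          # p < 0.05: at least one variance differs
    print(f"Kruskal-Wallis p = {p_loc:.4f}")  # p < 0.05: at least one group differs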
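For the Regression entry, a minimal sketch (made-up runout and temperature values):

    # Simple linear regression: does runout change with temperature?
    from scipy.stats import linregress

    temperature = [20, 25, 30, 35, 40, 45, 50]
    runout = [0.011, 0.013, 0.014, 0.017, 0.018, 0.021, 0.022]
    fit = linregress(temperature, runout)
    print(f"slope = {fit.slope:.5f}, p = {fit.pvalue:.4f}")  # p < 0.05: correlation detected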