statistics 1.ppt

Upload: joseph-mcdonald

Post on 14-Jan-2016


TRANSCRIPT

  • Copyright 2014 by McGraw-Hill Education (Asia). All rights reserved.

    1 Data and statistics

  • Assessment Methods and Types

    Classification   Percentage
    Assignments      20%
    Tests            20%
    Quizzes          20%
    Final exam       40%
    Total            100%

  • Learning Objectives

    Identify data and their applications in business and the economy.
    Describe the key aspects of data (elements, variables, observations, and scales).
    Differentiate between data classifications and sources.
    Define descriptive statistics and statistical inference.
    Understand the role of computers in statistical analysis, and data mining.
    Highlight the ethical guidelines for statistical analysis.

  • Introduction

    What does business statistics mean? How does this type of knowledge help organizations and people in general? How can we get information about events and use it in making decisions? What is the meaning of data? How can data be collected, coded, analysed, and interpreted?

    All these questions can be answered by understanding business statistics.

  • Statistics

    The term statistics refers to numerical facts such as averages, medians, percentages, and index numbers that help us understand business and economic situations.

    In a broader sense, statistics is defined as the art and science of collecting, coding, analysing, presenting, and interpreting data.

  • Applications of statistics in business and economics

    The most successful managers and decision makers understand information and know how to use it effectively. Statistics is used in different fields of business:

    Accounting
    Finance
    Marketing
    Production
    Economics

  • Data

    Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. All the data collected in a particular study are referred to as the data set.

    Elements: the entities on which the data are collected.
    Variables: the characteristics of interest for the elements.
    Observations: the set of measurements obtained for a particular element.

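  • Data: elements, variables, and observations

    Subject      Attendance   Collaboration   Presentation
    Business     3            Collaborative   Good
    Economics    4            Selfish         Very good
    Accounting   2            Boring          Excellent

    Each subject (row label) is an element, each column is a variable, and each row of values is an observation.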
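
    The terms element, variable, and observation can be made concrete with a minimal Python sketch of the table above. The dictionary layout and the names `data_set`, `elements`, `variables`, and `observation` are illustrative assumptions, not part of the slides.

```python
# Illustrative sketch: the slide's data set as a mapping from element to observation.
data_set = {
    "Business":   {"Attendance": 3, "Collaboration": "Collaborative", "Presentation": "Good"},
    "Economics":  {"Attendance": 4, "Collaboration": "Selfish",       "Presentation": "Very good"},
    "Accounting": {"Attendance": 2, "Collaboration": "Boring",        "Presentation": "Excellent"},
}

# Elements: the entities on which the data are collected.
elements = list(data_set)

# Variables: the characteristics of interest recorded for every element.
variables = list(next(iter(data_set.values())))

# Observation: the set of measurements obtained for one particular element.
observation = data_set["Business"]

print(elements)
print(variables)
print(observation)
```

    With 3 elements and 3 variables, the data set holds 9 data values in total.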

  • Scales of measurement

    There are four types of measurement scale: nominal, ordinal, interval, and ratio. The scales are distinguished by the relationships assumed to exist between objects having different scale values.

    The four scale types are ordered: each later scale has all the properties of the earlier scales plus additional properties.

  • Scales of measurement (continued)

    Nominal scale: not really a scale, because it does not order objects along any dimension; it simply labels them. A nominal variable is categorical but not ordered.
    Ordinal scale: numbers are used to place objects in order, but there is no information about the differences (intervals) between points on the scale.
    Interval scale: equal intervals between objects represent equal differences, so differences between values are meaningful.
    Ratio scale: has a true zero point, so ratios are meaningful.

  • Comparison of scales of measurement

                                        Nominal   Ordinal   Interval   Ratio
    Frequency distribution              Yes       Yes       Yes        Yes
    Median and percentiles              No        Yes       Yes        Yes
    Add or subtract                     No        No        Yes        Yes
    Mean and standard deviation         No        No        Yes        Yes
    Ratio or coefficient of variation   No        No        No         Yes
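
    The comparison can also be read as a rule: a summary is meaningful on a given scale and on every later scale. The sketch below encodes that rule. The names `SCALES`, `MIN_SCALE`, and `is_meaningful` are hypothetical, and the row label "median and percentiles" is an assumed reading of the second row of the table.

```python
# Scales in order; each later scale keeps the properties of the earlier ones.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale at which each summary becomes meaningful (taken from the table).
MIN_SCALE = {
    "frequency distribution": "nominal",
    "median and percentiles": "ordinal",
    "add or subtract": "interval",
    "mean and standard deviation": "interval",
    "ratio or coefficient of variation": "ratio",
}

def is_meaningful(statistic: str, scale: str) -> bool:
    """Return True if the statistic is meaningful for data on the given scale."""
    return SCALES.index(scale) >= SCALES.index(MIN_SCALE[statistic])

# Print the table row by row as Yes/No values.
for statistic in MIN_SCALE:
    row = ["Yes" if is_meaningful(statistic, s) else "No" for s in SCALES]
    print(f"{statistic:<35}", *row)
```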

  • Classification of data

    Data can be classified from two different angles:

    First, by the nature of the data: categorical and quantitative data.

    Second, by the time of collection: cross-sectional and time series data.

  • First: the nature of data

    Categorical data: data that can be classified into specific categories, usually identified by labels or names. They use either the nominal or the ordinal scale of measurement.
    Quantitative data: data that use numerical values to indicate how much or how many. They are obtained using either the interval or the ratio scale of measurement.

  • Second: the time of collection

    Cross-sectional data: data collected at approximately the same point in time. They are usually used to describe the current state of a variable or to investigate relationships between variables.
    Time series data: data collected over several time periods, for example the evaluation of a student's progress over three years. Time series data are frequently found in business and economics.

  • Sources of data

    Data are usually obtained from existing sources or from surveys and experimental studies.

    First, existing sources: sometimes the data needed for a particular application already exist, which means they have been collected by someone else and are ready to be used (for example, by universities, companies, and governments).

  • Sources of data (continued)

    Second, statistical studies: sometimes the data needed for a particular application are not available from existing sources. In such cases, the data can be obtained through a statistical study.

    Experimental study: a variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
    Observational study: the study makes no attempt to control the variables of interest.

  • The difference between experimental and observational studies

    Experimental: the researcher conducts an experiment rather than only making observations.
    Observational: no experiment is conducted. The researcher relies on the data collected, simply makes observations, and arrives at a conclusion.

    Experimental: there is human intervention.
    Observational: there is no human intervention; the researcher only observes.

    Experimental: the Hawthorne studies are a good example.
    Observational: a study to determine the relation between smoking and lung cancer is a typical example.

  • Data acquisition errors

    Managers should always be aware of the possibility of data errors in statistical studies.

    An error occurs when the data value obtained is not equal to the true value. Errors can occur in different ways:

    Typing errors.
    Outliers.
    Wrong answers.
    And so on.

  • Descriptive statistics

    Descriptive statistics are summaries of data presented in a form that is easy for the reader to understand. Such summaries can be tabular, graphical, or numerical.

    Common summarization methods include tables, bar charts, and pie charts, which help the reader read and interpret the data.

  • Descriptive statistics: frequencies and percent frequencies of presentation

                Frequency   Percent
    Good        3           23.1
    Very good   5           38.5
    Excellent   5           38.5
    Total       13          100.0
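
    The frequencies and percentages in this table can be reproduced with a short Python sketch. The list `ratings` is reconstructed from the frequencies shown on the slide (3 Good, 5 Very good, 5 Excellent); the variable names are illustrative.

```python
from collections import Counter

# Raw presentation ratings, rebuilt from the frequencies on the slide.
ratings = ["Good"] * 3 + ["Very good"] * 5 + ["Excellent"] * 5

# Frequency: number of items in each class.
frequency = Counter(ratings)
total = sum(frequency.values())

# Percent frequency: 100 * frequency / total, rounded to one decimal place.
percent = {label: round(100 * count / total, 1) for label, count in frequency.items()}

for label in frequency:
    print(f"{label:<10} {frequency[label]:>3} {percent[label]:>6}")
print(f"{'Total':<10} {total:>3}")
```

    Because each percentage is rounded separately, the rounded values add up to 100.1 rather than exactly 100.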

  • Statistical inference

    Many situations require information about a large group of elements (individuals, products, companies, etc.).

    However, because of time, cost, and other considerations, data can often be collected from only a small portion of the group. The larger group of elements in a particular study is called the population, and the smaller group is called the sample.

  • Statistical inference (continued)

    For example, suppose we want to know the average mark of all Malaysian undergraduate students in business statistics, but there is no time to collect data from all universities, or collecting the data is too costly. It is then acceptable to choose a sample from the population, provided the sample represents the population. The sample average is computed by adding the marks and dividing the total by the number of students in the sample (the sample size).
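
    The idea of estimating a population average from a sample can be sketched as follows. The marks are randomly generated, hypothetical values (not real student data), and the population size, sample size, and seed are arbitrary assumptions chosen only for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: marks of 10,000 students (made-up values).
population = [random.randint(40, 100) for _ in range(10_000)]

# Simple random sample of 50 students drawn without replacement.
sample = random.sample(population, 50)

# Sample mean: sum of the sampled marks divided by the sample size.
sample_mean = sum(sample) / len(sample)
population_mean = mean(population)

print(f"sample mean:     {sample_mean:.2f}")
print(f"population mean: {population_mean:.2f}")
```

    The sample mean is an estimate, so it will usually be close to, but not exactly equal to, the population mean.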

  • Computers and statistical analysis

    Statisticians usually use software to perform statistical computations.

    This is due to the large amounts of data involved and the complexity of the outputs required, such as frequencies, correlation, and regression.

    Programs such as Minitab, SAS, Excel, SPSS, and Amos can help in analysing and presenting the data.

  • Data mining

    Data mining (sometimes called data or knowledge discovery) can be defined as the automated extraction of predictive information from large databases. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

  • The key properties of data mining

    Automatic discovery of patterns: the notion of automatic discovery refers to the execution of data mining models.
    Prediction of likely outcomes: many forms of data mining are predictive. For example, a model might predict income based on education and other demographic factors.
    Creation of actionable information: data mining can derive actionable information from large volumes of data.
    Focus on large data sets and databases.

  • The key properties of data mining (continued)

    With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comments, the retailer could develop products that meet the needs of specific customer segments.

  • Process of data mining

  • Data Mining Models and Tasks

  • Ethical guidelines for statistical practices

    Ethical issues arise in statistics because of the important role statistics plays in the collection, analysis, presentation, and interpretation of data. In a statistical study, unethical behavior can take many forms:

    Improper sampling.
    Inappropriate analysis of the data.
    Development of misleading graphs.
    Use of inappropriate summary statistics.
    Biased interpretation of the statistical results.


    2 Descriptive statistics: Tabular and Graphical Presentation

  • Learning Objectives

    Identify how categorical data can be summarized.
    Understand the meaning of the frequency distribution and the relative and percent frequency distributions of categorical data.
    Identify how quantitative data can be summarized.
    Discuss the frequency distribution and the relative and percent frequency distributions of quantitative data.
    Identify the meaning of the stem-and-leaf display and how it can be constructed.
    Identify the meaning of crosstabulation and how it can be implemented.

  • First: Summarizing Categorical Data

    Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
    Relative frequency distribution: a tabular summary of data showing the relative frequency of items in each of several nonoverlapping classes. The relative frequency of a class is its frequency divided by the total number of items.

  • Summarizing Categorical Data (continued)

    Percent frequency distribution: a tabular summary of data showing the percent frequency of items in each class. The percent frequency of a class is its relative frequency multiplied by 100.
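
    The three distributions for categorical data can be computed together. The sketch below uses a hypothetical list of letter grades; the function name `frequency_table` and the data are assumptions made for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (class, frequency, relative frequency, percent frequency)."""
    counts = Counter(values)
    n = len(values)
    return [
        (label, count, count / n, 100 * count / n)
        for label, count in sorted(counts.items())
    ]

# Hypothetical categorical data: ten letter grades.
grades = ["A", "B", "A", "C", "B", "A", "B", "B", "C", "A"]

table = frequency_table(grades)
for label, freq, rel, pct in table:
    print(f"{label}  {freq:>2}  {rel:.2f}  {pct:.1f}%")
```

    The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.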

  • Summarizing Categorical Data, continued (Bar chart)

    A bar chart is a graphical device for summarizing a frequency, relative frequency, or percent frequency distribution.

    1- On one axis of the graph (usually the horizontal axis), specify the labels that are used for the classes.
    2- Use a frequency, relative frequency, or percent frequency scale for the other axis (usually the vertical axis).
    3- Using a bar of fixed width drawn above each class label, extend the bar until it reaches the frequency, relative frequency, or percent frequency of the class.
    4- For categorical data, the bars should be separated to emphasize the fact that each class is separate.

  • Summarizing Categorical Data, continued (Pie chart)

    A pie chart is another graphical device for presenting categorical data summarized in a frequency, relative frequency, or percent frequency distribution. To construct a pie chart:

    1- Draw a circle to represent all the data.
    2- Use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency of each class.
    3- Since the circle contains 360 degrees, the angle of each sector = the relative frequency of the class × 360.
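
    Step 3 is a one-line calculation. As a minimal sketch, the code below applies it to the presentation ratings from chapter 1 (3 Good, 5 Very good, 5 Excellent); the variable names are illustrative.

```python
# Frequencies of the presentation ratings from the chapter 1 example.
frequency = {"Good": 3, "Very good": 5, "Excellent": 5}
total = sum(frequency.values())

# Sector angle in degrees = relative frequency * 360.
angles = {label: (count / total) * 360 for label, count in frequency.items()}

for label, angle in angles.items():
    print(f"{label:<10} {angle:6.1f} degrees")
```

    The sector angles add up to 360 degrees, so the sectors fill the circle exactly.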

  • Second: Summarizing Quantitative Data

    First: Frequency distribution

    As with categorical (qualitative) data, a frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes. However, with quantitative data, we must be more careful in defining the nonoverlapping classes.

  • Summarizing Quantitative Data (continued)

    To define the classes for a frequency distribution of quantitative data, three steps have to be taken:

    Determine the number of nonoverlapping classes.
    Determine the width of each class.
    Determine the class limits.

  • Number of classes

    Classes are formed by specifying ranges that will be used to group the data. Usually between 5 and 20 classes are used.

    Using 5 or 6 classes is recommended for a small number of data items. For a larger number of data items, a larger number of classes is usually used.

  • Width of the classes

    In general, it is recommended that all classes have the same width. The number of classes and the class width are related: a larger number of classes implies a smaller class width, and vice versa. The approximate class width is commonly expressed as follows:

    Approximate class width = (largest data value - smallest data value) / number of classes
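
    The width formula did not survive in this transcript. A commonly used expression, assumed here, is approximate class width = (largest data value - smallest data value) / number of classes, usually rounded up to a convenient value. The sketch below applies it to a hypothetical data set of 20 values.

```python
import math

# Hypothetical quantitative data (20 made-up values).
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_classes = 5  # chosen by the analyst; 5 or 6 suits a small data set

# Assumed formula: (largest value - smallest value) / number of classes.
approx_width = (max(data) - min(data)) / num_classes

# Round up to a convenient whole number so that all values are covered.
class_width = math.ceil(approx_width)

print(f"approximate width: {approx_width:.1f}")
print(f"class width used:  {class_width}")
```

    Choosing more classes for the same data would give a smaller width, which matches the relationship described above.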

  • Class limits and midpoint

    Class limits: class limits must be chosen so that each data item belongs to one and only one class. The lower class limit identifies the smallest possible data value assigned to the class. The upper class limit identifies the largest possible data value assigned to the class.
    Class midpoint: the value halfway between the lower and upper class limits.
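
    Class limits, midpoints, and frequencies can be built in one pass. This is a sketch under stated assumptions: the data are hypothetical whole numbers, the first lower limit (10), the class width (5), and the number of classes (5) are chosen by hand, and the variable names are illustrative.

```python
# Hypothetical quantitative data (20 made-up whole-number values).
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

first_lower, width, num_classes = 10, 5, 5  # assumed choices

classes = []
for i in range(num_classes):
    lower = first_lower + i * width
    # For whole-number data the upper limit is one below the next lower limit,
    # so every value falls into exactly one class.
    upper = lower + width - 1
    midpoint = (lower + upper) / 2
    frequency = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, midpoint, frequency))

for lower, upper, midpoint, frequency in classes:
    print(f"{lower}-{upper}  midpoint {midpoint:>4}  frequency {frequency}")
```

    The frequencies sum to the number of data items, which confirms that the classes do not overlap and cover all the data.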

  • Relative frequency and percent frequency distributions for quantitative data.


  • Second: graphical charts (Summarizing Quantitative Data, continued)

    1. Dot plot

    One of the simplest graphical summaries. A horizontal axis shows the range of the data, and each data value is represented by a dot placed above the axis.

  • Summarizing Quantitative Data (continued)

    2. Histogram

    A histogram is a common graphical presentation of quantitative data, prepared from data previously summarized in a frequency, relative frequency, or percent frequency distribution. It is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis. The frequency, relative frequency, or percent frequency of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency, relative frequency, or percent frequency.
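
    A rough text version of a histogram can be printed without any plotting library. Note that this sketch is rotated relative to the description above: the classes run down the page and the bar length, not the height, shows the frequency. The frequency distribution and the function name `text_histogram` are hypothetical.

```python
# Hypothetical frequency distribution: class label -> frequency.
frequencies = {"10-14": 4, "15-19": 8, "20-24": 5, "25-29": 2, "30-34": 1}

def text_histogram(freqs):
    """Return one line per class; the bar length equals the class frequency."""
    return [f"{label:>5} | {'#' * count} {count}" for label, count in freqs.items()]

lines = text_histogram(frequencies)
for line in lines:
    print(line)
```

    Unlike a bar chart for categorical data, a drawn histogram has no gaps between adjacent rectangles, because the classes cover a continuous range of values.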

  • Cumulative frequency distribution

    The cumulative frequency distribution uses the number of classes, class widths, and class limits developed for the frequency distribution. However, rather than showing the frequency of each class, it shows the number of data items with values less than or equal to the upper class limit of each class. Similarly, a cumulative relative frequency distribution shows the proportion of data items, and a cumulative percent frequency distribution shows the percentage of data items, with values less than or equal to the upper limit of each class.

  • Summarizing Quantitative Data (continued)

    3. Ogive

    An ogive is a graph of a cumulative distribution that shows data values on the horizontal axis and either the cumulative frequency, the cumulative relative frequency, or the cumulative percent frequency on the vertical axis. The ogive is constructed by plotting a point corresponding to the cumulative frequency of each class.
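
    Cumulative distributions are running totals of the class frequencies, and the points of an ogive pair each upper class limit with its cumulative value. The sketch below assumes a hypothetical frequency distribution; the upper limits, frequencies, and variable names are illustrative.

```python
from itertools import accumulate

# Hypothetical frequency distribution: upper class limits and class frequencies.
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]
n = sum(frequencies)

# Cumulative frequency: items with values <= the upper limit of each class.
cumulative = list(accumulate(frequencies))

# Cumulative relative and cumulative percent frequencies.
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

# Points to plot for an ogive of cumulative frequencies.
ogive_points = list(zip(upper_limits, cumulative))

print(cumulative)
print(cumulative_percent)
print(ogive_points)
```

    The last cumulative frequency always equals the total number of data items, and the last cumulative percent frequency is 100.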

  • Crosstabulation

    A crosstabulation is a tabular summary of data for two variables. It is usually used to examine the relationship between the two variables.
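
    A crosstabulation can be sketched by counting how often each pair of values occurs. The records below are hypothetical (collaboration style, presentation rating) pairs, and the names `records` and `table` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: (collaboration, presentation) for seven elements.
records = [
    ("Collaborative", "Good"),
    ("Collaborative", "Excellent"),
    ("Selfish", "Very good"),
    ("Collaborative", "Excellent"),
    ("Boring", "Good"),
    ("Selfish", "Very good"),
    ("Boring", "Excellent"),
]

# Count each (row value, column value) pair.
counts = Counter(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Cell (r, c) holds the number of records with that combination (0 if none).
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print(f"{'':<15}" + "".join(f"{c:>11}" for c in cols))
for r in rows:
    print(f"{r:<15}" + "".join(f"{table[r][c]:>11}" for c in cols))
```

    Row and column totals of the table recover the frequency distribution of each variable on its own.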

  • Exploratory data analysis: stem-and-leaf display

    The stem-and-leaf display is one of the techniques for summarizing data.

    1. Arrange the leading digits of each data value to the left of a vertical line.
    2. To the right of the vertical line, record the last digit of each data value.
    3. The numbers on the left are the stems and the numbers on the right are the leaves.
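
    The three steps can be sketched in Python for two-digit whole numbers, where the stem is the tens digit and the leaf is the units digit. The data values and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by leading digits (stem) and last digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):           # sorting puts the leaves in rank order
        stems[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    return dict(sorted(stems.items()))

# Hypothetical data: ten made-up marks.
marks = [68, 72, 75, 81, 69, 73, 92, 85, 76, 81]

display = stem_and_leaf(marks)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

    Unlike a frequency distribution, the display keeps every original data value, so the data can be read back from it.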