Chapter 1: Statistics and Fundamentals
-
Statistics and Fundamentals
-
Statistics are used to describe our data and also to assess what reliance we can place on information based on samples. A variable is any concept that we can measure and that varies between individuals or cases. Variables should be identified as nominal (also known as category, categorical and qualitative) variables or score (also known as numerical) variables. Formal measurement theory holds that there are more types of variable: nominal, ordinal, interval and ratio. These distinctions are generally unimportant in the actual practice of doing statistical analyses, and it is difficult to distinguish ordinal, interval and ratio measurement in practice in psychology. Nominal variables consist of just named categories, whereas score variables are measured on a numerical scale which indicates the quantity of the variable.
-
Imagine a world in which everything is the same. People are identical in all respects. They wear identical clothes; they eat the same meals; they are all the same height from birth; they all go to the same school with identical teachers, identical lessons and identical facilities; they all go on holiday in the same month; they all do the same job; they all live in identical houses; and the sun shines every day.
People are all the same sex, their gardens have the same plants, and the soil is exactly the same no matter whose garden it is. They all die on their 75th birthdays and are all buried in the same wooden boxes in identical plots of land. They are all equally clever and they all have identical personalities. Their genetic make-up never varies. Mathematically speaking, all of these characteristics are constants.
-
If a world without any variation seems less than realistic, then we need statistics! In a richly varying world, statistics is essential. If nothing varied, everything there is to know about people could be guessed from information obtained from a single person. No problems would arise in generalising, since what is true of lkay Baarr is true of everyone else (they're all called lkay Baarr, after all). Fortunately, the world is not like that.
Variability is an essential characteristic of life and of the social world in which we exist. The sheer quantity of variability has to be controlled when trying to make statements about the real world.
Statistics is largely about making that variability comprehensible.
-
Statistics can be defined in various ways. As a discipline, statistics aims to support decisions about a subject by collecting, summarizing, organizing, analyzing and interpreting data.
Statistics covers all subjects related to numerical data that we encounter in daily life. It also helps the social and economic disciplines to investigate facts.
Because it makes problems easier to understand, summarizes them in an organized way and solves them graphically, statistics is a standard communication science for all other sciences.
Statistics is a branch of applied mathematics, so it can be applied in all sciences.
Statistics extracts information from data.
-
Statistics is a way to get information from data.
Data -> Statistics -> Information
Definitions: Oxford English Dictionary
Statistics is a tool for creating new understanding from a set of numbers.
-
There are two types of statistics in application.
Descriptive statistics describes sets of data in order to summarize the information. It uses numerical and graphical methods.
Inferential statistics uses sample data to make estimates, decisions, predictions and so on.
Statistics is the science of data. It involves techniques and methods for collecting, summarizing, organizing, analyzing and interpreting numerical information.
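The difference between describing data and inferring from it can be sketched in a few lines of Python. The heights below are hypothetical values invented for illustration, and using the standard error as the measure of precision is an assumption of this sketch rather than something stated on the slide.

```python
import statistics

# Hypothetical sample: heights (cm) of 8 students drawn from a larger class.
sample = [162, 170, 168, 175, 171, 166, 180, 168]

# Descriptive statistics: summarize the data we actually have.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)   # sample standard deviation (n - 1)

# Inferential statistics: use the sample to say something about the population.
# The sample mean serves as a point estimate of the population mean, and the
# standard error indicates roughly how precise that estimate is.
standard_error = sample_sd / len(sample) ** 0.5

print(f"mean = {sample_mean}, sd = {sample_sd:.2f}, se = {standard_error:.2f}")
```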
-
In all sciences, the main method for analyzing data scientifically can be summarized as follows.
1. Observe the event being analyzed, together with its objects, and describe them clearly.
2. Derive principles or rules about the event after collecting, summarizing, organizing, analyzing and interpreting numerical information.
3. Make inferences about the future.
4. Control the parameters of the event and provide technical and methodological improvements.
-
Statistics applies these scientific steps to data and so supports all sciences.
It is applied in the natural sciences, for observing events and analyzing them in laboratories; in the social sciences, such as economics, psychology, sociology and demography; and in government and business, in areas such as health services, education, production, sales, marketing, finance, the economy, advertising and sports.
-
We must not expect miracles or 100% consistency from statistics about the events analyzed. Unknown and uncontrollable parameters are an inevitable part of events, even though statistics provides interpretations and inferences and points us in the right direction.
-
Subject/Occurrence/Event
Data, Types of Data and Data sources, Time Series
Variable
Population and Sample
Statistical Survey & Sample Survey, Census and Sampling
Parameter
Measurement and Scales
Summation (Summing) Notation
-
We can think of events as changes in objects or in relations among objects. The event is what the survey aims to study.
There are two types of event.
Typical events: these are similar to one another, like physical and chemical events. One case explains them all. For example, an object falling, water being heated, etc.
Collective events: these are not similar to one another, although they may have common parts, as in biology. They are most commonly related to living things. Events of this kind are investigated one by one. For example, the purchase of a bestselling book, or the migration of a flock of a specific kind of bird.
-
Sometimes it is difficult to tell the two groups apart, because the causes that affect events may differ. These causes are either general or occasional. For example, the quality of the soil and the climate of the land are general causes of the harvest in agriculture, whereas the quality of the seed and the farming methods are occasional causes.
Statistics generally investigates collective events.
-
Data are the observed values of a variable.
They are numerical information about the subject under study.
The singular form of data is datum.
Some data cannot be described numerically, but they can be converted to numerical form by counting.
There are two types of data.
1. Numerical Data (Interval Data or Quantitative Data)
2. Categorical Data (Nominal Data or Qualitative Data)
-
Numerical Data (Interval or Quantitative Data)
   Discrete Data
   Continuous Data
   Ratio Data
Categorical Data (Nominal or Qualitative Data)
   Grouped Data
   Ordinal Data
-
We can classify data by kind of grouping.
Interval Data (Numerical or Quantitative Data)
These include two types of data.
Discrete Data: produced by counting.
Examples: number of children, defects per hour, number of stocks of firm X, number of cars sold, number of articles for a material, heart beats per minute.
Continuous Data: any value within a given range of real numbers.
Examples: weight, voltage, height, foot size, time, temperature, distance, velocity.
Nominal Data (Categorical or Qualitative Data)
These are produced by responses that belong to groups or categories.
Examples: marital status, gender, voter registration, eye color, religion.
Ordinal Data
These are nominal data whose values are ordered with respect to the given codes.
Examples:
Student evaluation rating by grade (1: poor; 2: good; 3: excellent)
Product quality rating (1: poor; 2: average; 3: good)
Size of T-shirts (S: small; M: medium; L: large; XL: extra large)
Ratio Data
Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural, meaningful zero point, which allows ratio comparisons to be interpreted.
Example: Time is measured on a ratio scale. Not only can we say that the difference between three hours and five hours is the same as the difference between eight hours and ten hours (equal intervals), but we can also say that ten hours is twice as long as five hours (a ratio comparison).
-
A time series is a sequence of observations ordered in time. If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arise, particularly since successive observations will probably be dependent.
Examples of time series:
1. Economics: weekly share prices; monthly profits
2. Meteorology: daily rainfall; wind speed; temperature
3. Sociology: employment figures; number of patients applying to a hospital in a day
Time series are best displayed in a scatter plot. The series value is plotted on the vertical axis (Y) and time t on the horizontal axis (X). Time is called the independent variable.
X: Time, Y: Observations
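As a rough illustration of keeping observations in time order before plotting them, the sketch below sorts some readings by date and separates them into the two axes. The rainfall figures and dates are hypothetical, and no plotting library is assumed.

```python
from datetime import date

# Hypothetical daily rainfall readings (mm), recorded out of order.
readings = [
    (date(2024, 3, 3), 0.0),
    (date(2024, 3, 1), 4.2),
    (date(2024, 3, 2), 1.5),
    (date(2024, 3, 4), 7.8),
]

# A time series is ordered in time, so sort by the time stamp first.
series = sorted(readings, key=lambda r: r[0])

# Horizontal axis: time (independent variable); vertical axis: observations.
times = [t for t, _ in series]
values = [v for _, v in series]

for t, v in zip(times, values):
    print(t.isoformat(), v)
```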
-
A data source is a specific data set, metadata set, database or metadata repository from which data or metadata are available.
Data sources can be classified according to:
1. Their survey purpose.
If the data are collected and prepared within the firm, they are internal (interior) data; if they come from outside, they are external (exterior) data.
(List of patients treated in the Psychology Dept. of X Hospital) is internal data for X Hospital.
(Number of hospital beds in public and private inpatient institutions, Ministry of Health) is external data for X Hospital.
2. Their source and how they are handled.
If the data are collected from the population itself, they are called primary data; otherwise they are called secondary data.
(List of patients treated in the Psychology Dept. of X Hospital) is primary data for X Hospital.
(Health statistics of Turkey, Ministry of Health) is secondary data for X Hospital.
-
A variable is a characteristic of a population or a sample.
The values of a variable are the possible observations of the variable.
They can change from one observation to the next.
For example:
The mark on a statistics exam will vary from student to student, so it is a variable.
The price of a stock will change from day to day in the stock market, so it is a variable.
-
Variables are denoted by letters such as X, Y, Z.
They are used with indices to describe the state of a unit at a given instant: X1, X2, X3, ..., Xi.
For example, X denotes the sales amount, whereas X3 is a subscripted variable and denotes the sales amount for the 3rd month of the year (March).
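Subscripted variables map naturally onto list indexing. The sketch below uses hypothetical monthly sales figures; the helper name x_sub is invented for illustration and shifts the 1-based statistical subscript onto Python's 0-based lists.

```python
# Hypothetical monthly sales amounts for one year (X1 ... X12).
X = [120, 135, 150, 110, 95, 160, 170, 165, 140, 130, 125, 180]

def x_sub(i):
    """Return X_i using the 1-based subscripts of the statistical notation."""
    return X[i - 1]      # Python lists are 0-based, hence the shift

march_sales = x_sub(3)   # X_3: sales amount for the 3rd month (March)
print(march_sales)
```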
-
Variables are classified according to the features of the data that are observed:
by attributes: quantitative variables, qualitative variables
by scaling: discrete variables, continuous variables
by observation: dependent variables, independent variables, controllable variables
-
Variables are classified according to the features of the data that are observed.
By attributes:
Quantitative variables: variables that are measured in terms of numbers (age, weight, height, speed, shoe size, ...).
Qualitative variables: variables that express a qualitative attribute (hair color, eye color, religion, favorite movie, gender, race, nationality, ...).
By scaling:
Discrete variables: variables whose possible scores are discrete points on the scale (counts, marital status, sex, ...). Hint: a household could have three children or six children, but not 4.53 children.
Continuous variables: variables whose scale is continuous and not made up of discrete steps (age, weight, intelligence level, temperature, ...). Hint: the response time could be 1.64 seconds, or it could be 1.64237123922121 seconds.
-
By observation:
Dependent variables: these answer the question "What do I observe in the experiment?" A dependent variable is one whose value depends on the value of the independent variable. The independent variable is manipulated by the experimenter and its effects on the dependent variable are measured.
The time water takes to boil is a dependent variable that depends on the heat applied and on the air pressure.
The time a candle takes to burn out is a dependent variable that depends on the height of the candle.
Independent variables: these answer the question "What do I change in the experiment?" An independent variable is one that is manipulated by the experimenter.
The heat and the pressure at which water boils are independent variables.
The height of the candle is an independent variable that affects the time the candle takes to burn out.
Controllable variables: these answer the question "What do I keep the same?" They are quantities that a scientist wants to remain constant.
(In the candle-burning experiment, we can use the same type of candle and keep the room free of wind.)
-
A population is a set of units (usually people, objects, transactions or events) that we are interested in studying.
A single entity of a population is called a unit.
All students who have been educated in Turkey form a population. Any student in this population is a unit (a person, a member, an entity) of that population.
Units are countable and measurable; attributes such as colour and taste are not considered units.
Sometimes the population is so large that it is impossible to count or measure all of it.
-
A sample is a subset of the units of a population. Given a population, we select a much smaller and more manageable set of units as a subset of it. This subset must represent the population. In this way we obtain understandable results for the population, but with an error that depends on how the sample is selected. For example, all university students form a population, and 100 randomly selected students who own a particular brand of GSM phone are a sample from the population "all university students".
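Drawing a sample from a population can be sketched with Python's standard random module. The population of 5,000 student ID numbers and the fixed seed are assumptions made only so the example is small and repeatable.

```python
import random

# Hypothetical population: ID numbers of 5,000 university students.
population = list(range(1, 5001))

rng = random.Random(42)                  # fixed seed so the draw is repeatable
sample = rng.sample(population, k=100)   # simple random sample, no repeats

print(len(sample), "units drawn from", len(population))
```

A different seed gives a different sample, which is one way to see why results based on a sample carry an error that depends on the selection.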
-
Figure: a population made up of units, with Sample_1, Sample_2 and Sample_3 drawn from it.
-
A parameter is a numerical descriptive measure of a population.
The modal sex of the children in a nursery is female.
The average height of the students enrolled in the course STAT is 170 cm.
A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.
The modal sex of 15 students selected from the class is female.
The average height of the students in STAT_Section-A is 170 cm.
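The distinction between a parameter and a sample statistic can be illustrated as follows. The heights are randomly generated, hypothetical values, and the class size of 40 and sample size of 15 are chosen only for the sketch.

```python
import random
import statistics

# Hypothetical population: heights (cm) of all 40 students enrolled in a course.
rng = random.Random(0)
population = [rng.randint(155, 190) for _ in range(40)]

# Parameter: computed from every unit in the population.
population_mean = statistics.mean(population)

# Sample statistic: computed only from the units in a sample.
sample = rng.sample(population, k=15)
sample_mean = statistics.mean(sample)

print(f"parameter = {population_mean:.1f} cm, statistic = {sample_mean:.1f} cm")
```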
-
A statistical survey is a means of collecting data from a sample of a population and estimating the population's characteristics through the systematic use of statistical methodology.
Sample survey: a survey that covers the elements of a sample.
Census (population survey): a collection of data about every member of a population.
Sampling (statistics): collecting data on only a sample of a population.
-
To measure is to assign numbers to the variables of individual population units. The values obtained are called measurements. Scaling is the process of measuring with respect to quantitative attributes. Scales are groups of techniques that make the assigned measurements meaningful.
-
Scales are techniques that make measurements understandable.
1. Nominal (Categorical) Scale
Objects are scaled according to their definite attributes.
Examples: grouping company cars by the purpose they serve, grouping people by their jobs, grouping sports teams by nationality. Gender, ethnicity and marital status are also scaled this way.
Permissible operations: counts, frequencies, and the maximum or minimum count.
2. Ordinal (Rankable) Scale
Objects are scaled according to some of their attributes. The scale indicates more or less of an attribute, and the numerical scores can be ordered from smallest to largest.
Examples: ordering liquids by density, ordering the students in a class by height, ranking teams 1st, 2nd and 3rd in a race.
Permissible operations: the median, percentages, and greater-than or less-than comparisons.
-
Types of Scales
3. Interval Scale
A scale with a fixed and defined interval. Intervals between adjacent scale values are equal with respect to the attribute being measured.
Examples: scaling devices developed for different systems, thermometers, some kinds of calendar, and stating the positions of runners, as in "Ali finished the race 5 seconds behind Veli".
Permissible operations: mean, standard deviation and correlation.
4. Ratio Scale
Intervals describe ratios of magnitudes. The scale has a rational zero point.
Examples: "Ali's score was half of Veli's score", "Team A scored twice as many points as Team B". Comparisons of ratios are possible. Metre, kilogram and degree scales are of this type.
All statistical techniques are permissible.
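The idea that each scale permits different operations can be sketched in Python. The marital-status labels, quality codes and hours below are hypothetical values chosen to mirror the kinds of example used in this chapter.

```python
import statistics
from collections import Counter

# Nominal scale: only counting is meaningful (frequencies, most common value).
marital_status = ["single", "married", "married", "single", "married"]
counts = Counter(marital_status)
most_common = counts.most_common(1)[0][0]

# Ordinal scale: codes can be ranked, so the median is also meaningful.
# Hypothetical codes: 1 = poor, 2 = average, 3 = good.
quality = [1, 3, 2, 3, 2, 2, 1]
median_quality = statistics.median(quality)

# Ratio scale: a true zero point makes ratios interpretable.
hours_a, hours_b = 10, 5
ratio = hours_a / hours_b

print(most_common, median_quality, ratio)
```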
-
Very often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \dots + X_N$ is used in a formula to compute a statistic. The three dots in the expression mean that the summation continues in the same way over the remaining indexed values of X. Writing out an expression like this is often tedious, so mathematicians have developed a shorthand notation to represent a sum of scores, called summation notation:
$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \dots + X_N$
Example: 2 + 6 + 9 + 7 + 4 = 28
-
The expression is read, "the sum of X sub i from i equals 1 to N". It means "add up all the numbers". In the example set of five numbers, where N=5, the summation could be written:
$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$
The "i=1" at the bottom of the summation notation tells where to begin the sequence of summation. If the expression were written with "i=3", the summation would start with the third number in the set. For example:
$\sum_{i=3}^{5} X_i = X_3 + X_4 + X_5$
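Summation notation, including a starting index other than 1, can be sketched as a small Python function. The five scores and the helper name summation are assumptions made for illustration.

```python
# Hypothetical set of five scores; X[0] plays the role of X_1.
X = [2, 6, 9, 7, 4]
N = len(X)

def summation(values, start=1, stop=None):
    """Sum values[start..stop] using 1-based, inclusive subscripts."""
    stop = len(values) if stop is None else stop
    return sum(values[i - 1] for i in range(start, stop + 1))

total = summation(X)           # sum of X_i for i = 1..N
from_third = summation(X, 3)   # sum of X_i for i = 3..N

print(total, from_third)
```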
-
Example: The following data set is given. Find the sum of the products of the two variables X and Y.
The general rule: DO THE ALGEBRAIC OPERATION AND THEN SUM.
X    Y    X * Y
5    6    30
7    7    49
7    8    56
6    7    42
8    8    64
33   36   241   (column sums)
$\sum XY = 241$ is the true technique, true result.
$(\sum X)(\sum Y) = 33 \times 36 = 1188$ is the wrong technique, wrong result.
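The rule "do the algebraic operation and then sum" can be checked directly on the X and Y columns of the table. This is only a sketch; the variable names are chosen for illustration.

```python
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

# Correct: multiply each pair first, then sum the products.
sum_of_products = sum(x * y for x, y in zip(X, Y))

# Incorrect for this purpose: sum each variable first, then multiply the sums.
product_of_sums = sum(X) * sum(Y)

print(sum_of_products, product_of_sums)
```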
-
1. When the expression being summed contains a "+" or "-" at the highest level, the summation sign may be taken inside the parentheses: $\sum (X_i + Y_i) = \sum X_i + \sum Y_i$.
2. The sum of a constant times a variable is equal to the constant times the sum of the variable: $\sum c X_i = c \sum X_i$.
3. The sum of a constant is equal to N times the constant: $\sum_{i=1}^{N} c = N c$.
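The three summation rules can be checked numerically. The data values and the constant c = 2 below are arbitrary choices for this sketch; a numerical check on one data set illustrates the rules but does not prove them.

```python
X = [3, 4, 5, 8, 9]
Y = [1, 5, 6, 7, 8]
c = 2            # arbitrary constant chosen for the check
N = len(X)

# Rule 1: the sum of (X + Y) equals the sum of X plus the sum of Y.
rule1 = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Rule 2: the sum of c * X equals c times the sum of X.
rule2 = sum(c * x for x in X) == c * sum(X)

# Rule 3: summing the constant c over N terms gives N * c.
rule3 = sum(c for _ in range(N)) == N * c

print(rule1, rule2, rule3)
```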
-
Problem: Survey data are given: 3, 4, 5, 8, 9. Compute the following sums.
a) $\sum (X + 2)$
b) $\sum X^2$
Solution:
a) $\sum (X + 2)$ = (3+2)+(4+2)+(5+2)+(8+2)+(9+2) = 5+6+7+10+11 = 39
b) $\sum X^2$ = $3^2+4^2+5^2+8^2+9^2$ = 9+16+25+64+81 = 195
Problem: Two data sets are given. For variable X: 3, 4, 5, 8, 9. For variable Y: 1, 5, 6, 7, 8. Compute the following sums.
a) $\sum XY$
b) $\sum (X - Y)^2$
Solution:
a) $\sum XY$ = (3*1)+(4*5)+(5*6)+(8*7)+(9*8) = 3+20+30+56+72 = 181
b) $\sum (X - Y)^2$ = $(3-1)^2+(4-5)^2+(5-6)^2+(8-7)^2+(9-8)^2$ = 4+1+1+1+1 = 8
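The four sums in these problems can be recomputed with a few lines of Python as a cross-check. This is a sketch using the data values given in the problems; the variable names are chosen for illustration.

```python
X = [3, 4, 5, 8, 9]
Y = [1, 5, 6, 7, 8]

sum_x_plus_2 = sum(x + 2 for x in X)                    # sum of (X + 2)
sum_x_squared = sum(x ** 2 for x in X)                  # sum of X^2
sum_xy = sum(x * y for x, y in zip(X, Y))               # sum of X * Y
sum_sq_diff = sum((x - y) ** 2 for x, y in zip(X, Y))   # sum of (X - Y)^2

print(sum_x_plus_2, sum_x_squared, sum_xy, sum_sq_diff)
```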