std11 stat em

STATISTICS
HIGHER SECONDARY – FIRST YEAR
Untouchability is a sin
Untouchability is a crime
Untouchability is inhuman
TAMILNADU TEXTBOOK CORPORATION
College Road, Chennai - 600 006

Upload: vivek-poojary

Post on 27-Nov-2014


Government of Tamilnadu
First Edition 2004
Reprint 2005

Chairperson
Dr. J. Jothikumar, Reader in Statistics, Presidency College, Chennai 600 005.

Reviewers
Thiru K. Nagabushanam, S.G. Lecturer in Statistics, Presidency College, Chennai 600 005.
Thiru R. Ravanan, S.G. Lecturer in Statistics, Presidency College, Chennai 600 005.

Authors
Tmt. V. Varalakshmi, S.G. Lecturer in Statistics, S.D.N.B. Vaishnav College for Women, Chrompet, Chennai 600 044.
Tmt. N. Suseela, P.G. Teacher, Anna Adarsh Matric Hr. Sec. School, Annanagar, Chennai 600 040.
Thiru G. Gnana Sundaram, P.G. Teacher, S.S.V. Hr. Sec. School, Parktown, Chennai 600 003.
Tmt. S. Ezhilarasi, P.G. Teacher, P.K.G.G. Hr. Sec. School, Ambattur, Chennai 600 053.
Tmt. B. Indrani, P.G. Teacher, P.K.G.G. Hr. Sec. School, Ambattur, Chennai 600 053.

Price: Rs.
Printed by offset at:
This book has been prepared by the Directorate of School Education on behalf of the Government of Tamilnadu.
This book has been printed on 60 G.S.M paper.

CONTENTS
1. Definitions, Scope and Limitations
2. Introduction to sampling methods
3. Collection of data, Classification and Tabulation
4. Frequency distribution
5. Diagrammatic and graphical representation
6. Measures of Central Tendency
7. Measures of Dispersion, Skewness and Kurtosis
8. Correlation
9. Regression
10. Index numbers

1. DEFINITIONS, SCOPE AND LIMITATIONS

1.1 Introduction:
In the modern world of computers and information technology, the importance of statistics is very well recognised by all the disciplines.
Statistics originated as a science of statehood and slowly and steadily found applications in Agriculture, Economics, Commerce, Biology, Medicine, Industry, planning, education and so on. As on date there is no walk of human life where statistics cannot be applied.

1.2 Origin and Growth of Statistics:
The words 'Statistics' and 'Statistical' are both derived from the Latin word Status, meaning a political state. The theory of statistics as a distinct branch of scientific method is of comparatively recent growth. Research, particularly into the mathematical theory of statistics, is proceeding rapidly and fresh discoveries are being made all over the world.

1.3 Meaning of Statistics:
Statistics is concerned with scientific methods for collecting, organising, summarising, presenting and analysing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this analysis. Statistics is concerned with the systematic collection of numerical data and its interpretation.
The word 'statistics' is used to refer to
1. Numerical facts, such as the number of people living in a particular area.
2. The study of ways of collecting, analysing and interpreting the facts.

1.4 Definitions:
Statistics has been defined differently by different authors over a period of time. In the olden days statistics was confined to only state affairs, but in modern days it embraces almost every sphere of human activity. Therefore a number of old definitions, which were confined to a narrow field of enquiry, were replaced by newer definitions, which are much more comprehensive and exhaustive. Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.
1. Statistics are the classified facts representing the conditions of people in a state. In particular they are the facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.
2. Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically arranged, analysed and presented so as to exhibit important inter-relationships among them.

1.4.1 Definitions by A.L.
Bowley:
"Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." - A.L. Bowley
"Statistics may be called the science of counting" is another of the definitions due to Bowley. Obviously this is an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation.
Bowley gives another definition for statistics, which states "statistics may be rightly called the science of averages". This definition is also incomplete: averages play an important role in understanding and comparing data, but statistics provides many more measures.

1.4.2 Definition by Croxton and Cowden:
"Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis." It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one. According to this definition there are four stages:
1. Collection of data: It is the first step and the foundation upon which the entire data set is built. Careful planning is essential before collecting the data. There are different methods of collection of data such as census, sampling, primary, secondary, etc., and the investigator should make use of the correct method.
2. Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.
3. Analysis of data: The data presented should be carefully analysed for making inferences, using tools such as measures of central tendency, dispersion, correlation, regression etc.
4. Interpretation of data: The final step is drawing conclusions from the data collected. A valid conclusion must be drawn on the basis of analysis. A high degree of skill and experience is necessary for the interpretation.

1.4.3 Definition by Horace Secrist:
Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and
placed in relation to each other.
The above definition seems to be the most comprehensive and exhaustive.

1.5 Functions of Statistics:
There are many functions of statistics. Let us consider the following five important functions.

1.5.1 Condensation:
Generally speaking, by the word 'to condense' we mean to reduce or to lessen. Condensation is mainly aimed at making a huge mass of data understandable by providing only a few observations. If, for a particular class in a Chennai school, only the marks in an examination are given, no purpose will be served. Instead, if we are given the average mark in that examination, it definitely serves the purpose better. Similarly the range of marks is another measure of the data. Thus statistical measures help to reduce the complexity of the data and consequently to understand any huge mass of data.

1.5.2 Comparison:
Classification and tabulation are the two methods that are used to condense the data. They help us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, coefficient of correlation etc. provide ample scope for comparison.
If we have one group of data, we can compare within itself. If the rice production (in tonnes) in Tanjore district is known, then we can compare one region with another region within the district. Or if the rice production (in tonnes) of two different districts within Tamilnadu is known, then also a comparative study can be made. As statistics is an aggregate of facts and figures, comparison is always possible, and in fact comparison helps us to understand the data in a better way.

1.5.3 Forecasting:
By the word forecasting, we mean to predict or to estimate beforehand. Given the rainfall data of the last ten years for a particular district in Tamilnadu, it is possible to predict or forecast the rainfall for the near future. In business also forecasting plays a dominant role in connection with production, sales, profits etc.
The analysis of time series and regression analysis plays an important role in forecasting.

1.5.4 Estimation:
One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The four major branches of statistical inference are
1. Estimation theory
2. Tests of Hypothesis
3. Non Parametric tests
4. Sequential analysis
In estimation theory, we estimate the unknown value of the population parameter based on the sample observations. Suppose we are given a sample of heights of hundred students in a school; based upon the heights of these 100 students, it is possible to estimate the average height of all students in that school.

1.5.5 Tests of Hypothesis:
A statistical hypothesis is some statement about the probability distribution characterising a population, which is examined on the basis of the information available from the sample observations. In the formulation and testing of hypotheses, statistical methods are extremely useful. Whether crop yield has increased because of the use of a new fertilizer, or whether a new medicine is effective in eliminating a particular disease, are some examples of statements of hypothesis, and these are tested by proper statistical tools.

1.6 Scope of Statistics:
Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for handling and analysing data and drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, like Biology, Commerce, Education, Planning, Business Management, Information Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be applied. We now discuss briefly the applications of statistics in other disciplines.

1.6.1 Statistics and Industry:
Statistics is widely used in many industries. In industries, control charts are widely used to maintain a certain quality level.
In production engineering, to find whether the product is conforming to specifications or not, statistical tools, namely inspection plans, control charts, etc., are of extreme importance. In inspection plans we have to resort to some kind of sampling, a very important aspect of Statistics.

1.6.2 Statistics and Commerce:
Statistics are the lifeblood of successful commerce. No businessman can afford either to understock or to overstock his goods. In the beginning he estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly. Thus statistics is indispensable in business and commerce.
As so many multinational companies have entered the Indian economy, the size and volume of business is increasing. On one side the stiff competition is increasing, whereas on the other side tastes are changing and new fashions are emerging. In this connection, market surveys play an important role in exhibiting the present conditions and forecasting the likely changes in future.

1.6.3 Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agricultural experiments. In tests of significance based on small samples, it can be shown that statistics is adequate to test the significant difference between two sample means. In analysis of variance, we are concerned with the testing of equality of several population means.
For example, five fertilizers are applied to five plots each of wheat and the yield of wheat on each of the plots is given. In such a situation, we are interested in finding out whether the effects of these fertilisers on the yield are significantly different or not; in other words, whether the samples are drawn from the same normal population or not. The answer to this problem is provided by the technique of ANOVA, which is used to test the homogeneity of several population means.

1.6.4 Statistics and Economics:
Statistical methods are useful in measuring numerical changes in complex groups and interpreting collective phenomena. Nowadays statistics is used abundantly in any economic study. Both in economic theory and practice, statistical methods play an important role.
Alfred Marshall said, Statistics are the
straw out of which I, like every other economist, have to make the bricks. It may also be noted that statistical data and statistical techniques are immensely useful in solving many economic problems such as wages, prices, production, distribution of income and wealth and so on. Statistical tools like index numbers, time series analysis, estimation theory and testing of statistical hypotheses are extensively used in economics.

1.6.5 Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all branches of activities. Statistics is necessary for the formulation of policies to start new courses, consideration of facilities available for new courses etc. There are many people engaged in research work to test past knowledge and evolve new knowledge. These are possible only through statistics.

1.6.6 Statistics and Planning:
Statistics is indispensable in planning. In the modern world, which can be termed the world of planning, almost all the organisations in the government are seeking the help of planning for efficient working, and for the formulation of policy decisions and execution of the same.
In order to achieve the above goals, the statistical data relating to production, consumption, demand, supply, prices, investments, income, expenditure etc. and various advanced statistical techniques for processing, analysing and interpreting such complex data are of importance. In India statistics plays an important role in planning and commissioning, both at the central and state government levels.

1.6.7 Statistics and Medicine:
In medical sciences, statistical tools are widely used. In order to test the efficiency of a new drug or medicine, the t-test is used; to compare the efficiency of two drugs or two medicines, the t-test for two samples is used. More and more applications of statistics are at present used in clinical investigation.

1.6.8 Statistics and Modern applications:
Recent developments in the fields of computer technology and information technology have enabled statistics to integrate
their models and thus make statistics a part of decision making procedures of many organisations. There are so many software packages available for solving design of experiments, forecasting, simulation problems etc.
SYSTAT, a software package, offers more scientific and technical graphing options than any other desktop statistics package. SYSTAT supports all types of scientific and technical research in various diversified fields as follows
1. Archeology: Evolution of skull dimensions
2. Epidemiology: Tuberculosis
3. Statistics: Theoretical distributions
4. Manufacturing: Quality improvement
5. Medical research: Clinical investigations
6. Geology: Estimation of Uranium reserves from ground water

1.7 Limitations of statistics:
Statistics, with all its wide application in every sphere of human activity, has its own limitations. Some of them are given below.
1. Statistics is not suitable to the study of qualitative phenomena: Since statistics is basically a science and deals with a set of numerical data, it is applicable to the study of only those subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence etc. cannot be expressed numerically, and statistical analysis cannot be directly applied to them. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to accurate quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.
2. Statistics does not study individuals: Statistics does not give any specific importance to individual items; in fact it deals with an aggregate of objects. Individual items, when they are taken individually, do not constitute any statistical data and do not serve any purpose for any statistical enquiry.
3. Statistical laws are not exact: It is well known that mathematical and physical sciences are exact. But statistical laws are not exact; they are only approximations. Statistical conclusions are not universally true. They are true only on an average.
4.
Statistics may be misused: Statistics must be used only by experts; otherwise, statistical methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions. Statistics can be easily misused by quoting wrong figures of data. As King says aptly, 'statistics are like clay of which one can make a God or Devil as one pleases'.
5. Statistics is only one of the methods of studying a problem: Statistical methods do not provide a complete solution of problems, because problems are to be studied taking the background of the country's culture, philosophy or religion into consideration. Thus the statistical study should be supplemented by other evidences.

Exercise 1
I. Choose the best answer:
1. The origin of statistics can be traced to
(a) State (b) Commerce (c) Economics (d) Industry
2. 'Statistics may be called the science of counting' is the definition given by
(a) Croxton (b) A.L. Bowley (c) Boddington (d) Webster
II. Fill in the blanks:
3. In the olden days statistics was confined to only _______.
4. Classification and _______ are the two methods that are used to condense the data.
5. The analysis of time series and regression analysis plays an important role in _______.
6. ______ is one of the statistical tools that plays a prominent role in agricultural experiments.
III. Answer the following questions:
7. Write the definitions of statistics by A.L. Bowley.
8. What is the definition of statistics as given by Croxton and Cowden?
9. Explain the four stages in statistics as defined by Croxton and Cowden.
10. Write the definition of statistics given by Horace Secrist.
11. Describe the functions of statistics.
12. Explain the scope of statistics.
13. What are the limitations of statistics?
14. Explain any two functions of statistics.
15. Explain any two applications of statistics.
16. Describe any two limitations of statistics.
IV. Suggested Activities (Project Work):
17. Collect statistical information from magazines, newspapers, television, Internet etc.
18.
Collect interesting statistical facts from various sources and paste them in your album notebook.
Answers:
I. 1. (a) 2. (b)
II. 3. State affairs 4. Tabulation 5. Forecasting 6. Analysis of variance (or ANOVA)

2. INTRODUCTION TO SAMPLING METHODS

2.1 Introduction:
Sampling is very often used in our daily life. For example, while purchasing food grains from a shop we usually examine a handful from the bag to assess the quality of the commodity. A doctor examines a few drops of blood as a sample and draws conclusions about the blood constitution of the whole body. Thus most of our investigations are based on samples. In this chapter, let us see the importance of sampling and the various methods of sample selection from the population.

2.2 Population:
In a statistical enquiry, all the items which fall within the purview of enquiry are known as Population or Universe. In other words, the population is a complete set of all possible observations of the type which is to be investigated. The total number of students studying in a school or college, the total number of books in a library, and the total number of houses in a village or town are some examples of population.
Sometimes it is possible and practical to examine every person or item in the population we wish to describe. We call this a complete enumeration, or census. We use sampling when it is not possible to measure every item in the population. Statisticians use the word population to refer not only to people but to all items that have been chosen for study.

2.2.1 Finite population and infinite population:
A population is said to be finite if it consists of a finite number of units. The number of workers in a factory and the production of articles on a particular day for a company are examples of finite population.
The total number of units in a population is called population size. A population is said to be infinite if it has an infinite number of units. For example, the number of stars in the sky or the number of people watching television programmes is, for practical purposes, treated as infinite.

2.2.2 Census Method:
Information on population can be collected in two ways: census method and sample method. In the census method every element of the population is included in the investigation. For example, if we study the average annual income of the families of a particular village or area, and if there are 1000 families in that area, we must study the income of all 1000 families. In this method no family is left out, as each family is a unit.
Population census of India: The population census of our country is taken at 10 yearly intervals. The latest census was taken in 2001. The first census was taken in 1871-72.
[Latest population census of India is included at the end of the chapter.]

2.2.3 Merits and limitations of Census method:
Merits:
1. The data are collected from each and every item of the population.
2. The results are more accurate and reliable, because every item of the universe is covered.
3. Intensive study is possible.
4. The data collected may be used for various surveys, analyses etc.
Limitations:
1. It requires a large number of enumerators and it is a costly method.
2. It requires more money, labour, time, energy etc.
3. It is not possible in some circumstances where the universe is infinite.

2.3 Sampling:
The theory of sampling has been developed recently, but the idea is not new. In our everyday life we have been using sampling, as discussed in the introduction. In all those cases we believe that the samples give a correct idea about the population. Most of our decisions are based on the examination of a few items, that is, on sample studies.

2.3.1 Sample:
Statisticians use the word sample to describe a portion chosen from the population. A finite subset of statistical individuals defined in a population is called a sample.
The number of units in a sample is called the sample size.
Sampling unit:
The constituents of a population which are individuals to be sampled from the population and cannot be further subdivided for the purpose of the sampling at a time are called sampling units. For example, to know the average income per family, the head of the family is a sampling unit. To know the average yield of rice, each farm owner's yield of rice is a sampling unit.
Sampling frame:
For adopting any sampling procedure it is essential to have a list identifying each sampling unit by a number. Such a list or map is called a sampling frame. A list of voters, a list of householders, a list of villages in a district, a list of farmers etc. are a few examples of sampling frame.

2.3.2 Reasons for selecting a sample:
Sampling is inevitable in the following situations:
1. Complete enumerations are practically impossible when the population is infinite.
2. When the results are required in a short time.
3. When the area of survey is wide.
4. When resources for survey are limited, particularly in respect of money and trained persons.
5. When the item or unit is destroyed under investigation.

2.3.3 Parameters and statistics:
We can describe samples and populations by using measures such as the mean, median, mode and standard deviation. When these terms describe the characteristics of a population, they are called parameters. When they describe the characteristics of a sample, they are called statistics. A parameter is a characteristic of a population and a statistic is a characteristic of a sample. Since samples are subsets of the population, statistics provide estimates of the parameters. That is, when the parameters are unknown, they are estimated from the values of the statistics.
In general, we use Greek or capital letters for population parameters and lower case Roman letters to denote sample statistics. [$N$, $\mu$, $\sigma$ are the standard symbols for the size, mean and S.D. of the population. $n$, $\bar{x}$, $s$ are the standard symbols for the size, mean and s.d. of the sample respectively.]

2.3.4 Principles of Sampling:
Samples have to provide good estimates. The following principles tell us that the sample methods provide such good estimates.
1.
Principle of statistical regularity:
A moderately large number of units chosen at random from a large group are almost sure on the average to possess the characteristics of the large group.
2. Principle of Inertia of large numbers:
Other things being equal, as the sample size increases, the results tend to be more accurate and reliable.
3. Principle of Validity:
This states that the sampling methods provide valid estimates about the population units (parameters).
4. Principle of Optimisation:
This principle takes into account the desirability of obtaining a sampling design which gives optimum results. This minimizes the risk or loss of the sampling design.
The foremost purpose of sampling is to gather maximum information about the population under consideration at minimum cost, time and human power. This is best achieved when the sample contains all the properties of the population.

Sampling errors and non-sampling errors:
The two types of errors in a sample survey are sampling errors and non-sampling errors.
1. Sampling errors:
Although a sample is a part of the population, it cannot generally be expected to supply full information about the population. So there may be, in most cases, a difference between statistics and parameters. The discrepancy between a parameter and its estimate due to the sampling process is known as sampling error.
2. Non-sampling errors:
In all surveys some errors may occur during collection of actual information. These errors are called non-sampling errors.

2.3.5 Advantages and Limitation of Sampling:
There are many advantages of sampling methods over the census method. They are as follows:
1. Sampling saves time and labour.
2. It results in reduction of cost in terms of money and man-hours.
3. Sampling ends up with greater accuracy of results.
4. It has greater scope.
5. It has greater adaptability.
6. If the population is too large, or hypothetical or destroyable, sampling is the only method to be used.
The limitations of sampling are given below:
1. Sampling is to be done by qualified and experienced persons. Otherwise, the information will be unreliable.
2.
Sample method may sometimes give extreme values instead of the mixed values.
3. There is the possibility of sampling errors. A census survey is free from sampling error.

2.4 Types of Sampling:
The technique of selecting a sample is of fundamental importance in sampling theory and it depends upon the nature of investigation. The sampling procedures which are commonly used may be classified as
1. Probability sampling.
2. Non-probability sampling.
3. Mixed sampling.

2.4.1 Probability sampling (Random sampling):
A probability sample is one where the selection of units from the population is made according to known probabilities (e.g. simple random sample, probability proportional to size etc.).

2.4.2 Non-Probability sampling:
It is the one where discretion is used to select 'representative' units from the population, or to infer that a sample is 'representative' of the population. This method is called judgement or purposive sampling. This method is mainly used for opinion surveys; a common type of judgement sample used in surveys is the quota sample. This method is not used in general because of prejudice and bias of the enumerator. However, if the enumerator is experienced and expert, this method may yield valuable results. For example, in a market research survey of the performance of a new car, the sample was all new car purchasers.

2.4.3 Mixed Sampling:
Here samples are selected partly according to some probability and partly according to a fixed sampling rule; they are termed mixed samples and the technique of selecting such samples is known as mixed sampling.

2.5 Methods of selection of samples:
Here we shall consider the following three methods:
1. Simple random sampling.
2. Stratified random sampling.
3. Systematic random sampling.
1. Simple random sampling:
A simple random sample from a finite population is a sample selected such that each possible sample combination has equal probability of being chosen. It is also called unrestricted random sampling.
2.
Simple random sampling without replacement:
In this method the population elements can enter the sample only once, i.e. a unit once selected is not returned to the population before the next draw.
3. Simple random sampling with replacement:
In this method the population units may enter the sample more than once. Simple random sampling may be with or without replacement.

2.5.1 Methods of selection of a simple random sample:
The following are some methods of selection of a simple random sample.
a) Lottery Method:
This is the most popular and simplest method. In this method all the items of the population are numbered on separate slips of paper of the same size, shape and colour. They are folded and mixed up in a container. The required number of slips is selected at random for the desired sample size. For example, if we want to select 5 students out of 50 students, then we must write the names or roll numbers of all the 50 students on slips and mix them. Then we make a random selection of 5 students.
This method is mostly used in lottery draws. If the universe is infinite this method is inapplicable.
b) Table of Random numbers:
As the lottery method cannot be used when the population is infinite, the alternative method is that of using the table of random numbers. There are several standard tables of random numbers. The following are three of them:
1. Tippett's table
2. Fisher and Yates' table
3. Kendall and Smith's table
A random number table is so constructed that all digits 0 to 9 appear independent of each other with equal frequency. If we have to select a sample from a population of size N = 100, then the numbers can be combined three by three to give the numbers from 001 to 100.
[See Appendix for the random number table]
Procedure to select a sample using random number table:
Units of the population from which a sample is required are assigned numbers with an equal number of digits. When the size of the population is less than a thousand, the three digit numbers 000, 001, 002, ..., 999 are assigned. We may start at any place and may go on in any direction, such as column-wise or row-wise, in a random number table.
But consecutive numbers are to be used.
On the basis of the size of the population and the random number table available with us, we proceed according to our convenience. If any random number is greater than the population size N, then N can be subtracted from the random number drawn. This can be repeated until the number is less than or equal to N.

Example 1:
In an area there are 500 families. Using the following extract from a table of random numbers, select a sample of 15 families to find out the standard of living of those families in that area.
4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950
Solution:
In the above random number table we can start from any row or column and read three digit numbers continuously row-wise or column-wise.
Now we start from the third row. The numbers are:
203 023 277 353 600 794 109 179
272 284 450 641 148 908 280
Since some numbers are greater than 500, we subtract 500 from those numbers and rewrite the selected numbers as follows:
203 023 277 353 100 294 109 179
272 284 450 141 148 408 280

c) Random number selections using calculators or computers:
Random numbers can be generated through a scientific calculator or computer. Each press of the key gives a new random number. The way of selecting the sample is similar to that of using a random number table.

Merits of using random numbers:
Merits:
1. Personal bias is eliminated as the selection depends solely on chance.
2. A random sample is in general a representative sample for a homogeneous population.
3. There is no need for thorough knowledge of the units of the population.
4. The accuracy of a sample can be tested by examining another sample from the same universe when the universe is unknown.
5. This method is also used in other methods of sampling.
Limitations:
1. Preparing lots or using random number tables is tedious when the population is large.
2. When there is a large difference between the units of the population, the simple random sample may not be a representative sample.
3.
The size of the sample required under this method is more than that required by stratified random sampling.
4. It is generally seen that the units of a simple random sample lie apart geographically. The cost and time of collection of data are more.

2.5.2 Stratified Random Sampling:
Of all the methods of sampling, the procedure commonly used in surveys is stratified sampling. This technique is mainly used to reduce the population heterogeneity and to increase the efficiency of the estimates. Stratification means division into groups. In this method the population is divided into a number of subgroups or strata. The strata should be so formed that each stratum is homogeneous as far as possible. Then from each stratum a simple random sample may be selected, and these are combined together to form the required sample from the population.
Types of Stratified Sampling:
There are two types of stratified sampling. They are proportional and non-proportional. In proportional sampling, proportionate representation is given to the subgroups or strata. If the number of items in a stratum is large, the sample from it will be larger, and vice versa. The population size is denoted by N and the sample size is denoted by n. The sample size is allocated to each stratum in such a way that the sampling fraction is a constant for each stratum, that is, n/N = c. So in this method each stratum is represented according to its size.
In non-proportionate sampling, equal representation is given to all the sub-strata regardless of their size in the population.

Example 2:
A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?
Solution:
There are two strata in this case with sizes $N_1 = 200$ and $N_2 = 300$, and the total population is $N = N_1 + N_2 = 500$.
The sample size is $n = 50$.
If $n_1$ and $n_2$ are the sample sizes,
$n_1 = \frac{n}{N} \times N_1 = \frac{50}{500} \times 200 = 20$
$n_2 = \frac{n}{N} \times N_2 = \frac{50}{500} \times 300 = 30$
The sample sizes are 20 from A and 30 from B.
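The proportional allocation used in Example 2 can be checked with a short script. What follows is only a minimal Python sketch: the function name proportional_allocation and the dictionary layout are illustrative assumptions that do not appear in the textbook, and the rounding step is safe here only because the shares come out as whole numbers.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size.

    Applies n_i = (n / N) * N_i, where N is the total population size.
    Hypothetical helper written for illustration only.
    """
    total = sum(stratum_sizes.values())  # N = N1 + N2 + ...
    # round() is an assumption of this sketch: with awkward stratum sizes
    # the rounded shares may not add up exactly to sample_size.
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }


# Example 2: institutions A and B have 200 and 300 students, and n = 50.
allocation = proportional_allocation({"A": 200, "B": 300}, 50)
print(allocation)  # {'A': 20, 'B': 30}
```

The sampling fraction n/N = 50/500 = 0.1 is the same for both strata, which is what makes the allocation proportional.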
Then the units from each institution are to be selected by simple random sampling.

Merits and limitations of stratified sampling:
Merits:
1. It is more representative.
2. It ensures greater accuracy.
3. It is easy to administer as the universe is sub-divided.
4. Greater geographical concentration reduces time and expenses.
5. When the original population is badly skewed, this method is appropriate.
6. For a non-homogeneous population, it may yield good results.
Limitations:
1. Dividing the population into homogeneous strata requires more money, time and statistical experience, which is difficult.
2. Improper stratification leads to bias; if the different strata overlap, such a sample will not be a representative one.

2.5.3 Systematic Sampling:
This method is widely employed because of its ease and convenience. A frequently used method of sampling when a complete list of the population is available is systematic sampling. It is also called quasi-random sampling.
Selection procedure:
The whole sample selection is based on just a random start. The first unit is selected with the help of random numbers and the rest get selected automatically according to some predesigned pattern; this is known as systematic sampling. With systematic random sampling every Kth element in the frame is selected for the sample, with the starting point among the first K elements determined at random.
For example, if we want to select a sample of 50 students from 500 students, under this method every Kth item is picked up from the sampling frame and K is called the sampling interval.

Sampling interval, $K = \frac{N}{n} = \frac{\text{Population size}}{\text{Sample size}}$

$K = \frac{500}{50} = 10$

K = 10 is the sampling interval. A systematic sample consists in selecting a random number, say $i \le K$, and every Kth unit subsequently. Suppose the random number i is 5, then we select 5, 15, 25, 35, 45, ... The random number i is called the random start. The technique will generate K systematic samples with equal probability.
Merits:
1. This method is simple and convenient.
2. Time and work are much reduced.
3. If proper care is taken, the result will be accurate.
4. It can be used in infinite population.
Limitations:
1. Systematic sampling may not represent the whole population.
2. There is a chance of personal bias of the investigators.
Systematic sampling is preferably used when the information is to be collected from trees in a forest, houses in blocks, entries in a register which are in a serial order, etc.

Exercise 2
I. Choose the best answer:
1. Sampling is inevitable in the situations
(a) Blood test of a person (b) When the population is infinite (c) Testing of life of dry battery cells (d) All the above
2. The difference between sample estimate and population parameter is termed as
(a) Human error (b) Sampling error (c) Non-sampling error (d) None of the above
3. If each and every unit of population has equal chance of being included in the sample, it is known as
(a) Restricted sampling (b) Purposive sampling (c) Simple random sampling (d) None of the above
4. Simple random sample can be drawn with the help of
(a) Slip method (b) Random number table (c) Calculator (d) All the above
5. A selection procedure of a sample having no involvement of probability is known as
(a) Purposive sampling (b) Judgement sampling (c) Subjective sampling (d) All the above
6. Five establishments are to be selected from a list of 50 establishments by systematic random sampling. If the first number is 7, the next one is
(a) 8 (b) 16 (c) 17 (d) 21
II. Fill in the blanks:
7. A population consisting of an unlimited number of units is called an ________ population
8. If all the units of a population are surveyed it is called ________
9. The discrepancy between a parameter and its estimate due to sampling process is known as _______
10. The list of all the items of a population is known as ______
11. Stratified sampling is appropriate when population is _________
12. When the items are perishable under investigation it is not possible to do _________
13. When the population consists of units arranged in a sequence we would prefer ________ sampling
14. For a homogeneous population, ________ sampling is better than stratified random sampling.
III. Answer the following questions:
15. Define a population.
16. Define finite and infinite populations with examples.
17. What is sampling?
18. Define the following terms: (a) Sample (b) Sample size (c) Census (d) Sampling unit (e) Sampling frame
19. Distinguish between census and sampling.
20. What are the advantages of sampling over complete enumeration?
21. Why do we resort to sampling?
22. What are the limitations of sampling?
23. State the principles of sampling.
24. What are probability and non-probability sampling?
25. Define purposive sampling. Where is it used?
26. What is called mixed sampling?
27. Define a simple random sampling.
28. Explain the selection procedure of simple random sampling.
29. Explain the two methods of selecting a simple random sample.
30. What is a random number table? How will you select the random numbers?
31. What are the merits and limitations of simple random sampling?
32. Under what circumstances is stratified random sampling used?
33. Discuss the procedure of stratified random sampling. Give examples.
34. What is the objective of stratification?
35. What are the merits and limitations of stratified random sampling?
36. Explain systematic sampling.
37. Discuss the advantages and disadvantages of systematic random sampling.
38. Give illustrations of situations where systematic sampling is used.
39. A population of size 800 is divided into 3 strata of sizes 300, 200, 300 respectively. A stratified sample of size 160 is to be drawn from the population. Determine the sizes of the samples from each stratum under proportional allocation.
40. Using the random number table, make a random number selection of 8 plots out of 80 plots in an area.
41. There are 50 houses in a street. Select a sample of 10 houses for a particular study using systematic sampling.
IV. Suggested activities:
42. (a) List any five sampling techniques used in your environment. (b) List any five situations where we adopt the census method (i.e. complete enumeration).
43. Select a sample of students in your school (for a particular competition function) at primary, secondary and higher secondary levels using stratified sampling with proportional allocation.
44. Select a sample of 5 students from your class attendance register using the method of systematic sampling.
Answers:
I. 1. (d) 2. (b) 3. (c) 4. (d) 5. (d) 6. (c)
II. 7. infinite 8. complete enumeration or census 9. sampling error 10. sampling frame 11. heterogeneous or non-homogeneous 12. complete enumeration 13. systematic 14. simple random

POPULATION OF INDIA 2001
(Population variation is for 1991-2001; sex ratio is females per thousand males. Numbers in brackets after a name refer to the notes below the table.)

India/State/                  Persons        Males      Females  Population  Sex ratio
Union territories*                                                variation
INDIA (1, 2)            1,027,015,247  531,277,078  495,738,169       21.34        933
Andaman & Nicobar Is.*        356,265      192,985      163,280       26.94        846
Andhra Pradesh             75,727,541   38,286,811   37,440,730       13.86        978
Arunachal Pradesh           1,091,117      573,951      517,166       26.21        901
Assam                      26,638,407   13,787,799   12,850,608       18.85        932
Bihar                      82,878,796   43,153,964   39,724,832       28.43        921
Chandigarh*                   900,914      508,224      392,690       40.33        773
Chhatisgarh                20,795,956   10,452,426   10,343,530       18.06        990
Dadra & Nagar Haveli*         220,451      121,731       98,720       59.20        811
Daman & Diu*                  158,059       92,478       65,581       55.59        709
Delhi*                     13,782,976    7,570,890    6,212,086       46.31        821
Goa                         1,343,998      685,617      658,381       14.89        960
Gujarat (5)                50,596,992   26,344,053   24,252,939       22.48        921
Haryana                    21,082,989   11,327,658    9,755,331       28.06        861
Himachal Pradesh (4)        6,077,248    3,085,256    2,991,992       17.53        970
Jammu & Kashmir (2, 3)     10,069,917    5,300,574    4,769,343       29.04        900
Jharkhand                  26,909,428   13,861,277   13,048,151       23.19        941
Karnataka                  52,733,958   26,856,343   25,877,615       17.25        964
Kerala                     31,838,619   15,468,664   16,369,955        9.42      1,058
Lakshadweep*                   60,595       31,118       29,477       17.19        947
Madhya Pradesh             60,385,118   31,456,873   28,928,245       24.34        920
Maharashtra                96,752,247   50,334,270   46,417,977       22.57        922
Manipur                     2,388,634    1,207,338    1,181,296       30.02        978
Meghalaya                   2,306,069    1,167,840    1,138,229       29.94        975
Mizoram                       891,058      459,783      431,275       29.18        938
Nagaland                    1,988,636    1,041,686      946,950       64.41        909
Orissa                     36,706,920   18,612,340   18,094,580       15.94        972
Pondicherry*                  973,829      486,705      487,124       20.56      1,001
Punjab                     24,289,296   12,963,362   11,325,934       19.76        874
Rajasthan                  56,473,122   29,381,657   27,091,465       28.33        922
Sikkim                        540,493      288,217      252,276       32.98        875
Tamil Nadu                 62,110,839   31,268,654   30,842,185       11.19        986
Tripura                     3,191,168    1,636,138    1,555,030       15.74        950
Uttar Pradesh             166,052,859   87,466,301   78,586,558       25.80        898
Uttaranchal                 8,479,562    4,316,401    4,163,161       19.20        964
West Bengal                80,221,171   41,487,694   38,733,477       17.84        934

Notes:
1. The population of India includes the estimated population of entire Kachchh district, Morvi, Maliya-Miyana and Wankaner talukas of Rajkot district, Jodiya taluka of Jamnagar district of Gujarat State and entire Kinnaur district of Himachal Pradesh, where population enumeration of Census of India 2001 could not be conducted due to natural calamity.
2. For working out the density of India, the entire area and population of those portions of Jammu and Kashmir which are under illegal occupation of Pakistan and China have not been taken into account.
3. Figures shown against Population in the age-group 0-6 and Literates do not include the figures of entire Kachchh district, Morvi, Maliya-Miyana and Wankaner talukas of Rajkot district, Jodiya taluka of Jamnagar district and entire Kinnaur district of Himachal Pradesh, where population enumeration of Census of India 2001 could not be conducted due to natural calamity.
4. Figures shown against Himachal Pradesh have been arrived at after including the estimated figures of entire Kinnaur district of Himachal Pradesh, where the population enumeration of Census of India 2001 could not be conducted due to natural calamity.
5. Figures shown against Gujarat have been arrived at after including the estimated figures of entire Kachchh district, Morvi, Maliya-Miyana and Wankaner talukas of Rajkot district, Jodiya taluka of Jamnagar district of Gujarat State, where the population enumeration of Census of India 2001 could not be conducted due to natural calamity.

3.
COLLECTION OF DATA, CLASSIFICATION AND TABULATION

3.1 Introduction:
Everybody collects, interprets and uses information, much of it in a numerical or statistical form, in day-to-day life. It is a common practice that people receive large quantities of information every day through conversations, television, computers, the radio, newspapers, posters, notices and instructions. It is just because there is so much information available that people need to be able to absorb, select and reject it. In everyday life, in business and industry, certain statistical information is necessary and it is important to know where to find it and how to collect it. As a consequence, everybody has to compare prices and quality before making any decision about what goods to buy. As employees of any firm, people want to compare their salaries and working conditions, promotion opportunities and so on. In time, the firms on their part want to control costs and expand their profits.
One of the main functions of statistics is to provide information which will help in making decisions. Statistics provides this type of information by providing a description of the present, a profile of the past and an estimate of the future. The following are some of the objectives of collecting statistical information.
1. To describe the methods of collecting primary statistical information.
2. To consider the status involved in carrying out a survey.
3. To analyse the process involved in observation and interpreting.
4. To define and describe sampling.
5. To analyse the basis of sampling.
6. To describe a variety of sampling methods.
Statistical investigation is comprehensive and requires systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis and using these results for making judgements, decisions and predictions. The validity and accuracy of the final judgement is most crucial and depends heavily on how well the data was collected in the first place.
The quality of data will greatly affect the conclusions and hence utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy while collecting the data.

3.2 Nature of data:
It may be noted that different types of data can be collected for different purposes. The data can be collected in connection with time or geographical location or in connection with time and location. The following are the three types of data:
1. Time series data
2. Spatial data
3. Spatio-temporal data

3.2.1 Time series data:
It is a collection of a set of numerical values, collected over a period of time. The data might have been collected either at regular intervals of time or irregular intervals of time.
Example 1:
The following is the data for the three types of expenditures in rupees for a family for the four years 2001, 2002, 2003, 2004.

Year    Food    Education    Others    Total
2001    3000    2000         3000       8000
2002    3500    3000         4000      10500
2003    4000    3500         5000      12500
2004    5000    5000         6000      16000

3.2.2 Spatial Data:
If the data collected is connected with that of a place, then it is termed as spatial data. For example, the data may be
1. Number of runs scored by a batsman in different test matches in a test series at different places
2. District-wise rainfall in Tamilnadu
3. Prices of silver in four metropolitan cities
Example 2:
The population of the southern states of India in 1991.

State             Population
Tamilnadu         5,56,38,318
Andhra Pradesh    6,63,04,854
Karnataka         4,48,17,398
Kerala            2,90,11,237
Pondicherry          7,89,416

3.2.3 Spatio-Temporal Data:
If the data collected is connected to the time as well as place then it is known as spatio-temporal data.
Example 3:

State             Population
                  1981           1991
Tamil Nadu        4,82,97,456    5,56,38,318
Andhra Pradesh    5,34,03,619    6,63,04,854
Karnataka         3,70,43,451    4,48,17,398
Kerala            2,54,03,217    2,90,11,237
Pondicherry          6,04,136       7,89,416

3.3 Categories of data:
Any statistical data can be classified under two categories depending upon the sources utilized. These categories are:
1. Primary data
2. Secondary data

3.3.1 Primary data:
Primary data is the one which is collected by the investigator himself for the purpose of a specific inquiry or study. Such data is original in character and is generated by a survey conducted by individuals or a research institution or any organisation.
Example 4:
If a researcher is interested to know the impact of the noon-meal scheme for school children, he has to undertake a survey and collect data on the opinion of parents and children by asking relevant questions. Such data collected for the purpose is called primary data.
The primary data can be collected by the following five methods.
1. Direct personal interviews.
2. Indirect oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.

1. Direct personal interviews:
The persons from whom information is collected are known as informants. The investigator personally meets them and asks questions to gather the necessary information. It is the suitable method for intensive rather than extensive field surveys. It suits best for intensive study of a limited field.
Merits:
1. People willingly supply information because they are approached personally. Hence, more response is noticed in this method than in any other method.
2. The collected information is likely to be uniform and accurate. The investigator is there to clear the doubts of the informants.
3. Supplementary information on the informant's personal aspects can be noted. Information on character and environment may help later to interpret some of the results.
4. Answers for questions about which the informant is likely to be sensitive can be gathered by this method.
5. The wording in one or more questions can be altered to suit any informant. Explanations may be given in other languages also. Inconvenience and misinterpretations are thereby avoided.
Limitations:
1. It is very costly and time consuming.
2. It is very difficult when the number of persons to be interviewed is large and the persons are spread over a wide area.
3. Personal prejudice and bias are greater under this method.

2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses or neighbours or friends or some other third parties who are capable of supplying the necessary information. This method is preferred if the required information is on addiction or cause of fire or theft or murder etc. If a fire has broken out at a certain place, the persons living in the neighbourhood and witnesses are likely to give information on the cause of the fire. In some cases, police interrogate third parties who are supposed to have knowledge of a theft or a murder and get some clues. Enquiry committees appointed by governments generally adopt this method and get people's views and all possible details of facts relating to the enquiry. This method is suitable whenever direct sources do not exist or cannot be relied upon or would be unwilling to part with the information.
The validity of the results depends upon a few factors, such as the nature of the person whose evidence is being recorded, the ability of the interviewer to draw out information from the third parties by means of appropriate questions and cross examinations, and the number of persons interviewed. For the success of this method one person or one group alone should not be relied upon.

3. Information from correspondents:
The investigator appoints local agents or correspondents in different places and compiles the information sent by them. Information to newspapers and some departments of Government comes by this method. The advantage of this method is that it is cheap and appropriate for extensive investigations. But it may not ensure accurate results because the correspondents are likely to be negligent, prejudiced and biased. This method is adopted in those cases where information is to be collected periodically from a wide area for a long time.

4. Mailed questionnaire method:
Under this method a list of questions is prepared and is sent to all the informants by post. The list of questions is technically called a questionnaire. A covering letter accompanying the questionnaire explains the purpose of the investigation and the importance of correct information and requests the informants to fill in the blank spaces provided and to return the form within a specified time. This method is appropriate in those cases where the informants are literates and are spread over a wide area.
Merits:
1. It is relatively cheap.
2. It is preferable when the informants are spread over a wide area.
Limitations:
1. The greatest limitation is that the informants should be literates who are able to understand and reply to the questions.
2. It is possible that some of the persons who receive the questionnaires do not return them.
3. It is difficult to verify the correctness of the information furnished by the respondents.
With the view of minimizing non-respondents and collecting correct information, the questionnaire should be carefully drafted.
There is no hard and fast rule. But the following general principles may be helpful in framing the questionnaire. A covering letter and a self-addressed and stamped envelope should accompany the questionnaire. The covering letter should politely point out the purpose of the survey and the privilege of the respondent who is one among the few associated with the investigation. It should assure that the information would be kept confidential and would never be misused. It may promise a copy of the findings or free gifts or concessions etc.
Characteristics of a good questionnaire:
1. Number of questions should be minimum.
2. Questions should be in logical order, moving from easy to more difficult questions.
3. Questions should be short and simple. Technical terms and vague expressions capable of different interpretations should be avoided.
4. Questions fetching YES or NO answers are preferable. There may be some multiple choice questions; questions requiring lengthy answers are to be avoided.
5. Personal questions and questions which require memory power and calculations should also be avoided.
6. Questions should enable cross check. Deliberate or unconscious mistakes can be detected to an extent.
7. Questions should be carefully framed so as to cover the entire scope of the survey.
8. The wording of the questions should be proper without hurting the feelings or arousing resentment.
9. As far as possible confidential information should not be sought.
10. Physical appearance should be attractive; sufficient space should be provided for answering each question.

5. Schedules sent through Enumerators:
Under this method enumerators or interviewers take the schedules, meet the informants and fill in their replies. Often a distinction is made between the schedule and a questionnaire. A schedule is filled by the interviewers in a face-to-face situation with the informant. A questionnaire is filled by the informant, who receives and returns it by post. The schedule method is suitable for extensive surveys.
Merits:
1. It can be adopted even if the informants are illiterates.
2. Answers for questions of personal and pecuniary nature can be collected.
3. Non-response is minimum as enumerators go personally and contact the informants.
4. The information collected is reliable. The enumerators can be properly trained for the same.
5. It is the most popular method.
Limitations:
1. It is the costliest method.
2. Extensive training is to be given to the enumerators for collecting correct and uniform information.
3. Interviewing requires experience. Unskilled investigators are likely to fail in their work.
Before the actual survey, a pilot survey is conducted. The questionnaire/schedule is pre-tested in a pilot survey. A few among the people from whom actual information is needed are asked to reply. If they misunderstand a question or find it difficult to answer or do not like its wording etc., it is to be altered. Further, it is to be ensured that every question fetches the desired answer.
Merits and Demerits of primary data:
1. The collection of data by the method of personal survey is possible only if the area covered by the investigator is small. Collection of data by sending enumerators is bound to be expensive. Care should be taken that the enumerators record correctly the information provided by the informants.
2. Collection of primary data by framing schedules or distributing and collecting questionnaires by post is less expensive and can be completed in a shorter time.
3. Suppose the questions are embarrassing or of complicated nature or the questions probe into personal affairs of individuals, then the schedules may not be filled with accurate and correct information and hence this method is unsuitable.
4. The information collected as primary data is more reliable than that collected from secondary data.

3.3.2 Secondary Data:
Secondary data are those data which have been already collected and analysed by some earlier agency for its own use, and later the same data are used by a different agency. According to W.A. Neiswanger, "A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication, reporting the data which have been gathered by other authorities and for which others are responsible."
Sources of Secondary data:
In most of the studies the investigator finds it impracticable to collect first-hand information on all related issues and as such he makes use of the data collected by others. There is a vast amount of published information from which statistical studies may be made and fresh statistics are constantly in a state of production. The sources of secondary data can broadly be classified under two heads:
1. Published sources, and
2. Unpublished sources.

1. Published Sources:
The various sources of published data are:
1. Reports and official publications of
(i) International bodies such as the International Monetary Fund, International Finance Corporation and United Nations Organisation.
(ii) Central and State Governments such as the Report of the Tandon Committee and Pay Commission.
2. Semi-official publications of various local bodies such as Municipal Corporations and District Boards.
3. Private publications such as the publications of
(i) Trade and professional bodies such as the Federation of Indian Chambers of Commerce and Institute of Chartered Accountants.
(ii) Financial and economic journals such as 'Commerce', 'Capital' and 'Indian Finance'.
(iii) Annual reports of joint stock companies.
(iv) Publications brought out by research agencies, research scholars, etc.
It should be noted that the publications mentioned above vary with regard to the periodicity of publication. Some are published at regular intervals (yearly, monthly, weekly etc.) whereas others are ad hoc publications, i.e., with no regularity about periodicity of publication.
Note: A lot of secondary data is available on the internet. We can access it at any time for further studies.

2. Unpublished Sources:
All statistical material is not always published. There are various sources of unpublished data such as records maintained by various Government and private offices, studies made by research institutions, scholars, etc. Such sources can also be used where necessary.
Precautions in the use of Secondary data:
The following are some of the points that are to be considered in the use of secondary data:
1. How the data has been collected and processed
2. The accuracy of the data
3. How far the data has been summarized
4. How comparable the data is with other tabulations
5. How to interpret the data, especially when figures collected for one purpose are used for another
Generally speaking, with secondary data, people have to compromise between what they want and what they are able to find.
Merits and Demerits of Secondary Data:
1. Secondary data is cheap to obtain. Many government publications are relatively cheap and libraries stock quantities of secondary data produced by the government, by companies and other organisations.
2. Large quantities of secondary data can be got through the internet.
3. Much of the secondary data available has been collected for many years and therefore it can be used to plot trends.
4. Secondary data is of value to:
- The government: help in making decisions and planning future policy.
- Business and industry: in areas such as marketing and sales, in order to appreciate the general economic and social conditions and to provide information on competitors.
- Research organisations: by providing social, economical and industrial information.

3.4 Classification:
The collected data, also known as raw data or ungrouped data, are always in an unorganised form and need to be organised and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. It is, therefore, essential for an investigator to condense a mass of data into a more comprehensible and assimilable form. The process of grouping into different classes or subclasses according to some characteristics is known as classification; tabulation is concerned with the systematic arrangement and presentation of classified data. Thus classification is the first step in tabulation.
For example, letters in the post office are classified according to their destinations, viz. Delhi, Madurai, Bangalore, Mumbai etc.
Objects of Classification:
The following are the main objectives of classifying the data:
1. It condenses the mass of data in an easily assimilable form.
2. It eliminates unnecessary details.
3. It facilitates comparison and highlights the significant aspect of data.
4. It enables one to get a mental picture of the information and helps in drawing inferences.
5. It helps in the statistical treatment of the information collected.
Types of classification:
Statistical data are classified in respect of their characteristics. Broadly there are four basic types of classification, namely
a) Chronological classification
b) Geographical classification
c) Qualitative classification
d) Quantitative classification

a) Chronological classification:
In chronological classification the collected data are arranged according to the order of time expressed in years, months, weeks, etc. The data is generally classified in ascending order of time. For example, the data related to population, sales of a firm, imports and exports of a country are always subjected to chronological classification.
Example 5:
The estimates of birth rates in India during 1970-76 are

Year         1970   1971   1972   1973   1974   1975   1976
Birth Rate   36.8   36.9   36.6   34.6   34.5   35.2   34.2

b) Geographical classification:
In this type of classification the data are classified according to geographical region or place. For instance, the production of paddy in different states in India, production of wheat in different countries etc.
Example 6:

Country                       America   China   Denmark   France   India
Yield of wheat (kg/acre)         1925     893       225      439     862

c) Qualitative classification:
In this type of classification data are classified on the basis of some attributes or quality like sex, literacy, religion, employment etc. Such attributes cannot be measured along a scale.
For example, if the population is to be classified in respect of one attribute, say sex, then we can classify them into two classes, namely that of males and females. Similarly, they can also be classified into 'employed' or 'unemployed' on the basis of another attribute, 'employment'.
Thus when the classification is done with respect to one attribute, which is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing the attribute.
This type of classification is called simple or dichotomous classification.
A simple classification may be shown as under:

Population
    Male
    Female

The classification where two or more attributes are considered and several classes are formed is called a manifold classification. For example, if we classify population simultaneously with respect to two attributes, e.g. sex and employment, then the population is first classified with respect to 'sex' into 'males' and 'females'. Each of these classes may then be further classified into 'employed' and 'unemployed' on the basis of the attribute 'employment', and as such the population is classified into four classes, namely
(i) Male employed
(ii) Male unemployed
(iii) Female employed
(iv) Female unemployed
Still the classification may be further extended by considering other attributes like marital status etc. This can be explained by the following chart:

Population
    Male
        Employed
        Unemployed
    Female
        Employed
        Unemployed

d) Quantitative classification:
Quantitative classification refers to the classification of data according to some characteristics that can be measured, such as height, weight, etc. For example, the students of a college may be classified according to weight as given below.

Weight (in lbs)    No. of Students
 90-100                  50
100-110                 200
110-120                 260
120-130                 360
130-140                  90
140-150                  40
Total                  1000

In this type of classification there are two elements, namely (i) the variable, i.e. the weight in the above example, and (ii) the frequency, i.e. the number of students in each class. There are 50 students having weights ranging from 90 to 100 lb, 200 students having weights ranging between 100 and 110 lb, and so on.

3.5 Tabulation:
Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise not obvious. Classification and tabulation, as a matter of fact, are not two distinct processes. Actually they go together. Before tabulation, data are classified and then displayed under different columns and rows of a table.
Advantages of Tabulation:
Statistical data arranged in a tabular form serve the following objectives:
1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages, dispersion, correlation etc.
4. It presents facts in minimum possible space and unnecessary repetitions and explanations are avoided. Moreover, the needed information can be easily located.
5. Tabulated data are good for reference and they make it easier to present the information in the form of graphs and diagrams.
Preparing a Table:
The making of a compact table is itself an art. The table should contain all the information needed within the smallest possible space. What the purpose of tabulation is and how the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data
Table Number:
A table should be numbered for easy reference and identification. This number, if possible, should be written in the centre at the top of the table. Sometimes it is also written just before the title of the table.
Title:
A good table should have a clearly worded, brief but unambiguous title explaining the nature of data contained in the table. It should also state the arrangement of data and the period covered. The title should be placed centrally on the top of a table just below the table number (or just after the table number in the same line).
Captions or column Headings:
Captions in a table stand for brief and self-explanatory headings of vertical columns. Captions may involve headings and sub-headings as well. The unit of data contained should also be given for each column. Usually, a relatively less important and shorter classification should be tabulated in the columns.
Stubs or Row Designations:
Stubs stand for brief and self-explanatory headings of horizontal rows. Normally, a relatively more important classification is given in rows. Also a variable with a large number of classes is usually represented in rows. For example, rows may stand for score classes and columns for data related to sex of students. In the process, there will be many rows for score classes but only two columns for male and female students.
A model structure of a table is given below:

Table Number                Title of the Table

Stub Heading          Caption Headings              Total
                      Caption Sub-Headings
Stub Sub-Headings     Body
Total

Foot notes:
Source note:

Body:
The body of the table contains the numerical information of frequency of observations in the different cells. This arrangement of data is according to the description of captions and stubs.
Footnotes:
Footnotes are given at the foot of the table for explanation of any fact or information included in the table which needs some explanation. Thus, they are meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs.
Sources of data:
Lastly one should also mention the source of information from which data are taken. This may preferably include the name of the author, volume, page and the year of publication.
This should also state whether the data contained in the table is of primary or secondary nature.

Requirements of a Good Table:
A good statistical table is not merely a careless grouping of columns and rows but should be such that it summarizes the total information in an easily accessible form in the minimum possible space. Thus, while preparing a table, one must have a clear idea of the information to be presented, the facts to be compared and the points to be stressed.
Though there is no hard and fast rule for forming a table, a few general points should be kept in mind:
1. A table should be formed in keeping with the objects of the statistical enquiry.
2. A table should be carefully prepared so that it is easily understandable.
3. A table should be formed so as to suit the size of the paper. But such an adjustment should not be at the cost of legibility.
4. If the figures in the table are large, they should be suitably rounded or approximated. The method of approximation and the units of measurement too should be specified.
5. Rows and columns in a table should be numbered, and certain figures to be stressed may be put in a box or circle or in bold letters.
6. The arrangement of rows and columns should be in a logical and systematic order. This arrangement may be alphabetical, chronological or according to size.
7. The rows and columns are separated by single, double or thick lines to represent the various classes and sub-classes used. The corresponding proportions or percentages should be given in adjoining rows and columns to enable comparison. A vertical expansion of the table is generally more convenient than a horizontal one.
8. The averages or totals of different rows should be given at the right of the table and those of columns at the bottom of the table. Totals for every sub-class too should be mentioned.
9. In case it is not possible to accommodate all the information in a single table, it is better to have two or more related tables.

Types of Tables:
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
1. Simple or one-way table
2. Two-way table
3. 
Manifold table

Simple or one-way Table:
A simple or one-way table is the simplest table, which contains data on one characteristic only. A simple table is easy to construct and simple to follow. For example, the blank table given below may be used to show the number of adults in different occupations in a locality.

The number of adults in different occupations in a locality
Occupations          No. of Adults

Total

Two-way Table:
A table which contains data on two characteristics is called a two-way table. In such a case, therefore, either the stub or the caption is divided into two co-ordinate parts. In the given table, as an example, the caption may be further divided in respect of sex. This subdivision is shown in the two-way table, which now contains two characteristics, namely occupation and sex.

The number of adults in a locality in respect of occupation and sex
Occupation           No. of Adults
                     Male      Female      Total

Total

Manifold Table:
Thus, more and more complex tables can be formed by including other characteristics. For example, we may further classify the caption sub-headings in the above table in respect of marital status, religion, socio-economic status etc. A table which has more than two characteristics of data is considered a manifold table. For instance, the table shown below shows three characteristics, namely occupation, sex and marital status.

Occupation           No. of Adults
                     Male                Female              Total
                     M    U    Total     M    U    Total

Total
Foot note: M stands for married and U stands for unmarried.

Manifold tables, though complex, are good in practice as these enable full information to be incorporated and facilitate analysis of all related facts. Still, as a normal practice, not more than four characteristics should be represented in one table to avoid confusion. Other related tables may be formed to show the remaining characteristics.

Exercise - 3
I. Choose the best answer:
1. When the collected data is grouped with reference to time, we have
a) Quantitative classification b) Qualitative classification c) Geographical classification d) Chronological classification
2. 
Most quantitative classifications are
a) Chronological b) Geographical c) Frequency distribution d) None of these
3. Caption stands for
a) A numerical information b) The column headings c) The row headings d) The table headings
4. A simple table contains data on
a) Two characteristics b) Several characteristics c) One characteristic d) Three characteristics
5. The headings of the rows given in the first column of a table are called
a) Stubs b) Captions c) Titles d) Reference notes

II. Fill in the blanks:
6. Geographical classification means classification of data according to _______.
7. The data recorded according to standard of education like illiterate, primary, secondary, graduate, technical etc. will be known as _______ classification.
8. An arrangement of data into rows and columns is known as _______.
9. Tabulation follows _______.
10. In a manifold table we have data on _______.

III. Answer the following questions:
11. Define three types of data.
12. Define primary and secondary data.
13. What are the points that are to be considered in the use of secondary data?
14. What are the sources of secondary data?
15. Give the merits and demerits of primary data.
16. State the characteristics of a good questionnaire.
17. Define classification.
18. What are the main objects of classification?
19. Write a detailed note on the types of classification.
20. Define tabulation.
21. Give the advantages of tabulation.
22. What are the main parts of an ideal table? Explain.
23. What are the essential characteristics of a good table?
24. Define one-way and two-way table.
25. Explain manifold table with an example.

IV. Suggested Activities:
26. Collect primary data about the mode of transport of your school students. Classify the data and tabulate it.
27. Collect the important and relevant tables from various sources and include these in your album note book.

Answers:
1. (d) 2. (c) 3. (b) 4. (c) 5. (a)
6. Place
7. Qualitative
8. Tabulation
9. Classification
10. 
More than two characteristics

4. FREQUENCY DISTRIBUTION

4.1 Introduction:
A frequency distribution is a series in which a number of observations with similar or closely related values are put in separate bunches or groups, each group being in order of magnitude in a series. It is simply a table in which the data are grouped into classes and the number of cases which fall in each class is recorded. It shows the frequency of occurrence of different values of a single phenomenon.
A frequency distribution is constructed for three main reasons:
1. To facilitate the analysis of data.
2. To estimate frequencies of the unknown population distribution from the distribution of sample data, and
3. To facilitate the computation of various statistical measures.

4.2 Raw data:
The statistical data collected are generally raw data or ungrouped data. Let us consider the daily wages (in Rs) of 30 labourers in a factory.
80 70 55 50 60 65 40 30 80 90
75 45 35 65 70 80 82 55 65 80
60 55 38 65 75 85 90 65 45 75
The above figures are nothing but raw or ungrouped data, and they are recorded as they occur without any prior consideration. This representation of data does not furnish any useful information and is rather confusing to the mind. A better way is to express the figures in ascending or descending order of magnitude, which is commonly known as an array. 
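Forming an array amounts to sorting the raw figures. A minimal Python sketch, using the 30 daily wages listed above (the variable names are illustrative and not part of the text):

```python
# Daily wages (in Rs) of the 30 labourers, in the order they were recorded.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

# An array is the same set of figures arranged in order of magnitude.
ascending = sorted(wages)
descending = sorted(wages, reverse=True)

# The extreme values can be read off the two ends of the array.
smallest, largest = ascending[0], ascending[-1]

print(ascending)
print("smallest =", smallest, "largest =", largest)
```

Sorting rearranges the figures but keeps all 30 of them, so the array is no shorter than the raw data.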
But this does not reduce the bulk of the data. The above data when formed into an array is in the following form:
30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 70 70 75
75 75 80 80 80 80 82 85 90 90
The array helps us to see at once the maximum and minimum values. It also gives a rough idea of the distribution of the items over the range. When we have a large number of items, the formation of an array is very difficult, tedious and cumbersome. The condensation should be directed towards better understanding and may be done in two ways, depending on the nature of the data.

a) Discrete (or) Ungrouped frequency distribution:
In this form of distribution, the frequency refers to a discrete value. Here the data are presented in a way that exact measurements of units are clearly indicated. There are definite differences between the variables of different groups of items. Each class is distinct and separate from the other class. There is no continuity from one class to another. Examples of such data are the number of rooms in a house, the number of companies registered in a country, the number of children in a family, etc.
The process of preparing this type of distribution is very simple. We have just to count the number of times a particular value is repeated, which is called the frequency of that class. In order to facilitate counting, prepare a column of tallies. In another column, place all possible values of the variable from the lowest to the highest. Then put a bar (vertical line) opposite the particular value to which it relates. To facilitate counting, blocks of five bars are prepared and some space is left in between each block. We finally count the number of bars and get the frequency.

Example 1:
In a survey of 40 families in a village, the number of children per family was recorded and the following data obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Represent the data in the form of a discrete frequency distribution.
Solution:
Frequency distribution of the number of children
Number of children    Tally marks       Frequency
0                     |||               3
1                     ||||| ||          7
2                     ||||| |||||       10
3                     ||||| |||         8
4                     ||||| |           6
5                     ||||              4
6                     ||                2
Total                                   40

b) Continuous frequency distribution:
In this form of distribution, the frequency refers to groups of 
values. This becomes necessary in the case of some variables which can take any fractional value and in which case an exact measurement is not possible. Hence a discrete variable can be presented in the form of a continuous frequency distribution.

Wage distribution of 100 employees
Weekly wages (Rs)     Number of employees
50-100                4
100-150               12
150-200               22
200-250               33
250-300               16
300-350               8
350-400               5
Total                 100

4.3 Nature of class:
The following are some basic technical terms used when a continuous frequency distribution is formed or data are classified according to class intervals.

a) Class limits:
The class limits are the lowest and the highest values that can be included in the class. For example, take the class 30-40. The lowest value of the class is 30 and the highest value is 40. The two boundaries of a class are known as the lower limit and the upper limit of the class. The lower limit of a class is the value below which there can be no item in the class. The upper limit of a class is the value above which there can be no item in that class. Of the class 60-79, 60 is the lower limit and 79 is the upper limit, i.e. in this case there can be no value which is less than 60 or more than 79. The way in which class limits are stated depends upon the nature of the data. In statistical calculations, the lower class limit is denoted by L and the upper class limit by U.

b) Class Interval:
The class interval may be defined as the size of each grouping of data. For example, 50-75, 75-100, 100-125 are class intervals. Each grouping begins with the lower limit of a class interval and ends at the lower limit of the next succeeding class interval.

c) Width or size of the class interval:
The difference between the lower and upper class limits is called the width or size of the class interval and is denoted by C.

d) Range:
The difference between the largest and smallest value of the observations is called the range and is denoted by R, i.e.
R = Largest value - Smallest value
R = L - S

e) Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is found by adding the upper and lower limits of a class and dividing the sum by 2.
(i.e.) 
Mid-value = (L + U) / 2
For example, if the class interval is 20-30 then the mid-value is (20 + 30) / 2 = 25.

f) Frequency:
The number of observations falling within a particular class interval is called the frequency of that class.
Let us consider the frequency distribution of weights of persons working in a company.
Weight (in kgs)       Number of persons
30-40                 25
40-50                 53
50-60                 77
60-70                 95
70-80                 80
80-90                 60
90-100                30
Total                 420
In the above example, the class frequencies are 25, 53, 77, 95, 80, 60, 30. The total frequency is equal to 420. The total frequency indicates the total number of observations considered in a frequency distribution.

g) Number of class intervals:
The number of class intervals in a frequency distribution is a matter of importance. The number of class intervals should not be too many. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. To decide the number of class intervals for the frequency distribution in the whole data, we choose the lowest and the highest of the values. The difference between them will enable us to decide the class intervals.
Thus the number of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help of Sturges' Rule. According to him, the number of classes can be determined by the formula
K = 1 + 3.322 log10 N
where N = total number of observations
log = logarithm of the number
K = number of class intervals.
Thus if the number of observations is 10, then the number of class intervals is
K = 1 + 3.322 log 10 = 4.322, i.e. approximately 4.
If 100 observations are being studied, the number of class intervals is
K = 1 + 3.
322 log 100 = 7.644, i.e. approximately 8, and so on.

h) Size of the class interval:
Since the size of the class interval is inversely proportional to the number of class intervals in a given distribution, the approximate value of the size (or width or magnitude) of the class interval C is obtained by using Sturges' rule as
Size of class interval = C = Range / Number of class intervals = Range / (1 + 3.322 log10 N)
where Range = Largest value - Smallest value in the distribution.

4.4 Types of class intervals:
There are three methods of classifying the data according to class intervals, namely
a) Exclusive method
b) Inclusive method
c) Open-end classes

a) Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is known as the exclusive method of classification. The following data are classified on this basis.
Expenditure (Rs.)     No. of families
0-5000                60
5000-10000            95
10000-15000           122
15000-20000           83
20000-25000           40
Total                 400
It is clear that the exclusive method ensures continuity of data in as much as the upper limit of one class is the lower limit of the next class. In the above example, there are 60 families whose expenditure is between Rs. 0 and Rs. 4999.99. A family whose expenditure is Rs. 5000 would be included in the class interval 5000-10000. This method is widely used in practice.

b) Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper limits are included in the class interval. This type of classification may be used for a grouped frequency distribution of a discrete variable like members in a family, number of workers in a factory etc., where the variable may take only integral values. It cannot be used with fractional values like age, height, weight etc.
This method may be illustrated as follows:
Class interval        Frequency
5-9                   7
10-14                 12
15-19                 15
20-29                 21
30-34                 10
35-39                 5
Total                 70
Thus to decide whether to use the inclusive method or the exclusive method, it is important to determine whether the variable under observation is a continuous or a discrete one. In case of continuous variables, the exclusive method must be used. The inclusive method should be used in case of a discrete variable.

c) Open-end 
classes:
A class limit is missing either at the lower end of the first class interval or at the upper end of the last class interval, or both are not specified. The necessity of open-end classes arises in a number of practical situations, particularly relating to economic and medical data, when there are a few very high values or a few very low values which are far apart from the majority of observations.
An example of open-end classes is as follows:
Salary Range          No. of workers
Below 2000            7
2000-4000             5
4000-6000             6
6000-8000             4
8000 and above        3

4.5 Construction of frequency table:
Constructing a frequency distribution depends on the nature of the given data. Hence, the following general considerations may be borne in mind for ensuring meaningful classification of data.
1. The number of classes should preferably be between 5 and 20. However, there is no rigidity about it.
2. As far as possible one should avoid values of class intervals such as 3, 7, 11, 26 etc. Preferably one should have class intervals of either five or multiples of 5 like 10, 20, 25, 100 etc.
3. The starting point, i.e. the lower limit of the first class, should either be zero or 5 or a multiple of 5.
4. To ensure continuity and to get correct class intervals we should adopt the exclusive method.
5. Wherever possible, it is desirable to use class intervals of equal sizes.

4.6 Preparation of frequency table:
The presentation of data in the form of a frequency distribution describes the basic pattern which the data assume in the mass. A frequency distribution gives a better picture of the pattern of data if the number of items is large. If the identity of the individuals about whom a particular information is taken is not relevant, then the first step of condensation is to divide the observed range of the variable into a suitable number of class intervals and to record the number of observations in each class. 
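The steps described so far (Sturges' rule for the number of classes, the size of the class interval, and counting by the exclusive method) can be put together in a short Python sketch. It reuses the 30 daily wages of section 4.2. The function names, the rounding of the class size to the nearest multiple of 5 and the choice of the smallest value as the starting point are assumptions made for illustration only.

```python
import math

def sturges_classes(n):
    """Number of class intervals by Sturges' rule: K = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def exclusive_frequency_table(data, start, size):
    """Count observations in classes start-(start+size), ... (exclusive method).

    An observation equal to an upper limit is counted in the next class.
    """
    table = {}
    lower = start
    while lower <= max(data):
        table[(lower, lower + size)] = 0
        lower += size
    for x in data:
        class_lower = start + ((x - start) // size) * size
        table[(class_lower, class_lower + size)] += 1
    return table

# Daily wages (in Rs) of 30 labourers, from section 4.2.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

k = sturges_classes(len(wages))            # suggested number of classes
data_range = max(wages) - min(wages)       # R = largest - smallest
raw_size = data_range / k                  # C = Range / K
size = 5 * max(1, round(raw_size / 5))     # assumed: nearest multiple of 5

table = exclusive_frequency_table(wages, start=min(wages), size=size)
for (low, high), freq in table.items():
    print(f"{low}-{high}: {freq}")
print("Total:", sum(table.values()))
```

For these data the sketch arrives at a class size of 10 and the classes 30-40 to 90-100, with frequencies 3, 3, 4, 7, 5, 6 and 2, which add up to 30.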
Let us consider the weights in kg of 50 college students.
42 62 46 54 41 37 54 44 32 45
47 50 58 49 51 42 46 37 42 39
54 39 51 58 47 64 43 48 49 48
49 61 41 40 58 49 59 57 57 34
56 38 45 52 46 40 63 41 51 41
Here the size of the class interval as per Sturges' rule is obtained as follows:
Size of class interval = C = Range / (1 + 3.322 log N) = (64 - 32) / (1 + 3.322 log 50) = 32 / 6.64, i.e. approximately 5
Thus the number of class intervals is 7 and the size of each class is 5. The required frequency distribution is prepared using tally marks as given below:
Class Interval        Tally marks             Frequency
30-35                 ||                      2
35-40                 ||||| |                 6
40-45                 ||||| ||||| ||          12
45-50                 ||||| ||||| ||||        14
50-55                 ||||| |                 6
55-60                 ||||| |                 6
60-65                 ||||                    4
Total                                         50

Example 2:
Given below are the number of tools produced by workers in a factory.
43 18 25 18 39 44 19 20 20 26
40 45 38 25 13 14 27 41 42 17
34 31 32 27 33 37 25 26 32 25
33 34 35 46 29 34 31 34 35 24
28 30 41 32 29 28 30 31 30 34
31 35 36 29 26 32 36 35 36 37
32 23 22 29 33 37 33 27 24 36
23 42 29 37 29 23 44 41 45 39
21 21 42 22 28 22 15 16 17 28
22 29 35 31 27 40 23 32 40 37
Construct a frequency distribution with inclusive type of class interval. Also find:
1. How many workers produced more than 38 tools?
2. 
How many workers produced less than 23 tools?
Solution:
Using Sturges' formula for determining the number of class intervals, we have
Number of class intervals = 1 + 3.322 log10 N = 1 + 3.322 log10 100 = 7.6
Size of class interval = Range / Number of class intervals = (46 - 13) / 7.6, taken as 5
Hence, taking the magnitude of the class intervals as 5, we have 7 classes: 13-17, 18-22, ..., 43-47 are the classes by inclusive type. Using tally marks, the required frequency distribution is obtained in the following table.
Class Interval    Tally Marks                        Number of tools produced (Frequency)
13-17             ||||| |                            6
18-22             ||||| ||||| |                      11
23-27             ||||| ||||| ||||| |||              18
28-32             ||||| ||||| ||||| ||||| |||||      25
33-37             ||||| ||||| ||||| ||||| ||         22
38-42             ||||| ||||| |                      11
43-47             ||||| ||                           7
Total                                                100

4.7 Percentage frequency table:
Comparison becomes difficult and at times impossible when the total number of items is large and differs highly from one distribution to another. Under these circumstances a percentage frequency distribution facilitates easy comparability. In a percentage frequency table, we have to convert the actual frequencies into percentages. The percentages are calculated by using the formula given below:
Frequency percentage = (Actual frequency / Total frequency) x 100
It is also called a relative frequency table.
An example is given below to construct a percentage frequency table.
Marks     No. 
of students     Frequency percentage
0-10      3               6
10-20     8               16
20-30     12              24
30-40     17              34
40-50     6               12
50-60     4               8
Total     50              100

4.8 Cumulative frequency table:
A cumulative frequency distribution has a running total of the values. It is constructed by adding the frequency of the first class interval to the frequency of the second class interval. That total is again added to the frequency in the third class interval, continuing until the final total, appearing opposite the last class interval, is the total of all frequencies. The cumulative frequency may be downward or upward. A downward cumulation results in a list presenting the number of frequencies "less than" any given amount as revealed by the lower limit of the succeeding class interval, and the upward cumulation results in a list presenting the number of frequencies "more than" any given amount as revealed by the upper limit of a preceding class interval.

Example 3:
Age group (in years)    Number of women    Less than cumulative frequency    More than cumulative frequency
15-20                   3                  3                                 64
20-25                   7                  10                                61
25-30                   15                 25                                54
30-35                   21                 46                                39
35-40                   12                 58                                18
40-45                   6                  64                                6

(a) Less than cumulative frequency distribution table
End values (upper limit)    Less than cumulative frequency
Less than 20                3
Less than 25                10
Less than 30                25
Less than 35                46
Less than 40                58
Less than 45                64

(b) More than cumulative frequency distribution table
End values (lower limit)    More than cumulative frequency
15 and above                64
20 and above                61
25 and above                54
30 and above                39
35 and above                18
40 and above                6

4.8.1 Conversion of cumulative frequency to simple frequency:
If we have only the cumulative frequency, either "less than" or "more than", we can convert it into simple frequencies. For example, if we have the "less than" cumulative frequency, we can convert this to simple frequency by the method given below:
Class interval    Less than cumulative frequency    Simple frequency
15-20             3                                 3
20-25             10                                10 - 3 = 7
25-30             25                                25 - 10 = 15
30-35             46                                46 - 25 = 21
35-40             58                                58 - 46 = 12
40-45             64                                64 - 58 = 6
The method of converting the "more than" cumulative frequency to simple frequency is given below.
Class interval    More than cumulative frequency    Simple frequency
15-20             64                                64 - 61 = 3
20-25             61                                61 - 54 = 7
25-30             54                                54 - 39 = 15
30-35             39                                39 - 18 = 21
35-40             18                                
18 - 6 = 12
40-45             6                                 6 - 0 = 6

4.9 Cumulative percentage frequency table:
Instead of cumulative frequencies, if cumulative percentages are given, the distribution is called a cumulative percentage frequency distribution. We can form this table either by converting the frequencies into percentages and then cumulating them, or by converting the given cumulative frequencies into percentages.

Example 4:
Income (in Rs)    No. of families    Cumulative frequency    Cumulative percentage
2000-4000         8                  8                       5.7
4000-6000         15                 23                      16.4
6000-8000         27                 50                      35.7
8000-10000        44                 94                      67.1
10000-12000       31                 125                     89.3
12000-14000       12                 137                     97.9
14000-20000       3                  140                     100.0
Total             140

4.10 Bivariate frequency distribution:
In the previous sections, we described frequency distributions involving one variable only. Such frequency distributions are called univariate frequency distributions. In many situations a simultaneous study of two variables becomes necessary. For example, we may want to classify data relating to the weights and heights of a group of individuals, the income and expenditure of a group of individuals, or the ages of husbands and wives.
The data so classified on the basis of two variables give rise to the so-called bivariate frequency distribution, and it can be summarized in the form of a table called a bivariate (two-way) frequency table. While preparing a bivariate frequency distribution, the values of each variable are grouped into various classes (not necessarily the same for each variable). If the data corresponding to one variable, say X, is grouped into m classes and the data corresponding to the other variable, say Y, is grouped into n classes, then the bivariate table will consist of m x n cells. By going through the different pairs of values (X, Y) of the variables and using tally marks, we can find the frequency of each cell and thus obtain the bivariate frequency table. The format of a bivariate frequency table is given below:

Format of Bivariate Frequency table
                                  x-series
                                  Class-intervals
y-series                          Mid-values           Marginal frequency of Y
Class-intervals    Mid-values     f(x,y)               fy
Marginal frequency of X           fx                   Total Σfx = Σfy = N

Here f(x,y) is the frequency of the pair (x,y). The frequency distribution of the values of the variable X together with their frequency total
(fx) is called the marginal distribution of X, and the frequency distribution of the values of the variable Y together with the total frequencies is known as the marginal frequency distribution of Y. The total of the values of the marginal frequencies is called the grand total (N).

Example 5:
The data given below relate to the height and weight of 20 persons. Construct a bivariate frequency table with class intervals of height as 62-64, 64-66, ... and of weight as 115-125, 125-135, ...; write down the marginal distributions of X and Y.
S.No.   Height   Weight      S.No.   Height   Weight
1       70       170         11      70       163
2       65       135         12      67       139
3       65       136         13      63       122
4       64       137         14      68       134
5       69       148         15      67       140
6       63       121         16      69       132
7       65       117         17      65       120
8       70       128         18      68       148
9       71       143         19      67       129
10      62       129         20      67       152
Solution:
Bivariate frequency table showing height and weight of persons.
Weight (y)     Height (x)
               62-64      64-66      66-68      68-70      70-72      Total
115-125        II (2)     II (2)                                      4
125-135        I (1)                 I (1)      II (2)     I (1)      5
135-145                   III (3)    II (2)                I (1)      6
145-155                              I (1)      II (2)                3
155-165                                                    I (1)      1
165-175                                                    I (1)      1
Total          3          5          4          4          4          20
The marginal distributions of height and weight are given in the following table.
Marginal distribution of height (X)      Marginal distribution of weight (Y)
CI          Frequency                    CI           Frequency
62-64       3                            115-125      4
64-66       5                            125-135      5
66-68       4                            135-145      6
68-70       4                            145-155      3
70-72       4                            155-165      1
Total       20                           165-175      1
                                         Total        20

Exercise - 4
I. Choose the best answer:
1. In an exclusive class interval
(a) the upper class limit is exclusive. (b) the lower class limit is exclusive. (c) the lower and upper class limits are exclusive. (d) none of the above.
2. If the lower and upper limits of a class are 10 and 40 respectively, the mid-point of the class is
(a) 15.0 (b) 12.5 (c) 25.0 (d) 30.0
3. Class intervals of the type 30-39, 40-49, 50-59 represent
(a) inclusive type (b) exclusive type (c) open-end type (d) none.
4. The class interval of the continuous grouped data 10-19, 20-29, 30-39, 40-49, 50-59 is
(a) 9 (b) 10 (c) 14.5 (d) 4.5
5. Raw data means
(a) primary data (b) secondary data (c) data collected for investigation (d) well classified data.

II. Fill in the blanks:
6. H.A. Sturges' formula for finding the number of classes is ________.
7. 
If the mid-value of a class interval is 20 and the difference between two consecutive mid-values is 10, the class limits are ________ and ________.
8. The difference between the upper and lower limits of a class is called ________.
9. The average of the upper and lower limits of a class is known as ________.
10. The number of observations falling within a particular class interval is called the ________ of that class.

III. Answer the following questions:
11. What is a frequency distribution?
12. What is an array?
13. What are discrete and continuous frequency distributions?
14. Distinguish between the following, with suitable examples:
(i) Continuous and discrete frequency
(ii) Exclusive and inclusive class interval
(iii) Less than and more than frequency table
(iv) Simple and bivariate frequency table.
15. The following data give the number of children in 50 families. Construct a discrete frequency table.
4 2 0 2 3 2 2 1 0 2
3 5 1 1 4 2 1 3 4 2
6 1 2 2 2 1 3 4 1 0
1 3 4 1 0 1 2 2 2 5
2 4 3 0 1 3 6 1 0 1
16. In a survey, it was found that 64 families bought milk in the following quantities in a particular month. Construct a continuous frequency distribution making classes of 5-9, 10-14 and so on.
Quantity of milk (in litres) bought by 64 families in a month
19 16 22 9 22 12 39 19
14 23 6 24 16 18 7 17
20 25 28 18 10 24 20 21
10 7 18 28 24 20 14 23
25 34 22 5 33 23 26 29
13 36 11 26 11 37 30 13
8 15 22 21 32 21 31 17
16 23 12 9 15 27 17 21
17. 25 values of two var