Research Methods for Computational Statistics (Metodologi Penelitian Komputasi Statistik)
Lecture notes for STIS students
Setia Pramana
Educational Background
Hasselt Universiteit, Belgium, MSc in Applied Statistics 2005-2006.
Hasselt Universiteit, Belgium, MSc in Biostatistics 2006-2007.
Hasselt Universiteit, Belgium, PhD Statistical Bioinformatics, 2007-2011.
Medical Epidemiology And Biostatistics Dept. Karolinska Institutet, Sweden, Postdoctoral, 2011-2014
Biostatistics
The study of statistics as applied to biological areas such as Biological laboratory experiments, medical research (including clinical research), and public health services research.
‘Biostatistics, far from being an unrelated mathematical science, is a discipline essential to modern medicine – a pillar in its edifice’ (Journal of the American Medical Association, 1966)
Bioinformatics
Bioinformatics is a science straddling the domains of biomedicine, informatics, mathematics and statistics.
It applies computational techniques to biological data:
Functional Genomics
Proteomics
Sequence Analysis
Phylogenetics
Etc.
“Informatics” in Bioinformatics
Databases: building, querying, object DB
Text string comparison: text search
Finding patterns: AI / machine learning, clustering, data mining
etc.
Current Research
Statistical methods for high-throughput data analyses particularly in Next generation sequencing (NGS) data (Whole genome-seq, Exome-seq and RNA-seq).
RNA microarray expression studies and GWAS in cancer and cardiovascular diseases.
Classification in NGS data.
R-Graphical User Interface (R-GUI) for high-throughput data analyses.
Course Outline
Basic concepts of research
Problem identification and hypothesis
Literature Review
Research Design
Quantitative research
Writing a scientific report/paper
Survival Data Analysis
Course Workload
40% Theory, 60% practice
Group Project (5 students)
Presentation every week
Slides can be seen at : http://www.slideshare.net/hafidztio/
Research
An organized, systematic, data-based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the purpose of finding answers or solutions to it.
It provides the needed information that guides managers to make informed decisions to successfully deal with problems.
The information provided could be the result of a careful analysis of data gathered firsthand or of data that are already available (in the company, industry, archives, etc.).
Purposes of Research
Review or synthesize existing knowledge
Investigate existing situations or problems
Provide solutions to problems
Explore and analyze more general issues
Construct or create new procedures or systems
Explain new phenomena
Generate new knowledge
or a combination of any of the above!
Research Outcome
1. Product or Innovation directly used by Industry
2. Patent
3. International Publication
Types of Research, by Purpose
Basic Research
Applied Research
Evaluation Research
Research and Development
Types of Research, by Time
Cross-Sectional Research
Longitudinal Research
Types of Research, by Method
Quantitative research:
- Descriptive research
- Correlational research
- Causal-comparative research
- Experimental research
- Single-subject research
Qualitative research:
- Narrative research
Deductive Reasoning
Starts out with a general statement, or hypothesis, and examines the possibilities to reach a specific, logical conclusion.
The scientific method uses deduction to test hypotheses and theories.
Ex: "All men are mortal. Harold is a man. Therefore, Harold is mortal."
[Diagram: Theory → Hypothesis → Observation → Confirmation]
Inductive Reasoning
The opposite of deductive reasoning.
Makes broad generalizations from specific observations.
Ex: "Harold is a grandfather. Harold is bald. Therefore, all grandfathers are bald."
[Diagram labels: Theory, Tentative Hypothesis, Pattern, Confirmation]
Deductive/Inductive Research
Basic Steps
1. Develop a research question
2. Conduct thorough literature review
3. Refine the research question/hypothesis
4. Design research methodology/study
5. Create research proposal
6. Apply for funding
7. Apply for ethics approval
8. Collect and analyze data / develop and test software
9. Draw conclusions and relate findings
Research Question Development
Problem Identification
Limit the research scope
Research Question Identification
Goals Identification
Hypothesis
Statistical Hypothesis
Hypothetical Statement
Building block of Science
Possible Source of RQs
Observational Research
Discussions, brainstorming
Experts, academics and industry
Bibliographies, journals, research reports, popular science magazines, etc.
A Research Question Should
Have research value: original, can be tested/evaluated.
Be feasible: can be answered, data are available, and it can be solved within the available cost and time.
Match the researcher's qualifications.
FINER criteria for RQ
F Feasible
- Adequate number of subjects
- Adequate technical expertise
- Affordable in time and money
- Manageable in scope
I Interesting
- Getting the answer intrigues investigator, peers and community
N Novel
- Confirms, refutes or extends previous findings
E Ethical
- Amenable to a study that institutional review board will approve
R Relevant
- To scientific knowledge
- To clinical and health policy
- To future research
Hulley S, Cummings S, Browner W, et al. Designing clinical research. 3rd ed. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.
Research Hypothesis
Hypothesis Definition
The primary research question should be driven by the hypothesis rather than the data.
The research question and hypothesis should be developed before the start of the study.
A good hypothesis must be based on a good research question at the start of a study and drive data collection for the study.
Hypothesis
Is a clear statement of what is intended to be investigated.
It should be specified before research is conducted and openly stated in reporting the results.
It allows one to identify:
- the research objectives
- the key abstract concepts involved in the research
- its relationship to both the problem statement and the literature review
Source of Hypothesis
Environment
Literature
Other Empirical Data
Personal Experience
Type of Hypothesis
Null Hypothesis
Alternative Hypothesis
Example (null hypothesis)
There is no significant gain between the pre-test and post-test scores of students exposed to Computer-Aided Instruction in Analytic Geometry.
Special Consideration for Null Hypothesis
Hypothesis Testing:
1-sided or 2-sided hypotheses?
A 2-sided hypothesis states that there is a difference between the experimental group and the control group, but it does not specify in advance the expected direction of the difference.
A 1-sided hypothesis states a specific direction (e.g., there is an improvement in outcomes with computer-assisted surgery).
A 2-sided hypothesis should be used unless there is a good justification for using a 1-sided hypothesis.
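To make the difference concrete, the sketch below runs an exact permutation test on made-up scores for a control group and an experimental group and reports both a 2-sided and a 1-sided p-value. The data, the function name `permutation_p_values`, and the choice of the difference in means as test statistic are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations
from statistics import mean

def permutation_p_values(control, treated):
    """Exact permutation test on the difference in means (treated - control).

    Returns (p_two_sided, p_one_sided). The 1-sided alternative is
    "the treated mean is larger than the control mean".
    """
    pooled = control + treated
    indices = range(len(pooled))
    observed = mean(treated) - mean(control)
    tol = 1e-12  # guards against floating-point ties

    total = 0
    as_large = 0    # relabelings with diff >= observed (1-sided)
    as_extreme = 0  # relabelings with |diff| >= |observed| (2-sided)
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_t = [pooled[i] for i in chosen]
        group_c = [pooled[i] for i in indices if i not in chosen_set]
        diff = mean(group_t) - mean(group_c)
        total += 1
        if diff >= observed - tol:
            as_large += 1
        if abs(diff) >= abs(observed) - tol:
            as_extreme += 1
    return as_extreme / total, as_large / total

# Hypothetical outcome scores (illustrative only).
control = [12, 15, 11, 14, 13]
treated = [16, 18, 14, 17, 19]
p_two, p_one = permutation_p_values(control, treated)
print(f"2-sided p = {p_two:.4f}, 1-sided p = {p_one:.4f}")
```

The 1-sided p-value counts only relabelings in the stated direction, so whenever the observed difference lies in that direction it is no larger than the 2-sided one. That is why the direction must be justified and fixed before looking at the data.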
Error Type
Research objective
The primary objective should be coupled with the hypothesis of the study.
Study objectives define the specific aims of the study and should be clearly stated in the introduction of the research protocol.
Example:
Hypothesis: there is no difference in functional outcomes between computer-assisted acetabular component placement and free-hand placement.
The primary objective can be stated as follows: this study will compare the functional outcomes of computer-assisted acetabular component insertion versus free-hand placement in patients undergoing total hip arthroplasty.
The study objective is an active statement about how the study is going to answer the specific research question.
Objectives state exactly which outcome measures are going to be used within their statements.
They not only guide the development of the protocol and the design of the study but also play a role in sample size calculations and in determining the power of the study.
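As a rough illustration of how a stated objective feeds a sample size calculation, the sketch below uses the common normal-approximation formula for comparing two group means, n per group = 2 (z_alpha + z_beta)^2 sigma^2 / delta^2. The formula choice, the function name `sample_size_per_group`, and the numbers (a difference of 5 points with a standard deviation of 10) are assumptions for illustration; a real study would take these from the literature or a pilot.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size per group for comparing two means.

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation of the outcome
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up to a whole subject

# Hypothetical planning values (illustrative only).
n_two = sample_size_per_group(delta=5.0, sigma=10.0)
n_one = sample_size_per_group(delta=5.0, sigma=10.0, two_sided=False)
print(n_two, n_one)
```

The 1-sided version needs fewer subjects for the same power, which is one reason a 1-sided hypothesis should be justified rather than chosen for convenience.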
Literature Review
Is an evaluative report of studies found in the literature related to your selected area.
Should describe, summarize, evaluate and clarify this literature.
Should give a theoretical basis for the research and help you determine the nature of your own research.
Select a limited number of works that are central to your area rather than trying to collect a large number of works that are not as closely connected to your topic area.
Boote, D.N. & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher 34/6, 3-15.
Literature Review Purpose
Provide a context for the research
Justify the research
Ensure the research hasn't been done before (or that it is not just a "replication study")
Show where the research fits into the existing body of knowledge
Enable the researcher to learn from previous theory on the subject
Illustrate how the subject has been studied previously
Highlight flaws in previous research
Outline gaps in previous research
Show that the work is adding to the understanding and knowledge of the field
Help refine, refocus or even change the topic
Strategies
Kirby, S., Greaves, L. & Reid, C. (2006). Searching the Literature. In Experience research social change: Methods beyond the mainstream
Literature Review in a thesis
The cycle
Hasibuan, 2007, Metode Penelitian Komputasi
What you should do
Compare
Contrast
Criticize
Synthesize
Summarize
Hasibuan, 2007, Metode Penelitian Komputasi
Sources
Articles in International Journal
Thesis
Dissertation
Proceedings
Magazines
Abstract book
Websites
Literature Citation
Whenever you quote, summarize, paraphrase or refer to the work of another person you need to cite it.
Citing gives credit to the original author for any information that you learn through your research process and share with readers.
Citing is the way to give credit to others' work when you use it in your papers, speeches and projects.
Citing others' work is a very important step in the academic writing process and the best way to avoid plagiarism.
Two ways:
- Use a sentence that introduces the author
- Add the author’s name at the end of the sentence
We must provide the last name and year of publication.
Paraphrase with a signal phrase:
“According to Smith (2004), the cost of treating alcoholism is increasing dramatically.”
Direct quote:
“the cost of treating alcoholism is exceeded only by the cost of treating illness from tobacco use, and is increasing exponentially” (Smith, 2004)
Research Design
A plan or strategy for conducting the research
Spells out the basic strategies that researchers adopt to develop evidence that is accurate and interpretable.
Deals with matters such as selecting participants for the research and preparing for data collection.
Purposes of Research Design
1. To provide answers to research questions
2. To control variance
Characteristics for good research design
1. Freedom from bias
2. Freedom from confounding
3. Control of extraneous variables
4. Statistical correctness for testing hypothesis
TYPES OF RESEARCH
1. Experimental research: involves manipulating conditions and studying effects (IPO: Input-Process-Output).
2. Correlational research: involves studying relationships among variables within a single group, and frequently suggests the possibility of cause and effect.
3. Survey research: involves describing the characteristics of a group by means of such instruments as interview schedules, questionnaires, and tests.
4. Ethnographic research: concentrates on documenting or portraying the everyday experiences of people using observation and interviews.
Involves questions of how well, how much, or how efficiently knowledge, attitudes, opinions and the like exist.
5. Case study: a detailed analysis of one or a few individuals.
6. Historical research: involves studying some aspect of the past.
7. Action research: a type of research by practitioners designed to help improve their practice.
GENERAL RESEARCH TYPES
It is useful to consider the various research methodologies we have described as falling within one or more general research categories –
Descriptive
Associational
Intervention-type Studies
1. DESCRIPTIVE STUDIES
Describe a given state of affairs as fully and carefully as possible.
Examples:
- In biology, where each variety of plant and animal species is meticulously described and information is organized into useful taxonomic categories.
- In educational research, the most common descriptive methodology is the survey, as when researchers summarize the characteristics (abilities, preferences, behaviors, and so on) of individuals or groups, or of physical environments (schools).
2. ASSOCIATIONAL RESEARCH
Research that investigates relationships is often referred to as associational research.
Correlational and causal-comparative methodologies are the principal examples of associational research.
Example: Studying relationships
(a) between achievement and attitude
(b) between childhood experiences and adult characteristics
(c) between teacher characteristics and student achievement
(d) between methods of instruction and achievement (comparing students who have been taught by each method)
(e) between gender and attitude (comparing attitudes of males and females)
Descriptive research alone is not satisfying, since most researchers want a more complete understanding of people and things; this requires further analysis, not merely description.
Associational studies, too, are ultimately unsatisfying:
- They do not permit researchers to “do something” to influence or change outcomes.
- Simply determining the interest or achievement of students does not tell us how to change or improve either interest or achievement.
3. INTERVENTION STUDIES
To find out whether one thing will have an effect on something else, researchers need to conduct some form of intervention study.
In an intervention study, a particular treatment is expected to influence one or more outcomes.
Such studies enable researchers to assess, for example:
- the effectiveness of various teaching methods,
- curriculum models,
- classroom arrangements,
- and other efforts to influence the characteristics of individuals or groups.
The experiment is the primary methodology used in intervention research.
Some types of research may combine these 3 general types.
Quantitative vs. qualitative research
Goals
- Quantitative: theory testing, establishing facts, statistical description, prediction, relationships between variables
- Qualitative: sensitizing concepts, describing multiple realities, grounded theory, developing understanding
Design
- Quantitative: structured, predetermined, formal, specific, detailed plan of operation
- Qualitative: evolving, flexible
Data
- Quantitative: quantitative, quantifiable coding, counts, measures, operationalized variables, statistics
- Qualitative: descriptive, personal documents, field notes, photographs, people’s own words, official documents
Sample
- Quantitative: large, stratified, control groups, precise, random, control of extraneous variables
- Qualitative: small, non-representative, focused, purposeful, convenient
Techniques or methods
- Quantitative: experiments, surveys, structured interviewing, structured observation
- Qualitative: observation, participant observation, review of documents, open-ended interviewing, first-person accounts
Relationship with subjects
- Quantitative: detached, short term, distant, subject-researcher restricted
- Qualitative: empathy, emphasis on trust, democratic
Data analysis
- Quantitative: deductive, statistical
- Qualitative: ongoing, models, themes, concepts, inductive, analytic, constant comparative
Problems
- Quantitative: controlling other variables, validity, reliability
- Qualitative: time consuming, data reduction difficulties, procedures not standardized, difficult to study large populations
Research Types under Quantitative & Qualitative
Quantitative
1. Experimental Research
2. Single-Subject Research
3. Correlational Research
4. Causal-Comparative Research
5. Survey Research
Qualitative
1. Ethnographic Research
2. Historical Research
IDENTIFY WHAT TYPE OF RESEARCH
A historical study of college entrance requirements over time that examines the relationship between those requirements and achievement in mathematics.
An ethnographic study that describes in detail the daily activities of an inner-city high school and also finds a relationship between media attention and teacher morale in school.
An investigation of the effects of different teaching methods on concept learning and gender.
We can classify designs into a simple threefold classification (randomized experiment, quasi-experiment, non-experiment) by asking some key questions.
This threefold classification is especially useful for describing the design with respect to internal validity.
A randomized experiment generally is the strongest of the three designs when your interest is in establishing a cause-effect relationship.
A non-experiment is generally the weakest in this respect, but only with respect to internal validity or causal assessment.
In fact, the simplest form of non-experiment is a one-shot survey design that consists of nothing but a single observation O.
This is one of the most common forms of research, particularly for descriptive questions.
What research type would be appropriate for these research problems?
1. How do parents feel about the elementary school counseling program?
2. Do students who have high score on reading tests also have high scores on writing tests?
3. What effect does the gender of a counselor have on how he or she is “received by counselees”?
4. How can Tom Adams be helped to learn to read?
ANSWER
1. ETHNOGRAPHIC STUDY
2. CORRELATIONAL STUDY
3. CAUSAL-COMPARATIVE STUDY/INTERVENTION STUDY
4. EXPERIMENT/CORRELATIONAL OR ASSOCIATIONAL-INTERVENTION STUDY
Sampling Methods
What exactly IS a “sample”?
A subset of the population, selected by either “probability” or “non-probability” methods.
If you have a “probability sample”, you simply know the likelihood of any member of the population being included (not necessarily that it is “random”).
SAMPLING
A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005)
Why sample?
- Resources (time, money) and workload
- Gives results with known accuracy that can be calculated mathematically
The sampling frame is the list from which the potential respondents are drawn
- Registrar’s office
- Class rosters
- Must assess sampling frame errors
3 factors that influence sample representativeness
- Sampling procedure
- Sample size
- Participation (response)
When might you sample the entire population?
- When your population is very small
- When you have extensive resources
- When you don’t expect a very high response
Assumptions of quantitative sampling
We want to generalize to the population.
Random events are predictable.
Therefore…We can compare random events to our results.
Probability sampling is the best approach.
SAMPLING BREAKDOWN
TARGET POPULATION
STUDY POPULATION
SAMPLE
Process
The sampling process comprises several stages:
- Defining the population of concern
- Specifying a sampling frame, a set of items or events possible to measure
- Specifying a sampling method for selecting items or events from the frame
- Determining the sample size
- Implementing the sampling plan
- Sampling and data collecting
- Reviewing the sampling process
Assumptions of qualitative sampling
Social actors are not predictable like objects.
Randomized events are irrelevant to social life.
Probability sampling is expensive and inefficient.
Therefore…
Non-probability sampling is the best approach.
Types of samples
Probability (Random) Samples
- Simple random sample
- Systematic random sample
- Stratified random sample
- Multistage sample
- Multiphase sample
- Cluster sample
Non-Probability Samples
- Convenience sample
- Purposive sample
- Quota sample
Simple Random Sample
1. Get a list or “sampling frame”
a. This is the hard part! It must not systematically exclude anyone.
b. Remember the famous sampling mistake?
2. Generate random numbers
3. Select one person per random number
SIMPLE RANDOM SAMPLING
Estimates are easy to calculate.
Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
Disadvantages
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in population may not be present in sample in sufficient numbers for study.
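The three steps of simple random sampling can be sketched in a few lines using Python's standard `random` module. The frame of 100 student IDs, the function name `simple_random_sample`, and the seed are hypothetical and serve only to make the example reproducible.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)    # a seeded generator makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: a class roster of 100 students.
frame = [f"student_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Every unit in the frame has the same chance of selection, which also shows the disadvantage noted above: a small subgroup in the frame may end up with few or no members in the sample.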
Systematic Random Sample
1. Select a random number, which will be known as k
2. Get a list of people, or observe a flow of people (e.g., pedestrians on a corner)
3. Select every kth person
a. Be careful that there is no systematic rhythm to the flow or list of people.
b. If every 4th person on the list is, say, “rich” or “senior” or some other consistent pattern, avoid this method.
SYSTEMATIC SAMPLING
ADVANTAGES:
Sample easy to select
Suitable sampling frame can be identified easily
Sample evenly spread over entire reference population
DISADVANTAGES:
Sample may be biased if hidden periodicity in population coincides with that of selection.
Difficult to assess precision of estimate from one survey.
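A minimal sketch of systematic sampling follows. It assumes the common textbook variant in which the interval k is set to the frame size divided by the desired sample size and the random element is the starting position within the first interval; the frame and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, with k = len(frame) // n."""
    k = len(frame) // n                # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start in the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: 100 numbered households in list order.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

Because the selected positions are evenly spaced, any pattern in the list that repeats every k units would be sampled either always or never, which is the hidden-periodicity bias described above.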
Stratified Random Sample
1. Separate your population into groups or “strata”
2. Do either a simple random sample or systematic random sample from there
a. Note you must know easily what the “strata” are before attempting this
b. If your sampling frame is sorted by, say, school district, then you’re able to use this method
STRATIFIED SAMPLING
Drawbacks to using stratified sampling:
First, the sampling frame of the entire population has to be prepared separately for each stratum.
Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.
Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods.
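The sketch below illustrates proportional stratified sampling: the frame is split into strata and a simple random sample is drawn within each one. The school-district strata, the 10% sampling fraction, and the function name `stratified_sample` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)      # group units by stratum
    sample = {}
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical frame: (student id, school district) pairs.
units = ([(i, "north") for i in range(60)]
         + [(i, "south") for i in range(60, 90)]
         + [(i, "east") for i in range(90, 100)])
sample = stratified_sample(units, stratum_of=lambda u: u[1], fraction=0.10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Unlike a simple random sample, every stratum is guaranteed to be represented, at the cost of needing the stratum label for every unit in the frame before sampling starts.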
Multi-stage Cluster Sample
1. Get a list of “clusters,” e.g., branches of a company
2. Randomly sample clusters from that list
3. You now have a list of, say, 10 sampled branches
4. Randomly sample people within those branches
a. This method is complex and expensive!
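The two random draws in a multi-stage cluster sample can be sketched as follows. The company with 10 branches of 20 employees each, the sample sizes, and the function name `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen                              # stage 2
    }

# Hypothetical company: 10 branches with 20 employees each.
clusters = {
    f"branch_{b}": [f"b{b}_emp{e}" for e in range(20)]
    for b in range(10)
}
sample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=3)
for branch, people in sample.items():
    print(branch, people)
```

Only the selected branches need a full list of employees, which is the practical appeal of the design when no single frame of all individuals exists.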
The Convenience Sample
1. Find some people that are easy to find
The Snowball Sample
1. Find a few people that are relevant to your topic.
2. Ask them to refer you to more of them.
The Quota Sample
1. Determine what the population looks like in terms of specific qualities.
2. Create “quotas” based on those qualities.
3. Select people for each quota.
The Theoretical Sample
Types of Research for the STIS Computational Statistics Thesis
Development of statistical information systems
Computer-based information systems developed to support activities in the statistical domain. Examples: a Statistical Reference Information System, a Geographic Information System that uses (processed) statistical data, a Statistical Dissemination Information System, and a Data Entry and Monitoring Information System for statistical data collection activities.
Development of statistical applications
Application programs built to support problem solving in the field of statistics.
The program must be written by the student and must solve a problem that cannot yet be handled by existing statistical data-processing packages; or the program may be built on an existing library that does not yet have an interface; or the problem can be handled by an existing package, but the process/procedure is not (yet) efficient, so an integrated application is needed.
Examples: development of a regression fitting application; an application for hypothesis testing using a permutation test in resampling.
Technology studies in computational statistics
Studies carried out across these two disciplines whose results can benefit the development of computer science as well as statistics.
Research themes that do not fall into the first and second types can be placed in this third type if the theme is judged to have originality and innovation and to make a substantial contribution to the development of computer science or statistics, to Badan Pusat Statistik, or to society.
Examples: development of a database-driven expert system inference engine (case study: determining the method for compiling price and production indices); development of a statistical search engine based on supervised learning and relevance feedback.
Methods, Techniques and Instruments in Research
Research Instruments:
Tools for gathering data
- Questionnaires
- Interviews
Questionnaires
The most common instrument or tool of research for obtaining data beyond the physical reach of the observer.
Closed form / Closed-ended
Open form / Open-ended
Questionnaires
Clarity of language
Singleness of purpose
Relevant to the objective of the study
Correct grammar
Questionnaire: Advantages
Facilitates data gathering
Is easy to test data for reliability and validity
Is less time-consuming than interview and observation
Preserves the anonymity and confidentiality of the respondents’ reactions and answers
Questionnaire: Disadvantages
Printing and mailing are costly
Response rate may be low
Respondents may provide only socially acceptable answers
There is less chance to clarify ambiguous answers
Respondents must be literate and have no physical handicaps
Rate of retrieval can be low because retrieval itself is difficult
Interview
Purpose:
to verify information gathered from written sources
to clarify points of information
to update information and
to collect data
Interview: Types
Screening interview
Panel or Group Interview
Telephone interview
How to measure the instruments?
Validity: the instrument measures what it intends to measure
- External validity: Can the results of a study be generalized from a sample to a population?
- Content validity: The appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know?
Reliability: stability in maintaining consistent measurement, for example in a test administered twice
- Inter-Rater/Observer Reliability: The degree to which different raters/observers give consistent answers or estimates.
- Test-Retest Reliability: The consistency of a measure evaluated over time.
- Parallel-Forms Reliability: The reliability of two tests constructed the same way, from the same content.
- Internal Consistency Reliability: The consistency of results across items, often measured with Cronbach’s Alpha.
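Cronbach's alpha can be computed directly from a table of item scores using alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that standard formula to made-up questionnaire responses; the data and the function name `cronbach_alpha` are illustrative assumptions.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))                           # one tuple per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical responses: 6 respondents, 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items move together across respondents; when every item gives identical scores for each respondent, the formula returns exactly 1.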