behavioral research (2)

Research Methodology

What many do not get about this topic

The people who have changed how we think about science and the world were often rebels, or had very ‘radical ideas’ that threatened the established order of the predominant world view.

Galileo has been called the "father of modern observational astronomy", the "father of modern physics", and the "father of science". His observations of the satellites of Jupiter caused a revolution in astronomy: a planet with smaller planets orbiting it did not conform to the principles of Aristotelian cosmology, which held that all heavenly bodies should circle the Earth. He met with opposition from astronomers, who doubted heliocentrism. The matter was investigated by the Roman Inquisition in 1615, which concluded that heliocentrism was false and contrary to scripture, placing works advocating the Copernican system on the index of banned books and forbidding Galileo from advocating heliocentrism. Galileo was one of the first modern thinkers to clearly state that the laws of nature are mathematical.

The Rebels

• The first of the great anatomists was Galen of Pergamon (AD 130-200), who made vast achievements in the understanding of the heart, the nervous system, and the mechanics of breathing. Because human dissection was forbidden, he performed many of his dissections on Barbary apes, which he considered similar enough to the human form. The system of anatomy he developed was so influential that it was used for the next 1400 years. Galen continued to be influential into the 16th century, when a young and rebellious physician began the practice of using real human bodies to study the inner workings of the human body.

• Andreas Vesalius came from a line of four prominent family physicians. Vesalius and other like-minded anatomy students would raid the gallows of Paris for half-decomposed bodies and skeletons to dissect. Rather than considering dissection a lowering of his prestige as a doctor, Vesalius prided himself on being the only physician to directly study human anatomy since the ancients. Although he respected Galen, Vesalius often found that his study of the human form did not fit with the descriptions provided by Galen.

The Rebels

• Like those of his fellow revolutionary scientists, Vesalius’ masterpiece was met with harsh criticism. Many of these criticisms understandably came from the church, but the most strident of all came from Galenic anatomists. These critics vowed that Galen was in no way incorrect, and so if the human anatomy of which he wrote was different from that which was proved by Vesalius, it was because the human body had changed in the time between the two. As a response to the harsh criticisms of his work, Vesalius vowed to never again bring forth truth to an ungrateful world. In the same year that he published De humani corporis fabrica (1543), he burned the remainder of his unpublished works, further criticisms of Galen, and preparations for his future studies. He left medical school, married, and lived out the rest of his conservative life as a court physician (source: Brain Blogger).

Not what but who you know

• Louis Pasteur: French chemist and microbiologist renowned for his discoveries of the principles of vaccination, microbial fermentation and pasteurization. His medical discoveries provided direct support for the germ theory of disease and its application in clinical medicine. He is popularly known as the "father of microbiology".

• In 1847 Ignaz Semmelweis was given a 2-year appointment as an assistant in obstetrics with responsibility for the First Division of the maternity service of the vast Allgemeine Krankenhaus teaching hospital in Vienna. There he observed that women delivered by physicians and medical students had a much higher rate (13–18%) of post-delivery mortality (called puerperal fever or childbed fever) than women delivered by midwife trainees or midwives (2%).

Agree to Disagree (disagreeably)

• This case-control analysis led Semmelweis to consider several hypotheses. He concluded that the higher rates of infections in women delivered by physicians and medical students were associated with the handling of corpses during autopsies before attending the pregnant women. This was not done by the midwives. He associated the exposure to cadaveric material with an increased risk of childbed fever, and conducted a study in which the intervention was hand washing.

Who dares challenge the existing dogma?

• Dr Semmelweis initiated a mandatory hand washing policy for medical students and physicians. In a controlled trial using a chloride of lime solution, the mortality rate fell to about 2%, down to the same level as the midwives. Later he started washing the medical instruments and the rate decreased to about 1%. His superior, Professor Klein, did not accept his conclusions. Klein thought the lower mortality was due to the hospital’s new ventilation system.

• Semmelweis did not get his assistant professorship renewed in 1849. He was offered a clinical faculty appointment (privatdozent) without permission to teach from cadavers. He returned home to Budapest.
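
The mortality figures quoted for Semmelweis's ward can be turned into a simple risk comparison. What follows is a minimal sketch in Python, not Semmelweis's own analysis: the function names `risk_ratio` and `absolute_reduction` are hypothetical, and the rates are the rounded percentages given in the slides (13–18% for physicians and students, 2% for midwives, about 2% after hand washing, about 1% after instrument washing).

```python
def risk_ratio(exposed_pct, unexposed_pct):
    """How many times higher the exposed group's mortality is."""
    return exposed_pct / unexposed_pct

def absolute_reduction(before_pct, after_pct):
    """Drop in mortality, in percentage points."""
    return before_pct - after_pct

# Rounded percentages as quoted in the slides (illustrative inputs only).
physician_low, physician_high = 13, 18
midwife = 2
after_handwashing = 2
after_instruments = 1

rr_low = risk_ratio(physician_low, midwife)
rr_high = risk_ratio(physician_high, midwife)
print(f"Physician-delivered women died at {rr_low} to {rr_high} times the midwife rate")

drop_low = absolute_reduction(physician_low, after_handwashing)
drop_high = absolute_reduction(physician_high, after_handwashing)
print(f"Hand washing removed {drop_low} to {drop_high} percentage points of mortality")
```

The ratio describes how strongly the exposure is associated with death; the absolute reduction describes how many deaths per hundred deliveries the intervention avoided. Neither number by itself rules out Klein's alternative explanation, which is why the controlled comparison mattered.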

Misconception # 2

• The popular belief is that the material and methods in this course are abstract and have little to do with important issues in everyday life. My question: why do we not use these methods to examine difficult questions?

• Terrorism (how it develops, how to prevent it-airport security)

• Torture (Is it effective? Does it provide useful information?)

• How can we best prevent rape and assault?

• Are there gun control approaches that reduce gun violence?

• Such questions are not addressed adequately by the ideas and tools provided by this field, mainly because people maintain a view of this field as ‘academic and irrelevant’ (opinion)

Also, many times we make assumptions that go untested and may turn out to be incorrect: see unregulated radiation doses in CT scans (Rebecca Bindman)

Misconception # 3

• Numbers drive the ideas

• Actually it is the ideas that drive the numbers

• Numbers can describe and quantify, tell us about differences between individuals and/or groups, and accurately describe changes that occur. The research ideas and tools in a class such as this can also help us distinguish between true and false claims and identify those claims that are significant and meaningful.

An Epidemic of False Claims-Scientific American May 7 2011

• False positives and exaggerated results in peer-reviewed scientific studies have reached epidemic proportions in recent years. The problem is rampant in economics, the social sciences and even the natural sciences, but it is particularly egregious in biomedicine.

• Many studies that claim some drug or treatment is beneficial have turned out not to be true. We need only look to conflicting findings about beta-carotene, vitamin E, hormone treatments, Vioxx and Avandia. Even when effects are genuine, their true magnitude is often smaller than originally claimed.

An Epidemic of False Claims

• Research is fragmented, competition is fierce and emphasis is often given to single studies instead of the big picture.

• Much research is conducted for reasons other than the pursuit of truth. Conflicts of interest abound, and they influence outcomes. In health care, research is often performed at the behest of companies that have a large financial stake in the results. Even for academics, success often hinges on publishing positive findings.

What is the usefulness of this course?

• Claims are made all the time regarding some product or process, sometimes with some controversy

• A new study into the efficiency and reliability of wind farms has concluded that a campaign against them is not supported by the evidence

• Internet marketers of acai berry weight-loss pills and colon cleansers will pay $1.5 million to settle charges of deceptive advertising and unfair billing, the Federal Trade Commission announced today. The FTC complaint alleged that two individuals and five related companies deceptively claimed that their Acai Pure supplement would cause rapid and substantial weight loss, and that their Colotox colon cleanser would prevent colon cancer.

The scientific method

• A body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be scientific, a method must be based on empirical and measurable evidence subject to specific principles of reasoning. Empiricism: knowledge comes only or primarily from sensory experience.

• Although procedures vary from one field of inquiry to another, identifiable features distinguish scientific inquiry from other methods of obtaining knowledge.

• Anthropology-Zoology

The scientific method

• The scientific approach recognizes that both intuition and authority can be sources of ideas but does not unquestioningly accept something as true based on a person’s prestige or authority pg 3-5

• The fundamental characteristic of the scientific method is empiricism: the idea that knowledge is based on observations and that these observations can be measured (creating data or a set of data) pg 5

• Science is adversarial. Since a requirement is that hypotheses must be testable, researchers conduct studies and then publish their results, allowing others to review them and decide for themselves the validity and reliability of the data and the conclusions drawn from them pg 6

• Scientific evidence is peer reviewed: editors of the journal examine the research submitted to determine its validity

Pseudoscience

• Hypotheses generated are typically not testable

• Methodology is not scientific and validity of data is questionable

• Supportive evidence tends to be anecdotal and/or rely on “so-called” experts

• Conflicting evidence is ignored

• Language used sounds scientific

• Claims tend to be vague, rationalize strongly held beliefs and appeal to preconceived ideas

• Claims are never revised pg 9

Scientific Inquiry

• Researchers propose hypotheses (a hypothesis is a tentative idea that must be tested, pg 19) as explanations of phenomena, and design experimental studies to test these hypotheses via predictions which can be derived from them

• Scientific inquiry is generally intended to be as objective as possible in order to reduce biased interpretations of results. Another basic expectation is to document and share all data and methodology so they are available for careful scrutiny by other scientists, giving them the opportunity to verify results by attempting to reproduce them (replicate results)

Scientific Inquiry Scientists are funny

• “The history of biochemistry is a chronicle of controversies. These controversies exhibit a common pattern. There is a complicated hypothesis, which usually entails an element of mystery and several unnecessary assumptions. This is opposed by a more simple explanation, which contains no unnecessary assumptions.

• The complicated one is always the popular one at first, but the simpler one, as a rule, eventually is found to be correct. This process frequently requires ten to twenty years. The reason for this long time lag was explained by Max Planck. He remarked that scientists never changed their mind, but eventually they die.” --John Northrop, biochemist

Hypotheses and Theories

• A hypothesis is a conjectural (if-then) statement while a theory is a systematic body of ideas about a particular topic or phenomenon pg 19

• A question is asked that may refer to an observation (e.g. Do aggressive video games increase aggression in adolescents and young adults?) or may be in the form of an open-ended question (what strategies are best for coping with natural disasters?)

• We then make conjectures (hypotheses), and test them to see if our predictions (specific predictions) conform to what happens in the real world

• Theories encompass wider domains of inquiry that may bind many independently derived hypotheses together in a coherent, supportive structure. Theories, in turn, may help form new hypotheses or place groups of hypotheses into context

Basic Steps of Scientific Inquiry

• Define a question

• Gather information and resources (observe)

• Form an explanatory hypothesis

• Test the hypothesis by performing an experiment and collecting data in a reproducible manner

• Analyze the data

• Interpret the data and draw conclusions that serve as a starting point for new hypotheses

• Publish results

• Retest (frequently done by other scientists)

Examples of Pseudoscience

• Expectations that 2012 would bring large-scale disasters or even the end of the world

• Ancient Astronauts - Proposes that aliens have visited the earth in the past and influenced our civilization

• Astrology - Belief that humans are affected by the position of celestial bodies

• Flat Earth Society - Claims the Earth is flat and disc-shaped

• Moon Landing Conspiracy - Contends the original moon landing was faked

• Bermuda Triangle - An area where unexplained events, like disappearances of ships and airplanes, have occurred

• Cryptozoology - The search for Bigfoot (Yeti), the Loch Ness monster, El Chupacabra and other creatures that biologists believe do not exist

Some More Controversies

• Mayan Calendar predictions for 2012

• Crystal healing

• Hypnosis – a state of extreme relaxation and inner focus in which a person is unusually responsive to suggestions made by the hypnotist. The modern practice has its roots in the idea of animal magnetism, or mesmerism, originated by Franz Mesmer. Mesmer's explanations were thoroughly discredited, and to this day there is no agreement amongst researchers whether hypnosis is a real phenomenon, or merely a form of participatory role-enactment

The Geocentric Model & The Wanderers

• Most of the time we see Mars, Jupiter and Saturn moving around the Sun in the same direction as the Earth, but during the relatively short time that the Earth overtakes one of these planets, that planet appears to be moving backward. As the Greeks noticed discrepancies between the way planets moved and the basic geocentric model, they began adjusting the model and creating variations on the original. In these models, planets and other celestial bodies move in circles that have been superimposed onto circular orbits around the Earth

• http://www.lasalle.edu/~smithsc/Astronomy/retrograd.html

The Earth Moved

• The solution to these discrepancies proposed by Ptolemy came in the form of a mad, but clever proposal: planets were attached, not to the concentric spheres themselves, but to circles attached to the concentric spheres

• The Ptolemaic system, the most well-known version of the geocentric model, was a complex interaction of circles. Ptolemy believed that each planet orbited around a circle, which was termed an epicycle, and that the epicycle orbited on a bigger circle, the deferent, around the Earth.

• However, in practice, even this was not enough to account for the detailed motion of the planets on the celestial sphere! In more sophisticated epicycle models further "refinements" were introduced. In some cases, epicycles were themselves placed on epicycles
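
The deferent-and-epicycle construction can be checked numerically. The sketch below is an illustration under stated assumptions, not Ptolemy's actual parameters: the radii and angular speeds are made-up values, the circles are treated as uniform and centred on the Earth, and the function names (`position`, `is_retrograde`, `retrograde_fraction`) are hypothetical. It shows that adding an epicycle is enough to make a planet appear to move backward for part of the time, while a single circle never does.

```python
import math

def position(t, R, r, w_def, w_epi):
    """Planet position with Earth at the origin.

    R, w_def: radius and angular speed of the deferent.
    r, w_epi: radius and angular speed of the epicycle.
    """
    x = R * math.cos(w_def * t) + r * math.cos(w_epi * t)
    y = R * math.sin(w_def * t) + r * math.sin(w_epi * t)
    return x, y

def is_retrograde(t, R, r, w_def, w_epi, dt=1e-4):
    """True if the planet's direction, seen from Earth, is turning backward at time t."""
    x0, y0 = position(t - dt, R, r, w_def, w_epi)
    x1, y1 = position(t + dt, R, r, w_def, w_epi)
    # The sign of the 2-D cross product gives the sense of rotation about Earth.
    return x0 * y1 - y0 * x1 < 0

def retrograde_fraction(R, r, w_def, w_epi, samples=2000):
    """Fraction of sampled times in [0, 2*pi) spent in apparent backward motion."""
    times = [2 * math.pi * k / samples for k in range(samples)]
    hits = sum(is_retrograde(t, R, r, w_def, w_epi) for t in times)
    return hits / samples

# Illustrative values only: deferent radius 1, epicycle radius 0.5.
with_epicycle = retrograde_fraction(1.0, 0.5, 1.0, 5.0)
without_epicycle = retrograde_fraction(1.0, 0.0, 1.0, 5.0)
print("with epicycle:", with_epicycle)
print("without epicycle:", without_epicycle)
```

The model reproduces the backward motion without saying anything about why the planet would move on such circles, which is the sense in which it predicts without explaining.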

The Day the Earth Stood Still

• Ptolemaic geocentric theory describes and correctly predicts: one could confidently predict when a planet’s apparent motion would come to a halt and turn around, and for how long it would seem to move backwards. The theory predicts but does not explain HOW or WHY the planets move as they do

• Correlation ~ Prediction ≠ Causality

• Navigation unaffected

• Occam’s razor or the law of parsimony

• Once Kepler proposed the theory of elliptical orbits, heliocentrism became such a simple model compared to Ptolemy's unwieldy cycles and epicycles that it rapidly gained in popularity and quickly became the dominant theory

Joy of Research

Conducting Research-Library Research and Journals

Chapter Two

Hypothetically Speaking

• Researchers generally formulate a hypothesis (a tentative idea or question that can be supported or refuted) and then design a study to test it. The researcher also makes a prediction regarding the outcome of the experiment pg 19

• If the prediction is not confirmed the researcher will either reject the hypothesis or conduct further research using different methods pg 19

• However, if the results of the study confirm the prediction, the hypothesis is supported but not proven

Constructing the study

• Participants in the study are Subjects pg 20

• Participants in survey research are respondents

• Those who help researchers understand a particular culture or organization are informants

• Participants are often more fully described by characterizing them as students, employees, residents, patients etc.

• Other terms for subjects include respondents, informants pg 20

Sources of Ideas

• Common sense: the things we all believe to be true, although such notions do not always turn out to be correct (also popular beliefs, e.g. the 5 sec rule, pg 20-21)

• Observation- Listening to music with degrading sexual lyrics predicts a range of early sexual behavior

• Serendipity (luck?) pg 21: Pavlov's accidental discovery of dogs salivating to stimuli other than food. Another example is Otto Loewi and the discovery of acetylcholine: it was generally accepted that neurons were connected by synapses, and initially most neurophysiologists believed that signal transmission between cells was electrical. Other example: the accidental discovery of medications in the 1950s

• Theories

• Past research

Sense-Common and Otherwise

• Common sense is often made up of much prejudice and snap judgment, and therefore is not always useful and can certainly be irrational even when it is useful

• Testing a commonsense idea can be useful since such ideas do not always turn out to be true

• Stress theory of ulcers: As peptic ulcers became more common in the 20th century, doctors increasingly linked them to the stress of modern life. Medical advice during the latter half of the 20th century was, essentially, for patients to take antacids and modify their lifestyle. In the 1980s Australian clinical researcher Barry Marshall discovered that the bacterium H. pylori caused peptic ulcer disease, leading him to win a Nobel Prize in 2005

Another Crazy Idea

• Immovable continents: Prior to the middle of the 20th century scientists believed the Earth’s continents were stable and did not move. This began to change in 1912 with Alfred Wegener’s formulation of the continental drift theory, and later, more properly, with the elucidation of plate tectonics during the 1960s

• Accident and Serendipity- Pavlov did not set out to discover classical conditioning but was studying the digestive system and found that dogs would salivate to a neutral stimulus when paired with food

Theories

• Theory: a systematic body of ideas about a particular topic or phenomenon with a consistent structure that has two functions pg 22

• 1) Theories organize and explain various facts and descriptions or observations putting them into a coherent framework (system)

• 2) Theories generate new knowledge by guiding our observations and generating new hypotheses. Theories are living and dynamic (a theory can be modified to account for new data)

• Theories vs. hypotheses: a theory consists of much more than a simple idea and is grounded in prior research, often with several consistent hypotheses

Theories (and facts) change

• Theories can be modified by new discoveries. Example: the original conception of long-term memory as a permanent, fixed storage place was modified when Loftus (1979) demonstrated that memories could be influenced by how subjects were questioned pg 23. Participants viewed a simulated automobile accident and were later asked questions: "Did you see the broken headlight?" vs. "Did you see a broken headlight?" Subjects were more likely to answer yes to the first version

• Memories can also be induced so memory is not simply a record of what happened

• Relevant to Criminal Justice system and police procedures

Theories and data

• Under sources of ideas (pg 23, top), Cozby and Bates cite the research of Buss (2007) proposing that males feel more intense jealousy over a partner's physical infidelity, while females are more jealous over emotional infidelity. This is consistent with evolutionary theory

• Females are more threatened by a partner who would form an emotional bond with another partner and withdraw support and resources; males are more threatened by the possibility of having to care for a child who does not share any of their genes (taken from evolutionary theory, pg 23)

Past Research

• “Becoming familiar with a body of research on a topic is perhaps the best way to generate ideas for new research” pg 24

• Becoming familiar with a particular body of research allows you to see inconsistencies

• What you know about one research area may be applied to another research area

• Researchers refine and expand on known and published research

• Replication-An attempt to repeat a finding using a different setting, a different demographic group (age, sex etc) or different methodology

• Research is also stimulated by practical problems that may have immediate applications

Examining Data critically

• Example of facilitated communication, in which a ‘facilitator’ held the hand of an autistic child to help press keys on a keyboard or otherwise assist in communication

• Montee et al. (1995) constructed a study with three conditions: (1) both child and facilitator were shown the same picture and the child was asked to identify the picture (by using the keyboard), assisted by the facilitator; (2) only the child saw the picture; (3) the child and facilitator saw different pictures (unknown to the facilitator). Results: pictures were correctly identified only in condition one
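
The logic of the three-condition design can be made explicit by writing down what each competing hypothesis predicts. The sketch below is an illustration, not code or data from Montee et al.: the picture labels, the `typed_answer` and `predictions` helpers, and the two hypothesis names are hypothetical. The only result taken from the slides is that pictures were correctly identified in condition one alone.

```python
# Each condition: (picture the child sees, picture the facilitator sees).
# None means the facilitator was shown no picture.
CONDITIONS = {
    "1 same picture": ("dog", "dog"),
    "2 child only": ("dog", None),
    "3 different pictures": ("dog", "cat"),
}

def typed_answer(author, child_sees, facilitator_sees):
    """What gets typed, depending on who is really authoring the message."""
    return child_sees if author == "child" else facilitator_sees

def predictions(author):
    """Predicted correctness per condition (correct = names the child's picture)."""
    return {
        name: typed_answer(author, child, facilitator) == child
        for name, (child, facilitator) in CONDITIONS.items()
    }

# Result reported in the slides: correct only in condition one.
observed = {
    "1 same picture": True,
    "2 child only": False,
    "3 different pictures": False,
}

for author in ("child", "facilitator"):
    match = predictions(author) == observed
    print(f"hypothesis '{author} is the author' matches the result: {match}")
```

Condition one alone cannot separate the two hypotheses, since both predict a correct answer there. Conditions two and three are what make the hypotheses testable, which is the point of examining such claims critically.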

Evaluating web Information

• Is the site associated with a major educational institution, or is it sponsored by one individual or organization, and if so what may be the bias of that person or organization? (e.g. Disabled People's International)

• Is the information provided by those responsible for the site? What are their qualifications?

• Is the information current?

• Do links from the site lead to legitimate organizations? Pg 35

Journals and Library Research

• Most papers submitted for publication in major journals are rejected (during peer review)

• Peer review: editors of the journal review the article and also send it to other experts in the field to review pg 25. Due to limited space and the number of articles received, most articles submitted are rejected

• Journals usually specialize in one or two areas (view pg 26)

• PsycINFO, Science Citation Index, Social Sciences Citation Index, PubMed

Literature Review

• A “literature review” reviews the scholarly literature on a specific topic by summarizing and analyzing published work on that topic. A literature review has several purposes:

• 1) To evaluate the state of research on a topic

• 2) To familiarize readers and students with what has already been done in the field

• 3) To suggest future research directions or gaps in knowledge

Traditional and Open Access journals

• In traditional, subscriber-pays publishing, the publisher, who holds the copyright to an article, pays most printing and distribution costs and, in order to read an article, the journal subscriber pays fees, whether for hard-copy or online versions. Sometimes an author is required to pay printing page charges for complex graphics or color presentations.

• “Open access” publishing generally means that the author or publisher, who holds the copyright to an article, grants all users unlimited, free access to, and license to copy and distribute, a work published in an open access journal usually on-line

Traditional and Open Access journals

• Traditional publishing - Individuals and libraries are charged fees to access the article. Depending on the contract you sign as an author, you may not be able to distribute copies of your article or post it online.

• The now-common usage of the term "open access" means freely available for viewing or downloading by anyone with access to the internet.

• The UK Wellcome Trust (a global charitable foundation) assumes that “the benefits of research are derived principally from access to research results”, and therefore that “society as a whole is made worse off if access to scientific research results is restricted”

• Problems of traditional and open access

• Sending papers to reviewers who are sympathetic (traditional)

• Payment for publication (by authors) could create conflicts of interest and have a negative impact on the perceived neutrality of peer review, as there would be a financial incentive for journals to publish more articles (open access)

• Open Access is also often seen as a solution to the situation where many libraries have been forced to cut journal subscriptions because of price increases

Traditional vs. Open Access Publishing

• Controversies about open access publishing and archiving confront issues of copyright and governmental competition with the private sector.

• Traditional publishers typically charge readers subscriber fees to fund the costs of publishing and distributing hard-copy and/or online journals.

• In contrast, most open access systems charge authors publication fees and give readers free online access to the full text of articles

Good and Bad sources

Anatomy of a Research Article

Abstract, Introduction, Method Section, Results Section and Discussion (Conclusions)

Abstract and Introduction

• Abstract – a summary of the report which typically runs no more than 120 words. It includes information about the hypothesis, the procedure of the study and a summary of results (there may be some information about the discussion)

• Introduction – The researcher outlines the problem, including past research and theories relevant to the problem. Expectations are listed (usually in the form of hypotheses) pg 35

Method Section

• The method section is divided into subsections as determined by the author and dependent on the complexity of the study and its design. Sometimes there is an overview of the design explained to the reader

• The next section describes the characteristics of the participants (number of subjects, male/female etc.)

• The next subsection describes the procedure, the materials or instruments used, and how data were recorded.

• Additional subsections are used as necessary to describe equipment, procedures or other information to be included

• Details of all relevant information must be included to allow other researchers to replicate the study

Results and Discussion

• Results-In this section the researcher presents the findings, usually in three ways. First there is a narrative summary. Second there is a statistical description. Third tables are presented. “Statistics are only a tool the researcher uses. . .” Not understanding how the calculations were performed is not a deterrent to reading and understanding the logic behind the design and statistical procedures used

• Discussion-The researcher reviews the research from various perspectives, determining whether the research supports the hypothesis or not, and offers explanations in either case, including what may have gone wrong in the study. There is also usually a comparison with past research and there may be suggestions for practical applications of the research findings

The Quick Guide-copyrighted

• Introduction 1) What is known 2) What is not known which this study addresses

• Methods Who Where What –Who are the subjects-(describe them), where did they come from and what did you do with them (often divide them into groups such as experimental and control)

• Results-What happened? (e.g. which group did better)

• Discussion-What do the results mean? Interpretation of the study is in this section

Ethical Research-Chapter 3

• Beneficence-The principle which states the need to maximize benefits and minimize harm pg 40

• Risk-Benefit Analysis-What is the potential harm? Does confidentiality hold? Was there informed consent?

Milgram’s Methodology

• Through a rigged drawing, the participant was assigned the role of teacher while the confederate was always the learner. The participant watched as the experimenter strapped the learner to a chair in an adjacent room and attached electrodes to the learner’s arm. The participant’s task was to administer a paired associate learning test to the learner through an intercom system.

• Participants sat in front of an imposing shock generator and were instructed to administer an electric shock to the learner for each incorrect answer. Labels above the 30 switches that spanned the front of the machine indicated that the shocks ranged from 15 to 450 volts in 15-volt increments. Participants were instructed to start with the lowest switch and to move one step up the generator for each successive wrong answer.
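
The description of the shock generator fixes its whole scale: 30 switches, 15 volts per step, 15 to 450 volts. The short sketch below, with hypothetical helper names `switch_voltage` and `switch_for`, makes that arithmetic explicit so that any voltage mentioned in an account of the procedure can be mapped to the number of wrong answers needed to reach it.

```python
SWITCHES = 30     # number of switches on the generator, as described
STEP_VOLTS = 15   # increment between adjacent switches

def switch_voltage(n):
    """Voltage label of the n-th switch (1-based)."""
    if not 1 <= n <= SWITCHES:
        raise ValueError("switch number out of range")
    return n * STEP_VOLTS

def switch_for(volts):
    """Which switch carries a given voltage label."""
    if volts % STEP_VOLTS != 0 or not STEP_VOLTS <= volts <= SWITCHES * STEP_VOLTS:
        raise ValueError("no switch with that label")
    return volts // STEP_VOLTS

levels = [switch_voltage(n) for n in range(1, SWITCHES + 1)]
print(levels[0], levels[-1], len(levels))

# Voltages mentioned elsewhere in these slides, mapped to switch numbers.
for volts in (135, 300, 330, 450):
    print(volts, "volts is switch", switch_for(volts))
```

Because participants moved up one switch per wrong answer, the switch number is also the count of errors the learner had to make before that shock level was reached.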

Milgram’s Methodology

• The subjects believed that for each wrong answer, the learner was receiving actual shocks. In reality, there were no shocks. After the confederate was separated from the subject, the confederate set up a tape recorder integrated with the electro-shock generator, which played pre-recorded sounds for each shock level. After a number of voltage level increases, the actor started to bang on the wall that separated him from the subject. After several times banging on the wall and complaining about his heart condition, all responses by the learner would cease

• At this point, many people indicated their desire to stop the experiment and check on the learner. Some test subjects paused at 135 volts and began to question the purpose of the experiment. Most continued after being assured that they would not be held responsible

• After the 330-volt shock, the learner no longer screamed or protested when receiving a shock, suggesting that he was physically incapable of responding. The major dependent variable was the point in the procedure at which the participant refused to continue.

Milgram’s Methodology

Deception

• If at any time the subject indicated his desire to halt the experiment, he was given a succession of verbal prods by the experimenter, in this order:

• Please continue.

• The experiment requires that you continue.

• It is absolutely essential that you continue.

• You have no other choice, you must go on.

• If the subject still wished to stop after all four successive verbal prods, the experiment was halted. Otherwise, it was halted after the subject had given the maximum 450-volt shock three times in succession

• The experimenter also gave special prods if the teacher made specific comments. If the teacher asked whether the learner might suffer permanent physical harm, the experimenter replied, "Although the shocks may be painful, there is no permanent tissue damage, so please go on."

Ethical Research

• Milgram summarized the experiment in his 1974 article, "The Perils of Obedience": "The legal and philosophic aspects of obedience are of enormous importance, but they say very little about how most people behave in concrete situations. I set up a simple experiment at Yale University to test how much pain an ordinary citizen would inflict on another person simply because he was ordered to by an experimental scientist. . . . The extreme willingness of adults to go to almost any lengths on the command of an authority constitutes the chief finding of the study and the fact most urgently demanding explanation. . . . relatively few people have the resources needed to resist authority."

• Milgram (1974) maintained that the key to obedience had little to do with the authority figure’s manner or style. Rather, he argued that people follow an authority figure’s commands when that person’s authority is seen as legitimate.

Data can surprise us

• Before conducting the experiment, Milgram polled fourteen Yale University senior-year psychology majors to predict the behavior of 100 hypothetical teachers. All of the poll respondents believed that only a very small fraction of teachers (the range was from zero to 3 out of 100, with an average of 1.2) would be prepared to inflict the maximum voltage. Milgram also informally polled his colleagues and found that they, too, believed very few subjects would progress beyond a very strong shock.

• Milgram also polled forty psychiatrists from a medical school, and they believed that by the tenth shock, when the victim demands to be free, most subjects would stop the experiment. They predicted that by the 300-volt shock, when the victim refuses to answer, only 3.73 percent of the subjects would still continue, and they believed that "only a little over one-tenth of one percent of the subjects would administer the highest shock on the board."

The relevance of Milgram

• Milgram sparked direct critical response in the scientific community by claiming that "a common psychological process is centrally involved in both [his laboratory experiments and Nazi Germany] events."

• There are psychological processes which can disengage morality from conduct

Criticism of Milgram

• In addition to their scientific value, the obedience studies generated a great deal of discussion because of the ethical questions they raised (Baumrind, 1964; Fischer, 1968; Kaufmann, 1967; Mixon, 1972). Critics argued that the short-term stress and potential long-term harm to participants could not be justified.

• In his defense, Milgram (1974) pointed to follow-up questionnaire data indicating that the vast majority of participants not only were glad they had participated in the study but said they had learned something important from their participation and believed that psychologists should conduct more studies of this type in the future. Nonetheless, current standards for the ethical treatment of participants clearly place Milgram’s studies out of bounds (Elms, 1995).

Mechanisms of moral disengagement. A. Bandura

• Theory of Moral Disengagement seeks to analyze the means through which individuals rationalize their unethical or unjust actions

• Moral justification- Turns killing into a moral act when non-violent acts appear to be ineffective and there is a serious threat to a person's way of life. Justification can take many forms, and the act can be considered a service to humanity or for the greater good of the community

• Displacement of Responsibility- Group decision making can diffuse responsibility. Personal responsibility is obscured

• Disregard for Consequences- People minimize the consequences of acts they are responsible for. It's easier to hurt others when they are not visible

• Dehumanization- People find violence easier if they don't consider their victims as human beings. The road to terrorism is gradual

• Euphemistic labeling- Using terms that are less negative, or that might be viewed as positive, to make actions seem less harmful. This sort of labeling also serves to limit or reduce people's responsibility for their actions

• Advantageous comparison- People who engage in reprehensible acts make them seem less objectionable by comparing them to something perceived as being worse

Some criticisms of Milgram

• Professor James Waller, Chair of Holocaust and Genocide Studies at Keene State College, formerly Chair of Whitworth College Psychology Department, expressed the opinion that the Milgram experiments do not correspond well to the Holocaust events

• The subjects of the Milgram experiments, wrote James Waller (Becoming Evil), were assured in advance that no permanent physical damage would result from their actions. However, the Holocaust perpetrators were fully aware of the finite nature of their hands-on killing and maiming of the victims.

• The laboratory subjects themselves did not know their victims and were not motivated by racism. On the other hand, the Holocaust perpetrators displayed an intense devaluation of the victims through a lifetime of personal development.

• Those serving punishment at the lab were not sadists, nor hate-mongers, and often exhibited great anguish and conflict in the experiment, unlike the designers and executioners of the Final Solution who had a clear "goal" on their hands, set beforehand.

• The experiment lasted for an hour, with no time for the subjects to contemplate the implications of their behavior. Meanwhile, the Holocaust lasted for years with ample time for a moral assessment of all individuals and organizations involved.

Risks of Research (continued)

• Procedures that can cause physical harm are rare, while those that involve psychological stress are much more common (refer to Schachter’s study on stress and affiliation). If stress is possible, the researcher must use all possible safeguards to help participants deal with the stress and must also include a debriefing session pg 42

• Loss of privacy/confidentiality- Data should be stored securely and be made anonymous if possible; if not, care should be taken to separate identifying data from the actual data pg43

• Concealed Observation- Is it ethical to use data taken from public web sites or those which require some identification?

Risks of Research- Informed Consent

• Informed Consent Implies that potential subjects should be provided with all information that might influence their decision to participate in the study pg44

• Informed consent forms generally include 1) purpose of research 2) procedures involved 3) risk/benefits 4) any compensation 5) confidentiality 6) assurance of voluntary participation and permission to withdraw from study 7) contact information for subjects to ask questions

• To make the form easier to understand, it should not be written in the first person. Instead of "I understand that participation is voluntary" (first person), write "Participation in this study is voluntary" pg44

Deception and Informed Consent

• Deception occurs when there is active misrepresentation of information. In the Milgram experiment there were two examples pg47

• 1) Subjects were told the study was about memory and learning while it was actually about obedience

• 2) Subjects were not told they would be delivering shocks to confederates (Milgram created a false reality for subjects)

• Milgram’s study took place before informed consent became routine. Might “honest” informed consent have resulted in a different outcome? Would it have biased the sample?

Deception and Ethics

• The concepts of informed consent and debriefing have become standard and more explicit pg48

• While false cover stories are still commonly used especially in Social Psychology, the use of deception is decreasing overall for three reasons

• 1) researchers have become more interested in cognitive variables rather than emotional ones and adopt practices more similar to those in cognitive studies which involve less deception (memory research)

• 2) there is greater sensitivity and awareness of ethical issues and how they should be handled in research

• 3) Review boards at universities are more stringent about approving research involving deception and want to know if alternatives are not available

Alternatives to Deception

• Role Playing- Takes different forms. A situation may be described to Ss, who are asked how they would respond or to predict how real participants would react pg50

• However it is not easy to predict one’s own behavior especially when there is some undesirable behavior being studied (e.g. conformity, aggression)

• Most people overstate their altruistic tendencies

Alternatives to Deception

• Simulations- enactment of some real situation (can still pose ethical problems)

• Zimbardo prison experiment 1971 Stanford

• “Our planned two-week investigation into the psychology of prison life had to be ended prematurely after only six days because of what the situation was doing to the college students who participated. In only a few days, our guards became sadistic and our prisoners became depressed and showed signs of extreme stress” -Philip Zimbardo http://www.prisonexp.org/

Alternatives to Deception

• Honest Experiments- behavior studied without elaborate deception (e.g. speed dating used to study romantic attraction)

• Subjects agree to have their behavior studied and know the hypotheses of the researchers

• Use situations in which people seek assistance: assign students to different conditions of skill improvement (e.g. on-line or in-class help)

• Use naturally occurring events to test hypotheses (e.g. New York residents given PTSD checklist to determine if they were different from Wash D.C. residents after 9/11 attacks)

Sample selection and ethics

• Justice principle- Any decisions to include or exclude certain people from a research study must be made solely on scientific grounds (e.g. Tuskegee Syphilis Study 1932-1972) pg 52-54

• According to the rules of the U.S. Dept. of Health and Human Services, all institutions that receive federal funds must have an Institutional Review Board (IRB) responsible for reviewing research proposed and conducted by that institution (even if it is not conducted on site at that institution)

• IRB must have at least 5 members with at least one member from outside the institution. Exceptions to IRB review include

• 1) Research in which there is no risk (anonymous questions, surveys etc.) is exempt from IRB review

• 2) Research with minimal risk (risk no greater than that encountered in daily life) is routinely approved by the IRB. All other research with greater than minimal risk is reviewed and requires safeguards such as informed consent. See Table 3.1 pg 54 Assessment of Risk

IRB impact on Research

• Some researchers may be frustrated over the sometimes long process of review with numerous requests for revisions and clarifications.

• These IRB policies apply to all areas of research so that the caution necessary for some medical research is applied to other research with less risk

• Some studies indicate that students who have participated in research studies are more lenient in their judgments of the ethics of the experiment than the researchers themselves or the IRB members pg55

Risk-Benefits of Clinical Research

• Clinical trials involving new drugs are commonly classified into four phases

•Phase I: Researchers test a new drug or treatment in a small group of people for the first time to evaluate its safety, determine a safe dosage range, and identify side effects.

•Phase II: The drug or treatment is given to a larger group of people to see if it is effective and to further evaluate its safety.

•Phase III: The drug or treatment is given to large groups of people to confirm its effectiveness, monitor side effects, compare it to commonly used treatments, and collect information that will allow the drug or treatment to be used safely.

•Phase IV: Studies are done after the drug or treatment has been marketed to gather information on the drug's effect in various populations and any side effects associated with long-term use (source NIH U.S. Library of Medicine)

Risk-Benefits of Clinical Research

• More common than physical stress is psychological stress, as in Schachter’s (1959) study on anxiety and affiliation. The study had two conditions, high anxiety and low anxiety. In the high-anxiety condition, researchers emphasized the ominous nature and expected pain of the electric shock experiment. In the low-anxiety condition, they made it seem nearly painless

• Subjects were to rate their anxiety level, and then decide if they prefer being alone or with others before the electric shock tests would begin. Lastly they were given the choice to be let out of the experiment (without credit for their psych class).

• Results- 63% of the high anxiety condition wanted to remain together, but only 33% wanted to be together in the low anxiety condition

Risk-Benefits of Clinical Research

• Psychological stress-Social psychology experiments (deception)

• Giving unfavorable feedback about S’s personality or asking about traumatic or unpleasant events

• The Bystander Intervention Model predicts that people are more likely to help others under certain conditions.

Social Psychology-Psychological harm/stress

• Bystander intervention research

• Many factors influence people's willingness to help, including the ambiguity of the situation, perceived cost, diffusion of responsibility, similarity, mood and gender, attributions of the causes of need, and social norms.

• Situational ambiguity. In ambiguous situations (i.e., when it is unclear that there is an emergency), people are much less likely to offer assistance than in situations involving a clear-cut emergency (Shotland & Heinold, 1985). They are also less likely to help in unfamiliar environments than in familiar ones

• Perceived cost. The likelihood of helping increases as the perceived cost to ourselves declines (Simmons, 1991). We are more likely to lend our class notes to someone who we believe will return them than to a person who doesn't appear trustworthy

Social Psychology-Psychological harm/stress-Bystander intervention research

• Diffusion of responsibility-The presence of others may diffuse the sense of individual responsibility. It follows that if you suddenly felt faint and were about to pass out on the street, you would be more likely to receive help if there are only a few passers-by present than if the street is crowded with pedestrians. With fewer people present, it becomes more difficult to point to the "other guy" as the one responsible for taking action. If everyone believes the other guy will act, then no one acts

• Similarity- People are more willing to help others whom they perceive to be similar to themselves, that is, people who share a common background and beliefs. They are even more likely to help others who dress like they do than those in different attire (Cialdini & Trost, 1998). People also tend to be more willing to help their kin than to help non-kin (Gaulin & McBurney, 2001).

• Mood- People are generally more willing to help others when they are in a good mood

Social Psychology-Psychological harm/stress-Bystander intervention research

• Gender. Despite changes in traditional gender roles, women in need are more likely than men in need to receive assistance from strangers

• Attributions of the cause of need. People are much more likely to help others they judge to be innocent victims than those they believe have brought their problems on themselves (Batson, 1998). Thus, they may fail to lend assistance to homeless people and drug addicts whom they feel "deserve what they get."

• Social norms. Social norms prescribe behaviors that are expected of people in social situations (Batson, 1998). The social norm of "doing your part" in helping a worthy cause places a demand on people to help, especially in situations where their behavior is observed by others (Gaulin & McBurney, 2001). For example, people are more likely to make a charitable donation when they are asked to do so by a co-worker in full view of others than when they receive an appeal in the mail in the privacy of their own home

APA Ethics Code Research with Humans and Animals

• APA ethics code-Psychologists are committed to increasing scientific and professional knowledge of behavior and people’s understanding of themselves and others and to the use of such knowledge to improve the condition of individuals, organizations and society pg55

• Five general principles of the APA ethics code relate to beneficence, responsibility, integrity, justice and respect for the rights and dignity of others

• Of the ten ethical standards concerning conduct the focus is on the 8th Ethical Standard for Research and Publication

Ethics and Research with Humans

• Institutional approval-IRB

• Informed consent includes purpose of experiment, right to decline or withdraw from study, consequences of declining, risks, benefits, confidentiality, incentives for participation and contact information

• Psychologists conducting intervention research clarify the nature of the treatment, the services available to the control group, how treatment and control groups will be formed, alternatives for those wishing to withdraw or not participate, and any compensation offered for participation pg56

Ethics in Research with Humans (continued)

• 8.05 Psychologists may dispense with informed consent when there is no risk of harm or only anonymous questions or observations are used and confidentiality is protected pg57

• 8.06 Psychologists avoid offering excessive financial or other inducements, and if a professional service is offered, its nature, risks and obligations are clarified

• 8.07 Psychologists do not use deception unless it can be justified by prospective scientific or other value and no reasonable alternatives are available. No deception is allowed in research that is expected to cause physical pain or severe emotional distress

Nuremberg Code

• At the end of World War II, 23 Nazi doctors and scientists were put on trial for the murder of concentration camp inmates who were used as research subjects. Of the 23 professionals tried at Nuremberg, 16 were convicted (7 were condemned to death by hanging and 9 received prison sentences from 10 years to life), and 7 were acquitted

• Ten points describing required elements for conducting research with humans became known as the Nuremberg Code

• 1) Informed consent is essential 2) Research should be based on prior animal work. The risks should be justified by the anticipated benefits. 3) Research must be conducted only by qualified scientists. 4) Physical and mental suffering must be avoided.

• 5) Research in which death or disabling injury is expected should not be conducted

Ethics and Animal Research

• Approximately 7% of articles in Psych Abstracts (PsycINFO) involve animals

• Animals are commonly used to test effects of drugs and to study physiological mechanisms and genetics

• 95% of animals in research are rats, mice and birds

• Animal Rights groups have become more active

• Environmental conditions for animals can be more easily controlled than for humans

• It is more difficult to monitor a human’s behavior than an animal’s behavior

• Most scientists agree that animal research benefits humans

Top Five Reasons to Stop Animal Testing- PETA

• It’s unethical to sentence 100 million thinking, feeling animals to life in a laboratory cage and intentionally cause them pain, loneliness, and fear.

• It’s bad science. The Food and Drug Administration reports that 92 out of every 100 drugs that pass animal tests fail in humans.

• It’s wasteful. Animal experiments prolong the suffering of people waiting for effective cures by misleading experimenters and squandering precious money, time, and resources that could have been spent on human-relevant research.

• It’s archaic. Forward-thinking scientists have developed humane, modern, and effective non-animal research methods, including human-based microdosing, in vitro technology, human-patient simulators, and sophisticated computer modeling, that are cheaper, faster, and more accurate than animal tests.

• The world doesn’t need another eyeliner, hand soap, food ingredient, drug for erectile dysfunction, or pesticide so badly that it should come at the expense of animals’ lives.

Ethics and Animal Research

• 8.09 Psychologists acquire, care for, use and dispose of animals in compliance with federal, state and local regulations and with professional standards pg59-60

• Psychologists ensure appropriate consideration for animal’s comfort, health and humane treatment

• All individuals under the supervision of a psychologist using animals have received instruction in research methods as well as the care, maintenance and handling of the species being used

• Surgery is performed under appropriate anesthesia, minimizing infection and pain. Subjecting animals to pain or stress must be justified scientifically

• When an animal’s life must be terminated it must be done rapidly minimizing pain and according to accepted procedure

Misrepresentation-Fraud and Plagiarism

• Fabrication of data is fraud, which is most commonly detected when other scientists cannot replicate the results of a study pg 62-63

• Fraud is not considered a major problem in science (it is still rare), in part because researchers know that others will read their reports and conduct their own studies, and if they are found guilty of fraud their reputations and careers are seriously damaged

• No independent agencies exist to check on the activities of scientists

• Plagiarism- Misrepresenting another’s work as your own. It can include even a paragraph or a sentence that is copied without a reference. Even if you paraphrase you must cite your source

• Szabo (2004)- 50% of British university students believed that using the internet for academically dishonest activities is acceptable

Fundamental Research Issues-chp4

• Variable- any event, situation, behavior or individual characteristic that varies. Any variable must have at least two levels or values pg69

• There are two broad classes of variable: those that vary in quality and those that vary in quantity; for example, gender is a qualitative variable and intelligence is a quantitative variable

• Common variables studied are reaction time, memory, self-esteem, stress etc.

• Discrete variables can take only a finite set of values with no fractional values (sex, political affiliation, number of children). Continuous variables can take any value, including fractional ones (height, weight, some abilities, IQ; IQ does not have to be reported in whole numbers, e.g. 115 1/2)

Fundamental Research Issues

• Operational definition- The set of procedures used to measure or manipulate a variable pg71

• Many measurements are indirect and we infer from them (We do not really measure temperature but the length of a column of mercury and infer temperature from that)

• Pain is a subjective state but we can create measures to infer how much pain someone is experiencing

• Wong-Baker FACES rating scale

• To determine an operational definition we often ask “how does one behave if one possesses that trait?”

• Operational definitions force scientists to discuss abstract concepts in concrete terms and communicate with each other using agreed upon concepts (how good is your operational definition = construct validity)

Relationships between Variables

• The degree to which a test or other measure assesses what it claims to measure is known as construct validity. Does the operational definition reflect the true meaning of the variable? pg71

• Validity which refers to whether you can generalize your results to other populations or situations is known as external validity (generalizability) pg85

Common Threats to Validity

• History--the specific events which occur between the first and second measurement.

• Maturation--the processes within subjects which act as a function of the passage of time, e.g. if the project lasts a few years, most participants may improve their performance regardless of treatment.

• Testing--the effects of being measured may change the behavior or performance of the subject.

• Instrumentation--the changes in the instrument, observers, or scorers which may produce changes in outcomes.

Threats to Validity (continued)

• Statistical regression- Also known as regression to the mean. This threat is caused by the selection of subjects on the basis of extreme scores or characteristics. Give me the forty worst students and I guarantee that they will show immediate improvement right after my treatment

• Selection of subjects--the biases which may result in selection of comparison groups. Randomization (Random assignment) of group membership is a counter-attack against this threat
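The "forty worst students" remark can be illustrated with a small simulation. This is only a sketch under assumed numbers: the class size of 1000, the score scale, and the noise levels are hypothetical and do not come from the slides. No treatment is applied at all, yet the selected group still improves on retest.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n_students=1000, n_selected=40, seed=0):
    """Pick the lowest scorers on test 1 and return their mean on both tests."""
    rng = random.Random(seed)
    # Hypothetical model: a stable "true" ability plus independent noise on each test.
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]
    # Select subjects on the basis of extreme (lowest) first scores.
    worst = sorted(range(n_students), key=lambda i: test1[i])[:n_selected]
    return mean(test1[i] for i in worst), mean(test2[i] for i in worst)

first_mean, retest_mean = simulate_regression_to_mean()
print(f"worst 40 on test 1: {first_mean:.1f}; same students on test 2: {retest_mean:.1f}")
```

Because the extreme first scores were partly bad luck, the second scores drift back toward the class average. A comparison group selected the same way would show the same drift, which is why random assignment matters.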

Relationships Between Variables

• Relationships between variables: 1) Positive Linear Relationship 2) Negative Linear Relationship 3) No Relationship and 4) Curvilinear Relationship pg72

• Positive Linear Relationship- Increases in one variable are accompanied by increases in a second variable

• Negative Linear Relationship- Increases in one variable are accompanied by decreases in a second variable

• No Relationship- Levels of one variable are not related to levels of a second variable

• Curvilinear Relationship- Increases in one variable are accompanied by systematic increases and decreases in a second variable pg73-74

Correlation Coefficient

• Correlation refers to how strongly variables are related to one another

• Correlated variables are those which tend to vary together; Correlation ≠ Causality

• Mexican lemon imports prevent highway deaths; Obesity caused debt bubble

• Others- Pirates cause global warming; Number of radios and number of people in asylums

Correlation-Scatter Plot

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)

• -1 is a perfect negative correlation

The value shows how good the correlation is and if it is positive or negative

The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature (°C)    Ice Cream Sales ($)
14.2                215
16.4                325
11.9                185
15.2                332
18.5                406
22.1                522
19.4                412
25.1                614
23.4                544
18.1                421
22.6                445
17.2                408

Correlation example

• You can easily see that warmer weather leads to more sales. The relationship is good but not perfect: the correlation is 0.9575
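As a check on the ice cream figures, the sketch below computes the correlation from the twelve temperature and sales pairs in the table using the deviation-score form of Pearson's r. The function name `pearson_r` is only illustrative.

```python
from math import sqrt

# The twelve (temperature, sales) pairs from the ice cream table.
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def pearson_r(xs, ys):
    """Pearson correlation: sum of products over the root of the two sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

r = pearson_r(temps, sales)
print(round(r, 4))
```

The result should agree with the value quoted on the slide to about four decimal places.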

There has been a heat wave! It gets so hot that people aren't going near the shop, and sales start dropping.

• The correlation calculation only works well for relationships that follow a straight line. The calculated value of correlation is 0, even though we can see the data follows a nice curve that reaches a peak around 25° C. The correlation calculation is not "smart" enough to see this

• If you make a Scatter Plot, and look at it, you may see more than the correlation value says.

• Make your own scatterplot: http://www.alcula.com/calculators/statistics/scatter-plot/
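The heat-wave figures themselves are not listed here, so the sketch below uses made-up numbers chosen to rise and fall symmetrically around a peak at 25° C. It shows how a clear curvilinear relationship can still produce a Pearson correlation of 0. All data values and the name `pearson_r` are hypothetical.

```python
from math import sqrt

# Hypothetical heat-wave data: sales peak at 25 degrees and fall off on both sides.
temps = [15, 20, 25, 30, 35]
sales = [300, 500, 600, 500, 300]

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

r = pearson_r(temps, sales)
print(r)  # the rising half and the falling half cancel in the sum of products
```

A scatter plot of the same five points would show the curve immediately, which is the reason for looking at the plot as well as the coefficient.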

Random Variation

• Random variability refers to uncertainty in events pg76

• Random Variability-Variability of a process (which is operating within its natural limits) caused by many irregular and erratic (and individually unimportant) fluctuations or chance factors that (in practical terms) cannot be anticipated, detected, identified, or eliminated.

• Research attempts to identify systematic relationships between variables ( reducing random variability)

Dispersion: Sum of Squares

In statistics, statistical dispersion (also called statistical variability or variation) is variability or spread in a variable

(x = deviation of each score from the mean)

Subject    Score X    X^2    x     x^2
1          0          0      -5    25
2          1          1      -4    16
3          2          4      -3    9
4          4          16     -1    1
5          5          25     0     0
6          6          36     1     1
7          7          49     2     4
8          8          64     3     9
9          8          64     3     9
10         9          81     4     16

N = 10, T = ∑X = 50, ∑X^2 = 340, ∑x = 0, SS = ∑x^2 = 90

$S = \sqrt{\dfrac{SS}{N-1}} = ?$
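The sum of squares for the ten scores can be checked in two ways: from the deviations (the definitional route) and from ∑X^2 and T (the computational route). The sketch below assumes that S on the slide is the sample standard deviation, the square root of SS/(N-1). The variable names are illustrative.

```python
from math import sqrt

scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]  # Score X column from the table
n = len(scores)
total = sum(scores)                       # T = 50
mean_score = total / n                    # 5.0

# Definitional route: sum of squared deviations from the mean.
ss_definitional = sum((x - mean_score) ** 2 for x in scores)

# Computational route: sum of X squared minus T squared over N.
ss_computational = sum(x ** 2 for x in scores) - total ** 2 / n

# Sample standard deviation, assuming S = sqrt(SS / (N - 1)).
s = sqrt(ss_definitional / (n - 1))
print(ss_definitional, ss_computational, round(s, 2))
```

Both routes give the same SS, so the computational formula is simply a shortcut that avoids calculating each deviation.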

Experimental vs Nonexperimental Methods

• Nonexperimental methods- Relationships are studied by observation or by measuring the variables of interest directly (recording responses to questions, examining collected data). Much of the data is correlational, e.g. students who work longer hours have lower GPAs. Variables are measured but not manipulated

• Experimental method- Involves direct manipulation and control of variables. The two variables do not just vary together; one variable is introduced to determine how it affects the second variable pg78

Nonexperimental Method

• Two limitations of the nonexperimental method

• 1) We are usually measuring covariation (correlation), which means it is difficult to determine the direction of cause and effect (negative correlation between anxiety and exercise: does anxiety reduce exercise or does exercise reduce anxiety? If exercise reduces anxiety, then starting an exercise program would be a good way to reduce anxiety, but if anxiety causes people to stop exercising, then forcing someone to exercise may not reduce their anxiety)

• 2) We have the problem of a third variable (suppressor variable) pg 78-80 (in the example of anxiety and exercise, a third variable such as higher income may lead to both the lowering of anxiety and an increase in exercise). Another example: industrialization, birth rate, and increase in stork population

Class exercise – Interpret the correlation between shy sons and talkative mothers (r=Positive correlation-Talkative mothers have shy sons)

Third Variable (suppressor) Problem

• Direction of cause and effect is not always crucial. If you are interested in making predictions while unable to manipulate variables, it is still valuable (e.g. Astronomy)

• Example pg 79- Two causal patterns are possible in the correlation of Similarity: Liking

• 1) Similarity causes people to like each other

• 2) Liking causes people to become more similar

• However, a third variable is undesirable because it influences the relationship between the variables that an experimenter is examining (extraneous variable), leaving the interpretation of the relationship unclear (example of research on wine drinking and heart protection)

Confounding variables and Correlation

• One limitation of nonexperimental methods is that measures are indirect (and correlational), making it difficult to determine the direction of cause and effect pg80 (A perceived relationship between an independent variable and a dependent variable that has been misestimated due to the failure to account for a confounding factor is termed a spurious relationship)

• The most common measure of correlation is the Pearson product-moment correlation coefficient (r) http://www.alcula.com/calculators/statistics/correlation-coefficient/

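• $r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$, where $SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{N}$

X = 2, 4, 4, 5, 7, 8; $\sum X = 30$

Y = 5, 9, 9, 11, 15, 17; $\sum Y = 66$

$\sum XY = (2)(5) + (4)(9) + (4)(9) + (5)(11) + (7)(15) + (8)(17) = 378$

$SP = 378 - \dfrac{(30)(66)}{6} = 48$

$r = \dfrac{48}{\sqrt{(24)(96)}} = 1.00$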
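The worked example (X = 2, 4, 4, 5, 7, 8 and Y = 5, 9, 9, 11, 15, 17) can be reproduced with the computational formulas. This is a minimal sketch. The helper name `sum_of_products` is illustrative, and the sums of squares are obtained by applying the same formula to a variable paired with itself.

```python
from math import sqrt

X = [2, 4, 4, 5, 7, 8]
Y = [5, 9, 9, 11, 15, 17]
N = len(X)

def sum_of_products(a, b):
    """Computational formula: sum of a*b minus (sum a)(sum b) / N."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)

SP = sum_of_products(X, Y)    # 378 - (30)(66)/6
SS_X = sum_of_products(X, X)  # sum of squares for X
SS_Y = sum_of_products(Y, Y)  # sum of squares for Y
r = SP / sqrt(SS_X * SS_Y)
print(SP, SS_X, SS_Y, r)
```

The correlation is perfect here because every Y value equals 2X + 1, so all six points fall exactly on one straight line.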


Confounding Variables

• Confounding variable- An extraneous (uncontrolled) variable in a statistical model that correlates (directly or inversely) with both variables being studied pg80

• If you eliminate the confounding variable you eliminate alternative or competing explanations
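One way to see what "eliminating" a confounding variable does is a simulation of the anxiety and exercise example, with income as the third variable. Everything in this sketch is an assumption made for illustration: the data are simulated, exercise and anxiety are built so that neither affects the other, and the first-order partial correlation is used as one standard way of holding income constant. The slides do not prescribe this technique.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with the third variable Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(1)
n = 5000
# Hypothetical model: income raises exercise and lowers anxiety;
# exercise and anxiety have no direct effect on each other.
income = [rng.gauss(0, 1) for _ in range(n)]
exercise = [z + rng.gauss(0, 1) for z in income]
anxiety = [-z + rng.gauss(0, 1) for z in income]

r_xy = pearson_r(exercise, anxiety)
r_partial = partial_r(r_xy, pearson_r(exercise, income), pearson_r(anxiety, income))
print(f"raw r = {r_xy:.2f}, with income held constant r = {r_partial:.2f}")
```

The raw correlation is clearly negative, but it shrinks to roughly zero once income is held constant, so in this simulated world the exercise and anxiety relationship is spurious.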

Correlation

Correlation and Prediction

• Correlation refers to the degree of relationship between two variables

• Regression- (Multiple) regression is a statistical tool used to derive the value of a criterion from several other independent, or predictor, variables. It is the simultaneous combination of multiple factors to assess how and to what extent they affect a certain outcome (y = X1 + X2 + X3 + ... etc.)

• “The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably” -J. Roscoe

• Construct a regression model for predicting student grades, with student grades as the dependent variable (y)
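To show how correlation turns into prediction, the sketch below fits a regression line with a single predictor, reusing the X and Y scores from the Pearson worked example. This is the simple one-predictor case rather than the multiple regression described above. The names `slope`, `intercept`, and `predict` are illustrative, and the slope is taken as SP divided by the sum of squares for X.

```python
X = [2, 4, 4, 5, 7, 8]     # predictor scores
Y = [5, 9, 9, 11, 15, 17]  # criterion scores

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares slope: sum of products divided by the sum of squares of X.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
slope = sp / ss_x
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted criterion score for a new predictor score."""
    return intercept + slope * x

print(slope, intercept, predict(6))
```

With several predictors the same idea applies, but each predictor gets its own weight and the weights are estimated together.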

Latitude is significantly associated with the prevalence of multiple sclerosis: a meta-analysis

• Background There is a striking latitudinal gradient in multiple sclerosis (MS) prevalence, but exceptions in Mediterranean Europe and northern Scandinavia, and some systematic reviews, have suggested that the gradient may be an artefact. The authors sought to evaluate the association between MS prevalence and latitude by meta-regression

• Epidemiologic studies have shown a positive correlation of multiple sclerosis (MS) prevalence with latitude. However, there has not been a causal association found

• In statistics, a meta-analysis refers to methods that focus on contrasting and combining results from different studies, in the hope of identifying patterns among study results

Vitamin D and its immunoregulatory role in multiple sclerosis - Niino M, Drugs Today (Barc). 2010 Apr

• Mapping the distribution of multiple sclerosis (MS) reveals a high prevalence of the disease in high-latitude areas, suggesting a positive relationship between vitamin D and MS. Vitamin D is known to play an important role in bone and mineral homeostasis. It has recently been reported that several types of immune cells express vitamin D receptors and that vitamin D has strong immune-modulating effects. Vitamin D and its analogues inhibited experimental autoimmune encephalomyelitis (EAE, an animal model of MS) and there have been reports of small clinical trials on the treatment of MS with vitamin D.

• Furthermore, there have been discussions on the association between vitamin D levels and MS and about the genetic risk of vitamin D receptor (VDR) gene polymorphisms in MS. The current review discusses the immunological functions of vitamin D, the association between vitamin D and MS and expectations regarding the role of vitamin D in future treatments of MS

Sunlight and vitamin D for bone health and prevention of autoimmune diseases, cancers, and cardiovascular disease - Michael F Holick, Am J Clin Nutr 2004

• Vitamin D is taken for granted and is assumed to be plentiful in a healthy diet. Unfortunately, very few foods naturally contain vitamin D, and only a few foods are fortified with vitamin D. This is the reason why vitamin D deficiency has become epidemic for all age groups in the United States and Europe. Vitamin D deficiency not only causes metabolic bone disease among children and adults but also may increase the risk of many common chronic diseases.

• Solar ultraviolet B photons are absorbed by 7-dehydrocholesterol in the skin, leading to its transformation to previtamin D3, which is rapidly converted to vitamin D3

• Once formed, vitamin D3 is metabolized in the liver to 25-hydroxyvitamin D3 and then in the kidney to its biologically active form, 1,25-dihydroxyvitamin D3. Vitamin D deficiency is an unrecognized epidemic among both children and adults in the United States.

• Although chronic excessive exposure to sunlight increases the risk of nonmelanoma skin cancer, the avoidance of all direct sun exposure increases the risk of vitamin D deficiency, which can have serious consequences.

Vitamin D and multiple sclerosis - Hayes CE et al. Proc Soc Exp Biol Med. 1997 Oct;216(1):21-7

• This theory can explain the striking geographic distribution of MS, which is nearly zero in equatorial regions and increases dramatically with latitude in both hemispheres. It can also explain two peculiar geographic anomalies, one in Switzerland with high MS rates at low altitudes and low MS rates at high altitudes, and one in Norway with a high MS prevalence inland and a lower MS prevalence along the coast.

• Ultraviolet (UV) light intensity is higher at high altitudes, resulting in a greater vitamin D3 synthetic rate, thereby accounting for low MS rates at higher altitudes. On the Norwegian coast, fish is consumed at high rates and fish oils are rich in vitamin D3.

Experimental Method

• The experimental method reduces ambiguity by manipulating one variable and measuring the other

• Example in Exercise and Anxiety: One group exercises daily for a week and another group does not exercise (experimental vs control group). Anxiety would then be measured (discuss limits of this design) pg81

• The experimental method attempts to eliminate the influence of potentially confounding variables by controlling all aspects of the experiment except the manipulated variable (everything else is held constant) and by ensuring that any variables that cannot be held constant are variables whose effects are random (random variables) (give example)

Randomization

• The number of potential confounding variables is infinite, but the experimental method attempts to deal with this problem through randomization, which ensures that an extraneous confounding variable is as likely to affect one group as it is the other. Any variable that cannot be held constant can be controlled by randomization pg82

• Example: If an experiment is conducted over several days, the researcher can schedule the various experimental conditions in a random order (or can use a crossover) so that one group is not consistently studied in the morning or the afternoon

Random assignment

• The thing that makes random assignment so powerful is that it greatly decreases systematic error, that is, error that varies with the independent variable

• Extraneous variables that vary with the levels of the independent variable are the most dangerous type in terms of challenging the validity of experimental results. These types of extraneous variables have a special name: confounding variables. For example, instead of randomly assigning students, the instructor may test the new strategy in the gifted classroom and test the control strategy in a regular class. Clearly, ability would most likely vary with the levels of the independent variable. In this case pre-knowledge would become a confounding extraneous variable
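As a rough illustration of the classroom example, the sketch below randomly assigns students to an experimental and a control condition instead of using intact classes. The helper name `randomly_assign`, the student labels and the fixed seed are assumptions made only for this example.

```python
import random


def randomly_assign(participants, conditions=("experimental", "control"), seed=None):
    """Shuffle participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)       # chance alone decides the order
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical roster of 20 students
students = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(students, seed=42)
for condition, members in groups.items():
    print(condition, len(members), members[:3])
```

Because chance alone decides who lands in which group, ability and pre-knowledge are as likely to be high in one group as in the other, so they no longer vary systematically with the independent variable.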

Independent and Dependent Variables

• In research the variables are believed to have a cause and effect relationship, so that one variable is considered the cause (independent variable) while the other is considered the effect (dependent variable) pg83

• The independent variable is manipulated while the dependent variable is measured

• The independent variable is manipulated by the experimenter and the subject has no control over it (what the subject does is dependent on the variable manipulated by the experimenter). What are the independent and dependent variables in the class article? What are the operational definitions of terms in the study?

Internal and External Validity

• Validity concerns the extent to which you are measuring what you claim to be measuring

• Internal validity is a property of scientific studies which reflects the extent to which a causal conclusion based on a study is warranted, and requires three elements pg85

• Temporal precedence-The causal variable (independent) is manipulated and the effect is observed/measured on the dependent variable

• Covariation-There must be some covariation between the two variables which is shown when subjects show some effect different than the control conditions

• Alternative explanations are eliminated (which means that confounding variables are eliminated or controlled)

• External validity refers to what extent the results can be generalized aka Generalizability

• Can the results of a study be replicated with other operational definitions, different subjects, different settings

• Researchers most interested in internal validity, establishing a relationship between two variables, may more likely conduct the study in a lab setting with a restricted sample while a researcher more interested in external validity might conduct a nonexperimental design with a more diverse sample

External Validity

Laboratory vs Field Experiments

• Lab experiments require a high degree of control, but the setting may be too artificial and may limit the answering of some questions or the generality of results

• In field experiments the independent variable is manipulated in a natural setting (see study pg87 top: a confederate either coughs or does not cough on passersby, who are then asked to rate their perceived risk of contracting a serious disease or having a heart attack)

• While it is more difficult to eliminate extraneous and confounding variables in field studies there is less danger of artificiality limiting the conclusions drawn from the study

Ethical and Practical Considerations

• In certain cases experimentation is unethical or impractical (e.g. child rearing practices) and variables are observed and measured as they occur

• When certain social variables are studied, people are frequently categorized into groups based on their experience. For example, in studying corporal punishment, groups were formed by who was spanked and who was not as a child: an ex post facto (after the fact) design. Since no random assignment was made, this would not be an experimental design pg88

Variables and Describing and Predicting Behavior

• Subject variables are characteristics of the subjects such as age, gender, ethnic group (categorical) and are nonexperimental by nature

• Since a major goal is to describe behavior, studies can be conducted with simple observations and manipulations (examples: Piaget, and Buss’ study (2007) describing the reasons people reported having sex) pg88

• Multiple methods: Since no study is a perfect test of a hypothesis, multiple studies using multiple methods with similar conclusions increase our confidence in the findings pg89

Statistical Procedures in Measurement

• Good research is inevitably dependent on measurement

• Measurement devices or tests have at least three essential attributes

• Standardization: the test is administered to a well-defined group, and their performance represents the norm (norm group). Standardization often includes the use of standard scores (z scores, T scores, etc., discussed in a later section)

• Validity-A test is valid when it measures what it is intended to measure

• Reliability-refers to the test’s precision in measuring

The problem of standardization - Diagnostic CT scans: assessment of patient, physician, and radiologist awareness of radiation dose and possible risks - Lee, CL et al. Radiology. 2004 May;231(2):393-8. Epub 2004 Mar 18

• PURPOSE: To determine the awareness level concerning radiation dose and possible risks associated with computed tomographic (CT) scans among patients, emergency department (ED) physicians, and radiologists.

• MATERIALS AND METHODS:

• Adult patients seen in the ED of a U.S. academic medical center during a 2-week period with mild to moderate abdominopelvic or flank pain and who underwent CT were surveyed after acquisition of the CT scan. Patients were asked whether or not they were informed about the risks, benefits, and radiation dose of the CT scan and if they believed that the scan increased their lifetime cancer risk. Patients were also asked to estimate the radiation dose for the CT scan compared with that for one chest radiograph. ED physicians who requested CT scans and radiologists who reviewed the CT scans were surveyed with similar questions and an additional question regarding the number of years in practice. The chi-square (χ²) test of independence was used to compare the three respondent groups regarding perceived increased cancer risk from one abdominopelvic CT scan.

• RESULTS:

• Seven percent (five of 76) of patients reported that they were told about risks and benefits of their CT scan, while 22% (10 of 45) of ED physicians reported that they had provided such information. Forty-seven percent (18 of 38) of radiologists believed that there was increased cancer risk, whereas only 9% (four of 45) of ED physicians and 3% (two of 76) of patients believed that there was increased risk (χ²(2) = 41.45, P < .001). All patients and most ED physicians and radiologists were unable to accurately estimate the dose for one CT scan compared with that for one chest radiograph.

• CONCLUSION:

• Patients are not given information about the risks, benefits, and radiation dose for a CT scan. Patients, ED physicians, and radiologists alike are unable to provide accurate estimates of CT doses regardless of their experience level
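The chi-square statistic reported in the results can be checked from the counts in the abstract. The sketch below builds the 3 x 2 table (group by whether increased cancer risk was perceived) and computes the Pearson chi-square by hand. The function name `chi_square_independence` is an illustrative assumption, and the "no" counts are derived here by subtraction from the group sizes.

```python
def chi_square_independence(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if independent
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: radiologists, ED physicians, patients
# Columns: believed risk increased, did not (derived as group size minus "yes")
observed = [
    [18, 38 - 18],
    [4, 45 - 4],
    [2, 76 - 2],
]
chi2, df = chi_square_independence(observed)
print(f"chi-square({df}) = {chi2:.2f}")
```

The result agrees with the published value of 41.45 on 2 degrees of freedom, which suggests the test compared the three groups on the yes/no belief question.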

Radiation Dose Associated With Common Computed Tomography Examinations and the Associated Lifetime Attributable Risk of Cancer

Rebecca Smith-Bindman, Arch Intern Med. 2009;169(22):2078-2086

• Background Use of computed tomography (CT) for diagnostic evaluation has increased dramatically over the past 2 decades. Even though CT is associated with substantially higher radiation exposure than conventional radiography, typical doses are not known. We sought to estimate the radiation dose associated with common CT studies in clinical practice and quantify the potential cancer risk associated with these examinations.

• Methods We conducted a retrospective cross-sectional study describing radiation dose associated with the 11 most common types of diagnostic CT studies performed on 1119 consecutive adult patients at 4 San Francisco Bay Area institutions in California between January 1 and May 30, 2008. We estimated lifetime attributable risks of cancer by study type from these measured doses.

• Results Radiation doses varied significantly between the different types of CT studies. The overall median effective doses ranged from 2 millisieverts (mSv) for a routine head CT scan to 31 mSv for a multiphase abdomen and pelvis CT scan. Within each type of CT study, effective dose varied significantly within and across institutions, with a mean 13-fold variation between the highest and lowest dose for each study type. The estimated number of CT scans that will lead to the development of a cancer varied widely depending on the specific type of CT examination and the patient's age and sex. An estimated 1 in 270 women who underwent CT coronary angiography at age 40 years will develop cancer from that CT scan (1 in 600 men), compared with an estimated 1 in 8100 women who had a routine head CT scan at the same age (1 in 11 080 men). For 20-year-old patients, the risks were approximately doubled, and for 60-year-old patients, they were approximately 50% lower.

• Conclusion Radiation doses from commonly performed diagnostic CT examinations are higher and more variable than generally quoted, highlighting the need for greater standardization across institutions.

Measurement Concepts Chp 5

• Reliability refers to the consistency, precision or stability of a measure of behavior pg96. Are the results the same or very similar each time you measure a variable?

• Measures that change or fluctuate are not reliable (assuming the change is not due to the variable itself changing)

• Any measure has two parts: 1) the true score, the real value of the variable, and 2) measurement error, which shows up as greater variability in the scores

• Researchers cannot use unreliable measures (Duh!)

• Reliability is increased when we increase the number of items in our measure, survey or test

Measuring Reliability

• We can measure reliability using the Pearson product-moment correlation coefficient pg98

• To calculate reliability we must have at least two scores on the measure across individuals. If the measure is reliable, the two scores should be similar for each of the individuals studied (a high positive correlation; for most measures the coefficient should be at least .80) pg98

• Types of Reliability

• 1) Test-Retest: Measure the same individuals at two or more points in time, then calculate the Pearson product-moment r between the scores. Test-retest reliability is sometimes called a coefficient of stability because it measures how stable the measured trait is. (Discuss some threats to validity for this measure.) It is not a good measure for traits that are considered to be in a state of flux, or when events occurring between the two administrations of the test affect the scores
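A small sketch of a test-retest check, assuming made-up scores for eight people measured at two points in time. The function `pearson_r` implements the standard product-moment formula, and the .80 cutoff is the rule of thumb quoted above.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same eight people at time 1 and time 2
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}")
print("acceptable" if r >= 0.80 else "too low for most measures")
```

The same function applies to equivalent-forms reliability: the two lists would then hold scores on Form A and Form B rather than scores from two occasions.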

Measuring Reliability

• 2) Equivalent Form: Can avoid problems associated with Test-Retest by giving equivalent forms of the same test to the same set of people and calculating the correlation between the two scores. You can administer the two tests close in time (something you cannot do with Test-Retest).

• However to the extent that the two forms are not totally equivalent a new source of error is introduced. Equivalent forms usually yield lower estimates of reliability than Test-Retest (why?) see next slide with two forms of Rey Complex Figure

Rey Complex Figures Form A & B

Measuring Reliability

• Split-Half Reliability: The test is administered once, then split in half; the halves are scored separately and a Pearson r is calculated between the two half scores

• Split-Half-correlation between the first and second half of the measurement

• Odd-Even correlation between the even items & odd items of a measurement

• In either case only one administration is required and the coefficient is determined by the internal components of the test (aka internal consistency reliability)

• Split-half is not meaningful for speed tests (in which most items are not difficult and the score depends on how many items are answered correctly, e.g. algebra test); the coefficient of reliability is inflated

• Item-Total correlations-Look at the correlation between each item score with the total score, based on all items (also measures internal consistency)

• Cronbach’s alpha is a coefficient of internal consistency. Conceptually it averages the possible split-half coefficients, and it is a function of the number of test items and the average inter-correlation among the items pg99-100
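The internal-consistency ideas above can be sketched with a tiny made-up data set of six respondents answering four items. The code computes an odd-even correlation and Cronbach's alpha using the usual variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The data and function names are assumptions for illustration only.

```python
import math
from statistics import variance  # sample variance (n - 1 denominator)


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )


def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings: rows are respondents, columns are items 1-4
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 4],
]

odd_half = [row[0] + row[2] for row in responses]    # items 1 and 3
even_half = [row[1] + row[3] for row in responses]   # items 2 and 4
odd_even_r = pearson_r(odd_half, even_half)
alpha = cronbach_alpha(responses)
print(f"odd-even r = {odd_even_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```

Both figures come from a single administration of the test, which is what distinguishes internal-consistency estimates from test-retest and equivalent-forms estimates.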

Interrater Reliability

• Applies to research in which raters observe behaviors and make ratings or judgments; the raters' judgments are compared, and the extent to which they agree determines interrater reliability

• Bandura (1961) conducted a study to investigate whether social behaviors (i.e. aggression) can be acquired by imitation. 36 boys and 36 girls from the Stanford University Nursery School, aged 3 to 6 years, were tested. The role models were one male adult and one female adult

• Under controlled conditions, Bandura arranged for 24 boys and girls to watch a male or female model behaving aggressively towards a toy called a 'Bobo doll'. The adults attacked the Bobo doll in a distinctive manner: they used a hammer in some cases, and in others threw the doll in the air and shouted "Pow, Boom". Another 24 children were exposed to a non-aggressive model, and the final 24 children were used as a control group and not exposed to any model at all.

• To test the inter-rater reliability of the observers, 51 of the children were rated by two observers independently and their ratings compared. These ratings showed a very high reliability correlation (r = 0.89), which suggested that the observers had good agreement about the behavior of the children

https://www.youtube.com/watch?v=hHHdovKHDNU

Construct Validity of Measures pg101

• Construct Validity is concerned with whether our method of studying variables is accurate (is our operational definition valid?); also see pg90. Does our method actually measure the construct it was intended to measure?

Measures of (construct) Validity / Valid=True

• Construct Validity

• Refers to the accuracy of our measurements and operational definitions. Indicators of construct validity ask: is our method of measuring a variable accurate?

• Face Validity-The item appears to accurately measure the variable defined. Appearance is not sufficient to conclude that a measure is accurate. Some measures, such as surveys in popular magazines have questions that may look reasonable (have face validity) but tell you very little-Cosmopolitan Surveys

1) What Guys Secretly Think of Your Hair & Makeup: The truth revealed!
2) 20 Dresses He Will Love
3) What He Thinks When He Walks Through Your Door
4) 7 Facebook Habits that Guys Hate
5) 78 Ways to Turn Him On
6) The Secret to Getting Any Guy
7) How to be a Total Man Magnet
8) Sexy Summer Hair Ideas
9) Meet a New Guy by Summer!
10) How to Decode His Body Language

http://www.cosmopolitan.co.uk/quizzes/how-hot-headed-are-you-quiz

Little if any empirical evidence exists to support the conclusions in these articles

Content Validity: How well does the content of a test sample the situations about which conclusions are drawn? It requires some expertise to define a “universe of interest”, careful drawing of a sample of ideas from this universe, and the preparation of test items that match these ideas. Compare the content of the measure with the universe of content that defines that construct pg103. (For example, the content of the SAT Subject Tests™ is evaluated by committees made up of experts who ensure that each test covers content that matches all relevant subject matter in its academic discipline.)

Both face validity and content validity focus on determining if the content of a measure reflects the meaning of the construct measured


Validity continued

• Content Validity: Statistical methods may be applied to help determine content validity. A test constructor may compute the correlation between the score on each item and the score on the total test. Test items that are not consistent with the total are either revised or eliminated

• Predictive Validity (a type of criterion validity): A measure is used to predict performance, so one measure occurs earlier than the other (e.g. LSAT scores and later performance in law school)

• Concurrent Validity applies to validation studies in which the two measures are administered at approximately the same time (for example, an employment test may be administered to a group of workers and then the test scores can be correlated with the ratings of the workers' supervisors taken on the same day or in the same week. The resulting correlation would be a concurrent validity coefficient) pg104

• Concurrent validity and predictive validity are two types of criterion-related validity in which scores are correlated or measured against an external criterion. The difference between concurrent validity and predictive validity rests solely on the time at which the two measures are administered.

Validity continued

• Convergent Validity: Defines how well one set of scores on a measure is related to another set of scores measuring the same or similar concepts

• Measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other

• Discriminant Validity: Measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other pg104 (compare Convergent and Discriminant Validity to differential diagnosis)

• Convergent and discriminant validity are both considered subcategories or subtypes of construct validity - neither one alone is sufficient for establishing construct validity

• Imagine you are under the assumption that those that would buy your product again are satisfied, as that would be what is expected. Testing for convergent validity in a survey may look like this:

• Question 1: Would you buy product X again if given the chance?

• Question 2: How satisfied are you with product X?

• If they say yes to the first question, but they do not score the product very highly in the second question, the question may have failed the validity test

Validity continued

• Divergent validity is designed to see if you get the expected opposite result, because that should also help show that the question is being answered in the way you intended. For example:

• Question 1: Do you wish you did not own product X?

• Question 2: Would you buy product X again if given the chance?

• If they answered yes for the first question, and yes for the second question, it would imply that the question was too confusing, because you did not receive the opposite response you expected. This would be divergent validity

• A major impetus to the study of validity was provided a half century ago by Campbell & Fiske (1959), who introduced the multitrait-multimethod (MTMM) matrix as a means for construct validation. The MTMM method can be used when multiple traits are examined simultaneously and each of them is assessed by a given set of measures or measurement methods (e.g., Eid, 2000; Marsh & Hocevar, 1983). As shown initially by Campbell and Fiske, and further elaborated by subsequent authors, two types of validity coefficients are of special interest when the MTMM matrix is utilized in the validation process—convergent validity and discriminant validity coefficients.

• Reactivity: A measure is reactive if awareness of being measured changes an individual's behavior. Which threat to validity is this?

• History? Maturation? Testing? Selection (of subjects)? Regression?

Relationship between Reliability and Validity

• Validity is the extent to which a test measures what it is supposed to measure while reliability is how well it measures the variable(s)

• You can have reliability without validity but you cannot have validity without reliability

Association of Faculties of Medicine of Canada (AFMC)

• Validity of concepts such as illness or disease

• Cultural conventions affect where the boundary between disease and non-disease is placed: menopause may be considered a health issue in North America, but symptoms are far less commonly reported in Japan.

• Improvements in health have not reduced the demands on doctors. Instead, doctors are called on to broaden the scope of what they treat. Conditions, previously not regarded as medical problems, such as hyperactivity in children, infertility in young couples, weight gain in middle-aged adults, or the various natural effects of aging, now commonly lead patients to consult their doctor; the list is likely to expand.

Validity of Diagnostic Labels

• Non-Disease?

• In 2002, the British Medical Journal stimulated a debate over the appropriate expectations to place on doctors and on how to define the limits of medicine. Richard Smith, editor of the Journal, surveyed readers to collect examples of non-diseases, and found almost two hundred.

• He defined non-disease in terms of "a human process or problem that some have defined as a medical condition but where people may have better outcomes if the problem or process was not defined in that way." Examples include burnout, chemical sensitivity, genetic deficiencies, senility, loneliness, bags under the eyes, work problems, baldness, freckles, and jet lag.

• Smith’s purpose was to emphasize that disease is a fluid concept with no clear boundaries. He noted various dangers in being over-inclusive in defining disease:

• when people are diagnosed with a disease and become patients they could be denied insurance, lose their job, have their body invaded in the name of therapy, or be otherwise stigmatised.

• The debate is covered in the British Medical Journal, April 13, 2002; vol. 324: pages 859-866 and 883-907.

Measures of Validity (continued)

• Predictive Validity-extent to which a score on a scale or test predicts scores on some criterion measure

• Predictive Validity Concerns tests that are intended to predict future performance (GRE, LSAT). The construct validity of the measure is shown if it predicts future behavior

False Positives-False Negatives

• Biomedical Research Imaging Center at the University of North Carolina at Chapel Hill School of Medicine - Etta Pisano

• American Cancer Society issued new guidelines that recommend an annual MRI screen in addition to an annual mammography for women at high risk of breast cancer.

• But, because the false-positive rate of MRIs was relatively high (about 11 percent in the new study), the authors don't recommend MRI as a screening tool for the general population.

• National Cancer Institute: Even though breast cancer is the most common noncutaneous cancer in women, fewer than 5 per 1,000 women actually have the disease when they are screened. Therefore, even with a specificity of 90%, most abnormal mammograms are false-positives
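The National Cancer Institute point can be worked through numerically. The sketch below uses the prevalence (5 per 1,000) and specificity (90%) quoted above; the slide gives no sensitivity, so a best-case sensitivity of 100% is assumed here purely for illustration. The function name is also an assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive (abnormal) results that are true positives."""
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


prevalence = 5 / 1000   # from the slide: fewer than 5 per 1,000 screened women
specificity = 0.90      # from the slide
sensitivity = 1.0       # ASSUMPTION: best case, every cancer is detected

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
print(f"PPV = {ppv:.1%}")                    # PPV = 4.8%
print(f"false positives = {1 - ppv:.1%} of abnormal mammograms")
```

Even under this best-case assumption, fewer than 1 in 20 abnormal results reflects actual disease, because the 10% false-positive rate applies to the roughly 995 in 1,000 women who do not have cancer.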

Effectiveness of Positron Emission Tomography for the Detection of Melanoma Metastases - Holder W et al. Annals of Surgery Vol. 227, No. 5, 764-771, 1998

• The purpose of this study was to determine the sensitivity, specificity, and clinical utility of 18F 2-fluoro-2-deoxy-D-glucose (FDG) total body positron emission tomography (PET) scanning for the detection of metastases in patients with malignant melanoma (melanoma causes the majority (75%) of deaths related to skin cancer).

• Introduction-Recent preliminary reports suggest that PET using FDG may be more sensitive and specific for detection of metastatic melanoma than standard radiologic imaging studies using computed tomography (CT). PET technology is showing utility in the detection of metastatic tumors from multiple primary sites including breast, lung, lymphoma, and melanoma. However, little information is available concerning the general utility, sensitivity, and specificity of PET scanning of patients with metastatic melanoma.

• Methods: One hundred three PET scans done on 76 nonrandomized patients having AJCC (American Joint Committee on Cancer) stage II to IV melanoma were prospectively evaluated. Patients were derived from two groups. Group 1 (63 patients) had PET, CT (chest and abdomen), and magnetic resonance imaging (MRI; brain) scans as a part of staging requirements for immunotherapy protocols. Group 2 (13 nonprotocol patients) had PET, CT, and MRI scans as in group 1, but for clinical evaluation only. PET scans were done using 12 to 20 mCi of FDG given intravenously. Results of PET scans were compared to CT scans and biopsy or cytology results.

Effectiveness of PET tumor detection

• Malignant tumors generally have greater rates of glucose utilization and overall metabolism than normal tissues. FDG is a glucose analogue that is taken up by rapidly dividing cells.

• Most melanomas are rapid users of glucose; in fact, melanoma cells in vitro demonstrate a higher FDG uptake than any other tumor type.

• PET scanning uses tracers that emit positrons (positively charged electrons) that are very short-lived. They are produced in medical cyclotrons or accelerators to be used quickly after preparation. The half-life of 18F is 109 minutes.

• Positrons rapidly combine with negative electrons and are annihilated. This process produces a pair of 511-keV photons emitted 180° to one another that are then detected by the PET scanner. A computer then processes the images so that they can be viewed as multiple-plane images.

PET False Positives False Negatives

• False negatives occur in 1) patients who have hyperglycemia and 2) tumors that are slow-growing or have a large necrotic component, which may have decreased FDG uptake.

• False positives are caused by 1) urinary excretion of the isotope: administered radioiodine is excreted mainly by the urinary system, and so all dilations, diverticula and fistulae of the kidney, ureter and bladder may produce radioiodine retention (Shapiro, Rufini et al. 2000); and 2) patients who are unusually muscular or have an increased resting muscle tone, who take up FDG at a much higher rate than persons with relaxed musculature.

• Back to the study- The purpose of this study was to determine prospectively the sensitivity, specificity, and clinical utility of FDG total body PET scanning for the detection of metastases in patients with malignant melanoma by comparing PET to double-contrast CT scans and histologically or cytologically correlating these findings.

Effectiveness of PET in Melanoma Detection

• Methods (continued)

• Sensitivity was defined as the proportion of patients with metastatic melanoma who had a positive PET scan.

• Specificity was defined as the proportion of patients who did not have metastatic melanoma who had a negative PET scan

• FDG was synthesized using the Siemens RDS negative ion cyclotron and CPCU automated chemistry module. 18-Fluorine as fluoride was produced using a proton-neutron reaction on 95% enriched 18-oxygen water. 18-Fluorine-FDG was synthesized in the CPCU using the modified Hamacher synthesis (mannose triflate/18F-fluoride reaction). The product was delivered pure, sterile, and in an injectable form. Each lot of 18-Fluorine-FDG was analyzed to confirm radionuclide, radiochemical, and chemical purity as well as sterility and pyrogenicity. The product conformed with United States Pharmacopeia monograph standards. Huh?

• Results

• The accuracy of CT scanning for melanoma lung metastases was equivalent to that of PET scanning. However, PET scanning was superior to CT scanning in identifying melanoma metastases to regional and mediastinal lymph nodes, liver, and soft tissues. (The mediastinum is the cavity that separates the lungs from the rest of the chest. It contains the heart, esophagus, trachea, thymus, and aorta)

Results (continued)

• Measure | PET | CT
• Total scans | 103 | 92
• Evaluable scans | 100 | 92
• True-positive scans | 49 | 26
• False-positive scans | 8 | 7
• True-negative scans | 40 | 38
• False-negative scans | 3 | 21
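Applying the sensitivity and specificity definitions from the methods to the scan counts in the table gives the comparison below. This is a sketch computed per evaluable scan from the tabulated numbers. The function and variable names are illustrative assumptions.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of scans with metastases present that were read as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of scans without metastases that were read as negative."""
    return true_neg / (true_neg + false_pos)


# Counts copied from the results table
counts = {
    "PET": {"tp": 49, "fp": 8, "tn": 40, "fn": 3},
    "CT": {"tp": 26, "fp": 7, "tn": 38, "fn": 21},
}

results = {}
for modality, c in counts.items():
    results[modality] = (
        sensitivity(c["tp"], c["fn"]),
        specificity(c["tn"], c["fp"]),
    )
    sens, spec = results[modality]
    print(f"{modality}: sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

The two methods have similar specificity, but CT misses far more metastases (21 false negatives against 3), which is what drives the gap in sensitivity.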

Discussion • CT scanning is widely used for the detection of metastases in a variety of

malignant neoplasms, including melanoma. The primary value of CT scanning is the clear delineation of anatomic detail. A particular problem with CT scanning is that small lymph nodes or small metastases may not be detectable or may appear to be of normal size and configuration, while enlarged nodes and other masses may be due to inflammation and nonmalignant processes. These findings contribute to both the false-positive and false negative rates reported for CT scans. CT scanning for detection of both primary and metastatic disease in the lung is generally very good for lesions in the lung parenchyma.

• PET scanning as currently done does not reveal the anatomic detail of CT scanning. However, imaging of even extreme anatomic detail often cannot discern benign from malignant processes, particularly with smaller 1 cm lesions. The value of PET scanning lies in the visualization of the high metabolic activity of rapidly growing tumors such as melanoma. With close clinical correlation and tissue confirmation, PET scanning is an extremely useful tool to evaluate high-risk melanoma patients for the development of metastases.

• Conclusion: PET is superior to CT in detecting melanoma metastases and has a role as a primary strategy in the staging of melanoma.

Accuracy and reliability of forensic latent fingerprint decisions

• The criminal justice system relies on the skill of latent print examiners as expert witnesses. Currently, there is no generally accepted objective measure to assess the skill of latent print examiners

• The interpretation of forensic fingerprint evidence relies on the expertise of latent print examiners. The National Research Council of the National Academies and the legal and forensic sciences communities have called for research to measure the accuracy and reliability of latent print examiners’ decisions. Here, we report on the first large-scale study of the accuracy and reliability of latent print examiners’ decisions, in which 169 latent print examiners each compared approximately 100 pairs of latent and exemplar fingerprints from a pool of 744 pairs.

• Latent prints (“latents”) are friction ridge impressions (fingerprints, palmprints, or footprints) left unintentionally on items such as those found at crime scenes. Exemplar prints (“exemplars”), generally of higher quality, are collected under controlled conditions from a known subject using ink on paper or digitally with a livescan device. Latent print examiners compare latents to exemplars, using their expertise rather than a quantitative standard to determine if the information content is sufficient to make a decision.

Source: Ulery, B. et al., Proceedings of the National Academy of Sciences of the United States of America (PNAS), March 2011

Accuracy and reliability of forensic latent fingerprint decisions

• Latent print examination can be complex because latents are often small, unclear, distorted, smudged, or contain few features; can overlap with other prints or appear on complex backgrounds; and can contain artifacts from the collection process. Because of this complexity, experts must be trained in working with the various difficult attributes of latents

• Five examiners made false positive errors for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error for an overall false negative rate of 7.5%. Independent examination of the same comparisons by different participants (analogous to blind verification) was found to detect all false positive errors and the majority of false negative errors in this study. Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion.

Types of Variables

• Discrete vs. Continuous

• A discrete variable is one with a well defined finite set of possible values, called states. Examples are: the number of dimes in a purse, a statement which is either “true” or “false”, which party will win the election, the country of origin, voltage output of a digital device, and the place a roulette wheel stops.

• A continuous variable is one which can take on a value between any other two values, such as: indoor temperature, time spent waiting, water consumed, color wavelength, and direction of travel. A discrete variable corresponds to a digital quantity, while a continuous variable corresponds to an analog quantity

Variables and Measurement Scales

• We want to determine if there is a relationship between our independent variable (chosen and/or manipulated by the experimenter) and the dependent variable (a measure of some aspect or behavior of our subjects)

• Four Kinds of Measurement Scales

• Nominal scales: When measuring using a nominal scale, one simply names or categorizes responses (nominal variables are categorical). Gender, handedness, favorite color, and religion are examples of variables measured on a nominal scale. The essential point about nominal scales is that they do not imply any ordering among the responses. For example, when classifying people according to their favorite color, there is no sense in which green is placed "ahead of" blue. Responses are merely categorized. Nominal scales embody the lowest level of measurement. In an experiment the independent variable is often a nominal or categorical variable (pg 106)

• Example (pg 107): Group 1 participated in meditation and Group 2 did not. All subjects underwent MRI. The independent variable was participation/no participation, a nominal (categorical) variable

Variables and Measurement Scales

• Ordinal scales allow us to rank order the levels of the variable (category) being studied. However, nothing is specified about the magnitude of the interval between two measures, so in a rank order no particular value is attached to the intervals between numbers (horse race: first, second, third)

• Ordinal scales fail to capture important information that will be present in the other scales we examine. In particular, the difference between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels. In a satisfaction scale ranking a customer's satisfaction with a product, the difference between the responses "very dissatisfied" and "somewhat dissatisfied" is probably not equivalent to the difference between "somewhat dissatisfied" and "somewhat satisfied."

• Example (pg 107): a movie rating system from one to four checks

Variables and Measurement Scales

• Interval scales are numerical scales in which the intervals between the numbers are equal in size and so have the same interpretation throughout. As an example, consider the Fahrenheit scale of temperature. The difference between 30 degrees and 40 degrees represents the same temperature difference as the difference between 80 degrees and 90 degrees. This is because each 10-degree interval has the same physical meaning. However, there is no absolute zero on the scale (in this case the zero does not indicate an absence of temperature but is only an arbitrary reference point) (pg 107)

• Since an interval scale has no true zero point, it does not make sense to compute ratios of temperatures. For example, there is no sense in which the ratio of 40 to 20 degrees Fahrenheit is the same as the ratio of 100 to 50 degrees; no interesting physical property is preserved across the two ratios. It does not make sense to say that 80 degrees is "twice as hot" as 40 degrees
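One way to see why ratios are not meaningful on an interval scale is to re-express the same two temperatures in other units. This is a small illustrative sketch; the helper names are made up for the example, and the conversion formulas are the standard ones.

```python
def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0

def fahrenheit_to_kelvin(f):
    # Kelvin has a true zero, so it behaves as a ratio scale.
    return fahrenheit_to_celsius(f) + 273.15

hot_f, cool_f = 80.0, 40.0

ratio_f = hot_f / cool_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cool_f)
ratio_k = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(cool_f)

# Differences keep their meaning on an interval scale:
# a 10-degree Fahrenheit step is the same size anywhere on the scale.
step_low = fahrenheit_to_celsius(40.0) - fahrenheit_to_celsius(30.0)
step_high = fahrenheit_to_celsius(90.0) - fahrenheit_to_celsius(80.0)

print(f"ratio in Fahrenheit: {ratio_f:.2f}")
print(f"ratio in Celsius:    {ratio_c:.2f}")
print(f"ratio in Kelvin:     {ratio_k:.2f}")
print(f"10 F step in Celsius: {step_low:.3f} vs {step_high:.3f}")
```

The "twice as hot" ratio of 2 in Fahrenheit becomes about 6 in Celsius and about 1.08 in Kelvin, so the ratio depends on where the arbitrary zero sits, while the size of a 10-degree step does not.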

Variables and Measurement Scales

• Ratio scales: The ratio scale of measurement is the most informative scale. It is an interval scale with the additional property that its zero position indicates the absence of the quantity being measured. Ratio scales often include physical measures such as length, weight, or time (since ratios are allowed, you can say someone is twice as fast or slow as someone else) (pg 108)

• With interval and ratio scales you can make quantitative distinctions that allow you to talk about amounts of the variable

• Since money has a true zero point, it makes sense to say that someone with 50 cents has twice as much money as someone with 25 cents (weight, time, length are also ratio scale measures)

• Since many variables in behavioral science are less precise, ratio scales are often not achieved. However, since the statistical tests for interval and ratio variables are the same, the real question becomes whether you can achieve an interval scale of measurement for your study so that you can use (usually) more powerful statistical tests

Cramped Synchronized General Movements in Preterm Infants as an Early Marker for Cerebral Palsy

Source: Ferrari, F. Arch Pediatr Adolesc Med. 2002

• Objective To ascertain whether specific abnormalities (ie, cramped synchronized general movements [GMs]) can predict cerebral palsy and the severity of later motor impairment in preterm infants affected by brain lesions.

• Design Traditional neurological examination was performed, and GMs were serially videotaped and blindly observed for 84 preterm infants with ultrasound abnormalities from birth until 56 to 60 weeks' postmenstrual age. The developmental course of GM abnormalities was compared with brain ultrasound findings alone and with findings from neurological examination, in relation to the patient's outcome at age 2 to 3 years.

Cramped Synchronized General Movements in Preterm Infants as an Early Marker for Cerebral Palsy

• An early prediction of cerebral palsy will lead to earlier enrollment in rehabilitation programs. Unfortunately, reliable identification of cerebral palsy in very young infants is extremely difficult [10]. It is generally reported that cerebral palsy cannot be diagnosed before several months after birth [11-15] or even before the age of 2 years [16].

• A so-called silent period, lasting 4 to 5 months or more, and a period of uncertainty until the turning point at 8 months of corrected age have also been identified [12-13]. The neurological symptoms observed in the first few months after birth in preterm infants who will develop cerebral palsy are neither sensitive nor specific enough to ensure reliable prognoses.

• Irritability, abnormal finger posture, spontaneous Babinski reflex [17-18], weakness of the lower limbs [19], transient abnormality of tone [12-13, 20-24], and delay in achieving motor milestones [11] are some of the neurological signs that have been described in these high-risk preterm infants

Early Marker for Cerebral Palsy continued

• Results Infants with consistent or predominant (33 cases) cramped synchronized GMs developed cerebral palsy. The earlier cramped synchronized GMs were observed, the worse was the neurological outcome. Transient cramped synchronized character GMs (8 cases) were followed by mild cerebral palsy (fidgety movements were absent) or normal development (fidgety movements were present). Consistently normal GMs (13 cases) and poor repertoire GMs (30 cases) either led to normal outcomes (84%) or cerebral palsy with mild motor impairment (16%). Observation of GMs was 100% sensitive, and the specificity of the cramped synchronized GMs was 92.5% to 100% throughout the age range, which is much higher than the specificity of neurological examination.

• Conclusions Consistent and predominant cramped synchronized GMs specifically predict cerebral palsy. The earlier this characteristic appears, the worse is the later impairment

Observational Methods Chp 6

• Observational methods are generally either quantitative (focus on behaviors that can be quantified) or qualitative (focus on people behaving in natural settings; samples are usually smaller than for quantitative methods)

• Naturalistic observation (also called field work or field observation): individuals are observed in their natural environment, and researchers do not attempt to influence events (pg 116)

• The researcher is interested first in describing people, settings, and events, and second in analyzing what was observed. Naturalistic observation is qualitative

Observational Methods

• The researcher decides whether to be a participant or nonparticipant observer. Field research is often very time consuming and inconvenient, and often takes place in unfamiliar environments

• Jane Goodall: Instead of numbering the chimpanzees she observed, she gave them names. Claiming to see individuality and emotion in chimpanzees, she was accused of anthropomorphism

• Hunter Thompson and the Hell's Angels: Thompson became converted to their motorcycle mystique and was so intrigued, as he puts it, that "I was no longer sure whether I was doing research on the Hell's Angels or being slowly absorbed by them." He remained close with the Angels for a year, but ultimately the relationship waned. It ended for good after several members of the gang gave him a savage beating or "stomping" over a remark made by Thompson to an Angel named Junkie George, who was beating his wife. Thompson said: "Only a punk beats his wife." The beating stopped only when senior members of the club ordered it

Methodological Issues in Observation

• Coding: the researcher chooses a behavior, then describes and measures that behavior with a coding system (pg 119). In systematic observation usually two or more raters are used to code behavior (pg 120)

• Sampling: Event recording simply tallies the frequency of a given behavior during the observation period. Interval recording similarly captures frequency, but divides the observation period into segments and counts the number of segments in which the target behavior is displayed, either throughout the interval or at a particular time point in the interval. Duration recording measures the length of time a behavior lasts

• Functional behavior assessment, an observational strategy, assesses antecedents, frequency, duration, and consequences of the aggressive behavior for the target child and others in the environment to determine the functions that the aggressive behavior serves for the child. In spite of obvious benefits of direct observation, the strategy can be limited by several problems
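The event, interval, and duration recording methods described on this slide can be sketched on a made-up observation log. Everything here is hypothetical: the episode times, the 120-second observation period, the 30-second segments, and the choice to check the behavior at the first instant of each segment for the "particular time point" variant.

```python
# Hypothetical log: (start, end) in seconds for each episode of the target behavior.
episodes = [(5, 12), (30, 75), (100, 104)]
observation_length = 120   # seconds observed (assumed)
interval_length = 30       # segment size for interval recording (assumed)

def event_count(episodes):
    """Event recording: tally how many times the behavior occurred."""
    return len(episodes)

def total_duration(episodes):
    """Duration recording: total time the behavior lasted."""
    return sum(end - start for start, end in episodes)

def whole_interval_count(episodes, observation_length, interval_length):
    """Interval recording: segments in which the behavior lasted throughout."""
    count = 0
    for seg_start in range(0, observation_length, interval_length):
        seg_end = seg_start + interval_length
        if any(start <= seg_start and end >= seg_end for start, end in episodes):
            count += 1
    return count

def time_point_count(episodes, observation_length, interval_length):
    """Interval recording: segments in which the behavior is under way
    at one fixed time point (here, the first instant of the segment)."""
    count = 0
    for check_time in range(0, observation_length, interval_length):
        if any(start <= check_time < end for start, end in episodes):
            count += 1
    return count

print("events:", event_count(episodes))
print("duration (s):", total_duration(episodes))
print("whole intervals:", whole_interval_count(episodes, observation_length, interval_length))
print("time-point intervals:", time_point_count(episodes, observation_length, interval_length))
```

The same log gives different numbers under each method, which is one reason a coding system has to state in advance which recording rule the raters are using.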

Methodological Issues in Observation

• Behaviors must be clearly defined, and observers must be trained to fully understand the exact behaviors that are to be captured. Observer bias, or the tendency to see what one expects to see, is especially troublesome in direct observation of aggression

• In a study conducted by Baron (1976) an accomplice failed to move his vehicle for 15 seconds after the traffic signal at preselected intersections turned green. The reactions of passing motorists to this unexpected delay were recorded by two observers seated in a second parked car at the intersection using a tape recorder to determine the frequency, duration and latency of horn honking of motorists (Video recording has become very popular)

• Reactivity: the possibility that the presence of the observer will affect behavior. It can be minimized by concealed observation with small cameras and microphones (pg 120). What threat to validity does this represent?

Methodological Issues in Observation

• Case study: an observational method applied to an individual. It presents the individual's history, symptoms, characteristic behavior, and response to treatment (pg 121)

• Case studies may or may not include naturalistic observation. In psychology/psychiatry the case study is usually a description of the patient with a historical account of some event (pg 121)

• A case study is often done when an individual possesses a rare or unusual condition, especially one involving memory, language, or social function

• Mania after termination of epilepsy treatment: a case report see file

Archival Research

• Uses previously compiled information to answer research questions; the researcher does not collect original data. Sources include public records, databases, and other written records (e.g. Census Bureau)

• Survey archives: stored surveys, such as political surveys from polling organizations and the National Science Foundation. A researcher may not be able to afford collecting and tabulating all this data

• Two major problems with archival data: it may be difficult to obtain the desired records, and it is difficult to be certain how accurate the information collected by others is (pg 124)

Survey Research Chp 7

• Survey research uses questionnaires and interviews to ask people to give information about themselves: attitudes, beliefs, and demographic variables (age, gender, income, etc.). It assumes that people are willing and able to provide truthful and accurate answers (pg 130)

• Survey research can be a good complement to experimental research

• Some researchers ask questions without considering what useful information will be gained by such questions

• Response set: the tendency to respond to all questions from a particular point of view. "Faking good": social desirability leads the respondent to answer in the most socially acceptable way

• If the researcher communicates honestly, assures confidentiality, and promises feedback, participants can be expected to provide honest answers (pg 131)

Survey Research

• Attitudes and beliefs surveys ask people to evaluate certain issues/situations/people

• Consumer Reports: "We conduct many surveys by selecting a random sample from the approximately 7 million readers who subscribe to Consumer Reports and/or to ConsumerReports.org, who are some of the most consumer-savvy people in the nation."

• Some surveys focus on behavior (how many times did you exercise this week?)

• Question wording: Many of the problems in surveys stem from the wording and include 1) use of unfamiliar technical terms 2) vague or imprecise terms 3) ungrammatical sentences 4) run-on sentences that overload memory 5) misleading information

• Subtle wording differences can produce great differences in results. “Could,” “should,” and “might” all sound about the same, but may produce big differences in agreement with a question.

• Strong words such as “force” and “prohibit” represent control or action and can bias your results: “The government should force you to pay taxes.” Different cultural groups may respond differently. One recent study found that while U.S. respondents skip sensitive questions, Asian respondents often discontinue the survey entirely (source: qualtrics.com)

Survey Research

• Questions need to be simple and easy to understand. "And," "or," or "but" within a question usually makes it overly complex (pg 132-133)

• Avoid:

• 1) Double-barreled questions: questions that ask two things at once

• 2) Loaded questions, which lead people to respond in a certain way: "Do you favor eliminating the wasteful excesses in the public school budget?" "Do you approve of the President's oppressive immigration policy?" A leading question suggests to the respondent that the researcher expects or desires a certain answer. The respondent should not be able to discern what type of answer the researcher wants to hear

• 3) Negative wording: "Do you feel the city should not approve the proposed women's shelter?" Agreeing with the question means disagreeing with the proposal, which can confuse people

• 4) Yea-saying and nay-saying (response set): a tendency to agree or disagree with all questions. When respondents notice that they have answered several questions the same way, they assume the next questions could be answered that way too. Reversing the wording of some questions can counter this (pg 133)

http://www.surveymonkey.com/s.asp?u=952783415975

Responses to Questions

• Closed-ended questions have a limited, fixed number of response alternatives. They are more structured and easier to code, and the written answers are the same for all respondents (yes-no, agree-disagree)

• Open-ended questions are harder to categorize and code. Frequently the two types of questions give different response patterns and lead to different conclusions (pg 134-135)

• In a poll conducted after the presidential election in 2008, people responded very differently to two versions of this question: “What one issue mattered most to you in deciding how you voted for president?” One was closed-ended and the other open-ended. In the closed-ended version, respondents were provided five options (and could volunteer an option not on the list). When explicitly offered the economy as a response, more than half of respondents (58%) chose this answer; only 35% of those who responded to the open-ended version volunteered the economy. Moreover, among those asked the closed-ended version, fewer than one-in-ten (8%) provided a response other than the five they were read; by contrast fully 43% of those asked the open-ended version provided a response not listed in the closed-ended version of the question (source: Pew Research Center)

• Researchers will sometimes conduct a pilot study using open-ended questions to discover which answers are most common. They will then develop closed-ended questions that include the most common responses as answer choices

Responses to Questions

• In addition to the number and choice of response options offered, the order of answer categories can influence how people respond to closed-ended questions. Research suggests that in telephone surveys respondents more frequently choose items heard later in a list (a “recency effect”).

• In the example discussed above about what issue mattered most in people’s vote (previous slide), the order of the five issues in the closed-ended version of the question was randomized so that no one issue appeared early or late in the list for all respondents. Randomization of response items does not eliminate order effects, but it does ensure that this type of bias is spread randomly

• Questions with ordinal response categories – those with an underlying order (e.g., excellent, good, only fair, poor OR very favorable, mostly favorable, mostly unfavorable, very unfavorable) – are generally not randomized because the order of the categories conveys important information to help respondents answer the question. Generally, these types of scales should be presented in order so respondents can easily place their responses along the continuum, but the order can be reversed for some respondents/questions
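The two ordering rules on this slide (shuffle unordered answer options per respondent; keep ordinal scales in order but optionally reverse them for some respondents) can be sketched as follows. This is an assumption-laden illustration: `presented_order` is a hypothetical helper, the issue labels other than the economy are placeholders, and the 50% reversal rate is an arbitrary choice.

```python
import random

def presented_order(options, ordinal=False, rng=random):
    """Return the order in which one respondent is shown the answer options."""
    shown = list(options)
    if ordinal:
        # Ordered scales stay in sequence; only the direction may flip.
        if rng.random() < 0.5:
            shown.reverse()
    else:
        # Unordered options are shuffled so no option is always first or last.
        rng.shuffle(shown)
    return shown

# Placeholder labels: the slide names only the economy among the five issues.
issues = ["The economy", "Issue B", "Issue C", "Issue D", "Issue E"]
quality_scale = ["excellent", "good", "only fair", "poor"]

rng = random.Random(2008)  # seeded only so the example is repeatable
for respondent in range(3):
    print(respondent, presented_order(issues, rng=rng))
    print(respondent, presented_order(quality_scale, ordinal=True, rng=rng))
```

Shuffling per respondent does not remove the recency effect for any one person; it spreads the effect evenly across options so that it does not favor a single answer in the totals.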

Wording and Order of Questions

• One group was asked: "Thinking of your teachers in high school, would you say that the female teachers were more empathetic with regard to academic and personal problems than the male teachers, or were they less empathetic?" The other group responded to a question with the direction reversed: "Thinking of your teachers in high school, would you say that the male teachers were more empathetic with regard to academic and personal problems than the female teachers, or were they less empathetic?" Responses were measured on a nine-point scale ranging from "less empathetic" (1) to "more empathetic" (9). Not only were the mean ratings statistically different, but when female teachers were the subject, 41 percent of respondents felt that the female teachers were more empathetic than male teachers; when male teachers were the subject, only 9 percent of respondents felt that female teachers were more empathetic than the male teachers.

• The direction of comparison also significantly affected the results obtained when the authors compared soccer with tennis and tennis with soccer on which was the more exciting sport. The authors (Wanke, Schwarz and Noelle-Neumann, 1995) concluded that respondents generally "focus on the features that characterize the subject of comparison and make less use of the features that characterize the referent of the comparison."

Wording and Order of Questions

• A researcher wishing to increase the variability and thereby make it harder for statistics to demonstrate significant differences among stimuli (e.g., comparing different brands of tissues) can accomplish this by using scales with too many points. A two-point scale, on the other hand, used with a stimulus that subjects can actually rate on many gradations will result in a very imprecise measurement. This will make it very difficult to find differences among means. For example, will there be a significant difference between the mean ratings for the presidencies of Abraham Lincoln and William Clinton if the scale consists of only two points, "good" and "bad"?

• Waddell (1995) suggested that traditional customer satisfaction measurement scales ask the wrong question by focusing on "How am I doing?" rather than "How can I improve?" He claims that consumers usually rate products/services as being better when using performance or satisfaction scales and that these scales often produce high average scores. Neal (1999) posited that satisfaction measures cannot be used to predict loyalty since loyalty is a behavior and satisfaction is an attitude

Source: H. Friedman, "Rating the Rating Scales," Journal of Marketing Management, Vol. 9:3, Winter 1999

Rating Scales

• Rating scales ask people to provide quantity or “how much” judgments. Rating scales provide a set of categories designed to elicit information about a quantitative or a qualitative attribute (pg 135)

• The simplest form presents people with five or seven response alternatives, with the endpoints on the scale labeled to define the extremes

• Am I the greatest professor ever? strongly agree __ __ __ __ __ __ __ strongly disagree

• Graphic rating scale- requires a mark along a continuous 100 millimeter line that is anchored at either end with descriptors

Rating Scales

• Semantic differential scale: respondents rate any concept on a series of bipolar adjectives using a 7-point scale

• Almost anything can be measured using this technique-concepts are measured along three basic dimensions 1) evaluation (good-bad) 2) activity (fast-slow) 3) potency (weak-strong)

• Non verbal scales for children

• Labeling response alternatives: researchers may provide labels to more clearly define the meaning of each alternative. The middle alternative is a neutral point halfway between the endpoints

Rating Scales

• There are instances in which you may not want a balanced scale

• Example (pg 137): In comparison with other graduates, how would you rate this student’s potential? Lower 50% ___ Upper 50% ___ Upper 25% ___ Upper 10% ___ Upper 5% ___

• Most of the alternatives place the student within the upper 25%, because students in this group tend to be highly motivated and professors tend to rate them positively

• High-frequency vs. low-frequency scales: the alternatives indicate different frequencies of the variable. How often do you exercise? Less than once a month ___ About once a month ___ Once every two weeks ___ Once a week ___

Questionnaires & Surveys

• Questionnaires should be professional and neatly typed, with clear response alternatives. In sequencing the questions it is best to ask the most interesting questions first, group questions on a particular topic together, and present demographic questions last (pg 138)

• Administer the questionnaire first to a small group of friends or colleagues for their feedback

• Questionnaires are in written form and may be given to groups or individuals while surveys can be written or given as interviews

Questionnaires & Surveys

• Questionnaires given to groups (classes, meetings, job orientation) have the advantage of ‘captive audiences’ that are likely to complete the questionnaire, and the researcher is usually present to answer questions (pg 139)

• Mail questionnaires/surveys: inexpensive, but often with a low return rate due to distractions, low interest, and no one being present to answer questions or provide clarification

• Internet questionnaires/surveys: responses are sent immediately to the researcher. Problems exist with (1) sampling: people interested in the topic can complete the form, and polling organizations sample from collected databases. Are the results similar to those from traditional methods? (2) Do people misrepresent themselves? (It seems unlikely, but there is no way to know)

Questionnaires & Surveys

• Interviews: because an interview involves interaction between people, it is more likely that a person will agree to answer questions than with a mailed questionnaire (pg 140)

• The interviewer can answer questions and provide clarification

• Problems with interviewer bias: the interviewer may (inadvertently) react positively or negatively to answers, may influence answers through personal characteristics (age, sex, race, etc.), or may be led by bias to see what they want to see

Types of Interviews

• Face-to-face interviews: expensive and time consuming. The interviewer may have to travel to the person’s home, or the person to the office. Likely to be used when the sample size is small

• Telephone interviews: most large-scale surveys are done via telephone, which is less expensive than face-to-face interviews and allows data to be collected relatively quickly, as many interviewers can work on the same survey at once. In computer-assisted telephone interview (CATI) systems the questions appear on the computer screen and the data are entered directly for analysis

• Focus group interviews: 6-10 persons together for 2-3 hours, usually selected because they share a particular interest in or knowledge of a topic. Participants often receive an incentive to compensate for time and traveling. Questions are often open-ended and asked of everyone, with the added advantage of group interaction. The interviewer must be skilled in dealing with individuals who wish to dominate the discussion and with hostility between members. Discussions are often recorded and later analyzed. Although focus groups provide a great deal of data, they are also costly and time consuming (pg 142)

Surveys to study changes over time

• Surveys usually study one point in time, but because some questionnaires are given every year they can track changes (a panel study of the same group of people over time can also be used)

Autism rating items

• Before age 3, did the child ever imitate another person?

• 1. Yes, waved bye-bye

• 2. Yes, played pat-a-cake

• 3. Yes, other ( ___________________________ )

• 4. Two or more of above (which? 1____2____3____ )

• 5. No, or not sure_______________________________

• (Age 2-4) Does child hold his hands in strange postures?

• 1. Yes, sometimes or often 2. No________________

• (Age 3-5) Does child sometimes line things up in precise evenly-spaced rows and insist they not be disturbed?

• 1. No 2. Yes 3. Not sure

CARS Childhood Autism Rating Scale (sample item)

0 No evidence of difficulty or abnormality in relating to people. The child's behavior is appropriate for his or her age. Some shyness, fussiness, or annoyance at being told what to do may be observed, but not to an atypical degree.

1.5 (if between these points)

2 Mildly abnormal relationships. The child may avoid looking the adult in the eye, avoid the adult or become fussy if interaction is forced, be excessively shy, not be as responsive to the adult as is typical, or cling to parents somewhat more than most children of the same age.

3 Moderately abnormal relationships. The child shows aloofness (seems unaware of adult) at times. Persistent and forceful attempts are necessary to get the child's attention at times. Minimal contact is initiated by the child.

4 Severely abnormal relationships. The child is consistently aloof or unaware of what the adult is doing. He or she almost never responds or initiates contact with the adult. Only the most persistent attempts to get the child's attention have any effect.

ADHD rating scale

• ADHD

Sampling

• One way to describe the amount of possible sampling error is to use interval estimation. Assuming that sampling errors are normally distributed, you can establish a range of values on either side of the point estimate (the sample value) and then determine the probability that the parameter (the population value) lies within this range. This probability is expressed as a percentage and is called the level of confidence

• 95% of the total area under the curve lies within plus or minus two standard deviations, with less than 5% outside those values. If the point estimate (sample) were 30 and the standard deviation of the sampling error (the standard error) were 4, you could be 95% certain that the population value is within 22-38 (the 95% confidence interval)
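The interval in this example is simply the point estimate plus or minus two standard deviations. The sketch below reproduces that arithmetic and, as an aside, shows the slightly narrower interval obtained with the more precise normal multiplier of about 1.96; the function name `confidence_interval` is a hypothetical helper.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, standard_error, z=2.0):
    """Return (low, high) = point estimate +/- z standard errors."""
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Slide example: estimate 30, standard deviation 4, rule-of-thumb multiplier 2.
low, high = confidence_interval(30, 4)
print(low, high)

# More precise multiplier for a 95% interval under a normal distribution.
z_95 = NormalDist().inv_cdf(0.975)
exact_low, exact_high = confidence_interval(30, 4, z=z_95)
print(round(z_95, 2), round(exact_low, 2), round(exact_high, 2))
```

With the rule-of-thumb multiplier of 2 the interval is 22 to 38, matching the slide; with 1.96 it is 22.16 to 37.84.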

Sampling from a population

• Since studying entire populations would be an enormous undertaking, we sample from the population and infer what the population is like based on the data obtained from the sample (using statistical significance)

• Simple random sampling: every member of the population has an equal probability of being selected. If there are 1,000 people in the population, everyone has a 1/1000 chance of being selected. In conducting phone interviews, researchers use a computer-generated list of phone numbers

• Random number generator: assume we have a population of 500 subjects and we want a sample of 30. Select a column and row starting point in the table and use 3 digits to include all possible outcomes
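Drawing 30 subjects from a population of 500 can be sketched in two ways: with a library call, and by imitating the random number table procedure (read three digits at a time, skip numbers outside 001-500 and numbers already chosen). Subject IDs 1 to 500 and the helper name `sample_from_three_digit_numbers` are assumptions made for the illustration, and a seeded generator stands in for the printed table.

```python
import random

population_size = 500
sample_size = 30
rng = random.Random(7)  # seeded only so the example is repeatable

# 1) Library approach: every subject has the same chance of selection.
library_sample = rng.sample(range(1, population_size + 1), sample_size)

# 2) Table-style approach: read three-digit numbers one after another.
def sample_from_three_digit_numbers(population_size, sample_size, rng):
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 999)       # stands in for three table digits
        if number < 1 or number > population_size:
            continue                       # no subject has this number
        if number in chosen:
            continue                       # subject already selected
        chosen.append(number)
    return chosen

table_sample = sample_from_three_digit_numbers(population_size, sample_size, rng)

print(sorted(library_sample))
print(sorted(table_sample))
```

Three digits are needed because the largest subject number, 500, has three digits; roughly half of the numbers read are discarded because they fall outside 001-500.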

Sampling• Stratified random sampling- The population is divided into

subgroups (strata) and members from each strata are randomly selected. The subgroups should represent a dimension that is relevant to the research e.g. If you are conducting a survey of sexual attitudes you may want to stratify on the basis of age, gender and amount of education as these factors are related to sexual attitudes (attributes such as height are not relevant to the research) pg146

• Stratified sampling also has the advantage of building in representation of all groups. If 10% of the 10,000 students on campus are foreign students on a student visa, then you will need at least 100 from this group in a sample of 1,000 students

• Sometimes researchers will “oversample” from a small subgroup to ensure their representation in the sample
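
A minimal sketch of proportional stratified sampling, using the campus numbers from the bullet above. The function name, the stratum labels and the student IDs are all invented for illustration; oversampling a small subgroup would mean replacing the proportional allocation with a larger fixed number for that stratum.

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Draw a proportional random sample from each stratum.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: stratum share of the population times total_n.
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical campus: 10,000 students, 10% on a student visa.
strata = {
    "visa": [f"visa_{i}" for i in range(1000)],
    "other": [f"other_{i}" for i in range(9000)],
}
sample = stratified_sample(strata, total_n=1000)
print({name: len(members) for name, members in sample.items()})
# {'visa': 100, 'other': 900}
```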

Sampling distributions

• If we have a very large population we may draw a random sample of 30 from this population and determine some statistic (e.g. mean). Then we repeat the process 1,000 times, producing 1,000 random samples of size 30 with the corresponding 1,000 sample statistics. A frequency distribution can be drawn up, similar to a frequency distribution of any type of score, resulting in a model called the (theoretical) sampling distribution of the statistic (in this case the sampling distribution of the means)

• The expected value of any statistic, i.e. the predicted value which would give the least error over many samples, is the mean of its sampling distribution. The standard error of any statistic is the standard deviation of its sampling distribution (Source Roscoe chapter 19)

• The standard error of the mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean
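
The repeated-sampling procedure described above can be simulated directly. The sketch below is only illustrative: the population (100,000 normally distributed scores with mean 100 and SD 15) is an assumption made up for the demonstration, not data from the course.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population: 100,000 scores with mean near 100 and SD near 15.
population = [rng.gauss(100, 15) for _ in range(100_000)]

# Draw 1,000 random samples of size 30 and record each sample mean.
sample_means = [
    statistics.mean(rng.sample(population, 30)) for _ in range(1000)
]

expected_value = statistics.mean(sample_means)   # close to the population mean
standard_error = statistics.stdev(sample_means)  # SD of the sampling distribution
theoretical_se = statistics.pstdev(population) / 30 ** 0.5

print(round(expected_value, 2), round(standard_error, 2), round(theoretical_se, 2))
```

The simulated standard error should land close to the population standard deviation divided by the square root of the sample size, which is the usual formula for the standard error of the mean.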

Sampling Distribution

• If you took all of these separate means and calculated an overall mean for the whole lot, you would end up with a value that was the same as the population mean (the mean you’d get if you could measure every one of them)

• The arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed (Central Limit Theorem). As the sample size increases, the variance of the sample mean decreases and the distribution of sample means comes closer to a normal distribution

https://www.khanacademy.org/math/probability/statistics-inferential/sampling_distribution/v/central-limit-theorem

Central Limit Theorem

• The Central Limit Theorem (CLT for short) basically says that for non-normal data, the distribution of the sample means has an approximate normal distribution, no matter what the distribution of the original data looks like, as long as the sample size is large enough (usually at least 30) and all samples have the same size.

• The use of an appropriate sample size and the central limit theorem help us to get around the problem of data from populations that are not normal. Thus, even though we might not know the shape of the distribution where our data comes from, the central limit theorem says that we can treat the sampling distribution as if it were normal
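
A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a strongly skewed (exponential) population, chosen only as a convenient non-normal example, and uses skewness as a rough indicator of how far a distribution is from the symmetric normal shape.

```python
import random
import statistics

def skewness(values):
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(7)
# Strongly right-skewed population model: exponential with mean 1.
raw_scores = [rng.expovariate(1.0) for _ in range(20_000)]

# Means of 2,000 samples, each of size 30, drawn from the same distribution.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(30)])
    for _ in range(2000)
]

print("skewness of raw scores:  ", round(skewness(raw_scores), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The raw scores should show a large positive skew while the sample means show a much smaller one, which is the pattern the theorem predicts.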

Sampling Distribution

• Cluster Sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. After the clusters are chosen, all observations/individuals in the selected clusters are included in the sample. pg147

• Cluster sampling is typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get a complete list of groups or 'clusters' of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove to be far too expensive (for example, people who live in different postal districts in the UK)

• You could get a list of all classes taught (each class is a cluster), take a random sample of classes from this list and have all members (students) of the chosen classes complete your survey
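
The class-list example can be sketched as follows. The 40 classes of 25 students and all names are hypothetical; the point is that randomness is applied to whole clusters, not to individual students.

```python
import random

rng = random.Random(3)
# Hypothetical class lists: each class (cluster) maps to its enrolled students.
classes = {
    f"class_{c}": [f"student_{c}_{s}" for s in range(25)]
    for c in range(40)
}

# Randomly select whole clusters, then survey everyone in the chosen clusters.
chosen_classes = rng.sample(sorted(classes), k=5)
sample = [student for name in chosen_classes for student in classes[name]]
print(chosen_classes, len(sample))
```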

Nonprobability Sampling

• In probability sampling the probability of selecting every member is knowable; in nonprobability sampling the probability of being selected is not known and the techniques are arbitrary. A population may be defined but little effort is expended to ensure the sample accurately represents it

• Nonprobability sampling does not involve random selection

• Nonprobability sampling is cheap and convenient

• Three types: 1) Haphazard 2) Purposive 3) Quota

Nonprobability Sampling

• Haphazard or Convenience Sampling (Accidental, Judgment)

• Select a sample that is convenient e.g. students walking into the campus café

• Seen in the traditional "man (person) on the street" interviews conducted frequently by television news programs to get a quick (although nonrepresentative) reading of public opinion. (use of college students in much psychological research is primarily a matter of convenience).

• In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to -- and in many cases we would clearly suspect that they are not. People sampled such as TV viewers may be different from the general population (Fox News, MSNBC) and are often asked about controversial issues such as abortion, taxes, gun regulation, and wars which induce certain people to respond to such “polling” or sampling pg 152

Nonprobability Sampling

• In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and who are stopping various people and asking if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30-40 years old. They size up the people passing by and anyone who looks to be in that category they stop to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample.

• Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible

Nonprobability Sampling

• Quota sampling: a sample is chosen that reflects the numerical composition of various subgroups in the population (the technique is similar to stratified sampling but without random sampling; you are collecting data in a haphazard way) pg 148

• Quota sampling is a method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men and 20 adult women, 10 teenage girls and 10 teenage boys so that they could interview them about their television viewing.

• It suffers from a number of methodological flaws, the most basic of which is that the sample is not a random sample and therefore the sampling distributions of any statistics are unknown

Evaluating Samples

• Even using random sampling does not ensure the sample is representative. Error derives from two sources: 1) the sampling frame used 2) poor response rates

• Sampling frame: the actual population of individuals (or clusters) from which a random sample will be drawn. Rarely will this perfectly coincide with the population of interest, as some biases will be introduced. For example, if you compile a list of phone numbers to call during the day from the directory, you will exclude those with unlisted numbers, those without phones and those who are not home during the day

• Response rate: the percentage of people in the sample who respond (complete the phone or mail survey). Mail surveys have lower response rates than phone surveys. You can increase the response rate by sending an explanatory postcard before the survey arrives, sending a second mailing of the survey or providing a stamped self-addressed envelope (SASE) pg 150

Experimental Design-Chapter 8

• Researcher manipulates the independent variable (usually to create groups) and then compares the groups in terms of their scores on the dependent variable (outcome measure) while keeping all other variables constant through direct experimental control or randomization. If scores on the dependent variable are different, then the researcher can conclude that the difference was due to the difference between groups and no other cause (and the experiment will have internal validity) pg157-8

• A Confounding variable varies along with the independent variable. Confounding occurs when the effects of the independent variable and an uncontrolled variable are intertwined so you cannot determine which causes the effect

Basic Experiments

• The simplest experimental design has two variables, the independent and dependent, with the independent variable having a minimum of two levels, an experimental and control group. This type of experiment can take one of two possible forms: 1) posttest only design or 2) pretest-posttest design

• Obtain two equivalent groups (random selection), introduce the independent variable and then measure the effect of the independent variable on the dependent variable. Equivalent groups are obtained through random assignment to groups or by assigning the same subjects to both groups (CIT study with cross-over design)

Posttest only vs Pretest-Posttest design

• After the groups are formed (experimental and control) you must choose two levels of the independent variable (treatment for the experimental group and no treatment for the control group), e.g. the experimental group gets treatment to stop smoking and the control group does not

• Pretest-Posttest designs- the only difference between the posttest only and pretest-posttest design is that in the latter a pretest is given before the experimental manipulation is introduced

Posttest only vs Pretest-Posttest

• The pretest-posttest design makes it easier to assume the groups are equal at the beginning of the experiment. However, if you have randomly assigned subjects to the different groups using a sufficiently large sample, the groups should be equal without using a pretest

• Generally need a minimum of 20-30 Subjects pg160

Posttest only vs Pretest-Posttest advantages and disadvantages

• Advantages Pretest-Posttest

• While randomization is expected to produce equivalent groups this assumption may go unmet with small sample sizes and a pretest can increase the likelihood of equivalency

• Pretest may be necessary for assignment to groups so that those that score low or high on any pretest can be randomly assigned to conditions

• The comparison of pretest to posttest allows each subject to be evaluated in terms of change between the measures (with no pretest such comparison is not possible)


• Pretests help determine the effects of attrition (dropout): you can examine pretest scores of dropouts to determine if their scores differed from those completing the study

• Disadvantages Pretest

• A pretest may be time consuming

• A pretest may sensitize (alert) the subjects to the hypothesis which can result in changing a subject’s behavior in the study (can disguise the pretest as part of another study or embed the pretest in a series of irrelevant measures-time consuming)


• Solomon four group design- Half the subjects receive only the posttest and the other half receive both pretest and posttest. If there is no impact of the pretest, the posttest scores will be the same in the two control groups (with and without pretest) see table 8.1 pg 162

• Repeated measures has advantage of needing fewer subjects which decreases the effects of natural variation between individuals upon the results. Repeated subject designs are commonly used in longitudinal studies, over the long term, in educational tests where it is important to ensure that variability is low and in research on such functions as perception involving only a few subjects often receiving extensive training pg164

Between group design vs. Repeated Measures design

• Between-group design is an experiment that has two or more groups of subjects each being tested by a different testing factor simultaneously-each subject is in either the treatment (experimental) group or the control group pg163

• A repeated-measures design is one in which multiple, or repeated, measurements are made on each subject, e.g. weekly blood pressures. Each subject is measured after receiving each level of the independent variable


• In the between groups design subjects are assigned to each of the conditions using random assignment

http://www.randomizer.org/form.htm
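
Random assignment in a between-groups design can also be done with a few lines of code rather than an online randomizer. This is a sketch under simple assumptions: the function name and the 40 subject IDs are hypothetical, and group sizes are kept equal by shuffling first and then dealing subjects out in turn.

```python
import random

def randomly_assign(subjects, conditions, seed=0):
    """Shuffle subjects, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 41)]  # 40 hypothetical subject IDs
groups = randomly_assign(subjects, ["treatment", "control"])
print({name: len(members) for name, members in groups.items()})
# {'treatment': 20, 'control': 20}
```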

• In repeated measures the same individual participates in all of the groups. These studies are more sensitive to finding statistically significant results. Even if you have randomly selected and assigned subjects to conditions in the between groups design there is still individual variation (naturally occurring “random error”: differences between the subjects assigned to the different groups) which may make the effect of the independent variable unclear, but when testing the same person in different conditions (versus different persons in different conditions) this random error is eliminated



• One limitation of repeated measures is that the conditions must be presented in a particular sequence, which could result in an order effect: the order of presenting the treatments affects the dependent (outcome) variable. A subject may perform better in the second condition because of practice in the first condition (practice effect), perform more poorly in the second condition due to fatigue (fatigue effect), or the first treatment may influence the second treatment (carryover effect)

• Carryover effect occurs when the first condition produces a change that is still influencing the person when the second condition is introduced


• Experiment- Subjects are presented with a list of words and asked to recall as many words as they can. In one condition, the words are presented one word per second; in the other condition, the words are presented two words per second. The question is whether or not having performed in one condition affects performance in the second condition. Perhaps learning the first list of words will interfere with learning the second list because it will be hard to remember which words were in each list. Or maybe the practice involved learning one list will make it easier to learn a second list. In either case, there would be a carryover effect: performance on the second list would be affected by the experience of being given the first list

• Such effects are dealt with through counterbalancing or extended time intervals between conditions presented serially

Repeated Measures-types of counterbalancing

• Complete counterbalancing: all possible orders of presentation are included in the experiment pg165-166

• Latin Square: a Latin square is an n x n table filled with n different symbols in such a way that each symbol occurs exactly once in each row and exactly once in each column. Each condition appears at each ordinal position (1st, 2nd, 3rd etc.)

• Using a Latin square controls for most order effects without having to include all possible orders (in a balanced Latin square each condition precedes and follows each other condition one time)

• Time Interval: longer rest periods counteract fatigue and practice effects but require a greater commitment to participate
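
The two counterbalancing schemes can be generated programmatically. The sketch below is one possible construction, not necessarily the one used in the textbook: the function names are invented here, and the balanced Latin square uses a standard construction that only works for an even number of conditions.

```python
from itertools import permutations

def complete_counterbalancing(conditions):
    """Every possible order of presentation (n! orders for n conditions)."""
    return [list(order) for order in permutations(conditions)]

def balanced_latin_square(conditions):
    """Balanced Latin square for an even number of conditions.

    Row 0 follows the pattern 0, 1, n-1, 2, n-2, ...; each later row
    shifts every entry by one (mod n).
    """
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    first_row = [0] + [(j + 1) // 2 if j % 2 else n - j // 2 for j in range(1, n)]
    return [[conditions[(x + shift) % n] for x in first_row] for shift in range(n)]

conditions = ["A", "B", "C", "D"]
print(len(complete_counterbalancing(conditions)))  # 24
for row in balanced_latin_square(conditions):
    print(row)
```

With four conditions, complete counterbalancing needs 24 orders while the Latin square needs only 4, each given to an equal number of subjects.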

Matched Pairs Design

• Rather than using random assignment to groups you can first match subjects on a variable (achieving equivalency in this manner rather than through randomization) and avoid repeated measures/counterbalanced designs pg169

• Example study: 1000 subjects each receive one of two treatments, a placebo or a cold vaccine. The 1000 subjects are grouped into 500 matched pairs. Each pair is matched on gender and age. For example, Pair 1 might be two women, both age 21. Pair 2 might be two men, both age 21. Pair 3 might be two women, both age 22
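
A sketch of how the matching in the example could be carried out. It assumes, as is common in matched pairs designs but not stated on the slide, that chance decides which member of each pair gets the vaccine. The function name and the six subjects are hypothetical.

```python
import random
from collections import defaultdict

def matched_pairs(subjects, seed=0):
    """Pair subjects sharing (gender, age) and split each pair at random."""
    rng = random.Random(seed)
    by_profile = defaultdict(list)
    for subject in subjects:
        by_profile[(subject["gender"], subject["age"])].append(subject)

    assignments = []
    for profile, members in by_profile.items():
        rng.shuffle(members)
        # Take members two at a time; an unmatched leftover is dropped.
        for i in range(0, len(members) - 1, 2):
            assignments.append(
                {"profile": profile, "vaccine": members[i]["id"], "placebo": members[i + 1]["id"]}
            )
    return assignments

# Hypothetical subjects.
subjects = [
    {"id": 1, "gender": "F", "age": 21}, {"id": 2, "gender": "F", "age": 21},
    {"id": 3, "gender": "M", "age": 21}, {"id": 4, "gender": "M", "age": 21},
    {"id": 5, "gender": "F", "age": 22}, {"id": 6, "gender": "F", "age": 22},
]
for pair in matched_pairs(subjects):
    print(pair)
```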


Conducting Experiments Chp9

• Selecting research participants: determining sample size. Sampling error is a function of sample size and the error tends to be smaller for larger samples. The larger your sample size, the more sure you can be that their answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval.

• http://www.raosoft.com/samplesize.html
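
The relationship between sample size and the width of the confidence interval can be illustrated with the usual margin-of-error formula for a proportion. This is a sketch under stated assumptions (simple random sampling, a large population, 95% confidence); it is not a description of how the calculator linked above works internally.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    Assumes simple random sampling from a large population; p = 0.5 is the
    most conservative choice and z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Because the sample size sits under a square root, quadrupling the sample (100 to 400) only halves the margin of error.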

Manipulating the Independent variable

• Straightforward manipulations: subjects are selected and assigned to conditions. The conditions are constructed to represent different levels (e.g. high versus low level of difficulty for material to be learned, high versus low levels of subject motivation, subjects are categorized as ‘experts’ or ‘naïve’)

• Generally easier to interpret results when the manipulation is straightforward (without accounting for possible subtleties in staged manipulations –experimenter effects etc.) pg179

• Most research uses this type of manipulation pg177

• Cost of Manipulation- Straightforward manipulations involve less presentation of verbal or written material while running the study with groups of subjects-this is less costly pg181

Staged Manipulations and Confederates

• Staged manipulations are used to create some psychological state (frustration, anger etc.). Zitek et al. studied ‘sense of entitlement’: subjects playing a video game “lost” either when the game crashed (unfair condition) or because the game was too difficult (fair condition). Subjects in the unfair condition later claimed more money than other subjects when competing against others on a different task

• Confederates frequently used in staged manipulations-Conformity experiments-Asch study in which confederates gave incorrect judgments on line length before subjects responded pg178

Strength of the Manipulation

• The simplest design has two levels of the independent variable. The stronger the manipulation, the more likely differences will be greater between the groups

• Social psychology experiment in which subjects interact with similar or dissimilar confederates to determine the relationship between similarity and liking. If you have a 10 point scale of similarity, the strongest manipulation would be to assign subjects to interact with confederates of either level 1 similarity (group A) or level 10 (group B). When attempting to determine if a relationship exists, a strong manipulation may be the best choice. However, the strongest manipulation may not represent real-life situations and may therefore show low external validity. Also, a strong manipulation of variables such as fear or anxiety may raise ethical concerns (what is the threat to validity in strong manipulations?)

Measuring the Dependent variable

• Types of Measures

• Self-report measures- used to measure attitudes, judgments, emotional states, attributions

• Behavioral measures-direct observations of behaviors-rate of behavior, reaction time, duration pg181

• Physiological measures: recordings of bodily responses (EEG, EMG, GSR, MRI, fMRI)

• Multiple measures- Most studies use more than one measure (what were they in the studies discussed in class?)

Study of health related behaviors multiple measures were taken on # illness days, doctor visits and medication(aspirin) taken pg183

• Multiple measures are common in everyday experience: people who are considering buying a house look at the house's age, condition, location, style, features, and construction, as well as the price of nearby homes.

Doctors diagnosing an illness use multiple assessments: the patient's medical history, lab tests, pt answers to questions

Multiple Measures

• Sensitivity of the dependent variable: the dependent variable should be sensitive enough to detect differences between groups. Simple yes or no questions are much less sensitive than scaled question items (in forced choice yes-no people tend to say yes even if they have some negative feelings and gradations of feelings are not detected) pg183-4

• Tasks can be made too difficult or too easy. Ceiling effect: the task is so easy that everyone does well and the independent variable seems to have no effect. Floor effect: the task is so difficult that almost nobody does well. Freedman et al.: crowding did not have an effect on cognitive performance, but in later research when subjects were asked to perform more complex tasks crowding did lower performance

Measures-Cost & Additional controls

• Some measures are more costly than others

• While self-report measures involve generally inexpensive measures (paper and pencil, ready questionnaires) other measures more costly-interrater observations require video equipment and at least two observers to view tapes and code behavior-physiological measures require often expensive equipment

• While a control group is considered the minimum requirement for a true experiment (RCT) other types of controls are often needed to address potentially confounding factors

Subject and Experimenter Effects

• Demand characteristics: some aspect of the experiment which might convey the purpose of the study, so that the subject may act to confirm or disconfirm your hypothesis

• This may be countered by deception/cover stories, use of unrelated filler items in a questionnaire, use of field studies or observation. Can also question subjects about their perception of the study pg185

Experimental controls

• Placebo groups: groups not receiving the treatment in the study

• The placebo effect refers to the phenomenon in which some people experience some type of benefit after the administration of a placebo (a substance with no known medical effects)

• In certain instances, when the benefits of a drug or treatment are evident, you must give the treatment to the control (placebo) group as soon as those subjects/patients in the group have completed their part in the study

• Has the placebo effect gotten stronger over time?

Placebos without Deception: A Randomized Controlled Trial in Irritable Bowel Syndrome-Kaptchuk, T., et al. PLoS One. 2010; 5(12)

• Placebo treatment can significantly influence subjective symptoms. However, it is widely believed that response to placebo requires concealment or deception. We tested whether open-label placebo (non-deceptive and non-concealed administration) is superior to a no-treatment control with matched patient-provider interactions in the treatment of irritable bowel syndrome (IBS)

• Open-label placebo produced significantly higher mean (±SD) global improvement scores (IBS-GIS) at both 11-day midpoint (5.2±1.0 vs. 4.0±1.1, p<.001) and at 21-day endpoint (5.0±1.5 vs. 3.9±1.3, p=.002)

• Placebos administered without deception may be an effective treatment for IBS. Further research is warranted in IBS, and perhaps other conditions, to elucidate whether physicians can benefit patients using placebos consistent with informed consent


http://www.cbsnews.com/news/treating-depression-is-there-a-placebo-effect/

https://www.youtube.com/watch?v=LQ_EixhrFaw

Subject and Experimenter Effects

• Experimenter's bias, or experimenter effects, is a subjective bias towards a result expected by the human experimenter. These effects may occur when the experimenter knows which condition the subjects are in

• Experimenter might unintentionally treat subjects in the different groups differently (verbally or non-verbally), or the experimenter may record or interpret the data and results of the different groups differently (Rosenthal study of ‘bright’ vs. ‘dull’ rats (1966); Langer & Abelson 1974: psychologists rated a person in a video as more disturbed when told it was a patient versus a job applicant) pg187

• Can minimize the effect by running all conditions simultaneously, automating procedures or by making observations single-blind (subject unaware of the condition he/she is in) or double-blind (neither subject nor experimenter knows the condition of any subject)

Experimental controls-additional considerations

• Writing of a research proposal allows you to organize and plan a study (Introduction & Methods) pg189

• Pilot studies-a limited trial with a small number of subjects-can ask subjects for feedback

• Manipulation check: by using self-report, behavioral or physiological measures you can measure the strength of the manipulation in the pilot study (while it might be distracting in the actual study). If you obtain nonsignificant results, the check helps determine whether they were due to a problem in defining/manipulating the independent variable pg190

• Debriefing also provides you with subject feedback

Complex Experimental Designs Chp 10

• Experimental designs with only two levels of the independent variable provide limited information about the relationship between the independent and dependent variables (review High (medium) Low anxiety and test performance and curvilinear relationships)

• If a curvilinear relationship is predicted then at least three levels of a variable must be used as many curvilinear relationships exist in psychology (example of fear and attitude change-increasing the amount of fear aroused by a persuasive message increases attitude change only up to a moderate level after which further increases in fear arousal actually reduce attitude change) pg 198

Factorial Designs

• Designs with multiple levels of the independent variable are more representative of actual events

• Factorial designs are designs with more than one independent variable (factor). All levels of each independent variable are combined with all levels of the other independent variable(s) pg199

• A researcher might be interested in whether a stimulus person (shown in a photograph) smiling or not smiling affects ratings of the friendliness of that person. The researcher might also be interested in whether the stimulus person looking directly at the camera makes a difference.

• In a factorial design, the two levels of the first independent variable (smiling and not smiling) would be combined with the two levels of the second (looking directly or not) to produce four distinct conditions: smiling and looking at the camera, smiling and not looking at the camera, not smiling and looking at the camera, and not smiling and not looking at the camera
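
Crossing the levels of the factors is a Cartesian product, which can be sketched as below. The factor names `expression` and `gaze` are labels chosen here for the smiling and eye-contact example; adding a third level to either list would yield a 2 x 3 design with six conditions.

```python
from itertools import product

# Factor names and levels taken from the smiling / eye-contact example.
factors = {
    "expression": ["smiling", "not smiling"],
    "gaze": ["looking at the camera", "not looking at the camera"],
}

# Cross every level of each factor with every level of the others.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
for condition in conditions:
    print(condition)
print(len(conditions))  # 4
```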

Interpretation of Factorial Designs

• Two types of effects are studied in a factorial design

• Main effect and Interaction effect. If there are two independent variables there is a main effect for each of them pg200

• Main effect: the overall effect of one independent variable on the dependent variable. In the example of Therapy type and Therapy Duration there is a main effect for Therapy type and a main effect for duration of therapy

• Interaction effects occur when there is an interaction between the two independent variables such that the effect of one independent variable depends on the level of the other independent variable

Factorial Designs

Factorial design: 2x2 with four experimental conditions

                           Type of Therapy (A)
Duration of Therapy (B)    Behavioral    Cognitive
Short                      n = 50        n = 50
Long                       n = 50        n = 50

A design with two independent variables, with one variable at two levels and the other at three, is a 2 x 3 factorial design with six conditions. A 3 x 3 design will have nine conditions

Factorial Designs

                           Type of Therapy (A)
Duration of Therapy (B)    Behavioral    Cognitive
Short                      n = 50        n = 50
Long                       n = 50        n = 50

In the above experiment the type of psychotherapy (cognitive vs. behavioral) provides one main effect, for the first independent variable (Therapy type), and the duration of psychotherapy (short vs. long) provides a second main effect (Therapy duration)

Interpretation of Factorial Designs

• In the experiment, the main effect of type (cognitive vs. behavioral) is the difference between the average score for the cognitive group and the average score for the behavioral group … ignoring duration. That is, short-duration subjects and long-duration subjects are combined together in computing these averages. The main effect of duration is the difference between the average score for the short-duration group and the average score for the long-duration group … this time ignoring type.


We see that the subjects in the cognitive conditions scored higher on average than the subjects in the behavioral conditions, indicating a main effect for Therapy type

This 2 x 2 factorial design has four experimental conditions: short duration behavioral therapy, long duration behavioral therapy, short duration cognitive therapy and long duration cognitive therapy

Interpretation of Factorial Designs

• Interaction effect: whenever the effect of one independent variable depends on the level of the other pg201. If cognitive psychotherapy is better than behavioral psychotherapy when the therapy is short but not when the therapy is long, then there is an interaction between type and duration of therapy. When we say “it depends” we are indicating that some type of interaction is at work. You would like to go to Vegas if you have enough money and you have completed your assignments pg202


• Effects are all independent of each other. A 2x2 factorial experiment might result in no main effects and no interaction, one main effect and no interaction, two main effects and no interaction, no main effects and an interaction, one main effect and an interaction, or two main effects and an interaction. In looking at results presented in a design table or (more importantly) a graph, you can interpret what happened in terms of main effects and interactions.
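
Main effects and the interaction can be read off a table of cell means. The sketch below uses invented cell means for the therapy example (they are not results from any study) and assumes equal numbers of subjects per cell; whether such differences are statistically significant would still require an analysis of variance.

```python
# Hypothetical mean improvement scores for each cell of the 2x2 design.
cell_means = {
    ("behavioral", "short"): 10.0,
    ("behavioral", "long"): 16.0,
    ("cognitive", "short"): 14.0,
    ("cognitive", "long"): 16.0,
}

def marginal_mean(factor_index, level):
    """Average the cells at one level of a factor, ignoring the other factor."""
    cells = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(cells) / len(cells)

# Main effects: differences between marginal means (equal n per cell assumed).
type_effect = marginal_mean(0, "cognitive") - marginal_mean(0, "behavioral")
duration_effect = marginal_mean(1, "long") - marginal_mean(1, "short")

# Interaction: does the cognitive-vs-behavioral difference change with duration?
diff_short = cell_means[("cognitive", "short")] - cell_means[("behavioral", "short")]
diff_long = cell_means[("cognitive", "long")] - cell_means[("behavioral", "long")]
interaction = diff_short - diff_long

print(type_effect, duration_effect, interaction)  # 2.0 4.0 4.0
```

In these made-up numbers cognitive therapy beats behavioral therapy by 4 points when therapy is short and by 0 points when it is long, so the nonzero interaction term matches the "it depends" pattern described above.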

Factorial Designs with Manipulated and Nonmanipulated variables

• One common type of factorial design includes both experimental (manipulated) and nonexperimental (nonmanipulated) variables. These designs investigate how different people respond to certain situations: they examine whether the effect of the manipulated (independent) variable differs according to personal characteristics or attributes (age, gender, personality types etc.)

• Person X Situation studies

• Extroverts get excited about parties Introverts get anxious

Person X Situation Effects: Type D personality in patients with coronary artery disease. Vukovic et al. Danubina 2014 Mar;26

• BACKGROUND: During the past decade studies have shown that Type D personality is associated with increased risk of cardiac events, mortality and poor quality of life. Some authors suggested that depression and Type D personality have substantial phenomenological overlap.

• SUBJECTS AND METHODS: The sample consisted of non-consecutive case series of seventy nine patients with clinically stable and angiographically confirmed coronary artery disease (CAD), who had been admitted to the Clinic of Cardiology, University Clinical Centre, from May 2006 to September 2008. The patients were assessed by the Type-D scale (DS14), The Beck Depression Inventory (BDI), and provided demographic information. Risk factors for CAD were obtained from cardiologists. (Type D (distressed): negative affect (worry, anxiety) and social inhibition)

• RESULTS: The findings of our study have shown that 34.2% patients with CAD could be classified as Type D personality. The univariate analysis has shown that the prevalence of Type D personality was significantly higher in individuals with unstable angina pectoris and myocardial infarction (MI) diagnoses (p=0.02). Furthermore, some components of metabolic syndrome were more prevalent in patients with Type D personality: hypercholesterolemia (p=0.00), hypertriglyceridemia (p=0.00) and hypertension (p=0.01). Additionally, the distribution of depression in patients with a Type D personality and a non-Type D personality were statistically significantly different (p=0.00).

• CONCLUSION: To our knowledge, this study is the first one to describe the prevalence and clinical characteristics of the Type D personality in patients with CAD in this region of Europe. We have found that the prevalence of Type D personality in patients with CAD is in concordance with the other studies.

Person by Situation Interaction effects

• Furnham et al. examined the distracting effect of television on cognitive processing (studying) in introverts and extraverts. Both extraverts and introverts performed better in silence, but extraverts performed better than introverts in the presence of television distraction.

• Is there a main effect? Is there an interaction effect?

• Factorial designs with both manipulated independent variables and subject variables recognize that a better understanding of behavior requires knowledge of both situational variables and personal attributes of people pg204

Interactions and Moderator Variables

• Moderator variables influence the relationship between two other variables. A moderator is a variable (z) whereby x and y have a different relationship between each other at the various levels of z. Note that this is essentially what is entailed in an interaction. A moderator variable is one that influences the strength of a relationship between two other variables, and a mediator variable is one that explains the relationship between the two other variables.

• "Whereas moderator variables specify when certain effects will hold, mediators speak to how or why such effects occur" (Baron & Kenny, 1986, p. 1176).

Mediate vs. Moderate

• Mediating variable: Synonym for intervening variable. Example: Parents transmit their social status to their children directly, but they also do so indirectly, through education: Parent’s status ➛ child’s education ➛ child’s status. Education is a mediating variable (mediators explain).

• Moderating variable: A variable that influences, or moderates, the relation between two other variables and thus produces an interaction effect. A moderator is a third variable that affects the correlation of two variables.

• If we were to replicate the Asch experiment with a female subject and found that her answers (Y variable) were not affected by the confederates’ answers (X variable), then we could say that gender is a moderator (M) in this case.

• https://www.youtube.com/watch?v=3ymkfDBwel0

Moderators vs. Confounders

• Moderator: A moderator is a variable (z) whereby x and y have a different relationship between each other at the various levels of z. Note that this is essentially what is entailed in an interaction. A variable that influences, or moderates, the relation between two other variables and thus produces an interaction effect.

• Confounder: A third variable that is related to x in a non-causal manner and is related to y either causally or correlationally. The third variable (z) is related to y even when x is not present. A confounding variable is an extraneous variable (i.e., a variable that is not a focus of the study) that is statistically related to (or correlated with) the independent variable. A variable that obscures the effects of another variable.

Let’s review How to control for confounding variables

• Confounding variable (continued): This is bad because the point of an experiment is to create a situation in which the only difference between conditions is a difference in the independent variable. This is what allows us to conclude that the manipulation is the cause of differences in the dependent variable. But if there is some other variable that changes along with the independent variable, then this confounding variable could be the cause of any difference.

• Controlling confounding variables-Essentially all person variables can be controlled by random assignment. If you randomly assign subjects to conditions, then on average they will be equally intelligent, equally outgoing, equally motivated, and so on
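The random-assignment idea above can be sketched in a few lines of Python. This is only an illustration: the subject records, the "outgoing" scores, and the function name `randomly_assign` are hypothetical and do not come from the textbook.

```python
import random
from statistics import mean


def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups


# Hypothetical person variable: an "outgoingness" score for 40 subjects.
subjects = [{"id": i, "outgoing": 50 + (i * 7) % 31} for i in range(40)]
groups = randomly_assign(subjects, ["treatment", "control"], seed=1)

# With random assignment the group averages on a person variable
# should be similar on average, though not identical in any one sample.
for name, members in groups.items():
    print(name, len(members), round(mean(m["outgoing"] for m in members), 1))
```

Because the experimenter does not choose who goes where, no person variable can systematically line up with the independent variable; small chance differences between groups remain possible, especially with few subjects.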

• Confounding variables: https://www.youtube.com/watch?v=B7QdNYLp_E0

Moderator variables

• A moderator variable changes the strength of an effect or relationship between two variables. Moderators indicate when or under what conditions a particular effect can be expected. A moderator may increase the strength of a relationship, decrease the strength of a relationship, or change the direction of a relationship. In the classic case, a relationship between two variables is significant (i.e., non-zero) under one level of the moderator and zero under the other level of the moderator. For example, work stress increases drinking problems for people with a highly avoidant (e.g., denial) coping style, but work stress is not related to drinking problems for people who score low on avoidant coping (Cooper, Russell, & Frone, 1990).

Example of Moderation

• [Diagram: Stress → Depression, with Social Support acting on the path between them]

• One of the clearest examples of moderation was presented by Cohen and Wills (1985). They argued that the social support literature (to that point in 1985) had neglected to consider the role of social support as a moderator of the stress to adjustment relationship. This moderation relationship is often depicted as shown above.

• This schematic suggests that the relationship between stress and depression may differ in strength at different levels of social support. In other words, stress may be more strongly associated with depression under conditions of low social support compared to conditions of high social support.
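A rough way to see moderation in numbers is to estimate the stress-depression slope separately at each level of the moderator. The sketch below uses made-up scores (they are not from Cohen and Wills) and a hand-written least-squares `slope` helper.

```python
def slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


stress = [0, 1, 2, 3, 4, 5]

# Hypothetical depression scores at two levels of the moderator.
depression_low_support = [2, 4, 6, 8, 10, 12]        # rises steeply with stress
depression_high_support = [2, 2.5, 3, 3.5, 4, 4.5]   # rises only slightly

slope_low = slope(stress, depression_low_support)
slope_high = slope(stress, depression_high_support)

# Different slopes at different levels of social support = moderation.
print("slope under low support: ", slope_low)
print("slope under high support:", slope_high)
```

In a real analysis the difference between the slopes would be tested for significance (for example as an interaction term in a regression); comparing two slopes by eye only illustrates the idea.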

Outcomes of a 2 X 2 Factorial Design

• Two levels to each of two independent variables. We must determine if there is a significant main effect for variable A, a significant main effect for variable B, and an interaction effect between the variables.

• In the example to the right there is a Main Effect for Both Room Temperature and Test Difficulty but no interaction effect.

Main effects and interaction effects

• We see that the six subjects in the cognitive conditions scored three points higher on average than the six subjects in the behavioral conditions. This is the main effect of the type of psychotherapy. To see the main effect of the duration of psychotherapy, we compare the average score in the short condition with the average score in the long condition, now computing these averages across subjects in the cognitive and behavioral conditions. We see that the six subjects in the long conditions scored three points higher on average than the six subjects in the short conditions. This is the main effect of the duration of psychotherapy.

Main Effects Therapy Type X Duration

Below are the same results plotted in the form of a bar graph. The main effect of type is indicated by the fact that the two cognitive bars are higher on average than the two behavioral bars. The main effect of duration is indicated by the fact that the two long-duration (dark) bars are higher on average than the two short-duration (light) bars.

Main Effects and Interaction Effects

Parallel lines in these types of graphs indicate that there is no interaction: any effects in the results are main effects only. If the lines are not parallel, this is indicative of an interaction.

"Do students do better on hard tests or easy tests?" "It depends: in a fifty degree room there is no difference, but in a ninety degree room they do much better on easy tests." (Interaction effect)

Students do best when the test is easy and the temperature is 90 degrees. (Interaction effect)

Music is as distracting as noise: the differential distraction of background music and noise on the cognitive test performance of introverts and extraverts (Furnham, 2002)

• Previous research has found that introverts' performance on complex cognitive tasks is more negatively affected by distracters, e.g. music and background television, than extraverts' performance. This study extended previous research by examining whether background noise would be as distracting as music. In the presence of silence, background garage music and office noise, 38 introverts and 38 extraverts carried out a reading comprehension task, a prose recall task and a mental arithmetic task. It was predicted that there would be an interaction between personality and background sound on all three tasks: introverts would do less well on all of the tasks than extraverts in the presence of music and noise but in silence performance would be the same. A significant interaction was found on the reading comprehension task only, although a trend for this effect was clearly present on the other two tasks. It was also predicted that there would be a main effect for background sound: performance would be worse in the presence of music and noise than silence. Results confirmed this prediction. These findings support the Eysenckian hypothesis of the difference in optimum cortical arousal in introverts and extraverts.

• What was the subject variable? What was the manipulated variable? Was there a main effect? Was there an interaction effect?

ANOVA

• A procedure known as the Analysis of Variance (ANOVA) is used to assess the statistical significance of main effects and interaction in a factorial design (pg207).

• The ANOVA can be used for factorial designs (or designs which employ more than one IV). Note that, in this context, an IV is often referred to as a factor. The factorial design is very popular in the social sciences. It has a few advantages over single variable designs. The most important of these is that it can provide some unique and relevant information about how variables interact or combine in the effect they have on the dependent variable.
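To make the procedure concrete, here is a minimal sketch of the arithmetic behind a two-way ANOVA for a balanced between-subjects design. The scores are hypothetical, the function name `two_way_anova` is an assumption made for illustration, and the sketch stops at the F ratios: judging significance still requires comparing each F with a critical value for its degrees of freedom.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[i][j] holds the scores for level i of factor A and level j of factor B.

    Assumes a balanced between-subjects design (same n in every cell).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_within = a * b * (n - 1)
    ms_within = ss_within / df_within

    return {
        "A": {"SS": ss_a, "df": df_a, "F": (ss_a / df_a) / ms_within},
        "B": {"SS": ss_b, "df": df_b, "F": (ss_b / df_b) / ms_within},
        "AxB": {"SS": ss_ab, "df": df_ab, "F": (ss_ab / df_ab) / ms_within},
        "within": {"SS": ss_within, "df": df_within},
    }


# Hypothetical scores for a 2 x 2 design with 5 subjects per cell.
scores = [
    [[3, 4, 4, 4, 5], [1, 2, 2, 2, 3]],   # factor A, level 1: B level 1, B level 2
    [[5, 6, 6, 6, 7], [3, 4, 4, 4, 5]],   # factor A, level 2: B level 1, B level 2
]
result = two_way_anova(scores)
for source, values in result.items():
    print(source, values)
```

In this made-up data set both main effects produce large F ratios while the interaction F is zero, because the effect of each factor is the same at both levels of the other factor.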

ANOVA example

• The human literature had shown that children diagnosed with Fetal Alcohol Syndrome (FAS) were more active and impulsive than children not receiving this diagnosis. They also seemed to have a more difficult time controlling themselves (i.e., self restraint). These problems typically become less severe as the child ages. Were the behavioral abnormalities observed in the children with FAS due to the fact that their mothers consumed alcohol while they were pregnant or due to nutritional factors (since the diet of an alcoholic is typically not wholesome & well balanced)? Another possible causal factor of the abnormalities observed is spousal abuse. Offspring of rodents given alcohol when pregnant show similar morphological and behavioral changes to that observed in humans.

Study of Alcohol on Learning

• We will have two IVs or factors and each will have two levels (or possible values). The table below illustrates the design. Note that EDC refers to Ethanol Derived Calories

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult
Chocolate Milk (0% EDC)       n=5           n=5
White Russian (35% EDC)       n=5           n=5

• This is an example of a 2x2 factorial design with 4 groups (or cells), each of which has 5 subjects. This is the simplest possible factorial design. The Dependent Variable (DV) used was a Passive Avoidance (PA) task. Rats are nocturnal, burrowing creatures and thus, they prefer a dark area to one that is brightly lit. The PA task uses this preference to test their learning ability. The apparatus has two compartments separated by a door that can be lifted out. One of the compartments has a light bulb which is controlled by the experimenter. The floor can be electrified and the rat receives a brief, mild electric shock


ANOVA example

• The first trial: The rat is placed in the compartment with the light bulb as shown below. When the trial begins, three things happen. The door is raised, the light is turned on, and a stopwatch is started.

Within a few seconds of the door being raised, the rat will typically sniff around and begin to move into the darker compartment (without the light). When the rat has completely entered the darker compartment, the door is closed and the brief, mild shock is administered. The goal is for the rat to learn not to move into the darker compartment. In other words, by remaining passive, the rat can avoid the shock, hence the term passive avoidance.

ANOVA example

• For our purposes, we will use a criterion of 180 seconds as our operational definition of learning PA. That is, when the rat remains in the brightly lit compartment for 3 minutes, we will say that it has learned the task, and what we measure is the number of trials it takes the rat to do this. (Note that a smart rat will take fewer trials to learn.) Thus, the PA task was chosen as the DV because it can be thought of as a measure of "self restraint."

• The first possibility is that nothing is significant.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      3             3        3
(35% EDC)                     3             3        3
B marginals                   3             3

ANOVA example continued

• The second possibility is that the main effect of factor A is significant. Here is one possible representation of this outcome.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      2             2        2
(35% EDC)                     4             4        4
B marginals                   3             3

Notice that the A marginals show a difference of two and thus the main effect of factor A is significant. The animals receiving alcohol in utero took more trials to learn PA than controls. The fact that the effect is consistent across both levels of factor B tells us that there is no interaction. In graphical form:

ANOVA example continued

• The next possibility is that the main effect of factor B is significant. Here is one possible representation of this outcome.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      4             2        3
(35% EDC)                     4             2        3
B marginals                   4             2

Notice that the B marginals show a difference of two and thus the main effect of factor B is significant. The older animals took fewer trials to learn PA than the younger animals. The fact that the effect is consistent across both levels of factor A tells us that there is no interaction.

ANOVA example continued

• The next possibility is that both main effects are significant. Here is one possible representation of this outcome.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      3             1        2
(35% EDC)                     5             3        4
B marginals                   4             2

Notice that both sets of marginals show a difference of two and thus both main effects are significant. The animals receiving alcohol in utero took more trials to learn PA than controls, and the older animals took fewer trials to learn PA than the younger animals. The fact that both of these main effects are consistent across the levels of the remaining factor tells us that there is no interaction.

ANOVA example continued

• The next possibility is that the interaction is significant. Here is one possible representation of this outcome.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      2             4        3
(35% EDC)                     4             2        3
B marginals                   3             3

Notice that both sets of marginals show no difference, thus neither main effect is significant. However, some of the cell means do differ by two. The animals receiving alcohol in utero took more trials to learn PA when young and fewer when older than controls. In other words, the effects of prenatal alcohol depended on the age of the animal when tested. Whenever the effect of one factor depends upon the levels of another, there is an interaction.

ANOVA example continued

• The next possibility is that the interaction and the main effect of factor A are significant, as shown below.

                              Age (factor B)
Maternal Diet (factor A)      Adolescent    Adult    A marginals
(0% EDC)                      1             3        2
(35% EDC)                     5             3        4
B marginals                   3             3

Notice that the B marginals show no difference, thus the main effect of B is not significant. The A marginals do show a difference of two, which demonstrates a main effect of factor A. This tells us that the animals that received alcohol in utero took longer to learn PA than the animals that didn't. However, the cell means tell the real story here. That is, the effect depends on age. The animals receiving alcohol in utero took more trials to learn PA when young but were normal when older when compared to controls.
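The marginals in the outcome tables above can be reproduced with a short sketch. The cell means are the ones shown in the tables; the function name `summarize_2x2` and the "interaction contrast" (a difference of differences) are illustrative choices, and a nonzero difference in means is only a descriptive pattern, not a significance test.

```python
def summarize_2x2(cells):
    """cells = [[A1B1, A1B2], [A2B1, A2B2]], the four cell means.

    Returns the A marginals, the B marginals, and an interaction contrast:
    (effect of A at B1) minus (effect of A at B2).
    """
    (a1b1, a1b2), (a2b1, a2b2) = cells
    a_marginals = [(a1b1 + a1b2) / 2, (a2b1 + a2b2) / 2]
    b_marginals = [(a1b1 + a2b1) / 2, (a1b2 + a2b2) / 2]
    interaction = (a2b1 - a1b1) - (a2b2 - a1b2)
    return a_marginals, b_marginals, interaction


# Rows: 0% EDC, 35% EDC. Columns: adolescent, adult.
outcomes = {
    "nothing": [[3, 3], [3, 3]],
    "A only": [[2, 2], [4, 4]],
    "B only": [[4, 2], [4, 2]],
    "A and B": [[3, 1], [5, 3]],
    "interaction only": [[2, 4], [4, 2]],
    "A and interaction": [[1, 3], [5, 3]],
}

for label, cells in outcomes.items():
    a_marg, b_marg, contrast = summarize_2x2(cells)
    print(f"{label:18} A marginals={a_marg} B marginals={b_marg} contrast={contrast}")
```

A contrast of zero corresponds to parallel lines in the graph; a nonzero contrast means the effect of maternal diet differs between the two age groups.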

Independent Groups, Repeated Measures and Mixed Factorial designs

• In a 2 x 2 Factorial design with four conditions for an Independent Groups (between-subjects) design, a different group of subjects will be assigned to each of the four conditions. Following the example on pg208, if you have a 2 x 2 design with 10 subjects in each condition you will need 40 subjects total.

                 Variable B
Variable A       Level 1      Level 2
Level 1          S1-S10       S11-S20
Level 2          S21-S30      S31-S40

2 x 2 Independent Groups (Between-Subjects) Design


• In a repeated measures (within-subjects) design the same subjects will participate in ALL conditions.

                 Variable B
Variable A       Level 1      Level 2
Level 1          S1-S10       S1-S10
Level 2          S1-S10       S1-S10

2 x 2 Repeated Measures (Within-Subjects) Design


• In a 2 x 2 mixed Factorial design, ten different subjects are assigned to each of Levels 1 and 2 of Variable A, but Variable B is a repeated measure: the subjects assigned to each level of Variable A receive both levels of Variable B.

                 Variable B
Variable A       Level 1      Level 2
Level 1          S1-S10       S1-S10
Level 2          S11-S20      S11-S20

2 x 2 Mixed Factorial Design
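The subject counts for the three designs follow directly from which factors are between-subjects. A small sketch, assuming factor A is the between-subjects factor in the mixed case (as in the slide above); the function name is hypothetical:

```python
def subjects_needed(levels_a, levels_b, n_per_condition, design):
    """Total subjects required for an A x B factorial design.

    design: "between" (independent groups), "within" (repeated measures),
            or "mixed" (A between-subjects, B within-subjects).
    """
    if design == "between":
        # Every cell gets its own group of subjects.
        return levels_a * levels_b * n_per_condition
    if design == "within":
        # The same subjects take part in every condition.
        return n_per_condition
    if design == "mixed":
        # One group per level of A; each group receives all levels of B.
        return levels_a * n_per_condition
    raise ValueError(f"unknown design: {design}")


for design in ("between", "within", "mixed"):
    print(design, subjects_needed(2, 2, 10, design))
```

For the 2 x 2 example with 10 subjects per condition this gives 40, 10 and 20 subjects respectively, matching the three tables.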

Increasing the Number of Levels of an Independent Variable

• You can increase the complexity of the basic 2 x 2 Factorial design by increasing the number of levels of one or more of the independent variables pg209

Example of 2 x 3 Factorial Design

• Dr. Sy Cottick investigated driver frustration under low, medium, and high density traffic conditions and under traffic flow controlled by a police officer or a traffic signal (2 conditions of Traffic Control X 3 conditions of Traffic Density). The measure of frustration was the number of horns honked by drivers before receiving the right-of-way at a controlled intersection.

2 X 3 Factorial Example

• Is there a Main Effect for Traffic Density?

• Yes. The average number of horn honks increases as traffic density increases.

• Is there a main effect of type of controlled intersection?

• Yes. People honk more often at signal controlled intersections than at officer controlled intersections.

                     Type of controlled intersection
Traffic Density      Officer      Signal      Mean
Low                  2            4           3
Medium               4            6           5
High                 8            10          9
Mean                 4.67         6.67

2 X 3 Factorial Example

• Is there an interaction between traffic density and type of controlled intersection?

• No. The same difference in horn honks between officer and signal exists at each level of traffic density, so there is no interaction.

[Line graph: Number of horn honks (y-axis, 0 to 12) by Traffic Density (Low, Medium, High), with separate lines for Officer and Signal]
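The answers to the three questions in this example can be checked from the cell means in the horn-honk table. A brief sketch, with the table typed in as a dictionary (the variable names are illustrative):

```python
# Mean number of horn honks, taken from the 2 x 3 table.
honks = {
    "Low": {"Officer": 2, "Signal": 4},
    "Medium": {"Officer": 4, "Signal": 6},
    "High": {"Officer": 8, "Signal": 10},
}

# Row means: main effect of traffic density.
density_means = {d: sum(row.values()) / len(row) for d, row in honks.items()}

# Column means: main effect of type of controlled intersection.
control_means = {
    c: sum(honks[d][c] for d in honks) / len(honks)
    for c in ("Officer", "Signal")
}

# Signal-minus-officer gap at each density: a constant gap means
# parallel lines, i.e. no interaction in the cell means.
gaps = [honks[d]["Signal"] - honks[d]["Officer"] for d in honks]
no_interaction = len(set(gaps)) == 1

print(density_means)
print({c: round(m, 2) for c, m in control_means.items()})
print(gaps, no_interaction)
```

This only describes the pattern in the means; whether the main effects are statistically significant would still be decided by the ANOVA.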

It is not always possible or practical to do an RCT (randomized clinical trial). It may not be ethical to do an RCT in some cases (for example, tobacco use), or it may be too expensive, especially for early or exploratory studies. This 2 x 2 factorial design has four experimental conditions.

Single-Case, Quasi-Experimental and Developmental Research Chapter 11

• While the classic experimental design includes subjects randomly assigned to the independent variable conditions, a dependent variable (outcome) measure, and all other variables held constant, three types of special research situations exist:

• 1) Single-Case 2) Quasi-Experimental and 3)Developmental Research

Single-subject, N=1 Designs

• Single-subject research is experimental rather than correlational or descriptive, and its purpose is to document causal, or functional, relationships between independent and dependent variables. Single-subject research employs within- and between-subjects comparisons to control for major threats to internal validity and requires systematic replication to enhance external validity (Martella, Nelson, & Marchand-Martella, 1999).

• (Each participant serves as his or her own control).

• Single-subject research requires operational descriptions of the participants, setting, and the process by which participants were selected (Wolery & Ezell, 1993).

Single Case Experimental Designs

• Early work in single subject designs is credited to B.F. Skinner, with many case studies or single case designs in clinical, counseling and educational settings.

• Single case studies begin with a baseline measure (control) followed by a manipulation

• In order to determine if the treatment was effective there is a reversal design A-B-A pg216

Single Case Designs: ABA Designs

• A: Baseline observation

• B: Treatment or intervention

• A: Withdrawal of treatment

• The ABA design can be further improved by an ABAB design and can be extended even further (ABABAB), as a single reversal may not be powerful enough.
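One simple way to summarize data from a reversal design is to average the observations within each phase and look at whether the behavior changes every time the treatment is introduced or withdrawn. The sketch below uses hypothetical daily counts of a problem behavior and a hypothetical helper, `phase_means`; in practice single-case data are usually also inspected visually, session by session.

```python
from statistics import mean


def phase_means(observations, phases):
    """Average the observations within each phase.

    observations: one score per session, in order.
    phases: list of (label, number_of_sessions) in the order they occurred.
    """
    results = []
    start = 0
    for label, length in phases:
        results.append((label, mean(observations[start:start + length])))
        start += length
    return results


# Hypothetical daily counts of a problem behavior in an ABAB design.
observations = [9, 10, 11, 4, 3, 5, 9, 11, 10, 3, 4, 2]
phases = [("A", 3), ("B", 3), ("A", 3), ("B", 3)]

summary = phase_means(observations, phases)
for label, average in summary:
    print(label, average)
```

In this made-up series the behavior drops in each B phase and returns to the baseline level when treatment is withdrawn, which is the pattern a reversal design looks for; repeating the reversal makes a chance fluctuation a less plausible explanation.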

Single Case Designs

• A single reversal may not be enough; in addition, the observed effect may have been due to a random fluctuation in behavior, which would justify multiple withdrawals and treatments (pg 217-218).

• Unlike group studies, single case designs frequently involve multiple repeated observations of the subject(s).

• Multiple Baseline Designs

• In certain instances it is unethical to reverse treatment that reduces dangerous or illegal behaviors such as drug/alcoholism or sexual deviancy. In such cases it may be necessary to demonstrate the effectiveness of treatment with a multiple baseline design

Multiple Baseline Designs

• One variation of multiple baseline designs is across subjects, in which the behavior of several subjects is measured over time and the treatment is introduced at a different time for each subject. Change takes place across the various subjects, ruling out random effects.

• Another version is a multiple baseline across behaviors

• Several different behaviors of a single subject are measured over time. At different times the same manipulation is applied to each of the behaviors

Multiple Baseline Designs

• Multiple baselines across behaviors: A reward or token system could be applied to different behaviors of the same subject/patient, with different ones for grooming, socialization, and appropriate speech (pg219).

• A third variation of the multiple baseline is across situations in which the same behavior is measured in different settings such as at home and at work

Single-Case Designs

• Procedures with any one subject can be replicated with other subjects (or replicated across settings), enhancing the generalizability or external validity. This is often done in research.

• Sidman (1960) suggests to present data from each single case design separately and not try to group the means of all the individuals as such means may be misleading (e.g. The treatment may have been effective in changing the behavior of some individuals but not others)

• Within education, single-subject research has been used not only to identify basic principles of behavior (e.g., theory), but also to document interventions (independent variables) that are functionally related to change in socially important outcomes (dependent variables; Wolf, 1978).

Help Line Evaluation

• An evaluation was conducted of the impact of different methods of agency outreach on the number of phone calls received by a help line (information and referral). The baseline period represented a time in which there was no outreach; rather, knowledge about the help line seemed to spread by word of mouth. The B phase represented the number of calls after the agency had sent notices about its availability to agencies serving older adults and families. During the C phase, the agency ran advertisements using radio, TV, and print media. Finally, during the D phase, agency staff went to a variety of different gatherings, such as community meetings or programs run by different agencies, and described the help line.

Evaluation of Help Line-Glatthorn

[EXHIBIT 7-14 Multiple Treatment Design: graph of Number of Calls (y-axis, 0 to 60) by Week (x-axis, 1 to 20)]

• Phone calls did not increase appreciably after notices were sent to other professionals or after media efforts, but they did increase dramatically in the final phase of the study. This graph demonstrates how tricky the interpretation of single-subject data can be. A difficulty in coming to a conclusion with such data is that only adjacent phases can be compared, so that the effect for nonadjacent phases cannot be determined. One plausible explanation for the findings is that sending notices to professionals and media efforts at outreach were a waste of resources, in that the notices produced no increase in the number of calls relative to doing nothing, and advertising produced no increase relative to the notices. Only the meetings with community groups and agency-based presentations were effective, at least relative to the advertising. An alternative interpretation of the findings is that the order of the activities was essential. There might have been a carryover effect from the first two efforts that added legitimacy to the third effort. In other words, the final phase was effective only because it had been preceded by the first two efforts. If the order had been reversed, the impact of the outreach efforts would have been negligible. A third alternative is that history or some other event occurred that might have increased the number of phone calls.

ASSESSMENT OF DEVIANT AROUSAL IN ADULT MALE SEX OFFENDERS WITH DEVELOPMENTAL DISABILITIES (Reyes et al., JOURNAL OF APPLIED BEHAVIOR ANALYSIS, 2006, 39, 173-188)

• Some statistics regarding very broad characteristics of sex offenders are available (i.e., age, race, etc.), but are limited due to the wide variability in this population. In general, the demographic characteristics of sex offenders seem to match those of nonoffenders.

• Ten individuals, residing in a treatment facility specializing in the rehabilitation of sex offenders with developmental disabilities, participated in an arousal assessment involving the use of the penile plethysmograph. All of these individuals had been accused of committing one or more sexual offenses and had been found incompetent to stand trial. The arousal assessments involved measuring change in penile circumference to various categories of stimuli both appropriate (adult men and women) and inappropriate (e.g., 8- to 9-year-old boys and girls). Before each session, the technician was required to calibrate the penile strain gauge to ensure accurate measurement. The video clips were presented one at a time in one of three predetermined orders

ASSESSMENT OF DEVIANT AROUSAL

• Differentiated deviant arousal was characterized as showing arousal in the presence of a particular age and gender category that was higher than the arousal to other categories and to the neutral stimulus. Differentiated arousal patterns were also consistently higher than arousal levels to the neutral stimulus. Undifferentiated deviant arousal was characterized as showing similar arousal levels to deviant and nondeviant stimuli that were higher than the arousal in the presence of the neutral stimulus. The arousal assessments showed that not all of the participants were differentially aroused by the deviant stimuli.

ASSESSMENT OF DEVIANT AROUSAL

• Specific targets for teaching are identified. Thus, skills training can be conducted to teach avoidance of high-risk situations (e.g., being in situations with children of a certain age group).

• Second, the assessment results could be used to evaluate the effects of commonly used, but poorly validated, treatments. For example, classical conditioning, which typically involves pairing unpleasant odors with deviant arousal, has been commonly used but has not been validated.

• Third, the effects of presession masturbation could be tested to determine whether ejaculation serves as an establishing operation or an abolishing operation for sexual stimuli as reinforcing (or at least as arousing) stimuli.

Program Evaluation

• Program evaluation is a method for collecting, analyzing, and using information to answer questions about projects, policies and programs, particularly about their effectiveness and efficiency.

• The question that needs to be answered is whether or not the programs people are funding, implementing, voting for, receiving or objecting to are producing the intended effect. The main focus is outcome evaluation which determines if the program was effective pg221

Program Evaluation

• Evaluation is the systematic application of scientific methods to assess the design, implementation, improvement or outcomes of a program (Rossi & Freeman, 1993; Short, Hennessy, & Campbell, 1996). The term "program" may include any organized action such as media campaigns, service provision, educational services, public policies, research projects.

• Rossi et al. (2004) identified five types of evaluations each attempting to answer different questions 1) Needs Assessment 2) Program Theory Assessment 3) Process Evaluation 4) Outcome Evaluation 5) Efficiency Assessment

Needs Assessment

• A needs assessment is a part of planning processes, determining if there are problems that need to be addressed in a target population (Is adolescent drug abuse a problem in the community?). Data may come from surveys, interviews, and statistical data provided by various agencies (pg221). A general 12 step process:

• Confirm the issue and audiences

• Establish the planning team

• Establish the goals and objectives

• Characterize the audience

• Conduct information and literature search

• Select data collection methods

• Determine the sampling scheme

• Design and pilot the collection instrument

• Gather and report data; Analyze data; Manage data

• Synthesize data and create report

Program Theory

• Program evaluation often involves collaboration of researchers, service providers and prospective clients of the program to determine that the proposed program does actually address the needs of the target population in appropriate ways.

• Example cited in assessing the needs of homeless men and women in NYC: men needed help with drinking or drug problems, handling money and social skills, while women needed help with health and problems. Any designed program must take these factors into account and provide a rationale for how homeless individuals will benefit from the program.

Process Evaluation

• When the program is under way the evaluation researcher monitors it to determine if it is being effective. Is the program doing what it is supposed to do? The types of questions asked when designing a process evaluation are different from those asked in outcome evaluation. The questions underlying process evaluation focus on how well interventions are being implemented. Typical questions asked include, but are not limited to:

• What intervention activities are taking place?

• Who is conducting the intervention activities?

• Who is being reached through the intervention activities?

• What inputs or resources have been allocated or mobilized for program implementation?

• What are possible program strengths, weaknesses, and areas that need improvement?

Outcome Evaluation (Impact Assessment)

• Outcome evaluations measure to what degree program objectives have been achieved (i.e. short-term, intermediate, and long-term objectives). This form of evaluation assesses what has occurred because of the program, and whether the program has achieved its outcome objectives (pg223).

• An outcome evaluation focused on tobacco prevention activities can measure the following elements

• Changes in intended and actual tobacco-related behaviors

• Changes in people’s attitude toward, and beliefs about, tobacco

• Changes in people’s awareness and support for interventions and policy or advocacy effort

• True experimental designs may not always be possible in these conditions and quasi-experimental designs and single-case designs may offer good alternatives

Program Evaluation Efficiency Assessment

• The final program evaluation question addresses efficiency assessment (pg222). Once it is shown that a program does have its intended effect, the researcher must determine if it is worth the resources that must be dedicated to it (costs vs. benefits).

When Bad things Happen to Good Intentions

• The Drug Abuse Resistance Education (DARE) program reviewed

• When it became known that the prestigious American Journal of Public Health planned to publish the study, DARE strongly objected and tried to prevent publication. "DARE has tried to interfere with the publication of this. They tried to intimidate us," the publication director reported (also see pg230 text)

• The U.S. Department of Education prohibits schools from spending its funding on DARE because the program is completely ineffective in reducing alcohol and drug use. DARE was declared ineffective by the U.S. General Accounting Office, the U.S. Surgeon General, the National Academy of Sciences, and the U.S. Department of Education (David J. Hanson, Ph.D., http://www.alcoholfacts.org/DARE.html)

An outcome evaluation of Project DARE

Christopher Ringwalt, Susan T. Ennett and Kathleen D. Holt. Health Educ. Res. (1991) 6 (3): 327-337

• This paper presents the results of an evaluation of the effects of the Drug Abuse Resistance Education (DARE) Project, a school-based drug use prevention program, in a sample of fifth and sixth graders in North Carolina. DARE is distinguished by its use of specially trained, uniformed police officers to deliver 17 weekly lessons in the classroom. The evaluation used an experimental design employing random assignment of 20 schools to either a DARE or no-DARE condition, pre- and post-testing of both groups, attrition assessment, adjustments for school effects, and control for non-equivalency between comparison groups.

• DARE demonstrated no effect on adolescents' use of alcohol, cigarettes or inhalants, or on their future intentions to use these substances. However, DARE did make a positive impact on adolescents' awareness of the costs of using alcohol and cigarettes, perceptions of the media's portrayal of these substances, general and specific attitudes towards drugs, perceived peer attitudes toward drug use, and assertiveness.

http://her.oxfordjournals.org/content/6/3/327.abstract#aff-1

How effective is drug abuse resistance education? A meta-analysis of Project DARE outcome evaluations

S T Ennett et al. Am J Public Health. 1994 September; 84(9): 1394–1401

• This study used meta-analytic techniques to review eight methodologically rigorous DARE evaluations

• INTRODUCTION Project DARE (Drug Abuse Resistance Education) is the most widely used school-based drug use prevention program in the United States, but the findings of rigorous evaluations of its effectiveness have not been considered collectively. METHODS. We used meta-analytic techniques to review eight methodologically rigorous DARE evaluations. Weighted effect size means for several short-term outcomes also were compared with means reported for other drug use prevention programs. RESULTS. The DARE effect size for drug use behavior ranged from .00 to .11 across the eight studies; the weighted mean for drug use across studies was .06. For all outcomes considered, the DARE effect size means were substantially smaller than those of programs emphasizing social and general competencies and using interactive teaching strategies. CONCLUSIONS. DARE's short-term effectiveness for reducing or preventing drug use behavior is small and is less than for interactive prevention programs.

Effect Size

• Consider an experiment conducted by Dowson (2000) to investigate time of day effects on learning: do children learn better in the morning or afternoon? A group of 38 children were included in the experiment. Half were randomly allocated to listen to a story and answer questions about it (on tape) at 9am, the other half to hear exactly the same story and answer the same questions at 3pm. Their comprehension was measured by the number of questions answered correctly out of 20.

• The average score was 15.2 for the morning group, 17.9 for the afternoon group: a difference of 2.7. But how big a difference is this? If the outcome were measured on a familiar scale, such as GCSE grades, interpreting the difference would not be a problem. If the average difference were, say, half a grade, most people would have a fair idea of the educational significance of the effect of reading a story at different times of day. However, in many experiments there is no familiar scale available on which to record the outcomes. The experimenter often has to invent a scale or to use (or adapt) an already existing one - but generally not one whose interpretation will be familiar to most people

Effect Size

• One way to get over this problem is to use the amount of variation in scores to contextualize the difference. If there were no overlap at all and every single person in the afternoon group had done better on the test than everyone in the morning group, then this would seem like a very substantial difference. On the other hand, if the spread of scores were large and the overlap much bigger than the difference between the groups, then the effect might seem less significant. Because we have an idea of the amount of variation found within a group, we can use this as a yardstick against which to compare the difference. This idea is quantified in the calculation of the effect size. Effect size is a measure of the strength of a phenomenon

• The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program

• Calculate effect size: http://www.uccs.edu/~lbecker/

Robert Coe University of Durham

http://www.leeds.ac.uk/educol/documents/00002182.htm

Effect size (r)
Small    0.10
Medium   0.30
Large    0.50

Effect Size

• The concept is illustrated in Figure 1, which shows two possible ways the difference might vary in relation to the overlap. If the difference were as in graph (a) it would be very significant; in graph (b), on the other hand, the difference might hardly be noticeable. In Dowson's time-of-day effects experiment, the standard deviation (SD) = 3.3, so the effect size was (17.9 - 15.2)/3.3 = 0.8. An effect size is exactly equivalent to a 'Z-score' of a standard Normal distribution. For example, an effect size of 0.8 means that the score of the average person in the experimental group is 0.8 standard deviations above the average person in the control group, and hence exceeds the scores of 79% of the control group. With the two groups of 19 in the time-of-day effects experiment, the average person in the 'afternoon' group (i.e. the one who would have been ranked 10th in the group) would have scored about the same as the 4th highest person in the 'morning' group

• The basic formula to calculate the effect size is to subtract the mean of the control group from that of the experimental group and then to divide that difference by the standard deviation of the scores for the control group
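The calculation above can be sketched in a few lines of Python. This is only an illustration using the figures quoted for Dowson's experiment; the function names are hypothetical, and the percentage step assumes the control scores are roughly Normally distributed.

```python
from statistics import NormalDist

def effect_size(mean_experimental, mean_control, sd_control):
    """Difference between group means, in units of the control group's SD."""
    return (mean_experimental - mean_control) / sd_control

def percent_of_control_exceeded(d):
    """Share of the control group scoring below the average experimental
    participant, assuming Normally distributed scores (an assumption)."""
    return NormalDist().cdf(d) * 100

# Figures from the time-of-day example: afternoon 17.9, morning 15.2, SD 3.3
d = effect_size(17.9, 15.2, 3.3)
percent = percent_of_control_exceeded(d)
print(round(d, 2))     # 0.82, which the slide rounds to 0.8
print(round(percent))  # 79
```

Treating the morning group as the "control" group here is a labeling choice made for the sketch.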

Quasi-Experimental Designs

• The experimental method received a big boost in the 1920s from a young Englishman named Ronald Fisher. Fisher's modern experimental methods were applied in agricultural research for 20 years or so before they began to be applied in psychology and eventually in education.

• In the early 1960s, a psychologist, Donald Campbell, and an educational researcher, Julian Stanley (Campbell & Stanley, 1963), published a paper that was quickly acknowledged to be a classic. They drew important distinctions between experiments of the type Fisher devised and many other designs and methods being employed by researchers with aspirations to experiments but failing to satisfy all of Fisher's conditions. Campbell and Stanley called the experiments that Fisher devised "true experiments." The methods that fell short of satisfying the conditions of true experiments they called "quasi-experiments," quasi meaning seemingly or apparently but not genuinely so.

Quasi-Experimental Designs

• Quasi-experimental designs address the need to study the effect of an independent variable in settings in which the controls of true experimental designs cannot be achieved (pg 222)

• A quasi-experiment is an empirical study used to estimate the causal impact of an intervention on its target population. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but it specifically lacks the element of random assignment to treatment or control. Instead, quasi-experimental designs typically allow the researcher to control the assignment to the treatment condition, but using some criterion other than random assignment (e.g. an eligibility cutoff mark)

• Campbell and Stanley 1963

Quasi-Experimental Designs

• A shorthand proposed by Cook and Campbell and adopted by many others uses the following code to describe quasi-experimental designs (not used in the text but very common)

• R = randomization

• On = observation at time n

• X = intervention (e.g. surgery or giving a drug)

The One-Shot Case Study (one-group posttest-only design)

• No control group. This design has virtually no internal or external validity: there is no means for determining whether change occurred as a result of the treatment or program

• Example: a training program for employees has only one group, with one intervention and one observation (after the fact)

Treatment   Post-test

X           O

Quasi-Experimental Designs

• For example, you want to determine whether praising primary school children makes them do better in arithmetic. You measure mathematics achievement with a test. To test this idea, you choose a class of 2nd grade pupils and increase praising of children, and you find that their mathematics scores did increase. You conclude that praising children increases their mathematics scores.

X          O

(praise)   (math scores)

• What are the weaknesses of this design?

• 1) Selection: It is possible that the students you selected as subjects were already good in mathematics.

• 2) History: If the school had organized a motivation course on mathematics for these students, it might influence their performance

Quasi-Experimental Designs

• One-Group Pretest-Posttest Design

• Minimal control. There is somewhat more structure: a single selected group is under observation, with a careful measurement taken before applying the experimental treatment and another taken after. This design has minimal internal validity, controlling only for selection of subjects and experimental mortality. It has no external validity

O1 X O2

(pretest) (praise) (posttest)

• Returning to the previous study on praise and math scores: to check for a pre-existing characteristic among the children, a pretest may be administered. If the children became more attentive after praising compared to the pretest, then you can attribute it to the practice of praising

Quasi-Experimental Designs

O1 X O2

(pretest) (praise) (posttest)

• What are the weaknesses for this design?

• 1) Maturation: If time between the pretest and posttest is long, it is possible that the subjects may have matured because of developmental changes.

• 2) Testing: Sometimes the period between the pretest and the posttest is too short and there is the possibility that subjects can remember the questions and answers (carryover effect)

• It may not be ethical to do an RCT (e.g. on tobacco use)

• Although Campbell and Stanley used the term control group, others prefer the term comparison group to emphasize the difference between this and an RCT

Quasi-Experimental Designs

• Nonequivalent Control Groups: this design uses a control group, but it is selected from existing natural groups

• Example: one group is given a medicine, whereas the control (comparison) group is given none. If different dosages of a medicine are tested, the design can be based around multiple groups. Such a design is limited in scope and contains many threats to validity. It is very poor at guarding against assignment bias since it does not use random assignment, and it is also subject to selection bias. Because it's often likely that the groups are not equivalent, this design was named the nonequivalent groups design to remind us of that

Nonequivalent Control Group Pretest-Posttest Design

In general, however, non-equivalent groups are usually chosen to be as similar as possible to each other, which helps to control extraneous variables. For example, if we are comparing cooperative learning to standard learning classroom techniques, we probably would not use a daytime class as our cooperative learning group and an evening class as our standard lecture group (pg 228)

However, if we add a pretest we can improve this design. This Nonequivalent Control Group Pretest-Posttest design gives us the advantage of comparing the control group to the experimental group, but this is still not a true RCT as assignment to groups is not random

Nonequivalent Control Group Pretest-Posttest Design

• The nonequivalent control group design still lacks random assignment but can be improved by matching subjects (similar to matched pairs designs). If we match subjects on multiple variables and combine the scores we produce a propensity score (propensity score matching)

• Matching attempts to mimic randomization by making the groups receiving treatment and no treatment more comparable (pg 229)

A Story of Nonequivalence

• Two heart surgeons walk into a room.

− The first surgeon says, “Man, I just finished my 100th heart surgery!”

− The second surgeon replies, “Oh yeah, I finished my 100th heart surgery last week. I bet I'm a better surgeon than you. How many of your patients died within 3 months of surgery? Only 10 of my patients died.”

− The first surgeon smugly responds, “Only 5 of mine died, so I must be the better surgeon.”

− The second surgeon says, “My patients were probably older and had a higher risk than your patients.”

Propensity Score

• In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. The technique was first published by Paul Rosenbaum and Donald Rubin in 1983

[Figure: Publications in Pub Med with phrase "Propensity Score". x-axis: Year (1983 to 2006); y-axis: Number of publications (0 to 180). Source: 2007Jan05 GCRC Research-Skills Workshop]


Propensity Score example

• Consider an HIV database:

– E+: patients receiving a new antiretroviral drug (N=500). Experimental group.

– E-: patients not receiving the drug (N=10,000). Control group.

– D+: mortality. Dependent variable.

• Need to manually measure CD4. (CD4 = T-helper cells, which send signals to other types of immune cells, including CD8 killer cells. CD4 cells send the signal and CD8 cells destroy the infectious particle.)

• May be potential confounding by other HIV drugs as well as other prognostic factors

• Limitations

• Propensity score methods work better in larger samples to attain distributional balance of observed covariates.

– In small studies, imbalances may be unavoidable.

• Including irrelevant covariates in the propensity model may reduce efficiency; bias may occur; non-uniform treatment effect


Propensity Score example

• Option 1:

– Collect blood samples from all 10,500 patients.

– Costly & impractical.

• Option 2:

– For all patients, estimate Pr(E+ | other HIV drugs & prognostic factors).

– For each E+ patient, find the E- patient with the closest propensity score.

– Continue until all E+ patients are matched with an E- patient.

– Collect blood samples from the 500 propensity-matched pairs.
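The matching step in Option 2 can be sketched as follows. This is a minimal illustration, not the procedure used in the workshop example: it assumes the propensity scores have already been estimated (for instance by a logistic regression, which is not shown), it uses a simple greedy nearest-neighbor rule without replacement, and the patient IDs and scores are made up.

```python
def match_nearest(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break  # ran out of controls to match
        # Pick the remaining control whose score is closest to this patient's
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs

# Hypothetical propensity scores, Pr(E+ | covariates)
treated = {"T1": 0.62, "T2": 0.35}
controls = {"C1": 0.10, "C2": 0.33, "C3": 0.60, "C4": 0.90}

pairs = match_nearest(treated, controls)
print(pairs)  # [('T1', 'C3'), ('T2', 'C2')]
```

Greedy matching depends on the order in which treated patients are processed; other matching rules (calipers, optimal matching) exist and may behave differently.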

• Panel of 7 specialists in critical care specified variables related to decision

• age, sex, yrs of education, medical insurance, primary & secondary disease category, admission dx

• Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158: 280-287.

Interrupted Time Series

• A time series is simply a set of measurements of a variable taken at various points in time

• In an interrupted time-series design, a time series (the dependent variable) is interrupted (usually near the middle) by the manipulation of the independent variable.

• This design uses several waves of observation before and after the introduction of the independent (treatment) variable X

• O1 O2 O3 O4 X O5 O6 O7 O8

Interrupted Time Series: Control Series Design

• Control Series Design (pg 230)

• The addition of a second time series for a comparison group helps to provide a check on some of the threats to validity of the Single Interrupted Time Series Design (previous slide), especially history

• Group A: O1 O2 O3 O4 X O5 O6 O7 O8

Group B: O1 O2 O3 O4 - O5 O6 O7 O8

• This design is like a pretest-posttest design but with multiple pretests and multiple posttests. The advantage of this approach is that it provides greater confidence that the change in the dependent variable was caused by the manipulation and is not just a random fluctuation.
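As a rough illustration of the control series idea, the sketch below compares the average change from the pre-intervention observations (O1 to O4) to the post-intervention observations (O5 to O8) in each group. The numbers are invented, the function name is hypothetical, and a simple difference of means is only a descriptive summary, not a full time-series analysis.

```python
from statistics import mean

def pre_post_change(series, n_pre):
    """Mean of the observations after the interruption minus the mean before it."""
    return mean(series[n_pre:]) - mean(series[:n_pre])

# Hypothetical observations O1..O8; X occurs between O4 and O5 for Group A only
group_a = [10, 11, 10, 11, 15, 16, 15, 16]  # received the intervention
group_b = [10, 11, 10, 11, 11, 10, 11, 10]  # comparison group, no intervention

change_a = pre_post_change(group_a, 4)
change_b = pre_post_change(group_b, 4)
print(change_a, change_b)   # 5.0 0.0
print(change_a - change_b)  # 5.0
```

If a history effect had shifted both groups, both changes would move together and the difference between them would stay small.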

Developmental Research Designs

• Developmental research studies how individuals change as a function of age. Researchers can adopt two general approaches to studying individuals of different ages. Researchers might select groups of people who are remarkably similar in most areas, but differ only in age

• Cross-sectional studies look at a variable at a particular point in time, while longitudinal studies involve taking multiple measures over an extended period of time. Cross-sectional designs are more common, as they cost less and provide immediate results, allowing comparisons across various groups

Developmental Research Designs

• Disadvantages of cross-sectional research: the researcher must infer that the differences among age groups are due to development, but this variable (development) is not directly observed; it is based on comparisons of different cohorts of individuals

• A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born on a certain day or period, are exposed to a drug or vaccine or pollutant, or undergo a certain medical procedure)

Cross-Sectional, Longitudinal & Sequential Studies

• Longitudinal studies are the best way to study changes as people grow older and also the best way to study how scores on a variable at one age are related to another variable at a later age although attrition (loss of subjects) from the study is often problematic

• The sequential method combines the cross-sectional and longitudinal methods. In a study by Orth et al., different age groups were formed and compared (e.g. 25-34; 35-44; 45-54 etc.) (cross-sectional), but then each person was measured a second time (longitudinal)

Self-Esteem Development From Young Adulthood to Old Age: A Cohort-Sequential Longitudinal Study

Orth, Trzesniewski & Robins, JPSP, 2010, Vol. 98, No. 4, 645–658

• The authors examined the development of self-esteem from young adulthood to old age. Data came from the Americans’ Changing Lives study, which includes 4 assessments across a 16-year period of a nationally representative sample of 3,617 individuals aged 25 years to 104 years. Latent growth curve analyses indicated that self-esteem follows a quadratic trajectory across the adult life span, increasing during young and middle adulthood, reaching a peak at about age 60 years, and then declining in old age. No cohort differences in the self-esteem trajectory were found. Women had lower self-esteem than did men in young adulthood, but their trajectories converged in old age. Whites and Blacks had similar trajectories in young and middle adulthood, but the self-esteem of Blacks declined more sharply in old age than did the self-esteem of Whites. More educated individuals had higher self-esteem than did less educated individuals, but their trajectories were similar. Moreover, the results suggested that changes in socioeconomic status and physical health account for the decline in self-esteem that occurs in old age

Quadratic Trajectory

Controlling for Threats to Validity pg224-227

• 1) History: did some other current event affect the change in the dependent variable?

• 2) Maturation: were changes in the dependent variable due to normal developmental processes?

• 3) Statistical Regression: did subjects come from very low or high performing groups?

• 4) Selection: were the subjects self-selected or non randomly selected into experimental and control groups, which could affect the dependent variable?

• 5) Experimental Mortality: did some subjects drop out? did this affect the results?

• 6) Testing: Did the pre-test affect the scores on the post-test?

• 7) Instrumentation: Did the measurement method change during the research?

• 8) Design contamination: did the control group find out about the experimental treatment? Did either group have a reason to want to make the research succeed or fail?

Odds Ratio

• In statistics, the odds ratio (usually abbreviated "OR") is one of three main ways to quantify how strongly the presence or absence of property A is associated with the presence or absence of property B in a given population. If each individual in a population either does or does not have a property "A" (e.g. "high blood pressure"), and also either does or does not have a property "B" (e.g. "moderate alcohol consumption"), where both properties are appropriately defined, then a ratio can be formed which quantitatively describes the association between the presence/absence of "A" (high blood pressure) and the presence/absence of "B" (moderate alcohol consumption) for individuals in the population. This ratio is the odds ratio (OR) and can be computed following these steps:

• 1) For a given individual that has "B" compute the odds that the same individual has "A"

• 2) For a given individual that does not have "B" compute the odds that the same individual has "A"

• 3) Divide the odds from step 1 by the odds from step 2 to obtain the odds ratio (OR)

• If the OR is greater than 1, then having “A” is considered to be “associated” with having “B” in the sense that the having of “B” raises (relative to not-having “B”) the odds of having “A”. Note that this is not enough to establish that B is a contributing cause of “A”: it could be that the association is due to a third property, “C”, which is a contributing cause of both “A” and “B”
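The three steps above can be sketched in Python from the four cell counts of a 2x2 table. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; the function and parameter names are illustrative.

```python
def odds_ratio(a_with_b, not_a_with_b, a_without_b, not_a_without_b):
    """Odds ratio for property A given presence vs absence of property B."""
    odds_given_b = a_with_b / not_a_with_b            # step 1: odds of A among those with B
    odds_given_not_b = a_without_b / not_a_without_b  # step 2: odds of A among those without B
    return odds_given_b / odds_given_not_b            # step 3: ratio of the two odds

# Hypothetical counts: of 100 people with B, 30 have A and 70 do not;
# of 100 people without B, 20 have A and 80 do not.
result = odds_ratio(30, 70, 20, 80)
print(round(result, 2))  # 1.71
```

An OR above 1, as here, indicates an association between A and B in these made-up data; as the slide notes, that alone does not establish causation.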

Understanding Research Results (Chp 12)

• Because experimenters must calculate the size of differences that chance is likely to produce and compare them with the differences they actually observe, they necessarily become involved with probability theory and its application to statistics

• True experiments satisfy three conditions: the experimenter sets up two or more conditions whose effects are to be evaluated subsequently; persons or groups of persons are then assigned strictly at random, that is, by chance, to the conditions; the eventual differences between the conditions on the measure of effect (for example, the pupils' achievement in each of two or more learning conditions) are compared with differences of chance or random magnitude (G. Glass, Arizona State University)

Understanding Research Results

• Statistics are used in two ways to understand and interpret research

• 1) Statistics are used to describe data

• 2) Statistics are used to draw inferences

• Review Scales of Measurement (which have important implications for the way data are described and analyzed)

• Nominal scales: categorical; do not imply any ordering among the responses

• Ordinal scales: rank order the levels of a variable (category) being studied. Nothing is specified about the magnitude of the interval between two measures

• Interval scales: intervals have the same interpretation throughout, in that the intervals between the numbers are equal in size. However, there is no absolute zero on the scale

• Ratio scales: the most informative scale. An interval scale with the additional property that its zero position indicates the absence of the quantity being measured

Understanding Research Results

• Three basic ways to describe results of variables studied

• 1) Comparing group percentages (e.g. percent of males vs females who like to travel)

• 2) Correlating scores of individuals on two variables (e.g. do students sitting in the front of the class receive better grades?)

• 3) Comparing group means (e.g. mean number of aggressive acts by children who witnessed an adult model aggression compared to mean number of aggressive acts by children who did not witness an adult model be aggressive)

• Frequency distributions: indicate the number of individuals who receive each possible score on a variable (pg 243). Often these distributions are graphed

• Raw data: data collected in original form.

• Frequency: the number of times a certain value or class of values occurs.

• Frequency distribution: the organization of raw data in table form with classes and frequencies
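A frequency distribution can be built from raw data with a simple tally. The sketch below uses Python's standard `collections.Counter`; the raw scores are invented for illustration.

```python
from collections import Counter

# Hypothetical raw data: ten ratings on a 1-5 scale
raw_scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]

# Frequency of each value (how many times it occurs)
frequency = Counter(raw_scores)

# Frequency distribution in table form: one row per score
for score in sorted(frequency):
    print(score, frequency[score])
# 2 1
# 3 2
# 4 3
# 5 4
```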

Graphing Frequency Distributions

• Pie Charts-The frequency determines the size of the slice

Graphing Frequency Distributions

• Bar graphs: a separate bar for each piece of information. The x-axis is horizontal and the y-axis is vertical. Bar graphs are used when the x-axis variable is nominal

• Frequency polygons: a line is used to represent the distribution of frequency scores. Line graphs are used when the x-axis values are numeric (pg 247)

Graphing Frequency Distributions

• Histogram: uses bars to display a frequency distribution. Values are continuous (versus a bar graph), with bars drawn next to each other

Descriptive Statistics

• Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent

• Must have at least two statistics (characteristics of a sample) to describe a data set: 1) measures of central tendency and 2) measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) (pg 245-6)

Descriptive Statistics

• The mean is an appropriate indicator of central tendency only when scores are measured on an interval or ratio scale, because the actual values of the numbers are used in calculating the statistic

Common Symbols (Greek)

• μ (mu) refers to a population mean; and x̄ (x-bar), to a sample mean.

• σ (sigma, lower case) refers to the standard deviation of a population; and s, to the standard deviation of a sample

• N is the number of elements in a population. n is the number of elements in a sample

• Σ is the summation symbol, used to compute sums over a range of values. Σx or Σxi refers to the sum of a set of n observations. Thus, Σxi = Σx = x1 + x2 + . . . + xn

Common Symbols (Greek)

Letter   Name
Α α      alpha
Β β      beta
Γ γ      gamma
Δ δ      delta
Ε ε      epsilon
Ζ ζ      zeta
Θ θ      theta
Κ κ      kappa
Λ λ      lambda
Μ μ      mu
Π π      pi
Ρ ρ      rho
Σ σ      sigma
Φ φ      phi
Χ χ      chi
Ψ ψ      psi
Ω ω      omega

Central Tendency and Variability (dispersion)

• A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. The mean (or average) is the most popular and well known measure of central tendency (pg 245). It is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values x1, x2, ..., xn, the sample mean, usually denoted by x̄ (x-bar), is x̄ = (x1 + x2 + ... + xn) / n = Σx / n

Median and Mode

Table 1.

• 37 33 33 32 29 28 28 23 22 22 22 21 21 21 20 20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6

The median is the midpoint of a distribution: the same number of scores is above the median as below it. For the data in Table 1, there are 31 scores. The 16th highest score (which equals 20) is the median because there are 15 scores below the 16th score and 15 scores above the 16th score. The median can also be thought of as the 50th percentile. The mode is the most frequently occurring value. For the data in Table 1, the mode is 18
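The median and mode reported for Table 1 can be checked with Python's standard `statistics` module. The variable names are illustrative; the scores are copied from Table 1.

```python
from statistics import median, mode

# Scores from Table 1
table_1 = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
           19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

n_scores = len(table_1)        # number of scores
median_score = median(table_1) # middle (16th) score of the 31
mode_score = mode(table_1)     # most frequently occurring score

print(n_scores, median_score, mode_score)  # 31 20 18
```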

Variability (Dispersion)

• The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is.

• Range: the difference between the highest and lowest score

• Variance: variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean (pg 246). The standard deviation is simply the square root of the variance
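The definitions of range, variance and standard deviation can be worked through directly. The scores below are hypothetical, and the sketch follows the definition given here (dividing by the number of scores); note that sample formulas often divide by n - 1 instead, which gives a slightly larger value.

```python
from math import sqrt

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest minus lowest score
mean_score = sum(scores) / len(scores)   # middle of the distribution

# Variance: average squared difference of the scores from the mean
variance = sum((x - mean_score) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance
standard_deviation = sqrt(variance)

print(score_range, mean_score, variance, standard_deviation)  # 7 5.0 4.0 2.0
```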

Correlation and Prediction

• Correlation refers to the degree of relationship between two variables

• Regression: (multiple) regression is a statistical tool used to derive the value of a criterion from several other independent, or predictor, variables. It is the simultaneous combination of multiple factors to assess how and to what extent they affect a certain outcome (y = X1 + X2 + X3 ... etc.)

• “The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably” (J. Roscoe)

• How would you test the hypothesis that “enhanced interrogation” results in useful intelligence? What model would you use? RCT? Correlation? Regression?

Correlation and strength of relationships

• A correlation coefficient is a statistic that describes how strongly variables are related to one another. The most familiar correlation coefficient is the Pearson product-moment coefficient. Pearson's r is a measure of the linear correlation (dependence) between two variables X and Y (it does not describe curvilinear relationships) (pg 248)

Correlation-Scatter Plot

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)

• -1 is a perfect negative correlation

The value shows how good the correlation is and if it is positive or negative
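To make the three reference values concrete, the sketch below computes Pearson's r from its usual definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the small data sets are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # y rises in step with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls in step with x
r_none = pearson_r(xs, [2, 1, 4, 1, 2])       # no linear trend
print(r_positive, r_negative, r_none)
```

With these made-up values the three results correspond to a perfect positive correlation, a perfect negative correlation, and no linear correlation.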

Correlation and strength of relationships

• Restriction of range: one issue is that one variable or the other is sampled over too narrow a range. This restriction of range, as it is called, makes the relationship seem weaker than it is. Suppose we want to know the correlation between a test such as the SAT and freshman GPA. We collect SAT test scores from applicants and compute GPA at the end of the freshman year. If we use the SAT in admissions and reject applicants with low scores, we will have range restriction because there will be nobody in the sample with low test scores. If individuals in your sample are very similar you will have a restriction of range. Trying to understand the correlates of intelligence will be difficult if everyone in your sample is very similar in intelligence

Effect Size

• Effect size refers to the strength of association between variables. The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables (Cozby & Bates, pg 252)

• The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. These are both examples of "absolute effect sizes", meaning that they convey the average difference between two groups without any discussion of the variability within the groups. For example, if the weight loss program results in an average loss of 30 pounds, it is possible that every participant loses exactly 30 pounds, or half the participants lose 60 pounds and half lose no weight at all

Effect Size

• Effect size-a measure of the strength of a phenomenon (for example the change in an outcome after experimental intervention).

• The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables

• Correlation coefficients indicating small effects range from .10 to .20, medium effects are around .30, and large effects are above .40 (others say .50)

• Sometimes the squared value of r is reported, which transforms the value into a percentage (this is also referred to as the percent of shared variance between the two variables). The correlation between gender and weight is about .70 (males weighing more than females); squaring the value of .70 results in .49. Therefore 49% of the variance in weight is accounted for by gender

Effect Size

• An effect size is a measure that describes the magnitude of the difference between two groups. Effect sizes are particularly valuable in best practices research because they represent a standard measure by which all outcomes can be assessed

• An effect size is typically calculated by taking the difference in means between two groups and dividing that number by their combined (pooled) standard deviation. Intuitively, this tells us how many standard deviations’ difference there is between the means of the intervention (treatment) and comparison conditions; for example, an effect size of .25 indicates that the treatment group outperformed the comparison group by a quarter of a standard deviation.

Effect Size continued

• An effect size of 0.33 denotes that a treatment led to a one-third of a standard deviation improvement in outcome. Similarly, an effect size of 0.5 denotes a one-half of a standard deviation increase in outcome. Because effect sizes are based upon these mean and standard deviation scores, they allow direct comparisons across studies

• Cohen's d is an effect size used to indicate the standardized difference between two means

• http://www.uccs.edu/~lbecker/
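A minimal sketch of Cohen's d as described above: the difference between two group means divided by the pooled standard deviation. The scores and the function name `cohens_d` are hypothetical and serve only to show the arithmetic.

```python
import math

def cohens_d(treatment, comparison):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(comparison)
    m1 = sum(treatment) / n1
    m2 = sum(comparison) / n2
    # Sums of squared deviations within each group.
    ss1 = sum((x - m1) ** 2 for x in treatment)
    ss2 = sum((x - m2) ** 2 for x in comparison)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical outcome scores for two groups.
treatment_scores = [12, 14, 16, 18, 20]    # mean 16
comparison_scores = [10, 12, 14, 16, 18]   # mean 14

d = cohens_d(treatment_scores, comparison_scores)
print(f"Cohen's d = {d:.2f}")
```

With these made-up scores the treatment group scores about 0.63 pooled standard deviations above the comparison group.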


Regression Equations

• The terms correlation, regression and prediction are so closely related in statistics that they are often used interchangeably-J.Roscoe. Regression equations are calculations used to predict a person’s score on one variable when that person's score on another variable is already known (Cozby & Bates, pg 253)

• Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0). https://www.youtube.com/watch?v=ocGEhiLwDVc

Linear Regression

• Intercept-the value at which the fitted line crosses the y-axis
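The line Y = a + bX can be fitted by ordinary least squares. The sketch below uses made-up height and weight values and a hypothetical helper `fit_line`; it is meant only to show where the slope b and the intercept a come from.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope: change in Y per unit change in X
    a = mean_y - b * mean_x     # intercept: predicted Y when X = 0
    return a, b

# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [115, 120, 130, 135, 150]

a, b = fit_line(heights, weights)
predicted_weight = a + b * 70   # prediction for a person 70 inches tall
print(f"Y = {a:.2f} + {b:.2f}X; predicted weight at 70 in = {predicted_weight:.1f}")
```

Note that the intercept here is negative, which has no physical meaning for weight; it is simply where the fitted line would cross the y-axis if extended to X = 0, far outside the observed range.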

Multiple Correlation/Regression

• Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. The dependent variable is affected by more than one independent variable

• In simple linear regression, a criterion variable (Y) is predicted from one predictor variable (X). In multiple regression, the criterion is predicted by two or more variables: Y = a + b1X1 + b2X2 + b3X3

• Example: Y = health rating of chosen city
X1 = death rate per 1000 residents
X2 = doctor availability per 100,000 residents
X3 = hospital availability per 100,000 residents
X4 = annual per capita income in thousands of dollars
X5 = population density, people per square mile

Multiple Correlation/Regression

• Pew Research Center survey on Happiness (Y)-Results of multiple regression: Married people are happier than unmarrieds. People who worship frequently are happier than those who don’t. Republicans are happier than Democrats. Rich people are happier than poor people. Whites and Hispanics are happier than blacks. Sunbelt residents are happier than those who live in the rest of the country.

• Also found some interesting non-correlations. People who have children are no happier than those who don’t, after controlling for marital status. Retirees are no happier than workers. Pet owners are no happier than those without pets

Correlation/Regression

[Figure: path diagrams. Simple correlation: Parental Support → Happiness, r = .38, R = .38. Multiple regression: Parental Support → Happiness, r = .38; Self-esteem → Happiness, r = .30; R = .45]
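The multiple correlation R for two predictors can be computed from the pairwise correlations alone. The sketch below uses the two correlations with happiness from the path diagram (.38 and .30). The correlation between the two predictors is not given in the slide, so the value .16 is purely an assumption, picked because it roughly reproduces R = .45; the helper name `multiple_r` is also hypothetical.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of Y with two predictors, from pairwise correlations."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

r_support = 0.38      # parental support with happiness (from the diagram)
r_esteem = 0.30       # self-esteem with happiness (from the diagram)
assumed_r_12 = 0.16   # ASSUMED correlation between the two predictors

# If the predictors were uncorrelated, their contributions would simply add.
R_uncorrelated = multiple_r(r_support, r_esteem, 0.0)
# With overlapping predictors, part of their contribution is shared.
R_assumed = multiple_r(r_support, r_esteem, assumed_r_12)

print(f"R with uncorrelated predictors: {R_uncorrelated:.2f}")
print(f"R with assumed overlap:         {R_assumed:.2f}")
```

The comparison shows why R is usually smaller than one might expect from the separate correlations: predictors that overlap with each other explain some of the same variance in the criterion.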

Partial Correlation

• Extraneous or confounding variables are controlled in experimental research by keeping them constant or through randomization. This is harder to do in nonexperimental research (pg 256)

• One technique to control for such variables in non experimental research is to use partial correlation

• A partial correlation is a correlation between the two variables of interest with the influence of the third variable removed from or “partialed out of” the original correlation –which tells you what the correlation between the primary variables would be if the third variable were held constant pg 256

Partial Correlation

• In simple correlation, we measure the strength of the linear relationship between two variables, without taking into consideration the fact that both these variables may be influenced by a third variable.

• The calculation of the partial correlation co-efficient is based on the simple correlation co-efficient. However, simple correlation coefficient assumes linear relationship. Generally this assumption is not valid especially in social sciences, as linear relationship rarely exists in such phenomena

• It may be of interest to know if there is any correlation between X and Y that is NOT due to their both being correlated with Z. To do this you calculate a partial correlation.

Partial Correlation

• Suppose you calculate the correlations for subjects on each of three variables, X, Y, and Z, and obtain the following:

• X versus Y: rXY = +.50, r²XY = .25

• X versus Z: rXZ = +.50, r²XZ = .25

• Y versus Z: rYZ = +.50, r²YZ = .25

• For each pair of variables (XY, XZ, and YZ) the variance overlap is 25%

• Partial correlation is a procedure that allows us to measure the region of three-way overlap precisely, and then to remove it from the picture in order to determine what the correlation between any two of the variables would be (hypothetically) if they were not each correlated with the third variable. Alternatively, you can say that partial correlation allows us to determine what the correlation between any two of the variables would be (hypothetically) if the third variable were held constant.

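Partial Correlation

• The partial correlation between X and Y with Z held constant is

$$r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{1 - r_{XZ}^{2}}\,\sqrt{1 - r_{YZ}^{2}}}$$

• With rXY = rXZ = rYZ = +.50:

$$r_{XY\cdot Z} = \frac{.50 - (.50)(.50)}{\sqrt{1 - .25}\,\sqrt{1 - .25}} = +.33$$

• Therefore r²XY·Z = .11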
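The partial correlation formula translates directly into a few lines of code. This is a sketch using the three correlations of +.50 from the worked example; the function name `partial_correlation` is chosen here for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z partialed out."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return numerator / denominator

# Values from the worked example: every pairwise correlation is +.50.
r_xy_given_z = partial_correlation(0.50, 0.50, 0.50)
shared_variance = r_xy_given_z ** 2

print(f"rXY.Z = {r_xy_given_z:.2f}, squared = {shared_variance:.2f}")
```

Holding Z constant drops the X-Y correlation from .50 to about .33, so the shared variance falls from 25% to about 11%.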

Structural Equation Modeling (SEM)

• SEMs are suited to both theory testing and theory development. Measurement is recognized as difficult and error-prone. Compared to regression and factor analysis, SEM is a relatively young field, having its roots in papers that appeared only in the late 1960s. As such, the methodology is still developing, and even fundamental concepts are subject to challenge and revision. This rapid change is a source of excitement for some researchers and a source of frustration for others.

• Researchers typically construct path diagrams to represent the model being tested. Path Diagrams play a fundamental role in structural modeling. Path diagrams are like flowcharts. They show variables interconnected with lines(arrows) that are used to indicate causal flow

Structural Equation Modeling (SEM)

• Structural equation models go beyond ordinary regression models to incorporate multiple independent and dependent variables as well as hypothetical latent constructs that clusters of observed variables might represent

http://www.youtube.com/watch?v=ZuX_QzZGjf0 (start at 4'23", end at 11'30")

Structural Equation Modeling (SEM)

• Interpretation of path coefficients: First of all, they are not correlation coefficients. X and Y are converted to z-scores before conducting a simple regression analysis (path coefficients are regression coefficients converted into standardized z scores).

• Interpreting path coefficients-Suppose we have a network with a path connecting from region A to region B. The meaning of the path coefficient (e.g., 0.81) is this: if region A increases by one standard deviation from its mean, region B would be expected to increase by 0.81 its own standard deviations from its own mean while holding all other relevant regional connections constant. With a path coefficient of -0.16, when region A increases by one standard deviation from its mean, region B would be expected to decrease by 0.16 its own standard deviations from its own mean while holding all other relevant regional connections constant

• One of the nice things about SPSS is that it will allow you to start with a correlation matrix (you don’t need the raw data)

Score Transformations-A score has meaning only as it is related to other scores

• Often it is necessary to transform data from one measurement scale to another. For example, you might want to convert height measured in feet to height measured in inches. The table shows the heights of four people measured in both feet and inches. To transform feet to inches, you simply multiply by 12. (Similarly, to transform inches to feet, you divide by 12)

Feet    Inches
5.00    60
6.25    75
5.50    66
5.75    69

Some conversions require that you multiply by a number and then add a second number. A good example of this is the transformation between degrees Centigrade and degrees Fahrenheit. The table below converts the Fahrenheit temperatures of 4 US cities to Centigrade. The formula to transform Centigrade to Fahrenheit is F = 1.8 C + 32

City           Fahrenheit    Centigrade
Houston        54            12.22
Chicago        37            2.78
Minneapolis    31            -0.56
Miami          78            25.56

Score Transformations

• The figure shows a plot of degrees Centigrade as a function of degrees Fahrenheit. Notice that the points form a straight line. Such transformations are therefore called linear transformations. Many transformations are not linear. With nonlinear transformations, the points in a plot of the transformed variable against the original variable would not fall on a straight line. Examples of nonlinear transformations are: square root, raising to a power, or logarithm.

• Question-Transforming distance in miles into distance in feet is a linear transformation. True or False?

• True. This is a linear transformation because you multiply the distance in miles by 5,280 feet/mile

Linear vs Nonlinear Score Transformations

• Transforming a variable involves using a mathematical operation to change its measurement scale.

• Linear transformation. A linear transformation preserves linear relationships between variables. Therefore, the correlation between x and y would be unchanged after a linear transformation. Examples of a linear transformation to variable x would be multiplying x by a constant, dividing x by a constant, or adding a constant to x.

• Nonlinear transformation. A nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus, changes the correlation between variables. Examples of a nonlinear transformation of variable x would be taking the square root of x or the reciprocal of x. A logarithmic scale is a scale of measurement that displays the value of a physical quantity using intervals corresponding to orders of magnitude, rather than a standard linear scale
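The claim that linear transformations leave a correlation unchanged, while nonlinear ones alter it, can be checked numerically. The sketch below uses the four Fahrenheit temperatures from the city table; the second variable (`sales`) is invented for illustration and has no connection to the source.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

fahrenheit = [54, 37, 31, 78]   # Houston, Chicago, Minneapolis, Miami
sales = [20, 11, 8, 35]         # HYPOTHETICAL second variable

# Linear transformation: subtract a constant, divide by a constant.
centigrade = [(f - 32) / 1.8 for f in fahrenheit]
# Nonlinear transformation: natural logarithm.
log_fahrenheit = [math.log(f) for f in fahrenheit]

r_fahrenheit = pearson_r(fahrenheit, sales)
r_centigrade = pearson_r(centigrade, sales)
r_log = pearson_r(log_fahrenheit, sales)

print(f"r with Fahrenheit:      {r_fahrenheit:.4f}")
print(f"r with Centigrade:      {r_centigrade:.4f}")
print(f"r with log(Fahrenheit): {r_log:.4f}")
```

The first two values agree (up to floating-point rounding), while the logarithm shifts the correlation.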

Linear vs Nonlinear Score Transformations

• The Richter magnitude scale (often shortened to Richter scale) was developed to assign a single number to quantify the energy that is released during an earthquake.

• The scale is a base-10 logarithmic scale. An earthquake that measures 5.0 on the Richter scale has a shaking amplitude 10 times larger than one that measures 4.0, and corresponds to a 31.6 times larger release of energy

http://www.matter.org.uk/schools/Content/Seismology/richterscale.html
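The two ratios quoted for the Richter scale follow from the logarithmic definition: amplitude grows by a factor of 10 per magnitude unit and released energy by a factor of 10 raised to the power 1.5. A small sketch, with function names chosen here for illustration:

```python
def amplitude_ratio(magnitude_a, magnitude_b):
    """How many times larger the shaking amplitude of quake B is than quake A."""
    return 10 ** (magnitude_b - magnitude_a)

def energy_ratio(magnitude_a, magnitude_b):
    """How many times more energy quake B releases than quake A."""
    return 10 ** (1.5 * (magnitude_b - magnitude_a))

amp = amplitude_ratio(4.0, 5.0)
energy = energy_ratio(4.0, 5.0)
print(f"amplitude ratio: {amp:.1f}, energy ratio: {energy:.1f}")
```

Because the scale is nonlinear, a difference of two magnitude units corresponds to a factor of 100 in amplitude, not 20.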

Linear vs Nonlinear Score Transformations

• Transforming scores from raw scores into transformed scores has two purposes: 1) It gives meaning to the scores and allows some kind of interpretation of the scores, 2) It allows direct comparison of two scores

• Linear transformation-As one side changes, the other changes in equal proportions. Converting the score into percentile ranks is one way of transforming scores. The scale of the percentile rank is a non-linear transformation of that of the raw score, meaning that at different regions on the raw score scale, a gain of 1 point may not correspond to a gain of one unit or the same magnitude on the percentile rank scale

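Percentile Rank Transformation

• PR = (100/N)(cf - f/2), where N is the number of scores, cf is the cumulative frequency up to and including the score, and f is the frequency of the score itself. For a score of 17: PR = (100/150)(64 - 21/2) = 35.7, or about 36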
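The percentile rank formula can be checked against the worked example (N = 150, cf = 64, f = 21). The function name `percentile_rank` is chosen here for illustration.

```python
def percentile_rank(n_total, cumulative_freq, freq):
    """PR = (100 / N) * (cf - f / 2)."""
    return (100 / n_total) * (cumulative_freq - freq / 2)

# Values from the worked example for a raw score of 17.
pr_of_17 = percentile_rank(n_total=150, cumulative_freq=64, freq=21)
print(f"PR = {pr_of_17:.2f}, reported as {round(pr_of_17)}")
```

Subtracting half of f counts half of the people who obtained exactly this score as falling below it.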

Linear Score Transformations

• By itself, a raw score or X value provides very little information about how that particular score compares with other values in the distribution.

• A score of X = 53, for example, may be a relatively low score, or an average score, or an extremely high score depending on the mean and standard deviation for the distribution from which the score was obtained.

• If the raw score is transformed into a z-score, however, the value of the z-score tells exactly where the score is located relative to all the other scores in the distribution. The formula for computing the z-score for any value of X is

$$z = \frac{X - \mu}{\sigma}$$

Linear Score Transformations-Z Scores

• z = 0 is in the center (at the mean), and the extreme tails correspond to z-scores of approximately -2.00 on the left and +2.00 on the right.

• Although more extreme z-score values are possible, most of the distribution is contained between z = -2.00 and z = +2.00.

• M=0, SD=1

z-Scores as a Standardized Distribution

• The advantage of standardizing distributions is that two (or more) different distributions can be made the same.

– For example, one distribution has μ = 100 and σ = 10, and another distribution has μ = 40 and σ = 6.

– When these distribution are transformed to z-scores, both will have μ = 0 and σ = 1.

– A z-score of +1.00 specifies the same location in all z-score distributions.
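A short sketch of standardization, using the two distributions mentioned above (μ = 100, σ = 10 and μ = 40, σ = 6). The raw scores 110 and 46 and the small data set `raw_scores` are hypothetical values picked for the example.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Location of x in standard deviation units from the mean."""
    return (x - mu) / sigma

# One standard deviation above the mean in each distribution.
z_first = z_score(110, mu=100, sigma=10)
z_second = z_score(46, mu=40, sigma=6)

# Standardizing a whole (hypothetical) set of scores.
raw_scores = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(raw_scores)
sigma = pstdev(raw_scores)   # population standard deviation
standardized = [z_score(x, mu, sigma) for x in raw_scores]

print(z_first, z_second)
print(f"mean of z-scores = {mean(standardized):.2f}, SD = {pstdev(standardized):.2f}")
```

Both raw scores map to z = +1.00 although they come from different scales, and the standardized set has mean 0 and standard deviation 1.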

Understanding Research Results Statistical Inference Chp 13

• Inferential statistics allow researchers to assess 1) how their results reflect the larger population (Do the differences observed in the sample means reflect the difference in the population means?)and 2) the likelihood that their results are repeatable (replicable)

• Even in establishing the equivalence between groups (via controlling certain variables and randomization) the difference between the sample means is almost never zero (equivalence is not perfect)

Statistical Inference

• In using statistical inference we begin with a null and a research hypothesis

• Null hypothesis H0 - there is no relationship between two measured phenomena (it is assumed true until evidence indicates otherwise) H0: μ1 = μ2

• Research or alternative hypothesis H1: μ1 ≠ μ2 can be just the negation of the null hypothesis

• If we can determine that the null hypothesis is incorrect then we can accept the alternate (research) hypothesis which is that the independent variable did have an effect on the dependent variable

Statistical significance, probability and sampling distributions

• A significant result is one that has a very low probability of occurring by chance if the population means are equal

• Using probability theory and the normal curve, we can estimate the probability of being wrong

• Probability is the likelihood of the occurrence of some event. The probability required for significance is called the alpha level, with the most common alpha level being .05 (the outcome of the study is considered significant when there is a probability of .05 or less of obtaining such results by chance if the null hypothesis is true; statistical significance is based on probability distributions)

Statistical significance, probability and sampling distributions

• The Sampling distribution is the probability distribution of a given statistic based on a random sample

• The more observations sampled the more likely you are to obtain an accurate estimate of the true population value

• http://onlinestatbook.com/stat_sim/sampling_dist/

Statistical Tests t-Test and F test

• The t-distribution is a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown

• t-Test assumes continuous data (interval or ratio)

http://www.calculushumor.com/3/category/statistics/1.html

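Statistical Tests t-Test and F test

• The t-value equals the difference between the two group means divided by the standard error of that difference (an estimate of how much the difference would vary by chance, based on the variability within the groups and the sample sizes)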

• Degrees of freedom-The number of degrees of freedom is equal to the number of observations minus the number of algebraically independent linear restrictions placed on them

• In an array of four scores 2,3,5,and 6 and knowing the mean (M=4) only the first three scores are free to vary while the last score drawn is not free to vary. Therefore df=3 (df=n-1)

• http://web.mst.edu/~psyworld/texample.htm best

• http://faculty.clintoncc.suny.edu/faculty/michael.gregory/files/shared%20files/Statistics/Examples_t_Test.htm Use # 3


Statistical Tests t-Test and F test

• One-tailed versus two-tailed tests-If the test statistic is always positive (or zero), only the one-tailed test is generally applicable, while if the test statistic can assume positive and negative values, both the one-tailed and two-tailed test are of use. If you are hypothesizing a difference but not predicting its direction, then it will be a two-tailed test

• An example of when one would want to use a two-tailed test is at a candy production/packaging plant. Let's say the candy plant wants to make sure that the number of candies per bag is around 50. The factory is willing to accept between 45 and 55 candies per bag. It would be too costly to have someone check every bag, so the factory selects random samples of the bags, and tests whether the average number of candies exceeds 55 or is less than 45

Example of t-Test

• Hypothesis-people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. Sixteen subjects are recruited in the sleep lab and randomly assigned to one of two groups. In one group subjects sleep for eight hours and in the other group subjects sleep for four hours, and all are given a cognitive test the next day

Group scores

8 hours sleep group (X): 5 7 5 3 5 3 3 9

4 hours sleep group (Y): 8 1 4 6 6 4 1 2

• Mx=5 My=4

• df=(n-1)+(n-1)=14

Critical values of t

α (1 tail) 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005

α (2 tail) 0.1 0.05 0.02 0.01 0.005 0.002 0.001

df

1 6.3138 12.7065 31.8193 63.6551 127.3447 318.4930 636.0450

2 2.9200 4.3026 6.9646 9.9247 14.0887 22.3276 31.5989

3 2.3534 3.1824 4.5407 5.8408 7.4534 10.2145 12.9242

4 2.1319 2.7764 3.7470 4.6041 5.5976 7.1732 8.6103

5 2.0150 2.5706 3.3650 4.0322 4.7734 5.8934 6.8688

6 1.9432 2.4469 3.1426 3.7074 4.3168 5.2076 5.9589

7 1.8946 2.3646 2.9980 3.4995 4.0294 4.7852 5.4079

8 1.8595 2.3060 2.8965 3.3554 3.8325 4.5008 5.0414

9 1.8331 2.2621 2.8214 3.2498 3.6896 4.2969 4.7809

10 1.8124 2.2282 2.7638 3.1693 3.5814 4.1437 4.5869

11 1.7959 2.2010 2.7181 3.1058 3.4966 4.0247 4.4369

12 1.7823 2.1788 2.6810 3.0545 3.4284 3.9296 4.3178

13 1.7709 2.1604 2.6503 3.0123 3.3725 3.8520 4.2208

14 1.7613 2.1448 2.6245 2.9768 3.3257 3.7874 4.1404

15 1.7530 2.1314 2.6025 2.9467 3.2860 3.7328 4.0728

16 1.7459 2.1199 2.5835 2.9208 3.2520 3.6861 4.0150

17 1.7396 2.1098 2.5669 2.8983 3.2224 3.6458 3.9651

18 1.7341 2.1009 2.5524 2.8784 3.1966 3.6105 3.9216

19 1.7291 2.0930 2.5395 2.8609 3.1737 3.5794 3.8834

20 1.7247 2.0860 2.5280 2.8454 3.1534 3.5518 3.8495

21 1.7207 2.0796 2.5176 2.8314 3.1352 3.5272 3.8193
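The sleep example can be worked through as an independent-samples t-test with a pooled variance. The scores and the critical value (1.7613 for df = 14, one-tailed α = .05) are taken from the slides; the pooled-variance form of the test is an assumption about which version of the t-test the example intends.

```python
import math

eight_hours = [5, 7, 5, 3, 5, 3, 3, 9]   # group X
four_hours = [8, 1, 4, 6, 6, 4, 1, 2]    # group Y

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(eight_hours), len(four_hours)
df = (n1 - 1) + (n2 - 1)

# Pooled variance: combined within-group variability divided by df.
pooled_variance = (sum_of_squares(eight_hours) + sum_of_squares(four_hours)) / df
# Standard error of the difference between the two means.
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_value = (mean(eight_hours) - mean(four_hours)) / standard_error

critical_t = 1.7613   # table value: df = 14, one-tailed alpha = .05
significant = t_value > critical_t

print(f"t({df}) = {t_value:.3f}, critical t = {critical_t}, significant: {significant}")
```

Under these assumptions t is about 0.85, which is below the critical value, so the one-point difference between the means would not be judged statistically significant with only eight subjects per group.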

Statistical Tests t-Test and F test

• The F test is an extension of the t test. If a study has only one independent variable with two groups, then F and t are basically identical (F = t²). The F test is also used when there are more than two levels of an independent variable and when there are two or more independent variables in a factorial design. Similar to the t, the larger the F ratio, the more likely it is that the results are significant

• The F-test is designed to test if two population variances are equal. It does this by comparing the ratio of two variances (Analysis of Variance-ANOVA). Each Mean Square = SS/df

• http://www.chem.utoronto.ca/coursenotes/analsci/StatsTutorial/ftest.html
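A sketch of the F ratio as a ratio of two mean squares (each Mean Square = SS/df), applied to the two sleep groups from the t-test example. Treating this as a one-way ANOVA with two groups is an assumption made here to show how F relates to t.

```python
groups = [
    [5, 7, 5, 3, 5, 3, 3, 9],   # 8 hours sleep
    [8, 1, 4, 6, 6, 4, 1, 2],   # 4 hours sleep
]

def mean(values):
    return sum(values) / len(values)

all_scores = [score for group in groups for score in group]
grand_mean = mean(all_scores)

# Between-groups sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: spread of scores around their own group mean.
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

With two groups the result, about 0.72, equals the square of the pooled-variance t value for the same data (about 0.85), which is the sense in which F and t are basically identical in that case.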

Zebras Taking Flight

• A z-test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples, whether you know the population standard deviation or not

• A t-test is used for testing the mean of one population against a standard or comparing the means of two populations

• An F-test is used to compare two populations’ variances. The samples can be any size. The F-test is designed to test if two population variances are equal, and it plays an important role in the analysis of variance (ANOVA)

Chi-square test

• The Chi-square test is intended to test how likely it is that an observed distribution is due to chance

• The "t" test and the F test are called parametric tests. They assume certain conditions about the parameters of the population from which the samples are drawn (they assume interval or ratio data).

• Parametric and nonparametric statistical procedures test hypotheses involving different assumptions

• Parametric statistics test hypotheses based on the assumption that the samples come from populations that are normally distributed. Nonparametric tests make fewer and less stringent assumptions than their parametric counterparts. Nonparametric tests usually result in loss of efficiency

Chi-Square example

• Suppose that the ratio of male to female students in the Science Faculty is exactly 1:1, but in the Pharmacology Honors class over the past ten years there have been 80 females and 40 males. Is this a significant departure from expectation? We must compare our calculated X2 (chi-squared) value with the critical value in the X2 table with n-1 degrees of freedom (where n is the number of categories, i.e. 2 in our case: males and females). If our calculated value of X2 exceeds the critical value, then we have a significant difference

                        Female    Male     Total
Observed numbers (O)    80        40       120
Expected numbers (E)    60        60       120
O - E                   20        -20      0
(O-E)2                  400       400
(O-E)2 / E              6.67      6.67     13.34 = X2

Critical values of X2

Degrees of Freedom

Probability, p

0.99 0.95 0.05 0.01 0.001

1 0.000 0.004 3.84 6.64 10.83

2 0.020 0.103 5.99 9.21 13.82

3 0.115 0.352 7.82 11.35 16.27

4 0.297 0.711 9.49 13.28 18.47

5 0.554 1.145 11.07 15.09 20.52

6 0.872 1.635 12.59 16.81 22.46

7 1.239 2.167 14.07 18.48 24.32

8 1.646 2.733 15.51 20.09 26.13

9 2.088 3.325 16.92 21.67 27.88

10 2.558 3.940 18.31 23.21 29.59

11 3.05 4.58 19.68 24.73 31.26

12 3.57 5.23 21.03 26.22 32.91

13 4.11 5.89 22.36 27.69 34.53

14 4.66 6.57 23.69 29.14 36.12

15 5.23 7.26 25.00 30.58 37.70

16 5.81 7.96 26.30 32.00 39.25
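The chi-square calculation for the Pharmacology Honors example can be reproduced in a few lines. The observed and expected counts and the critical value (3.84 for 1 degree of freedom at p = .05) come from the tables above.

```python
observed = {"female": 80, "male": 40}
total = sum(observed.values())

# A 1:1 ratio means each category is expected to hold half of the total.
expected = {category: total / len(observed) for category in observed}

# Chi-square: sum over categories of (O - E)^2 / E.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)

degrees_of_freedom = len(observed) - 1
critical_value = 3.84   # table value: df = 1, p = .05
significant = chi_square > critical_value

print(f"X2({degrees_of_freedom}) = {chi_square:.2f}, significant: {significant}")
```

The exact value is 13.33; the table above shows 13.34 because each cell was rounded to 6.67 before adding. Either way it exceeds even the 10.83 critical value for p = .001.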

Statistical Significance

• The goal of a test is to allow you to make a decision about your results. Significance levels show how likely it is that a result at least as extreme as yours would occur by chance alone if there were no real effect. The most common level, used to mean a result is good enough to be believed, is .05 (the traditional level chosen). This means that a result this extreme would arise by chance no more than 5% of the time when the null hypothesis is true; it does not mean that the finding has a 95% chance of being true. When you have a large sample size, very small differences will be detected as significant.

• The more analyses you perform on a data set, the more results will meet "by chance" the conventional significance level. For example, if you calculate many correlations between different variables then you should expect to find by chance that one in every 20 correlation coefficients is significant at the p < .05 level, even if the values of the variables were totally random and those variables do not correlate in the population
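The "one in every 20" expectation can be made concrete. The sketch below assumes the tests are independent of one another, which real correlations computed on the same data set often are not, so the second figure is only a rough guide.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha

def chance_of_at_least_one(n_tests, alpha=0.05):
    """Probability of one or more false positives, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests

expected_in_20 = expected_false_positives(20)
p_any_in_20 = chance_of_at_least_one(20)

print(f"expected false positives in 20 tests: {expected_in_20:.1f}")
print(f"chance of at least one in 20 tests:   {p_any_in_20:.2f}")
```

So with 20 tests on purely random data, one spurious "significant" result is expected on average, and under the independence assumption there is roughly a 64% chance of seeing at least one.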

Type I and Type II Errors

• The decision to reject the null hypothesis is based on probabilities rather than certainties. In reviewing the decision matrix below there are two possible decisions (reject or accept the null hypothesis) and two possible truths (the null hypothesis is true or false). There are also two correct decisions (correctly accepting the H0 when it is true and correctly rejecting the H0 when it is false) and two errors

• Type I error-we reject the H0 when it is true and Type II error we accept the H0 when it is false

• Decision matrix

Type I and Type II Errors

• A test's probability of making a type I error is denoted by α. A test's probability of making a type II error is denoted by β

• Type I errors occur when we obtain a large value (t or F) by chance and we incorrectly decide that the independent variable had an effect. When the significance level set to reject the H0 is .05, the probability of a Type I error is .05 (α). The rate of the type II error is denoted by the Greek letter β (beta) and is related to the power of a test (which equals 1−β). The power of a statistical test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false (i.e. the probability of not committing a Type II error).

-blood tests for a disease will falsely detect the disease in some proportion of people who don't have it, and will fail to detect the disease in some proportion of people who do have it

Type I and II errors

• If a jury in a criminal trial must decide guilt or innocence, the same types of error apply (pg 274-5). H0 = person is innocent

                               H0 true (innocent)        H0 false (guilty)
Reject H0 (find guilty)        Type I error (α)          Correct decision (1-β)
Accept H0 (find innocent)      Correct decision (1-α)    Type II error (β)

• A Type I error is rejecting the null when it is true. We may obtain a large t or F value by chance. The probability of a Type I error is determined by the choice of significance level (α). With α = .05, we may make this mistake 5 out of 100 times (1 out of 20) when the null is true. We can change α to .01 to lessen this error. A Type II error occurs when we accept the null but the null is incorrect. The probability of a Type II error is β. Lowering the significance level (e.g. to .001) makes it more difficult to reject the null hypothesis, which decreases the chance of a Type I error but increases the chance of a Type II error. (Use the decision grid for marriage: which error is worse?)

Choosing a Significance Level

• Researchers traditionally use either a .05 or a .01 significance level. For a juror, which type of error is more serious, Type I or Type II? For a physician, Type I or Type II?

• A Type I error is a false positive; a Type II error is a false negative

Juror (H0 = not guilty)

                    H0 true (not guilty)                 H0 false (guilty)
Found guilty        Found guilty (incorrect): Type I     Found guilty (correct)
Found innocent      Found innocent (correct)             Found innocent (incorrect): Type II

Physician (H0 = no operation needed)

                    H0 true (no operation needed)        H0 false (operation needed)
Operate             Operate incorrectly: Type I error    Operate correctly
Don’t operate       Don’t operate correctly              Don’t operate incorrectly: Type II error

Significance

• Research is designed to demonstrate that there is a relationship between variables, not to say that the variables are unrelated (i.e. accepting the null hypothesis)

• A study may come up with nonsignificant results when there is an effect (Type II error) due to inadequate explanation to subjects, a weak manipulation, or a measure of the dependent variable that is not reliable, etc. (see threats to validity). A meaningful result is more likely to be overlooked when the significance level is very low (.001), which is a Type II error (pg 278)

• Type II errors may result from sample sizes that are too small and from small effect sizes. However, while nonsignificant results do not necessarily indicate that the null hypothesis is correct, significant results do not necessarily indicate a meaningful relationship. As your sample size increases, so does the power of your test

Long-term psychosocial consequences of false-positive screening mammography Brodersen J & Siersma VD, Ann Fam Med. 2013 Mar-Apr;11(2):106-15

• PURPOSE: Cancer screening programs have the potential of intended beneficial effects, but they also inevitably have unintended harmful effects. In the case of screening mammography, the most frequent harm is a false-positive result. Prior efforts to measure their psychosocial consequences have been limited by short-term follow-up, the use of generic survey instruments, and the lack of a relevant benchmark-women with breast cancer.

• METHODS: In this cohort study with a 3-year follow-up, we recruited 454 women with abnormal findings in screening mammography over a 1-year period. For each woman with an abnormal finding on a screening mammogram (false and true positives), we recruited another 2 women with normal screening results who were screened the same day at the same clinic. These participants were asked to complete the Consequences of Screening in Breast Cancer-a validated questionnaire encompassing 12 psychosocial outcomes-at baseline, 1, 6, 18, and 36 months.

• RESULTS: Six months after final diagnosis, women with false-positive findings reported changes in existential values and inner calmness as great as those reported by women with a diagnosis of breast cancer (Δ = 1.15; P = .015; and Δ = 0.13; P = .423, respectively). Three years after being declared free of cancer, women with false-positive results consistently reported greater negative psychosocial consequences compared with women who had normal findings in all 12 psychosocial outcomes (Δ >0 for 12 of 12 outcomes; P <.01 for 4 of 12 outcomes)

• CONCLUSION: False-positive findings on screening mammography causes long-term psychosocial harm: 3 years after a false-positive finding, women experience psychosocial consequences that range between those experienced by women with a normal mammogram and those with a diagnosis of breast cancer.

Choosing a sample size: Power analysis

• We can select a sample size on the basis of the desired probability of correctly rejecting the null hypothesis. This probability is called the power of the statistical test. Power = 1 - p(Type II error)

• Power refers to the probability that your test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that you will reject the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted that power should be .8 or greater; that is, you should have an 80% or greater chance of finding a statistically significant difference when there is one

• http://meera.snre.umich.edu/plan-an-evaluation/related-topics/power-analysis-statistical-significance-effect-size

• http://www.surveysystem.com/sscalc.htm#one
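The link between power and sample size can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it uses the common normal approximation for a two-sided comparison of two group means, the function name n_per_group is made up for this example, and the effect size of 0.5 (a standardized mean difference) is an assumed value, not one taken from the slides.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation).

    effect_size is the standardized mean difference we hope to detect
    (an assumed input), alpha is the two-sided significance level, and
    power is the desired probability of detecting a true difference.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Assumed example: medium effect, alpha = .05, power = .80
n_medium = n_per_group(0.5)
# A larger assumed effect needs fewer participants for the same power
n_large = n_per_group(0.8)
print(n_medium, n_large)
```

Dedicated power-analysis software uses the t distribution and will usually give a slightly larger number, so treat this as an estimate of the order of magnitude rather than a final answer.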

Replications

• Scientists do not attach too much importance to the results of a single study. Better understanding comes from integrating the results of numerous studies of the same variable(s) pg280

• Replicating Milgram: Would People Still Obey Today? Jerry M. Burger, Santa Clara University

• Seventy adults participated in a replication of Milgram’s Experiment 5 up to the point at which they first heard the learner’s verbal protest (150 volts). Because 79% of Milgram’s participants who went past this point continued to the end of the shock generator’s range, reasonable estimates could be made about what the present participants would have done if allowed to continue.

• Obedience rates in the 2006 replication were only slightly lower than those Milgram found 45 years earlier. Contrary to expectation, participants who saw a confederate refuse the experimenter’s instructions obeyed as often as those who saw no model. Men and women did not differ in their rates of obedience, but there was some evidence that individual differences in empathic concern and desire for control affected participants’ responses.

Replicating Milgram

• 79% of the people who continued past 150 volts (26 of 33) went all the way to the end of the shock generator’s range. In short, the 150-volt switch is something of a point of no return. Nearly four out of five participants who followed the experimenter’s instructions at this point continued up the shock generator’s range all the way to 450 volts. This observation suggests a solution to the ethical concerns about replicating Milgram’s research. Knowing how people respond up to and including the 150-volt point in the procedure allows one to make a reasonable estimate of what they would do if allowed to continue to the end. Stopping the study within seconds after participants decide what to do at this juncture would also avoid exposing them to the intense stress Milgram’s participants often experienced in the subsequent parts of the procedure.

Replicating Milgram

• Burger screened out any potential subjects who had taken more than two psychology courses in college or who indicated familiarity with Milgram’s research. A clinical psychologist also interviewed potential subjects and eliminated anyone who might have a negative reaction to the study procedure.

• In Burger’s study, participants were told at least three times that they could withdraw from the study at any time and still receive the $50 payment. Also, these participants were given a lower-voltage sample shock to show the generator was real – 15 volts, as compared to 45 volts administered by Milgram.

• Several of the psychologists writing in the same issue of American Psychologist questioned whether Burger’s study is truly comparable to Milgram’s, although they acknowledge its usefulness.

Computer Analysis of Data

• Most analysis is carried out via computer programs such as SPSS, SAS, SYSTAT and others, and the general procedures are very similar in all of the programs

Selecting the appropriate Statistical Test

• Parametric statistical procedures rely on assumptions about the shape of the distribution (i.e., assume a normal distribution) in the underlying population and about the form or parameters (i.e., means and standard deviations) of the assumed distribution. Nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn

• http://www.ats.ucla.edu/stat/mult_pkg/whatstat/choosestat.html

Parametric vs. Nonparametric tests

• Parametric and nonparametric are two broad classifications of statistical procedures.

• Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.

• Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution. If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions. If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead.

• The parametric assumption of normality is particularly worrisome for small sample sizes (n < 30). Nonparametric tests are often a good option for these data.

• Nonparametric procedures generally have less power for the same sample size than the corresponding parametric procedure if the data truly are normal. Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.

Review of Scales of Measurement

• A categorical variable, also called a nominal variable, is for mutually exclusive, but not ordered, categories. For example, your study might compare five different genotypes. You can code the five genotypes with numbers if you want, but the order is arbitrary and any calculations (for example, computing an average) would be meaningless.

• An ordinal variable is one where the order matters but not the difference between values. For example, you might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a score of 5, and that is more than a score of 3. But the difference between the 7 and the 5 may not be the same as that between 5 and 3. The values simply express an order.

• An interval variable is a measurement where the difference between two values is meaningful. The difference between a temperature of 100 degrees and 90 degrees is the same difference as between 90 degrees and 80 degrees.

• A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable. Variables like height, weight, and enzyme activity are ratio variables. Temperature, expressed in F or C, is not a ratio variable. A temperature of 0.0 on either of those scales does not mean 'no temperature'. A temperature of 100 degrees C is not twice as hot as 50 degrees C, because temperature C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.
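The temperature and pH examples can be checked with a little arithmetic. The sketch below is an illustration only: it converts Celsius to Kelvin (a scale with a true zero) and uses the standard definition of pH as the negative base-10 logarithm of hydrogen ion concentration. The function names are made up for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a ratio scale where 0 means no thermal energy."""
    return celsius + 273.15


def hydrogen_ion_ratio(ph_low, ph_high):
    """How many times more hydrogen ions the lower-pH solution has.

    pH is a base-10 log scale, so each pH unit is a factor of 10.
    """
    return 10 ** (ph_high - ph_low)


# 100 C vs 50 C: the ratio of the raw Celsius numbers is 2,
# but the ratio on a true ratio scale (Kelvin) is far smaller.
temp_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

# pH 3 vs pH 6: the raw numbers suggest "half", the chemistry says otherwise.
acid_ratio = hydrogen_ion_ratio(3, 6)

print(round(temp_ratio, 3), acid_ratio)
```

In both cases the ratio of the raw scale values is misleading, which is why ratios should only be computed on ratio variables.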

Nonparametric vs Parametric Tests

• Nonparametric statistical tests are used instead of the parametric tests we have considered thus far (e.g. t-test; F-test), when:

• The data are nominal or ordinal (rather than interval or ratio).

• The data are not normally distributed, or have heterogeneous variance (despite being interval or ratio).

• The following are some common nonparametric tests:

• Chi square

• 1. used to analyze nominal data

• 2. compares observed frequencies to frequencies that would be expected under the null hypothesis

• Mann-Whitney U

• 1. compares two independent groups on a DV measured with rank-ordered (ordinal) data

• 2. nonparametric equivalent to a t-test

• Wilcoxon matched-pairs test

• 1. used to compare two correlated groups on a DV measured with rank-ordered (ordinal) data

• 2. nonparametric equivalent to a t-test for correlated samples

• Kruskal-Wallis test

• 1. used to compare two or more independent groups on a DV measured with rank-ordered (ordinal) data

• 2. nonparametric alternative to one-way ANOVA
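To make two of the tests listed above more concrete, here is a minimal sketch of how the chi-square statistic and the Mann-Whitney U statistic are computed. All numbers are hypothetical and chosen only for illustration, the function names are made up, and the sketch stops at the test statistic; in practice a statistics package would also supply the p value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs in which a outranks b.

    Tied pairs count as half a win, the usual convention.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical nominal data: 60 people choosing among three options,
# with equal frequencies expected under the null hypothesis.
chi2 = chi_square_statistic([30, 20, 10], [20, 20, 20])

# Hypothetical ordinal data: pain ratings (1 to 10) in two independent groups.
treatment = [7, 8, 6, 9]
control = [3, 5, 4, 6]
u_treatment = mann_whitney_u(treatment, control)
u_control = mann_whitney_u(control, treatment)

print(chi2, u_treatment, u_control)
```

The two U values always add up to the number of pairs (here 4 x 4 = 16); the smaller of the two is the one usually compared against a table of critical values.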

Generalizing Results Chp14

External validity is the extent to which findings may be generalized.

Even though a researcher randomly assigns participants to experimental conditions, those subjects are rarely randomly selected from the general population; subjects are selected because they are available (e.g. college freshmen and sophomores who must fulfill course requirements).

Such subjects represent a very restricted population. Because they are often older adolescents, they usually have a developing sense of identity, social and political attitudes that are still developing, and a high need for peer approval.

These student subjects are rather homogeneous as a group but different from older adults. What we know about general principles of psychological functioning may therefore be limited to a select and unusual group.

Although the use of rats is convenient, many research findings from rats have been applied to humans, particularly in the fields of memory, sexuality, drugs, brain function, etc.

Generalizing Research results

• While college students represent a ready group of volunteers, researchers using other populations are even more dependent on volunteers than university researchers. Volunteers may be a unique population

• However college student populations are increasingly diverse and representative of society. Studies with certain college populations are replicated at other colleges using different mixes of students and many studies are later replicated with other populations

• Rosenthal and Rosnow (1975) stated that volunteers tend to be more highly educated, higher SES and more social

• Different kinds of people volunteer for different kinds of experiments. Titles of experiments may change who volunteers (e.g. “problem solving” vs. “interaction in small groups”) pg 289

• Internet surveys also solicit volunteers, who tend to be individuals who use the internet more frequently. Higher internet use is associated with living in an urban area, being younger and college educated, and having a higher income

Gender and subgroups

• A study published in July 2006 in Genome Research compared the levels of gene expression in male and female mice and found that 72 percent of active genes in the liver, 68 percent of those in fat, 55.4 percent of the ones in muscle, and 13.6 percent of genes in the brain were expressed in different amounts in the sexes.

• In an analysis of 163 new drug applications submitted to the Food and Drug Administration between 1995 and 2000 that included a sex analysis, drug concentrations in blood and tissues from men and women varied by as much as 40 percent for 11 of the drugs. However, the applications included no sex-based dosing recommendations. Source: Melinda Wenner Moyer, Slate Magazine

Gender and subgroups

Nature 465, 665 (10 June 2010), editorial

• Admittedly, there can be legitimate reasons to skew the ratios. For instance, researchers may use male models to minimize the variability due to the estrous cycle, or because males allow them to study the Y chromosome as well as the X. And in studies of conditions such as heart disease, from which female mice are thought to be somewhat protected by their hormones, scientists may choose to concentrate on male mice to maximize the outcome under study

• However justifiable these imbalances may be on a case-by-case basis, their cumulative effect is pernicious: medicine as it is currently applied to women is less evidence-based than that being applied to men. Moreover, hormones made by the ovaries are known to influence symptoms in human diseases ranging from multiple sclerosis to epilepsy. Apart from a few large, all-female projects, such as the Women's Health Study on how aspirin and vitamin E affect cardiovascular disease and cancer, women subjects remain seriously under-represented in clinical cohorts

Gender and subgroups

• Journals can insist that authors document the sex of animals in published papers; the Nature journals are at present considering whether to require the inclusion of such information. Funding agencies should demand that researchers justify sex inequities in grant proposals and, other factors being equal, should favor studies that are more equitable.

• Drug regulators should ensure that physicians and the public alike are aware of sex-based differences in drug reactions and dosages. And medical-school accrediting bodies should impress on their member institutions the importance of training twenty-first-century physicians in how disease symptoms and drug responses can differ by sex.

Hypothetical study on aggression and crowding for males and females pg291

[Figure: four panels (A, B, C, D), each plotting aggression (low to high) against crowding (low to high), with separate lines for males and females]

• Figure A: males and females are essentially equal; no interaction

• Figure B: main effect for crowding and also for gender

• Figure C: interaction; crowding affects males but has no effect for females

• Figure D: interaction; positive relationship between crowding and aggression for males, negative relationship for females

• In C and D, results for males cannot be generalized to females
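One way to see what "interaction" means in these panels is to compare the effect of crowding for males with the effect of crowding for females. The sketch below does this with made-up cell means; the numbers are hypothetical and only mimic the patterns described for Figures A and D, and the function names are invented for this example.

```python
def crowding_effect(means, sex):
    """Change in mean aggression from low to high crowding for one sex."""
    return means[(sex, "high")] - means[(sex, "low")]


def interaction_contrast(means):
    """Difference between the male and female crowding effects.

    A value near zero means crowding affects both sexes alike (no interaction).
    """
    return crowding_effect(means, "male") - crowding_effect(means, "female")


# Hypothetical cell means resembling Figure A: identical pattern for both sexes
panel_a = {("male", "low"): 3, ("male", "high"): 7,
           ("female", "low"): 3, ("female", "high"): 7}

# Hypothetical cell means resembling Figure D: opposite patterns
panel_d = {("male", "low"): 3, ("male", "high"): 7,
           ("female", "low"): 7, ("female", "high"): 3}

contrast_a = interaction_contrast(panel_a)
contrast_d = interaction_contrast(panel_d)
print(contrast_a, contrast_d)
```

When the contrast is far from zero, as in the Figure D pattern, a finding obtained with one sex says little about the other, which is the generalization problem these panels illustrate.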

Cultural Considerations

• Arnett et al (2008) state that psychology is built on the study of WEIRD (Western, Educated, Industrialized, Rich, Democratic) people pg293

• Traditional theories of self concept are built upon western concepts of the self as separate or individualistic while in some other cultures self-esteem is derived more from the relationships to others

• “Asian-Americans are more likely to benefit from support that does not involve the sort of intense disclosure of personal stressful events and feeling that is the hallmark of support in many European American groups”pg 293

• However many studies find similarities across cultures

Generalizing from Laboratory Settings

• Laboratory research has the advantage of studying the effect of an independent variable under highly controlled conditions, but does the ‘artificiality’ of the laboratory limit its external validity?

• Anderson, Lindsay and Bushman (1999) compared 38 pairs of studies for which there were similar laboratory and field studies on areas including aggression, helping, memory and depression, and found that the effect size of the independent variable on the dependent variable was very similar in the two types of studies (which raises confidence in the external validity of the studies) pg296

Replications

• Replications are a way of compensating for limitations in generalizing from any single study

• An exact replication is an attempt to precisely follow the procedures of a study to determine if the same results will be obtained. An exact replication may be carried out when a researcher is attempting to build on a previous study and wants to be confident in the external validity of the study before proceeding with his/her own follow-up

• Review the findings of the “Mozart Effect”, in which students who listened to 10 minutes of a Mozart sonata showed higher performance on a spatial reasoning task (S-B-IQ scale) (Rauscher, Shaw and Ky, 1993). Many later studies failed to replicate the original result. Alternative explanations are that the effect is limited to music that also increases arousal, that the original study made a Type I error (incorrect rejection of the null hypothesis), or that the results occur only under special conditions pg297

Conceptual Replications

• In a conceptual replication researchers attempt to understand the relationships between variables

• One way this is accomplished is to redefine the operational definition of a variable. While the original study defined exposure to music as 10 minutes of the Mozart Sonata for Two Pianos in D major, a new operational definition may include a different selection of Mozart or a different composer

• When conceptual replications produce similar results, this increases our confidence in the external validity of the original findings and demonstrates that the relationship between the theoretical variables holds

Generalizations Literature Reviews and Meta-Analyses

• You can evaluate the external validity of a study by conducting a literature review, which summarizes and evaluates a particular research area. The literature review synthesizes and provides information which 1) summarizes what has been found to date, 2) tells the reader which findings are strongly supported or not in the literature, 3) points out inconsistencies in the findings, and 4) discusses future directions for this area of research

• Meta-analysis gives a thorough summary of several studies that have been done on the same topic, and provides the reader with extensive information on whether an effect exists and what size that effect has. The analysis combines the results of a number of studies (e.g. by use of effect size). Traditional reviews do not usually calculate effect sizes or attempt to integrate information from different experimental designs used across the studies cited; they take a more qualitative approach, while a meta-analysis takes a more quantitative approach pg299
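As a rough illustration of how a meta-analysis combines effect sizes, the sketch below averages the effect sizes of several studies, giving larger studies more weight. The three studies and their numbers are entirely hypothetical, the function name is made up, and weighting by sample size is a simplification assumed here; published meta-analyses more often weight each study by the inverse of its variance.

```python
def weighted_mean_effect(studies):
    """Sample-size-weighted average of effect sizes.

    studies is a list of (effect_size, sample_size) pairs.
    """
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n


# Hypothetical studies of the same effect: (effect size, number of participants)
studies = [(0.2, 50), (0.5, 100), (0.8, 50)]

combined_effect = weighted_mean_effect(studies)
print(combined_effect)
```

The combined estimate draws on all 200 hypothetical participants, which is why a meta-analysis can say more about whether an effect exists, and how large it is, than any single study in the set.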

Generalization and Variation

• Variations in the service quality of medical practices. Ly DP & Glied SA, Am J Manag Care 2013 Nov 1;19(11)

• There was substantial variation in the service quality of physician visits across the country. For example, in 2003, the average wait time to see a doctor was 16 minutes in Milwaukee but more than 41 minutes in Miami; the average appointment lag for a sick visit in 2003 was 1.2 days in west-central Alabama but almost 6 days in Northwestern Washington. Service quality was not associated with the primary care physician-to-population ratio and had varying associations with the organization of practices.

• CONCLUSIONS: Cross-site variation in service quality of care in primary care has been large, persistent, and associated with the organization of practices. Areas with higher primary care physician-to-population ratios had longer, not shorter, appointment lags.

Regional Differences in Prescribing Quality Among Elder Veterans and the Impact of Rural Residence Brian C. Lund Journal of Rural Health 29 (2013) 172–179

• Regional variation often reflects discrepancies in the implementation of best practices, and comparisons of high versus low performing sites may identify mechanisms for improving performance. A recent analysis of national Medicare data revealed significant regional variation, with the highest concentration of potentially inappropriate prescribing found in the Southern United States and the lowest rates in the Northeast and upper Midwest. Similar geographic distributions of prescribing quality have been previously reported among older adults in both outpatient and inpatient settings. The most direct interpretation of these findings is differences in provider-level characteristics, where different approaches to pharmacotherapy lead to patients in low performing regions being exposed to riskier medication regimens. However, prescribing is also influenced by system-level factors such as differences in health system organization, access to prescription drug benefits, and higher copayments for newer (and potentially safer) medications.

"Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings. Elizabeth Pappadopulos, et al. Schizophrenia Bulletin, Vol. 28, No. 1, 2002

• The widespread use of atypical antipsychotics for youth treated in inpatient settings has been the focus of increasing attention, concern, and controversy. Atypical antipsychotic medications have supplanted traditional neuroleptics as first line treatments for schizophrenia and other psychotic disorders in adult populations. A similar trend has also been observed in the treatment of child and adolescent psychiatric patients, although data on the safety and efficacy of atypical agents in youth are scarce

• Among child and adolescent inpatients, atypical antipsychotics are mainly prescribed for aggression rather than for psychosis. Current debates revolve around whether these agents are appropriately monitored and managed. In an effort to address these concerns, a survey was developed and administered to physicians at four facilities and to a group of 43 expert clinicians and researchers.

"Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings

• Taken together, these studies show that as many as 98 percent of children and adolescents in psychiatric hospitals are treated with psychotropic medications during their inpatient stay and approximately 45 percent to 85 percent of these patients receive multiple medications simultaneously. Antipsychotics are the most commonly prescribed agents across most inpatient settings for the treatment of aggression

• While overall rates of psychotropic prescribing (ranging from 68% to 79% of patients) did not differ across inpatient units, preferences for particular classes of medications varied by facility. In addition, a higher percentage of patients were given antipsychotics in the county-university hospital (74%) than in the State hospital (57%) or the private hospital (35%). While these trends may be due to differences in the patient populations treated at each facility, Kaplan and Busner note that the use of antipsychotics for nonpsychotic disorders was statistically equivalent across settings.

"Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings

• Atypical antipsychotics represent a major advance in the treatment of schizophrenia and psychosis among adults because of their superior efficacy and side effect profile in comparison to conventional antipsychotics. However, because these benefits have not been reliably established in children (Sikich 2001), antipsychotic prescribing practices for child and adolescent psychiatric inpatients have largely developed from clinical experience rather than from scientific evidence.

• A recent literature review shows that published data on treatments for aggression are primarily from open studies and case reports. Much of the research conducted involves aggressive youth with compromised intelligence and is not easily applied to the general population of youngsters with aggressive behavior problems.

"Real World" Atypical Antipsychotic Prescribing Practices in Public Child and Adolescent Inpatient Settings

• Concerns about side effects, such as weight gain, elevated prolactin levels, and abnormal electrocardiograms, especially in children, have yet to be resolved by research. In the face of limited data from clinical trials, intensive study is needed on factors that influence physicians' antipsychotic prescribing preferences and that result in unnecessary treatment variability.

• Taken together, the audit of patient charts reveals much-needed real-world information about the administration of antipsychotics and other psychotropic medications in this set of public inpatient facilities for children and adolescents. The children and adolescents treated in these settings represent a particularly severe and comorbid patient population. Despite the fact that inpatient youth diagnosed with psychosis accounted for only a fraction (20%) of the population, antipsychotics were commonly prescribed in this sample and were often used in combination with other agents.

"Real World" Atypical Antipsychotic prescribing practices

• Antipsychotics are administered to children and adolescents in public inpatient settings in high proportions for complex comorbid conditions involving aggression. Ironically, this real-world patient population is excluded from clinical research, leaving clinicians to rely on clinical experience rather than empirical evidence. Data reveal that there are great disparities in the use of antipsychotics across facilities, and this may be due in part to the lack of available data to guide these practices.

• Several findings regarding the administration of psychotropic medications surprised us and raised important areas of concern. The number and proportion of medications on admission were very similar to medication regimens at discharge. One would expect that after an average stay of more than 3 months, more adjustments would be made to the medication regimen. The rationales for this lack of change in treatment regimen are unclear, a situation that makes it difficult to determine whether and how changes in medication might affect patient outcomes.

Prescription practices

• The administration of two or more psychotropic medications (polypharmacy) is also an area of concern. In our chart review, because the number of medications given to patients tended not to change over the course of treatment, it is possible that polypharmacy in these facilities represents treatment inertia. In other words, physicians at these facilities tend to sustain, rather than initiate, the use of polypharmacy. Patients' charts did not provide enough information regarding rationale for physicians' medication strategies, and given that cases are often seen by a number of physicians, there is little evidence of continuity in medication use. For example, one study found that nearly half of patients given risperidone in a State hospital were taken off their medication within 15 days after discharge by their outpatient physician

• A clear rationale for medication strategy was often missing from medication progress notes. This is particularly important given the great concern over antipsychotics' side effects, a concern that was repeatedly raised during focus groups. In these ways, physicians' actual practices did not match experts' agreed-upon best practices. Many current practices revealed in the chart review did not match those recommended by clinicians and researchers.

Prenatal exposure to ultrasound waves impacts neuronal migration in mice. Ang et al., PNAS, August 22, 2006, vol. 103, no. 34

• Neurons of the cerebral neocortex in mammals, including humans, are generated during fetal life in the proliferative zones and then migrate to their final destinations by following an inside-to-outside sequence. The present study examined the effect of ultrasound waves (USW) on neuronal position within the embryonic cerebral cortex in mice. We used a single BrdU (bromodeoxyuridine, commonly used in the detection of proliferating cells in living tissues) injection to label neurons generated at embryonic day 16 and destined for the superficial cortical layers.

• Our analysis of over 335 animals reveals that, when exposed to USW for a total of 30 min or longer during the period of their migration, a small but statistically significant number of neurons fail to acquire their proper position and remain scattered within inappropriate cortical layers and/or in the subjacent white matter. The magnitude of dispersion of labeled neurons was variable but systematically increased with duration of exposure to USW. These results call for a further investigation in larger and slower-developing brains of non-human primates and continued scrutiny of unnecessarily long prenatal ultrasound exposure.

Prenatal Exposure to Ultrasound

Schematic representation of the progression of neuronal migration to the superficial cortical layers in the normal mouse. (A–D) Most cells labeled with BrdU at E16 arrive in the cortex by E18, and, by P1, those cells become surpassed by subsequently generated neurons. Eventually, these cells will settle predominantly in layers 2 and 3 of the cerebrum. (E–H) Model of the USW effect. When cells generated at E16 are exposed to USW, they slow down on E17, and some remain in the white matter or are stacked in the deeper cortical layers.

Effect Size

• Effect size refers to the strength of association between variables. The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between two variables (pg 252, Cozby & Bates)

• The concept of effect size already appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program. These are both examples of "absolute effect sizes", meaning that they convey the average difference between two groups without any discussion of the variability within the groups. For example, if the weight loss program results in an average loss of 30 pounds, it is possible that every participant loses exactly 30 pounds, or half the participants lose 60 pounds and half lose no weight at all
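The weight loss example can be reproduced in a few lines to show why an absolute effect size says nothing about variability. The two groups below are the two scenarios described in the bullet above, each with ten hypothetical participants; the variable names are made up for this sketch.

```python
from statistics import mean, pstdev

# Scenario 1: every participant loses exactly 30 pounds
uniform_loss = [30] * 10

# Scenario 2: half the participants lose 60 pounds, half lose nothing
split_loss = [60] * 5 + [0] * 5

mean_uniform = mean(uniform_loss)
mean_split = mean(split_loss)

# Population standard deviation: the spread of individual results
sd_uniform = pstdev(uniform_loss)
sd_split = pstdev(split_loss)

print(mean_uniform, sd_uniform)
print(mean_split, sd_split)
```

Both programs could advertise the same average loss, yet the experience of individual participants is very different. Standardized effect size measures take this spread into account by expressing the average difference relative to the variability within groups.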

Socioeconomic Inequality in the Prevalence of Autism Spectrum Disorder. Durkin MS et al. PLoS One. 2010 Jul 12;5(7)

• The prevalence of ASD increased in a dose-response manner with increasing SES, a pattern seen for all three SES indicators used to define SES categories

Figure: Prevalence per 1,000 of ASD by three SES indicators based on census block group of residence. Thin bars indicate 95% confidence intervals. Within each SES indicator, both the trend test and χ² tests were significant at p < 0.0001. MHI refers to median household income.

• The main results of this study were consistent with the only study larger than this to examine the association between ASD risk and an indicator of SES. That study, published in 2002 by Croen and colleagues, looked at more than 5000 children with autism receiving services coordinated by the California Department of Developmental Services and found a stepwise increase in autism risk with increasing maternal education

• Epidemiologists long have suspected that associations between autism and SES are a result of ascertainment bias, on the assumption that as parental education and wealth increase, the chance that a child with autism will receive an accurate diagnosis also increases

• Paranormal phenomena Signal to Noise ratio

Path Analysis

• Path analysis is a straightforward extension of multiple regression. Its aim is to provide estimates of the magnitude and significance of hypothesized causal connections between sets of variables. This is best explained by considering a path diagram.

• To construct a path diagram we simply write the names of the variables and draw an arrow from each variable to any other variable we believe that it affects. We can distinguish between input and output path diagrams. An input path diagram is one that is drawn beforehand to help plan the analysis and represents the causal connections that are predicted by our hypothesis. An output path diagram represents the results of a statistical analysis, and shows what was actually found.


Distributions and Central Tendency

Dispersion Sum of Squares

Subject    Score (X)    X²     x = X − M    x²
1          0            0      -5           25
2          1            1      -4           16
3          2            4      -3           9
4          4            16     -1           1
5          5            25     0            0
6          6            36     1            1
7          7            49     2            4
8          8            64     3            9
9          8            64     3            9
10         9            81     4            16

N = 10    T = ∑X = 50    ∑X² = 340    ∑x = 0    ∑x² = 90

(M is the mean score, T / N = 5; x is each score's deviation from the mean)
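The totals in the table can be verified with a short script. The scores are the ten values from the table; the sum of squares is computed two ways, first from the deviations and then with the computational shortcut ∑X² − (∑X)²/N, which should agree. Dividing by N − 1 to obtain a sample variance is a standard next step added here for illustration and is not shown on the slide.

```python
scores = [0, 1, 2, 4, 5, 6, 7, 8, 8, 9]

n = len(scores)
total = sum(scores)                      # T, the sum of the scores
mean_score = total / n                   # M
sum_x_squared = sum(x * x for x in scores)

# Definitional formula: sum of squared deviations from the mean
deviations = [x - mean_score for x in scores]
ss_definitional = sum(d ** 2 for d in deviations)

# Computational formula: sum of X squared minus T squared over N
ss_computational = sum_x_squared - total ** 2 / n

# Sample variance and standard deviation (assumed next step)
variance = ss_definitional / (n - 1)
std_dev = variance ** 0.5

print(n, total, sum_x_squared, sum(deviations), ss_definitional)
print(ss_computational, variance, round(std_dev, 2))
```

The deviations always sum to zero, which is why they must be squared before they can be added up into a measure of dispersion.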

A Modified Constraint-Induced Therapy program

• Answer the following questions about the article• 1) A constraint-induced movement therapy (CIT) program is what kind of intervention (pg1 abstract)

• 2) Describe the Subjects: (how many) children with (what disorder) were placed in (what kind) of design (pg 1 under Methods in abstract)

• 3) What were the two procedures being compared? _________vs. __________

• 4) What were the two specifically designed tests? Name them: __________ and __________

• 5) How many times were the tests administered? _____ At what points in the study were they administered? ________

• 6) Was there a significant difference between the groups (yes or no)?

• 7) Which of the two groups or procedures was more effective? __________

Type out the above questions on a separate sheet, fill in the blanks, and turn in the paper with your name, class and title at the top. Each blank is worth 2 points: 12 blanks = 25 points (24 + 1 bonus point)


Organization of report/article Appendix A

The body of the paper will have the following sections: Introduction, Methods, Results and Discussion

• Introduction includes 1) the problem under study, 2) a literature review, and 3) the rationale and hypothesis of the study. The Introduction progresses from broad theories and research findings to the specifics of the current study

• Method provides the reader with detailed information about how the study was conducted. Often there are subsections describing the subjects, the apparatus or materials, and the procedure(s) used. The number and relevant characteristics of the subjects are stated. Any equipment used is described, and the procedure section states how the study was conducted step by step in temporal order. The Method section also describes how extraneous variables were controlled and how randomization was used

Organization of report/article Appendix A

• Results: In this section you offer the reader a straightforward description of your analyses with no explanation of the findings. Present your results in the same order as your predictions were made in the introduction section. State which statistical test was used and the alpha level that was set. In APA style, tables and figures are not presented in the main body of the manuscript but rather placed at the end of the paper. Avoid duplicating the same information across tables, figures and statements in the text

• Discussion: In this section the results are interpreted in relation to past research and theory. Explain whether the study obtained the expected results, what flaws and limitations the methods had, whether the results can be generalized, and what the implications are for future research

Organization of report/article Appendix A

• Introduction: 1) What is known 2) What is not known that this study addresses

• Methods: Subjects. Who are they? Where did you get them? What did you do with them (how assigned to groups, conditions, etc.)?

• Results: What happened? Did the result match the prediction or not?

• Discussion: What do the results mean (interpret them) for this study, the field in general and the future?

• Stephan Cowans, a Boston man who spent six years in prison for the shooting of a police sergeant, was released in 2004 after the discovery that the fingerprint used to convict him was not his.

• That same year, the FBI mistakenly linked Brandon Mayfield, an Oregon lawyer, to a fingerprint lifted off a plastic bag of explosive detonators found in Madrid after commuter train bombings there killed 191 people. Two weeks after Mayfield’s arrest, Spanish investigators traced the fingerprint to an Algerian man

Diabetes and Cognitive Systems in Older Black and White Persons

• Introduction

• Diabetes has long been associated with impaired cognition in white individuals, and although the prevalence of diabetes is increasing, this association with cognition has not been fully tested in black individuals

• Methods

• Subjects were older, community-dwelling persons recruited from senior and private residential housing in the Chicagoland area. All subjects were enrolled in 1 of 2 studies of aging and cognition (the Minority Aging Research Study and the Memory and Aging Project, with 336 and 1,187 subjects respectively). After 80 subjects were eliminated due to a diagnosis of dementia, the remaining subjects (mean age 73.1 and 79.9 years, mean education 14.8 and 14.3 years, 92.8% white and 6.3% white in the second study and all black in the first) underwent clinical, neurological and neuropsychological evaluation, including tests of semantic memory, episodic memory, working memory, perceptual speed and visuospatial abilities