Data & Individual population-Statistic Bias
In this chapter we discuss data & individual population-statistic bias.
Error that results from using a sample to estimate information about a population. Occurs because a sample gives incomplete information about a population.
completely randomized design
a design where each experimental unit is randomly assigned to a treatment
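A minimal Python sketch of the random assignment described above. The unit names, the treatment labels, and the choice to keep group sizes nearly equal are illustrative assumptions, not part of the definition.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign every experimental unit to one treatment.

    Assumption for this sketch: units are dealt out in turn after shuffling,
    so the treatment groups end up (nearly) equal in size.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

# Hypothetical experiment: 12 plants, 3 treatments.
plants = [f"plant_{i}" for i in range(1, 13)]
groups = completely_randomized_design(
    plants, ["fertilizer_A", "fertilizer_B", "control"], seed=42
)
```

Each plant lands in exactly one group, and only the shuffle decides which.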
the information being conveyed is based on casual observation, not scientific research
the entire group of individuals to be studied
a person or object that is a member of the population being studied
a subset of the population that is being studied
a numerical summary of a sample
Consist of organizing and summarizing data. Describe data through numerical summaries, tables, and graphs.
Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal is to estimate parameters.
a numerical summary of a population
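To make the difference between a numerical summary of a population and a numerical summary of a sample concrete, here is a small Python sketch. The population of heights is simulated and purely hypothetical; the gap between the two means illustrates the error that comes from using a sample to estimate information about a population.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 simulated heights in centimeters.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Numerical summary of the whole population.
parameter = statistics.mean(population)

# Numerical summary of a sample of 50 individuals.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

# The sample gives incomplete information, so the two usually differ.
sampling_error = statistic - parameter
```

Drawing a different sample changes `statistic` and `sampling_error`, while `parameter` stays fixed.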
the process of statistics
1. Identify the research objective
2. Collect the data needed to answer the question(s) posed in step 1
3. Describe the data
4. Perform inference
Samples obtained through convenience rather than through a random process, e.g., Internet or phone-in polls. Not based on randomness. Not considered reliable.
the characteristics of the individuals within the population
qualitative (categorical) variables
allow for classification of individuals based on some attribute or characteristic
Provide numerical measures of individuals. Math operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results.
a way to look at and organize a problem so that it can be solved.
A quantitative variable that has either a finite number of possible values or a countable number of possible values. The values result from counting.
A quantitative variable that has an infinite number of possible values that are not countable, but are instead measured.
Observations corresponding to a qualitative variable.
Observations corresponding to a quantitative variable.
Observations corresponding to a discrete variable.
Observations corresponding to a continuous variable.
nominal level of measurement
The values of a variable name, label, or categorize. The naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.
ordinal level of measurement
The variable has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.
interval level of measurement
The variable has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. Zero does not mean the absence of the quantity. Addition and subtraction can be performed on values of the variable.
ratio level of measurement
the variable has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. Zero means the absence of quantity. Multiplication and division can be performed on values of the variable.
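The distinction between the interval and ratio levels can be checked with a short Python sketch. Temperature is used here as an illustrative example that is not taken from the text: Celsius has an arbitrary zero (interval level), while Kelvin has a zero that means absence of the quantity (ratio level).

```python
# Two temperatures in Celsius (interval level: zero is arbitrary).
c1, c2 = 10.0, 20.0

# The same temperatures in Kelvin (ratio level: zero means no thermal energy).
k1, k2 = c1 + 273.15, c2 + 273.15

# Differences have meaning at both levels and agree across the two scales.
diff_c = c2 - c1
diff_k = k2 - k1

# Ratios only have meaning at the ratio level.
ratio_c = c2 / c1  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = k2 / k1  # about 1.035, the physically meaningful ratio
```

The differences match on both scales, while the ratios disagree, which is why multiplication and division are reserved for the ratio level.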
The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
Represents how close a measurement is to the true value being measured. A variable is valid if it measures what it is supposed to measure.
The ability of different measurements of the same individual to yield the same results.
Four levels of measurement of a variable
Measure the value of the response variable without attempting to influence the value of either the response or explanatory variables. In an observational study, the researcher observes the behavior of the individuals in the study without trying to influence the outcome of the study. Association may be claimed but not causation.
An experiment where the researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, then records the value of the response variable for each group.
a variable that explains or causes changes in the response variable
a variable that measures an outcome or result of a study (variable whose changes are to be studied)
Occurs when the effects of two or more explanatory variables are not separated, so any change in the response variable may be due to a variable that was not accounted for in the study.
An explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study.
three categories of observational studies
1. cross-sectional studies
2. case-control studies
3. cohort studies
Observational studies that collect information about individuals at a specific point in time or over a very short period of time.
Retrospective studies that require individuals to look back in time or require the researcher to examine existing records. Individuals that have a certain characteristic are matched with those that do not.
Identifies a group of individuals to participate in the study (the cohort). The cohort is observed over a period of time. Characteristics about the individuals are recorded. Some individuals are exposed to certain factors, and others are not. At the end of the study, the value of the response variable is recorded for the individuals.
Facts or propositions used to draw a conclusion or make a decision. The list of observed values for a variable.
a list of all individuals in a population along with certain characteristics of each individual
the process of using chance to select individuals from a population to be included in the sample
simple random sampling
every possible sample of size n from a population of size N is equally likely to occur
lists all the individuals in a population
sample without replacement
once an individual is selected, he is removed from the population and cannot be chosen again
sampling with replacement
a selected individual is placed back in the population and could be chosen again
provides an initial point for a random-number generator to start creating random numbers
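The ideas above (a list of all individuals, sampling without and with replacement, and a starting point for the random-number generator) can be sketched with Python's standard `random` module. The list of 20 numbered individuals and the seed value 2024 are assumptions made for illustration.

```python
import random

# Hypothetical list of all individuals in the population, numbered 1..20.
frame = list(range(1, 21))

# The seed fixes where the random-number generator starts.
rng = random.Random(2024)

# Without replacement: a selected individual cannot be chosen again.
without_replacement = rng.sample(frame, k=5)

# With replacement: a selected individual goes back and may be chosen again.
with_replacement = rng.choices(frame, k=5)

# Reusing the same seed reproduces the same first draw.
repeat = random.Random(2024).sample(frame, k=5)
```

`random.sample` gives every subset of size 5 the same chance of being drawn, which matches the definition of a simple random sample.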
Obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way.
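A minimal Python sketch of this procedure: split the population into non-overlapping strata, then take a simple random sample from each. The student records, the class-year strata, and the helper name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in population:
        # Each individual belongs to exactly one stratum.
        strata[stratum_of(individual)].append(individual)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical population: 30 students, 10 in each class year.
years = ["freshman", "sophomore", "junior"]
students = [(f"s{i}", years[i % 3]) for i in range(30)]

picked = stratified_sample(students, lambda s: s[1], n_per_stratum=3, seed=1)
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of the whole population does not guarantee.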
Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
steps in systematic sampling
1. Approximate the population size, N
2. Determine the sample size desired, n
3. Compute N/n and round down to the nearest integer. This value is k.
4. Randomly select a number between 1 and k. Call this number p.
5. The sample will consist of the following individuals:
p, p + k, p + 2k, ..., p + (n - 1)k
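The five steps above can be sketched in Python as follows. The values N = 1000 and n = 40 and the function name `systematic_sample` are illustrative assumptions; the sketch also assumes n is no larger than N.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the positions p, p + k, ..., p + (n - 1)k of the chosen individuals."""
    rng = random.Random(seed)
    k = N // n             # step 3: N/n rounded down
    p = rng.randint(1, k)  # step 4: random start between 1 and k, inclusive
    return [p + i * k for i in range(n)]  # step 5

# Hypothetical population of about 1000 individuals, desired sample of 40.
chosen = systematic_sample(N=1000, n=40, seed=7)
```

Here k = 25, so the sample is every 25th individual starting from a random position between 1 and 25, and the last position never exceeds 1000.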
Obtained by selecting all individuals within a randomly selected collection or group of individuals
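A short Python sketch of this idea: choose whole groups at random, then keep every individual in the chosen groups. The city blocks, the households, and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters groups and return all individuals in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [individual for name in chosen for individual in clusters[name]]

# Hypothetical population: 10 city blocks with 4 households each.
blocks = {
    f"block_{b}": [f"house_{b}_{h}" for h in range(1, 5)]
    for b in range(1, 11)
}

surveyed = cluster_sample(blocks, n_clusters=3, seed=3)
```

Randomness enters only in the choice of blocks; once a block is chosen, all of its households are surveyed.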
self-selected convenience sample
Individuals themselves decide to participate in a survey. Also known as voluntary response samples.
the use of a combination of sampling techniques
the results of the sample are not representative of the population
three sources of bias in sampling
1. Sampling bias
2. Nonresponse bias
3. Response bias
the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another
the proportion of one segment of the population is lower in a sample than it is in the population
individuals selected to be in the sample who do not respond to the survey have different opinions from those who do
methods to decrease nonresponse bias
2. rewards and incentives
the answers on a survey do not reflect the true feelings of the respondent
sources of response bias
1. interviewer error
2. misrepresented answers
3. wording of questions
4. ordering of questions or words
5. type of question (open or closed)
6. data entry error
a question for which the respondent is free to choose his or her response
a question for which the respondent must choose from a list of predetermined responses
Errors that result from undercoverage, nonresponse bias, response bias, or data entry error. May be present in a complete census of the population.
A controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable. Any combination of the values of the factors is called a treatment.
A person, object, or some other well-defined item upon which a treatment is applied.
an experimental unit that is a person
a baseline treatment that can be used to compare to other treatments
an innocuous medication, such as a sugar pill, that looks, tastes, and smells like the experimental medication
nondisclosure of the treatment an experimental unit is receiving
an experiment in which the experimental unit does not know which treatment he is receiving
an experiment in which neither the experimental unit nor the researcher knows which treatment the experimental unit is receiving
steps in designing an experiment
1. Identify the problem to be solved
2. Determine the factors that affect the response variable
3. Determine the number of experimental units
4. Determine the level of each factor
5. Conduct the experiment
(a) Randomly assign the subjects
(b) Collect and process the data
6. Test the claim
describe the overall plan in conducting an experiment
occurs when each treatment is applied to more than one experimental unit, to confirm that the effect of a treatment is not due to some characteristic of a single experimental unit
An experimental design where the experimental units are paired up according to some sort of relation or matching characteristics. Only two levels of treatment may be performed in a matched-pairs design.
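A minimal Python sketch of a matched-pairs assignment. Deciding by chance which member of each pair receives which treatment is an assumption of this sketch rather than something stated in the definition; the twin pairs and treatment labels are hypothetical.

```python
import random

def matched_pairs_assignment(pairs, treatments=("treatment", "control"), seed=None):
    """Give the two members of each pair different treatments, chosen by chance."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # random order of the two treatment levels
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment

# Hypothetical matched pairs: three sets of twins.
twins = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
plan = matched_pairs_assignment(twins, seed=5)
```

Because each pair receives both treatment levels, differences within a pair can be compared directly.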