Data & Individual Population-Statistic Bias

In this chapter we discuss data & individual population-statistic bias.

sampling error

Error that results from using a sample to estimate information about a population. Occurs because a sample gives incomplete information about a population.
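
As a rough illustration of sampling error, the sketch below builds a made-up population, draws one random sample from it, and compares the sample mean (a statistic) with the population mean (a parameter). The population values, the sample size of 30, and the seed are all assumptions chosen only for the example.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up measurements (illustrative only).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(50, 10) for _ in range(1000)]

# Parameter: numerical summary of the whole population.
population_mean = statistics.mean(population)

# Statistic: numerical summary of a sample drawn without replacement.
sample = rng.sample(population, 30)
sample_mean = statistics.mean(sample)

# Sampling error: how far the statistic lands from the parameter.
sampling_error = sample_mean - population_mean
print(f"parameter={population_mean:.2f} "
      f"statistic={sample_mean:.2f} error={sampling_error:.2f}")
```

Drawing a different sample would generally give a different error, which is why the error is attributed to the sample being incomplete information about the population.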

completely randomized design

a design where each experimental unit is randomly assigned to a treatment
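
One possible way to carry out such an assignment is sketched below. The function name, the unit labels, and the choice to keep group sizes balanced are assumptions made for illustration; the definition above only requires that assignment be random.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign each experimental unit to exactly one treatment.

    Units are shuffled, then dealt out in turn, so group sizes
    differ by at most one (a design choice, not a requirement).
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

# Twelve hypothetical experimental units, two treatments.
groups = completely_randomized_assignment(
    range(1, 13), ["treatment", "control"], seed=1
)
print(groups)
```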


anecdotal

the information being conveyed is based on casual observation, not scientific research


population

the entire group of individuals to be studied


individual

a person or object that is a member of the population being studied


sample

a subset of the population that is being studied


statistic

a numerical summary of a sample

descriptive statistics

Consist of organizing and summarizing data. Describe data through numerical summaries, tables, and graphs.
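
A minimal sketch of numerical summaries and a frequency table, using Python's standard library. Both data lists are made up for the example.

```python
import statistics
from collections import Counter

scores = [2, 4, 4, 5, 7, 8]                      # made-up quantitative data
colors = ["red", "blue", "red", "green", "red"]  # made-up qualitative data

# Numerical summaries of the quantitative data.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

# A frequency table organizes the qualitative data.
frequency_table = Counter(colors)

print(summary)
print(frequency_table.most_common())
```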

inferential statistics

Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal is to estimate parameters.


parameter

a numerical summary of a population

the process of statistics

1. Identify the research objective
2. Collect the data needed to answer the question(s) posed in step 1
3. Describe the data
4. Perform inference

convenience samples

Samples obtained through convenience rather than systematically, e.g., Internet or phone-in polls. Because they are not based on randomness, their results are not considered reliable.


variables

the characteristics of the individuals within the population

qualitative (categorical) variables

allow for classification of individuals based on some attribute or characteristic

quantitative variables

Provide numerical measures of individuals. Math operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results.


a way to look at and organize a problem so that it can be solved.

discrete variable

A quantitative variable that has either a finite number of possible values or a countable number of possible values. The values result from counting.

continuous variable

A quantitative variable that has an infinite number of possible values that are not countable, but are instead measured.

Qualitative data

Observations corresponding to a qualitative variable.

Quantitative data

Observations corresponding to a quantitative variable.

Discrete data

Observations corresponding to a discrete variable.

Continuous data

Observations corresponding to a continuous variable.

nominal level of measurement

The values of a variable name, label, or categorize. The naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.

ordinal level of measurement

The variable has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.

interval level of measurement

The variable has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. Zero does not mean the absence of the quantity. Addition and subtraction can be performed on values of the variable.

ratio level of measurement

The variable has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. Zero means the absence of the quantity. Multiplication and division can be performed on values of the variable.


statistics

The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.


validity

Represents how close a measurement is to its true value. A variable is valid if it measures what it is supposed to measure.


reliability

The ability of different measurements of the same individual to yield the same results.

Four levels of measurement of a variable

1. nominal
2. ordinal
3. interval
4. ratio

observational study

Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. The researcher observes the behavior of the individuals in the study without trying to influence the outcome. Association may be claimed, but not causation.

designed experiment

An experiment where the researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, then records the value of the response variable for each group.

explanatory variable

a variable that explains or causes changes in the response variable

response variable

a variable that measures an outcome or result of a study (variable whose changes are to be studied)


confounding

Occurs when the effects of two or more explanatory variables are not separated, so any change in the response variable may be due to a variable that was not accounted for in the study.

lurking variable

An explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study.

three categories of observational studies

1. cross-sectional studies
2. case-control studies
3. cohort studies

cross-sectional studies

Observational studies that collect information about individuals at a specific point in time or over a very short period of time.

case-control studies

Retrospective studies that require individuals to look back in time or require the researcher to examine existing records. Individuals that have a certain characteristic are matched with those that do not.

cohort studies

Studies that first identify a group of individuals to participate (the cohort). The cohort is observed over a period of time, and characteristics of the individuals are recorded. Some individuals are exposed to certain factors, and others are not. At the end of the study, the value of the response variable is recorded for each individual.


data

Facts or propositions used to draw a conclusion or make a decision. The list of observed values for a variable.


census

a list of all individuals in a population along with certain characteristics of each individual

random sampling

the process of using chance to select individuals from a population to be included in the sample

simple random sampling

a sample of size n is drawn from a population of size N in such a way that every possible sample of size n is equally likely to occur


frame

lists all the individuals in a population

sampling without replacement

once an individual is selected, that individual is removed from the population and cannot be chosen again

sampling with replacement

a selected individual is placed back in the population and could be chosen again


seed

provides an initial point for a random-number generator to start creating random numbers
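
The sketch below shows sampling without replacement, sampling with replacement, and the effect of a starting point for the generator, using Python's standard `random` module. The frame of 20 labeled individuals and the seed value 2024 are made up for the example.

```python
import random

# Hypothetical frame: a list of the 20 individuals in the population.
frame = [f"person_{i}" for i in range(1, 21)]

rng = random.Random(2024)  # the seed fixes the generator's starting point

# Without replacement: no individual can appear twice.
without_replacement = rng.sample(frame, k=5)

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(frame, k=5)

# A fresh generator with the same seed reproduces the same first draw.
repeat = random.Random(2024).sample(frame, k=5)

print(without_replacement)
print(with_replacement)
print(without_replacement == repeat)
```

Fixing the seed is mainly useful when a sample has to be reproduced later, for example to let someone else audit the selection.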

stratified sample

Obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way.
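
A minimal sketch of stratified sampling follows. The function name, the made-up population of (id, class year) pairs, and the choice to draw the same number from every stratum are assumptions for illustration; other allocations, such as proportional ones, are also possible.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Split individuals into strata, then draw a simple random
    sample of size per_stratum from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }

# Hypothetical population: 20 freshmen and 40 seniors.
population = [
    (i, "freshman" if i % 3 == 0 else "senior") for i in range(60)
]
sample = stratified_sample(
    population, lambda p: p[1], per_stratum=4, seed=7
)
print(sample)
```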

systematic sample

Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.

steps in systematic sampling

1. Approximate the population size, N
2. Determine the sample size desired, n
3. Compute N/n and round down to the nearest integer. This value is k.
4. Randomly select a number between 1 and k. Call this number p.
5. The sample will consist of the following individuals:
p, p+k, p+2k, …, p+(n-1)k
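
The steps above can be sketched as a short function. The function name and the example values N = 1000 and n = 40 are assumptions for illustration, and individuals are taken to be numbered 1 through N.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the positions (1-based) of the individuals selected."""
    rng = random.Random(seed)
    k = N // n             # step 3: compute N/n and round down
    p = rng.randint(1, k)  # step 4: random start between 1 and k, inclusive
    # step 5: p, p+k, p+2k, ..., p+(n-1)k
    return [p + i * k for i in range(n)]

picked = systematic_sample(N=1000, n=40, seed=3)
print(picked[:5], "...", picked[-1])
```

Because k is rounded down, the last position p+(n-1)k never exceeds N, so every selected position refers to an individual who exists.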

cluster sample

Obtained by selecting all individuals within a randomly selected collection or group of individuals
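
A small sketch of cluster sampling, assuming a made-up population of households grouped into city blocks. The clusters themselves are chosen at random, and then every individual in each chosen cluster is included.

```python
import random

# Hypothetical clusters: city blocks mapped to the households in them.
clusters = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2"],
}

rng = random.Random(11)

# Randomly select two whole clusters.
chosen = rng.sample(sorted(clusters), k=2)

# Include all individuals within each selected cluster.
sample = [household for block in chosen for household in clusters[block]]

print(chosen, sample)
```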

self-selected convenience sample

Individuals themselves decide to participate in a survey. Also known as voluntary response samples.

multistage sampling

the use of a combination of sampling techniques


bias

the results of the sample are not representative of the population

three sources of bias in sampling

1. Sampling bias
2. Nonresponse bias
3. Response bias

sampling bias

the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another


undercoverage

the proportion of one segment of the population is lower in a sample than it is in the population

nonresponse bias

individuals selected to be in the sample who do not respond to the survey have different opinions from those who do

methods to decrease nonresponse bias

1. callbacks
2. rewards and incentives

response bias

the answers on a survey do not reflect the true feelings of the respondent

sources of response bias

1. interviewer error
2. misrepresented answers
3. wording of questions
4. ordering of questions or words
5. type of question (open or closed)
6. data entry error

open question

a question for which the respondent is free to choose his or her response

closed question

a question for which the respondent must choose from a list of predetermined responses

nonsampling errors

Errors that result from undercoverage, nonresponse bias, response bias, or data entry error. May be present in a complete census of the population.


experiment

A controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable. Any combination of the values of the factors is called a treatment.

experimental unit

A person, object, or some other well-defined item upon which a treatment is applied.


subject

an experimental unit that is a person

control group

a group that receives a baseline treatment against which other treatments can be compared


placebo

an innocuous medication, such as a sugar pill, that looks, tastes, and smells like the experimental medication


blinding

nondisclosure of the treatment an experimental unit is receiving


single-blind experiment

an experiment in which the experimental unit does not know which treatment they are receiving


double-blind experiment

an experiment in which neither the experimental unit nor the researcher knows which treatment the experimental unit is receiving

steps in designing an experiment

1. Identify the problem to be solved
2. Determine the factors that affect the response variable
3. Determine the number of experimental units
4. Determine the level of each factor
   (a) Control
   (b) Randomize
5. Conduct the experiment
   (a) Randomly assign the subjects
   (b) Collect and process the data
6. Test the claim


experimental design

describes the overall plan for conducting an experiment


replication

occurs when each treatment is applied to more than one experimental unit, to confirm that the effect of a treatment is not due to some characteristic of a single experimental unit

matched-pairs design

An experimental design where the experimental units are paired up according to some sort of relation or matching characteristics. Only two levels of treatment may be performed in a matched-pairs design.
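
One way to randomize within a matched-pairs design is sketched below: for each pair, a coin flip decides which member receives which of the two treatment levels. The function name, the treatment labels, and the example pairs are hypothetical.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """For each matched pair, randomly decide which member gets
    treatment_1 and which gets treatment_2."""
    rng = random.Random(seed)
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:  # coin flip: swap the two members
            first, second = second, first
        assignment.append({"treatment_1": first, "treatment_2": second})
    return assignment

# Hypothetical pairs matched on some shared characteristic (e.g., twins).
pairs = [("Ann", "Amy"), ("Bob", "Ben"), ("Cat", "Cal")]
plan = assign_matched_pairs(pairs, seed=5)
print(plan)
```

Each pair contributes exactly one unit to each treatment level, which is why the design is limited to two levels.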