### Data & Individual population-Statistic Bias

In this chapter we discuss data & individual population-statistic bias.

**sampling error**

Error that results from using a sample to estimate information about a population. Occurs because a sample gives incomplete information about a population.

**completely randomized design**

a design where each experimental unit is randomly assigned to a treatment
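As a minimal sketch of the idea, random assignment can be done by shuffling the experimental units and dealing them out to the treatments. The function name `completely_randomized_assignment`, the treatment labels, and the seed value are hypothetical choices made for illustration only.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign each experimental unit to exactly one treatment.

    Units are shuffled and then dealt to the treatments in turn, so the
    group sizes differ by at most one. `seed` only makes the run repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

# Twelve hypothetical experimental units, two hypothetical treatments.
groups = completely_randomized_assignment(range(1, 13), ["drug", "placebo"], seed=1)
print(groups)
```

Dealing from a shuffled list is one of several reasonable ways to randomize; it is used here because it keeps the group sizes balanced.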

**anecdotal**

the information being conveyed is based on casual observation, not scientific research

**population**

the entire group of individuals to be studied

**individual**

a person or object that is a member of the population being studied

**sample**

a subset of the population that is being studied

**statistic**

a numerical summary of a sample

**descriptive statistics**

Consists of organizing and summarizing data. Describes data through numerical summaries, tables, and graphs.

**inferential statistics**

Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal is to estimate parameters.

**parameter**

a numerical summary of a population
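The difference between a numerical summary of a population and a numerical summary of a sample can be illustrated with a short sketch. The population below is made-up data, and the variable names and seed are assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# Hypothetical population: ages of 1,000 individuals (made-up data).
population = [rng.randint(18, 90) for _ in range(1000)]

# Numerical summary of the whole population.
population_mean = statistics.mean(population)

# Numerical summary of a sample drawn from that population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: the gap that arises because the sample gives
# incomplete information about the population.
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Inference works in the other direction: in practice only the sample mean is known, and it is used to estimate the population mean.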

**the process of statistics**

1. Identify the research objective

2. Collect the data needed to answer the question(s) posed in step 1

3. Describe the data

4. Perform inference

**convenience samples**

Samples obtained through convenience rather than systematically, e.g., Internet or phone-in polls. Not based on randomness. Not considered reliable.

**variables**

the characteristics of the individuals within the population

**qualitative (categorical) variables**

allow for classification of individuals based on some attribute or characteristic

**quantitative variables**

Provide numerical measures of individuals. Math operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results.

**approach**

a way to look at and organize a problem so that it can be solved.

**discrete variable**

A quantitative variable that has either a finite number of possible values or a countable number of possible values. The values result from counting.

**continuous variable**

A quantitative variable that has an infinite number of possible values that are not countable, but are instead measured.

**Qualitative data**

Observations corresponding to a qualitative variable.

**Quantitative data**

Observations corresponding to a quantitative variable.

**Discrete data**

Observations corresponding to a discrete variable.

**Continuous data**

Observations corresponding to a continuous variable.

**nominal level of measurement**

The values of a variable name, label, or categorize. The naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.

**ordinal level of measurement**

The variable has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.

**interval level of measurement**

The variable has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. Zero does not mean the absence of the quantity. Addition and subtraction can be performed on values of the variable.

**ratio level of measurement**

The variable has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. Zero means the absence of quantity. Multiplication and division can be performed on values of the variable.
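Temperature is a common way to see the difference between the interval and ratio levels, and the sketch below uses it as an assumed example. Celsius has an arbitrary zero (interval), while Kelvin has a true zero (ratio). The helper `celsius_to_kelvin` and the two readings are hypothetical.

```python
import math

def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences have meaning at the interval level: both scales agree.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k
print(math.isclose(diff_c, diff_k))  # True

# Ratios only have meaning at the ratio level.
ratio_c = warm_c / cool_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cool_k  # about 1.035, the physically meaningful ratio
print(ratio_c, round(ratio_k, 3))
```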

**statistics**

The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

**validity**

Represents how close a measurement is to the true value of what is being measured. A variable is valid if it measures what it is supposed to measure.

**reliability**

The ability of different measurements of the same individual to yield the same results.

**four levels of measurement of a variable**

1. nominal

2. ordinal

3. interval

4. ratio

**observational study**

Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. In an observational study, the researcher observes the behavior of the individuals in the study without trying to influence the outcome of the study. Association may be claimed but not causation.

**designed experiment**

An experiment where the researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, then records the value of the response variable for each group.

**explanatory variable**

a variable that explains or causes changes in the response variable

**response variable**

a variable that measures an outcome or result of a study (variable whose changes are to be studied)

**confounding**

Occurs when the effects of two or more explanatory variables are not separated, so any change in the response variable may be due to a variable that was not accounted for in the study.

**lurking variable**

An explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study.

**three categories of observational studies**

1. cross-sectional studies

2. case-control studies

3. cohort studies

**cross-sectional studies**

Observational studies that collect information about individuals at a specific point in time or over a very short period of time.

**case-control studies**

Retrospective studies that require individuals to look back in time or require the researcher to examine existing records. Individuals that have a certain characteristic are matched with those that do not.

**cohort studies**

Identifies a group of individuals to participate in the study (the cohort). The cohort is observed over a period of time. Characteristics of the individuals are recorded. Some individuals are exposed to certain factors, and others are not. At the end of the study, the value of the response variable is recorded for the individuals.

**data**

Facts or propositions used to draw a conclusion or make a decision. The list of observed values for a variable.

**census**

a list of all individuals in a population along with certain characteristics of each individual

**random sampling**

the process of using chance to select individuals from a population to be included in the sample

**simple random sampling**

every possible sample of size n from a population of size N has an equally likely chance of occurring

**frame**

lists all the individuals in a population

**sampling without replacement**

once an individual is selected, that individual is removed from the population and cannot be chosen again

**sampling with replacement**

a selected individual is placed back in the population and could be chosen again

**seed**

provides an initial point for a random-number generator to start creating random numbers
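The sampling terms above can be sketched with Python's standard `random` module. The frame of 20 labeled individuals and the seed value 2024 are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical frame listing all N = 20 individuals in the population.
frame = [f"person_{i}" for i in range(1, 21)]

# The seed gives the random-number generator its starting point.
rng = random.Random(2024)

# Simple random sample of size n = 5, without replacement:
# no individual can appear twice.
without_replacement = rng.sample(frame, k=5)

# Sampling with replacement: a selected individual goes back into
# the population and may be drawn again.
with_replacement = rng.choices(frame, k=5)

# Starting again from the same seed reproduces the same draws.
rng_again = random.Random(2024)
assert rng_again.sample(frame, k=5) == without_replacement

print(without_replacement)
print(with_replacement)
```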

**stratified sample**

Obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way.
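A stratified sample can be sketched as grouping the individuals by stratum and then drawing a simple random sample within each group. The function `stratified_sample`, the student data, and the fixed per-stratum sample size are assumptions for this example; other allocations, such as proportional ones, are also possible.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` individuals from each stratum.

    `stratum_of` maps an individual to the name of its stratum.
    Raises ValueError if a stratum has fewer than `per_stratum` members.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in individuals:
        strata[stratum_of(individual)].append(individual)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical population: 30 freshmen and 20 seniors.
students = [(f"s{i}", "freshman" if i <= 30 else "senior") for i in range(1, 51)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=4, seed=7)
for stratum, members in picked.items():
    print(stratum, [name for name, _ in members])
```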

**systematic sample**

Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.

**steps in systematic sampling**

1. Approximate the population size, N

2. Determine the sample size desired, n

3. Compute N/n and round down to the nearest integer. This value is k.

4. Randomly select a number between 1 and k. Call this number p.

5. The sample will consist of the following individuals:

p, p+k, p+2k, …, p+(n-1)k
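The steps above can be sketched in code as follows. The function name `systematic_sample` and the numbered population are hypothetical, and the population size is taken as known exactly rather than approximated.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th individual, starting from a random position p."""
    N = len(population)              # step 1: population size
    if not 1 <= n <= N:              # step 2: desired sample size must fit
        raise ValueError("n must be between 1 and the population size")
    k = N // n                       # step 3: N/n rounded down
    rng = random.Random(seed)
    p = rng.randint(1, k)            # step 4: random start between 1 and k
    # Step 5: individuals p, p+k, p+2k, ..., p+(n-1)k (1-based positions).
    positions = [p + i * k for i in range(n)]
    return [population[pos - 1] for pos in positions], k, p

# Hypothetical population of 100 individuals numbered 1 to 100.
population = list(range(1, 101))
chosen, k, p = systematic_sample(population, n=8, seed=11)
print(f"k = {k}, p = {p}, sample = {chosen}")
```

Because k is rounded down, the last position p+(n-1)k never exceeds N, so every selected position exists in the population.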

**cluster sample**

Obtained by selecting all individuals within a randomly selected collection or group of individuals
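As a rough sketch, a cluster sample randomly selects whole groups and then keeps every individual inside the selected groups. The function `cluster_sample` and the city-block data are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose `n_clusters` clusters and keep all of their individuals."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [individual for name in chosen for individual in clusters[name]]
    return chosen, sample

# Hypothetical clusters: households grouped by city block.
city_blocks = {
    "block_A": ["a1", "a2", "a3"],
    "block_B": ["b1", "b2"],
    "block_C": ["c1", "c2", "c3", "c4"],
    "block_D": ["d1", "d2"],
}

chosen, sample = cluster_sample(city_blocks, n_clusters=2, seed=5)
print(chosen, sample)
```

The randomness applies to the choice of clusters, not to individuals; this is the main contrast with a stratified sample, which samples within every group.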

**self-selected convenience sample**

Individuals themselves decide to participate in a survey. Also known as voluntary response samples.

**multistage sampling**

the use of a combination of sampling techniques

**bias**

the results of the sample are not representative of the population

**three sources of bias in sampling**

1. Sampling bias

2. Nonresponse bias

3. Response bias

**sampling bias**

the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another

**undercoverage**

the proportion of one segment of the population is lower in a sample than it is in the population

**nonresponse bias**

individuals selected to be in the sample who do not respond to the survey have different opinions from those who do

**methods to decrease nonresponse bias**

1. callbacks

2. rewards and incentives

**response bias**

the answers on a survey do not reflect the true feelings of the respondent

**sources of response bias**

1. interviewer error

2. misrepresented answers

3. wording of questions

4. ordering of questions or words

5. type of question (open or closed)

6. data entry error

**open question**

a question for which the respondent is free to choose his or her response

**closed question**

a question for which the respondent must choose from a list of predetermined responses

**nonsampling errors**

Errors that result from undercoverage, nonresponse bias, response bias, or data entry error. May be present in a complete census of the population.

**experiment**

A controlled study conducted to determine the effect that varying one or more explanatory variables (factors) has on a response variable. Any combination of the values of the factors is called a treatment.

**experimental unit**

A person, object, or some other well-defined item upon which a treatment is applied.

**subject**

an experimental unit that is a person

**control group**

a baseline treatment against which other treatments can be compared

**placebo**

an innocuous medication, such as a sugar pill, that looks, tastes, and smells like the experimental medication

**blinding**

nondisclosure of the treatment an experimental unit is receiving

**single-blind**

an experiment in which the experimental unit does not know which treatment he or she is receiving

**double-blind**

an experiment in which neither the experimental unit nor the researcher knows which treatment the experimental unit is receiving

**steps in designing an experiment**

1. Identify the problem to be solved

2. Determine the factors that affect the response variable

3. Determine the number of experimental units

4. Determine the level of each factor

(a) Control

(b) Randomize

5. Conduct the experiment

(a) Randomly assign the subjects

(b) Collect and process the data

6. Test the claim

**design**

describes the overall plan for conducting an experiment

**replication**

occurs when each treatment is applied to more than one experimental unit, to confirm that the effect of a treatment is not due to some characteristic of a single experimental unit

**matched-pairs design**

An experimental design where the experimental units are paired up according to some sort of relation or matching characteristics. Only two levels of treatment may be performed in a matched-pairs design.
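One way to sketch a matched-pairs design is to randomize, within each pair, which member receives which of the two treatments. The function `assign_matched_pairs`, the treatment labels, and the twin pairs are hypothetical.

```python
import random

def assign_matched_pairs(pairs, treatments=("treatment", "control"), seed=None):
    """Within each matched pair, randomly give one member each of two treatments."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # random order decides who gets which treatment
        assignment[first], assignment[second] = order
    return assignment

# Hypothetical matched pairs, e.g. twins.
twins = [("Ann", "Amy"), ("Bob", "Ben"), ("Cal", "Cam")]
result = assign_matched_pairs(twins, seed=3)
print(result)
```

Each pair always ends up with one member per treatment, which reflects the restriction to two treatment levels in this design.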