### Statistics Individual Data Distribution

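This lesson covers basic statistical vocabulary (individuals, populations, samples, and variables), observational studies and sampling methods, sources of bias, designed experiments, and numerical and graphical summaries of data distributions.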

define statistics

statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. in addition, statistics is about providing a measure of confidence in any conclusions

individual

a person or object that is a member of the population being studied

descriptive statistics

consists of organizing and summarizing information collected

inferential statistics

uses methods that generalize results obtained from a sample to the population and measure the reliability of the results

statistic

a numerical summary of a sample

parameter

a numerical summary of a population

variables

the characteristics of the individuals of the population being studied

a sample of seniors is selected and it is found that 45% own a television

this is a statistic because the value is a numerical measurement describing a characteristic of a sample

the average annual salary of 50 of a company’s 800 employees is \$54,000

this is a statistic, because the data set of salaries of 50 employees is a sample

nation of origin

the variable is qualitative because it is an attribute characteristic

medal won in race

the variable is qualitative because it is an attribute characteristic

area of a park

the variable is continuous because it is measured rather than counted (it is not countable)

height of an office building

the variable is continuous because it is not countable

a polling organization contacts 2526 undergraduates who attend a university and live in the United States and asks whether or not they had spent more than \$200 on food in the last month

population: undergraduates who attend a university and live in the United States

sample: the 2526 undergraduates contacted by the polling organization

| setup | size | screen type | number of channels available |
|---|---|---|---|
| A | 48 | plasma | 299 |
| B | 40 | projection | 111 |
| C | 59 | projection | 425 |
| D | 41 | plasma | 270 |
| E | 43 | projection | 290 |

individuals being studied: the high-definition television setups A through E

variables and their corresponding data being studied: size (48, 40, 59, 41, 43), screen type (plasma, projection, projection, plasma, projection), and number of channels available (299, 111, 425, 270, 290)

a study conducted by researchers was designed “to determine if the application of duct tape is as effective as cryotherapy in the treatment of common warts.” the researchers randomly divided 50 patients into two groups. the 25 patients in group 1 had their warts treated by applying duct tape. the 25 patients in group 2 had their warts treated by cryotherapy. once the treatments were complete, it was determined that 66% of the patients in group 1 and 86% of the patients in group 2 had complete resolution of their warts. the researchers concluded that cryotherapy is significantly more effective in treating warts than duct tape.

research objective: to determine if duct tape is as effective as cryotherapy in treating warts

sample: the 50 patients with warts

observational study

measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. that is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study

designed experiment

a study in which the researcher assigns the individuals to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group

confounding

occurs when the effects of two or more explanatory variables are not separated. therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study

lurking variable

an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. in addition, lurking variables are typically related to explanatory variables considered in the study

three major categories of observational studies

1. cross-sectional studies: collect information about individuals at a specific point in time or over a very short period of time
2. case-control studies: retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records
3. cohort studies: first identify a group of individuals to participate in the study (the cohort), then observe them over a long period of time

census

a list of all individuals in a population along with certain characteristics of each individual

a study is conducted to determine if there is a relationship between Parkinson’s disease and childhood head trauma. doctors look at the hospital records of patients with Parkinson’s disease for any history of childhood head trauma

the study is an observational study because the study examines individuals in a sample, but does not try to influence the response variable

while shopping, 350 people are asked to perform a taste test in which they drink two randomly placed, unmarked coffees. they are then asked which coffee they prefer

the study is a designed experiment because the researchers impose a treatment (each person drinks the two unmarked coffees, placed in random order) and then record the response (which coffee the person prefers)

researchers wanted to determine if having a tv in the bedroom is associated with obesity. the researchers administered a questionnaire to 380 twelve-year-old adolescents. after analyzing the results, researchers determined that the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not have a tv in their bedroom

this is an observational study because the researchers observe the behavior of the individuals in the study without trying to influence an explanatory variable of the study

type of observational study: cross-sectional, because the questionnaire collects information at a single point in time

the response variable is the body mass index of the adolescents

the explanatory variable is whether the adolescent has a tv in the bedroom or not

possible lurking variables might be eating habits and the amount of exercise per week

“these results remain significant after adjustment for socioeconomic status” means that the researchers made an effort to avoid confounding by accounting for potential lurking variables

a television in the bedroom and obesity are associated because the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not. because this is an observational study, however, it cannot establish that having a tv in the bedroom causes the higher body mass index

which sampling method does not require a frame?

systematic

cluster sample

obtained by dividing the population into groups and selecting all individuals within a random sample of the groups

stratified sample

obtained by dividing the population into homogeneous groups (strata) and randomly selecting individuals from each group
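The two definitions above differ in what gets randomized: a cluster sample randomly picks whole groups and keeps everyone in them, while a stratified sample randomly picks individuals from inside every group. A minimal Python sketch, assuming a hypothetical population of five equally sized groups (the group and individual labels are made up for illustration):

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of the requested size from each stratum."""
    return {name: rng.sample(members, per_stratum[name])
            for name, members in strata.items()}

rng = random.Random(1)
# Hypothetical population: 5 groups ("group1".."group5") of 10 individuals each.
groups = {f"group{g}": [f"g{g}-p{i}" for i in range(10)] for g in range(1, 6)}

cs = cluster_sample(groups, 2, rng)                          # 2 whole groups
ss = stratified_sample(groups, {name: 3 for name in groups}, rng)  # 3 from each group
print(len(cs))  # 20
```

The same five groups are reused for both methods only so the contrast is easy to see; in practice clusters work best when each is heterogeneous, while strata are formed to be homogeneous.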

when taking a systematic random sample of size n, every group of size n from the population has the same chance of being selected

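false, because a systematic sample can only contain individuals spaced exactly k apart, so most groups of size n (for example, any group containing two adjacent individuals) can never be selected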

a simple random sample is always preferred because it obtains the same information as other sampling plans but requires a smaller sample size

false, because other sampling techniques may provide more information for less cost than a simple random sample

when conducting a cluster sample, it is better to have fewer clusters with more individuals when the clusters are heterogeneous

true, because when the clusters are heterogeneous, each cluster is a scaled-down version of the population

inferences based on voluntary response samples are generally not reliable

true, because it is often the case that the individuals who volunteer do not accurately represent the population

when obtaining a stratified sample, the number of individuals included within each stratum must be equal

false. the strata do not need equal sample sizes; the number of individuals sampled from each stratum is typically proportional to that stratum’s size in the population

to estimate the percentage of defects in a recent manufacturing batch, a quality control manager at IBM selects every 14th computer that comes off the assembly line starting with the fourth until she obtains a sample of 30 computers

systematic sampling

to determine customer opinion of its pricing, Greyhound Lines randomly selects 60 buses during a certain week and surveys all passengers on those buses

cluster sampling

a salesperson obtained a systematic sample of size 30 from a list of 600 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 12. he included in the sample the 12th client on the list and every 20th client thereafter. list the numbers that correspond to the 30 clients selected

12, 32, …, 592

the human resource department at a certain company wants to conduct a survey regarding worker benefits. the department has an alphabetical list of all 7358 employees at the company and wants to conduct a systematic sample of size 70.

k = 7358 ÷ 70 ≈ 105.1, which rounds down to k = 105

determine the individuals who will be administered the survey. randomly select a number from 1 to k. suppose that we randomly select 4. starting with the first individual selected, the individuals in the survey will be 4, 109, …, 7249
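The systematic-sampling steps in the examples above (compute k = N ÷ n rounded down, pick a random start from 1 to k, then take every kth individual) can be sketched in Python as follows. The function name is hypothetical, and the random start is passed in so the worked examples can be reproduced; in practice it would come from something like `random.randint(1, k)`.

```python
def systematic_sample(population_size, sample_size, start):
    """Return the 1-based list positions chosen by a systematic sample.

    k is population_size // sample_size (N divided by n, rounded down);
    `start` is assumed to be a randomly chosen number from 1 to k.
    """
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + i * k for i in range(sample_size)]

clients = systematic_sample(600, 30, 12)        # k = 20
print(clients[0], clients[1], clients[-1])      # 12 32 592

employees = systematic_sample(7358, 70, 4)      # k = 105
print(employees[0], employees[1], employees[-1])  # 4 109 7249
```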

what does it mean when a part of the population is under-represented?

a part of the population is under-represented when it is proportionally smaller in a sample than in its population

the owner of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 110 customers who come into the food court during weekday afternoons to determine what types of food the shoppers would like to see added to the food court

cause of bias: sampling bias

best way to remedy this problem: ask customers throughout the day on both weekdays and weekends

a pro-life advocate wants to estimate the percentage of people who favor closing abortion clinics. she conducts a nationwide survey of 1980 randomly selected adults 18 years and older. the interviewer asks the respondents, “do you favor protecting unborn children by closing abortion clinics?”

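response bias, because the wording of the question (“protecting unborn children”) leads respondents toward a particular answer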

a polling organization conducts a study to estimate the percentage of households that home-school their children. it mails a questionnaire to 1958 randomly selected households across the United States and asks the head of each household whether the household home-schools its children. of the 1958 households selected, 18 responded.

nonresponse bias

a polling organization conducts a study to estimate the percentage of households that speak a foreign language as the primary language. they mail a questionnaire to 1,023 randomly selected households and ask the head of each household whether a foreign language is the primary language spoken at home. of the 1,023 households selected, 12 responded. this survey has bias.

nonresponse bias

possible remedy: conduct face-to-face or telephone interviews

to determine the public’s opinion of the police department, the police chief obtains a cluster sample of 15 census tracts within his jurisdiction and samples all households in the randomly selected tracts. uniformed police officers go door to door to conduct the survey

response bias

possible remedy: have the survey conducted by interviewers who are not in police uniform

surveys tend to suffer from low response rates. based on past experience, a researcher determines that the typical response rate for an email survey is 40%. she wishes to obtain a sample of 400 respondents, so she emails the survey to 2000 randomly selected email addresses. assuming the response rate for her survey is 40%, will respondents form an unbiased sample?

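no. emailing more people raises the number of respondents but does not remove nonresponse bias: the 40% who choose to respond may differ systematically from the 60% who do not. the survey may also suffer from undercoverage (sampling bias) and potentially response bias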

what are some solutions to nonresponse?

offer rewards and incentives, attempt callbacks

what are the advantages of having a presurvey with open questions to assist in constructing a questionnaire that has closed questions?

the researcher learns the most common answers to each question, which can then be offered as the answer choices in the closed questions

experimental unit

a person, object, or some other well-defined item upon which a treatment is applied

treatment

any combination of the values of the factors (explanatory variables)

response variable

the quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable

factor

a variable whose effect on the response variable is to be assessed by the experimenter

placebo

an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication

confounding

the effects of two factors (explanatory variables) on the response variable cannot be distinguished

blocking

grouping together similar experimental units and then randomly assigning the experimental units within each group to a treatment

generally the goal of an experiment is to determine the effect that the treatment will have on the response variable

TRUE

a school psychologist wants to test the effectiveness of a new method of teaching statistics. she recruits 200 second-grade students and randomly divides them into two groups. group 1 is taught by means of the new method, while group 2 is taught via traditional methods. the same teacher is assigned to both groups. at the end of the year, an achievement test is administered and the results of the two groups are compared

response variable: the score on the achievement test

explanatory variable manipulated: method of teaching

2 levels of treatment

type of experimental design: completely randomized design

subjects: 200 students

researchers wanted to evaluate whether a certain herb improved memory in elderly adults as measured by objective tests. to do this, they recruited 98 men and 125 women older than 65 years and in good health. participants were randomly assigned to receive the herb, 45 mg 3 times a day, or a matching placebo. a measure of memory improvement was determined by a standardized test of learning and memory

type of experimental design: completely randomized design

population being studied: adults older than 65 years and in good health

response variable: score on standardized test of learning and memory

what is the factor? the herb

treatments: 45 mg 3 times a day or a matching placebo

experimental units: 98 men and 125 women older than 65 who are in good health that participated in the study

a marketing research firm wishes to determine the most effective method of promoting a rock band: print, radio, television, or online. the researcher segments volunteers by their ages. of the 490 volunteers, 140 are under 20 years old, 70 are 20-39 years old, 140 are 40-59 years old, and 140 are 60 years old or older. the volunteers from each group are randomly assigned to either the print advertising group, the radio group, the television group, or the online group. each group is exposed to the advertising. after 1 hour, a recall exam is given with the proportion of correct answers recorded.

randomized block design

response variable: the proportion of correct answers on the recall exam

explanatory variable manipulated: type of advertising

4 treatments
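In the randomized block design above, randomization happens separately inside each age block. A minimal Python sketch, using the block sizes from the example and hypothetical volunteer ID numbers; dealing shuffled volunteers to the four treatments in turn is one simple way (an assumption, not the firm's stated procedure) to keep group sizes within a block as equal as possible.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Randomly assign units to treatments separately within each block.

    Units in a block are shuffled and then dealt to the treatments in turn,
    so treatment group sizes within a block differ by at most one.
    """
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

treatments = ["print", "radio", "television", "online"]
# Block sizes come from the example; the volunteer ID numbers are hypothetical.
block_sizes = {"under 20": 140, "20-39": 70, "40-59": 140, "60 and older": 140}
blocks, next_id = {}, 1
for name, size in block_sizes.items():
    blocks[name] = list(range(next_id, next_id + size))
    next_id += size

assignment = randomized_block_assignment(blocks, treatments, random.Random(2024))
print(len(assignment))  # 490
```

Because 70 is not divisible by 4, the 20-39 block ends up with groups of 18, 18, 17, and 17; the other blocks split evenly into groups of 35.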

researchers wish to know if there is a link between hypertension (high blood pressure) and consumption of salt. past studies have indicated that the consumption of fruits and vegetables offsets the negative impact of salt consumption. it is also known that there is quite a bit of person-to-person variability as far as the ability of the body to process and eliminate salt. however, no method exists for identifying individuals who have a higher ability to process salt. it is recommended that daily intake of salt should not exceed 2300 milligrams (mg). the researchers want to keep the design simple, so they choose to conduct their study using a completely randomized design.

response variable: blood pressure

three factors that have been identified: daily consumption of fruits and vegetables, daily consumption of salt, body’s ability to process salt

blood pressure- not a factor

daily consumption of salt- can be controlled

daily consumption of fruits and vegetables- can be controlled

body’s ability to process salt- cannot be controlled

age- not a factor

gender- not a factor

if a factor cannot be controlled, what should be done to reduce variability in the response variable? randomly assign the experimental units to the treatment groups, so that the effect of the uncontrolled factor tends to be spread evenly across the groups

to determine customer opinion of its safety features, DaimlerChrysler randomly selects 120 service centers during a certain week and surveys all customers visiting those service centers

cluster

the manager of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 120 customers who come into the food court during weekend evenings to determine what types of food the shoppers would like to see added to the food court

cause of bias: sampling bias

best way to remedy this problem: ask customers throughout the day on both weekdays and weekends

a polling organization conducts a study to estimate the percentage of households that have two incomes. it mails a questionnaire to 1841 randomly selected households across the United States and asks the head of each household whether the household has two incomes. of the 1841 households selected, 42 responded.

nonresponse bias

a salesperson obtained a systematic sample of size 25 from a list of 500 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 13. he included in the sample the 13th client on the list and every 20th client thereafter. list the numbers that correspond to the 25 clients selected.

13, 33, …, 493

frequency distribution

lists the number of occurrences of each category of data

relative frequency distribution

lists the proportion of occurrences of each category of data

bar graph

a horizontal or vertical representation of the frequency or relative frequency of the categories. the height of each rectangle represents the category’s frequency or relative frequency

pareto chart

a bar graph whose bars are drawn in decreasing order of frequency or relative frequency
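The definitions above can be illustrated with the screen-type column from the television example earlier in the lesson: a frequency distribution counts each category, a relative frequency distribution divides each count by the number of observations, and a Pareto chart orders the categories by decreasing frequency. A minimal Python sketch that computes only the numbers behind the bars (no plotting):

```python
from collections import Counter

def frequency_tables(data):
    """Return (frequency, relative frequency) distributions for categorical data."""
    freq = Counter(data)
    n = len(data)
    rel = {category: count / n for category, count in freq.items()}
    return freq, rel

screen_types = ["plasma", "projection", "projection", "plasma", "projection"]
freq, rel = frequency_tables(screen_types)

# Pareto chart order: categories sorted by decreasing frequency
pareto_order = [category for category, _ in freq.most_common()]

print(dict(freq))    # {'plasma': 2, 'projection': 3}
print(rel)           # {'plasma': 0.4, 'projection': 0.6}
print(pareto_order)  # ['projection', 'plasma']
```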

classes

the categories by which data are grouped

stem-and-leaf plots are particularly useful for large sets of data

false. stem-and-leaf plots become unwieldy with large data sets; they are best suited to small ones

a histogram of a set of data indicates that the distribution of the data is skewed right. which measure of central tendency will likely be larger, the mean or the median? why?

the mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail
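A small illustration of the answer above, using a hypothetical right-skewed data set (the values are made up): the two large values in the right tail pull the mean well above the median, while the median stays with the bulk of the data.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, two are much larger.
values = [30, 32, 35, 36, 38, 40, 41, 45, 90, 150]
print(mean(values), median(values))  # 53.7 39.0
```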

a data set will always have exactly one mode

false. a data set may have no mode (no value occurs more than once), one mode, or more than one mode

for a large sporting event the broadcasters sold 51 ad slots for a total revenue of \$135 million. what was the mean price per ad slot?

\$135 million ÷ 51 ≈ \$2.65 million, or about \$2.6 million rounded to one decimal place

the median of the six ordered data values below is 29.5

7, 12, 21, 38, 41, 51

with an even number of observations, the median is the mean of the two middle values: (21 + 38) ÷ 2 = 29.5

an insurance company crashed four cars of the same model at 5 mph. the costs of repair for each of the four crashes were 411, 443, 468, and 232. compute the mean, median, and mode cost of repair.

mean- 388.5
median-427
mode does not exist
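The repair-cost answers above can be checked with Python's standard `statistics` module. The `modes` helper is a hypothetical function written for this sketch: it follows the convention used in this lesson that a data set has no mode when no value occurs more than once (this differs from `statistics.mode`, which would still return a value here). The last line repeats the ad-slot arithmetic from earlier.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every most-frequent value, or [] when no value occurs more than once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no repeated value: the data set has no mode
    return [value for value, count in counts.items() if count == top]

costs = [411, 443, 468, 232]
print(mean(costs), median(costs), modes(costs))  # 388.5 427.0 []

# Mean price per ad slot, in millions of dollars
print(round(135 / 51, 2))  # 2.65
```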

which measure of central tendency best describes the “center” of the distribution?

mean

the sum of the deviations about the mean always equals

zero
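A short derivation of why this always holds, where x̄ is the mean of the n observations x₁, …, xₙ (the same argument works for a population mean μ with N observations): subtracting x̄ from each of the n observations removes n·x̄ in total, which is exactly the sum of the observations.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
  = 0
```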

complete the paragraph

the standard deviation is used in conjunction with the mean to numerically describe distributions that are bell shaped. the mean measures the center of the distribution, while the standard deviation measures the spread of the distribution

when comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure

true, because the standard deviation describes how far, on average, each observation is from the typical value. a larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.

chebyshev’s inequality applies to all distributions regardless of shape, but the empirical rule holds only for distributions that are bell shaped

true, chebyshev’s inequality is less precise than the empirical rule, but will work for any distribution, while the empirical rule only works for bell-shaped distributions
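To see the difference in precision, compare Chebyshev's guarantee that at least 1 − 1/k² of any distribution lies within k standard deviations of the mean (for k > 1) with the empirical rule's approximate 95% and 99.7% for bell-shaped data. A minimal Python sketch; the function and constant names are hypothetical.

```python
def chebyshev_min_within(k):
    """Chebyshev: at least 1 - 1/k**2 of ANY distribution lies within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k**2

# Approximate proportions from the empirical rule; valid only for bell-shaped data.
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(k, round(chebyshev_min_within(k), 4), EMPIRICAL[k])
# 2 0.75 0.95
# 3 0.8889 0.997
```

Chebyshev's lower bounds (75%, about 88.9%) are always below the empirical-rule figures, which is the price of holding for every distribution shape.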

find the sample variance and standard deviation: 23, 13, 6, 10, 9

s² = 42.7
s ≈ 6.5

find the population variance and standard deviation: 8, 11, 15, 17, 19

population variance: 16
standard deviation: 4

compute the range and sample standard deviation for strength of the concrete (in psi): 3970, 4140, 3400, 3200, 2910, 3840, 4140, 4040

the range is 1230 psi

s=472 psi
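The three dispersion answers above can be checked with Python's `statistics` module, where `variance`/`stdev` divide by n − 1 (sample) and `pvariance`/`pstdev` divide by N (population). This is a checking sketch, not part of the original exercises.

```python
from statistics import pstdev, pvariance, stdev, variance

sample = [23, 13, 6, 10, 9]
print(round(variance(sample), 1), round(stdev(sample), 1))   # 42.7 6.5

population = [8, 11, 15, 17, 19]
print(float(pvariance(population)), pstdev(population))      # 16.0 4.0

strength = [3970, 4140, 3400, 3200, 2910, 3840, 4140, 4040]  # psi
print(max(strength) - min(strength), round(stdev(strength))) # 1230 472
```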

the weight of an organ in adult males has a bell-shaped distribution with a mean of 300 grams and a standard deviation of 35 grams. use the empirical rule to determine the following
(a) about 95% of organs will be between what weights?
(b) what percentage of organs weighs between 265 grams and 335 grams?
(c) what percentage of organs weighs less than 265 grams or more than 335 grams?
(d) what percentage of organs weighs between 195 grams and 370 grams?

(a) 230 and 370 grams
(b) 68%
(c) 32%
(d) 97.35%
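The reasoning behind these answers: 230 and 370 are 2 standard deviations from the mean, 265 and 335 are 1 standard deviation away, and 195 is 3 standard deviations below the mean while 370 is 2 above. For (d), half of 99.7% (49.85%) lies between −3 and the mean, and half of 95% (47.5%) lies between the mean and +2, giving 97.35%. A minimal Python sketch of that bookkeeping; the function and constant names are hypothetical.

```python
def empirical_interval(mu, sigma, k):
    """Interval mu ± k*sigma."""
    return mu - k * sigma, mu + k * sigma

# Percent of a bell-shaped distribution in each one-standard-deviation band on ONE side
# of the mean, obtained by halving the empirical rule's 68%, 95%, and 99.7% figures.
BAND = {1: 68 / 2, 2: (95 - 68) / 2, 3: (99.7 - 95) / 2}

def pct_between(k_below, k_above):
    """Approximate % between mu - k_below*sigma and mu + k_above*sigma (k = 0..3)."""
    below = sum(BAND[j] for j in range(1, k_below + 1))
    above = sum(BAND[j] for j in range(1, k_above + 1))
    return below + above

mu, sigma = 300, 35
print(empirical_interval(mu, sigma, 2))   # (230, 370)
print(round(pct_between(1, 1), 2))        # 68.0
print(round(100 - pct_between(1, 1), 2))  # 32.0
print(round(pct_between(3, 2), 2))        # 97.35
```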

what makes the range less desirable than the standard deviation as a measure of dispersion?

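the range uses only the two most extreme observations (the largest and the smallest), so it ignores the rest of the data and is strongly affected by outliers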
