### Statistics Individual Data Distribution

This lesson covers basic statistics vocabulary: individuals, populations and samples, observational studies and experiments, sampling methods, and describing the distribution of data.

define statistics

**statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. in addition, statistics is about providing a measure of confidence in any conclusions**

individual

**a person or object that is a member of the population being studied**

descriptive statistics

**consists of organizing and summarizing information collected**

inferential statistics

**uses methods that generalize results obtained from a sample to the population and measure the reliability of the results**

statistic

**a numerical summary of a sample**

parameter

**a numerical summary of a population**

variables

**the characteristics of the individuals of the population being studied**

a sample of seniors is selected and it is found that 45% own a television

**this is a statistic because the value is a numerical measurement describing a characteristic of a sample**

the average annual salary of 50 of a company’s 800 employees is $54,000

**this is a statistic, because the data set of salaries of 50 employees is a sample**

nation of origin

**the variable is qualitative because it is an attribute characteristic**

medal won in race

**the variable is qualitative because it is an attribute characteristic**

area of a park

**the variable is continuous because it is not countable**

height of an office building

**the variable is continuous because it is not countable**

a polling organization contacts 2526 undergraduates who attend a university and live in the United States and asks whether or not they had spent more than $200 on food in the last month

**population: undergraduates who attend a university and live in the United States**

**sample: the 2526 undergraduates who attend a university and live in the United States**

| setup | size | screen type | number of channels available |
|---|---|---|---|
| a | 48 | plasma | 299 |
| b | 40 | projection | 111 |
| c | 59 | projection | 425 |
| d | 41 | plasma | 270 |
| e | 43 | projection | 290 |

**individuals being studied: high-definition televisions A through E**

**variables and their corresponding data being studied: size (48, 40, 59, 41, 43), screen type (plasma, projection, projection, plasma, projection), and number of channels available (299, 111, 425, 270, 290)**

a study conducted by researchers was designed “to determine if the application of duct tape is as effective as cryotherapy in the treatment of common warts.” the researchers randomly divided 50 patients into two groups. the 25 patients in group 1 had their warts treated by applying duct tape. the 25 patients in group 2 had their warts treated by cryotherapy. once the treatments were complete, it was determined that 66% of the patients in group 1 & 86% of the patients in group 2 had complete resolution of their warts. the researchers concluded that cryotherapy is significantly more effective in treating warts than duct tape.

**research objective: to determine if duct tape is as effective as cryotherapy in treating warts**

**sample: the 50 patients with warts**

observational study

**measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. that is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study**

designed experiment

**a study in which a researcher assigns the individuals to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group**

confounding

**occurs when the effects of two or more explanatory variables are not separated. therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study**

lurking variable

**an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. in addition, lurking variables are typically related to explanatory variables considered in the study**

three major categories of observational studies

**1. cross-sectional studies: collect information about individuals at a specific point in time or over a very short period of time**

**2. case-control studies: retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records**

**3. cohort studies: first identify a group of individuals to participate in the study (the cohort), then observe them over a long period of time**

census

**a list of all individuals in a population along with certain characteristics of each individual**

a study is conducted to determine if there is a relationship between Parkinson’s disease and childhood head trauma. doctors look at the hospital records of patients with Parkinson’s disease for any childhood head trauma

**the study is an observational study because the study examines individuals in a sample, but does not try to influence the response variable**

while shopping, 350 people are asked to perform a taste test in which they drink two randomly placed, unmarked coffees. they are then asked which coffee they prefer

**the study is an experiment because the researchers control what each person drinks (the two unmarked coffees, placed in random order) and then record the response, which is the preferred coffee**

researchers wanted to determine if having a tv in the bedroom is associated with obesity. the researchers administered a questionnaire to 380 twelve-year-old adolescents. after analyzing the results, researchers determined that the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not have a tv in their bedroom

**this is an observational study because the researchers observe the behavior of the individuals in the study without trying to influence an explanatory variable of the study**

**cross-sectional study**

**the response variable is the body mass index of the adolescents**

**the explanatory variable is whether the adolescent has a tv in the bedroom or not**

**possible lurking variables might be eating habits and the amount of exercise per week**

**“these results remain significant after adjustment for socioeconomic status” means that the researchers made an effort to avoid confounding by accounting for potential lurking variables**

**a television in the bedroom and obesity are associated because the body mass index of the adolescents who had a tv in their bedroom was significantly higher than that of the adolescents who did not have a tv in their bedroom**

which sampling method does not require a frame?

**systematic**

cluster sample

**obtained by dividing the population into groups and selecting all individuals within a random sample of the groups**

stratified sample

**obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group**
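The difference between the two definitions above can be sketched in a few lines. This is only an illustration: the population of four groups, the group names, and the helper functions `stratified_sample` and `cluster_sample` are all hypothetical.

```python
import random


def stratified_sample(strata, per_stratum, rng):
    """Randomly select `per_stratum` individuals from EVERY group (stratum)."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole groups (clusters) and keep ALL individuals in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 4 groups of 10 labelled individuals each.
groups = {f"group_{g}": [f"{g}{i}" for i in range(10)] for g in "ABCD"}
rng = random.Random(0)  # fixed seed only so the sketch is repeatable

strat = stratified_sample(groups, per_stratum=3, rng=rng)
clust = cluster_sample(groups, n_clusters=2, rng=rng)

print({name: len(members) for name, members in strat.items()})
print({name: len(members) for name, members in clust.items()})
```

The stratified sample touches every group but only some individuals in each; the cluster sample touches only some groups but every individual in them.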

when taking a systematic random sample of size n, every group of size n from the population has the same chance of being selected

**false, because certain groups would never be selected**

a simple random sample is always preferred because it obtains the same information as other sampling plans but requires a smaller sample size

**false, because other sampling techniques may provide more information for less cost than a simple random sample**

when conducting a cluster sample, it is better to have fewer clusters with more individuals when the clusters are heterogeneous

**true, because when the clusters are heterogeneous, they are scaled down versions of the population**

inferences based on voluntary response samples are generally not reliable

**true, because it is often the case that the individuals who volunteer do not accurately represent the population**

when obtaining a stratified sample, the number of individuals included within each stratum must be equal

**false. within stratified samples, the number of individuals sampled from each stratum should be proportional to the size of the strata in the population**

to estimate the percentage of defects in a recent manufacturing batch, a quality control manager at IBM selects every 14th computer that comes off the assembly line starting with the fourth until she obtains a sample of 30 computers

**systematic sampling**

to determine customer opinion of their pricing, greyhound lines randomly selects 60 busses during a certain week and surveys all passengers on the busses

**cluster sampling**

a salesperson obtained a systematic sample of size 30 from a list of 600 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 12. he included in the sample the 12th client on the list and every 20th client thereafter. list the numbers that correspond to the 30 clients selected

**12, 32, …, 592**

the human resource department at a certain company wants to conduct a survey regarding worker benefits. the department has an alphabetical list of all 7358 employees at the company and wants to conduct a systematic sample of size 70.

**k = 105**

**determine the individuals who will be administered the survey: randomly select a number from 1 to k. suppose that we randomly select 4. starting with the first individual selected, the individuals in the survey will be 4, 109, …, 7249**
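The two systematic-sampling answers above follow the same arithmetic: k is the population size divided by the sample size, rounded down, and the sample is the starting number plus multiples of k. A minimal sketch, assuming the starting number has already been drawn at random; the function name `systematic_sample` is an illustration, not from the lesson:

```python
def systematic_sample(population_size, sample_size, start):
    """Return the positions chosen by a systematic sample.

    k is the population size divided by the sample size, rounded down;
    `start` is the randomly chosen number between 1 and k.
    """
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return k, [start + k * i for i in range(sample_size)]


# Salesperson example: 600 clients, sample of 30, random start 12.
k_clients, clients = systematic_sample(600, 30, 12)
print(k_clients, clients[:2], clients[-1])

# Human resources example: 7358 employees, sample of 70, random start 4.
k_staff, staff = systematic_sample(7358, 70, 4)
print(k_staff, staff[:2], staff[-1])
```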

what does it mean when a part of the population is under-represented?

**a part of the population is under-represented when it is proportionally smaller in a sample than in its population**

the owner of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 110 customers who come into the food court during weekday afternoons to determine what types of food the shoppers would like to see added to the food court

**cause of bias: sampling bias**

**best way to remedy this problem: ask customers throughout the day on both weekdays and weekends**

a pro-life advocate wants to estimate the percentage of people who favor closing abortion clinics. she conducts a nationwide survey of 1980 randomly selected adults 18 years and older. the interviewer asks the respondents, “do you favor protecting unborn children by closing abortion clinics?”

**response bias**

a polling organization conducts a study to estimate the percentage of households that home school their children. it mails a questionnaire to 1958 randomly selected households across the United States and asks the head of each household if he or she home schools his or her children. of the 1958 households selected, 18 responded.

**nonresponse bias**

a polling organization conducts a study to estimate the percentage of households that speak a foreign language as the primary language. it mails a questionnaire to 1,023 randomly selected households and asks the head of household if a foreign language is the primary language spoken at home. of the 1,023 households selected, 12 responded. this survey has bias.

**nonresponse bias**

**possible remedy: conduct face-to-face or telephone interviews**

to determine the public’s opinion of the police department, the police chief obtains a cluster sample of 15 census tracts within his jurisdiction and samples all households in the randomly selected tracts. uniformed police officers go door to door to conduct the survey

**response bias**

**possible remedy: have interviewers who are not in police uniform conduct the survey**

surveys tend to suffer from low response rates. based on past experience, a researcher determines that the typical response rate for an email survey is 40%. she wishes to obtain a sample of 400 respondents, so she emails the survey to 2000 randomly selected email addresses. assuming the response rate for her survey is 40%, will respondents form an unbiased sample?

**no. the survey still suffers from undercoverage (sampling bias), nonresponse bias, and potentially response bias**

what are some solutions to nonresponse?

**offer rewards and incentives, attempt callbacks**

what are the advantages of having a presurvey with open questions to assist in constructing a questionnaire that has closed questions?

**the researcher can learn common answers**

experimental unit

**a person, object, or some other well-defined item upon which a treatment is applied**

treatment

**any combination of the values of the factors (explanatory variables)**

response variable

**the quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable**

factor

**a variable whose effect on the response variable is to be assessed by the experimenter**

placebo

**an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication**

confounding

**the effects of two factors (explanatory variables) on the response variable cannot be distinguished**

blocking

**grouping together similar experimental units and then randomly assigning the experimental units within each group to a treatment**

generally the goal of an experiment is to determine the effect that the treatment will have on the response variable

**TRUE**

a school psychologist wants to test the effectiveness of a new method of teaching statistics. she recruits 200 second-grade students and randomly divides them into two groups. group 1 is taught by means of the new method, while group 2 is taught via traditional methods. the same teacher is assigned to both groups. at the end of the year, an achievement test is administered and the results of the two groups compared

**response variable: the score on the achievement test**

**explanatory variable manipulated: method of teaching**

**2 levels of treatment**

**type of experimental design: completely randomized design**

**subjects: 200 students**
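A completely randomized design assigns every subject to a treatment purely by chance. The sketch below is one way to do that for the 200 students; the student ID numbers, the seed, and the helper name `completely_randomized` are assumptions made for illustration, and it assumes the subjects divide evenly among the treatments.

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle all units, then cut the shuffled list into equal treatment groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {treatment: shuffled[i * size:(i + 1) * size]
            for i, treatment in enumerate(treatments)}


students = list(range(1, 201))  # hypothetical ID numbers for the 200 students
design = completely_randomized(students,
                               ["new method", "traditional method"],
                               random.Random(1))  # seed only for repeatability

print({treatment: len(group) for treatment, group in design.items()})
```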

researchers wanted to evaluate whether a certain herb improved memory in elderly adults as measured by objective tests. to do this, they recruited 98 men and 125 women older than 65 years and in good health. participants were randomly assigned to receive the herb, 45 mg 3 times a day, or a matching placebo. a measure of memory improvement was determined by a standardized test of learning and memory

**type of experimental design: completely randomized design**

**population being studied: adults older than 65 years and in good health**

**response variable: score on standardized test of learning and memory**

**what is the factor? the herb**

**treatments: 45 mg 3 times a day or a matching placebo**

**experimental units: the 98 men and 125 women older than 65 and in good health who participated in the study**

a marketing research firm wishes to determine the most effective method of promoting a rock band: print, radio, television, or online. the researcher segments volunteers by their ages. of the 490 volunteers, 140 are under 20 years old, 70 are 20-39 years old, 140 are 40-59 years old, and 140 are 60 years old or older. the volunteers from each group are randomly assigned to either the print advertising group, the radio group, the television group, or the online group. each group is exposed to the advertising. after 1 hour, a recall exam is given with the proportion of correct answers recorded.

**randomized block design**

**response variable: the scores on the recall exam**

**explanatory variable manipulated: type of advertising**

**4 treatments**
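In a randomized block design the random assignment happens separately inside each block. Here is a rough sketch using the four age groups and four advertising methods from the question; the volunteer numbering, the seed, and the round-robin dealing (used because 70 does not divide evenly by 4) are assumptions, not details from the study.

```python
import random

TREATMENTS = ["print", "radio", "television", "online"]
BLOCK_SIZES = {"under 20": 140, "20-39": 70, "40-59": 140, "60 or older": 140}


def randomized_block(block_sizes, treatments, rng):
    """Within each block, shuffle the volunteers and deal them out to treatments."""
    plan = {}
    for block, size in block_sizes.items():
        volunteers = list(range(size))  # hypothetical volunteer numbers
        rng.shuffle(volunteers)
        plan[block] = {treatment: [] for treatment in treatments}
        for position, volunteer in enumerate(volunteers):
            treatment = treatments[position % len(treatments)]
            plan[block][treatment].append(volunteer)
    return plan


plan = randomized_block(BLOCK_SIZES, TREATMENTS, random.Random(2))

for block, groups in plan.items():
    print(block, {treatment: len(members) for treatment, members in groups.items()})
```

Because age is handled by the blocks, any difference in recall between advertising methods is compared among volunteers of similar age.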

researchers wish to know if there is a link between hypertension (high blood pressure) and consumption of salt. past studies have indicated that the consumption of fruits and vegetables offsets the negative impact of salt consumption. it is also known that there is quite a bit of person-to-person variability as far as the ability of the body to process and eliminate salt. however, no method exists for identifying individuals who have a higher ability to process salt. it is recommended that daily intake of salt should not exceed 2300 milligrams (mg). the researchers want to keep the design simple, so they choose to conduct their study using a completely randomized design.

**response variable: blood pressure**

**three factors that have been identified: daily consumption of fruits and vegetables, daily consumption of salt, body’s ability to process salt**

**blood pressure: not a factor**

**daily consumption of salt: can be controlled**

**daily consumption of fruits and vegetables: can be controlled**

**body’s ability to process salt: cannot be controlled**

**age: not a factor**

**gender: not a factor**

**if a factor cannot be controlled, what should be done to reduce variability in the response variable? experimental units should be randomly assigned to each treatment group**

to determine customer opinion of their safety features, daimler-chrysler randomly selects 120 service centers during a certain week and surveys all customers visiting the service centers

**cluster**

the manager of a shopping mall wishes to expand the number of shops available in the food court. he has a market researcher survey the first 120 customers who come into the food court during weekend evenings to determine what types of food the shoppers would like to see added to the food court

**cause of bias: sampling bias**

**best way to remedy this problem: ask customers throughout the day on both weekdays and weekends**

a polling organization conducts a study to estimate the percentage of households that have two incomes. it mails a questionnaire to 1841 randomly selected households across the United States and asks the head of each household if the household has two incomes. of the 1841 households selected, 42 responded.

**nonresponse bias**

a salesperson obtained a systematic sample of size 25 from a list of 500 clients. to do so, he randomly selected a number from 1 to 20, obtaining the number 13. he included in the sample the 13th client on the list and every 20th client thereafter. list the numbers that correspond to the 25 clients selected.

**13, 33, …, 493**

frequency distribution

**lists the number of occurrences of each category of data**

relative frequency distribution

**lists the proportion of occurrences of each category of data**
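Both distributions can be built by counting. The sketch below reuses the screen-type data from the television table earlier in this lesson; the variable names are illustrative only.

```python
from collections import Counter

# Screen types of televisions A through E, from the earlier example.
screen_types = ["plasma", "projection", "projection", "plasma", "projection"]

# Frequency distribution: number of occurrences of each category.
frequency = Counter(screen_types)

# Relative frequency distribution: proportion of occurrences of each category.
total = sum(frequency.values())
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

# Categories in decreasing order of frequency, as a Pareto chart would draw them.
pareto_order = [category for category, _ in frequency.most_common()]

print(dict(frequency))
print(relative_frequency)
print(pareto_order)
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.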

bar graph

**a horizontal or vertical representation of the frequency or relative frequency of the categories. the height of each rectangle represents the category’s frequency or relative frequency**

pareto chart

**a bar graph whose bars are drawn in decreasing order of frequency or relative frequency**

classes

**the categories by which data are grouped**

stem-and-leaf plots are particularly useful for large sets of data

**false**

a histogram of a set of data indicates that the distribution of the data is skewed right. which measure of central tendency will likely be larger, the mean or the median? why?

**the mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail**
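A small made-up data set shows the pull of the right tail. The numbers below are hypothetical and chosen only so that two large values sit far to the right of the rest.

```python
import statistics

# Hypothetical right-skewed data: most values are small, two are far to the right.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20, 40]

mean = statistics.mean(data)
median = statistics.median(data)

print(f"mean = {mean:.2f}, median = {median}")
print("mean is larger than median:", mean > median)
```

Removing the values 20 and 40 would lower the mean a great deal but leave the median unchanged at 3, which is why the median is called resistant.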

a data set will always have exactly one mode

**false**

for a large sporting event the broadcasters sold 51 ad slots for a total revenue of $135 million. what was the mean price per ad slot?

**about $2.6 million ($135 million ÷ 51)**

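the median for the given set of six ordered data values is 29.5. find the missing value: 7, 12, 21, ?, 41, 51

**38, because the median of six ordered values is the mean of the third and fourth values, and (21 + 38) / 2 = 29.5**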

an insurance company crashed four cars of the same model at 5 mph. the costs of repair for each of the four crashes were 411, 443, 468, and 232. compute the mean, median, and mode cost of repair.

**mean: 388.5**

**median: 427**

**mode does not exist**
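These three answers can be checked with Python's standard `statistics` module. The sketch assumes the usual introductory convention that a data set has no mode when no value repeats.

```python
import statistics
from collections import Counter

costs = [411, 443, 468, 232]  # repair costs from the question

mean_cost = statistics.mean(costs)      # sum of the costs divided by 4
median_cost = statistics.median(costs)  # mean of the two middle values after sorting

# Convention assumed here: no mode exists when no value occurs more than once.
counts = Counter(costs)
highest_count = max(counts.values())
modes = [value for value, count in counts.items()
         if count == highest_count and highest_count > 1]

print(mean_cost, median_cost, modes if modes else "no mode")
```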

which measure of central tendency best describes the “center” of the distribution?

**mean**

the sum of the deviations about the mean always equals

**zero**
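This fact is easy to confirm numerically. The sketch below uses exact fractions so that rounding cannot hide a nonzero remainder; the five data values are borrowed from the sample variance question later in this lesson.

```python
from fractions import Fraction

data = [23, 13, 6, 10, 9]

# Exact mean as a fraction, so no floating-point rounding is involved.
mean = Fraction(sum(data), len(data))

# Deviation about the mean for every observation.
deviations = [x - mean for x in data]
total_deviation = sum(deviations)

print([float(d) for d in deviations])
print(total_deviation)
```

The positive and negative deviations cancel exactly, which is why variance squares the deviations before averaging them.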

complete the paragraph

**the standard deviation is used in conjunction with the mean to numerically describe distributions that are bell shaped. the mean measures the center of the distribution, while the standard deviation measures the spread of the distribution**

when comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure

**true, because the standard deviation describes how far, on average, each observation is from the typical value. a larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.**

chebyshev’s inequality applies to all distributions regardless of shape, but the empirical rule holds only for distributions that are bell shaped

**true, chebyshev’s inequality is less precise than the empirical rule, but will work for any distribution, while the empirical rule only works for bell-shaped distributions**
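To see what "less precise" means, the sketch below compares Chebyshev's guaranteed minimum, 1 − 1/k², with the empirical rule's approximate percentages for k = 2 and k = 3 standard deviations. The function name `chebyshev_lower_bound` is an illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean.

    Chebyshev's inequality holds for any distribution, but only for k > 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's inequality requires k > 1")
    return 1 - 1 / k ** 2


# Approximate proportions from the empirical rule (bell-shaped distributions only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} (Chebyshev), "
          f"about {EMPIRICAL_RULE[k]:.1%} (empirical rule)")
```

Chebyshev gives a floor that every distribution must meet, so it is lower than the bell-shaped figure in both cases.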

find the sample variance and standard deviation: 23, 13, 6, 10, 9

**s² = 42.7**

**s ≈ 6.5**

find the population variance and standard deviation: 8, 11, 15, 17, 19

**population variance: 16**

**standard deviation: 4**
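The sample and population answers differ only in the divisor: a sample variance divides the sum of squared deviations by n − 1, a population variance divides by N. A short check with Python's standard `statistics` module, using the data from the two questions above:

```python
import statistics

sample = [23, 13, 6, 10, 9]
population = [8, 11, 15, 17, 19]

# Sample statistics: squared deviations are divided by n - 1.
sample_variance = statistics.variance(sample)
sample_sd = statistics.stdev(sample)

# Population parameters: squared deviations are divided by N.
population_variance = statistics.pvariance(population)
population_sd = statistics.pstdev(population)

print(f"s^2 = {sample_variance:.1f}, s = {sample_sd:.1f}")
print(f"population variance = {population_variance:.0f}, "
      f"population standard deviation = {population_sd:.0f}")
```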

compute the range and sample standard deviation for strength of the concrete (in psi): 3970, 4140, 3400, 3200, 2910, 3840, 4140, 4040

**the range is 1230 psi**

**s ≈ 472 psi**

the weight of an organ in adult males has a bell-shaped distribution with a mean of 300 grams and a standard deviation of 35 grams. use the empirical rule to determine the following

(a) about 95% of organs will be between what weights?

(b) what percentage of organs weighs between 265 grams and 335 grams?

(c) what percentage of organs weighs less than 265 grams or more than 335 grams?

(d) what percentage of organs weighs between 195 grams and 370 grams?

**(a) between 230 and 370 grams**

**(b) 68%**

**(c) 32%**

**(d) 97.35%**
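Each part comes from counting standard deviations away from the mean of 300 grams. The sketch below spells out that arithmetic, assuming the usual empirical rule percentages of 68%, 95%, and 99.7%; the variable names are illustrative.

```python
MEAN, SD = 300, 35  # grams, from the question

# Empirical rule: approximate percent of observations within k standard deviations.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(k):
    """Weights that lie k standard deviations below and above the mean."""
    return MEAN - k * SD, MEAN + k * SD


part_a = interval(2)          # about 95% of organs fall in this interval
part_b = WITHIN[1]            # 265 to 335 grams is mean ± 1 standard deviation
part_c = 100 - WITHIN[1]      # everything outside mean ± 1 standard deviation

# 195 grams is 3 standard deviations below the mean and 370 grams is 2 above,
# so take half of the 99.7% band plus half of the 95% band.
part_d = WITHIN[3] / 2 + WITHIN[2] / 2

print(part_a, part_b, part_c, round(part_d, 2))
```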

what makes the range less desirable than the standard deviation as a measure of dispersion?

**the range does not use all the observations**