### Statistics Random Data

This chapter covers statistics, random data, and numerical data.

**statistics**

a branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data

**descriptive statistics**

organizes, presents, and summarizes data graphically, numerically, or pictorially. convenient and informative; simplifies comparisons

**inferential statistics**

estimates, predicts, decides, and draws conclusions or inferences about a population. uses smaller amounts of data (a sample)

**population**

the group (individuals, items, measurements) of interest, which is usually not easy to access directly

**parameter**

describes some characteristic of the population. parameters are usually unknown

**sample**

a part of the population that we actually examine and for which we collect data. it is a smaller (sometimes more accessible) group

**statistic**

a number that describes a characteristic of a sample. often, a sample statistic is used to estimate an unknown parameter

**statistical inference**

the process of making an estimate, prediction, or decision about a population based on a sample
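The relationship between population, parameter, sample, and statistic can be sketched in a few lines of Python. This is only an illustration: the population below is made-up data generated with a fixed seed, and the names `population`, `mu`, `sample`, and `x_bar` are chosen for this example.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up measurements (not real data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: describes the population. In practice it is usually unknown.
mu = statistics.mean(population)

# Sample: the part of the population we actually examine.
sample = rng.sample(population, 50)

# Statistic: describes the sample and serves as an estimate of the parameter.
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x-bar = {x_bar:.2f}")
```

Using x-bar to say something about mu is a small instance of statistical inference. The two numbers will be close but generally not equal.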

**confidence levels**

the proportion of times that an estimating procedure will be correct in the long run (for example, 95%)

**significance levels**

a numerical measure of how often a conclusion about the population will be wrong in the long run (for example, 5%)

**variable**

a characteristic of a population or sample of interest

three types of data: interval, nominal, ordinal

**interval**

real numbers, such as heights, weights, and prices. also called quantitative or numerical. displayed with histograms

**nominal**

qualitative or categorical. the values of nominal data are categories; any numbers assigned to them are arbitrary codes. ex: single = 1, married = 2. displayed with pie charts and bar graphs

**ordinal**

values have an order, a ranking, to them. the order is maintained no matter what numeric values are assigned

**bar chart**

used to display frequencies

each bar represents a category; the height of the bar represents the frequency

the width of the base of the bar is arbitrary

**pie chart**

shows relative frequencies. the slices of the pie represent categories, each sized in proportion to its relative frequency
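As a rough sketch of the numbers behind these two charts: a bar chart plots the frequencies and a pie chart plots the relative frequencies. The `statuses` list is invented for illustration.

```python
from collections import Counter

# Hypothetical nominal data: marital status of ten people.
statuses = ["single", "married", "married", "single", "divorced",
            "married", "single", "married", "widowed", "married"]

# Frequency of each category: the bar heights in a bar chart.
frequencies = Counter(statuses)

# Relative frequency of each category: the slice sizes in a pie chart.
n = len(statuses)
relative = {cat: count / n for cat, count in frequencies.items()}

for cat, count in frequencies.most_common():
    print(f"{cat:<10}{count:>3}{relative[cat]:>8.0%}")
```

The relative frequencies always sum to 1, which is why they can be drawn as slices of a single pie.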

**histogram, stem-and-leaf display, and ogive**

are used when the data are interval
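A minimal sketch of what a histogram and an ogive summarize, assuming made-up ages and a class width of 10 (both are choices for this example only). The histogram uses the class frequencies; the ogive uses the cumulative relative frequencies.

```python
# Hypothetical interval data: ages of 12 customers.
ages = [23, 27, 31, 35, 36, 41, 44, 45, 52, 58, 63, 67]

class_width = 10
lower_limits = range(20, 70, class_width)  # classes 20-29, 30-39, ..., 60-69

# Histogram: the frequency of each class.
counts = [sum(1 for a in ages if lo <= a < lo + class_width)
          for lo in lower_limits]

# Ogive: cumulative relative frequency at the end of each class.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running / len(ages))

# Text version of the histogram, one row per class.
for lo, c, cum in zip(lower_limits, counts, cumulative):
    print(f"{lo}-{lo + class_width - 1}: {'*' * c:<4} cumulative {cum:.2f}")
```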

**contingency table**

used to describe the relationship between two nominal variables. lists the frequency of each combination of the values of the two variables. the data can then be summarized graphically in a bar chart
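A contingency table can be built by counting each combination of values. The sketch below uses invented observations of two nominal variables (marital status and housing); the variable names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical observations: (marital status, housing) for eight people.
observations = [
    ("single", "rent"), ("single", "rent"), ("single", "own"),
    ("married", "own"), ("married", "own"), ("married", "rent"),
    ("married", "own"), ("single", "rent"),
]

# Frequency of each combination of the two variables.
table = Counter(observations)

rows = sorted({status for status, _ in observations})
cols = sorted({housing for _, housing in observations})

# Print one row per value of the first variable.
print(f"{'':<10}" + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{table[(r, c)]:>6}" for c in cols))
```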

**are two interval variables related?**

use a scatter diagram (scatterplot)

**independent variable**

the predictor or explanatory variable: the one thought to influence the other. is labeled x and plotted on the horizontal axis

**dependent variable**

the outcome or response. is labeled y and plotted on the vertical axis

**one nominal and one interval**

a bar chart is an effective way to summarize this combination

**time series plot**

observations measured at successive points in time, graphed on a line chart. time periods go on the horizontal axis

**mean**

the average: the sum of the observations divided by the number of observations. appropriate for describing interval data

**x-bar**

mean for a sample

**mu**

mean for a population

**median**

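the middle observation once the data are placed in order. appropriate for interval or ordinal data. best for data with extreme values. computed the same way for a population and a sample. the median sits at position (n+1)/2 in the ordered data; when n is even, average the two middle observations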

**mode**

the observation that occurs most frequently. useful for all data types, mainly for identifying the category with the highest frequency in nominal data
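To see how the three measures behave, here is a short sketch using Python's standard `statistics` module on made-up salaries that include one extreme value. The data are hypothetical.

```python
import statistics

# Hypothetical salaries in thousands, with one extreme value (250).
salaries = [42, 45, 45, 48, 51, 53, 250]

mean = statistics.mean(salaries)      # pulled upward by the extreme value
median = statistics.median(salaries)  # middle observation of the ordered data
mode = statistics.mode(salaries)      # most frequent observation

# Position of the median in the ordered data: (n + 1) / 2.
n = len(salaries)
position = (n + 1) / 2
by_position = sorted(salaries)[int(position) - 1]

print(f"mean   = {mean:.1f}")
print(f"median = {median} (position {position:g} of {n})")
print(f"mode   = {mode}")
```

The extreme value drags the mean well above most of the observations, while the median stays in the middle of the data.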

**geometric mean**

used when the variable is a growth rate or rate of change
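A small sketch of why growth rates call for the geometric mean, using a made-up two-year investment. It assumes Python 3.8 or later, where `statistics.geometric_mean` is available; the rates and the starting value of 1000 are hypothetical.

```python
import statistics

# Hypothetical returns: +100% in year 1, then -50% in year 2.
rates = [1.00, -0.50]

# Growth factors (1 + rate) are what actually compound.
factors = [1 + r for r in rates]

# The arithmetic mean of the rates suggests 25% growth per year.
arithmetic = statistics.mean(rates)

# The geometric mean of the factors, minus 1, is the average compound rate.
geometric = statistics.geometric_mean(factors) - 1

# Check against the money: 1000 doubles, then halves.
final_value = 1000 * factors[0] * factors[1]

print(f"arithmetic mean rate = {arithmetic:.2%}")
print(f"geometric mean rate  = {geometric:.2%}")
print(f"final value          = {final_value:.2f}")
```

The investment ends where it started, so the average rate of growth is zero, which is what the geometric mean reports.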

**range**

largest observation minus smallest observation

**variance**

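population: sigma squared, the average squared deviation from mu (divide by N)

sample: s^2, the sum of squared deviations from x-bar divided by n - 1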

**standard deviation**

the square root of the variance (sigma for a population, s for a sample)
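The measures of variability above can be computed with Python's standard `statistics` module. This is a sketch on made-up data; `pvariance` and `pstdev` treat the data as a whole population, while `variance` and `stdev` treat it as a sample.

```python
import statistics

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Population variance: sum of squared deviations from the mean, divided by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample variance: the same sum of squares, divided by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(f"range               = {data_range}")
print(f"population variance = {pop_var}, standard deviation = {pop_sd}")
print(f"sample variance     = {samp_var:.3f}, standard deviation = {samp_sd:.3f}")
```

The sample variance comes out larger than the population variance on the same numbers because it divides by n - 1 instead of N.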

**Cross-sectional data**

data collected at a single point in time. ex: starting salaries of MBA students

**longitudinal data**

data collected over a period of time. ex: weekly starting prices of gold

**prospective**

collected from the current point forward into the future

**retrospective or historical**

collected on events that have happened in the past.

**sampling**

the process of selecting a subset of a whole population. cost-efficient and practical. the sample and the target population should be similar to each other

**sampling plan**

a method or procedure for specifying how a sample will be taken from a population.

three sampling plans: simple random, stratified random, cluster sampling

**simple random**

every member of the population, and every possible sample of the same size, has an equal chance of being selected

**stratified random**

separate the population into mutually exclusive sets (strata) first, then draw a simple random sample from each stratum

**cluster sample**

a simple random sample of groups, or clusters, of elements. may increase sampling error
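The three sampling plans can be compared side by side in a short sketch. Everything here is hypothetical: a population of 12 people, a department label used as the grouping, and sample sizes picked for the example. The seed is fixed only so that the run is repeatable.

```python
import random
from collections import defaultdict

rng = random.Random(0)

# Hypothetical population: 12 people, each tagged with a department.
population = [(f"p{i}", dept) for i, dept in enumerate(["A", "B", "C"] * 4)]

# Simple random sample: draw 4 people directly from the whole population.
simple = rng.sample(population, 4)

# Stratified random sample: split into mutually exclusive groups first,
# then draw a simple random sample of 2 from each group.
strata = defaultdict(list)
for person in population:
    strata[person[1]].append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 2)]

# Cluster sample: draw a simple random sample of groups (here, 1 department)
# and keep every member of the chosen group.
chosen = rng.sample(sorted(strata), 1)
cluster = [p for dept in chosen for p in strata[dept]]

print("simple random:", simple)
print("stratified:   ", stratified)
print("cluster:      ", cluster)
```

The stratified sample is guaranteed to include every department, while the cluster sample contains only the chosen department. That is one way to see why cluster sampling may increase sampling error.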