math 1090 Statistics: 2016

Thursday, December 1, 2016

what is/are statistics
when r = 0 there is no correlation between the 2 variables
r=.9 positive strong correlation
r = -.9 strong negative correlation
positive and negative correlation
coincidence, a common underlying cause or a direct cause
what is a best fit or regression line
it is useful to make future predictions
what is the difference between finding a correlation between two variables and establishing causality between two variables
finding a correlation shows that there is a relationship, but it doesn't say that one thing happens because of the other
statistically significant
probability
make a probability distribution
what kind of sampling method was used
does it represent a sample statistic or a population parameter
is it qualitative or quantitative data
the difference between accurate and precise
how to create a vertical bar chart, dot plot and pie chart
expected value
68-95-99.7 rule
calculate standard deviation
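
A minimal Python sketch of two of the review items above: the sample standard deviation and expected value. The data and the lottery ticket are made up for illustration, and the function names are hypothetical, not from the course.

```python
import math

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Made-up data set, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std(scores), 2))

# Hypothetical $1 ticket: net gain of $99 with probability 0.005, otherwise lose $1.
ticket = [(99, 0.005), (-1, 0.995)]
print(round(expected_value(ticket), 2))
```

A negative expected value means that, averaged over many plays, the player loses money on each ticket.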

Tuesday, November 22, 2016

section 9.1 and 9.2 hypothesis testing

a hypothesis is a claim about a population parameter, such as a population proportion or population mean
in this chapter, the null hypothesis will always include the condition of equality, just as the null hypothesis for the gender choice
it is useful to give names to the three different types of alternative hypotheses
the first form ("less than") leads to what is called a left-tailed hypothesis test, because it requires testing whether the population parameter lies to the left (lower values) of the claimed value
Similarly, the second form
null and alternative hypotheses - the null hypothesis, or H0, is the starting assumption for a hypothesis test. For the types of hypothesis always
in each case, identify the population
solution (cont)
identifying hypothesis
in each case, identify the population parameter about which a claim is made, state the null and alternative hypotheses for a hypothesis test, and indicate whether the hypothesis test will be left-tailed, right-tailed, or two-tailed
two possible outcomes of a hypothesis test
there are two possible outcomes to a hypothesis test
reject the null hypothesis
the null hypothesis is that the proportion of female zebras is the accepted population proportion
do not reject the null hypothesis, in which case we lack evidence to support the biologist's claim
drawing a conclusion from a hypothesis test
we will look at two ways to make the decision about rejecting or not rejecting the null hypothesis
statistical significance
if the probability of a particular result is 0.05 or less, we say that the result is statistically significant at the 0.05 level
we decide
if the chance
consider
p-value is short for probability value
the p-value for a hypothesis test of a claim about a population parameter is the probability of selecting a sample at least as extreme as the observed sample, assuming that the null hypothesis is true
a small p-value (such as less than or equal to 0.05) indicates that the sample
you suspect that a coin may have a bias toward landing tails more often
your 100 coin tosses represent a random sample of size n = 100, and the result of 40 heads is the sample proportion at least as
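
A rough Python sketch of the coin example: with H0 "probability of heads = 0.5" and a left-tailed alternative (fewer heads than expected), the p-value is the chance of getting 40 or fewer heads in 100 tosses if the coin is fair. This uses an exact binomial sum, which is my choice for the sketch and may differ from the method used in class; the function name is hypothetical.

```python
from math import comb

def left_tail_p_value(n, observed_heads, p_null=0.5):
    """P(heads <= observed_heads) in n tosses, assuming H0 (P(heads) = p_null) is true."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(observed_heads + 1)
    )

p_value = left_tail_p_value(n=100, observed_heads=40)
print("p-value:", round(p_value, 4))
print("significant at the 0.05 level:", p_value <= 0.05)
```

Because the p-value comes out below 0.05, this sample would count as statistically significant at the 0.05 level, which supports rejecting the null hypothesis that the coin is fair.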
the hypothesis test process
formulate the null and alternative hypotheses, each of which must make a claim about a population parameter, such as a population mean
step 3: determine the likelihood of observing a sample statistic (mean or proportion) at least as extreme as the one you found under the assumption
in the United States, the average car is driven about 12000 miles each year. The owner of a large rental car company suspects that for his fleet, the mean distance is greater than 12000 miles each year. He selects a random sample of n = 225 cars from his fleet and finds that the mean annual mileage for this sample is x bar = 12375 miles
the p-value of 0.01 tells us that the result is significant
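
A sketch of how a p-value like this can be computed for the rental car example (right-tailed test, H0: mean = 12000). The notes do not record the sample standard deviation, so ASSUMED_STD below is a placeholder I picked so that the result lands near 0.01; it is an assumption, not a value from the problem. The normal distribution is used for the sample means, and the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def right_tailed_p_value(sample_mean, claimed_mean, sample_std, n):
    """z-score of the sample mean and the area to its right under H0."""
    standard_error = sample_std / math.sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    return z, 1 - normal_cdf(z)

ASSUMED_STD = 2415  # ASSUMPTION: not given in the notes, illustration only

z, p = right_tailed_p_value(
    sample_mean=12375, claimed_mean=12000, sample_std=ASSUMED_STD, n=225
)
print("z =", round(z, 2))
print("p-value =", round(p, 3))
```

With a different standard deviation the p-value changes, so the conclusion depends on that missing number.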
In American courts of law, the fundamental principle is that a defendant is presumed innocent

Tuesday, November 15, 2016

8.2

estimating a population mean: the basics
when we have only a single sample, the sample mean is the best estimate of the population mean, μ
however, we do not expect the sample mean to be equal to the population mean, because there is likely to be some sampling error. Therefore, in order to make an inference about the population mean, we need some way
a precise calculation shows that if the distribution of sample means is normal with a mean of μ, then 95% of all sample means lie within 1.96 standard deviations of the population mean; for our purposes in this book, we will approximate this as 2 standard deviations
a confidence interval is a range of values likely to contain the true value of the population mean
the margin of error E = 2s/square root of n
we find the 95% confidence interval by adding and subtracting the margin of error from the sample mean
interpreting the confidence interval
a study
E = 2s/square root of n
square root of n = 2s/E
n = (2s/E)^2
in order to estimate the population mean with a specified margin of error of at most E, the size of the sample should be at least
you want to study housing costs in the country by sampling recent house sales in various (representative) regions. Your goal is to provide a 95% confidence interval estimate of the housing cost. Previous studies suggest that the population standard deviation is about $7200. What sample size (at a minimum) should be used to ensure the sample mean is within
$500 of the true population mean
$100 of the true population mean
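
A short Python sketch of the housing-cost question, using the sample-size formula n = (2s/E)^2 with s = 7200 and rounding up to the next whole house sale. The function name is hypothetical.

```python
import math

def sample_size_for_mean(std_dev, margin_of_error):
    """Smallest n with 2 * std_dev / sqrt(n) <= margin_of_error."""
    return math.ceil((2 * std_dev / margin_of_error) ** 2)

print(sample_size_for_mean(7200, 500))  # within $500
print(sample_size_for_mean(7200, 100))  # within $100
```

Cutting the margin of error by a factor of 5 multiplies the required sample size by about 25, because n grows with the square of 1/E.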
margin of error
based on a random sample of hospital costs for car crash victims, the sample mean is $9004 and the margin of error for a 95% confidence interval is $266
$8738 < μ < $9270
the national health examination involves measurements from about 2500 people, and the results are used to estimate values of various population means. Is it valid to criticize this survey because the sample size is only about 0.01% of the population of all Americans? Explain
does it make sense
margin of error
the mean income of high school mathematics teachers is estimated to be $48,213
finding margin of error and confidence intervals
sample size = 81
sample mean = 4.5 km
sample standard deviation = 3.1 km
margin of error = 0.7 km
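
A quick Python check of this example, using E = 2s/square root of n and then adding and subtracting E from the sample mean. The function name is hypothetical.

```python
import math

def confidence_interval_95(sample_mean, sample_std, n):
    """Margin of error E = 2s/sqrt(n) and the interval (mean - E, mean + E)."""
    margin = 2 * sample_std / math.sqrt(n)
    return margin, (sample_mean - margin, sample_mean + margin)

margin, (low, high) = confidence_interval_95(sample_mean=4.5, sample_std=3.1, n=81)
print("margin of error (km):", round(margin, 1))
print("95% confidence interval (km):", round(low, 1), "to", round(high, 1))
```

The margin rounds to 0.7 km, matching the value in the notes, so the interval runs from about 3.8 km to about 5.2 km.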
95% confidence interval for a population proportion
for a population proportion, the margin of error for the 95% confidence interval is E = 2 * square root of (p hat (1 - p hat)/n)
p hat is the sample proportion
the Nielsen ratings for television use a random sample of households. A Nielsen survey results in an estimate that a Women's World Cup soccer game had 72.3% of the entire viewing audience. Assuming that the sample consists of n = 5000 randomly selected households, find the margin of error and the 95% confidence interval for this estimate
choosing the correct sample size
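
A small Python sketch of the Nielsen example, using the margin of error for a proportion, E = 2 * square root of (p hat (1 - p hat)/n), with p hat = 0.723 and n = 5000. The function name is hypothetical.

```python
import math

def proportion_interval_95(p_hat, n):
    """Margin of error and 95% confidence interval for a sample proportion."""
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return margin, (p_hat - margin, p_hat + margin)

margin, (low, high) = proportion_interval_95(p_hat=0.723, n=5000)
print("margin of error:", round(margin * 100, 1), "percentage points")
print("95% confidence interval:", round(low * 100, 1), "% to", round(high * 100, 1), "%")
```

The margin of error is a little over one percentage point, so the interval is roughly 71% to 74% of the viewing audience.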
in order to estimate a population proportion with a 95% degree of confidence and a specified margin of error of E, the size of the sample should be at least
n=1/E^2
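
A sketch of the n = 1/E^2 rule in Python. The rule follows from the proportion margin of error when p hat (1 - p hat) is at its largest value, 0.25, so it is a safe (conservative) sample size. The 3% and 2% margins are made-up inputs, and the function name is hypothetical.

```python
import math

def sample_size_for_proportion(margin_of_error):
    """Smallest whole n with n >= 1 / E^2 (E written as a decimal, e.g. 0.03)."""
    # Round first so tiny floating-point error cannot push the answer up by one.
    return math.ceil(round(1 / margin_of_error**2, 6))

print(sample_size_for_proportion(0.03))  # margin of 3 percentage points
print(sample_size_for_proportion(0.02))  # margin of 2 percentage points
```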
a study done by researchers at Alfred University concluded that 80% of all student athletes in this country have been subjected to some form of hazing.
a study commissioned by the U.S. Department of Education

8.1 What is a sampling distribution

notation used to describe the sample statistics and population parameter
5 sizes of sampling distribution
sampling error - the error introduced because a random sample is used to estimate a population parameter.
n = sample size, μ = population mean
x bar = sample mean
the distribution of sample means is the distribution that results when we find the means of all possible samples of a given size
the larger the sample size, the more closely this distribution approximates a normal distribution
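
A simulation sketch of the distribution of sample means in Python. Listing all possible samples is not practical, so this draws 2000 random samples instead, which is an approximation. The skewed population, the sample sizes 2 and 30, and the function name are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, rng):
    """Means of num_samples random samples, each of size sample_size."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up, strongly skewed population (not normal).
population = [rng.expovariate(1.0) for _ in range(10_000)]
pop_mean = statistics.mean(population)
print("population mean:", round(pop_mean, 2))

spread = {}
for n in (2, 30):
    means = sample_means(population, n, 2_000, rng)
    spread[n] = (statistics.mean(means), statistics.stdev(means))
    print("n =", n, "mean of sample means:", round(spread[n][0], 2),
          "spread:", round(spread[n][1], 2))
```

The mean of the sample means stays close to the population mean for both sample sizes, while the spread shrinks as n grows. A histogram of the n = 30 means would also look much closer to a bell curve than the population does.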
z = (x - x bar)/s
what is the mean of the distribution of sample means? it's the population mean
z score = (sample proportion - pop. proportion)/standard deviation
9-17 odd

Thursday, November 3, 2016

know how to calculate the z score
68-95-99.7 rule
know how to apply the central limit theorem
understand the concept of statistical significance and know how to apply it
know the difference between theoretical, relative frequency and subjective probabilities, and how to apply them
understand the law of large numbers and be able to calculate expected value.
also know how to answer questions related to life expectancy
interpret the correlation coefficient
best fit line and r^2

the best-fit line on a scatter diagram is a line that lies closer to the data points than any other possible line (according to a standard statistical measure of closeness)
predictions with best-fit lines
don't expect a best-fit line to give a good prediction
a best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future
don't make predictions about a population that is different from the population from which the sample data were drawn
valid predictions?
you've found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You used this correlation to predict that a person who exercises 18 hours per day
no one exercises 18 hours a day
best-fit lines and r^2 - the square of the correlation coefficient, or r^2, is the proportion of the variation in a variable that is accounted for by the best-fit line
the coefficient of determination represents
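the coefficient of determination represents the proportion of the variation in one variable that is accounted for by the best-fit line (not the percent of the data points that lie closest to the line)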
political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is unemployment.
the square of the correlation coefficient is r^2
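
A Python sketch of a best-fit line and r^2. It assumes the "standard statistical measure of closeness" is least squares (smallest total squared vertical distance), which is the usual choice. The data are made up, and the function names are hypothetical.

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Proportion of the variation in ys accounted for by the best-fit line."""
    slope, intercept = best_fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = best_fit_line(xs, ys)
r2 = r_squared(xs, ys)
print("best-fit line: y =", round(slope, 2), "* x +", round(intercept, 2))
print("r^2 =", round(r2, 2))
```

For this data r^2 is 0.6, meaning the line accounts for about 60% of the variation in y. The other 40% is scatter around the line.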

Tuesday, November 1, 2016

7.2 interpreting correlations

If you calculate the correlation coefficient for these data, you'll find that it is a relatively high r = 0.880, suggesting a very strong correlation
you've conducted a study to determine how the number of calories
however, notice that two points

there is a correlation between the variables amount
scatter diagram - a graph in which each point represents the values of two variables
suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer
we assign one variable to each axis and label the axes with values that comfortably fit all the data.
Thanks to a large bonus at work, you have a budget of $6,000 for a diamond ring.
types of correlation:
positive correlation - both variables tend to increase (or decrease) together
negative correlation - the two variables tend to change in opposite directions, with one increasing while the other decreases.
no correlation - there is no apparent (linear) relationship between the two variables
nonlinear correlation - the two variables are related but the relationship results in a scatter diagram that does not follow a straight-line pattern
test next Thursday
statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r.
properties of the correlation coefficient
the correlation coefficient, r, is a measure of the strength
if there is a negative correlation, the correlation coefficient is negative (-1 <= r < 0): when one variable increases,
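
A Python sketch that computes r for three made-up data sets (positive, perfectly negative, and no linear correlation). It uses the standard formula for the linear correlation coefficient; the data and the function name are assumptions for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]    # tends to increase with x
falling = [10, 8, 6, 4, 2]  # decreases along a perfect straight line
flat = [3, 1, 4, 1, 3]      # no linear trend

r_rising = correlation_coefficient(xs, rising)
r_falling = correlation_coefficient(xs, falling)
r_flat = correlation_coefficient(xs, flat)
print(round(r_rising, 2), round(r_falling, 2), round(abs(r_flat), 2))
```

The sign of r gives the direction of the correlation, and values closer to 1 or -1 mean the points lie closer to a straight line.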
7.1 - 1,2,3,4,11,17,19