- what is/are statistics
- when r = 0 there is no correlation between the 2 variables
- r = 0.9: strong positive correlation
- r = -0.9: strong negative correlation
- positive and negative correlation
- possible explanations for a correlation: coincidence, a common underlying cause, or a direct cause
- what is a best fit or regression line
- it is useful to make future predictions
- what is the difference between finding a correlation between two variables and establishing causality between two variables
- finding a correlation shows that there is a relationship, but it doesn't say that one thing happens because of the other
- statistically significant
- probability
- make a probability distribution
- what kind of sampling method was used
- does it represent a sample statistic or a population parameter
- is it qualitative or quantitative data
- the difference between accurate and precise
- how to create a vertical bar graph, dot plot, and pie chart
- expected value
- 68-95-99.7 rule
- calculate standard deviation
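The last two review items can be sketched in a few lines of Python. This is only an illustration: the scores below are hypothetical (not from class), and the sample formula with n - 1 in the denominator is assumed.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

def empirical_rule_ranges(mean, std_dev):
    """Ranges expected to hold about 68%, 95%, and 99.7% of a normal distribution."""
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for label, k in (("68%", 1), ("95%", 2), ("99.7%", 3))
    }

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean_score = sum(scores) / len(scores)
s = sample_std_dev(scores)
ranges = empirical_rule_ranges(mean_score, s)

print(f"mean = {mean_score}, s = {s:.3f}")
for label, (low, high) in ranges.items():
    print(f"about {label} of values between {low:.2f} and {high:.2f}")
```

The 68-95-99.7 ranges are only meaningful if the data are roughly normal, which a list this small cannot really show.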
Thursday, December 1, 2016
Tuesday, November 22, 2016
section 9.1 and 9.2 hypothesis testing
- a hypothesis is a claim about a population parameter, such as a population proportion or population mean
- in this chapter, the null hypothesis will always include the condition of equality, just as the null hypothesis for the gender choice
- it is useful to give names to the three different types of alternative hypotheses
- the first form ("less than") leads to what is called a left-tailed hypothesis test, because it requires testing whether the population parameter lies to the left (lower values) of the claimed value
- Similarly, the second form
- null and alternative hypotheses - the null hypothesis, or H0, is the starting assumption for a hypothesis test. For the types of hypothesis always
- in each case, identify the population
- solution (cont)
- identifying hypothesis
- in each case, identify the population parameter about which a claim is made, state the null and alternative hypotheses for a hypothesis test, and indicate whether the hypothesis test will be left-tailed, right-tailed, or two-tailed
- two possible outcomes of a hypothesis test
- there are two possible outcomes to a hypothesis test
- reject the null hypothesis
- the null hypothesis is that the proportion of female zebras is the accepted population proportion
- do not reject the null hypothesis, in which case we lack evidence to support the biologist's claim
- drawing a conclusion from a hypothesis test
- we will look at two ways to make the decision about rejecting or not rejecting the null hypothesis
- statistical significance
- if the probability of a particular result is 0.05 or less, we say that the result is statistically significant at the 0.05 level
- we decide
- if the chance
- consider
- p-value is short for probability value
- the p-value for a hypothesis test of a claim about a population parameter is the probability of selecting a sample at least as extreme as the observed sample, assuming that the null hypothesis is true
- a small p-value (such as less than or equal to 0.05) indicates that the sample
- you suspect that a coin may have a bias toward landing tails more often
- your 100 coin tosses represent a random sample of size n = 100, and the result of 40 heads is the sample proportion at least as
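For the coin example, one way to get a p-value is to add up the exact binomial probabilities of results at least as extreme as 40 heads. This is a sketch under assumptions: it treats the test as left-tailed (null hypothesis: the proportion of heads is 0.5; alternative: it is less than 0.5), and it uses the exact binomial distribution, which may not be the method used in class.

```python
from math import comb

def binomial_left_tail(successes, n, p=0.5):
    """P(X <= successes) for X ~ Binomial(n, p), i.e. a left-tailed p-value."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes + 1)
    )

# From the notes: 100 tosses, 40 heads.
p_value = binomial_left_tail(40, 100)
significant = p_value <= 0.05  # the 0.05 significance level

print(f"p-value = {p_value:.4f}, significant at the 0.05 level: {significant}")
```

A p-value this small means that 40 or fewer heads would be unlikely with a fair coin, so the null hypothesis would be rejected at the 0.05 level.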
- the hypothesis test process
- formulate the null and alternative hypotheses each of which must make a claim about a population parameter, such as a population mean
- the hypothesis test process
- step 3: determine the likelihood of observing a sample statistic (mean or proportion) at least as extreme as the one you found under the assumption
- in the United States the average car is driven about 12,000 miles each year. The owner of a large rental car company suspects that for his fleet, the mean distance is greater than 12,000 miles each year. He selects a random sample of n = 225 cars from his fleet and finds that the mean annual mileage for this sample is x bar = 12,375
- the p-value of 0.01 tells us that the result is significant
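A rough sketch of where a p-value near 0.01 could come from in the rental car example. The notes do not record the sample standard deviation, so the value 2415 miles below is an assumption chosen only for illustration. The sketch uses a right-tailed test and a normal distribution for the sample means.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

claimed_mean = 12000   # null hypothesis: mean = 12,000 miles (from the notes)
sample_mean = 12375    # x bar (from the notes)
n = 225                # sample size (from the notes)
assumed_s = 2415       # ASSUMPTION: standard deviation is not given in the notes

# Standard deviation of the distribution of sample means.
std_of_sample_means = assumed_s / math.sqrt(n)

# How many standard deviations the sample mean lies above the claimed mean.
z = (sample_mean - claimed_mean) / std_of_sample_means

# Right-tailed test: probability of a sample mean at least this large.
p_value = 1 - normal_cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With a different standard deviation the p-value would change, so this only shows the mechanics of the calculation.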
- In American courts of law, the fundamental principle is that a defendant is presumed innocent
Tuesday, November 15, 2016
8.2
- estimating a population mean: the basics
- when we have only a single sample, the sample mean is the best estimate of the population mean, μ
- however, we do not expect the sample mean to be equal to the population mean, because there is likely to be some sampling error. Therefore, in order to make an inference about the population mean, we need some way
- a precise calculation shows that if the distribution of sample means is normal with a mean of μ, then 95% of all sample means lie within 1.96 standard deviations of the population mean; for our purposes in this book, we will approximate this as 2 standard deviations
- a confidence interval is a range of values likely to contain the true value of the population mean
- the margin of error is E = 2s/√n
- we find the 95% confidence interval by adding and subtracting the margin of error
- interpreting the confidence interval
- a study
- E = 2s/√n
- √n = 2s/E
- n = (2s/E)^2
- in order to estimate the population mean with a specified margin of error of at most E, the size of the sample should be at least
- you want to study housing costs in the country by sampling recent house sales in various (representative) regions. Your goal is to provide a 95% confidence interval estimate of the housing cost. Previous studies suggest that the population standard deviation is about 7200. What sample size (at a minimum) should be used to ensure the sample mean is within
- $500 of the true population mean
- $100 of the true population mean
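The housing question can be worked with the sample size formula n = (2s/E)^2, using the population standard deviation of 7200 in place of s. A minimal sketch; the function name is made up for illustration, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_for_mean(std_dev, margin_of_error):
    """Smallest n such that 2 * std_dev / sqrt(n) <= margin_of_error (95% confidence)."""
    return math.ceil((2 * std_dev / margin_of_error) ** 2)

std_dev = 7200  # from the notes

n_within_500 = sample_size_for_mean(std_dev, 500)
n_within_100 = sample_size_for_mean(std_dev, 100)

print(n_within_500)  # 830
print(n_within_100)  # 20736
```

Cutting the margin of error to one fifth requires roughly 25 times as many house sales, because the margin of error shrinks with the square root of n.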
- margin of error
- based on a random sample of hospital costs for car crash victims, the sample mean is $9004 and the margin of error for a 95% confidence interval is $266
- $8738 < μ < $9270
- the National Health Examination involves measurements from about 2500 people, and the results are used to estimate values of various population means. Is it valid to criticize this survey because the sample size is only about 0.01% of the population of all Americans? Explain
- does it make sense
- margin of error
- the mean income of high school mathematics teachers is estimated to be $48,213
- finding margin of error and confidence intervals
- sample size = 81
- sample mean = 4.5 km
- sample standard deviation = 3.1 km
- margin of error = 0.7 km
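The 0.7 km margin of error above can be checked with E = 2s/√n, and the 95% confidence interval follows by adding and subtracting it from the sample mean. A short sketch; the function name is made up for illustration.

```python
import math

def confidence_interval_95(sample_mean, s, n):
    """Return (margin of error, (lower, upper)) using E = 2s / sqrt(n)."""
    margin = 2 * s / math.sqrt(n)
    return margin, (sample_mean - margin, sample_mean + margin)

# Values from the notes: n = 81, sample mean = 4.5 km, s = 3.1 km.
margin, (low, high) = confidence_interval_95(4.5, 3.1, 81)

print(f"margin of error = {margin:.1f} km")
print(f"95% confidence interval: {low:.1f} km to {high:.1f} km")
```

Rounded to one decimal place, this gives a margin of error of 0.7 km and an interval from about 3.8 km to 5.2 km.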
- 95% confidence interval for a population proportion
- for a population proportion, the margin of error for the 95% confidence interval is E = 2√(p hat (1 - p hat)/n)
- p hat is the sample proportion
- the Nielsen ratings for television use a random sample of households. A Nielsen survey results in an estimate that a women's World Cup soccer game had 72.3% of the entire viewing audience. Assuming that the sample consists of n = 5000 randomly selected households, find the margin of error and the 95% confidence interval for this estimate
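The Nielsen example can be worked by plugging p hat = 0.723 and n = 5000 into the margin of error formula for a proportion. A minimal sketch; the function name is made up for illustration.

```python
import math

def proportion_margin_95(p_hat, n):
    """Margin of error for a 95% confidence interval: 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.723  # sample proportion (from the notes)
n = 5000       # number of households (from the notes)

margin = proportion_margin_95(p_hat, n)
low, high = p_hat - margin, p_hat + margin

print(f"margin of error = {margin:.3f}")
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

The margin of error comes out to about 0.013, so the interval runs from roughly 71.0% to 73.6% of the viewing audience.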
- choosing the correct sample size
- in order to estimate a population proportion with a 95% degree of confidence and a specified margin of error of E, the size of the sample should be at least
- n=1/E^2
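The formula n = 1/E^2 comes from the worst case p hat = 0.5, where the margin of error 2√(0.25/n) simplifies to 1/√n. The sketch below applies it to a few margins of error chosen only for illustration (they are not from the notes); the function name is also made up.

```python
import math

def sample_size_for_proportion(margin_of_error):
    """Smallest whole n with n >= 1 / E^2 (95% confidence, worst case p hat = 0.5)."""
    # round() guards against floating-point noise such as 399.99999999999994.
    return math.ceil(round(1 / margin_of_error**2, 6))

# Hypothetical margins of error, expressed as proportions (0.03 means 3 percentage points).
n_for_5_points = sample_size_for_proportion(0.05)
n_for_3_points = sample_size_for_proportion(0.03)
n_for_2_points = sample_size_for_proportion(0.02)

print(n_for_5_points, n_for_3_points, n_for_2_points)
```

Note that E has to be written as a proportion, not a percent, before it goes into the formula.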
- a study done by researchers at Alfred University concluded that 80% of all student athletes in this country have been subjected to some form of hazing.
- a study commissioned by the U.S. Department of Education
8.1 What is a sampling distribution
- notation used to describe the sample statistics and population parameter
- 5 sizes of sampling distribution
- sampling error - the error introduced because a random sample is used to estimate a population parameter.
- n = sample size, μ = population mean
- x bar = sample mean
- the distribution of sample means is the distribution that results when we find the means of all possible samples of a given size
- the larger the sample size, the more closely this distribution approximates a normal distribution
- z = (x - x bar)/s
- what is the mean of the distribution of sample means? It's the population mean
- z score = (sample proportion - population proportion)/standard deviation
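Both z score formulas above have the same shape: subtract the center, then divide by the standard deviation. A small sketch with hypothetical numbers (none of them come from the notes or the homework):

```python
def z_score(value, center, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) center."""
    return (value - center) / std_dev

# Hypothetical data value: x = 85, mean = 70, standard deviation = 10.
z_value = z_score(85, 70, 10)

# Hypothetical sample proportion: 0.56 observed, 0.50 in the population,
# with a standard deviation of 0.03 for the distribution of sample proportions.
z_proportion = z_score(0.56, 0.50, 0.03)

print(f"z for the data value = {z_value:.1f}")
print(f"z for the sample proportion = {z_proportion:.1f}")
```

A z score near 2 or beyond means the result is roughly in the outer 5% of a normal distribution, which ties back to the 68-95-99.7 rule.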
- 9-17 odd
Thursday, November 3, 2016
- know how to calculate the z score
- 68-95-99.7 rule
- know how to apply the central limit theorem
- understand the concept of statistical significance and know how to apply it
- know the difference between theoretical, relative frequency, and subjective probabilities and how to apply them
- understand the law of large numbers and be able to calculate expected value.
- also know how to answer questions related to life expectancy
- interpret the correlation coefficient
- best fit line and r^2
- the best-fit line on a scatter diagram is a line that lies closer to the data points than any other possible line (according to a standard statistical measure of closeness)
- predictions with best-fit lines
- don't expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points
- a best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future
- don't make predictions about a population that is different from the population from which the sample data were drawn
- valid predictions?
- you've found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You used this correlation to predict that a person who exercises 18 hours per day
- no one exercises 18 hours a day
- best-fit lines and r^2 - the square of the correlation coefficient, or r^2, is the proportion of the variation in a variable that is accounted for by the best-fit line
- the coefficient of determination (r^2) represents the percent of the variation in the data that is accounted for by the best-fit line, not the percent of data points that lie closest to the line
- political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate.
- the square of the correlation coefficient is r^2
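To see how r, the best-fit line, and r^2 fit together, here is a sketch that computes all three from scratch. The five data points are hypothetical and stand in for any pair of variables. The best-fit line is assumed to be the least-squares line, the usual "standard statistical measure of closeness".

```python
import math

def correlation_and_best_fit(xs, ys):
    """Return (r, slope, intercept) for the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, slope, intercept = correlation_and_best_fit(xs, ys)
r_squared = r**2

print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")
print(f"best-fit line: y = {slope:.1f}x + {intercept:.1f}")
```

For these made-up points r^2 is 0.6, so the best-fit line accounts for about 60% of the variation in y; the other 40% is left unexplained by x.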
Tuesday, November 1, 2016
7.2 interpreting correlations
- If you calculate the correlation coefficient for these data, you'll find that it is a relatively high r = 0.880, suggesting a very strong correlation
- you've conducted a study to determine how the number of calories
- however, notice that two points
- there is a correlation between the variables amount
- scatter diagram - a graph in which each point represents the values of two variables
- suppose there really were a gene that made people prone to both smoking and lung cancer. Explain why we would still find a strong correlation between smoking and lung cancer
- we assign one variable to each axis and label the axes with values that comfortably fit all the data.
- Thanks to a large bonus at work, you have a budget of $6,000 for a diamond ring.
- types of correlation
- positive correlation - both variables tend to increase (or decrease) together
- negative correlation - the two variables tend to change in opposite directions, with one increasing while the other decreases.
- no correlation - there is no apparent (linear) relationship between the two variables
- nonlinear correlation - the two variables are related but the relationship results in a scatter diagram that does not follow a straight-line pattern
- test next Thursday
- statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r.
- properties of the correlation coefficient
- the correlation, r, is a measure of the strength
- if there is a negative correlation, the correlation coefficient is negative (-1 ≤ r < 0): when one variable increases,
- 7.1 - 1,2,3,4,11,17,19