Confidence Interval

  • It is reported in 2 parts: a confidence level (e.g. 95%) and an interval (e.g. 0.2 ± 0.01)
  • It is used to infer info on the population parameter using a sample of the population.
  • A 95% CI means that if we drew 100 samples and constructed a confidence interval from each one, about 95 of those intervals would contain the population parameter. In this sense, we are 95% confident that the population parameter lies within the CI.
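  • To construct a CI for a population proportion (for discrete variables): $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
    • $\hat{p}$: sample proportion
    • $z^*$: z-value from the standard normal distribution (depends on the confidence level)
    • $n$: sample size
  • To construct a CI for a population mean: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
    • $\bar{x}$: sample mean
    • $t^*$: t-value from the t-distribution (depends on the degrees of freedom, $n - 1$, and the confidence level)
    • $s$: sample standard deviation
    • $n$: sample size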
  • Properties
    • When a smaller sample is taken, the CI will be wider
    • At a lower confidence level, the CI will be narrower
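
The proportion interval and the two properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes the usual normal-approximation formula (sample proportion plus or minus z times the standard error), and the function names and the sample counts (200 out of 1000, 20 out of 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper) for a normal-approximation CI of a proportion."""
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def width(interval):
    """Width of an interval given as (lower, upper)."""
    lower, upper = interval
    return upper - lower


# Hypothetical sample: 200 of 1000 people have the trait of interest
ci_large_95 = proportion_ci(200, 1000, 0.95)
# Same sample proportion from a smaller sample
ci_small_95 = proportion_ci(20, 100, 0.95)
# Same large sample at a lower confidence level
ci_large_90 = proportion_ci(200, 1000, 0.90)

print("95% CI, n=1000:", ci_large_95)
print("smaller sample is wider:", width(ci_small_95) > width(ci_large_95))
print("lower confidence is narrower:", width(ci_large_90) < width(ci_large_95))
```

A CI for a mean follows the same pattern with the t-value in place of the z-value; the standard library has no t-distribution, so that case would need a statistics package.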

Hypothesis Testing

Used to decide if the data from a random sample is sufficient to support a hypothesis about the population

  1. State (mutually exclusive):
    1. Null Hypothesis (always lower probability)
    2. Alternate Hypothesis
  2. Collect sample data
  3. State the level of significance (typically 0.05, 0.01 or 0.1). Lower value: harder to reject null hypothesis
  4. Calculate the p-value (the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed)
  5. Compare with level of significance
    1. p value < significance level: null hypothesis rejected, accept alternate hypothesis
    2. p value > significance level: not enough information to reject null hypothesis

Example

A coin manufacturer claims they produced a biased coin with P(H) = 0.3. When testing the coin 8 times, there were 7H and 1T.

  1. Hypotheses
    1. Null Hypothesis : Coin is as claimed i.e. P(H) = 0.3
    2. Alternate Hypothesis : P(H) > 0.3 (one tailed test) (two tailed test is P(H) ≠ 0.3)
  2. Let the level of significance be 0.1
  3. P-value = P(observation | null is true) + P(equally extreme outcomes | null is true) + P(outcomes that are even more extreme | null is true). Note: at least as extreme → at least as favourable to the alternate hypothesis
    1. P(HHHHHHHT | P(H) = 0.3) = (0.3)^7 x (0.7)
    2. P(other combinations of 7H 1T | P(H) = 0.3) = 7(0.3)^7 x (0.7)
    3. P(HHHHHHHH | P(H) = 0.3) = (0.3)^8
    4. P-value = 8(0.3)^7 x (0.7) + (0.3)^8 ≈ 0.0013
  4. Since p-value < level of significance, we can reject the null hypothesis, and conclude the alternate hypothesis
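
The p-value in this example can be reproduced with a few lines of Python. This is a sketch using only the standard library; the helper names `binomial_pmf` and `one_tailed_p_value` are illustrative, not from any particular package. It sums the binomial probabilities of every outcome at least as favourable to P(H) > 0.3 as the observed 7 heads.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(H) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_tailed_p_value(heads, tosses, p_null):
    """P(at least `heads` heads in `tosses` tosses | P(H) = p_null)."""
    # Outcomes with more heads are at least as favourable to P(H) > p_null
    return sum(binomial_pmf(k, tosses, p_null) for k in range(heads, tosses + 1))


p_value = one_tailed_p_value(heads=7, tosses=8, p_null=0.3)
alpha = 0.1  # level of significance chosen in the example

print(f"p-value = {p_value:.5f}")
print("reject null" if p_value < alpha else "cannot reject null")
```

The result is roughly 0.0013, well below the 0.1 level of significance, which matches the conclusion in step 4.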

Chi-sq test

Hypothesis testing for whether two variables are associated

Example

Null Hypothesis: Smoking is not associated with heart disease: rate(HD|S) = rate(HD|NS) = rate(HD)

Alternate Hypothesis: Smoking is associated with heart disease: rate(HD|S) ≠ rate(HD|NS)

Assume Null Hypothesis is true, calculate rate(HD).

Construct table for observation:

              Heart Disease   No Heart Disease   Row Total
Smoker                   38              14962       15000
Non Smoker               44              84956       85000
Col Total                82              99918      100000

rate(HD) = 82/100000

Then, draw the table for the expected outcome (null hypothesis) and compare. Each expected heart disease count is the row total multiplied by rate(HD), e.g. 15000 x 82/100000 = 12.3:

              Heart Disease   No Heart Disease   Row Total
Smoker                 12.3            14987.7       15000
Non Smoker             69.7            84930.3       85000
Col Total                82              99918      100000
p value is low if there is a big difference between expectation and observation

If p-value < level of significance, we can reject the null hypothesis and conclude the alternate hypothesis.

If p-value > level of significance, we cannot reject the null hypothesis, and therefore cannot conclude the alternate hypothesis (nor can we conclude the null hypothesis).

Note: Don't need to know how to calculate the p-value; just use software.
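
As an illustration of what such software does, the sketch below rebuilds the expected table from the observed counts and computes the chi-square statistic and its p-value. It makes two assumptions not stated in the notes: it uses Pearson's statistic without a continuity correction, and it relies on the fact that a 2x2 table has 1 degree of freedom, for which the p-value can be written with the standard library's `erfc`. Statistics packages may apply a correction and report a slightly different value.

```python
from math import erfc, sqrt

# Observed counts: rows are Smoker / Non Smoker,
# columns are Heart Disease / No Heart Disease
observed = [
    [38, 14962],
    [44, 84956],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under the null hypothesis every row has the same rate,
# so expected count = row total * column total / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom (2x2 table), P(chi-square > x) = erfc(sqrt(x / 2))
p_value = erfc(sqrt(chi_sq / 2))

print("expected:", expected)
print(f"chi-square = {chi_sq:.2f}")
print(f"p-value = {p_value:.3g}")
```

The statistic comes out at about 63.2 with a p-value far below any of the usual levels of significance, so for these counts the null hypothesis of no association would be rejected.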