Inference for Comparing 2 Population Proportions (HT for 2 Proportions)
Now we get to the good stuff! We will need to know how to state the null and alternative hypotheses, calculate the test statistic, and then reach a conclusion using either the critical value method or the p-value method.
The Test Statistic for a 2 Proportion Test:
[latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}}[/latex]
What the different symbols mean:
[latex]x_1[/latex] is the number of successes in the first group (not always given; if only [latex]\hat{p_1}[/latex] is given, [latex]x_1 = n_1 \times \hat{p_1}[/latex])
[latex]n_1[/latex] is the sample size from the first group (the number of people, items, etc. in that sample)
[latex]p_1[/latex] is the population proportion for the first group; this will be used in the null and alternative hypotheses as well
[latex]\hat{p_1}[/latex] is the sample proportion (or percentage) for the first group, given by [latex]\hat{p_1} = \frac{x_1}{n_1}[/latex]
[latex]\hat{q_1}[/latex] is the complement of the sample proportion for the first group (the proportion of failures), given by [latex]\hat{q_1} = 1 - \hat{p_1}[/latex]
[latex]x_2[/latex] is the number of successes in the second group (not always given; if only [latex]\hat{p_2}[/latex] is given, [latex]x_2 = n_2 \times \hat{p_2}[/latex])
[latex]n_2[/latex] is the sample size from the second group (the number of people, items, etc. in that sample)
[latex]p_2[/latex] is the population proportion for the second group; this will be used in the null and alternative hypotheses as well
[latex]\hat{p_2}[/latex] is the sample proportion (or percentage) for the second group, given by [latex]\hat{p_2} = \frac{x_2}{n_2}[/latex]
[latex]\hat{q_2}[/latex] is the complement of the sample proportion for the second group (the proportion of failures), given by [latex]\hat{q_2} = 1 - \hat{p_2}[/latex]
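[latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2}[/latex] is the pooled sample proportion: the proportion of successes when the two samples are combined into one. We pool because the null hypothesis assumes [latex]p_1 = p_2[/latex], so both samples are treated as estimates of the same proportion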
[latex]\bar{q} = 1 - \bar{p}[/latex]
[latex]\alpha[/latex] is the significance level, usually given within the problem, or if not given, we assume it to be 5% or 0.05
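The test statistic and the symbols above translate directly into a few lines of code. The sketch below is a minimal Python illustration (the function name `two_proportion_z` is our own, not from any library); it works from the raw counts so that [latex]\bar{p}[/latex] is computed exactly rather than from rounded proportions.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Return the pooled two-proportion z statistic for H0: p1 = p2."""
    p1_hat = x1 / n1               # sample proportion, first group
    p2_hat = x2 / n2               # sample proportion, second group
    p_bar = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    q_bar = 1 - p_bar
    standard_error = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p1_hat - p2_hat) / standard_error


print(two_proportion_z(50, 100, 50, 100))  # equal sample proportions -> 0.0
```

Equal sample proportions give no evidence of a difference, so `two_proportion_z(50, 100, 50, 100)` returns `0.0`; the further apart [latex]\hat{p_1}[/latex] and [latex]\hat{p_2}[/latex] are, the larger [latex]|z|[/latex] becomes.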
Assumptions when conducting a 2 Proportion Test:
- Each sample is a simple random sample
- The two samples or groups are independent
- There are at least 5 successes and at least 5 failures in each sample; that is, [latex]n_1\hat{p_1} \ge 5[/latex], [latex]n_1\hat{q_1} \ge 5[/latex], [latex]n_2\hat{p_2} \ge 5[/latex], and [latex]n_2\hat{q_2} \ge 5[/latex]
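Only the last condition can be checked from the numbers alone; randomness and independence depend on how the data were collected. A minimal Python sketch of that check (the helper name `large_counts_ok` is hypothetical) relies on the fact that [latex]n_1\hat{p_1} = x_1[/latex] and [latex]n_1\hat{q_1} = n_1 - x_1[/latex], and likewise for the second sample:

```python
def large_counts_ok(x1: int, n1: int, x2: int, n2: int, minimum: int = 5) -> bool:
    """Return True if each sample has at least `minimum` successes and failures."""
    # n * p_hat equals the success count x, and n * q_hat equals the failure count n - x
    counts = (x1, n1 - x1, x2, n2 - x2)
    return all(count >= minimum for count in counts)


print(large_counts_ok(246, 2445, 164, 2445))  # Example 1's callback data -> True
```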
Steps to conduct the 2 Proportion Test:
- Identify all the symbols listed above (all the stuff that will go into the formulas). This includes [latex]x_1[/latex] and [latex]x_2[/latex] (if necessary), [latex]n_1[/latex] and [latex]n_2[/latex], [latex]\hat{p_1}[/latex] and [latex]\hat{q_1}[/latex], [latex]\hat{p_2}[/latex] and [latex]\hat{q_2}[/latex], [latex]\bar{p}[/latex] and [latex]\bar{q}[/latex], and [latex]\alpha[/latex]
- Identify the null and alternative hypotheses
- Calculate the test statistic, [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}}[/latex]
- Find the critical value(s) OR the p-value OR both
- Apply the Decision Rule
- Write up a conclusion for the test
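Steps 3 through 5 can be carried out together in code. The following is a minimal Python sketch, not a library API: the function name `two_proportion_test` and its `alternative` argument are our own choices. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of the normal distribution table, and it reports both the p-value and the critical value, so either method can be applied.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_proportion_test(x1, n1, x2, n2, alternative="two-sided", alpha=0.05):
    """Pooled two-proportion z-test of H0: p1 = p2.

    alternative (Step 2): "greater" (H_A: p1 > p2), "less" (H_A: p1 < p2),
    or "two-sided" (H_A: p1 != p2).
    Returns (z, p_value, critical_value, reject_h0).
    """
    # Step 1: sample and pooled proportions
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar

    # Step 3: test statistic
    z = (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

    # Step 4: p-value (area beyond z) and critical value (cutoff for alpha)
    if alternative == "greater":
        p_value = STANDARD_NORMAL.cdf(-z)  # right tail; equals 1 - Phi(z)
        critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "less":
        p_value = STANDARD_NORMAL.cdf(z)  # left tail
        critical = STANDARD_NORMAL.inv_cdf(alpha)
    elif alternative == "two-sided":
        p_value = 2 * STANDARD_NORMAL.cdf(-abs(z))  # both tails
        critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # compare with |z|
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    # Step 5: decision rule (p-value method)
    reject_h0 = p_value <= alpha
    return z, p_value, critical, reject_h0


print(two_proportion_test(246, 2445, 164, 2445, alternative="greater"))
```

The written conclusion (Step 6) is still up to us. With the critical value method, reject [latex]H_{0}[/latex] when [latex]z \ge[/latex] `critical` for a “greater than” test, [latex]z \le[/latex] `critical` for a “less than” test, or [latex]|z| \ge[/latex] `critical` for a two-sided test; this leads to the same decision as the p-value method. At [latex]\alpha = 0.05[/latex] the one-sided critical value is about 1.645 and the two-sided one is about 1.96.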
Example 1: Race/Name Resume Study[1]
In this study, investigators created identical mock resumés and sent them in response to job ads in Chicago and Boston. Each resumé was randomly assigned either a common white name or a common black name. In total, 246 of the 2445 resumés with common white names received a callback, and 164 of the 2445 resumés with common black names received a callback. Is there compelling evidence to conclude that callback rates are higher for common white names than for common black names?
Solution
Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with rates or percents from two samples or groups (the applicants with common white names and those with common black names), so we will conduct a 2 Proportion Test.
- [latex]x_1 = 246[/latex] is the number of callbacks for applicants with common white names
- [latex]n_1 = 2445[/latex] is the sample size from the first group; those with common white names
- [latex]\hat{p_1} = \displaystyle \frac{x_1}{n_1} = \displaystyle \frac{246}{2445} = 0.101[/latex]
- [latex]\hat{q_1} = 1 - \hat{p_1} = 1 - 0.101 = 0.899[/latex]
- [latex]x_2 = 164[/latex] is the number of callbacks for applicants with common black names
- [latex]n_2 = 2445[/latex] is the sample size from the second group; those with common black names
- [latex]\hat{p_2} = \displaystyle \frac{x_2}{n_2} = \displaystyle \frac{164}{2445} = 0.067[/latex]
- [latex]\hat{q_2} = 1 - \hat{p_2} = 1 - 0.067 = 0.933[/latex]
- [latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2} = \displaystyle \frac{246 + 164}{2445 + 2445} = 0.084[/latex]
- [latex]\bar{q} = 1 - \bar{p} = 1 - 0.084 = 0.916[/latex]
- [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
- Null and Alternative Hypotheses: In a 2-sample problem, the null hypothesis always assumes that the groups are the same, that is, that the two population proportions are equal. In this case we are being asked whether there is evidence that common white names are called back more frequently than common black names, so the alternative hypothesis uses a [latex]>[/latex] symbol.
- [latex]H_{0}: p_1 = p_2[/latex]
- [latex]H_{A}: p_1 > p_2[/latex]
- Test Statistic
- [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}} = \displaystyle \frac{0.101 - 0.067}{\sqrt{\displaystyle \frac{0.084 \times 0.916}{2445} + \displaystyle \frac{0.084 \times 0.916}{2445}}} = 4.29[/latex]
- P-Value: The p-value is found by looking up the test statistic (in this case [latex]z = 4.29[/latex]) in the standard normal table, which gives a left-tail area of about [latex]0.9999[/latex]. Since this is a “greater than” test, the p-value is the area to the right, so we subtract from one: [latex]\text{p-value} \approx 1 - 0.9999 = 0.0001[/latex]. (Because [latex]z = 4.29[/latex] lies beyond the end of most tables, the true p-value is even smaller than this.)
- Applying the Decision Rule: We now compare the p-value to our significance level of 0.05. If the p-value is less than or equal to [latex]\alpha[/latex], we have enough evidence to support the alternative hypothesis; otherwise, we do not. Here, [latex]\text{p-value} \approx 0.0001[/latex], which is far smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
- Conclusion: Because our p-value of [latex]0.0001[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing evidence that the callback rate for common white names is higher than the callback rate for common black names.
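To check the arithmetic above, and to see how much the hand rounding matters, the short Python sketch below recomputes the test statistic twice: once from the three-decimal values used in the solution and once at full precision. It uses only the standard library; `z_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 246, 2445  # callbacks and resumes sent, common white names
x2, n2 = 164, 2445  # callbacks and resumes sent, common black names


def z_statistic(p1_hat: float, p2_hat: float, p_bar: float, n1: int, n2: int) -> float:
    """Pooled two-proportion z statistic from already-computed proportions."""
    q_bar = 1 - p_bar
    return (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)


# The three-decimal values used in the hand calculation
z_rounded = z_statistic(0.101, 0.067, 0.084, n1, n2)
# The same calculation without intermediate rounding
z_exact = z_statistic(x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2), n1, n2)
# H_A: p1 > p2, so the p-value is the right-tail area P(Z >= z) = Phi(-z)
p_value = NormalDist().cdf(-z_exact)

print(f"z from rounded inputs: {z_rounded:.2f}")  # 4.29
print(f"z at full precision:   {z_exact:.2f}")    # 4.23
print(f"exact p-value:         {p_value:.1e}")
```

The rounded inputs reproduce [latex]z = 4.29[/latex], while full precision gives [latex]z \approx 4.23[/latex]. Either way the right-tail p-value is well below 0.0001, so the decision to reject [latex]H_{0}[/latex] does not change.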
Example 2: Seat Belt Use in New York and Boston[2]
Police officers in New York City can stop a driver who is not wearing their seat belt. In Boston, police officers can issue citations to drivers for not wearing their seat belts ONLY if the driver has been stopped for another violation. Data from random samples of female Hispanic drivers in 2002 is summarized in the following table:
| City | Drivers (n) | Wearing Seat Belts (x) |
|---|---|---|
| Boston | 117 | 68 |
| New York | 220 | 183 |
Is there compelling evidence to conclude that a smaller rate (or proportion) of drivers wear their seat belts in Boston as compared to New York?
Solution
Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with rates or percents from two samples or groups (female Hispanic drivers in the two cities), so we will conduct a 2 Proportion Test. We will treat Boston as the first group and New York as the second group.
- [latex]x_1 = 68[/latex] is the number of female Hispanic drivers in Boston who wore their seat belts
- [latex]n_1 = 117[/latex] is the sample size from the first group; female Hispanic drivers in Boston who were part of the study
- [latex]\hat{p_1} = \displaystyle \frac{x_1}{n_1} = \displaystyle \frac{68}{117} = 0.581[/latex]
- [latex]\hat{q_1} = 1 - \hat{p_1} = 1 - 0.581 = 0.419[/latex]
- [latex]x_2 = 183[/latex] is the number of female Hispanic drivers in New York who wore their seat belts
- [latex]n_2 = 220[/latex] is the sample size from the second group; female Hispanic drivers in New York who were part of the study
- [latex]\hat{p_2} = \displaystyle \frac{x_2}{n_2} = \displaystyle \frac{183}{220} = 0.832[/latex]
- [latex]\hat{q_2} = 1 - \hat{p_2} = 1 - 0.832 = 0.168[/latex]
- [latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2} = \displaystyle \frac{68 + 183}{117 + 220} = 0.745[/latex]
- [latex]\bar{q} = 1 - \bar{p} = 1 - 0.745 = 0.255[/latex]
- [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
- Null and Alternative Hypotheses: In a 2-sample problem, the null hypothesis always assumes that the groups are the same, that is, that the two population proportions are equal. In this case we are being asked whether there is evidence of a lower rate of seat belt use among female Hispanic drivers in Boston than in New York, so the alternative hypothesis uses a [latex]<[/latex] symbol.
- [latex]H_{0}: p_1 = p_2[/latex]
- [latex]H_{A}: p_1 < p_2[/latex]
- Test Statistic
- [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}} = \displaystyle \frac{0.581 - 0.832}{\sqrt{\displaystyle \frac{0.745 \times 0.255}{117} + \displaystyle \frac{0.745 \times 0.255}{220}}} = -5.03[/latex]
- P-Value: The p-value is found by looking up the test statistic (in this case [latex]z = -5.03[/latex]) in the standard normal table, which gives a left-tail area of about [latex]0.0001[/latex]. Since this is a “less than” test, the p-value is the area to the left, so we keep the value from the table: [latex]\text{p-value} \approx 0.0001[/latex]. (Because [latex]z = -5.03[/latex] lies beyond the end of most tables, the true p-value is even smaller than this.)
- Applying the Decision Rule: We now compare the p-value to our significance level of 0.05. If the p-value is less than or equal to [latex]\alpha[/latex], we have enough evidence to support the alternative hypothesis; otherwise, we do not. Here, [latex]\text{p-value} \approx 0.0001[/latex], which is far smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
- Conclusion: Because our p-value of [latex]0.0001[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing evidence that there is a lower rate of seat belt use among female Hispanic drivers in Boston as compared to New York.
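The written solution does not show the assumption check, so the Python sketch below (standard library only; the variable names are our own) verifies the success/failure condition for the table's counts and then repeats the left-tailed test without intermediate rounding.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
x1, n1 = 68, 117   # Boston: drivers wearing seat belts, drivers sampled
x2, n2 = 183, 220  # New York: drivers wearing seat belts, drivers sampled

# Success/failure condition: at least 5 successes and 5 failures in each sample
conditions_met = min(x1, n1 - x1, x2, n2 - x2) >= 5

p1_hat, p2_hat = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)  # pooled sample proportion
q_bar = 1 - p_bar
z = (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

# H_A: p1 < p2, so the p-value is the left-tail area P(Z <= z)
p_value = NormalDist().cdf(z)
reject_h0 = p_value <= alpha

print(f"conditions met: {conditions_met}")  # True (68, 49, 183, and 37 are all >= 5)
print(f"z = {z:.2f}")                       # -5.02
print(f"p-value = {p_value:.1e}, reject H0: {reject_h0}")
```

At full precision the statistic is about [latex]-5.02[/latex] rather than the hand-rounded [latex]-5.03[/latex], and the p-value is still far below [latex]\alpha = 0.05[/latex].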
- Adapted from the Skew The Script curriculum (skewthescript.org), licensed under CC BY-NC-SA 4.0
- Adapted from The Basic Practice of Statistics, 7th Edition, by Moore, Notz, and Fligner