Foundations of Quantitative Research in Political Science
Now that we know how the regression equation works and how it is calculated, it is time to ask ourselves: how do we interpret the results of a regression analysis to test a hypothesis? The null hypothesis states that Beta 1 is 0. In other words, if Beta 1 is 0, then percent white can go up as much as it wants and percent Trump won't change, right? Percent Trump remains Beta 0 no matter how much percent white goes up. And that is consistent with what we wrote as the null hypothesis: that in the 2016 presidential election, voting for Trump in California was no different between counties with a greater white population and counties with a smaller white population. So when we set up a regression equation like this, percent Trump equals Beta 0 plus Beta 1 times percent white, we are doing it as a way to test the hypothesis we stated above, right? That in 2016, voting for Trump was higher in counties with a greater white population than in counties with a smaller white population. And the null hypothesis is that there is no difference. So here are the implications of our hypothesis and null hypothesis: Beta 1 is bigger than 0 if our alternative hypothesis is correct, and Beta 1 is equal to 0 if our null hypothesis is correct. And remember, when we ran the regression, we came up with an estimate that is in fact bigger than 0. It tells us that as the white population increases across California counties, we should expect the Trump vote to increase as well. We can also be precise about how much it will increase, right? That is, how much the Trump vote will increase for every one-percentage-point increase in the white population. But is this result statistically significant? Remember how we discussed what it means for a result to be statistically significant in terms of p-values.
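Written compactly, the model and the two competing hypotheses look like this. The labels PctTrump and PctWhite are shorthand introduced here for percent Trump and percent white in county i.

```latex
\begin{aligned}
\text{PctTrump}_i &= \beta_0 + \beta_1 \,\text{PctWhite}_i \\[4pt]
H_0 &: \beta_1 = 0 \quad (\text{so } \text{PctTrump}_i = \beta_0 \text{ no matter what } \text{PctWhite}_i \text{ is}) \\
H_A &: \beta_1 > 0
\end{aligned}
```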
We have learned that the p-value is the probability of observing a result at least as extreme as the one we are observing if the null hypothesis is correct. So the p-value for Beta 1 is the probability of observing an estimated slope at least as extreme as 0.34 if the null hypothesis is correct, that is, if the true slope is 0. In other words, we won't consider our estimated Beta 1 to be statistically significant if it turns out that there is a considerable probability of observing this estimated Beta 1 of 0.34 even though the true Beta 1 is actually 0, even though there is actually no relationship between percent white and voting for Trump. It could be the case that we are getting this Beta 1 of 0.34 out of random chance when, in truth, the null hypothesis is correct and the independent variable and the dependent variable are not related at all. So if the null hypothesis is correct, the true Beta 1 is 0, and the p-value is the probability that we estimate a slope at least as far from 0 as 0.34 even though the true slope is 0. So let's suppose that we set our significance level, or alpha, at 0.05. Our estimate for Beta 1 is going to have a p-value. If the p-value is smaller than 0.05, we reject the null hypothesis, right? If the p-value for our estimate of Beta 1 is smaller than 0.05, then we are concluding that the probability that we are getting that Beta 1 due to random chance is too small. That is, we are concluding that the probability of getting a Beta 1 of 0.34 even though the true Beta 1 is actually 0 is too small.
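The definition of the p-value can be made concrete with a simulation. The sketch below uses a permutation test, which is a different technique from the t-test that regression software reports, but it follows the definition literally: shuffle percent Trump across counties so that the null hypothesis holds by construction, re-estimate the slope many times, and count how often a shuffled slope is at least as far from 0 as the real one. The data are synthetic, made up for illustration rather than taken from the California counties, and the function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope: sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Two-sided permutation p-value for the slope of y on x.

    Shuffling y breaks any link between x and y, so every shuffled slope comes
    from a world where the null hypothesis (true slope = 0) holds. The p-value
    is the share of shuffled slopes at least as far from 0 as the observed one.
    """
    rng = random.Random(seed)
    observed = ols_slope(x, y)
    y_shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        if abs(ols_slope(x, y_shuffled)) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_shuffles

# Synthetic data only (NOT the real California results): 58 made-up counties
# generated with a true slope of 0.34 plus random noise.
data_rng = random.Random(42)
pct_white = [data_rng.uniform(20, 90) for _ in range(58)]
pct_trump = [10 + 0.34 * w + data_rng.gauss(0, 10) for w in pct_white]

slope, p = permutation_p_value(pct_white, pct_trump)
print(f"estimated slope = {slope:.2f}, permutation p-value = {p:.4f}")
```

A p-value of exactly 0 from this sketch only means that none of the shuffles produced a slope as extreme as the observed one, so the p-value is below 1/5000, not literally zero.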
So if we get a p-value smaller than 0.05, we reject the null hypothesis, as we did in the election-rigging example, right? And we conclude that in 2016, voting for Trump in California was higher in counties with a greater white population than in counties with a smaller white population. By contrast, if the p-value is greater than 0.05, we conclude that there is too high a probability that we are getting that Beta 1 due to random chance even though the true Beta 1 is actually 0, and so we fail to reject the null hypothesis. Hypothesis testing with confidence intervals also applies to the significance test of the slope estimate. By significance test, we mean asking whether the slope estimate is statistically significant. Our result is statistically significant at the 0.05 level if the p-value is smaller than 0.05, right? Statistical significance means that we can be confident that we are not getting that result out of random chance: we are concluding that there is too small a probability of getting that result if the null hypothesis were true. We can also arrive at the same conclusion about statistical significance by looking at confidence intervals. We can construct a 95 percent confidence interval for Beta 1 and check whether it includes 0. If the confidence interval does not include 0, we reject the null hypothesis. And remember that hypothesis testing with the p-value will never contradict hypothesis testing with confidence intervals, as long as the confidence level matches alpha. In our case here, right, we are working with an alpha of 0.05, which goes together with a 95 percent confidence interval.
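The claim that the p-value rule and the confidence-interval rule never disagree can be checked directly. The sketch below is a minimal illustration, assuming a normal approximation (regression software uses the t distribution with n - 2 degrees of freedom, which gives slightly larger p-values and wider intervals in small samples). The function name `slope_significance` and the (slope, standard error) pairs are hypothetical; the last check backs an approximate standard error out of the reported Trump interval of 0.16 to 0.53.

```python
from statistics import NormalDist

def slope_significance(b1, se, alpha=0.05):
    """Check H0: beta_1 = 0 with a p-value and with a confidence interval.

    Normal approximation: regression software uses the t distribution with
    n - 2 degrees of freedom instead.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = b1 / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-sided p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    ci_low, ci_high = b1 - z_crit * se, b1 + z_crit * se
    reject_by_p = p_value < alpha
    reject_by_ci = not (ci_low <= 0 <= ci_high)      # interval excludes 0
    return p_value, (ci_low, ci_high), reject_by_p, reject_by_ci

# Both rules agree for any (b1, se): the interval excludes 0 exactly when
# |b1 / se| exceeds z_crit, which is exactly when the p-value is below alpha.
# The pairs below are hypothetical values chosen for illustration.
for b1, se in [(0.10, 0.08), (-0.50, 0.20), (0.05, 0.50)]:
    p_value, ci, by_p, by_ci = slope_significance(b1, se)
    print(f"b1={b1:+.2f} se={se:.2f} p={p_value:.3f} "
          f"CI=({ci[0]:+.3f}, {ci[1]:+.3f}) reject: p-rule={by_p}, CI-rule={by_ci}")

# Rough check on the reported Trump result: a 95% interval of 0.16 to 0.53
# implies se of about (0.53 - 0.16) / (2 * 1.96) under the normal approximation.
se_trump = (0.53 - 0.16) / (2 * 1.96)
p_trump, ci_trump, by_p, by_ci = slope_significance(0.34, se_trump)
print(f"se ~ {se_trump:.3f}, p ~ {p_trump:.5f}, reject: {by_p}, {by_ci}")
```

The rough check gives a standard error of about 0.094 and a p-value of roughly 0.0003, consistent with the reported p-value below 0.001. The rebuilt interval is centered on 0.34 rather than on 0.345, the midpoint of the rounded bounds, so it differs slightly from 0.16 to 0.53.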
This means that whatever conclusion you reach on whether to reject the null hypothesis by comparing the p-value to 0.05, you are going to reach the same conclusion if you look at the 95 percent confidence interval, okay? If the 95 percent confidence interval of a slope does not include 0, the p-value will necessarily be smaller than 0.05, because the same inputs go into the calculation of both the p-value and the confidence interval for the slope.

So in our regression of the Trump vote on percent white, these are the results that we have. We have a Beta 1 of 0.34, which is approximately one third. The 95 percent confidence interval is 0.16 to 0.53, meaning we can be 95 percent confident that the true slope of percent white on the Trump vote is between 0.16 and 0.53. So the effect of percent white on the Trump vote could be as low as 0.16 and as high as 0.53. It is also the case that the p-value is really small: it is smaller than 0.001. So with these results, we can conclude that there is too low a probability that we are getting 0.34 as our slope due to random chance, that is, too small a probability of getting 0.34 even though the true Beta 1 is 0. So we reject the null hypothesis, and we conclude that in 2016, voting for Trump in California was higher in counties with a greater white population than in counties with a smaller white population.

Finally, to conclude, a little primer on what factors go into the calculation of the p-value and the confidence interval. Remember that the p-value and the confidence interval will always lead you to the same conclusion, and the same factors that affect how small the p-value is will also affect whether the confidence interval includes 0. There are four factors. The first is the sample size. Holding everything else equal, the more data you have, the smaller the p-value, and the more likely it is that the confidence interval will not include 0.
The second factor is how steep the slope is, or in other words, how far from 0 your Beta 1 is. The steeper the slope, and the further Beta 1 is from 0, the smaller the p-value, and the more likely it is that the confidence interval will not include 0. The third factor is the spread of the independent variable, X. The more spread out X is, that is, the greater the variance of X, the smaller the p-value, and the more likely it is that the confidence interval will not include 0, which means that you are getting statistical significance. The fourth factor is the distance of the dots from the regression line. The closer your dots are to the regression line overall, the smaller the p-value, the more likely it is that the confidence interval will not include 0, and the more likely you are to get a statistically significant result.
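All four factors can be read off the textbook formula for the standard error of the OLS slope: the p-value depends on the ratio of Beta 1 to its standard error, and each factor either raises the numerator or shrinks the denominator. The sketch below is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the baseline numbers are made up, and it uses a normal approximation instead of the t distribution that regression software uses.

```python
import math
from statistics import NormalDist

def slope_test_stat(b1, n, sd_x, resid_sd):
    """Approximate test statistic and two-sided p-value for a slope estimate.

    Uses the OLS standard-error formula
        SE(b1) = resid_sd / (sd_x * sqrt(n - 1)),
    which is sqrt[(sum of squared residuals / (n - 2)) / sum((x - x_bar)**2)]
    rewritten with sd_x, the sample standard deviation of X.
      n        -> factor 1, sample size
      b1       -> factor 2, how steep the slope is
      sd_x     -> factor 3, spread of the independent variable X
      resid_sd -> factor 4, typical distance of the dots from the line
    The p-value uses a normal approximation to the t distribution.
    """
    se = resid_sd / (sd_x * math.sqrt(n - 1))
    stat = b1 / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Hypothetical baseline values, then change one factor at a time.
baseline = dict(b1=0.1, n=30, sd_x=10.0, resid_sd=8.0)
scenarios = {
    "baseline": {},
    "1. bigger sample (n=120)": {"n": 120},
    "2. steeper slope (b1=0.3)": {"b1": 0.3},
    "3. more spread in X (sd_x=20)": {"sd_x": 20.0},
    "4. dots closer to line (resid_sd=4)": {"resid_sd": 4.0},
}
p_values = {}
for label, change in scenarios.items():
    stat, p_values[label] = slope_test_stat(**{**baseline, **change})
    print(f"{label:38s} stat={stat:5.2f}  p={p_values[label]:.3f}")
```

Each of the four changes lowers the p-value relative to the baseline. Because the confidence interval is Beta 1 plus or minus a multiple of the same standard error, each change also makes it less likely that the interval includes 0.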