Foundations of Quantitative Research in Political Science
In this video, we're going to do a recap, a refresher on how to run and interpret the results of a regression. So we're going to review regression analysis with the same regression and the same data that we analyzed before, when we were looking at race and election outcomes in California in 2016. So remember we had the alternative hypothesis: we posed the hypothesis that in 2016, voting for Trump in California was higher in counties with a greater white population than in counties with a smaller white population. And the null hypothesis is that in 2016, voting for Trump in California was no different between counties with a greater white population and counties with a smaller white population. So we can test these hypotheses with a regression analysis. We specify a regression model. It has percent Trump as the dependent variable. Beta zero is the intercept. Beta one is the slope. And the equation says that the predicted percent Trump, the predicted support for Trump in a given county, is equal to the intercept plus the slope of percent white times percent white. So in this case, our alternative hypothesis, that voting for Trump in California was higher in counties with a greater white population, implies that beta one, our slope, is greater than 0. By contrast, the null hypothesis, that in 2016 voting for Trump in California was no different between counties with a greater and a smaller white population, implies that beta one is 0. If the slope is 0, it means that no matter how much percent white varies across counties, there will be no difference in percent Trump: percent Trump will not vary as percent white varies. So we have now laid out the alternative hypothesis and the null hypothesis, and what they imply for our estimate of the slope.
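Written out in symbols, the model and the two hypotheses just described look roughly like this. The notation (a hat for a predicted value, a subscript i for a county, and the variable names) is a common convention assumed here, not something taken from the video's slides.

```latex
% Simple regression of Trump support on percent white, one observation per county i
\[
\widehat{\text{PercentTrump}}_i = \beta_0 + \beta_1 \, \text{PercentWhite}_i
\]

% Null hypothesis: no relationship; alternative hypothesis: positive slope
\[
H_0 :\ \beta_1 = 0
\qquad \text{versus} \qquad
H_A :\ \beta_1 > 0
\]
```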
So remember we calculated the regression and we saw that the regression result was as follows: the predicted percent Trump is 23 plus roughly one third times percent white. And we can see in the scatter plot with the regression line that the regression line is the best fit, the line that minimizes the distances (strictly, the sum of squared vertical distances) between the dots and the line. So now we can plug in some numbers for percent white and see what the predicted Trump vote share is. And as we plug these numbers in, we can also look at the regression line. So if percent white is 40, we should expect support for Trump to be about 36.6%. At 60% white, we should expect the Trump vote share to be about 43%. At 80% white, we should expect the Trump vote share to be about 50%. Now, is this result, the slope that we calculated, which is approximately 0.34, statistically significant? We have learned about p-values, right? And how the p-value is the probability that we observe a result at least as extreme as the result we are observing if the null hypothesis is correct. So when we bring up the p-value for the slope, we're talking about the probability that you observe a slope at least as extreme as 0.34 if the null hypothesis is correct. And by the null hypothesis being correct here we mean that beta one, the slope, is 0. So the p-value is the probability that we estimate a slope at least as far from 0 as 0.34, even though the true slope is 0. So let's suppose we set our significance level alpha at 0.05. If the p-value is smaller than 0.05, we reject the null hypothesis and conclude that in 2016, voting for Trump in California was higher in counties with a greater white population than in counties with a smaller white population. By contrast, if the probability of estimating a slope at least as extreme as the one we observe, even though the null hypothesis is true, is higher than 5%, that is, if the p-value is higher than 0.05, we fail to reject the null hypothesis.
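Going back to the predictions from a moment ago, plugging a county's percent white into the fitted line is simple arithmetic, and it can be sketched in a few lines of Python. This is only an illustration: the intercept (23) and slope (0.34) are the rounded estimates quoted in the lecture, and the function name `predicted_percent_trump` is a hypothetical helper, not part of the course materials.

```python
# Rounded estimates quoted in the lecture (not re-estimated from data here).
INTERCEPT = 23.0  # beta 0: predicted percent Trump when percent white is 0
SLOPE = 0.34      # beta 1: change in percent Trump per one-point change in percent white


def predicted_percent_trump(percent_white: float) -> float:
    """Hypothetical helper: predicted Trump vote share (in percent) for a county."""
    return INTERCEPT + SLOPE * percent_white


if __name__ == "__main__":
    # The three example values used in the lecture.
    for percent_white in (40, 60, 80):
        prediction = predicted_percent_trump(percent_white)
        print(f"{percent_white}% white -> predicted {prediction:.1f}% Trump")
```

Run as a script, this prints predictions of 36.6, 43.4 and 50.2, which line up with the rounded figures of about 36.6%, 43% and 50% mentioned above.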
We fail to conclude that voting for Trump in California was higher in counties with a greater white population. Also, hypothesis testing with confidence intervals applies to the significance test of the slope estimate. We can construct a 95% confidence interval for the slope and check whether this confidence interval includes 0. If it does not include 0, we reject the null hypothesis. So let's take another look at the summary of our regression results. So notice that our estimate for the slope is right there. Sorry, our estimate for the intercept is right there. It's 23, beta zero. The slope estimate, which is 0.34, is beta one. And notice how the p-value of the slope is smaller than 0.001, right? Therefore it is smaller than 0.05. The 95% confidence interval does not include 0: we can be 95% confident that the true slope is between 0.16 and 0.53. So given that, we can reject the null hypothesis. Notice also that we have an R-squared of 0.2, which means that approximately 20% of the variation in the Trump vote is explained by the percentage of white residents in a county. So this is just a refresher, right, of a simple regression. It's important that we recap that before we move on to multivariate regression. Now all those quantities, the p-value, the confidence intervals, the slope, the interpretation of the slope, R-squared, and so on and so forth, all of these things, with time, should become second nature to you. Whenever you take a first look at a regression table or a regression result, you'll be able to see, assess, and interpret what those results are telling you about the relationship between the independent variable and the dependent variable. Now in the next video, we're going to start talking about multivariate regression, when we have not only one variable on the right-hand side, but more than one variable on the right-hand side: more than one independent variable predicting our dependent variable.
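As a recap of the two decision rules used in this video, here is a minimal Python sketch that applies them to the numbers read off the regression table: a p-value below 0.001, a 95% confidence interval of 0.16 to 0.53, and a significance level of 0.05. The function names are hypothetical, and because the table only reports that the p-value is smaller than 0.001, the sketch uses 0.001 as an upper bound rather than an exact value.

```python
ALPHA = 0.05                 # significance level chosen in the lecture
P_VALUE_UPPER_BOUND = 0.001  # the table reports p < 0.001, so this is only a bound
CI_LOWER, CI_UPPER = 0.16, 0.53  # 95% confidence interval for the slope


def reject_by_p_value(p_value: float, alpha: float = ALPHA) -> bool:
    """Reject the null hypothesis when the p-value is smaller than alpha."""
    return p_value < alpha


def reject_by_confidence_interval(lower: float, upper: float,
                                  null_value: float = 0.0) -> bool:
    """Reject the null hypothesis when the interval does not contain the null value."""
    return not (lower <= null_value <= upper)


if __name__ == "__main__":
    # If even the upper bound on the p-value is below alpha, the true p-value is too.
    print("Reject by p-value:", reject_by_p_value(P_VALUE_UPPER_BOUND))
    print("Reject by confidence interval:",
          reject_by_confidence_interval(CI_LOWER, CI_UPPER))
```

Both checks point the same way for this regression: the null hypothesis that the slope is 0 is rejected.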