Foundations of Quantitative Research in Political Science

Video: Regression and Confounding Variables

Standard Error

Introduction

You have learned that a regression output will give you the p-value of the slope estimate for each independent variable, as well as 95% confidence intervals (CIs). With the p-value and the 95% CIs, we can tell if a slope estimate is statistically significant. A slope estimate is statistically significant if we can be confident that the result is not due to random chance, that is, that it reflects a real relationship between the variables.

In this lesson, you will learn about a quantity that is key to determining whether a result is statistically significant. This quantity is the standard error of the regression slope.

Standard Deviation v. Standard Error

A good way to understand the standard error is to compare it to the standard deviation. If you need a refresher on the standard deviation, please review Khan Academy's excellent explanation (external link).

Both the standard deviation and the standard error are measures of spread, or how far apart a set of values is. The difference is that the standard deviation measures the spread of observed data, whereas the standard error measures the spread of an estimate: how much the estimate would vary from one sample to another.

If we collect data on how old each student in a classroom is (observed data), we can calculate a standard deviation. If we use the same data to run a regression, and obtain a regression slope (estimated data), we can calculate the standard error of the regression slope.

Example: Age and Police Feeling Thermometer

In 2016, a large group of political science researchers collected data for the American National Election Studies (ANES). They interviewed a representative sample of over 3,500 Americans and asked them a series of questions about their political views. Among these questions, they asked respondents about their age and how they feel about the police on a scale from 0 to 100, where 0 means disapproval and 100 means approval. They call this the Police Feeling Thermometer (from now on, Police FT).

Using this data, we can run a simple regression with the Police FT as the dependent variable and age as the independent variable. This is what we get:

Police FT = 62 + 0.27*Age

This means that for each additional year of age, we should expect the Police FT to increase by 0.27 points.
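
To make this interpretation concrete, the fitted line can be evaluated at two ages one year apart. The following is a minimal Python sketch; the helper name predict_police_ft is made up for illustration, and the coefficients are the rounded values from the equation above.

```python
# Coefficients taken from the fitted equation in the text (rounded values).
INTERCEPT = 62.0
SLOPE = 0.27


def predict_police_ft(age):
    """Return the predicted Police FT for a given age (hypothetical helper)."""
    return INTERCEPT + SLOPE * age


ft_30 = predict_police_ft(30)
ft_31 = predict_police_ft(31)

print(round(ft_30, 2))          # predicted Police FT at age 30
print(round(ft_31, 2))          # predicted Police FT at age 31
print(round(ft_31 - ft_30, 2))  # the one-year difference equals the slope
```

With these rounded coefficients, the predicted Police FT is 70.1 at age 30 and 70.37 at age 31, a difference of 0.27.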

Remember, however, that we are using data from a sample of approximately 3,500 Americans.

Imagine if the researchers working on the ANES were to start over, and interview another set of 3,500 Americans, asking them the same questions. Now imagine that the researchers, using the hypothetical, new sample of Americans, were to run the same regression. Would they get the exact same slope?

In all likelihood, they would not! The 3,500 people in the new sample would have different ages and different Police FTs, so the regression slope would most likely be different. Let's suppose that this new slope, instead of 0.27, would be 0.35.

Now imagine that the researchers working on the ANES were to repeat this process a third time: another sample, another regression, and this time a slope of 0.24. Now imagine that these researchers were to repeat this process a fourth time, a fifth time... ten times. Exercise your imagination one more time, and imagine that by doing all of that, researchers would get the following slopes:

Survey	Slope	Survey	Slope
ANES 2016 (Original)	0.27	ANES 2016 #6	0.19
ANES 2016 #2	0.35	ANES 2016 #7	0.26
ANES 2016 #3	0.24	ANES 2016 #8	0.31
ANES 2016 #4	0.25	ANES 2016 #9	0.22
ANES 2016 #5	0.23	ANES 2016 #10	0.28

The standard error of the slope is the spread of all these (hypothetical) slopes!
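
As a rough illustration, the spread of the ten hypothetical slopes in the table can be computed directly as a standard deviation. This is only a sketch: the slopes are the imagined values from this lesson, not real survey results, and ten values give just a crude picture of the spread.

```python
import statistics

# The ten hypothetical slopes from the table in the text.
slopes = [0.27, 0.35, 0.24, 0.25, 0.23, 0.19, 0.26, 0.31, 0.22, 0.28]

mean_slope = statistics.mean(slopes)

# Sample standard deviation of the slopes: a rough stand-in for the
# standard error, since only ten imagined samples are available.
spread = statistics.stdev(slopes)

print(round(mean_slope, 3))
print(round(spread, 3))
```

For these ten values, the mean slope is 0.26 and the spread is roughly 0.046.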

Fortunately, whenever we run a regression, we can calculate what the spread of the slope would be without having to go through the process of collecting data over and over again. For every regression slope, there is a standard error.
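
For a regression with one independent variable, the usual textbook formula obtains the standard error of the slope from a single sample: the square root of the residual variance divided by the sum of squared deviations of the independent variable from its mean. The sketch below applies that formula in Python to synthetic data. The data are simulated for illustration and are not the ANES; the sample size, noise level, and variable names are all assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic data (NOT the ANES): ages and a noisy linear outcome.
n = 500
ages = [random.uniform(18, 90) for _ in range(n)]
fts = [62 + 0.27 * a + random.gauss(0, 20) for a in ages]

mean_x = sum(ages) / n
mean_y = sum(fts) / n

# Ordinary least squares slope and intercept for one predictor.
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, fts))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual variance uses n - 2 because two coefficients were estimated.
residuals = [y - (intercept + slope * x) for x, y in zip(ages, fts)]
residual_var = sum(e ** 2 for e in residuals) / (n - 2)

# Standard error of the slope: sqrt(residual variance / spread of x).
se_slope = math.sqrt(residual_var / sxx)

print(round(slope, 3), round(se_slope, 3))
```

In practice, statistical software reports this number next to each slope in the regression output, so the calculation rarely needs to be done by hand.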

Why is This Important?

The standard error is a key quantity that allows us to obtain the p-value and the confidence intervals for the regression slope. By extension, the standard error is key to understanding if a slope estimate is statistically significant.

We can find out if a slope estimate is statistically significant by dividing the slope estimate by the standard error (SE):

If the slope estimate is positive and more than twice the size of the SE (the ratio is greater than 2), we can be 95% confident that the true slope is greater than zero.
If the slope estimate is negative and dividing it by the SE gives a number less than -2, we can be 95% confident that the true slope is less than zero.
If dividing the slope estimate by the SE gives a number between -2 and 2, we cannot be 95% confident that the true slope is different from zero.
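
The three rules above can be sketched as a small Python function. The function name classify_slope and the SE values in the usage lines are invented for illustration; this lesson does not report the actual standard error of the ANES slope.

```python
def classify_slope(slope, se, threshold=2.0):
    """Apply the rule of thumb from the text to a slope and its SE."""
    ratio = slope / se
    if ratio > threshold:
        return "significantly greater than zero"
    if ratio < -threshold:
        return "significantly less than zero"
    return "not significantly different from zero"


# The slope 0.27 is from the text; the SE values are invented examples.
print(classify_slope(0.27, 0.05))   # ratio is about 5.4
print(classify_slope(-0.27, 0.05))  # ratio is about -5.4
print(classify_slope(0.27, 0.20))   # ratio is about 1.35
```

The same slope can land in different categories depending on its standard error, which is why the ratio matters rather than the slope alone.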

The table below summarizes the relationship between the standard error of the regression slope, p-values, CIs, and hypothesis testing.

[Image: if then.png, a summary table of the if-then rules]

Before moving on to the next video, one caveat: when we use 2 and -2 as thresholds, we're rounding from 1.96 and -1.96, the exact thresholds for a large sample such as this one. So to be extremely precise, we would have to replace all the 2's and -2's with 1.96's and -1.96's. For practical purposes, however, there is no harm in rounding to 2.
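
For readers curious where the unrounded threshold comes from, here is a short Python sketch using only the standard library. It assumes a large sample, where the standard normal distribution is a good approximation and the threshold is about 1.96. With smaller samples the t distribution applies and the threshold is slightly larger (about 1.98 with roughly 120 degrees of freedom).

```python
from statistics import NormalDist

# Two-sided 95% test: 2.5% in each tail, so take the 97.5th percentile
# of the standard normal distribution.
critical_value = NormalDist().inv_cdf(0.975)

print(round(critical_value, 2))
```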
