1. Regression Analysis: Introduction

Foundations of Quantitative Research in Political Science

A Regression Equation is a Linear Function

To understand regression analysis, let us remember how linear functions work.

A linear function takes the general form y = a + bx. For example, think of the following linear function:

y = 5 + 2x

This function allows us to draw a graph that will give us the value of y for each value of x:

[Figure: graph of the linear function y = 5 + 2x]

The linear function and the graph are telling us that:

  • If x = 0, y = 5 + 2*0 = 5
  • If x = 1, y = 5 + 2*1 = 7
  • If x = 3, y = 5 + 2*3 = 11

And so on.
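
The same substitutions can be checked with a few lines of Python. This is only a minimal sketch: the function name `linear` and its default arguments are illustrative choices, not part of the course materials.

```python
def linear(x, a=5, b=2):
    """Return y = a + b*x for the example function y = 5 + 2x."""
    return a + b * x


# Evaluate the function at the same x values used in the list above.
for x in (0, 1, 3):
    print(f"If x = {x}, y = {linear(x)}")
```

Changing `a` moves the line up or down, and changing `b` makes it steeper or flatter.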

A regression equation follows the same logic. Using data, we can calculate a linear function that describes a relationship between variables. For now, we will limit ourselves to discussing regressions that describe the relationship between two variables. A regression equation with two variables is called a bivariate regression or a simple regression.

A bivariate regression equation describes the relationship between a dependent variable and an independent variable. In any bivariate regression equation:

  • y is the dependent variable
  • x is the independent variable

Example: Percent White and the Trump Vote in California Counties, 2016

This spreadsheet contains data on the racial composition and support for Trump in 2016 for 58 counties in California. The scatter plot below describes the relationship between these two variables:

[Figure: scatter plot of percent White (x) and support for Trump (y) in California counties, 2016, with a dotted regression line]

  • Each dot is a county in California
  • A county's support for Trump in 2016 is on the y (vertical) axis
  • A county's percentage of White residents in 2016 is on the x (horizontal) axis

Note that there is a positive relationship between the two variables: the "whiter" the county, the greater its support for Trump.

The dotted line on the scatter plot is a regression line. Support for Trump is the dependent variable (y), and percent White is the independent variable (x).

With the data on the spreadsheet, we can calculate the following regression equation:

y = 23 + (1/3)x

As with a linear function, we can input values of percent White (x) and obtain predicted values for support for Trump (y). For example, this regression equation is telling us that:

  • If a county in California is 40% White, we should expect support for Trump in this county to be 23 + (40/3) ≈ 36.3%
  • If a county in California is 50% White, we should expect support for Trump in this county to be 23 + (50/3) ≈ 39.7%
  • If a county in California is 90% White, we should expect support for Trump in this county to be 23 + (90/3) = 53%

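Note that this regression equation also implies that for every increase of one percentage point in percent White, we should expect an increase of one-third of a percentage point (about 0.33) in support for Trump. Put differently, an increase of three percentage points in percent White corresponds to an expected increase of one percentage point in support for Trump.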
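
The arithmetic in this example can be reproduced with a short Python sketch. The intercept (23) and slope (1/3) are read off the worked predictions above; the names `INTERCEPT`, `SLOPE`, and `predict_trump_support` are hypothetical and chosen only for illustration.

```python
# Coefficients taken from the worked predictions above (23 + x/3).
INTERCEPT = 23.0  # predicted support for Trump (%) when percent White is 0
SLOPE = 1 / 3     # change in predicted support per one-point change in percent White


def predict_trump_support(pct_white):
    """Return predicted support for Trump (%) for a given percent White."""
    return INTERCEPT + SLOPE * pct_white


# Predicted values for the three counties discussed above.
for pct in (40, 50, 90):
    print(f"{pct}% White: predicted support = {predict_trump_support(pct):.1f}%")

# The slope is the change in the prediction when x increases by one point.
step = predict_trump_support(41) - predict_trump_support(40)
print(f"Change per one-point increase in percent White: {step:.2f}")
```

The last line shows how to read the slope: moving from 40% to 41% White changes the prediction by about a third of a percentage point.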

Next Steps

With a simple regression equation, we try to estimate the value of the dependent variable based on the value of the independent variable. For example, we are estimating that in a county that is 40% White, support for Trump will be about 36%.

However, if you look at the scatter plot one more time, you will notice that there are a few counties where the percentage of White residents is around 40%, and in these counties, support for Trump ranges from as low as 10% to more than 50%. That is because the racial composition of a county is not the only factor that shapes its electoral results. Voting is far more complex than this, and no single variable can predict electoral results on its own.

In the next pages of this module, we will delve deeper into these issues pertaining to regression analysis. In addition, we will approach questions such as:

  • How did we obtain this regression equation from the data in the spreadsheet?
  • What are the p-values associated with this equation? In other words, what is the probability that we get numbers at least as extreme as these numbers if the null hypothesis is true?
  • How much of the variation in voting for Trump is explained by the percentages of White residents in counties of California?
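
As a preview of the first and third questions, here is a minimal sketch of ordinary least squares, a standard way to fit a line to data. This page does not state which method produced the equation above, so treating it as ordinary least squares is an assumption. To avoid inventing county data, the sketch fits the three points from the earlier example y = 5 + 2x; the function names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Return the share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Points taken from the example function y = 5 + 2x (not county data).
xs = [0, 1, 3]
ys = [5, 7, 11]

a, b = fit_line(xs, ys)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
print(f"R-squared = {r_squared(xs, ys, a, b):.2f}")
```

Because these three points lie exactly on a line, the fit recovers the intercept 5 and the slope 2, and all of the variation in y is explained. With real data such as the county spreadsheet, the points scatter around the line and the explained share is less than 1.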

You will notice that the videos in this module look different from the other modules on the Foundations page. This is because the videos were adapted from another course (Econ/Poli 5D). We believe that these materials should help students master regression analysis.
