Foundations of Quantitative Research in Political Science
Candidates running for office often send mailers to potential voters, encouraging them to go out and vote. These letters or postcards usually include information about both the candidates and how to vote. Let's say we want to know whether these mailers increase the chances that individuals vote. What kind of research design might we want to use? In this video, we'll introduce a very important type of research design: the experiment.

After watching this video, you should be able to identify a randomized experiment and articulate how it differs from an observational study; to identify the treatment, the treatment group, and the control group in a given experiment; and to explain how randomized experiments mitigate our concerns about confounds.

First, let's quickly review why simple polling wouldn't be sufficient to answer our question. By simple polling, I just mean conducting a survey of a random sample of people to see, first, whether they received a mailer for the last election, and second, whether they voted. The problem is that our analysis might be influenced by confounding variables. I'll provide a brief recap of confounding variables here, but if this concept seems new or confusing, I encourage you to revisit the module on confounding variables and then come back to this video.

Okay, so what are confounding variables? Briefly, confounding variables are variables that influence our dependent variable and are correlated with our independent variable; if we don't take them into account, they may cause us to misinterpret our data. For example, it could be that as X increases, Y increases, so we're tempted to conclude that X causes Y. But in reality, a third variable caused both the increase in X and the increase in Y. If we don't realize this, we can come to faulty conclusions about our data. In this video, we'll talk about one way to deal with confounding variables: experiments. Let's go back to our turnout example.
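To make this concrete, here is a minimal simulation of the X, Y, Z story above (this example is illustrative and not from the video): Z causes both X and Y, X has no effect on Y at all, and yet X and Y are clearly correlated.

```python
import random

random.seed(0)

# Illustrative simulation: Z drives both X and Y; X plays no causal role.
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]   # Z raises X
y = [zi + random.gauss(0, 1) for zi in z]   # Z raises Y; X never enters

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    sa = (sum((ai - ma) ** 2 for ai in a) / len(a)) ** 0.5
    sb = (sum((bi - mb) ** 2 for bi in b) / len(b)) ** 0.5
    return cov / (sa * sb)

print(corr(x, y))   # a substantial correlation despite no causal link
```

Anyone who sees only X and Y here would be tempted to conclude that X causes Y, which is exactly the trap confounding sets.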
We want to know whether sending potential voters postcards with information about the election will increase the chances they vote. An initial poll suggests that they might: people who received postcards were more likely to vote. But can you think of any potential confounds?

It could be that political parties are targeting specific types of voters. For example, consider a candidate who doesn't have much money to spend on his re-election campaign. This candidate might want to send postcards to voters who live in districts he won in the last election, because he thinks they're more likely to vote for him in this election. But these voters were already more likely to vote; we know this because they voted in the last election. So our poll might just be showing us that people who voted last time were more likely to vote this time.

Consider another example. Maybe a politician wants to send postcards to the people who didn't vote last time, because he thinks that people who voted for him last time will turn out without a reminder. This would actually weaken the apparent relationship between postcards and voting. In other words, the effect of postcards on voting might actually be bigger than our poll suggests.

So what do we do about confounds? In this video series, we'll talk about two main types of solutions. First, we can address the problem before we collect data, using research design. Or we can address it after we collect data, using statistics. In this video, we'll focus on the first solution.

What would be the ideal evidence to show that receiving an informational postcard really does increase the chances that an individual turns out to vote? You might think that we would want to select a group of people, send them postcards with information about where to vote, the day of the election, et cetera, and then observe whether they decide to vote.
But to be really sure that the postcards have an effect, we would also want to know what those same people would have done without any interference from us, meaning we hadn't sent them the postcard. Ideally, we'd be able to use a time machine to go back to before we sent the postcards and record whether they vote anyway, without the postcards. Then we could compare how many people voted in each of these alternate realities. Did more people vote in the reality in which we sent the postcards than in the reality in which we didn't? If so, we could conclude that sending the postcards really did increase turnout in the election.

So why is this the ideal solution? It's worth taking a moment to think about. This is the ideal research design because we know that everything in our two alternate realities is exactly the same: the potential voters, the economy, the important issues in the election, and the candidates. Everything is the same except for one very important factor: whether the potential voters receive postcards. This allows us to conclude that any difference in turnout must be a result of the postcards.

But obviously, we don't have time machines lying around or the ability to observe alternate realities. So what do we do in real life? We compare two very similar groups; think of them as our alternate realities, or our pre-time-machine voters and our post-time-machine voters. It's important that these two groups be comparable in all ways except for whether they receive the postcards. By "comparable," I just mean that we want the two groups to be as similar as possible, so that they approximate the two alternate realities as closely as we can manage. Note that the two groups don't have to be identical in every single way; we just need them to be what we call similar on average. And we can get these two comparable groups simply by randomly assigning each potential voter to one group or the other.
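As a concrete sketch, random assignment can be as simple as shuffling a list of voters and splitting it in half. The voter IDs below are hypothetical placeholders; in a real study they would come from a voter file.

```python
import random

random.seed(42)

# Hypothetical voter IDs. Shuffling before splitting is what guarantees
# the two groups are similar on average, across observed AND unobserved
# characteristics alike.
voters = list(range(1000))
random.shuffle(voters)

half = len(voters) // 2
treatment_group = voters[:half]   # will receive postcards
control_group = voters[half:]     # will receive nothing

print(len(treatment_group), len(control_group))   # 500 500
```

The key design choice is that nothing about a voter influences which group they land in; the coin flip alone decides.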
Once we have the two groups, we send one group postcards, and we don't send anything to the second group. We call the first group the treatment group, because they receive the treatment, in this case the postcards, and we call the second group, the group that didn't receive anything in the mail, the control group. Finally, we observe how many people in each group vote. If more people voted in the treatment group, then we can conclude that the postcards increased turnout.

Let's take a step back and make sure we understand how that worked. We want to understand the effects of informational mailers on voting, but we were worried that politicians only send these mailers to certain types of voters. In other words, we were worried that receiving a mailer would be correlated with some other characteristic of the voter, like the voter's baseline propensity to vote. If the voter's baseline propensity to vote influences whether they voted in this election (our dependent variable) and is correlated with receiving a mailer (our independent variable), then it is a confounding variable. When we randomly assign some people to get postcards and others not to, we eliminate the possibility of a correlation between receiving a postcard and the baseline propensity to vote. In other words, we mitigate the problem of confounding variables.

Note that for this to work, the process of splitting the groups, that is, deciding who gets postcards in this case, has to be random. That's how we know the groups are comparable, or the same on average. While we won't get into the math behind why this is the case, the remarkable thing about randomly assigning units to treatment and control groups is that it creates two groups that are very similar across both observable and unobservable characteristics. In contrast, when candidates choose to send postcards only to voters they think will vote for them, they're probably not sending the postcards randomly.
These groups could differ in a number of ways: they could belong to different political parties, for example, or one group could be more interested in politics. So we really like randomization. In fact, many researchers consider randomized experiments to be the gold standard of research designs.

However, it's important to note that randomized experiments are not the perfect solution for every research question. For example, consider studying the effects of war on polarization. Can we randomly assign some countries to go to war and some not to, and then measure how polarized each country is? No; there are obvious practical and ethical reasons why we should not attempt this.

To conclude, let's summarize what we've discussed. First, understanding how to deal with confounding variables is a really important task for all researchers; without it, we can't be confident in the relationships we see in our data. In this video, we asked whether sending potential voters informational postcards increases voter turnout, and we briefly discussed what the ideal evidence to test this hypothesis would look like: the time machine study. Since this type of study is impossible, at least for now, we then considered a hypothetical experiment to test our hypothesis. We identified the treatment, the treatment group, and the control group, and we explained that using randomization to create two similar groups, where one receives the treatment and the other doesn't, allows us to approximate the time machine research design. We also noted that experiments aren't always practical or ethical for researchers to use, and briefly mentioned that there are other ways to reduce our concerns about confounds: while experiments address confounds before we collect data, we can also use statistical tools to control for confounds after we collect data. And that's our introduction to experiments.
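To preview that second, "after we collect data" fix: if we can measure the confounder, we can compare mailer recipients and non-recipients within each level of it. The sketch below is illustrative only; the variable names and all probabilities are assumptions for demonstration, and the mailer is deliberately given no true effect.

```python
import random

random.seed(1)

# Simulated observational data with targeted (non-random) mailing.
# prior  = voted in the last election (the confounder)
# mailer = received a postcard (targeted toward prior voters)
# voted  = voted this election (mailer has NO true effect here)
n = 20_000
data = []
for _ in range(n):
    prior = random.random() < 0.5
    mailer = random.random() < (0.7 if prior else 0.3)
    voted = random.random() < (0.85 if prior else 0.35)
    data.append((prior, mailer, voted))

def turnout(rows):
    return sum(v for _, _, v in rows) / len(rows)

# Naive comparison: confounded, shows a large spurious gap.
naive_gap = (turnout([r for r in data if r[1]])
             - turnout([r for r in data if not r[1]]))

# Stratified comparison: within each level of the confounder, the gap vanishes.
stratified_gaps = []
for p in (True, False):
    stratum = [r for r in data if r[0] == p]
    stratified_gaps.append(turnout([r for r in stratum if r[1]])
                           - turnout([r for r in stratum if not r[1]]))

print(round(naive_gap, 2))                     # large and misleading
print([round(g, 2) for g in stratified_gaps])  # each close to zero
```

Stratification like this is the simplest member of the family of statistical controls covered later in the series; it only works when the confounder is measured, which is exactly why randomization, which handles unmeasured confounders too, is so valuable.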
Before moving on to other videos, I encourage you to check your understanding using the quiz questions in this module.