Foundations of Quantitative Research in Political Science
In this video, I discuss how to deal with confounding variables using multivariate regression. Let's revisit an example that was mentioned in Alex's video about experiments. Here the dependent variable, again, is whether a person votes, and the independent variable is whether the person received a letter. We're going to bring up again the language of a treatment: the treatment here is sending someone a letter. Suppose we surveyed people and asked them whether they received a letter from a politician in the months before an election and whether they voted. We would not be able to use that data to conclude that receiving the letter is what caused the person to go out and vote. People who receive such letters are likely more predisposed to vote to begin with. Politicians may well be sending letters to people who have voted in the past and only need a nice little reminder to go vote in a particular election. If we could explore alternative realities with, say, a time machine, we could make sure to modify only one aspect and hold everything else fixed across the two realities. If we could alter only whether someone gets a letter, we would be able to causally identify the effect of the letter on turnout. Since we cannot do that, the next best way to find out the effect of getting a letter in the mail is to randomly assign who gets the letter and who doesn't. If the treatment is randomly assigned and the sample is large enough, we can be confident that the treatment group is, on average, similar to the control group.
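A minimal simulation sketch of the letter example, in Python with NumPy. Everything in it is a made-up assumption for illustration: the "predisposition" variable, the turnout probabilities, the 5-point letter effect, and the rule that politicians target people above a 0.6 predisposition. It is not data from any actual letter experiment. It compares a world where letters go to likely voters with a world where a coin flip decides who gets one.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000  # hypothetical number of registered voters

# Hypothetical predisposition to vote (e.g., past turnout), between 0 and 1.
predisposition = rng.uniform(0.0, 1.0, size=n)

TRUE_LETTER_EFFECT = 0.05  # assumed: a letter raises turnout probability by 5 points


def simulate_turnout(got_letter):
    """Each person votes with a probability that rises with predisposition and with the letter."""
    p_vote = 0.2 + 0.6 * predisposition + TRUE_LETTER_EFFECT * got_letter
    return rng.random(n) < p_vote


def difference_in_means(outcome, treated):
    """Mean outcome among the treated minus mean outcome among the controls."""
    return outcome[treated].mean() - outcome[~treated].mean()


# Observational world: politicians mail letters to people who already tend to vote.
targeted = predisposition > 0.6
# Experimental world: a coin flip decides who gets a letter.
randomized = rng.random(n) < 0.5

print("predisposition gap, targeted:  ", round(difference_in_means(predisposition, targeted), 3))
print("predisposition gap, randomized:", round(difference_in_means(predisposition, randomized), 3))
print("turnout gap, targeted:  ", round(difference_in_means(simulate_turnout(targeted), targeted), 3))
print("turnout gap, randomized:", round(difference_in_means(simulate_turnout(randomized), randomized), 3))
```

Under these assumptions, the targeted comparison shows a turnout gap of roughly 0.35. Most of that gap comes from the letter recipients' higher predisposition, not from the letter itself. The randomized comparison lands close to the assumed 0.05, because the coin flip leaves both groups with nearly the same average predisposition.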
Therefore, if there is a statistically significant difference in the dependent variable between the treatment and control groups, we can be confident that the difference was caused by the treatment. In this case, researchers can raise enough funds to send letters to a randomly selected group of voters and check voting records to see whether those who received letters were more likely to vote. There are also no ethical issues with sending letters that encourage some people to vote. Indeed, because the funds can be raised and no ethical issues prevent anybody from doing this, political scientists have run experiments in which they send letters to a randomly assigned group of voters. They have confirmed that getting a letter does make a person more likely to show up at the polls and vote. But it's not always going to be like this, because for some research questions, practical or ethical issues prevent the researcher from randomly assigning treatment. For example, suppose we wanted to identify whether investment in clean energy research has anything to do with carbon dioxide emissions. Ideally, we would take a randomly selected group of countries and increase investment in clean energy research in those countries. Even though this is not necessarily unethical, it would be enormously expensive. Think about the amount of money a researcher would have to raise to increase investment in clean energy research in a random group of countries. No one would be willing to fund something like this, because it would simply be too expensive. But multivariate regression is a way to emulate an experiment by taking confounding variables into consideration.
For example, in the clean energy research and carbon dioxide emissions example, where we brought up wealth as a confounding variable, someone could run a regression where the dependent variable is the level of carbon dioxide emissions, probably per capita. On the right-hand side we have clean energy research and the income of the country. By adding income to the regression, we estimate the effect of clean energy research on carbon dioxide emissions holding income constant. In other words, we're holding wealth constant: we're taking this confounding variable into consideration and only comparing countries with similar levels of wealth. Let's try to use more precise language and correct what the slide says: wealth and income are similar things, but they're not exactly the same thing. In any case, we would be controlling for wealth, and we would be detecting the effect of clean energy research on carbon dioxide emissions controlling for wealth. But confounding variables are everywhere. Wealth is not the only possible confounding variable that could be muddling the effect of our chosen independent variable on our dependent variable. I could quickly think of four possible confounding variables, and I'm sure that if we spent more time thinking about it, we could come up with many more. One confounding variable would be the amount of oil reserves that a country has. A second would be whether the population of the country is conservative, and therefore probably more opposed to laws and regulations that limit carbon dioxide emissions. The presence or absence of laws and regulations to curb carbon dioxide emissions could itself be a confounding variable, driving both the independent and the dependent variable. Finally, whether the government funds scientific research could also affect both the independent variable and the dependent variable.
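A minimal sketch of what "controlling for wealth" does, using simulated data. The data-generating process is entirely assumed: wealth raises both clean energy research and emissions, and research lowers emissions by a hypothetical 0.5 units. The large sample is there only to make the simulated estimates stable. Real cross-national data have far fewer observations. The regression is plain ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 50_000  # large simulated sample for stable estimates, not a realistic country count

TRUE_RESEARCH_EFFECT = -0.5  # assumed causal effect of research spending on emissions

# Hypothetical standardized wealth drives both research spending and emissions.
wealth = rng.normal(0.0, 1.0, size=n)
research = 0.8 * wealth + rng.normal(0.0, 1.0, size=n)
emissions = 1.5 * wealth + TRUE_RESEARCH_EFFECT * research + rng.normal(0.0, 1.0, size=n)


def ols_coefficients(y, *regressors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


naive = ols_coefficients(emissions, research)[1]               # emissions ~ research
controlled = ols_coefficients(emissions, research, wealth)[1]  # emissions ~ research + wealth

print(f"research coefficient, no controls:        {naive:.3f}")
print(f"research coefficient, controlling wealth: {controlled:.3f}")
```

With these assumed numbers, the bivariate regression even gets the sign wrong (around +0.23), because wealthy countries both research more and emit more. Adding wealth as a regressor brings the research coefficient back near the assumed -0.5.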
So these are only four examples out of many variables that could be acting as confounders if we were to study the determinants of carbon dioxide emissions by looking at investment in clean energy research. There can always be confounders that we haven't thought about, and also variables that we cannot observe or measure. These lurking confounding variables could always be present whenever we employ a multivariate regression. Even if we came up with measurements for all of those things (oil reserves, the ideology of the population, laws and regulations, government funding), this still wouldn't account for every possible confounding variable that compromises our ability to emulate an experiment. When you actually do an experiment with random assignment, you make sure that the treatment and control groups are similar to each other across every variable you could imagine, including variables that you can't observe, can't measure, or haven't thought about. With multivariate regression this is not the case, which means that the multivariate regression design has its limitations. It has its promise, emulating an experiment by controlling for some confounding variables, but it is limited because it can't control for all confounding variables. Still, sometimes a multivariate regression design is the best we can do with the data we have. Political scientists will sometimes use multivariate regression to try to detect relationships between variables because there is no other design available that would allow for causal identification, even though it cannot establish causation with the same certainty as an experiment.
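A minimal sketch of the limitation described above, again with simulated data and an assumed data-generating process. Oil reserves play the role of a lurking confounder: oil-rich countries are assumed to invest less in clean energy research and to emit more. The researcher's dataset contains wealth but not oil reserves. The coefficients, including the hypothetical true effect of -0.5, are illustrative choices, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 50_000  # large simulated sample for stable estimates, not a realistic country count

TRUE_RESEARCH_EFFECT = -0.5  # assumed causal effect of research spending on emissions

wealth = rng.normal(size=n)
oil_reserves = rng.normal(size=n)  # hypothetical confounder missing from our dataset

# Assumed: oil-rich countries invest less in clean energy research and emit more.
research = 0.8 * wealth - 0.6 * oil_reserves + rng.normal(size=n)
emissions = (1.5 * wealth + 1.0 * oil_reserves
             + TRUE_RESEARCH_EFFECT * research + rng.normal(size=n))


def ols_coefficients(y, *regressors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# What we can actually run: oil reserves are unobserved, so only wealth is controlled.
feasible = ols_coefficients(emissions, research, wealth)[1]
# What we could run only if we had measured the lurking confounder.
infeasible = ols_coefficients(emissions, research, wealth, oil_reserves)[1]

print(f"controlling for wealth only:        {feasible:.3f}")
print(f"controlling for wealth and oil too: {infeasible:.3f}")
```

Under these assumptions, controlling for wealth alone overstates the emission reduction (roughly -0.94 instead of -0.5), because the omitted oil reserves still push research and emissions in opposite directions. Only the regression that includes the unmeasured variable recovers the assumed effect. Random assignment would balance oil reserves automatically, without anyone having to measure them.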