Foundations of Quantitative Research in Political Science
- Once we have our research question, hypothesis, and theory, we want to collect data on our independent and dependent variables so that we can test our hypothesis. Collecting data, however, can be very challenging. While ideally we want to have data on every single unit that we are interested in, which we call a population, oftentimes that's not feasible. When we only have data on some of the units that we're interested in, which we call a sample, it is very important that we understand how that data was collected, because it matters for the conclusions that we can make.
In this video, we will cover populations, samples, sampling procedures, and what conclusions we can make from different types of samples. After completing this video, you will be able to define a population, define different types of samples, and explain how the sampling procedure affects the conclusions we can make.
Think about the 2020 election between Biden and Trump. As I record this, the election hasn't happened yet, it's a month away, but I really, really want to know who is going to win. I'm sure that you also wanted to know ahead of time who was going to win. Well, if I wanted to know who is going to win, I would have to ask every single person who is going to vote which candidate they are going to vote for. So in this case, I'm really interested in making conclusions about all voters.
In scientific language, we call who or what we want to study the population. A population is the entire group of units that we want to draw conclusions about. Take our example. If I really want to know who all voters will vote for, my population would be all voters. If instead I wanted to know who women were going to vote for, my population would be all women voters. If instead I'm really interested in how retirees will vote and who they will vote for, my population would be all retired voters.
Notice that the population changes depending on who or what we're interested in studying and making conclusions about. It's also very important to note that a population does not necessarily refer to people. A population can refer to countries, states, groups of people, organizations, et cetera. A population is defined by the units that we're interested in. For example, if I really wanted to know how cities in the United States spend their budgets, my population would be all cities in the United States.
However, while ideally we want to collect data on every single unit of our population, sometimes it's just not feasible. We simply do not have the time or the money to ask every single voter how they will vote in the upcoming presidential election. In this case, what can we do? I still really want to know who's going to win the election.
You might be thinking that we actually do have some idea about who's going to win and how people will vote. This is because you have likely seen an election poll in your newsfeed. Tune into any news station before the election and you'll hear something like, "According to our latest national poll, Biden has 49% of the vote while Trump has 42% of the vote." But how do they know? We know that they didn't ask every single voter how they were going to vote. So how can they make this claim? Well, the news station only asks some voters and uses this information to make conclusions about all voters.
The first step is taking a sample from the population that we are interested in. A sample is a subset of units taken from the population that we want to collect data on. Notice that a sample is always smaller than the population. Take our example. While we cannot ask every single voter who they will vote for, we can ask some voters. That would be our sample. But remember, we're interested in understanding the population. Can we use samples to learn about populations? The answer is yes.
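To make the difference between a population and a sample concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population of 100,000 voters, the candidate labels "A" and "B", the 52/48 split, and the sample size of 1,000 are all invented, and the sample here happens to be drawn at random.

```python
import random

# Hypothetical population: 100,000 voters, each labeled with a made-up
# candidate preference. These numbers are invented for illustration only.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = ["A"] * 52_000 + ["B"] * 48_000

# A sample is a subset of the population's units. Here we pick 1,000
# distinct voters by their position in the population list.
sample_ids = rng.sample(range(len(population)), k=1_000)
sample = [population[i] for i in sample_ids]

# Compare the share supporting candidate A in the population and the sample.
population_share_a = population.count("A") / len(population)
sample_share_a = sample.count("A") / len(sample)

print(f"Population size: {len(population)}, sample size: {len(sample)}")
print(f"Share for A in the population: {population_share_a:.3f}")
print(f"Share for A in the sample:     {sample_share_a:.3f}")
```

In practice we never get to see the population share, which is exactly why we rely on the sample share to learn about it.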
Drawing conclusions about populations using samples is called inference. However, we can only draw inferences from certain types of samples. This is why it's so important to understand how our data was collected. In this video, we will cover two broad types of samples: non-probability samples, which we cannot use to make inferences, and probability samples, which we can use to make inferences.
First, let's talk about non-probability samples. In non-probability samples, some units in the population have no chance of being selected into the sample, or their chance of being selected is unknown. This means that we cannot make inferences using non-probability samples. Let's say we have a left-leaning news agency that wants to conduct an electoral poll so it knows who will win the upcoming election. If their sample only comes from viewers of their news station, the sample will contain more left-leaning voters than there are in the general population. This means that the sample is not representative of the population. That is, our sample will look different from the population. When this is the case, data from our sample will show more support for the left-leaning candidate than there actually is in the population.
Now let's talk about probability samples. Probability samples are samples where everyone in the population has some known, nonzero probability of being selected into the sample. While there are many types of probability samples, here we will focus on simple random sampling. In a simple random sample, which I will simply refer to as a random sample, each unit in the population has the same probability of being selected into the sample. In other words, each unit in the population has an equal chance of being in the sample. Random samples are amazing because they tend to give us samples that are representative of the population. This means that our sample will look very similar to our population across various characteristics. This also means that we can use these random samples to make inferences about the population.
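The contrast between the two kinds of samples can be sketched with a small simulation in Python. This is only an illustration under stated assumptions: the population of 100,000 voters, the 50/50 split between left-leaning and right-leaning voters, and the viewing rates for the news station (30% of left-leaning voters, 5% of right-leaning voters) are all invented numbers, and the helper `share_left` is a hypothetical name.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 voters. Each voter has a leaning and a
# flag for whether they watch the left-leaning station. All rates are invented.
population = []
for _ in range(100_000):
    leaning = "left" if rng.random() < 0.50 else "right"
    # Assumption: left-leaning voters are far more likely to be viewers.
    watch_prob = 0.30 if leaning == "left" else 0.05
    population.append({"leaning": leaning, "viewer": rng.random() < watch_prob})


def share_left(voters):
    """Return the fraction of voters in the list who lean left."""
    return sum(v["leaning"] == "left" for v in voters) / len(voters)


# Non-probability sample (with respect to all voters): only viewers can be
# selected, so every non-viewer has no chance of being in the sample.
viewers = [v for v in population if v["viewer"]]
viewer_sample = rng.sample(viewers, k=1_000)

# Simple random sample: every voter has the same chance of being selected.
random_sample = rng.sample(population, k=1_000)

print(f"Left-leaning share, population:     {share_left(population):.3f}")
print(f"Left-leaning share, viewers only:   {share_left(viewer_sample):.3f}")
print(f"Left-leaning share, random sample:  {share_left(random_sample):.3f}")
```

Under these made-up rates, the viewers-only sample should overstate left-leaning support by a wide margin, while the random sample should land close to the population share.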
Think about our example. If we took a random sample of voters from the population of all voters, our sample would contain proportions of left-leaning and right-leaning voters very similar to those in the population. This property of random samples is amazing because it allows us to use samples to generalize and make conclusions about the population. So if we have a random sample of voters and we ask them who they will vote for in the upcoming presidential election, we can use that to make conclusions, with some degree of confidence, about how all voters will vote. Random samples are amazing, but they're not perfect. To learn why random samples are not perfect, please watch the next video in this module on the margin of error.
To recap, a population includes every unit that we want to study. A sample is a subgroup of a population. We really like random samples because they tend to give us samples that are representative of the population. This means that we can use random samples to make inferences about the population. Non-probability samples do not allow us to make inferences about the population. This is precisely why understanding how the data was collected is very important. You should now feel comfortable moving on to the next video in this module and working through some of the examples and quizzes in this module.