Foundations of Quantitative Research in Political Science
- In this course, you'll learn a lot about the basics of statistical analysis: things like means, medians, modes, samples, and confidence intervals. And this is great for answering questions like, "What are the odds that the thing I observed happened by chance?" as well as for generalizing a claim about some phenomenon from a sample to a population. But in this video, we're going to cover something a bit different from the basics of mathematical statistics. In this video, we're going to go over some of the fundamentals of creating a chart. While this is not a comprehensive overview of creating data visualizations, we will cover the basics that go into many of the charts and graphs you see in everyday life. After this video, you will be able to understand the basics of creating data visualizations and have a firm grasp of what details you should include when creating graphs and charts. It's common for students to mix up two different types of graphs that are often used to summarize univariate data: histograms and bar charts. This is understandable because, as you will see, they look very similar. But the key thing to know when you're considering using a bar chart, histogram, or any other kind of plot is that it really depends on the type of variable or variables you are interested in visualizing. Let's begin with visualizing categorical data. Remember, with categorical data, the values have no particular order. In other words, you can assign categories to the data, but the categories themselves don't have a natural ordering. So how do we graphically visualize categorical data? To do this, we turn to what are called bar charts. Let's take a closer look at an example from data on political parties in the National Congress of Brazil. In this bar chart, we can see that the X axis consists of the initials of different Brazilian political parties and the Y axis displays the number of seats that a particular party won in 2002.
This means that each bar represents a distinct category and the height represents the number of seats. Now, bar charts aren't only good for non-ordered categorical data. We can also use bar charts for ordinal data. Remember, with ordinal data, the data itself is categorical, but the categories have a natural ordering. Let's take a look at a simple example to see this in practice. This graph displays the responses from participants in a hypothetical survey that asks whether individuals thought defense spending was too little, about right, or too much. We see on the X axis the responses that survey participants were allowed to give, and we can see that, in this case, there's a natural ordering to the responses from low to high. We can also see that the height of the bars represents the proportion of people who chose a particular response. We can see in this example that approximately 30% of our survey respondents said that there was too little defense spending; a bit more than 30% said that defense spending was about right; and about 35% said there was too much defense spending. Before we move on, notice two things. First, in this graph, these proportions all add up to 100% to account for each individual that gave a response in the survey. Second, in both of these examples, the bars themselves are spaced apart, one for each category. We'll see in a second that this isn't the case for histograms. So what do we do when we want to make a graph of an interval or ratio variable, both of which are considered to be numerical variables? For these kinds of variables, we turn to histograms. Let's take a look at a simple example of a histogram that displays the age of people in a given room. We can see on this graph that the X axis is the age of people in the room. And on the Y axis is the number of people in each of these age categories. Notice another key feature of a histogram here.
With histograms, we typically choose a range of values for each one of these bins here. In this example, I chose an age range of five years for each age category. Notice also that each of these bins touches its neighbors. This is another difference between histograms and bar charts. In our bar chart examples, none of the bars touch one another. That's because with bar charts, we're interested in comparing individual categories or values, like the number of seats a party holds in the legislature, and with histograms, we're more interested in comparing grouped values. In these last two examples, we looked at basic ways to correctly plot observations of a single characteristic or attribute, such as the number of seats a political party has in the legislature or the number of people belonging to a particular age range. We call this univariate data because, in these cases, we are interested in only a single attribute. But what if we want to visualize a relationship between two different variables? For example, what if we want to see whether one variable is correlated with another variable? With data visualization, we can typically create a scatterplot to accomplish this if we have two continuous variables. Let's take a look at an example. In this table, I have two different variables. On the left is the number of ads a particular politician ran in a given race. On the right is the number of votes they won in the election. It's not hard to imagine that the number of ads they ran in the race might've had an effect on their election results and not the other way around. In other words, the number of votes that candidate received may have depended on the number of ads they ran before the election. We can see whether this is the case by taking our table of data and turning it into a scatterplot.
On the X axis, we're going to display our independent variable, which in this case is the number of ads a politician ran before the election, and on the Y axis, we're going to display our dependent variable, which is the number of votes that particular candidate received. To create a scatterplot, we first make a scale on both axes. For both scales, we usually start at either zero or the minimum value in the data and then run to either the maximum value in the data or a bit higher than that. For this example, let's put the minimum number of ads a candidate could potentially run at zero, since it's impossible to run a negative number of ads, and let's put the maximum number of ads run in a campaign cycle at 100. Let's do something similar for the number of votes a candidate received. Next, we put a data point at the intersection of the number of ads run and the number of votes received so that each point represents one pair of values. When we do this, the scatterplot looks something like this. We can see here a clear, positive relationship between the number of ads a candidate ran and the number of votes they received in the election. Before concluding this video, I want to emphasize one more thing. Notice that each of the examples I've gone through has some common features. For example, if you go back, notice that each of the X and Y axes has a label. Both axes also have some sort of scale to give you a sense of the minimum and maximum values of the variables under consideration. Each of these features gives consumers key information about how to read a chart or graph. Without these essential features, your readers will have no understanding of what's going on in your chart, regardless of whether you chose the right graph for the right variable. You might want to consider incorporating other features as well, such as legends and different colored lines or bars to more easily display information.
Unfortunately, we don't cover these additional features in this video, but I do encourage you to check out some of the external resources we've provided you so that you can get a better understanding of data visualization. I also strongly encourage you to take the quiz at the end of this video to strengthen your knowledge of this topic. Thank you.
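
The three preparation steps described in the video (counting categories for a bar chart, grouping numbers into bins for a histogram, and choosing axis scales for a scatterplot) can be sketched in plain Python. This is only a minimal illustration, not the method used to make the charts in the video: the function names `category_counts`, `histogram_counts`, and `axis_limits` are hypothetical, and every data value below is made up for the example. The sketch also assumes whole-number data and stops short of drawing anything.

```python
from collections import Counter


def category_counts(labels):
    """Bar chart prep: one count per distinct category."""
    return dict(Counter(labels))


def histogram_counts(values, bin_width):
    """Histogram prep: group whole numbers into equal-width bins.

    Each key is the lower edge of a bin covering [edge, edge + bin_width).
    Empty bins between the smallest and largest value are kept with a
    count of zero, because histogram bins sit next to each other.
    """
    first = (min(values) // bin_width) * bin_width
    last = (max(values) // bin_width) * bin_width
    counts = {edge: 0 for edge in range(first, last + bin_width, bin_width)}
    for value in values:
        counts[(value // bin_width) * bin_width] += 1
    return counts


def axis_limits(values, step, start_at_zero=True):
    """Scatterplot prep: run an axis from zero (or just below the minimum)
    to the next multiple of `step` above the maximum."""
    low = 0 if start_at_zero else (min(values) // step) * step
    high = (max(values) // step + 1) * step
    return low, high


# Hypothetical data, invented for this sketch.
responses = ["too little", "about right", "too much", "about right", "too much"]
ages = [18, 21, 22, 24, 25, 31, 33, 34, 47]
ads = [10, 25, 40, 60, 90]
votes = [1200, 2100, 3300, 4800, 7000]

print(category_counts(responses))         # one bar per survey response
print(histogram_counts(ages, 5))          # five-year age bins
print(axis_limits(ads, step=10))          # X axis range for ads
print(axis_limits(votes, step=1000))      # Y axis range for votes

# Each scatterplot point is one (ads, votes) pair.
points = list(zip(ads, votes))
print(points)
```

Keeping the empty bins in `histogram_counts` reflects the point that histogram bins touch one another, whereas `category_counts` only returns categories that actually occur, matching the separate bars of a bar chart.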