10.3 Sampling in quantitative research
Learning Objectives
- Describe how probability sampling differs from nonprobability sampling
- Define generalizability, and describe how it is achieved in probability samples
- Identify the various types of probability samples, and describe why a researcher may use one type over another
Quantitative researchers are often interested in making generalizations about groups that are larger than their study samples, which means that they seek nomothetic causal explanations. While there are certainly instances when quantitative researchers rely on nonprobability samples (e.g., when doing exploratory research), quantitative researchers tend to rely on probability sampling techniques. The goals and techniques associated with probability samples differ from those of nonprobability samples. We’ll explore those unique goals and techniques in this section.
Probability sampling
Unlike nonprobability sampling, probability sampling refers to sampling techniques for which a person’s likelihood of being selected from the sampling frame is known. We care about a potential participant’s likelihood of being selected for the sample because in most cases, researchers use probability sampling techniques to identify a representative sample from which to collect data. A representative sample resembles important characteristics of the population from which it was drawn, in ways that are important for the research being conducted. For example, if you wish to report about differences between men and women at the end of your study, you should ensure that your sample doesn’t contain only women. While that example is an oversimplification, representativeness means that your sample should contain the same sorts of variation that are present in your larger target population.
Obtaining a representative sample is important in probability sampling because of generalizability. In fact, generalizability is perhaps the key feature that distinguishes probability samples from nonprobability samples. Generalizability refers to the idea that a study’s results will tell us something about a group larger than the sample from which the findings were generated. In order to achieve generalizability, a core principle of probability sampling is that all elements in the researcher’s sampling frame have an equal chance of being selected for inclusion in the study. In research, this is the principle of random selection. Researchers use a computer’s random number generator to determine who from the sampling frame gets recruited into the sample.
Using random selection does not mean that your sample will be perfect. No sample is perfect. A researcher could obtain perfect results only by including everyone from the target population in the sample, which defeats the purpose of sampling. Generalizing from a sample to a population always contains some degree of error. This is referred to as sampling error, which is the statistical calculation of the difference between results from a sample and the actual parameters of a population.
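To make the idea of sampling error concrete, here is a minimal Python sketch. The population is made-up data (1,000 hypothetical students and the number of drinks each reports per week), and the variable names are my own. It is only an illustration of the definition above, not a procedure for estimating error in a real study.

```python
import random
import statistics

# Hypothetical population: weekly drink counts for 1,000 students (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(0, 15) for _ in range(1000)]

# The population parameter we would like to know.
population_mean = statistics.mean(population)

# Draw a random sample of 50 students and compute the same statistic.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample result and the population parameter.
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

If you change the seed or the sample size and run the sketch again, the sampling error changes too. In real research the population parameter is unknown, which is why the error has to be estimated statistically.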
Generalizability is a pretty easy concept to grasp. Imagine if a professor were to take a sample of individuals in your class to see if the material is too hard or too easy. However, the professor only sampled individuals that had grades over 90% in the class. Would that be a representative sample of all students in the class? That would be a case of sampling error—a mismatch between the results of the sample and the true feelings of the overall class. In other words, the results of the professor’s study don’t generalize to the overall population of the class.
Taking this one step further, imagine your professor is conducting a study on binge drinking among college students. The professor uses undergraduates at your school as their sampling frame. Even if the professor used probability sampling methods, your school may differ from other schools in important ways. There are schools that are “party schools” where binge drinking may be more socially accepted, “commuter schools” at which there is little nightlife, and so on. If your professor plans to generalize their results to all college students, then they will have to make an argument that their sampling frame (undergraduates at your school) is representative of the population (all undergraduate college students).
Types of probability samples
There are a variety of probability samples that researchers may use. These include simple random samples, systematic samples, stratified samples, and cluster samples. Let’s build on the previous example. Imagine we were concerned with binge drinking and chose the target population of fraternity members. How might you go about getting a probability sample of fraternity members that is representative of the overall population?
Simple random samples are the most basic type of probability sample. A simple random sample requires a real sampling frame, that is, an actual list of each person in the sampling frame. Your school likely has a list of all of the fraternity members on campus, as Greek life is subject to university oversight. You could use the university list as your sampling frame. From there, you would sequentially assign a number to each fraternity member, or element, and then randomly select the elements from which you will collect data.
True randomness is difficult to achieve, and it takes complex computational calculations to do so. Although you think you can select things at random, human-generated randomness is actually quite predictable, as it falls into patterns called heuristics. Researchers must rely on computer-generated assistance to attain a truly random selection. You can utilize one of the many free websites that provide quality pseudo-random number generators. I suggest using Random.org, which contains a random number generator that can also randomize lists of participants. Sometimes, researchers use a table of numbers that have been generated randomly. There are several possible sources for obtaining a random number table. Some statistics and research methods textbooks offer such tables as appendices to the text.
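If you are comfortable with a little code, a simple random sample can also be sketched in a few lines of Python. The sampling frame below is hypothetical (100 numbered placeholders rather than real names), the function name is my own, and Python's built-in random module is a pseudo-random generator, so treat this as an illustration rather than a recommended tool.

```python
import random

# Hypothetical sampling frame: 100 numbered fraternity members.
sampling_frame = [f"Member {n}" for n in range(1, 101)]

def simple_random_sample(frame, sample_size, seed=None):
    """Select sample_size elements so that every element has an equal chance."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(frame, sample_size)  # sampling without replacement

selected = simple_random_sample(sampling_frame, 25, seed=2024)
print(selected[:5])  # peek at the first five selected members
```

Recording the seed lets another researcher reproduce exactly the same selection from the same list.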
As you might have guessed, drawing a simple random sample can be quite tedious. Systematic sampling techniques are somewhat less tedious but offer the benefits of a random sample. As with simple random samples, you must possess a list of everyone in your sampling frame. To draw a systematic sample, you’d simply select every kth element on your sampling frame list. If you are unfamiliar with this concept, k is your selection interval or the distance between the elements you select for inclusion in your study. To begin the selection process, you’ll need to figure out how many elements you wish to include in your sample. Let’s say you want to interview 25 fraternity members on your campus, and there are 100 men on campus who are members of fraternities. In this case, your selection interval, or k, is 4. To arrive at 4, simply divide the total number of population elements by your desired sample size (100 divided by 25). This process is represented in Figure 10.2.
To determine where to begin selecting the 25 names from your list of 100, begin by selecting a number between 1 and k. If we select 3 as our starting point, we’d begin by selecting the third fraternity member on the list and then select every fourth member from there. This might be easier to understand if you can see it visually. Table 10.2 lists the names of our hypothetical 100 fraternity members on campus. You’ll see that the third name on the list has been selected for inclusion in our hypothetical study, as has every fourth name after that. A total of 25 names have been selected.
Number | Name | Include in study? | Number | Name | Include in study?
1 | Jacob | No | 51 | Blake | Yes
2 | Ethan | No | 52 | Oliver | No
3 | Michael | Yes | 53 | Cole | No
4 | Jayden | No | 54 | Carlos | No
5 | William | No | 55 | Jaden | Yes
6 | Alexander | No | 56 | Jesus | No
7 | Noah | Yes | 57 | Alex | No
8 | Daniel | No | 58 | Aiden | No
9 | Aiden | No | 59 | Eric | Yes
10 | Anthony | No | 60 | Hayden | No
11 | Joshua | Yes | 61 | Brian | No
12 | Mason | No | 62 | Max | No
13 | Christopher | No | 63 | Jaxon | Yes
14 | Andrew | No | 64 | Brian | No
15 | David | Yes | 65 | Mathew | No
16 | Logan | No | 66 | Elijah | No
17 | James | No | 67 | Joseph | Yes
18 | Gabriel | No | 68 | Benjamin | No
19 | Ryan | Yes | 69 | Samuel | No
20 | Jackson | No | 70 | John | No
21 | Nathan | No | 71 | Jonathan | Yes
22 | Christian | No | 72 | Liam | No
23 | Dylan | Yes | 73 | Landon | No
24 | Caleb | No | 74 | Tyler | No
25 | Lucas | No | 75 | Evan | Yes
26 | Gavin | No | 76 | Nicholas | No
27 | Isaac | Yes | 77 | Braden | No
28 | Luke | No | 78 | Angel | No
29 | Brandon | No | 79 | Jack | Yes
30 | Isaiah | No | 80 | Jordan | No
31 | Owen | Yes | 81 | Carter | No
32 | Conner | No | 82 | Justin | No
33 | Jose | No | 83 | Jeremiah | Yes
34 | Julian | No | 84 | Robert | No
35 | Aaron | Yes | 85 | Adrian | No
36 | Wyatt | No | 86 | Kevin | No
37 | Hunter | No | 87 | Cameron | Yes
38 | Zachary | No | 88 | Thomas | No
39 | Charles | Yes | 89 | Austin | No
40 | Eli | No | 90 | Chase | No
41 | Henry | No | 91 | Sebastian | Yes
42 | Jason | No | 92 | Levi | No
43 | Xavier | Yes | 93 | Ian | No
44 | Colton | No | 94 | Dominic | No
45 | Juan | No | 95 | Cooper | Yes
46 | Josiah | No | 96 | Luis | No
47 | Ayden | Yes | 97 | Carson | No
48 | Adam | No | 98 | Nathaniel | No
49 | Brody | No | 99 | Tristan | Yes
50 | Diego | No | 100 | Parker | No
In case you’re wondering how I came up with 100 names for this table, I’ll let you in on a little secret: lists of popular baby names can be great resources for researchers. I used the list of top 100 names for boys based on Social Security Administration statistics for this table. I often use baby name lists to come up with pseudonyms for field research subjects and interview participants. See Family Education. (n.d.). Name lab. Retrieved from http://baby-names.familyeducation.com/popular-names/boys.
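The systematic selection described above can also be sketched in Python. The function name and the use of plain numbers as stand-ins for the 100 names are my own choices for illustration. The starting point of 3 is passed in by hand to match the example, although in practice you would let it be chosen at random.

```python
import random

def systematic_sample(frame, sample_size, start=None):
    """Select every kth element after a starting point between 1 and k."""
    k = len(frame) // sample_size      # selection interval
    if start is None:
        start = random.randint(1, k)   # random start, from 1 to k inclusive
    # The text counts positions from 1, so subtract 1 for Python's 0-based lists.
    return frame[start - 1::k][:sample_size]

member_numbers = list(range(1, 101))   # stand-ins for the 100 names on the list
chosen = systematic_sample(member_numbers, 25, start=3)
print(chosen)
```

With a starting point of 3 and an interval of 4, the sketch selects positions 3, 7, 11, and so on through 99, for a total of 25 elements.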
There is one clear instance in which systematic sampling should not be employed. If your sampling frame has any pattern to it, you could inadvertently introduce bias into your sample by using a systematic sampling strategy. (Bias will be discussed in more depth in the next section.) This is sometimes referred to as the problem of periodicity. Periodicity refers to the tendency for a pattern to occur at regular intervals. Let’s say, for example, that you wanted to observe campus binge drinking on different days of the week. Perhaps you need to have your observations completed within 28 days and you wish to conduct four observations on randomly chosen days. Table 10.3 shows a list of the population elements for this example. To determine which days we’ll conduct our observations, we’ll need to determine our selection interval. As you’ll recall from the preceding paragraphs, we find our selection interval by dividing our population size (28-day period) by our desired sample size (4 observation days). This formula leads us to a selection interval of 7. If we randomly select 2 as our starting point and select every seventh day after that, we’ll wind up with a total of 4 days on which to conduct our observations. You’ll see how that works out in the following table.
Day # | Day | Drinking | Observe? | Day # | Day | Drinking | Observe?
1 | Monday | Low | No | 15 | Monday | Low | No
2 | Tuesday | Low | Yes | 16 | Tuesday | Low | Yes
3 | Wednesday | Low | No | 17 | Wednesday | Low | No
4 | Thursday | High | No | 18 | Thursday | High | No
5 | Friday | High | No | 19 | Friday | High | No
6 | Saturday | High | No | 20 | Saturday | High | No
7 | Sunday | Low | No | 21 | Sunday | Low | No
8 | Monday | Low | No | 22 | Monday | Low | No
9 | Tuesday | Low | Yes | 23 | Tuesday | Low | Yes
10 | Wednesday | Low | No | 24 | Wednesday | Low | No
11 | Thursday | High | No | 25 | Thursday | High | No
12 | Friday | High | No | 26 | Friday | High | No
13 | Saturday | High | No | 27 | Saturday | High | No
14 | Sunday | Low | No | 28 | Sunday | Low | No
Do you notice any problems with our selection of observation days in Table 10.3? Apparently, we’ll only be observing on Tuesdays. Moreover, Tuesdays may not be an ideal day to observe binge drinking behavior. Unless alcohol consumption patterns have changed significantly since I was in my undergraduate program, I would assume binge drinking is more likely to happen over the weekend.
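The periodicity problem is easy to reproduce in a short Python sketch. The code below rebuilds the 28-day frame from the example (assuming, as in the table, that day 1 is a Monday) and applies a selection interval of 7. The function and variable names are my own.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# The 28-day sampling frame: day number -> day of the week (day 1 is a Monday).
frame = {day: DAY_NAMES[(day - 1) % 7] for day in range(1, 29)}

def systematic_days(start, sample_size=4):
    """Pick every kth day number, beginning at the given starting day."""
    k = len(frame) // sample_size  # 28 / 4 gives a selection interval of 7
    return list(range(start, len(frame) + 1, k))[:sample_size]

observed = systematic_days(start=2)
print(observed)                          # the day numbers selected
print({frame[day] for day in observed})  # the weekdays those numbers fall on

# No starting point rescues us: the interval matches the weekly cycle.
for start in range(1, 8):
    print(start, {frame[day] for day in systematic_days(start)})
```

Starting at day 2 selects days 2, 9, 16, and 23, all of them Tuesdays. Because the interval of 7 lines up with the 7-day week, every possible starting point yields four observations on a single day of the week.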
In cases such as this, where the sampling frame is cyclical, it would be better to use a stratified sampling technique. In stratified sampling, a researcher will divide the study population into relevant subgroups and then draw a sample from each subgroup. In this example, we might wish to first divide our sampling frame into two lists: weekend days and weekdays. Once we have our two lists, we can then apply either simple random or systematic sampling techniques to each subgroup.
Stratified sampling is a good technique to use when a subgroup of interest makes up a relatively small proportion of the overall sample, like in our previous example. In our example study of binge drinking, we want to include both weekdays and weekends in our sample. Since weekends make up less than a third of an entire week, there’s a chance that a simple random or systematic strategy would not yield sufficient weekend observation days. As you might imagine, stratified sampling is even more useful in cases where a subgroup makes up an even smaller proportion of the sampling frame—for example, students who are in their fifth year of their undergraduate program make up only a small percentage of the population of undergraduate students. While using a simple random or systematic sampling strategy may not yield any fifth-year students, utilization of stratified sampling methods would ensure that our sample contained the proportion of fifth-year students that is reflective of the larger population.
In this case, class year (e.g., freshman, sophomore, junior, senior, and fifth-year) is our strata, or the characteristic by which the sample is divided. When we use stratified sampling, we are concerned with how well our sample reflects the population. A sample with too many freshmen may skew our results in one direction because perhaps they binge drink more (or less) than students in other class years. Using stratified sampling allows us to make sure our sample has the same proportion of people from each class year as the overall population of the school.
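Here is a minimal Python sketch of proportionate stratified sampling by class year. The enrollment counts are invented for illustration, and the function name and its parameters are my own. The idea is simply to split the frame into strata and then draw a random sample from each stratum in proportion to its share of the population.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, sample_size, seed=None):
    """Proportionate stratified sample: each stratum keeps its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)  # group elements by stratum
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from sample_size
        # when the shares do not divide evenly.
        n = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical enrollment by class year (made-up numbers, 1,000 students).
enrollment = {"freshman": 300, "sophomore": 250, "junior": 220,
              "senior": 180, "fifth-year": 50}
students = [(year, i) for year, count in enrollment.items() for i in range(count)]

sample = stratified_sample(students, stratum_of=lambda s: s[0],
                           sample_size=100, seed=1)
print(Counter(year for year, _ in sample))
```

With these made-up counts, a sample of 100 contains 30 freshmen, 25 sophomores, 22 juniors, 18 seniors, and 5 fifth-year students, which mirrors each group's share of the 1,000 students.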
Up to this point in our discussion of probability samples, we’ve assumed that researchers will be able to access a list of population elements in order to create a sampling frame. As you might imagine, this is not always the case. Let’s say that you want to conduct a study of binge drinking across fraternity members at each undergraduate program in your state. Imagine how difficult it would be to create a list of every single fraternity member in the state. Even if you could find a way to generate such a list, attempting to do so might not be the most practical use of your time or resources. When this is the case, researchers turn to cluster sampling. A researcher using the cluster sampling method will begin by sampling groups (or clusters) of population elements and then selecting elements from within those groups.
Let’s work through how we might use cluster sampling in our study of binge drinking. While creating a list of all fraternity members in your state would be next to impossible, you could easily create a list of all undergraduate colleges in your state. Thus, you could draw a random sample of undergraduate colleges (your clusters) and then draw another random sample of elements (in this case, fraternity members) from within the undergraduate colleges you initially selected. Cluster sampling works in stages. In this example, we sampled in two stages: (1) undergraduate colleges and (2) fraternity members at the undergraduate colleges we selected. However, we could add another stage if it made sense to do so. We could randomly select (1) undergraduate colleges, (2) specific fraternities at each school, and (3) individual fraternity members. It is worthwhile to note that each stage is subject to its own sampling error, so choosing to sample in multiple stages could yield greater error. Nevertheless, cluster sampling is a highly efficient method.
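The two-stage process can be sketched in Python as follows. Everything in the data is hypothetical: I invent 20 colleges, each with a made-up membership list of between 40 and 120 fraternity members. In a real study you would only need to obtain membership lists for the colleges selected in the first stage.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 20 colleges, each with its own list of members.
colleges = {
    f"College {c}": [f"College {c} member {m}"
                     for m in range(1, rng.randint(40, 120) + 1)]
    for c in range(1, 21)
}

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample elements within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]  # only the selected clusters need a list
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return chosen, sample

chosen_colleges, members_sampled = two_stage_cluster_sample(
    colleges, n_clusters=5, n_per_cluster=10, rng=rng)
print(chosen_colleges)
print(len(members_sampled))
```

A third stage (fraternities within each college) could be added by nesting one more round of random selection between the two shown here.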
Jessica Holt and Wayne Gillespie (2008) [2] used cluster sampling in their study of students’ experiences with violence in intimate relationships. Specifically, the researchers randomly selected 14 classes on their campus and then drew a random subsample of students from those classes. You probably know that college classes are different sizes. If the researchers had simply selected 14 classes at random and selected the same number of students from each class to complete their survey, then students in the smaller classes would have had a greater chance of being selected for the study than students in larger classes. Keep in mind, the goal of random sampling is to ensure that each element has the same chance of being selected. When clusters are different sizes, as in the previous example, researchers often use a method called probability proportionate to size (PPS). This means that they account for the different sizes of their clusters by giving the clusters different chances of being selected based on their size so that each element within those clusters has an equal chance of being selected.
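A little arithmetic shows why PPS evens things out. The Python sketch below uses three hypothetical classes of different sizes and makes a simplifying assumption that is mine, not Holt and Gillespie's: only one class is selected, and 10 students are then drawn from it. It compares each student's overall chance of selection when every class is equally likely to be picked against the chance under PPS.

```python
from fractions import Fraction

# Hypothetical classes and their enrollments (made-up numbers).
class_sizes = {"small seminar": 20, "mid-size course": 50, "large lecture": 130}
students_per_class = 10
total_students = sum(class_sizes.values())  # 200 students in all

def chance_equal_clusters(size):
    """Each class is equally likely to be picked; then 10 students are drawn."""
    return Fraction(1, len(class_sizes)) * Fraction(students_per_class, size)

def chance_pps(size):
    """A class is picked in proportion to its size; then 10 students are drawn."""
    return Fraction(size, total_students) * Fraction(students_per_class, size)

for name, size in class_sizes.items():
    print(name, chance_equal_clusters(size), chance_pps(size))
```

When classes are equally likely to be picked, a student in the 20-person seminar has a 1 in 6 chance of selection, while a student in the 130-person lecture has only a 1 in 39 chance. Under PPS the class size cancels out, and every student has the same 1 in 20 chance.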
To summarize, probability samples allow a researcher to make conclusions about larger groups. Probability samples require a sampling frame from which elements, usually human beings, can be selected at random. Even though probability samples have less error and bias than nonprobability samples, some error will always remain. If researchers utilize a random number table or generator, then they can accurately state that their sample represents the population from which it was drawn. This strength is common to all probability sampling approaches summarized in Table 10.4.
Sample type | Description
Simple random | Researcher randomly selects elements from sampling frame.
Systematic | Researcher selects every kth element from sampling frame.
Stratified | Researcher creates subgroups then randomly selects elements from each subgroup.
Cluster | Researcher randomly selects clusters then randomly selects elements from selected clusters.
In determining which probability sampling approach makes the most sense for your project, it helps to have a strong understanding of your population. Simple random samples and systematic samples are relatively easy to carry out because they both require only a list of all elements in your sampling frame, but systematic sampling is slightly easier because, once you have chosen a starting point, it does not require you to use a random number generator. Instead, you can use a sampling interval that is simple to calculate by hand.
The relative simplicity of both approaches is counterweighted by their lack of sensitivity to the characteristics of your study’s target population. Stratified samples can better account for periodicity by creating strata that reduce or eliminate the effects of periodicity. Stratified samples also ensure that smaller subgroups are included in your sample, thus making your sample more representative of the overall population. While these benefits are important, creating strata for this purpose requires knowing information about your population before beginning the sampling process. In our binge drinking example, we would need to know how many students are in each class year to make sure our sample contained the same proportions. For example, we would need to know that fifth-year students make up 5% of the student population to ensure that 5% of our sample is comprised of fifth-year students. If the true population parameters are unknown, stratified sampling becomes significantly more challenging.
Each of the previous probability sampling approaches requires using a real list of all elements in your sampling frame. However, cluster sampling is different because it allows a researcher to perform probability sampling when a list of elements is not available or practical to create. Cluster sampling is also useful for making claims about a larger population, like all fraternity members in a state. However, there is a greater chance of sampling error, as sampling occurs at multiple stages in the process, like at the university level and student level. For many researchers, this weakness is outweighed by the benefits of cluster sampling.
Key Takeaways
- The goal of probability sampling is to identify a sample that resembles the population from which it was drawn.
- There are several types of probability samples including simple random samples, systematic samples, stratified samples, and cluster samples.
- Probability samples usually require a real list of elements in your sampling frame, though cluster sampling can be conducted without one.
Glossary
Cluster sampling– a sampling approach that begins by sampling groups (or clusters) of population elements and selecting elements from within those groups
Generalizability– the idea that a study’s results will tell us something about a group larger than the sample from which the findings were generated
Periodicity– the tendency for a pattern to occur at regular intervals
Probability proportionate to size– in cluster sampling, giving clusters different chances of being selected based on their size so that each element within those clusters has an equal chance of being selected
Probability sampling– sampling approaches for which a person’s likelihood of being selected from the sampling frame is known
Random selection– using randomly generated numbers to determine who from the sampling frame gets recruited into the sample
Representative sample– a sample that resembles the population from which it was drawn in all the ways that are important for the research being conducted
Sampling error– a statistical calculation of the difference between results from a sample and the actual parameters of a population
Simple random sampling– selecting elements from a list using randomly generated numbers
Strata– the characteristic by which the sample is divided
Stratified sampling– dividing the study population into relevant subgroups and then drawing a sample from each subgroup
Systematic sampling– selecting every kth element from a list
- Figure 10.2 copied from Blackstone, A. (2012) Principles of sociological inquiry: Qualitative and quantitative methods. Saylor Foundation. Retrieved from: https://saylordotorg.github.io/text_principles-of-sociological-inquiry-qualitative-and-quantitative-methods/ Shared under CC-BY-NC-SA 3.0 License (https://creativecommons.org/licenses/by-nc-sa/3.0/)
- Holt, J. L., & Gillespie, W. (2008). Intergenerational transmission of violence, threatened egoism, and reciprocity: A test of multiple psychosocial factors affecting intimate partner violence. American Journal of Criminal Justice, 33, 252–266.