Why Sampling Works

Have you ever wondered how a pollster can claim that a poll of 1,000 people accurately represents the views and experiences of the 210 million adults living in this country?

What is the population? What is the sample?

In polls, we often talk about representing the views of ALL adults in the country; during the election season, we may talk instead about representing the views of ALL registered voters or ALL likely voters. But the population can be anything; it just needs to be defined clearly. For example, our population of interest could be women, young men, teachers, or small business owners. 

Appropriately selecting the sample is considerably harder and involves more steps than simply defining the population.

When selecting a sample, a pollster needs to make sure that every adult in the population has a chance of making it into the sample. This is the basis of any probability method: establishing a mechanism so that every element in the defined population has a known, nonzero probability of selection. In some probability methods every person has an equal probability of selection, and in others the probabilities differ from person to person. All that is necessary for the sample to represent the population is that each person's probability is known and that no one is left out.
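To illustrate why unequal probabilities are acceptable as long as they are known, here is a minimal sketch in Python. It assumes a common approach in which each respondent is weighted by the inverse of his or her selection probability. The function name, the probabilities, and the answers are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def weighted_proportion(responses):
    """Estimate the share answering "yes" from a probability sample.

    responses: list of (answer, selection_probability) pairs, where
    answer is 1 for yes and 0 for no. Each respondent is weighted by
    1 / selection_probability, so people who were less likely to be
    selected stand in for more members of the population.
    """
    total_weight = 0.0
    weighted_yes = 0.0
    for answer, prob in responses:
        if prob <= 0:
            raise ValueError("every selection probability must be known and above zero")
        weight = 1.0 / prob
        total_weight += weight
        weighted_yes += weight * answer
    return weighted_yes / total_weight


# Hypothetical sample: one group was deliberately sampled at a higher rate.
sample = (
    [(1, 0.02)] * 3 + [(0, 0.02)] * 1      # oversampled group: 3 yes, 1 no
    + [(1, 0.005)] * 1 + [(0, 0.005)] * 3  # everyone else: 1 yes, 3 no
)

unweighted = sum(answer for answer, _ in sample) / len(sample)
print(unweighted)                   # 0.5
print(weighted_proportion(sample))  # 0.35
```

The unweighted figure treats every respondent alike and so overstates the oversampled group. The weighted figure corrects for that, which is only possible because the selection probabilities are known.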

One common way pollsters conduct probability-based sampling is to use random digit dialing, so that the sampling frame is every household in the United States with a working telephone. But since different households have different numbers of adults living in them, the pollster must also randomly select one adult to interview in each of the randomly selected households. One way to do this is to ask to speak to the adult who had the most recent birthday (birthdays are randomly distributed). If, instead, the pollster just interviewed whoever happened to answer the phone, the sample would not be truly representative of the whole population, because only the adults who tend to answer the phone would have had a chance to be in the sample. Furthermore, the sample would most likely overrepresent women and older people, who are often at home more than men or younger adults.
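The two-stage selection described above can be sketched in Python. This is a simplified illustration, not a description of any pollster's actual procedure. It assumes one household is drawn with equal probability from a tiny, made-up frame, and that the "most recent birthday" rule behaves like an equal-chance draw among the adults in that household. The household members and the function name are hypothetical.

```python
from fractions import Fraction


def selection_probabilities(households):
    """Return each adult's chance of being the respondent on one call.

    households: list of lists of adult names (hypothetical data); every
    household is assumed to contain at least one adult.
    Stage 1 picks one household with equal probability.
    Stage 2 picks one adult with equal probability inside that household.
    """
    p_household = Fraction(1, len(households))
    probs = {}
    for adults in households:
        p_within = Fraction(1, len(adults))
        for adult in adults:
            probs[adult] = p_household * p_within
    return probs


households = [["Ana"], ["Ben", "Cho"], ["Dee", "Eli", "Fay"]]
probs = selection_probabilities(households)
for adult, p in probs.items():
    print(adult, p)
# Ana 1/3
# Ben 1/6
# Cho 1/6
# Dee 1/9
# Eli 1/9
# Fay 1/9
```

Adults in larger households are less likely to be chosen, but every adult has a chance, and that chance is known. If the interviewer simply took whoever answered the phone, the second-stage probabilities would be unknown, and some adults might have no chance at all.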

The key advantage of a probability-based sample is that we can calculate how likely it is that the findings from the sample accurately represent the full population. That is, we can calculate the margin of sampling error, which is basically the “price we pay” for not interviewing every member of the population. (Read more about the margin of sampling error.)
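As a rough illustration of that calculation, the sketch below uses the standard textbook approximation for the margin of sampling error of a percentage at the 95% confidence level. It assumes a simple random sample, which is a simplification, since real polls often use more complex designs that widen the margin somewhat. The function name and the default values are illustrative choices.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of sampling error for a proportion.

    n: number of completed interviews
    p: the proportion being estimated (0.5 gives the widest margin)
    z: 1.96 corresponds to the 95% confidence level
    Assumes a simple random sample from a very large population.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1))
# 100 9.8
# 500 4.4
# 1000 3.1
# 2000 2.2
```

Under these assumptions, a poll of 1,000 people carries a margin of about plus or minus 3 percentage points. The size of the population does not appear in the formula, which is why 1,000 interviews can describe 210 million adults about as well as they would describe a much smaller group. Cutting the margin in half requires roughly four times as many interviews.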

But first it is important to know what a non-probability-based sample “looks like.” Basically, whenever a sample is chosen without specifying the target population and without knowing the probability of selecting a given respondent, it is a non-probability sample. The challenge with these types of samples is that there is no way to know how well (or how badly, as the case may be) such a sample represents the views of the total population. Internet opt-in surveys (a “take this poll” or “tell us what you think” box on a website) are perhaps the most obvious examples. In these cases, the pollster has no idea who is responding to the question. The “sample” is the people who happened to be on that website and decided to answer the questions. There is no way to know who answered and who skipped, and no way to know what the “total population” would be. Mall intercept surveys, American Idol voting, and call-in polls are other examples.

This information was developed by AAPOR as part of a comprehensive online journalism polling course created in partnership with NewsU, a project of the Poynter Institute, and funded by the Knight Foundation. The course launched in September 2007.