American Association for Public Opinion Research

Good Samples

Almost every poll involves a sample of people drawn from a relevant population, the one to which the researcher wants to make an inference. According to probability theory, such a sample will represent the relevant population if every member of the population has a known, non-zero chance of being selected. This does not require every member to have an equal chance of being selected; equal chances are a special case of the requirement.

Samples are drawn from a sample frame, a list of all members of the relevant population. For a pre-election poll, for example, this could be a list of registered voters. A good frame is complete (includes everyone), contains no extraneous elements (those not in the relevant population), and contains no duplicates (people listed more than once).
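The three criteria for a good frame can be checked mechanically when a reference list of the population is available. The following is a minimal sketch under that assumption (in practice such a reference list often does not exist); the function name audit_frame and the toy names are hypothetical and used only for illustration.

```python
from collections import Counter


def audit_frame(frame, population):
    """Compare a sample frame against a reference list of the population.

    Reports the three defects named in the text: missing members
    (the frame is incomplete), extraneous entries, and duplicates.
    """
    counts = Counter(frame)
    listed = set(counts)
    target = set(population)
    return {
        "missing": sorted(target - listed),
        "extraneous": sorted(listed - target),
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
    }


# Hypothetical toy data: names stand in for population members.
population = ["Ana", "Ben", "Cy", "Dee"]
frame = ["Ana", "Ben", "Ben", "Acme Plumbing"]

report = audit_frame(frame, population)
print(report)
```

Here the frame misses two people, lists one business that is not part of the population, and lists one person twice, so it fails all three criteria.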

By these criteria, a telephone directory is not a good frame for a telephone survey. It does not include all adults (those without a phone are missing, as are those with unlisted numbers or those who have only a cell phone); it sometimes has extraneous numbers (businesses may be listed in the white pages, especially doctors, lawyers and Realtors); and it might include more than one number for a person or family. A phone book is also out of date: it omits people who recently moved into an area and still lists numbers for those who have moved away.

One kind of frame used for pre-election polls is a list of registered voters. These lists are kept and sold by local election officials; they are not national lists. A list of registered voters can be a good frame for a pre-election poll, but it has to be supplemented with phone numbers obtained from other lists and sources.

However, for many populations, such as all adults in the United States, there is no such list. To overcome these frame problems, the typical telephone poll uses a Random Digit Dialing (RDD) sample, in which a computer generates the phone numbers from known area codes and prefixes. In addition to correcting for the problems of the phone book, this gives every phone number the same probability of selection, a design known as a simple random sample.
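The idea behind RDD can be sketched in a few lines. This is an illustration only, not a description of any polling firm's procedure: it assumes 10-digit North American numbers, a list of distinct (area code, prefix) pairs supplied by the researcher, and a hypothetical function name rdd_sample. The prefixes shown are made up (555 exchanges are conventionally used for fictional numbers).

```python
import random


def rdd_sample(prefixes, n, seed=None):
    """Draw n distinct phone numbers from the given (area code, prefix) pairs.

    Each prefix covers 10,000 possible four-digit suffixes. Sampling
    uniformly from all of them gives every number in the covered range
    the same chance of selection, whether or not it is listed.
    """
    rng = random.Random(seed)
    total = len(prefixes) * 10_000
    if n > total:
        raise ValueError("cannot draw more numbers than the prefixes cover")
    # Simple random sample, without replacement, of positions in the range.
    picks = rng.sample(range(total), n)
    numbers = []
    for idx in picks:
        area, prefix = prefixes[idx // 10_000]
        numbers.append(f"{area}-{prefix}-{idx % 10_000:04d}")
    return numbers


# Hypothetical prefixes, assumed to be distinct.
prefixes = [("734", "555"), ("313", "555")]
sample = rdd_sample(prefixes, 5, seed=1)
print(sample)
```

Many generated numbers will turn out to be businesses or not in service, so the interviewer still has to screen each one when it is called.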

When these numbers are called, the interviewer has to determine how many adults live in that household; this determines the probability of selection of any individual. Once a household is reached, an adult living alone has a probability of 1.0 of being interviewed; an adult living with one other adult has a probability of 0.5. The respondent should be selected at random, and there are a variety of ways to do this (often the “most recent birthday” method is used). Even though each phone number has the same probability of being selected, the individual respondents do not. Their chance depends on how many adults live in the household and how many telephone numbers come into the house.
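The arithmetic of within-household selection can be made explicit. The sketch below assumes one adult is chosen uniformly at random per household and that a household's chance of being reached is proportional to its number of phone lines; the function names are hypothetical, and the uniform random choice is a stand-in for a field method such as "most recent birthday."

```python
import random


def within_household_probability(n_adults):
    """Chance that a given adult is the one interviewed, once the household is reached."""
    if n_adults < 1:
        raise ValueError("a household needs at least one adult")
    return 1.0 / n_adults


def design_weight(n_adults, n_phone_lines=1):
    """Relative weight that undoes unequal selection chances.

    An adult's relative chance of selection is n_phone_lines / n_adults,
    so the weight is the inverse of that ratio.
    """
    if n_adults < 1 or n_phone_lines < 1:
        raise ValueError("counts must be at least 1")
    return n_adults / n_phone_lines


def select_respondent(adults, rng=random):
    """Pick one adult at random (a stand-in for the birthday method)."""
    return rng.choice(adults)


print(within_household_probability(1))  # 1.0
print(within_household_probability(2))  # 0.5
print(design_weight(2))                 # 2.0
print(design_weight(2, 2))              # 1.0
```

An adult in a two-adult, one-line household is half as likely to be interviewed as an adult living alone, so that interview counts twice as much in the weighted results; a second phone line doubles the household's chance of being reached and cancels the adjustment.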

Another modification to the basic sample design is stratification. If you want to compare two groups, it is most efficient for the two samples to be about the same size. A pollster would never stratify by sex, for example, because any sample would already have about the same number of men and women in it. But if racial differences in attitudes or candidate preferences were a focus, the sample might be stratified on race to produce about the same number of interviews with whites and blacks. In a national sample in the United States, this means oversampling blacks and undersampling whites relative to their proportions in the adult population. This sample could not be used to estimate the outcome of an election, however, unless the data were weighted to put these two groups back into their correct proportions.
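The weighting step can be illustrated with made-up numbers. In this sketch the group labels, the 80/20 population shares, and the levels of support are all hypothetical, chosen only to show how equal-sized strata are put back into their population proportions; the function names are likewise illustrative.

```python
def stratum_weights(population_share, sample_counts):
    """Weight for each group: its population share divided by its sample share."""
    n = sum(sample_counts.values())
    return {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}


def weighted_proportion(responses, weights):
    """Weighted share of 1-responses; responses is a list of (group, 0 or 1)."""
    num = sum(weights[g] * v for g, v in responses)
    den = sum(weights[g] for g, _ in responses)
    return num / den


# Hypothetical design: two groups interviewed in equal numbers,
# although one makes up 80% of the population and the other 20%.
population_share = {"group_a": 0.8, "group_b": 0.2}
sample_counts = {"group_a": 500, "group_b": 500}

# Hypothetical results: 40% support in group_a, 80% in group_b.
responses = (
    [("group_a", 1)] * 200 + [("group_a", 0)] * 300
    + [("group_b", 1)] * 400 + [("group_b", 0)] * 100
)

weights = stratum_weights(population_share, sample_counts)
unweighted = sum(v for _, v in responses) / len(responses)
weighted = weighted_proportion(responses, weights)
print(unweighted, round(weighted, 2))  # 0.6 0.48
```

The unweighted figure overstates support because the smaller group is overrepresented in the sample; the weighted figure matches what the population shares imply (0.8 × 0.40 + 0.2 × 0.80 = 0.48).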

Other types of sample frames, such as membership lists, can be used to draw a sample of a particular group. For example, the American Medical Association makes available (for a fee) a list of physicians in the United States. Many researchers use this list to draw random samples of doctors for opinion polls. It is important to know how complete the initial list is, because that determines how representative a sample drawn from it will be of the total population of the group.

This information was developed by AAPOR as part of a comprehensive online journalism polling course created in partnership with NewsU, a project of the Poynter Institute and funded by the Knight Foundation. The course launched in September 2007.