Kristen Olson, University of Nebraska-Lincoln (Chair)
Jolene D. Smyth, University of Nebraska-Lincoln (Co-Chair)
Rachel Horwitz, US Census Bureau
Scott Keeter, Pew Research Center
Virginia Lesser, Oregon State University
Stephanie Marken, Gallup
Nancy Mathiowetz, University of Wisconsin-Milwaukee
Jaki McCarthy, National Agricultural Statistics Service
Eileen O’Brien, US Energy Information Administration
Jean Opsomer, Westat
Darby Steiger, Westat
David Sterrett, NORC at the University of Chicago
Jennifer Su, SSRS
Z. Tuba Suzer-Gurtekin, University of Michigan
Chintan Turakhia, SSRS
James Wagner, University of Michigan
Table of Contents
- Introduction
- What is Happening with Telephone Surveys?
- Who Transitioned a Survey and Why?
- Motivation and Consequences of the Transition
- Examples of Surveys Transitioning from Exclusively Interviewer-Administered to Self-Administered or Mixed-Mode Designs
- Roadmap for this Report
- Coverage and Sample Design
- Single Frames
- Use of Multiple Frames
- Use of Nonsurvey Data
- Summary and Takeaways
- Within-Household Selection and Screening of Respondents
- Household Rostering with One Stage of Selection
- Household Rostering with Two Stages of Selection
- Any Adult, Most Knowledgeable Person, or Head of Household
- All Adults
- Age/Order Selection Methods
- Last Birthday and Next Birthday
- Respondent Selection and Sample Representativeness
- Unique Issues in Transitioning from One Mode to Another
- Summary and Takeaways
- Questionnaire Design
- Overview of Relevant Major Mode Features
- Device Differences within Web Modes
- Additional Questions That Are Particularly Hard to Transition
- Questionnaire Features That Are Hard to Transition
- Collection of Biomeasures, Environmental Samples, Interviewer Observations and Consent for Administrative Record Linkage
- Summary and Takeaways
- Testing Strategies for Getting Questionnaires and Other Materials from One Mode to Another
- Expert Reviews
- Cognitive Interviews
- Web Probing
- Usability Testing
- Field Tests
- Experiments
- Packages of Testing Strategies in Surveys that Transitioned
- Tools to Evaluate Questionnaire Features
- Summary and Takeaways
- Recruitment, Nonresponse, and Operational Issues
- Modes of Contact in Self-Administered and Mixed-Mode Surveys
- Modes of Response in Self-Administered and Mixed-Mode Survey Transitions
- Longitudinal Surveys and Transitions to Self-Administered Modes
- Adaptive/Responsive Designs
- Designing Contact Attempts
- Incentives
- Tracking Contacts in All Modes
- Sample Composition
- Unique Issues in Transitioning Surveys from Interviewer-Administered Modes to Self-Administered Modes
- Summary and Takeaways
- Data Preparation, Processing and Management
- Introduction
- The Importance of Transparency
- A Note on Data Quality Control
- Data Capture and Integration
- Classification and Coding
- Review and Validate
- Editing and Imputation
- Weighting
- Finalizing Data Files
- Special Considerations: Longitudinal Data
- Summary and Takeaways
- Survey Estimation
- Null Hypotheses to Test in Telephone vs. Self-Administered Data Comparisons
- Assumptions Made by Single Mode and Mixed-Mode Surveys about Mode-Specific Biases
- Diagnosing and Adjusting for Measurement and Selection Errors in Mixed-Mode Surveys
- Analytic Approaches to Diagnose and Adjust for Selection and/or Measurement Error
- Summary and Takeaways
- Costs
- Factors that Might Contribute to Changing Costs
- Differential Costs Between Modes
- Costs per Complete Versus Sample Size
- Costs for Bridge Surveys
- Timeline as Costs
- Summary and Takeaways
- Human Subjects Issues
- Obtaining Informed Consent
- Protection of Personally Identifiable Information
- Mandatory Reporting of Respondent Abuse or Harm to Self or Others
- Handling Respondent Distress
- Known Adult Respondent
- Summary and Takeaways
- Communicating the Impact of the Change of Modes
- How Do You Talk to the Public and Data Users about a Break in the Time Series?
- How Do You Communicate about a Break in the Time Series to the People in your Agency or Organization?
- What Information Do We Need to Provide to Data Users on Data Files about Mode of Contact and Participation?
- Conclusion
- References
Since the 1970s, telephone methods have been a ubiquitous way of conducting large-scale surveys. This has been especially true for studies with complex questionnaires, surveys requiring screening for special populations, and those requiring small-area geographic estimates. With the changing environment for telephone surveys, an increasing number of surveys are transitioning from telephone to combinations of multiple modes for both recruitment and survey administration, where phone may be only one of several modes used, if it is used at all. Survey organizations are conducting these transitions from telephone to mixed modes with only limited guidance from existing empirical literature and best practices. This Task Force report is written with the goal of helping the survey research field navigate these challenges by examining what surveys have done in this transition, what is known, and where open questions remain for additional insights and research.
To accomplish this goal, this task force reviewed existing methods reports, technical advisory panel reports, peer-reviewed literature and survey practices to develop a set of best practice recommendations for organizations transitioning ongoing phone surveys to self-administered and/or mixed-mode surveys, as well as identify needed areas of research. In this report, we provide a “lay of the land,” examining which modes are considered for use and being used when telephone surveys are transitioned to mixed-mode surveys, as well as their relative strengths and weaknesses. The goal of the report is not to provide an overview of how to do mixed-mode studies in general, but rather, we specifically focus on issues that emerge when transitioning existing telephone surveys to mixed-mode surveys (and thus requiring potential breaks in time series). In this way, this report is designed to help AAPOR members and other survey researchers bridge between the address-based sampling (ABS) task force report (Harter, et al. 2016) and the Future of General Population Telephone Surveys task force report (Lavrakas, et al. 2017).
In this report, we evaluate issues related to sample design, household selection and/or screening for eligible respondents, and coverage of different frames and selection approaches; questionnaire design and language of administration; nonresponse and survey operations; survey estimation, including issues related to weighting and measurement error when combining data from multiple modes; and costs. We did this through three approaches. First, we conducted an extensive review of the literature, examining published articles, technical reports, conference presentations, and internal reports produced by members of the Task Force or their organizations. Second, we reached out to the greater AAPOR community via AAPORNet and asked for any descriptions, papers, or documentation about surveys that had transitioned from telephone to self-administered or mixed-mode approaches or were thinking about making this transition. Finally, we conducted a convenience sample survey (described below) of the AAPOR community to get more general insights into survey organizations' reasons for making these transitions.
1.1 What is Happening with Telephone Surveys?
Traditional telephone surveys use a mix of landline random digit dial (RDD) and, more recently, cell phone RDD samples. Although landline surveys omitted households without telephones in their homes, in the US these have traditionally been only about 3-4% of the population (Blumberg and Luke 2018). To select a landline RDD sample in the US, area codes and exchanges traditionally were assigned to specific geographic areas and designated for plain old telephone service (POTS, or households), commercial use, mobile use, or mixed use; RDD samples are drawn from sets of phone numbers called banks – numbers sharing all but their last two to four digits (e.g., 647-555-xxxx) – that have been assigned for household use. Because of the operational inefficiencies in traditional RDD sample designs, a great deal of research in the 1990s led to list-assisted landline RDD designs that improved on the efficiency of the RDD design (Casady and Lepkowski 1993; Brick, et al. 1995; Tucker, Lepkowski, and Piekarski 2002). These designs used directory listings to identify 100-banks (i.e., sets of 100 numbers sharing all but their last two digits within exchanges assigned to residential service) that contained listed numbers for stratification purposes, sometimes dropping telephone numbers in 100-banks with no listed numbers (unlisted banks) altogether for increased operational efficiency. In the 1990s, the proportion of the population in unlisted banks was quite small (less than 4%), and these households were not significantly different from the rest of the population on many characteristics, with the important exception of being more mobile (Brick, et al. 1995).
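To make the 100-bank logic concrete, the brief sketch below (a toy illustration with made-up numbers, not any vendor's production procedure) groups ten-digit numbers into 100-banks by their first eight digits, keeps only banks containing at least one directory-listed number, and draws a simple random sample from the numbers in those listed banks.

```python
import random

def bank_id(number: str) -> str:
    """A 100-bank is the set of numbers sharing the first 8 digits (xxx-xxx-xxNN)."""
    digits = number.replace("-", "")
    return digits[:8]

def list_assisted_sample(listed_numbers, n_sample, rng=random.Random(2019)):
    """Toy list-assisted RDD: enumerate all numbers in 100-banks that contain
    at least one directory-listed number, then draw a simple random sample."""
    listed_banks = {bank_id(num) for num in listed_numbers}
    frame = [bank + f"{suffix:02d}" for bank in listed_banks for suffix in range(100)]
    return rng.sample(frame, n_sample)

# Hypothetical directory listings; real designs use commercial directory files.
listed = ["402-555-0123", "402-555-0168", "402-556-0342"]
sample = list_assisted_sample(listed, n_sample=5)
print(sample)  # five ten-digit numbers drawn from the two listed 100-banks
```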
In the 2000s, cellular numbers and alternative telephone services (for example, Voice over Internet Protocol, or VoIP, and cable companies offering telephone service) grew notably. As shown in Figure 1.1, although the percentage of adults and children with no telephone service at home has remained relatively steady since 2003, the percentage living in households reachable only through a cellular number has skyrocketed, from about 3% in the early 2000s to 56.7% of adults and 67.5% of children in late 2018 (Blumberg and Luke 2019).
Figure 1.1: Percentage of Adults and Children with Cellular Telephone Service Only and No Telephone Service, 2008-2018, National Health Interview Survey
Source: Early Release Reports on Wireless Substitution,
https://www.cdc.gov/nchs/nhis/releases.htm#wireless
The widespread use of cell phones had major implications for how telephone samples were designed and for operational efficiency. First, the number of listed landline numbers decreased and the proportion of households in banks with zero listings rose (Fahimi, Kulp and Brick 2009). This led to further declines in the efficiency of stratification for list-assisted designs. Second, it became necessary to include cellular RDD frames in telephone samples through dual frame designs (Lavrakas, et al. 2017). Although cellular RDD frames are functionally similar to landline RDD frames, there are no directory listings for cellular telephone numbers, reducing the efficiency with which cellular samples can be worked and limiting how sample designs can be drawn.
Further, post-survey adjustment weights need to be developed that account for the dual frame approach, a difficult task (Brick, et al. 2011).
Other challenges have mounted, making it implausible to use telephone surveys as the only mode of data collection for many surveys. First, response rates for samples from cell and landline telephone frames have dropped precipitously in many telephone survey designs (Lavrakas, et al. 2017). Second, a strong advantage of a traditional landline RDD frame was that geographic targeting of areas as small as a county or even a ZIP code was quite efficient because telephone companies assigned banks of telephone numbers to specified geographic areas. The ability to target landline RDD samples geographically was somewhat diluted by number portability. A Federal Communications Commission order in 2003 allowed telephone customers to keep either landline or cellular numbers when they move or change telephone service providers (Federal Communications Commission 2016). Third, cellular numbers selected into an RDD sample do not have the same geographical associations as landline numbers, with the closest useful proxy for geography being the ZIP code of the billing address (Skalland and Khare 2013; Pew Research Center 2015).
Fourth, a new sampling frame providing reasonable coverage of US addresses became available that could be used to deliver requests to the general population via postal mail. This approach, known widely as Address-Based Sampling (ABS) and based upon the Delivery Sequence File (DSF) of the United States Postal Service (USPS; see Harter, et al. 2016 for details), is increasingly popular. The DSF is the basis for selecting samples of addresses corresponding to housing units; institutionalized populations are not covered. Although these lists had coverage issues when initially used in the 2000s, those issues have been increasingly reduced through changes in requirements for how addresses are listed (Harter, et al. 2016). As a list of addresses, the DSF can be targeted geographically, although addresses need to be geocoded to append Census geography (Census Tract and Block numbers) beyond ZIP codes. To address coverage issues, some surveys have used field staff to update addresses obtained from the DSF (e.g., Lepkowski et al. 2013). However, these issues are relatively inconsequential, and the coverage of sampling frames developed from the DSF continues to improve.
These multiple simultaneous changes to the landline and cellular telephone frames and declining response rates have created a perfect storm for survey researchers attempting to measure the household population in the US (and elsewhere in the world). As such, multiple surveys have examined or are examining transitioning from a single-mode telephone survey to a self-administered and/or mixed-mode survey, using a combination of mail, web, phone, and/or face-to-face modes of data collection.
1.2 Who Transitioned a Survey and Why?
To understand the current status of surveys that transitioned to a different mode, the AAPOR Task Force on Transitions from Telephone Surveys to Self-Administered and Mixed-Mode Surveys (hereafter, the AAPOR Mixed Mode Task Force) conducted a survey of a convenience sample of organizations that have transitioned one or more surveys across modes, or are planning such a transition in the near future. Participation was solicited on AAPORnet and by personal contacts from members of the Task Force. Data collection began May 10, 2018 and concluded on July 2, 2018.
Representatives of 21 organizations responded to the survey, providing data about a total of at least 25 different data collection efforts. Most of these are specific named studies. Others reported on shifts in the standard data collection mode for the organization. Some of the transitioned studies involve national samples but many are geographically focused and most target special populations (e.g., children, twins, racial and ethnic minorities) rather than the general public.
This survey includes responses from researchers in government, academia, nonprofit organizations and commercial firms, though at least half of the studies are sponsored by government agencies. Most but not all are surveys of populations in the U.S. Nearly all are household rather than establishment surveys (21 of 23 who answered this question). Most are cross-sectional (N=17) rather than panel surveys (N=7). The survey transitions reported in the study began as early as 2004 and about half of them are still ongoing.
1.3 Motivation and Consequences of the Transition
Data quality topped the list of reasons for implementing the transition of modes. A large majority (17 of 22 responding) said that response rates in the interviewer-administered survey were either “extremely” or “very” important in making a decision to transition. Anticipated response rates in the new modes closely followed (15 “extremely” or “very” of 23). Anticipated frame coverage for the new modes matched this level of importance (15). Ten organizations said that demands for greater precision, such as lower standard errors at the same level of cost, were either extremely or very important.
Table 1.1 Why transition? (Number of respondents choosing each response)

| Reason | Extremely important | Very important | Somewhat important | A little/not at all important |
| --- | --- | --- | --- | --- |
| Response rates to the interviewer-administered survey | 12 | 5 | 2 | 3 |
| Anticipated response rates to the self-administered or mixed-mode survey | 10 | 5 | 4 | 4 |
| Anticipated coverage for the self-administered or mixed-mode studies | 9 | 6 | 3 | 6 |
| Costs for the interviewer-administered survey | 9 | 2 | 3 | 5 |
| Coverage of the frame of the interviewer-administered survey | 8 | 3 | 5 | 6 |
| Anticipated costs for the self-administered or mixed-mode survey | 8 | 2 | 4 | 6 |
| Desire for greater precision / lower standard errors / different estimation strategy at lower or same costs | 6 | 4 | 4 | 7 |
| Client demands | 4 | 9 | 3 | 7 |
| Sponsor or funding agency demands | 3 | 6 | 3 | 7 |

Source: AAPOR Mixed Mode Task Force survey of organizations that have transitioned a survey across modes
The actual outcome of the transition on response rates was mixed. Out of 17 respondents who answered a question about this, seven said rates increased, five said they decreased and five said they stayed the same.
Mixed-mode approaches are often implemented in order to reduce costs, beginning with the most cost-effective contact modes (e.g. self-administered mail) and following-up with more costly contact modes (e.g. face-to-face interviews) to improve response rates (de Leeuw 2005; Dillman, Smyth, and Christian 2014). As such, survey costs were important to many respondents (13 wanted to reduce them, 4 to keep them constant) as a motivation for the transition. And most of the respondents (13 of 19 answering) said that the mode change had reduced the costs. Only one said that the new mode was more expensive than the interviewer-administered survey. Three said the costs are comparable.
Client demands also played a role, with 13 reporting them as either “extremely” (N=4) or “very” (N=9) important. Sponsor or funding agency demands followed (9 extremely or very important).
An open-ended question about lessons learned yielded positive suggestions and assessments about the process. Respondents stressed the importance of paying close attention to design elements and of testing thoroughly. One said it was a "win-win initiative," while another said their client was very pleased with the results. But some reported that the mode effect was much larger than anticipated, and another noted that the process of converting questionnaires to mixed modes is lengthy. Surveying a bilingual community presented particular challenges for one study.
1.4 Examples of Surveys Transitioning from Exclusively Interviewer-Administered to Self-Administered or Mixed-Mode Designs
The report contains multiple examples from a wide range of surveys that transitioned from interviewer-administered to self-administered or mixed modes. We provide examples throughout the report from our review of published literature, technical reports, websites, and conference presentations. Early general population surveys examining the possibility of transitioning away from interviewer-administered modes (whether they did or not) occurred in the early 2000s (e.g., Cantor, et al. 2005; Link, et al. 2008; Bailey, Grabowski, and Link 2010; DiSogra, Dennis, and Fahimi 2010), coinciding with the availability of the address-based Delivery Sequence File as a sampling frame (Iannacchione 2011; Harter, et al. 2016). Many of these early surveys included phone as part of the mixed-mode approach (e.g., Murphy, Harter, and Xia 2010; Brick, Williams, and Montaquila 2011; Jans, et al. 2013). Current work includes mail and web in the mixed-mode approaches, with some surveys using probability-based web panels or nonprobability opt-in panels as the self-administered mode replacing the telephone survey (e.g., Breton, et al. 2017; American National Election Studies 2018; Brown, et al. 2018; Ghandour, et al. 2018; Penn State Harrisburg, 2018). There are additional surveys that have not fully abandoned telephone but have incorporated it as one of the possible modes of recruitment and/or data collection along with mail and web (e.g., Amaya, et al. 2018).
Surveys that have transitioned (or studied transitioning) from interviewer-administered to self-administered or mixed modes include both community surveys and large-scale national surveys covering a wide variety of topical domains. Thus, transitions from interviewer-administered to self-administered modes are not limited to surveys on a single topic or to surveys of special populations.
1.5 Roadmap for this Report
In this report, we examine the various design features that need to be considered when transitioning from telephone to a self-administered or mixed-mode survey. In doing so, we also review issues related to coverage and sample designs (Chapter 2), within-household selection (Chapter 3), questionnaire design and measurement error (Chapter 4), testing of questionnaires and other materials (Chapter 5), recruitment methods, nonresponse, and operations (Chapter 6), data preparation and processing (Chapter 7), and survey estimation (Chapter 8). We also address what is known about survey costs when transitioning from telephone to different mode(s) (Chapter 9), human subjects issues that change when transitioning modes (Chapter 10) and communicating the impact of the change of modes to the public (Chapter 11). We focus on issues related to transitioning from telephone to other modes; we cite relevant more general mixed-mode survey literature where appropriate.
2. Coverage and Sample Design
A key element to consider in any survey is the population that is covered in a sample frame. The issue of the population to which a frame allows inference is critically important when considering transitions from telephone to self-administered or mixed-mode surveys. While the sample frame can restrict or facilitate the survey mode, sampling frames and modes are distinct.
Because survey inference depends on the frame from which the sample is drawn, in ideal circumstances a perfect frame exists that can be used to draw inference to all members of the target population, independent of the mode of recruitment or data collection. Ideally, the frame has a one-to-one correspondence to the target population. Unfortunately, perfect frames often do not exist, so inferences are based upon the sample selected for a survey, which may differ from the initially desired target population. Differences between the respondents and the target population can be due to imperfections of the frames or to errors from nonresponse or measurement associated with the particular recruitment or interview modes permitted by information on the frame.
As surveys transition from telephone to self-administered or mixed modes of data collection, one important question is whether the population covered by a frame is also changing with the mode switch, and whether a changing frame also shifts the target population. In our survey of organizations and surveys on transitions in modes, 20 respondents indicated that the target population of interest for the survey did not change when the mode changed. However, 3 respondents indicated that the target population changed, generally from a more restrictive target population (telephone households; special populations) to a more inclusive target population (all households; all adults).
In many cases, contact information on the frame limits potential mode choices, either for recruitment or survey completion. For example, landline Random Digit Dial (RDD) samples make inference to landline telephone households possible but simultaneously limit (at least without augmentation for general population surveys) the mode of contact and data collection for the full sample to telephone interviews. Postal addresses can be identified for a subset of the landline telephone numbers on the RDD frame by identifying a telephone directory listing, but the use of a mailed survey component is then limited to those landline numbers with listed addresses. Data collection based on other frames, such as an address-based sampling frame, may begin with a different form of contact information (addresses) and then merge additional contact information (telephone numbers) for a subset of the cases, allowing for the implementation of a mixed-mode survey (e.g., telephone and mail). Other sampling frames, such as a list sample for an organization, may have contact information needed for multiple modes and allow survey designers the flexibility to choose which modes of contact and interviewing to apply. In settings where no single frame provides adequate coverage, multiple frames might provide the best approach to obtain the most complete coverage of the population and the most cost-efficient sample design, possibly requiring multiple (or different) modes of data collection for each frame.
In this chapter, we examine the use of both single and multiple frames that have been used when transitioning surveys from telephone to self-administered or mixed-mode surveys. We start by reviewing studies that used single frames, followed by those that used multiple frames.
2.1 Single Frames
Identifying the frame or frames that are available in order to be able to draw inferences to a target population of interest is the first step to obtaining a sample, no matter what mode is used in contacting sample members for data collection. A number of commonly used frames for general population surveys include a frame of all landline telephone numbers (both RDD and list assisted), a frame of cellular telephone numbers, a dual frame combining cellular and landline telephones, and a frame of addresses provided by survey vendors using information from the US Postal Service. Lists of identified persons or other sampling units are used for a variety of surveys, generally focused on more specialized populations such as registered voters.
The strength in population coverage and ease of sample designs that use the address-based Delivery Sequence File (DSF) have made it a commonly used frame for general population surveys that are transitioning from interviewer-administered to self-administered or mixed modes of data collection. Yet the studies that have used the DSF for a sample frame when transitioning to a different mode are far from uniform in design, modes, and approach.
Some studies that transitioned from telephone or other interviewer-administered modes to self-administered modes and use the Delivery Sequence File as the frame employ one or more self-administered modes – mail, web, or web with a mail follow-up – to recruit and collect data from individuals (e.g., DiSogra, Dennis, and Fahimi 2010; Brick, Williams, and Montaquila 2011; Montaquila, et al. 2013; Jackman 2015; Kitada 2016; Lesser, et al. 2016; DeBell, et al. 2017). Studies using the DSF as a sample frame with only self-administered modes may use the frame as-is or append information for stratification or potential recruitment purposes. For example, Lesser, et al. (2016) describe a series of mode experiments comparing a transportation survey that had traditionally been conducted using a stratified RDD sample with telephone data collection to stratified random samples from the DSF frame with mail-only and web+mail modes of data collection. For both frames, the strata in each mode were defined by geographic regions in the state of Oregon. For the American National Election Studies, DeBell, et al. (2017) also used the DSF frame, excluding drop point addresses where bundles of addresses receive mail together (more likely in urban areas; DeBell, et al. 2017, p. 5), and selected a simple random sample from the list of US addresses. In order to examine alternative methods for recruiting individuals to participate in a web version of the American National Election Studies, information on names and other potential characteristics of sample units was purchased from a vendor and matched to the DSF frame. This match allowed the mailing to be addressed to individuals living at particular housing units rather than simply to the family at a housing unit or to "resident," but this additional information was not used for sample selection within the household. Most cross-sectional studies transitioning from telephone to self-administered or mixed modes do not use cluster sampling, as face-to-face interviewing is rarely included as one of the modes in these mixed-mode surveys. In one example where cluster sampling was used, Biemer, et al. (2018) used the DSF frame, excluding drop point addresses, to draw an equal-probability cluster sample in the US when experimentally examining the transition of the Residential Energy Consumption Survey (RECS) from an in-person interview to a web/mail questionnaire.
Using the DSF as a sample frame does not preclude using the telephone as one of the modes of contact or data collection in a mixed-mode survey. In these studies, phone numbers are matched to some addresses from the DSF. Telephone attempts are made to sample units with matched phone numbers, and mail is used to request telephone numbers from those who were not successfully matched. Thus, a survey using the ABS frame may still rely on telephone as the primary data collection mode, but with a mixed-mode recruitment attempt; Montaquila, et al. (2013) call this the ABS Phone-Based Model (p. 67). For instance, Allison, Stevenson, and Kniss (2014) transitioned the Wisconsin Family Health Survey from a landline RDD stratified sample to a mixed-mode stratified ABS sample. Addresses were matched to a name and phone number (52% of addresses yielded a phone number); those without a phone number match were sent a short mail questionnaire requesting a telephone number.
The California Health Interview Survey (CHIS) has examined transitioning from an RDD frame to an ABS frame over a number of years. For an initial test conducted in 2012-2013 (Jans, et al. 2013), two majority Hispanic communities in California that varied in known characteristics of interest were selected. Households were selected from the U.S. Postal Service's DSF, and addresses were merged with telephone directory listings to obtain landline telephone numbers for matched addresses. All households were sent a paper screener questionnaire in both English and Spanish to obtain the household’s home and cell phone telephone number(s), even for addresses for which a merged telephone number was available, as well as demographic characteristics to determine eligibility. Across the two communities, 19% returned a mail screener with a phone number; phone numbers were matched to 37% of the households who did not provide a phone number. Households that completed the screener questionnaire or who had a matched telephone number then were called for a telephone interview. Kali and Flores Cervantes (2016) used a very similar protocol in California as a pilot of the 2013-2014 CHIS in Sonoma County. Here, telephone numbers were matched to 48% of sampled addresses from the DSF address-based frame, and the sampled addresses that were not successfully matched to a telephone number were sent a short screener questionnaire to collect telephone numbers (see also Brick, et al. 2013 and Kali and Flores Cervantes 2016 for similar designs). In 2018, the CHIS used ABS and surname list frames to push respondents to a web survey in three California counties. Rather than requesting a telephone number for all respondents, respondents were requested to complete an English-language web survey or to call the CHIS directly to complete the survey in a language other than English. Addresses with matched telephone numbers were followed up with telephone calls.
This mixed-mode data collection approach can be used in conjunction with field interviewers. For instance, Mayfield, et al. (2015) used a similar approach for a geographically-targeted and child-age specific sample in Los Angeles, adding in-person interviews for those addresses that did not return a mail screener. Sterrett, et al. (2015), examining New York and New Jersey neighborhoods affected by Superstorm Sandy, used an ABS sample to send an initial mailed invitation to participate in a web survey. Those with a matched telephone number, about 45% of the sample, were followed up with telephone calls. Remaining nonrespondents were followed up with field interviewers.
Samples drawn from lists of special populations (list samples) often provide more flexibility in the combination of modes that can be used. For instance, Parast, et al. (2018) compared telephone, mail, and a mixed-mode mail with a telephone follow-up approach as both recruitment and data collection modes for a list sample of caregivers for individuals who had been in hospice (see also Mathews, et al. 2017 for a list sample of emergency department patients with a web survey component along with mail and telephone). Lien (2015) compared phone-only with a mixed-mode web+phone survey for individuals who called into a Tobacco Helpline. Lykes and Meyers (2017) had a sample frame of new vehicle purchasers with a combination of three modes used for either recruitment or survey administration for a survey on auto quality – a paper mailed invitation to complete a web survey and following up with nonrespondents by telephone. Atkeson, et al. (2011) use voter registration files in New Mexico and Colorado as the frame, sending postal mail letters to sampled persons to complete a web survey, information on how to request a mail survey for those who did not wish to complete the web survey, and a follow-up mail survey to a subset of nonrespondents.
Some survey organizations are turning from the probability-based RDD samples to web-based non-probability samples, either through web panels or social media analysis (Baker, et al. 2010; Murphy, et al. 2014). Rather than use a sample frame with known coverage and a means for assessing probabilities of selection, non-probability samples often use advertisements or commercial partnerships to invite a large segment of the public to participate in a survey/join a panel. Non-probability sample providers try to recruit as many people as possible (rather than specifically selected cases from an existing frame). Non-probability samples then use matching, calibration, and/or post-stratification weighting to external benchmarks to achieve their desired coverage or representativeness for a target population (Wang, Rothchild, Goel, and Gelman 2015; Elliott and Valiant 2017; Mercer, Lau, and Kennedy 2018). As such, errors related to each stage of representation - coverage, selection, and participation - are confounded in nonprobability web samples, making it difficult to identify exactly which error source is at play when estimates deviate from “true” values (Tourangeau 2019).
For instance, the Center for Survey Research at Penn State Harrisburg recently transitioned their omnibus telephone survey of Pennsylvania residents to an opt-in nonprobability web panel with quota sampling after their call yield fell from 14.4% of calls reaching a person in 2012 to 4.5% in 2017 (Penn State Harrisburg Center for Survey Research 2019b). The quotas are set by age, sex, and region of the state; the data collection approach also includes a variety of data quality checks to establish residency in the state of Pennsylvania, exclude bots, and flag respondents who skip over questions. In making this transition, the survey was able to increase its sample size from n=600 telephone completes to n=1000 web completes, with similar estimates reported on a variety of topical domains and demographics (Penn State Harrisburg Center for Survey Research 2019a, b). Other approaches blend telephone samples or other high quality surveys with nonprobability samples, using statistical adjustments to select, calibrate, and/or combine the nonprobability sample with telephone or other high quality survey estimates (e.g., Ansolabehere, Schaffner, and Luks 2017; Mercer, et al. 2017; AP 2018; Dutwin 2019). AP VoteCast, a survey of the American electorate in the 2018 midterm elections conducted by NORC at the University of Chicago for The Associated Press and Fox News, used a calibration approach featuring multilevel regression and post-stratification models to combine 40,000 interviews from probability samples of registered voters with about 100,000 interviews with registered voters from non-probability online panels. In addition, SSRS has developed a "hybrid" sample that blends data from their Omnibus telephone survey and an opt-in nonprobability panel, reducing the cost per sampled case while yielding similar estimates on a variety of leisure activity domains (Dutwin 2019). In Spring 2019, the Rutgers-Eagleton Poll and Fairleigh Dickinson announced a partnership to compare and combine telephone polling data with online probability sample and nonprobability web sample data (Jenkins and Koning 2019).
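As a concrete illustration of the post-stratification and calibration adjustments referenced above, the sketch below rakes a set of starting weights to two hypothetical population margins (age group and region) via iterative proportional fitting. It is a minimal example with invented data and margins, not the adjustment used by any of the studies or vendors cited.

```python
import pandas as pd

def rake(df, margins, weight_col="weight", iterations=25):
    """Iterative proportional fitting (raking): adjust weights so the weighted
    distribution of each variable matches its population margin."""
    w = df[weight_col].astype(float).copy()
    for _ in range(iterations):
        for var, targets in margins.items():
            current = w.groupby(df[var]).sum()                      # weighted totals by category
            factors = {cat: targets[cat] / current[cat] for cat in targets}
            w = w * df[var].map(factors)                            # scale weights toward the target
    out = df.copy()
    out[weight_col] = w
    return out

# Hypothetical opt-in panel respondents with starting weights of 1.
panel = pd.DataFrame({
    "age_group": ["18-34", "18-34", "35-64", "65+", "35-64", "65+"],
    "region":    ["north", "south", "north", "south", "south", "north"],
    "weight":    [1, 1, 1, 1, 1, 1],
})
# Hypothetical benchmark totals (e.g., from a census source), scaled to the sample size.
margins = {
    "age_group": {"18-34": 1.8, "35-64": 2.7, "65+": 1.5},
    "region":    {"north": 3.3, "south": 2.7},
}
raked = rake(panel, margins)
print(raked.groupby("age_group")["weight"].sum())  # weighted totals now match the age targets
```

In practice, such adjustments typically involve many more margins, weight trimming, and variance estimation appropriate to the design; this sketch only shows the core mechanics.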
How well estimates from nonprobability web samples represent the full population compared to an RDD survey varies by estimate, study, and nonprobability sample provider. For instance, Ansolabehere and Schaffner (2014) compared estimates on homeownership, cigarette smoking, and voting in the 2008 election, among others, from a YouGov nonprobability sample, a dual frame telephone survey, and a mail survey with an unknown frame (a "list provided by a data vendor," p. 287), using propensity weights for each survey that accounted for standard demographics (age, race, sex, education) as well as political ideology and voter registration status. They found lower mean squared error of estimates relative to national benchmarks for the nonprobability and mail samples than for the telephone sample (although all were relatively low). In a review of pre-election polls for the 2016 Presidential Election, Kennedy, et al. (2018) found that opt-in internet polls had error rates similar to live-interviewer RDD polls. Yeager, et al. (2011), in contrast, examined seven different non-probability internet samples on a variety of topics, including health-related measures (e.g., cigarette smoking, alcohol consumption), holding a driver's license or US passport, primary demographics (e.g., age, race, sex), and secondary demographics (e.g., marital status, homeownership, income), comparing each to benchmark estimates. Almost all of the unweighted nonprobability estimates were less accurate than those from a probability-based web sample and less accurate than those from an RDD-phone sample (all surveys had a sample size of around 1,000 respondents), and weighting did not dramatically improve the accuracy of the nonprobability samples. MacInnis, et al. (2018) replicated the Yeager, et al. findings, showing again that the probability-based web sample and RDD sample were more accurate relative to benchmarks than estimates from six different nonprobability samples. Dutwin and Buskirk (2017) also found that low-response-rate RDD surveys were more accurate than two different nonprobability samples on a variety of cross-classified demographic variables. Kennedy, et al. (2016) compared estimates on a wide variety of topics, including volunteerism, internet access frequency, health, and having a driver's license, among others, from nine nonprobability sample vendors with a probability-based web panel. They found tremendous heterogeneity in the composition of the samples and the accuracy of the estimates relative to national benchmarks across the various nonprobability samples, with some having more accurate estimates than the probability-based web panel and some having less accurate estimates. Thus, there is mixed evidence on the quality of nonprobability samples as a frame and sampling option for transitioning away from RDD surveys.
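The accuracy comparisons summarized above generally reduce to computing, for each sample source, the deviation of weighted estimates from external benchmarks. The snippet below shows that calculation in its simplest form with invented values; the published studies use many more benchmarks, full weighting, and formal significance tests.

```python
# Hypothetical weighted estimates (proportions) from two sample sources and the
# corresponding external benchmarks; all values are invented for illustration.
benchmarks = {"smokes": 0.14, "owns_home": 0.65, "has_license": 0.86}
estimates = {
    "rdd_phone":   {"smokes": 0.12, "owns_home": 0.68, "has_license": 0.88},
    "optin_panel": {"smokes": 0.19, "owns_home": 0.58, "has_license": 0.80},
}

for source, ests in estimates.items():
    errors = [ests[k] - benchmarks[k] for k in benchmarks]
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    print(f"{source}: mean absolute error = {mean_abs_error:.3f}, MSE = {mse:.5f}")
```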
An additional frame option used by those transitioning from telephone to self-administered or mixed-mode is a probability-based web panel (Blom, et al. 2016; Bosnjak, Das, and Lynn 2016; DiSogra and Callegaro 2016). Some surveys use existing probability-based web panels that were developed and built by another company, and pay a fee to conduct a survey based on the number of panel members invited to participate and/or the number of minutes of respondent time (e.g., a nonexhaustive list includes Ipsos Knowledge Panel, AmeriSpeak, Understanding America Study, American Life Panel in the US; LISS in the Netherlands, German Internet Panel and GESIS Panel in Germany). Other organizations that need frequent surveys have built their own probability-based panel. For example, the Pew Research Center transitioned their regular dual-frame RDD telephone surveys to the probability-based web-based American Trends Panel, providing internet access to non-internet households. Initially, American Trends Panel participants were recruited via an RDD request from a dual-frame sample design; in 2018, this recruitment changed to a mailed survey request selected using a stratified address-based sample from the DSF (Keeter 2019; Pew Research Center 2019). In Germany, the GESIS panel sample was developed using municipal registers (Bosnjak, et al. 2018). Then, a high-intensity recruitment effort was undertaken, including face-to-face interviewing. Subsequent waves of the panel were administered either via web or paper. Likewise, NORC’s probability-based AmeriSpeak Panel, which uses NORC’s address-based national sample frame, incorporates phone, mail, and face-to-face interviews during the panel recruitment and then administers subsequent surveys to the panelists via phone and web (Bilgen, Dennis, and Ganesh 2019; Dennis 2019).
2.2 Use of Multiple Frames
In some studies, multiple frames may be necessary to improve coverage and more efficiently survey the population. Use of multiple frames may not necessitate using multiple modes. For example, dual-frame telephone surveys can be used to combine landline and cellular phone numbers, with optimal allocation of sample to each frame (Lohr and Brick 2014), but the mode of data collection (telephone) is consistent across devices (landline and cell phones).
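For readers less familiar with dual-frame estimation, one textbook form of the composite estimator for a landline/cell design (a generic formulation, not necessarily the specific estimator used in the studies cited in this chapter) combines the landline-only, overlap, and cell-only domains as

```latex
\hat{Y} \;=\; \hat{Y}_{a} \;+\; \lambda\,\hat{Y}_{ab}^{(A)} \;+\; (1-\lambda)\,\hat{Y}_{ab}^{(B)} \;+\; \hat{Y}_{b}, \qquad 0 \le \lambda \le 1,
```

where \(\hat{Y}_{a}\) and \(\hat{Y}_{b}\) are estimated totals for the landline-only and cell-only domains, \(\hat{Y}_{ab}^{(A)}\) and \(\hat{Y}_{ab}^{(B)}\) are estimates for the dual-service (overlap) domain from the landline and cell samples, respectively, and \(\lambda\) is a compositing factor chosen to balance variance and, in practice, differential nonresponse and measurement error across the two frames.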
Some surveys transitioned from an RDD frame to a multi-frame design to gain efficiencies in data collection. The entire sample can be (optimally) allocated and interviewed across two frames. For example, one frame may produce a sample with good coverage properties but high data collection costs (possibly due to low eligibility or to costs associated with contacting sampled units), while a second frame produces a lower quality sample in terms of coverage but has less expensive data collection (possibly due to high eligibility or ease of contact). Multi-frame designs can also simply combine addresses available on a list frame with an ABS sample that supplements portions of the population missing from the list in order to obtain a representative general population sample.
For example, in recent studies conducted by the National Oceanic Atmospheric Association, the Coastal Household Telephone Survey (CHTS) transitioned from a landline RDD in a limited subset of counties to using a dual frame mail survey design in order to estimate fish catch in four states (Brick, Andrews, and Mathiowetz 2016). In this study of anglers, one frame used a list of state licensed anglers, with an expected higher rate of respondents who had fished. This list frame does not have complete coverage of all anglers, as not all anglers need to purchase a license, leading to coverage issues for this frame. An ABS sampling frame provided high coverage of the states' population but had a lower chance of obtaining an angler. A sample of addresses was selected from the ABS frame to supplement the list sample. The angler list was merged with the ABS sample, and the addresses that did not merge with the list frame were subsampled at a lower rate than those that did merge with the list frame. This approach provided a more efficient design with better coverage as compared to using only one of these frames. Similar creative solutions may be useful when other rare subpopulations are of interest. For instance, addresses with household members who may speak a particular language (e.g., Spanish, Korean) may be identified through a compiled surname listing (e.g., Zuckerberg and Mamedova 2012; Brick, et al. 2013; Wells, et al. 2018). In a field test to transition the California Health Interview Survey from dual frame RDD to a mixed-mode web+phone ABS sample, Wells, et al. (2018) used Spanish, Korean, and Vietnamese surname lists to potentially identify non-English speaking households in three counties in California.
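The sketch below illustrates the general merge-and-subsample logic described for the angler surveys, using made-up match indicators and sampling rates (the actual Fishing Effort Survey rates, strata, and weighting are more involved): ABS addresses that merge with the license list are retained at a higher rate than non-matching addresses, and base weights are the inverse of the selection probabilities.

```python
import random

rng = random.Random(7)

# Hypothetical ABS addresses; 'on_license_list' marks addresses that merged
# with the state angler license list (values invented for illustration).
abs_frame = [{"address_id": i, "on_license_list": (i % 5 == 0)} for i in range(1000)]

# Higher retention rate for matched addresses, lower for unmatched ones.
rates = {True: 0.50, False: 0.05}

sample = []
for unit in abs_frame:
    p = rates[unit["on_license_list"]]
    if rng.random() < p:
        # Keep the address and attach its selection probability and base weight.
        sample.append(dict(unit, selection_prob=p, base_weight=1.0 / p))

n_matched = sum(u["on_license_list"] for u in sample)
print(f"sampled {len(sample)} addresses ({n_matched} matched to the license list)")
```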
2.3 Use of Nonsurvey Data
Two reports were released by the National Academies of Sciences, Engineering, and Medicine in 2017 examining the potential for improvement in federal statistics through the use of alternative sources of data, including both government and private-sector sources. The first report discussed the multiple types of additional data sources, such as federal and state administrative data, electronic health records, web scrapings, credit card transactions, satellite images and sensor data (National Academies of Sciences, Engineering, and Medicine 2017a). The second report assesses alternative approaches for implementing procedures that would combine diverse data sources from both government and private-sector sources (National Academies of Sciences, Engineering, and Medicine 2017b). Although the National Academies reports focused on the use of these alternative data sources for estimation purposes, they can also be used to improve the efficiency of sample frames. For example, the 2016 National Survey of Children’s Health transitioned from RDD to an ABS design using the Census Bureau’s Master Address File as the frame. The Census Bureau identified potential addresses with children and those who receive social security benefits using administrative records, as well as information about the poverty level of the block group, to stratify the addresses on the frame (U.S. Census Bureau 2018a, b; Ghandour, et al. 2018). Invitations to a web survey with mail survey follow-up were then sent more efficiently based on this stratification. As noted in the National Academies reports, more research and development is needed to evaluate these sources of data for stratification and estimation purposes.
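A stylized calculation shows why frame flags from administrative data can make screening more efficient. The numbers below are invented for illustration (they are not the National Survey of Children's Health specifications); the point is simply that oversampling a stratum flagged as likely to contain children raises the expected yield of eligible households for a fixed sample size.

```python
# Compare the expected number of households with children identified under a
# proportional allocation versus oversampling the flagged stratum.
# Stratum sizes and eligibility rates are hypothetical.
strata = {
    "flagged_child": {"N": 30_000, "p_child": 0.60},
    "unflagged":     {"N": 70_000, "p_child": 0.15},
}
total_sample = 5_000

def expected_child_households(allocation):
    """Expected eligible households given a sample allocation by stratum."""
    return sum(n * strata[s]["p_child"] for s, n in allocation.items())

# Proportional allocation by stratum size.
N_total = sum(v["N"] for v in strata.values())
proportional = {s: total_sample * v["N"] / N_total for s, v in strata.items()}

# Oversample the flagged stratum (hypothetical 70/30 split of the sample).
oversampled = {"flagged_child": 0.7 * total_sample, "unflagged": 0.3 * total_sample}

print("proportional allocation:", round(expected_child_households(proportional)))  # 1425
print("oversampled allocation: ", round(expected_child_households(oversampled)))   # 2325
```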
A number of researchers outside the federal government are also exploring the availability of data provided by commercial vendors who assemble data from multiple sources, such as credit reporting agencies, magazine subscriptions, property records, and so forth (Harter, et al. 2016; Couper 2017). Commercial data are incomplete, but available for a large proportion of households (Pasek, et al. 2014; West, et al. 2015; Harter, et al. 2016). Given their imperfections, commercial data have been used in a dual frame approach or to stratify the population into groups likely to be eligible and likely to be ineligible (Valliant, et al., 2014; Brick, Andrews and Mathiowetz 2016). These commercial data can allow for additional modes of contact to improve response rates and coverage. For example, Link and Burks (2013) appended commercial data to identify housing units with young adults, in particular racial/ethnic groups, in block groups with particular demographic characteristics, and those that are matched to a telephone number to evaluate different mixed-mode strategies combining web and mail.
2.4 Summary and Takeaways
2.4.1 Many surveys that have transitioned from telephone to self-administered or mixed-mode approaches have used the Delivery Sequence File alone or in combination with list frames.
2.4.2 Use of the Delivery Sequence File as a frame does not preclude use of telephone as one of the modes in a mixed-mode survey.
2.4.3 Sample designs using the DSF for self-administered and mixed-mode data collections are often simple random samples or stratified samples. Cluster samples for these types of surveys are rare.
2.4.4 Although nonprobability web samples can be used when transitioning a telephone sample to a self-administered or mixed-mode survey, their use is not (yet) ubiquitous. Those that use nonprobability web samples all require high quality census data or probability-based samples (often collected via telephone or some other mode(s)) for purposes of sample selection or post-survey adjustment. Incorporating the use of nonsurvey data with probability-based surveys is an important area of future research.
3. Within-Household Selection and Screening of Respondents
A challenging decision related to sampling when moving from telephone to self-administered or mixed-mode surveys is how to select respondents within a household. Without an interviewer to assist the respondent, the selection decision moves out of the hands of the survey organization and into the hands of the sampled household. As such, respondent selection methods vary by mode of data collection, ranging from full probability-based methods to quasi-probability-based and non-probability-based methods. Probability-based methods minimize selection bias, but they require knowledge of eligible persons within the household, can be intrusive, and may result in higher nonresponse. In an interviewer-administered mode, an interviewer can assist with implementing probability-based methods, including those that require a household roster. In self-administered modes, however, the rostering of a household must be completed by a household informant. To put control over selection back in the hands of the survey organization, some surveys separate the process of rostering and selecting an individual into two steps – the household completes a roster, sends it back to the survey organization, and the survey organization selects the sampled person. These methods increase the length of the survey field period, resulting in higher survey costs and potentially lower response rates. Quasi- and non-probability-based methods have some level of selection bias, but they tend to be less burdensome on respondents, have higher response rates, and be more cost-efficient compared to probability-based methods (e.g., Marlar, et al. 2018).
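To fix ideas, under a full probability within-household selection the selected adult's inclusion probability is the product of the household's selection probability and the within-household selection probability, so the design weight carries a household-size factor. This is the standard textbook relationship rather than anything specific to the studies reviewed here:

```latex
\pi_{hi} \;=\; \pi_{h} \times \frac{1}{A_{h}}, \qquad w_{hi} \;=\; \frac{1}{\pi_{hi}} \;=\; \frac{A_{h}}{\pi_{h}},
```

where \(\pi_{h}\) is household \(h\)'s selection probability, \(A_{h}\) is its number of eligible adults, and \(w_{hi}\) is the base weight for selected adult \(i\). Quasi-probability methods such as the birthday methods typically apply the same adjustment while assuming the within-household selection behaves as if it were random.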
Survey organizations make decisions about which respondent selection methods to use depending on the mode of data collection, target population, and information available on the frame. As a survey moves from telephone to self-administered modes, the within-household selection method may also change. In our convenience sample of surveys that are transitioning from telephone to self-administered or mixed modes, 32% reported that the method of respondent selection changed when modes changed, and 40% of the respondents reported that the survey screens for special populations. Screening for special populations does not necessarily occur in two steps of household selection – only half of the surveys reported using two steps for selection screening in special populations such as children, teens, or individuals with a particular characteristic.
3.1 Household Rostering with One Stage of Selection
Full rostering methods are common in face-to-face surveys and in some telephone surveys (Gaziano 2005; Smyth, Olson and Stange forthcoming). Among probability-based methods, the Kish (1949) household roster method involves enumeration of all adults in the household by sex and age, with random selection of one respondent. After a household roster has been constructed, follow-up questions may be required to reduce household coverage error. Although this method ensures that every eligible member in the household has an equal probability of selection, it imposes respondent burden and increases the likelihood of nonresponse, particularly for telephone surveys. Rizzo, Brick and Park (2004; we will refer to this as the Rizzo method) proposed a modified version of household rostering for use on the telephone in which respondents are initially asked how many adults currently live in the household, and then one adult is randomly selected based on that number. Full household rosters (or any other selection method) are used only in households with three or more adults in instances where the phone answerer is not randomly selected to be the respondent. At the time of development of this method, only about 15% of households in the United States required full rosters. Beebe et al. (2007) compared the Rizzo method of respondent selection to the "next birthday" method, finding the same average number of attempts to interview for each method (5.6 attempts for the last/next birthday method v. 5.7 attempts for the Rizzo method), but found a lower refusal rate and higher response and cooperation rates for the birthday method.
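The core logic of the Rizzo, Brick and Park (2004) approach can be sketched as follows; this is a simplified rendering of the published logic, and actual CATI implementations handle additional cases, such as how the non-informant adult is chosen in larger households.

```python
import random

def rizzo_brick_park(n_adults, rng=random.Random()):
    """Simplified Rizzo-Brick-Park selection for a telephone household.
    Returns 'informant', 'other adult', or 'roster needed'."""
    if n_adults == 1:
        return "informant"
    # Pick one of the n_adults at random; the informant occupies slot 1.
    pick = rng.randint(1, n_adults)
    if pick == 1:
        return "informant"            # informant selected with probability 1/n
    if n_adults == 2:
        return "other adult"          # only one other adult, so no roster needed
    return "roster needed"            # 3+ adults: roster (or another method) among the rest

# Quick check of selection shares in a 3-adult household (informant ~1/3).
rng = random.Random(1)
sims = [rizzo_brick_park(3, rng) for _ in range(10_000)]
print(sims.count("informant") / len(sims))   # approximately 0.33
```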
The Kish method is not generally used in mailed invitations for self-administered surveys. With a move to web-push designs in which the sampled household completes a roster online, the selection method with a full roster may look like the Kish method in the web mode. We know of only one implementation of a Kish method of selection for adults in a small-scale mail survey, with no details provided as to how it was actually implemented by the respondent (Reich, Yates, and Woolson 1986). Gallagher, Fowler, and Stringfellow (1999) evaluated the use of a modified Kish selection procedure to identify a randomly selected child as the target of a questionnaire completed by parents. In this procedure, adults were asked if they had any children who met the eligibility criteria (aged 17 or younger and on a health care policy), and then were asked to list children's names, ages, and sex from oldest to youngest. They then followed one of six different selection instructions printed under the grid directing them to the targeted child about whom to answer questions. Although this seemed complex, it yielded identical response rates to a condition in which the child was pre-selected by the researchers.
Web surveys may ask household informants to complete a household roster to select respondents within the household. Bosa, Gagnon and Caron (2017) report that the “standard” method for web surveys at Statistics Canada was that a household informant completes a full roster online, with an individual randomly selected from the household roster to be the selected respondent. In an experiment for a pilot for the Canadian National Travel Survey, the response rate for this full roster was 13.6% (statistically lower than that for a last birthday and age-order method), with only about 3% of respondents selected incorrectly (statistically better than the last birthday and age-order methods). As part of a larger experiment (detailed below), the recruitment pilot test for the 2016 American National Election Studies (ANES) included a method akin to a modified Rizzo-Brick-Park method for households that did not match to commercial databases (DeBell, et al. 2017). Respondents were asked to report the number of adult citizens in the household, and one person was randomly selected from this number. If the individual completing the screener was selected (roughly 2/3 of sampled households), the survey continued. If the individual completing the screener was not selected (roughly 1/3 of sampled households), then the screener respondent completed a household roster to identify the appropriate respondent from the remaining adults in the household.
3.2 Household Rostering with Two Stages of Selection
In a self-administered context, full household rosters are often used in mail or web/mail surveys with two stages of selection. Looking only at mail surveys, Montaquila, et al. (2013) call this the ABS Two Phase Mail-Based Model. Examinations of two-stage within-household selection methods may focus on selecting an adult within a household, but often are also interested in selecting a particular subpopulation, such as children, older adults, veterans, people who engage in a certain activity, or those who speak a language other than English. In two-stage within-household selection methods, a screener is first sent to a household to obtain a household roster and limited additional information, and the survey organization uses this information to identify the respondent or focal child/teen/household member, as appropriate for the set of topical questionnaires for that household. The household is then sent a "topical" or "main" survey containing the survey content of interest for key survey estimates, usually with instructions for the sampled adult to complete the questionnaire or for the household informant to complete the survey for the targeted child, teen, or other household member. When screening for a particular population, other respondent selection techniques might be used as a starting point to identify a screener respondent, and then a screening questionnaire is used to ascertain an appropriate respondent meeting the more restrictive criteria for a given study.
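Once screener rosters are returned, the second stage is essentially a random draw of one eligible household member per responding household, with a corresponding within-household selection weight. The sketch below illustrates that step with invented roster data; real studies layer on eligibility rules, subsampling of rosters, and weighting adjustments.

```python
import random

rng = random.Random(3)

# Hypothetical returned screener rosters: a list of eligible children per household.
rosters = {
    "HH001": ["child_A", "child_B"],
    "HH002": ["child_C"],
    "HH003": ["child_D", "child_E", "child_F"],
}

selections = []
for hh_id, eligible in rosters.items():
    if not eligible:
        continue                              # screened-out household: no topical survey
    person = rng.choice(eligible)             # equal-probability draw within the household
    selections.append({
        "household": hh_id,
        "selected": person,
        "within_hh_weight": len(eligible),    # inverse of the 1/len(eligible) selection probability
    })

for s in selections:
    print(s)
```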
Gallagher, Fowler and Stringfellow (1999) used a two-stage selection procedure in a mail survey to select children among households that subscribed to a healthcare plan. Parents provided a roster of children in the household. Overall, the two-stage selection did not statistically reduce response rates compared with pre-selection of a child or with the modified Kish procedure implemented by a parent, although the two-stage procedure did reduce response rates when the parent also completed a survey about themselves.
Brick, Williams and Montaquila (2011) and Montaquila, et al. (2013) used a two-phase approach to selecting persons within a household in the mail survey version of the National Household Education Survey (NHES; see also Han et al. 2010 for a two-phase approach to sample veterans). Here, addresses were selected from an ABS frame. Selected households were asked to complete a screener questionnaire that asked whether there were any children in the household, how many, and for a full roster of those children (name, age, sex, type of school, and year in school). Household members who met the eligibility criteria for the NHES were identified from the screener questionnaire, and a target child was selected to be reported on in the topical questionnaire. The 2009 NHES pilot study yielded a 58.7% response rate to the screener, with a 60.6% response rate from the addresses that had only mail attempts. The topical mail questionnaire yielded a 71.1% response rate among selected screener respondents. Both of these response rates were the same as or higher than those of the previous RDD version of this survey.
In 2016, a mixed-mode experiment was added to the NHES in which a subset of sampled addresses was randomly assigned to complete the screener and main questionnaire on the web rather than in the two-stage paper mail survey approach (McPhee, et al. 2018; Wilkinson-Flicker, et al. 2016). Addresses in this condition were asked to complete the screener on the web, providing information about all members of the household (the 2016 questionnaire contained a topical questionnaire on adult education), and were then notified automatically which child was selected to be the focus of the questionnaire (to be answered by a knowledgeable adult) or which adult should answer for their own educational experiences. Nonresponding households assigned to the web were sent a paper questionnaire during nonresponse follow-ups. The weighted screener response rate was 62.1% for addresses initially assigned to web, compared to 67.2% for addresses initially assigned to mail. Over 85% of those who completed the screener online, whether knowledgeable adults selected to report on a child or adults reporting on their own education, went on to complete the topical surveys online. This rate was higher than among addresses that completed the screener on paper, whether through initial assignment to that mode or during the nonresponse follow-ups.
Brick, et al. (2012) (see also Mathiowetz, et al. 2010) used a two-phase mail approach for surveying saltwater anglers. In addition to interest-generating questions that could also be used for nonresponse adjustment, households were asked to complete a full household roster for all members of the household, identifying each member’s sex, age, race/ethnicity, and number of days spent fishing from shore or boat in a specific state during specific months (Fishing Effort Survey). Eligible households were sent a survey packet for an identified angler. Marlar, et al. (2017) expanded this two-stage approach for the same population, using a mail screener but incorporating a mailed invitation to participate in a web survey for the topical questionnaire, followed by a mail survey sent to nonrespondents; the topical questionnaire was completed online by 68% of respondents in two states.
In the 2016 National Survey of Children’s Health (NSCH), households were asked to complete an online screener questionnaire; households without children simply identified that they did not have any children (Ghandour, et al. 2018; U.S. Census Bureau 2018). Those households with children reported the number of children in the household, the language spoken in the household, and then information for each child, including the child’s name, race/ethnicity, age, sex, English-speaking ability, and a variety of questions on the child’s medical history to identify children with special needs. Households with children automatically had a child subsampled for the topical survey. Nonrespondents to the web survey request then received a mail screener questionnaire. Completed paper screeners were sent back to the survey organization (the Census Bureau) for processing and selection of the child.
The California Health Interview Survey (CHIS) 2018 web pilot experiment used a household roster to identify potentially eligible teen and child respondents, although adults were selected using quasi-probability methods (Wells, et al. 2018). The selected adult respondent completed a household roster, from which eligible children and teens were identified. The adult answered questions about the selected child and was asked for permission to survey eligible teens identified from the roster and for their contact information, including email addresses and phone numbers. In the pilot, adults successfully completed interviews for 79 of 136 eligible children (58.1% unweighted completion rate; 64.9% weighted completion rate), similar to the rate observed in the 2017 telephone-based CHIS (63.7%). Of the 125 eligible teens who were identified, parents provided permission for 38 teens, or 30.4%, similar to the rate in the 2017 telephone CHIS. Only 12 of the 38 teens completed an interview, for a 14.0% teen response rate, much lower than the 23.4% response rate from the 2017 CHIS.
DeBell, et al. (2017) experimentally evaluated a variety of selection methods for the 2016 ANES pilot study. One experimental method was a mailed two-stage selection with a web follow-up, in which mail respondents completed a mailed two-page screening questionnaire with a roster and one respondent was randomly selected from this roster (condition 3). Nonrespondents to the mailed screening questionnaire were then asked to complete a web survey using an age-order selection procedure rather than mailing back a household roster. Other experimental conditions included a variety of web-based selection methods that incorporated information from commercial sources on names and household size. Although the response rate to the mailed screener (54%) was higher than that for the web-based screeners (47% or 48%), the overall response rate for the web-based surveys was higher due to a larger drop-off at the mailed topical questionnaire (59%) compared to the online questionnaires (83% to 89%).
The 2017 National Household Travel Survey (NHTS) used a mailed “recruitment survey” rather than a screening survey to identify the number of people in the household who would need to complete a one-day travel log (Federal Highway Administration and Westat 2018). The recruitment survey included a number of questions about transportation and the household, as well as a household roster, with a weighted response rate of 30.4%.
3.3 Any Adult, Most Knowledgeable Person, or Head of Household
Early transitions from telephone to mail surveys considered options that did not require an explicit probability model for selecting a household respondent. One of these methods is to allow any adult in the household to participate. For instance, when examining a transition to a mail survey for the Behavioral Risk Factor Surveillance System (BRFSS), Battaglia, et al. (2008) included an “any adult” within-household selection condition, stating “This survey should be completed by any adult, age 18 or older, living in your household except a college student living away at school; anyone in a prison, mental hospital or nursing home” (p. 468). The “any adult” method in this experiment yielded a higher response rate than the next birthday and all adults methods (described below), but a somewhat less representative sample based on demographic characteristics.
Another non-probability method to obtain an individual to represent the views of the household is to ask the most knowledgeable person, decision maker, or head of household to participate in the survey. This method is typically used when the research question requires knowledge about a particular issue. This respondent selection technique might involve interviewing multiple members of the household, with the person most knowledgeable about a topic area providing the responses to the survey questions on that topic. For instance, when evaluating whether the Survey of Consumers could be transitioned from RDD to a self-administered mail survey, Elkasabi, et al. (2014) asked that “the head of the household or his or her partner complete the questionnaire” (p. 743). Biemer, et al. (2018) report on a pilot study to transition the Residential Energy Consumption Survey (RECS) to mixed web/mail modes in which a knowledgeable adult was asked to report on the energy use in the home (see also Residential Energy Consumption Survey n.d.). The use of a knowledgeable household reporter is appropriate when inference is made at the household level, but deviates from a probability sample when inference at the adult level is needed.
3.4 All Adults
An alternative method for within-household selection in self-administered surveys is to ask all eligible adults to complete a questionnaire. This can be useful when the goal is to obtain information from multiple people in the household. Here, households are not asked to select a single individual; rather, all people who meet the eligibility criteria are asked to complete a survey. The costs of this selection method are thus higher in the mail mode because additional copies of the questionnaire must be printed and mailed.
Battaglia, et al. (2008) examined whether all adults in a household would complete a BRFSS survey. Households were instructed “This survey should be completed by every adult, age 18 or older, living in your household.” About 33% of households returned at least one questionnaire, but only 85% of eligible adults in those households completed a questionnaire, resulting in an overall response rate of 28%. This was lower than the response rates for the any adult and next birthday methods in this experiment, but the all adults condition yielded a respondent pool that was more representative of young adults and of males overall. Replicating Battaglia, et al. (2008), Hicks and Cantor (2012) compared the all adults and next birthday methods of within-household selection in a mail survey version of the Health Information National Trends Survey (HINTS). They also found similar household-level response rates for the two methods (35% all adults; 39% next birthday) and that 85% of adults completed the all adults questionnaire, yielding a cumulative response rate for the all adults condition of about 30%.
Medway and Battle (2015) compared an implementation of the National Adult Training and Education Survey (NATES) in which all eligible persons in a household aged 16 to 65 who were not in high school were asked to complete a NATES adult topical questionnaire with a two-phase approach in which households returned a screener and one adult was then selected from the household. They found that the two-stage selection yielded lower response rates for people within the household (79% compared to 96% for all adults), although households participated in some form at about the same rate in the two approaches (between 65% and 69%). Even when all adults were asked to participate, not all adults in the household completed a questionnaire, with full-household participation rates declining as the number of eligible persons in the household increased (about 90% or more for households with 1, 2, or 3 adults; 16% for households with 4+ adults).
Brick, Andrews and Mathiowetz (2016, p. 385) collected information on all adults in the household, but did not require these adults to answer for themselves. Rather, in this study of anglers, a proxy reporter was permitted to report on fishing trips made by all household members. This approach thus combines all-adults data collection with the “any adult” method of household reporting. It is similar to the approach used in the NHTS, in which reports were made for all household members about their travel during a given day, collected either via self- or proxy-report (Federal Highway Administration and Westat 2018). A mailed travel log was to be completed during the day and then entered into a web survey or reported by telephone for each member of the household; parents were instructed to provide information for children under the age of 16. Conditional on answering the recruitment survey, the response rate for the travel log was 51.4%.
3.5 Age/Order Selection Methods
Selection methods based on the age of household members and their relative position in the household, sometimes incorporating the gender of the respondent (e.g., youngest male/female; youngest male/oldest female methods; Troldahl and Carter 1964; Hagan and Collier 1983), have been used in telephone surveys and have seen some use in self-administered and mixed-mode surveys. Age-order methods are non-probability respondent selection approaches commonly used in phone-based surveys with short field periods. On the telephone, the interviewer first asks to speak to “the youngest male (or female), 18 years of age or older, who is now at home,” where the gender is based on a rotation. If no adult male (or female) is at home, the interviewer asks to speak to the youngest female (or male), 18 years of age or older. This method targets not only younger age groups but also respondents who are at home; the distribution of the sample therefore depends heavily on when the calls are made. The method also assumes that adults in the household identify with binary gender categories, something that may not be universal across all members of a household.
The age-order approach has been used as surveys transition from telephone to self-administered or mixed modes. In these modes, the focus is not on adults at home at the time of the call, but on all adults who live in the household. In a mailed survey, the wording can be quite complex and requires multiple recruitment letters to fully reflect all of the age-order combinations in the household (see example wording below in Table 3.1). Some approaches start by asking for the number of members of the household and then provide guidance to the respondent based on the number of adults, similar to the approach used by Gallagher, Fowler and Stringfellow (1999) in their modified Kish selection of children. Other approaches simply inform the householder which person matching a particular age-order and/or sex combination has been selected for the survey.
Table 3.1. Example wording from age-order selection methods in self-administered surveys
Bosa, Gagnon and Caron (2017) (6 letter versions)
Oldest adult: “Who should complete this survey?
• If you are the only person in your household who is 18 years of age or older, you have been selected to participate in the survey.
• If your household has two or more members 18 years of age or older, the oldest member among them has been selected.”
3rd oldest adult: “Who should complete this survey?
• If you are the only person in your household who is 18 years of age or older, you have been selected to participate in the survey.
• If your household has two members 18 years of age or older, the older member of them has been selected.
• If your household has three or more members 18 years of age or older, list those members in order of oldest to youngest.
1.________ 2.________ 3.________
The third person on the list has been selected.”

DeBell, et al. (2017)
We would like to ask the [oldest/youngest] [male/female] in your household who is 17 or older their views on a variety of topics related to life in the United States today. If there is no [male/female] there, then we would like the [oldest/youngest] [male/female] who is 17 or older to take the survey. The mail version of the survey is over, but the survey can still be completed online in the next two days.

Olson and Smyth (2014) (6 letter versions x 2 modes)
Mail: To make sure we hear from all different types of Nebraskans, please share this letter with the <oldest/youngest/second youngest> adult (age 19+) <sex> in the household and have them complete the enclosed questionnaire.
Web: To make sure we hear from all different types of Nebraskans, please share this letter with the <oldest/youngest/second youngest> adult (age 19+) <sex> in the household and have them go to the website listed below to complete the questionnaire.

Olson, Stange, and Smyth (2014) (2 letter versions)
Oldest adult: In order to make this study more scientific, we ask that the enclosed survey be completed by the adult (age 19 or older) in your household who is the oldest adult in your household.
Youngest adult: In order to make this study more scientific, we ask that the enclosed survey be completed by the adult (age 19 or older) in your household who is the youngest adult in your household.

Wells, et al. (2018) (6 letter versions)
Condition 2C:
Step 1: Identify who should complete the survey
How many adults, 18 years of age or older, are in your household?
One adult: You should complete the survey.
Two adults: The older adult should complete the survey.
Three or more adults: List the three oldest adults in order from oldest to youngest. The third person on the list should complete the survey.
1.________ 2.________ 3.________
In an experimental comparison of web, mail, and mixed web/mail surveys, Olson and Smyth (2014) found no statistical difference in selection accuracy for an age/order selection method across modes of data collection, with roughly 15% to 20% of respondents inaccurately selected among respondents to a previous survey. Olson, Stange, and Smyth (2014) compared the youngest adult and oldest adult selection methods as part of an experimental evaluation of within-household selection methods in a mail survey. The oldest adult method yielded a response rate of 37.4%, compared to a significantly lower response rate of 32.0% for the youngest adult method. Across all households, about 30% of the selections were made inaccurately in the oldest adult procedure and about 35% in the youngest adult procedure, a difference that was not statistically significant. Bosa, Gagnon, and Caron (2017) mailed letters identifying the adult by age position in the household, yielding a 20.4% response rate, which was higher than that for a full roster; 13% of selections were made inaccurately, significantly better than the last birthday method. Bosa, Gagnon, and Caron report that the age-order method will be used for two surveys in Canada (the National Travel Survey and the Canadian General Social Survey). The CHIS piloted a web-based instrument with an age-order selection method as one of the experimental conditions for within-household selection of an adult (Wells, et al. 2018). This condition yielded a weighted response rate of 13.6%. The CHIS included a household roster that allowed the researchers to evaluate the accuracy of selection within the household; 30% of respondents were inaccurately selected overall. DeBell, et al. (2017) used an age-order selection for nonrespondents to a screener in one experimental treatment for the 2016 American National Election Studies pilot.
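To make the mechanics of a mailed age-order design concrete, the sketch below randomly assigns one of six age-position-by-sex letter versions to a sampled address, in the spirit of the wording shown in Table 3.1. The specific combinations and wording are illustrative assumptions, not any study's production materials; actual studies vary in the number and content of letter versions.

```python
import itertools
import random

# Our own illustrative set of age-order letter versions (an assumption,
# loosely modeled on the Table 3.1 examples, not a study's actual design).
POSITIONS = ["oldest", "second oldest", "youngest"]
SEXES = ["male", "female"]
LETTER_VERSIONS = list(itertools.product(POSITIONS, SEXES))   # 3 x 2 = 6 versions

def assign_letter_version(address):
    """Randomly assign one of the six letter versions to a sampled address
    and return the merged cover-letter instruction."""
    position, sex = random.choice(LETTER_VERSIONS)
    return (f"Please share this letter with the {position} adult {sex} "
            f"(age 18 or older) in the household at {address} and have "
            f"them complete the enclosed questionnaire.")

print(assign_letter_version("123 Main Street"))
```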
3.6 Last Birthday and Next Birthday
The quasi-probability “last birthday” and “next birthday” methods are commonly used for both telephone and self-administered surveys. The two versions differ as to whether the person who had the most recent birthday (the “last birthday”) or who will have the next upcoming birthday (the “next birthday”) is selected and asked to complete the survey. As such, telephone surveys that initially used this method do not need to transition to a different method when incorporating a self-administered questionnaire. The birthday methods avoid the intrusive questions required for household rostering, but do not deliver fully randomized respondent selection because the selection date is anchored to the field period. If the reference date were instead randomly assigned, all persons would have an approximately equal chance of selection; for instance, one could imagine randomly selecting a month of the year for each sampled household and selecting the adult whose birthday falls closest to that month. We know of no studies that have taken this approach.
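A sketch of this hypothetical de-anchored birthday selection is shown below. Because no study we know of has implemented it, the month-based rule and the instruction wording are entirely our own illustration.

```python
import calendar
import random

def deanchored_birthday_instruction():
    """Sketch of the hypothetical randomized-month birthday rule described
    above; the wording and rule are illustrative assumptions, not a fielded
    design."""
    month = random.randint(1, 12)                  # randomly assigned reference month
    month_name = calendar.month_name[month]
    return (f"Please have the adult (age 18 or older) in your household whose "
            f"birthday falls closest to the month of {month_name} complete and "
            f"return the enclosed questionnaire.")

print(deanchored_birthday_instruction())
```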
Although surveys commonly state that they are using a next birthday method, how the method is implemented varies across studies. As shown in Table 3.2, some next birthday instructions define the eligible age for an adult (e.g., 18 years of age or older). Some embed the birthday selection within a questionnaire that first asks for the number of adults in the household. Others provide a justification for the birthday method (e.g., “In order to make sure we get responses from a random sample of people…”). Others provide more detailed directions, such as noting that the selection instructions need to be followed only if more than one adult lives in the household. Still others link the next birthday to a particular calendar date.
Table 3.2. Example wording from birthday selection methods in self-administered surveys
Battaglia, et al. (2008) (next birthday)
This survey should be completed by one adult living in your household.
1. How many adults, age 18 or older, live in this household? Note: Please include yourself.
__ __ Number of adults
Not counting
- college students living away at school
- or anyone in a prison, mental hospital or nursing home.
If only one adult lives here, that person should complete the survey.
If more than one adult lives here, the one with the next birthday should complete the survey.

Bosa, Gagnon and Caron (2017) (last birthday)
Who should complete this survey?
The person in your household who had the most recent birthday, and is 18 years of age or older, has been selected to participate.

Hicks and Cantor (2012) (next birthday)
- Is there more than one person age 18 or older living in this household? Yes/No
- Including yourself, how many people age 18 or older live in this household? __ __
- The adult with the next birthday should complete this questionnaire. This way, across all households, HINTS will include responses from adults of all ages.
- Please write the first name, nickname, or initials of the adult with the next birthday. This is the person who should complete the questionnaire. _____

Westat (2013) (next birthday)
In order to make sure we get responses from a random sample of people, we ask that the adult in your household with the next birthday complete and return this questionnaire in the next two weeks.

Westat (2018) (next birthday)
In order to make sure we get responses from a random sample of people, we ask that the adult in your household with the next birthday complete and return this questionnaire in the next two weeks.

Olson, Stange, and Smyth (2014) (last and next birthday)
Last birthday: In order to make this study more scientific, we ask that the enclosed survey be completed by the adult (age 19 or older) in your household who most recently celebrated a birthday.
Next birthday: In order to make this study more scientific, we ask that the enclosed survey be completed by the adult (age 19 or older) in your household who will be the next to celebrate a birthday.

Olson and Smyth (2017) (next birthday; next birthday with cover instructions; next birthday with confirmation question)
Next: “To assure that we have heard from people of all types, we ask that the adult (age 18 or older) in your household who will have the next birthday complete the enclosed survey.”
Next w/cover: Advance letter: “To assure that we have heard from people of all types, we ask that the adult (age 18 or older) in your household who will have the next birthday complete the enclosed survey.” Additional instructions on the questionnaire cover: “Thank you for your help! Please have the adult age 18 or older in your household who will have the next birthday complete this survey.”
Next w/confirmation question: Advance letter: “To assure that we have heard from people of all types, we ask that the adult (age 18 or older) in your household who will have the next birthday complete the enclosed survey.” Additional question on the questionnaire cover: “Thank you for your help! Are you the adult age 18 or older in your household who will have the next birthday? Yes -> Please continue. No -> Please have the adult in your household who will have the next birthday complete the survey.”

Stange, Smyth, and Olson (2016) (next birthday)
No calendar: Please have the adult age 19 or older in your household who will have the next birthday that will take place after July 1st, 2012, complete the questionnaire and return it in the enclosed envelope. Hearing from the person with the next birthday is very important because it ensures that we get responses from all different types of Nebraskans—men and women, the young and old, those who typically read the mail and those who do not.
Calendar: Please have the adult age 19 or older in your household who will have the next birthday that will take place after July 1st, 2012, complete the questionnaire and return it in the enclosed envelope. Hearing from the person with the next birthday is very important because it ensures that we get responses from all different types of Nebraskans—men and women, the young and old, those who typically read the mail and those who do not. We have printed the calendar at the right in case it helps you identify the right person in your household.
Standard: To make sure that our results accurately reflect the opinions of all Nebraskans, we ask that the enclosed survey be completed by the adult (age 19 or older) in your household who will be the next to celebrate a birthday.
Explanatory: Some people like filling out surveys and others do not, but hearing from only certain types of people can lower the quality of our results. To make sure that our results accurately reflect the opinions of all Nebraskans, we need to randomly pick someone within your household to answer the survey. Because the timing of birthdays is pretty random, we can use them to determine who should answer. Please take a moment to think about the birthdays of all the adults (age 19 or older) in your home. Who will be the next to celebrate a birthday? We ask that the enclosed survey be completed by the adult (age 19 or older) in your household who will be the next to celebrate a birthday. To ensure the quality of our results, it is very important that this is the person to complete the survey.

Wells, et al. (2018, 2019) (next birthday; next birthday with confirmation question)
Next in cover letter: Please have the adult, age 18 years of age or older, in your household who has the next birthday complete the survey.
Next in cover letter + verification question: Please have the adult, age 18 years of age or older, in your household who has the next birthday complete the survey. Verification question: “Are you the adult 18 or older in your household who will have the next birthday?”
The birthday methods have received a great deal of attention regarding how to improve the accuracy of selection, mostly with little success. For instance, in a mail survey, Stange, Smyth, and Olson (2016) focused on the next birthday method, examining two different approaches to improving representativeness and selection accuracy. First, they included a calendar as a visual display for birthdays; the calendar had no effect on the demographic composition of the sample and yielded less accurate selections (47% inaccurate in 2+ person households) than omitting the calendar (37% inaccurate in 2+ person households). Second, they examined extensive explanatory language about the importance of the within-household selection method; this also had no effect on the composition of the sample.
Success in improving selection accuracy for the birthday methods has been found using a verification question to confirm that the selected respondent is the correct one. Olson and Smyth (2017) focused on the next birthday method in a mail survey, including the instructions (1) in the cover letter alone, (2) also on the cover of the questionnaire, and (3) with a verification question to confirm that the selected respondent was in fact the person in the household with the next birthday. The next birthday method including the verification question had a lower response rate, but the highest rate of selection accuracy of the three methods. Wells, et al. (2018, 2019) included an experimental comparison of the next birthday method and the next birthday method with a confirmation question in the CHIS pilot. The next birthday method with the verification question yielded the highest response rate of the three methods (15%, compared to 13.9% for next birthday and 13.6% for age-order) and a substantial improvement in selection accuracy (10% inaccurately selected, compared to 30% inaccurately selected for the other methods); this finding was replicated statewide in Wells, Hughes, Park, and Ponce (2019). It is unknown whether a verification question improves selection accuracy for other selection methods.
3.7 Respondent Selection and Sample Representativeness
3.7.1 Demographic Composition Differences Depending on Within-Household Selection Method
Most of the studies examining within-household selection compare alternative methods of selecting an individual from a household within a mode, rather than across modes.
Surveys with a household informant do not require randomly selecting an eligible person within the household. Thus, respondent-level characteristics may differ from benchmarks simply because the process of identifying the household respondent in a self-administered survey (e.g., opening the mail, reading the cover letter, completing the survey) is not random and because self-selected mail survey respondents and self-selected interviewer-administered survey respondents may be, on average, different. For instance, the National Pilot test for the RECS examined four different self-administered mode combinations with a knowledgeable adult respondent in each of the mode treatments. Across the four treatments, “renters, older housing units, single-person households, and apartments” were underrepresented, especially in the mode protocol that only offered web as the mode of completion (Biemer, et al. 2018, p. 18).
The Wisconsin Family Health Survey (FHS) is a household-level survey of Wisconsin households, collecting information on all household members (Allison, Stevenson, and Kniss 2014). In 2012, the FHS switched from an RDD survey to an ABS frame with the goal of finding a cost-effective method of identifying cell-phone-only households, and started by matching addresses to listed landline phone numbers. For addresses that did not match to a landline phone number, a short mail screener was sent with the goal of obtaining a phone number to complete a full survey, including for cell-phone-only households. The FHS obtained 30% of its interviews with respondents on a cell phone. Among households in the part of the ABS frame that did not match to a landline telephone number, 65.3% were cell-only; among households in the matched part of the frame, 7.8% were cell-only. Respondents in the “unmatched” sample were significantly younger and more likely to be single, to be non-white, to have children, to live below the poverty level, and to rent.
Examinations of one-stage within-household selection methods generally compare quasi-probability methods of selecting an adult within a household. As summarized by Smyth, Olson and Stange (forthcoming), studies of within-household selection methods in telephone surveys or in self-administered surveys found few demographic differences across within-household selection methods within each of those modes. These studies generally compared any adult, next birthday, all adults, and an age-gender position selection method (e.g., youngest adults; oldest female). Including a demographic characteristic in the selection procedure (e.g., youngest male) yielded more people with that characteristic in the sample in both telephone and self-administered modes (Marlar, et al. 2014; Olson, Stange, and Smyth 2014). Additionally, mail surveys tend to overrepresent older adults, adults with higher levels of education, and non-Hispanic white adults relative to population benchmarks (Smyth, Olson and Stange forthcoming). In a web-only survey, Bosa, Gagnon, and Caron (2017) found that the age-order selection method yielded a respondent pool that was more similar to age/sex distributions in Canada than a household roster, and was similar to the last birthday method.
Stange, Smyth and Olson (2016) found no difference in demographic composition for respondents to the next birthday method when they received additional explanatory instructions about the importance of the within-household selection method or not. They also found no difference in demographic composition for those who received a calendar embedded in the cover letter to help with the selection task. Similarly, Olson and Smyth (2017) found no differences in demographic composition over three different placements of instructions for the next birthday method.
When a two-stage approach is used to screen households, the target population is often a special population rather than the general population. For example, the NHES uses a two-stage approach to screen households for the presence of children in certain age ranges (Zukerberg and Mamedova 2012; Montaquila et al. 2013). In pilot work that compared a 2009 national mail survey with the 2007 RDD survey conducted two years earlier, Brick, Williams and Montaquila (2011) found that the mail survey garnered more lower-income and renter households but fewer Hispanic households, differences that the authors attribute to coverage of cell-only households and the English-only screening instruments in the pilot.
The NSCH also uses a two-stage approach to identify children overall and children with special health care needs in particular. A nonresponse bias analysis of the 2016 NSCH found that screener respondents were from Census block groups or tracts with slightly higher socioeconomic status and slightly higher proportions of white residents than the full sample, and that this pattern continued for topical respondents.
3.7.2 Responses to Survey Questions by Respondent Selection Methods
Few studies comparing different one-stage within-household selection methods within a self-administered or mixed-mode survey have found notable differences in key survey estimates. For instance, Battaglia, et al. (2008; see also Link, et al. 2006) found that eight health variables showed no differences across three respondent selection conditions. Similarly, Hicks and Cantor (2012), examining 24 variables in the transition of the HINTS to mail, found that only two differed significantly between the all adults and next birthday selection methods, differences that disappeared after weighting adjustments. Stange, Smyth, and Olson (2016) found no difference in item nonresponse rates or in substantive estimates about social attitudes or about trees and forests across different methods of implementing the next birthday method in two different mail surveys.
Olson and Smyth (2017), examining three alternative methods of implementing the next birthday method in a mail survey, found no difference in item nonresponse rates across the three experimental treatments. They also found that survey estimates related to household tasks (e.g., being a mail opener, paying bills) differed significantly between accurately and inaccurately selected respondents across the within-household selection methods. This suggests that estimates on topics for which household members differ in perceptions or behaviors are likely to be the most sensitive to differences in within-household selection methods.
3.8 Unique Issues in Transitioning from One Mode to Another
Transitioning a survey from an interviewer-administered mode such as the telephone to a self-administered mode poses a unique set of challenges.
Handoff issues. The first issue is the hand-off to another selected adult in the household. On the phone, an interviewer is able to monitor the hand-off to the selected respondent. In self-administered modes, researchers have to rely on the respondent to follow the respondent selection instructions, and the risk of respondent self-selection is therefore higher. It is consistently found that the accuracy of within-household selection is lower in larger households (Battaglia, et al. 2008; Olson and Smyth 2014; Olson, Stange, and Smyth 2014; Olson and Smyth 2017; Smyth, Olson, and Stange forthcoming), generally because the handoff to a respondent other than the initial informant is challenging. Battaglia, et al. (2008) report that some respondents to a mail survey who were not the person with the next birthday in the household completed the survey because “the person with the next birthday did not want to fill out the questionnaire” (p. 466). Brick, Andrews, and Mathiowetz (2016) avoid this issue by requesting proxy reports for all members of a household, but this may not be feasible or desirable for all studies.
Mode of screener. Few studies experimentally examine mode differences in the completion of screening instruments separately from completion of the overall questionnaire, largely because this requires a two-stage screening design for within-household selection. Those that do tend to find that mail screening instruments yield higher completion rates than other modes of data collection. For example, in an experimental comparison for a two-stage sample of veterans, the response rate for a paper screener was almost 7 percentage points higher than that of a web screener, a statistically significant difference, and the effective coverage rate of the target population was significantly better (Han, et al. 2010). In the telephone-based 2007 NHES, the screener response rate was 53%, compared to a screener response rate of 69% for a mail-based NHES in a 2011 field test of the same survey (Montaquila, et al. 2013). The NHES experimentally compared web and mail modes of collecting the screener information in 2016, and the mail screener response rate was about 5 percentage points higher than the web screener response rate (McPhee, et al. 2018, p. 87). Amaya, et al. (2015) experimentally examined screener completion rates for a sequential mixed-mode telephone-mail survey from an address-based sample matched to telephone numbers in six US communities targeting particular racial/ethnic groups. The treatment in which mail surveys were sent initially with a phone follow-up yielded a higher screener completion rate (48.7%) than the treatment in which phone attempts were followed by a mail questionnaire (44.8%), with notably more screeners completed by mail than by telephone. In the ANES recruitment pilot, screener response rates for the conditions that screened by mail were higher (54%) than for those that screened via a web instrument (about 48%), but the response rate for the topical survey was lower for those who were screened by mail and asked to go online for the main survey than for those who were screened and interviewed entirely online (59% topical response rate for mail; >81% topical response rate for the web survey) (DeBell, et al. 2017).
Screener incentives. With a two-stage selection approach, open questions are whether to provide incentives for completing the screener questionnaire and at what level.
In the existing experimental comparisons, prepaid incentives improve response rates to mailed screening questionnaires. Montaquila, et al. (2013) and McPhee (2012) compared two levels of prepaid incentives ($2 vs. $5) for the screener questionnaire in a field test for the 2011 NHES. Unsurprisingly, the larger incentive increased the initial screener response rate (46.1% responded to the first mailing with a $5 incentive, compared to 39.7% with a $2 incentive), a difference that held even with additional mailings (final response rates of 68.9% for $5 vs. 65.0% for $2). This experiment was replicated in the 2016 NHES, with less than a 2 percentage point difference between the two conditions in screener response rates, but notable differences in screener completion rates across groups that varied simultaneously in incentive level and predicted response propensity (McPhee, et al. 2018). Adding a $1 incentive to a screener survey for anglers increased screener response rates by about 12 percentage points in data collection during the second half of a year compared to data collected during the first half of the year with no incentive (Andrews, Brick, and Mathiowetz 2013). In a one-stage mail survey that screened for anglers, higher incentive levels raised the response rate, but at a decreasing rate (Brick, Andrews, and Mathiowetz 2016). Prepaid incentives also increased screener completion rates in the web/mail National Survey of Children’s Health ($2: 53.2%; $5: 55.3%) relative to no incentive (50.3%) (US Census Bureau 2018).
Content of the screener. In a two-stage approach, what information to include in a screening questionnaire beyond the questions needed to determine eligibility is an open question, with few studies replicating the same design features or yielding the same results. The 2009 and 2011 NHES pilot studies evaluated alternative screener content, comparing a short questionnaire that contained only the questions needed to determine eligibility (the “screen-out” version) with longer questionnaires that also contained questions about the survey topic (the “engaging” and “core” versions) (Brick, Williams, and Montaquila 2011; Montaquila, et al. 2013). Although the screen-out version yielded a higher response rate in the 2009 NHES pilot study, the engaging version yielded a higher response rate in the 2011 NHES pilot. For a two-stage sample of veterans, including a single question about Active Duty status significantly increased response rates by 3.4 percentage points compared to omitting the question, but did not change the effective coverage of veterans (Han, et al. 2010).
Design of the survey mailing package. A number of methods have been used to attract the attention of respondents, sometimes aimed at a particular subgroup of interest, with very little difference in response rates. For instance, among respondents to a paper questionnaire on veterans, including an insert attempting to garner attention to the survey had no effect on response rates, but significantly improved coverage of the target population (Han, et al. 2010). Similarly, Stange, Smyth, and Olson (2019) found that including images of LGB adults and families significantly improved coverage of the LGB population in a one-stage mail survey, with no difference in response rates compared to the “default” condition with no images of LGB adults and families. In the CHIS, a sponsor logo on the exterior of an envelope depressed screener response rates in one geographic area, but not in another (Jans, et al. 2015). There was no difference in return rates when a Health Resources and Services Administration’s Maternal and Child Health Bureau logo appeared in a follow-up mailing compared to a Census logo in the National Survey of Children’s Health (US Census Bureau, 2018b).
Languages. Studies face challenges when administering surveys in multiple languages. In an interviewer-administered mode, the interviewer is able to identify cases with a language barrier, and an appropriate bilingual interviewer can then call back the respondent to complete the survey. In self-administered modes, the survey organization either has to translate the survey into multiple languages or must consider other modes for rare-language cases.
Self-administered and mixed-mode surveys conducted in multiple languages often include cover letters and survey questionnaires in these languages from the initial mailing. Asking non-English-speaking respondents to call into a language-specific telephone survey is less successful. The NHES conducted telephone interviews in both Spanish and English, totaling about 5% of phone-based screener interviews (Zukerberg and Mamedova 2012). To transition from telephone to mail, the NHES in 2011 tested an English-only, a Spanish-only, and a bilingual screening form for a nationally representative sample of households and a Spanish-targeted sample, focusing on linguistically isolated Census tracts and on individuals with a Spanish surname who lived in non-linguistically isolated Census tracts. The NHES found that the timing of the Spanish screener form affected both response rates and who participated in the screener and the topical survey, recommending that surveys include a Spanish-language screener alongside an English-language screener in each screener mailing to better identify Spanish-speaking households (see also Montaquila, et al. 2013; Brick, et al. 2012). In particular, respondents were more likely to be white and less likely to speak Spanish as their primary language when both English and Spanish screener forms were included only in the second mailing rather than in all mailings, but did not differ in parental education, household tenure, or household income. Additionally, in Spanish linguistically isolated Census tracts, offering a Spanish-language screener yielded more Hispanic respondents, respondents with lower levels of education, more renters, and higher household incomes than an English-only screener. As part of a screener to identify children eligible for a homeschool questionnaire, the NHES changed the wording of items identifying whether a child is homeschooled, a concept that was difficult to translate accurately into Spanish (Battle, Megra, and Wan 2017).
In contrast, a pilot study for the National Crime Victimization Survey in Chicago experimentally compared mailing bilingual screening materials (Spanish language and English language screening surveys) versus English-only screening materials to addresses in areas that were not linguistically isolated and did not have a Hispanic surname (Brick, et al. 2013). The response rates were about four percentage points lower for the bilingual screeners, and only 4 respondents completed the materials in Spanish.
The 2012-2013 test for the CHIS included a Spanish-language and an English-language screener form in every mailing, along with a cover letter translated from English into Spanish, with the two languages printed on opposite sides of the paper (Jans, et al. 2013). The CHIS recently tested a transition from phone to a web-push/phone survey, in which English-language questionnaires are initially attempted via the web and speakers of other languages are asked to call into a phone line to talk with an interviewer who speaks Spanish, Chinese, Korean, Vietnamese, or Tagalog (Wells, et al. 2018). All nonrespondents from the initial web-push phase were followed up with a telephone call to attempt a telephone interview when a telephone number could be matched to the address. The 2018 CHIS experiment also included surname lists to target Spanish and Korean/Vietnamese households. The surname lists yielded a 5.8% cooperation rate (compared to 9.1% for the ABS sample), and only 11 interviews in this experiment were conducted in a language other than English (Wells, et al. 2018).
The NSCH also provided English and Spanish versions of the screener and the topical survey. Spanish-language translations were printed on the back of the invitation letters, and respondents could request a Spanish-language paper screener and topical questionnaire. The web survey included an option to switch between the English- and Spanish-language instruments, with about 350 web screeners and about 250 web topical questionnaires completed in Spanish (US Census Bureau 2018a,b; Ghandour, et al. 2018). These language versions were not experimentally varied.
Minors. Research with minors has inherent challenges. Persons under age 18 are a protected class under the law, and interviewing minors comes with stringent consent requirements: parental consent is required to interview any minor. This means that researchers must identify and interview two different people: the parent or legal guardian, to obtain consent, and the minor. Nonresponse can occur at two points: first, when the parent refuses to allow their child to be interviewed, and second, through noncontact with or refusal by the minor. For a single-mode telephone survey, this requires more phone calls, voicemails, and follow-up messages, first to speak with the adult to obtain consent and then to reach the minor for whom the parent has consented. The type of telephone can further complicate data collection, as when a parent can be reached on a cellular phone but their 17-year-old teenager is more easily reached at a different cell phone number. Differences in the preferred survey language of the parent and of the minor child can pose additional challenges and increase the level of effort required of the survey organization.
Transitioning an existing phone survey that screens for and identifies minors to a self-administered (web or mail) mode poses a unique set of challenges. As with identifying a sample of adults, identifying a sample of children or teenagers can be done in a single-stage or two-stage self-administered survey. Which approach is best depends on whether parents/guardians provide proxy reports for all of their children or for a single child, or whether the child is asked to report for themselves. For example, in the redesigned web- and mail-based NSCH, household informants completed a screener questionnaire to identify whether there were any children in the home, including those who met particular survey criteria of having special health care needs or being young. Focal children in the household were then randomly selected based on the screening questions, and the adult household informant completed a survey about the child. In the web version of the NSCH, children were selected automatically by the web instrument; in the paper version, a two-stage selection occurred: the household returned the screener questionnaire, and a topical questionnaire was then sent with the identified child’s name prefilled (US Census Bureau 2018a,b; Ghandour, et al. 2018). This approach, in which an adult respondent provides proxy reports for children, is similar to the design of the NHES (e.g., Brick, et al. 2011; Montaquila, et al. 2013; McPhee, et al. 2018) and the child portion of the web-based CHIS pilot (Wells, et al. 2018).
The difficulty of transitioning to a self-administered mode increases substantially when the minor is a teen who is asked to answer survey questions for themselves. Here, the parent must provide permission both to contact and to interview the teen. Difficulties arise in accurately capturing an email address or cell phone number with which to ask the teen to complete an online survey, and in engaging a teen respondent who is unaware of or uninterested in research activities and must rely on their parent to tell them about the mail or web survey they are being asked to complete. Data quality questions can also arise if the survey topics are sensitive and parents are present, without the researcher’s knowledge, while their children complete an online survey.
To our knowledge, few studies that have transitioned from telephone to self-administered modes have attempted this difficult task. The CHIS pilot collected data from teens on the web by first asking parents for permission and contact information for a selected teen respondent and then following up with the teens. Of the 125 eligible teens, parents provided permission for only 38, and completed interviews were obtained from only 12, yielding about a 10% cumulative response rate among eligible teens (Wells, et al. 2018). Cantrell, et al. (2018) report on an ABS screener to identify youth and young adults aged 15 to 21; 1,293,801 households were sent invitations for a household respondent to complete a web-based screening questionnaire that included a household roster. Age-eligible household members were identified, and one teen or young adult was randomly selected from the household. Parents provided consent and contact information for teens aged 15-17. Of the 1,293,801 households contacted, 40,464 completed the web screener questionnaire (3.1% of the total sample), 12,882 were identified as eligible (31.8% of the screener completes; Cantrell, et al. 2018, Table 1), and 10,257 completed the questionnaire. The National Survey of Children’s Exposure to Violence (NatSCEV) is conducting methodological research to transition from telephone to self-administered modes, with parental reports for children aged 2 to 11 and self-reports for children aged 12 and older (Brick, Steiger, et al. 2018). Recruiting and successfully interviewing teens in an ABS-sample, self-administered mixed-mode survey requires further investigation.
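Because teen recruitment involves several conditional stages, the cumulative response rate is the product of the stage-specific rates. The short calculation below simply reproduces the arithmetic implied by the CHIS pilot and Cantrell, et al. (2018) figures cited above; it adds no new data.

```python
# Cumulative response rates are the product of the stage-specific conditional rates.
# CHIS pilot teen figures cited above (Wells, et al. 2018):
eligible_teens = 125
permissions = 38
teen_completes = 12

permission_rate = permissions / eligible_teens               # about 0.30
completion_given_permission = teen_completes / permissions   # about 0.32
cumulative_rate = teen_completes / eligible_teens            # about 0.10, i.e., ~10%
print(round(permission_rate, 3), round(completion_given_permission, 3), round(cumulative_rate, 3))

# Screener stage from Cantrell, et al. (2018), also cited above:
invited_households = 1_293_801
screener_completes = 40_464
print(round(screener_completes / invited_households, 3))     # about 0.031, i.e., 3.1%
```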
3.9 Summary and Takeaways
3.9.1 As in interviewer-administered surveys, there is no single method for selecting a respondent within the household in self-administered and mixed-mode surveys.
3.9.2 Surveys that transition from telephone to self-administered or mixed modes may use the same methods of selecting a respondent within a household or may change the methods of selecting a respondent within a household.
3.9.3 Full household rosters are often used in mail or web/mail surveys with two stages of selection. Screener questionnaires containing household rosters are more likely to be completed by mail than in other modes. However, household rosters that are completed online seem to transfer respondents to the online topical instrument at higher rates.
3.9.4 A variety of probability, quasi-probability, and non-probability methods are used in self-administered or mixed-mode surveys with one stage of selection. Selection within a household is often inaccurately made where accuracy can be evaluated; asking respondents to verify that they meet the selection criteria can help reduce inaccurate selections.
3.9.5 Studies of within-household selection methods in telephone surveys or in self-administered surveys found few demographic differences across within-household selection methods within each of those modes, each over- or underrepresenting groups in similar ways. There are few evaluations directly comparing representation of different demographic groups across modes for different within-household selection methods.
3.9.6 Few studies examining one stage within-household selection in self-administered or mixed-mode surveys across different methods have found notable differences in key survey estimates.
3.9.7 The type of information to include in a screening questionnaire above and beyond the questions needed to determine eligibility is an open question, with few studies replicating the same design features or yielding the same results. Similarly, experimental designs examining properties of a one-stage survey vary, with few consistent design features or outcomes.
3.9.8 Prepaid incentives improve response rates to mailed screening questionnaires.
3.9.9 Successful self-administered and mixed-mode surveys conducted in multiple languages include cover letters and/or survey questionnaires in these multiple languages from the initial mailing. Asking non-English-speaking respondents to call into a language-specific telephone survey is less successful.
3.9.10 As with identifying a sample of adults (including special population groups), identifying a sample of children or teenagers can be done in a single-stage or two-stage self-administered survey. Which approach is best depends on whether parents/guardians provide proxy reports for all of their children, a single child, or the child is requested to report for themselves. Parental reports for children occur at about the same rates as in a telephone survey, but more research is needed on obtaining successful cooperation from teen respondents in a self-administered or mixed-mode survey.
Each survey mode is made up of features that affect what types of questions can be asked, other measurements that can be collected, and the quality of these measurements (de Leeuw 2005). Any two modes may have some of these features in common and others that differ. Thus, in transitioning from telephone to self-administered or mixed modes, a major challenge is determining if it is possible to collect the necessary information at the required quality level in the new mode(s) and how to do so. It is also important to consider respondent characteristics that, through their interaction with these mode features, may make surveying in a specific mode more or less difficult or accurate.
We first provide an overview of questionnaire design features that may differ when transitioning from telephone to self-administered or mixed-mode surveys. We then briefly review potential differences in measurement quality for different types of devices within modes (e.g., landline versus cell phone telephone interviews; desktop/laptop computer versus mobile devices for browser-based web surveys) and what is known about surveys that transitioned to a mode including the web. Next, we turn to additional types of questions and questionnaire features that are problematic to transition. We end with discussing collection of biomarkers, environmental samples, and consent to link survey data to other records, as well as our summary and takeaways about questionnaire design when transitioning from telephone to self-administered or mixed modes of data collection.
4.1 Overview of Relevant Major Mode Features
There are three primary dimensions on which modes differ: whether they are interviewer- or self-administered, aural/oral or visual (or both), and computerized or not computerized. These three characteristics, individually and in concert, have implications for how respondents experience a questionnaire and thus the responses they give.
All surveys that transition from telephone to self-administered or mixed modes must transition questionnaires from an interviewer-administered to a self-administered mode, and from an aural administration to a visual administration. Some surveys that transition from telephone to self-administered or mixed modes also transition from a computerized mode to a non-computerized mode.
4.1.1 Interviewer-Administered Versus Self-Administered Modes
Telephone and in-person interviewers can take advantage of the social basis of surveys by listening and/or watching for cues that the respondent is not understanding questions and then providing clarification or by following up inadequate answers with feedback or probes (Schwarz, et al. 1991). Interviewers can also motivate the respondent to complete the survey or answer optimally through probing or other behaviors, and can order the items as presented in the questionnaire. Self-administered surveys do not have these benefits of having an interviewer in administration, clarification, motivation, or order of presentation of items.
In interviewer-administered questionnaires, “don’t know” and “refused” options can be available to respondents without being explicitly offered aloud; interviewers accept a volunteered “don’t know” or “refused” response after an initial probe is unsuccessful. In a web or paper questionnaire, where interviewer presence is not possible, offering a “don’t know” or “refused” response as an explicit response option is the only way to communicate to the respondent that the response is a valid one. Explicitly offering these nonsubstantive response options in self-administered modes often results in higher selection of them than occurs in interviewer-administered modes where they are accepted on a voluntary basis (Nicolaas and Tipping 2006; Jones, et al. 2015) or even where the options are read aloud (Klausch, Hox, and Schouten 2013). In our survey of survey organizations that transitioned from telephone to self-administered or mixed modes, 4 organizations reported having explicit don’t know responses only in the interviewer-administered mode, 6 used explicit don’t know options only in the self-administered mode, 10 had them in both modes, and 3 reported not using them at all. When nonsubstantive options are not explicitly offered in self-administered surveys, respondents can simply leave items blank, although the analyst then has no way of knowing whether the respondent didn't know the answer, didn't want to give the answer, or simply skipped the question by accident.
In general, since self-administered modes are normally more prone to item nonresponse than interviewer-administered modes (Nicolaas and Tipping 2006; Heerwegh and Loosveldt 2008; Heerwegh 2009; Klausch, Hox and Schouten 2013; Breton, et al. 2017), surveys experience slightly higher item nonresponse rates when transitioning to self-administered modes. Figure 1 shows examples of average (mean or median) item nonresponse rates before and after mode transitions for the National Household Education Surveys (NHES) and the Residential Energy Consumption Survey (RECS).
Figure 1: Item nonresponse rates by survey mode before and after transitions
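Item nonresponse rates like those summarized in Figure 1 are simple to compute from a respondent-level file. The sketch below is a minimal illustration, assuming a hypothetical data frame in which unanswered items are stored as missing values; the column names and values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical respondent-level file; the mode labels and item columns are
# illustrative, not drawn from NHES or RECS data.
df = pd.DataFrame({
    "mode": ["phone", "phone", "web", "web", "mail", "mail"],
    "q1": [1, np.nan, 2, 2, np.nan, 3],
    "q2": [4, 4, np.nan, 1, 2, np.nan],
})
items = ["q1", "q2"]

# Item nonresponse rate for each item within each mode...
item_inr = df[items].isna().groupby(df["mode"]).mean()
print(item_inr)

# ...and the mean item nonresponse rate per mode, the kind of summary shown in Figure 1.
print(item_inr.mean(axis=1))
```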
Exceptions to the trend of higher item nonresponse rates in self-administered modes may be on sensitive questions (Nicolaas and Tipping 2006; Liu 2018). However, for some sensitive questions, “don’t know” may be the more embarrassing response, resulting in fewer people selecting it (e.g., the number of times one has had sex recently, Olson, Smyth, and Ganshert 2019).
For knowledge questions, a “don’t know” response may be more accurate than a guess or may be a legitimate answer. But respondents to web surveys may be able to look up answers, and thus transitioning to a web-based mode may have unintended consequences on knowledge items. For example, in the American National Election Studies (ANES), web respondents had higher levels of political knowledge than face-to-face respondents on 10 of 13 political knowledge questions (Liu and Wang 2014; see also Chang and Krosnick 2009 and Ansolabehere and Schaffner 2014 for similar findings on political knowledge). This pattern is attributed to web respondents being able to look up answers for factual questions on the internet (Clifford and Jerit 2016). Fricker, et al. (2005) found higher levels of science knowledge for respondents who participated via web compared to those interviewed via telephone, and that it took about four more minutes for respondents to complete open-ended knowledge questions on the web compared to the telephone; here, the authors did not attribute the mode differences to information seeking on the internet. Domnich, et al. (2015) found a significant difference in health-related knowledge for timed vs. untimed administration in a web survey for items that were easily searchable on the internet, but no difference on items that were not easily searchable on the internet. A randomized controlled mode of interview experiment conducted by Gooch and Vavreck (2019) as part of pilot research for the ANES found that respondents in the web-based self-administration condition scored better on knowledge questions than those in the face-to-face interview condition. In this study, which was conducted at the CBS research facility in Las Vegas and not at respondents’ homes, only two of the 505 respondents assigned to the self-administered mode had looked up the answers. It is not clear whether these findings translate to a more general population survey. Thus, surveys with knowledge questions may see an increase in estimated knowledge when transitioning from telephone to web-based self-administered modes.
While interviewers can improve measurement, they can also have negative effects on measurement quality such as when they introduce interviewer bias (i.e., estimates are artificially low or high) or variance into measures. Interviewer bias occurs when responses are influenced by interviewer characteristics such as gender (Groves and Fultz 1985; Catania, Binson, Canchola, Pollack, and Hauck 1996) or race (Hyman, Cobb, Feldman, Hart, and Stember 1954; Hatchett and Schuman 1975; Schuman and Converse, 1971; Krysan and Couper 2003). Biased measurements can also result when the simple presence of an interviewer evokes a social norm such as social desirability (Hochstim 1967; Dillman and Tarnai 1991; Aquilino 1994; Tourangeau and Smith 1996; Tourangeau and Yan 2007; Kreuter, Presser, and Tourangeau 2008; Preisendorfer and Wolter 2014) or acquiescence (Schuman and Presser 1981; Javeline 1999) that changes how respondents answer. In contrast, self-administered surveys can be answered without others hearing the answer, including the interviewer or other household members, which helps minimize socially desirable and acquiescent responding (Schwarz, et al. 1991; de Leeuw 2005). As a result, respondents are more likely to give answers that cast them in a positive light in interviewer-administered than in self-administered modes (e.g., Hochstim 1967; Dillman and Tarnai 1991; Tourangeau and Yan 2007; Kreuter, et al. 2008). When compared to records, self-administered modes generate more accurate reporting of autobiographical sensitive information than interviewer-administered modes (Tourangeau and Yan 2007; Kreuter, et al. 2008; Preisendorfer and Wolter 2014). Additionally, several studies have found that respondents are more likely to agree with items in interviewer-administered than self-administered modes (Dillman and Tarnai 1991; Greene, Speizer, and Wiitala 2008).
Thus, surveys that transition from telephone to self-administered or mixed modes may see changes in their survey estimates (perhaps with increases in accuracy) related to socially (un)desirable issues or that are subject to acquiescence. For instance, Cernat, Couper and Ofstedal (2016) found that web respondents to the traditionally interviewer-administered Health and Retirement Study (HRS) had higher rates of endorsement of negatively rated items and lower rates of endorsement of positively rated items than interviewer-administered (telephone or face-to-face) respondents in a commonly used depression scale, even after accounting for the latent trait of depression. In a repeated cross-sectional Transgender Acceptance survey transitioned from telephone (in 2017) to web (in 2018) by Langer Research Associates, the percent of respondents reporting being comfortable with transgender people declined 10 percentage points and the percent reporting being uncomfortable increased 17 percentage points with the move from phone to web. The online survey also produced a 12 percentage point increase in reports that students should use the bathroom that matches their sex at birth (Sinozich, et al. 2019).
Additionally, surveys that measure items potentially influenced by interviewer characteristics (e.g., race- or gender-related attitudes) are likely to see changes in response distributions when moving to a self-administered mode because interviewer characteristics will not be a cue for answering. Interviewer vocal characteristics and paralinguistic cues such as interviewer speaking speed also can affect respondent perceptions of interviewers and data quality (Charoenruk 2015; Charoenruk and Olson 2018). How exactly these changes manifest, however, depends on the composition of the interviewer and respondent pool in the interviewer-administered mode. For instance, compared with face-to-face respondents, web respondents in the ANES gave cooler ratings on feeling thermometer questions about various political figures, endorsed stereotypes of Blacks and Latinos as lazy and unintelligent at higher rates, expressed more racial resentment, and gave lower feeling thermometer ratings of racial groups (Liu and Wang 2015; Abrajano and Alvarez 2019). Similarly, the Pew Research Center found that web respondents reported less satisfaction with their quality of life and were less likely to indicate that minority groups experienced “a lot” of discrimination than telephone survey respondents (Keeter, et al. 2015). Importantly, there are notable differences in these mode effects on racial attitudes across respondent racial/ethnic groups (Keeter, et al. 2015; Abrajano and Alvarez 2019).
Interviewer variance occurs when different interviewers administer questions in different ways, leading to artificially high variation in respondent answers (Groves and Magilavy 1986; Fowler and Mangione 1990). This is most likely to occur when interviewers have more need or discretion to assist respondents such as on attitude, sensitive, ambiguous, complex, and open-ended questions (Schaeffer, Dykema, and Maynard 2010; West and Blom 2017). For example, Klausch, Hox and Schouten (2013) found less random measurement error, and thus more reliable measurements, on attitudinal items administered via self-administered web or mail surveys than in interviewer-administered telephone or face-to-face surveys in a mixed-mode experiment for the Dutch Crime Victimization Survey. When examining reports of depression in a mixed-mode HRS (face-to-face, telephone, and web), however, Cernat, Couper, and Ofstedal (2016) found no differences in reliability of measurement across modes.
More research is needed to identify exactly how and what variable errors change and on what types of questions when transitioning from a telephone survey to self-administered or mixed-mode study.
4.1.2 Aural versus Visual Stimuli
Interviewer-administered surveys tend to be primarily delivered through oral communication channels. While visual cues such as body language and show cards can also be used in face-to-face surveys, interviewers and respondents have to rely entirely on aural stimuli in telephone surveys (Schwarz, et al. 1991; de Leeuw 2005). This means respondents have to hold the question and any response options in working memory while also generating a response, making such surveys more difficult from a respondent cognitive processing/working memory perspective and leading to more top-of-the-head responses (Schwarz, et al. 1991). This can be particularly difficult for respondents with lower cognitive abilities such as older respondents and those with low education (Krosnick 1991). In contrast, mail and web-based self-administered surveys are primarily visual. For these modes to work, respondents have to be literate enough to read the questions and response options without the assistance of an interviewer (although in computerized self-administered modes audio reading of questions can be offered; Couper 2005). Additionally, self-administered surveys may be preferred or even necessary for people with hearing limitations but may be problematic for those with vision limitations.
One persistent mode effect is that ordinal scale attitude/opinion items produce more extreme positive responses in interviewer-administered modes, especially telephone, than in self-administered modes (e.g., Tarnai and Dillman 1992; Krysan, et al. 1994; Christian, Dillman, and Smyth 2008; Dillman, et al. 2009; Ye, Fulton, and Tourangeau 2011). For example, Dillman, et al. (2009) found that phone respondents were about twice as likely as mail respondents to choose the extreme positive response option when asked for overall satisfaction with their long distance telephone service. In an evaluation of web versus face-to-face respondents for the ANES, Liu (2018) found higher levels of reports of “favor” on a three point scale for a series of abortion-related attitudes for face-to-face respondents than for web respondents. Keeter and his colleagues (2015) found that phone respondents were less likely to use the extreme negative rating (“very unfavorable”) compared to web respondents when rating high-profile political figures. Several explanations have been offered for this mode effect, including primacy/recency effects, social desirability, acquiescence, differential cognitive processing of information obtained orally versus visually, and reluctance to give negative evaluations to interviewers, although tests of these alternative explanations are inconclusive (Krosnick and Alwin 1987; Schwarz, Hippler, and Noelle-Neumann 1992; Dillman, et al. 1995; Ye et al. 2011; Dillman, et al. 2014). Whatever the mechanism, surveys that transition ordinal scale opinion questions from telephone to self-administered or mixed modes will likely see less extreme positive reports on these items.
In addition to the benefits of visual communication, respondents can answer self-administered surveys at their own pace and do not have the social pressure to avoid long silences that might occur for respondents in a telephone survey (Schwarz, et al. 1991). This allows respondents to read questions and response options at their own pace rather than the pace set by the interviewer and have the time needed for recall and answer formation. For example, the American Community Survey (ACS) asks respondents how much they pay in real estate taxes, information that is likely not easily recalled from memory. Seeskin (2016) found that the differences between self-reported property taxes on the ACS and administrative data were less variable for respondents to the mail questionnaire than for respondents to either the telephone or face-to-face mode, possibly because mail respondents can take the time to look up the information online or locate past statements. In a web test with highly cooperative respondents from the Panel Survey of Income Dynamics (PSID), more than half of the respondents reported using records, and those that used records had web interviews that were about 27% longer than those who did not use records, compared to 11% longer in CATI (McGonagle, et al. 2017). Thus, surveys switching from interviewer-administered to self-administered may see better quality of responses on autobiographical items that can be identified from records (for motivated respondents), although more research is needed to evaluate this hypothesis.
One advantage of web and mail modes is that they allow researchers to take advantage of visual design to more effectively communicate with respondents.
Visual self-administered surveys allow for the use of graphics such as maps, ladders, smiley faces, or thermometers to try to help respondents understand questions, something that is not possible or very difficult to implement in telephone surveys. For example, in the National Household Transportation Survey (NHTS) transition, researchers were able to capitalize on the visual and dynamic nature of the web by integrating mapping functions (using the Google Maps API) for the origin, destination, and shortest path distances of respondent reported trips (Federal Highway Administration and Westat 2018). Likewise, in the RECS, a question about the number of cooktops in the home proved problematic in the web/mail pilot because respondents misinterpreted the item as asking for the number of separate burners. After testing, this problem was resolved by adding a picture of a modern cooktop and wording the question, “An example separate cooktop is displayed above. How many separate cooktops do you have in your home? (Count the entire cooktop, not the number of burners. Do not include cooktops that are attached to an oven)” (Murphy, et al. 2015). The same survey also included images of CFL, LED, and incandescent light bulbs to help respondents accurately report how many of each type of bulb they have in their home. Both of these surveys were able to use the visual communication channel of self-administered modes to improve their data collection.
4.1.3 Computerized versus Not Computerized Instruments
When transitioning from interviewer-administered to self-administered modes, researchers often use a mix of web and mailed paper questionnaires. Although web questionnaires share many of the same features of programmed telephone or face-to-face instruments, paper questionnaires cannot accommodate many of these computer-assisted features. Computerization is important from a questionnaire design perspective because it allows automation and advanced design features.
When transitions involve the use of mail surveys, designers lose the ability to use a package of automation methods to assist respondents in that mode.
Skip patterns are ubiquitous across modes – in our survey of survey organizations that transitioned from telephone to a self-administered or mixed mode, virtually all (19 of 23) had skip patterns in both modes. A computerized questionnaire removes the responsibility for navigation from both interviewers and respondents, automatically taking people to the appropriate next question. Provided the programming is correct, automation can virtually eliminate navigation errors and, in interviewer-administered modes, considerably reduce interviewer training and workload. Computerization also can make it much easier to manage topical modules that apply to particular sub-populations.
Although a web survey can easily duplicate routing used in a telephone survey, mail surveys are much more constrained; skip patterns are limited to what can easily be conveyed to the respondent using text and graphics. In addition, respondents on the telephone cannot hear (and on the web, cannot see) what items are being skipped (or easily anticipate that their answers are triggering follow-up questions). In contrast, respondents filling out a mail questionnaire can see every item and may be discouraged from completing a lengthy-looking survey - even if many of the questions would not apply to them - or may choose answers that allow them to minimize the number of follow-up questions that they will receive (i.e., motivated misreporting). Thus, if mail surveys are going to be used at all in a self-administered or mixed-mode survey, the questionnaire may need to be simplified or abbreviated in order to avoid complex skip patterns (Berktold, et al. 2018). For example, the ACS asks questions about marital status for all mail respondents, but implements an age-based skip pattern for telephone and face-to-face respondents (US Census Bureau 2014). Skip pattern errors include errors of omission (not answering items that should have been answered) and commission (answering items that should have been skipped). Such errors may be more prevalent among those with lower levels of education or income or among youth who are less familiar with how to navigate a questionnaire (Redford and Hastedt 2011).
Possible solutions are to eliminate complex skip patterns in the paper version of the instrument, even if this means creating some repetitiveness for respondents or eliminating modules of items that are only reachable through complex skip patterns. The transition of the NHES from telephone to mail, for example, required the simplification or removal of many complex skip patterns that had been built into the CATI questionnaire. In the Parent and Family Involvement in Education component of the NHES, researchers decided it made more sense to move a set of questions about homeschooling into an entirely new topical questionnaire because leaving them in the main questionnaire would have required complicated skips and the questions only apply to about 3 percent of K-12 children (Chapman and Hagedorn 2009).
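To make concrete how computerization removes navigation responsibility from the respondent, the sketch below shows routing logic of the kind described above, patterned loosely on the ACS age-based skip; the item names and age cutoff are illustrative assumptions, not the ACS specification:

```python
def next_question(current_item, answers):
    """Toy routing rule for a computerized instrument: respondents under a
    cutoff age skip the marital status item. The item names and the age
    cutoff are illustrative, not the ACS specification."""
    if current_item == "age":
        return "marital_status" if answers.get("age", 0) >= 15 else "employment"
    if current_item == "marital_status":
        return "employment"
    return "end"

print(next_question("age", {"age": 12}))  # -> employment (skip applied)
print(next_question("age", {"age": 40}))  # -> marital_status
```

In a mail questionnaire, the same logic has to be conveyed with written instructions and arrows, which is why complex versions of it are often simplified or dropped.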
When responses to items in a set of related questions trigger follow-up questions, an additional consideration is whether to “interleaf” the stem (i.e., filter) and leaf (i.e., follow-up) items (asking the stem and leaf for each item before moving to the next stem) or to ask all stem items first, then move to the follow-up items for each endorsed stem (grouped). What works well on the phone may not work the same way on the web or on paper. Major interviewer-administered surveys such as the Behavioral Risk Factor Surveillance System (BRFSS) and the Consumer Expenditure Survey (CE) use an interleafed format (Bureau of Labor Statistics n.d.; Centers for Disease Control and Prevention n.d.), whereas other major surveys such as the National Comorbidity Survey use a grouped format (National Comorbidity Survey n.d.). Kreuter, et al. (2011) found that telephone respondents are more likely to affirmatively answer filter questions when the items are asked in a grouped format rather than an interleafed format. In the interleafed format, they learn to alter their answers to filter questions in order to avoid follow-up questions later in the interview. In a web format, Mavletova and Couper (2016) also found that, if given an option, respondents will choose a strategy that minimizes their effort. This is a concern for surveys transitioning to mail for some or all of their data collection because the grouped format requires complicated skips that are really only feasible with computerization.
In addition to navigation, computerization opens up possibilities for customizing and personalizing the questionnaire such as by using information from a previous survey (Mathiowetz and McGonagle 2000; Jackle and Callegaro 2008; Lugtig and Jackle 2014), a previous answer in the same survey, or from the sample frame to create personalized routing and/or question wording. Although very effective and widely used in interviewer-administered and web questionnaires, fills are not possible on a paper questionnaire. A mail version of a survey requires more generic item wording, or construction of a version for each fill, which greatly complicates survey production and management. For instance, in the National Survey of Children’s Health (NSCH), computerization is used for skip patterns, range checks, “pick lists,” fills, required responses for screening questions, soft edit prompts, and online help screens in the web mode. On the mail questionnaire, researchers were able to include identifying information taken from the screener about the sampled child (name, initials, or nickname; age, and sex), but were unable to use any of the other automation tools (U.S. Census Bureau 2018b). In our survey of organizations that transitioned a survey from telephone to self-administered or mixed modes, 11 studies reported having fills in both the interviewer- and self-administered modes, 3 in only the interviewer-administered modes, and 6 in neither mode.
While there is not a specific literature on this, questionnaire designers should be thoughtful in handling fill language when converting from a telephone instrument to a paper instrument.
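Because fills in computerized instruments are essentially runtime text substitution, the contrast with mail is easy to see in a small sketch; the field names and question wording below are hypothetical, not taken from the NSCH:

```python
# Hypothetical screener record and question template; the field names and
# wording are ours, not taken from the NSCH instrument.
screener = {"child_name": "Alex", "child_age": 9}

template = ("During the past 12 months, how many times did {child_name}, "
            "age {child_age}, see a doctor, nurse, or other health care provider?")

# Web or CATI version: fill at runtime from the screener data.
print(template.format(**screener))

# Mail version: no runtime fill is possible, so the wording must be generic
# or a separate version must be printed for each sampled child.
generic = ("During the past 12 months, how many times did this child see a "
           "doctor, nurse, or other health care provider?")
print(generic)
```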
Computerization also allows real-time validation of inconsistent responses within a survey or between two surveys of the same person. Because of the lack of automated consistency checks and edit prompts in mail surveys, irresolvable inconsistencies can sometimes occur. For example, the 2012 NHES data file user’s manual cites reports of children with both birth mothers and foster fathers at home and with age and grade mismatches, such as a 12 year old in 12th grade or a 17 year old in first grade (McPhee, et al. 2015). There are several ways these inconsistent reports can be dealt with such as treating them as missing, imputing new values, or, as was done in the NHES, leaving them in the data file for analysts to deal with on a case-by-case basis.
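Consistency checks of the kind that mail cannot support are typically simple cross-item rules evaluated before the respondent moves on. A minimal sketch of an age/grade soft edit follows; the plausibility window is an illustrative assumption, not the NHES editing rule:

```python
def age_grade_soft_edit(age, grade):
    """Return a soft-edit prompt if the reported age and grade look
    inconsistent. The plausibility window (grade + 4 to grade + 7 years) is
    an illustrative assumption, not the NHES editing rule."""
    if age not in range(grade + 4, grade + 8):
        return (f"You reported a {age}-year-old in grade {grade}. "
                "Please review these answers or confirm that they are correct.")
    return None  # answers pass the check; no prompt shown

print(age_grade_soft_edit(12, 12))  # flags the 12-year-old-in-12th-grade example
print(age_grade_soft_edit(17, 12))  # passes
```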
Surveys that transition to a computerized self-administered mode can take advantage of dynamic question formats in web surveys such as drag-and-drop questions where respondents can move items around to keep track of order (Blasius 2012), slider (or visual analogue) scales (Couper, Conrad, and Singer 2006; Funke, Reips, and Thomas 2011), and automatic calculation tools that keep running totals of numeric responses (Conrad, Couper, Tourangeau, and Galesic 2005). For example, as noted above, the web-based 2017 NHTS used the Google Maps API to calculate the distance traveled for each trip rather than self-reported distance traveled (Federal Highway Administration and Westat 2018).
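As a generic illustration of the kind of dynamic calculation a web instrument can perform, the sketch below retrieves a route distance between a reported origin and destination from the Google Maps Directions API. This is not a description of the NHTS implementation, and the key handling and error treatment are placeholders:

```python
import requests  # assumes the third-party requests package is installed

def route_distance_meters(origin, destination, api_key):
    """Return the driving distance (in meters) of the recommended route
    between two respondent-reported locations, using the Google Maps
    Directions API. Illustrative only: production use would need input
    validation, retries, and quota handling."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/directions/json",
        params={"origin": origin, "destination": destination, "key": api_key},
        timeout=10,
    )
    data = resp.json()
    if data.get("status") != "OK":
        return None  # e.g., unroutable or unparseable respondent entries
    return sum(leg["distance"]["value"] for leg in data["routes"][0]["legs"])
```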
Another benefit of computerization is that computerized assistive programs can allow those with vision impairments to still complete self-administered web surveys. This is important given Section 508 requirements that federal agencies ensure accessibility of their surveys for persons with disabilities (https://www.section508.gov).
Surveys that transition to a computerized self-administered web mode may experience lower item nonresponse rates than those that transition to a mail survey alone. Previous research shows that mail surveys often have higher item nonresponse rates than web surveys (Israel and Lamm 2012; Lesser, Newton, and Yang 2012; Messer, Edwards, and Dillman 2012; Millar and Dillman 2012; Marken, Auter, and Marlar 2018). In web surveys, respondents can be prompted to give a response if they leave an item blank, a practice that has been shown to reduce item nonresponse (DeRouvray and Couper 2002; Al Baghal and Lynn 2015). Moreover, at least one study has shown that such prompting, when done immediately, can reduce item nonresponse to the same level as in face-to-face interviews (Al Baghal and Lynn 2015).
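The immediate prompting studied by DeRouvray and Couper (2002) and Al Baghal and Lynn (2015) amounts to a one-time, non-blocking reminder when an item is left blank. A minimal sketch of that logic follows; the prompt wording and the single-reminder rule are our assumptions rather than any survey's specification:

```python
def handle_submission(answer, already_prompted):
    """Toy soft-prompt logic: if the item is blank and the respondent has not
    yet been reminded, show a polite prompt once; otherwise accept the blank.
    The wording and the one-reminder limit are illustrative assumptions."""
    if answer in (None, "") and not already_prompted:
        return ("prompt", "You did not answer this question. If you prefer not "
                          "to answer, you may continue to the next question.")
    return ("accept", answer)

print(handle_submission("", already_prompted=False))  # shows the one-time reminder
print(handle_submission("", already_prompted=True))   # accepts the blank answer
```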
4.2 Device Differences within Web Modes
In addition to mode features, surveys that transition modes must consider the different types of devices used to answer within both telephone and web modes. These devices can complicate questionnaire design during the transition and affect the resulting measurement error and data quality. Smartphones are sometimes used to answer self-administered web surveys and provide a potentially different stimulus to the respondent than a desktop or laptop computer. Estimates of the percent of web completes on mobile phones vary based on the survey topic and target population, but can be as high as 40% or more (questionpro.com n.d.). In addition, some respondents will answer web surveys on other mobile devices like tablets. The prevalence of mobile device usage in web surveys has led some to argue that all web surveys are mixed-device surveys (Toepoel and Lugtig 2015). The prospect of mixed-device web surveys involving small mobile screens has implications for questionnaire design for any survey transitioning to self-administered or mixed modes that contain a web component.
Although many surveys report collecting paradata on the device used to complete the survey (e.g., McPhee, et al. 2018), few of the studies that transitioned from telephone to a web mode contain indicators in public use data files for whether the survey was completed on a desktop or laptop, mobile phone, or mobile tablet. The PSID Well Being and Daily Life Supplement is an exception. It was collected via mail, web, and telephone, and the public use data file contains an indicator for both mode of interview and, for the web respondents, device used to complete the questionnaire, including whether the survey was logged into multiple times and the devices used for each log in (Freedman 2017). Furthermore, few of the studies that transitioned to self-administered modes that included a web component reported how the design changed on mobile devices and whether data quality differed for those who completed the questionnaire on a mobile device. Additionally, few studies that we reviewed provided screenshots of any part of the web instrument overall or the mobile instrument in particular.
As studies transition to self-administered modes that contain web, planning for questionnaire display and response on mobile devices in addition to that for desktop web instruments is critical. Screenshots of the questionnaire on both web and mobile devices should be captured and reported as part of methodology reports to allow data users to understand differences in questionnaire format and design across devices, and how these differences may have affected measurement quality.
A few surveys report overviews of how a survey was optimized for mobile use. For instance, the ACS, in developing a mobile device version of the questionnaire, conducted multiple rounds of usability testing and provided screenshots of a few items from that testing. This testing included evaluating a “mobile optimized” browser version, which removed a sidebar that appears in the desktop web version to help with navigation, reduced banner width, used larger text and less white space, and increased the padding between response options. Even with these “optimized” features, usability testing identified several design features where the “optimized” mobile instrument still failed to be usable (Olmsted-Hawala, Nichols, and Myers 2017). The National Survey of College Graduates (NSCG) had 4% of respondents complete the 2015 administration on a mobile device, and also increased font size and padding around response options in the mobile mode (National Academies of Sciences, Engineering, and Medicine 2018).
Empirical literature examining measurement and data quality differences across web devices is growing. Simply fitting the questions on the screen becomes a challenge (Peytchev and Hill 2010) as does minimizing design differences across devices and across modes (Smyth, et al. 2018). The two most consistent device differences reported within this literature are that mobile respondents (especially smartphone respondents) break off at higher rates and take more time to respond than computer respondents (e.g., Mavletova 2013; Buskirk and Andrus 2014; Keusch and Yan 2016; Lambert and Miller 2015; Antoun, et al. 2017b).
Otherwise, there are few consistent measurement or data quality differences across respondents who answer via desktops/laptops and mobile devices. For example, response distributions tend not to differ across devices provided questions are asked in the same way (e.g., Baker-Prewitt and Miller 2013; de Bruijne and Wijnant 2013; Wells, Bailey, and Link 2014; Keusch and Yan 2016), including for socially desirable questions (Mavletova 2013; Mavletova and Couper 2013; Toninelli and Revilla 2016; Antoun, et al. 2017a). However, some mobile-specific question formats can be more problematic than the corresponding formats used for computers, for example, date pickers on mobile devices compared to month, day, and year drop down boxes on computers (Antoun, et al. 2017). Likewise, reliability and validity of answers have not been found to differ across devices (Sommer, et al. 2016; Tourangeau, et al. 2017; Mavletova, Couper, and Lebedev 2018; Grady, Greenspan, and Liu 2019).
Differences in item nonresponse rates across devices are generally equivocal. Most studies report no difference in item nonresponse rates or the use of nonsubstantive response options (i.e., don’t know, prefer not to answer, etc.) across devices (Mavletova 2013; Buskirk and Andrus 2014; Andreadis 2015; Revilla and Couper 2017; Schlosser and Mays 2017; Toepoel and Lugtig 2014; Tourangeau, et al. 2017; Olson, Smyth, and Phillips 2018). A few studies find small, but statistically significant item nonresponse rate differences, although not always in the same direction (Guidry 2012; Keusch and Yan 2016). Most studies report no differences in the length of responses to open-ended questions (e.g., Buskirk and Andrus 2014; Toepoel and Lugtig 2014; Schlosser and Mays 2017) or longer response on computers (Mavletova 2013; Peterson, et al. 2013; Wells, et al. 2014; Revilla and Ochoa 2016), but one study has reported longer responses on mobile devices (Antoun, et al. 2017a).
Nondifferentiation in battery items depends on both the device used and the format in which the battery is displayed – that is, displayed in a grid versus item-by-item – although there is little replication in which device and display have the highest nondifferentiation rates. Some studies find no difference in nondifferentiation rates across devices (Antoun, et al. 2017a; Revilla and Couper 2017; Tourangeau, et al. 2017; Liu and Cernat 2018; Mavletova, Couper and Lebedev 2018; Olson, et al. 2018; Grady, Greenspan, and Liu 2019). Still other studies find the highest nondifferentiation rates in smartphone grid formats (Baker-Prewitt and Miller 2013; Struminskaya, Weyandt, and Bosnjak 2015; Stern, Sterrett, and Bilgen 2016), and others find the highest nondifferentiation rates in computer grid formats (Peterson, et al. 2013; Lugtig and Toepoel 2016; Richards, et al. 2016). More work is needed here to understand how question content, number of response options (e.g., Liu and Cernat 2018; Grady, Greenspan, and Liu 2019), type of scale used, and number of items (Mavletova, Couper, and Lebedev 2018; Grady, Greenspan, and Liu 2019) displayed in the battery affect answering across devices.
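Nondifferentiation is usually summarized with simple indices, such as the share of respondents who give an identical answer to every battery item or the within-respondent standard deviation of answers. A minimal sketch of both, assuming a hypothetical respondent-by-item data frame with a device indicator:

```python
import pandas as pd

# Hypothetical battery responses with a device indicator; all names are ours.
df = pd.DataFrame({
    "device": ["desktop", "desktop", "smartphone", "smartphone"],
    "b1": [3, 2, 4, 4],
    "b2": [3, 5, 4, 2],
    "b3": [3, 1, 4, 5],
})
battery = ["b1", "b2", "b3"]

# Straightlining: identical answers to every item in the battery.
df["straightlined"] = df[battery].nunique(axis=1).eq(1)
# A continuous alternative: the within-respondent standard deviation
# (lower values indicate less differentiation).
df["within_sd"] = df[battery].std(axis=1)

print(df.groupby("device")[["straightlined", "within_sd"]].mean())
```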
In sum, despite much worry about device differences for web surveys, the empirical literature to date has not shown large or consistent differences in the quality of data obtained from mobile versus computer respondents. It must be noted that much of this literature is based on studies using volunteer panels and limited to people who have both computers and mobile devices (i.e., excluding those with only one or the other device). Additionally, few studies compare answers from mobile web respondents to responses via a mail survey. As such, general population studies that transition from a telephone mode to a web mode or a mixed-mode study that includes both web and mail may see different response patterns and data quality issues than those found in prior work. To help future research on data differences across devices, collecting information about the device used to complete the survey through paradata or respondent reports and including this information on public release files will facilitate understanding of how device of completion affects measurement quality.
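Where web paradata include the browser user-agent string, a rough device indicator of the kind recommended here can be derived with simple keyword matching; the rules below are a coarse illustration, and purpose-built user-agent parsers will classify devices more reliably:

```python
def classify_device(user_agent):
    """Crude classification of a user-agent string captured as paradata into
    desktop/laptop, smartphone, or tablet. The keyword rules are a rough
    illustration; dedicated user-agent parsing libraries are more reliable."""
    ua = user_agent.lower()
    if "ipad" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "iphone" in ua or "mobile" in ua or "android" in ua:
        return "smartphone"
    return "desktop/laptop"

print(classify_device("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) Mobile/15E148"))
print(classify_device("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```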
While they can pose challenges from a questionnaire design perspective, mobile devices create new measurement opportunities as described in the AAPOR Task Force Report on Mobile Technologies (Link, et al. 2014). These opportunities include the ability to collect location and activity data (e.g., Krenn, et al. 2011; Mavoa, et al. 2011; Wagner, Olson, and Edgar 2017), social network data (e.g., Boonstra, Larsen, and Christensen 2015), photos and videos (Gotschi, Delve, and Freyer 2009), and physical measurements (e.g., Gregoski, et al. 2013; Link 2013). In addition, data collection apps allow for more frequent and timely reporting, reducing recall bias, such as in time-use surveys (Lai, et al. 2010; Link, et al. 2014). For example, the National Oceanic and Atmospheric Administration has developed apps to collect angler reports of catches for specific types of fish and for fishing vessel operators to submit Vessel Trip Reports (National Oceanic and Atmospheric Administration n.d.).
4.3 Additional Questions That Are Particularly Hard to Transition
Other questions may pose particular challenges when transitioning from telephone to self-administered or mixed-mode surveys. In our convenience sample survey, organizations were asked about the questionnaire design features of studies in both the original mode and in the transitioned mode. A few common types of questions were present in most of the responding studies before and after the transition. For example, 16 respondents reported having open-ended questions both before and after the transition. These questionnaire design features are summarized in Table 4.1.
Table 4.1 Questionnaire features of transitioned surveys
Feature | In interviewer-administered only | In self-administered only | In both | In neither
--- | --- | --- | --- | ---
Skip patterns | 0 | 2 | 19 | 2
Matrix/grid Qs | 2 | 6 | 10 | 5
Mark all that apply Qs | 2 | 4 | 15 | 2
Interviewer-coded questions | 9 | 2 | 5 | 6
Optional instructions | 10 | 2 | 9 | 2
Fills | 3 | 0 | 11 | 6
Explicit DK options | 4 | 6 | 10 | 3
Open-ended Qs | 1 | 4 | 16 | 2
Definitions | 3 | 2 | 15 | 3
Long questionnaire | 5 | 4 | 9 | 5
Multiple languages | 1 | 1 | 11 | 10
Sensitive subject matter | 1 | 2 | 10 | 10
Note: Cell entries are the number of respondents indicating the presence of each feature.
Source: AAPOR Mixed Mode Task Force survey of organizations that have transitioned a survey across modes
4.3.1 Numeric Reports and Complex Recall
In interviewer-administered surveys, numeric data such as age, number of children, number of adults currently living in the household, dates, measurements (e.g., height or weight), amounts, or expenditures are relatively straightforward to collect (although are not necessarily easy to answer!). Speakers often clearly state units, which help interviewers be clear about the format of an answer (e.g., “twenty five dollars and sixteen cents”, “five feet, six inches”), and the verbal interaction allows interviewers to verify responses or follow up unclear responses. Interviewers also accurately enter numeric responses into the telephone or face-to-face instrument (Smyth and Olson Forthcoming).
Surveys that transition from telephone to self-administered surveys should carefully consider how to ask about numeric values, especially when web-based mobile devices or paper mail surveys are included. In a web-based survey, some of these types of items can be collected via drop-down question formats provided by the researcher. In mail surveys and for questions where the drop-down format is impractical in web surveys, well-designed answer boxes and good verbal instructions have been shown to drastically improve open-ended numeric reports (Couper, Traugott, and Lamias 2001; Christian, Dillman, and Smyth 2007a; Fuchs 2009a; Fuchs 2009b; Couper, Kennedy, Conrad, and Tourangeau 2011; Dillman, et al. 2014). Moreover, in web surveys, edit checks, placeholder examples within a box, and error messages can be used successfully to prompt respondents to provide properly formatted answers (Christian, et al. 2007a). However, even with these tools, open-ended numeric questions can be problematic on mobile devices because on-screen keyboards may take up considerable screen space, or the size of the number box is reduced, making it difficult for the respondent to see what they entered. For instance, in examining usability properties of a mobile instrument for the American Community Survey, Olmsted-Hawala, Nichols, Holland and Gareau (2017) found that respondents had difficulty seeing numbers that were entered (thus misentering the number of zeros) and often missed a “.00” in the cents area of a number box.
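Web-side edit checks for numeric items typically combine a format check with an error message (and often a placeholder example in the answer box). A minimal sketch for a dollar-amount item follows; the accepted formats and message wording are illustrative assumptions, not any survey's actual edit specification:

```python
import re

def validate_dollar_amount(raw):
    """Accept entries such as '25', '25.16', '$1,200', or '1200.00' and return
    (value, None); otherwise return (None, error_message). The accepted formats
    and message wording are illustrative, not any survey's actual edit rules."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if re.fullmatch(r"\d+(\.\d{1,2})?", cleaned):
        return float(cleaned), None
    return None, "Please enter a dollar amount using numbers only, for example 25.16."

print(validate_dollar_amount("$1,200"))    # -> (1200.0, None)
print(validate_dollar_amount("about 25"))  # -> (None, error message)
```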
No automated tools like validation checks or error messages are available in a mail survey, making administration of open-ended numeric items more complicated, especially for items for which there are multiple conventional ways of formatting (e.g., dates, monetary values, telephone numbers, etc.), items that can be reported in different units (e.g., income, height, etc.), or items for which it is unclear whether whole numbers are sufficient or decimal values are needed. Even with well-designed input boxes, respondents sometimes write in answers that are too vague or do not make sense to researchers, such as, for example, writing cents into dollar boxes or providing expense amounts that seem too high such that it is not clear if they are correct or if they are missing a decimal point (Breidt, et al. 2018; U.S. Census Bureau 2018b). For instance, one of the data editing steps for the mail-based Health Information National Trends Survey (HINTS) (Westat 2018) is to account for people reporting height in feet and inches (with two boxes provided) in the wrong box or with the wrong units (e.g., centimeters rather than feet). Additionally, the self-administered 2016 NSCH methodology report lists write-in items such as age, birth weight, BMI, year entered the U.S., and similar items as having higher item nonresponse rates than many other questions in the survey (6% missing or higher) (U.S. Census Bureau 2018b). In the telephone administration of these same questions, the item nonresponse rates were less than 5% (BMI) and generally much lower (2011/12 National Survey of Children’s Health 2013).
In some cases, respondents may write illegibly or they may write outside of the answer box. Both of these errors make data entry more difficult and may inhibit the use of scanning. Even when the data are provided correctly, scanning vendors may charge for scanning open-ended responses at a higher rate than closed-ended responses, which can increase the total cost of collecting these data. Finally, respondents may be less likely to enter a numeric response than they are to check a box (Olson, Smyth, Phillips, and Stenger 2019).
The challenges of asking open numeric questions in mail surveys are exacerbated when the items are particularly complex, such as those that require respondents to consult multiple data sources or to do complicated calculations. Thus, in surveys that transition from interviewer-administered to self-administered modes and include mail, it may be necessary to limit such requests, break them down into smaller and simpler pieces, and/or provide considerable context to help respondents understand the task and complete it accurately (Redline 2011; 2013). An example of a complex numeric question that likely suffers from increased measurement error after the survey mode was transitioned from in-person to mixed modes (in-person, web and mail) is a question ascertaining the square footage of housing units in the RECS (Amaya, Biemer, and Kinyon 2018; Murphy, Biemer, and Berry 2018). The definition of what counts in this measure is rather complex, with some spaces (attics and garages) only being counted in specific circumstances (attics if they are heated, cooled, or finished and garages if they are heated or cooled AND directly attached to the housing unit) (U.S. Energy Information Administration 2017). In computer-assisted personal interviews (CAPI), interviewers could ensure that respondents understood what counted and did not count toward housing unit square footage for their particular home and could help respondents estimate it using the official definition. Moreover, in previous CAPI administrations and in the CAPI portion of the 2015 RECS (42.5% of completes), square footage was collected two ways – by respondent self-report and by interviewer-taken measurements (with respondent consent) (Amaya, et al. 2018; U.S. Energy Information Administration 2017). Comparisons of CAPI self-reports and interviewer measurements from both the 2009 and 2015 RECS indicate that respondents consistently underestimate the square footage of their homes by about 400 square feet, with even larger underestimates for single family detached homes, even when interviewers were present to help them understand the instructions (U.S. Energy Information Administration 2017). During the transition, based on these findings and the danger that bias would be even larger in self-administered modes, researchers opted to report interviewer measures of square footage for CAPI respondents but to use imputed measures rather than self-reports for web and mail respondents.
4.3.2 Multiple Answer Questions
In telephone surveys, multiple answer questions are often asked in a forced-choice (yes/no) format, allowing interviewers to administer the items one at a time and eliminating the need for respondents to remember all of the items at once. When converting to self-administered modes, these items can be converted to a “mark-all-that-apply” or “check-all-that-apply” format rather than the forced-choice format. The check-all format is thought to be easier for respondents to read and answer, and therefore to reduce burden. In our survey of organizations that transitioned a survey from telephone to self-administered or mixed modes, 15 respondents reported having check-all-that-apply questions both before and after the transition, 4 had them only in the self-administered mode, 2 had them only in the interviewer-administered mode, and only 2 did not use them at all.
However, a growing body of research indicates that check-all-that-apply formats are subject to shallower cognitive processing and more satisficing. The forced-choice question format tends to result in endorsement of significantly more response options than does the check-all question format because the items are processed more deeply (i.e., more optimal response behavior) in the forced-choice format (Lau and Kennedy 2019; Smyth, et al. 2006; Thomas and Klein 2006). This phenomenon holds both across modes and within modes (Smyth, Christian and Dillman 2008). As such, multiple answer questions should be asked in a forced-choice format when transitioning to self-administered or mixed modes of data collection.
One potential problem that has been identified with the forced-choice format is the phenomenon of respondents marking answers only in the affirmative column and leaving the negative column blank, essentially treating the item as a check-all question. When this happens, it is unclear if the missing items were overlooked (i.e., truly missing) or intended to be “not affirmative” responses (note that this confusion about missing items is always the case in the check-all format - Rasinski, Mingay, and Bradburn 1994). For example, the 2016 NSCH methodology report indicates that the items with the highest missing data rates were forced-choice items about reasons needed health care was not received, sources of health insurance, and reasons for not having health insurance, and that the primary reason for the missing data was respondents only using the affirmative response option (i.e., treating the item as a check-all) (U.S. Census Bureau 2018b). Review of the identified questions, however, reveals that while the response options were formatted as forced choice, the question stems were written using check-all wording, a combination that previous research suggests can increase the likelihood of people treating forced-choice items as check-all items (Dillman, Smyth, and Christian 2014, but also see Smyth and Olson 2019). For example, the question stem for the item about reasons a child was not covered by health insurance in the 2016 NSCH 12-17 year old questionnaire is, “Indicate whether any of the following is a reason this child was not covered by health insurance DURING THE PAST 12 MONTHS:” (Data Resource Center for Child and Adolescent Health n.d.). This question stem emphasizes the affirmative response option (“is a reason”), ignores the negative response option, and reinforces the impression that a check-all answering strategy is needed with the words, “any of the following”. A forced-choice equivalent of this question is, “Indicate whether or not each of the following is a reason this child was not covered by health insurance DURING THE PAST 12 MONTHS”. Unlike the check-all wording, this wording emphasizes the need for an affirmative or negative response (“whether or not”) for every item (“each of the following”) and thus should reduce the incidence of respondents treating it like a check-all question. The HINTS also notes that editing is needed for a number of forced-choice format questions that are presented in a grid with “yes/no” responses (Westat 2018). The items where this editing is reported also use the “any of the following” wording or fail to include directive wording at all (e.g., “was there a time when you…”). Thus, surveys that transition to a self-administered mode should be aware of the independent potential influence of the question wording along with the response option format for multiple answer questions and ensure that the question wording reinforces the response task dictated by the response option format.
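Detecting the affirmative-only answering pattern described above is usually a simple editing pass over the yes/no grid. The sketch below flags such cases; how the flagged blanks are then treated (recoded to 'no', left missing, or imputed) is a survey-specific decision, and this is not the HINTS or NSCH editing rule:

```python
import numpy as np
import pandas as pd

# Hypothetical yes/no grid: 1 = yes, 0 = no, NaN = left blank. Item names are ours.
items = ["reason_cost", "reason_no_need", "reason_coverage_lapsed"]
df = pd.DataFrame({
    "reason_cost":            [1, 1, np.nan],
    "reason_no_need":         [np.nan, 0, np.nan],
    "reason_coverage_lapsed": [np.nan, 0, np.nan],
})

# Flag respondents who marked at least one "yes", marked no explicit "no", and
# left something blank, i.e., apparently treated the forced-choice grid as a
# check-all item. The all-blank row is not flagged; it is ordinary item nonresponse.
any_yes = df[items].eq(1).any(axis=1)
no_explicit_no = ~df[items].eq(0).any(axis=1)
some_blank = df[items].isna().any(axis=1)
df["affirmative_only_pattern"] = any_yes & no_explicit_no & some_blank
print(df)
```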
While the forced-choice format (with forced-choice wording) is generally recommended over the check-all format in both interviewer- and self-administered modes, a check-all approach may still be more appropriate for a limited set of items in a self-administered questionnaire. For example, when questions ask for factual information (e.g., race/ethnicity, language spoken at home, etc.) so that the primary response task is simply searching the list for the appropriate items (i.e., respondents do not need to read and consider every item), a check-all format is appropriate and less tedious than a forced-choice format. However, for check-all items, it is helpful to add a "none of these" option. Otherwise, it becomes difficult to interpret these types of items when they are left blank.
4.3.3 Open-Ended Narrative and Field-Coded Questions
In interviewer-administered questionnaires, asking open-ended narrative (i.e., non-numeric) questions can be an effective means of gathering information without potentially influencing respondents by presenting predefined response options or for questions where a predefined list is unavailable. Sometimes, interviewers are asked to record the text of the response verbatim. For other items, “field coded” questions are used, in which respondents provide open-ended answers and the interviewer immediately interprets the response and categorizes it into one of the existing response options (although recording these accurately can be quite difficult; Smyth and Olson forthcoming). For example, the PSID uses this approach to categorize occupation by collecting information about the respondent’s major activities and duties, and then following up with additional questions and probes to ensure the responses are sufficiently detailed (McGonagle, et al. 2012). In our survey of organizations that transitioned, 16 used open-ended questions and 5 reported using field-coded questions in both modes (although how this was operationalized is unclear), with nine organizations using field-coded questions only in the interviewer-administered mode.
Open-ended items are more challenging in self-administered questionnaires for several reasons. First, without an interviewer present, there is no way to ask follow-up questions to clarify a respondent's answer (which may be unusable in its initial form) or to redirect the respondent if they do not provide an answer to the specific question asked (McGonagle, et al. 2017). Second, because there is no interviewer to code open-ended responses into predefined categories, a great deal of costly data cleaning is required after data collection. Third, respondents often write very brief responses or skip the open-ended item altogether. Finally, there is no interviewer to encourage a response to sensitive questions or questions that raise privacy concerns. In fact, the lost ability to probe open-ended questions was highlighted as one of the problems experienced when transitioning to self-administered or mixed modes of data collection in our survey of organizations that transitioned.
Field-coded questions cannot be used in mail-based self-administered surveys, so the response options need to be converted into multiple open-ended questions that probe on relevant areas or into questions with closed-ended response options that will make sense to survey respondents and allow them to easily map their responses. In a web or mail administration, lists of categories can be administered via a series of closed-ended items that successively narrows the set of appropriate categories via skip patterns or a combination of open- and closed-ended questions (e.g., field of study in the (discontinued) National Survey of Recent College Graduates, Pierzchala, Wright, Wilson, and Guerino 2004; Tijdens 2014, 2015). For example, in the PSID, this was accomplished by maintaining three open-ended questions about the respondent’s major activities and duties (rather than converting these to closed-ended questions). While responses to the web instrument contained roughly a third as many characters as responses to the interviewer-administered instrument, levels of agreement of the occupational coding were high (McGonagle, et al. 2017).
One exception where retaining the open format is necessary is when researchers do not want to unduly influence respondents. For instance, if researchers want to measure what a respondent remembers hearing in the news yesterday, it may be better to ask an open-ended question. Another exception would be for questions in which no predefined response can be populated, such as re-contact information (i.e., email addresses or phone numbers). In cases when an open-ended question is the only available option, researchers should use visual design and motivational instructions in self-administered surveys to improve reports (Dillman, et al. 2014; Smyth, et al. 2009). Moreover, it has been shown that structured probes that can be anticipated ahead of time can improve responses to open-ended questions in web surveys (Holland and Christian 2007; Oudejans and Christian 2011). However, in general, researchers who transition from telephone to self-administered surveys should anticipate lower item response rates to open-ended questions and additional processing costs to retrieve information provided by respondents.
4.3.4 Matrix or Grid Questions
Battery items, or individual items that share the same question stem and response options, are often converted to a grid or matrix format for self-administered questionnaires. Grids are an efficient visual format because they don't require respondents to read the same text (particularly the response options) over and over. They also take up less space in a paper questionnaire than writing each item out separately, which can reduce perceptions of questionnaire length and burden and save costs. In our survey of surveys that transitioned from telephone to self-administered or mixed modes, matrix or grid questions were used by most organizations, but six appear to have added them only after the transition to self-administered modes. When asked to elaborate on questionnaire elements that were particularly problematic in the transition, respondents mentioned concerns about administering grid items on the web because of people responding on mobile devices.
Several studies point to a reduction in completion time when grids are used, rather than single items (Couper, Traugott and Lamias 2001; Tourangeau, Couper and Conrad 2004; Callegaro, Shand-Lubbers and Dennis 2009; Toepoel, Das, and van Soest 2009). However, other studies, particularly of mobile devices, find that grids take longer to fill out than other types of questions (Couper and Peterson 2017) and can be particularly problematic for mobile device users (e.g., de Bruijne and Wijnant 2013; McClain and Crawford 2013; Peterson, et al. 2013). In a meta-analysis of break-off rates among mobile web respondents, for example, Mavletova and Couper (2015) found that complex grids increased the odds of the respondent breaking off the survey. However, as noted above, rates of straightlining or nondifferentiation are not consistently higher in a grid format compared to an item-by-item format across devices.
Given these higher breakoff rates, many recommend limiting the use of grids (Dillman, Smyth and Christian 2009)
or finding ways to improve their design in order to mitigate their negative effects (Tourangeau, Conrad, and Couper 2013).
4.4 Questionnaire Features That Are Hard to Transition
In addition to types of questions that are difficult to transition, there are more general survey features that can be challenging during mode transitions, such as optional instructions, definitions, questionnaire length, and multiple languages. In our survey of organizations that transitioned surveys, 19 organizations reported having skip patterns both before and after the transition, and no organizations reported eliminating skip patterns in the transition (Table 1 above). Fifteen organizations reported having definitions at both time periods, three reported having definitions only in the interviewer-administered mode, and two reported having definitions only in the self-administered mode. While most organizations reported using optional instructions (i.e., “if needed” messages), 10 had them only in the interviewer-administered version, not the self-administered version of the questionnaire; nine organizations had them in both. Nine studies reported that their questionnaires were longer than 20 minutes in both interviewer- and self-administered versions, while five reported this was the case only in the interviewer-administered version and four reported it was the case only in the self-administered version. In open-ended comments, respondents mentioned that transitioning interviewer-coded “don’t know” responses led to difficult choices about the provision of explicit “don’t know” options in the self-administered versions and that skip patterns had to be simplified in several of the studies. We turn next to the difficulties of transitioning these types of questionnaire features.
4.4.1 “If Needed” Information
Interviewer-administered questionnaires often include "if needed" information that is provided only to those for whom it applies. Sometimes the interviewer has discretion over when such information is provided, such as with some definitions, clarifications, and instructions. Other times automation can be used to provide or not provide the information based on previous answers, which allows some “if needed” information to be used in web surveys.
For self-administered questionnaires, researchers need to decide whether "if needed" information should be included or not, knowing that in some cases, including the information means it will be there for everyone. The use of italics, parentheticals, bolding, and other visual techniques can help differentiate this optional information from the main question text (Redline, et al. 2003; Christian and Dillman 2004; Tourangeau, Couper and Conrad 2004, Dillman and Christian 2005).
For example, one question in the RECS asks how many full bathrooms are in the home. A complication with this question is that, depending on the type of housing unit, some respondents may need to be reminded to think about spaces they commonly overlook, like finished attics or finished basements, while others do not need this reminder. In the interviewer-administered version of the RECS, the instruction to “Include bathrooms in finished attics or finished basements” is automatically added to or excluded from the question depending on previously established housing type, so that only those to whom the instruction applies are exposed to it. In the mail version of the questionnaire, this instruction is visible to everyone, regardless of housing type (U.S. Energy Information Administration n.d.).
4.4.2 Definitions
In an interviewer-administered survey, definitions can be read to all respondents or provided only to those respondents who exhibit signs of difficulty. In self-administered surveys, respondents themselves decide what information to read from definitions and when to read it. Thus,
an important and common decision when transitioning a survey from an interviewer-administered method to a self-administered method is where and how to display definitions. Although common practice in some self-administered surveys is to include definitions at the beginning of the survey in a call-out box or glossary format, respondents are more likely to read and use definitions if they are strategically placed to facilitate use when/where they are needed in the instrument, for example at the actual survey item to which they pertain (Christian and Dillman 2004). For some items, it may be more effective to place the definition before the question stem than after it (Redline 2013).
We could not find a systematic evaluation of how surveys that transitioned from telephone to self-administered modes addressed the placement of definitions and how that varied across modes.
Some surveys that transition from telephone to mixed modes or self-administered surveys may strategically change the placement of definitions in a question. For example, the telephone-based 2007 NSCH directly incorporated a definition of “specialists” into the first part of a question prompt, to be read to all respondents: “Specialists are doctors like surgeons, heart doctors, allergy doctors, skin doctors, and others who specialize in one area of health care. [During the past 12 months/Since [his/her] birth], did [Sampled Child] see a specialist [IF K4Q22 = 1, THEN INSERT: other than a mental health professional]?” In the 2017 self-administered version, this definition came in italics after the main question: “DURING THE PAST 12 MONTHS, did this child see a specialist other than a mental health professional? Specialists are doctors like surgeons, heart doctors, allergy doctors, skin doctors, and others who specialize in one area of health care” (US Census Bureau 2018). In addition, the 2015 ACS provided interviewers with a definition of who is included in the household, and interviewers asked a general question about whether there were people in the household who met that definition. In contrast, the mailed ACS survey includes a definition of household membership on the front cover of the questionnaire, whereas the web mode turns those definitions into individual questions answered by the respondent (Clark 2017).
In web surveys, definitions can be presented in several different ways, including display on every screen, a clickable reference, or a rollover feature whereby respondents can roll over a term to see its definition. For example, in the web version of the ACS, definitions for the household roster and residence rules (“help text”) require clicking on specific help links at the top of the screen (Clark 2017). An analysis of the paradata for the web version of the ACS indicates that fewer than 3% of respondents accessed any of the definitions during the household roster, and generally fewer than 1.5% of respondents accessed them at any point (Clark 2017). Definitions are more likely to be attended to if they are easier to access (Peytchev, et al. 2006; Galesic, et al. 2008). For example, Conrad, et al. (2006) found that few web survey respondents (about one in six) accessed definitions at all, and the more effort it required to get the definitions, the less likely respondents were to consult them. Fewer respondents opened definitions when it took two mouse clicks to access them than when it took just one. Those respondents who did obtain definitions might not have attended to the details of the definitions (Tourangeau, et al. 2006). Thus,
even with computerization available, definitions should be placed where they are needed and should be immediately available, with no user action required to access them.
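To illustrate the kind of paradata analysis described above, the sketch below tallies how many web respondents ever opened a definition. It is not drawn from the ACS systems; the event-log format and field names (case_id, event, item) are assumptions for illustration only.

```python
# Hypothetical sketch: share of web respondents who opened any definition,
# computed from click-event paradata. Field names are illustrative assumptions.
import csv
from collections import defaultdict

def definition_access_rates(paradata_path, respondent_ids):
    """Return (overall share, per-item shares) of respondents with a 'help_click' event."""
    rset = set(respondent_ids)
    opened_any = set()
    opened_by_item = defaultdict(set)
    with open(paradata_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["event"] == "help_click" and row["case_id"] in rset:
                opened_any.add(row["case_id"])
                opened_by_item[row["item"]].add(row["case_id"])
    n = len(rset)
    overall = len(opened_any) / n if n else 0.0
    by_item = {item: len(cases) / n for item, cases in opened_by_item.items()}
    return overall, by_item
```

Output like this can be monitored during fielding to flag definitions that are rarely consulted and may need to be moved closer to the question text.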
4.4.3 Long Questionnaires
Long questionnaires can be difficult to transition to a self-administered mode. In a lengthy telephone-administered survey, interviewers can be quite effective at encouraging participation, particularly when respondents show signs of fatigue or frustration with the survey length. In a self-administered mode, respondents can complete the survey when it is convenient, in some cases even pausing and returning to lengthy surveys as time permits, but the motivational prompts that interviewers provide are not available. In addition, respondents to self-administered questionnaires (especially mail) can see the entire questionnaire at once and thus may perceive the questions as more burdensome or intimidating than would be the case for the same questions in an interviewer-administered mode, potentially leading to break-offs. For this reason,
a big challenge in transitioning surveys to self-administered modes is managing questionnaire length.
Many surveys that transition from telephone to self-administered modes shorten the questionnaire. For example, the RECS transition shortened a 40 minute face-to-face survey to a 20 to 30 minute web and paper questionnaire by focusing on only the most critical content and asking for less detail in the self-administered modes (Murphy, Biemer, and Berry 2018). For instance, whereas the interviewer-administered mode asked for information about up to three refrigerators in a household, the self-administered modes were capped at two refrigerators (U.S. Energy Information Administration 2017). Likewise, the transition of the NHTS reduced the number of response categories for questions about the purpose of trips and the means of transportation used (Federal Highway Administration and Westat 2018). The NHTS also attempted to reduce respondent burden by taking advantage of web technology in its trip rostering section. Since household members sometimes travel together, if one household member had already reported a joint trip, the other household members simply had to confirm and/or edit the details of the trip, saving them time and burden (2017 NHTS Data User Guide 2018). Similarly, the 2007 HINTS introduced a mail instrument to the existing RDD telephone survey, reducing the length of both instruments from 40 minutes to 30 minutes (Cantor, et al. 2009). Others have attempted to deal with remaining questionnaire length issues after shortening surveys for a transition by offering the new version in two separate modules rather than one longer survey. However, this proved to be an ineffective strategy, as it decreased response rates and increased data collection time and costs (Liao, et al. 2019).
In some instances, efforts to shorten questionnaires have led to unanticipated problems in the self-administered data collection. For example, the National Center for Education Statistics undertook an intensive review of questions in the NHES surveys as part of their transition process. This review involved identifying and dropping questions that were of secondary importance or that were too difficult for self-administration (Montaquila, et al. 2013). One result was that they asked fewer questions to verify homeschooler status on the screener questionnaire than had previously been used in the interviewer-administered screener, resulting in possible parent misreports of homeschooled children being in public or private schools in the 2012 NHES screener (McPhee, et al. 2015).
4.4.4 Single versus Multiple Languages
When surveys are designed and administered in multiple languages, interviewers help identify the language in which the questionnaire should be administered, conduct the interview in that language if they are bilingual, or ensure follow-up by a bilingual interviewer. Administering self-administered surveys in multiple languages is slightly more complicated than doing so in interviewer-administered surveys. In a mail survey, researchers often send multiple versions of the same survey in different languages (e.g., the RECS, NHES, NSCH, ACS, etc.) or a dual-language survey, perhaps formatted as a “swim-lane” (side-by-side) questionnaire (e.g., 2010 U.S. Census [Rothhaas, et al. 2011]). The inclusion of multiple languages significantly increases the costs associated with printing and mailing mail survey packages, especially when two or more alternative languages are required. In an effort to minimize the total number of survey packages printed, many researchers use sample information to predict the likelihood that the respondent will require another language in order to participate in the survey (e.g., 2010 U.S. Census [Rothhaas, et al. 2011]; HINTS 4, Cycle 2 [Westat 2013]; NHES 2016 [McPhee, et al. 2018]). In a web-based administration, this process is more efficient because respondents can select their own language, but translation and programming costs remain.
Among the surveys that transitioned from telephone to self-administered or mixed modes, most were administered in only English or in only English and Spanish. In our survey of surveys that transitioned, multiple languages were offered for about half of the studies, the vast majority of which offered them in both interviewer- and self-administered modes. For example, the self-administered NSCH used only English and Spanish language materials because the prior telephone administration, which used a language-line service for languages other than English and Spanish, found that 0.2% of interviews were conducted in Mandarin, Cantonese, Vietnamese, or Korean (Bramlett, et al. 2017; Ghandour, et al. 2018). The California Health Interview Survey web survey pilot was administered only in English and asked respondents to call into the telephone center for languages other than English, yielding only 11 non-English interviews (Wells, et al. 2018).
Respondents who complete the questionnaire in a language other than English may experience challenges that do not occur for those completing the English-language questionnaire, either because of translation problems or because of other visual design problems. One challenge with bilingual self-administered questionnaires, such as the 2010 U.S. Census questionnaire, which included both English and Spanish side-by-side in a “swim-lane” design, is that respondents may not answer questions in a single language. Rather, they may enter responses in both languages, raising challenges for data entry and processing. In the 2010 Census, 3.4 percent of returned bilingual questionnaires had this problem (Rothhaas, et al. 2011). Web questionnaires can allow respondents to switch between languages, as was done in the web experiment of the 2016 NHES (McPhee, et al. 2018). As such, data users who want to know which language was used to complete the questionnaire may need item-specific flags; alternatively, the survey organization may need to decide how to assign the language used. For example, in the 2016 NHES web experiment, language of interview in web surveys was identified as the language used for the last item completed in the questionnaire (McPhee, et al. 2018).
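As a concrete illustration of the NHES-style assignment rule described above (the language in effect for the last item completed), the minimal sketch below applies that rule to hypothetical item-level records; the record structure and field names are assumptions, not the NHES file layout.

```python
# Minimal sketch of one possible language-assignment rule: code the questionnaire
# language as the language of the last item the respondent completed.

def assign_language(item_records):
    """item_records: list of dicts with 'item_order', 'language', and 'answered' (bool)."""
    answered = [r for r in item_records if r["answered"]]
    if not answered:
        return None  # no items completed; language cannot be assigned with this rule
    last = max(answered, key=lambda r: r["item_order"])
    return last["language"]

# Example: a respondent who switched from English to Spanish partway through
records = [
    {"item_order": 1, "language": "EN", "answered": True},
    {"item_order": 2, "language": "EN", "answered": True},
    {"item_order": 3, "language": "ES", "answered": True},
    {"item_order": 4, "language": "ES", "answered": False},
]
print(assign_language(records))  # -> "ES"
```

Retaining the item-level language flags alongside the derived variable lets data users apply a different rule if they prefer.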
When transitioning interviewer-administered survey instruments into self-administered questionnaires in languages other than English, it is critically important to test and evaluate all parts of a questionnaire, including formatting and visual design, launch pages to a web survey, and the question wording itself. These tests may reveal myriad problems that differ between self-administered and interviewer-administered modes. For instance, Spanish-language respondents answering the mail questionnaire for the HINTS had difficulty completing grids (Westat 2013). Cognitive testing for Spanish-language respondents to the 2020 Census Barriers, Attitudes, and Motivators Survey (2020 CBAMS) revealed that the phrase “beginning the survey” on the web survey’s launch page had actually been translated as “after the survey” (Lykke and Garcia Trejo 2018). Other, unanticipated issues may also arise.
4.5 Collection of Biomeasures, Environmental Samples, Interviewer Observations and Consent for Administrative Record Linkage
Some measurements are facilitated by having an interviewer-administered survey (Kreuter 2013). Having interviewers present in person, for example, enables the use of show cards and card-sorting measurement techniques and the collection of physical measurements of people (e.g., height, weight, etc.) or housing units (e.g., square footage) or physical samples (i.e., biological or environmental samples such as saliva, stool, water, and air). Interviewers can also make important observations about neighborhood characteristics, housing conditions, or other factors. Finally, interviewers can assist with the integration of less traditional types of measurement, such as by installing passive data loggers (e.g., meters that measure television viewing, energy use, light, air quality, etc.) or obtaining record linkage consent. These measurement techniques are not possible in mail or web surveys and – in some instances – are more difficult on the telephone.
For example, as described above, interviewers in the RECS measure housing unit square footage, providing better measures than those that can be obtained through respondent self-report. They also record, based on their own observation, the housing type of the respondent and then a number of details about the housing (e.g., if it is an apartment, what floor it is on and whether the door opens to a hallway or outside); respondents have to be asked for this information directly in the self-administered version. Likewise, interviewers in the ANES made observations about respondent characteristics like skin tone, apparent intelligence, cooperation, suspicion, interest in the interview, and sincerity, and took notes about any visible political or campaign signs at the residence (American National Election Studies 2015; 2018). In addition, interviewers are sometimes asked to collect administrative records or even to install passive data loggers.
Transitioning away from interviewer-administered and to self-administered modes raises challenges for interviewer observations as a critical part of data collection.
One way to continue to collect observational or biological measurements when transitioning to self-administered or mixed modes is to send a separate observational team to collect the assessments, but consent rates may decrease substantially and more research is needed to minimize the losses. In the first four waves of the National Longitudinal Study of Adolescent to Adult Health (Add Health), researchers collected extensive biological measures, including height, weight, BMI, DNA, pulse, and blood pressure, and tested for sexually transmitted infections, HIV, immune function, inflammation, and diabetes, which required taking physical measurements and collecting blood, urine, and saliva samples. In wave 4 of data collection, measures and samples were taken by in-person interviewers in an approximately 30 minute procedure that took place immediately after the interview (Add Health Wave IV n.d.); 96% of respondents consented to providing saliva and 95% consented to providing blood samples (Harris 2018). In wave 5, the survey was transitioned from in-person to a mixed-mode design that started with web and mail data collection followed by telephone nonresponse follow-up. Researchers sought consent for the physical and biomarker collection during the initial web, mail, or phone survey and then had a biomarker subcontractor visit respondents for the actual collection. With this two-step process, consent rates were considerably lower: only 66% consented to the biomarker visit (Harris 2018).
The HRS also faced mode-related limits to the collection of these types of measures. HRS started collecting physical and biomarker data in 2004 and since 2006 has used interviewers to collect breathing tests, hand strength tests, walking tests, balance tests, height, weight, waist circumference, blood pressure, saliva, and blood spots in their biennial surveys. Interviewers are also able to administer cognitive performance tests and to provide observations such as information about the mode of response, how much help respondents received with the interview and from whom, notes about respondent difficulties with the questionnaire, and notes about factors that might affect respondent recruitment in future surveys (Health and Retirement Study Questionnaires n.d.). In off-years, HRS has collected considerable data using self-administered modes, but it is generally unable to collect physical measures, biomarkers, some cognitive performance tests, and interviewer observations in these efforts (Fisher and Ryan 2018; Health and Retirement Study Questionnaires n.d.).
Without interviewers to collect biomeasures, researchers are left with a few options.
One option for surveys that transition away from interviewers is to ask sample members to go to a clinic to give samples. While collecting samples at a clinic maximizes the types and quality of samples that can be taken, this method requires sample members to be in geographic proximity to a clinic, is expensive, and is prone to low cooperation rates among sample members (Sakshaug, et al. 2015).
Some studies have attempted to collect biological samples via self-administration. These studies typically have lower participation rates than those using interviewers, although not always (Sakshaug, et al. 2015). Participation rates for such requests vary widely, from 15% to 92%, and likely depend on how the request is made, of whom, and what samples are collected (blood and urine tend to have more nonresponse than saliva and buccal cells [i.e., cheek swabs]) (Gatny, Couper, and Axinn 2013; Sakshaug, Couper, and Ofstedal 2010; Dykema, et al. 2017). For example, the Wisconsin Longitudinal Study was able to collect saliva samples from 54% of sampled participants via mailed saliva kits using a protocol that started with prenotice phone calls, followed by postal mail saliva kits, a reminder postcard, and a final reminder telephone call (Dykema, et al. 2017). In the Danish Nurse Cohort Study, Hansen, et al. (2007) found that 76% to 80% of nurses asked to mail in buccal cells did so, as did 72% of those asked to mail in saliva samples. This compares to 31% of those asked to go to a central location to have their blood drawn venously. They further found that most self-administered samples of buccal cells and saliva contained the desired DNA; only 2.6% were failed samples containing no DNA. However, the DNA quality from the buccal cells was too low for genotyping, whereas the DNA from the blood samples and about three quarters of the DNA from the saliva samples could be genotyped. Rylander-Rudqvist, et al. (2006) also found high rates of saliva sample returns among Swedish men (80%) and high DNA quality in the samples. Clements and Parker (1998) similarly showed that concentrations of cortisol in saliva samples that were exposed to simulated postal mail conditions were virtually the same as those frozen within one hour of collection, and Durdiakova, et al. (2013) found that salivary testosterone levels were unchanged after 1 day, 1 week, and 1 month of sample storage at room temperature, 4°C, -20°C, and -80°C (i.e., neither storage time nor temperature degraded the samples). These studies suggest that saliva can be successfully collected via self-administered and mail-back methods in order to test common biomarkers like DNA, cortisol, and testosterone, although we are unaware of any studies directly comparing the quality of samples collected via interviewer- and self-administration in a population survey context.
There is some evidence that, at least among certain populations, blood samples can also be collected in self-administered surveys. In 2003, the HRS conducted a survey among people diagnosed with diabetes in which they mailed blood-collection kits to sample members with instructions to mail the completed blood sample to a lab. The blood completion rate for this study was 52% (Sakshaug, et al. 2015). The study demonstrated that it is possible to collect blood via self-administered means, but it is notable that the completion rate is considerably lower than the comparable 80% to 87% for interviewer-administered HRS surveys around the same time frame, even though the survey population is made up of people who commonly have their blood monitored or monitor it themselves (Sakshaug, et al. 2015).
While our review of surveys that have transitioned is by no means exhaustive, we did not come across any that attempted to collect environmental samples such as soil, water, or dust using self-administered modes.
In addition to making interviewer observations and collecting biological and environmental samples, researchers have also begun to rely on interviewers to collect consent for administrative record linkage, such as linking medical records to survey responses.
Successful consent to record linkage is obtained at higher rates in face-to-face interviews than in any other mode of data collection. As just one example, the HRS links survey data to Social Security Administration records on earnings and benefits, to the Centers for Medicare and Medicaid Services claims information, to Veterans Affairs health care utilization information, and to the National Death Index for mortality and cause of death information (Fisher and Ryan 2018; Health and Retirement Study n.d.). Their in-person linkage consent rates range from 78 to 84% (Sonnega, et al. 2014). Fulton (2012) reviewed 22 U.S. surveys conducted between 1982 and 2010 that utilized record linkage. Most of these surveys (18) were conducted using interviewer-administered modes. Those conducted in-person had average record linkage consent rates of 75%, compared to 63% for those conducted by phone. Three of the surveys were conducted by mail; these three had a substantially lower average record linkage consent rate of 49%. These results are consistent with the findings of an experimental comparison of record linkage consent rates (to employment data) in the 2012/13 Legitimation of Inequality Over the Life Span (LINOS) panel survey in Germany. In this experiment, 94% of those responding to an in-person interviewer consented to the record linkage, compared to only 54% of those responding by mail or web, a finding that held even after controlling for differential nonresponse across the modes. In addition, while the linkage consent bias was small for all modes, it was larger for the self-administered modes (Sakshaug, et al. 2017).
These results suggest that transitioning from interviewer- to self-administered modes can be problematic for record linkage. More work is needed to determine how to increase consent rates for record linkage in self-administered modes.
4.6 Summary and Takeaways
4.6.1 Different survey modes have different features that affect what can be measured and how measures work. Major mode features of consequence for measurement are interviewer- versus self-administration, visual versus aural communication channels, and computerized versus not computerized instruments.
4.6.2 Surveys that transition from telephone to self-administered or mixed modes generally experience slightly higher item nonresponse rates in the self-administered modes.
4.6.3 Surveys that transition from telephone to self-administered or mixed modes may see shifts in their survey estimates related to social desirability, acquiescence, ordinal scale items, or items that are related to interviewer characteristics.
4.6.4 More research is needed to identify exactly how variable errors change, and for what types of questions, when transitioning from a telephone survey to a self-administered or mixed-mode study.
4.6.5 Surveys switching from interviewer-administered to self-administered modes may see better quality of responses on autobiographical items that can be identified from records, although more research is needed to evaluate this hypothesis.
4.6.6 Surveys with knowledge questions may see an increase in estimated knowledge when transitioning from telephone to web-based self-administered modes.
4.6.7 Visual self-administered surveys allow for the use of graphics such as maps, ladders, smiley faces, or thermometers to try to help respondents understand questions that are not possible or very difficult to implement in telephone surveys.
4.6.8 Questionnaire and question features that are particularly challenging to transition to mail surveys include skip patterns, fills, open-ended and field-coded items, and definitions. Implementing questions with these features requires thoughtful planning, and the need to simplify questions and skip patterns differs across web and mail modes.
4.6.9 Screenshots of the questionnaire on both web and mobile devices should be captured and made available with documentation of the questionnaire.
4.6.10 The empirical literature to date has not shown large or consistent differences in the quality of data obtained from mobile versus computer web respondents. The most consistent pattern is that mobile respondents (especially smartphone respondents) break off at higher rates and take more time to respond than computer respondents. To facilitate learning about possible device effects on measurement, surveys should collect the type of device used to complete the survey through paradata or respondent report, and include this information on public release files.
4.6.11 To the extent possible, multiple answer questions should be asked with a forced choice format. Surveys that transition to a self-administered mode should align the question wording to the response option format for multiple answer questions.
4.6.12 Because of higher breakoff rates, use of grids should be limited or their design improved to mitigate negative effects.
4.6.13 Many surveys that transitioned from telephone to self-administered modes shortened the questionnaire.
4.6.14 Most surveys that transitioned from telephone to self-administered or mixed modes were administered in only English or only English and Spanish. Testing of question wording, formatting and visual design, and other parts of a multi-language survey in all languages is critical.
4.6.15 Collection of a limited set of biomeasures or consent for administrative linkage is possible in self-administered modes, although consent rates are lower than in face-to-face studies. The range of the types of measures is limited, as neighborhood and environmental observations by an external observer are not possible through the self-administered mode itself.
Moving from one mode to another or to a combination of modes may yield significant changes to multiple features of the data collection instruments. Thus, key to these mode transitions is testing. Just as testing is important for the initial fielding of a survey instrument to understand whether accurate information will be collected, testing in surveys transitioning to new modes can also provide insight into the potential effect of new modes on data quality.
Anytime one transitions to a mode that changes communication channels (i.e., visual versus aural) or adds or takes away an interviewer or computerization, the changes in stimuli to the respondent are significant enough that testing will need to be conducted in the new mode(s). For example, surveys conducted primarily via telephone may rely on conversational norms that have no counterpart in self-administered modes. Telephone surveys may also have complex routing and skips that are difficult or impossible to replicate on paper. In addition, some questionnaire features may be particularly prominent in self-administered modes but not used in telephone administrations, such as the use of a grid in a self-administered mode for items that were administered individually in an interviewer-administered mode. Thus, even if question wording remains the same, changes in mode warrant additional questionnaire testing.
There are a variety of methods available for questionnaire development and testing (Presser, et al. 2004; Tourangeau, Maitland, Steiger, and Yan forthcoming). In our survey of organizations, most indicated that they performed questionnaire testing during the transition, primarily cognitive interviews, pilot tests and usability testing. Many organizations combined these strategies in order to evaluate the instrument and then the entire data collection protocol.
5.1 Expert Reviews
Many surveys that transitioned from telephone to self-administered modes report convening a panel of experts to help with the transition. These panels can be informal, such as when groups of experts are asked to review materials and provide feedback. Individual experts can also be asked to provide a list of problems they identified. In surveys that transitioned, expert reviews and panels included both methodological and subject matter experts (e.g., Brick, Williams and Montaquila 2011; Wells, et al. 2018; Federal Highway Administration and Westat 2018; Ghandour, et al. 2018) or stakeholders broadly defined (e.g., Cantor, et al. 2009). How exactly these experts were used is not always described, but reported uses include: “evaluated various frame and mode options to supplement or replace the existing data collection methodology” (Wells, et al. 2018, p. 12), did “work to refine and revise selected content” (Ghandour, et al. 2018), or “were instrumental in shaping the design of the Pilot Study” (Brick, Williams and Montaquila 2011, p. 409).
Other surveys used a more formal expert review process. More formal expert reviews can be conducted when the experts use a standardized evaluation tool to evaluate the questionnaire. Three examples of such tools are the Question Appraisal System (QAS; Willis and Lessler 1999), the Question Understanding Aid (QUAID; Graesser, et al. 2006), and the Survey Quality Predictor (SQP; Saris and Gallhofer 2007). For instance, the QAS was used to review the Residential Energy Consumption Survey (RECS) instrument (Murphy, Mayclin, Richards, and Roe 2016).
In general, there is little information available on the details of how expert panels are used in these transitions. Expert reviews can focus on many aspects of the design including whether the right constructs are measured, how individual measures will work, question order effects, respondent burden, navigation, or recruitment methods. Compared to other testing methods, they are quick, inexpensive, and easy to implement. They provide a good means to identify possible problems with a questionnaire (especially those related to retrieval and respondent burden problems linked to item-nonresponse and inaccurate reporting – Olson 2010), and thus can be very informative about what parts of the questionnaire should be prioritized for additional testing. Given these many uses, more research is needed on the most effective use of expert panels and expert review when transitioning a survey from telephone to self-administered or mixed modes, including the composition of these panels or experts (substantive; methodological; data users), the frequency with which the experts are engaged (monthly, quarterly, annually), when the experts are engaged in the transition process (before starting design decisions; after the study team has identified core content; etc.), the level of formal assistance from the experts (informal conversations; coding sheets), and more.
5.2 Cognitive Interviews
Cognitive interviews are a well-accepted and commonly used method of testing questionnaires, used to obtain qualitative information from potential respondents about the process they use when answering survey questions (Willis 2005, 2015). A typical cognitive test is conducted by a trained interviewer who solicits verbal reports from a respondent as they answer survey questions, using structured follow-up probes to gather information on specific steps in the response process (Willis 2015). For example, asking participants to describe in their own words what a question is asking can help identify comprehension problems, while asking them how easy or difficult it was to recall something can help assess recall challenges.
Many surveys transitioning to mixed modes conduct additional cognitive testing for new versions of questionnaires. For example, the Health Information National Trends Survey (HINTS) conducted three rounds of cognitive interviews for the 2007 phone administration of the instrument and three rounds of cognitive testing for the mail administration of the instrument (Cantor, et al. 2009). Because both modes were implemented that year, the computer-assisted telephone interview (CATI) cognitive interviews informed the design of the mail questionnaire, and the mail cognitive interviews focused on navigation issues, question formatting issues (e.g., indentation, font size), and other issues around visual layout. These cognitive interviews also asked respondents to react to the cover of the questionnaire.
One important consideration, especially for self-administered modes, is when and how the interviewer should interact with the participant during the interview. Think-aloud procedures and having the cognitive interviewer probe on a question-by-question basis ensure that participant thoughts are heard as they occur. However, real-time probing may also interrupt processing, leading to a participant experience and behaviors that are very different from how self-administered surveys are actually completed (i.e., reading more closely than usual, paying more attention to instructions and definitions, etc.). Foregoing the think-aloud procedure and saving probes until the end of the interview (i.e., retrospective interviewing and probing) may mean some details are forgotten, but it keeps the survey experience closer to field conditions and minimizes interviewer impact on participants as they complete the questionnaire. It may also allow for more observation of usability issues (discussed below), such as navigation errors or difficulty registering responses. For even greater realism, the National Survey of Veterans mailed cognitive interview participants a questionnaire and asked them to complete and return it to the survey organization; respondents were then called on the telephone to provide insights into their difficulties in completing the questionnaire (Westat 2010). Surveys often use concurrent or think-aloud probes in initial rounds of cognitive interviews to explore understanding, while in later rounds the strategy often shifts to retrospective probing in order to gain an understanding of how the entire instrument is performing (Willis 2005).
Cognitive interviews have also been used to test other important factors like the effects of visual design features in questionnaires, how well respondents navigate questionnaires, and how respondents process implementation materials (Dillman and Allen 1995; Sawyer and Dillman 2002; Dillman, Parsons, and Mahon-Haft 2004; for an overview of the extension of cognitive interviews to self-administered modes, see Dillman and Redline 2004). For instance, Martinez, Eggleston, Katz, and Morales (2018) used cognitive interviews to examine a series of mailings in the mixed-mode American Community Survey (ACS). To address the mixed-mode data collection of the ACS, “likely internet responders” and “likely paper responders” were recruited, based on prior statistical analyses of the ACS that identified demographic and other correlates. Cognitive interview participants evaluated five different packages of recruitment materials, reflecting the ACS mailing strategy, including letters, questionnaires, postcards, and envelopes. These interviews revealed not only what respondents paid attention to in the letters and envelopes, but also perceptions of the recruitment protocol as described in the letters (e.g., negative comments about having to wait three weeks after the first mailing for a paper questionnaire among those who did not have internet access).
Because cognitive interviewing is very labor intensive and burdensome for respondents, researchers often conduct a small number of interviews and focus the interviews on areas identified as particularly problematic rather than testing the entire questionnaire.
Surveys that are transitioning to self-administered or mixed modes often focus on areas where the instruments are substantially different across the old and new mode(s). For instance, the 2016 American National Election Studies (ANES) reported that a “subset of questions … in the post-election CAPI [computer-assisted personal interview] instrument” (DeBell, et al. 2018, p. 28) was included in cognitive interviews. The RECS conducted two rounds of cognitive interviews on selected items that were thought to have changed in meaning over time (e.g., questions about compact fluorescent lightbulbs) or would require changes in presentation across modes (Murphy, et al. 2016).
5.3 Web Probing
Transitions from interviewer-administered modes to self-administered modes are also occurring for cognitive testing methods. Recently, survey methodologists have used qualitative questions embedded in web surveys, or “web probing,” to ask follow-up probe questions about respondents’ answers (Murphy, et al. 2016; Edgar and Scanlon 2017). For instance, Edgar and Scanlon (2017) provide examples from questions such as “How much have you spent on clothing in the past 3 months?,” which can be followed up with probes similar to those used in interviewer-administered cognitive interviews, such as “What types of clothing were you thinking of when you answered that question?” The follow-up questions can be presented to all respondents or only to those providing specific answers, and can be closed- or open-ended. This method does not require a trained interviewer, but it is limited to probes that can be specified in advance.
This type of testing can be administered to larger numbers of respondents than are typically included in cognitive interviews, can be conducted very quickly and relatively cheaply, and can yield much more diverse participants than those typically used for cognitive interviews (i.e., recruiting beyond the immediate geographic area of the cognitive lab) (Murphy et al. 2016). Scanlon (2018) used web probes among more than 2,000 respondents in a web panel to evaluate misunderstanding of a question on health insurance across age, education, and income subgroups. Respondents can be recruited from social media platforms or sources such as Amazon’s Mechanical Turk if testing is done outside the survey itself (Edgar and Scanlon 2017).
For surveys moving to a heavier reliance on web questionnaires, web probing may be a particularly useful method of evaluating whether the online instrument and questions are working as intended.
5.4 Usability testing
The use of cognitive interview methods for examining navigation issues and recruitment materials has increased with the growth of web surveys and is now often referred to as “usability testing.” The term “usability testing” comes from the more general website design and testing literature and refers to tests focused on a respondent’s ability to navigate a website and perform a task (Krug 2014). For web-based survey instruments, the tasks include logging into the instrument, entering answers, navigating successfully through the instrument, and submitting data. Usability tests might focus on the design of the log-in screen, scrolling, automated editing or warnings, and the survey submission procedure. For instance, Hunsecker (2018) used in-person and virtual (web-based) usability tests to evaluate problems with completing surveys and web panel enrollment forms online. These usability tests revealed traditional problems with question wording as well as problems with navigating the web forms and other question formatting issues. In paper questionnaires, navigation and skip patterns are a focus of usability testing. Given there is no interviewer to assist with these tasks in self-administered surveys,
usability tests are very important in surveys transitioning to self-administered modes and are best used as a complement to traditional cognitive interview methods that focus on the response process for questions.
Increasingly, respondents are answering web surveys on mobile devices like phones or tablets. The limited screen size of smaller devices and alternative input methods (i.e., touch screens, scrolling, spin wheels, etc.) may have a substantial impact on the layout and formatting of instruments and respondents’ ability to navigate through them. Indeed,
the most common remark in our survey of organizations that transitioned about question types that were difficult to move to new modes was that grid or matrix questions became problematic for surveys likely to be completed on cell phones or other mobile devices. Usability testing across device types can help ensure that the design does not inadvertently introduce measurement error for subsets of device users. For example, in a test for the mixed-mode Consumer Expenditure Survey, Williams, et al. (2018) evaluated the usability of a web-based consumer expenditure diary for both desktop and mobile respondents. Respondents were allowed to select the device used to complete the web diary, and many did not select a mobile device because they thought it would be difficult to use. As with a pilot test, the usability test also revealed differences in timeliness and in the types of expenditures reported by respondents across devices.
Usability tests are especially important when transitioning to web modes for understanding aspects of technology that may not be uncovered when the questionnaire and its specifications are displayed on paper. For example, Olmsted-Hawala, Nichols, and Myers (2018) identified technology-related usability problems with the mobile version of the mixed-mode Decennial Census Test, such as the mobile device keyboard covering up the answer categories and entry boxes and the failure to default to a numeric mobile keyboard for fields requiring only numeric entry. These types of problems would not have been revealed in a paper-based test using questionnaire specification documents.
Usability testing may also be important when transitioning to self-administered surveys if the level of internet experience, literacy, or English language ability is expected to vary across respondents. That is, respondents who are not comfortable with computers or the internet or who have lower levels of literacy may have more trouble with web or paper instruments than those who use computers more regularly or have higher literacy levels, and thus should be included in usability tests.
Questionnaires that are transitioned to self-administered surveys and offered in multiple languages also require attention in usability tests. For instance, Olmsted-Hawala, Nichols, and Myers (2018) report that respondents who speak languages other than English may have browsers that automatically translate survey login pages or questionnaire forms, even if the survey organization has a translated version available. Furthermore, usability tests revealed that placing the toggle button for a Spanish or other-language questionnaire far from the entry fields makes the button difficult for respondents to find; similarly, those with limited familiarity with the English language may have difficulty with even the initial task of entering a URL into a web browser. Each of these aspects should be tested across multiple populations when transitioning a survey to mixed modes, especially those that include the web.
5.5 Field Tests
Field tests, also known as pilot tests, are small-scale studies of the entire survey procedure, including implementation materials and processes and the questionnaire itself, and yield empirical data about the new design under real survey conditions (i.e., in the field, with actual target population members). Field tests can give realistic estimates of item nonresponse rates, response distributions, and skip errors in the questionnaire, although often they cannot reveal problems with question comprehension. In computerized modes, they can also provide paradata to help understand survey question timing and rates of answer changes. For implementation, field tests provide information about response rates, sample composition, timing, costs, staffing needs, staff communication and coordination, and the effectiveness of field monitoring systems, but are costly. Field tests in mixed-mode surveys also facilitate comparisons of responses across modes; if the modes include the web, responses can be examined across devices within web respondents as long as those paradata are collected. These comparisons allow the researcher to evaluate the impact of using multiple modes or of multiple response devices on estimates, data quality, timing, and costs. For instance, when transitioning the National Longitudinal Study of Adolescent to Adult Health (Add Health) from interviewer-administered to mixed mode, Biemer, et al. (2018b) conducted a small pilot test prior to a larger implementation to evaluate response to a web-only survey and the quality of sample members’ email addresses. This pilot test informed the design of the study and possible experiments for a larger scale implementation.
One strategy when moving from a single to mixed-mode survey is to try to maintain comparability to the existing modes. As such, field tests may involve simultaneous fielding of the new modes and the old mode to evaluate how responses change with the change in design. This is expensive, and as an alternative, some surveys compare a field test in the new self-administered or mixed modes with the most recent implementation of the survey in the interviewer-administered mode. For instance, the Panel Survey of Income Dynamics (PSID) compared the implementation of a web instrument in 2016 with the most recent telephone administration of the instrument in 2015 (McGonagle, Freedman, and Griffin 2017), focusing on differences in questionnaire and section length, important survey estimates, and the ability to code answers about work and occupations from narrative open-ended questions. Brick, Williams, and Montaquila (2011) compared response rates, household eligibility, and demographic characteristics for a 2009 pilot study for the National Household Education Surveys (NHES) with the most recent telephone administration in 2007. Link, et al. (2008) compared response rates, demographic characteristics, and costs for a mail pilot of the 2005 Behavioral Risk Factor Surveillance System (BRFSS) with the 2005 telephone survey being done in the same states at the same time.
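The sketch below illustrates, under simplifying assumptions, the kind of comparison these field tests make: testing whether a key proportion differs between a self-administered pilot and the most recent telephone administration. It uses unweighted counts and a simple chi-square test for illustration only; real comparisons would use the surveys' weights and design-based variance estimates, and any observed difference still mixes measurement effects with differences in who responded (see Chapter 8). The counts shown are invented.

```python
# Illustrative, unweighted comparison of a key estimate between a new-mode pilot
# and the prior telephone administration. Counts are hypothetical.
from scipy.stats import chi2_contingency

def compare_proportion(yes_new, n_new, yes_old, n_old):
    """Difference in proportions and chi-square p-value from a 2x2 table."""
    table = [[yes_new, n_new - yes_new],
             [yes_old, n_old - yes_old]]
    chi2, p, dof, expected = chi2_contingency(table)
    return (yes_new / n_new) - (yes_old / n_old), p

diff, p = compare_proportion(yes_new=412, n_new=1500, yes_old=506, n_old=1600)
print(f"difference in proportions = {diff:.3f}, p = {p:.3f}")
```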
5.6 Experiments
Experiments assign different versions of the same question or implementation feature to random subsets of the sample. Experiments can help researchers determine which changes in their design will matter and how much.
When transitioning from telephone to self-administered or mixed-mode surveys, many experiments are conducted as part of field tests (if the field-test sample is large enough), but they are also often conducted within production surveys. While production surveys aim to optimize quality and cost trade-offs for each mode, experimental research on its own or within a production survey aims to hold constant the design elements that are not part of the experimental variation in order to isolate the effect of a specific feature on the outcomes of interest. Experiments allow researchers to quantify the effects of alternative versions of a questionnaire or implementation procedure, but a weakness of some experiments, especially for questionnaire design and measurement purposes, is that they sometimes do not reveal the underlying cause of a difference, leaving it unclear which of two versions is “better”. The findings from experimental research build up the design principles and theory underlying potential differences that may be observed in self-administered and mixed-mode surveys compared to telephone surveys, whereas decisions in production surveys rest on empirical results and on subjective assessments of what is important, what works, and what is available within the survey budget and resources.
One important decision to be made in mixed-mode experiments is whether to experimentally assign sample members to modes/devices or to allow sample members to self-select these modes and devices. In theory, random assignment ensures that differences found across modes/devices are due to the modes/devices themselves and not to self-selection, although in practice, differential nonresponse across modes/devices undermines the random assignment. For instance, the Pew Research Center experiments (e.g., Keeter, et al. 2015) randomized members of their ongoing American Trends Panel to phone or web, but both arms of the experiment had nonresponse, opening the door to compositional differences. Additionally, randomly assigning sample members to particular modes and devices may systematically exclude large portions of the population (e.g., those without internet access; those without both desktop and mobile devices), making generalization more difficult. In most mixed-mode survey experiments for studies that transition from telephone to mixed modes, however, selection into modes occurs through the mixed-mode design itself. For instance, when transitioning the Gallup-Sharecare Well-Being Index survey from telephone to web and mail modes, Marken, Auter, and Marlar (2018) randomly assigned sample members to a mail-only condition, a simultaneous web and mail condition, a sequential mail-web condition, and a sequential web-mail condition, allowing respondents to self-select mode within the various mixed-mode conditions.
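For illustration, the sketch below randomly assigns sampled addresses to the four contact protocols described for the Gallup-Sharecare transition. The equal allocation, fixed seed, and identifiers are assumptions made for this example; within each protocol, respondents still self-select their response mode.

```python
# Minimal sketch: reproducible random assignment of sampled addresses to four
# contact-protocol conditions (not the actual Gallup-Sharecare assignment code).
import random

CONDITIONS = ["mail_only", "web_and_mail", "mail_then_web", "web_then_mail"]

def assign_conditions(address_ids, seed=20240101):
    rng = random.Random(seed)            # fixed seed so the assignment is reproducible
    shuffled = list(address_ids)
    rng.shuffle(shuffled)
    # round-robin after shuffling yields (near-)equal group sizes
    return {addr: CONDITIONS[i % len(CONDITIONS)] for i, addr in enumerate(shuffled)}

assignments = assign_conditions([f"ADDR{i:05d}" for i in range(12)])
print(assignments["ADDR00000"])
```

Storing the assigned condition on the sample file before fielding makes it possible to analyze outcomes by assigned protocol as well as by the mode the respondent ultimately used.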
5.7 Packages of Testing Strategies in Surveys that Transitioned
Many surveys that transitioned from interviewer-administered to self-administered or mixed modes used a package of testing strategies during this transition. For instance, the RECS faced a number of challenges in their transition from in-person to web and paper modes, including a short timeline for testing to determine the design that would be used in the production survey. As such, they adopted a multi-phase testing approach in which the best features of the testing phases were built into future tests and the final production design on a flow basis (Murphy, Biemer, and Berry 2018). Testing started with expert review of the questionnaire, in-person cognitive interviews, and online self-administered cognitive interviews (Murphy, et al. 2016). The RECS in-person cognitive interviews (2 rounds with 15 people each from three cities) focused on particularly challenging content, for example, questions about new technologies, revisions of outdated or previously problematic questions, and “mode sensitive” content. The online cognitive interviews were similarly focused, although they also included some updates made based on the in-person cognitive interview findings. Problems and changes identified in early testing were addressed and retested in later testing (Murphy et al. 2016).
A series of field tests were then conducted to test the feasibility of collecting energy use data via self-administered modes and to experimentally test questionnaire length and initial mode assignment (Murphy et al. 2018). The first test focused on a subset of localities, showing that a 30 minute self-administered RECS survey was feasible for both web and mail modes with respect to budget, timing, and response rate; that most people preferred the mail mode; and that the web mode produced higher quality data at lower costs. The second field test, a national level test designed while the first field test was still in the field (using daily tracking results) and conducted alongside the 2015 RECS CAPI data collection, adopted the materials and strategies that worked in the first field test but also included experiments with incentives and mode to try to push respondents to the web. Four key metrics were participation rates, web response rates, respondent sample representativeness, and costs per completed case (Murphy et al. 2018). These were monitored on a daily basis and had the biggest influence on decision making as the testing progressed and on the final production design.
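As a rough illustration of this kind of daily monitoring, the sketch below computes a participation rate, the web share of completes, and cost per completed case from a hypothetical case-status file; the field names and flat per-contact costs are assumptions for illustration, not the RECS production systems.

```python
# Hypothetical daily-monitoring sketch: key fielding metrics from a case-status file.
def daily_metrics(cases, cost_per_mailing=2.50, cost_per_web_invite=0.75):
    """cases: list of dicts with 'status' ('complete'/'pending'), 'mode', 'mailings', 'web_invites'."""
    completes = [c for c in cases if c["status"] == "complete"]
    n_complete = len(completes)
    total_cost = sum(c["mailings"] * cost_per_mailing + c["web_invites"] * cost_per_web_invite
                     for c in cases)
    return {
        "participation_rate": n_complete / len(cases) if cases else 0.0,
        "web_share_of_completes": (sum(1 for c in completes if c["mode"] == "web") / n_complete
                                   if n_complete else 0.0),
        "cost_per_complete": total_cost / n_complete if n_complete else float("inf"),
    }
```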
Similar to the RECS, the transition of the NHES surveys from telephone to mail also involved considerable testing that started with a comprehensive review, redesign, and three rounds of cognitive interviews and ended with two pilot field tests (Westat 2009; Montaquila, Brick, and Kim 2012; Montaquila et al. 2013). The cognitive interviews tested recruitment materials, screener questionnaires, and topical questionnaires, with early interview findings informing changes that were tested in later interviews (for design details and findings, see Westat 2009). Two pilot field tests were subsequently conducted, one in 2009 (n=11,800 – see Brick, Williams, and Montaquila 2011) and another in 2011 (n=60,000 – see Montaquila et al. 2013). These pilot tests included experiments on prenotice letters, incentives, questionnaire design, postal delivery methods, Spanish language materials, and envelopes. Key outcomes from the field tests included the screener response rate, eligibility rate, topical response rate, overall response rate, number of eligible households required to get one screener completed with an eligible household, number of eligible addresses required to get one completed topical survey, respondent characteristics, and costs (Montaquila et al. 2012).
5.8 Tools to Evaluate Questionnaire Features
Assessing the impact of mode transitions on estimates and data quality is complex. It requires researchers to plan ahead, identifying outcomes or metrics that will be used in such assessments and ensuring that the proper information is collected to analyze these outcomes. Advance planning for the types of information to capture and the analyses to use will facilitate evaluation of the impact of transitions on the quality of measurement. Any analysis of a change in mode of data collection may be confounded by differences in who responds via different modes, whether through self-selection into response mode or differential nonresponse across response modes. All analyses of survey mode, and especially those in which mode is not experimentally manipulated, must wrestle with these confounding factors. Chapter 8 on Survey Estimation addresses these estimation issues in more detail.
5.8.1 Data to be Collected and Associated with a Response or a Question
Surveys that transition from interviewer-administered modes to self-administered modes may want to plan to capture certain data as part of the data collection effort to facilitate analysis by either the data collector or secondary analysts of the data. Survey researchers who collect their own data may need to identify systems that can collect this information. Researchers who contract with another organization to administer the survey may want to include these items in the data collection contract. This list aims to be comprehensive, recognizing inherent challenges in measuring or capturing some of the information at different survey organizations and with different data collection instruments.
Mode of response. Where multiple modes will be used, an indicator of which mode (telephone, paper, web-based) each respondent used is crucial to permit analyses of outcomes across modes.
Device used for responding. For surveys conducted by web, information about response device type (desktop, laptop, tablet, smartphone) is needed to allow comparison of responses and data quality across devices. Device type can be collected simply by asking respondents for a self-report. To fully understand the nature of the responding device, researchers may want to collect as much information as feasible and practical about the responding device (e.g., iPhone 6SE, screen size, and screen resolution). Such detailed information may yield more insights than simply categorizing the devices into broad-based categories (e.g., tablet vs. smartphone), although most existing analyses simply focus on these broad-based categories. Device type information can be recorded as paradata in what is called a “user agent string” (i.e., a string of text that identifies information about the responding device like resolution, operating system, browser, etc.). For an overview of collecting device type via paradata, see Callegaro (2010). For longer surveys, capturing device type at multiple points in the survey may be needed to evaluate whether respondents switch devices partway through.
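To illustrate, the minimal sketch below (Python) buckets a captured user agent string into broad device classes; the keyword rules and the example string are illustrative assumptions, and a production system would more likely rely on a maintained user-agent parsing library.

```python
def classify_device(user_agent: str) -> str:
    """Bucket a stored user agent string into broad device classes.

    Crude keyword rules for illustration only; a maintained user-agent
    parsing library would normally be used in practice.
    """
    ua = user_agent.lower()
    if "ipad" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "iphone" in ua or "mobile" in ua:
        return "smartphone"
    return "desktop/laptop"

# Hypothetical user agent string captured as paradata during a web response
example_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) Mobile/15E148"
print(classify_device(example_ua))  # -> smartphone
```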
Question characteristics. Most surveys document the wording of questions. Researchers can refer to this documentation to understand question features like the question wording, whether it is open or closed, or how many response options there were. This documentation often does not capture other questionnaire design features that can impact the quality of measurement. In visual modes where the graphical display of the question can communicate meaning to respondents and in mixed-mode surveys in which the questionnaire design has been optimized for display depending upon the mode and device used, simply documenting question wording and response options is insufficient. Design features such as scale orientation (vertical vs. horizontal), the use of verbal analogs (end points only or all points, regardless of device used for data collection), placement in a grid versus presentation on separate screens and a number of other visual design features can also influence responses (for an overview see Dillman, et al. 2014). These features are best captured by retaining production copies of the paper questionnaires and screenshots of web questionnaires on desktop and mobile devices.
In our review of surveys that have transitioned to self-administered or mixed modes, we were almost always able to find copies of paper questionnaires, but almost never able to find documentation of how the survey appeared on the web when that mode was used. Thus, in general, better documentation and dissemination of screen captures of web and mobile surveys is needed. In the case of web especially, once the study is out of the field and a little time has passed, it may be difficult, if not impossible, to reproduce the questionnaire as respondents saw it because of technological changes. Thus, it is paramount that web and mobile screenshots be taken during the field period to provide accurate documentation.
Paradata. To the extent that data are collected via computerized means, auxiliary data about the data collection process can inform post-data collection analysis. These auxiliary data may include keystroke data for both interviewers and self-administered respondents, use of “help” screens, and timing information for any particular screen, questionnaire section, or the entire interview (Kreuter 2013; Olson and Parkhurst 2013). Surveys that transition from a computerized interviewer-administered mode to a computerized self-administered mode can use this information to understand differences in questionnaire and section length, as well as other particular problems encountered by respondents during data collection.
Interviewer Information. If interviewer administration is retained as part of the mix of data collection modes, information about interviewers is needed. At minimum, this should include an anonymized interviewer identification number for each case so researchers can nest respondents within interviewers. Interviewer ID numbers allow investigators to uncover some of the error in interviewer-administered surveys by examining interviewer bias and variance in responses (Groves 2004; Fowler and Mangione 1989; Elliott and West 2015). When possible and when there is no danger of identifying individual interviewers, additional interviewer characteristics such as race, gender, age, overall interviewing experience (i.e., tenure), and experience within the specific study (i.e., within study interview count) also may yield insights into potential interviewer-related error in a mixed-mode context (e.g., Catania, et al. 1996; Krysan and Couper 2003; Olson and Peytchev 2007).
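For example, once an anonymized interviewer ID is attached to each interviewer-administered case, a random-intercept model can estimate how much of the variance in an item is attributable to interviewers. The sketch below (Python, using statsmodels) is a minimal version assuming a hypothetical respondent-level file with columns y and interviewer_id.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level file: one row per respondent with the answer
# to a survey item ("y") and an anonymized interviewer ID for telephone cases.
df = pd.read_csv("respondents.csv")  # assumed columns: y, interviewer_id

# Random-intercept model nesting respondents within interviewers.
fit = smf.mixedlm("y ~ 1", data=df, groups=df["interviewer_id"]).fit()

# Intraclass correlation: share of the variance in y attributable to interviewers.
interviewer_var = fit.cov_re.iloc[0, 0]
residual_var = fit.scale
icc = interviewer_var / (interviewer_var + residual_var)
print(f"Interviewer intraclass correlation: {icc:.3f}")
```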
5.8.2 Possible Analyses for the Evaluation of Mode Effects
Chapter 8 provides an overview of types of analyses that focus on diagnosing nonresponse and measurement errors in mixed-mode surveys. Here, we suggest additional analyses that survey researchers can conduct to evaluate the quality of data collected in different modes.
Item nonresponse rates. Self-administered modes tend to lead to higher rates of item nonresponse (Nicolaas and Tipping 2006) as well as the loss of information concerning the nature of the item nonresponse (refusal vs. don’t know). Mail questionnaires generally have slightly higher item nonresponse rates than web (e.g., see Survey Practice, Volume 5, Issue 2) and, in the case of paper instruments, item nonresponse rates may be higher due to noncompliance with skip patterns. These differences should be expected as a matter of course, but exceptionally high item nonresponse rates overall or to individual questions may indicate other problems with the questionnaire design in one or the other modes that should be further explored through testing. Chapter 4 discusses differences across modes in item nonresponse rates.
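As a simple illustration of how such a comparison might be set up, the sketch below (Python/pandas) computes item nonresponse rates by response mode; the file and column names are assumptions.

```python
import pandas as pd

# Hypothetical respondent-level file with a mode indicator and survey items
# in which skipped answers are stored as missing values (column names assumed).
df = pd.read_csv("survey_data.csv")
items = ["q1", "q2", "q3", "q4"]

# Item nonresponse rate for each item, separately by response mode.
print(df.groupby("mode")[items].apply(lambda g: g.isna().mean()))

# Per-respondent share of items left blank, summarized by mode.
df["item_nr_rate"] = df[items].isna().mean(axis=1)
print(df.groupby("mode")["item_nr_rate"].mean())
```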
Response distributions. As described above, there are many reasons to expect differences in response distributions across modes, such as differences in social desirability due to interviewer presence, extreme positive responses in interviewer-administered modes, and differences due to automation or lack thereof. Sample composition differences may also lead to differences in response distributions across modes. Chapter 8 deals with these analyses in much more detail.
Open-ended questions. Comparisons of responses to open-ended questions across modes can focus on either the content of the responses or the quality of the responses, both of which require considerable data processing. With respect to content, researchers can examine whether the same substantive themes or ideas occur across different modes, a task that will require qualitative coding. With respect to quality, researchers can compare the amount of information collected, which can be operationalized as character or word counts, as a count of the number of themes (i.e., independent ideas that answer the question), or as whether there was any elaboration (description or expansion on a theme) in a response. If audio recordings of interviewer-administered questions are available, researchers can also compare the response given by the respondent to the response recorded by the interviewer to assess interviewer accuracy in keying responses. Differences in responses to open-ended questions are described in Chapter 4.
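For example, the length-based quality measures described above can be computed directly from the verbatim text, as in the sketch below (Python/pandas); the file and column names are assumptions.

```python
import pandas as pd

# Hypothetical file of verbatim open-ended responses with a mode indicator
# (file and column names are assumptions).
df = pd.read_csv("open_ends.csv")  # assumed columns: mode, verbatim

# Simple length-based quality proxies: character and word counts per response.
text = df["verbatim"].fillna("")
df["n_chars"] = text.str.len()
df["n_words"] = text.str.split().str.len()

# Compare average response length across modes.
print(df.groupby("mode")[["n_chars", "n_words"]].mean())
```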
Nondifferentiation. For items that appear in a series, a common measure of data quality is nondifferentiation, or the extent to which answers are the same (i.e., not varying) across the items. There are a number of different operationalizations of nondifferentiation (see Kim, et al. 2018 for an overview). The strictest is the straightlining rate, which is the percent of respondents who gave the exact same answer to all items in the series. Others, such as the within-respondent standard deviation across the items in the series, are less strict. Generally, it is assumed that nondifferentiation is a form of satisficing or respondents shortcutting the response process; however, the content of the specific series of items being assessed also influences nondifferentiation rates. For example, we might expect variation among a set of items about how often one does different recreational activities, but we would expect very little variation among a set of items about how often one engages in different criminal activities. In the latter case, most respondents are expected to select “never” for most items, with the resulting nondifferentiation representing high quality responses, not measurement error. Differences across modes and devices in nondifferentiation rates are discussed in Chapter 4.
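A minimal sketch of both operationalizations, assuming a hypothetical grid of numerically coded items and a mode indicator (file and column names are assumptions):

```python
import pandas as pd

# Hypothetical battery of items asked on the same numeric scale, plus a mode indicator.
df = pd.read_csv("survey_data.csv")
grid = ["g1", "g2", "g3", "g4", "g5"]

# Restrict to respondents who answered every item so item nonresponse is not
# mistaken for (non)differentiation.
answered_all = df[df[grid].notna().all(axis=1)].copy()

# Strictest measure: straightlining (identical answer to every item in the series).
answered_all["straightline"] = answered_all[grid].nunique(axis=1).eq(1)

# Less strict measure: within-respondent standard deviation across the items.
answered_all["within_sd"] = answered_all[grid].std(axis=1)

# Compare both nondifferentiation measures across modes.
print(answered_all.groupby("mode")[["straightline", "within_sd"]].mean())
```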
Response Order Effects. Response order effects can be an indication of measurement error. For example, if responses to ordinal scalar items are more sensitive to scale order in one mode than another, this could be an indication that respondents in that mode are reacting less to the content of the response options themselves and more to formal features like the presentation of the scale. If one mode is more prone to primacy (i.e., higher selection of items appearing first regardless of their content), it is often assumed the mode is more error prone (i.e., respondents are satisficing or misunderstanding the scale order). Likewise, large differences in endorsement of nominal items based on their position in the response options can be an indication of respondent confusion or misunderstanding of response options. These types of comparisons require experimental designs where response option order is varied in the same way (inverted, randomized, etc.) within each of the survey modes being used. Without such designs, it is impossible to differentiate the effects of content from position. Differences in primacy and recency effects across modes are discussed in Chapter 4.
Response time. Response time is often used as a proxy for measurement error (Yan and Olson 2013). Response times that are too fast may indicate problems with the administration or answering of the questions, such as that respondents did not carefully think about their answer. Similarly, response times that are too slow may indicate that respondents were confused or distracted. Generally, differences in response times may be observed across computerized modes, but substantial differences may indicate a problem with questionnaire design.
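For example, response time comparisons and speeding/slowness flags might be computed as in the sketch below; the file, column names, and percentile cutoffs are assumptions for illustration.

```python
import pandas as pd

# Hypothetical timing paradata: total interview duration in seconds per respondent,
# with a mode indicator (file and column names are assumptions).
df = pd.read_csv("timings.csv")  # assumed columns: mode, duration_sec

# Medians are less sensitive than means to the long right tail created by
# breakoffs and interruptions.
print(df.groupby("mode")["duration_sec"].median())

# Flag implausibly fast or slow completions using analyst-chosen cutoffs; the
# 5th and 95th within-mode percentiles used here are illustrative assumptions.
pct_rank = df.groupby("mode")["duration_sec"].rank(pct=True)
df["too_fast"] = pct_rank < 0.05
df["too_slow"] = pct_rank > 0.95
print(df.groupby("mode")[["too_fast", "too_slow"]].mean())
```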
Reliability. Reliability can be assessed several different ways, depending on the types of items and data at hand. Internal scale reliability (typically Cronbach’s Coefficient Alpha) for a set of related items can be compared across modes. Previous studies comparing scale reliability across modes have found few differences (de Leeuw 1992; Borkan 2010). Additionally, increased scale reliability may, in some cases, reflect increases in correlated measurement error rather than improved measurement (Peytchev 2007). Another way to assess reliability over time is with test-retest or repeated measures designs such as those used by Cernat (2015), Chang and Krosnick (2009), Braunsberger, Wybenga, and Gates (2007), and the multitrait-multimethod experiments reported by Saris and Gallhofer (2007a,b). Yet a third is to examine the extent of random error in a measure under the rationale that less random error yields higher reliability (Klausch, et al. 2013). Assessments of these types generally find few differences in reliability among interviewer-administered modes and few differences among self-administered modes, but self-administered modes have higher reliability than interviewer-administered modes.
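As an illustration of the first approach, the sketch below computes Cronbach's alpha for a hypothetical set of scale items separately by mode; the file and column names are assumptions.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's coefficient alpha for a set of scale items (listwise deletion)."""
    complete = items.dropna()
    k = complete.shape[1]
    item_variances = complete.var(axis=0, ddof=1).sum()
    total_variance = complete.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical scale items and a mode indicator (file and column names assumed).
df = pd.read_csv("survey_data.csv")
scale = ["s1", "s2", "s3", "s4"]

# Compare the scale's internal consistency across modes.
for mode, group in df.groupby("mode"):
    print(mode, round(cronbach_alpha(group[scale]), 3))
```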
Validity. Validity can also be assessed in different ways, with each requiring different types of data. Concurrent validity can be assessed by examining the relationship between two measures taken in the same survey that are theoretically related and thus should be highly correlated. Previous research has shown little difference in concurrent validity between interviewer-administered modes (Jackle, Roberts, and Lynn 2010), but a slight advantage for web panel responses over random digit dial (RDD) telephone responses (Chang and Krosnick 2009). Validity can also be assessed by measuring the extent to which a time 1 attitude or behavior predicts a time 2 attitude or behavior that it should predict. For example, Chang and Krosnick (2009) show that pre-election candidate preferences were more strongly related to reported vote choice in web panel responses than in RDD telephone survey responses. The gold standard measure of validity is a record check study where self-reports can be compared to high quality records. In an early meta-analysis, de Leeuw (1992) found no difference in record check validity across modes.
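For instance, a concurrent validity check might compare the strength of the association between two theoretically related measures across modes, as in this minimal sketch (hypothetical file and column names):

```python
import pandas as pd

# Hypothetical data: two measures that theory says should be strongly related
# (e.g., a time 1 preference and a time 2 report), plus a mode indicator.
df = pd.read_csv("survey_data.csv")  # assumed columns: mode, measure_t1, measure_t2

# A stronger association between the two measures within a mode is read as
# evidence of higher validity in that mode.
validity = df.groupby("mode").apply(lambda g: g["measure_t1"].corr(g["measure_t2"]))
print(validity)
```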
5.9 Summary and Takeaways
5.9.1 There are a number of testing methods available to help with transitioning modes. Each yields a different type of information and thus is appropriate at different phases of the transition. Generally, extensive testing using multiple methods will be needed to make the transition as strong as possible; previous surveys that transitioned have used a combination of expert review, cognitive testing, usability tests, and field tests.
5.9.2 Evaluating the effect of a transition on measurement and data quality requires forethought and planning to ensure that the necessary data are collected to make desired comparisons. Among other things, this should include response mode, response device, question characteristics, paradata, and information about interviewers if they are utilized.
5.9.3 Such evaluations can examine item nonresponse rates, response distributions, quality of open-ended responses, nondifferentiation, response order effects, response time, reliability, and validity. Researchers need to ensure prior to data collection (whether a field test or production data collection) that they have the right design for the evaluations they choose to conduct.
Response rates to household surveys in the US and around the world are falling, both for face-to-face surveys (e.g., Williams and Brick, 2018) and for telephone surveys (e.g., Lavrakas, et al. 2017, Appendix D). Transitioning to a mixed-mode data collection can help improve coverage and reduce nonresponse (de Leeuw 2005; Cornesse and Bosnjak 2018). Research from a number of different countries shows that using multiple modes to contact sampled units can improve response rates and potentially reduce nonresponse bias because different types of respondents are more or less likely to respond to certain modes (Messer and Dillman 2011; Bandilla, Couper, and Kaczmirek 2014; Dillman, Smyth, and Christian 2014; Kappelhof 2015). In our convenience sample of surveys that have transitioned from interviewer-administered to self-administered modes, 12 of 22 organizations reported that declining response rates to the interviewer-administered survey were extremely important in their decision to transition to a self-administered or mixed-mode survey, and 10 organizations reported that anticipated response rates to the self-administered or mixed-mode surveys were extremely important in their decision to transition. Of the 17 organizations that reported on what actually happened to response rates as part of the transition, 5 reported that the survey response rate decreased with the transition, 5 reported that the response rate stayed about the same with the transition, and 7 reported that the response rate increased.
Figure 6.1 displays response rates from a set of surveys conducted in the US that have examined transitioning from interviewer-administered modes to self-administered modes, focusing only on one-stage (i.e., no screening) surveys where the self-administered mode survey was conducted within two years of the most recent interviewer-administered survey to help with comparability of the essential survey conditions for the two administrations. The figure orders the surveys by the year of the transition study and separates studies that examined concurrent mixed-mode designs, sequential mixed-mode designs, and single mode designs (either mail only or web only). Some of these comparisons are experimental (interviewer- and self-administered modes mounted at the same time) whereas others are observational (self-administered mounted at a different time, limited here to those with no more than two years between the interviewer- and self-administered surveys, or one mode used as a follow-up mode for another mode). The response rates are taken directly from the available reports or articles, and thus some are AAPOR Response Rates (RR1 and RR3 are common) whereas others are CASRO Response Rates (citations in appendix table 6.A). Many factors vary across the studies.
Yet patterns can be easily observed. In the one-stage surveys conducted between 2001 and 2012, response rates to the telephone mode tended to be higher or at about the same level as response rates to the self-administered or mixed modes. After about 2013, response rates to the self-administered or mixed modes tended to exceed those for the telephone mode.
6.2 Modes of Response in Self-Administered and Mixed-Mode Survey Transitions
Table 6.1 also contains the mode of administration for self-administered and mixed-mode surveys.
Early mixed-mode studies that transitioned from telephone often still included telephone as one of the contact modes and modes of data collection. For instance, the 2005 Health Information National Trends Survey (HINTS) contacted households by telephone and offered them the option of completing the survey online; this approach yielded only 95 web respondents and reduced the overall extended interview response rate from 65.4% for the telephone-only group to 57.0% for the group provided a mode choice (Cantor, et al. 2005).
Many initial mixed-mode surveys used mail as a method of gaining telephone numbers for cellphone-only households and/or households whose address could not be linked to a telephone number through a reverse directory telephone match. For instance, Allison, Stevenson, and Kniss (2014) sent a one-page mail questionnaire requesting a telephone number to an address-based sample of Wisconsin households that could not be reverse directory list-matched to a telephone number for the 2012 Wisconsin Family Health Survey; 43% of unmatched households returned the questionnaire, 91.6% of which had a valid telephone number that could be called for a telephone interview. This approach was also used in two pilot studies for the California Health Interview Survey (CHIS) (Jans, et al. 2013; Kali and Flores Cervantes 2016). In these pilots, addresses in selected counties that could be matched to listed telephone numbers were called via the regular CHIS telephone approach. For portions of the sample that could not be matched to a telephone number, households were mailed a screening survey to request a telephone number (Kali and Flores Cervantes 2016). Only 15% of households returned the form with a telephone number, yielding a 9% completion rate among the unmatched ABS sample. However, this is higher than the main CHIS landline and cell phone sample completion rates (4.1% landline; 5.2% cell phone) (see also Jans, et al. 2013). Pilot work for the National Crime Victimization Survey tested two approaches in one metropolitan area – sending screeners to obtain telephone numbers only for unmatched households (about 40% of the sample) versus to all households (Brick, et al. 2013). In both approaches, about one-third of unmatched households completed the mail screener (47% of matched addresses returned the screener), and just under 75% of returned screeners contained a telephone number.
Other mixed-mode studies use telephone as a nonresponse follow-up data collection mode in addition to mail for addresses that are linked to telephone numbers through reverse directory lookup or other information on the sample frame. In the 2009 National Household Education Surveys pilot, matched addresses with telephone numbers (about 57% of the sample) were randomly selected for nonresponse follow-up with telephone or an additional mail attempt. Following up the mail survey with telephone calls yielded lower screener response rates (34.4%) than staying with mail alone (49.3%; Brick, Williams and Montaquila 2011). Additionally, the Racial and Ethnic Approaches to Community Health across the US Risk Factor Survey (REACH US) randomly assigned addresses matched to a telephone number to be initially contacted in a telephone mode with nonrespondents followed up with a mailed paper questionnaire (the phone-first approach), or to be initially contacted with a mail questionnaire with nonrespondents followed up with telephone (the mail-first approach) (Amaya, et al. 2015; see also Murphy, Harter, and Xia 2010; LeClere, et al. 2012). Following up mail nonrespondents with telephone had a higher screener response rate (48.7%) and higher interview completion rate (79.8%) than following up telephone nonrespondents with mail (screener: 44.8%; interview: 70.8%). Finally, in a survey of students about sexual misconduct, Axinn, Wagner, Couper, and Crawford (2018) used an invitation email request to student email addresses, obtaining a 54.0% response rate. Interviewers then followed up with nonrespondents either on the telephone to encourage an online response or in person, bringing tablet computers so that sampled members could complete the survey online at the time of face-to-face contact, increasing the overall response rate by 13 percentage points.
However, many surveys that transitioned from telephone to self-administered surveys abandoned telephone and used mail as the only contact and data collection mode. For example, the National Household Education Survey (NHES) was an early adopter of a mail survey as an approach for transitioning from RDD to self-administered surveys (e.g., Montaquila, et al. 2013). NHES achieved response rates for a mail-only design that equaled or exceeded those of the most recent telephone administration of the surveys. Brick, Andrews, and Mathiowetz (2016) report on a one-stage recruitment of a rare population (participants in recreational saltwater fishing in specific geographic areas) in which a single-stage mail survey yielded response rates across four states that averaged 34.7%, over three times higher than the telephone survey mounted at the same time in the same states (10.4%). A subsample of nonrespondents to the initial survey was re-sent the mail survey with an increased incentive of $5, yielding a combined weighted response rate of 64% across the two phases of data collection.
More recent mixed-mode studies use mail to recruit sampled individuals, but use only web as a data collection mode. For example, in Canada, the National Travel Survey transitioned to a mail recruitment of adults in households, who are asked to go online and report information in a web survey on domestic and international trips that they have taken for either personal or business-related reasons (Bosa, Gagnon, and Caron 2017). In the US, the 2017 NHTS asks sampled households to complete a screener either online or via a mailed paper questionnaire. Households that complete the screener are mailed paper logs for recording their travel activity on a sampled day, and are then asked to enter it into a web instrument or to report it over the telephone (Federal Highway Administration and Westat 2018). For the web component of the 2016 American National Election Studies (2018), US households were mailed an invitation to complete a screening questionnaire online, after which a randomly selected adult US citizen living in the household completed the online questionnaire.
Other studies use a mailed contact letter to recruit respondents to complete either a mail or web questionnaire. In these designs, typically, a mailed recruitment letter to a web survey is followed in later contact attempts by a mailed paper questionnaire (Dillman, Smyth, and Christian 2014). For instance, Marlar, et al. (2017) use a mail screener survey to identify those with fishing activity (and a small subset of those without fishing activity), who are then sent a letter asking them to go online to complete the topical survey. Nonrespondents were followed up with a mail survey. The NSCH sent a mailed invitation to sampled addresses containing a URL, username, and password, plus a $2 or $5 cash incentive for logging on to the web survey (US Census Bureau 2018; Ghandour, et al. 2018). Nonrespondents received repeated mailings about logging into the website to complete the screening instrument and the main questionnaire; remaining nonrespondents were sent a paper screening questionnaire and, if eligible, were mailed a paper topical questionnaire. The 2016 NHES added a web component to the data collection (McPhee, et al. 2018). Households were mailed a cover letter containing a URL and login information for the web survey; after completing the web screener, screener respondents who were the selected topical respondent immediately continued into the appropriate web survey. Households with an eligible person who was not the screener respondent were mailed a topical web package containing a letter identifying the appropriate respondent. Nonrespondents to the web screener were sent a paper screening questionnaire, and nonrespondents to the topical survey were followed up with a paper topical questionnaire.
Although early meta-analyses found that concurrent web-mail surveys had lower response rates than mail-only surveys, more recent experiments have found few consistent differences in response rates across single, concurrent, and sequential designs. For example, several recent studies have found no notable difference in response rates between single mode web-only or mail-only surveys and concurrent web and mail surveys (Mathews, et al. 2012; Steele, et al. 2016; Marken, Auter, and Marlar 2018; Biemer, et al. 2018). Other recent studies that have compared sequential modes (web+mail or mail+web) with single mode studies have found either no difference in response rates between these two designs (Weaver, Beebe, and Rockwood 2019) or higher response rates for the sequential modes than the single modes (McMaster, et al. 2017; Biemer, et al. 2018; Millar, et al. 2018). Finally, other studies have found similar or higher response rates for concurrent web+mail design than a sequential mixed-mode design (web+mail) (Lesser, et al. 2016; Bucks, Fulford, and Couper 2018).
Some of the difference in response rates for web surveys in recent years may be due to shifts in the proportion of adults who have access to the internet. In 2000, only about half of the US adult population had access to the internet (Pew Research Center 2019). By 2019, about 90% of the US adult population had access to the internet, and access is almost universal for adults under age 50. We speculate that this shift in both internet coverage and internet familiarity may change the “best practices” recommendations for how to combine and sequence modes. Clearly, more research and a systematic meta-analysis of recent studies comparing response rates across these combinations of modes of data collection is needed.
6.3 Longitudinal Surveys and Transitions to Self-Administered Modes
Longitudinal surveys of adults in the United States and throughout the world are often conducted with an initial face-to-face recruitment for the first wave of data collection, and transition to alternative, less expensive modes, for follow-up waves of data collection (de Leeuw 2005; Schoeni, Stafford, McGonagle, and Andreski 2013). Traditionally, the alternative less expensive mode of data collection has been telephone. For example, the Current Population Survey in the United States starts with a face-to-face recruitment of sampled addresses, follows up for the next three months with about 85% of the interviews conducted on telephone, returns for a face-to-face interview for the fifth wave, and then returns to telephone interviews (Bureau of Labor Statistics, 2018). Here, we will focus exclusively on longitudinal surveys that select a probability sample of individuals with the goal of following these individuals over time on a common set of measures rather than online panel surveys that allow clients to purchase administration of items with varying survey topics or content.
It is becoming increasingly common for longitudinal household surveys to transition from an interviewer-administered mode to a self-administered mode or a mixed-mode data collection strategy for at least some of the follow-up waves (Dillman 2009). The National Longitudinal Study of Adolescent to Adult Health (commonly known as Add Health) used face-to-face interviews for the first four waves of data collection, moving to web and paper survey instruments for the fifth wave of data collection in 2016-2018, with face-to-face and telephone follow-up and in-person administration of biomarker collection (Harris 2018). The Health and Retirement Survey (HRS) uses mail and web surveys during years in between the face-to-face data collection efforts to collect additional information from respondents on a wide variety of topics (Health and Retirement Survey 2019). The Panel Study of Income Dynamics (PSID) also uses a mixed-mode questionnaire during non-interview years; in 2014, a web data collection was used to collect information about experiences that the PSID sample member had as a child (McGonagle, Freedman, and Griffin 2017), and in 2016 a web followed by mail survey was used to collect information on topics including well-being, personality traits, and literacy and numeracy skills (Freedman 2017). Understanding Society, the UK Household Longitudinal Study, used a face-to-face recruitment during the first wave of the survey, with primarily face-to-face interviews through wave six. In Wave 7, households that had not responded in at least two prior waves were offered the opportunity to complete a web survey, with face-to-face follow-up. In Wave 8, the web-first group was expanded to 40% of the sample during the survey production year (Bianchi, Biffignandi, and Lynn 2017; Carpenter 2018). The Canadian Labour Force Survey (LFS) uses a face-to-face or telephone recruitment for the first wave of data collection. Starting in 2015, the Canadian LFS began offering a web survey for data collection in the second through sixth months of data collection (Francis and Laflamme 2015).
Other longitudinal surveys start with self-administered modes and use more expensive interviewer-administered modes for nonresponse follow-up or in an attempt to tailor to a respondent’s reported preferences. For instance, the High School Longitudinal Study of 2009 collected self-administered electronic questionnaires from ninth grade students using an in-school administration, sequential web to telephone questionnaires from parents of those students (with mailed shortened questionnaires for nonresponse follow-up), and concurrent telephone or web questionnaires for teachers, school administrators, and school counselors (Ingels, et al. 2011). The 2017 National Survey of College Graduates (NSCG), collected by the US Census Bureau, is a longitudinal survey of adults holding at least a bachelor’s degree, sampled from the American Community Survey (SESTAT 2018; OMB 2017). Newly sampled persons are initially contacted via mail to complete a web survey, and nonrespondents are followed up first with a mail questionnaire and then a telephone interview. All longitudinal cases are provided information about completing the interview via web; some longitudinal cases are also provided information about completing the questionnaire via mail or telephone, with nonrespondents followed up with both web login and mailed information. Similarly, Monitoring the Future (MTF) is examining the use of mail recruitment to a web survey and email recruitment to a web survey for the longitudinal follow-up of survey respondents who initially complete a survey in the classroom (Patrick, et al. 2018).
6.4 Adaptive/Responsive Designs
Adaptive and responsive designs (Groves and Heeringa 2006) can be used to attempt to reduce nonresponse bias or survey costs by deliberately tailoring data collection methods to the “optimal” method for individual sampled cases or groups of cases. The goal of an adaptive or responsive design is to maximize response rates, reduce costs, and make it more likely that the sample will adequately represent the target population. In practice, a common approach to adaptive design is to specify, up front, differential data collection strategies targeted to particular subsamples to gain cooperation. Self-administered and mixed modes of data collection are among the strategies that are considered in adaptive and responsive designs.
Surveys that are transitioning to mixed-mode or self-administered surveys from telephone surveys can plan to use adaptive mixed-mode strategies before data collection begins.
Planning for active monitoring of multiple metrics during data collection is important when transitioning to self-administered or mixed modes. This planning is especially important when using experiments to identify the “best” mode or combination and sequencing of modes going forward, when there are no prior data with which to conduct initial analyses of the impact of these decisions. For instance, one might tailor mixed-mode strategies to certain subgroups based on information on the sampling frame, even without existing data about the potential benefit of these decisions. The 2015 National Census Test used a mixed-mode design with various mail strategies. The majority of the strategies started with mailed letters containing a URL and login information for accessing the web questionnaire, followed by mailed questionnaires to nonrespondents (a web-push design). However, in areas with low internet penetration (Phelan 2016), identified from geographic information on the Census Master Address File, sampled addresses were offered a choice between mail and web from the initial mailing.
In an example of a survey that transitioned from face-to-face to a self-administered mixed-mode format, Murphy, Biemer, and Berry (2018) used adaptive and responsive design approaches to monitor data collection during a mixed-mode experiment for the RECS pilot. A number of nonresponse-related metrics were identified as important to monitor for each mode condition, including completion rates, the percent of completed questionnaires submitted via the web, metrics of relative cost, and key estimates (e.g., housing unit type and current heating fuel) compared against national benchmarks. Because the survey had preidentified these metrics and actively monitored them during the field period, the metrics could be used to compare the yield from the self-administered modes to the face-to-face administered main RECS survey. When the face-to-face RECS fell behind the self-administered modes, nonresponse follow-up for the face-to-face main RECS study was conducted using the self-administered mixed modes from the pilot study. A similar approach was used for monitoring the adaptive design in the mixed-mode National Longitudinal Study of Adolescent to Adult Health (Add Health) Wave V data collection (Murphy, et al. 2019).
Surveys transitioning from telephone to self-administered or mixed modes can also use adaptive measures during data collection to attempt to “optimize” the use of different modes. In particular, limited data collection resources, such as more expensive interviewer-administered modes or other tools like incentives, can be directed to the cases where they are most likely needed. For repeated cross-sectional surveys or longitudinal surveys that have previously had a mix of modes in data collection but want to increase the proportion of self-administered modes, statistical analysis and simulation using the existing data can be used to plan interventions in a new round of data collection. For example, Coffey and colleagues (2013, 2015; Finamore, et al. 2015) used interventions in the National Survey of College Graduates (NSCG), a sequential mixed-mode survey, to improve the representativeness of the sample. The potential interventions were preidentified through simulating the impact of different decisions on different metrics (e.g., R-indicators, costs) in a previous round of data collection (Coffey, et al. 2013). In the 2013 NSCG, the adaptive design deliberately changed the mix of modes during the field period to improve representation of the sample and control costs. In particular, telephone follow-up was increased for cases who were under-represented during data collection, including Black and Hispanic sampled cases with a bachelor’s degree, while telephone follow-up was reduced or eliminated and web follow-up increased for cases that were over-represented (Whites with a bachelor’s degree). This strategy resulted in a more representative sample, without increasing costs or negatively affecting response rates.
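As an illustration of one balance metric mentioned above, a sample-based R-indicator can be computed from estimated response propensities (values near 1 indicate a more balanced respondent pool). The sketch below is a minimal version assuming a hypothetical frame file with categorical covariates and a 0/1 response indicator; the file and column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical frame file for all sampled cases, with frame covariates and a
# 0/1 response indicator (file and column names are assumptions).
frame = pd.read_csv("sample_frame.csv")
X = pd.get_dummies(frame[["age_group", "region"]], drop_first=True)
y = frame["responded"]

# Estimate response propensities from covariates available for every sampled case.
propensities = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

# Sample-based R-indicator (Schouten et al.): 1 minus twice the standard
# deviation of the estimated propensities; values near 1 indicate balance.
r_indicator = 1 - 2 * np.std(propensities, ddof=1)
print(f"R-indicator: {r_indicator:.3f}")
```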
The Dutch Labor Force Survey also used an adaptive design to sequence the use of web, telephone, and face-to-face modes of data collection, as well as the use of additional call attempts in the face-to-face mode (Schouten, Peytchev, and Wagner 2018). Through use of a rich frame (the Dutch Population Register and the Dutch Tax Board Register), modes and the sequence of modes could be optimized across subgroups of the population to make the most efficient use of these modes while controlling costs and reducing potential nonresponse bias.
6.5 Designing Contact Attempts
When switching from an interviewer-administered to a mixed-mode survey, the entire recruitment protocol must change. Rather than call attempts administered through a telephone call scheduler, recruitment in self-administered modes and in mixed-mode surveys with self-administered components comes via mailings sent to a household or, for select studies, emails sent to the sampled individual. If there are interviewer-administered components, these may be attempted later in the data collection field period to reduce costs.
Table 6.2 contains a summary of the number and type of contact attempts used across a non-exhaustive list of surveys that have transitioned from telephone to self-administered or mixed modes without a screener, excluding surveys that used exclusively telephone to recruit sampled individuals to a survey and then offered other modes of data collection during the recruitment.
There is no single contact protocol used for self-administered or mixed-mode surveys. The number of contact attempts (sent primarily via mail; rarely via telephone, email, or text message) ranged from 1 to 14. About half of the studies in Table 6.2 included an advance letter or advance postcard; the rest included a questionnaire or login information in the first mailing. Follow-up materials tended to include at least one reminder postcard in almost all of the studies, at least one replacement paper questionnaire in studies that used a mail questionnaire, and at least one letter after the initial mailing containing the survey URL and login information in studies that used a web questionnaire. Studies that switched modes in a sequential mixed-mode web/mail design tended to do so at the second or third contact attempt to the household. Those that switched at the third contact attempt did so after sending a reminder postcard asking the household to complete the survey in the initial mode.
Table 6.2. Number and Type of Contact Attempts in Example Surveys That Have Transitioned from Telephone to Self-Administered or Mixed Mode
| Survey Name | # contact attempts | Type of contact attempts | Source |
| --- | --- | --- | --- |
| CAHPS Hospice Survey | 2 | Mail: (1) Invitation letter and questionnaire; (2) Paper questionnaire. Mail/Telephone: (1) Invitation letter and questionnaire; (2) Telephone follow-up (5 attempts) | Parast, et al. (2018) |
| 2005 Behavioral Risk Factor Surveillance System pilot | 3 | (1) Invitation letter and questionnaire; (2) Postcard reminder; (3) Replacement questionnaire | Battaglia, et al. (2008); Link, et al. (2008) |
| Coastal Household Telephone Survey | 3 | (1) Invitation letter and questionnaire; (2) Postcard reminder / telephone call reminders; (3) Replacement questionnaire | Brick, Andrews, and Mathiowetz (2016) |
| German Health Update 2.0 (GEDA) pilot study | 3 | Concurrent web/mail/CATI: (1) Invitation letter with questionnaire, URL and log-in information, plus CATI survey return form; (2) Letter with URL and login information; (3) Reminder letter with URL and login information. Sequential web/mail/CATI: (1) Invitation letter with URL and login information; (2) Paper questionnaire and URL information; (3) Reminder letter with URL and login code and CATI survey form | Mauz, et al. (2018); Hoebel, et al. (2014) |
| Gallup-Sharecare Well-Being Index | 3 | Mail only: (1) Invitation letter and questionnaire; (2) Postcard reminder; (3) Postcard reminder. Concurrent web/mail: (1) Invitation letter with questionnaire, URL, and log-in information; (2) Letter with URL and login information; (3) Postcard reminder. Sequential web/mail: (1) Invitation letter with URL and login information; (2) Paper questionnaire and URL; (3) Postcard reminder. Sequential mail/web: (1) Invitation letter and questionnaire; (2) Letter with URL and login information; (3) Postcard reminder | Marken, Auter, and Marlar (2018) |
| Canada National Travel Survey pilot | 3 | (1) Invitation letter with URL and login information; (2) Letter with URL and login information; (3) Reminder letter with URL and login information | Bosa, Gagnon, and Caron (2017) |
| 2015 New York Adult Tobacco Survey | 4 | Web/mail sequential: (1) Advance letter; (2) Letter with URL and login information; (3) Reminder postcard with URL; (4) Paper questionnaire and URL. Mail/web sequential: (1) Advance letter; (2) Paper questionnaire; (3) Reminder postcard; (4) Paper questionnaire and URL | Brown, et al. (2018) |
| Dutch Crime Victimization Survey | 4 | Mail to F2F: (1) Invitation letter and questionnaire; (2) Paper questionnaire; (3) Replacement questionnaire; (4) Face-to-face attempt. Web to F2F: (1) Invitation letter with URL and login information; (2) Letter with URL and login information; (3) Reminder letter with URL and login information; (4) Face-to-face attempt | Klausch, Hox, and Schouten (2015) |
| American Crime Victimization Survey Field Test | 4 | (1) Invitation letter and questionnaire; (2) Postcard reminder; (3) Reminder letter and questionnaire; (4) Replacement questionnaire | Williams, Edwards, Giambo, and Kena (2018) |
| 2018 California Health Interview Survey pilot | 4 | (1) Invitation letter with URL and login information; (2) Postcard reminder with URL and login information; (3) Reminder letter with URL and login information; (4) Telephone follow-up | Wells, et al. (2018) |
| 2006-2014 ODOT surveys | 5 | Mail: (1) Advance letter; (2) Paper questionnaire; (3) Postcard reminder; (4) Replacement questionnaire; (5) Replacement questionnaire. Web/mail sequential: (1) Advance letter; (2) Letter with URL and login information; (3) Reminder postcard with URL; (4) Replacement questionnaire; (5) URL letter and questionnaire | Lesser, et al. (2016) |
| 2007 Health Information National Trends Survey | 5 | (1) Advance letter; (2) Paper questionnaire; (3) Postcard reminder; (4) Replacement questionnaire; (5) IVR experiment | Cantor, et al. (2009) |
| Survey of Consumer Attitudes | 5 | Mail: (1) Advance letter; (2) Paper questionnaire; (3) Postcard reminder; (4) Replacement questionnaire; (5) Postcard reminder. Mail/web concurrent: (1) Advance letter; (2) Paper questionnaire and URL information; (3) Postcard reminder; (4) Paper questionnaire and URL; (5) Postcard reminder. Web/mail sequential: (1) Advance letter; (2) Letter with URL and login information; (3) Reminder postcard with URL; (4) Reminder postcard with URL; (5) URL letter and questionnaire | Elkasabi, et al. (2014); Survey of Consumers (2012) |
| National Immunization Survey | 5 | (1) Advance postcard; (2) Letter with URL and login information; (3) Reminder postcard with URL; (4) Reminder letter with URL and login information; (5) Reminder postcard with URL and login information | Skalland, et al. (2017) |
| 2015 Residential Energy Consumption Survey National Pilot study | 7 | (1) Advance postcard; (2) Letter with URL and login information; (3) Reminder postcard; (4) Paper questionnaire and URL; (5) Postcard reminder; (6) Reminder letter; (7) Short questionnaire | Biemer, et al. (2018) |
| National Survey of College Graduates | 14 | (1) Advance letter; (2) Letter with URL and login information; (3) Reminder postcard; (4) Reminder letter with URL and login information; (5) Reminder email; (6) Reminder letter with URL and paper questionnaire; (7) Reminder postcard; (8) Telephone reminder; (9) Reminder letter with telephone information; (10) Telephone calls; (11) Reminder letter with URL; (12) Reminder letter with URL and paper questionnaire | Coffey (2016); National Academies of Sciences, Engineering, and Medicine (2018) |
In general, many of the mail-based protocols reflect the recommendations made by Dillman, Smyth, and Christian (2014, p. 373). These protocols either consist of 5 contact attempts, including an advance mailing, questionnaire, reminder (letter or postcard), replacement questionnaire, and final reminder, or 4 contact attempts of a full questionnaire packet, reminder (postcard or letter), replacement questionnaire, and final reminder. Surveys that start with a paper questionnaire generally use at least two or three mailings with a complete paper questionnaire. Surveys that include a web questionnaire generally include the URL and login information at each mailing after the advance letter (Bosa, Gagnon, and Caron 2017; American National Election Studies 2018; Mauz, et al. 2018). Thus, the surveys that include web as one of the modes generally include the full information for participating in the web survey (URL, login information) at more mailings than the mail surveys that include the paper questionnaire at only a subset of the mailings. This makes sense from a cost perspective (paper questionnaires are more expensive to print and mail) and potentially from an error perspective (web questionnaires require more effort for the sampled individual to login to and complete than the mail survey).
In mixed-mode surveys that combine both web and mail sequentially, nonresponding households receive the paper questionnaire in the second (Marken, Auter, and Marlar 2018; Mauz, et al. 2018), third (Elkasabi, et al. 2014; Han, et al. 2010; Biemer, et al. 2017; Ghandour, et al. 2018), or fourth mailing (Lesser, et al. 2016; Brown, et al. 2018) to the household. In a second type of design, a mailed recruitment letter to a web survey is sent to the household with nonrespondents followed up with an interviewer-administered mode (Klausch, Hox and Schouten 2015; Federal Highway Administration and Westat 2018; Wells, et al. 2018).
Table 6.3 contains a list of surveys that include a screener; these surveys use up to 9 contact attempts when combining the total number of mailings for both the screener and topical surveys.
Surveys that include a screening questionnaire separate from the topical or main questionnaire have very similar mailing protocols. Almost all use either 4 or 5 mailings at each stage, and most request a response (e.g., include a screener questionnaire, URL and login information, or a form for a telephone number) in the initial mailing rather than starting with an advance letter.
Table 6.3. Number and Type of Contact Attempts in Example Surveys with a Screener That Have Transitioned from Telephone to Self-Administered or Mixed Modes
| Survey Name | # of contact attempts | Type of contact attempts | Source |
| --- | --- | --- | --- |
| Wisconsin Family Health Survey | Screener: 3; Topical: not specified | Screener: (1) Invitation letter and form requesting telephone number; (2) Reminder postcard; (3) Cover letter and form requesting telephone number. Topical: (1) Telephone call to addresses that returned the form with a telephone number | Allison, Stevenson, and Kniss (2014) |
| National Household Education Survey: 2009 Pilot Study | Screener: 4; Topical: 5 | Screener: (1) Screener questionnaire and letter; (2) Reminder postcard; (3) Reminder screener questionnaire and letter or telephone reminder; (4) Reminder screener questionnaire and letter or telephone reminder. Topical: (1) Topical questionnaire and letter; (2) Reminder postcard; (3) Reminder topical questionnaire and letter; (4) Reminder topical questionnaire and letter; (5) Telephone follow-up | Brick, Williams, and Montaquila (2011) |
| National Survey of Veterans | Screener: 4; Topical: 4 | Screener: (1) Advance letter; (2) Screener survey and letter; (3) Postcard reminder; (4) Reminder survey. Topical (web): (1) Letter with URL and login information; (2) Reminder postcard; (3) Paper questionnaire and letter with URL; (4) Paper questionnaire and letter with URL and telephone call-in information. Topical (mail): (1) Paper questionnaire and letter; (2) Reminder postcard; (3) Paper questionnaire; (4) Paper questionnaire and telephone call-in information | Han, et al. (2010) |
| National Household Education Survey: 2011 Field Test | Screener: 4; Topical: 4 | Screener: (1) Screener questionnaire and letter; (2) Reminder postcard; (3) Reminder screener questionnaire and letter; (4) Reminder screener questionnaire and letter. Topical: (1) Topical questionnaire and letter; (2) Reminder postcard; (3) Reminder topical questionnaire and letter; (4) Reminder topical questionnaire and letter | Montaquila, et al. (2013) |
| 2013 California Health Interview Survey ABS pilot | Screener: 4; Topical: not specified | Screener: (1) Invitation letter and form requesting telephone number; (2) Reminder postcard; (3) Cover letter and form requesting telephone number; (4) Cover letter and form requesting telephone number. Topical: (1) Telephone call to addresses that returned the telephone number request and those that matched directory listings | Jans, et al. (2013); California Health Interview Survey (2016) |
| 2017 National Household Travel Survey | Screener: 4; Topical: 5; Telephone: 7 days | Screener: (1) Invitation letter and paper questionnaire; (2) Reminder postcard; (3) Letter and paper questionnaire; (4) Reminder postcard with URL and PIN for online completion. Topical (web): (1) Letter with URL and login information and paper travel log; (2) Email/text/IVR reminders before the travel day; (3) Up to 3 email/text reminders after the travel day; (4) Switch to phone if a phone number was provided. Topical (phone): interviewers attempted calls for 7 days after the assigned travel day | Federal Highway Administration and Westat (2018) |
| 2016 National Survey of Children’s Health | Screener: 5; Topical: 4 | Screener: (1) Invitation letter with URL and login information; (2) Reminder letter with URL and login information; (3) Reminder letter with URL and login information (low-web cases received a paper screener); (4) Reminder letter with paper screener; (5) Reminder letter with paper screener. Topical (web nonrespondents): (1) Paper questionnaire; (2) Follow-up paper questionnaire; (3) Follow-up paper questionnaire; (4) Follow-up paper questionnaire | Ghandour, et al. (2018) |
6.6 Incentives
A ubiquitous finding across surveys and across modes of data collection is that prepaid incentives raise response rates compared to no incentives, and that prepaid incentives are more successful at encouraging response than promised incentives (Singer and Ye, 2013; Mercer, et al. 2015). As such, surveys that transition from telephone to self-administered or mixed modes often use incentives as part of the recruitment protocol. Most of the studies reported on in our survey of organizations that have transitioned use incentives (20 of 24 answering). About half of respondents reported that the level of incentives used in each mode did not change, other than to account for inflation as time passed. In general, the changes reported were modest in size, though one respondent reported that the shift to internet data collection brought enough cost savings to offer gift cards when no budget had previously been available for incentives. One organization shifting from RDD to web reported a shift in incentives from $5 for cellphone respondents to variable incentives for all respondents ranging from $5 to $20. Another is offering a bonus incentive for respondents who voluntarily shift from paper to web. Both pre-paid and promised incentives were reported.
In mailed invitations to a mail or web survey, prepaid incentives are highly effective in increasing participation. Table 6.4 contains an overview of incentives that have been offered in surveys that have transitioned to self-administered or mixed modes of data collection. Many of these studies included experimental comparisons of incentive levels against a $0 condition (the $0 conditions are excluded from the table); surveys that transitioned but did not report an incentive level are also excluded from the table.
Table 6.4: Summary of monetary incentive levels and example studies using the incentive amount
| Incentive Amount | Example Studies |
| --- | --- |
| Prepaid | |
| Amount not reported | Brick, Andrews, and Mathiowetz (2016); Breton, et al. (2017) |
| $1 | Skalland, et al. (2017); Andrews, Brick, and Mathiowetz (2013); Williams, Edwards, Giambo, and Kena (2018) |
| $2 | Brick, Williams, and Montaquila (2011); Cantor, et al. (2009); Montaquila, et al. (2013); Allison, Stevenson, and Kniss (2014); Jans, et al. (2013); Ghandour, et al. (2018); Federal Highway Administration and Westat (2018); Wells, et al. (2018); Jackson, McPhee, and Lavrakas (2019); Williams, Edwards, Giambo, and Kena (2018) |
| $5 | Montaquila, et al. (2013); Elkasabi, et al. (2014); Murphy, Harter, and Xia (2010); LeClere, et al. (2012); Ghandour, et al. (2018); Federal Highway Administration and Westat (2018); Amaya, et al. (2015); Brown, et al. (2018) |
| $10 | Jackson, McPhee, and Lavrakas (2019) |
| $20 | American National Election Studies (2018) |
| $30 | National Academies of Sciences, Engineering and Medicine (2018) |
| Promised | |
| $5 | Cantor, et al. (2005); Brick, Williams, and Montaquila (2011); Montaquila, et al. (2013) |
| $10 | Biemer, et al. (2017); Montaquila, et al. (2013) |
| $15 | Cantor, et al. (2005); Brick, Williams, and Montaquila (2011); Montaquila, et al. (2013) |
| $20 | Allison, Stevenson, and Kniss (2014); Biemer, et al. (2017); Montaquila, et al. (2013); Federal Highway Administration and Westat (2018) |
| >$20 | American National Election Studies (2015); American National Election Studies (2018); Harris (2019) |
Looking across surveys, prepaid incentives of $2 and $5 are common. Promised incentives are less commonly used, but when used, tend to be larger in value than prepaid incentives. In mixed-mode surveys, a combination of prepaid and promised incentives can be effective in pushing respondents to a new mode. For instance, the proportion of respondents who complete via a web instrument in a web+mail survey can be increased when a small prepaid incentive is followed by a larger promised incentive paid to those who respond by web (Biemer et al. 2018).
One concern often voiced for federal surveys that are transitioning from telephone to self-administered modes is potential restrictions from the Office of Management and Budget (OMB) on the use of incentives. OMB has allowed the use of incentives in Federal data collections, albeit on a limited basis. Guideline 2.3.1 of the Standards and Guidelines for Statistical Surveys (OMB 2006) notes, “Although incentives are not typically used in Federal surveys, agencies may consider use of respondent incentives if they believe incentives would be necessary to use for a particular survey in order to achieve data of sufficient quality for their intended use(s).” Typically, requests for incentive use must be approved based on specific justification for their use as part of the overall survey docket, ideally with explicit plans for evaluation of their effectiveness. As such, incentive experiments are often a part of federal surveys that have transitioned from interviewer-administered to self-administered modes, with amounts as represented in Table 6.4 above and a control condition of $0 for comparison.
In addition to the question of whether to use prepaid or promised monetary incentives, some studies that have transitioned to a self-administered or mixed mode have examined (1) how to distribute monetary incentives, and (2) whether to use a non-monetary incentive. Monetary incentives can be distributed via debit cards, plastic or electronic gift cards, cash, or checks. Debit cards and checks incur costs only when they are cashed, which can result in significant cost savings to the survey organization. For example, in the 2015 mixed-mode NSCG, only 35% of recipients used the debit card (Vasquez 2019). Similarly, in the mail component of Phase III of the Agricultural Resource Management Survey, prepaid $20 ATM cards were cashed by 39% of recipients (Beckler, Ott, and Horvath 2005). With self-administered and mixed-mode surveys, some types of incentives may be more appropriate for specific modes. For example, electronically delivered incentives may make sense for web surveys where sample members are only contacted via email, whereas incentives such as cash, debit cards, or checks may be more appropriate for those who are contacted via mailed letters.
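To illustrate the cost implications of redemption-based payment methods, the sketch below computes expected incentive outlays under different redemption assumptions. The 35% and 39% redemption rates come from the NSCG and ARMS examples above; the mailout size, the $5 comparison, and the function itself are hypothetical and ignore card production and processing fees.

```python
# Rough cost comparison: cash is effectively always "redeemed," while debit/ATM
# cards and checks incur cost only when cashed. The 0.35 and 0.39 redemption
# rates are taken from the NSCG and ARMS examples cited above; the mailout size
# is hypothetical, and card production/processing fees are ignored.
def expected_incentive_cost(n_mailed, amount, redemption_rate=1.0):
    """Expected outlay when only redeemed incentives are actually paid."""
    return n_mailed * amount * redemption_rate

n = 50_000  # hypothetical number of prepaid incentives mailed
print(expected_incentive_cost(n, 5.00))         # cash:          $250,000
print(expected_incentive_cost(n, 5.00, 0.35))   # debit card:     $87,500
print(expected_incentive_cost(n, 20.00, 0.39))  # $20 ATM card:  $390,000
```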
Non-monetary incentives can also be delivered in a mixed-mode survey, although with more limited effectiveness. For example, the web and mail 2018 National Sample Survey of Registered Nurses used lanyards and pens, incentives that were thought to be salient to the target population (US Census Bureau 2017, FR-2017-13292). The 2014 NHES Feasibility Test included a Department of Education magnet in the screener questionnaire; including this non-monetary incentive had no statistically significant effect on response rates or eligibility rates (McQuiggan, et al. 2015).
Survey practitioners also need to decide which sample cases will receive an incentive. One option is to target incentives or to provide differential incentives to groups that are least likely to respond or are demographically different. For example, Jackson, McPhee, and Lavrakas (2019) used a tailored incentive in the sequential mixed-mode web and mail 2016 NHES, targeting higher incentive levels to those estimated to have lower response propensity. This targeted incentive lowered response rates relative to a uniform incentive for all respondents and did not improve the representativeness of the survey relative to population benchmarks. In contrast, the mixed-mode 2015 NSCG used targeted incentives for sample members who were less likely to participate but would contribute substantially to the final estimates because of their large weights (National Academies of Sciences, Engineering and Medicine 2018). This targeted incentive (and other targeted interventions) improved representativeness of the final respondent pool (Thieme and Reist 2017). More work is needed in self-administered and mixed-mode surveys on optimal allocation of resources for incentives.
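As a rough illustration of how such targeting might be operationalized, the sketch below fits a response-propensity model on prior-round data and assigns a larger prepaid incentive to low-propensity cases. The covariates, cutoff, and dollar amounts are illustrative assumptions, not the procedures used in the NHES or NSCG.

```python
# Illustrative propensity-targeted incentive assignment (not the NHES or NSCG
# procedure): fit a response-propensity model on frame/auxiliary data from a
# prior round, then offer a larger prepaid incentive to low-propensity cases.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical prior-round data: frame covariates plus observed response (0/1).
prior = pd.DataFrame({
    "urban": [1, 1, 0, 0, 1, 0, 1, 0],
    "flag_matched_phone": [1, 0, 1, 0, 0, 1, 1, 0],
    "responded": [1, 0, 1, 0, 0, 1, 1, 0],
})
model = LogisticRegression().fit(
    prior[["urban", "flag_matched_phone"]], prior["responded"]
)

# New sample: predict propensity, then assign an incentive tier.
new_sample = pd.DataFrame({"urban": [1, 0, 0], "flag_matched_phone": [0, 1, 0]})
new_sample["propensity"] = model.predict_proba(new_sample)[:, 1]
new_sample["incentive"] = new_sample["propensity"].apply(lambda p: 10 if p < 0.5 else 2)
print(new_sample)
```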
6.7 Tracking Contacts in All Modes
One challenge in implementing a mixed-mode survey design is keeping track of the contact attempts, via recorded paradata, for each sampled case in each mode. When using a mixed-mode survey, especially with an adaptive design strategy in which interventions occur, it is important to consider what kinds of measures will be used before, during, and after data collection to effectively evaluate response rates, representativeness, and data quality. Some surveys may simply want to keep track of what was done to contact and gain cooperation from each sampled unit, and plan for analysis of the data after the data collection period is over. Other surveys may want to produce estimates of interest and quality measures regularly during data collection to help with data monitoring and to measure the impact of interventions. As such, having data collection systems that effectively track what contacts cases have received and ensure interventions are properly employed is critically important. Having systems that talk to each other across multiple modes and also permit real-time analysis of data collection may be challenging or require significant infrastructure development at survey organizations.
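One building block for such tracking is a single, mode-agnostic contact-history log. The sketch below shows one way such a record might be structured; the field names and codes are illustrative rather than drawn from any particular survey organization’s system.

```python
# Minimal sketch of a mode-agnostic contact-history log. Field names and codes
# are illustrative; a production system would add cost, batch, and staff fields.
from dataclasses import dataclass
from datetime import date

@dataclass
class ContactAttempt:
    case_id: str
    attempt_no: int
    mode: str          # e.g., "mail", "web_email", "sms", "telephone"
    contact_type: str  # e.g., "invitation", "reminder", "questionnaire", "follow-up"
    sent_date: date
    outcome: str       # e.g., "no_return", "undeliverable", "partial", "complete"

log = [
    ContactAttempt("A001", 1, "mail", "invitation", date(2019, 3, 1), "no_return"),
    ContactAttempt("A001", 2, "mail", "reminder", date(2019, 3, 10), "no_return"),
    ContactAttempt("A001", 3, "telephone", "follow-up", date(2019, 3, 25), "complete"),
]

# A single case-level log across modes lets managers see, for any case,
# what has been sent, in which mode, and with what result.
attempts_by_case = {}
for a in log:
    attempts_by_case.setdefault(a.case_id, []).append(a)
print(len(attempts_by_case["A001"]))  # 3 attempts recorded for case A001
```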
Challenges in managing mixed-mode data collection systems may be especially acute for smaller survey organizations. Smaller organizations may use off-the-shelf web or telephone survey software systems that do not easily permit managing the number and types of mailings and contact attempts across modes, especially for non-computerized modes (e.g., mail). Additionally, off-the-shelf software systems are often limited in the number and types of analyses that can be run regularly.
As such, survey organizations may manage and evaluate the mailings and web-based contacts in separate files, using Excel, SPSS, SAS, or other spreadsheet or statistical programs for analysis and reporting. For instance, Murphy, Biemer, and Berry (2018) report using SAS and Excel to create daily reports for monitoring the RECS’s mixed-mode data collection. These reports include response rates plotted over the course of data collection and other field metrics, permitting evaluation of the effectiveness of different data collection strategies (see also Kreuter and Olson 2013). Although there are few available examples of using off-the-shelf software for mixed-mode surveys, lessons may be drawn from interviewer-administered surveys. For example, Kirgis and Lepkowski (2013) report using SAS and Excel to create visualizations for monitoring the in-person National Survey of Family Growth, and Jans, Sirkis, and Morgan (2013) use SAS to create quality-control charts for the National Health Interview Survey.
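The sketch below shows the kind of daily metric such reports contain, computed here with Python/pandas rather than the SAS and Excel workflows cited above; the case-level file layout and column names are assumed for illustration.

```python
# Sketch of a daily monitoring metric. Assumes a case-level file with one row
# per sampled case, a completion date (missing if not yet complete), and the
# mode of completion; the data below are hypothetical.
import pandas as pd

cases = pd.DataFrame({
    "case_id": range(1, 9),
    "complete_date": pd.to_datetime(
        [None, "2019-03-02", "2019-03-02", None, "2019-03-05", None, "2019-03-06", "2019-03-06"]
    ),
    "complete_mode": [None, "web", "mail", None, "web", None, "telephone", "web"],
})

n_sampled = len(cases)
completes = cases.dropna(subset=["complete_date"])

# Cumulative (unweighted) response rate by day: completes / sampled cases.
daily_rr = completes.groupby("complete_date").size().cumsum() / n_sampled
print(daily_rr)

# Completes by mode, for monitoring mode uptake during the field period.
print(completes["complete_mode"].value_counts())
```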
Alternatively, some organizations may build in-house mixed-mode data collection systems, requiring substantial commitment of resources, planning, and extensive use of field managers, researchers, and IT professionals. For example, the NSCG required restructuring the entire paradata system to jointly manage and monitor mail, web, and telephone contact attempts, rather than manage each mode separately (Reist 2014). Other large survey organizations developed in-house sample and data management systems that track movement of cases through different modes and types of contact attempts, each requiring multiple years of planning and integration (Cheung and Maher 2015, Wernimont and Snowden 2015, Edwards, Maitland, and Connor 2017, Bonhomme 2018). Krzyzanowski, Qin, Robinson, and Sikes (2018) report on building extensive custom overlays to an existing off-the-shelf software system to manage a web and face-to-face mixed-mode survey. More research is needed on how to “best” design these systems, or how to integrate analyses with the field management systems (Schouten, Peytchev, and Wagner 2018, p. 103).
6.8 Sample Composition
One important question when switching to a self-administered or mixed-mode data collection is how well these data collection efforts reflect the characteristics of the target population. Telephone surveys systematically miss certain individuals who cannot be contacted, refuse to participate, or do not speak the language of the survey. Transitioning to a mail or web survey adds other potential causes of nonresponse, including low literacy, lack of internet access, and low familiarity with computers (e.g., Brick, Williams, and Montaquila 2011). Our goal in this section is not to review all of the literature evaluating nonresponse bias in self-administered or mixed-mode surveys. Rather, we look across a set of surveys we have identified that have transitioned from telephone to self-administered and mixed modes. These surveys are inconsistent in whether and how nonresponse bias is evaluated, and in many instances, we lack information about nonresponse bias on these estimates prior to the transition to a self-administered mode. As such, we identify trends across these studies in the demographic characteristics of who is over- or underrepresented in the self-administered or mixed-mode surveys, but cannot easily comment on whether these biases are better or worse than those for the same surveys conducted by telephone.
We focus first on demographic variables commonly used in weighting schemes – age, sex, and race. First, mail surveys of the general population tend to underrepresent younger adults and overrepresent older adults (Battaglia, et al. 2008; Han, et al. 2010; Klausch, Hox and Schouten 2015; Lesser, et al. 2016; NHES 2016; Mauz, et al. 2018), similar to recent unweighted telephone surveys (Keeter, et al. 2017). The degree to which age distributions in the respondent pool differ from those of the population varies somewhat across modes, with some (but not all) web or web-with-mail-follow-up surveys yielding a higher proportion of younger adults or a more representative age distribution (Klausch, Hox and Schouten 2015; Biemer, et al. 2017; Marken, et al. 2018; Wells, et al. 2018). Second, there are few consistent patterns across studies in whether men or women are more likely to participate in certain self-administered modes of data collection. In some studies, men are overrepresented in self-administered modes (Han, et al. 2010; Lesser, et al. 2016; Winneg, Ben-Porath, and Jamieson 2017; McPhee, et al. 2018); in other studies, women are overrepresented (DeBell, et al. 2017; Breidt, et al. 2018); and in still others, there is no difference (Klausch, Hox and Schouten 2015). Third, racial/ethnic minorities are underrepresented in self-administered and mixed-mode surveys, whether looking at the race/ethnicity of the respondent (Battaglia, et al. 2008; Link, et al. 2008; Han, et al. 2010; Brick, Williams, and Montaquila 2011; Kali and Flores-Cervantes 2016; DeBell, et al. 2017; Winneg, Ben-Porath and Jamieson 2017; Breidt, et al. 2018; Wells, et al. 2018) or at areas with higher proportions of racial/ethnic minorities (Cantor, et al. 2005; Cantor, et al. 2009; McPhee, et al. 2018). This is similar to the underrepresentation of minorities in telephone surveys (Link, et al. 2008; Keeter, et al. 2017).
Next, we examine the socioeconomic variables of education and income. More highly educated individuals are systematically overrepresented in self-administered and mixed-mode surveys (Battaglia, et al. 2008; Link, et al. 2008; Brick, Williams, and Montaquila 2011; Lesser, et al. 2016; DeBell, et al. 2016; NHES 2016; Marken, et al. 2018; but see Wells, et al. 2018; Breidt, et al. 2018), similar to telephone surveys (Link, et al. 2008; Keeter, et al. 2017).
The representation across income levels is more variable, partly because of substantial variation in how income is operationalized across studies. Some surveys show a higher proportion of higher-income households participating in a mail or web survey than in the population (Link et al. 2008; Lesser, et al. 2016; Marken, et al. 2018; McPhee, et al. 2018; Wells, et al. 2018), others show greater representation of lower-income households (Brick, Williams, and Montaquila 2011; Amaya, et al. 2015; Kali and Flores-Cervantes 2016; Breidt, et al. 2016), and still others simply show discrepancies in the income distribution (Klausch, Hox, and Schouten 2015; DeBell, et al. 2017; Biemer, et al. 2017). Education levels may also affect web uptake rates (e.g., Lesser, et al. 2016; Steele, et al. 2016). For example, in general population surveys such as the American Community Survey and the Decennial Census Tests, the proportion of the sample that responds via the web is typically between 30 and 40 percent (Baumgardner 2018; Bentley 2019), compared with approximately 80 percent in surveys of a more highly educated population (e.g., the National Survey of College Graduates) (Finamore 2019).
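The comparisons summarized above generally take the form of unweighted respondent distributions set against population benchmarks such as the ACS. A minimal sketch of that kind of comparison is shown below; all of the percentages are hypothetical.

```python
# Sketch of a simple benchmark comparison of the kind summarized above:
# unweighted respondent percentages versus population benchmarks (e.g., ACS).
# All figures below are hypothetical.
import pandas as pd

comparison = pd.DataFrame({
    "category": ["18-34", "35-64", "65+", "College degree"],
    "respondents_pct": [18.0, 48.0, 34.0, 45.0],  # unweighted respondent distribution
    "benchmark_pct": [30.0, 50.0, 20.0, 32.0],    # population benchmark
})
comparison["difference"] = comparison["respondents_pct"] - comparison["benchmark_pct"]
print(comparison)  # positive = overrepresented, negative = underrepresented
```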
6.9 Unique Issues in Transitioning Surveys from Interviewer-Administered Modes to Self-Administered Modes
Collection of Additional Information. Sample surveys are increasingly collecting biomeasures (e.g., height, weight, saliva, blood), consent to link to administrative records, physical environmental samples (e.g., dust, air quality, soil), and geocoded measurements, in addition to asking survey questions. In our sample of surveys that transitioned to self-administered or mixed modes, 5 of 24 organizations reported collecting information in addition to survey data. These additional requests included blood samples, consent for linkage to administrative data, and geographical information. Chapter 4 discusses collecting biomeasures and consent to link in more detail. This is an area of self-administered and mixed-mode surveys that needs more research.
Nonresponse due to Language Difficulties. As noted in Chapter 3, surveys transitioning from telephone to self-administered or mixed-mode surveys should plan strategies for successful recruitment of non-English speakers and readers. Self-administered surveys systematically underrepresent racial and ethnic minorities, including those that speak languages other than English (e.g., Brick, et al. 2012; Wells, et al. 2018).
Surveys that have successfully recruited Spanish-speaking respondents have translated the survey materials into Spanish and included those Spanish-language materials (letters, questionnaires, other survey information) in mailings from the very first contact attempt (e.g., Brick, et al. 2012; Jans, et al. 2013; Amaya, et al. 2015; Blake, et al. 2016; Skalland, et al. 2017; Ghandour, et al. 2018; McPhee, et al. 2018). Web surveys that permit the respondent to “toggle” between English- and Spanish-language versions of the questionnaire also facilitate representation of Spanish speakers (e.g., Kennedy, et al. 2016; ANES 2018; Ghandour, et al. 2018). Offering a language-specific telephone line for non-English-speaking sample members has been less successful (e.g., Cantor, et al. 2009; Wells, et al. 2018).
Eligibility Rates. Even when the target population for a study does not change, the observed eligibility rates may differ across modes due to differential nonresponse or other error sources. This may lead to a different sample yield than was observed in the telephone survey and to different estimates of coverage and/or eligibility depending on the mode or combination of modes, and it may require different numbers of contact attempts depending on the mode selected for data collection. Early studies that transitioned from telephone to mail questionnaires had a goal of adequately covering the cell phone population. For example, Link, et al. (2008), in examining the potential for a mail-based BRFSS, found that 6.5% of the mail survey respondents self-reported being cell-only and 1% had no telephone at all, aligning well with benchmark estimates from the National Health Interview Survey of 6.7% and 1.7%, respectively.
Additionally, the proportion of “valid” elements on the sample frame may vary across modes, either due to quality of the sampling frame, differences in how materials are delivered to those sampled units, or differences in the time domain for estimating the potential eligibility rate. For example, the NSCH estimated that 11% of the addresses would be non-residential or undeliverable, but found that 16% of addresses were confirmed to be undeliverable or nonresidential, with another 5% estimated to be undeliverable or nonresidential (US Census Bureau 2018). In the NHES, McPhee and Zuckerberg (2018) report that about 10% of the addresses in a sample are designated as undeliverable as addressed at some point during the data collection period, but that about 15% of these potentially undeliverable addresses actually return a completed mail questionnaire.
In surveys that target particular subgroups, screening eligibility rates can differ across modes of data collection and sample frames. For example, the National Survey of Veterans found that a mail-based screening instrument yielded better coverage of the target population of veterans (59.6%) than a web-based screening instrument (46.5%), and that including an informative paper insert in the mail-based group increased the effective coverage rate even further (66.1%) (Han, et al. 2010). These differences could not be attributed solely to different response rates across the groups. The mail-based 2011 NHES found that 32% of households had eligible children, slightly under the 35% estimated from the American Community Survey (ACS) (Montaquila, et al. 2013). This eligibility rate is similar to the 2009 NHES pilot study, which found that 31% of households had children in eligible age ranges, compared to 35% in the ACS, but that children aged 1 year old and under were substantially undercovered (Brick, et al. 2011). When the NHES added a web component to the existing mail survey, eligibility rates and response rates differed across the modes of data collection (McPhee, et al. 2018). The Coastal Household Telephone Survey yielded around a 10% eligibility rate for general population households that engage in recreational saltwater fishing; a mail-based version used two frames – a license frame (included to screen more effectively) and an ABS sample, which was matched against the license frame. As anticipated, eligibility was highest in the fishing license frame (37.2%), followed by the ABS sample that could be matched to the license frame (21.9%) and the ABS sample that could not be matched to the license frame (6.6%) (Andrews, Brick, and Mathiowetz 2013). The mail survey yielded a much higher estimate of fishing prevalence than the telephone survey. Similarly, an evaluation of the National Survey of Fishing, Hunting, and Wildlife-Associated Recreation found much higher incidence rates of fishing, hunting, and wildlife-watching in a mail-based approach than in a face-to-face approach to the survey, a difference that could not be attributed solely to differences in screening decisions (Breidt, et al. 2018). More research is needed to understand how decisions made in each mode affect eligibility rates for surveys of different topics.
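Because deliverability, screener response, eligibility, and topical response combine multiplicatively, even modest differences across modes or frames change the required mailout size. The sketch below is a back-of-the-envelope version of that calculation; all of the rates and the target number of completes are hypothetical.

```python
# Back-of-the-envelope mailout sizing when eligibility and response rates
# differ by frame or mode. All rates and the target below are hypothetical.
import math

def addresses_needed(target_completes, deliverable_rate, screener_rr,
                     eligibility_rate, topical_rr):
    """Addresses to sample so expected eligible completes reach the target."""
    yield_per_address = deliverable_rate * screener_rr * eligibility_rate * topical_rr
    return math.ceil(target_completes / yield_per_address)

# e.g., 2,000 completes, 90% deliverable, 40% screener response,
# 35% of screened households eligible, 70% topical response among eligibles
print(addresses_needed(2000, 0.90, 0.40, 0.35, 0.70))  # about 22,676 addresses
```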
6.10 Summary and Takeaways
6.10.1 In surveys conducted from the early 2000s through about 2012, telephone survey response rates tended to be higher than, or at about the same level as, those of the self-administered or mixed-mode versions. After about 2013, the self-administered or mixed-mode surveys generally had response rates that exceeded the same survey’s telephone response rates. Recent experiments have tested single, concurrent, and sequential mixed-mode designs and found few differences in response rates.
6.10.2 The most common recruitment mode among surveys that have transitioned to self-administered or mixed modes is mail.
6.10.3 Telephone is still included as one of the modes of data collection, either as a primary data collection mode or a follow-up mode. Mail can be used to obtain telephone numbers for households where an address cannot be linked to a telephone number through a reverse directory telephone match, and as a follow-up mode for nonrespondents where a telephone number can be linked to a sampled address.
6.10.4 Mixed-mode studies commonly mail a URL and login information to sampled addresses so that the survey can be completed online. These surveys often include a mailed paper questionnaire in follow-up mailings, an approach sometimes referred to as a web-push design.
6.10.5 Use of email to recruit sample members is limited to studies of special populations selected from a list containing email addresses (e.g., students, employees), studies using probability or non-probability web panels, studies that used a screener survey to collect an email address, and longitudinal surveys in which an email address was obtained at a prior wave.
6.10.6 Text message invitations and reminders are rarely practical for one-time surveys, but can be particularly useful for panel and longitudinal surveys where texting consent can be obtained.
6.10.7 Longitudinal household surveys now often include a self-administered mode or a mixed-mode data collection strategy for at least some of the follow-up waves.
6.10.8 Planning for active monitoring of multiple metrics during data collection is especially important when experiments are used to identify the “best” mode or combination and sequencing of modes going forward and there are no prior data with which to conduct initial analyses of the impact of these decisions. Having data collection systems that effectively track contact modes and attempts, and ensure interventions are properly employed, is critically important, although this may be challenging or require significant infrastructure development at survey organizations.
6.10.9 Repeated cross-sectional surveys or longitudinal surveys that have previously had a mix of modes in data collection can use statistical analysis and simulation of existing data to plan interventions in a new round of data collection.
6.10.10 There is no single mailing protocol used for self-administered or mixed-mode surveys, although surveys that include a separate screening questionnaire and topical or main questionnaire use very similar mailing protocols.
6.10.11 Prepaid incentives of $2 and $5 are common. Promised incentives are less commonly used; when they are used, the amounts tend to be much higher than prepaid incentive amounts.
6.10.12 Self-administered surveys tend to underrepresent younger adults and racial/ethnic minorities and overrepresent older adults and adults with higher levels of education. There is less consistency in the quality of representation across gender and income categories.
6.10.13 Surveys that have successfully recruited Spanish speaking respondents have translated the survey materials into Spanish and included those Spanish-language materials (letters, questionnaires, other survey information) in mailings from the very first contact attempt.
6.10.14 Even when the target population for a study does not change, the observed eligibility rates may differ across modes due to differential nonresponse or other error sources.
Appendix Table 6.A: Citations for Response Rates for Surveys Conducted in Both Telephone and Self-Administered or Mixed-Mode Data Collection Modes
| Survey Name | Year: Interviewer-Administered | Year: Self-Admin / Mixed Mode | Source |
| --- | --- | --- | --- |
| 2005 Behavioral Risk Factor Surveillance System six-state pilot | 2005 | 2005 | Battaglia, et al. (2008); Link, et al. (2008) |
| 2005 Health Information National Trends Survey (HINTS) | 2005 | 2005 | Cantor, et al. (2005) |
| 2006-2014 ODOT surveys | 2006-2008 | 2006-2014 | Lesser, et al. (2016) |
| 2007 Health Information National Trends Survey (HINTS) | 2007 | 2007 | Cantor, et al. (2009) |
| National Household Education Survey: 2009 Pilot Study | 2007 | 2009 | Brick, Williams, and Montaquila (2011) |
| National Survey of Veterans | 2001 | 2009 | Han, et al. (2010) |
| Dutch Crime Victimization Survey | 2011 | 2011 | Klausch, Hox, and Schouten (2015) |
| National Household Education Survey: 2011 Field Test | 2007 | 2011 | Montaquila, et al. (2013) |
| Survey of Consumer Attitudes | 2011 | 2011 | Elkasabi, et al. (2014); Survey of Consumers (2012) |
| Racial and Ethnic Approaches to Community Health (REACH) U.S. Risk Factor Survey, Phase 3 | 2011 | 2011 | LeClere, et al. (2012) |
| German Health Update 2.0 (GEDA) pilot study | 2012 | 2012 | Mauz, et al. (2018); Hoebel, et al. (2014) |
| Wisconsin Family Health Survey | 2011 | 2012 | Allison, Stevenson, and Kniss (2014) |
| 2013 California Health Interview Survey ABS pilot | 2013-2014 | 2012 | Jans, et al. (2013); California Health Interview Survey (2016) |
| American National Election Studies 2012 Time Series Study | 2012 | 2012 | American National Election Studies (2015) |
| Coastal Household Telephone Survey | 2013 | 2013 | Brick, Andrews, and Mathiowetz (2016) |
| 2013-2014 California Health Interview Survey ABS pilot | 2013-2014 | 2013-2014 | Kali and Flores-Cervantes (2016) |
| 2015 Residential Energy Consumption Survey National Pilot study | 2009 | 2015 | Biemer, et al. (2017); Residential Energy Consumption Survey (RECS) 2009 Technical Documentation Summary (2013) |
| 2015 Canada Election Study | 2015 | 2015 | Breton, et al. (2017) |
| CAHPS Hospice Survey | 2015 | 2015 | Parast, et al. (2018) |
| 2015 New York Adult Tobacco Survey | 2015 | 2015 | Brown, et al. (2018) |
| 2016 American National Election Studies Time Series Study | 2016 | 2016 | American National Election Studies (2018) |
| National Travel Survey pilot | 2016 | 2016 | Bosa, Gagnon, Caron (2017); Statistics Canada (2018) |
| National Immunization Survey | 2016 | 2016 | Skalland, et al. (2017); CDC (2017) |
| National Survey of Fishing, Hunting, and Wildlife-Associated Recreation | 2016 | 2016 | Breidt, et al. (2018) |
| 2016 National Survey of Children’s Health | 2011-2012 | 2016-2017 | Ghandour, et al. (2018) |
| 2017 National Household Travel Survey | 2009 | 2017 | US Department of Transportation (2011); Federal Highway Administration and Westat (2018) |
| Gallup Sharecare Well-Being Surveys | 2017 | 2018 | Marken (2018) |
| 2018 California Health Interview Survey Push-to-web pilot | 2017 | 2018 | Wells, et al. (2018) |
7.1 Introduction
A review of the current literature regarding transitions from single-mode to mixed-mode data collection provides little experimental or empirical evidence on how such transitions affect data processing. As a result, this section of the report provides little experimental or empirically based guidance; rather, our intent is to raise issues that could affect data quality, timing, and costs. We draw on current examples of data preparation and processing used for mixed-mode surveys, but cannot evaluate alternative methods for data processing where none exist. We list factors to be considered as surveys move away from single-mode data collection to mixed-mode data collections.
As has already been noted (see Chapter 4), part of the challenge of mixed-mode data collection efforts that span varying levels of technology is whether, and to what extent, that technology is used in capturing the data. For example, consider a mixed-mode data collection that uses both web response and mail questionnaires, such as the current design for the U.S. American Community Survey (ACS). While range checks, validation, and data edits can be incorporated into the web-based instrument, these are not feasible in the paper format. Because these processes cannot be incorporated into both modes, designers must decide whether to maintain quality checks in the technology-assisted instruments and to what degree. This is a key data management issue: whether to take advantage of technology that could potentially improve data quality at the cost of varying data quality across the different modes. For example, does one integrate a range check for the web-based data collection when such an option does not exist for the paper version?
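One option, sketched below, is to apply the same edit rule to the integrated file after collection, so that paper cases receive the range check that the web instrument could enforce at entry while the edit itself remains identical across modes. The variable, valid range, and flagging convention are illustrative assumptions, not the ACS procedures.

```python
# Sketch: apply one range-check edit rule to the integrated multi-mode file,
# so paper cases get the same check the web instrument could apply at entry.
# The variable, valid range, and flag convention are illustrative.
import pandas as pd

integrated = pd.DataFrame({
    "case_id": [1, 2, 3, 4],
    "mode": ["web", "paper", "paper", "web"],
    "household_size": [3, 25, 2, 4],  # 25 is an out-of-range keying/reporting error
})

VALID_RANGE = (1, 15)  # illustrative edit rule, applied identically in all modes

integrated["household_size"] = integrated["household_size"].astype("Int64")  # nullable int
out_of_range = ~integrated["household_size"].between(*VALID_RANGE)
integrated["hh_size_edit_flag"] = out_of_range.astype(int)   # document the edit
integrated.loc[out_of_range, "household_size"] = pd.NA       # set to missing for imputation
print(integrated)
```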
In our convenience sample of organizations that have transitioned a survey from telephone to self-administered or mixed modes, 8 of the respondents said that data editing for their project varied by mode; 11 said it did not. Open-ended responses about data editing provide no consensus on this aspect of the surveys. Respondents advised survey designers to “be meticulous” and make sure multiple people examine the data, to build edits into and thoroughly test the data capture process, to save original files from the discrete mode sources, and to review each step of the process.
As a field, we have not uniformly identified, researched, and addressed basic philosophical questions about data processing in a mixed-mode survey environment. Is the goal of data processing in a mixed-mode data collection environment to make the resulting data blind to data collection mode, or to preserve those differences that may arise from the mode of data collection? Should those differences be noted and addressed with the same rigor as response rates and sampling error? Another philosophy of data processing and management is to try to achieve comparable levels of quality across modes, using the best methods within each mode to do so. A construct for quality in this case could mean comparable levels of consistency, completeness, and coherence. The concept of ‘comparable’ quality in a mixed-mode design remains open for discussion and research.
Regardless of the mode by which data are collected, once captured, the data often move through a process involving multiple stages before they are used for analysis. Drawing on the Generic Statistical Business Process Model developed by the United Nations Economic Commission for Europe (UNECE 2013), we organize this section of the report according to the following processes (a schematic sketch of this sequence follows the list):
- Data integration
- Classification and coding
- Review and validation
- Editing and imputation
- Calculation of weights
- Finalization of data files
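As referenced above, the following is a schematic sketch, assuming nothing about any particular organization’s systems, of how these stages might be chained over a combined multi-mode file; each function is a placeholder for the operations described in the subsections that follow.

```python
# Schematic sketch (not a production system) of the processing sequence listed
# above, with each GSBPM-style stage as a placeholder step operating on a
# combined multi-mode data file.
def integrate(web_file, paper_file, cati_file):
    """Combine mode-specific capture files into one case-level file."""
    return web_file + paper_file + cati_file        # placeholder

def classify_and_code(data):   return data          # e.g., occupation/industry coding
def review_and_validate(data): return data          # completeness and consistency checks
def edit_and_impute(data):     return data          # edit rules and item imputation
def compute_weights(data):     return data          # base weights and adjustments
def finalize(data):            return data          # disclosure review, documentation, delivery

def process(web_file, paper_file, cati_file):
    data = integrate(web_file, paper_file, cati_file)
    for step in (classify_and_code, review_and_validate,
                 edit_and_impute, compute_weights, finalize):
        data = step(data)
    return data

print(len(process([{"case": 1}], [{"case": 2}], [{"case": 3}])))  # 3 cases processed
```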
To illustrate these steps, we expand on the ACS model of data preparation and processing (U.S. Census Bureau 2014, page 105, Figure 10-1). Figure 7.1 depicts the overall flow of data as they pass from data collection operations through data preparation and processing.
[Figure 7.1: Flow of data from data collection operations through data preparation and processing]