American Association for Public Opinion Research

Social Media in Public Opinion Research: Report of the AAPOR Task Force on Emerging Technologies in Public Opinion Research
 
May 28, 2014
 
Report Authors:
 
Joe Murphy, Co-Chair, RTI International
Michael W. Link, Co-Chair, Nielsen
Jennifer Hunter Childs, U.S. Census Bureau
Casey Langer Tesfaye, American Institute of Physics
Elizabeth Dean, RTI International
Michael Stern, NORC
Josh Pasek, University of Michigan
Jon Cohen, SurveyMonkey
Mario Callegaro, Google
Paul Harwood, Twitter
 
Additional Task Force Members:
 
Trent D. Buskirk, Marketing Systems Group
Michael F. Schober, New School for Social Research
 
Acknowledgements: We thank Scott Turner for his work on an earlier draft of this report.

Table of Contents
 
Executive Summary  
 
1.0 Background
 
1.1 AAPOR Council Charge and Report Focus  
1.2 AAPOR Reports on Related Topics  
 
2.0 Social Media Types, Usage, and Data
 
2.1 Social Media Types  
2.2 Social Media Usage  
2.3 Social Media Data  
 
3.0 Quality Considerations for Social Media in Research
 
3.1 Coverage and Sampling  
3.2 Data Completeness and Accuracy  
3.3 Analysis of Social Media Data  
 
4.0 Current Uses and Evaluations of Social Media in Research
 
4.1 Social Media to Inform the Survey Process  
4.2 Study Recruitment  
4.3 Locating Sample Members  
4.4 Social Media as a Supplement or Replacement for Surveys  
 
5.0 Legal and Ethical Considerations
 
5.1 Personally Identifiable Information  
5.2 Terms of Service  
5.3 Industry Ethical Guidelines  
5.4 Other Ethical Considerations for Researchers  
5.5 Public Perception  
 
6.0 The Road Ahead
 
6.1 Validating Social Media  
6.2 Addressing Coverage, Sampling, and Differential Access Challenges  
6.3 Designing Better Integrations of Surveys and Social Media  
6.4 Leveraging the Unique Features of Social Media  
6.5 Continuing to Refine Understanding and Guidance on Privacy and Ethics  
 
References
 
 
Appendix: Further Reading on Legal and Ethical Issues
 
 
Executive Summary
 
Public opinion research is entering a new era, one in which traditional survey research may play a less dominant role. The proliferation of new technologies, such as mobile devices and social media platforms, is changing the societal landscape across which public opinion researchers operate. As these technologies expand, so does access to users’ thoughts, feelings and actions expressed instantaneously, organically, and often publicly across the platforms they use. The ways in which people both access and share information about opinions, attitudes, and behaviors have gone through perhaps a greater transformation in the last decade than at any previous point in history, and this trend appears likely to continue. The ubiquity of social media and the opinions users express on social media provide researchers with new data collection tools and alternative sources of qualitative and quantitative information to augment or, in some cases, provide alternatives to more traditional data collection methods.

The reasons to consider social media in public opinion and survey research are no different than those of any alternative method.  We are ultimately concerned with answering research questions, and this often requires the collection of data in one form or another.  This may involve the analysis of data to obtain qualitative insights or quantitative estimates. The quality of data and ability to help accurately answer research questions is of paramount concern. Other practical considerations include the cost efficiency of the method and speed at which the data can be collected, analyzed, and disseminated. If the combination of data quality, cost efficiency, and timeliness required by a study can best be achieved through the use of social media, then there is reason to consider these methods for research.

An additional reason to consider social media in public opinion and survey research is its explosion in popularity over the last several years.  At a time when many are eschewing landline telephones (Blumberg and Luke, 2013) or actively taking steps to prevent unsolicited contact (e.g. caller ID, restricted access buildings), many are now communicating and interacting online via social networking sites.  It is only natural for researchers to aim to meet potential respondents where they have the best chance of getting their attention and potentially gaining their cooperation. However, this brave new world is not without its share of issues and pitfalls – technological, statistical, methodological, and ethical, and much remains to be investigated. As the leading association of public opinion research professionals, AAPOR is uniquely situated to examine and assess the potential impact of new “emerging technologies” on the broader discipline and industry of opinion research. In September 2012, AAPOR Council approved the formation of the Emerging Technologies Task Force with the goal of focusing on two critical areas: smartphones as data collection vehicles and social media as platform and information source. The current report focuses on social media; a companion report covers mobile data collection.

This report examines the potential impact of social media on public opinion research – as a vehicle for facilitating some aspect of the survey research process (e.g., questionnaire development, recruitment, locating) and/or augmenting or replacing traditional survey research methods (i.e., content analysis of existing data). We distinguish between qualitative insights and quantitative indicators from social media and discuss the factors that must be evaluated to determine its fitness for use.
 
 
DEFINING SOCIAL MEDIA, USAGE, AND DATA
Social media has been defined in many ways, but for the purposes of this report, we borrow the definition from Murphy, Hill, and Dean (2013), which is relevant for public opinion and survey research: “Social media is the collection of websites and web-based systems that allow for mass interaction, conversation, and sharing among members of a network.” Social media platforms have proliferated in recent years with a rapid increase in adoption and use by both members of the general public and specific subpopulations.  Social media is not defined by a single type of platform or data.  The list of popular platforms is long and can change rapidly.  Platform types include blogs, microblogs, social networking services, content sharing and discussion sites, and virtual worlds.

According to the Pew Internet and American Life Project, as of 2013, 81% of the U.S. adult population had Internet access, and of that population, 73% used social media. This rate has increased dramatically over the last several years among all age groups, and the largest demographic difference is by age: social networking sites are currently being used by 9 in 10 adults ages 18-29 but fewer than half of the 65+ population (Duggan and Smith, 2013a). Although social media popularity overall has skyrocketed in recent years, the popularity of individual social media sites has risen and fallen over time. Certain other social media platforms are highly popular outside the U.S., and many platforms have changed the features and access they offer over time.
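The two rates quoted above are conditional on different bases (all adults, then online adults), so they combine by multiplication. A minimal sketch using only the figures in the text follows; the function and variable names are illustrative assumptions, not part of any cited source.

```python
# Figures quoted in the text (Pew, 2013).
INTERNET_ACCESS_RATE = 0.81        # share of U.S. adults with Internet access
SOCIAL_MEDIA_RATE_ONLINE = 0.73    # share of online adults who use social media


def share_of_all_adults(rate_among_online, online_rate):
    """Convert a rate measured among online adults into a rate among all adults."""
    return rate_among_online * online_rate


overall = share_of_all_adults(SOCIAL_MEDIA_RATE_ONLINE, INTERNET_ACCESS_RATE)
print(f"Social media users as a share of all U.S. adults: {overall:.1%}")
# prints "Social media users as a share of all U.S. adults: 59.1%"
```

Under these figures, roughly 41% of adults fall outside anything social media data could cover directly, which is one way to see the coverage concern discussed under quality considerations.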

Data from social media platforms capture a variety of information and come in several different formats, with different access methods and levels of availability.  Social media data can be purely text based or include audio or visual components. Data from social media sites can be accessed directly through the platform itself or through a range of partially to fully automated methods.  The specific types of information available can also change rapidly in the social media world. Platforms sometimes release large changes in both features and access with little to no warning.

A bounty of data is freely available to researchers, but the availability of data for research purposes is largely dependent on the terms and conditions of each site and is subject to change with little or no notice.
 
QUALITY CONSIDERATIONS FOR SOCIAL MEDIA IN RESEARCH
There are legitimate quality concerns with using social media in research.  Not every member of the public uses these platforms and those who do use them in different ways.  In this respect, social media may provide useful insights for a particular set of questions, but perhaps not more specific point-estimates which are generalizable to a broader population. The public nature of social media requires significant attention be paid to the barriers for sharing honestly and openly.  Just as we do with survey research, social media must be viewed practically and objectively and its potential advantages and error sources investigated and documented.

One of the most pervasive questions in using social media for data collection is its use for constructing a sample frame and recruiting respondents.  To date, there has been little progress in attempts to show how data collected through the use of social media sites can represent the general population.   Because social media users are not representative of the wider general public and due to the lack of reliable sampling frames, only non-probability samples can currently be gathered in this way. Researchers must consider the universe of people who use the Internet, who uses social media among those on the Internet, and how those people are represented on social media.

Social media research often faces issues of incomplete information. Unlike survey respondents who typically only provide information when prompted, those who use social media tend to post what they want, when they want, prompted or not. Those using social media usually also continue to have the ability to control their content after it is posted. They can edit or remove posts or change the privacy settings related to those posts. Additionally, social media content frequently links to content from other media sources, which is also subject to change. Along with the threat of incomplete data from social media, however, comes the opportunity that arises from its abundance. Survey data are typically captured from individuals once in cross-sectional studies and a limited number of times within particular time frames in longitudinal studies. Social media, on the other hand, allows for a more continuous look at opinion, attitudes, and behaviors, when shared and when reflecting the truth. The fact that social media is inherently public to some extent affects the likelihood that an individual will share the type of data in which we are interested. Stigma and social desirability may prevent honest and open sharing on certain topics.

In contrast to survey research, social media researchers less frequently specify individual people as the unit of analysis. The units of analysis in social media research can be individual posts, words, unique users, pages, or the like. Choosing a unit of analysis is usually a vitally important part of data selection and analysis.
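To make the point about units of analysis concrete, the following is a minimal sketch showing how the same small set of posts yields different answers depending on whether posts or unique users are counted. The posts, user IDs, and function names are hypothetical and invented for illustration only.

```python
# Hypothetical posts, invented for illustration: (user_id, text).
posts = [
    ("user_a", "Long lines at the polls today"),
    ("user_a", "Still waiting at the polls"),
    ("user_b", "Voted early, no lines"),
]


def tokenize(text):
    """Lowercase a post and split it into words, dropping edge punctuation."""
    tokens = (word.strip(".,!?;:\"'").lower() for word in text.split())
    return [t for t in tokens if t]


def share_mentioning(posts, keyword, unit):
    """Share of posts, or of unique users, that mention the keyword."""
    keyword = keyword.lower()
    hits = [(user, text) for user, text in posts if keyword in tokenize(text)]
    if unit == "post":
        return len(hits) / len(posts)
    if unit == "user":
        all_users = {user for user, _ in posts}
        hit_users = {user for user, _ in hits}
        return len(hit_users) / len(all_users)
    raise ValueError(f"unknown unit of analysis: {unit}")


post_share = share_mentioning(posts, "polls", unit="post")
user_share = share_mentioning(posts, "polls", unit="user")
print(f"{post_share:.0%} of posts, {user_share:.0%} of users")
# prints "67% of posts, 50% of users"
```

Here one prolific user accounts for both matching posts, so the post-level share overstates how widely the topic is mentioned across users.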

Data analysis is done in a variety of ways by researchers from a wide variety of backgrounds. The most common form of data in social media analysis is textual data, and textual data can be analyzed in many different ways.  Text mining algorithms are popular with a growing legion of data scientists, and sophisticated computer programs have been built by machine learning experts to tackle the challenge of finding meaning in social media and other textual data. There is a growing set of text classifiers that are built by natural language processing specialists and linguists to uncover relevant underlying textual patterns using exploratory data visualizations, or markedly smaller-scale qualitative observations.
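As a rough illustration of what a very simple text classifier can look like, the sketch below labels posts by counting words from two small word lists. It is a deliberately minimal, lexicon-based stand-in and is not the method of any study cited in this report; the word lists, example posts, and function name are assumptions made up for illustration.

```python
# Tiny hypothetical word lists; real sentiment lexicons are far larger.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "angry"}


def classify_post(text):
    """Label a post by comparing counts of positive and negative words."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


# Hypothetical posts, invented for illustration.
example_posts = [
    "I love the new transit plan!",
    "Terrible turnout, bad weather.",
    "Polls close at 8.",
]
labels = [classify_post(p) for p in example_posts]
print(labels)
# prints ['positive', 'negative', 'neutral']
```

A word-count approach like this ignores negation, sarcasm, and context, which is part of why validation against other sources of information matters.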
 
CURRENT USES AND EVALUATIONS OF SOCIAL MEDIA IN RESEARCH
Researchers are recognizing the potential for social media to provide new options to conduct research more quickly, efficiently, and in new ways than in the past. Many researchers have begun to explore and implement methods to incorporate social media in public opinion and survey research in two basic ways.  The first is to actively identify, locate or interact with study participants.  The second is through passive monitoring, as an early warning or forecasting system, or as a supplement or alternative to survey data collection.

In the design phase of the survey lifecycle, social media has been used to inform questionnaire design, allowing researchers new insights into the survey topics and populations under consideration. In testing and preparing for data collection, social media has been used for targeted recruitment of respondents for cognitive interviews and focus groups, using non-probability sampling methods (AAPOR, 2013). In longitudinal studies, social media has been used to actively locate or stay in touch with sample members through outreach and engagement efforts. Finally, data from social media and web-based systems have been used as both a supplement and proxy for survey data by “scraping” websites for information on people’s self-reported characteristics, behaviors, opinions, and interests.  We present and discuss examples of each type of use in this report.
 
LEGAL AND ETHICAL CONSIDERATIONS
Because regulation of new technologies can be a slow process, assessment of all relevant legal regulations in this area of research is challenging. The lack of legal guidance specific to new technologies puts respondents potentially at risk and leaves researchers with unanswered questions. U.S. regulations are only applicable for research within the U.S. Other countries and political regions have different, and sometimes stricter, protections of human subjects and the preservation of data collected. In the absence of clear legal direction, researchers need to self-regulate, adapting survey screeners and research documentation to accommodate the portability and flexibility of the platform on which we wish to conduct research so as to not erode the protection of human subjects. In addition to legal requirements pertaining to the locations of the researchers and participants, researchers must also adhere to the terms of use of the sites that they wish to use.

Though there is much debate on the topic of informed consent and passive social media data collection, we argue that the way informed consent is applied depends on the public or private nature of the space. Public spaces can be likened to observing behavior in public. In situations where the terms of service clearly state that content will be made public, no consent should be necessary to conduct research on publicly available information. Researchers still need to maintain their code of ethics and protect the privacy of their research subjects. Researchers should also note the risks mentioned above with releasing information that can be re-identified. The benefits to the community should always outweigh the potential harm of doing research.
 
THE ROAD AHEAD
Though a good deal of research to date has focused on an array of issues related to social media, far less is known in terms of if, when, and how such data may be fit for use in public opinion and survey research. Currently, researchers are making use of social media to derive qualitative insights (rather than probability-based point-estimates), for pretesting purposes, and as a recruiting resource for nonprobability surveys. Looking forward, it is necessary for researchers to continue investigation into social media’s ultimate utility for public opinion research and the extent to which it can serve as a resource for both qualitative and quantitative applications. This will require replicable, impartial, transparent experiments to gauge its effectiveness as a source of opinion, attitudes and behaviors and/or as a platform for collecting such.  Here, we highlight just a few of the priority areas of research for our field:
 
Validating Social Media: A question of paramount concern is whether social media, when used as a substantive source of data, can provide accurate answers to certain research questions. How do we know that posts on the Internet mean what we think they mean? Or whether they were made by individuals at all? (While the pace of ethnographic, behavioral and linguistic research on social media is fast, many questions have yet to be answered about the growth of “bots” or computerized postings as well as those “paid to post.”) In order to provide some validation, we will need to interact with those who post on social media and learn more about their intentions, attitudes, and behaviors when producing content. Just as we have validated survey items against gold standard data sources, we must also validate social media against more certain sources of information.
  
Addressing Coverage, Sampling, and Differential Access Challenges: A second area of concern is whether social media can be representative of the general population or even a subset of Internet or social media users. Although social media research can accurately reflect activity online, more research is needed to determine whether we can create a frame of social media users from which we can sample individuals for research with a known and non-zero probability. Research into inferred demographics is useful to fill in missing information on those who use social media, and detection of fake and duplicate accounts is also helping produce a clearer picture of the social media landscape, but there is much work to be done to be certain whether and how social media may represent the real world or even a subset of that world. A related set of issues involves the differential access to and use of the Internet and social media across various subgroups of the population.  The impact of differential access and use must be better understood and overcome if social media is to become a robust source of public opinion data in the years ahead.
 
Designing Better Integrations of Surveys and Social Media: To date, few studies have been published that directly compare survey responses with online behaviors. But this is an appealing option, both because it may allow areas of survey coverage error to be explored in greater detail than traditional survey research, and because it may allow social media coverage to be explored in unprecedented ways through links to survey and administrative records.
 
Leveraging the Unique Features of Social Media: Social media research has many drawbacks when compared with survey data for the purpose of generalizable research. However, there are unique aspects of social media that make it ideally suited for other types of research. One major advantage of social media is that it can provide a glimpse into the social networks of individuals. Beyond social networks, there may be other unique features that become evident from social media and opportunities to investigate and supplement research with those that are fit for use.
 
Continuing to Refine Understanding and Guidance on Privacy and Ethics: As with other types of research, we must place paramount importance on questions related to the privacy and ethical implications of social media research. Many questions remain to be answered about what topics are suitable for research with social media. We need a better understanding of the cases where benefits to the public of such research outweigh the possible harm. Balancing these privacy and ethical concerns along with the quality considerations and great potential for new insights into the study of public opinion, attitudes, and behaviors presents a significant challenge for the field of public opinion and survey research.  It is incumbent upon the field to explore this new world in a way that holds true to our values of ethical research, impartiality, transparency, and maximizing accuracy and quality in our measurements. 

REPORT

1.0 BACKGROUND

 
Social media technology is rapidly expanding and being adopted worldwide. It is natural for survey and public opinion researchers to consider adapting their methods to accommodate such new technologies. However, the appeal of social media for research goes beyond just embracing new technology, for several reasons:
  • Social media may represent increased access to survey respondents. At a time when many are eschewing landline telephones (Blumberg and Luke, 2013) or actively taking steps to prevent unsolicited contact (e.g. caller ID, restricted access buildings), much of the population is now communicating and interacting online via social networking sites. Social media provides an avenue to meet potential respondents where researchers have the best chance of getting their attention and gaining their cooperation.
  • It provides a potentially much less costly data source than a traditional designed survey. In many cases, at least a subset of personal data shared by any given person on social media (depending on user settings and platform features) is freely available to the public.
  • Social media research can be less burdensome on research participants and less intrusive into their daily lives. Rather than taking 15 minutes to respond to a survey or a couple of hours to participate in a focus group, participants may agree once to share certain information and make available a steady stream of data about tastes, preferences, behaviors and choices to researchers without even having to think about it.
  • Insights from research participants may be obtained more quickly through social media. Passive social media data capture and analysis enables instantaneous observations; active social media interaction with respondents requires time to recruit and interview, yet is still quicker than inviting them to a lab to participate in an interview or calling them on the phone to collect their opinions.
  • Social media data collection efficiently enables a broader range of types of analysis, such as the potential to perform social network analyses linked to opinions, sentiment and behaviors without recruiting and interviewing respondents through extensive network sampling procedures.

Social media for opinion research offers impressive potential. As with any other means of data collection, researchers are ultimately concerned with answering their research questions with data that are as accurate as possible. Other practical considerations include the cost efficiency of the method and speed at which the data can be collected, analyzed, and disseminated. If the combination of data quality, cost efficiency, and timeliness required by the study could best be achieved through the use of social media, then there is reason to consider these methods for research. For example, a market research company seeking qualitative insights about use of a particular product may find starting conversations with customers on social media to be more efficient than in-person focus groups.

There are legitimate concerns with using social media in research.  Not every member of the public uses these platforms and those who do use them in different ways.  The public nature of social media requires significant attention be paid to the barriers for sharing honestly and openly.  Just as with surveys, social media must be viewed practically and objectively and its potential advantages and error sources investigated and documented. The task force is aware that social media research at this time falls under the rubric of nonprobability research. That is, given the constraints of social media access, availability, and adoption, probability sampling of users or of data to represent a known population is not at this time an option. For this reason, social media is a source of insights rather than point estimates and allows researchers to examine the subtleties of opinion and behavior of individuals within their social context, but not in a way that allows generalizability. Although much of this territory is currently unknown, this report summarizes relevant literature and should be viewed as a working document as future research is conducted and these open questions are resolved.
 
1.1 AAPOR Council Charge and Report Focus
 
As the leading association of public opinion and survey research professionals, AAPOR is uniquely situated to examine and assess the potential impact of these “emerging technologies” on the broader discipline and industry of opinion research. In September 2012, AAPOR Council approved the formation of a task force to assess the opportunities and challenges that emerging mobile and social media technologies might present for the fields of public opinion and survey research.

The AAPOR Emerging Technologies Task Force was first convened in October 2012 with the goal of focusing on two interconnected areas: smartphones as data collection vehicles and social media as platform and information source. These areas appear “ripe” for investigation, given that (1) each has widespread visibility and recognition within the industry as important new areas of development, (2) each area is already having an effect in many quarters of the survey discipline and related fields, and (3) there is sufficient initial empirical information within each area to allow us to begin assessing the relative merits and drawbacks of these potential approaches. The purposes of the task force are as follows:
  • define and delineate the scope and landscape of each area;
  • describe the potential impact in terms of quality, efficiency, timeliness and analytic reach;
  • discuss potential opportunities and challenges based on the empirical research available to date;
  • delineate some of the key legal and ethical considerations; and
  • detail the gaps in our understanding and propose avenues of future research.
At this juncture, the task force is not issuing detailed operational “how to” lessons; that will be an activity for future task forces which will explore in more detail each of these areas as they become more “mature” methodologies and/or sources of information.

This report and its companion, “Mobile Technology in Public Opinion and Survey Research,” are designed to inform those who study public opinions, attitudes and/or behaviors or have an interest in such research, including those involved in the collection and/or analysis of data as well as policymakers, members of the media, and the general public. As previously mentioned, these reports should be viewed as “living documents” in that they represent the state of the discipline at a particular point in time. Given the incredible speed of change in this area, how quickly new technologies are being developed, and the level of on-going research, both theoretical and practical, that is currently underway, we fully anticipate that the reports will require updating from time to time.

The data, examples, and discussions in this report focus mainly on the population of the United States, though the methods can be applied to other populations as well.  Aspects of the survey or data collection lifecycle that may include social media components but are not considered here include public relations campaigns and the dissemination of survey results using social media.  These activities do not, in themselves, involve the collection of information via social media.

Researchers are recognizing the potential for social media to provide new options to conduct research more quickly, efficiently, and in new ways than in the past. Many researchers have begun to explore and implement methods to incorporate social media in public opinion and survey research in two basic ways.  The first is to actively identify, locate or interact with study participants.  The second is through passive monitoring, as an early warning or forecasting system, or as a supplement or alternative to survey data collection.

We begin this report by discussing the social media landscape including types, usage, and data in Section 2.  Next, in Section 3, we discuss why social media is important for our research and its qualities as compared with more traditional resources. We then present, in Section 4, examples of the use of social media in public opinion and survey research, including active qualitative pretesting, recruiting, locating, and passive analysis of data to supplement or replace surveys.  Section 5 discusses privacy and ethical concerns when using social media in research. We conclude in Section 6 with thoughts and considerations for the future on the role of social media in public opinion research as this area of research evolves.
 
1.2 AAPOR Reports on Related Topics
 
This report overlaps some of the ground covered by previous AAPOR Task Force reports, most notably:
  • Opt-in Online Panel Task Force Report (2010)
  • Non-probability Sampling Task Force Report (2013)
Where possible, we have attempted to reduce any redundancies with these prior efforts, except in places where there is either new information or where it is critical to the understanding of issues and points raised in this report. We encourage those interested in these other areas to view the other reports for more details. Each is available on the AAPOR website at www.aapor.org.
 
2.0 SOCIAL MEDIA TYPES, USAGE, AND DATA
 
Social media has been defined in many ways, but for the purposes of this report, we borrow the definition from Murphy, Hill, and Dean (2013), which is relevant for public opinion and survey research: “Social media is the collection of websites and web-based systems that allow for mass interaction, conversation, and sharing among members of a network.” Social media platforms have proliferated in recent years with a rapid increase in adoption and use by both members of the general public and specific subpopulations.  In this section, we discuss the types of social media and popular current examples of each, rates of usage among the U.S. population, and types and elements of data from social media platforms that can be used in the research process.
 
2.1 Social Media Types
 
Social media is not defined by a single type of platform or data.  The list of popular platforms is long and can change rapidly (e.g. Snapchat and Instagram emerged on the scene during the writing of this report). Widely recognized popular social media types include:
  • Blogs (e.g. Blogger, WordPress, Tumblr),
  • Microblogs (e.g. Twitter),
  • Social networking services (e.g. Facebook),
  • Content sharing and discussion sites (e.g. YouTube, Reddit), and
  • Virtual worlds (e.g. Second Life).
Blogs are websites run by an individual or group with periodic entries, or posts, on a variety of topics. Readers can typically voice reactions to the material posted by leaving comments directly on the blog. These comments often contain more varied information and opinion than the original blog post itself. Popular blogging sites include Blogger, WordPress, and Tumblr. More than 150 million public blogs exist on the web. Compared with the general population, bloggers are more likely to be female, in the 18- to 34-year-old age group, well-educated, and active across social media (Nielsen, 2012).

Microblogs are abbreviated versions of blogs where users publish very short messages. Twitter is currently the most well-known example.  On Twitter, users post or “Tweet” messages up to 140 characters in length. Twitter is convenient for research because of the large volume of publicly available messages and relatively simple process of obtaining them (O’Connor et al., 2010).  In the U.S., about 18% of online adults use Twitter.  Use is higher among those age 18-29, African Americans, and urban/suburban dwellers (Duggan and Smith, 2013a).
 
Some blogs and microblogs are enabled by social networking services, which allow users to post photos, videos, notes, and status updates to share with contacts or “friends.” Facebook is currently the most popular social networking service with over a billion active users, including the majority of U.S. adults (57%; Smith, 2014). Facebook use is highest among women, young adults, and those with lower levels of income (Duggan and Smith, 2013a). As of this writing, Instagram and Pinterest are rising in popularity and are approaching or surpassing the penetration of Twitter (17% and 21% of online users, respectively, compared to Twitter’s 18%). LinkedIn currently has 22% of online users in its ranks, and its user base is notably different from those of other social networking sites because of the site’s focus on networks of professionals. However, because little to no public opinion research has been documented on these platforms yet, they will not be discussed here. We do note that these platforms seem ripe for investigation.
 
Many social networking websites allow users to share images, videos, and comments and to discuss the content shared by other users. The data from these websites can provide insights about individuals’ behaviors and opinions for use in research. YouTube is a video sharing site started in 2005 and has a current penetration of 51% of U.S. adults (Holcomb, Gottfried and Mitchell, 2013). In the U.S., Nielsen estimates that YouTube reaches more adults ages 18-34 than any cable network (YouTube, 2014). Almost half of YouTube users are between 16 and 34 years old and slightly more than half are male (Google, 2014). Reddit is an example of a social news and entertainment site where users submit, comment, and vote on content to determine its prominence on the site. Reddit is most popular among those aged 18-29 and the typical user is male. Reddit is used by 6% of online adults in the U.S. (Duggan and Smith, 2013b).

Virtual worlds are online realistic representations in which a user controls an avatar and interacts with other avatars and the surrounding environment. They are distinct from other social media sites, which typically augment real-life personas and relationships. In virtual worlds, users represent themselves in ways that depart from real-life appearances and personalities (Murphy et al., 2013). Second Life is one of the better-known examples of a virtual world. Second Life “residents” communicate with each other “in-world” through instant messages or voice chat. Second Life no longer publishes user statistics, but in 2009 it estimated 481 million logged user-hours (Linden, 2011). Anecdotally, the site’s popularity has plateaued since that time. The most active users (as of 2008) were 25-44 years old (64% of hours logged) and male (59% of hours logged). In survey research, Second Life has been used to conduct cognitive interviews and other survey pretesting activities. The system allows the researcher to target and recruit specific types of residents through classified-type advertisements, online bulletin boards, and word-of-mouth in the virtual world, which can be more efficient and cost-effective than the traditional newspaper ads or flyers typically used to recruit in-person cognitive interview subjects (Dean et al., 2013). Second Life has also been used to access hard-to-reach populations such as those with chronic illnesses (Haque et al., 2013).

1 But social networking services enable more than just blogs and microblogs.

2.2 Social Media Usage
According to the Pew Internet and American Life Project, as of 2013, 81% of the U.S. adult population had Internet access, and of that population, 73% used social media (Duggan and Smith, 2013a). This rate differs most significantly by age group, but has increased dramatically over the last several years among all age groups, as shown in Figure 1.

Figure 1. Social Networking Use by Age, U.S. Online Adults
figure1.jpg


As shown in Figure 2, social media use among U.S. online adults in 2013 was significantly higher among women (78%) than men (69%) and higher among Hispanics (79%) than non-Hispanics (72%). Figure 2 highlights other demographic differences, but the real division, as suggested in Figure 1, is age. Social networking sites are currently being used by 9 in 10 adults aged 18-29 but fewer than half of those aged 65 and older. Many more statistics regarding social media use in the United States are available at http://pewinternet.org.

Although social media popularity overall has skyrocketed in recent years, the popularity of individual social media sites has risen and fallen over time.  For instance, MySpace was a very popular website around 2006-2008, but soon after was essentially replaced by Facebook. Similarly, sites like Pinterest, Instagram, and Snapchat are rising in popularity as of the publishing of this report.
 
Certain other social media platforms are highly popular outside the U.S. (e.g. Sina Weibo is the most popular microblogging platform in China). And many platforms have changed in the features and access they offer over time. Despite the volatility and diversity of social media platforms, we include a description of a few key types of social media platforms and current popular examples. This information is best used as an introduction to the types of data available rather than any kind of lasting guide to specific platforms.
 
Figure 2. Social Networking Site Use by Demographics, U.S. Online Adults, 2013
figure2.jpg 
 
2.3 Social Media Data
 
Data from social media platforms capture a variety of information and come in several different formats, with different access methods and levels of availability. Generally, the data can be separated into user-platform interactions (users querying a platform) and user-user interactions (users sharing or discussing with other users). Within user-user interactions, communications may be classified into different types such as:
 
  • Broadcast: one person communicating with many others,
  • Conversational: one-to-one communication, and
  • Community: groups communicate with each other and within their membership ranks (Murphy, Hill, and Dean, 2013). 
Social media data can be purely text based or include audio or visual components. Social media posts might comprise original information or information that has been repeated or modified based on content from another user. Some platforms allow multiple types of interactions, like video with text or photos with location tags as specific as longitude and latitude coordinates which might be thought of as “paradata” in survey terminology (Callegaro, 2013).

Typically, social media data are organized with some common elements:
 
  • Username: who is saying what,
  • Content: what is being said or shared, sometimes pre-classified by the user with a “hashtag” or “#” symbol (e.g. #survey would most likely indicate the post is related to surveys),
  • Time and date: when the post was made,
  • Location: self-reported information on where the user resides or where the post was made,
  • References: to other users or sites,
  • User network: who else a user is connected with or what interests and content they follow (i.e. personal network).
Other data elements, such as the sentiment of statements (typically positive, negative, or neutral), the inferred topic of content, the connections between users and elements, and demographic information for the individual users, are often derived.
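
As a rough illustration of how the common elements listed above might be held in a single record, the following Python sketch defines a hypothetical `SocialPost` structure. The class name, field names, and sample values are assumptions made for illustration; they do not correspond to any platform's actual data format. Hashtags and user references are derived from the content with simple pattern matching, which is one example of a derived data element.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SocialPost:
    """Hypothetical record holding the common elements of a social media post."""
    username: str                      # who is saying what
    content: str                       # what is being said or shared
    posted_at: datetime                # time and date of the post
    location: Optional[str] = None     # self-reported location, if any
    network: List[str] = field(default_factory=list)  # accounts the user follows

    def hashtags(self) -> List[str]:
        # Hashtags are user-supplied topic labels such as "#survey".
        return [tag.lower() for tag in re.findall(r"#(\w+)", self.content)]

    def mentions(self) -> List[str]:
        # References to other users, written here in the "@name" style.
        return re.findall(r"@(\w+)", self.content)


# Illustrative, made-up post.
post = SocialPost(
    username="example_user",
    content="New #Survey results out today, thanks @pollster",
    posted_at=datetime(2014, 5, 28, 9, 30),
    location="Chicago, IL",
)
print(post.hashtags())   # ['survey']
print(post.mentions())   # ['pollster']
```

In practice, each platform returns its own fields and formats, so a structure like this would need to be mapped to whatever a given source actually provides.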

Data from social media sites can be accessed directly through the platform itself (e.g. copy and paste or screenshot) or through a range of partially to fully automated methods.  The most efficient and common access method is through a platform’s Application Programming Interface (API).  While access levels differ by platform, a user or software program can make a “call” to the API to “pull down” a data set based on certain search parameters.  Depending on the platform and researcher’s level of access, some data can be obtained for free and other data may only be available for a fee from authorized vendors.  Each platform has its own rules and arrangements, which are subject to change at any time.

The specific types of information available can also change rapidly in the social media world. Platforms sometimes release large changes in both features and access with little to no warning (for example, as of this writing, Twitter recently introduced video capabilities). At times, whole platforms can even disappear. Some currently pending redesigns and major feature changes, such as Google Plus, Facebook Home and Facebook Graph Search, can alter the types of interactions that users have with platforms like Google or Facebook and with each other; this, in turn, can change the nature of the data available to researchers. Because of changing features and changes to the policies that govern the terms and conditions of each platform, the data available to researchers can change in ways that could influence research dramatically (e.g. Facebook introduced limits on the data that could be collected by applications on the site). Descriptions of online platforms can quickly become outdated, and sometimes features that are the focus of inquiry can disappear altogether.

A bounty of data is freely available to researchers, including all public Facebook posts² and a 1% sample of Tweets from Twitter. The availability of data for research purposes is largely dependent on the terms and conditions of each site and is subject to change with little or no notice. Some free data are accessible only through a platform’s API, or through a website with a preprogrammed interface. In the case of Twitter, the API is freely accessible but does not grant researchers access to the universe of data or provide any measure of the completeness of the query results. In these cases, the proportion and representativeness of data on the API is unknown (boyd and Crawford, 2012) and variable (Morstatter et al., 2013). At present, access to 100% of public Tweets can be obtained through contractual arrangements with a few selected data vendors. Additionally, in partnership with Twitter, the Library of Congress is reportedly archiving all Tweets ever made for future access by researchers (Library of Congress, 2010).

2 Likely fewer than half of all posts (Dey et al., 2012)

3.0 QUALITY CONSIDERATIONS FOR SOCIAL MEDIA IN RESEARCH
 
There are many current and potential applications of social media in survey research.  Similarly, there are many quality considerations which should be addressed when determining whether social media are fit for use in any research application.  This section discusses the major considerations that must be addressed when considering social media as a resource in survey research: coverage and sampling, data completeness and accuracy, and appropriate methods of analysis.
  
3.1 Coverage and Sampling
 
One of the most pervasive questions in using social media for data collection is its use for constructing a sample frame and recruiting respondents.  To date, there has been little progress in attempts to show how data collected through the use of social media sites can represent the general population.   Because social media users are not representative of the wider general public and due to the lack of reliable sampling frames, only non-probability samples can currently be gathered in this way. Researchers must consider the universe of people who use the Internet, who uses social media among those on the Internet, and how those people are represented on social media.

The proportion of the population with Internet access has risen steadily since the Internet’s inception, but Internet access is still not universal (Fox and Rainie, 2014). We do not have a list of social media users from which to draw samples representative of all users.  Even if such a list were available for any given site, inferences made from such a sample would only generalize to the population of site users; this is rarely the group we wish to describe. Researchers using any Internet-based sample to talk about people who do not use the Internet or social media need to proceed with caution and carefully consider the variety of demographic and social-psychological factors that may differentiate those who are and are not online.  These questions and potential ways to address them using weighting procedures are discussed in the AAPOR task force report on online panels (AAPOR, 2010).

The problem of coverage in social media research is a nuanced one that goes deeper than the problem of Internet access alone. Once people are online, they behave in a variety of ways. This has been referred to as differential use.  The nature of social media data is such that the production of information online is almost never distributed equally across individuals.  Some people post far more information than the average user and other individuals tend to lurk in the background, rarely, if ever, generating their own content (Gruzd & Haythornthwaite, 2013).  Individuals who never post information may be invisible to certain sampling techniques.  Those who rarely post information might be systematically undersampled.  This process can bias the results of data collection toward the heaviest users.  Differences between frequent and infrequent posters can be addressed by weighting individuals by the inverse of the frequency with which they post, yet “lurkers” may be systematically different from active individuals in terms of their privacy preferences, opinions, and behaviors. Differences between posters and never-posters may be impossible to establish without alternative methods of data collection.
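
The inverse-frequency weighting mentioned above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: each post is tagged with its author and a yes/no attribute of interest, and the function names and sample data are hypothetical. Each post receives a weight of one divided by its author's post count, so every author who posts at all contributes a total weight of one.

```python
from collections import Counter


def inverse_frequency_weights(authors):
    """Return a per-author weight equal to 1 / (number of posts by that author)."""
    counts = Counter(authors)
    return {author: 1.0 / n for author, n in counts.items()}


def weighted_share(posts, weights):
    """Weighted share of posts flagged True, where posts is a list of (author, flag)."""
    total = sum(weights[author] for author, _ in posts)
    flagged = sum(weights[author] for author, flag in posts if flag)
    return flagged / total


# Made-up data: one heavy poster ("a") and two infrequent posters ("b", "c").
posts = [("a", True)] * 8 + [("b", False), ("c", False)]
weights = inverse_frequency_weights(author for author, _ in posts)

unweighted = sum(flag for _, flag in posts) / len(posts)
weighted = weighted_share(posts, weights)
print(unweighted)  # 0.8, dominated by the heavy poster
print(weighted)    # about 0.33, each author counted once
```

Note that this adjustment only rebalances people who appear in the data. Individuals who never post receive no weight at all, which is why differences between posters and never-posters cannot be addressed this way.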

What is more, we often simply do not know the demographics of social media users. For instance, while 13% of the U.S. online population actively Tweet (Link, 2013), we have little to no user-level information. In addition, about one third of Facebook users have no demographics associated with their profile (Link, 2013). Still, some researchers attempt to address the coverage error issues of social media by collecting demographic information through profile data, through information embedded on specific posts, or by behavioral inference, and weighting (Heerwegh, 2003). Barron (2013) proposed a method of assigning demographic characteristics to Twitter users in order to weight data mined from Twitter to U.S. population control totals. This method extracts latent characteristics (such as sex, race, and age) based on Twitter behavior. Similarly, Sloan et al. (2013) use natural language processing and text mining to infer Tweeters’ gender, language, and location.
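
To make the idea of weighting to population control totals concrete, the sketch below shows a generic post-stratification adjustment in Python. It is not Barron's (2013) method or any other cited author's procedure; it simply assumes that each user has already been assigned to a demographic group (by profile data or inference) and that population shares for those groups are known. The group labels and shares are invented for illustration.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in counts.items()
    }


# Made-up sample that over-represents younger users.
sample_groups = ["18-29"] * 6 + ["30+"] * 4
# Hypothetical population control shares (not real estimates).
population_shares = {"18-29": 0.2, "30+": 0.8}

group_weights = poststratification_weights(sample_groups, population_shares)
for group, weight in sorted(group_weights.items()):
    print(group, round(weight, 3))
```

A weight below one shrinks the contribution of an over-represented group and a weight above one inflates an under-represented group. The adjustment is only as good as the group assignments, which for social media users are often inferred rather than observed.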

Coverage error in social media research is further complicated because the relationship between unique users and unique accounts is not necessarily one-to-one. Some individuals have multiple accounts on the same platform, many accounts are shared by multiple users, and accounts can also represent companies or products instead of individuals. Further, some subset of social media accounts and posts are not genuine and exist primarily to deliver spam (Nexgate, 2013).
 
Another factor affecting representativeness concerns access to the Internet and proficiency in using it (Stern, Bilgen, & Dillman, 2014). There are clearly differences between Internet access disparities and the variation found in social media usage. For example, research has demonstrated that race, education, rurality, and socio-economic status play a role in social media usage and proficiency (Stern, Adams, & Elasser, 2009; Witte & Mannon, 2010). Duggan and Brenner (2013) show that African-Americans and Latinos use social networking sites and other social media at slightly higher rates than do whites. This is especially true for Twitter and Instagram. Therefore, there are many nuances related to the questions of who uses social media, and how, that we have yet to fully understand. Concentrated study in this area will be important as social media use in research continues to evolve.
 
3.2 Data Completeness and Accuracy
 
Social media research often faces issues of incomplete information. Unlike survey respondents, who typically provide information only when prompted, those who use social media tend to post what they want, when they want, prompted or not. Those using social media usually also continue to have the ability to control their content after it is posted. They can edit or remove posts or change the privacy settings related to those posts. Additionally, content in social media research frequently includes linked content across media sources, which is also subject to change (e.g., embedded YouTube videos that are removed from YouTube and no longer work). At the site level, usage policies and practices regarding the content can be changed by the provider at any time. Data that have embedded IP addresses on a given day may not have IP addresses embedded later the same day, and the IP addresses collected earlier by the researcher may later be declared “off limits” to the researcher under new terms of service. Websites that allow users to post usernames may later become anonymous by policy (as is the case with some meme message boards). Data that were available at the outset of a study may no longer be available when a study is complete, either because an individual changed their personal settings or the privacy settings on a specific posting or, at the group level, because a site altered its privacy structure.
 
As a result of incomplete information, capturing and archiving social media data is a challenge. Links can expire (especially shortened links) or lose their original destinations. When relying on an API to collect information, the researcher is dependent on the availability of the API, the completeness of the data collected by the API, and the specific data captured by the API. Twitter’s API, for example, does not return a complete set of query results to general users. Instead, Twitter sells complete data only to a small set of selected vendors. Further, Twitter archives its records, limiting the total number of Tweets that can be accessed from any given account.

Along with the threat of incomplete data from social media, however, is the opportunity that comes from its abundance.  Survey data are typically captured from individuals once in cross-sectional studies and a limited number of times within particular time frames with longitudinal studies.  Social media, on the other hand, allows for a more continuous look at opinion, attitudes, and behaviors, when shared and when reflecting the truth.  The fact that social media is inherently public to some extent affects the likelihood that an individual will share the type of data in which we are interested. Stigma and social desirability may prevent honest and open sharing on certain topics.

Incomplete information can be partially addressed by linking social media to other sources. For instance, social media data are frequently accompanied by metadata and paradata, allowing researchers to conduct analyses about the author of the post, the time and date of the post, and the location of the poster or the place where the post was made. This can be useful, for instance, when using geocoded data to examine geographic differences in the phenomenon of interest. The quality of these geographic data can vary, however. Hecht and colleagues (2011) examined the validity of user-provided location on Twitter. They found that 34% of Twitter users in their sample had inaccurate location information. They also concluded, though, that coders could often determine the location of the user (at least to country and state) from “implicit” location information in the content of their Tweets.
 
3.3 Analysis of Social Media Data
 
In contrast to survey research, social media researchers less frequently specify individual people as the unit of analysis. The units of analysis in social media research can be individual posts, words, unique users, pages, or the like. Choosing a unit of analysis is usually a vitally important part of data selection and analysis.

Data analysis is done in a variety of ways by researchers from a wide variety of backgrounds. The most common form of data in social media analysis is textual data, and textual data can be analyzed in many different ways.  Text mining algorithms are popular with a growing legion of data scientists, and sophisticated computer programs have been built by machine learning experts to tackle the challenge of finding meaning in social media and other textual data. There is a growing set of text classifiers that are built by natural language processing specialists and linguists to uncover relevant underlying textual patterns using exploratory data visualizations, or markedly smaller-scale qualitative observations.

There is a natural tendency to focus on the ease of access rather than the strengths and challenges inherent in the analytical process. Arguably, the influence of processing errors in social media research is rarely regarded as seriously as merited. Automated textual analysis in social media research is done through structured queries. Traditionally, these queries were done by first finding and describing the external structure or genre of the data, then determining the grammatical structure of the data, and then by using these structures to isolate potentially meaningful information. Machine learning techniques have become increasingly common, allowing for data structure to be determined through automated repetitive iterations of a program rather than through grammatical rules or the iterative knowledge gained and employed by the programmer. Whether queries are built by program or by programmer, layered queries are subject to layers of potential error and researchers must take care to understand their data, their analysis methods, and where automated routines may lead to erroneous interpretation of social media data.

Query quality is measured in two primary ways: precision and recall. Precision is the proportion of records collected by a query that belong to the set of records the researcher intended to target. Recall is the proportion of targeted records that are collected by a query; a less specific query will have higher recall than a more specific one. Anyone who has ever conducted a web search (e.g. Google or Bing) has been exposed to the difficulty of balancing precision and recall. Some search results will be correct, and some will not. And some elements that were intended in a search may not be reflected in the results. Queries in social media research tend to be multilayered to accommodate these complications. In text research, common communicative strategies such as pronouns, metaphors, irony, and intertextual references are extremely difficult for any query to deal with. The more detailed or targeted the query, the more likely these elements can be addressed; the larger the query, the more likely these elements are to be ignored.
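
The two measures can be computed directly once the set of records a query returned and the set of records the researcher actually wanted are both known. The Python sketch below is a minimal illustration; the function name and the record identifiers are hypothetical, and in real studies the set of targeted records is usually only known for a hand-coded subsample.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a query.

    retrieved: records the query collected
    relevant:  records the researcher intended to target
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up record identifiers: posts 1-4 are truly on topic.
relevant = {1, 2, 3, 4}
broad_query = {1, 2, 3, 4, 5, 6, 7, 8}   # less specific: catches everything, plus noise
narrow_query = {1, 2}                    # more specific: clean, but misses half

print(precision_recall(broad_query, relevant))   # (0.5, 1.0)
print(precision_recall(narrow_query, relevant))  # (1.0, 0.5)
```

The two example queries show the trade-off described above: the broad query recovers every targeted record at the cost of many irrelevant ones, while the narrow query returns only relevant records but misses half of the target set.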
 
Sentiment analysis is a commonly used text analytic strategy that involves the use of specific lexicons as proxies for emotional orientations (see Bontcheva and Rout, 2012 for a review of this literature). For example, in a sentiment analysis of Tweets during a political debate, researchers could use scripted queries which would find co-occurrences of candidate names and words from lexical sets. The results of this analysis would be intended to show whether Tweeters were commenting positively or negatively about each candidate during the course of the debate. These techniques can be conducted very quickly and can measure change over time and in real time, but they are complicated by slang, domain specific terms, negation, pronouns, and the complicated relationship between words and definitions (e.g. “the green one” or flushing/Flushing; Murphy, et al., 2011). Additionally, they often miss communicative devices such as irony and metaphor.
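
A toy version of the debate example can be sketched in Python as follows. The lexicons, candidate names, and posts are all invented for illustration, and real sentiment tools use far larger lexicons and more elaborate rules. The sketch counts a post as positive or negative toward a candidate when the candidate's name co-occurs with a word from the corresponding lexical set.

```python
import re

# Tiny illustrative lexicons (assumptions, not a published lexicon).
POSITIVE = {"great", "strong", "good", "win"}
NEGATIVE = {"weak", "bad", "terrible", "lie"}


def score_posts(posts, candidates):
    """Count name/lexicon co-occurrences per candidate."""
    scores = {name: {"positive": 0, "negative": 0} for name in candidates}
    for text in posts:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for name in candidates:
            if name.lower() in tokens:
                if tokens & POSITIVE:
                    scores[name]["positive"] += 1
                if tokens & NEGATIVE:
                    scores[name]["negative"] += 1
    return scores


# Hypothetical candidates and made-up posts.
posts = [
    "Smith was great tonight",
    "Jones looked weak on taxes",
    "Yeah, great answer, Jones...",   # ironic, but counted as positive
    "Smith was not bad",              # negated, but counted as negative
]
print(score_posts(posts, ["Smith", "Jones"]))
```

The last two posts illustrate the weaknesses noted above: the ironic post is scored as positive and the negated post as negative, because simple co-occurrence matching has no way to recognize either device.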
 
Sentiment analyses have been used in public opinion research with differing success but continue to evolve at a markedly fast pace. One of the issues is how to analyze open text: what to keep and what to exclude (Pettit, 2013). Some companies are more forthcoming in describing their algorithms than others; for example, Crimson Hexagon (Rosenstiel and Jurkowitz, 2011) uses the Hopkins and King (2010) method. Maynard and Funk (2011) studied problems with detecting political sentiment in Tweets, finding that sentiment analysis tools do well on long-form pieces of text but poorly on something as short as a Tweet; they offer alternative approaches to sentiment analysis that they dub “opinion mining.” A number of researchers, including Kim and colleagues (2012), have compared automated and human coding of sentiment for the purposes of public opinion research. They showed that automated processes have not reached the point needed for reliability and that standards are needed in this area of research. Further research is needed regarding the comparative quality of sentiment analysis algorithms for social media data across platforms and vendors.
 
4.0 CURRENT USES AND EVALUATIONS OF SOCIAL MEDIA IN RESEARCH

Public opinion and survey researchers have recognized social media as a potentially valuable resource to conduct research more quickly, efficiently, and in new ways than in the past. Researchers in related disciplines such as market research, public health, and political science have also recognized this potential. Researchers in each of these fields have begun exploratory research with social media to determine just where and how it can provide value. In this section, we discuss current uses for social media both within the survey lifecycle and as a supplement or replacement for surveys. We first discuss active methods for using social media in qualitative research to inform survey design. Next, we discuss survey recruitment using social media and the use of social media for respondent locating. Finally, we discuss passive analysis of social media data to supplement, or in lieu of, surveys.

In the design phase of the survey lifecycle, social media has been used to inform questionnaire design, allowing researchers new insights into the survey topics and populations under consideration. In testing and preparing for data collection, social media has been used for targeted recruitment of respondents for cognitive interviews and focus groups, using non-probability sampling methods (AAPOR, 2013). In longitudinal studies, social media has been used to actively locate or stay in touch with sample members through outreach and engagement efforts. Finally, data from social media and web-based systems have been used as both a supplement and proxy for survey data by “scraping” websites for information on people’s self-reported characteristics, behaviors, opinions, and interests.
 
4.1 Social Media to Inform the Survey Process
 
Social media can be used in survey research planning to gain a better understanding of a topic, population, or a reaction to a survey instrument. Qualitative methods like netnography (net-ethnography) can provide key insights when planning a survey. Social media tools may also aid the process of pretesting survey items. Two pretesting techniques, focus groups and cognitive interviewing, appear to have good potential for adaptation and use via social media.

Netnography is a type of ethnographic research explained in depth by Kozinets (2010) that has emerged in the web-based digital age and may aid researchers in developing survey topics and improving questionnaire designs.  It builds on ethnographic approaches historically used to draw insight on culture and communities in the formative stage of survey data collection. Ethnographic methods, including participant observation, in-depth qualitative interviewing, and source document analysis, are used to identify question domains and concepts for measurement (Coreil et al. 1989), identify culturally appropriate terminology and wording, anticipate cultural factors critical to designing questionnaire items, and determine valid ranges and response options for survey answers.

In many respects, publicly posted information on the Internet provides an ideal environment in which to observe ordinary behavior and conduct this sort of research. Although it may draw on existing online content for passive analysis, netnography has traditionally involved an active researcher as a member or participant in the online community they are studying. More recently, however, large amounts of social media data are harvested and observed without interaction or follow-up interviews from the researcher. Many netnographers, including Kozinets, discourage netnography without active involvement from the researcher, believing that it is only through interaction between the researcher and the research subjects that the subjects’ behavior can really be understood.

Netnographic methods can be harnessed for online qualitative, formative pre-survey research in a similar fashion to the application of ethnographic methods in face-to-face field research. For example, netnographic methods could be used in the early stages of designing a study to:
  • Identify question domains and concepts,
  • Discover terminology used by the population of interest, and
  • Assess any language conventions that should or should not be used when collecting data on a specific topic or from a specific population.
As an early example of netnographic research, Kozinets (2002) conducted netnography of coffee consumers in the online newsgroup and content sharing platform alt.coffee. Postings relevant to coffee, specifically espresso and Starbucks, were downloaded and analyzed in order to measure variation in tastes and preferences for types of coffee.  “Member checks” involving active interaction with community members helped refine the information gathered from posts and provide additional detail.  The netnographic analysis identified themes in the data and was able to uncover key differences between “basic” coffee and “real” or “essential” (better quality) coffee, identify the status symbol of home espresso brewing, and group perceptions of Starbucks’ coffee quality and culture. This type of analysis is less intrusive and less expensive than a probability sample of coffee drinkers or focus group data collection with coffee connoisseurs.

A key advantage of netnography is that its cultural focus enables the capture of naturalistic language and terminology about research topics—much more so than surveys and even focus groups. Although this can be an end in itself, it is also advantageous for developing proper terminology for survey items. For example, Baker and colleagues (2010) suggest that a netnographic approach could be used to collect data for health studies from an online community like Patients Like Me (www.patientslikeme.com). A netnographer could join the community, share the research goals with other community users, and follow the progression and treatment of disease, well-being, and prognosis among members.

A risk of replacing traditional ethnographic methods with netnographic methods for survey development is the inherent bias generated when collecting data from only the online population. If survey designers and substantive experts rely only on netnography to develop domains and terminology about concepts (for example, identifying survey themes and issues for a survey on cancer from an online cancer support group only) then the survey might incorrectly measure only attributes of the online population of interest (which, depending on the online platform, is likely to be younger, better educated, urban, more mobile, etc.). Additionally, terminology used to discuss certain topics online may differ from offline counterparts, as digital communities may develop their own terminologies and behavioral conventions.

Focus group interviews, carefully planned discussions with small groups of people in which a moderator guides the conversational topics and encourages interaction among participants, are an opinion collection mode in their own right, but are also often used to pretest surveys. Focus groups are often used to identify salient topics among populations of interest to researchers, pretest questionnaire wording, instructions, layout and organization by soliciting early feedback, and evaluate survey contact materials and policies. Traditionally, focus groups have been conducted in-person in offices or research facilities or other quiet public meeting spaces.
 
With the emergence of web chat rooms and message boards in the 1990s, researchers began to use these platforms for focus group data collection (see Clapper & Massey 1996; Gaiser 1997; Schneider et al., 2002; Underhill & Olmsted, 2003). In the early twenty-first century, the emergence of social media has offered even more options for researchers conducting online focus groups. Broad, national groups of individuals can now be brought together to deliberate on important issues in an online forum (Luskin, Fishkin, and Iyengar, 2004). Likewise, it is becoming common for organizations to host Twitter chats around certain topics where people are invited to follow Tweets during a certain time period using a certain hashtag and join the discussion. These chats are typically hosted by an individual posing the questions and asking follow-ups, much like a focus group.

Some market research companies have begun replacing traditional focus group research with online and social media feedback. Social media channels can be used in many ways to encourage consumers to comment about products and brands. Market data gathering via social media can be as simple as querying Twitter and Facebook followers about tastes and preferences for an instantaneous response or a more sophisticated setup of private groups of consumers who provide routine feedback (along with their available social media data), akin to members of a traditional online consumer panel.

Wingate (2013) compared data collected in a study of experiences of new owners of an appliance brand via interaction on a Facebook group to data collected through a traditional series of focus groups. Participants were recruited for both groups through a list of registered owners of products from the company. The in-person focus groups were comprised of four 90-minute groups of five consumers each, flown in from across the country for data collection. Each focus group spent time discussing the user manual, product website and registration, emails and social media, dealer and showroom follow-up, and ideas for new owners. The Facebook group was made up of 30 participants who participated for 30 minutes a week over seven weeks. Each week, users discussed one topic (user manual, website and registration, etc.) from the in-person focus group guide. The Facebook group generated information about ongoing problems and resolutions new users had with their appliances, as well as data on types of information new users were seeking but unable to find. Data on this type of ongoing experience was less accessible in the one-time focus groups. However, the focus groups generated useful information on the paper version of the user manual. Participants were able to drill down into the details of the manual text and highlight navigational problems with the manual. Ultimately, both methods generated unique and valuable insights, with the Facebook method proving less costly (since travel was not required) and generating a broader range of data.

These findings suggest survey researchers should consider social media tools among their options when conducting focus groups. However, as in assessing any method, researchers should consider ways the approach might produce biased results. The Facebook group in the case study was conducted with a group of active, highly engaged product users. This type of engagement in an issue or topic area may be useful for scientific survey researchers as well, especially when pretesting methods for use with a very specific population. However, the biases associated with such engagement must be considered carefully with the results.

Cognitive interviews are another common method of pretesting. Typically, this involves semi-structured interviews administered concurrently or immediately following a test administration of a questionnaire under development. By administering think-aloud probes to respondents as they determine their answers to questions and probing, cognitive interviews assess properties of survey instruments and items that may lead to measurement error in surveys, such as problems with respondent comprehension, burden, and ability to map true answers onto response categories (Willis, 2005). Similar to focus group administration, cognitive interviews are traditionally conducted face-to-face in cognitive or survey labs at research facilities. Cognitive interviewers often react to verbal and nonverbal respondent cues to probe on any sources of confusion. Although the ability to draw on such cues is diminished, some researchers have reported improved efficiency without a substantial loss of quality by conducting cognitive interviews without a physically present interviewer by telephone or over the web (Bergstrom, et al., 2013; Edgar, 2013; Murphy, Keating, & Edgar, 2013). 

Emergent social media platforms enable even more new interfaces for cognitive interview data collection online that can better facilitate these processes. For instance, the virtual world Second Life allows users to enter a 3-D world that operates like a video game in which users move their avatars around the environment to walk (or fly) around and interact with other users. Dean and colleagues (2013) compared cognitive interviewing in Second Life to face-to-face remote interviewing via Skype and found both platforms feasible for cognitive interviewing. They evaluated interviews for functionality, participant engagement, and the number and type of cognitive interview errors identified. They found the ability to see the participant’s real face in Skype, rather than the Second Life avatar, resulted in more observations of potential measurement errors in the form of question problems as well as nonverbal cues. Second Life interviews, however, were slightly less likely to show disengagement.
 
4.2 Study Recruitment
 
Keeping the quality considerations from the previous section in mind, there are several examples of when recruitment via social media may fit the needs of a study. One example of using social media for survey recruitment can be found in Bhutta (2012).  In her study of baptized Catholics, the author created a Facebook group named “Please Help Me Find Baptized Catholics!” She contacted the administrators from other Catholic-centered Facebook groups to recruit members for her study.  She also sent a message to her own Facebook friends to recruit Catholics to join her group.  When she felt she had a sufficient number of group members (n=7,500 over three groups; she used three because Facebook group sizes are capped at 5,000 individuals), she messaged them with an invitation to take the survey along with a link. Relative to the General Social Survey, her respondents were disproportionally female, young, educated, and religious; the latter is not particularly surprising given the sample.  She received over 3,500 responses to her survey. In this example, questionnaire saliency obviously provided some motivation for respondents, which raises bias concerns, but the approach did amass a large sample with very few resources. This method of recruitment for a more general population may prove more difficult than using a more traditional frame, but using a “group-centered” approach has worked with hard to reach populations such as foreign-nationals (e.g., Baltar and Brunet 2012).
 
Given the difficulty with sampling in a way that is generalizable to a population, a second way of recruiting a more diverse set of respondents is to use a pay-per-click ad campaign.  Social networking sites like Facebook provide options for advertising by outside organizations in order to generate revenue. In this situation, a researcher bids (usually for $1 to $5 per resulting click) on an online auction against other advertisers to have their ad featured on users’ pages.  The higher the bid, the more likely one’s ad is to be shown to “active” users (i.e., those individuals who click on ads with great regularity).  Using this format, a researcher may also run a targeted campaign.  For instance, if a researcher wants to target women between the ages of 35 and 45 years, a few clicks of the mouse can limit the ad to people meeting that criterion. Because Facebook, for example, collects demographic information from members when they create an account, this targeting can be easily accomplished.

Facebook Ads, for example, can also be built to target user interests on certain topic areas.  Demographic data used by Facebook in targeting come from multiple sources. For example, self-reported data, provided on users’ profile pages, are applied for targeting. These data include age, education, hometown, etc.  Such ads can be targeted at users all around the world, providing a cost-effective method for international survey recruitment. However, although such information is a good resource for researchers wanting to quickly learn the opinions of certain demographic groups of consumers, it is important to note when recruiting on Facebook and other social media sites, recruitment is typically targeted at individuals matching a certain set of characteristics rather than specifically sampled individuals (Popkin, 2012). Similar tools are available through social media platforms such as Google Plus.  On this platform, however, social media is used to create and target ads but not display them.  The ads are displayed outside of Google Plus on the Google Display Network (Wasserman, 2013).

Until recently, a major limitation of recruiting on Facebook has been that ads were only presented when the site was accessed from a full-resolution web browser on a desktop or laptop computer. As of this writing, mobile ads have been introduced, which is increasingly important as more and more people principally access the mobile versions of the site either with web browsers or apps on phones and tablet computers (ComScore, 2013). The cost of recruiting through ads depends on the market (e.g. certain international markets are more expensive), and the delivery platform (Social News Daily, 2013).

This ad-based approach has been shown to be one workable way to recruit respondents. For example, Ramo and Prochaska (2012) used this method on Facebook to recruit participants for a study of cigarette users between the ages of 18 and 25. Within three months, they had obtained a sample of 3,093 individuals who were eligible for the study, with 1,548 completing the survey at a cost of $4.28 per complete. Pay-per-click ads have been shown to be effective in other studies focused on hard-to-reach populations as well (e.g., Knox and Nunan, 2012). However, Stern, Wolter, and Bilgen (2012) used a Facebook campaign with an ad soliciting respondents to a technology study with no targeting, meaning it could appear on any member’s page, and found that the demographics of people who clicked the ad differed from those of people who actually completed the survey. The Facebook-side data showed that the majority of people who clicked the ad were between 18 and 25 years old, whereas the demographic questions in the survey showed a much more representative age distribution. This implies that younger users were more likely than older users to click the ad but never complete the survey. Moreover, this non-targeted approach was slower to yield an adequate number of respondents and twice as expensive as the more targeted approaches cited above.
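The Ramo and Prochaska (2012) figures cited above can be turned into a few simple recruitment metrics. The short Python sketch below is illustrative only. It assumes that the reported cost per complete equals total ad spend divided by completed surveys, so the total spend and the cost per eligible recruit it prints are implied values, not figures reported by the study. The variable names are ours.

```python
# Figures reported above for Ramo and Prochaska (2012).
eligible = 3093            # individuals eligible for the study
completes = 1548           # individuals who completed the survey
cost_per_complete = 4.28   # US dollars per completed survey

# Share of eligible recruits who went on to finish the survey.
completion_rate = completes / eligible

# Assumption: cost per complete = total ad spend / completes.
# The two values below are therefore implied, not reported.
implied_total_spend = completes * cost_per_complete
implied_cost_per_eligible = implied_total_spend / eligible

print(f"Completion rate among eligibles: {completion_rate:.1%}")
print(f"Implied total ad spend: ${implied_total_spend:,.2f}")
print(f"Implied cost per eligible recruit: ${implied_cost_per_eligible:.2f}")
```

The same three lines of arithmetic can be applied to any ad campaign for which eligibles, completes, and spend are tracked, which makes targeted and non-targeted campaigns easier to compare on a common footing.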

Sage, Dean, and Richards (2012) explored the value of Facebook advertising as a viable solution to recruiting study participants when a probability-based sample is not required for research needs. The authors demonstrated how Facebook advertising methods were used to recruit study participants for several types of projects, mostly falling into the pretesting realm. They focused on “fit for purpose” situations that are conducive to the utilization of Facebook advertising for recruitment, looking into the reach, or potential audience, for an advertisement; cost associated with ad development, implementation, and level of effort; and the limitations and advantages of Facebook's advertising technique.

Antoun, et al. (2013) examined the performance of four online sources (Craigslist, Facebook, Google Ads and Amazon Mechanical Turk) for recruiting participants. They found very different performance between two types of online recruitment strategies: those that “pull-in” online users actively looking for paid work (e.g., Turk workers and Craigslist users) and those that “push-out” a recruiting ad to online users engaged in other, unrelated online activities (e.g., Google Ads and Facebook). They found that, although the pull-in recruiting strategy was more cost efficient, the two push-out recruiting sources seemed to reach a more diverse user base. They also found differences in commitment to the task and willingness to disclose personal information through the different techniques.

Researchers may also use Twitter for survey recruitment, across all platforms. Here, promoted tweets and trends may be applied, allowing researchers to reach out to users around a specific hashtag (#) (Twitter 2013a & 2013b). In addition, binary polls (“yes/no,” “agree/disagree”) with hyperlinks embedded within the 140-character question are used, along with simple hyperlinks within Tweets that redirect Twitter users to third-party sites hosting the actual survey.
 
4.3 Locating Sample Members
 
Researchers have started to take advantage of social networking services to enhance tracing and to locate study participants for follow-up interviews. Several recent articles describe how Facebook has been used for locating participants. Facebook locating has been helpful to studies following up with participants in intervention studies and longitudinal studies.  None of the published articles use an experimental design.  Rather, Facebook is just an additional measure taken to find respondents (Borie-Holtz, 2012; Feelman, et al., 2013; Jaffee & Mills, 2012).

Several studies have reported modest success in locating respondents using social media, measuring the reduction in attrition by the number of respondents who were reached through Facebook. Although the added coverage from Facebook may not make enough of a difference on some surveys, it can be much more important for surveys of hard-to-reach populations as well as longitudinal studies.

A study of adult methamphetamine users followed up with its sample members approximately eight years after participation in the initial study and found 48 of 511 surviving eligible respondents on Facebook. Eleven of these completed the interview, reducing attrition by 2%. These eleven respondents tended to be younger, female, and more mobile than the rest of the sample (Bolanos et al., 2012).

In one longitudinal telephone survey of families with young children in Oklahoma (Rhodes & Marks 2011), Facebook was used to locate parents who had participated in a telephone interview three years previously. The study team used their personal Facebook accounts to contact people who were not reachable by telephone or mail efforts. Of the 919 non-respondents, 294 (32%) were found on Facebook and 92 of these completed an interview. Those 92 represented 4% of all completed interviews. Another study, a follow-up with young adults who had participated in the Longitudinal Studies of Child Abuse and Neglect (LONGSCAN) as children, used Facebook and MySpace to locate and contact sample members and generated approximately the same results (Nwadiuko et al., 2011). Ultimately, 4% of the sample was retained through MySpace and Facebook contacts. Unlike Rhodes and Marks (2011), Nwadiuko and colleagues established personal profiles specifically for the study on Facebook and MySpace and used these profiles to find 35 subjects, 7 of whom agreed to remain in the study sample.

A physical activity study of adolescent girls used Facebook friending to recruit sample members into a follow-up study. The initial study had 730 eighth grade girls; the follow-up was attempted when they were in eleventh grade. Of the initial 730, 175 could not be found using traditional methods, but 78 of the 175 were identified on Facebook. The researchers set up a Facebook profile associated with the study and used it to send friend requests to the girls. Sixty-eight of the girls accepted the friend request and 43 of those who accepted the friend request ultimately participated in the study. The 43 girls made up 6% of the initial sample. As with many of the aforementioned studies, respondents in the Facebook group differed from the larger group of respondents; they had lower average BMI and body fat as well as lower numbers of daily minutes of physical activity (Jones et al. 2012).
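The locating results reported in the studies above can be laid side by side as a simple funnel. The Python sketch below is a hedged illustration: the class and field names are ours, and the "base" for each study is whatever group the source describes the Facebook search as starting from (all surviving eligible respondents for Bolanos et al., non-respondents for Rhodes and Marks, and girls not found by traditional methods for Jones et al.). Because those bases differ, the rates are not strictly comparable across studies.

```python
from dataclasses import dataclass


@dataclass
class LocatingResult:
    study: str
    base: int       # group the Facebook search started from
    found: int      # members of that group identified on Facebook
    completed: int  # of those found, number who completed or participated

    @property
    def found_rate(self) -> float:
        """Share of the base group that was found on Facebook."""
        return self.found / self.base

    @property
    def completion_among_found(self) -> float:
        """Share of those found who completed or participated."""
        return self.completed / self.found


# Counts as reported in the text above.
results = [
    LocatingResult("Bolanos et al. (2012)", base=511, found=48, completed=11),
    LocatingResult("Rhodes & Marks (2011)", base=919, found=294, completed=92),
    LocatingResult("Jones et al. (2012)", base=175, found=78, completed=43),
]

for r in results:
    print(
        f"{r.study}: found {r.found_rate:.0%} of {r.base}; "
        f"{r.completion_among_found:.0%} of those found completed"
    )
```

Separating the share found from the share completing among those found helps show where each locating effort lost cases, which a single retention percentage does not reveal.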

Facebook locating efforts have not always proven successful, however. One study’s locating experiment was cut short when its account was labeled as spam by Facebook, leading the authors to conclude that this was not a viable method of recruitment (Ruggiere, et al., 2012). The authors matched a list of names and cities in their sample to people with accounts on Facebook believed to be the intended person. A study Facebook page sent messages asking potential respondents to watch their mail for an invitation to participate in the survey and encouraging them to respond to the invitation or to participate whenever an interviewer called. Eventually, Facebook labeled the health study account as spam, leading to a premature end to the contact phase of the experiment.

Notably, the studies referenced here used different methods of locating using Facebook, including creating study pages and “friending” study participants. We urge practitioners to be cautious about the terms and conditions of the individual sites, as well as any policy, legal or ethical issues regarding privacy and confidentiality that may be encountered using these different methods of contact.
 
4.4 Social Media as a Supplement or Replacement for Surveys
 
As survey researchers, we are accustomed to actively seeking data from study participants, but this can be costly, time-consuming, and burdensome for respondents.  Also, depending on the analytic goals of a study, researcher intervention may not yield the most accurate information. Natural behavioral and linguistic data that are untarnished by observer paradox (i.e. people who know they are being observed behave differently) are highly prized in fields like psychology and linguistics, although it has been historically difficult to collect these kinds of data without researcher intervention. Additionally, there are situations when the benefits to analyzing existing, or secondary, data outweigh those of collecting new, or primary, data.  Like administrative records, social media data can, in certain situations, be a cost-effective, high-quality alternative to surveys.  Their main advantages include the fact that they already exist, and there is no need for contacting respondents and hence no added respondent burden (Biemer and Lyberg, 2003). Another attractive feature of social media data is they are often available at low or no cost. Gigabytes of existing, or “organic” data are available for all who can allocate the storage space, staff time, and programming skill, possibly allowing circumvention of the traditional costs of sampling, recruiting, and incentivizing respondents.

Posts made publicly available by social media users can be accessed and analyzed by researchers in order to supplement survey research, or even to constitute freestanding, independent research that does not involve surveys at all. Depending on the platform, computer programs can quickly access data from an API and restrict it to a particular topic and/or time frame. Researchers can even set up queries to monitor and provide new data on a continuous basis as information is generated on the sites.  This allows researchers to look at data retrospectively or contemporaneously. These types of searches are also easily adaptable. Researchers may modify the search strategies during data collection without any change in behavior required from study participants.
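The kind of topic and time-frame restriction described above can be sketched in a few lines of Python. This is a minimal illustration, not any platform's actual API: the post records, the field names "text" and "created_at", and the function select_posts are all hypothetical, and a real study would first retrieve records through whatever interface the platform provides and permits.

```python
from datetime import datetime, timezone

# Hypothetical post records for illustration. Real platforms return their
# own field names and formats; "text" and "created_at" are assumptions.
POSTS = [
    {"text": "Got my flu shot today",
     "created_at": datetime(2013, 1, 5, tzinfo=timezone.utc)},
    {"text": "Stuck at home with the FLU again",
     "created_at": datetime(2013, 1, 20, tzinfo=timezone.utc)},
    {"text": "Great game last night",
     "created_at": datetime(2013, 1, 21, tzinfo=timezone.utc)},
    {"text": "Flu season seems late this year",
     "created_at": datetime(2013, 3, 2, tzinfo=timezone.utc)},
]


def select_posts(posts, keywords, start, end):
    """Return posts in [start, end) whose text contains any keyword.

    Matching is a case-insensitive substring test, a deliberate
    simplification: it also matches longer words containing a keyword.
    """
    lowered = [k.lower() for k in keywords]
    return [
        p for p in posts
        if start <= p["created_at"] < end
        and any(k in p["text"].lower() for k in lowered)
    ]


january = select_posts(
    POSTS, ["flu"],
    start=datetime(2013, 1, 1, tzinfo=timezone.utc),
    end=datetime(2013, 2, 1, tzinfo=timezone.utc),
)

# Widening the time frame changes only the query, not anything
# asked of the people who wrote the posts.
first_quarter = select_posts(
    POSTS, ["flu"],
    start=datetime(2013, 1, 1, tzinfo=timezone.utc),
    end=datetime(2013, 4, 1, tzinfo=timezone.utc),
)

print(len(january), len(first_quarter))
```

The second query illustrates the adaptability noted above: the search strategy is revised after the fact, with no change in behavior required from the people whose posts are analyzed.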

From experience with surveys, we know that question wording matters, context matters, design matters, mode matters, and even the race of the interviewer may matter (see Blumenthal, 2012 for a discussion of how these factors can lead to measurable differences in survey measurements). With the example of the interviewer, one benefit of self-administered questionnaires is the elimination of interviewer effects; the drawbacks include a loss of engagement with the respondent and opportunity for clarification of the task. If we go a step further and remove the questionnaire from the research process, we are left with the type of natural observation that is valued by many in the fields of psychology and ethnography to study behavior without influencing it. However, as discussed in the previous section, the lack of control over what is able to be measured and how presents a host of methodological concerns about the quality and relevance of inferences drawn.

A widely cited example of this type of secondary analysis with social media data can be found in Chew and Eysenbach (2010). The researchers analyzed Tweets using the terms “H1N1” and “swine flu” during the 2009 H1N1 pandemic. They conducted a content analysis of Tweets and, through this process, demonstrated how Twitter could be used as a real-time health-trend tracking tool. They found that while H1N1-related tweets were used primarily to disseminate information from credible sources, they were also a source of attitudinal data and experiences. The authors suggested that Tweets can be used for real-time content monitoring and may help health authorities respond to public concerns. Paul and Drezde (2009) found a high correlation between the volume of posts on Twitter regarding the flu and official estimates from the Centers for Disease Control. The authors noted that these social media data were much quicker and cheaper to obtain than survey data showing the same trend over time. However, research based on passively collected social media sources is not without its skeptics. Others have warned that the signal-to-noise ratios from sources like Twitter are very low and that because of the unequal use of social media across different types of people (e.g. see Figure 2), results obtained cannot be representative of the general public (Butler, 2013).
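A comparison of post volume against an official benchmark, of the kind described above, typically rests on a correlation coefficient. The Python sketch below computes a Pearson correlation by hand. The weekly numbers are made up purely for illustration and are not taken from any study cited here; the function and variable names are ours.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up weekly values, for illustration only (not real data).
weekly_post_counts = [120, 150, 310, 480, 450, 260]     # flu-related posts
weekly_official_rate = [1.1, 1.4, 2.6, 3.9, 4.1, 2.2]   # benchmark estimate

r = pearson_r(weekly_post_counts, weekly_official_rate)
print(f"r = {r:.2f}")
```

A high coefficient shows only that the two series move together over the period examined. It does not address the signal-to-noise and representativeness concerns raised above, and it still requires an external benchmark against which to compare.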

In the health sector, several studies have attempted to test, like the Chew and Eysenbach (2010) study, whether social media analysis could supplant survey research. Cunningham (2012) attempted to demonstrate how Twitter could be used to draw conclusions about the general population. Knowing that people tend to smoke cigarettes roughly consistently during the day and across the week, whereas alcohol consumption spikes at certain times, he used Twitter data along with previously established behavioral research to find the same trends with respect to both smoking and drinking. Murphy and colleagues (2011) found evidence that trends in discussing drug use on social media can correlate with rates of using drugs. However, the correlations are somewhat weak and the signals reflect different phenomena. In a more recent study, Hanson and colleagues (2013) examined prescription drug abuse via social media, in particular, Twitter mentions of “Adderall” which showed up during traditional college final exam periods and were most prominent in college regions of the U.S. Though these studies show promise, all studies used a measure of existing survey data by which to gauge the success of their social media research.

Other researchers have looked at supplementing health survey research with social media research, to perhaps a more stable end. Squiers and colleagues (2011) supplemented a survey of women age 40–74 with an analysis of social media posts around the time of the controversy surrounding revised breast cancer screening recommendations in the U.S. Although this study did not compare Tweet content and survey results directly, it did demonstrate how the former can supplement the latter when investigating reactions to health guidelines.

Other studies have attempted to make use of real-time social media like Twitter to track issues like HIV incidence and drug-related behaviors, aiming to detect and potentially prevent outbreaks.  Young and colleagues (2014) suggest it may be possible to predict sexual risk and drug use behaviors by monitoring Tweets, mapping from where those messages came and linking them with data on the geographical distribution of HIV cases. The researchers collected more than 550 million Tweets over six months, and searched for potentially risky behaviors, such as "sex" or "get high." Plotting the Tweets on a map, they identified where the Tweets originated and how these patterns correlated with reported HIV cases from other sources. The researchers found a significant relationship between the risky behaviors reported on Twitter and counties with the highest numbers of HIV cases. They note that the main weakness of the study is the age of the HIV data, which was much less current than the social media data used.

In addition to health issues, social media data have been used in recent years to conduct passive analysis on political and social issues. The last few election cycles have afforded the field much opportunity to study possibilities related to polling and social media research. Gayo-Avello (2013) presents a meta-analysis of studies using Twitter data to predict election results, in which he also covers various considerations that would be more broadly applicable, such as sentiment analysis, performance metrics, sources of bias in the data, and methods of cleaning the data. He concludes that “the predictive power of Twitter regarding elections has been greatly exaggerated, and that hard research problems still lie ahead.” Similar to the argument with health research, Gayo-Avello notes that all of the studies he was able to find “predicted” election results after the election was over, and thus had existing data on which to “train” the social media searches.

Stepping out of the U.S. context, de Voogd, Chelala, and Schwarzer (2012) also examined Twitter in light of the 2012 French Presidential Campaign and found high correlations between media exposure, public opinion polls and sentiment measures of contents of Tweets. During Dutch elections in 2012, Hosch-Dayican and colleagues (2013) studied how to measure issue salience and issue ownership passively and in a way that would be comparable to traditional survey data, setting the stage for passive polling that could be used in future elections. Their study, like many others, indicated that more work needs to be done before Twitter research can stand alone (if it can), without a baseline from traditional surveys.

Some empirical studies have tried to address coverage issues in social media research by comparing its results to results from survey research. Mitchell and Hitlin (2013) of the Pew Research Center found dramatic differences between Twitter sentiment and public opinion polls in the lead up to the presidential election in 2012. These differences were not consistently in favor of one candidate or another, revealing a lack of consistent bias and an inability to use adjusted Twitter data to represent general public opinion.

Pew has also used social media data as a supplement to survey reports. In the immediate wake of a viral phenomenon in 2012, where a warlord named Joseph Kony suddenly became the focus of many tweets and a viral video, Pew followed up by reporting on survey results and an analysis using software from Crimson Hexagon in order to explain the unfolding phenomenon in great depth within a short timeframe. In this case, there was no direct mapping of survey participants to social media participants, but the analyses were complementary (Rainie et al., 2012). Similarly, Jakic (2012) researched the prediction of general sentiment polarity in reactions to news articles before the articles were even posted. Using Reddit as the data source for news and comments, he automatically labeled comments using a sentiment prediction model and demonstrated that, in limited cases, such prediction is feasible.

Veenstra and colleagues (2011) examined Tweets to measure sentiment about the Wisconsin Labor Strikes. While they found that Twitter sentiment did not reflect the broader public opinion trends, they learned something about the use of “retweeting.” Users in this study were more interested in sharing news items than partisan discussion. Non-mainstream news outlets were retweeted more.  Veenstra and colleagues showed that Twitter is being used to present an alternative narrative of the protest, spreading news not covered in traditional and higher-profile outlets. Information novelty is thus part of what makes it spread online. Davis, van Kessel, and Jugovich (2013) conducted a similar analysis on the Chicago Teachers Strike.

A final example of supplementing surveys with social media is YouGov’s Social Media Analysis (SOMA) Tool3. The tool uses a sample of Twitter and Facebook users from the YouGov online panel members and analyzes, using their own sentiment analysis tool, what the sample “hears” on each sample member’s private Twitter and Facebook news feed. By doing so, it overlays panel members’ demographics and other previously collected survey measures to social media. This strategy goes beyond measuring “volume of social media mentions.” Specifically, the company measures the reach of brands’ social media. In case of a company measuring their perception in social media, by using a combination of survey data (BrandIndex) and analysis of its panel member social media sentiment, YouGov promises the social media tool to understand if social media mentions have a long-term effect on a brand or if they are just a “storm in a teacup” (Morris & Perry, 2012). Such approaches do, however, limit the statistical power of the social media component to a subset of panelists or survey respondents who use social media and provide researchers access to the details of their use.
   
5.0 LEGAL AND ETHICAL CONSIDERATIONS
 
Because regulation of new technologies can be a slow process, assessment of all relevant legal regulations in this area of research is challenging. The lack of legal guidance specific to new technologies puts respondents potentially at risk and leaves researchers with unanswered questions. For example, consider a U.S. resident who is contacted to participate in research via social media while they are outside of the U.S. Does U.S. law apply, the law of the country the respondent is in, or the law of the country of the research organization? Some nation-states have their own legal protections for human research subjects on their soil (European Union 2010). U.S. regulations are only applicable for research within the U.S. Other countries and political regions have different, and sometimes stricter, protections of human subjects and the preservation of data collected (e.g. countries in the European Union). In the absence of clear legal direction, researchers need to self-regulate, adapting survey screeners and research documentation to accommodate the portability and flexibility of the platform on which we wish to conduct research so as to not erode the protection of human subjects.

Depending on the scope of an organization’s research (e.g. collecting data actively or passively from U.S. or non-U.S. residents either on or off U.S. soil), compliance with non-U.S. research laws to ensure the full protection of local human subjects may be required. The protections provided to human subjects, along with penalties for non-compliance, differ by nation-state (European Union 2010). Organizations interested in data from users in more than one country must obey the laws of each of those countries. With social media users becoming increasingly international, and non-U.S. social media companies gaining presence within the social media environment (KakaoTalk, Line, and WhatsApp), unraveling the associated jurisdictions when conducting research will require increasing efforts by research organizations.

Although it should go without saying, researchers should adhere to any privacy laws of the relevant countries. Within the U.S., government agencies are obliged to follow data collection and privacy laws that would apply to any other data collection or research. The laws that seem most relevant here are the Paperwork Reduction Act (U.S. Code Title 44, Chapter 35), regulations about the Protection of Human Subjects (Code of Federal Regulations Title 45, Part 46), the Confidential Information Protection and Statistical Efficiency Act of 2002 (Title 5 of Public Law 107-347) and the Privacy Act of 1974 (U.S. Code, Title 5).  These laws deal with collection, storage, and confidentiality protection of information from the public.

Laws surrounding issues such as the copyright ownership of works like blogs, pictures, videos, and sound recordings should be carefully monitored as well.
  
5.1 Personally Identifiable Information
 
Personally identifiable information includes names, or information sufficient for names to be re-identified. When mining publicly available information, ESOMAR (2011) recommends masking quotes from reports so that they cannot be searched and re-identified.

A few case studies demonstrate the risks in releasing data from this type of research. In 2006, an AOL research team released twenty million search queries from approximately 650,000 users that were stripped of any identifying information (Ohm, 2009). Immediately, some took on the self-imposed challenge of re-identifying the users.  Two New York Times reporters identified a 62-year-old widow from Georgia from her searches. This resulted in a public relations nightmare and several people at AOL being fired for the data release.

In another case cited by Ohm (2009), a government agency in Massachusetts released records summarizing every state employee’s hospital visits, removing fields containing name, address, SSN, and other “explicit identifiers.” The Massachusetts Governor, William Weld, assured the public that this was safe. A graduate student, Latanya Sweeney, using publicly available voter rolls, was able to identify the governor’s medical records and sent them to him. Dr. Sweeney also used 1990 census data to show that 87% of people in the U.S. were uniquely identified by their combined five-digit ZIP code, birth date, and sex (Ohm, 2009).
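The kind of check behind Sweeney's finding can be illustrated with a minimal sketch: before releasing a "de-identified" file, count how many records are unique on a set of quasi-identifiers. The function name `unique_fraction`, the field names, and the four records below are hypothetical and invented for illustration; they are not drawn from any of the data sets discussed in this report.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    # Build one key per record from the chosen quasi-identifier fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Made-up illustrative records (assumption: not real data).
records = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "00001", "birth_date": "1960-02-02", "sex": "F"},
    {"zip": "00002", "birth_date": "1960-02-02", "sex": "F"},
]

share = unique_fraction(records, ["zip", "birth_date", "sex"])
print(f"{share:.0%} of records are unique on ZIP code, birth date, and sex")
```

A high share of unique records suggests that removing explicit identifiers alone may not be enough to prevent re-identification through linkage to an outside file such as a voter roll.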

In the third case study cited by Ohm (2009), Netflix released approximately one hundred million “anonymized” records on nearly half a million of its users, which included movies rated, rating, and date of the rating, offering a one million dollar prize to the winning team to improve Netflix’s recommendation algorithm. Once again, outside researchers were able to show that people could be re-identified using these data. Ohm (2009) states that researchers were able to show that, “if an adversary knows the precise ratings a person… has assigned to six obscure movies, and nothing else, he will be able to identify that person 84% of the time. If he knows approximately when… a person… has rated 6 movies, whether or not they are obscure, he can identify the person 99% of the time.” Researchers also showed that users could be matched to Amazon raters and new information could be linked. This resulted in a class action lawsuit for privacy violations, and the next contest from Netflix (which would have involved using demographic data) was cancelled.

These cases are provided as cautionary tales for practitioners on the maintenance of confidentiality protection in this new environment.
 
5.2 Terms of Service
 
In addition to legal requirements pertaining to the locations of the researchers and participants, researchers must also adhere to the terms of use of the sites that they wish to use.  At the time of writing, Facebook and Twitter are the most common social media platforms for collecting data. The terms of use for these two sites are drastically, and importantly, different. The terms of use for Facebook include the following:
 
“If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.”
(http://www.facebook.com/legal/terms)
 
Twitter, however, emphasizes the public nature of their data in their privacy policy:
 
“Our Services are primarily designed to help you share information with the world. Most of the information you provide us is information you are asking us to make public. This includes not only the messages you Tweet and the metadata provided with Tweets, such as when you Tweeted, but also the lists you create, the people you follow, the Tweets you mark as favorites or Retweet, and many other bits of information that result from your use of the Services. Our default is almost always to make the information you provide public for as long as you do not delete it from Twitter, but we generally give you settings to make the information more private if you want. Your public information is broadly and instantly disseminated. For instance, your public user profile information and public Tweets may be searchable by search engines and are immediately delivered via SMS and our APIs to a wide range of users and services, with one example being the United States Library of Congress, which archives Tweets for historical purposes. When you share information or content like photos, videos, and links via the Services, you should think carefully about what you are making public.” (https://twitter.com/privacy)
 
5.3 Industry Ethical Guidelines
 
CASRO (2011) and ESOMAR (2011) issued guidelines for their members in order to deal with this new form of research using social media as a platform. The backbone of CASRO and ESOMAR guidelines is the distinction between the privacy of areas of social media, mainly:
  • Private social media. In this case the users expect their comments to be private and shared only among a certain set of people (e.g. friends or circles). These are sometimes referred to as “walled gardens.”
  • Public social media. In these platforms the users have a reasonable expectation that anyone can read, cite, reproduce or generally use the content posted.
In the first case, private social media, researchers are advised to obtain explicit opt-in from participants before using their content (CASRO), clearly identify themselves and include references to their role (ESOMAR), and should NOT copy or scrape content within private areas (ESOMAR). The use of public social media content is allowed for research as long as the platform Terms of Use are respected and the data are masked to protect the anonymity of the content creators (CASRO). Research organizations should check whether the user identity is easily discoverable and, if so, make a reasonable effort to mask it or, when that is not possible, obtain permission from that specific user. In all circumstances, research organizations producing data from social media should be transparent to their research participants in a “timely and open manner” (CASRO).

CASRO (2011) has a specific section of guidelines for using social media platforms to recruit participants for online panels or occasional surveys (e.g. river sampling, or real time sampling) that focuses on informed consent, transparency and disclosure.

Though there is much debate on the topic of informed consent and passive social media data collection, like the guidelines cited above, we argue that the way informed consent is applied depends on the public or private nature of the space. Public spaces can be likened to observing behavior in public. In situations, like Twitter, where the terms of service clearly state that content will be made public, no consent should be necessary to conduct research on publicly available information. Researchers still need to maintain their code of ethics and protect the privacy of their research subjects. Researchers should also note the risks mentioned above with releasing information that can be re-identified. The benefits to the community should always outweigh the potential harm of doing research.

Ethically, the nature of private social spaces requires an active informed consent procedure. As the Facebook terms of service require, private social spaces necessitate that participants are aware that their posts and activities may be used for research purposes. In this setting, researchers should not set up fake identities or false pretenses (profiles, etc.) in order to gather data. Under these guidelines, blanket scraping of data from private spaces is also unethical.

Research social spaces are walled gardens set up for the express purpose of conducting research. These may be communities or websites set up for research. Informed consent procedures need to be established, which may differ in process but typically not in content from other established methods.

Taking the approach that public spaces do not need informed consent procedures but private spaces do assumes an understanding of, and compliance with, the terms of use of the sites that may or may not be justified. It is an open question whether virtual public spaces are perceived as private. In some cases a user may not be aware that they are producing publicly available information, and in many cases users may see research as a misappropriation of the intended use of their post. Neither of these concerns is legally valid, but both are difficult for researchers to address, and these perceptions may change over time.
  
5.4 Other Ethical Considerations for Researchers
 
The Children's Online Privacy Protection Act (COPPA) in the U.S. defines a child as under 13 and prohibits data collection from children without parental consent. Many social media sites’ terms of use prohibit individuals under 13 from joining or using the sites. The ethical question that exists with other vulnerable populations is whether they are able to provide consent to terms of use for themselves.

Children who lie about their age, along with companies such as Twitter that do not collect data about users’ age, present an ethical dilemma for researchers. Though COPPA specifically deals with “knowingly” gathering data from children, there is also an ethical question about using other methods to determine children’s presence on social media and excluding them from analysis.

As we apply new technologies, research organizations may need to consider adopting policies for conducting online mobile research with individuals aged 13 to 18, and whether they should be treated as children, as they would under most traditional survey procedures, or as adults, as laws like COPPA would allow. ESOMAR recommends generally treating those under age 14 as children and 14 through 17 as “young people” - both with a special degree of care (see also ESOMAR Guideline on Interviewing Children and Young People).

The protection of minors in research is probably even more of an issue in the passive data collection realm. For example, if researchers analyze passively collected GPS location data – highly personal information – is it ethical to use data from “participants” over the age of 13, or over the age of 18? We recommend making informed decisions on issues like these prior to implementing passive data collection research.
 
5.5 Public Perception
 
The ethics of social media data is a particularly controversial topic at the time of writing. Different entities have access and rights to different levels of data. Some data are available by purchase (e.g., the Twitter firehose). Some are only completely available to researchers on the inside of the company (e.g., Facebook or Google), and those that the company supplies them to (e.g., government entities). The use of social media data for research purposes is clearly different from using these data for marketing purposes or for law enforcement purposes. The privacy community has reacted strongly and negatively to the U.S. Department of Homeland Security using social media data for the tracking of suspected criminals. Perhaps in reaction to this, CASRO and ESOMAR recommend against using social media to track or market to their clients.
 
The 2012 U.S. presidential election saw the use of a dynamic social-media data-mining campaign by the Obama camp.  This raised concerns by some privacy advocates about the potential for manipulation by way of social media.

Questions remain to be answered about what topics are suitable for passive social media research. In which cases do benefits to the public outweigh the possible harm? And do governments have the “right” to participate in this passive research?

As the reader can gather from this discussion, the legal and ethical considerations of these new technologies are a moving target and the presented guidelines are very likely to be updated in a short time frame. Each guideline should also be read in the legal context of the country where the research is conducted, e.g. in the U.K. by referring to the concepts laid out in the Data Protection Act of 1998 (“Data Protection Act 1998,” 1998). The appendix to this report provides references to further guidance on legal and ethical issues for practitioners.
 
6.0 THE ROAD AHEAD
 
Though a good deal of research to date has focused on an array of issues related to social media, far less is known in terms of if, when, and how such data may be fit for use in public opinion and survey research.   Looking forward, it is necessary for researchers to continue investigation into social media’s potential utility for public opinion research, which will require replicable, impartial, transparent experiments to gauge its effectiveness as a source of opinion, attitudes and behaviors and/or as a platform for collecting such.  Here, we highlight just a few of the priority areas of research for our field.  Surely more questions will arise during the course of further investigation and as the landscape of social media in our society changes over time.
 
6.1 Validating Social Media
 
A question of paramount concern is whether social media, when used as a substantive source of data, can provide accurate answers to certain research questions. How do we know that posts on the Internet mean what we think they mean? Or that they were made by individuals at all? (Although the pace of ethnographic, behavioral and linguistic research on social media is fast, many questions have yet to be answered about the growth of “bots” or computerized postings as well as those “paid to post.”) In order to provide some validation, we will need to interact with those who post social media and learn more about their intentions, attitudes, and behaviors when producing content. Just as we have validated survey items against gold standard data sources, we must also validate social media against more certain sources of information. In some cases, this may involve treating surveys as the standard and validating the information provided by social media, not only at the aggregate level, but at the individual level as well. By measuring a phenomenon with the same individuals using different methods, we can begin to understand the sources of error and quality concerns and begin to make more certain claims about the validity of social media or the lack thereof.
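One way individual-level validation of this kind might be summarized is with a chance-corrected agreement statistic such as Cohen's kappa, computed over the same individuals measured by both methods. The sketch below is only an illustration under stated assumptions: the function name `cohens_kappa`, the "support"/"oppose" categories, and the four paired observations are hypothetical, and the report does not prescribe this particular statistic.

```python
from collections import Counter


def cohens_kappa(survey, social):
    """Chance-corrected agreement between two codings of the same individuals."""
    if len(survey) != len(social) or not survey:
        raise ValueError("Inputs must be non-empty and of equal length")
    n = len(survey)
    # Observed agreement: share of individuals coded identically by both sources.
    observed = sum(a == b for a, b in zip(survey, social)) / n
    # Expected agreement if the two codings were statistically independent.
    survey_counts = Counter(survey)
    social_counts = Counter(social)
    expected = sum(survey_counts[c] * social_counts[c] for c in survey_counts) / (n * n)
    if expected == 1.0:
        # Both sources are constant and identical; kappa is undefined here,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up paired measurements for four hypothetical respondents.
survey_answers = ["support", "support", "oppose", "oppose"]
coded_posts = ["support", "oppose", "oppose", "oppose"]

kappa = cohens_kappa(survey_answers, coded_posts)
print(f"kappa = {kappa:.2f}")
```

Values near 1 indicate that the social media coding tracks the survey answer for the same person; values near 0 indicate agreement no better than chance, even if aggregate totals happen to look similar.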
 
6.2 Addressing Coverage, Sampling, and Differential Access Challenges
 
A second area of concern is whether social media can be representative of the general population or even a more specific group, such as Internet or social media users. Although social media research can accurately reflect activity online, more research is needed to determine whether we can create a frame of social media users from which we can sample individuals for research with a known and non-zero probability. Research into inferred demographics is useful to fill in missing information on those who use social media, and detection of fake and duplicate accounts is also helping produce a clearer picture of the social media landscape, but there is much work to be done to be certain whether and how social media may represent the real world or even a subset of that world.

A related set of issues involves the differential access to and use of the Internet and social media across various subgroups of the population.  Not everyone can access the Internet or can do so with the same level of access in terms of time and location (e.g., home, work, mobile). Moreover, even among those who use the Internet or social media, usage can vary in terms of cross-time engagement, number of different social media sites, types of activities performed on these sites, and even when and where they access social media.  To the extent these rates are associated with important demographic or substantive characteristics, the conclusions drawn from social media could be biased.  Time will tell how pervasive social media access will become.  And it is unknown if true parity in access and use across population groups will ever be achieved or whether we will be able to represent important subgroups well enough for quality measurement.  It is interesting and provocative to note that there are more individuals currently using social media in the U.S. than have a landline telephone (Duggan and Smith, 2013a; Blumberg and Luke, 2013).  But the impact of differential access and use must be better understood and overcome if social media is to become a robust source of public opinion data in the years ahead.
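The potential for bias described above can be made concrete with the standard coverage-bias identity from the survey literature: the bias of a mean computed only from the covered group equals the share of the population not covered multiplied by the difference between the covered and non-covered means. The sketch below assumes hypothetical values throughout (a 25% non-covered share and means of 0.60 and 0.40); none of these numbers comes from this report.

```python
def coverage_bias(covered_mean, noncovered_mean, noncovered_share):
    """Bias of the covered-group mean relative to the full-population mean."""
    if not 0.0 <= noncovered_share <= 1.0:
        raise ValueError("noncovered_share must be between 0 and 1")
    # Bias = (share not covered) * (covered mean - non-covered mean).
    return noncovered_share * (covered_mean - noncovered_mean)


# Hypothetical illustration: 60% of social media users hold an opinion,
# versus 40% of non-users, and non-users make up 25% of the population.
bias = coverage_bias(covered_mean=0.60, noncovered_mean=0.40, noncovered_share=0.25)
print(f"Estimated coverage bias: {bias:+.3f}")
```

The identity shows why both quantities matter: the bias shrinks either as access becomes more widespread or as users and non-users become more alike on the characteristic being measured.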
 
6.3 Designing Better Integrations of Surveys and Social Media
 
Social media research may be conducted in conjunction with surveys in an attempt to add an alternative quantitative perspective or a narrative element to a traditional survey analysis. Although the multiple data sources can converge to help describe the target population, record matching with social media data is extremely difficult and rarely done. In the Pew Research analysis of the Kony 2012 phenomenon described earlier in this report, the researchers were able to describe the unfolding event both through measures of public opinion and through an analysis of Tweets, leading to a nuanced description of the unfolding story that could be published very quickly after the event (Rainie et al., 2012).

To date, few studies have been published that directly compare survey responses with online behaviors (see Mishra, et al., 2012 for an example). But this is an appealing option, both because it may allow areas of survey coverage error to be explored in greater detail than traditional survey research, and because it may allow social media coverage to be explored in unprecedented ways through links to survey and administrative records. This kind of strategy has the potential to better describe survey non-respondents using a higher quality frame than social media research generally allows, but it also has the potential to compound web survey nonresponse with the coverage issues inherent in social media research.
 
6.4 Leveraging the Unique Features of Social Media
 
Social media research has many drawbacks when compared with survey data for the purpose of generalizable research. However, there are unique aspects of social media that make it ideally suited for other types of research. One major advantage of social media is that it can provide a glimpse into the social networks of individuals.  This includes important relationships often hidden from survey researchers, such as “weak ties” (Granovetter, 1973) or links between members of different social or professional circles (e.g. a friend of a friend). For instance, research on the importance of weak ties fueled the development of LinkedIn, a popular professional networking site. Hsieh (2013) shows how asking a respondent to consult records of online contacts, such as friends or contacts on social media, can provide more data on relationships and better coverage of “weak ties” than without consulting these sources. Golbeck (2013) provides an easy to follow introduction to social network analysis with a fair amount of depth. Sage (2013) discussed using a platform like Facebook to conduct social network analysis. Beyond social networks, there may be other unique features that become evident from social media and opportunities to investigate and supplement research with those that are fit for use.
 
6.5 Continuing to Refine Understanding and Guidance on Privacy and Ethics
 
Finally, as with other types of research, we must place paramount importance on questions related to the privacy and ethical implications of social media research. Many questions remain to be answered about what topics are suitable for research with social media. We need a better understanding of the cases where benefits to the public of such research outweigh the possible harm. Research and clear policies on the implications for conducting research with individuals under 18 are needed as well. We also must consider whether governments and researchers have the “right” to participate in passive monitoring of available social media data.
 
Balancing these privacy and ethical concerns along with the quality considerations and great potential for new insights into the study of public opinion, attitudes, and behaviors presents a significant challenge for the field of public opinion and survey research.  It is incumbent upon the field to explore this new world in a way that holds true to our values of ethical research, impartiality, transparency, and maximizing accuracy and quality in our measurements.
 
REFERENCES
 
AAPOR (2010). AAPOR Report on Online Panels. http://www.aapor.org/AM/Template.cfm?Section=AAPOR_Committee_and_Task_Force_Reports&Template=/CM/ContentDisplay.cfm&ContentID=2223
 
AAPOR (2013). Report of the AAPOR Task Force on Non-Probability Sampling. http://www.aapor.org/AM/Template.cfm?Section=Reports1&Template=/CM/ContentDisplay.cfm&ContentID=5963
 
Antoun, C., Zhang, C., Conrad, F.G. & Schober, M.F. (2013). Comparisons of Online Recruitment Strategies: Craigslist, Google Ads and Amazon’s Mechanical Turk. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.

Baker, R., Downes‐LeGuin, T., & Ruyle, E. (2011). Proceedings of the Tenth Conference on Health Survey Research Methods. http://www.srl.uic.edu/Publist/HSRM10_proceedings.pdf
 
Barron, M. (2013). Latent Characteristic Extraction from Twitter Data: Toward Weighting Social Media Data to Make Inferences to the General Public. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Baltar, F., & Brunet, I. (2012). Social Research 2.0: Virtual Snowball Sampling Method Using Facebook. Internet Research 22.1: 57-74.
 
Bergstrom, J.R., Krulikowski, C., Carroll, R., Marsh, K., Luchman, J.N., Helland, K., & Fishcer, M. (2013). A Framework and Usage Model of Young Adult Social Media Usage. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Bhutta, C.B. (2012). Not by the Book: Facebook as a Sampling Frame. Sociological Methods & Research 41(1): 57-88.

Biemer, P.P. & Lyberg, L.E. (2003). Introduction to Survey Quality.  John Wiley & Sons.

Blumenthal, M. (2012). Race Matters: Why Gallup Poll Finds Less Support For President Obama. http://www.huffingtonpost.com/2012/06/17/gallup-poll-race-barack-obama_n_1589937.html
 
Bolanos, F., Herbeck, D., Christou, D., Lovinger, K., Pham, A., Raihan, A., Rodriguez, L., Sheaff, P. & Lynn, M. (2012). Using Facebook to Maximize Follow-up Response Rates in a Longitudinal Study of Adults Who Use Methamphetamine. Subst Abuse 6: 1-11.
 
Bontcheva, K., & Rout, D. (2013). Making Sense of Social Media through Semantics: A Survey. Semantic Web - Interoperability, Usability, Applicability. IOS Press. http://www.semantic-web-journal.net/content/making-sense-social-media-streams-through-semantics-survey
 
Borie-Holtz, D. (2012). Update Your Status Lately? – Then Why Not Respond to Our Survey! Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
boyd, D. and Crawford, K. (2012). Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon. Information, Communication & Society 15.5: 662-679.
 
Blumberg, S. & Luke, J. (2013). Wireless Substitution: Early Release of Estimates From the National Health Interview Survey, July–December 2012. National Center for Health Statistics. http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless201306.pdf
 
Butler, D. (2013, February 13). When Google Got Flu Wrong: US Outbreak Foxes a Leading Web-based Method for Tracking Seasonal Flu. Nature. http://www.nature.com/news/when-google-got-flu-wrong-1.12413
 
Callegaro, M. (2013). Paradata in web surveys. In F. Kreuter (Ed.), Improving surveys with paradata: Analytic use of process information (pp. 261–279). Hoboken, NJ: Wiley.
 
Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE, 5 (11), e14118.
 
Clapper, D.L., & Massey, A.P. (1996). Electronic Focus Groups: A framework for exploration. Information and Management, 30:1, 43-50.
 
ComScore. (2013). “2013 Mobile Future in Focus”
https://www.comscore.com/Insights/Presentations_and_Whitepapers/2013/2013_Mobile_Future_in_Focus3
 
Coreil, J., Augustin, A., Holt, E. & Halsey, N.A. (1989). Use of Ethnographic Research for Instrument Development in a Case-Control Study of Immunization Use in Haiti. Int. J. Epidemiol. 18(Supplement 2): S33-S37.
 
Cunningham, J.A. (2012) Using Twitter to Measure Behavior Patterns. Epidemiology 23: 764-765.
 
Davis, N.D., Van Kessel, P. & Jugovich, M. (2013). Tweeting the Chicago Teachers Strike: Using Organic Twitter Data and Sentiment Analysis to Understand Support on a Local Issue. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Dean, E., Head, B., & Swicegood, J. (2013). Virtual Cognitive Interviewing in Skype and Second Life. in Hill, Dean and Murphy, eds. Social Media, Sociality and Survey Research. Wiley.
 
de Voogd, L., Chelala, P., & Schwarzer, S. (2012). Do Social Media Affect Public Discourses? A Sentiment Analysis of Political Tweets during the French Presidential Election Campaign. Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
Dey, R. Jelveh, Z., & Ross, K.W. (2012). Facebook Users Have Become Much More Private: A Large-Scale Study. 4th IEEE International Workshop on Security and Social Networking (SESOC). Lugano, Switzerland.
 
Duggan, M. & Brenner, J. (2013) The Demographics of Social Media Users — 2012. Pew Research Center. http://www.pewinternet.org/2013/02/14/the-demographics-of-social-media-users-2012/.
 
Duggan, M., & Smith, A. (2013a). Social Media Update 2013. Pew Research Center. http://www.pewinternet.org/files/old-media/Files/Reports/2013/Social%20Networking%202013_PDF.pdf
 
Duggan, M. & Smith, A. (2013b) 6% of Online Adults are Reddit Users. Pew Research Center. http://www.pewinternet.org/Reports/2013/reddit/Findings.aspx
 
Edgar, J. (2013). Self-Administered Cognitive Interviewing. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Fleeman, A., Francis, K., Henderson, T., Woodford, M., & Jani, M. (2013). The Use of Email, Text Messages, and Facebook to Increase Response Rates among Adolescents in a Longitudinal Study. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Fox, S., & Rainie, L. (2014). The Web at 25 in the U.S. http://www.pewinternet.org/2014/02/27/the-web-at-25-in-the-u-s/
 
Gaiser, T. (1997). Conducting Online Focus Groups. Social Science Computer Review. 15(2), 135–44.
 
Gayo-Avello, D. (2013). A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter Data. Social Science Computer Review. 31(6): 649-679.
 
Granovetter, M. S. (1973). "The Strength of Weak Ties". The American Journal of Sociology 78 (6): 1360–1380.

Golbeck, J. (2013). Analyzing the Social Web. Burlington, MA: Morgan Kaufmann.

Google. (2014). TNS consumer barometer. http://www.consumerbarometer.com/
 
Gruzd, A. & Haythornthwaite, C. (2013). Enabling Community through Social Media. Journal of Medical Internet Research 15(10):e248.
  
Hanson, C.L., Burton, S.H., Giraud-Carrier, C., West, J.H., Barnes, M.D., & Hansen B. (2013). Tweaking and Tweeting: Exploring Twitter for Nonmedical Use of a Psychostimulant Drug (Adderall) Among College Students. J Med Internet Res; 15(4):e62.
 
Haque, S. & Swicegood, J. (2013). Recruiting Participants with Chronic Conditions in Second Life. in Hill, Dean and Murphy, eds. Social Media, Sociality and Survey Research. Wiley.
 
Hecht, B., Hong, L., Suh, B., & Chi, E. (2011). Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems: 237–246.
 
Heerwegh, D. (2003). Explaining Response Latencies and Changing Answers Using Client-side Paradata from a Web Survey. Social Science Computer Review 21.3: 360-373.
 
Holcomb, J., Gottfried, J., & Mitchell, A. (2013). News Use Across Social Networking Platforms. Pew Research Center. http://www.journalism.org/2013/11/14/news-use-across-social-media-platforms/
 
Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54, 229–247.
 
Hosch-Dayican, B., Aarts, K., Amrit, C., and Dassen, A. (2013). Issue Salience and Issue Ownership Online and Offline: Comparing Twitter and Survey Data. APSA 2013 Annual Meeting Paper.
 
Hsieh, Y.P. (2013). Testing Information and Communication Technology (ICT) Recall Aids for Personal Networks Surveys. Presented at the 38th Annual Meeting of the Midwest Association for Public Opinion Research. Chicago, IL.
 
Jacik, B. (2012). Predicting Sentiment of Comments to News on Reddit. http://dare.uva.nl/document/451648
 
Jaffe, E.M., & Mills, M.L. (2012). Evaluating New Technologies for Retention of Rural Youth in Longitudinal Survey Research. Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
Jones, L., Saksvig, B. I., Grieser, M., & Young, D.R. (2012). Recruiting Adolescent Girls into a Follow-up Study: Benefits of Using a Social Networking Website. Contemp Clin Trials 33(2): 268-272.
 
Kim, A., Richards, R., Murphy, J., Sage, A., & Hansen, H. (2012). Can Automated Sentiment Analysis of Twitter Data Replace Human Coding? Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
Knox, S.D., & Nunan, D. (2012) Can search engine advertising help improve online research? International Journal of Market Research, 30. Vol 53, no 4, pp 523 – 540.
 
Kozinets, R.V. (2010). Netnography: Doing ethnographic research online. Thousand Oaks, CA: Sage.
 
Kozinets, R.V. (2002). The Field Behind the Screen: Using Netnography for Marketing Research in Online Communities. Journal of Marketing Research, 39, 61-72.
 
Library of Congress. (2010). Twitter’s Gift. http://www.loc.gov/loc/lcib/1005/twitter.html
 
Linden Labs (2011). The Second Life Economy in Q3 2011. http://community.secondlife.com/t5/Featured-News/The-Second-Life-Economy-in-Q3-2011/ba-p/1166705
 
Link, M. (2013). Emerging technologies: New opportunities, old challenges. Keynote presentation at FedCASIC, Washington, DC.
 
Luskin, R., Fishkin, J., & Iyengar, S. (2004). Considered opinions on U.S. foreign policy: Face-to-face versus online deliberative polling. http://cdd.stanford.edu/research/papers/2004/online-fp.pdf
 
Maynard, D., & Funk, A. (2011). Automatic detection of political opinions in tweets. In Dieter Fensel Raúl García-Castro and Grigoris Antoniou, editors, The Semantic Web: ESWC 2011 Selected Workshop Papers, Lecture Notes in Computer Science. Springer.
 
Mishra, S., Draus, P., Caputo, D., & Leone, G. (2012). A Survey of Social Media Usage Integrating Daily Facebook Participation Time with In-person Social Interaction among college undergraduate students. In AMCIS 2012 Proceedings. http://aisel.aisnet.org/amcis2012/proceedings/Posters/23
 
Mitchell, A., & Hitlin, P. (2013). Twitter Reaction to Events Often at Odds with Overall Public Opinion. Pew Research Center. http://www.pewresearch.org/2013/03/04/twitter-reaction-to-events-often-at-odds-with-overall-public-opinion/
 
Morris, A., & Perry, H. (2012). How and Why Social Media Storms Impact Brands. Presented at the Funky Data: Working with Unconventional Data in Surveys and Research. London: Association for Survey Computing. http://www.asc.org.uk/wordpress/wp-content/uploads/2012/10/ASC0912-P1-Andy-Morris-and-Hannah-Perry-Social-media-storms.pdf
 
Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. (2013). Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Association for the Advancement of Artificial Intelligence. http://arxiv.org/pdf/1306.5204.pdf
 
Murphy, J., Keating, M., & Edgar, J. (2013). Crowdsourcing in the Cognitive Interviewing Process. Proceedings of FCSM Research Conference. http://www.fcsm.gov/events/prior.html
 
Murphy, J., Dean, E. F., Hill, C. A., & Richards, A. K. (2013). Social media, new technologies, and the future of health survey research. In Tenth Conference on Health Survey Research Methods, pp. 231–241. http://www.srl.uic.edu/hsrm/hsrm10_proceedings.pdf
 
Murphy, J., Hill, C.A., & Dean, E. (2013). Social Media, Sociality and Survey Research. In Hill, Dean and Murphy, eds. Social Media, Sociality and Survey Research. Wiley.
 
Murphy, J. J., Kim, A., Hansen, H. M., Richards, A. K., Augustine, C. B., Kroutil, L. A., & Sage, A. J. (2011, September). Twitter feeds and Google search query surveillance: Can they supplement survey data collection? Proceedings of the Association for Survey Computing Sixth International Conference. http://www.asc.org.uk/publications/proceedings/ASC2011Proceedings.pdf
 
Nexgate. (2013). 2013 State of Social Media Spam. http://nexgate.com/wp-content/uploads/2013/09/Nexgate-2013-State-of-Social-Media-Spam-Research-Report.pdf
 
Nielsen. (2012). State of the Media: U.S. Digital Consumer Report. http://www.nielsen.com/us/en/reports/2012/us-digital-consumer-report.html
 
Nwadiuko, J., Isbell, P., Zolotor, A.J., Hussey, J., & Kotch, J. B. (2011). Using Social Networking Sites in Subject Tracing. Field Methods, 23(1), 77-85.
 
O’Connor, B., Balasubramanyan, R., Routledge, B.R., & Smith, N.A. (2010). From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. Proceedings of the International AAAI Conference on Weblogs and Social Media. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/view/1536
 
Paul, M.J., & Dredze, M. (2011). You Are What You Tweet: Analyzing Twitter for Public Health. Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, July 17-21. Palo Alto, CA: AAAI Publications, pp. 265-272.
 
Pettit, A. (2013). Bending the rules and biting the hand. International Journal of Market Research, 55, 13–16.
 
Popkin, H. (2012). Facebook: More than 83 Million Users are Fake. http://www.nbcnews.com/technology/facebook-more-83-million-users-are-fake-919873
 
Rainie, L., Hitlin, P., Jurkowitz, M., Dimock, M., & Neidorf, S. (2012). The Viral Kony 2012 Video. Young Adults & Media. Pew Research Center. http://pewinternet.org/~/media//Files/Reports/2012/The_Viral_Kony_2012_Video.pdf
 
Ramo, D.E., & Prochaska, J.J. (2012). Broad Reach and Targeted Recruitment Using Facebook for an Online Survey of Young Adult Substance Use.
 
Rhodes, B.B., & Marks, E. L. (2011). Using Facebook to Locate Sample Members. Survey Practice, 4(5). http://www.surveypractice.org/index.php/SurveyPractice/article/view/83/html
 
Rosenstiel, T. & Jurkowitz, M. (2011). How News Media and Blogs Have Eyed the Presidential Contenders During the First Phase of the 2012 Race. http://www.journalism.org/files/legacy/CANDIDATESSUTDYFINAL.pdf
 
Ruggiere, P., Sams, A., Niermann, A., & Romero, E. (2012). Viability of Using Facebook to Increase Response Rates in an ABS Survey. Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
Sage, A., Dean, E., & Richards, A. (2012). Facebook Ads: An Adaptive Convenience Sample-Building Mechanism. Presented at the Annual Conference of the American Association for Public Opinion Research. Orlando, FL.
 
Schneider, S., Kerwin, J., Frechtling, J., & Vivari, B. (2002). Characteristics of the Discussion in Online and Face-to-face Focus Groups. Social Science Computer Review 20(1): 31-42.
 
Sloan, L., Morgan, J., Housley, W., Williams, M., Edwards, A., Burnap, P., & Rana, O. (2013). Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter. Sociological Research Online, 18(3), 7. http://www.socresonline.org.uk/18/3/7.html
 
Social News Daily. (2013). “Facebook’s Mobile Ads Now More Expensive than Desktop Options.” http://socialnewsdaily.com/15167/facebooks-mobile-ads-now-more-expensive-than-desktop-options/
 
Smith, A. (2014). 6 new facts about Facebook. Pew Research Center, February 3, 2014. http://www.pewresearch.org/fact-tank/2014/02/03/6-new-facts-about-facebook/
 
Squiers, L., Holden, D. J., Doline, S., Kim, E., Bann, C. M., & Renaud, J. M. (2011). The Public’s Response to the U.S. Preventive Services Task Force’s 2009 Recommendations on Mammography Screening. American Journal of Preventive Medicine, 40, 497–504.
 
Stern, M.J., Adams, A.E., & Elsasser, S. (2009). Digital Inequality and Place: The Effects of Technological Diffusion on Internet Proficiency and Usage across Rural, Suburban, and Urban Counties. Sociological Inquiry 79(4): 391–417.
 
Stern, M. J., Bilgen, I., & Dillman, D. A. (2014). The State of Survey Methodology in the 2010s: Challenges, Dilemmas, and Optimal Solutions in the Era of the Tailored Design. Field Methods.
 
Stern, M. J., Wolter, K. M., & Bilgen, I. (2013). Can We Effectively Sample from Social Media Sites?: Results from Two Sampling Experiments. Presented at the Annual Conference of the American Association for Public Opinion Research. Boston, MA.
 
Twitter. (2013a). “Promoted Trends.” https://business.twitter.com/products/promoted-trends-full-service
 
Twitter. (2013b). “Promoted Tweets.” https://business.twitter.com/products/promoted-tweets-full-service
 
Underhill, C. & Olmsted, M. (2003). An Experimental Comparison of Computer-mediated and Face-to-face Focus Groups. Social Science Computer Review 21(4): 506-12.
 
Veenstra, A., Iyer, N., Bansal, N., Hossain, M., Park, J., & Hong, J. (2011). #Forward!: Twitter as Citizen Journalism in the Wisconsin Labor Protests. Paper presented at the Annual Meeting of the Association for Education in Journalism and Mass Communication. St. Louis, MO. http://citation.allacademic.com/meta/p520757_index.html
 
Wasserman, T. (2013). New Google+ Ads Won't Run on Google+. http://mashable.com/2013/12/09/google-plus-ads-outside-network/
 
Willis, G. (2005). Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage.
 
Wingate, M. (2013). Should you replace focus groups with social media research? In QRCA Views, Fall 2013, pp. 12-20. http://www.mydigitalpublication.com/publication/?i=170487&p=12&utm_source=August+Newsletter&utm_campaign=Successful+Targeting&utm_medium=email
 
Witte, J.C., & Mannon, S.E. (2010). The Internet and Social Inequalities. Routledge.
 
Young, S.D., Rivers, C., & Lewis, B. (2014). Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes. Preventive Medicine.
 
YouTube. (2014). Statistics. http://www.youtube.com/yt/press/statistics.html

APPENDIX: FURTHER READING ON LEGAL AND ETHICAL ISSUES

 
The following guidelines informed the section on legal and ethical issues; we refer the reader to them for more information on these considerations:
 
Association of Internet Researchers (AOIR). (2012). Ethical decision-making and Internet Research. Retrieved from http://aoir.org/reports/ethics2.pdf
 
Council of American Survey Research Organizations (CASRO). (2011). Social media research guidelines. http://c.ymcdn.com/sites/www.casro.org/resource/resmgr/docs/social_media_research_guidel.pdf
 
Data Protection Act 1998. (1998). Retrieved from http://www.legislation.gov.uk/ukpga/1998/29/data.pdf
 
European Society for Opinion and Marketing Research (ESOMAR). (2011). ESOMAR guideline on social media research. Retrieved from http://www.esomar.org/uploads/public/knowledge-and-standards/codes-and-guidelines/ESOMAR-Guideline-on-Social-Media-Research.pdf
 
European Society for Opinion and Marketing Research (ESOMAR). (2012). ESOMAR guideline on conducting mobile market research. Retrieved from http://www.esomar.org/uploads/public/knowledge-and-standards/codes-and-guidelines/ESOMAR_Guideline-for-conducting-Mobile-Market-Research.pdf
 
European Union. (2010). European Textbook on Ethics in Research. Luxembourg: Luxembourg Publication Office of the European Commission. http://ec.europa.eu/research/science-society/document_library/pdf_06/textbook-on-ethics-report_en.pdf
 
Market Research Society (MRS). (2011). Online data collection and privacy discussion paper. Retrieved from https://www.mrs.org.uk/pdf/2011-07-19%20Online%20data%20collection%20and%20privacy.pdf
 
Market Research Society (MRS). (2012). Online data collection and privacy. Response to submissions. Retrieved from https://www.mrs.org.uk/pdf/2012-04-04%20Online%20data%20collection%20and%20privacy.pdf
 
MRA/IMRO. (2010). MRA/IMRO Guide to the top 16 social media research questions. Retrieved from http://www.mranet.org/rq/documents/mra_imro_smr16.pdf