Harnessing the power of the internet and social media in research has many advantages – accessibility, a means to reach large and/or targeted audiences, and remote participation during public health emergencies (e.g., COVID-19). However, while a data collection instrument such as a survey built in Qualtrics or REDCap and posted on the web can reach an intended study population, it also carries the risk of bots posing as human subjects.

A bot (short for internet robot) is a software program that performs automated tasks, mimics human behavior, and can fill out forms in predictable ways. Bots are designed to be hard to detect, adaptable, and smart. They have several uses: chat bots simulate human conversations and answer questions, social bots influence social media platforms, shopping bots cull and organize data or assist with transactions, and crawler bots extract content for internet search engines.

In research, a bot can be programmed to complete many surveys in minutes while posing as an eligible human participant – especially common when a financial incentive or compensation is offered. Because bots are not people, they compromise the integrity of the data and the research.

Tricks to Keep Bots Out of Your Dataset

At the study design phase:

  • Assess whether the inclusion and exclusion criteria are reasonable for the target population(s) and would not attract a bot.
  • Assess whether the participant recruitment strategy would attract a bot – posting a survey link on open sources or public social media pages may be bot-attractive.
  • Assess how long it reasonably takes a human participant to complete study tasks. If a survey should take 10-15 minutes and time tracking shows less than a minute, the response was likely generated by a bot.

With the informed consent:

  • Develop a statement describing when compensation will be withheld (e.g., for duplicate or fraudulent responses) and state it clearly in the informed consent form.
  • Inform participants that they cannot complete a survey more than once.

At data collection instrument development:

  • Never use a public survey link. Instead, use unique, personalized survey links that are accessible only through study sites and only once per participant. Personalized links prevent repeated access to the survey from the same respondent or IP address (a sketch of one-time link tokens follows this list).
  • Use attention checks to test the participant’s understanding of instructions (e.g., “your task is to answer questions about ____”).
  • Bot-proof the survey with a) open-ended questions, b) logic/contrasting-case questions, c) if/then conditional-logic questions, or d) false questions (where a ‘no’ answer is true).
  • Repeat questions as consistency checks. Ask the same question at separate points using different modes and verify that the answers match (a screening sketch follows this list). Some examples:
    1. Ask “What is your age?” and follow later in the survey with “What is your date of birth?”
    2. Ask about age in a drop-down menu and again later as an open-ended question.
  • Include instructional manipulation checks, such as instructions to skip a question, to not click on a scale, or items designed to screen out random clicking.
  • Use a timed release so that the “next” button appears on screen only after a delay.
  • Include a cognitive task required to progress in the survey – reCAPTCHA, QR codes, word scrambles, and “Click Here” prompts derail bots.
  • Place an initial time limit on the survey to see whether a bot attack occurs, and keep the survey available for only a limited time in general.
  • Limit the total number of participants who can respond online. Align this with the project budget for providing compensation.
  • Honeypots – hidden form fields that are invisible to humans but visible to bots, which do not know to skip them. Pair a visible question (e.g., relationship status) with a hidden near-duplicate (e.g., marital status); a response that fills in the hidden field was almost certainly generated by a bot (see the screening sketch below).
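
As a rough illustration of the personalized-link idea above, the sketch below issues a single-use random token per participant. The base URL and participant IDs are hypothetical; in practice, Qualtrics and REDCap can generate personalized, single-use links for you.

```python
import secrets

BASE_URL = "https://survey.example.edu/s/"  # hypothetical survey endpoint

def issue_links(participant_ids):
    """Map each participant ID to a unique, hard-to-guess survey link."""
    links = {}
    for pid in participant_ids:
        token = secrets.token_urlsafe(16)  # 128 bits of randomness
        links[pid] = BASE_URL + token
    return links

# The server honors each token once, then invalidates it, so a link
# cannot be reused or posted publicly.
print(issue_links(["P001", "P002", "P003"]))
```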
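
And here is a minimal screening sketch, under assumed field names, combining two of the checks above: a honeypot field that a human never sees and a consistency check comparing reported age against date of birth.

```python
from datetime import date

def completed_years(dob, today):
    """Whole years elapsed between dob and today."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

def looks_like_bot(response, today=None):
    """Flag a response that trips the honeypot or the age/DOB check."""
    today = today or date.today()
    # Honeypot: a hidden field should come back empty from a human.
    if response.get("marital_status_hidden"):
        return True
    # Consistency: reported age should match the age computed from the
    # date of birth (one year of slack for a recent birthday).
    dob = response.get("date_of_birth")
    age = response.get("age")
    if dob is not None and age is not None:
        if abs(completed_years(dob, today) - int(age)) > 1:
            return True
    return False

# A bot that filled the hidden field and contradicted its own answers:
suspect = {"marital_status_hidden": "married",
           "age": 25, "date_of_birth": date(1961, 4, 3)}
print(looks_like_bot(suspect))  # True
```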

At data analysis:

  • Track study time stamps: record the amount of time it takes each participant to complete the survey and flag any unusual times, as in the sketch below.
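
A minimal sketch of that timing check, assuming ISO-formatted start and end timestamps in the export (survey platforms typically record response timing):

```python
from datetime import datetime

MINIMUM_PLAUSIBLE_SECONDS = 60  # for a survey expected to take 10-15 minutes

def flag_fast_responses(responses):
    """Return the IDs of responses completed implausibly quickly."""
    flagged = []
    for r in responses:
        started = datetime.fromisoformat(r["start_time"])
        finished = datetime.fromisoformat(r["end_time"])
        if (finished - started).total_seconds() < MINIMUM_PLAUSIBLE_SECONDS:
            flagged.append(r["response_id"])
    return flagged

responses = [
    {"response_id": "R1", "start_time": "2024-05-01T10:00:00",
     "end_time": "2024-05-01T10:00:22"},  # 22 seconds – suspicious
    {"response_id": "R2", "start_time": "2024-05-01T10:00:00",
     "end_time": "2024-05-01T10:12:40"},  # ~13 minutes – plausible
]
print(flag_fast_responses(responses))  # ['R1']
```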

My Research was Attacked by Rampaging Bots! Now What Do I Do?

Despite your best efforts, keep in mind that bot programmers can be effective. For example, they have developed ways to create a normal distribution across responses and to craft open-ended answers using language extracted from the survey itself so that they appear logical and believable.

At the same time, human participants can provide quirky responses or repeatedly complete a survey to obtain an incentive payment.

For studies where UNC-CH is the IRB of record, please submit the event as Promptly Reportable Information (PRI) in IRBIS within 7 calendar days of the investigator becoming aware of the event. If you do not have all the information, include this in the submission and indicate that the investigation is ongoing.

The IRB will want to know how the investigator initially attempted to prevent bots from accessing the research (see above). They will ask the investigator to develop an effective strategy for distinguishing bot-generated or spam responses from real, genuine human subject responses, so that legitimate subjects are not penalized, denied their rightful compensation, or stripped of their data. Effective approaches from other researchers involved flagging survey responses that included (a screening sketch follows the list):

  • Unanswered required questions or requests.
  • Inconsistent responses to identical questions.
  • Incomplete surveys.
  • Impossible data values.
  • Illogical responses to open-ended questions.
  • IP addresses outside of the targeted geographic location (e.g., outside NC).
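
A minimal sketch of such a post-hoc screening pass, implementing three of these flags over exported rows. The field names, required-field list, and the precomputed “ip_state” value are all assumptions – real IP geolocation would use a lookup service – and flagged rows should go to manual review rather than automatic exclusion.

```python
REQUIRED_FIELDS = ["age", "zip_code", "consent"]
TARGET_STATE = "NC"

def flag_response(row):
    """Return the reasons a single exported survey row looks suspect."""
    reasons = []
    # Unanswered required questions / incomplete surveys.
    missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
    if missing:
        reasons.append(f"missing required fields: {missing}")
    # Impossible data values.
    age = row.get("age")
    if age is not None and not 0 < int(age) < 120:
        reasons.append(f"impossible age: {age}")
    # IP address geolocated outside the targeted area.
    state = row.get("ip_state")
    if state and state != TARGET_STATE:
        reasons.append(f"IP outside target area: {state}")
    return reasons

rows = [
    {"age": 210, "zip_code": "27514", "consent": "yes", "ip_state": "NC"},
    {"age": 34, "zip_code": "27514", "consent": "yes", "ip_state": "VA"},
]
for i, row in enumerate(rows):
    print(i, flag_response(row))  # row 0: impossible age; row 1: out-of-state IP
```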

Challenging cases may require an email or other communication with participants to determine if their responses were valid.

Lastly, ask for help – the team at the UNC Odum Institute is phenomenal!

Written by Eric Geers, OHRE Compliance Manager
