The ability to rapidly identify people likely to be infected with SARS-CoV-2 both speeds infection control efforts and results in better-powered clinical trials and observational studies. This protocol is a tool for identifying groups of people likely to be infected with SARS-CoV-2, including asymptomatic and pre-symptomatic cases. The purpose of the protocol is to help institutional decision-makers and researchers develop efficient and high-yield SARS-CoV-2 testing strategies that determine who in a given population should be approached for testing or recruited for a study on SARS-CoV-2.
Who should use this protocol? This protocol may be used by decision-makers wishing to optimize transmission control efforts through testing in settings where universal testing is not feasible. It can apply to a variety of institutional settings, such as universities, residential settings, workplaces, or schools. In addition, it is often impractical to test every single individual in a randomized or observational study of SARS-CoV-2. Therefore, this protocol is also useful for researchers wishing to optimize statistical power in any study where people at higher risk of infection must be preferentially sampled.
What information do I need to follow this protocol? This protocol assumes that some testing has already been performed in the target population or in a population similar to the target population. It also requires data on characteristics that could be used to propose testing strategies that would prioritize individuals with certain characteristics for testing. For example, testing data and data from housing contracts can be used to estimate the testing efficiency and yield of a strategy that prioritizes on-campus students for SARS-CoV-2 testing.
What expertise do I need to follow this protocol? Following this protocol requires basic skills in a statistical software package such as SAS, Stata, R or SPSS. In addition, following this protocol requires knowledge about the institution or community where testing will be conducted, including the availability of data and the feasibility of implementing various testing strategies. Therefore, it may be necessary to assemble a team or seek the input of individuals with experience in statistical analysis, institutional or local record keeping, and testing.
What can I learn by following this protocol? This protocol outlines a step-by-step method, complete with technical details and concrete examples, to evaluate candidate SARS-CoV-2 testing strategies in terms of their efficiency and yield. Comparing the efficiency and yield of potential testing strategies can help stakeholders make informed decisions about which testing strategy or strategies to pursue, given available resources.
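Efficiency is the proportion of tests that identify a case; yield is the number of cases a strategy is expected to identify in the target population. To compare these two measures, consider the example below, where a workplace implements this protocol and finds that: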
Strategy 1: “Test all women,” has an efficiency of 0.3, and
Strategy 2: “Test all men,” has an efficiency of 0.12.
Assume that the workplace has 20 women who have not been tested and 100 men who have not been tested. If the workplace implements the more efficient Strategy 1, they will identify, in expectation, 6 cases of SARS-CoV-2.
If the workplace implements Strategy 2, more people will need to be tested to identify any one case, but 12 cases would be identified - twice as many as identified by Strategy 1.
Decision-makers may want to consider a combination of highly efficient, targeted strategies and less efficient, broad-based strategies to optimally identify and prevent outbreaks of COVID-19.
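The arithmetic behind this comparison can be sketched in a few lines of Python. The function name `expected_cases` is illustrative and not part of the protocol; the inputs are the figures from the workplace example above.

```python
def expected_cases(efficiency, n_untested):
    """Expected number of cases found if everyone eligible is tested."""
    return efficiency * n_untested

# Figures from the workplace example above.
strategy_1 = expected_cases(0.30, 20)    # "Test all women"
strategy_2 = expected_cases(0.12, 100)   # "Test all men"

# Tests per case is the inverse of the efficiency.
print(f"Strategy 1: {strategy_1:.0f} expected cases, {1 / 0.30:.1f} tests per case")
print(f"Strategy 2: {strategy_2:.0f} expected cases, {1 / 0.12:.1f} tests per case")
```

The sketch makes the trade-off explicit: the strategy that needs fewer tests per case is not necessarily the one that finds more cases.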
To estimate the yield and efficiency of various testing strategies in a target population of interest, two types of data are needed:
1. Data on SARS-CoV-2 test results and participant characteristics from a specific data collection effort: Ideally, this data collection effort would be conducted among a census or random sample of the target population at baseline. Alternatively, this data collection effort could have taken place outside the target population or among a convenience sample of the target population, but additional assumptions are required. We will refer to data from this data collection effort as the study sample. Testing in the study sample should be conducted without regard to presentation of symptoms or contact with infected individuals. Accordingly, data from testing of close contacts of infected cases or clinical testing sites are not appropriate for use with this protocol.
2. Data on the target population, as outlined below.
If the study sample was not a probability-based sample of individuals in the target population (for example, if testing data reflect a convenience sample of members of the target population, such as all people who showed up for a testing event):
This method requires a dataset with one row per person in the target population. The target population includes all individuals who may be eligible to be tested under any testing strategy (see Approach, Step 1). The dataset should also include available variables describing characteristics of the target population, such as age, sex, year in school, department at work, location of office or residence, or other variables that could be used to identify individuals for a testing strategy.
If test results were collected from a census or probability-based sample of the target population:
It is preferable to procure a dataset with one row per person in the target population to ensure accurate representation of the distribution of characteristics of the target population. If such a dataset is unavailable, however, a dataset including one row per person in the probability-based sample and a variable with the sampling weights can be used instead. This dataset should contain the test results of all members of the sample who have been tested. It should also include descriptive variables measured among members of the probability-based sample.
A note on aggregate data: Examples in this protocol use individual-level data; however, this protocol could also be used with aggregate data on SARS-CoV-2 test results and participant characteristics from a specific data collection effort and/or aggregate data on the target population that describes the distribution of the characteristics in the target population.
Include, in the dataset, descriptive variables that are readily available and can be used by decision-makers and those involved in testing activities to prioritize individuals for testing. Depending on the setting, relevant accessible variables may include age, residence hall, office building, participation in a meal plan, enrollment in in-person classes, and/or others.
SARS-CoV-2 test results from a specific data collection effort are needed to implement this protocol. For this protocol, only include test results from individuals who were tested without regard for presentation of symptoms. Methods to collect such results include testing a probability-based sample of the target population, or convenience sampling methods that are agnostic to symptom presentation, such as testing all potentially exposed persons in an outbreak. Clinical data that overrepresent symptomatic individuals (e.g., data from testing that relies on self- or provider-referral when symptoms are present) should not be used for this protocol.
The steps in this method are:
1. Define the target population
The target population must be clearly defined before candidate strategies can be compared. This step may require input from institutional stakeholders. If the institution does not plan to offer testing to certain individuals, such as remote employees or students in online degree programs, these individuals should be excluded from the dataset before proceeding.
2. Gather available study data to inform testing strategies
Gather test result data from the study sample, and additional datasets with descriptive variables that can be used to characterize members of both the study sample and the target population. Examples of data sources that may include descriptive variables of interest include student enrollment data, health records, work schedules, office rosters, housing rosters, or other sources. This process may require consultation with persons involved in institutional record-keeping.
If the test result data for the study sample already includes all variables that would be of interest for defining subpopulations for testing, it is not necessary to link the testing data with a separate data set containing descriptive variables. In the common case, however, where limited covariates have been collected alongside the test result data, it may be necessary to link the test results to additional descriptive data on the same individuals. Because of the need to link these data, ensure that the descriptive datasets include identifiers that also appear in the test result data for members of the study sample. Depending on the setting, identifiers may be variables such as student ID, employee ID, or name.
3. Identify and link descriptive variables of interest
Among available descriptive variables, consider which variables would be feasible to use in identifying members of the target population to reach for testing. Variables to consider may include area of residence, sex, age, year in school, classroom, etc. These descriptive variables will be used in defining eligibility for candidate testing strategies in Step 5, below. Include as many variables as desired.
If the test result dataset already includes all descriptive variables of interest, continue to Step 4. Otherwise, for members of the study sample, link the test result data to data on the descriptive variables of interest. The resulting data set should contain one record per member of the study sample, with one variable indicating their SARS-CoV-2 test result and one variable for each descriptive characteristic of interest.
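As a minimal sketch of the linkage described in this step, the Python below joins test results to descriptive data on a shared identifier. The records, field names, and student IDs are hypothetical; a real analysis would read them from institutional files.

```python
# Hypothetical test results from the study sample, keyed by student ID
# (1 = positive, 0 = negative).
test_results = {"S001": 1, "S002": 0, "S003": 0}

# Hypothetical descriptive data from a housing roster, keyed by the same ID.
housing = {
    "S001": {"on_campus": 1, "first_year": 1},
    "S002": {"on_campus": 0, "first_year": 0},
    "S004": {"on_campus": 1, "first_year": 0},  # not in the study sample
}

linked = []      # one record per study-sample member with a matching roster row
unmatched = []   # study-sample members with no descriptive data

for student_id, result in test_results.items():
    if student_id in housing:
        linked.append({"id": student_id, "positive": result, **housing[student_id]})
    else:
        unmatched.append(student_id)

print(linked)
print("No descriptive data for:", unmatched)
```

Keeping a list of unmatched identifiers is a design choice that makes linkage failures visible before any estimates are calculated.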
For members of the target population, create a separate data set that includes one record per member of the target population. This data set should include one variable for each descriptive characteristic of interest. If the study sample is a probability sample of the target population, sampling weights and clusters (if applicable) may be included in the dataset containing the test results and characteristics of the study sample in lieu of obtaining a separate dataset with characteristics of the target population. However, the preferred approach is to use a dataset containing the characteristics of all members of the target population, if such a dataset is available.
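When only a weighted probability sample is available, one way to estimate how many people in the target population share a characteristic is to sum the sampling weights of the sample members who have it. This is a common survey-estimation convention rather than something the protocol spells out, and the records and field names below are hypothetical.

```python
# Hypothetical probability sample: each record carries a sampling weight,
# interpreted here as the number of target-population members it represents.
sample = [
    {"on_campus": 1, "weight": 10.0},
    {"on_campus": 1, "weight": 10.0},
    {"on_campus": 0, "weight": 25.0},
    {"on_campus": 0, "weight": 25.0},
]

def weighted_count(records, indicator):
    """Estimated number of target-population members with indicator == 1."""
    return sum(r["weight"] for r in records if r[indicator] == 1)

eligible_in_target = weighted_count(sample, "on_campus")
target_size = sum(r["weight"] for r in sample)

print(f"Estimated on-campus members: {eligible_in_target:.0f} of {target_size:.0f}")
```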
4. Assess bivariate associations
After identifying potential variables, examine bivariate relationships between SARS-CoV-2 infection and each available variable that could be used to develop a testing strategy. One approach is to use two-by-two contingency tables to calculate prevalence ratios and 95% confidence intervals. Each two-by-two table contains counts of individuals in the study sample stratified by their SARS-CoV-2 test result and a single binary characteristic (e.g. gender). Note that any categorical or continuous variables of interest (e.g. year in school, age) must be re-coded as binary variables prior to constructing the two-by-two table. For example, if the dataset contains age in years, a new variable should be created that indicates whether each individual belongs to the age group that is being considered for priority testing. The new indicator variable could, for instance, take a value of one for all individuals over 65 and zero otherwise, take a value of one for all individuals between 18 and 25 and zero otherwise, take a value of one for all individuals under 25 or over 65 and zero otherwise, etc.
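The re-coding of age into binary indicators might look like the following sketch. The indicator names and the handling of boundary ages (for example, whether age 25 counts as "between 18 and 25") are assumptions that each team should set for itself.

```python
def age_indicators(age):
    """Return binary priority-group indicators for one person's age in years."""
    return {
        "over_65": int(age > 65),
        "age_18_to_25": int(18 <= age <= 25),          # boundaries assumed inclusive
        "under_25_or_over_65": int(age < 25 or age > 65),
    }

# Hypothetical ages for illustration.
ages = [19, 24, 25, 40, 66, 70]
recoded = [age_indicators(a) for a in ages]

for age, row in zip(ages, recoded):
    print(age, row)
```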
The following two-by-two table presents the number of SARS-CoV-2 infections among men and women in a hypothetical study sample of size 1,000.
The following formulae may be used to calculate the estimated prevalence ratio and 95% confidence interval comparing the prevalence of SARS-CoV-2 in the index group to the prevalence of SARS-CoV-2 in the reference group. Here, men are the index group and women are the reference group. Note that these formulae yield unbiased estimates only when the study sample is a simple random sample of the target population. In that scenario, the calculations can be done with a calculator or in Microsoft Excel. If the study sample is a probability-based sample other than a simple random sample (i.e., individuals were sampled for the study based on their characteristics), the calculations must be done in a statistical software package that can take into account the sampling weights. Finally, the formula below for the standard error is a large-sample approximation; when the number of infections diagnosed in either the index group or the reference group is fewer than 5, small-sample methods should be used to calculate the confidence limits (sample SAS code is provided in the Appendix).
Formulae for Calculating the Estimated Prevalence Ratio and 95% CI
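The formula figure is not reproduced in this text. As a reference point, the standard large-sample formulae for a prevalence ratio are given below, on the assumption that they match the ones the protocol intends. The cell labels are notation introduced here: a and b are the numbers of infected and uninfected individuals in the index group, and c and d are the corresponding numbers in the reference group.

```latex
\begin{align*}
\widehat{PR} &= \frac{a/(a+b)}{c/(c+d)} \\[4pt]
SE\!\left[\ln \widehat{PR}\right] &= \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}} \\[4pt]
95\%\ \text{CI} &= \exp\!\left(\ln \widehat{PR} \pm 1.96 \times SE\!\left[\ln \widehat{PR}\right]\right)
\end{align*}
```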
In this example, the prevalence ratio comparing the prevalence of SARS-CoV-2 infection among men vs. women is 1.81, with a 95% confidence interval of 1.08 to 3.04. The prevalence ratio is greater than 1, indicating that the prevalence of SARS-CoV-2 is higher among men than among women in the target population.
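For users who prefer a script to a calculator, the large-sample calculation for a simple random sample can be sketched in Python as follows. The function name and the counts are hypothetical; the counts are not those of the protocol's two-by-two table, so the result will not reproduce the estimates quoted above. The confidence interval uses the usual log-scale normal approximation, which is assumed to be the method intended here.

```python
import math

def prevalence_ratio(index_pos, index_total, ref_pos, ref_total, z=1.96):
    """Prevalence ratio (index vs. reference) with a large-sample 95% CI."""
    pr = (index_pos / index_total) / (ref_pos / ref_total)
    # Standard error of ln(PR) under the large-sample approximation.
    se_log = math.sqrt(1 / index_pos - 1 / index_total
                       + 1 / ref_pos - 1 / ref_total)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper

# Hypothetical counts: 30 infections among 400 men, 25 among 600 women.
pr, lower, upper = prevalence_ratio(30, 400, 25, 600)
print(f"PR = {pr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The function does not handle sampling weights or small counts; in those situations the protocol's advice to use statistical software with weighted or exact methods still applies.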
An alternative approach, which some users may find more convenient, is to use log-binomial models to calculate bivariate prevalence ratios and 95% CI. Log-binomial models will produce estimates that are equivalent to those produced by the two-by-two table approach described above. The Appendix provides sample SAS code for users who prefer this approach. We do not recommend adjusting for any additional variables in the models, since unadjusted estimates of association are of interest in this step.
Use the bivariate estimates of association calculated in this step to identify variables associated with the prevalence of SARS-CoV-2 in the target population.
5. List candidate strategies
Generate a list of candidate testing strategies. This list can be informed by factors such as the direction and magnitude of the bivariate associations computed in Step 4 and the feasibility of implementation of testing in specific population sub-groups. Include as many strategies as desired. Combination strategies may also be considered. Implementation of the candidate strategies would involve testing all who are eligible under the strategy (e.g., all students living on campus). This would allow for identification of cases even among those who were unknowingly exposed to SARS-CoV-2 and who have not exhibited symptoms.
Candidate strategies will vary with context and available data. Some examples of candidate testing strategies are provided below. To evaluate these strategies, the dataset must include variables relevant to determining eligibility for the strategies. For example, to evaluate the strategy: “Offer testing to everyone who has used the gym or pool in the past two weeks,” the dataset must include variables that indicate whether individuals used these amenities. This step should be conducted in consultation with institutional decision-makers and those who will implement the proposed testing activities.
The table below shows some potential strategies that may be considered, among others, in various settings.
6. Create strategy indicator variables
For each candidate strategy, create a binary indicator variable that indicates if the person would be eligible for testing under the candidate strategy. For any combined testing strategies to be considered, include a binary indicator variable specific to that combination strategy.
Exclude any participants who refused SARS-CoV-2 testing from the study sample, under the assumption that they would also refuse when implementing a candidate testing strategy.
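A minimal sketch of this step in Python is shown below. The records and variable names are hypothetical, and the combination strategy is coded as "eligible under either component," which is one possible definition; a team could equally define it as "eligible under both."

```python
# Hypothetical study-sample records.
study_sample = [
    {"id": 1, "on_campus": 1, "used_gym": 0, "refused_test": 0},
    {"id": 2, "on_campus": 0, "used_gym": 1, "refused_test": 0},
    {"id": 3, "on_campus": 0, "used_gym": 0, "refused_test": 0},
    {"id": 4, "on_campus": 1, "used_gym": 1, "refused_test": 1},
]

# Exclude participants who refused testing.
tested = [p for p in study_sample if not p["refused_test"]]

# One binary indicator per candidate strategy, including the combination.
for p in tested:
    p["strategy_on_campus"] = int(p["on_campus"] == 1)
    p["strategy_gym"] = int(p["used_gym"] == 1)
    p["strategy_combined"] = int(p["on_campus"] == 1 or p["used_gym"] == 1)

for p in tested:
    print(p)
```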
7. Estimate the efficiency, number needed to test, and yield of each candidate strategy
For each candidate strategy, first restrict the data set for the study sample to people who would be tested under that strategy, based on values assigned for the strategy indicator variable in Step 6. Proceed to calculate the expected efficiency, number needed to test, and yield among only this subset of people.
Calculate the efficiency of the testing strategy by dividing the number of participants with a positive SARS-CoV-2 test result by the number of participants who were tested. For example, if there were 100 people tested among those eligible for Strategy 1, and if 5 were cases, the efficiency would be 5/100 = 0.05.
Calculate the number needed to test, that is, the number of people who would need to be tested to identify one case of SARS-CoV-2. To do this, calculate the inverse of the expected efficiency. For example, if the efficiency of Strategy 1 were 0.05, the number needed to test to find one case would be 1/0.05 = 20 people.
Calculate the yield of the testing strategy in the target population by multiplying the efficiency by the total number of people eligible for testing under the strategy in the target population. If Strategy 1 has an efficiency of 0.05 and there are 60 people eligible for the strategy in the target population, the yield of this strategy in the target population would be 0.05*60 = 3 cases.
For each strategy, repeat the process of sub-setting the study sample to include those eligible for the strategy, and then within the subset, compute the efficiency, number needed to test, and yield in the target population.
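The three calculations in this step can be wrapped in one helper, sketched below in Python. The function and field names are hypothetical; the study sample is built to match the Strategy 1 illustration above (100 eligible people tested, 5 cases, 60 people eligible in the target population).

```python
def evaluate_strategy(study_sample, indicator, n_eligible_in_target):
    """Efficiency, number needed to test, and yield for one candidate strategy."""
    # Restrict the study sample to people eligible under the strategy.
    eligible = [p for p in study_sample if p[indicator] == 1]
    if not eligible:
        raise ValueError("No one in the study sample is eligible for this strategy")
    n_positive = sum(p["positive"] for p in eligible)
    efficiency = n_positive / len(eligible)
    # With zero cases observed, the number needed to test is undefined (infinite).
    nnt = 1 / efficiency if efficiency > 0 else float("inf")
    return {
        "efficiency": efficiency,
        "number_needed_to_test": nnt,
        "yield": efficiency * n_eligible_in_target,
    }

# Hypothetical study sample: 100 eligible people (5 positive) and 50 ineligible.
study_sample = (
    [{"strategy_1": 1, "positive": 1}] * 5
    + [{"strategy_1": 1, "positive": 0}] * 95
    + [{"strategy_1": 0, "positive": 0}] * 50
)

result = evaluate_strategy(study_sample, "strategy_1", n_eligible_in_target=60)
print(result)
```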
8. Compare the values calculated in Step 7 across candidate strategies
Compare the number needed to test and the testing yield across strategies to identify the strategies that are most efficient and/or have the highest yield in the target population. Note that comparing expected efficiency or the number needed to test will result in the same conclusion about the relative efficiency of various testing strategies. We recommend comparing the number needed to test, as it may be more intuitive to interpret.
A university plans to offer free SARS-CoV-2 testing to some of its undergraduate students to help identify asymptomatic cases (i.e., those without signs of sickness but able to transmit the virus) and stop the spread of COVID-19. The university considers its target population for student testing efforts to be all students who attend on-campus classes or reside on campus. Knowing that there is limited laboratory capacity to process tests, tests were offered first to a simple random sample of students returning to campus in August. Data from this first wave of testing constitute the study sample, and administrators will use the test results from the study sample to identify efficient and high-yield testing strategies to roll out testing across campus.
Based on anecdotal evidence from other universities, there are concerns that the virus may be spreading rapidly among first-year students, fraternity members, and students who returned to campus from out-of-state. Accordingly, three possible testing strategies have now been suggested for a subsequent round of testing: test all first-year students, test all students who are members of a fraternity, and test all students with an out-of-state permanent address. The relative efficiency, number needed to test, and yield of these strategies are, however, unknown. This protocol will be implemented to compute and compare the efficiency, number needed to test, and yield for the three candidate strategies.
First, a dataset is created with one row per undergraduate student, indicator variables for the three characteristics of interest (being a first-year student, being a fraternity member, and having an out-of-state permanent address), and the test results of students who were tested in August.
Second, bivariate estimates of association are calculated for each of the three characteristics of interest. Below, we show by-hand calculations for one of the characteristics. The calculations may be done by hand when the study sample is a simple random sample of the target population, and when the number of infections diagnosed in both the index group and the reference group is at least 5. When these two conditions are not met, statistical software should be used to accommodate sampling weights and/or exact methods. Sample SAS code is provided in the Appendix.
The following two-by-two table shows counts of individuals in the simulated study sample of size 216, by their test result and fraternity membership. Recall that in this example, the study sample was a simple random sample of the target population, so we can use the formulae provided in Step 4 to calculate prevalence ratios (PR) and 95% CI comparing the prevalence of SARS-CoV-2 among fraternity members vs. non-fraternity members.
Formulae for Calculating the Estimated Prevalence Ratio and 95% CI
Estimates for all three characteristics of interest are summarized in the table below.
Based on these results, the prevalence of SARS-CoV-2 was higher among fraternity members and students with out-of-state permanent addresses in August. First-year students, on the other hand, do not seem to have experienced a higher prevalence of SARS-CoV-2 in August. Given these results, administrators decide not to test all first-year students, but to consider the following testing strategies:
Strategy 1: Test all fraternity members.
Strategy 2: Test all out-of-state students.
To compare each candidate strategy, the expected efficiency, number needed to test, and expected yield are calculated.
Among the students randomly sampled for testing in August, 24 were fraternity members; of those, 5 tested positive for SARS-CoV-2.
Based on these data:
The expected efficiency of Strategy 1 is 5/24 = 0.208. This means that we expect 20.8% of all tests conducted under Strategy 1 to identify a case of SARS-CoV-2.
The corresponding number needed to test is 1/(5/24) = 4.80. This means that under Strategy 1, we expect that 4.80 (almost 5) people will need to be tested to identify one case of SARS-CoV-2.
Of all undergraduate students, university records show that 980 are fraternity members. Therefore, the expected yield of Strategy 1 is 0.208*980 = 204. This means that if Strategy 1 were implemented, an expected 204 cases of SARS-CoV-2 would be identified.
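Strategy 1: Number Needed to Test = 1 / efficiency = 24/5 = 4.80
Strategy 1: Yield = efficiency * number eligible in the target population = (5/24)*980 = 204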
Among the students randomly sampled for testing in August, 58 had out-of-state permanent addresses; of those, 7 tested positive for SARS-CoV-2.
The expected efficiency of Strategy 2 is 7/58 = 0.121. This means that, in expectation, 12.1% of all people tested under Strategy 2 will be identified as SARS-CoV-2 cases.
The corresponding number needed to test is 1/0.121 = 8.29. This means that just over 8 individuals will need to be tested per case of SARS-CoV-2 identified under Strategy 2.
Of all undergraduate students at the university, records show that 2,487 have out-of-state permanent addresses. Therefore, the expected yield of Strategy 2 is 0.121*2,487 = 300. If Strategy 2 were implemented, the university would expect to find 300 cases of SARS-CoV-2.
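Strategy 2: Number Needed to Test = 1 / efficiency = 58/7 = 8.29
Strategy 2: Yield = efficiency * number eligible in the target population = (7/58)*2,487 = 300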
The table below compares the two candidate strategies.
Based on these results, testing all fraternity members (Strategy 1) is expected to identify cases more efficiently than testing all out-of-state students (Strategy 2). Specifically, approximately 8 out-of-state students would need to be tested to identify one case of SARS-CoV-2 under Strategy 2, whereas only about 5 fraternity members would need to be tested to identify a case under Strategy 1. Testing out-of-state students is expected to result in a higher yield, however, identifying 300 cases compared to the 204 cases expected to be identified under Strategy 1.
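The comparison can be reproduced with a short Python sketch that uses only the counts reported in this example (5 of 24 fraternity members and 7 of 58 out-of-state students testing positive; 980 and 2,487 eligible students, respectively). The dictionary layout and variable names are illustrative.

```python
strategies = {
    "Strategy 1: fraternity members": {"positive": 5, "tested": 24, "eligible": 980},
    "Strategy 2: out-of-state students": {"positive": 7, "tested": 58, "eligible": 2487},
}

summary = {}
for name, s in strategies.items():
    efficiency = s["positive"] / s["tested"]
    summary[name] = {
        "efficiency": efficiency,
        "nnt": 1 / efficiency,
        "yield": efficiency * s["eligible"],
    }

for name, r in summary.items():
    print(f"{name}: efficiency {r['efficiency']:.3f}, "
          f"NNT {r['nnt']:.2f}, yield {r['yield']:.0f}")

# Lowest number needed to test = most efficient; largest yield = most cases found.
most_efficient = min(summary, key=lambda n: summary[n]["nnt"])
highest_yield = max(summary, key=lambda n: summary[n]["yield"])
print("Most efficient:", most_efficient)
print("Highest yield:", highest_yield)
```

Because the sketch works from unrounded efficiencies, the results agree with the rounded figures in the text to the precision shown there.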
Based on the findings of this analysis and considerations about testing capacity, university leadership decides on a two-pronged approach to university-wide testing. First, free testing will be provided to all fraternity members through mobile clinics at fraternity houses. Then, all out-of-state students will be offered tests through the campus health clinic, with appointments made on a rolling basis according to remaining testing capacity. This approach leverages a highly efficient testing strategy and a broader-based, high-yield strategy to ensure that the greatest number of cases is identified as quickly as possible, while accounting for testing capacity constraints.
If data related to transmission are available in the study sample, such as number of contacts among those testing positive, additional comparisons can be made to examine the potential for limiting onward SARS-CoV-2 transmission under each strategy.
For example, if, on average, people with SARS-CoV-2 identified under Strategy A have contact with twice as many people as those identified under Strategy B, identifying cases under Strategy A may be especially beneficial for reducing spread, even if the yield of new cases is slightly lower than for Strategy B.
Data related to transmission probabilities (e.g., total number of contacts in the week prior to SARS-CoV-2 diagnosis) may be collected as part of contact tracing in the study sample.
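One crude way to make such a comparison is to multiply each strategy's expected yield by the average number of contacts reported by its cases. This summary measure is an illustrative assumption, not a calculation specified by the protocol, and the yields and contact counts below are hypothetical.

```python
def contacts_of_identified_cases(expected_yield, mean_contacts_per_case):
    """Rough count of contacts linked to the cases a strategy would identify."""
    return expected_yield * mean_contacts_per_case

# Hypothetical inputs: Strategy A finds slightly fewer cases,
# but its cases report twice as many contacts on average.
strategy_a = contacts_of_identified_cases(expected_yield=90, mean_contacts_per_case=8)
strategy_b = contacts_of_identified_cases(expected_yield=100, mean_contacts_per_case=4)

print("Strategy A:", strategy_a, "contacts among identified cases")
print("Strategy B:", strategy_b, "contacts among identified cases")
```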
The approach described in this protocol is adapted from the following peer-reviewed journal article:
Edwards, Jessie K.; Arimi, Peter; Ssengooba, Freddie; Herce, Michael E.; Mulholland, Grace; Markiewicz, Milissa; Babirye, Susan; Ssendagire, Steven; Weir, Sharon S. Improving HIV outreach testing yield at cross-border venues in East Africa. AIDS. May 1, 2020; Volume 34, Issue 6, p. 923-930. doi: 10.1097/QAD.0000000000002500
The following SAS code walks through the example scenario laid out on page 12.