- 1 Introduction
- 2 Methods
- 3 Results
- 4 Discussion
- 5 Conclusion
SARS-CoV-2 was first identified in December 2019 in Wuhan, China and has since been declared a worldwide pandemic.1 2 Since the initial description of the outbreak, the number of publications, including commentaries, expert opinions, mathematical modelling studies, observational studies, as well as the results of several randomised clinical trials (RCTs)3–5 in patients with SARS-CoV-2 has increased rapidly. Online resource centres have been set up to consolidate SARS-CoV-2-related literature with fast turn-around times for article submission and publication.6 The body of literature surrounding SARS-CoV-2 has thereby rapidly increased.7
High-quality evidence is desirable to guide clinical practice.8 In the hierarchy of quality of original science involving patients, RCTs generally provide the most reliable evidence, followed by cohort studies, case–control studies, case series and case reports.9 Systematic reviews, narrative reviews and commentaries/expert opinions are also sources of summative scientific information. Systematic reviews consolidate evidence through a structured method of collecting data, appraising evidence and synthesising results.10 Robust systematic reviews that include a meta-analysis of high-quality RCTs can provide more reliable evidence than the constituent RCTs themselves.11 A meta-analysis can also assess for publication bias and provide an estimate of effect size towards the research question. Narrative reviews are less reliable because they do not employ a systematic approach to summarising the evidence,12 and are therefore prone to bias.13 Commentaries/expert opinions are similar to case reports and case series as they are frequently anecdotal.11
In this paper, we critically examine the quality of evidence from papers on SARS-CoV-2 infection published in January and February 2020.
This systematic review was reported following the standards outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.14 The research protocol was formulated prospectively and is available online.
A comprehensive search of MEDLINE and EMBASE was conducted on 21 April 2020, with the aid of a medical librarian at McMaster University. We limited our search to papers published from 1 January 2020 to 21 April 2020 and searched for articles using the following keywords: ‘COVID-19’, ‘SARS-CoV-2’, ‘2019-nCoV’, ‘2019 novel coronavirus’, ‘Wuhan coronavirus’, ‘new coronavirus’ and ‘coronavirus’. Results were uploaded to Endnote (Clarivate Analytics) and duplicates were removed.
The articles from EndNote were imported into Microsoft Excel to filter papers by publication date. Studies published in January or February 2020, or with an unspecified publication date, were included. Titles and abstracts were then uploaded to Covidence (Veritas Health Innovation Ltd) for independent screening. Screening was performed in duplicate across two stages. First, two review authors (SI and AS) independently assessed titles and abstracts. Next, included articles underwent full-text analysis by the same two authors to assess eligibility for data extraction. Any discrepancies in eligibility during a given stage were resolved through joint discussion with a third reviewer (SY). The inclusion criterion was any study pertaining to a clinical outcome or characteristic of SARS-CoV-2. Exclusion criteria for title and abstract screening were: studies whose full text could not be sourced, studies not pertaining to SARS-CoV-2, in vitro and in vivo laboratory studies, studies not published in January or February 2020, studies not published in the English language and virus transmission models. In full-text screening, we added the exclusion criterion of commentaries or case reports so that the number of these publication types could be recorded.
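The date filter described above can be illustrated with a minimal Python sketch. The record layout, the function name `passes_date_filter` and the sample records are hypothetical and chosen only for illustration; the authors performed this step in Microsoft Excel, and the exact window boundaries are an assumption based on the months named in the text.

```python
from datetime import date

# Assumed window: the two calendar months named in the text (2020 is a leap year).
WINDOW_START = date(2020, 1, 1)
WINDOW_END = date(2020, 2, 29)

def passes_date_filter(published):
    """Keep records dated within the window, or with no publication date at all."""
    if published is None:  # unspecified date: retained for later screening
        return True
    return WINDOW_START <= published <= WINDOW_END

# Hypothetical records: (title, publication date or None).
records = [
    ("Record A", date(2020, 1, 24)),
    ("Record B", date(2020, 3, 2)),
    ("Record C", None),
]
kept = [title for title, published in records if passes_date_filter(published)]
```

Retaining undated records at this stage defers the decision to title, abstract and full-text screening, where the publication date can be checked by hand.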
Data were extracted from each study independently by two authors (SI and AS), and discrepancies were resolved through discussion until agreement was reached between the reviewers. Information extracted included the journal of publication, journal impact factor, country in which the journal is based, date of publication, number of citations, study design and study topic. Journal impact factors were determined using the 2018 InCites Journal Citation Report, and the number of citations of each study was obtained using Google Scholar (Google LLC, Mountain View, Cali.).
Assessment of methodological quality
Methodological quality was independently assessed by two authors (SI and AS), and discrepancies in assessment were resolved through joint discussion until consensus was reached. Tools used for evaluation included AMSTAR-2 for systematic reviews,15 the Joanna Briggs Institute (JBI) Checklist for case series,16 the Scale for the Assessment of Narrative Review Articles (SANRA) for narrative reviews17 and the Newcastle-Ottawa Scale for non-randomised studies.18 Selections of unclear (U) or N/A were assigned 0 points, and total scores for each study on its corresponding scale were tabulated. Given that AMSTAR-2 does not generate a numerical score, these studies were reported as critically low, low, moderate or high quality per AMSTAR-2.
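As a minimal sketch of the scoring rule above, the following Python snippet totals a yes/no checklist in which unclear and N/A responses earn 0 points. The response labels, the `POINTS` mapping and the example answers are hypothetical; the sketch assumes one point per item answered "yes", as on a 10-item checklist, and does not cover SANRA, whose items are scored 0 to 2.

```python
# Hypothetical response labels; each checklist defines its own items.
POINTS = {"yes": 1, "no": 0, "unclear": 0, "n/a": 0}

def checklist_score(responses):
    """Sum points for one study; unclear and N/A answers contribute 0."""
    return sum(POINTS[r.lower()] for r in responses)

# Made-up answers for a 10-item checklist, not taken from any included study.
example = ["Yes", "Yes", "Unclear", "No", "N/A", "Yes", "Yes", "Yes", "No", "Yes"]
score = checklist_score(example)  # six "Yes" answers
```

Treating unclear and N/A as 0 is conservative: a study cannot gain credit for an item it did not demonstrably report.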
Methods of data analysis and synthesis
Descriptive data were recorded using Microsoft Excel (Microsoft, Redmond, Wash.) and summarised in tables and graphs. Quality assessment data were summarised in subgroups with values reported as median (IQR). The number of patients per study type, as well as across all included studies, was summarised using median (IQR). Spearman’s rank correlation between article quality and journal impact factor was calculated using GraphPad Prism 7 (GraphPad Software, San Diego, Cali.). Journals whose impact factors could not be found were not included in these analyses but were included in the general descriptive results.
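The two summary statistics named above can be sketched in plain Python. This is an illustration only, not the authors' workflow (they used Excel and GraphPad Prism): the function names are hypothetical, the IQR is assumed to be reported as the single value Q3 minus Q1, and the quartile method chosen here may differ from the one those tools apply.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, IQR), with IQR = Q3 - Q1 using the inclusive quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up illustrative numbers, not data from the study.
quality_scores = [7, 8, 6, 9, 5]
impact_factors = [2.1, 3.4, 1.8, 10.2, 4.0]
rho = spearman_rho(quality_scores, impact_factors)
```

A rank-based coefficient suits this comparison because checklist totals are ordinal and journal impact factors are typically skewed.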
After removing 625 duplicates, our initial search yielded a total of 2504 citations. A total of 1963 studies were excluded during title and abstract screening, leaving 541 citations that pertained to a clinical outcome or characteristic of SARS-CoV-2 for full-text screening. Of these 541 studies, 432 were excluded: 295 were commentaries, 36 were case reports and 101 were excluded for reasons listed in the PRISMA flow chart. This left 109 citations for data extraction. The PRISMA flow chart of study selection can be found in the online supplemental file.
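The selection counts above can be checked with a short arithmetic sketch in Python. The variable names are illustrative; the numbers are those reported in the text.

```python
# Counts as reported in the text above.
after_duplicates = 2504
excluded_title_abstract = 1963
full_text_exclusions = {
    "commentaries": 295,
    "case reports": 36,
    "other reasons": 101,
}

# Records remaining after each screening stage.
full_text_screened = after_duplicates - excluded_title_abstract
total_full_text_excluded = sum(full_text_exclusions.values())
included = full_text_screened - total_full_text_excluded
```

The stages reconcile: 541 records reached full-text screening, 432 were excluded there, and 109 remained for extraction.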
General characteristics of studies
The included studies were performed in 15 countries. The number of studies from each country can be found in online supplemental table 7. The USA was the most common country of origin of the included studies (35/109, 32.1%), followed by the UK (29/109, 26.6%) and China (16/109, 14.7%). Of the included studies, 58/109 (53.2%) were narrative reviews, 45/109 (41.3%) were case series, 1/109 (0.9%) was a cohort study and 5/109 (4.6%) were systematic reviews (table 1). We did not find any published randomised clinical trials. The most common study topic was an overview of SARS-CoV-2. The median number of patients in case series and cohort studies was 46 (IQR=66). The median number of patients included in all study types can be found in table 2.
Methodological quality of studies
The quality scales that we used to grade papers assessed certain methodological processes including quality of literature search strategies, presentation of data and use of consecutive participant inclusion. While this approach does not provide a comprehensive critical appraisal of a paper, it enables standard comparisons between papers. The methodological quality of each type of study is summarised in table 3. The specific scoring and description of each study can be found in online supplemental tables 1–4.
The JBI checklist, which consists of a 10-point scale, was used to score case series. No studies received a perfect score and one study received a 0/10 score.19 The median score was 7 (IQR 2). Use of valid methods for identification of all participants’ condition (40/45, 88.9%) and clear reporting of the demographics of participants (43/45, 95.6%) were the highest scoring criteria, while the least reported criteria were consecutive inclusion of participants and clear reporting of the presenting site(s)/clinic(s) demographic information (0/45, 0%).
A total of 58 narrative reviews were identified, with a median score of 9 (IQR 2) on the 12-point SANRA scale. Two studies scored perfectly.20 21 The most reported criteria (where most studies scored 2/2 for that specific criterion) were key statements supported by references (56/58, 96.6%), appropriate scientific reasoning (56/58, 96.6%) and appropriate presentation of data (53/58, 91.4%). The least reported criterion (where most studies scored 0/2) was description of the literature search (2/58, 3.4%).
The cohort study received a score of five on the nine-point Newcastle-Ottawa Scale for non-randomised studies. The study scored highest in the selection category (3/4), and lowest in the outcome category (1/3).
The AMSTAR-2 checklist, which rates overall confidence from critically low to high, was used to assess the systematic reviews. Of the five systematic reviews we included, three were rated ‘low’. These reviews had one critical flaw and may not have provided an accurate and comprehensive summary of their included studies. The remaining two systematic reviews were rated ‘moderate’ as they had more than one non-critical weakness but provided an accurate and comprehensive summary of the available studies.
A two-tailed Spearman’s rank correlation test revealed no significant correlation between study quality and the impact factor of the publishing journal for case series (r=0.11; 95% CI: −0.21 to 0.41; p=0.49) or narrative reviews (r=0.12; 95% CI: −0.17 to 0.40; p=0.40).
In this systematic review, we evaluated clinical studies of COVID-19 published between January and February 2020. Of 541 papers that reported clinical characteristics, 295 were commentaries/expert opinions and 36 were case reports. There were no RCTs, 45 case series studies, 58 narrative reviews, 1 cohort study and 5 systematic reviews. For studies that included more than one patient, the majority consisted of narrative reviews (n=58) and case series (n=45). China was the originating country for over half of the literature (56%) with most of the studies being focused on the diagnosis of COVID-19 (51%). There was no significant correlation between study quality and the impact factor of its publishing journal.
Commentaries and expert opinions can offer practical knowledge for practitioners,22 23 especially when SARS-CoV-2 was first described and little was known about its diagnosis and management. Similarly, case reports can offer insights into the clinical presentation and possible approaches to management in the early stages of the pandemic, although they are severely limited in generalisability due to their n=1 sample size. Both case reports and case series lack a control group and are susceptible to selection and reporting biases, but can provide information that informs and develops hypotheses that can be tested using more rigorous study methodology.24 One example is the initial case series documenting the recovery of patients taking lopinavir–ritonavir therapy, which was later investigated in an RCT.25 Well-designed cohort studies overcome many of the limitations of case series but can still introduce bias. A recent methodological evaluation of the cohort studies and RCTs examining chloroquine and hydroxychloroquine therapy for SARS-CoV-2 found concerning methodological issues.26 While it is important to disseminate information rapidly, especially during a pandemic of a novel virus, it is essential for both authors and journals to ensure scientific rigour.7 The consequences of not adhering to sound scientific principles can be quite drastic, as seen with the recent retraction of a major chloroquine and hydroxychloroquine study in The Lancet that, in retrospect, inappropriately halted many RCTs worldwide. Furthermore, when primary studies are not undertaken with rigour, review articles and meta-analyses inadvertently become at risk of bias as they consolidate and summarise primary literature.
In the setting of a pandemic, each study design can hold value if methodologically sound, but it is important for clinicians to recognise their strengths and limitations. Though RCTs provide the most reliable evidence concerning efficacy and safety of interventions, they can be costly to conduct, require ethics and regulatory approvals and require adequate follow-up.27 Furthermore, other study designs might be needed to bridge knowledge gaps where RCTs cannot, such as exploring disease pathogenesis and characteristics. Systematic reviews can quickly summarise the conclusions across these studies to help disseminate knowledge quickly and efficiently to clinicians.
A strength of our study is that it provides a practical view of the body of literature used to inform early clinical decisions and highlights a need for earlier high-quality controlled study designs such as RCTs and controlled-cohort studies. Furthermore, the use of SANRA and the JBI checklist allowed for analysis of narrative review and case series quality, which are seldom measured. Our study also highlights specific weaknesses in methodology used, allowing authors to incorporate this information in future publications. For example, most narrative reviews did not describe their search strategy, and most case series did not adequately report the presenting site/clinic demographic information.
To the best of our knowledge, there have been no studies published examining the quality of literature in the early phases of other diseases or pandemics. While other studies extracted similar data to our paper, they had a greater focus on literature characteristics without necessarily evaluating literature quality. Further, they included a broader range of studies including non-clinical papers.28 29 Our study also has limitations. While we identified overall poor study designs and moderate quality, we cannot objectively determine whether the quality of evidence was inferior to the usual quality of evidence over a comparable timeframe. Our search was limited to articles published in the English language in January and February 2020. In these very early stages, lower thresholds of quality may be acceptable as the body of literature develops. Citation counts obtained from Google Scholar may be inflated, but nevertheless provide a common basis for comparing the studies.30
In the context of the low-to-moderate quality COVID-19 evidence base, it is important for clinicians to continue practising evidence-based medicine. Clinicians should be competent in appraising the literature and applying relevant findings to the appropriate patient population. It is important not to base practice solely on expert opinions and commentaries. Furthermore, while high impact journals are expected to publish higher quality studies, our study shows this was not the case in early COVID-19 research. This further highlights the need for clinicians to appraise individual studies on their own merits, and not to apply findings on the basis of the publishing journal.
Our study demonstrates that early COVID-19 literature was of, at best, moderate quality, and very limited in terms of study design, with only one cohort study and no RCTs. The quality of the published literature did not correlate with the impact factor of the publishing journal. While these studies are helpful in informing clinical decisions early in the pandemic, clinicians should be cautious in adopting evidence from the early literature into their practices. Further research is warranted to determine how the quality of evidence changes over time, and to see how the emerging evidence informs RCT design.