BioCompute Object
The BioCompute Object (BCO) Project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS, also referred to as next-generation sequencing or massively parallel sequencing). The project's specification has since been standardized as IEEE 2791-2020, and the project files are maintained in an open-source repository.[1] The July 22, 2020 edition of the Federal Register announced that the FDA supports the use of BioCompute (officially IEEE 2791-2020) in regulatory submissions, and that the standard has been included in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.
Begun as a collaborative contract between George Washington University and the Food and Drug Administration, the project has grown to include more than 20 universities, biotechnology companies, public-private partnerships, and pharmaceutical companies, among them Seven Bridges and Harvard Medical School.[2] BCOs aim to ease the exchange of HTS workflows between organizations such as the FDA, pharmaceutical companies, contract research organizations, bioinformatics platform providers, and academic researchers. Because regulatory filings are sensitive, few direct references to them can be published. However, at the time of writing the project was funded to train FDA reviewers and administrators to read and interpret BCOs, and had four publications either submitted or nearly ready for submission.
| Field | Value |
|---|---|
| Status | Active IEEE Working Group |
| Related standards | Common Workflow Language |
| License | BSD-3-Clause |
| Abbreviation | BCO |
| Website | OSF |
Background
One of the biggest challenges in bioinformatics is documenting and sharing scientific workflows in such a way that the computation and its results can be peer-reviewed or reliably reproduced.[3] Bioinformatic pipelines typically use multiple pieces of software, each of which may have several available versions, multiple input parameters, multiple outputs, and possibly platform-specific configurations. As with experimental parameters in a laboratory protocol, small changes in computational parameters can have a large impact on the scientific validity of the results. The BioCompute Framework provides an object-oriented design from which a BCO, containing the details of a pipeline and how it was used, can be constructed, digitally signed, and shared. The BioCompute concept was originally developed to satisfy FDA regulatory research and review needs for the evaluation, validation, and verification of genomics data. However, the BioCompute Framework follows the FAIR Data Principles[4] and can be used more broadly to provide communication and interoperability between different platforms, industries, scientists, and regulators.[5]
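The point that a small parameter change must not go unnoticed can be illustrated with a content checksum. The sketch below is a stand-in technique, not the BCO digital-signature mechanism: it computes a SHA-256 digest over a canonical JSON serialization of a pipeline description. The pipeline structure, tool names, versions, and the `fingerprint` helper are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def fingerprint(pipeline: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON serialization.

    Sorting keys and fixing separators makes logically identical
    descriptions serialize to identical bytes, so only real changes in
    content (tool, version, parameter) change the digest.
    """
    canonical = json.dumps(pipeline, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical two-step pipeline; tool names, versions and parameter
# values are illustrative, not taken from any real BCO.
run_a = {
    "steps": [
        {"tool": "aligner", "version": "2.1.0", "params": {"min_score": 30}},
        {"tool": "variant_caller", "version": "4.0.1", "params": {"ploidy": 2}},
    ]
}

# Deep copy via a JSON round trip, then change a single parameter.
run_b = json.loads(json.dumps(run_a))
run_b["steps"][0]["params"]["min_score"] = 20

if __name__ == "__main__":
    print(fingerprint(run_a))
    print(fingerprint(run_a) == fingerprint(run_b))  # False: the edit is detected
```

A real signature would additionally bind the digest to a signer's key; a bare checksum only shows that the content changed, not who changed it.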
Utility
As a standard for genomic data analysis, BioCompute Objects are most useful to three groups of users: 1) academic researchers carrying out new genetic experiments, 2) pharmaceutical and biotechnology companies that wish to submit work to the FDA for regulatory review, and 3) clinical settings (hospitals and laboratories) that offer genetic tests and personalized medicine. For academic researchers, BCOs make it possible to reproduce experimental data more accurately and with less uncertainty. For entities submitting work to the FDA, they offer a streamlined approach, again with less uncertainty and with the ability to reproduce work more accurately. In clinical settings, it is critical that HTS data and clinical metadata be transmitted accurately, ideally in a standardized form that any stakeholder, including regulatory partners, can read.
Format
A BioCompute Object is written in JSON and, at a minimum, contains all the software versions and parameters needed to evaluate or verify a computational pipeline. It may also contain input data as files or links, reference genomes, or executable Docker components. A BioCompute Object can be integrated with HL7 FHIR as a Provenance resource.[6] Many in the bioinformatics community see the BCO effort as redundant, arguing that the widely adopted Common Workflow Language (CWL) already captures all of this information and offers further capabilities, even though the BCO project aims to represent BCOs as Research Objects that incorporate CWL.[7]
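A minimal sketch of the JSON layout follows, assuming the top-level fields and domains named in IEEE 2791-2020 (provenance, usability, description, execution, parametric, and input/output). The exact list of required keys, all field values, and the placeholder URLs are assumptions for illustration; the published standard and its JSON schema are authoritative, and real BCOs carry many more nested fields. The `missing_fields` helper is hypothetical and performs only a shallow presence check, not full schema validation.

```python
import json

# Top-level fields ASSUMED to be required by IEEE 2791-2020; check this
# list against the published standard and its JSON schema.
REQUIRED_FIELDS = (
    "object_id",
    "spec_version",
    "etag",
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
)


def missing_fields(bco: dict) -> list[str]:
    """Return the assumed-required top-level fields absent from a BCO dict."""
    return [name for name in REQUIRED_FIELDS if name not in bco]


# Skeleton BCO: every value is a placeholder, not real provenance data.
skeleton = {
    "object_id": "https://example.org/BCO_000001/1.0",  # hypothetical URL
    "spec_version": "https://example.org/ieee-2791-schema/2791object.json",  # placeholder
    "etag": "placeholder",
    "provenance_domain": {"name": "Example variant-calling pipeline", "version": "1.0"},
    "usability_domain": ["Illustrates the overall shape of a BCO."],
    "description_domain": {"keywords": ["HTS"], "pipeline_steps": []},
    "execution_domain": {"script": [], "script_driver": "shell"},
    # Optional domain: software parameters, one entry per changed setting.
    "parametric_domain": [{"param": "min_score", "value": "30", "step": "1"}],
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}

if __name__ == "__main__":
    print(json.dumps(skeleton, indent=2))
    print(missing_fields(skeleton))  # [] : every assumed-required field is present
```

Keeping the required-field list in one tuple makes it easy to update if the schema differs from the assumption; in practice a full JSON Schema validator is the appropriate tool.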
BCO Consortium
The BioCompute Object working group provides a means for different stakeholders to give input on current practices for BCOs. The working group was formed during preparations for the 2017 HTS Computational Standards for Regulatory Sciences Workshop and was initially made up of the workshop participants. It has grown continually as a direct result of interaction among stakeholders from all communities interested in standardizing computational HTS data processing. Public-private partnerships formed among universities, private genomic data companies, software platforms, and government and regulatory institutions have been an easy point of entry for new individuals and institutions to join the BCO project and take part in discussions of best practices for the objects.
Implementations
The simple R package biocompute[8] can create, validate, and export BioCompute Objects. The Genomics Compliance Suite is a Shiny app that offers features similar to the regular-expression search found in modern text editors. Several internally developed open-source software packages and web applications implement the BioCompute specification, three of which have been deployed in a publicly accessible AWS EC2 cloud: an instance of the High-performance Integrated Virtual Environment (HIVE), the BioCompute Portal[9] (a form-based web application that can create and edit BioCompute Objects based on the IEEE 2791-2020 standard), and a BioCompute-compliant instance of Galaxy.
References
- Simonyan, Vahan; Goecks, Jeremy; Mazumder, Raja (2017). "Biocompute Objects—A Step towards Evaluation and Validation of Biomedical Scientific Computations". PDA Journal of Pharmaceutical Science and Technology. 71 (2): 136-146. doi:10.5731/pdajpst.2016.006734.
- "BioCompute Objects specifications to advance genomic data analysis". www.europeanpharmaceuticalreview.com. Retrieved 2017-12-21.
- Sandve, Geir Kjetil; Nekrutenko, Anton; Taylor, James; Hovig, Eivind (24 October 2013). "Ten Simple Rules for Reproducible Computational Research". PLOS Computational Biology. 9 (10): e1003285. doi:10.1371/journal.pcbi.1003285. PMC 3812051. PMID 24204232.
- Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem; Santos, Luiz Bonino da Silva (2016-03-15). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244.
- Alterovitz, Gil; Dean, Dennis A.; Goble, Carole; Crusoe, Michael R.; Soiland-Reyes, Stian; Bell, Amanda; Hayes, Anais; King, Charles Hadley H.; Johanson, Elaine; Thompson, Elaine E.; Donaldson, Eric; Tsang, Hsinyi S.; Goecks, Jeremy; Almeida, Jonas S.; Guo, Lydia; Walderhaug, Mark; Walsh, Paul; Kahsay, Robel; Bloom, Toby; Lai, Yuching; Simonyan, Vahan; Mazumder, Raja (21 September 2017). "Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results". bioRxiv: 191783. doi:10.1101/191783 – via www.biorxiv.org.
- "Provenance-example-biocompute-object". HL7 FHIR Release 3 (STU).
- Soiland-Reyes, Stian (2016-12-19). "hive-cwl-examples: Example BioCompute as Research Object with CWL". Retrieved 2017-12-21.
- "CRAN - Package biocompute". cran.r-project.org. Retrieved 2019-11-28.
- "BioCompute Portal". github.com/biocompute-objects. Retrieved 2020-06-25.