The 2017 Market for NGS Informatics: Probing the Commercial Landscape is designed to help companies understand the challenges NGS users face each day. The scale of the data generated is not only an obstacle for individual researchers trying to interpret it; it also raises significant informatics issues for reproducibility and collaboration. NGS users are well aware that simply generating data does not lead to a proportionate increase in knowledge.
The report is based on a survey of 400 scientists using an NGS platform in their labs. The 43-question survey was designed to understand the instruments used as well as the focus of the users’ research or clinical applications. We paid particular attention to each phase of data analysis.
We focused first on questions relating to primary analysis where raw data from the sequencing instrument is transformed into base calls with quality scores (e.g., creating a FASTQ file).
Next, we explored the users’ experiences relating to secondary analysis where scientists analyze the sequence data and reduce it to high-level, genomic-context related data without interpretation (e.g., aligning reads to a genome or creating a VCF file with variants detected and coverage).
Then we asked about methods used for tertiary, or downstream, analysis, ranging from annotation to interpretation and reporting (e.g., trio analysis of a rare disease to identify causal variants).
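To make the output of primary analysis concrete, here is a minimal Python sketch of reading one FASTQ record and decoding its quality string. It assumes the common Phred+33 encoding, and the function names and the sample record are made up for illustration; they do not come from the survey or from any particular tool.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, so score = ASCII code - 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


def error_probability(q):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Illustrative record, not real data.
record = ["@read1", "GATTACA", "+", "IIII5I!"]
read_id, seq, scores = parse_fastq_record(record)
print(read_id, seq, scores)
```

Each base call carries its own score, which is why FASTQ files are roughly twice the size of the sequence alone.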
Sequencing an individual human genome involves generating short reads and mapping them to a known human reference genome. Read alignment maps hundreds of millions of short strings onto a reference string roughly three billion characters long. Scientists and bioinformaticians are challenged to continually develop advanced data structures and highly parallel algorithms to keep up.
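The idea behind read alignment can be sketched in a few lines of Python. This is a toy seed-and-verify approach built on a hash index of k-mers, chosen here only because it is easy to read; production aligners rely on far more compact indexes and handle insertions and deletions, which this sketch does not. The function names, the short reference string, and the mismatch threshold are all illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return sorted start positions where the read matches the reference
    with at most max_mismatches substitutions (no indels)."""
    hits = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy reference; a human reference is about three billion bases.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
print(align_read("TAGCCGAT", reference, index, k=4))
```

Even in this toy form, the index has to be built once and consulted for every read, which hints at why memory-efficient data structures and parallelism matter at genome scale.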
Our research confirms that the direction of genomic research is changing. In the past, only one genome or a handful of genomes were analyzed at a time. Many translational research projects, and the promise of personalized medicine, now involve the analysis of hundreds of genomes in a single run. Storing and transmitting genomic data is another major challenge. According to sequencing expert Shawn Baker, Ph.D., the BAM file (a semi-compressed alignment file) for a single 30X human whole-genome sample is about 90 GB. A relatively modest project of 100 samples would generate nine terabytes of BAM files. The problem is further exacerbated by the creation of metadata files to ensure that data associated with a genome is not lost, and by integration with other types of information such as transcriptomic, methylomic and metabolomic data.
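The storage arithmetic is simple enough to check directly. The short Python sketch below uses the 90 GB per-sample figure quoted above and assumes decimal units (1 TB = 1,000 GB); the function name is ours, and the estimate covers BAM files only, not raw reads, metadata, or other data types.

```python
BAM_GB_PER_SAMPLE = 90  # per-sample figure quoted in the text for a 30X genome


def project_bam_storage_tb(num_samples, gb_per_sample=BAM_GB_PER_SAMPLE):
    """Estimate total BAM storage in terabytes (assumes 1 TB = 1,000 GB)."""
    return num_samples * gb_per_sample / 1000


for samples in (1, 100, 1000):
    print(samples, "samples:", project_bam_storage_tb(samples), "TB")
```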
NGS data analysis, transmission and storage must also take into account data privacy and security, concerns that have inhibited the widespread adoption of cloud-based solutions and are particularly sensitive in clinical applications. Our survey also looked into the workflow management systems labs employ to provide an end-to-end infrastructure for managing the scientific workflow.
NGS Lab Budget Allocation
We learned that the biggest pain point of respondents in dealing with NGS data analysis is not “cost”, but rather the informatics expertise required and data storage requirements. The near-term challenge for commercial software developers will be competing against freeware and in-house developed solutions, and – with an estimated 3,000 solutions on the market – creating awareness of their products’ availability.
Types of NGS Software Used
The scope and scale of sequencing projects will only continue to grow as throughput per platform increases and sequencing costs decrease. But continued advancements in sequencing technology are offset by limits on the ability of scientists to extract biologically or clinically relevant information from the data. Although the report was just published this week, companies are already using it to benchmark their performance against their peers, and to better align their solutions with the expressed needs of the NGS community. Please download an Executive Summary and detailed Table of Contents. The 2017 Market for NGS Informatics: Probing the Commercial Landscape provides deep insights into the needs of end-users so that the many challenges faced by NGS labs can be overcome.