
About Ocean Genomics

Our Mission

To provide you with the world's most efficient, accurate, and comprehensive gene expression analyses by supporting and extending our current state-of-the-art gene expression software.

Why It Matters

The same genome is shared by nearly every cell in your body. Depending on the tissue, cell type, environment, and disease state of the cell, different sets of genes are used in different abundances. Gene expression quantification measures the level of activity of each gene in a particular sample. The genome provides the blueprint, but gene expression captures how that blueprint is actually used at any point in time.

Among thousands of applications, these expression patterns have been used to identify cancer subtypes and matching treatments, to separate breast cancer patients with good versus poor post-treatment prognosis, to predict post-surgery recurrence for prostate cancer patients, and to predict the effects of potential drug compounds.

Gene expression is measured using a technique called RNA-seq, which sequences the expressed genes in proportion to their level of activity. However, RNA-seq provides only raw sequence fragments. Ocean’s software analysis pipeline converts this raw data into usable and interpretable features, supporting drug discovery and personalized treatment recommendations.

The Open Ocean Philosophy and Commitment

The core methods and software around which the Ocean Genomics ecosystem is built are open source; it is our commitment that they remain open source and freely available.

The products, services, and support provided by Ocean Genomics center around an ecosystem of free (as in both speech and beer), open-source software, mostly written by us and our collaborators. The initial development of this software was funded largely through research grants, and the corresponding methods were described in the scientific literature as contributions to advance science. Thus, we think it is important that these tools remain unencumbered by restrictions on use or commercial licensing.

We also believe that the usefulness of gene expression can be expanded by offering additional support, integration, and services around these tools. That is the mission of Ocean Genomics. This additional commitment to state-of-the-art RNA-seq analysis software will help ensure that the open source projects continue to be supported, expanded, and useful. We believe that your experience with these open source tools will be improved by the additional support and care that Ocean Genomics can provide, even if you are not an Ocean Genomics customer.

If you are a user of any of the open source tools that are part of the Ocean Genomics ecosystem, it is our commitment to you that your ability to use, build upon, and modify these tools should not be diminished in any way. These tools will remain open source, and continue to be actively supported, as they currently are, via the relevant GitHub pages and user groups.


Rob and Carl

Our People

We’re experts in expression analysis algorithms.

Carl Kingsford


  • Associate Professor
  • Computational Biology Department
  • Chief Science Officer, Center for Machine Learning and Health
  • Carnegie Mellon University
  • Pittsburgh, PA
  • carl@oceangenomics.com

Rob Patro



  • Assistant Professor
  • Computer Science Department
  • Stony Brook University
  • Stony Brook, NY
  • rob@oceangenomics.com


Senior Programmer


Eric Schultz

Strategic Advisor

  • Evolvagent
  • Advisor, Center for Machine Learning and Health, Carnegie Mellon University
  • Pittsburgh, PA
  • eric@oceangenomics.com

William Kea

Strategic Advisor


Our Publications

Research into gene expression analytics is rapidly expanding, and our team is leading the way.

Alevin efficiently estimates accurate gene abundances from dscRNA-seq data.

Genome Biology. 2019;20:65.

Avi Srivastava, Laraib Malik, Tom Smith, Ian Sudbery, Rob Patro

We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular...

SQUID: transcriptomic structural variation detection from RNA-seq.

Genome Biology. 2018;19:52.

Cong Ma, Mingfu Shao, Carl Kingsford

Transcripts are frequently modified by structural variations, which lead to fused transcripts of either multiple genes, known as a fusion gene, or a gene and a previously...

Graph-guided assembly for novel HLA allele discovery.

Genome Biology. 2018;19:16.

Heewook Lee, Carl Kingsford

Accurate typing of human leukocyte antigen (HLA) is important because HLA genes play important roles in immune responses and disease genesis. Previously available compu...

MUMmer4: A fast and versatile genome alignment system.

PLoS Computational Biology. 2018;14(1):e1005944.

Guillaume Marçais, Arthur L Delcher, Adam M Phillippy, Rachel Coston, Steven L Salzberg, Aleksey Zimin

The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last majo...

Scallop enables accurate assembly of transcripts through phasing-preserving graph decomposition.

Nature Biotechnology. 2017;35:1167-1169.

Mingfu Shao, Carl Kingsford

We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves lon...

Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference.

Nature Methods. 2017;14:417-419.

Rob Patro, Geet Duggal, Michael I Love, Rafael A Irizarry, Carl Kingsford

We introduce Salmon, a method for quantifying transcript abundance from RNA-seq reads that is accurate and fast. Salmon is the first transcriptome-wide quantifier to co...

Fast search of thousands of short-read sequencing experiments.

Nature Biotechnology. 2016;34:300-302.

Brad Solomon, Carl Kingsford

The amount of sequence information in public repositories is growing at a rapid rate. Although these data are likely to contain clinically important information that has n...

Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.

Nature Biotechnology. 2014;32:462-464.

Rob Patro, Stephen M. Mount, Carl Kingsford

We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoi...

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Bioinformatics. 2011;27(6):764-770.

Guillaume Marçais, Carl Kingsford

Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including...

Our Applications

Our team has developed some of the most widely used open source applications for gene expression analysis.
