
Khoi Bui Dinh


Learning how to use DESeq2 through mistakes

Sep 19th, 2025

learning

As with any gentle introduction to bioinformatics training, I found myself looking at DESeq2, a popular choice for differential analysis of RNA-seq gene expression data. In my current job, DESeq2 is a nice final step in our existing RNA-seq workflow: it is how we eventually make sense of the differences between groups. But throughout my months of twisting and working with R to build a complete (functional) DESeq2 script, I have encountered errors, some very subtle and some very disruptive to my work. Here I attempt to summarize them.

How does DESeq2 work?

For a comprehensive understanding of DESeq2, I recommend checking out the StatQuest video series.

Mistakes (bugs) I encountered and how to fix them:

Error #1: Wrong shape for design file

If DESeq2 complains that your counts and metadata do not match, recheck your design (metadata) file, i.e. the table you pass as colData to DESeqDataSetFromMatrix. A good one looks something like this:

group,sample,fastq_1,fastq_2
treatment,sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
treatment,sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
control,sample3,/path/to/sample3_R1.fastq.gz,/path/to/sample3_R2.fastq.gz

Make sure your raw count matrix contains exactly the same samples as your metadata file: one count column per metadata row, in the same order, with column names matching the sample IDs.
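A quick sanity check before building the DESeqDataSet can catch this early. The sketch below is in Python rather than R, and assumes the sample sheet has a `sample` column (as in the example above) and that you already have the list of column names from your count matrix header; `check_samples` is a hypothetical helper, not part of DESeq2.

```python
import csv
import io


def check_samples(sample_sheet_text, count_columns):
    """Compare the `sample` column of a CSV sample sheet with count-matrix column names.

    Returns a list of problems found; an empty list means the two agree.
    """
    rows = list(csv.DictReader(io.StringIO(sample_sheet_text)))
    sheet_ids = [row["sample"] for row in rows]
    problems = []
    if len(sheet_ids) != len(count_columns):
        problems.append(
            f"{len(count_columns)} count columns but {len(sheet_ids)} metadata rows"
        )
    missing_in_counts = sorted(set(sheet_ids) - set(count_columns))
    missing_in_sheet = sorted(set(count_columns) - set(sheet_ids))
    if missing_in_counts:
        problems.append(f"in sample sheet but not in counts: {missing_in_counts}")
    if missing_in_sheet:
        problems.append(f"in counts but not in sample sheet: {missing_in_sheet}")
    # Count columns are paired with metadata rows by position, so order matters too.
    if not problems and sheet_ids != list(count_columns):
        problems.append("same samples, but in a different order")
    return problems


sheet = """group,sample,fastq_1,fastq_2
treatment,sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
treatment,sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
control,sample3,/path/to/sample3_R1.fastq.gz,/path/to/sample3_R2.fastq.gz
"""
print(check_samples(sheet, ["sample1", "sample2"]))
```

Here the count matrix is missing `sample3`, so the check reports both the size mismatch and the missing sample before DESeq2 ever sees the data.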

Error #2: Choosing the wrong normalization method

Normalization is crucial in RNA-seq to account for differences in sequencing depth (library size) and RNA composition between samples. Using an inappropriate method can lead to biased or incorrect differential expression results.

Solutions:

  • Use standard normalization methods recommended for your analysis software (e.g., DESeq2’s default size factor normalization).
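  • Avoid arbitrary transformations: DESeq2 expects raw, un-normalized counts as input, so do not feed it TPM, FPKM, or otherwise pre-normalized values. Consult the DESeq2 documentation to select a method compatible with your experimental design.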
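DESeq2's default size factors come from the median-of-ratios method: each gene's geometric mean across samples serves as a reference, and a sample's size factor is the median of its count-to-reference ratios. The following is a minimal pure-Python sketch of that idea for illustration only (`size_factors` is a hypothetical name, not DESeq2's code); in R you would simply call estimateSizeFactors() on your DESeqDataSet.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Reference per gene: log geometric mean across samples.
    # A gene with any zero has a log geometric mean of -inf, so it is skipped.
    reference = []
    for row in counts:
        if all(c > 0 for c in row):
            reference.append((row, sum(math.log(c) for c in row) / n_samples))
    if not reference:
        raise ValueError(
            "every gene contains at least one zero, cannot compute log geometric means"
        )
    # Size factor for sample j: median over genes of log(count / geometric mean),
    # exponentiated back to the ratio scale.
    return [
        math.exp(median(math.log(row[j]) - log_gm for row, log_gm in reference))
        for j in range(n_samples)
    ]


print(size_factors([[10, 20], [100, 200], [5, 10]]))
```

In this toy matrix the second sample has exactly twice the reads of the first for every gene, so the factors come out at about 0.707 and 1.414, a ratio of 2. Using the median rather than the mean keeps a handful of extreme genes from dragging a sample's factor around.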

Error #3: DESeq2 iterative size factor normalization did not converge

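By default, DESeq2 estimates size factors with the non-iterative median-of-ratios method. This error comes from the alternative iterative estimator (estimateSizeFactors(dds, type = "iterate")), which alternates between estimating dispersions and numerically optimizing the size factors. If that loop fails to converge, it often indicates extreme outliers, very low counts, or problematic sample distributions.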

Solutions:

  • Filter out genes with consistently low counts.
  • Check for outlier samples and consider removing or re-examining them.
  • Ensure your count matrix is formatted correctly, with samples in columns and genes in rows.
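The first two checks can be sketched in a few lines of Python. The thresholds below (at least 10 reads in at least 3 samples) are assumptions in the spirit of the prefiltering shown in the DESeq2 vignette, where the sample count is set to the size of the smallest group; pick values that suit your design. Both function names are hypothetical.

```python
def filter_low_counts(counts, gene_ids, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    min_count=10 and min_samples=3 are illustrative defaults; set min_samples
    to roughly the size of your smallest group.
    """
    kept_counts, kept_ids = [], []
    for gene, row in zip(gene_ids, counts):
        if sum(1 for c in row if c >= min_count) >= min_samples:
            kept_counts.append(row)
            kept_ids.append(gene)
    return kept_counts, kept_ids


def library_sizes(counts):
    """Total reads per sample (column sums); a sample far below the rest deserves a closer look."""
    return [sum(column) for column in zip(*counts)]


counts = [[0, 1, 2, 0], [10, 12, 15, 20], [50, 0, 11, 10]]
print(filter_low_counts(counts, ["g1", "g2", "g3"]))
print(library_sizes(counts))
```

Here `g1` never reaches 10 reads and is dropped, while the per-sample totals make it easy to spot a sample with unusually few reads before deciding whether to remove it.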

Error #4: Every gene contains at least one zero, cannot compute log geometric means

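DESeq2's default normalization uses each gene's geometric mean across samples, computed on the log scale, as a reference. A zero count makes that log geometric mean undefined (log(0) is -Inf), so such genes are excluded; if every gene has a zero in at least one sample, no genes remain and DESeq2 stops with this error.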

Solutions:

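  • Use a zero-tolerant size factor estimator, such as estimateSizeFactors(dds, type = "poscounts"), which builds each gene's geometric mean from its non-zero counts only. If you do add a small pseudocount (e.g., 1), use it only when computing the geometric means, never on the counts DESeq2 models, since it expects raw integer counts.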
  • Consider filtering genes with too many zeros.
  • Ensure that your dataset is appropriate for DESeq2 analysis (extremely sparse matrices may require alternative approaches).
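To see why a zero-tolerant estimator helps, here is a Python sketch of the idea the DESeq2 documentation describes for estimateSizeFactors(type = "poscounts"): a modified geometric mean taken as the n-th root of the product of only the non-zero counts. This is my illustration, not DESeq2's source; `poscounts_size_factors` is a hypothetical name, and the final rescaling (size factors normalized to a geometric mean of 1) is an assumption.

```python
import math
from statistics import median


def poscounts_size_factors(counts):
    """Size factors that tolerate zeros, in the spirit of DESeq2's type = "poscounts".

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n = len(counts[0])
    # Modified log geometric mean: sum of logs of the positive counts only,
    # divided by the total number of samples (n-th root of their product).
    log_refs = []
    for row in counts:
        logs = [math.log(c) for c in row if c > 0]
        log_refs.append(sum(logs) / n if logs else None)  # all-zero genes are unusable
    factors = []
    for j in range(n):
        # Only genes with a usable reference and a non-zero count in sample j contribute.
        # A sample with no such genes makes median() raise StatisticsError.
        ratios = [
            math.log(row[j]) - ref
            for row, ref in zip(counts, log_refs)
            if ref is not None and row[j] > 0
        ]
        factors.append(math.exp(median(ratios)))
    # Assumed rescaling: divide by the geometric mean so the factors multiply to 1.
    log_center = sum(math.log(f) for f in factors) / n
    return [f / math.exp(log_center) for f in factors]


print(poscounts_size_factors([[0, 10, 20], [5, 0, 20], [5, 10, 0]]))
```

In this toy matrix every gene has a zero, so a plain median-of-ratios computation has no usable genes, yet this version still returns one positive, increasing size factor per sample.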