Sep 19th, 2025
As with any gentle introduction to bioinformatics training, I find myself looking at DESeq2, a popular choice for analyzing RNA-seq gene expression data. In my current job, DESeq2 is a nice ending to our existing RNA-seq workflow: it’s how we eventually make sense of the differences between groups. But throughout my months of wrestling with R to build a complete (functional) DESeq2 script, I have encountered errors, some very subtle and some very disruptive to my work. Here I attempt to summarize them.
For a comprehensive understanding of DESeq2, I recommend checking out the series by StatQuest:
If you ever hit this error, recheck your design (metadata) file. A good one looks something like this:
group,sample,fastq_1,fastq_2
treatment,sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
treatment,sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
control,sample3,/path/to/sample3_R1.fastq.gz,/path/to/sample3_R2.fastq.gz
Make sure your raw count data contains exactly the samples listed in your metadata file: one count column per metadata row, with matching sample names in the same order.
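The check described above can be sketched outside of R before the data ever reaches DESeq2. This is a minimal illustration in Python, not part of DESeq2 itself: the helper name `check_samples` and the example count-matrix header are assumptions made up for the sketch, and the metadata is the sample sheet shown earlier.

```python
import csv
import io


def check_samples(metadata_csv, count_columns):
    """Compare sample names in a metadata CSV with count-matrix column names."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    meta_samples = [row["sample"] for row in reader]
    return {
        # Samples described in the metadata but absent from the counts.
        "missing_from_counts": [s for s in meta_samples if s not in count_columns],
        # Count columns that have no metadata row.
        "missing_from_metadata": [c for c in count_columns if c not in meta_samples],
        # True only if both lists hold the same names in the same order.
        "same_order": meta_samples == list(count_columns),
    }


metadata = """group,sample,fastq_1,fastq_2
treatment,sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
treatment,sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
control,sample3,/path/to/sample3_R1.fastq.gz,/path/to/sample3_R2.fastq.gz
"""

# Hypothetical count-matrix header: sample3 is absent and the order differs.
count_columns = ["sample2", "sample1"]

report = check_samples(metadata, count_columns)
print(report)
```

If any of the three fields reports a problem, fix the sample sheet or the count matrix first; a mismatch in names or order is much easier to spot here than inside a DESeq2 error message.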
Normalization is crucial in RNA-seq to account for differences in sequencing depth and library size between samples. Using an inappropriate method can lead to biased or incorrect differential expression results.
Solutions:
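DESeq2 can compute size factors for normalization with an iterative algorithm (the `type = "iterate"` option of `estimateSizeFactors`; the default is the median-of-ratios method). If this process fails to converge, it often indicates extreme outliers, very low counts, or problematic sample distributions.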
Solutions:
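DESeq2 calculates a geometric mean for each gene across samples as the reference for normalization, working on the log scale. The log of zero is undefined, so a gene with a zero in any sample cannot contribute. If every gene has a zero in at least one sample, no genes are left to normalize against, causing this error.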
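To see why this happens, here is a minimal sketch of median-of-ratios size factors in plain Python. It illustrates the idea only and is not DESeq2's implementation: the function name `size_factors`, the toy count matrices, and the error text are assumptions made up for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts is a list of gene rows, each holding one count per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with no zero count: log(0) is undefined, so a gene
    # with any zero has no usable log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene contains at least one zero")
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
dense = [[10, 20], [100, 200], [5, 10]]
factors = size_factors(dense)
print(factors)

# Toy data where every gene has a zero in at least one sample.
sparse = [[0, 20], [100, 0], [5, 0]]
try:
    size_factors(sparse)
    failed = False
except ValueError as err:
    failed = True
    print("failed:", err)
```

In the dense case the second size factor comes out twice the first, reflecting the doubled depth. In the sparse case there is no gene left to serve as a reference, which mirrors the situation behind this error. For data like this, DESeq2's `estimateSizeFactors` documents a `type = "poscounts"` option that computes a modified geometric mean from the non-zero counts; whether it suits your data is a judgment call.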
Solutions: