Friday, May 20, 2016

[Bioinfo] GATK ERROR: attempting to calculate the mismatch count against a reference string that is smaller than the read


Before variant calling, local realignment around indels is known to reduce false-positive variant calls (ref1). However, GATK's realignment step can fail with a few different types of errors. This post covers the following one:

ERROR MESSAGE: attempting to calculate the mismatch count against a reference string that is smaller than the read

1. Diagnose the BAM file using Picard ValidateSamFile:

java -jar picard.jar ValidateSamFile \
     I=input.bam \
     MODE=SUMMARY

At the end of the output, ValidateSamFile prints a histogram of error types:

## HISTOGRAM    java.lang.String
Error Type      Count
ERROR:CIGAR_MAPS_OFF_REFERENCE  2      <- the reads causing the problem (their alignment extends past the end of the reference contig)
ERROR:MATE_NOT_FOUND    290628
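
To see why those two reads upset the realigner, it helps to compute where each alignment ends on the reference. The sketch below is a simplified illustration of that idea, not Picard's actual implementation: it sums the reference-consuming CIGAR operations (M, D, N, =, X, per the SAM specification) starting from the 1-based alignment position and compares the end with the contig length. The function names and the 1,000 bp contig are hypothetical.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string such as '90M10S' into [(90, 'M'), (10, 'S')]."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def alignment_end(pos, cigar):
    """1-based, inclusive reference coordinate of the last aligned base."""
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


def maps_off_reference(pos, cigar, contig_length):
    """True if the alignment runs past the end of its contig (hypothetical helper)."""
    return alignment_end(pos, cigar) > contig_length


# Hypothetical 1,000 bp contig: a 100M read starting at 951 ends at 1050
print(maps_off_reference(951, "100M", 1000))  # True
print(maps_off_reference(901, "100M", 1000))  # False (ends exactly at 1000)
```

A read like the first one asks GATK to compare 100 read bases against only 50 remaining reference bases, which matches the wording of the error message.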

2. Clean the BAM file using Picard CleanSam:

CleanSam cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads.

java -jar picard.jar CleanSam \
     I=input.bam \
     O=filtered.bam
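
The soft-clipping part of CleanSam can be pictured as: keep the aligned bases that fit on the contig and turn the read bases that hang off the end into a soft clip, so the read length is unchanged but the alignment no longer runs past the reference. The following is a minimal pure-Python sketch of that idea under simplifying assumptions (a hypothetical `soft_clip_overhang` helper, hard clips kept outermost, a deletion left at the new clip boundary dropped); Picard's real code handles more cases and may produce different CIGARs for unusual reads.

```python
import re

REF_CONSUMING = set("MDN=X")   # ops that advance along the reference
READ_CONSUMING = set("MIS=X")  # ops that advance along the read
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def soft_clip_overhang(pos, cigar, contig_length):
    """Soft-clip aligned bases beyond contig_length (hypothetical helper)."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)(\D)", cigar)]
    keep_ref = contig_length - pos + 1  # reference bases that fit on the contig
    if keep_ref <= 0:
        raise ValueError("alignment starts beyond the end of the contig")
    if sum(n for n, op in ops if op in REF_CONSUMING) <= keep_ref:
        return cigar  # nothing hangs off the end

    kept, trailing_hard = [], []
    ref_used = clipped = 0
    for n, op in ops:
        if op == "H":
            # Hard clips stay outermost: leading ones first, trailing ones last
            if kept or clipped:
                trailing_hard.append((n, op))
            else:
                kept.append((n, op))
            continue
        room = keep_ref - ref_used
        if room <= 0:
            # Past the contig end: read bases become soft clip, the rest is dropped
            if op in READ_CONSUMING:
                clipped += n
            continue
        if op not in REF_CONSUMING:
            kept.append((n, op))  # I, S or P before the boundary
            continue
        take = min(n, room)
        kept.append((take, op))
        ref_used += take
        if op in READ_CONSUMING:
            clipped += n - take  # read bases of this op beyond the boundary

    # A deletion or skip directly before the soft clip is meaningless; drop it
    while kept and kept[-1][1] in "DN":
        kept.pop()
    if clipped:
        kept.append((clipped, "S"))
    kept.extend(trailing_hard)
    return "".join(f"{n}{op}" for n, op in kept)


print(soft_clip_overhang(951, "100M", 1000))        # 50M50S
print(soft_clip_overhang(971, "5S20M3I20M", 1000))  # 5S20M3I10M10S
```

In both examples the total number of read bases is unchanged (100 and 48), while the alignment now ends exactly at position 1000 of the hypothetical contig.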

The resulting filtered.bam can now be used as input for GATK local realignment.
