Local realignment around indels before variant calling is known to reduce false-positive variant calls (ref1). The realignment step can fail with a few types of errors. One of them is:
ERROR MESSAGE: attempting to calculate the mismatch count against a reference string that is smaller than the read
1. Diagnose the BAM file using Picard ValidateSamFile:
java -jar picard.jar ValidateSamFile \
I=input.bam \
MODE=SUMMARY
The summary appears at the end of the output:
## HISTOGRAM java.lang.String
Error Type Count
ERROR:CIGAR_MAPS_OFF_REFERENCE 2 <- The reads causing problems
ERROR:MATE_NOT_FOUND 290628
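When checking many BAM files, it can help to pull the count for a single error type out of the summary instead of reading it by eye. The following is only a minimal sketch: `count_error` and `validate_summary.txt` are hypothetical names, and the sample file simply repeats the summary shown above.

```shell
# Sample summary, copied from the ValidateSamFile output shown above.
cat > validate_summary.txt <<'EOF'
## HISTOGRAM java.lang.String
Error Type Count
ERROR:CIGAR_MAPS_OFF_REFERENCE 2
ERROR:MATE_NOT_FOUND 290628
EOF

# count_error is a hypothetical helper: print the count reported for one
# error type, or 0 if that type does not appear in the summary file.
# Usage: count_error ERROR_TYPE SUMMARY_FILE
count_error() {
    awk -v key="ERROR:$1" '$1 == key { n = $2 } END { print n + 0 }' "$2"
}

count_error CIGAR_MAPS_OFF_REFERENCE validate_summary.txt
count_error MATE_NOT_FOUND validate_summary.txt
```

For a real run, the summary could be saved by redirecting the ValidateSamFile output to a file, assuming the summary is written to standard output as in the command above.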
2. Clean the BAM file using Picard CleanSam:
CleanSam cleans the provided SAM/BAM file, soft-clipping alignments that extend beyond the end of the reference and setting MAPQ to 0 for unmapped reads.
java -jar picard.jar CleanSam \
I=input.bam \
O=filtered.bam
filtered.bam can now be used as input to GATK realignment.
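Before starting the realignment, it may be worth confirming that the problematic reads are really gone by validating the cleaned file again. The sketch below wraps that check in a hypothetical helper, `recheck_bam`; the `PICARD_JAR` variable and its default are assumptions, so adjust them to your installation.

```shell
# Path to the Picard jar (assumption: the same picard.jar used above).
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# recheck_bam is a hypothetical helper: re-run ValidateSamFile in SUMMARY
# mode and report whether CIGAR_MAPS_OFF_REFERENCE errors remain.
# Returns 0 if none are reported, 1 otherwise.
recheck_bam() {
    bam="$1"
    # ValidateSamFile may exit non-zero when it finds any error, so the
    # exit status is ignored and only the summary text is inspected.
    summary="$(java -jar "$PICARD_JAR" ValidateSamFile I="$bam" MODE=SUMMARY 2>/dev/null || true)"
    if printf '%s\n' "$summary" | grep -q 'ERROR:CIGAR_MAPS_OFF_REFERENCE'; then
        echo "CIGAR_MAPS_OFF_REFERENCE still present in $bam"
        return 1
    fi
    echo "no CIGAR_MAPS_OFF_REFERENCE errors in $bam"
    return 0
}
```

Calling `recheck_bam filtered.bam` should then report no CIGAR_MAPS_OFF_REFERENCE errors if CleanSam soft-clipped the offending reads. Other error types in the summary, such as MATE_NOT_FOUND, are not examined by this check.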