We are running the following GATK 4.0 pipeline (gatk-4.0.6.0):
FastqToSam
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" FastqToSam -FASTQ $1_forward_paired.fq -FASTQ2 $1_reverse_paired.fq -OUTPUT $1.unmapped.bam -PLATFORM illumina -READ_GROUP_NAME M70496.34 -LIBRARY_NAME Miseq -SAMPLE_NAME $1
bwa mem
/data3/tools/bwa/bwa mem -t 4 -K 100000000 -v 3 /data3/database/ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta $1_forward_paired.fq $1_reverse_paired.fq > $1.sam
samtools view
samtools view -1 $1.sam > $1.bam
MergeBamAlignment
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" MergeBamAlignment --VALIDATION_STRINGENCY SILENT --EXPECTED_ORIENTATIONS FR --ATTRIBUTES_TO_RETAIN X0 --ALIGNED_BAM $1.bam --UNMAPPED_BAM $1.unmapped.bam --OUTPUT $1.merged.bam --REFERENCE_SEQUENCE /data3/database/ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta --PAIRED_RUN true --SORT_ORDER "unsorted" --IS_BISULFITE_SEQUENCE=false --CLIP_ADAPTERS=false --MAX_RECORDS_IN_RAM 2000000 --ADD_MATE_CIGAR true --MAX_INSERTIONS_OR_DELETIONS -1 --PRIMARY_ALIGNMENT_STRATEGY=MostDistant --UNMAPPED_READ_STRATEGY=COPY_TO_TAG --ALIGNER_PROPER_PAIR_FLAGS=true --UNMAP_CONTAMINANT_READS true
SortSam
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" SortSam --INPUT $1.merged.bam --OUTPUT $1.merged.sorted.bam --SORT_ORDER coordinate --CREATE_INDEX false --CREATE_MD5_FILE false
SetNmAndUqTags
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" SetNmAndUqTags --INPUT $1.merged.sorted.bam --OUTPUT $1.merged.sorted.fixed.bam --CREATE_INDEX true --CREATE_MD5_FILE true --REFERENCE_SEQUENCE /data3/database/ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta
MarkDuplicates
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" MarkDuplicates --INPUT $1.merged.sorted.fixed.bam --OUTPUT $1.merged.sorted.fixed.duplicate_marked.bam --METRICS_FILE $1.duplicate_metrics --VALIDATION_STRINGENCY SILENT --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --ASSUME_SORT_ORDER coordinate --CREATE_MD5_FILE true
samtools index
/data3/tools/samtools-1.6/samtools index $1.merged.sorted.fixed.duplicate_marked.bam
BaseRecalibrator
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" BaseRecalibrator -R /data3/database/ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta -I $1.merged.sorted.fixed.duplicate_marked.bam --use-original-qualities true -O $1.recalibration_report --known-sites /data3/database/ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.excluding_sites_after_129.vcf --known-sites /data3/database/ftp.broadinstitute.org/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -L $2
GatherBQSRReports
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" GatherBQSRReports -I $1.recalibration_report -O $1.recal_data.csv
ApplyBQSR
/data3/tools/gatk-4.0.6.0/gatk --java-options "-Xmx4g" ApplyBQSR -R /data3/database/ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta -I $1.merged.sorted.fixed.duplicate_marked.bam -O $1.merged.sorted.fixed.duplicate_marked.recalibrated.bam -L $2 --bqsr-recal-file $1.recalibration_report --create-output-bam-md5 --use-original-qualities true
However, the pipeline.out file contains 7 "Tool returned:" messages, and one of them is followed by a very large number, as shown below. We found that 518596 matches the "Reads Processed" count from ApplyBQSR.
Can we safely ignore this number even though the value after "Tool returned:" is not zero?
Thank you,
Tool returned:
0
Tool returned:
0
Tool returned:
0
Tool returned:
0
Tool returned:
0
Tool returned:
518596
Tool returned:
0
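
For reference, here is a minimal sketch of how we could check each step's process exit status instead of relying on the "Tool returned:" line. It assumes (we have not confirmed this) that the value after "Tool returned:" is the tool's internal return value and not the exit code seen by the shell. The run_step helper and the stand-in command are hypothetical and are not part of our actual pipeline.

```shell
#!/bin/sh
# Run one pipeline step and report the shell exit status, which is
# separate from whatever the tool prints after "Tool returned:".
# run_step is a hypothetical helper, not part of GATK.
run_step() {
    name=$1
    shift
    "$@"
    status=$?
    echo "$name exit status: $status"
    return "$status"
}

# Stand-in command (hypothetical): prints a large "Tool returned:" value
# but exits with status 0, mimicking what we see in pipeline.out.
run_step demo_step sh -c 'echo "Tool returned:"; echo 518596; exit 0'
```

In the real script, each gatk, bwa, or samtools command line would take the place of the stand-in, and a non-zero exit status (rather than a non-zero "Tool returned:" value) would be what marks a failed step.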