Getting out-of-memory errors while running the workflow for germline short variant discovery

I'm trying to run the WDL posted on the gatk-workflows GitHub page, under the gatk4-germline-snps-indels repository. The WDL is "haplotypecaller-gvcf-gatk4.wdl".

I'm attempting to run this WDL locally on my computer. The script runs GATK inside Docker containers to execute tools such as HaplotypeCaller and MergeVcfs. I'm using Cromwell in "run mode" to run the WDL, with the exact inputs listed in the haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json file.
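For reference, this is roughly how I'm invoking it (the Cromwell jar name and file paths are just placeholders for what's on my machine):

    java -jar cromwell.jar run haplotypecaller-gvcf-gatk4.wdl \
        --inputs haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json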

The BAM file is NA12878_24RG_small.hg38.bam, which is about 5 GB in size.
The FASTA file is Homo_sapiens_assembly38.fasta, which is about 3 GB in size.

Every time I run this, I eventually get out-of-memory errors. It seems like 50 GATK Docker containers are being spun up, each running HaplotypeCaller in parallel. I think this is due to the number of interval lists declared in hg38_wgs_scattered_calling_intervals.txt?
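For context, here is my rough understanding of where the parallelism comes from (a simplified sketch, not the exact code from the repository; task and variable names are approximate):

    # The text file lists one interval-list path per line
    Array[File] scattered_calling_intervals = read_lines(scattered_calling_intervals_list)

    # One HaplotypeCaller task -- and one Docker container -- per interval list
    scatter (interval_file in scattered_calling_intervals) {
      call HaplotypeCaller {
        input:
          input_bam = input_bam,
          interval_list = interval_file
      }
    }

    # The per-interval GVCFs are then merged back into a single output
    call MergeGVCFs {
      input:
        input_vcfs = HaplotypeCaller.output_gvcf
    }

So if the intervals file has 50 lines, Cromwell will want to run 50 HaplotypeCaller shards.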

I'm running it on a machine with 32 GB of RAM and 512 GB of disk space. My questions are basically:

  1. How much RAM is needed to run this workflow?
  2. Should I set a limit on how much memory each Docker container can use in the Cromwell configuration file, and if so, how much should I set it to? (I've sketched what I've been trying below, after this list.)
  3. What should the Java heap size be set to?
  4. It looks like the workflow uses the "scatter-gather" technique for parallelization. Does this require me to set up a cluster of servers to run it, or can I run it like this on just my local computer?
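In case it helps, this is the kind of Cromwell config stanza I've been experimenting with to cap parallelism on the Local backend (the value of 4 is a guess on my part, not a recommendation I've seen anywhere):

    # local.conf, passed to Cromwell via java -Dconfig.file=local.conf -jar cromwell.jar ...
    include required(classpath("application"))

    backend {
      default = Local
      providers {
        Local {
          config {
            # Cap how many tasks (and thus Docker containers) run at once,
            # so all 50 scattered HaplotypeCaller shards don't start together
            concurrent-job-limit = 4
          }
        }
      }
    }

For question 3, my understanding is that GATK4 takes the heap size through --java-options, e.g. gatk --java-options "-Xmx4g" HaplotypeCaller ..., so presumably the heap for each shard has to fit inside whatever memory its container is allowed.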

Any insight would be greatly appreciated. Thank you!

