
GenotypeGVCFs GATK 4.1.0.0 error bcf_update_format: Assertion `nps && nps*line->n_sample==n' failed.


Hello!

I am trying to run GenotypeGVCFs on a set of 12 pooled-seq whole-genome samples (50 fish per pool, using a ploidy of 20 in GATK). Unfortunately I have had recurrent Java memory issues (I think!) and this one error that I can't figure out. I originally posted a version of this question at gatk/discussion/13430 but didn't pursue it, because I thought it was tied to other problems I was having with GenomicsDBImport. Since then I have tried a couple of things, but still get the error.

Here is a little bit about my analysis so far. I have pre-processed BAM files (following GATK best practices) for each of my 12 pools. I'm working on a non-model organism, so I don't have a fully assembled genome to align my reads to, just 24 linkage groups and 8626 scaffolds. The number of scaffolds has presented some issues, so at points in my pipeline I have split the analysis into two tracks, keeping the linkage groups together and the scaffolds together. The steps so far (a simplified sketch follows this list):

1. HaplotypeCaller, scatter/gathered over the linkage groups, then run with -L treating the scaffolds as intervals.
2. MergeGVCFs to combine all the GVCFs into one GVCF file per pool.
3. LeftAlignAndTrimVariants to left-align and trim the variants. (I was having a problem where variants with a huge number of PLs were preventing me from importing into GenomicsDB; this seems to have solved it.)
4. GenomicsDBImport to make a GenomicsDB for each of my linkage groups, with all my pools included.
5. CombineGVCFs to combine all the GVCFs for the scaffolds.
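For reference, a simplified sketch of the per-pool steps on one linkage group (pool and file names are placeholders rather than my exact scripts, and the scatter/gather and merge logic is omitted):

    # Call variants per pool at ploidy 20 (placeholder names)
    gatk HaplotypeCaller \
        -R gadMor2.fasta \
        -I pool01.bam \
        --sample-ploidy 20 \
        -ERC GVCF \
        -L LG01 \
        -O pool01.LG01.g.vcf.gz

    # Left-align and trim the merged per-pool GVCF
    gatk LeftAlignAndTrimVariants \
        -R gadMor2.fasta \
        -V pool01.merged.g.vcf.gz \
        -O pool01.trimmed.g.vcf.gz

    # Import all pools into one GenomicsDB per linkage group
    gatk GenomicsDBImport \
        -V pool01.trimmed.g.vcf.gz \
        -V pool02.trimmed.g.vcf.gz \
        --genomicsdb-workspace-path genomicsdb_LG01 \
        -L LG01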

This is where I run into trouble. I have made a GenomicsDB for each of my 24 linkage groups without issue, but to run GenotypeGVCFs on a GenomicsDB and get genotypes I have to split each linkage group into intervals. I can get some intervals of just under 2 million base pairs to genotype with this command and runtime:

  command <<<
    set -e

    tar -xf ${workspace_tar}
    WORKSPACE=$( basename ${workspace_tar} .tar)

    "/gatk/gatk" --java-options "-Xmx125g -Xms125g" \
     GenotypeGVCFs \
     -R ${ref_fasta} \
     -O ${output_vcf_filename} \
     --disable-read-filter NotDuplicateReadFilter \
     -V gendb://$WORKSPACE \
     --verbosity ERROR \
     -L ${interval}

  >>>
  runtime {
    docker: "broadinstitute/gatk:4.1.0.0"
    memory: "150 GB"
    cpu: "4"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible
  }
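For context, the intervals themselves are just ~2 Mb chunks of each linkage group; a minimal sketch of generating them from the reference .fai index (assuming linkage group names start with "LG"; file names are placeholders):

    # Chop each linkage group into ~2 Mb intervals from the .fai
    # ($1 = contig name, $2 = contig length)
    awk -v size=2000000 '$1 ~ /^LG/ {
        for (start = 1; start <= $2; start += size) {
            end = start + size - 1
            if (end > $2) end = $2
            print $1 ":" start "-" end
        }
    }' gadMor2.fasta.fai > intervals.list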

But some intervals fail with the following error, which I have been treating as an out-of-memory error, continuously bumping up -Xmx, -Xms, total memory, and disk space (up to 500 GB), though I'm not sure which of these would actually help; I just feel desperate to get them genotyped.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb8e3875edf, pid=19, tid=0x00007fdefaaed700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 )
# Problematic frame:
# C  [libtiledbgenomicsdb571603236969900509.so+0x17dedf]  void VariantOperations::reorder_field_based_on_genotype_index<int>(std::vector<int, std::allocator<int> > const&, unsigned long, MergedAllelesIdxLUT<true, true> const&, unsigned int, bool, bool, unsigned int, RemappedDataWrapperBase&, std::vector<unsigned long, std::allocator<unsigned long> >&, int, std::vector<int, std::allocator<int> > const&, unsigned long, std::vector<int, std::allocator<int> >&)+0x6f
#
# Core dump written. Default location: /cromwell_root/core or core.19
#
# An error report file with more information is saved as:
# /cromwell_root/hs_err_pid19.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Following the examples in this thread: gatk/discussion/comment/43887, I have also tried adding -XX:+UseSerialGC, using --use-jdk-deflater true and --use-jdk-inflater true, and upgrading to the most recent GATK (4.1.1.0). None of these have helped.
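For completeness, the full attempted invocation looked roughly like this (simplified from the WDL above, with one of the failing intervals):

    /gatk/gatk --java-options "-Xmx125g -Xms125g -XX:+UseSerialGC" \
        GenotypeGVCFs \
        -R gadMor2.fasta \
        -V gendb://genomicsdb_LGs \
        -L LG01:6000001-8000000 \
        --use-jdk-deflater true \
        --use-jdk-inflater true \
        --disable-read-filter NotDuplicateReadFilter \
        -O LG01.3.Genotyped.vcf.gz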

Do you know why I might be getting this error? Should I continue to increase the memory/disk space, or reduce the size of the interval, or is there something else that could be wrong?

The other error that I get is maybe more cryptic:

java: /home/vagrant/GenomicsDB/dependencies/htslib/vcf.c:3641: bcf_update_format: Assertion `nps && nps*line->n_sample==n' failed.
Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx125g -Xms125g -jar /gatk/gatk-package-4.1.0.0-local.jar GenotypeGVCFs -R /cromwell_root/uw-hauser-pcod-gatk-tests/gadMor2.fasta -O LG01.3.Genotyped.vcf.gz --disable-read-filter NotDuplicateReadFilter -V gendb://genomicsdb_LGs --verbosity ERROR -L LG01:6000001-8000000

That error seems to be tied to a specific location in the genome. Although this run with the interval LG01:6000001-8000000 failed, it produced part of a VCF file. I made two smaller intervals from the original, LG01:6000001-7134350 and LG01:7150000-8000000, to see if avoiding the region around where the VCF file stopped would let it finish. The first interval worked fine, but the second gave me the same error and no VCF file.
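Rather than splitting by hand each time, I have been considering automating that bisection; a rough sketch of the idea (paths are placeholders, and it assumes a single bad region and that the crash exits with a nonzero status):

    # Repeatedly halve a failing interval to localize the problem region
    chrom=LG01; start=7150000; end=8000000
    while [ $((end - start)) -gt 10000 ]; do
        mid=$(( (start + end) / 2 ))
        if gatk GenotypeGVCFs -R gadMor2.fasta -V gendb://genomicsdb_LGs \
               -L ${chrom}:${start}-${mid} -O test.vcf.gz; then
            start=$mid   # first half is fine, so the problem is in the second half
        else
            end=$mid     # first half fails, so keep narrowing it
        fi
    done
    echo "Problem localized to roughly ${chrom}:${start}-${end}"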

Do you have any idea what might be happening here, and what bcf_update_format: Assertion `nps && nps*line->n_sample==n' failed means? I don't need to genotype every location; if some sites are too messy, I am fine with dropping them. I would just like to be sure that I am not inadvertently introducing errors into my data, or ignoring warning signs of issues I need to address in the analysis. And with so many linkage groups split into intervals, it is hard to maintain a pipeline that requires manual tinkering whenever an interval fails.

Thank you so much for any advice you have!

