Channel: Ask the FireCloud Team — GATK-Forum

samtools view slice of cloud storage bam not working


Hi,

I am working with WGS data, and since the BAMs are so large (upwards of 300 GB in some cases), I'd like to avoid localizing the entire BAM for each shard when I scatter across many instances. Instead, I'd like each scatter instance to operate only on the portion of the BAM corresponding to the interval I've assigned to it. To this end, I'm trying to use samtools to view only certain regions of the BAM, following the instructions listed here, but I can't get it to work: http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/data/data2/data_in_GCS.html

These are the commands being run in my instance, along with the resulting output and error message.

>> gcloud auth print-access-token
+ GCS_OAUTH_TOKEN=*******[redacted]********

>> samtools view gs://fc-47b16dc3-db04-48f5-a26a-ddec3c09c578/workspace_name/RP-1476/WGS/MSK-004_T_P1/v7/MSK-004_T_P1.bam 1:1-15000000
open: No such file or directory
[main_samview] fail to open "gs://fc-47b16dc3-db04-48f5-a26a-ddec3c09c578/workspace_name/RP-1476/WGS/MSK-004_T_P1/v7/MSK-004_T_P1.bam" for reading.

And this is the WDL task that generated those commands:

task ProportionalCoverage_WGS_Task {
    File reference
    File referenceDict
    File referenceIndex
    File inputBamLocation
    String sampleID
    Int memoryGb
    Int diskSpaceGb
    File targetsIntervalList
    Int preemptible

    command <<<
        samtools view ${inputBamLocation} $(head -n1 ${targetsIntervalList}) >> bam_section.bam
        samtools index bam_section.bam

        java -jar /gatk/gatk.jar CalculateTargetCoverage \
        -L ${targetsIntervalList} \
        --output ${sampleID}.pcov \
        --groupBy SAMPLE \
        --transform PCOV \
        --input bam_section.bam \
        --reference ${reference}
    >>>

    output {
        File pcov = "${sampleID}.pcov"
    }

    runtime {
        docker: "broadinstitute/gatk:4.beta.6"
        memory: "${memoryGb} GB"
        cpu: "1"
        disks: "local-disk ${diskSpaceGb} HDD"
        preemptible: preemptible
    }
}
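
For clarity, here is a minimal sketch of what the two samtools lines in the command block are meant to do, written as plain shell. It is not a confirmed fix. It assumes that the samtools in the image was built against htslib with libcurl (needed for gs:// URLs), that the BAM index sits next to the BAM in the bucket, and that the interval list holds one chr:start-end region per line, as the 1:1-15000000 in the log above suggests. The helper names first_region and slice_remote_bam are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: first_region and slice_remote_bam are hypothetical helper names.

# Print the first region in an interval list, skipping blank lines and any
# '@'-prefixed header lines. Assumes one chr:start-end region per line.
first_region() {
    grep -v -e '^@' -e '^[[:space:]]*$' "$1" | head -n 1
}

# Copy a single region of a remote BAM into a local, indexed BAM.
# Assumptions: samtools/htslib built with libcurl, and a .bai next to the BAM.
slice_remote_bam() {
    local bam_url="$1" intervals="$2" out_bam="$3"
    # htslib reads the token from the environment, so it has to be exported.
    export GCS_OAUTH_TOKEN="$(gcloud auth print-access-token)"
    # -b writes BAM instead of SAM text; -o names the output file.
    samtools view -b -o "$out_bam" "$bam_url" "$(first_region "$intervals")"
    samtools index "$out_bam"
}
```

A call would look like slice_remote_bam "$BAM_URL" targets.list bam_section.bam, where BAM_URL holds the gs:// path as a plain string so that nothing is localized first.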

How can I do this? It would save me countless hours while developing my workflows for WGS, and I'm sure it would be very useful to others in the community.

Thanks,

Eric

