Channel: Ask the FireCloud Team — GATK-Forum
Failure to delocalize files when using my own docker image

Hi-

I am having some trouble with file delocalization in FireCloud when using my own Docker images.

For the setup, I made a Dockerfile that contains all of my source code (an autobuild from a GitHub repo) that I want to use for analyses in FireCloud. In my FireCloud method, this Docker image is specified in the runtime block, which gives each task access to my source code (basically a bunch of R scripts). In each WDL task, I hard-coded the path to the corresponding script inside the Docker container.

When I try to run my method in FireCloud, I see some weird behavior in where files are moved or written:

message: Task fullPipe.getarray:NA:1 failed. JES error code 5. Message: 10: Failed to delocalize files: failed to copy the following files: "/mnt/local-disk/getarray-rc.txt -> gs://fc-fa093e72-dbcb-4028-ae82-609a79ced51a/3d32ccf4-28ba-43d8-8704-7c87d8f34be7/fullPipe/ae7b05d4-cc26-451b-8a07-00b5b12d26a8/call-getarray/getarray-rc.txt (cp failed: gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/getarray-rc.txt gs://fc-fa093e72-dbcb-4028-ae82-609a79ced51a/3d32ccf4-28ba-43d8-8704-7c87d8f34be7/fullPipe/ae7b05d4-cc26-451b-8a07-00b5b12d26a8/call-getarray/getarray-rc.txt, command failed: CommandException: No URLs matched: /mnt/local-disk/getarray-rc.txt\nCommandException: 1 file/object could not be transferred.\n)"

From the log file, the task seems to be completing but failing when copying files:

2017/08/29 18:09:53 I: Running command: iptables -I FORWARD -d metadata.google.internal -p tcp --dport 80 -j DROP
2017/08/29 18:09:53 I: Setting these data volumes on the docker container: [-v /tmp/ggp-146399440:/tmp/ggp-146399440 -v /mnt/local-disk:/cromwell_root]
2017/08/29 18:09:53 I: Running command: docker run -v /tmp/ggp-146399440:/tmp/ggp-146399440 -v /mnt/local-disk:/cromwell_root -e fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk2.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds=/cromwell_root/fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk2.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds -e __extra_config_gcs_path=gs://cromwell-auth-amp-t2d-op/ae7b05d4-cc26-451b-8a07-00b5b12d26a8_auth.json -e getarray.gdsfilesin-0=/cromwell_root/fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk1.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds -e getarray.gdsfilesin-1=/cromwell_root/fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk2.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds -e exec=/cromwell_root/exec.sh -e getarray-rc.txt=/cromwell_root/getarray-rc.txt -e fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk1.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds=/cromwell_root/fc-d960a560-7e5c-4083-b61e-b2ea71ae5b14/passgt.minDP10-gds500/chunk1.freeze4.chrALL.pass.gtonly.minDP10.genotypes.gds tmajarian/topmed@sha256:b0b54996d86746d199493a94dbc92751c4a1d9399c7898e58174c84d35fe44fe /tmp/ggp-146399440
2017/08/29 18:09:54 I: Switching to status: delocalizing-files
2017/08/29 18:09:54 I: Calling SetOperationStatus(delocalizing-files)
2017/08/29 18:09:54 I: SetOperationStatus(delocalizing-files) succeeded
2017/08/29 18:09:54 I: Docker file /cromwell_root/getarray-rc.txt maps to host location /mnt/local-disk/getarray-rc.txt.
2017/08/29 18:09:54 I: Running command: sudo gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/local-disk/getarray-rc.txt gs://fc-fa093e72-dbcb-4028-ae82-609a79ced51a/3d32ccf4-28ba-43d8-8704-7c87d8f34be7/fullPipe/ae7b05d4-cc26-451b-8a07-00b5b12d26a8/call-getarray/getarray-rc.txt
2017/08/29 18:09:55 E: command failed: CommandException: No URLs matched: /mnt/local-disk/getarray-rc.txt
CommandException: 1 file/object could not be transferred.
 (exit status 1)
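
Reading the log, the rc file is expected at `/cromwell_root/getarray-rc.txt` inside the container, and `/cromwell_root` is the `/mnt/local-disk` mount on the host. A minimal Python sketch of that path mapping, only to show where the file would have to be written (`host_path` and `VOLUMES` are hypothetical names for illustration, not part of Cromwell or the Google Genomics tooling):

```python
from pathlib import PurePosixPath

# Volume mappings copied from the "docker run" line in the log: host -> container.
VOLUMES = {
    "/tmp/ggp-146399440": "/tmp/ggp-146399440",
    "/mnt/local-disk": "/cromwell_root",
}


def host_path(container_path, volumes=VOLUMES):
    """Translate a path inside the container to the host path it is mounted from."""
    target = PurePosixPath(container_path)
    for host_dir, container_dir in volumes.items():
        try:
            relative = target.relative_to(container_dir)
        except ValueError:
            continue  # the path is not under this mount, try the next one
        return str(PurePosixPath(host_dir) / relative)
    raise ValueError(f"{container_path} is not under any mounted volume")


print(host_path("/cromwell_root/getarray-rc.txt"))
```

Since gsutil reports no file at that host path, it appears that nothing wrote `getarray-rc.txt` under `/cromwell_root` before delocalization started.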

This problem seems to occur only with Docker images that I create; the task called above completes when a different Docker image is used (one that was built by someone else). My Docker image is public: tmajarian/topmed. Here is the WDL that I am using:

task getarray {
    Array[File] gdsfilesin

    command {
        ls -lh ${sep = ' ' gdsfilesin}
    }

    output {
        Array[File] gdsfilesout = gdsfilesin
    }

    runtime {
        docker: "tmajarian/topmed@sha256:1b10a60f8ad47316b71e51ea864fa1b68fb0585cc5ac190f827573e6eaa0348e"
    }

}

task common_ID {
        File gds
        File ped
        String idcol
        String label

        command {
                R --vanilla --args ${gds} ${ped} ${idcol} ${label} < /src/workflows/singleVariantFull/commonID.R
        }

        meta {
                author: "jasen jackson"
                email: "jasenjackson97@gmail.com"
        }

        runtime {
           docker: "tmajarian/topmed@sha256:1b10a60f8ad47316b71e51ea864fa1b68fb0585cc5ac190f827573e6eaa0348e"
           disks: "local-disk 100 SSD"
           memory: "3G"
        }

        output {
                File commonIDstxt = "${label}.commonIDs.txt"
                File commonIDsRData = "${label}.commonIDs.RData"
        }
}

task assocTest {
    File gds
    File ped
    File GRM
    File commonIDs
    String label
    String colname
    String outcome
    String outcomeType
    String covariates

    command {
        R --vanilla --args ${gds} ${ped} ${GRM} ${commonIDs} ${colname} ${label} ${outcome} ${outcomeType} ${covariates} < /src/workflows/singleVariantFull/assocSingleVar.R
    }

    meta {
        author: "jasen jackson; Alisa Manning, Tim Majarian"
        email: "jasenjackson97@gmail.com; amanning@broadinstitute.org, tmajaria@broadinstitute.org"
    }

    runtime {
        # docker: "tmajarian/topmed@sha256:1b10a60f8ad47316b71e51ea864fa1b68fb0585cc5ac190f827573e6eaa0348e"
        docker: "tmajarian/topmed:latest"
        disks: "local-disk 100 SSD"
        memory: "30G"
    }

    output {
        File assoc = "${label}.assoc.RData"
    }
}

task summary {
    Array[File] assoc
    String pval
    String label
    String title

    command {
        R --vanilla --args ${pval} ${label} ${title} ${sep = ' ' assoc} < /src/workflows/singleVariantFull/summarySingleVar.R
    }

    runtime {
        docker: "tmajarian/topmed@sha256:1b10a60f8ad47316b71e51ea864fa1b68fb0585cc5ac190f827573e6eaa0348e"
        disks: "local-disk 100 SSD"
        memory: "30G"
    }

    output {
        File mhplot = "${label}.mhplot.png"
        File qqplot = "${label}.qqplot.png"
        File topassoccsv = "${label}.topassoc.csv"
        File allassoccsv = "${label}.assoc.csv"
    }
}

workflow fullPipe {
    Array[File] genFiles
    File this_ped
    File this_kinshipGDS
    String this_label
    String this_colname
    String this_outcome
    String this_outcomeType
    String this_covariates
    String this_pval
    String this_title

    call getarray { input: gdsfilesin=genFiles }

    call common_ID {
        input: gds=getarray.gdsfilesout[0], ped=this_ped, idcol=this_colname, label=this_label
    }

    scatter ( this_genfile in getarray.gdsfilesout ) {
        call assocTest {
            input: gds = this_genfile, ped = this_ped, GRM = this_kinshipGDS, commonIDs = common_ID.commonIDsRData, colname = this_colname, outcome = this_outcome, outcomeType = this_outcomeType, covariates = this_covariates,  label=this_label
        }

    }

    call summary {
        input: assoc = assocTest.assoc, pval=this_pval, label=this_label, title=this_title
    }

    output {
        File mhplot=summary.mhplot
        File qqplot=summary.qqplot
        File allassoc=summary.allassoccsv
        File topassoc=summary.topassoccsv
    }

}

Any input would be totally awesome.

-Tim
