Channel: Ask the FireCloud Team — GATK-Forum

Possibly misleading failure message?


I (re-)ran our CGA somatic variant calling pipeline on 402 TCGA THCA pairs. In addition to the expected failures due to congestion in Rawls (see https://gatkforums.broadinstitute.org/firecloud/discussion/11860/rawls-failure-in-10-of-402-workflows-launched-in-single-submission#latest), four workflows failed mid-run, all apparently in connection with container creation. I reran these four workflows and all of them ran to completion. The error messages for the four failed workflows were:

message: Workflow failed
causedBy: 
message: Task Clinical_Workflow.Mutect2_Task:6:1 failed. Job exited without an error, exit code 0. PAPI error code 10. Message: 15: Gsutil failed: Could not capture docker logs: Unable to capture docker logs exit status 1
message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.

message: Workflow failed
causedBy: 
message: Task Clinical_Workflow.Mutect1_Task:5:1 failed. The job was stopped before the command finished. PAPI error code 10. Message: 11: Docker run failed: command failed: docker: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.37/containers/create: read unix @->/var/run/docker.sock: read: connection reset by peer. See 'docker run --help'. . See logs at gs://fc-ce9e4f8c-2c1f-4d67-94e7-4170daa0c81d/5e9c7d0c-ae1d-4213-9cdb-b4ef91c25f9f/Clinical_Workflow/fcb66940-3deb-4cef-9439-a4bcf800d6d2/call-Mutect1_Task/shard-5/

message: Workflow failed
causedBy: 
message: Task Clinical_Workflow.Mutect1_Task:5:1 failed. The job was stopped before the command finished. PAPI error code 10. Message: 11: Docker run failed: command failed: docker: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.37/containers/create: read unix @->/var/run/docker.sock: read: connection reset by peer. See 'docker run --help'. . See logs at gs://fc-ce9e4f8c-2c1f-4d67-94e7-4170daa0c81d/5e9c7d0c-ae1d-4213-9cdb-b4ef91c25f9f/Clinical_Workflow/546b7656-1e1d-4547-bc6c-1e9c22dc2526/call-Mutect1_Task/shard-5/
message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.

message: Workflow failed
causedBy: 
message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.
  (11 copies of the same message)
    message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.
    message: Task Clinical_Workflow.normalMM_Task:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. Message: 11: Docker run failed: command failed: docker: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.37/containers/create: read unix @->/var/run/docker.sock: read: connection reset by peer. See 'docker run --help'. . See logs at gs://fc-ce9e4f8c-2c1f-4d67-94e7-4170daa0c81d/5e9c7d0c-ae1d-4213-9cdb-b4ef91c25f9f/Clinical_Workflow/3004d191-8ea2-4a29-b934-4e71ac7f9a42/call-normalMM_Task/
    message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.
(8 copies of the same message)
    message: Cromwell server was restarted while this workflow was running. As part of the restart process, Cromwell attempted to reconnect to this job, however it was never started in the first place. This is a benign failure and not the cause of failure for this workflow, it can be safely ignored.
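Because the restart notices heavily outnumber the actual task failures in the output above, it can help to separate the two mechanically. The following is a minimal sketch that assumes the message layout seen in this post; the names `classify` and `summarize` are hypothetical helpers rather than part of Cromwell or FireCloud, and reading the two fields after the task name as shard index and attempt number is my assumption.

```python
import re
from collections import Counter

# Opening words of the notice that Cromwell itself labels as benign
# (copied from the messages quoted in this post).
BENIGN_MARKER = "Cromwell server was restarted while this workflow was running"

# Assumed layout, based only on the messages in this post:
#   Task <name>:<shard>:<attempt> failed. ... PAPI error code <n>. Message: <m>: <detail>: ...
TASK_FAILURE = re.compile(
    r"Task (?P<task>[\w.]+):(?P<shard>\w+):(?P<attempt>\d+) failed\."
    r".*?PAPI error code (?P<papi_code>\d+)\."
    r" Message: (?P<detail_code>\d+): (?P<detail>[^:]+)"
)


def classify(message):
    """Return (kind, fields) for a single failure message string."""
    if BENIGN_MARKER in message:
        return ("restart_notice", None)
    match = TASK_FAILURE.search(message)
    if match:
        return ("task_failure", match.groupdict())
    return ("other", None)


def summarize(messages):
    """Count restart notices and group task failures by (task, detail)."""
    notices = 0
    failures = Counter()
    for message in messages:
        kind, fields = classify(message)
        if kind == "restart_notice":
            notices += 1
        elif kind == "task_failure":
            failures[(fields["task"], fields["detail"].strip())] += 1
    return notices, failures


# Sample input: messages from this post (the second one is truncated).
SAMPLE = [
    "message: Task Clinical_Workflow.Mutect2_Task:6:1 failed. Job exited "
    "without an error, exit code 0. PAPI error code 10. Message: 15: Gsutil "
    "failed: Could not capture docker logs: Unable to capture docker logs "
    "exit status 1",
    "message: Task Clinical_Workflow.normalMM_Task:NA:1 failed. The job was "
    "stopped before the command finished. PAPI error code 10. Message: 11: "
    "Docker run failed: command failed: docker: error during connect",
    "message: Cromwell server was restarted while this workflow was running.",
]

if __name__ == "__main__":
    notice_count, failure_counts = summarize(SAMPLE)
    print("restart notices:", notice_count)
    for (task, detail), count in failure_counts.items():
        print(task, "|", detail, "|", count)
```

Run over all four workflows, this kind of grouping makes it easier to see that the real failures are the "Gsutil failed" and "Docker run failed" messages, separate from the repeated restart notices.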

As mentioned, I reran these four failing workflows and they completed with no problem. Given how adamantly the system insisted that the Cromwell restart did not cause the workflow failures, I am inclined to suspect that the restart did play a role. Regardless, I'd like to understand the source of these intermittent failures.

Here is information on the failures:

Google Project: cloud-resource-miscellaneous
Workspace: CBB_20180405_TCGA_THCA_ControlledAccess_V1-0_DATA
Submission ID: 5e9c7d0c-ae1d-4213-9cdb-b4ef91c25f9f
Workflow IDs: a8b9ae04-52bf-476f-a4cc-bee63f5aa013, fcb66940-3deb-4cef-9439-a4bcf800d6d2, 546b7656-1e1d-4547-bc6c-1e9c22dc2526, 3004d191-8ea2-4a29-b934-4e71ac7f9a42

