Channel: Ask the FireCloud Team — GATK-Forum

FireCloud Cromwell incorrectly instructing JES to run task on preemptible VM

As mentioned on two other forum threads (see http://gatkforums.broadinstitute.org/firecloud/discussion/8832/how-can-i-tell-if-my-jobs-were-preempted and http://gatkforums.broadinstitute.org/firecloud/discussion/8880/why-am-i-seeing-no-evidence-of-preemption), we have been unable to see any evidence of preemptions of VMs after running 100,000 preemptible jobs. This seemed rather strange and unlikely.

@esalinas and I drilled down a bit further, looking at what Google was telling us about these jobs we were having FireCloud run for us on preemptible VMs. We used the gcloud alpha genomics operations describe command to get the operations metadata for a job that was directed to run on a preemptible VM.

The job/operations ID in question was "operations/EJbW07ebKxiK7peJ_M_NsOIBIJ_jnLeTDCoPcHJvZHVjdGlvblF1ZXVl".
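For anyone who wants to repeat this check over many jobs, here is a rough sketch of wrapping that gcloud call in Python. The helper names are made up for illustration, and it assumes gcloud is installed and authenticated; `--format=json` is the general gcloud output flag, used here so the result can be parsed.

```python
import json
import subprocess


def build_describe_command(operation_id):
    """Return the argv list for describing one Genomics operation as JSON."""
    return [
        "gcloud", "alpha", "genomics", "operations", "describe",
        operation_id,
        "--format=json",  # general gcloud flag: emit machine-readable output
    ]


def describe_operation(operation_id):
    """Run gcloud and parse its JSON output (needs an authenticated gcloud)."""
    result = subprocess.run(
        build_describe_command(operation_id),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `describe_operation` with the operation ID above should return the same metadata as the attached file, as a Python dict.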

Using FireCloud's /api/workspaces/{workspaceNamespace}/{workspaceName}/submissions/{submissionId}/workflows/{workflowId} API to get the call-level data associated with this job, we confirmed that Cromwell recognized the job as one that was to run on a preemptible VM. In particular, the preemptible attribute is set to true. I have attached a file to this posting (callMetadata.fcapi.txt) with the output of this FireCloud API call.
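If you want to pull the preemptible attribute out of that API response yourself, something like the following should work. It is only a sketch: it does not assume any particular layout for the response and just walks the parsed JSON looking for a key by name. The sample dict is invented for illustration and is not the real response shape.

```python
def find_key(node, key, path=()):
    """Yield (path, value) for every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            # Keep descending in case the key also appears deeper down.
            yield from find_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, path + (i,))


# Hypothetical sample, NOT the actual FireCloud response layout.
sample = {"calls": {"wf.task": [{"attempt": 1, "preemptible": True}]}}
hits = list(find_key(sample, "preemptible"))
```

Each hit carries the path to the attribute, so you can see which call and attempt it belongs to.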

Using gcloud alpha genomics operations describe, we saw that the preemptible status of a job request is contained in two locations of the response:

(1) in the ephemeralPipeline/resources block, which according to Google Genomics's Melissa Chang is a "create time" attribute. In this block, the preemptible attribute was set to true.
(2) in the pipelinesArgs/resources block, which according to Melissa is a "run time" attribute. In this block, the preemptible attribute was set to false.

I have attached a file to this posting (operationsMetadata.gcloud_aplha_genomics.txt) with the output of this gcloud call.

Melissa has stated that both the create time and run time attributes need to have preemptible set to true in order for a job to be run on a preemptible VM, and she confirmed that because the attribute in the run time block was set to false, the job DID NOT run as a preemptible. This explains why I could not see any evidence of preempted VMs in any of my 100,000 preemptible jobs.
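The rule Melissa described can be checked mechanically. Below is a minimal sketch: the block names are the ones quoted in this post, the helper names are invented, and where those blocks sit inside the full operations response is an assumption, so pass in whichever object directly contains them and adjust the paths to match your output.

```python
# Key paths as named in this post; adjust if your response differs.
CREATE_TIME_PATH = ("ephemeralPipeline", "resources", "preemptible")
RUN_TIME_PATH = ("pipelinesArgs", "resources", "preemptible")


def get_path(doc, path):
    """Follow a tuple of keys through nested dicts; return None if any is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def effectively_preemptible(request, create_path=CREATE_TIME_PATH,
                            run_path=RUN_TIME_PATH):
    """True only when BOTH the create time and run time flags are true."""
    return (get_path(request, create_path) is True
            and get_path(request, run_path) is True)


# Mirrors the mismatch described above: create time true, run time false.
observed = {
    "ephemeralPipeline": {"resources": {"preemptible": True}},
    "pipelinesArgs": {"resources": {"preemptible": False}},
}
ran_preemptible = effectively_preemptible(observed)  # False for this job
```

A missing flag is treated the same as false here, which seemed the safer default for an audit.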

Now, the version of Cromwell that is supporting GoTC is successfully running jobs on preemptible VMs; Melissa has confirmed this for us. So the bug we are seeing appears to be isolated to the version of Cromwell that is running within FireCloud.

This is a major blocker for the benchmarking and cost reduction work we are doing and needs to be addressed ASAP.
