I'm seeing multiple instances of this (for example, workflow 2bd68c76-d225-4cc8-a226-e5eb28c48474, submission 20b7af64-5a22-4c4b-8b19-e8b03de85880; I've added GROUP_support@firecloud.org to the workspace). It may be related to this thread: https://gatkforums.broadinstitute.org/firecloud/discussion/10429/failed-jes-error-code-2-message-gaia-unavailable
These failures occurred in individual scatter jobs, but hundreds of tasks were terminated as a result, so the cost of such errors is non-negligible.
Although intermittent, JES errors occur regularly (mostly code 10). Are any near-term fixes planned? Would it be possible to implement a mechanism that avoids terminating unaffected scatter jobs?