I submitted a large computation to FC with ~4000 shards across 4 calls, with every task allowing a few preemptible attempts. I noticed that a number of shards eventually fail with the following error:
message: Task workflowAssembly.qcQualityHuman:160:3 failed. The job was stopped before the command finished.
PAPI error code 10. 14: VM ggp-9876237950776228486 stopped unexpectedly.
I know that this has been reported previously by various users. Did you ever figure out why it happens and how to prevent it? I do use call caching, but a very large amount of data gets copied every time I re-run the workflow to process the failed shards and eventually aggregate my results.
Damian@Broad