My job crashed due to a problem in my code or inputs. It happens.
As a result, none of the dozen files the job was supposed to produce were created, so when the delocalizer attempted to copy them back to the bucket, the attempt failed. However, the delocalizer code must assume it's just a transient issue, and continues to retry for the next hour before giving up and reporting the job failure. While access to the bucket may be less reliable, access to the source file should be highly reliable.
Could you please add a check for an output file's existance before you attempt to delocalize it to the bucket?
The current non-discriminating delocalization retry loop not only wastes compute cycles, but also prolongs debug cycle time.