We run workflows to download files from the GDC and copy them into cloud storage. The two workflows are:
broadinstitute_cga/gdc_file_downloader/1
broadinstitute_cga/gdc_bam_downloader/1
The workflows take the UUID of a file at the GDC as input and download that file onto the VM. FireCloud then copies the file to cloud storage as part of the standard delocalization of workflow output files.
Each workflow has two tasks. The first task (disk_size_calculator) queries the GDC for the size of the file. That file size is passed to the second task (gdc_bam_downloader or gdc_file_downloader) as a runtime parameter that sizes the VM's attached disk, because the VM running the downloader needs enough space to store the downloaded file. The downloader task then runs the gdc-client to retrieve the file.
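To make the sizing step concrete, here is a minimal sketch of what a disk_size_calculator-style task might do. It is not the actual task code. The function names, the 10 GB padding and the use of `urllib` instead of `requests` are assumptions. It also assumes the files endpoint returns JSON shaped like `{"data": {"file_size": N}}` when asked for `?fields=file_size`, and reuses the host that appears in the error log below.

```python
import json
import math
import urllib.parse
import urllib.request

# Assumption: the host seen in the error log; adjust to whatever base URL
# the real disk_size_calculator task calls.
GDC_FILES_ENDPOINT = "https://gdc-api.nci.nih.gov/files/"


def file_size_url(uuid: str) -> str:
    """URL that asks the GDC files endpoint for only the file_size field."""
    return GDC_FILES_ENDPOINT + urllib.parse.quote(uuid) + "?fields=file_size"


def parse_file_size(body: str) -> int:
    """Pull file_size (bytes) out of a response shaped like {"data": {"file_size": N}}."""
    return int(json.loads(body)["data"]["file_size"])


def disk_size_gb(file_size_bytes: int, padding_gb: int = 10) -> int:
    """Round up to whole GiB and add hypothetical padding for the image and temp files."""
    return math.ceil(file_size_bytes / 1024**3) + padding_gb


def fetch_disk_size_gb(uuid: str, timeout: float = 30.0) -> int:
    """Query the GDC over HTTPS and return a disk size for the downloader task."""
    with urllib.request.urlopen(file_size_url(uuid), timeout=timeout) as resp:
        return disk_size_gb(parse_file_size(resp.read().decode("utf-8")))
```

The integer from `fetch_disk_size_gb` would then feed the downloader's runtime block, for example as `disks: "local-disk " + disk_size + " HDD"` in the WDL. Note that `fetch_disk_size_gb` makes the same DNS lookup that is failing here, so it would raise the same kind of error on an affected VM.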
These workflows ran successfully on July 18th. They began failing yesterday, which I suspect was right after FireCloud was updated.
The failure is in the disk_size_calculator task: it is unable to connect to the GDC to query the file size. Here is the error message reported in stdout.log:
Exception= HTTPSConnectionPool(host='gdc-api.nci.nih.gov', port=443): Max retries exceeded with url: /files/a7b7b65f-609d-4ad7-b535-adbcec1d79b9?fields=file_size (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7ff67c25c3c8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
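The error says the connection never started because the hostname could not be resolved, so DNS on the worker VM is the first thing to rule in or out. Below is a hypothetical diagnostic, not part of the existing workflows. It resolves the host with `socket.getaddrinfo` before any HTTPS request is made and reports a DNS failure separately. The `resolver` parameter is an assumption added so the check can be exercised without network access.

```python
import socket
from typing import Any, Callable, List, Tuple


def check_name_resolution(
    host: str,
    port: int = 443,
    resolver: Callable[[str, int], List[Any]] = socket.getaddrinfo,
) -> Tuple[bool, str]:
    """Try to resolve host before making any HTTPS request.

    Returns (True, summary) on success, or (False, reason) if the DNS lookup
    itself fails. "[Errno -3] Temporary failure in name resolution" is the
    Linux EAI_AGAIN error raised by this kind of lookup.
    """
    try:
        addresses = resolver(host, port)
    except socket.gaierror as exc:
        return False, f"DNS lookup for {host} failed: [Errno {exc.errno}] {exc.strerror}"
    return True, f"{host} resolved to {len(addresses)} address(es)"
```

Calling `check_name_resolution("gdc-api.nci.nih.gov")` at the start of the task, inside the same docker image, would show whether the FireCloud worker can resolve the GDC host at all. That would separate a DNS or network change on the VM from an outage at the GDC.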
I have tested the task by running it in the docker container referenced in the WDL, both on my laptop and on a Google Cloud VM in the same zone FireCloud uses. It succeeded in both cases, so I'm fairly sure the problem is on our end and not at the GDC. Because the problem appeared only yesterday, it is very likely linked to the July 19th release.
This is a significant regression: it breaks just-in-time file retrieval, which is a key step in analyzing image data and hg38 harmonized data.