I have a workflow that indexes a .vcf.gz file to produce a tabix index (.tbi). A common sanity check when reading a VCF through its index is to verify that the index was created more recently than the VCF.
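For reference, here is a minimal sketch of that check on local files. It assumes the reader compares filesystem modification times (mtime), which is what a local copy exposes. `index_is_fresh` is a hypothetical helper name, not part of any library.

```python
import os


def index_is_fresh(data_path: str, index_path: str) -> bool:
    """Return True if the index was modified no earlier than the data file.

    Hypothetical helper: compares filesystem modification times (mtime),
    the timestamp a local copy of each file carries.
    """
    data_mtime = os.stat(data_path).st_mtime
    index_mtime = os.stat(index_path).st_mtime
    return index_mtime >= data_mtime
```

Usage would look like `index_is_fresh("sample.vcf.gz", "sample.vcf.gz.tbi")`, where both paths are placeholders.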
It seems that gsutil sets each object's creation time when the file is copied to the bucket, not when my workflow actually created the file.
Since the index is roughly 100x smaller than the VCF, it almost always finishes copying first, so it ends up with the earlier creation time. Here are the file sizes (in bytes) and creation times for my two files in the GCS bucket:
VCF: 150789555 2018-06-05T13:42:39Z
VCF index: 1616957 2018-06-05T13:42:34Z
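A small Python sketch that reads the two listing entries above and makes the ordering problem explicit. The sizes and timestamps are copied from the listing; the variable and function names are only illustrative.

```python
from datetime import datetime

# Size in bytes and creation time, copied from the bucket listing above.
LISTING = {
    "VCF": (150789555, "2018-06-05T13:42:39Z"),
    "VCF index": (1616957, "2018-06-05T13:42:34Z"),
}


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp of the form YYYY-MM-DDTHH:MM:SSZ (UTC)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


vcf_size, vcf_time = LISTING["VCF"]
idx_size, idx_time = LISTING["VCF index"]

# How many times larger the VCF is than its index.
size_ratio = vcf_size / idx_size
# Positive value means the index object was created before the VCF object.
lag_seconds = (parse_utc(vcf_time) - parse_utc(idx_time)).total_seconds()

print(f"VCF is {size_ratio:.1f}x larger than its index")
print(f"Index object was created {lag_seconds:.0f} s before the VCF object")
```

With these numbers the VCF is about 93 times larger, and the index object predates the VCF object by 5 seconds, so a strict "index newer than data" check fails.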
When I copy the index to my local machine, its size and modification time are:
VCF index: 1.6M Jun 7 14:02
Is there a way to ensure the creation times reflect when the files were actually created, not when they were copied?
As a hack, I'd be willing to run some sort of gsutil touch command to reset the file times, but I don't see how that might work. Perhaps gsutil setmeta?
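On the local side, at least, a touch-style reset is possible with plain filesystem calls. The sketch below is a hypothetical helper (not a gsutil feature) that, after downloading, sets the index's modification time to the VCF's whenever the index is older. It only changes timestamps on this machine; as far as I can tell, the objects' creation times in the bucket are read-only and are not affected.

```python
import os


def touch_index_after_data(data_path: str, index_path: str) -> None:
    """Bump the index's mtime so it is not older than the data file.

    Hypothetical local workaround: it only changes filesystem timestamps on
    this machine and does not modify anything in the bucket.
    """
    data_stat = os.stat(data_path)
    index_stat = os.stat(index_path)
    if index_stat.st_mtime_ns < data_stat.st_mtime_ns:
        # Keep the index's access time; set its mtime equal to the data file's.
        os.utime(index_path, ns=(index_stat.st_atime_ns, data_stat.st_mtime_ns))
```

Matching the VCF's mtime exactly, rather than using the current time, keeps the index's timestamp close to the data it describes while still passing a "not older than" comparison.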