I cloned the five dollar genome analysis pipeline and uploaded my own WGS input data, but the workflow failed at the CheckContamination stage:
Job germline_single_sample_workflow.CheckContamination:NA:1 exited with return code 1
Workflow ID: 16497c6d-be4c-4579-9202-58960bbde32d
CheckContamination-stderr.log contains the following error:
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
  File "/usr/local/lib/python3.6/csv.py", line 111, in __next__
    self.fieldnames
  File "/usr/local/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 146: invalid start byte
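I traced the error to line 357 of bam_processing.wdl, where the embedded Python 3 script opens the .selfSM file without specifying an encoding:
    with open('${output_prefix}.selfSM') as selfSM:
The open() call itself succeeds. The file is decoded as UTF-8, and decoding fails as soon as csv.DictReader reads from it, because the file contains the byte 0xa0, which is not a valid UTF-8 start byte.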
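A minimal sketch that reproduces the failure outside Cromwell, assuming the .selfSM file contains a raw 0xa0 byte. The content below is a simplified, hypothetical stand-in (not the full VerifyBamID column set), and read_rows is a hypothetical helper; it decodes as UTF-8, as open() did in the failing container according to the traceback.

```python
import csv
import io

# Simplified, hypothetical .selfSM-like content: a tab-separated header plus
# one row whose sample ID contains the byte 0xA0. 0xA0 is 0b10100000, a UTF-8
# continuation byte, so it cannot start a character and strict decoding fails.
raw = b"#SEQ_ID\tRG\tFREEMIX\tFREELK1\tFREELK0\nsample\xa0A\tNA\t0.001\t100.0\t101.0\n"


def read_rows(data):
    """Decode bytes as UTF-8 (like open() in the failing container) and parse with csv.DictReader."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    return list(csv.DictReader(text, delimiter="\t"))


decode_error = None
try:
    read_rows(raw)
except UnicodeDecodeError as err:
    decode_error = err  # keep a reference; 'err' is unbound after the except block
    print("decoding failed:", err.reason, "at byte", hex(err.object[err.start]))
```

As in the traceback, the exception is raised while DictReader reads the first line, because the text layer decodes a whole chunk of the file at once.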
I tried two ways to fix this:
1. Change line 354 in bam_processing.wdl from python3 to python2.
With python2 the .selfSM file is handled correctly, because Python 2's open() returns undecoded byte strings, so no UTF-8 decoding takes place.
2. Change line 357 in bam_processing.wdl from
    with open('${output_prefix}.selfSM') as selfSM:
to
    with open('${output_prefix}.selfSM', errors='ignore') as selfSM:
so that Python 3 skips the undecodable byte.
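The errors='ignore' workaround can be checked the same way. This sketch writes the same hypothetical content to a temporary file, opens it with errors='ignore' as in the modified line 357, and confirms that the FREEMIX/FREELK columns still parse. It also shows errors='replace' for comparison, which keeps a U+FFFD marker where the byte was instead of dropping it. parse_selfsm is a hypothetical helper, and pinning encoding='utf-8' is an assumption that mirrors the failing container (the WDL itself relies on the locale default).

```python
import csv
import os
import tempfile

# Same simplified, hypothetical stand-in for ${output_prefix}.selfSM;
# the sample ID contains the non-UTF-8 byte 0xA0.
raw = b"#SEQ_ID\tRG\tFREEMIX\tFREELK1\tFREELK0\nsample\xa0A\tNA\t0.001\t100.0\t101.0\n"


def parse_selfsm(path, errors):
    """Read a tab-separated .selfSM-style file with csv.DictReader using the given decode error handler."""
    # encoding is pinned to UTF-8 to match the failing container's default.
    with open(path, encoding="utf-8", errors=errors, newline="") as selfSM:
        return list(csv.DictReader(selfSM, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.selfSM")
    with open(path, "wb") as fh:
        fh.write(raw)
    ignored = parse_selfsm(path, errors="ignore")[0]    # line-357 workaround: byte dropped
    replaced = parse_selfsm(path, errors="replace")[0]  # alternative: byte becomes U+FFFD

print(repr(ignored["#SEQ_ID"]), repr(replaced["#SEQ_ID"]))
print(float(ignored["FREEMIX"]), float(ignored["FREELK1"]), float(ignored["FREELK0"]))
```

Because errors='ignore' discards bytes silently, it is only harmless if the offending byte sits in a field the script does not convert to a number; which field holds it in the real file cannot be determined from the traceback alone.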