Channel: Ask the FireCloud Team — GATK-Forum

Cross product between entities for Mutect2 normal-normal analysis

What I want to do
One of the standard evaluations of somatic variant callers is normal-normal calling: you take a set of replicate normal (non-tumor) BAMs from the same individual, e.g. NA12878, and run your caller over every pair of samples, assigning one sample as the "tumor" and the other as the "normal."

If I were writing a WDL outside of FireCloud, I could implement this with the cross product:

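# Mutect2 task definition omitted
workflow NormalNormal {
  Array[File] bams

  # cross(bams, bams) yields every ordered pair, including each bam paired with itself
  scatter (pair in cross(bams, bams)) {
    call Mutect2 { input: tumor = pair.left, normal = pair.right }
  }
}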

My understanding of why this is tricky
As I understand the data model, FireCloud is built around running a method on each sample independently, so a run on one sample knows nothing about the other samples. That is, you can't perform the cross because your input is a single File, e.g. sample.bam, and not an Array[File], e.g. sample_set.bams.

Hacky solution 1
I suppose one could set up a bunch of pairs, basically by implementing the cross manually and then uploading the resulting data model, and run the analysis over a pair set. Besides being really ugly, this is not very maintainable, because the part of the workflow that forms all pairs out of the list of samples lives outside of FireCloud.

What I mean is that even though the natural data model of the problem is

sample bam
sample1 sample1.bam
sample2 sample2.bam
sample3 sample3.bam

I would run on pairs:

pair tumor_sample normal_sample
pair1 sample1 sample2
pair2 sample2 sample1
pair3 sample1 sample3
pair4 sample3 sample1
pair5 sample2 sample3
pair6 sample3 sample2
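
For illustration, the manual cross could be scripted outside FireCloud. The following is a minimal Python sketch, not a FireCloud tool: the function names `make_pair_rows` and `to_tsv` are hypothetical, and the column headers are copied from the example table above rather than from any FireCloud import format, so the header line would need adapting before upload.

```python
from itertools import permutations


def make_pair_rows(samples):
    """Return (pair_id, tumor_sample, normal_sample) for every ordered
    pair of distinct samples, i.e. n * (n - 1) rows for n samples."""
    rows = []
    for i, (tumor, normal) in enumerate(permutations(samples, 2), start=1):
        rows.append((f"pair{i}", tumor, normal))
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text with a header line.
    The header names mirror the example table and are an assumption."""
    header = "pair\ttumor_sample\tnormal_sample"
    lines = [header] + ["\t".join(row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_tsv(make_pair_rows(["sample1", "sample2", "sample3"])), end="")
```

The rows come out in a different order than the hand-written table, but they cover the same six ordered pairs.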

Hacky solution 2
I could also imagine the following hack. The data model would be a single dummy sample with the attribute "bam_paths", which is a file containing the path to a different bam on each line. The method would then take in this FoFN (file of filenames), use read_lines to get the Array[File] of bams (read_tsv would return one array per line instead), and then scatter over the cross.
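
To make the intended logic concrete, here is a small Python sketch of what the method would do with the FoFN: read one path per line, then form the cross. It only mirrors the logic and is not WDL; the helper names `read_fofn` and `cross_without_self` are hypothetical. It also drops self-pairs, on the assumption that a plain cross of the list with itself would pair each bam with itself, which the pairs table in Hacky solution 1 does not include.

```python
from itertools import product


def read_fofn(text):
    """Parse file-of-filenames content: one path per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def cross_without_self(paths):
    """Full cross product of the list with itself, minus (x, x) self-pairs.
    Each result is a (tumor, normal) tuple."""
    return [(tumor, normal)
            for tumor, normal in product(paths, repeat=2)
            if tumor != normal]


if __name__ == "__main__":
    # Paths reuse the example sample names; real entries would be full bam paths.
    fofn_text = "sample1.bam\nsample2.bam\nsample3.bam\n"
    for tumor, normal in cross_without_self(read_fofn(fofn_text)):
        print(tumor, normal)
```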

Similarly, there's another analysis I want to do that scatters Mutect2 over samples one at a time (a plain scatter, without a cross or anything), and then does a cross over the outputs to compute the pairwise overlap between callsets. I'm not sure how to do that either.
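
As a sketch of what "pairwise overlap between callsets" means here, the following Python assumes each callset has already been reduced to a set of variant keys (chromosome, position, ref, alt). That representation, the function name `pairwise_overlap`, and the toy data are all assumptions for illustration, not Mutect2 output.

```python
from itertools import combinations


def pairwise_overlap(callsets):
    """Given {sample_name: set of variant keys}, return the number of
    shared variants for every unordered pair of samples."""
    overlaps = {}
    for a, b in combinations(sorted(callsets), 2):
        overlaps[(a, b)] = len(callsets[a] & callsets[b])
    return overlaps


if __name__ == "__main__":
    # Made-up variant keys, purely to show the shape of the data.
    toy_callsets = {
        "sample1": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
        "sample2": {("chr1", 100, "A", "T")},
        "sample3": set(),
    }
    for pair, shared in pairwise_overlap(toy_callsets).items():
        print(pair, shared)
```

Overlap is symmetric, so this uses unordered pairs, unlike the ordered tumor/normal pairs needed for the calling step.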

Is there a good way to do these things?

