# Generic variation analysis reporting

This workflow generates reports from a list of variants generated by the [Variant Calling Workflow](https://workflowhub.eu/workflows/353).

The workflow accepts a single input:

- A collection of VCF files

The workflow produces two outputs (format description below):

1. A list of variants grouped by Sample
2. A list of variants grouped by Variant

Here is an example of output **by sample**. In this table, all variants in all samples are explicitly listed:

| Sample | POS | FILTER | REF | ALT | DP | AF | AFcaller | SB | DP4 | IMPACT | FUNCLASS | EFFECT | GENE | CODON | AA | TRID | min(AF) | max(AF) | countunique(change) | countunique(FUNCLASS) | change |
|--------|-----|--------|-----|-----|----|----|----------|----|-----|--------|----------|--------|------|-------|----|------|---------|---------|---------------------|-----------------------|--------|
| ERR3485786 | 11644 | PASS | A | G | 97 | 0.979381 | 0.907216 | 0 | 1,1,49,46 | LOW | SILENT | SYNONYMOUS_CODING | D7L | tgT/tgC | C512 | AKG51361.1 | 0.979381 | 1 | 1 | 1 | A>G |
| ERR3485786 | 11904 | PASS | T | C | 102 | 0.990196 | 0.95098 | 0 | 0,0,51,50 | MODERATE | MISSENSE | NON_SYNONYMOUS_CODING | D7L | Act/Gct | T426A | AKG51361.1 | 0.990196 | 1 | 1 | 1 | T>C |

> **Note** the two alternative allele frequency fields: "AFcaller" and "AF". LoFreq reports the AF values listed in "AFcaller". They are incorrect due to a known LoFreq [bug](https://github.com/CSB5/lofreq/issues/80). To correct for this, we recompute AF values from the DP4 and DP fields as follows: `AF == (DP4[2] + DP4[3]) / DP`.
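The AF correction described in the note above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the workflow's own implementation. The function name `recompute_af` is hypothetical, and the sketch assumes DP4 arrives as a comma-separated string of four counts indexed from 0, so that positions 2 and 3 hold the alternate-allele read counts. The input values are taken from the two example rows in the by-sample table.

```python
def recompute_af(dp4: str, dp: int) -> float:
    """Recompute allele frequency as (DP4[2] + DP4[3]) / DP.

    Hypothetical helper for illustration only.
    dp4: comma-separated string of four read counts, e.g. "1,1,49,46"
    dp:  total depth at the position (the DP field)
    """
    counts = [int(x) for x in dp4.split(",")]
    if len(counts) != 4:
        raise ValueError(f"DP4 must contain four counts, got {dp4!r}")
    if dp <= 0:
        raise ValueError(f"DP must be positive, got {dp}")
    # 0-based indexing: counts[2] and counts[3] are the alternate-allele counts
    return (counts[2] + counts[3]) / dp


# (POS, DP4, DP) from the example rows in the by-sample table
examples = [
    (11644, "1,1,49,46", 97),
    (11904, "0,0,51,50", 102),
]

for pos, dp4, dp in examples:
    print(pos, round(recompute_af(dp4, dp), 6))
# prints:
# 11644 0.979381
# 11904 0.990196
```

Both results match the "AF" column of the example rows, while the "AFcaller" column holds the lower values reported by LoFreq.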
In the **by variant** output, data is aggregated by variant across all samples in which that variant is present:

| POS | REF | ALT | IMPACT | FUNCLASS | EFFECT | GENE | CODON | AA | TRID | countunique(Sample) | min(AF) | max(AF) | SAMPLES(above-thresholds) | SAMPLES(all) | AFs(all) | change |
|-----|-----|-----|--------|----------|--------|------|-------|----|------|---------------------|---------|---------|---------------------------|--------------|----------|--------|
| 11644 | A | G | LOW | SILENT | SYNONYMOUS_CODING | D7L | tgT/tgC | C512 | AKG51361.1 | 11 | 0.979381 | 1 | ERR3485786,ERR3485787... | ERR3485786,ERR3485787,ERR3485789 ... | 0.979381,1.0... | A>G |
| 11904 | T | C | MODERATE | MISSENSE | NON_SYNONYMOUS_CODING | D7L | Act/Gct | T426A | AKG51361.1 | 12 | 0.990196 | 1 | ERR3485786,ERR3485787... | ERR3485786,ERR3485787,ERR3485789... | 0.990196,1.0,1.0... | T>C |

The workflow can be accessed at [usegalaxy.org](https://usegalaxy.org/u/aun1/w/genetic-variation-analysis-reporting).

The general idea of the workflow is: