Research Object Crate for DNA-seq

Original URL: https://workflowhub.eu/workflows/60/ro_crate?version=1

# DNA-Seq pipeline Here we provide the tools to perform paired end or single read DNA-Seq analysis including raw data quality control, read mapping, variant calling and variant filtering. ## Pipeline Workflow All analysis steps are illustrated in the pipeline [flowchart](https://www.draw.io/?lightbox=1&highlight=0000ff&edit=_blank&layers=1&nav=1&title=NGSpipe2go_DNAseq_pipeline.html#R7R1bd5s489f4nObBPoDvj3ESO%2BnXZpuk3Wz3pUcG2WaDgQJ2Lr%2F%2B0wgJIxAY29jGabZ7TowAIc2MRnNXrXkxfxl5yJ19dQxs1TTFeKk1L2uaprY0rQb%2FK8Zr2NLtdsOGqWca7KFVw4P5hlmjwloXpoF94cHAcazAdMVG3bFtrAdCG%2FI851l8bOJY4lddNMWphgcdWenWR9MIZqxV7fRXN66xOZ2xT%2Fc0Nr8x0p%2BmnrOw2fdsx8bhnTni3bA5%2BjNkOM%2BxpuZVrXnhOU4Q%2Fpq%2FXGALwMohFr43zLgbDdnDdlDkhaV%2Be%2FfwuPi8uHz79%2FvN9%2BFPf%2FlvXeUYWCJrwYBR0zoW6XFgmEsAr2VObXqj83sBYx14FA7RJfk1ZX%2Fpa2Mv2ULGRPvirRQawSsH%2FiyYW%2BSXSu5ZaIytQQTTC8dyPPpQc0j%2FI4%2F4gec8RVgiUBxMHDtgJKV2YNzIn2GD9Uj7ia4mpmXFOr3qwL%2BoU36HIrE5mHrIMAlwE826Mzd1cqnAIxbyffY7wq8STTKOHIavJfYC%2FBJrYsgaYWeOA%2B%2BVPMLuaj32CltTmsJQ9byi0JbCCHQWI84OfxCxVTGN%2Bl6RB%2FnBKEROLf9i21w%2BXM8nj8355L63%2FPEjqNc7Kdxhg6wjdul4wcyZOjayrlatMQQAXP5bzF3%2B%2FBS5pGX11hfHcRme%2FsNB8MpwihaBQ5piVIJfzOAf6LDRZlc%2FWffw%2B%2FIlfvHKL2wCgNhLcPmT9wcXq9foFX8vTW4hDGDimQuRNfnOwtPZU%2F3P6rBl%2FfwyuhtMHr%2BPTHXx2Ku3GatD3hQHOXBvygnFwxYKzKU4jl2Qfq18nsxue4G1nN2Pb68fhr%2FerusfHGIvHMJzAoI8BwBX75fEMnrtrsAy6n0txTLUnpZmGS1F2516rkZfR%2BdB%2F9v8WRuN0C%2F%2F4n%2Bvn%2BvdnVlGaQyC35EziKyFvmZdS3fVZj%2B9sKUPthQpNznAUn97DC6%2FK%2F%2F8%2B3l4f%2BXNVeP76PUm5DOngKwsbr7i35tx862QXBDHzValcNxW3zuSsW2cg05ALnXgvsCLoXFoWnw4aToQtoQDEEXzWESRN2zJFg9bp0AtfFOHG3Wf4vucPKC23JecHT9quNBq54PL2%2FP6A%2F4NMDZdbJlk1yTtg7g4MC4kNISjE5tDoWTHAe827TrFrBmgMXk7NkmiIOOJaZt03ycosddMpRzg%2By6Cr%2BlMElk9qPt%2BPfAWto4CGFv8ss4oWNKpEht02HUhtFRyLmGnCKQhD0%2BEDmdBAMaHc1hY2nBqBkTybHiW27Chq6E5H08c0i35eTt6AAxrU4dc1IF2LWcMEi1egplkyNHvk9%2BE8n0g%2FKFhI%2FKjwe81iDjrLF%2FTY8x8jsMaSeC%2FAVqSknel1pDlIMMHKwtMH2wqjrGwAI4Ksg1YY75POLSJrJBbI8IMsXfKE8YvWF8EdIYxpkEIDbo17SllJ598fEb%2B0EYXmR6moKAA%2BeTSWwb2QSXbBhDlLNPUJ7MVulIsJYqg9nQ6aUMJ12LjWk9k%2FCp9P21J9tMECIgy6MJPoiy6RG2EyYcyFJdKerwhMjy2pODK3dALw1BNgUyTQIy3FZZK2Pe%2BOSYlK%2FaxutYWENYVO3AmEx%2BYbAIJ0aC3x4vWO6bwy3%2F%2FFAThdcKvIPquJOEDaDicatdLswcSZn%2F42Ptr%2FB94AMjmAJYgUVh1pZxrToZPRC3KthTgWqG1RQnb6wFgEe51YvfIqgnqzLAF93QCPLq1rBNwv63YtmxHKibWusm2WeqpraZq4UkQ3uzJuXdcvJMJEceQlKJNvrFEnp8pKGU9toWclAL%2FfoAd9m3yBtdzlqaBC4g1Zh4RDSI56Za8KACDDibnu%2BFrdd%2FFujkhWrPwdeVT3giyGxw3aMjePCs6IRlatLWMXotxILYzxJg%2BN1mHOBo4ZJ%2BcWNRiMKFGgjj3fp6ZAX5wEWWVzx51WpQhtCS9O7IdWCa0KK2N%2BSa5jLHOTfZLmWHgoIIM%2F2BlJZmWgMYEbkqRZPqfvd7T01B%2Fsr4P6uhypGL1S11tv3dJRrDIlWq4LWyk61REromkhiHyg7sLysiLiBS%2FF4TRBTBCnWhjngNU5UxgwOiZDpuq1zBYsus%2BcdWsrN080tbhSdAIyeB%2F642wOdqg17LyZtz7mWblMvaTYO9km9eJ8vyFXl02OxlMnXoJWceKSPcl8Ps2d9Vzht9qphiV2pFwqs7mSmoRfi%2FlKmnu%2Fs58A6XykVZBPtKqHB8ZPJ4XZiKCiSxsuieMgwq6rkvWVUyoXO%2FZT%2FX1yceRMQufZb22fx4V6RvjZ7Q5j2odiEeVbDjj8QGcJ%2FW6R%2BVJubuwxDm1Xi%2BjVkqmnSlb6Pd%2FLQJ3AasJQgrJA6tFE1sxym89bykdT6EPyRiWVzQNM7SfhyZlGLfyiQkV3Kg%2BOv%2F%2BPwpuz0RU1MUAeRqrcpa7zhP6WYz%2BgSJNHVnnjOapySW5AtJKWDxsh4fxDNHctIBcr7G1xNDrQbS1bqsnrJRmP21jlqpr3BZdvsu2W7FloaRWhof9hRX41VwbS30SWw%2FPJsFiRPTQQsjVop6V8etqUVwj13KCVxdfwF1qFbm%2BOIuWzg%2FbnJjYGGGbPkTv%2Fxid5U12H1P7%2B%2B7hy1%2BX0LHueNj%2FWLRcOe92Gu1iy7YfPVn%2Bwu1VfuFazrSiq5aMrM7XrO6QNagHdJGG%2BxlGOqxivu1Vi8j3QNDtfkFy3kJeKxbeyT2o1TdDlan58bDNtZpf%2F1CaX0G3dHYod1VEaQ%2B7hCAqyn1EB9menWGQAAWKaQSRoT8z7ddfYUTQr7CZqIy%2F4EFyl8kuYXvjfm5IZB7JI5n%2BsaRTqdrTbtCnyN3GDFtE9PIb95nzlz%2Bb5yg8iNj2jYyIMKNoEcBS8Zx5TYwFgs2vUZP48b46hjkBXkLxSh9cCbb6wg9oV4jw2Vff9Gsx594qoCogiPVnZH36IfvRFBtjYNU5vrqq7K8HEiI73X6xPVfVSth0pVbajzD9TfbqTnqvlnvUDrRVFzbS3s8vF4ypFrHTfkXeE%2BWbc7KeSNfGwrVMiH81agd39Xhz8nV%2FYzNqiKuTN6M2O12Jnnl05456VL5x0i7jdlGBv3ou479DmeeKyPwbuHzawEDGD7ffahpYhqfMokW1blt3PAPZBGD0Hg70xiE4S%2BSgYVIcmKU35jBtpXaKHCbpPG42laNzGEk%2B88lH1zY3FkzSCe07iCqFHcq9Y5kVpGjvf6C9bLTnlQs4PNalRQlaO2O91GIQIh1o2xGCtm14%2FablIHZb%2F2F%2B%2BQ6Ip6%2Beex56jT3gQmijH%2Bs5EevY4lsJtz4z1WpY8Pmm2k0QXTiCgnGRxYNbkI%2FvsU727bHHSiwUlHtWb4GkA%2F2QP3c0iM6kLgfI2FQG519rkefwgHKPF5%2FT5rrVqYaoiHHS7WY6vveQcs%2B4uzBGzu0%2FLp7X7%2B77QV8fdervPha3TF2qV5jLHWu7yx32Vr6Tvacd3eMJJqgAnUyB8IM5LqrpVcOnAgGJwF4N%2FELtVVNhDklXBIvGuMeM2DRlTBZTmBxLWGKDKq%2B%2B7TaW%2BuQs%2FPaT7TzbtXhYE2Pgm0DkwHlAzIRvOTpl%2BdEGtC7Vaat0moplx4gWtbYkq0PtS2qfqWp%2FX%2Bu%2FLVn%2FCcDtNxUm4kB%2FbiqMVBFR04j4KEu3ix7Sl7hMpJDfVQ3ZaUH2s%2BOCs6o5jJ0XKGEAMeuUIY8dz8BenTRHTJutFVriALx67ehOvNRnGDXVxC34Fz3hIsOI%2BtYK1cMRPfKUmPIjHyTB9LzMTHq%2B2ZNJVnPA8%2FhgidxhWUzcFzpUUk9Fc449pWVPPBg7xmuqMbU9BQZvCUEiBjDDNk%2BomqAujP0KneSS4Abu%2FX6yzcA7W11PTJsm10YBqmFWFKJEJ0uaCr3jMd%2B5H2DXj%2Be1rkacnkPuNpx%2BDBrTEMmB0Yb9%2FyEDg0YZtUF7uGLKjVnYv1DU7ybqnPRbaamoJ9nh%2ByVEIsiLyZ1MecdtNd%2Btisltv%2FWqqqQm7Alox9G4jxJaeGMXiSxEzwYK0Jr99UhaMM2NXSmlyqc303WxcfDUgZQ2aqN5mOJHNsMlaJ20ct0kXXxKsLGuCRI74Ph9NHctrDbu1QaFcWP6Vu1xaluMs5wN7EhB7alCGBKlVVoHY1%2FFu3j13spESacTNCDzVojKrBArG6N5jJFRwZrlZYDQDLwtyhuG8JEoTI3uVXMH4lTojRs7PGXiHlOwzYHC6A3mlxHcIR%2BZidH6KZoT0uvtvn7k9phjyoSiN6R7WHdIOdYXfgBKseLhRwwto2H09zkJGzFBLFqWkcled4xVcUvCWp6M0ELPg98P6Fnl%2BQAwkY0dq%2F34%2Bjwlx6poYpcqkwd0rMrPnsjciVMb3m6FS7%2FgKa2gGhFxRsXUceZOAhu8iCTpkRHx8yVYE6cGJhI0B8lNaW4aBuWSMqoQOWcpG4p4iIQmqaYq20%2FKSHeQEsExE93NULMcOgmxrJKO1KpLQCUQZ9IEpvLA2DjXakmos1VCHQYpdWZnc0dK201AwKnHt8iVPkfxufBprQBGbFHO1bvlMD0xmllt94ohcW8sJq3xXYapBFQEYaFfBldTZB4C6hB4zygTnbtqpy8rpSBFWgkeeWnU1e4BqO8z6koKrKJKxUcp50jEi5VyJut8Yk4XHo%2FDqaLd5aBZ2iFAoB0ea4SXmbWZ5Q8dPeH6OBBzPYzmY8jizgKX5IkKwCoh1IjAk9moj1EsnFYPyARs4m62uJwCc3Z9a3qiD820d7wnOLCEgtRGUyrSrbHhb1BHe32Dx8Ih%2FQ2%2Bm1vUe7MBEkCghQXs3GQjYND3rYUnhJeAEdcEu7DtxMVdxV%2F%2FEUwL0tFTUtgRjbxbiJK0BJQXmteqeJfsbrI2Omy%2BfHFGL4aFKIQwzU%2FUqPWEqb17BRifiBamPfXPpBXRs9W1XDEo3%2FxUgbCFhIOnyesvr%2FXwdDYXK4pYmqRQ1CRqXEK6LC26M09A%2BwjuFJ0JaTScmJDf3VzKz0ou3NS9IKWzXlFFoHccwT%2BSqFl5twHs81%2BgIF3RLC7dcV%2FpDP0nypahlh1h0rVYFCFr5Ma8%2Fbsa5qYP0ZLIxs7Cp9IgnRwVjCOXgyZ1OmRj8eR8Dm1VtN51Osf1OUhL2h25us5GPIdcfMOeGR7RkuBDqsiHtk5t3craUDSKjTs7Dx%2FGJsX97gfcVwP3O%2Be3l4v6vI2oPMxvl9ScyC%2FtsOPRt0xSLiZsSoMls6TNhWe9DjykPwE01zFp8TD71BnzGyC6OFdv9UQQttqSKvKSXK0mP4m%2B9AJpR7UMV7hAWl7ds%2FXhJkc7x9xEw7vl893NTXPYt97uH2%2FsN62EnOs9lp84cPWJfaKdR0Mem08nikm0WJ5D1YpP0KjFeMhi8doToYSeqC%2BhIPgij4V0Hd9kNqb96yyxwhPRdPKio7LJ7OQUlWRwVIsXgKtStS21QoGWlXKA5hVPWs%2Fvugfa5jat4AdHTvwYFeYnF2E2Y%2BxEi4UfZnKkT6k4IC9h49HpZPK5SRYST46bJKv3taXBE8fnJ39mLf%2FCBtEqM4br4odCZjOG1Ok2x%2BEL1xebF7c6UXtogjG0%2Bq0jMwZpdFz6EGBqt66tvKJZtouN4F0CQLs9MRmmIxHbZF6y1ubQLAa6E6xRwZLjh2GlhZwgiT%2BtFEUuMP7YMguVHVjtePUfSuBk%2FY64NXTSWRjSEOlma3dOJj%2FpKTsXZ328WSeZhhFbuhywsVAeALFs3aI5YMIe%2B%2FDHjYdn8n0m8Uz6WsklLWno5gYjTFBSBYJvkjuiPGFCk2yJJSRMyAlJkxBSAQFiq1CbfFL%2Bc2NtpNVctHzXN7bGzvPRdb8SjedlFvEufBh4Re3pzbaSIKnd%2FJ5Sn83u7vU4ylimzOnR3fZUJjvM6KSorN%2BLU9m%2BvTbFQgx3r2H1DspHCkfnSGlnLcVKgdssSLDCwSwHP8tg94NLKudMPlQNi52wHtpUD29CZcUCJkgXdZZV4vL69DDeENaxkBYj2KR%2FZsx9gCO7a1D1Jn6Awdq6BMIgsizByiYHg4sfOrwl%2BBcd46%2FChx5k02HJZuFCOltKtZCso%2BJmYVU7slk4u%2Fpvjm0zrTZ7dKCbVS4QrIwJJh1DBV3kg8j8ybOpa1pzSP%2BTcrx4EYNOchMUQ%2BniKdqk06sO%2FKtl5HNPPWSYBNmJZkJnUD6Sst8pFJPkrHiGDGoVUGpCzJ5SDjklaybw0BmxbKjEBKB223vab3OMSVm0k2kUom8trGSLtcqLQ8%2Bkx7D2ozR3H7xe8aReK5WSFesMsww%2FVsKMkCNBlx%2BI9R3yevg7OoEAnF20k5z38ibGjkFI%2B%2FMLjYO9nXL65bxMGvPGQ3rQF1ZYIyG11bBCT5MFEYtoaiRZUQH2QiDSKnErj2QOOygwo%2FiJh%2FEDH9KYKvgF6bTXsKbdKz%2BwHWjnug%2B78Y5WU%2BQd%2FXRsr9QWpm7hUoNQM8cJ4todmdbsqwPBb82r%2FwM%3D). In case of paired end reads, corresponding fastq files should be named using *.R1.fastq.gz* and *.R2.fastq.gz* suffixes. Specify the desired analysis details for your data in the *essential.vars.groovy* file (see below) and run the pipeline *dnaseq.pipeline.groovy* as described [here](https://gitlab.rlp.net/imbforge/NGSpipe2go/-/blob/master/README.md). A markdown file *variantreport.Rmd* will be generated in the output reports folder after running the pipeline. Subsequently, the *variantreport.Rmd* file can be converted to a final html report using the *knitr* R-package. GATK requires chromosomes in bam files to be karyotypically ordered. Best you use an ordered genome fasta file as reference for the pipeline (assigned in *essential.vars.groovy*, see below). ### The pipelines includes - quality control of rawdata with FastQC - Read mapping to the reference genome using BWA - identify and remove duplicate reads with Picard MarkDuplicates - Realign BAM files at Indel positions using GATK - Recalibrate Base Qualities in BAM files using GATK - Variant calling using GATK UnifiedGenotyper and GATK HaplotypeCaller - Calculate VQSLOD scores for further filtering variants using GATK VariantRecalibrator and ApplyRecalibration - Calculate the basic properties of variants as triplets for "all", "known" ,"novel" variants in comparison to dbSNP using GATK VariantEval ### Pipeline parameter settings - essential.vars.groovy: essential parameter describing the experiment including: - ESSENTIAL_PROJECT: your project folder name - ESSENTIAL_BWA_REF: path to BWA indexed reference genome - ESSENTIAL_CALL_REGION: bath to bed file containing region s to limit variant calling to (optional) - ESSENTIAL_PAIRED: either paired end ("yes") or single end ("no") design - ESSENTIAL_KNOWN_VARIANTS: dbSNP from GATK resource bundle (crucial for BaseQualityRecalibration step) - ESSENTIAL_HAPMAP_VARIANTS: variants provided by the GATK bundle (essential for Variant Score Recalibration) - ESSENTIAL_OMNI_VARIANTS: variants provided by the GATK bundle (essential for Variant Score Recalibration) - ESSENTIAL_MILLS_VARIANTS: variants provided by the GATK bundle (essential for Variant Score Recalibration) - ESSENTIAL_THOUSAND_GENOMES_VARIANTS: variants provided by the GATK bundle (essential for Variant Score Recalibration) - ESSENTIAL_THREADS: number of threads for parallel tasks - additional (more specialized) parameter can be given in the var.groovy-files of the individual pipeline modules ## Programs required - Bedtools - BWA - FastQC - GATK - Picard - Samtools

Author
Sergi Sayols
License
GPL-3.0

Contents

Main Workflow: DNA-seq
Size: 1793 bytes