IsoAnnot
0.9.0b1 @ 0572562

Workflow Type: Snakemake

Work-in-progress

IsoAnnot

IsoAnnot is a new tool for generating functional and structural annotation at isoform level, capable of collecting and integrating information from different databases to categorize and describe each isoform, including functional and structural information for both transcript and protein.

⚠️⚠️ IsoAnnot is currently under beta-testing. Please see the latest release and download IsoAnnot from the release branch.

Requirements

Computational Requirements

The computational requirements to run IsoAnnot may vary depending on the organism of interest and the size of the transcriptome you want to annotate.

Reference benchmark (Human transcriptome):

Transcriptome size: 252,205 isoforms
CPU cores: 8 cores
Memory: 12 GB RAM
Disk space: 14 GB
Execution time: ~20 hours
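As a rough pre-flight check, the sketch below compares the current machine with a chosen minimum. The figures above describe one human run, not hard limits, so the thresholds are parameters. `check_resources` is a hypothetical helper (not part of IsoAnnot) and assumes a Linux `/proc/meminfo` that reports `MemAvailable`.

```shell
# Hypothetical helper: compare available RAM (GB) and core count with minimums.
# Reads MemAvailable (in kB) from /proc/meminfo and the core count from nproc.
check_resources() {
    local min_gb="$1" min_cores="$2" avail_kb avail_gb cores
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    [ -n "$avail_kb" ] || { echo "cannot read MemAvailable" >&2; return 1; }
    avail_gb=$((avail_kb / 1048576))   # kB -> GB, rounded down
    cores=$(nproc)
    echo "available RAM: ${avail_gb} GB, cores: ${cores}"
    if [ "$avail_gb" -lt "$min_gb" ] || [ "$cores" -lt "$min_cores" ]; then
        echo "warning: below ${min_gb} GB RAM and/or ${min_cores} cores" >&2
        return 1
    fi
    echo "ok"
}
```

For example, `check_resources 12 8` compares the machine against the human benchmark.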

The number of cores can be modified by editing the --cores parameter in the last line of IsoAnnot/isoannot.sh (default is 8 cores).
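If you prefer not to open the script in an editor, the value can be changed with `sed`. This is a sketch that assumes the last line of isoannot.sh writes the option as `--cores N` or `--cores=N`; `set_isoannot_cores` is a hypothetical helper, not part of IsoAnnot.

```shell
# Hypothetical helper: rewrite the --cores value on the LAST line of a script.
# Assumes "--cores N" or "--cores=N"; the original separator is kept.
set_isoannot_cores() {
    local script="$1" cores="$2"
    # Accept only a whole number of at least 1.
    case "$cores" in
        ''|*[!0-9]*) echo "cores must be a whole number" >&2; return 1 ;;
    esac
    [ "$cores" -ge 1 ] || { echo "cores must be at least 1" >&2; return 1; }
    # The '$' address limits the substitution to the last line; \1 is the
    # captured "--cores " or "--cores=" prefix (GNU sed -i edits in place).
    sed -i -E '$ s/(--cores[ =])[0-9]+/\1'"$cores"'/' "$script"
    tail -n 1 "$script"   # show the edited line
}
```

Usage: `set_isoannot_cores IsoAnnot/isoannot.sh 16`.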

Software Prerequisites

IsoAnnot requires the following software to be installed before use:

Operating System: GNU/Linux (tested and supported)
Python: Python 3 (managed automatically by conda)
Conda: For dependency management
Snakemake: Workflow management system (version 7.x recommended)
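A quick way to confirm these prerequisites is sketched below; run it after activating the conda environment that provides Snakemake. The helper names are hypothetical, and the check simply treats any 7.x release as matching the recommendation.

```shell
# Hypothetical helper: succeed if a version string such as "7.32.4" is 7.x.
snakemake_major_ok() {
    [ "${1%%.*}" = "7" ]   # keep the part before the first dot
}

# Hypothetical helper: report on the OS, conda, and the Snakemake version.
check_prerequisites() {
    local tool version
    [ "$(uname -s)" = "Linux" ] || echo "warning: IsoAnnot is tested on GNU/Linux only" >&2
    for tool in conda snakemake; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
    done
    version=$(snakemake --version)
    if snakemake_major_ok "$version"; then
        echo "snakemake $version (7.x, as recommended)"
    else
        echo "warning: snakemake $version found; 7.x is recommended" >&2
    fi
}
```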

Installation

IsoAnnot is distributed as a compressed file containing the proper directory structure.

Installation steps:

Extract the package to your desired installation folder:
```
tar -xzf IsoAnnot.tar.gz
cd IsoAnnot
```
Ensure all prerequisites are installed (see Software Prerequisites)
Install external software (see External Software)
Activate the snakemake conda environment:
```
conda activate snakemake
```

You're now ready to run IsoAnnot!
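Before the first run, it may be worth confirming that the extracted folder has the expected structure. This sketch checks only paths named in this README (isoannot.sh plus the generic config.yaml and Snakefile.smk); `verify_isoannot_layout` is a hypothetical helper.

```shell
# Hypothetical helper: report any expected top-level file that is missing.
verify_isoannot_layout() {
    local root="${1:-.}" status=0 path
    for path in isoannot.sh config/generic/config.yaml config/generic/Snakefile.smk; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `verify_isoannot_layout IsoAnnot && echo "layout looks complete"`.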

Configuration Files

Configuration files control how IsoAnnot processes data for each species and database combination. Snakemake configuration files in IsoAnnot use the YAML file format and are organized on a per-species basis.

Where to Find Config Files

Configuration files are organized in a hierarchical directory structure:

IsoAnnot/config/
├── ensembl/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   ├── mmusculus/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
├── refseq/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
├── mytranscripts/
│   ├── hsapiens/
│   │   ├── config.yaml
│   │   └── Snakefile.smk
│   └── ...
└── generic/
    ├── config.yaml          # Generic settings
    ├── Snakefile.smk        # Main workflow
    ├── Snakefile_ensembl.smk
    ├── Snakefile_refseq.smk
    └── Snakefile_mytranscripts.smk

Path structure: config///config.yaml

Examples:

Human Ensembl: config/ensembl/hsapiens/config.yaml
Mouse RefSeq: config/refseq/mmusculus/config.yaml
Custom human transcripts: config/mytranscripts/hsapiens/config.yaml
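To see which database and species combinations are available in your copy, the tree above can be scanned with a glob. A minimal sketch, assuming every species folder holds a config.yaml as in the examples; `list_isoannot_configs` is hypothetical, and config/generic/ is skipped because its config.yaml sits one level higher.

```shell
# Hypothetical helper: print "database species" for each species config.yaml.
list_isoannot_configs() {
    local root="${1:-.}" f rel db species
    for f in "$root"/config/*/*/config.yaml; do
        [ -e "$f" ] || continue            # the glob matched nothing
        rel=${f#"$root"/config/}           # e.g. ensembl/hsapiens/config.yaml
        db=${rel%%/*}
        species=${rel#*/}
        species=${species%/config.yaml}
        echo "$db $species"
    done
}
```

Run from inside the IsoAnnot folder, `list_isoannot_configs .` prints one line per pair, such as `ensembl hsapiens`.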

How to Modify Config Files

To modify an existing configuration:

Navigate to the config file:
```
cd IsoAnnot/config///
nano config.yaml
```
Edit parameters as needed (see Configuration Parameters Explained)
Save the file

Run IsoAnnot with the updated configuration:
```
cd IsoAnnot
./isoannot.sh --database  --species
```

Common modifications:

Update database URLs to newer releases
Change file paths for custom data
Adjust species-specific parameters
Modify the transcript_versioned flag
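Edits such as changing the transcript_versioned flag can also be scripted. The sketch below assumes the setting is a top-level `key: value` line in config.yaml (nested YAML keys are not handled) and keeps a `.bak` copy before rewriting. `set_config_value` is a hypothetical helper; the accepted values for each key should be taken from Configuration Parameters Explained.

```shell
# Hypothetical helper: back up a YAML file, then replace one top-level
# "key: ..." line with "key: value" and show the difference.
set_config_value() {
    local file="$1" key="$2" value="$3"
    grep -q "^$key:" "$file" || { echo "key not found: $key" >&2; return 1; }
    cp "$file" "$file.bak"
    awk -v k="$key" -v v="$value" '
        index($0, k ":") == 1 { print k ": " v; next }   # the matching line
        { print }                                         # everything else
    ' "$file.bak" > "$file"
    # diff exits with status 1 when the files differ, which is expected here.
    diff "$file.bak" "$file" || :
}
```

For example, `set_config_value config/ensembl/hsapiens/config.yaml transcript_versioned False`, where writing the flag as `False` is an assumption about the file's format.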

Generic Configuration

The generic configuration file (config/generic/config.yaml) contains global settings used across all species:

```
interproscan_path: "software/interproscan/interproscan.sh"
pfam_clan_url: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz
dir_sqanti: "scripts/sqanti3/"
```

Key parameters:

interproscan_path: Path to InterProScan executable
pfam_clan_url: URL for Pfam clan database
dir_sqanti: Directory containing SQANTI3 scripts
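Because a wrong interproscan_path is a common cause of failures, it can help to read the value back and test it. A sketch, assuming one-line `key: value` entries and that relative paths are resolved from the IsoAnnot folder; both helper names are hypothetical.

```shell
# Hypothetical helper: print the value of a top-level key ($2) in a YAML file ($1).
generic_config_value() {
    awk -v k="$2" 'index($0, k ":") == 1 {
        sub(/^[^:]*:[ \t]*/, "")   # drop the key, its colon and following blanks
        gsub(/"/, "")              # drop double quotes around the value
        print
        exit
    }' "$1"
}

# Hypothetical helper: check that interproscan_path points to an executable.
check_interproscan() {
    local root="${1:-.}" ips path
    ips=$(generic_config_value "$root/config/generic/config.yaml" interproscan_path)
    case "$ips" in
        /*) path="$ips" ;;          # absolute path: use as-is
        *)  path="$root/$ips" ;;    # relative path: resolve from the IsoAnnot folder (assumption)
    esac
    if [ -n "$ips" ] && [ -x "$path" ]; then
        echo "InterProScan found: $path"
    else
        echo "InterProScan not found at '$path'; see ./InterproScan_install.sh" >&2
        return 1
    fi
}
```

Usage: `check_interproscan IsoAnnot`.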

Output

Output Structure

IsoAnnot writes its output to a structured directory hierarchy inside the directory supplied by the user; if none is given, the current working directory is used. The /data/ folder is structured as follows:

/data/
└── /                                    # e.g., Hsapiens/
    ├── _tappas__annotation_file.gff3     # Main output
    ├── _tappas__annotation_file.gff3_mod # Modified GFF3
    ├── config/                                  # Downloaded config files
    │   ├── ensembl/
    │   ├── refseq/
    │   └── global/
    ├── output/
    │   └── /                               # Database-specific outputs
    │       ├── layers/                         # Annotation layers
    │       │   ├── go.gtf
    │       │   ├── interpro.gtf
    │       │   ├── reactome.gtf
    │       │   └── ...
    │       ├── transcripts/                    # Transcript files
    │       ├── proteins/                       # Protein sequences
    │       └── ...
    └── tmp/                                    # Temporary processing files

Directory naming:

``: Capitalized species prefix from config (e.g., Hsapiens, `Mmusculus`, `Stuberosum`)
``: Lowercase common name from config (e.g., human, `mouse`, `potato`)
``: Database used (e.g., ensembl, `refseq`, `mytranscripts`)
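The annotation layers can be listed without typing the species and database folder names, by globbing over the tree above. A sketch, assuming the layers/ folders hold the per-layer .gtf files shown; `list_layers` is a hypothetical helper.

```shell
# Hypothetical helper: print "species database layer" for each layer .gtf file.
list_layers() {
    local data="${1:-data}" gtf layer db species
    for gtf in "$data"/*/output/*/layers/*.gtf; do
        [ -e "$gtf" ] || continue
        layer=$(basename "$gtf" .gtf)                   # e.g. go, interpro
        db=$(basename "$(dirname "$(dirname "$gtf")")")  # folder above layers/
        species=${gtf#"$data"/}
        species=${species%%/*}                          # first folder under data/
        echo "$species $db $layer"
    done
}
```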

Main Output Files

Primary Annotation File

File: _tappas__annotation_file.gff3

This is the main output file containing comprehensive isoform-level annotations.

Example: human_tappas_ensembl_annotation_file.gff3

Location: IsoAnnot/data//

Content: GFF3-formatted annotation with:

Gene and transcript structures
Protein-coding predictions
Functional annotations from multiple databases
Structural features
Post-translational modifications
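For a first overview of what ended up in the annotation file, rows can be counted per value of one column, for example column 2 (source) or column 3 (feature type). This assumes the standard tab-separated, nine-column GFF3 layout; `gff3_count_by_column` is a hypothetical helper.

```shell
# Hypothetical helper: count rows per value of column $2 in GFF3 file $1,
# most frequent first. Comment lines and rows with fewer than 9 fields are skipped.
gff3_count_by_column() {
    awk -F'\t' -v c="$2" '
        /^#/ || NF < 9 { next }
        { n[$c]++ }
        END { for (v in n) print v "\t" n[v] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Usage: `gff3_count_by_column human_tappas_ensembl_annotation_file.gff3 2`.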

Modified Annotation File

File: _tappas__annotation_file.gff3_mod

A modified version of the main GFF3 file optimized for downstream analysis tools.

Understanding the GFF3 Annotation File

The output GFF3 file integrates information from multiple sources:

Structural information:

Gene and transcript coordinates
Exon/intron structure
CDS (coding sequence) regions
UTR regions (5' and 3')

Functional annotations (in attributes column):

Gene Ontology (GO): Biological process, molecular function, cellular component
InterPro: Protein domains, families, and functional sites
Pfam: Protein family classifications
Reactome: Pathway associations
UniProt: Protein function descriptions

Post-translational modifications:

Phosphorylation sites
Other PTMs from PhosphoSitePlus

Example GFF3 attributes:

```
gene_id=ENSG00000000003;transcript_id=ENST00000000003;GO=GO:0005515,GO:0003824;InterPro=IPR001478,IPR015421;Reactome=R-HSA-112316;UniProt=P12345
```
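Because the attributes column is a `;`-separated list of `key=value` pairs, individual annotations can be pulled out with a few lines of awk. A sketch; `gff3_attr` is hypothetical and assumes values contain no `;`, which holds for the example above.

```shell
# Hypothetical helper: print the value of key $2 in attribute string $1.
gff3_attr() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        n = split($0, pairs, ";")
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")       # split on the first "=" only
            if (eq > 0 && substr(pairs[i], 1, eq - 1) == k)
                print substr(pairs[i], eq + 1)
        }
    }'
}
```

For instance, `gff3_attr "$attrs" GO | tr ',' '\n'` lists the GO terms of an attribute string one per line.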

Using the output:

Import into genome browsers (IGV, UCSC Genome Browser)
Use with tappAS for isoform-level functional analysis
Parse programmatically for custom analyses
Filter by specific annotation types
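Filtering by annotation type can rely on the same key=value layout: keep only the rows whose attributes include a given key. A sketch; `gff3_rows_with_attr` is hypothetical and assumes the attributes sit in column 9 with no spaces around `;`.

```shell
# Hypothetical helper: print GFF3 rows of file $1 whose column 9 has key $2.
gff3_rows_with_attr() {
    awk -F'\t' -v k="$2" '
        /^#/ || NF < 9 { next }
        {
            n = split($9, pairs, ";")
            for (i = 1; i <= n; i++)
                if (index(pairs[i], k "=") == 1) { print; break }
        }
    ' "$1"
}
```

Usage: `gff3_rows_with_attr human_tappas_ensembl_annotation_file.gff3 Reactome > reactome_rows.gff3`.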

Troubleshooting

Problem: "The snakefile or configfile requested do not exist"

Solution: Ensure config files exist for your species at config///
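A sketch that reports which of the two per-species files is missing, following the layout of the config tree (for example config/ensembl/hsapiens/ with config.yaml and Snakefile.smk). `check_species_config` is a hypothetical helper; its third argument is the IsoAnnot folder.

```shell
# Hypothetical helper: check config.yaml and Snakefile.smk for database $1,
# species $2, under IsoAnnot folder $3 (default: current directory).
check_species_config() {
    local root="${3:-.}" status=0 f
    for f in config.yaml Snakefile.smk; do
        if [ ! -f "$root/config/$1/$2/$f" ]; then
            echo "missing: $root/config/$1/$2/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```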

Problem: InterProScan not found

Solution: Run ./InterproScan_install.sh or verify interproscan_path in config/generic/config.yaml

Problem: Out of memory errors

Solution: Increase available RAM or reduce the number of cores used

Problem: Download errors for database files

Solution: Check internet connection and verify URLs in config file are current
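To tell a network problem from an outdated URL, each address in a config file can be probed. A sketch using curl: `-I` requests headers only (for FTP, curl reports size and modification time), `-f` makes HTTP errors return a non-zero status, and `--max-time` bounds each attempt. The helper names are hypothetical.

```shell
# Hypothetical helper: print every ftp/http(s) URL found in file $1.
config_urls() {
    grep -Eo '(ftp|https?)://[^"[:space:]]+' "$1"
}

# Hypothetical helper: probe each URL and report failures.
check_config_urls() {
    local status=0 url
    while read -r url; do
        [ -n "$url" ] || continue
        if curl -fsSI --max-time 30 "$url" >/dev/null; then
            echo "ok      $url"
        else
            echo "FAILED  $url" >&2
            status=1
        fi
    done <<EOF
$(config_urls "$1")
EOF
    return "$status"
}
```

Usage: `check_config_urls config/ensembl/hsapiens/config.yaml`.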

Problem: Snakemake directory locked

Solution: Use --unlock option (see Unlocking the Working Directory)
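Before unlocking, it is worth checking that no other run is still active in that directory. A sketch assuming Snakemake's default lock location, .snakemake/locks/; `snakemake_locked` is a hypothetical helper and does not remove anything.

```shell
# Hypothetical helper: succeed if directory $1 has Snakemake lock files.
snakemake_locked() {
    local dir="${1:-.}"
    [ -d "$dir/.snakemake/locks" ] || return 1
    # Any entry here means a run is active or was interrupted.
    [ -n "$(ls -A "$dir/.snakemake/locks")" ]
}
```

Usage: `snakemake_locked . && echo "locked: confirm no run is active before unlocking"`.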

Support

For issues, questions, or contributions:

GitHub Issues: https://github.com/ConesaLab/IsoAnnot/issues
Documentation: This README

License

[License information to be added]

SEEK ID: https://workflowhub.eu/workflows/2192?version=1

Version History

0.9.0b1 @ 0572562 (earliest) Created 16th Jun 2026 at 09:25 by Fabián Robledo

Merge pull request #9 from ConesaLab/dev

Dev updates

Frozen 0.9.0b1 0572562

Creators and Submitter

Creators

Alessandra Martinez
Pablo Atienza

Submitter

Fabián Robledo

License

Creative Commons Attribution 4.0 International (CC-BY-4.0)

Activity

Created: 16th Jun 2026 at 09:25

Last updated: 16th Jun 2026 at 09:28

Annotated Properties

Operation annotations

Genome annotation, Gene functional annotation

Scientific disciplines

Computer Science, Biochemistry, Genetics and Molecular Biology
