Poly Pipeline
master @ 7c668f0

Workflow Type: Shell Script
Stable

POLY_PIPELINE

A data analysis pipeline for STOmics data tailored to polyploid organisms.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request. See the CONTRIBUTING.md for details.

Main Pipeline (Cluster Execution)

The core scripts are optimized for execution on SGE and PBS clusters. They use relative paths and must be run from the POLY_PIPELINE main directory.

Data Input

The input file must be the .gef file (post-processed by the SAW pipeline). It must be placed in the INPUT/datasets/ folder.

IMPORTANT: Place only one .gef file in the datasets folder.
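
The single-file rule can be checked before submitting a job. The sketch below is a hypothetical helper, not part of the pipeline: the function name check_single_gef is an assumption, and the default directory is the INPUT/datasets folder, resolved relative to the POLY_PIPELINE main directory.

```shell
# Hypothetical pre-flight check (not shipped with POLY_PIPELINE).
# Succeeds only when the directory holds exactly one .gef file.
check_single_gef() {
    local dir="${1:-INPUT/datasets}"
    local n
    # Count regular files ending in .gef (case-insensitive), top level only.
    n="$(find "$dir" -maxdepth 1 -type f -iname '*.gef' | wc -l | tr -d ' ')"
    if [ "$n" -ne 1 ]; then
        echo "Expected exactly one .gef file in $dir, found $n" >&2
        return 1
    fi
}

# Usage, from the POLY_PIPELINE main directory:
#   check_single_gef && echo "datasets folder looks fine"
```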

The script provides a converter (see ANALYSIS [0] below) that converts .GEM and .H5AD files to the required .GEF format prior to execution.

Optionally (NOT REQUIRED), differential analysis can be generated for a list of genes of interest. To do so, create the file INPUT/interest_genes.txt, or point INTEREST_GENES_PATH at an explicit path (see the variable table below), using the following structure:

gene_name,Gene_ID_1,Gene_ID_2,Gene_ID_3,Gene_ID_4,Gene_ID_5
FLORAL_MERISTEM,AT5G08570,LOC107775591,Nicotiana_T001,LOC107775592
STRESS_HEAT,AT1G53540,LOC107817066,GmHIS4_A01,LOC107817067,AT2G41090
AUXIN_RESPONSE,AT3G15540,LOC107833544,Os02g0602300,AT4G20560
CELL_CYCLE,LOC107769919,AT1G44110,LOC107769920,AT3G53210
APICAL_DOMINANCE,LOC107802111,AT2G44320,LOC107802112
DEFENSE_MECH,AT5G41220,LOC107764120,Solyc01g099710
GIBBERELLIN_SYN,LOC107823450,AT1G05030,LOC107823451,AT3G44360

Each line represents one gene of interest, starting with the name of the gene followed by all corresponding IDs (the IDs must match the mapping reference used to generate the .gef file).
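
A malformed gene list is easier to catch before a job is queued. The following is a minimal sketch, assuming the first line is a header (as in the example above) and that every later line needs a name plus at least one ID. The function name validate_interest_genes is hypothetical and not part of the pipeline.

```shell
# Hypothetical format check for an interest-genes file (not part of POLY_PIPELINE).
# Assumption: line 1 is a header; each later line is name,ID[,ID...].
validate_interest_genes() {
    local file="$1"
    awk -F',' '
        BEGIN { bad = 0 }
        NR == 1 { next }                  # skip the header line
        /^[[:space:]]*$/ { next }         # ignore blank lines
        NF < 2 {                          # a name without any ID
            printf "line %d: no gene IDs\n", NR > "/dev/stderr"
            bad = 1
            next
        }
        {
            # Flag empty fields such as "NAME,,ID" or a trailing comma.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[[:space:]]*$/) {
                    printf "line %d: empty field %d\n", NR, i > "/dev/stderr"
                    bad = 1
                }
        }
        END { exit bad }
    ' "$file"
}

# Usage:
#   validate_interest_genes INPUT/interest_genes.txt && echo "format OK"
```

The check covers structure only. It cannot tell whether an ID exists in the mapping reference used for the .gef file.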

Step | Script | Description
--- | --- | ---
Analysis AND Annotation | bin/2_COMPLETE_ANALYSIS.sh | Complete analysis following the Stereopy documentation (generates the stereopy_ultimate_analysis.py script).

Cluster Execution Example (SGE or PBS)

The scripts are submitted with explicit Miniconda or docker image paths and parameter variables (qsub -v).

  • Analysis Script:

    • SGE
    qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",ANALYSIS=1,MIN_COUNTS=50,MIN_GENES=5,PCT_COUNTS_MT=30,N_PCS=30 bin/2_COMPLETE_ANALYSIS.sh
    
    • PBS
    qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=1,MIN_COUNTS=50,MIN_GENES=5,PCT_COUNTS_MT=30,N_PCS=30 bin/2_COMPLETE_ANALYSIS.sh
    
    Variable | Description | Default
    --- | --- | ---
    ST_PYTHON | Path to the python executable inside the st environment (SGE) or the container (PBS) for main analysis. | -
    R_CONTAINER | Path to the R container for secondary analysis (3 - Network Analysis). | -
    MIN_COUNTS | Minimum number of counts per cell. | 20
    MIN_GENES | Minimum number of genes per cell. | 3
    PCT_COUNTS_MT | Acceptable percentage of mitochondrial genes. | 2
    N_PCS | Number of principal components. This value can be improved after the first run: check the Elbow Plot (RESULTS/results_ultimate/plots/qc/pca_elbow_enhanced.png) and use the value at the elbow as N_PCS. | -
    ANALYSIS | (Optional) Select the type of analysis (check below for details): [0] Converter, [1] Primary analysis, [3] Network Analysis | 1
    INTEREST_GENES_PATH | (Optional) Path to the list of candidate genes for analysis (see above). Use an explicit path for a custom list: INTEREST_GENES_PATH="/Storage/user/file_name.txt" | "INPUT/interest_genes.txt"
    EXPRESSION_THR | (Optional) Set expression threshold for Interest Genes filtering. | 1.0
    MIN_X | (Optional) Minimum X coordinate for spatial filtering. | -
    MAX_X | (Optional) Maximum X coordinate for spatial filtering. | -
    MIN_Y | (Optional) Minimum Y coordinate for spatial filtering. | -
    MAX_Y | (Optional) Maximum Y coordinate for spatial filtering. | -
    HVG_MIN_MEAN | (Optional) Min mean filtering for selection of HVGs. | 0.0125
    HVG_MAX_MEAN | (Optional) Max mean filtering for selection of HVGs. | 3.0
    HVG_DISP | (Optional) Dispersion filtering for selection of HVGs. | 0.5
    HVG_TOP | (Optional) Number of top genes selected for HVG filtering. | 2000
    INPUT_PATH | (Required for analysis [0] Converter) Input file or folder with files to be converted to .gef format | -
  • Converting files (.GEM or .H5AD) to .GEF prior to primary analysis requires only the input file or folder (INPUT_PATH); the bin size (BIN_SIZE) is optional:

  • SGE

    qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",INPUT_PATH="path/to/file.h5ad",BIN_SIZE=100,ANALYSIS=0 bin/2_COMPLETE_ANALYSIS.sh
    qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",INPUT_PATH="path/to/files/",BIN_SIZE=100,ANALYSIS=0 bin/2_COMPLETE_ANALYSIS.sh
    
  • PBS

    qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=0,INPUT_PATH="path/to/file.h5ad",BIN_SIZE=50 bin/2_COMPLETE_ANALYSIS.sh
    qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=0,INPUT_PATH="path/to/files/",BIN_SIZE=50 bin/2_COMPLETE_ANALYSIS.sh
    
  • The variables are not required; the script can run with the defaults and the entire tissue area.

  • The correct python or docker image path (ST_PYTHON) for the server must be selected.

  • If coordinate filtering is required (MIN_X, MAX_X, MIN_Y, MAX_Y), all coordinate parameters must be provided together.
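
The all-or-nothing rule for the coordinate variables can be verified in the submitting shell before calling qsub. This is a minimal sketch under the assumption that the four variables are set in the current shell. The function name coords_consistent is hypothetical, and the pipeline may perform its own check.

```shell
# Hypothetical pre-submission check (not part of POLY_PIPELINE).
# Succeeds when MIN_X, MAX_X, MIN_Y and MAX_Y are all set or all unset.
coords_consistent() {
    local set_count=0 name value
    for name in MIN_X MAX_X MIN_Y MAX_Y; do
        # Read the variable whose name is stored in $name
        # (portable alternative to bash's ${!name}).
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            set_count=$((set_count + 1))
        fi
    done
    if [ "$set_count" -ne 0 ] && [ "$set_count" -ne 4 ]; then
        echo "Only $set_count of 4 coordinate variables are set" >&2
        return 1
    fi
}

# Usage:
#   MIN_X=100 MAX_X=500 MIN_Y=100 MAX_Y=500
#   coords_consistent && echo "coordinates OK"
```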

Analysis selection

  • The script defaults to the primary analysis [1], which suits any spatial analysis starting from the original files and generates the primary results.
  • The script also includes a secondary analysis for specific uses: [3] for network analysis.
  • The option must be given explicitly when submitting the job; otherwise [1] is used.
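
The selection rules can be mirrored in a small guard run before submission. The sketch below is an assumption about how one might validate the choice, not the pipeline's own logic: validate_analysis_choice is a hypothetical name, the accepted values 0, 1 and 3 and the default of 1 come from the variable table, and INPUT_PATH is required only for the converter.

```shell
# Hypothetical guard (not part of POLY_PIPELINE) that mirrors the documented
# ANALYSIS options: [0] Converter, [1] Primary analysis, [3] Network Analysis.
validate_analysis_choice() {
    local analysis="${ANALYSIS:-1}"    # [1] is the documented default
    case "$analysis" in
        0)
            # The converter needs a file or folder to convert.
            if [ -z "${INPUT_PATH:-}" ]; then
                echo "ANALYSIS=0 (converter) requires INPUT_PATH" >&2
                return 1
            fi
            ;;
        1|3)
            ;;
        *)
            echo "Unknown ANALYSIS value: $analysis (expected 0, 1 or 3)" >&2
            return 1
            ;;
    esac
}

# Usage:
#   ANALYSIS=3 validate_analysis_choice && echo "selection OK"
```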

Local Execution Example

IMPORTANT: This analysis requires substantial computational resources and is not recommended for local execution. To run the main analysis locally using the bash wrapper and your specific Conda path:

ST_PYTHON='/home/user/.conda/envs/st/bin/python' MIN_COUNTS=50 MIN_GENES=5 PCT_COUNTS_MT=30 N_PCS=30 ANALYSIS=1 bash bin/2_COMPLETE_ANALYSIS.sh

Network Visualization (Secondary analysis)

  • Example of job command (SGE):
qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",ANALYSIS=3 bin/2_COMPLETE_ANALYSIS.sh
  • Example of job command (PBS):
qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",R_CONTAINER="/project/directory/POLY_PIPELINE/r_hdwgcna.sif",ANALYSIS=3 bin/2_COMPLETE_ANALYSIS.sh
  • After running the secondary network analysis, the Edges and Nodes files are generated under the EXPORTS folder. They can be used for later visualization or filtering, mainly with NetworkX and Cytoscape.

  • Importing for Cytoscape:

  • File -> Import -> Network from File....

  • Select file EXPORTS/[project_name]_FULL_EDGES.txt.

  • Under the configuration, select fromNode (Source Node), toNode (Target Node) and weight (Edge Attribute).

  • Click OK.

  • File -> Import -> Table from File....

  • Select file EXPORTS/[project_name]_FULL_NODES.txt.

  • Select (auto) column gene_name as key.

  • Click OK.

  • To color clusters properly: select Style from the sidebar, select Fill Color, select module as the column, select Discrete Mapping, and right-click to open Mapping Value Generators, which assigns a color to each module automatically.

  • Check documentation for further details.
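
Large edge files can be trimmed before import. The sketch below keeps only edges at or above a weight threshold. It assumes the edges file is tab-separated and has a header row with a column named weight, as used in the Cytoscape import step. The delimiter, the function name filter_edges_by_weight and the threshold 0.5 are assumptions, not pipeline defaults.

```shell
# Hypothetical filter (not part of POLY_PIPELINE).
# Assumption: tab-separated file whose header row names a "weight" column.
filter_edges_by_weight() {
    local file="$1" threshold="$2"
    awk -F'\t' -v thr="$threshold" '
        NR == 1 {
            # Locate the weight column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "weight") col = i
            if (!col) { print "no weight column found" > "/dev/stderr"; exit 1 }
            print                         # keep the header
            next
        }
        ($col + 0) >= (thr + 0)           # print rows at or above the threshold
    ' "$file"
}

# Usage ([project_name] stays as in the export file name):
#   filter_edges_by_weight "EXPORTS/[project_name]_FULL_EDGES.txt" 0.5 > filtered_edges.txt
```

The filtered file keeps the same columns, so the Cytoscape import steps apply to it unchanged.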

License

This project is licensed under the MIT License.


Version History

master @ 7c668f0 (earliest) Created 19th Feb 2026 at 08:43 by Pedro Carvalho

Merge pull request #36 from capuccino26/dev

Implementation of Anchoring Analysis


Creators and Submitter
Creator
  • Pedro Cristovão Carvalho
Citation
Cristovão Carvalho, P. (2026). Poly Pipeline. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.2091.1
Annotated Properties
Scientific disciplines
Biochemistry, Genetics and Molecular Biology
Total size: 430 KB