POLY_PIPELINE
A data analysis pipeline for STOmics data tailored to polyploid organisms.
Contributing
Contributions are welcome! Please fork the repository and submit a pull request.
See the CONTRIBUTING.md for details.
Main Pipeline (Cluster Execution)
The core scripts are optimized for execution on SGE and PBS clusters. They use relative paths and must be run from the POLY_PIPELINE main directory.
Data Input
The input file must be the .gef file (post-processed by the SAW pipeline). It must be placed in the INPUT/datasets/ folder.
IMPORTANT: Place only one .gef file in the datasets folder.
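A quick pre-flight check can catch a stray second file before the job is queued. The sketch below is not part of the pipeline: the function names `count_gef_files` and `check_single_gef` are hypothetical, and the default folder assumes the check is run from the POLY_PIPELINE main directory.

```shell
# Hypothetical pre-flight helpers (not part of POLY_PIPELINE).

# Count the .gef files directly inside a datasets folder.
count_gef_files() {
    local dir="${1:-INPUT/datasets}"
    # -maxdepth 1 ignores subfolders; -iname matches .gef and .GEF alike
    find "$dir" -maxdepth 1 -type f -iname '*.gef' | wc -l | tr -d ' '
}

# Succeed only when exactly one .gef file is present.
check_single_gef() {
    local n
    n="$(count_gef_files "$1")"
    if [ "$n" -ne 1 ]; then
        echo "Expected exactly one .gef file, found $n" >&2
        return 1
    fi
}
```

Calling `check_single_gef` with no argument inspects INPUT/datasets and returns a non-zero status when the folder holds zero or several .gef files.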
The script provides a converter (see ANALYSIS [0] below) that converts .GEM and .H5AD files to the proper .GEF file prior to execution.
Optionally (NOT REQUIRED), differential analysis can be generated for a list of genes of interest. To enable it, create the file INPUT/interest_genes.txt, or pass an explicit path (see INTEREST_GENES_PATH below), following this structure:
gene_name,Gene_ID_1,Gene_ID_2,Gene_ID_3,Gene_ID_4,Gene_ID_5
FLORAL_MERISTEM,AT5G08570,LOC107775591,Nicotiana_T001,LOC107775592
STRESS_HEAT,AT1G53540,LOC107817066,GmHIS4_A01,LOC107817067,AT2G41090
AUXIN_RESPONSE,AT3G15540,LOC107833544,Os02g0602300,AT4G20560
CELL_CYCLE,LOC107769919,AT1G44110,LOC107769920,AT3G53210
APICAL_DOMINANCE,LOC107802111,AT2G44320,LOC107802112
DEFENSE_MECH,AT5G41220,LOC107764120,Solyc01g099710
GIBBERELLIN_SYN,LOC107823450,AT1G05030,LOC107823451,AT3G44360
Each line represents one gene of interest: it starts with the name identifying the gene, followed by all corresponding IDs. The IDs must match the mapping reference used to generate the .gef file.
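Because a malformed row is easy to miss in a hand-edited list, the file can be checked before submission. The following is a minimal sketch, not something the pipeline ships: `validate_interest_genes` is a hypothetical name, and the only rule it enforces is the one stated above (a gene name followed by at least one ID on each non-empty line). It does not verify that the IDs exist in the mapping reference.

```shell
# Hypothetical validator (not part of POLY_PIPELINE) for the interest genes file.
# Reports every row that lacks a gene name or has no ID after it.
validate_interest_genes() {
    local file="${1:-INPUT/interest_genes.txt}"
    awk -F',' '
        /^[[:space:]]*$/ { next }                 # skip blank lines
        $1 == "" || NF < 2 || $2 == "" {
            printf "line %d: need a gene name followed by at least one ID\n", NR
            bad = 1
        }
        END { exit bad + 0 }                      # non-zero status if any row failed
    ' "$file"
}
```

The function prints one message per offending line and returns a non-zero status if any were found, so it can gate a `qsub` call in a wrapper script.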
| Step | Script | Description |
|---|---|---|
| Analysis AND Annotation | bin/2_COMPLETE_ANALYSIS.sh | Complete analysis following the Stereopy documentation (generates the stereopy_ultimate_analysis.py script). |
Cluster Execution Example (SGE or PBS)
The scripts are submitted with explicit Miniconda or docker image paths and parameter variables (qsub -v).
- SGE
qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",ANALYSIS=1,MIN_COUNTS=50,MIN_GENES=5,PCT_COUNTS_MT=30,N_PCS=30 bin/2_COMPLETE_ANALYSIS.sh
- PBS
qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=1,MIN_COUNTS=50,MIN_GENES=5,PCT_COUNTS_MT=30,N_PCS=30 bin/2_COMPLETE_ANALYSIS.sh

| Variable | Description | Default |
|---|---|---|
| ST_PYTHON | Path to the python executable inside the st environment (SGE) or the container (PBS) for main analysis. | - |
| R_CONTAINER | Path to the R container for secondary analysis (3 - Network Analysis). | - |
| MIN_COUNTS | Minimum number of counts per cell. | 20 |
| MIN_GENES | Minimum number of genes per cell. | 3 |
| PCT_COUNTS_MT | Acceptable percentage of mitochondrial genes. | 2 |
| N_PCS | Number of principal components. This value can be improved after the first run: check the Elbow Plot (RESULTS/results_ultimate/plots/qc/pca_elbow_enhanced.png) and insert the value of the elbow as N_PCS. | - |
| ANALYSIS | (Optional) Select the type of analysis (check below for details): [0] Converter, [1] Primary analysis, [3] Network Analysis. | 1 |
| INTEREST_GENES_PATH | (Optional) Select the list of candidate genes for analysis (see above). Use an explicit path for a custom list: INTEREST_GENES_PATH="/Storage/user/file_name.txt" | "INPUT/interest_genes.txt" |
| EXPRESSION_THR | (Optional) Set expression threshold for Interest Genes filtering. | 1.0 |
| MIN_X | (Optional) Minimum X coordinate for spatial filtering. | - |
| MAX_X | (Optional) Maximum X coordinate for spatial filtering. | - |
| MIN_Y | (Optional) Minimum Y coordinate for spatial filtering. | - |
| MAX_Y | (Optional) Maximum Y coordinate for spatial filtering. | - |
| HVG_MIN_MEAN | (Optional) Min mean filtering for selection of HVGs. | 0.0125 |
| HVG_MAX_MEAN | (Optional) Max mean filtering for selection of HVGs. | 3.0 |
| HVG_DISP | (Optional) Dispersion filtering for selection of HVGs. | 0.5 |
| HVG_TOP | (Optional) Number of top genes selected for HVG filtering. | 2000 |
| INPUT_PATH | (Required for analysis [0] Converter) Input file or folder with files to be converted to .gef format. | - |
Converting files (.GEM or .H5AD) to .GEF prior to primary analysis requires only the input folder or input file; the bin size is optional:
- SGE
qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",INPUT_PATH="path/to/file.h5ad",BIN_SIZE=100,ANALYSIS=0 bin/2_COMPLETE_ANALYSIS.sh
qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",INPUT_PATH="path/to/files/",BIN_SIZE=100,ANALYSIS=0 bin/2_COMPLETE_ANALYSIS.sh
- PBS
qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=0,INPUT_PATH="path/to/file.h5ad",BIN_SIZE=50 bin/2_COMPLETE_ANALYSIS.sh
qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",ANALYSIS=0,INPUT_PATH="path/to/files/",BIN_SIZE=50 bin/2_COMPLETE_ANALYSIS.sh
- The variables are not required; the script can run with defaults and the entire tissue area.
- The correct python or docker image path (ST_PYTHON) for the server must be selected.
- If coordinate filtering is required (MIN_X, MAX_X, MIN_Y, MAX_Y), all coordinate parameters must be provided together.
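The all-or-nothing rule for the coordinate parameters can be checked in a wrapper before calling `qsub`. This is only a sketch of that rule under the assumption that the four variables are set in the calling shell; `check_coordinates` is a hypothetical helper, and how the pipeline itself reacts to a partial set is not documented here.

```shell
# Hypothetical check (not part of POLY_PIPELINE): the four coordinate
# variables must be either all set or all unset.
check_coordinates() {
    local set_count=0 name value
    for name in MIN_X MAX_X MIN_Y MAX_Y; do
        # Read the variable whose name is stored in $name (empty if unset)
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            set_count=$((set_count + 1))
        fi
    done
    case "$set_count" in
        0) echo "none" ;;                    # no filtering: entire tissue area
        4) echo "all" ;;                     # spatial filtering requested
        *) echo "partial" >&2; return 1 ;;   # incomplete set of coordinates
    esac
}
```

The function prints `none` or `all` on success and returns a non-zero status when only some of the four variables are set.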
Analysis selection
- The script defaults to the primary analysis [1], which suits any spatial analysis starting from the original files and generates the primary results.
- The script also includes a secondary analysis for specific uses: [3] for network analysis.
- The option must be set explicitly when submitting the job; otherwise [1] is used.
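The fallback to [1] described above matches the usual shell idiom of expanding a variable with a default. The sketch below illustrates that idiom only; `describe_analysis` is a hypothetical function, and the real script may organize its mode selection differently.

```shell
# Hypothetical sketch of mode selection (not taken from the pipeline script).
describe_analysis() {
    # ${ANALYSIS:-1} falls back to 1 when ANALYSIS is unset or empty
    case "${ANALYSIS:-1}" in
        0) echo "converter" ;;
        1) echo "primary" ;;
        3) echo "network" ;;
        *) echo "unsupported ANALYSIS value: ${ANALYSIS}" >&2; return 1 ;;
    esac
}
```

Only the values listed in this README (0, 1 and 3) are accepted; anything else is rejected with a non-zero status.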
Local Execution Example
IMPORTANT: This analysis requires high computational resources and is not recommended to be run locally. To run the main analysis locally using the bash wrapper and your specific Conda path:
ST_PYTHON='/home/user/.conda/envs/st/bin/python' MIN_COUNTS=50 MIN_GENES=5 PCT_COUNTS_MT=30 N_PCS=30 ANALYSIS=1 bash bin/2_COMPLETE_ANALYSIS.sh
Network Visualization (Secondary analysis)
- Example of job command (SGE):
qsub -v ST_PYTHON="/home/user/.conda/envs/st/bin/python",ANALYSIS=3 bin/2_COMPLETE_ANALYSIS.sh
- Example of job command (PBS):
qsub -v ST_PYTHON="/project/directory/POLY_PIPELINE/stereopy_1.5.1.sif",R_CONTAINER="/project/directory/POLY_PIPELINE/r_hdwgcna.sif",ANALYSIS=3 bin/2_COMPLETE_ANALYSIS.sh
- After running the secondary network analysis, the Edges and Nodes files are generated under the EXPORTS folder. They can be used for downstream visualization and filtering, mainly with NetworkX and Cytoscape.
- Importing for Cytoscape:
  - File -> Import -> Network from File....
  - Select file EXPORTS/[project_name]_FULL_EDGES.txt.
  - Under the configuration, select fromNode (Source Node), toNode (Target Node) and weight (Edge Attribute).
  - Click OK.
  - File -> Import -> Table from File....
  - Select file EXPORTS/[project_name]_FULL_NODES.txt.
  - Select (auto) column gene_name as key.
  - Click OK.
  - For proper cluster coloring: select Style from the sidebar, select Fill Color, select module as column, select Discrete Mapping, then use the right mouse button to select Mapping Value Generators and automatically assign a color to each module.
- Check documentation for further details.
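Large edge files can be slow to lay out, so it may help to drop weak edges before importing. The sketch below is one way to do that, not a pipeline feature: `filter_edges` is a hypothetical helper, and it assumes the edges file is tab-delimited with a header row that contains the fromNode, toNode and weight columns named above. Check the delimiter of your own export before relying on it.

```shell
# Hypothetical helper (not part of POLY_PIPELINE): keep the header and every
# edge whose weight is at least the given threshold.
# Assumes a tab-delimited file with a header row that includes "weight".
filter_edges() {
    local file="$1" threshold="$2"
    awk -F'\t' -v thr="$threshold" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # map header names to positions
            if (!("weight" in col)) { print "no weight column" > "/dev/stderr"; exit 2 }
            print; next
        }
        $(col["weight"]) + 0 >= thr + 0 { print }   # numeric comparison
    ' "$file"
}
```

For example, `filter_edges EXPORTS/[project_name]_FULL_EDGES.txt 0.1 > filtered_edges.txt` writes a reduced file that can be imported in the same way; the threshold 0.1 is arbitrary and should be chosen from the weight distribution of the network at hand.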
License
This project is licensed under the MIT License.
Version History
master @ 7c668f0 (earliest) Created 19th Feb 2026 at 08:43 by Pedro Carvalho
Merge pull request #36 from capuccino26/dev
Implementation of Anchoring Analysis
https://orcid.org/0000-0001-9242-6911