NanoporeDB_workflow
1. Overview
This repository contains the integrated computational workflow for the large-scale mining, multimeric structure prediction, and quality filtering of protein nanopores. This pipeline enables the discovery of novel nanopore candidates from massive metagenomic and genomic databases. The structural models, pore geometry analysis, and membrane orientation predictions generated by this workflow are hosted at our public database: NanoporeDB (https://db.genomics.cn/nanopore/).
2. Workflow Diagram

- Figure 1: Overview of the nanopore mining workflow.
3. Prerequisites & Installation
3.1 Conda Environment
We recommend using Conda to manage dependencies. To replicate the environment:
conda env create -f environment.yml
conda activate Foldseek
3.2 External Tools
Ensure the following tools are installed and accessible in your $PATH:
MMseqs2 (0b27c9d7d7757f9530f2efab14d246d268849925)
Foldseek (v9.427df8a)
US-align (v20241108)
AlphaFold-Multimer (local installation) and access to the AlphaFold3 Server (web service)
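Before running the pipeline, it can help to confirm that the command-line tools resolve on $PATH. The following is a minimal Python sketch, not part of the repository. The executable names `mmseqs`, `foldseek`, and `USalign` are assumptions about how the tools are installed and may differ on your system; AlphaFold-Multimer is left out because its entry point depends on the installation.

```python
import shutil

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS = ["mmseqs", "foldseek", "USalign"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result from `missing_tools` only shows that the executables exist; it does not verify that their versions match the ones listed above.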
4. Database Preparation
Before running the pipeline, download and index the required databases:
4.1 Foldseek pre-generated databases of AFDB
mkdir -p Database && cd Database
wget https://foldseek.steineggerlab.workers.dev/afdb.tar.gz
tar -xzf afdb.tar.gz
- The path to this directory is passed as an argument in Step 2.
4.2 Sequence Databases (UniRef90 & MGnify90)
cd Database  # run from the repository root; skip if you are already inside Database/
Download:
UniRef90 (Release 2024_05)
MGnify90 (Release 2024_04)
Pre-processing (Extract Full-Length sequences):
zcat mgy_clusters.fa.gz | perl -ne 'if(/^>/){$keep = /FL=1/} print if $keep' > MGnify90FL.fa
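For readers less familiar with Perl one-liners, the filter above can be sketched in Python as follows. It keeps every FASTA record whose header line contains the substring `FL=1`, which is the same test the Perl regex applies. The function names `filter_full_length` and `write_full_length` are illustrative and not part of the repository.

```python
import gzip


def filter_full_length(lines):
    """Yield FASTA lines belonging to records whose header contains 'FL=1'."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Same substring test as the Perl regex /FL=1/.
            keep = "FL=1" in line
        if keep:
            yield line


def write_full_length(src="mgy_clusters.fa.gz", dst="MGnify90FL.fa"):
    """Stream the gzipped FASTA and write only the full-length records."""
    with gzip.open(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(filter_full_length(fin))
```

Because the input is streamed line by line, memory use stays small even for a file the size of MGnify90.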
Indexing:
mmseqs createdb MGnify90FL.fa MGnify90FL
mmseqs createdb uniref90.fasta uniref90
5. Step-by-Step Guide
Step 1: Candidate Retrieval (Manual/Web)
PDB Search: Search RCSB PDB for the keywords "nanopore" and "porin". Save oligomeric structures to 1nanopore_query/PDB_nanopore/.
AFDB Search: Search AlphaFold DB by keyword. Save monomers to 1nanopore_query/AFDB_nanopore/.
Note: refer to 1nanopore_query/search_keywords.txt for the detailed query logic.
Step 2: Structure-based Mining
Compare monomeric seed structures against AFDB using Foldseek:
perl bin/2_structure_search.pl 1nanopore_query/PDB_nanopore 1nanopore_query/AFDB_nanopore Database [threads]
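The commands that bin/2_structure_search.pl runs internally are not shown here. As a rough illustration of what a structure search against a Foldseek database looks like, the sketch below builds a plain `foldseek easy-search` call. The helper names, the output file, and the temporary directory are hypothetical, and the database name inside `Database/` depends on what afdb.tar.gz unpacks to.

```python
import subprocess


def foldseek_search_cmd(query_dir, target_db, out_file, tmp_dir, threads=1):
    """Build the argument list for a `foldseek easy-search` run."""
    return [
        "foldseek", "easy-search",
        str(query_dir), str(target_db), str(out_file), str(tmp_dir),
        "--threads", str(threads),
    ]


def run_foldseek(*args, **kwargs):
    """Run the search and raise CalledProcessError on a non-zero exit."""
    subprocess.run(foldseek_search_cmd(*args, **kwargs), check=True)
```

Building the argument list separately from running it makes the command easy to print and inspect before a long search is launched.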
Step 3: Sequence-based Expansion
Expand candidates by searching against UniRef90 and MGnify90FL:
perl bin/3_sequence_search.pl Database/uniref90 Database/MGnify90FL Database/uniref90.fasta Database/MGnify90FL.fa [threads]
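How bin/3_sequence_search.pl selects hits is not described here. If the search results are kept in the default 12-column tab-separated format written by MMseqs2 `easy-search` (an assumption), candidate IDs could be collected along these lines. The function name `collect_hits` and the 1e-3 E-value cut-off are hypothetical and not taken from the workflow.

```python
import csv


def collect_hits(m8_path, max_evalue=1e-3):
    """Return the set of target IDs with an E-value at or below max_evalue.

    Assumes the default 12-column tab-separated layout, where column 2 is
    the target ID and column 11 is the E-value.
    """
    hits = set()
    with open(m8_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 12:
                continue  # skip blank or malformed lines
            if float(row[10]) <= max_evalue:
                hits.add(row[1])
    return hits
```

Returning a set removes duplicates when the same target is hit by several seed sequences.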
Step 4: Multimeric Structure Prediction
AFM: Predict locally using AlphaFold-Multimer. Save to 4Multimer_prediction/nanopore_AFM/.
AF3: Submit to AlphaFold3 Server. Save to 4Multimer_prediction/nanopore_AF3/.
Consistency Check:
python bin/4_check.py
- Ensures IDs match between AFM (.pdb) and AF3 (.cif)
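The kind of check described above can be sketched as follows. This is not the code in bin/4_check.py; it assumes that each model's ID is the file name without its extension, and the function name `compare_ids` is illustrative.

```python
from pathlib import Path


def compare_ids(afm_dir, af3_dir):
    """Return (IDs only in AFM, IDs only in AF3), using file stems as IDs."""
    afm_ids = {p.stem for p in Path(afm_dir).glob("*.pdb")}
    af3_ids = {p.stem for p in Path(af3_dir).glob("*.cif")}
    return sorted(afm_ids - af3_ids), sorted(af3_ids - afm_ids)
```

Calling `compare_ids("4Multimer_prediction/nanopore_AFM", "4Multimer_prediction/nanopore_AF3")` returns two sorted lists; both are empty when every AFM model has an AF3 counterpart and vice versa.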
Step 5: Quality Filtering & Merging
perl bin/5_structure_filter.pl [threads]
6. Citation
If you use this workflow or NanoporeDB, please cite: Liu et al. NanoporeDB: A Structural Resource Of Multimeric Protein Nanopores For Single-Molecule Sensing. GigaScience, 2025. DOI: https://doi.org/10.1101/2025.11.25.690617
Version History
V1.0.0 (latest, frozen, commit 1c278af): Created 17th May 2026 at 15:34 by Yuqian Liu. Add files via upload.
main @ 8179c93 (earliest, frozen): Created 13th May 2026 at 10:53 by Yuqian Liu. Update README.md.