Integrative-Transcriptomics/SID-Chart

SID-Chart

An automated pipeline for characterizing the biosynthetic gene clusters (BGCs) of genomes. It is also intended to integrate further data layers, such as a phylogenetic tree and the similarity to specific proteins.

Prerequisites

Input

SID-Chart expects the input data in the following directory layout.

└── Data
    ├── dataset_staphy
    │   └── ncbi_dataset
    │       ├── GCA_000433035.1
    │       │   ├── GCA_000433035.1_MGS324_genomic.fna
    │       │   ├── genomic.gbff
    │       │   └── protein.faa
    │       ├── ...
    │       └── ...
    ├── reference_BGCs
    │   ├── BGC0000943.gbk
    │   ├── ...
    │   └── ...
    ├── reference_genome
    │   └── GCF_001027105.1_ASM102710v1_genomic.fna
    ├── metadata.tsv
    ├── overviewBGCs.csv
    ├── proteins_uptake.fa
    └── Staphylococcus_aureus.trn

  • ncbi_dataset/: Contains the data for all genomes to be analyzed, with one subdirectory per assembly accession (genomic FASTA, genomic.gbff, and protein.faa, as in the tree above).
  • reference_BGCs/: Contains all biosynthetic gene clusters (BGCs) to be included in the analysis, as GenBank (.gbk) files.
  • reference_genome/: Contains a FASTA file of the reference genome required by chewBBACA.
  • metadata.tsv: Lists the accession numbers and the corresponding NCBI organism names.
  • overviewBGCs.csv: Provides a mapping between BGCs and lipoproteins.
  • proteins_uptake.fa: Contains the lipoproteins to be analyzed.
  • Staphylococcus_aureus.trn: A Prodigal training file required by chewBBACA (can be created using Pyrodigal).
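
Because metadata.tsv ties accession numbers to organism names, it can be worth checking that every genome folder has a matching row before starting a run. The following is a minimal sketch, not part of SID-Chart: list_missing_metadata is a hypothetical helper, and it assumes that the accession number is in the first tab-separated column of metadata.tsv, which the README does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SID-Chart): print every accession folder
# under ncbi_dataset/ that has no matching row in metadata.tsv.
# ASSUMPTION: the accession number is the first tab-separated column.
list_missing_metadata() {
    local dataset_dir="$1"   # path to .../ncbi_dataset
    local metadata="$2"      # path to metadata.tsv
    local dir acc

    for dir in "$dataset_dir"/*/; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -d "$dir" ] || continue
        acc="$(basename "$dir")"

        # awk exits with status 0 only if some row has $1 equal to the accession.
        if ! awk -F'\t' -v acc="$acc" \
                '$1 == acc { found = 1 } END { exit !found }' "$metadata"; then
            echo "$acc"
        fi
    done
}
```

With the layout shown above, it could be called as list_missing_metadata Data/dataset_staphy/ncbi_dataset Data/metadata.tsv; an empty output means every folder was found in the table.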

File and folder naming expected by SID-Chart can be customized via nextflow.conf.
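
As a quick sanity check before a run, the default layout can be verified with a few lines of shell. This is a sketch under assumptions, not a script shipped with SID-Chart: check_input_layout is a hypothetical helper, and the names it looks for are the defaults taken from the directory tree above, so they need adjusting if the naming has been customized.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SID-Chart): report entries of the default
# input layout that are missing. Returns 0 if complete, 1 otherwise.
check_input_layout() {
    local data_dir="$1"
    local missing=0
    local entry dir found

    # Directories expected directly under the input directory.
    for entry in reference_BGCs reference_genome; do
        if [ ! -d "$data_dir/$entry" ]; then
            echo "missing directory: $entry"
            missing=1
        fi
    done

    # Files expected directly under the input directory.
    for entry in metadata.tsv overviewBGCs.csv proteins_uptake.fa \
                 Staphylococcus_aureus.trn; do
        if [ ! -f "$data_dir/$entry" ]; then
            echo "missing file: $entry"
            missing=1
        fi
    done

    # At least one dataset folder (e.g. dataset_staphy) must hold ncbi_dataset/.
    found=0
    for dir in "$data_dir"/*/ncbi_dataset; do
        if [ -d "$dir" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing directory: */ncbi_dataset"
        missing=1
    fi

    return "$missing"
}
```

A call such as check_input_layout ../Data prints one line per missing entry, so it can gate the pipeline start in a wrapper script.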

Run Pipeline

  • Check that the input file names correspond to the default parameters defined in nextflow.conf.
  • If they differ, either modify the values in nextflow.conf or provide the correct file names as arguments in run_pipeline.sh.
  • Set the input directory in run_pipeline.sh by passing it as the --input parameter of the Nextflow command.
  • Inside the nf directory, run:
./run_pipeline.sh [RUN_NAME]

Run Visualization

  • Inside the web directory run:
./run_visualization.sh [RUN_NAME]
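
The two steps above can be chained in a small wrapper. This is only a sketch: run_sid_chart is a hypothetical function, and it assumes that nf and web sit side by side in the repository root and that the visualization should be started with the same RUN_NAME as the pipeline run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SID-Chart): run the pipeline and then the
# visualization for one run name.
# ASSUMPTIONS: nf/ and web/ are subdirectories of the repository root, and
# both scripts take the same RUN_NAME.
run_sid_chart() {
    local repo_root="$1"
    local run_name="$2"

    # Each step runs in a subshell so the caller's working directory is kept.
    # The visualization is skipped if the pipeline step fails.
    ( cd "$repo_root/nf" && ./run_pipeline.sh "$run_name" ) || return 1
    ( cd "$repo_root/web" && ./run_visualization.sh "$run_name" )
}
```

For example, run_sid_chart . my_run, called from the repository root, would run both scripts with my_run as the run name.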
