To use MEBS2, the only requirement is to know the genomes involved in a given metabolism. As an example, the test_input directory contains a list of 97 genera of sulfate/elemental sulfur-reducing microorganisms described in MEBSv1.
First, generate a file containing the genera of interest, one per line:
head test_input/srb.txt
Desulfobacula
Desulfocaldus
Desulfocella
Desulfofrigus
Desulfonatronum
Desulfonauticus
Desulforegula
Desulforhabdus
Desulfospira
Desulfothermus
Desulfotignum
Thermodesulforhabdus
Archaeoglobus
We provide the file assembly_refseq.nr2016.txt, derived from the RefSeq assembly summary file. For more information, see Stage 1 of MEBSv1. To extract the accession and organism name of every genome belonging to the listed genera, run:
for i in $(cat test_input/srb.txt); do grep "$i" assembly_refseq.nr2016.txt; done | cut -f 1,8 > test_input/genomes_SRB.txt
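The grep loop above matches each genus anywhere in a line, so a genus name that is a prefix of another (for example Desulfobacter and Desulfobacterium) also pulls in the longer one, and the same genome can be written more than once. A stricter alternative can be sketched in Python. It assumes that assembly_refseq.nr2016.txt keeps the RefSeq assembly summary layout (tab-separated, accession in column 1, organism name in column 8, which is what `cut -f 1,8` selects) and that comment lines start with `#`. The function name `genomes_for_genera` is hypothetical, and because it matches whole words only, it may return a different number of genomes than the count reported below.

```python
import csv
import re
from pathlib import Path


def genomes_for_genera(genus_file, assembly_file):
    """Return sorted (accession, organism_name) pairs for organisms of the listed genera.

    Assumes a tab-separated assembly table with the accession in column 1 and
    the organism name in column 8 (RefSeq assembly summary layout).
    """
    genera = [g.strip() for g in Path(genus_file).read_text().splitlines() if g.strip()]
    if not genera:
        return []
    # Whole-word match: "Desulfobacter" does not match "Desulfobacterium".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, genera)) + r")\b")
    hits = {}
    with open(assembly_file, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row or row[0].startswith("#") or len(row) < 8:
                continue  # skip comments, header lines and short lines
            if pattern.search(row[7]):
                hits[row[0]] = row[7]  # keyed by accession, so no duplicates
    return sorted(hits.items())


# Usage with the files from this tutorial:
# for accession, name in genomes_for_genera("test_input/srb.txt", "assembly_refseq.nr2016.txt"):
#     print(accession, name, sep="\t")
```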
From the 97 genera in the input list, we obtain a total of 207 complete, non-redundant genomes, with at least two genomes per genus. The number of times each line occurs in the resulting file can be inspected with:
sort test_input/genomes_SRB.txt | uniq -c | sort -rn
4 GCF_000243155.2 Desulfitobacterium dehalogenans ATCC 51507
4 GCF_000243135.2 Desulfitobacterium dichloroeliminans LMG P-21439
4 GCF_000231405.2 Desulfitobacterium metallireducens DSM 15288
4 GCF_000020365.1 Desulfobacterium autotrophicum HRM2
4 GCF_000010045.1 Desulfitobacterium hafniense Y51
2 GCF_001592435.1 Thermococcus peptonophilus
2 GCF_001553605.1 Desulfovibrio fairfieldensis
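Each count printed by `uniq -c` is the number of times an identical line appears in genomes_SRB.txt; a value above 1 means the line was written more than once, for example because it matched several entries of the genus list or because the file was appended to on a second run. The same check, plus a de-duplicated list, can be sketched in Python; `duplicate_counts` is a hypothetical helper, not part of MEBS.

```python
from collections import Counter


def duplicate_counts(lines):
    """Count identical non-empty lines, most frequent first (like `sort | uniq -c | sort -rn`)."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common()


# Usage: report repeated lines and keep one copy of each.
# with open("test_input/genomes_SRB.txt") as handle:
#     counts = duplicate_counts(handle)
# repeated = [(line, n) for line, n in counts if n > 1]
# unique_lines = [line for line, _ in counts]
```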
Next, compute the Pfam entropies with entropy.pl. Run without arguments, it prints its usage:
perl entropy.pl
Program to compute Pfam entropies from a list of accession genomes of interest.
usage: entropy.pl [options]
-help Brief help message
-input_dom tbl-format file with HMM matches produced by hmmsearch (required)
-input_list list of selected genomes of interest (required)
-names optional list of Refseq assembly annotations to print
scientific names instead of accesion codes (optional)
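entropy.pl computes, for each Pfam domain, an entropy score from the genomes of interest; see MEBSv1 for the exact definition, which is not reproduced here. As an illustration only, the sketch below computes a standard binary relative entropy (Kullback-Leibler divergence, in bits) of domain presence/absence in the genomes of interest versus the whole collection, assuming the genomes of interest are a subset of the collection. The function names and the presence/absence input format are assumptions, not the interface of entropy.pl.

```python
import math


def relative_entropy(p, q):
    """Binary relative entropy D(p || q) in bits.

    p: fraction of genomes of interest that contain the domain.
    q: fraction of all genomes that contain the domain.
    Terms with zero probability in p contribute nothing.
    """
    h = 0.0
    for pi, qi in ((p, q), (1 - p, 1 - q)):
        if pi > 0:
            h += pi * math.log2(pi / qi)
    return h


def domain_entropies(presence, interest):
    """presence: dict genome -> set of Pfam domains; interest: genome ids of interest.

    Returns dict domain -> relative entropy, assuming interest is a subset of presence.
    """
    selected = [g for g in presence if g in interest]
    if not selected:
        raise ValueError("none of the genomes of interest are in the collection")
    domains = set().union(*presence.values())
    scores = {}
    for dom in sorted(domains):
        p = sum(dom in presence[g] for g in selected) / len(selected)
        q = sum(dom in doms for doms in presence.values()) / len(presence)
        scores[dom] = relative_entropy(p, q)
    return scores
```

Under this definition, a domain present in every genome of interest but in only half of the collection scores 1 bit, while a domain that is equally frequent in both sets scores 0.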
We have previously annotated the Gen and GenF datasets (release ) with Pfam; the annotation files (88 Gb) can be downloaded from the directory /realease… TODO
The annotation files are:
genomes_refseq_nr_22122016.faa.all.pfam.tab
genomes_refseq_nr_22122016_size100_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size150_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size200_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size250_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size300_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size30_cover10.faa.all.pfam.tab
genomes_refseq_nr_22122016_size60_cover10.faa.all.pfam.tab
for i in genomes_refseq_nr_22122016*.faa.all.pfam.tab; do
perl entropy.pl -input_dom "$i" -input_list test_input/genomes_SRB.txt > "$i.SRB.csv"
done
Warning: all files must have the same number of domains (profiles), in the same column order. The script assumes this and cannot detect errors in the input file format.
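Because the script does not validate its inputs, a quick pre-check can prevent a silently wrong result. The sketch below assumes, hypothetically, that the first line of each file lists the domain (profile) names in column order, separated by `sep`; adjust it to the real layout of the files produced by entropy.pl. `check_same_columns` is a hypothetical helper.

```python
def check_same_columns(paths, sep="\t"):
    """Raise ValueError unless every file has the same header (same domains, same order).

    Assumes the first line of each file holds the domain/profile names.
    Returns the shared header as a list (None if no paths are given).
    """
    reference, ref_path = None, None
    for path in paths:
        with open(path) as handle:
            header = handle.readline().rstrip("\n").split(sep)
        if reference is None:
            reference, ref_path = header, path
            continue
        if len(header) != len(reference):
            raise ValueError(f"{path}: {len(header)} columns, but {ref_path} has {len(reference)}")
        for i, (got, expected) in enumerate(zip(header, reference), start=1):
            if got != expected:
                raise ValueError(f"{path}: column {i} is {got!r}, expected {expected!r} as in {ref_path}")
    return reference
```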
python3 scripts/extract_entropies.py test_input_entropy
The command above generates the entropy file required to compute the score:
head test_input_entropy_entropies.tab
For practical reasons, rename the entropy file to entropies.tab and move it into the input directory. If you want to compute the score for several metabolisms, each one needs its own input directory containing an entropy file and the list of genomes of interest.
mv test_input_entropy_entropies.tab test_input/entropies.tab
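With several metabolisms, each input directory must hold its own entropies.tab and its list of genomes of interest. A small Python check could look like the sketch below; it assumes the genome list uses the same file name in every directory (genomes_SRB.txt is only the name used in this example), and `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_inputs(metabolism_dirs, genome_list_name):
    """Map each directory to the required files it lacks (empty dict means all present)."""
    problems = {}
    for directory in map(Path, metabolism_dirs):
        required = ("entropies.tab", genome_list_name)
        missing = [name for name in required if not (directory / name).is_file()]
        if missing:
            problems[str(directory)] = missing
    return problems


# Usage:
# print(missing_inputs(["test_input"], "genomes_SRB.txt"))  # {} when both files exist
```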
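To run the mebsv2 script, you need the Pfam-A HMM database and a configuration file (config/config.txt) listing each metabolism and its input directory: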
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz
less -S config/config.txt
Cycle Path Comple
srb test_input/
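config/config.txt is a small table: a header line (Cycle, Path, and a third column shown as Comple) followed by one row per metabolism, here `srb` pointing to `test_input/`. The sketch below parses such a file into dictionaries, assuming whitespace-separated columns and filling absent trailing values with None; it is not the parser used by mebsv2.pl, and `read_config` is a hypothetical name.

```python
def read_config(text):
    """Parse a whitespace-separated table with a header line into a list of dicts.

    Missing trailing values become None; blank lines and lines starting with '#' are skipped.
    """
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    padded = (record + [None] * (len(header) - len(record)) for record in records)
    return [dict(zip(header, record)) for record in padded]


# Usage:
# with open("config/config.txt") as handle:
#     for entry in read_config(handle.read()):
#         print(entry["Cycle"], entry["Path"])
```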
perl mebsv2.pl
Program to compute MEBS for a set of genomic/metagenomic FASTA files in input folder.
usage: mebsv2.pl [options]
-help Brief help message
-input Folder containing FASTA peptide files (.faa) (required)
-type Nature of input sequences, either 'genomic' or 'metagenomic' (required)
-db Database to scann your input files (required, default Pfam-A.hmm)
-comp Compute the metabolic completeness (optional)
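The log further down names the file mebsv2.pl tries to create for each genome and cycle (test_genomes/<genome>.faa.srb.hmmsearch.tab). Assuming that file is an hmmsearch per-sequence table (`--tblout`), the format entropy.pl reads, the call for one genome could be assembled as in the sketch below. The exact hmmsearch options used by mebsv2.pl are an assumption; `--tblout` and `-o` are standard hmmsearch options, and `hmmsearch_command` is a hypothetical helper.

```python
import os
from pathlib import Path


def hmmsearch_command(faa, cycle, hmm_db):
    """Build (but do not run) an hmmsearch call for one peptide FASTA file.

    The output name follows the pattern seen in the mebsv2.pl log:
    <input>.faa.<cycle>.hmmsearch.tab next to the input file.
    """
    faa = Path(faa)
    table = faa.with_name(f"{faa.name}.{cycle}.hmmsearch.tab")
    command = [
        "hmmsearch",
        "--tblout", str(table),  # per-sequence hit table
        "-o", os.devnull,        # discard the human-readable report
        str(hmm_db),             # profile database, e.g. Pfam-A.hmm
        str(faa),                # sequences to scan
    ]
    return command, table


# Usage (requires HMMER installed):
# import subprocess
# cmd, table = hmmsearch_command("test_genomes/Enterococcus_durans.faa", "srb", "Pfam-A.hmm")
# subprocess.run(cmd, check=True)
```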
Note: the run below currently fails because the HMM file cannot be located:
perl mebsv2.pl -input test_genomes -type genomic -db Pfam-A.hmm
# mebsv2.pl -input test_genomes -type genomic -comp
1
srb
Enterococcus_durans.faatest_input/
Error: File existence/permissions problem in trying to open HMM file 1.
HMM file 1 not found (nor an .h3m binary of it)
# ERROR: failed to generate test_genomes/Enterococcus_durans.faa.srb.hmmsearch.tab
NA
Archaeoglobus_profundus_DSM_5631.faatest_input/
Error: File existence/permissions problem in trying to open HMM file 1.
HMM file 1 not found (nor an .h3m binary of it)
# ERROR: failed to generate test_genomes/Archaeoglobus_profundus_DSM_5631.faa.srb.hmmsearch.tab
NA
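The messages show hmmsearch being handed the literal value `1` as its HMM file ("HMM file 1 not found") instead of Pfam-A.hmm. We have not traced where mebsv2.pl produces that value, but a pre-flight check on the database path makes this kind of failure explicit before any genome is processed. The sketch relies only on HMMER3 profile files beginning with a line that starts with `HMMER3`; `check_hmm_db` is a hypothetical helper.

```python
from pathlib import Path


def check_hmm_db(path):
    """Fail early unless `path` is an existing HMMER3 text profile file; return it as a Path."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"HMM database not found: {str(path)!r}")
    with p.open() as handle:
        first = handle.readline()
    if not first.startswith("HMMER3"):
        raise ValueError(f"{str(path)!r} does not look like a HMMER3 file (first line: {first.strip()!r})")
    return p


# Usage:
# check_hmm_db("Pfam-A.hmm")  # raises before mebsv2.pl is run if the path is wrong
```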