API for padmet.utils.exploration
Description:
#TODO
compare_padmet
- Description:
Compare 1-n padmet files and create an output folder containing the following files:
- genes.tsv:
fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of the reactions associated with gene-a in padmet_a), rxn-2]
- reactions.tsv:
fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
- pathways.tsv:
fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
- compounds.tsv:
fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
line = [cpd-1, rxn-1, '', rxn-1, '']
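As an illustration of how the genes.tsv layout described above could be consumed, the following minimal sketch parses a small hand-written table with the csv module and lists the genes flagged only in padmet_a. The sample rows and the helper name genes_specific_to are hypothetical. The sketch also assumes tab-separated columns and that absence is encoded as 0 or an empty cell, which the description does not state.

```python
import csv
import io

# Hypothetical sample following the genes.tsv fieldnames described above.
SAMPLE_GENES_TSV = (
    "gene\tpadmet_a\tpadmet_b\tpadmet_a_rxn_assoc\tpadmet_b_rxn_assoc\n"
    "gene-a\t1\t1\trxn-1;rxn-2\trxn-2\n"
    "gene-b\t1\t0\trxn-3\t\n"
    "gene-c\t0\t1\t\trxn-1\n"
)


def genes_specific_to(handle, column, other_columns):
    """Return the genes flagged '1' in `column` and in none of `other_columns`."""
    reader = csv.DictReader(handle, delimiter="\t")
    specific = []
    for row in reader:
        if row[column] == "1" and all(row[other] != "1" for other in other_columns):
            specific.append(row["gene"])
    return specific


only_in_a = genes_specific_to(io.StringIO(SAMPLE_GENES_TSV), "padmet_a", ["padmet_b"])
only_in_b = genes_specific_to(io.StringIO(SAMPLE_GENES_TSV), "padmet_b", ["padmet_a"])
print(only_in_a, only_in_b)
```

The same pattern should apply to reactions.tsv, replacing the "gene" column with "reaction".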
usage:
padmet compare_padmet --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [--cpu INT] [-v]
option:
-h --help Show help.
--padmet=FILES/DIR pathname of the padmet files, separated by ',' (ex: /path/padmet1.padmet,/path/padmet2.padmet), OR a folder
--output=DIR pathname of the output folder
--padmetRef=FILE pathname of the database ref in padmet
--cpu INT number of CPU to use in multiprocessing
- padmet.utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False, number_cpu=None)[source]
Compare 1-n padmet files and create an output folder containing the following files:
- genes.tsv:
fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of the reactions associated with gene-a in padmet_a), rxn-2]
- reactions.tsv:
fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
- pathways.tsv:
fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
- compounds.tsv:
fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
line = [cpd-1, rxn-1, '', rxn-1, '']
- Parameters:
padmet_path (str) – pathname of the padmet files, separated by ',' (ex: /path/padmet1.padmet,/path/padmet2.padmet), OR a folder
output (str) – pathname of the output folder
padmetRef (padmet.classes.PadmetRef) – padmet containing the reference database, needed to calculate the pathway completion rate
verbose (bool) – if True, print information
number_cpu (int) – number of CPU to use in multiprocessing
compare_sbml
- Description:
Compare reactions in 2 or 1-n sbml files.
Report whether a reaction is missing,
and whether reactions with the same id use different species or have a different reversibility.
usage:
padmet compare_sbml --sbml=FILES/DIR --output=DIR
option:
-h --help Show help.
--sbml=FILES/DIR pathname of the sbml files, separated by ',' (ex: /path/sbml1.sbml,/path/sbml2.sbml), OR a folder
--output=DIR pathname of the output folder
- padmet.utils.exploration.compare_sbml.compare_multiple_sbml(sbml_path, output_folder)[source]
Compare 1-n sbml, create two output files reactions.tsv and metabolites.tsv with the reactions/metabolites in each sbml
- Parameters:
sbml_path (str) – path to a folder containing sbmls or multiple sbml paths separated by a ‘,’
output_folder (str) – path to the output folder
- padmet.utils.exploration.compare_sbml.compare_rxn(rxn1, rxn2)[source]
Compare two cobra reaction objects and return (same_cpd, same_rev).
same_cpd: bool, if True the two reactions consume and produce the same compounds.
same_rev: bool, if True the two reactions have the same direction (reversible or not).
- Parameters:
rxn1 (cobra.model.reaction) – reaction as cobra object
rxn2 (cobra.model.reaction) – reaction as cobra object
- Returns:
(same_cpd (bool), same_rev (bool))
- Return type:
tuple
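The comparison described above can be sketched without cobra. The example below is only an illustration of the (same_cpd, same_rev) logic: SimpleReaction is a hypothetical stand-in class, not the cobra reaction object, and compare_rxn_sketch is not the library function. It assumes that negative coefficients mark consumed compounds and positive ones produced compounds.

```python
from dataclasses import dataclass


@dataclass
class SimpleReaction:
    """Hypothetical stand-in for a reaction object (not the cobra class)."""
    metabolites: dict  # compound id -> coefficient (<0 consumed, >0 produced)
    reversible: bool


def compare_rxn_sketch(rxn1, rxn2):
    """Return (same_cpd, same_rev) for two SimpleReaction objects."""
    def split(rxn):
        consumed = {cpd for cpd, coef in rxn.metabolites.items() if coef < 0}
        produced = {cpd for cpd, coef in rxn.metabolites.items() if coef > 0}
        return consumed, produced

    same_cpd = split(rxn1) == split(rxn2)
    same_rev = rxn1.reversible == rxn2.reversible
    return same_cpd, same_rev


rxn_a = SimpleReaction({"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}, reversible=False)
rxn_b = SimpleReaction({"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}, reversible=True)
rxn_c = SimpleReaction({"cpd-1": -1, "cpd-3": 1}, reversible=False)

result_ab = compare_rxn_sketch(rxn_a, rxn_b)  # same compounds, different reversibility
result_ac = compare_rxn_sketch(rxn_a, rxn_c)  # different compounds, same reversibility
print(result_ab, result_ac)
```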
- padmet.utils.exploration.compare_sbml.compare_sbml(sbml1_path, sbml2_path)[source]
Compare 2 sbml files and print the number of metabolites and reactions. If a reaction is missing, print its id and its formula.
- Parameters:
sbml1_path (str) – path to the first sbml file to compare
sbml2_path (str) – path to the second sbml file to compare
compare_sbml_padmet
- Description:
compare reactions in sbml and padmet file
usage:
padmet compare_sbml_padmet --padmet=FILE --sbml=FILE
option:
-h --help Show help.
--padmet=FILE path of the padmet file
--sbml=FILE path of the sbml file
- padmet.utils.exploration.compare_sbml_padmet.command_help()[source]
Show help for analysis command.
- padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)[source]
Compare reaction ids in sbml vs padmet; return the number of reactions found in both, and the ids of the reactions missing from the sbml or from the padmet.
- Parameters:
sbml_document (libsbml.document) – sbml document
padmet (padmet.classes.PadmetSpec) – padmet to compare
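The id comparison described above amounts to set operations. A minimal sketch, assuming the reaction ids have already been extracted from both sources as plain strings (the function name compare_reaction_ids and the sample ids are hypothetical, and no id decoding is performed):

```python
def compare_reaction_ids(sbml_ids, padmet_ids):
    """Summarise the overlap between two collections of reaction ids."""
    sbml_ids, padmet_ids = set(sbml_ids), set(padmet_ids)
    return {
        "in_both": len(sbml_ids & padmet_ids),
        "not_in_sbml": sorted(padmet_ids - sbml_ids),
        "not_in_padmet": sorted(sbml_ids - padmet_ids),
    }


report = compare_reaction_ids(
    sbml_ids=["rxn-1", "rxn-2", "rxn-3"],
    padmet_ids=["rxn-2", "rxn-3", "rxn-4"],
)
print(report)
```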
convert_sbml_db
- Description:
This tool uses the MetaNetX database to check or convert a sbml. Flat files from MetaNetX are required to run this tool. They can be found in the aureme workflow or on the MetaNetX website. To use the tool, set:
mnx_folder= the path to a folder containing the MetaNetX flat files. The files must be named 'reac_xref.tsv' and 'chem_xref.tsv'. Alternatively, set the path of each flat file manually with:
mnx_reac= path to the flat file for reactions
mnx_chem= path to the flat file for chemical compounds (species)
- To check the database used in a sbml:
- to check all elements of the sbml (reactions and species), set:
to-map=all
- to check only the reactions of the sbml, set:
to-map=reaction
- to check only the species of the sbml, set:
to-map=species
- To map a sbml and obtain a file mapping its ids to a given database, set:
- to-map:
as previously explained
- db_out:
the name of the target database: ['metacyc', 'bigg', 'kegg'] only
- output:
the path to the output file
For a given sbml using a specific database, return a dictionary of mapping.
The output is a file with line = reaction or species id in the sbml, reaction or species id in the db_out database
- ex:
For a sbml based on the kegg database, with db_out=metacyc, the output file will contain for example:
R02283 ACETYLORNTRANSAM-RXN
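To make the idea of mapping through a cross-reference table concrete, here is a minimal sketch. It assumes a simplified two-column table in which each line pairs a "database:identifier" entry with a shared pivot identifier; the real MetaNetX flat files may use different prefixes and additional columns. The pivot ids (MNXR_EXAMPLE_1, MNXR_EXAMPLE_2), the id R_EXAMPLE and the function build_mapping are hypothetical.

```python
import csv
import io

# Hypothetical, simplified cross-reference lines: "<db>:<id>\t<pivot id>".
SAMPLE_XREF = (
    "# comment lines start with a hash sign\n"
    "kegg:R02283\tMNXR_EXAMPLE_1\n"
    "metacyc:ACETYLORNTRANSAM-RXN\tMNXR_EXAMPLE_1\n"
    "kegg:R_EXAMPLE\tMNXR_EXAMPLE_2\n"
)


def build_mapping(handle, db_in, db_out):
    """Map ids of `db_in` to ids of `db_out` that share the same pivot id."""
    by_pivot = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#") or ":" not in row[0]:
            continue
        db, ext_id = row[0].split(":", 1)
        by_pivot.setdefault(row[1], {}).setdefault(db, []).append(ext_id)
    mapping = {}
    for ids in by_pivot.values():
        targets = ids.get(db_out, [])
        for source_id in ids.get(db_in, []):
            if targets:
                mapping[source_id] = targets[0]  # keep the first target id only
    return mapping


mapping = build_mapping(io.StringIO(SAMPLE_XREF), db_in="kegg", db_out="metacyc")
for source_id, target_id in mapping.items():
    print(source_id, target_id, sep="\t")
```

Ids without an equivalent in the target database (R_EXAMPLE here) are simply left out of the mapping in this sketch.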
usage:
padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
options:
-h --help Show help.
--to-map=STR select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
--mnx_reac=FILE path to the MetaNetX file for reactions
--mnx_chem=FILE path to the MetaNetX file for compounds
--sbml=FILE path to the sbml file to convert
--output=FILE path to the file containing the mapping, sep = "\t"
--db_out=ID id of the output database in ["BIGG","METACYC","KEGG"]
-v verbose.
- padmet.utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]
Check sbml database of a given sbml.
- Parameters:
sbml_file (str) – path to the sbml file to convert
to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
verbose (bool) – if true: more info during process
mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
mnx_folder (str) – the path to a folder containing MetaNetx flat files
- Returns:
(name of the best matching database, dict of matching)
- Return type:
tuple
- padmet.utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]
map a sbml and obtain a file of mapping ids to a given database.
- Parameters:
sbml_file (str) – path to the sbml file to convert
to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
db_out (str) – the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
output (str) – path to the file containing the mapping, sep = "\t"
verbose (bool) – if true: more info during process
mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
mnx_folder (str) – the path to a folder containing MetaNetx flat files
- Returns:
(name of the best matching database, dict of matching)
- Return type:
tuple
dendrogram_reactions_distance
- Description:
Use the reactions.tsv file from compare_padmet.py to create a dendrogram using a Jaccard distance.
From the absence/presence matrix of reactions in the different species, compute a Jaccard distance between these species. Apply a hierarchical clustering with complete linkage to these distances, then create a dendrogram. Intervene is also applied to create an upset graph of the data.
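The Jaccard distance mentioned above can be illustrated in pure Python. This is a sketch of the distance computation only, not of the module's implementation: the organism names and reaction sets are hypothetical, and the clustering step is not reproduced.

```python
from itertools import combinations

# Hypothetical presence sets: organism -> reactions present in its network.
PRESENCE = {
    "org_a": {"rxn-1", "rxn-2", "rxn-3"},
    "org_b": {"rxn-2", "rxn-3", "rxn-4"},
    "org_c": {"rxn-5"},
}


def jaccard_distance(set_a, set_b):
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Pairwise distances between organisms, keyed by the sorted pair of names.
distances = {
    (a, b): jaccard_distance(PRESENCE[a], PRESENCE[b])
    for a, b in combinations(sorted(PRESENCE), 2)
}
for pair, dist in distances.items():
    print(pair, dist)
```

With complete linkage, the distance between two clusters is the largest of these pairwise distances between their members, so org_c (sharing no reaction with the others) would be merged last.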
usage:
padmet dendrogram_reactions_distance --reactions=FILE --output=FOLDER [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]
option:
-h --help Show help.
--reactions=FILE pathname of the file containing reactions in each species of the comparison.
--output=FOLDER path to the output folder.
--pvclust launch pvclust dendrogram using R
--padmetRef=STR path to the padmet Ref file
-u --upset=INT number of clusters in the upset graph.
-v verbose mode.
- padmet.utils.exploration.dendrogram_reactions_distance.absent_and_specific_reactions(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)[source]
Compare all cluster one against another.
- Parameters:
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_tree_cluster (str) – path to output tree cluster folder
output_folder_specific (str) – path to output folder with specific reactions for each species
output_folder_absent (str) – path to output folder with absent reactions for each species
organisms (list) – organisms names
- padmet.utils.exploration.dendrogram_reactions_distance.add_dendrogram_node_label(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)[source]
Using cluster nodes, add label and reactions number on each node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/43519473
- Parameters:
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
node_list (list) – cluster nodes
reactions_clust (dictionary) – reactions in each cluster of the tree
len_longest_cluster_id (int) – length of the longest cluster id
- padmet.utils.exploration.dendrogram_reactions_distance.command_help()[source]
Show help for analysis command.
- padmet.utils.exploration.dendrogram_reactions_distance.comparison_cluster(reactions_clust, output_folder_comparison)[source]
Compare all cluster one against another.
- Parameters:
reactions_clust (dictionary) – reactions in each cluster of the tree
output_folder_comparison (str) – path to output folder
- padmet.utils.exploration.dendrogram_reactions_distance.create_cluster(reactions_dataframe, absence_presence_matrix, linkage_matrix)[source]
Cut the dendrogram to create clusters.
- Parameters:
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
linkage_matrix (ndarray) – linkage matrix
- Returns:
dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}
- Return type:
dictionary
- padmet.utils.exploration.dendrogram_reactions_distance.create_intersection_files(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)[source]
Create intersection files.
- Parameters:
root (root) – root of the xml tree
cluster_leaf_species (dictionary) – for each leaf give the organisms in it
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_tree_cluster (str) – path to the output folder
metacyc_to_ecs (dictionary) – mapping of metayc reaction to EC number
- Returns:
reactions_clust – reactions in each cluster of the tree
- Return type:
dictionary
- padmet.utils.exploration.dendrogram_reactions_distance.create_intervene_graph(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]
Create an upset graph. Deprecated function: supervenn is now used instead, see the create_supervenn function.
- Parameters:
absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
temp_data_folder (str) – temporary data folder
path_to_intervene (str) – path to intervene bin
output_folder_upset (str) – path to output folder
dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
k (int) – number of clusters to create
- padmet.utils.exploration.dendrogram_reactions_distance.create_pvclust_dendrogram(reaction_file, output_folder)[source]
- padmet.utils.exploration.dendrogram_reactions_distance.create_supervenn(absence_presence_matrix, reactions_dataframe, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]
Create a supervenn graph.
- Parameters:
absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_upset (str) – path to output folder
dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
k (int) – number of clusters to create
- padmet.utils.exploration.dendrogram_reactions_distance.dendrogram_reactions_distance_cli(command_args)[source]
- padmet.utils.exploration.dendrogram_reactions_distance.getNewick(node, newick, parentdist, leaf_names)[source]
Create a newick file from the root node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/31878514.
- Parameters:
node (scipy.cluster.hierarchy.ClusterNode) – root ClusterNode of the scipy tree
newick (str) – newick string
parentdist (str) – root ClusterNode distance from the linkage matrix
leaf_names (list) – list of organism names
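For readers unfamiliar with the Newick conversion, the recursion can be sketched as follows. This is an independent illustration, not the code from the linked answer: Node is a hypothetical stand-in for a cluster node (not scipy.cluster.hierarchy.ClusterNode), and branch lengths are assumed to be the parent height minus the node height.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node of a dendrogram."""
    id: int
    dist: float = 0.0  # height of the node in the dendrogram (0 for leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None


def subtree_to_newick(node, parent_dist, leaf_names):
    """Newick text of a subtree, with its branch length to the parent."""
    branch = parent_dist - node.dist
    if node.is_leaf():
        return f"{leaf_names[node.id]}:{branch:.2f}"
    children = ",".join(
        subtree_to_newick(child, node.dist, leaf_names)
        for child in (node.left, node.right)
    )
    return f"({children}):{branch:.2f}"


def tree_to_newick(root, leaf_names):
    """Newick text of the whole tree (the root carries no branch length)."""
    if root.is_leaf():
        return f"{leaf_names[root.id]};"
    children = ",".join(
        subtree_to_newick(child, root.dist, leaf_names)
        for child in (root.left, root.right)
    )
    return f"({children});"


inner = Node(id=3, dist=0.2, left=Node(0), right=Node(1))
root = Node(id=4, dist=0.8, left=inner, right=Node(2))
newick = tree_to_newick(root, ["org_a", "org_b", "org_c"])
print(newick)
```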
- padmet.utils.exploration.dendrogram_reactions_distance.hclust_to_xml(linkage_matrix)[source]
Using the matrix returned by scipy linkage, create an XML tree corresponding to the hierarchical clustering. Return the root of the tree.
- Parameters:
linkage_matrix (ndarray) – linkage matrix
- Returns:
root of the xml tree
- Return type:
root
- padmet.utils.exploration.dendrogram_reactions_distance.pvclust_dendrogram(reactions_dataframe, organisms, output_folder)[source]
Using a distance matrix, create a dendrogram with bootstrap values with the pvclust R package (called through the rpy2 package).
- Parameters:
reactions_dataframe (pandas.DataFrame) – Reactions absence/presence matrix
organisms (list) – organisms names
output_folder (str) – path to the output folder
- padmet.utils.exploration.dendrogram_reactions_distance.reaction_figure_creation(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)[source]
Create the dendrogram and the upset figure (if the upset argument is given), and compare reactions between species.
- Parameters:
reaction_file (str) – path to reaction file
output_folder (str) – path to output folder
upset_cluster (int) – the number of clusters you want in the intervene figure
padmetRef_file (str) – path to padmet ref file
pvclust (bool) – boolean to launch or not R pvclust dendrogram
flux_analysis
- Description:
1./ Run a flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value of the reaction and the flux value of each of its reactants. If not: only return the flux value of each reactant of the reaction. A reactant with a flux of '0' is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.
2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.
3./ If --all_species: test the flux reachability of all the compounds in the metabolic network (may take several minutes).
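Topological reachability, as opposed to flux reachability, can be illustrated with a simple network expansion: starting from the seeds, a reaction becomes usable once all of its reactants are reachable, and its products are then added to the reachable set. The sketch below is a plain Python illustration of that idea under simplifying assumptions (irreversible reactions only, hypothetical compound ids); it is not the Menetools implementation.

```python
def reachable_compounds(reactions, seeds):
    """Expand from `seeds`: a reaction fires once all its reactants are reachable."""
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= scope and not set(products) <= scope:
                scope.update(products)
                changed = True
    return scope


# Hypothetical network: (reactants, products) pairs.
REACTIONS = [
    (["cpd-1", "cpd-2"], ["cpd-3"]),
    (["cpd-3"], ["cpd-4"]),
    (["cpd-5"], ["cpd-6"]),  # cpd-5 is never produced nor given as seed
]

scope = reachable_compounds(REACTIONS, seeds=["cpd-1", "cpd-2"])
unreachable_targets = sorted({"cpd-4", "cpd-6"} - scope)
print(sorted(scope), unreachable_targets)
```

A target that is topologically unreachable, such as cpd-6 here, points to a gap in the network upstream of that compound.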
usage:
padmet flux_analysis --sbml=FILE
padmet flux_analysis --sbml=FILE --seeds=FILE --targets=FILE [--all_species]
padmet flux_analysis --sbml=FILE --all_species
option:
-h --help Show help.
--sbml=FILE pathname to the sbml file to test for fba and fva.
--seeds=FILE pathname to the sbml file containing the seeds (medium).
--targets=FILE pathname to the sbml file containing the targets.
--all_species allow to make FBA on all the metabolites of the given model.
- padmet.utils.exploration.flux_analysis.fba_on_targets(allspecies, model)[source]
For each species in allspecies, create an objective function with the current species as only product, then try to optimize the model and get the flux.
- Parameters:
allspecies (list) – list of species ids to test
model (cobra.model) – Cobra model from a sbml file
- padmet.utils.exploration.flux_analysis.flux_analysis(sbml_file, seeds_file=None, targets_file=None, all_species=False)[source]
1./ Run a flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value of the reaction and the flux value of each of its reactants. If not: only return the flux value of each reactant of the reaction. A reactant with a flux of '0' is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.
2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.
3./ If --all_species: test the flux reachability of all the compounds in the metabolic network (may take several minutes).
- Parameters:
sbml_file (str) – path to sbml file to analyse
seeds_file (str) – path to sbml file with only compounds representing the seeds/growth medium
targets_file (str) – path to sbml file with only compounds representing the targets to reach
all_species (bool) – if True will try to create obj function for each compound and return which are reachable by flux.
get_pwy_from_rxn
- Description:
From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a is in pwy-x, return: pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
usage:
padmet get_pwy_from_rxn --reaction_file=FILE --padmetRef=FILE --output=FILE
options:
-h --help Show help.
--reaction_file=FILE pathname of the file containing the reactions id, 1/line
--padmetRef=FILE pathname of the padmet representing the database.
--output=FILE pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "\t"
- padmet.utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)[source]
Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()
- Parameters:
dict_pwy (dict) – dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
output (str) – path to output file
- padmet.utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)[source]
Extract from the padmet the pathways containing 1-n reactions from the set of reactions 'reactions'. Return a dict of data: k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
- Parameters:
padmet (padmet.classes.PadmetSpec) – padmet in which to search for pathways
reactions (set) – set of reactions to match with pathways
- Returns:
dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
- Return type:
dict
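The structure of the returned dict can be illustrated with a small sketch. It assumes the pathway content is already available as a plain mapping from pathway id to reaction ids (in the library this information comes from the padmet). The function name pathways_from_reactions is hypothetical, and rounding the ratio to two decimals is a choice made for this example.

```python
def pathways_from_reactions(pathway_to_rxns, reactions):
    """Keep the pathways sharing at least one reaction with `reactions`."""
    wanted = set(reactions)
    result = {}
    for pwy_id, rxn_ids in pathway_to_rxns.items():
        total_rxn = sorted(set(rxn_ids))
        rxn_from_list = sorted(set(total_rxn) & wanted)
        if rxn_from_list:
            result[pwy_id] = {
                "total_rxn": total_rxn,
                "rxn_from_list": rxn_from_list,
                "ratio": round(len(rxn_from_list) / len(total_rxn), 2),
            }
    return result


# Hypothetical pathway content.
PATHWAYS = {"pwy-x": ["a", "b", "c"], "pwy-y": ["d"]}
dict_pwy = pathways_from_reactions(PATHWAYS, {"a"})
print(dict_pwy)
```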
padmet_stats
- Description:
Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.
The input is a padmet file or a folder containing multiple padmets.
Create a tsv file named padmet_stats.tsv where the script has been launched.
usage:
padmet padmet_stats --padmet=FILE --output=FOLDER
option:
-h --help Show help.
-p --padmet=FILE padmet file or folder containing padmet(s).
-o --output=FOLDER path to output folder.
- padmet.utils.exploration.padmet_stats.compute_stats(padmet_file_folder, output_folder)[source]
Count reactions/pathways/compounds/genes in padmet(s).
- Parameters:
padmet_file_folder (str) – path to the padmet file/folder to analyze
output_folder (str) – path to the output folder
- padmet.utils.exploration.padmet_stats.orthology_result(padmet_file, padmet_names)[source]
Count reactions/pathways/compounds/genes in a padmet file.
- Parameters:
padmet_file (str) – path to a padmet file
padmet_names (list) – all the padmet filenames
- Returns:
Number of reactions given by the other species
- Return type:
dictionary
- padmet.utils.exploration.padmet_stats.padmet_stat(padmet_file)[source]
Count reactions/pathways/compounds/genes in a padmet file.
- Parameters:
padmet_file (str) – path to a padmet file
- Returns:
[path to padmet, number of pathways, number of reactions, number of genes, number of compounds, number of class compounds]
- Return type:
list
prot2genome
- Description:
Prot2Genome contains functions used for blast analysis and padmet enrichment
usage:
padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE [--cpu=INT] [blastp] [tblastn] [debug]
padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE --exonerate=PATH [--cpu=INT] [blastp] [tblastn] [debug]
padmet prot2genome --padmet=FOLDER --output=FOLDER
padmet prot2genome --studied_organisms=FOLDER --output=FOLDER
padmet prot2genome --run=FOLDER --padmetRef=FILE [--cpu=INT] [debug]
From an aucome run, fromAucome():
1. Extract the specific reactions into the spec_reactions folder with extractReactions()
2. Extract the genes from the spec_reactions files with extractGenes()
3. Run tblastn + exonerate with runAllAnalysis()
options:
--query_faa=FILE #TODO.
--query_ids=FILE/STR #TODO.
--subject_gbk=FILE #TODO.
--subject_fna=FILE #TODO.
--subject_faa=FILE #TODO.
--output_folder=FILE #TODO.
--cpu=INT Number of cpu to use for the multiprocessing (if none use 1 cpu). [default: 1]
blastp #TODO.
tblastn #TODO.
debug #TODO.
- padmet.utils.exploration.prot2genome.cleanTmp(tmp_folder)[source]
Remove all files from tmp folder
- Parameters:
tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
- padmet.utils.exploration.prot2genome.createPadmet(dict_args)[source]
Function used in mp_createPadmet by each worker of the Pool. The padmet files are updated using the function add_delete_rxn from padmet.utils.connection.manual_curation
- padmet.utils.exploration.prot2genome.createSeqFromTblastn(subject_fna, sseq_seq_faa, exonerate_target_id, start_match, end_match)[source]
Use the result from the tBlastn to extract a region from the subject genome. The region extracted corresponds to the match region and 10kb before and 10kb after.
- Parameters:
subject_fna (str) – path to subject fasta sequence (genome)
sseq_seq_faa (str) – path to output fasta sequence
exonerate_target_id (str) – ID of the contig/scaffold/chromosome where a match has been found
start_match (int) – start of the match
end_match (int) – end of the match
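The region extraction described above (match plus 10kb on each side) can be sketched with plain string slicing. This is an illustration under stated assumptions rather than the library code: coordinates are taken as 1-based and inclusive, as in BLAST tabular output, a reverse-strand hit is assumed to have its start greater than its end, and the window is clamped to the contig boundaries. The function region_bounds and the sample contig are hypothetical.

```python
FLANK = 10_000  # 10 kb on each side, as stated in the description above


def region_bounds(start_match, end_match, contig_length, flank=FLANK):
    """Return 0-based, end-exclusive slice bounds around a 1-based inclusive match."""
    low, high = sorted((start_match, end_match))  # handles start > end
    begin = max(low - 1 - flank, 0)
    end = min(high + flank, contig_length)
    return begin, end


contig = "ACGT" * 10_000  # hypothetical 40 kb contig
begin, end = region_bounds(start_match=15_000, end_match=12_001, contig_length=len(contig))
region = contig[begin:end]
print(begin, end, len(region))

# A match close to the contig ends is clamped to the available sequence.
clamped = region_bounds(start_match=5, end_match=50, contig_length=100)
```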
- padmet.utils.exploration.prot2genome.extractAnalysis(blast_analysis_folder, spec_reactions_folder, output_folder)[source]
For each analysis output in the blast analysis folder, obtained with runAllAnalysis():
1./ Extract the orthologue hits
2./ For each specific reaction from spec_reactions_folder, if all the genes of the reaction have an orthologue hit, add the reaction to reactions_to_add
- Parameters:
blast_analysis_folder (str) – path folder with all blast analysis output files
spec_reactions_folder (str) – path folder with all files containing specific reactions
output_folder (str) – path folder where to extract all reactions to add
- padmet.utils.exploration.prot2genome.extractGenes(reactions_file)[source]
Extract genes ids and return a list from reactions_file obtained with extractReactions()
- Parameters:
reactions_file (str) – path to reaction file
- padmet.utils.exploration.prot2genome.extractReactions(dict_args)[source]
Function used in mp_extractReactions by each worker of the Pool. For org_a.padmet and org_b.padmet:
1./ extract the reactions and the specific reactions (not in a, not in b)
2./ extract the genes associated with the specific reactions
3./ select only the reactions that come from annotation: for rxn-1 in org_a but not in org_b, if rxn-1 doesn't come from the org_a annotation, skip the reaction
4./ create the output file: header = ["reaction_id", "genes_ids", "sources"]
- padmet.utils.exploration.prot2genome.extract_sequence(exonerate_output, exonerate_sequence)[source]
Extract protein sequence from exonerate output.
- padmet.utils.exploration.prot2genome.fromAucome(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, keep_tmp=False, debug=False)[source]
This function fits an AuCoMe run. Select an aucome run folder and the function will:
1./ For each pair of studied organisms, extract the specific reactions
ex: For org A and org B, extract the reactions in org A but not in org B and vice versa
2./ For each specific reaction, extract the associated genes and run blastp, tblastn and exonerate
3./ For each reaction, for all the associated genes, if there is no blastp match but a tblastn and an exonerate hit, select the reaction as a hit
4./ Create a new padmet file including the new reactions to add
- Parameters:
run_folder (str) – path to aucome run folder
cpu (int) – number of cpu to use for multiprocessing steps
padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
blastp (bool) – If true run blastp during analysis
tblastn (bool) – If true run tblastn during analysis
exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
debug (bool) – if true, print all raw information of analysis
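The selection rule of step 3 can be written as a small predicate. The sketch below reads the rule as "every gene associated with the reaction has no blastp match but has both a tblastn and an exonerate hit", which is one interpretation of the description. The function name reaction_is_hit and the boolean result layout are hypothetical.

```python
def reaction_is_hit(gene_results):
    """gene_results: one dict per gene with boolean keys 'blastp', 'tblastn', 'exonerate'."""
    return bool(gene_results) and all(
        not gene["blastp"] and gene["tblastn"] and gene["exonerate"]
        for gene in gene_results
    )


# Hypothetical analysis results for the genes of two reactions.
rxn_1_genes = [
    {"blastp": False, "tblastn": True, "exonerate": True},
    {"blastp": False, "tblastn": True, "exonerate": True},
]
rxn_2_genes = [
    {"blastp": False, "tblastn": True, "exonerate": True},
    {"blastp": True, "tblastn": True, "exonerate": True},  # already found by blastp
]

selected = [
    rxn_id
    for rxn_id, genes in {"rxn-1": rxn_1_genes, "rxn-2": rxn_2_genes}.items()
    if reaction_is_hit(genes)
]
print(selected)
```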
- padmet.utils.exploration.prot2genome.mp_createPadmet(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, pool, verbose=False)[source]
Update all the padmet files in padmet_folder with the reactions to add from the files in reactions_to_add_folder. The information on the reactions is extracted from padmetRef as unique source.
ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.tsv and add each reaction listed in this file, based on padmetRef, to create output_folder/org_a.padmet
The padmet files are created in multiprocessing: the more cpu, the faster the new padmet files are created.
- Parameters:
reactions_to_add_folder (str) – path folder with all files containing reactions to add for each studied organism
padmet_folder (str) – path to folder with all padmet files of studied organism
output_folder (str) – path to output folder where to create new padmet files
padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
pool (Pool object) – pool object of multiprocessing
verbose (bool) – verbose
- padmet.utils.exploration.prot2genome.mp_extractReactions(padmet_folder, output_folder, pool)[source]
From a folder of padmet files, create all pairwise combinations and extract the specific reactions of each into a file in output_folder.
ex: with org_a.padmet and org_b.padmet in padmet_folder, create in output_folder: org_a_vs_org_b.tsv and org_b_vs_org_a.tsv
- Parameters:
padmet_folder (str) – path to folder with all padmet files of studied organism
output_folder (str) – path to output folder where to extract specific reactions
pool (Pool object) – pool object of multiprocessing
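The pairwise combinations described above are ordered pairs (org_a vs org_b and org_b vs org_a), which itertools.permutations produces directly. A minimal sketch of the file naming only, with a hypothetical helper name and without any padmet processing:

```python
from itertools import permutations


def comparison_files(padmet_names):
    """List the '<a>_vs_<b>.tsv' names for every ordered pair of organisms."""
    suffix = ".padmet"
    orgs = sorted(name[: -len(suffix)] for name in padmet_names if name.endswith(suffix))
    return [f"{org_a}_vs_{org_b}.tsv" for org_a, org_b in permutations(orgs, 2)]


files = comparison_files(["org_a.padmet", "org_b.padmet"])
print(files)
```

For n organisms this yields n * (n - 1) files, so the number of comparisons grows quadratically with the number of padmet files.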
- padmet.utils.exploration.prot2genome.mp_runAnalysis(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, pool, blastp, tblastn, exonerate, keep_tmp, debug, predicted_folder)[source]
Run different blast analyses based on files representing the specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.tsv):
1./ search for:
faa file of org_a (studied_organisms_folder/org_a/org_a.faa)
gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk)
faa file of org_b (studied_organisms_folder/org_b/org_b.faa)
fna file of org_b (studied_organisms_folder/org_b/org_b.fna)
if the fna doesn't exist, create it
2./ if the output file (blast_analysis_folder/org_a_VS_org_b.tsv) doesn't already exist, run the analysis
3./ extract all the gene ids from the specific reaction file with extractGenes()
4./ run blastp, tblastn and exonerate on gene_id.faa vs target.faa / fna with runAllAnalysis()
5./ create the analysis output
The analysis creates a lot of temporary files; all of them are in tmp_folder, which is cleaned after all loops.
- Parameters:
spec_reactions_folder (str) – path folder with all files containing specific reactions
studied_organisms_folder (str) – path to folder with all data of studied organisms. The folder contains one folder per organism, named after the organism, each containing org.gbk, org.faa and org.fna
output_folder (str) – path to output folder where to extract blast analysis
tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
pool (Pool object) – pool object of multiprocessing
blastp (bool) – If true run blastp during analysis
tblastn (bool) – If true run tblastn during analysis
exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
debug (bool) – if true, print all raw information of analysis
- padmet.utils.exploration.prot2genome.runAllAnalysis(dict_args)[source]
For a given gene query id:
1/ extract the sequence from query_faa and create a faa file output_folder/query_id.faa
If isoforms are found, also search for each specific isoform
2/ if blastp, run blastp; if tblastn, run tblastn; if exonerate and tblastn has a hit, run exonerate
Run all of them and extract the output as a dict of data
- Returns:
list of dict with all analysis output
- Return type:
list
- padmet.utils.exploration.prot2genome.runBlastp(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)[source]
Run blastp on query_seq vs subject faa and return the output based on header. Use the NcbiblastpCommandline function and extract the output. Extract the best hit based on bitscore.
- Parameters:
query_seq_faa (str) – path to query fasta sequence
subject_faa (str) – path to subject fasta sequence
header (list) – output format of blastp
debug (bool) – if true print all raw blastp output
- Returns:
dict of the best blastp hit, add ‘blastp_’ tag, or empty dict if no hit
- Return type:
dict
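Selecting the best hit by bitscore and tagging the keys can be sketched on a raw tabular output. The example assumes tab-separated lines whose columns follow the header list, and it does not call blastp itself. The function best_hit and the sample hits are hypothetical.

```python
def best_hit(tabular_output, header=("sseqid", "evalue", "bitscore"), tag="blastp_"):
    """Return the hit with the highest bitscore, keys prefixed by `tag`, or {}."""
    hits = []
    for line in tabular_output.splitlines():
        if not line.strip():
            continue
        hits.append(dict(zip(header, line.split("\t"))))
    if not hits:
        return {}
    best = max(hits, key=lambda hit: float(hit["bitscore"]))
    return {tag + key: value for key, value in best.items()}


# Hypothetical raw output: one hit per line, columns in the order of `header`.
RAW_OUTPUT = "prot-1\t1e-50\t180.0\nprot-2\t1e-80\t250.5\n"

hit = best_hit(RAW_OUTPUT)
no_hit = best_hit("")
print(hit, no_hit)
```

Values are kept as strings in this sketch; only the bitscore is converted to a float for the comparison.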
- padmet.utils.exploration.prot2genome.runExonerate(query_seq_faa, sseq_seq_faa, output, debug=False)[source]
Run exonerate on query_seq vs subject faa. Exonerate must be installed and the PATH environment variable must be updated with exonerate/bin/; the command ‘exonerate’ should work from a shell. sseq_seq_faa is obtained after the tblastn run, based on the tblastn_sseqid value.
- Parameters:
query_seq_faa (str) – path to query fasta sequence
sseq_seq_faa (str) – path to subject faa sequence
output (str) – path to exonerate output
debug (bool) – if true print all raw exonerate output
- Returns:
dict of the best exonerate hit, add ‘exonerate_’ tag, or empty dict if no hit
- Return type:
dict
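Since exonerate has to be reachable through PATH, a quick pre-flight check can save a failed run. The sketch below uses only the standard library; exonerate_available is a hypothetical helper name.

```python
import shutil


def exonerate_available(executable="exonerate"):
    """Return True when the executable can be found on PATH."""
    return shutil.which(executable) is not None


found = exonerate_available()
```

If found is False, the exonerate/bin/ directory probably still needs to be added to PATH before calling runExonerate.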
- padmet.utils.exploration.prot2genome.runSearchOnProteome(proteome_orgA, genome_orgB, output_folder, proteome_orgB=None)[source]
From a proteome of OrgA search for missing structural annotation in genome of OrgB. First launch Blastp between proteome of OrgA and proteome of OrgB. Then launch tBlastn between proteome of OrgA and genome of OrgB to find matches. Use the best match to extract a region from the genome of OrgB. Then launch Exonerate on this region using the sequence of OrgA.
- Parameters:
proteome_orgA (str) – path to fasta file of proteome of OrgA
genome_orgB (str) – path to fasta file of genome of OrgB
output_folder (str) – path to output folder
proteome_orgB (str) – path to fasta file of proteome of OrgB
- padmet.utils.exploration.prot2genome.runTblastn(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore', 'sstart', 'send'], debug=False)[source]
Run tblastn on query_seq vs subject fna and return output based on header. Uses the NcbitblastnCommandline function and extracts its output. Extracts the first best hit based on bitscore.
- Parameters:
query_seq_faa (str) – path to query fasta sequence
subject_fna (str) – path to subject fna sequence
header (list) – output format of tblastn
debug (bool) – if true print all raw tblastn output
- Returns:
dict of the best tblastn hit, add ‘tblastn_’ tag, or empty dict if no hit
- Return type:
dict
report_network
- Description:
Create reports of a padmet file.
all_pathways.tsv: header = [“dbRef_id”, “Common name”, “Number of reaction found”, “Total number of reaction”, “Ratio (Reaction found / Total)”]
all_reactions.tsv: header = [“dbRef_id”, “Common name”, “formula (with id)”, “formula (with common name)”, “in pathways”, “associated genes”]
all_metabolites.tsv: header = [“dbRef_id”, “Common name”, “Produced (p), Consumed (c), Both (cp)”]
usage:
padmet report_network --padmetSpec=FILE --output_dir=dir [--padmetRef=FILE] [-v]
options:
-h --help Show help.
--padmetSpec=FILE pathname of the padmet file.
--padmetRef=FILE pathname of the padmet file used as database
--output_dir=dir directory for the results.
-v print info.
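To make the all_pathways.tsv layout concrete, the sketch below writes one row under the header listed above and fills the ratio column. The pathway id, its name and the two-decimal rounding are illustrative assumptions; pathway_row is a hypothetical helper.

```python
import csv
import io

HEADER = ["dbRef_id", "Common name", "Number of reaction found",
          "Total number of reaction", "Ratio (Reaction found / Total)"]


def pathway_row(pathway_id, common_name, found, total):
    """Build one report row; the ratio is 0.0 for an empty pathway."""
    ratio = round(found / total, 2) if total else 0.0
    return [pathway_id, common_name, found, total, ratio]


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(HEADER)
writer.writerow(pathway_row("PWY-A", "example pathway", 4, 5))
report = buffer.getvalue()
```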
visu_network
- Description:
Visualize a metabolic network from a compound perspective.
usage:
padmet visu_network -i=FILE -o=FILE [--html=FILE] [--level=STR] [--hide-currency]
options:
-h --help Show help.
-i=FILE pathname to the input file (either PADMet or SBML).
-o=FILE pathname to the output file (picture of metabolic network).
--html=FILE pathname to the output file (interactive html of metabolic network).
--level=STR level of precision for the visualization (compound, reaction or pathway). By default visualization uses "compound".
--hide-currency hide currency metabolites.
- padmet.utils.exploration.visu_network.create_graph(metabolic_network_file, output_file, visualization_level, hide_currency_metabolites)[source]
Using output of parse_compounds_padmet or parse_compounds_sbml create a network picture using igraph.
- Parameters:
metabolic_network_file (str) – pathname of the metabolic network file
output_file (str) – pathname of the output picture of the metabolic network
visualization_level (str) – level of visualization either compound, reaction or pathway
hide_currency_metabolites (bool) – hide currency metabolites
- padmet.utils.exploration.visu_network.create_html_graph(metabolic_network_file, output_file, visualization_level, hide_currency_metabolites)[source]
Using output of parse_compounds_padmet or parse_compounds_sbml create an interactive graph in html.
- Parameters:
metabolic_network_file (str) – pathname of the metabolic network file
output_file (str) – pathname of the output picture of the metabolic network
visualization_level (str) – level of visualization either compound, reaction or pathway
hide_currency_metabolites (bool) – hide currency metabolites
- padmet.utils.exploration.visu_network.parse_compounds_padmet(padmet_file, hide_metabolites)[source]
Parse padmet files to extract compounds to create edges and nodes for igraph.
- Parameters:
padmet_file (str) – pathname of the padmet file
hide_metabolites (list) – list of metabolites to hide
- Returns:
edges (list) – edges between two compounds (symbolizing the reaction)
edges_label (list) – for each edge the name of the reaction
weights (list) – the weight associated to each edge
nodes (list) – a compound
nodes_label (list) – for each node the name of the compound
- padmet.utils.exploration.visu_network.parse_compounds_sbml(sbml_file, hide_metabolites)[source]
Parse sbml files to extract compounds to create edges and nodes for igraph.
- Parameters:
sbml_file (str) – pathname of the sbml file
hide_metabolites (list) – list of metabolites to hide
- Returns:
edges (list) – edges between two compounds (symbolizing the reaction)
edges_label (list) – for each edge the name of the reaction
weights (list) – the weight associated to each edge
nodes (list) – a compound
nodes_label (list) – for each node the name of the compound
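The compound-level parsing done by parse_compounds_padmet and parse_compounds_sbml can be pictured as linking every reactant of a reaction to every product. The sketch below is an assumption about that general shape, not the library's code: compound_graph and its input dict are hypothetical, and it leaves out the weights and nodes_label lists.

```python
def compound_graph(reactions, hide_metabolites=()):
    """Build node and edge lists linking each reactant to each product.

    `reactions` maps a reaction id to a (reactants, products) pair.
    """
    hidden = set(hide_metabolites)
    nodes, edges, edges_label = [], [], []
    for reaction_id, (reactants, products) in reactions.items():
        kept_reactants = [c for c in reactants if c not in hidden]
        kept_products = [c for c in products if c not in hidden]
        for compound in kept_reactants + kept_products:
            if compound not in nodes:
                nodes.append(compound)
        # One edge per reactant/product pair, labelled with the reaction.
        for reactant in kept_reactants:
            for product in kept_products:
                edges.append((reactant, product))
                edges_label.append(reaction_id)
    return nodes, edges, edges_label


toy = {"rxn-1": (["cpd-1", "WATER"], ["cpd-3"]),
       "rxn-2": (["cpd-3"], ["cpd-4", "PROTON"])}
nodes, edges, edges_label = compound_graph(
    toy, hide_metabolites=["WATER", "PROTON"])
```

Hiding currency metabolites such as water or protons keeps them from becoming hubs that connect otherwise unrelated compounds.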
- padmet.utils.exploration.visu_network.parse_pathways_padmet(padmet_file)[source]
Parse padmet files to extract pathway inputs and outputs to create edges and nodes for igraph.
- Parameters:
padmet_file (str) – pathname of the padmet file
- Returns:
edges (list) – edges between two compounds (symbolizing the pathway)
edges_label (list) – for each edge the name of the pathway
weights (list) – the weight associated to each edge
nodes (list) – a compound
nodes_label (list) – for each node the name of the compound
- padmet.utils.exploration.visu_network.parse_reactions_padmet(padmet_file)[source]
Parse padmet files to extract reactions to create edges and nodes for igraph.
- Parameters:
padmet_file (str) – pathname of the padmet file
- Returns:
edges (list) – edges between two reactions
edges_label (list) – for each edge the name of the reaction
weights (list) – the weight associated to each edge
nodes (list) – a compound
nodes_label (list) – for each node the name of the compound
visu_path
- Description:
Visualize a pathway in a padmet network.
Color code:
reactions associated to the pathway, present in the network: lightgreen
reactions associated to the pathway, not present in the network: red
compounds: skyblue
usage:
padmet visu_path --padmetSpec=FILE/FOLDER --padmetRef=FILE --pathway=ID --output=FILE [--hide-currency] [--level=STR]
options:
-h --help Show help.
--padmetSpec=FILE/FOLDER pathname to the PADMet file of the network or to a folder containing multiple padmets.
--padmetRef=FILE pathname to the PADMet file of the db of reference.
--pathway=ID pathway id to visualize, can be multiple pathways separated by a ",".
--output=FILE pathname to the output file (extension can be .png or .svg).
--hide-currency hide currency metabolites.
--level=STR level of precision for the visualization (compound or pathway). By default visualization uses "compound".
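The color code given in the description can be expressed as a small lookup. The function below is a hypothetical illustration of that rule only; node_color and its arguments are not part of padmet.

```python
def node_color(node_id, pathway_reactions, network_reactions):
    """Pick a color for a node following the documented color code."""
    if node_id in pathway_reactions:
        # Reaction of the pathway: green if the network has it, red if not.
        return "lightgreen" if node_id in network_reactions else "red"
    # Anything that is not a pathway reaction is drawn as a compound.
    return "skyblue"


pathway_reactions = {"rxn-1", "rxn-2"}
network_reactions = {"rxn-1"}
colors = [node_color(n, pathway_reactions, network_reactions)
          for n in ("rxn-1", "rxn-2", "cpd-1")]
```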
- padmet.utils.exploration.visu_path.visu_path_compounds(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file, hide_currency_metabolites=None)[source]
Extract reactions from pathway and create a compound/reaction graph.
- Parameters:
padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
padmet_ref_pathname (str) – pathname of the padmetRef file
pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
output_file (str) – pathname of the output picture (extension can be .png or .svg)
hide_currency_metabolites (bool) – hide currency metabolites
- padmet.utils.exploration.visu_path.visu_path_pathways(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file)[source]
Extract reactions from pathway and create a compound/reaction graph.
- Parameters:
padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
padmet_ref_pathname (str) – pathname of the padmetRef file
pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
output_file (str) – pathname of the output picture (extension can be .png or .svg)
hide_compounds (bool) – hide common compounds (like water or proton)
visu_similarity_gsmn
- Description:
Visualize similarity between metabolic networks using MDS.
usage:
padmet visu_similarity_gsmn --reaction=FILE --output=FILE [--group=FILE]
options:
-h --help Show help.
--reaction=FILE pathname to the reaction file output of compare_padmet or compare_sbml.
--output=FILE pathname to the picture output file containing the MDS projection
--group=FILE pathname to the group file containing a column named "species" with the organism ID and a column "group" classifying species in group (you can also use a "color" column to associate group to specific color)
- padmet.utils.exploration.visu_similarity_gsmn.command_help()[source]
Show help for analysis command.
- padmet.utils.exploration.visu_similarity_gsmn.visu_similarity_gsmn(reaction_file, output_file, group_file=None)[source]
Create dendrogram, upset figure (if upset argument) and compare reactions in species.
- Parameters:
reaction_file (str) – path to reaction file from compare_padmet/compare_sbml.
output_file (str) – path to picture output file.
group_file (str) – path to group file containing group assignation for each metabolic network.
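An MDS projection needs pairwise dissimilarities between networks. The sketch below derives them from a reaction table shaped like the compare_padmet output, where "1" marks presence. The choice of the Jaccard distance, the "0" used for absence and the helper name jaccard_distances are assumptions for illustration; the metric actually used by visu_similarity_gsmn is not stated here.

```python
import csv
import io
from itertools import combinations


def jaccard_distances(reaction_tsv, species):
    """Return {(a, b): distance} from a tab-separated presence table."""
    sets = {name: set() for name in species}
    reader = csv.DictReader(io.StringIO(reaction_tsv), delimiter="\t")
    for row in reader:
        for name in species:
            # Any value other than "1" is treated as absent.
            if row[name] == "1":
                sets[name].add(row["reaction"])
    distances = {}
    for a, b in combinations(species, 2):
        union = sets[a] | sets[b]
        shared = sets[a] & sets[b]
        distances[(a, b)] = 1 - len(shared) / len(union) if union else 0.0
    return distances


table = ("reaction\torg_a\torg_b\n"
         "rxn-1\t1\t1\n"
         "rxn-2\t1\t0\n"
         "rxn-3\t0\t1\n"
         "rxn-4\t1\t1\n")
distances = jaccard_distances(table, ["org_a", "org_b"])
```

In this toy table the two organisms share two of four reactions, so their distance is 0.5; a full matrix of such values is the kind of input an MDS routine would project into two dimensions.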