Exploration¶
Description:
#TODO
compare_padmet¶
- Description:
Compare 1-n padmet files and create an output folder with the following files:
- genes.tsv:
- fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
- line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
- reactions.tsv:
- fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
- line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
- pathways.tsv:
- fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
- line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
- compounds.tsv:
- fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
- line = [cpd-1, rxn-1, '', rxn-1, '']
usage:
padmet compare_padmet --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [--cpu INT] [-v]
option:
-h --help Show help.
--padmet=FILES/DIR pathname of the padmet files, separated by ',', ex: /path/padmet1.padmet,/path/padmet2.padmet OR a folder
--output=DIR pathname of the output folder
--padmetRef=FILE pathname of the reference database in padmet format
--cpu INT number of CPU to use in multiprocessing
-
padmet.utils.exploration.compare_padmet.
compare_padmet
(padmet_path, output, padmetRef=None, verbose=False, number_cpu=None)[source]¶ Compare 1-n padmet files and create an output folder with the following files:
- genes.tsv:
- fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
- line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
- reactions.tsv:
- fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
- line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
- pathways.tsv:
- fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
- line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
- compounds.tsv:
- fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
- line = [cpd-1, rxn-1, '', rxn-1, '']
Parameters: - padmet_path (str) – pathname of the padmet files, separated by ',', ex: /path/padmet1.padmet,/path/padmet2.padmet OR a folder
- output (str) – pathname of the output folder
- padmetRef (padmet.classes.PadmetRef) – padmet containing the reference database, needed to calculate the pathway completion rate
- verbose (bool) – if True print information
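The presence/absence columns described for reactions.tsv can be illustrated with a minimal sketch. It assumes the reactions of each padmet are already available as plain Python sets; the names `presence_rows` and `to_tsv` are hypothetical helpers for illustration and are not part of padmet, and only the first columns of the real file are reproduced.

```python
import csv
import io


def presence_rows(reactions_by_padmet):
    """Build a presence/absence table: one row per reaction, one 0/1 column per padmet.

    Hypothetical simplification of reactions.tsv (gene and formula columns omitted).
    """
    names = sorted(reactions_by_padmet)
    all_rxns = sorted(set().union(*reactions_by_padmet.values()))
    header = ["reaction"] + names
    rows = [
        [rxn] + [1 if rxn in reactions_by_padmet[name] else 0 for name in names]
        for rxn in all_rxns
    ]
    return header, rows


def to_tsv(header, rows):
    """Serialize the table as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Toy input: reaction ids per padmet (illustrative data only).
example = {"padmet_a": {"rxn-1", "rxn-2"}, "padmet_b": {"rxn-2"}}
header, rows = presence_rows(example)
print(to_tsv(header, rows))
```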
compare_sbml¶
- Description:
Compare the reactions of two or more (1-n) sbml files.
Report whether a reaction is missing,
and whether a reaction with the same id uses different species or has a different reversibility.
usage:
padmet compare_sbml --sbml=FILES/DIR --output=DIR
option:
-h --help Show help.
--sbml FILES/DIR pathname of the sbml files, separated by ',', ex: /path/sbml1.sbml,/path/sbml2.sbml OR a folder
--output DIR pathname of the output folder
-
padmet.utils.exploration.compare_sbml.
compare_multiple_sbml
(sbml_path, output_folder)[source]¶ Compare 1-n sbml, create two output files reactions.tsv and metabolites.tsv with the reactions/metabolites in each sbml
Parameters: - sbml_path (str) – path to a folder containing sbmls or multiple sbml paths separated by a ‘,’
- output_folder (str) – path to the output folder
-
padmet.utils.exploration.compare_sbml.
compare_rxn
(rxn1, rxn2)[source]¶ Compare two cobra reaction objects and return (same_cpd, same_rev).
same_cpd: bool, True if the same compounds are consumed and produced.
same_rev: bool, True if the reactions have the same direction (reversible or not).
Parameters: - rxn1 (cobra.model.reaction) – reaction as cobra object
- rxn2 (cobra.model.reaction) – reaction as cobra object
Returns: (same_cpd (bool), same_rev (bool))
Return type: tuple
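The comparison logic can be sketched without cobra. This is only an illustration under stated assumptions: each reaction is represented as a plain dict with a `reversible` flag and a `metabolites` mapping of compound id to stoichiometric coefficient (negative for consumed, positive for produced). The function name `compare_rxn_sketch` is hypothetical and does not reproduce the actual padmet implementation.

```python
def compare_rxn_sketch(rxn1, rxn2):
    """Return (same_cpd, same_rev) for two reactions given as plain dicts.

    Assumed layout: {"reversible": bool, "metabolites": {cpd_id: coefficient}}.
    """
    def sides(rxn):
        # Split compounds into consumed (coefficient < 0) and produced (> 0).
        consumed = {c for c, coef in rxn["metabolites"].items() if coef < 0}
        produced = {c for c, coef in rxn["metabolites"].items() if coef > 0}
        return consumed, produced

    same_cpd = sides(rxn1) == sides(rxn2)
    same_rev = rxn1["reversible"] == rxn2["reversible"]
    return same_cpd, same_rev


# Toy reactions: same compounds on each side, different reversibility.
rxn_a = {"reversible": False, "metabolites": {"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}}
rxn_b = {"reversible": True, "metabolites": {"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}}
result = compare_rxn_sketch(rxn_a, rxn_b)
print(result)
```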
-
padmet.utils.exploration.compare_sbml.
compare_sbml
(sbml1_path, sbml2_path)[source]¶ Compare 2 sbml, print nb of metabolites and reactions. If reaction missing print reaction id, and reaction formula.
Parameters: - sbml1_path (str) – path to the first sbml file to compare
- sbml2_path (str) – path to the second sbml file to compare
compare_sbml_padmet¶
- Description:
- compare reactions in sbml and padmet file
usage:
padmet compare_sbml_padmet --padmet=FILE --sbml=FILE
option:
-h --help Show help.
--padmet=FILE path of the padmet file
--sbml=FILE path of the sbml file
-
padmet.utils.exploration.compare_sbml_padmet.
command_help
()[source]¶ Show help for analysis command.
-
padmet.utils.exploration.compare_sbml_padmet.
compare_sbml_padmet
(sbml_document, padmet)[source]¶ Compare reaction ids in sbml vs padmet; return the number of reactions in both and the ids of the reactions not in the sbml or not in the padmet.
Parameters: - padmet (padmet.classes.PadmetSpec) – padmet to compare
- sbml_document (libsbml.document) – sbml document
convert_sbml_db¶
- Description:
This tool uses the MetaNetX database to check or convert a sbml. Flat files from MetaNetX are required to run this tool. They can be found in the aureme workflow or on the MetaNetX website. To use the tool, set either:
mnx_folder= the path to a folder containing the MetaNetX flat files. The files must be named 'reac_xref.tsv' and 'chem_xref.tsv'.
or set the paths of the flat files manually with:
mnx_reac= path to the flat file for reactions
mnx_chem= path to the flat file for chemical compounds (species)
- To check the database used in a sbml:
- to check all elements of the sbml (reactions and species), set --to-map=all
- to check only the reactions of the sbml, set --to-map=reaction
- to check only the species of the sbml, set --to-map=species
- To map a sbml and obtain a file of mapping ids to a given database, set:
- to-map: as previously explained
- db_out: the name of the target database: ['metacyc', 'bigg', 'kegg'] only
- output: the path to the output file
For a given sbml using a specific database, return a dictionary of mapping.
The output is a file with line = reaction or species id in the sbml, reaction or species id in the db_out database.
- ex:
- For a sbml based on the kegg database with db_out=metacyc, the output file will contain for example:
R02283 ACETYLORNTRANSAM-RXN
usage:
padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
options:
-h --help Show help.
--to-map=STR select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
--mnx_reac=FILE path to the MetaNetX file for reactions
--mnx_chem=FILE path to the MetaNetX file for compounds
--sbml=FILE path to the sbml file to convert
--output=FILE path to the file containing the mapping, sep = "\t"
--db_out=ID id of the output database in ["BIGG","METACYC","KEGG"]
-v verbose.
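The id mapping through a cross-reference table can be sketched as follows. This is an illustration under assumptions, not the tool's code: it assumes each cross-reference line is tab-separated with a `database:id` entry in the first column and a MetaNetX id in the second, and that comment lines start with `#`. The function name `build_mapping` and the MetaNetX ids `MNXR_EXAMPLE_1` and `MNXR_EXAMPLE_2` are placeholders invented for the example.

```python
from collections import defaultdict


def build_mapping(xref_lines, db_in, db_out):
    """Map ids of db_in to ids of db_out that share the same MetaNetX id.

    Assumed line layout: "<database>:<id>\t<MetaNetX id>\t<description>".
    """
    by_mnx = defaultdict(lambda: defaultdict(set))
    for line in xref_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or ":" not in parts[0]:
            continue  # skip lines that do not follow the assumed layout
        database, external_id = parts[0].split(":", 1)
        by_mnx[parts[1]][database].add(external_id)

    mapping = {}
    for ids in by_mnx.values():
        targets = sorted(ids.get(db_out, set()))
        if not targets:
            continue
        for source_id in ids.get(db_in, set()):
            mapping[source_id] = targets
    return mapping


# Toy cross-reference lines; the MetaNetX ids are placeholders.
lines = [
    "#source\tID\tdescription",
    "kegg:R02283\tMNXR_EXAMPLE_1\tdescription",
    "metacyc:ACETYLORNTRANSAM-RXN\tMNXR_EXAMPLE_1\tdescription",
    "kegg:R00001\tMNXR_EXAMPLE_2\tdescription",
]
mapping = build_mapping(lines, "kegg", "metacyc")
print(mapping)
```

Ids without a counterpart in the target database are left out of the mapping in this sketch.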
-
padmet.utils.exploration.convert_sbml_db.
check_sbml_db
(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]¶ Check sbml database of a given sbml.
Parameters: - sbml_file (str) – path to the sbml file to convert
- to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
- verbose (bool) – if true: more info during process
- mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
- mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
- mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns: (name of the best matching database, dict of matching)
Return type: tuple
-
padmet.utils.exploration.convert_sbml_db.
map_sbml
(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]¶ map a sbml and obtain a file of mapping ids to a given database.
Parameters: - sbml_file (str) – path to the sbml file to convert
- to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
- db_out (str) – the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
- output (str) – path to the file containing the mapping, sep = "\t"
- verbose (bool) – if true: more info during process
- mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
- mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
- mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns: (name of the best matching database, dict of matching)
Return type: tuple
dendrogram_reactions_distance¶
- Description:
Use the reactions.tsv file from compare_padmet.py to create a dendrogram using a Jaccard distance.
From the absence/presence matrix of reactions in the different species, compute a Jaccard distance between these species. Apply a hierarchical clustering with complete linkage to these distances, then create a dendrogram. Also apply intervene to create an upset graph of the data.
usage:
padmet dendrogram_reactions_distance --reactions=FILE --output=FOLDER [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]
option:
-h --help Show help.
--reactions=FILE pathname of the file containing reactions in each species of the comparison.
--output=FOLDER path to the output folder.
--pvclust launch pvclust dendrogram using R
--padmetRef=STR path to the padmet Ref file
-u --upset=INT number of cluster in the upset graph.
-v verbose mode.
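The Jaccard distance used for the dendrogram can be illustrated with a small pure-Python sketch. It assumes the reactions of each species are available as sets (which is equivalent to the absence/presence matrix); the helper names are hypothetical, and the hierarchical clustering step itself is not reproduced here.

```python
from itertools import combinations


def jaccard_distance(set_a, set_b):
    """Jaccard distance: 1 - |intersection| / |union| (0.0 for two empty sets)."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def pairwise_distances(reactions_by_species):
    """Distance for every unordered pair of species, keyed by (name_a, name_b)."""
    return {
        (a, b): jaccard_distance(reactions_by_species[a], reactions_by_species[b])
        for a, b in combinations(sorted(reactions_by_species), 2)
    }


# Toy data: reactions present in each species (illustrative ids only).
species = {
    "org_a": {"r1", "r2", "r3"},
    "org_b": {"r2", "r3", "r4"},
    "org_c": {"r1", "r2", "r3"},
}
distances = pairwise_distances(species)
print(distances)
```

These pairwise distances are the kind of input a complete-linkage hierarchical clustering would take to build the tree.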
-
padmet.utils.exploration.dendrogram_reactions_distance.
absent_and_specific_reactions
(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)[source]¶ Compare all clusters against one another.
Parameters: - reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- output_folder_tree_cluster (str) – path to output tree cluster folder
- output_folder_specific (str) – path to output folder with specific reactions for each species
- output_folder_absent (str) – path to output folder with absent reactions for each species
- organisms (list) – organisms names
-
padmet.utils.exploration.dendrogram_reactions_distance.
add_dendrogram_node_label
(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)[source]¶ Using cluster nodes, add label and reactions number on each node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/43519473
Parameters: - reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- node_list (list) – cluster nodes
- reactions_clust (dictionary) – reactions in each cluster of the tree
- len_longest_cluster_id (int) – reactions in each cluster of the tree
-
padmet.utils.exploration.dendrogram_reactions_distance.
command_help
()[source]¶ Show help for analysis command.
-
padmet.utils.exploration.dendrogram_reactions_distance.
comparison_cluster
(reactions_clust, output_folder_comparison)[source]¶ Compare all clusters against one another.
Parameters: - reactions_clust (dictionary) – reactions in each cluster of the tree
- output_folder_comparison (str) – path to output folder
-
padmet.utils.exploration.dendrogram_reactions_distance.
create_cluster
(reactions_dataframe, absence_presence_matrix, linkage_matrix)[source]¶ Cut the dendrogram to create clusters.
Parameters: - reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
- linkage_matrix (ndarray) – linkage matrix
Returns: dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}
Return type: dictionary
-
padmet.utils.exploration.dendrogram_reactions_distance.
create_intersection_files
(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)[source]¶ Create intersection files.
Parameters: - root (root) – root of the xml tree
- cluster_leaf_species (dictionary) – for each leaf give the organisms in it
- reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- output_folder_tree_cluster (str) – path to the output folder
- metacyc_to_ecs (dictionary) – mapping of metacyc reaction to EC number
Returns: reactions_clust – reactions in each cluster of the tree
Return type: dictionary
-
padmet.utils.exploration.dendrogram_reactions_distance.
create_intervene_graph
(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]¶ Create an upset graph. Deprecated function: supervenn is now used instead, see the create_supervenn function.
Parameters: - absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
- reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- temp_data_folder (str) – temporary data folder
- path_to_intervene (str) – path to intervene bin
- output_folder_upset (str) – path to output folder
- dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
- k (int) – number of cluster to create
-
padmet.utils.exploration.dendrogram_reactions_distance.
create_pvclust_dendrogram
(reaction_file, output_folder)[source]¶
-
padmet.utils.exploration.dendrogram_reactions_distance.
create_supervenn
(absence_presence_matrix, reactions_dataframe, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]¶ Create a supervenn graph.
Parameters: - absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
- reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
- output_folder_upset (str) – path to output folder
- dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
- k (int) – number of cluster to create
-
padmet.utils.exploration.dendrogram_reactions_distance.
dendrogram_reactions_distance_cli
(command_args)[source]¶
-
padmet.utils.exploration.dendrogram_reactions_distance.
getNewick
(node, newick, parentdist, leaf_names)[source]¶ Create a newick file from the root node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/31878514.
Parameters: - node (scipy.cluster.hierarchy.ClusterNode) – root ClusterNode of the scipy tree
- newick (str) – newick string
- parentdist (str) – root ClusterNode distance from the linkage matrix
- leaf_names (list) – list of organism names
-
padmet.utils.exploration.dendrogram_reactions_distance.
hclust_to_xml
(linkage_matrix)[source]¶ Using a distance matrix from scipy linkage, create a xml tree corresponding to the hierarchical clustering. Return the root of the tree.
Parameters: linkage_matrix (ndarray) – linkage matrix
Returns: root of the xml tree
Return type: root
-
padmet.utils.exploration.dendrogram_reactions_distance.
pvclust_dendrogram
(reactions_dataframe, organisms, output_folder)[source]¶ Using a distance matrix, create a dendrogram with bootstrap values with the pvclust R package (through the rpy2 package).
Parameters: - reactions_dataframe (pandas.DataFrame) – Reactions absence/presence matrix
- organisms (list) – organisms names
- output_folder (str) – path to the output folder
-
padmet.utils.exploration.dendrogram_reactions_distance.
reaction_figure_creation
(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)[source]¶ Create dendrogram, upset figure (if upset argument) and compare reactions in species.
Parameters: - reaction_file (str) – path to reaction file
- upset_cluster (int) – the number of cluster you want in the intervene figure
- output_folder (str) – path to output folder
- padmet_ref_file (str) – path to padmet ref file
- pvclust (bool) – boolean to launch or not R pvclust dendrogram
flux_analysis¶
- Description:
1./ Run flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. If a reactant has a flux of '0', it is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.
2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.
3./ If --all_species: test the flux reachability of all the compounds in the metabolic network (may take several minutes).
usage:
padmet flux_analysis --sbml=FILE
padmet flux_analysis --sbml=FILE --seeds=FILE --targets=FILE [--all_species]
padmet flux_analysis --sbml=FILE --all_species
option:
-h --help Show help.
--sbml=FILE pathname to the sbml file to test for fba and fva.
--seeds=FILE pathname to the sbml file containing the seeds (medium).
--targets=FILE pathname to the sbml file containing the targets.
--all_species allow to make FBA on all the metabolites of the given model.
-
padmet.utils.exploration.flux_analysis.
fba_on_targets
(allspecies, model)[source]¶ For each species in allspecies, create an objective function with the current species as only product, then try to optimize the model and get the flux.
Parameters: - allspecies (list) – list of species ids to test
- model (cobra.model) – Cobra model from a sbml file
-
padmet.utils.exploration.flux_analysis.
flux_analysis
(sbml_file, seeds_file=None, targets_file=None, all_species=False)[source]¶ 1./ Run flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. If a reactant has a flux of '0', it is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.
2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.
3./ If --all_species: test the flux reachability of all the compounds in the metabolic network (may take several minutes).
Parameters: - sbml_file (str) – path to sbml file to analyse
- seeds_file (str) – path to sbml file with only compounds representing the seeds/growth medium
- targets_file (str) – path to sbml file with only compounds representing the targets to reach
- all_species (bool) – if True will try to create obj function for each compound and return which are reachable by flux.
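The idea of topological reachability from seeds can be sketched as a simple network expansion. This is a deliberately simplified illustration, not the Menetools algorithm: reactions are assumed to be given as (reactants, products) pairs of sets, reversibility is ignored, and the function name `reachable_compounds` is hypothetical.

```python
def reachable_compounds(reactions, seeds):
    """Expand the set of reachable compounds from the seeds.

    A reaction fires when all of its reactants are already reachable;
    its products then become reachable. Repeat until nothing changes.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


# Toy network: a + b -> c, c -> d, e -> f (illustrative ids only).
reactions = [({"a", "b"}, {"c"}), ({"c"}, {"d"}), ({"e"}, {"f"})]
seeds = {"a", "b"}
targets = {"d", "f"}

scope = reachable_compounds(reactions, seeds)
unreachable_targets = targets - scope
print(sorted(scope), sorted(unreachable_targets))
```

A target left in `unreachable_targets` cannot be produced from the seeds by any chain of reactions, which is a stronger statement than having a zero flux.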
get_pwy_from_rxn¶
- Description:
- From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a is in pwy-x => return pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
usage:
padmet get_pwy_from_rxn --reaction_file=FILE --padmetRef=FILE --output=FILE
options:
-h --help Show help.
--reaction_file=FILE pathname of the file containing the reactions id, 1/line
--padmetRef=FILE pathname of the padmet representing the database.
--output=FILE pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "\t"
-
padmet.utils.exploration.get_pwy_from_rxn.
dict_pwys_to_file
(dict_pwy, output)[source]¶ Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()
Parameters: - dict_pwy (dict) – dict, k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
- output (str) – path to output file
-
padmet.utils.exploration.get_pwy_from_rxn.
extract_pwys
(padmet, reactions)[source]¶ Extract from padmet the pathways containing 1-n reactions from the set of reactions 'reactions'. Return a dict of data: k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
Parameters: - padmet (padmet.classes.PadmetSpec) – padmet from which the pathways are extracted
- reactions (set) – set of reactions to match with pathways
Returns: dict, k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
Return type: dict
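The structure of the returned dict can be illustrated with a minimal sketch. It assumes the pathway content is already available as a plain mapping of pathway id to reaction ids instead of a padmet object; `pathways_for_reactions` is a hypothetical name used only for this example.

```python
def pathways_for_reactions(pathway_to_rxns, reactions):
    """Return, for each pathway sharing at least one reaction with `reactions`,
    its reactions, the matched reactions and the matched/total ratio."""
    wanted = set(reactions)
    result = {}
    for pathway_id, total in pathway_to_rxns.items():
        found = sorted(set(total) & wanted)
        if found:
            result[pathway_id] = {
                "total_rxn": sorted(total),
                "rxn_from_list": found,
                "ratio": len(found) / len(total),
            }
    return result


# Toy data: pwy-x has three reactions, pwy-y has one (illustrative ids only).
pathway_to_rxns = {"pwy-x": {"a", "b", "c"}, "pwy-y": {"d"}}
dict_pwy = pathways_for_reactions(pathway_to_rxns, {"a"})
print(dict_pwy)
```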
-
padmet.utils.exploration.get_pwy_from_rxn.
command_help
()[source] Show help for analysis command.
-
padmet.utils.exploration.get_pwy_from_rxn.
get_pwy_from_rxn
(padmet, reaction_file, output)[source]
-
padmet.utils.exploration.get_pwy_from_rxn.
get_pwy_from_rxn_cli
(command_args)[source]
padmet_stats¶
- Description:
Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.
The input is a padmet file or a folder containing multiple padmets.
Create a tsv file named padmet_stats.tsv in the folder where the script has been launched.
usage:
padmet padmet_stats --padmet=FILE --output=FOLDER
option:
-h --help Show help.
-p --padmet=FILE padmet file or folder containing padmet(s).
-o --output=FOLDER path to output folder.
-
padmet.utils.exploration.padmet_stats.
compute_stats
(padmet_file_folder, output_folder)[source]¶ Count reactions/pathways/compounds/genes in padmet(s).
Parameters: - padmet_file_folder (str) – path to the padmet file/folder to analyze
- output_folder (str) – path to the output folder
-
padmet.utils.exploration.padmet_stats.
orthology_result
(padmet_file, padmet_names)[source]¶ Count reactions/pathways/compounds/genes in a padmet file.
Parameters: - padmet_file (str) – path to a padmet file
- padmet_names (list) – all the padmet filenames
Returns: Number of reactions given by the other species
Return type: dictionary
-
padmet.utils.exploration.padmet_stats.
padmet_stat
(padmet_file)[source]¶ Count reactions/pathways/compounds/genes in a padmet file.
Parameters: padmet_file (str) – path to a padmet file Returns: [path to padmet, number of pathways, number of reactions, number of genes, number of compounds] Return type: list
prot2genome¶
- Description:
- Prot2Genome contains functions used for blast analysis and padmet enrichment
- usage:
padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE [--cpu=INT] [blastp] [tblastn] [debug]
padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE --exonerate=PATH [--cpu=INT] [blastp] [tblastn] [debug]
padmet prot2genome --padmet=FOLDER --output=FOLDER
padmet prot2genome --studied_organisms=FOLDER --output=FOLDER
padmet prot2genome --run=FOLDER --padmetRef=FILE [--cpu=INT] [debug]
- From aucome run fromAucome():
- 1. Extract specific reactions in the spec_reactions folder with extractReactions()
- 2. Extract genes from spec_reactions files with extractGenes()
- 3. Run tblastn + exonerate with runAllAnalysis()
- options:
--query_faa=FILE #TODO.
--query_ids=FILE/STR #TODO.
--subject_gbk=FILE #TODO.
--subject_fna=FILE #TODO.
--subject_faa=FILE #TODO.
--output_folder=FILE #TODO.
--cpu=INT Number of cpu to use for the multiprocessing (if none use 1 cpu). [default: 1]
blastp #TODO.
tblastn #TODO.
debug #TODO.
-
padmet.utils.exploration.prot2genome.
cleanTmp
(tmp_folder)[source]¶ Remove all files from tmp folder
Parameters: tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
-
padmet.utils.exploration.prot2genome.
createPadmet
(dict_args)[source]¶ Function used in mp_createPadmet by each worker of the Pool. Padmet files are updated using the function add_delete_rxn from padmet.utils.connection.manual_curation.
-
padmet.utils.exploration.prot2genome.
createSeqFromTblastn
(subject_fna, sseq_seq_faa, exonerate_target_id, start_match, end_match)[source]¶ Use the result from the tBlastn to extract a region from the subject genome. The region extracted corresponds to the match region and 10kb before and 10kb after.
Parameters: - subject_fna (str) – path to subject fasta sequence (genome)
- sseq_seq_faa (str) – path to output fasta sequence
- exonerate_target_id (str) – ID of the contig/scaffold/chromosome where a match has been found
- start_match (int) – start of the match
- end_match (int) – end of the match
-
padmet.utils.exploration.prot2genome.
extractAnalysis
(blast_analysis_folder, spec_reactions_folder, output_folder)[source]¶ For each analysis output in the blast analysis folder, obtained with runAllAnalysis():
1./ Extract orthologue hits
2./ For each specific reaction from spec_reactions_folder, if all genes of the reaction have an orthologue hit, add the reaction to reactions_to_add
Parameters: - blast_analysis_folder (str) – path folder with all blast analysis output files
- spec_reactions_folder (str) – path folder with all files containing specific reactions
- output_folder (str) – path folder where to extract all reactions to add
-
padmet.utils.exploration.prot2genome.
extractGenes
(reactions_file)[source]¶ Extract genes ids and return a list from reactions_file obtained with extractReactions()
Parameters: reactions_file (str) – path to reaction file
-
padmet.utils.exploration.prot2genome.
extractReactions
(dict_args)[source]¶ Function used in mp_extractReactions by each worker of the Pool. For org_a.padmet and org_b.padmet:
1./ extract reactions and specific reactions (not in a, not in b)
2./ extract genes associated to specific reactions
3./ select a reaction only if it comes from annotation: for rxn-1 in org_a but not in org_b, if rxn-1 doesn't come from org_a annotation, skip the reaction
4./ create output file: header = ["reaction_id", "genes_ids", "sources"]
-
padmet.utils.exploration.prot2genome.
extract_sequence
(exonerate_output, exonerate_sequence)[source]¶ Extract protein sequence from exonerate output.
-
padmet.utils.exploration.prot2genome.
fromAucome
(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, keep_tmp=False, debug=False)[source]¶ This function fits an AuCoMe run. Select an aucome run folder and the function will:
1./ For each pair of studied organisms, extract specific reactions. ex: for org A and org B, extract reactions in org A but not in org B and vice versa
2./ For each specific reaction, extract the associated genes and run blastp, tblastn and exonerate
3./ For each reaction, for all associated genes, if there is no blastp match but a tblastn and exonerate hit, select the reaction as a hit
4./ Create a new padmet file with the new reactions added
Parameters: - run_folder (str) – path to aucome run folder
- cpu (int) – number of cpu to use for multiprocessing steps
- padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
- blastp (bool) – If true run blastp during analysis
- tblastn (bool) – If true run tblastn during analysis
- exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
- keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
- debug (bool) – if true, print all raw informations of analysis
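The selection rule of step 3 can be written as a small predicate. This sketch assumes the rule must hold for every gene associated with the reaction, which is one reading of the description, and that the result of each analysis is summarized as a boolean per gene. The function name `reaction_is_hit` and the dict layout are hypothetical.

```python
def reaction_is_hit(gene_results):
    """Return True if every gene has no blastp match but a tblastn and an exonerate hit.

    gene_results: list of dicts with boolean keys "blastp", "tblastn", "exonerate".
    A reaction without any associated gene is never selected.
    """
    return bool(gene_results) and all(
        not gene["blastp"] and gene["tblastn"] and gene["exonerate"]
        for gene in gene_results
    )


# Toy results for the genes of two reactions (illustrative values only).
rxn_1_genes = [
    {"blastp": False, "tblastn": True, "exonerate": True},
    {"blastp": False, "tblastn": True, "exonerate": True},
]
rxn_2_genes = [
    {"blastp": True, "tblastn": True, "exonerate": True},
]
print(reaction_is_hit(rxn_1_genes), reaction_is_hit(rxn_2_genes))
```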
-
padmet.utils.exploration.prot2genome.
mp_createPadmet
(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, pool, verbose=False)[source]¶ Update all padmet files in padmet_folder with the reactions to add from the files in reactions_to_add_folder. The information on the reactions is extracted from padmetRef as unique source. ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.tsv and add each reaction listed in this file, based on padmetRef, to create output_folder/org_a.padmet. The padmet files are created in multiprocess: the more cpu, the faster the new padmet files are created.
Parameters: - reactions_to_add_folder (str) – path folder with all files containing reactions to add for each studied organism
- padmet_folder (str) – path to folder with all padmet files of studied organism
- output_folder (str) – path to output folder where to create new padmet files
- padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
- pool (Pool object) – pool object of multiprocessing
- verbose (bool) – verbose
-
padmet.utils.exploration.prot2genome.
mp_extractReactions
(padmet_folder, output_folder, pool)[source]¶ From a folder of padmet files, create all dual combination and extract specific reactions to create a file in output_folder ex: in padmet_folder: org_a.padmet, org_b.padmet, create: output_folder: org_a_vs_org_b.tsv and org_b_vs_org_a.tsv
Parameters: - padmet_folder (str) – path to folder with all padmet files of studied organism
- output_folder (str) – path to output folder where to extract specific reactions
- pool (Pool object) – pool object of multiprocessing
-
padmet.utils.exploration.prot2genome.
mp_runAnalysis
(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, pool, blastp, tblastn, exonerate, keep_tmp, debug)[source]¶ Run different blast analysis based on files representing specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.tsv):
- 1./ search for:
faa file of org_a (studied_organisms_folder/org_a/org_a.faa) gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk) faa file of org_b (studied_organisms_folder/org_b/org_b.faa) fna file of org_b (studied_organisms_folder/org_b/org_b.fna)
if fna doesn’t exist create it
2./ if output file (blast_analysis_folder/org_a_VS_org_b.tsv) doesn’t already exist run analysis 3./ extracts all genes ids from specific reaction file with fct extractGenes() 4./ Run blastp, tblastn, exonerate on gene_id.faa vs target.faa / fna with runAllAnalysis() 5./ Create analysis output The analysis create a lot of temp files, all are in tmp_folder wich is cleaned after all loop
Parameters: - spec_reactions_folder (str) – path to folder with all files containing specific reactions
- studied_organisms_folder (str) – path to folder with all data of studied organisms. The folder contains one subfolder per organism, named after the organism, each holding org.gbk, org.faa and org.fna
- output_folder (str) – path to output folder where to extract blast analysis
- tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
- pool (Pool object) – pool object of multiprocessing
- blastp (bool) – If true run blastp during analysis
- tblastn (bool) – If true run tblastn during analysis
- exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
- keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
- debug (bool) – if true, print all raw information of the analysis
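Step 1 of mp_runAnalysis relies on a folder layout that can be sketched as follows. The function expected_inputs and its dictionary keys are hypothetical names used for illustration; the sketch assumes organism names do not themselves contain "_vs_" and only builds paths, without checking that the files exist.

```python
from pathlib import Path


def expected_inputs(spec_reactions_file, studied_organisms_folder):
    """Derive the input paths for one specific-reaction file.

    Hypothetical helper following the layout from the docstring:
    studied_organisms_folder/<org>/<org>.{faa,gbk,fna}
    """
    stem = Path(spec_reactions_file).stem  # e.g. "org_a_vs_org_b"
    # Assumption: "_vs_" appears only once, as the separator.
    org_a, org_b = stem.split("_vs_", 1)
    root = Path(studied_organisms_folder)
    return {
        "query_faa": root / org_a / f"{org_a}.faa",
        "subject_gbk": root / org_b / f"{org_b}.gbk",
        "subject_faa": root / org_b / f"{org_b}.faa",
        "subject_fna": root / org_b / f"{org_b}.fna",
    }


paths = expected_inputs("spec_reactions/org_a_vs_org_b.tsv", "studied_organisms")
for role, path in paths.items():
    print(role, path.as_posix())
```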
-
padmet.utils.exploration.prot2genome.
runAllAnalysis
(dict_args)[source]¶ - For a given gene query id:
- 1/ Extract the sequence from query_faa and create the faa file output_folder/query_id.faa. If isoforms are found, also search for each specific isoform.
- 2/ If blastp is set, run blastp; if tblastn is set, run tblastn; if exonerate is set and tblastn has a hit, run exonerate. Run all of them and extract the output as a dict of data.
Returns: list of dict with all analysis output Return type: list
-
padmet.utils.exploration.prot2genome.
runBlastp
(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)[source]¶ Run blastp on query_seq_faa vs subject_faa and return the output fields listed in header. Uses NcbiblastpCommandline and extracts the first best hit based on bitscore.
Parameters: - query_seq_faa (str) – path to query fasta sequence
- subject_faa (str) – path to subject fasta sequence
- header (list) – output format of blastp
- debug (bool) – if true print all raw blastp output
Returns: dict of the best blastp hit, add ‘blastp_’ tag, or empty dict if no hit
Return type: dict
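The "first best hit based on bitscore" selection can be sketched on raw tab-separated output. This is an illustration under assumptions, not padmet's implementation: best_hit is a hypothetical helper, the input is assumed to hold one hit per line with columns in header order, and values are kept as strings.

```python
def best_hit(tabular_output, header=("sseqid", "evalue", "bitscore"), tag="blastp_"):
    """Return the highest-bitscore hit as a dict with tagged keys, or {} if no hit.

    Hypothetical helper: assumes tab-separated columns in `header` order.
    """
    hits = []
    for line in tabular_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        hits.append(dict(zip(header, line.split("\t"))))
    if not hits:
        return {}
    # max() keeps the first hit when several share the top bitscore.
    top = max(hits, key=lambda hit: float(hit["bitscore"]))
    return {tag + key: value for key, value in top.items()}


raw = "prot_1\t1e-50\t180.0\nprot_2\t1e-80\t250.5\n"
print(best_hit(raw))
print(best_hit(""))
```

Passing tag="tblastn_" and a header that also lists sstart and send gives the same selection for tblastn-style output.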
-
padmet.utils.exploration.prot2genome.
runExonerate
(query_seq_faa, sseq_seq_faa, output, debug=False)[source]¶ Run exonerate on query_seq_faa vs the subject sequence. Exonerate must be installed and the PATH environment variable updated with exonerate/bin/, so that the command ‘exonerate’ works from a shell. sseq_seq_faa is obtained after the tblastn run, based on the tblastn_sseqid value.
Parameters: - query_seq_faa (str) – path to query fasta sequence
- sseq_seq_faa (str) – path to subject faa sequence
- output (str) – path to exonerate output
- debug (bool) – if true print all raw exonerate output
Returns: dict of the best exonerate hit, add ‘exonerate_’ tag, or empty dict if no hit
Return type: dict
-
padmet.utils.exploration.prot2genome.
runSearchOnProteome
(proteome_orgA, genome_orgB, output_folder, proteome_orgB=None)[source]¶ From a proteome of OrgA search for missing structural annotation in genome of OrgB. First launch Blastp between proteome of OrgA and proteome of OrgB. Then launch tBlastn between proteome of OrgA and genome of OrgB to find matches. Use the best match to extract a region from the genome of OrgB. Then launch Exonerate on this region using the sequence of OrgA.
Parameters: - proteome_orgA (str) – path to fasta file of proteome of OrgA
- genome_orgB (str) – path to fasta file of genome of OrgB
- output_folder (str) – path to output folder
- proteome_orgB (str) – path to fasta file of proteome of OrgB
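The "extract a region from the genome" step can be sketched from the sstart and send coordinates of a tblastn hit. BLAST tabular coordinates are 1-based and inclusive, and sstart is greater than send for hits on the reverse strand. The helper hit_region and its flank parameter are assumptions made for illustration; how padmet actually delimits the region is not stated here.

```python
def hit_region(sequence, sstart, send, flank=0):
    """Slice the subject sequence around a hit given 1-based inclusive coordinates.

    Hypothetical helper: `flank` (extra bases on each side) is an assumption.
    The slice is returned in forward orientation, without reverse-complementing.
    """
    low, high = sorted((int(sstart), int(send)))  # handles reverse-strand hits
    start = max(low - 1 - flank, 0)               # 1-based inclusive to 0-based
    end = min(high + flank, len(sequence))        # clamp at the sequence end
    return sequence[start:end]


genome = "AAAACCCCGGGGTTTT"
print(hit_region(genome, 5, 8))           # CCCC
print(hit_region(genome, 8, 5, flank=2))  # AACCCCGG
```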
-
padmet.utils.exploration.prot2genome.
runTblastn
(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore', 'sstart', 'send'], debug=False)[source]¶ Run tblastn on query_seq_faa vs subject_fna and return the output fields listed in header. Uses NcbitblastnCommandline and extracts the first best hit based on bitscore.
Parameters: - query_seq_faa (str) – path to query fasta sequence
- subject_fna (str) – path to subject fna sequence
- header (list) – output format of tblastn
- debug (bool) – if true print all raw tblastn output
Returns: dict of the best tblastn hit, add ‘tblastn_’ tag, or empty dict if no hit
Return type: dict
visu_path¶
- Description:
- Visualize a pathway in a padmet network.
Color code:
- reactions associated to the pathway, present in the network: lightgreen
- reactions associated to the pathway, not present in the network: red
- compounds: skyblue
usage:
padmet visu_path --padmetSpec=FILE/FOLDER --padmetRef=FILE --pathway=ID --output=FILE [--hide-currency] [--level=STR]
options:
-h --help Show help.
--padmetSpec=FILE/FOLDER pathname to the PADMet file of the network or to a folder containing multiple padmets.
--padmetRef=FILE pathname to the PADMet file of the db of reference.
--pathway=ID pathway id to visualize, can be multiple pathways separated by a ",".
--output=FILE pathname to the output file (extension can be .png or .svg).
--hide-currency hide currency metabolites.
--level=STR level of precision for the visualization (compound or pathway). By default visualization uses "compound".
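The color code used by visu_path can be expressed as a small lookup. This is only a sketch of the rule stated in the description: node_color, its kind argument and the network_reactions set are hypothetical names, and no graph is drawn here.

```python
def node_color(node_id, kind, network_reactions):
    """Return the display color of a node following the visu_path color code.

    Hypothetical helper: `kind` is "compound" or "reaction", and
    `network_reactions` is the set of reaction ids present in the network.
    """
    if kind == "compound":
        return "skyblue"
    # Pathway reactions: lightgreen if present in the network, red otherwise.
    return "lightgreen" if node_id in network_reactions else "red"


in_network = {"rxn-1"}
print(node_color("rxn-1", "reaction", in_network))
print(node_color("rxn-2", "reaction", in_network))
print(node_color("cpd-1", "compound", in_network))
```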
-
padmet.utils.exploration.visu_path.
visu_path_compounds
(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file, hide_currency_metabolites=None)[source]¶ Extract reactions from pathway and create a compound/reaction graph.
Parameters: - padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
- padmet_ref_pathname (str) – pathname of the padmetRef file
- pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
- output_file (str) – pathname of the output picture (extension can be .png or .svg)
- hide_currency_metabolites (bool) – hide currency metabolites
-
padmet.utils.exploration.visu_path.
visu_path_pathways
(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file)[source]¶ Extract reactions from pathway and create a compound/reaction graph.
Parameters: - padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
- padmet_ref_pathname (str) – pathname of the padmetRef file
- pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
- output_file (str) – pathname of the output picture (extension can be .png or .svg)
- hide_compounds (bool) – hide common compounds (like water or proton)