Exploration

Description:

#TODO

compare_padmet

Description:

Compare 1-n padmet files and create an output folder with the following files:

genes.tsv:
    fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.tsv:
    fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
    line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
pathways.tsv:
    fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
compounds.tsv:
    fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
    line = [cpd-1, rxn-1, '', rxn-1, '']
usage:
    padmet compare_padmet --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [--cpu INT] [-v]

option:
    -h --help    Show help.
    --padmet=FILES/DIR    pathname of the padmet files, sep all files by ',', ex: /path/padmet1.padmet;/path/padmet2.padmet OR a folder
    --output=DIR    pathname of the output folder
    --padmetRef=FILE    pathname of the database ref in padmet
    --cpu INT    number of CPU to use in multiprocessing
padmet.utils.exploration.compare_padmet.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False, number_cpu=None)[source]

Compare 1-n padmet files and create an output folder with the following files:

genes.tsv:
    fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.tsv:
    fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
    line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
pathways.tsv:
    fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
compounds.tsv:
    fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
    line = [cpd-1, rxn-1, '', rxn-1, '']
Parameters:
  • padmet_path (str) – pathname of the padmet files, sep all files by ‘,’, ex: /path/padmet1.padmet;/path/padmet2.padmet OR a folder
  • output (str) – pathname of the output folder
  • padmetRef (padmet.classes.PadmetRef) – padmet containing the database of reference, needed to calculate the pathway completion rate
  • verbose (bool) – if True print information
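
A minimal sketch of how rows with the genes.tsv layout described above could be assembled. It does not use padmet: the input is a hypothetical in-memory dict {model name: {gene id: [reaction ids]}}, the helper name gene_rows is invented for illustration, and writing '0' for an absent gene is an assumption (the description only states that '1' marks presence).

```python
import csv
import io


def gene_rows(models):
    """Build genes.tsv-style rows from {model_name: {gene_id: [reaction_ids]}}.

    Hypothetical helper for illustration, not part of padmet.
    """
    names = sorted(models)
    header = ["gene"] + names + [f"{name}_rxn_assoc" for name in names]
    all_genes = sorted({gene for genes in models.values() for gene in genes})
    rows = [header]
    for gene in all_genes:
        # Assumption: absence is written as "0"; the source only documents "1".
        presence = ["1" if gene in models[name] else "0" for name in names]
        # Reactions associated to the gene in each model, joined with ";".
        assoc = [";".join(models[name].get(gene, [])) for name in names]
        rows.append([gene] + presence + assoc)
    return rows


models = {
    "padmet_a": {"gene-a": ["rxn-1", "rxn-2"]},
    "padmet_b": {"gene-a": ["rxn-2"], "gene-b": ["rxn-3"]},
}
rows = gene_rows(models)

# Serialize as tab-separated text, one line per gene.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

The same pattern (one presence column and one association column per compared file) would apply to the reactions.tsv layout.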
padmet.utils.exploration.compare_padmet.compare_padmet_cli(command_args)[source]
padmet.utils.exploration.compare_padmet.extract_information_padmet(file_path, padmetRef, verbose)[source]
padmet.utils.exploration.compare_padmet.merge_dicts(element_dict, tmp_dict)[source]

compare_sbml

Description:

Compare reactions in two or more sbml files.

Reports whether a reaction is missing, and whether a reaction with the same id uses different species or a different reversibility.

usage:
    padmet compare_sbml --sbml=FILES/DIR --output=DIR

option:
    -h --help    Show help.
    --sbml FILES/DIR    pathname of the sbml files, sep all files by ',', ex: /path/sbml1.sbml;/path/sbml2.sbml OR a folder
    --output DIR    pathname of the output folder
padmet.utils.exploration.compare_sbml.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.compare_sbml.compare_multiple_sbml(sbml_path, output_folder)[source]

Compare 1-n sbml, create two output files reactions.tsv and metabolites.tsv with the reactions/metabolites in each sbml

Parameters:
  • sbml_path (str) – path to a folder containing sbmls or multiple sbml paths separated by a ‘,’
  • output_folder (str) – path to the output folder
padmet.utils.exploration.compare_sbml.compare_rxn(rxn1, rxn2)[source]

Compare two cobra reaction objects and return (same_cpd, same_rev).

same_cpd: bool, True if the same compounds are consumed and produced.
same_rev: bool, True if both reactions have the same direction (reversible or not).

Parameters:
  • rxn1 (cobra.model.reaction) – reaction as cobra object
  • rxn2 (cobra.model.reaction) – reaction as cobra object
Returns:

(same_cpd (bool), same_rev (bool))

Return type:

tuple
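
The comparison can be sketched without cobra. In the sketch below, Rxn is a hypothetical stand-in for a reaction object (it is not the cobra class), and compare_rxn_sketch is an illustrative name; the actual function may compare reactions differently.

```python
from dataclasses import dataclass, field


@dataclass
class Rxn:
    """Hypothetical stand-in for a reaction; not the cobra Reaction class."""
    id: str
    # compound id -> coefficient (negative: consumed, positive: produced)
    stoichiometry: dict = field(default_factory=dict)
    reversible: bool = False


def compare_rxn_sketch(rxn1, rxn2):
    """Return (same_cpd, same_rev) for two Rxn objects."""
    def consumed(rxn):
        return {cpd for cpd, coef in rxn.stoichiometry.items() if coef < 0}

    def produced(rxn):
        return {cpd for cpd, coef in rxn.stoichiometry.items() if coef > 0}

    same_cpd = consumed(rxn1) == consumed(rxn2) and produced(rxn1) == produced(rxn2)
    same_rev = rxn1.reversible == rxn2.reversible
    return same_cpd, same_rev


rxn_a = Rxn("rxn-1", {"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}, reversible=False)
rxn_b = Rxn("rxn-1", {"cpd-1": -1, "cpd-2": -1, "cpd-3": 1}, reversible=True)
result = compare_rxn_sketch(rxn_a, rxn_b)
print(result)
```

Here the two reactions use the same compounds but differ in reversibility, so only the second flag is False.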

padmet.utils.exploration.compare_sbml.compare_sbml(sbml1_path, sbml2_path)[source]

Compare 2 sbml, print nb of metabolites and reactions. If reaction missing print reaction id, and reaction formula.

Parameters:
  • sbml1_path (str) – path to the first sbml file to compare
  • sbml2_path (str) – path to the second sbml file to compare
padmet.utils.exploration.compare_sbml.compare_sbml_cli(command_args)[source]

compare_sbml_padmet

Description:
compare reactions in sbml and padmet file
usage:
    padmet compare_sbml_padmet --padmet=FILE --sbml=FILE

option:
    -h --help    Show help.
    --padmet=FILE    path of the padmet file
    --sbml=FILE    path of the sbml file
padmet.utils.exploration.compare_sbml_padmet.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)[source]

compare reactions ids in sbml vs padmet, return nb of reactions in both and reactions id not in sbml or not in padmet

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to compare
  • sbml_document (libsbml.document) – sbml document
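
The core of such a comparison is a set operation on reaction ids. The sketch below assumes the ids have already been extracted from both files and decoded to the same naming scheme; compare_reaction_ids and its dict keys are hypothetical names, not the padmet API.

```python
def compare_reaction_ids(sbml_ids, padmet_ids):
    """Compare two collections of reaction ids.

    Hypothetical helper: returns the number of shared ids and the ids
    found on one side only.
    """
    sbml_ids, padmet_ids = set(sbml_ids), set(padmet_ids)
    return {
        "in_both": len(sbml_ids & padmet_ids),
        "not_in_sbml": sorted(padmet_ids - sbml_ids),
        "not_in_padmet": sorted(sbml_ids - padmet_ids),
    }


report = compare_reaction_ids(
    sbml_ids=["rxn-1", "rxn-2", "rxn-3"],
    padmet_ids=["rxn-2", "rxn-3", "rxn-4"],
)
print(report)
```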
padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet_cli(command_args)[source]

convert_sbml_db

Description:

This tool uses the MetaNetX database to check or convert a sbml. Flat files from MetaNetX are required to run this tool. They can be found in the aureme workflow or on the MetaNetX website. To use the tool, set:

mnx_folder= the path to a folder containing the MetaNetX flat files. The files must be named ‘reac_xref.tsv’ and ‘chem_xref.tsv’. Alternatively, set the path of each flat file manually with:

mnx_reac= path to the flat file for reactions

mnx_chem= path to the flat file for chemical compounds (species)

To check the database used in a sbml:
    to check all elements of the sbml (reactions and species), set:
        to-map=all
    to check only the reactions of the sbml, set:
        to-map=reaction
    to check only the species of the sbml, set:
        to-map=species
To map a sbml and obtain a file mapping its ids to a given database, set:
    to-map:
        as previously explained
    db_out:
        the name of the target database: [‘metacyc’, ‘bigg’, ‘kegg’] only
    output:
        the path to the output file

For a given sbml using a specific database, return a dictionary of mapping.

The output is a file where each line = reaction id (or species id) in the sbml, reaction id (or species id) in the db_out database.

ex:
For a sbml based on the kegg database, with db_out=metacyc, the output file will contain for example:

R02283 ACETYLORNTRANSAM-RXN
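
A rough sketch of the mapping idea, using a pivot identifier to go from one database to another. Everything here is an assumption made for illustration: the xref dict is a hypothetical in-memory structure (it does not reproduce the layout of the MetaNetX flat files), ‘MNXR_EXAMPLE’ is a placeholder and not a real MetaNetX id, and map_ids is not a padmet function.

```python
def map_ids(sbml_ids, xref, db_out):
    """Map ids to db_out through a pivot id.

    xref is a hypothetical structure: {pivot_id: {db_name: [ids]}}.
    """
    # Reverse index: any known external id -> pivot id.
    to_pivot = {}
    for pivot_id, ids_by_db in xref.items():
        for ids in ids_by_db.values():
            for external_id in ids:
                to_pivot[external_id] = pivot_id

    mapping = {}
    for sbml_id in sbml_ids:
        pivot_id = to_pivot.get(sbml_id)
        if pivot_id is None:
            continue  # id unknown in the cross-reference table
        targets = xref[pivot_id].get(db_out, [])
        if targets:
            # Assumption: keep the first candidate when several exist.
            mapping[sbml_id] = targets[0]
    return mapping


xref = {
    "MNXR_EXAMPLE": {  # placeholder pivot id
        "kegg": ["R02283"],
        "metacyc": ["ACETYLORNTRANSAM-RXN"],
    },
}
mapping = map_ids(["R02283", "R99999"], xref, "metacyc")

# One tab-separated line per mapped id, as in the output file described above.
for source_id, target_id in mapping.items():
    print(f"{source_id}\t{target_id}")
```

Ids with no match (R99999 in this example) are simply left out of the mapping.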

usage:
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]

options:
    -h --help     Show help.
    --to-map=STR     select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
    --mnx_reac=FILE     path to the MetaNetX file for reactions
    --mnx_chem=FILE     path to the MetaNetX file for compounds
    --sbml=FILE     path to the sbml file to convert
    --output=FILE     path to the file containing the mapping, sep = "\t"
    --db_out=ID     id of the output database in ["BIGG","METACYC","KEGG"]
    -v     verbose.
padmet.utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]

Check sbml database of a given sbml.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
  • verbose (bool) – if true: more info during process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet.utils.exploration.convert_sbml_db.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.convert_sbml_db.convert_sbml_db_cli()[source]
padmet.utils.exploration.convert_sbml_db.get_from_mnx(mnx_dict, element_id, db_out)[source]

#TODO

padmet.utils.exploration.convert_sbml_db.intern_mapping(id_to_map, db_out, _type)[source]

#TODO

padmet.utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]

map a sbml and obtain a file of mapping ids to a given database.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – select the part of the sbml to check must be in [‘all’, ‘reaction’, ‘species’]
  • db_out (str) – the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
  • output (str) – path to the file containing the mapping, sep = "\t"
  • verbose (bool) – if true: more info during process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet.utils.exploration.convert_sbml_db.mnx_reader(input_file, db_out)[source]

#TODO

dendrogram_reactions_distance

Description:

Use reactions.tsv file from compare_padmet.py to create a dendrogram using a Jaccard distance.

From the absence/presence matrix of reactions in different species, compute a Jaccard distance between these species. Apply a hierarchical clustering with complete linkage to these data, then create a dendrogram. Also apply intervene to create an upset graph of the data.
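
For reference, the Jaccard distance between two species is 1 - |A ∩ B| / |A ∪ B|, where A and B are their sets of reactions. The sketch below computes it in pure Python on a toy presence table; the organism and reaction names are made up, and returning 0.0 for two empty sets is a convention chosen here, not something stated by the tool.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two collections of reaction ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return 1.0 - len(a & b) / len(union)


# Toy data: reactions present in each organism.
presence = {
    "org_a": {"rxn-1", "rxn-2", "rxn-3"},
    "org_b": {"rxn-2", "rxn-3", "rxn-4"},
    "org_c": {"rxn-5"},
}

# Pairwise distances, the input of a hierarchical clustering.
distances = {
    (x, y): jaccard_distance(presence[x], presence[y])
    for x, y in combinations(sorted(presence), 2)
}
for pair, distance in distances.items():
    print(pair, distance)
```

org_a and org_b share two of four reactions (distance 0.5), while org_c shares nothing with the others (distance 1.0), so a clustering would group org_a and org_b first.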

usage:
    padmet dendrogram_reactions_distance --reactions=FILE --output=FOLDER [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]

option:
    -h --help    Show help.
    --reactions=FILE    pathname of the file containing reactions in each species of the comparison.
    --output=FOLDER    path to the output folder.
    --pvclust    launch pvclust dendrogram using R
    --padmetRef=STR    path to the padmet Ref file
    -u --upset=INT    number of cluster in the upset graph.
    -v    verbose mode.
padmet.utils.exploration.dendrogram_reactions_distance.absent_and_specific_reactions(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)[source]

Compare all cluster one against another.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • output_folder_tree_cluster (str) – path to output tree cluster folder
  • output_folder_specific (str) – path to output folder with specific reactions for each species
  • output_folder_absent (str) – path to output folder with absent reactions for each species
  • organisms (list) – organisms names
padmet.utils.exploration.dendrogram_reactions_distance.add_dendrogram_node_label(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)[source]

Using cluster nodes, add label and reactions number on each node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/43519473

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • node_list (list) – cluster nodes
  • reactions_clust (dictionary) – reactions in each cluster of the tree
  • len_longest_cluster_id (int) – length of the longest cluster id
padmet.utils.exploration.dendrogram_reactions_distance.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.dendrogram_reactions_distance.comparison_cluster(reactions_clust, output_folder_comparison)[source]

Compare all cluster one against another.

Parameters:
  • reactions_clust (dictionary) – reactions in each cluster of the tree
  • output_folder_comparison (str) – path to output folder
padmet.utils.exploration.dendrogram_reactions_distance.create_cluster(reactions_dataframe, absence_presence_matrix, linkage_matrix)[source]

Cut the dendrogram to create clusters.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
  • linkage_matrix (ndarray) – linkage matrix
Returns:

dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intersection_files(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)[source]

Create intersection files.

Parameters:
  • root (root) – root of the xml tree
  • cluster_leaf_species (dictionary) – for each leaf give the organisms in it
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • output_folder_tree_cluster (str) – path to the output folder
  • metacyc_to_ecs (dictionary) – mapping of metayc reaction to EC number
Returns:

reactions_clust – reactions in each cluster of the tree

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intervene_graph(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]

Create an upset graph. Deprecated function: supervenn is now used instead, see the create_supervenn function.

Parameters:
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • temp_data_folder (str) – temporary data folder
  • path_to_intervene (str) – path to intervene bin
  • output_folder_upset (str) – path to output folder
  • dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
  • k (int) – number of cluster to create
padmet.utils.exploration.dendrogram_reactions_distance.create_pvclust_dendrogram(reaction_file, output_folder)[source]
padmet.utils.exploration.dendrogram_reactions_distance.create_supervenn(absence_presence_matrix, reactions_dataframe, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]

Create a supervenn graph.

Parameters:
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • output_folder_upset (str) – path to output folder
  • dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
  • k (int) – number of cluster to create
padmet.utils.exploration.dendrogram_reactions_distance.dendrogram_reactions_distance_cli(command_args)[source]
padmet.utils.exploration.dendrogram_reactions_distance.getNewick(node, newick, parentdist, leaf_names)[source]

Create a newick file from the root node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/31878514.

Parameters:
  • node (scipy.cluster.hierarchy.ClusterNode) – root ClusterNode of the scipy tree
  • newick (str) – newick string
  • parentdist (str) – root ClusterNode distance from the linkage matrix
  • leaf_names (list) – list of organism names
padmet.utils.exploration.dendrogram_reactions_distance.hclust_to_xml(linkage_matrix)[source]

Using a distance matrix from scipy linkage, create a xml tree corresponding to the hierarchical clustering. Return the root of the tree.

Parameters: linkage_matrix (ndarray) – linkage matrix
Returns: root of the xml tree
Return type: root
padmet.utils.exploration.dendrogram_reactions_distance.pvclust_dendrogram(reactions_dataframe, organisms, output_folder)[source]

Using a distance matrix, create a dendrogram with bootstrap values with the pvclust R package (through the rpy2 package).

Parameters:
  • reactions_dataframe (pandas.DataFrame) – Reactions absence/presence matrix
  • organisms (list) – organisms names
  • output_folder (str) – path to the output folder
padmet.utils.exploration.dendrogram_reactions_distance.reaction_figure_creation(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)[source]

Create dendrogram, upset figure (if upset argument) and compare reactions in species.

Parameters:
  • reaction_file (str) – path to reaction file
  • upset_cluster (int) – the number of cluster you want in the intervene figure
  • output_folder (str) – path to output folder
  • padmetRef_file (str) – path to padmet ref file
  • pvclust (bool) – boolean to launch or not R pvclust dendrogram

flux_analysis

Description:

1./ Run flux balance analysis with the cobra package on an already defined reaction. The value ‘objective_coefficient’ of that reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. A reactant with a flux of ‘0’ is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as sbml files containing only compounds, will also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.

3./ If --all_species: will test the flux reachability of all the compounds in the metabolic network (may take several minutes).

usage:
    padmet flux_analysis --sbml=FILE
    padmet flux_analysis --sbml=FILE --seeds=FILE --targets=FILE [--all_species]
    padmet flux_analysis --sbml=FILE --all_species

option:
    -h --help    Show help.
    --sbml=FILE    pathname to the sbml file to test for fba and fva.
    --seeds=FILE    pathname to the sbml file containing the seeds (medium).
    --targets=FILE    pathname to the sbml file containing the targets.
    --all_species    allow to make FBA on all the metabolites of the given model.
padmet.utils.exploration.flux_analysis.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.flux_analysis.fba_on_targets(allspecies, model)[source]

For each species in allspecies, create an objective function with the current species as only product, then try to optimize the model and get the flux.

Parameters:
  • allspecies (list) – list of species ids to test
  • model (cobra.model) – Cobra model from a sbml file
padmet.utils.exploration.flux_analysis.flux_analysis(sbml_file, seeds_file=None, targets_file=None, all_species=False)[source]

1./ Run flux balance analysis with the cobra package on an already defined reaction. The value ‘objective_coefficient’ of that reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. A reactant with a flux of ‘0’ is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as sbml files containing only compounds, will also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.

3./ If --all_species: will test the flux reachability of all the compounds in the metabolic network (may take several minutes).

Parameters:
  • sbml_file (str) – path to sbml file to analyse
  • seeds_file (str) – path to sbml file with only compounds representing the seeds/growth medium
  • targets_file (str) – path to sbml file with only compounds representing the targets to reach
  • all_species (bool) – if True will try to create obj function for each compound and return which are reachable by flux.
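
Topological reachability (point 2./ above) can be illustrated with a simple forward propagation: a reaction can fire once all its reactants are available, and its products then become available. This is only a pure-Python sketch of the general idea, under the assumptions that reactions are irreversible and given as a hypothetical dict {reaction id: (reactants, products)}; it is not how the Menetools library is implemented or called.

```python
def reachable_compounds(reactions, seeds):
    """Return the set of compounds producible from the seeds.

    reactions: hypothetical {rxn_id: (reactants, products)}, irreversible.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires when every reactant is already in the scope.
            if set(reactants) <= scope and not set(products) <= scope:
                scope |= set(products)
                changed = True
    return scope


reactions = {
    "rxn-1": (["cpd-1", "cpd-2"], ["cpd-3"]),
    "rxn-2": (["cpd-3"], ["cpd-4"]),
    "rxn-3": (["cpd-5"], ["cpd-6"]),
}
scope = reachable_compounds(reactions, seeds=["cpd-1", "cpd-2"])
targets = {"cpd-4", "cpd-6"}
unreachable = sorted(targets - scope)
print(sorted(scope), unreachable)
```

In this toy network cpd-4 is reachable through rxn-1 then rxn-2, while cpd-6 is not, because cpd-5 is neither a seed nor produced by any reaction.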
padmet.utils.exploration.flux_analysis.flux_analysis_cli(command_args)[source]

get_pwy_from_rxn

Description:
From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a in pwy-x => return: pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
usage:
    padmet get_pwy_from_rxn --reaction_file=FILE --padmetRef=FILE  --output=FILE

options:
    -h --help     Show help.
    --reaction_file=FILE    pathname of the file containing the reactions id, 1/line
    --padmetRef=FILE    pathname of the padmet representing the database.
    --output=FILE    pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "\t"
padmet.utils.exploration.get_pwy_from_rxn.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)[source]

Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()

Parameters:
  • dict_pwy (dict) – dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}
  • output (str) – path to output file
padmet.utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)[source]

Extract from padmet the pathways containing 1-n reactions from a set of reactions ‘reactions’. Return a dict of data: k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to update
  • reactions (set) – set of reactions to match with pathways
Returns:

dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x:{‘total_rxn’:[a,b,c], rxn_from_list:[a], ratio:1/3}}

Return type:

dict
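
A minimal sketch of the returned structure, assuming the pathway content is already available as a plain dict {pathway id: reaction ids} (a hypothetical input, whereas the real function reads it from a padmet object). Storing the ratio as a float is also an assumption; the description only shows ‘1/3’.

```python
def pathways_for_reactions(pathway_to_rxns, reactions):
    """Return {pathway_id: {'total_rxn', 'rxn_from_list', 'ratio'}}.

    Hypothetical helper: only pathways containing at least one reaction
    of the list are kept.
    """
    wanted = set(reactions)
    result = {}
    for pwy_id, rxns in pathway_to_rxns.items():
        total = set(rxns)
        found = total & wanted
        if found:
            result[pwy_id] = {
                "total_rxn": sorted(total),
                "rxn_from_list": sorted(found),
                # Assumption: ratio stored as a float.
                "ratio": len(found) / len(total),
            }
    return result


pathway_to_rxns = {
    "pwy-x": ["a", "b", "c"],
    "pwy-y": ["d", "e"],
}
dict_pwy = pathways_for_reactions(pathway_to_rxns, reactions={"a"})
print(dict_pwy)
```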

padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn(padmet, reaction_file, output)[source]
padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn_cli(command_args)[source]

padmet_stats

Description:

Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.

The input is a padmet file or a folder containing multiple padmets.

Create a tsv file named padmet_stats.tsv where the script has been launched.

usage:
    padmet padmet_stats --padmet=FILE --output=FOLDER

option:
    -h --help    Show help.
    -p --padmet=FILE    padmet file or folder containing padmet(s).
    -o --output=FOLDER    path to output folder.
padmet.utils.exploration.padmet_stats.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.padmet_stats.compute_stats(padmet_file_folder, output_folder)[source]

Count reactions/pathways/compounds/genes in padmet(s).

Parameters:
  • padmet_file_folder (str) – path to the padmet file/folder to analyze
  • output_folder (str) – path to the output folder
padmet.utils.exploration.padmet_stats.orthology_result(padmet_file, padmet_names)[source]

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:
  • padmet_file (str) – path to a padmet file
  • padmet_names (list) – all the padmet filenames
Returns:

Number of reactions given by the other species

Return type:

dictionary

padmet.utils.exploration.padmet_stats.padmet_stat(padmet_file)[source]

Count reactions/pathways/compounds/genes in a padmet file.

Parameters: padmet_file (str) – path to a padmet file
Returns: [path to padmet, number of pathways, number of reactions, number of genes, number of compounds]
Return type: list
padmet.utils.exploration.padmet_stats.padmet_stats_cli(command_args)[source]

prot2genome

Description:
Prot2Genome contains functions used for blast analysis and padmet enrichment
usage:

    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE --exonerate=PATH [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --padmet=FOLDER --output=FOLDER
    padmet prot2genome --studied_organisms=FOLDER --output=FOLDER
    padmet prot2genome --run=FOLDER --padmetRef=FILE [--cpu=INT] [debug]

From aucome run fromAucome():
    1. Extract specific reactions in spec_reactions folder with extractReactions()
    2. Extract genes from spec_reactions files with extractGenes()
    3. Run tblastn + exonerate with runAllAnalysis()

options:
    --query_faa=FILE    #TODO.
    --query_ids=FILE/STR    #TODO.
    --subject_gbk=FILE    #TODO.
    --subject_fna=FILE    #TODO.
    --subject_faa=FILE    #TODO.
    --output_folder=FILE    #TODO.
    --cpu=INT    Number of cpu to use for the multiprocessing (if none use 1 cpu). [default: 1]
    blastp    #TODO.
    tblastn    #TODO.
    debug    #TODO.
padmet.utils.exploration.prot2genome.analysisOutput(analysis_result, analysis_output)[source]
padmet.utils.exploration.prot2genome.cleanTmp(tmp_folder)[source]

Remove all files from tmp folder

Parameters: tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
padmet.utils.exploration.prot2genome.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.prot2genome.createPadmet(dict_args)[source]

Function used in mp_createPadmet by each worker of the Pool. The padmet files are updated using the function add_delete_rxn from padmet.utils.connection.manual_curation.

padmet.utils.exploration.prot2genome.createSeqFromTblastn(subject_fna, sseq_seq_faa, exonerate_target_id, start_match, end_match)[source]

Use the result from the tBlastn to extract a region from the subject genome. The region extracted corresponds to the match region and 10kb before and 10kb after.

Parameters:
  • subject_fna (str) – path to subject fasta sequence (genome)
  • sseq_seq_faa (str) – path to output fasta sequence
  • exonerate_target_id (str) – ID of the contig/scaffold/chromosome where a match has been found
  • start_match (int) – start of the match
  • end_match (int) – end of the match
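
The region selection can be sketched as a window computation. The sketch assumes 1-based inclusive coordinates, clamps the window to the contig boundaries, and accepts a match given in reverse order (start greater than end); none of these choices is stated in the description above, and both helper names are hypothetical.

```python
def flanked_window(start_match, end_match, contig_length, flank=10_000):
    """Return (start, end) of the match extended by `flank` on each side.

    Assumptions: 1-based inclusive coordinates, window clamped to the contig.
    """
    low, high = sorted((start_match, end_match))
    return max(1, low - flank), min(contig_length, high + flank)


def extract_region(sequence, start_match, end_match, flank=10_000):
    """Slice the flanked window out of a sequence string."""
    start, end = flanked_window(start_match, end_match, len(sequence), flank)
    return sequence[start - 1:end]  # convert to 0-based, end-exclusive slice


print(flanked_window(15_000, 16_000, contig_length=100_000))
print(flanked_window(500, 1_200, contig_length=5_000))
print(extract_region("ACGTACGTAC", 4, 6, flank=2))
```

The second call shows the clamping: a match close to both ends of a short contig yields the whole contig.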
padmet.utils.exploration.prot2genome.extractAnalysis(blast_analysis_folder, spec_reactions_folder, output_folder)[source]

For each analysis output in the blast analysis folder, obtained with runAllAnalysis():

1./ Extract orthologue hits
2./ For each specific reaction from spec_reactions_folder, if all genes of a reaction got an ortho hit, add the reaction to reactions_to_add

Parameters:
  • blast_analysis_folder (str) – path folder with all blast analysis output files
  • spec_reactions_folder (str) – path folder with all files containing specific reactions
  • output_folder (str) – path folder where to extract all reactions to add
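
The selection rule of step 2./ (keep a reaction only if all its genes got a hit) can be sketched with plain sets. The input structures and the name select_reactions_to_add are hypothetical; skipping reactions without any associated gene is an assumption of this sketch.

```python
def select_reactions_to_add(reaction_genes, genes_with_hit):
    """Return the reactions whose genes all have an orthologue hit.

    reaction_genes: hypothetical {reaction_id: [gene_ids]}.
    """
    hits = set(genes_with_hit)
    return sorted(
        rxn_id
        for rxn_id, genes in reaction_genes.items()
        # Assumption: a reaction without genes is never selected.
        if genes and set(genes) <= hits
    )


reaction_genes = {
    "rxn-1": ["gene-a", "gene-b"],
    "rxn-2": ["gene-a", "gene-c"],
    "rxn-3": [],
}
selected = select_reactions_to_add(reaction_genes, genes_with_hit=["gene-a", "gene-b"])
print(selected)
```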
padmet.utils.exploration.prot2genome.extractGenes(reactions_file)[source]

Extract genes ids and return a list from reactions_file obtained with extractReactions()

Parameters: reactions_file (str) – path to reaction file
padmet.utils.exploration.prot2genome.extractReactions(dict_args)[source]

Function used in mp_extractReactions by each worker of the Pool. For org_a.padmet and org_b.padmet:

1./ extract reactions and specific reactions (not in a, not in b)
2./ extract genes associated to specific reactions
3./ select only reactions that come from annotation. ex: rxn-1 in org_a but not in org_b, if rxn-1 doesn’t come from org_a annotation, skip the reaction
4./ create output file: header = ["reaction_id", "genes_ids", "sources"]

padmet.utils.exploration.prot2genome.extract_sequence(exonerate_output, exonerate_sequence)[source]

Extract protein sequence from exonerate output.

padmet.utils.exploration.prot2genome.fromAucome(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, keep_tmp=False, debug=False)[source]

This function fits an AuCoMe run. Select an aucome run folder and the function will:

1./ For each couple of studied organisms, extract specific reactions. ex: For org A and org B, extract reactions in org A but not in org B and vice versa
2./ Then for each specific reaction, extract the associated genes and run blastp, tblastn and exonerate
3./ For each reaction, for all associated genes, if there is no blastp match but a tblastn and exonerate hit, select the reaction as a hit
4./ Create a new padmet file with the new reactions to add

Parameters:
  • run_folder (str) – path to aucome run folder
  • cpu (int) – number of cpu to use for multiprocessing steps
  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
  • blastp (bool) – If true run blastp during analysis
  • tblastn (bool) – If true run tblastn during analysis
  • exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
  • keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
  • debug (bool) – if true, print all raw information of the analysis
padmet.utils.exploration.prot2genome.mp_createPadmet(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, pool, verbose=False)[source]

Update all padmet files in padmet_folder with the reactions to add from the files in reactions_to_add_folder. The information about the reactions is extracted from padmetRef as unique source.

ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.tsv, add each reaction listed in this file based on padmetRef to create output_folder/org_a.padmet

The padmet files are created in multiprocess: the more cpu, the faster the new padmet files are created.

Parameters:
  • reactions_to_add_folder (str) – path to folder with all files containing the reactions to add for each studied organism
  • padmet_folder (str) – path to folder with all padmet files of studied organism
  • output_folder (str) – path to output folder where to create new padmet files
  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
  • pool (Pool object) – pool object of multiprocessing
  • verbose (bool) – verbose
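
The file matching described for mp_createPadmet can be sketched as a small path helper. This is an illustration only: plan_padmet_update is a hypothetical name, and the sketch assumes the organism name is simply the padmet file name without its extension.

```python
from pathlib import PurePosixPath


def plan_padmet_update(padmet_file, reactions_to_add_folder, output_folder):
    """Return (reactions_file, new_padmet_file) for one padmet file."""
    padmet_file = PurePosixPath(padmet_file)
    # 'org_a' for 'padmet_folder/org_a.padmet'
    org = padmet_file.stem
    reactions_file = PurePosixPath(reactions_to_add_folder) / f"{org}.tsv"
    new_padmet_file = PurePosixPath(output_folder) / padmet_file.name
    return str(reactions_file), str(new_padmet_file)
```

Calling it with 'padmet_folder/org_a.padmet' gives the reactions file and output path of the example above.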
padmet.utils.exploration.prot2genome.mp_extractReactions(padmet_folder, output_folder, pool)[source]

From a folder of padmet files, build every pairwise combination of organisms, in both directions, and extract the specific reactions of each combination into a file in output_folder.

ex: with org_a.padmet and org_b.padmet in padmet_folder, create output_folder/org_a_vs_org_b.tsv and output_folder/org_b_vs_org_a.tsv.

Parameters:
  • padmet_folder (str) – path to folder with all padmet files of studied organism
  • output_folder (str) – path to output folder where to extract specific reactions
  • pool (Pool object) – pool object of multiprocessing
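
The naming of the pairwise output files can be sketched with itertools.permutations. This is a minimal sketch under the assumption that each organism name is the padmet file name without the '.padmet' extension; pairwise_outputs is a hypothetical helper, not part of padmet.

```python
from itertools import permutations


def pairwise_outputs(padmet_files):
    """List the '<org_x>_vs_<org_y>.tsv' names for every ordered pair of organisms."""
    suffix = ".padmet"
    orgs = sorted(
        name[: -len(suffix)] for name in padmet_files if name.endswith(suffix)
    )
    # permutations() yields both (a, b) and (b, a), matching the example above
    return [f"{a}_vs_{b}.tsv" for a, b in permutations(orgs, 2)]
```

For n organisms this gives n * (n - 1) files, so the number of comparisons grows quadratically with the number of padmet files.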
padmet.utils.exploration.prot2genome.mp_runAnalysis(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, pool, blastp, tblastn, exonerate, keep_tmp, debug)[source]

Run the different blast analyses based on files listing the specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.tsv):

1./ Search for:

faa file of org_a (studied_organisms_folder/org_a/org_a.faa)
gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk)
faa file of org_b (studied_organisms_folder/org_b/org_b.faa)
fna file of org_b (studied_organisms_folder/org_b/org_b.fna)

If the fna file doesn’t exist, create it.

2./ If the output file (blast_analysis_folder/org_a_VS_org_b.tsv) doesn’t already exist, run the analysis.

3./ Extract all gene ids from the specific reaction file with the function extractGenes().

4./ Run blastp, tblastn and exonerate on gene_id.faa vs target.faa / fna with runAllAnalysis().

5./ Create the analysis output.

The analysis creates many temporary files. All of them are written to tmp_folder, which is cleaned once all loops are done.

Parameters:
  • spec_reactions_folder (str) – path to folder with all files containing specific reactions
  • studied_organisms_folder (str) – path to folder with all data of studied organisms. Folder contains 1 folder by org with name as org name, in each: org.gbk,org.faa,org.fna
  • output_folder (str) – path to output folder where to extract blast analysis
  • tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
  • pool (Pool object) – pool object of multiprocessing
  • blastp (bool) – If true run blastp during analysis
  • tblastn (bool) – If true run tblastn during analysis
  • exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
  • keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
  • debug (bool) – if true, print all raw information of the analysis
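
The input files searched in step 1./ follow from the name of the specific reaction file. The helper below is a sketch of that lookup, not code from padmet: expected_inputs and its dict keys are hypothetical, and it assumes organism names never contain '_vs_'.

```python
from pathlib import PurePosixPath


def expected_inputs(spec_reactions_file, studied_organisms_folder):
    """Map 'org_a_vs_org_b.tsv' to the faa/gbk/fna paths used by the analysis."""
    name = PurePosixPath(spec_reactions_file).stem  # 'org_a_vs_org_b'
    org_a, org_b = name.split("_vs_", 1)
    root = PurePosixPath(studied_organisms_folder)
    return {
        "query_faa": str(root / org_a / f"{org_a}.faa"),
        "subject_gbk": str(root / org_b / f"{org_b}.gbk"),
        "subject_faa": str(root / org_b / f"{org_b}.faa"),
        "subject_fna": str(root / org_b / f"{org_b}.fna"),
    }
```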
padmet.utils.exploration.prot2genome.prot2genome_cli(command_args)[source]
padmet.utils.exploration.prot2genome.runAllAnalysis(dict_args)[source]

For a given gene query id:

1/ Extract the sequence from query_faa and create a faa file output_folder/query_id.faa. If isoforms are found, also search for each specific isoform.

2/ If blastp is set, run blastp; if tblastn is set, run tblastn; if exonerate is set and tblastn has a hit, run exonerate. Extract the output of each analysis as a dict of data.

Returns:

list of dict with all analysis output

Return type:

list

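
The conditional chaining of the three tools can be sketched as follows. This is not the body of runAllAnalysis: the three runner callables are injected as hypothetical stand-ins for runBlastp, runTblastn and runExonerate, and the sketch returns a single merged dict for one query sequence rather than the list of dict documented above.

```python
def run_selected_analyses(blastp, tblastn, exonerate,
                          run_blastp, run_tblastn, run_exonerate):
    """Run the enabled analyses; exonerate only runs when tblastn found a hit."""
    result = {}
    if blastp:
        result.update(run_blastp())
    if tblastn:
        tblastn_hit = run_tblastn()  # empty dict when there is no hit
        result.update(tblastn_hit)
        if exonerate and tblastn_hit:
            # exonerate needs the subject sequence found by tblastn
            result.update(run_exonerate(tblastn_hit))
    return result
```

Nesting the exonerate call under the tblastn branch reflects the documented constraint that exonerate requires tblastn to be enabled.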
padmet.utils.exploration.prot2genome.runBlastp(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)[source]

Run blastp of query_seq_faa against subject_faa and return the output fields listed in header. Uses the NcbiblastpCommandline function and extracts the first best hit based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • subject_faa (str) – path to subject fasta sequence
  • header (list) – output format of blastp
  • debug (bool) – if true print all raw blastp output
Returns:

dict of the best blastp hit, add ‘blastp_’ tag, or empty dict if no hit

Return type:

dict
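
Selecting the best hit by bitscore and tagging its keys can be sketched as below. This is a minimal sketch that assumes the raw output is tab-separated text with one hit per line and columns in the order of header; how runBlastp actually parses the output is not stated here, and best_hit is a hypothetical name.

```python
def best_hit(tabular_output, header=("sseqid", "evalue", "bitscore"), tag="blastp_"):
    """Return the hit with the highest bitscore, keys prefixed with `tag`."""
    hits = []
    for line in tabular_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        hits.append(dict(zip(header, line.split("\t"))))
    if not hits:
        return {}  # no hit at all
    best = max(hits, key=lambda hit: float(hit["bitscore"]))
    return {tag + key: value for key, value in best.items()}
```

The same helper applies to tblastn output by passing the tblastn header and tag='tblastn_'.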

padmet.utils.exploration.prot2genome.runExonerate(query_seq_faa, sseq_seq_faa, output, debug=False)[source]

Run exonerate of query_seq_faa against sseq_seq_faa. Exonerate must be installed and the PATH environment variable must include exonerate/bin/, so that the command ‘exonerate’ works from the shell. sseq_seq_faa is obtained after the tblastn run, based on the tblastn_sseqid value.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • sseq_seq_faa (str) – path to subject faa sequence
  • output (str) – path to exonerate output
  • debug (bool) – if true print all raw exonerate output
Returns:

dict of the best exonerate hit, add ‘exonerate_’ tag, or empty dict if no hit

Return type:

dict

padmet.utils.exploration.prot2genome.runSearchOnProteome(proteome_orgA, genome_orgB, output_folder, proteome_orgB=None)[source]

From a proteome of OrgA search for missing structural annotation in genome of OrgB. First launch Blastp between proteome of OrgA and proteome of OrgB. Then launch tBlastn between proteome of OrgA and genome of OrgB to find matches. Use the best match to extract a region from the genome of OrgB. Then launch Exonerate on this region using the sequence of OrgA.

Parameters:
  • proteome_orgA (str) – path to fasta file of proteome of OrgA
  • genome_orgB (str) – path to fasta file of genome of OrgB
  • output_folder (str) – path to output folder
  • proteome_orgB (str) – path to fasta file of proteome of OrgB
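
Extracting the genome region of the best tblastn match can be sketched from the sstart and send coordinates. The block is an illustration under stated assumptions: extract_region and its flank parameter are hypothetical, the sequence is a plain string, and no reverse complement is applied. BLAST subject coordinates are 1-based and inclusive, and sstart is greater than send for a hit on the minus strand.

```python
def extract_region(sequence, sstart, send, flank=0):
    """Return the subject slice covered by a tblastn hit, with optional flanks."""
    # order the coordinates so minus-strand hits (sstart > send) also work
    start, end = sorted((int(sstart), int(send)))
    start = max(1, start - flank)
    end = min(len(sequence), end + flank)
    # convert 1-based inclusive coordinates to a Python slice
    return sequence[start - 1:end]
```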
padmet.utils.exploration.prot2genome.runTblastn(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore', 'sstart', 'send'], debug=False)[source]

Run tblastn of query_seq_faa against subject_fna and return the output fields listed in header. Uses the NcbitblastnCommandline function and extracts the first best hit based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • subject_fna (str) – path to subject fna sequence
  • header (list) – output format of tblastn
  • debug (bool) – if true print all raw tblastn output
Returns:

dict of the best tblastn hit, add ‘tblastn_’ tag, or empty dict if no hit

Return type:

dict

visu_path

Description:
Visualize a pathway in a padmet network.

Color code:
    reactions associated to the pathway, present in the network: lightgreen
    reactions associated to the pathway, not present in the network: red
    compounds: skyblue

usage:
    padmet visu_path --padmetSpec=FILE/FOLDER --padmetRef=FILE --pathway=ID --output=FILE [--hide-currency] [--level=STR]

options:
    -h --help     Show help.
    --padmetSpec=FILE/FOLDER    pathname to the PADMet file of the network or to a folder containing multiple padmets.
    --padmetRef=FILE    pathname to the PADMet file of the db of reference.
    --pathway=ID    pathway id to visualize, can be multiple pathways separated by a ",".
    --output=FILE    pathname to the output file (extension can be .png or .svg).
    --hide-currency    hide currency metabolites.
    --level=STR    level of precision for the visualization (compound or pathway). By default visualization uses "compound".
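
The color code of the description can be written as a small lookup. This is only a sketch of the rule, with a hypothetical function name and hypothetical node type labels; it does not show how visu_path builds or draws the graph.

```python
def node_color(node_type, in_network=True):
    """Return the color of a node following the visu_path color code."""
    if node_type == "compound":
        return "skyblue"
    if node_type == "reaction":
        # reactions of the pathway are green when present in the network, red otherwise
        return "lightgreen" if in_network else "red"
    raise ValueError(f"unknown node type: {node_type!r}")
```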
padmet.utils.exploration.visu_path.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.visu_path.visu_path_cli(command_args)[source]
padmet.utils.exploration.visu_path.visu_path_compounds(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file, hide_currency_metabolites=None)[source]

Extract reactions from pathway and create a compound/reaction graph.

Parameters:
  • padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
  • padmet_ref_pathname (str) – pathname of the padmetRef file
  • pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
  • output_file (str) – pathname of the output picture (extension can be .png or .svg)
  • hide_currency_metabolites (bool) – hide currency metabolites
padmet.utils.exploration.visu_path.visu_path_pathways(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file)[source]

Extract reactions from pathway and create a compound/reaction graph.

Parameters:
  • padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
  • padmet_ref_pathname (str) – pathname of the padmetRef file
  • pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
  • output_file (str) – pathname of the output picture (extension can be .png or .svg)
  • hide_compounds (bool) – hide common compounds (like water or proton)