Exploration¶

Description:

#TODO

compare_padmet¶

Description:

Compare 1-n padmet files and create an output folder with the following files:

genes.tsv::
    fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.tsv::
    fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
    line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
pathways.tsv::
    fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
compounds.tsv::
    fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
    line = [cpd-1, rxn-1, '', rxn-1, '']
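As an illustration of how such an output could be consumed, here is a minimal sketch that reads a reactions.tsv-like table and splits the gene associations of each padmet. It assumes the file is tab-separated and that associations are joined with ';', as the example lines suggest. The sample data and the helper name `read_gene_associations` are hypothetical and not part of padmet.

```python
import csv
import io

# Hypothetical sample laid out like the reactions.tsv described above
# (assumption: tab-separated columns, ';' between associated genes).
FIELDNAMES = [
    "reaction", "padmet_a", "padmet_b",
    "padmet_a_genes_assoc", "padmet_b_genes_assoc",
    "padmet_a_formula", "padmet_b_formula",
]
ROW = [
    "rxn-1", "1", "1",
    "gene-a;gene-b", "gene-a",
    "cpd-1 + cpd-2 => cpd-3", "cpd-1 + cpd-2 => cpd-3",
]
SAMPLE = "\t".join(FIELDNAMES) + "\n" + "\t".join(ROW) + "\n"


def read_gene_associations(handle):
    """Return {reaction: {padmet_name: [gene, ...]}} from a reactions.tsv-like file."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        genes = {}
        for column, value in row.items():
            if column.endswith("_genes_assoc"):
                padmet_name = column[: -len("_genes_assoc")]
                # Drop empty strings so a padmet without genes maps to [].
                genes[padmet_name] = [g for g in value.split(";") if g]
        result[row["reaction"]] = genes
    return result


associations = read_gene_associations(io.StringIO(SAMPLE))
print(associations)
```

The same pattern should apply to the `_rxn_assoc` columns of genes.tsv and pathways.tsv, under the same assumptions.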

usage:
    padmet compare_padmet --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [--cpu INT] [-v]

option:
    -h --help    Show help.
    --padmet=FILES/DIR    pathname of the padmet files, sep all files by ',', ex: /path/padmet1.padmet,/path/padmet2.padmet OR a folder
    --output=DIR    pathname of the output folder
    --padmetRef=FILE    pathname of the database ref in padmet
    --cpu INT    number of CPU to use in multiprocessing

padmet.utils.exploration.compare_padmet.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False, number_cpu=None)[source]¶

Compare 1-n padmet files and create an output folder with the following files:

genes.tsv::
    fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of reactions associated to gene-a in padmet_a), rxn-2]
reactions.tsv::
    fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
    line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']
pathways.tsv::
    fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
    line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]
compounds.tsv::
    fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
    line = [cpd-1, rxn-1, '', rxn-1, '']

Parameters:

padmet_path (str) – pathname of the padmet files, separated by ',' (ex: /path/padmet1.padmet,/path/padmet2.padmet), OR a folder
output (str) – pathname of the output folder
padmetRef (padmet.classes.PadmetRef) – padmet containing the database of reference, needed to calculate the pathway completion rate
verbose (bool) – if True print information
number_cpu (int) – number of CPU to use in multiprocessing

padmet.utils.exploration.compare_padmet.compare_padmet_cli(command_args)[source]¶

padmet.utils.exploration.compare_padmet.extract_information_padmet(file_path, padmetRef, verbose)[source]¶

padmet.utils.exploration.compare_padmet.merge_dicts(element_dict, tmp_dict)[source]¶

compare_sbml¶

Description:

Compare reactions in 2 or 1-n sbml files.

Reports if a reaction is missing, and if a reaction with the same id uses different species or has a different reversibility.

usage:
    padmet compare_sbml --sbml=FILES/DIR --output=DIR

option:
    -h --help    Show help.
    --sbml FILES/DIR    pathname of the sbml files, sep all files by ',', ex: /path/sbml1.sbml,/path/sbml2.sbml OR a folder
    --output DIR    pathname of the output folder

padmet.utils.exploration.compare_sbml.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.compare_sbml.compare_multiple_sbml(sbml_path, output_folder)[source]¶

Compare 1-n sbml, create two output files reactions.tsv and metabolites.tsv with the reactions/metabolites in each sbml

Parameters:	sbml_path (str) – path to a folder containing sbmls or multiple sbml paths separated by a ‘,’ output_folder (str) – path to the output folder

padmet.utils.exploration.compare_sbml.compare_rxn(rxn1, rxn2)[source]¶

Compare two cobra reaction objects and return (same_cpd, same_rev).

same_cpd: bool, if True the same compounds are consumed and produced
same_rev: bool, if True the reactions have the same direction (reversible or not)

Parameters:

rxn1 (cobra.model.reaction) – reaction as cobra object
rxn2 (cobra.model.reaction) – reaction as cobra object

Returns:	(same_cpd (bool), same_rev (bool))
Return type:	tuple
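The comparison described above can be sketched without cobra. The `Reaction` dataclass below is a hypothetical stand-in for a cobra reaction (it is not the cobra API), and `compare_rxn_sketch` only illustrates the two checks named in the docstring. The actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical stand-in for a cobra reaction object."""
    reactants: frozenset
    products: frozenset
    reversible: bool


def compare_rxn_sketch(rxn1, rxn2):
    """Return (same_cpd, same_rev) for two Reaction stand-ins."""
    # same_cpd: identical sets of consumed and produced compounds.
    same_cpd = rxn1.reactants == rxn2.reactants and rxn1.products == rxn2.products
    # same_rev: both reversible or both irreversible.
    same_rev = rxn1.reversible == rxn2.reversible
    return same_cpd, same_rev


rxn_a = Reaction(frozenset({"cpd-1", "cpd-2"}), frozenset({"cpd-3"}), reversible=False)
rxn_b = Reaction(frozenset({"cpd-1", "cpd-2"}), frozenset({"cpd-3"}), reversible=True)
rxn_c = Reaction(frozenset({"cpd-1"}), frozenset({"cpd-3"}), reversible=False)

print(compare_rxn_sketch(rxn_a, rxn_b))
print(compare_rxn_sketch(rxn_a, rxn_c))
```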

padmet.utils.exploration.compare_sbml.compare_sbml(sbml1_path, sbml2_path)[source]¶

Compare 2 sbml files and print the number of metabolites and reactions. If a reaction is missing, print its id and formula.

Parameters:	sbml1_path (str) – path to the first sbml file to compare sbml2_path (str) – path to the second sbml file to compare

padmet.utils.exploration.compare_sbml.compare_sbml_cli(command_args)[source]¶

compare_sbml_padmet¶

Description:: compare reactions in sbml and padmet file

usage:
    padmet compare_sbml_padmet --padmet=FILE --sbml=FILE

option:
    -h --help    Show help.
    --padmet=FILE    path of the padmet file
    --sbml=FILE    path of the sbml file

padmet.utils.exploration.compare_sbml_padmet.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)[source]¶

Compare reaction ids in sbml vs padmet. Return the number of reactions found in both, and the ids of the reactions not in the sbml or not in the padmet.

Parameters:

padmet (padmet.classes.PadmetSpec) – padmet to update
sbml_document (libsbml.document) – sbml document
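The comparison boils down to set operations on reaction ids. The following is a minimal sketch, assuming the ids from both sources have already been extracted and are written in the same form. The function name `compare_reaction_ids` and the returned keys are illustrative only.

```python
def compare_reaction_ids(sbml_ids, padmet_ids):
    """Summarise the overlap between two collections of reaction ids."""
    sbml_ids, padmet_ids = set(sbml_ids), set(padmet_ids)
    return {
        # Number of reactions present in both sources.
        "in_both": len(sbml_ids & padmet_ids),
        # Reactions known to the padmet but absent from the sbml.
        "not_in_sbml": sorted(padmet_ids - sbml_ids),
        # Reactions known to the sbml but absent from the padmet.
        "not_in_padmet": sorted(sbml_ids - padmet_ids),
    }


summary = compare_reaction_ids(
    sbml_ids=["rxn-1", "rxn-2", "rxn-3"],
    padmet_ids=["rxn-2", "rxn-3", "rxn-4"],
)
print(summary)
```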

padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet_cli(command_args)[source]¶

convert_sbml_db¶

Description:

This tool uses the MetaNetX database to check or convert a sbml. Flat files from MetaNetX are required to run this tool. They can be found in the aureme workflow or on the MetaNetX website. To use the tool, set:

mnx_folder= the path to a folder containing MetaNetX flat files. The files must be named 'reac_xref.tsv' and 'chem_xref.tsv'. Alternatively, set the path of each flat file manually with:

mnx_reac= path to the flat file for reactions

mnx_chem= path to the flat file for chemical compounds (species)

To check the database used in a sbml:

to check all elements of the sbml (reactions and species) set:: to-map=all
to check only the reactions of the sbml set:: to-map=reaction
to check only the species of the sbml set:: to-map=species

To map a sbml and obtain a file of mapping ids to a given database set:

to-map:: as previously explained
db_out:: the name of the database target: [‘metacyc’, ‘bigg’, ‘kegg’] only
output:: the path to the output file

For a given sbml using a specific database, return a dictionary of mapping.

The output is a file with line = reaction id (or species id) in the sbml, reaction id (or species id) in the db_out database.

ex:: For a sbml based on the kegg database, with db_out=metacyc, the output file will contain for example:

R02283 ACETYLORNTRANSAM-RXN
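The idea of mapping through a cross-reference table can be sketched as follows. This is only an illustration: the two-column layout (`source:id`, MetaNetX id), the `kegg`/`metacyc` prefixes and the placeholder identifier `MNXR_PLACEHOLDER_1` are assumptions made for the example. They do not describe the real layout of 'reac_xref.tsv', nor how padmet reads it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical cross-reference table: "source:id<TAB>MetaNetX id".
# MNXR_PLACEHOLDER_1 is an invented identifier used only for this sketch.
XREF = (
    "#source\tmnx_id\n"
    "kegg:R02283\tMNXR_PLACEHOLDER_1\n"
    "metacyc:ACETYLORNTRANSAM-RXN\tMNXR_PLACEHOLDER_1\n"
)


def build_index(handle):
    """Return ({mnx_id: {db: set(ids)}}, {(db, id): mnx_id})."""
    by_mnx = defaultdict(lambda: defaultdict(set))
    to_mnx = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        db, _, element_id = row[0].partition(":")
        by_mnx[row[1]][db].add(element_id)
        to_mnx[(db, element_id)] = row[1]
    return by_mnx, to_mnx


def map_id(element_id, db_in, db_out, by_mnx, to_mnx):
    """Map an id from db_in to db_out through the shared MetaNetX id."""
    mnx_id = to_mnx.get((db_in, element_id))
    if mnx_id is None:
        return []
    return sorted(by_mnx[mnx_id].get(db_out, ()))


by_mnx, to_mnx = build_index(io.StringIO(XREF))
print(map_id("R02283", "kegg", "metacyc", by_mnx, to_mnx))
```

Going through a shared pivot identifier means each database only needs one table of cross-references rather than one table per pair of databases.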

usage:
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]

options:
    -h --help     Show help.
    --to-map=STR     select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
    --mnx_reac=FILE     path to the MetaNetX file for reactions
    --mnx_chem=FILE     path to the MetaNetX file for compounds
    --sbml=FILE     path to the sbml file to convert
    --output=FILE     path to the file containing the mapping, sep = "\t"
    --db_out=ID     id of the output database in ["BIGG","METACYC","KEGG"]
    -v     verbose.

padmet.utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]¶

Check sbml database of a given sbml.

Parameters:

sbml_file (str) – path to the sbml file to convert
to_map (str) – select the part of the sbml to check, must be in ['all', 'reaction', 'species']
verbose (bool) – if true: more info during process
mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
mnx_folder (str) – the path to a folder containing MetaNetX flat files

Returns:	(name of the best matching database, dict of matching)
Return type:	tuple

padmet.utils.exploration.convert_sbml_db.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.convert_sbml_db.convert_sbml_db_cli()[source]¶

padmet.utils.exploration.convert_sbml_db.get_from_mnx(mnx_dict, element_id, db_out)[source]¶: #TODO

padmet.utils.exploration.convert_sbml_db.intern_mapping(id_to_map, db_out, _type)[source]¶: #TODO

padmet.utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)[source]¶

Map a sbml and obtain a file mapping its ids to a given database.

Parameters:

sbml_file (str) – path to the sbml file to convert
to_map (str) – select the part of the sbml to check, must be in ['all', 'reaction', 'species']
db_out (str) – the name of the database target: ['metacyc', 'bigg', 'kegg'] only
output (str) – path to the file containing the mapping, sep = "\t"
verbose (bool) – if true: more info during process
mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
mnx_folder (str) – the path to a folder containing MetaNetX flat files

Returns:	(name of the best matching database, dict of matching)
Return type:	tuple

padmet.utils.exploration.convert_sbml_db.mnx_reader(input_file, db_out)[source]¶: #TODO

dendrogram_reactions_distance¶

Description:

Use reactions.tsv file from compare_padmet.py to create a dendrogram using a Jaccard distance.

From the absence/presence matrix of reactions in the different species, compute a Jaccard distance between these species. Apply a hierarchical clustering with complete linkage to these data, then create a dendrogram. Intervene is also applied to create an upset graph on the data.
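To make the two notions concrete, here is a small pure-Python sketch of the Jaccard distance between species and of the complete-linkage distance between two clusters. The species names and reaction sets are hypothetical, and this is not the code used by the module.

```python
from itertools import combinations

# Hypothetical absence/presence data: species -> reactions present.
presence = {
    "species_a": {"rxn-1", "rxn-2", "rxn-3"},
    "species_b": {"rxn-1", "rxn-2"},
    "species_c": {"rxn-4"},
}


def jaccard_distance(set_a, set_b):
    """1 - |intersection| / |union|; 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Pairwise distances, keyed by alphabetically sorted species pairs.
distances = {
    (a, b): jaccard_distance(presence[a], presence[b])
    for a, b in combinations(sorted(presence), 2)
}


def complete_linkage(cluster_x, cluster_y):
    """Complete linkage: the largest distance between members of two clusters."""
    return max(
        distances[tuple(sorted((x, y)))] for x in cluster_x for y in cluster_y
    )


print(distances)
print(complete_linkage(["species_a", "species_b"], ["species_c"]))
```

For real data, scipy offers both steps: `scipy.spatial.distance.pdist` with `metric="jaccard"` and `scipy.cluster.hierarchy.linkage` with `method="complete"`.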

usage:
    padmet dendrogram_reactions_distance --reactions=FILE --output=FOLDER [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]

option:
    -h --help    Show help.
    --reactions=FILE    pathname of the file containing reactions in each species of the comparison.
    --output=FOLDER    path to the output folder.
    --pvclust    launch pvclust dendrogram using R
    --padmetRef=STR    path to the padmet Ref file
    -u --upset=INT    number of cluster in the upset graph.
    -v    verbose mode.

padmet.utils.exploration.dendrogram_reactions_distance.absent_and_specific_reactions(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)[source]¶

Compare all clusters against one another.

Parameters:

reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_tree_cluster (str) – path to output tree cluster folder
output_folder_specific (str) – path to output folder with specific reactions for each species
output_folder_absent (str) – path to output folder with absent reactions for each species
organisms (list) – organisms names

padmet.utils.exploration.dendrogram_reactions_distance.add_dendrogram_node_label(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)[source]¶

Using cluster nodes, add label and reactions number on each node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/43519473

Parameters:

reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
node_list (list) – cluster nodes
reactions_clust (dictionary) – reactions in each cluster of the tree
len_longest_cluster_id (int) – reactions in each cluster of the tree

padmet.utils.exploration.dendrogram_reactions_distance.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.dendrogram_reactions_distance.comparison_cluster(reactions_clust, output_folder_comparison)[source]¶

Compare all clusters against one another.

Parameters:	reactions_clust (dictionary) – reactions in each cluster of the tree output_folder_comparison (str) – path to output folder

padmet.utils.exploration.dendrogram_reactions_distance.create_cluster(reactions_dataframe, absence_presence_matrix, linkage_matrix)[source]¶

Cut the dendrogram to create clusters.

Parameters:

reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
linkage_matrix (ndarray) – linkage matrix

Returns:	dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}
Return type:	dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intersection_files(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)[source]¶

Create intersection files.

Parameters:

root (root) – root of the xml tree
cluster_leaf_species (dictionary) – for each leaf, the organisms in it
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_tree_cluster (str) – path to the output folder
metacyc_to_ecs (dictionary) – mapping of metacyc reaction to EC number

Returns:	reactions_clust – reactions in each cluster of the tree
Return type:	dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intervene_graph(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]¶

Create an upset graph. Deprecated function: supervenn is now used instead, see the create_supervenn function.

Parameters:

absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
temp_data_folder (str) – temporary data folder
path_to_intervene (str) – path to intervene bin
output_folder_upset (str) – path to output folder
dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
k (int) – number of cluster to create

padmet.utils.exploration.dendrogram_reactions_distance.create_pvclust_dendrogram(reaction_file, output_folder)[source]¶

padmet.utils.exploration.dendrogram_reactions_distance.create_supervenn(absence_presence_matrix, reactions_dataframe, output_folder_upset, dendrogram_fclusters, k, verbose=False)[source]¶

Create a supervenn graph.

Parameters:

absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
output_folder_upset (str) – path to output folder
dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
k (int) – number of cluster to create

padmet.utils.exploration.dendrogram_reactions_distance.dendrogram_reactions_distance_cli(command_args)[source]¶

padmet.utils.exploration.dendrogram_reactions_distance.getNewick(node, newick, parentdist, leaf_names)[source]¶

Create a newick file from the root node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/31878514.

Parameters:

node (scipy.cluster.hierarchy.ClusterNode) – root ClusterNode of the scipy tree
newick (str) – newick string
parentdist (str) – root ClusterNode distance from the linkage matrix
leaf_names (list) – list of organism names

padmet.utils.exploration.dendrogram_reactions_distance.hclust_to_xml(linkage_matrix)[source]¶

Using a distance matrix from scipy linkage, create a xml tree corresponding to the hierarchical clustering. Return the root of the tree.

Parameters:	linkage_matrix (ndarray) – linkage matrix
Returns:	root of the xml tree
Return type:	root

padmet.utils.exploration.dendrogram_reactions_distance.pvclust_dendrogram(reactions_dataframe, organisms, output_folder)[source]¶

Using a distance matrix, the pvclust R package (called through the rpy2 package) creates a dendrogram with bootstrap values.

Parameters:

reactions_dataframe (pandas.DataFrame) – Reactions absence/presence matrix
organisms (list) – organisms names
output_folder (str) – path to the output folder

padmet.utils.exploration.dendrogram_reactions_distance.reaction_figure_creation(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)[source]¶

Create the dendrogram, the upset figure (if the upset argument is given) and compare reactions in species.

Parameters:

reaction_file (str) – path to reaction file
upset_cluster (int) – the number of cluster you want in the intervene figure
output_folder (str) – path to output folder
padmetRef_file (str) – path to padmet ref file
pvclust (bool) – boolean to launch or not R pvclust dendrogram

flux_analysis¶

Description:

1./ Run flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. A reactant with a flux of '0' is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.

3./ If --all_species: test flux reachability of all the compounds in the metabolic network (may take several minutes).
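Topological reachability (point 2) can be illustrated with a simple forward expansion: starting from the seeds, a reaction fires once all of its reactants are available, and its products then become available. The sketch below is a simplified illustration with hypothetical data. It treats every reaction as irreversible and is not how Menetools is implemented.

```python
def reachable_compounds(reactions, seeds):
    """Return the set of compounds producible from the seeds.

    reactions: {reaction_id: (reactants, products)}, all taken as irreversible.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires when every reactant is already in the scope
            # and it still has at least one new product to contribute.
            if set(reactants) <= scope and not set(products) <= scope:
                scope |= set(products)
                changed = True
    return scope


# Hypothetical network: cpd-5 is never produced, so rxn-3 cannot fire.
network = {
    "rxn-1": (["cpd-1", "cpd-2"], ["cpd-3"]),
    "rxn-2": (["cpd-3"], ["cpd-4"]),
    "rxn-3": (["cpd-5"], ["cpd-6"]),
}
scope = reachable_compounds(network, seeds=["cpd-1", "cpd-2"])
targets = ["cpd-4", "cpd-6"]
unreachable = [t for t in targets if t not in scope]
print(sorted(scope), unreachable)
```

A target that is topologically unreachable points to a gap in the network upstream of it, which is usually the first thing to check before looking at flux values.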

usage:
    padmet flux_analysis --sbml=FILE
    padmet flux_analysis --sbml=FILE --seeds=FILE --targets=FILE [--all_species]
    padmet flux_analysis --sbml=FILE --all_species

option:
    -h --help    Show help.
    --sbml=FILE    pathname to the sbml file to test for fba and fva.
    --seeds=FILE    pathname to the sbml file containing the seeds (medium).
    --targets=FILE    pathname to the sbml file containing the targets.
    --all_species    run FBA on all the metabolites of the given model.

padmet.utils.exploration.flux_analysis.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.flux_analysis.fba_on_targets(allspecies, model)[source]¶

For each species in allspecies, create an objective function with the current species as only product, then try to optimize the model and get the flux.

Parameters:

allspecies (list) – list of species ids to test
model (cobra.model) – Cobra model from a sbml file

padmet.utils.exploration.flux_analysis.flux_analysis(sbml_file, seeds_file=None, targets_file=None, all_species=False)[source]¶

1./ Run flux balance analysis with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the sbml. If the reaction is reachable by flux: return the flux value and the flux value for each reactant of the reaction. If not: only return the flux value for each reactant of the reaction. A reactant with a flux of '0' is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as sbml files containing only compounds, also try to use the Menetools library to run a topological analysis: topological reachability of the target compounds from the seed compounds.

3./ If --all_species: test flux reachability of all the compounds in the metabolic network (may take several minutes).

Parameters:

sbml_file (str) – path to the sbml file to analyse
seeds_file (str) – path to a sbml file with only compounds, representing the seeds/growth medium
targets_file (str) – path to a sbml file with only compounds, representing the targets to reach
all_species (bool) – if True, try to create an objective function for each compound and return which ones are reachable by flux

padmet.utils.exploration.flux_analysis.flux_analysis_cli(command_args)[source]¶

get_pwy_from_rxn¶

Description:: From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a in pwy-x => return, pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio

usage:
    padmet get_pwy_from_rxn --reaction_file=FILE --padmetRef=FILE  --output=FILE

options:
    -h --help     Show help.
    --reaction_file=FILE    pathname of the file containing the reactions id, 1/line
    --padmetRef=FILE    pathname of the padmet representing the database.
    --output=FILE    pathname of the file with line = pathway id, all reactions id, reactions ids from reaction file, ratio. sep = "\t"

padmet.utils.exploration.get_pwy_from_rxn.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)[source]¶

Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()

Parameters:

dict_pwy (dict) – dict, k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
output (str) – path to output file

padmet.utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)[source]¶

Extract from padmet the pathways containing 1-n reactions from a set of reactions 'reactions'. Return a dict of data: k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}

Parameters:

padmet (padmet.classes.PadmetSpec) – padmet to update
reactions (set) – set of reactions to match with pathways

Returns:	dict, k=pathway_id, v=dict with k in [total_rxn, rxn_from_list, ratio]. ex: {pwy-x: {'total_rxn': [a,b,c], 'rxn_from_list': [a], 'ratio': 1/3}}
Return type:	dict
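The shape of the returned dictionary can be reproduced with a short sketch. It assumes the pathway-to-reactions relation is already available as a plain dict, which is a simplification: the real function reads it from a padmet object. The name `extract_pwys_sketch` is hypothetical.

```python
def extract_pwys_sketch(pathway_to_reactions, reactions):
    """Build {pathway_id: {'total_rxn', 'rxn_from_list', 'ratio'}}.

    Only pathways containing at least one reaction of 'reactions' are kept.
    """
    reactions = set(reactions)
    dict_pwy = {}
    for pwy_id, pwy_reactions in pathway_to_reactions.items():
        total_rxn = sorted(set(pwy_reactions))
        rxn_from_list = sorted(reactions.intersection(total_rxn))
        if rxn_from_list:
            dict_pwy[pwy_id] = {
                "total_rxn": total_rxn,
                "rxn_from_list": rxn_from_list,
                # Share of the pathway covered by the input list.
                "ratio": len(rxn_from_list) / len(total_rxn),
            }
    return dict_pwy


pathways = {"pwy-x": ["a", "b", "c"], "pwy-y": ["d"]}
result = extract_pwys_sketch(pathways, reactions={"a"})
print(result)
```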

padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn(padmet, reaction_file, output)[source]¶

padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn_cli(command_args)[source]¶

padmet_stats¶

Description:

Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.

The input is a padmet file or a folder containing multiple padmets.

Create a tsv file named padmet_stats.tsv where the script has been launched.

usage:
    padmet padmet_stats --padmet=FILE --output=FOLDER

option:
    -h --help    Show help.
    -p --padmet=FILE    padmet file or folder containing padmet(s).
    -o --output=FOLDER    path to output folder.

padmet.utils.exploration.padmet_stats.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.padmet_stats.compute_stats(padmet_file_folder, output_folder)[source]¶

Count reactions/pathways/compounds/genes in padmet(s).

Parameters:	padmet_file_folder (str) – path to the padmet file/folder to analyze output_folder (str) – path to the output folder

padmet.utils.exploration.padmet_stats.orthology_result(padmet_file, padmet_names)[source]¶

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:	padmet_file (str) – path to a padmet file padmet_names (list) – all the padmet filenames
Returns:	Number of reactions given by the other species
Return type:	dictionary

padmet.utils.exploration.padmet_stats.padmet_stat(padmet_file)[source]¶

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:	padmet_file (str) – path to a padmet file
Returns:	[path to padmet, number of pathways, number of reactions, number of genes, number of compounds]
Return type:	list

padmet.utils.exploration.padmet_stats.padmet_stats_cli(command_args)[source]¶

prot2genome¶

Description:: Prot2Genome contains functions used for blast analysis and padmet enrichment

usage:
    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE --exonerate=PATH [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --padmet=FOLDER --output=FOLDER
    padmet prot2genome --studied_organisms=FOLDER --output=FOLDER
    padmet prot2genome --run=FOLDER --padmetRef=FILE [--cpu=INT] [debug]

From aucome run fromAucome()::
    -1. Extract specific reactions in spec_reactions folder with extractReactions()
    -2. Extract genes from spec_reactions files with extractGenes()
    -3. Run tblastn + exonerate with runAllAnalysis()

options:
    --query_faa=FILE    #TODO.
    --query_ids=FILE/STR    #TODO.
    --subject_gbk=FILE    #TODO.
    --subject_fna=FILE    #TODO.
    --subject_faa=FILE    #TODO.
    --output_folder=FILE    #TODO.
    --cpu=INT    Number of cpu to use for the multiprocessing (if none use 1 cpu). [default: 1]
    blastp    #TODO.
    tblastn    #TODO.
    debug    #TODO.

padmet.utils.exploration.prot2genome.analysisOutput(analysis_result, analysis_output)[source]¶

padmet.utils.exploration.prot2genome.cleanTmp(tmp_folder)[source]¶

Remove all files from tmp folder

Parameters:	tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse

padmet.utils.exploration.prot2genome.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.prot2genome.createPadmet(dict_args)[source]¶: Function used in mp_createPadmet by each worker of the Pool. Padmet files are updated using the function add_delete_rxn from padmet.utils.connection.manual_curation

padmet.utils.exploration.prot2genome.createSeqFromTblastn(subject_fna, sseq_seq_faa, exonerate_target_id, start_match, end_match)[source]¶

Use the result from the tBlastn to extract a region from the subject genome. The region extracted corresponds to the match region and 10kb before and 10kb after.

Parameters:

subject_fna (str) – path to subject fasta sequence (genome)
sseq_seq_faa (str) – path to output fasta sequence
exonerate_target_id (str) – ID of the contig/scaffold/chromosome where a match has been found
start_match (int) – start of the match
end_match (int) – end of the match
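The region selection (match plus 10kb on each side) can be sketched on a plain string. Two points are assumptions of this sketch and not taken from the source: coordinates are treated as 1-based and inclusive, and the window is clamped to the sequence bounds. The function names are hypothetical.

```python
FLANK = 10_000  # 10kb before and after the match, as described above


def region_bounds(seq_length, start_match, end_match, flank=FLANK):
    """Return (start, end), 1-based inclusive, clamped to [1, seq_length]."""
    # A hit on the reverse strand may be reported with start > end.
    low, high = sorted((start_match, end_match))
    return max(1, low - flank), min(seq_length, high + flank)


def extract_region(sequence, start_match, end_match, flank=FLANK):
    """Slice the match region and its flanks out of a sequence string."""
    start, end = region_bounds(len(sequence), start_match, end_match, flank)
    return sequence[start - 1:end]


print(region_bounds(50_000, 15_000, 16_000))
print(extract_region("ACGTACGTAC", 4, 6, flank=2))
```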

padmet.utils.exploration.prot2genome.extractAnalysis(blast_analysis_folder, spec_reactions_folder, output_folder)[source]¶

For each analysis output in the blast analysis folder, obtained with runAllAnalysis():

1./ Extract orthologue hits
2./ For each specific reaction from spec_reactions_folder, if all genes of a reaction got an ortho hit, add the reaction to reactions_to_add

Parameters:

blast_analysis_folder (str) – path folder with all blast analysis output files
spec_reactions_folder (str) – path folder with all files containing specific reactions
output_folder (str) – path folder where to extract all reactions to add
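The selection rule in step 2 (keep a reaction only when every associated gene has an orthologue hit) can be sketched as below. The data structures and the function name `select_reactions_to_add` are hypothetical, and reactions without any associated gene are skipped here by assumption.

```python
def select_reactions_to_add(reaction_genes, genes_with_hit):
    """Return the reactions whose genes all have an orthologue hit.

    reaction_genes: {reaction_id: [gene_id, ...]}
    genes_with_hit: set of gene ids for which a hit was found
    """
    return sorted(
        rxn_id
        for rxn_id, genes in reaction_genes.items()
        # Assumption: a reaction with no associated gene is never selected.
        if genes and all(gene in genes_with_hit for gene in genes)
    )


reaction_genes = {
    "rxn-1": ["gene-a", "gene-b"],
    "rxn-2": ["gene-c"],
    "rxn-3": ["gene-a", "gene-d"],
    "rxn-4": [],
}
hits = {"gene-a", "gene-b", "gene-d"}
print(select_reactions_to_add(reaction_genes, hits))
```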

padmet.utils.exploration.prot2genome.extractGenes(reactions_file)[source]¶

Extract genes ids and return a list from reactions_file obtained with extractReactions()

Parameters:	reactions_file (str) – path to reaction file

padmet.utils.exploration.prot2genome.extractReactions(dict_args)[source]¶: Function used in mp_extractReactions by each worker of the Pool. For org_a.padmet and org_b.padmet:

1./ extract reactions and specific reactions (not in a, not in b)
2./ extract genes associated to specific reactions
3./ select only reactions that come from annotation. ex: rxn-1 in org_a but not in org_b: if rxn-1 doesn't come from org_a annotation, skip the reaction
4./ create output file: header = ["reaction_id", "genes_ids", "sources"]

padmet.utils.exploration.prot2genome.extract_sequence(exonerate_output, exonerate_sequence)[source]¶: Extract protein sequence from exonerate output.

padmet.utils.exploration.prot2genome.fromAucome(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, keep_tmp=False, debug=False)[source]¶

This function fits an AuCoMe run. Select an aucome run folder and the function will:

1./ For each couple of studied organisms, extract specific reactions. ex: For org A and org B, extract reactions in org A but not in org B and vice versa
2./ Then for each specific reaction, extract the associated genes and run blastp, tblastn and exonerate
3./ For each reaction, for all associated genes, if there is no blastp match but a tblastn and exonerate hit, select the reaction as a hit
4./ Create a new padmet file with the new reactions to add

Parameters:

run_folder (str) – path to aucome run folder
cpu (int) – number of cpu to use for multiprocessing steps
padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
blastp (bool) – If true run blastp during analysis
tblastn (bool) – If true run tblastn during analysis
exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
debug (bool) – if true, print all raw information of analysis

padmet.utils.exploration.prot2genome.mp_createPadmet(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, pool, verbose=False)[source]¶

Update all padmet files in padmet_folder with the reactions to add from the files in reactions_to_add_folder. The information on the reactions is extracted from padmetRef as unique source.

ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.tsv and add each reaction listed in this file, based on padmetRef, to create output_folder/org_a.padmet

The padmet files are created in multiprocess: the more cpu, the faster the new padmet files are created.

Parameters:

reactions_to_add_folder (str) – path folder with all files containing reactions to add for each studied organism
padmet_folder (str) – path to folder with all padmet files of studied organism
output_folder (str) – path to output folder where to create new padmet files
padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
pool (Pool object) – pool object of multiprocessing
verbose (bool) – verbose
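The file pairing described above (org_a.padmet with org_a.tsv, written to output_folder/org_a.padmet) can be sketched with pathlib. The helper `plan_padmet_updates` is hypothetical, and skipping organisms that have no reactions file is an assumption; the sketch only resolves paths and does not touch padmet content.

```python
import tempfile
from pathlib import Path


def plan_padmet_updates(reactions_to_add_folder, padmet_folder, output_folder):
    """Return (padmet file, reactions file, output file) triples, one per organism."""
    plans = []
    for padmet_file in sorted(Path(padmet_folder).glob("*.padmet")):
        reactions_file = Path(reactions_to_add_folder) / f"{padmet_file.stem}.tsv"
        # Assumption: an organism without a reactions file is simply skipped.
        if reactions_file.exists():
            plans.append(
                (padmet_file, reactions_file, Path(output_folder) / padmet_file.name)
            )
    return plans


# Demo on a throwaway folder layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("padmet", "to_add", "out"):
        (root / name).mkdir()
    (root / "padmet" / "org_a.padmet").touch()
    (root / "padmet" / "org_b.padmet").touch()
    (root / "to_add" / "org_a.tsv").touch()
    plans = plan_padmet_updates(root / "to_add", root / "padmet", root / "out")
    planned_names = [(p.name, r.name, o.name) for p, r, o in plans]

print(planned_names)
```

Each triple is independent of the others, which is what makes the work easy to distribute over a multiprocessing pool.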

padmet.utils.exploration.prot2genome.mp_extractReactions(padmet_folder, output_folder, pool)[source]¶

From a folder of padmet files, create all pairwise combinations and, for each one, extract the specific reactions into a file in output_folder.

ex: with org_a.padmet and org_b.padmet in padmet_folder, create org_a_vs_org_b.tsv and org_b_vs_org_a.tsv in output_folder.

Parameters:

padmet_folder (str) – path to folder with all padmet files of studied organism
output_folder (str) – path to output folder where to extract specific reactions
pool (Pool object) – pool object of multiprocessing
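As a rough illustration of the pairwise extraction, the sketch below works on plain sets of reaction ids rather than padmet files. The function name `specific_reactions` is hypothetical, and the convention that org_a_vs_org_b.tsv holds the reactions found in org_a but not in org_b is an assumption.

```python
from itertools import permutations


def specific_reactions(reactions_by_org):
    """Map each ordered pair of organisms to the reactions only the first one has."""
    output = {}
    for org_a, org_b in permutations(sorted(reactions_by_org), 2):
        only_in_a = reactions_by_org[org_a] - reactions_by_org[org_b]
        output[f"{org_a}_vs_{org_b}.tsv"] = sorted(only_in_a)
    return output


example = {
    "org_a": {"rxn-1", "rxn-2"},
    "org_b": {"rxn-2", "rxn-3"},
}
result = specific_reactions(example)
print(result)
```

With n organisms this produces n * (n - 1) files, since each pair is considered in both directions.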

padmet.utils.exploration.prot2genome.mp_runAnalysis(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, pool, blastp, tblastn, exonerate, keep_tmp, debug)[source]¶

Run the different blast analyses based on files representing the specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.tsv):

1./ Search for:

faa file of org_a (studied_organisms_folder/org_a/org_a.faa)
gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk)
faa file of org_b (studied_organisms_folder/org_b/org_b.faa)
fna file of org_b (studied_organisms_folder/org_b/org_b.fna)

If the fna file doesn’t exist, create it.

2./ If the output file (blast_analysis_folder/org_a_VS_org_b.tsv) doesn’t already exist, run the analysis.
3./ Extract all gene ids from the specific reaction file with the function extractGenes().
4./ Run blastp, tblastn and exonerate on gene_id.faa vs target.faa / fna with runAllAnalysis().
5./ Create the analysis output.

The analysis creates many temporary files; all of them are in tmp_folder, which is cleaned after all loops.

Parameters:

spec_reactions_folder (str) – path folder with all files containing specific reactions
studied_organisms_folder (str) – path to folder with all data of studied organisms. Folder contains 1 folder by org with name as org name, in each: org.gbk,org.faa,org.fna
output_folder (str) – path to output folder where to extract blast analysis
tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
pool (Pool object) – pool object of multiprocessing
blastp (bool) – If true run blastp during analysis
tblastn (bool) – If true run tblastn during analysis
exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)
debug (bool) – if true, print all raw information of analysis
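The folder layout expected in step 1 can be made explicit with a small path helper. This is a sketch only: `expected_inputs` is a hypothetical name, and splitting the file name on "_vs_" assumes that organism names never contain that substring.

```python
from pathlib import Path


def expected_inputs(studied_organisms_folder, spec_reactions_file):
    """Resolve the sequence files looked up for one specific reaction file."""
    # Assumption: the file is named <query_org>_vs_<target_org>.tsv
    query_org, target_org = Path(spec_reactions_file).stem.split("_vs_", 1)
    root = Path(studied_organisms_folder)
    return {
        "query_faa": root / query_org / f"{query_org}.faa",
        "target_gbk": root / target_org / f"{target_org}.gbk",
        "target_faa": root / target_org / f"{target_org}.faa",
        "target_fna": root / target_org / f"{target_org}.fna",
    }


inputs = expected_inputs("studied_organisms", "org_a_vs_org_b.tsv")
for role, path in inputs.items():
    print(role, path.as_posix())
```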

padmet.utils.exploration.prot2genome.prot2genome_cli(command_args)[source]¶

padmet.utils.exploration.prot2genome.runAllAnalysis(dict_args)[source]¶

For a given gene query id:

1/ Extract the sequence from query_faa and create a faa file output_folder/query_id.faa. If isoforms are found, also search for each specific isoform.

2/ If blastp, run blastp; if tblastn, run tblastn; if exonerate and tblastn has a hit, run exonerate. Run all of them and extract the output as a dict of data.

Returns:	list of dict with all analysis output
Return type:	list
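The conditional chaining of the three tools can be sketched with injected callables. The function `run_all` and its arguments are hypothetical stand-ins and do not reflect the content of dict_args; each callable is assumed to return a dict that is empty when there is no hit.

```python
def run_all(query, run_blastp=None, run_tblastn=None, run_exonerate=None):
    """Chain the analyses: exonerate only runs when tblastn returned a hit."""
    result = {"query": query}
    if run_blastp is not None:
        result.update(run_blastp(query))
    tblastn_hit = {}
    if run_tblastn is not None:
        tblastn_hit = run_tblastn(query)
        result.update(tblastn_hit)
    # Exonerate needs the subject found by tblastn, so it is skipped without one.
    if run_exonerate is not None and tblastn_hit:
        result.update(run_exonerate(query, tblastn_hit))
    return result


with_hit = run_all(
    "gene-a",
    run_blastp=lambda q: {},
    run_tblastn=lambda q: {"tblastn_sseqid": "scaffold_1"},
    run_exonerate=lambda q, hit: {"exonerate_target": hit["tblastn_sseqid"]},
)
without_hit = run_all(
    "gene-b",
    run_tblastn=lambda q: {},
    run_exonerate=lambda q, hit: {"exonerate_target": "never reached"},
)
print(with_hit)
print(without_hit)
```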

padmet.utils.exploration.prot2genome.runBlastp(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)[source]¶

Run blastp on query_seq vs subject faa and return the output based on header. Uses the NcbiblastpCommandline function and extracts the output. Extracts the first best hit based on bitscore.

Parameters:

query_seq_faa (str) – path to query fasta sequence
subject_faa (str) – path to subject fasta sequence
header (list) – output format of blastp
debug (bool) – if true print all raw blastp output
Returns:	dict of the best blastp hit, add ‘blastp_’ tag, or empty dict if no hit
Return type:	dict
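Selecting the best hit by bitscore can be illustrated on tab-separated text whose columns follow header. This sketch does not run BLAST; the helper `best_hit`, the sample lines and the exact form of the tagged keys are assumptions.

```python
def best_hit(tabular_output, header=("sseqid", "evalue", "bitscore"), tag="blastp_"):
    """Return the highest-bitscore line as a tagged dict, or {} when there is no hit."""
    hits = []
    for line in tabular_output.splitlines():
        if not line.strip():
            continue
        # One tab-separated value per column named in header.
        hits.append(dict(zip(header, line.split("\t"))))
    if not hits:
        return {}
    top = max(hits, key=lambda hit: float(hit["bitscore"]))
    return {tag + key: value for key, value in top.items()}


raw = "prot_1\t1e-30\t120.5\nprot_2\t1e-50\t210.0\n"
top_hit = best_hit(raw)
print(top_hit)
print(best_hit(""))
```

The same helper would cover a tblastn-style header by passing a longer header tuple and tag="tblastn_".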

padmet.utils.exploration.prot2genome.runExonerate(query_seq_faa, sseq_seq_faa, output, debug=False)[source]¶

Run exonerate on query_seq vs subject faa. Exonerate must be installed and the PATH environment variable must be updated with exonerate/bin/, so that the command ‘exonerate’ works from the shell. sseq_seq_faa is obtained after the tblastn run, based on the tblastn_sseqid value.

Parameters:

query_seq_faa (str) – path to query fasta sequence
sseq_seq_faa (str) – path to subject faa sequence
output (str) – path to exonerate output
debug (bool) – if true print all raw exonerate output
Returns:	dict of the best exonerate hit, add ‘exonerate_’ tag, or empty dict if no hit
Return type:	dict

padmet.utils.exploration.prot2genome.runSearchOnProteome(proteome_orgA, genome_orgB, output_folder, proteome_orgB=None)[source]¶

From a proteome of OrgA search for missing structural annotation in genome of OrgB. First launch Blastp between proteome of OrgA and proteome of OrgB. Then launch tBlastn between proteome of OrgA and genome of OrgB to find matches. Use the best match to extract a region from the genome of OrgB. Then launch Exonerate on this region using the sequence of OrgA.

Parameters:

proteome_orgA (str) – path to fasta file of proteome of OrgA
genome_orgB (str) – path to fasta file of genome of OrgB
output_folder (str) – path to output folder
proteome_orgB (str) – path to fasta file of proteome of OrgB
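Extracting a genome region around the best tblastn match can be sketched on a plain string. The helper `hit_region` and its `flank` padding are hypothetical; the sketch assumes 1-based inclusive sstart/send coordinates, with sstart greater than send for a match on the reverse strand, and it does not reverse-complement the region.

```python
def hit_region(sequence, sstart, send, flank=0):
    """Slice the subject sequence around a hit given 1-based inclusive coordinates."""
    low, high = sorted((sstart, send))  # reverse-strand hits have sstart > send
    start = max(low - 1 - flank, 0)  # convert to a 0-based slice start
    end = min(high + flank, len(sequence))
    return sequence[start:end]


genome = "AAACCCGGGTTT"
forward = hit_region(genome, 4, 6)
reverse = hit_region(genome, 6, 4)
padded = hit_region(genome, 4, 6, flank=2)
print(forward, reverse, padded)
```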

padmet.utils.exploration.prot2genome.runTblastn(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore', 'sstart', 'send'], debug=False)[source]¶

Run tblastn on query_seq vs subject fna and return the output based on header. Uses the NcbitblastnCommandline function and extracts the output. Extracts the first best hit based on bitscore.

Parameters:

query_seq_faa (str) – path to query fasta sequence
subject_fna (str) – path to subject fna sequence
header (list) – output format of tblastn
debug (bool) – if true print all raw tblastn output
Returns:	dict of the best tblastn hit, add ‘tblastn_’ tag, or empty dict if no hit
Return type:	dict

visu_path¶

Description: Visualize a pathway in a padmet network.

Color code:

reactions associated to the pathway, present in the network: lightgreen
reactions associated to the pathway, not present in the network: red
compounds: skyblue
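The color code can be expressed as a simple mapping from node to color. This sketch works on plain ids and is not the drawing code of visu_path; the function `node_colors` and the sample ids are hypothetical.

```python
def node_colors(pathway_reactions, network_reactions, compounds):
    """Assign a color to each node following the color code above."""
    colors = {}
    for reaction in pathway_reactions:
        # Pathway reactions are green when found in the network, red otherwise.
        colors[reaction] = "lightgreen" if reaction in network_reactions else "red"
    for compound in compounds:
        colors[compound] = "skyblue"
    return colors


colors = node_colors(
    pathway_reactions=["rxn-1", "rxn-2"],
    network_reactions={"rxn-1"},
    compounds=["cpd-1", "cpd-2"],
)
print(colors)
```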

usage:
    padmet visu_path --padmetSpec=FILE/FOLDER --padmetRef=FILE --pathway=ID --output=FILE [--hide-currency] [--level=STR]

options:
    -h --help     Show help.
    --padmetSpec=FILE/FOLDER    pathname to the PADMet file of the network or to a folder containing multiple padmets.
    --padmetRef=FILE    pathname to the PADMet file of the db of reference.
    --pathway=ID    pathway id to visualize, can be multiple pathways separated by a ",".
    --output=FILE    pathname to the output file (extension can be .png or .svg).
    --hide-currency    hide currency metabolites.
    --level=STR    level of precision for the visualization (compound or pathway). By default visualization uses "compound".

padmet.utils.exploration.visu_path.command_help()[source]¶: Show help for analysis command.

padmet.utils.exploration.visu_path.visu_path_cli(command_args)[source]¶

padmet.utils.exploration.visu_path.visu_path_compounds(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file, hide_currency_metabolites=None)[source]¶

Extract reactions from pathway and create a compound/reaction graph.

Parameters:

padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
padmet_ref_pathname (str) – pathname of the padmetRef file
pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
output_file (str) – pathname of the output picture (extension can be .png or .svg)
hide_currency_metabolites (bool) – hide currency metabolites

padmet.utils.exploration.visu_path.visu_path_pathways(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file)[source]¶

Extract reactions from pathway and create a compound/reaction graph.

Parameters:

padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet
padmet_ref_pathname (str) – pathname of the padmetRef file
pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)
output_file (str) – pathname of the output picture (extension can be .png or .svg)
hide_compounds (bool) – hide common compounds (like water or proton)