API for padmet.utils.exploration

Description:

#TODO

compare_padmet

Description:

Compare 1 to n padmet files and create an output folder containing the following files:

genes.tsv:

fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [gene-a, 1 (if in padmet_a), 1 (if in padmet_b), rxn-1;rxn-2 (names of the reactions associated with gene-a in padmet_a), rxn-2]

reactions.tsv:

fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
line = [rxn-1, 1 (if in padmet_a), 1 (if in padmet_b), 'gene-a;gene-b', 'gene-a', 'cpd-1 + cpd-2 => cpd-3', 'cpd-1 + cpd-2 => cpd-3']

pathways.tsv:

fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [pwy-a, 0.80, 0.30, rxn-a;rxn-b, rxn-a]

compounds.tsv:

fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
line = [cpd-1, rxn-1, '', rxn-1, '']
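As an illustration of the reactions.tsv layout, a minimal sketch that builds its rows, assuming each padmet has already been reduced to a plain dict {reaction_id: (set of gene ids, formula)}. This input layout and the function names are hypothetical, not the padmet API, and writing 0 for an absent reaction is an assumption (the description above only states the value 1 for presence).

```python
import csv


def build_reaction_rows(padmets):
    """Build reactions.tsv-style rows comparing several padmets.

    padmets: {padmet_name: {reaction_id: (set_of_gene_ids, formula)}}
    (hypothetical in-memory layout, not the padmet API).
    """
    names = list(padmets)
    fieldnames = (
        ["reaction"]
        + names
        + [f"{name}_genes_assoc" for name in names]
        + [f"{name}_formula" for name in names]
    )
    rows = []
    # Union of the reactions found in at least one padmet
    all_reactions = sorted(set().union(*(rxns.keys() for rxns in padmets.values())))
    for reaction in all_reactions:
        row = {"reaction": reaction}
        for name in names:
            genes, formula = padmets[name].get(reaction, (set(), ""))
            row[name] = 1 if reaction in padmets[name] else 0  # assumption: 0 marks absence
            row[f"{name}_genes_assoc"] = ";".join(sorted(genes))
            row[f"{name}_formula"] = formula
        rows.append(row)
    return fieldnames, rows


def write_tsv(path, fieldnames, rows):
    """Write the rows as a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_tsv("reactions.tsv", *build_reaction_rows(padmets)) writes the table in one call.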

usage:
    padmet compare_padmet --padmet=FILES/DIR --output=DIR [--padmetRef=FILE] [--cpu INT] [-v]

option:
    -h --help    Show help.
    --padmet=FILES/DIR    pathname of the padmet files, separated by ',' (ex: /path/padmet1.padmet,/path/padmet2.padmet) OR a folder
    --output=DIR    pathname of the output folder
    --padmetRef=FILE    pathname of the reference database in padmet format
    --cpu INT    number of CPUs to use for multiprocessing
    -v    verbose mode.

padmet.utils.exploration.compare_padmet.command_help()

Show help for analysis command.

padmet.utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False, number_cpu=None)

Compare 1 to n padmet files and create an output folder containing genes.tsv, reactions.tsv, pathways.tsv and compounds.tsv, with the columns described in the compare_padmet module description above.

Parameters:
  • padmet_path (str) – pathname of the padmet files, separated by ',' (ex: /path/padmet1.padmet,/path/padmet2.padmet), OR a folder

  • output (str) – pathname of the output folder

  • padmetRef (padmet.classes.PadmetRef) – padmet containing the reference database, needed to calculate pathway completion rates

  • verbose (bool) – if True, print information

  • number_cpu (int) – number of CPUs to use for multiprocessing
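The padmet_path (or --padmet) value is either a folder or a comma-separated list of files. A minimal sketch of how such a value can be expanded into file paths; the function name is hypothetical, and keeping only the *.padmet files of a folder is an assumption.

```python
import os


def list_padmet_files(padmet_arg):
    """Expand a padmet_path value into a list of padmet file paths.

    Accepts either a folder or a comma-separated list of files.
    """
    if os.path.isdir(padmet_arg):
        # Assumption: every file ending in .padmet inside the folder is used
        return sorted(
            os.path.join(padmet_arg, name)
            for name in os.listdir(padmet_arg)
            if name.endswith(".padmet")
        )
    # Otherwise split on commas and drop empty entries
    return [path.strip() for path in padmet_arg.split(",") if path.strip()]
```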

padmet.utils.exploration.compare_padmet.compare_padmet_cli(command_args)
padmet.utils.exploration.compare_padmet.extract_information_padmet(file_path, padmetRef, verbose)
padmet.utils.exploration.compare_padmet.merge_dicts(element_dict, tmp_dict)

compare_sbml

Description:

Compare the reactions of 2 SBML files, or of 1 to n SBML files.

Report the missing reactions, and the reactions that share the same id but use different species or a different reversibility.

usage:
    padmet compare_sbml --sbml=FILES/DIR --output=DIR

option:
    -h --help    Show help.
    --sbml FILES/DIR    pathname of the sbml files, separated by ',' (ex: /path/sbml1.sbml,/path/sbml2.sbml) OR a folder
    --output DIR    pathname of the output folder

padmet.utils.exploration.compare_sbml.command_help()

Show help for analysis command.

padmet.utils.exploration.compare_sbml.compare_multiple_sbml(sbml_path, output_folder)

Compare 1 to n SBML files and create two output files, reactions.tsv and metabolites.tsv, listing the reactions and metabolites present in each SBML file.

Parameters:
  • sbml_path (str) – path to a folder containing sbmls or multiple sbml paths separated by a ‘,’

  • output_folder (str) – path to the output folder

padmet.utils.exploration.compare_sbml.compare_rxn(rxn1, rxn2)

Compare two cobra reaction objects and return (same_cpd, same_rev): same_cpd (bool) is True if both reactions consume and produce the same compounds; same_rev (bool) is True if both reactions have the same direction (reversible or not).

Parameters:
  • rxn1 (cobra.model.reaction) – reaction as cobra object

  • rxn2 (cobra.model.reaction) – reaction as cobra object

Returns:

(same_cpd (bool), same_rev (bool))

Return type:

tuple
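A minimal sketch of the comparison described for compare_rxn, assuming two reactions use the same compounds when their sets of consumed and produced compounds match (stoichiometric coefficients are ignored, and a reaction written in the opposite direction counts as different). FakeReaction is a hypothetical stand-in for cobra.Reaction that keeps only a metabolites mapping and a reversibility flag; in cobra, the metabolites keys are Metabolite objects, so you would compare their ids instead.

```python
from collections import namedtuple

# Hypothetical stand-in for cobra.Reaction: `metabolites` maps compound ids
# to stoichiometric coefficients, `reversibility` is a bool.
FakeReaction = namedtuple("FakeReaction", ["metabolites", "reversibility"])


def compare_rxn(rxn1, rxn2):
    """Return (same_cpd, same_rev) for two reactions."""

    def sides(rxn):
        # Negative coefficients are consumed compounds, positive ones are produced
        consumed = frozenset(cpd for cpd, coef in rxn.metabolites.items() if coef < 0)
        produced = frozenset(cpd for cpd, coef in rxn.metabolites.items() if coef > 0)
        return consumed, produced

    same_cpd = sides(rxn1) == sides(rxn2)
    same_rev = rxn1.reversibility == rxn2.reversibility
    return same_cpd, same_rev
```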

padmet.utils.exploration.compare_sbml.compare_sbml(sbml1_path, sbml2_path)

Compare 2 SBML files and print their numbers of metabolites and reactions. For each missing reaction, print the reaction id and the reaction formula.

Parameters:
  • sbml1_path (str) – path to the first sbml file to compare

  • sbml2_path (str) – path to the second sbml file to compare

padmet.utils.exploration.compare_sbml.compare_sbml_cli(command_args)

compare_sbml_padmet

Description:

Compare the reactions of an SBML file and a padmet file.

usage:
    padmet compare_sbml_padmet --padmet=FILE --sbml=FILE

option:
    -h --help    Show help.
    --padmet=FILE    path of the padmet file
    --sbml=FILE    path of the sbml file

padmet.utils.exploration.compare_sbml_padmet.command_help()

Show help for analysis command.

padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)

Compare the reaction ids of an SBML document and of a padmet. Return the number of reactions found in both, and the ids of the reactions missing from the SBML or from the padmet.

Parameters:
  • sbml_document (libsbml.SBMLDocument) – sbml document

  • padmet (padmet.classes.PadmetSpec) – padmet to compare
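The comparison itself reduces to set operations on reaction ids. A minimal sketch, assuming the ids have already been extracted from both sides; the function name and the returned keys are hypothetical.

```python
def compare_reaction_ids(sbml_ids, padmet_ids):
    """Compare two collections of reaction ids."""
    sbml_ids, padmet_ids = set(sbml_ids), set(padmet_ids)
    return {
        "nb_in_both": len(sbml_ids & padmet_ids),
        "not_in_sbml": sorted(padmet_ids - sbml_ids),
        "not_in_padmet": sorted(sbml_ids - padmet_ids),
    }
```

With libsbml, the SBML side could be collected with [rxn.getId() for rxn in sbml_document.getModel().getListOfReactions()]. SBML ids are often prefixed or encoded (e.g. R_...), so they may need to be normalized before the comparison.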

padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet_cli(command_args)

convert_sbml_db

Description:

This tool uses the MetaNetX database to check or convert an SBML file. MetaNetX flat files are required to run this tool; they can be found in the AuReMe workflow or on the MetaNetX website. To use the tool, set:

mnx_folder= the path to a folder containing the MetaNetX flat files. The files must be named 'reac_xref.tsv' and 'chem_xref.tsv'. Alternatively, set the path of each flat file manually with:

mnx_reac= path to the flat file for reactions

mnx_chem= path to the flat file for chemical compounds (species)

To check the database used in an SBML file:
to check all elements of the SBML (reactions and species), set:

--to-map=all

to check only the reactions of the SBML, set:

--to-map=reaction

to check only the species of the SBML, set:

--to-map=species

To map an SBML file and obtain a file mapping its ids to a given database, set:
--to-map:

as previously explained

--db_out:

the name of the target database: ['metacyc', 'bigg', 'kegg'] only

--output:

the path to the output file

For a given SBML file based on a specific database, return a dictionary of mappings.

The output file contains one line per mapped element: reaction/species id in the SBML, reaction/species id in the db_out database.

ex:

For an SBML file based on the KEGG database, with db_out=metacyc, the output file will contain, for example:

R02283 ACETYLORNTRANSAM-RXN

usage:
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --to-map=STR [-v]
    padmet convert_sbml_db --mnx_folder=DIR --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]
    padmet convert_sbml_db --mnx_reac=FILE --mnx_chem=FILE --sbml=FILE --output=FILE --db_out=ID --to-map=STR [-v]

options:
    -h --help     Show help.
    --to-map=STR     select the part of the sbml to check or convert, must be in ['all', 'reaction', 'species']
    --mnx_reac=FILE     path to the MetaNetX file for reactions
    --mnx_chem=FILE     path to the MetaNetX file for compounds
    --sbml=FILE     path to the sbml file to convert
    --output=FILE     path to the file containing the mapping, sep = "\t"
    --db_out=ID     id of the output database in ["BIGG","METACYC","KEGG"]
    -v     verbose.

padmet.utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)

Check which database a given SBML file is based on.

Parameters:
  • sbml_file (str) – path to the sbml file to check

  • to_map (str) – select the part of the sbml to check, must be in ['all', 'reaction', 'species']

  • verbose (bool) – if true: more info during process

  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)

  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)

  • mnx_folder (str) – the path to a folder containing MetaNetx flat files

Returns:

(name of the best matching database, dict of matching)

Return type:

tuple
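One way to pick the best matching database is to count, for each database, how many SBML ids appear among its cross-references. A minimal sketch, assuming the MetaNetX xref file has already been parsed into (db_name, db_id, mnx_id) triples; this layout, the function names and the ids used in the example are illustrative, not the padmet implementation or the exact MetaNetX file format.

```python
from collections import Counter, defaultdict


def index_xref(rows):
    """Index (db_name, db_id, mnx_id) triples by db_id."""
    by_db_id = defaultdict(list)
    for db_name, db_id, mnx_id in rows:
        by_db_id[db_id].append((db_name, mnx_id))
    return by_db_id


def best_matching_db(sbml_ids, rows):
    """Return (name of the best matching database, {sbml_id: mnx_id} for it)."""
    by_db_id = index_xref(rows)
    hits = Counter()
    matching = defaultdict(dict)
    for element_id in sbml_ids:
        for db_name, mnx_id in by_db_id.get(element_id, []):
            hits[db_name] += 1
            matching[db_name][element_id] = mnx_id
    if not hits:
        return None, {}
    # The database with the most matched ids wins
    best_db = hits.most_common(1)[0][0]
    return best_db, matching[best_db]
```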

padmet.utils.exploration.convert_sbml_db.command_help()

Show help for analysis command.

padmet.utils.exploration.convert_sbml_db.convert_sbml_db_cli()
padmet.utils.exploration.convert_sbml_db.get_from_mnx(mnx_dict, element_id, db_out)

#TODO

padmet.utils.exploration.convert_sbml_db.intern_mapping(id_to_map, db_out, _type)

#TODO

padmet.utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)

Map an SBML file and obtain a file mapping its ids to a given database.

Parameters:
  • sbml_file (str) – path to the sbml file to convert

  • to_map (str) – select the part of the sbml to convert, must be in ['all', 'reaction', 'species']

  • db_out (str) – the name of the target database: ['metacyc', 'bigg', 'kegg'] only

  • output (str) – path to the file containing the mapping, sep = "\t"

  • verbose (bool) – if true: more info during process

  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)

  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)

  • mnx_folder (str) – the path to a folder containing MetaNetx flat files

Returns:

(name of the best matching database, dict of matching)

Return type:

tuple
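Mapping to db_out can be sketched as a join through the shared MetaNetX id: SBML id, then MetaNetX id, then db_out id(s). A minimal sketch, assuming the MetaNetX xref file has already been parsed into (db_name, db_id, mnx_id) triples and that db ids are unique across databases; the function names and the example ids are illustrative, not the padmet implementation. format_mapping produces the tab-separated lines of the output file, one per (SBML id, db_out id) pair.

```python
from collections import defaultdict


def map_ids(sbml_ids, rows, db_out):
    """Map SBML element ids to db_out ids through shared MetaNetX ids.

    rows: (db_name, db_id, mnx_id) triples (assumed layout of a parsed xref file).
    """
    to_mnx = {db_id: mnx_id for _db_name, db_id, mnx_id in rows}
    from_mnx = defaultdict(list)
    for db_name, db_id, mnx_id in rows:
        if db_name == db_out:
            from_mnx[mnx_id].append(db_id)
    mapping = {}
    for element_id in sbml_ids:
        targets = from_mnx.get(to_mnx.get(element_id), [])
        if targets:  # ids without a counterpart in db_out are left out
            mapping[element_id] = targets
    return mapping


def format_mapping(mapping):
    """One tab-separated line per (SBML id, db_out id) pair."""
    return [f"{src}\t{dst}" for src, targets in sorted(mapping.items()) for dst in targets]
```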

padmet.utils.exploration.convert_sbml_db.mnx_reader(input_file, db_out)

#TODO

dendrogram_reactions_distance

Description:

Use the reactions.tsv file created by compare_padmet to create a dendrogram based on a Jaccard distance.

From the absence/presence matrix of reactions in the different species, compute the Jaccard distance between these species. Apply a hierarchical clustering with complete linkage on these data, then create a dendrogram. Also create an upset graph of the data (originally with intervene, now with supervenn).
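The distance and clustering steps can be sketched in plain Python. This minimal version computes the Jaccard distance between the reaction sets of each pair of species, then merges clusters with complete linkage (the distance between two clusters is the largest distance between their members). It is an illustrative, quadratic implementation with hypothetical function names, not the one used by the tool; with scipy, scipy.spatial.distance.pdist(matrix, metric='jaccard') followed by scipy.cluster.hierarchy.linkage(..., method='complete') computes the same quantities on a boolean absence/presence matrix.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of reaction ids."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_linkage(species_reactions):
    """Naive agglomerative clustering with complete linkage.

    species_reactions: {species: set of reactions present in it}.
    Returns the merges as (cluster_1, cluster_2, distance) tuples, in merge order.
    """
    dist = {}
    for s1, s2 in combinations(species_reactions, 2):
        d = jaccard_distance(species_reactions[s1], species_reactions[s2])
        dist[s1, s2] = dist[s2, s1] = d

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two farthest members
        return max(dist[x, y] for x in c1 for y in c2)

    clusters = [frozenset([species]) for species in species_reactions]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```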

usage:
    padmet dendrogram_reactions_distance --reactions=FILE --output=FOLDER [--padmetRef=STR] [--pvclust] [--upset=INT] [-v]

option:
    -h --help    Show help.
    --reactions=FILE    pathname of the file containing reactions in each species of the comparison.
    --output=FOLDER    path to the output folder.
    --pvclust    launch pvclust dendrogram using R
    --padmetRef=STR    path to the padmet Ref file
    -u --upset=INT    number of clusters in the upset graph.
    -v    verbose mode.

padmet.utils.exploration.dendrogram_reactions_distance.absent_and_specific_reactions(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)

Create, for each species, the files listing its specific reactions and its absent reactions.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing the absence/presence of reactions in organisms

  • output_folder_tree_cluster (str) – path to output tree cluster folder

  • output_folder_specific (str) – path to output folder with specific reactions for each species

  • output_folder_absent (str) – path to output folder with absent reactions for each species

  • organisms (list) – organisms names

padmet.utils.exploration.dendrogram_reactions_distance.add_dendrogram_node_label(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)

Using the cluster nodes, add a label and the number of reactions on each node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/43519473

Parameters:
  • reaction_dendrogram – dendrogram to annotate

  • node_list (list) – cluster nodes

  • reactions_clust (dictionary) – reactions in each cluster of the tree

  • len_longest_cluster_id (int) – length of the longest cluster id

padmet.utils.exploration.dendrogram_reactions_distance.command_help()

Show help for analysis command.

padmet.utils.exploration.dendrogram_reactions_distance.comparison_cluster(reactions_clust, output_folder_comparison)

Compare all clusters against one another.

Parameters:
  • reactions_clust (dictionary) – reactions in each cluster of the tree

  • output_folder_comparison (str) – path to output folder

padmet.utils.exploration.dendrogram_reactions_distance.create_cluster(reactions_dataframe, absence_presence_matrix, linkage_matrix)

Cut the dendrogram to create clusters.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing the absence/presence of reactions in organisms

  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe

  • linkage_matrix (ndarray) – linkage matrix

Returns:

dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intersection_files(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)

Create intersection files.

Parameters:
  • root (root) – root of the xml tree

  • cluster_leaf_species (dictionary) – for each leaf, the organisms in it

  • reactions_dataframe (pandas.DataFrame) – dataframe containing the absence/presence of reactions in organisms

  • output_folder_tree_cluster (str) – path to the output folder

  • metacyc_to_ecs (dictionary) – mapping of MetaCyc reactions to EC numbers

Returns:

reactions_clust – reactions in each cluster of the tree

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intervene_graph(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)

Create an upset graph. Deprecated function: supervenn is now used instead, see the create_supervenn function.

Parameters:
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe

  • reactions_dataframe (pandas.DataFrame) – dataframe containing the absence/presence of reactions in organisms

  • temp_data_folder (str) – temporary data folder

  • path_to_intervene (str) – path to intervene bin

  • output_folder_upset (str) – path to output folder

  • dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}

  • k (int) – number of clusters to create

padmet.utils.exploration.dendrogram_reactions_distance.create_pvclust_dendrogram(reaction_file, output_folder)
padmet.utils.exploration.dendrogram_reactions_distance.create_supervenn(absence_presence_matrix, reactions_dataframe, output_folder_upset, dendrogram_fclusters, k, verbose=False)

Create a supervenn graph.

Parameters:
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe

  • reactions_dataframe (pandas.DataFrame) – dataframe containing the absence/presence of reactions in organisms

  • output_folder_upset (str) – path to output folder

  • dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}

  • k (int) – number of clusters to create

padmet.utils.exploration.dendrogram_reactions_distance.dendrogram_reactions_distance_cli(command_args)
padmet.utils.exploration.dendrogram_reactions_distance.getNewick(node, newick, parentdist, leaf_names)

Create a newick file from the root node of the dendrogram. This function comes from this answer on stackoverflow: https://stackoverflow.com/a/31878514.

Parameters:
  • node (scipy.cluster.hierarchy.ClusterNode) – root ClusterNode of the scipy tree

  • newick (str) – newick string

  • parentdist (str) – root ClusterNode distance from the linkage matrix

  • leaf_names (list) – list of organism names
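The recursion of the linked answer can be sketched as below. Node is a hypothetical minimal stand-in providing the id, dist, is_leaf(), get_left() and get_right() members also found on scipy's ClusterNode, so the functions should also accept the root returned by scipy.cluster.hierarchy.to_tree; the function names differ from getNewick and are illustrative. Each branch length is the parent distance minus the node distance, rounded to 2 decimals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for scipy.cluster.hierarchy.ClusterNode."""

    id: int
    dist: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None

    def get_left(self):
        return self.left

    def get_right(self):
        return self.right


def newick_subtree(node, parentdist, leaf_names):
    """Newick string of the subtree rooted at node, with its branch length."""
    branch = parentdist - node.dist
    if node.is_leaf():
        return f"{leaf_names[node.id]}:{branch:.2f}"
    left = newick_subtree(node.get_left(), node.dist, leaf_names)
    right = newick_subtree(node.get_right(), node.dist, leaf_names)
    return f"({left},{right}):{branch:.2f}"


def to_newick(root, leaf_names):
    """Newick string of the whole tree (the root gets no branch length)."""
    left = newick_subtree(root.get_left(), root.dist, leaf_names)
    right = newick_subtree(root.get_right(), root.dist, leaf_names)
    return f"({left},{right});"
```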

padmet.utils.exploration.dendrogram_reactions_distance.hclust_to_xml(linkage_matrix)

Using a linkage matrix from scipy linkage, create an xml tree corresponding to the hierarchical clustering. Return the root of the tree.

Parameters:

linkage_matrix (ndarray) – linkage matrix

Returns:

root of the xml tree

Return type:

root
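A minimal sketch of turning a scipy-style linkage matrix into an xml tree with xml.etree.ElementTree. In a linkage matrix over n leaves, row i merges the two clusters whose indices are in its first two columns, and the new cluster gets index n + i. The tag and attribute names (cluster, leaf, id, dist, name), the function name and the extra leaf_names argument are assumptions, not necessarily what hclust_to_xml produces.

```python
import xml.etree.ElementTree as ET


def linkage_to_xml(linkage_matrix, leaf_names):
    """Build an xml tree from a linkage matrix and return its root element."""
    n = len(leaf_names)
    # Clusters not yet merged, keyed by their linkage index
    nodes = {i: ET.Element("leaf", name=name) for i, name in enumerate(leaf_names)}
    for row_index, (a, b, dist, _count) in enumerate(linkage_matrix):
        cluster = ET.Element("cluster", id=str(n + row_index), dist=str(dist))
        cluster.append(nodes.pop(int(a)))
        cluster.append(nodes.pop(int(b)))
        nodes[n + row_index] = cluster
    # After the last merge, only the root remains
    (root,) = nodes.values()
    return root
```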

padmet.utils.exploration.dendrogram_reactions_distance.pvclust_dendrogram(reactions_dataframe, organisms, output_folder)

Create a dendrogram with bootstrap values using a distance matrix and the pvclust R package (through the rpy2 package).

Parameters:
  • reactions_dataframe (pandas.DataFrame) – Reactions absence/presence matrix

  • organisms (list) – organisms names

  • output_folder (str) – path to the output folder

padmet.utils.exploration.dendrogram_reactions_distance.reaction_figure_creation(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)

Create a dendrogram and an upset figure (if the upset argument is given), and compare the reactions in the species.

Parameters:
  • reaction_file (str) – path to reaction file

  • upset_cluster (int) – the number of clusters you want in the upset figure

  • output_folder (str) – path to output folder

  • padmetRef_file (str) – path to padmet ref file

  • pvclust (bool) – boolean to launch or not R pvclust dendrogram

flux_analysis

Description:

1./ Run a flux balance analysis (FBA) with the cobra package on an already defined reaction. The value 'objective_coefficient' of this reaction must be set to 1 in the SBML. If the reaction is reachable by flux, return its flux value and the flux value of each of its reactants. If not, only return the flux value of each reactant. A reactant with a flux of 0 is not reachable by flux (and maybe not topologically either). To unblock the reaction, the metabolic network must be fixed by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as SBML files containing only compounds, also try to use the Menetools library to run a topological analysis: the topological reachability of the target compounds from the seed compounds.

3./ If --all_species: test the flux reachability of all the compounds in the metabolic network (may take several minutes).
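Topological reachability is usually computed by network expansion: starting from the seeds, a reaction fires once all its reactants are available, and its products then become available, until nothing changes. A minimal sketch of that idea (this is not the Menetools API; the function names are hypothetical, reactions are given as (reactants, products) pairs, and reversible reactions must be listed once per direction):

```python
def network_expansion(reactions, seeds):
    """Return the set of compounds topologically reachable from the seeds.

    reactions: iterable of (reactants, products) pairs of compound ids.
    """
    reactions = [(frozenset(r), frozenset(p)) for r, p in reactions]
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # Fire the reaction if all reactants are available and it adds something new
            if reactants <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


def reachable_targets(reactions, seeds, targets):
    """Split the targets into (reachable, unreachable) sets."""
    scope = network_expansion(reactions, seeds)
    targets = set(targets)
    return targets & scope, targets - scope
```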

usage:
    padmet flux_analysis --sbml=FILE
    padmet flux_analysis --sbml=FILE --seeds=FILE --targets=FILE [--all_species]
    padmet flux_analysis --sbml=FILE --all_species

option:
    -h --help    Show help.
    --sbml=FILE    pathname to the sbml file to test for fba and fva.
    --seeds=FILE    pathname to the sbml file containing the seeds (medium).
    --targets=FILE    pathname to the sbml file containing the targets.
    --all_species    run FBA on all the metabolites of the given model.

padmet.utils.exploration.flux_analysis.command_help()

Show help for analysis command.

padmet.utils.exploration.flux_analysis.fba_on_targets(allspecies, model)

For each species in allspecies, create an objective function with the current species as the only product, then try to optimize the model and get the flux.

Parameters:
  • allspecies (list) – list of species ids to test

  • model (cobra.model) – Cobra model from a sbml file

padmet.utils.exploration.flux_analysis.flux_analysis(sbml_file, seeds_file=None, targets_file=None, all_species=False)

Run the analyses described in the flux_analysis module description above: FBA on the objective reaction of the SBML file, topological reachability of the targets from the seeds with Menetools (if seeds_file and targets_file are given), and flux reachability of all the compounds (if all_species is True).

Parameters:
  • sbml_file (str) – path to sbml file to analyse

  • seeds_file (str) – path to sbml file with only compounds representing the seeds/growth medium

  • targets_file (str) – path to sbml file with only compounds representing the targets to reach

  • all_species (bool) – if True, try to create an objective function for each compound and return which compounds are reachable by flux.

padmet.utils.exploration.flux_analysis.flux_analysis_cli(command_args)

get_pwy_from_rxn

Description:

From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a is in pwy-x, return: pwy-x; all rxn ids in pwy-x; the rxn ids of pwy-x that are in the list; ratio

usage:
    padmet get_pwy_from_rxn --reaction_file=FILE --padmetRef=FILE  --output=FILE

options:
    -h --help     Show help.
    --reaction_file=FILE    pathname of the file containing the reactions id, 1/line
    --padmetRef=FILE    pathname of the padmet representing the database.
    --output=FILE    pathname of the file with line = pathway id, all reaction ids, reaction ids from the reaction file, ratio. sep = "\t"

padmet.utils.exploration.get_pwy_from_rxn.command_help()

Show help for analysis command.

padmet.utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)

Create a tab-separated file from dict_pwy. dict_pwy is obtained with extract_pwys().

Parameters:
  • dict_pwy (dict) – dict, k=pathway_id, v=dict with k in ['total_rxn', 'rxn_from_list', 'ratio'], ex: {'pwy-x': {'total_rxn': [a, b, c], 'rxn_from_list': [a], 'ratio': 1/3}}

  • output (str) – path to output file

padmet.utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)

Extract from the padmet the pathways containing 1 to n reactions from the set of reactions 'reactions'. Return a dict, k=pathway_id, v=dict with k in ['total_rxn', 'rxn_from_list', 'ratio'], ex: {'pwy-x': {'total_rxn': [a, b, c], 'rxn_from_list': [a], 'ratio': 1/3}}

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet containing the pathways

  • reactions (set) – set of reactions to match with pathways

Returns:

dict, k=pathway_id, v=dict with k in ['total_rxn', 'rxn_from_list', 'ratio'], ex: {'pwy-x': {'total_rxn': [a, b, c], 'rxn_from_list': [a], 'ratio': 1/3}}

Return type:

dict
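A minimal sketch of how the dict described above can be built, assuming the pathway/reaction relations of the padmet have already been read into a plain dict {pathway_id: list of reaction ids}; this input and the function name are hypothetical and stand in for the padmet object.

```python
def extract_pwys_sketch(pathway_reactions, reactions):
    """Pathways containing at least one reaction from `reactions`.

    pathway_reactions: {pathway_id: list of the pathway's reaction ids}
    reactions: set of reaction ids to match.
    """
    dict_pwy = {}
    for pathway_id, total_rxn in pathway_reactions.items():
        rxn_from_list = [rxn for rxn in total_rxn if rxn in reactions]
        if rxn_from_list:
            dict_pwy[pathway_id] = {
                "total_rxn": list(total_rxn),
                "rxn_from_list": rxn_from_list,
                # Fraction of the pathway's reactions found in the list
                "ratio": len(rxn_from_list) / len(total_rxn),
            }
    return dict_pwy
```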

padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn(padmet, reaction_file, output)
padmet.utils.exploration.get_pwy_from_rxn.get_pwy_from_rxn_cli(command_args)

padmet_stats

Description:

Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.

The input is a padmet file or a folder containing multiple padmets.

Create a tsv file named padmet_stats.tsv where the script was launched.

usage:
    padmet padmet_stats --padmet=FILE --output=FOLDER

option:
    -h --help    Show help.
    -p --padmet=FILE    padmet file or folder containing padmet(s).
    -o --output=FOLDER    path to output folder.

padmet.utils.exploration.padmet_stats.command_help()

Show help for analysis command.

padmet.utils.exploration.padmet_stats.compute_stats(padmet_file_folder, output_folder)

Count reactions/pathways/compounds/genes in padmet(s).

Parameters:
  • padmet_file_folder (str) – path to the padmet file/folder to analyze

  • output_folder (str) – path to the output folder

padmet.utils.exploration.padmet_stats.orthology_result(padmet_file, padmet_names)

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:
  • padmet_file (str) – path to a padmet file

  • padmet_names (list) – all the padmet filenames

Returns:

Number of reactions given by the other species

Return type:

dictionary

padmet.utils.exploration.padmet_stats.padmet_stat(padmet_file)

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:

padmet_file (str) – path to a padmet file

Returns:

[path to padmet, number of pathways, number of reactions, number of genes, number of compounds, number of class compounds]

Return type:

list

padmet.utils.exploration.padmet_stats.padmet_stats_cli(command_args)

prot2genome

Description:

Prot2Genome contains functions used for blast analysis and padmet enrichment

usage:
    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --query_faa=FILE --query_ids=FILE/STR --subject_gbk=FILE --subject_fna=FILE --subject_faa=FILE --output_folder=FILE --exonerate=PATH  [--cpu=INT] [blastp] [tblastn] [debug]
    padmet prot2genome --padmet=FOLDER --output=FOLDER
    padmet prot2genome --studied_organisms=FOLDER --output=FOLDER
    padmet prot2genome --run=FOLDER --padmetRef=FILE [--cpu=INT] [debug]

    For an AuCoMe run, fromAucome():
        1. Extract specific reactions into the spec_reactions folder with extractReactions()
        2. Extract genes from the spec_reactions files with extractGenes()
        3. Run tblastn + exonerate with runAllAnalysis()

options:
    --query_faa=FILE #TODO.
    --query_ids=FILE/STR #TODO.
    --subject_gbk=FILE #TODO.
    --subject_fna=FILE #TODO.
    --subject_faa=FILE #TODO.
    --output_folder=FILE #TODO.
    --cpu=INT     Number of cpu to use for the multiprocessing (if none use 1 cpu). [default: 1]
    blastp #TODO.
    tblastn #TODO.
    debug #TODO.

padmet.utils.exploration.prot2genome.analysisOutput(analysis_result, analysis_output)
padmet.utils.exploration.prot2genome.cleanTmp(tmp_folder)

Remove all files from tmp folder

Parameters:

tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse

padmet.utils.exploration.prot2genome.command_help()

Show help for analysis command.

padmet.utils.exploration.prot2genome.createPadmet(dict_args)

Function used by each worker of the Pool in mp_createPadmet. The padmet files are updated with the function add_delete_rxn from padmet.utils.connection.manual_curation.

padmet.utils.exploration.prot2genome.createSeqFromTblastn(subject_fna, sseq_seq_faa, exonerate_target_id, start_match, end_match)

Use the result of tblastn to extract a region from the subject genome. The extracted region corresponds to the matched region plus 10 kb before and 10 kb after it.

Parameters:
  • subject_fna (str) – path to subject fasta sequence (genome)

  • sseq_seq_faa (str) – path to output fasta sequence

  • exonerate_target_id (str) – ID of the contig/scaffold/chromosome where a match has been found

  • start_match (int) – start of the match

  • end_match (int) – end of the match
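The region extraction can be sketched as follows, assuming the subject genome is a FASTA file and that start_match/end_match are the 1-based inclusive subject coordinates reported by tblastn. The function names are hypothetical; the 10 kb flank comes from the description above, and the window is clipped to the contig bounds.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_id: sequence}; the id is the first word of the header."""
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return {record_id: "".join(chunks) for record_id, chunks in records.items()}


def extract_region(sequence, start_match, end_match, flank=10_000):
    """Return the matched region plus `flank` bases on each side, clipped to the contig.

    start_match/end_match are 1-based inclusive coordinates, as reported by BLAST;
    tblastn reports hits on the minus strand with start_match > end_match.
    """
    low, high = sorted((start_match, end_match))
    begin = max(0, low - 1 - flank)  # convert to a 0-based slice start
    end = min(len(sequence), high + flank)
    return sequence[begin:end]
```

For example, with open(subject_fna) as handle, read_fasta(handle)[exonerate_target_id] gives the contig to pass to extract_region.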

padmet.utils.exploration.prot2genome.extractAnalysis(blast_analysis_folder, spec_reactions_folder, output_folder)

For each analysis output in the blast analysis folder, obtained with runAllAnalysis():

1./ Extract the orthologue hits
2./ For each specific reaction from spec_reactions_folder, if all the genes of the reaction got an orthologue hit, add the reaction to reactions_to_add

Parameters:
  • blast_analysis_folder (str) – path folder with all blast analysis output files

  • spec_reactions_folder (str) – path folder with all files containing specific reactions

  • output_folder (str) – path folder where to extract all reactions to add
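The selection rule of step 2./ can be sketched as a filter over the specific reactions. The input layouts and the function name are hypothetical; skipping reactions without any associated gene is an assumption (otherwise all() would accept them).

```python
def reactions_to_add(spec_reactions, genes_with_hit):
    """Keep the specific reactions whose associated genes all have an orthologue hit.

    spec_reactions: {reaction_id: list of associated gene ids}
    genes_with_hit: set of gene ids with an orthologue hit
    """
    return sorted(
        rxn
        for rxn, genes in spec_reactions.items()
        if genes and all(gene in genes_with_hit for gene in genes)
    )
```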

padmet.utils.exploration.prot2genome.extractGenes(reactions_file)

Extract the gene ids from reactions_file (obtained with extractReactions()) and return them as a list.

Parameters:

reactions_file (str) – path to reaction file

padmet.utils.exploration.prot2genome.extractReactions(dict_args)

Function used by each worker of the Pool in mp_extractReactions, for org_a.padmet and org_b.padmet:

1./ Extract the reactions and the specific reactions (in org_a but not in org_b)
2./ Extract the genes associated with the specific reactions
3./ Select a reaction only if it comes from annotation: for rxn-1 in org_a but not in org_b, if rxn-1 doesn't come from the org_a annotation, skip the reaction
4./ Create the output file: header = ["reaction_id", "genes_ids", "sources"]

padmet.utils.exploration.prot2genome.extract_sequence(exonerate_output, exonerate_sequence)

Extract the protein sequence from the exonerate output.

padmet.utils.exploration.prot2genome.fromAucome(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, keep_tmp=False, debug=False)

This function is designed for an AuCoMe run. Given an AuCoMe run folder, the function will:
1./ For each pair of studied organisms, extract the specific reactions

ex: For org A and org B, extract the reactions in org A but not in org B, and vice versa

2./ Then, for each specific reaction, extract the associated genes and run blastp, tblastn and exonerate
3./ For each reaction, for all associated genes: if there is no blastp match but a tblastn and an exonerate hit, select the reaction as a hit
4./ Create a new padmet file including the new reactions to add

Parameters:
  • run_folder (str) – path to aucome run folder

  • cpu (int) – number of cpu to use for multiprocessing steps

  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files

  • blastp (bool) – If true run blastp during analysis

  • tblastn (bool) – If true run tblastn during analysis

  • exonerate (bool) – If true run exonerate during analysis, tblastn must also be True

  • keep_tmp (bool) – If true keep temporary files of analysis (with predicted gene sequence)

  • debug (bool) – if True, print all the raw information of the analysis

padmet.utils.exploration.prot2genome.mp_createPadmet(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, pool, verbose=False)

Update all the padmet files in padmet_folder with the reactions to add from the files in reactions_to_add_folder. The information about the reactions is extracted from padmetRef as the unique source. ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.tsv and add each reaction listed in this file, based on padmetRef, to create output_folder/org_a.padmet. The padmet files are created with multiprocessing: the more CPUs, the faster the new padmet files are created.

Parameters:
  • reactions_to_add_folder (str) – path folder with all files containing reactions to add for each studied organism

  • padmet_folder (str) – path to folder with all padmet files of studied organism

  • output_folder (str) – path to output folder where to create new padmet files

  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files

  • pool (Pool object) – pool object of multiprocessing

  • verbose (bool) – verbose

padmet.utils.exploration.prot2genome.mp_extractReactions(padmet_folder, output_folder, pool)

From a folder of padmet files, create all pairwise combinations and extract the specific reactions to create files in output_folder. ex: with org_a.padmet and org_b.padmet in padmet_folder, create output_folder/org_a_vs_org_b.tsv and output_folder/org_b_vs_org_a.tsv

Parameters:
  • padmet_folder (str) – path to folder with all padmet files of studied organism

  • output_folder (str) – path to output folder where to extract specific reactions

  • pool (Pool object) – pool object of multiprocessing
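The pairing scheme (org_a_vs_org_b.tsv and org_b_vs_org_a.tsv) corresponds to all ordered pairs of organisms, i.e. itertools.permutations of size 2. A minimal sketch; the function names are hypothetical, and the organism name is assumed to be the padmet file name without its extension.

```python
from itertools import permutations
from pathlib import Path


def spec_reaction_files(padmet_files):
    """Output file names for every ordered pair of organisms."""
    organisms = sorted(Path(path).stem for path in padmet_files)
    return [f"{org_a}_vs_{org_b}.tsv" for org_a, org_b in permutations(organisms, 2)]


def specific_reactions(reactions_a, reactions_b):
    """Reactions of organism a that are absent from organism b."""
    return set(reactions_a) - set(reactions_b)
```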

padmet.utils.exploration.prot2genome.mp_runAnalysis(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, pool, blastp, tblastn, exonerate, keep_tmp, debug, predicted_folder)[source]

Run different blast analysis based on files representing specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.tsv):

1. Search for:
  • the faa file of org_a (studied_organisms_folder/org_a/org_a.faa)
  • the gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk)
  • the faa file of org_b (studied_organisms_folder/org_b/org_b.faa)
  • the fna file of org_b (studied_organisms_folder/org_b/org_b.fna), which is created if it does not exist
2. If the output file (blast_analysis_folder/org_a_VS_org_b.tsv) does not already exist, run the analysis.
3. Extract all gene IDs from the specific reaction file with extractGenes().
4. Run blastp, tblastn and exonerate on gene_id.faa against the target faa/fna with runAllAnalysis().
5. Create the analysis output.

The analysis creates many temporary files; all of them are written to tmp_folder, which is cleaned once the loop has finished.

Parameters:
  • spec_reactions_folder (str) – path to the folder with all files containing specific reactions

  • studied_organisms_folder (str) – path to the folder with all data of the studied organisms. It contains one subfolder per organism, named after the organism, each holding org.gbk, org.faa and org.fna

  • output_folder (str) – path to the output folder where the blast analysis results are written

  • tmp_folder (str) – path to the temporary folder where the faa file of each gene to analyse is created

  • pool (Pool object) – multiprocessing pool object

  • blastp (bool) – if True, run blastp during the analysis

  • tblastn (bool) – if True, run tblastn during the analysis

  • exonerate (bool) – if True, run exonerate during the analysis; tblastn must also be True

  • keep_tmp (bool) – if True, keep the temporary files of the analysis (including the predicted gene sequence)

  • debug (bool) – if True, print all raw information from the analysis
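Step 1 and the output name of step 2 follow from the folder layout described above. The sketch below derives those paths from one specific reaction file name; resolve_input_files and its dict keys are hypothetical, and it assumes organism names do not themselves contain "_vs_".

```python
import os


def resolve_input_files(spec_reaction_file, studied_organisms_folder):
    """Derive the files used for one '<org_a>_vs_<org_b>.tsv' comparison (hypothetical helper)."""
    name = os.path.splitext(os.path.basename(spec_reaction_file))[0]
    # Split on the first "_vs_"; organism names containing "_vs_" would be ambiguous.
    org_a, org_b = name.split("_vs_", 1)

    def org_file(org, ext):
        # Layout: studied_organisms_folder/<org>/<org>.<ext>
        return os.path.join(studied_organisms_folder, org, f"{org}.{ext}")

    return {
        "query_faa": org_file(org_a, "faa"),
        "target_gbk": org_file(org_b, "gbk"),
        "target_faa": org_file(org_b, "faa"),
        "target_fna": org_file(org_b, "fna"),
        "output_name": f"{org_a}_VS_{org_b}.tsv",
    }
```

When target_fna is missing, one way to create it from the GenBank file is Biopython's SeqIO.convert(gbk_path, "genbank", fna_path, "fasta"); whether padmet does it exactly this way is not stated here.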

padmet.utils.exploration.prot2genome.prot2genome_cli(command_args)[source]
padmet.utils.exploration.prot2genome.runAllAnalysis(dict_args)[source]
For a given gene query ID:

1. Extract its sequence from query_faa and create a faa file output_folder/query_id.faa. If isoforms are found, also search for each specific isoform.
2. If blastp is set, run blastp; if tblastn is set, run tblastn; if exonerate is set and tblastn has a hit, run exonerate. The output of each run is extracted as a dict of data.

Returns:

list of dicts with all analysis outputs

Return type:

list
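The chaining rule of step 2 (exonerate only runs when tblastn found a hit) and the one-dict-per-sequence return value can be sketched as below. The three run_* callables are injected stand-ins for the real blastp/tblastn/exonerate wrappers, and the "query_id" key is a hypothetical choice for this illustration.

```python
def run_all_analysis(query_ids, run_blastp, run_tblastn, run_exonerate,
                     blastp=True, tblastn=True, exonerate=True):
    """Chain the analyses for a gene and its isoforms (hypothetical sketch).

    Each run_* callable returns a dict, empty when there is no hit.
    """
    outputs = []
    for seq_id in query_ids:  # the gene itself, then each isoform
        result = {"query_id": seq_id}
        if blastp:
            result.update(run_blastp(seq_id))
        if tblastn:
            tblastn_hit = run_tblastn(seq_id)
            result.update(tblastn_hit)
            # exonerate needs the tblastn hit to locate the genomic region
            if exonerate and tblastn_hit:
                result.update(run_exonerate(seq_id, tblastn_hit))
        outputs.append(result)
    return outputs
```

Because an empty dict is falsy, "tblastn has a hit" reduces to a plain truth test on the tblastn result.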

padmet.utils.exploration.prot2genome.runBlastp(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)[source]

Run blastp with query_seq_faa against subject_faa and return the output fields listed in header. Uses NcbiblastpCommandline to extract the output and keeps only the best hit, based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence

  • subject_faa (str) – path to subject fasta sequence

  • header (list) – output columns of blastp

  • debug (bool) – if True, print the raw blastp output

Returns:

dict of the best blastp hit, with each key prefixed by ‘blastp_’, or an empty dict if there is no hit

Return type:

dict
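Keeping the best hit by bitscore and tagging the keys can be sketched on BLAST tabular output. The sketch assumes the output was produced with a matching tabular format, for example `blastp -query query.faa -subject subject.faa -outfmt "6 sseqid evalue bitscore"`, so that the columns follow header; best_hit is a hypothetical helper, not the function padmet uses internally.

```python
def best_hit(tabular_output, header, tag):
    """Return the highest-bitscore row of BLAST tabular output, keys prefixed with tag.

    header must list the output columns in order and include "bitscore".
    """
    best, best_score = {}, float("-inf")
    for line in tabular_output.splitlines():
        if not line.strip():
            continue
        row = dict(zip(header, line.split("\t")))
        score = float(row["bitscore"])
        if score > best_score:  # strict '>' keeps the first row on ties
            best, best_score = row, score
    # Empty input leaves best == {}, so no hit gives an empty dict.
    return {f"{tag}_{key}": value for key, value in best.items()}
```

The same helper applies to tblastn output with the longer header that adds sstart and send.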

padmet.utils.exploration.prot2genome.runExonerate(query_seq_faa, sseq_seq_faa, output, debug=False)[source]

Run exonerate with query_seq_faa against sseq_seq_faa. Exonerate must be installed and its exonerate/bin/ directory added to the PATH environment variable, so that the command ‘exonerate’ works from a shell. sseq_seq_faa is obtained after a tblastn run, based on the tblastn_sseqid value.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence

  • sseq_seq_faa (str) – path to subject faa sequence

  • output (str) – path to exonerate output

  • debug (bool) – if True, print the raw exonerate output

Returns:

dict of the best exonerate hit, with each key prefixed by ‘exonerate_’, or an empty dict if there is no hit

Return type:

dict
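Since exonerate must be reachable on PATH, a pre-flight check with shutil.which can fail early with a clear message. The sketch below also builds an exonerate command line; the choice of `--model protein2genome` is an assumption (the model used by runExonerate is not documented here), and both helper names are hypothetical.

```python
import shutil


def exonerate_available():
    """Return True if an 'exonerate' executable is found on PATH."""
    return shutil.which("exonerate") is not None


def build_exonerate_cmd(query_seq_faa, target_fasta):
    """Build an exonerate argv list (hypothetical helper).

    --model protein2genome aligns a protein query against genomic DNA;
    this model choice is an assumption, not taken from padmet.
    """
    return ["exonerate", "--model", "protein2genome",
            "--query", query_seq_faa, "--target", target_fasta]
```

If exonerate_available() returns True, the list can be run with subprocess.run(cmd, capture_output=True, text=True, check=True).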

padmet.utils.exploration.prot2genome.runSearchOnProteome(proteome_orgA, genome_orgB, output_folder, proteome_orgB=None)[source]

From the proteome of OrgA, search for missing structural annotations in the genome of OrgB. First run blastp between the proteome of OrgA and the proteome of OrgB. Then run tblastn between the proteome of OrgA and the genome of OrgB to find matches. Use the best match to extract a region from the genome of OrgB, then run exonerate on this region with the sequence from OrgA.

Parameters:
  • proteome_orgA (str) – path to fasta file of proteome of OrgA

  • genome_orgB (str) – path to fasta file of genome of OrgB

  • output_folder (str) – path to output folder

  • proteome_orgB (str) – path to fasta file of proteome of OrgB

padmet.utils.exploration.prot2genome.runTblastn(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore', 'sstart', 'send'], debug=False)[source]

Run tblastn with query_seq_faa against subject_fna and return the output fields listed in header. Uses NcbitblastnCommandline to extract the output and keeps only the best hit, based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence

  • subject_fna (str) – path to subject fna sequence

  • header (list) – output columns of tblastn

  • debug (bool) – if True, print the raw tblastn output

Returns:

dict of the best tblastn hit, with each key prefixed by ‘tblastn_’, or an empty dict if there is no hit

Return type:

dict

report_network

Description:

Create reports of a padmet file.

all_pathways.tsv: header = [“dbRef_id”, “Common name”, “Number of reaction found”, “Total number of reaction”, “Ratio (Reaction found / Total)”]

all_reactions.tsv: header = [“dbRef_id”, “Common name”, “formula (with id)”, “formula (with common name)”, “in pathways”, “associated genes”]

all_metabolites.tsv: header = [“dbRef_id”, “Common name”, “Produced (p), Consumed (c), Both (cp)”]

usage:
    padmet report_network --padmetSpec=FILE --output_dir=dir [--padmetRef=FILE] [-v]

options:
    -h --help     Show help.
    --padmetSpec=FILE    pathname of the padmet file.
    --padmetRef=FILE    pathname of the padmet file used as database
    --output_dir=dir    directory for the results.
    -v   print info.
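The all_pathways.tsv columns can be illustrated with a small sketch that counts how many reactions of each pathway are present in the network and writes the header listed above. The inputs (a dict of pathway ID to reaction IDs, a set of network reactions, optional common names) are hypothetical stand-ins for what report_network reads from the padmet files, and the ratio is left unrounded.

```python
import csv
import io

HEADER = ["dbRef_id", "Common name", "Number of reaction found",
          "Total number of reaction", "Ratio (Reaction found / Total)"]


def pathway_rows(pathway_reactions, network_reactions, names=None):
    """Build all_pathways.tsv rows from pathway -> reaction IDs (hypothetical inputs)."""
    names = names or {}
    network = set(network_reactions)
    rows = []
    for pwy_id in sorted(pathway_reactions):
        total = set(pathway_reactions[pwy_id])
        found = total & network
        ratio = len(found) / len(total) if total else 0.0
        rows.append([pwy_id, names.get(pwy_id, ""), len(found), len(total), ratio])
    return rows


def write_tsv(rows, handle):
    """Write the header and rows as tab-separated values to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
```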
padmet.utils.exploration.report_network.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.report_network.report_network_cli(command_args)[source]

visu_network

Description:

Visualize a metabolic network from the perspective of its compounds (the --level option also allows a reaction or pathway view).

usage:
    padmet visu_network -i=FILE -o=FILE [--html=FILE] [--level=STR] [--hide-currency]

options:
    -h --help     Show help.
    -i=FILE    pathname to the input file (either PADMet or SBML).
    -o=FILE    pathname to the output file (picture of metabolic network).
    --html=FILE    pathname to the output file (interactive HTML of the metabolic network).
    --level=STR    level of precision for the visualization (compound, reaction or pathway). By default visualization uses "compound".
    --hide-currency    hide currency metabolites.
padmet.utils.exploration.visu_network.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.visu_network.create_graph(metabolic_network_file, output_file, visualization_level, hide_currency_metabolites)[source]

Create a picture of the network with igraph, using the output of parse_compounds_padmet or parse_compounds_sbml.

Parameters:
  • metabolic_network_file (str) – pathname of the metabolic network file

  • output_file (str) – pathname of the output picture of the metabolic network

  • visualization_level (str) – visualization level: compound, reaction or pathway

  • hide_currency_metabolites (bool) – hide currency metabolites

padmet.utils.exploration.visu_network.create_html_graph(metabolic_network_file, output_file, visualization_level, hide_currency_metabolites)[source]

Create an interactive HTML graph, using the output of parse_compounds_padmet or parse_compounds_sbml.

Parameters:
  • metabolic_network_file (str) – pathname of the metabolic network file

  • output_file (str) – pathname of the output HTML file of the metabolic network

  • visualization_level (str) – visualization level: compound, reaction or pathway

  • hide_currency_metabolites (bool) – hide currency metabolites

padmet.utils.exploration.visu_network.parse_compounds_padmet(padmet_file, hide_metabolites)[source]

Parse a padmet file to extract compounds and create edges and nodes for igraph.

Parameters:
  • padmet_file (str) – pathname of the padmet file

  • hide_metabolites (list) – list of metabolites to hide

Returns:

  • edges (list) – edges between two compounds (symbolizing the reaction)

  • edges_label (list) – for each edge the name of the reaction

  • weights (list) – the weight associated to each edge

  • nodes (list) – the compounds

  • nodes_label (list) – for each node the name of the compound
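Building compound-to-compound edges can be sketched by linking every reactant of a reaction to every product, labelling each edge with the reaction ID and skipping hidden metabolites. The input (a dict of reaction ID to (reactants, products)) is hypothetical, and weights are omitted because the weighting used by parse_compounds_padmet is not documented here.

```python
def compound_edges(reactions, hide_metabolites=()):
    """Return (edges, edges_label, nodes) for a compound graph (hypothetical sketch).

    reactions: dict of reaction ID -> (list of reactants, list of products).
    """
    hidden = set(hide_metabolites)
    edges, edges_label, nodes = [], [], []
    seen = set()
    for rxn_id in sorted(reactions):
        reactants, products = reactions[rxn_id]
        for reactant in reactants:
            if reactant in hidden:
                continue
            for product in products:
                if product in hidden:
                    continue
                edges.append((reactant, product))
                edges_label.append(rxn_id)
                # Record each compound once, in first-seen order.
                for cpd in (reactant, product):
                    if cpd not in seen:
                        seen.add(cpd)
                        nodes.append(cpd)
    return edges, edges_label, nodes
```

Name pairs like these can be loaded into python-igraph with Graph.TupleList(edges, directed=True).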

padmet.utils.exploration.visu_network.parse_compounds_sbml(sbml_file, hide_metabolites)[source]

Parse an SBML file to extract compounds and create edges and nodes for igraph.

Parameters:
  • sbml_file (str) – pathname of the sbml file

  • hide_metabolites (list) – list of metabolites to hide

Returns:

  • edges (list) – edges between two compounds (symbolizing the reaction)

  • edges_label (list) – for each edge the name of the reaction

  • weights (list) – the weight associated to each edge

  • nodes (list) – the compounds

  • nodes_label (list) – for each node the name of the compound
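Reading reactants and products out of SBML can be sketched with the standard library alone, using ElementTree's {*} namespace wildcard (Python 3.8+) so the same code works across SBML levels. This is an illustrative sketch, not what parse_compounds_sbml uses internally; dedicated libraries such as libSBML handle the full specification more robustly.

```python
import xml.etree.ElementTree as ET


def sbml_reactions(sbml_text):
    """Map reaction ID -> (reactant species IDs, product species IDs)."""
    root = ET.fromstring(sbml_text)
    reactions = {}
    # '{*}' matches any namespace, so SBML level/version namespaces don't matter.
    for rxn in root.findall(".//{*}reaction"):
        reactants = [s.get("species")
                     for s in rxn.findall("{*}listOfReactants/{*}speciesReference")]
        products = [s.get("species")
                    for s in rxn.findall("{*}listOfProducts/{*}speciesReference")]
        reactions[rxn.get("id")] = (reactants, products)
    return reactions
```

Metabolites listed in hide_metabolites would then be filtered out before building edges.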

padmet.utils.exploration.visu_network.parse_pathways_padmet(padmet_file)[source]

Parse a padmet file to extract pathway inputs and outputs and create edges and nodes for igraph.

Parameters:

padmet_file (str) – pathname of the padmet file

Returns:

  • edges (list) – edges between two compounds (symbolizing the pathway)

  • edges_label (list) – for each edge the name of the pathway

  • weights (list) – the weight associated to each edge

  • nodes (list) – the compounds

  • nodes_label (list) – for each node the name of the compound
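One common way to derive a pathway's inputs and outputs from its reactions is set arithmetic: inputs are compounds consumed but never produced inside the pathway, outputs are compounds produced but never consumed. This definition is an assumption for illustration (it ignores reaction reversibility); how padmet itself determines pathway inputs and outputs is not stated here.

```python
def pathway_inputs_outputs(pathway_reactions):
    """Return (inputs, outputs) of a pathway (hypothetical sketch).

    pathway_reactions: dict of reaction ID -> (reactants, products).
    """
    consumed, produced = set(), set()
    for reactants, products in pathway_reactions.values():
        consumed.update(reactants)
        produced.update(products)
    # Intermediates appear on both sides and cancel out.
    return sorted(consumed - produced), sorted(produced - consumed)
```

Each input could then be linked to each output by an edge labelled with the pathway ID.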

padmet.utils.exploration.visu_network.parse_reactions_padmet(padmet_file)[source]

Parse a padmet file to extract reactions and create edges and nodes for igraph.

Parameters:

padmet_file (str) – pathname of the padmet file

Returns:

  • edges (list) – edges between two reactions

  • edges_label (list) – for each edge the name of the reaction

  • weights (list) – the weight associated to each edge

  • nodes (list) – a compound

  • nodes_label (list) – for each node the name of the compound
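A reaction-level graph needs a rule for linking two reactions. The sketch below assumes one plausible rule, an edge from R1 to R2 when a product of R1 is a reactant of R2, and labels each edge with the shared compounds; both the rule and the label choice are assumptions, not parse_reactions_padmet's documented behaviour.

```python
def reaction_edges(reactions):
    """Link reactions through shared compounds (hypothetical sketch).

    reactions: dict of reaction ID -> (reactants, products). Reversibility is ignored.
    """
    edges, labels = [], []
    for r1 in sorted(reactions):
        for r2 in sorted(reactions):
            if r1 == r2:
                continue
            # Compounds produced by r1 and consumed by r2.
            shared = sorted(set(reactions[r1][1]) & set(reactions[r2][0]))
            if shared:
                edges.append((r1, r2))
                labels.append(";".join(shared))
    return edges, labels
```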

padmet.utils.exploration.visu_network.visu_network_cli(command_args)[source]

visu_path

Description:

Visualize one or more pathways of a padmet network.

Color code:
  • reactions associated to the pathway and present in the network: lightgreen
  • reactions associated to the pathway but not present in the network: red
  • compounds: skyblue

usage:
    padmet visu_path --padmetSpec=FILE/FOLDER --padmetRef=FILE --pathway=ID --output=FILE [--hide-currency] [--level=STR]

options:
    -h --help     Show help.
    --padmetSpec=FILE/FOLDER    pathname to the PADMet file of the network or to a folder containing multiple padmets.
    --padmetRef=FILE    pathname to the PADMet file of the db of reference.
    --pathway=ID    pathway id to visualize, can be multiple pathways separated by a ",".
    --output=FILE    pathname to the output file (extension can be .png or .svg).
    --hide-currency    hide currency metabolites.
    --level=STR    level of precision for the visualization (compound or pathway). By default visualization uses "compound".
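The comma-separated --pathway value and the color code can be sketched as two small helpers. The color names come from the description above; the helper names and input shapes are hypothetical.

```python
PRESENT, ABSENT, COMPOUND = "lightgreen", "red", "skyblue"


def parse_pathway_ids(pathway_ids):
    """Split a --pathway value such as 'PWY-1,PWY-2' on ',' and drop blanks."""
    return [p.strip() for p in pathway_ids.split(",") if p.strip()]


def node_colors(pathway_reactions, network_reactions, compounds):
    """Apply the visu_path color code to reaction and compound nodes."""
    network = set(network_reactions)
    colors = {rxn: (PRESENT if rxn in network else ABSENT) for rxn in pathway_reactions}
    colors.update({cpd: COMPOUND for cpd in compounds})
    return colors
```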
padmet.utils.exploration.visu_path.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.visu_path.visu_path_cli(command_args)[source]
padmet.utils.exploration.visu_path.visu_path_compounds(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file, hide_currency_metabolites=None)[source]

Extract the reactions of the pathway(s) and create a compound/reaction graph.

Parameters:
  • padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet

  • padmet_ref_pathname (str) – pathname of the padmetRef file

  • pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)

  • output_file (str) – pathname of the output picture (extension can be .png or .svg)

  • hide_currency_metabolites (bool) – hide currency metabolites

padmet.utils.exploration.visu_path.visu_path_pathways(padmet_pathname, padmet_ref_pathname, pathway_ids, output_file)[source]

Extract the reactions of the pathway(s) and create a compound/reaction graph.

Parameters:
  • padmet_pathname (str) – pathname of the padmet file or a folder containing multiple padmet

  • padmet_ref_pathname (str) – pathname of the padmetRef file

  • pathway_ids (str) – name of the pathway (can be multiple pathways separated by a ‘,’)

  • output_file (str) – pathname of the output picture (extension can be .png or .svg)

  • hide_compounds (bool) – hide common compounds (like water or proton)

visu_similarity_gsmn

Description:

Visualize similarity between metabolic networks using MDS.

usage:
    padmet visu_similarity_gsmn --reaction=FILE --output=FILE [--group=FILE]

options:
    -h --help     Show help.
    --reaction=FILE    pathname to the reaction file output of compare_padmet or compare_sbml.
    --output=FILE    pathname to the picture output file containing the MDS projection
    --group=FILE    pathname to the group file containing a column named "species" with the organism ID and a column "group" assigning each species to a group (a "color" column can also be used to associate each group with a specific color)
padmet.utils.exploration.visu_similarity_gsmn.command_help()[source]

Show help for analysis command.

padmet.utils.exploration.visu_similarity_gsmn.visu_similarity_gsmn(reaction_file, output_file, group_file=None)[source]

Create a dendrogram and an upset figure (if upset argument) and compare reactions between species.

Parameters:
  • reaction_file (str) – path to reaction file from compare_padmet/compare_sbml.

  • output_file (str) – path to the output picture file.

  • group_file (str) – path to the group file containing the group assignment of each metabolic network.
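MDS needs a dissimilarity between metabolic networks. The sketch below reads the presence columns of a compare_padmet/compare_sbml reaction file (a value of "1" marks a reaction present in that network) and computes a pairwise Jaccard distance matrix. The Jaccard metric is an assumption for illustration, since the distance used by visu_similarity_gsmn is not documented here, and the species column names are passed in explicitly.

```python
import csv
import io


def presence_sets(reaction_tsv_text, species):
    """Collect, for each species column, the reactions marked '1' (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(reaction_tsv_text), delimiter="\t")
    sets = {sp: set() for sp in species}
    for row in reader:
        for sp in species:
            if (row.get(sp) or "").strip() == "1":
                sets[sp].add(row["reaction"])
    return sets


def jaccard_distance_matrix(sets):
    """Return (sorted names, square matrix of 1 - |A & B| / |A | B|)."""
    names = sorted(sets)
    matrix = []
    for a in names:
        row = []
        for b in names:
            union = sets[a] | sets[b]
            row.append(1 - len(sets[a] & sets[b]) / len(union) if union else 0.0)
        matrix.append(row)
    return names, matrix
```

A precomputed matrix like this can be projected to 2D with a metric MDS implementation, for example scikit-learn's MDS configured for precomputed dissimilarities.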

padmet.utils.exploration.visu_similarity_gsmn.visu_similarity_gsmn_cli(command_args)[source]