Exploration

Description:

#TODO

compare_padmet

Description:

Compare 1-n padmet files and create an output folder containing the following files:

genes.csv:
fieldnames = [gene, padmet_a, padmet_b, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [gene-a, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), ‘rxn-1;rxn-2’ (reactions associated with gene-a in padmet_a), ‘rxn-2’ (reactions associated with gene-a in padmet_b)]

reactions.csv:
fieldnames = [reaction, padmet_a, padmet_b, padmet_a_genes_assoc, padmet_b_genes_assoc, padmet_a_formula, padmet_b_formula]
line = [rxn-1, ‘present’ (if in padmet_a), ‘present’ (if in padmet_b), ‘gene-a;gene-b’, ‘gene-a’, ‘cpd-1 + cpd-2 => cpd-3’, ‘cpd-1 + cpd-2 => cpd-3’]

pathways.csv:
fieldnames = [pathway, padmet_a_completion_rate, padmet_b_completion_rate, padmet_a_rxn_assoc, padmet_b_rxn_assoc]
line = [pwy-a, 0.80, 0.30, ‘rxn-a;rxn-b’, ‘rxn-a’]

compounds.csv:
fieldnames = [metabolite, padmet_a_rxn_consume, padmet_a_rxn_produce, padmet_b_rxn_consume, padmet_b_rxn_produce]
line = [cpd-1, rxn-1, ‘’, rxn-1, ‘’]
padmet.utils.exploration.compare_padmet.compare_padmet(padmet_path, output, padmetRef=None, verbose=False)

Compare 1-n padmet files and create the output folder containing genes.csv, reactions.csv, pathways.csv and compounds.csv, with the columns described above.
Parameters:
  • padmet_path (str) – pathname of the padmet files, separated by ‘,’ (ex: /path/padmet1.padmet,/path/padmet2.padmet), OR a folder
  • output (str) – pathname of the output folder
  • padmetRef (padmet.classes.PadmetRef) – padmet containing the reference database, needed to calculate the pathway completion rate
  • verbose (bool) – if True print information
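The following is a minimal sketch of how the presence and gene-association columns of reactions.csv could be assembled. It is not the compare_padmet implementation. It assumes each padmet has already been reduced to a hypothetical dict {reaction_id: [gene ids]}, and the helpers reactions_table() and to_csv_text() are illustrative names. Reading padmet files, the formula columns and the other CSV files are not shown, and the comma delimiter is an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple


def reactions_table(padmets: Dict[str, Dict[str, List[str]]]) -> Tuple[List[str], List[dict]]:
    """Build reactions.csv-like rows from {padmet_name: {rxn_id: [gene ids]}}."""
    names = sorted(padmets)
    fieldnames = ["reaction"] + names + [f"{name}_genes_assoc" for name in names]
    # Union of all reaction ids over every padmet.
    all_rxns = sorted(set().union(*(rxns.keys() for rxns in padmets.values())))
    rows = []
    for rxn in all_rxns:
        row = {"reaction": rxn}
        for name in names:
            # 'present' if the reaction exists in this padmet, empty otherwise.
            row[name] = "present" if rxn in padmets[name] else ""
            # Associated genes joined with ';', as in the example line.
            row[f"{name}_genes_assoc"] = ";".join(padmets[name].get(rxn, []))
        rows.append(row)
    return fieldnames, rows


def to_csv_text(fieldnames: List[str], rows: List[dict]) -> str:
    """Serialise the rows with a header line (comma delimiter assumed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Sorting the padmet names and reaction ids keeps the output deterministic from one run to the next.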

compare_sbml

Description:

Compare the reactions of two SBML files.

Report missing reactions, and reactions that have the same id but use different species or have a different reversibility.

padmet.utils.exploration.compare_sbml.compare_rxn(rxn1, rxn2)

Compare two cobra reaction objects and return (same_cpd, same_rev). same_cpd: bool, True if the same compounds are consumed and produced. same_rev: bool, True if both reactions have the same direction (reversible or not).

Parameters:
  • rxn1 (cobra.Reaction) – reaction as cobra object
  • rxn2 (cobra.Reaction) – reaction as cobra object
Returns:

(same_cpd (bool), same_rev (bool))

Return type:

tuple
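The comparison can be sketched without cobra. This version assumes each reaction is reduced to a hypothetical SimpleReaction triple (reactant ids, product ids, reversible). With cobra objects these values would come from Reaction.reactants, Reaction.products and Reaction.reversibility. This is an illustration, not the compare_rxn() source.

```python
from typing import FrozenSet, NamedTuple, Tuple


class SimpleReaction(NamedTuple):
    """Hypothetical stand-in for a cobra reaction."""
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    reversible: bool


def compare_simple_rxn(rxn1: SimpleReaction, rxn2: SimpleReaction) -> Tuple[bool, bool]:
    # same_cpd: identical sets of consumed and of produced compounds.
    same_cpd = rxn1.reactants == rxn2.reactants and rxn1.products == rxn2.products
    # same_rev: both reversible or both irreversible.
    same_rev = rxn1.reversible == rxn2.reversible
    return same_cpd, same_rev
```

This sketch ignores stoichiometric coefficients. It also does not treat a reaction written in the opposite direction as identical; whether compare_rxn() does so is not stated here.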

padmet.utils.exploration.compare_sbml.compare_sbml(sbml1_path, sbml2_path)

Compare two SBML files and print the number of metabolites and reactions. For each missing reaction, print the reaction id and the reaction formula.

Parameters:
  • sbml1_path (str) – path to the first sbml file to compare
  • sbml2_path (str) – path to the second sbml file to compare

compare_sbml_padmet

Description:
Compare the reactions in an SBML file and a padmet file.
padmet.utils.exploration.compare_sbml_padmet.compare_sbml_padmet(sbml_document, padmet)

Compare the reaction ids in the SBML file and in the padmet. Return the number of reactions found in both, plus the ids of the reactions missing from the SBML or from the padmet.

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to compare
  • sbml_document (libsbml.SBMLDocument) – SBML document

convert_sbml_db

Description:

This tool uses the MetaNetX database to check or convert an SBML file. Flat files from MetaNetX are required to run this tool; they can be found in the AuReMe workflow or on the MetaNetX website. To use the tool, set:

mnx_folder= the path to a folder containing MetaNetX flat files. The files must be named ‘reac_xref.tsv’ and ‘chem_xref.tsv’. Alternatively, set the path of each flat file manually with:

mnx_reac= path to the flat file for reactions

mnx_chem= path to the flat file for chemical compounds (species)

To check the database used in an SBML file:
  • to check all elements of the SBML (reactions and species), set to-map=all
  • to check only the reactions, set to-map=reaction
  • to check only the species, set to-map=species
To map an SBML file and obtain a file mapping its ids to a given database, set:
  • to-map as previously explained
  • db_out the name of the target database: [‘metacyc’, ‘bigg’, ‘kegg’] only
  • output the path to the output file

For a given SBML file using a specific database, return a dictionary of mapping.

The output is a file where each line = reaction_id/species_id in the SBML, reaction_id/species_id in the db_out database.

ex:
For an SBML file based on the KEGG database, with db_out=metacyc, the output file will contain for ex:

R02283 ACETYLORNTRANSAM-RXN
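The mapping goes through MetaNetX identifiers: a source id is translated to its MNX id, then to the target database. Below is a minimal sketch of both the "best matching database" check and the mapping step. It uses a hypothetical, hard-coded cross-reference dict {(database, id): MNX id} rather than parsed reac_xref.tsv / chem_xref.tsv files. The value "MNXR_EXAMPLE" is a placeholder, not a real MetaNetX id, and best_database(), map_ids() and mapping_lines() are illustrative names, not the padmet API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical structure: (database, external id) -> MetaNetX id.
XRef = Dict[Tuple[str, str], str]


def best_database(ids: Iterable[str], xref: XRef) -> Optional[str]:
    """Return the database whose identifiers match the most ids, or None."""
    wanted = set(ids)
    counts = Counter(db for (db, ext_id) in xref if ext_id in wanted)
    return counts.most_common(1)[0][0] if counts else None


def map_ids(ids: Iterable[str], db_in: str, db_out: str, xref: XRef) -> Dict[str, List[str]]:
    """Map ids from db_in to db_out through their shared MetaNetX id."""
    mnx_to_out: Dict[str, List[str]] = {}
    for (db, ext_id), mnx_id in xref.items():
        if db == db_out:
            mnx_to_out.setdefault(mnx_id, []).append(ext_id)
    mapping = {}
    for element_id in ids:
        mnx_id = xref.get((db_in, element_id))
        if mnx_id in mnx_to_out:
            mapping[element_id] = mnx_to_out[mnx_id]
    return mapping


def mapping_lines(mapping: Dict[str, List[str]]) -> List[str]:
    """One 'sbml_id db_out_id' line per mapped pair, space separated."""
    return [f"{src} {dst}" for src, dsts in mapping.items() for dst in dsts]
```

With a KEGG-based input and db_out=metacyc, this yields lines such as the R02283 ACETYLORNTRANSAM-RXN example above.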

padmet.utils.exploration.convert_sbml_db.check_sbml_db(sbml_file, to_map, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)

Check which database the ids of a given SBML file come from.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – part of the SBML to check, must be in [‘all’, ‘reaction’, ‘species’]
  • verbose (bool) – if True, print more information during the process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet.utils.exploration.convert_sbml_db.get_from_mnx(mnx_dict, element_id, db_out)

#TODO

padmet.utils.exploration.convert_sbml_db.intern_mapping(id_to_map, db_out, _type)

#TODO

padmet.utils.exploration.convert_sbml_db.map_sbml(sbml_file, to_map, db_out, output, verbose=False, mnx_reac_file=None, mnx_chem_file=None, mnx_folder=None)

Map an SBML file and obtain a file mapping its ids to a given database.

Parameters:
  • sbml_file (str) – path to the sbml file to convert
  • to_map (str) – part of the SBML to check, must be in [‘all’, ‘reaction’, ‘species’]
  • db_out (str) – the name of the target database: [‘metacyc’, ‘bigg’, ‘kegg’] only
  • output (str) – path to the file containing the mapping, sep = " "
  • verbose (bool) – if True, print more information during the process
  • mnx_reac_file (str) – path to the flat file for reactions (can be None if given mnx_folder)
  • mnx_chem_file (str) – path to the flat file for chemical compounds (species) (can be None if given mnx_folder)
  • mnx_folder (str) – the path to a folder containing MetaNetx flat files
Returns:

(name of the best matching database, dict of matching)

Return type:

tuple

padmet.utils.exploration.convert_sbml_db.mnx_reader(input_file, db_out)

#TODO

dendrogram_reactions_distance

Description:

Use the reactions.csv file from compare_padmet.py to create a dendrogram using a Jaccard distance.

From the presence/absence matrix of reactions in the different species, compute a Jaccard distance between these species. Apply a hierarchical clustering with complete linkage to these distances, then create a dendrogram. Also use intervene to create an UpSet graph of the data.

padmet.utils.exploration.dendrogram_reactions_distance.absent_and_specific_reactions(reactions_dataframe, output_folder_tree_cluster, output_folder_specific, output_folder_absent, organisms)

For each organism, extract the specific reactions and the absent reactions, and write them to the corresponding output folders.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • output_folder_tree_cluster (str) – path to output tree cluster folder
  • output_folder_specific (str) – path to output folder with specific reactions for each species
  • output_folder_absent (str) – path to output folder with absent reactions for each species
  • organisms (list) – organisms names
padmet.utils.exploration.dendrogram_reactions_distance.add_dendrogram_node_label(reaction_dendrogram, node_list, reactions_clust, len_longest_cluster_id)

Using the cluster nodes, add a label and the number of reactions on each node of the dendrogram. This function comes from this answer on Stack Overflow: https://stackoverflow.com/a/43519473

Parameters:
  • reaction_dendrogram – dendrogram of the reactions to annotate
  • node_list (list) – cluster nodes
  • reactions_clust (dictionary) – reactions in each cluster of the tree
  • len_longest_cluster_id (int) – length of the longest cluster id
padmet.utils.exploration.dendrogram_reactions_distance.comparison_cluster(reactions_clust, output_folder_comparison)

Compare all clusters against each other.

Parameters:
  • reactions_clust (dictionary) – reactions in each cluster of the tree
  • output_folder_comparison (str) – path to output folder
padmet.utils.exploration.dendrogram_reactions_distance.create_cluster(reactions_dataframe, absence_presence_matrix, linkage_matrix)

Cut the dendrogram to create clusters.

Parameters:
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
  • linkage_matrix (ndarray) – linkage matrix
Returns:

dendrogram_fclusters – {number used to split the linkage matrix: ndarray with the corresponding clusters}

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intersection_files(root, cluster_leaf_species, reactions_dataframe, output_folder_tree_cluster, metacyc_to_ecs)

Create intersection files.

Parameters:
  • root (root) – root of the xml tree
  • cluster_leaf_species (dictionary) – for each leaf give the organisms in it
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • output_folder_tree_cluster (str) – path to the output folder
  • metacyc_to_ecs (dictionary) – mapping of MetaCyc reactions to EC numbers
Returns:

reactions_clust – reactions in each cluster of the tree

Return type:

dictionary

padmet.utils.exploration.dendrogram_reactions_distance.create_intervene_graph(absence_presence_matrix, reactions_dataframe, temp_data_folder, path_to_intervene, output_folder_upset, dendrogram_fclusters, k, verbose=False)

Create an UpSet graph.

Parameters:
  • absence_presence_matrix (pandas.DataFrame) – transposition of the reactions dataframe
  • reactions_dataframe (pandas.DataFrame) – dataframe containing absence/presence of reactions in organism
  • temp_data_folder (str) – temporary data folder
  • path_to_intervene (str) – path to the intervene binary
  • output_folder_upset (str) – path to output folder
  • dendrogram_fclusters (dictionary) – {number used to split the linkage matrix: ndarray with the corresponding clusters}
  • k (int) – number of clusters to create
padmet.utils.exploration.dendrogram_reactions_distance.dendrogram_biopython(condensed_distance_matrix_jaccard, organisms)

Create a lower triangular matrix, then create a Biopython dendrogram.

Parameters:
  • condensed_distance_matrix_jaccard (ndarray) – Condensed Jaccard distance matrix
  • organisms (list) – organisms names
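The lower-triangle step can be sketched in pure Python. This example expands a condensed vector, in the pair order produced by scipy's pdist, into the list-of-lists lower triangular form, diagonal included, that Bio.Phylo.TreeConstruction.DistanceMatrix accepts. The helper condensed_to_lower_triangle() is illustrative, and the Biopython tree-building call that follows is not shown.

```python
from typing import List, Sequence


def condensed_to_lower_triangle(condensed: Sequence[float], n: int) -> List[List[float]]:
    """Expand a condensed distance vector (scipy pdist order) into a lower
    triangular matrix, diagonal included."""
    if len(condensed) != n * (n - 1) // 2:
        raise ValueError("condensed vector length does not match n")
    matrix = []
    for i in range(n):
        row = []
        for j in range(i):
            # Position of the pair (j, i), j < i, in the condensed vector.
            k = n * j - j * (j + 1) // 2 + (i - j - 1)
            row.append(float(condensed[k]))
        row.append(0.0)  # distance of an organism to itself
        matrix.append(row)
    return matrix
```

The result can then be passed, together with the organism names, to DistanceMatrix(names, matrix). Which tree-construction method dendrogram_biopython() applies afterwards is not stated here.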
padmet.utils.exploration.dendrogram_reactions_distance.hclust_to_xml(linkage_matrix)

Using a linkage matrix from scipy, create an XML tree corresponding to the hierarchical clustering. Return the root of the tree.

Parameters: linkage_matrix (ndarray) – linkage matrix
Returns: root of the XML tree
Return type: root
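A sketch of turning a scipy linkage matrix into nested XML with the standard library. Each linkage row merges two clusters, and row i creates the cluster with index n_leaves + i. The element names ("leaf", "node") and attributes (id, name, distance) are assumptions for illustration and are not necessarily those produced by hclust_to_xml(). The helper linkage_to_xml() is likewise illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def linkage_to_xml(linkage_matrix, labels: Optional[Sequence[str]] = None) -> ET.Element:
    """Turn a scipy linkage matrix into a nested XML tree and return its root."""
    n_leaves = len(linkage_matrix) + 1
    # Open clusters, keyed by scipy cluster index (0..n-1 are the leaves).
    clusters = {}
    for i in range(n_leaves):
        name = labels[i] if labels is not None else str(i)
        clusters[i] = ET.Element("leaf", id=str(i), name=name)
    for step, row in enumerate(linkage_matrix):
        left, right, distance = int(row[0]), int(row[1]), float(row[2])
        # Row `step` creates the cluster with index n_leaves + step.
        node = ET.Element("node", id=str(n_leaves + step), distance=str(distance))
        node.append(clusters.pop(left))
        node.append(clusters.pop(right))
        clusters[n_leaves + step] = node
    # Only the final cluster, the root, is left.
    (root,) = clusters.values()
    return root
```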
padmet.utils.exploration.dendrogram_reactions_distance.pvclust_dendrogram(condensed_distance_matrix_jaccard, organisms, output_folder)

Using a distance matrix, create a dendrogram with bootstrap values with the pvclust R package (called through the rpy2 package).

Parameters:
  • condensed_distance_matrix_jaccard (ndarray) – Condensed Jaccard distance matrix
  • organisms (list) – organisms names
  • output_folder (str) – path to the output folder
padmet.utils.exploration.dendrogram_reactions_distance.reaction_figure_creation(reaction_file, output_folder, upset_cluster=None, padmetRef_file=None, pvclust=None, verbose=False)

Create the dendrogram, the UpSet figure (if upset_cluster is given) and compare the reactions of the species.

Parameters:
  • reaction_file (str) – path to the reaction file
  • upset_cluster (int) – the number of clusters you want in the intervene figure
  • output_folder (str) – path to the output folder
  • padmetRef_file (str) – path to the padmetRef file
  • pvclust (bool) – whether to launch the R pvclust dendrogram

flux_analysis

Description:

1./ Run a flux balance analysis with the cobra package on an already defined reaction. The ‘objective_coefficient’ value of this reaction must be set to 1 in the SBML file. If the reaction is reachable by flux, return the flux value and the flux value of each reactant of the reaction; if not, only return the flux value of each reactant. A reactant with a flux of ‘0’ is not reachable by flux (and maybe not topologically either). To unblock the reaction, fix the metabolic network by adding/removing reactions until all reactants are reachable.

2./ If seeds and targets are given as SBML files containing only compounds, also try to use the Menetools library for a topological analysis: the topological reachability of the target compounds from the seed compounds.

3./ If --all_species is set, test the flux reachability of all the compounds in the metabolic network (may take several minutes).
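Topological reachability (point 2./) is commonly computed by network expansion: starting from the seeds, fire every reaction whose reactants are all available and add its products, until nothing changes. The sketch below is a plain-Python illustration of that idea, not the Menetools implementation. It assumes a hypothetical {reaction_id: (reactants, products)} dict in which reversible reactions are listed once per direction, and topological_scope() and unreachable_targets() are illustrative names.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: reaction id -> (reactants, products).
Reactions = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def topological_scope(seeds: Iterable[str], reactions: Reactions) -> Set[str]:
    """Network expansion: all compounds producible from the seeds."""
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction can fire once all its reactants are in the scope.
            if reactants <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


def unreachable_targets(seeds: Iterable[str], targets: Iterable[str], reactions: Reactions) -> Set[str]:
    """Targets outside the topological scope of the seeds."""
    return set(targets) - topological_scope(seeds, reactions)
```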

padmet.utils.exploration.flux_analysis.fba_on_targets(allspecies, model)

For each species in allspecies, create an objective function with the current species as the only product, then try to optimize the model and get the flux.

Parameters:
  • allspecies (list) – list of species ids to test
  • model (cobra.Model) – cobra model from an SBML file
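The idea of one objective per species can be illustrated on a toy network without cobra. This sketch swaps in scipy.optimize.linprog as the LP solver, which is not what fba_on_targets() uses. For each metabolite it appends a temporary demand reaction consuming only that metabolite, then maximises that demand under the steady-state constraint S·v = 0. The stoichiometric matrix, the bounds and the helper max_flux_per_metabolite() are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np
from scipy.optimize import linprog


def max_flux_per_metabolite(S, bounds: Sequence[Tuple[float, float]], met_ids: List[str]) -> Dict[str, float]:
    """For each metabolite, add a temporary demand reaction consuming it,
    make that demand the only objective and maximise it under S.v = 0."""
    S = np.asarray(S, dtype=float)
    n_mets, _ = S.shape
    max_fluxes = {}
    for i, met in enumerate(met_ids):
        demand = np.zeros((n_mets, 1))
        demand[i, 0] = -1.0  # the demand reaction consumes `met` only
        S_ext = np.hstack([S, demand])
        # linprog minimises c.v, so minimise -v_demand to maximise v_demand.
        c = np.zeros(S_ext.shape[1])
        c[-1] = -1.0
        res = linprog(c, A_eq=S_ext, b_eq=np.zeros(n_mets),
                      bounds=list(bounds) + [(0, 1000)], method="highs")
        max_fluxes[met] = float(-res.fun) if res.success else 0.0
    return max_fluxes
```

Consider a toy network with metabolites A..E and reactions EX_A (imports A, at most 10), R1: A -> B, R2: B -> C and R3: E -> D. Here A, B and C reach a maximal flux of 10, while D and E get 0 because E has no source. A zero maximal flux is exactly the "not reachable by flux" case described above.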
padmet.utils.exploration.flux_analysis.flux_analysis(sbml_file, seeds_file=None, targets_file=None, all_species=False)

Run the analyses 1./ to 3./ described in the module description above.

Parameters:
  • sbml_file (str) – path to sbml file to analyse
  • seeds_file (str) – path to sbml file with only compounds representing the seeds/growth medium
  • targets_file (str) – path to sbml file with only compounds representing the targets to reach
  • all_species (bool) – if True, try to create an objective function for each compound and return those reachable by flux

get_pwy_from_rxn

Description:
From a file containing a list of reactions, return the pathways in which these reactions are involved. ex: if rxn-a is in pwy-x, return: pwy-x; all rxn ids in pwy-x; all rxn ids in pwy-x FROM the list; ratio
padmet.utils.exploration.get_pwy_from_rxn.dict_pwys_to_file(dict_pwy, output)

Create csv file from dict_pwy. dict_pwy is obtained with extract_pwys()

Parameters:
  • dict_pwy (dict) – dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x:{‘total_rxn’:[a,b,c], ‘rxn_from_list’:[a], ‘ratio’:1/3}}
  • output (str) – path to output file
padmet.utils.exploration.get_pwy_from_rxn.extract_pwys(padmet, reactions)

Extract from the padmet the pathways containing 1-n reactions from the set of reactions ‘reactions’. Return a dict of data: k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x:{‘total_rxn’:[a,b,c], ‘rxn_from_list’:[a], ‘ratio’:1/3}}

Parameters:
  • padmet (padmet.classes.PadmetSpec) – padmet to search
  • reactions (set) – set of reactions to match with pathways
Returns:

dict, k=pathway_id, v=dict: k in [total_rxn, rxn_from_list, ratio], ex: {pwy-x:{‘total_rxn’:[a,b,c], ‘rxn_from_list’:[a], ‘ratio’:1/3}}

Return type:

dict
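The structure of dict_pwy can be reproduced with a short sketch. It assumes the pathways are given as a hypothetical {pathway_id: [reaction ids]} dict instead of a PadmetSpec. The TSV column names and the delimiter in pwys_to_tsv() are assumptions, not the format written by dict_pwys_to_file(), and both helper names are illustrative.

```python
import csv
import io
from typing import Dict, Iterable, List


def extract_pwys_sketch(pathway_reactions: Dict[str, List[str]], reactions: Iterable[str]) -> dict:
    """Keep pathways sharing at least one reaction with `reactions`."""
    wanted = set(reactions)
    dict_pwy = {}
    for pwy_id, pwy_rxns in pathway_reactions.items():
        total_rxn = sorted(set(pwy_rxns))
        rxn_from_list = sorted(set(pwy_rxns) & wanted)
        if rxn_from_list:
            dict_pwy[pwy_id] = {
                "total_rxn": total_rxn,
                "rxn_from_list": rxn_from_list,
                # fraction of the pathway's reactions found in the list
                "ratio": len(rxn_from_list) / len(total_rxn),
            }
    return dict_pwy


def pwys_to_tsv(dict_pwy: dict) -> str:
    """Tab-separated text; the column names and delimiter are assumptions."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["pathway_id", "total_rxn", "rxn_from_list", "ratio"])
    for pwy_id, data in sorted(dict_pwy.items()):
        writer.writerow([pwy_id, ";".join(data["total_rxn"]),
                         ";".join(data["rxn_from_list"]), f"{data['ratio']:.2f}"])
    return buffer.getvalue()
```

With pwy-x = [a, b, c] and the reaction list {a}, this gives the ratio 1/3 from the example above.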

padmet_stats

Description:

Create a padmet stats file containing the number of pathways, reactions, genes and compounds inside the padmet.

The input is a padmet file or a folder containing multiple padmets.

Create a tsv file named padmet_stats.tsv in the directory where the script is launched.

padmet.utils.exploration.padmet_stats.compute_stats(padmet_file_folder)

Count reactions/pathways/compounds/genes in padmet(s).

Parameters: padmet_file_folder (str) – path to the padmet file/folder to analyze
padmet.utils.exploration.padmet_stats.orthology_result(padmet_file, padmet_names)

Count reactions/pathways/compounds/genes in a padmet file.

Parameters:
  • padmet_file (str) – path to a padmet file
  • padmet_names (list) – all the padmet filenames
Returns:

Number of reactions given by the other species

Return type:

pandas.DataFrame

padmet.utils.exploration.padmet_stats.padmet_stat(padmet_file)

Count reactions/pathways/compounds/genes in a padmet file.

Parameters: padmet_file (str) – path to a padmet file
Returns: [path to padmet, number of pathways, number of reactions, number of genes, number of compounds]
Return type: list

prot2genome

Description:
Prot2Genome contains functions used for blast analysis and padmet enrichment
padmet.utils.exploration.prot2genome.analysisOutput(analysis_result, analysis_output)
padmet.utils.exploration.prot2genome.cleanTmp(tmp_folder)

Remove all files from the tmp folder.

Parameters: tmp_folder (str) – path to the tmp folder where the faa of each gene to analyse is created
padmet.utils.exploration.prot2genome.createPadmet(dict_args)

Function used by each worker of the Pool in mp_createPadmet. The padmet files are updated using the function add_delete_rxn from padmet.utils.connection.manual_curation.

padmet.utils.exploration.prot2genome.extractAnalysis(blast_analysis_folder, spec_reactions_folder, output_folder)
For each analysis output in the blast analysis folder, obtained with runAllAnalysis():

1./ Extract the orthologue hits.
2./ For each specific reaction from spec_reactions_folder, if all the genes of the reaction have an orthologue hit, add the reaction to reactions_to_add.

Parameters:
  • blast_analysis_folder (str) – path to the folder with all blast analysis output files
  • spec_reactions_folder (str) – path to the folder with all files containing specific reactions
  • output_folder (str) – path to the folder where all reactions to add are extracted
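The "all genes must have a hit" selection can be sketched as follows. This example assumes the analysis results are reduced to a hypothetical dict {gene_id: {"blastp": bool, "tblastn": bool, "exonerate": bool}}. As the orthologue criterion it borrows the rule stated for fromAucome(): no blastp match, but tblastn and exonerate hits. The exact criterion used by extractAnalysis() may differ, and is_ortho_hit() and select_reactions_to_add() are illustrative names.

```python
from typing import Dict, List, Mapping


def is_ortho_hit(result: Mapping[str, bool]) -> bool:
    """Assumed rule: no blastp hit, but both tblastn and exonerate hits."""
    return (not result.get("blastp", False)) and bool(result.get("tblastn")) and bool(result.get("exonerate"))


def select_reactions_to_add(spec_reactions: Dict[str, List[str]],
                            gene_results: Dict[str, Mapping[str, bool]]) -> List[str]:
    """Keep a specific reaction only if every associated gene is a hit."""
    reactions_to_add = []
    for rxn_id, genes in spec_reactions.items():
        # Reactions without genes cannot be supported by any hit.
        if genes and all(g in gene_results and is_ortho_hit(gene_results[g]) for g in genes):
            reactions_to_add.append(rxn_id)
    return reactions_to_add
```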
padmet.utils.exploration.prot2genome.extractGenes(reactions_file)

Extract the gene ids from reactions_file, obtained with extractReactions(), and return them as a list.

Parameters: reactions_file (str) – path to the reaction file
padmet.utils.exploration.prot2genome.extractReactions(dict_args)

Function used by each worker of the Pool in mp_extractReactions, for org_a.padmet and org_b.padmet:

1./ Extract the reactions and the specific reactions (in org_a but not in org_b, and vice versa).
2./ Extract the genes associated with the specific reactions.
3./ Select a reaction only if it comes from annotation: for rxn-1 in org_a but not in org_b, if rxn-1 does not come from the org_a annotation, skip the reaction.
4./ Create the output file: header = [“reaction_id”, “genes_ids”, “sources”]
padmet.utils.exploration.prot2genome.fromAucome(run_folder, cpu, padmetRef, blastp=True, tblastn=True, exonerate=True, debug=False)

This function fits an AuCoMe run. Select an AuCoMe run folder and the function will:

1./ For each pair of studied organisms, extract the specific reactions.
ex: for org A and org B, extract the reactions in org A but not in org B, and vice versa.
2./ For each specific reaction, extract the associated genes and run blastp, tblastn and exonerate.
3./ For each reaction, if all the associated genes have no blastp match but have tblastn and exonerate hits, select the reaction as a hit.
4./ Create a new padmet file containing the new reactions to add.

Parameters:
  • run_folder (str) – path to aucome run folder
  • cpu (int) – number of cpu to use for multiprocessing steps
  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
  • blastp (bool) – If true run blastp during analysis
  • tblastn (bool) – If true run tblastn during analysis
  • exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
  • debug (bool) – if True, print all the raw information of the analysis
padmet.utils.exploration.prot2genome.mp_createPadmet(reactions_to_add_folder, padmet_folder, output_folder, padmetRef, cpu, verbose=False)

Update all the padmet files in padmet_folder with the reactions to add listed in the files of reactions_to_add_folder. The information about the reactions is extracted from padmetRef as the unique source. ex: for padmet_folder/org_a.padmet, select reactions_to_add_folder/org_a.csv and add each reaction listed in this file, based on padmetRef, to create output_folder/org_a.padmet. The padmet files are created in parallel: the more CPUs, the faster the new padmet files are created.

Parameters:
  • reactions_to_add_folder (str) – path folder with all files containing reactions to add for each studied organism
  • padmet_folder (str) – path to folder with all padmet files of studied organism
  • output_folder (str) – path to output folder where to create new padmet files
  • padmetRef (str) – path to padmetRef from where to extract and add the new reactions to create new padmet files
  • cpu (int) – number of cpu to use for multiprocessing steps
  • verbose (bool) – verbose
padmet.utils.exploration.prot2genome.mp_extractReactions(padmet_folder, output_folder, cpu)

From a folder of padmet files, create all pairwise combinations and extract the specific reactions into files in output_folder. ex: with org_a.padmet and org_b.padmet in padmet_folder, create output_folder/org_a_vs_org_b.csv and output_folder/org_b_vs_org_a.csv.

Parameters:
  • padmet_folder (str) – path to folder with all padmet files of studied organism
  • output_folder (str) – path to output folder where to extract specific reactions
  • cpu (int) – number of cpu to use for multiprocessing steps
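The pairwise step amounts to a set difference for every ordered pair of organisms. This sketch assumes each padmet has been reduced to a hypothetical {organism_name: set of reaction ids} dict. The annotation-source filter (step 3./ of extractReactions()), the gene extraction and the multiprocessing are not shown, and specific_reactions() is an illustrative name.

```python
from itertools import permutations
from typing import Dict, Set


def specific_reactions(padmet_reactions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For every ordered pair (a, b), reactions present in a but not in b,
    keyed by an output file name following the org_a_vs_org_b.csv pattern."""
    return {
        f"{org_a}_vs_{org_b}.csv": set(padmet_reactions[org_a]) - set(padmet_reactions[org_b])
        for org_a, org_b in permutations(sorted(padmet_reactions), 2)
    }
```

permutations(..., 2) yields both (org_a, org_b) and (org_b, org_a). This matches the two files created per pair in the example above.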
padmet.utils.exploration.prot2genome.mp_runAnalysis(spec_reactions_folder, studied_organisms_folder, output_folder, tmp_folder, cpu, blastp, tblastn, exonerate, debug)

Run different blast analyses based on files representing the specific reactions of 2 padmet files. For each specific reaction file in spec_reactions_folder (ex: org_a_vs_org_b.csv):

1./ Search for:
  • the faa file of org_a (studied_organisms_folder/org_a/org_a.faa)
  • the gbk file of org_b (studied_organisms_folder/org_b/org_b.gbk)
  • the faa file of org_b (studied_organisms_folder/org_b/org_b.faa)
  • the fna file of org_b (studied_organisms_folder/org_b/org_b.fna); if the fna does not exist, create it
2./ If the output file (blast_analysis_folder/org_a_VS_org_b.csv) does not already exist, run the analysis.
3./ Extract all gene ids from the specific reaction file with extractGenes().
4./ Run blastp, tblastn and exonerate on gene_id.faa vs target.faa / fna with runAllAnalysis().
5./ Create the analysis output.

The analysis creates many temporary files, all in tmp_folder, which is cleaned after the loop.

Parameters:
  • spec_reactions_folder (str) – path to the folder with all files containing specific reactions
  • studied_organisms_folder (str) – path to the folder with all the data of the studied organisms. It contains one folder per organism, named after the organism, each with org.gbk, org.faa and org.fna
  • output_folder (str) – path to output folder where to extract blast analysis
  • tmp_folder (str) – path to tmp folder where to create faa of each gene to analyse
  • cpu (int) – number of cpu to use for multiprocessing steps
  • blastp (bool) – If true run blastp during analysis
  • tblastn (bool) – If true run tblastn during analysis
  • exonerate (bool) – If true run exonerate during analysis, tblastn must also be True
  • debug (bool) – if True, print all the raw information of the analysis
padmet.utils.exploration.prot2genome.runAllAnalysis(dict_args)
For a given gene query id:
1/ Extract the sequence from query_faa and create a faa file output_folder/query_id.faa. If isoforms are found, also search for each specific isoform.

2/ If blastp, run blastp; if tblastn, run tblastn; if exonerate and tblastn has a hit, run exonerate. Extract the output of all of them as a dict of data.

Returns: list of dict with all the analysis outputs
Return type: list
padmet.utils.exploration.prot2genome.runBlastp(query_seq_faa, subject_faa, header=['sseqid', 'evalue', 'bitscore'], debug=False)

Run blastp on query_seq vs subject faa and return the output based on header. Use the NcbiblastpCommandline function to extract the output, and keep the first best hit based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • subject_faa (str) – path to subject fasta sequence
  • header (list) – output format of blastp
  • debug (bool) – if true print all raw blastp output
Returns:

dict of the best blastp hit, add ‘blastp_’ tag, or empty dict if no hit

Return type:

dict
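With a custom tabular format such as -outfmt "6 sseqid evalue bitscore", BLAST+ writes one tab-separated line per hit. This sketch shows the "best hit by bitscore, keys prefixed with a tag" step on such text. It is not the runBlastp()/runTblastn() code: running BLAST is not shown, and the helper best_hit() is illustrative.

```python
from typing import Dict, Sequence


def best_hit(tabular_output: str,
             header: Sequence[str] = ("sseqid", "evalue", "bitscore"),
             tag: str = "blastp_") -> Dict[str, str]:
    """Return the hit with the highest bitscore as {tag + column: value},
    or an empty dict if there is no hit."""
    bitscore_idx = list(header).index("bitscore")
    best: Dict[str, str] = {}
    best_score = float("-inf")
    for line in tabular_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        score = float(fields[bitscore_idx])
        # Strict '>' keeps the first line among equal bitscores.
        if score > best_score:
            best_score = score
            best = {tag + name: value for name, value in zip(header, fields)}
    return best
```

The same helper covers the tblastn case by passing tag="tblastn_".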

padmet.utils.exploration.prot2genome.runExonerate(query_seq_faa, sseq_seq_faa, output, debug=False)

Run exonerate on query_seq vs subject faa. Exonerate must be installed and the PATH environment variable must include exonerate/bin/, so that the command ‘exonerate’ works from the shell. sseq_seq_faa is obtained after the tblastn run, based on the tblastn_sseqid value.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • sseq_seq_faa (str) – path to subject faa sequence
  • output (str) – path to exonerate output
  • debug (bool) – if true print all raw exonerate output
Returns:

dict of the best exonerate hit, add ‘exonerate_’ tag, or empty dict if no hit

Return type:

dict

padmet.utils.exploration.prot2genome.runTblastn(query_seq_faa, subject_fna, header=['sseqid', 'evalue', 'bitscore'], debug=False)

Run tblastn on query_seq vs subject fna and return the output based on header. Use the NcbitblastnCommandline function to extract the output, and keep the first best hit based on bitscore.

Parameters:
  • query_seq_faa (str) – path to query fasta sequence
  • subject_fna (str) – path to subject fna sequence
  • header (list) – output format of tblastn
  • debug (bool) – if true print all raw tblastn output
Returns:

dict of the best tblastn hit, add ‘tblastn_’ tag, or empty dict if no hit

Return type:

dict

visu_path