Package Functions Help
DESCRIPTION
Type: Package
Package: EnrichGT
Title: EnrichGT - all in one enrichment analysis solution
Version: 2.2.0
Authors@R: c(
    person("Zhiming", "Ye", , "garnetcrow@hotmail.com", role = c("aut", "cre")),
    person("Runchen", "Wang", , "runchen_wang@163.com", role = "cph")
  )
Description: Performs biological enrichment analysis, then parses and clusters
    the enrichment results into insightful summaries, all in ONE package.
License: GPL-V3
URL: https://zhimingye.github.io/EnrichGT/
Depends:
    R (>= 4.1.0)
Imports:
    AnnotationDbi,
    cli,
    dplyr,
    ellmer,
    fgsea,
    fontawesome,
    forcats,
    ggdendro,
    ggplot2,
    ggwordcloud,
    glue,
    GO.db,
    graphics,
    grDevices,
    gt,
    htmltools,
    Matrix,
    methods,
    proxy,
    qvalue,
    R6,
    RColorBrewer,
    Rcpp,
    reactome.db,
    rlang,
    scales,
    stats,
    stringr,
    text2vec,
    tibble,
    utils,
    xfun
Suggests:
    BiocManager,
    readr,
    testthat (>= 3.0.0)
LinkingTo:
    Rcpp
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
comparison_reactor_base: Comparison Reactor Base
Description
Typically created via [egt_comparison_reactor](egt_comparison_reactor)(). A framework for comparing and analyzing enrichment results across multiple groups. This is the base class that provides core functionality for both ORA and GSEA analysis.
Details
The comparison reactor allows:
- Appending multiple enrichment results from different groups
- Filtering results by p-value or NES score
- Comparing results between groups
- Identifying relationships between enriched terms
- Visualizing biological themes
- Sub-clustering results for deeper analysis
Methods
Public methods
- comparison_reactor_base$new()
- comparison_reactor_base$append_enriched_result()
- comparison_reactor_base$summarize()
- comparison_reactor_base$prefilter_by_p_adj()
- comparison_reactor_base$find_relationship()
- comparison_reactor_base$fetch_relationship()
- comparison_reactor_base$fetch_biological_theme()
- comparison_reactor_base$split_by_cluster()
- comparison_reactor_base$get_splited_list()
- comparison_reactor_base$do_recluster()
- comparison_reactor_base$get_recluster_result()
- comparison_reactor_base$print()
- comparison_reactor_base$clone()
Method new()
Create a new comparison reactor base object
Usage
comparison_reactor_base$new(Type = NULL)
Arguments
Type: The analysis type (“ORA” or “GSEA”)
Method append_enriched_result()
Add enrichment results to the reactor
Usage
comparison_reactor_base$append_enriched_result(x, group)
Arguments
x: Data.frame of enrichment results (must contain p-value/adjusted p-value columns)
group: Character string naming the group for these results
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group1")
}
Method summarize()
Print summary of groups in reactor
Usage
comparison_reactor_base$summarize()
Method prefilter_by_p_adj()
Filter enrichment results by adjusted p-value cutoff
Usage
comparison_reactor_base$prefilter_by_p_adj(x = 0.05)
Arguments
x: Numeric cutoff for adjusted p-values (default 0.05)
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
reactor$prefilter_by_p_adj(0.01) # Use stricter cutoff
}
Method find_relationship()
Identify relationships between enriched terms across groups
Usage
comparison_reactor_base$find_relationship(
Num = NULL,
dist_method = "euclidean",
hclust_method = "ward.D2",
...
)
Arguments
Num: Number of top terms to consider from each group
dist_method: Distance method for clustering (default “euclidean”)
hclust_method: Hierarchical clustering method (default “ward.D2”)
...: Additional parameters passed to heatmap function
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
reactor$find_relationship(Num = 5, dist_method = "manhattan")
}
Method fetch_relationship()
Retrieve relationship data between terms
Usage
comparison_reactor_base$fetch_relationship()
Returns
Data.frame containing term relationships and cluster assignments
Examples
\dontrun{
relation_df <- reactor$fetch_relationship()
head(relation_df)
}
Method fetch_biological_theme()
Generate wordcloud visualization of biological themes
Usage
comparison_reactor_base$fetch_biological_theme(...)
Arguments
...: Additional parameters passed to wordcloud generator
Returns
List containing ggplot2 wordcloud objects
Examples
\dontrun{
wordclouds <- reactor$fetch_biological_theme()
wordclouds[[1]] # View first wordcloud
}
Method split_by_cluster()
Split results by identified clusters for further analysis
Usage
comparison_reactor_base$split_by_cluster(...)
Arguments
...: Additional parameters
Returns
The reactor object (invisible) for method chaining
Method get_splited_list()
Get list of results split by cluster
Usage
comparison_reactor_base$get_splited_list()
Returns
List of data frames split by cluster
Method do_recluster()
Perform sub-clustering within existing clusters
Usage
comparison_reactor_base$do_recluster(
ClusterNum = 10,
P.adj = 0.05,
force = F,
nTop = 10,
method = "ward.D2",
...
)
Arguments
ClusterNum: Number of sub-clusters to generate
P.adj: Adjusted p-value cutoff (default 0.05)
force: Whether to force reclustering (default FALSE)
nTop: Number of top terms to consider (default 10)
method: Clustering method (default “ward.D2”)
...: Additional parameters passed to clustering functions
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
reactor$do_recluster(ClusterNum = 5, method = "complete")
}
Method get_recluster_result()
Retrieve sub-clustering results
Usage
comparison_reactor_base$get_recluster_result()
Returns
List containing reclustering results
Examples
\dontrun{
recluster_results <- reactor$get_recluster_result()
names(recluster_results)
}
Method print()
Usage
comparison_reactor_base$print(...)
Method clone()
The objects of this class are cloneable with this method.
Usage
comparison_reactor_base$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `comparison_reactor_base$append_enriched_result`
## ------------------------------------------------
reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group1")
## ------------------------------------------------
## Method `comparison_reactor_base$prefilter_by_p_adj`
## ------------------------------------------------
reactor$prefilter_by_p_adj(0.01) # Use stricter cutoff
## ------------------------------------------------
## Method `comparison_reactor_base$find_relationship`
## ------------------------------------------------
reactor$find_relationship(Num = 5, dist_method = "manhattan")
## ------------------------------------------------
## Method `comparison_reactor_base$fetch_relationship`
## ------------------------------------------------
relation_df <- reactor$fetch_relationship()
head(relation_df)
## ------------------------------------------------
## Method `comparison_reactor_base$fetch_biological_theme`
## ------------------------------------------------
wordclouds <- reactor$fetch_biological_theme()
wordclouds[[1]] # View first wordcloud
## ------------------------------------------------
## Method `comparison_reactor_base$do_recluster`
## ------------------------------------------------
reactor$do_recluster(ClusterNum = 5, method = "complete")
## ------------------------------------------------
## Method `comparison_reactor_base$get_recluster_result`
## ------------------------------------------------
recluster_results <- reactor$get_recluster_result()
names(recluster_results)
comparison_reactor_gsea: GSEA Comparison Reactor
Description
Typically created via [egt_comparison_reactor](egt_comparison_reactor)("GSEA")
Details
This reactor is optimized for comparing GSEA results across multiple groups, with methods tailored for NES (Normalized Enrichment Score) based comparisons.
Super class
[EnrichGT::comparison_reactor_base](https://rdrr.io/pkg/EnrichGT/man/comparison_reactor_base.html) -> comparison_reactor_gsea
Methods
Public methods
Method new()
Create a new GSEA comparison reactor
Usage
comparison_reactor_gsea$new()
Method make_plans()
Create comparison plans between specified GSEA groups
Usage
comparison_reactor_gsea$make_plans(group = "auto", use_value = "NES")
Arguments
group: Character vector of group names to compare or “auto” for all groups
use_value: Which value to use for comparison (“NES” for normalized enrichment score)
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
gsea_reactor$make_plans(group = c("group1", "group2"))
}
Method prefilter_by_NES()
Filter GSEA results by normalized enrichment score cutoff
Usage
comparison_reactor_gsea$prefilter_by_NES(x = 1)
Arguments
x: Numeric cutoff for absolute NES values (default 1)
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
gsea_reactor$prefilter_by_NES(1.5) # Filter for stronger effects
}
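As a rough illustration of what an NES-based prefilter amounts to, the sketch below applies an absolute-NES cutoff to a toy data frame in base R. The data frame `gsea_toy` and the use of `>=` at the boundary are assumptions made for illustration; the reactor's internal comparison may differ.

```r
# Toy GSEA-style table (hypothetical values, not real results)
gsea_toy <- data.frame(
  ID       = c("P1", "P2", "P3"),
  NES      = c(2.1, -1.7, 0.4),
  p.adjust = c(0.01, 0.03, 0.20)
)

# Keep terms whose absolute NES reaches the cutoff, in either direction
nes_cutoff <- 1.5
kept <- gsea_toy[abs(gsea_toy$NES) >= nes_cutoff, ]
print(kept)
```

Filtering on the absolute value keeps strongly negative scores (here P2) as well as strongly positive ones.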
Method clone()
The objects of this class are cloneable with this method.
Usage
comparison_reactor_gsea$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See also
[comparison_reactor_base](comparison_reactor_base) for inherited methods.
A specialized reactor for comparing Gene Set Enrichment Analysis (GSEA) results. Inherits from comparison_reactor_base and provides GSEA-specific functionality.
Examples
## ------------------------------------------------
## Method `comparison_reactor_gsea$make_plans`
## ------------------------------------------------
gsea_reactor$make_plans(group = c("group1", "group2"))
## ------------------------------------------------
## Method `comparison_reactor_gsea$prefilter_by_NES`
## ------------------------------------------------
gsea_reactor$prefilter_by_NES(1.5) # Filter for stronger effects
comparison_reactor_ora: ORA Comparison Reactor
Description
Typically created via [egt_comparison_reactor](egt_comparison_reactor)("ORA")
Details
This reactor is optimized for comparing ORA results across multiple groups, with methods tailored for p-value based comparisons.
Super class
[EnrichGT::comparison_reactor_base](https://rdrr.io/pkg/EnrichGT/man/comparison_reactor_base.html) -> comparison_reactor_ora
Methods
Public methods
Method new()
Create a new ORA comparison reactor
Usage
comparison_reactor_ora$new()
Method make_plans()
Create comparison plans between specified ORA groups
Usage
comparison_reactor_ora$make_plans(group = "auto", use_value = "p")
Arguments
group: Character vector of group names to compare or “auto” for all groups
use_value: Which value to use for comparison (“p” for p-value or “padj” for adjusted p-value)
Returns
The reactor object (invisible) for method chaining
Examples
\dontrun{
ora_reactor$make_plans(group = c("group1", "group2"), use_value = "padj")
}
Method clone()
The objects of this class are cloneable with this method.
Usage
comparison_reactor_ora$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See also
[comparison_reactor_base](comparison_reactor_base) for inherited methods.
A specialized reactor for comparing Over-Representation Analysis (ORA) results. Inherits from comparison_reactor_base and provides ORA-specific functionality.
Examples
## ------------------------------------------------
## Method `comparison_reactor_ora$make_plans`
## ------------------------------------------------
ora_reactor$make_plans(group = c("group1", "group2"), use_value = "padj")
convert_annotations_genes: Convert gene annotations from any keys to any keys
Description
Convert gene annotations from any keys to any keys
Usage
convert_annotations_genes(genes, from_what, to_what, OrgDB)
Arguments
genes: Gene vector
from_what: Input key type (such as “SYMBOL”, “ENTREZID”, “ENSEMBL”, “GENENAME”, …). Keys must be supported by AnnotationDbi; see the AnnotationDbi help pages for details.
to_what: Output key type (such as “SYMBOL”, “ENTREZID”, “ENSEMBL”, “GENENAME”, …). Keys must be supported by AnnotationDbi; see the AnnotationDbi help pages for details. Can be multiple items, e.g. c("ENTREZID","ENSEMBL","GENENAME").
OrgDB: Annotation database: human = org.Hs.eg.db, mouse = org.Mm.eg.db. Search the Bioconductor website for other organisms.
Value
a data.frame
database_from_gmt: Parse GMT format gene set files
Description
Reads gene set files in GMT format (e.g., from MSigDB or WikiPathways) and converts them to a data frame suitable for enrichment analysis. Can optionally convert ENTREZ IDs to gene symbols.
Usage
database_from_gmt(gmtfile, OrgDB = NULL, convert_2_symbols = T)
Arguments
gmtfile: Path to GMT format file
OrgDB: Annotation database for ID conversion (e.g., org.Hs.eg.db for human). Required if convert_2_symbols=TRUE.
convert_2_symbols: Logical indicating whether to convert ENTREZ IDs to gene symbols. Default is TRUE.
Value
A data frame with columns:
- term: Gene set name
- gene: Gene identifiers (symbols or ENTREZ IDs)
If input has 3 columns, includes an additional ID column.
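For readers unfamiliar with the GMT layout (one gene set per line: name, description, then tab-separated genes), here is a minimal base-R sketch of turning such a file into a long term/gene table. It is not the package's parser; the temporary file, the set names, and the example.org URLs are made up for illustration, and no ID conversion is attempted.

```r
# Write a tiny two-line GMT file to a temporary location (hypothetical content)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c("SET_A\thttp://example.org/a\tTP53\tBRCA1",
    "SET_B\thttp://example.org/b\tEGFR"),
  gmt_path
)

# Split each line on tabs: field 1 = set name, field 2 = description, rest = genes
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)

# Build one row per (term, gene) pair
gmt_long <- do.call(rbind, lapply(fields, function(f) {
  data.frame(term = f[1], gene = f[-(1:2)], stringsAsFactors = FALSE)
}))
print(gmt_long)
```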
Examples
# Read MSigDB hallmark gene sets
gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "EnrichGT")
gene_sets <- database_from_gmt(gmt_file)
# Read WikiPathways with ENTREZ to symbol conversion
gmt_file <- "wikipathways-20220310-gmt-Homo_sapiens.gmt"
gene_sets <- database_from_gmt(gmt_file, OrgDB = org.Hs.eg.db)
egt_compare_groups: 2-Group Comparison of enrichment results and further clustering and visualizing
Description
See ?egt_enrichment_analysis()
Usage
egt_compare_groups(
obj.test,
obj.ctrl,
name.test = NULL,
name.ctrl = NULL,
ClusterNum = 15,
P.adj = 0.05,
force = F,
nTop = 10,
method = "ward.D2",
...
)
Arguments
obj.test: The enriched object from the tested group. WARNING: obj.test and obj.ctrl should come from the same database (e.g. GO Biological Process (GOBP)).
obj.ctrl: The enriched object from the control group. WARNING: obj.test and obj.ctrl should come from the same database (e.g. GO Biological Process (GOBP)).
name.test: Optional, the name of the testing group. If NULL, the object name of obj.test will be used.
name.ctrl: Optional, the name of the control group. If NULL, the object name of obj.ctrl will be used.
ClusterNum: How many clusters to generate.
P.adj: p.adjust cut-off. To avoid slow visualization, you can use a stricter cut-off.
force: Ignore all automatic self-checks.
nTop: Keep the top n items by p-adj in each cluster.
method: The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC).
...: Other options.
Details
Execute obj.test VS obj.ctrl tests, showing pathway overlaps (or differences) and meta-gene modules of test group and control group. Supports ORA and GSEA results (enriched object or data.frame). !WARNING!: obj.test and obj.ctrl should come from same database (e.g. GO Biological Process(GOBP)).
Value
List containing multiple EnrichGT_obj objects: those holding the overlapping enriched terms and those holding the unique enriched terms.
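The overlap/unique split described above can be pictured with plain set operations on term names. The sketch below is only a conceptual illustration in base R with made-up term vectors; it does not reproduce how egt_compare_groups builds its EnrichGT_obj results.

```r
# Hypothetical enriched term names from a test and a control group
test_terms <- c("DNA repair", "cell cycle", "apoptosis")
ctrl_terms <- c("cell cycle", "apoptosis", "angiogenesis")

overlap   <- intersect(test_terms, ctrl_terms)  # enriched in both groups
only_test <- setdiff(test_terms, ctrl_terms)    # enriched only in the test group
only_ctrl <- setdiff(ctrl_terms, test_terms)    # enriched only in the control group

str(list(overlap = overlap, only_test = only_test, only_ctrl = only_ctrl))
```

Such a comparison is only meaningful when both term lists come from the same database, which is why the warning above matters.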
egt_comparison_reactor: Framework for comparing and analyzing enrichment results across multiple groups
Description
Framework for comparing and analyzing enrichment results across multiple groups
Usage
egt_comparison_reactor(Type = NULL)
Details
Type should be one of “ORA” or “GSEA”. For the methods available inside the reactor, see the ‘See also’ section at the bottom of this page.
See also
[comparison_reactor_base](comparison_reactor_base) for the base class documentation
[comparison_reactor_ora](comparison_reactor_ora) for ORA-specific functionality
[comparison_reactor_gsea](comparison_reactor_gsea) for GSEA-specific functionality
Examples
# ORA example
reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group_1_go")
reactor$append_enriched_result(ora_result3, "group_3_go")
reactor$prefilter_by_p_adj(0.05)
reactor$make_plans(group = c("group_1_go","group_3_go"), use_value = "p")
reactor$find_relationship(Num = 3)
wordcloudFigure <- reactor$fetch_biological_theme()
wordcloudFigure[[1]]
reactor$split_by_cluster()
reactor$do_recluster(ClusterNum = 10)
cls_res <- reactor$get_recluster_result()
relation_df <- reactor$fetch_relationship()
# GSEA can do more with:
gsea_reactor <- egt_comparison_reactor("GSEA")
gsea_reactor$prefilter_by_NES(1.5)
egt_enrichment_analysis: Perform Over-Representation Analysis (ORA)
Description
ORA compares the proportion of genes in your target list that belong to specific categories (pathways, GO terms etc.) against the expected proportion in a background set. This implementation uses hash tables for efficient gene counting and supports parallel processing for analyzing multiple gene lists.
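To make the "observed versus expected proportion" idea concrete, the following base-R sketch runs a one-sided hypergeometric test for a single gene set and then applies a Benjamini-Hochberg correction. The gene names, set sizes, and the two extra p-values are invented for illustration; this is a textbook formulation, not the package's C++ implementation.

```r
# Toy universe: 20 background genes, one 5-gene set, a 4-gene query list
background <- paste0("G", 1:20)
gene_set   <- paste0("G", 1:5)
query      <- c("G1", "G2", "G3", "G10")

# Number of query genes that fall inside the gene set
overlap <- length(intersect(query, gene_set))

# P(X >= overlap) when drawing length(query) genes from the background
p_value <- phyper(
  overlap - 1,
  m = length(gene_set),
  n = length(background) - length(gene_set),
  k = length(query),
  lower.tail = FALSE
)

# Adjust across several gene sets (the other two p-values are placeholders)
p_adj <- p.adjust(c(p_value, 0.20, 0.80), method = "BH")
print(p_value)
print(p_adj)
```

The `overlap - 1` shift is what makes the upper tail include the observed overlap itself.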
Usage
egt_enrichment_analysis(
genes,
database,
p_adj_methods = "BH",
p_val_cut_off = 0.5,
background_genes = NULL,
min_geneset_size = 10,
max_geneset_size = 500,
multi_cores = 0
)
Arguments
genes: Input genes, either:
- Character vector of gene IDs (e.g., c("TP53","BRCA1"))
- Named numeric vector from genes_with_weights() (will split by expression direction)
- List of gene vectors for multiple comparisons (e.g., by cell type)
database: Gene set database, either:
- Built-in database from database_GO_BP(), database_KEGG() etc.
- Custom data frame with columns: (ID, Pathway_Name, Genes) or (Pathway_Name, Genes)
- GMT file loaded via database_from_gmt()
p_adj_methods: Multiple testing correction method (default “BH” for Benjamini-Hochberg)
p_val_cut_off: Adjusted p-value cutoff (default 0.5)
background_genes: Custom background genes (default: all genes in database)
min_geneset_size: Minimum genes per set (default 10)
max_geneset_size: Maximum genes per set (default 500)
multi_cores: (Please don’t use this since it has several known bugs) Number of cores for parallel processing (default 0 = serial)
Details
Identifies enriched biological pathways or gene sets in a gene list using a high-performance C++ implementation with parallel processing support.
Value
A data frame with columns:
- ID: Gene set identifier
- Description: Gene set name
- GeneRatio: Enriched genes / input genes
- BgRatio: Set genes / background genes
- pvalue: Raw p-value
- p.adjust: Adjusted p-value
- geneID: Enriched genes
- Count: Number of enriched genes
For weighted input, additional columns show up/down regulated genes.
Examples
# Basic ORA with GO Biological Processes
genes <- c("TP53", "BRCA1", "EGFR", "CDK2")
res <- egt_enrichment_analysis(genes, database_GO_BP())
# ORA with DEG results (split by direction)
deg_genes <- genes_with_weights(DEG$gene, DEG$log2FC)
res <- egt_enrichment_analysis(deg_genes, database_KEGG())
# Multi-group ORA (multi_cores = 0 runs serially)
gene_lists <- list(
Macrophages = c("CD68", "CD163", "CD169"),
Fibroblasts = c("COL1A1", "COL1A2", "ACTA2")
)
res <- egt_enrichment_analysis(gene_lists, database_Reactome(), multi_cores=0)
egt_fetch_biological_theme: Generate Biological Theme Wordcloud from Enrichment Results
Description
Creates a wordcloud visualization from enrichment results, either from a data.frame or an EnrichGT_obj. For data.frames, filters by p.adjust < 0.05 by default.
Usage
egt_fetch_biological_theme(x, cluster = NULL, skip_filtering = F, ...)
Arguments
x: Input object - either a data.frame with enrichment results or an EnrichGT_obj
cluster: Cluster name/number (required if x is EnrichGT_obj)
skip_filtering: Logical, if TRUE skips filtering of data.frame by p.adjust
...: Additional arguments passed to wordcloud_generator2
Value
ggwordcloud object
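A wordcloud of this kind is ultimately driven by how often each word appears in the enriched term descriptions. The base-R sketch below shows one simple way to compute such frequencies; the descriptions and the short stop-word list are assumptions for illustration, and the package's own tokenization and weighting may differ.

```r
# Hypothetical enriched term descriptions
descriptions <- c(
  "regulation of cell cycle",
  "cell cycle checkpoint",
  "DNA repair"
)

# Lower-case, split on whitespace, and drop a few common filler words
words <- unlist(strsplit(tolower(descriptions), "\\s+"))
words <- words[!words %in% c("of", "the", "and", "in", "to")]

# Word frequencies, most frequent first; these would set the word sizes
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```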
egt_fetch_termwise_relationship: Fetch and visualize termwise relationships in enrichment analysis results
Description
This function analyzes the relationships between enriched terms and creates a hierarchical clustering visualization. Please avoid querying too large a dataset.
Usage
egt_fetch_termwise_relationship(
x,
database,
according_to = "Description",
ClusterNum = 4,
method = "ward.D2",
force = F,
maxLength = 40
)
Arguments
x: A character vector of terms to analyze
database: Query database. For example, database_GO_BP(org.Hs.eg.db). If you are querying a fused result, you can input a list like list(database_GO_BP(org.Hs.eg.db), database_Reactome(org.Hs.eg.db)).
according_to: Character string specifying which column to use for filtering, default is “Description”
ClusterNum: Integer specifying the number of clusters, default is 4
method: Character string specifying the hierarchical clustering method, default is “ward.D2”
force: Skip checking the length of x. Large queries will cause long calculation times and poor results.
maxLength: Wrap terms longer than this length
Value
A list containing:
Examples
# Example usage:
result <- egt_fetch_termwise_relationship(
x = c("pathway1", "pathway2"),
database = pathway_db,
according_to = "Description"
)
result$Figure
egt_generate_quarto_report: Export Quarto Report
Description
Export Quarto Report
Usage
egt_generate_quarto_report(
re_enrichment_results,
output_path = "Report.qmd",
type = "html"
)
Arguments
re_enrichment_results: The EnrichGT_obj; an AI-summarized result is recommended.
output_path: Path of the output qmd file (e.g., test.qmd)
type: Export format, pdf or html. The default output is HTML.
Value
A quarto document
egt_gsea_analysis: Perform Gene Set Enrichment Analysis (GSEA)
Description
GSEA analyzes whether predefined gene sets show statistically significant enrichment at the top or bottom of a ranked gene list. This implementation uses the fast fgsea algorithm from the fgsea package.
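The "enrichment at the top or bottom of a ranked list" idea can be seen in a running-sum enrichment score. The base-R sketch below computes the classic weighted running sum for one toy gene set; the gene names and statistics are invented, and it omits the normalization and p-value estimation that fgsea performs, so it should be read as a conceptual illustration only.

```r
# Ranked statistics in descending order; names are hypothetical gene IDs
stats    <- c(A = 3, B = 2, C = 1, D = -1, E = -2, F = -3)
gene_set <- c("A", "B")
in_set   <- names(stats) %in% gene_set

# Weight hits by |statistic|^gsea_param (1 = standard weighting)
gsea_param <- 1
w <- abs(stats)^gsea_param

# Step up at genes in the set, step down at genes outside it
hit     <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])
miss    <- cumsum(!in_set) / sum(!in_set)
running <- hit - miss
names(running) <- names(stats)

# Enrichment score = running sum at its largest absolute deviation from zero
peak <- which.max(abs(running))
es   <- unname(running[peak])
print(running)
print(es)
```

Because both set members sit at the very top of the ranking, the running sum peaks early, which is the pattern GSEA reports as positive enrichment.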
Usage
egt_gsea_analysis(
genes,
database,
p_val_cut_off = 0.5,
min_geneset_size = 10,
max_geneset_size = 500,
gseaParam = 1
)
Arguments
genes: A named numeric vector of ranked genes where:
- Names are gene identifiers
- Values are ranking metric (e.g., log2 fold change, PCA loading)
Must be sorted in descending order (use genes_with_weights() to prepare)
database: Gene set database, either:
- Built-in database from database_GO_BP(), database_KEGG() etc.
- Custom data frame with columns: (ID, Pathway_Name, Genes) or (Pathway_Name, Genes)
- GMT file loaded via database_from_gmt()
p_val_cut_off: Adjusted p-value cutoff (default 0.5)
min_geneset_size: Minimum genes per set (default 10)
max_geneset_size: Maximum genes per set (default 500)
gseaParam: GSEA parameter controlling weight of ranking (default 1)
p_adj_method: Multiple testing correction method (default “BH” for Benjamini-Hochberg)
Details
Identifies enriched biological pathways in a ranked gene list using the fgsea algorithm.
Value
A data frame with columns:
- ID: Gene set identifier
- Description: Gene set name
- ES: Enrichment score
- NES: Normalized enrichment score
- pvalue: Raw p-value
- p.adjust: Adjusted p-value
- core_enrichment: Leading edge genes
Examples
# Using differential expression results
ranked_genes <- genes_with_weights(DEG$gene, DEG$log2FC)
res <- egt_gsea_analysis(ranked_genes, database_GO_BP())
# Using PCA loadings
ranked_genes <- genes_with_weights(rownames(pca$rotation), pca$rotation[,1])
res <- egt_gsea_analysis(ranked_genes, database_KEGG())
# Custom gene sets from GMT file
ranked_genes <- genes_with_weights(genes, weights)
res <- egt_gsea_analysis(ranked_genes, database_from_gmt("pathways.gmt"))
egt_infer_act: Inferring pathway or transcription factor activity from EnrichGT meta-gene modules
Description
Only supports gene symbols. PROGENy is a comprehensive resource containing a curated collection of pathways and their target genes, with weights for each interaction. CollecTRI is a comprehensive resource containing a curated collection of TFs and their transcriptional targets compiled from 12 different resources. This collection provides an increased coverage of transcription factors and a superior performance in identifying perturbed TFs compared to our previous. If you select a high number of clusters during re-enrichment, each meta-gene module may contain too few genes for activity to be inferred successfully. So if the result is empty, please reduce the number of clusters when re-clustering.
Usage
egt_infer_act(x, DB = "collectri", species = "human")
Arguments
x: An EnrichGT_obj object.
DB: Can be “progeny” (the pathway activity database) or “collectri” (the TF activity database)
species: Can be “human” or “mouse”
Value
an ORA result list
egt_llm_multi_summary: Compare Multiple LLM Summaries
Description
Generate summaries using multiple LLM models and create a comparison object.
Usage
egt_llm_multi_summary(
x,
chat_list,
lang = "English",
background_knowledges = NULL,
comparison_prompt = NULL
)
Arguments
x: An EnrichGT_obj object from egt_recluster_analysis()
chat_list: A named list of LLM chat objects
lang: Language for summaries (“English” or “Chinese”)
background_knowledges: Additional reference material (e.g., papers or guides) for the LLM
comparison_prompt: Custom prompt for generating comparison summary
Value
EnrichGT_obj with LLM_Comparison slot filled
egt_llm_summary: Summarize EnrichGT results using LLM
Description
This function uses a Large Language Model (LLM) to generate summaries for pathway clusters and gene modules in an EnrichGT_obj object.
Usage
egt_llm_summary(
x,
chat,
lang = "English",
model_name = NULL,
background_knowledges = NULL
)
Arguments
x: An EnrichGT_obj object created by [egt_recluster_analysis](egt_recluster_analysis).
chat: An LLM chat object created by the ellmer package.
lang: Language passed to the LLM. Can be English or Chinese.
background_knowledges: Additional reference material (e.g., papers or guides) for the LLM. A single character string should be provided.
Note
It is recommended not to add system prompts when creating the chat object. The function provides its own carefully crafted prompts for biological analysis.
References
For more information about creating chat objects, see the ellmer package documentation.
See also
[egt_recluster_analysis](egt_recluster_analysis) to create the input object.
Value
Returns the input EnrichGT_obj object with added LLM annotations in the LLM_Annotation slot. The annotations include:
pathways: Summaries of pathway clusters
genes_and_title: Summaries of gene modules and their titles
Examples
# Create LLM chat object (chat_deepseek() comes from the ellmer package)
library(ellmer)
chat <- chat_deepseek(api_key = YOUR_API_KEY, model = "deepseek-chat", system_prompt = "")
# Run enrichment analysis and get EnrichGT_obj
re_enrichment_results <- egt_recluster_analysis(...)
# Get LLM summaries
re_enrichment_results <- egt_llm_summary(re_enrichment_results, chat)
egt_plot_gsea: Generate GSEA Enrichment Plots
Description
This function creates graphical representations of Gene Set Enrichment Analysis (GSEA) results, including either a multi-panel GSEA table plot for multiple pathways or a single pathway enrichment plot. The visualization leverages the fgsea package’s plotting functions.
Usage
egt_plot_gsea(resGSEA$Description[1], genes = genes_with_weights(genes = DEGexample$...1, weights = DEGexample$log2FoldChange), database = database_GO_BP(org.Hs.eg.db))
egt_plot_gsea(resGSEA[1:8, ], genes = genes_with_weights(genes = DEGexample$...1, weights = DEGexample$log2FoldChange), database = database_GO_BP(org.Hs.eg.db))
Arguments
x: A GSEA result object. Can be either:
- A data frame containing GSEA results (requires columns: pvalue, p.adjust, Description)
- A character string specifying a single pathway name
genes: A named numeric vector from genes_with_weights(). These should match the gene identifiers used in the GSEA analysis.
database: A database data.frame. You can obtain it from the database_xxx() functions. This should correspond to the database used in the original GSEA analysis.
Value
A ggplot object:
- When x is a data frame: Returns a multi-panel plot showing normalized enrichment scores (NES), p-values, and leading edge plots for top pathways
- When x is a pathway name: Returns an enrichment plot showing the running enrichment score for the specified pathway
egt_plot_results: Visualize enrichment results using simple plot
Description
Like enrichplot::dotplot(), this plot is the most widely used way to visualize enriched terms. It shows the enrichment scores (e.g. p values) and gene ratio or NES as dot size and color, or as bar height. Users can specify the number of terms using ntop, and the colors for the lowest and highest values via low.col and hi.col.
Usage
egt_plot_results(
x,
ntop = NULL,
showIDs = F,
max_len_descript = 40,
keepAll = F,
maskNoise = 3,
...,
P.adj = NULL
)
Arguments
x: A data frame from an enrichment result such as egt_enrichment_analysis() or egt_gsea_analysis(), or a re-clustered EnrichGT object
ntop: Show the top N terms in each cluster. By default, the top 15 are shown for an original enrichment result and the top 5 in each cluster for a re-clustered object.
showIDs: Logical, whether to show pathway IDs. Default is FALSE
max_len_descript: Maximum label length, default 40.
keepAll: Whether or not to filter terms to avoid overlap of the same genes
maskNoise: (Only works with re-enriched objects) Cut-off value to mask rare populations in the cluster tree. Populations below this value in a specific child tree are ignored (because they may be hit only by coincidence). Set maskNoise = 0 to disable this.
...: Other parameters
P.adj: (Only works with an original data.frame) When passing a data.frame from an original enrichment result, you can specify the adjusted p-value cut-off. If NULL, the default is 0.05. When passing an EnrichGT_obj, this filter has already been applied by egt_recluster_analysis.
low.col: The color for the lowest
hi.col: The color for the highest
Value
a ggplot2 object
egt_recluster_analysis: Cluster and re-enrich enrichment results
Description
Performs hierarchical clustering on enrichment results (ORA or GSEA) based on gene-term associations to reduce redundancy and improve biological interpretation. The function helps identify coherent groups of related terms while preserving important but less significant findings.
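The idea of clustering terms by their gene-term associations can be sketched in base R: build a binary term-by-gene membership matrix, compute pairwise distances, and cut the resulting tree. The term names, gene names, and the choice of the binary (Jaccard-style) distance are assumptions for illustration; the package's own similarity measure and preprocessing may differ.

```r
# Hypothetical enriched terms and their member genes
term_genes <- list(
  T1 = c("A", "B", "C"),
  T2 = c("A", "B", "D"),
  T3 = c("X", "Y"),
  T4 = c("X", "Y", "Z")
)

# Binary membership matrix: rows = terms, columns = genes
all_genes  <- sort(unique(unlist(term_genes)))
membership <- t(sapply(term_genes, function(g) as.integer(all_genes %in% g)))
colnames(membership) <- all_genes

# Terms sharing many genes get a small distance and merge early
tree     <- hclust(dist(membership, method = "binary"), method = "ward.D2")
clusters <- cutree(tree, k = 2)
print(clusters)
```

Here T1/T2 and T3/T4 share most of their genes, so cutting the tree at two clusters separates the two redundant pairs.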
Usage
egt_recluster_analysis(
x,
ClusterNum = 10,
P.adj = 0.05,
force = F,
nTop = 10,
method = "ward.D2",
...
)
Arguments
x: Enrichment result from EnrichGT or clusterProfiler. For multi-database results, provide a list.
ClusterNum: Number of clusters to create (default: 10).
P.adj: Adjusted p-value cutoff (default: 0.05). Stricter values improve performance.
force: Logical to bypass validation checks (default: FALSE).
nTop: Number of top terms to keep per cluster by p-value (default: 10).
method: Hierarchical clustering method (default: “ward.D2”). One of: “ward.D”, “ward.D2”, “single”, “complete”, “average” (UPGMA), “mcquitty” (WPGMA), “median” (WPGMC), or “centroid” (UPGMC).
...: Additional arguments passed to clustering functions.
Details
Input requirements by analysis type:
- ORA results: Required columns: “ID”, “Description”, “GeneRatio”, “pvalue”, “p.adjust”, “geneID”, “Count”
- GSEA results: Required columns: “ID”, “Description”, “NES”, “pvalue”, “p.adjust”, “core_enrichment”
- compareClusterResult: Either the compareClusterResult object or a data frame with all ORA columns listed above plus an additional “Cluster” column
- Multi-database: Provide as a named list of the above result types
Value
An EnrichGT_obj containing:
- enriched_result: Filtered results data frame
- gt_object: Formatted gt_tbl table object
- gene_modules: List of gene modules per cluster
- pathway_clusters: Pathway names by cluster
- clustering_tree: hclust object for visualization
- raw_enriched_result: Unfiltered results table
Examples
# ORA example
res <- egt_recluster_analysis(ora_result, ClusterNum=8)
plot(res@clustering_tree)
# GSEA example
gsea_res <- egt_recluster_analysis(gsea_result, method="average")
gsea_res
egt_summary: Generate HTML Summary for EnrichGT Object
Description
This function generates an HTML-formatted summary of enrichment results that can be displayed in RStudio viewer or web browsers.
Usage
egt_summary(x, name)
Arguments
x: An EnrichGT_obj object
name: The cluster name to summarize
Value
HTML formatted summary that can be viewed with htmltools::html_print() or similar
Examples
# Generate HTML summary for cluster 1
html_summary <- egt_summary(obj, "Cluster_1")
htmltools::html_print(html_summary)
# Or view in RStudio viewer
if (interactive()) {
html_summary <- egt_summary(obj, "Cluster_1")
print(html_summary)
}
egt_web_interface: Launch EnrichGT Web Interface
Description
Launch an interactive Shiny web application that provides a user-friendly interface for performing enrichment analysis using EnrichGT package functions.
Usage
egt_web_interface(
LLM = NULL,
port = NULL,
host = "127.0.0.1",
launch.browser = TRUE
)
Arguments
LLM: An optional LLM chat object created by the ellmer package. If provided, enables LLM-powered summarization of recluster analysis results. If NULL (default), LLM features are disabled.
port: Port number for the application. If NULL (default), a random available port will be used.
host: IP address that the application should listen on. Default is “127.0.0.1” (localhost).
launch.browser: Logical. If TRUE (default), the web browser will be launched automatically.
Details
The web interface includes major analysis modules in EnrichGT.
See also
[egt_enrichment_analysis](egt_enrichment_analysis), [egt_gsea_analysis](egt_gsea_analysis), [egt_recluster_analysis](egt_recluster_analysis), [egt_plot_results](egt_plot_results), [egt_llm_summary](egt_llm_summary)
Value
Starts a Shiny application
Examples
# Launch the web interface with default settings
egt_web_interface()
# Launch with LLM support
library(ellmer)
chat <- chat_deepseek(api_key = "your_api_key", model = "deepseek-chat")
egt_web_interface(LLM = chat)
# Launch on a specific port
egt_web_interface(port = 3838)
# Launch without opening browser automatically
egt_web_interface(launch.browser = FALSE)
genes_with_weights: Create a ranked gene list for GSEA analysis
Description
Takes gene identifiers and corresponding weights (like log2 fold changes) and returns a ranked vector suitable for Gene Set Enrichment Analysis (GSEA).
Usage
genes_with_weights(genes, weights)
Arguments
genes: Character vector of gene identifiers (e.g., gene symbols or ENTREZ IDs)
weights: Numeric vector of weights for each gene (typically log2 fold changes)
Value
A named numeric vector sorted in descending order by weight, where:
- Names are gene identifiers
- Values are the corresponding weights
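The shape of the return value described above can be reproduced in base R. This is a sketch only: `rank_genes_sketch()` is a hypothetical stand-in written for illustration, and it may differ from what `genes_with_weights()` does internally (for example, in how it handles duplicates or missing values).

```r
# Hypothetical stand-in for illustration; not EnrichGT's implementation
rank_genes_sketch <- function(genes, weights) {
  stopifnot(length(genes) == length(weights))
  # Attach gene names to the weights, then sort from largest to smallest
  sort(stats::setNames(weights, genes), decreasing = TRUE)
}

ranked <- rank_genes_sketch(c("TP53", "BRCA1", "EGFR"), c(1.5, -2.1, 0.8))
print(ranked)
# Order of names after sorting: TP53, EGFR, BRCA1
```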
Examples
# Example using differential expression results
genes <- c("TP53", "BRCA1", "EGFR")
log2fc <- c(1.5, -2.1, 0.8)
ranked_genes <- genes_with_weights(genes, log2fc)
database_...: Get database for enrichment analysis
Description
Get Gene Ontology (GO), Reactome, and other term-to-gene databases for enrichment analysis.
Usage
database_GO_BP(OrgDB = org.Hs.eg.db)
database_GO_CC(OrgDB = org.Hs.eg.db)
database_GO_MF(OrgDB = org.Hs.eg.db)
database_GO_ALL(OrgDB = org.Hs.eg.db)
database_Reactome(OrgDB = org.Hs.eg.db)
database_progeny_human()
database_progeny_mouse()
database_CollecTRI_human()
database_CollecTRI_mouse()
Arguments
OrgDB: The AnnotationDbi database used to fetch pathway data and convert gene IDs to gene symbols. For human, use org.Hs.eg.db; for mouse, use org.Mm.eg.db. AnnotationDbi supports many species; search AnnotationDbi for the annotation database of other species. The GO and Reactome functions take this argument; the progeny and CollecTRI functions do not.
Value
A data.frame with IDs, terms, and genes
database_KEGG: Get KEGG database from KEGG website
Description
KEGG is a commercial database, so EnrichGT cannot pre-cache it locally. Use this function to fetch KEGG pathways and modules from the KEGG website.
Usage
database_KEGG(kegg_organism="hsa",OrgDB = org.Hs.eg.db,kegg_modules=F,local_cache=F)
database_KEGG_show_organism()
Arguments
kegg_organism: Which species to fetch from KEGG. For human, use hsa (the default); for mouse, use mmu. For other species, see database_KEGG_show_organism().
OrgDB: The AnnotationDbi database used to convert KEGG gene IDs to gene symbols. For human, use org.Hs.eg.db; for mouse, use org.Mm.eg.db. AnnotationDbi supports many species; search AnnotationDbi for the annotation database of other species.
kegg_modules: If TRUE, returns KEGG modules; if FALSE, returns KEGG pathways. The default is FALSE, which returns the most commonly used KEGG pathways.
local_cache: If TRUE, caches a copy in the working directory, saved as a .enrichgt_cache file. The .enrichgt_cache file is just an .rds file, so you can read it with readRDS().
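Because the cache is described as an ordinary .rds file, the extension does not affect how it is read. The sketch below demonstrates that round trip in base R with a stand-in table; the file name `toy_kegg.enrichgt_cache` and the column names are assumptions made for this example, since the real cache's name and layout are not specified here.

```r
# Stand-in table; the real cached object's columns are not specified here
toy_cache <- data.frame(
  ID = c("hsa00010", "hsa00010"),
  term = c("Glycolysis / Gluconeogenesis", "Glycolysis / Gluconeogenesis"),
  gene = c("HK1", "PFKM"),
  stringsAsFactors = FALSE
)

# Write with saveRDS() under a non-.rds extension (hypothetical file name),
# then read it back with readRDS()
cache_path <- file.path(tempdir(), "toy_kegg.enrichgt_cache")
saveRDS(toy_cache, cache_path)
restored <- readRDS(cache_path)
str(restored)

# Clean up the temporary file
unlink(cache_path)
```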
Value
A data.frame containing KEGG annotations
%-delete->%: Filter Enrichment Results by Description Pattern
Description
Infix operator to filter enrichment results by matching against Description field. For EnrichGT_obj objects, re-runs clustering analysis after filtering.
Usage
x %-delete->% y
Arguments
x: Either an EnrichGT_obj object or data.frame containing enrichment results
y: Regular expression pattern to match against Description field
Details
This operator helps refine enrichment results by removing terms matching the given pattern from the Description field. When applied to EnrichGT_obj, it preserves all original parameters and re-runs the clustering analysis on the filtered results.
Value
For EnrichGT_obj input: A new EnrichGT_obj with filtered and re-clustered results. For data.frame input: A filtered data.frame.
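For the data.frame case, the filtering described above amounts to dropping rows whose Description matches a regular expression. A minimal base R sketch follows; `%drop_desc%` is a hypothetical operator defined only for this example, and details such as case sensitivity in the real `%-delete->%` are not documented in this section.

```r
# Hypothetical operator for illustration only; not EnrichGT's `%-delete->%`
`%drop_desc%` <- function(x, pattern) {
  stopifnot(is.data.frame(x), "Description" %in% colnames(x))
  # Keep only the rows whose Description does NOT match the pattern
  x[!grepl(pattern, x$Description), , drop = FALSE]
}

# Toy enrichment table (made-up rows)
toy <- data.frame(
  ID = c("T1", "T2", "T3"),
  Description = c("cytosolic ribosome", "glucose metabolism", "ribosome biogenesis"),
  stringsAsFactors = FALSE
)

kept <- toy %drop_desc% "ribosome"
print(kept$Description)
```

For an EnrichGT_obj, the documented operator additionally re-runs the clustering on the filtered results, which this sketch does not attempt.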
Examples
# Filter out "ribosome" related terms
filtered_results <- reenrichment_obj %-delete->% "ribosome"
# Filter data.frame directly
filtered_df <- df %-delete->% "metabolism"