Package Functions Help

DESCRIPTION

Type: Package
Package: EnrichGT
Title: EnrichGT - all in one enrichment analysis solution
Version: 2.2.0
Authors@R: c(
    person("Zhiming", "Ye", , "garnetcrow@hotmail.com", role = c("aut", "cre")),
    person("Runchen", "Wang", , "runchen_wang@163.com", role = "cph")
  )
Description: Do biological enrichment analysis and parsing and clustering
    enrichment result to insightful results in just ONE package
License: GPL-V3
URL: https://zhimingye.github.io/EnrichGT/
Depends: 
    R (>= 4.1.0)
Imports: 
    AnnotationDbi,
    cli,
    dplyr,
    ellmer,
    fgsea,
    fontawesome,
    forcats,
    ggdendro,
    ggplot2,
    ggwordcloud,
    glue,
    GO.db,
    graphics,
    grDevices,
    gt,
    htmltools,
    Matrix,
    methods,
    proxy,
    qvalue,
    R6,
    RColorBrewer,
    Rcpp,
    reactome.db,
    rlang,
    scales,
    stats,
    stringr,
    text2vec,
    tibble,
    utils,
    xfun
Suggests: 
    BiocManager,
    readr,
    testthat (>= 3.0.0)
LinkingTo: 
    Rcpp
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2

comparison_reactor_base: Comparison Reactor Base

Description

Typically created via [egt_comparison_reactor](egt_comparison_reactor)(). A framework for comparing and analyzing enrichment results across multiple groups. This is the base class that provides core functionality for both ORA and GSEA analysis.

Details

The comparison reactor allows:

  • Appending multiple enrichment results from different groups
  • Filtering results by p-value or NES score
  • Comparing results between groups
  • Identifying relationships between enriched terms
  • Visualizing biological themes
  • Sub-clustering results for deeper analysis

Methods

Public methods

Method new()

Create a new comparison reactor base object

Usage

comparison_reactor_base$new(Type = NULL)

Arguments

  • Type: The analysis type (“ORA” or “GSEA”)

Method append_enriched_result()

Add enrichment results to the reactor

Usage

comparison_reactor_base$append_enriched_result(x, group)

Arguments

  • x: Data.frame of enrichment results (must contain p-value/adjusted p-value columns)
  • group: Character string naming the group for these results

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group1")
## End(Not run)

Method summarize()

Print summary of groups in reactor

Usage

comparison_reactor_base$summarize()

Method prefilter_by_p_adj()

Filter enrichment results by adjusted p-value cutoff

Usage

comparison_reactor_base$prefilter_by_p_adj(x = 0.05)

Arguments

  • x: Numeric cutoff for adjusted p-values (default 0.05)

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
reactor$prefilter_by_p_adj(0.01) # Use stricter cutoff
## End(Not run)

Method find_relationship()

Identify relationships between enriched terms across groups

Usage

comparison_reactor_base$find_relationship(
  Num = NULL,
  dist_method = "euclidean",
  hclust_method = "ward.D2",
  ...
)

Arguments

  • Num: Number of top terms to consider from each group
  • dist_method: Distance method for clustering (default “euclidean”)
  • hclust_method: Hierarchical clustering method (default “ward.D2”)
  • ...: Additional parameters passed to heatmap function

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
reactor$find_relationship(Num = 5, dist_method = "manhattan")
## End(Not run)
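The dist_method and hclust_method arguments can be illustrated with base R alone. The following is a minimal sketch, not the reactor's internal code: it assumes a hypothetical term-by-group matrix of -log10(adjusted p-value) scores (score_mat, with invented values) and clusters the terms with stats::dist() and stats::hclust().

```r
# Hypothetical term-by-group matrix of -log10(adjusted p-values);
# term names and values are invented for illustration.
score_mat <- matrix(
  c(5.2, 4.8,
    4.9, 5.1,
    0.3, 6.0,
    0.1, 5.7),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("term_A", "term_B", "term_C", "term_D"),
    c("group1", "group2")
  )
)

# Pairwise distances between terms, then agglomerative clustering
term_dist <- dist(score_mat, method = "euclidean")
term_tree <- hclust(term_dist, method = "ward.D2")

# Cut the tree into two clusters of terms
term_cluster <- cutree(term_tree, k = 2)
print(term_cluster)
```

Swapping method = "euclidean" for "manhattan" changes only the distance step; the clustering call stays the same.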

Method fetch_relationship()

Retrieve relationship data between terms

Usage

comparison_reactor_base$fetch_relationship()

Returns

Data.frame containing term relationships and cluster assignments

Examples

## Not run:
relation_df <- reactor$fetch_relationship()
head(relation_df)
## End(Not run)

Method fetch_biological_theme()

Generate wordcloud visualization of biological themes

Usage

comparison_reactor_base$fetch_biological_theme(...)

Arguments

  • ...: Additional parameters passed to wordcloud generator

Returns

List containing ggplot2 wordcloud objects

Examples

## Not run:
wordclouds <- reactor$fetch_biological_theme()
wordclouds[[1]] # View first wordcloud
## End(Not run)

Method split_by_cluster()

Split results by identified clusters for further analysis

Usage

comparison_reactor_base$split_by_cluster(...)

Arguments

  • ...: Additional parameters

Returns

The reactor object (invisible) for method chaining

Method get_splited_list()

Get list of results split by cluster

Usage

comparison_reactor_base$get_splited_list()

Returns

List of data frames split by cluster

Method do_recluster()

Perform sub-clustering within existing clusters

Usage

comparison_reactor_base$do_recluster(
  ClusterNum = 10,
  P.adj = 0.05,
  force = F,
  nTop = 10,
  method = "ward.D2",
  ...
)

Arguments

  • ClusterNum: Number of sub-clusters to generate
  • P.adj: Adjusted p-value cutoff (default 0.05)
  • force: Whether to force reclustering (default FALSE)
  • nTop: Number of top terms to consider (default 10)
  • method: Clustering method (default “ward.D2”)
  • ...: Additional parameters passed to clustering functions

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
reactor$do_recluster(ClusterNum = 5, method = "complete")
## End(Not run)

Method get_recluster_result()

Retrieve sub-clustering results

Usage

comparison_reactor_base$get_recluster_result()

Returns

List containing reclustering results

Examples

## Not run:
recluster_results <- reactor$get_recluster_result()
names(recluster_results)
## End(Not run)

Method print()

Usage

comparison_reactor_base$print(...)

Method clone()

The objects of this class are cloneable with this method.

Usage

comparison_reactor_base$clone(deep = FALSE)

Arguments

  • deep: Whether to make a deep clone.

Examples

## ------------------------------------------------
## Method `comparison_reactor_base$append_enriched_result`
## ------------------------------------------------


reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group1")


## ------------------------------------------------
## Method `comparison_reactor_base$prefilter_by_p_adj`
## ------------------------------------------------


reactor$prefilter_by_p_adj(0.01) # Use stricter cutoff


## ------------------------------------------------
## Method `comparison_reactor_base$find_relationship`
## ------------------------------------------------


reactor$find_relationship(Num = 5, dist_method = "manhattan")


## ------------------------------------------------
## Method `comparison_reactor_base$fetch_relationship`
## ------------------------------------------------


relation_df <- reactor$fetch_relationship()
head(relation_df)


## ------------------------------------------------
## Method `comparison_reactor_base$fetch_biological_theme`
## ------------------------------------------------


wordclouds <- reactor$fetch_biological_theme()
wordclouds[[1]] # View first wordcloud


## ------------------------------------------------
## Method `comparison_reactor_base$do_recluster`
## ------------------------------------------------


reactor$do_recluster(ClusterNum = 5, method = "complete")


## ------------------------------------------------
## Method `comparison_reactor_base$get_recluster_result`
## ------------------------------------------------


recluster_results <- reactor$get_recluster_result()
names(recluster_results)

comparison_reactor_gsea: GSEA Comparison Reactor

Description

Typically created via [egt_comparison_reactor](egt_comparison_reactor)("GSEA")

Details

This reactor is optimized for comparing GSEA results across multiple groups, with methods tailored for NES (Normalized Enrichment Score) based comparisons.

Super class

[EnrichGT::comparison_reactor_base](https://rdrr.io/pkg/EnrichGT/man/comparison_reactor_base.html) -> comparison_reactor_gsea

Methods

Public methods

Method new()

Create a new GSEA comparison reactor

Usage

comparison_reactor_gsea$new()

Method make_plans()

Create comparison plans between specified GSEA groups

Usage

comparison_reactor_gsea$make_plans(group = "auto", use_value = "NES")

Arguments

  • group: Character vector of group names to compare or “auto” for all groups
  • use_value: Which value to use for comparison (“NES” for normalized enrichment score)

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
gsea_reactor$make_plans(group = c("group1", "group2"))
## End(Not run)

Method prefilter_by_NES()

Filter GSEA results by normalized enrichment score cutoff

Usage

comparison_reactor_gsea$prefilter_by_NES(x = 1)

Arguments

  • x: Numeric cutoff for absolute NES values (default 1)

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
gsea_reactor$prefilter_by_NES(1.5) # Filter for stronger effects
## End(Not run)
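Filtering on absolute NES keeps gene sets that are strongly enriched in either direction. The sketch below shows the idea on a hypothetical data frame (gsea_df, with invented values); whether the reactor uses >= or > at the cutoff is an assumption here.

```r
# Hypothetical GSEA result table; values are invented for illustration.
gsea_df <- data.frame(
  Description = c("set_1", "set_2", "set_3", "set_4"),
  NES = c(2.1, -1.8, 0.7, -0.4),
  p.adjust = c(0.001, 0.01, 0.2, 0.6)
)

nes_cutoff <- 1.5

# Keep rows whose absolute NES reaches the cutoff (>= is an assumption)
kept <- gsea_df[abs(gsea_df$NES) >= nes_cutoff, ]
print(kept$Description)
```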

Method clone()

The objects of this class are cloneable with this method.

Usage

comparison_reactor_gsea$clone(deep = FALSE)

Arguments

  • deep: Whether to make a deep clone.

Seealso

[comparison_reactor_base](comparison_reactor_base) for inherited methods.

A specialized reactor for comparing Gene Set Enrichment Analysis (GSEA) results. Inherits from comparison_reactor_base and provides GSEA-specific functionality.

Examples

## ------------------------------------------------
## Method `comparison_reactor_gsea$make_plans`
## ------------------------------------------------


gsea_reactor$make_plans(group = c("group1", "group2"))


## ------------------------------------------------
## Method `comparison_reactor_gsea$prefilter_by_NES`
## ------------------------------------------------


gsea_reactor$prefilter_by_NES(1.5) # Filter for stronger effects

comparison_reactor_ora: ORA Comparison Reactor

Description

Typically created via [egt_comparison_reactor](egt_comparison_reactor)("ORA")

Details

This reactor is optimized for comparing ORA results across multiple groups, with methods tailored for p-value based comparisons.

Super class

[EnrichGT::comparison_reactor_base](https://rdrr.io/pkg/EnrichGT/man/comparison_reactor_base.html) -> comparison_reactor_ora

Methods

Public methods

Method new()

Create a new ORA comparison reactor

Usage

comparison_reactor_ora$new()

Method make_plans()

Create comparison plans between specified ORA groups

Usage

comparison_reactor_ora$make_plans(group = "auto", use_value = "p")

Arguments

  • group: Character vector of group names to compare or “auto” for all groups
  • use_value: Which value to use for comparison (“p” for p-value or “padj” for adjusted p-value)

Returns

The reactor object (invisible) for method chaining

Examples

## Not run:
ora_reactor$make_plans(group = c("group1", "group2"), use_value = "padj")
## End(Not run)

Method clone()

The objects of this class are cloneable with this method.

Usage

comparison_reactor_ora$clone(deep = FALSE)

Arguments

  • deep: Whether to make a deep clone.

Seealso

[comparison_reactor_base](comparison_reactor_base) for inherited methods.

A specialized reactor for comparing Over-Representation Analysis (ORA) results. Inherits from comparison_reactor_base and provides ORA-specific functionality.

Examples

## ------------------------------------------------
## Method `comparison_reactor_ora$make_plans`
## ------------------------------------------------


ora_reactor$make_plans(group = c("group1", "group2"), use_value = "padj")

convert_annotations_genes: Convert gene annotations from any keys to any keys

Description

Convert gene annotations from any keys to any keys

Usage

convert_annotations_genes(genes, from_what, to_what, OrgDB)

Arguments

  • genes: A vector of gene identifiers.
  • from_what: Input key type (e.g. “SYMBOL”, “ENTREZID”, “ENSEMBL”, “GENENAME”, …). Keys must be supported by AnnotationDbi; see the AnnotationDbi help pages for details.
  • to_what: Output key type (e.g. “SYMBOL”, “ENTREZID”, “ENSEMBL”, “GENENAME”, …). Keys must be supported by AnnotationDbi; see the AnnotationDbi help pages for details. Multiple types can be given, e.g. c("ENTREZID","ENSEMBL","GENENAME").
  • OrgDB: Organism annotation database: human = org.Hs.eg.db, mouse = org.Mm.eg.db. See the Bioconductor website for other organisms.

Value

a data.frame

database_from_gmt: Parse GMT format gene set files

Description

Reads gene set files in GMT format (e.g., from MSigDB or WikiPathways) and converts them to a data frame suitable for enrichment analysis. Can optionally convert ENTREZ IDs to gene symbols.

Usage

database_from_gmt(gmtfile, OrgDB = NULL, convert_2_symbols = T)

Arguments

  • gmtfile: Path to GMT format file
  • OrgDB: Annotation database for ID conversion (e.g., org.Hs.eg.db for human). Required if convert_2_symbols=TRUE.
  • convert_2_symbols: Logical indicating whether to convert ENTREZ IDs to gene symbols. Default is TRUE.

Author

Original GMT parser by Guangchuang Yu (https://github.com/YuLab-SMU/gson). Cache system and enhancements by Zhiming Ye.

Value

A data frame with columns:

  • term: Gene set name
  • gene: Gene identifiers (symbols or ENTREZ IDs)

If input has 3 columns, includes an additional ID column.
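As a rough illustration of the term/gene layout described above, the following base R sketch parses two GMT-style lines held in memory. It assumes the usual GMT layout (set name, description, then genes, all tab-separated) and is not the parser used by database_from_gmt(); the set names and URLs are invented.

```r
# Two hypothetical GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- c(
  "SET_ONE\thttp://example.org/one\tTP53\tBRCA1\tEGFR",
  "SET_TWO\thttp://example.org/two\tCDK2\tTP53"
)

# Split each line on tabs; field 1 is the set name, field 2 the description
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)

# Build one (term, gene) row per gene in each set
gene_sets <- do.call(rbind, lapply(fields, function(f) {
  data.frame(term = f[1], gene = f[-(1:2)])
}))
print(gene_sets)
```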

Examples

# Read MSigDB hallmark gene sets
gmt_file <- system.file("extdata", "h.all.v7.4.symbols.gmt", package = "EnrichGT")
gene_sets <- database_from_gmt(gmt_file)

# Read WikiPathways with ENTREZ to symbol conversion
gmt_file <- "wikipathways-20220310-gmt-Homo_sapiens.gmt"
gene_sets <- database_from_gmt(gmt_file, OrgDB = org.Hs.eg.db)

egt_compare_groups: 2-Group Comparison of enrichment results and further clustering and visualizing

Description

See ?egt_enrichment_analysis()

Usage

egt_compare_groups(
  obj.test,
  obj.ctrl,
  name.test = NULL,
  name.ctrl = NULL,
  ClusterNum = 15,
  P.adj = 0.05,
  force = F,
  nTop = 10,
  method = "ward.D2",
  ...
)

Arguments

  • obj.test: The enriched object from the test group. WARNING: obj.test and obj.ctrl should come from the same database (e.g. GO Biological Process (GOBP)).
  • obj.ctrl: The enriched object from the control group. WARNING: obj.test and obj.ctrl should come from the same database (e.g. GO Biological Process (GOBP)).
  • name.test: Optional name of the test group. If NULL, the object name of obj.test is used.
  • name.ctrl: Optional name of the control group. If NULL, the object name of obj.ctrl is used.
  • ClusterNum: Number of clusters to generate.
  • P.adj: Adjusted p-value cutoff. A stricter cutoff avoids slow visualization.
  • force: Logical; if TRUE, skip all automatic self-checks.
  • nTop: Number of top terms to keep in each cluster, ranked by adjusted p-value.
  • method: The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC).
  • ...: Other options.

Details

Compares obj.test against obj.ctrl, showing pathway overlaps (or differences) and the meta-gene modules of the test and control groups. Supports ORA and GSEA results (enriched object or data.frame). WARNING: obj.test and obj.ctrl should come from the same database (e.g. GO Biological Process (GOBP)).
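The overlap and difference between two result sets can be pictured with plain set operations. This is only a sketch of the idea on hypothetical term IDs (test_terms and ctrl_terms are invented); the function itself goes on to cluster and visualize each subset.

```r
# Hypothetical enriched term IDs from a test and a control group
test_terms <- c("GO:0000001", "GO:0000002", "GO:0000003")
ctrl_terms <- c("GO:0000002", "GO:0000003", "GO:0000004")

# Terms enriched in both groups, and terms unique to each group
overlap_terms <- intersect(test_terms, ctrl_terms)
test_only <- setdiff(test_terms, ctrl_terms)
ctrl_only <- setdiff(ctrl_terms, test_terms)

print(list(overlap = overlap_terms, test_only = test_only, ctrl_only = ctrl_only))
```

The comparison is only meaningful when both groups were enriched against the same database, since term IDs from different databases never match.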

Author

Zhiming Ye

Value

A list of EnrichGT_obj objects, covering the enriched terms shared by both groups and the enriched terms unique to each group.

egt_comparison_reactor: Framework for comparing and analyzing enrichment results across multiple groups

Description

Framework for comparing and analyzing enrichment results across multiple groups

Usage

egt_comparison_reactor(Type = NULL)

Details

Type should be one of “ORA” or “GSEA”. For the methods available on the reactor, see the class documentation listed under ‘Seealso’.

Seealso

  • [comparison_reactor_base](comparison_reactor_base) for the base class documentation
  • [comparison_reactor_ora](comparison_reactor_ora) for ORA-specific functionality
  • [comparison_reactor_gsea](comparison_reactor_gsea) for GSEA-specific functionality

Examples

# ORA example
reactor <- egt_comparison_reactor("ORA")
reactor$append_enriched_result(ora_result1, "group_1_go")
reactor$append_enriched_result(ora_result3, "group_3_go")
reactor$prefilter_by_p_adj(0.05)
reactor$make_plans(group = c("group_1_go","group_3_go"), use_value = "p")
reactor$find_relationship(Num = 3)
wordcloudFigure <- reactor$fetch_biological_theme()
wordcloudFigure[[1]]
reactor$split_by_cluster()
reactor$do_recluster(ClusterNum = 10)
cls_res <- reactor$get_recluster_result()
relation_df <- reactor$fetch_relationship()

# GSEA can do more with:
gsea_reactor <- egt_comparison_reactor("GSEA")
gsea_reactor$prefilter_by_NES(1.5)

egt_enrichment_analysis: Perform Over-Representation Analysis (ORA)

Description

ORA compares the proportion of genes in your target list that belong to specific categories (pathways, GO terms etc.) against the expected proportion in a background set. This implementation uses hash tables for efficient gene counting and supports parallel processing for analyzing multiple gene lists.

Usage

egt_enrichment_analysis(
  genes,
  database,
  p_adj_methods = "BH",
  p_val_cut_off = 0.5,
  background_genes = NULL,
  min_geneset_size = 10,
  max_geneset_size = 500,
  multi_cores = 0
)

Arguments

  • genes: Input genes, either:
      • Character vector of gene IDs (e.g., c("TP53","BRCA1"))
      • Named numeric vector from genes_with_weights() (will split by expression direction)
      • List of gene vectors for multiple comparisons (e.g., by cell type)
  • database: Gene set database, either:
      • Built-in database from database_GO_BP(), database_KEGG() etc.
      • Custom data frame with columns: (ID, Pathway_Name, Genes) or (Pathway_Name, Genes)
      • GMT file loaded via database_from_gmt()
  • p_val_cut_off: Adjusted p-value cutoff (default 0.5)
  • background_genes: Custom background genes (default: all genes in database)
  • min_geneset_size: Minimum genes per set (default 10)
  • max_geneset_size: Maximum genes per set (default 500)
  • multi_cores: Number of cores for parallel processing (default 0 = serial). Please don’t use this, since it has several known bugs.
  • p_adj_methods: Multiple testing correction method (default “BH” for Benjamini-Hochberg)

Details

Identifies enriched biological pathways or gene sets in a gene list using high-performance C++ implementation with parallel processing support.
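To make the proportion comparison concrete, here is a minimal base R sketch of a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment. It assumes the standard hypergeometric formulation of ORA, which may differ in detail from the package's C++ implementation; all gene names and the two extra p-values are invented.

```r
# Hypothetical toy data: 100 background genes, one gene set of 10 genes,
# and an input list of 10 genes, 5 of which fall in the set.
background <- paste0("g", 1:100)
gene_set <- paste0("g", 1:10)
input_genes <- paste0("g", c(1:5, 51:55))

n_overlap <- length(intersect(input_genes, gene_set))   # input genes in the set
n_set <- length(intersect(gene_set, background))        # set genes in background
n_background <- length(background)
n_input <- length(intersect(input_genes, background))   # input genes in background

# P(X >= n_overlap) under the hypergeometric null
p_value <- phyper(n_overlap - 1, n_set, n_background - n_set, n_input,
                  lower.tail = FALSE)

# Benjamini-Hochberg adjustment across several sets; the other two raw
# p-values are invented so that the adjustment has something to act on.
raw_p <- c(set_1 = p_value, set_2 = 0.04, set_3 = 0.5)
adj_p <- p.adjust(raw_p, method = "BH")
print(adj_p)
```

In this toy case GeneRatio would be 5/10 and BgRatio 10/100, matching the column definitions in the Value section.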

Author

Zhiming Ye

Value

A data frame with columns:

  • ID: Gene set identifier
  • Description: Gene set name
  • GeneRatio: Enriched genes / input genes
  • BgRatio: Set genes / background genes
  • pvalue: Raw p-value
  • p.adjust: Adjusted p-value
  • geneID: Enriched genes
  • Count: Number of enriched genes

For weighted input, additional columns show up/down regulated genes.

Examples

# Basic ORA with GO Biological Processes
genes <- c("TP53", "BRCA1", "EGFR", "CDK2")
res <- egt_enrichment_analysis(genes, database_GO_BP())

# ORA with DEG results (split by direction)
deg_genes <- genes_with_weights(DEG$gene, DEG$log2FC)
res <- egt_enrichment_analysis(deg_genes, database_KEGG())

# Multi-group ORA (multi_cores = 0 runs serially)
gene_lists <- list(
  Macrophages = c("CD68", "CD163", "CD169"),
  Fibroblasts = c("COL1A1", "COL1A2", "ACTA2")
)
res <- egt_enrichment_analysis(gene_lists, database_Reactome(), multi_cores=0)

egt_fetch_biological_theme: Generate Biological Theme Wordcloud from Enrichment Results

Description

Creates a wordcloud visualization from enrichment results, either from a data.frame or an EnrichGT_obj. For data.frames, filters by p.adjust < 0.05 by default.

Usage

egt_fetch_biological_theme(x, cluster = NULL, skip_filtering = F, ...)

Arguments

  • x: Input object - either a data.frame with enrichment results or an EnrichGT_obj
  • cluster: Cluster name/number (required if x is EnrichGT_obj)
  • skip_filtering: Logical, if TRUE skips filtering of data.frame by p.adjust
  • ...: Additional arguments passed to wordcloud_generator2

Author

Zhiming Ye

Value

ggwordcloud object

egt_fetch_termwise_relationship: Fetch and visualize termwise relationships in enrichment analysis results

Description

This function analyzes the relationships between enriched terms and creates a hierarchical clustering visualization. Avoid querying very large sets of terms.

Usage

egt_fetch_termwise_relationship(
  x,
  database,
  according_to = "Description",
  ClusterNum = 4,
  method = "ward.D2",
  force = F,
  maxLength = 40
)

Arguments

  • x: A character vector of terms to analyze
  • database: Query database, for example database_GO_BP(org.Hs.eg.db). If you are querying a fused result, you can pass a list such as list(database_GO_BP(org.Hs.eg.db), database_Reactome(org.Hs.eg.db)).
  • according_to: Character string specifying which column to use for filtering, default is “Description”
  • ClusterNum: Integer specifying the number of clusters, default is 4
  • method: Character string specifying the hierarchical clustering method, default is “ward.D2”
  • force: Logical; if TRUE, skip the check on the length of x. Large queries lead to long calculation times and poor results.
  • maxLength: Maximum label length; term names longer than this are wrapped (default 40)

Value

A list containing:

Examples

# Example usage:
result <- egt_fetch_termwise_relationship(
  x = c("pathway1", "pathway2"),
  database = pathway_db,
  according_to = "Description"
)
result$Figure

egt_generate_quarto_report: Export Quarto Report

Description

Export Quarto Report

Usage

egt_generate_quarto_report(
  re_enrichment_results,
  output_path = "Report.qmd",
  type = "html"
)

Arguments

  • re_enrichment_results: An EnrichGT_obj; an AI-summarized result is recommended.
  • output_path: Path of the output qmd file (e.g., test.qmd)
  • type: Output format, “pdf” or “html”. Default is “html”.

Value

A quarto document

egt_gsea_analysis: Perform Gene Set Enrichment Analysis (GSEA)

Description

GSEA analyzes whether predefined gene sets show statistically significant enrichment at the top or bottom of a ranked gene list. This implementation uses the fast fgsea algorithm from the fgsea package.

Usage

egt_gsea_analysis(
  genes,
  database,
  p_val_cut_off = 0.5,
  min_geneset_size = 10,
  max_geneset_size = 500,
  gseaParam = 1
)

Arguments

  • genes: A named numeric vector of ranked genes where:
      • Names are gene identifiers
      • Values are ranking metric (e.g., log2 fold change, PCA loading)
    Must be sorted in descending order (use genes_with_weights() to prepare)
  • database: Gene set database, either:
      • Built-in database from database_GO_BP(), database_KEGG() etc.
      • Custom data frame with columns: (ID, Pathway_Name, Genes) or (Pathway_Name, Genes)
      • GMT file loaded via database_from_gmt()
  • p_val_cut_off: Adjusted p-value cutoff (default 0.5)
  • min_geneset_size: Minimum genes per set (default 10)
  • max_geneset_size: Maximum genes per set (default 500)
  • gseaParam: GSEA parameter controlling weight of ranking (default 1)
  • p_adj_method: Multiple testing correction method (default “BH” for Benjamini-Hochberg)

Details

Identifies enriched biological pathways in a ranked gene list using the fgsea algorithm.
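For intuition about what an enrichment score measures, the sketch below computes the classic weighted running-sum statistic for one gene set in base R. It is a simplified illustration of the general GSEA statistic and does not reproduce the fgsea algorithm (no permutation p-values or normalization); the ranking values and set membership are invented, and gsea_param plays the role described for gseaParam.

```r
# Hypothetical ranked statistics (e.g. log2 fold changes), named by gene
gene_stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
gene_stats <- sort(gene_stats, decreasing = TRUE)

gene_set <- c("g1", "g2")  # invented gene set
gsea_param <- 1            # exponent applied to the ranking metric

in_set <- names(gene_stats) %in% gene_set

# Step up at each hit, weighted by |stat|^gsea_param; step down at each miss
hit_weight <- abs(gene_stats)^gsea_param * in_set
step_hit <- hit_weight / sum(hit_weight)
step_miss <- (!in_set) / sum(!in_set)

# Running sum along the ranked list; ES is its largest deviation from zero
running_sum <- cumsum(step_hit - step_miss)
enrichment_score <- running_sum[which.max(abs(running_sum))]
print(enrichment_score)
```

Because both set members sit at the top of the ranking, the running sum peaks immediately after them, which corresponds to a positive enrichment score.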

Author

Based on fgsea package by Alexey Sergushichev et al.

Value

A data frame with columns:

  • ID: Gene set identifier
  • Description: Gene set name
  • ES: Enrichment score
  • NES: Normalized enrichment score
  • pvalue: Raw p-value
  • p.adjust: Adjusted p-value
  • core_enrichment: Leading edge genes

Examples

# Using differential expression results
ranked_genes <- genes_with_weights(DEG$gene, DEG$log2FC)
res <- egt_gsea_analysis(ranked_genes, database_GO_BP())

# Using PCA loadings
ranked_genes <- genes_with_weights(rownames(pca$rotation), pca$rotation[,1])
res <- egt_gsea_analysis(ranked_genes, database_KEGG())

# Custom gene sets from GMT file
ranked_genes <- genes_with_weights(genes, weights)
res <- egt_gsea_analysis(ranked_genes, database_from_gmt("pathways.gmt"))

egt_infer_act: Inferring pathway or transcription factor activity from EnrichGT meta-gene modules

Description

Only supports gene symbols. PROGENy is a comprehensive resource containing a curated collection of pathways and their target genes, with weights for each interaction. CollecTRI is a comprehensive resource containing a curated collection of TFs and their transcriptional targets compiled from 12 different resources. This collection provides an increased coverage of transcription factors and a superior performance in identifying perturbed TFs compared to our previous. If you select a high number of clusters during re-enrichment, each meta-gene module may contain too few genes for activity to be inferred successfully. If the result is empty, re-run the re-clustering with fewer clusters so that each module contains more genes.

Usage

egt_infer_act(x, DB = "collectri", species = "human")

Arguments

  • x: an EnrichGT_obj object.
  • DB: can be “progeny” (the Pathway activity database), or “collectri” (TF activity database)
  • species: can be “human” or “mouse”

Author

Zhiming Ye, Saez-Rodriguez Lab (The decoupleR package, https://saezlab.github.io/decoupleR/)

Value

an ORA result list

egt_llm_multi_summary: Compare Multiple LLM Summaries

Description

Generate summaries using multiple LLM models and create a comparison object.

Usage

egt_llm_multi_summary(
  x,
  chat_list,
  lang = "English",
  background_knowledges = NULL,
  comparison_prompt = NULL
)

Arguments

  • x: An EnrichGT_obj object from egt_recluster_analysis()
  • chat_list: A named list of LLM chat objects
  • lang: Language for summaries (“English” or “Chinese”)
  • background_knowledges: Additional reference material (e.g., papers or guides) for the LLM to draw on
  • comparison_prompt: Custom prompt for generating comparison summary

Value

EnrichGT_obj with LLM_Comparison slot filled

egt_llm_summary: Summarize EnrichGT results using LLM

Description

This function uses a Large Language Model (LLM) to generate summaries for pathway clusters and gene modules in an EnrichGT_obj object.

Usage

egt_llm_summary(
  x,
  chat,
  lang = "English",
  model_name = NULL,
  background_knowledges = NULL
)

Arguments

  • x: An EnrichGT_obj object created by [egt_recluster_analysis](egt_recluster_analysis).
  • chat: An LLM chat object created by the ellmer package.
  • lang: Language passed to the LLM. Can be English or Chinese.
  • background_knowledges: Additional reference material (e.g., papers or guides) for the LLM to draw on. Must be a single character string.

Note

It is recommended not to add system prompts when creating the chat object. The function provides its own carefully crafted prompts for biological analysis.

References

For more information about creating chat objects, see the ellmer package documentation.

Seealso

[egt_recluster_analysis](egt_recluster_analysis) to create the input object.

Value

Returns the input EnrichGT_obj object with added LLM annotations in the LLM_Annotation slot. The annotations include:

  • pathways: Summaries of pathway clusters
  • genes_and_title: Summaries of gene modules and their titles

Examples

# Create LLM chat object
chat <- chat_deepseek(api_key = YOUR_API_KEY, model = "deepseek-chat", system_prompt = "")

# Run enrichment analysis and get EnrichGT_obj
re_enrichment_results <- egt_recluster_analysis(...)

# Get LLM summaries
re_enrichment_results <- egt_llm_summary(re_enrichment_results, chat)

egt_plot_gsea: Generate GSEA Enrichment Plots

Description

This function creates graphical representations of Gene Set Enrichment Analysis (GSEA) results, including either a multi-panel GSEA table plot for multiple pathways or a single pathway enrichment plot. The visualization leverages the fgsea package’s plotting functions.

Usage

egt_plot_gsea(resGSEA$Description[1],genes = genes_with_weights(genes = DEGexample$...1, weights = DEGexample$log2FoldChange),database = database_GO_BP(org.Hs.eg.db))

egt_plot_gsea(resGSEA[1:8,],genes = genes_with_weights(genes = DEGexample$...1, weights = DEGexample$log2FoldChange),database = database_GO_BP(org.Hs.eg.db))

Arguments

  • x: A GSEA result object. Can be either:
      • A data frame containing GSEA results (requires columns: pvalue, p.adjust, Description)
      • A character string specifying a single pathway name
  • genes: A named numeric vector from genes_with_weights(). These should match the gene identifiers used in the GSEA analysis.
  • database: A database data.frame, which you can obtain from the database_xxx() functions. This should correspond to the database used in the original GSEA analysis.

Author

Zhiming Ye, wrapped from fgsea

Value

A ggplot object:

  • When x is a data frame: Returns a multi-panel plot showing normalized enrichment scores (NES), p-values, and leading edge plots for top pathways
  • When x is a pathway name: Returns an enrichment plot showing the running enrichment score for the specified pathway

egt_plot_results: Visualize enrichment results using simple plot

Description

This plot, similar to enrichplot::dotplot(), is the most widely used way to visualize enriched terms. It shows the enrichment scores (e.g. p values) and gene ratio or NES as dot size and color, or as bar height. Use ntop to set the number of terms shown, and low.col and hi.col to set the color scale.

Usage

egt_plot_results(
  x,
  ntop = NULL,
  showIDs = F,
  max_len_descript = 40,
  keepAll = F,
  maskNoise = 3,
  ...,
  P.adj = NULL
)

Arguments

  • x: A data frame from an enrichment result such as egt_enrichment_analysis() or egt_gsea_analysis(), or a re-clustered EnrichGT object
  • ntop: Show the top N terms in each cluster. By default, the top 15 are shown for an original enrichment result, and the top 5 in each cluster for a re-clustered object.
  • showIDs: Logical; whether to show pathway IDs. Default is FALSE.
  • max_len_descript: Maximum label length. Default is 40.
  • keepAll: Logical controlling whether terms are filtered to avoid overlap of the same genes
  • maskNoise: (Only works with a re-enriched object) Cut-off value for masking rare populations in the cluster tree. Anything below this value in a specific child tree is ignored, because it may be a hit by coincidence only. Set maskNoise = 0 to disable masking.
  • ...: Other parameters
  • P.adj: (Only works with an original data.frame) When passing a data.frame from an original enrichment result, you can specify the adjusted p-value cutoff. If NULL, the default is 0.05. When passing an EnrichGT_obj, this filter has already been applied by egt_recluster_analysis.
  • low.col: the color for the lowest
  • hi.col: the color for the highest

Author

Zhiming Ye

Value

a ggplot2 object

egt_recluster_analysis: Cluster and re-enrich enrichment results

Description

Performs hierarchical clustering on enrichment results (ORA or GSEA) based on gene-term associations to reduce redundancy and improve biological interpretation. The function helps identify coherent groups of related terms while preserving important but less significant findings.

Usage

egt_recluster_analysis(
  x,
  ClusterNum = 10,
  P.adj = 0.05,
  force = F,
  nTop = 10,
  method = "ward.D2",
  ...
)

Arguments

  • x: Enrichment result from EnrichGT or clusterProfiler. For multi-database results, provide a list.
  • ClusterNum: Number of clusters to create (default: 10).
  • P.adj: Adjusted p-value cutoff (default: 0.05). Stricter values improve performance.
  • force: Logical to bypass validation checks (default: FALSE).
  • nTop: Number of top terms to keep per cluster by p-value (default: 10).
  • method: Hierarchical clustering method (default: “ward.D2”). One of: “ward.D”, “ward.D2”, “single”, “complete”, “average” (UPGMA), “mcquitty” (WPGMA), “median” (WPGMC), or “centroid” (UPGMC).
  • ...: Additional arguments passed to clustering functions.

Details

Input requirements by analysis type:

  • ORA results: Required columns: “ID”, “Description”, “GeneRatio”, “pvalue”, “p.adjust”, “geneID”, “Count”
  • GSEA results: Required columns: “ID”, “Description”, “NES”, “pvalue”, “p.adjust”, “core_enrichment”
  • compareClusterResult: Either the compareClusterResult object or a data frame with all ORA columns listed above plus an additional “Cluster” column
  • Multi-database: Provide as a named list of the above result types
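Clustering on gene-term associations can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's method: it assumes geneID holds "/"-separated gene symbols (as in clusterProfiler-style output), builds a binary term-by-gene membership matrix, and uses a Jaccard-type ("binary") distance, which the package may or may not use. The table ora_df is invented.

```r
# Hypothetical ORA result with "/"-separated gene IDs per term
ora_df <- data.frame(
  Description = c("term_A", "term_B", "term_C", "term_D"),
  geneID = c("TP53/BRCA1/ATM", "TP53/BRCA1/CHEK2",
             "COL1A1/COL1A2", "COL1A1/COL1A2/ACTA2")
)

# Binary term-by-gene membership matrix
gene_lists <- strsplit(as.character(ora_df$geneID), "/", fixed = TRUE)
all_genes <- sort(unique(unlist(gene_lists)))
membership <- t(vapply(
  gene_lists,
  function(g) as.numeric(all_genes %in% g),
  numeric(length(all_genes))
))
dimnames(membership) <- list(as.character(ora_df$Description), all_genes)

# Terms that share genes are close; cluster and cut into two groups
term_tree <- hclust(dist(membership, method = "binary"), method = "ward.D2")
term_clusters <- cutree(term_tree, k = 2)
print(term_clusters)
```

Terms that share most of their genes land in the same cluster, which is the redundancy the re-clustering is meant to collapse.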

Author

Zhiming Ye

Value

An EnrichGT_obj containing:

  • enriched_result: Filtered results data frame
  • gt_object: Formatted gt_tbl table object
  • gene_modules: List of gene modules per cluster
  • pathway_clusters: Pathway names by cluster
  • clustering_tree: hclust object for visualization
  • raw_enriched_result: Unfiltered results table

Examples

# ORA example
res <- egt_recluster_analysis(ora_result, ClusterNum=8)
plot(res@clustering_tree)

# GSEA example
gsea_res <- egt_recluster_analysis(gsea_result, method="average")
gsea_res
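
To make the idea of clustering terms by their gene-term associations concrete, here is a small base-R sketch on invented data. It is an illustration under assumptions, not the package's implementation: the Jaccard ("binary") distance and the "/"-separated geneID format (as in clusterProfiler output) are choices made for this example, and the similarity measure EnrichGT actually uses is not described here.

```r
# Toy ORA-like input; term names and gene sets are invented for illustration.
toy <- data.frame(
  Description = c("term A", "term B", "term C", "term D"),
  geneID = c("G1/G2/G3", "G1/G2/G4", "G7/G8/G9", "G7/G8"),
  stringsAsFactors = FALSE
)

# Split the "/"-separated gene strings into one gene vector per term.
gene_sets <- strsplit(toy$geneID, "/", fixed = TRUE)
names(gene_sets) <- toy$Description
all_genes <- sort(unique(unlist(gene_sets)))

# Term-by-gene membership matrix (1 = gene belongs to the term).
membership <- t(vapply(
  gene_sets,
  function(g) as.numeric(all_genes %in% g),
  numeric(length(all_genes))
))
colnames(membership) <- all_genes

# Jaccard distance between terms (an assumed choice), then hierarchical clustering.
d <- dist(membership, method = "binary")
tree <- hclust(d, method = "ward.D2")

# Cut the tree into two clusters; terms sharing genes end up together.
clusters <- cutree(tree, k = 2)
clusters
```

In this toy input, "term A" and "term B" share two genes, as do "term C" and "term D", so each pair falls into its own cluster. The k argument of cutree plays the role that ClusterNum plays in the documented function.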

egt_summary: Generate HTML Summary for EnrichGT Object

Description

This function generates an HTML-formatted summary of enrichment results that can be displayed in RStudio viewer or web browsers.

Usage

egt_summary(x, name)

Arguments

  • x: An EnrichGT_obj object
  • name: The cluster name to summarize

Value

HTML formatted summary that can be viewed with htmltools::html_print() or similar

Examples

# Generate HTML summary for cluster 1
html_summary <- egt_summary(obj, "Cluster_1")
htmltools::html_print(html_summary)

# Or view in RStudio viewer
if (interactive()) {
  html_summary <- egt_summary(obj, "Cluster_1")
  print(html_summary)
}

egt_web_interface: Launch EnrichGT Web Interface

Description

Launch an interactive Shiny web application that provides a user-friendly interface for performing enrichment analysis using EnrichGT package functions.

Usage

egt_web_interface(
  LLM = NULL,
  port = NULL,
  host = "127.0.0.1",
  launch.browser = TRUE
)

Arguments

  • LLM: An optional LLM chat object created by the ellmer package. If provided, enables LLM-powered summarization of recluster analysis results. If NULL (default), LLM features are disabled.
  • port: Port number for the application. If NULL (default), a random available port will be used.
  • host: IP address that the application should listen on. Default is “127.0.0.1” (localhost).
  • launch.browser: Logical. If TRUE (default), the web browser will be launched automatically.

Details

The web interface covers the major analysis modules of EnrichGT.

Author

Zhiming Ye

See also

[egt_enrichment_analysis](egt_enrichment_analysis), [egt_gsea_analysis](egt_gsea_analysis), [egt_recluster_analysis](egt_recluster_analysis), [egt_plot_results](egt_plot_results), [egt_llm_summary](egt_llm_summary)

Value

Starts a Shiny application

Examples

# Launch the web interface with default settings
egt_web_interface()

# Launch with LLM support
library(ellmer)
chat <- chat_deepseek(api_key = "your_api_key", model = "deepseek-chat")
egt_web_interface(LLM = chat)

# Launch on a specific port
egt_web_interface(port = 3838)

# Launch without opening browser automatically
egt_web_interface(launch.browser = FALSE)

genes_with_weights: Create a ranked gene list for GSEA analysis

Description

Takes gene identifiers and corresponding weights (like log2 fold changes) and returns a ranked vector suitable for Gene Set Enrichment Analysis (GSEA).

Usage

genes_with_weights(genes, weights)

Arguments

  • genes: Character vector of gene identifiers (e.g., gene symbols or ENTREZ IDs)
  • weights: Numeric vector of weights for each gene (typically log2 fold changes)

Author

Zhiming Ye

Value

A named numeric vector sorted in descending order by weight, where:

  • Names are gene identifiers
  • Values are the corresponding weights

Examples

# Example using differential expression results
genes <- c("TP53", "BRCA1", "EGFR")
log2fc <- c(1.5, -2.1, 0.8)
ranked_genes <- genes_with_weights(genes, log2fc)
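
For readers who want to see the shape of the documented return value without installing the package, the sketch below builds a comparable vector in base R. It is an approximation based only on the Value description (a named numeric vector sorted in descending order); genes_with_weights itself may do more, for example with duplicated or missing entries.

```r
genes <- c("TP53", "BRCA1", "EGFR")
log2fc <- c(1.5, -2.1, 0.8)

# Attach gene names to the weights, then sort from highest to lowest weight.
ranked_sketch <- sort(stats::setNames(log2fc, genes), decreasing = TRUE)

ranked_sketch
names(ranked_sketch)  # "TP53" "EGFR" "BRCA1"
```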

database_...: Get database for enrichment analysis

Description

Get Gene Ontology (GO), Reactome, and other term-to-gene databases for enrichment analysis.

Usage

database_GO_BP(OrgDB = org.Hs.eg.db)

database_GO_CC(OrgDB = org.Hs.eg.db)

database_GO_MF(OrgDB = org.Hs.eg.db)

database_GO_ALL(OrgDB = org.Hs.eg.db)

database_Reactome(OrgDB = org.Hs.eg.db)

database_progeny_human()

database_progeny_mouse()

database_CollecTRI_human()

database_CollecTRI_mouse()

Arguments

  • OrgDB: The AnnotationDbi annotation database used to fetch pathway data and convert gene IDs to gene symbols. For human, use org.Hs.eg.db; for mouse, use org.Mm.eg.db. Annotation databases exist for many other species; search AnnotationDbi for the one you need. The GO and Reactome functions need this argument; the PROGENy and CollecTRI functions do not take it.

Author

Zhiming Ye. Some of the functions were inspired by clusterProfiler but are implemented from scratch.

Value

a data.frame with ID, terms and genes
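
As a rough picture of what a term-to-gene table of this kind looks like, the sketch below builds a toy long-format data.frame and groups it into gene sets. The column names (ID, term, gene) and all values are assumptions for illustration; the actual column names returned by the database_... functions may differ.

```r
# Toy long-format term-to-gene table; IDs, terms, genes and column names are invented.
t2g <- data.frame(
  ID   = c("T1", "T1", "T2", "T2", "T2"),
  term = c("toy pathway 1", "toy pathway 1",
           "toy pathway 2", "toy pathway 2", "toy pathway 2"),
  gene = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE4"),
  stringsAsFactors = FALSE
)

# One character vector of genes per term.
gene_sets <- split(t2g$gene, t2g$term)

# Number of genes annotated to each term.
lengths(gene_sets)
```

Each row pairs one term with one gene, so a gene that belongs to several terms appears on several rows.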

database_KEGG: Get KEGG database from KEGG website

Description

KEGG is a commercialized database, so EnrichGT cannot pre-cache it locally. Use this function to fetch KEGG pathways and modules from the KEGG website.

Usage

database_KEGG(
  kegg_organism = "hsa",
  OrgDB = org.Hs.eg.db,
  kegg_modules = F,
  local_cache = F
)

database_KEGG_show_organism()

Arguments

  • kegg_organism: Which species' data to fetch from KEGG. For human, use hsa (the default); for mouse, use mmu. For other species, see database_KEGG_show_organism().
  • OrgDB: The AnnotationDbi annotation database used to convert KEGG gene IDs to gene symbols. For human, use org.Hs.eg.db; for mouse, use org.Mm.eg.db. Annotation databases exist for many other species; search AnnotationDbi for the one you need.
  • kegg_modules: If TRUE, returns KEGG modules; if FALSE, returns KEGG pathways. Defaults to FALSE, which returns the commonly used KEGG pathways.
  • local_cache: If TRUE, caches a copy in the current working directory. The copy is saved as a .enrichgt_cache file, which is an ordinary .rds file that you can read with readRDS().

Value

A data.frame containing KEGG annotations
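
Because the cache is described as an ordinary .rds file, it can be read back with readRDS() regardless of its extension. The sketch below demonstrates this with a stand-in table and a temporary file; the file name and the columns are placeholders, since the exact name written by database_KEGG and the layout of its result are not specified here.

```r
# Stand-in for a cached KEGG table; the columns and values are invented for the demo.
toy_kegg <- data.frame(
  ID = c("toy0001", "toy0002"),
  term = c("toy pathway A", "toy pathway B"),
  stringsAsFactors = FALSE
)

# Write to a temporary file with the .enrichgt_cache extension
# (the real cache file name may differ).
cache_file <- tempfile(fileext = ".enrichgt_cache")
saveRDS(toy_kegg, cache_file)

# readRDS() does not care about the extension; the file is a plain RDS file.
restored <- readRDS(cache_file)
identical(restored, toy_kegg)

# Clean up the temporary file.
unlink(cache_file)
```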

%-delete->%: Filter Enrichment Results by Description Pattern

Description

Infix operator to filter enrichment results by matching against Description field. For EnrichGT_obj objects, re-runs clustering analysis after filtering.

Usage

x %-delete->% y

Arguments

  • x: Either an EnrichGT_obj object or data.frame containing enrichment results
  • y: Regular expression pattern to match against Description field

Details

This operator helps refine enrichment results by removing terms matching the given pattern from the Description field. When applied to EnrichGT_obj, it preserves all original parameters and re-runs the clustering analysis on the filtered results.

Value

For EnrichGT_obj input: A new EnrichGT_obj with filtered and re-clustered results. For data.frame input: A filtered data.frame.

Examples

# Filter out "ribosome" related terms
filtered_results <- reenrichment_obj %-delete->% "ribosome"

# Filter data.frame directly
filtered_df <- df %-delete->% "metabolism"
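
For the data.frame case, the filtering described above can be pictured with the base-R sketch below. The helper drop_by_description is a hypothetical stand-in, not the operator's source code. Note that grepl() is case-sensitive by default; whether the operator itself ignores case is not stated here.

```r
# Hypothetical stand-in for the data.frame branch of the operator (not EnrichGT code):
# drop every row whose Description matches the regular expression.
drop_by_description <- function(df, pattern) {
  df[!grepl(pattern, df$Description), , drop = FALSE]
}

# Toy enrichment table with invented rows.
toy <- data.frame(
  ID = c("T1", "T2", "T3"),
  Description = c("ribosome biogenesis", "lipid metabolism", "cytosolic ribosome"),
  stringsAsFactors = FALSE
)

kept <- drop_by_description(toy, "ribosome")
kept$Description  # "lipid metabolism"
```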