Select your databases for enrichment or annotation
Database Helpers
How to specify species?
EnrichGT uses AnnotationDbi to fetch most databases and gene annotations. You can use org.Hs.eg.db for human and org.Mm.eg.db for mouse. For other species, search Google or refer to Bioconductor for the matching annotation package.
For databases that do not come from AnnotationDbi, you do not need to provide a species database. For example, database_CollecTRI_human() returns human data only.
Built-in databases or AnnotationDbi databases
Pass the OrgDB argument when fetching them.
Example:
database_GO_BP(OrgDB = org.Hs.eg.db)
GO Database
BP stands for biological process, CC for cellular component, and MF for molecular function. ALL combines the three sub-databases above.
KEGG Database
KEGG is a commercially licensed database, so EnrichGT can’t pre-cache it locally. You can use database_kegg to fetch KEGG pathways and modules.
This function requires two species-related arguments. The first, OrgDB, is used as before to convert ENTREZ IDs to symbols. The second, kegg_organism, determines which species' data is fetched from KEGG: hsa for human (the default) and mmu for mouse. For other species, run database_kegg_show_organism() for details.
The kegg_modules argument switches between KEGG pathways and KEGG modules. If TRUE, the function returns KEGG modules; if FALSE, it returns KEGG pathways. By default it is set to FALSE, which returns the more commonly used KEGG pathways.
If you set local_cache = TRUE, EnrichGT saves a copy as a .enrichgt_cache file in the current working directory. The .enrichgt_cache file is just an .rds file, so feel free to read it with readRDS().
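The arguments described above can be put together as in the following minimal sketch. It assumes EnrichGT and org.Mm.eg.db are installed, that KEGG is reachable over the network, and that kegg_organism takes the organism code as a character string; treat it as an illustration, not a definitive call signature.

```r
library(EnrichGT)
library(org.Mm.eg.db)

# Show the organism codes available from KEGG (e.g. hsa, mmu)
database_kegg_show_organism()

# Mouse KEGG pathways; OrgDB is used to convert ENTREZ IDs to symbols
kegg_pathways <- database_kegg(
  kegg_organism = "mmu",   # assumption: passed as a character string
  OrgDB = org.Mm.eg.db,
  kegg_modules = FALSE,    # FALSE returns pathways
  local_cache = TRUE       # keep a .enrichgt_cache copy in the working directory
)

# Same species, but KEGG modules instead of pathways
kegg_mods <- database_kegg(
  kegg_organism = "mmu",
  OrgDB = org.Mm.eg.db,
  kegg_modules = TRUE
)

head(kegg_pathways)
```

With local_cache = TRUE, later runs in the same working directory can reuse the cached copy instead of querying KEGG again.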
WikiPathways Database
WikiPathways provides pre-built GMT files (https://data.wikipathways.org/current/gmt/). By default, genes in these files are recorded as ENTREZ IDs, so you need to pass the appropriate species database (e.g. org.Hs.eg.db for human) to the database_from_gmt function. EnrichGT will then automatically convert the ENTREZ IDs to gene symbols for enrichment analysis.
PROGENy Database
For pathway activity inference, use database_progeny_human() and database_progeny_mouse().
CollecTRI Database
For transcription factor inference, use database_CollecTRI_human() and database_CollecTRI_mouse().
Read Additional Gene Sets from local GMT files
EnrichGT supports reading GMT files. You can obtain GMT files from MSigDB.
database_from_gmt("Path_to_your_Gmt_file.gmt")
By default, database_from_gmt tries to convert numeric IDs to gene symbols, since numeric IDs are usually ENTREZ IDs. You can disable this by passing convert_2_symbols = F.
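To make the GMT layout concrete, here is a small base R sketch that writes a toy GMT file and reshapes it into a long table. It is not how database_from_gmt is implemented; the set names, gene IDs, and the column names term and gene are illustrative assumptions.

```r
# Write a tiny two-set GMT file to a temporary path (illustrative data only)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c("SET_A\tdescription of set A\t7157\t1956\t672",
    "SET_B\tdescription of set B\t7157\t4609"),
  gmt_path
)

# Each GMT line is: set name, description, then member genes, tab-separated
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)

# Reshape into a long table with one row per (set, gene) pair
gmt_long <- do.call(rbind, lapply(fields, function(x) {
  data.frame(term = x[1], gene = x[-(1:2)], stringsAsFactors = FALSE)
}))

# Purely numeric identifiers like these are the kind treated as ENTREZ IDs
all_numeric <- all(grepl("^[0-9]+$", gmt_long$gene))
print(gmt_long)
```

A file whose gene column already contains symbols would not need conversion, which is the case where convert_2_symbols = F applies.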
Read Additional Gene Sets from local data tables
The result of every database_*** function is a data.frame, so you can simply read any data table and use it with any enrichment function.
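As a rough sketch of this idea, the base R code below writes and reads back a small gene set table. The column names term and gene and the set contents are hypothetical; the exact layout expected by the enrichment functions is not stated here, so compare your table with a built-in database before using it.

```r
# Hypothetical gene set table saved as CSV ("term" and "gene" are assumed names)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    term = c("My_Set_1", "My_Set_1", "My_Set_2"),
    gene = c("TP53", "EGFR", "MYC")
  ),
  csv_path,
  row.names = FALSE
)

# Read it back as a plain data.frame
my_db <- read.csv(csv_path, stringsAsFactors = FALSE)
str(my_db)

# Before using it, compare its layout with a built-in database, e.g.:
# str(database_GO_BP(OrgDB = org.Hs.eg.db))
```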