About fanta.bio

fanta.bio (Functional genome ANnotations with Transcriptional Activities) is a database that collects functional annotations of genomes for studying gene regulation, with a primary focus on cis-regulatory elements (CREs) such as promoters and enhancers.

Identification of CREs in fanta.bio is based on transcription signatures. Both promoters and enhancers produce specific sets of RNAs, including mRNA, lncRNA, uaRNA (upstream antisense RNA), and eRNA (enhancer RNA), and these transcription signatures are used to identify the CREs. Transcriptome-based CRE identification was pioneered by the FANTOM5 project; here we apply an advanced approach to an expanded dataset. We additionally collect relevant resources, such as genomic binding sites of transcription factors (“trans” factors) and genome variations across individuals.

Interfaces

We provide the following interfaces to access the data:

In-house CRE annotation viewer: The most convenient way to explore the database is through our web interface (https://fanta.bio/).
Genome Browser: For researchers who prefer a genomic view, we provide access to the data through the UCSC Genome Browser using a track hub, enabling comparison with other genomic datasets.
Download Archive: For advanced users, we offer download archives (https://data.fanta.bio/) for working with the data locally.

How are the CREs identified?

CRE peaks are identified from experimental measurements of transcription start sites in a broad range of samples. CAGE (Cap Analysis of Gene Expression) is a method to sequence capped RNA 5’-ends, and our identification pipeline uses publicly available CAGE data, including those produced by FANTOM5, FANTOM6, and others deposited in SRA/ENA/DRA. The pipeline uses a newly developed method based on transcription divergence (Kawaji et al. in prep.).

The CRE peaks are categorized into two groups, promoter level activity (PLA) and enhancer level activity (ELA), based on the levels of transcription (that is, CRE activity). Genomic coordinates of the CREs are provided in BED9+ format, where thickStart/thickEnd represent the core region of the CRE peak, bounded by the highest signals on the forward and reverse strands.
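As an illustration of how the core region could be read from such a file, the following is a minimal sketch that parses one tab-separated BED9+ line. It assumes the standard BED column order (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb); the names CrePeak, parse_bed9_line, and core_region are hypothetical, and any columns beyond the ninth are kept as uninterpreted text because their meaning is not described here.

```python
from typing import NamedTuple, Tuple


class CrePeak(NamedTuple):
    """One CRE peak record (hypothetical container for a BED9+ line)."""
    chrom: str
    start: int        # 0-based start of the whole peak
    end: int          # end of the whole peak (exclusive)
    name: str
    score: str        # kept as text; real files may vary in how it is written
    strand: str
    thick_start: int  # start of the core region
    thick_end: int    # end of the core region
    item_rgb: str
    extra: Tuple[str, ...]  # the "+" columns, left uninterpreted


def parse_bed9_line(line: str) -> CrePeak:
    """Parse a single tab-separated BED9+ line into a CrePeak."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    return CrePeak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=fields[4],
        strand=fields[5],
        thick_start=int(fields[6]),
        thick_end=int(fields[7]),
        item_rgb=fields[8],
        extra=tuple(fields[9:]),
    )


def core_region(peak: CrePeak) -> Tuple[int, int]:
    """Return the (thickStart, thickEnd) interval of the peak."""
    return peak.thick_start, peak.thick_end


# Made-up example line, for illustration only.
example = "chr1\t100\t200\tpeak1\t0\t+\t140\t160\t255,0,0\textra"
peak = parse_bed9_line(example)
```

Calling core_region(peak) on the made-up line above gives the interval from thickStart to thickEnd, which lies inside the full peak interval.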

In the genome browser view, the color of CRE peaks with PLA ranges from red to blue, indicating the direction of transcription. Red is used for the forward direction (+1), blue for the reverse direction (-1), and intermediate colors for directionalities in between, where directionality is defined as $\frac{(ForwardCounts) - (ReverseCounts)}{(ForwardCounts) + (ReverseCounts)}$. The CRE peaks with ELA are shown in yellow.
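The directionality score can be computed directly from the two strand counts. The sketch below implements the formula as stated; the color mapping is only an assumption for illustration (a linear blend between blue at -1 and red at +1), since the exact color scale used in the browser is not specified here. The function names are hypothetical.

```python
from typing import Tuple


def directionality(forward_counts: float, reverse_counts: float) -> float:
    """(forward - reverse) / (forward + reverse), ranging from -1 to +1."""
    total = forward_counts + reverse_counts
    if total == 0:
        raise ValueError("directionality is undefined when both counts are zero")
    return (forward_counts - reverse_counts) / total


def direction_color(score: float) -> Tuple[int, int, int]:
    """Map a directionality score to an RGB color.

    Assumption: a simple linear blend, blue (0, 0, 255) at -1 and
    red (255, 0, 0) at +1. The actual track colors may differ.
    """
    t = (score + 1) / 2  # rescale from [-1, +1] to [0, 1]
    return round(255 * t), 0, round(255 * (1 - t))


# Worked example: 30 forward reads and 10 reverse reads.
score = directionality(30, 10)  # (30 - 10) / (30 + 10) = 0.5
```

A peak transcribed only on the forward strand scores +1 and one transcribed only on the reverse strand scores -1, matching the red and blue extremes described above.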

How are the CRE activities measured?

Cell-dependent gene regulation requires activation of a specific set of CREs, and we quantify the CRE activities by measuring their transcription outputs per cell or tissue type. RNA 5’-ends obtained by a dedicated protocol (e.g. CAGE) within each of the CRE regions are counted, normalized as CPM (counts per million) to adjust for sequencing depth, and scaled by the RLE method (Anders et al. 2013) to make comparisons across samples meaningful.
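To make the normalization concrete, here is a minimal sketch of CPM with RLE scaling on a small count matrix. It is not the fanta.bio pipeline itself: it assumes the common convention in which RLE (median-of-ratios) factors adjust each sample's library size before CPM is computed, with the factors rescaled so that their geometric mean is 1. The function name rle_scaled_cpm and the toy counts are hypothetical.

```python
import math
from statistics import median
from typing import List


def rle_scaled_cpm(counts: List[List[float]]) -> List[List[float]]:
    """Return RLE-scaled CPM values.

    counts[i][j] is the raw 5'-end count of CRE i in sample j.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]

    # Reference profile: log geometric mean per CRE across samples,
    # using only CREs observed (count > 0) in every sample.
    log_refs = [
        sum(math.log(c) for c in row) / n_samples
        for row in counts
        if all(c > 0 for c in row)
    ]
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("RLE needs at least one CRE with counts in all samples")

    # RLE size factor: median ratio of each sample to the reference profile.
    size_factors = [
        median(row[j] / math.exp(ref) for row, ref in zip(usable, log_refs))
        for j in range(n_samples)
    ]

    # Express as multipliers of library size, rescaled to geometric mean 1.
    factors = [sf / ls for sf, ls in zip(size_factors, lib_sizes)]
    log_mean = sum(math.log(f) for f in factors) / n_samples
    factors = [f / math.exp(log_mean) for f in factors]

    effective_sizes = [ls * f for ls, f in zip(lib_sizes, factors)]
    return [
        [c / size * 1e6 for c, size in zip(row, effective_sizes)]
        for row in counts
    ]


# Toy matrix: sample 2 is sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [20, 40], [30, 60]]
toy_cpm = rle_scaled_cpm(toy_counts)
```

In the toy matrix the two samples differ only in depth, so each CRE ends up with the same value in both samples after normalization. The RLE step matters when a few highly transcribed CREs dominate one sample, which would otherwise distort plain CPM values for everything else.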

Which ChIP-seq data are used?

Of the datasets provided by ChIP-Atlas, ChIP-seq peaks in the “TF and Others” category are obtained. We include the part of the data derived from cell lines that match the transcriptome data. For their entire dataset (incl. ATAC-seq and Bisulfite-seq) and the data-mining tools, please visit their website (https://chip-atlas.org/).

Additional ChIP-seq peaks can be examined via the UCSC Genome Browser, for example ReMap (https://remap.univ-amu.fr/).

Which human genome variation data are used?

TogoVar is used as the data source of human genome variation. It focuses on Japanese genetic variations but also includes non-Japanese ones via dbSNP (https://www.ncbi.nlm.nih.gov/snp/) and others.

Additional variations can be examined via the UCSC Genome Browser, for example ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), gnomAD (https://gnomad.broadinstitute.org/), TCGA Pan-cancer mutations through the Genomic Data Commons Portal, and dbSNP (https://www.ncbi.nlm.nih.gov/snp/).

Which mouse genome variation data are used?

MoG+ is used as the data source of mouse genome variation. It collects variations across mouse subspecies and strains.

Additional variations can be examined via the UCSC Genome Browser, for example genomic variations of the common laboratory mouse strains from the Mouse Genomes Project (https://www.sanger.ac.uk/data/mouse-genomes-project/) and dbSNP (https://www.ncbi.nlm.nih.gov/snp/).

Partnership

fanta.bio is affiliated with INTRARED, serving as a member database within the network.

Cite us

All data produced by fanta.bio is distributed under the CC-BY 4.0 license. When you use the data and/or the website, please attribute fanta.bio as the source.

fanta.bio: a database of functional genome annotations with transcriptional activities, https://fanta.bio/, 2024.

Contact us

If you have any questions or need assistance, please feel free to contact us at help [at] fanta.bio.

The teams

The construction and maintenance of fanta.bio is a collaborative effort of the following three labs:

Laboratory for Large-Scale Biomedical Data Technology at RIKEN IMS (led by Dr. Kasukawa)
Integrated Bioresource Information Division at RIKEN BRC (led by Dr. Masuya)
Research Center for Genome & Medical Sciences at TMiMS (led by Dr. Kawaji)

Acknowledgements

We sincerely appreciate the following resources for sharing their data:

ChIP-Atlas: a data-mining suite for exploring epigenomic landscapes
MoG+: a database of genomic variations across mouse subspecies for biomedical research
TogoVar: a comprehensive Japanese genetic variation database

We are also grateful to the UCSC Genome Browser and its Asian mirror for enabling the genome browser interface.

fanta.bio is supported by JST NBDC Grant Number JPMJND2202 in the Database Integration Coordination Program (DICP).

References

Mitsuhashi N, Toyo-Oka L, Katayama T, Kawashima M, Kawashima S, Miyazaki K, Takagi T. TogoVar: A comprehensive Japanese genetic variation database. Hum Genome Var. 2022 Dec 12;9(1):44. doi: 10.1038/s41439-022-00222-9. PMID: 36509753; PMCID: PMC9744889.
Takada T, Fukuta K, Usuda D, Kushida T, Kondo S, Kawamoto S, Yoshiki A, Obata Y, Fujiyama A, Toyoda A, Noguchi H, Shiroishi T, Masuya H. MoG+: a database of genomic variations across three mouse subspecies for biomedical research. Mamm Genome. 2022 Mar;33(1):31-43. doi: 10.1007/s00335-021-09933-w. Epub 2021 Nov 15. PMID: 34782917; PMCID: PMC8913468.
Zou Z, Ohta T, Miura F, Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 2022 Jul 5;50(W1):W175-W182. doi: 10.1093/nar/gkac199. PMID: 35325188; PMCID: PMC9252733.
Abugessaisa I, Ramilowski JA, Lizio M, Severin J, Hasegawa A, Harshbarger J, Kondo A, Noguchi S, Yip CW, Ooi JLC, et al. FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs. Nucleic Acids Res. 2021 Jan 8;49(D1):D892-D898. doi: 10.1093/nar/gkaa1054. PMID: 33211864; PMCID: PMC7779024.
Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014 Mar 27;507(7493):455-461. doi: 10.1038/nature12787. PMID: 24670763; PMCID: PMC5215096.
Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, Itoh M, et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-470. doi: 10.1038/nature13182. PMID: 24670764; PMCID: PMC4529748.