# fanta.bio

> fanta.bio (Functional genome ANnotations with Transcriptional Activities) is a database that collects functional annotations of genomes for studying gene regulation, with a primary focus on cis-regulatory elements (CREs) such as promoters and enhancers.

## Database Overview

fanta.bio provides comprehensive functional genome annotations focused on cis-regulatory elements (CREs). CREs, including promoters and enhancers, are identified based on their transcription signatures. Both promoters and enhancers produce specific sets of RNAs, such as mRNA, lncRNA, uaRNA (upstream antisense RNA), and eRNA (enhancer RNA).

The identification methodology builds on the work of the FANTOM5 project, applying newer methods to an expanded dataset. The database also collects related resources, such as transcription factor binding sites in the genome and genome variation across individuals.

## Data Access Interfaces

- **Web Interface**: The most user-friendly way to explore the database at [https://fanta.bio/](https://fanta.bio/)
- **REST API**: Programmatic access at [https://api.fanta.bio/](https://api.fanta.bio/) — interactive docs at [https://api.fanta.bio/docs](https://api.fanta.bio/docs), OpenAPI 3.1 spec at [https://api.fanta.bio/openapi.json](https://api.fanta.bio/openapi.json)
- **MCP Server**: Model Context Protocol server for AI assistants at [https://mcp.fanta.bio/mcp](https://mcp.fanta.bio/mcp) — exposes 19 tools, 5 prompt templates, and 2 resources for genes, SNPs, CREs, expression, and TF binding
- **Documentation**: Full docs at [https://docs.fanta.bio/](https://docs.fanta.bio/)
- **UCSC Genome Browser**: Access via [track hub](https://genome-asia.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://data.fanta.bio/hub/v1.1.0-2409/trackhub/hub.txt) for visualization alongside other genomic datasets
- **Download Archive**: Available at [https://data.fanta.bio/](https://data.fanta.bio/) for local data analysis
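
As a starting point for the REST API, the published OpenAPI document can be used to discover which operations exist. The sketch below relies only on the generic OpenAPI layout (`paths` → path → HTTP method → operation); it assumes nothing about specific fanta.bio endpoints, and the helper names `fetch_spec` and `list_operations` are illustrative.

```python
import json
from urllib.request import urlopen

OPENAPI_URL = "https://api.fanta.bio/openapi.json"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(url: str = OPENAPI_URL, timeout: float = 30.0) -> dict:
    """Download and parse the OpenAPI document (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return sorted(operations, key=lambda op: (op[1], op[0]))
```

Calling `list_operations(fetch_spec())` prints nothing by itself; iterate over the result to see what the API offers before writing client code against it.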

## Search API and URL Structures

fanta.bio supports URL-based searching that can be used programmatically to access specific data. The following URL patterns enable direct access to search results and record pages:

### Basic Search
- **Main search page**: https://fanta.bio/
- **Basic search URL structure**: `https://fanta.bio/search?q={QUERY}&organism={ORGANISM}`
  - Example (search for TP53): `https://fanta.bio/search?q=TP53&organism=human`
  - Example (search for Sox2): `https://fanta.bio/search?q=Sox2&organism=mouse`
  - Organism options: `human`, `mouse`, or `any`
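
The basic search pattern above can be assembled programmatically. This is a minimal sketch: the function name `basic_search_url` is illustrative, and the validation simply mirrors the three organism options listed above.

```python
from urllib.parse import urlencode

BASE_URL = "https://fanta.bio"
ORGANISMS = {"human", "mouse", "any"}


def basic_search_url(query: str, organism: str = "any") -> str:
    """Build a basic search URL, percent-encoding the query term."""
    if organism not in ORGANISMS:
        raise ValueError(f"organism must be one of {sorted(ORGANISMS)}")
    return f"{BASE_URL}/search?" + urlencode({"q": query, "organism": organism})


print(basic_search_url("TP53", "human"))
```

Using `urlencode` rather than string concatenation keeps queries containing spaces or special characters valid.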

### UCSC Browser Search
- **URL structure**: `https://genome-asia.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://data.fanta.bio/hub/v1.1.0-2409/trackhub/hub.txt&genome={GENOME}&position={POSITION}`
  - Example (human, chr17:7668402-7687538): `https://genome-asia.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://data.fanta.bio/hub/v1.1.0-2409/trackhub/hub.txt&genome=hg38&position=chr17:7668402-7687538`
  - Genome options: `hg38` (human), `mm10` (mouse)
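
A track hub link for a given region can be generated in the same way. The sketch below assumes the hub URL and genome names shown above; `ucsc_hub_url` is an illustrative helper, and `:` and `/` are left unescaped so the output matches the URL form used in this document.

```python
from urllib.parse import urlencode

UCSC_TRACKS = "https://genome-asia.ucsc.edu/cgi-bin/hgTracks"
HUB_URL = "https://data.fanta.bio/hub/v1.1.0-2409/trackhub/hub.txt"
GENOMES = {"hg38", "mm10"}


def ucsc_hub_url(genome: str, chrom: str, start: int, end: int) -> str:
    """Build a UCSC Genome Browser URL that loads the hub at chrom:start-end."""
    if genome not in GENOMES:
        raise ValueError(f"genome must be one of {sorted(GENOMES)}")
    if start > end:
        raise ValueError("start must not exceed end")
    params = {"hubUrl": HUB_URL, "genome": genome, "position": f"{chrom}:{start}-{end}"}
    return f"{UCSC_TRACKS}?" + urlencode(params, safe=":/")


print(ucsc_hub_url("hg38", "chr17", 7668402, 7687538))
```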

### Advanced Search
- **Advanced search page**: [https://fanta.bio/search/cre-advanced](https://fanta.bio/search/cre-advanced)
- **URL structure**: `https://fanta.bio/search/cre-advanced?q={QUERY}&organism={ORGANISM}&bound_tf={TF_NAME}`
  - Example (CREs with CTCF binding): `https://fanta.bio/search/cre-advanced?q=&organism=human&bound_tf=CTCF`
  - Example (CREs with p53 binding): `https://fanta.bio/search/cre-advanced?q=&organism=human&bound_tf=TP53`
  - Supports combinations of parameters for precise filtering
  - Search terms allow exact or partial matches, wildcards, and logical combinations
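
The advanced search parameters can be combined in code as well. This sketch covers only the three parameters documented above (`q`, `organism`, `bound_tf`); the function name is illustrative, and any further filters the page may support are not modeled.

```python
from typing import Optional
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://fanta.bio/search/cre-advanced"


def advanced_search_url(query: str = "", organism: str = "human",
                        bound_tf: Optional[str] = None) -> str:
    """Build an advanced CRE search URL; bound_tf is added only when given."""
    params = {"q": query, "organism": organism}
    if bound_tf:
        params["bound_tf"] = bound_tf
    return f"{ADVANCED_SEARCH}?" + urlencode(params)


# All human CREs bound by CTCF (empty free-text query).
print(advanced_search_url(bound_tf="CTCF"))
```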

### Gene Neighbor Search
- **Gene search page**: [https://fanta.bio/search/gene](https://fanta.bio/search/gene)
- **URL structure**: `https://fanta.bio/search/gene?q={GENE_NAME_OR_SYMBOL}&organism={ORGANISM}`
  - Example (TP53 gene): `https://fanta.bio/search/gene?q=TP53&organism=human`
  - Example (Sox2 gene): `https://fanta.bio/search/gene?q=Sox2&organism=mouse`
  - Returns CREs within 10kb of the gene (distance=0 means the CRE is inside the gene)
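
To make the distance convention concrete, the sketch below computes the gap between a CRE and a gene and applies the 10kb window. It is an illustration only: it assumes half-open coordinates, treats any overlap as distance 0, and counts a gap of exactly 10,000 bp as within range. How the database itself computes distance is not specified here.

```python
def interval_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, a_start - b_end, b_start - a_end)


def is_neighbor(cre: tuple[int, int], gene: tuple[int, int],
                max_distance: int = 10_000) -> bool:
    """True if the CRE lies within max_distance bp of the gene."""
    return interval_distance(*cre, *gene) <= max_distance


gene = (7_668_402, 7_687_538)                         # region from the UCSC example above
print(interval_distance(7_670_000, 7_670_500, *gene))  # CRE inside the gene
print(is_neighbor((7_690_000, 7_690_400), gene))       # CRE just outside the gene
```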

### GWAS SNP Search
- **SNP search page**: [https://fanta.bio/search/snp](https://fanta.bio/search/snp)
- **URL structure**: `https://fanta.bio/search/snp?q={TRAIT_OR_SNP_ID}&organism={ORGANISM}`
  - Example (diabetes trait): `https://fanta.bio/search/snp?q=diabetes&organism=human`
  - Example (specific SNP): `https://fanta.bio/search/snp?q=rs12345&organism=human`
  - Returns SNPs related to the query and counts of nearby CREs (within 10kb)

### CRE Record Access
- **Direct CRE record URL structure**: `https://fanta.bio/cre/{CRE_ID}`
  - Example: `https://fanta.bio/cre/hg38_cre_12345`
  - Example: `https://fanta.bio/cre/mm10_cre_67890`
  - Provides access to all tabs: Annotation, Bound TFs, Variations, Expression Table
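
Record URLs can be derived directly from CRE IDs. The sketch below also infers the organism from the ID prefix; that mapping is an assumption based solely on the two example IDs above (`hg38_...` and `mm10_...`) and may not hold for every identifier.

```python
from typing import Optional
from urllib.parse import quote

# Assumed mapping from the assembly prefix of a CRE ID to an organism.
ASSEMBLY_TO_ORGANISM = {"hg38": "human", "mm10": "mouse"}


def cre_record_url(cre_id: str) -> str:
    """Return the record page URL for a CRE ID."""
    return f"https://fanta.bio/cre/{quote(cre_id, safe='')}"


def organism_from_cre_id(cre_id: str) -> Optional[str]:
    """Guess the organism from the ID prefix; None if the prefix is unknown."""
    assembly = cre_id.split("_", 1)[0]
    return ASSEMBLY_TO_ORGANISM.get(assembly)


print(cre_record_url("hg38_cre_12345"))
print(organism_from_cre_id("mm10_cre_67890"))
```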

### Result Formats
- **CSV Export**: Add `&format=csv` to any search URL to download results in CSV format
  - Example: `https://fanta.bio/search?q=TP53&organism=human&format=csv`
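
Appending the CSV parameter by hand is easy to get wrong when a URL already carries a `format` value. A small helper (the name `with_csv_format` is illustrative) can rewrite the query string safely:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_csv_format(url: str) -> str:
    """Return the same search URL with format=csv set exactly once."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as "q=".
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "format"]
    params.append(("format", "csv"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_csv_format("https://fanta.bio/search?q=TP53&organism=human"))
```

The helper is idempotent, so applying it to a URL that already requests CSV leaves the URL unchanged.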

## CRE Identification Methodology

CRE peaks are identified from experimental measurements of transcription start sites (TSSs) across diverse biological samples. The process uses CAGE (Cap Analysis of Gene Expression) data from FANTOM5, FANTOM6, and other public repositories such as SRA/ENA/DRA.

The identification pipeline employs a method based on transcription divergence (Kawaji et al., in preparation). CREs are categorized into two primary groups:
- **Promoter Level Activity (PLA)**: Color-coded from red to blue in the browser view, indicating transcription direction
- **Enhancer Level Activity (ELA)**: Marked in yellow in the browser view

CRE genomic coordinates are provided in BED9+ format, with thickStart/thickEnd representing the core region bounded by the highest signals in forward and reverse strands.
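
For local analysis of the downloaded files, a BED9+ line can be parsed as sketched below. The first nine columns follow the standard UCSC BED layout; the columns beyond the ninth are kept as raw strings because their meaning is not described here. The sample line is made up for illustration and is not a real record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreBed:
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    thick_start: int   # start of the core region
    thick_end: int     # end of the core region
    item_rgb: str
    extra: tuple       # any columns beyond the ninth ("+" part of BED9+)

    @property
    def core_length(self) -> int:
        return self.thick_end - self.thick_start


def parse_bed9_plus(line: str) -> CreBed:
    """Parse one tab-separated BED9+ line into a CreBed record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    return CreBed(fields[0], int(fields[1]), int(fields[2]), fields[3],
                  int(fields[4]), fields[5], int(fields[6]), int(fields[7]),
                  fields[8], tuple(fields[9:]))


# Hypothetical example line, not taken from the database.
sample = "chr17\t7687000\t7687600\thg38_cre_12345\t0\t.\t7687200\t7687400\t255,0,0\textra"
record = parse_bed9_plus(sample)
print(record.chrom, record.thick_start, record.thick_end, record.core_length)
```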

## CRE Activity Measurement

Cell-dependent gene regulation requires activation of specific CRE sets. The database quantifies CRE activities by measuring transcription outputs per cell or tissue type. RNA 5'-ends within CRE regions are:
1. Counted
2. Normalized as CPM (counts per million) to adjust for sequencing depth
3. Scaled by the RLE method (Anders et al. 2013) for meaningful sample comparisons
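
The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the database's pipeline: it takes an already counted matrix (CREs as rows, samples as columns), derives RLE size factors as the median ratio to a per-CRE geometric mean, and uses them to adjust library sizes before computing CPM, following the convention of edgeR's RLE option. The function name and the toy counts are hypothetical.

```python
import math
from statistics import median


def rle_scaled_cpm(counts: list[list[int]]) -> list[list[float]]:
    """Return CPM per CRE (rows) and sample (columns) using RLE-adjusted library sizes."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]

    # Reference profile: log geometric mean per CRE, restricted to CREs
    # observed in every sample (zeros have no logarithm).
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("RLE needs at least one CRE with nonzero counts in all samples")
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]

    # Size factor per sample: median ratio of its counts to the reference.
    size_factors = [
        math.exp(median(math.log(row[j]) - ref for row, ref in zip(usable, log_ref)))
        for j in range(n_samples)
    ]

    # Express size factors relative to library size and centre them so that
    # their geometric mean is 1.
    factors = [s / n for s, n in zip(size_factors, lib_sizes)]
    centre = math.exp(sum(math.log(f) for f in factors) / n_samples)
    effective_sizes = [n * f / centre for n, f in zip(lib_sizes, factors)]

    return [[c / size * 1e6 for c, size in zip(row, effective_sizes)] for row in counts]


# Toy data: the second sample has one very highly expressed CRE that
# inflates its library size without changing the other CREs.
toy_counts = [[10, 10], [20, 20], [30, 30], [0, 940]]
for row in rle_scaled_cpm(toy_counts):
    print([round(value, 1) for value in row])
```

In the toy data, plain CPM would make the first three CREs look far weaker in the second sample purely because of the fourth CRE; with RLE scaling their values agree across samples.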

## Associated Datasets

### ChIP-seq Data
- Sourced from [ChIP-Atlas](https://chip-atlas.org/)
- Includes ChIP-seq peaks in "TF and Others" categories
- Limited to data derived from cell lines that match the transcriptome data
- Search by TF name in advanced search: `https://fanta.bio/search/cre-advanced?bound_tf={TF_NAME}`

### Human Genome Variation Data
- [TogoVar](https://togovar.org/) serves as the primary data source
- Focuses on Japanese genetic variations but includes non-Japanese data via dbSNP
- Additional resources accessible via UCSC Genome Browser (ClinVar, gnomAD, TCGA Pan-cancer mutations, dbSNP)
- Available in the "TogoVar Variations" tab of each human CRE record

### Mouse Genome Variation Data
- [MoG+](https://molossinus.brc.riken.jp/) provides variations across mouse subspecies/strains
- Additional data via UCSC Genome Browser (Mouse Genomes Project, dbSNP)
- Linked from the "Annotation" tab of each mouse CRE record

## User Interface Guide

### Search Functionality
- **Basic Search**: Search by CRE ID, CRE Name, TFs, external identifiers, and organism
  - URL: `https://fanta.bio/search?q={QUERY}&organism={ORGANISM}`
- **UCSC Browser Search**: Search by keywords or genomic coordinates
  - URL: `https://genome-asia.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://data.fanta.bio/hub/v1.1.0-2409/trackhub/hub.txt&genome={GENOME}&position={POSITION}`
- **Advanced Search**: Combine various search parameters for more specific results
  - URL: `https://fanta.bio/search/cre-advanced?q={QUERY}&organism={ORGANISM}&bound_tf={TF_NAME}`
- **Gene Neighbor Search**: Find CREs near specific genes (within 10kb)
  - URL: `https://fanta.bio/search/gene?q={GENE_NAME_OR_SYMBOL}&organism={ORGANISM}`
- **GWAS SNP Search**: Locate CREs near SNPs associated with specific traits
  - URL: `https://fanta.bio/search/snp?q={TRAIT_OR_SNP_ID}&organism={ORGANISM}`

### CRE Record Pages
Each CRE record at `https://fanta.bio/cre/{CRE_ID}` provides:
1. **Annotation**: Basic information, genome coordinates, and nearest transcript data
   - Includes Ensembl transcript ID/RefSeq ID/GenBank accession
   - NCBI Gene ID, HGNC ID/MGI ID, UniProt ID
   - Gene Name/Symbol/Synonym from HGNC/MGI
   - Overlap information with FANTOM5 CAGE peaks and refTSS
   - Information on overlap with FANTOM5 enhancers and SCREEN cCREs
2. **Bound TFs**: Transcription factors experimentally shown to bind to the CRE region
   - Lists TFs with Max Qscore (-10 * Log10[MACS2 Q-value])
   - Includes experiment information (SRA ID) and Qscores for antigens
   - TF binding determined by 50% overlap (peak cutoff: Qscore > 1000)
3. **Variations**: Genome variation data from TogoVar (human) or MoG+ (mouse)
   - Human: "TogoVar Variations" tab with detailed variation data
   - Mouse: Link to the corresponding region in MoG+
4. **Expression Table**: CRE expression values across different samples
   - Shows normalized expression values (CPM, scaled by RLE method)
   - Facilitates understanding of cell-dependent gene regulation
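
The Bound TFs criteria above can be illustrated with a short calculation. The Qscore formula is taken directly from the description; the overlap rule is an assumption, since the text does not say whether the 50% refers to the CRE or to the peak. This sketch measures the fraction of the CRE covered by the peak, and the function names are illustrative.

```python
import math


def qscore(q_value: float) -> float:
    """Qscore = -10 * log10(MACS2 Q-value); a Q-value of 0 maps to infinity."""
    return math.inf if q_value == 0 else -10 * math.log10(q_value)


def covered_fraction(cre: tuple[int, int], peak: tuple[int, int]) -> float:
    """Fraction of the CRE (half-open interval) covered by the peak."""
    overlap = max(0, min(cre[1], peak[1]) - max(cre[0], peak[0]))
    return overlap / (cre[1] - cre[0])


def is_bound(cre: tuple[int, int], peak: tuple[int, int], q_value: float,
             min_overlap: float = 0.5, min_qscore: float = 1000.0) -> bool:
    """Apply the peak cutoff and the (assumed) overlap rule."""
    return qscore(q_value) > min_qscore and covered_fraction(cre, peak) >= min_overlap


# Hypothetical coordinates: the peak covers half of the CRE.
print(is_bound(cre=(150, 250), peak=(100, 200), q_value=1e-120))
```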

## Combining Search Results
fanta.bio enables combining search results through URL parameters:
- Multiple parameters can be combined with the `&` character
- Example (human CREs with CTCF binding and TP53 in description): `https://fanta.bio/search/cre-advanced?q=TP53&organism=human&bound_tf=CTCF`
- Example (retrieving CSV format): `https://fanta.bio/search?q=Sox2&organism=mouse&format=csv`
- Parameters are processed using AND logic by default

## Affiliations and Licensing

- fanta.bio is affiliated with [INTRARED](https://www.intrared.org/), serving as a member database
- All data is distributed under the [CC-BY 4.0 license](http://creativecommons.org/licenses/by/4.0/)
- Citation: Nobusada T et al., Update of the FANTOM web resource: enhancement for studying noncoding genomes. *Nucleic Acids Res.* **53** (D1), D419–D424 (2025). doi: [10.1093/nar/gkae1047](https://doi.org/10.1093/nar/gkae1047)

## Team and Acknowledgements

The database is maintained by three collaborative labs:
- Laboratory for Large-Scale Biomedical Data Technology at RIKEN IMS (led by Dr. Kasukawa)
- Integrated Bioresource Information Division at RIKEN BRC (led by Dr. Masuya)
- Research Center for Genome & Medical Sciences at TMiMS (led by Dr. Kawaji)

The project acknowledges contributions from:
- [ChIP-Atlas](https://chip-atlas.org/)
- [MoG+](https://molossinus.brc.riken.jp/)
- [TogoVar](https://togovar.org/)
- UCSC Genome Browser

fanta.bio is supported by JST NBDC Grant Number JPMJND2202 in the Database Integration Coordination Program (DICP).

## Contact Information

For questions or assistance: help@fanta.bio

## Key References

- Mitsuhashi N, et al. TogoVar: A comprehensive Japanese genetic variation database. Hum Genome Var. 2022
- Takada T, et al. MoG+: a database of genomic variations across three mouse subspecies for biomedical research. Mamm Genome. 2022
- Zou Z, et al. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes. Nucleic Acids Res. 2022
- Abugessaisa I, et al. FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs. Nucleic Acids Res. 2021
- Andersson R, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014
- Forrest AR, et al. A promoter-level mammalian expression atlas. Nature. 2014