Database outline of the FLJ human cDNA database
The FLJ human cDNA database was constructed as a human cDNA sequence analysis database focused on mRNA varieties caused by variations in transcription start sites (TSS) and splicing.
The number of human genes is estimated at 20,000-25,000, whereas the number of human mRNA varieties is predicted to be about 100,000. These varieties are thought to arise from variations in TSS and splicing. In our previous human cDNA project, about 30,000 FLJ full-length sequenced human cDNAs were deposited in DDBJ/GenBank/EMBL, and we obtained about 1.4 million 5'-end sequences (5'-ESTs) of FLJ full-length cDNAs from about 100 cDNA libraries of human tissues and cells constructed by the oligo-capping method. Most of the cDNA inserts were over 2 kb, and the full-length rate at the 5' end was 90%. Our FLJ cDNAs covered about 80% of human genes. Building on these resources, we developed efficient systems for cloning and evaluating human splicing-variant cDNAs in this project and obtained about 22,000 finished-grade full-length sequenced cDNAs.
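Because oligo-capped 5'-ESTs mostly start at the true 5' end of an mRNA, their mapped 5' ends can indicate which TSSs a gene uses. The following is a minimal sketch, not the FLJ pipeline. It assumes each 5'-EST has already been mapped to a hypothetical `(chromosome, strand, 5'-end position)` tuple and groups nearby 5' ends into putative TSS clusters by single linkage within a gap window. The function name `group_tss` and the 50 bp `window` are illustrative choices.

```python
from collections import defaultdict


def group_tss(five_prime_ends, window=50):
    """Group mapped 5'-end positions into putative TSS clusters.

    five_prime_ends: iterable of (chrom, strand, position) tuples, one per
        mapped 5'-EST (hypothetical input format).
    window: maximum gap in bp between neighbouring 5' ends in one cluster
        (illustrative value, not an FLJ parameter).
    Returns (chrom, strand, first_pos, last_pos, est_count) tuples.
    """
    # Collect positions separately for each chromosome/strand pair.
    by_strand = defaultdict(list)
    for chrom, strand, pos in five_prime_ends:
        by_strand[(chrom, strand)].append(pos)

    clusters = []
    for (chrom, strand), positions in sorted(by_strand.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= window:
                # Close enough to the previous 5' end: same TSS cluster.
                count += 1
            else:
                # Gap too large: close the current cluster, open a new one.
                clusters.append((chrom, strand, start, prev, count))
                start, count = pos, 1
            prev = pos
        clusters.append((chrom, strand, start, prev, count))
    return clusters
```

For example, `group_tss([("chr1", "+", 1000), ("chr1", "+", 1010), ("chr1", "+", 1200)])` returns `[("chr1", "+", 1000, 1010, 2), ("chr1", "+", 1200, 1200, 1)]`, which separates two alternative TSSs 190 bp apart.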
We then constructed sequence analysis databases focused on mRNA variations using human genome sequences and cDNA sequences: FLJ full-length sequenced cDNAs, 5'-ESTs of FLJ full-length cDNAs, and the other cDNA sequences listed below. After these sequences were mapped onto the human genome, the cDNA sequences were clustered based on the mapping results, and functional annotations were then performed. The annotations listed below can be searched and viewed in this database.
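The mapping-then-clustering step can be illustrated with a simplified sketch. It assumes that each cDNA's genome alignment is summarised as one span with chromosome, strand, start and end (the `Mapping` record is hypothetical). It groups cDNAs whose spans overlap on the same strand and merges them transitively. The database's actual clustering criteria are not given here. Real pipelines often also require shared exons or splice sites, which this sketch does not check.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One cDNA aligned to the genome (hypothetical record layout)."""
    acc: str
    chrom: str
    strand: str
    start: int  # genomic start of the alignment (inclusive)
    end: int    # genomic end of the alignment (inclusive)


def cluster_mappings(mappings: List[Mapping]) -> List[List[str]]:
    """Group cDNAs whose genomic spans overlap on the same chromosome and strand.

    Simplification: whole alignment spans are compared, not individual exons.
    Returns clusters as lists of accessions, in genomic order.
    """
    ordered = sorted(mappings, key=lambda m: (m.chrom, m.strand, m.start))
    clusters: List[List[str]] = []
    current: List[str] = []
    cur_key = None
    cur_end = 0
    for m in ordered:
        key = (m.chrom, m.strand)
        if current and key == cur_key and m.start <= cur_end:
            # Overlaps the running cluster: extend it.
            current.append(m.acc)
            cur_end = max(cur_end, m.end)
        else:
            # New strand/chromosome or a gap: start a new cluster.
            if current:
                clusters.append(current)
            current, cur_key, cur_end = [m.acc], key, m.end
    if current:
        clusters.append(current)
    return clusters
```

With spans `cDNA_A` (chr1 +, 100-500), `cDNA_B` (chr1 +, 400-900), `cDNA_C` (chr1 +, 1000-1500) and `cDNA_D` (chr1 -, 450-800), the result is `[["cDNA_A", "cDNA_B"], ["cDNA_C"], ["cDNA_D"]]`. `cDNA_D` overlaps `cDNA_A` and `cDNA_B` in coordinates but lies on the opposite strand, so it stays separate.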
a) cDNA information
- Annotation A1: genome locus information of cDNA sequences
- Annotation A2: functional annotations of cDNAs and their translated amino acid sequences, such as BLAST results, Pfam, PROSITE, PSORT, SignalP, SOSUI, and GO (Gene Ontology)
b) cDNA cluster information
- Annotation A3: genome position and locus information of cDNA clusters
- mRNA variation viewer*
* Includes expression profiles based on 5'-ESTs of FLJ cDNAs with a high full-length rate.
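The FLJ 5'-ESTs come from about 100 libraries and have a high full-length rate, so counting a cluster's 5'-ESTs per library gives a rough expression profile. The following is a minimal sketch. It assumes each 5'-EST in a cluster is tagged with its source library and that the total number of 5'-ESTs per library is known. The per-million normalisation and the function name `expression_profile` are illustrative assumptions, not the database's documented method.

```python
from collections import Counter
from typing import Dict, Iterable


def expression_profile(est_libraries: Iterable[str],
                       library_sizes: Dict[str, int]) -> Dict[str, float]:
    """Per-library 5'-EST counts for one cluster, scaled per million ESTs.

    est_libraries: source library name of each 5'-EST assigned to the cluster.
    library_sizes: total number of 5'-ESTs sequenced from each library.
    Libraries with no EST in the cluster get 0.0.
    """
    counts = Counter(est_libraries)
    return {
        lib: counts[lib] * 1_000_000 / total
        for lib, total in library_sizes.items()
    }
```

With hypothetical library names, `expression_profile(["BRAIN", "BRAIN", "LIVER"], {"BRAIN": 20_000, "LIVER": 40_000, "KIDNEY": 5_000})` returns `{"BRAIN": 100.0, "LIVER": 25.0, "KIDNEY": 0.0}`. Scaling by library size keeps deeply and shallowly sequenced libraries comparable. Raw counts would favour the larger libraries.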
Data set:
1) Human cDNA sequences from oligo-capped cDNA libraries used in this DB
- Full-length cDNA sequences: 52,120 (finished grade)
   a) FLJ full-length cDNA sequences from the FLJ-PJ: 30,326
   b) FLJ full-length cDNA sequences from the NEDO FAP-PJ human cDNA sequencing project focused on splicing variants of mRNAs: 21,794 (finished grade) + 3,282 (draft grade)
- FLJ ESTs: 5'-EST 1,456,213 & 3'-EST 109,283
2) Other human cDNA sequences from public databases used in this DB
- Full-length sequenced cDNAs (KIAA, MGC, DKFZ, etc.): 52,126
- RefSeq (human) and Ensembl (human gene transcripts): 77,346
- UniGene (human ESTs): 5'-EST 2,699,311* & 3'-EST 1,638,884*
* Excluding about 1.6 million FLJ EST sequences deposited by us.
3) Human genome sequences
- UCSC hg18 (NCBI Build 36.1)
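The counts in the data set are internally consistent, and simple arithmetic confirms this. The two finished-grade FLJ subsets sum to the stated full-length total. The FLJ 5'- and 3'-ESTs together come to roughly 1.6 million, which matches the figure for FLJ ESTs excluded from the UniGene counts. A short check using only the figures listed in the data set:

```python
# Figures copied from the data-set list; nothing here is queried from the DB.
flj_pj_full_length = 30_326   # a) FLJ-PJ full-length cDNAs
fap_pj_finished = 21_794      # b) NEDO FAP-PJ, finished grade only
full_length_total = flj_pj_full_length + fap_pj_finished
print(full_length_total)      # 52120, the finished-grade total

flj_5_est = 1_456_213         # FLJ 5'-ESTs
flj_3_est = 109_283           # FLJ 3'-ESTs
flj_est_total = flj_5_est + flj_3_est
print(flj_est_total)          # 1565496, i.e. about 1.6 million
```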