giftflyer.blogg.se

Consensus splice site

A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% fall into many small groups (the largest being 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites with the sequences deposited in GenBank by high-throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical, EST-supported splice site sequences had a clear match in the human HTG. After corrections they can be classified as: 79 GC-AG pairs (one of which was an error that corrected to GC-AG), 61 errors that were corrected to canonical GT-AG pairs, six AT-AC pairs (two of which were errors that corrected to AT-AC), one case produced from a non-existent intron, seven cases found in HTG that were deposited to GenBank, and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC, and only 0.02% could consist of other types of non-canonical splice sites. We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus ~600) and, finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.

Ever since the discovery of split genes it has been observed that practically all introns contain two highly conserved dinucleotides. The donor splice site has GT exactly after the point where the cell cuts the 5′ end of the intron sequence, and the acceptor site has AG exactly before the point where the cell cuts the 3′ end of the intron sequence ( 1, 2). With the accumulation of gene sequence data, Mount ( 3) concluded that this GT-AG rule was always obeyed. However, several cases of splice sites with GC-AG, GG-AG, GT-TG, GT-CG or CT-AG dinucleotides at the splice junctions were observed ( 4– 8).
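To make the GT-AG rule and the non-canonical classes concrete, here is a minimal sketch that labels an intron by its first two (donor) and last two (acceptor) bases. The function name `classify_intron`, the label strings and the toy sequences are made up for illustration; the text does not describe any particular implementation.

```python
from itertools import product

# Splice site pair classes named in the text; anything else is reported as "other".
KNOWN_PAIRS = {
    ("GT", "AG"): "GT-AG (canonical)",
    ("GC", "AG"): "GC-AG",
    ("AT", "AC"): "AT-AC",
}


def classify_intron(intron: str) -> str:
    """Label an intron by its donor (first two) and acceptor (last two) bases."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 bases long")
    donor, acceptor = seq[:2], seq[-2:]
    return KNOWN_PAIRS.get((donor, acceptor), f"other ({donor}-{acceptor})")


# Four bases at each of four positions give the 256 a priori possible pairs.
ALL_PAIRS = ["".join(p[:2]) + "-" + "".join(p[2:]) for p in product("ACGT", repeat=4)]

if __name__ == "__main__":
    # Hypothetical toy introns, not taken from any database.
    for toy in ("GTAAGTCTTTTCAG", "GCAAGTCTTTTCAG", "ATATCCTTTTAC", "GGAAGTCTTTTCAG"):
        print(toy, "->", classify_intron(toy))
    print(len(ALL_PAIRS), "possible dinucleotide pairs")
```

A check like this only looks at the annotated junction. It would not catch the case mentioned in the text where the real dinucleotides sit one position away from the annotated boundary.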
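The extrapolated percentages can be checked with a little arithmetic. The sketch below rests on an assumption that is not stated in the text: the eight uninformative cases (the non-existent intron and the seven HTG-derived entries) are set aside, and the remaining 148 outcomes are scaled up to all 290 EST-supported non-canonical pairs. Under that reading the quoted 99.24 / 0.69 / 0.05 / 0.02 figures come out, though this may not be exactly how they were derived.

```python
# Counts quoted in the text.
EST_SUPPORTED = 22_489      # EST-supported splice site pairs
EST_CANONICAL = 22_199      # EST-verified GT-AG pairs
EST_NONCANONICAL = 290      # EST-supported non-canonical pairs

# Outcome of checking 156 human non-canonical pairs against HTG sequence.
HTG_OUTCOMES = {
    "GC-AG": 79,
    "corrected to GT-AG": 61,
    "AT-AC": 6,
    "non-existent intron": 1,
    "found in HTG deposited to GenBank": 7,
    "other non-canonical": 2,
}

# ASSUMPTION (not from the text): these two categories are excluded before scaling.
EXCLUDED = {"non-existent intron", "found in HTG deposited to GenBank"}


def extrapolate() -> dict:
    """Scale the HTG outcomes to all non-canonical pairs; return percentages."""
    informative = {k: v for k, v in HTG_OUTCOMES.items() if k not in EXCLUDED}
    total = sum(informative.values())  # 148 under the assumption above
    scaled = {k: v / total * EST_NONCANONICAL for k, v in informative.items()}
    counts = {
        "GT-AG": EST_CANONICAL + scaled["corrected to GT-AG"],
        "GC-AG": scaled["GC-AG"],
        "AT-AC": scaled["AT-AC"],
        "other": scaled["other non-canonical"],
    }
    return {k: round(100 * v / EST_SUPPORTED, 2) for k, v in counts.items()}


if __name__ == "__main__":
    assert sum(HTG_OUTCOMES.values()) == 156
    assert EST_CANONICAL + EST_NONCANONICAL == EST_SUPPORTED
    print(extrapolate())
```

The two internal checks confirm that the six categories add up to the 156 matched sequences and that 22 199 canonical plus 290 non-canonical pairs make up the 22 489 EST-supported total.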
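The text mentions building weight matrices for the major splice site groups but gives no details. As a rough illustration of the general idea only, the sketch below builds a log-odds position weight matrix from aligned sites and scores a candidate against it. The pseudocount, the uniform 0.25 background, the function names and the four toy donor sequences are all assumptions made for this example, not the matrices or data described in the text.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds score} dict per aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        # Start from the pseudocount so unseen bases do not give log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos].upper()] += 1  # raises KeyError on non-ACGT symbols
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return matrix


def score(matrix, candidate):
    """Sum the per-position log-odds scores for a candidate site."""
    if len(candidate) != len(matrix):
        raise ValueError("candidate length must match the matrix")
    return sum(col[base] for col, base in zip(matrix, candidate.upper()))


if __name__ == "__main__":
    # Hypothetical aligned donor sites (2 exon bases + 6 intron bases).
    toy_donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]
    pwm = build_weight_matrix(toy_donors)
    for candidate in ("AGGTAAGT", "AGCCAAGT"):
        print(candidate, round(score(pwm, candidate), 2))
```

A gene prediction program could use a score like this to rank candidate junctions, with a separate matrix per group (for example GT-AG and GC-AG donors) trained on that group's verified sites.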












