
Splice site prediction

The prediction of splice sites is a complex and difficult task for two reasons:

  • There are differences between the splicing machineries of different organisms; therefore, it is not possible to apply a unified model to all DNA sequences
  • There are several types of splicing, each relying on a different enzymatic mechanism

Therefore, to obtain a meaningful prediction of splice sites in a given sequence, the prediction method must take into account both the target organism and the splicing mechanism.

Available algorithms

Three state-of-the-art algorithms are available:

  1. Sequence alignment with template sequences that contain known splice sites; this is possible only if very closely related genes are investigated, and for at least one of them the splicing behaviour has been elucidated
  2. Statistical analysis using a Hidden Markov Model
  3. Pattern recognition and pattern matching
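
As a rough illustration of the third approach, the sketch below scans a sequence for the GT and AG dinucleotides that begin and end most eukaryotic nuclear introns. The function name `find_candidate_sites` and the example sequence are invented for illustration. Matching on two bases alone over-predicts heavily, so this shows the principle of pattern matching rather than a usable predictor.

```python
import re

def find_candidate_sites(seq):
    """Return 0-based start positions of GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    # Zero-width lookahead reports every occurrence, including adjacent ones.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors

# Invented example sequence, not taken from a real gene.
donors, acceptors = find_candidate_sites("ATGGTAAGTCCCTTTTCAGGA")
print(donors, acceptors)
```

Practical pattern-based methods match longer motifs around these dinucleotides, which is where organism-specific differences come into play.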

Leto

Entelechon’s gene optimization software Leto uses a Hidden Markov Model. The model is trained on a large set of known splice sites retrieved from GenBank. Because the training set is drawn from the specific target organism for which the splice sites are predicted, the model has a high specificity and accuracy.

Note that Leto identifies sequence regions that match a statistical pattern for splice sites in the target organism. A match does not necessarily mean that splicing actually occurs.
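
The idea of training on known sites and then scoring candidates can be illustrated with a model much simpler than a Hidden Markov Model: a position weight matrix with a log-odds score. This is not Leto's implementation. The function names, the pseudocount, the uniform background and the toy training windows are all assumptions made for the sake of a short example.

```python
import math

def train_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites."""
    pwm = []
    for i in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases keep a nonzero probability.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: c / total for base, c in counts.items()})
    return pwm

def log_odds(window, pwm, background=0.25):
    """Log2 odds of the window under the site model versus a uniform background."""
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(window))

# Toy donor-site windows, invented for illustration (not real GenBank data).
toy_donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
pwm = train_pwm(toy_donors)

score_match = log_odds("AGGTAAGT", pwm)     # resembles the training sites
score_mismatch = log_odds("CCCCCCCC", pwm)  # does not resemble them
print(score_match > 0, score_mismatch < 0)
```

A positive score only says that the window resembles the training sites. As with any statistical match, it says nothing about whether the site is actually used.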

Limitations for gene optimization

In gene optimization, it is desirable to reduce the number of possible splice sites. However, a predicted splice site does not automatically lead to splicing. For splicing to occur, a donor and an acceptor site must lie at a suitable distance from each other, the RNA must be free of extended secondary structure, and the RNA must contain a suitable branch point between the donor and acceptor sites.
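
The distance condition can be sketched as a filter over predicted donor and acceptor positions. The function name `plausible_introns` and the default length limits are placeholders chosen for illustration, not organism-specific values. The secondary-structure and branch-point conditions are not modelled here.

```python
def plausible_introns(donors, acceptors, min_len=70, max_len=10000):
    """Pair donors with downstream acceptors that lie at a plausible distance.

    donors:    0-based positions of the first intron base (the G of GT).
    acceptors: 0-based positions of the A in the terminal AG.
    min_len and max_len are illustrative placeholders.
    """
    pairs = []
    for d in donors:
        for a in acceptors:
            intron_len = a + 2 - d  # the intron spans positions d through a + 1
            # Upstream acceptors give a small or negative length and are rejected.
            if min_len <= intron_len <= max_len:
                pairs.append((d, a, intron_len))
    return pairs

print(plausible_introns([10], [20, 100]))
```

Only pairs that pass this filter would need to be checked against the remaining conditions.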

Since it is relatively unlikely that all these conditions are met, a single likely or very likely splice site should not be of much concern in itself. However, the larger the number of identified splice sites, the more likely it is that actual splicing will occur, so an optimized sequence should contain no more than about five acceptor and five donor splice sites.
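
This rule of thumb is easy to express as a check on the number of predicted sites. The sketch below applies the limit to donor and acceptor sites separately, which is one possible reading of the guideline. The function name and the `limit` parameter are hypothetical.

```python
def exceeds_site_guideline(n_donors, n_acceptors, limit=5):
    """Return True if either site type exceeds the rule-of-thumb limit."""
    return n_donors > limit or n_acceptors > limit

print(exceeds_site_guideline(3, 4))  # within the guideline
print(exceeds_site_guideline(6, 2))  # too many donor sites
```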