Hidden Markov models for proteins and DNA

Anders Krogh

Bioinformatics Centre,
University of Copenhagen
Universitetsparken 15,
2100 Copenhagen, Denmark
 

At the primary level of analysis, both proteins and DNA are
one-dimensional sequences of symbols from a finite alphabet.  Many
secondary properties, such as gene structure, have a grammatical
structure, and therefore methods from language modelling can often be
applied to biological sequences.  A hidden Markov model (HMM) is a
probabilistic model developed primarily in speech recognition
research, but it has recently also proven very useful for biological
sequence analysis.  In this talk, I will describe two applications
of HMMs: prediction of genes in genomic DNA
and prediction of transmembrane helices in membrane proteins.