Computer Science Faculty Publications and Presentations

Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences

Document Type

Article

Publication Date

2009

Abstract

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \cdots g_m$ is a string of $m$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. For an approximate copy $b_1 b_2 \cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability that $b_i \neq g_i$ is at most $\alpha$. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work is a theoretical approach with a rigorous probabilistic analysis. We develop an algorithm that, under the probabilistic model, can find the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for $\alpha < 0.1771$ and any constant $x \geq 8$, there exist constants $t_0, \delta_0, \delta_1 > 0$ such that if the length of the motif is at least $\delta_0 \log n$, the alphabet has at least $t_0$ characters, and there are at least $\delta_1 \log n_0$ input sequences, then in $O(n^3)$ time our algorithm finds the motif with probability at least $1 - \frac{1}{2^x}$, where $n$ is the longest length of any input sequence and $n_0 \leq n$ is an upper bound for the length of the motif.
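
The generative model described in the abstract can be sketched in a few lines of Python. This is only an illustration of the input model, not the paper's discovery algorithm. The function name `implant_motif` and its parameters are hypothetical. The sketch also makes assumptions the abstract does not state: background characters are drawn uniformly, each motif character is changed with probability exactly `alpha` (the abstract only bounds this probability from above), a changed character is drawn uniformly from the other characters, and the approximate copy overwrites a window at a uniformly random position.

```python
import random


def implant_motif(k, n, motif, alphabet, alpha, rng):
    """Return (sequences, positions): k sequences of length n, each holding
    one noisy copy of `motif`, and the start index of each copy.

    Hypothetical helper for illustration; requires len(alphabet) >= 2
    and len(motif) <= n.
    """
    m = len(motif)
    sequences, positions = [], []
    for _ in range(k):
        # Background: every character is a random character from the alphabet
        # (assumed uniform here).
        seq = [rng.choice(alphabet) for _ in range(n)]

        # Approximate copy b_1 ... b_m: each b_i differs from g_i with
        # probability alpha (assumption: exactly alpha, uniform replacement).
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)

        # Assumption: the copy overwrites a window at a uniform random start.
        start = rng.randrange(n - m + 1)
        seq[start:start + m] = copy

        sequences.append("".join(seq))
        positions.append(start)
    return sequences, positions


if __name__ == "__main__":
    rng = random.Random(0)
    seqs, starts = implant_motif(k=5, n=40, motif="ACGTACGTAC",
                                 alphabet="ACGT", alpha=0.1, rng=rng)
    for s, p in zip(seqs, starts):
        print(p, s)
```

The parameter values in the usage lines are arbitrary and are not chosen to satisfy the constants $t_0$, $\delta_0$, $\delta_1$ in the stated result. Passing a seeded `random.Random` instance keeps the generated test inputs reproducible.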

Recommended Citation

Fu, Bin, Ming-Yang Kao, and Lusheng Wang. 2009. “Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences.” SIAM Journal on Discrete Mathematics 23 (4): 1715–37. https://doi.org/10.1137/080720401.

Publication Title

SIAM Journal on Discrete Mathematics

DOI

10.1137/080720401

Included in

Computer Sciences Commons
