Alecs' Blog Did you ever go clear..

Suffix Array

suffix tree and suffix array are good at full-text search. The underlying property they utilize is very simple (yet elegant & powerful): Any substring of a given string must be the prefix of one of suffix strings of the original string.

There are really many many many many docs/papers/codes concerning these two structures and their applications. In short, suffix tree is easy to understand and since it has an explicit tree structure, it's easier to apply. But it consumes more space than the suffix array and is nontrivial to construct quickly (linear time). The suffix array on the other hand is more compact but it's still not very easy to construct in linear time. Some algorithms just first build the corresponding suffix tree and then do a traverse to build the resulting array.

Recently there's a new algorithm that uses a 'difference cover' approach (divide & conquer wrt modular to 3) to construct the array in linear time. See the paper Linear Work Suffix Array Construction. Among other Good Things (tm), it elegantly utilize recursion, radix/counting sort and various index reference tricks. Another good reference is about how to get the lcp (largest common prefix) from a built suffix array in linear time: Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications.

This problem is just to compute the longest common substring, which can be beautifully solved using the suffix array and lcp algorithm mentioned above. Here is the final code.

Oh lets end this blog with a poem praising the magic structure 'tree' (though here we didnt use it but an array, the tree structure is really 'magic' and thus deserves such kind of praise.)

I think that I shall never see
A poem lovely as a tree.
Poems are made by fools like me,
But only God can make a tree.

24 Apr 2006

Fork me on GitHub