Package org.biojava.bio.search
Class KnuthMorrisPrattSearch
java.lang.Object
org.biojava.bio.search.KnuthMorrisPrattSearch
An object to find exact subsequences within a sequence.
Reference: KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350. invalid input: '<'/P.
USAGE:
When the object is constructed thefindMatches()
method would be called. This will return an int[] giving the offsets
(ie the location of the first symbol of each match in the text).
The getKMPNextTable()
returns the table of border lengths
used in the algorithm. This method is protected as it is unlikely it
will be needed except for debugging.
The algorithm finds exact matches therefore ambiguity symbols will match only themselves. The class cannot perform regular expressions. The class operates on all alphabets thus if searching for a DNA pattern you should compile both the pattern and its reverse complement.
WARNING the behaviour of a pattern containing gaps is undefined. It's not recommended that you try it.
Copyright: Copyright (c) 2002
Company: AgResearch
- Version:
- 1.0
- Author:
- Mark Schreiber
-
Constructor Summary
ConstructorsConstructorDescriptionKnuthMorrisPrattSearch
(SymbolList pattern) Constructs a KMP matcher to find exact occurances ofpattern
intext
using the Knuth-Morris-Pratt algorithm. -
Method Summary
Modifier and TypeMethodDescriptionint[]
findMatches
(SymbolList text) This will return an int[] giving the offsets of the matches intext
(ie the location of the first symbol of each match in thetext
).protected int[]
Returns the table of border lengthsstatic void
Demo and Test method
-
Constructor Details
-
KnuthMorrisPrattSearch
Constructs a KMP matcher to find exact occurances ofpattern
intext
using the Knuth-Morris-Pratt algorithm.A new class should be constructed for each occurance of
pattern
.- Parameters:
pattern
- the pattern to search for
-
-
Method Details
-
findMatches
This will return an int[] giving the offsets of the matches intext
(ie the location of the first symbol of each match in thetext
).- Parameters:
text
- the text or sequence to search for the pattern- Returns:
- an int[] giving the offsets to the matches or a zero length array if there are no matches.
-
getKmpNextTable
Returns the table of border lengths- Returns:
- an int[] of border lenghts
-
getPattern
- Returns:
- the pattern being searched for
-
main
Demo and Test method- Parameters:
args
- no arguments required- Throws:
Exception
- if the test fails
-