Knuth–Morris–Pratt Algorithm
@kmpPseudo-code
function build_lps(P):
m <- length of P
lps[0] <- 0
len <- 0
i <- 1
while i < m:
if P[i] == P[len] then
len <- len + 1
lps[i] <- len
i <- i + 1
else
if len != 0 then
len <- lps[len - 1]
else
lps[i] <- 0
i <- i + 1
function kmp_search(T, P):
n <- length of T
m <- length of P
lps <- build_lps(P)
i <- 0
j <- 0
k <- 0
while i < n:
if T[i] == P[j] then
i <- i + 1
j <- j + 1
if j == m then
k <- k + 1
j <- lps[j - 1]
else if i < n and T[i] != P[j] then
if j != 0 then
j <- lps[j - 1]
else
i <- i + 1
return k
- P P is the pattern you are looking for, like a word you want to find.
- T T is the full text where you search, like a long sentence or book.
- i i is the position in the text T.
- j j is the position in the pattern P.
- lps lps is an array that tells us how much we can skip when a mismatch happens.
- m m is the length of the pattern.
- n n is the length of the text.
- k k counts how many times we found the pattern.
- 0 0 is used to start indexes and reset positions.
- -1 -1 can represent an invalid or non-existing position.
- 1 1 represents a single step forward or a found match increment.
Recipe Steps
Think of searching for a word in a long text. Normally, if you mismatch, you restart from the next position, which is slow.
KMP first studies the pattern itself and builds a helper table (lps). This is like learning patterns inside the word.
This table tells you how far you can jump when something does not match, instead of starting over.
Now start scanning the text. If characters match, move forward in both text and pattern.
If there is a mismatch, use the lps table to jump to a smarter position in the pattern instead of restarting.
When you reach the end of the pattern, you found a match. Count it and continue searching.
Questions & Answers
- The lps table is built only from the pattern, not the text.
- After finding a match, we do not restart completely but reuse previous information.
- Understanding the lps table is the hardest but most important part.
- KMP is very useful when searching the same pattern many times.
- It works best for string matching problems with repeated patterns.
Related Algorithms
Rabin-Karp Algorithm
Rabin-Karp is a string searching algorithm that uses hashing to quickly compare a pattern with parts of a text using a rolling hash technique.
ViewZ-Algorithm
Z-Algorithm is a string searching algorithm that uses a preprocessed array to find all occurrences of a pattern within a main string. It is particularly useful for finding all occurrences of a pattern in a long string, as it allows for efficient searching and multiple match detection.
View