Most linear-time string searching algorithms are tricky to implement, and require heavy preprocessing of the pattern before running the search. This article presents the Rabin-Karp algorithm, a simple probabilistic string searching algorithm based on hashing and polynomial equality testing, along with a Python implementation. A streaming variant of the algorithm and a generalization to searching for multiple patterns in one pass over the input are also described, and performance aspects are discussed.
The algorithm is probabilistic in that it doesn’t always return correct results; more precisely, it returns all valid matches and (with reasonably small probability) a few incorrect matches (algorithms such as this one that tend to be over-optimistic in reporting their results are usually said to be true-biased).
Most functional languages offer support for tagged unions (also called sum types), a type of data structure capable of successively holding values of several fixed types. This article shows how to use abstract classes to emulate such behaviour in high-level object-oriented languages such as C#, Java, or VB.Net ((.Net languages have the [StructLayout(LayoutKind.Explicit)] attribute, which makes it possible to create structs which behave a lot like C++ unions. But that only works with primitive types.)).