Most linear-time string searching algorithms are tricky to implement, and require heavy preprocessing of the pattern before running the search. This article presents the Rabin-Karp algorithm, a simple probabilistic string searching algorithm based on hashing and polynomial equality testing, along with a Python implementation. A streaming variant of the algorithm and a generalization to searching for multiple patterns in one pass over the input are also described, and performance aspects are discussed.
The algorithm is probabilistic in that it doesn’t always return correct results; more precisely, it returns all valid matches and (with reasonably small probability) a few incorrect matches (algorithms such as this one that tend to be over-optimistic in reporting their results are usually said to be true-biased).
TL;DR: I’ve taken plenty of notes while developing my YiXue Chinese dictionary for Windows Phone. Topics covered in this article include performance tips, best practices, and plenty of code snippets for Windows Phone.
This article presents notes and remarks that I gathered while working on a Chinese Dictionary App for Windows Phone, YiXue Chinese Dictionary: mistakes I made, fun tips I wrote down, and so on.
I initially didn’t really intend to create a full blog post out of these notes, but their increasing number, and my app recently placing second in Microsoft France’s App Awards contest, gave me enough motivation to share them with the community. Along with various tips and tricks and answers to often-asked (but seldom answered) questions, I will discuss a number of performance improvements that specifically apply to Windows Phone apps.
Comparing strings is often — erroneously — said to be a costly process. In this article I derive the theoretical asymptotic cost of comparing random strings of arbitrary length, and measure it in C, C++, C# and Python.