Fuzzy Searches
Exploring five fuzzy search algorithms: Contains, Hamming, Levenshtein, Jaro-Winkler, and Damerau-Levenshtein.
Primer
Fuzzy search helps us spot "close enough" matches when perfect spelling or phrasing would miss the mark. Below you will find five friendly algorithms applied to a basket of fruits. Each one balances speed, memory use, and accuracy a little differently. You can skim the summaries, peek at the code, and then try them yourself with the live dataset further down the page.
All of the algorithms share one extra trick: tokenization. We break every string into bite-sized words (tokens) and compare your query against those pieces. That keeps a short query like "berry" from being swamped by a long title such as "A Middle English Vocabulary", and it lets shorter words compete fairly inside longer phrases.
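As a minimal sketch of that idea (the function names and splitting rules here are illustrative, not the page's actual implementation), tokenization and per-token scoring might look like this:

function tokenize(text) {
  // Lowercase and break on anything that isn't a letter or digit
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score the query against every token and keep the best (lowest) distance,
// so a short word isn't penalized for the length of the whole phrase
function bestTokenScore(text, query, distanceFn) {
  const scores = tokenize(text).map((token) => distanceFn(query.toLowerCase(), token));
  return scores.length ? Math.min(...scores) : Infinity;
}

// Example: bestTokenScore("A Middle English Vocabulary", "berry", levenshteinDistance)
// compares "berry" against "a", "middle", "english", and "vocabulary" individually.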
Contains Search
The simplest approach is the "contains" search. We lowercase everything and see if the letters you typed appear anywhere inside the text. It's lightning fast because it only scans for exact substrings—no math, no heavy lifting.
- Speedy O(n) scan with very small memory use.
- Perfect for quick filters or auto-complete boxes.
- Misses typos such as "stawberry" and transposed letters.
- Finds matches even when the letters sit inside a different word (false positives).
Fruit Query: "berry"
- Elderberry
- Raspberry
Time & Space
| Time | O(n) |
|---|---|
| Memory | O(1) |
How it works: Think of it like using Ctrl+F in a document. The algorithm simply checks if your search term appears anywhere in the text. It converts everything to lowercase first so "Berry" matches "berry", then scans through character by character looking for an exact match. No fancy math—just a straightforward search.
function searchContains(text, query) {
const normalized = text.toLowerCase();
const normalizedQuery = query.toLowerCase();
return normalized.includes(normalizedQuery);
}
// Example: searchContains("Elderberry", "berry") → true

Hamming Distance
Hamming distance counts how many characters differ when two strings sit side by side. We slide your query across the text and keep the best score. It only compares equal-length strings, so we test segments exactly as long as your query.
- Simple math: just counting differences.
- Great at spotting small typos when words are the same length.
- Struggles when the text shrinks or grows ("berry" vs. "berries").
- Even with a sliding window, it can't account for insertions or deletions, so it struggles across longer sentences.
Fruit Query: "berry"
- Cherry Hamming score: 1.00
- Elderberry Hamming score: 0.00
- Grapefruit Hamming score: 3.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(1) |
How it works: Imagine comparing two words letter by letter, side by side. The algorithm slides your search term across the text and counts how many characters don't match at each position. It's like playing "spot the difference" but only when both words are the same length. The position with the fewest mismatches wins. This works great for catching typos like "berry" vs "beryy" but struggles when words have different lengths.
function hammingDistance(needle, haystack) {
if (needle.length !== haystack.length) {
return Infinity;
}
let distance = 0;
for (let i = 0; i < needle.length; i++) {
if (needle[i] !== haystack[i]) {
distance++;
}
}
return distance;
}
// Sliding window to find best match
function slidingHamming(needle, haystack) {
let bestDistance = Infinity;
for (let offset = 0; offset <= haystack.length - needle.length; offset++) {
const segment = haystack.slice(offset, offset + needle.length);
const distance = hammingDistance(needle, segment);
if (distance < bestDistance) {
bestDistance = distance;
if (bestDistance === 0) break;
}
}
return bestDistance;
}
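// Example: slidingHamming("berry", "cherry") → 1 (best window: "herry")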
Levenshtein Distance

Levenshtein distance measures the minimum number of edits—insertions, deletions, or substitutions—to turn one string into another. Here we run a cache-friendly two-row version of the Wagner–Fischer algorithm, which keeps the same O(n · m) work while dropping memory to O(min(n, m)) and bailing out early whenever the score would exceed our threshold.
- Handles typos, missing letters, and swapped order all at once.
- Gives a meaningful score: smaller numbers mean closer matches.
- Still needs O(n · m) work despite the lighter memory footprint.
- Slower on long paragraphs; you may need caching or pruning.
Fruit Query: "berry"
- Cherry Levenshtein score: 2.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(min(n, m)) |
Spell Autocorrect (Levenshtein)
| Misspelled | Suggestion | Edit Distance |
|---|---|---|
| strawbery | No close match | — |
| rasberry | Raspberry | 1.00 |
| kiwwi | Kiwi | 1.00 |
How it works: This algorithm figures out the minimum number of single-character edits (add a letter, remove a letter, or change a letter) needed to transform one word into another. It builds up a table row by row, calculating the cost of each possible transformation. The clever part is that it only keeps two rows in memory at a time—the previous row and the current row—instead of storing the entire table. This dramatically reduces memory usage while still finding the optimal edit distance. Think of it like solving a puzzle by working through it step by step, but only keeping track of the last two steps instead of every step you've taken.
// Memory-optimized version: O(min(n,m)) space instead of O(n*m)
function levenshteinDistance(source, target, maxDistance = Infinity) {
// Early exit if length difference exceeds threshold
if (Math.abs(source.length - target.length) > maxDistance) {
return Infinity;
}
// KEY OPTIMIZATION: Use shorter string for rows to minimize memory
// This reduces space from O(n*m) to O(min(n,m))
if (source.length > target.length) {
[source, target] = [target, source];
}
// KEY OPTIMIZATION: Only store 2 rows instead of full matrix
// Naive approach: const matrix = Array(n+1).fill(Array(m+1)) // O(n*m) space
// Optimized: only need previous and current row // O(min(n,m)) space
const previousRow = new Array(source.length + 1).fill(0);
const currentRow = new Array(source.length + 1).fill(0);
// Initialize first row: distance from empty string
for (let i = 0; i <= source.length; i++) {
previousRow[i] = i;
}
// Fill matrix row by row, reusing the two-row buffer
for (let row = 1; row <= target.length; row++) {
currentRow[0] = row;
let rowMin = currentRow[0];
for (let col = 1; col <= source.length; col++) {
const cost = source[col - 1] === target[row - 1] ? 0 : 1;
// Standard DP recurrence: min of 3 operations
currentRow[col] = Math.min(
previousRow[col] + 1, // deletion
currentRow[col - 1] + 1, // insertion
previousRow[col - 1] + cost // substitution
);
rowMin = Math.min(rowMin, currentRow[col]);
}
// Early termination if row minimum exceeds threshold
if (rowMin > maxDistance) {
return Infinity;
}
// Swap rows: current becomes previous for next iteration
// This reuses memory instead of allocating new rows
for (let i = 0; i <= source.length; i++) {
previousRow[i] = currentRow[i];
}
}
return previousRow[source.length];
}
// Example: levenshteinDistance("kitten", "sitting") → 3
// Memory: O(6) instead of O(42) for the full matrix

The spell-autocorrect suggestions above come from running each misspelling through the same helper:

import { searchLevenshtein } from "@/scripts/FuzzySearch";
const fruits = ["Apple", "Banana", "Cherry", "Dragonfruit", "Elderberry", "Grapefruit", "Kiwi", "Raspberry"];
const misspellings = ["strawbery", "rasberry", "kiwwi"];
const suggestions = misspellings.map((word) => {
const [best] = searchLevenshtein(fruits.map((title) => ({ title })), word);
return { word, suggestion: best?.title, edits: best?.score };
});
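// suggestions → e.g. { word: "rasberry", suggestion: "Raspberry", edits: 1 }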
Jaro-Winkler Similarity

Jaro-Winkler measures how many characters line up in the same order and rewards matching prefixes. It shines on short strings such as author names because swapped characters hurt less than in strict edit distance, and identical prefixes (like shared surnames) boost the score.
- Emphasizes leading characters, which people notice first.
- Great for matching people or product names with swapped letters.
- Scores are similarities between 0 and 1; subtract from 1 if you want a distance-style number.
- Not ideal for long paragraphs where edit distance is clearer.
Fruit Query: "berry"
- Cherry Similarity: 82.2% similar
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(n + m) |
How it works: This algorithm is like a more forgiving version of edit distance. First, it finds matching characters between the two strings, even if they're slightly out of order. It then calculates a similarity score based on how many characters match and how many are transposed (swapped). The "Winkler" part adds a bonus if the words start with the same letters, because we humans pay more attention to the beginning of words. So "berry" and "yberr" would get a good score, but "berry" and "beryy" would score even higher because they share the prefix "ber". It returns a number between 0 and 1, where 1 means identical.
function jaroWinklerSimilarity(source, target, prefixScale = 0.1) {
if (!source.length || !target.length) return 0;
if (source === target) return 1;
// Match window: characters within this distance can match
const matchDistance = Math.floor(Math.max(source.length, target.length) / 2) - 1;
const sourceMatches = new Array(source.length).fill(false);
const targetMatches = new Array(target.length).fill(false);
// Find matching characters
let matches = 0;
for (let i = 0; i < source.length; i++) {
const start = Math.max(0, i - matchDistance);
const end = Math.min(i + matchDistance + 1, target.length);
for (let j = start; j < end; j++) {
if (targetMatches[j] || source[i] !== target[j]) continue;
sourceMatches[i] = true;
targetMatches[j] = true;
matches++;
break;
}
}
if (!matches) return 0;
// Count transpositions (characters that match but are out of order)
let transpositions = 0;
for (let i = 0, j = 0; i < source.length; i++) {
if (!sourceMatches[i]) continue;
while (!targetMatches[j]) j++;
if (source[i] !== target[j]) transpositions++;
j++;
}
transpositions /= 2;
// Calculate Jaro similarity
const jaro = (
matches / source.length +
matches / target.length +
(matches - transpositions) / matches
) / 3;
// Winkler modification: boost score for common prefix
let prefixLength = 0;
const maxPrefix = Math.min(4, Math.min(source.length, target.length));
for (let i = 0; i < maxPrefix && source[i] === target[i]; i++) {
prefixLength++;
}
return jaro + prefixLength * prefixScale * (1 - jaro);
}
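// Example: jaroWinklerSimilarity("berry", "cherry") → ≈0.822 (82.2% similar)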
Damerau-Levenshtein

Damerau-Levenshtein extends edit distance by counting swaps of neighboring characters as a single edit. That single upgrade captures common adjacent-swap typos such as "brery" or "strwaberry" without over-penalizing the result, which makes it a popular alternative to the classic algorithm.
- Understands adjacent swaps, the most common human typo.
- Drop-in replacement wherever you already use Levenshtein.
- Slightly higher constant factors than basic Levenshtein.
- Still quadratic for long strings unless you add heuristics.
Fruit Query: "berry"
- Cherry Damerau score: 2.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(n · m) |
How it works: This is Levenshtein distance with one important upgrade: it recognizes a swap of two adjacent letters as a single mistake. So if you type "teh" instead of "the", regular Levenshtein counts that as two edits (two substitutions), but Damerau-Levenshtein counts it as just one edit (a transposition of the 'e' and 'h'). The algorithm builds a full table like Levenshtein, but adds an extra check: if it sees that two characters are swapped, it can fix both at once. This makes it much better at catching the most common type of typo: accidentally swapping two keys on your keyboard.
function damerauLevenshteinDistance(source, target, maxDistance = Infinity) {
if (Math.abs(source.length - target.length) > maxDistance) {
return Infinity;
}
const rows = source.length + 1;
const cols = target.length + 1;
const matrix = Array.from({ length: rows }, () =>
new Array(cols).fill(0)
);
// Initialize first row and column
for (let i = 0; i < rows; i++) matrix[i][0] = i;
for (let j = 0; j < cols; j++) matrix[0][j] = j;
// Fill matrix
for (let i = 1; i < rows; i++) {
let rowMin = Infinity;
for (let j = 1; j < cols; j++) {
const cost = source[i - 1] === target[j - 1] ? 0 : 1;
// Standard Levenshtein operations
matrix[i][j] = Math.min(
matrix[i - 1][j] + 1, // deletion
matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j - 1] + cost // substitution
);
// Damerau addition: transposition (swap adjacent chars)
if (i > 1 && j > 1 &&
source[i - 1] === target[j - 2] &&
source[i - 2] === target[j - 1]) {
matrix[i][j] = Math.min(
matrix[i][j],
matrix[i - 2][j - 2] + cost
);
}
rowMin = Math.min(rowMin, matrix[i][j]);
}
// Early termination
if (rowMin > maxDistance) return Infinity;
}
return matrix[rows - 1][cols - 1];
}
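// Example: damerauLevenshteinDistance("teh", "the") → 1 (one transposition)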
Search Benchmark

This quick benchmark uses a styled input to query the Open Library dataset generated at build time. Type to match against titles, subtitles, authors, or publication years and watch how long each search takes.
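As a rough sketch of how each timing could be captured (the timeSearch helper and the openLibraryRecords variable are illustrative assumptions; only searchLevenshtein comes from the snippet earlier on this page):

import { searchLevenshtein } from "@/scripts/FuzzySearch";

// Illustrative timing wrapper, assuming records shaped like { title: "..." }
function timeSearch(records, query) {
  const start = performance.now(); // high-resolution timestamp in milliseconds
  const results = searchLevenshtein(records, query);
  const elapsedMs = performance.now() - start;
  return { results, elapsedMs };
}

// Hypothetical usage against the build-time dataset:
// const { results, elapsedMs } = timeSearch(openLibraryRecords, "middle english");
// console.log(`${results.length} matches in ${elapsedMs.toFixed(2)} ms`);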
Search Timings