Fuzzy Searches
Exploring five fuzzy search algorithms: Contains, Hamming, Levenshtein, Jaro-Winkler, and Damerau-Levenshtein.
Primer
Fuzzy search helps us spot "close enough" matches when perfect spelling or phrasing would miss the mark. Below you will find five friendly algorithms applied to a basket of fruits. Each one balances speed, memory use, and accuracy a little differently. You can skim the summaries, peek at the code, and then try them yourself with the live dataset further down the page.
All of the algorithms share one extra trick: tokenization. We break every string into bite-sized words (tokens) and compare your query against those pieces. That keeps a short query like "berry" from being swamped by a long title such as "A Middle English Vocabulary", and it lets shorter words compete fairly inside longer phrases.
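As a minimal sketch of that idea (the function names and splitting rules here are illustrative, not the page's actual implementation), tokenization and per-token scoring might look like this:

function tokenize(text) {
  // Lowercase and break on anything that isn't a letter or digit
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Score the query against every token and keep the best (lowest) distance,
// so a short word isn't penalized for the length of the whole phrase
function bestTokenScore(text, query, distanceFn) {
  const scores = tokenize(text).map((token) => distanceFn(query.toLowerCase(), token));
  return scores.length ? Math.min(...scores) : Infinity;
}

// Example: bestTokenScore("A Middle English Vocabulary", "berry", levenshteinDistance)
// compares "berry" against "a", "middle", "english", and "vocabulary" individually.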
Contains Search
The simplest approach is the "contains" search. We lowercase everything and see if the letters you typed appear anywhere inside the text. It's lightning fast because it only scans for exact substrings—no math, no heavy lifting.
- Speedy O(n) scan with very small memory use.
- Perfect for quick filters or auto-complete boxes.
- Misses typos such as "stawberry" and transposed letters.
- Finds matches even when the letters sit inside a different word (false positives).
Fruit Query: "berry"
- Elderberry
- Raspberry
Time & Space
| Time | O(n) |
|---|---|
| Memory | O(1) |
How it works: Think of it like using Ctrl+F in a document. The algorithm simply checks if your search term appears anywhere in the text. It converts everything to lowercase first so "Berry" matches "berry", then scans through character by character looking for an exact match. No fancy math—just a straightforward search.
function searchContains(text, query) {
const normalized = text.toLowerCase();
const normalizedQuery = query.toLowerCase();
return normalized.includes(normalizedQuery);
}
// Example: searchContains("Elderberry", "berry") → true

Hamming Distance
Hamming distance counts how many characters differ when two strings sit side by side. We slide your query across the text and keep the best score. It only compares equal-length strings, so we test segments exactly as long as your query.
- Simple math: just counting differences.
- Great at spotting small typos when words are the same length.
- Struggles when the text shrinks or grows ("berry" vs. "berries").
- Even with a sliding window, it can't account for insertions or deletions, so it struggles across longer sentences.
Fruit Query: "berry"
- Cherry Hamming score: 1.00
- Elderberry Hamming score: 0.00
- Grapefruit Hamming score: 3.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(1) |
How it works: Imagine comparing two words letter by letter, side by side. The algorithm slides your search term across the text and counts how many characters don't match at each position. It's like playing "spot the difference" but only when both words are the same length. The position with the fewest mismatches wins. This works great for catching typos like "berry" vs "beryy" but struggles when words have different lengths.
function hammingDistance(needle, haystack) {
if (needle.length !== haystack.length) {
return Infinity;
}
let distance = 0;
for (let i = 0; i < needle.length; i++) {
if (needle[i] !== haystack[i]) {
distance++;
}
}
return distance;
}
// Sliding window to find best match
function slidingHamming(needle, haystack) {
let bestDistance = Infinity;
for (let offset = 0; offset <= haystack.length - needle.length; offset++) {
const segment = haystack.slice(offset, offset + needle.length);
const distance = hammingDistance(needle, segment);
if (distance < bestDistance) {
bestDistance = distance;
if (bestDistance === 0) break;
}
}
return bestDistance;
}
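// Example: slidingHamming("berry", "cherry") → 1 (best window: "herry")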
Levenshtein Distance

Levenshtein distance measures the minimum number of edits—insertions, deletions, or substitutions—to turn one string into another. Here we run a cache-friendly two-row version of the Wagner–Fischer algorithm, which keeps the same O(n · m) work while dropping memory to O(min(n, m)) and bailing out early whenever the score would exceed our threshold.
- Handles typos, missing letters, and swapped order all at once.
- Gives a meaningful score: smaller numbers mean closer matches.
- Still needs O(n · m) work despite the lighter memory footprint.
- Slower on long paragraphs; you may need caching or pruning.
Fruit Query: "berry"
- Cherry Levenshtein score: 2.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(min(n, m)) |
Spell Autocorrect (Levenshtein)
| Misspelled | Suggestion | Edit Distance |
|---|---|---|
| strawbery | No close match | — |
| rasberry | Raspberry | 1.00 |
| kiwwi | Kiwi | 1.00 |
How it works: This algorithm figures out the minimum number of single-character edits (add a letter, remove a letter, or change a letter) needed to transform one word into another. It builds up a table row by row, calculating the cost of each possible transformation. The clever part is that it only keeps two rows in memory at a time—the previous row and the current row—instead of storing the entire table. This dramatically reduces memory usage while still finding the optimal edit distance. Think of it like solving a puzzle by working through it step by step, but only keeping track of the last two steps instead of every step you've taken.
// Memory-optimized version: O(min(n,m)) space instead of O(n*m)
function levenshteinDistance(source, target, maxDistance = Infinity) {
// Early exit if length difference exceeds threshold
if (Math.abs(source.length - target.length) > maxDistance) {
return Infinity;
}
// KEY OPTIMIZATION: Use shorter string for rows to minimize memory
// This reduces space from O(n*m) to O(min(n,m))
if (source.length > target.length) {
[source, target] = [target, source];
}
// KEY OPTIMIZATION: Only store 2 rows instead of full matrix
// Naive approach: const matrix = Array(n+1).fill(Array(m+1)) // O(n*m) space
// Optimized: only need previous and current row // O(min(n,m)) space
const previousRow = new Array(source.length + 1).fill(0);
const currentRow = new Array(source.length + 1).fill(0);
// Initialize first row: distance from empty string
for (let i = 0; i <= source.length; i++) {
previousRow[i] = i;
}
// Fill matrix row by row, reusing the two-row buffer
for (let row = 1; row <= target.length; row++) {
currentRow[0] = row;
let rowMin = currentRow[0];
for (let col = 1; col <= source.length; col++) {
const cost = source[col - 1] === target[row - 1] ? 0 : 1;
// Standard DP recurrence: min of 3 operations
currentRow[col] = Math.min(
previousRow[col] + 1, // deletion
currentRow[col - 1] + 1, // insertion
previousRow[col - 1] + cost // substitution
);
rowMin = Math.min(rowMin, currentRow[col]);
}
// Early termination if row minimum exceeds threshold
if (rowMin > maxDistance) {
return Infinity;
}
// Swap rows: current becomes previous for next iteration
// This reuses memory instead of allocating new rows
for (let i = 0; i <= source.length; i++) {
previousRow[i] = currentRow[i];
}
}
return previousRow[source.length];
}
// Example: levenshteinDistance("kitten", "sitting") → 3
// Memory: O(6) instead of O(42) for the full matrix

The spell-autocorrect suggestions above come from running each misspelling through the same helper:

import { searchLevenshtein } from "@/scripts/FuzzySearch";
const fruits = ["Apple", "Banana", "Cherry", "Dragonfruit", "Elderberry", "Grapefruit", "Kiwi", "Raspberry"];
const misspellings = ["strawbery", "rasberry", "kiwwi"];
const suggestions = misspellings.map((word) => {
const [best] = searchLevenshtein(fruits.map((title) => ({ title })), word);
return { word, suggestion: best?.title, edits: best?.score };
});
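// suggestions → e.g. { word: "rasberry", suggestion: "Raspberry", edits: 1 }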
Jaro-Winkler Similarity

Jaro-Winkler measures how many characters line up in the same order and rewards matching prefixes. It shines on short strings such as author names because swapped characters hurt less than in strict edit distance, and identical prefixes (like shared surnames) boost the score.
- Emphasizes leading characters, which people notice first.
- Great for matching people or product names with swapped letters.
- Scores are similarities between 0 and 1; subtract from 1 if you want a distance-style number.
- Not ideal for long paragraphs where edit distance is clearer.
Fruit Query: "berry"
- Cherry Similarity: 82.2% similar
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(n + m) |
How it works: This algorithm is like a more forgiving version of edit distance. First, it finds matching characters between the two strings, even if they're slightly out of order. It then calculates a similarity score based on how many characters match and how many are transposed (swapped). The "Winkler" part adds a bonus if the words start with the same letters, because we humans pay more attention to the beginning of words. So "berry" and "yberr" would get a good score, but "berry" and "beryy" would score even higher because they share the prefix "ber". It returns a number between 0 and 1, where 1 means identical.
function jaroWinklerSimilarity(source, target, prefixScale = 0.1) {
if (!source.length || !target.length) return 0;
if (source === target) return 1;
// Match window: characters within this distance can match
const matchDistance = Math.floor(Math.max(source.length, target.length) / 2) - 1;
const sourceMatches = new Array(source.length).fill(false);
const targetMatches = new Array(target.length).fill(false);
// Find matching characters
let matches = 0;
for (let i = 0; i < source.length; i++) {
const start = Math.max(0, i - matchDistance);
const end = Math.min(i + matchDistance + 1, target.length);
for (let j = start; j < end; j++) {
if (targetMatches[j] || source[i] !== target[j]) continue;
sourceMatches[i] = true;
targetMatches[j] = true;
matches++;
break;
}
}
if (!matches) return 0;
// Count transpositions (characters that match but are out of order)
let transpositions = 0;
for (let i = 0, j = 0; i < source.length; i++) {
if (!sourceMatches[i]) continue;
while (!targetMatches[j]) j++;
if (source[i] !== target[j]) transpositions++;
j++;
}
transpositions /= 2;
// Calculate Jaro similarity
const jaro = (
matches / source.length +
matches / target.length +
(matches - transpositions) / matches
) / 3;
// Winkler modification: boost score for common prefix
let prefixLength = 0;
const maxPrefix = Math.min(4, Math.min(source.length, target.length));
for (let i = 0; i < maxPrefix && source[i] === target[i]; i++) {
prefixLength++;
}
return jaro + prefixLength * prefixScale * (1 - jaro);
}
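// Example: jaroWinklerSimilarity("berry", "cherry") → ≈0.822 (82.2% similar)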
Damerau-Levenshtein

Damerau-Levenshtein extends edit distance by counting swaps of neighboring characters as a single edit. That single upgrade captures common adjacent-swap typos such as "brery" or "strwaberry" without over-penalizing the result, which makes it a popular alternative to the classic algorithm.
- Understands adjacent swaps, the most common human typo.
- Drop-in replacement wherever you already use Levenshtein.
- Slightly higher constant factors than basic Levenshtein.
- Still quadratic for long strings unless you add heuristics.
Fruit Query: "berry"
- Cherry Damerau score: 2.00
Time & Space
| Time | O(n · m) |
|---|---|
| Memory | O(n · m) |
How it works: This is Levenshtein distance with one important upgrade: it recognizes a swap of two adjacent letters as a single mistake. So if you type "teh" instead of "the", regular Levenshtein counts that as two edits (two substitutions), but Damerau-Levenshtein counts it as just one edit (a transposition of the 'e' and 'h'). The algorithm builds a full table like Levenshtein, but adds an extra check: if it sees that two characters are swapped, it can fix both at once. This makes it much better at catching the most common type of typo: accidentally swapping two keys on your keyboard.
function damerauLevenshteinDistance(source, target, maxDistance = Infinity) {
if (Math.abs(source.length - target.length) > maxDistance) {
return Infinity;
}
const rows = source.length + 1;
const cols = target.length + 1;
const matrix = Array.from({ length: rows }, () =>
new Array(cols).fill(0)
);
// Initialize first row and column
for (let i = 0; i < rows; i++) matrix[i][0] = i;
for (let j = 0; j < cols; j++) matrix[0][j] = j;
// Fill matrix
for (let i = 1; i < rows; i++) {
let rowMin = Infinity;
for (let j = 1; j < cols; j++) {
const cost = source[i - 1] === target[j - 1] ? 0 : 1;
// Standard Levenshtein operations
matrix[i][j] = Math.min(
matrix[i - 1][j] + 1, // deletion
matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j - 1] + cost // substitution
);
// Damerau addition: transposition (swap adjacent chars)
if (i > 1 && j > 1 &&
source[i - 1] === target[j - 2] &&
source[i - 2] === target[j - 1]) {
matrix[i][j] = Math.min(
matrix[i][j],
matrix[i - 2][j - 2] + cost
);
}
rowMin = Math.min(rowMin, matrix[i][j]);
}
// Early termination
if (rowMin > maxDistance) return Infinity;
}
return matrix[rows - 1][cols - 1];
}
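// Example: damerauLevenshteinDistance("teh", "the") → 1 (one transposition)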
Search Benchmark

This quick benchmark uses a styled input to query the Open Library dataset generated at build time. Type to match against titles, subtitles, authors, or publication years and watch how long each search takes.
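As a rough sketch of how each timing could be captured (the timeSearch helper and the openLibraryRecords variable are illustrative assumptions; only searchLevenshtein comes from the snippet earlier on this page):

import { searchLevenshtein } from "@/scripts/FuzzySearch";

// Illustrative timing wrapper, assuming records shaped like { title: "..." }
function timeSearch(records, query) {
  const start = performance.now(); // high-resolution timestamp in milliseconds
  const results = searchLevenshtein(records, query);
  const elapsedMs = performance.now() - start;
  return { results, elapsedMs };
}

// Hypothetical usage against the build-time dataset:
// const { results, elapsedMs } = timeSearch(openLibraryRecords, "middle english");
// console.log(`${results.length} matches in ${elapsedMs.toFixed(2)} ms`);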
Search Timings