The Levenshtein distance, also known as the edit distance, measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. In the case of the words "doctor" and "bottle," here's how you can calculate their Levenshtein distance:
Start with an empty grid where the rows represent the characters of the first word ("doctor") and the columns represent the characters of the second word ("bottle").
Initialize the grid with the positions of the characters:
- d o c t o r
- 0 1 2 3 4 5 6
b 1
o 2
t 3
t 4
l 5
e 6
Begin filling in the grid by comparing characters from the two words. For each cell (i, j) in the grid, calculate the Levenshtein distance as follows:
- If the characters in the current positions (i, j) of the two words are the same, no edit (cost = 0) is needed. In this case, the value in cell (i, j) is the minimum of the values in the cells diagonally above and to the left:
grid[i-1][j-1]
.
- If the characters are different, calculate the cost of various edit operations (insertion, deletion, or substitution) and add 1 to the minimum cost of the three adjacent cells (above, left, and diagonally above-left).
Continue filling in the grid row by row until you reach the bottom-right corner. The value in the bottom-right corner will be the Levenshtein distance between the two words.
Here's the filled-in grid for "doctor" and "bottle":
- d o c t o r
- 0 1 2 3 4 5 6
b 1 1 2 3 4 5 6
o 2 2 1 2 3 4 5
t 3 3 2 2 2 3 4
t 4 4 3 2 3 3 4
l 5 5 4 3 3 4 4
e 6 6 5 4 4 4 5
The Levenshtein distance between "doctor" and "bottle" is 5, as it takes 5 single-character edits to transform one into the other.