subdist
subdist is a fast Python module for finding fuzzy substring matches. It uses a modified version of the Levenshtein distance algorithm (described here).
Version History
- 0.1.1
  - Initial release
- 0.2
  - Added the get_score function, which returns a score between 0.0 and 1.0 based on the edit distance
- 0.2.1
  - Improved docstrings, refactored the code, and slightly sped up the distance function
Usage
Get the Levenshtein (edit) distance between a needle and its closest-matching substring of a haystack
import subdist
needle = u"short string"
haystack = u"This is a long string"
distance = subdist.substring(needle, haystack)
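To make the idea of a substring edit distance concrete, here is a minimal pure-Python sketch of one common way to modify Levenshtein distance for substring search: the first row of the table is filled with zeros so a match can start anywhere in the haystack, and the result is the minimum of the last row so it can end anywhere. The function name substring_distance is hypothetical, and this is an illustration of the general technique, not subdist's actual implementation, which is a C extension and may differ in its details.

```python
def substring_distance(needle, haystack):
    """Smallest edit distance between needle and any substring of haystack.

    Illustrative sketch only; not the code used by subdist.
    """
    # Row 0 is all zeros: a match may begin at any haystack position for free.
    previous = [0] * (len(haystack) + 1)
    for i, n_char in enumerate(needle, start=1):
        # Column 0: aligning i needle characters with an empty substring
        # costs i deletions.
        current = [i]
        for j, h_char in enumerate(haystack, start=1):
            cost = 0 if n_char == h_char else 1
            current.append(min(
                previous[j] + 1,         # skip a needle character
                current[j - 1] + 1,      # absorb an extra haystack character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    # The match may end anywhere, so take the best value in the last row.
    return min(previous)
```

With this sketch, a needle that occurs verbatim in the haystack has distance 0, and the distance never exceeds the length of the needle, since the needle can always be aligned with an empty substring.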
Get the fuzzy match score (0.0 to 1.0) of a substring
import subdist
needle = u"short string"
haystack = u"This is a long string"
score = subdist.get_score(needle, haystack)
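The exact formula behind get_score is not given here, only that the result lies between 0.0 and 1.0 and is based on the edit distance. As a rough sketch of how such a score could be derived, the following assumes the distance is normalized by the needle length; both the function name score_from_distance and the normalization are assumptions for illustration and may not match what subdist does.

```python
def score_from_distance(distance, needle_length):
    """Map an edit distance to a score in [0.0, 1.0].

    Assumed normalization (not taken from subdist):
    score = 1 - distance / needle_length.
    """
    if needle_length == 0:
        # An empty needle trivially matches anything.
        return 1.0
    # float() keeps the division exact on Python 2 as well as Python 3.
    score = 1.0 - float(distance) / needle_length
    # Clamp in case a caller passes a distance larger than the needle length.
    return max(0.0, min(1.0, score))
```

Under this assumption, a distance of 0 gives a score of 1.0 (exact substring match), and a distance equal to the needle length gives 0.0 (nothing in common).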
Quick Links
- installer package (subdist-0.2.1.tar.gz)
- Win32 binary installer (Python 2.5) (subdist-0.2.1.win32-py2.5.exe)
- Win32 binary installer (Python 2.4) (subdist-0.2.1.win32-py2.4.exe)
Here is a blog article describing the algorithm. Here is an article describing the C extension.