FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
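
Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the textbook dynamic-programming version, shown only to illustrate the idea; the function name `levenshtein` is hypothetical and this is not FuzzyWuzzy's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```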

Requirements

  • Python 2.7 or higher

  • difflib

  • python-Levenshtein (optional; provides a 4-10x speedup in string matching, though it may produce different results in certain cases)

  • For testing:

      • pycodestyle

      • hypothesis

      • pytest

Installation

    Using PIP via PyPI

    pip install fuzzywuzzy

    or the following to install python-Levenshtein too

    pip install fuzzywuzzy[speedup]

    Using PIP via GitHub

    pip install git+git://github.com/seatgeek/[email protected]#egg=fuzzywuzzy

    Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

    git+ssh://[email protected]/seatgeek/[email protected]#egg=fuzzywuzzy

    Manually via GIT

    git clone https://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
    cd fuzzywuzzy
    python setup.py install

    Usage

    >>> from fuzzywuzzy import fuzz
    >>> from fuzzywuzzy import process

    Simple Ratio

    >>> fuzz.ratio("this is a test", "this is a test!")
        97
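
For intuition about what the simple ratio measures, here is a minimal sketch using only the standard library's difflib, which is listed in the requirements. It scores two strings as 2*M/T, where M is the number of matching characters and T is the combined length, scaled to 0-100. The name `simple_ratio` is hypothetical, and results may differ when python-Levenshtein is installed.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Score two strings from 0 to 100 using difflib's 2*M/T similarity."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


# 14 matching characters out of 14 + 15 total: 2 * 14 / 29 = 0.9655...
print(simple_ratio("this is a test", "this is a test!"))  # 97
```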

    Partial Ratio

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
        100
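
A partial ratio scores the shorter string against the best-matching substring of the longer one, so a trailing "!" no longer costs anything. The sketch below is a simplified sliding-window version built on difflib. The name `partial_ratio_sketch` and the choice to return 0 for an empty string are assumptions of this sketch, and the library may pick its candidate substrings differently.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best 0-100 score of the shorter string against any same-length window of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0  # sketch choice: an empty string matches nothing
    window = len(shorter)
    best = 0.0
    # Slide a window of len(shorter) across the longer string and keep the best score.
    for start in range(len(longer) - window + 1):
        chunk = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, chunk).ratio())
    return int(round(100 * best))


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```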

    Token Sort Ratio

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
        91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
        100
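
Token sorting makes the comparison insensitive to word order: each string is split into tokens, the tokens are sorted alphabetically, and the rejoined strings are compared with the simple ratio. This is a minimal difflib-based sketch of that idea. The helper names `tokens` and `token_sort_ratio_sketch` are hypothetical, and the normalization (lowercasing, treating non-alphanumerics as separators) is an assumption of this sketch.

```python
import re
from difflib import SequenceMatcher


def tokens(s: str) -> list:
    """Lowercase the string and split it on runs of non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).split()


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after sorting their tokens alphabetically."""
    a = " ".join(sorted(tokens(s1)))
    b = " ".join(sorted(tokens(s2)))
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Both inputs sort to "a bear fuzzy was wuzzy", so they compare as identical.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```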

    Token Set Ratio

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
        100
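
Token set comparison goes one step further and ignores repeated words. One common formulation, sketched here with difflib, splits the tokens into the shared set and the two remainders, then takes the best score among three pairings. The names `tokens`, `ratio`, and `token_set_ratio_sketch` are hypothetical; treat this as an illustration under those assumptions, not as the library's exact code.

```python
import re
from difflib import SequenceMatcher


def tokens(s: str) -> list:
    """Lowercase the string and split it on runs of non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).split()


def ratio(a: str, b: str) -> int:
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Compare the shared tokens against each string's shared-plus-unique tokens."""
    t1, t2 = set(tokens(s1)), set(tokens(s2))
    common = " ".join(sorted(t1 & t2))
    combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        ratio(common, combined1),
        ratio(common, combined2),
        ratio(combined1, combined2),
    )


# The duplicated "fuzzy" disappears once the tokens are treated as a set.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```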

    Process

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
        [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
        ("Dallas Cowboys", 90)

    You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths:

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
        ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
        ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
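
To show why the scorer matters, here is a small standard-library sketch of the "pick the best choice with a pluggable scorer" pattern. The functions `ratio`, `token_sort`, and `extract_one` are hypothetical stand-ins written for this illustration; the library's own default scorer and preprocessing are more elaborate, so scores will not match it in general.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Plain 0-100 similarity based on difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort(a: str, b: str) -> int:
    """Order-insensitive variant: sort whitespace-separated tokens before comparing."""
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def extract_one(query, choices, scorer=ratio):
    """Return the (choice, score) pair with the highest score, or None if choices is empty."""
    scored = [(choice, scorer(query.lower(), choice.lower())) for choice in choices]
    return max(scored, key=lambda pair: pair[1], default=None)


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_one("new york jets", choices))                     # ('New York Jets', 100)
print(extract_one("jets new york", choices, scorer=token_sort))  # ('New York Jets', 100)
```

Swapping the scorer changes which kinds of difference are forgiven: the plain ratio penalizes reordered words, while the token-sort variant does not.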

    Known Ports

    FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

  • Java: xpresso’s fuzzywuzzy implementation

  • Java: fuzzywuzzy (java port)

  • Rust: fuzzyrusty (Rust port)

  • JavaScript: fuzzball.js (JavaScript port)

  • C++: Tmplt/fuzzywuzzy

  • C#: fuzzysharp (.Net port)

  • Go: go-fuzzywuzzy (Go port)

  • Free Pascal: FuzzyWuzzy.pas (Free Pascal port)

  • Kotlin multiplatform: FuzzyWuzzy-Kotlin

  • R: fuzzywuzzyR (R port)
