Below are two autocomplete features for the same list of countries. The first uses 'strict' matching: it only returns results that exactly match the sequence of letters entered. The second uses 'fuzzy' searching, which can account for human error such as incorrect spelling, correct letters in the wrong order, and words spelt phonetically. It can also draw on additional information to help users find the result they are looking for. In this case, it allows users to search by ISO country code as well as by the country name.
Strict search
Search examples that will return Croatia as the top result:
- Cro (options appear after typing 3 letters)
- Croatia
Created using jQuery UI Autocomplete, one of the most widely used autocomplete components on the web.
This is easy to implement, but it can only return results that are spelt correctly, so it has no flexibility to return results that are merely close to what the user typed.
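To make the limitation concrete, here is a minimal sketch of strict matching in plain JavaScript. It is not jQuery UI's own code: the function name `strictSearch`, the four-item country list and the `minLength` default of 3 are assumptions chosen to mirror the demo above.

```javascript
// Hypothetical sample data; the real demo uses a full list of countries.
const countries = ["Colombia", "Croatia", "Cuba", "Cyprus"];

// Strict matching: keep only the options that contain the typed text
// exactly (ignoring case). Nothing is returned until at least
// `minLength` characters have been typed.
function strictSearch(term, options, minLength = 3) {
  if (term.length < minLength) return [];
  const needle = term.toLowerCase();
  return options.filter((option) => option.toLowerCase().includes(needle));
}

console.log(strictSearch("Cro", countries));     // [ 'Croatia' ]
console.log(strictSearch("Croasha", countries)); // []
```

The second call shows the problem: one phonetic misspelling and the user gets no suggestions at all.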
'Fuzzy' search
Search examples that will return Croatia as the top result:
- Cro (options appear after typing 3 letters)
- Croatia
- Croasha
- HR (the country code for Croatia)
- crosha
- Crowaysha
- croaysha
And many more I likely haven't thought of...
Why is this important?
Autocomplete is the feature on many search boxes that guesses what you've started to type, in an effort to save users time. However, by default these components aren't great at accommodating human error, such as spelling mistakes or the correct letters in the wrong order. They usually demand precision, which can offset the time saved by implementing them and even make the process longer.
Fuzzy searching
The nature of fuzzy searching is that it can find results in a list in alternative ways, rather than relying solely on an exact match with what the user has typed. This includes misspelt words, the correct letters in the wrong order, or an alternative spelling of a word (e.g. something that sounds phonetically like the word).
Not only that, but fuzzy searching can take in associated or related information for each search result, rather than just the word itself. For example, if a user is searching for a book, they may want to search by author or ISBN, not just by title.
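As a rough illustration of searching on associated information, the sketch below stores a country code next to each name and checks both fields. The record shape, the three sample records and the function name `searchByFields` are assumptions for this example only; it uses a simple prefix match, so it shows the multi-field idea without any typo tolerance.

```javascript
// Hypothetical records: each result carries extra searchable information.
const countries = [
  { name: "Croatia", code: "HR" },
  { name: "Honduras", code: "HN" },
  { name: "Hungary", code: "HU" },
];

// Return every record where at least one of the given fields
// starts with the typed text (ignoring case).
function searchByFields(term, records, fields = ["name", "code"]) {
  const needle = term.trim().toLowerCase();
  if (needle === "") return [];
  return records.filter((record) =>
    fields.some((field) => record[field].toLowerCase().startsWith(needle))
  );
}

console.log(searchByFields("HR", countries).map((c) => c.name));  // [ 'Croatia' ]
console.log(searchByFields("cro", countries).map((c) => c.name)); // [ 'Croatia' ]
```

The same pattern would apply to the book example: add `author` and `isbn` fields to each record and include them in `fields`.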
Types of fuzzy searching
I didn't create the logic for this "fuzzy" searching; I'm merely showing ways you can provide better searching for users, with and without access needs, along with examples so that you can easily recreate the behaviour and implement it in your own sites.
Library examples
The Jaro-Winkler algorithm measures how many characters the text the user has entered has in common with each of the possible autocomplete options. This combats human error where text is entered quickly but in the wrong order, and it also covers a user misspelling a word by sounding it out while still getting a fair number of the letters correct. For example, Croatia is pronounced 'cro-ay-sha'; typing 'croaysha' is the wrong spelling, but it still has 5 of the 7 letters in common with the correct spelling.
This is the approach used in the fuzzy example above. The code was created by Xavi and, for the purpose of this example, has been compiled so that it doesn't require webpack or npm to run. The full source code and installation instructions for missplete can be found in Xavi's GitHub repository if you're interested.
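The character-overlap scoring described above is commonly implemented as Jaro-Winkler similarity. The following is a minimal sketch of the textbook formula, not the code from Xavi's library: the function names, the sample list and the `prefixScale` default of 0.1 are assumptions for illustration. A score of 1 means identical strings and 0 means nothing in common.

```javascript
// Jaro similarity: counts characters that match within a sliding window,
// then penalises matched characters that appear in a different order.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(b.length - 1, i + range);
    for (let j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both strings' matched characters in order and count mismatches.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length +
          (matches - transpositions) / matches) / 3;
}

// Winkler's adjustment: boost the score when the first (up to 4) characters agree.
function jaroWinkler(a, b, prefixScale = 0.1) {
  const score = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return score + prefix * prefixScale * (1 - score);
}

// Sort the options so the most similar one comes first (case-insensitive).
function rankBySimilarity(term, options) {
  const needle = term.toLowerCase();
  return [...options].sort(
    (x, y) => jaroWinkler(needle, y.toLowerCase()) - jaroWinkler(needle, x.toLowerCase())
  );
}

// Hypothetical sample list; the demo uses a full list of countries.
const countries = ["Colombia", "Croatia", "Cuba", "Cyprus"];
console.log(rankBySimilarity("croaysha", countries)[0]); // Croatia
```

Because the shared prefix "croa" earns a bonus, phonetic misspellings that start correctly tend to rank well, which suits autocomplete where users usually get the first few letters right.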
Library examples
The Levenshtein distance (also known as edit distance) is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into another. In this case, it compares what the user has entered against the list of possible options, and determines which option it could reach with the fewest edits.
The most widely used implementation of this for autocomplete is Fuse.js, created by Kiro Risk, a software engineer at LinkedIn.
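The edit distance described above can be sketched in a few lines of plain JavaScript. This is the standard dynamic-programming formulation, not the code inside Fuse.js; the function names `levenshtein` and `closestMatch` and the sample list are assumptions for illustration.

```javascript
// Levenshtein distance: minimum number of insertions, deletions or
// substitutions needed to turn string `a` into string `b`.
function levenshtein(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,                   // delete a character from a
        curr[j - 1] + 1,               // insert a character into a
        prev[j - 1] + substitutionCost // substitute (or keep) a character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Pick the option that needs the fewest edits to reach (case-insensitive).
function closestMatch(term, options) {
  const needle = term.toLowerCase();
  return options.reduce((best, option) =>
    levenshtein(needle, option.toLowerCase()) < levenshtein(needle, best.toLowerCase())
      ? option
      : best
  );
}

// Hypothetical sample list; the demo uses a full list of countries.
const countries = ["Colombia", "Croatia", "Cuba", "Cyprus"];
console.log(levenshtein("croasha", "croatia")); // 2
console.log(closestMatch("Croasha", countries)); // Croatia
```

Here 'croasha' is two substitutions away from 'croatia' (s to t, h to i), which is fewer edits than any other option in the list needs.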
Which should I use?
Both provide a significant improvement on standard "strict" searching by allowing for human error, and by allowing the search to associate multiple pieces of information with each result to make it easier to find. In research conducted on both, Jaro-Winkler was considered the faster approach of the two, but the Fuse library is more heavily supported as a project. In my opinion there is very little between the two, so you can simply choose whichever library fits most neatly with your needs.