THE USE OF SIMILARITY MEASURES ON WORD LIST

ADEDIRAN SAMUEL OLUWOLE
Computer Science, Federal University of Agriculture, Abeokuta
January, 2015

Abstract

This project addresses the general problem of searching for a particular word (the pattern)
among the many words held in an established database. An application was developed that accepts
a search word and returns the words most similar to it, using three algorithms: Knuth-Morris-Pratt
(KMP), Boyer-Moore and Jaro-Winkler. Each algorithm works in its own distinct way to produce
a result for the pattern being sought. All three were implemented and given the same input
data: word lists, that is, groups of words on which text and pattern matching can be
demonstrated. On these word lists, KMP was the best at determining the nearest pattern match:
it gave the highest similarity value, followed by Boyer-Moore and Jaro-Winkler. The scenario
resembles looking up a particular word in an ordinary dictionary, and it is also similar to the
way well-known search engines such as Google work.
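
As an illustration of the kind of matching described above, the following is a minimal Python sketch, not the project's actual implementation. It shows KMP as an exact substring search and Jaro-Winkler as a similarity score between 0 and 1, then ranks a word list against a query. The function names (kmp_find, jaro, jaro_winkler, rank_words), the prefix scaling factor of 0.1 and the sample words are all assumptions made for this example. How the project derives a similarity value from KMP or Boyer-Moore is not stated in the abstract, so that step is not reproduced here.

```python
def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, reusing the failure table on each mismatch.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def rank_words(query, words):
    """Return (word, score) pairs sorted from most to least similar."""
    scored = [(w, jaro_winkler(query, w)) for w in words]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    word_list = ["colour", "cooler", "dollar"]  # hypothetical sample data
    for word, score in rank_words("color", word_list):
        print(f"{word}: {score:.4f}")
    print(kmp_find("abracadabra", "cad"))
```

The two functions answer different questions: kmp_find reports whether and where the pattern occurs exactly, while jaro_winkler gives a graded score that can rank near matches such as misspellings.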