algo.phonet_suggest: phonetical suggestions
-
phonet_suggest(misspelling, *, dictionary_words, table)
Phonetical suggestion algorithm: provides suggestions based on phonetical (pronunciation) similarity. It requires the .aff file to define a PHONE table, which, we should add, is extremely rare in known dictionaries.
Internally, it:
* selects words from the dictionary similarly to ngram_suggest (and even reuses its root_score),
* scores their phonetic representations (calculated with metaphone()) against the phonetic representation of the misspelling,
* then chooses the most similar ones with final_score() (ngram-based comparison).
Note that, as both this method and ngram_suggest iterate through the whole dictionary, Hunspell optimizes the suggestion search by doing it all in one module/one loop. Spylls splits them for clarity.
- Parameters
misspelling (str) – Misspelled word
dictionary_words (List[spylls.hunspell.data.dic.Word]) – All words from dictionary (only stems are used)
table (spylls.hunspell.data.aff.PhonetTable) – Table used to produce metaphone codes
- Return type
Iterator[str]
def phonet_suggest(misspelling: str, *, dictionary_words: List[dic.Word], table: aff.PhonetTable) -> Iterator[str]:
    """
    Phonetical suggestion algorithm provides suggestions based on phonetical (pronunciation) similarity.
    It requires .aff file to define :attr:`PHONE <spylls.hunspell.data.aff.Aff.PHONE>` table -- which,
    we should add, is *extremely* rare in known dictionaries.

    Internally:

    * selects words from dictionary similarly to
      :meth:`ngram_suggest <spylls.hunspell.algo.ngram_suggest.ngram_suggest>` (and even reuses its
      :meth:`root_score <spylls.hunspell.algo.ngram_suggest.root_score>`)
    * and scores their phonetic representations (calculated with :meth:`metaphone`) with phonetic
      representation of misspelling
    * then chooses the most similar ones with :meth:`final_score` (ngram-based comparison)

    Note that, as both this method and
    :meth:`ngram_suggest <spylls.hunspell.algo.ngram_suggest.ngram_suggest>` iterate through the whole
    dictionary, Hunspell optimizes suggestion search by doing it all in one module/one loop.
    Spylls splits them for clarity.

    Args:
        misspelling: Misspelled word
        dictionary_words: All words from dictionary (only stems are used)
        table: Table for metaphone producing
    """

    misspelling = misspelling.lower()
    misspelling_ph = metaphone(table, misspelling)

    scores: List[Tuple[float, str]] = []

    # First, select words from dictionary whose stems are alike the misspelling we are trying to suggest.
    #
    # This cycle is exactly the same as the first cycle in ngram_suggest. In fact, in original Hunspell
    # both ngram and phonetical suggestion are done in one pass inside ngram_suggest, which is
    # more effective (one iteration through whole dictionary instead of two) but much harder to
    # understand and debug.
    #
    # Considering extreme rarity of metaphone-enabled dictionaries, and "educational" goal of
    # spylls, we split it out.
    for word in dictionary_words:
        if abs(len(word.stem) - len(misspelling)) > 3:
            continue

        # First, we calculate "regular" similarity score, just like in ngram_suggest
        nscore = ng.root_score(misspelling, word.stem)

        if word.alt_spellings:
            for variant in word.alt_spellings:
                nscore = max(nscore, ng.root_score(misspelling, variant))

        if nscore <= 2:
            continue

        # ...and if it shows words are somewhat close, we calculate metaphone score
        score = 2 * sm.ngram(3, misspelling_ph, metaphone(table, word.stem), longer_worse=True)

        if len(scores) > MAX_ROOTS:
            heapq.heappushpop(scores, (score, word.stem))
        else:
            heapq.heappush(scores, (score, word.stem))

    guesses = heapq.nlargest(MAX_ROOTS, scores)

    # Finally, we sort suggestions by simplistic string similarity metric (of the misspelling and
    # dictionary word's stem)
    guesses2 = [(score + final_score(misspelling, word.lower()), word) for (score, word) in guesses]

    # (NB: actually, we might not need ``key`` here, but it is
    # added for sorting stability; doesn't change the objective quality of suggestions, but passes
    # hunspell test ``phone.sug``!)
    guesses2 = sorted(guesses2, key=itemgetter(0), reverse=True)

    for (_, sug) in guesses2:
        yield sug
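The candidate-selection loop above keeps only the best-scoring stems by combining a min-heap with heapq.nlargest. The following is a minimal, self-contained sketch of that pattern, not Spylls code: the names top_scored and limit are hypothetical, and the sketch caps the heap at exactly limit entries, whereas the listing lets it grow by one and relies on nlargest to trim it. Both variants should select the same items.

```python
import heapq
from typing import Iterable, List, Tuple


def top_scored(candidates: Iterable[Tuple[float, str]], limit: int) -> List[Tuple[float, str]]:
    """Return the `limit` highest-scored (score, word) pairs, best first.

    Hypothetical helper illustrating the bounded-heap pattern; not part of Spylls.
    """
    heap: List[Tuple[float, str]] = []
    for item in candidates:
        if len(heap) >= limit:
            # Heap is full: push the new item and drop the current minimum.
            heapq.heappushpop(heap, item)
        else:
            heapq.heappush(heap, item)
    # The heap is ordered smallest-first; nlargest returns it best-first.
    return heapq.nlargest(limit, heap)


best = top_scored([(1.0, "a"), (5.0, "b"), (3.0, "c"), (4.0, "d")], limit=2)
print(best)
```

Because the heap never holds more than limit items, memory stays bounded no matter how many dictionary words pass the preliminary filters.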
-
final_score(word1, word2)
Calculate score of suggestion against misspelling.
- Parameters
word1 (str) – Misspelling
word2 (str) – Candidate suggestion
- Return type
float
def final_score(word1: str, word2: str) -> float:
    """
    Calculate score of suggestion against misspelling.

    Args:
        word1: Misspelling
        word2: Candidate suggestion
    """

    return 2 * sm.lcslen(word1, word2) - abs(len(word1) - len(word2)) + sm.leftcommonsubstring(word1, word2)
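To make the formula concrete, here is a rough standalone sketch. It assumes that sm.lcslen returns the length of the longest common subsequence and that sm.leftcommonsubstring returns the length of the shared prefix; the helpers lcs_length and common_prefix_length are hypothetical stand-ins written for this example, not the Spylls implementations.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (assumed meaning of sm.lcslen)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared (assumed meaning of sm.leftcommonsubstring)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def final_score_sketch(word1: str, word2: str) -> int:
    """Same arithmetic as final_score, built on the stand-in helpers above."""
    return (2 * lcs_length(word1, word2)
            - abs(len(word1) - len(word2))
            + common_prefix_length(word1, word2))


# "helo" vs "hello": LCS = 4, length difference = 1, common prefix "hel" = 3
print(final_score_sketch("helo", "hello"))  # 2*4 - 1 + 3 = 10
# "kat" vs "cat": LCS = 2 ("at"), no length difference, no common prefix
print(final_score_sketch("kat", "cat"))     # 2*2 - 0 + 0 = 4
```

Under these assumptions, a candidate that shares its beginning with the misspelling is rewarded twice: once through the subsequence term and once through the prefix term.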
-
metaphone(table, word)
Metaphone calculation
- Parameters
table (spylls.hunspell.data.aff.PhonetTable) – Metaphone table from the PHONE directive
word (str) – Word to calculate metaphone for
- Return type
str
def metaphone(table: aff.PhonetTable, word: str) -> str:
    """
    Metaphone calculation

    Args:
        table: Metaphone table from :attr:`PHONE <spylls.hunspell.data.aff.Aff.PHONE>` directive
        word: Word to calculate metaphone for
    """

    pos = 0
    word = word.upper()
    res = ''

    # Metaphone production in Spylls currently is implemented very naively, as just "search and replace"
    # for rules. To see what _potentially_ should be done, look at aspell's original description:
    # http://aspell.net/man-html/Phonetic-Code.html
    while pos < len(word):
        match = None
        for rule in table.rules[word[pos]]:
            match = rule.match(word, pos)
            if match:
                res += rule.replacement
                pos += match.span()[1] - match.span()[0]
                break
        if not match:
            pos += 1

    return res
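The "search and replace" loop above can be tried in isolation with a toy rule table. The sketch below is only an illustration under stated assumptions: build_table, naive_metaphone, and the four toy rules are hypothetical, rules are plain literal strings grouped by first letter, and the real PhonetTable built from a PHONE directive supports a richer rule syntax than this.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule shape: (compiled search pattern, replacement code).
Rule = Tuple[re.Pattern, str]


def build_table(pairs: List[Tuple[str, str]]) -> Dict[str, List[Rule]]:
    """Group literal search strings by their first letter, preserving rule order."""
    table: Dict[str, List[Rule]] = {}
    for search, replacement in pairs:
        rule = (re.compile(re.escape(search)), replacement)
        table.setdefault(search[0], []).append(rule)
    return table


def naive_metaphone(table: Dict[str, List[Rule]], word: str) -> str:
    """Walk the word left to right, applying the first rule that matches at each position."""
    pos = 0
    word = word.upper()
    res = ''
    while pos < len(word):
        matched = None
        for pattern, replacement in table.get(word[pos], []):
            matched = pattern.match(word, pos)
            if matched:
                res += replacement
                pos += matched.end() - matched.start()
                break
        if not matched:
            # No rule for this letter: it contributes nothing to the code.
            pos += 1
    return res


# Toy rules, invented for this example only.
TOY_TABLE = build_table([("PH", "F"), ("F", "F"), ("O", "O"), ("N", "N")])

print(naive_metaphone(TOY_TABLE, "phone"))  # FON
print(naive_metaphone(TOY_TABLE, "fone"))   # FON
```

With this toy table the misspelling "fone" and the word "phone" reduce to the same code, which is the property the phonetic comparison in phonet_suggest relies on. Note that letters without any rule are silently dropped, mirroring the `pos += 1` branch of the listing.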