algo.capitalization: capitalization and word cases

class Type(value)

Type of capitalization, detected by Casing.guess():

  • NO: all lowercase (“foo”)

  • INIT: titlecase, only initial letter is capitalized (“Foo”)

  • ALL: all uppercase (“FOO”)

  • HUH: mixed capitalization (“fooBar”)

  • HUHINIT: mixed capitalization, first letter is capitalized (“FooBar”)

class Casing[source]

Represents casing-related algorithms specific for dictionary’s language. It is class, not a set of functions, because it needs to have subclasses for specific language casing, which have only some aspects different from generic one.

guess(word)[source]

Guess word’s capitalization. Redefined in GermanCasing.

def guess(self, word: str) -> Type:     # pylint: disable=no-self-use
    """
    Guess word's capitalization. Redefined in :class:`GermanCasing`.
    """

    if word.islower():
        return Type.NO
    if word.isupper():
        return Type.ALL
    if word[:1].isupper():
        return Type.INIT if word[1:].islower() else Type.HUHINIT
    return Type.HUH
Parameters:

word (str) –

Return type:

spylls.hunspell.algo.capitalization.Type

lower(word)[source]

Lowercases the word. It returns list of possible lowercasings for all casing classes to behave consistently. In GermanCasing (and only there), lowercasing word like “STRASSE” produces two possibilities: “strasse” and “straße” (ß is most of the time upcased to SS, so we can’t decide which of downcased words is “right” and need to check both).

Redefined also in TurkicCasing, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.

Parameters:

word (str) –

Return type:

List[str]

def lower(self, word: str) -> List[str]:  # pylint: disable=no-self-use
    """
    Lowercases the word. It returns *list* of possible lowercasings for all casing classes to
    behave consistently. In :class:`GermanCasing` (and only there), lowercasing word like
    "STRASSE" produces two possibilities: "strasse" and "straße" (ß is most of the time upcased
    to SS, so we can't decide which of downcased words is "right" and need to check both).

    Redefined also in :class:`TurkicCasing`, because in Turkic languages lowercase
    "i" is uppercased as "İ", and uppercase "I" is downcased as "ı".

    Args:
        word:
    """

    # can't be properly lowercased in non-Turkic collaction
    if not word or word[0] == 'İ':
        return []

    # turkic "lowercase dot i" to latinic "i", just in case
    return [word.lower().replace('i̇', 'i')]
upper(word)[source]

Uppercase the word. Redefined in TurkicCasing, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.

Parameters:

word (str) –

Return type:

str

def upper(self, word: str) -> str:   # pylint: disable=no-self-use
    """
    Uppercase the word. Redefined in :class:`TurkicCasing`, because in Turkic languages lowercase
    "i" is uppercased as "İ", and uppercase "I" is downcased as "ı".

    Args:
        word:
    """
    return word.upper()
capitalize(word)[source]

Capitalize (convert word to all lowercase and first letter uppercase). Returns a list of results for same reasons as lower()

Parameters:

word (str) –

Return type:

Iterator[str]

def capitalize(self, word: str) -> Iterator[str]:
    """
    Capitalize (convert word to all lowercase and first letter uppercase). Returns a list of
    results for same reasons as :meth:`lower`

    Args:
        word:
    """
    if len(word) == 1:
        return iter(self.upper(word[0]))
    else:
        return (self.upper(word[0]) + lower for lower in self.lower(word[1:]))
lowerfirst(word)[source]

Just change the case of the first letter to lower. Returns a list of results for same reasons as lower()

Parameters:

word (str) –

Return type:

Iterator[str]

def lowerfirst(self, word: str) -> Iterator[str]:
    """
    Just change the case of the first letter to lower. Returns a list of
    results for same reasons as :meth:`lower`

    Args:
        word:
    """
    return (letter + word[1:] for letter in self.lower(word[0]))
variants(word)[source]

Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is spelled correctly. E.g., if the word is “Kitten”, hypotheses are “kitten”, “Kitten”.

Parameters:

word (str) –

Return type:

Tuple[spylls.hunspell.algo.capitalization.Type, List[str]]

def variants(self, word: str) -> Tuple[Type, List[str]]:
    """
    Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is
    spelled correctly. E.g., if the word is "Kitten", hypotheses are "kitten", "Kitten".

    Args:
        word:
    """
    captype = self.guess(word)

    if captype == Type.NO:
        result = [word]
    elif captype == Type.INIT:
        result = [word, *self.lower(word)]
    elif captype == Type.HUHINIT:
        result = [word, *self.lowerfirst(word)]
    elif captype == Type.HUH:
        result = [word]
    elif captype == Type.ALL:
        result = [word, *self.lower(word), *self.capitalize(word)]

    return (captype, result)
corrections(word)[source]

Returns hyphotheses of how the word might have been cased if it is a misspelling. For example, the word “DiCtionary” (HUHINIT capitalization) produces hypotheses “DiCtionary” (itself), “diCtionary”, “dictionary”, “Dictionary”, and all of them are checked by Suggest.

Parameters:

word (str) –

Return type:

Tuple[spylls.hunspell.algo.capitalization.Type, List[str]]

def corrections(self, word: str) -> Tuple[Type, List[str]]:
    """
    Returns hyphotheses of how the word might have been cased if it is a misspelling. For example,
    the word "DiCtionary" (HUHINIT capitalization) produces hypotheses "DiCtionary" (itself),
    "diCtionary", "dictionary", "Dictionary", and all of them are checked by Suggest.

    Args:
        word:
    """

    captype = self.guess(word)

    if captype == Type.NO:
        result = [word]
    elif captype == Type.INIT:
        result = [word, *self.lower(word)]
    elif captype == Type.HUHINIT:
        result = [word, *self.lowerfirst(word), *self.lower(word), *self.capitalize(word)]
        # TODO: also here and below, consider the theory FooBar meant Foo Bar
    elif captype == Type.HUH:
        result = [word, *self.lower(word)]
    elif captype == Type.ALL:
        result = [word, *self.lower(word), *self.capitalize(word)]

    return (captype, result)
coerce(word, cap)[source]

Used by suggest: by known (valid) suggestion, and initial word’s capitalization, produce proper suggestion capitalization. E.g. if the misspelling was “Kiten” (INIT capitalization), found suggestion “kitten”, then this method makes it “Kitten”.

def coerce(self, word: str, cap: Type) -> str:
    """
    Used by suggest: by known (valid) suggestion, and initial word's capitalization, produce
    proper suggestion capitalization. E.g. if the misspelling was "Kiten" (INIT capitalization),
    found suggestion "kitten", then this method makes it "Kitten".
    """
    if cap in (Type.INIT, Type.HUHINIT):
        return self.upper(word[0]) + word[1:]
    if cap == Type.ALL:
        return self.upper(word)
    return word
Parameters:
Return type:

str

class TurkicCasing[source]

Redefines Casing.upper() and Casing.lower(), because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”:

>>> turkic = spylls.hunspell.algo.capitalization.TurkicCasing()
>>> turkic.lower('Izmir'))
['ızmir']
>>> turkic.upper('Izmir')
IZMİR
class GermanCasing[source]

Redefines Casing.lower() because in German “SS” can be lowercased both as “ss” and “ß”:

>>> german = spylls.hunspell.algo.capitalization.GermanCasing()
>>> german.lower('STRASSE'))
['straße', 'strasse']