algo.capitalization
: capitalization and word cases
-
class
Type
(value) Type of capitalization, detected by
Casing.guess()
:NO
: all lowercase (“foo”)INIT
: titlecase, only initial letter is capitalized (“Foo”)ALL
: all uppercase (“FOO”)HUH
: mixed capitalization (“fooBar”)HUHINIT
: mixed capitalization, first letter is capitalized (“FooBar”)
-
class
Casing
[source] Represents casing-related algorithms specific for dictionary’s language. It is class, not a set of functions, because it needs to have subclasses for specific language casing, which have only some aspects different from generic one.
-
guess
(word)[source] Guess word’s capitalization. Redefined in
GermanCasing
.def guess(self, word: str) -> Type: # pylint: disable=no-self-use """ Guess word's capitalization. Redefined in :class:`GermanCasing`. """ if word.islower(): return Type.NO if word.isupper(): return Type.ALL if word[:1].isupper(): return Type.INIT if word[1:].islower() else Type.HUHINIT return Type.HUH
- Parameters
word (str) –
- Return type
-
lower
(word)[source] Lowercases the word. It returns list of possible lowercasings for all casing classes to behave consistently. In
GermanCasing
(and only there), lowercasing word like “STRASSE” produces two possibilities: “strasse” and “straße” (ß is most of the time upcased to SS, so we can’t decide which of downcased words is “right” and need to check both).Redefined also in
TurkicCasing
, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.- Parameters
word (str) –
- Return type
List[str]
def lower(self, word: str) -> List[str]: # pylint: disable=no-self-use """ Lowercases the word. It returns *list* of possible lowercasings for all casing classes to behave consistently. In :class:`GermanCasing` (and only there), lowercasing word like "STRASSE" produces two possibilities: "strasse" and "straße" (ß is most of the time upcased to SS, so we can't decide which of downcased words is "right" and need to check both). Redefined also in :class:`TurkicCasing`, because in Turkic languages lowercase "i" is uppercased as "İ", and uppercase "I" is downcased as "ı". Args: word: """ # can't be properly lowercased in non-Turkic collaction if not word or word[0] == 'İ': return [] # turkic "lowercase dot i" to latinic "i", just in case return [word.lower().replace('i̇', 'i')]
-
upper
(word)[source] Uppercase the word. Redefined in
TurkicCasing
, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”.- Parameters
word (str) –
- Return type
str
def upper(self, word: str) -> str: # pylint: disable=no-self-use """ Uppercase the word. Redefined in :class:`TurkicCasing`, because in Turkic languages lowercase "i" is uppercased as "İ", and uppercase "I" is downcased as "ı". Args: word: """ return word.upper()
-
capitalize
(word)[source] Capitalize (convert word to all lowercase and first letter uppercase). Returns a list of results for same reasons as
lower()
- Parameters
word (str) –
- Return type
Iterator[str]
def capitalize(self, word: str) -> Iterator[str]: """ Capitalize (convert word to all lowercase and first letter uppercase). Returns a list of results for same reasons as :meth:`lower` Args: word: """ if len(word) == 1: return iter(self.upper(word[0])) else: return (self.upper(word[0]) + lower for lower in self.lower(word[1:]))
-
lowerfirst
(word)[source] Just change the case of the first letter to lower. Returns a list of results for same reasons as
lower()
- Parameters
word (str) –
- Return type
Iterator[str]
def lowerfirst(self, word: str) -> Iterator[str]: """ Just change the case of the first letter to lower. Returns a list of results for same reasons as :meth:`lower` Args: word: """ return (letter + word[1:] for letter in self.lower(word[0]))
-
variants
(word)[source] Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is spelled correctly. E.g., if the word is “Kitten”, hypotheses are “kitten”, “Kitten”.
- Parameters
word (str) –
- Return type
Tuple[spylls.hunspell.algo.capitalization.Type, List[str]]
def variants(self, word: str) -> Tuple[Type, List[str]]: """ Returns hypotheses of how the word might have been cased (in dictionary), if we consider it is spelled correctly. E.g., if the word is "Kitten", hypotheses are "kitten", "Kitten". Args: word: """ captype = self.guess(word) if captype == Type.NO: result = [word] elif captype == Type.INIT: result = [word, *self.lower(word)] elif captype == Type.HUHINIT: result = [word, *self.lowerfirst(word)] elif captype == Type.HUH: result = [word] elif captype == Type.ALL: result = [word, *self.lower(word), *self.capitalize(word)] return (captype, result)
-
corrections
(word)[source] Returns hyphotheses of how the word might have been cased if it is a misspelling. For example, the word “DiCtionary” (HUHINIT capitalization) produces hypotheses “DiCtionary” (itself), “diCtionary”, “dictionary”, “Dictionary”, and all of them are checked by Suggest.
- Parameters
word (str) –
- Return type
Tuple[spylls.hunspell.algo.capitalization.Type, List[str]]
def corrections(self, word: str) -> Tuple[Type, List[str]]: """ Returns hyphotheses of how the word might have been cased if it is a misspelling. For example, the word "DiCtionary" (HUHINIT capitalization) produces hypotheses "DiCtionary" (itself), "diCtionary", "dictionary", "Dictionary", and all of them are checked by Suggest. Args: word: """ captype = self.guess(word) if captype == Type.NO: result = [word] elif captype == Type.INIT: result = [word, *self.lower(word)] elif captype == Type.HUHINIT: result = [word, *self.lowerfirst(word), *self.lower(word), *self.capitalize(word)] # TODO: also here and below, consider the theory FooBar meant Foo Bar elif captype == Type.HUH: result = [word, *self.lower(word)] elif captype == Type.ALL: result = [word, *self.lower(word), *self.capitalize(word)] return (captype, result)
-
coerce
(word, cap)[source] Used by suggest: by known (valid) suggestion, and initial word’s capitalization, produce proper suggestion capitalization. E.g. if the misspelling was “Kiten” (INIT capitalization), found suggestion “kitten”, then this method makes it “Kitten”.
def coerce(self, word: str, cap: Type) -> str: """ Used by suggest: by known (valid) suggestion, and initial word's capitalization, produce proper suggestion capitalization. E.g. if the misspelling was "Kiten" (INIT capitalization), found suggestion "kitten", then this method makes it "Kitten". """ if cap in (Type.INIT, Type.HUHINIT): return self.upper(word[0]) + word[1:] if cap == Type.ALL: return self.upper(word) return word
- Parameters
word (str) –
- Return type
str
-
-
class
TurkicCasing
[source] Redefines
Casing.upper()
andCasing.lower()
, because in Turkic languages lowercase “i” is uppercased as “İ”, and uppercase “I” is downcased as “ı”:>>> turkic = spylls.hunspell.algo.capitalization.TurkicCasing() >>> turkic.lower('Izmir')) ['ızmir'] >>> turkic.upper('Izmir') IZMİR
-
class
GermanCasing
[source] Redefines
Casing.lower()
because in German “SS” can be lowercased both as “ss” and “ß”:>>> german = spylls.hunspell.algo.capitalization.GermanCasing() >>> german.lower('STRASSE')) ['straße', 'strasse']