abydos.stemmer package

The stemmer package collects stemmer classes for a number of languages, including English, German, Latin, Danish, Dutch, Norwegian, and Swedish.

Each stemmer has a stem method, which takes a word and returns its stemmed form:

>>> from abydos.stemmer import Porter
>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
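Because every stemmer exposes the same stem method, code that consumes stemmers can be written against that interface alone. The following is a minimal sketch of this idea, not part of abydos: the names HasStem, stem_all, and SuffixStripper are hypothetical, and SuffixStripper is only a stand-in so that the example runs without the library installed.

```python
from typing import Dict, Iterable, Protocol


class HasStem(Protocol):
    """Any object with stem(word) -> str, as the stemmer classes document."""

    def stem(self, word: str) -> str: ...


def stem_all(stemmer: HasStem, words: Iterable[str]) -> Dict[str, str]:
    """Map each word to the stem produced by the given stemmer."""
    return {word: stemmer.stem(word) for word in words}


class SuffixStripper:
    """Hypothetical stand-in (NOT an abydos class): strips one fixed suffix."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def stem(self, word: str) -> str:
        # Remove the suffix only when it is non-empty and actually present.
        if self.suffix and word.endswith(self.suffix):
            return word[: -len(self.suffix)]
        return word


result = stem_all(SuffixStripper("ed"), ["trusted", "democracy"])
print(result)  # -> {'trusted': 'trust', 'democracy': 'democracy'}
```

With abydos installed, an instance such as Porter() should be usable in place of SuffixStripper, since it provides the same stem(word) method.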

class abydos.stemmer.CLEFGerman[source]

Bases: _Stemmer

CLEF German stemmer.

The CLEF German stemmer is defined in [Sav05].

New in version 0.3.6.

stem(word: str) → str[source]

Return CLEF German stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = CLEFGerman()
>>> stmr.stem('lesen')
'lese'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.CLEFGermanPlus[source]

Bases: _Stemmer

CLEF German stemmer plus.

The CLEF German stemmer plus is defined in [Sav05].

New in version 0.3.6.

stem(word: str) → str[source]

Return 'CLEF German stemmer plus' stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = CLEFGermanPlus()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.CLEFSwedish[source]

Bases: _Stemmer

CLEF Swedish stemmer.

The CLEF Swedish stemmer is defined in [Sav05].

New in version 0.3.6.

stem(word: str) → str[source]

Return CLEF Swedish stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = CLEFSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspensio'
>>> stmr.stem('visshet')
'viss'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.Caumanns[source]

Bases: _Stemmer

Caumanns stemmer.

Jörg Caumanns' stemmer is described in his article [Cau99].

This implementation is based on the GermanStemFilter described at [Lan13].

New in version 0.3.6.

stem(word: str) → str[source]

Return Caumanns German stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = Caumanns()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'

New in version 0.2.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.Lovins[source]

Bases: _Stemmer

Lovins stemmer.

The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].

New in version 0.3.6.

Initialize the stemmer.

New in version 0.3.6.

stem(word: str) → str[source]

Return Lovins stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = Lovins()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'

New in version 0.2.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.PaiceHusk[source]

Bases: _Stemmer

Paice-Husk stemmer.

Implementation of the Paice-Husk stemmer, also known as the Lancaster stemmer, developed by Chris Paice with the assistance of Gareth Husk.

This is based on the algorithm's description in [Pai90].

New in version 0.3.6.

stem(word: str) → str[source]

Return Paice-Husk stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.Porter(early_english: bool = False)[source]

Bases: _Stemmer

Porter stemmer.

The Porter stemmer is described in [Por80].

New in version 0.3.6.

Initialize Porter instance.

Parameters:

early_english (bool) -- Set to True to remove -est and -eth (2nd and 3rd person singular verbal agreement suffixes, respectively)

New in version 0.4.0.

stem(word: str) → str[source]

Return Porter stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter(early_english=True)
>>> stmr.stem('eateth')
'eat'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.Porter2(early_english: bool = False)[source]

Bases: _Snowball

Porter2 (Snowball English) stemmer.

The Porter2 (Snowball English) stemmer is defined in [Por02].

New in version 0.3.6.

Initialize Porter2 instance.

Parameters:

early_english (bool) -- Set to True to remove -est and -eth (2nd and 3rd person singular verbal agreement suffixes, respectively)

New in version 0.4.0.

stem(word: str) → str[source]

Return the Porter2 (Snowball English) stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter2(early_english=True)
>>> stmr.stem('eateth')
'eat'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.SStemmer[source]

Bases: _Stemmer

S-stemmer.

The S-stemmer is defined in [Har91].

New in version 0.3.6.

stem(word: str) → str[source]

Return the S-stemmed form of a word.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
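The SStemmer examples above are consistent with the three plural-reduction rules usually attributed to the S-stemmer, where only the first applicable rule fires. The following is a minimal sketch of those rules under that assumption; the function name s_stem_sketch is hypothetical, and this is not the library's implementation.

```python
def s_stem_sketch(word: str) -> str:
    """Apply the first matching S-stemmer rule (illustrative sketch only)."""
    lowered = word.lower()
    # Rule 1: -ies -> -y, unless the word ends in -eies or -aies.
    if lowered.endswith("ies") and not lowered.endswith(("eies", "aies")):
        return word[:-3] + "y"
    # Rule 2: -es -> -e, unless the word ends in -aes, -ees, or -oes.
    if lowered.endswith("es") and not lowered.endswith(("aes", "ees", "oes")):
        return word[:-1]
    # Rule 3: drop a final -s, unless the word ends in -us or -ss.
    if lowered.endswith("s") and not lowered.endswith(("us", "ss")):
        return word[:-1]
    return word


for w in ("summaries", "summary", "towers", "reading", "census"):
    print(w, "->", s_stem_sketch(w))
```

The exceptions in rule 3 explain why 'census' is left unchanged in the examples above while 'towers' loses its final s.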

class abydos.stemmer.Schinke[source]

Bases: _Stemmer

Schinke stemmer.

The Schinke Latin stemmer is defined in [SGRW96].

New in version 0.3.6.

stem(word: str) → str[source]

Return the stem of a word according to the Schinke stemmer.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = Schinke()
>>> stmr.stem('atque')
'atque,atque'
>>> stmr.stem('census')
'cens,censu'
>>> stmr.stem('virum')
'uir,uiru'
>>> stmr.stem('populusque')
'popul,populu'
>>> stmr.stem('senatus')
'senat,senatu'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

Changed in version 0.6.0: Made return a str with the noun then verb stem, comma-separated

stem_dict(word: str) → Dict[str, str][source]

Return the noun and verb stems of a word, according to the Schinke stemmer, as a dictionary keyed by 'n' and 'v'.

Parameters:

word (str) -- The word to stem

Returns:

Word stems in a dictionary

Return type:

dict

Examples

>>> stmr = Schinke()
>>> stmr.stem_dict('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem_dict('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem_dict('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem_dict('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem_dict('senatus')
{'n': 'senat', 'v': 'senatu'}

New in version 0.6.0.
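Judging by the documented outputs, the string returned by stem holds the same two stems that stem_dict returns, joined by a comma (noun stem first, then verb stem). The following is a minimal sketch of converting between the two forms; the helper name split_noun_verb is hypothetical and not part of abydos.

```python
from typing import Dict


def split_noun_verb(stems: str) -> Dict[str, str]:
    """Split a 'noun,verb' stem string into a dict keyed by 'n' and 'v'."""
    # Split on the first comma only; the noun stem precedes the verb stem.
    noun, verb = stems.split(",", 1)
    return {"n": noun, "v": verb}


print(split_noun_verb("cens,censu"))  # -> {'n': 'cens', 'v': 'censu'}
```

When both forms are needed from the library itself, calling stem_dict directly avoids the string parsing.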

class abydos.stemmer.SnowballDanish[source]

Bases: _Snowball

Snowball Danish stemmer.

The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html

New in version 0.3.6.

stem(word: str) → str[source]

Return Snowball Danish stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SnowballDanish()
>>> stmr.stem('underviser')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('sikkerhed')
'sikker'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.SnowballDutch[source]

Bases: _Snowball

Snowball Dutch stemmer.

The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html

New in version 0.3.6.

stem(word: str) → str[source]

Return Snowball Dutch stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SnowballDutch()
>>> stmr.stem('lezen')
'lez'
>>> stmr.stem('opschorting')
'opschort'
>>> stmr.stem('ongrijpbaarheid')
'ongrijp'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.SnowballGerman(alternate_vowels: bool = False)[source]

Bases: _Snowball

Snowball German stemmer.

The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html

New in version 0.3.6.

Initialize SnowballGerman instance.

Parameters:

alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm

New in version 0.4.0.
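The alternate_vowels option describes a preprocessing step that rewrites ASCII digraphs as umlauts before stemming. The following is a minimal sketch of such a step taken literally from that description; the function name compose_umlauts_sketch is hypothetical, and the library's actual preprocessing may differ in detail.

```python
def compose_umlauts_sketch(word: str) -> str:
    """Rewrite ae/oe/ue as ä/ö/ü (naive illustration, not the abydos code)."""
    for digraph, umlaut in (("ae", "ä"), ("oe", "ö"), ("ue", "ü")):
        word = word.replace(digraph, umlaut)
    return word


print(compose_umlauts_sketch("maedchen"))  # -> mädchen
print(compose_umlauts_sketch("gruen"))     # -> grün
```

A plain replacement like this also rewrites sequences that are not transliterated umlauts: compose_umlauts_sketch('quelle') yields 'qülle'. A production implementation would presumably need to guard such cases.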

stem(word: str) → str[source]

Return Snowball German stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.SnowballNorwegian[source]

Bases: _Snowball

Snowball Norwegian stemmer.

The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html

New in version 0.3.6.

stem(word: str) → str[source]

Return Snowball Norwegian stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SnowballNorwegian()
>>> stmr.stem('lese')
'les'
>>> stmr.stem('suspensjon')
'suspensjon'
>>> stmr.stem('sikkerhet')
'sikker'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.SnowballSwedish[source]

Bases: _Snowball

Snowball Swedish stemmer.

The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html

New in version 0.3.6.

stem(word: str) → str[source]

Return Snowball Swedish stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = SnowballSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('visshet')
'viss'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

class abydos.stemmer.UEALite(max_word_length: int = 20, max_acro_length: int = 8, var: str = 'standard')[source]

Bases: _Stemmer

UEA-Lite stemmer.

The UEA-Lite stemmer is discussed in [JS05].

This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.

Java version: [Chu]; Perl version: [JS05]; Ruby version: [Ada17].

New in version 0.3.6.

Initialize UEALite instance.

Parameters:
  • max_word_length (int) -- The maximum word length allowed

  • max_acro_length (int) -- The maximum acronym length allowed

  • var (str) --

    Variant rules to use:

    • 'standard' to use the original (Java-version) rules

    • 'Adams' to use Jason Adams' rules

    • 'Perl' to use the original Perl rules

New in version 0.4.0.

stem(word: str) → str[source]

Return UEA-Lite stem.

Parameters:

word (str) -- The word to stem

Returns:

Word stem

Return type:

str

Examples

>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

Changed in version 0.6.0: Made return a str exclusively