single_word

pymusas.taggers.rules.single_word


SingleWordRule

class SingleWordRule(Rule):
| ...
| def __init__(
| self,
| lexicon_collection: Dict[str, List[str]],
| lemma_lexicon_collection: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| )

A single word rule matches single-word lexicon entries. Entries can be matched on:

  1. Token and the token's Part Of Speech (POS) tag, e.g. driving|adj
  2. Lemma and the lemma's POS tag, e.g. drive|adj
  3. Token, e.g. driving
  4. Lemma, e.g. drive

In all cases, matches are looked up using both the original token/lemma and its lower-cased version. The lookups are made against the lexicon_collection and lemma_lexicon_collection attributes.
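
The four match types and the two casings described above can be sketched as a list of lookup keys. This is only an illustration written with plain Python: the function name candidate_lookups is hypothetical, and the order in which the keys are tried is an assumption rather than something this page states.

```python
from typing import List, Tuple


def candidate_lookups(token: str, lemma: str, pos: str) -> List[Tuple[str, str]]:
    """Return (collection name, key) pairs to look up for one word.

    Hypothetical helper: the original casing is tried first, then the
    lower-cased forms. Duplicate keys are only listed once.
    """
    lookups: List[Tuple[str, str]] = []
    for tok, lem in ((token, lemma), (token.lower(), lemma.lower())):
        candidates = [
            ("lexicon_collection", f"{tok}|{pos}"),   # 1. token and POS
            ("lexicon_collection", f"{lem}|{pos}"),   # 2. lemma and POS
            ("lemma_lexicon_collection", tok),        # 3. token only
            ("lemma_lexicon_collection", lem),        # 4. lemma only
        ]
        for candidate in candidates:
            if candidate not in lookups:
                lookups.append(candidate)
    return lookups


lookups = candidate_lookups("Driving", "drive", "adj")
for collection_name, key in lookups:
    print(f"{collection_name}: {key}")
```

For a capitalised token such as Driving, the lower-cased pass adds the keys driving|adj and driving, while the already lower-case lemma keys are not repeated.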

Parameters

  • lexicon_collection : Dict[str, List[str]]
    The data used to create the lexicon_collection instance attribute. A dictionary whose keys combine a lemma/token and a POS tag in the format {lemma}|{POS}, and whose values are lists of the associated semantic tags.
  • lemma_lexicon_collection : Dict[str, List[str]]
    The data used to create the lemma_lexicon_collection instance attribute. A dictionary whose keys are just a lemma/token, in the format {lemma}, and whose values are lists of the associated semantic tags.
  • pos_mapper : Dict[str, List[str]], optional (default = None)
    If not None, maps from the given token's POS tagset to the desired POS tagset. Each mapping is a List of tags; at the moment there is no preference order within this list. The POS mapping is useful in situations where the token's POS tagset differs from the one used in the lexicons. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
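
To make the cost of a one-to-many POS mapping concrete, the sketch below expands a POS tag into the lexicon keys that would need to be looked up. The names mapped_pos_tags, pos_keys, and ILLUSTRATIVE_MAPPER are hypothetical, the mapping values are made up for illustration rather than copied from pymusas.pos_mapper, and returning no tags for an unmapped POS is an assumption.

```python
from typing import Dict, List, Optional


def mapped_pos_tags(pos: str,
                    pos_mapper: Optional[Dict[str, List[str]]]) -> List[str]:
    """Translate one POS tag into the lexicon's tagset (hypothetical helper)."""
    if pos_mapper is None:
        # No mapper: the token's POS tag is used as it is.
        return [pos]
    # Assumption: a tag missing from the mapper yields no POS-based lookups.
    return pos_mapper.get(pos, [])


def pos_keys(word: str, pos: str,
             pos_mapper: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Build one {lemma}|{POS} key per mapped POS tag."""
    return [f"{word}|{tag}" for tag in mapped_pos_tags(pos, pos_mapper)]


# Illustrative mapping only, not taken from pymusas.pos_mapper.
ILLUSTRATIVE_MAPPER = {"ADJ": ["adj"], "X": ["fw", "xx"]}

one_to_one = pos_keys("driving", "ADJ", ILLUSTRATIVE_MAPPER)
one_to_many = pos_keys("etc", "X", ILLUSTRATIVE_MAPPER)
unmapped = pos_keys("driving", "adj", None)

print(one_to_one)
print(one_to_many)
print(unmapped)
```

A one-to-one mapping produces the same number of keys as no mapper at all, whereas each extra tag in a mapping adds one more key to look up per word.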

Instance Attributes

  • lexicon_collection : pymusas.lexicon_collection.LexiconCollection
    A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lexicon_collection parameter.
  • lemma_lexicon_collection : pymusas.lexicon_collection.LexiconCollection
    A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lemma_lexicon_collection parameter.
  • pos_mapper : Dict[str, List[str]], optional (default = None)
    The given pos_mapper.

__call__

class SingleWordRule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]

Given the tokens, lemmas, and POS tags for each word in a text, returns for each token a List of rule matches, each represented by a pymusas.rankers.ranking_meta_data.RankingMetaData object. The matches are those described in the class docstring above.

Parameters

  • tokens : List[str]
    The tokens that are within the text.
  • lemmas : List[str]
    The lemmas of the tokens.
  • pos_tags : List[str]
    The Part Of Speech tags of the tokens.

Returns

  • List[List[RankingMetaData]]
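
The shape of the return value, one inner list per token, can be sketched with plain dictionaries. Everything here is a simplified stand-in: ToyMatch is a hypothetical replacement for RankingMetaData (the real class carries more information), toy_single_word_rule is not the library's implementation, the semantic tags are sample data, and lower-cased lookups and POS mapping are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyMatch:
    """Hypothetical, much simplified stand-in for RankingMetaData."""
    matched_on: str
    key: str
    semantic_tags: Tuple[str, ...]


def toy_single_word_rule(tokens: List[str],
                         lemmas: List[str],
                         pos_tags: List[str],
                         lexicon: Dict[str, List[str]],
                         lemma_lexicon: Dict[str, List[str]]
                         ) -> List[List[ToyMatch]]:
    """Return one list of matches per token, in token order."""
    if not (len(tokens) == len(lemmas) == len(pos_tags)):
        raise ValueError("tokens, lemmas, and pos_tags must be the same length")
    results: List[List[ToyMatch]] = []
    for token, lemma, pos in zip(tokens, lemmas, pos_tags):
        matches: List[ToyMatch] = []
        for label, key, table in (
            ("token_pos", f"{token}|{pos}", lexicon),
            ("lemma_pos", f"{lemma}|{pos}", lexicon),
            ("token", token, lemma_lexicon),
            ("lemma", lemma, lemma_lexicon),
        ):
            if key in table:
                matches.append(ToyMatch(label, key, tuple(table[key])))
        # In this sketch a token without any match gets an empty list.
        results.append(matches)
    return results


# Sample data for illustration only.
lexicon = {"driving|adj": ["X5.2+"], "rain|noun": ["W4"]}
lemma_lexicon = {"drive": ["M3"], "rain": ["W4"]}

result = toy_single_word_rule(
    tokens=["driving", "rains", "xyzzy"],
    lemmas=["drive", "rain", "xyzzy"],
    pos_tags=["adj", "noun", "noun"],
    lexicon=lexicon,
    lemma_lexicon=lemma_lexicon,
)
for token_matches in result:
    print([match.matched_on for match in token_matches])
```

The outer list always has the same length as tokens, so position i of the result can be paired with token i when the matches are later ranked.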

to_bytes

class SingleWordRule(Rule):
| ...
| def to_bytes() -> bytes

Serialises the SingleWordRule to a bytestring.

Returns

  • bytes

from_bytes

class SingleWordRule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "SingleWordRule"

Loads SingleWordRule from the given bytestring and returns it.

Parameters

  • bytes_data : bytes
    The bytestring to load.

Returns

  • SingleWordRule

__eq__

class SingleWordRule(Rule):
| ...
| def __eq__(other: object) -> bool

Given another object to compare to, it will return True if the other object is of the same class and was initialised with the same argument values.

Parameters

  • other : object
    The object to compare to.

Returns

  • bool
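
The relationship between to_bytes, from_bytes, and __eq__ can be illustrated with a toy class: a rule restored from its own bytestring should compare equal to the original. ToyRule is hypothetical, and the use of JSON as the byte format is an assumption made only to keep the sketch within the standard library; this page does not state which serialisation format SingleWordRule uses.

```python
import json
from typing import Dict, List, Optional


class ToyRule:
    """Hypothetical class mirroring the constructor arguments of the rule."""

    def __init__(self,
                 lexicon_collection: Dict[str, List[str]],
                 lemma_lexicon_collection: Dict[str, List[str]],
                 pos_mapper: Optional[Dict[str, List[str]]] = None) -> None:
        self.lexicon_collection = lexicon_collection
        self.lemma_lexicon_collection = lemma_lexicon_collection
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # Assumption: JSON is used here purely for illustration.
        payload = {
            "lexicon_collection": self.lexicon_collection,
            "lemma_lexicon_collection": self.lemma_lexicon_collection,
            "pos_mapper": self.pos_mapper,
        }
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyRule":
        payload = json.loads(bytes_data.decode("utf-8"))
        return ToyRule(payload["lexicon_collection"],
                       payload["lemma_lexicon_collection"],
                       payload["pos_mapper"])

    def __eq__(self, other: object) -> bool:
        # Same class and same argument values, as described above.
        if not isinstance(other, ToyRule):
            return False
        return (self.lexicon_collection == other.lexicon_collection
                and self.lemma_lexicon_collection == other.lemma_lexicon_collection
                and self.pos_mapper == other.pos_mapper)


rule = ToyRule({"driving|adj": ["X5.2+"]}, {"drive": ["M3"]}, {"ADJ": ["adj"]})
restored = ToyRule.from_bytes(rule.to_bytes())
print(restored == rule)
print(rule == "not a rule")
```

Comparing the argument values rather than object identity is what lets a deserialised rule be treated as interchangeable with the one that was serialised.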