mwe

pymusas.taggers.rules.mwe


MWERule​

class MWERule(Rule):
| ...
| def __init__(
| self,
| mwe_lexicon_lookup: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| ) -> None

A Multi Word Expression (MWE) rule match can be one of the following matches:

  1. MWE_NON_SPECIAL match - whereby the combined token/lemma and POS is found within the given MWE Lexicon Collection (self.mwe_lexicon_collection).
  2. MWE_WILDCARD match - whereby the combined token/lemma and POS matches a wildcard MWE template that is within the MWE Lexicon Collection (self.mwe_lexicon_collection).

All rule matches use the pymusas.lexicon_collection.MWELexiconCollection.mwe_match method for matching. Matches are found using both the original token/lemma and the lowercased version of the token/lemma.
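The two match types can be illustrated with a small stand-alone sketch. It is not the PyMUSAS implementation and does not call MWELexiconCollection.mwe_match. The helper names `to_template`, `wildcard_to_regex`, and `toy_mwe_match`, the sample lexicon and its placeholder tags, and the assumption that templates take the form `word_POS word_POS` with `*` as the wildcard are all made up for illustration.

```python
import re
from typing import Dict, List

# Hypothetical lexicon: MWE template -> placeholder semantic tags.
TOY_LEXICON: Dict[str, List[str]] = {
    "ski_noun boot_noun": ["TAG_A"],  # non-special template
    "ski_* boot*_noun": ["TAG_B"],    # wildcard template
}


def to_template(words: List[str], pos_tags: List[str]) -> str:
    """Join each word with its POS tag, e.g. 'ski_noun boot_noun'."""
    return " ".join(f"{word}_{pos}" for word, pos in zip(words, pos_tags))


def wildcard_to_regex(template: str) -> "re.Pattern[str]":
    """Treat '*' as zero or more non-space characters (an assumption)."""
    parts = [re.escape(part) for part in template.split("*")]
    return re.compile(r"[^\s]*".join(parts))


def toy_mwe_match(words: List[str], pos_tags: List[str]) -> List[str]:
    """Return the templates matched by the original or lowercased words."""
    candidates = {
        to_template(words, pos_tags),
        to_template([word.lower() for word in words], pos_tags),
    }
    matches: List[str] = []
    for template in TOY_LEXICON:
        if "*" in template:
            # Wildcard match: the whole candidate must fit the pattern.
            pattern = wildcard_to_regex(template)
            if any(pattern.fullmatch(candidate) for candidate in candidates):
                matches.append(template)
        elif template in candidates:
            # Non-special match: exact lookup of the combined string.
            matches.append(template)
    return matches


print(toy_mwe_match(["Ski", "boot"], ["noun", "noun"]))
print(toy_mwe_match(["ski", "boots"], ["noun", "noun"]))
```

In this sketch "Ski boot" matches both templates only because the lowercased form is also tried, while "ski boots" matches the wildcard template alone.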

Parameters

  • mwe_lexicon_lookup : Dict[str, List[str]]
    The data used to create the mwe_lexicon_collection instance attribute. A dictionary where the keys are MWE templates, of any pymusas.lexicon_collection.LexiconType, and the values are lists of associated semantic tags.
  • pos_mapper : Dict[str, List[str]], optional (default = None)
    If not None, maps from the mwe_lexicon_lookup POS tagset to the desired POS tagset, whereby each mapping is a List of tags. At the moment there is no preference order within this list of POS tags. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
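To make the speed note on pos_mapper concrete, the sketch below counts how many POS combinations a single MWE template could expand to under a one-to-one and a one-to-many mapping. It is a stand-alone illustration, not a description of how PyMUSAS applies pos_mapper internally. The function `expand_template`, the `word_POS word_POS` template form, and both sample mappers are hypothetical.

```python
from itertools import product
from typing import Dict, List


def expand_template(template: str, pos_mapper: Dict[str, List[str]]) -> List[str]:
    """Expand every POS tag in a 'word_POS word_POS' template via the mapper."""
    options_per_word: List[List[str]] = []
    for item in template.split():
        word, pos = item.rsplit("_", 1)
        # Tags missing from the mapper are kept unchanged.
        mapped_tags = pos_mapper.get(pos, [pos])
        options_per_word.append([f"{word}_{tag}" for tag in mapped_tags])
    # Every combination of mapped tags is a separate candidate to check.
    return [" ".join(combo) for combo in product(*options_per_word)]


one_to_one = {"noun": ["NOUN"]}            # hypothetical mapper
one_to_many = {"noun": ["NOUN", "PROPN"]}  # hypothetical mapper

print(len(expand_template("ski_noun boot_noun", one_to_one)))   # 1
print(len(expand_template("ski_noun boot_noun", one_to_many)))  # 4
```

In this toy model the number of candidates grows multiplicatively with the length of each mapping list, which is one way to picture why longer lists cost more.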

Instance Attributes

  • mwe_lexicon_collection : pymusas.lexicon_collection.MWELexiconCollection
    A pymusas.lexicon_collection.MWELexiconCollection instance that has been initialised using the mwe_lexicon_lookup and pos_mapper parameters. This collection is used to find MWE rule matches.

__call__​

class MWERule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]

Given the tokens, lemmas, and POS tags for each word in a text, returns for each token a List of rule matches, each represented by a pymusas.rankers.ranking_meta_data.RankingMetaData object, based on the match types described in the class docstring above.

Parameters

  • tokens : List[str]
    The tokens that are within the text.
  • lemmas : List[str]
    The lemmas of the tokens.
  • pos_tags : List[str]
    The Part Of Speech tags of the tokens.

Returns

  • List[List[RankingMetaData]]

to_bytes​

class MWERule(Rule):
| ...
| def to_bytes() -> bytes

Serialises the MWERule to a bytestring.

Returns

  • bytes

from_bytes​

class MWERule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "MWERule"

Loads MWERule from the given bytestring and returns it.

Parameters

  • bytes_data : bytes
    The bytestring to load.

Returns

  • MWERule

__eq__​

class MWERule(Rule):
| ...
| def __eq__(other: object) -> bool

Given another object to compare to, returns True if the other object is of the same class and was initialised with the same argument values.

Parameters

  • other : object
    The object to compare to.

Returns

  • bool
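The contract described by to_bytes, from_bytes, and __eq__ (serialise, load, and compare by class and constructor arguments) can be sketched with a stand-in class. `ToyRule` is hypothetical and is not MWERule. JSON is used here purely for illustration and is an assumption; this page does not state which serialisation format PyMUSAS uses.

```python
import json
from typing import Dict, List, Optional


class ToyRule:
    """Hypothetical stand-in for a rule; not the PyMUSAS MWERule."""

    def __init__(self,
                 lookup: Dict[str, List[str]],
                 pos_mapper: Optional[Dict[str, List[str]]] = None) -> None:
        self.lookup = lookup
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # Serialise only the constructor arguments.
        payload = {"lookup": self.lookup, "pos_mapper": self.pos_mapper}
        return json.dumps(payload).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyRule":
        # Rebuild the object from the stored constructor arguments.
        payload = json.loads(bytes_data.decode("utf-8"))
        return ToyRule(payload["lookup"], payload["pos_mapper"])

    def __eq__(self, other: object) -> bool:
        # Equal only if same class and same constructor argument values.
        if not isinstance(other, ToyRule):
            return False
        return (self.lookup == other.lookup
                and self.pos_mapper == other.pos_mapper)


rule = ToyRule({"ski_noun boot_noun": ["TAG_A"]}, {"noun": ["NOUN", "PROPN"]})
restored = ToyRule.from_bytes(rule.to_bytes())
print(restored == rule)  # True
```

Under this contract a round trip through to_bytes and from_bytes yields an object that compares equal to the original, which is a convenient property to check in tests.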