mwe

pymusas.taggers.rules.mwe

MWERule

class MWERule(Rule):
 | ...
 | def __init__(
 |     self,
 |     mwe_lexicon_lookup: Dict[str, List[str]],
 |     pos_mapper: Optional[Dict[str, List[str]]] = None
 | ) -> None

A Multi Word Expression (MWE) rule match can be one of the following matches:

MWE_NON_SPECIAL match - whereby the combined token/lemma and POS is found within the given MWE Lexicon Collection (self.mwe_lexicon_collection).
MWE_WILDCARD match - whereby the combined token/lemma and POS matches a wildcard MWE template that is within the MWE Lexicon Collection (self.mwe_lexicon_collection).

All rule matches use the pymusas.lexicon_collection.MWELexiconCollection.mwe_match method for matching. Matches are found based on the original token/lemma and on lowercased versions of the token/lemma.
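The idea of trying the original and then the lowercased forms can be pictured with a small, self-contained sketch. It is not the PyMUSAS implementation: `candidate_templates` and `find_non_special_match` are hypothetical helpers, the space-separated `token_POS` template spelling and the semantic tag are assumptions made for illustration, and wildcard matching is left out.

```python
from typing import Dict, List, Optional


def candidate_templates(tokens: List[str], lemmas: List[str],
                        pos_tags: List[str]) -> List[str]:
    """Build candidate MWE strings for one span of words.

    Hypothetical helper: tries the original token and lemma forms first,
    then their lowercased versions, each combined with the POS tags.
    """
    candidates: List[str] = []
    for words in (tokens, lemmas,
                  [t.lower() for t in tokens],
                  [l.lower() for l in lemmas]):
        template = " ".join(f"{w}_{p}" for w, p in zip(words, pos_tags))
        if template not in candidates:  # skip duplicates, keep order
            candidates.append(template)
    return candidates


def find_non_special_match(lookup: Dict[str, List[str]], tokens: List[str],
                           lemmas: List[str],
                           pos_tags: List[str]) -> Optional[List[str]]:
    """Return the semantic tags of the first candidate found in the lookup."""
    for template in candidate_templates(tokens, lemmas, pos_tags):
        if template in lookup:
            return lookup[template]
    return None


# Illustrative lexicon entry; the tag "B5" is only an example value.
lookup = {"ski_noun boot_noun": ["B5"]}
tags = find_non_special_match(lookup, ["Ski", "Boots"], ["Ski", "Boot"],
                              ["noun", "noun"])
print(tags)  # ['B5'], found via the lowercased lemma form
```

In this sketch only the fourth candidate, the lowercased lemmas, hits the lexicon, which is the case the lowercased fallback is meant to cover.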

Parameters

mwe_lexicon_lookup : Dict[str, List[str]]
The data to create mwe_lexicon_collection instance attribute. A Dictionary where the keys are MWE templates, of any pymusas.lexicon_collection.LexiconType, and the values are a list of associated semantic tags.
pos_mapper : Dict[str, List[str]], optional (default = None)
If not None, maps from the mwe_lexicon_lookup POS tagset to the desired POS tagset, whereby each mapping is a List of tags. At the moment there is no preference order within this list of POS tags. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
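To see why longer POS mappings could cost more, consider a rough sketch that expands one template into every concrete POS combination. This only illustrates how the number of candidates can grow; it is not how PyMUSAS applies the mapper internally. `expand_template`, the `token_POS` template spelling, and the example tags are all assumptions.

```python
from itertools import product
from typing import Dict, List


def expand_template(template: str, pos_mapper: Dict[str, List[str]]) -> List[str]:
    """Expand a `token_POS token_POS` template over the mapped POS tags.

    Hypothetical helper; POS tags missing from the mapper are kept as-is.
    """
    options: List[List[str]] = []
    for part in template.split():
        word, _, pos = part.rpartition("_")
        mapped = pos_mapper.get(pos, [pos])
        options.append([f"{word}_{tag}" for tag in mapped])
    # Every combination of mapped tags across the template's words.
    return [" ".join(combo) for combo in product(*options)]


one_to_one = {"noun": ["NOUN"]}
one_to_many = {"noun": ["NN", "NNS", "NNP"]}

single = expand_template("ski_noun boot_noun", one_to_one)
many = expand_template("ski_noun boot_noun", one_to_many)
print(len(single), len(many))  # 1 9
```

With a one-to-one mapping the template stays a single candidate, while three tags per POS turn a two-word template into nine.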

Instance Attributes

mwe_lexicon_collection : pymusas.lexicon_collection.MWELexiconCollection
A pymusas.lexicon_collection.MWELexiconCollection instance that has been initialised using the mwe_lexicon_lookup and pos_mapper parameters. This collection is used to find MWE rule matches.

__call__

class MWERule(Rule):
 | ...
 | def __call__(
 |     self,
 |     tokens: List[str],
 |     lemmas: List[str],
 |     pos_tags: List[str]
 | ) -> List[List[RankingMetaData]]

Given the tokens, lemmas, and POS tags for each word in a text, returns for each token a List of rule matches, each defined by a pymusas.rankers.ranking_meta_data.RankingMetaData object, based on the rule matches stated in the class docstring above.

Parameters

tokens : List[str]
The tokens that are within the text.
lemmas : List[str]
The lemmas of the tokens.
pos_tags : List[str]
The Part Of Speech tags of the tokens.

Returns

List[List[RankingMetaData]]
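Because the result holds one inner list per input token, it can be walked alongside the tokens with `zip`. The sketch below only illustrates the `List[List[...]]` shape: the `matches` value is hand-written, plain strings stand in for `RankingMetaData` objects, and both the empty list for an unmatched token and the MWE match appearing on every token it covers are assumptions.

```python
from typing import List

tokens = ["She", "wore", "ski", "boots"]

# Hand-written stand-in for the value returned by MWERule.__call__;
# strings replace the RankingMetaData objects for illustration only.
matches: List[List[str]] = [[], [], ["ski boot MWE"], ["ski boot MWE"]]

# One inner list per token, in the same order as the tokens.
assert len(matches) == len(tokens)

# Keep the tokens that have at least one rule match.
matched_tokens = [tok for tok, found in zip(tokens, matches) if found]
print(matched_tokens)  # ['ski', 'boots']
```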

to_bytes

class MWERule(Rule):
 | ...
 | def to_bytes(self) -> bytes

Serialises the MWERule to a bytestring.

Returns

bytes

from_bytes

class MWERule(Rule):
 | ...
 | @staticmethod
 | def from_bytes(bytes_data: bytes) -> "MWERule"

Loads MWERule from the given bytestring and returns it.

Parameters

bytes_data : bytes
The bytestring to load.

Returns

MWERule

__eq__

class MWERule(Rule):
 | ...
 | def __eq__(self, other: object) -> bool

Given another object to compare to, it returns True if the other object is of the same class and was initialised with the same argument values.

Parameters

other : object
The object to compare to.

Returns

bool
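The relationship between to_bytes, from_bytes, and __eq__ can be sketched as a round trip: a rule restored from its own bytestring should compare equal to the original. `SketchRule` below is a hypothetical stand-in that mirrors the documented method names and constructor arguments; it is not MWERule itself, and JSON is used only as an assumed, convenient encoding because the real byte format is not described here.

```python
import json
from typing import Dict, List, Optional


class SketchRule:
    """Hypothetical stand-in for MWERule that stores only its constructor arguments."""

    def __init__(self, mwe_lexicon_lookup: Dict[str, List[str]],
                 pos_mapper: Optional[Dict[str, List[str]]] = None) -> None:
        self.mwe_lexicon_lookup = mwe_lexicon_lookup
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # JSON chosen for simplicity; an assumption, not the library's format.
        payload = {"lookup": self.mwe_lexicon_lookup,
                   "pos_mapper": self.pos_mapper}
        return json.dumps(payload).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "SketchRule":
        payload = json.loads(bytes_data.decode("utf-8"))
        return SketchRule(payload["lookup"], payload["pos_mapper"])

    def __eq__(self, other: object) -> bool:
        # Equal only if same class and same constructor argument values.
        if not isinstance(other, SketchRule):
            return False
        return (self.mwe_lexicon_lookup == other.mwe_lexicon_lookup
                and self.pos_mapper == other.pos_mapper)


rule = SketchRule({"ski_noun boot_noun": ["B5"]}, {"noun": ["NN", "NNS"]})
restored = SketchRule.from_bytes(rule.to_bytes())
print(restored == rule)  # True
```

With the real class, the analogous check would be comparing `MWERule.from_bytes(rule.to_bytes())` against `rule`.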
