mwe
pymusas.taggers.rules.mwe
MWERule
class MWERule(Rule):
| ...
| def __init__(
| self,
| mwe_lexicon_lookup: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| ) -> None
A Multi Word Expression (MWE) rule match can be one of the following matches:
- MWE_NON_SPECIAL match - whereby the combined token/lemma and POS is found within the given MWE Lexicon Collection (self.mwe_lexicon_collection).
- MWE_WILDCARD match - whereby the combined token/lemma and POS matches a wildcard MWE template that is within the MWE Lexicon Collection (self.mwe_lexicon_collection).
All rule matches use the
pymusas.lexicon_collection.MWELexiconCollection.mwe_match
method for matching. Matches are found based on the original token/lemma and
lower cased versions of the token/lemma.
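The two match types can be illustrated with a small standalone sketch. It does not use or reproduce `MWELexiconCollection.mwe_match`; it assumes templates of the form `word_POS word_POS`, assumes `*` stands for zero or more characters inside a single word or POS tag, and only checks one fixed span of words. The helper names (`to_template`, `wildcard_match`, `find_match`) and the semantic tags are made up for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple


def to_template(words: List[str], pos_tags: List[str]) -> str:
    """Join each word with its POS tag, e.g. 'ski_noun boot_noun'."""
    return " ".join(f"{word}_{pos}" for word, pos in zip(words, pos_tags))


def wildcard_match(template: str, candidate: str) -> bool:
    """Assumption: '*' matches zero or more characters, but never a space or '_'."""
    pattern = r"[^\s_]*".join(re.escape(part) for part in template.split("*"))
    return re.fullmatch(pattern, candidate) is not None


def find_match(
    words: List[str], pos_tags: List[str], lookup: Dict[str, List[str]]
) -> Optional[Tuple[str, List[str]]]:
    """Try the original words first, then their lower cased versions."""
    candidates = [
        to_template(words, pos_tags),
        to_template([word.lower() for word in words], pos_tags),
    ]
    # Non-special match: the candidate is literally a key of the lookup.
    for candidate in candidates:
        if candidate in lookup:
            return "MWE_NON_SPECIAL", lookup[candidate]
    # Wildcard match: the candidate fits a template that contains '*'.
    for candidate in candidates:
        for template, semantic_tags in lookup.items():
            if "*" in template and wildcard_match(template, candidate):
                return "MWE_WILDCARD", semantic_tags
    return None


# Hypothetical lexicon data, for illustration only.
lookup = {"ski_noun boot_noun": ["Z1"], "*_noun boot*_noun": ["Z2"]}
print(find_match(["Ski", "boot"], ["noun", "noun"], lookup))
print(find_match(["snow", "boots"], ["noun", "noun"], lookup))
```

A real tagger also has to scan every n-gram of the text rather than a single span, which this sketch leaves out.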
Parameters¶
- mwe_lexicon_lookup :
Dict[str, List[str]]
The data used to create the mwe_lexicon_collection instance attribute. A dictionary where the keys are MWE templates, of any pymusas.lexicon_collection.LexiconType, and the values are a list of associated semantic tags.
- pos_mapper :
Dict[str, List[str]], optional (default = None)
If not None, maps from the mwe_lexicon_lookup POS tagset to the desired POS tagset, whereby each mapping is a List of tags. At the moment there is no preference order in this list of POS tags. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
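To see why longer POS mappings cost more, the sketch below expands one MWE template into every variant that a mapping allows. This is only an illustration of the combinatorics, not how PyMUSAS applies `pos_mapper` internally. The function name `expand_template` is hypothetical, and leaving unmapped POS tags unchanged is an assumption of this sketch.

```python
from itertools import product
from typing import Dict, List


def expand_template(template: str, pos_mapper: Dict[str, List[str]]) -> List[str]:
    """Return every variant of `template` once each POS tag has been mapped."""
    options_per_word: List[List[str]] = []
    for unit in template.split():
        # Split on the last underscore: 'ski_noun' -> ('ski', 'noun').
        word, _, pos = unit.rpartition("_")
        # Assumption: a POS tag without a mapping is kept as it is.
        mapped_tags = pos_mapper.get(pos, [pos])
        options_per_word.append([f"{word}_{tag}" for tag in mapped_tags])
    return [" ".join(combination) for combination in product(*options_per_word)]


# A one-to-one mapping yields a single variant.
print(expand_template("ski_noun boot_noun", {"noun": ["NOUN"]}))
# A one-to-two mapping yields four variants for a two-word template.
print(expand_template("ski_noun boot_noun", {"noun": ["NOUN", "PROPN"]}))
```

With a mapping of length k applied to every word of an n-word template, the number of variants grows as k to the power n, which is one way to picture the slowdown described above.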
Instance Attributes¶
- mwe_lexicon_collection :
pymusas.lexicon_collection.MWELexiconCollection
A pymusas.lexicon_collection.MWELexiconCollection instance that has been initialised using the mwe_lexicon_lookup and pos_mapper parameters. This collection is used to find MWE rule matches.
__call__
class MWERule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]
Given the tokens, lemmas, and POS tags for each word in a text,
returns for each token a List of rule matches, each described by
a pymusas.rankers.ranking_meta_data.RankingMetaData object, based on
the rule matches stated in the class docstring above.
Parameters¶
- tokens :
List[str]
The tokens that are within the text.
- lemmas :
List[str]
The lemmas of the tokens.
- pos_tags :
List[str]
The Part Of Speech tags of the tokens.
Returns¶
List[List[RankingMetaData]]
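The nested return type is easier to read with a concrete shape in mind: the outer list has one entry per token, and each inner list holds the matches for that token. The sketch below uses a hypothetical `SpanMatch` dataclass as a stand-in for `RankingMetaData`, whose real fields are not listed here. It also assumes that a match spanning several tokens appears in the inner list of each token it covers, and that tokens without a match get an empty list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpanMatch:
    """Hypothetical stand-in for RankingMetaData."""
    start: int  # index of the first token in the match
    end: int  # index one past the last token in the match
    semantic_tags: Tuple[str, ...]


def per_token_matches(n_tokens: int, matches: List[SpanMatch]) -> List[List[SpanMatch]]:
    """Build one list of matches per token, aligned with the token list."""
    result: List[List[SpanMatch]] = [[] for _ in range(n_tokens)]
    for match in matches:
        for token_index in range(match.start, match.end):
            result[token_index].append(match)
    return result


tokens = ["ski", "boot", "shop"]
ski_boot = SpanMatch(start=0, end=2, semantic_tags=("Z1",))  # made-up tag
output = per_token_matches(len(tokens), [ski_boot])
for token, token_matches in zip(tokens, output):
    print(token, token_matches)
```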
to_bytes
class MWERule(Rule):
| ...
| def to_bytes() -> bytes
Serialises the MWERule to a bytestring.
Returns¶
bytes
from_bytes
class MWERule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "MWERule"
Loads MWERule from the given bytestring and returns it.
Parameters¶
- bytes_data :
bytes
The bytestring to load.
Returns¶
MWERule
__eq__
class MWERule(Rule):
| ...
| def __eq__(other: object) -> bool
Given another object to compare to, it will return True if the other
object is the same class and was initialised with the same argument
values.
Parameters¶
- other :
object
The object to compare to.
Returns¶
bool
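The contract shared by to_bytes, from_bytes and __eq__ is a round trip: serialising a rule and loading it back should give an object that compares equal to the original. The sketch below shows that contract on a hypothetical `ToyRule` class. It uses JSON purely for illustration and makes no claim about the serialisation format MWERule uses.

```python
import json
from typing import Dict, List, Optional


class ToyRule:
    """Hypothetical stand-in with the same three methods as MWERule."""

    def __init__(
        self,
        lookup: Dict[str, List[str]],
        pos_mapper: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        self.lookup = lookup
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # Store only the constructor arguments; they fully describe the rule.
        payload = {"lookup": self.lookup, "pos_mapper": self.pos_mapper}
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyRule":
        payload = json.loads(bytes_data.decode("utf-8"))
        return ToyRule(payload["lookup"], payload["pos_mapper"])

    def __eq__(self, other: object) -> bool:
        # Same class and same argument values, as described above.
        if not isinstance(other, ToyRule):
            return False
        return self.lookup == other.lookup and self.pos_mapper == other.pos_mapper


rule = ToyRule({"ski_noun boot_noun": ["Z1"]}, {"noun": ["NOUN"]})
restored = ToyRule.from_bytes(rule.to_bytes())
print(restored == rule)
```

Rebuilding the object from its constructor arguments keeps the equality check and the serialised payload in step with each other.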