mwe
pymusas.taggers.rules.mwe
MWERule
class MWERule(Rule):
| ...
| def __init__(
| self,
| mwe_lexicon_lookup: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| ) -> None
A Multi Word Expression (MWE) rule match can be one of the following:
- MWE_NON_SPECIAL match: the combined token/lemma and POS is found within the given MWE Lexicon Collection (self.mwe_lexicon_collection).
- MWE_WILDCARD match: the combined token/lemma and POS matches a wildcard MWE template within the MWE Lexicon Collection (self.mwe_lexicon_collection).
All rule matches use the pymusas.lexicon_collection.MWELexiconCollection.mwe_match method for matching. Matches are found based on both the original token/lemma and the lowercased version of the token/lemma.
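To make the two match types concrete, here is a minimal, self-contained sketch in plain Python; it does not call PyMUSAS. The names toy_mwe_match and TOY_LEXICON are hypothetical, templates are assumed to be space-separated "word_POS" pieces with "*" as a per-piece wildcard, and the search order (exact match before wildcard, original form before lowercased form) is an assumption rather than a description of mwe_match.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: MWE template -> semantic tags.
TOY_LEXICON: Dict[str, List[str]] = {
    "New_PROPN York_PROPN": ["Z2"],
    "ice_NOUN cream_NOUN": ["F1"],
    "*_PROPN Street_PROPN": ["Z2"],  # wildcard template
}


def _wildcard_matches(template: str, pieces: List[str]) -> bool:
    # Compare piece by piece so "*" never spans more than one word.
    patterns = template.split(" ")
    return len(patterns) == len(pieces) and all(
        fnmatchcase(piece, pattern) for piece, pattern in zip(pieces, patterns)
    )


def toy_mwe_match(
    words: List[str], pos_tags: List[str], lexicon: Dict[str, List[str]]
) -> Optional[Tuple[str, str, List[str]]]:
    """Return (match_type, template, semantic_tags), or None if nothing matches."""
    # Try the original words first, then their lowercased forms (POS unchanged).
    for candidate in (words, [word.lower() for word in words]):
        pieces = [f"{word}_{pos}" for word, pos in zip(candidate, pos_tags)]
        combined = " ".join(pieces)
        if combined in lexicon:
            return ("MWE_NON_SPECIAL", combined, lexicon[combined])
        for template, tags in lexicon.items():
            if "*" in template and _wildcard_matches(template, pieces):
                return ("MWE_WILDCARD", template, tags)
    return None


print(toy_mwe_match(["Baker", "Street"], ["PROPN", "PROPN"], TOY_LEXICON))
# ('MWE_WILDCARD', '*_PROPN Street_PROPN', ['Z2'])
```

"Ice cream" only matches after lowercasing, which is the reason for also checking lowercased forms.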
Parameters
- mwe_lexicon_lookup : Dict[str, List[str]]
  The data used to create the mwe_lexicon_collection instance attribute. A dictionary where the keys are MWE templates, of any pymusas.lexicon_collection.LexiconType, and the values are a list of associated semantic tags.
- pos_mapper : Dict[str, List[str]], optional (default = None)
  If not None, maps from the mwe_lexicon_lookup POS tagset to the desired POS tagset, whereby the mapping is a List of tags. At the moment there is no preference order in this list of POS tags. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping will have no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
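One way to see why longer POS mappings cost more: if every POS tag in a template is rewritten into each of its mapped tags, a template with n pieces and k mapped tags per piece expands into k**n variants. The sketch below is a hypothetical illustration in plain Python (expand_template, the toy lexicon, and the toy mappers are not part of PyMUSAS), and it assumes tags missing from the mapper are kept unchanged.

```python
from itertools import product
from typing import Dict, List

# Hypothetical inputs shaped like mwe_lexicon_lookup and pos_mapper.
toy_lexicon_lookup: Dict[str, List[str]] = {"snow_noun boots_noun": ["B5"]}
one_to_one: Dict[str, List[str]] = {"noun": ["NOUN"]}
one_to_two: Dict[str, List[str]] = {"noun": ["NOUN", "PROPN"]}


def expand_template(template: str, pos_mapper: Dict[str, List[str]]) -> List[str]:
    """Rewrite each "word_POS" piece with every mapped tag and return all combinations."""
    options: List[List[str]] = []
    for piece in template.split(" "):
        word, _, pos = piece.rpartition("_")
        mapped_tags = pos_mapper.get(pos, [pos])  # unmapped tags stay as they are
        options.append([f"{word}_{tag}" for tag in mapped_tags])
    return [" ".join(combination) for combination in product(*options)]


for template in toy_lexicon_lookup:
    print(expand_template(template, one_to_one))  # ['snow_NOUN boots_NOUN']
    print(len(expand_template(template, one_to_two)))  # 4
```

A one-to-one mapper leaves the number of templates unchanged, which is consistent with the note that it has no speed impact.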
Instance Attributes
- mwe_lexicon_collection : pymusas.lexicon_collection.MWELexiconCollection
  A pymusas.lexicon_collection.MWELexiconCollection instance that has been initialised using the mwe_lexicon_lookup and pos_mapper parameters. This collection is used to find MWE rule matches.
__call__
class MWERule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]
Given the tokens, lemmas, and POS tags for each word in a text, it returns for each token a List of rule matches, each defined by a pymusas.rankers.ranking_meta_data.RankingMetaData object, based on the rule matches stated in the class docstring above.
Parameters
- tokens : List[str]
  The tokens that are within the text.
- lemmas : List[str]
  The lemmas of the tokens.
- pos_tags : List[str]
  The Part Of Speech tags of the tokens.
Returns
List[List[RankingMetaData]]
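The shape of the return value (one inner list per input token) can be illustrated with a simplified sketch in plain Python. ToyMatch is a hypothetical stand-in holding only a few fields, not the real RankingMetaData; the sketch checks exact matches on tokens only and ignores lemmas for brevity; and it assumes every token inside a matched span receives that span's match.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyMatch:
    """Hypothetical, simplified stand-in for RankingMetaData."""
    template: str
    start: int  # index of the first token in the matched span
    end: int  # index one past the last token in the matched span
    semantic_tags: List[str]


def toy_mwe_call(
    tokens: List[str], pos_tags: List[str], lexicon: Dict[str, List[str]]
) -> List[List[ToyMatch]]:
    """Return one list of matches per token, checking every n-gram with n >= 2."""
    matches: List[List[ToyMatch]] = [[] for _ in tokens]
    for start in range(len(tokens)):
        for end in range(start + 2, len(tokens) + 1):
            template = " ".join(
                f"{token}_{pos}"
                for token, pos in zip(tokens[start:end], pos_tags[start:end])
            )
            if template in lexicon:
                match = ToyMatch(template, start, end, lexicon[template])
                # Every token covered by the MWE gets the same match.
                for index in range(start, end):
                    matches[index].append(match)
    return matches


result = toy_mwe_call(
    ["I", "love", "ice", "cream"],
    ["PRON", "VERB", "NOUN", "NOUN"],
    {"ice_NOUN cream_NOUN": ["F1"]},
)
print([len(token_matches) for token_matches in result])  # [0, 0, 1, 1]
```

Tokens that are not part of any MWE still get an entry (an empty list), so the outer list always has the same length as tokens.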
to_bytes
class MWERule(Rule):
| ...
| def to_bytes(self) -> bytes
Serialises the MWERule to a bytestring.
Returns
bytes
from_bytes
class MWERule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "MWERule"
Loads an MWERule from the given bytestring and returns it.
Parameters
- bytes_data : bytes
  The bytestring to load.
Returns
MWERule
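A to_bytes/from_bytes pair is typically used for a round trip: serialise a rule, store or send the bytes, then rebuild an equivalent rule. The sketch below is a hypothetical stand-in (ToyMWERule is not the PyMUSAS class) that serialises its two constructor arguments with JSON and rebuilds itself from them; the real MWERule may use a different format and may store different data.

```python
import json
from typing import Dict, List, Optional


class ToyMWERule:
    """Hypothetical stand-in that only keeps its constructor arguments."""

    def __init__(
        self,
        mwe_lexicon_lookup: Dict[str, List[str]],
        pos_mapper: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        self.mwe_lexicon_lookup = mwe_lexicon_lookup
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # JSON is an assumption for this sketch; None is stored as null.
        payload = {
            "mwe_lexicon_lookup": self.mwe_lexicon_lookup,
            "pos_mapper": self.pos_mapper,
        }
        return json.dumps(payload).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyMWERule":
        # Rebuild the rule from the arguments it was originally created with.
        payload = json.loads(bytes_data.decode("utf-8"))
        return ToyMWERule(payload["mwe_lexicon_lookup"], payload["pos_mapper"])


rule = ToyMWERule({"ice_noun cream_noun": ["F1"]}, {"noun": ["NOUN"]})
restored = ToyMWERule.from_bytes(rule.to_bytes())
print(restored.mwe_lexicon_lookup)  # {'ice_noun cream_noun': ['F1']}
```

Serialising the constructor arguments rather than any derived lookup structure keeps the bytes small and means from_bytes only has to call the constructor again.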
__eq__
class MWERule(Rule):
| ...
| def __eq__(self, other: object) -> bool
Given another object to compare to, it returns True if the other object is of the same class and was initialised with the same argument values.
Parameters
- other : object
  The object to compare to.
Returns
bool
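The equality contract above (same class and same constructor argument values) can be sketched as follows. ToyEqRule is a hypothetical class used only for illustration; it is not the PyMUSAS implementation.

```python
from typing import Dict, List, Optional


class ToyEqRule:
    """Hypothetical class illustrating equality based on constructor arguments."""

    def __init__(
        self,
        mwe_lexicon_lookup: Dict[str, List[str]],
        pos_mapper: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        self.mwe_lexicon_lookup = mwe_lexicon_lookup
        self.pos_mapper = pos_mapper

    def __eq__(self, other: object) -> bool:
        # Objects of any other class are never equal to this rule.
        if not isinstance(other, ToyEqRule):
            return False
        return (
            self.mwe_lexicon_lookup == other.mwe_lexicon_lookup
            and self.pos_mapper == other.pos_mapper
        )


lookup = {"ice_noun cream_noun": ["F1"]}
print(ToyEqRule(lookup) == ToyEqRule(dict(lookup)))  # True
print(ToyEqRule(lookup) == ToyEqRule(lookup, {"noun": ["NOUN"]}))  # False
```

Defining __eq__ without __hash__ makes instances of this toy class unhashable, so they cannot be used as dictionary keys or set members.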