single_word
pymusas.taggers.rules.single_word
SingleWordRule
class SingleWordRule(Rule):
| ...
| def __init__(
| self,
| lexicon_collection: Dict[str, List[str]],
| lemma_lexicon_collection: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| )
A single word rule is a rule that matches on single word lexicon entries. Entries can be matched on:
- Token and the token's Part Of Speech (POS) tag, e.g. driving|adj
- Lemma and the lemma's POS tag, e.g. drive|adj
- Token, e.g. driving
- Lemma, e.g. drive
In all cases, matches are found based on both the original token/lemma and the lowercased version of the token/lemma. These matches are found by searching the lexicon_collection and lemma_lexicon_collection attributes.
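The lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PyMUSAS implementation: single_word_matches is a hypothetical helper that works on plain dictionaries in the same format as the lexicon_collection and lemma_lexicon_collection parameters, returns (key, semantic tags) pairs rather than RankingMetaData objects, and ignores pos_mapper. The semantic tags in the usage lines are examples only.

```python
from typing import Dict, List, Tuple


def single_word_matches(
    token: str,
    lemma: str,
    pos: str,
    lexicon_collection: Dict[str, List[str]],
    lemma_lexicon_collection: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: return (matched key, semantic tags) pairs for one word."""
    matches: List[Tuple[str, List[str]]] = []
    # Original and lowercased forms of the token and lemma; dict.fromkeys
    # drops duplicates (e.g. when the token is already lowercase) but keeps order.
    forms = dict.fromkeys([token, token.lower(), lemma, lemma.lower()])
    for text in forms:
        # Token/lemma combined with its POS tag, e.g. "driving|adj".
        pos_key = f"{text}|{pos}"
        if pos_key in lexicon_collection:
            matches.append((pos_key, lexicon_collection[pos_key]))
        # Token/lemma on its own, e.g. "driving".
        if text in lemma_lexicon_collection:
            matches.append((text, lemma_lexicon_collection[text]))
    return matches


# Illustrative lexicon entries; the semantic tags are examples only.
lexicon = {"driving|adj": ["M3"]}
lemma_lexicon = {"drive": ["M3", "A1.1.1"]}
print(single_word_matches("Driving", "drive", "adj", lexicon, lemma_lexicon))
# [('driving|adj', ['M3']), ('drive', ['M3', 'A1.1.1'])]
```

Here "Driving" only matches through its lowercased form "driving|adj", while the lemma "drive" matches the POS-less entry.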
Parameters
- lexicon_collection :
Dict[str, List[str]]
The data used to create the lexicon_collection instance attribute. A dictionary where the keys are a combination of lemma/token and POS in the format {lemma}|{POS}, and the values are a list of associated semantic tags.
- lemma_lexicon_collection :
Dict[str, List[str]]
The data used to create the lemma_lexicon_collection instance attribute. A dictionary where the keys are just a lemma/token in the format {lemma}, and the values are a list of associated semantic tags.
- pos_mapper :
Dict[str, List[str]], optional (default = None)
If not None, maps from the given token's POS tagset to the desired POS tagset, whereby the mapping is a List of tags; at the moment there is no preference order in this list of POS tags. The POS mapping is useful in situations where the token's POS tagset is different from the one used in the lexicons. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one-to-one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
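To illustrate why longer pos_mapper lists slow the tagger, the sketch below builds the {lemma}|{POS} keys that would be looked up for one word. It is a hypothetical helper, not a PyMUSAS function, and the mapping in the usage lines is invented for the example rather than taken from pymusas.pos_mapper. How the library treats a POS tag that is missing from the mapper is not stated here; this sketch assumes such a tag produces no POS-based keys.

```python
from typing import Dict, List, Optional


def lexicon_keys(
    text: str, pos: str, pos_mapper: Optional[Dict[str, List[str]]] = None
) -> List[str]:
    """Hypothetical helper: the {lemma}|{POS} keys to look up for one word."""
    if pos_mapper is None:
        # No mapper: the token's own POS tag is used unchanged.
        pos_tags = [pos]
    else:
        # Assumption of this sketch: an unmapped tag yields no keys.
        pos_tags = pos_mapper.get(pos, [])
    # One lookup per mapped tag: a one-to-one mapping adds no extra lookups,
    # while a longer list means more lookups for every word.
    return [f"{text}|{tag}" for tag in pos_tags]


# Hypothetical mapping from an upper-case tagset to lexicon-style tags.
mapper = {"ADJ": ["adj"], "NOUN": ["noun", "pnoun"]}
print(lexicon_keys("driving", "ADJ", mapper))  # ['driving|adj']
print(lexicon_keys("bank", "NOUN", mapper))    # ['bank|noun', 'bank|pnoun']
```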
Instance Attributes
- lexicon_collection :
pymusas.lexicon_collection.LexiconCollection
A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lexicon_collection parameter.
- lemma_lexicon_collection :
pymusas.lexicon_collection.LexiconCollection
A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lemma_lexicon_collection parameter.
- pos_mapper :
Dict[str, List[str]], optional (default = None)
The given pos_mapper.
__call__
class SingleWordRule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]
Given the tokens, lemmas, and POS tags for each word in a text, it returns, for each token, a List of rule matches, where each match is a pymusas.rankers.ranking_meta_data.RankingMetaData object. Matches are found using the rules stated in the class docstring above.
Parameters
- tokens :
List[str]
The tokens that are within the text.
- lemmas :
List[str]
The lemmas of the tokens.
- pos_tags :
List[str]
The Part Of Speech tags of the tokens.
Returns
List[List[RankingMetaData]]
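The List[List[...]] return shape means one inner list per input word, aligned by index, with an empty inner list for a word that has no match. The sketch below shows only that shape. SimpleMatch is a hypothetical stand-in for RankingMetaData whose fields are invented for illustration, tag_text is a hypothetical function, and for brevity it only looks up the lowercased lemma (tokens are used just for the length check, which is also this sketch's own choice).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimpleMatch:
    """Hypothetical stand-in for RankingMetaData; these fields are illustrative."""
    lexicon_key: str
    semantic_tags: List[str]


def tag_text(
    tokens: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    lexicon_collection: Dict[str, List[str]],
    lemma_lexicon_collection: Dict[str, List[str]],
) -> List[List[SimpleMatch]]:
    # Parallel inputs: index i of each list describes the i-th word.
    if not (len(tokens) == len(lemmas) == len(pos_tags)):
        raise ValueError("tokens, lemmas and pos_tags must be the same length")
    results: List[List[SimpleMatch]] = []
    for lemma, pos in zip(lemmas, pos_tags):
        # For brevity only the lowercased lemma is checked here.
        text = lemma.lower()
        word_matches: List[SimpleMatch] = []
        pos_key = f"{text}|{pos}"
        if pos_key in lexicon_collection:
            word_matches.append(SimpleMatch(pos_key, lexicon_collection[pos_key]))
        if text in lemma_lexicon_collection:
            word_matches.append(SimpleMatch(text, lemma_lexicon_collection[text]))
        # A word with no matches still gets its own (empty) inner list.
        results.append(word_matches)
    return results


out = tag_text(
    ["The", "Driving", "fast"],
    ["the", "drive", "fast"],
    ["det", "verb", "adv"],
    {"drive|verb": ["M3"]},
    {"the": ["Z5"]},
)
print(len(out))  # 3
```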
to_bytes
class SingleWordRule(Rule):
| ...
| def to_bytes(self) -> bytes
Serialises the SingleWordRule to a bytestring.
Returns
bytes
from_bytes
class SingleWordRule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "SingleWordRule"
Loads SingleWordRule from the given bytestring and returns it.
Parameters
- bytes_data :
bytes
The bytestring to load.
Returns
SingleWordRule
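A to_bytes/from_bytes pair is expected to round-trip: loading the bytes produced by to_bytes should give back an equivalent rule. The sketch below shows that contract on a hypothetical class, SerialisableRuleSketch, that stores only the constructor arguments. Encoding them as UTF-8 JSON is an assumption made for this example; it is not the byte format PyMUSAS uses.

```python
import json
from typing import Dict, List, Optional


class SerialisableRuleSketch:
    """Hypothetical stand-in that keeps only the constructor arguments."""

    def __init__(
        self,
        lexicon_collection: Dict[str, List[str]],
        lemma_lexicon_collection: Dict[str, List[str]],
        pos_mapper: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        self.lexicon_collection = lexicon_collection
        self.lemma_lexicon_collection = lemma_lexicon_collection
        self.pos_mapper = pos_mapper

    def to_bytes(self) -> bytes:
        # JSON is an assumption of this sketch, not the PyMUSAS format.
        payload = {
            "lexicon_collection": self.lexicon_collection,
            "lemma_lexicon_collection": self.lemma_lexicon_collection,
            "pos_mapper": self.pos_mapper,
        }
        return json.dumps(payload).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "SerialisableRuleSketch":
        # The payload keys match the constructor parameter names.
        payload = json.loads(bytes_data.decode("utf-8"))
        return SerialisableRuleSketch(**payload)


rule = SerialisableRuleSketch({"drive|verb": ["M3"]}, {"drive": ["M3"]})
restored = SerialisableRuleSketch.from_bytes(rule.to_bytes())
print(restored.lexicon_collection)  # {'drive|verb': ['M3']}
```

Because from_bytes is a staticmethod, it is called on the class rather than on an existing instance, mirroring the signature documented above.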
__eq__
class SingleWordRule(Rule):
| ...
| def __eq__(self, other: object) -> bool
Given another object to compare to, it returns True if the other object is of the same class and was initialised with the same argument values, otherwise False.
Parameters
- other :
object
The object to compare to.
Returns
bool
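The equality rule stated above (same class, same argument values) can be sketched on a hypothetical class as follows. ComparableRuleSketch is not part of PyMUSAS; it simply stores its constructor arguments and compares them one by one.

```python
from typing import Dict, List, Optional


class ComparableRuleSketch:
    """Hypothetical stand-in that compares on its constructor arguments."""

    def __init__(
        self,
        lexicon_collection: Dict[str, List[str]],
        lemma_lexicon_collection: Dict[str, List[str]],
        pos_mapper: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        self.lexicon_collection = lexicon_collection
        self.lemma_lexicon_collection = lemma_lexicon_collection
        self.pos_mapper = pos_mapper

    def __eq__(self, other: object) -> bool:
        # An object of a different class is never equal.
        if not isinstance(other, ComparableRuleSketch):
            return False
        # Otherwise compare every constructor argument.
        return (
            self.lexicon_collection == other.lexicon_collection
            and self.lemma_lexicon_collection == other.lemma_lexicon_collection
            and self.pos_mapper == other.pos_mapper
        )


a = ComparableRuleSketch({"drive|verb": ["M3"]}, {"drive": ["M3"]})
b = ComparableRuleSketch({"drive|verb": ["M3"]}, {"drive": ["M3"]})
c = ComparableRuleSketch({"drive|verb": ["M3"]}, {"drive": ["M3"]}, {"VERB": ["verb"]})
print(a == b, a == c, a == "not a rule")  # True False False
```

Note that isinstance also accepts subclasses; a stricter "same class" check would compare type(self) is type(other). Defining __eq__ without __hash__ also makes instances of this sketch unhashable in Python.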