single_word
pymusas.taggers.rules.single_word
SingleWordRule​
class SingleWordRule(Rule):
| ...
| def __init__(
| self,
| lexicon_collection: Dict[str, List[str]],
| lemma_lexicon_collection: Dict[str, List[str]],
| pos_mapper: Optional[Dict[str, List[str]]] = None
| )
A single word rule is a rule that matches on single word lexicon entries. Entries can be matched on:
- Token and the token's Part Of Speech (POS) tag, e.g. driving|adj
- Lemma and the lemma's POS tag, e.g. drive|adj
- Token, e.g. driving
- Lemma, e.g. drive
In all cases matches are found based on the original token/lemma and lower cased versions of the token/lemma. These matches are found by searching the lexicon_collection and lemma_lexicon_collection attributes.
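The four match types above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PyMUSAS implementation: find_matches and case_variants are hypothetical helpers, plain dictionaries stand in for the two lexicon collection attributes, the entries and TAG_* values are invented sample data, and the order in which matches are collected is assumed.

```python
from typing import Dict, List


def case_variants(text: str) -> List[str]:
    """Return the original text, plus its lower cased form if different."""
    lowered = text.lower()
    return [text] if lowered == text else [text, lowered]


def find_matches(
    token: str,
    lemma: str,
    pos: str,
    lexicon_collection: Dict[str, List[str]],
    lemma_lexicon_collection: Dict[str, List[str]],
) -> List[List[str]]:
    """Collect the semantic tags of every matching entry (hypothetical helper)."""
    matches: List[List[str]] = []
    # Token with POS, then lemma with POS: keys look like "driving|adj".
    for text in (token, lemma):
        for variant in case_variants(text):
            key = f"{variant}|{pos}"
            if key in lexicon_collection:
                matches.append(lexicon_collection[key])
    # Token alone, then lemma alone: keys look like "driving".
    for text in (token, lemma):
        for variant in case_variants(text):
            if variant in lemma_lexicon_collection:
                matches.append(lemma_lexicon_collection[variant])
    return matches


# Invented sample data; the tags are placeholders, not real semantic tags.
lexicon = {"driving|adj": ["TAG_A"], "drive|adj": ["TAG_B"]}
lemma_lexicon = {"drive": ["TAG_C"]}

print(find_matches("Driving", "drive", "adj", lexicon, lemma_lexicon))
# [['TAG_A'], ['TAG_B'], ['TAG_C']]
```

Here the capitalised token Driving only matches through its lower cased form, while the lemma drive matches both with and without its POS tag.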
Parameters
- lexicon_collection : Dict[str, List[str]]
  The data used to create the lexicon_collection instance attribute. A dictionary where the keys are a combination of lemma/token and POS in the following format: {lemma}|{POS}, and the values are a list of associated semantic tags.
- lemma_lexicon_collection : Dict[str, List[str]]
  The data used to create the lemma_lexicon_collection instance attribute. A dictionary where the keys are just a lemma/token in the following format: {lemma}, and the values are a list of associated semantic tags.
- pos_mapper : Dict[str, List[str]], optional (default = None)
  If not None, maps from the given token's POS tagset to the desired POS tagset, whereby each mapping is a List of tags; at the moment there is no preference order in this list of POS tags. The POS mapping is useful in situations where the token's POS tagset is different from the one used in the lexicons. Note that the longer the List[str] for each POS mapping, the slower the tagger; a one to one mapping has no speed impact on the tagger. A selection of POS mappers can be found in pymusas.pos_mapper.
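To illustrate why a longer mapping list costs more lookups, the following sketch builds the {lemma}|{POS} keys that would be tried for one token. It is an assumption-laden illustration rather than library code: lookup_keys is a hypothetical helper, sample_mapper is an invented mapping, and returning no keys for a tag missing from the mapper is an assumption about behaviour the text above does not specify.

```python
from typing import Dict, List, Optional


def lookup_keys(
    text: str, pos: str, pos_mapper: Optional[Dict[str, List[str]]] = None
) -> List[str]:
    """Build the "{text}|{POS}" keys to try for one token or lemma."""
    if pos_mapper is None:
        mapped_tags = [pos]
    else:
        # Assumption: a tag missing from the mapper yields no keys.
        mapped_tags = pos_mapper.get(pos, [])
    return [f"{text}|{tag}" for tag in mapped_tags]


# Invented mapping from a tagger's tagset to a lexicon's tagset.
sample_mapper = {"ADJ": ["adj"], "VERB": ["verb", "aux"]}

print(lookup_keys("driving", "ADJ", sample_mapper))   # ['driving|adj']
print(lookup_keys("driving", "VERB", sample_mapper))  # ['driving|verb', 'driving|aux']
```

A one to one mapping produces a single key per token, the same number as with no mapper at all, whereas each extra tag in a mapping adds one more lexicon lookup.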
Instance Attributes
- lexicon_collection : pymusas.lexicon_collection.LexiconCollection
  A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lexicon_collection parameter.
- lemma_lexicon_collection : pymusas.lexicon_collection.LexiconCollection
  A pymusas.lexicon_collection.LexiconCollection instance that has been initialised using the lemma_lexicon_collection parameter.
- pos_mapper : Dict[str, List[str]], optional (default = None)
  The given pos_mapper.
__call__​
class SingleWordRule(Rule):
| ...
| def __call__(
| self,
| tokens: List[str],
| lemmas: List[str],
| pos_tags: List[str]
| ) -> List[List[RankingMetaData]]
Given the tokens, lemmas, and POS tags for each word in a text, it returns for each token a List of rule matches, each defined by a pymusas.rankers.ranking_meta_data.RankingMetaData object, based on the rule matches stated in the class docstring above.
Parameters
- tokens : List[str]
  The tokens that are within the text.
- lemmas : List[str]
  The lemmas of the tokens.
- pos_tags : List[str]
  The Part Of Speech tags of the tokens.
Returns
List[List[RankingMetaData]]
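The shape of the inputs and output can be sketched as follows: three parallel lists go in, and one inner list per token comes out. This is only a simplified stand-in under stated assumptions. match_per_token is a hypothetical function, matched lexicon keys take the place of RankingMetaData objects, and both the length check and the empty list for an unmatched token are assumptions rather than documented behaviour.

```python
from typing import Dict, List


def match_per_token(
    tokens: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    lexicon: Dict[str, List[str]],
) -> List[List[str]]:
    """Return one list of matched lexicon keys per token (hypothetical)."""
    if not (len(tokens) == len(lemmas) == len(pos_tags)):
        raise ValueError("tokens, lemmas, and pos_tags must align one to one")
    results: List[List[str]] = []
    for token, lemma, pos in zip(tokens, lemmas, pos_tags):
        candidates = [f"{token}|{pos}", f"{lemma}|{pos}"]
        # Assumption: a token with no match gets an empty list.
        results.append([key for key in candidates if key in lexicon])
    return results


# Invented sample data; the tags are placeholders.
sample_lexicon = {"driving|adj": ["TAG_A"], "drive|adj": ["TAG_B"]}
output = match_per_token(
    ["the", "driving", "force"],
    ["the", "drive", "force"],
    ["det", "adj", "noun"],
    sample_lexicon,
)
print(output)  # [[], ['driving|adj', 'drive|adj'], []]
```

The outer list always has the same length as tokens, so the result at index i describes the token at index i.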
to_bytes​
class SingleWordRule(Rule):
| ...
| def to_bytes() -> bytes
Serialises the SingleWordRule to a bytestring.
Returns
bytes
from_bytes​
class SingleWordRule(Rule):
| ...
| @staticmethod
| def from_bytes(bytes_data: bytes) -> "SingleWordRule"
Loads SingleWordRule from the given bytestring and returns it.
Parameters
- bytes_data : bytes
  The bytestring to load.
Returns
SingleWordRule
__eq__​
class SingleWordRule(Rule):
| ...
| def __eq__(other: object) -> bool
Given another object to compare to, it will return True if the other object is the same class and was initialised with the same argument values.
Parameters
- other : object
  The object to compare to.
Returns
True
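The equality rule described here (same class, same initialisation arguments) can be sketched with a small stand-in class, which also shows how such an equality check could be used to verify a serialisation round trip. ToyRule is hypothetical and not part of PyMUSAS, JSON is used purely for the sketch and says nothing about the library's actual byte format, and whether SingleWordRule round trips to an equal object is an assumption not stated above.

```python
import json
from typing import Dict, List


class ToyRule:
    """Hypothetical stand-in for a rule; not part of PyMUSAS."""

    def __init__(self, lexicon: Dict[str, List[str]]) -> None:
        self.lexicon = lexicon

    def to_bytes(self) -> bytes:
        # JSON is used only for this sketch; the library's format may differ.
        return json.dumps(self.lexicon, sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyRule":
        return ToyRule(json.loads(bytes_data.decode("utf-8")))

    def __eq__(self, other: object) -> bool:
        # Equal only when the class matches and the arguments match.
        if not isinstance(other, ToyRule):
            return False
        return self.lexicon == other.lexicon


rule = ToyRule({"driving|adj": ["TAG_A"]})
restored = ToyRule.from_bytes(rule.to_bytes())

print(restored == rule)                       # True
print(rule == ToyRule({"drive|adj": []}))     # False
print(rule == {"driving|adj": ["TAG_A"]})     # False
```

The last comparison is False even though the dictionary holds the same data, because the other object is not the same class.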