
pos_mapper

pymusas.pos_mapper



Attributes

UPOS_TO_USAS_CORE

UPOS_TO_USAS_CORE: Dict[str, List[str]] = {
'ADJ': ['adj'],
'ADP': ['prep'],
'ADV': ['adv'],
'AUX': ['verb'],
'CCONJ': ['c ...

USAS_CORE_TO_UPOS

USAS_CORE_TO_UPOS: Dict[str, List[str]] = {
'adj': ['ADJ'],
'prep': ['ADP'],
'adv': ['ADV'],
'verb': ['VERB', 'AUX'],
'con ...
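
The reverse mapping relates to the forward one by inversion: each USAS core tag collects the UPOS tags that map to it. The sketch below illustrates the idea using only the four UPOS entries visible above. `forward_subset` and `invert_mapping` are hypothetical names introduced here, and this is not necessarily how PyMUSAS builds `USAS_CORE_TO_UPOS`.

```python
from typing import Dict, List

# Subset of UPOS_TO_USAS_CORE, copied from the entries shown above.
forward_subset: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'ADP': ['prep'],
    'ADV': ['adv'],
    'AUX': ['verb'],
}


def invert_mapping(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: group source tags under each target tag."""
    inverted: Dict[str, List[str]] = {}
    for source_tag, target_tags in mapping.items():
        for target_tag in target_tags:
            # Source tags are appended in the iteration order of `mapping`.
            inverted.setdefault(target_tag, []).append(source_tag)
    return inverted


reverse_subset = invert_mapping(forward_subset)
```

On this subset, 'verb' inverts to a single-element List containing 'AUX'. The full `USAS_CORE_TO_UPOS` lists `['VERB', 'AUX']` for 'verb', so the ordering in the real reverse mapping need not match a simple inversion like this one.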

PENN_CHINESE_TREEBANK_TO_USAS_CORE

PENN_CHINESE_TREEBANK_TO_USAS_CORE: Dict[str, List[str]] = {
'AS': ['part'],
'DEC': ['part'],
'DEG': ['part'],
'DER': ['part'],
'DEV': ['pa ...

USAS_CORE_TO_PENN_CHINESE_TREEBANK

USAS_CORE_TO_PENN_CHINESE_TREEBANK: Dict[str, List[str]] = {
'part': ['AS', 'DEC', 'DEG', 'DER', 'DEV', 'ETC', 'LC', 'MSP', 'SP'],
'fw': ['BA', 'FW', ' ...

BASIC_CORCENCC_TO_USAS_CORE

BASIC_CORCENCC_TO_USAS_CORE: Dict[str, List[str]] = {
"E": ["noun"],
"YFB": ["art"],
"Ar": ["prep"],
"Cys": ["conj"],
"Rhi": ["num"] ...

USAS_CORE_TO_BASIC_CORCENCC

USAS_CORE_TO_BASIC_CORCENCC: Dict[str, List[str]] = {
"noun": ["E"],
"pnoun": ["E"],
"art": ["YFB"],
"det": ["YFB"],
"prep": ["Ar"], ...

upos_to_usas_core

def upos_to_usas_core(upos_tag: str) -> List[str]

Given a Universal Part Of Speech (UPOS) tag, returns a List of the equivalent USAS core POS tags. If the List contains more than one tag, the first tag is the most equivalent.

An empty List means that an invalid UPOS tag was given.

The mappings between UPOS and USAS core can be seen in UPOS_TO_USAS_CORE.

Parameters

  • upos_tag : str
    UPOS tag, expected to be all upper case.

Returns

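  • List[str]
    USAS core POS tags equivalent to upos_tag, most equivalent first. Empty if upos_tag is not a valid UPOS tag.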

Examples

from pymusas.pos_mapper import upos_to_usas_core
assert upos_to_usas_core('CCONJ') == ['conj']
# Most equivalent tag for 'X' is 'fw'
assert upos_to_usas_core('X') == ['fw', 'xx']
assert upos_to_usas_core('Unknown') == []
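
When tagging a sequence, a caller often wants a single USAS core tag per token. The minimal sketch below does not import PyMUSAS. It assumes the lookup behaves like a dictionary access that yields an empty List for unknown tags, and it uses a small mapping built only from entries shown on this page. `demo_mapping`, `most_equivalent_tag`, and the `fallback` parameter are hypothetical names for illustration, not part of the library.

```python
from typing import Dict, List, Optional

# Small stand-in for UPOS_TO_USAS_CORE, using only entries shown on this page.
demo_mapping: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'CCONJ': ['conj'],
    'X': ['fw', 'xx'],
}


def most_equivalent_tag(upos_tag: str,
                        mapping: Dict[str, List[str]],
                        fallback: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first (most equivalent) USAS core tag.

    Returns `fallback` when the UPOS tag has no mapping, mirroring the
    documented empty-List result for invalid tags.
    """
    usas_tags = mapping.get(upos_tag, [])
    return usas_tags[0] if usas_tags else fallback


tags = [most_equivalent_tag(tag, demo_mapping) for tag in ['ADJ', 'X', 'Unknown']]
```

With the real library, the `mapping.get(...)` call would presumably be replaced by `upos_to_usas_core(upos_tag)`. Checking for an empty List before indexing avoids an `IndexError` on invalid tags.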