pos_mapper

pymusas.pos_mapper

Attributes

UPOS_TO_USAS_CORE

UPOS_TO_USAS_CORE: Dict[str, List[str]] = {
'ADJ': ['adj'],
'ADP': ['prep'],
'ADV': ['adv'],
'AUX': ['verb'],
'CCONJ': ['c ...

USAS_CORE_TO_UPOS

USAS_CORE_TO_UPOS: Dict[str, List[str]] = {
'adj': ['ADJ'],
'prep': ['ADP'],
'adv': ['ADV'],
'verb': ['VERB', 'AUX'],
'con ...

PENN_CHINESE_TREEBANK_TO_USAS_CORE

PENN_CHINESE_TREEBANK_TO_USAS_CORE: Dict[str, List[str]] = {
'AS': ['part'],
'DEC': ['part'],
'DEG': ['part'],
'DER': ['part'],
'DEV': ['pa ...

USAS_CORE_TO_PENN_CHINESE_TREEBANK

USAS_CORE_TO_PENN_CHINESE_TREEBANK: Dict[str, List[str]] = {
'part': ['AS', 'DEC', 'DEG', 'DER', 'DEV', 'ETC', 'LC', 'MSP', 'SP'],
'fw': ['BA', 'FW', ' ...

BASIC_CORCENCC_TO_USAS_CORE

BASIC_CORCENCC_TO_USAS_CORE: Dict[str, List[str]] = {
"E": ["noun"],
"YFB": ["art"],
"Ar": ["prep"],
"Cys": ["conj"],
"Rhi": ["num"] ...

USAS_CORE_TO_BASIC_CORCENCC

USAS_CORE_TO_BASIC_CORCENCC: Dict[str, List[str]] = {
"noun": ["E"],
"pnoun": ["E"],
"art": ["YFB"],
"det": ["YFB"],
"prep": ["Ar"], ...

upos_to_usas_core

def upos_to_usas_core(upos_tag: str) -> List[str]

Given a Universal Part Of Speech (UPOS) tag, returns a List of the equivalent USAS core POS tags. If the List contains more than one tag, the first tag is the closest equivalent.

An empty List means the given UPOS tag is not valid.

The mappings between UPOS and USAS core are defined in UPOS_TO_USAS_CORE.

Parameters

  • upos_tag : str
    UPOS tag, expected to be all upper case.

Returns

  • List[str]

Examples

from pymusas.pos_mapper import upos_to_usas_core
assert upos_to_usas_core('CCONJ') == ['conj']
# Most equivalent tag for 'X' is 'fw'
assert upos_to_usas_core('X') == ['fw', 'xx']
assert upos_to_usas_core('Unknown') == []
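
The lookup behaviour described for upos_to_usas_core can be sketched without importing pymusas. The snippet below is an illustration only, not the library's implementation: UPOS_EXCERPT is a hypothetical dictionary holding just the entries visible on this page, and lookup_usas_core and most_equivalent are hypothetical helper names. It assumes the lookup is case sensitive, which matches the note that the tag is expected to be all upper case.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: only the entries shown on this page,
# not the full UPOS_TO_USAS_CORE mapping.
UPOS_EXCERPT: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'ADP': ['prep'],
    'ADV': ['adv'],
    'AUX': ['verb'],
    'CCONJ': ['conj'],
    'X': ['fw', 'xx'],
}


def lookup_usas_core(upos_tag: str) -> List[str]:
    """Return a copy of the mapped tags, or an empty list for an unknown tag."""
    # Copying keeps callers from mutating the shared mapping by accident.
    return list(UPOS_EXCERPT.get(upos_tag, []))


def most_equivalent(upos_tag: str) -> Optional[str]:
    """Return the first (closest) tag, or None when the tag is unknown."""
    tags = lookup_usas_core(upos_tag)
    return tags[0] if tags else None


print(lookup_usas_core('X'))   # ['fw', 'xx']
print(most_equivalent('X'))    # fw
print(most_equivalent('adj'))  # None, because the keys are upper case
```

Returning None from most_equivalent, rather than raising, mirrors the empty-List convention above and lets a caller decide how to handle an unrecognised tag.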