pos_mapper

pymusas.pos_mapper

Attributes

UPOS_TO_USAS_CORE : Dict[str, List[str]]
A mapping from the Universal Part Of Speech (UPOS) tagset to the USAS core tagset. The UPOS tagset used here is the same as that used by the Universal Dependencies Treebank project. It differs slightly from the original tagset presented in the paper by Petrov et al. 2012; for the original tagset, see the following GitHub repository.
USAS_CORE_TO_UPOS : Dict[str, List[str]]
The reverse of UPOS_TO_USAS_CORE.
PENN_CHINESE_TREEBANK_TO_USAS_CORE : Dict[str, List[str]]
A mapping from the Penn Chinese Treebank tagset to the USAS core tagset. The Penn Chinese Treebank tagset used here differs slightly from the original: it contains three extra tags, X, URL, and INF, that appear to be unique to the spaCy Chinese models. For more information on how this mapping was created, see the following GitHub issue.
USAS_CORE_TO_PENN_CHINESE_TREEBANK : Dict[str, List[str]]
The reverse of PENN_CHINESE_TREEBANK_TO_USAS_CORE.
BASIC_CORCENCC_TO_USAS_CORE : Dict[str, List[str]]
A mapping from the basic CorCenCC tagset to the USAS core tagset. This mapping comes from table A.1 in the paper Leveraging Pre-Trained Embeddings for Welsh Taggers and from table 6 in the paper Towards A Welsh Semantic Annotation System.
USAS_CORE_TO_BASIC_CORCENCC : Dict[str, List[str]]
The reverse of BASIC_CORCENCC_TO_USAS_CORE.
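
To illustrate what "the reverse of" means for a Dict[str, List[str]] mapping, the sketch below inverts a small excerpt built only from entries visible on this page. The names `upos_excerpt` and `invert_mapping` are hypothetical and not part of pymusas. The library's own reverse mappings may be hand-written and may order their tags differently, so treat this as an approximation of the idea rather than the module's implementation.

```python
from typing import Dict, List

# Illustrative excerpt only: these entries appear on this page;
# this is NOT the full pymusas mapping.
upos_excerpt: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'ADP': ['prep'],
    'ADV': ['adv'],
    'AUX': ['verb'],
    'X': ['fw', 'xx'],
}


def invert_mapping(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each value tag back to the keys that list it."""
    inverted: Dict[str, List[str]] = {}
    for source_tag, target_tags in mapping.items():
        for target_tag in target_tags:
            # Keys are appended in the insertion order of the input mapping.
            inverted.setdefault(target_tag, []).append(source_tag)
    return inverted


usas_excerpt = invert_mapping(upos_excerpt)
```

Because 'X' maps to two USAS core tags in the excerpt, both 'fw' and 'xx' point back to 'X' in the inverted dictionary.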

UPOS_TO_USAS_CORE

UPOS_TO_USAS_CORE: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'ADP': ['prep'],
    'ADV': ['adv'],
    'AUX': ['verb'],
    'CCONJ': ['c ...

USAS_CORE_TO_UPOS

USAS_CORE_TO_UPOS: Dict[str, List[str]] = {
    'adj': ['ADJ'],
    'prep': ['ADP'],
    'adv': ['ADV'],
    'verb': ['VERB', 'AUX'],
    'con ...

PENN_CHINESE_TREEBANK_TO_USAS_CORE

PENN_CHINESE_TREEBANK_TO_USAS_CORE: Dict[str, List[str]] = {
    'AS': ['part'],
    'DEC': ['part'],
    'DEG': ['part'],
    'DER': ['part'],
    'DEV': ['pa ...

USAS_CORE_TO_PENN_CHINESE_TREEBANK

USAS_CORE_TO_PENN_CHINESE_TREEBANK: Dict[str, List[str]] = {
    'part': ['AS', 'DEC', 'DEG', 'DER', 'DEV', 'ETC', 'LC', 'MSP', 'SP'],
    'fw': ['BA', 'FW', ' ...

BASIC_CORCENCC_TO_USAS_CORE

BASIC_CORCENCC_TO_USAS_CORE: Dict[str, List[str]] = {
    "E": ["noun"],
    "YFB": ["art"],
    "Ar": ["prep"],
    "Cys": ["conj"],
    "Rhi": ["num"] ...

USAS_CORE_TO_BASIC_CORCENCC

USAS_CORE_TO_BASIC_CORCENCC: Dict[str, List[str]] = {
    "noun": ["E"],
    "pnoun": ["E"],
    "art": ["YFB"],
    "det": ["YFB"],
    "prep": ["Ar"], ...

upos_to_usas_core

def upos_to_usas_core(upos_tag: str) -> List[str]

Given a Universal Part Of Speech (UPOS) tag, returns a List of the equivalent USAS core POS tags. If the List contains more than one tag, the first tag is the most equivalent.

If the List is empty then an invalid UPOS tag was given.

The mappings between UPOS and USAS core can be seen in UPOS_TO_USAS_CORE.

Parameters

upos_tag : str
UPOS tag, expected to be all upper case.

Returns

List[str]

Examples

from pymusas.pos_mapper import upos_to_usas_core
assert upos_to_usas_core('CCONJ') == ['conj']
# Most equivalent tag for 'X' is 'fw'
assert upos_to_usas_core('X') == ['fw', 'xx']
assert upos_to_usas_core('Unknown') == []
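
Since the first tag in the returned List is the most equivalent and an empty List signals an invalid tag, callers may want a small wrapper that picks one tag or falls back to a default. The following is a minimal sketch under stated assumptions: `lookup` is a local stand-in that mimics the documented behaviour using an excerpt of entries shown on this page, so the snippet runs without pymusas installed. `most_equivalent` and its upper-casing of the input are hypothetical conveniences, not part of the library.

```python
from typing import Dict, List, Optional

# Stand-in excerpt (entries taken from this page), not the full mapping.
_UPOS_EXCERPT: Dict[str, List[str]] = {
    'ADJ': ['adj'],
    'CCONJ': ['conj'],
    'X': ['fw', 'xx'],
}


def lookup(upos_tag: str) -> List[str]:
    """Stand-in for upos_to_usas_core: unknown tags yield an empty list."""
    return _UPOS_EXCERPT.get(upos_tag, [])


def most_equivalent(upos_tag: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first (most equivalent) tag, or
    `default` when the tag is not recognised."""
    # Assumption: normalise case here, since tags are expected in upper case.
    tags = lookup(upos_tag.upper())
    return tags[0] if tags else default
```

With the real library, `lookup` could be replaced by `upos_to_usas_core`. Whether to normalise case or to treat a lower-case tag as invalid is a design choice left to the caller.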
