The UCREL Doc class holds text-level linguistic information, which is stored as a list of UCREL Token instances.

class UCREL_Doc

UCREL_Doc(text:str, tokens:List[UCREL_Token], sentence_indexes:Optional[List[Tuple[int, int]]]=None) :: Iterable

Class that holds text-level linguistic information, stored as a list of UCREL_Tokens. A text here can be anything from one word to a whole book or larger. However, be careful how much information you store in this class, as it is all held in memory.

This class is inspired by the Doc class from the spaCy API.

inherits from: collections.abc.Iterable and collections.abc.Sized

UCREL_Doc.__init__

UCREL_Doc.__init__(text:str, tokens:List[UCREL_Token], sentence_indexes:Optional[List[Tuple[int, int]]]=None)

  1. text: The text the Doc is representing.
  2. tokens: List of UCREL_Tokens
  3. sentence_indexes: Optional. A list of tuples, where each tuple holds the start and end token index of a sentence. These are used to create the sentences property and can be accessed through self._sentence_indexes.

from ucrel_api.ucrel_token import UCREL_Token

DOC_TOKENS = [UCREL_Token('hello', pos_tag='UH', usas_tag='Z4'), 
              UCREL_Token('how', 'RRQ', 'Z5'), 
              UCREL_Token('are', 'VBR', 'A3+'), UCREL_Token('you', 'PPY', 'Z8mf'),
              UCREL_Token('.', '.', None), UCREL_Token('I', 'PPIS1', 'Z8mf'),
              UCREL_Token('am', 'VBM', 'A3+'), UCREL_Token('great', 'JJ', 'A5.1+'),
              UCREL_Token('thanks', 'NN2', 'S1.2.4+'), UCREL_Token('.', '.', None)]

example_doc = UCREL_Doc(text='hello how are you. I am great thanks.',
                        tokens=DOC_TOKENS, sentence_indexes=[(0,5), (5,10)])
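
To illustrate how sentence_indexes relate to the token list, here is a minimal sketch that uses plain strings in place of UCREL_Token instances. It assumes each (start, end) pair is a half-open range, which is consistent with (0,5) and (5,10) covering all ten tokens of the example; it is not taken from the library's implementation.

```python
from typing import List, Tuple

# Plain strings stand in for UCREL_Token instances in this sketch.
tokens: List[str] = ['hello', 'how', 'are', 'you', '.',
                     'I', 'am', 'great', 'thanks', '.']
sentence_indexes: List[Tuple[int, int]] = [(0, 5), (5, 10)]

# Assumption: each (start, end) pair is a half-open range [start, end),
# so list slicing recovers the tokens of each sentence directly.
sentences = [tokens[start:end] for start, end in sentence_indexes]

for sentence in sentences:
    print(' '.join(sentence))
# hello how are you .
# I am great thanks .
```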

UCREL_Doc.__repr__

UCREL_Doc.__repr__()

String representation of the UCREL Doc instance:

example_doc
UCREL Doc (2 sentences):
First 3 tokens:
UCREL Token: hello	POS tag: UH	USAS tag: Z4
UCREL Token: how	Lemma: RRQ	POS tag: Z5
UCREL Token: are	Lemma: VBR	POS tag: A3+

UCREL_Doc.sentences

returns: An iterable over the sentences in the text, where each sentence is a list of UCREL_Tokens.

raises ValueError: If the sentence_indexes parameter was not set at construction time.

for index, sentence in enumerate(example_doc.sentences):
    print(f'Sentence {index}:')
    for token in sentence:
        print(f'{token}')
    if index == 0:
        print('\n')
Sentence 0:
UCREL Token: hello	POS tag: UH	USAS tag: Z4
UCREL Token: how	Lemma: RRQ	POS tag: Z5
UCREL Token: are	Lemma: VBR	POS tag: A3+
UCREL Token: you	Lemma: PPY	POS tag: Z8mf
UCREL Token: .	Lemma: .


Sentence 1:
UCREL Token: I	Lemma: PPIS1	POS tag: Z8mf
UCREL Token: am	Lemma: VBM	POS tag: A3+
UCREL Token: great	Lemma: JJ	POS tag: A5.1+
UCREL Token: thanks	Lemma: NN2	POS tag: S1.2.4+
UCREL Token: .	Lemma: .
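
The ValueError case can be sketched with a hypothetical stand-in class. SketchDoc below is not the library's code; it only shows one way a sentences property could behave as described, with strings in place of UCREL_Tokens.

```python
from typing import Iterator, List, Optional, Tuple


class SketchDoc:
    """Hypothetical stand-in for UCREL_Doc; not the library's code."""

    def __init__(self, tokens: List[str],
                 sentence_indexes: Optional[List[Tuple[int, int]]] = None):
        self.tokens = tokens
        self._sentence_indexes = sentence_indexes

    @property
    def sentences(self) -> Iterator[List[str]]:
        # Check eagerly so the error is raised when the property is
        # accessed, not when the first sentence is requested.
        if self._sentence_indexes is None:
            raise ValueError('sentence_indexes was not set at construction time.')
        return (self.tokens[start:end]
                for start, end in self._sentence_indexes)


doc_without_sentences = SketchDoc(['hello', '.'])
try:
    doc_without_sentences.sentences
except ValueError as error:
    print(f'ValueError: {error}')
```

Returning a generator expression rather than writing the property body with yield is a deliberate choice in this sketch: a generator function would defer the check until iteration starts.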

UCREL_Doc.__iter__

UCREL_Doc.__iter__()

returns: Yields each token in self.tokens.

for index, token in enumerate(example_doc):
    print(f'{index} {token}')
0 UCREL Token: hello	POS tag: UH	USAS tag: Z4
1 UCREL Token: how	Lemma: RRQ	POS tag: Z5
2 UCREL Token: are	Lemma: VBR	POS tag: A3+
3 UCREL Token: you	Lemma: PPY	POS tag: Z8mf
4 UCREL Token: .	Lemma: .
5 UCREL Token: I	Lemma: PPIS1	POS tag: Z8mf
6 UCREL Token: am	Lemma: VBM	POS tag: A3+
7 UCREL Token: great	Lemma: JJ	POS tag: A5.1+
8 UCREL Token: thanks	Lemma: NN2	POS tag: S1.2.4+
9 UCREL Token: .	Lemma: .

UCREL_Doc.__getitem__

UCREL_Doc.__getitem__(index:int)

  1. index: The index of the token to return.

returns: The token at the given index.

example_doc[-2]
UCREL Token: thanks	Lemma: NN2	POS tag: S1.2.4+

UCREL_Doc.__len__

UCREL_Doc.__len__()

returns: The number of tokens in the Doc.

len(example_doc)
10
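
The __iter__, __getitem__ and __len__ methods together make the Doc behave like a read-only sequence of tokens. A minimal sketch of that pattern follows, using a hypothetical SketchDoc class with strings as tokens; the real class may be implemented differently.

```python
from collections.abc import Iterable, Sized
from typing import Iterator, List


class SketchDoc(Iterable, Sized):
    """Hypothetical stand-in for UCREL_Doc, with strings as tokens."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens

    def __iter__(self) -> Iterator[str]:
        # Yield each token in order.
        yield from self.tokens

    def __getitem__(self, index: int) -> str:
        # Delegating to the list means negative indexes work too.
        return self.tokens[index]

    def __len__(self) -> int:
        return len(self.tokens)


sketch_doc = SketchDoc(['I', 'am', 'great', 'thanks', '.'])
print(len(sketch_doc))   # 5
print(sketch_doc[-2])    # thanks
print(list(sketch_doc))  # ['I', 'am', 'great', 'thanks', '.']
```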

UCREL_Doc.__eq__

UCREL_Doc.__eq__(other:Any)

Compare another instance with the current instance of this class.

  1. other: Another instance. If it is not of this class type, a NotImplementedError is raised.

returns: True if the two instances are equal on all of the following attributes:

  1. text
  2. sentence_indexes
  3. tokens

raises NotImplementedError: If the other instance is not of the same class type as self.

assert example_doc == UCREL_Doc(text='hello how are you. I am great thanks.',
                                tokens=DOC_TOKENS, 
                                sentence_indexes=[(0,5), (5,10)])

example_without_sent_indexes = UCREL_Doc(text='hello how are you. I am great thanks.',
                                         tokens=DOC_TOKENS)
assert example_doc != example_without_sent_indexes

try:
    {'text': 'hello how are you. I am great thanks.', 
     'tokens': DOC_TOKENS, 'sentence_indexes': [(0,5), (5,10)]} == example_doc
except NotImplementedError:
    print('UCREL_Doc instances can only be compared '
          'with other UCREL_Doc instances:')
UCREL_Doc instances can only be compared with other UCREL_Doc instances:
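
The comparison rules above can be sketched with a hypothetical stand-in class. SketchDoc is an assumption made for illustration, with strings as tokens; it mirrors the documented behaviour (compare text, sentence_indexes and tokens, raise NotImplementedError for other types) rather than reproducing the library's code.

```python
from typing import Any, List, Optional, Tuple


class SketchDoc:
    """Hypothetical stand-in for UCREL_Doc; not the library's code."""

    def __init__(self, text: str, tokens: List[str],
                 sentence_indexes: Optional[List[Tuple[int, int]]] = None):
        self.text = text
        self.tokens = tokens
        self._sentence_indexes = sentence_indexes

    def __eq__(self, other: Any) -> bool:
        if not isinstance(other, SketchDoc):
            raise NotImplementedError('SketchDoc instances can only be '
                                      'compared with other SketchDoc instances.')
        return (self.text == other.text
                and self._sentence_indexes == other._sentence_indexes
                and self.tokens == other.tokens)


with_indexes = SketchDoc('hi .', ['hi', '.'], [(0, 2)])
without_indexes = SketchDoc('hi .', ['hi', '.'])
print(with_indexes == SketchDoc('hi .', ['hi', '.'], [(0, 2)]))  # True
print(with_indexes != without_indexes)                           # True
```

Putting a dict on the left-hand side of == still triggers the error, because dict.__eq__ returns NotImplemented for an unknown type and Python then tries the reflected comparison on the right-hand operand. Raising here differs from the usual Python convention of returning the NotImplemented singleton, which would make such a comparison evaluate to False instead.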

UCREL_Doc.to_json

UCREL_Doc.to_json()

returns: This UCREL_Doc as a JSON string.

example_doc.to_json()
'{"text": "hello how are you. I am great thanks.", "tokens": [{"text": "hello", "lemma": null, "pos_tag": "UH", "usas_tag": "Z4", "mwe_tag": null}, {"text": "how", "lemma": "RRQ", "pos_tag": "Z5", "usas_tag": null, "mwe_tag": null}, {"text": "are", "lemma": "VBR", "pos_tag": "A3+", "usas_tag": null, "mwe_tag": null}, {"text": "you", "lemma": "PPY", "pos_tag": "Z8mf", "usas_tag": null, "mwe_tag": null}, {"text": ".", "lemma": ".", "pos_tag": null, "usas_tag": null, "mwe_tag": null}, {"text": "I", "lemma": "PPIS1", "pos_tag": "Z8mf", "usas_tag": null, "mwe_tag": null}, {"text": "am", "lemma": "VBM", "pos_tag": "A3+", "usas_tag": null, "mwe_tag": null}, {"text": "great", "lemma": "JJ", "pos_tag": "A5.1+", "usas_tag": null, "mwe_tag": null}, {"text": "thanks", "lemma": "NN2", "pos_tag": "S1.2.4+", "usas_tag": null, "mwe_tag": null}, {"text": ".", "lemma": ".", "pos_tag": null, "usas_tag": null, "mwe_tag": null}], "sentence_indexes": [[0, 5], [5, 10]]}'
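
Because the return value is an ordinary JSON string, it can be inspected with the standard json module. The sketch below uses a hand-trimmed string in the same shape as the output above (two tokens instead of ten); it is written out by hand here, not produced by calling the library.

```python
import json

# A hand-trimmed string in the same shape as the to_json output above.
json_string = ('{"text": "hello.", "tokens": ['
               '{"text": "hello", "lemma": null, "pos_tag": "UH", '
               '"usas_tag": "Z4", "mwe_tag": null}, '
               '{"text": ".", "lemma": ".", "pos_tag": null, '
               '"usas_tag": null, "mwe_tag": null}], '
               '"sentence_indexes": [[0, 2]]}')

data = json.loads(json_string)
print(sorted(data))                  # ['sentence_indexes', 'text', 'tokens']
print(data['tokens'][1]['pos_tag'])  # None, since JSON null maps to None
print(data['sentence_indexes'])      # [[0, 2]], tuples become JSON arrays
```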

Static Methods

UCREL_Doc.from_json

UCREL_Doc.from_json(json_string:str)

A static method that, given a json_string, returns the UCREL_Doc that the string represents.

  1. json_string: A string returned by the UCREL_Doc.to_json method.

returns: A UCREL_Doc built from the given json_string.

example_doc_json_string = example_doc.to_json()
another_example_doc = UCREL_Doc.from_json(example_doc_json_string)
another_example_doc
UCREL Doc (2 sentences):
First 3 tokens:
UCREL Token: hello	POS tag: UH	USAS tag: Z4
UCREL Token: how	Lemma: RRQ	POS tag: Z5
UCREL Token: are	Lemma: VBR	POS tag: A3+
example_doc == another_example_doc
True
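
A round trip of this kind can be sketched with dataclasses and the standard json module. SketchToken and SketchDoc are hypothetical stand-ins whose field names are taken from the keys in the to_json output; how the real classes serialise themselves is not shown in this page, so treat this as one possible approach.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class SketchToken:
    """Hypothetical stand-in for UCREL_Token."""
    text: str
    lemma: Optional[str] = None
    pos_tag: Optional[str] = None
    usas_tag: Optional[str] = None
    mwe_tag: Optional[str] = None


@dataclass
class SketchDoc:
    """Hypothetical stand-in for UCREL_Doc."""
    text: str
    tokens: List[SketchToken]
    sentence_indexes: Optional[List[Tuple[int, int]]] = None

    def to_json(self) -> str:
        # asdict recurses into the nested token dataclasses.
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(json_string: str) -> 'SketchDoc':
        data = json.loads(json_string)
        tokens = [SketchToken(**token) for token in data['tokens']]
        indexes = data['sentence_indexes']
        if indexes is not None:
            # JSON has no tuple type, so restore tuples for equality to hold.
            indexes = [tuple(pair) for pair in indexes]
        return SketchDoc(data['text'], tokens, indexes)


sketch_doc = SketchDoc('hello.',
                       [SketchToken('hello', pos_tag='UH', usas_tag='Z4'),
                        SketchToken('.', pos_tag='.')],
                       sentence_indexes=[(0, 2)])
restored = SketchDoc.from_json(sketch_doc.to_json())
print(restored == sketch_doc)  # True
```

The tuple conversion matters in this sketch: without it the restored sentence_indexes would be a list of lists and the equality check would fail.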