The API module that contains the main function `parse_pdf`.

parse_pdf

parse_pdf(server_address:str, file_path:Path, port:str='', timeout:int=60)

On success, this function returns the JSON output of the Science Parse server as a dictionary. If a timeout exception or any other exception occurs, it returns None and logs the exception as an error.

  1. server_address: Address of the server e.g. http://127.0.0.1
  2. file_path: Path to the pdf file to be processed.
  3. port: Port of the server, e.g. 8080
  4. timeout: The maximum amount of time to allow the request to take.
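
The parameters above can be put together in a minimal call, sketched here. The import path `science_parse_api.api` is an assumption about how the package is laid out, and `parse_local_pdf` is a hypothetical helper, not part of the API; the address and port are the example values from the parameter list.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def parse_local_pdf(pdf_path: Path) -> Optional[Dict[str, Any]]:
    """Send one PDF to a locally running server and return the parsed output."""
    # Assumed import path; imported lazily so this sketch loads even when
    # the package is not installed.
    from science_parse_api.api import parse_pdf

    # Address and port are the example values from the parameter list.
    return parse_pdf("http://127.0.0.1", pdf_path, port="8080", timeout=60)
```

Calling `parse_local_pdf(Path("paper.pdf"))`, where `paper.pdf` is a placeholder file name, would return either the parsed dictionary or None, following the behaviour described above.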

Returns a dictionary with the following keys:

['abstractText', 'authors', 'id', 'references', 'sections', 'title', 'year']

Note: not all of these keys will always be present. If Science Parse cannot detect the relevant information, the corresponding key is omitted, e.g. if it cannot find any references then there will be no `references` key.
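
Because the return value may be None and any key may be absent, calling code may want to read the output defensively. The following is a minimal sketch of one way to do that; `summarize_output` and `EXPECTED_KEYS` are hypothetical names introduced for illustration, and the sketch assumes that `references`, when present, is a list.

```python
from typing import Any, Dict, Optional

# The keys listed above; any of them may be missing from a given result.
EXPECTED_KEYS = ['abstractText', 'authors', 'id', 'references',
                 'sections', 'title', 'year']


def summarize_output(output: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a parse_pdf result without assuming any key exists."""
    if output is None:
        # parse_pdf returned None: the request failed or timed out.
        return {"parsed": False, "missing_keys": list(EXPECTED_KEYS)}
    return {
        "parsed": True,
        # .get() yields None rather than raising KeyError for absent keys.
        "title": output.get("title"),
        # Assumption: 'references' is a list when present.
        "reference_count": len(output.get("references") or []),
        "missing_keys": [k for k in EXPECTED_KEYS if k not in output],
    }
```

Using `dict.get` keeps the missing-key case from raising `KeyError`, and the `missing_keys` entry makes it easy to see which fields Science Parse did not detect for a given paper.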

Note: see the main page of the documentation for a detailed example of this function.