citextract.utils package

Submodules

citextract.utils.model module

Model utilities.

citextract.utils.model.load_model_params(model, model_name, model_uri, ignore_cache=False, device=None)

Load model parameters from disk or from the web.

Parameters:
  • model (torch.nn.modules.container.Sequential) – The model instance to load the parameters into.
  • model_name (str) – The name of the model to load.
  • model_uri (str) – A partial or full URL pointing to the model parameters. If not specified, the latest version is downloaded.
  • ignore_cache (bool) – When True, any cached copy is ignored and the model parameters are downloaded again.
  • device (torch.device) – The device to load the model parameters onto (default: None).
Returns: The loaded PyTorch model instance.
Return type: torch.nn.modules.container.Sequential
Raises: ValueError – When the model name is not supported.
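The effect of ignore_cache described above can be illustrated with a minimal, standard-library-only sketch. It is not the CiteXtract implementation: the helper name fetch_params, the ".params" file suffix, the cache directory layout, and the injected download callable are all assumptions made for illustration.

```python
import os
import tempfile


def fetch_params(model_name, model_uri, cache_dir, download, ignore_cache=False):
    """Return the local path of a parameter file, downloading it when needed.

    Hypothetical helper: `download` is any callable that takes a URI and
    returns the raw bytes of the parameter file.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cached_path = os.path.join(cache_dir, model_name + ".params")
    # Download when no cached copy exists, or when the caller forces a refresh.
    if ignore_cache or not os.path.exists(cached_path):
        data = download(model_uri)
        with open(cached_path, "wb") as handle:
            handle.write(data)
    return cached_path


# A fake downloader that records how often it is called (no network access).
calls = []


def fake_download(uri):
    calls.append(uri)
    return b"weights"


cache_dir = tempfile.mkdtemp()
uri = "https://example.invalid/demo.params"  # placeholder URI, never contacted
fetch_params("demo", uri, cache_dir, fake_download)  # downloads
fetch_params("demo", uri, cache_dir, fake_download)  # served from the cache
fetch_params("demo", uri, cache_dir, fake_download, ignore_cache=True)  # forced
print(len(calls))  # 2
```

In this sketch the second call reuses the cached file, while the third call downloads again because ignore_cache is set.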

citextract.utils.pdf module

Utilities for converting PDF documents to plain text.

citextract.utils.pdf.convert_pdf_file_to_text(path)

Convert a PDF file to text.

Parameters: path (str) – Path to the PDF file.
Returns: The text found in the PDF file.
Return type: str

citextract.utils.pdf.convert_pdf_url_to_text(pdf_url)

Convert the PDF document at a URL to text.

Parameters: pdf_url (str) – The URL of the PDF document.
Returns: The text found in the PDF document.
Return type: str
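As a usage sketch, the two conversion functions can be wrapped in a small dispatcher that picks one based on whether the input looks like a URL. The helper names is_remote and pdf_to_text are hypothetical and not part of CiteXtract. The converters can be injected, so the dispatch logic can be tried without the package installed.

```python
from urllib.parse import urlparse


def is_remote(source):
    """Return True when `source` is an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def pdf_to_text(source, convert_file=None, convert_url=None):
    """Convert a local PDF path or a PDF URL to text (hypothetical helper)."""
    if convert_file is None or convert_url is None:
        # Imported lazily so the helper can be exercised with stand-in
        # converters when citextract is not installed.
        from citextract.utils import pdf

        convert_file = convert_file or pdf.convert_pdf_file_to_text
        convert_url = convert_url or pdf.convert_pdf_url_to_text
    if is_remote(source):
        return convert_url(source)
    return convert_file(source)


# Stand-in converters that only show which branch was taken.
def fake_file_converter(path):
    return "file:" + path


def fake_url_converter(url):
    return "url:" + url


local_result = pdf_to_text("paper.pdf", fake_file_converter, fake_url_converter)
remote_result = pdf_to_text(
    "https://example.invalid/paper.pdf", fake_file_converter, fake_url_converter
)
print(local_result)
print(remote_result)
```

With citextract installed, calling pdf_to_text(source) without the stand-in converters would delegate to convert_pdf_file_to_text or convert_pdf_url_to_text as documented above.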

Module contents

Utilities for the CiteXtract project.