citextract.utils package
Submodules
citextract.utils.model module
Model utilities.
citextract.utils.model.load_model_params(model, model_name, model_uri, ignore_cache=False, device=None)
Load model parameters from disk or from the web.
Parameters:
- model (torch.nn.modules.container.Sequential) – The model instance to load the parameters for.
- model_name (str) – The name of the model which should be loaded.
- model_uri (str) – Part of the URL or full URL to the model parameters. If not specified, then the latest version is pulled from the internet.
- ignore_cache (bool) – When true, all caches are ignored and the model parameters are forcefully downloaded.
- device (torch.device) – The device to use.
Returns: The loaded PyTorch model instance.
Return type: torch.nn.modules.container.Sequential
Raises: ValueError – When the model name is not supported.
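The interplay of model_name, model_uri, and ignore_cache described above can be sketched with the standard library alone. This is an illustration of the cache-or-download pattern, not CiteXtract's implementation: the helper resolve_params_file, the SUPPORTED_MODELS set, the cache_dir argument, and the injected download callable are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

# Hypothetical set of supported names; the real list is not given in this reference.
SUPPORTED_MODELS = {"example_model"}


def resolve_params_file(
    model_name: str,
    model_uri: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
    ignore_cache: bool = False,
) -> Path:
    """Return the path of a cached parameter file, downloading it when needed.

    `download` is an injected callable (an assumption of this sketch) that
    takes a URI and returns the raw bytes of the parameter file.
    """
    # Mirror the documented behaviour: unsupported names raise ValueError.
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model name: {model_name!r}")

    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{model_name}.bin"

    # Download when the cache is empty, or when the caller forces a refresh.
    if ignore_cache or not target.exists():
        target.write_bytes(download(model_uri))
    return target
```

Passing the downloader in as a callable keeps the sketch free of network access, so the caching decision can be exercised in isolation.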
citextract.utils.pdf module
PDF utilities for converting PDF to a usable format.
citextract.utils.pdf.convert_pdf_file_to_text(path)
Convert a PDF file to text.
Parameters: path (str) – Path to the PDF file.
Returns: The text found in the PDF file.
Return type: str
citextract.utils.pdf.convert_pdf_url_to_text(pdf_url)
Convert a PDF URL to text.
Parameters: pdf_url (str) – The URL to parse.
Returns: The text found in the PDF document.
Return type: str
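One plausible way a URL-based converter can be layered on a path-based one is to download the document to a temporary file and hand that path to the file converter. The sketch below shows that pattern using only the standard library. It is an assumption about how the two functions could relate, not a description of CiteXtract's code; the helper name url_to_text_via_tempfile and the injected convert_path callable are hypothetical.

```python
import os
import tempfile
import urllib.request
from typing import Callable


def url_to_text_via_tempfile(pdf_url: str, convert_path: Callable[[str], str]) -> str:
    """Download `pdf_url` to a temporary file and convert that file to text.

    `convert_path` is an injected callable (an assumption of this sketch) with
    the same shape as a path-based converter: it takes a file path and
    returns the extracted text.
    """
    # Fetch the raw bytes of the document.
    with urllib.request.urlopen(pdf_url) as response:
        data = response.read()

    # Write the bytes to a temporary file so a path-based converter can read them.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return convert_path(tmp_path)
    finally:
        # Always clean up the temporary file, even if conversion fails.
        os.remove(tmp_path)
```

Under this assumption, a call such as url_to_text_via_tempfile(pdf_url, convert_pdf_file_to_text) would behave like a URL-based converter, provided the citextract package is installed.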
Module contents
Utilities for the CiteXtract project.