arXiv Topic Modelling
This directory contains an example of how to use the arxiv Python package (a wrapper for the arXiv API), the BERTopic Python package (transformer-based topic modelling),
and several bricks from the unstructured library to run topic modelling on queried arXiv research papers. The notebook is very simple, but it can easily be modified for more complicated use cases.
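The overall flow can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper names `clean_abstract`, `fetch_abstracts`, and `fit_topics` are hypothetical, the whitespace cleanup is a standard-library stand-in for the cleaning bricks in unstructured (for example `clean_extra_whitespace` in `unstructured.cleaners.core`), and it assumes that working on paper abstracts alone is enough for a first pass.

```python
import re
from typing import List


def clean_abstract(text: str) -> str:
    """Collapse runs of whitespace and trim the ends.

    Stand-in for an unstructured cleaning brick; hypothetical helper.
    """
    return re.sub(r"\s+", " ", text).strip()


def fetch_abstracts(query: str, max_results: int = 200) -> List[str]:
    """Query arXiv and return cleaned abstracts (needs network access)."""
    import arxiv  # imported lazily so the module loads without the package

    search = arxiv.Search(query=query, max_results=max_results)
    client = arxiv.Client()
    return [clean_abstract(result.summary) for result in client.results(search)]


def fit_topics(docs: List[str]):
    """Fit a BERTopic model on the documents and return (model, topics).

    Assumption: default BERTopic settings; topic models of this kind
    generally need a reasonably large corpus to give useful topics.
    """
    from bertopic import BERTopic  # lazy import, heavy dependency

    model = BERTopic()
    topics, _probabilities = model.fit_transform(docs)
    return model, topics
```

With the dependencies installed, something like `model, topics = fit_topics(fetch_abstracts("topic modelling"))` followed by `model.get_topic_info()` would give a table of the discovered topics. The query string here is only an example.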
To get started, use the following steps:
- Ensure you have Python 3.8 or higher installed on your system
- Create a new Python virtual environment
- Run `pip install -r requirements.txt` to install the dependencies
- Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook
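If the install step fails, it can help to confirm the first two prerequisites before looking further. The snippet below is an optional sketch, not part of the example itself; the function names `python_is_supported` and `in_virtualenv` are hypothetical, and the check assumes a virtual environment created with the standard `venv` module or a compatible tool.

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version named in the steps above


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Virtual environment active:", in_virtualenv())
```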
At this point, you'll be able to run the topic modelling example notebook.