## arXiv Topic Modelling

This directory contains an example of how to use the `arxiv` Python package (a wrapper for the arXiv API), the BERTopic Python package (transformer-based topic modelling), and several bricks from the `unstructured` library to run topic modelling on queried arXiv research papers. The notebook is very simple, but can easily be modified for more complicated use cases.

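As a rough illustration of how these pieces can fit together, here is a minimal sketch. It is not the notebook's actual code: the helper names `build_query`, `fetch_abstracts`, and `fit_topics` are hypothetical, the choice of the `clean_extra_whitespace` cleaning brick is an assumption, and the category and search phrase are placeholders. Third-party imports are kept inside the functions so the query helper can be used without them installed.

```python
from typing import List


def build_query(category: str, phrase: str) -> str:
    """Build an arXiv API query: one subject category plus a quoted phrase."""
    # `cat:` and `all:` are arXiv API field prefixes; AND combines the two terms.
    return f'cat:{category} AND all:"{phrase}"'


def fetch_abstracts(query: str, max_results: int = 200) -> List[str]:
    """Return cleaned abstracts of the most recently submitted matching papers."""
    import arxiv  # third-party: pip install arxiv
    from unstructured.cleaners.core import clean_extra_whitespace

    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    # Assumption: a recent release of the arxiv package, which provides
    # arxiv.Client; older releases expose search.results() instead.
    results = arxiv.Client().results(search)
    # Abstracts arrive with hard line breaks; the cleaning brick collapses
    # newlines and repeated spaces into single spaces.
    return [clean_extra_whitespace(result.summary) for result in results]


def fit_topics(docs: List[str]):
    """Fit a BERTopic model with default settings; return the model and topic ids."""
    from bertopic import BERTopic  # third-party: pip install bertopic

    topic_model = BERTopic()
    topics, _probabilities = topic_model.fit_transform(docs)
    return topic_model, topics
```

Under these assumptions, passing the result of `fetch_abstracts(build_query("cs.CL", "topic modelling"))` to `fit_topics` yields a fitted model, and `get_topic_info()` on that model returns a summary table of the discovered topics.
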
To get started, use the following steps:

- Ensure you have Python 3.8 or higher installed on your system
- Create a new Python virtual environment
- Run `pip install -r requirements.txt` to install the dependencies
- Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook

At this point, you'll be able to run the topic modelling example notebook.
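
If the notebook fails on its first import, a quick check of the environment can help narrow down which setup step was missed. The sketch below is only an illustration: `python_ok` and `missing_modules` are hypothetical helpers, and the module names passed in are an assumption about what `requirements.txt` installs, since its contents are not shown here.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def python_ok(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the (major, minor) version is Python 3.8 or higher."""
    return tuple(version) >= (3, 8)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level module names for the dependencies mentioned above.
ASSUMED_MODULES = ("arxiv", "bertopic", "unstructured")

print("Python version OK:", python_ok())
print("Missing modules:", missing_modules(ASSUMED_MODULES))
```

Run it with the virtual environment activated. An empty list of missing modules suggests the `pip install` step completed for these packages.
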