mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-07-06 08:31:46 +00:00

arXiv Topic Modelling
This directory contains an example of how to use the arXiv Python package (a wrapper for the arXiv API), the BERTopic Python package (transformer-based topic modelling), and several functions from the unstructured library to run topic modelling on research papers queried from arXiv. This notebook is very simple, but it can easily be modified for more complex use cases.
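The notebook itself is the reference for this example. As a rough illustration of how the three libraries can fit together, here is a minimal sketch, not the notebook's code. It assumes arxiv 2.x (`arxiv.Client().results`, `Result.get_short_id`, `Result.download_pdf`), `unstructured` installed with PDF support (`partition_pdf`), and `bertopic`. The helper names, the 50-character cutoff, and the choice to model individual text elements rather than whole papers are illustrative assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def keep_substantive(texts: Iterable[str], min_chars: int = 50) -> List[str]:
    """Strip whitespace and drop short fragments (titles, page numbers,
    captions). The 50-character cutoff is an arbitrary assumption."""
    stripped = (t.strip() for t in texts)
    return [t for t in stripped if len(t) >= min_chars]


def safe_pdf_name(short_id: str) -> str:
    """Turn an arXiv short ID into a file name. Old-style IDs such as
    'hep-th/9901001v1' contain a slash, which is replaced here."""
    return short_id.replace("/", "_") + ".pdf"


def download_papers(query: str, max_results: int, out_dir: Path) -> List[Path]:
    """Query arXiv and download each matching paper's PDF into out_dir."""
    import arxiv  # imported lazily so the pure helpers above work without it

    out_dir.mkdir(parents=True, exist_ok=True)
    search = arxiv.Search(query=query, max_results=max_results)
    paths = []
    for result in arxiv.Client().results(search):
        name = safe_pdf_name(result.get_short_id())
        result.download_pdf(dirpath=str(out_dir), filename=name)
        paths.append(out_dir / name)
    return paths


def extract_texts(pdf_paths: Iterable[Path]) -> List[str]:
    """Partition each PDF with unstructured and keep the longer text elements."""
    from unstructured.partition.pdf import partition_pdf  # needs PDF extras

    texts = []
    for path in pdf_paths:
        elements = partition_pdf(filename=str(path))
        texts.extend(el.text for el in elements)
    return keep_substantive(texts)


def fit_topics(docs: List[str]):
    """Fit a BERTopic model with default settings; return it and the
    topic assigned to each document."""
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics
```

A typical call chain would be `download_papers("topic modelling", 25, Path("papers"))`, then `extract_texts(...)` on the returned paths, then `fit_topics(...)`, after which `topic_model.get_topic_info()` summarizes the discovered topics. Text elements are used as documents here because BERTopic's default dimensionality reduction and clustering may fail or yield a single topic when given only a handful of whole papers.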
To get started, use the following steps:
- Ensure you have Python 3.8 or higher installed on your system
- Create a new Python virtual environment
- Run `pip install -r requirements.txt` to install the dependencies
- Run `PYTHONPATH=. jupyter notebook` from this directory to launch the notebook
At this point, you'll be able to run the topic modelling example notebook.
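If the notebook fails on its first imports, it can help to confirm that the environment from the steps above is in place. The following is a minimal sketch, not part of the example, that checks the Python version and whether the `arxiv`, `bertopic`, and `unstructured` modules can be found. It assumes those import names and does not read requirements.txt, which remains the authoritative dependency list.

```python
import importlib.util
import sys
from typing import Iterable, List, Optional, Sequence

# Import names of the libraries this README mentions (assumed; requirements.txt
# is the authoritative list and may pin more packages).
REQUIRED_MODULES = ("arxiv", "bertopic", "unstructured")
MIN_PYTHON = (3, 8)


def python_ok(version: Optional[Sequence[int]] = None,
              minimum: Sequence[int] = MIN_PYTHON) -> bool:
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found.

    find_spec locates a module without importing it, so heavy packages
    such as bertopic are not loaded by this check."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing modules:", missing_modules() or "none")
```

Run it with the virtual environment activated. An empty list of missing modules suggests that `pip install -r requirements.txt` completed for the libraries named above.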