docs: cleanup readme; add python 3.12 (#3120)

### Summary

Updates documentation references in the README to point to https://docs.unstructured.io and cleans up a few sections of the README. Specifically:

- Removes an old API announcement
- Removes the section mentioning Chipper as a beta feature. Chipper is only available through the SaaS API.

Also adds a Python 3.12 tag to `setup.py` since we now support Python 3.12.
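
For reviewers who want to see what the new tag amounts to, here is a minimal sketch of how tooling might read the supported Python versions from trove classifiers. The `supported_python_versions` helper and the `CLASSIFIERS` sample list are hypothetical and not part of this change; the list only mirrors the classifier strings visible in the `setup.py` hunk.

```python
import re

# Sample classifiers mirroring the setup.py hunk in this commit (illustrative only).
CLASSIFIERS = [
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]

# Matches only classifiers that name a specific major.minor Python version.
_VERSION_RE = re.compile(r"^Programming Language :: Python :: (\d+)\.(\d+)$")


def supported_python_versions(classifiers):
    """Return sorted (major, minor) tuples declared via trove classifiers."""
    versions = []
    for classifier in classifiers:
        match = _VERSION_RE.match(classifier)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    # Sorting integer tuples keeps 3.10 after 3.9, unlike sorting the raw strings.
    return sorted(versions)


print(supported_python_versions(CLASSIFIERS))
```

Classifiers are descriptive metadata only. They do not stop installation on other interpreters; that is the job of `python_requires`, which this commit does not touch.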
2025-12-24 13:44:05 +00:00 · 2024-05-30 12:22:54 -04:00 · 23e570fc8a
commit 23e570fc8a
parent 293901e144
2 changed files with 16 additions and 35 deletions
--- a/README.md
+++ b/README.md
@@ -37,21 +37,7 @@
  <p>Open-Source Pre-Processing Tools for Unstructured Data</p>
 </h2>

-The `unstructured` library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and [many more](https://unstructured-io.github.io/unstructured/core.html#partitioning). The use cases of `unstructured` revolve around streamlining and optimizing the data processing workflow for LLMs. `unstructured` modular functions and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient in transforming unstructured data into structured outputs.
-
-<h3 align="center">
-  <p>API Announcement!</p>
-</h3>
-
-We are thrilled to announce our newly launched [Unstructured API](https://unstructured-io.github.io/unstructured/api.html), providing the Unstructured capabilities from `unstructured` as an API. Check out the [`unstructured-api` GitHub repository](https://github.com/Unstructured-IO/unstructured-api) to start making API calls. You’ll also find instructions about how to host your own API version.
-
-While access to the hosted Unstructured API will remain free, API Keys are required to make requests. To prevent disruption, get yours [here](https://unstructured.io/api-key) and start using it today! Check out the [`unstructured-api` README](https://github.com/Unstructured-IO/unstructured-api#--) to start making API calls.</p>
-
-#### :rocket: Beta Feature: Chipper Model
-
-We are releasing the beta version of our Chipper model to deliver superior performance when processing high-resolution, complex documents. To start using the Chipper model in your API request, you can utilize the `hi_res_model_name=chipper` parameter. Please refer to the documentation [here](https://unstructured-io.github.io/unstructured/api.html#beta-version-hi-res-strategy-with-chipper-model).
-
-As the Chipper model is in beta version, we welcome feedback and suggestions. For those interested in testing the Chipper model, we encourage you to connect with us on [Slack community](https://short.unstructured.io/pzw05l7).
+The `unstructured` library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and [many more](https://docs.unstructured.io/open-source/core-functionality/partitioning). The use cases of `unstructured` revolve around streamlining and optimizing the data processing workflow for LLMs. `unstructured` modular functions and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient in transforming unstructured data into structured outputs.

 ## :eight_pointed_black_star: Quick Start

@@ -182,29 +168,23 @@ This starts a docker container with your local repo mounted to `/mnt/local_unstr
 ## :clap: Quick Tour

 ### Documentation
-This README overviews how to install, use and develop the library. For more comprehensive documentation, visit https://unstructured-io.github.io/unstructured/ .
+For more comprehensive documentation, visit https://docs.unstructured.io . You can also learn
+more about our other products on the documentation page, including our SaaS API.

-### Concepts Guide
+Here are a few pages from the [Open Source documentation page](https://docs.unstructured.io/open-source/introduction/overview)
+that are helpful for new users to review:

-The `unstructured` library includes core functionality for partitioning, chunking, cleaning, and
-staging raw documents for NLP tasks.
-You can see a complete list of available functions and how to use them from the [Core Functionality documentation](https://unstructured-io.github.io/unstructured/core.html).
+- [Quick Start](https://docs.unstructured.io/open-source/introduction/quick-start)
+- [Using the `unstructured` open source package](https://docs.unstructured.io/open-source/core-functionality/overview)
+- [Connectors](https://docs.unstructured.io/open-source/ingest/overview)
+- [Concepts](https://docs.unstructured.io/open-source/concepts/document-elements)
+- [Integrations](https://docs.unstructured.io/open-source/integrations)

-In general, these functions fall into several categories:
- *Partitioning* functions break raw documents into standard, structured elements.
- *Cleaning* functions remove unwanted text from documents, such as boilerplate and sentence fragments.
- *Staging* functions format data for downstream tasks, such as ML inference and data labeling.
- *Chunking* functions split documents into smaller sections for use in RAG apps and similarity
-  search.
- *Embedding* encoder classes provide an interfaces for easily converting preprocessed text to
-  vectors.
-
-The **Connectors** 🔗 in `unstructured` serve as vital links between the pre-processing pipeline and various data storage platforms. They allow for the batch processing of documents across various sources, including cloud services, repositories, and local directories. Each connector is tailored to a specific platform, such as Azure, Google Drive, or Github, and comes with unique commands and dependencies. To see the list of Connectors available in `unstructured` library, please check out the [Connectors GitHub folder](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest/connector) and [documentation](https://unstructured-io.github.io/unstructured/ingest/index.html)

 ### PDF Document Parsing Example
-The following examples show how to get started with the `unstructured` library. You can parse over a dozen document types with one line of code! Use this [Colab notebook](https://colab.research.google.com/drive/1U8VCjY2-x8c6y5TYMbSFtQGlQVFHCVIW) to run the example below.
-
-The easiest way to parse a document in unstructured is to use the `partition` function. If you use `partition` function, `unstructured` will detect the file type and route it to the appropriate file-specific partitioning function. If you are using the `partition` function, you may need to install additional parameters via `pip install unstructured[local-inference]`. Ensure you first install `libmagic` using the instructions outlined [here](https://unstructured-io.github.io/unstructured/installing.html#filetype-detection) `partition` will always apply the default arguments. If you need advanced features, use a document-specific partitioning function.
+The following examples show how to get started with the `unstructured` library. The easiest way to parse a document in unstructured is to use the `partition` function. If you use `partition` function, `unstructured` will detect the file type and route it to the appropriate file-specific partitioning function. If you are using the `partition` function, you may need to install additional dependencies per doc type.
+For example, to install docx dependencies you need to run `pip install "unstructured[docx]"`.
+See our  [installation guide](https://docs.unstructured.io/open-source/installation/full-installation) for more details.

 ```python
 from unstructured.partition.auto import partition
@@ -245,7 +225,7 @@ Deep Learning(DL)-based approaches are the state-of-the-art for a wide range of
 including document image classiﬁcation [11,
 ```

-See the [partitioning](https://unstructured-io.github.io/unstructured/core.html#partitioning)
+See the [partitioning](https://docs.unstructured.io/open-source/core-functionality/partitioning)
 section in our documentation for a full list of options and instructions on how to use
 file-specific partitioning functions.

@@ -263,7 +243,7 @@ Encountered a bug? Please create a new [GitHub issue](https://github.com/Unstruc
 | Section | Description |
 |-|-|
 | [Company Website](https://unstructured.io) | Unstructured.io product and company info |
-| [Documentation](https://unstructured-io.github.io/unstructured) | Full API documentation |
+| [Documentation](https://docs.unstructured.io/) | Full API documentation |
 | [Batch Processing](unstructured/ingest/README.md) | Ingesting batches of documents through Unstructured |

 ## :chart_with_upwards_trend: Analytics
--- a/setup.py
+++ b/setup.py
@@ -96,6 +96,7 @@ setup(
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
+        "Programming Language :: Python :: 3.12",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
    ],
    author="Unstructured Technologies",