add extra-index-url for scarf anonymous tracking (#1668)

This adds extra-index-url to our docs to allow for anonymous install
analytics to help us understand and improve our product.

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
Trevor Bossert 2023-10-06 18:16:38 -07:00 committed by GitHub
parent 7e310ecac2
commit ce206f1f85
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 6 additions and 6 deletions

@@ -29,7 +29,7 @@ install-base-ci: install-base-pip-packages install-nltk-models install-test
.PHONY: install-base-pip-packages
install-base-pip-packages:
python3 -m pip install pip==${PIP_VERSION}
-python3 -m pip install -r requirements/base.txt
+python3 -m pip install -r requirements/base.txt --extra-index-url https://packages.unstructured.io/simple/
.PHONY: install-huggingface
install-huggingface:

@@ -110,9 +110,9 @@ python3
Use the following instructions to get up and running with `unstructured` and test your
installation.
-- Install the Python SDK to support all document types with `pip install "unstructured[all-docs]"`
-- For plain text files, HTML, XML, JSON and Emails that do not require any extra dependencies, you can run `pip install unstructured`
-- To process other doc types, you can install the extras required for those documents, such as `pip install "unstructured[docx,pptx]"`
+- Install the Python SDK to support all document types with `pip install "unstructured[all-docs]" --extra-index-url https://packages.unstructured.io/simple/`
+- For plain text files, HTML, XML, JSON and Emails that do not require any extra dependencies, you can run `pip install unstructured --extra-index-url https://packages.unstructured.io/simple/`
+- To process other doc types, you can install the extras required for those documents, such as `pip install "unstructured[docx,pptx]" --extra-index-url https://packages.unstructured.io/simple/`
- Install the following system dependencies if they are not already available on your system.
Depending on what document types you're parsing, you may not need all of these.
- `libmagic-dev` (filetype detection)
@@ -183,7 +183,7 @@ This README overviews how to install, use and develop the library. For more comp
**Bricks** 🧱 in `unstructured` are the foundational elements that drive the data processing workflow within the system. These components provide users with the building blocks to build pipelines targeted at the documents they care about. The bricks fall into three categories:
- :jigsaw: *Partitioning* bricks break raw documents into standard, structured elements.
- :broom: *Cleaning* bricks remove unwanted text from documents, such as boilerplate and sentence fragments.
-- :performing_arts: *Staging* bricks format data for downstream tasks, such as ML inference and data labeling.
+- :performing_arts: *Staging* bricks format data for downstream tasks, such as ML inference and data labeling.
These bricks create a cohesive, streamlined process that enables effective data handling and analysis. Check out the available bricks and how to use them from the [Bricks documentation](https://unstructured-io.github.io/unstructured/bricks.html).
@@ -192,7 +192,7 @@ The **Connectors** 🔗 in `unstructured` serve as vital links between the pre-p
### PDF Document Parsing Example
The following examples show how to get started with the `unstructured` library. You can parse over a dozen document types with one line of code! Use this [Colab notebook](https://colab.research.google.com/drive/1U8VCjY2-x8c6y5TYMbSFtQGlQVFHCVIW) to run the example below.
-The easiest way to parse a document in unstructured is to use the `partition` brick. If you use `partition` brick, `unstructured` will detect the file type and route it to the appropriate file-specific partitioning brick. If you are using the `partition` brick, you may need to install additional parameters via `pip install unstructured[local-inference]`. Ensure you first install `libmagic` using the instructions outlined [here](https://unstructured-io.github.io/unstructured/installing.html#filetype-detection) `partition` will always apply the default arguments. If you need advanced features, use a document-specific brick.
+The easiest way to parse a document in unstructured is to use the `partition` brick. If you use `partition` brick, `unstructured` will detect the file type and route it to the appropriate file-specific partitioning brick. If you are using the `partition` brick, you may need to install additional parameters via `pip install unstructured[local-inference] --extra-index-url https://packages.unstructured.io/simple/`. Ensure you first install `libmagic` using the instructions outlined [here](https://unstructured-io.github.io/unstructured/installing.html#filetype-detection) `partition` will always apply the default arguments. If you need advanced features, use a document-specific brick.
```python
from unstructured.partition.auto import partition