19 Commits

Author SHA1 Message Date
Matt Robinson
9acf26ec2e
docs: explicitly replace all old pages with link to new docs (#3118)
### Summary

Explicitly replaces all old docs pages with a link to the new docs. This
was required because 404 redirects didn't work for pages that previously
existed, though they worked non-existing paths that never existed.
2024-05-30 13:01:33 +00:00
Matt Robinson
73739b38cc
docs: redirect to docs.unstructured.io on github pages (#3054)
### Summary

Updates GitHub pages to redirect to the new https://docs.unstructured.io
page. This will appear on GitHub pages after the next tag.

### Testing

1. From the docs direction, run `make html`. You should not see any
errors or warnings
2. Open `unstructured/docs/build/html/index.html`. It should look like
the following:
<img width="1512" alt="image"
src="https://github.com/Unstructured-IO/unstructured/assets/1635179/077626a5-d88a-467e-9e37-273a92e75d30">
3. Open `unstructured/docs/build/html/404.html`. It should redirect back
to `index.html`. Per the [GitHub pages
docs](https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-custom-404-page-for-your-github-pages-site),
that page will get served for 404 errors, meaning any links to old docs
pages will redirect to `index.html`, which points users to the new docs
page.
2024-05-21 09:38:32 -04:00
Jack Retterer
a35ff890e0
Update docs jack (#1157)
Documentation Overhaul

- Added documentation hierarchy
- Added options for Bash vs Python for API & Upstream Connectors
- Added Introduction section (Overview, Key Concepts, Getting Started)
- Redid connectors section
- Installation is now broken up (needs further work)
2023-08-21 10:27:32 -07:00
Matt Robinson
331c7faf38
build(deps): split up dependencies by document type (#986)
* split dependencies by document type

* make pip-compile with new requirements

* add extra requirements to setup.py

* add in all docs; re pip-compile

* extra for all docs

* add pandas to xlsx

* dependency requires for tsv and csv

* handling for doc, docx and odt

* dependency check for pypandoc

* required dependencies for pandoc files

* xml and html

* markdown

* msg

* add in pdf

* add in pptx

* add in excel

* add lxml as base req

* extra all docs for local inference

* local inference installs all

* pin pillow version

* fixes for plain text tests

* fixes for doc

* update make commands

* changelog and version

* add xlrd

* update pip-compile

* pin numpy for python 3.8 support

* more constraints

* contraint on scipy

* update install docs

* constrain ipython

* add outlook to pip-compile

* more ipython constraints

* add extras to dockerfile

* pin office365 client

* few doc tweaks

* types as strings

* last pip-compile

* re pip-comple

* make tidy

* make tidy
2023-08-01 11:31:13 -04:00
Jack Retterer
708714dab5
docs: fixed typo in Installation guide (#945) 2023-07-21 13:33:44 +00:00
qued
d3600dd5da
build(deps): update inference version (#662)
Updated to the the latest version of unstructured-inference. detectron2 now gets implemented with onnxruntime, yay!

---------

Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2023-05-31 13:50:15 -05:00
Nicolas
c62bee48ad
Update installing.rst (#590) 2023-05-16 02:08:01 +00:00
Matt Robinson
fae5f8fdde
feat: add partition_odt for open office docs (#548)
* added filetype detection for odt

* add function for partition odt documents

* add odt files to auto

* changelog and version

* docs and readme

* update installation docs

* skip tests if not supported or in docker

* import pytest

* fix docs typos
2023-05-04 19:28:08 +00:00
Shukri
8d4308af43
doc: typo (#495)
XML/HTML Depenedencies -> XML/HTML Dependencies
2023-04-17 20:26:50 -07:00
qued
4211dda360
build: sync detectron version (#440)
* Update detectron2 version in Dockerfile
* Update detectron2 version in docs
2023-04-03 18:47:43 -05:00
Matt Robinson
e43cb0e6e0
feat: add partition_epub function (#364)
* add pypandoc dependency

* added epub partitioner and file conversion

* test for partition_epub

* tests for file conversion

* add epub to filetype detection

* added epub to auto partition

* update bricks docs

* updated installing docs

* changelot and version

* add pandoc to dependencies

* add pandoc to debian dependencies

* linting, linting, linting

* typo fix

* typo fix

* file conversion type hints

* more type hints

---------

Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2023-03-14 15:52:21 +00:00
Matt Robinson
354eff1e2b
build(deps): automatically download nltk models when required (#246)
* code for downloading nltk packages

* don't run nltk make command in ci

* test for model downloads

* remove nltk install from docs

* update changelog and bump version
2023-02-23 17:19:13 +00:00
Matt Robinson
314924137f
docs: add quotes to local-inference install instructions (#245) 2023-02-21 09:58:26 -06:00
Matt Robinson
7472e1bb21
docs: add a quick start page to the readme and docs (#240)
* added quick start section to the readme

* added quick start to docs

* parenthetical on extra deps

* typo

* fix typo

* fixed mixed tabs/spaces
2023-02-17 22:13:28 +00:00
Matt Robinson
5376bc510f
feat: generic partition brick with filetype detection (#132)
* add python-magic

* first pass on filetype detection

* tests for filetype detection

* more tests for file detection

* added tests for error conditions

* install libmagic dev in github

* libmagic install instructions

* pattern for checking email files

* support reading .eml in rb mode

* add auto partition function

* auto tests for emal

* auto tests for docx

* added tests for html

* add pdf and html tests

* linting, linting, linting

* added docs for auto partitioning

* update readme with generic partition brick

* bumped version

* added test for bad type

* detect .docx files from application/octet-stream

* linting, linting, linting

* identify xlsx from octet stream

* install poppler in ci

* fix mocks; test for unknown type

* install poppler utils

* install in one line

* only poppler-utils

* file extension logic from application/octet-stream

* install local inference for ci

* install detectron2

* removing unused dockerfile
2023-01-09 16:15:14 -05:00
Matt Robinson
33b983fbf0
docs: instructions on how to install on Windows + conda (#129)
* add environment.yml

* instructions on how to install base package and detectron2

* added instructions on paddleocr

* remove covers

* install -> to install

* specified the shell

* updated example snippets

* update environment.yml

* updated the repo reference

* no more ands!
2023-01-05 16:21:44 +00:00
Mallori Harrell
53fcf4e912
chore: Remove PDF parsing code and dependencies (#75)
Remove PDF parsing code and dependencies.
2022-11-21 11:47:29 -06:00
Matt Robinson
fb16847946
feat: Staging brick for attention window chunking (#34)
* add huggingface dependencies and re pip-compile

* first pass on chunk by attention window

* test for chunking function

* completed tests for chunk_by_attention_window

* change default buffer size to 2

* wrapper function for staging

* added docs for transformers

* fix wording and typos

* updated change log and bumped the version

* added docs on huggingface dependencies

* fix typo

* re pip-compile
2022-10-13 11:18:27 -04:00
Matt Robinson
5f40c78f25 Initial Release 2022-09-26 14:55:20 -07:00