unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-09-09 08:39:57 +00:00

Author	SHA1	Message	Date
qued	01f76888e0	build(deps): add tabulate dependency (#673 ) tabulate is used by functions that extract tables from Microsoft documents, but there is nothing explicitly requiring the library. This was not caught by tests, because for some reason, tabulate is in base.txt. This PR adds the dependency to base.in (which also puts it in setup.py), and recompiles the dependencies.	2023-06-01 16:56:24 -05:00
dependabot[bot]	cd9fd9b395	build(deps): bump pygithub from 1.57.0 to 1.58.2 in /requirements (#669 ) Bumps [pygithub](https://github.com/pygithub/pygithub) from 1.57.0 to 1.58.2. - [Release notes](https://github.com/pygithub/pygithub/releases) - [Changelog](https://github.com/PyGithub/PyGithub/blob/master/doc/changes.rst) - [Commits](https://github.com/pygithub/pygithub/compare/v1.57...v1.58.2) --- updated-dependencies: - dependency-name: pygithub dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2023-06-01 18:45:47 +00:00
qued	d3600dd5da	build(deps): update inference version (#662 ) Updated to the the latest version of unstructured-inference. detectron2 now gets implemented with onnxruntime, yay! --------- Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2023-05-31 13:50:15 -05:00
qued	c82bad1061	build(deps): avoid version conflicts (#636 ) Addresses #631. * Uses constraints to keep dependency versions more consistent. * Moves all dependencies to .in files which are then ingested by setup.py. * Adds script to check consistency of all extras. * Adds consistency check to CI. I should note that while it shouldn't be possible to cause a conflict between base.txt and any of the extras (because base.txt constrains all the extras) it is possible to get a conflict between two of the extras files. There are ways of trying to avoid that (like constraining each file by all the files that have already been processed before it in the order given in the make pip-compile target) but the ones I could think of seemed a little overwrought, and come with problems of their own. If a conflict arises, it should be flagged by CI or locally with make check-deps. When/if that happens, you can resolve the conflict by adding appropriate global constraints in requirements/constraints.txt. Also note that if fileA.in is constrained by fileB.txt, then fileB.in should be compiled before fileA.in in the make pip-compile target. Otherwise fileA.in will be compiled with the old version of fileB.txt which can cause conflicts or keep dependencies from being updated properly.	2023-05-24 22:29:35 +00:00
dependabot[bot]	7f9ec8108d	build(deps): bump importlib-metadata in /requirements (#531 ) Bumps [importlib-metadata](https://github.com/python/importlib_metadata) from 6.5.0 to 6.6.0. - [Release notes](https://github.com/python/importlib_metadata/releases) - [Changelog](https://github.com/python/importlib_metadata/blob/main/CHANGES.rst) - [Commits](https://github.com/python/importlib_metadata/compare/v6.5.0...v6.6.0) --- updated-dependencies: - dependency-name: importlib-metadata dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2023-05-01 21:49:26 +00:00
qued	dc4147d7df	feat: extract tables (#503 ) Exposes table extraction through partition and partition_pdf.	2023-04-21 17:01:29 +00:00
cragwolfe	3972c80c51	build(deps): bump requirements (#414 )	2023-04-05 02:59:06 +00:00
cragwolfe	ce9fc26009	feat: add ability to pass headers in partition_html (#397 ) Also adds pytest-mock requirement, those fixtures are nice to have! Implements issue/feature #396 .	2023-03-23 20:14:57 -07:00
qued	aa494623a2	chore: bump versions (#352 ) Update versions of dependencies, including unpinning the unstructured-inference dependency that's causing conflicts in repos like pipeline-oer that want the newer version.	2023-03-14 09:40:30 -05:00
Alvaro Bartolome	c51adb21e3	feat: add `FsspecConnector` to easily integrate new connectors with a `fsspec` implementation available (#318 ) So as you may see this is a pretty big PR, that basically adds an "adapter" to easily plug in any connector with an available fsspec implementation. This is a way to standardize how the remote filesystems are used within unstructured. I've additionally renamed s3_connector.py to s3.py for readability and consistency and tested that the current approach works as expected and is aligned with the expectations.	2023-03-10 06:15:19 +00:00
cragwolfe	c7eba1636d	build(deps): make pip-compile (#307 ) * build: pip-compile, skip test deps * s	2023-02-28 17:28:14 +11:00
Tom Aarsen	ded60afda9	feat: Add GitHub data connector; add Markdown partitioner (#284 )	2023-02-27 14:36:44 -08:00

12 Commits