qued 3f07840b80
chore: deprecate stage_for_label_studio (#3968)
This PR is to address [a
CVE](https://github.com/advisories/GHSA-rgv9-w7jp-m23g) that appeared in
a recent scan.

The CVE has to do with the package `label_studio_sdk`. This relates to
the tool Label Studio, a data labeling platform. We built a staging
function that takes a list of elements and converts it to a format
suitable for passing to the LabelStudio platform.

We don't use the package with the vulnerability in the actual function,
we only use it to test the output of the function against the Label
Studio API schema.

Even the test where we use it is sort of questionable in value, since
it's really testing the schema against an old version of the LabelStudio
API (we are testing against a recording of the Label Studio API's
responses stored using `vcrpy`).

Label Studio has fixed the vulnerability as of version 1.0.10 of their
SDK, but we're stuck on 1.0.5 because 1.0.6 and above require
`numpy<2.0.0`.

This leaves us with several choices of resolution, some of which are:
1. Downgrade `numpy` to upgrade `label_studio_sdk` to >=1.0.10 to
resolve the CVE
2. Drop `label_studio_sdk` by either removing or rewriting the test.
3. Drop test and dev dependencies from the `unstructured` image.

We've decided to do 2. _and_ 3. This PR handles 2., with 3. to be a
follow-on PR.

Here we add a deprecation notice to `stage_for_label_studio` and remove
the offending test. Normally good practice would be to add a warning of
future deprecation to the function for a reasonable amount of time, but
in order to address the CVE immediately, we're deprecating it right
away.

### Testing
Install the dependencies (`make install`) into a fresh environment, and
`pip list | grep label` should have no results. The scan artifact in CI
should contain no "high" or "critical" CVEs.
2025-03-26 23:37:03 +00:00
..
2024-08-06 19:21:43 +00:00