unstructured/example-docs/file_we_dont_want_imported
qued b10379c14c
Fix: plug security issue partition system files via include (#3908)
#### Summary

A recent security review showed that it was possible to partition
arbitrary local files in cases where the filetype supports an "include"
functionality that brings in the content of files external to the
partitioned file. This affects `rst` and `org` files.

#### Fix

This PR fixes the above issue by passing the parameter `sandbox=True` in
all cases where `pypandoc.convert_file` is called.

Note I also added the parameter to a call to this method in the ODT
code. I haven't investigated whether there was a security issue with ODT
files, but it seems better to use pandoc in sandbox mode given the
security issues we know about.

#### Testing

To verify that the tests that are added with this PR find the relevant
issue:
- Remove the `sandbox=True` text from
`unstructured/file_utils/file_conversion.py` line 17.
- Run the tests
`test_unstructured.partition.test_rst.test_rst_wont_include_external_files`
and
`test_unstructured.partition.test_org.test_org_wont_include_external_files`.
Both should fail due to the partitioning containing the word "wombat",
which only appears in a file external to the partitioned file.
- Add the parameter back in, and the tests pass.
2025-02-06 03:27:18 +00:00

1 line
6 B
Plaintext