# Performance
This is a collection of tools for inspecting and tracking the performance of the Unstructured library.

The benchmarking script times partitioning against a fixed set of test documents and can store the results in S3, tagged with the architecture, instance type, and git hash.

The profiling script shows how time and memory are spent across the functions called while partitioning a given document.
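This README does not say how the benchmark script derives the architecture and git hash it records. The sketch below shows one way to collect equivalent values with standard tools (`uname`, `git`). It assumes nothing about `benchmark.sh` itself, and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: gather the kind of metadata described above.
# Variable names are illustrative and are NOT taken from benchmark.sh.

# CPU architecture of the current machine, e.g. x86_64 or arm64.
ARCH="$(uname -m)"

# Short hash of the checked-out commit; falls back to "unknown" outside a git repo.
GIT_HASH="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"

# Instance type comes from the environment; "unspecified" mirrors the documented default.
INSTANCE_TYPE="${INSTANCE_TYPE:-unspecified}"

echo "arch=${ARCH} instance_type=${INSTANCE_TYPE} git_hash=${GIT_HASH}"
```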
## Install
Benchmarking requires no additional dependencies and should work without any initial setup.

Profiling has a few additional dependencies, which can be installed with:

`pip install -r scripts/performance/requirements.txt`
## Run
### Benchmark
Export or assign the desired environment variables:

- `DOCKER_TEST`: Set to `true` to run the benchmark inside a Docker container (default: `false`)
- `NUM_ITERATIONS`: Number of benchmark iterations, e.g. `100` (default: `3`)
- `INSTANCE_TYPE`: Type of the benchmark instance, e.g. `"c5.xlarge"` (default: unspecified)
- `PUBLISH_RESULTS`: Set to `true` to publish results to the S3 bucket (default: `false`)

Usage: `./scripts/performance/benchmark.sh`
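The list above documents default values but not how `benchmark.sh` applies them. As a minimal sketch, assuming an unset variable simply falls back to its listed default, the resolution is equivalent to standard shell parameter expansion. The literal `unspecified` for `INSTANCE_TYPE` is an assumption, since the script may instead leave it empty.

```shell
#!/usr/bin/env bash
# Sketch only: resolve each benchmark setting, falling back to the
# documented default when the variable is unset or empty.
DOCKER_TEST="${DOCKER_TEST:-false}"
NUM_ITERATIONS="${NUM_ITERATIONS:-3}"
INSTANCE_TYPE="${INSTANCE_TYPE:-unspecified}"   # assumed placeholder value
PUBLISH_RESULTS="${PUBLISH_RESULTS:-false}"

echo "docker=${DOCKER_TEST} iterations=${NUM_ITERATIONS} instance=${INSTANCE_TYPE} publish=${PUBLISH_RESULTS}"
```

To override settings for a single run without exporting them, prefix the command with the variables, for example `NUM_ITERATIONS=10 INSTANCE_TYPE=c5.xlarge ./scripts/performance/benchmark.sh`.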
### Profile
Export or assign the desired environment variables:

- `DOCKER_TEST`: Set to `true` to run profiling inside a Docker container (default: `false`)

Usage: `./scripts/performance/profile.sh`

- Run the script and choose a profiling mode: `run` or `view`.
- In `run` mode, you can profile your own files or select from the existing test files.
- In `view` mode, you can view previously generated profiling results.
- The script supports time profiling with cProfile and memory profiling with memray.
- You can choose among visualization options such as flamegraphs, tables, trees, summaries, and statistics.
- Test documents are synced from an S3 bucket to a local directory before profiles are run.
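The same two kinds of profiling can also be run by hand. The sketch below is not taken from `profile.sh`. So that it runs with only the Python standard library, it profiles a hypothetical stand-in script, `workload.py`. To profile partitioning instead, replace the workload with a real call such as `partition(filename=...)` from `unstructured.partition.auto`. The memray step runs only if memray is installed.

```shell
#!/usr/bin/env bash
# Sketch: time profiling with cProfile and (optionally) memory profiling with memray.
# workload.py is a hypothetical stand-in; to profile partitioning, swap in e.g.
#   from unstructured.partition.auto import partition
#   partition(filename="path/to/document.pdf")

WORK="$(mktemp -d)"
cat > "$WORK/workload.py" <<'EOF'
def work():
    # Placeholder CPU-bound work standing in for a partition call.
    return sum(i * i for i in range(200_000))

work()
EOF

# Time profiling: save cProfile stats to a file, then print the 5 entries
# with the highest cumulative time.
python3 -m cProfile -o "$WORK/workload.prof" "$WORK/workload.py"
python3 -c 'import pstats, sys; pstats.Stats(sys.argv[1]).sort_stats("cumulative").print_stats(5)' "$WORK/workload.prof"

# Memory profiling: record allocations with memray and render an HTML flamegraph.
if command -v memray >/dev/null 2>&1; then
    memray run -o "$WORK/workload.bin" "$WORK/workload.py"
    memray flamegraph -o "$WORK/flamegraph.html" "$WORK/workload.bin"
fi
```

memray also provides `table`, `tree`, `summary`, and `stats` reporters, which likely correspond to the other visualization options the script offers.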