unstructured/scripts/performance/README.md

# Performance
This is a collection of tools for inspecting and tracking the performance of the Unstructured library.

The benchmarking script measures how long partitioning takes against a fixed set of test documents and can store those results in S3, tagged with architecture, instance type, and git hash.

The profiling script allows a user to inspect how time and memory are spent across called functions when partitioning a given document.

## Install
Benchmarking requires no additional dependencies and should work without any initial setup.
Profiling has a few dependencies which can be installed with: 
`pip install -r scripts/performance/requirements.txt`

## Run
### Benchmark
Export or assign the desired environment variables:
- DOCKER_TEST: Set to true to run benchmark inside a Docker container (default: false)
- NUM_ITERATIONS: Number of iterations for benchmark (e.g., 100) (default: 3)
- INSTANCE_TYPE: Type of benchmark instance (e.g., "c5.xlarge") (default: unspecified)
- PUBLISH_RESULTS: Set to true to publish results to S3 bucket (default: false)

Usage: `./scripts/performance/benchmark.sh`
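
As a minimal sketch of how the variables above combine for a single run, the snippet below exports them and then calls the benchmark script. The values are illustrative only, and the existence check with its fallback message is an addition for this example, not something the script itself does.

```shell
#!/usr/bin/env sh
# Illustrative values only; adjust them to your environment.
export DOCKER_TEST=false          # run directly on the host, not in Docker
export NUM_ITERATIONS=10          # overrides the documented default of 3
export INSTANCE_TYPE="c5.xlarge"  # label stored alongside the results
export PUBLISH_RESULTS=false      # keep results local; do not upload to S3

BENCHMARK_SCRIPT="./scripts/performance/benchmark.sh"

# Guard added for this sketch: only invoke the script when it is present.
if [ -x "$BENCHMARK_SCRIPT" ]; then
  "$BENCHMARK_SCRIPT"
else
  echo "benchmark script not found; run this from the repository root" >&2
fi
```

For a one-off run, the same settings can be given inline, for example `NUM_ITERATIONS=10 INSTANCE_TYPE="c5.xlarge" ./scripts/performance/benchmark.sh`.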

### Profile

Export or assign the desired environment variables:
- DOCKER_TEST: Set to true to run profiling inside a Docker container (default: false)

Usage: `./scripts/performance/profile.sh`
- Run the script and choose the profiling mode: 'run' or 'view'.
- In the 'run' mode, you can profile custom files or select existing test files.
- In the 'view' mode, you can view previously generated profiling results.
- The script supports time profiling with cProfile and memory profiling with memray.
- Users can choose different visualization options such as flamegraphs, tables, trees, summaries, and statistics.
- Test documents are synced from an S3 bucket to a local directory before running the profiles.
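
For orientation, here is a rough sketch of what time and memory profiling with these two tools looks like when run by hand. It is not taken from `profile.sh`, which may call the tools with different options. The target script `partition_example.py`, the output file names, and the helper functions `profile_time` and `profile_memory` are all hypothetical.

```shell
#!/usr/bin/env sh
# Sketch only: profile.sh may invoke these tools with different options.

# Time profiling: cProfile ships with Python; -o writes the stats to a file.
profile_time() {
  python3 -m cProfile -o "$2" "$1"
}

# Memory profiling: `memray run -o` records allocations to a capture file.
profile_memory() {
  memray run -o "$2" "$1"
}

TARGET="partition_example.py"  # hypothetical script that partitions one document

if [ -f "$TARGET" ] && command -v memray >/dev/null 2>&1; then
  profile_time "$TARGET" time.prof
  profile_memory "$TARGET" memory.bin
  # memray reporters: an HTML flamegraph and a text statistics report.
  memray flamegraph memory.bin
  memray stats memory.bin
else
  echo "skipping: need $TARGET and memray on PATH" >&2
fi
```

memray also provides `table`, `tree`, and `summary` reporters, which line up with the visualization options listed above.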

test: adds profiling script (#661) 2023-06-01 14:26:05 -07:00