unstructured/scripts/performance/run_partition.py
Yao You b504a48e06
dev: add py-spy profiling (#1251)
This PR adds a new developer tool for profiling performance: `py-spy`.
Additionally, it adds a new make command that starts a Docker container
with your local `unstructured` repo mounted, for quickly testing code in
a Rocky Linux environment (see usage below for intent).

### py-spy

`py-spy` (https://github.com/benfred/py-spy) is a sampling profiler that,
in practice, usually provides more readily usable information than the
commonly used `cProfile`. It also supports output in the `speedscope`
format, [which](https://github.com/jlfwong/speedscope#usage) provides a
rich view of the profiling result.
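
As a rough illustration of what a speedscope-format recording looks like on the command line, the sketch below builds a `py-spy record` invocation. The helper name `build_py_spy_command`, the output file name, and the example script arguments are assumptions made for illustration; they are not taken from `profile.sh`, which may invoke `py-spy` differently.

```python
import sys


def build_py_spy_command(script, script_args, output_path="profile.speedscope.json"):
    """Return an argv list that records `script` with py-spy in speedscope format.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return [
        "py-spy",
        "record",
        "--format", "speedscope",  # emit a profile that speedscope can open
        "--output", output_path,
        "--",  # everything after this is the program being profiled
        sys.executable,
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Example arguments (assumed): a document path and a partition strategy.
    cmd = build_py_spy_command(
        "scripts/performance/run_partition.py", ["example.pdf", "fast"]
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run`; on macOS the command would need to be run with `sudo`, as noted under usage.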

### usage

The new tool is added to the existing `profile.sh` script and is readily
discoverable in its interactive interface. When you choose to view the
new speedscope-format profile, it opens in your local browser, provided
you followed the readme and installed speedscope locally via
`npm install -g speedscope`.

On macOS the profiling tool needs superuser privileges. If your local
dev environment is macOS and you are not comfortable with that, feel
free to run the profiling inside a Linux container.
2023-08-31 19:26:29 +00:00


import os
import sys

from unstructured.partition.auto import partition

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(
            "Please provide the path to the file as the first argument and the strategy as the "
            "second argument.",
        )
        sys.exit(1)

    file_path = sys.argv[1]
    strategy = sys.argv[2]
    # the model name comes from the optional third argument, falling back to the
    # PARTITION_MODEL_NAME environment variable (None if neither is set)
    if len(sys.argv) > 3:
        model_name = sys.argv[3]
    else:
        model_name = os.environ.get("PARTITION_MODEL_NAME")
    result = partition(file_path, strategy=strategy, model_name=model_name)
    # index into the return value to make sure we got something back; this raises
    # IndexError if fewer than two elements were returned
    result[1]
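
The model-name lookup in the script (third command-line argument first, then the `PARTITION_MODEL_NAME` environment variable) can be isolated for testing. The following is a minimal sketch; `resolve_model_name` is a hypothetical helper and is not part of the repository.

```python
import os


def resolve_model_name(argv, environ=os.environ):
    """Pick the model name the same way the script does.

    argv[3], when present, takes precedence; otherwise fall back to the
    PARTITION_MODEL_NAME environment variable, which may be unset (None).
    """
    if len(argv) > 3:
        return argv[3]
    return environ.get("PARTITION_MODEL_NAME")
```

Passing `environ` explicitly keeps the function independent of the real process environment, which makes the precedence rule easy to check in isolation.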