mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-04 07:27:34 +00:00

This PR adds a new developer tool for profiling performance: `py-spy`. It also adds a new make command that starts a Docker container with your local `unstructured` repo mounted, for quickly testing code in a Rocky Linux environment (see usage below for intent).

### py-spy

It is a sampling profiler (https://github.com/benfred/py-spy) and in practice usually provides more readily usable information than the commonly used `cProfile`. It also supports output in the `speedscope` format, [which](https://github.com/jlfwong/speedscope#usage) provides a rich view of the profiling result.

### usage

The new tool is added to the existing `profile.sh` script and is readily discoverable in the interactive interface. When you choose to view the new speedscope-format profile, it opens in your local browser, provided you followed the readme and installed speedscope locally via `npm install -g speedscope`. On macOS the profiling tool needs superuser privileges. If you are not comfortable with that and your local dev environment is macOS, feel free to run the profiling inside a Linux container instead.
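
As a rough illustration of the kind of invocation involved, the sketch below builds a `py-spy record` command that writes a speedscope-format profile. It is not the command `profile.sh` actually runs, which is not shown here. The script name `run_partition.py`, the input file `example.pdf`, and the output file name are placeholders chosen for the example.

```python
import shlex
import sys


def build_py_spy_command(script, script_args, output="profile.speedscope.json"):
    """Build a `py-spy record` command that emits a speedscope-format profile.

    The default output file name is a placeholder, not one used by the repo.
    """
    return [
        "py-spy", "record",
        "--format", "speedscope",  # other py-spy formats: flamegraph, raw
        "--output", output,
        "--",  # everything after this is the program being profiled
        sys.executable, script, *script_args,
    ]


if __name__ == "__main__":
    # Hypothetical script and input names, used only for illustration.
    cmd = build_py_spy_command("run_partition.py", ["example.pdf", "hi_res"])
    # Print the command rather than running it; on macOS it would need sudo.
    print(shlex.join(cmd))
```

The resulting JSON file can then be opened with the `speedscope` CLI installed via npm, as described above.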
24 lines
690 B
Python
import os
import sys

from unstructured.partition.auto import partition

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(
            "Please provide the path to the file as the first argument and the strategy as the "
            "second argument.",
        )
        sys.exit(1)

    file_path = sys.argv[1]
    strategy = sys.argv[2]
    model_name = None
    if len(sys.argv) > 3:
        model_name = sys.argv[3]
    else:
        model_name = os.environ.get("PARTITION_MODEL_NAME")
    result = partition(file_path, strategy=strategy, model_name=model_name)
    # access element in the return value to make sure we got something back, otherwise error
    result[1]
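
The script takes the model name from an optional third command-line argument and otherwise falls back to the `PARTITION_MODEL_NAME` environment variable. A minimal sketch of that precedence, pulled into a standalone helper so it can be checked without importing `unstructured`, might look like the following. The helper name `resolve_model_name` and the model names in the usage lines are hypothetical and do not appear in the script.

```python
import os


def resolve_model_name(argv, environ=None):
    """Return argv[3] when present, else the PARTITION_MODEL_NAME variable (or None)."""
    if environ is None:
        environ = os.environ
    if len(argv) > 3:
        # An explicit command-line argument takes precedence.
        return argv[3]
    # Fall back to the environment; .get returns None when the variable is unset.
    return environ.get("PARTITION_MODEL_NAME")


if __name__ == "__main__":
    # "cli_model" and "env_model" are made-up names for illustration.
    env = {"PARTITION_MODEL_NAME": "env_model"}
    print(resolve_model_name(["script.py", "doc.pdf", "hi_res", "cli_model"], env))
    print(resolve_model_name(["script.py", "doc.pdf", "hi_res"], env))
    print(resolve_model_name(["script.py", "doc.pdf", "hi_res"], {}))
```

Passing the environment as a parameter keeps the helper free of global state, which makes the fallback order easy to test.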