8 Commits

Author SHA1 Message Date
cragwolfe
bd8a74d686
chore: shell scripts default indent of 2 instead of 4 (#2287)
Given the tendency for shell scripts to easily enter into a few levels
of indentation and long line lengths, update the default to 2 spaces.
2023-12-19 07:48:21 +00:00
Roman Isecke
76efcf4dd7
chore: add shfmt (#2246)
### Description
Given all the shell files that now exist in the repo, would be nice to
have linting/formatting around them (in addition to the existing
shellcheck which doesn't do anything to format the shell code). This PR
introduces `shfmt` to both check for changes and apply formatting when
the associated make targets are called.
2023-12-12 01:04:15 +00:00
Austin Walker
0abebb5fe6
fix: fix benchmark script when DOCKER_TEST=true (#1515)
The home directory for our dockerfile changed and broke this script. To
verify, try running the benchmark script:

```
export DOCKER_TEST=true
./scripts/performance/benchmark.sh
```

I'll pull in the latest changelog before merging.
2023-10-02 16:08:26 +00:00
Steve Canny
b54994ae95
rfctr: docx partitioning (#1422)
Reviewers: I recommend reviewing commit-by-commit or just looking at the
final version of `partition/docx.py` as View File.

This refactor solves a few problems but mostly lays the groundwork to
allow us to refine further aspects such as page-break detection,
list-item detection, and moving python-docx internals upstream to that
library so our work doesn't depend on that domain-knowledge.
2023-09-19 15:32:46 -07:00
Yao You
b504a48e06
dev: add py-spy profiling (#1251)
This PR adds a new developer tool for profiling performance: `py-spy`.
Additionally it adds a new make command to start a docker with your
local `unstructured` repo mounted for quick testing code in a Rocky
Linux environment (see usage below for intent).

### py-spy

It is a sampling profiler https://github.com/benfred/py-spy and in
practice usually provides more readily usable information than commonly
used `cProfiler`. It also supports output to `speedscope` format,
[which](https://github.com/jlfwong/speedscope#usage) provides a rich
view of the profiling result.

### usage

The new tool is added to the existing `profile.sh` script and is readily
discoverable in the interactive interface. When select to view the new
speedscope format profile it would show up in your local browser if you
followed the readme to install speedscope locally via `npm install -g
speedscope`.

On macOS the profiling tool needs superuser privilege. If you are not
comfortable with that feel free to run the profiling inside a Linux
container if your local dev env is macOS.
2023-08-31 19:26:29 +00:00
ryannikolaidis
e08936b6fb
chore: update all bash scripts to use shebang: /usr/bin/env bash (#779) 2023-06-20 16:00:55 -07:00
ryannikolaidis
7d157c1ede
test: add benchmark script (#638) 2023-06-05 09:14:43 -07:00
ryannikolaidis
bdef4fd398
test: adds profiling script (#661) 2023-06-01 21:26:05 +00:00