mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2026-02-07 15:57:42 +00:00
- This pull request adds the `codeflash.yml` workflow, which runs on every new pull request that modifies source code in the `unstructured` directory.
- We set up the Codeflash config in the `pyproject.toml` file. This defines the basic project config for Codeflash.
- The workflow uses uv to install the CI dependencies faster than your current caching solution. Faster installs mean quicker optimizations.
- Please take a look at the requirements that are being installed, and feel free to add more to the install list. Codeflash tries to execute code, and if a dependency needed to run something is missing, it will fail to optimize.
- Codeflash is installed on every CI run, so the workflow always uses the latest version of Codeflash, which is improving rapidly. Feel free to add Codeflash as a dev dependency as well, since we are about to release more local optimization tools such as VS Code and Claude Code extensions.
- Feel free to modify this GitHub Action any way you want.

**Actions required to make this work:**

- Install the Codeflash GitHub app from [this link](https://github.com/apps/codeflash-ai/installations/select_target) on this repo. This is required for our GitHub bot to comment and create suggestions on the repo.
- Create a new `CODEFLASH_API_KEY` after signing up to [Codeflash from our website](https://www.codeflash.ai/). The onboarding will ask you to create an API key and show instructions on how to save it in your repo secrets.

Then, after this PR is merged, it will start generating new optimizations 🎉

---------

Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
55 lines
1.8 KiB
YAML
name: Codeflash Optimization

on:
  pull_request:
    paths:
      - 'unstructured/**'

  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  optimize:
    name: Optimize new Python code
    if: ${{ github.actor != 'codeflash-ai[bot]' }}
    runs-on: ubuntu-latest
    env:
      NLTK_DATA: ${{ github.workspace }}/nltk_data
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: 🐍 Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: 3.12
      - name: 📦 Install Environment
        uses: ./.github/actions/base-cache
        with:
          python-version: 3.12
      - name: ⚡️ Codeflash Optimization
        env:
          UNS_API_KEY: ${{ secrets.UNS_API_KEY }}
          TESSERACT_VERSION: "5.5.1"
          CODEFLASH_API_KEY: ${{ secrets.CODEFLASH_API_KEY }}
        run: |
          source .venv/bin/activate
          sudo apt-get update
          sudo apt-get install -y libmagic-dev poppler-utils libreoffice
          sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
          sudo apt-get update
          sudo apt-get install -y tesseract-ocr tesseract-ocr-kor
          tesseract --version
          installed_tesseract_version=$(tesseract --version | grep -oP '(?<=tesseract )\d+\.\d+\.\d+')
          if [ "$installed_tesseract_version" != "${{env.TESSERACT_VERSION}}" ]; then
            echo "Tesseract version ${{env.TESSERACT_VERSION}} is required but found version $installed_tesseract_version"
            exit 1
          fi
          # FIXME (yao): sometimes there is cache but we still miss argilla in the env; so we add make install-ci again
          make install-ci
          pip install codeflash
          codeflash
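
The Tesseract version gate in the final step can be tried locally without installing anything. The following is a minimal sketch, not part of the workflow: the function names `extract_version` and `check_version` are hypothetical, and `sed -E` is swapped in for the workflow's `grep -oP` so the parsing also works where PCRE support in grep is unavailable. It assumes the version appears as `tesseract X.Y.Z` at the start of a line, which is the same pattern the workflow's `grep` relies on.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `tesseract --version`-style text on stdin and
# prints the first X.Y.Z version found on a line starting with "tesseract ".
extract_version() {
  sed -nE 's/^tesseract ([0-9]+\.[0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Hypothetical helper: returns 0 when the installed version equals the
# required one; otherwise prints the workflow's error message and returns 1.
check_version() {
  local required="$1" installed="$2"
  if [ "$installed" != "$required" ]; then
    echo "Tesseract version $required is required but found version $installed" >&2
    return 1
  fi
}
```

With Tesseract installed, a call such as `check_version "5.5.1" "$(tesseract --version 2>&1 | extract_version)"` would mirror the gate in the workflow. Exact string comparison is kept deliberately, since the workflow pins one specific version rather than a minimum.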