pre-commit hooks (#2819)

* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
Sara Zan 2022-07-26 15:02:15 +02:00 committed by GitHub
parent 3c81103db7
commit 2d65c380f1
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
19 changed files with 387 additions and 153 deletions


@ -1,27 +0,0 @@
#!/bin/bash
echo "========== Apply Black ========== "
black .
echo
echo "========== Convert tutorial notebooks into webpages ========== "
python .github/utils/convert_notebooks_into_webpages.py
echo
echo "========== Generate OpenAPI docs ========== "
python .github/utils/generate_openapi_specs.py
echo
echo "========== Generate JSON schema ========== "
python .github/utils/generate_json_schema.py
echo
echo "========== Generate the API documentation ========== "
set -e # Fails on any error in the following loop
export PYTHONPATH=$PWD/docs/pydoc # Make the renderers available to pydoc
cd docs/_src/api/api/
for file in ../pydoc/* ; do
echo "Processing" $file
pydoc-markdown "$file"
done
echo

.github/utils/convert_notebooks_into_webpages.py vendored Normal file → Executable file

@ -1,3 +1,5 @@
#!/usr/bin/env python3
import re
from nbconvert import MarkdownExporter
@ -142,7 +144,7 @@ date: "2022-06-15"
id: "tutorial17md"
--->""",
18: """<!---
title: "Tutorial 18"
title: "Tutorial 18"
metaTitle: "GPL Domain Adaptation"
metaDescription: ""
slug: "/docs/tutorial18"

.github/utils/generate_json_schema.py vendored Normal file → Executable file

@ -1,3 +1,5 @@
#!/usr/bin/env python3
import sys
import logging
from pathlib import Path

.github/utils/generate_openapi_specs.py vendored Normal file → Executable file

@ -1,3 +1,5 @@
#!/usr/bin/env python3
import json
from pathlib import Path
import os

.github/utils/pydoc-markdown.sh vendored Executable file

@ -0,0 +1,7 @@
#!/bin/bash
set -e # Fails on any error in the following loop
cd docs/_src/api/api/
for file in ../pydoc/* ; do
pydoc-markdown "$file"
done


@ -1,59 +0,0 @@
name: Code & Documentation Updates
on:
# Activate this workflow manually
workflow_dispatch:
# Activate this workflow at every push of code changes
# Note: using push instead of pull_request make the actions
# run on the contributor's actions instead of Haystack's.
# This is necessary for permission issues: Haystack's CI runners
# cannot push changes back to the source fork.
# TODO make sure this is still necessary later on.
push:
branches-ignore:
- 'master'
jobs:
run:
runs-on: ubuntu-latest
steps:
- run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Set up Python 3.7
uses: actions/setup-python@v2
with:
python-version: 3.7
- name: Cache Python
uses: actions/cache@v2
with:
path: ${{ env.pythonLocation }}
key: linux-${{ env.date }}-${{ hashFiles('**/setup.py') }}-${{ hashFiles('**/setup.cfg') }}-${{ hashFiles('**/pyproject.toml') }}
- name: Install Dependencies
run: |
pip install --upgrade pip
pip install .[all]
pip install rest_api/
pip install ui/
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cpu.html
- name: Install sndfile
run: sudo apt update && sudo apt-get install libsndfile1 ffmpeg
- name: Code and Docs Updates
run: ./.github/utils/code_and_docs.sh
# Commit the files to GitHub
- name: Commit files
run: |
git status
git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
git add .
git commit -m "Update Documentation & Code Style" -a || echo "No changes to commit"
git push

.github/workflows/black.yml vendored Normal file

@ -0,0 +1,47 @@
name: Black
on:
workflow_dispatch: # Activate this workflow manually
pull_request:
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install Haystack
run: |
pip install --upgrade pip
pip install .[dev]
- name: Check status
run: |
if ! black . --check; then
git status
echo "###################################################################################################"
echo "# "
echo "# CHECK FAILED! Black found issues with your code formatting."
echo "# "
echo "# Either:"
echo "# 1. Run Black locally before committing:"
echo "# "
echo "# pip install black==22.6.0"
echo "# black ."
echo "# "
echo "# 2. Install the pre-commit hook:"
echo "# "
echo "# pre-commit install --hook-type pre-push"
echo "# "
echo "# 3. See https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md for help."
echo "# "
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# "
echo "##################################################################################################"
exit 1
fi

.github/workflows/documentation.yml vendored Normal file

@ -0,0 +1,49 @@
name: Documentation
on:
workflow_dispatch: # Activate this workflow manually
pull_request:
jobs:
api-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install Haystack
run: |
pip install --upgrade pip
pip install -U .[dev]
- name: Update API documentation
run: .github/utils/pydoc-markdown.sh
- name: Check status
run: |
if [[ `git status --porcelain` ]]; then
git status
echo "###################################################################################################"
echo "# "
echo "# CHECK FAILED! The API docs were not updated."
echo "# "
echo "# Either:"
echo "# 1. Generate the new API docs locally before committing:"
echo "# "
echo "# .github/utils/pydoc-markdown.sh"
echo "# "
echo "# 2. Install the pre-commit hook:"
echo "# "
echo "# pre-commit install --hook-type pre-push"
echo "# "
echo "# 3. See https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md for help."
echo "# "
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# "
echo "###################################################################################################"
exit 1
fi

.github/workflows/schemas.yml vendored Normal file

@ -0,0 +1,83 @@
name: Schemas
on:
workflow_dispatch: # Activate this workflow manually
pull_request:
jobs:
openapi:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install Haystack
run: |
pip install --upgrade pip
pip install .[dev]
pip install -U rest_api/
- name: Update OpenAPI specs
run: python .github/utils/generate_openapi_specs.py
- name: Check status
run: |
if [[ `git status --porcelain` ]]; then
git status
echo "# "
echo "# CHECK FAILED! OpenAPI specs were not updated."
echo "# "
echo "# Please generate the new OpenAPI specs locally:"
echo "# "
echo "# python .github/utils/generate_openapi_specs.py"
echo "# "
echo "# Or see https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md for help."
echo "# "
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# "
exit 1
fi
pipeline-yaml:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install sndfile
run: sudo apt update && sudo apt-get install libsndfile1 ffmpeg
- name: Install Haystack
run: |
pip install --upgrade pip
pip install -U .[all]
- name: Update pipeline YAML schemas
run: python .github/utils/generate_json_schema.py
- name: Check status
run: |
if [[ `git status --porcelain` ]]; then
git status
echo "##################################################################################################"
echo "# "
echo "# CHECK FAILED! The YAML schemas for pipelines were not updated."
echo "# "
echo "# Please generate the new schemas locally:"
echo "# "
echo "# python .github/utils/generate_json_schema.py"
echo "# "
echo "# Or see https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md for help."
echo "# "
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# "
echo "##################################################################################################"
exit 1
fi


@ -96,7 +96,7 @@ jobs:
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install torch-scatter
run: pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cpu.html
@ -499,7 +499,7 @@ jobs:
run: |
pip install -U rest_api/
pip install -U ui/
- name: Run tests
run: |
pytest ${{ env.PYTEST_PARAMS }} rest_api/ ui/
@ -618,39 +618,3 @@ jobs:
# FIXME many tests are disabled here!
run: |
pytest ${{ env.PYTEST_PARAMS }} -m "integration and not tika and not graphdb" ${{ env.SUITES_EXCLUDED_FROM_WINDOWS }} test/${{ matrix.folder }} --document_store_type=memory,faiss,elasticsearch
# This CI action mirrors autoformat.yml, with the difference that it
# runs on Haystack's end. If the contributor hasn't run autoformat.yml,
# then this check will fail.
bot-check:
runs-on: ubuntu-latest
needs: unit-tests-linux
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install Dependencies
run: |
pip install --upgrade pip
pip install .[all]
pip install rest_api/
pip install ui/
- name: Code and Docs Updates
run: ./.github/utils/code_and_docs.sh
# If there is anything to commit, fail
- name: Check status
run: |
if [[ `git status --porcelain` ]]; then
git status
echo ""
echo "This means that the 'autoformat.yml' action didn't run."
echo "Please enable GitHub Action on your fork to pass this check!"
echo "See https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md#forks for instructions"
exit 1
fi


@ -14,6 +14,43 @@ env:
jobs:
docs-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: ./.github/actions/python_cache/
- name: Install Haystack
run: |
pip install --upgrade pip
pip install .[dev]
- name: Docs Check
run: python .github/utils/convert_notebooks_into_webpages.py
- name: Status
run: |
if [[ `git status --porcelain` ]]; then
git status
echo "##################################################################################################"
echo "#"
echo "# CHECK FAILED! You need to update the static version of the tutorials."
echo "#"
echo "# Please run the tutorials documentation update script:"
echo "#"
echo "# python .github/utils/convert_notebooks_into_webpages.py"
echo "#"
echo "# or see https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md for help."
echo "#"
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "#"
echo "##################################################################################################"
exit 1
fi
run:
runs-on: ubuntu-latest

.pre-commit-config.yaml Normal file

@ -0,0 +1,39 @@
default_language_version:
python: python3.7
fail_fast: true
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.2.0
hooks:
- id: check-ast # checks Python syntax
- id: check-json # checks JSON syntax
- id: check-yaml # checks YAML syntax
- id: check-toml # checks TOML syntax
- id: end-of-file-fixer # checks there is a newline at the end of the file
- id: trailing-whitespace # trims trailing whitespace
- id: check-merge-conflict # checks for no merge conflict strings
- id: check-shebang-scripts-are-executable # checks all shell scripts have executable permissions
- id: mixed-line-ending # normalizes line endings
- id: no-commit-to-branch # prevents committing to master
- id: pretty-format-json # indents and sorts JSON files
- repo: https://github.com/psf/black
rev: 22.6.0 # IMPORTANT: keep this aligned with the black version in setup.cfg
hooks:
- id: black
# These can fail if some dependencies are missing. Also untested on Windows.
- repo: local
hooks:
- id: pydoc-markdown
name: Update API documentation (slow)
entry: .github/utils/pydoc-markdown.sh
language: script
types: [bash]
pass_filenames: false
always_run: true
# TODO we can make mypy and pylint run at this stage too, once their execution gets normalized


@ -4,45 +4,132 @@ We are very open to community contributions and appreciate anything that improve
To avoid unnecessary work on either side, please stick to the following process:
1. Check if there is already [a related issue](https://github.com/deepset-ai/haystack/issues).
2. Open a new issue to start a quick discussion. Some features might be a nice idea, but don't fit in the scope of Haystack and we hate to close finished PRs!
3. Create a pull request in an early draft version and ask for feedback. If this is your first pull request and you wonder how to actually create a pull request, checkout [this manual](https://opensource.com/article/19/7/create-pull-request-github).
4. Verify that all tests in the CI pass (and add new ones if you implement anything new)
2. If not, open a new issue to start a quick discussion. Some features might be a nice idea, but don't fit in the scope of Haystack and we hate to close finished PRs!
3. Once you're ready to start, set up your development environment (see below).
4. Once you have commits to publish, create a draft pull request with the initial sketch of the implementation and ask for feedback. **Do not wait until the feature is complete!** If this is your first pull request and you wonder how to actually create one, check out [this manual](https://opensource.com/article/19/7/create-pull-request-github).
5. Verify that all tests in the CI pass (and add new ones if you implement anything new).
## Setting up your development environment
Even though Haystack runs on Linux, MacOS and Windows, we currently support development mostly on Linux and MacOS. Windows contributors might encounter issues. To work around these, consider using [WSL](https://docs.microsoft.com/en-us/windows/wsl/about) for contributing to Haystack.
The following instructions are **tested on Linux (Ubuntu).**
### Prerequisites
Before starting, make sure your system packages are up-to-date and that a few dependencies are installed. From the terminal, run:
```bash
sudo apt update && sudo apt-get install libsndfile1 ffmpeg
```
You might need to install additional dependencies, depending on what exactly you will be working with. Refer to the relevant node's documentation to understand which dependencies are required.
### Installation
Now fork and clone the repo. From the terminal, run:
```bash
git clone https://github.com/<your-gh-username>/haystack.git
```
or use your favourite Git(Hub) client.
Then move into the cloned folder, create a virtualenv, and perform an **editable install**.
```bash
# Move into the cloned folder
cd haystack/
# Create a virtual environment
python3 -m venv venv
# Activate the environment
source venv/bin/activate
# Upgrade pip (very important!)
pip install --upgrade pip
# Install Haystack in editable mode
pip install -e '.[all]'
```
This will install all the dependencies you need to work on the codebase, plus testing and formatting dependencies.
Last, install the pre-commit hooks with:
```bash
pre-commit install --hook-type pre-push
```
This utility will run some tasks right before all `git push` operations. From now on, your `git push` output for Haystack should look something like this:
```
> git push
check python ast.....................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check yaml...............................................................Passed
check toml...........................................(no files to check)Skipped
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
check for merge conflicts................................................Passed
check that scripts with shebangs are executable..........................Passed
mixed line ending........................................................Passed
don't commit to branch...................................................Passed
pretty format json...................................(no files to check)Skipped
black................................................(no files to check)Skipped
Update API documentation (slow)..........................................Passed
Enumerating objects: 14, done.
Counting objects: 100% (14/14), done.
Delta compression using up to 16 threads
Compressing objects: 100% (12/12), done.
Writing objects: 100% (12/12), 1.06 KiB | 1.06 MiB/s, done.
Total 12 (delta 8), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (8/8), completed with 2 local objects.
To github.com:deepset-ai/haystack.git
801be454..6fddb35f my-branch -> my-branch
```
Note: pre-commit hooks might fail. If that happens to you and you can't understand why, please do the following:
- Ask for help by opening an issue or reaching out on our Slack channel. We usually give some feedback within a day for most questions.
- As a last resort, if everything else has failed, ask Git to skip the hooks with `git push --no-verify`. Because the hooks are installed as pre-push hooks, this flag skips them for that push and lets you publish your changes. The CI might fail, but at that point we will be able to help.
- In case of further issues pushing your changes, please uninstall the hook with `pre-commit uninstall -t pre-commit -t pre-push` and review your Git setup.
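If a hook does not seem to run at all, it can help to confirm that it is installed before reviewing the rest of your Git setup. The following is a minimal Python sketch, not part of Haystack: the function name `pre_push_hook_installed` is hypothetical, and it assumes the default hooks directory (`.git/hooks`) and that the hook script written by pre-commit mentions `pre-commit` somewhere in its text.

```python
from pathlib import Path


def pre_push_hook_installed(repo_dir: str = ".") -> bool:
    """Return True if a pre-push hook that mentions pre-commit exists.

    Assumption: the repository uses the default hooks directory
    (.git/hooks) and has not overridden it with core.hooksPath.
    """
    hook = Path(repo_dir) / ".git" / "hooks" / "pre-push"
    if not hook.is_file():
        return False
    # Assumption: the script generated by pre-commit names the tool in its text.
    return "pre-commit" in hook.read_text(errors="ignore")


if __name__ == "__main__":
    print("pre-push hook installed:", pre_push_hook_installed())
```

Run it from the root of your clone. If it reports that no hook is installed, running `pre-commit install --hook-type pre-push` again is the first thing to try.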
## Formatting of Pull Requests
Please give a concise description in the first comment in the PR that includes:
When you open a pull request, please give a concise description in the first comment in the PR that includes:
- What is changing?
- Why?
- What are limitations?
- Breaking changes (Example of before vs. after)
- Link the issue that this relates to
## CI (Continuous Integration)
We use GitHub Actions for our Continuous Integration tasks. This means that, as soon as you open a PR, GitHub will start executing some workflows on your code, like automated tests, linting, formatting, API docs generation, etc.
If all goes well, at the bottom of your PR page you should see something like this, where all checks are green.
![Successful CI](docs/img/ci-success.png)
If you see some red checks (like the following), then something didn't work, and action is needed from your side.
![Failed CI](docs/img/ci-failure-example.png)
Click on the failing test and see if there are instructions at the end of the logs of the failed test.
For example, in the case above, the CI will give you instructions on how to fix the issue.
![Logs of failed CI, with instructions for fixing the failure](docs/img/ci-failure-example-instructions.png)
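Several of these checks work the same way: the workflow regenerates some files (API docs, OpenAPI specs, schemas, tutorials) and fails if `git status --porcelain` then reports any change. Below is a minimal Python sketch of that check, useful for running it locally after a generator script. The helper names are hypothetical and not part of Haystack, and the parser assumes the short porcelain v1 format with unquoted paths.

```python
import subprocess
from typing import List


def parse_porcelain(output: str) -> List[str]:
    """Return the paths listed in `git status --porcelain` (v1) output."""
    paths = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is two status characters, a space, then the path.
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def working_tree_changes(repo_dir: str = ".") -> List[str]:
    """Run `git status --porcelain` in repo_dir and return the changed paths."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

Calling `working_tree_changes()` from the repository root after running, for example, `python .github/utils/generate_openapi_specs.py` returns an empty list when there is nothing left to commit; any path it returns is a generated file the CI would flag.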
## Working from GitHub forks
Some actions in our CI (code style and documentation updates) will run on your code and occasionally commit back small changes after a push. To be able to do so,
these actions are configured to run on your fork instead of on the base repository. To allow those actions to run, please don't forget to:
In order for maintainers to be able to help you, we usually ask contributors to give us push access to their fork.
1. Enable actions on your fork with read and write permissions:
To do so, please verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked (you can check it later on the PR's sidebar once it's created).
<p align="center"><img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/fork_action_config.png"></p>
![Allow access to your branch to maintainers](docs/img/first_time_contributor_enable_access.png)
2. Verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked (you can check it later on the PR's sidebar once it's created).
<p align="center"><img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/first_time_contributor_enable_access.png"></p>
3. Make sure the branch of your fork where you push your changes is not called `master`. If it is, either change its name or remember to manually trigger the `Code & Documentation Updates` action after a push.
## Setting up your development environment
When working on Haystack, we recommend installing it in editable mode with `pip install -e` in a Python virtual
environment. From the root folder:
```
pip install -e '.[test]'
```
This will install all the dependencies you need to work on the codebase, which most of the times is a subset of all the
dependencies needed to run Haystack.
## Running the tests


@ -1,5 +1,5 @@
<!---
title: "Tutorial 18"
title: "Tutorial 18"
metaTitle: "GPL Domain Adaptation"
metaDescription: ""
slug: "/docs/tutorial18"

Binary file added (image, 131 KiB).

Binary file added (image, 102 KiB).

docs/img/ci-success.png: binary file added (image, 50 KiB).

Binary file removed (image, 220 KiB).


@ -101,7 +101,7 @@ install_requires =
# context matching
rapidfuzz>=2.0.15,<3
# Schema validation
jsonschema
@ -187,6 +187,7 @@ ray =
colab =
grpcio==1.43.0
dev =
pre-commit
# Type check
mypy
typing_extensions; python_version < '3.8'
@ -201,7 +202,7 @@ dev =
# Linting
pylint
# Code formatting
black[jupyter]
black[jupyter]==22.6.0
# Documentation
pydoc-markdown==4.5.1 # FIXME Unpin!
mkdocs