mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-04 19:06:44 +00:00
docs: update contributor guidelines (#7854)
* update contributor guidelines * remove occurences of core-integrations * punctuation * Apply suggestions from code review Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * link the contributions wanted repo --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
This commit is contained in:
parent
8de639bd70
commit
19a41a669e
465
CONTRIBUTING.md
465
CONTRIBUTING.md
@ -1,61 +1,179 @@
|
||||
# Contributing to Haystack
|
||||
|
||||
We are very open to community contributions and appreciate anything that improves `haystack`! This includes fixings typos, adding missing documentation, fixing bugs or adding new features.
|
||||
To avoid unnecessary work on either side, please stick to the following process:
|
||||
First off, thanks for taking the time to contribute! :blue_heart:
|
||||
|
||||
1. Check if there is already [a related issue](https://github.com/deepset-ai/haystack/issues).
|
||||
2. If not, open a new issue to start a quick discussion. Some features might be a nice idea, but don't fit in the scope of Haystack and we hate to close finished PRs!
|
||||
3. Once you're ready to start, setup you development environment (see below).
|
||||
4. Once you have commits to publish, create a draft pull request with the initial sketch of the implementation and ask for feedback. **Do not wait until the feature is complete!** If this is your first pull request and you wonder how to actually create a pull request, checkout [this manual](https://opensource.com/article/19/7/create-pull-request-github).
|
||||
5. Verify that all tests in the CI pass (and add new ones if you implement anything new).
|
||||
All types of contributions are encouraged and valued. See the [Table of Contents](#table-of-contents)
|
||||
for different ways to help and details about how this project handles them. Please make sure to read
|
||||
the relevant section before making your contribution. It will make it a lot easier for us maintainers
|
||||
and smooth out the experience for all involved. The community looks forward to your contributions!
|
||||
|
||||
> [!TIP]
|
||||
> If you like Haystack but just don't have time to contribute, that's fine. There are other easy ways to support the
|
||||
> project and show your appreciation, which we would also be very happy about:
|
||||
> - Star this repository
|
||||
> - Tweet about it
|
||||
> - Mention Haystack at local meetups and tell your friends/colleagues
|
||||
|
||||
## Setting up your development environment
|
||||
**Table of Contents**
|
||||
|
||||
Even though Haystack runs on Linux, MacOS and Windows, we current we mostly support development on Linux and MacOS. Windows contributors might encounter issues. To work around these, consider using [WSL](https://docs.microsoft.com/en-us/windows/wsl/about) for contributing to Haystack.
|
||||
- [Contributing to Haystack](#contributing-to-haystack)
|
||||
- [Code of Conduct](#code-of-conduct)
|
||||
- [I Have a Question](#i-have-a-question)
|
||||
- [Reporting Bugs](#reporting-bugs)
|
||||
- [Before Submitting a Bug Report](#before-submitting-a-bug-report)
|
||||
- [How Do I Submit a Good Bug Report?](#how-do-i-submit-a-good-bug-report)
|
||||
- [Suggesting Enhancements](#suggesting-enhancements)
|
||||
- [Before Submitting an Enhancement](#before-submitting-an-enhancement)
|
||||
- [How Do I Submit a Good Enhancement Suggestion?](#how-do-i-submit-a-good-enhancement-suggestion)
|
||||
- [Contribute code](#contribute-code)
|
||||
- [Where to start](#where-to-start)
|
||||
- [Setting up your development environment](#setting-up-your-development-environment)
|
||||
- [Clone the git repository](#clone-the-git-repository)
|
||||
- [Run the tests locally](#run-the-tests-locally)
|
||||
- [Requirements for Pull Requests](#requirements-for-pull-requests)
|
||||
- [Release notes](#release-notes)
|
||||
- [CI (Continuous Integration)](#ci-continuous-integration)
|
||||
- [Working from GitHub forks](#working-from-github-forks)
|
||||
- [Writing tests](#writing-tests)
|
||||
- [Unit test](#unit-test)
|
||||
- [Integration test](#integration-test)
|
||||
- [End to End (e2e) test](#end-to-end-e2e-test)
|
||||
- [Contributor Licence Agreement (CLA)](#contributor-licence-agreement-cla)
|
||||
|
||||
The following instructions are **tested on Linux (Ubuntu).**
|
||||
## Code of Conduct
|
||||
|
||||
### Prerequisites
|
||||
This project and everyone participating in it is governed by our [Code of Conduct](code_of_conduct.txt).
|
||||
By participating, you are expected to uphold this code. Please report unacceptable behavior to haystack@deepset.ai.
|
||||
|
||||
Before starting, make sure your system packages are up-to-date and that a few dependencies are installed. From the terminal, run:
|
||||
## I Have a Question
|
||||
|
||||
```bash
|
||||
sudo apt update && sudo apt-get install libsndfile1 ffmpeg
|
||||
> [!TIP]
|
||||
> If you want to ask a question, we assume that you have read the available [Documentation](https://docs.haystack.deepset.ai/docs/intro).
|
||||
|
||||
Before you ask a question, it is best to search for existing [Issues](/issues) that might help you. In case you have
|
||||
found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to
|
||||
search the internet for answers first.
|
||||
|
||||
If you then still feel the need to ask a question and need clarification, you can use one of our
|
||||
[Community Channels](https://haystack.deepset.ai/community), Discord, in particular, is often very helpful.
|
||||
|
||||
## Reporting Bugs
|
||||
|
||||
### Before Submitting a Bug Report
|
||||
|
||||
A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to
|
||||
investigate carefully, collect information, and describe the issue in detail in your report. Please complete the
|
||||
following steps in advance to help us fix any potential bug as fast as possible.
|
||||
|
||||
- Make sure that you are using the latest version.
|
||||
- Determine if your bug is really a bug and not an error on your side, for example using incompatible versions.
|
||||
Make sure that you have read the [documentation](https://docs.haystack.deepset.ai/docs/intro). If you are looking
|
||||
for support, you might want to check [this section](#i-have-a-question).
|
||||
- To see if other users have experienced (and potentially already solved) the same issue you are having, check if there
|
||||
is not already a bug report existing for your bug or error in the [bug tracker](/issues).
|
||||
- Also make sure to search the internet (including Stack Overflow) to see if users outside of the GitHub community have
|
||||
discussed the issue.
|
||||
- Collect information about the bug:
|
||||
- OS, Platform and Version (Windows, Linux, macOS, x86, ARM)
|
||||
- Version of Haystack and the integrations you're using
|
||||
- Possibly your input and the output
|
||||
- If you can reliably reproduce the issue, a snippet of code we can use
|
||||
|
||||
### How Do I Submit a Good Bug Report?
|
||||
|
||||
> [!IMPORTANT]
|
||||
> You must never report security-related issues, vulnerabilities, or bugs, including sensitive information to the issue
|
||||
> tracker, or elsewhere in public. Instead, sensitive bugs must be reported using [this link](https://github.com/deepset-ai/haystack/security/advisories/new).
|
||||
>
|
||||
|
||||
We use GitHub issues to track bugs and errors. If you run into an issue with the project:
|
||||
|
||||
- Open an [Issue of type Bug Report](/issues/new?assignees=&labels=bug&projects=&template=bug_report.md&title=).
|
||||
- Explain the behavior you would expect and the actual behavior.
|
||||
- Please provide as much context as possible and describe the *reproduction steps* that someone else can follow to
|
||||
recreate the issue on their own. This usually includes your code. For good bug reports, you should isolate the problem
|
||||
and create a reduced test case.
|
||||
- Provide the information you collected in the previous section.
|
||||
|
||||
Once it's filed:
|
||||
|
||||
- The project team will label the issue accordingly.
|
||||
- A team member will try to reproduce the issue with your provided steps. If there are no reproduction steps or no
|
||||
obvious way to reproduce the issue, the team will ask you for those steps.
|
||||
- If the team is able to reproduce the issue, the issue will scheduled for a fix or left to be
|
||||
[implemented by someone](#your-first-code-contribution).
|
||||
|
||||
## Suggesting Enhancements
|
||||
|
||||
This section guides you through submitting an enhancement suggestion, including new integrations and improvements
|
||||
to existing ones. Following these guidelines will help maintainers and the community to understand your suggestion and
|
||||
find related suggestions.
|
||||
|
||||
### Before Submitting an Enhancement
|
||||
|
||||
- Make sure that you are using the latest version.
|
||||
- Read the [documentation](https://docs.haystack.deepset.ai/docs/intro) carefully and find out if the functionality
|
||||
is already covered, maybe by an individual configuration.
|
||||
- Perform a [search](/issues) to see if the enhancement has already been suggested. If it has, add a comment to the
|
||||
existing issue instead of opening a new one.
|
||||
- Find out whether your idea fits with the scope and aims of the project. It's up to you to make a strong case to
|
||||
convince the project's developers of the merits of this feature. Keep in mind that we want features that will be
|
||||
useful to the majority of our users and not just a small subset. If you're just targeting a minority of users,
|
||||
consider writing and distributing the integration on your own.
|
||||
|
||||
### How Do I Submit a Good Enhancement Suggestion?
|
||||
|
||||
Enhancement suggestions are tracked as GitHub issues of type [Feature request for existing integrations](/issues/new?assignees=&labels=feature+request&projects=&template=feature-request-for-existing-integrations.md&title=).
|
||||
|
||||
- Use a **clear and descriptive title** for the issue to identify the suggestion.
|
||||
- Fill the issue following the template
|
||||
|
||||
## Contribute code
|
||||
|
||||
> [!IMPORTANT]
|
||||
> When contributing to this project, you must agree that you have authored 100% of the content, that you have the
|
||||
> necessary rights to the content and that the content you contribute may be provided under the project license.
|
||||
|
||||
### Where to start
|
||||
|
||||
If this is your first contribution, a good starting point is looking for an open issue that's marked with the label
|
||||
["good first issue"](https://github.com/deepset-ai/haystack/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
|
||||
The core contributors periodically mark certain issues as good for first-time contributors. Those issues are usually
|
||||
limited in scope, easily fixable and low priority, so there is absolutely no reason why you should not try fixing them,
|
||||
it's a good excuse to start looking into the project and a safe space for experimenting failure: if you don't get the
|
||||
grasp of something, pick another one! Once you become comfortable contributing to Haystack, you can have a look at the
|
||||
list of issues marked as [contributions wanted](https://github.com/orgs/deepset-ai/projects/14/views/1) to look for your
|
||||
next contribution!
|
||||
|
||||
### Setting up your development environment
|
||||
|
||||
Haystack makes heavy use of [Hatch](https://hatch.pypa.io/latest/), a Python project manager that we use to set up the
|
||||
virtual environments, build the project, and publish packages. As you can imagine, the first step towards becoming a
|
||||
Haystack contributor is installing Hatch. There are a variety of installation methods depending on your operating system
|
||||
platform, version, and personal taste: please have a look at [this page](https://hatch.pypa.io/latest/install/#installation)
|
||||
and keep reading once you can run from your terminal:
|
||||
|
||||
```console
|
||||
$ hatch --version
|
||||
Hatch, version 1.9.3
|
||||
```
|
||||
|
||||
You might need to install additional dependencies, depending on what exactly you will be working with. Refer to the relevant node's documentation to understand which dependencies are required.
|
||||
### Clone the git repository
|
||||
|
||||
### Installation
|
||||
You won't be able to make changes directly to this repo, so the first step is to [create a fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo).
|
||||
Once your fork is ready, you can clone a local copy with:
|
||||
|
||||
Now fork and clone the repo. From the terminal, run:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/<your-gh-username>/haystack.git
|
||||
```console
|
||||
$ git clone https://github.com/YOUR-USERNAME/haystack
|
||||
```
|
||||
|
||||
or use your favourite Git(Hub) client.
|
||||
If everything worked, you should be able to do something like this (the output might be different):
|
||||
|
||||
Then move into the cloned folder, create a virtualenv, and perform an **editable install**.
|
||||
|
||||
```bash
|
||||
# Move into the cloned folder
|
||||
cd haystack/
|
||||
|
||||
# Create a virtual environment
|
||||
python3 -m venv venv
|
||||
|
||||
# Activate the environment
|
||||
source venv/bin/activate
|
||||
|
||||
# Upgrade pip (very important!)
|
||||
pip install --upgrade pip
|
||||
|
||||
# Install Haystack in editable mode
|
||||
pip install -e '.[dev]'
|
||||
```console
|
||||
$ cd haystack
|
||||
$ hatch version
|
||||
2.3.0-rc0
|
||||
```
|
||||
Note that the `.[dev]` part is enough in many development scenarios when adding minor code fixes. However, if your changes require a schema change, then you'll need to install all dependencies with `pip install -e '.[all,dev]' ` command. Introducing new components or changing their interface requires a schema change.
|
||||
This will install all the dependencies you need to work on the codebase, plus testing and formatting dependencies.
|
||||
|
||||
Last, install the pre-commit hooks with:
|
||||
|
||||
@ -63,88 +181,71 @@ Last, install the pre-commit hooks with:
|
||||
pre-commit install
|
||||
```
|
||||
|
||||
This utility will run some tasks right before all `git commit` operations. From now on, your `git commit` output for Haystack should look something like this:
|
||||
This utility will run some tasks right before all `git commit` operations. From now on, your `git commit` output for
|
||||
Haystack should look something like this:
|
||||
|
||||
```
|
||||
> git commit -m "test"
|
||||
check python ast.....................................(no files to check)Skipped
|
||||
check python ast.........................................................Passed
|
||||
check json...........................................(no files to check)Skipped
|
||||
check yaml...............................................................Passed
|
||||
check toml...........................................(no files to check)Skipped
|
||||
fix end of files.........................................................Passed
|
||||
trim trailing whitespace.................................................Passed
|
||||
check for merge conflicts................................................Passed
|
||||
check that scripts with shebangs are executable..........................Passed
|
||||
check toml...........................................(no files to check)Skipped
|
||||
check yaml...........................................(no files to check)Skipped
|
||||
fix end of files.........................................................Passed
|
||||
mixed line ending........................................................Passed
|
||||
don't commit to branch...................................................Passed
|
||||
pretty format json...................................(no files to check)Skipped
|
||||
black................................................(no files to check)Skipped
|
||||
Update API documentation (slow)..........................................Passed
|
||||
[my-branch bc7f4af7] test
|
||||
1 files changed, 1 insertions(+), 0 deletions(-)
|
||||
trim trailing whitespace.................................................Passed
|
||||
black-jupyter............................................................Passed
|
||||
ruff.....................................................................Passed
|
||||
codespell................................................................Passed
|
||||
Lint GitHub Actions workflow files...................(no files to check)Skipped
|
||||
[massi/contrib d18a2577] test
|
||||
2 files changed, 178 insertions(+), 45 deletions(-)
|
||||
```
|
||||
|
||||
Note: If you prefer you can run this hook before `git push` instead of `git commit`. To do so, install the hook with `pre-commit install --hook-type pre-push`
|
||||
### Run the tests locally
|
||||
|
||||
Note: pre-commit hooks might fail. If that happens to you and you can't understand why, please do the following:
|
||||
- Ask for help by opening an issue or reaching out on our Discord channel. We usually give some feedback within a day for most questions.
|
||||
- As the last resort, if you are desperate and everything failed, ask Git to skip the hook with `git commit --no-verify`. This command will suspend all pre-commit hooks and let you push in all cases. The CI might fail, but at that point we will be able to help.
|
||||
- In case of further issues pushing your changes, please uninstall the hook with `pre-commit uninstall -t pre-commit -t pre-push` and review your Git setup.
|
||||
Tests will automatically run in our CI for every commit you push to your PR on Github. In order to save precious CI time
|
||||
we encourage you to run the tests locally before pushing new commits to Github. From the root of the git repository,
|
||||
you can run all the unit tests like this:
|
||||
|
||||
## Proposing "Substantial" Changes
|
||||
```sh
|
||||
hatch run test:unit
|
||||
```
|
||||
|
||||
Most of the changes to Haystack, including bug fixes and small improvements, are implemented through the normal Pull Request workflow, according to these contribution guidelines.
|
||||
Hatch will create a dedicated virtual environment, sync the required dependencies and run all the unit tests from the
|
||||
project. If you want to run a subset of the tests or even one test in particular, `hatch` will accept all the
|
||||
options you would normally pass to `pytest`, for example:
|
||||
|
||||
Some changes, though, are "substantial", and these are the ones we want to put through a bit of a design process to make sure we're all on the same page before we invest the time into the actual implementation of a new feature or a deep refactoring.
|
||||
```sh
|
||||
# run one test method from a specific test class in a test file
|
||||
hatch run test:unit test/test_logging.py::TestSkipLoggingConfiguration::test_skip_logging_configuration
|
||||
```
|
||||
|
||||
Please check the [Proposals design process](/proposals) if you think your contribution may include the following:
|
||||
We also use tools to ensure consistent code style, quality, and static type checking. The quality of your code will be
|
||||
tested by the CI, but once again, running the checks locally will speed up the review cycle. To check your code you
|
||||
can run:
|
||||
|
||||
- A new feature that creates new API surface areas.
|
||||
- A new component (Nodes, Pipelines, Document Stores).
|
||||
- Removing features that already shipped in the current minor version.
|
||||
- A deep refactoring that would require new tests or introduce new dependencies.
|
||||
- A change that's complex enough to require multiple steps to be delivered.
|
||||
```sh
|
||||
hatch run test:lint
|
||||
```
|
||||
|
||||
If the linters spot any error, you can fix it before checking in your code:
|
||||
|
||||
```sh
|
||||
hatch run test:lint-fix
|
||||
```
|
||||
|
||||
## Requirements for Pull Requests
|
||||
|
||||
To ease the review process, please follow the instructions in this paragraph when creating a Pull Request.
|
||||
To ease the review process, please follow the instructions in this paragraph when creating a Pull Request:
|
||||
|
||||
### PR Title
|
||||
We use elements from the [conventional commit convention](https://www.conventionalcommits.org/en/v1.0.0/) for our
|
||||
PR titles. Your PR title should follow the same structure of commit type, scope, and description as the convention
|
||||
specifies:
|
||||
|
||||
```
|
||||
<type>[optional scope]: <description>
|
||||
```
|
||||
|
||||
Common types are:
|
||||
- `feat` (for enhancements)
|
||||
- `bug` (for bug fixes)
|
||||
- `docs` (for changes to the documentation)
|
||||
- `test` (for changes to the tests)
|
||||
- `perf` (for performance improvements)
|
||||
- `refactor` (for code refactorings)
|
||||
|
||||
If your change is breaking backwards compatibility (e.g. introducing a new required parameter) use a `!` after the type
|
||||
to indicate that your PR contains breaking changes.
|
||||
|
||||
Examples:
|
||||
- `feat: add pinecone document store`
|
||||
- `bug: fix MultiLabel serialization`
|
||||
- `feat!: make all document store methods async`
|
||||
|
||||
### PR Description
|
||||
Please use the existing [pull request template](https://github.com/deepset-ai/haystack/blob/main/.github/pull_request_template.md)
|
||||
for describing and documenting your changes:
|
||||
|
||||
- Link the issue that this relates to
|
||||
- What is changing?
|
||||
- What are limitations?
|
||||
- Breaking changes (Example of before vs. after)
|
||||
- How did you test the change?
|
||||
- For the title, use the [conventional commit convention](https://www.conventionalcommits.org/en/v1.0.0/).
|
||||
- For the body, follow the existing [pull request template](https://github.com/deepset-ai/haystack/blob/main/.github/pull_request_template.md) to describe and document your changes.
|
||||
|
||||
### Release notes
|
||||
|
||||
Each PR should include a file under the `releasenotes/notes` path created with `reno`, unless the PR doesn't have any
|
||||
impact on Haystack behavior, for example: tests and documentation updates, changes in code comments or docstrings,
|
||||
changes to the CI/CD systems. PRs that don't require a release note file must be labeled with `ignore-for-release-notes`
|
||||
@ -159,9 +260,10 @@ Created new notes file in releasenotes/notes/bump-transformers-to-4-31-a645334d5
|
||||
```
|
||||
`reno` will create the file appending a unique id to the name of the release note you provided (in this case,
|
||||
`bump-transformers-to-4-31`). You don't need to worry about duplicated entries, but be sure to use meaningful names
|
||||
so it's easier to find the file later. You can now edit the file to add the release note. In the file, you will find multiple
|
||||
sections along with an explanation of what they're for. Normally, you only need to fill one section; in this case, you
|
||||
would fill the `enhancements` section to describe the change:
|
||||
so it's easier to find the file later. You can now edit the file to add the release note. In the file, you will find
|
||||
multiple sections along with an explanation of what they're for. Normally, you only need to fill in one section; in this
|
||||
case, you would fill in the `enhancements` section to describe the change:
|
||||
|
||||
```yaml
|
||||
enhancements:
|
||||
- |
|
||||
@ -172,7 +274,8 @@ code you changed.
|
||||
|
||||
## CI (Continuous Integration)
|
||||
|
||||
We use GitHub Action for our Continuous Integration tasks. This means that, as soon as you open a PR, GitHub will start executing some workflows on your code, like automated tests, linting, formatting, api docs generation, etc.
|
||||
We use GitHub Action for our Continuous Integration tasks. This means that as soon as you open a PR, GitHub will start
|
||||
executing some workflows on your changes, like automated tests, linting, formatting, api docs generation, etc.
|
||||
|
||||
If all goes well, at the bottom of your PR page you should see something like this, where all checks are green.
|
||||
|
||||
@ -187,79 +290,15 @@ For example, in the case above, the CI will give you instructions on how to fix
|
||||
|
||||

|
||||
|
||||
## Working from Github forks
|
||||
## Working from GitHub forks
|
||||
|
||||
In order for maintainers to be able to help you, we usually ask contributors to give us push access to their fork.
|
||||
|
||||
To do so, please verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked (you can check it later on the PR's sidebar once it's created).
|
||||
To do so, please verify that "Allow edits and access to secrets by maintainers" on the PR preview page is checked
|
||||
(you can check it later on the PR's sidebar once it's created).
|
||||
|
||||

|
||||
|
||||
|
||||
## Running the tests
|
||||
|
||||
Tests will automatically run in our CI for every commit you push to your PR on Github. This is usually the most convenient
|
||||
way and we encourage you to create early "draft pull requests" to leverage the CI at an early stage.
|
||||
|
||||
Tests can also be executed locally, by launching `pytest` from the `/test` folder (this is important because running from the
|
||||
root folder would also execute the rest API tests, that require specific dependencies).
|
||||
|
||||
You can control which tests to run using Pytest markers, let's see how.
|
||||
|
||||
### Running a subset of tests (recommended)
|
||||
In most cases you rather want to run a **subset of tests** locally that are related to your dev, and the most important
|
||||
option to reduce the number of tests in a meaningful way, is to run tests only for a list of **selected document stores**.
|
||||
This is possible by adding the `--document_store_type` arg to your `pytest` command (possible values are: `"elasticsearch, faiss, memory, weaviate, pinecone"`).
|
||||
|
||||
For example, calling `pytest . --document_store_type="memory"` will run all the document store tests using the
|
||||
InMemoryDocumentStore only, skipping the others (the logs will show which ones). The `InMemoryDocument` store is a very
|
||||
good starting point as it doesn't require any external resource:
|
||||
```
|
||||
pytest . --document_store_type="memory"
|
||||
```
|
||||
|
||||
You can also run the tests using a **combination of document stores**, provided the corresponding services are up and
|
||||
running in your local environment. For example, Elasticsearch must be running before launching the following:
|
||||
```
|
||||
pytest . --document_store_type="memory,elasticsearch"
|
||||
```
|
||||
|
||||
**Note:** we recommend using Docker containers to run document stores locally:
|
||||
```
|
||||
# Elasticsearch
|
||||
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx256m" elasticsearch:7.9.2
|
||||
|
||||
# Weaviate
|
||||
docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' --env ENABLE_EXPERIMENTAL_BM25='true' semitechnologies/weaviate:1.14.1
|
||||
|
||||
# Tika
|
||||
docker run -d -p 9998:9998 -e "TIKA_CHILD_JAVA_OPTS=-JXms128m" -e "TIKA_CHILD_JAVA_OPTS=-JXmx128m" apache/tika:1.28.4
|
||||
```
|
||||
|
||||
Tests can be also run **individually**:
|
||||
```
|
||||
pytest -v test_retriever.py::test_dpr_embedding
|
||||
```
|
||||
|
||||
Or you can select a **logical subset of tests** via markers and the optional "not" keyword:
|
||||
```
|
||||
pytest -m not elasticsearch
|
||||
pytest -m elasticsearch
|
||||
pytest -m generator
|
||||
pytest -m tika
|
||||
pytest -m not slow
|
||||
...
|
||||
```
|
||||
|
||||
### Running all the tests
|
||||
**Important**: If you want to run **all** the tests locally, you'll need **all** document stores running in the background
|
||||
before you run the tests. Many of the tests will then be executed multiple times with different document stores.
|
||||
|
||||
To run all tests, from the `/test` folder just run:
|
||||
```
|
||||
pytest
|
||||
```
|
||||
|
||||
## Writing tests
|
||||
|
||||
We formally define three scopes for tests in Haystack with different requirements and purposes:
|
||||
@ -270,7 +309,7 @@ We formally define three scopes for tests in Haystack with different requirement
|
||||
- Any external resource is mocked
|
||||
- Always returns the same result
|
||||
- Can run in any order
|
||||
- Runs at every commit in draft and ready PRs, automated through pytest
|
||||
- Runs at every commit in PRs, automated through `hatch run test:unit`
|
||||
- Can run locally with no additional setup
|
||||
- **Goal: being confident in merging code**
|
||||
|
||||
@ -281,7 +320,7 @@ We formally define three scopes for tests in Haystack with different requirement
|
||||
- When using models, cannot use inference
|
||||
- Always returns the same result or an error
|
||||
- Can run in any order
|
||||
- Runs at every commit in ready PRs, automated through pytest
|
||||
- Runs at every commit in PRs, automated through `hatch run test:integration`
|
||||
- Can run locally with some additional setup (e.g. Docker)
|
||||
- **Goal: being confident in merging code**
|
||||
|
||||
@ -298,76 +337,18 @@ We formally define three scopes for tests in Haystack with different requirement
|
||||
- Might not be possible to run locally due to system and hardware requirements
|
||||
- **Goal: being confident in releasing Haystack**
|
||||
|
||||
> **Note**: migrating the existing tests into these new categories is still in progress. Please ask the maintainers if you are in doubt about how to
|
||||
classify your tests or where to place them.
|
||||
|
||||
If you are writing a test that depend on a document store, there are a few conventions to define on which document store
|
||||
type this test should/can run:
|
||||
|
||||
### Option 1: The test should run on all document stores / those supplied in the CLI arg `--document_store_type`:
|
||||
Use one of the fixtures `document_store` or `document_store_with_docs` or `document_store_type`.
|
||||
Do not parameterize it yourself.
|
||||
|
||||
Example:
|
||||
```
|
||||
def test_write_with_duplicate_doc_ids(document_store):
|
||||
...
|
||||
document_store.write(docs)
|
||||
....
|
||||
|
||||
```
|
||||
|
||||
### Option 2: The test is only compatible with certain document stores:
|
||||
Some tests you don't want to run on all possible document stores. Either because the test is specific to one/few doc store(s) or the test is not really document store related and it's enough to test it on one document store and speed up the execution time.
|
||||
|
||||
Example:
|
||||
```
|
||||
# Currently update_document_meta() is not implemented for InMemoryDocStore so it's not listed here as an option
|
||||
|
||||
@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss"], indirect=True)
|
||||
def test_update_meta(document_store):
|
||||
....
|
||||
```
|
||||
|
||||
### Option 3: The test is not using a `document_store`/ fixture, but still has a hard requirement for a certain document store:
|
||||
Example:
|
||||
```
|
||||
@pytest.mark.elasticsearch
|
||||
def test_elasticsearch_custom_fields(elasticsearch_fixture):
|
||||
client = Elasticsearch()
|
||||
client.indices.delete(index='haystack_test_custom', ignore=[404])
|
||||
document_store = ElasticsearchDocumentStore(index="haystack_test_custom", text_field="custom_text_field",
|
||||
embedding_field="custom_embedding_field")
|
||||
```
|
||||
|
||||
## Code format and style
|
||||
|
||||
We use [Black](https://github.com/psf/black) to ensure consistent code style, [mypy](http://mypy-lang.org/) for static type checking and
|
||||
[pylint](https://pylint.org/) for linting and code quality.
|
||||
|
||||
All checks and autoformatting happen on the CI, so in general you don't need to worry about configuring them in your local environment.
|
||||
However, should you prefer to execute them locally, here are a few details about the setup.
|
||||
|
||||
### Black
|
||||
Black runs with no other configuration than increasing line length to 120 characters. Its configuration can be found in `pyproject.toml`.
|
||||
|
||||
You can run it with `python -m black .` from the root folder.
|
||||
|
||||
### Mypy
|
||||
Mypy currently runs with limited configuration options that can be found at the bottom of `pyproject.toml`.
|
||||
|
||||
You can run it with `python -m mypy haystack/ rest_api/` from the root folder.
|
||||
|
||||
### Pylint
|
||||
Pylint is still being integrated in Haystack. The current exclusion list is very long, and can be found in `pyproject.toml`.
|
||||
|
||||
You can run it with `python -m pylint haystack/ rest_api/ -ry` from the root folder.
|
||||
|
||||
|
||||
## Contributor Licence Agreement (CLA)
|
||||
|
||||
Significant contributions to Haystack require a Contributor License Agreement (CLA). If the contribution requires a CLA, we will get in contact with you. CLAs are quite common among company backed open-source frameworks and our CLA’s wording is similar to other popular projects, like [Rasa](https://cla-assistant.io/RasaHQ/rasa) or [Google's Tensorflow](https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL) (retrieved 4th November 2021).
|
||||
Significant contributions to Haystack require a Contributor License Agreement (CLA). If the contribution requires a CLA,
|
||||
we will get in contact with you. CLAs are quite common among company-backed open-source frameworks, and our CLA’s wording
|
||||
is similar to other popular projects, like [Rasa](https://cla-assistant.io/RasaHQ/rasa) or
|
||||
[Google's Tensorflow](https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL)
|
||||
(retrieved 4th November 2021).
|
||||
|
||||
The agreement's main purpose is to protect the continued open use of Haystack. At the same time it also helps in protecting you as a contributor. Contributions under this agreement will ensure that your code will continue to be open to everyone in the future (“You hereby grant to Deepset **and anyone** [...]”) as well as removing liabilities on your end (“you provide your Contributions on an AS IS basis, without warranties or conditions of any kind [...]”). You can find the Contributor Licence Agreement [here](https://cla-assistant.io/deepset-ai/haystack).
|
||||
The agreement's main purpose is to protect the continued open use of Haystack. At the same time, it also helps in
|
||||
\protecting you as a contributor. Contributions under this agreement will ensure that your code will continue to be
|
||||
open to everyone in the future (“You hereby grant to Deepset **and anyone** [...]”) as well as remove liabilities on
|
||||
your end (“you provide your Contributions on an AS IS basis, without warranties or conditions of any kind [...]”). You
|
||||
can find the Contributor Licence Agreement [here](https://cla-assistant.io/deepset-ai/haystack).
|
||||
|
||||
If you have further questions about the licensing feel free to reach out to contributors@deepset.ai.
|
||||
If you have further questions about the licensing, feel free to reach out to contributors@deepset.ai.
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user