haystack/docs/decisions/pydantic-dataclasses.md
Sara Zan 8de1aa3e43
Pylint: solve or silence locally rare warnings (#2170)
* Remove invalid-envvar-default and logging-too-many-args

* Remove import-self, access-member-before-definition and deprecated-argument

* Remove used-before-assignment by restructuring type import

* Remove unneeded-not

* Silence unnecessary-lambda (it's necessary)

* Remove pointless-string-statement

* Update Documentation & Code Style

* Silenced unsupported-membership-test (probably a real bug, can't fix though)

* Remove trailing-newlines

* Remove super-init-not-called and slience invalid-sequence-index (it's valid)

* Remove invalid-envvar-default in ui

* Remove some more warnings from pyproject.toml than actually solrted in code, CI will fail

* Linting all modules together is more readable

* Update Documentation & Code Style

* Typo in pylint disable comment

* Simplify long boolean statement

* Simplify init call in FAISS

* Fix inconsistent-return-statements

* Fix useless-super-delegation

* Fix useless-else-on-loop

* Fix another inconsistent-return-statements

* Move back pylint disable comment moved by black

* Fix consider-using-set-comprehension

* Fix another consider-using-set-comprehension

* Silence non-parent-init-called

* Update pylint exclusion list

* Update Documentation & Code Style

* Resolve unnecessary-else-after-break

* Fix superfluous-parens

* Fix no-else-break

* Remove is_correctly_retrieved along with its pylint issue

* Update exclusions list

* Silence constructor issue in squad_data.py (method is already broken)

* Fix too-many-return-statements

* Fix use-dict-literal

* Fix consider-using-from-import and useless-object-inheritance

* Update exclusion list

* Fix simplifiable-if-statements

* Fix one consider-using-dict-items

* Fix another consider-using-dict-items

* Fix a third consider-using-dict-items

* Fix last consider-using-dict-items

* Fix three use-a-generator

* Silence import errors on numba, tensorboardX and apex, but add comments & logs

* Fix couple of mypy issues

* Fix another typing issue

* Silence mypy, was conflicting with more meaningful pylint issue

* Fix no-else-continue

* Silence unsubscriptable-object and fix an import error with importlib.metadata

* Update Documentation & Code Style

* Fix all no-else-raise

* Update Documentation & Code Style

* Fix inverted parameters in simplified if switch

* Change [test] to [all] in some jobs (for typing and linting)

* Add comment in haystack/schema.py on pydantic's dataclasses

* Move comment from get_documents_by_id into _convert_weaviate_result_to_document in weaviate.py

* Add comment on pylint silencing

* Fix bug introduced rest_api/controller/search.py

* Update Documentation & Code Style

* Add ADR about Pydantic dataclasses

* Update pydantic-dataclasses.md

* Add link to Pydantic docs on Dataclasses

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-21 20:16:14 +01:00

3.0 KiB

Dataclasses

  • Status: accepted
  • Deciders: @tholor
  • Date: 2021-10-14

Technical Story: https://github.com/deepset-ai/haystack/pull/1598

Context and Problem Statement

Originally we implemented Haystack's primitive based on Python's vanilla dataclasses. However, shortly after we realized this causes issues with FastAPI, which uses Pydantic's implementation. We need to decide which version (vanilla Python's or Pydantic's) to use in our codebase.

Decision Drivers

  • The Swagger autogenerated documentation for REST API in FastAPI was broken where the dataclasses include non-standard fields (pd.dataframe + np.ndarray)

Considered Options

  • Switch to Pydantic dataclasses in our codebase as well.
  • Staying with vanilla dataclasses and find a workaround for FastAPI to accept them in place of Pydantic's implementation.

Decision Outcome

Chosen option: 1, because our initial concerns about speed proved negligible and Pydantic's implementation provided some additional functionality for free (see below).

Positive Consequences

  • We can now inherit directly from the primitives in the REST API dataclasses, and overwrite the problematic fields with standard types.
  • We now get runtime type checks "for free", as this is a core feature of Pydantic's implementation.

Negative Consequences

  • Pydantic dataclasses are slower. See https://github.com/deepset-ai/haystack/pull/1598 for a rough performance assessment.
  • Pydantic dataclasses do not play nice with mypy and autocomplete tools unaided. In many cases a complex import statement, such as the following, is needed:
if typing.TYPE_CHECKING:
    from dataclasses import dataclass
else:
    from pydantic.dataclasses import dataclass

Pros and Cons of the Options

Switch to Pydantic dataclasses

  • Good, because it solves the issue without having to find workarounds for FastAPI.
  • Good, because it adds type checks at runtime.
  • Bad, because mypy and autocomplete tools need assistance to parse its dataclasses properly. Example:
if typing.TYPE_CHECKING:
    from dataclasses import dataclass
else:
    from pydantic.dataclasses import dataclass
  • Bad, because it introduces an additional dependency to Haystack (negligible)
  • Bad, because it adds some overhead on the creation of primitives (negligible)

Staying with vanilla dataclasses

  • Good, because it's Python's standard way to generate data classes
  • Good, because mypy can deal with them without plugins or other tricks.
  • Good, because it's faster than Pydantic's implementation.
  • Bad, because does not play well with FastAPI and Swagger (critical).
  • Bad, because it has no validation at runtime (negligible)