Description: Configure Python and test the Ingestion Framework

Ingestion Framework

Prerequisites

The Ingestion Framework is a Python module that wraps the OpenMetadata API and builds workflows and utilities on top of it. Therefore, you need to make sure that you have the complete OpenMetadata stack running: MySQL + ElasticSearch + OpenMetadata Server.

To do so, you can either build and run the OpenMetadata Server locally or use the metadata CLI to spin up the Docker containers.

Python Setup

We recommend using pyenv to install and manage different Python versions on your system. Note that OpenMetadata requires Python 3.8 or higher. This doc might be helpful for setting up environment virtualization.
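For example, a minimal environment setup looks like the following (the pinned version 3.9.10 is only illustrative, and the venv path is arbitrary):

```shell
# With pyenv installed, you can first pin a suitable interpreter, e.g.:
#   pyenv install 3.9.10 && pyenv local 3.9.10
# Then create and activate an isolated virtual environment:
python3 -m venv /tmp/openmetadata-venv
. /tmp/openmetadata-venv/bin/activate
# Confirm the interpreter meets the 3.8+ requirement:
python -c 'import sys; assert sys.version_info >= (3, 8)'
```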

Generated Sources

The backbone of OpenMetadata is the series of JSON schemas defining the Entities and their properties.

All different parts of the code rely on those definitions. The first step to start developing new connectors is to properly set up your local environment to interact with the Entities.

In the Ingestion Framework, this process is handled with datamodel-code-generator, which reads the JSON schemas and automatically generates pydantic models representing them. Make sure to run `make install_dev generate` from the project root to populate the ingestion/src/metadata/generated directory with the required models.

Once you have generated the sources, you should be able to run the tests and the metadata CLI. You can verify your setup by running `make coverage` and checking that it completes without errors.
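Put together, the setup steps above look like this (run from the root of a cloned OpenMetadata repository):

```shell
# From the project root of a cloned OpenMetadata repository:
make install_dev generate             # install dev dependencies and generate the pydantic models
ls ingestion/src/metadata/generated   # the generated sources land here
make coverage                         # run the test suite; check for errors
```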

Quality tools

When working on the Ingestion Framework, keep the following style-check tooling in mind:

  • pylint is a static code analysis tool that catches errors, enforces coding standards, and helps us follow conventions and apply improvements.
  • black can be used both to autoformat the code and to validate that the codebase is compliant.
  • isort saves us from manually working out the proper ordering of imports from the stdlib, third-party requirements, and project files.

The main goal is to ensure standardized formatting throughout the codebase.

When developing, you can run these tools with make recipes: `make lint`, `make black` and `make isort`. Note that the sources generated from the JSON schemas are excluded from these checks.

If you want to take this one step further and make sure you never commit malformed changes, you can use pre-commit hooks. This powerful tool runs specific validations at commit time; if they fail, the commit won't proceed. The nice part is that the tools fix your code for you, so you can simply try to commit again!

You can install our hooks via make precommit_install.
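For reference, a pre-commit configuration wiring in black and isort typically looks like the sketch below; the revs are illustrative pins, and the authoritative version is the .pre-commit-config.yaml in the repository root.

```yaml
# Illustrative sketch only; see the repository's .pre-commit-config.yaml.
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0          # example pin, not necessarily the project's
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1          # example pin
    hooks:
      - id: isort
```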

Tooling Status

We are currently using:

  • pylint & black in the CI validations, so make sure to review your PRs for any warnings they flag.
  • black & isort in the pre-commit hooks.

Run Integration Tests

Run MySQL test

Run the following commands from the top-level directory:

```shell
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
pip install -e ingestion
pip install pytest
pip install pytest-docker
cd ingestion/tests/integration/mysql
pytest -s -c /dev/null
```

Run MsSQL test

```shell
cd ingestion
source env/bin/activate
cd tests/integration/mssql
pytest -s -c /dev/null
```

Run Postgres test

```shell
cd ingestion
source env/bin/activate
cd tests/integration/postgres
pytest -s -c /dev/null
```

Run LDAP test

```shell
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
pip install -e ingestion
pip install pytest
pip install pytest-docker
cd ingestion/tests/integration/ldap
pytest -s -c /dev/null
```

Run Hive test

python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r inges