GitBook: [#110] Python dev docs

This commit is contained in:
pmbrull 2022-03-22 15:33:07 +00:00 committed by Sriharsha Chintalapani
parent 7c2271c953
commit 9ff6b928e5
9 changed files with 61 additions and 32 deletions

View File

@ -223,10 +223,11 @@
* [How to Contribute](open-source-community/developer/how-to-contribute.md)
* [Prerequisites](open-source-community/developer/prerequisites.md)
* [Backend](open-source-community/developer/backend/README.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests/README.md)
* [OpenMetadata Server](open-source-community/developer/build-code-run-tests/openmetadata-server.md)
* [Ingestion Framework](open-source-community/developer/build-code-run-tests/ingestion-framework.md)
* [Quick Start Guide](open-source-community/developer/quick-start-guide.md)
* [Build a Connector](open-source-community/developer/build-a-connector/README.md)
* [Setup](open-source-community/developer/build-a-connector/setup.md)
* [Source](open-source-community/developer/build-a-connector/source.md)
* [Sink](open-source-community/developer/build-a-connector/sink.md)
* [Stage](open-source-community/developer/build-a-connector/stage.md)

View File

@ -18,8 +18,8 @@ This document summarizes information relevant to OpenMetadata committers and con
[backend](backend/)
{% endcontent-ref %}
{% content-ref url="build-code-run-tests.md" %}
[build-code-run-tests.md](build-code-run-tests.md)
{% content-ref url="build-code-run-tests/" %}
[build-code-run-tests](build-code-run-tests/)
{% endcontent-ref %}
{% content-ref url="quick-start-guide.md" %}

View File

@ -24,10 +24,6 @@ Workflow execution happens in a serial fashion.
In the cases where we need aggregation over the records, we can use the **stage** to write to a file or other store. Use the file written to in **stage** and pass it to **bulk sink** to publish to external services such as **OpenMetadata** or **Elasticsearch**.
{% content-ref url="setup.md" %}
[setup.md](setup.md)
{% endcontent-ref %}
{% content-ref url="source.md" %}
[source.md](source.md)
{% endcontent-ref %}

View File

@ -0,0 +1,13 @@
---
description: Learn how to build and run the building blocks of OpenMetadata
---
# Build the code & run tests
{% content-ref url="openmetadata-server.md" %}
[openmetadata-server.md](openmetadata-server.md)
{% endcontent-ref %}
{% content-ref url="ingestion-framework.md" %}
[ingestion-framework.md](ingestion-framework.md)
{% endcontent-ref %}

View File

@ -1,8 +1,18 @@
---
description: Let's review the Python tooling to start working on the Ingestion Framework.
description: Configure Python and test the Ingestion Framework
---
# Setup
# Ingestion Framework
## Prerequisites
The Ingestion Framework is a Python module that wraps the OpenMetadata API and builds workflows and utilities on top of it. Therefore, you need to make sure that you have the complete OpenMetadata stack running: MySQL + ElasticSearch + OpenMetadata Server.
To do so, you can either build and run the [OpenMetadata Server](openmetadata-server.md) locally, or use the `metadata` CLI to spin up the [Docker containers](../../../overview/run-openmetadata/).
## Python Setup
We recommend using `pyenv` to install and manage different Python versions on your system. Note that OpenMetadata requires Python version 3.8 or higher. This [doc](https://python-docs.readthedocs.io/en/latest/dev/virtualenvs.html) might be helpful for setting up virtual environments.
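With `pyenv` (or any other version manager) in place, a quick way to double-check that the interpreter you activated meets the 3.8+ requirement is a small sanity-check script (this is just a convenience sketch, not part of the build):

```python
import sys

# OpenMetadata requires Python 3.8+; fail fast if the interpreter is older.
MINIMUM = (3, 8)

def check_python_version(version_info=sys.version_info):
    """Return True when the interpreter (or a given version tuple) is new enough."""
    return tuple(version_info[:2]) >= MINIMUM

if not check_python_version():
    raise SystemExit(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ is required")
```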
### Generated Sources
@ -12,6 +22,8 @@ All different parts of the code rely on those definitions. The first step to sta
In the Ingestion Framework, this process is handled with `datamodel-code-generator`, which reads JSON Schemas and automatically generates `pydantic` models representing the input definitions. Please make sure to run `make install_dev generate` from the project root to populate the `ingestion/src/metadata/generated` directory with the required models.
Once you have generated the sources, you should be able to run the tests and the `metadata` CLI. You can verify your setup by running `make coverage` and checking for errors.
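As a rough illustration of what the generation step produces: a JSON Schema describing, say, a table with named columns becomes a typed Python class. The sketch below uses stdlib `dataclasses` so it stays dependency-free — the real files under `ingestion/src/metadata/generated` are `pydantic.BaseModel` subclasses emitted by `datamodel-code-generator`, not hand-written, and the field names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical shape of a generated model; the real generated classes
# subclass pydantic.BaseModel and mirror the JSON Schema definitions.
@dataclass
class Column:
    name: str
    dataType: Optional[str] = None

@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)

table = Table(name="users", columns=[Column(name="id", dataType="BIGINT")])
```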
### Quality tools
When working on the Ingestion Framework, keep the following style-check tooling in mind:
@ -20,7 +32,7 @@ When working on the Ingestion Framework, you might want to take into considerati
* [black](https://black.readthedocs.io/en/stable/) can be used to both autoformat the code and validate that the codebase is compliant.
* [isort](https://pycqa.github.io/isort/) saves us time by automatically ordering imports from the `stdlib`, third-party requirements, and project files.
The main goal is to ensure standardised formatting throughout the codebase.
The main goal is to ensure standardized formatting throughout the codebase.
When developing, you can run these tools with `make` recipes: `make lint`, `make black` and `make isort`. Note that we are excluding the generated sources from the JSON Schema standards.
@ -34,3 +46,4 @@ We are currently using:
* `pylint` & `black` in the CI validations, so make sure to review your PRs for any warnings you may have introduced.
* `black` & `isort` in the pre-commit hooks.

View File

@ -1,4 +1,10 @@
# Build the code & run tests
---
description: >-
Learn how to run the OpenMetadata server in development mode by using Docker
and IntelliJ.
---
# OpenMetadata Server
## Prerequisites
@ -11,7 +17,7 @@
```
* Bootstrap MySQL with tables
1. Create a distribution as explained [here](build-code-run-tests.md#create-a-distribution-packaging)
1. Create a distribution as explained [here](openmetadata-server.md#create-a-distribution-packaging)
2. Extract the distribution tar.gz file and run the following command
```
@ -20,7 +26,7 @@
```
* Bootstrap ES with indexes and load sample data into MySQL
1. Run OpenMetadata service instances through IntelliJ IDEA following the instructions [here](build-code-run-tests.md#run-instance-through-intellij-idea)
1. Run OpenMetadata service instances through IntelliJ IDEA following the instructions [here](openmetadata-server.md#run-instance-through-intellij-idea)
2. Once the logs indicate that the instance is up, run the following commands from the top-level directory
```
@ -34,7 +40,7 @@
metadata ingest -c ./pipelines/sample_usage.json
metadata ingest -c ./pipelines/metadata_to_es.json
```
* You are now ready to explore the app by going to http://localhost:8585 \*If the web page doesn't work as intended, please take a look at the troubleshooting steps [here](build-code-run-tests.md#troubleshooting)
* You are now ready to explore the app at http://localhost:8585. If the web page doesn't work as intended, take a look at the troubleshooting steps [here](openmetadata-server.md#troubleshooting)
## Building
@ -67,33 +73,33 @@ Add a new Run/Debug configuration like the below screenshot.
2. Click on "Edit Configurations"
3. Click + sign and Select Application and make sure your config looks similar to the below image
![Intellij Runtime Configuration](<../../.gitbook/assets/Intellij-Runtime Config.png>)
![Intellij Runtime Configuration](<../../../.gitbook/assets/Intellij-Runtime Config.png>)
## Add missing dependency
Right-click on catalog-rest-service
![](../../../.gitbook/assets/image-1-.png)
![](../../../../.gitbook/assets/image-1-.png)
Click on "Open Module Settings"
![](../../../.gitbook/assets/image-2-.png)
![](../../../../.gitbook/assets/image-2-.png)
Go to "Dependencies"
![](../../../.gitbook/assets/image-3-.png)
![](../../../../.gitbook/assets/image-3-.png)
Click “+” at the bottom of the dialog box and click "Add"
![](../../../.gitbook/assets/image-4-.png)
![](../../../../.gitbook/assets/image-4-.png)
Click on Library
![](../../../.gitbook/assets/image-5-.png)
![](../../../../.gitbook/assets/image-5-.png)
In that list look for "jersey-client:2.25.1"
![](../../../.gitbook/assets/image-6-.png)
![](../../../../.gitbook/assets/image-6-.png)
Select it and click "OK". Now run/debug the application.
@ -104,7 +110,7 @@ Select it and click "OK". Now run/debug the application.
* If ElasticSearch in Docker on Mac is crashing, try changing Preferences -> Resources -> Memory to 4GB
* If ElasticSearch logs show `high disk watermark [90%] exceeded`, try changing Preferences -> Resources -> Disk Image Size to at least 16GB
* `Public Key Retrieval is not allowed` - verify that the JDBC connect URL in `conf/openmetadata.yaml` is configured with the parameter `allowPublicKeyRetrieval=true`
* Browser console shows javascript errors, try doing a [clean build](build-code-run-tests.md#building). Some npm packages may not have been built properly.
* Browser console shows javascript errors, try doing a [clean build](openmetadata-server.md#building). Some npm packages may not have been built properly.
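For the `Public Key Retrieval is not allowed` error above, the relevant fragment of `conf/openmetadata.yaml` looks roughly like this (key names and credentials are indicative only — check your local file for the exact structure):

```
database:
  driverClass: com.mysql.cj.jdbc.Driver
  user: openmetadata_user
  password: openmetadata_password
  url: jdbc:mysql://localhost:3306/openmetadata_db?allowPublicKeyRetrieval=true&useSSL=false
```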
## Coding Style

View File

@ -36,7 +36,7 @@ git remote -v
git checkout -b ISSUE-200
```
Make changes. Follow the [Coding Style](https://github.com/open-metadata/OpenMetadata/blob/main/docs/open-source-community/developer/docs/open-source-community/developer/backend/coding-style.md) Guide on best practices and [Build the code & run tests](build-code-run-tests.md) on how to set up IntelliJ, Maven.
Make changes. Follow the [Coding Style](https://github.com/open-metadata/OpenMetadata/blob/main/docs/open-source-community/developer/docs/open-source-community/developer/backend/coding-style.md) Guide on best practices and [Build the code & run tests](build-code-run-tests/) on how to set up IntelliJ, Maven.
## Push your changes to Github

View File

@ -170,7 +170,7 @@ The same would happen if, inside the actual OpenMetadata code, there was not a w
As OpenMetadata is a data-centric solution, we need to make sure we have the right ingredients at all times. That is why we have developed a high-level Python API, using `pydantic` models automatically generated from the JSON Schemas.
> OBS: If you are using a [published](https://pypi.org/project/openmetadata-ingestion/) version of the Ingestion Framework, you are already good to go, as we package the code with the `metadata.generated` module. If you are developing a new feature, you can get more information [here](build-a-connector/setup.md).
> OBS: If you are using a [published](https://pypi.org/project/openmetadata-ingestion/) version of the Ingestion Framework, you are already good to go, as we package the code with the `metadata.generated` module. If you are developing a new feature, you can get more information [here](broken-reference).
This API wrapper helps developers and consumers in:

View File

@ -3,13 +3,14 @@
{% hint style="info" %}
**The integration tests don't work at the moment.**
Make sure OpenMetadata is up and running. Refer to instructions [building and running](build-code-run-tests.md).
Make sure OpenMetadata is up and running. Refer to instructions [building and running](build-code-run-tests/).
{% endhint %}
## Run MySQL test
Run the following commands from the top-level directory
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -22,7 +23,7 @@ pytest -s -c /dev/null
## Run MsSQL test
```text
```
cd ingestion
source env/bin/activate
cd tests/integration/mssql
@ -31,7 +32,7 @@ pytest -s -c /dev/null
## Run Postgres test
```text
```
cd ingestion
source env/bin/activate
cd tests/integration/postgres
@ -40,7 +41,7 @@ pytest -s -c /dev/null
## Run LDAP test
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -53,7 +54,7 @@ pytest -s -c /dev/null
## Run Hive test
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -64,4 +65,3 @@ pip install pyhive thrift sasl thrift_sasl
cd ingestion/tests/integration/hive
pytest -s -c /dev/null
```
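All of the suites above follow the same pattern: create or activate a virtualenv, move into the suite's directory, and invoke `pytest -s -c /dev/null` (the `-c /dev/null` flag stops pytest from picking up any project-level config file). A minimal test module in that style, with hypothetical names, might look like the sketch below — the real suites live under `ingestion/tests/integration/`:

```python
# Hypothetical sketch of an integration-style pytest module.
def is_reachable(status_code):
    """Stand-in for an actual health-check call against the running service."""
    return status_code == 200

def test_service_health():
    # In the real tests this would hit the running MySQL/OpenMetadata stack.
    assert is_reachable(200)
```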