GitBook: [#110] Python dev docs

This commit is contained in:
pmbrull 2022-03-22 15:33:07 +00:00 committed by Sriharsha Chintalapani
parent 7c2271c953
commit 9ff6b928e5
9 changed files with 61 additions and 32 deletions

View File

@ -223,10 +223,11 @@
* [How to Contribute](open-source-community/developer/how-to-contribute.md)
* [Prerequisites](open-source-community/developer/prerequisites.md)
* [Backend](open-source-community/developer/backend/README.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests.md)
* [Build the code & run tests](open-source-community/developer/build-code-run-tests/README.md)
* [OpenMetadata Server](open-source-community/developer/build-code-run-tests/openmetadata-server.md)
* [Ingestion Framework](open-source-community/developer/build-code-run-tests/ingestion-framework.md)
* [Quick Start Guide](open-source-community/developer/quick-start-guide.md)
* [Build a Connector](open-source-community/developer/build-a-connector/README.md)
* [Setup](open-source-community/developer/build-a-connector/setup.md)
* [Source](open-source-community/developer/build-a-connector/source.md)
* [Sink](open-source-community/developer/build-a-connector/sink.md)
* [Stage](open-source-community/developer/build-a-connector/stage.md)

View File

@ -18,8 +18,8 @@ This document summarizes information relevant to OpenMetadata committers and con
[backend](backend/)
{% endcontent-ref %}
{% content-ref url="build-code-run-tests.md" %}
[build-code-run-tests.md](build-code-run-tests.md)
{% content-ref url="build-code-run-tests/" %}
[build-code-run-tests](build-code-run-tests/)
{% endcontent-ref %}
{% content-ref url="quick-start-guide.md" %}

View File

@ -24,10 +24,6 @@ Workflow execution happens in a serial fashion.
In the cases where we need aggregation over the records, we can use the **stage** to write to a file or other store. Use the file written to in **stage** and pass it to **bulk sink** to publish to external services such as **OpenMetadata** or **Elasticsearch**.
{% content-ref url="setup.md" %}
[setup.md](setup.md)
{% endcontent-ref %}
{% content-ref url="source.md" %}
[source.md](source.md)
{% endcontent-ref %}

View File

@ -0,0 +1,13 @@
---
description: Learn how to build and run the building blocks of OpenMetadata
---
# Build the code & run tests
{% content-ref url="openmetadata-server.md" %}
[openmetadata-server.md](openmetadata-server.md)
{% endcontent-ref %}
{% content-ref url="ingestion-framework.md" %}
[ingestion-framework.md](ingestion-framework.md)
{% endcontent-ref %}

View File

@ -1,8 +1,18 @@
---
description: Let's review the Python tooling to start working on the Ingestion Framework.
description: Configure Python and test the Ingestion Framework
---
# Setup
# Ingestion Framework
## Prerequisites
The Ingestion Framework is a Python module that wraps the OpenMetadata API and builds workflows and utilities on top of it. Therefore, you need to make sure that you have the complete OpenMetadata stack running: MySQL + ElasticSearch + OpenMetadata Server.
To do so, you can either build and run the [OpenMetadata Server](openmetadata-server.md) locally, or use the `metadata` CLI to spin up the [Docker containers](../../../overview/run-openmetadata/).
## Python Setup
We recommend using `pyenv` to install and manage different Python versions on your system. Note that OpenMetadata requires Python version 3.8 or higher. This [doc](https://python-docs.readthedocs.io/en/latest/dev/virtualenvs.html) might be helpful for setting up virtual environments.
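With `pyenv` (or any other version manager) in place, a quick way to double-check that the interpreter you activated meets the 3.8+ requirement is a small sanity-check script (this is just a convenience sketch, not part of the build):

```python
import sys

# OpenMetadata requires Python 3.8+; fail fast if the interpreter is older.
MINIMUM = (3, 8)

def check_python_version(version_info=sys.version_info):
    """Return True when the interpreter (or a given version tuple) is new enough."""
    return tuple(version_info[:2]) >= MINIMUM

if not check_python_version():
    raise SystemExit(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ is required")
```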
### Generated Sources
@ -12,6 +22,8 @@ All different parts of the code rely on those definitions. The first step to sta
In the Ingestion Framework, this process is handled with `datamodel-code-generator`, which reads JSON Schemas and automatically generates `pydantic` models representing the input definitions. Please make sure to run `make install_dev generate` from the project root to populate the `ingestion/src/metadata/generated` directory with the required models.
Once you have generated the sources, you should be able to run the tests and the `metadata` CLI. You can verify your setup by running `make coverage` and checking for errors.
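As a rough illustration of what the generation step produces: a JSON Schema describing, say, a table with named columns becomes a typed Python class. The sketch below uses stdlib `dataclasses` so it stays dependency-free — the real files under `ingestion/src/metadata/generated` are `pydantic.BaseModel` subclasses emitted by `datamodel-code-generator`, not hand-written, and the field names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical shape of a generated model; the real generated classes
# subclass pydantic.BaseModel and mirror the JSON Schema definitions.
@dataclass
class Column:
    name: str
    dataType: Optional[str] = None

@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)

table = Table(name="users", columns=[Column(name="id", dataType="BIGINT")])
```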
### Quality tools
When working on the Ingestion Framework, keep the following style-check tooling in mind:
@ -20,7 +32,7 @@ When working on the Ingestion Framework, you might want to take into considerati
* [black](https://black.readthedocs.io/en/stable/) can be used to both autoformat the code and validate that the codebase is compliant.
* [isort](https://pycqa.github.io/isort/) saves us time by automatically ordering imports from the `stdlib`, third-party requirements, and project files.
The main goal is to ensure standardised formatting throughout the codebase.
The main goal is to ensure standardized formatting throughout the codebase.
When developing, you can run these tools with `make` recipes: `make lint`, `make black` and `make isort`. Note that we are excluding the generated sources from the JSON Schema standards.
@ -34,3 +46,4 @@ We are currently using:
* `pylint` & `black` in the CI validations, so make sure to review your PRs for any warnings you may have introduced.
* `black` & `isort` in the pre-commit hooks.

View File

@ -1,4 +1,10 @@
# Build the code & run tests
---
description: >-
Learn how to run the OpenMetadata server in development mode by using Docker
and IntelliJ.
---
# OpenMetadata Server
## Prerequisites
@ -11,7 +17,7 @@
```
* Bootstrap MySQL with tables
1. Create a distribution as explained [here](build-code-run-tests.md#create-a-distribution-packaging)
1. Create a distribution as explained [here](openmetadata-server.md#create-a-distribution-packaging)
2. Extract the distribution tar.gz file and run the following command
```
@ -20,7 +26,7 @@
```
* Bootstrap ES with indexes and load sample data into MySQL
1. Run OpenMetadata service instances through IntelliJ IDEA following the instructions [here](build-code-run-tests.md#run-instance-through-intellij-idea)
1. Run OpenMetadata service instances through IntelliJ IDEA following the instructions [here](openmetadata-server.md#run-instance-through-intellij-idea)
2. Once the logs indicate that the instance is up, run the following commands from the top-level directory
```
@ -34,7 +40,7 @@
metadata ingest -c ./pipelines/sample_usage.json
metadata ingest -c ./pipelines/metadata_to_es.json
```
* You are now ready to explore the app by going to http://localhost:8585 \*If the web page doesn't work as intended, please take a look at the troubleshooting steps [here](build-code-run-tests.md#troubleshooting)
* You are now ready to explore the app at http://localhost:8585. If the web page doesn't work as intended, take a look at the troubleshooting steps [here](openmetadata-server.md#troubleshooting)
## Building
@ -67,33 +73,33 @@ Add a new Run/Debug configuration like the below screenshot.
2. Click on "Edit Configurations"
3. Click + sign and Select Application and make sure your config looks similar to the below image
![Intellij Runtime Configuration](<../../.gitbook/assets/Intellij-Runtime Config.png>)
![Intellij Runtime Configuration](<../../../.gitbook/assets/Intellij-Runtime Config.png>)
## Add missing dependency
Right-click on catalog-rest-service
![](../../../.gitbook/assets/image-1-.png)
![](../../../../.gitbook/assets/image-1-.png)
Click on "Open Module Settings"
![](../../../.gitbook/assets/image-2-.png)
![](../../../../.gitbook/assets/image-2-.png)
Go to "Dependencies"
![](../../../.gitbook/assets/image-3-.png)
![](../../../../.gitbook/assets/image-3-.png)
Click “+” at the bottom of the dialog box and click "Add"
![](../../../.gitbook/assets/image-4-.png)
![](../../../../.gitbook/assets/image-4-.png)
Click on Library
![](../../../.gitbook/assets/image-5-.png)
![](../../../../.gitbook/assets/image-5-.png)
In that list look for "jersey-client:2.25.1"
![](../../../.gitbook/assets/image-6-.png)
![](../../../../.gitbook/assets/image-6-.png)
Select it and click "OK". Now run/debug the application.
@ -104,7 +110,7 @@ Select it and click "OK". Now run/debug the application.
* If ElasticSearch in Docker on Mac is crashing, try changing Preferences -> Resources -> Memory to 4GB
* If ElasticSearch logs show `high disk watermark [90%] exceeded`, try changing Preferences -> Resources -> Disk Image Size to at least 16GB
* `Public Key Retrieval is not allowed` - verify that the JDBC connect URL in `conf/openmetadata.yaml` is configured with the parameter `allowPublicKeyRetrieval=true`
* Browser console shows javascript errors, try doing a [clean build](build-code-run-tests.md#building). Some npm packages may not have been built properly.
* Browser console shows javascript errors, try doing a [clean build](openmetadata-server.md#building). Some npm packages may not have been built properly.
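For the `Public Key Retrieval is not allowed` error above, the relevant fragment of `conf/openmetadata.yaml` looks roughly like this (key names and credentials are indicative only — check your local file for the exact structure):

```
database:
  driverClass: com.mysql.cj.jdbc.Driver
  user: openmetadata_user
  password: openmetadata_password
  url: jdbc:mysql://localhost:3306/openmetadata_db?allowPublicKeyRetrieval=true&useSSL=false
```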
## Coding Style

View File

@ -36,7 +36,7 @@ git remote -v
git checkout -b ISSUE-200
```
Make changes. Follow the [Coding Style](https://github.com/open-metadata/OpenMetadata/blob/main/docs/open-source-community/developer/docs/open-source-community/developer/backend/coding-style.md) Guide on best practices and [Build the code & run tests](build-code-run-tests.md) on how to set up IntelliJ, Maven.
Make changes. Follow the [Coding Style](https://github.com/open-metadata/OpenMetadata/blob/main/docs/open-source-community/developer/docs/open-source-community/developer/backend/coding-style.md) Guide on best practices and [Build the code & run tests](build-code-run-tests/) on how to set up IntelliJ, Maven.
## Push your changes to Github

View File

@ -170,7 +170,7 @@ The same would happen if, inside the actual OpenMetadata code, there was not a w
As OpenMetadata is a data-centric solution, we need to make sure we have the right ingredients at all times. That is why we have developed a high-level Python API, using `pydantic` models automatically generated from the JSON Schemas.
> OBS: If you are using a [published](https://pypi.org/project/openmetadata-ingestion/) version of the Ingestion Framework, you are already good to go, as we package the code with the `metadata.generated` module. If you are developing a new feature, you can get more information [here](build-a-connector/setup.md).
> OBS: If you are using a [published](https://pypi.org/project/openmetadata-ingestion/) version of the Ingestion Framework, you are already good to go, as we package the code with the `metadata.generated` module. If you are developing a new feature, you can get more information [here](broken-reference).
This API wrapper helps developers and consumers in:

View File

@ -3,13 +3,14 @@
{% hint style="info" %}
**The integration tests don't work at the moment.**
Make sure OpenMetadata is up and running. Refer to instructions [building and running](build-code-run-tests.md).
Make sure OpenMetadata is up and running. Refer to instructions [building and running](build-code-run-tests/).
{% endhint %}
## Run MySQL test
Run the following commands from the top-level directory
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -22,7 +23,7 @@ pytest -s -c /dev/null
## Run MsSQL test
```text
```
cd ingestion
source env/bin/activate
cd tests/integration/mssql
@ -31,7 +32,7 @@ pytest -s -c /dev/null
## Run Postgres test
```text
```
cd ingestion
source env/bin/activate
cd tests/integration/postgres
@ -40,7 +41,7 @@ pytest -s -c /dev/null
## Run LDAP test
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -53,7 +54,7 @@ pytest -s -c /dev/null
## Run Hive test
```text
```
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r ingestion/requirements.txt
@ -64,4 +65,3 @@ pip install pyhive thrift sasl thrift_sasl
cd ingestion/tests/integration/hive
pytest -s -c /dev/null
```
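All of the suites above follow the same pattern: create or activate a virtualenv, move into the suite's directory, and invoke `pytest -s -c /dev/null` (the `-c /dev/null` flag stops pytest from picking up any project-level config file). A minimal test module in that style, with hypothetical names, might look like the sketch below — the real suites live under `ingestion/tests/integration/`:

```python
# Hypothetical sketch of an integration-style pytest module.
def is_reachable(status_code):
    """Stand-in for an actual health-check call against the running service."""
    return status_code == 200

def test_service_health():
    # In the real tests this would hit the running MySQL/OpenMetadata stack.
    assert is_reachable(200)
```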