MINOR: Add Multithread Documentation (#15706)

* Add Multithread Documentation

* Add general considerations
IceS2 2024-03-27 08:30:16 +01:00 committed by GitHub
parent d52bf7aacb
commit ce3f124a33
8 changed files with 41 additions and 7 deletions


@ -23,8 +23,8 @@ The flow is depicted in the images below.
**TopologyRunner Standard Flow**
![image](https://github.com/IceS2/OpenMetadata/assets/4912399/c253af53-c11a-4b91-b101-892fa8169c81)
![image](../openmetadata-docs/images/v1.4/features/ingestion/workflows/metadata/multithreading/single-thread-flow.png)
**TopologyRunner Multithread Flow**
![image](https://github.com/IceS2/OpenMetadata/assets/4912399/3fcef845-10da-4aee-82cc-28d5f5ff9532)
![image](../openmetadata-docs/images/v1.4/features/ingestion/workflows/metadata/multithreading/multi-thread-flow.png)


@ -46,6 +46,7 @@ If the owner's name is openmetadata, you need to enter `openmetadata@domain.com`
- **Enabled**: If `True`, enables Incremental Metadata Extraction.
- **Lookback Days**: Number of days to search back for a successful pipeline run. The timestamp of the last successful pipeline run found will be used as the base to search for updated entities.
- **Safety Margin Days**: Number of days to add to the last successful pipeline run timestamp when searching for updated entities.
- **Threads (Beta)**: Use a multithreaded approach for Metadata Extraction. Here you can define the number of threads you would like to run concurrently. For further information, please check the documentation on [**Metadata Ingestion - Multithreading**](/connectors/ingestion/workflows/metadata/multithreading).
Note that the right-hand side panel in the OpenMetadata UI will also share useful documentation when configuring the ingestion.
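For YAML-based ingestion, the options above would typically live under `sourceConfig.config`. A minimal sketch follows; the field names (`incremental`, `lookbackDays`, `safetyMarginDays`, `threads`) mirror the UI labels and should be verified against your connector's YAML reference.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    # Incremental Metadata Extraction (Beta) - field names assumed from the UI labels above
    incremental:
      enabled: true
      lookbackDays: 7
      safetyMarginDays: 1
    # Multithreading (Beta): number of threads to run concurrently
    threads: 4
```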


@ -28,7 +28,7 @@ How this is done depends a lot on the Source itself, but the general idea is to
When using the Incremental Extraction feature with External Ingestion (ingesting using YAML files instead of setting it up from the UI), you must pass the ingestion pipeline's fully qualified name to the configuration.
This should be `{service_name}{pipeline_name}`
This should be `{service_name}.{pipeline_name}`
**Example:**
@ -53,7 +53,3 @@ ingestionPipelineFQN: my_service.my_pipeline
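For orientation, here is a minimal sketch of a complete external ingestion YAML using this key. Everything other than `ingestionPipelineFQN` is an illustrative placeholder following the usual OpenMetadata workflow layout, not part of the original example.

```yaml
source:
  type: snowflake              # illustrative connector
  serviceName: my_service
  # serviceConnection omitted for brevity
  sourceConfig:
    config:
      type: DatabaseMetadata
      incremental:
        enabled: true
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    # authentication settings omitted
# Fully qualified name of the ingestion pipeline: {service_name}.{pipeline_name}
ingestionPipelineFQN: my_service.my_pipeline
```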
{% connectorInfoCard name="Snowflake" stage="BETA" href="/connectors/ingestion/workflows/metadata/incremental-extraction/snowflake" platform="OpenMetadata" / %}
{% /connectorsListContainer %}
<!-- [**BigQuery**](/connectors/ingestion/workflows/metadata/incremental-extraction/bigquery) -->
<!-- [**Redshift**](/connectors/ingestion/workflows/metadata/incremental-extraction/redshift) -->
<!-- [**Snowflake**](/connectors/ingestion/workflows/metadata/incremental-extraction/snowflake) -->


@ -0,0 +1,34 @@
---
title: Metadata Ingestion - Multithreading (Beta)
slug: /connectors/ingestion/workflows/metadata/multithreading
---
# Metadata Ingestion - Multithreading (Beta)
The default Metadata Ingestion runs sequentially. This feature allows the ingestion to run concurrently using [Threading](https://docs.python.org/3/library/threading.html).
You can define the number of threads you would like to use, and the ingestion pipeline will open at most that many. The specific behaviour changes depending on the Service Type used. Please check the [Feature available for](#feature-available-for) section below for more information.
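In the UI this corresponds to the **Threads (Beta)** option of the metadata ingestion. For YAML-based ingestion, a minimal sketch is shown below; the `threads` field name mirrors the UI label and is an assumption to verify against your connector's YAML reference.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    # Open at most 4 threads concurrently during Metadata Extraction
    threads: 4
```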
## General Considerations
Each case is specific and **more threads do not necessarily translate into better performance**.
Take into account that each additional thread:
- **Increases the load on the Database**, since it opens a new connection that will be used.
- **Increases the memory used**, since more context is held at any given time.
We recommend testing with different values from 1 to 8. **If you are unsure or run into issues, leave it at 1.**
## Feature available for
### Databases
This feature is implemented for all Databases at the `schema` level. This means that instead of processing one `schema` at a time, we open at most the configured number of threads, each with a dedicated Database connection, and process the schemas concurrently.
**Example: 4 Threads**
{% image
src="/images/v1.4/features/ingestion/workflows/metadata/multithreading/example-diagram.png"
alt="Example: 4 Threads"
caption="Diagram depicting how multithreading works." /%}


@ -797,6 +797,9 @@ site_menu:
- category: Connectors / Ingestion / Workflows / Metadata / Incremental Extraction / Snowflake
url: /connectors/ingestion/workflows/metadata/incremental-extraction/snowflake
- category: Connectors / Ingestion / Workflows / Metadata / Multithreading
url: /connectors/ingestion/workflows/metadata/multithreading
- category: Connectors / Ingestion / Workflows / Usage
url: /connectors/ingestion/workflows/usage
- category: Connectors / Ingestion / Workflows / Usage / Usage Workflow Through Query Logs

3 new binary image files (the images referenced above) are not shown: 99 KiB, 134 KiB, and 80 KiB.