mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-12-27 07:28:30 +00:00
MINOR: Add Multithread Documentation (#15706)
* Add Multithread Documentation
* Add general considerations
This commit is contained in:
parent
d52bf7aacb
commit
ce3f124a33
@ -23,8 +23,8 @@ The flow is depicted in the images below.
**TopologyRunner Standard Flow**
**TopologyRunner Multithread Flow**
@ -46,6 +46,7 @@ If the owner's name is openmetadata, you need to enter `openmetadata@domain.com`
- **Enabled**: If `True`, enables Metadata Extraction to be Incremental.
- **Lookback Days**: Number of days to search back for a successful pipeline run. The timestamp of the last successful pipeline run found will be used as the base to search for updated entities.
- **Safety Margin Days**: Number of days to add to the last successful pipeline run timestamp when searching for updated entities.
- **Threads (Beta)**: Use a multithreaded approach for Metadata Extraction. You can define the number of threads you would like to run concurrently. For further information, please check the documentation on [**Metadata Ingestion - Multithreading**](/connectors/ingestion/workflows/metadata/multithreading).
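For external ingestion runs, these options also appear in the workflow YAML. A hedged sketch of the relevant fragment (key names such as `incremental` and `threads` are assumptions based on the option labels above; verify them against your OpenMetadata version's JSON schema):

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    incremental:        # assumed key for the Incremental Extraction options
      enabled: true
      lookbackDays: 7
      safetyMarginDays: 1
    threads: 4          # Beta: number of concurrent extraction threads
```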
Note that the right-hand side panel in the OpenMetadata UI will also share useful documentation when configuring the ingestion.
@ -28,7 +28,7 @@ How this is done depends a lot on the Source itself, but the general idea is to
When using the Incremental Extraction feature with External Ingestions (ingesting using YAML files instead of setting it up from the UI), you must pass the ingestion pipeline fully qualified name to the configuration.
This should be `{service_name}{pipeline_name}`
This should be `{service_name}.{pipeline_name}`
**Example:**
@ -53,7 +53,3 @@ ingestionPipelineFQN: my_service.my_pipeline
{% connectorInfoCard name="Snowflake" stage="BETA" href="/connectors/ingestion/workflows/metadata/incremental-extraction/snowflake" platform="OpenMetadata" / %}
{% /connectorsListContainer %}
<!-- [**BigQuery**](/connectors/ingestion/workflows/metadata/incremental-extraction/bigquery) -->
<!-- [**Redshift**](/connectors/ingestion/workflows/metadata/incremental-extraction/redshift) -->
<!-- [**Snowflake**](/connectors/ingestion/workflows/metadata/incremental-extraction/snowflake) -->
@ -0,0 +1,34 @@
---
title: Metadata Ingestion - Multithreading (Beta)
slug: /connectors/ingestion/workflows/metadata/multithreading
---

# Metadata Ingestion - Multithreading (Beta)
The default Metadata Ingestion runs sequentially. This feature allows the ingestion to run concurrently using [Threading](https://docs.python.org/3/library/threading.html).
The user can define the number of threads they would like to use, and the ingestion pipeline is then responsible for opening at most that many. The specific behaviour changes depending on the Service Type used. Please check [Feature available for](#feature-available-for) below for more information.
## General Considerations
Each case is specific, and **more threads do not necessarily translate into better performance**.
Take into account that each additional thread:

- **Increases the load on the Database**, since it opens a new connection that will be used.
- **Increases the Memory used**, since we are holding more context at any given time.
We recommend testing with different values from 1 to 8. **If unsure or having issues, leaving it at 1 is recommended.**
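One way to follow that advice is to time the same workload at several thread counts before settling on a value. A toy sketch with a simulated I/O-bound query (the 1-to-8 range mirrors the recommendation above; real numbers depend entirely on your database and network):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_metadata_query(schema: int) -> int:
    """Simulated I/O-bound call; a real run would query the database."""
    time.sleep(0.01)
    return schema

def run_with_threads(n_threads: int, n_schemas: int = 16) -> float:
    """Process all schemas with at most `n_threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(fake_metadata_query, range(n_schemas)))
    return time.perf_counter() - start

timings = {n: run_with_threads(n) for n in (1, 2, 4, 8)}
for n, secs in timings.items():
    print(f"{n} thread(s): {secs:.3f}s")
```

Because the simulated task sleeps rather than computes, more threads will look strictly better here; against a real database, contention can make higher counts slower.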
## Feature available for
### Databases
This feature is implemented for all Databases at the `schema` level. This means that instead of processing one `schema` at a time, we open at most the configured number of threads, each with a dedicated Database connection, and process the schemas concurrently.
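The schema-level fan-out can be sketched with Python's `concurrent.futures` (a minimal illustration, not OpenMetadata's actual code; `open_connection` and `process_schema` are hypothetical stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor

def open_connection() -> object:
    # Hypothetical: each worker opens its own dedicated database connection.
    return object()

def process_schema(schema: str) -> str:
    conn = open_connection()  # one connection per schema being processed
    # ... extract tables, views, columns, etc. through `conn` ...
    return f"processed {schema}"

schemas = ["sales", "finance", "hr", "marketing", "ops", "audit"]

# With 4 threads, at most 4 schemas are in flight at any given time;
# the remaining schemas queue until a worker frees up.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_schema, schemas))

print(results)
```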
**Example: 4 Threads**
{% image
src="/images/v1.4/features/ingestion/workflows/metadata/multithreading/example-diagram.png"
alt="Example: 4 Threads"
caption="Small Diagram to depict how multithreading works." /%}
@ -797,6 +797,9 @@ site_menu:
- category: Connectors / Ingestion / Workflows/ Metadata / Incremental Extraction / Snowflake
url: /connectors/ingestion/workflows/metadata/incremental-extraction/snowflake
- category: Connectors / Ingestion / Workflows/ Metadata / Multithreading
url: /connectors/ingestion/workflows/metadata/multithreading
- category: Connectors / Ingestion / Workflows / Usage
url: /connectors/ingestion/workflows/usage
- category: Connectors / Ingestion / Workflows / Usage / Usage Workflow Through Query Logs
Binary file not shown. After Width: | Height: | Size: 99 KiB
Binary file not shown. After Width: | Height: | Size: 134 KiB
Binary file not shown. After Width: | Height: | Size: 80 KiB