Docs: Glue Spark Pipeline Lineage (#18311)

Mayur Singal 2024-10-17 16:01:55 +05:30 committed by GitHub
parent d20ee5cc8a
commit 1e01cb45a0
6 changed files with 73 additions and 0 deletions


@ -343,3 +343,40 @@ spark.openmetadata.transport.timeout 30
```
After completing these steps, start or restart your compute instance. You are then ready to extract lineage from Spark to OpenMetadata.
## Using Spark Agent with Glue
Follow the steps below to use the OpenMetadata Spark Agent with Glue.
### 1. Specify the OpenMetadata Spark Agent JAR URL
1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Navigate to the Glue job. In the Job details tab, go to Advanced properties → Libraries → Dependent Jars path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
{% image
src="/images/v1.5/connectors/spark/glue-job-jar.png"
alt="Glue Job Configure Jar"
caption="Glue Job Configure Jar"
/%}
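
The upload in step 1 can also be scripted. The following is a minimal sketch, assuming boto3 is used for the upload; the bucket name, object key, and local file name in the comments are hypothetical placeholders, not values taken from this documentation. The client is passed in as an argument so the helper can be tried without AWS credentials.

```python
def s3_url(bucket: str, key: str) -> str:
    """Return the s3:// URL to paste into the Dependent Jars path."""
    return f"s3://{bucket}/{key}"


def upload_agent_jar(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Upload the agent JAR and return its S3 URL.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    s3_client.upload_file(local_path, bucket, key)
    return s3_url(bucket, key)


# Hypothetical usage (all names below are placeholders):
#   import boto3
#   url = upload_agent_jar(
#       boto3.client("s3"),
#       "openmetadata-spark-agent.jar",
#       "my-bucket",
#       "jars/openmetadata-spark-agent.jar",
#   )
#   # url is the value to enter in the Dependent Jars path
```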
### 2. Add Spark configuration in Job Parameters
In the same Job details tab, add the following properties under Job parameters:
1. Add the `--conf` property with the following value. Customize the configuration as described earlier in this document.
```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```
2. Add the `--user-jars-first` parameter and set its value to `true`.
{% image
src="/images/v1.5/connectors/spark/glue-job-params.png"
alt="Glue Job Configure Params"
caption="Glue Job Configure Params"
/%}
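
The `--conf` value above packs several Spark settings into one job parameter by chaining them with ` --conf ` inside the value. Writing that string by hand is error-prone, so the sketch below assembles it from a dictionary. The helper name `build_conf_value` is hypothetical, and the settings are the placeholder values from the example above; replace the host, token, and pipeline names with your own.

```python
def build_conf_value(settings: dict[str, str]) -> str:
    """Join key=value pairs into a single Glue `--conf` parameter value.

    The first pair carries no prefix because the parameter key itself is
    `--conf`; every later pair is introduced by ` --conf `.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


# Placeholder values copied from the example above.
settings = {
    "spark.extraListeners": "org.openmetadata.spark.agent.OpenMetadataSparkListener",
    "spark.openmetadata.transport.hostPort": "https://your-org.host:port",
    "spark.openmetadata.transport.type": "openmetadata",
    "spark.openmetadata.transport.jwtToken": "<jwt-token>",
    "spark.openmetadata.transport.pipelineServiceName": "glue_spark_pipeline_service",
    "spark.openmetadata.transport.pipelineName": "glue_pipeline_name",
    "spark.openmetadata.transport.timeout": "30",
}

conf_value = build_conf_value(settings)
print(conf_value)
```

Keeping the settings in a dictionary also makes it easy to check that no key is missing before the value is pasted into the job.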


@ -343,3 +343,39 @@ spark.openmetadata.transport.timeout 30
```
After completing these steps, start or restart your compute instance. You are then ready to extract lineage from Spark to OpenMetadata.
## Using Spark Agent with Glue
Follow the steps below to use the OpenMetadata Spark Agent with Glue.
### 1. Specify the OpenMetadata Spark Agent JAR URL
1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Navigate to the Glue job. In the Job details tab, go to Advanced properties → Libraries → Dependent Jars path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
{% image
src="/images/v1.6/connectors/spark/glue-job-jar.png"
alt="Glue Job Configure Jar"
caption="Glue Job Configure Jar"
/%}
### 2. Add Spark configuration in Job Parameters
In the same Job details tab, add the following properties under Job parameters:
1. Add the `--conf` property with the following value. Customize the configuration as described earlier in this document.
```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```
2. Add the `--user-jars-first` parameter and set its value to `true`.
{% image
src="/images/v1.6/connectors/spark/glue-job-params.png"
alt="Glue Job Configure Params"
caption="Glue Job Configure Params"
/%}
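
The same job parameters can be supplied programmatically when a run is started. The sketch below is one possible approach, not the documented OpenMetadata procedure: it assumes a client that behaves like `boto3.client("glue")`, and it assumes that `--extra-jars` is the job-parameter counterpart of the console's Dependent Jars path. The helper names, job name, and JAR URL are hypothetical.

```python
def build_job_arguments(conf_value: str, jar_url: str) -> dict[str, str]:
    """Collect the parameters from steps 1 and 2 into one mapping."""
    return {
        "--conf": conf_value,          # chained Spark settings from step 2
        "--user-jars-first": "true",   # Glue expects the string "true"
        "--extra-jars": jar_url,       # assumption: same as Dependent Jars path
    }


def start_run(glue_client, job_name: str, arguments: dict[str, str]) -> str:
    """Start a Glue job run with the given arguments and return its run ID.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


# Hypothetical usage (job name and JAR location are placeholders):
#   import boto3
#   args = build_job_arguments(
#       "spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener",
#       "s3://my-bucket/jars/openmetadata-spark-agent.jar",
#   )
#   run_id = start_run(boto3.client("glue"), "my-glue-job", args)
```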
