mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-08-22 07:58:06 +00:00
Docs: Glue Spark Pipeline Lineage (#18311)
parent d20ee5cc8a
commit 1e01cb45a0
@@ -343,3 +343,40 @@ spark.openmetadata.transport.timeout 30
After all these steps are completed, start or restart your compute instance; you are then ready to extract lineage from Spark to OpenMetadata.

## Using Spark Agent with Glue

Follow the steps below to use the OpenMetadata Spark Agent with Glue.

### 1. Specify the OpenMetadata Spark Agent JAR URL

1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Navigate to the Glue job. In the Job details tab, go to Advanced properties → Libraries → Dependent Jars path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.

{% image
  src="/images/v1.5/connectors/spark/glue-job-jar.png"
  alt="Glue Job Configure Jar"
  caption="Glue Job Configure Jar"
/%}

### 2. Add Spark configuration in Job Parameters

In the same Job details tab, add the following properties under Job parameters:

1. Add the `--conf` property with the following value. Make sure to customize this configuration as described in the documentation above.

```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```

2. Add the `--user-jars-first` parameter and set its value to `true`.

{% image
  src="/images/v1.5/connectors/spark/glue-job-params.png"
  alt="Glue Job Configure Params"
  caption="Glue Job Configure Params"
/%}

@@ -343,3 +343,39 @@ spark.openmetadata.transport.timeout 30
After all these steps are completed, start or restart your compute instance; you are then ready to extract lineage from Spark to OpenMetadata.

## Using Spark Agent with Glue

Follow the steps below to use the OpenMetadata Spark Agent with Glue.

### 1. Specify the OpenMetadata Spark Agent JAR URL

1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Navigate to the Glue job. In the Job details tab, go to Advanced properties → Libraries → Dependent Jars path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.

{% image
  src="/images/v1.6/connectors/spark/glue-job-jar.png"
  alt="Glue Job Configure Jar"
  caption="Glue Job Configure Jar"
/%}

### 2. Add Spark configuration in Job Parameters

In the same Job details tab, add the following properties under Job parameters:

1. Add the `--conf` property with the following value. Make sure to customize this configuration as described in the documentation above.

```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```

2. Add the `--user-jars-first` parameter and set its value to `true`.

{% image
  src="/images/v1.6/connectors/spark/glue-job-params.png"
  alt="Glue Job Configure Params"
  caption="Glue Job Configure Params"
/%}
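
The `--conf` value above is easy to mistype because every Spark property after the first has to be chained inside one string. The following is a minimal sketch, not part of the OpenMetadata Spark Agent, of how that value and the accompanying job parameters could be assembled in Python. The helper names `build_conf_value` and `build_job_parameters` are hypothetical, and `--extra-jars` is assumed here to be the job parameter behind the console's Dependent Jars path field.

```python
# Hypothetical helpers; the property names come from the documentation above.
LISTENER = "org.openmetadata.spark.agent.OpenMetadataSparkListener"


def build_conf_value(spark_conf):
    """Chain Spark properties into the single value of the `--conf` job parameter.

    Job parameters are key-value pairs, so only one `--conf` key can be set;
    every property after the first is joined with a literal " --conf ".
    """
    if not spark_conf:
        raise ValueError("spark_conf must contain at least one property")
    return " --conf ".join(f"{key}={value}" for key, value in spark_conf.items())


def build_job_parameters(host_port, jwt_token, service_name, pipeline_name,
                         timeout=30, agent_jar_s3_url=None):
    """Return the Glue job parameters described in steps 1 and 2 as a dict."""
    spark_conf = {
        "spark.extraListeners": LISTENER,
        "spark.openmetadata.transport.hostPort": host_port,
        "spark.openmetadata.transport.type": "openmetadata",
        "spark.openmetadata.transport.jwtToken": jwt_token,
        "spark.openmetadata.transport.pipelineServiceName": service_name,
        "spark.openmetadata.transport.pipelineName": pipeline_name,
        "spark.openmetadata.transport.timeout": timeout,
    }
    params = {
        "--conf": build_conf_value(spark_conf),
        "--user-jars-first": "true",
    }
    if agent_jar_s3_url is not None:
        # Assumption: equivalent to filling in the Dependent Jars path field.
        params["--extra-jars"] = agent_jar_s3_url
    return params


if __name__ == "__main__":
    # Placeholder values copied from the example configuration above.
    for key, value in build_job_parameters(
        host_port="https://your-org.host:port",
        jwt_token="<jwt-token>",
        service_name="glue_spark_pipeline_service",
        pipeline_name="glue_pipeline_name",
    ).items():
        print(key, value)
```

The resulting dictionary has the same shape as the key-value pairs entered under Job parameters, so it can be compared against the console values or passed to whatever tooling manages the job definition.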

BIN openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png (Normal file)
Binary file not shown. Size: 104 KiB
Binary file not shown. Size: 792 KiB
BIN openmetadata-docs/images/v1.6/connectors/spark/glue-job-jar.png (Normal file)
Binary file not shown. Size: 104 KiB
Binary file not shown. Size: 792 KiB