diff --git a/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md b/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
index 3e48cfc5eaa..43a7ebf8cc4 100644
--- a/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
+++ b/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
@@ -343,3 +343,40 @@ spark.openmetadata.transport.timeout 30
 ```
 
 After all these steps are completed you can start/restart your compute instance and you are ready to extract the lineage from spark to OpenMetadata.
+
+
+## Using Spark Agent with Glue
+
+Follow the steps below to use the OpenMetadata Spark Agent with Glue.
+
+### 1. Specify the OpenMetadata Spark Agent JAR URL
+
+1. Upload the OpenMetadata Spark Agent JAR to S3.
+2. Open the Glue job and, in the Job details tab, navigate to Advanced properties → Libraries → Dependent Jars path.
+3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
+
+{% image
+  src="/images/v1.5/connectors/spark/glue-job-jar.png"
+  alt="Glue Job Configure Jar"
+  caption="Glue Job Configure Jar"
+ /%}
+
+
+### 2. Add Spark configuration in Job Parameters
+
+In the same Job details tab, add the following properties under Job parameters:
+
+1. Add the `--conf` property with the following value. Make sure to customize this configuration as described in the documentation above.
+
+```
+spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken= --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
+```
+
+2. Add the `--user-jars-first` parameter and set its value to `true`.
+
+{% image
+  src="/images/v1.5/connectors/spark/glue-job-params.png"
+  alt="Glue Job Configure Params"
+  caption="Glue Job Configure Params"
+ /%}
+
diff --git a/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md b/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
index 3e48cfc5eaa..3bc39c4340e 100644
--- a/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
+++ b/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
@@ -343,3 +343,39 @@ spark.openmetadata.transport.timeout 30
 ```
 
 After all these steps are completed you can start/restart your compute instance and you are ready to extract the lineage from spark to OpenMetadata.
+
+
+## Using Spark Agent with Glue
+
+Follow the steps below to use the OpenMetadata Spark Agent with Glue.
+
+### 1. Specify the OpenMetadata Spark Agent JAR URL
+
+1. Upload the OpenMetadata Spark Agent JAR to S3.
+2. Open the Glue job and, in the Job details tab, navigate to Advanced properties → Libraries → Dependent Jars path.
+3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
+
+{% image
+  src="/images/v1.6/connectors/spark/glue-job-jar.png"
+  alt="Glue Job Configure Jar"
+  caption="Glue Job Configure Jar"
+ /%}
+
+
+### 2. Add Spark configuration in Job Parameters
+
+In the same Job details tab, add the following properties under Job parameters:
+
+1. Add the `--conf` property with the following value. Make sure to customize this configuration as described in the documentation above.
+
+```
+spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken= --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
+```
+
+2. Add the `--user-jars-first` parameter and set its value to `true`.
+
+{% image
+  src="/images/v1.6/connectors/spark/glue-job-params.png"
+  alt="Glue Job Configure Params"
+  caption="Glue Job Configure Params"
+ /%}
diff --git a/openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png b/openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png
new file mode 100644
index 00000000000..5ce7b558770
Binary files /dev/null and b/openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png differ
diff --git a/openmetadata-docs/images/v1.5/connectors/spark/glue-job-params.png b/openmetadata-docs/images/v1.5/connectors/spark/glue-job-params.png
new file mode 100644
index 00000000000..3cef8e39272
Binary files /dev/null and b/openmetadata-docs/images/v1.5/connectors/spark/glue-job-params.png differ
diff --git a/openmetadata-docs/images/v1.6/connectors/spark/glue-job-jar.png b/openmetadata-docs/images/v1.6/connectors/spark/glue-job-jar.png
new file mode 100644
index 00000000000..5ce7b558770
Binary files /dev/null and b/openmetadata-docs/images/v1.6/connectors/spark/glue-job-jar.png differ
diff --git a/openmetadata-docs/images/v1.6/connectors/spark/glue-job-params.png b/openmetadata-docs/images/v1.6/connectors/spark/glue-job-params.png
new file mode 100644
index 00000000000..3cef8e39272
Binary files /dev/null and b/openmetadata-docs/images/v1.6/connectors/spark/glue-job-params.png differ
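
Note on the `--conf` value added in this change: it packs seven Spark settings into a single Glue job parameter, with the first setting written bare and each later one prefixed by `--conf`. That string is easy to mistype by hand, so here is a minimal Python sketch of how it could be assembled. The helper names, the S3 path, and the token placeholder are hypothetical and not part of the OpenMetadata docs. The sketch also assumes that `--extra-jars` is the job parameter behind the console's Dependent Jars path field.

```python
# Listener class taken from the documented `--conf` value.
LISTENER = "org.openmetadata.spark.agent.OpenMetadataSparkListener"


def build_conf_value(settings):
    """Join Spark settings into the single string used as the `--conf` value.

    The first key=value pair is emitted bare and every later pair is
    separated by ' --conf ', matching the layout in the documented example.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


def build_glue_job_arguments(jar_s3_url, host_port, jwt_token,
                             pipeline_service_name, pipeline_name, timeout=30):
    """Return a dict of Glue job parameters (hypothetical helper).

    `--extra-jars` is an assumption: it is believed to correspond to the
    'Dependent Jars path' field configured in step 1.
    """
    settings = {
        "spark.extraListeners": LISTENER,
        "spark.openmetadata.transport.hostPort": host_port,
        "spark.openmetadata.transport.type": "openmetadata",
        "spark.openmetadata.transport.jwtToken": jwt_token,
        "spark.openmetadata.transport.pipelineServiceName": pipeline_service_name,
        "spark.openmetadata.transport.pipelineName": pipeline_name,
        "spark.openmetadata.transport.timeout": str(timeout),
    }
    return {
        "--extra-jars": jar_s3_url,
        "--conf": build_conf_value(settings),
        "--user-jars-first": "true",
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    arguments = build_glue_job_arguments(
        jar_s3_url="s3://example-bucket/jars/openmetadata-spark-agent.jar",
        host_port="https://your-org.host:port",
        jwt_token="PLACEHOLDER_TOKEN",
        pipeline_service_name="glue_spark_pipeline_service",
        pipeline_name="glue_pipeline_name",
    )
    for name, value in arguments.items():
        print(name, value)
```

The `--conf` entry can be pasted into the Job parameters field as is. If the job is managed through the AWS SDK instead of the console, the same dictionary could plausibly be supplied as the job's default arguments, though that path is not covered by this change.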