From 1e01cb45a0d1f3de719e5bc5aa414f865519f80b Mon Sep 17 00:00:00 2001
From: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
Date: Thu, 17 Oct 2024 16:01:55 +0530
Subject: [PATCH] Docs: Glue Spark Pipeline Lineage (#18311)

---
 .../ingestion/lineage/spark-lineage.md        |  37 ++++++++++++++++++
 .../ingestion/lineage/spark-lineage.md        |  36 +++++++++++++++++
 .../v1.5/connectors/spark/glue-job-jar.png    | Bin 0 -> 107027 bytes
 .../v1.5/connectors/spark/glue-job-params.png | Bin 0 -> 810772 bytes
 .../v1.6/connectors/spark/glue-job-jar.png    | Bin 0 -> 107027 bytes
 .../v1.6/connectors/spark/glue-job-params.png | Bin 0 -> 810772 bytes
 6 files changed, 73 insertions(+)
 create mode 100644 openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png
 create mode 100644 openmetadata-docs/images/v1.5/connectors/spark/glue-job-params.png
 create mode 100644 openmetadata-docs/images/v1.6/connectors/spark/glue-job-jar.png
 create mode 100644 openmetadata-docs/images/v1.6/connectors/spark/glue-job-params.png

diff --git a/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md b/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
index 3e48cfc5eaa..43a7ebf8cc4 100644
--- a/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
+++ b/openmetadata-docs/content/v1.5.x/connectors/ingestion/lineage/spark-lineage.md
@@ -343,3 +343,40 @@ spark.openmetadata.transport.timeout 30
 ```
 
 After all these steps are completed you can start/restart your compute instance and you are ready to extract the lineage from spark to OpenMetadata.
+
+
+## Using Spark Agent with Glue
+
+Follow the steps below to use the OpenMetadata Spark Agent with Glue.
+
+### 1. Specify the OpenMetadata Spark Agent JAR URL
+
+1. Upload the OpenMetadata Spark Agent JAR to S3.
+2. Open the Glue job. In the Job details tab, navigate to Advanced properties → Libraries → Dependent Jars path.
+3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
+
+{% image
+  src="/images/v1.5/connectors/spark/glue-job-jar.png"
+  alt="Glue Job Configure Jar"
+  caption="Glue Job Configure Jar"
+  /%}
+
+
+### 2. Add Spark configuration in Job Parameters
+
+In the same Job details tab, add the following properties under Job parameters:
+
+1. Add the `--conf` property with the following value. Customize this configuration as described in the documentation above.
+
+```
+spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken= --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
+```
+
+2. Add the `--user-jars-first` parameter and set its value to `true`.
+
+{% image
+  src="/images/v1.5/connectors/spark/glue-job-params.png"
+  alt="Glue Job Configure Params"
+  caption="Glue Job Configure Params"
+  /%}
+
diff --git a/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md b/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
index 3e48cfc5eaa..3bc39c4340e 100644
--- a/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
+++ b/openmetadata-docs/content/v1.6.x-SNAPSHOT/connectors/ingestion/lineage/spark-lineage.md
@@ -343,3 +343,39 @@ spark.openmetadata.transport.timeout 30
 ```
 
 After all these steps are completed you can start/restart your compute instance and you are ready to extract the lineage from spark to OpenMetadata.
+
+
+## Using Spark Agent with Glue
+
+Follow the steps below to use the OpenMetadata Spark Agent with Glue.
+
+### 1. Specify the OpenMetadata Spark Agent JAR URL
+
+1. Upload the OpenMetadata Spark Agent JAR to S3.
+2. Open the Glue job. In the Job details tab, navigate to Advanced properties → Libraries → Dependent Jars path.
+3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent Jars path.
+
+{% image
+  src="/images/v1.6/connectors/spark/glue-job-jar.png"
+  alt="Glue Job Configure Jar"
+  caption="Glue Job Configure Jar"
+  /%}
+
+
+### 2. Add Spark configuration in Job Parameters
+
+In the same Job details tab, add the following properties under Job parameters:
+
+1. Add the `--conf` property with the following value. Customize this configuration as described in the documentation above.
+
+```
+spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken= --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
+```
+
+2. Add the `--user-jars-first` parameter and set its value to `true`.
+
+{% image
+  src="/images/v1.6/connectors/spark/glue-job-params.png"
+  alt="Glue Job Configure Params"
+  caption="Glue Job Configure Params"
+  /%}
diff --git a/openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png b/openmetadata-docs/images/v1.5/connectors/spark/glue-job-jar.png
new file mode 100644
index 0000000000000000000000000000000000000000..5ce7b558770cba83abb3d46fe925a4fcc1d69d86
GIT binary patch
literal 107027
zcmeFYWmucdwm*!N(n2XMTAb1X#hp-sLveS9;+o(trL<7IKyfYZPJjfbxE8k{#a#lV
z_?zyt&pG=#&)(0c_v1gg?%b1^S+izlt$Sw8TE9C?RapiPha3kD4Gm9DR!SWW?O`Gs
z8fM+2hp3!N0W5Shv?rprl9H-&l9IHlE?_HL2TL?G*{}p1Y+a3hl1ziRuVOan5Ar@O
zJwRtd&-*lnA$!CxK|uTM1*Z6qXm;jCkLnM2M#O6Syg)PS=llL@QsR1~xE>oyINQ)l
zKX5Df@CKR1bN|-$a1rQjBHppUu>PR1nWqYsbjf}U4xIw7Qn
zii(`j234<-r-uw;c?W#H#nYnK_x3+mP7sY~v}(@^Pfi<~p3yu+t6UHNaf$XUzj?0C
zFP5kar|t)**cVbc=MYLAdFNEhqolbX-Bq+8bTPTlUC*u2vgeL;;VaS`8OMeAzfTo$hqf5@%@LIpxNo?ImDnPctz%6{?L-xjnq=2y*@g5mKAgJx(n5;t}W+kVx>
z;-ss|^uX^WPk?M_RuYj2FDbPst*VG;Stv(jjMlc_vyJQCzTRI%YRca_sR0{X4-~vL
z8Ib2m=T+weDZ6g-MI3Cz7b_sre-w$*8Mmtt~Zzid-Ver1{#&DsS>7!7|FA`^C@Hp
z&|Br%3K~}*nxAQkFy9^<(^1e