3089 Commits

Author SHA1 Message Date
Atul Saurav
e8e0067f23
fix(cli): Suppress printing variables to logs during ingestion failure (#4566)
Currently, when ingestion fails while running a pipeline, stackprinter prints all variables to the logs.
These may contain sensitive information. To prevent this, an optional `safe`
flag is added to the CLI. If this flag is set when running ingestion, no variables are logged in
case of unexpected failures.
2022-04-15 10:30:48 -07:00
Tamas Nemeth
61dc6e8723
fix(ingestion): airflow - import emitters indirectly to avoid unneeded dependency (#4668) 2022-04-14 10:22:16 -07:00
Fernanda de Camargo
d508f5c036
fix(ingestion): tableau - validate datasource before creating its upstream (#4613)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2022-04-13 23:08:01 -07:00
Aseem Bansal
73d69510f8
fix(sqlparser): fix sqlparser breaking due to # sign (#4662)
* fix(sqlparser): fix sqlparser breaking due to # sign

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-13 17:15:38 -07:00
Arun Vasudevan
5aa3da5c9c
feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_extraction_pattern (#4587)
* Fixing issue with strip_user_ids_from_email and adding owner_extraction_pattern
Co-authored-by: BZ <93607724+BoyuanZhangDE@users.noreply.github.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-13 16:58:36 -07:00
Aseem Bansal
5a59d5a1dd
fix(ingestion): Adding missing __init__.py (#4659) 2022-04-13 11:02:57 +02:00
Aseem Bansal
155209f0e1
fix(ingestion): add missing workunit ids (#4657) 2022-04-13 10:19:37 +02:00
Tamas Nemeth
f99d27fd8c
feat(ingest): airflow - add support to capture airflow executions, add high level dataflow jobs api to python sdk (#4615)
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
2022-04-12 23:19:39 -07:00
Kevin Hu
08c34bfe15
feat(ingest): capture MSSQL table+column descriptions (#4579)
* feat(ingest): capture MSSQL table+column descriptions

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-12 17:49:56 -07:00
David Sanchez
9a950ef231
fix(tableau): avoid duplicate schema in URNs for upstream tables (#4645)
* fix(tableau): avoid duplicate schema in URNs for upstream tables

* Fix(lint)
2022-04-12 16:26:52 -07:00
Meenakshi Kamalaseshan Radha
e75e2f8bbf
fix(ingest): Fix snowflake KEY_PAIR auth (#4638)
* fix(ingest): Fix snowflake KEY_PAIR auth to work with stateful ingestion.


Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-12 15:58:53 -07:00
Zach Bluhm
ff685b7feb
feat: Enable the ingestion of bigquery audit logs to parse usage info… (#4441)
* feat: Enable the ingestion of bigquery audit logs to parse usage information

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-12 14:58:34 -07:00
Ravindra Lanka
9226e3e27f
Enable lower-casing of the name part of dataset urn via an environment variable. (#4649) 2022-04-12 12:54:22 -07:00
Dyana Rose
5b22d96e04
fix(ingestion): looker - extract explore views from join name (#4627)
Co-authored-by: Dyana Rose <dyanarose@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-04-12 08:20:10 -07:00
Aseem Bansal
23ece3b1a4
fix(ingestion): ensure source/sink reports are always logged (#4592) 2022-04-12 05:00:59 -07:00
Xu Wang
7b1487135a
feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag (#4618)
* feat(ingest): add python library for DataJobUrn

* add DataFlowUrn lib and fix DataJobUrn

* fix create_from_str method

* fix lint error and unit test

* add DomainUrn and TagUrn

Co-authored-by: Xu Wang <xu.wang@grandrounds.com>
2022-04-12 09:02:28 +02:00
Marcin Szymański
e7c5eb357c
feat(ingest): add trino platform for great expectations (#4594) 2022-04-11 19:48:15 -07:00
jchen0824
524d183d93
feat: add presto-on-hive metadata ingestion source (#4625)
* feat(metadata ingestion source): add presto-on-hive metadata ingestion source

Co-authored-by: Houren Chen <houren.chen@grabtaxi.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-11 17:46:44 -07:00
BZ
5637e73ca5
feat(glue): add CatalogId parameter for cross-account access (#4608)
* Update glue.py

* Update glue.md

* Update glue.py
2022-04-11 09:08:25 +02:00
Aseem Bansal
61a95f41ae
chore: fix lint and remove incorrect integration mark from unit tests (#4621)
* chore: fix lint and remove incorrect integration mark from unit tests

* add to test requirements

* revert athena source tests
2022-04-08 17:18:48 +02:00
Marcin Szymański
7c3ad3d293
feat(ingest): enable connection string for all sqlalchemy datasources (#4508)
* feat(ingest): enable connection string for all sqlalchemy datasources

* Update sql_common.py

* fix types

* update docs

* rename variable to sqlalchemy_uri

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-04-07 23:11:52 -04:00
Aseem Bansal
336a628c5b
fix(bigquery): fix lineage bug, improve docs, add dataset filter config (#4607)
* fix(bigquery): fix metadata from exported logs, doc missing permission, improve logging, add tests

Co-authored-by: Ravindra Lanka <rslanka@gmail.com>
2022-04-07 13:10:21 -07:00
David Haglund
0785ed6143
fix: urlencode slash in urns too (#4527)
* fix: urlencode slash in urns too + tests

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-07 13:04:57 -07:00
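For context on why a slash needs explicit handling: Python's `urllib.parse.quote` treats `/` as safe by default (`safe="/"`), so a URN containing a slash would keep it unencoded in a URL path. The sketch below illustrates the general technique of passing `safe=""`. It is not the code from #4527; the helper name `encode_urn_for_url` and the example URN are illustrative assumptions.

```python
from urllib.parse import quote


def encode_urn_for_url(urn: str) -> str:
    """Percent-encode every reserved character in a URN, including '/'.

    Hypothetical helper: quote() leaves '/' unencoded by default (safe='/'),
    so passing safe='' is what makes the slash become %2F.
    """
    return quote(urn, safe="")


# Illustrative dataset URN whose name part contains a slash.
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db/table,PROD)"
print(quote(urn))               # default: the slash is left as-is
print(encode_urn_for_url(urn))  # safe='': the slash is encoded as %2F
```

Without the `safe=""` argument, a slash inside the URN would be read as a path separator by whatever server routes the URL.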
Gabe Lyons
112589db32
feat(tableau): add some logic to normalize table names in tableau (#4609)
* add some logic to normalize table names in tableau
2022-04-07 12:15:41 -07:00
Ravindra Lanka
5e25cd1e22
feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. (#4585)
* feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation.
2022-04-07 11:24:26 -07:00
Aseem Bansal
5ebb37ab4c
fix(bigquery): incorrect lineage when views are present (#4568)
* fix(bigquery): incorrect lineage when views are present

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-06 17:29:02 -07:00
Aditya Radhakrishnan
aeafa7e63f
feat(okta) - add support for filtering/searching when ingesting Okta groups and users (#4586) 2022-04-05 16:15:34 -07:00
mayurinehate
0a97fa22f9
fix(tableau): fix for incorrect schema returned by tableau api for snowflake connectionType (#4577) 2022-04-05 14:56:35 -07:00
Ravindra Lanka
fe5f24c2b3
fix(ingestion): Refactor redshift_usage source: simplify, annotate & fix bugs. (#4572) 2022-04-05 09:21:27 -07:00
Aseem Bansal
809d1beae9
feat(snowflake): reduce permissions provisioned by default (#4543)
* feat(snowflake): reduce permissions provisioned by default

Co-authored-by: John Joyce <john@acryl.io>
2022-04-05 09:03:00 -07:00
Kevin Hu
030d25f0a1
feat(ingest): add option for external Spark cluster (#4571)
* Add option for configuring spark cluster manager

Co-authored-by: Ravindra Lanka <rslanka@gmail.com>
2022-04-04 15:56:50 -07:00
David Haglund
df9e07fda2
fix: replace direct and indirect references to linkedin with datahub-project (#4557)
* Update links for github-related links to use datahub-project:
  - https://github.com
  - https://img.shields.io/github/...
  - https://raw.githubusercontent.com/...
* Also replace references for github repo linkedin/datahub with
  datahub-project/datahub.
2022-04-04 14:39:30 -05:00
mayurinehate
58e4364354
fix(tableau): gracefully stop ingestion if tableau sign in not successful (#4548)
* fix(tableau): gracefully stop ingestion if tableau sign in not successful

* Update tableau.md

* Update tableau.md

* docs(tableau): update doc, add caveats, use env variables in credentials

Co-authored-by: John Joyce <john@acryl.io>
2022-04-04 13:15:08 +02:00
Abhiram98
26742728a6
feat(ingestion): schema, table filtering for redshift-usage (#4396)
* Filter based on table/schema pattern + documentation

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-01 20:48:23 -07:00
darapuk
a05d798939
(fix): Update path generated when creating LookML URL (#4554)
* (fix): Update path generated when creating LookML URL
2022-04-01 11:54:36 -07:00
Corentin
2fc3a48bc5
feat(ingest): indent sql queries for usage sources (#3782)
* feat(ingest): indent sql queries for usage connectors.

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-22-140.eu-west-1.compute.internal>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-31 15:15:09 -07:00
Aseem Bansal
94890c1e71
fix(ingest): snowflake-usage - log warning instead of error out (#4544) 2022-03-31 08:16:15 -07:00
mayurinehate
c09834d52b
fix(kafka-connect): add platform for default case in jdbc connector, update tests for platform instance map (#4545) 2022-03-31 08:13:09 -07:00
mayurinehate
467ea7917c
fix(kafka-connect): fix lineage for postgres-like 3-level hierarchy d… (#4375)
* fix(kafka-connect): fix lineage for postgres-like 3-level hierarchy dialects in jdbc source connector

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-30 20:59:02 -07:00
RyanHolstien
d311067384
fix(cli): delete - handle case insensitive entity types (#4492) 2022-03-30 20:37:14 -07:00
Sergio Gómez Villamor
bdf17f551e
feat(ingest): glue - adds platform instance capability (#4130) 2022-03-30 18:50:26 -07:00
mohdsiddique
57002c766d
feat(stateful dbt): add stateful ingestion capability in dbt source (#4456)
* feat(stateful dbt): add stateful ingestion capability in dbt source

Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-30 18:09:02 -07:00
Pedro Silva
306ddff13e
feat(platform): adds side-effect report for rollbacks (#4482)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-03-30 17:33:35 -07:00
mayurinehate
9ba36100ab
feat(tableau): emit lineage edge from embedded datasource to upstream… (#4470)
* feat(tableau): emit lineage edge from embedded datasource to upstream published datasource

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-30 15:32:15 -07:00
Sunil Patil
36e9552d61
feat(ingestion): Support pluggable Schema Registry for Kafka Source (#4535)
* Support for pluggable schema registry for the Kafka source.
Co-authored-by: Sunil Patil <spatil@twilio.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-30 13:20:23 -07:00
Kevin Hu
1bad3c7bc9
fix(ingest): mssql - support database_alias (#4523) 2022-03-29 20:47:43 -07:00
Arun Vasudevan
c79c778270
feat(ingest): kafka-connect - support mapping for multiple DB instances (#4501) 2022-03-29 20:46:07 -07:00
cuong-pham
4833452192
fix(ingest): make tableau ingestion more resilient to error (#4494) 2022-03-29 17:44:50 -07:00
Kevin Hu
838982abb6
feat(ingestion): detect and disable telemetry in CI (#4513) 2022-03-29 13:21:53 -07:00
Ravindra Lanka
4f7d0f3281
Fix: Snowflake Table to View lineage (#4520) 2022-03-29 09:42:43 -07:00