905 Commits

Author SHA1 Message Date
Xu Wang
7b1487135a
feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag (#4618)
* feat(ingest): add python library for DataJobUrn

* add DataFlowUrn lib and fix DataJobUrn

* fix create_from_str method

* fix lint error and unit test

* add DomainUrn and TagUrn

Co-authored-by: Xu Wang <xu.wang@grandrounds.com>
2022-04-12 09:02:28 +02:00
Marcin Szymański
e7c5eb357c
feat(ingest): add trino platform for great expectations (#4594) 2022-04-11 19:48:15 -07:00
Aseem Bansal
61a95f41ae
chore: fix lint and remove incorrect integration mark from unit tests (#4621)
* chore: fix lint and remove incorrect integration mark from unit tests

* add to test requirements

* revert athena source tests
2022-04-08 17:18:48 +02:00
Aseem Bansal
336a628c5b
fix(bigquery): fix lineage bug, improve docs, add dataset filter config (#4607)
* fix(bigquery): fix metadata from exported logs, doc missing permission, improve logging, add tests

Co-authored-by: Ravindra Lanka <rslanka@gmail.com>
2022-04-07 13:10:21 -07:00
David Haglund
0785ed6143
fix: urlencode slash in urns too (#4527)
* fix: urlencode slash in urns too + tests

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-04-07 13:04:57 -07:00
Corentin
2fc3a48bc5
feat(ingest): indent sql queries for usage sources (#3782)
* feat(ingest): indent sql queries for usage connectors.

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-22-140.eu-west-1.compute.internal>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-31 15:15:09 -07:00
Sergio Gómez Villamor
bdf17f551e
feat(ingest): glue - adds platform instance capability (#4130) 2022-03-30 18:50:26 -07:00
Sunil Patil
36e9552d61
feat(ingestion): Support pluggable Schema Registry for Kafka Source (#4535)
* Support for pluggable schema registry for the Kafka source.
Co-authored-by: Sunil Patil <spatil@twilio.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-30 13:20:23 -07:00
Tamas Nemeth
4358d8fb01
feat(ingest): athena - set Athena location as upstream (#4503) 2022-03-29 07:06:48 -07:00
Shirshanka Das
a69eac8247
feat(ingest): dbt,looker,sql_common,kafka - moving sources to produce display names and subtypes more consistently (#4496) 2022-03-27 18:49:26 -05:00
Xu Wang
d04092e634
feat(ingest): add python utility classes for NotebookUrn, CorpuserUrn and CorpGroupUrn (#4469)
* feat: add python utility classes for NotebookUrn, CorpuserUrn and CorpGroupUrn

Co-authored-by: Xu Wang <xu.wang@grandrounds.com>
Co-authored-by: Ravindra Lanka <rslanka@gmail.com>
2022-03-23 16:07:57 -07:00
Aseem Bansal
c5f1d2c9bd
feat(ingestion): snowflake, bigquery - enhancements to log and bugfix (#4442)
feat(ingestion): add logging for snowflake, bigquery
2022-03-21 09:50:36 -07:00
Ravindra Lanka
60925e3e8c
Fix bug in the SchemaField type computation for AVRO logical types. (#4433) 2022-03-18 12:06:54 +01:00
Tamas Nemeth
f557b2c1b3
fix(ingestion) containers: Adding platform instance to container keys (#4279) 2022-03-16 14:57:50 -07:00
Jorgen Evens
af5c4ee4d0
fix(ingest): handle endpoints without 200 response in openapi (#4332) 2022-03-14 17:52:08 -07:00
Aseem Bansal
4bcc2b3d12
feat(ingestion): improve logging, docs for bigquery, snowflake, redshift (#4344) 2022-03-14 08:50:29 -07:00
Tamas Nemeth
48380ada4c
fix(ingest) bigquery-usage: Adding credential support for bigquery usage (#4111) 2022-03-08 12:29:10 -08:00
MugdhaHardikar-GSLab
f198a92def
fix(config-parsing): add support for expansion of variables embedded within a string (#4350) 2022-03-08 12:24:08 -08:00
Aseem Bansal
7eec30b2ec
fix(hive): clean protocol for hive source (#4330) 2022-03-08 11:57:26 -08:00
Swaroop Jagadish
35b187a8d4
feat(ingest): transformers - add support for processing MCP-s (#4337) 2022-03-07 13:14:29 -08:00
John Joyce
9f1c5a8f75
feat(assertions): Adding Assertions Entity & Great Expectations BETA (#4305) 2022-03-04 11:51:31 -08:00
Kevin Hu
02fe05eb8f
feat(ingest): data-lake - remove spark requirement if not profiling (#4131) 2022-02-24 23:26:06 -08:00
Edward Vaisman
6ff551cbcd
feat(ingest): lineage-file - add ability to provide lineage manually through a file (#4116) 2022-02-24 17:02:38 -08:00
Harshal Sheth
49a8ece02a
fix(ingestion): enable compat with avro 1.11 (#4205) 2022-02-22 22:13:50 -08:00
Xu Wang
aa3363bcc2
feat(ingest): lib - add better support for working with urns (#4172)
Co-authored-by: Xu Wang <xu.wang@grandrounds.com>
2022-02-22 19:39:24 -08:00
Ravindra Lanka
7f4cb87c57
Revert "fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. (#4140)" (#4218)
This reverts commit 6c75185445bbb23974932ff64cb142ee6bf5b51b.
2022-02-22 16:21:40 -08:00
Ravindra Lanka
84005d3848
feat(ingest): kafka - add support for non-default schema registry subject name strategies (#4215) 2022-02-22 16:05:46 -08:00
Alexander Chashnikov
c2065bd7fe
feat(ingest): clickhouse - add initial support (#4057)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-02-21 07:36:08 -08:00
Tamas Nemeth
3d02b5bec8
feat(ingest): bigquery - ignore temporary tables from lineage and connect edges directly (#4160) 2022-02-20 14:23:23 -08:00
Harshal Sheth
1b60fae014
test(airflow): fix airflow version parsing (#4142)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-02-19 18:13:01 -08:00
abiwill
8bbc66b3e6
fix(ingest): elasticsearch - http/https host config support (#4191)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-02-19 11:42:01 -08:00
Ravindra Lanka
6c75185445
fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. (#4140) 2022-02-16 19:45:07 -08:00
Tamas Nemeth
b2664916e3
feat(ingest): Glue - Support for domains and containers (#4110)
* Add container and domain support for Glue.
Adding option to set aws profile for Glue.

* Adding domain doc for Glue

* Making get_workunits less complex

* Updating golden file

* Addressing pr review comments

* Remove unneeded empty line
2022-02-16 08:29:14 -08:00
Claudio Benfatto
aeefde4fa1
feat(ingestion): Kafka stateful ingestion (#4028)
* test: test stateful ingestion for kafka

test: some more advancement

test: some improvements

refactoring

* refactor: remove some linter modifications

* tests: add unit tests for kafka state

* refactor: minor changes

* tests: improve test coverage

* fix: fix naming

* style: fix format with black

* fix: fix broken test

* revert: revert smoke tests to master

* feat: add reporting to kafka source

* tests: add smoke tests for kafka reporting

* revert: revert changes to the smoke tests

* test: add kafka integration test for stateful ingestion

* docs: update documentation on kafka source

* fix: return empty string when no platform instance

* revert: remove unwanted file

* fix: solve problem with platform instance

* chore: use console sink instead of file

* fix: disable complexity check for _extract_record

* fix: remove if condition in get_platform_instance_id

* chore: remove unneeded integration test

* test: test platform instance in kafka source unit tests
2022-02-15 07:18:36 -08:00
Tamas Nemeth
bfaec300b6
feat(ingest) Athena: Getting table properties for Athena datasets (#4123)
* Getting table properties for Athena datasets

* Isorting

* Fixing mypy error

* Addressing pr review comments
Adding tests

* Adding missing import

* black

* Fixing test run

* fixing flake8

* Adding athena to tox tests as well

* Not running athena tests on python < 3.7

* Addressing more pr comments
2022-02-14 13:51:45 -08:00
Harshal Sheth
ea2b092fe8
chore(ingest): remove unused groupby_unsorted utility (#4011) 2022-02-10 21:03:33 -08:00
Claudio Benfatto
f944a9ba05
fix(ingest): enforce correct behaviour for commit policy (#4092) 2022-02-08 23:21:23 -08:00
John Joyce
2a9a076fc1
feat(ingest): Adding Tableau Source Connector [BETA] (#4063) 2022-02-08 14:26:44 -08:00
Tamas Nemeth
63bc830cfe
Data domain containers ingestion (#4051) 2022-02-07 09:51:49 -08:00
Ravindra Lanka
f20382f956
feat(ingest): framework - client side changes for monitoring and reporting (#3807) 2022-02-02 13:19:15 -08:00
Ravindra Lanka
f4209504f1
feat(ingest): support Kafka confluent external schema resolution by name or subject (#4035) 2022-02-02 07:44:56 -08:00
mayurinehate
1afe8876b7
feat(ingest): nifi - handle provenance api variation for older versions (#4022) 2022-02-01 10:03:05 -08:00
Tamas Nemeth
771c8567da
fix(ingest): snowflake - Run authentication validation if default value used (#4024) 2022-02-01 10:01:29 -08:00
Tamas Nemeth
68711222d4
feat(ingest): usage-stats - add ability to ignore users from top users calculation (#3735) 2022-02-01 00:11:23 -08:00
Michael A. Schlosser
c36662f837
feat(ingest): snowflake - support for additional auth mechanisms (#4009) 2022-01-30 11:47:53 -08:00
Aseem Bansal
400e0fe838
feat(ingest): kafka - support schema references (#3862) 2022-01-17 14:29:54 -08:00
Ravindra Lanka
1efe04f88a
feat(ingest): glue - support for nested structs (#3895) 2022-01-17 14:21:53 -08:00
Swaroop Jagadish
7d986ec880
fix(ingest): populate system metadata for all metadata events (mcp, mcpw) (#3900) 2022-01-16 12:03:38 -08:00
Ravindra Lanka
a44b48a6b8
feat(ingest): elasticsearch - add Elasticsearch Source (#3893)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-01-14 13:10:12 -08:00
Tamas Nemeth
e95446be1c
fix(ingest): sqlparser - Not lowercasing looker source's special table name (#3891)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-01-14 12:22:17 -08:00