578 Commits

Author SHA1 Message Date
Tony Ouyang
f4da93988e
feat(ingestion/dynamodb): Add DynamoDB as new metadata ingestion source (#8768)
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
2023-09-15 13:26:17 -07:00
Andrew Sikowitz
e75900b9a9
build(ingest): Remove constraint on jsonschema for Python >= 3.8 (#8842)
2023-09-14 12:25:41 -07:00
Andrew Sikowitz
1474ac01b1
build(ingest): Bump jsonschema for Python >= 3.8 (#8836)
2023-09-13 12:32:45 -07:00
Mayuri Nehate
303a2d0863
build(ingest): upgrade to sqlalchemy 1.4, drop 1.3 support (#8810)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-09-12 11:30:24 -07:00
Harshal Sheth
0e8000cf18
feat(ingest): drop sql_metadata parser (#8765)
2023-09-07 11:32:28 -07:00
Harshal Sheth
4ffad4d9b9
chore(ingest): upgrade sqlglot fork (#8775)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-09-06 12:49:44 -07:00
cccs-eric
6fe60a274e
feat(iceberg): Upgrade Iceberg ingestion source to pyiceberg 0.4.0 (#8357)
Co-authored-by: cccs-Dustin <96579982+cccs-Dustin@users.noreply.github.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-31 13:01:05 -04:00
Mayuri Nehate
e867dbc3da
ci: separate airflow build and test (#8688)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-30 14:08:42 -07:00
Andrew Sikowitz
19ce0036c7
build(ingest): Pin mypy-boto3-sagemaker directly (#8746)
2023-08-29 12:37:27 -05:00
Tamas Nemeth
d86b336e70
chore(ingest/s3) Bump Deequ and Pyspark version (#8638)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-29 18:11:37 +02:00
Andrew Sikowitz
6659ff26ef
feat(ingest/sql-queries): Add sql queries source, SqlParsingBuilder, sqlglot_lineage performance optimizations (#8494)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-08-24 10:35:46 -04:00
Harshal Sheth
a97548ce46
fix(ingest/powerbi): add sqlglot python dep (#8704)
2023-08-23 22:05:53 -07:00
Andrew Sikowitz
68abf9c6a1
build(ingest): Bump pydantic pin (#8660)
2023-08-23 16:55:51 +05:30
skrydal
baae3d261d
fix(ingest/okta): fix event_loop RuntimeError with nested asyncio (#8637)
2023-08-16 10:32:57 +05:30
Andrew Sikowitz
526e626146
feat(ingest): Add DataHub source (#8561)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-15 17:49:20 -04:00
Harshal Sheth
ebf42e2702
fix(ingest): use hive pure_sasl variant (#8570)
2023-08-09 13:04:36 -04:00
Harshal Sheth
3428bcaaad
fix(ingest): add tableau sqlglot dep (#8552)
2023-08-02 15:18:06 -03:00
Pedro Silva
a4a8182001
feat(cli): Adds ability to upload recipes to DataHub's UI (#8317)
Co-authored-by: Indy Prentice <iprentic@users.noreply.github.com>
2023-08-01 17:35:42 -03:00
VISHAL KUMAR
ef3b9489aa
feat(ingest/vertica): performance improvement and bug fixes (#8328)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-01 19:34:35 +05:30
Harshal Sheth
d8b2397b93
fix(ingest): pin boto3-stubs in CI (#8527)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-07-31 19:48:05 -07:00
Harshal Sheth
89f23d3c36
chore(ingest): bump sqllineage and sqlparse (#8481)
2023-07-28 13:10:19 -07:00
Harshal Sheth
d733363bed
chore(ingest): drop bigquery-beta and snowflake-beta aliases (#8451)
2023-07-20 14:05:25 -04:00
Aseem Bansal
9df70d7355
ingest(elasticsearch): add basic profiling (#8351)
2023-07-20 08:25:30 +05:30
Andrew Sikowitz
48c1dc820e
build(ingest/boto3): Update boto3-stubs to fix CI (#8452)
2023-07-18 21:29:50 +00:00
Andrew Sikowitz
20b3adb7b1
fix(ingest/snowflake): Add sqlglot as snowflake dependency (#8427)
2023-07-14 21:31:24 -04:00
Andrew Sikowitz
f41f642eaf
build(ingest/boto3): Update boto3-stubs to fix CI (#8425)
2023-07-14 15:48:04 -07:00
mohdsiddique
cbbe083731
fix(ingestion/powerbi): increment msal version (#8385)
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
2023-07-13 17:33:19 +05:30
Tamas Nemeth
54c7aef1bc
feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore (#8348)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-12 15:56:13 -03:00
Andrew Sikowitz
2261531e31
test(ingest): Aspect level golden file comparison (#8310)
2023-07-11 10:39:47 -04:00
Harshal Sheth
3e47b3d228
feat(ingest): schema-aware SQL parsing for column-level lineage (#8334)
2023-07-07 16:24:35 -07:00
Andrew Sikowitz
8617e072fa
build(ingest): Pin pydeequ to unblock CI (#8381)
2023-07-07 16:18:52 -04:00
Andrew Sikowitz
8a198cd615
fix(ingest/unity): Pin databricks-sdk and update docs (#8293)
2023-06-27 13:38:55 -04:00
Andrew Sikowitz
584366771d
refactor(unity): Remove databricks_cli and cleanup (#8249)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-06-23 18:01:05 +05:30
Tamas Nemeth
d3aed62778
feat(cli): Initial support for sending exceptions to Sentry (#7172)
2023-06-22 10:24:58 +02:00
Mayuri Nehate
ac06cf3d3f
feat(classification): configurable minimum values threshold (#8186)
2023-06-07 21:28:13 -07:00
Andrew Sikowitz
7041281bbe
build(ingest/feast): Pin feast to minor version (#8180)
2023-06-07 10:04:42 +02:00
Andrew Sikowitz
6bad15be5c
fix(ingest): Fix modeldocgen; bump feast to relax pyarrow constraint (#8178)
2023-06-06 13:12:10 -07:00
Mayuri Nehate
983a8ca675
feat(classification): support for regex based custom infotypes (#8177)
2023-06-06 14:41:51 +02:00
Vinícius Mello
7059874dec
feat(ingest/bigquery): Add BigQuery Views lineage extraction from Google Data Catalog API (#8100)
2023-05-25 08:37:46 -07:00
Mayuri Nehate
84270bcac8
feat(ingest/nifi): kerberos authentication (#8097)
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Indy Prentice <iprentic@users.noreply.github.com>
2023-05-24 15:09:01 -07:00
Tamas Nemeth
4ca7a9b50e
fix(ingest/build): setting typing extension <4.6.0 because it breaks tests (#8108)
2023-05-23 18:55:28 +05:30
Shirshanka Das
b3c790aab6
feat: Add support for Data Products (#8039)
Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
2023-05-17 07:17:25 +00:00
Tamas Nemeth
c0d50d0b2c
fix(ingest/s3) Adding missing more-itertools dependency (#8023)
2023-05-11 12:14:25 -07:00
Andrew Sikowitz
9c7742b1d7
fix(ingest/unity): Update databricks-cli pin (#8024)
2023-05-11 12:14:10 -07:00
Andrew Sikowitz
a68833769e
refactor(ingest/unity): Use databricks-sdk over databricks-cli for usage query (#7981)
2023-05-09 13:30:11 -07:00
Andrew Sikowitz
4e9c398e1d
fix(ingest/unity): Add sqllineage dependency (#7938)
2023-05-01 23:26:49 -04:00
Andrew Sikowitz
eb1674ffdb
fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin (#7935)
2023-05-01 14:47:44 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference (#7910)
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Creates UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source (#7903)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Harshal Sheth
29e5cfd643
fix(ingest): fix minor bug + protective dep requirements (#7861)
2023-04-25 14:35:01 -07:00