Tony Ouyang
f4da93988e
feat(ingestion/dynamodb): Add DynamoDB as new metadata ingestion source ( #8768 )
...
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
2023-09-15 13:26:17 -07:00
Andrew Sikowitz
e75900b9a9
build(ingest): Remove constraint on jsonschema for Python >= 3.8 ( #8842 )
2023-09-14 12:25:41 -07:00
Andrew Sikowitz
1474ac01b1
build(ingest): Bump jsonschema for Python >= 3.8 ( #8836 )
2023-09-13 12:32:45 -07:00
Mayuri Nehate
303a2d0863
build(ingest): upgrade to sqlalchemy 1.4, drop 1.3 support ( #8810 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-09-12 11:30:24 -07:00
Harshal Sheth
0e8000cf18
feat(ingest): drop sql_metadata parser ( #8765 )
2023-09-07 11:32:28 -07:00
Harshal Sheth
4ffad4d9b9
chore(ingest): upgrade sqlglot fork ( #8775 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-09-06 12:49:44 -07:00
cccs-eric
6fe60a274e
feat(iceberg): Upgrade Iceberg ingestion source to pyiceberg 0.4.0 ( #8357 )
...
Co-authored-by: cccs-Dustin <96579982+cccs-Dustin@users.noreply.github.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-31 13:01:05 -04:00
Mayuri Nehate
e867dbc3da
ci: separate airflow build and test ( #8688 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-30 14:08:42 -07:00
Andrew Sikowitz
19ce0036c7
build(ingest): Pin mypy-boto3-sagemaker directly ( #8746 )
2023-08-29 12:37:27 -05:00
Tamas Nemeth
d86b336e70
chore(ingest/s3) Bump Deequ and Pyspark version ( #8638 )
...
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-29 18:11:37 +02:00
Andrew Sikowitz
6659ff26ef
feat(ingest/sql-queries): Add sql queries source, SqlParsingBuilder, sqlglot_lineage performance optimizations ( #8494 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-08-24 10:35:46 -04:00
Harshal Sheth
a97548ce46
fix(ingest/powerbi): add sqlglot python dep ( #8704 )
2023-08-23 22:05:53 -07:00
Andrew Sikowitz
68abf9c6a1
build(ingest): Bump pydantic pin ( #8660 )
2023-08-23 16:55:51 +05:30
skrydal
baae3d261d
fix(ingest/okta): fix event_loop RuntimeError with nested asyncio ( #8637 )
2023-08-16 10:32:57 +05:30
Andrew Sikowitz
526e626146
feat(ingest): Add DataHub source ( #8561 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-15 17:49:20 -04:00
Harshal Sheth
ebf42e2702
fix(ingest): use hive pure_sasl variant ( #8570 )
2023-08-09 13:04:36 -04:00
Harshal Sheth
3428bcaaad
fix(ingest): add tableau sqlglot dep ( #8552 )
2023-08-02 15:18:06 -03:00
Pedro Silva
a4a8182001
feat(cli): Adds ability to upload recipes to DataHub's UI ( #8317 )
...
Co-authored-by: Indy Prentice <iprentic@users.noreply.github.com>
2023-08-01 17:35:42 -03:00
VISHAL KUMAR
ef3b9489aa
feat(ingest/vertica): performance improvement and bug fixes ( #8328 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-08-01 19:34:35 +05:30
Harshal Sheth
d8b2397b93
fix(ingest): pin boto3-stubs in CI ( #8527 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-07-31 19:48:05 -07:00
Harshal Sheth
89f23d3c36
chore(ingest): bump sqllineage and sqlparse ( #8481 )
2023-07-28 13:10:19 -07:00
Harshal Sheth
d733363bed
chore(ingest): drop bigquery-beta and snowflake-beta aliases ( #8451 )
2023-07-20 14:05:25 -04:00
Aseem Bansal
9df70d7355
ingest(elasticsearch): add basic profiling ( #8351 )
2023-07-20 08:25:30 +05:30
Andrew Sikowitz
48c1dc820e
build(ingest/boto3): Update boto3-stubs to fix CI ( #8452 )
2023-07-18 21:29:50 +00:00
Andrew Sikowitz
20b3adb7b1
fix(ingest/snowflake): Add sqlglot as snowflake dependency ( #8427 )
2023-07-14 21:31:24 -04:00
Andrew Sikowitz
f41f642eaf
build(ingest/boto3): Update boto3-stubs to fix CI ( #8425 )
2023-07-14 15:48:04 -07:00
mohdsiddique
cbbe083731
fix(ingestion/powerbi): increment msal version ( #8385 )
...
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
2023-07-13 17:33:19 +05:30
Tamas Nemeth
54c7aef1bc
feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore ( #8348 )
...
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-12 15:56:13 -03:00
Andrew Sikowitz
2261531e31
test(ingest): Aspect level golden file comparison ( #8310 )
2023-07-11 10:39:47 -04:00
Harshal Sheth
3e47b3d228
feat(ingest): schema-aware SQL parsing for column-level lineage ( #8334 )
2023-07-07 16:24:35 -07:00
Andrew Sikowitz
8617e072fa
build(ingest): Pin pydeequ to unblock CI ( #8381 )
2023-07-07 16:18:52 -04:00
Andrew Sikowitz
8a198cd615
fix(ingest/unity): Pin databricks-sdk and update docs ( #8293 )
2023-06-27 13:38:55 -04:00
Andrew Sikowitz
584366771d
refactor(unity): Remove databricks_cli and cleanup ( #8249 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-06-23 18:01:05 +05:30
Tamas Nemeth
d3aed62778
feat(cli): Initial support for sending exceptions to Sentry ( #7172 )
2023-06-22 10:24:58 +02:00
Mayuri Nehate
ac06cf3d3f
feat(classification): configurable minimum values threshold ( #8186 )
2023-06-07 21:28:13 -07:00
Andrew Sikowitz
7041281bbe
build(ingest/feast): Pin feast to minor version ( #8180 )
2023-06-07 10:04:42 +02:00
Andrew Sikowitz
6bad15be5c
fix(ingest): Fix modeldocgen; bump feast to relax pyarrow constraint ( #8178 )
2023-06-06 13:12:10 -07:00
Mayuri Nehate
983a8ca675
feat(classification): support for regex based custom infotypes ( #8177 )
2023-06-06 14:41:51 +02:00
Vinícius Mello
7059874dec
feat(ingest/bigquery): Add BigQuery Views lineage extraction from Google Data Catalog API ( #8100 )
2023-05-25 08:37:46 -07:00
Mayuri Nehate
84270bcac8
feat(ingest/nifi): kerberos authentication ( #8097 )
...
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Indy Prentice <iprentic@users.noreply.github.com>
2023-05-24 15:09:01 -07:00
Tamas Nemeth
4ca7a9b50e
fix(ingest/build): setting typing extension <4.6.0 because it breaks tests ( #8108 )
2023-05-23 18:55:28 +05:30
Shirshanka Das
b3c790aab6
feat: Add support for Data Products ( #8039 )
...
Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
2023-05-17 07:17:25 +00:00
Tamas Nemeth
c0d50d0b2c
fix(ingest/s3) Adding missing more-itertools dependency ( #8023 )
2023-05-11 12:14:25 -07:00
Andrew Sikowitz
9c7742b1d7
fix(ingest/unity): Update databricks-cli pin ( #8024 )
2023-05-11 12:14:10 -07:00
Andrew Sikowitz
a68833769e
refactor(ingest/unity): Use databricks-sdk over databricks-cli for usage query ( #7981 )
2023-05-09 13:30:11 -07:00
Andrew Sikowitz
4e9c398e1d
fix(ingest/unity): Add sqllineage dependency ( #7938 )
2023-05-01 23:26:49 -04:00
Andrew Sikowitz
eb1674ffdb
fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin ( #7935 )
2023-05-01 14:47:44 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference ( #7910 )
...
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Adds a UsageAggregator class to usage_common, as this same logic appears in multiple places.
- Allows a customizable user_urn_builder in usage_common, as not all unity users are emails. Other connectors such as redshift and snowflake create emails with a default email_domain config, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references.
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source ( #7903 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Harshal Sheth
29e5cfd643
fix(ingest): fix minor bug + protective dep requirements ( #7861 )
2023-04-25 14:35:01 -07:00