16 Commits

Author SHA1 Message Date
Harshal Sheth
5bc8a895f9
chore(ingest): remove calls to deprecated methods (#13009)
2025-03-28 13:42:54 -07:00
Mayuri Nehate
fba09966f3
fix(ingest): consistent fingerprint for sql parsing aggregator (#12239)
2025-01-03 10:49:42 -08:00
Harshal Sheth
f4be88d0a9
feat(ingest): set pipeline name in system metadata (#10190)
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
2024-06-27 15:00:35 -07:00
Andrew Sikowitz
026f7abe9c
feat(ingest/usage): Make cumulative query character limit configurable (#8751)
2023-08-30 15:53:08 -04:00
Harshal Sheth
89f23d3c36
chore(ingest): bump sqllineage and sqlparse (#8481)
2023-07-28 13:10:19 -07:00
Andrew Sikowitz
7ba2d13087
refactor(ingest): Make get_workunits() return MetadataWorkUnits (#8051)
- Deprecates UsageAggregationClass, /usageStats?action=batchIngest, UsageStatsWorkUnit
- Removes parsing of UsageAggregationClass in file source, all sinks, and WorkUnitRecordExtractor
2023-05-17 00:01:57 -04:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference (#7910)
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Creates UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
Andrew Sikowitz
087855f374
fix(ingest/bigquery): Support cross project usage using FileBackedDict (#7663)
Includes major refactor of bigquery usage ingestion, minor refactor of the source as a whole, and reporting cleanup.
Includes bigquery performance testing changes.
2023-04-07 12:18:26 -07:00
Andrew Sikowitz
8101f0d47a
feat(ingest): Introduce FileBackedDict for offloading data to disk (#7461)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Also includes minor refactoring to the bigquery connector
2023-03-01 19:09:51 -05:00
Mayuri Nehate
aedf1522fb
feat(ingest): snowflake-beta - minor changes, tests (#5910)
2022-09-12 10:42:52 -07:00
Harshal Sheth
e556bcb306
feat(ingest): add entity type inference to mcpw (#5880)
2022-09-10 20:36:10 -07:00
BZ
367fac6066
feat(ingestion): For all usage connectors, allow exclusion of top_n_queries from ingestion via a config param. (#4839)
* feat(redshift-usage): allow users to not ingest top_n_queries

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-05-12 14:26:03 -07:00
Corentin
2fc3a48bc5
feat(ingest): indent sql queries for usage sources (#3782)
* feat(ingest): indent sql queries for usage connectors.

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-22-140.eu-west-1.compute.internal>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-03-31 15:15:09 -07:00
Tamas Nemeth
68711222d4
feat(ingest): usage-stats - add ability to ignore users from top users calculation (#3735)
2022-02-01 00:11:23 -08:00
Harshal Sheth
adf9d2ead7
test(ingest): fix pytest warning for class starting with Test (#3745)
2021-12-14 22:44:42 -08:00
Tamas Nemeth
b9f67c5b65
feat(ingest): trim long sql queries in usage connector (#3725)
2021-12-13 09:16:24 -08:00
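
Several commits in this history concern trimming SQL queries in usage connectors (#3725) and making the cumulative query character limit configurable (#8751). The log does not show how this is implemented, so the following is only a minimal sketch of the general idea, assuming the cumulative limit is split evenly across the queries being reported. The names trim_query and trim_top_queries, the total_char_limit parameter, and the " ..." suffix are hypothetical and are not taken from the repository.

```python
from typing import List


def trim_query(query: str, budget: int, suffix: str = " ...") -> str:
    """Return the query unchanged if it fits, else cut it to `budget` characters.

    The suffix marks that the text was cut; it counts toward the budget.
    (Hypothetical helper, not the project's actual function.)
    """
    if len(query) <= budget:
        return query
    keep = max(budget - len(suffix), 0)
    # The final slice guards the case where the budget is shorter than the suffix.
    return (query[:keep] + suffix)[:budget]


def trim_top_queries(queries: List[str], total_char_limit: int) -> List[str]:
    """Trim each query so the combined length stays within total_char_limit.

    Assumption for this sketch: the cumulative limit is divided evenly,
    so each query gets total_char_limit // len(queries) characters.
    """
    if not queries:
        return []
    per_query_budget = total_char_limit // len(queries)
    return [trim_query(q, per_query_budget) for q in queries]


if __name__ == "__main__":
    top = ["SELECT " + ", ".join(f"col_{i}" for i in range(50)) + " FROM t", "SELECT 1"]
    for q in trim_top_queries(top, total_char_limit=80):
        print(len(q), q)
```

Splitting the budget evenly keeps the sketch simple; a real connector might instead let short queries donate unused budget to longer ones.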