198 Commits

Author SHA1 Message Date
John Joyce
352a0abf8d
Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect (#2983)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-07-30 17:41:03 -07:00
aseembansal-gogo
2712f5587e
docs(ingest): Add instructions to install required dependency (#2995) 2021-07-30 07:21:24 -07:00
Chinmay Bhat
cabcdd0553
docs(ingest): fixed Snowflake recipe to escape dollar-sign (#2994) 2021-07-29 16:36:09 -07:00
Chinmay Bhat
a33770b022
fix(ingest): fix hive ingestion to respect database configuration (#2978) 2021-07-28 20:20:18 -07:00
Kevin Hu
a1d1dd4269
feat(docs): tutorial for writing a custom transformer (#2959) 2021-07-28 14:38:13 -07:00
Harshal Sheth
fa22b8e17b
feat(ingest): add Airflow TaskFlow example (#2958) 2021-07-26 13:09:25 -07:00
Harshal Sheth
4bcfe92df0
fix(ingest): handle quotes in lookml properly (#2940) 2021-07-22 22:19:04 -07:00
Harshal Sheth
b064e51a84
fix(ingestion): make snowflake database names lowercase (#2942) 2021-07-22 21:52:03 -07:00
Harshal Sheth
01982310be
feat(ingest): use urn builders in looker and validate data platforms (#2939) 2021-07-22 21:50:44 -07:00
Harshal Sheth
bc6fdfa2d4
docs(ingest): update looker + docker script docs (#2934) 2021-07-22 13:30:20 -07:00
Harshal Sheth
7535cf2b85
fix(ingest): note that views are not supported for Athena (#2924) 2021-07-21 12:48:40 -07:00
Harshal Sheth
ad30f2b8ec
feat(ingestion): support multiple project IDs in bigquery usage stats (#2920) 2021-07-21 12:42:06 -07:00
aseembansal-gogo
6e1b2cf4f7
feat(ingest): Add option to change name of database for postgres (#2898) 2021-07-20 07:01:42 -07:00
Kevin Hu
44ed2f3684
feat(ingest): extract lineage between SageMaker jobs and models (#2868) 2021-07-15 18:56:13 -07:00
Harshal Sheth
83fd69d46d
docs(ingest): remove hanging sentence from docs (#2853) 2021-07-08 16:37:32 -07:00
Kevin Hu
a2106ca9e8
feat(ingest): SageMaker jobs and models (#2830) 2021-07-08 16:16:16 -07:00
Fredrik Sannholm
c2f2973c1b
fix(ingest): Fix glob pattern and handle possible recursion in lookml (#2851) 2021-07-08 12:26:11 -07:00
Harshal Sheth
2d1dd95a84
docs(ingest): clarify that the Kafka options are pass-through (#2837) 2021-07-06 19:22:35 -07:00
Harshal Sheth
6b59cdeb82
fix(ingest): mask password in info-level logs (#2835) 2021-07-06 16:41:54 -07:00
Harshal Sheth
288d17f07e
docs(ingest): update links to Kafka docs (#2834) 2021-07-06 15:33:52 -07:00
Harshal Sheth
4c39d86f63
docs(ingest): add extra info for Redshift behind a proxy (#2817) 2021-07-02 10:31:14 -07:00
Harshal Sheth
e51f86a9de
feat(ingest): support ingesting from multiple snowflake dbs (#2793) 2021-06-30 15:54:17 -07:00
Kevin Hu
4da76726d3
feat(ingest): SageMaker feature store ingestion (#2758) 2021-06-29 19:43:31 -07:00
Kevin Hu
14294e8f89
fix(docs): links to Feast entities (#2780) 2021-06-29 08:12:11 -07:00
Kevin Hu
09bbcea0a8
feat(ingest): add non-random sampling for mongo (#2778) 2021-06-27 23:40:17 -07:00
Harshal Sheth
c05459b446
docs: upgrade docusaurus, minor ingestion updates (#2774) 2021-06-27 23:38:38 -07:00
Harshal Sheth
424139145b
docs(ingest): move usage stats docs into the "sources" section (#2766) 2021-06-24 23:03:26 -07:00
Harshal Sheth
19b2a42a00
feat: usage stats (part 2) (#2762)
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
2021-06-24 19:44:59 -07:00
Remi
91f5d4f59a
feat(ingest): add option to specify source platform database in lookml ingestion (#2749) 2021-06-23 16:16:20 -07:00
Kevin Hu
22a2ed81e4
feat(ingest): ingest last-modified from dbt sources.json (#2729) 2021-06-23 13:56:20 -07:00
Kevin Hu
a89094da5b
feat(ingest): add support for Glue ETL jobs (#2687) 2021-06-22 11:33:22 -07:00
vijayan-nallasami-curve
26329e0af9
added node_type_pattern in dbt yaml file (#2705) 2021-06-17 10:24:33 -07:00
Kevin Hu
63fe82995b
feat(ingest): Add test case and docs for SQL view ingestion (#2709) 2021-06-16 16:51:57 -07:00
Kevin Hu
1c364b52f6
feat(docs): Docs for S3 ingestion with AWS Glue (#2672) 2021-06-14 17:08:50 -07:00
Vincenzo Lavorini
a7fc76f590
feat(sql_views): added views as datasets for SQLAlchemy DBs (#2663) 2021-06-11 17:30:33 -07:00
Harshal Sheth
1b539220d5
feat(ingest): support Oracle service names (#2676) 2021-06-11 17:27:34 -07:00
Kevin Hu
ebdaa0e359
feat(ingest): Feast ingestion integration (#2605)
* Add feast testing setup

* Init Feast test script

* Add feast to dependencies

* Update feast descriptors

* Sort integrations

* Working feast pytest

* Clean up feast docker-compose file

* Expand Feast tests

* Setup feast classes

* Add continuous and bytes data to feature types

* Update field type mapping

* Add PDLs

* Add MLFeatureSetUrn.java

* Comment out feast setup

* Add snapshot file and update inits

* Init Feast golden files generation

* Clean up Feast ingest

* Feast testing comments

* Yield Feature snapshots

* Fix Feature URN naming

* Update feast MCE

* Update Feature URN prefix

* Add MLEntity

* Update golden files with entities

* Specify feast sources

* Add feast source configs

* Working feast docker ingestion

* List entities and features before adding tables

* Add featureset names

* Remove unused

* Rename feast image

* Update README

* Add env to feast URNs

* Fix URN naming

* Remove redundant URN names

* Fix enum backcompatibility

* Move feast testing to docker

* Move URN generators to mce_builder

* Add source for features

* Switch TypeClass -> enum_type

* Rename source -> sourceDataset

* Add local Feast ingest image builds

* Rename Entity -> MLPrimaryKey

* Restore features and keys for each featureset

* Do not json encode source configs

* Remove old source properties from feature sets

* Regenerate golden file

* Fix race condition with Feast tests

* Exclude unknown source

* Update feature datatype enum

* Update README and fix typos

* Fix Entity typo

* Fix path to local docker image

* Specify feast config and version

* Fix feast env variables

* PR fixes

* Refactor feast ingest constants

* Make feature sources optional for back-compatibility

* Remove unused GCP files

* adding docker publish workflow

* Simplify name+namespace in PrimaryKeys

* adding docker publish workflow

* debug

* final attempt

* final final attempt

* final final final commit

* Switch to published ingestion image

* Update name and namespace in java files

* Rename FeatureSet -> FeatureTable

* Regenerate codegen

* Fix initial generation errors

* Update snapshot jsons

* Regenerated schemas

* Fix URN formats

* Revise builds

* Clean up feast URN builders

* Fix naming typos

* Fix Feature Set -> Feature Table

* Fix comments

* PR fixes

* All you need is Urn

* Regenerate snapshots and update validation

* Add UNKNOWN data type

* URNs for source types

* Add note on docker requirement

* Fix typo

* Reorder aspect unions

* Refactor feast ingest functions

* Update snapshot jsons

* Rebuild

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-06-09 15:07:04 -07:00
Harshal Sheth
31eae24300
fix(ingest): support mssql encryption via ODBC (#2657) 2021-06-04 18:19:11 -07:00
Harshal Sheth
a0ad590b3f
fix(ingest): improve redshift ingestion performance (#2635) 2021-06-03 11:14:34 -07:00
Remi
6aa133f99c
fix(ingest): fix lineage after dbt metadata ingestion when table name and identifier differ (#2596) 2021-05-25 18:59:35 -07:00
Kevin Hu
48d2b94203
fix(ingest): default values for env (#2598) 2021-05-24 14:09:55 -07:00
Harshal Sheth
deca4a5073
docs(ingest): add a guide for writing sources (#2575) 2021-05-24 12:23:03 -07:00
John Bodley
227c52f29b
fix(docs): Fix Superset typo in README (#2584) 2021-05-18 21:22:17 -07:00
taufiqibrahim
db78373427
feat(ingest): kafka connect metadata ingestion (#2516) 2021-05-18 14:45:38 -07:00
Harshal Sheth
1d4bcbe4fb
feat(ingest): add dataset tag transformer (#2580) 2021-05-18 14:43:43 -07:00
Harshal Sheth
7af1a13138
fix(ingest): better active directory LDAP support (#2571) 2021-05-17 14:42:54 -07:00
Gary Lucas
af4f3b9683
fix(dbt): set target platform and load schema (#2483) 2021-05-17 12:22:52 -07:00
Albert Franzi
38e3f6d4d0
feat(ingest): add AWS IAM Roles Support to the Glue Source (#2563) 2021-05-17 12:19:34 -07:00
Harshal Sheth
3dfe3d375b
feat(ingest): add options for Airflow lineage backend (#2557) 2021-05-13 20:02:47 -07:00
Kevin Hu
5ab1cbbbb2
feat(ingest): MongoDB schema inference (#2546) 2021-05-13 19:44:33 -07:00