362 Commits

Author SHA1 Message Date
Kevin Hu
ebdaa0e359
feat(ingest): Feast ingestion integration (#2605)
* Add feast testing setup

* Init Feast test script

* Add feast to dependencies

* Update feast descriptors

* Sort integrations

* Working feast pytest

* Clean up feast docker-compose file

* Expand Feast tests

* Setup feast classes

* Add continuous and bytes data to feature types

* Update field type mapping

* Add PDLs

* Add MLFeatureSetUrn.java

* Comment out feast setup

* Add snapshot file and update inits

* Init Feast golden files generation

* Clean up Feast ingest

* Feast testing comments

* Yield Feature snapshots

* Fix Feature URN naming

* Update feast MCE

* Update Feature URN prefix

* Add MLEntity

* Update golden files with entities

* Specify feast sources

* Add feast source configs

* Working feast docker ingestion

* List entities and features before adding tables

* Add featureset names

* Remove unused

* Rename feast image

* Update README

* Add env to feast URNs

* Fix URN naming

* Remove redundant URN names

* Fix enum back-compatibility

* Move feast testing to docker

* Move URN generators to mce_builder

* Add source for features

* Switch TypeClass -> enum_type

* Rename source -> sourceDataset

* Add local Feast ingest image builds

* Rename Entity -> MLPrimaryKey

* Restore features and keys for each featureset

* Do not json encode source configs

* Remove old source properties from feature sets

* Regenerate golden file

* Fix race condition with Feast tests

* Exclude unknown source

* Update feature datatype enum

* Update README and fix typos

* Fix Entity typo

* Fix path to local docker image

* Specify feast config and version

* Fix feast env variables

* PR fixes

* Refactor feast ingest constants

* Make feature sources optional for back-compatibility

* Remove unused GCP files

* adding docker publish workflow

* Simplify name+namespace in PrimaryKeys

* adding docker publish workflow

* debug

* final attempt

* final final attempt

* final final final commit

* Switch to published ingestion image

* Update name and namespace in java files

* Rename FeatureSet -> FeatureTable

* Regenerate codegen

* Fix initial generation errors

* Update snapshot jsons

* Regenerated schemas

* Fix URN formats

* Revise builds

* Clean up feast URN builders

* Fix naming typos

* Fix Feature Set -> Feature Table

* Fix comments

* PR fixes

* All you need is Urn

* Regenerate snapshots and update validation

* Add UNKNOWN data type

* URNs for source types

* Add note on docker requirement

* Fix typo

* Reorder aspect unions

* Refactor feast ingest functions

* Update snapshot jsons

* Rebuild

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-06-09 15:07:04 -07:00
Kevin Hu
c0ace2ce59
fix(ingest): fix MyPy stubs (#2666) 2021-06-08 16:10:16 -07:00
Harshal Sheth
2123c8b6d7
fix(ingest): exclude mssql-odbc from "all" extra (#2660) 2021-06-07 14:00:35 -07:00
Harshal Sheth
31eae24300
fix(ingest): support mssql encryption via ODBC (#2657) 2021-06-04 18:19:11 -07:00
Harshal Sheth
a0ad590b3f
fix(ingest): improve redshift ingestion performance (#2635) 2021-06-03 11:14:34 -07:00
Harshal Sheth
acef397ece
fix(ingest): fail gracefully when lookml used on old python versions (#2614) 2021-05-26 17:16:17 -07:00
Kevin Hu
48d2b94203
fix(ingest): default values for env (#2598) 2021-05-24 14:09:55 -07:00
taufiqibrahim
db78373427
feat(ingest): kafka connect metadata ingestion (#2516) 2021-05-18 14:45:38 -07:00
Harshal Sheth
ebe7409897
fix(cli): prevent click from suppressing errors (#2560) 2021-05-17 11:50:38 -07:00
Harshal Sheth
3dfe3d375b
feat(ingest): add options for Airflow lineage backend (#2557) 2021-05-13 20:02:47 -07:00
Fredrik Sannholm
133577557c
feat(ingest): Looker view and dashboard ingestion (#2493) 2021-05-13 11:42:53 -07:00
Harshal Sheth
a671001824
refactor(ingest): move Airflow into datahub_provider module (#2521) 2021-05-12 15:01:11 -07:00
Albert Franzi
7fce505ffb
feat(ingest): define Redshift as a Postgres Source (#2540) 2021-05-12 10:00:34 -07:00
Harshal Sheth
cd588baccb
build(ingest): include package data in sdist (#2513) 2021-05-07 15:21:43 -07:00
Harshal Sheth
1facfbd5a3
feat(ingest): capture table properties if available (#2497) 2021-05-05 14:07:08 -07:00
Harshal Sheth
c32bf494d5
fix(ingest): support https connections with cookies in Hive ingestion (#2489)
Tested locally.
2021-05-04 13:10:52 -07:00
Harshal Sheth
6f1f0a4845
feat(ingest): support hive over http (#2486) 2021-05-03 22:11:50 -07:00
Harshal Sheth
d415234a8c
fix(ingest): fields with defaults should be optional (#2461) 2021-04-26 16:45:48 -07:00
Harshal Sheth
2da5e1fd10
feat(ingest): setup scaffolding for tox testing (#2451) 2021-04-26 16:44:36 -07:00
Harshal Sheth
034c33a050
fix(ingest): use entrypoints lib instead of pkg_resources (#2438) 2021-04-22 00:13:47 -07:00
Gabe Lyons
c7b49de67b
feat(ingest): adding superset ingestion source (#2425) 2021-04-22 00:11:54 -07:00
Harshal Sheth
ffe49f061a
fix(ingest): fix chart type enum serialization and add tests for rest emitter (#2429) 2021-04-21 11:34:24 -07:00
Harshal Sheth
79daec29b7
fix(ingest): ensure upstreams in airflow lineage emission are entities (#2427) 2021-04-20 20:44:38 -07:00
Harshal Sheth
9ac17c4ee0
fix(ingest): bump avro-gen3 (#2403)
Closes #2375.
2021-04-16 11:59:05 -07:00
Harshal Sheth
777c05973f
fix(ingest): add sqlalchemy extra (#2409) 2021-04-16 09:41:23 -07:00
Harshal Sheth
ffe03e6758
fix(ingest): streamline codegen init methods (#2400) 2021-04-14 19:25:57 -07:00
Harshal Sheth
fb6f74b1da
feat(ingest): add generic sqlalchemy source (#2389) 2021-04-13 08:01:38 -07:00
Harshal Sheth
41cd52f9e2
feat(ingest): add Airflow lineage backend (#2368) 2021-04-12 17:40:15 -07:00
Harshal Sheth
b0d8f70354
fix(ingest): bump pybigquery version (#2352) 2021-04-06 18:34:06 -07:00
Harshal Sheth
cfc02ee196
feat(ingest): add Oracle db support (#2347) 2021-04-06 15:38:25 -07:00
Harshal Sheth
bd78b84bd3
feat(ingest): start airflow integration + metadata builders (#2331) 2021-04-05 19:11:28 -07:00
Harshal Sheth
c1f3eaed35
fix(ingest): add support for database and table patterns to glue source (#2339) 2021-04-05 17:14:02 -07:00
amy m
759288161c
feat(ingest): adding support for AWS Glue (#2319)
Co-authored-by: Harry Nash <harrywilliamnash@gmail.com>
2021-04-04 11:00:27 -07:00
Harshal Sheth
cb24628886
feat(ingest): verify dynamic registry types at runtime (#2327) 2021-04-01 12:15:05 -07:00
Joe Mirizio
f3304bec7c
feat(ingest): dynamically register plugins (#2316)
Co-authored-by: Joe Mirizio <mirizioj@email.chop.edu>
2021-03-31 20:59:45 -07:00
Harshal Sheth
f57c954fc6
feat(ingest): support environment variables in recipes (#2306) 2021-03-26 21:57:05 -07:00
Harshal Sheth
07f4cb1199
feat: datahub check local-docker (#2295) 2021-03-26 10:03:51 -07:00
Harshal Sheth
cc19465f55
fix(ingest): resolve array serialization bug (#2290) 2021-03-24 10:02:46 -07:00
Harshal Sheth
a921d0deae
feat(ingest): MongoDB ingestion source (#2289) 2021-03-23 20:15:44 -07:00
Harshal Sheth
1ea450e0e4
fix(ingest): use custom pybigquery ref to get descriptions (#2279) 2021-03-22 23:17:54 -07:00
Harshal Sheth
b8462028c3
feat(ingest): various minor fixes (#2246) 2021-03-17 23:05:05 -07:00
Pedro Silva
6a0c402a58
feat(ingest): Add support for druid (#2235) 2021-03-17 20:06:48 -07:00
Harshal Sheth
aa6bc15cd7
fix(ingest): various avro codegen fixes (#2232) 2021-03-15 15:27:30 -07:00
Harshal Sheth
95c124ffc4
fix(ingest): pin version of avro-gen3 (#2230) 2021-03-12 09:39:38 -08:00
Harshal Sheth
6a8fca59f1
feat(ingest): use plugin system based on Python extras (#2224) 2021-03-11 13:41:05 -08:00
Harshal Sheth
dced25fef7
feat(ingest): switch quickstart to Python ingestion (#2158) 2021-03-02 11:48:26 -08:00
Harshal Sheth
347148b79b
Update python workflow 2021-02-15 18:29:27 -08:00
Harshal Sheth
38f75be8ad
gometa -> datahub 2021-02-15 18:29:27 -08:00
Harshal Sheth
0063c04460
gometa-ingest -> datahub ingest 2021-02-15 18:29:27 -08:00
Harshal Sheth
b91d0cf63b
Add bigquery and refactor others 2021-02-15 18:29:27 -08:00