3975 Commits

Author SHA1 Message Date
Harshal Sheth
c7892ada4c Codegen avro + datahub kafka sink (#3)
* Add codegen

* New architecture + setup file -> console pipeline

* Cleanup source loader

* Basic Kafka metadata source

* Kafka source and extractor

* Add kwargs construct interface

* Fix kafka source unit test

* start working on pipeline test

* kafka datahub sink

* Make myself a profile

* Ingest to datahub from kafka

* Update codegen

* Add restli transport

* Fix bug in restli conversion
2021-02-15 18:29:27 -08:00
Shirshanka Das
b59a62fa1c setting modest coverage targets 2021-02-15 18:29:27 -08:00
Shirshanka Das
35e9f28b56 dropping unit tests 2021-02-15 18:29:27 -08:00
Shirshanka Das
60d861b498 dropping hard failures 2021-02-15 18:29:27 -08:00
Shirshanka Das
5e589da514 adding python action 2021-02-15 18:29:27 -08:00
Shirshanka Das
6b5bbbdc5f workaround for docker exec, waiting for 5 more seconds 2021-02-15 18:29:27 -08:00
Harshal Sheth
4fb673925c Start using avro producer 2021-02-15 18:29:27 -08:00
Shirshanka Das
9e61220132 checking in testing fixtures. docker still not working 2021-02-15 18:29:27 -08:00
Shirshanka Das
1ddbdee60c Support for SQL databases (MySQL + MS-SQL) (#2)
* adding sql source + mysql

* adding sql support

* MSSQL support, basic integration test

* file sink and pipeline context
2021-02-15 18:29:27 -08:00
Shirshanka Das
faf472aa64 adding some TODOs 2021-02-15 18:29:27 -08:00
Shirshanka Das
128781942d First drop of ingest (#1) 2021-02-15 18:29:27 -08:00
Shirshanka Das
90b635fb7c Initial commit 2021-02-15 18:29:27 -08:00
Harshal Sheth
082c86463e Move old metadata ingestion scripts out of the way 2021-02-15 18:29:27 -08:00
Harshal Sheth
b491e4ad3c
fix(SQL ingest): Bump confluent-kafka version (#2082)
This should help resolve #2079.
2021-02-03 18:06:35 -08:00
Mars Lan
7a786c185b
Drop obsolete info on mysql-etl (#2072) 2021-01-29 09:03:53 -08:00
Kerem Sahin
4d8320e4a0
feat(dashboard): Dashboards backend implementation (#1884) 2020-11-23 09:25:58 -08:00
Grant Nicholas
fa58c2d161
fix(metadata-ingestion): Fix auditStamp unix timestamp format in sql etl ingestion (#1918)
Datahub was expecting this timestamp to be in milliseconds since epoch, not seconds. This change makes the lastModified timestamp render correctly in the UI when it is converted to a date time string.
2020-10-06 11:13:02 -07:00
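The unit mismatch described in fa58c2d161 can be illustrated with a minimal sketch. The helper name `audit_stamp_millis` and the sample timestamp are hypothetical and are not taken from the actual ETL script; the only fact assumed from the commit message is that DataHub expects milliseconds since epoch.

```python
from datetime import datetime, timezone


def audit_stamp_millis(epoch_seconds: float) -> int:
    """Convert a Unix timestamp in seconds to whole milliseconds since epoch.

    Hypothetical helper for illustration; not the ETL script's own code.
    """
    return int(epoch_seconds * 1000)


def render_as_millis(value: int) -> datetime:
    """Interpret `value` as milliseconds since epoch, as a UI would."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


seconds = 1601979182  # sample value in seconds (an assumption, early October 2020)

# Passing seconds where milliseconds are expected lands in January 1970.
wrong = render_as_millis(seconds)

# Converting to milliseconds first gives the intended date.
right = render_as_millis(audit_stamp_millis(seconds))

print(wrong.year, right.year)
```

A consumer that reads the field as milliseconds renders the unconverted value as a date in 1970, which matches the symptom the commit message reports for the lastModified field.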
John Plaisted
821bce7d69
feat: Port mce-cli to Java. (#1871)
Port mce-cli to Java.

Also moved off the avro format event file to json instead. Much nicer to use :)
2020-09-25 14:05:29 -07:00
Charlie Tran
57fdc5c00c
Adds ability for midtier to serve custom dataset properties from aspect (#1881) 2020-09-20 11:04:51 -07:00
John Plaisted
6ece2d6469
Start adding java ETL examples, starting with kafka etl. (#1805)
Start adding java ETL examples, starting with kafka etl.

We've had a few requests to start providing Java examples rather than Python due to type safety.

I've also started to add these to metadata-ingestion-examples to make it clearer these are *examples*. They can be used directly or as a basis for other things.

As we port to Java we'll move examples to contrib.
2020-09-11 13:04:21 -07:00
John Plaisted
23ad0e9c8b
Small fixes to mce_cli (#1868)
- default argument value should be None not "None"
- Test data should have corpuser, not corpUser (case sensitive)

fixes https://github.com/linkedin/datahub/issues/1867
fixes https://github.com/linkedin/datahub/issues/1865
2020-09-10 19:30:47 -07:00
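The first fix in 23ad0e9c8b (a default of None rather than the string "None") can be sketched as follows. The option name `--schema-file` is a hypothetical stand-in; the real mce_cli arguments are not shown in this log.

```python
import argparse


def build_parser(default):
    """Build a parser with one optional argument using the given default.

    `--schema-file` is a hypothetical flag used only for illustration.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--schema-file", default=default)
    return parser


# With the string "None" as the default, the value is a non-empty string,
# so a check such as `if args.schema_file:` treats it as if it were provided.
buggy = build_parser("None").parse_args([])

# With the None object as the default, an omitted flag is correctly absent.
fixed = build_parser(None).parse_args([])

print(repr(buggy.schema_file), repr(fixed.schema_file))
```

Because `"None"` is a truthy string, code that tests the argument for presence cannot tell an omitted flag from a supplied one until the default is the `None` object.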
fabiofilz
340c54317c
1849 support ssl to mce cli.py (#1857)
* Adding SSL support to mce_cli.py

* Kafka Config option

* Adding space and removing the commented line

Co-authored-by: Fabio de Simoni <fabio.desimoni@kindredgroup.com>
2020-09-04 12:17:27 -07:00
Mars Lan
7d6fde4f37
feat: add MCE ingestion support for CorpGroup (#1837)
* feat: add MCE ingestion support for CorpGroup

Also use consistent camel case for corp user URNs in bootstrap MCE data

Fixes https://github.com/linkedin/datahub/issues/1822
2020-08-31 10:08:58 -07:00
Mars Lan
03e3d49445
feat(ingest): add example crawler for MS SQL (#1803)
Also fix the incorrect assumption on column comments & add sample docker-compose file
2020-08-12 08:51:39 -07:00
Chris Lee
381c3e7fcd
Update README.md 2020-07-31 12:29:39 -07:00
Chris Lee
4143fb901e
<refactor>[ingestions]: align the default kafka topics with PR #1756 (#1758) 2020-07-29 20:26:01 -07:00
cobolbaby
5dc61658f8
fix: correct the way to catch the exception (#1727)
* fix: modify the etl script dependency

* fix: Correct the way to catch the exception

* fix: Compatible with the following kafka cluster when the Kafka Topic message Key cannot be empty

* fix: Adjust the kafka message key; Improve the comment of field

* fix: Avro schema required for key

Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
2020-07-10 07:56:19 -07:00
cobolbaby
ed128080e2
fix: modify the etl script dependency (#1726)
Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
2020-07-08 21:51:42 -07:00
Kerem Sahin
2dc11a51f4
fix(py3): Bump ingestion Docker py dependency to 3.6 (#1716) 2020-06-29 08:22:50 -07:00
Mars Lan
65bf623b8b
feat(ingest): add snowflake ETL script (#1714) 2020-06-25 19:05:38 -07:00
Mars Lan
682bb87a7e
feat(ingest): replace custom hive-etl with sql-based ETL (#1713)
This offloads most of the heavy lifting to SQLAlchemy.
Also add a docker file for testing
2020-06-25 19:04:56 -07:00
Mars Lan
5da55fe8d3
Update README.md 2020-06-25 16:32:22 -07:00
Mars Lan
52a54b9fda
feat(ingest): add PostgreSQL ETL script (#1712)
Also add a simple docker file for testing
2020-06-25 15:28:42 -07:00
Mars Lan
221c9af220
feature(ingest): add bigquery ETL script (#1711)
Also fix minor issues in the common script
2020-06-25 15:28:13 -07:00
Mars Lan
fa9fe5e110
refactor(py3): Refactor all ETL scripts to use Python 3 exclusively (#1710)
* refactor(py3): Refactor all ETL scripts to use Python 3 exclusively

Fix https://github.com/linkedin/datahub/issues/1688

* Update requirements.txt
2020-06-25 15:16:04 -07:00
Mars Lan
8e6665fc94
Update README.md 2020-06-22 21:26:38 -07:00
Mars Lan
4fea6083f8
feature(etl): add SQLAlchemy-based ingestion script (#1708)
This replaces the old incomplete rdbms ETL script.
2020-06-22 21:25:55 -07:00
Kerem Sahin
f79b2c958a fix(ingestion): Fix sample MCE for data process 2020-06-11 01:04:52 -07:00
Liangjun Jiang
92c4a3689e
Data process entity (#1680)
* add job info as aspect of a dataset

* add job urn def., aspect and entity

* job entity with upstream and downstream lineage

* use job urn in upstream & downstream

* add Job entity rest APIs

* rest.li api, impl and factory for job entity

* code cleanup

* use pdl; onboard data process entity

* add es index json

* fix gradlew build ignored tasks

* add a comment about data process info field

* fix style warning issues

* update content based on PR

* checked in generated snapshot json

* updated based on PR feedback

* update data process data format

* updated based on code review feedback

* revert back gms & mce-job docker image

* delete temp files

* update based on PR feedback

* file name and a typo

* format with linkedin style

Co-authored-by: Liangjun <liajiang@expediagroup.com>
2020-06-09 15:42:08 -07:00
Mars Lan
867dbd0d36
fix: use tuple notations for union types 2020-06-03 15:36:07 -07:00
Mars Lan
b6589ab1d1
Update README.md 2020-06-03 13:52:56 -07:00
Chris Lee
2a59070d54
fix(metadata-ingestion): pass schema_record to mce-cli consumer (#1646) 2020-04-24 14:34:16 -07:00
Mars Lan
aa81e774fd
doc: fix example MCEs 2020-04-02 19:39:12 -07:00
Chris Lee
ba33c7a5cd Specify python version in mce-cli requirement.txt 2020-03-27 13:33:22 -07:00
Chris Lee
d1cf62854d
Fix: Docker Quickstart - Sample Data Loading Error
Specify the python version for the required confluent-Kafka library.
2020-03-27 13:14:23 -07:00
Jay Sen
1579a209b3
specify explicit avro lib for compatibility issue (#1605) 2020-03-23 09:50:46 -07:00
Kerem Sahin
a745c4035f Update metadata for bootstrap datasets 2020-02-11 00:23:25 -08:00
Kerem Sahin
8704e3dd62 Update bootstrap data 2020-02-07 18:11:10 -08:00
Kerem Sahin
9b536ecf80 Small doc fix 2020-02-06 18:28:29 -08:00
Kerem Sahin
165d4aef95 Documentation update part-1 2019-12-18 18:57:18 -08:00