Classified the ETL jobs in metadata-ingestion.

Chris Lee 2019-09-16 12:52:37 -07:00
parent 28b876f323
commit 81a38f87f7
4 changed files with 8 additions and 7 deletions


@@ -34,7 +34,7 @@ optional arguments:
 ```
 ## Bootstrapping Data Hub
-If you want to quickly ingest lots of sample data and test Data Hub in action, you can run below command:
+Leverage the mce-cli to quickly ingest lots of sample data and test Data Hub in action by running the command below:
 ```
 ➜  python mce_cli.py produce -d bootstrap_mce.dat
 Producing MetadataChangeEvent records to topic MetadataChangeEvent. ^c to exit.
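Under the hood, `mce_cli.py produce` pushes Avro-encoded records to Kafka. The following is a minimal sketch of that path using confluent-kafka's `AvroProducer`, assuming a local broker and schema registry and assuming each line of `bootstrap_mce.dat` holds one record as a Python literal; it is an illustration, not the script itself.
```python
# Minimal sketch of the produce path (illustrative, not the actual mce_cli.py).
# Broker/registry addresses and the .dat line format are assumptions.
import ast
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.load(
    "../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc")

producer = AvroProducer(
    {"bootstrap.servers": "localhost:9092",            # assumed local broker
     "schema.registry.url": "http://localhost:8081"},  # assumed local registry
    default_value_schema=value_schema)

with open("bootstrap_mce.dat") as f:
    for line in f:
        record = ast.literal_eval(line)  # assumed: one MCE dict per line
        producer.produce(topic="MetadataChangeEvent", value=record)

producer.flush()  # matches the "Flushing records..." output above
```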
@@ -45,7 +45,7 @@ Flushing records...
 This will bootstrap Data Hub with sample datasets and sample users.
 ## Ingest metadata from LDAP server to Data Hub
-The ldap_etl.py provides you ETL channel to communicate with your LDAP server.
+The ldap_etl job provides an ETL channel to communicate with your LDAP server.
 ```
 ➜  Config your LDAP server environment variables in the file:
 LDAPSERVER # Your server host.
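The listing above is truncated after LDAPSERVER. The script in this commit assigns these settings as constants (see the ldap_etl.py diff below), but as a rough sketch they could equally be read from the environment; the variable names and defaults mirror that diff, and the LDAPSERVER default is an assumption.
```python
# Hypothetical configuration shim; ldap_etl.py in this commit hardcodes these
# constants, but they could be read from the environment like so.
import os

LDAPSERVER = os.environ.get("LDAPSERVER", "ldap://localhost:389")  # your server host (default assumed)
KAFKATOPIC = os.environ.get("KAFKATOPIC", "MetadataChangeEvent")
BOOTSTRAP = os.environ.get("BOOTSTRAP", "localhost:9092")
SCHEMAREGISTRY = os.environ.get("SCHEMAREGISTRY", "http://localhost:8081")
```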


@@ -14,10 +14,10 @@ PAGESIZE = PAGESIZE
 ATTRLIST = ['cn', 'title', 'mail', 'sAMAccountName', 'department', 'manager']
 SEARCHFILTER='SEARCHFILTER'
-AVROLOADPATH = 'AVROLOADPATH'
-KAFKATOPIC = 'KAFKATOPIC'
-BOOTSTRAP = 'BOOTSTRAP'
-SCHEMAREGISTRY = 'SCHEMAREGISTRY'
+AVROLOADPATH = '../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc'
+KAFKATOPIC = 'MetadataChangeEvent'
+BOOTSTRAP = 'localhost:9092'
+SCHEMAREGISTRY = 'http://localhost:8081'
 def create_controls(pagesize):
     """


@@ -1,7 +1,8 @@
 #! /usr/bin/python
 import argparse
 from confluent_kafka import avro
-record_schema = avro.load("../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc")
+record_schema = avro.load("../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc")
 topic = "MetadataChangeEvent"
 class MetadataChangeEvent(object):
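The schema path change from `../` to `../../` presumably reflects the script moving one directory deeper as part of this reorganization. For orientation, here is a loose sketch of how the entry point around these module-level definitions may be wired; only `produce`, `-d`, and the schema path appear in this commit, so the rest (flag names, the consume branch) is illustrative.
```python
# Illustrative sketch of mce_cli.py's CLI wiring; not the actual implementation.
import argparse
from confluent_kafka import avro

record_schema = avro.load(
    "../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc")
topic = "MetadataChangeEvent"

def main():
    parser = argparse.ArgumentParser(
        description="Client for producing/consuming MetadataChangeEvent records")
    parser.add_argument("mode", choices=["produce", "consume"],
                        help="whether to produce to or consume from the MCE topic")
    parser.add_argument("-d", "--data_file",
                        help="file with one MCE record per line (produce mode)")
    args = parser.parse_args()

    if args.mode == "produce":
        print("Producing MetadataChangeEvent records to topic %s. ^c to exit." % topic)
        # produce loop would go here (see the AvroProducer sketch earlier)
    else:
        print("Consuming MetadataChangeEvent records from topic %s. ^c to exit." % topic)
        # consume loop would go here

if __name__ == "__main__":
    main()
```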