82 Commits

Author SHA1 Message Date
John Joyce
352a0abf8d
Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect (#2983)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-07-30 17:41:03 -07:00
Wei Hou
bac1ae42fc
refactor(datahub-web): removing frontend Ember app (i.e. datahub-web folder) (#2921) 2021-07-22 15:58:30 -07:00
John Joyce
cc95916201
feat(gms): Merge MAE, MCE consumers into GMS (#2690) 2021-06-15 08:44:15 -07:00
John Joyce
97e9660037
feat: No Code Metadata Modeling (#2629)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-06-03 13:24:33 -07:00
Harshal Sheth
d0ca3191c9
build(ingest): add metadata-ingestion to gradle build (#2510) 2021-05-06 22:10:49 -07:00
John Joyce
16cada2055
fix(Ember App): Allow ember build (disabled by default) (#2348)
Co-authored-by: John Joyce <john@acryl.io>
2021-04-06 16:10:50 -07:00
Harshal Sheth
478d232d2f
build: remove deprecated ember app from build (#2328) 2021-04-01 12:16:47 -07:00
Harshal Sheth
c015cf7ca9
feat(docs): use gradle for building docs (#2239) 2021-03-15 16:13:07 -07:00
Arun Vasudevan
84e952e138
feat (graphql): Datahub GMS Graphql Api Application for Querying Dataset (#2071) 2021-02-01 11:51:15 -08:00
Gabe Lyons
e2e3aca478
fix (react): fixing browse routing (#2069)
* fixing browse routing
* including react app in build
2021-01-27 10:52:46 -08:00
John Joyce
50cec65f57
feat(GQL Queries): Productionalizing GraphQL Part 1: Dataset Query support + adding shared GraphQL module (#2066)
* Productionalizing GraphQL Part 1: Dataset Query support + introducing common datahub-graphql-core module.

Co-authored-by: John Joyce <jjoyce0510@gmail.com>
2021-01-22 15:44:00 -08:00
John Plaisted
25b663cc18
refactor: move code to linkedin/datahub-gma. (#1955)
Move code to linkedin/datahub-gma.

"GMA" (Generalized Metadata Architecture) is the backend of DataHub, and has been moved to its own repository.

This deletes the code that was moved and uses jars that GMA publishes to bintray to load it.

Note that not all of GMA was moved, but most of it. We may still move more things to the other repository in the future.
2020-10-23 15:14:57 -07:00
John Plaisted
821bce7d69
feat: Port mce-cli to Java. (#1871)
Port mce-cli to Java.

Also moved off the avro format event file to json instead. Much nicer to use :)
2020-09-25 14:05:29 -07:00
John Plaisted
5e70f3648c Fix build after sync.
- Add build files for new module.
- Correctly edit TestUtils since it isn't synced.
- Reference new test utils.
- Delete duplicate pdl files.

SYNC=metadata-models_100.0.1
2020-09-24 16:02:12 -07:00
John Plaisted
6ece2d6469
Start adding java ETL examples, starting with kafka etl. (#1805)
Start adding java ETL examples, starting with kafka etl.

We've had a few requests to start providing Java examples rather than Python due to type safety.

I've also started to add these to metadata-ingestion-examples to make it clearer these are *examples*. They can be used directly or as a basis for other things.

As we port to Java we'll move examples to contrib.
2020-09-11 13:04:21 -07:00
Jyoti Wadhwani
ad6f1653e1
metadata-models 62.0.3 -> 72.0.8 (#1693) 2020-06-11 10:21:51 -07:00
Kerem Sahin
c64c5d384d Rename elasticsearch-index-job to mae-consumer-job 2019-11-20 18:19:31 -08:00
Kerem Sahin
5bf797b216 corp-identity-gms 1.0.0 -> 1.0.25:
    1.0.24: Corp user search across teams and skills
    1.0.21: Make /corpGroups /gridUsers /gridGroups extend BaseEntityResource
    1.0.17: Use correct util function to load resource file
    1.0.16: Add ingest, backfill & getSnapshot action methods to all top-level resources in corp-identity-gms
    1.0.13: Onboard search query templates on corp-identity-gms
     1.0.9: Fix batch get and add client for batch get
     1.0.8: Change package name for corpuser search config
     1.0.7: Use search config to get autocomplete field
     1.0.6: Implement searchable client
     1.0.5: Auto-complete backend support
     1.0.3: Add search API

metadata-models 24.0.0 -> 38.1.6:
    38.1.3: Index active status to corp user search index
    38.1.2: Change update response to create response for create API
   38.0.10: Mark BaseAspectResource as deprecated
    38.0.9: Allow TYPEREF items which have primitive types for arrays in models
    38.0.7: Add get-and-set-if-absent function to Local DAO
    38.0.6: Add find entities with one relationship in query dao
    38.0.4: Fix the inconsistent use of constants and urn params in query dao
    38.0.2: Parse source map to obtain the urn
    38.0.1: Search document validator in Index Builder
    38.0.0: Add urns to search result metadata
    37.0.7: Refactor the query dao
    37.0.6: Use test models in neo4j dao
    37.0.5: Drop metadata model structural assumptions made in neo4j DAOs
    37.0.2: Return empty list from getBrowsePaths if browsePaths field doesn't exist
    36.0.3: Drop elasticsearch-dao's metadata-models dependency
   35.0.10: ES Search DAO to handle null values
    35.0.5: Ebean local Dao query string match
    35.0.4: Drop all search & browse configs that have been moved to individual GMS
    35.0.3: Add ReportTo relationship model
    35.0.0: Load resource file properly
    34.0.9: Make RestliAuditor injectable
    34.0.8: Use encoded query in the test resource
    34.0.4: Handle empty aspects param correctly for backfill & getSnapshot actions
    34.0.1: Remove corp user specific files from metadata-models
    34.0.0: Add backfill & getSnapshot actions to BaseEntityResource
   32.0.16: Merge data template classes into metadata-dao's main artifact
   32.0.14: Replace "update" method with "ingest" action in BaseEntityResource
   32.0.13: Make filter & sortCriteria parameters optional as they should have been
   32.0.12: Move AspectVersion & SnapshotKey back to their original namespaces
   32.0.11: Break metadata-dao's dependency on metadata-models
   32.0.10: Move model validators to a separate module
    32.0.9: Extract principal from the request context for user AuditStamp
    32.0.8: Fixing nullability annotations for search/autocomplete/browse resources & daos
    32.0.7: Move DAO-specific models to metadata-dao module
    32.0.4: Fix search finder not returning total search results count
    32.0.3: Implement get_all using search index
    32.0.2: Add missing nullability annotation
    32.0.1: Use more consistent naming for the test models
    31.0.1: Use test-specific metadata models in metadata-dao
    31.0.0: Add sort order to Search Dao
    30.0.2: Rename Aspect test model to AspectUnion to avoid confusion
    30.0.1: Committing migration for metadata-models.
    30.0.0: Add default autocomplete field in search config
   29.0.16: Modify testcase to account for empty filters
   29.0.15: Add searchable interface that clients can use
   29.0.14: Use test-specific metadata models in ebean-dao
   29.0.12: Move TestUtils to metadata-test-models module
   29.0.11: Refactor all tests in metadata-restli to use test models
   29.0.10: Move li-metadata-test-utils, metadata-test-models, metadata-test-utils into a new metadata-testing directory to improve code organization.
            Note that this is a backward compatible change as this doesn't alter the produced artifacts.
    29.0.9: Move test-specific models to a stand-alone module
    29.0.2: Refactor the rest of validators
    28.0.3: Refactor validateSchema for aspect
    28.0.2: Implement searchDao for CorpUserInfo.
   27.0.16: Refactor for ModelValidation tests
   27.0.10: Add new relationship union to model utils
    27.0.9: Add plugin to rest client factory
    27.0.6: Add rest high level factory
    27.0.5: Fix a test bug when reviewing the code
    27.0.4: Add create via lambda API to BaseVersionedAspectResource
    27.0.2: Change return type of search finder to capture search result metadata in BaseSearchableEntityResource
    27.0.1: Drop the unnecessary get method from BaseEntityResource
    27.0.0: Add BaseBrowsableEntityResource
   26.0.15: Add autocomplete action to BaseSearchableEntityResource
   26.0.14: Add BaseSearchableEntityResource
   26.0.13: Add getUrnFromDocument & urnClassForDocument util methods that are needed in future RBs
   26.0.11: Add BaseVersionedAspectResource
    26.0.9: Index signals associated with dataset relevance
    26.0.4: Support namespace for ID generation
    26.0.1: Fix inconsistent instance variable naming in SearchResult
    25.0.6: Add entity-snapshot conversion
    25.0.5: Use test-specific metadata models in metadata-restli
    25.0.3: Add aspect filtering to BaseEntityResource
    24.0.9: Add update method to BaseEntityResource
    24.0.7: Fix for parameter types of getBrowsePaths action method

MP_VERSION=corp-identity-gms:1.0.25
MP_VERSION=metadata-models:38.1.6
MP_VERSION=wherehows-samza:1.0.29
2019-10-02 11:13:44 -07:00
Kerem Sahin
693ee97c6c Fix ingestion script by pointing to correct MCE schema
Refactor for metadata-ingestion module
Adding readme for metadata-ingestion
2019-09-08 05:27:19 -07:00
Kerem Sahin
23339df23a Initial commit for Data Hub 2019-08-31 20:51:14 -07:00
Yi (Alan) Wang
ffa80b8b9f
Move kafka processors to module wherehows-ingestion (#1377) 2018-09-12 12:08:52 -07:00
Mars Lan
4b3f7b8935 Remove OS-specific build logic for wherehows-data-model (#773) 2017-09-29 10:00:14 -07:00
Yi (Alan) Wang
d2a3fe58db Add schema and generated java to data model, refactor Gobblin audit processor (#732) 2017-09-11 15:26:06 -07:00
hzhang2
7c87b89e73 refactor based on comments 2017-08-22 22:35:27 -07:00
hzhang2
2ca851753a remove backend changes for this PR 2017-08-22 22:35:27 -07:00
Mars Lan
120bd7c4bc Rename wherehows-api to wherehows-frontend to be consistent with the actual artifact. (#589) 2017-07-10 13:44:36 -07:00
Mars Lan
c7b6fd1688 Move TMS restli related code out of open source. (#587)
Add skeleton for a generalized DAO framework.
2017-07-10 13:44:35 -07:00
Mars Lan
d75ae54b4b Rename data-model to wherehows-data-model. (#492) 2017-07-10 13:42:57 -07:00
Mars Lan
5f5c0937d1 Rename web, backend-service (#490)
* Rename web to wherehows-api and update README.

* Rename backend-service to wherehows-backend

* Rename metadata-etl to wherehows-etl

* Rename hadoop-dataset-extractor-standalone to wherehows-hadoop
2017-07-10 13:42:56 -07:00
Mars Lan
d1e644a7ec Add gradle support for building and packaging the ember web app (#409)
* Add gradle support for building and packaging the ember web app, alleviating the manual installation of node/npm/bower/ember.

* Remove generated files under web/public

* Update ember.gradle
2017-07-10 09:55:15 -07:00
na zhang
5f6fffde57 Restli Client for populating espresso/oracle datasets and schema metadata (#349)
* add dali view owner etl

* add idpc ui

* add the internal flag to switch linkedin internal features

* add idpc ui

* add the internal flag to switch linkedin internal features

* DSS-3495, implement the UI for IDPC JIRA part

* DSS-4076, update the metric view since data model changed

* DSS-4092, add metric into search and advanced search

* update metric database table name and fix the refId and refIdType issue

* remove duplicated idpc entry and javascript log

* Add fetch_owner hive script

* support Appworx flow and job definition and execution

* implement the Appworx log parser

* bring the script finder back

* update the script finder source table name

* add the flow_path into lineage and extract the script info

* fix the appworx flow job and lineage extract issues

* bring the git location back to lineage script node

* sort the script finder lineage info by type

* bring the script info back for lineage job tab

* fix the master branch merge issue

* fix the oracle unixtime calculating issue

* shorten the flow&job extract interval time to 2 hours instead of 1 day

* shorten the appworx refresh time

* add license header; include RUNNING chains from SO_JOB_QUEUE for Appworx

* implement the list view for metrics

* Modify /dataset POST method to perform INSERT or UPDATE of the DatasetRecord

* apply the list view css change to metric

* upgrade idpc and script finder to ember 2.6.2

* metadata dashboard confidential field data collecting

* implement the confidential fields of metadata dashboard

* metadata dashboard dataset description collecting

* update the final table name

* update the final table name for other load function

* exchange the source target of cfg_object_name_map

* implement the description tab for metadata dashboard

* add the load dataset and field comments function

* implemented the bar and pie chart for description

* implement the ownership section for metadata dashboard

* fix the issue that appworx lineage job running too long

* add the table job attempt source code

* implemented the idpc compliance section

* Security Compliance Tab UI (#246)

* Add back WhereHows internal tracking (#251)

* DSS-5178 DSS-5277: Implements Compliance and Confidential Spec
Adds 'logs/' to ignored files

Updates EmberSelectorComponent to handle a list of string options or list of options with value and label, flags the currently selected option, and bubble change actions with 'selectionDidChange' action

DSS-5178: Removes previous updates to search.js: moving jQuery + DOM heavy imperative implementation to Ember component

DSS-5178: Adds templates and components DropRegion and DraggableItem

DSS-5178: Adds getSecuritySpec action and compliance types to Dataset controller, cleans up Datasets route and removes inline securitySpec fetch from route

DSS-5178: Updates templates for compliance spec

DSS-5178: Adds compliance component and updates template

Adds .DS_Store to gitignore

DSS-5277: Adds dataset-confidential component to DOM, Creates DatasetConfidential component, refactors out data handling from component

DSS-5277: Moves data fetching to Dataset Route model and set model data on controller, Adds template for confidential spec component

DSS-5178: Moves view related complianceTypes to component

DSS-5277 DSS-5178: Adds styling for tab content

* DSS-5277 DSS-5178: Adds support for modifying compliancePurgeEntities that don't currently have identifierFields persisted on the remote, PurgeableEntityFieldIdentifierType enum is sourced in client

* DSS-5178 DSS-5277: Adds dataType field to UI for schema field name search result. Refactors processSchema into parseSchema to get fields and types

* DSS-5277 Fixes bug with missing params property on controller depending on route entry point

* DSS-5543: Fixes rendering of datasets in detailview navigating from sidebar/ treeview (#259)

* DSS-5677: Changes component from block syntax to inline. Add property for creating a new PrivacyCompliancePolicy and SecuritySpecification for datasets without either

* DSS-5677: Adds ability to create a new PrivacyCompliancePolicy and SecuritySpecification from the client UI. Also fixes issue with matching fields and data type properties on schema with inconsistent shapes

* DSS-5677: Add create banner for datasets without Privacy policy or Security specification

* DSS-5677: Updates UI to more closely match spec, changes search input behaviour to filter from search

* ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage

* Update Nuage load process, fix owner subtype and source

* Add VOLDEMORT ETL job to fetch datasets from Nuage

* Add KAFKA ETL job to fetch topics from Nuage

* skip KAFKA topics starting with 'test' when fetching from Nuage

* Merges front-end changes from master -> DSS-5178 DSS-5577 DSS-5677 DSS-5277 DSS-5677

* DSS-5784: Fixes issue with AdvancedSearch and ScriptFinder URL queries not being RFC 3986 compliant

* ScriptFinder Controller add URL decoding for Json fields (#290)

* DSS-5888 Adds configuration support for Piwik environment tracking. Setting the 'tracking.piwik.siteid' to a value will get rendered in the template and consumed by the tracking initializer

* DSS-5888 DSS-5875 Adds tracking for users. Adds client side tracking for keyword and init for Piwik script module

* Fixes mismatch with compliance api property name: privacyCompliancePolicy != privacyCompliance

* DSS-5888 Fixes tracking userId for noscript tag

* DSS-5865 Removes spinner on metadata/dashboard/idpc-compliance fail

* DSS-6177 Removed unused links in Metric Detail page

* Update Appworx Execution and Lineage jobs (#321)

* DSS-6197: Adds default value for classification property on security specification if not defined

* DSS-6198: Fixes issue with nested fields not getting rendered in the schema for compliance and confidential tabs

* DSS-6018 Adds ui feature to track feedback on user search results relevance using an up/down voting mechanism

* Make unit tests buildable again for backend and web (#325)

* Make unit tests buildable again for backend and web.

* Add back fest dependency so the tests can stay more or less the same as before.

* Generate code coverage reports (#334)

* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for testNG-based tests in wherehows-common & metadata-etl.

* Add data platform filter for dashboard APIs (#322)

* Add data platform filter for dashboard APIs

* Add exception handling for Espresso and Kafka ETL job

* restli client to populate espresso and oracle metadata
2017-07-10 09:54:08 -07:00
SunZhaonan
d5c3d87d00 Initial commit 2015-11-19 14:39:21 -08:00