41 Commits

Author SHA1 Message Date
Dexter Lee
cda1ce4589
feat(dashboards): Add browse end point for charts and dashboards (#2143)
Co-authored-by: Dexter Lee <dexter@acryl.io>
2021-02-28 10:53:02 -08:00
RyanHolstien
ea86ade29b
feat: ML Model Backend Implementation (#1896)
Co-authored-by: RyanHolstien <rholstien@expediagroup.com>
2021-02-17 13:28:13 -08:00
Nagarjuna Kanamarlapudi
f103998dcb
feat(Search): Inject restli client into index builders (#2024)
2020-12-03 11:43:48 -08:00
Kerem Sahin
4d8320e4a0
feat(dashboard): Dashboards backend implementation (#1884) 2020-11-23 09:25:58 -08:00
Nagarjuna Kanamarlapudi
5d083143db
feat(dataset): Enable search of datasets by field names (#2001)
2020-11-20 12:01:07 -08:00
John Plaisted
60e43061d8
[Breaking] Update to GMA 0.2.0 and fix Urn definitions. (#1977)
Urn definitions needed to be updated since 0.2.0 changed the base Urn class. 

I also added some more urn coercers that were missing.
2020-11-11 16:06:29 -08:00
John Plaisted
b4d22fc463
refactor: use better assertj assertions. (#1975) 2020-10-29 11:46:25 -07:00
Jyoti Wadhwani
0c92a8e887
refactor search index builder to store urn parts efficiently (#1937) (#1972)
* refactor search index builder to store urn parts efficiently (#1937)

Co-authored-by: Jyoti Wadhwani <jywadhwa@linkedin.com>

* set urn for all documents

* rebase, fix merge conflicts and modify tests

Co-authored-by: Jyoti Wadhwani <jywadhwa@linkedin.com>
2020-10-29 09:52:41 -07:00
Jyoti Wadhwani
f455f51756
get rid of search mock utils (#1973) 2020-10-28 14:13:46 -07:00
John Plaisted
b2e73fa003
test: improve test coverage for DatasetIndexBuilder. (#1971)
The other index builders need similar improvements, but after #1937 some coverage is better than no coverage.
2020-10-28 13:32:04 -07:00
John Plaisted
5fd2110b7f
Revert "refactor search index builder to store urn parts efficiently (#1937)" (#1970)
This broke MAE processor because not all documents have urns now.

This reverts commit 5fca512a07cea75480633188e634ba07f49cdb28.
2020-10-28 11:32:18 -07:00
John Plaisted
25b663cc18
refactor: move code to linkedin/datahub-gma. (#1955)
"GMA" (Generalized Metadata Architecture) is the backend of DataHub, and has been moved to its own repository.

This deletes the code that was moved and uses jars that GMA publishes to bintray to load it.

Note that not all of GMA was moved, but most of it. We may still move more things to the other repository in the future.
2020-10-23 15:14:57 -07:00
Jyoti Wadhwani
5fca512a07
refactor search index builder to store urn parts efficiently (#1937)
Co-authored-by: Jyoti Wadhwani <jywadhwa@linkedin.com>
2020-10-14 13:47:12 -07:00
John Plaisted
5e70f3648c Fix build after sync.
- Add build files for new module.
- Correctly edit TestUtils since it isn't synced.
- Reference new test utils.
- Delete duplicate pdl files.

SYNC=metadata-models_100.0.1
2020-09-24 16:02:12 -07:00
John Plaisted
96da83033c Break dependency of metadata-test-utils on metadata-models. 2020-09-24 16:02:12 -07:00
John Plaisted
6b9a053f6e ROLL FORWARD: Add new style checks and fix issues.
- Upgrade to checkstyle 8
- Copy javadoc checks from Google
- Disable missing class and method checks for now, too many warnings. I'll have to figure out how to suppress them instead.
- Fix other issues, which are mostly missing periods at the end of sentences and lack of paragraph tags.

Revert "Reverting the commit range: 8dfdb73ac6c73581ef56c0d81c21a2a92e8a1a02..194bd6f57f4a4d075d2ea1f442397d1139080f7a."

This reverts commit ab178ec1469fa72c0c339f0b842e7ff0850e7c74.
2020-09-11 09:15:56 -07:00
John Plaisted
6ac7622af6 Reverting the commit range: 8dfdb73ac6c73581ef56c0d81c21a2a92e8a1a02..194bd6f57f4a4d075d2ea1f442397d1139080f7a.
REVERTED RB=99999 PCVALIDATIONOVERRIDE I18NOVERRIDE CIOVERRIDE TRUNKBLOCKERFIX

See https://crt.prod.linkedin.com/#/testing/executions/77e10182-d60f-4c8d-9e55-599bdc4384e0/execution for more details.
2020-09-11 09:15:56 -07:00
John Plaisted
b9f11ae21b Add new style checks and fix issues.
- Upgrade to checkstyle 8
- Copy javadoc checks from Google
- Disable missing class and method checks for now, too many warnings. I'll have to figure out how to suppress them instead.
- Fix other issues, which are mostly missing periods at the end of sentences and lack of paragraph tags.
2020-09-11 09:15:56 -07:00
John Plaisted
bc7a29802d Add user email to the search index.
ldap and email are technically different in a few ways. Email not only includes the domain (@linkedin.com), but the user part of it may be different than ldap. Generally emails are username@domain; at LinkedIn ldaps are generally usernames truncated to 8 characters.

For the sake of being technically correct, also index emails so that clients who want to search by email can do so rather than searching by ldap.
2020-09-11 09:15:56 -07:00
John Plaisted
d9b86d1f05
Update metadata-models to head! (#1811)
metadata-models 80.0.0 -> 90.0.13:

   90.0.13: Roll forward: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    90.0.2: Refactor listUrnsFromIndex method
    90.0.0: Start distinguishing between [] aspects vs null aspects input param
    89.0.4: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    89.0.2: fix some test case names
    89.0.0: META-12686: Made the MXE_v5 topics become strictly ACL'ed to avoid the wildcard write ACL as "MetadataXEvent.+"
    88.0.6: change DAO to take Storage Config as input
    88.0.3: Add a comment on lack of avro generation for MXEv5 + add MXEv5 to the pegasus validation task.
   87.0.15: META-12651: Integrate the metadata-models-ext with metadata-models
   87.0.13: add StorageConfig to Local DAO
    87.0.3: Treat empty aspect vs optional aspect same until all clients are migrated
    87.0.2: Treat empty aspect vs optional aspect differently
    87.0.1: META-12533: Skip processing unregistered aspect specific MAE.
    83.0.6: action method to return list of urns from strong consistent index
    83.0.4: Change input param type for batch backfill
    83.0.3: Implement batch backfill
    83.0.1: Implement support for OR filter in browse query
   82.0.10: Throw UnsupportedOperationException for unsupported condition types in search filter
    82.0.6: Implement local secondary backfilling index as part of backfill method
    82.0.5: [strongly consistent index] implement getUrns method
    82.0.4: Add indexing urn fields to the local secondary index
    82.0.0: Render Delta fields in the MCE_v5.
    81.0.1: Add pegasus to avro conversion for FMCE
    80.0.4: add get all support for BaseSingleAspectEntitySimpleKeyResource
    80.0.2: Add a BaseSearchWriterDAO with an ESBulkWriterDAO implementation.
    80.0.1: META-12254: Produce aspect specific MAE with always emit option
    80.0.0: Convert getNodesInTraversedPath to getSubgraph to return complete view of the subgraph (nodes+edges)
2020-08-19 16:06:29 -07:00
Mars Lan
872ca3598a
fix(search): clear description from dataset index when it's cleared (#1808)
Fixes https://github.com/linkedin/datahub/issues/1798
2020-08-14 07:09:27 -07:00
Mars Lan
1efe249e3b
refactor: remove unused internal models (#1789) 2020-08-08 13:46:24 -07:00
Liangjun Jiang
5d078aa617
Implemented data process search feature (#1706)
* implement search feature

* add test for dataprocessIndexBuilder; refactor code based on feedback

* update based on PR feedback

* Update DataProcessDocument.pdl

fixed typo wording.

* add not null check for data process info
2020-06-29 10:20:22 -07:00
Liangjun Jiang
40f08ecaf1
Implement data process graph feature (#1695)
* implement data process graph feature; update the document

* add unit test for data process graph

* removed auto generated avro files

* update image
2020-06-17 11:58:42 -07:00
Kerem Sahin
2e2fb2b810
Add missing updates from recent internal push (#1700) 2020-06-12 12:55:50 -07:00
Jyoti Wadhwani
ad6f1653e1
metadata-models 62.0.3 -> 72.0.8 (#1693) 2020-06-11 10:21:51 -07:00
Kerem Sahin
f79b2c958a fix(ingestion): Fix sample MCE for data process 2020-06-11 01:04:52 -07:00
Liangjun Jiang
92c4a3689e
Data process entity (#1680)
* add job info as aspect of a dataset

* add job urn def., aspect and entity

* job entity with upstream and downstream lineage

* use job urn in upstream & downstream

* add Job entity rest APIs

* rest.li api, impl and factory for job entity

* code cleanup

* use pdl; onboard data process entity

* add es index json

* fix gradlew build ignored tasks

* add a comment about data process info field

* fix style warning issues

* update content based on PR

* checked in generated snapshot json

* updated based on PR feedback

* update data process data format

* updated based on code review feedback

* revert back gms & mce-job docker image

* delete temp files

* update based on PR feedback

* file name and a typo

* format with linkedin style

Co-authored-by: Liangjun <liajiang@expediagroup.com>
2020-06-09 15:42:08 -07:00
Mars Lan
38fc7249d2 Revert "metadata-models 54.0.1 -> 58.0.1:"
This reverts commit bab5daa56d77f067de949f6f0eb5bc7c537641f7.
2020-03-25 21:43:28 -07:00
Jyoti Wadhwani
bab5daa56d metadata-models 54.0.1 -> 58.0.1:
58.0.1: Remove all keys that can be moved back to respective GMS
    58.0.0: Revert "Reverting the commit range: f0c894b490d3df047837cf2fb7b9911c86188cae..4b5f31ed8844f818d7db0880d30c8dc8c7ac0087."
   57.0.16: Reverting the commit range: f0c894b490d3df047837cf2fb7b9911c86188cae..4b5f31ed8844f818d7db0880d30c8dc8c7ac0087.
   57.0.15: Disable filtering removed entities in browse until META-10900 is solved
   57.0.14: (resubmit) add graph index builder for ai-metadata entities and relationships
   57.0.13: Reverting the commit range: 830e63b4b40cf701db216952c34d731a7a82ea1d..4255871452062c2fd14651cb4fffb7d337bad300.
   57.0.12: add graph index builder for ai-metadata entities and relationships
   57.0.11: Fix bug which sets removed field to always true while building DatasetDocument
   57.0.10: Change p12 file name to new ina group name
    57.0.9: Add removal field in field compliance to flag the proposal as removal or not.
    57.0.8: Adding action Builder for DatasetInstance entity
    57.0.7: Adding GMA entities and relations for GridWorkflow and GridWorkflowExecution
    57.0.6: Adding dataType and dataClassification to the search document
    57.0.5: Rename graph entity MlTrainedModel to MlTrainedModelEntity
    57.0.4: Code to form the FollowedBy Graph based on the Follow Aspect
    57.0.3: add graph entity and relationship models for ai-metadata
    57.0.2: Refactor incorrect use of mock in variable names
    57.0.1: Add support for <, <=, >, >= conditions for the filter API
    57.0.0: Update Conditions model for <, <=, >, >= conditions
    56.0.5: update version of pegasus metadata plugin
    56.0.4: update container dependency
    56.0.3: Move mlFeatures from SnapshotRequestBuilders to ActionRequestbuilder
    56.0.2: Adding reserved versions aspect
    56.0.1: Create search filter for compliance pending review proposal.
    56.0.0: Add Likes aspect resource in metadata restli utils
    55.0.6: Fix a bug with getAll API
    55.0.5: Move applicable metadata-store SnapshotRequestBuilders to ActionRequestbuilder
    55.0.4: EspressoDAO: Updated to expect a separator between entityType and aspectName for config mapping keys
    55.0.3: Added EspressoRecordSerializer and EspressoDAOUtils
    55.0.2: Rewrote EspressoLocalDAOTest with a mocked EspressoAccessor
    55.0.1: Migrate metric-gms SnapshotRequestBuilders to ActionRequestBuilder
    55.0.0: [Wormhole] Deprecate Holdem-centric locations in favor of the more general CORP locations, which contain Holdem.
    54.0.1: Migrate job-gms SnapshotRequestBuilders to ActionRequestBuilder
wherehows-samza 1.0.56 -> 1.0.56:

    1.0.56: Gradle5 migration
MP_VERSION=metadata-models:58.0.1
MP_VERSION=wherehows-samza:1.0.56

This commit is automatically generated by li-opensource tool.
2020-03-25 21:13:14 -07:00
Kerem Sahin
1168501083 Enable tests for all modules by using global gradle config 2020-02-21 11:53:45 -08:00
Ben Haley
d09cedca28
Allow dashes in user urn (#1564)
* Fix: allow user/group urns to contain dashes

CorpUser urns containing dashes are valid entries. When adding that user as an
owner, the MAE job validates the owner's urn using a regex filter that only accepts
alphanumeric characters and underscore (\w). That means any ownership changes where
the user urn contains a dash are rejected.

This change extends the regex filter to allow dashes in the name. It includes unit
tests that verify the change works for multiple dashes and underscores.

There are other cases to consider:

1. Should any other characters be allowed?
2. Should the filter check the urn starts and ends with alphanumeric characters?

CLOSES: User urn does not handle dashes consistently #1554

BREAKING CHANGE: None. This change relaxes a restriction so existing code is ok.

* Added tests for group members and fixed assertion
2020-02-21 10:58:53 -08:00
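The regex change described in #1564 above can be sketched as follows. This is a minimal illustration in Python, not the project's code: the pattern names and the `is_valid_name` helper are hypothetical, and the actual expression used by the MAE job does not appear in this log. It only shows why a `\w`-only filter rejects dashes and how adding `-` to the character class relaxes it.

```python
import re

# Hypothetical patterns for illustration only; the project's real regex is not shown here.
OLD_NAME = re.compile(r"\w+")     # letters, digits, and underscore only
NEW_NAME = re.compile(r"[\w-]+")  # same set, plus the dash


def is_valid_name(name: str, pattern: re.Pattern) -> bool:
    """Return True when the whole name consists of characters the pattern allows."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("john_doe", "john-doe", "a-b_c-d"):
        print(candidate, is_valid_name(candidate, OLD_NAME), is_valid_name(candidate, NEW_NAME))
```

The two open questions listed in the commit message (other characters, and requiring alphanumeric start and end) would each need a different pattern; this sketch does not address them.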
Kerem Sahin
b17b91f24a Bump gradle to 5.6.4 and pegasus to 27.7.18 2020-02-12 17:10:49 -08:00
Kerem Sahin
07a6e8b085 Remove dataset groups entity 2019-12-13 15:12:50 -08:00
Kerem Sahin
f929190e6a metadata-models 50.0.6 -> 54.0.1:
54.0.0: Filter removed documents during browse
   53.0.15: Throwing 404, when no aspects found in DB for a given entity
   53.0.14: add node label when updating relationship
    53.0.8: Handle * character in the directory path for browse
    53.0.4: apply label for add/update graph node&edge
    53.0.0: META-10395: Don't package KafkaAuditHeader and UUID classes in mxe-avro
    52.0.7: Add API in search DAO to support query filters
    52.0.5: META-10073: Refactor remote DAO to use the new Ingest action method
   51.0.16: allow query dao use default order by from neo4j
   51.0.15: enable dataset indexing in graph
   51.0.12: Move EMPTY_FILTER to RestliConstants
   51.0.11: Add KafkaEventProducerFactory to utils
   51.0.10: Create in-memory Neo4j in Neo4jDriverFactory if integration config is set
    51.0.9: Generalized add() in BaseLocalDAO and EbeanLocalDAO
    51.0.4: Move Neo4jTestServerBuilder to metadata test utils
    51.0.3: Move makeRelationshipFilter to neo4j utils
    50.0.7: Implement Neo4jDriverFactory

MP_VERSION=metadata-models:54.0.1
MP_VERSION=wherehows-samza:1.0.56
2019-12-13 11:46:49 -08:00
Kerem Sahin
e2ad0f2adf corp-identity-gms 1.0.26 -> 1.0.40:
1.0.34: Downrank inactive users in user search query
    1.0.33: Refactor clients to remove snapshot builder
    1.0.32: Adding client & integration test for get_all
    1.0.30: Implement other clients for corp groups
    1.0.28: Add resources for search and autocomplete for corp groups
    1.0.27: Start using BaseClient from metadata-models
    1.0.26: Add get_all resource for CorpUsers

metadata-models 38.1.12 -> 50.0.6:
    50.0.2: Fix removed field update logic for all entities
    49.0.1: Add dataset graph builder with DownstreamOf relationship
    48.0.3: support query dao with traverse paths
    47.0.2: refactor the query dao with relationship filter model
    47.0.1: Fix for creating duplicate nodes when label for the node is missing
   46.0.21: extend filter model with relationship direction
   46.0.19: add unit test for entities partial update
   46.0.16: Allow relationship filter in the model and query dao
   46.0.15: support relationship directions for multi hop query
   46.0.14: Implementing reportsto relationship builder and corpuser graph builder
   46.0.10: refactor query dao interface using nullable to replace optional
    46.0.9: Rename Mock Utils to Test Utils in Metadata-models mp
    46.0.6: Remove search index config from metadata models
    46.0.2: neo4j query DAO with relationships directions support
    45.1.7: refactoring the graph relationship builders
    45.1.5: Use correct total count in search response
    45.1.3: Fix issue with empty search query filter
    45.1.2: Fix a bug with autocomplete limit param
    45.0.3: Change platform field type in the dataset search document
    45.0.2: implement multi hops query DAO with interface 5
    45.0.1: Moving dataset browsePaths build logic from wherehows-samza
    44.0.2: implement interface 2 in query DAO
    40.0.2: Only return records which exist in the DB after getting search hits
    39.0.0: Add a getAuditor method to BaseSnapshotResource rather than taking it in as a constructor argument
   38.1.13: Move BaseClient to metadata-models out of GMS template
   38.1.12: Remove default filtering on removed field for get_all

MP_VERSION=corp-identity-gms:1.0.40
MP_VERSION=metadata-models:50.0.6
MP_VERSION=wherehows-samza:1.0.56

This commit is automatically generated by li-opensource tool.
2019-11-19 02:27:28 -08:00
Kerem Sahin
f29a88c365 metadata-models 38.1.6 -> 38.1.8:
38.1.8: Add getFilter method as a search util
    38.1.7: Add index builder for corp groups

MP_VERSION=corp-identity-gms:1.0.25
MP_VERSION=metadata-models:38.1.8
MP_VERSION=wherehows-samza:1.0.30

This commit is automatically generated by li-opensource tool.
2019-10-02 18:04:53 -07:00
Kerem Sahin
5bf797b216 corp-identity-gms 1.0.0 -> 1.0.25:
1.0.24: Corp user search across teams and skills
    1.0.21: Make /corpGroups /gridUsers /gridGroups extend BaseEntityResource
    1.0.17: Use correct util function to load resource file
    1.0.16: Add ingest, backfill & getSnapshot action methods to all top-level resources in corp-identity-gms
    1.0.13: Onboard search query templates on corp-identity-gms
     1.0.9: Fix batch get and add client for batch get
     1.0.8: Change package name for corpuser search config
     1.0.7: Use search config to get autocomplete field
     1.0.6: Implement searchable client
     1.0.5: Auto-complete backend support
     1.0.3: Add search API

metadata-models 24.0.0 -> 38.1.6:
    38.1.3: Index active status to corp user search index
    38.1.2: Change update response to create response for create API
   38.0.10: Mark BaseAspectResource as deprecated
    38.0.9: Allow TYPEREF items which have primitive types for arrays in models
    38.0.7: Add get-and-set-if-absent function to Local DAO
    38.0.6: Add find entities with one relationship in query dao
    38.0.4: Fix the inconsistent use of constants and urn params in query dao
    38.0.2: Parse source map to obtain the urn
    38.0.1: Search document validator in Index Builder
    38.0.0: Add urns to search result metadata
    37.0.7: Refactor the query dao
    37.0.6: Use test models in neo4j dao
    37.0.5: Drop metadata model structural assumptions made in neo4j DAOs
    37.0.2: Return empty list from getBrowsePaths if browsePaths field doesn't exist
    36.0.3: Drop elasticsearch-dao's metadata-models dependency
   35.0.10: ES Search DAO to handle null values
    35.0.5: Ebean local Dao query string match
    35.0.4: Drop all search & browse configs that have been moved to individual GMS
    35.0.3: Add ReportTo relationship model
    35.0.0: Load resource file properly
    34.0.9: Make RestliAuditor injectable
    34.0.8: Use encoded query in the test resource
    34.0.4: Handle empty aspects param correctly for backfill & getSnapshot actions
    34.0.1: Remove corp user specific files from metadata-models
    34.0.0: Add backfill & getSnapshot actions to BaseEntityResource
   32.0.16: Merge data template classes into metadata-dao's main artifact
   32.0.14: Replace "update" method with "ingest" action in BaseEntityResource
   32.0.13: Make filter & sortCriteria parameters optional as they should have been
   32.0.12: Move AspectVersion & SnapshotKey back to their original namespaces
   32.0.11: Break metadata-dao's dependency on metadata-models
   32.0.10: Move model validators to a separate module
    32.0.9: Extract principal from the request context for user AuditStamp
    32.0.8: Fixing nullability annotations for search/autocomplete/browse resources & daos
    32.0.7: Move DAO-specific models to metadata-dao module
    32.0.4: Fix search finder not returning total search results count
    32.0.3: Implement get_all using search index
    32.0.2: Add missing nullability annotation
    32.0.1: Use more consistent naming for the test models
    31.0.1: Use test-specific metadata models in metadata-dao
    31.0.0: Add sort order to Search Dao
    30.0.2: Rename Aspect test model to AspectUnion to avoid confusion
    30.0.1: Committing migration for metadata-models.
    30.0.0: Add default autocomplete field in search config
   29.0.16: Modify testcase to account for empty filters
   29.0.15: Add searchable interface that clients can use
   29.0.14: Use test-specific metadata models in ebean-dao
   29.0.12: Move TestUtils to metadata-test-models module
   29.0.11: Refactor all tests in metadata-restli to use test models
   29.0.10: Move li-metadata-test-utils, metadata-test-models, metadata-test-utils into a new metadata-testing directory to improve code organization.
            Note that this is a backward compatible change as this doesn't alter the produced artifacts.
    29.0.9: Move test-specific models to a stand-alone module
    29.0.2: Refactor the rest of validators
    28.0.3: Refactor validateSchema for aspect
    28.0.2: Implement searchDao for CorpUserInfo.
   27.0.16: Refactor for ModelValidation tests
   27.0.10: Add new relationship union to model utils
    27.0.9: Add plugin to rest client factory
    27.0.6: Add rest high level factory
    27.0.5: Fix a test bug when reviewing the code
    27.0.4: Add create via lambda API to BaseVersionedAspectResource
    27.0.2: Change return type of search finder to capture search result metadata in BaseSearchableEntityResource
    27.0.1: Drop the unnecessary get method from BaseEntityResource
    27.0.0: Add BaseBrowsableEntityResource
   26.0.15: Add autocomplete action to BaseSearchableEntityResource
   26.0.14: Add BaseSearchableEntityResource
   26.0.13: Add getUrnFromDocument & urnClassForDocument util methods that are needed in future RBs
   26.0.11: Add BaseVersionedAspectResource
    26.0.9: Index signals associated with dataset relevance
    26.0.4: Support namespace for ID generation
    26.0.1: Fix inconsistent instance variable naming in SearchResult
    25.0.6: Add entity-snapshot conversion
    25.0.5: Use test-specific metadata models in metadata-restli
    25.0.3: Add aspect filtering to BaseEntityResource
    24.0.9: Add update method to BaseEntityResource
    24.0.7: Fix for parameter types of getBrowsePaths action method

MP_VERSION=corp-identity-gms:1.0.25
MP_VERSION=metadata-models:38.1.6
MP_VERSION=wherehows-samza:1.0.29
2019-10-02 11:13:44 -07:00
Kerem Sahin
1ec31bbfb7 Start indexing upstream datasets to search index to support downstream dataset relationships 2019-09-26 11:57:58 -07:00
Kerem Sahin
c365c406e1 Normalize browse paths for datasets 2019-09-10 15:14:40 -07:00
Kerem Sahin
23339df23a Initial commit for Data Hub 2019-08-31 20:51:14 -07:00