Add DataFlow and DataJob urn and aspect models.
The aspects include metadata associated with data processing flow and job entities,
and the associated inputs and outputs of a job.
* feat(dashboard): Dashboard models update
* Keep chartId/dashboardId fields in the URN definitions and add fields for chartURL/dashboardURL into info aspects
* Rebase and address some comments
* feat(field-level-lineage): adding models for field level lineage
Add models for field-level lineage. Introduce DatasetFieldUrn as a unique identifier for a dataset field.
Move code to linkedin/datahub-gma.
"GMA" (Generalized Metadata Architecture) is the backend of DataHub, and has been moved to its own repository.
This deletes the code that was moved and uses jars that GMA publishes to bintray to load it.
Note that not all of GMA was moved, but most of it. We may still move more things to the other repository in the future.
* ML Model Schema Initial Version for feedback
* Added Deprecation Model
* Remove lock files
* Committing yarn lock file
* Fix Review Comments
* Using Common VersionTag Entity
* PR Review Comments Round-2
* Updated all model and feature references to MLModel and MLFeature
* Addressing PR Comments (Round 3)
* Updating Hyperparameter to a Map type
* Update to Dataset
* Review comments based on RFC
* fix: modify the etl script dependency (#1726)
Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
* fix: correct the way to catch the exception (#1727)
* fix: modify the etl script dependency
* fix: Correct the way to catch the exception
* fix: Compatible with the following kafka cluster when the Kafka Topic message Key cannot be empty
* fix: Adjust the kafka message key; Improve the comment of field
* fix: Avro schema required for key
Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
* refactor(models): remove internal cluster model (#1733)
* refactor(models): remove internal cluster model
Remove internal model which is not used in open source
* build(deps): bump lodash from 4.17.15 to 4.17.19 in /datahub-web (#1738)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.19.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.19)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update README.md
* Update README.md
* Update README.md
* Update the roadmap (#1740)
* Update the roadmap
- Make short term more like what we're doing this quarter
- Medium term is next quarter
- Long term is 2 or 3 quarters from now
- Visionary is even beyond that
Making this PR mostly to discuss the roadmap. I've moved a few items down to "unprioritized"; before merging this we should put these in a category. Mostly saving the state of what I've done so far.
* Update roadmap.md
Co-authored-by: Mars Lan <mars.th.lan@gmail.com>
* Update roadmap.md
* Update README.md
* doc: add a separate doc to keep track of the full list or links (#1744)
* Update README.md
* Create links.md
* Update README.md
* Update links.md
* Update README.md
* Update README.md
* Update features.md
* Update faq.md
* Update README.md
* Update README.md
* feat(gms): add postgres & mariadb supports to GMS (#1742)
* feat(gms): add postgres & mariadb supports to GMS
Also add corresponding docker-compose files
* Update README.md
* build(frontend): Drop unnecessary DB-related dependencies (#1741)
* refactor(frontend): Drop unnecessary DB-related dependencies
* Drop unused dependencies from top-level build script
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update links.md
* Update README.md
* Doc fixes
* Update roadmap.md
* Update faq.md
* Set theme jekyll-theme-cayman
* Create _config.yml
* Delete _config.yml
* Set theme jekyll-theme-cayman
* Update _config.yml
* Update _config.yml
* build: build GitHub Page from /docs directory (#1750)
- Move top-level MD files to /docs and symlink them back
- Update all absolute links to files in /docs to relative links
* Revert "build: build GitHub Page from /docs directory (#1750)" (#1751)
This reverts commit b0f56de7a81b8bf921ff37cb81024692d1b9a8ce.
* build: build GitHub Pages from /docs directory (#1752)
- Move non-README top-level MD files to /docs
- Update all absolute links to files in /docs to relative links
- Add a placeholder front page for GitHub Pages
* Update README.md
* Update README.md
* Update README.md
* feat(kafka-config): Add ability to configure other Kafka props (#1745)
* Integrate spring-kafka & spring-boot for security props
- Upgrade spring-kafka to 2.1.14
- Use KafkaListener and KafkaTemplates to enable KafkaAutoConfiguration
- Integrates spring-boot's KafkaProperties into spring-kafka's config
* Cleanup imports
* Add DataHub kafka env vars
* Remove kafka-streams dependency
* Add KafkaProperties to gms; Add docs
* Add to Adoption
* Remove KAFKA_BOOTSTRAP_SERVER default
Co-authored-by: jsotelo <javier.sotelo@viasat.com>
Co-authored-by: Kerem Sahin <ksahin@linkedin.com>
* Agenda for next town hall
* Update townhalls.md
* Update README.md
* Update README.md
* Add documentation around the DataHub RFC process. (#1754)
Other repos have similar RFC processes (though they seem to have a separate repo for their RFC docs).
This provides a more structured way for contributors to make significant design contributions.
https://github.com/linkedin/datahub/issues/1692
* metadata-models 72.0.8 -> 80.0.0 (#1756)
* <refactor>[ingestions]: align the default kafka topics with PR #1756 (#1758)
* docs: add a sequence diagram and a description (#1757)
* add a sequence diagram and a description
* update description based on feedback
* Update README.md
* Update README.md
Co-authored-by: Mars Lan <mars.th.lan@gmail.com>
* Update README.md
* Fix reflinks in PR template (#1764)
* Update kafka-config.md (#1763)
Fix name of spring-kafka property to pass SASL_JAAS config
* Update entity.md
* Update README.md
* Update faq.md
* Update townhalls.md
* Update README.md
* Update townhalls.md
* Update townhalls.md
* docs: move quickstart guide to a separate file under docs (#1765)
docs: move quickstart guide to a separate doc under docs directory
* Update slack.md
* Update README.md
* Update slack.md
* Update metadata-ingestion.md
* Add workflow to check build and tests on PRs + releases. (#1769)
PRs are set up to skip docs.
Also, only run docker actions on linkedin/datahub (i.e. disable on forks; makes forks nicer since you don't have failing actions).
* Update developers.md
* Update developers.md
* Update README.md
* fix(models): remove unused model (#1748)
* fix(models): remove unused model
Fixes https://github.com/linkedin/datahub/issues/1719
* Drop DeploymentInfo from Dataset's value model & rebuild snapshot
* Update README.md
* Add a separate page for previous townhalls
* Update for August invite; link to history
* Update README.md
* build: remove travis (we're using GitHub actions). (#1770)
Remove travis (we're using GitHub actions).
Also ignore markdown in our current workflows.
Also update the README.md badge.
* update townhall date
* Update README.md
* Update townhalls.md
* build(docker): build & publish GitHub Package (#1771)
* build(docker): build & publish docker images to GitHub Packages
Will keep publishing to Docker Hub in the meantime, until all Dockerfiles have been updated to point to GitHub.
Fixes https://github.com/linkedin/datahub/issues/1548
* Rebase & fix dockerfile locations
* Update README.md
* Fix README.md
* docs: add placeholders for advanced topics (#1780)
* Create high-cardinality.md
* Create pdl-best-practices
* Create partial-update.md
* Rename pdl-best-practices to pdl-best-practices.md
* Create entity-hierarchy.md
* docs: more placeholders for advanced topics (#1781)
* Create aspect-versioning.md
* Create derived-aspects.md
* Create backfilling.md
* Update README.md
* Update aspect-versioning.md
* Update aspect.md
* Update README.md
* Update townhall-history.md
* Update townhall-history.md
* Update rfc.md
* refactor(docker): make docker files easier to use during development. (#1777)
* Make docker files easier to use during development.
During development it is quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.
Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.
* build: fix docker actions. (#1787)
* bug: Fix docker actions.
We renamed directories in docker/ which broke the actions.
Also try to refactor the action files a little so that we can run (but not publish) these images on pull requests that change the docker/ dir as an extra check. Note this only seems to be supported by the dockerhub plugin; the github plugin doesn't support this (so that will be an issue when we move to it only).
* Drop extra pipes
* Update README.md
* refactor: remove unused model (#1788)
* refactor: remove unused internal models (#1789)
* docs: create search-over-new-field.md (#1790)
Add a doc on searching over a new field
* Update search-onboarding.md
* add description field for dataset index mapping (#1791)
* docs: how to customize the search experience (#1795)
* add description field for dataset index mapping
* documentation on how to customize the search experience
* feat(ingest): add example crawler for MS SQL (#1803)
Also fix the incorrect assumption on column comments & add sample docker-compose file
* Add log documentation
we didn't end up mounting logs; docker desktop is a better experience
* Update townhall-history.md
* Update quickstart.md
* fix(search): clear description from dataset index when it's cleared (#1808)
Fixes https://github.com/linkedin/datahub/issues/1798
* Update README.md
* Revert "Update README.md"
This reverts commit 74a0d7b262a2ac22de9bc52974b721d580914ff0.
* Update README.md
* Update README.md
* Update high-cardinality.md
* Update README.md
* Update relationship.md
* Update high-cardinality.md
* Update metadata-models to head! (#1811)
metadata-models 80.0.0 -> 90.0.13:
90.0.13: Roll forward: Fix the open source build by avoiding URN method that isn't part of the open source URN.
90.0.2: Refactor listUrnsFromIndex method
90.0.0: Start distinguishing between [] aspects vs null aspects input param
89.0.4: Fix the open source build by avoiding URN method that isn't part of the open source URN.
89.0.2: fix some test case name
89.0.0: META-12686: Made the MXE_v5 topics become strictly ACL'ed to avoid the wildcard write ACL as "MetadataXEvent.+"
88.0.6: change DAO to take Storage Config as input
88.0.3: Add a comment on lack of avro generation for MXEv5 + add MXEv5 to the pegasus validation task.
87.0.15: META-12651: Integrate the metadata-models-ext with metadata-models
87.0.13: add StorageConfig to Local DAO
87.0.3: Treat empty aspect vs optional aspect same until all clients are migrated
87.0.2: Treat empty aspect vs optional aspect differently
87.0.1: META-12533: Skip processing unregistered aspect specific MAE.
83.0.6: action method to return list of urns from strong consistent index
83.0.4: Change input param type for batch backfill
83.0.3: Implement batch backfill
83.0.1: Implement support for OR filter in browse query
82.0.10: Throw UnsupportedOperationException for unsupported condition types in search filter
82.0.6: Implement local secondary backfilling index as part of backfill method
82.0.5: [strongly consistent index] implement getUrns method
82.0.4: Add indexing urn fields to the local secondary index
82.0.0: Render Delta fields in the MCE_v5.
81.0.1: Add pegasus to avro conversion for FMCE
80.0.4: add get all support for BaseSingleAspectEntitySimpleKeyResource
80.0.2: Add a BaseSearchWriterDAO with an ESBulkWriterDAO implementation.
80.0.1: META-12254: Produce aspect specific MAE with always emit option
80.0.0: Convert getNodesInTraversedPath to getSubgraph to return complete view of the subgraph (nodes+edges)
* Update townhalls.md
* Update townhalls.md
* fix: drop the commits badge as it's flakey
* Update README.md
* fix: update defaults of aspectNames params (#1815)
fix: Update defaults of aspectNames params.
The last PR to sync internal code broke the external GMS, as the code now expected aspectNames to be null rather than empty by default. This prevented me from logging into DataHub, as the corp user request would fail (it assumed I asked for no aspects rather than all aspects).
TESTED: Built locally, launched with docker/dev.sh (so used latest frontend, but whatever). Verified I can now log into DataHub, browse and search for datasets, and view my profile.
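The null-versus-empty distinction described above can be sketched as follows. This is a minimal illustration only, not the GMS code: the `select_aspects` helper, the `ALL_ASPECTS` store, and the aspect names in it are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical aspect store for a single entity (names are illustrative only).
ALL_ASPECTS: Dict[str, dict] = {
    "corpUserInfo": {"displayName": "Jane Doe"},
    "corpUserEditableInfo": {"aboutMe": "hello"},
}


def select_aspects(aspect_names: Optional[List[str]] = None) -> Dict[str, dict]:
    """Return the requested aspects.

    None means the caller stated no preference, so every aspect is returned.
    An empty list means the caller explicitly asked for no aspects.
    """
    if aspect_names is None:
        return dict(ALL_ASPECTS)
    return {name: ALL_ASPECTS[name] for name in aspect_names if name in ALL_ASPECTS}
```

With a default of `None`, a request that omits the parameter gets all aspects back; with a default of `[]`, the same request would get nothing, which matches the failure mode described in this fix.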
* Update README.md
* Update README.md
* feat(kubernetes): Improve the security of the kubernetes/helm charts (#1782)
* 1747 | remove obsolete yaml files
* 1747 | remove configmap and its hardcoded references
* 1747 | add missing input parameter of neo4j.host
* 1747 | remove obsolete secrets and parameterize the rest
* 1747 | auto-generate gms secret
* 1747 | remove fullName overrides
* 1747 | fix parameters in subchart's values.yaml
* 1747 | remove hardcoding from parameters for gms host and port
* 1747 | upgrade chart version
* 1747 | update helm docs
* 1747 | add extraEnv, extraVolume and extraMounts
* 1747 | Alters pull policy of images to 'always' for ldh
Co-authored-by: shakti-garg <shakti.garg@gmail.com>
* Update README.md
* feat(data-platforms): adding rest resource for /dataPlatforms and mid-tier support (#1817)
* feat(data-platforms): Adding rest resource for /dataPlatforms and mid-tier support
* Removed data platforms which are Linkedin internal
* docs: add NOTICE (#1810)
* Copy NOTICE from wherehows
Copies the file from the wherehows branch.
* Update notice.
* Update links.md
* Update links.md
* Update README.md
* feat(dashboards): RFC for dashboards (#1778)
* feature(dashboards): RFC for dashboards
* Change directory structure
* Create goals & non-goals sections
* Removing alternatives section
* Update README.md
* Update links.md
* Update townhalls.md
* Update notice to include embedded licenses
Also list apache projects specifically.
* feat(frontend): update datahub-web client UI code (#1806)
* Releases updated version of datahub-web client UI code
* Fix typo in yarn lock
* Change yarn lock to match yarn registry directories
* Previous commit missed some paths
* Even more changes to yarnlock missing in previous commit
* Include codegen file for typings
* Add files to get parity for datahub-web and current OS datahub-midtier
* Add in typo fix from previous commit - change to proper license
* Implement proper OS fix for person entity picture url
* Workarounds for open source DH issues
* Fixes institutional memory api and removes unopensourced tabs for datasets
* Fixes search dataset deprecation and user search issue as a result of changes
* Remove internal only options in the avatar menu
* Update search-over-new-field.md
* docs: add external link (#1828)
* Update README.md
* Update links.md
* Review comments based on RFC
Co-authored-by: cobolbaby <cobolbaby@qq.com>
Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
Co-authored-by: Harsh Shah <hrshah@linkedin.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mars Lan <mars.th.lan@gmail.com>
Co-authored-by: John Plaisted <jplaisted@linkedin.com>
Co-authored-by: Kerem Sahin <ksahin@linkedin.com>
Co-authored-by: Javier Sotelo <javier.a.sotelo@gmail.com>
Co-authored-by: jsotelo <javier.sotelo@viasat.com>
Co-authored-by: Jyoti Wadhwani <jywadhwani@linkedin.com>
Co-authored-by: Chris Lee <wlee@linkedin.com>
Co-authored-by: Liangjun Jiang <ljiang510@gmail.com>
Co-authored-by: shakti-garg-saxo <68685481+shakti-garg-saxo@users.noreply.github.com>
Co-authored-by: na zhang <nazhang@linkedin.com>
Co-authored-by: shakti-garg <shakti.garg@gmail.com>
Co-authored-by: Charlie Tran <catran@linkedin.com>
* add job info as aspect of a dataset
* add job urn def., aspect and entity
* job entity with upstream and downstream lineage
* use job urn in upstream & downstream
* add Job entity rest APIs
* rest.li api, impl and factory for job entity
* code cleanup
* use pdl; onboard data process entity
* add es index json
* fix gradlew build ignored tasks
* add a comment about data process info field
* fix style warning issues
* update content based on PR
* checked in generated snapshot json
* updated based on PR feedback
* update data process data format
* updated based on code review feedback
* revert back gms & mce-job docker image
* delete temp files
* update based on PR feedback
* file name and a typo
* format with linkedin style
Co-authored-by: Liangjun <liajiang@expediagroup.com>
* feat(urn): add AzkabanFlow and AzkabanJob urn
Add AzkabanFlow urn to represent an azkaban flow defined on a specific cluster
and inside a project.
Add AzkabanJob urn to represent an azkaban job defined as part of an azkaban
flow.
For more details on Azkaban flow and job see - https://azkaban.readthedocs.io/en/latest/
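A rough sketch of how the two urns nest, for illustration only. The field names (`cluster`, `project`, `flow_id`, `job_id`) and the serialized string layout are assumptions made for this example; the authoritative definitions are in the urn classes added by this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzkabanFlowUrn:
    """A flow identified by its cluster, project, and flow id (assumed fields)."""
    cluster: str
    project: str
    flow_id: str

    def __str__(self) -> str:
        # Assumed serialization; the real format may differ.
        return f"urn:li:azkabanFlow:({self.cluster},{self.project},{self.flow_id})"


@dataclass(frozen=True)
class AzkabanJobUrn:
    """A job identified by the flow it belongs to plus a job id (assumed fields)."""
    flow: AzkabanFlowUrn
    job_id: str

    def __str__(self) -> str:
        # The job urn embeds the urn of its parent flow.
        return f"urn:li:azkabanJob:({self.flow},{self.job_id})"
```

Embedding the flow urn inside the job urn keeps a job's identity tied to its flow, so two jobs with the same id in different flows remain distinct.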
Expose EbeanServer from EbeanLocalDAO
Create test URNs and use those from entity models
Implement interface for browse
MP_VERSION=metadata-models:62.0.3
MP_VERSION=wherehows-samza:1.0.56
* Fix: allow user/group urns to contain dashes
CorpUser urns containing dashes are valid entries. When adding that user as an
owner, the MAE job validates the owner's urn using a regex filter that only accepts
alphanumeric characters and underscore (\w). That means any ownership changes where
the user urn contains a dash are rejected.
This change extends the regex filter to allow dashes in the name. It includes unit
tests that verify the change works for multiple dashes and underscores.
There are other cases to consider:
1. Should any other characters be allowed?
2. Should the filter check the urn starts and ends with alphanumeric characters?
CLOSES: User urn does not handle dashes consistently #1554
BREAKING CHANGE: None. This change relaxes a restriction so existing code is ok.
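The effect of relaxing the filter can be sketched as below. The two patterns are assumptions written for illustration (the exact regex used by the MAE job is not reproduced here), and `is_valid_corp_user_urn` is a hypothetical helper.

```python
import re

# Assumed "before" pattern: name limited to ASCII letters, digits, underscore (\w).
OLD_PATTERN = re.compile(r"urn:li:corpuser:\w+", re.ASCII)
# Assumed "after" pattern: the character class additionally allows dashes.
NEW_PATTERN = re.compile(r"urn:li:corpuser:[\w-]+", re.ASCII)


def is_valid_corp_user_urn(urn: str, pattern: "re.Pattern[str]" = NEW_PATTERN) -> bool:
    """Return True only if the whole string matches the given pattern."""
    return pattern.fullmatch(urn) is not None
```

Under these assumptions, `urn:li:corpuser:jane-doe` is rejected by the old pattern and accepted by the new one, while names with other characters (spaces, for example) are still rejected. The sketch does not address the two open questions listed above.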
* Added tests for group members and fixed assertion
1.0.34: Downrank inactive users in user search query
1.0.33: Refactor clients to remove snapshot builder
1.0.32: Adding client & integration test for get_all
1.0.30: Implement other clients for corp groups
1.0.28: Add resources for search and autocomplete for corp groups
1.0.27: Start using BaseClient from metadata-models
1.0.26: Add get_all resource for CorpUsers
metadata-models 38.1.12 -> 50.0.6:
50.0.2: Fix removed field update logic for all entities
49.0.1: Add dataset graph builder with DownstreamOf relationship
48.0.3: support query dao with traverse paths
47.0.2: refactor the query dao with relationship filter model
47.0.1: Fix for creating duplicate nodes when label for the node is missing
46.0.21: extend filter model with relationship direction
46.0.19: add unit test for entities partial update
46.0.16: Allow relationship filter in the model and query dao
46.0.15: support relationship directions for multi hop query
46.0.14: Implementing reportsto relationship builder and corpuser graph builder
46.0.10: refactor query dao interface using nullable to replace optional
46.0.9: Rename Mock Utils to Test Utils in Metadata-models mp
46.0.6: Remove search index config from metadata models
46.0.2: neo4j query DAO with relationships directions support
45.1.7: refactoring the graph relationship builders
45.1.5: Use correct total count in search response
45.1.3: Fix issue with empty search query filter
45.1.2: Fix a bug with autocomplete limit param
45.0.3: Change platform field type in the dataset search document
45.0.2: implement multi hops query DAO with interface 5
45.0.1: Moving dataset browsePaths build logic from wherehows-samza
44.0.2: implement interface 2 in query DAO
40.0.2: Only return records which exist in the DB after getting search hits
39.0.0: Add a getAuditor method to BaseSnapshotResource rather than taking it in as a constructor argument
38.1.13: Move BaseClient to metadata-models out of GMS template
38.1.12: Remove default filtering on removed field for get_all
MP_VERSION=corp-identity-gms:1.0.40
MP_VERSION=metadata-models:50.0.6
MP_VERSION=wherehows-samza:1.0.56
This commit is automatically generated by li-opensource tool.