mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-10-13 01:38:13 +00:00

This aims to fix the S3 ingestion for Parquet files. The current behaviour is that the pipeline breaks if it encounters a file in the container that is not valid Parquet. This is not great, as containers might contain non-Parquet files on purpose, for example the _SUCCESS files created by Spark.

To address this, do not fail the whole pipeline when a single container fails; instead, count it as a failure and move on with the remainder of the containers. This is already an improvement, but ideally the ingestion should try a couple more files under the given prefix before giving up. Additionally, we can allow users to specify file patterns to be ignored.

Co-authored-by: Abdallah Serghine <abdallah.serghine@olx.pl>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
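The behaviour described above (count a failing container as a failure and continue, and optionally skip files matching user-supplied patterns) can be sketched roughly as follows. This is a minimal illustration, not the actual OpenMetadata implementation: `IngestionStatus`, `ingest_containers`, `fake_read_schema` and the `ignore_patterns` parameter are hypothetical names introduced here, and the reader function stands in for whatever code actually parses a Parquet file.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Callable, Iterable, List, Sequence


@dataclass
class IngestionStatus:
    """Hypothetical status object: records what happened to each key."""
    processed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def ingest_containers(
    keys: Iterable[str],
    read_schema: Callable[[str], object],
    ignore_patterns: Sequence[str] = ("_SUCCESS",),
) -> IngestionStatus:
    """Process every key; one bad file is recorded, not fatal."""
    status = IngestionStatus()
    for key in keys:
        # Skip files whose base name matches an ignore pattern (assumed option).
        if any(fnmatchcase(basename(key), pattern) for pattern in ignore_patterns):
            status.skipped.append(key)
            continue
        try:
            read_schema(key)
        except Exception as exc:  # broad on purpose: keep the pipeline alive
            status.failures.append(f"{key}: {exc}")
            continue
        status.processed.append(key)
    return status


def fake_read_schema(key: str) -> str:
    """Stand-in reader for the demo: rejects anything not named *.parquet."""
    if not key.endswith(".parquet"):
        raise ValueError("not a valid parquet file")
    return "schema"


if __name__ == "__main__":
    demo_keys = ["a/part-0.parquet", "a/_SUCCESS", "b/notes.txt", "c/part-1.parquet"]
    print(ingest_containers(demo_keys, fake_read_schema))
```

Matching the patterns against the base name keeps the option simple for users (`_SUCCESS`, `*.crc`), at the cost of not being able to ignore a whole prefix; that trade-off is a choice of this sketch only.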
This guide will help you set up the Ingestion framework and connectors.
OpenMetadata Ingestion is a simple framework to build connectors and ingest metadata from various systems through the OpenMetadata APIs. It can be used in an orchestration framework (e.g. Apache Airflow) to ingest metadata.
Prerequisites
- Python >= 3.8.x
Docs
Please refer to the documentation here: https://docs.open-metadata.org/connectors

TopologyRunner
All the Ingestion Workflows run through the TopologyRunner.
The flow is depicted in the images below.
[Image: TopologyRunner Standard Flow]
[Image: TopologyRunner Multithread Flow]