OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata.git synced 2025-10-13 01:38:13 +00:00

Abdallah Serghine ac967dfe50

ISSUE-16094: fix s3 storage parquet structureFormat ingestion (#18660 )

This fixes the S3 ingestion for parquet files. The current behaviour is that
the pipeline breaks if it encounters a file in the container that is not valid
parquet. This is a problem because containers might contain non-parquet files
on purpose, for example the _SUCCESS files created by Spark.

To address that, do not fail the whole pipeline when a single container fails;
instead, count it as a failure and move on with the remaining containers. This
is already an improvement, but ideally the ingestion should try a couple more
files under the given prefix before giving up. Additionally, we can allow users
to specify file patterns to be ignored.

Co-authored-by: Abdallah Serghine <abdallah.serghine@olx.pl>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>

2024-12-14 11:40:23 +01:00
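
The behaviour described in the commit message above can be illustrated with a small, self-contained sketch. This is not the OpenMetadata implementation. Every name here (`IngestionStatus`, `pick_sample_file`, `ingest_containers`, `max_attempts`, `ignore_patterns`) is hypothetical, and the containers are modelled as in-memory dictionaries rather than S3 objects. The sketch covers both the change the commit makes (record a failing container and continue) and the two follow-ups it only proposes (try a few more files, skip ignored patterns). The parquet check only looks at the `PAR1` magic bytes, so it is a cheap filter and not full validation.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, List, Optional, Sequence

PARQUET_MAGIC = b"PAR1"  # parquet files start and end with these 4 bytes


def looks_like_parquet(data: bytes) -> bool:
    """Cheap check on the magic bytes only; not a full parquet validation."""
    return (
        len(data) >= 8
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


@dataclass
class IngestionStatus:
    """Hypothetical status holder: which containers worked, which failed."""
    processed: List[str] = field(default_factory=list)
    failures: Dict[str, str] = field(default_factory=dict)


def pick_sample_file(
    files: Dict[str, bytes],
    ignore_patterns: Sequence[str],
    max_attempts: int,
) -> Optional[str]:
    """Return the first file that looks like parquet, trying a few candidates."""
    attempts = 0
    for name in sorted(files):
        base_name = name.rsplit("/", 1)[-1]
        # Ignored files (e.g. Spark's _SUCCESS marker) do not count as attempts.
        if any(fnmatch(base_name, pattern) for pattern in ignore_patterns):
            continue
        if attempts >= max_attempts:
            break
        attempts += 1
        if looks_like_parquet(files[name]):
            return name
    return None


def ingest_containers(
    containers: Dict[str, Dict[str, bytes]],
    ignore_patterns: Sequence[str] = ("_SUCCESS",),
    max_attempts: int = 3,
) -> IngestionStatus:
    """Process every container; a failing one is recorded, not fatal."""
    status = IngestionStatus()
    for prefix, files in containers.items():
        try:
            sample = pick_sample_file(files, ignore_patterns, max_attempts)
            if sample is None:
                raise ValueError(f"no valid parquet file found under {prefix}")
            status.processed.append(prefix)
        except Exception as exc:  # count the failure and move on
            status.failures[prefix] = str(exc)
    return status


if __name__ == "__main__":
    valid = PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC
    demo = {
        "sales/": {"sales/_SUCCESS": b"", "sales/part-0000.parquet": valid},
        "logs/": {"logs/readme.txt": b"hello"},
    }
    result = ingest_containers(demo)
    print("processed:", result.processed)
    print("failed:", list(result.failures))
```

Keeping the failures in a status object rather than raising means one bad prefix does not hide the results for the remaining containers, which is the trade-off the commit message argues for.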

examples

fix: modify fqn to allow quotes with dots (#18719 )

2024-11-22 09:33:50 +05:30

operators

fix: PostgreSQL installation in Ingestion Docker (#18114 )

2024-10-04 11:31:01 +05:30

pipelines

MINOR: Add failed rows sample to test case (#15682 )

2024-04-10 17:00:00 +02:00

plugins

…

src

ISSUE-16094: fix s3 storage parquet structureFormat ingestion (#18660 )

2024-12-14 11:40:23 +01:00

tests

fix: azuresql sampler logic (#19034 )

2024-12-13 07:35:04 +01:00

__init__.py

…

Dockerfile

MINOR: Fix Ingestion Dockerfile Compatibility by changing IF syntax (#18098 )

2024-10-04 02:52:48 +05:30

Dockerfile.ci

MINOR: Fix Ingestion Dockerfile Compatibility by changing IF syntax (#18098 )

2024-10-04 02:52:48 +05:30

ingestion_dependency.sh

#15243 - Pydantic V2 & Airflow 2.9 (#16480 )

2024-06-05 21:18:37 +02:00

LICENSE

Docs - Ingestion License (#17893 )

2024-09-17 08:58:53 -07:00

Makefile

FIX - Ingestion Airflow Image constraints (#17296 )

2024-08-06 06:44:27 +02:00

pyproject.toml

MINOR: add reportExplicitAny = false for basedpyright (#18777 )

2024-11-25 14:38:04 +00:00

README.md

MINOR: Update README.md python version (#15893 )

2024-04-18 21:33:16 +05:30

setup.py

Feature: Cassandra Connector (#18943 )

2024-12-12 15:12:55 +05:30

sonar-project.properties

…

README.md

This guide will help you set up the Ingestion framework and connectors

OpenMetadata Ingestion is a simple framework to build connectors and ingest metadata from various systems through the OpenMetadata APIs. It can be used in an orchestration framework (e.g. Apache Airflow) to ingest metadata.

Prerequisites

Python >= 3.8.x

Docs

Please refer to the documentation at https://docs.open-metadata.org/connectors

TopologyRunner

All the Ingestion Workflows run through the TopologyRunner.

The flow is depicted in the images below.

[Image: TopologyRunner Standard Flow]

[Image: TopologyRunner Multithread Flow]