* MINOR: Improve UDF Lineage Processing & Better Logging Time & MultiProcessing (#20848)
* Fix multiprocessing with better memory management and Airflow 2+ compatibility
* Add support for both multiprocessing and multithreading for relevant platforms
* Handle conflicting cross-db lineage changes of service_name parameter change
* Handle stored proc queries without caching all and increase the thread timeouts to cover 100% lineage
* Fix `get_table_query` inheritance and pylint
* Remove mocks from db_utils tests
* Better db_utils test and fix the service_names parameter in case of schema_fallback
---------
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
* Fix Oracle DataDiff and Change Oracle Connection to BaseConnection
* Add small unittest
* Fix Test
* Fix logic to avoid other engines denormalizing table/schema names
* Add calculated view columns' formula parsing logic with correct source reference
* Handle top level column formula parsing and pass formula expression in column lineage detail
---------
Co-authored-by: Suman Maharana <sumanmaharana786@gmail.com>
* fix: ingestion fails for Iceberg tables with nested partition column
* test: added test to cover nested partition column for iceberg
* refactor: used if-else in tablePartition check
* fix: partition_column_name & column_partition_type typo
* Enhance Sample Data Generation: Update table and column limits, add description and owner fields to table creation requests in sample_data.py
* Refactor SampleDataSource: Improve readability by adjusting conditional formatting for owner checks in sample_data.py
* Reduced number of tables per schema to 10
* Update sample_data.py: Reduce the maximum number of columns per table from 2000 to 200 for improved data generation efficiency
* Refactor BigQuery client type hinting in bigquery_utils.py
- Updated type hint for the BigQuery client to use a forward reference for better compatibility with type checking.
- Moved import statement for google.cloud.bigquery inside TYPE_CHECKING block to optimize imports during runtime.
* Refactor BigQuery client import structure in bigquery_utils.py
- Moved the import statement for google.cloud.bigquery inside the TYPE_CHECKING block to enhance type hinting compatibility.
- Adjusted the import location for better runtime performance and adherence to best practices.
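A minimal sketch of the `TYPE_CHECKING` import pattern described above, not the actual contents of bigquery_utils.py. The helper `get_project_id` and the `FakeClient` stand-in are hypothetical names used only for illustration; the stand-in lets the snippet run without google-cloud-bigquery installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, so the
    # google-cloud-bigquery package is not imported when this module loads.
    from google.cloud import bigquery


def get_project_id(client: "bigquery.Client") -> str:
    """Hypothetical helper: return the project the client is bound to.

    The string annotation is a forward reference, so the name `bigquery`
    does not need to exist at runtime.
    """
    return client.project


class FakeClient:
    """Hypothetical stand-in for bigquery.Client, for demonstration only."""

    project = "demo-project"


if __name__ == "__main__":
    print(get_project_id(FakeClient()))
```

Because the annotation is a string, the module can be imported even when the BigQuery dependency is absent, while type checkers still resolve `bigquery.Client`.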
* fix(elasticsearch.py) - add None value filter
Elasticsearch sometimes returns lists that contain None values.
To fix this, filter them out before returning the most relevant entry to the endpoint.
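The None filter can be sketched roughly as follows. This is an illustration under assumptions, not the code in elasticsearch.py: `drop_none` and `most_relevant` are hypothetical names, and the sketch assumes that "most relevant" means the first remaining entry.

```python
from typing import Any, List, Optional


def drop_none(hits: Optional[List[Optional[Any]]]) -> List[Any]:
    """Return the hits with None entries removed, preserving order."""
    return [hit for hit in (hits or []) if hit is not None]


def most_relevant(hits: Optional[List[Optional[Any]]]) -> Optional[Any]:
    """Assumption: the first non-None hit is the most relevant one."""
    cleaned = drop_none(hits)
    return cleaned[0] if cleaned else None


if __name__ == "__main__":
    print(most_relevant([None, {"id": 1}, None, {"id": 2}]))
```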
* fix tests
---------
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: ulixius9 <mayursingal9@gmail.com>
* 🎉 Init
* linter
* bring back removed code for api collection
* remove comment
* fix type hint
---------
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
* Add lineage to Exasol connector
* Update test_connection to return TestConnectionResult
* Add exasol tests & dependencies to tests in setup.py
* Opensearch is required for testing, so add it there
* Modify metadata
* Update documentation for lineage
* Apply formatting changes to code
* Apply make py_format