mirror of
https://github.com/datahub-project/datahub.git
synced 2025-09-17 05:03:24 +00:00
543 B
543 B
Compatibility
Profiles are computed with PyDeequ, which relies on PySpark. Therefore, for computing profiles, we currently require Spark 3.0.3 with Hadoop 3.2 to be installed and the SPARK_HOME
and SPARK_VERSION
environment variables to be set. The Spark+Hadoop binary can be downloaded here.
For an example guide on setting up PyDeequ on AWS, see this guide.