GitBook: [#115] Add env to one-time ingestion for glue

This commit is contained in:
Shannon Bradshaw 2022-03-23 00:31:31 +00:00 committed by Sriharsha Chintalapani
parent c4f87bd4c8
commit 808fe13d8e


@ -719,18 +719,71 @@ python3 --version
Here's an overview of the steps in this procedure. Please follow the steps relevant to your use case.
1. Configure your AWS default profile (optional)
2. Create a configuration file using template JSON
3. Configure service settings
4. Configure data filters (optional)
5. Confirm sink settings
6. Confirm metadata\_server settings
7. Install the Python module for this connector
8. Run ingestion workflow
1. Prepare a Python virtual environment
2. Install the Python module for this connector
3. Configure your AWS default profile (optional)
4. Create a configuration file using template JSON
5. Configure service settings
6. Configure data filters (optional)
7. Confirm sink settings
8. Confirm metadata\_server settings
9. Run ingestion workflow
### **1. Configure your AWS default profile (optional)**
### **1. Prepare a Python virtual environment**
In this step, we'll create a Python virtual environment. Using a virtual environment lets you avoid conflicts with other Python installations and packages on your host system.
In a later step, you will install the Python module for this connector and its dependencies in this virtual environment.
#### **1.1 Create a directory for openmetadata**
Throughout the docs, we use a consistent directory structure for OpenMetadata services and connector installation. If you have not already done so by following another guide, please create an openmetadata directory now and change into that directory in your command line environment.
```
mkdir openmetadata; cd openmetadata
```
#### **1.2 Create a virtual environment**
Run the following command to create a Python virtual environment called `env`. You can try multiple connectors in the same virtual environment.
```
python3 -m venv env
```
#### **1.3 Activate the virtual environment**
Run the following command to activate the virtual environment.
```
source env/bin/activate
```
Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `env`.
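If the prompt change is easy to miss, a small check can confirm the activation. The sketch below relies on the `VIRTUAL_ENV` variable that the `activate` script exports; the function name `venv_status` is illustrative and not part of OpenMetadata.

```bash
# Report whether a Python virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV; it is unset or empty otherwise.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "inactive"
  fi
}

venv_status
```

If this prints `inactive`, re-run the `source env/bin/activate` command from the `openmetadata` directory.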
#### **1.4 Upgrade pip and setuptools to the latest versions**
Ensure that you have the latest versions of pip and setuptools by running the following command. If you have followed the steps above, this will upgrade both in your virtual environment.
```bash
pip3 install --upgrade pip setuptools
```
### **2. Install the Python module for this connector**
Once the virtual environment is set up and activated as described in Step 1, run the following command to install the Python module for this connector.
```bash
pip3 install 'openmetadata-ingestion[glue]'
```
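To confirm that the install succeeded, one option is to ask pip for the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it simply wraps `pip3 show`.

```bash
# Print the installed version of a package, or "not installed".
# "pip3 show" prints a "Version:" line for packages it can find.
installed_version() {
  # "|| true" keeps the helper usable in shells running with errexit/pipefail.
  version="$(pip3 show "$1" 2>/dev/null | awk '/^Version:/ {print $2}' || true)"
  echo "${version:-not installed}"
}

installed_version openmetadata-ingestion
```

If the output is `not installed`, check that the virtual environment from Step 1 is still active and re-run the install command.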
### **3. Configure your AWS default profile (optional)**
In order to use the Glue Catalog connector, you will need AWS credentials configured and available to the connector. The best way to do this is by configuring your AWS default profile using the AWS Command-Line Interface (CLI). In this step we will install the AWS CLI and then configure an AWS profile.
@ -740,13 +793,13 @@ Note: If you do not have an existing AWS profile and opt not to create one, you
####
#### 1a. Install the AWS CLI
#### 3a. Install the AWS CLI
To install the AWS CLI, follow the installation guide for your operating system from the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
####
#### 1b. Configure your AWS default profile
#### 3b. Configure your AWS default profile
With the AWS CLI installed, to configure your AWS profile run the following command.
@ -768,7 +821,7 @@ Please enter your `Access Key`, `Secret Key`, and `Region` when prompted. The Op
####
#### 1c. Test access to your Glue catalog
#### 3c. Test access to your Glue catalog
Run the following command to ensure your AWS credentials and region are configured properly.
@ -780,7 +833,7 @@ In response you will either see a formatted list of schemas defined in your Glue
### **2. Create a configuration file using template JSON**
### **4. Create a configuration file using template JSON**
Create a new file called `glue.json`. Copy and paste the configuration template below into the `glue.json` file you created.
@ -821,7 +874,7 @@ Note: The `source.config` field in the configuration JSON will include the major
### **3. Configure service settings**
### **5. Configure service settings**
In this step we will configure the Glue service settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Glue service as desired.
@ -937,7 +990,7 @@ You may specify a value for `source.config.endpoint_url` to override this behavi
### **4. Configure data filters (optional)**
### **6. Configure data filters (optional)**
#### **include\_views (optional)**
@ -1011,9 +1064,9 @@ You may use either `excludes` or `includes` but not both in `table_filter_patter
### **5. Confirm `sink` settings**
### **7. Confirm `sink` settings**
You need not make any changes to the fields defined for `sink` in the template code you copied into `glue.json` in Step 2. This part of your configuration file should be as follows.
You need not make any changes to the fields defined for `sink` in the template code you copied into `glue.json` in Step 4. This part of your configuration file should be as follows.
```javascript
"sink": {
@ -1024,7 +1077,7 @@ You need not make any changes to the fields defined for `sink` in the template c
### 6**. Confirm `metadata_server` settings**
### **8. Confirm `metadata_server` settings**
You need not make any changes to the fields defined for `metadata_server` in the template code you copied into `glue.json` in Step 4. This part of your configuration file should be as follows.
@ -1040,17 +1093,7 @@ You need not make any changes to the fields defined for `metadata_server` in the
### 7. Install the Python module for this connector
Run the following command to install the Python module for the Glue connector.
```bash
pip3 install --upgrade 'openmetadata-ingestion[glue]'
```
### 8**. Run ingestion workflow**
### **9. Run ingestion workflow**
Your `glue.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
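Before running the workflow, it can help to confirm that `glue.json` still parses as JSON after your edits. The sketch below uses Python's standard-library `json.tool` module; the helper name `json_is_valid` is illustrative, and the check covers syntax only, not whether the connector settings are correct.

```bash
# Return success if the given file exists and parses as JSON.
# json.tool exits non-zero on a syntax error or a missing file.
json_is_valid() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

if json_is_valid glue.json; then
  echo "glue.json is well-formed JSON"
else
  echo "glue.json is missing or not valid JSON"
fi
```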
@ -1083,13 +1126,13 @@ This error usually occurs due to an older version of pip. Try upgrading pip as f
pip3 install --upgrade pip setuptools
```
Then re-run the install command in Step 7.
Then re-run the install command in Step 2.
### **requests.exceptions.ConnectionError**
If you encounter the following error when attempting to run the ingestion workflow in Step 8, this is probably because there is no OpenMetadata server running at http://localhost:8585.
If you encounter the following error when attempting to run the ingestion workflow in Step 9, this is probably because there is no OpenMetadata server running at http://localhost:8585.
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
@ -1100,9 +1143,7 @@ Failed to establish a new connection: [Errno 61] Connection refused'))
To correct this problem, please follow the steps in the [Run OpenMetadata](https://docs.open-metadata.org/v/main/try-openmetadata/run-openmetadata) guide to deploy OpenMetadata in Docker on your local machine.
Then re-run the metadata ingestion workflow in Step 8.
Then re-run the metadata ingestion workflow in Step 9.
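A quick way to confirm that something is listening at that address before re-running the workflow is sketched below. It assumes `curl` is installed; the helper name `server_reachable` is hypothetical, and the URL is the default from the error message above.

```bash
# Return success if an HTTP server answers at the given URL.
# curl exits non-zero when the connection is refused or times out;
# any HTTP response, including an error status, counts as reachable here.
server_reachable() {
  curl --silent --output /dev/null --max-time 5 "$1"
}

if server_reachable "http://localhost:8585"; then
  echo "OpenMetadata server is reachable"
else
  echo "No server responding at http://localhost:8585"
fi
```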
{% endtab %}
{% endtabs %}