For hybrid deployment, Data Processing Engine (DPE) can be installed on a Linux host using the Ansible installation packages.
This guide provides installation instructions for both RHEL and Ubuntu.
Before you start, make sure that you have the necessary access rights and permissions to install Ansible and its prerequisites.
Ansible, an open-source tool for server orchestration, is a Python application that requires specific versions of Python libraries.
To make sure that there are no conflicts with other installed modules, it is necessary to create a separate virtual environment using the Python venv module.
As the last step in the installation procedure, the ansible-galaxy tool is used to download a number of community-maintained Ansible roles (libraries).
Create and activate a virtual environment for Ansible installation.
This helps prevent potential version conflicts between the system modules and the modules necessary for setting up hybrid DPE.
Execute the following commands:
Set up virtual environment
python3 -m venv ~/venv
. ~/venv/bin/activate
Your command prompt now starts with (venv), which indicates that all Python processes now use the modules from the virtual environment.
This also means that you need to have your virtual environment active anytime you want to work with this Ansible repository.
To start the virtual environment, use the following command:
Start virtual environment
. ~/venv/bin/activate
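Besides looking at the prompt, you can ask the interpreter itself whether it runs inside a virtual environment. The following is a minimal sketch; the function name in_virtualenv is illustrative and not part of the Ansible repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False, activate the environment again before running any Ansible command.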
Install Ansible and the remaining dependencies within your virtual environment.
To do so, execute the following commands:
Install Ansible and dependencies
cd ~/one/ansible
pip install --upgrade 'pip>=20.3'
pip install wheel 'ansible<4.6'
pip install -r requirements-pip.txt
Once the installation is completed, the last line of the expected output is as follows:
Successfully installed ansible-4.5.0 ansible-core-2.11.12 resolvelib-0.5.4
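To confirm that the installed release respects the 'ansible<4.6' pin used above, the version can be compared programmatically. This is a sketch under the assumption that the version string is purely numeric (for example, 4.5.0); pre-release suffixes are not handled, and the helper names are illustrative.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Convert a plain dotted version such as '4.5.0' into (4, 5, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str, upper_bound: str = "4.6") -> bool:
    """Return True if version is strictly below upper_bound."""
    return version_tuple(version) < version_tuple(upper_bound)


def installed_ansible_version():
    """Return the installed 'ansible' package version, or None if absent."""
    try:
        return metadata.version("ansible")
    except metadata.PackageNotFoundError:
        return None
```

Calling satisfies_pin(installed_ansible_version()) inside the active virtual environment should return True after a successful installation; a None version means the package is not visible to the current interpreter.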
Ansible must be correctly configured before it can be used, which is why we provide a basic configuration file for Ansible (ansible-example.cfg).
One of the key options in the configuration is log_path.
With this option set, Ansible tracks all playbook runs in a single log file (in this case, ~/.ansible/ansible.log).
To make sure the configuration file is always applied, copy it to your home directory using the following command.
Make sure to adapt the path to the ansible-example.cfg file depending on your current working directory.
Copy Ansible configuration file to home directory
cp ~/one/ansible/ansible-example.cfg ~/.ansible.cfg
Verify that the configuration file has been correctly set up.
To do so, check the Ansible version using the following command:
Verify Ansible version
ansible --version
The expected output is as follows.
The exact paths might vary depending on your environment.
Verify Ansible version console output
ansible [core 2.11.12]
  config file = /home/<user>/.ansible.cfg
  configured module search path = ['/home/<user>/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/<user>/venv/lib/python3.11/site-packages/ansible
  ansible collection location = /home/<user>/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/<user>/venv/bin/ansible
  python version = 3.11.2 (main, Feb 16 2023, 02:55:59) [Clang 14.0.0 (clang-1400.0.29.202)]
  jinja version = 3.1.2
  libyaml = False
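The line to look at is config file: when Ansible finds no configuration file, it reports None there. The following sketch parses the output of ansible --version and checks that field. The function names are illustrative, and the parser assumes the "key = value" layout shown above.

```python
def parse_ansible_version(output: str) -> dict:
    """Turn the 'key = value' lines of `ansible --version` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # the first line ("ansible [core ...]") has no " = "
            fields[key.strip()] = value.strip()
    return fields


def config_file_loaded(output: str) -> bool:
    """Return True if Ansible reports a config file other than None."""
    fields = parse_ansible_version(output)
    return fields.get("config file", "None") != "None"
```

The output string can be obtained, for example, with subprocess.run(["ansible", "--version"], capture_output=True, text=True).stdout.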
Once you have a working Ansible installation, proceed with installing external dependency roles.
The expected output is as follows.
While the warnings about the missing inventory and hosts list are expected, the play recap must not contain any failed steps.
An inventory is a list of servers that are provisioned through Ansible.
Building an inventory includes defining all the necessary hosts and variables in a dedicated folder structure.
To prepare an inventory for hybrid DPE installation:
Create a copy of the sample inventory (one/ansible/inventories/example_hybrid).
The inventory should be stored in the same parent folder (inventories) under a different name, for example, customer.
To do this, you can use the following command:
Create a new inventory
cd ~/one/ansible/inventories
cp -r example_hybrid/. <new_inventory>
Add your provisioned hosts (managed nodes) to the hosts file.
This is done by modifying the hosts.yml file located in the inventory created in step 1 (one/ansible/inventories/<inventory>).
The file has the following structure:
Structure of hosts.yml file
children:
  processing:
    hosts:
      <dpe-server-hostname>:
      <dpe-2-server-hostname>: # optional, but at least one server in the processing group is required
        key1: value1 # for any server (host), one can optionally define variables, which are host-specific (e.g. license)
Replace the hostname placeholders (such as <dpe-server-hostname>) with the correct hostnames for all DPE servers that you want to work with.
The processing group can have one or multiple hosts and it is also possible to add host-specific variables.
For example, to specify the license that a DPE instance should use, the variable needs to be declared as follows:
<dpe-server-hostname>:
  license: /path/to/license.plf
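The two rules above (at least one host in the processing group, and no leftover placeholders) can be checked before running the playbook. The sketch below works on the already parsed content of hosts.yml, for example the result of yaml.safe_load() from PyYAML. The function names are illustrative, and accepting an optional top-level all group is an assumption rather than something this guide prescribes.

```python
def processing_hosts(inventory: dict) -> dict:
    """Return {hostname: host_vars} for the 'processing' group.

    Hosts declared without variables are parsed as None, so they are
    normalised to an empty dict here.
    """
    # Accept the structure with or without a top-level 'all' group.
    root = inventory.get("all") or inventory
    group = (root.get("children") or {}).get("processing") or {}
    hosts = group.get("hosts") or {}
    return {name: (host_vars or {}) for name, host_vars in hosts.items()}


def check_inventory(inventory: dict) -> list:
    """Return a list of problems found; an empty list means no issues."""
    problems = []
    hosts = processing_hosts(inventory)
    if not hosts:
        problems.append("the processing group needs at least one host")
    for name in hosts:
        if name.startswith("<") and name.endswith(">"):
            problems.append(f"placeholder hostname not replaced: {name}")
    return problems
```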
Provide the necessary variables.
Variables are declared in the vars.yml file, located in one/ansible/inventories/<inventory>/group_vars/all.
This is where you define the various endpoints and secrets that are used to connect to the Ataccama ONE PaaS environment.
Some variables are specific for each environment and therefore must be set.
The following list contains all the variables you must update accordingly:
dpe_license_file: The path to the DPE license file on the Ansible controller.
You can define it in the group variables (vars.yml) or, when provisioning multiple DPEs, for each host separately (hosts.yml, see step 2).

keycloak_url: The PaaS Keycloak authentication server endpoint: https://[customer].[env].ataccama.online/auth/.

dpe_token_client, dpe_admin_client: The credentials for the DPE token and admin clients.

dpe_token_client:
  client: dpe-token-client
  secret: dpe-token-client-s3cret
dpe_admin_client:
  client: dpe-admin-client
  secret: dpe-admin-client-s3cret

minio_url: The PaaS ONE Object Storage (MinIO) endpoint: https://minio.[customer].[env].ataccama.online.

minio: MinIO credentials (access key and secret key).

minio:
  access_key: minio
  secret_key: minio-secret

dpe_jwt_key: The private key of the on-premise DPE.
Provided by Ataccama together with other necessary credentials.

dpm: The gRPC host where DPM is available.
The gRPC and HTTP port numbers should remain unchanged.

dpm:
  host: dpm-grpc.[customer].[env].ataccama.online
  grpc_port: 443
  http_port: 8031

dpm_jwt_key: The public key of the DPM module.
Provided by Ataccama together with other necessary credentials.

dpm_jwt_key:
  name: dpm-prod-key
  # jwt key content
  content: {"kty":"EC","crv":"P-256","kid":"vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI","x":"Hbs53V5zC-1DjNf5RtJ1bNHlxvzM5jST7J1ADVePV9g","y":"4pVfzrF7FMHt_Xx2FgLauvLZuJqbpL9crdOxvTXWb64","alg":"ES256"}
  # jwt key fingerprint
  fp: vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI
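Because a missing variable or a malformed key typically only surfaces during the playbook run, it can help to check the parsed vars.yml beforehand. The following is a minimal sketch: the helper names and the license_per_host flag are illustrative, the input is assumed to be the dict obtained by parsing vars.yml, and the key check only verifies that the content is a JSON object with the kty member that every JSON Web Key requires.

```python
import json

# Variables this guide lists as mandatory in group_vars/all/vars.yml.
REQUIRED_VARS = (
    "keycloak_url", "dpe_token_client", "dpe_admin_client",
    "minio_url", "minio", "dpe_jwt_key", "dpm", "dpm_jwt_key",
)


def missing_variables(group_vars: dict, license_per_host: bool = False) -> list:
    """Return the required variables that are absent or empty.

    Set license_per_host=True when the license is defined per host in
    hosts.yml instead of in the group variables.
    """
    required = list(REQUIRED_VARS)
    if not license_per_host:
        required.append("dpe_license_file")
    return [name for name in required if not group_vars.get(name)]


def jwk_problems(jwt_key: dict) -> list:
    """Check that the 'content' of a key entry is a JSON object with 'kty'."""
    content = jwt_key.get("content")
    if isinstance(content, str):
        try:
            content = json.loads(content)
        except json.JSONDecodeError as err:
            return [f"content is not valid JSON: {err.msg}"]
    if not isinstance(content, dict):
        return ["content must be a JSON object"]
    return [] if "kty" in content else ["missing JWK member: kty"]
```

A dropped quotation mark in the key content, which is easy to introduce when copying the key, is reported by jwk_problems as invalid JSON.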
Define any additional configuration for the DPE module under the variable dpe_additional_config, depending on your requirements.
The following example shows how to enable communication over bidirectional gRPC stream between DPE and DPM.
For context, when this option is set, the two modules require only one open connection to communicate.
dpe_additional_config: |
  # Additional DPM connection properties - enable bidirectional streaming via TLS and trust all certificates
  ataccama.client.connection.dpm.grpc.tls.enabled=true
  ataccama.client.connection.dpm.grpc.tls.trust-all=true
  ataccama.one.dpe.service.dpm.connection.mode=FIREWALL_FRIENDLY_REGISTRATION
  ataccama.one.dpe.label=dpe-hybrid
  # Additional DPE data sources / drivers configuration
  plugin.jdbcdatasource.ataccama.one.driver.redshift.disabled=false
Regarding ataccama.one.dpe.label, it is important that you assign a unique value to the DPE configuration label to ensure proper functioning and avoid conflicts with remote configurations.
The label serves as the DPE’s unique identifier.
The default value is dpe.
For engines with different configurations, each must have a distinct label value.
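When several engines are provisioned, label collisions can be spotted by parsing each engine's dpe_additional_config text. The sketch below assumes that the text uses plain key=value lines with # comments, as in the example above, and that each host's configuration text is available as a string. It reports every label shared by more than one host, which is only a problem if those engines are configured differently. The function names are illustrative.

```python
def parse_properties(text: str) -> dict:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def duplicate_labels(configs: dict, default: str = "dpe") -> dict:
    """Map each label used by more than one host to those hosts.

    `configs` maps a hostname to its dpe_additional_config text. Hosts
    that do not set the label fall back to the default value.
    """
    by_label = {}
    for host, text in configs.items():
        label = parse_properties(text).get("ataccama.one.dpe.label", default)
        by_label.setdefault(label, []).append(host)
    return {label: hosts for label, hosts in by_label.items() if len(hosts) > 1}
```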
By default, the variables file is also preconfigured to download several JDBC drivers for some commonly used data sources.
The list of downloaded drivers is provided in the dpe_drivers variable using the following syntax:
dpe_drivers:
  - name: redshift
    remote_url: https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/2.0.0.1/redshift-jdbc42-2.0.0.1.jar
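If you extend this list, a quick sanity check of the entries can catch typos before the download step. The sketch below assumes that every entry uses exactly the name and remote_url keys shown in the example; any other keys the role might support are not considered. The function name is illustrative.

```python
from urllib.parse import urlparse


def driver_problems(dpe_drivers: list) -> list:
    """Return a list of problems found in the parsed dpe_drivers list."""
    problems = []
    seen = set()
    for index, entry in enumerate(dpe_drivers):
        name = entry.get("name")
        if not name:
            problems.append(f"entry {index}: missing name")
        elif name in seen:
            problems.append(f"entry {index}: duplicate name {name}")
        else:
            seen.add(name)
        # A usable download location needs a scheme and a host.
        parsed = urlparse(entry.get("remote_url", ""))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {index}: remote_url is not an HTTP(S) URL")
    return problems
```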
For example, if you used the paths provided in this guide and want to connect through the admin account, with the private key located in ~/.ssh/admin-private, the command should be updated as follows:
If the installation finished successfully, the expected output of the play recap is as follows.
To verify that DPE can also communicate with the Ataccama ONE PaaS, make sure to go through sections Installation checks and Check DPE status as well.
PLAY RECAP ************************************************************************************************************************************************************************************************
dpe-1-server-hostname : ok=86 changed=31 unreachable=0 failed=0 skipped=32 rescued=0 ignored=2
dpe-2-server-hostname : ok=86 changed=31 unreachable=0 failed=0 skipped=32 rescued=0 ignored=2
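When the playbook output is captured to a file or to the log configured through log_path, the recap can also be evaluated automatically. The following sketch extracts the per-host counters from recap lines in the format shown above and lists hosts with failed or unreachable results. The function names are illustrative.

```python
import re

# Matches lines such as "host : ok=86 changed=31 unreachable=0 failed=0".
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_play_recap(output: str) -> dict:
    """Return {host: {counter: value}} for every recap line in output."""
    recap = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (item.split("=") for item in match["counters"].split())
            recap[match["host"]] = {key: int(value) for key, value in pairs}
    return recap


def failed_hosts(recap: dict) -> list:
    """Return hosts with at least one failed task or unreachable result."""
    return [host for host, counters in recap.items()
            if counters.get("failed", 0) or counters.get("unreachable", 0)]
```

An empty list from failed_hosts corresponds to the successful recap shown above, where both hosts report failed=0 and unreachable=0.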
A post-installation check included in the Ansible play verifies that DPE is running without issues.
If that is the case, the following output is expected:
In case this task fails, more information about the issue can be found in the output.
The following example illustrates the error that occurs when DPE is not able to reach DPM: the expected response code from the monitoring endpoint is 200 OK, but 503 Service Unavailable was received.
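The same distinction can be reproduced manually by requesting the monitoring endpoint and inspecting the response code. The sketch below is an illustration only: the endpoint URL is not given in this section, so it has to be supplied by you, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still useful.
        return err.code
    except OSError:
        # Covers connection errors and timeouts (URLError is an OSError).
        return None


def describe_status(status) -> str:
    """Translate a status code into a short diagnosis."""
    if status == 200:
        return "DPE reports a healthy state"
    if status == 503:
        return "service unavailable, for example because DPE cannot reach DPM"
    if status is None:
        return "monitoring endpoint could not be reached"
    return f"unexpected response code {status}"
```

For example, describe_status(http_status(url)) with the monitoring URL of your DPE host gives a one-line summary of the check.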