Atlas integration
Starburst Enterprise platform (SEP) integrates with Apache Atlas, a framework
for the governance of data and metadata assets. This allows you to include
changes to SEP catalogs, schemas, tables, columns, and queries as part of an
overall enterprise data governance plan.
Introduction
The Atlas support in SEP is implemented as an event listener that detects
changes to SEP objects and sends notice of those changes to an Atlas server by
means of a Kafka message bus. Starburst also provides the atlas-cli command,
which allows you to manage the relationship between your SEP cluster and Atlas.
Most home-grown and commercial data governance systems can import from and
export to Apache Atlas. This means that enterprises using a non-Atlas data
governance system can still take advantage of SEP’s Atlas support by using it
as a bridge to their system.
Setup steps
To integrate Atlas with your SEP cluster, follow the sections in numbered
order.
Setup summary
Set up Atlas support for a SEP cluster with the following steps:
1. The requirements must be in place before you begin.
2. Configure an Atlas plugin on your coordinator.
3. Register Atlas types for SEP objects with the atlas-cli command.
4. Register your SEP cluster on Atlas with atlas-cli.
5. Load catalogs and their components onto Atlas with atlas-cli.
6. Restart your cluster and verify Atlas connectivity.
1. Requirements
SEP’s support for Apache Atlas requires:
SEP cluster version 356 or later, configured and running.
Apache Atlas 2.1.0 or later, configured and running.
Apache Kafka, configured to consume and emit Atlas messages.
You must be able to contact the Atlas and Kafka servers at their specified
ports from the SEP coordinator.
Atlas CLI, downloaded from Starburst Support, then installed and configured.
A valid Starburst Enterprise license for the Starburst Atlas plugin.
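The network requirement can be checked from the coordinator before you continue. The following is a minimal sketch, not part of the Starburst tooling: it assumes bash and the coreutils `timeout` command are available, and the function names `can_connect` and `report` are made up for this illustration. The Atlas host and port come from the examples in this document. The Kafka host and port are placeholders that you must replace with your own values.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp redirection, so no client tools are required.
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print a one-line result for a named endpoint.
report() {
  local name="$1" host="$2" port="$3"
  if can_connect "$host" "$port"; then
    echo "${name} ${host}:${port} reachable"
  else
    echo "${name} ${host}:${port} NOT reachable"
  fi
}

# Example calls, to be run on the SEP coordinator.
# The Kafka host and port are placeholders (assumptions), not values
# taken from this document:
#   report Atlas atlas.example.com 21000
#   report Kafka kafka.example.com 9092
```

A successful TCP connection only shows that the port is open. It does not confirm that Atlas or Kafka is correctly configured.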
2. Configure Atlas plugin
Follow the guidance for the Starburst Atlas plugin to create a configuration
file that defines the properties of your cluster’s connection to Atlas and
Kafka.
After preparing this configuration, do not restart your cluster yet. Wait for
step 6 before you restart.
3. Register SEP types
The atlas-cli command keeps an internal registry of eight Atlas-format types
that describe SEP objects. Run the following command to upload these
SEP-specific definitions to Atlas.
atlas-cli types create --server https://atlas.example.com:21000 --user admin --password
See the Atlas CLI reference for this command.
4. Register SEP cluster
One of the properties you configure for your cluster in step 2 is
atlas.cluster.name, where you assign an arbitrary name for your SEP cluster.
Use a command like the following to register this cluster name with Atlas.
atlas-cli cluster register --server https://atlas.example.com:21000 \
--cluster-name fastqueries --user admin --password
The value of the --cluster-name parameter here must match the
atlas.cluster.name property already configured.
See the Atlas CLI reference for this command.
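One way to keep the two names from diverging is to read atlas.cluster.name from the plugin configuration file and pass it to atlas-cli, rather than typing it twice. The following is a sketch under assumptions: the helper names `read_property` and `register_cluster` are made up for this illustration, the configuration file is assumed to use plain key=value lines, and its path is whatever you chose in step 2.

```shell
# Print the value of KEY from a properties FILE with key=value lines.
read_property() {
  local file="$1" key="$2" pattern
  # Escape dots so the key is matched literally, not as a regex wildcard.
  pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
  # Strip "key =" from matching lines, trim trailing whitespace,
  # and keep only the first match.
  sed -n "s/^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*//p" "$file" \
    | sed 's/[[:space:]]*$//' \
    | head -n 1
}

# Register the cluster under the name found in the given config file.
# The path passed in is an assumption; use your own plugin config file.
register_cluster() {
  local config_file="$1" name
  name="$(read_property "$config_file" atlas.cluster.name)"
  if [ -z "$name" ]; then
    echo "atlas.cluster.name not found in ${config_file}" >&2
    return 1
  fi
  atlas-cli cluster register --server https://atlas.example.com:21000 \
    --cluster-name "$name" --user admin --password
}

# Example call (the path is a placeholder):
#   register_cluster /path/to/your/atlas-plugin.properties
```

The function stops with an error if the property is missing, so a misnamed or empty configuration file is not silently registered under an empty name.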
5. Load catalogs on Atlas
You must tell Atlas what SEP catalogs and/or schemas and tables you want
tracked. This step loads the object names to be tracked. Thereafter, if there
are any changes in these objects, the Starburst Atlas plugin running on your SEP
cluster detects those changes and notifies Atlas.
“Change” here refers to a change in structure, such as a new column added to a
table, or a table deleted from a schema. SEP does not store data, so it is not
the job of the Atlas plugin to track changes in table data.
For each catalog on your SEP cluster whose objects you want to track in Atlas,
run the atlas-cli catalog register command. For example:
atlas-cli catalog register --server https://atlas.example.com:21000 \
--cluster-name fastqueries --user admin --password \
--starburst-jdbc-url "jdbc:trino://cluster.example.com:8080?user=starburst_service" \
--catalog tpch --schema tiny --table nation
See the Atlas CLI reference for further options.
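When several objects need to be tracked, the same command is repeated with different --catalog, --schema, and --table values. The following sketch prints one command per object so that you can review them before running anything. It uses only the options from the example above. The function names, the variable names, and the input format of one "catalog schema table" triple per line are assumptions made for this illustration.

```shell
# Connection values copied from the example in this document.
SERVER="https://atlas.example.com:21000"
CLUSTER="fastqueries"
JDBC_URL="jdbc:trino://cluster.example.com:8080?user=starburst_service"

# Print one `atlas-cli catalog register` command for a single table.
register_cmd() {
  printf 'atlas-cli catalog register --server %s --cluster-name %s --user admin --password --starburst-jdbc-url "%s" --catalog %s --schema %s --table %s\n' \
    "$SERVER" "$CLUSTER" "$JDBC_URL" "$1" "$2" "$3"
}

# Read "catalog schema table" triples from standard input, one per line,
# and print a command for each. Blank lines and # comments are skipped.
print_register_cmds() {
  local catalog schema table
  while read -r catalog schema table; do
    case "$catalog" in ''|'#'*) continue ;; esac
    register_cmd "$catalog" "$schema" "$table"
  done
}

# Dry run: prints two commands and executes nothing.
print_register_cmds <<'EOF'
tpch tiny nation
tpch tiny region
EOF
```

As in the example above, --password is given without a value. How atlas-cli obtains the password in that case is not covered here, so check the Atlas CLI reference before running the printed commands unattended.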
6. Restart cluster and test
When all SEP cluster objects are registered in Atlas, restart your cluster.
Test the Atlas integration by browsing with the Atlas web interface. Create a
new table and register that table with Atlas. Then add a column to that table
and make sure the change is reflected in Atlas.
Limitations
SEP’s support for Apache Atlas has the following limitations:
Once a cluster or catalog is registered on an Atlas server, it cannot be
unregistered.
There is no attempt to de-duplicate tables. For example, on a cluster
connected to other SEP clusters by means of the Starburst Stargate connector,
it is possible for the same table’s structure metadata to be loaded twice,
from a local catalog and from a remote catalog.