This article describes how to manage Databricks compute, including displaying, editing, starting, terminating, deleting, controlling access, and monitoring performance and logs. You can also use the Clusters API to manage compute programmatically.
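For example, here is a minimal sketch of listing compute through the Clusters API from Python. The requests dependency, the environment variables, and the printed fields are illustrative assumptions, not part of the product:

    # List all compute in the workspace via the Clusters API list endpoint.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]    # e.g. https://<your-workspace-url>
    token = os.environ["DATABRICKS_TOKEN"]  # a personal access token

    resp = requests.get(
        f"{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    for cluster in resp.json().get("clusters", []):
        print(cluster["cluster_id"], cluster["state"], cluster["cluster_name"])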
View compute
To view your compute, click Compute in the workspace sidebar.
On the left side are two columns: one indicating whether the compute has been pinned, and one showing the status of the compute. Hover over the status to get more information.
View compute configuration as a JSON file
Sometimes it can be helpful to view your compute configuration as JSON. This is especially useful when you want to create similar compute using the Clusters API. When you view an existing compute, go to the Configuration tab, click JSON in the top right of the tab, copy the JSON, and paste it into your API call. JSON view is read-only.
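For example, the copied JSON maps directly onto the body of a Clusters API create call. The following Python sketch assumes the requests library and uses placeholder configuration values; substitute the JSON you copied:

    # Create similar compute by sending the copied configuration JSON
    # to the Clusters API create endpoint.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]

    # Placeholder configuration: paste the JSON copied from the
    # Configuration tab here and adjust values as needed.
    config = {
        "cluster_name": "cloned-from-json",
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "n2-highmem-4",
        "num_workers": 2,
    }

    resp = requests.post(
        f"{host}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json=config,
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])  # the new compute's ID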
Pin a compute
30 days after a compute is terminated, it is permanently deleted. To keep an all-purpose compute configuration after a compute has been terminated for more than 30 days, an administrator can pin the compute. Up to 100 compute resources can be pinned.
Admins can pin a compute from the compute list or the compute detail page by clicking the pin icon.
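Pinning is also available through the Clusters API pin endpoint. A minimal Python sketch, with placeholder host, token, and cluster ID:

    # Pin a compute so its configuration is kept after termination.
    import os
    import requests

    requests.post(
        f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/pin",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={"cluster_id": "1234-567890-abcde123"},  # replace with your cluster ID
    ).raise_for_status()

The unpin endpoint accepts the same body and reverses the operation.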
Edit a compute
You can edit a compute’s configuration from the compute details UI.
Notebooks and jobs that were attached to the compute remain attached after editing.
Libraries installed on the compute remain installed after editing.
If you edit any attribute of a running compute (except for the compute size and permissions), you must restart it. This can disrupt users who are currently using the compute.
You can only edit a running or terminated compute. You can, however, update permissions for compute not in those states on the compute details page.
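Edits can also be made through the Clusters API edit endpoint, which expects a full configuration rather than only the changed fields. A rough Python sketch under that assumption, with placeholder values:

    # Edit a compute via the Clusters API: fetch the current spec,
    # change what you need, and send the full configuration back.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    cluster_id = "1234-567890-abcde123"  # replace with your cluster ID

    spec = requests.get(
        f"{host}/api/2.0/clusters/get",
        headers=headers,
        params={"cluster_id": cluster_id},
    ).json()

    # Carry over the writable fields and apply the change; a real script
    # would preserve more of `spec` than this sketch does.
    edited = {
        "cluster_id": cluster_id,
        "cluster_name": spec["cluster_name"],
        "spark_version": spec["spark_version"],
        "node_type_id": spec["node_type_id"],
        "num_workers": 4,  # the change: resize to four workers
    }

    requests.post(
        f"{host}/api/2.0/clusters/edit", headers=headers, json=edited
    ).raise_for_status()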
Clone a compute
To clone an existing compute, select Clone from the compute's kebab menu.
After you select Clone, the compute creation UI opens pre-populated with the compute configuration. The following attributes are NOT included in the clone:
Compute permissions
Attached notebooks
If you don't want to include the previously installed libraries in the cloned compute, click the drop-down menu next to the Create compute button and select Create without libraries.
Compute permissions
There are four permission levels for a compute: NO PERMISSIONS, CAN ATTACH TO, CAN RESTART, and CAN MANAGE. The table lists the abilities for each permission.
Important
Users with CAN ATTACH TO permissions can view the service account keys in the log4j file. Use caution when granting this permission level.
Workspace admins have the CAN MANAGE permission on all compute in their workspace. Users automatically have the CAN MANAGE permission on the compute they create.
Secrets are not redacted from a cluster's Spark driver log stdout and stderr streams. To protect sensitive data, by default, Spark driver logs are viewable only by users with CAN MANAGE permission on job, single user access mode, and shared access mode clusters. To allow users with CAN ATTACH TO or CAN RESTART permission to view the logs on these clusters, set the following Spark configuration property in the cluster configuration: spark.databricks.acl.needAdminPermissionToViewLogs false.
On No Isolation Shared access mode clusters, the Spark driver logs can be viewed by users with CAN ATTACH TO or CAN MANAGE permission. To limit who can read the logs to only users with the CAN MANAGE permission, set spark.databricks.acl.needAdminPermissionToViewLogs to true.
See Spark configuration to learn how to add Spark properties to a cluster configuration.
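As an illustration, the property goes in the spark_conf block of the compute configuration, for example in the JSON sent with a Clusters API create or edit call. The surrounding fields in this sketch are placeholders:

    # Fragment of a compute configuration; only spark_conf matters here.
    config = {
        "cluster_name": "shared-analytics",  # placeholder
        "spark_conf": {
            # Let CAN ATTACH TO / CAN RESTART users view the driver logs.
            "spark.databricks.acl.needAdminPermissionToViewLogs": "false",
        },
    }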
You must have the CAN MANAGE permission on a compute to configure compute permissions.
1. In the sidebar, click Compute.
2. On the row for the compute, click the kebab menu on the right, and select Edit permissions.
3. In Permission Settings, click the Select user, group or service principal… drop-down menu and select a user, group, or service principal.
4. Select a permission from the permission drop-down menu.
5. Click Add and click Save.
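Permissions can also be granted programmatically through the workspace Permissions API for clusters. In this Python sketch, the host, token, user, and cluster ID are placeholders; a PATCH adds or updates entries without replacing the existing access control list:

    # Grant CAN_RESTART on a compute to a user via the Permissions API.
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]
    cluster_id = "1234-567890-abcde123"  # replace with your cluster ID

    requests.patch(
        f"{host}/api/2.0/permissions/clusters/{cluster_id}",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={
            "access_control_list": [
                {
                    "user_name": "someone@example.com",  # placeholder principal
                    "permission_level": "CAN_RESTART",
                }
            ]
        },
    ).raise_for_status()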
Terminate a compute
To save compute resources, you can terminate a compute. The terminated compute's configuration is stored so that it can be reused (or, in the case of jobs, autostarted) at a later time. You can manually terminate a compute or configure the compute to terminate automatically after a specified period of inactivity. When the number of terminated compute exceeds 150, the oldest compute is deleted.
Unless a compute is pinned or restarted, it is automatically and permanently deleted 30 days after termination.
Terminated compute appear in the compute list with a gray circle at the left of the compute name.
When you run a job on a new Job compute (which is usually recommended), the compute terminates and is unavailable for restarting when the job is complete. On the other hand, if you schedule a job to run on an existing All-Purpose compute that has been terminated, that compute will autostart.
Manual termination
You can manually terminate a compute from the compute list (by clicking the square on the compute's row) or the compute detail page (by clicking Terminate).
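Programmatically, manual termination corresponds to the Clusters API delete endpoint, which terminates the compute but keeps its configuration. A minimal Python sketch with placeholder values:

    # Terminate (but do not permanently delete) a compute.
    import os
    import requests

    requests.post(
        f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/delete",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={"cluster_id": "1234-567890-abcde123"},  # replace with your cluster ID
    ).raise_for_status()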
Automatic termination
You can also set auto termination for a compute. During compute creation, you can specify an inactivity period in minutes after which you want the compute to terminate.
If the difference between the current time and the last command run on the compute is more than the inactivity period specified, Databricks automatically terminates that compute.
A compute is considered inactive when all commands on the compute, including Spark jobs, Structured Streaming, and JDBC calls, have finished executing. This does not include commands run by SSH-ing into the compute and running bash commands.
Warning
Compute do not report activity resulting from the use of DStreams. This means that an auto-terminating compute may be terminated while it is running DStreams. Turn off auto termination for compute running DStreams or consider using Structured Streaming.
Idle compute continue to accumulate DBU and cloud instance charges during the inactivity period before termination.
Configure automatic termination
You can configure automatic termination in the new compute UI. Ensure that the box is checked, and enter the number of minutes in the Terminate after ___ minutes of inactivity setting.
You can opt out of auto termination by clearing the Auto Termination checkbox or by specifying an inactivity period of 0.
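In the compute configuration JSON, the same setting is the autotermination_minutes field, as in this fragment (other required fields omitted; values are illustrative):

    # Fragment of a compute configuration showing auto termination.
    config = {
        "cluster_name": "auto-terminating-compute",  # placeholder
        "autotermination_minutes": 120,  # terminate after 120 idle minutes; 0 opts out
        # ...remaining fields such as spark_version and node_type_id...
    }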
Auto termination is best supported in the latest Spark versions. Older Spark versions have known limitations which can result in inaccurate reporting of compute activity. For example, compute running JDBC, R, or streaming commands can report a stale activity time that leads to premature compute termination. Please upgrade to the most recent Spark version to benefit from bug fixes and improvements to auto termination.
Unexpected termination
Sometimes a compute is terminated unexpectedly, not as a result of a manual termination or a configured automatic termination.
For a list of termination reasons and remediation steps, see the Knowledge Base.
Delete a compute
Deleting a compute terminates the compute and removes its configuration. To delete a compute, select Delete from the compute's kebab menu.
Warning
You cannot undo this action.
To delete a pinned compute, it must first be unpinned by an administrator.
You can also invoke the Clusters API endpoint to delete a compute programmatically.
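A minimal Python sketch of the permanent-delete endpoint, with placeholder values:

    # Permanently delete a compute. This cannot be undone.
    import os
    import requests

    requests.post(
        f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/permanent-delete",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={"cluster_id": "1234-567890-abcde123"},  # replace with your cluster ID
    ).raise_for_status()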
Restart a compute
You can restart a previously terminated compute from the compute list, the compute detail page, or a notebook. You can also invoke the Clusters API endpoint to start a compute programmatically.
Databricks identifies a compute using its unique cluster ID. When you start a terminated compute, Databricks re-creates the compute with the same ID, automatically installs all the libraries, and reattaches the notebooks.
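A minimal Python sketch of the start endpoint, with placeholder values:

    # Start a terminated compute. It is re-created with the same cluster ID,
    # and Databricks reinstalls its libraries and reattaches its notebooks.
    import os
    import requests

    requests.post(
        f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/start",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        json={"cluster_id": "1234-567890-abcde123"},  # replace with your cluster ID
    ).raise_for_status()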
Restart a compute to update it with the latest images
When you restart a compute, it gets the latest images for the compute resource containers and the VM hosts. It is your responsibility to schedule regular restarts for long-running compute, such as those used for processing streaming data, to keep the image up to date with the latest version.
Important
If you enable the compliance security profile for your account or your workspace, long-running compute is automatically restarted as needed during a scheduled maintenance window. This reduces the risk of an auto-restart disrupting a scheduled job. You can also force restart during the maintenance window. See Automatic cluster update.
Notebook example: Find long-running compute
If you are a workspace admin, you can run a script that determines how long each of your compute has been running, and optionally, restart them if they are older than a specified number of days. Databricks provides this script as a notebook.
If your workspace is part of the public preview of automatic compute update, you might not need this script. Compute restarts automatically if needed during the scheduled maintenance windows.
The first lines of the script define configuration parameters:
min_age_output: The maximum number of days that a compute can run. Default is 1.
perform_restart: If True, the script restarts any compute with age greater than the number of days specified by min_age_output. The default is False, which identifies long-running compute but does not restart them.
secret_configuration: Replace REPLACE_WITH_SCOPE and REPLACE_WITH_KEY with a secret scope and key name. For more details on setting up the secrets, see the notebook.
Warning
If you set perform_restart to True, the script automatically restarts eligible compute, which can cause active jobs to fail and reset open notebooks. To reduce the risk of disrupting your workspace's business-critical jobs, plan a scheduled maintenance window and be sure to notify the workspace users.
Identify and optionally restart long-running compute
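As a rough illustration of the approach only, and not the notebook's actual code, a Python sketch against the Clusters API might look like the following. It uses each compute's start_time field as an approximation of its age:

    # Sketch: report compute running longer than min_age_output days,
    # and optionally restart them.
    import os
    import time
    import requests

    host = os.environ["DATABRICKS_HOST"]
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    min_age_output = 1       # maximum number of days a compute may run
    perform_restart = False  # True restarts eligible compute; False only reports

    resp = requests.get(f"{host}/api/2.0/clusters/list", headers=headers)
    resp.raise_for_status()
    now_ms = time.time() * 1000

    for c in resp.json().get("clusters", []):
        if c["state"] != "RUNNING":
            continue
        age_days = (now_ms - c["start_time"]) / (1000 * 60 * 60 * 24)
        if age_days > min_age_output:
            print(f"{c['cluster_name']} ({c['cluster_id']}): {age_days:.1f} days")
            if perform_restart:
                requests.post(
                    f"{host}/api/2.0/clusters/restart",
                    headers=headers,
                    json={"cluster_id": c["cluster_id"]},
                ).raise_for_status()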