The merge function in Databricks Git folders merges one branch into another using `git merge`. Like a rebase, a merge operation combines the commit history from one branch into another branch; the only difference is the strategy it uses to achieve this. For Git beginners, we recommend using merge (over rebase) because it does not require force pushing to a branch and therefore does not rewrite commit history.
Rebase a branch on another branch
Access the Git Rebase operation by selecting it from the kebab menu in the upper right of the Git operations dialog.
Rebasing alters the commit history of a branch. Like `git merge`, `git rebase` integrates changes from one branch into another. Rebase does the following:
Saves the commits on your current branch to a temporary area.
Resets the current branch to the chosen branch.
Reapplies each individual commit previously saved on the current branch, resulting in a linear history that combines changes from both branches.
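The three steps above can be sketched with a toy model in Python. This is an illustration only, not how Git or Databricks implements rebase: branches are plain lists of commit labels, `rebase` is a hypothetical helper, and a trailing apostrophe marks a reapplied commit to show that it is a new commit rather than the original one.

```python
def rebase(current, target):
    """Toy model of a rebase; branches are lists of commit labels (hypothetical)."""
    # Find where the two branches stop sharing history.
    shared = 0
    while (
        shared < min(len(current), len(target))
        and current[shared] == target[shared]
    ):
        shared += 1

    # Step 1: save the commits that exist only on the current branch.
    saved = current[shared:]

    # Step 2: reset the current branch to the chosen branch.
    rebased = list(target)

    # Step 3: reapply each saved commit on top, producing a linear history.
    for commit in saved:
        rebased.append(commit + "'")
    return rebased


main = ["A", "B", "D", "E"]
feature = ["A", "B", "C1", "C2"]
print(rebase(feature, main))
```

In this sketch the feature branch ends up with the history of `main` followed by reapplied copies of `C1` and `C2`. Because the reapplied commits differ from the originals, the remote branch can only be updated with a force push.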
For an in-depth explanation of rebasing, see git rebase.
Warning
Using rebase can cause versioning issues for collaborators working in the same repo.
A common workflow is to rebase a feature branch on the main branch.
To rebase a branch on another branch:
From the Branch menu in the Git folders UI, select the branch you want to rebase.
Select Rebase from the kebab menu.
Select the branch you want to rebase on.
The rebase operation integrates changes from the branch you choose here into the current branch.
Databricks Git folders runs `git commit` and `git push --force` to update the remote Git repo.
Resolve merge conflicts
Merge conflicts happen when two or more Git users attempt to merge changes to the same lines of a file into a common branch and Git cannot choose the “right” changes to apply. Merge conflicts can also occur when a user attempts to pull or merge changes from another branch into a branch with uncommitted changes.
If an operation such as pull, rebase, or merge causes a merge conflict, the Git folders UI shows a list of files with conflicts and options for resolving the conflicts.
You have two primary options:
Use the Git folders UI to resolve the conflict.
Abort the Git operation, manually discard the changes in the conflicting file, and try the Git operation again.
When resolving merge conflicts with the Git folders UI, you must choose between manually resolving the conflicts in the editor or keeping all incoming or current changes.
Keep All Current or Take Incoming Changes
If you know you only want to keep all of the current or incoming changes, click the kebab to the right of the file name in your notebook pane and select either Keep all current changes or Take all incoming changes. Click the button with the same label to commit the changes and resolve the conflict.
Confused about which option to pick? The color of each option matches the respective code changes that it will keep in the file.
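To make the two options concrete, here is a minimal Python sketch of what keeping one side of every conflict means for a file that contains standard Git conflict markers. It is not the Databricks implementation. The helper name `resolve_conflicts` is hypothetical, and the mapping of the "current" side to the text between `<<<<<<<` and `=======` and the "incoming" side to the text between `=======` and `>>>>>>>` is an assumption about how the UI labels correspond to the markers.

```python
def resolve_conflicts(text, keep):
    """Keep one side of every conflict block (hypothetical helper).

    keep: "current" for the part between <<<<<<< and =======,
          "incoming" for the part between ======= and >>>>>>>.
    """
    if keep not in ("current", "incoming"):
        raise ValueError("keep must be 'current' or 'incoming'")

    resolved = []
    section = None  # None means we are outside a conflict block
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            section = "current"
        elif line.startswith("=======") and section == "current":
            section = "incoming"
        elif line.startswith(">>>>>>>") and section == "incoming":
            section = None
        elif section is None or section == keep:
            # Unconflicted lines are always kept; conflicted lines are kept
            # only when they belong to the chosen side.
            resolved.append(line)
    return "\n".join(resolved)


conflicted = "\n".join([
    "import os",
    "<<<<<<< HEAD",
    "x = 1",
    "=======",
    "x = 2",
    ">>>>>>> feature",
    "print(x)",
])
print(resolve_conflicts(conflicted, keep="incoming"))
```

The result contains no conflict markers, which is also what a manual resolution must achieve before the file can be marked as resolved.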
Manually Resolving Conflicts
Manual conflict resolution lets you determine which of the conflicting lines should be accepted in the merge. For merge conflicts, you resolve the conflict by directly editing the contents of the file with the conflicts.
To resolve the conflict, select the code lines you want to preserve and delete everything else, including the Git merge conflict markers. When you’re done, select Mark As Resolved.
If you decide you made the wrong choices when resolving merge conflicts, click the Abort button to abort the process and undo everything. Once all conflicts are resolved, click the Continue Merge or Continue Rebase option to resolve the conflict and complete the operation.
Git reset
In Databricks Git folders, you can perform a Git reset within the Databricks UI. Git reset in Databricks Git folders is equivalent to `git reset --hard` combined with `git push --force`.
Git reset replaces the branch contents and history with the most recent state of another branch. You can use this when edits are in conflict with the upstream branch, and you don’t mind losing those edits when you reset to the upstream branch.
Read more about `git reset --hard`.
Reset to an upstream (remote) branch
With `git reset` in this scenario:
You reset your selected branch (for example, `feature_a`) to a different branch (for example, `main`).
You also reset the upstream (remote) branch `feature_a` to `main`.
Important
When you reset, you lose all uncommitted and committed changes in both the local and remote version of the branch.
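The effect on both copies of the branch can be illustrated with a small Python toy model. This is only a sketch under stated assumptions: branches are lists of commit labels, `reset_branch` is a hypothetical helper, and the target is taken from the remote copy of the other branch.

```python
def reset_branch(local, remote, branch, target):
    """Toy model of the reset described above; all names are hypothetical."""
    # In the spirit of `git reset --hard`: the local branch takes on the
    # target's history, and its own commits are discarded.
    local[branch] = list(remote[target])
    # In the spirit of `git push --force`: the remote branch is overwritten
    # with the reset local branch.
    remote[branch] = list(local[branch])


remote = {"main": ["A", "B"], "feature_a": ["A", "X"]}
local = {"feature_a": ["A", "X", "Y"]}

reset_branch(local, remote, "feature_a", "main")
print(local["feature_a"], remote["feature_a"])
```

After the call, commits `X` and `Y` are no longer reachable from either copy of `feature_a`, which is why the reset should only be used when losing those edits is acceptable.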
To reset a branch to a remote branch:
In the Git folders UI from the Branch menu, choose the branch you want to reset.
Configure sparse checkout mode
Sparse checkout is a client-side setting that allows you to clone and work with only a subset of the remote repository’s directories in Databricks. This is especially useful if your repository’s size is beyond the Databricks supported limits.
You can use the Sparse Checkout mode when adding (cloning) a new repo.
In the Add Git folder dialog, open Advanced.
Select Sparse checkout mode.
In the Cone patterns box, specify the cone checkout patterns you want. Separate multiple patterns by line breaks.
At this time, you can’t disable sparse checkout for a repo in Databricks.
How cone patterns work
To understand how cone patterns work in sparse checkout mode, see the following diagram representing the remote repository structure.
If you select Sparse checkout mode but do not specify a cone pattern, the default cone pattern is applied. This includes only the files in the root directory and no subdirectories, resulting in a repo structure as follows:
Setting the sparse checkout cone pattern as `parent/child/grandchild` results in all contents of the `grandchild` directory being recursively included. The files immediately in the `/parent`, `/parent/child` and root directory are also included. See the directory structure in the following diagram:
You can add multiple patterns separated by line breaks.
Exclusion behaviors (`!`) are not supported in Git cone pattern syntax.
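The inclusion rules described in this section can be expressed as a short Python check. This is a minimal sketch of the behavior as stated above, not the Git or Databricks implementation. The function name `is_checked_out` and the example file paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_checked_out(file_path, cone_patterns):
    """Return True if a file would be present under the given cone patterns."""
    parent = PurePosixPath(file_path).parent

    # Files in the repository root are always included.
    if parent == PurePosixPath("."):
        return True

    for pattern in cone_patterns:
        cone = PurePosixPath(pattern.strip("/"))
        # Everything inside the cone directory is included recursively.
        if parent == cone or cone in parent.parents:
            return True
        # Files directly inside an ancestor of the cone directory are included,
        # but that ancestor's other subdirectories are not.
        if parent in cone.parents:
            return True
    return False


patterns = ["parent/child/grandchild"]
for path in [
    "README.md",
    "parent/file.txt",
    "parent/child/grandchild/deep/notebook.py",
    "parent/sibling/notebook.py",
]:
    print(path, is_checked_out(path, patterns))
```

With an empty pattern list the function accepts only root-level files, which matches the default cone pattern described above.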
Modify sparse checkout settings
Once a repo is created, the sparse checkout cone pattern can be edited from Settings > Advanced > Cone patterns.
Note the following behavior:
Removing a folder from the cone pattern removes it from Databricks if there are no uncommitted changes.
Adding a folder via editing the sparse checkout cone pattern adds it to Databricks without requiring an additional pull.
Sparse checkout patterns cannot be changed to remove a folder when there are uncommitted changes in that folder.
For example, a user edits a file in a folder and does not commit changes. She then tries to change the sparse checkout pattern to not include this folder. In this case, the pattern is accepted, but the actual folder is not deleted. She needs to revert the pattern to include that folder, commit changes, and then reapply the new pattern.
You can’t disable sparse checkout for a repo that was created with Sparse Checkout mode enabled.
Make and push changes with sparse checkout
You can edit existing files and commit and push them from the Git folder. When creating new folders or files, include them in the cone pattern you specified for that repo.
Including a new folder outside of the cone pattern results in an error during the commit and push operation. To fix it, edit the cone pattern to include the new folder you are trying to commit and push.
Patterns for a repo config file
The commit outputs config file uses patterns similar to gitignore patterns and does the following:
Positive patterns enable outputs inclusion for matching notebooks.
Negative patterns disable outputs inclusion for matching notebooks.
Patterns are evaluated in order for all notebooks.
Invalid paths or paths not resolving to `.ipynb` notebooks are ignored.
Positive pattern:
To include outputs from a notebook path `folder/innerfolder/notebook.ipynb`, use the following patterns:
folder/**
folder/innerfolder/note*
Negative pattern:
To exclude outputs for a notebook, check that none of the positive patterns match, or add a negative pattern at the correct position in the configuration file. Negative (exclude) patterns start with `!`:
!folder/innerfolder/*.ipynb
!folder/**/*.ipynb
!**/notebook.ipynb
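The interaction between positive and negative patterns can be sketched in Python. This is a simplified approximation, not the Databricks implementation: `pattern_to_regex` and `outputs_included` are hypothetical helpers, only `*`, `?`, and `**` are supported, and the sketch assumes that the last matching pattern decides the result, as it does for gitignore files.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simplified gitignore-style glob into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # one character within a path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def outputs_included(notebook_path, patterns):
    """Evaluate patterns in order; the last match wins (assumption)."""
    included = False  # assumption: no match means outputs are not included
    for raw in patterns:
        negative = raw.startswith("!")
        glob = raw[1:] if negative else raw
        if pattern_to_regex(glob).fullmatch(notebook_path):
            included = not negative
    return included


config = ["folder/**", "!folder/innerfolder/*.ipynb"]
print(outputs_included("folder/innerfolder/notebook.ipynb", config))
print(outputs_included("folder/other.ipynb", config))
```

Under these assumptions, placing the negative pattern after the positive one excludes the notebooks in `folder/innerfolder` while keeping outputs for the rest of `folder`. Reversing the order would let `folder/**` re-include them.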