Running with gitlab-runner 16.0.1 (79704081)
on #SERVER#
Preparing the "docker" executor
Using Docker executor with image python:3.10.12 ...
Pulling docker image python:3.10.12 ...
Using docker image sha256:23e11cf6844c334b2970fd265fb09cfe88ec250e1e80db7db973d69d757bdac4 for python:3.10.12 with digest docker.io/python@sha256:60ec661aff9aa0ec90bc10ceeab55d6d04ce7b384157d227917f3b49f2ddb32e ...
Preparing environment
Running on #RUNNER# via #SERVER#...
Getting source from Git repository 00:03
Fetching changes with git depth set to 50...
Initialized empty Git repository in #BUILD_GITDIR#
Created fresh repository.
Checking out #HASH# as detached HEAD (ref is test-build-change)...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:23e11cf6844c334b2970fd265fb09cfe88ec250e1e80db7db973d69d757bdac4 for python:3.10.12 with digest docker.io/python@sha256:60ec661aff9aa0ec90bc10ceeab55d6d04ce7b384157d227917f3b49f2ddb32e ...
shell not found
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1
That works fine if we pin it back to python:3.10.11.
A further bit of discovery: the new Docker images are built from a Debian 12 (bookworm) base image rather than the previous Debian 11 (bullseye) image, presumably because there was a high-severity OpenSSL vulnerability (CVE-2023-2650).
python:3.10.11 image on Docker Hub
python:3.10.12 image on Docker Hub
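For reference, one way to check which Debian release a given tag is currently based on, assuming Docker is available locally (these tags move over time, so the output may differ from what is described above):
# Print the base distribution each image reports
docker run --rm python:3.10.11 cat /etc/os-release | grep PRETTY_NAME
docker run --rm python:3.10.12 cat /etc/os-release | grep PRETTY_NAME
At the time of this thread, those reported Debian 11 (bullseye) and Debian 12 (bookworm) respectively.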
Is it possible that the change in the underlying OS base image could have also changed the shell configuration/availability for these images, such that it’s not holding hands with the Gitlab runner correctly anymore?
Could be.
I've experienced something similar to what you are seeing once with a Windows runner, when a wrong shell command was defined in the runner configuration.
AFAIK:
Every docker image provides one or more shells (terminals) that can be used by a runner to execute the script defined in .gitlab-ci.yml. This is a prerequisite for any script to run in the job, and I believe this might be the reason why your script part is not executing. E.g. if I use ubuntu:latest, it provides a "/bin/sh" shell and a "/bin/bash" shell → this means the Runner has to use one of those shells as well.
GitLab Runner supports different shells, depending on the platform - Types of shells supported by GitLab Runner | GitLab. The shell can be configured in the runner's config.toml file. Normally the default works, but this is where things can get mismatched.
I might be wrong as well, but this could be something to check.
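As a concrete illustration of the point above, here is a rough way to list which shells an image actually ships, assuming local Docker access (this is not the runner's own detection logic, which isn't shown in this thread):
# Show the shell binaries inside the image the failing job uses
docker run --rm python:3.10.12 ls -l /bin/sh /bin/bash /bin/dash
# Compare against an image that works with the runner
docker run --rm python:3.10.11 ls -l /bin/sh /bin/bash /bin/dash
As later posts in this thread show, both images do contain these shells, so the question becomes why the runner fails to detect them.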
Are you using your own GitLab runners or shared runners from gitlab.com? If you have your own runners, can you please share your config.toml with us?
P.S. Have you tried adding this to your config file?
Confirming essentially what @DrCuriosity wrote above – the images that fail here were rebuilt from bullseye to bookworm, but in some cases the release number was not bumped. Several workarounds are below, including using 3.10.11 if you previously relied on 3.10.
It’s not clear to me what the source problem is. A similar problem occurred many years ago and is referenced in "shell not found" when trying to use Ubuntu or Fedora image (#27614) · Issues · GitLab.org / gitlab-runner · GitLab, but that issue is still open! Some suggest a newer version of Docker fixes the problem. However, I don’t think that’s the right answer.
I suspect that gitlab-runner does actually have a problem, perhaps by relying on bash instead of using purely POSIX shell scripts. But I could not reproduce the problem by running a container directly with the same inputs. The source code of gitlab-runner is quite convoluted; even with debugging, I could not ascertain what is really going on.
I also cannot understand why there is a difference between Debian 11 and 12. In analyzing diffs across the exported containers, I could not understand why the third workaround (see below) would have the effect it does (the checks are sketched after the list):
On both exported filesystems, /bin/sh points to /bin/dash
On both exported filesystems, /bin/dash and /bin/bash are real executables, each about the same size as its counterpart on the other image.
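Roughly how those two checks can be reproduced against the public images (a sketch, not the poster's exact procedure, and it assumes local Docker access rather than exported filesystems):
# Where does /bin/sh point in each image?
docker run --rm python:3.10.11 readlink -f /bin/sh
docker run --rm python:3.10.12 readlink -f /bin/sh
# Are dash and bash real executables of comparable size in both?
docker run --rm python:3.10.11 ls -l /bin/dash /bin/bash
docker run --rm python:3.10.12 ls -l /bin/dash /bin/bash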
Perhaps gitlab-runner is invoking a scriptlet or the container in some way that its --debug mode does not indicate.
OK, taking a step back:
python:3.10.12 is seen with bullseye in digest python@sha256:aa79a3d35cb9787452dad51e17e4b6e06822a1a601f8b4ac4ddf74f0babcbfd5. There are no problems with this image.
However, the same version of Python, under the same minor version number, was released under bookworm with the digest python@sha256:a8462db480ec3a74499a297b1f8e074944283407b7a417f22f20d8e2e1619782. This image will fail without workarounds.
Workarounds
Use the digest of the last working image, as suggested above (see the sketch after this list for one way to find it).
Find the most recent minor version number that still works: For 3.10, it’s 3.10.11. And pray some idiot doesn’t rebuild and re-push that image.
Use the fugly hack suggested on the GitLab issue tracker:
image:
  name: python:3.10
  entrypoint: [ '/bin/bash', '-c', 'ln -snf /bin/bash /bin/sh && /bin/bash -c $0' ]
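For workaround 1 above: the digest of a known-good image can be taken from a previously successful job log (the runner prints it, as in the log at the top of this thread), or resolved from a tag that still points at a working build. A rough sketch, assuming local Docker access:
# Resolve what a tag currently points to and print its pinnable digest
docker pull python:3.10.11
docker image inspect --format '{{index .RepoDigests 0}}' python:3.10.11
# The printed python@sha256:... reference can then be used as the job's image
# instead of the moving tag.
In .gitlab-ci.yml, that python@sha256:... reference goes in the image: line in place of the tag.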
Are you using your own GitLab runners or shared runners from gitlab.com? If you have your own runners, can you please share your config.toml with us?
I’m working with a GitLab instance internal to an institution: Community Edition v16.0.2, and the runner is currently gitlab-runner 16.0.1 (79704081). The runner configuration is locked down and not available to me. I’ll see if I can find the right person to make aware of this thread, though.
I’m having the same issues. Running gitlab-runner v. 16.0.2.
My config.toml:
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
session_timeout = 1800

[[runners]]
name = "*************************************************"
url = "https://gitlab.com/"
id = 22901457
token = "*********************************"
token_obtained_at = 2023-04-24T16:18:40Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker"

[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
As mentioned earlier, this appears to be a problem with any image built on bookworm. Looking at the projects in my GitLab instance, we use different containers for different tasks. Our composer images are built on alpine and run fine. However, the latest versions of node, python, and php use bookworm, and those all give the "shell not found" error. If I change the job that uses node:latest to node:18-alpine, it works.
I tried specifying an entrypoint for the image but got the following error:
install_npm_dependencies:
  image:
    name: "node:latest"
    entrypoint: ["/bin/bash"] # also tried /bin/sh, /usr/bin/bash, and /usr/bin/sh
/usr/bin/sh: /usr/bin/sh: cannot execute binary file
When the entrypoint is set to /usr/bin/sh, I get a message saying that it can't open the file. When I run the container locally with either /bin/bash or /usr/bin/sh, it works.
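For reference, a local test along those lines might look like the following (a sketch, not the poster's exact commands); if these succeed on a workstation but the job still fails, the problem is likely on the runner host rather than in the image:
# Try the same entrypoints directly, outside the runner
docker run --rm --entrypoint /bin/bash node:latest -c 'echo bash ok'
docker run --rm --entrypoint /usr/bin/sh node:latest -c 'echo sh ok'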
I can confirm that this
entrypoint: [ '/bin/bash', '-c', 'ln -snf /bin/bash /bin/sh && /bin/bash -c $0' ]
works. Apparently you have to override the entrypoint. I've also set shell = "bash" in the [[runners]] section of config.toml, if that matters.
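For anyone wondering what that entrypoint override actually does, here is the same trick broken out with comments (an annotated sketch of the mechanism, not an explanation of the runner's internals):
# entrypoint: [ '/bin/bash', '-c', 'ln -snf /bin/bash /bin/sh && /bin/bash -c $0' ]
#   ln -snf /bin/bash /bin/sh   -- repoint the /bin/sh symlink at bash
#   /bin/bash -c $0             -- re-run whatever command Docker appended after the
#                                  entrypoint; with bash -c, that first extra argument
#                                  shows up as $0 inside the quoted string
# The $0 behaviour is plain bash semantics and can be seen without Docker:
bash -c 'echo "the appended argument becomes \$0: $0"' 'echo hello'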
In my case gitlab-runner’s shell detection script was failing to stat the available shell executables due to an incompatibility between the container and the host, thus returning failure for every check and giving up with the “shell not found” error.
This sometimes happens when running bleeding edge images on older hosts, but typically it's more obvious and often presents itself as a filesystem permissions error or some other system call failure. Essentially, the binaries/libraries in the container are using new/modified system calls that dockerd/containerd's seccomp layer doesn't understand yet. Updating the host kernel and container runtime tends to fix this.
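A rough way to test that theory on the affected runner host (a diagnostic sketch only; disabling seccomp is not something to leave enabled for real jobs):
# Run a trivial command in the problem image with the default seccomp profile...
docker run --rm python:3.10.12 /bin/sh -c 'echo default profile ok'
# ...and again with seccomp filtering disabled
docker run --rm --security-opt seccomp=unconfined python:3.10.12 /bin/sh -c 'echo unconfined ok'
# A difference in behaviour between the two points at the host's container runtime
# being too old for the image; check the versions that would need updating:
docker version --format 'server: {{.Server.Version}}'
uname -r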
Thanks @rpetti!
We faced the same kind of issue while using an Oracle Linux 9 build image on an older VM. Akash came across your comment, and your insights have been helpful to us at OpenText.