apr 15 09:08:34 xxxxxxxx systemd[1]: Starting Elasticsearch...
apr 15 09:09:49 xxxxxxxx systemd[1]: elasticsearch.service: start operation timed out. Terminating.
apr 15 09:09:50 xxxxxxxx systemd[1]: elasticsearch.service: Failed with result 'timeout'.
apr 15 09:09:50 xxxxxxxx systemd[1]: Failed to start Elasticsearch.
xxxxxxxx@xxxxxxxx:~$ sudo systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Fri 2022-04-15 09:09:50 CEST; 18s ago
Docs: https://www.elastic.co
Process: 759 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=143)
Main PID: 759 (code=exited, status=143)
Fortunately it happened on the test system and not on the main server.
I did the same operation on two other Graylog systems identical to the one that failed, and there the problem did not occur.
On the system that failed, restarting the service manually gets everything working again.
What could have caused it?
I'm reluctant to update the main server before figuring out how to fix this problem should it re-emerge.
I would not like to have to resort to manual restart of the service.
Thanks for taking the time.
Hello @alessio.dapelo
There could be a couple of different reasons why it failed to start.
I need to ask a couple of questions.
What did the Elasticsearch log files show before and after the failed restart?
vi /var/log/elasticsearch/elasticsearch.log
When this issue occurred, was journalctl checked by any chance? journalctl is used to view the systemd logs.
root# journalctl
alessio.dapelo:
On the system that failed, restarting the service manually gets everything working again.
Correct me if I'm wrong, but after you restarted Elasticsearch it started working again?
If so, it might be something with the system and not with Graylog, though I'm not 100% sure. I would need to see more data. If this is correct, it doesn't seem to be a configuration issue.
alessio.dapelo:
What could have caused it?
Finding out would take a complete scan of any log files that could show an interruption of the elasticsearch service.
alessio.dapelo:
I would not like to have to resort to manual restart of the service
Not sure how this upgrade && update was executed. More details would be needed.
Checklist:
1. Ensure Elasticsearch is set to start on reboot.
systemctl enable elasticsearch
2. When upgrades are applied, it is suggested to start Elasticsearch first, wait until the service is fully operational, then start the MongoDB service. Once both of those services are running fine, the next step is to start the Graylog service. These steps have never failed me.
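The start order from step 2 can be sketched as a small script. This is only an illustration under assumptions: the unit names `mongod` and `graylog-server`, the URL `http://127.0.0.1:9200`, and the function names are not from this thread and may differ on your system. It treats a cluster health status of at least yellow as "fully operational", which is a judgment call.

```shell
# Assumed Elasticsearch address; adjust to your setup.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Poll the cluster health API until it reports at least "yellow",
# or give up after the given number of tries.
# $1 = number of tries (default 30), $2 = seconds between tries (default 5)
wait_for_es() {
    tries="${1:-30}"
    delay="${2:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Start the services in order and stop at the first failure.
# Unit names are assumptions (mongod may be called mongodb on some systems).
start_stack() {
    systemctl start elasticsearch || return 1
    wait_for_es "$@" || { echo "Elasticsearch did not come up" >&2; return 1; }
    systemctl start mongod || return 1
    systemctl start graylog-server || return 1
}
```

Run as root, `start_stack` uses the defaults; `start_stack 60 5` would wait up to roughly five minutes for Elasticsearch before giving up.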
Hope that helps
EDIT: This also may help
How to prevent systemd service start operation from timing out | sleeplessbeastie's notes
Thanks for your answer.
I’m trying to analyze the logs you indicated.
To answer some of your questions: yes, restarting the elasticsearch service makes everything work again.
As for how the update was done: having set up the various repositories, I simply ran apt-get update, upgrade and dist-upgrade from the terminal, choosing not to replace the server.conf file.
I performed the same procedure on two other identical virtual machines without running into this problem.
Would it help to increase the wait time before a timeout is reported?
This is my current configuration:
XXXX@XXXX:~$ sudo systemctl show elasticsearch | grep ^Timeout
[sudo] password for XXXX:
TimeoutStartUSec=1min 15s
TimeoutStopUSec=infinity
TimeoutAbortUSec=infinity
TimeoutCleanUSec=infinity
XXXX@XXXX:~$
alessio.dapelo:
Would it help to increase the wait time before a timeout is reported?
I'm not sure that will help. We need to know exactly why that service failed. systemd reports a timeout when the service has not finished starting within TimeoutStartSec (1min 15s in your output); at that point it terminates the process.
So increasing the timeout may only delay the inevitable, which is the error we're seeing.
What is needed is more logs to identify the issue.
Executing systemctl status elasticsearch -l may show more details on this issue.
Also, using journalctl to show the system logs is helpful when troubleshooting services.
Examples:
journalctl -ef
Jump to the end of the journal (-e) and enable follow mode (-f).
journalctl -u elasticsearch
This will display all messages generated by, and about, the elasticsearch.service.
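To narrow the journal down to the failed start, the same command can be limited to a time window. A small sketch, assuming the timestamps from the status output earlier in this thread (2022-04-15, between roughly 09:08 and 09:10); the helper name `es_journal_window` is made up for illustration:

```shell
# Hypothetical helper: show journal entries for the elasticsearch unit
# between two timestamps, without a pager, so the output can be pasted
# into a reply.
# $1 = start time, $2 = end time (e.g. "2022-04-15 09:05:00")
es_journal_window() {
    since="$1"
    until_ts="$2"
    journalctl -u elasticsearch --since "$since" --until "$until_ts" --no-pager
}

# Example call (commented out; times taken from the failed start above):
# es_journal_window "2022-04-15 09:05:00" "2022-04-15 09:15:00"
```

If the exact time is not known, `journalctl -b -u elasticsearch` limits the output to the current boot instead.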
alessio.dapelo:
As for how the update was done: having set up the various repositories, I simply ran apt-get update, upgrade and dist-upgrade from the terminal, choosing not to replace the server.conf file.
I have done that also, but now when I upgrade/update Graylog I make it a point to shut down the Graylog service, run the updates, then start Graylog back up and tail -f the log files. A few times I have caught problems that showed up during the start-up process. If Graylog is the only service that requires updates I do not stop the service; this also depends on which version is going to be installed.
With Elasticsearch I tend to check the release notes for that version to ensure an easy upgrade process. Sometimes it may require a restart of the ES service. This also depends on which ES version is being updated to.
I'll do some more checks on the system logs.
At the moment, using the command
sudo systemctl edit --full elasticsearch.service
I increased the timeout, and after restarting the virtual machine several times to test, the elasticsearch service has always started normally.
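As an alternative to `systemctl edit --full`, which copies the whole unit file into /etc/systemd/system, the start timeout alone can be overridden with a drop-in file. A minimal sketch, assuming the standard drop-in directory for this unit; the function name, the file name `startup-timeout.conf` and the 180-second default are made up for illustration:

```shell
# Hypothetical helper: write a systemd drop-in that overrides only
# TimeoutStartSec for elasticsearch.service.
# $1 = drop-in directory (default: the unit's standard override directory)
# $2 = timeout in seconds (default: 180, an arbitrary example value)
write_timeout_dropin() {
    unit_dir="${1:-/etc/systemd/system/elasticsearch.service.d}"
    timeout="${2:-180}"
    mkdir -p "$unit_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$unit_dir/startup-timeout.conf"
}

# Afterwards, as root, systemd has to re-read its unit files:
#   systemctl daemon-reload
```

Once the file is written and `systemctl daemon-reload` has run, `systemctl show elasticsearch | grep ^Timeout` should report the new TimeoutStartUSec value. Running `systemctl edit elasticsearch.service` without `--full` creates the same kind of drop-in interactively.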