Hi all,
recently I got complaints about some mailboxes (all get fetched from external pop3 accounts) not receiving mails for hours (randomly for about 3 or 4, sometimes even 8 hours).
After checking the logs I found that endeed some of the accounts fail in SSL/TLS handshaking and so don't connect and don't fetch new mails from pop3.
The strange thing is, if I trigger by the "Download now" button within one of the external accounts the immediate fetch works. As well the first try after a hms restart always seems to work. All of the accounts (the working and the ones with the handshake error fetch from the same server (XX.XXX.XX.X) with the same settings except the last two lines in the log showing YYY.YYY.YY.YYY, which is another server. All accounts are set up identically (fetch every 2 minutes, use SSL/TLS, Anti-SPAM and Anti-Virus active, delete after 31 days), at least as I tried to double checked them all carefully.
Disabling all pop3 fetches one by one and renenabling them did work for the effected ones just one time. After that, the handshakes starts failing again while the other accounts which already worked before stay working.
In total there are about 20 accounts using pop3 collect from which I think nearly 25-50% do have the handshake problem.
"DEBUG" 130648 "2019-08-20 06:19:13.646" "TCP connection started for session 589"
"DEBUG" 130648 "2019-08-20 06:19:13.647" "Performing SSL/TLS handshake for session 589. Verify certificate: True, Expected remote host name: pop.XXX.de"
"DEBUG" 130648 "2019-08-20 06:19:13.648" "TCP connection started for session 590"
"DEBUG" 108256 "2019-08-20 06:19:13.648" "Certificate verification succeeded for session 578."
"DEBUG" 130648 "2019-08-20 06:19:13.649" "Performing SSL/TLS handshake for session 590. Verify certificate: True, Expected remote host name: pop.XXX.de"
"DEBUG" 67736 "2019-08-20 06:19:13.649" "Certificate verification succeeded for session 579."
"DEBUG" 120292 "2019-08-20 06:19:13.650" "TCP connection started for session 588"
"DEBUG" 92100 "2019-08-20 06:19:13.651" "Certificate verification succeeded for session 580."
"DEBUG" 120292 "2019-08-20 06:19:13.652" "Performing SSL/TLS handshake for session 588. Verify certificate: True, Expected remote host name: pop.YYY.de"
"TCPIP" 19560 "2019-08-20 06:19:13.653" "TCPConnection - TLS/SSL handshake failed. Session Id: 589, Remote IP: XX.XXX.XX.X, Error code: 1, Message: stream truncated"
"DEBUG" 19560 "2019-08-20 06:19:13.654" "Ending session 589"
"DEBUG" 58612 "2019-08-20 06:19:13.655" "Completed retrieval of messages from external account."
"TCPIP" 19560 "2019-08-20 06:19:13.655" "TCPConnection - TLS/SSL handshake failed. Session Id: 590, Remote IP: XX.XXX.XX.X, Error code: 1, Message: stream truncated"
"DEBUG" 19560 "2019-08-20 06:19:13.657" "Ending session 590"
"TCPIP" 67736 "2019-08-20 06:19:13.657" "TCPConnection - TLS/SSL handshake completed. Session Id: 577, Remote IP: XX.XXX.XX.X, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"DEBUG" 57144 "2019-08-20 06:19:13.657" "Completed retrieval of messages from external account."
"DEBUG" 92100 "2019-08-20 06:19:13.660" "Certificate verification succeeded for session 584."
"TCPIP" 67736 "2019-08-20 06:19:13.665" "TCPConnection - TLS/SSL handshake completed. Session Id: 578, Remote IP: XX.XXX.XX.X, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"TCPIP" 67736 "2019-08-20 06:19:13.669" "TCPConnection - TLS/SSL handshake completed. Session Id: 579, Remote IP: XX.XXX.XX.X, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"TCPIP" 67736 "2019-08-20 06:19:13.671" "TCPConnection - TLS/SSL handshake completed. Session Id: 580, Remote IP: XX.XXX.XX.X, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"DEBUG" 67736 "2019-08-20 06:19:13.678" "Certificate verification succeeded for session 588."
"TCPIP" 92100 "2019-08-20 06:19:13.693" "TCPConnection - TLS/SSL handshake completed. Session Id: 584, Remote IP: YYY.YYY.YY.YYY, Version: TLSv1.2, Cipher: ECDHE-RSA-AES128-GCM-SHA256, Bits: 128"
"TCPIP" 92100 "2019-08-20 06:19:13.710" "TCPConnection - TLS/SSL handshake completed. Session Id: 588, Remote IP: YYY.YYY.YY.YYY, Version: TLSv1.2, Cipher: ECDHE-RSA-AES128-GCM-SHA256, Bits: 128"
What I already tried:
- updating to the newest (I know "non porduction") build (HMS 5.7.0-B2484 as it already was on 5.7.0-2469) --> no difference
- disabling all pop3 accounts one by one and renenabling --> works just after enabling one time
- fetching manually by triggering "Download now" button --> works - as far as I can tell allways
- deleting one of the affected external pop3 accounts and creating it new --> works one time but fetches again all mails not having alrady been deleted on pop3 server and so messes up the users mailbox - don't like to try that again
- enabling and disabling of "TLS v1.3" --> no difference
- enabling and disabling of "Verify remote server SSL/TLS certificate" --> no difference
- tried to somehow install the certifcate of the pop3 server locally on hms --> no difference
Does anyone by any chance have a clue what this can be and how to fix it?
Many thanks in advance for any hint or help!
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
Message: stream truncated
It means a Winsock2 server socket in LISTENING MODE (in this case hMailServer.exe) did receive a few TCP packets but not all follow up fragments packets
needed to get the complete network payload. In this case, the received Socket DATA is dropped entirely.
You mentioned, sometimes it works but sometimes not.
Overall this is a clear indication there is something problematic going on with your Network, because you have Network Packet loss.
This Problem can't be resolved by hMailServer, it needs to be resolved by your Network settings, by your Firewall or your Internet Connection Provider.
you're telling there are about 20 accounts popping.
are they on same provider? if so are they starting to pop more or less on same time?
AFAIK 1&1 Co. mail providers (gmx, web.de) restrict too frequent and/or simultaneus POP connections.
did you try to pop in longer periods (say every 10 minutes) and put some minutes delay after each account?
Katip
HMS 5.7, MariaDB 10.4.10, SA 4.0.0, ClamAV 0.103.8
From those logs, I see session 589 and 590 failed which, if your redacting is true, go to the same server ("pop.XXX.de") where the other that succeed go to pop.YYY.de.
I also note that they all try to connect at the same moment of
06:19:13
.
Are you sure that the receiving server(s) allow so many multiple connections at exactly the same time and/or so frequent? Try staggering the retry interval between the accounts that connect to pop.XXX.de (in this example) to that there are not so many simultaneous connections.
EDIT: there you go, just like Katip said! (he beat me to it).
5.7 on test.
SpamassassinForWindows 3.4.0
spamd service
AV:
Clamwin
+
Clamd
service +
sanesecurity defs
:
https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829
Thanks to all of you for your answers and comments!! It really helps having more eyes on things like this
You mentioned, sometimes it works but sometimes not.
Overall this is a clear indication there is something problematic going on with your Network, because you have Network Packet loss.
This Problem can't be resolved by hMailServer, it needs to be resolved by your Network settings, by your Firewall or your Internet Connection Provider.
I'm note sure if this is true. Because why does it allways happen to the same accounts and to some it does never - if it is really a general network problem? But I don't want to exclude this completely.
Any advice for a simple test?
you're telling there are about 20 accounts popping.
are they on same provider? if so are they starting to pop more or less on same time?
AFAIK 1&1 Co. mail providers (gmx, web.de) restrict too frequent and/or simultaneus POP connections.
did you try to pop in longer periods (say every 10 minutes) and put some minutes delay after each account?
3 are popping from 1&1
18 are popping from an external exchange server
Good point!
Did not try longer periods yet, will test this. As far as I know there is no possibility to set something like a delay between different accounts, right?
From those logs, I see session 589 and 590 failed which, if your redacting is true, go to the same server ("pop.XXX.de") where the other that succeed go to pop.YYY.de.
I also note that they all try to connect at the same moment of
06:19:13
.
Are you sure that the receiving server(s) allow so many multiple connections at exactly the same time and/or so frequent? Try staggering the retry interval between the accounts that connect to pop.XXX.de (in this example) to that there are not so many simultaneous connections.
EDIT: there you go, just like Katip said! (he beat me to it).
Somehow yes, 18 accounts go to pop.XXX.de which is the external exchange server (in this part of the log 2 of them are failing, correct. But in sum there are much more).
3 go to pop.YYY.de which is 1&1 (as far as I can see they always work, maybe because they are only three at the same time, yes).
You are right, I am not sure! Maybe we reduced the fetching interval at some time and the problem started with that.
I will try now to set different intervals to hopefully avoid to many simlutaneous connections.
If there is a way to set a real "delay" between fetching of different accounts, please let me know. Don't see the possibility.
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
Are you using 1&1 as Broadband DSL-Provider?
If you are using a Fritzbox /or different DSL-Router, check your DSL-Sync / Noise reduction level.
Restart your DSL-Router, update your Firmware and ping random ip addresses with -t against gmx.net and make sure there are no
Timeouts. If there are Timeouts, call technical support hotline of 1&1 DSL-Support and tell them about DSL Sync Errors and demand fixing.
Maybe you need a better DSL-Router which can handle your DSL-Line better at all.
Bottom line: "Truncated" means in all possible cases, a few Datagrams where received but other Datagrams where crippled and unrepairable
by the TCP/IP-Stack of your Windows Operating system or expected Datagrams to complete the data read from socket wasn't completed due the lack
of correct data.
Anyway:
There is something trashing/crippling your Network data while it travels from/to your hMailServer.
If there is a way to set a real "delay" between fetching of different accounts, please let me know. Don't see the possibility.
not that i know. it starts fetching as soon as you create/save account or click on manual download. then delay starts counting.
workaround: make manual downloads and wait for desired intervall long to start each next manual download. after you've done all, keep it so.
Katip
HMS 5.7, MariaDB 10.4.10, SA 4.0.0, ClamAV 0.103.8
Are you using 1&1 as Broadband DSL-Provider?
If you are using a Fritzbox /or different DSL-Router, check your DSL-Sync / Noise reduction level.
Restart your DSL-Router, update your Firmware and ping random ip addresses with -t against gmx.net and make sure there are no
Timeouts. If there are Timeouts, call technical support hotline of 1&1 DSL-Support and tell them about DSL Sync Errors and demand fixing.
Maybe you need a better DSL-Router which can handle your DSL-Line better at all.
No, it's a Telekom V-DSL line.
We are using a Lancom Router with up to date firmware connected to a Speedport Modem. I don't know how to check nosie reduction on these yet, I would know it for the Fritzbox. So I will do some research for new firmware of the Speedport and how to analyse on this setup. And maybe in the end get some new hardware or at least do tests with some other.
not that i know. it starts fetching as soon as you create/save account or click on manual download. then delay starts counting.
workaround: make manual downloads and wait for desired intervall long to start each next manual download. after you've done all, keep it so.
Hm, this will only work until the next restart of HMS, right? Which will be every evening after backup job script. So not a really efficient workaround in my opinion.
But thanks for the idea and at least it is just good for testing it!
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
Hm, this will only work until the next restart of HMS, right?
no no, i think effectively rescheduled immediately after manual check. just try and see.
Katip
HMS 5.7, MariaDB 10.4.10, SA 4.0.0, ClamAV 0.103.8
Yes, I understood. I ment it will get back to start all fetches at the same time when hms gets restarted.
So it will only fix it for one day in my case.
But anyway as I can now see it even does not. I adjusted all 18 accounts to different intervalls, some have 2 minutes, some 3, some 4 and some 5 minutes. It seemed to get better and I got less handshake errors. But after a while some accounts again stopped fetching.
In addition my feeling is that even after I modified all accounts of course at a different time
I still get something like this (no hms restart yet since yesterday evening):
"TCPIP" 120292 "2019-08-20 18:22:36.257" "TCPConnection - TLS/SSL handshake failed. Session Id: 8597, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 83212 "2019-08-20 18:22:36.257" "TCPConnection - TLS/SSL handshake failed. Session Id: 8598, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 120292 "2019-08-20 18:22:36.257" "TCPConnection - TLS/SSL handshake failed. Session Id: 8599, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 120292 "2019-08-20 18:22:36.272" "TCPConnection - TLS/SSL handshake failed. Session Id: 8600, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 120292 "2019-08-20 18:22:36.272" "TCPConnection - TLS/SSL handshake failed. Session Id: 8601, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 120292 "2019-08-20 18:22:36.272" "TCPConnection - TLS/SSL handshake failed. Session Id: 8602, Remote IP: XXX.XXX.XX.XXX, Error code: 1, Message: stream truncated"
"TCPIP" 137356 "2019-08-20 18:22:36.335" "TCPConnection - TLS/SSL handshake completed. Session Id: 8594, Remote IP: XXX.XXX.XX.XXX, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"TCPIP" 9588 "2019-08-20 18:22:36.335" "TCPConnection - TLS/SSL handshake completed. Session Id: 8595, Remote IP: XXX.XXX.XX.XXX, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"TCPIP" 137356 "2019-08-20 18:22:36.335" "TCPConnection - TLS/SSL handshake completed. Session Id: 8596, Remote IP: XXX.XXX.XX.XXX, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
"TCPIP" 9588 "2019-08-20 18:22:36.335" "TCPConnection - TLS/SSL handshake completed. Session Id: 8593, Remote IP: XXX.XXX.XX.XXX, Version: TLSv1.2, Cipher: AES128-GCM-SHA256, Bits: 128"
So it looks like they still start somehow synchronized, not sure how and why... And it is really hard to find out which of them fail and need to get out of synch by doing a manual fetch.
I would have expected that if I modify one by one they get a new "start fetch time" at the timepoint I click OK and than the intervall gets added to that timepoint and everything gets rolling and as I choosed different intervalls they should stay out off sync and never hit the same timpoint again.
But even if I find one of the failing accounts and hit "Download now" it works for a wile and after some hours it stopps again.
My only idea about this behavoir is that saving an external account sets (maybe) a new "global" starting fetch time point for all accounts.
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
You cant't fix this Problem by changing or tweaking hMailServer or Windows settings. Your DSL signal cuts TCP Packets which are required to decode the Email message content. At this level, only fixing your DSL Problem will help. Or if its not DSL its your LAN cables or a switch which cripples hMailServers connections.
Ill give you a little idea for scheduling External Downloads.
The script below is a VBS script I have (eg, "Run_Download_prompt.vbs") that prompts me to choose a specific external down. Depending on the number I choose it then runs the download associated to that account (note there are 2x downloads for 'sylvester' to choose from).
You can use this as an example of how to created a standalone script that, with the use of a WAIT or windows 'timeout.exe' within it, you can cycle through your downloads firing off the individual downloads separately and therefore not simulataneously. Create a VBS, stick it in to Task Scheduler, then sit back knowing your downloads are separated. (Remember that the script below uses prompting but this is just a starter (remove the prompting)).
You cant't fix this Problem by changing or tweaking hMailServer or Windows settings. Your DSL signal cuts TCP Packets which are required to decode the Email message content. At this level, only fixing your DSL Problem will help. Or if its not DSL its your LAN cables or a switch which cripples hMailServers connections.
Thanks for the hints! I will try finding the root cause. I started to investigate this yesterday night and can indeed see some "TCP retransition errors" in wireshark, about 0.7%. Not nowing if there should be zero or how much are ok. Will have to trow a closer look.
So there seems to be at least an additional problem but I have to contradict that it will not be possible to fix or tweak it. More below...
Ill give you a little idea for scheduling External Downloads.
The script below is a VBS script I have (eg, "Run_Download_prompt.vbs") that prompts me to choose a specific external down. Depending on the number I choose it then runs the download associated to that account (note there are 2x downloads for 'sylvester' to choose from).
You can use this as an example of how to created a standalone script that, with the use of a WAIT or windows 'timeout.exe' within it, you can cycle through your downloads firing off the individual downloads separately and therefore not simulataneously. Create a VBS, stick it in to Task Scheduler, then sit back knowing your downloads are separated. (Remember that the script below uses prompting but this is just a starter (remove the prompting)).
Man, this is really awesome!! Thanks a lot for the script. I fiddeled around for an hour and changed it to my needs. In the end I ended up with a task running every 2 minutes and starting the vbs script with an argument passed. If the argument is set it loops through all domains and within all active accounts and active external pop3 accounts. Between every fetch I do a 1 second sleep. Worked like a charme for about 5 hours now without a single handshake error!
So this really seems to be a solution for me
Thanks again to all of you who contributed here!
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
Man, this is really awesome!! Thanks a lot for the script. I fiddeled around for an hour and changed it to my needs. In the end I ended up with a task running every 2 minutes and starting the vbs script with an argument passed. If the argument is set it loops through all domains and within all active accounts and active external pop3 accounts. Between every fetch I do a 1 second sleep. Worked like a charme for about 5 hours now without a single handshake error!
So this really seems to be a solution for me
Thanks again to all of you who contributed here!
Care to share the script? It will be useful for others.
5.7 on test.
SpamassassinForWindows 3.4.0
spamd service
AV:
Clamwin
+
Clamd
service +
sanesecurity defs
:
https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829
Care to share the script? It will be useful for others.
Sure, I like too! Shall I do it directly here or better in the Scripting forum and post the link here as reference?
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit
HMS 5.7.0
-B2484 (x64) (DB version 5700) - MySQL 8.0.15
ClamWin0.99.4
+ Clamd service -
SpamAssassin 3.4.1.38
Windows Server 2012 - 64Bit