I'm having trouble starting HPC Cluster Manager and Job Manager locally on the Head Node, getting the following error almost every time (sometimes it works):
Unable to connect to the head node.
The connection to the management service failed. detail error: Microsoft.Hpc.RetryCountExhaustException: Retry Count of RetryManager is exhausted. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.InternalBind(EndPoint localEP)
at System.Net.Sockets.Socket.BeginConnectEx(EndPoint remoteEP, Boolean flowContext, AsyncCallback callback, Object state)
at System.Net.Sockets.Socket.UnsafeBeginConnect(EndPoint remoteEP, AsyncCallback callback, Object state)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Hpc.HttpClientExtension.<>c__DisplayClass5_0.<
Hi SvenssonOscar,
This issue could be caused by depletion of the dynamic (ephemeral) TCP port range, i.e. the maximum number of user ports for TCP connections. You can run the following commands to check and increase the range.
netsh int ipv4 show dynamicport tcp
netsh int ipv4 set dynamicport tcp start=10000 num=55536
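To confirm whether the dynamic range is actually being exhausted, a quick PowerShell check like the one below may help. This is only a sketch; adjust $start and $num to whatever the show command reports on your head node.

# count local TCP ports currently bound inside the configured dynamic range
$start = 10000      # first dynamic port (from the netsh output above)
$num   = 55536      # number of dynamic ports
$inUse = @(Get-NetTCPConnection |
    Where-Object { $_.LocalPort -ge $start -and $_.LocalPort -lt ($start + $num) }).Count
"$inUse of $num dynamic ports currently in use"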
Regards,
Yutong Sun
Hi Oscar,
Would you have a chance to run 'netstat -ano' and share the output so we can investigate this further? Could you also check which process is occupying the dynamic ports?
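If PowerShell is easier than working through the raw netstat output, a rough sketch like the following groups the current TCP connections by owning process, which usually makes the heaviest consumer obvious:

# group current TCP connections by owning process and show the top consumers
Get-NetTCPConnection |
    Group-Object -Property OwningProcess |
    Sort-Object -Property Count -Descending |
    Select-Object -First 10 -Property Count,
        @{ Name = 'PID';     Expression = { $_.Name } },
        @{ Name = 'Process'; Expression = { (Get-Process -Id $_.Name -ErrorAction SilentlyContinue).ProcessName } }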
Regards,
Yutong Sun
Hello everyone,
We are running into the very same problem described at the beginning.
Using Windows Server 2016 Standard (14393.4770) and HPC Pack 2016 (5.2.6291.0).
We already changed the port range; it now shows:
netsh int ipv4 show dynamicport tcp
Protocol tcp Dynamic Port Range
---------------------------------
Start Port : 1025
Number of Ports : 40000
With that change it ran longer before the same error behavior occurred again.
It seems to me that the HPC scheduler opens a lot of Winsock-based connections until it exhausts the configured number of ports.
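For example, a quick check like the following (assuming the scheduler runs as a process named HpcScheduler; that name is a guess on my side and may differ) shows how many TCP connections that process currently holds:

# count TCP connections owned by the scheduler process (process name is an assumption)
$ids = @(Get-Process -Name HpcScheduler -ErrorAction SilentlyContinue).Id
@(Get-NetTCPConnection | Where-Object { $ids -contains $_.OwningProcess }).Count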
Is there a way to resolve this without restarting the server?
Many thanks in advance and regards,
Michael
Hi Michael,
There is a known port leak issue in HPC Pack 2016 Update 2. It is fixed in HPC Pack 2016 Update 3 with the latest QFE, so please upgrade the cluster to version 5.3.6450 when possible.
Please check https://github.com/azure/hpcpack for version details.
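If you are not sure which version is currently installed, one rough way to check (just a sketch; it assumes the scheduler runs as a process named HpcScheduler on the head node, so run it from an elevated prompt) is to read the product version from the scheduler binary:

# read the product version from the running scheduler binary
(Get-Item (Get-Process -Name HpcScheduler).Path).VersionInfo.ProductVersion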
Regards,
Yutong Sun
Dear Yutong Sun,
Thank you for your reply. I updated the HPC cluster to Update 3 with the latest QFE, and it has been working fine since then.
Regards,
Michael
Hi SvenssonOscar,
I ran into the same problem with Windows Server 2022 and HPC Pack 2019 Update 1, and I think it was caused by TLS 1.0 being disabled.
I ran "IISCrypto.exe" on the head node, clicked "Best Practices", and rebooted; after that everything was OK.