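Your GPUs are not communicating with each other over PCIe; this is a driver problem. Try this: import torch; a = torch.rand(2); a = a.to("cuda:0"); b = a.to("cuda:1"). If b comes out as all zeros, communication between the cards has failed, and that is what zeroes the data. You then need to change the underlying code so that intermediate results passed between cards are first moved to the CPU and then to the second card.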
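The check and the workaround described above could be wrapped up roughly as follows. This is only a sketch: the helper names `direct_copy_is_intact`, `copy_via_cpu` and `safe_copy` are made up for illustration and are not part of PyTorch or of this project. The helpers only call the tensor's own `.to()`, `.cpu()` and `.tolist()` methods, and the equality check assumes the tensor contains no NaN values.

```python
def direct_copy_is_intact(tensor, dst_device):
    """Return True if a direct copy to dst_device preserves the values.

    `tensor` is expected to be a torch.Tensor; only its .to(), .cpu()
    and .tolist() methods are used. Assumes the tensor has no NaNs.
    """
    expected = tensor.cpu().tolist()
    received = tensor.to(dst_device).cpu().tolist()
    return received == expected


def copy_via_cpu(tensor, dst_device):
    """Workaround: stage the tensor in host memory before the second hop."""
    return tensor.cpu().to(dst_device)


def safe_copy(tensor, dst_device):
    """Use the direct copy when it works, otherwise fall back to CPU staging."""
    if direct_copy_is_intact(tensor, dst_device):
        return tensor.to(dst_device)
    return copy_via_cpu(tensor, dst_device)


# Example on a machine with at least two GPUs (requires PyTorch):
#   import torch
#   a = torch.rand(2).to("cuda:0")
#   print(direct_copy_is_intact(a, "cuda:1"))  # False reproduces the symptom
#   b = copy_via_cpu(a, "cuda:1")              # values survive the detour
```

Note that `safe_copy` runs the integrity check on every call, which costs extra transfers. In real code it would probably make more sense to run the check once at startup and then always take the same path.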
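I tried it on a single-machine multi-GPU server, and the values were indeed zeroed. But all the cards are plugged into PCIe slots. What could be causing this?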