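This problem occurs because the deep learning program (service) cannot connect to the local host. The fix is to make sure the ranks start at 0.
Original error message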
[W socket.cpp:663] [c10d] The client socket has failed to connect to [csdn-xiaohu]:12345 (errno: 22 - Invalid argument).
Solution
Ranks should start from 0. Change
opt.rank = kwargs.get("start_rank", 0) + opt.gpu_id
to
opt.rank = kwargs.get("start_rank", 0) + i
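The following is a minimal sketch of why the loop index helps, not the original training code. It assumes that gpu_id holds physical device IDs (for example 2 and 3) and that i is the position of each process in the launch loop. assign_ranks and ranks_are_valid are hypothetical helpers introduced only for illustration. torch.distributed expects every process to have a unique rank in the range 0 to world_size - 1, so ranks derived from device IDs such as 2 and 3 would leave rank 0 missing.

```python
def assign_ranks(gpu_ids, start_rank=0):
    """Map each GPU id to a rank based on its position in the list.

    Hypothetical helper: mirrors `start_rank + i` from the fix above,
    where i is the loop index rather than the GPU id itself.
    """
    return {gpu_id: start_rank + i for i, gpu_id in enumerate(gpu_ids)}


def ranks_are_valid(ranks, world_size):
    """Return True if the ranks are exactly 0, 1, ..., world_size - 1."""
    return sorted(ranks) == list(range(world_size))


gpu_ids = [2, 3]  # assumed example: training on physical GPUs 2 and 3

# Old behaviour: rank = start_rank + gpu_id, so no process gets rank 0.
old_ranks = [0 + gpu_id for gpu_id in gpu_ids]

# Fixed behaviour: rank = start_rank + i, so ranks are contiguous from 0.
new_ranks = list(assign_ranks(gpu_ids).values())

print(ranks_are_valid(old_ranks, world_size=len(gpu_ids)))
print(ranks_are_valid(new_ranks, world_size=len(gpu_ids)))
```

With a single node, start_rank stays at 0. On a multi-node setup, each node would presumably pass a different start_rank so that the ranks remain unique across nodes.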
Original notes
Possible causes of this error:
The socket is not valid.
The call is being blocked and cannot reach the client that opened it.
The client has closed, or is closing, the socket at the time of the call.