Running a PyTorch job fails with RuntimeError: unable to write to file </torch_xxx>

https://github.com/huaweicloud/dls-example/issues/26

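PyTorch backs its shared memory with temporary files named /torch_xxx. The name looks like a file in the container's root directory, but on Linux it is a POSIX shared-memory object stored in /dev/shm, which Docker limits to 64 MB by default. When that space runs out, writing the shared-memory file fails with this error. For now, the problem can be worked around by turning off PyTorch's shared-memory feature.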
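To confirm that shared-memory space is the bottleneck before applying the workaround, one can check how much room is left where the /torch_xxx objects are created. The sketch below assumes a Linux container in which POSIX shared memory is mounted at /dev/shm. `check_shm` and its 256 MiB threshold are hypothetical choices for illustration, not part of PyTorch.

```python
import os
import shutil

# Assumption: typical Linux location of POSIX shared memory
SHM_DIR = "/dev/shm"


def free_mib(path):
    """Free space, in MiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def check_shm(min_free_mib=256.0, path=SHM_DIR):
    """Return (free_mib, ok); ok is False when less than min_free_mib is left.

    The default threshold is a hypothetical value, not a PyTorch requirement.
    """
    free = free_mib(path)
    return free, free >= min_free_mib


if __name__ == "__main__":
    if os.path.isdir(SHM_DIR):
        free, ok = check_shm()
        print(f"{SHM_DIR}: {free:.1f} MiB free" + ("" if ok else " (likely too small)"))
    else:
        print(f"{SHM_DIR} not found; this check assumes Linux")
```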

Put the following code at the very top of train.py:

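```python
import sys

import torch
from torch.utils.data import dataloader
# Importing torch.multiprocessing makes sure PyTorch's shared-memory
# reducers are registered before they are removed below
from torch.multiprocessing import reductions  # noqa: F401
from multiprocessing.reduction import ForkingPickler

default_collate_func = dataloader.default_collate


def default_collate_override(batch):
    # Tell the collate function not to allocate batches in shared memory.
    # This module-level flag was read by older PyTorch releases;
    # newer releases may ignore it.
    dataloader._use_shared_memory = False
    return default_collate_func(batch)


setattr(dataloader, 'default_collate', default_collate_override)

# Remove the reducers that pass tensor storages between processes through
# shared memory, so storages are pickled (copied) the ordinary way instead
for t in torch._storage_classes:
    if sys.version_info[0] == 2:
        if t in ForkingPickler.dispatch:
            del ForkingPickler.dispatch[t]
    else:
        if t in ForkingPickler._extra_reducers:
            del ForkingPickler._extra_reducers[t]

# ---- the original train.py code follows ----
```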
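The second half of the workaround depends on how multiprocessing serializes objects: `ForkingPickler` consults a class-level table of custom reducers (the private `_extra_reducers` dict edited in the loop above), and torch registers its shared-memory reducers there. The stdlib-only sketch below imitates that mechanism. `Payload`, `reduce_shared` and `_rebuild_shared` are hypothetical stand-ins for a torch storage class and its reducer, not torch code.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


class Payload:
    """Hypothetical stand-in for a torch storage class."""

    def __init__(self, data):
        self.data = data


def _rebuild_shared(data):
    # Stand-in for rebuilding an object from a shared-memory handle;
    # the flag records that the custom reducer was used
    obj = Payload(data)
    obj.via_custom_reducer = True
    return obj


def reduce_shared(obj):
    # Plays the role of torch's storage reducer (hypothetical here)
    return (_rebuild_shared, (obj.data,))


def roundtrip(obj):
    # Serialize the way multiprocessing queues do, then deserialize
    return pickle.loads(ForkingPickler.dumps(obj))


def demo():
    ForkingPickler.register(Payload, reduce_shared)
    before = roundtrip(Payload([1, 2, 3]))
    # Same move as the workaround: delete the custom reducer entry
    ForkingPickler._extra_reducers.pop(Payload, None)
    after = roundtrip(Payload([1, 2, 3]))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(getattr(before, "via_custom_reducer", False))  # True
    print(getattr(after, "via_custom_reducer", False))   # False
    print(after.data)                                    # [1, 2, 3]
```

Once the entry is gone, the object falls back to ordinary pickling, which copies its data into the byte stream. In the real workaround the removed entries are torch's storage classes, so tensors crossing process boundaries are presumably copied in the same way, trading extra copying for not needing shared-memory space.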
