Python multiprocessing – debugging OSError: [Errno 12] Cannot allocate memory

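I am facing the following issue. I'm trying to parallelize a function that updates a file, but I cannot start the Pool() because of an OSError: [Errno 12] Cannot allocate memory. I've started looking around on the server, and it's not as if I'm using an old, weak machine or one that is actually out of memory. See htop:

Also, free -m shows that I have plenty of RAM available, in addition to the ~7GB of swap:

The files I'm trying to work with aren't that big either. I'll paste my code (and the stack trace) below; the sizes are as follows:

The predictionmatrix dataframe takes up about 80MB according to pandas' dataframe.memory_usage().
The file geo.geojson is 2MB.

How do I go about debugging this? What can I check, and how? Thank you for any tips/tricks!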

Code:

    import json
    from functools import partial
    from multiprocessing import Pool

    def parallelUpdateJSON(paramMatch, predictionmatrix, data):
        for feature in data['features']:
            currentfeature = predictionmatrix[(predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
            if (len(currentfeature) > 0):
                feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
            else:
                feature['properties'].update({"style": {"opacity": 0}})

    def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
        with open('geo.geojson') as f:
            data = json.load(f)
        paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
        pool = Pool()
        func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
        pool.map(func, data)
        pool.close()
        pool.join()

        with open('output.geojson', 'w') as outfile:
            json.dump(data, outfile)

Stack trace:

    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    <ipython-input-428-d6121ed2750b> in <module>()
    ----> 1 writeGeoJSON(6,15,baseline)

    <ipython-input-427-973b7a5a8acc> in writeGeoJSON(weekdaytopredict,predictionmatrix)
         14     print("Start loop")
         15     paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
    ---> 16     pool = Pool(2)
         17     func = partial(parallelUpdateJSON,predictionmatrix)
         18     print(predictionmatrix.memory_usage())

    /usr/lib/python3.5/multiprocessing/context.py in Pool(self,processes,initializer,initargs,maxtasksperchild)
        116         from .pool import Pool
        117         return Pool(processes,maxtasksperchild,
    --> 118                     context=self.get_context())
        119
        120     def RawValue(self,typecode_or_type,*args):

    /usr/lib/python3.5/multiprocessing/pool.py in __init__(self,context)
        166         self._processes = processes
        167         self._pool = []
    --> 168         self._repopulate_pool()
        169
        170         self._worker_handler = threading.Thread(

    /usr/lib/python3.5/multiprocessing/pool.py in _repopulate_pool(self)
        231             w.name = w.name.replace('Process','PoolWorker')
        232             w.daemon = True
    --> 233             w.start()
        234             util.debug('added worker')
        235

    /usr/lib/python3.5/multiprocessing/process.py in start(self)
        103                'daemonic processes are not allowed to have children'
        104         _cleanup()
    --> 105         self._popen = self._Popen(self)
        106         self._sentinel = self._popen.sentinel
        107         _children.add(self)

    /usr/lib/python3.5/multiprocessing/context.py in _Popen(process_obj)
        265         def _Popen(process_obj):
        266             from .popen_fork import Popen
    --> 267             return Popen(process_obj)
        268
        269     class SpawnProcess(process.BaseProcess):

    /usr/lib/python3.5/multiprocessing/popen_fork.py in __init__(self,process_obj)
         18         sys.stderr.flush()
         19         self.returncode = None
    ---> 20         self._launch(process_obj)
         21
         22     def duplicate_for_child(self,fd):

    /usr/lib/python3.5/multiprocessing/popen_fork.py in _launch(self,process_obj)
         65         code = 1
         66         parent_r,child_w = os.pipe()
    ---> 67         self.pid = os.fork()
         68         if self.pid == 0:
         69             try:

    OSError: [Errno 12] Cannot allocate memory

UPDATE

Following @robyschek's solution, I have updated my code to:

    import json
    from functools import partial
    from multiprocessing import Pool

    # Set once in each worker process by worker_init
    g_predictionmatrix = None

    def worker_init(predictionmatrix):
        global g_predictionmatrix
        g_predictionmatrix = predictionmatrix

    def parallelUpdateJSON(paramMatch, data_item):
        # Read the per-worker global instead of an undefined local name
        predictionmatrix = g_predictionmatrix
        for feature in data_item['features']:
            currentfeature = predictionmatrix[(predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
            if (len(currentfeature) > 0):
                feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
            else:
                feature['properties'].update({"style": {"opacity": 0}})

    def use_the_pool(data, paramMatch, predictionmatrix):
        pool = Pool(initializer=worker_init, initargs=(predictionmatrix,))
        func = partial(parallelUpdateJSON, paramMatch)
        pool.map(func, data)
        pool.close()
        pool.join()

    def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
        with open('geo.geojson') as f:
            data = json.load(f)
        paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
        use_the_pool(data, paramMatch, predictionmatrix)
        with open('trentino-grid.geojson', 'w') as outfile:
            json.dump(data, outfile)

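I still get the same error. Also, according to the documentation, map() should divide my data into chunks, so I don't think it should replicate my 80MB rownum times. I may be wrong, though... :) I've also noticed that if I use a smaller input (~11MB instead of 80MB), I don't get the error. So I guess I'm trying to use too much memory, but I can't imagine how 80MB turns into something that 16GB of RAM can't handle.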

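We had this a couple of times. According to my sysadmin, there is "a bug" in Unix that raises the same error whether you are out of memory or your process has reached the maximum file descriptor limit.

We had a file descriptor leak, and the error raised was [Errno 12] Cannot allocate memory#012OSError.

So you should look at your script and double-check that the problem isn't the creation of too many FDs instead.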
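One way to check the file-descriptor theory is to compare the number of descriptors the process currently holds against its limit. The following is a minimal sketch, assuming a Linux host where /proc/self/fd is available; the helper name fd_usage is hypothetical and not part of the original answer.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # On Linux, /proc/self/fd holds one entry per open descriptor.
    # Listing the directory briefly opens one more, so the count can be off by one.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print("open file descriptors: {} (soft limit {}, hard limit {})".format(
        open_fds, soft, hard))
```

Calling this just before Pool() is created, and again after a few runs of the function, should show whether the count keeps growing toward the soft limit.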

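When using a multiprocessing.Pool, the default way to start the processes (on Linux, with the Python 3.5 shown in the traceback) is fork. The issue with fork is that the entire process is duplicated. Thus, if your main process is already using a lot of memory, this memory will be duplicated, leading to this Cannot allocate memory error. For instance, if your main process uses 2GB of memory and you use 8 subprocesses, you would need up to 18GB of RAM (2GB × 9 processes). On Linux the copy is made lazily (copy-on-write), so this figure is a worst case, but the kernel may still refuse the fork if it cannot account for that much memory.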
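The arithmetic above can be turned into a quick check to run in the parent process just before creating the pool. This is only a rough sketch, assuming Linux: the function names are hypothetical, ru_maxrss is the peak resident size (in kilobytes on Linux), and reading /proc/sys/vm/overcommit_memory is a Linux-specific way to see how strictly the kernel accounts for memory at fork time.

```python
import resource


def worst_case_fork_memory_gb(parent_gb, n_workers):
    """Upper bound if every forked worker ended up with a full private copy."""
    return parent_gb * (1 + n_workers)


def peak_rss_mb():
    """Peak resident set size of this process in MB (Linux reports kilobytes)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def overcommit_mode():
    """Return the Linux overcommit mode (0 heuristic, 1 always, 2 strict), or None."""
    try:
        with open("/proc/sys/vm/overcommit_memory") as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


if __name__ == "__main__":
    print("worst case for 2GB parent and 8 workers:",
          worst_case_fork_memory_gb(2, 8), "GB")
    print("peak RSS of this process: {:.1f} MB".format(peak_rss_mb()))
    print("vm.overcommit_memory:", overcommit_mode())
```

If the peak RSS of the notebook process is far larger than the 80MB dataframe, the memory being duplicated is probably everything else already loaded in the session, not the dataframe alone.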

You should try using a different start method, such as 'forkserver' or 'spawn':

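    from functools import partial
    from multiprocessing import set_start_method, Pool

    # Call this once, before any Pool is created
    # (ideally under `if __name__ == '__main__':`).
    set_start_method('forkserver')

    # You can then start your Pool without each process
    # cloning your entire memory
    pool = Pool()
    func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
    pool.map(func, data)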

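These methods avoid duplicating the workspace of your Process, but they can be a bit slower to start because the modules you are using need to be reloaded.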
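If changing the start method globally is not desirable, multiprocessing.get_context gives a Pool bound to one start method only. The sketch below is a self-contained illustration under that assumption; map_with_start_method is a hypothetical helper, and math.sqrt stands in for the real worker function so that the example does not depend on the question's data.

```python
import math
import multiprocessing as mp


def map_with_start_method(method, func, items, processes=2):
    """Run func over items in a Pool created from an explicit start-method context."""
    ctx = mp.get_context(method)  # e.g. "spawn" or "forkserver"
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    # The guard matters: with "spawn" or "forkserver" the main module is
    # re-imported in each worker, so unguarded code would run there again.
    print(map_with_start_method("spawn", math.sqrt, [1, 4, 9]))
```

With these start methods the worker function and its arguments must be picklable, and anything the workers need (such as the dataframe) has to be passed explicitly, for example through initargs, rather than inherited from the parent.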
