Errno::ENOMEM: Cannot allocate memory - cat

Published 2019-04-19 06:52

Question:

I have a job running in production that processes XML files. There are around 4,000 XML files, totaling 8 to 9 GB.

After processing, we get CSV files as output. I have a cat command that merges all the CSV files into a single file, and it fails with:

Errno::ENOMEM: Cannot allocate memory

on the cat (backtick) command.

Below are a few details:

  • System Memory - 4 GB
  • Swap - 2 GB
  • Ruby : 1.9.3p286

The files are processed using nokogiri and saxbuilder-0.0.8.

There is a block of code that processes the 4,000 XML files and saves the output as CSV (one CSV per XML file). Sorry, I'm not supposed to share it because of company policy.

Below is the code that merges the output files into a single file:

Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }.each { |file|
  `cat #{file} >> #{final_output_file}`
}

I've taken memory consumption snapshots during processing. It consumes almost all of the memory, but it doesn't fail there. It always fails on the cat command.

My guess is that the backtick tries to fork a new process, which can't get enough memory, so it fails.

Please let me know your opinion and any alternatives.

Answer 1:

So it seems that your system is running quite low on memory, and spawning a shell plus calling cat is too much for the little memory that is left.

If you don't mind losing some speed, you can merge the files in Ruby, using small buffers. This avoids spawning a shell, and you can control the buffer size.

This is untested, but you get the idea:

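buffer_size = 4096

# Block form closes (and flushes) the output file when done
File.open(final_output_file, 'wb') do |output_file|
  Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }.each do |file|
    File.open(file, 'rb') do |f|
      # Copy in small chunks so memory use stays around buffer_size
      while (buffer = f.read(buffer_size))
        output_file.write(buffer)
      end
    end
  end
end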
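For what it's worth, Ruby's standard library (1.9.2 and later) already provides a chunked copy in IO.copy_stream, so the read/write loop can be delegated to it. A minimal sketch of the same merge, keeping the sort order from the question; the helper names sorted_csv_files and merge_files are made up for illustration:

```ruby
# Hypothetical helper: same ordering as the question's code
# (fewer path separators first, then alphabetical).
def sorted_csv_files(dir)
  Dir["#{dir}/*.csv"].sort_by { |file| [file.count('/'), file] }
end

# Hypothetical helper: concatenate the files without spawning any process.
# IO.copy_stream copies in chunks, so memory use does not grow with file size.
def merge_files(paths, output_path)
  File.open(output_path, 'wb') do |out|
    paths.each { |path| IO.copy_stream(path, out) }
  end
end
```

With the question's variables this would be called as merge_files(sorted_csv_files(processing_directory), final_output_file). Like the loop above, it truncates the output file instead of appending to it.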


Answer 2:

You are probably out of physical memory, so double-check that and verify your swap with free -m. If you don't have any swap space, create one.
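On Linux, free takes its numbers from /proc/meminfo, so the same check can be done from inside the Ruby job, e.g. just before the merge step. A rough, Linux-only sketch; parse_meminfo and memory_summary_mb are hypothetical helper names, and MemAvailable is only reported by newer kernels:

```ruby
# Parse /proc/meminfo-style text ("MemTotal:  4046844 kB") into a Hash of kB values.
def parse_meminfo(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(':', 2)
    next unless value
    info[key.strip] = value.to_i # String#to_i skips leading spaces and stops at " kB"
  end
end

# Convert the interesting entries to MB, roughly what `free -m` shows.
# Keys missing from the input (e.g. MemAvailable on older kernels) are skipped.
def memory_summary_mb(info)
  %w[MemTotal MemFree MemAvailable SwapTotal SwapFree].each_with_object({}) do |key, summary|
    summary[key] = info[key] / 1024 if info.key?(key)
  end
end

# Only attempt the real read where /proc/meminfo exists (i.e. on Linux).
if File.readable?('/proc/meminfo')
  memory_summary_mb(parse_meminfo(File.read('/proc/meminfo'))).each do |key, mb|
    puts "#{key}: #{mb} MB"
  end
end
```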

Otherwise, if your memory is fine, the error is most likely caused by shell resource limits. You can check them with ulimit -a.

They can be changed with ulimit itself (see help ulimit), e.g.:

ulimit -Sn unlimited && ulimit -Sl unlimited
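The same limits can also be read from inside the Ruby process with Process.getrlimit, which shows what the job itself is actually running under (a cron job or service may get different limits than your interactive shell). The -n and -l flags above correspond to RLIMIT_NOFILE and RLIMIT_MEMLOCK, and ulimit -v corresponds to RLIMIT_AS. A minimal sketch; LIMIT_NAMES, current_limits and format_limit are illustrative names:

```ruby
# Resources relevant to spawning processes; looked up by name because
# not every platform defines every RLIMIT_* constant.
LIMIT_NAMES = %w[AS DATA NOFILE NPROC MEMLOCK].freeze

# Show RLIM_INFINITY the way `ulimit` does.
def format_limit(value)
  value == Process::RLIM_INFINITY ? 'unlimited' : value.to_s
end

# Returns { "NOFILE" => [soft, hard], ... } for the resources this platform supports.
def current_limits
  LIMIT_NAMES.each_with_object({}) do |name, limits|
    const = "RLIMIT_#{name}"
    next unless Process.const_defined?(const)
    soft, hard = Process.getrlimit(Process.const_get(const))
    limits[name] = [soft, hard]
  end
end

current_limits.each do |name, (soft, hard)|
  puts "#{name}: soft=#{format_limit(soft)} hard=#{format_limit(hard)}"
end
```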

To make these limits persistent, create a limits configuration file, e.g. with the following shell command:

cat <<EOF | sudo tee /etc/security/limits.d/01-${USER}.conf
${USER} soft core unlimited
${USER} soft fsize unlimited
${USER} soft nofile 4096
${USER} soft nproc 30654
EOF

Or use /etc/sysctl.conf to change the limits globally (see man sysctl.conf); on BSD/macOS, for example:

kern.maxprocperuid=1000
kern.maxproc=2000
kern.maxfilesperproc=20000
kern.maxfiles=50000


Answer 3:

I had the same problem, but with sendmail (called by the mail gem) instead of cat.

I found the problem and the solution here: install the posix-spawn gem, e.g.

gem install posix-spawn

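Here is an example:

require 'posix/spawn'

# Allocate a huge array (about 4 GB on 64-bit Ruby) to make the parent process very large
a = (1..500_000_000).to_a

# Unlike a plain fork, posix_spawn does not have to duplicate the parent's large address space
pid = POSIX::Spawn::spawn('ls')
Process.wait(pid)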

This time, creating the child process should succeed.

See also: Minimizing Memory Usage for Creating Application Subprocesses at Oracle.
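Separately from posix-spawn, the loop in the question starts a shell plus cat for each of the roughly 4,000 files. If you want to keep using cat, a sketch with Ruby's built-in Process.spawn can start cat once with all paths as arguments, so no shell is involved and file names need no quoting. This is an assumption-laden alternative, not a fix: it still creates a child through Ruby's own spawn rather than posix_spawn, so on a memory-starved host it can hit the same ENOMEM; it only cuts the number of attempts from thousands to one. Very long path lists can also run into the OS argument-size limit, although 4,000 paths is usually fine. merge_with_single_cat is a hypothetical name:

```ruby
# Hypothetical helper: merge files with ONE `cat` process instead of one
# shell + cat per file. The paths are passed as separate arguments, so no
# shell is involved and names containing spaces or quotes need no escaping.
def merge_with_single_cat(paths, output_path)
  if paths.empty?
    # `cat` with no arguments would read stdin, so just create an empty file
    File.open(output_path, 'wb') {}
    return
  end
  # :out => [path, mode] redirects the child's stdout to the output file
  pid = Process.spawn('cat', *paths, :out => [output_path, 'w'])
  Process.wait(pid)
  raise "cat failed with status #{$?.exitstatus}" unless $?.success?
end
```

The sorted file list from the question's Dir[...].sort_by expression would be passed as paths, and final_output_file as output_path.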