I have a job running on production which process xml files.
xml files counts around 4k and of size 8 to 9 GB all together.
After processing we get CSV files as output. I've a cat command which will merge all CSV files to a single file I'm getting:
Errno::ENOMEM: Cannot allocate memory
on cat
(Backtick) command.
Below are few details:
- System Memory - 4 GB
- Swap - 2 GB
- Ruby : 1.9.3p286
Files are processed using nokogiri
and saxbuilder-0.0.8
.
Here, there is a block of code which will process 4,000 XML files and output is saved in CSV (1 per xml) (sorry, I'm not suppose to share it b'coz of company policy).
Below is the code which will merge the output files to a single file
Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each {|file|
`cat #{file} >> #{final_output_file}`
}
I've taken memory consumption snapshots during processing.It consumes almost all part of the memory, but, it won't fail.
It always fails on cat
command.
I guess, on backtick it tries to fork a new process which doesn't get enough memory so it fails.
Please let me know your opinion and alternative to this.
So it seems that your system is running pretty low on memory and spawning a shell + calling cat is too much for the few memory left.
If you don't mind loosing some speed, you can merge the files in ruby, with small buffers.
This avoids spawning a shell, and you can control the buffer size.
This is untested but you get the idea :
buffer_size = 4096
output_file = File.open(final_output_file, 'w')
Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each do |file|
f = File.open(file)
while buffer = f.read(buffer_size)
output_file.write(buffer)
end
f.close
end
You are probably out of physical memory, so double check that and verify your swap (free -m
). In case you don't have a swap space, create one.
Otherwise if your memory is fine, the error is most likely caused by shell resource limits. You may check them by ulimit -a
.
They can be changed by ulimit
which can modify shell resource limits (see: help ulimit
), e.g.
ulimit -Sn unlimited && ulimit -Sl unlimited
To make these limit persistent, you can configure it by creating the ulimit setting file by the following shell command:
cat | sudo tee /etc/security/limits.d/01-${USER}.conf <<EOF
${USER} soft core unlimited
${USER} soft fsize unlimited
${USER} soft nofile 4096
${USER} soft nproc 30654
EOF
Or use /etc/sysctl.conf
to change the limit globally (man sysctl.conf
), e.g.
kern.maxprocperuid=1000
kern.maxproc=2000
kern.maxfilesperproc=20000
kern.maxfiles=50000
I have the same problem, but instead of cat
it was sendmail
(gem mail
).
I found problem & solution here by installing posix-spawn
gem, e.g.
gem install posix-spawn
and here is the example:
a = (1..500_000_000).to_a
require 'posix/spawn'
POSIX::Spawn::spawn('ls')
This time creating child process should succeed.
See also: Minimizing Memory Usage for Creating Application Subprocesses at Oracle.