I'm having a problem running npm install
from a Chef recipe. When I run it from the command line, it finishes fine in under a minute, with just a few harmless warnings about package.json missing a repository field. But when I run it from Chef, it hangs, with the last line of output to the terminal being this:
* execute[npm-install-app] action run
Which is this resource block in the recipe:
execute "npm-install-app" do
cwd "#{home}/#{prefix}#{app}"
command "npm --registry #{priv['url']}:#{priv['port']}#{priv['path']} install --cache #{home}/.npm --tmp #{home}/tmp > npm-run.log 2>&1"
user node['nodejs']['user']
action :run
end
Where #{home} expands out to /home/nodejs and the user is nodejs.
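For concreteness, the variables work out to roughly the following; the priv values here are my reconstruction from the registry URL in the timeout log further down, not a copy of the actual attributes file:
# Approximate values of the recipe variables, inferred from the logs below.
home   = "/home/nodejs"
prefix = ""       # assumed empty, so cwd works out to /home/nodejs/my-app
app    = "my-app"
priv   = {
  'url'  => 'http://my.private.npm.amazonaws.com',  # private npm registry
  'port' => 5984,                                    # CouchDB port from the timeout log
  'path' => '/registry/_design/app/_rewrite'
}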
As you can see, I redirected the output to a file with > npm-run.log 2>&1. That file does get the output of the npm install command written to it (unlike the Chef run itself), and the last thing that comes through is this:
-- a bunch of 200s and 304s, like this --
npm http 304 http://my.private.npm.amazonaws.com/registry/_design/app/_rewrite/esprima
> kerberos@0.0.3 install /home/nodejs/my-app/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos
> (node-gyp rebuild 2> builderror.log) || (exit 0)
kerberos is a dependency of one module we rely on, but we aren't using kerberos ourselves. I gather from other sources that npm is running node-gyp to compile a native version of the module that isn't available prebuilt on the npm server.
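If node-gyp is the piece that hangs, one thing worth ruling out first is a missing compiler toolchain: node-gyp needs make, a C++ compiler, and Python 2 to build native addons. A minimal sketch for Ubuntu (package names assumed, not taken from my recipe):
# Sketch: install the toolchain node-gyp needs to build native addons.
# Stock Ubuntu package names; adjust for other platforms.
%w{build-essential python}.each do |pkg|
  package pkg do
    action :install
  end
end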
It will sit in that state for 2 hours, until Chef's shell-out registers a timeout and reports a fatal error. ps -e shows npm is still running as long as chef-client is, and interrupting chef-client makes npm disappear from the process list, which suggests that npm, at least, still thinks it is doing meaningful work. (On a side note, while I was having connection problems I was inclined to ask this question. There is a high probability that this npm install is the underlying cause of that other question, but I think they warrant separate consideration.)
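Incidentally, that 2-hour wait can be shortened while debugging: the execute resource takes a timeout attribute (in seconds), so a variant like this would at least fail fast (600 is an arbitrary value chosen for illustration):
# Sketch: same resource as above, but giving up after 10 minutes
# instead of waiting hours for ShellOut to time out.
execute "npm-install-app" do
  cwd "#{home}/#{prefix}#{app}"
  command "npm --registry #{priv['url']}:#{priv['port']}#{priv['path']} install --cache #{home}/.npm --tmp #{home}/tmp > npm-run.log 2>&1"
  user node['nodejs']['user']
  timeout 600
  action :run
end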
Edit: Running chef-client with -l debug adds only a tiny amount of information to the /var/log/chef/client.log file, which basically confirms that the npm install command is the last resource executed before the hang:
[2014-01-09T22:49:28+00:00] INFO: Processing execute[npm-install-app] action run (my-app::default line 111)
[2014-01-09T22:49:28+00:00] DEBUG: Platform ubuntu version 12.04 found
Am I right in thinking that the || (exit 0) is throwing off Chef's ShellOut provider's detection of a successful exit? Is there anything I can do about it?
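One way to test that theory outside of a full Chef run: the execute provider is a thin wrapper around Mixlib::ShellOut, so the same command can be shelled out by hand. This is a minimal sketch, with the suspect node-gyp line as a stand-in command and an arbitrary short timeout; since || (exit 0) forces a zero exit status, a hang here would point at the process never exiting rather than a misread status:
require 'mixlib/shellout'

# Minimal sketch of what Chef's execute provider does under the hood.
cmd = Mixlib::ShellOut.new(
  "(node-gyp rebuild 2> builderror.log) || (exit 0)",
  :cwd     => "/home/nodejs/my-app",
  :user    => "nodejs",
  :timeout => 600   # seconds; the execute resource defaults to 3600
)
cmd.run_command       # raises Mixlib::ShellOut::CommandTimeout on a hang
puts cmd.exitstatus   # => 0 when the || (exit 0) trick works
cmd.error!            # raises only on a non-zero exit status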
Edit 2: Chef just timed out on a run with -l debug set, and still the only extra log information was the timeout itself:
[2014-01-10T00:26:56+00:00] ERROR: execute[npm-install-app] (my-app::default line 111) had an error: Mixlib::ShellOut::CommandTimeout: command timed out:
---- Begin output of npm --registry http://my.private.npm.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp > npm-run.log 2>&1 ----
STDOUT:
STDERR:
---- End output of npm --registry http://my.private.npm.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp > npm-run.log 2>&1 ----
But! Another node just finished successfully after ~5 minutes, with this content in the npm-run.log file:
> kerberos@0.0.3 install /home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos
> (node-gyp rebuild 2> builderror.log) || (exit 0)
make: Entering directory `/home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos/build'
SOLINK_MODULE(target) Release/obj.target/kerberos.node
SOLINK_MODULE(target) Release/obj.target/kerberos.node: Finished
COPY Release/kerberos.node
make: Leaving directory `/home/nodejs/spicoli-authorization/node_modules/mongoose-q/node_modules/mongoose/node_modules/mongodb/node_modules/kerberos/build'
I can't think of why there would be such a huge performance difference; both servers are running on Amazon EC2 small instances. Maybe there is a permissions difference between the home directories on the working and broken servers... I will investigate that angle.
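If ownership does turn out to be the difference, something along these lines, run before the npm install, would be my first attempt at a fix (a sketch, assuming the user should own its whole home tree and that the group name matches the user name):
# Sketch: ensure the nodejs user owns its home tree before npm runs.
# Assumes the same home variable as in the recipe above.
execute "fix-home-ownership" do
  command "chown -R #{node['nodejs']['user']}:#{node['nodejs']['user']} #{home}"
  action :run
end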