How can I store the current state of crypto.createHash('sha1') (after it has been filled via hash.update(buffer)) so that it can be used in another HTTP request, which might occur in a different Node.js process?
I imagine doing something like this:
var crypto = require('crypto'),
    hash = someDatabase.read('hashstate') // continue with filled hash
        || crypto.createHash('sha1');     // start a new hash

// update the hash
someObj.on('data', function (buffer) {
  hash.update(buffer);
});

someObj.on('end', function () {
  // store the current state of hash to retrieve it later (this won't work:)
  someDatabase.write('hashstate', hash);
  if (theEndOfAllRequests) {
    // create the result of multiple http requests
    hash.digest('hex');
  }
});
There are a couple of options I can come up with, with varying trade-offs. The big thing to note is that crypto doesn't expose the partial state of its hash functions, so there's no way to directly implement your plan of saving state to a db.
Option 1 involves diving into a hash function, which can be tricky. Fortunately, there is already one written in JavaScript. Again, it doesn't expose state, but I don't expect that would be a terribly difficult code transformation. I believe the entire state is stored in the variables defined at the top of create: h0-h4, block, offset, shift, and totalLength. Then, you could save state in a db as you planned.
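For illustration, here's a minimal sketch of what that could look like. It assumes you've modified such a pure-JS SHA-1 into a hypothetical module ./sha1-resumable that exposes its internal state (h0-h4, block, offset, shift, totalLength) through made-up exportState()/importState() methods — those don't exist in any implementation I know of; you'd have to add them:

var sha1 = require('./sha1-resumable'); // hypothetical module, see above

function resumeOrCreate(savedState) {
  var hash = sha1.create();
  // restore h0-h4, block, offset, shift and totalLength if previously saved
  if (savedState) hash.importState(JSON.parse(savedState));
  return hash;
}

var hash = resumeOrCreate(someDatabase.read('hashstate'));

someObj.on('data', function (buffer) {
  hash.update(buffer);
});

someObj.on('end', function () {
  // persist the partial state so another process can pick it up later
  someDatabase.write('hashstate', JSON.stringify(hash.exportState()));
});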
Option 2 involves using crypto and passing the data to be hashed between processes. This is a lot easier to work with, I think, but also a lot slower. In a few quick tests, it looks like messages pass around at a rate of about 2.5-3MB/sec, so each 3MB chunk will take about 1.5 seconds (you can only pass strings, so I expect you'll need a Base64 conversion, which costs an extra 33%). To do this, you would use process.send to send the data along with an identifying id. The master process would use worker.on on each worker to receive the messages, keeping a mapping of ids to hash objects. Finally, you would want a flag in the message that tells the master it is receiving the last chunk, so that it can worker.send the resulting hash back (received in the worker with process.on).
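To make that concrete, here's a rough sketch of the message flow using the cluster module; the message shape ({ id, data, last }) and the Base64 round-trip are my own choices for illustration, not anything crypto or cluster requires:

var cluster = require('cluster'),
    crypto = require('crypto');

if (cluster.isMaster) {
  var hashes = {}; // maps request-group id -> hash object
  var worker = cluster.fork();

  worker.on('message', function (msg) {
    if (!hashes[msg.id]) hashes[msg.id] = crypto.createHash('sha1');
    // only strings survive the trip, hence the Base64 conversion
    hashes[msg.id].update(Buffer.from(msg.data, 'base64'));
    if (msg.last) {
      worker.send({ id: msg.id, digest: hashes[msg.id].digest('hex') });
      delete hashes[msg.id];
    }
  });
} else {
  process.on('message', function (msg) {
    // the finished digest comes back from the master
    console.log('digest for', msg.id, ':', msg.digest);
  });

  // inside a request handler, for each chunk:
  // process.send({ id: groupId, data: chunk.toString('base64'), last: isLastChunk });
}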
I'd be happy to elaborate on whichever of these sounds most suitable.
Basically, all you need to do is create a new hash for each "related request group", store it in an object directly in your code, and keep updating that hash independently of any other unrelated requests that are going on.
All that's required is some way to name a group of related requests, so you know they belong together, and to make sure the scope of your long-lived hashes encompasses the processing functions.
Something like the following (this assumes only one group of requests is occurring at any given moment, and doesn't worry about naming the request group to make sure you don't get crossover):
var crypto = require('crypto'),
    // don't create it here, but set the scope so it will live between requests
    hash = null;

someObj.on('data', function (chunk) {
  // we have to have some data in the chunk that allows us to relate
  // this request to its fellow requests, or assume that no unrelated
  // requests are occurring at this time
  // var name = chunk.this_is_my_name;
  if (hash === null) hash = crypto.createHash('sha1');
  hash.update(chunk);
});

someObj.on('end', function () {
  if (theEndOfAllRequests) {
    // create the result of multiple http requests
    var digest = hash.digest('hex');
    /* use the digest */
    hash = null; // so it can be created fresh for the next set of requests
  }
});
You can call hash.update multiple times as data comes in.
It's hard to say exactly what you should do without knowing how you are getting the chunks, but here's a simple example with v1 Streams:
var crypto = require('crypto');

var hash = crypto.createHash('sha1');
var data; // = incoming file data (a readable stream)

data.on('data', function (chunk) {
  hash.update(chunk);
});

data.on('end', function () {
  var sha = hash.digest('hex');
  // do something with the digest
});