From what I understood here, "V8 has a generational garbage collector. Moves objects aound randomly. Node can’t get a pointer to raw string data to write to socket." so I shouldn't store data that comes from a TCP stream in a string, specially if that string becomes bigger than Math.pow(2,16)
bytes. (hope I'm right till now..)
What is then the best way to handle all the data that's comming from a TCP socket ? So far I've been trying to use _:_:_
as a delimiter because I think it's somehow unique and won't mess around other things.
A sample of the data that would come would be something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data
This is what I tried to do:
net = require('net');
var server = net.createServer(function (socket) {
socket.on('connect',function() {
console.log('someone connected');
buf = new Buffer(Math.pow(2,16)); //new buffer with size 2^16
socket.on('data',function(data) {
if (data.toString().search('_:_:_') === -1) { // If there's no separator in the data that just arrived...
buf.write(data.toString()); // ... write it on the buffer. it's part of another message that will come.
} else { // if there is a separator in the data that arrived
parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
if (parts.length == 2) {
msg = buf.toString('utf-8',0,4) + parts[0];
console.log('MSG: '+ msg);
buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
} else {
msg = buf.toString() + parts[0];
for (var i = 1; i <= parts.length -1; i++) {
if (i !== parts.length-1) {
msg = parts[i];
console.log('MSG: '+msg);
} else {
buf.write(parts[i]);
}
}
}
}
});
});
});
server.listen(9999);
Whenever I try to console.log('MSG' + msg)
, it will print out the whole buffer, so it's useless to see if something worked.
How can I handle this data the proper way ? Would the lazy module work, even if this data is not line oriented ? Is there some other module to handle streams that are not line oriented ?
It has indeed been said that there's extra work going on because Node has to take that buffer and then push it into v8/cast it to a string. However, doing a toString() on the buffer isn't any better. There's no good solution to this right now, as far as I know, especially if your end goal is to get a string and fool around with it. Its one of the things Ryan mentioned @ nodeconf as an area where work needs to be done.
As for delimiter, you can choose whatever you want. A lot of binary protocols choose to include a fixed header, such that you can put things in a normal structure, which a lot of times includes a length. In this way, you slice apart a known header and get information about the rest of the data without having to iterate over the entire buffer. With a scheme like that, one can use a tool like:
As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
Lastly, check here: https://github.com/joyent/node/wiki/modules -- find a module that parses a simple tcp protocol and seems to do it well, and read some code.
You should use the new stream2 api. http://nodejs.org/api/stream.html
Here are some very useful examples: https://github.com/substack/stream-handbook
https://github.com/lvgithub/stick