Could you please help me with the following issue?
Goal
Read a file on the client side (in the browser, via JS and HTML5 classes) line by line, without loading the whole file into memory.
Scenario
I'm working on a web page which should parse files on the client side. Currently, I'm reading the file as described in this article.
HTML:
<input type="file" id="files" name="files[]" />
JavaScript:
$("#files").on('change', function (evt) {
    // creating FileReader
    var reader = new FileReader();
    // assigning handler: runs once the whole file has been read
    reader.onloadend = function (evt) {
        // the complete file content is held in memory here
        var lines = evt.target.result.split(/\r?\n/);
        lines.forEach(function (line) {
            parseLine(...);
        });
    };
    // getting File instance
    var file = evt.target.files[0];
    // start reading
    reader.readAsText(file);
});
The problem is that FileReader reads the whole file at once, which crashes the tab for big files (size >= 300 MB). Using reader.onprogress doesn't solve the problem, as the result just keeps growing until it hits the limit.
Reinventing the wheel
I've done some research on the internet and found no simple way to do this (there are a bunch of articles describing this exact functionality, but on the server side for node.js).
The only way I see to solve it is the following:
- Split the file into chunks (via the File.slice(startByte, endByte) method)
- Find the last newline character in that chunk ('\n')
- Read that chunk except the part after the last newline character, convert it to a string, and split it by lines
- Read the next chunk starting from the last newline character found in step 2
But I'd rather use something that already exists to avoid entropy growth.
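The chunking steps listed in the question can be sketched roughly as follows. This is only a minimal illustration, not any particular library's implementation: the names createLineSplitter and readLinesInChunks, the onLine and onDone callbacks, and the 1 MB chunk size are all made up for the example, and the input is assumed to be UTF-8. Instead of searching each chunk for its last newline, the sketch keeps the trailing partial line and prepends it to the next chunk, which has the same effect.

```javascript
// Splits incoming text chunks into lines and carries any trailing
// partial line over to the next chunk. `onLine` is a hypothetical callback.
function createLineSplitter(onLine) {
  var remainder = '';
  return {
    push: function (text) {
      var parts = (remainder + text).split('\n');
      remainder = parts.pop(); // the last piece may be an incomplete line
      parts.forEach(function (line) {
        onLine(line.replace(/\r$/, '')); // tolerate Windows line endings
      });
    },
    flush: function () {
      // emit whatever is left when the file does not end with a newline
      if (remainder.length > 0) {
        onLine(remainder.replace(/\r$/, ''));
      }
      remainder = '';
    }
  };
}

// Reads `file` (a File or Blob) one slice at a time, so only a single
// chunk plus one partial line is held in memory at any moment.
function readLinesInChunks(file, onLine, onDone, chunkSize) {
  chunkSize = chunkSize || 1024 * 1024; // 1 MB, an arbitrary choice
  var decoder = new TextDecoder('utf-8'); // assumption: UTF-8 input
  var splitter = createLineSplitter(onLine);
  var reader = new FileReader();
  var offset = 0;

  reader.onload = function () {
    // stream: true keeps a multi-byte character intact when a chunk
    // boundary cuts through it
    splitter.push(decoder.decode(reader.result, { stream: true }));
    offset += chunkSize;
    readNext();
  };
  reader.onerror = function () {
    onDone(reader.error);
  };

  function readNext() {
    if (offset >= file.size) {
      splitter.push(decoder.decode()); // flush bytes buffered by the decoder
      splitter.flush();
      onDone(null);
      return;
    }
    reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
  }

  readNext();
}

// Possible wiring to the file input from the question:
// $("#files").on('change', function (evt) {
//   readLinesInChunks(evt.target.files[0], function (line) {
//     /* parse one line here */
//   }, function (err) {
//     if (err) { console.error(err); }
//   });
// });
```

Decoding from an ArrayBuffer with a streaming TextDecoder, rather than calling readAsText on each slice, is a deliberate choice here: a byte-based slice can end in the middle of a multi-byte character, and readAsText would decode each slice in isolation.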
I have written a module named line-reader-browser for the same purpose. It uses Promises.
Syntax (TypeScript):
Usage:
Try the following code snippet to see the module working.
Hope it saves someone's time!
Eventually I created a new line-by-line reader, which is totally different from the previous one.
Features are:
Check this jsFiddle for examples.
Usage:
Performance is the same as in the previous solution. You can measure it by invoking 'Read' in the jsFiddle.
GitHub: https://github.com/anpur/client-line-navigator/wiki
Update: check LineNavigator from my second answer instead; that reader is way better.
I've made my own reader, which fulfills my needs.
Performance
As the issue is related only to huge files, performance was the most important part.
As you can see, performance is almost the same as direct read (as described in question above).
Currently I'm trying to make it better, as the biggest time consumer is the async call used to avoid hitting the call stack limit, which isn't needed for the processing itself.
Performance issue solved.
Quality
The following cases were tested:
Code & Usage
HTML:
Usage:
Code: