How can I read any large file (greater than 1 gigabyte) locally in chunks (2 KB or more), convert each chunk to a string, process that string, and then get the next chunk, and so on until the end of the file?
I'm only able to read small files and convert them to a string; as you can see from the code, I don't know how to read the file in chunks. The browser freezes if I try it with a file greater than 10 MB.
<html>
<head>
    <title>Read File</title>
</head>
<body>
    <input type="file" id="myFile">
    <hr>
    <textarea style="width: 500px; height: 400px" id="output"></textarea>
    <script>
        var input = document.getElementById("myFile");
        var output = document.getElementById("output");

        input.addEventListener("change", function () {
            if (this.files && this.files[0]) {
                var myFile = this.files[0];
                var reader = new FileReader();

                reader.addEventListener('load', function (e) {
                    output.textContent = e.target.result;
                });

                reader.readAsBinaryString(myFile);
            }
        });
    </script>
</body>
</html>
Below are the links and answers I found on Stack Overflow while researching how to accomplish this, but they didn't solve my question.
1: This question was asking how to do it using UniversalXPConnect, and only in Firefox, which is why I found the answer there to be irrelevant; I use Chrome and don't know what UniversalXPConnect is.
How to read a local file by chunks in JavaScript
2: This question was asking how to read text files only, but I want to be able to read any file, not just text, and also in chunks, which makes the answers there irrelevant, although I liked how short the code in the answer was. Reading local text file into a JavaScript array [duplicate]
3: This is also about text files and doesn't show how to read files in chunks: How to read a local text file.
I know a little bit of Java, where you can do it easily like this:
char[] myBuffer = new char[512];
int charsRead = 0;
BufferedReader in = new BufferedReader(new FileReader("foo.mp4"));
while ((charsRead = in.read(myBuffer, 0, 512)) != -1) {
    // process myBuffer[0..charsRead]
}
but I'm new to JavaScript.
So the issue isn't with FileReader, it's with:
output.textContent = e.target.result;
because you are trying to dump 10 MB+ worth of string into that textarea all at once. I'm not even sure there is a "right" way to do what you are wanting, since even if you did have it in chunks, it would still have to concatenate the previous value of output.textContent on each loop through those chunks, so as it gets closer to the end it would start slowing down in the same way (worse, really, because it would be doing the slow, memory-hogging business on every loop). So I think part of the looping process is going to have to be adding a new element (like a new textarea) to push the current chunk to, so it doesn't have to do any concatenation to preserve what has already been output. I haven't worked that part out yet, but here's what I've got so far:
var input = document.getElementById("myFile");
var output = document.getElementById("output");
var chunk_length = 2048; // 2KB as you mentioned
var chunker = new RegExp('[^]{1,' + chunk_length + '}', 'g');
var chunked_results;

input.addEventListener("change", function () {
    if (this.files && this.files[0]) {
        var myFile = this.files[0];
        var reader = new FileReader();

        reader.addEventListener('load', function (e) {
            chunked_results = e.target.result.match(chunker); // split the whole result into 2KB strings
            output.textContent = chunked_results[0];
        });

        reader.readAsBinaryString(myFile);
    }
});
This is just outputting the first string in the array of 2KB chunks. You would want to do your thing as far as adding a new element/node in the DOM document for outputting all the other chunks. Using RegExp and match for the actual chunking was lifted from a clever gist I found.
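One rough way to do that (just a sketch, assuming the chunked_results array from the load handler above has been filled) would be to append each chunk as its own element, so nothing ever has to be re-concatenated:
chunked_results.forEach(function (chunk) {
    var node = document.createElement('textarea'); // one new textarea per chunk
    node.value = chunk;
    document.body.appendChild(node);
});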
I was able to solve it by slicing the file, specifying where each slice should begin and end, which gives the chunk. I then enclosed it in a while loop so that on each iteration the chunk position shifts by the desired chunk size until the end of the file.
But after running it, I ended up with only the last chunk's value in the textarea, so to display the whole binary string I concatenate the output on each iteration.
<html>
<head>
    <title>Read File</title>
</head>
<body>
    <input type="file" id="myFile">
    <hr>
    <textarea style="width: 500px; height: 400px" id="output"></textarea>
    <script>
        var input = document.getElementById("myFile");
        var output = document.getElementById("output");
        var chunk_size = 2048;
        var offset = 0;

        input.addEventListener("change", function () {
            if (this.files && this.files[0]) {
                var myFile = this.files[0];
                var size = myFile.size; // file size, used as the loop bound
                var i = 0;

                while (i < size) {
                    var blob = myFile.slice(offset, offset + chunk_size); // slice out the next chunk
                    var reader = new FileReader();

                    reader.addEventListener('load', function (e) {
                        output.textContent += e.target.result; // concatenate the output on each iteration
                    });

                    reader.readAsBinaryString(blob);

                    offset += chunk_size; // advance the slice position to the next chunk
                    i += chunk_size;      // keep track of when to exit: stop once we reach the file size
                }
            }
        });
    </script>
</body>
</html>
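One thing to keep in mind with this approach is that the FileReader load callbacks are asynchronous, so in principle the chunks may not be appended strictly in order. A rough variation (just a sketch, reusing the same #myFile and #output elements from the markup above) is to start reading the next slice only after the previous one has loaded:
var input = document.getElementById("myFile");
var output = document.getElementById("output");
var chunk_size = 2048;

input.addEventListener("change", function () {
    if (this.files && this.files[0]) {
        var myFile = this.files[0];
        var offset = 0;

        function readNextChunk() {
            var blob = myFile.slice(offset, offset + chunk_size);
            var reader = new FileReader();
            reader.addEventListener('load', function (e) {
                output.textContent += e.target.result; // process/append this chunk
                offset += chunk_size;
                if (offset < myFile.size) {
                    readNextChunk(); // only start the next read once this one has finished
                }
            });
            reader.readAsBinaryString(blob);
        }

        readNextChunk();
    }
});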
You can do that using fs.createReadStream(). The amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor.
So you would do it like this:
var read = fs.createReadStream('/something/something', { highWaterMark: 64 });
Here's an example:
var fs = require('fs');

var read = fs.createReadStream('readfile.txt', { highWaterMark: 64 });
var write = fs.createWriteStream('written.txt');

read.on('open', function () {
    read.pipe(write);
});
See how it reads 64 bytes at a time (very slow). You can watch it happen in Explorer in a fun way, but make sure you have a large text file to test with, not a gigabyte but at least 17 megabytes like I did (fill it with any dummy text).
Set the file view to "Details" and keep refreshing the destination folder in Windows Explorer; you will see the size increase on every refresh.
https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
or a quick explanation:
readable.pipe(writable)
The pipe() function reads data from a readable stream as it becomes available and writes it to a destination writable stream.
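If you want to process each chunk as a string instead of piping it straight into another file, a minimal sketch (the file name and the 2 KB highWaterMark here are just placeholder values) could listen for 'data' events:
var fs = require('fs');

var read = fs.createReadStream('bigfile.bin', { highWaterMark: 2048 });

read.on('data', function (chunk) {
    // chunk is a Buffer of at most 2048 bytes; convert it to a string and process it
    var text = chunk.toString('binary');
    console.log('got a chunk of length', text.length);
});

read.on('end', function () {
    console.log('finished reading the file');
});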