I have developed code that reads very large files from FTP and writes them to the local machine using Java. The code that does it is as follows. This is part of the next(Text key, Text value) method inside the RecordReader of the CustomInputFormat:
    if (!processed)
    {
        System.out.println("in processed");
        in = fs.open(file);
        processed = true;
    }
    while (bytesRead <= fileSize) {
        byte buf[] = new byte[1024];
        try {
            in.read(buf);
            in.skip(1024);
            bytesRead += 1024;
            long diff = fileSize - bytesRead;
            if (diff < 1024)
            {
                break;
            }
            value.set(buf, 0, 1024); // This is where the value of the record is set and it goes to the mapper.
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    if (diff < 1024)
    {
        int difference = (int) (fileSize - bytesRead);
        byte buf[] = new byte[difference];
        in.read(buf);
        bytesRead += difference;
    }
    System.out.println("closing stream");
    in.close();
After the write is over, I see that the transfer is done and the size of the file at the destination is the same as at the source. But I am unable to open the file, and the editor gives the following error:
gedit has not been able to detect the character coding.
Please check that you are not trying to open a binary file.
Select a character coding from the menu and try again.
I believe the question "Java upload jpg using JakartaFtpWrapper - makes the file unreadable" is related to mine, but I couldn't make sense of it.
Any pointers?
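
For reference, a minimal sketch of a plain chunked copy loop is below. It assumes ordinary java.io streams rather than the Hadoop stream used in my code, and the class name ChunkedCopy and the in-memory test data are hypothetical, made up for illustration only. Unlike my loop, it uses the count returned by read() and never calls skip(), since read() already advances the stream. I am not sure whether that difference is what corrupts my file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical class name, for illustration only.
public class ChunkedCopy {

    // Copies everything from in to out in chunks of up to 1024 bytes
    // and returns the total number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int n;
        // read() may return fewer bytes than requested, so rely on its return value.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;           // no skip(): read() already moved the stream position
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Made-up test data; the length is deliberately not a multiple of 1024.
        byte[] source = new byte[5000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), sink);
        System.out.println(copied + " bytes copied, identical="
                + Arrays.equals(source, sink.toByteArray()));
    }
}
```

The last chunk needs no special case here, because the loop writes whatever count read() reports and stops at end of stream.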