I'm working on a file-handling app, and I recently encountered a bug caused by links that don't have a file extension, like this:
https://drive.google.com/uc?export=download&id=1234567abcdefghijk
I've been determining the file type from the filename at the end of the link, which is the direct link to the file.
In the case of a redirecting link like the Google Drive link above, it still returns the data, but since the link doesn't have a file extension, the UIWebView doesn't render document-type files (I use a different viewer for image types, and it renders fine because you can pass the data directly to a UIImage).
The solution I came up with was to check for the file signature, which you can find in the first 1024 bytes of the data. I found the file signatures for document types at http://www.filesignatures.net/index.php.
I can differentiate images and PDF files, but the problem is xls/ppt/doc and xlsx/pptx/docx, because the files in each group share the same signature: [D0 CF 11 E0 A1 B1 1A E1] and [50 4B 03 04] respectively.
Now what I want to know is if there are other ways to differentiate those Microsoft Office document files.
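For the xlsx/pptx/docx group, one direction that might work is to look inside the ZIP container: these files are ZIP archives, and by convention their parts live under a top-level folder named word/, xl/ or ppt/. The following is only a sketch under that assumption, not something tested in the app. It is plain C so it can sit in an Objective-C file, and the names OOXMLKind, OOXMLKindOfZip, readLE16, readLE32 and hasPrefix are hypothetical helpers invented for this illustration. It walks the ZIP central directory and ignores ZIP64 archives.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch (not part of the FileSignature enum). */
typedef enum {
    kOOXMLKindUnknown,
    kOOXMLKindDOCX,
    kOOXMLKindXLSX,
    kOOXMLKindPPTX
} OOXMLKind;

/* Read little-endian integers without assuming alignment. */
static uint16_t readLE16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns nonzero if the entry name starts with the given prefix. */
static int hasPrefix(const unsigned char *name, size_t nameLen, const char *prefix) {
    size_t n = strlen(prefix);
    return nameLen >= n && memcmp(name, prefix, n) == 0;
}

/* Walks the ZIP central directory and classifies the archive by the first
   entry found under word/, xl/ or ppt/. Heuristic only: ZIP64 archives and
   unconventional part names yield kOOXMLKindUnknown. */
static OOXMLKind OOXMLKindOfZip(const unsigned char *data, size_t length) {
    static const unsigned char eocdSig[] = {0x50, 0x4B, 0x05, 0x06};
    static const unsigned char cdirSig[] = {0x50, 0x4B, 0x01, 0x02};
    if (length < 22) return kOOXMLKindUnknown; /* EOCD record is at least 22 bytes */

    /* The end-of-central-directory record sits at the end of the file,
       possibly followed by an archive comment, so search backwards. */
    size_t eocd = length - 22;
    while (memcmp(data + eocd, eocdSig, 4) != 0) {
        if (eocd == 0) return kOOXMLKindUnknown;
        eocd--;
    }
    size_t entries = readLE16(data + eocd + 10); /* total number of entries */
    size_t pos = readLE32(data + eocd + 16);     /* offset of central directory */

    for (size_t i = 0; i < entries; i++) {
        /* Each central directory header has a fixed 46-byte part. */
        if (pos > length || length - pos < 46 ||
            memcmp(data + pos, cdirSig, 4) != 0) {
            return kOOXMLKindUnknown;
        }
        size_t nameLen = readLE16(data + pos + 28);
        size_t extraLen = readLE16(data + pos + 30);
        size_t commentLen = readLE16(data + pos + 32);
        if (length - pos - 46 < nameLen) return kOOXMLKindUnknown;

        const unsigned char *name = data + pos + 46;
        if (hasPrefix(name, nameLen, "word/")) return kOOXMLKindDOCX;
        if (hasPrefix(name, nameLen, "xl/"))   return kOOXMLKindXLSX;
        if (hasPrefix(name, nameLen, "ppt/"))  return kOOXMLKindPPTX;
        pos += 46 + nameLen + extraLen + commentLen;
    }
    return kOOXMLKindUnknown;
}
```

From Objective-C this could be called as OOXMLKindOfZip(documentData.bytes, documentData.length) once the [50 4B 03 04] signature has matched. It needs the whole file rather than just the first 1024 bytes, because the central directory is stored at the end of the archive.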
This is the code I have so far. If you know how to improve this function, I would accept an answer with some explanation:
typedef enum FileSignature {
    kFileSignaturePDF,
    kFileSignaturePPT_DOC_XLS,
    kFileSignaturePPTX_DOCX_XLSX,
    kFileSignaturePNG,
    kFileSignatureJPG,
    kFileSignatureBMP,
    kFileSignatureUndefined,
} FileSignature;
+ (FileSignature)getDocumentTypeOfData:(NSData *)documentData {
    // Known file signatures (magic numbers), each expected at offset 0.
    static const unsigned char pdfBytes[] = {0x25, 0x50, 0x44, 0x46};
    // FF D8 FF is common to JFIF (FF E0) and Exif (FF E1) JPEG files.
    static const unsigned char jpgBytes[] = {0xFF, 0xD8, 0xFF};
    static const unsigned char pngBytes[] = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char bmpBytes[] = {0x42, 0x4D};
    // pptx, xlsx, docx (ZIP container)
    static const unsigned char msOfficeXBytes[] = {0x50, 0x4B, 0x03, 0x04};
    // ppt, xls, doc (OLE2 compound file)
    static const unsigned char msOfficeBytes[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Pair each signature with its real length and the type it identifies.
    // The bytes are kept as raw C arrays: values above 0x7F are not valid
    // ASCII, so they cannot be round-tripped through NSString.
    const struct {
        const unsigned char *bytes;
        size_t length;
        FileSignature type;
    } signatures[] = {
        {pdfBytes, sizeof(pdfBytes), kFileSignaturePDF},
        {jpgBytes, sizeof(jpgBytes), kFileSignatureJPG},
        {pngBytes, sizeof(pngBytes), kFileSignaturePNG},
        {bmpBytes, sizeof(bmpBytes), kFileSignatureBMP},
        {msOfficeBytes, sizeof(msOfficeBytes), kFileSignaturePPT_DOC_XLS},
        {msOfficeXBytes, sizeof(msOfficeXBytes), kFileSignaturePPTX_DOCX_XLSX},
    };

    const unsigned char *data = documentData.bytes;
    NSUInteger dataLength = documentData.length;
    for (size_t i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        // Compare the leading bytes; skip signatures longer than the data.
        if (dataLength >= signatures[i].length &&
            memcmp(data, signatures[i].bytes, signatures[i].length) == 0) {
            return signatures[i].type;
        }
    }
    return kFileSignatureUndefined;
}
Took me a while to post this, but I went with trojanfoe's idea of getting the content-type from the response header. If you are using AFNetworking 2.0, then in the success block you can get the content-type via operation.response.allHeaderFields; allHeaderFields is also a property of NSHTTPURLResponse, for those doing it the manual NSURLConnection way.
If you can make some improvements to this, be it optimization, fewer lines of code, or additions to the list of supported documents, I suggest you post an answer.
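As a rough illustration of the content-type approach, here is a sketch of mapping the header value to a file extension. It is plain C so it can be dropped into an Objective-C file. The table kMimeTable and the function extensionForContentType are hypothetical names made up for this example, and the list only covers the types discussed above. Servers may also send a generic type such as application/octet-stream, in which case this returns NULL and the file signature check is still needed as a fallback.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table for this sketch: MIME type -> file extension. */
static const struct {
    const char *mime;
    const char *ext;
} kMimeTable[] = {
    {"application/pdf", "pdf"},
    {"image/png", "png"},
    {"image/jpeg", "jpg"},
    {"image/bmp", "bmp"},
    {"application/msword", "doc"},
    {"application/vnd.ms-excel", "xls"},
    {"application/vnd.ms-powerpoint", "ppt"},
    {"application/vnd.openxmlformats-officedocument.wordprocessingml.document", "docx"},
    {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "xlsx"},
    {"application/vnd.openxmlformats-officedocument.presentationml.presentation", "pptx"},
};

/* Returns the extension for a Content-Type header value, or NULL if the
   type is not in the table. Parameters such as "; charset=utf-8" and
   letter case are ignored. */
static const char *extensionForContentType(const char *contentType) {
    if (contentType == NULL) return NULL;

    /* The media type ends at the first ';' or whitespace character. */
    size_t len = strcspn(contentType, "; \t");

    for (size_t i = 0; i < sizeof(kMimeTable) / sizeof(kMimeTable[0]); i++) {
        const char *mime = kMimeTable[i].mime;
        if (strlen(mime) != len) continue;

        /* Case-insensitive comparison; the table entries are lowercase. */
        size_t j = 0;
        while (j < len && tolower((unsigned char)contentType[j]) == mime[j]) {
            j++;
        }
        if (j == len) return kMimeTable[i].ext;
    }
    return NULL;
}
```

The input would be the Content-Type entry of allHeaderFields converted to a C string, for example with UTF8String. Alternatively, the header value itself could be handed to UIWebView's loadData:MIMEType:textEncodingName:baseURL:, which takes the MIME type directly and so does not depend on a file extension.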