what is wrong with this binary file transfer (corr

Posted 2019-02-27 15:28

Question:

I've been trying to resolve this issue for over a week and could really do with some help.

We are using an HTTP request to post files to an API. Most files come out OK, but docx files end up corrupted.

After much research I'm pretty sure that I'm doing something wrong in the binary post that is adding extra data / bytes to the file.

Streams are being closed, and I think I've got the boundaries and headers right.

Are there any obvious mistakes in the code below? Or would anybody be able to point me in the right direction for a fix? Why is extra data being added to this file? Are the HTTP headers the issue, or am I reading the stream incorrectly? What is the most likely cause of my woes?

(I have tried to examine the extra data in the docx file to find out where it's coming from. But I have been unable to do so. There are many docx repair tools out there, but none I've come across give information about the error, they just fix the file. I have tried the Open XML SDK 2.0 for Microsoft Office, but this won't open the corrupt file, so I can't compare it to a fixed one. )

Code:

Sub PostTheFile(CVFile, fullFilePath, PostToURL)

    strBoundary = "---------------------------9849436581144108930470211272"
    strRequestStart = "--" & strBoundary & vbCrlf &_
        "Content-Disposition: attachment; name=""file""; filename=""" & CVFile & """" & vbcrlf & vbcrlf
    strRequestEnd = vbCrLf & "--" & strBoundary & "--" 

    Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = adTypeBinary 
        stream.Mode = adModeReadWrite     
        stream.Open
        stream.Write StringToBinary(strRequestStart)
        stream.Write ReadBinaryFile(fullFilePath)
        stream.Write StringToBinary(strRequestEnd)
        stream.Position = 0
        BINARYPOST= stream.read
        stream.Close

    Set stream = Nothing    

    Set httpRequest = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
        httpRequest.Open "PATCH", PostToURL, False, "username", "pw"
        httpRequest.setRequestHeader "Content-Type", "multipart/form-data; boundary=""" & strBoundary & """"
        httpRequest.Send BINARYPOST
        Response.write "httpRequest.status: " & httpRequest.status 
    Set httpRequest = Nothing   
End Sub


Function StringToBinary(input)
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
        stream.Charset = "UTF-8"
        stream.Type = adTypeText 
        stream.Mode = adModeReadWrite 
        stream.Open
        stream.WriteText input
        stream.Position = 0
        stream.Type = adTypeBinary 
        StringToBinary = stream.Read
        stream.Close
    set stream = Nothing
End Function

Function ReadBinaryFile(fullFilePath) 
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = 1
        stream.Open()
        stream.LoadFromFile(fullFilePath)
        ReadBinaryFile = stream.Read()
        stream.Close
    set stream = nothing
end function  
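
For comparison with the body assembled in PostTheFile, here is a minimal sketch of a single-file multipart/form-data body laid out byte by byte. It is written in Python purely as an inspection aid, not as a replacement for the ASP code. The helper name build_multipart_body, the "form-data" disposition, and the application/octet-stream part type are assumptions taken from the usual multipart/form-data conventions, not from the API being called.

```python
def build_multipart_body(field_name: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Frame payload as one multipart/form-data part (hypothetical helper)."""
    # Part headers: plain ASCII, CRLF line endings, blank line before the data.
    # Assumes field_name and filename are ASCII-only.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    # Closing delimiter: CRLF, two hyphens, the boundary, two hyphens, CRLF.
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail


payload = b"PK\x03\x04 example bytes"   # stand-in for the docx contents
body = build_multipart_body("file", "testcv.docx", payload, "XyZ123")

# The payload must sit untouched between the framing bytes.
start = body.index(b"\r\n\r\n") + 4
print(body[start:start + len(payload)] == payload)
```

Dumping the ASP-generated BINARYPOST to disk and comparing it with a body built this way is one way to see whether any bytes other than the framing surround the file data.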

Links to Files

Here are links to the files before and after going through the API. I kept them really simple.

http://fresherandprosper.com/cvsamples/testcv.corrupted.docx

http://fresherandprosper.com/cvsamples/testcv.notcorrupted.docx

Update

After Edi9999's fantastic help (see below) I thought my problems were over. All I had to do was figure out how I was generating the unwanted additional sequence in my code and remove it.

But I couldn't seem to nail WHAT to remove from my code. Nothing worked as expected.

Then I realised... each time I posted the file, the ending sequence came out slightly different.

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 

And the exact same file, using the exact same code posted 30 seconds later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 00

And again, a few minutes later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
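
These endings can be checked against the layout of the ZIP end-of-central-directory record, which starts with the signature 50 4b 05 06 and has a 22-byte fixed part, the last two bytes being the comment length. The following is a minimal sketch in Python (an inspection aid only), assuming the dumps above show the true end of each received file; describe_zip_tail is a hypothetical helper name.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # bytes 50 4b 05 06
EOCD_FIXED_SIZE = 22            # fixed part of the record, excluding any comment


def describe_zip_tail(data: bytes) -> str:
    """Say whether data ends in a complete end-of-central-directory record."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos == -1:
        return "no end-of-central-directory signature found"
    available = len(data) - pos
    if available < EOCD_FIXED_SIZE:
        return f"truncated: {EOCD_FIXED_SIZE - available} byte(s) of the record missing"
    # The final field of the fixed part is a 2-byte little-endian comment length.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    extra = available - EOCD_FIXED_SIZE - comment_len
    if extra > 0:
        return f"complete, followed by {extra} unexpected trailing byte(s)"
    if extra < 0:
        return f"truncated: {-extra} byte(s) of the comment missing"
    return "complete"


# Record tails copied from the three uploads (from the signature onwards).
first = bytes.fromhex("504b 0506 0000 0000 0b00 0b00")
second = bytes.fromhex("504b 0506 0000 0000 0b00 0b00 c102 00")
third = bytes.fromhex("504b 0506 0000 0000 0b00 0b00 c102 0000 ed24")

print(describe_zip_tail(first))   # truncated: 10 byte(s) of the record missing
print(describe_zip_tail(second))  # truncated: 7 byte(s) of the record missing
print(describe_zip_tail(third))   # truncated: 4 byte(s) of the record missing
```

If that assumption holds, each upload was cut off at a different point inside the final record, which would point at truncation rather than at extra bytes for these three attempts.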

Maybe this deserves a new question. But there's already about 6 relating to this issue so I'm reluctant to add yet another one.

Answer 1:

Here is what I did with your two docx files:

  • I opened both in Word; the corrupted one was indeed reported as corrupt.
  • I unzipped both; the extracted contents were completely identical.

I then compared the sizes of the two docx files, and they were different.

So I looked at the raw bytes. The beginning of both files is identical:

504b 0304 1400 0600 0800 0000 2100 ddfc
9537 6601 0000 2005 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000

But at the end:

Uncorrupted file

6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000 

Corrupted file

6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000 0a2d 2d2d 2d2d 2d2d 2d2d 

As you can see, the corrupted file ends with an extra sequence: 0a2d 2d2d 2d2d 2d2d 2d2d. The rest of the file is identical, and when I delete this sequence the file is no longer corrupted.
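
The same comparison can be scripted. This is a minimal sketch in Python, assuming both files have been read fully into memory (for example with pathlib.Path.read_bytes); split_common_prefix is a hypothetical helper name, and the two byte strings below are just the file endings quoted above.

```python
def split_common_prefix(good: bytes, bad: bytes):
    """Return (length of shared prefix, rest of good, rest of bad)."""
    n = 0
    limit = min(len(good), len(bad))
    while n < limit and good[n] == bad[n]:
        n += 1
    return n, good[n:], bad[n:]


# Endings of the two files as shown in the hex dumps.
good_tail = bytes.fromhex(
    "6f72 642f 7374 796c 6573 2e78 6d6c 504b "
    "0506 0000 0000 0b00 0b00 c102 0000 ed24 "
    "0000 0000"
)
bad_tail = bytes.fromhex(
    "6f72 642f 7374 796c 6573 2e78 6d6c 504b "
    "0506 0000 0000 0b00 0b00 c102 0000 ed24 "
    "0000 0000 0a2d 2d2d 2d2d 2d2d 2d2d"
)

shared, good_rest, bad_rest = split_common_prefix(good_tail, bad_tail)
print(shared)     # 36
print(good_rest)  # b''
print(bad_rest)   # b'\n---------'
```

An empty remainder for the good file together with a non-empty remainder for the bad one means the bad file is the good file plus appended bytes.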

Converted to ASCII, 0a2d 2d2d 2d2d 2d2d 2d2d is a line feed followed by nine hyphens: \n---------

This probably comes from the closing delimiter: strRequestEnd = vbCrLf & "--" & strBoundary & "--"

However, as I don't fully understand what happens in your code, please explain this portion of it in more detail if you want more help.
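
To test that guess, the stray bytes can be compared with the closing delimiter that the question's code builds. This is a minimal sketch in Python; the boundary string is copied from strBoundary in the question, and the stray bytes come from the hex dump of the corrupted file.

```python
# strBoundary as written in the question's code.
BOUNDARY = "---------------------------9849436581144108930470211272"

# strRequestEnd = vbCrLf & "--" & strBoundary & "--"
closing = ("\r\n--" + BOUNDARY + "--").encode("ascii")

# Extra bytes found at the end of the corrupted file.
stray = bytes.fromhex("0a2d 2d2d 2d2d 2d2d 2d2d")

# Decoded: a line feed and nine hyphens.
print(stray.decode("ascii") == "\n" + "-" * 9)

# The stray bytes do not match the delimiter from its first byte (the CR)...
print(closing.startswith(stray))

# ...but they do match it from the second byte (the LF) onwards.
print(closing[1:1 + len(stray)] == stray)
```

So the stray bytes look like the start of the closing delimiter without its carriage return, cut off after nine hyphens. That fits the idea that part of the multipart framing is being saved as file data, although it does not show which side drops the carriage return or cuts the delimiter short.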

Hope this helps