How to upload chunks of a string longer than 2147483647 bytes

Posted 2020-03-15 01:27

I am trying to upload a file of about 5 GB as shown below, but it throws the error "string longer than 2147483647 bytes". It sounds like there is a 2 GB limit on the upload. Is there a way to upload the data in chunks? Can anyone provide guidance?

logger.debug(attachment_path)
currdir = os.path.abspath(os.getcwd())
os.chdir(os.path.dirname(attachment_path))
headers = self._headers
headers['Content-Type'] = content_type
headers['X-Override-File'] = 'true'
if not os.path.exists(attachment_path):
    raise Exception("File path was invalid, no file found at the path %s" % attachment_path)
filesize = os.path.getsize(attachment_path) 
fileToUpload = open(attachment_path, 'rb').read()
logger.info(filesize)
logger.debug(headers)
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
                 headers=headers, data=fileToUpload, timeout=300)

ERROR:

string longer than 2147483647 bytes

UPDATE:

def read_in_chunks(file_object, chunk_size=30720*30720):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

f = open(attachment_path)

for piece in read_in_chunks(f):
    r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
                     headers=headers, data=piece, timeout=300)

Tags: python

1 answer

chillily · 2020-03-15 01:58

Your question has been asked on the requests bug tracker; their suggestion is to use a streaming upload. (The 2147483647 in the error is 2**31 - 1: reading the whole 5 GB file into a single string exceeds a signed 32-bit length limit in the underlying SSL/socket layer.) If streaming doesn't work, you might see whether a chunk-encoded request does.

[edit]

Example based on the original code:

# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large file upload
        timeout=300,
    )

I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker discussion also mentions that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
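If the extra headers do turn out to be the problem, one thing to try (my guess, not something from the bug tracker) is paring the dict down to the bare minimum and letting requests work out Content-Length from the file object itself:

# Minimal sketch: keep only the Content-Type from the original code.
# requests derives Content-Length from the file object on its own, so
# don't set it by hand.
headers = {'Content-Type': content_type}
# Re-add 'X-Override-File': 'true' only if your API actually requires it.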

Regarding chunk encoding: this should be your second choice. Your updated code was not specifying 'rb' as the mode for open(...), so changing that should probably make the streaming code above work. If not, you could try this.

def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than this would be a good idea?
    chunk_size = 30720 * 30720

    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data


# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
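One more note: when data is a generator, requests sends the body with Transfer-Encoding: chunked and no Content-Length, so the server has to accept chunked uploads for this to work at all. A quick way to sanity-check that behavior, using httpbin.org purely as a stand-in for your real endpoint:

# Sanity-check sketch: passing a generator to `data` should make requests
# use chunked transfer encoding; httpbin echoes the request headers back.
import requests

def gen():
    yield b'hello '
    yield b'world'

r = requests.put('https://httpbin.org/put', data=gen(), timeout=30)
print(r.json()['headers'].get('Transfer-Encoding'))  # expect 'chunked'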