Send a dictionary containing a file through a socket

Posted 2019-07-25 10:51

Question:

Is it possible to send a dict that contains a file (image or document) as a value through a socket?

I tried something like the code below, but it failed:

with open("cat.jpeg", "rb") as f:
    myFile = f.read(2048)

data = {"id": "1283", "filename": "cat.jpeg", "file": myFile}

dataToSend = json.dumps(data).encode("utf-8")

This raises a JSON error: myFile is a bytes object, which can't be serialized.

I tried converting myFile into a string using base64 encoding, but it didn't work.

What partially worked was casting myFile to a string with str(myFile). The JSON serializer accepted it and I sent it through the socket; the dict arrived intact, but the myFile data was corrupted, so I couldn't recreate the picture.

So is this approach possible, or how should I send the file and the data through a socket so that they are easy to parse on the other side?

Later edit:

It still doesn't work with base64 encoding: myFile is still of type bytes, and json raises this error: TypeError: Object of type 'bytes' is not JSON serializable

Client

import os
import base64
import json
import socket

currentPath = os.path.dirname(os.path.abspath(__file__)) + "\\downloads\\"

with open(currentPath + "cat.png", "rb") as f:
    l = f.read()

print(type(l))   #prints <class 'bytes'>

myFile = base64.b64encode(l)

print(type(myFile))    #prints <class 'bytes'>

data = {"id": "12", "filename": "cat.png", "message": "So cute!", "file": myFile}

dataToSend = json.dumps(data).encode("utf-8")   #prints TypeError: Object of type 'bytes' is not JSON serializable

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 1234))
s.sendall(dataToSend)
s.close()

And the Server:

import socket
import json
import os
import sys
import time
import base64

currentPath = os.path.dirname(os.path.abspath(__file__)) + "\\fileCache\\"
tempData = bytearray()

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 1234))
s.listen(5)
conn, addr = s.accept()

while True:
    dataReceived = conn.recv(2048)
    if dataReceived:  # recv() returns b"" once the client has closed the connection
        tempData = tempData + dataReceived
    else:
        data = json.loads(tempData.decode("utf-8"))
        break
    time.sleep(1)

print(data)

myFile = base64.b64decode(data["file"])

with open(currentPath + data["filename"], "wb") as f:
    f.write(myFile)
    f.close()

Answer 1:

As I said in my comment, packing binary data into a text format like JSON is wasteful: base64 increases the transfer size by about 33%, and the receiver has to parse the entire JSON document, including the whole encoded file, before it can read any of the other fields.
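
To put a number on the base64 overhead mentioned above, here is a small sketch. The 3000-byte random buffer is only a stand-in for the contents of a real file.

```python
import base64
import os

raw = os.urandom(3000)           # stand-in for the bytes of a real file
encoded = base64.b64encode(raw)  # base64 maps every 3 input bytes to 4 output characters

overhead = len(encoded) / len(raw) - 1
print(len(raw), len(encoded), "{:.0%}".format(overhead))
```

The ratio stays at roughly 4/3 regardless of file size, and that is before JSON adds its own quoting around the string.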

It's much better to send them separately: the JSON as JSON, followed by the file contents as raw binary. Of course, you'll need a way to tell the two apart, and the easiest is to prefix the JSON data with its length so that the server knows how many bytes to read to obtain the JSON and can then treat the rest as the file contents. This makes for a very simple protocol with messages formed as:

[JSON LENGTH][JSON][FILE CONTENTS]

Assuming the JSON will never be larger than 4 GB (and if it is, you'll have much bigger problems, as parsing it would be a nightmare), a fixed 4-byte (32-bit) unsigned integer is more than enough for the JSON LENGTH (you can even go for 16 bits if you don't expect the JSON to exceed 64 KB). The whole strategy then works on the client side as:

  1. Create the payload
  2. Encode it to JSON and then encode it to bytes using UTF-8 encoding
  3. Get the length of the encoded JSON and send it as the first 4 bytes of the stream
  4. Send the encoded JSON
  5. Read and send the file contents
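
As a rough illustration of the client steps, the sketch below builds the whole message in memory. The helper name frame_message is hypothetical, and the file bytes are a short placeholder; a real client would stream the file rather than load it all at once.

```python
import json
import struct


def frame_message(payload, file_bytes):
    """Return [JSON LENGTH][JSON][FILE CONTENTS] as one bytes object (hypothetical helper)."""
    json_data = json.dumps(payload).encode("utf-8")  # steps 1-2: payload -> JSON -> UTF-8 bytes
    header = struct.pack(">I", len(json_data))       # step 3: 4-byte big-endian unsigned length
    return header + json_data + file_bytes           # steps 4-5: JSON first, then the file


message = frame_message({"id": 12, "filename": "cat.png"}, b"placeholder image bytes")
print(message[:4])  # the length prefix
print(message[4:])  # the JSON followed directly by the file contents
```

Note that nothing marks the end of the file contents; in this scheme the receiver treats everything after the JSON as the file, up to the point where the connection closes.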

And on the server side you do the reverse:

  1. Read the first 4 bytes of the received data to get the JSON payload length
  2. Read exactly that many of the following bytes
  3. Decode them to a string using UTF-8 and then decode the JSON to get the payload
  4. Read the rest of the streamed data and store it to a file
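
The server steps can be sketched the same way. Here io.BytesIO stands in for the connection, and parse_message is a hypothetical helper name. Unlike BytesIO, a real socket's recv() may return fewer bytes than requested, so the reads have to be looped on a live connection.

```python
import io
import json
import struct


def parse_message(stream):
    """Split [JSON LENGTH][JSON][FILE CONTENTS] read from a file-like object (hypothetical helper)."""
    (json_length,) = struct.unpack(">I", stream.read(4))  # step 1: the 4-byte length prefix
    json_data = stream.read(json_length)                  # step 2: exactly that many bytes
    payload = json.loads(json_data.decode("utf-8"))       # step 3: UTF-8 -> JSON -> dict
    file_bytes = stream.read()                            # step 4: everything that is left
    return payload, file_bytes


# build a sample message by hand so this block runs on its own
json_data = json.dumps({"id": 12, "filename": "cat.png"}).encode("utf-8")
wire = struct.pack(">I", len(json_data)) + json_data + b"placeholder image bytes"

payload, file_bytes = parse_message(io.BytesIO(wire))
print(payload["filename"], len(file_bytes))
```

One way to reuse a helper like this on a real connection is to wrap the socket with its makefile("rb") method, which returns a buffered file-like object.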

Or in code, client:

import json
import os
import socket
import struct

BUFFER_SIZE = 4096  # a uniform buffer size to use for our transfers

# pick up an absolute path from the script folder, not necessary tho
file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "downloads", "cat.png"))

# let's first prepare the payload to send over
payload = {"id": 12, "filename": os.path.basename(file_path), "message": "So cute!"}
# now JSON encode it and then turn it into a bytes stream by encoding it as UTF-8
json_data = json.dumps(payload).encode("utf-8")
# then connect to the server and send everything
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:  # create a socket
    print("Connecting...")
    s.connect(("127.0.0.1", 1234))  # connect to the server
    # first send the JSON payload length
    print("Sending `{filename}` with a message: {message}.".format(**payload))
    s.sendall(struct.pack(">I", len(json_data)))  # pack as BE 32-bit unsigned int
    # now send the JSON payload itself
    s.sendall(json_data)  # let Python deal with the buffer on its own for the JSON...
    # finally, open the file and 'stream' it to the socket
    with open(file_path, "rb") as f:
        chunk = f.read(BUFFER_SIZE)
        while chunk:
            s.sendall(chunk)  # send() may transmit only part of the chunk, sendall() does not
            chunk = f.read(BUFFER_SIZE)
    # alternatively, if you're using Python 3.5+ you can just use socket.sendfile() instead
    print("Sent.")
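
The socket.sendfile() alternative mentioned in the client's last comment could look roughly like the sketch below. To keep it runnable without a server, it uses socket.socketpair() as a stand-in for the client and server sockets and a temporary file in place of cat.png; both are assumptions made for the demonstration.

```python
import os
import socket
import tempfile

# a throwaway file standing in for cat.png
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(5000))
    file_path = tmp.name

# a connected pair of sockets standing in for the client and the server
sender, receiver = socket.socketpair()

with sender, open(file_path, "rb") as f:  # the file must be opened in binary mode
    sent = sender.sendfile(f)             # returns the total number of bytes sent

received = b""
with receiver:
    chunk = receiver.recv(4096)
    while chunk:                          # an empty read means the sender closed its end
        received += chunk
        chunk = receiver.recv(4096)

os.remove(file_path)
print(sent, len(received))
```

In the client above, the sendfile() call would replace the whole read-and-send loop.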

And the server:

import json
import os
import socket
import struct

BUFFER_SIZE = 4096  # a uniform buffer size to use for our transfers

target_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "fileCache"))

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 1234))  # bind to the 1234 port on localhost
    s.listen(0)  # allow only one connection so we don't have to deal with data separation
    while True:
        print("Waiting for a connection...")
        connection, address = s.accept()  # wait for and accept the incoming connection
        print("Connection from `{}` accepted.".format(address))
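        # read the starting 32 bits and unpack them into an int to get the JSON length
        header = b""
        while len(header) < 4:  # recv() may return fewer bytes than requested
            chunk = connection.recv(4 - len(header))
            if not chunk:  # connection closed before a full header arrived
                break
            header += chunk
        json_length = struct.unpack(">I", header)[0]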
        # now read the JSON data of the given size and JSON decode it
        json_data = b""  # initiate an empty bytes structure
        while len(json_data) < json_length:
            chunk = connection.recv(min(BUFFER_SIZE, json_length - len(json_data)))
            if not chunk:  # no data, possibly broken connection/bad protocol
                break  # just exit for now, you should deal with this case in production
            json_data += chunk
        payload = json.loads(json_data.decode("utf-8"))  # JSON decode the payload
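        # now read the rest and store it into a file at the target path
        # (basename() keeps a client-supplied name from pointing outside the target folder)
        file_path = os.path.join(target_path, os.path.basename(payload["filename"]))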
        with open(file_path, "wb") as f:  # open the target file for writing...
            chunk = connection.recv(BUFFER_SIZE)  # and stream the socket data to it...
            while chunk:
                f.write(chunk)
                chunk = connection.recv(BUFFER_SIZE)
        connection.close()  # release the client socket before waiting for the next one
        # finally, let's print out that we received the data
        print("Received `{filename}` with a message: {message}".format(**payload))

NOTE: Keep in mind that this is Python 3.x code - for Python 2.x you'll have to deal with context management yourself instead of having the with ... block to open/close your sockets.

And that's all there is to it. Of course, in a real setting you need to deal with disconnects, multiple clients, etc. But this is the underlying process.
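
As one example of handling a disconnect, a helper along these lines reads an exact number of bytes and raises if the peer goes away first. The name recv_exact is hypothetical, and socket.socketpair() is used only so the sketch runs without a separate client and server.

```python
import socket
import struct


def recv_exact(conn, size):
    """Read exactly `size` bytes from `conn`, or raise if the peer disconnects early."""
    data = b""
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:  # an empty read means the other side closed the connection
            raise ConnectionError("expected {} bytes, got {}".format(size, len(data)))
        data += chunk
    return data


client, server = socket.socketpair()
with client, server:
    client.sendall(struct.pack(">I", 1234))  # a complete 4-byte header
    (json_length,) = struct.unpack(">I", recv_exact(server, 4))
    print(json_length)

    client.sendall(b"\x00\x00")              # only half a header...
    client.shutdown(socket.SHUT_WR)          # ...and then the client goes away
    try:
        recv_exact(server, 4)
    except ConnectionError as exc:
        print("incomplete message:", exc)
```

The same helper could serve for both the length header and the JSON body, so a truncated message surfaces as an exception instead of a confusing JSON parse error.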



Answer 2:

You should be able to do:

data = base64.b64encode(myFile).decode("ascii")  # in Python 3 b64encode() returns bytes, json needs a str
dataToSend = json.dumps({"id":"1283","filename":"cat.jpeg", "file":data})

and then send through the socket. When you receive the data on the other end of the socket, just do:

jsonDict = json.loads(dataReceived)
data = base64.b64decode(jsonDict["file"])

A better way might be to just use bson, https://github.com/py-bson/bson.

from gevent import monkey, socket
monkey.patch_all()

import bson
bson.patch_socket()
with open("cat.jpeg", "rb") as f:
    myFile = f.read()
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 12345))
s.sendobj({u"id": "1283", u"filename": "cat.jpeg", u"file": myFile})


Answer 3:

Thanks everyone for the help, I finally got it working using base64. I found the answer here on Stack Overflow; I forgot the link to it, but here it goes.

I had to encode and decode the file like this before using json.dumps.

base64_bytes = b64encode(l)
myFile = base64_bytes.decode("utf-8")
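
A minimal round trip of that idea, without any sockets, might look like this. The few raw bytes are a made-up stand-in for real file contents.

```python
import base64
import json

raw = b"\x89PNG\r\n\x1a\n\x00\xff"  # a few non-text bytes standing in for file contents

# sending side: bytes -> base64 bytes -> str, which json.dumps() accepts
encoded = base64.b64encode(raw).decode("utf-8")
message = json.dumps({"filename": "cat.png", "file": encoded}).encode("utf-8")

# receiving side: JSON -> str -> original bytes
decoded = json.loads(message.decode("utf-8"))
restored = base64.b64decode(decoded["file"])

print(restored == raw)
```

By contrast, str(raw) produces the printable representation of the bytes object (text starting with b'), not the data itself, which is why the str() attempt in the question produced a corrupted file.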

Here is a working example:

Client:

import os
from base64 import b64encode
import json
import socket

currentPath = os.path.dirname(os.path.abspath(__file__)) + "\\downloads\\"

with open(currentPath + "cat.png", "rb") as f:
    l = f.read()

base64_bytes = b64encode(l)
myFile = base64_bytes.decode("utf-8")

data = {"id": "12", "filename": "cat.png", "message": "So cute!", "file": myFile}

dataToSend = json.dumps(data).encode("utf-8")

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 1234))
s.sendall(dataToSend)
s.close()

Server:

import socket
import json
import os
import sys
import base64

currentPath = os.path.dirname(os.path.abspath(__file__)) + "\\fileCache\\"
tempData = bytearray()

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 1234))
s.listen(5)
conn, addr = s.accept()

while True:
    dataReceived = conn.recv(4096)

    if dataReceived:  # recv() returns b"" once the client has closed the connection
        tempData = tempData + dataReceived
    else:
        data = json.loads(tempData.decode("utf-8"))
        break

myFile = base64.b64decode(data["file"])

with open(currentPath + data["filename"], "wb") as f:
    f.write(myFile)  # the with block closes the file on exit