Why do the server and client get out of sync? (pyth

Posted 2020-07-22 09:51

Question:

I'm currently writing a small client-server application for transferring an arbitrary file from a server to a client, via sockets.

The server only handles one client at a time, but once a client has been served it should be ready to accept a new client connection.

The client requests a file; if the file exists, the client receives it, writes it to disk and closes the connection.

Server code:

import os
import sys
from socket import socket, AF_INET, SOCK_STREAM

PORT = 9000
BUFSIZE = 1000

def main(argv):
    print('The server is ready to receive')
    server_socket = socket(AF_INET, SOCK_STREAM)
    server_socket.bind(('', PORT))
    server_socket.listen(1)
    while True:
        connection_socket, addr = server_socket.accept()

        try:
            requested_filepath = connection_socket.recv(BUFSIZE).decode()
            print("Client requested the file: " + requested_filepath)
            capital_sentence = requested_filepath.upper()
            if(os.path.isfile(requested_filepath)):
                filesize = str(os.path.getsize(requested_filepath))
                connection_socket.send(filesize.encode())
                with open(requested_filepath, 'rb') as f:
                    while(True):
                        content = f.read(BUFSIZE)
                        if not content:
                            break
                        connection_socket.send(content)
                print('File has been send')
            else:
                error = "error"
                connection_socket.send(error.encode())
        finally: 
            connection_socket.close()

Client code:

import sys
from socket import socket, AF_INET, SOCK_STREAM

PORT = 9000
BUFSIZE = 1000

def main(argv):
    servername = argv[0]
    filepath = argv[1]

    client_socket = socket(AF_INET, SOCK_STREAM)    
    client_socket.connect((servername, PORT))
    try:
        client_socket.send(filepath.encode())
        response = client_socket.recv(BUFSIZE).decode()
        if(response != "error"):
            filesize = int(response)
            print("Requested filesize: " + str(filesize))
            filename = filepath.split('/')[-1]
            with open(filename, 'wb') as f:
                while(True):
                    content = client_socket.recv(BUFSIZE)
                    if not content:
                        break
                    f.write(content)
            print('File recived')
        else:
            print("The requested file did not exist")
    finally:
        client_socket.close()

I can run the server and have the client request and receive a file, but when I run the client a second or third time, the server and client seem to get out of sync. Both programs crash with the following error messages:

Client error:

Traceback (most recent call last):
  File "client.py", line 37, in <module>
    main(sys.argv[1:])
  File "client.py", line 16, in main
    response = client_socket.recv(BUFSIZE).decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte

Server error:

The server is ready to receive
Client requested the file: /pepe.jpeg
File has been send
Client requested the file: /pepe.jpeg
File has been send
Client requested the file: /pepe.jpeg
Traceback (most recent call last):
  File "server.py", line 44, in <module>
    main(sys.argv[1:])
  File "server.py", line 30, in main
    connection_socket.send(content)
ConnectionResetError: [Errno 104] Connection reset by peer

Am I not closing the socket connection in a proper way?

Answer 1:

You have fallen into one of the most common TCP socket programming traps. You assumed your socket would send messages, but a TCP socket sends and receives only a stream of bytes and is completely agnostic to your message structure. Even if you send data using several send calls, your recv calls do not see those boundaries; they return whatever happens to be in the buffer. If you sent one byte a thousand times, a single recv(1000) could return all thousand bytes at once, and this is what is going on here.

Your issue is caused by your server being a bit faster than your client. I had to tweak your code to reproduce the problem reliably, but this does it:

client_socket.send(filepath.encode())
sleep(1)  # needs "from time import sleep"; lets the server get ahead of the client
response = client_socket.recv(BUFSIZE).decode()

This emulates your server being faster than the client, which eventually will happen anyway. By adding sleep we can make it happen every time.

When you call recv on a TCP socket, one of the following five things can happen:

  1. There is no data and the call blocks
  2. You received data and the data you received is exactly one "message", whatever that is in your context
  3. Your server had sent more than one message before you read from the socket and you received them all in one go
  4. Your client was too eager to read and it read when only a part of your first message was available
  5. A combination of 3 and 4: you receive several full messages plus one partial message

What happens with your code is that your server has managed to send the encoded file size and some of the file data as well. Your client assumes the first recv receives only the file size, but this is in no way guaranteed. There can already be some file data in there (you read up to BUFSIZE bytes, so it can be almost a full buffer of it), and when you try to decode that as text and convert it to an integer, weird things happen because the data is not what you expected.
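The failure in the client traceback can be reproduced without a network at all. The sketch below is only an illustration: socket.socketpair() stands in for the real connection, and the six-digit size and the two JPEG marker bytes are made-up values chosen to match the "byte 0xff in position 6" in the error message.

```python
import socket

# socketpair() returns two connected stream sockets; it stands in for the
# real server/client connection (an assumption made for this demo).
server_side, client_side = socket.socketpair()

server_side.sendall(b"123456")    # hypothetical file size "message"
server_side.sendall(b"\xff\xd8")  # first two bytes of a JPEG file
server_side.close()

# Read until the peer closes. How the bytes are split across recv calls
# is not specified; only their order is.
chunks = []
while True:
    chunk = client_side.recv(1000)
    if not chunk:
        break
    chunks.append(chunk)
client_side.close()

stream = b"".join(chunks)
print(len(chunks), "recv call(s) returned", stream)

try:
    stream.decode()
except UnicodeDecodeError as exc:
    # Size digits and file bytes arrived as one undifferentiated stream.
    print("decode failed:", exc)
```

Whether the client sees one chunk or two depends on timing, which is why the original program only fails some of the time.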

The only reliable way to handle TCP sockets is to read from the socket, append to a temporary processing buffer, then parse that buffer and see what is in there. If there is a "message", process it and delete it from the buffer. Whatever remains in the buffer must stay there and your next recv result gets appended to this.
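A minimal sketch of that append, parse, and remove cycle might look like the following. The ReceiveBuffer class and its method names are hypothetical, and it assumes messages are terminated by a newline, which suits text headers but not raw file data (a binary file can contain newline bytes).

```python
class ReceiveBuffer:
    """Accumulates raw bytes and hands back complete delimited messages."""

    def __init__(self, delimiter=b"\n"):
        self._buffer = bytearray()
        self._delimiter = delimiter

    def feed(self, data):
        """Append freshly received bytes; return every complete message."""
        self._buffer += data
        messages = []
        while True:
            index = self._buffer.find(self._delimiter)
            if index == -1:
                break  # only a partial message is left; wait for more data
            messages.append(bytes(self._buffer[:index]))
            # Delete the processed message and its delimiter from the buffer.
            del self._buffer[:index + len(self._delimiter)]
        return messages

    def pending(self):
        """Bytes received so far that do not yet form a complete message."""
        return bytes(self._buffer)


buf = ReceiveBuffer()
print(buf.feed(b"SIZE: 12"))   # partial message, nothing to process yet
print(buf.feed(b"34\nDATA"))   # completes the first message
print(buf.pending())           # the remainder stays for the next recv
```

In a real client, each result of recv would be passed to feed() and the returned messages processed in order.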

The simplest quick fix is for your server to make the initial message a fixed length. Then you can safely read exactly that number of bytes from the socket and process them as the size/error message, and the rest will be data. This is a horrible fix in many, many ways and you should aim for something better. The "proper" way is to devise a protocol in which the server puts delimiters in place so that your client can detect which message means what. Your protocol could be, for example:

SIZE: <decimal>\n
DATA: <data>

or even as simple as assuming everything before a newline is filesize and everything that follows is data.
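As a rough sketch of the simpler variant (everything before the first newline is the file size, everything after it is data), the two ends could look like this. The function names are hypothetical, and terminating the "error" reply with a newline as well is an assumption that goes beyond the original code.

```python
import socket


def send_file_bytes(sock, payload):
    """Send '<size>\\n' followed by the raw payload."""
    sock.sendall(str(len(payload)).encode() + b"\n")
    sock.sendall(payload)


def receive_file_bytes(sock, bufsize=1000):
    """Return the payload, or None if the server answered 'error'."""
    buffer = bytearray()
    # Keep reading until the whole header line has arrived.
    while b"\n" not in buffer:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before the header arrived")
        buffer += chunk
    header, _, rest = bytes(buffer).partition(b"\n")
    if header == b"error":
        return None
    filesize = int(header.decode())
    # Anything that came after the newline is already file data.
    data = bytearray(rest)
    while len(data) < filesize:
        chunk = sock.recv(min(bufsize, filesize - len(data)))
        if not chunk:
            raise ConnectionError("connection closed before all data arrived")
        data += chunk
    return bytes(data)


# Demo over a local socket pair standing in for the real connection.
sender, receiver = socket.socketpair()
send_file_bytes(sender, b"\xff\xd8 pretend this is a JPEG\n with newlines")
print(receive_file_bytes(receiver))
sender.close()
receiver.close()
```

Because the size is known up front, the client can also detect a truncated transfer instead of silently writing a short file.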

Still, here is the quick fix. It works even with sleep(1) added, because the server now pads the initial message to exactly 100 bytes. This could still go wrong because of (4), so you will actually need to check that you received 100 bytes initially and keep reading until you do, but I will leave this for you to implement.

Server:

        if os.path.isfile(requested_filepath):
            filesize = str(os.path.getsize(requested_filepath))
            # Pad the size header with spaces to exactly 100 bytes.
            connection_socket.sendall(filesize.encode().ljust(100))
            with open(requested_filepath, 'rb') as f:
                while True:
                    content = f.read(BUFSIZE)
                    if not content:
                        break
                    # sendall() keeps sending until the whole chunk is written.
                    connection_socket.sendall(content)
            print('File has been send')
        else:
            # The error reply is padded to the same fixed length.
            connection_socket.sendall(b"error".ljust(100))

Client:

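try:
    client_socket.send(filepath.encode())
    sleep(1)  # only here to reproduce the race; remove it in real code
    # The header is always 100 bytes, so no file data is read along with it.
    response_raw = client_socket.recv(100)
    # Drop the padding before decoding the size (or "error").
    response = response_raw.strip().decode()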
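The check for case (4), reading until all 100 header bytes have arrived, could be sketched as below. recv_exactly is a hypothetical helper name, not something from the socket module, and the demo uses socket.socketpair() in place of the real connection.

```python
import socket


def recv_exactly(sock, count):
    """Keep calling recv until exactly `count` bytes have been read."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError("connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demo: the header arrives in two pieces, followed by file data.
server_side, client_side = socket.socketpair()
server_side.sendall(b"1234")
server_side.sendall(b" " * 96)
server_side.sendall(b"\xff\xd8")

response = recv_exactly(client_side, 100).strip().decode()
print("header:", response)
print("first data bytes:", recv_exactly(client_side, 2))
server_side.close()
client_side.close()
```

In the client above, recv_exactly(client_socket, 100) would replace client_socket.recv(100).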

PS: Your server should catch the "connection reset by peer" error. It can happen if there is a network problem or the client application crashes. The server can safely ignore this error and just stop sending to that particular client socket.
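One way to do that, as a sketch only: wrap the send loop so that a reset ends the transfer for that client without taking the server down. send_stream and ResettingSocket are hypothetical names; the latter is a stand-in used here to simulate a client that has gone away. Catching BrokenPipeError as well is an assumption about what else a vanished client can cause.

```python
import io


def send_stream(connection_socket, stream, bufsize=1000):
    """Send a binary stream in chunks; return False if the client went away."""
    try:
        while True:
            content = stream.read(bufsize)
            if not content:
                return True
            connection_socket.sendall(content)
    except (ConnectionResetError, BrokenPipeError):
        # The client crashed or the network dropped: give up on this
        # client only, so the accept loop can serve the next one.
        return False


class ResettingSocket:
    """Stand-in for a socket whose peer has reset the connection."""

    def sendall(self, data):
        raise ConnectionResetError(104, "Connection reset by peer")


completed = send_stream(ResettingSocket(), io.BytesIO(b"x" * 5000))
print("transfer completed:", completed)
```

In the server, the open file object f would be passed as stream, and the existing finally block would still close the connection socket.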