In a bash script, I try to read lines from standard input using the built-in read command after setting IFS=$'\n'. The lines are truncated at a 4095-character limit if I paste input into the read. This limitation seems to come from reading from a terminal, because this worked perfectly fine:
fill=
for i in $(seq 1 94); do fill="${fill}x"; done
for i in $(seq 1 100); do printf "%04d00$fill" $i; done | (read line; echo $line)
I see the same behavior with a Python script (it did not accept more than 4095 characters of input from the terminal, but accepted longer input from a pipe):
#!/usr/bin/python
from sys import stdin
line = stdin.readline()
print('%s' % line)
Even a C program behaves the same way, using read(2):
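#include <stdio.h>
#include <unistd.h>
int main(void)
{
    char buf[32768];
    /* A single read(2) on standard input; leave room for the terminator. */
    ssize_t sz = read(0, buf, sizeof(buf) - 1);
    if (sz < 0) {
        /* Without this check, buf[sz] would index out of bounds on error. */
        perror("read");
        return 1;
    }
    buf[sz] = '\0';
    printf("READ LINE: [%s]\n", buf);
    return 0;
}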
In all cases, I cannot enter more than about 4095 characters. The input prompt stops accepting characters.
Question-1: Is there a way to interactively read more than 4095 characters from a terminal on Linux systems (at least Ubuntu 10.04 and 13.04)?
Question-2: Where does this limitation come from?
Systems affected: I noticed this limitation on Ubuntu 10.04/x86 and 13.04/x86, but Cygwin (a recent version at least) does not truncate even at over 10000 characters (I did not test further, since I need to get this script working on Ubuntu). Terminals used: virtual console and KDE konsole (Ubuntu 13.04), and gnome-terminal (Ubuntu 10.04).
The problem is definitely not read() itself, as it can read up to any valid integer number of bytes. The problem comes from heap memory or the pipe size, as they are the only possible limiting factors on the size.
Please refer to the termios(3) manual page, section "Canonical and noncanonical mode".
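To make the mode switch from that manual page section concrete, here is a minimal Python sketch, not a definitive implementation. It assumes a Unix system where standard input is a terminal. The helper names without_icanon and read_long_line are hypothetical, chosen only for illustration. It clears the ICANON flag, reads one line, then restores the saved settings.

```python
import sys
import termios


def without_icanon(attrs):
    """Return a copy of a termios attribute list with ICANON cleared.

    attrs has the layout returned by termios.tcgetattr():
    [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
    """
    new_attrs = list(attrs)
    new_attrs[3] = attrs[3] & ~termios.ICANON  # index 3 holds the local-mode flags
    new_attrs[6] = list(attrs[6])              # copy the control-character list too
    return new_attrs


def read_long_line(stream=sys.stdin):
    """Read one line from a terminal with canonical mode temporarily off.

    Hypothetical helper: stream must be attached to a terminal,
    otherwise termios.tcgetattr() raises termios.error.
    """
    fd = stream.fileno()
    saved = termios.tcgetattr(fd)
    try:
        termios.tcsetattr(fd, termios.TCSANOW, without_icanon(saved))
        # readline() keeps issuing reads until it sees a newline.
        return stream.readline()
    finally:
        # Always restore the original settings, even on error or Ctrl-C.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

When run from a terminal, line = read_long_line() should accept a pasted line longer than 4095 characters. Line editing such as backspace is part of canonical mode, so it is not available while the flag is cleared.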
By default the terminal (standard input) is in canonical mode; in this mode the kernel buffers the input line before returning it to the application. The hard-coded limit for Linux (maybe N_TTY_BUF_SIZE, defined in ${linux_source_path}/include/linux/tty.h) is 4096, which allows input of 4095 characters, not counting the terminating newline. In noncanonical mode there is by default no buffering by the kernel, and the read(2) system call returns as soon as a single character of input is available (a key is pressed). You can manipulate the terminal settings to read a specified number of characters or to set a time-out in noncanonical mode, but then too the hard-coded limit is 4095, per the termios(3) manual page.
The Bash read builtin still works in noncanonical mode: after adding stty -icanon to the script, you can paste a string longer than 4096 characters and read it successfully with the read builtin (I successfully tried more than 10000 characters).
If you put this in a file, i.e. make it a script, you can use strace to see the system calls made, and you will see read(2) called multiple times, each time returning a single character.
I do not have a workaround for you, but I can answer question 2. In Linux, PIPE_BUF is set to 4096 (in limits.h). PIPE_BUF is the largest write to a pipe that is guaranteed to be atomic; a write of more than 4096 bytes may be split into pieces. It is defined in /usr/include/linux/limits.h.
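As a quick sanity check on the pipe side, the following Python sketch pushes a 10000-character line through an OS pipe and reads it back. It is an illustration under stated assumptions, not a definitive test. The function name roundtrip_through_pipe is hypothetical, and the payload is assumed to fit in the pipe's capacity (64 KiB by default on Linux), because everything is written before anything is read.

```python
import os
import select


def roundtrip_through_pipe(payload: bytes) -> bytes:
    """Push payload through an OS pipe and return what the reader receives.

    Assumes payload fits in the pipe's capacity, since the whole write
    happens before any read; a larger payload would block the writer.
    """
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(payload)  # flushed and closed on exit, so the reader sees EOF
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()


line = (b"x" * 9999) + b"\n"
received = roundtrip_through_pipe(line)
print("PIPE_BUF:", select.PIPE_BUF)
print("sent:", len(line), "received:", len(received))
```

The received length matches the sent length even though the payload is larger than PIPE_BUF. This agrees with the piped test in the question and suggests the 4095-character limit is on the terminal side.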