Someone sent this to me and claimed it is a hello world in Brainfuck (and I hope so...)
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
I know the basics: it works by moving a pointer and incrementing and decrementing values...
Yet I still want to know, how does it actually work? How does it print anything on the screen in the first place? How does it encode the text? I do not understand at all...
1. Basics
To understand Brainfuck, imagine an infinite array of cells, each initialized to 0.
...[0][0][0][0][0]...
When a brainfuck program starts, its pointer points to one of these cells.
...[0][0][*0*][0][0]...
If you move pointer right >
you are moving pointer from cell X to cell X+1
...[0][0][0][*0*][0]...
If you increase cell value +
you get:
...[0][0][0][*1*][0]...
If you increase cell value again +
you get:
...[0][0][0][*2*][0]...
If you decrease cell value -
you get:
...[0][0][0][*1*][0]...
If you move pointer left <
you are moving pointer from cell X to cell X-1
...[0][0][*0*][1][0]...
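The walkthrough above can be replayed in a few lines of Python. This is only a sketch of the memory model, not a real interpreter: the names `cells` and `ptr` are my own, and starting at index 0 is an assumption (the diagrams above start somewhere in the middle of the tape).

```python
from collections import defaultdict

# Unbounded tape: any cell that has never been touched reads as 0.
cells = defaultdict(int)
ptr = 0  # assumption: start at index 0; the tape extends in both directions

# Same sequence as the walkthrough: right, +, +, -, left
for command in ">++-<":
    if command == ">":
        ptr += 1
    elif command == "<":
        ptr -= 1
    elif command == "+":
        cells[ptr] += 1
    elif command == "-":
        cells[ptr] -= 1

print(ptr, cells[0], cells[1])  # 0 0 1
```

The pointer is back on the starting cell, and the cell to its right holds 1, just like the last diagram.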
2. Input
To read a character you use the comma operator (,). It reads one character from standard input and writes its decimal ASCII code to the current cell.
Take a look at an ASCII table. For example, the decimal code of ! is 33, while a is 97.
Now imagine your BF program's memory looks like this:
...[0][0][*0*][0][0]...
If standard input contains a and you use the comma (,) operator, BF reads a and stores its decimal ASCII code, 97, in memory:
...[0][0][*97*][0][0]...
That is a useful way to think about it, but the truth is a bit more complex: BF does not read a character but a byte (whatever that byte is). Here is an example:
On Linux,
$ printf ł
prints:
ł
which is a specifically Polish character. This character is not part of ASCII. Here it is encoded as UTF-8, so it takes more than one byte in computer memory. We can prove it by making a hexadecimal dump:
$ printf ł | hd
which shows:
00000000 c5 82 |..|
The zeroes are the offset. c5 is the first and 82 is the second byte representing ł (in the order we will read them). |..| is the column that normally shows the printable characters; it contains dots here because these bytes are not printable ASCII.
If you pass ł as input to a BF program that reads a single byte, the program memory will look like:
...[0][0][*197*][0][0]...
Why 197? Because 197 decimal is c5 hexadecimal. Seems familiar? Of course: it's the first byte of ł!
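If you don't have hd at hand, the same check can be done in Python. This is just a quick sketch; `data` and `first_byte` are names I made up for the example.

```python
data = "ł".encode("utf-8")  # the bytes a UTF-8 terminal would send for ł
first_byte = data[0]        # what a single "," would store in the current cell

print(data.hex())   # c582
print(first_byte)   # 197
```

A second , would read the remaining byte, 0x82, which is 130 in decimal.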
3. Output
To print a character you use the dot operator (.). It treats the current cell value as a decimal ASCII code and prints the corresponding character to standard output.
Now imagine your BF program's memory looks like this:
...[0][0][*97*][0][0]...
If you use the dot (.) operator now, BF prints:
a
because the decimal ASCII code of a is 97.
So, for example, a BF program like this (97 pluses, 2 dots):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..
increases the value of the cell it points to up to 97 and prints it 2 times:
aa
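Here is the same idea as a tiny Python sketch. It is a shortcut rather than an interpreter: counting characters only works because this particular program contains nothing but + and ., with all the pluses first. The variable names are mine.

```python
program = "+" * 97 + ".."

# Shortcut valid only for this program shape: all "+" come before the dots.
cell = program.count("+")                 # value of the current cell after the pluses
output = chr(cell) * program.count(".")   # one character per dot

print(output)  # aa
```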
4. Loops
In BF, a loop consists of a loop begin [ and a loop end ]. You can think of it as a while loop in C/C++ whose condition is the current cell value (nonzero means keep looping).
Take a look at the BF program below:
++[]
++ increments the current cell value twice:
...[0][0][*2*][0][0]...
And [] is like while(2) {}, so it's an infinite loop.
Let's say we don't want this loop to be infinite. We can write, for example:
++[-]
Each iteration of the loop decrements the current cell value. Once the current cell value is 0, the loop ends:
...[0][0][*2*][0][0]... loop starts
...[0][0][*1*][0][0]... after first iteration
...[0][0][*0*][0][0]... after second iteration (loop ends)
Let's consider yet another example of a finite loop:
++[>]
This example shows that we don't have to finish a loop on the cell the loop started on:
...[0][0][*2*][0][0]... loop starts
...[0][0][2][*0*][0]... after first iteration (loop ends)
However, it is good practice to end where we started. Why? Because if a loop ends on a different cell than the one it started on, we can't easily tell where the cell pointer will be afterwards. To be honest, this practice makes brainfuck less brainfuck.
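To make the while analogy concrete, here is a small Python sketch that runs "++" followed by a loop with a given body. The function name `run_loop_example` is hypothetical, it only handles + - > < inside the body (no nested loops), and the pointer starts at index 0 rather than in the middle of the tape.

```python
def run_loop_example(body):
    """Run "++[body]" where body contains only + - > < (no nesting)."""
    cells = [0] * 8   # assumption: 8 cells is plenty for these examples
    ptr = 0
    cells[ptr] += 2   # the leading "++"
    iterations = 0
    while cells[ptr] != 0:   # the loop tests whatever cell is current right now
        for command in body:
            if command == "+":
                cells[ptr] += 1
            elif command == "-":
                cells[ptr] -= 1
            elif command == ">":
                ptr += 1
            elif command == "<":
                ptr -= 1
        iterations += 1
    return cells[:3], ptr, iterations

print(run_loop_example("-"))  # ([0, 0, 0], 0, 2)
print(run_loop_example(">"))  # ([2, 0, 0], 1, 1)
```

The second call is the ++[>] case: the loop stops after one iteration because the pointer has moved onto a zero cell, not because the starting cell was cleared.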
Wikipedia has a commented version of the code.
+++++ +++++ initialize counter (cell #0) to 10
[ use loop to set the next four cells to 70/100/30/10
> +++++ ++ add 7 to cell #1
> +++++ +++++ add 10 to cell #2
> +++ add 3 to cell #3
> + add 1 to cell #4
<<<< - decrement counter (cell #0)
]
> ++ . print 'H'
> + . print 'e'
+++++ ++ . print 'l'
. print 'l'
+++ . print 'o'
> ++ . print ' '
<< +++++ +++++ +++++ . print 'W'
> . print 'o'
+++ . print 'r'
----- - . print 'l'
----- --- . print 'd'
> + . print '!'
> . print '\n'
To answer your questions, the , and . characters are used for I/O. The text is ASCII.
The Wikipedia article goes on in some more depth, as well.
The first line initialises a[0] = 10 by simply incrementing ten times from 0. The loop from line 2 effectively sets the initial values for the array: a[1] = 70 (close to 72, the ASCII code for the character 'H'), a[2] = 100 (close to 101 or 'e'), a[3] = 30 (close to 32, the code for space) and a[4] = 10 (newline). The loop works by adding 7, 10, 3, and 1, to cells a[1], a[2], a[3] and a[4] respectively each time through the loop - 10 additions for each cell in total (giving a[1]=70 etc.). After the loop is finished, a[0] is zero. >++. then moves the pointer to a[1], which holds 70, adds two to it (producing 72, which is the ASCII character code of a capital H), and outputs it.
The next line moves the array pointer to a[2] and adds one to it, producing 101, a lower-case 'e', which is then output.
As 'l' happens to be the seventh letter after 'e', to output 'll' another seven are added (+++++++) to a[2] and the result is output twice.
'o' is the third letter after 'l', so a[2] is incremented three more times and the result is output.
The rest of the program goes on in the same way. For the space and capital letters, different array cells are selected and incremented or decremented as needed.
To answer the question of how it knows what to print, I have added the calculation of ASCII values to the right of the code where the printing happens:
> just means move to the next cell
< just means move to the previous cell
+ and - are used for increment and decrement respectively. The value of the cell is updated when the increment/decrement happens
+++++ +++++ initialize counter (cell #0) to 10
[ use loop to set the next four cells to 70/100/30/10
> +++++ ++ add 7 to cell #1
> +++++ +++++ add 10 to cell #2
> +++ add 3 to cell #3
> + add 1 to cell #4
<<<< - decrement counter (cell #0)
]
> ++ . print 'H' (ascii: 70+2 = 72) //70 is value in current cell. The two +s increment the value of the current cell by 2
> + . print 'e' (ascii: 100+1 = 101)
+++++ ++ . print 'l' (ascii: 101+7 = 108)
. print 'l' dot prints same thing again
+++ . print 'o' (ascii: 108+3 = 111)
> ++ . print ' ' (ascii: 30+2 = 32)
<< +++++ +++++ +++++ . print 'W' (ascii: 72+15 = 87)
> . print 'o' (ascii: 111)
+++ . print 'r' (ascii: 111+3 = 114)
----- - . print 'l' (ascii: 114-6 = 108)
----- --- . print 'd' (ascii: 108-8 = 100)
> + . print '!' (ascii: 32+1 = 33)
> . print '\n'(ascii: 10)
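If you want to double-check the arithmetic in the comments above, here is a short Python sketch. The `steps` list is my own transcription of the commented program (which cell is current and how much is added or subtracted before each dot), so treat it as an illustration rather than something generated from the BF source.

```python
cells = [0, 70, 100, 30, 10]   # state right after the setup loop

# (cell index, amount added before the ".") for each printed character
steps = [(1, 2), (2, 1), (2, 7), (2, 0), (2, 3), (3, 2),
         (1, 15), (2, 0), (2, 3), (2, -6), (2, -8), (3, 1), (4, 0)]

output = ""
for index, delta in steps:
    cells[index] += delta          # the run of + or - before the dot
    output += chr(cells[index])    # the dot: ASCII code -> character

print(repr(output))  # 'Hello World!\n'
```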
Brainfuck is just what its name suggests.
It uses only 8 characters: > < + - . , [ ]
which makes it one of the quickest programming languages to learn, but one of the hardest to write programs in and to understand.
...and in the end it really does f*ck your brain.
It stores values in an array: [72 ][101 ][108 ][108 ][111 ]
Let the pointer initially point to the first cell of the array:
>
move pointer to right by 1
<
move pointer to left by 1
+
increment the value of cell by 1
-
decrement the value of the cell by 1
.
print the value of the current cell as a character.
,
take input to current cell.
[ ]
loop: in +++[ -] the three '+' before the loop set a counter of 3, and the - inside decrements that counter by 1 on each pass, so the loop runs 3 times.
The values stored in the cells are ASCII values,
so referring to the array above: [72 ][101 ][108 ][108 ][111 ]
if you look up the ASCII values you'll find that it spells Hello.
Congrats! You have learned the syntax of BF.
——-Something more ———
Let us write our first program, Hello World, after which you'll be able to write your name in this language.
+++++ +++++[> +++++ ++ >+++++ +++++ >+++ >+ <<<<-]>++.>+.+++++ ++..+++.>++.<<+++++ +++++ +++++.>.+++.----- -.----- ---.>+.>.
Breaking it into pieces:
+++++ +++++[> +++++ ++
>+++++ +++++
>+++
>+
<<<<-]
This fills an array of 4 cells (the number of >) and uses a counter of 10, something like:
--- pseudo code ---
array = [0, 0, 0, 0]
increments = [7, 10, 3, 1]
i = 10
while i > 0:
    for k in range(4):
        array[k] += increments[k]
    i -= 1
The counter value is stored in cell 0. > moves to cell 1 and adds 7 to its value, > moves to cell 2 and adds 10 to its previous value, and so on.
<<<<
returns to cell 0, and - decrements its value by 1.
Hence after the loop completes we have the array: [70,100,30,10]
>++.
moves to the 1st element, increments its value by 2 (two '+') and then prints ('.') the character with that ASCII value. For example, in Python:
print(chr(70+2)) # prints H
>+.
moves to the 2nd cell, adds 1 to its value (100+1) and prints ('.') the character with that value, i.e. chr(101)
print(chr(101)) # prints e
There is no > or < in the next piece, so it keeps working on the current cell and only increments its present value:
+++++ ++..
The current cell holds 101, so this gives 101+7 and prints it twice (as there are two '.'): chr(108) prints l twice.
In other words, each piece prints the current value once for every '.' it contains:
for piece in pieces:
    for j in range(piece.count('.')):
        print value
———Where is it used?——-
It is just a joke language made to challenge programmers and is not used practically anywhere.
All the answers are thorough, but they lack one tiny detail: printing.
When building your brainfuck translator, you also have to handle the . character, which is what a print statement looks like in brainfuck. So whenever your translator encounters a . character, it should print the byte currently pointed to.
Example:
suppose you have --> char *ptr = [0] [0] [0] [97] [0]
...
If this is the brainfuck statement: >>>.
your pointer should move 3 cells to the right, landing at [97], so now *ptr = 97. After that, your translator encounters a ., so it should call
write(1, ptr, 1)
or any equivalent printing statement to print the currently pointed byte, which has the value 97, and the letter a will then be printed on the standard output.
I think what you are asking is how Brainfuck knows what to do with all the code. There is an interpreter, written in a higher-level language such as Python, that decides what a dot means, or what a plus sign means in the code.
The interpreter reads your code character by character and says: OK, there is a > symbol, so I have to advance the memory location. The code is simply: if (current character) == ">", then memlocation = memlocation + 1, which is written in the higher-level language. Similarly, if (current character) == ".", then print (contents of memory location).
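For what it's worth, here is roughly what such an interpreter could look like in Python. It is a minimal sketch, not a reference implementation: the function name `run`, the 30000-cell tape, 8-bit wrapping cells and "end of input reads as 0" are all assumptions on my part (real interpreters differ on these points).

```python
def run(program, stdin=b""):
    """Interpret a Brainfuck program and return its output as bytes."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jump, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start], jump[pos] = pos, start

    cells = [0] * 30000   # assumption: fixed tape size
    ptr = pc = 0
    inp = iter(stdin)
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            cells[ptr] = (cells[ptr] + 1) % 256   # assumption: 8-bit wrapping
        elif ch == "-":
            cells[ptr] = (cells[ptr] - 1) % 256
        elif ch == ".":
            out.append(cells[ptr])                # output the byte as-is
        elif ch == ",":
            cells[ptr] = next(inp, 0)             # assumption: EOF reads as 0
        elif ch == "[" and cells[ptr] == 0:
            pc = jump[pc]                         # skip past the loop body
        elif ch == "]" and cells[ptr] != 0:
            pc = jump[pc]                         # back to the matching "["
        pc += 1
    return bytes(out)

# The program from the question, written with repeat counts to avoid miscounting.
hello = (
    "+" * 10 + "[>" + "+" * 7 + ">" + "+" * 10 + ">+++>+<<<<-]"
    + ">++.>+." + "+" * 7 + "..+++.>++.<<" + "+" * 15
    + ".>.+++." + "-" * 6 + "." + "-" * 8 + ".>+.>."
)
print(run(hello).decode("ascii"), end="")  # Hello World! followed by a newline
```

Any character that is not one of the eight commands simply falls through the if/elif chain, which is why commented BF programs like the ones above still run.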
Hope this clears it up. tc