What's the difference between “green threads” and Erlang's processes?

Posted 2020-02-26 08:10

Question:

After reading about Erlang's lightweight processes, I was pretty sure that they were "green threads", until I read that there are differences between green threads and Erlang's processes. But I don't get it.

What are the actual differences?

Answer 1:

Green threads can share data memory among themselves directly (although synchronization is of course required).
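To make the contrast concrete, here is a minimal Python sketch of green threads, assuming generators stand in for threads and a hypothetical round-robin scheduler `run_green_threads` stands in for the runtime. Every green thread runs inside one OS thread and updates the same `shared` dict directly; no particular language runtime works exactly like this.

```python
from collections import deque

def run_green_threads(threads):
    """Hypothetical round-robin scheduler: each green thread is a generator
    that gives up the CPU by yielding."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread's next yield
        except StopIteration:
            continue              # thread finished; drop it
        ready.append(thread)      # otherwise reschedule it at the back

shared = {"counter": 0, "log": []}   # ordinary memory visible to every green thread

def worker(name, steps):
    for _ in range(steps):
        # Direct read-modify-write on shared memory. This is safe here only
        # because switches happen solely at `yield`; if a switch could occur
        # between the read and the write, a lock would be needed.
        shared["counter"] += 1
        shared["log"].append(name)
        yield

run_green_threads([worker("a", 3), worker("b", 2)])
print(shared["counter"])  # 5
print(shared["log"])      # ['a', 'b', 'a', 'b', 'a']
```

Both workers see each other's writes immediately because there is only one copy of `shared`.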

Erlang doesn't use "green threads" but rather something closer to "green processes": processes do not share data memory directly but exchange it by copying (i.e. each process has its own independent copy of the source data).
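A matching sketch of "green processes", assuming a hypothetical `GreenProcess` class with a private mailbox and a `send` that deep-copies every message with `copy.deepcopy`. Python values are mutable, so the copy is what keeps the receiver from affecting the sender.

```python
import copy
from collections import deque

class GreenProcess:
    """Hypothetical 'green process': a name and a private mailbox, nothing shared."""
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()

def send(receiver, message):
    # Copy on send: the receiver gets an independent copy of the data.
    receiver.mailbox.append(copy.deepcopy(message))

def receive(process):
    # Take the oldest message from this process's own mailbox.
    return process.mailbox.popleft()

receiver = GreenProcess("receiver")
data = {"items": [1, 2, 3]}   # owned by the sending side
send(receiver, data)

msg = receive(receiver)
msg["items"].append(4)        # the receiver changes its own copy...
print(data["items"])          # [1, 2, 3]  (the sender's data is untouched)
print(msg["items"])           # [1, 2, 3, 4]
```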



Answer 2:

It is a simplification that goes too far to say that Erlang processes cannot share data memory directly, and that they only copy values between each other. That is more a description of how it could be implemented, and of how one can pretend that it is implemented, at least for all purposes except performance.

Erlang enforces a few semantic restrictions on what you can do as a programmer. For example, values are immutable, meaning that you can't change them after they are constructed. One then realises that it would be perfectly fine for multiple Erlang processes to access the same value in memory, since none of them can change it anyway, and then no locks are needed.
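The point about immutability can be sketched in Python too. In this hypothetical `send`, a message built only from immutable types (standing in for Erlang terms) is handed over by reference, while anything mutable is deep-copied; since nobody can change a shared immutable value, no lock is needed to read it. This illustrates the argument only; it is not how the BEAM decides what to copy.

```python
import copy

def is_deeply_immutable(value):
    """True if value is built only from immutable types (a stand-in for Erlang terms)."""
    if isinstance(value, (bool, int, float, str, bytes, type(None))):
        return True
    if isinstance(value, (tuple, frozenset)):
        return all(is_deeply_immutable(v) for v in value)
    return False

def send(mailbox, message):
    if is_deeply_immutable(message):
        # Nobody can change this value, so handing over the same object is
        # indistinguishable from copying it, and readers need no lock.
        mailbox.append(message)
    else:
        # A mutable value must be copied to keep the processes isolated.
        mailbox.append(copy.deepcopy(message))

mailbox = []
point = ("point", 3, 4)
config = {"retries": 3}
send(mailbox, point)
send(mailbox, config)
print(mailbox[0] is point)   # True: shared, not copied
print(mailbox[1] is config)  # False: copied
```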

Notable situations where this is done in Erlang/OTP are:

  • Large binaries (more than 64 bytes) are reference-counted in a special binary heap, and references into this heap are passed when messaging.
  • Literal values are placed in a special memory area, and all processes referring to them refer to the values in that same area (but as soon as such a value is sent in a message, a duplicate is made in the receiving process).
  • Each node has a global atom table, and atom values are really references into this table. This makes atom equality testing very efficient (comparing pointers instead of strings).
  • The experimental erl -hybrid setting combines process heaps and shared heaps by having processes first copy values from the process heap into the shared heap when they are used in a message. I found this thread about hybrid heaps, which also explains some issues with the concept.
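The first and third bullets can be modeled roughly as follows. `RefcBinaryHeap`, `make_binary` and `send` are hypothetical names; only the 64-byte threshold comes from the text, and the real runtime's data structures differ. Atom interning is approximated with Python's `sys.intern`, which keeps one copy of each string so equal names can be compared by identity.

```python
import sys

HEAP_BINARY_LIMIT = 64  # bytes; the threshold mentioned in the text

class RefcBinaryHeap:
    """Hypothetical node-wide heap of large binaries with reference counts."""
    def __init__(self):
        self.data = {}        # handle -> payload bytes
        self.refs = {}        # handle -> number of holders
        self.next_handle = 0

    def alloc(self, payload):
        handle = self.next_handle
        self.next_handle += 1
        self.data[handle] = payload
        self.refs[handle] = 1
        return handle

    def release(self, handle):
        self.refs[handle] -= 1
        if self.refs[handle] == 0:   # last holder gone: free the binary
            del self.data[handle]
            del self.refs[handle]

def make_binary(heap, payload):
    # Large binaries go to the shared heap; small ones stay on the process heap.
    if len(payload) > HEAP_BINARY_LIMIT:
        return ("refc", heap.alloc(payload))
    return ("heap", payload)

def send(heap, mailbox, binary):
    kind, value = binary
    if kind == "refc":
        heap.refs[value] += 1   # only the handle travels; bump its count
    # A "heap" binary would be copied into the receiver's heap in Erlang; Python
    # bytes are immutable, so passing the value along models that well enough.
    mailbox.append(binary)

heap = RefcBinaryHeap()
mailbox = []
big = make_binary(heap, b"x" * 100)
small = make_binary(heap, b"hi")
send(heap, mailbox, big)
send(heap, mailbox, small)
print(heap.refs[big[1]])   # 2: sender and receiver share one stored copy
heap.release(big[1])       # sender drops it
heap.release(big[1])       # receiver drops it
print(heap.data)           # {}

# Atoms: sys.intern keeps a single copy of each string, so equal names can be
# compared by identity (a pointer comparison) instead of character by character.
ok1 = sys.intern("ok")
ok2 = sys.intern("".join(["o", "k"]))
print(ok1 is ok2)          # True
```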

Another trick that can be done is to actually mutate values while making sure that the mutation isn't visible. This further shows that immutability is a semantic restriction.

These are some examples where OTP/Erlang will actually mutate values:

  • "Recent" (R12) optimisations in the handling of the binary syntax let you append to the end of a binary without actually constructing a complete new binary with the new tail added.
  • It has been said that a newly constructed tuple followed immediately by a setelement can be, or once was, translated by the compiler into an in-place update of that tuple element.

These optimisations rely on the theory that "if a tree falls in the forest and nobody is there to hear it, does it really make a sound?". That is, no reference to the object being mutated may have escaped, because then the change could be observed.
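The "nobody there to hear it" rule can be illustrated with a sketch of the binary-append case, assuming a hypothetical `Binary` class: each value is a (buffer, size) view, and appending writes into the shared buffer only when the view ends at the buffer's tip. Older views keep their own size, so the in-place write is never observable. This models the idea, not the BEAM's actual binary representation.

```python
class Binary:
    """Hypothetical immutable view over a shared, growable buffer."""
    def __init__(self, buf=None, size=0):
        self.buf = buf if buf is not None else bytearray()
        self.size = size

    def value(self):
        # A view only ever exposes its own prefix of the buffer.
        return bytes(self.buf[:self.size])

    def append(self, tail):
        if self.size == len(self.buf):
            # Nothing has been written past this view yet, so extend the buffer
            # in place. Existing views keep their old size and never see it.
            self.buf.extend(tail)
            return Binary(self.buf, self.size + len(tail))
        # Someone already appended past this view; writing here would clobber
        # bytes another view can see, so fall back to copying.
        fresh = bytearray(self.buf[:self.size])
        fresh.extend(tail)
        return Binary(fresh, len(fresh))

a = Binary().append(b"abc")
b = a.append(b"def")   # in place: shares a's buffer
c = a.append(b"XYZ")   # a is no longer at the tip, so this copies
print(a.value(), b.value(), c.value())  # b'abc' b'abcdef' b'abcXYZ'
print(a.buf is b.buf, a.buf is c.buf)   # True False
```

From the outside every `Binary` behaves as an immutable value, even though `b` was produced by mutating `a`'s buffer.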

And this is really what Erlang semantics are about: things should not change as a side effect of what some other process is doing. We would call that shared state, and we don't like it at all.

Another simplification that goes too far is to say that Erlang has no side effects. But that is for another question, if it is ever asked.



Answer 3:

When people object to calling Erlang's processes "green threads", they aren't objecting to the "green" part, they are objecting to the "threads" part.

The difference between threads and processes is basically that threads have only their own instruction pointer (and stack) but share everything else (especially state, memory, and address space). Processes, on the other hand, are completely isolated and share nothing.

Erlang's processes share nothing, thus, they are true processes. However, they are usually implemented in a "green" manner. So, technically, they are "green processes".

I usually call them "green threads" when I want to emphasize the light weight implementation, and call them "processes" when I want to emphasize the shared-nothing semantics. That way I don't have to explain what I mean by "green processes".