Elixir processes and no shared heap memory

Question:

Elixir processes have their own heaps. If a process wants to share a data structure with another process, how is that accomplished? One answer that comes to mind is that the process sends the other process a message containing the data structure. Does that mean the entire data structure is copied from one heap to the other? And if so, isn't that inefficient?
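
For concreteness, here is a minimal sketch of what that message passing looks like; the module and variable names are illustrative, not from the question. When the message is delivered, the map is copied onto the receiving process's own heap.

    # Hypothetical example: send a data structure from one process to another.
    # The map is deep-copied onto the receiver's heap on delivery.
    defmodule ShareExample do
      def run do
        data = %{name: "config", values: Enum.to_list(1..100)}

        receiver =
          spawn(fn ->
            receive do
              {:data, map} ->
                # `map` now lives entirely on this process's own heap.
                IO.inspect(map_size(map), label: "received map with keys")
            end
          end)

        send(receiver, {:data, data})
      end
    end

    ShareExample.run()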

Answer 1:

TL;DR:

Yes, it is inefficient. But you'll almost never notice this in practice. Welcome to the world of enormously safer programming. Most of the stuff you'll probably use an Erlang-based language for will be network related, and the network is by far the greater constraint (and sometimes disk or port IO).

Also, the alternative is a freaking nightmare. If you do massively concurrent programming, anyway.

Discussion

There are two very different contexts to consider when contemplating "efficiency":

  • Is it efficient for the machine to perform the task in terms of time, space, and locked resources? Are there obvious shortcuts that do not introduce leaky abstractions?
  • Is it efficient for humans to write, understand and maintain?

When you consider these two aspects of efficiency you must eventually bring the question down to time and money -- because that's where things are going to actually matter in terms of usefully employing the tool.

The Human Context

This efficiency argument is very similar to the argument that "Python is way less efficient than assembler". I used to argue the same thing -- until I took charge of several large development efforts. I still think JavaScript, XML and a few other demonstrably bad languages and data representations are the devil, but in the general case (defined as "cases where you don't have precise knowledge and control over your interrupt timing as it relates to bus reads/write and CPU cycles") the greater the basic abstraction provided by the language (and the smaller that language), the better.

Erlang wins by every measure in the context of modern, massively concurrent systems, crushing even most other BEAM languages in terms of simplicity and syntactic limitation (except for LFE -- Robert got that right, imo).

Consider the syntactic complexity of Elixir, for example. It is not a bad language by any means (quite the contrary). But while it is easier for many newcomers in terms of familiarity, it is several times more complex in real terms, and that complexity stings a lot longer than any initial learning curve. "Easy" is not at all the same thing as "simple"; "ease" is a matter of familiarity, not of utility value.

The Machine Context

Whether or not a paradigm is efficient in execution depends almost entirely on the context of reference passing ("by pointer") vs. message passing ("by value") in the underlying implementation.

How large are the things that are being passed? Is a hybrid approach employed that does not break the abstraction of passing message by value?

In Erlang (and by extension Elixir and LFE) most messages being passed between processes are quite small. Really, really tiny, in fact. Large, immutable messages are nearly always Erlang binaries -- and these actually are passed by reference (more on that later).

Large messages are a bit rarer, but considering the way copying is implemented, even they are not such a huge problem. To allow processes to crash on their own and to let each process have its own garbage collection schedule (as opposed to the nightmare scenario of unpredictable "stop the world" garbage collection), every Erlang process has its own heap.

That is an overall optimization in two ways:

  • This allows each process to crash without affecting any other process (as sketched below).
  • It also allows each process to be written so that every assignment is, generally speaking, an immutable binding rather than a mutable assignment -- and certainly rather than a shared data object that is insanely dangerous and insanely complex to manage and schedule.

All of that is what enables segregated garbage collection per process, and this single difference makes Erlang feel like it has incremental garbage collection while actually implementing a boringly ordinary GC model underneath (just splitting it up per process).
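
As a rough illustration of that isolation (the names and values below are made up for the example), an unlinked process can blow up without touching the spawning process's data or heap:

    # Hypothetical example: the spawned process crashes on its own heap;
    # the unlinked parent process keeps running with its data intact.
    parent_data = %{important: true}

    pid =
      spawn(fn ->
        # This list lives only on the spawned process's heap.
        _local = List.duplicate(:garbage, 10_000)
        raise "boom"
      end)

    # Give the spawned process a moment to crash, then check on it.
    Process.sleep(100)
    IO.inspect(Process.alive?(pid), label: "crashed process alive?")
    IO.inspect(parent_data, label: "parent data still intact")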

But then there are a few places where we really do want to have some pass-by-reference at the expense of underlying complexity (and according difficulty in terms of cognitive overhead for the programmer).

"Large" binaries are the classic example case. Any binary larger than 64 bytes is a shared object by default, passed by reference (pointer) instead of passed by value (copying). They are still immutable, of course, and that is the only reason this is safe to do. The problem is that without using binary:copy/1,2 any reference to a sub-section of a larger binary becomes a reference to the whole binary, so you can wind up with a surprising amount of underlying data in the global heap because of binary references to tiny fragments of larger overall binary objects in memory. This is problematic, but that's the price of implementing a performance hack like shared memory objects in the context of safe concurrency.

Conclusion (some unquantifiable anecdotally based guidance...)

I, personally, have never actually had copy-by-value be a bottleneck. Not once. And I've written a lot of Erlang programs.

Your real bottleneck is almost always shared access to an external resource such as disk/storage/network (which are the same thing, conceptually). It is much cheaper, by any measure, to pay for an extra core or an extra VM/instance than to pay programmers to track down cases where binary:copy/1,2 should be used -- and memory and CPU time are only getting faster and cheaper, so whatever looks like a "performance hit" today will seem a trivial complaint next year compared to the real cost of having your expensive programmers track down silly speed hacks in your code.

(And if your programmers aren't profoundly more expensive than your computing resources why do you hire such awful programmers?!?!? ZOMG!)

A note on the future...

The future is only going to be increasingly multicore and in most cases both more parallel and more concurrent. Now that AMD is executing on its vision to bring 1000+ core systems to the desktop I predict the next big scramble is going to be massive improvements in bus speed, channeling, cache management, and huge increases in core memory sizes. That's the only way all those cores will ever see employment.

The only languages that will be able to harness that are ones like Erlang that make message passing by value the primary approach, backed up by hybrid cases such as large-binary reference passing and explicit copying of global heap objects. In that kind of world, hygienic paradigms and language simplicity will be the things that save us from the explosion of complexity that so much parallelism and concurrency entails.

Consider the drive toward "microservices architecture" and even Docker -- people are unconsciously stumbling on and then solving many of the same problems Erlang was designed originally to solve, just in an ad hoc way.

In a massively multicore, massively concurrent environment, passing by value and having a heap per process seems like an overall optimization, considering how much more expensive good programmers are compared to cores, spindles, storage and memory. (Incidentally, I think more lasting software in concurrent languages will be written by fewer programmers in the future, while the army-of-monkeys approach will continue to produce essentially ephemeral codebases.)