Does Hyper-Threading allow the L1 cache to be used to exchange data between two threads that execute simultaneously on a single physical core, but on two virtual (logical) cores?
Assume that both threads belong to the same process, i.e. they share the same address space.
Page 85 (2-55) - Intel® 64 and IA-32 Architectures Optimization Reference Manual: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
2.5.9 Hyper-Threading Technology Support in Intel® Microarchitecture Code Name Nehalem
...
Deeper buffering and enhanced resource sharing/partition policies:
Replicated resource for HT operation: register state, renamed return stack buffer, large-page ITLB.
Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, small-page ITLB are statically allocated between two logical processors.
Competitively-shared resource during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB.
Alternating during HT operation: front end operation generally alternates between two logical processors to ensure fairness.
HT unaware resources: execution units.
The Intel Architecture Software Optimization manual has a brief description, in chapter 2.3.9, of how processor resources are shared between the HT threads on a core. It is documented for the Nehalem architecture, so it is getting stale, but it is fairly likely to still be relevant for current architectures since the partitioning is logically consistent:
Duplicated for each HT thread: the registers, the return stack buffer, the large-page ITLB
Statically allocated for each HT thread: the load, store and re-order buffers, the small-page ITLB
Competitively shared between HT threads: the reservation station, the caches, the fill buffers, DTLB0 and STLB.
Your question matches the 3rd bullet. In the very specific case where both HT threads execute code from the same process, which is a bit of an accident, you can generally expect L1 and L2 to contain data retrieved by one HT thread that can be useful to the other. Keep in mind that the unit of storage in the caches is a cache line, 64 bytes.

Just in case: this is not otherwise a good reason to pursue a thread-scheduling approach that favors getting two HT threads to execute on the same core, assuming your OS would support that. An HT thread generally runs quite a bit slower than a thread that gets the core to itself. 30% is the usual number bandied about, YMMV.
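If you want to experiment with this, here is a minimal sketch, not a benchmark. It assumes Linux (the sysfs `thread_siblings_list` file and `sched_setaffinity`) and the 64-byte cache line mentioned above. The names `parse_cpu_list`, `thread_siblings`, `pin_to_cpu`, `SharedLine` and `ping_pong` are made up for illustration. Two threads hand a single cache-line-sized counter back and forth; pinning is best effort, and nothing here proves the line stays in L1, since the hardware decides that.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

#ifdef __linux__
#include <sched.h>
#endif

// Parse a Linux sysfs CPU list such as "0,4" or "0-1" into CPU ids.
std::vector<int> parse_cpu_list(const std::string& s) {
    std::vector<int> cpus;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] < '0' || s[i] > '9') { ++i; continue; }  // skip ',' and '\n'
        std::size_t used = 0;
        int lo = std::stoi(s.substr(i), &used);
        i += used;
        int hi = lo;
        if (i < s.size() && s[i] == '-') {  // range form "lo-hi"
            ++i;
            hi = std::stoi(s.substr(i), &used);
            i += used;
        }
        for (int c = lo; c <= hi; ++c) cpus.push_back(c);
    }
    return cpus;
}

// Logical CPUs that share a physical core with `cpu`, as reported by
// Linux sysfs. Returns an empty vector if the file is unavailable.
std::vector<int> thread_siblings(int cpu) {
    std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                    "/topology/thread_siblings_list");
    std::string text;
    std::getline(f, text);
    return parse_cpu_list(text);
}

// Best-effort pin of the calling thread to one logical CPU.
// Returns false if pinning is unsupported or refused.
bool pin_to_cpu(int cpu) {
#ifdef __linux__
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
    (void)cpu;
    return false;
#endif
}

// The shared counter, aligned so it occupies its own cache line
// (assumption: 64-byte lines).
struct alignas(64) SharedLine {
    std::atomic<std::uint64_t> value{0};
};

// Two threads take turns incrementing one counter: the first thread
// moves it from even to odd, the second from odd to even.
// Returns the final counter value, which is 2 * rounds.
std::uint64_t ping_pong(int rounds, int cpu_a, int cpu_b) {
    assert(rounds >= 0);
    SharedLine line;
    auto worker = [&line, rounds](int cpu, std::uint64_t parity) {
        pin_to_cpu(cpu);  // best effort, result deliberately ignored
        for (int i = 0; i < rounds; ++i) {
            while (line.value.load(std::memory_order_acquire) % 2 != parity) {
                std::this_thread::yield();  // keeps progress if both share a CPU
            }
            line.value.fetch_add(1, std::memory_order_release);
        }
    };
    std::thread a(worker, cpu_a, 0);
    std::thread b(worker, cpu_b, 1);
    a.join();
    b.join();
    return line.value.load();
}
```

To compare, you could time `ping_pong(n, s[0], s[1])` with `s = thread_siblings(0)` (two HT threads on one core, if the list has two entries) against the same call with two logical CPUs on different cores. Treat any numbers you get as a rough indication only.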