Question:
Closed 2 years ago.
Intel's Threading Building Blocks (TBB) open source library looks really interesting. Even though there's an O'Reilly book on the subject, I don't hear about many people using it. I'm interested in using it for some multi-level parallel applications (MPI + threads) in Unix (Mac, Linux, etc.) environments. For what it's worth, I'm interested in high performance computing / numerical methods kinds of applications.
Does anyone have experience with TBB? Does it work well? Is it fairly portable (including GCC and other compilers)? Does the paradigm work well for programs you've written? Are there other libraries I should look into?
Answer 1:
I've introduced it into our code base because we needed a better malloc to use when we moved to a 16-core machine. With 8 cores and under it wasn't a significant issue. It has worked well for us. We plan on using the fine-grained concurrent containers next. Ideally we can make use of the real meat of the product, but that requires rethinking how we build our code. I really like the ideas in TBB, but it's not easy to retrofit onto a code base.
You can't think of TBB as just another threading library. It has a whole new model that sits on top of threads and abstracts them away. You learn to think in terms of tasks, parallel_for-type operations, and pipelines. If I were to build a new project I would probably try to model it in this fashion.
We work in Visual Studio and it works just fine. It was originally written for Linux/pthreads, so it runs just fine over there too.
Answer 2:
I'm not doing numerical computing but I work with data mining (think clustering and classification), and our workloads are probably similar: all the data is static and you have it at the beginning of the program. I have briefly investigated Intel's TBB and found it overkill for my needs. After starting with raw pthread-based code, I switched to OpenMP and got the right mix of readability and performance.
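To give a feel for the readability point, here is a minimal sketch of what an OpenMP loop over static data can look like. The function name and the 1-D "distance to a centre" computation are made up for illustration and are not taken from the answerer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of squared distances from each (1-D) point to a
// centre, the kind of reduction that shows up in clustering code.
double sum_squared_distance(const std::vector<double>& points, double centre) {
    double total = 0.0;
    const long n = static_cast<long>(points.size());
    // With OpenMP enabled (e.g. -fopenmp on GCC or Clang) the iterations are
    // split across threads and the per-thread partial sums are combined by
    // the reduction clause. Without it the pragma is ignored and the loop
    // simply runs serially.
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        const double d = points[static_cast<std::size_t>(i)] - centre;
        total += d * d;
    }
    return total;
}
```

The serial loop and the parallel loop are the same source code apart from one pragma, which is probably what makes this approach attractive when the data is fixed up front.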
Answer 3:
Portability
TBB is portable. It supports Intel and AMD (i.e. x86) processors, IBM PowerPC and POWER processors, ARM processors, and possibly others. If you look in the build directory, you can see all the configurations the build system supports, which include a wide range of operating systems (Linux, Windows, Android, MacOS, iOS, FreeBSD, AIX, etc.) and compilers (GCC, Intel, Clang/LLVM, IBM XL, etc.). I have not tried TBB with the PGI C++ compiler, and I know that it does not work with the Cray C++ compiler (as of 2017).
A few years ago, I was part of the effort to port TBB to IBM Blue Gene systems. Static linking was a challenge, but is now addressed by the big_iron.inc build system helper. The other issues were supporting relatively ancient versions of GCC (4.1 and 4.4) and ensuring the PowerPC atomics were working. I expect that porting to any currently unsupported architecture would be relatively straightforward on platforms that provide or are compatible with GCC and POSIX.
Usage in community codes
I am aware of at least two HPC application frameworks that use TBB: MOOSE and MADNESS.
I do not know how MOOSE uses TBB, but MADNESS uses TBB for its task queue and memory allocator.
Performance versus other threading models
I have personally used TBB in the Parallel Research Kernels project, within which I have compared TBB to OpenMP, OpenCL, Kokkos, RAJA, C++17 Parallel STL, and other models. See the C++ subdirectory for details.
The following figure shows the relative performance of the aforementioned models on an Intel Xeon Phi 7250 processor (the details aren't important - all models used the same settings). As you can see, TBB does quite well except for smaller problem sizes, where the overhead of adaptive scheduling is more relevant. TBB has tuning knobs that will affect these results.
Full disclosure: I work for Intel in a research/pathfinding capacity.
Answer 4:
I have used TBB briefly, and will probably use it more in the future. I liked using it, most importantly because you don't have to deal with macros or extensions of C++, but remain within the language. It's also pretty portable; I have used it on both Windows and Linux. One thing though: it is difficult to work directly with threads in TBB, so you have to think in terms of tasks (which is actually a good thing). Intel TBB does not really support the use of bare locks (it makes this tedious). But overall, this is my preliminary experience.
I'd also recommend having a look at OpenMP 3.
Answer 5:
ZThread is LGPL, so you are limited to dynamic linking with the library if you are not working on an open source project.
The open source version of Threading Building Blocks (TBB) is licensed under the GNU General Public License version 2 with a so-called “Runtime Exception” (which is specific to use in creating free software). There is also a new commercial version ($299); I don't know the differences yet.
I've seen other Runtime Exceptions that attempt to approach the LGPL while enabling commercial use and static linking; this is now the case here.
I'm only writing this because I took the chance to examine the libraries' licenses, and those should also be a consideration when choosing a library, depending on the use you intend for it.
Thanks, Jihn, for pointing out this update...
Answer 6:
I use TBB in one project. It seemed to be easier to use it than threads.
There are tasks which can be run in parallel. A task is just a call to your parallelized subroutine. Load balancing is done automatically. That is why I regard it as a higher-level parallelization library. I achieved a 2.5x speedup without much work on a 4-core Intel processor.
There are examples, the developers answer questions on the forums, and it is maintained and free.
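The "a task is just a call to your subroutine, and load balancing is automatic" idea can be sketched with the standard library alone. This is not TBB code and not how TBB is implemented (TBB uses a work-stealing scheduler); `run_tasks` is a hypothetical helper that uses a shared atomic counter as a much simpler stand-in, just to show why idle workers never sit waiting while work remains.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs task(i) for every i in [0, task_count) on worker_count threads.
// Each worker claims the next unclaimed index from a shared atomic counter,
// so a thread that finishes a cheap task immediately picks up more work.
void run_tasks(std::size_t task_count, unsigned worker_count,
               const std::function<void(std::size_t)>& task) {
    assert(worker_count > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= task_count) break;  // no work left
            task(i);
        }
    };
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
}
```

A caller would pass something like `[&](std::size_t i) { out[i] = process(in[i]); }`, where each task writes only to its own slot so no locking is needed.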
Answer 7:
It's worth being clear about what TBB (Threading Building Blocks) is for, to contrast it with other alternatives (e.g. C++11 concurrency features). TBB is a portable and scalable library (not a compiler extension) that lets you write your code in the form of lightweight tasks, which TBB schedules to run as fast as possible on the CPU resources available. It's not designed to support threading for other purposes (e.g. pre-emption).
I've used TBB to speed up existing image processing code by turning for loops over image scan lines into parallel_for loops (with a minimum of 2-4 scan lines as the 'grain' size). This has been very successful. It does require that your loop body is (re)written to process an arbitrary index, rather than assuming the iterations run sequentially (e.g. relying on pointers that are incremented between iterations).
This was a fairly trivial case as there wasn't any shared storage to update. Using the more powerful features (e.g. pipeline) will require significant reimagining and/or rewriting of existing code so is perhaps better suited to new code.
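The scan-line rewrite described above might look roughly like the following sketch. It uses plain `std::thread` rather than TBB's parallel_for, and the image layout, the `invert_row` operation, and the one-thread-per-chunk strategy are all assumptions made for illustration; a real scheduler would map chunks onto a fixed pool of workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-row operation: invert every pixel of one scan line.
// The row is located from its index, not from a pointer carried over from
// the previous iteration, so rows can be processed in any order.
void invert_row(std::vector<std::uint8_t>& image, std::size_t width, std::size_t row) {
    std::uint8_t* p = image.data() + row * width;
    for (std::size_t x = 0; x < width; ++x) {
        p[x] = static_cast<std::uint8_t>(255 - p[x]);
    }
}

// Splits [0, height) into chunks of `grain` rows and runs one thread per chunk.
void invert_image(std::vector<std::uint8_t>& image, std::size_t width,
                  std::size_t height, std::size_t grain) {
    assert(grain > 0 && image.size() == width * height);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < height; begin += grain) {
        const std::size_t end = std::min(begin + grain, height);
        workers.emplace_back([&image, width, begin, end] {
            for (std::size_t row = begin; row < end; ++row) {
                invert_row(image, width, row);
            }
        });
    }
    for (auto& t : workers) t.join();
}
```

Because each chunk touches only its own rows, there is no shared storage to protect, which matches the "fairly trivial case" above.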
It's a powerful advantage that this TBB-based code remains portable, doesn't seem to interfere with other code in the same process that concurrently uses other threading strategies, and can later be combined with multiprocessing strategies at higher or lower levels (e.g. the TBB parallel_for code could be called from a filter in a TBB multiprocessing pipeline).
Answer 8:
I've looked into TBB but never used it in a project. I saw no advantages (for my purposes) over ZThread. A brief and somewhat dated overview can be found here.
ZThread is fairly complete, with several thread dispatch options, all the usual synchronization classes, and a very handy exception-based thread "interrupt" mechanism. It's easily extendable, well written, and documented. I've used it on 20+ projects.
It also plays nice with any *NIX that supports POSIX threads as well as Windows.
Worth a look.
Answer 9:
The Threading Building Blocks (TBB) in the open source version, (there is a new commercial version, $299, don't know the differences yet) is GNU General Public License version 2 with a so-called “Runtime Exception” (that is specific to the use only on creating free software.) I've seen other Runtime Exceptions that attempt to approach LGPL but enabling commercial use and static linking this is not the case.
According to this question, Threading Building Blocks can be used commercially without copyleft restrictions.
Answer 10:
Have you looked at the Boost library and its thread API?