What are some common hidden things that can bottleneck multithreading/parallelism in Jython? I have some parallel code (using Python's threading library) that won't scale past 3-4 CPUs, and I'm sure it's not because of any of these obvious pitfalls:
Basically all the algorithm does is a bunch of string processing, list and dictionary lookups and math. My understanding is that, unlike CPython, Jython does not have a GIL.
Accessing variables is one of those "hidden" bottlenecks. If all threads access some shared datastructure(s) there will be synchronization between the threads.
Jython tries hard to achieve language compatibility with CPython. One thing that the GIL ensures is that access to local/global variables, object members, dict elements (technically locals, globals and object members are also dict elements) or even list elements are atomic. To avoid surprises for users Jython uses a concurrent hash map to implement dicts. This means that there is some synchronization going on when accessing any kind of dict elements in Jython. This sycnhronization is striped to support access to the dict from multiple threads without blocking them, but if multiple threads access the same variable they are going to hit the same lock.
The best way to achieve scalability in Jython, and any other language, is to make sure that the data you are accessing in each thread is not accessed from other threads as well.
Jython doesn't have a GIL, but it's pretty tough to get a lot of parallelism. If you have any part that can't be done in parallel, you get bitten by Ahmdahl's Law:
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
Moreover, even if you do purely parallel computation, you get bitten by other things, like straining your cache. Also remember that your code is running on top of a virtual machine, so even if your code is purely parallel, the JVM might have some internal coordination that holds you back (garbage collection is an obvious candidate).
Have you tried any performance analysis packages? Even if they're not explicitly for Jython I bet it would provide some help.
I'd try YourKit first if you have access to a license.