I've recently come across Chapel and I'm very keen to try it out. I have a two-fold problem I'm hoping it can solve.
I typically work in Python or C++. Java when backed into a corner.
I have two matrices I
and V
. Both are sparse and of dimension about 600K x 600K, populated at about 1% density.
First, using SciPy, I can load both from a SQL database into memory at the moment. However, I expect our next iteration will be simply too large for our machines. Perhaps 1.5M^2. In a case like that, RDDs from Spark may work for the load. I wasn't able to get PyTables to make this happen. I understand this is described as an "Out-of-core" problem.
Even if they do get loaded, doing I'IV
goes OOM in minutes. (Here I'
is transpose), so I'm looking into distributing this multiplication over multiple cores (which SciPy can do) and multiple machines (which it cannot, so far as I know). Here, Spark falls down but Chapel appears to answer my prayers, so-to-speak.
A serious limitation is budget on machines. I can't afford a Cray, for instance. Does the Chapel community have a pattern for this?
Starting with a few high-level points:
- At its core, the Chapel language is more about arrays (data structures) than
about matrices (mathematical objects), though one can obviously use an array
to represent a matrix. Think of the distinction as being the set of supported
operations (e.g., iteration, access, and elemental operations for arrays vs.
transpose, cross-products, and factorings for matrices).
- Chapel supports sparse and associative arrays as well as dense ones.
- Chapel arrays can be stored local to a single memory or distributed across
multiple memories / compute nodes.
- In Chapel, you should expect
matrices/linear algebra operations to be supported through libraries
rather than the language. While Chapel has a start at such
libraries, they are still being expanded --
specifically, Chapel does not have library support for distributed
linear algebra operations as of Chapel 1.15 meaning that users would have
to write such operations manually.
In more detail:
The following program creates a Block-distributed dense array:
use BlockDist;
config const n = 10;
const D = {1..n, 1..n} dmapped Block({1..n, 1..n}); // distributed dense index set
var A: [D] real; // distributed dense array
// assign the array elements in parallel based on the owning locale's (compute node's) ID
forall a in A do
a = here.id;
// print out the array
writeln(A);
For example, when run on 6 nodes (./myProgram -nl 6
), the output is:
0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0
0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0
2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0
2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0
2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0
4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 5.0
4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 5.0
4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 5.0
Note that running a Chapel program on multiple nodes requires configuring it to use multiple locales. Such programs can be run on clusters or networked workstations in addition to Crays.
Here's a program that declares a distributed sparse array:
use BlockDist;
config const n = 10;
const D = {1..n, 1..n} dmapped Block({1..n, 1..n}); // distributed dense index set
var SD: sparse subdomain(D); // distributed sparse subset
var A: [SD] real; // distributed sparse array
// populate the sparse index set
SD += (1,1);
SD += (n/2, n/4);
SD += (3*n/4, 3*n/4);
SD += (n, n);
// assign the sparse array elements in parallel
forall a in A do
a = here.id + 1;
// print a dense view of the array
for i in 1..n {
for j in 1..n do
write(A[i,j], " ");
writeln();
}
Running on six locales gives:
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0
In both the examples above, the forall loops will compute on the distributed arrays / indices using multiple nodes in an owner-computes fashion, and using the multiple cores per node to do the local work.
Now for some caveats:
Distributed sparse array support is still in its infancy as of Chapel 1.15.0, as most of the project's focus on distributed memory to date has been on task parallelism and distributed dense arrays. A paper+talk from Berkeley in this year's annual Chapel workshop, "Towards a GraphBLAS Library in Chapel" highlighted several performance and scalability issues, some of which have since been fixed on the master branch, others of which still require attention. Feedback and interest from users in such features is the best way to accelerate improvements in these areas.
As mentioned at the outset, Linear Algebra libraries are a work-in-progress for Chapel. Past releases have added Chapel modules for BLAS and LAPACK. Chapel 1.15 included the start of a higher-level LinearAlgebra library. But none of these support distributed arrays at present (BLAS and LAPACK by design, LinearAlgebra because it's still early days).
Chapel does not have an SQL interface (yet), though a few community members have made rumblings about adding such support. It may also be possible to use Chapel's I/O features to read the data in some textual or binary format. Or, you could potentially use Chapel's interoperability features to interface with a C library that could read the SQL.