Build system that is not file-centric

2019-07-22 13:53发布

问题:

We have a software infrastructure which works pretty much like a software build system: Information is gathered from different sources and used to generate some outputs. Like in traditional software builds we have different types of output, dependency trees, etc.

The main difference is that our sources, intermediate results and outputs are not inherently file-based. Rather, they're (uniquely addressable) data objects.

Right now we're mapping our data structure to files and directories in combination with a traditional build system (SCons) but that does not scale, both w.r.t. performance but (more importantly) w.r.t. maintainability. Hence I'm looking for an infrastructure that's built for this purpose from the ground up.

As an illustration, assume you have 3 XML documents A, B and C. Let's say that B/foo/bar is to be calculated from A/x/y and A/x/z, and that similarly C/a/b is calculated from A/x/y. I need an infrastructure to

  1. Implement these relationships (i.e. the transformations and their dependencies)
  2. Automatically re-build the relevant parts after changes are made

One major problem with using files is that, if I map A, B and C to some files A.xml, B.xml and C.xml and use a traditional build system, then any change to A.xml will trigger a rebuild of B.xml and C.xml, even if A/x/y and A/x/z (the original dependencies of B) are not modified. For a fine-grained dependency resolution I therefore would need to map each of A, B and C not to a file, but to a directory where each sub-directory represents an element, files represents attributes, etc. As I said, this does not scale for us.

(Please note that our system is not actually based on XML)

Right now I'm looking for any existing software, infrastructure or concept which points into this direction, regardless of implementation language and underlying data structures.

回答1:

It sounds like you need an active object database management system (ODBMS) like GemStone/S. ODBMSs provide the traditional persistence services without the old cost of mapping data structures to files and the well-known benefits of object technology. As you've mentioned dependency trees and addressable objects, in ODBMSs navigational references are stored as part of their data, allowing any complex interaction patterns among objects to be represented/accessed. This is specially true when you predict a system which makes use of inheritance, object nesting and cross-referencing.

Although an object engine may seem oversized for your requirements, it is common for large-scale production business systems to store and execute methods using OODBMs, within a concurrent and multiuser environment. It doesn't come for free because you have to invest in the human part of the equation (education and experience) but once the initial fear is overcome, it will pay the return of investment.

For re-building (subscribed) parts after changes (notifications from announcers) are made, you may use the Observer design pattern, or one of its variants (SASE or Announcements framework), to implement your announce/subscription architecture. Under this type of event frameworks there are intrinsic problems which are hard to solve with traditional file-based solutions, as you have noticed already. For example, it is typical for a dependency mechanism to manage the replacement of an object, or in your example an XML document, by another one. Any modern events framework should manage when an object is removed, all dependents plugged to the old object are updated to the new reference.

Finally, there is a free GemStone/S stack which includes object dependency framework so you may experiment with a real object-database.



回答2:

So nothing comes to mind that solves exactly your problem, but there are a few tools that might get you a little closer than you are now:

1) You might be able to throw something together using Fuse that would give you better control of how your data objects are mapped out to files. Fuse basically allows you to construct arbitrary file systems from whatever backing data you want. (The python bindings are pretty friendly, but there are a number of other language interfaces available as well). Then you could use a traditional build tool, and take advantage of file like objects better associated w/your data.

2) Cmake has a pretty extensible language for writing custom targets that you might be able to press into service. Unfortunately its language is pretty didactic and has something of a steep learning curve, so it wouldn't be my first choice.