What algorithms can analyse call dependencies for splitting up a library?


Suppose I have a library which contains a bunch of interdependent functions. The library is too big and I want to split it up. What algorithms are there to find suitable partitions?

A simple example: there are four functions in it: alpha, beta, gamma, and delta.

  • beta and gamma both call delta.
  • module1 calls alpha and beta.
  • module2 calls gamma.
  • module3 calls alpha, beta, and gamma.

The output of the algorithm could be:

  • LibA contains (alpha,beta)
  • LibB contains (gamma)
  • LibC contains (delta)
  • module1 depends on LibA
  • module2 depends on LibB
  • module3 depends on LibA and LibB
  • LibA depends on LibC
  • LibB depends on LibC

i.e. it finds the most finely grained Lib* partition with the following property:

For all X, if LibX is partitioned by any method into LibY and LibZ, then all modules/libraries which depend on LibY also depend on LibZ, and vice versa.

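To make the property concrete, here is a naive sketch of what I have in mind (Python, names and structure purely illustrative): tag every function with the set of modules that transitively reach it, then group functions with identical "client sets" into the same Lib*.

```python
from collections import defaultdict

# Direct "uses" edges from the example: caller -> callees.
calls = {
    "module1": {"alpha", "beta"},
    "module2": {"gamma"},
    "module3": {"alpha", "beta", "gamma"},
    "beta":    {"delta"},
    "gamma":   {"delta"},
    "alpha":   set(),
    "delta":   set(),
}
modules = {"module1", "module2", "module3"}

def reachable(start):
    """All functions transitively reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

# For every library function, collect the set of modules that reach it.
clients = defaultdict(frozenset)
for m in modules:
    for fn in reachable(m):
        clients[fn] = clients[fn] | {m}

# Functions with identical client sets belong in the same library.
groups = defaultdict(set)
for fn, cl in clients.items():
    groups[cl].add(fn)

for i, (cl, fns) in enumerate(sorted(groups.items(), key=lambda kv: sorted(kv[1]))):
    print(f"Lib{i}: {sorted(fns)}  used by {sorted(cl)}")
```

On the example this prints three groups, (alpha, beta), (gamma), and (delta), matching the output above, but I don't know whether this is a standard approach or whether it scales.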
Is there a standard solution for this?

1 Answer

Rolldiameter

(This is the same kind of problem that people have with header files in C and C++ programs, too.)

It isn't just "calls" that create dependencies; any kind of reference does: to a member variable, a static variable, or even a constant definition.

Basically, what you need to do is discover all the fine-grained dependencies. This usually requires a compiler-like analysis tool that reads the code and, using the semantics of the language in which the libraries are written, discovers dependencies between declared language elements (declarations, fields, methods, classes, packages if you are Java-centric, etc.) and other language elements. (Such an analysis is probably conservative.) This in essence gives you a giant graph, with nodes being language elements and arcs being "uses" relationships.
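As a toy illustration only (assuming a Python code base, which the question does not specify), a crude function-to-callee "uses" graph can be scraped with the standard `ast` module. A real tool must resolve names semantically (fields, constants, overloads, dynamic dispatch, ...), which is exactly why compiler-grade front ends are needed.

```python
import ast

SOURCE = """
def delta(): ...
def alpha(): ...
def beta():  delta()
def gamma(): delta()
"""

# Collect, for each function definition, the names it calls directly.
uses = {}
tree = ast.parse(SOURCE)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        uses[node.name] = {
            c.func.id
            for c in ast.walk(node)
            if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
        }

print(uses)  # {'delta': set(), 'alpha': set(), 'beta': {'delta'}, 'gamma': {'delta'}}
```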

In the abstract, the library packaging problem is breaking this graph apart into chunks while minimizing cross-chunk dependency arcs. This may give you a huge number of small libraries.
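One hard constraint worth noting: elements involved in a dependency cycle can never be separated, so collapsing strongly connected components gives the smallest legal chunks, and every remaining arc is a candidate cross-library dependency to minimize. A small sketch (assuming the third-party `networkx` package; `helper_a`/`helper_b` are hypothetical names for a cyclic pair):

```python
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("beta", "delta"), ("gamma", "delta"),
    ("helper_a", "helper_b"), ("helper_b", "helper_a"),  # a cycle: must stay together
])

# One node per strongly connected component; arcs between components remain.
condensed = nx.condensation(g)
for scc_id, data in condensed.nodes(data=True):
    print(scc_id, sorted(data["members"]))
print("cross-chunk arcs:", list(condensed.edges()))
```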

The practical problem is grouping together some chunks which have no actual dependency on one another, but are commonly used together. For instance, a set of buffer access procedures may not have any explicit dependency on a definition of the default buffer size, but you probably want one library containing both, rather than two libraries with one containing just the default buffer-size declaration. This notion of used-together is really a problem-domain artifact, and isn't visible anywhere in the code except perhaps as some statistical co-occurrence of uses.
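A rough heuristic sketch of that co-occurrence idea (my own, not any particular tool's method; the chunk and client names are hypothetical): merge chunks whose client sets overlap heavily, e.g. Jaccard similarity above a threshold, even when there is no code-level dependency between them.

```python
from itertools import combinations

# Hypothetical observation data: chunk -> set of client modules seen using it.
clients = {
    "buffer_ops":      {"app1", "app2", "app3"},
    "default_bufsize": {"app1", "app2", "app3"},
    "logging":         {"app3"},
}

def jaccard(a, b):
    """Overlap of two client sets, 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

THRESHOLD = 0.8
for x, y in combinations(clients, 2):
    if jaccard(clients[x], clients[y]) >= THRESHOLD:
        print(f"consider packaging {x} and {y} together")
```

The threshold and the similarity measure are tuning knobs a domain expert would have to adjust, which is why the grouping step ultimately needs human review.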

The hard part of this problem is discovering the fine-grained semantic dependencies. You can approximate this by hand, but if there is any scale to the problem, you won't have the appetite to do it. (People don't reorganize header files for the same reason.) Pretty much you need language tools to do the analysis, big-graph management to propose chunks, statistical analysis to get a heuristic grouping, and probably a UI to allow a domain expert to edit the grouping to produce the revised libraries.

Then you need a tool to go back to the code that uses the legacy libraries and modify it to use the revised libraries. Both the library refactoring and the code-base revision require massive code analysis and change, which needs automation.

Our DMS Software Reengineering Toolkit, with its many language front ends, is probably a good foundation for implementing such a library reorganization. We've considered doing this for C and C++ [which is why I have this response], but it's a big task even for us to do. We'd love some serious additional motivation!
