How to use Mercurial subrepos for shared components

Posted 2019-02-09 03:26

Question:

We develop .NET Enterprise Software in C#. We are looking to improve our version control system. I have used mercurial before and have been experimenting using it at our company. However, since we develop enterprise products we have a big focus on reusable components or modules. I have been attempting to use mercurial's sub-repos to manage components and dependencies but am having some difficulties. Here are the basic requirements for source control/dependency management:

  1. Reusable components
    1. Shared by source (for debugging)
    2. Have dependencies on 3rd party binaries and other reusable components
    3. Can be developed and committed to source control in the context of a consuming product
  2. Dependencies
    1. Products have dependencies on 3rd party binaries and other reusable components
    2. Dependencies have their own dependencies
    3. Developers should be notified of version conflicts in dependencies

Here is the structure in mercurial that I have been using:

A reusable component:

SHARED1_SLN-+-docs
            |
            +-libs----NLOG
            |
            +-misc----KEY
            |
            +-src-----SHARED1-+-proj1
            |                 +-proj2
            |
            +-tools---NANT

A second reusable component, consuming the first:

SHARED2_SLN-+-docs
            |
            +-libs--+-SHARED1-+-proj1
            |       |         +-proj2
            |       |
            |       +-NLOG
            |
            +-misc----KEY
            |
            +-src-----SHARED2-+-proj3
            |                 +-proj4
            |
            +-tools---NANT            

A product that consumes both components:

PROD_SLN----+-docs
            |
            +-libs--+-SHARED1-+-proj1
            |       |         +-proj2
            |       |
            |       +-SHARED2-+-proj3
            |       |         +-proj4
            |       |
            |       +-NLOG
            |
            +-misc----KEY
            |
            +-src-----prod----+-proj5
            |                 +-proj6
            |
            +-tools---NANT

Notes

  1. Repos are in CAPS
  2. All child repos are assumed to be subrepos
  3. 3rd party (binary) libs and internal (source) components are all subrepos located in the libs folder
  4. 3rd party libs are kept in individual mercurial repos so that consuming projects can reference particular versions of the libs (i.e. an old project may reference NLog v1.0, and a newer project may reference NLog v2.0).
  5. All Visual Studio .csproj files are at the 4th level (proj* folders) allowing for relative references to dependencies (i.e. ../../../libs/NLog/NLog.dll for all Visual Studio projects that reference NLog)
  6. All Visual Studio .sln files are at the 2nd level (src folders) so that they are not included when "sharing" a component into a consuming component or product
  7. Developers are free to organize their source files as they see fit, as long as the sources are children of proj* folder of the consuming Visual Studio project (i.e., there can be n children to the proj* folders, containing various sources/resources)
  8. If Bob is developing the SHARED2 component and the PROD1 product, it is perfectly legal for him to make changes to the SHARED2 source (say, sources belonging to proj3) within the PROD1_SLN repository and commit those changes. We don't mind if someone develops a library in the context of a consuming project.
  9. Internally developed components (SHARED1 and SHARED2) are generally included by source in consuming project (in Visual Studio adding a reference to a project rather than browsing to a dll reference). This allows for enhanced debugging (stepping into library code), allows Visual Studio to manage when it needs to rebuild projects (when dependencies are modified), and allows the modification of libraries when required (as described in the above note).

Questions

  1. If Bob is working on PROD1 and Alice is working on SHARED1, how can Bob know when Alice commits changes to SHARED1? Currently with Mercurial, Bob is forced to manually pull and update within each subrepo. If he pushes/pulls to the server from the PROD_SLN repo, he never knows about updates to subrepos. This is described on the Mercurial wiki. How can Bob be notified of updates to subrepos when he pulls the latest of PROD_SLN from the server? Ideally, he should be notified (preferably during the pull) and then manually decide which subrepos he wants to update.

  2. Assume SHARED1 references NLog v1.0 (commit/rev abc in mercurial) and SHARED2 references NLog v2.0 (commit/rev xyz in mercurial). If Bob is absorbing these two components in PROD1, he should be made aware of this discrepancy. While technically Visual Studio/.NET would allow 2 assemblies to reference different versions of dependencies, my structure does not allow this because the path to NLog is fixed for all .NET projects that depend on NLog. How can Bob know that two of his dependencies have version conflicts?

  3. If Bob is setting up the repository structure for PROD1 and wants to include SHARED2, how can he know what dependencies are required for SHARED2? With my structure, he would have to manually clone (or browse on the server) the SHARED2_SLN repo and either look in the libs folder, or peek at the .hgsub file to determine what dependencies he needs to include. Ideally this would be automated. If I include SHARED2 in my product, SHARED1 and NLog are auto-magically included too, notifying me if there is a version conflict with some other dependency (see question 2 above).
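
The manual step in question 3 (reading .hgsub to see what a component needs) can at least be scripted. A minimal sketch, assuming the usual .hgsub layout of one `path = source` entry per line; `parse_hgsub` is a hypothetical helper rather than anything Mercurial ships, and the server URLs are made up for illustration:

```python
def parse_hgsub(text):
    """Return {subrepo path: source} from the contents of a .hgsub file."""
    subrepos = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("["):
            # A section header such as [subpaths]; what follows is not a
            # subrepo entry, so this sketch ignores the rest of the file.
            in_section = True
            continue
        if in_section:
            continue
        path, sep, source = line.partition("=")
        if sep:
            subrepos[path.strip()] = source.strip()
    return subrepos


# Hypothetical .hgsub for SHARED2_SLN; every URL is a placeholder.
SHARED2_HGSUB = """\
libs/SHARED1 = https://hg.example.com/SHARED1
libs/NLOG = https://hg.example.com/NLOG
misc/KEY = https://hg.example.com/KEY
src/SHARED2 = https://hg.example.com/SHARED2
tools/NANT = https://hg.example.com/NANT
"""

deps = parse_hgsub(SHARED2_HGSUB)
# Everything under libs/ is a dependency the consuming product also needs.
libs = sorted(path for path in deps if path.startswith("libs/"))
print(libs)
```

This only lists the direct entries of one shell; following each dependency's own .hgsub would still have to be layered on top.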

Bigger Questions

  1. Is mercurial the correct solution?

  2. Is there a better mercurial structure?

  3. Is this a valid use for subrepos (i.e. Mercurial developers marked subrepos as a feature of last resort)?

  4. Does it make sense to use mercurial for dependency management? We could use yet another tool for dependency management (maybe an internal NuGet feed?). While this would work well for 3rd party dependencies, it really would create a hassle for internally developed components: if they are actively developed, developers would have to constantly update the feed, we would have to serve them internally, and it would not allow components to be modified by a consuming project (Note 8 and Question 2).

  5. Do you have a better solution for Enterprise .NET software projects?

References

I have read several SO questions and found this one to be helpful, but the accepted answer suggests using a dedicated tool for dependencies. While I like the features of such a tool, it does not allow dependencies to be modified and committed from a consuming project (see Bigger Question 4).

Answer 1:

This may not be the answer you were looking for, but we have recent experience of novice Mercurial users using sub-repos, and I've been looking for an opportunity to pass on our experience...

In summary, my advice based on experience is: however appealing Mercurial sub-repos may be, do not use them. Instead, find a way to lay out your directories side-by-side, and to adjust your builds to cope with that.

However appealing it seems to be to tie together revisions in the sub-repo with revisions in the parent repo, it just doesn't work in practice.

During all the preparation for the conversion, we received advice from multiple different sources that sub-repos were fragile and not well-implemented - but we went ahead anyway, as we wanted atomic commits between repo and sub-repo. The advice - or my understanding of it - talked more about the principles rather than the practical consequences.

It was only once we went live with Mercurial and a sub-repo, that I really understood the advice properly. Here (from memory) are examples of the sorts of problems we encountered.

  • Your users will end up fighting the update and merge process.
  • Some people will update the parent repo and not the sub-repo
  • Some people will push from the sub-repo, and .hgsubstate won't get updated.
  • You will end up "losing" revisions that were made in the sub-repo, because someone will manage to leave the .hgsubstate in an incorrect state after a merge.
  • Some users will get into the situation where the .hgsubstate has been updated but the sub-repo hasn't, and then you'll get really cryptic error messages, and will spend many hours trying to work out what's going on.
  • And if you do tagging and branching for releases, the instructions for how to get this right for both parent and sub-repo will be many dozens of lines long. (And I even had a nice, tame Mercurial expert help me write the instructions!)
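
Several of the problems above come down to .hgsubstate disagreeing with what is actually checked out in a sub-repo. A rough sketch of a check for that, assuming the .hgsubstate layout of one `<changeset hash> <path>` pair per line. `parse_hgsubstate` and `stale_subrepos` are hypothetical helpers, the hashes are invented, and the `actual` mapping is filled in by hand here (in practice it might come from running something like `hg log -r . --template "{node}"` inside each sub-repo):

```python
def parse_hgsubstate(text):
    """Return {subrepo path: pinned changeset hash} from .hgsubstate contents."""
    pinned = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        node, path = line.split(" ", 1)
        pinned[path.strip()] = node
    return pinned


def stale_subrepos(pinned, actual):
    """List (path, pinned hash, checked-out hash) wherever the two differ."""
    return sorted(
        (path, rev, actual.get(path))
        for path, rev in pinned.items()
        if actual.get(path) != rev
    )


# Invented 40-character hashes standing in for real changeset IDs.
REV_A, REV_B, REV_C = "a" * 40, "b" * 40, "c" * 40

HGSUBSTATE = f"{REV_A} libs/NLOG\n{REV_B} libs/SHARED1\n"

# What is really checked out in each sub-repo (supplied by hand in this sketch).
actual = {"libs/NLOG": REV_A, "libs/SHARED1": REV_C}

mismatches = stale_subrepos(parse_hgsubstate(HGSUBSTATE), actual)
for path, pinned_rev, actual_rev in mismatches:
    print(f"{path}: parent pins {pinned_rev[:12]}, working copy is at {actual_rev[:12]}")
```

A check like this only reports the mismatch; deciding which side is right is still a human job.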

All of these things are annoying enough in the hands of expert users - but when you are rolling out Mercurial to novice users, they are a real nightmare, and the source of much wasted time.

So, having put in a lot of time to get a conversion with a sub-repo, several weeks later we then converted the sub-repo to a repo. Because we had large amounts of history in the conversion that referred to the sub-repo, via .hgsubstate, it's left us with something much more complicated.

I only wish I'd really appreciated the practical consequences of all the advice much earlier on, e.g. in Mercurial's Features of Last Resort page:

But I need to have managed subprojects!

Again, don't be so sure. Significant projects like Mozilla that have tons of dependencies do just fine without using subrepos. Most smaller projects will almost certainly be better off without using subrepos.


Edit: Thoughts on shell repos

With the disclaimer I don't have any experience of them...

No, I don't think many of them are. You are still using sub-repos, so all the same user issues apply (unless you can provide a wrapper script for every step, of course, to remove the need for humans to supply the correct options to handle sub-repos.)

Also note that the wiki page you quoted does list some specific issues with shell repos:

  • overly-strict tracking of relationship between project/ and somelib/
  • impossible to check or push project/ if somelib/ source repo becomes unavailable
  • lack of well-defined support for recursive diff, log, and status
  • recursive nature of commit surprising

Edit 2 - do a trial, involving all your users

The point at which we really started realising we had an issue was once multiple users started making commits, and pulling and pushing - including changes to the sub-repo. For us, it was too late in the day to respond to these issues. If we'd known them sooner, we could have responded much more easily and simply.

So at this point, the best advice I think I can offer is to recommend that you do a trial run of the project layout before the layout is carved in stone.

We left the full-scale trial until too late to make changes, and even then people only made changes in the parent repo, and not the sub-repos - so we still didn't see the full picture until too late.

In other words, whatever layout you consider, create a repository structure in that layout, and get lots of people making edits. Try to put enough real code into the various repos/sub-repos so that people can make real edits, even though they will be throwaway ones.

Possible outcomes:

  • You might find it all works fine - in which case, you'll have spent some time to gain certainty.
  • On the other hand, you might identify issues much more quickly than you would by trying to work out the outcomes in advance.
  • And your users will learn a lot too.


Answer 2:

Question 1:

This command, when executed in the parent "shell" repo, will traverse all subrepos and list the changesets at the default pull location that are not yet present locally:

hg incoming --subrepos

The same thing can be accomplished by clicking on the "Incoming" button on the "Synchronize" pane in TortoiseHg if you have the "--subrepos" option checked (on the same pane).
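
For a scripted check, the same command can be wrapped. A sketch, assuming `hg` is on the PATH and that `hg incoming` keeps its documented exit codes (0 when there are incoming changesets, 1 when there are none). It is also assumed here that with `--subrepos` a 0 means "at least one repo has something incoming". `incoming_argv` and `has_incoming` are hypothetical names:

```python
import subprocess


def incoming_argv(hg="hg"):
    """Command line that lists incoming changesets, subrepos included."""
    return [hg, "incoming", "--subrepos"]


def has_incoming(shell_repo, hg="hg"):
    """Return True if anything is incoming, False if not; raise on an hg error."""
    result = subprocess.run(
        incoming_argv(hg),
        cwd=shell_repo,          # run from the root of the shell repo
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        print(result.stdout)     # the changesets hg reported
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(result.stderr.strip() or "hg incoming failed")
```

Calling `has_incoming("path/to/PROD_SLN")` after a pull would tell Bob whether any subrepo has news, and he can then decide which ones to update by hand.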

Thanks to the users in the mercurial IRC channel for helping here.

Questions 2 & 3:

First I need to modify my repo structures so that the parent repos are truly "shell" repos as recommended on the hg wiki. I will take this to the extreme and say that the shell should contain no content, only subrepos as children. In summary, rename src to main, move docs into the subrepo under main, and change the prod folder to a subrepo.

SHARED1_SLN:

SHARED1_SLN-+-libs----NLOG
            |
            +-misc----KEY
            |
            +-main----SHARED1-+-docs
            |                 +-proj1
            |                 +-proj2
            |
            +-tools---NANT

SHARED2_SLN:

SHARED2_SLN-+-libs--+-SHARED1-+-docs
            |       |         +-proj1
            |       |         +-proj2
            |       |
            |       +-NLOG
            |
            +-misc----KEY
            |
            +-main----SHARED2-+-docs
            |                 +-proj3
            |                 +-proj4
            |
            +-tools---NANT            

PROD_SLN:

PROD_SLN----+-libs--+-SHARED1-+-docs
            |       |         +-proj1
            |       |         +-proj2
            |       |
            |       +-SHARED2-+-docs
            |       |         +-proj3
            |       |         +-proj4
            |       |
            |       +-NLOG
            |
            +-misc----KEY
            |
            +-main----PROD----+-docs
            |                 +-proj5
            |                 +-proj6
            |
            +-tools---NANT

  1. All shared libs and products have their own repo (SHARED1, SHARED2, and PROD).
  2. If you need to work on a shared lib or product independently, there is a shell available (my repos ending with _SLN) that uses hg to manage the revisions of the dependencies. The shell is only for convenience because it contains no content, only subrepos.
  3. When rolling a release of a shared lib or product, the developer should list all of the dependencies and their hg revs/changesets (or preferably human friendly tags) that were used to create the release. This list should be saved in a file in the repo for the lib or product (SHARED1, SHARED2, or PROD), not the shell. See Note A below for how this could solve Questions 2 & 3.
  4. If I roll a release of a shared lib or product, I should put matching tags in the project's repo and its shell for convenience. However, if the shell gets out of whack (a concern expressed from real experience in @Clare 's answer), it really should not matter because the shell itself is dumb and contains no content.
  5. Visual Studio sln files go into the root of the shared lib or product's repo (SHARED1, SHARED2, or PROD), again, not the shell. The result being if I include SHARED1 in PROD, I may end up with some extra solutions that I never open, but it doesn't matter. Furthermore, if I really want to work on SHARED1 and run its unit tests (while working in PROD_SLN shell), it's really easy, just open the said solution.

Note A:

In regards to point 3 above, if the dependency file uses a format similar to .hgsub but with the addition of the rev/changeset/tag, then getting the dependencies could be automated. For example, I want SHARED1 in my new product. Clone SHARED1 to my libs folder and update to the tip or the last release label. Now, I need to look at the dependencies file and a) clone the dependency to the correct location and b) update to the specified rev/changeset/tag. Very feasible to automate this. To take it further, it could even track the rev/changeset/tag and alert the developer if there is a dependency conflict between shared libs.
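
A minimal sketch of that automation, under stated assumptions: the dependency file format (`path = source rev`, one entry per line) is invented here as one possible ".hgsub plus a revision" layout, the URLs and the `rel-1.2` tag are placeholders, and `parse_deps`, `clone_commands` and `find_conflicts` are hypothetical helpers. The only Mercurial feature relied on is `hg clone --updaterev`, which checks out a given revision or tag after cloning:

```python
def parse_deps(text):
    """Return {path: (source, rev)} from the assumed 'path = source rev' format."""
    deps = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        path, _, rest = line.partition("=")
        source, rev = rest.split()  # assumes neither field contains spaces
        deps[path.strip()] = (source, rev)
    return deps


def clone_commands(deps):
    """Steps a) and b): clone each dependency and update to its pinned rev."""
    return [
        ["hg", "clone", "--updaterev", rev, source, path]
        for path, (source, rev) in sorted(deps.items())
    ]


def find_conflicts(deps_by_component):
    """Return {path: {rev: [components]}} for paths pinned to more than one rev."""
    wanted = {}
    for component, deps in deps_by_component.items():
        for path, (_source, rev) in deps.items():
            wanted.setdefault(path, {}).setdefault(rev, []).append(component)
    return {path: revs for path, revs in wanted.items() if len(revs) > 1}


# Placeholder dependency files for the two shared libs.
shared1 = parse_deps("libs/NLOG = https://hg.example.com/NLOG v1.0\n")
shared2 = parse_deps(
    "libs/SHARED1 = https://hg.example.com/SHARED1 rel-1.2\n"
    "libs/NLOG = https://hg.example.com/NLOG v2.0\n"
)

conflicts = find_conflicts({"SHARED1": shared1, "SHARED2": shared2})
print(conflicts)
print(clone_commands(shared1))
```

Matching on the path works only because every component refers to a dependency by the same location under libs, which is what the flat layout above is meant to guarantee.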

A hole remains if Alice is actively developing SHARED1 while Bob is developing PROD. If Alice updates SHARED1_SLN to use NLog v3.0, Bob may not ever know this. If Alice updates her dependency file to reflect the change then Bob does have the info, he just has to be made aware of the change.
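
Making Bob aware could be as simple as comparing the pinned revisions in the dependency file before and after he pulls. A sketch, assuming the file has already been read into `{path: rev}` dictionaries; `changed_pins` is a hypothetical helper and the version labels are only examples:

```python
def changed_pins(old, new):
    """List (path, old rev, new rev) for dependencies added, removed, or re-pinned."""
    changes = []
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before != after:
            changes.append((path, before, after))
    return changes


# SHARED1's pins as Bob last saw them, and after Alice's change.
before_pull = {"libs/NLOG": "v2.0"}
after_pull = {"libs/NLOG": "v3.0"}

changes = changed_pins(before_pull, after_pull)
for path, old_rev, new_rev in changes:
    print(f"{path}: {old_rev} -> {new_rev}")
```

None stands for "not listed", so a dependency that Alice adds or drops shows up as well.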

Bigger Questions 1 & 4:

I believe that this is a source control issue and not something that can be solved with a dependency management tool, since those generally work with binaries and only fetch dependencies (they don't allow committing changes back to the dependencies). My dependency problems are not unique to Mercurial. From my experience, all source control tools have the same problem. One solution in SVN would be to just use svn:externals (or svn copies) and recursively have every component include its dependencies, creating a possibly huge tree to build a product. However, this falls apart in Visual Studio where I really only want to include one instance of a shared project and reference it everywhere. As implied by @Clare 's answer and Greg's response to my email to the hg mail list, keep components as flat as possible.

Bigger Questions 2 & 3:

There is a better structure as I have laid out above. I believe we have a strong use case for using subrepos and I do not see a viable alternative. As mentioned in @Clare 's answer, there is a camp that believes dependencies can be managed without subrepos. However, I have yet to see any evidence or actual references to back this statement up.

Bigger Question 5:

Still open to better ideas...