Third party library maintenance

Revision as of 22:37, 5 March 2012 by BradWhitlock (Changes to make our lives easier)

build_visit is a wonderful tool for users, but it is an unwieldy architecture for continued support of our many third party dependencies.

Goals that the script has

build_visit exists to make it easy to... well, build VisIt. One of the largest issues in building VisIt, especially for first-time users, is that our third party libraries are numerous and must be compiled in specific ways. As two related examples, we compile Mesa specially by mangling the symbols, and then VTK is compiled (in part) against that mangled Mesa.

The goals of build_visit are:

  • Build system agnostic. Users should not need to know how to compile all of our third party dependencies. One library might build using the GNU autotools, another with CMake, a third with just raw makefiles; a user should not be expected to understand a plethora of build systems just to compile VisIt.
  • Encode VisIt-specific knowledge of compilation flags. As VisIt developers, we know that to compile Mesa with mangled symbols, one adds -DUSE_MGL_NAMESPACE to the compilation flags. There are many such flags in our third party dependencies, which may sometimes be required for VisIt to compile or function properly. Users should not be expected to know about all the flags across disparate packages.
  • You get what you pay for. A user who uses no HDF5 files should not have to download, compile, and install the HDF5 library.
  • Automation. Even a knowledgeable user does not normally want to spend the time required to look up where to download all of these packages, grab them all, and compile them manually.
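
The goals above can be pictured as a small table of per-library recipes. The following is only a sketch, not how build_visit is actually organized: the names Recipe, RECIPES, build_order and configure_command are hypothetical, and the build system assigned to each library is an assumption made for illustration. Only the Mesa flag -DUSE_MGL_NAMESPACE and the VTK-on-Mesa dependency come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical description of how one third party library is built."""
    name: str
    build_system: str          # "autotools", "cmake", or "make"
    extra_cflags: tuple = ()   # VisIt-specific compilation flags
    depends_on: tuple = ()     # libraries that must be built first


# Illustrative entries; the build system per library is an assumption.
RECIPES = {
    "mesa": Recipe("mesa", "make", extra_cflags=("-DUSE_MGL_NAMESPACE",)),
    "vtk": Recipe("vtk", "cmake", depends_on=("mesa",)),
    "hdf5": Recipe("hdf5", "autotools"),
}


def build_order(requested):
    """Return the requested libraries plus their dependencies, dependencies first.

    Libraries nobody asked for are never included ("you get what you pay for").
    """
    order = []

    def visit(name):
        if name in order:
            return
        for dep in RECIPES[name].depends_on:
            visit(dep)
        order.append(name)

    for name in requested:
        visit(name)
    return order


def configure_command(recipe, prefix):
    """Hide the build system behind one call, so the user never has to know it."""
    cflags = " ".join(recipe.extra_cflags)
    if recipe.build_system == "autotools":
        return f'./configure --prefix={prefix} CFLAGS="{cflags}"'
    if recipe.build_system == "cmake":
        return f'cmake -DCMAKE_INSTALL_PREFIX={prefix} -DCMAKE_C_FLAGS="{cflags}" .'
    return f'make CFLAGS="{cflags}"'
```

With these entries, build_order(["vtk"]) pulls in Mesa ahead of VTK but never touches HDF5, and the mangling flag lives in the recipe instead of in the user's head.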

Issues with current build_visit

  • Patches. There is little to no patch management system in build_visit. Patches are simply embedded inline and we attempt to apply them. If they fail, normally we continue on anyway, but this decision is made in an ad hoc manner.
  • Size and complexity. build_visit is over 9000 lines of shell script at this point. Each dependency within the script is essentially its own 'island'.
  • Upstream releases/changes. Given the number of third party components we depend on, VisIt commonly uses packages that are older than the latest upstream release. build_visit does little to help answer the questions that arise in such an environment: "Is our patch fixing X still relevant in the new release?"; "How can I easily test that VisIt supports versions A and B of this library?"; "Does a new release fix an issue which we haven't noticed yet?" (e.g. build errors due to missing headers with recent versions of gcc).
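
One way to make the patch-failure decision explicit rather than ad hoc is to record, per patch, whether a failure should stop the build. This is a minimal sketch under that assumption; Patch, PatchError and apply_series are hypothetical names, and nothing like them exists in build_visit today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch entry with an explicit failure policy."""
    name: str
    required: bool  # if True, a failed apply aborts the build


class PatchError(RuntimeError):
    """Raised when a required patch does not apply."""


def apply_series(patches, apply_one):
    """Apply patches in order.

    apply_one(patch) must return True on success and False on failure.
    Returns the names of optional patches that failed and were skipped;
    raises PatchError as soon as a required patch fails.
    """
    skipped = []
    for patch in patches:
        if apply_one(patch):
            continue
        if patch.required:
            raise PatchError(f"required patch failed: {patch.name}")
        skipped.append(patch.name)
    return skipped
```

The apply_one callback is left to the caller so the policy can be exercised without touching a source tree; in a real script it would presumably wrap an invocation of the patch program and check its exit status.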

Changes to make our lives easier

  • Version source (in third_party). Versioning binaries is a bad idea with Subversion anyway, because it makes the repository huge. More importantly, versioning source gives us a vehicle to persistently apply patches to a versioned release of a third party dependency. The important use case this allows is, "what has changed in an upstream release, and how is it relevant to us?". It also provides a convenient mechanism for generating such changes to contribute back to upstream.
  • Maintain build scripts and patches separately. This allows them to be updated independently. Further, it removes a barrier in identifying how "our" version of libraryX differs from the released version it is based on.
    • This could be in tandem with or an alternative to versioning source.
  • Provide an API instead of a series of disconnected parts. We try to do this in some cases, but we're not vigilant enough in watching for duplicated functionality.
  • Most importantly, work with upstream. It is much too common that we create a patch to get dependency D compiling on machine M, and then the issue is dropped. At the very least, we should send the patch upstream and let the maintainers know we are using it to build on machine M. Further, we often reach for the 'patch solution' instead of the 'upgrade version solution', even though the latter is almost always preferable (and upstream knows best in the case of ambiguity).
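
If patches are kept separately from the build scripts, a little metadata per patch would answer two of the questions raised earlier: which patches still matter for a given upstream version, and which have never been sent upstream. The sketch below assumes simple dotted numeric version strings; PatchRecord, RECORDS and the patch names and versions in it are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text):
    """Turn a dotted version such as "1.4" into a comparable tuple (1, 4)."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class PatchRecord:
    """Hypothetical bookkeeping for one patch we carry against a library."""
    name: str
    first_version: str                        # oldest release the patch applies to
    fixed_upstream_in: Optional[str] = None   # None means upstream still needs it
    sent_upstream: bool = False


# Made-up records for illustration only.
RECORDS = [
    PatchRecord("fix-missing-header", "1.2", fixed_upstream_in="1.4", sent_upstream=True),
    PatchRecord("platform-build-fix", "1.0"),
]


def needed_for(records, version):
    """Names of patches that are still relevant when building `version`."""
    v = parse_version(version)
    return [
        r.name
        for r in records
        if parse_version(r.first_version) <= v
        and (r.fixed_upstream_in is None or v < parse_version(r.fixed_upstream_in))
    ]


def not_yet_reported(records):
    """Names of patches nobody has sent to upstream yet."""
    return [r.name for r in records if not r.sent_upstream]
```

With records like these, moving from release 1.3 to 1.4 drops the first patch automatically, and the second one shows up as something we still owe upstream.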

Source versioning can be done in a couple of ways:

  1. We write our build scripts to take a series of our exported, pre-patched thirdparty trees (instead of a series of include directories and shared object files). Our system is then responsible for building and installing the packages.
    • This was recently done at SCI for a large package with many dependencies, and it has done wonders in easing build woes.
    • Unfortunately it violates one of our requirements: you only get what you pay for. I don't think we should mind doing this for developers who check out the source from Subversion, but we need to be able to handle the case where some third party dependencies are missing.
  2. Maintain a separate build_visit script which knows how to obtain patched versions of thirdparty trees and build them.
    1. This could download a release from upstream, download our patches from <somewhere>, and finally apply our patches, or:
    2. It could know how to export our patched versions from a special directory in our repo (anonymous checkouts needed)
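
The two variants of option 2 differ only in how the patched tree is obtained, which can be shown by writing each as a plan of commands. This is a sketch, not a proposal for the final script: plan_from_upstream and plan_from_repo are hypothetical names, every URL is a caller-supplied parameter because no real locations are decided here, and the upstream release is assumed to be a gzipped tarball with a single top-level directory.

```python
def plan_from_upstream(tarball_url, patch_urls, workdir):
    """Variant 1: fetch an upstream release, fetch our patches, apply them.

    Returns a list of argument lists; nothing is executed here.
    """
    src = f"{workdir}/src"
    tarball = f"{workdir}/source.tar.gz"
    steps = [
        ["mkdir", "-p", src],
        ["curl", "-L", "-o", tarball, tarball_url],
        # Assumes one top-level directory inside the tarball.
        ["tar", "-xzf", tarball, "-C", src, "--strip-components=1"],
    ]
    for index, url in enumerate(patch_urls):
        local = f"{workdir}/{index:02d}.patch"
        steps.append(["curl", "-L", "-o", local, url])
        steps.append(["patch", "-d", src, "-p1", "-i", local])
    return steps


def plan_from_repo(export_url, workdir):
    """Variant 2: export an already patched tree from our own repository."""
    return [["svn", "export", export_url, f"{workdir}/src"]]
```

Either plan could then be handed to something like subprocess.run one step at a time. Keeping the plan as data makes it easy to print or review before anything is downloaded.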