Packaging the Sage Library

Modules, packages, distribution packages

The Sage library consists of a large number of Python modules, organized into a hierarchical set of packages that fill the namespace sage. All source files are located in a subdirectory of the directory SAGE_ROOT/src/sage/.

For example,

  • the file SAGE_ROOT/src/sage/coding/code_bounds.py provides the module sage.coding.code_bounds;

  • the directory containing this file, SAGE_ROOT/src/sage/coding/, thus provides the package sage.coding.
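
The correspondence between source files and module names can be sketched in a few lines of Python. This is only an illustration: the helper name module_name_from_path is made up for this example and is not part of the Sage library, and the path is taken relative to SAGE_ROOT.

```python
from pathlib import PurePosixPath


def module_name_from_path(source_file, source_root="src"):
    """Return the dotted module name for a source file below ``source_root``.

    Hypothetical helper, for illustration only.
    """
    relative = PurePosixPath(source_file).relative_to(source_root)
    # Drop the file extension, then join the path components with dots.
    return ".".join(relative.with_suffix("").parts)


print(module_name_from_path("src/sage/coding/code_bounds.py"))
# prints sage.coding.code_bounds
```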

There is another notion of “package” in Python, the distribution package (also known as a “distribution” or a “pip-installable package”). Currently, the entire Sage library is provided by a single distribution, sagemath-standard, which is generated from the directory SAGE_ROOT/pkgs/sagemath-standard.

Note that the distribution name is not required to be a Python identifier. In fact, using dashes (-) is preferred to underscores in distribution names; setuptools and other parts of Python’s packaging infrastructure normalize underscores to dashes. (Using dots in distribution names, to indicate ownership by organizations, still mentioned in PEP 423, appears to have largely fallen out of favor, and we will not use it in the SageMath project.)
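
As an illustration of this normalization, the following sketch applies the rule that PEP 503 specifies for comparing project names: runs of dashes, underscores, and dots are replaced by a single dash, and the result is lowercased. The function name is chosen for this example only; individual tools may apply the rule at different stages.

```python
import re


def normalize_distribution_name(name):
    """Normalize a distribution name following the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The import package sage_sws2rst corresponds to the distribution sage-sws2rst.
print(normalize_distribution_name("sage_sws2rst"))
# prints sage-sws2rst
```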

A distribution that provides Python modules in the sage.* namespace, say mainly from sage.PAC.KAGE, should be named sagemath-DISTRI-BUTION. Example:

  • The distribution sagemath-categories provides a small subset of the modules of the Sage library, mostly from the packages sage.structure, sage.categories, and sage.misc.

Other distributions should not use the prefix sagemath- in the distribution name. Example:

  • The distribution sage-sws2rst provides the Python package sage_sws2rst, so it does not fill the sage.* namespace and therefore does not use the prefix sagemath-.

A distribution that provides functionality that does not need to import anything from the sage namespace should not use the sage namespace for its own packages/modules. It should be positioned as part of the general Python ecosystem instead of as a Sage-specific distribution. Examples:

  • The distribution pplpy provides the Python package ppl and is a much extended version of what used to be sage.libs.ppl, a part of the Sage library. The package sage.libs.ppl had dependencies on sage.rings to convert to/from Sage number types. pplpy has no such dependencies and is therefore usable in a wider range of Python projects.

  • The distribution memory-allocator provides the Python package memory_allocator. This used to be sage.ext.memory_allocator, a part of the Sage library.

Ordinary packages vs. implicit namespace packages

Each module of the Sage library must be packaged in exactly one distribution package. However, modules in a package may be included in different distribution packages. In this regard, there is an important constraint that an ordinary package (directory with __init__.py file) cannot be split into more than one distribution package.

By removing the __init__.py file, however, we can make the package an “implicit” (or “native”) “namespace” package, following PEP 420. Implicit namespace packages can be included in more than one distribution package. Hence whenever there are two distribution packages that provide modules with a common prefix of Python packages, that prefix needs to be an implicit namespace package, i.e., there cannot be an __init__.py file.

For example,

  • sagemath-tdlib will provide sage.graphs.graph_decompositions.tdlib,

  • sagemath-rw will provide sage.graphs.graph_decompositions.rankwidth,

  • sagemath-graphs will provide all of the rest of sage.graphs.graph_decompositions (and most of sage.graphs).

Then, none of

  • sage,

  • sage.graphs,

  • sage.graphs.graph_decompositions

can be an ordinary package (with an __init__.py file), but rather each of them has to be an implicit namespace package (no __init__.py file).

For an implicit namespace package, __init__.py cannot be used any more for initializing the package.
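
The following self-contained sketch shows the mechanism outside of Sage. Two temporary directories play the role of two installed distributions. Both contribute modules to the same package demo_ns.sub, and neither contains an __init__.py file. All names here (demo_ns, mod_a, mod_b, write_portion) are invented for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_portion(root, module_name, value):
    """Create ROOT/demo_ns/sub/MODULE_NAME.py without any __init__.py files."""
    package_dir = Path(root) / "demo_ns" / "sub"
    package_dir.mkdir(parents=True)
    (package_dir / (module_name + ".py")).write_text("VALUE = %r\n" % value)


with tempfile.TemporaryDirectory() as dist_a, tempfile.TemporaryDirectory() as dist_b:
    write_portion(dist_a, "mod_a", "from distribution A")
    write_portion(dist_b, "mod_b", "from distribution B")
    sys.path[:0] = [dist_a, dist_b]
    try:
        importlib.invalidate_caches()
        # Both imports succeed although the modules live in different trees.
        mod_a = importlib.import_module("demo_ns.sub.mod_a")
        mod_b = importlib.import_module("demo_ns.sub.mod_b")
        # The namespace package collects one directory per contributing tree.
        portions = list(sys.modules["demo_ns.sub"].__path__)
    finally:
        sys.path.remove(dist_a)
        sys.path.remove(dist_b)

print(mod_a.VALUE, "|", mod_b.VALUE)
print(len(portions), "directories contribute to demo_ns.sub")
```

If the first directory contained a file demo_ns/sub/__init__.py, Python would treat demo_ns.sub as an ordinary package located only in that directory, and the import of demo_ns.sub.mod_b would be expected to fail.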

In the Sage 9.6 development cycle, we still use ordinary packages by default, but several packages are converted to implicit namespace packages to support modularization.

Source directories of distribution packages

The development of the Sage library uses a monorepo strategy for all distribution packages that fill the sage.* namespace. This means that the source trees of these distributions are included in a single git repository, in a subdirectory of SAGE_ROOT/pkgs.

All these distribution packages have matching version numbers. From the viewpoint of a single distribution, this means that sometimes there will be a new release of some distribution where the only thing changing is the version number.

The source directory of a distribution package, such as SAGE_ROOT/pkgs/sagemath-standard, contains the following files:

  • sage – a relative symbolic link to the monolithic Sage library source tree SAGE_ROOT/src/sage/

  • MANIFEST.in – controls which files and directories of the monolithic Sage library source tree are included in the distribution

  • pyproject.toml, setup.cfg, and requirements.txt – standard Python packaging metadata, declaring the distribution name, dependencies, etc.

  • README.rst – a description of the distribution

  • VERSION.txt, LICENSE.txt – relative symbolic links to the same files in SAGE_ROOT/src

  • setup.py – a setuptools-based installation script

  • tox.ini – configuration for testing with tox

The technique of using symbolic links pointing into SAGE_ROOT/src has allowed the modularization effort to keep the SAGE_ROOT/src tree monolithic: Modularization has been happening behind the scenes and will not change where Sage developers find the source files. When adding a new distribution package that uses a symbolic link pointing into SAGE_ROOT/src, please update search.exclude in SAGE_ROOT/.vscode/settings.json.

Some of these files may actually be generated from source files with suffix .m4 by the SAGE_ROOT/bootstrap script via the m4 macro processor.

Dependencies and distribution packages

When preparing a portion of the Sage library as a distribution package, dependencies matter.

Build-time dependencies

If the portion of the library contains any Cython modules, these modules are compiled during the wheel-building phase of the distribution package. If the Cython module uses cimport to pull in anything from .pxd files, these files must be either part of the portion shipped as the distribution being built, or the distribution that provides these files must be installed in the build environment. Also, any C/C++ libraries that the Cython module uses must be accessible from the build environment.

Declaring build-time dependencies: Modern Python packaging provides a mechanism to declare build-time dependencies on other distribution packages via the file pyproject.toml ([build-system] requires); this has superseded the older setup_requires declaration. (There is no mechanism to declare anything regarding the C/C++ libraries.)

While the namespace sage.* is organized roughly according to mathematical fields or categories, how we partition the implementation modules into distribution packages has to respect the hard constraints that are imposed by the build-time dependencies.

We can define some meaningful small distributions that just consist of a single or a few Cython modules. For example, sagemath-tdlib (trac ticket #29864) would just package the single Cython module that must be linked with tdlib, sage.graphs.graph_decompositions.tdlib. Starting with the Sage 9.6 development cycle, as soon as namespace packages are activated, we can start to create these distributions. This is quite a mechanical task.

Reducing build-time dependencies: Sometimes it is possible to replace build-time dependencies of a Cython module on a library by a runtime dependency. In other cases, it may be possible to split a module that simultaneously depends on several libraries into smaller modules, each of which has narrower dependencies.

Module-level runtime dependencies

Any import statements at the top level of a Python or Cython module are executed when the module is imported. Hence, the imported modules must be part of the distribution, or provided by another distribution – which then must be declared as a run-time dependency.

Declaring run-time dependencies: These dependencies are declared in setup.cfg (generated from setup.cfg.m4) as install_requires.

Reducing module-level run-time dependencies:

  • Avoid importing from sage.PAC.KAGE.all modules when sage.PAC.KAGE is a namespace package. The main purpose of the *.all modules is for populating the global interactive environment that is available to users at the sage: prompt. In particular, no Sage library code should import from sage.rings.all.

  • Replace module-level imports by method-level imports. Note that this comes with a small runtime overhead, which can become noticeable if the method is called in tight inner loops.

  • Sage provides the lazy_import() mechanism. Lazy imports can be declared at the module level, but the actual importing is only done on demand. It is a runtime error at that time if the imported module is not present. This can be convenient compared to local imports in methods when the same imports are needed in several methods.

  • Avoid the “modularization anti-pattern” of importing a class from another module just to run an isinstance(object, Class) test, in particular when the module implementing Class has heavy dependencies. For example, importing the class pAdicField (or the function is_pAdicField) requires the libraries NTL and PARI.

    Instead, provide an abstract base class (ABC) in a module that only has light dependencies, make Class a subclass of ABC, and use isinstance(object, ABC). For example, sage.rings.abc provides abstract base classes for many ring (parent) classes, including sage.rings.abc.pAdicField. So we can replace:

    from sage.rings.padics.generic_nodes import pAdicFieldGeneric  # heavy dependencies
    isinstance(object, pAdicFieldGeneric)
    

    and:

    from sage.rings.padics.generic_nodes import is_pAdicField      # heavy dependencies
    is_pAdicField(object)                                          # deprecated
    

    by:

    import sage.rings.abc                                          # no dependencies
    isinstance(object, sage.rings.abc.pAdicField)
    

    Note that going through the abstract base class only incurs a small performance penalty:

    sage: object = Qp(5)
    
    sage: from sage.rings.padics.generic_nodes import pAdicFieldGeneric
    sage: %timeit isinstance(object, pAdicFieldGeneric)            # fast                           # not tested
    68.7 ns ± 2.29 ns per loop (...)
    
    sage: import sage.rings.abc
    sage: %timeit isinstance(object, sage.rings.abc.pAdicField)    # also fast                      # not tested
    122 ns ± 1.9 ns per loop (...)
    
  • If it is not possible or desired to create an abstract base class for isinstance testing (for example, when the class is defined in some external package), other solutions need to be used.

    Note that Python caches successful module imports, but repeating an unsuccessful module import incurs a cost every time:

    sage: from sage.schemes.generic.scheme import Scheme
    sage: sZZ = Scheme(ZZ)
    
    sage: def is_Scheme_or_Pluffe(x):
    ....:    if isinstance(x, Scheme):
    ....:        return True
    ....:    try:
    ....:        from xxxx_does_not_exist import Pluffe            # slow on every call
    ....:    except ImportError:
    ....:        return False
    ....:    return isinstance(x, Pluffe)
    
    sage: %timeit is_Scheme_or_Pluffe(sZZ)                         # fast                           # not tested
    111 ns ± 1.15 ns per loop (...)
    
    sage: %timeit is_Scheme_or_Pluffe(ZZ)                          # slow                           # not tested
    143 µs ± 2.58 µs per loop (...)
    

    The lazy_import() mechanism can be used to simplify this pattern via the __instancecheck__() method and has similar performance characteristics:

    sage: lazy_import('xxxx_does_not_exist', 'Pluffe')
    
    sage: %timeit isinstance(sZZ, (Scheme, Pluffe))                # fast                           # not tested
    95.2 ns ± 0.636 ns per loop (...)
    
    sage: %timeit isinstance(ZZ, (Scheme, Pluffe))                 # slow                           # not tested
    158 µs ± 654 ns per loop (...)
    

    It is faster to do the import only once, for example when loading the module, and to cache the failure. We can use the following idiom, which makes use of the fact that isinstance accepts arbitrarily nested tuples of types (but not lists), so that an empty tuple is a valid entry that matches nothing:

    sage: try:
    ....:     from xxxx_does_not_exist import Pluffe               # runs once
    ....: except ImportError:
    ....:     # Set to empty tuple of types for isinstance
    ....:     Pluffe = ()
    
    sage: %timeit isinstance(sZZ, (Scheme, Pluffe))                # fast                           # not tested
    95.9 ns ± 1.52 ns per loop (...)
    
    sage: %timeit isinstance(ZZ, (Scheme, Pluffe))                 # fast                           # not tested
    126 ns ± 1.9 ns per loop (...)
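
The examples in the list above show the client side of the abstract base class pattern. The provider side can be sketched as follows, in plain Python and independent of Sage. The class names pAdicFieldABC and HeavyPAdicField are hypothetical stand-ins; in Sage, the light module would be sage.rings.abc and the heavy class would live in a module with compiled dependencies.

```python
from abc import ABC


# --- "light" module: defines only the abstract base class, no heavy imports ---
class pAdicFieldABC(ABC):
    """Hypothetical stand-in for an ABC such as sage.rings.abc.pAdicField."""


# --- "heavy" module: imagine imports of compiled libraries at the top ---
class HeavyPAdicField(pAdicFieldABC):
    """Hypothetical implementation class; it subclasses the light ABC."""

    def __init__(self, prime):
        self.prime = prime


# --- client code: needs only the light module for its isinstance test ---
def is_padic_field(obj):
    return isinstance(obj, pAdicFieldABC)


print(is_padic_field(HeavyPAdicField(5)))
print(is_padic_field(5))
```

The client never imports the heavy module, so it remains importable in an environment where only the light module is installed.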
    

Other runtime dependencies

If import statements are used within a method, the imported module is loaded the first time that the method is called. Hence the module defining the method can still be imported even if the module needed by the method is not present.
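
A minimal sketch of this behavior, in plain Python: the class below can be defined (and its module imported) even though one of its methods refers to a module that does not exist. The class, its methods, and the module name xxxx_does_not_exist are made up for this illustration; fractions merely stands in for a dependency that happens to be installed.

```python
class Code:
    """Hypothetical class with method-level imports."""

    def __init__(self, length):
        self.length = length

    def rate(self, dimension):
        # Method-level import of a module that is present.
        from fractions import Fraction
        return Fraction(dimension, self.length)

    def symbolic_summary(self):
        # Method-level import of a module that is absent in this environment.
        from xxxx_does_not_exist import summarize
        return summarize(self)


code = Code(7)
print(code.rate(4))  # works: the missing module is never touched

try:
    code.symbolic_summary()
except ImportError as error:
    print("optional functionality unavailable:", error)
```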

It is then a question whether a run-time dependency should be declared. If the method needing that import provides core functionality, then probably yes. But if it only provides what can be considered “optional functionality”, then probably not, and in this case it will be up to the user to install the distribution enabling this optional functionality.

As an example, let us consider designing a distribution that centers around the package sage.coding. First, let’s see if it uses symbolics:

(9.5.beta6) $ git grep -E 'sage[.](symbolic|functions|calculus)' src/sage/coding
src/sage/coding/code_bounds.py:        from sage.functions.other import ceil
...
src/sage/coding/grs_code.py:from sage.symbolic.ring import SR
...
src/sage/coding/guruswami_sudan/utils.py:from sage.functions.other import floor

Apparently it does not in a very substantial way:

  • The imports of the symbolic functions ceil() and floor() can likely be replaced by the arithmetic functions integer_ceil() and integer_floor().

  • Looking at the import of SR by sage.coding.grs_code, it seems that SR is used for running some symbolic sum, but the doctests do not show symbolic results, so it is likely that this can be replaced.

  • Note though that the above textual search for the module names is merely a heuristic. The source of “entropy” shows that a runtime dependency on symbolics comes in through log from sage.misc.functional. In fact, for this reason, two doctests there are already marked as # optional - sage.symbolic.

So if this portion were packaged as sagemath-coding, a domain expert would have to decide whether these dependencies on symbolics are strong enough to declare a runtime dependency (install_requires) on sagemath-symbolics. This declaration would mean that any user who installs sagemath-coding (pip install sagemath-coding) would pull in sagemath-symbolics, which has heavy compile-time dependencies (ECL/Maxima/FLINT/Singular/…).

The alternative is to consider the use of symbolics by sagemath-coding merely as something that provides some extra features, which will only be working if the user also has installed sagemath-symbolics.

Declaring optional run-time dependencies: It is possible to declare such optional dependencies as extras_require in setup.cfg (generated from setup.cfg.m4). This is a very limited mechanism – in particular it does not affect the build phase of the distribution in any way. It basically only provides a way to give a nickname to a distribution that can be installed as an add-on.

In our example, we could declare an extras_require so that users could use pip install sagemath-coding[symbolics].
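
To make the two kinds of declarations concrete, the following sketch embeds a hypothetical setup.cfg fragment for sagemath-coding and reads it with Python's configparser. The fragment is an assumption made for this example: in particular, the dependency on sagemath-categories is invented, and the real file would be generated from setup.cfg.m4.

```python
import configparser

# Hypothetical setup.cfg fragment; the listed dependencies are assumptions.
SETUP_CFG = """\
[metadata]
name = sagemath-coding

[options]
install_requires =
    sagemath-categories

[options.extras_require]
symbolics =
    sagemath-symbolics
"""


def requirement_list(value):
    """Split a multi-line setup.cfg value into a list of requirements."""
    return [line.strip() for line in value.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Always installed together with the distribution.
required = requirement_list(parser["options"]["install_requires"])
# Installed only on request, e.g. pip install sagemath-coding[symbolics].
extras = {
    name: requirement_list(value)
    for name, value in parser["options.extras_require"].items()
}

print(required)
print(extras)
```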

Doctest-only dependencies

Doctests often use examples constructed using functionality provided by other portions of the Sage library. This kind of integration testing is one of the strengths of Sage; but it also creates extra dependencies.

Fortunately, these dependencies are very mild, and we can deal with them using the same mechanism that we use for making doctests conditional on the presence of optional libraries: using # optional - FEATURE directives in the doctests. Adding these directives will allow developers to test the distribution separately, without requiring all of Sage to be present.

Declaring doctest-only dependencies: The extras_require mechanism mentioned above can also be used for this.

Version constraints of dependencies

The version information for dependencies comes from the files build/pkgs/*/install-requires.txt and build/pkgs/*/package-version.txt. We use the m4 macro processor to insert the version information in the generated files pyproject.toml, setup.cfg, requirements.txt.

Hierarchy of distribution packages

[Figure: diagram of the hierarchy of distribution packages (_images/packaging_sage_library-1.svg)]

Testing distribution packages

Of course, we need tools for testing modularized distributions of portions of the Sage library.

  • Modularized distributions must be testable separately!

  • But we want to keep integration testing with other portions of Sage too!

Preparing doctests

Whenever an optional package is needed for a particular test, we use the doctest annotation # optional. This mechanism can also be used for making a doctest conditional on the presence of a portion of the Sage library.

The available tags take the form of package or module names such as sage.combinat, sage.graphs, sage.plot, sage.rings.number_field, sage.rings.real_double, and sage.symbolic. They are defined via Feature subclasses in the module sage.features.sagemath, which also provides the mapping from features to the distributions providing them (actually, to SPKG names). Using this mapping, Sage can issue installation hints to the user.

For example, the package sage.tensor is purely algebraic and has no dependency on symbolics. However, there are a small number of doctests that depend on sage.symbolic.ring.SymbolicRing for integration testing. Hence, these doctests are marked # optional - sage.symbolic.
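
The effect of such a tag can be sketched as a simple filter. The code below is not Sage's doctest parser; it is a simplified illustration, with invented function names and an invented doctest line, of how a line tagged # optional - FEATURE is run only when FEATURE is available.

```python
import re

# Simplified pattern for "# optional - TAG [TAG ...]"; an assumption of this
# sketch, not the pattern used by the Sage doctester.
OPTIONAL_TAG = re.compile(r"#\s*optional\s*-\s*(?P<tags>[\w.\s]+)")


def optional_tags(doctest_line):
    """Return the set of feature tags that a doctest line requires."""
    match = OPTIONAL_TAG.search(doctest_line)
    return set(match.group("tags").split()) if match else set()


def should_run(doctest_line, available_features):
    """Run a line only if all features that it requires are available."""
    return optional_tags(doctest_line) <= set(available_features)


line = "sage: f(SR.var('x'))  # optional - sage.symbolic"
print(should_run(line, {"sage.combinat"}))
print(should_run(line, {"sage.combinat", "sage.symbolic"}))
```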

Testing the distribution in virtual environments with tox

So how do we test that this works?

Sure, we could go into the installation directory SAGE_VENV/lib/python3.9/site-packages/ and do rm -rf sage/symbolic and test that things still work. But that’s not a good way of testing.

Instead, we use a virtual environment in which we only install the distribution to be tested (and its Python dependencies).

Let’s try it out first with the entire Sage library, represented by the distribution sagemath-standard. Note that after Sage has been built normally, a set of wheels for all installed Python packages is available in SAGE_VENV/var/lib/sage/wheels/:

$ ls venv/var/lib/sage/wheels
Babel-2.9.1-py2.py3-none-any.whl
Cython-0.29.24-cp39-cp39-macosx_11_0_x86_64.whl
Jinja2-2.11.2-py2.py3-none-any.whl
...
sage_conf-9.5b6-py3-none-any.whl
...
scipy-1.7.2-cp39-cp39-macosx_11_0_x86_64.whl
setuptools-58.2.0-py3-none-any.whl
...
wheel-0.37.0-py2.py3-none-any.whl
widgetsnbextension-3.5.1-py2.py3-none-any.whl
zipp-3.5.0-py3-none-any.whl

Note in particular the wheel for sage-conf, which provides configuration variable settings and the connection to the non-Python packages installed in SAGE_LOCAL.

We can now set up a separate virtual environment, in which we install these wheels and our distribution to be tested. This is where tox comes into play: It is the standard Python tool for creating disposable virtual environments for testing. Every distribution in SAGE_ROOT/pkgs/ provides a configuration file tox.ini.

Following the comments in the file SAGE_ROOT/pkgs/sagemath-standard/tox.ini, we can try the following command:

$ ./bootstrap && ./sage -sh -c '(cd pkgs/sagemath-standard && SAGE_NUM_THREADS=16 tox -v -v -v -e py39-sagewheels-nopypi)'

This command does not make any changes to the normal installation of Sage. The virtual environment is created in a subdirectory of SAGE_ROOT/pkgs/sagemath-standard/.tox/. After the command finishes, we can start the separate installation of the Sage library in its virtual environment:

$ pkgs/sagemath-standard/.tox/py39-sagewheels-nopypi/bin/sage

We can also run parts of the testsuite:

$ pkgs/sagemath-standard/.tox/py39-sagewheels-nopypi/bin/sage -tp 4 src/sage/graphs/

The whole .tox directory can be safely deleted at any time.

We can do the same with other distributions, for example the large distribution sagemath-standard-no-symbolics (from trac ticket #32601), which is intended to provide everything that is currently in the standard Sage library, i.e., without depending on optional packages, but without the packages sage.symbolic, sage.functions, sage.calculus, etc.

Again we can run the test with tox in a separate virtual environment:

$ ./bootstrap && ./sage -sh -c '(cd pkgs/sagemath-standard-no-symbolics && SAGE_NUM_THREADS=16 tox -v -v -v -e py39-sagewheels-nopypi)'

Some small distributions, for example the ones providing the two lowest levels, sagemath-objects and sagemath-categories (from trac ticket #29865), can be installed and tested without relying on the wheels from the Sage build:

$ ./bootstrap && ./sage -sh -c '(cd pkgs/sagemath-objects && SAGE_NUM_THREADS=16 tox -v -v -v -e py39)'

This command finds the declared build-time and run-time dependencies on PyPI, either as source tarballs or as prebuilt wheels, and builds and installs the distribution sagemath-objects in a virtual environment in a subdirectory of pkgs/sagemath-objects/.tox.

Building these small distributions serves as a valuable regression testsuite. However, a current issue with both of these distributions is that they are not separately testable: The doctests for these modules depend on a lot of other functionality from higher-level parts of the Sage library.