Sequences of bounded integers

This module provides BoundedIntegerSequence, which implements sequences of bounded integers and is for many (but not all) operations faster than representing the same sequence as a Python tuple.

The underlying data structure is similar to Bitset, which means that certain operations are implemented by using fast shift operations from MPIR. The following boilerplate functions can be cimported in Cython modules:

  • cdef bint biseq_init(biseq_t R, mp_size_t l, mp_size_t itemsize) except -1

    Allocate memory for a bounded integer sequence of length l with items fitting in itemsize bits.

  • cdef inline void biseq_dealloc(biseq_t S)

    Deallocate the memory used by S.

  • cdef bint biseq_init_copy(biseq_t R, biseq_t S)

    Initialize R as a copy of S.

  • cdef tuple biseq_pickle(biseq_t S)

    Return a triple (bitset_data, itembitsize, length) defining S.

  • cdef bint biseq_unpickle(biseq_t R, tuple bitset_data, mp_bitcnt_t itembitsize, mp_size_t length) except -1

    Initialise R from data returned by biseq_pickle.

  • cdef bint biseq_init_list(biseq_t R, list data, size_t bound) except -1

    Convert a list to a bounded integer sequence, which must not be allocated.

  • cdef inline Py_hash_t biseq_hash(biseq_t S)

    Hash value for S.

  • cdef inline bint biseq_richcmp(biseq_t S1, biseq_t S2, int op)

    Comparison of S1 and S2. This takes into account the bound, the length, and the list of items of the two sequences.

  • cdef bint biseq_init_concat(biseq_t R, biseq_t S1, biseq_t S2) except -1

    Concatenate S1 and S2 and write the result to R. Does not test whether the sequences have the same bound!

  • cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2)

    Is S1=S2+something? Does not check whether the sequences have the same bound!

  • cdef mp_size_t biseq_contains(biseq_t S1, biseq_t S2, mp_size_t start) except -2

    Return the position in S1 of S2 as a subsequence of S1[start:], or -1 if S2 is not a subsequence. Does not check whether the sequences have the same bound!

  • cdef mp_size_t biseq_starswith_tail(biseq_t S1, biseq_t S2, mp_size_t start) except -2:

    Return the smallest number i such that the bounded integer sequence S1 starts with the sequence S2[i:], where start <= i < S1.length, or return -1 if no such i exists.

  • cdef mp_size_t biseq_index(biseq_t S, size_t item, mp_size_t start) except -2

    Return the position in S of the item in S[start:], or -1 if S[start:] does not contain the item.

  • cdef size_t biseq_getitem(biseq_t S, mp_size_t index)

    Return S[index], without checking margins.

  • cdef size_t biseq_getitem_py(biseq_t S, mp_size_t index)

    Return S[index] as Python int or long, without checking margins.

  • cdef biseq_inititem(biseq_t S, mp_size_t index, size_t item)

    Set S[index] = item, without checking margins and assuming that S[index] has previously been zero.

  • cdef inline void biseq_clearitem(biseq_t S, mp_size_t index)

    Set S[index] = 0, without checking margins.

  • cdef bint biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop, mp_size_t step) except -1

    Initialise R with S[start:stop:step].

AUTHORS:

class sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence

Bases: object

A sequence of non-negative uniformely bounded integers.

INPUT:

  • bound – non-negative integer. When zero, a ValueError will be raised. Otherwise, the given bound is replaced by the power of two that is at least the given bound.
  • data – a list of integers.

EXAMPLES:

We showcase the similarities and differences between bounded integer sequences and lists respectively tuples.

To distinguish from tuples or lists, we use pointed brackets for the string representation of bounded integer sequences:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [2, 7, 20]); S
<2, 7, 20>

Each bounded integer sequence has a bound that is a power of two, such that all its item are less than this bound:

sage: S.bound()
32
sage: BoundedIntegerSequence(16, [2, 7, 20])
Traceback (most recent call last):
...
OverflowError: list item 20 larger than 15

Bounded integer sequences are iterable, and we see that we can recover the originally given list:

sage: L = [randint(0,31) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: list(L) == L
True

Getting items and slicing works in the same way as for lists:

sage: n = randint(0,4999)
sage: S[n] == L[n]
True
sage: m = randint(0,1000)
sage: n = randint(3000,4500)
sage: s = randint(1, 7)
sage: list(S[m:n:s]) == L[m:n:s]
True
sage: list(S[n:m:-s]) == L[n:m:-s]
True

The index() method works different for bounded integer sequences and tuples or lists. If one asks for the index of an item, the behaviour is the same. But we can also ask for the index of a sub-sequence:

sage: L.index(L[200]) == S.index(L[200])
True
sage: S.index(S[100:2000])    # random
100

Similarly, containment tests work for both items and sub-sequences:

sage: S[200] in S
True
sage: S[200:400] in S
True
sage: S[200]+S.bound() in S
False

Bounded integer sequences are immutable, and thus copies are identical. This is the same for tuples, but of course not for lists:

sage: T = tuple(S)
sage: copy(T) is T
True
sage: copy(S) is S
True
sage: copy(L) is L
False

Concatenation works in the same way for lists, tuples and bounded integer sequences:

sage: M = [randint(0,31) for i in range(5000)]
sage: T = BoundedIntegerSequence(32, M)
sage: list(S+T)==L+M
True
sage: list(T+S)==M+L
True
sage: (T+S == S+T) == (M+L == L+M)
True

However, comparison works different for lists and bounded integer sequences. Bounded integer sequences are first compared by bound, then by length, and eventually by reverse lexicographical ordering:

sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: T = BoundedIntegerSequence(51, [4,1,6,2,7,20])
sage: S < T   # compare by bound, not length
True
sage: T < S
False
sage: S.bound() < T.bound()
True
sage: len(S) > len(T)
True
sage: T = BoundedIntegerSequence(21, [0,0,0,0,0,0,0,0])
sage: S < T    # compare by length, not lexicographically
True
sage: T < S
False
sage: list(T) < list(S)
True
sage: len(T) > len(S)
True
sage: T = BoundedIntegerSequence(21, [4,1,5,2,8,20,9])
sage: T > S   # compare by reverse lexicographic ordering...
True
sage: S > T
False
sage: len(S) == len(T)
True
sage: list(S) > list(T) # direct lexicographic ordering is different
True
bound()

Return the bound of this bounded integer sequence.

All items of this sequence are non-negative integers less than the returned bound. The bound is a power of two.

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: T = BoundedIntegerSequence(51, [4,1,6,2,7,20,9])
sage: S.bound()
32
sage: T.bound()
64
index(other)

The index of a given item or sub-sequence of self

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,6,20,9,0])
sage: S.index(6)
2
sage: S.index(5)
Traceback (most recent call last):
...
ValueError: 5 is not in sequence
sage: S.index(BoundedIntegerSequence(21, [6, 2, 6]))
2
sage: S.index(BoundedIntegerSequence(21, [6, 2, 7]))
Traceback (most recent call last):
...
ValueError: not a sub-sequence

The bound of (sub-)sequences matters:

sage: S.index(BoundedIntegerSequence(51, [6, 2, 6]))
Traceback (most recent call last):
...
ValueError: not a sub-sequence
sage: S.index(0)
7
sage: S.index(S.bound())
Traceback (most recent call last):
...
ValueError: 32 is not in sequence
list()

Converts this bounded integer sequence to a list

NOTE:

A conversion to a list is also possible by iterating over the sequence.

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: S.list() == list(S) == L
True

The discussion at trac ticket #15820 explains why the following is a good test:

sage: (BoundedIntegerSequence(21, [0,0]) + BoundedIntegerSequence(21, [0,0])).list()
[0, 0, 0, 0]
maximal_overlap(other)

Returns self’s maximal trailing sub-sequence that other starts with.

Returns None if there is no overlap

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: X = BoundedIntegerSequence(21, [4,1,6,2,7,2,3])
sage: S = BoundedIntegerSequence(21, [0,0,0,0,0,0,0])
sage: T = BoundedIntegerSequence(21, [2,7,2,3,0,0,0,0,0,0,0,1])
sage: (X+S).maximal_overlap(T)
<2, 7, 2, 3, 0, 0, 0, 0, 0, 0, 0>
sage: print((X+S).maximal_overlap(BoundedIntegerSequence(21, [2,7,2,3,0,0,0,0,0,1])))
None
sage: (X+S).maximal_overlap(BoundedIntegerSequence(21, [0,0]))
<0, 0>
sage: B1 = BoundedIntegerSequence(4,[1,2,3,2,3,2,3])
sage: B2 = BoundedIntegerSequence(4,[2,3,2,3,2,3,1])
sage: B1.maximal_overlap(B2)
<2, 3, 2, 3, 2, 3>
startswith(other)

Tells whether self starts with a given bounded integer sequence

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(27, L)
sage: L0 = L[:1000]
sage: T = BoundedIntegerSequence(27, L0)
sage: S.startswith(T)
True
sage: L0[-1] += 1
sage: T = BoundedIntegerSequence(27, L0)
sage: S.startswith(T)
False
sage: L0[-1] -= 1
sage: L0[0] += 1
sage: T = BoundedIntegerSequence(27, L0)
sage: S.startswith(T)
False
sage: L0[0] -= 1

The bounds of the sequences must be compatible, or startswith() returns False:

sage: T = BoundedIntegerSequence(51, L0)
sage: S.startswith(T)
False
sage.data_structures.bounded_integer_sequences.NewBISEQ(bitset_data, itembitsize, length)

Helper function for unpickling of BoundedIntegerSequence.

EXAMPLES:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: loads(dumps(S)) == S    # indirect doctest
True