String Monoid Elements

AUTHORS:

David Kohel <kohel@maths.usyd.edu.au>, 2007-01

Elements of free string monoids. The internal representation is subject to change.

These are special classes of free monoid elements with distinct printing.

Unlike general FreeMonoid elements, these elements are stored without exponential compression (this is a feature). The representation could be packed into words.

class sage.monoids.string_monoid_element.StringMonoidElement(S, x, check=True)

Bases: FreeMonoidElement

Element of a free string monoid.

character_count()

Return the count of each unique character.

EXAMPLES:

Count the character frequency in an object comprised of capital letters of the English alphabet:

Sage

sage: M = AlphabeticStrings().encoding("abcabf")
sage: sorted(M.character_count().items())
[(A, 2), (B, 2), (C, 1), (F, 1)]

Python

>>> from sage.all import *
>>> M = AlphabeticStrings().encoding("abcabf")
>>> sorted(M.character_count().items())
[(A, 2), (B, 2), (C, 1), (F, 1)]

In an object comprised of binary numbers:

Sage

sage: M = BinaryStrings().encoding("abcabf")
sage: sorted(M.character_count().items())
[(0, 28), (1, 20)]

Python

>>> from sage.all import *
>>> M = BinaryStrings().encoding("abcabf")
>>> sorted(M.character_count().items())
[(0, 28), (1, 20)]

In an object comprised of octal numbers:

Sage

sage: A = OctalStrings()
sage: M = A([1, 2, 3, 2, 5, 3])
sage: sorted(M.character_count().items())
[(1, 1), (2, 2), (3, 2), (5, 1)]

Python

>>> from sage.all import *
>>> A = OctalStrings()
>>> M = A([Integer(1), Integer(2), Integer(3), Integer(2), Integer(5), Integer(3)])
>>> sorted(M.character_count().items())
[(1, 1), (2, 2), (3, 2), (5, 1)]

In an object comprised of hexadecimal numbers:

Sage

sage: A = HexadecimalStrings()
sage: M = A([1, 2, 4, 6, 2, 4, 15])
sage: sorted(M.character_count().items())
[(1, 1), (2, 2), (4, 2), (6, 1), (f, 1)]

Python

>>> from sage.all import *
>>> A = HexadecimalStrings()
>>> M = A([Integer(1), Integer(2), Integer(4), Integer(6), Integer(2), Integer(4), Integer(15)])
>>> sorted(M.character_count().items())
[(1, 1), (2, 2), (4, 2), (6, 1), (f, 1)]

In an object comprised of radix-64 characters:

Sage

sage: A = Radix64Strings()
sage: M = A([1, 2, 63, 45, 45, 10]); M
BC/ttK
sage: sorted(M.character_count().items())
[(B, 1), (C, 1), (K, 1), (t, 2), (/, 1)]

Python

>>> from sage.all import *
>>> A = Radix64Strings()
>>> M = A([Integer(1), Integer(2), Integer(63), Integer(45), Integer(45), Integer(10)]); M
BC/ttK
>>> sorted(M.character_count().items())
[(B, 1), (C, 1), (K, 1), (t, 2), (/, 1)]
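
The counts above can be reproduced outside Sage with a short plain-Python sketch. It assumes the string is already available as a Python str of alphabet symbols. The helper names character_count and to_bits are stand-ins chosen for illustration, not the Sage methods themselves, and the 8-bit, most-significant-bit-first encoding is inferred from the binary example.

```python
from collections import Counter


def character_count(symbols):
    """Count how often each distinct symbol occurs in a sequence."""
    return dict(Counter(symbols))


def to_bits(text):
    """Write each ASCII character as 8 bits, most significant bit first.

    Assumption: this mirrors the default binary encoding shown in the
    examples; it is not taken from the Sage source.
    """
    return "".join(format(ord(ch), "08b") for ch in text)


# Alphabetic case: the letters of "ABCABF".
print(sorted(character_count("ABCABF").items()))

# Binary case: the 48 bits that encode "abcabf".
print(sorted(character_count(to_bits("abcabf")).items()))
```

The second call counts 28 zeros and 20 ones, which agrees with the BinaryStrings example.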

coincidence_index(prec=0)

Return the probability of two randomly chosen characters being equal.
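
One common way to compute this probability draws two positions without replacement: if symbol \(i\) occurs \(n_i\) times in a string of length \(N\), the index is \(\sum_i n_i (n_i - 1) / (N (N - 1))\). The plain-Python sketch below follows that definition. That Sage uses exactly this formula is an assumption here, and the sketch returns an exact Fraction instead of a real number of precision prec.

```python
from collections import Counter
from fractions import Fraction


def coincidence_index(symbols):
    """Probability that two symbols drawn without replacement are equal."""
    counts = Counter(symbols)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two symbols")
    # Ordered pairs of equal symbols, divided by all ordered pairs.
    equal_pairs = sum(c * (c - 1) for c in counts.values())
    return Fraction(equal_pairs, n * (n - 1))


print(coincidence_index("ABCABF"))
```

For "ABCABF" the counts are 2, 2, 1, 1, so the index is \(4/30 = 2/15\).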

decoding(padic=False)

Return the byte string associated with a binary or hexadecimal string monoid element.

EXAMPLES:

Sage

sage: S = HexadecimalStrings()
sage: s = S.encoding("A..Za..z"); s
412e2e5a612e2e7a
sage: s.decoding()
'A..Za..z'
sage: s = S.encoding("A..Za..z",padic=True); s
14e2e2a516e2e2a7
sage: s.decoding()
'\x14\xe2\xe2\xa5\x16\xe2\xe2\xa7'
sage: s.decoding(padic=True)
'A..Za..z'
sage: S = BinaryStrings()
sage: s = S.encoding("A..Za..z"); s
0100000100101110001011100101101001100001001011100010111001111010
sage: s.decoding()
'A..Za..z'
sage: s = S.encoding("A..Za..z",padic=True); s
1000001001110100011101000101101010000110011101000111010001011110
sage: s.decoding()
'\x82ttZ\x86tt^'
sage: s.decoding(padic=True)
'A..Za..z'

Python

>>> from sage.all import *
>>> S = HexadecimalStrings()
>>> s = S.encoding("A..Za..z"); s
412e2e5a612e2e7a
>>> s.decoding()
'A..Za..z'
>>> s = S.encoding("A..Za..z",padic=True); s
14e2e2a516e2e2a7
>>> s.decoding()
'\x14\xe2\xe2\xa5\x16\xe2\xe2\xa7'
>>> s.decoding(padic=True)
'A..Za..z'
>>> S = BinaryStrings()
>>> s = S.encoding("A..Za..z"); s
0100000100101110001011100101101001100001001011100010111001111010
>>> s.decoding()
'A..Za..z'
>>> s = S.encoding("A..Za..z",padic=True); s
1000001001110100011101000101101010000110011101000111010001011110
>>> s.decoding()
'\x82ttZ\x86tt^'
>>> s.decoding(padic=True)
'A..Za..z'
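
The examples suggest that padic=True reads the digits of each byte in reversed order (least significant digit first), so that the letter A, 0x41, appears as 14 in hexadecimal and as 10000010 in binary. The plain-Python sketch below is written under that reading of the examples. The function name decode_digits and its base argument are hypothetical and not part of the Sage API.

```python
def decode_digits(digits, base, padic=False):
    """Decode a string of binary (base 2) or hexadecimal (base 16) digits.

    Assumption: each byte occupies 8 binary or 2 hexadecimal digits, and
    padic=True means the digits of each byte are stored in reverse order.
    """
    width = {2: 8, 16: 2}[base]
    if len(digits) % width:
        raise ValueError("digit count must be a multiple of %d" % width)
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    if padic:
        # Undo the least-significant-digit-first ordering within each byte.
        chunks = [chunk[::-1] for chunk in chunks]
    return "".join(chr(int(chunk, base)) for chunk in chunks)


print(decode_digits("412e2e5a612e2e7a", 16))
print(decode_digits("14e2e2a516e2e2a7", 16, padic=True))
```

Both calls print A..Za..z. Decoding the second string without padic=True yields the scrambled bytes shown in the examples, because each byte is then read with its two hex digits swapped.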

frequency_distribution(length=1, prec=0)

Return the probability space of character frequencies.

The output of this method is different from that of the method characteristic_frequency().

The characteristic frequency of an element of an alphabet \(A\) can be thought of as the expected probability of that element occurring. Let \(S\) be a string encoded using elements of \(A\). The frequency distribution of \(S\) gives, for each element of \(A\), the probability with which it is observed to occur in \(S\). Thus one distribution provides expected probabilities, while the other provides observed probabilities.

INPUT:

length – (default: 1) if length=1 then consider the probability space of monogram frequency, i.e. probability distribution of single characters. If length=2 then consider the probability space of digram frequency, i.e. probability distribution of pairs of characters. This method currently supports the generation of probability spaces for monogram frequency (length=1) and digram frequency (length=2).
prec – (default: 0) a nonnegative integer representing the precision (in number of bits) of a floating-point number. The default value prec=0 means that we use 53 bits to represent the mantissa of a floating-point number. For more information on the precision of floating-point numbers, see the function RealField() or refer to the module real_mpfr.

EXAMPLES:

Capital letters of the English alphabet:

Sage

sage: M = AlphabeticStrings().encoding("abcd")
sage: L = M.frequency_distribution().function()
sage: sorted(L.items())

[(A, 0.250000000000000),
(B, 0.250000000000000),
(C, 0.250000000000000),
(D, 0.250000000000000)]

Python

>>> from sage.all import *
>>> M = AlphabeticStrings().encoding("abcd")
>>> L = M.frequency_distribution().function()
>>> sorted(L.items())
<BLANKLINE>
[(A, 0.250000000000000),
(B, 0.250000000000000),
(C, 0.250000000000000),
(D, 0.250000000000000)]

The binary number system:

Sage

sage: M = BinaryStrings().encoding("abcd")
sage: L = M.frequency_distribution().function()
sage: sorted(L.items())
[(0, 0.593750000000000), (1, 0.406250000000000)]

Python

>>> from sage.all import *
>>> M = BinaryStrings().encoding("abcd")
>>> L = M.frequency_distribution().function()
>>> sorted(L.items())
[(0, 0.593750000000000), (1, 0.406250000000000)]

The hexadecimal number system:

Sage

sage: M = HexadecimalStrings().encoding("abcd")
sage: L = M.frequency_distribution().function()
sage: sorted(L.items())

[(1, 0.125000000000000),
(2, 0.125000000000000),
(3, 0.125000000000000),
(4, 0.125000000000000),
(6, 0.500000000000000)]

Python

>>> from sage.all import *
>>> M = HexadecimalStrings().encoding("abcd")
>>> L = M.frequency_distribution().function()
>>> sorted(L.items())
<BLANKLINE>
[(1, 0.125000000000000),
(2, 0.125000000000000),
(3, 0.125000000000000),
(4, 0.125000000000000),
(6, 0.500000000000000)]

Get the observed frequency probability distribution of digrams in the string “ABCD”. This string consists of the following digrams: “AB”, “BC”, and “CD”. Now find out the frequency probability of each of these digrams as they occur in the string “ABCD”:

Sage

sage: M = AlphabeticStrings().encoding("abcd")
sage: D = M.frequency_distribution(length=2).function()
sage: sorted(D.items())
[(AB, 0.333333333333333), (BC, 0.333333333333333), (CD, 0.333333333333333)]

Python

>>> from sage.all import *
>>> M = AlphabeticStrings().encoding("abcd")
>>> D = M.frequency_distribution(length=Integer(2)).function()
>>> sorted(D.items())
[(AB, 0.333333333333333), (BC, 0.333333333333333), (CD, 0.333333333333333)]
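
The observed frequencies in these examples can be reproduced with a plain-Python sketch that slides a window of the given length over the string and counts overlapping occurrences. The overlapping-window reading is inferred from the digram example ("AB", "BC", "CD"). The function below is an illustration only: it returns a dict of Python floats, not a Sage probability space, and it ignores prec.

```python
from collections import Counter


def frequency_distribution(symbols, length=1):
    """Observed relative frequency of each monogram or digram in symbols."""
    if length not in (1, 2):
        raise NotImplementedError("only length=1 and length=2 are sketched")
    # Overlapping windows: "ABCD" with length=2 gives AB, BC, CD.
    windows = [symbols[i:i + length]
               for i in range(len(symbols) - length + 1)]
    if not windows:
        raise ValueError("string is shorter than the requested length")
    total = len(windows)
    return {w: count / total for w, count in Counter(windows).items()}


print(sorted(frequency_distribution("ABCD").items()))
print(sorted(frequency_distribution("ABCD", length=2).items()))
```

Applied to the hexadecimal digits 61626364 (the encoding of "abcd"), the same function gives 0.5 for the digit 6 and 0.125 for each of 1, 2, 3, and 4, matching the hexadecimal example.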

sage.monoids.string_monoid_element.is_AlphabeticStringMonoidElement(x)

sage.monoids.string_monoid_element.is_BinaryStringMonoidElement(x)

sage.monoids.string_monoid_element.is_HexadecimalStringMonoidElement(x)

sage.monoids.string_monoid_element.is_OctalStringMonoidElement(x)

sage.monoids.string_monoid_element.is_Radix64StringMonoidElement(x)

sage.monoids.string_monoid_element.is_StringMonoidElement(x)