Free String Monoids¶

AUTHORS:

David Kohel <kohel@maths.usyd.edu.au>, 2007-01

Sage supports a wide range of specific free string monoids.

class sage.monoids.string_monoid.AlphabeticStringMonoid[source]¶

Bases: StringMonoid_class

The free alphabetic string monoid on generators A-Z.

EXAMPLES:

Sage

sage: S = AlphabeticStrings(); S
Free alphabetic string monoid on A-Z
sage: S.gen(0)
A
sage: S.gen(25)
Z
sage: S([ i for i in range(26) ])
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Python

>>> from sage.all import *
>>> S = AlphabeticStrings(); S
Free alphabetic string monoid on A-Z
>>> S.gen(Integer(0))
A
>>> S.gen(Integer(25))
Z
>>> S([ i for i in range(Integer(26)) ])
ABCDEFGHIJKLMNOPQRSTUVWXYZ

characteristic_frequency(table_name='beker_piper')[source]¶

Return a table of the characteristic frequency probability distribution of the English alphabet. In written English, various letters of the English alphabet occur more frequently than others. For example, the letter “E” appears more often than other vowels such as “A”, “I”, “O”, and “U”. In long works of written English such as books, the probability of a letter occurring tends to stabilize around a value. We call this value the characteristic frequency probability of the letter under consideration. When this probability is considered for each letter of the English alphabet, the resulting probabilities for all letters of this alphabet is referred to as the characteristic frequency probability distribution. Various studies report slightly different values for the characteristic frequency probability of an English letter. For instance, [Lew2000] reports that “E” has a characteristic frequency probability of 0.12702, while [BP1982] reports this value as 0.127. The concepts of characteristic frequency probability and characteristic frequency probability distribution can also be applied to non-empty alphabets other than the English alphabet.

The output of this method is different from that of the method frequency_distribution(). One can think of the characteristic frequency probability of an element in an alphabet \(A\) as the expected probability of that element occurring. Let \(S\) be a string encoded using elements of \(A\). The frequency probability distribution corresponding to \(S\) provides us with the frequency probability of each element of \(A\) as observed occurring in \(S\). Thus one distribution provides expected probabilities, while the other provides observed probabilities.

INPUT:

table_name – (default: 'beker_piper') the table of characteristic frequency probability distribution to use. The following tables are supported:
- 'beker_piper' – the table of characteristic frequency probability distribution by Beker and Piper [BP1982]. This is the default table to use.
- 'lewand' – the table of characteristic frequency probability distribution by Lewand as described on page 36 of [Lew2000].

OUTPUT:

A table of the characteristic frequency probability distribution of the English alphabet. This is a dictionary of letter/probability pairs.

EXAMPLES:

The characteristic frequency probability distribution table of Beker and Piper [BP1982]:

Sage

sage: A = AlphabeticStrings()
sage: table = A.characteristic_frequency(table_name='beker_piper')
sage: sorted(table.items())

[('A', 0.0820000000000000),
('B', 0.0150000000000000),
('C', 0.0280000000000000),
('D', 0.0430000000000000),
('E', 0.127000000000000),
('F', 0.0220000000000000),
('G', 0.0200000000000000),
('H', 0.0610000000000000),
('I', 0.0700000000000000),
('J', 0.00200000000000000),
('K', 0.00800000000000000),
('L', 0.0400000000000000),
('M', 0.0240000000000000),
('N', 0.0670000000000000),
('O', 0.0750000000000000),
('P', 0.0190000000000000),
('Q', 0.00100000000000000),
('R', 0.0600000000000000),
('S', 0.0630000000000000),
('T', 0.0910000000000000),
('U', 0.0280000000000000),
('V', 0.0100000000000000),
('W', 0.0230000000000000),
('X', 0.00100000000000000),
('Y', 0.0200000000000000),
('Z', 0.00100000000000000)]

Python

>>> from sage.all import *
>>> A = AlphabeticStrings()
>>> table = A.characteristic_frequency(table_name='beker_piper')
>>> sorted(table.items())
<BLANKLINE>
[('A', 0.0820000000000000),
('B', 0.0150000000000000),
('C', 0.0280000000000000),
('D', 0.0430000000000000),
('E', 0.127000000000000),
('F', 0.0220000000000000),
('G', 0.0200000000000000),
('H', 0.0610000000000000),
('I', 0.0700000000000000),
('J', 0.00200000000000000),
('K', 0.00800000000000000),
('L', 0.0400000000000000),
('M', 0.0240000000000000),
('N', 0.0670000000000000),
('O', 0.0750000000000000),
('P', 0.0190000000000000),
('Q', 0.00100000000000000),
('R', 0.0600000000000000),
('S', 0.0630000000000000),
('T', 0.0910000000000000),
('U', 0.0280000000000000),
('V', 0.0100000000000000),
('W', 0.0230000000000000),
('X', 0.00100000000000000),
('Y', 0.0200000000000000),
('Z', 0.00100000000000000)]

The characteristic frequency probability distribution table of Lewand [Lew2000]:

Sage

sage: table = A.characteristic_frequency(table_name='lewand')
sage: sorted(table.items())

[('A', 0.0816700000000000),
('B', 0.0149200000000000),
('C', 0.0278200000000000),
('D', 0.0425300000000000),
('E', 0.127020000000000),
('F', 0.0222800000000000),
('G', 0.0201500000000000),
('H', 0.0609400000000000),
('I', 0.0696600000000000),
('J', 0.00153000000000000),
('K', 0.00772000000000000),
('L', 0.0402500000000000),
('M', 0.0240600000000000),
('N', 0.0674900000000000),
('O', 0.0750700000000000),
('P', 0.0192900000000000),
('Q', 0.000950000000000000),
('R', 0.0598700000000000),
('S', 0.0632700000000000),
('T', 0.0905600000000000),
('U', 0.0275800000000000),
('V', 0.00978000000000000),
('W', 0.0236000000000000),
('X', 0.00150000000000000),
('Y', 0.0197400000000000),
('Z', 0.000740000000000000)]

Python

>>> from sage.all import *
>>> table = A.characteristic_frequency(table_name='lewand')
>>> sorted(table.items())
<BLANKLINE>
[('A', 0.0816700000000000),
('B', 0.0149200000000000),
('C', 0.0278200000000000),
('D', 0.0425300000000000),
('E', 0.127020000000000),
('F', 0.0222800000000000),
('G', 0.0201500000000000),
('H', 0.0609400000000000),
('I', 0.0696600000000000),
('J', 0.00153000000000000),
('K', 0.00772000000000000),
('L', 0.0402500000000000),
('M', 0.0240600000000000),
('N', 0.0674900000000000),
('O', 0.0750700000000000),
('P', 0.0192900000000000),
('Q', 0.000950000000000000),
('R', 0.0598700000000000),
('S', 0.0632700000000000),
('T', 0.0905600000000000),
('U', 0.0275800000000000),
('V', 0.00978000000000000),
('W', 0.0236000000000000),
('X', 0.00150000000000000),
('Y', 0.0197400000000000),
('Z', 0.000740000000000000)]

Illustrating the difference between characteristic_frequency() and frequency_distribution():

Sage

sage: A = AlphabeticStrings()
sage: M = A.encoding("abcd")
sage: FD = M.frequency_distribution().function()
sage: sorted(FD.items())

[(A, 0.250000000000000),
(B, 0.250000000000000),
(C, 0.250000000000000),
(D, 0.250000000000000)]
sage: CF = A.characteristic_frequency()
sage: sorted(CF.items())

[('A', 0.0820000000000000),
('B', 0.0150000000000000),
('C', 0.0280000000000000),
('D', 0.0430000000000000),
('E', 0.127000000000000),
('F', 0.0220000000000000),
('G', 0.0200000000000000),
('H', 0.0610000000000000),
('I', 0.0700000000000000),
('J', 0.00200000000000000),
('K', 0.00800000000000000),
('L', 0.0400000000000000),
('M', 0.0240000000000000),
('N', 0.0670000000000000),
('O', 0.0750000000000000),
('P', 0.0190000000000000),
('Q', 0.00100000000000000),
('R', 0.0600000000000000),
('S', 0.0630000000000000),
('T', 0.0910000000000000),
('U', 0.0280000000000000),
('V', 0.0100000000000000),
('W', 0.0230000000000000),
('X', 0.00100000000000000),
('Y', 0.0200000000000000),
('Z', 0.00100000000000000)]

Python

>>> from sage.all import *
>>> A = AlphabeticStrings()
>>> M = A.encoding("abcd")
>>> FD = M.frequency_distribution().function()
>>> sorted(FD.items())
<BLANKLINE>
[(A, 0.250000000000000),
(B, 0.250000000000000),
(C, 0.250000000000000),
(D, 0.250000000000000)]
>>> CF = A.characteristic_frequency()
>>> sorted(CF.items())
<BLANKLINE>
[('A', 0.0820000000000000),
('B', 0.0150000000000000),
('C', 0.0280000000000000),
('D', 0.0430000000000000),
('E', 0.127000000000000),
('F', 0.0220000000000000),
('G', 0.0200000000000000),
('H', 0.0610000000000000),
('I', 0.0700000000000000),
('J', 0.00200000000000000),
('K', 0.00800000000000000),
('L', 0.0400000000000000),
('M', 0.0240000000000000),
('N', 0.0670000000000000),
('O', 0.0750000000000000),
('P', 0.0190000000000000),
('Q', 0.00100000000000000),
('R', 0.0600000000000000),
('S', 0.0630000000000000),
('T', 0.0910000000000000),
('U', 0.0280000000000000),
('V', 0.0100000000000000),
('W', 0.0230000000000000),
('X', 0.00100000000000000),
('Y', 0.0200000000000000),
('Z', 0.00100000000000000)]

encoding(S)[source]¶

The encoding of the string S in the alphabetic string monoid, obtained by the monoid homomorphism

A -> A, ..., Z -> Z, a -> A, ..., z -> Z

and stripping away all other characters. It should be noted that this is a non-injective monoid homomorphism.

EXAMPLES:

Sage

sage: S = AlphabeticStrings()
sage: s = S.encoding("The cat in the hat."); s
THECATINTHEHAT
sage: s.decoding()
'THECATINTHEHAT'

Python

>>> from sage.all import *
>>> S = AlphabeticStrings()
>>> s = S.encoding("The cat in the hat."); s
THECATINTHEHAT
>>> s.decoding()
'THECATINTHEHAT'

sage.monoids.string_monoid.AlphabeticStrings[source]¶: alias of AlphabeticStringMonoid

class sage.monoids.string_monoid.BinaryStringMonoid[source]¶

Bases: StringMonoid_class

The free binary string monoid on generators \(\{ 0, 1 \}\).

encoding(S, padic=False)[source]¶

The binary encoding of the string S, as a binary string element.

The default is to keep the standard ASCII byte encoding, e.g.

A = 65 -> 01000001
B = 66 -> 01000010
.
.
.
Z = 90 -> 01001110

rather than a 2-adic representation 65 -> 10000010.

Set padic=True to reverse the bit string.

EXAMPLES:

Sage

sage: S = BinaryStrings()
sage: S.encoding('A')
01000001
sage: S.encoding('A',padic=True)
10000010
sage: S.encoding(' ',padic=True)
00000100

Python

>>> from sage.all import *
>>> S = BinaryStrings()
>>> S.encoding('A')
01000001
>>> S.encoding('A',padic=True)
10000010
>>> S.encoding(' ',padic=True)
00000100

sage.monoids.string_monoid.BinaryStrings[source]¶: alias of BinaryStringMonoid

class sage.monoids.string_monoid.HexadecimalStringMonoid[source]¶

Bases: StringMonoid_class

The free hexadecimal string monoid on generators \(\{ 0, 1, \dots, 9, a, b, c, d, e, f \}\).

encoding(S, padic=False)[source]¶

The encoding of the string S as a hexadecimal string element.

The default is to keep the standard right-to-left byte encoding, e.g.

A = '\x41' -> 41
B = '\x42' -> 42
.
.
.
Z = '\x5a' -> 5a

rather than a left-to-right representation A = 65 -> 14. Although standard (e.g., in the Python constructor ‘xhh’), this can be confusing when the string reads left-to-right.

Set padic=True to reverse the character encoding.

EXAMPLES:

Sage

sage: S = HexadecimalStrings()
sage: S.encoding('A')
41
sage: S.encoding('A',padic=True)
14
sage: S.encoding(' ',padic=False)
20
sage: S.encoding(' ',padic=True)
02

Python

>>> from sage.all import *
>>> S = HexadecimalStrings()
>>> S.encoding('A')
41
>>> S.encoding('A',padic=True)
14
>>> S.encoding(' ',padic=False)
20
>>> S.encoding(' ',padic=True)
02

sage.monoids.string_monoid.HexadecimalStrings[source]¶: alias of HexadecimalStringMonoid

class sage.monoids.string_monoid.OctalStringMonoid[source]¶

Bases: StringMonoid_class

The free octal string monoid on generators \(\{ 0, 1, \dots, 7 \}\).

sage.monoids.string_monoid.OctalStrings[source]¶: alias of OctalStringMonoid

class sage.monoids.string_monoid.Radix64StringMonoid[source]¶

Bases: StringMonoid_class

The free radix 64 string monoid on 64 generators.

sage.monoids.string_monoid.Radix64Strings[source]¶: alias of Radix64StringMonoid

class sage.monoids.string_monoid.StringMonoid_class(n, alphabet=())[source]¶

Bases: FreeMonoid

A free string monoid on \(n\) generators.

alphabet()[source]¶

gen(i=0)[source]¶

The \(i\)-th generator of the monoid.

INPUT:

i – integer (default: 0)

EXAMPLES:

Sage

sage: S = BinaryStrings()
sage: S.gen(0)
0
sage: S.gen(1)
1
sage: S.gen(2)
Traceback (most recent call last):
...
IndexError: Argument i (= 2) must be between 0 and 1.
sage: S = HexadecimalStrings()
sage: S.gen(0)
0
sage: S.gen(12)
c
sage: S.gen(16)
Traceback (most recent call last):
...
IndexError: Argument i (= 16) must be between 0 and 15.

Python

>>> from sage.all import *
>>> S = BinaryStrings()
>>> S.gen(Integer(0))
0
>>> S.gen(Integer(1))
1
>>> S.gen(Integer(2))
Traceback (most recent call last):
...
IndexError: Argument i (= 2) must be between 0 and 1.
>>> S = HexadecimalStrings()
>>> S.gen(Integer(0))
0
>>> S.gen(Integer(12))
c
>>> S.gen(Integer(16))
Traceback (most recent call last):
...
IndexError: Argument i (= 16) must be between 0 and 15.

one()[source]¶

Return the identity element of self.

EXAMPLES:

Sage

sage: b = BinaryStrings(); b
Free binary string monoid
sage: b.one() * b('1011')
1011
sage: b.one() * b('110') == b('110')
True
sage: b('10101') * b.one() == b('101011')
False

Python

>>> from sage.all import *
>>> b = BinaryStrings(); b
Free binary string monoid
>>> b.one() * b('1011')
1011
>>> b.one() * b('110') == b('110')
True
>>> b('10101') * b.one() == b('101011')
False