Huffman encoding#
This module implements functionalities relating to Huffman encoding and decoding.
AUTHOR:
Nathann Cohen (2010-05): initial version.
Classes and functions#
- class sage.coding.source_coding.huffman.Huffman(source)[source]#
Bases:
SageObject
This class implements the basic functionalities of Huffman codes.
It can build a Huffman code from a given string, or from the information of a dictionary associating to each key (the elements of the alphabet) a weight (most of the time, a probability value or a number of occurrences).
INPUT:
source
– can be eitherA string from which the Huffman encoding should be created.
A dictionary that associates to each symbol of an alphabet a numeric value. If we consider the frequency of each alphabetic symbol, then
source
is considered as the frequency table of the alphabet with each numeric (non-negative integer) value being the number of occurrences of a symbol. The numeric values can also represent weights of the symbols. In that case, the numeric values are not necessarily integers, but can be real numbers.
In order to construct a Huffman code for an alphabet, we use exactly one of the following methods:
Let
source
be a string of symbols over an alphabet and feedsource
to the constructor of this class. Based on the input string, a frequency table is constructed that contains the frequency of each unique symbol insource
. The alphabet in question is then all the unique symbols insource
. A significant implication of this is that any subsequent string that we want to encode must contain only symbols that can be found insource
.Let
source
be the frequency table of an alphabet. We can feed this table to the constructor of this class. The tablesource
can be a table of frequencies or a table of weights.
In either case, the alphabet must consist of at least two symbols.
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman, frequency_table sage: h1 = Huffman("There once was a french fry") sage: for letter, code in sorted(h1.encoding_table().items()): ....: print("'{}' : {}".format(letter, code)) ' ' : 00 'T' : 11100 'a' : 0111 'c' : 1010 'e' : 100 'f' : 1011 'h' : 1100 'n' : 1101 'o' : 11101 'r' : 010 's' : 11110 'w' : 11111 'y' : 0110
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman, frequency_table >>> h1 = Huffman("There once was a french fry") >>> for letter, code in sorted(h1.encoding_table().items()): ... print("'{}' : {}".format(letter, code)) ' ' : 00 'T' : 11100 'a' : 0111 'c' : 1010 'e' : 100 'f' : 1011 'h' : 1100 'n' : 1101 'o' : 11101 'r' : 010 's' : 11110 'w' : 11111 'y' : 0110
We can obtain the same result by “training” the Huffman code with the following table of frequency:
sage: ft = frequency_table("There once was a french fry") sage: sorted(ft.items()) [(' ', 5), ('T', 1), ('a', 2), ('c', 2), ('e', 4), ('f', 2), ('h', 2), ('n', 2), ('o', 1), ('r', 3), ('s', 1), ('w', 1), ('y', 1)] sage: h2 = Huffman(ft)
>>> from sage.all import * >>> ft = frequency_table("There once was a french fry") >>> sorted(ft.items()) [(' ', 5), ('T', 1), ('a', 2), ('c', 2), ('e', 4), ('f', 2), ('h', 2), ('n', 2), ('o', 1), ('r', 3), ('s', 1), ('w', 1), ('y', 1)] >>> h2 = Huffman(ft)
Once
h1
has been trained, and hence possesses an encoding table, it is possible to obtain the Huffman encoding of any string (possibly the same) using this code:sage: encoded = h1.encode("There once was a french fry"); encoded '11100110010001010000111011101101010000111110111111100001110010110101001101101011000010110100110'
>>> from sage.all import * >>> encoded = h1.encode("There once was a french fry"); encoded '11100110010001010000111011101101010000111110111111100001110010110101001101101011000010110100110'
We can decode the above encoded string in the following way:
sage: h1.decode(encoded) 'There once was a french fry'
>>> from sage.all import * >>> h1.decode(encoded) 'There once was a french fry'
Obviously, if we try to decode a string using a Huffman instance which has been trained on a different sample (and hence has a different encoding table), we are likely to get some random-looking string:
sage: h3 = Huffman("There once were two french fries") sage: h3.decode(encoded) ' eierhffcoeft TfewrnwrTrsc'
>>> from sage.all import * >>> h3 = Huffman("There once were two french fries") >>> h3.decode(encoded) ' eierhffcoeft TfewrnwrTrsc'
This does not look like our original string.
Instead of using frequency, we can assign weights to each alphabetic symbol:
sage: from sage.coding.source_coding.huffman import Huffman sage: T = {"a":45, "b":13, "c":12, "d":16, "e":9, "f":5} sage: H = Huffman(T) sage: L = ["deaf", "bead", "fab", "bee"] sage: E = [] sage: for e in L: ....: E.append(H.encode(e)) ....: print(E[-1]) 111110101100 10111010111 11000101 10111011101 sage: D = [] sage: for e in E: ....: D.append(H.decode(e)) ....: print(D[-1]) deaf bead fab bee sage: D == L True
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman >>> T = {"a":Integer(45), "b":Integer(13), "c":Integer(12), "d":Integer(16), "e":Integer(9), "f":Integer(5)} >>> H = Huffman(T) >>> L = ["deaf", "bead", "fab", "bee"] >>> E = [] >>> for e in L: ... E.append(H.encode(e)) ... print(E[-Integer(1)]) 111110101100 10111010111 11000101 10111011101 >>> D = [] >>> for e in E: ... D.append(H.decode(e)) ... print(D[-Integer(1)]) deaf bead fab bee >>> D == L True
- decode(string)[source]#
Decode the given string using the current encoding table.
INPUT:
string
– a string of Huffman encodings.
OUTPUT:
The Huffman decoding of
string
.
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: encoded = h.encode(str); encoded '11000011010001010101100001111101001110011101001101101111011110111001111010000101101110100000111010101000101000000010111011011000110100101001011100010011011110101011100100110001100101001001110101110101110110001000101011000111101101101111110011111101110100011' sage: h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman >>> str = "Sage is my most favorite general purpose computer algebra system" >>> h = Huffman(str) >>> encoded = h.encode(str); encoded '11000011010001010101100001111101001110011101001101101111011110111001111010000101101110100000111010101000101000000010111011011000110100101001011100010011011110101011100100110001100101001001110101110101110110001000101011000111101101101111110011111101110100011' >>> h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
- encode(string)[source]#
Encode the given string based on the current encoding table.
INPUT:
string
– a string of symbols over an alphabet.
OUTPUT:
A Huffman encoding of
string
.
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: encoded = h.encode(str); encoded '11000011010001010101100001111101001110011101001101101111011110111001111010000101101110100000111010101000101000000010111011011000110100101001011100010011011110101011100100110001100101001001110101110101110110001000101011000111101101101111110011111101110100011' sage: h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman >>> str = "Sage is my most favorite general purpose computer algebra system" >>> h = Huffman(str) >>> encoded = h.encode(str); encoded '11000011010001010101100001111101001110011101001101101111011110111001111010000101101110100000111010101000101000000010111011011000110100101001011100010011011110101011100100110001100101001001110101110101110110001000101011000111101101101111110011111101110100011' >>> h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
- encoding_table()[source]#
Returns the current encoding table.
INPUT:
None.
OUTPUT:
A dictionary associating an alphabetic symbol to a Huffman encoding.
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: T = sorted(h.encoding_table().items()) sage: for symbol, code in T: ....: print("{} {}".format(symbol, code)) 101 S 110000 a 1101 b 110001 c 110010 e 010 f 110011 g 0001 i 10000 l 10001 m 0011 n 00000 o 0110 p 0010 r 1110 s 1111 t 0111 u 10010 v 00001 y 10011
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman >>> str = "Sage is my most favorite general purpose computer algebra system" >>> h = Huffman(str) >>> T = sorted(h.encoding_table().items()) >>> for symbol, code in T: ... print("{} {}".format(symbol, code)) 101 S 110000 a 1101 b 110001 c 110010 e 010 f 110011 g 0001 i 10000 l 10001 m 0011 n 00000 o 0110 p 0010 r 1110 s 1111 t 0111 u 10010 v 00001 y 10011
- tree()[source]#
Returns the Huffman tree corresponding to the current encoding.
INPUT:
None.
OUTPUT:
The binary tree representing a Huffman code.
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: T = h.tree(); T # needs sage.graphs Digraph on 39 vertices sage: T.show(figsize=[20,20]) # needs sage.graphs sage.plot
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import Huffman >>> str = "Sage is my most favorite general purpose computer algebra system" >>> h = Huffman(str) >>> T = h.tree(); T # needs sage.graphs Digraph on 39 vertices >>> T.show(figsize=[Integer(20),Integer(20)]) # needs sage.graphs sage.plot <BLANKLINE>
- sage.coding.source_coding.huffman.frequency_table(string)[source]#
Return the frequency table corresponding to the given string.
INPUT:
string
– a string of symbols over some alphabet.
OUTPUT:
A table of frequency of each unique symbol in
string
. Ifstring
is an empty string, return an empty table.
EXAMPLES:
The frequency table of a non-empty string:
sage: from sage.coding.source_coding.huffman import frequency_table sage: str = "Stop counting my characters!" sage: T = sorted(frequency_table(str).items()) sage: for symbol, code in T: ....: print("{} {}".format(symbol, code)) 3 ! 1 S 1 a 2 c 3 e 1 g 1 h 1 i 1 m 1 n 2 o 2 p 1 r 2 s 1 t 3 u 1 y 1
>>> from sage.all import * >>> from sage.coding.source_coding.huffman import frequency_table >>> str = "Stop counting my characters!" >>> T = sorted(frequency_table(str).items()) >>> for symbol, code in T: ... print("{} {}".format(symbol, code)) 3 ! 1 S 1 a 2 c 3 e 1 g 1 h 1 i 1 m 1 n 2 o 2 p 1 r 2 s 1 t 3 u 1 y 1
The frequency of an empty string:
sage: frequency_table("") defaultdict(<... 'int'>, {})
>>> from sage.all import * >>> frequency_table("") defaultdict(<... 'int'>, {})