# The Sage Preparser

AUTHORS:

• William Stein (2006-02-19)
• William Stein (2006-03-09)
  • Fixed crash in parsing exponentials.
  • Precision of real literals now determined by digits of input (like Mathematica).
• Joe Wetherell (2006-04-14)
• Bobby Moretti (2007-01-25)
  • Added preliminary function assignment notation.
• Added strip_string_literals, containing_block utility functions. Arrr!
• Implicit multiplication (off by default).
• Factor out constants.
• Simplify preparser by making it modular and using regular expressions.
• Bug fixes, complex numbers, and binary input.

EXAMPLES:

Preparsing:

sage: preparse('2/3')
'Integer(2)/Integer(3)'
sage: preparse('2.5')
"RealNumber('2.5')"
sage: preparse('2^3')
'Integer(2)**Integer(3)'
sage: preparse('a^b')            # exponent
'a**b'
sage: preparse('a**b')
'a**b'
sage: preparse('G.0')            # generator
'G.gen(0)'
sage: preparse('a = 939393R')    # raw
'a = 939393'
sage: implicit_multiplication(True)
sage: preparse('a b c in L')     # implicit multiplication
'a*b*c in L'
sage: preparse('2e3x + 3exp(y)')
"RealNumber('2e3')*x + Integer(3)*exp(y)"


A string with escaped quotes in it (the point here is that the preparser doesn’t get confused by the internal quotes):

sage: "\"Yes,\" he said."
'"Yes," he said.'
sage: s = "\\"; s
'\\'


A hex literal:

sage: preparse('0x2e3')
'Integer(0x2e3)'
sage: 0xA
10
sage: 0xe
14


Raw and hex work correctly:

sage: type(0xa1)
<type 'sage.rings.integer.Integer'>
sage: type(0xa1r)
<... 'int'>
sage: type(0Xa1R)
<... 'int'>


In Sage, methods can also be called on integer and real literals (note that in pure Python this would be a syntax error):

sage: 16.sqrt()
4
sage: 87.factor()
3 * 29
sage: 15.10.sqrt()
3.88587184554509
sage: preparse('87.sqrt()')
'Integer(87).sqrt()'
sage: preparse('15.10.sqrt()')
"RealNumber('15.10').sqrt()"


Note that calling methods on int literals in pure Python is a syntax error, but Sage allows this for Sage integers and reals, because users frequently request it:

sage: eval('4.__add__(3)')
Traceback (most recent call last):
...
SyntaxError: invalid syntax


Symbolic functional notation:

sage: a=10; f(theta, beta) = theta + beta; b = x^2 + theta
sage: f
(theta, beta) |--> beta + theta
sage: a
10
sage: b
x^2 + theta
sage: f(theta,theta)
2*theta

sage: a = 5; f(x,y) = x*y*sqrt(a)
sage: f
(x, y) |--> sqrt(5)*x*y


This involves an =-, but should still be turned into a symbolic expression:

sage: preparse('a(x) =- 5')
'__tmp__=var("x"); a = symbolic_expression(- Integer(5)).function(x)'
sage: f(x)=-x
sage: f(10)
-10


This involves -=, which should not be turned into a symbolic expression (of course a(x) isn’t an identifier, so this will never be valid):

sage: preparse('a(x) -= 5')
'a(x) -= Integer(5)'


Raw literals:

Raw literals are not preparsed, which can be useful for efficiency. Python 2 long integers are marked with a trailing L. In the same way, Sage raw integer and floating-point literals are followed by an “r” (or “R”) for raw, meaning not preparsed.

We create a raw integer:

sage: a = 393939r
sage: a
393939
sage: type(a)
<... 'int'>


We create a raw float:

sage: z = 1.5949r
sage: z
1.5949
sage: type(z)
<... 'float'>


You can also use an upper case letter:

sage: z = 3.1415R
sage: z
3.1415
sage: type(z)
<... 'float'>


This next example illustrates how raw literals can be very useful in certain cases. We make a list of even integers up to 10000:

sage: v = [ 2*i for i in range(10000)]


This takes a noticeable fraction of a second (e.g., 0.25 seconds). After preparsing, what Python is really executing is the following:

sage: preparse('v = [ 2*i for i in range(10000)]')
'v = [ Integer(2)*i for i in range(Integer(10000))]'


If we use raw literals instead (2r and 10000r), execution is instant (0.00 seconds):

sage: v = [ 2r * i for i in range(10000r)]


Behind the scenes what happens is the following:

sage: preparse('v = [ 2r * i for i in range(10000r)]')
'v = [ 2 * i for i in range(10000)]'


Warning

The results of the above two expressions are different. The first one computes a list of Sage integers, whereas the second creates a list of Python integers. Python integers are typically much more efficient than Sage integers when they are very small; large Sage integers are much more efficient than Python integers, since they are implemented using the GMP C library.

sage.repl.preparse.containing_block(code, idx, delimiters=['()', '[]', '{}'], require_delim=True)

Find the code block given by balanced delimiters that contains the position idx.

INPUT:

• code - a string
• idx - an integer; a starting position
• delimiters - a list of strings (default: ['()', '[]', '{}']); the delimiter pairs to balance. Each entry is an opening character followed by a closing character, and no character may be both an opening and a closing delimiter.
• require_delim - a boolean (default: True); whether to raise a SyntaxError if no enclosing delimiters are found. An error is raised in any case if the delimiters are unbalanced.

OUTPUT:

• a 2-tuple (a,b) of integers such that code[a:b] is delimited by balanced delimiters, a<=idx<b, and a is maximal and b is minimal with that property. If no such pair exists, a SyntaxError is raised, unless require_delim is False.
• If require_delim is False and no such a,b can be found, then (0, len(code)) is returned.

EXAMPLES:

sage: from sage.repl.preparse import containing_block
sage: s = "factor(next_prime(L[5]+1))"
sage: s[22]
'+'
sage: start, end = containing_block(s, 22)
sage: start, end
(17, 25)
sage: s[start:end]
'(L[5]+1)'
sage: s[20]
'5'
sage: start, end = containing_block(s, 20); s[start:end]
'[5]'
sage: start, end = containing_block(s, 20, delimiters=['()']); s[start:end]
'(L[5]+1)'
sage: start, end = containing_block(s, 10); s[start:end]
'(next_prime(L[5]+1))'
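You can think of the search as two scans. The first walks left from idx to the nearest opening delimiter that has not been closed before idx. The second walks right from that opener to the closer that balances it. The following is a minimal pure-Python sketch of that idea, not Sage's implementation. find_containing_block is a hypothetical name. The sketch treats every character as code, so delimiters inside string literals would confuse it.

```python
def find_containing_block(code, idx, delimiters=("()", "[]", "{}"), require_delim=True):
    """Return (a, b) with a <= idx < b such that code[a:b] is the innermost
    block of balanced delimiters around position idx."""
    open_to_close = {pair[0]: pair[1] for pair in delimiters}  # "(" -> ")"
    close_to_open = {pair[1]: pair[0] for pair in delimiters}  # ")" -> "("

    # Scan left for the nearest opener that is not closed before idx.
    # A closer sitting exactly at idx ends the block we want, so skip it.
    pending = []  # closers passed while scanning left
    start = idx
    while start >= 0:
        ch = code[start]
        if ch in close_to_open and start != idx:
            pending.append(ch)
        elif ch in open_to_close:
            if not pending:
                break  # found the enclosing opener
            if open_to_close[ch] != pending.pop():
                raise SyntaxError("mismatched delimiters")
        start -= 1
    if start < 0:
        if require_delim:
            raise SyntaxError("no enclosing delimiters")
        return 0, len(code)

    # Scan right from that opener to the closer that balances it.
    expected = []  # closers still owed, innermost last
    for end in range(start, len(code)):
        ch = code[end]
        if ch in open_to_close:
            expected.append(open_to_close[ch])
        elif ch in close_to_open:
            if not expected or expected.pop() != ch:
                raise SyntaxError("mismatched delimiters")
            if not expected:
                return start, end + 1
    raise SyntaxError("unbalanced delimiters")


s = "factor(next_prime(L[5]+1))"
a, b = find_containing_block(s, 22)
print(a, b, s[a:b])  # 17 25 (L[5]+1)
```

Restricting delimiters to ("()",) makes the sketch skip the brackets, as in the delimiters=['()'] example above.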

sage.repl.preparse.extract_numeric_literals(code)

Pulls out numeric literals and assigns them to global variables. This eliminates the need to re-parse and create the literals, e.g., during every iteration of a loop.

INPUT:

• code - a string; a block of code

OUTPUT:

• a (string, string:string dictionary) 2-tuple; the block with literals replaced by variable names, and a mapping from each new variable name to the code that constructs its literal

EXAMPLES:

sage: from sage.repl.preparse import extract_numeric_literals
sage: code, nums = extract_numeric_literals("1.2 + 5")
sage: print(code)
_sage_const_1p2  + _sage_const_5
sage: print(nums)
{'_sage_const_1p2': "RealNumber('1.2')", '_sage_const_5': 'Integer(5)'}

sage: extract_numeric_literals("[1, 1.1, 1e1, -1e-1, 1.]")[0]
'[_sage_const_1 , _sage_const_1p1 , _sage_const_1e1 , -_sage_const_1en1 , _sage_const_1p ]'

sage: extract_numeric_literals("[1.sqrt(), 1.2.sqrt(), 1r, 1.2r, R.1, R0.1, (1..5)]")[0]
'[_sage_const_1 .sqrt(), _sage_const_1p2 .sqrt(), 1 , 1.2 , R.1, R0.1, (_sage_const_1 .._sage_const_5 )]'
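The transformation is plain string rewriting. Each literal becomes a name that is bound once, so the body of a loop refers to an existing object instead of constructing a new one on every iteration. Below is a minimal sketch under several assumptions. It handles only unsigned integer and decimal literals: no exponents, complex, hex or raw suffixes. It uses a hypothetical _const_ naming scheme modeled on the _sage_const_ names above. hoist_numeric_literals is not a Sage function.

```python
import re

# Unsigned integer or decimal literal that is not part of a name or of a
# longer dotted token (so "x2", "R.1" and "1.sqrt()" are left alone).
NUMERIC_LITERAL = re.compile(r"(?<![\w.])(\d+\.\d*|\.\d+|\d+)(?![\w.])")


def hoist_numeric_literals(code):
    """Return (code with literals replaced by names, {name: constructor})."""
    constants = {}

    def replace(match):
        literal = match.group(1)
        name = "_const_" + literal.replace(".", "p")  # 1.2 -> _const_1p2
        if "." in literal:
            constants[name] = "RealNumber(%r)" % literal
        else:
            constants[name] = "Integer(%s)" % literal
        return name

    return NUMERIC_LITERAL.sub(replace, code), constants


print(hoist_numeric_literals("1.2 + 5"))
```

A caller would turn the returned mapping into assignments such as _const_5 = Integer(5) and emit them ahead of the rewritten block.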

sage.repl.preparse.handle_encoding_declaration(contents, out)

Find a PEP 263-style Python encoding declaration in the first or second line of contents. If found, output it to out and return contents without the encoding line; otherwise output a default UTF-8 declaration and return contents.

EXAMPLES:

sage: from sage.repl.preparse import handle_encoding_declaration
sage: import sys
sage: c1='# -*- coding: latin-1 -*-\nimport os, sys\n...'
sage: c2='# -*- coding: iso-8859-15 -*-\nimport os, sys\n...'
sage: c3='# -*- coding: ascii -*-\nimport os, sys\n...'
sage: c4='import os, sys\n...'
sage: handle_encoding_declaration(c1, sys.stdout)
# -*- coding: latin-1 -*-
'import os, sys\n...'
sage: handle_encoding_declaration(c2, sys.stdout)
# -*- coding: iso-8859-15 -*-
'import os, sys\n...'
sage: handle_encoding_declaration(c3, sys.stdout)
# -*- coding: ascii -*-
'import os, sys\n...'
sage: handle_encoding_declaration(c4, sys.stdout)
# -*- coding: utf-8 -*-
'import os, sys\n...'


NOTES:

• PEP 263: http://www.python.org/dev/peps/pep-0263/
• PEP 263 says that Python will interpret a UTF-8 byte order mark as a declaration of UTF-8 encoding, but I don’t think we do that; this function only sees a Python string so it can’t account for a BOM.
• We default to UTF-8 encoding even though PEP 263 says that Python files should default to ASCII.
• Also see http://docs.python.org/ref/encodings.html.
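PEP 263 specifies the declaration as a comment matching the regular expression ^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+). Below is a minimal sketch of the behavior described above that assumes exactly that pattern. split_encoding_declaration is a hypothetical name. Python's own tokenizer checks the second line only when the first line is a comment or blank, but this sketch accepts a declaration on either of the first two lines unconditionally.

```python
import io
import re

# The declaration pattern given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def split_encoding_declaration(contents, out):
    """Write the encoding line (or a UTF-8 default) to out; return the rest."""
    lines = contents.splitlines(True)  # keep the line endings
    for i, line in enumerate(lines[:2]):  # only the first two lines count
        if CODING_RE.match(line):
            out.write(line if line.endswith("\n") else line + "\n")
            del lines[i]
            return "".join(lines)
    out.write("# -*- coding: utf-8 -*-\n")
    return contents


header = io.StringIO()
body = split_encoding_declaration("# -*- coding: latin-1 -*-\nimport os, sys\n", header)
print(repr(header.getvalue()))  # '# -*- coding: latin-1 -*-\n'
print(repr(body))               # 'import os, sys\n'
```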

AUTHORS:

sage.repl.preparse.implicit_mul(code, level=5)

Inserts *’s to make implicit multiplication explicit.

INPUT:

• code – a string; the code with missing *’s
• level – an integer (default: 5); how aggressive to be in placing *’s
  • 0 - Do nothing
  • 1 - Numeric followed by alphanumeric
  • 2 - Closing parentheses followed by alphanumeric
  • 3 - Spaces between alphanumeric
  • 10 - Adjacent parentheses (may mangle call statements)

OUTPUT:

• a string

EXAMPLES:

sage: from sage.repl.preparse import implicit_mul
sage: implicit_mul('(2x^2-4x+3)a0')
'(2*x^2-4*x+3)*a0'
sage: implicit_mul('a b c in L')
'a*b*c in L'
sage: implicit_mul('1r + 1e3 + 5exp(2)')
'1r + 1e3 + 5*exp(2)'
sage: implicit_mul('f(a)(b)', level=10)
'f(a)*(b)'
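Levels 1 to 3 can each be approximated by one regular-expression rewrite. The sketch below is a hypothetical, simplified insert_implicit_mul and should not be read as Sage's algorithm. It covers only integers followed by names, so decimals and the 2e3x case are not handled. It has no level 10. It relies on Python's keyword module to leave words such as in and if alone. It does not protect the contents of string literals.

```python
import keyword
import re

# Level 1: an integer glued to a name ("2x"). The first alternative swallows
# 0x/0o/0b literals whole; the lookahead skips exponents ("1e3") and the
# r/j/L suffixes ("1r", "2j", "5L").
NUM_THEN_NAME = re.compile(
    r"(?<![\w.])(?:0[xXoObB]\w+|(\d+)(?![eE][+-]?\d|[rRjJlL]\b)([A-Za-z_]\w*))")
# Level 2: a closing parenthesis followed by a name or number ("(x+1)y").
CLOSE_THEN_WORD = re.compile(r"\)[ \t]*([A-Za-z_]\w*|\d+)")
# Level 3: two words separated only by spaces or tabs ("a b").
WORD_SPACE_WORD = re.compile(r"\b(\w+)[ \t]+(?=([A-Za-z_]\w*|\d+))")


def insert_implicit_mul(code, level=5):
    def num_name(m):
        if m.group(1) is None:  # a 0x/0o/0b literal: keep as is
            return m.group(0)
        return m.group(1) + "*" + m.group(2)

    def close_word(m):
        if keyword.iskeyword(m.group(1)):  # e.g. "(x) if c else y"
            return m.group(0)
        return ")*" + m.group(1)

    def word_word(m):
        if keyword.iskeyword(m.group(1)) or keyword.iskeyword(m.group(2)):
            return m.group(0)  # e.g. "c in L" stays unchanged
        return m.group(1) + "*"

    if level >= 1:
        code = NUM_THEN_NAME.sub(num_name, code)
    if level >= 2:
        code = CLOSE_THEN_WORD.sub(close_word, code)
    if level >= 3:
        code = WORD_SPACE_WORD.sub(word_word, code)
    return code


print(insert_implicit_mul("(2x^2-4x+3)a0"))  # (2*x^2-4*x+3)*a0
print(insert_implicit_mul("a b c in L"))     # a*b*c in L
```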

sage.repl.preparse.implicit_multiplication(level=None)

Turns implicit multiplication on or off, optionally setting a specific level. Returns the current level if no argument is given.

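INPUT:

• level - a boolean or an integer (default: None); True turns implicit multiplication on at the default level 5, False turns it off, an integer sets that level, and None returns the current level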

EXAMPLES:

sage: implicit_multiplication(True)
sage: implicit_multiplication()
5
sage: preparse('2x')
'Integer(2)*x'
sage: implicit_multiplication(False)
sage: preparse('2x')
'2x'

sage.repl.preparse.in_quote()
sage.repl.preparse.isalphadigit_(s)

Return True if s is a non-empty string of alphabetic characters, a non-empty string of digits, or a single underscore (_).

EXAMPLES:

sage: from sage.repl.preparse import isalphadigit_
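sage: isalphadigit_('abc')
True
sage: isalphadigit_('123')
True
sage: isalphadigit_('_')
True
sage: isalphadigit_('a123')
False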
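The rule comes down to three string tests. Here is a one-function sketch with a hypothetical name. Python's str.isalpha and str.isdigit also accept non-ASCII letters and digits, and that may differ from Sage's behavior.

```python
def is_alpha_digit_or_underscore(s):
    # str.isalpha() and str.isdigit() are both False for the empty string,
    # so empty input is rejected without a separate check.
    return s.isalpha() or s.isdigit() or s == "_"


print([is_alpha_digit_or_underscore(s) for s in ("abc", "123", "_", "a123")])
```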

sage.repl.preparse.parse_ellipsis(code, preparse_step=True)

Preparses [0,2,..,n] notation.

INPUT:

• code - a string
• preparse_step - a boolean (default: True)

OUTPUT:

• a string

EXAMPLES:

sage: from sage.repl.preparse import parse_ellipsis
sage: parse_ellipsis("[1,2,..,n]")
'(ellipsis_range(1,2,Ellipsis,n))'
sage: parse_ellipsis("for i in (f(x) .. L[10]):")
'for i in (ellipsis_iter(f(x) ,Ellipsis, L[10])):'
sage: [1.0..2.0]
[1.00000000000000, 2.00000000000000]
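The rewritten code delegates the real work to ellipsis_range, which is not documented here. The following rough illustration of the semantics suggested by the examples uses a hypothetical expand_ellipsis. It builds the list for [first..last] or [first, second, .., last]. It assumes that the step is inferred from the first two terms (1 if only the endpoints are given) and that the endpoint is included when reached. The step is added repeatedly, so floating-point values can drift. This is a sketch, not Sage's implementation.

```python
def expand_ellipsis(first, last, second=None):
    """List first, first+step, ... up to and including last (if reached)."""
    step = 1 if second is None else second - first
    if step == 0:
        raise ValueError("the first two terms must differ")
    values = []
    current = first
    # Continue while we have not passed `last` in the direction of travel.
    while (current <= last) if step > 0 else (current >= last):
        values.append(current)
        current += step
    return values


print(expand_ellipsis(0, 10, second=2))  # [0, 2, 4, 6, 8, 10]
print(expand_ellipsis(1.0, 2.0))         # [1.0, 2.0]
```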

sage.repl.preparse.preparse(line, reset=True, do_time=False, ignore_prompts=False, numeric_literals=True)

Preparses a line of input.

INPUT:

• line - a string
• reset - a boolean (default: True)
• do_time - a boolean (default: False)
• ignore_prompts - a boolean (default: False)
• numeric_literals - a boolean (default: True)

OUTPUT:

• a string

EXAMPLES:

sage: preparse("ZZ.<x> = ZZ['x']")
"ZZ = ZZ['x']; (x,) = ZZ._first_ngens(1)"
sage: preparse("ZZ.<x> = ZZ['y']")
"ZZ = ZZ['y']; (x,) = ZZ._first_ngens(1)"
sage: preparse("ZZ.<x,y> = ZZ[]")
"ZZ = ZZ['x, y']; (x, y,) = ZZ._first_ngens(2)"
sage: preparse("ZZ.<x,y> = ZZ['u,v']")
"ZZ = ZZ['u,v']; (x, y,) = ZZ._first_ngens(2)"
sage: preparse("ZZ.<x> = QQ[2^(1/3)]")
'ZZ = QQ[Integer(2)**(Integer(1)/Integer(3))]; (x,) = ZZ._first_ngens(1)'
sage: QQ[2^(1/3)]
Number Field in a with defining polynomial x^3 - 2

sage: preparse("a^b")
'a**b'
sage: preparse("a^^b")
'a^b'
sage: 8^1
8
sage: 8^^1
9
sage: 9^^1
8
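The two caret spellings can be told apart by protecting ^^ before rewriting ^. Below is a minimal sketch with a hypothetical convert_carets. It operates on raw text, so unlike a preparser that strips string literals first, it would also rewrite carets inside strings. It assumes the NUL character never occurs in the input.

```python
def convert_carets(code):
    # Sage's ^^ is bitwise XOR, i.e. Python's ^; park it under a placeholder
    # (NUL, assumed absent from the source) while the single carets change.
    placeholder = "\x00"
    code = code.replace("^^", placeholder)
    # Every remaining ^ is exponentiation, i.e. Python's **.
    code = code.replace("^", "**")
    return code.replace(placeholder, "^")


print(convert_carets("a^b"), convert_carets("a^^b"))              # a**b a^b
print(eval(convert_carets("8^^1")), eval(convert_carets("9^^1")))  # 9 8
```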

sage: preparse("A \ B")
'A  * BackslashOperator() * B'
sage: preparse("A^2 \ B + C")
'A**Integer(2)  * BackslashOperator() * B + C'
sage: preparse("a \\ b \\") # There is really only one backslash here, it's just being escaped.
'a  * BackslashOperator() * b \\'
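The rewrite depends on ordinary operator precedence. A * op * B evaluates as (A * op) * B. Suppose op records its left operand in __rmul__, which Python calls when A's own __mul__ does not recognize op, and then finishes the computation in __mul__. Then op behaves like an infix operator with the precedence of *. That is why A^2 \ B + C above groups as (A^2 \ B) + C. Below is a minimal sketch of that mechanism. BackslashSketch is hypothetical. Its scalar rule (x = right / left, which solves left*x = right) stands in for whatever the real BackslashOperator dispatches to.

```python
from fractions import Fraction


class BackslashSketch:
    """Hypothetical stand-in for the object in A * BackslashOperator() * B."""

    def __rmul__(self, left):
        # Runs first: int.__mul__ returns NotImplemented for this object,
        # so Python falls back to op.__rmul__(A). Remember A and return self.
        self.left = left
        return self

    def __mul__(self, right):
        # Runs second: (A * op) * B calls op.__mul__(B).
        # For scalars, solve left * x = right exactly.
        return Fraction(right) / Fraction(self.left)


print(6 * BackslashSketch() * 3)  # 1/2
```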

sage: preparse("time R.<x> = ZZ[]", do_time=True)
'__time__=misc.cputime(); __wall__=misc.walltime(); R = ZZ[\'x\']; print("Time: CPU %.2f s, Wall: %.2f s"%(misc.cputime(__time__), misc.walltime(__wall__))); (x,) = R._first_ngens(1)'

sage.repl.preparse.preparse_calculus(code)

Supports calculus-like function assignment, e.g., transforms:

f(x,y,z) = sin(x^3 - 4*y) + y^x


into:

__tmp__=var("x,y,z")
f = symbolic_expression(sin(x**3 - 4*y) + y**x).function(x,y,z)


AUTHORS:

• Bobby Moretti
  • Initial version - 02/2007
• William Stein
  • Make variables become defined if they aren’t already defined.
• Rewrite using regular expressions (01/2009)

EXAMPLES:

sage: preparse("f(x) = x^3-x")
'__tmp__=var("x"); f = symbolic_expression(x**Integer(3)-x).function(x)'
sage: preparse("f(u,v) = u - v")
'__tmp__=var("u,v"); f = symbolic_expression(u - v).function(u,v)'
sage: preparse("f(x) =-5")
'__tmp__=var("x"); f = symbolic_expression(-Integer(5)).function(x)'
sage: preparse("f(x) -= 5")
'f(x) -= Integer(5)'
sage: preparse("f(x_1, x_2) = x_1^2 - x_2^2")
'__tmp__=var("x_1,x_2"); f = symbolic_expression(x_1**Integer(2) - x_2**Integer(2)).function(x_1,x_2)'


For simplicity, this function assumes all statements begin and end with a semicolon:

sage: from sage.repl.preparse import preparse_calculus
sage: preparse_calculus(";f(t,s)=t^2;")
';__tmp__=var("t,s"); f = symbolic_expression(t^2).function(t,s);'
sage: preparse_calculus(";f( t , s ) = t^2;")
';__tmp__=var("t,s"); f = symbolic_expression(t^2).function(t,s);'
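The same transformation can be expressed as a single regular-expression match, one statement at a time. Below is a minimal sketch using a hypothetical rewrite_function_assignment. It takes a bare statement without the surrounding semicolons. Like preparse_calculus in the examples above, it leaves t^2 and numeric literals alone, because the full preparse converts those in separate steps. The pattern requires the = to follow the closing parenthesis directly (spaces aside), so -= never matches, and a lookahead rules out ==.

```python
import re

# name(arg, ...) = body, where "=" is a plain assignment: only spaces may
# separate it from ")", so "-=" cannot match, and (?!=) rules out "==".
FUNC_ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(([^()=]*)\)\s*=(?!=)(.*)$")


def rewrite_function_assignment(line):
    m = FUNC_ASSIGN.match(line)
    if m is None:
        return line  # not a calculus-style assignment; leave untouched
    name, args, body = m.groups()
    params = ",".join(a.strip() for a in args.split(",") if a.strip())
    return '__tmp__=var("%s"); %s = symbolic_expression(%s).function(%s)' % (
        params, name, body.strip(), params)


print(rewrite_function_assignment("f( t , s ) = t^2"))
print(rewrite_function_assignment("f(x) -= 5"))  # unchanged
```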

sage.repl.preparse.preparse_file(contents, globals=None, numeric_literals=True)

Preparses input, attending to numeric literals and load/attach file directives.

Note

Temporarily, if @parallel is in the input, then numeric_literals is always set to False.

INPUT:

• contents - a string
• globals - dict or None (default: None); if given, then arguments to load/attach are evaluated in the namespace of this dict.
• numeric_literals - bool (default: True); whether to factor out the wrapping of integers and floats so that they are not created repeatedly inside loops

OUTPUT:

• a string

sage.repl.preparse.preparse_file_named(name)

Preparse the file named name (presumably a .sage file), writing the output to a temporary file. Returns the name of the temporary file.

sage.repl.preparse.preparse_file_named_to_stream(name, out)

Preparse the file named name (presumably a .sage file), writing the output to the stream out.

sage.repl.preparse.preparse_generators(code)

Parses generator syntax, converting:

obj.<gen0,gen1,...,genN> = objConstructor(...)


into:

obj = objConstructor(..., names=("gen0", "gen1", ..., "genN"))
(gen0, gen1, ..., genN,) = obj.gens()


and:

obj.<gen0,gen1,...,genN> = R[interior]


into:

obj = R[interior]; (gen0, gen1, ..., genN,) = obj.gens()


INPUT:

• code - a string

OUTPUT:

• a string

LIMITATIONS:

• The entire constructor must be on one line.
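The constructor form is a pure text rewrite on a single line. Below is a minimal sketch of that first transformation using a hypothetical rewrite_generators. It handles only the objConstructor(...) form and not obj.<...> = R[interior]. It follows the form given in this docstring (obj.gens()), whereas the preparse examples earlier in this page show _first_ngens(n) in the output. It always writes the names tuple with a trailing comma, so that a single generator still produces a tuple.

```python
import re

# obj.<g1,...,gN> = Constructor(args), with the whole statement on one line.
GEN_ASSIGN = re.compile(
    r"^(\s*)([A-Za-z_]\w*)\.<([^>]*)>\s*=\s*([A-Za-z_][\w.]*)\((.*)\)\s*$")


def rewrite_generators(line):
    m = GEN_ASSIGN.match(line)
    if m is None:
        return line  # not generator syntax (or not the constructor form)
    indent, obj, gens, ctor, args = m.groups()
    names = [g.strip() for g in gens.split(",") if g.strip()]
    quoted = ", ".join('"%s"' % n for n in names)
    # Pass the generator names as an extra keyword argument.
    call_args = (args + ", " if args.strip() else "") + "names=(%s,)" % quoted
    # The trailing comma keeps a one-element target a tuple: (x,) = ...
    targets = ", ".join(names) + ","
    return "%s%s = %s(%s); (%s) = %s.gens()" % (
        indent, obj, ctor, call_args, targets, obj)


print(rewrite_generators("R.<x,y> = PolynomialRing(QQ)"))
```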

AUTHORS:

• 2006-04-14: Joe Wetherell (jlwether@alum.mit.edu)
  • Initial version.
• 2006-04-17: William Stein
  • Improvements to allow multiple statements.
• 2006-05-01: William
  • Fix bug that Joe found.
• 2006-10-31: William
  • Fix so obj doesn’t have to be mutated.
• Rewrite using regular expressions

sage.repl.preparse.preparse_numeric_literals(code, extract=False)

This preparses numerical literals into their Sage counterparts, e.g. Integer, RealNumber, and ComplexNumber.

INPUT:

• code - a string; a code block to preparse
• extract - a boolean (default: False); whether to create names for the literals and return a dictionary of name-construction pairs

OUTPUT:

• a string or (string, string:string dictionary) 2-tuple; the preparsed block and, if extract is True, the name-construction mapping

EXAMPLES:

sage: from sage.repl.preparse import preparse_numeric_literals
sage: preparse_numeric_literals("5")
'Integer(5)'
sage: preparse_numeric_literals("5j")
"ComplexNumber(0, '5')"
sage: preparse_numeric_literals("5jr")
'5J'
sage: preparse_numeric_literals("5l")
'5l'
sage: preparse_numeric_literals("5L")
'5L'
sage: preparse_numeric_literals("1.5")
"RealNumber('1.5')"
sage: preparse_numeric_literals("1.5j")
"ComplexNumber(0, '1.5')"
sage: preparse_numeric_literals(".5j")
"ComplexNumber(0, '.5')"
sage: preparse_numeric_literals("5e9j")
"ComplexNumber(0, '5e9')"
sage: preparse_numeric_literals("5.")
"RealNumber('5.')"
sage: preparse_numeric_literals("5.j")
"ComplexNumber(0, '5.')"
sage: preparse_numeric_literals("5.foo()")
'Integer(5).foo()'
sage: preparse_numeric_literals("5.5.foo()")
"RealNumber('5.5').foo()"
sage: preparse_numeric_literals("5.5j.foo()")
"ComplexNumber(0, '5.5').foo()"
sage: preparse_numeric_literals("5j.foo()")
"ComplexNumber(0, '5').foo()"
sage: preparse_numeric_literals("1.exp()")
'Integer(1).exp()'
sage: preparse_numeric_literals("1e+10")
"RealNumber('1e+10')"
sage: preparse_numeric_literals("0x0af")
'Integer(0x0af)'
sage: preparse_numeric_literals("0x10.sqrt()")
'Integer(0x10).sqrt()'
sage: preparse_numeric_literals('0o100')
'Integer(0o100)'
sage: preparse_numeric_literals('0b111001')
'Integer(0b111001)'
sage: preparse_numeric_literals('0xe')
'Integer(0xe)'
sage: preparse_numeric_literals('0xEAR')
'0xEA'
sage: preparse_numeric_literals('0x1012Fae')
'Integer(0x1012Fae)'
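One regular expression can classify the literal forms shown above. Below is a hypothetical, simplified preparse_literals_sketch. It reproduces the examples checked below. It ignores string literals and the [a..b] ellipsis syntax. For raw complex literals such as 5jr, it keeps a lowercase j rather than producing 5J as shown above. It is an illustration, not Sage's code.

```python
import re

# One literal: hex/octal/binary, or decimal digits with optional point and
# exponent, then optional j (imaginary) and r (raw) suffixes. It must not be
# glued to a preceding name character or dot, nor followed by a name character.
NUMERIC = re.compile(
    r"(?<![\w.])"
    r"(?P<num>0[xX][0-9a-fA-F]+|0[oO][0-7]+|0[bB][01]+"
    r"|(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?)"
    r"(?P<suffix>[jJ]?[rR]?)"
    r"(?!\w)"
)


def _wrap(m):
    num, suffix = m.group("num"), m.group("suffix").lower()
    if "r" in suffix:  # raw: drop the r and leave a plain Python literal
        return num + ("j" if "j" in suffix else "")
    if "j" in suffix:  # imaginary literal
        return "ComplexNumber(0, %r)" % num
    if num[:2].lower() in ("0x", "0o", "0b"):  # prefixed: always an integer
        return "Integer(%s)" % num
    if any(ch in num for ch in ".eE"):  # decimal point or exponent
        return "RealNumber(%r)" % num
    return "Integer(%s)" % num


def preparse_literals_sketch(code):
    return NUMERIC.sub(_wrap, code)


print(preparse_literals_sketch("5.foo() + 5.5j + 0xEAR"))
```

Backtracking does the work for 5.foo(). The candidate 5. is rejected because a name character follows it, so the engine falls back to the plain integer 5, which keeps the method call intact.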

sage.repl.preparse.strip_prompts(line)

Removes leading sage: and >>> prompts so that pasting of examples from the documentation works.

INPUT:

• line - a string to process

OUTPUT:

• a string stripped of leading prompts

EXAMPLES:

sage: from sage.repl.preparse import strip_prompts
sage: strip_prompts("sage: 2 + 2")
'2 + 2'
sage: strip_prompts(">>>   3 + 2")
'3 + 2'
sage: strip_prompts("  2 + 4")
'  2 + 4'
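Here is a minimal sketch of the prompt stripping, with a hypothetical name. It recognizes a prompt only at the very start of the line and removes the whitespace that follows it. This may be stricter than Sage, which is not shown handling indented prompts here.

```python
def strip_leading_prompt(line):
    # A prompt counts only at column 0; the whitespace after it goes too.
    # Lines without a prompt, including indented ones, are returned unchanged.
    for prompt in ("sage:", ">>>"):
        if line.startswith(prompt):
            return line[len(prompt):].lstrip()
    return line


print(repr(strip_leading_prompt(">>>   3 + 2")))  # '3 + 2'
```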

sage.repl.preparse.strip_string_literals(code, state=None)

Returns the code with all string literals (and the text of comments) replaced by labels, together with a dictionary of labels for re-substitution and the current parsing state. This makes parsing easier.

INPUT:

• code - a string; the input
• state - a 2-tuple (default: None); state with which to continue processing, e.g., across multiple calls to this function

OUTPUT:

• a 3-tuple of the processed code, the dictionary of labels, and any accumulated state

EXAMPLES:

sage: from sage.repl.preparse import strip_string_literals
sage: s, literals, state = strip_string_literals(r'''['a', "b", 'c', "d\""]''')
sage: s
'[%(L1)s, %(L2)s, %(L3)s, %(L4)s]'
sage: literals
{'L1': "'a'", 'L2': '"b"', 'L3': "'c'", 'L4': '"d\\""'}
sage: print(s % literals)
['a', "b", 'c', "d\""]
sage: print(strip_string_literals(r'-"\\\""-"\\"-')[0])
-%(L1)s-%(L2)s-


Triple-quotes are handled as well:

sage: s, literals, state = strip_string_literals("[a, '''b''', c, '']")
sage: s
'[a, %(L1)s, c, %(L2)s]'
sage: print(s % literals)
[a, '''b''', c, '']


sage: s, literals, state = strip_string_literals("code '#' # ccc 't'"); s
'code %(L1)s #%(L2)s'
sage: s % literals
"code '#' # ccc 't'"


A state is returned so one can break strings across multiple calls to this function:

sage: s, literals, state = strip_string_literals('s = "some'); s
's = %(L1)s'
sage: s, literals, state = strip_string_literals('thing" * 5', state); s
'%(L1)s * 5'
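The labels are %-format placeholders, so the original code comes back from s % literals. That round trip works only if every % in the surrounding code is doubled. Below is a minimal sketch of the labeling step, under stated simplifications. label_string_literals is hypothetical. It handles single-, double- and triple-quoted strings with backslash escapes. It does not label comments, carry state across calls, or treat string prefixes such as r specially.

```python
import re

# Triple-quoted forms come first so that ''' is not read as '' followed by '.
STRING_LITERAL = re.compile(
    r"'''(?:\\.|[^\\])*?'''"
    r'|"""(?:\\.|[^\\])*?"""'
    r"|'(?:\\.|[^'\\\n])*'"
    r'|"(?:\\.|[^"\\\n])*"',
    re.DOTALL,
)


def label_string_literals(code):
    """Return (template, literals) such that template % literals == code."""
    literals = {}
    pieces = []
    pos = 0
    for m in STRING_LITERAL.finditer(code):
        # Double any % in ordinary code so the %-substitution leaves it alone.
        pieces.append(code[pos:m.start()].replace("%", "%%"))
        label = "L%d" % (len(literals) + 1)
        literals[label] = m.group(0)
        pieces.append("%(" + label + ")s")
        pos = m.end()
    pieces.append(code[pos:].replace("%", "%%"))
    return "".join(pieces), literals


s, lits = label_string_literals("x = '%d' % 5")
print(s)         # x = %(L1)s %% 5
print(s % lits)  # x = '%d' % 5
```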