Monday, October 10, 2016

Python, regex, and SymPy to automate custom text conversions to LaTeX

In [1]:
# Author: John Volk
# Date: 10/10/2016
from __future__ import print_function
from sympy.parsing.sympy_parser import (parse_expr, standard_transformations,
                                        implicit_multiplication, implicit_application)
import numpy as np
import sympy
import re


This post includes examples on how to:

  • Convert a text equation that is in a bad format for Python and SymPy

  • Convert a normal Python mathematical expression into a suitable form for SymPy's LaTeX printer

  • Use SymPy to produce LaTeX output

  • Create functions and data structures to make the process reusable and efficient to fit your needs

Let's start with the following string, assigned to the variable text, which represents a mathematical model but is in poor form for printing:

In [2]:
text = """
Ln(Y) = a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""

'\nLn(Y) = a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)\n+ a5 dtime + a6 dtime^2'

However, we want this expression to look like:

$ \log{\left (Y \right )} = a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi dtime \right )} + a_{4} \cos{\left (2 \pi dtime \right )} + a_{5} dtime + a_{6} dtime^{2} $

Observe the following differences between text and valid LaTeX:

  • Some variables and functions are concatenated, e.g. LnQ; correct LaTeX would be \log{Q}

  • Functions are not in proper LaTeX form (e.g. Sin = \sin, Ln = \log, ...)

  • Missing subscripts: a0 = a_0

  • Newline characters need to be removed

  • Some symbols need to be replaced: dtime = t

Python's symbolic math package SymPy can automate some of the transformations that we need, and SymPy has built-in LaTeX printing capabilities.

If you are not familiar with SymPy, take some time to familiarize yourself with it; its syntax takes some getting used to. Check out the well-done documentation for SymPy here.

First we need to convert the string (text) into valid SymPy input.

  • Valid SymPy input includes valid Python math expressions, with added recognition of math operations. For example, the following expression can be parsed by SymPy without error:

In [3]:
exp = "(x + 4) * (x + sin(x**3) + log(x + 5*x) + 3*x - sqrt(y))" 
4*x**2 - x*sqrt(y) + x*log(x) + x*sin(x**3) + x*log(6) + 16*x - 4*sqrt(y) + 4*log(x) + 4*sin(x**3) + 4*log(6)
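
The cell above shows only the input string and the expanded result, not the call that connects them. One way to get from one to the other is sketched below. It assumes `sympy.sympify` followed by `sympy.expand`; the original notebook may have used a different call, and the name `t` is taken from the sentence that follows.

```python
import sympy

# The input string from the cell above.
exp = "(x + 4) * (x + sin(x**3) + log(x + 5*x) + 3*x - sqrt(y))"

# sympify turns the string into a SymPy expression; expand multiplies it out.
# Assumption: this is one plausible way to produce the expanded form shown above.
t = sympy.expand(sympy.sympify(exp))
print(t)
```

Because the string is already valid Python math, no custom parser transformations are needed at this stage.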

To convert a valid SymPy expression like t above into LaTeX is easy:

In [4]:
print(sympy.latex(t))
4 x^{2} - x \sqrt{y} + x \log{\left (x \right )} + x \sin{\left (x^{3} \right )} + x \log{\left (6 \right )} + 16 x - 4 \sqrt{y} + 4 \log{\left (x \right )} + 4 \sin{\left (x^{3} \right )} + 4 \log{\left (6 \right )}

Which, when rendered as LaTeX, is:

$4 x^{2} - x \sqrt{y} + x \log{\left (x \right )} + x \sin{\left (x^{3} \right )} + x \log{\left (6 \right )} + 16 x - 4 \sqrt{y} + 4 \log{\left (x \right )} + 4 \sin{\left (x^{3} \right )} + 4 \log{\left (6 \right )}$

SymPly Beautiful!!!

Now back to the original text that we want to convert. We need to make some simple adjustments to turn the string into a valid SymPy expression.

You have several options here. In this case I chose to use regular expressions (regex) to do basic string pattern substitutions. You will likely need to modify these operations or create alternative regexes to prepare your own text. If you do not know regex, you can probably get by without it, using basic Python string methods.

In [5]:
## Note, I removed the LHS and the equal sign from the equation; SymPy requires special syntax for equations
## (further explanation below)
text = """
a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""

## Make a dictionary to map our strings to standard python math or symbols as needed
symbol_map = {
              '^': '**',
              'Ln': 'log ',
              'Sin': 'sin ',
              'Cos': 'cos ',
              'dtime': 't'
}
## use the dictionary to compile a regex on the keys
## escape regex characters because ^ is one of the keys, (^ is a regex special character)
to_symbols = re.compile('|'.join(re.escape(key) for key in symbol_map.keys()))
# run through the text looking for keys (regex) and replacing them with the values from the dict
# x is the match object; x.group() is the matched key
text = to_symbols.sub(lambda x: symbol_map[x.group()], text)

'\na0 + a1 log Q + a2 log Q**2 + a3 sin (2 pi t) + a4 cos (2 pi t)\n+ a5 t + a6 t**2'
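
One caveat with a dictionary-driven alternation like this: if one key is a prefix of another, the regex tries the alternatives in the order they are joined, so the shorter key can win. The sketch below uses a hypothetical map (the `Sinh` entry is not part of this post's text) and sorts the keys longest-first to avoid that.

```python
import re

# Hypothetical map in which one key ('Sin') is a prefix of another ('Sinh').
symbol_map = {'Sin': 'sin ', 'Sinh': 'sinh ', '^': '**'}

# Sort keys longest-first so the alternation tries 'Sinh' before 'Sin'.
keys = sorted(symbol_map, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

# m.group(0) is the matched key; look up its replacement in the dictionary.
result = pattern.sub(lambda m: symbol_map[m.group(0)], "Sinh(x) + Sin(x)^2")
print(result)  # sinh (x) + sin (x)**2
```

For the map used in this post no key is a prefix of another, so the ordering does not matter there.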
In [6]:
## remove new line characters from the text 
text = re.sub('\n', ' ', text)
' a0 + a1 log Q + a2 log Q**2 + a3 sin (2 pi t) + a4 cos (2 pi t) + a5 t + a6 t**2'
In [7]:
## regex to replace coefficients a0, a1, ... with their equivalents with subscripts e.g. a0 = a_0
text = re.sub(r"\s+a(\d)", r"a_\1", text)
'a_0 +a_1 log Q +a_2 log Q**2 +a_3 sin (2 pi t) +a_4 cos (2 pi t) +a_5 t +a_6 t**2'
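
The pattern above relies on whitespace in front of each coefficient and accepts a single digit. If your text has coefficients at the very start of the string or indices above 9, a variant along these lines may be more forgiving. This is a sketch on a made-up string, not a drop-in replacement for the cell above.

```python
import re

# Made-up input with a leading coefficient and a two-digit index.
text = "a0 + a1 log Q + a12 t"

# \b anchors the match at a word boundary; \d+ accepts multi-digit indices.
with_subscripts = re.sub(r"\ba(\d+)\b", r"a_\1", text)
print(with_subscripts)  # a_0 + a_1 log Q + a_12 t
```

Unlike the `\s+` version, this keeps the surrounding whitespace intact, which makes no difference to the SymPy parser.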

At this point text is almost ready for LaTeX...

The remaining issues are difficult to handle with string manipulation alone, and SymPy's parser is perfect for these conversions.

Instead of trying to figure out how to place asterisks everywhere that multiplication is implied and parentheses where function application is implied (e.g. log Q**2 should be log(Q**2)), we can use SymPy's parser, which is quite powerful.

We use implicit multiplication (self-explanatory) and implicit application for function calls that are missing parentheses; both are transformations provided by the SymPy parser. Remember that the parser will still follow the mathematical order of operations (PEMDAS) when doing implicit application. The parser can handle additional cases as well, such as function exponentiation. Check the handy examples at the documentation link above.
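
To see the two transformations in isolation, here is a small sketch on toy strings. The inputs are made up for illustration; the imports and the `transformations` tuple mirror the ones used in this post.

```python
import sympy
from sympy.parsing.sympy_parser import (parse_expr, standard_transformations,
                                        implicit_multiplication, implicit_application)

transformations = standard_transformations + (implicit_multiplication,
                                              implicit_application)

x = sympy.Symbol("x")

# Implicit multiplication: a space between a number and a symbol means "times".
product = parse_expr("2 x", transformations=transformations)

# Implicit application: a function name followed by its argument, no parentheses.
applied = parse_expr("sin x", transformations=transformations)

# The exponent binds to x before log is applied (order of operations).
log_sq = parse_expr("log x**2", transformations=transformations)

print(product, applied, log_sq)
```

One thing to watch for: names that already exist in SymPy's namespace (such as `pi`) are resolved to the SymPy object rather than to a new symbol, so it is worth checking how your own variable names parse.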

In [8]:
## get the transformations we need (imported above) and place into a tuple that is required for the parser
transformations = standard_transformations + (implicit_multiplication, implicit_application, )
## parse the text by applying implicit multiplication and implicit (math function) application
expr = parse_expr(text, transformations=transformations)

a_0 + a_1*log(Q) + a_2*log(Q**2) + a_3*sin(2*pi*t) + a_4*cos(2*pi*t) + a_5*t + a_6*t**2

We're done; just print using SymPy's LaTeX printer!

In [9]:
print(sympy.latex(expr))
a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2}

SymPly amazing!!

$a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2} $

Now let's put this all together into a function:

In [10]:
## global variables for the function
symbol_map = {
              '^': '**',
              'Ln': 'log ',
              'Sin': 'sin ',
              'Cos': 'cos ',
              'dtime': 't'
}

transformations = standard_transformations + (implicit_multiplication, implicit_application, )
## the function
def translate(bad_text):
    """My custom string-to-LaTeX-ready SymPy expression translation function

    Arguments:
        bad_text (str): text that is in some bad format that requires string manipulation,
            including custom string modifications to math functions, symbols, and operators
            defined by the global symbol_map dictionary (for substitutions) and the regexes
            compiled herein. More advanced manipulations provided by SymPy, defined by
            the global variable `transformations`, are inputs to the SymPy parser
    Returns:
        expr (sympy expression): A SymPy expression created by the SymPy expression parser
            after first doing custom string modifications to math functions, symbols, and operators
    """
    to_symbols = re.compile('|'.join(re.escape(key) for key in symbol_map.keys()))
    ## replace each matched key with its value from symbol_map
    bad_text = to_symbols.sub(lambda x: symbol_map[x.group()], bad_text)
    bad_text = re.sub('\n', '', bad_text)
    text = re.sub(r"\s+a(\d)", r"a_\1", bad_text)
    expr = parse_expr(text, transformations=transformations)
    return expr
In [11]:
## very handy, now we just have to convert to TeX and print
a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2}

What about the original text? It was an equation with a left-hand side:

  • Parse the LHS and RHS separately and combine them with SymPy's Eq (equality) class

In [12]:
text = """
Ln(Y) = a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""

# split on the equal sign
t1 = text.split('=')[0] 
t2 = text.split('=')[1] 
In [13]:
## Use sympy.Eq(LHS,RHS)
LHS = translate(t1)
RHS = translate(t2)
print(sympy.latex(sympy.Eq(LHS, RHS)))
\log{\left (Y \right )} = a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2}

$\log{\left (Y \right )} = a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2} $
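
The split-and-combine step can also be written so that the string is split only once and only on the first equal sign. The sketch below is self-contained and uses a hypothetical, already-cleaned equation string rather than the raw text from this post; `parse_expr` is called directly in place of `translate`.

```python
import sympy
from sympy.parsing.sympy_parser import (parse_expr, standard_transformations,
                                        implicit_multiplication, implicit_application)

transformations = standard_transformations + (implicit_multiplication,
                                              implicit_application)

# Hypothetical, already-cleaned equation string (not the post's raw text).
equation = "log(Y) = a_0 + a_1 log x"

# maxsplit=1 splits only on the first '=', giving exactly two pieces.
left, right = equation.split("=", 1)

lhs = parse_expr(left.strip(), transformations=transformations)
rhs = parse_expr(right.strip(), transformations=transformations)

# Eq keeps the two sides together as a single symbolic equation.
eq = sympy.Eq(lhs, rhs)
print(sympy.latex(eq))
```

If a string has no equal sign, the tuple unpacking raises a ValueError, which is a reasonable early warning when batch processing.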

SymPly fantastic!!!

SymPy's power can now be used to modify our LaTeX expression

One quick example: let's plug in random values for the following variables:

$$\large{a_0, a_1, a_2, a_3, a_4, a_5, ~\text{and}~ a_6 }$$
In [14]:
## extract SymPy symbols from both sides of eqn
LHS_symbols = [str(x) for x in LHS.atoms(sympy.Symbol)]
RHS_symbols = [str(x) for x in RHS.atoms(sympy.Symbol)]

In [15]:
['a_0', 'Q', 'a_5', 'a_6', 'a_2', 'a_3', 'a_1', 'a_4', 't']
In [16]:
## remove Q and t from the RHS list because we do not want to plug values in for them
RHS_symbols = [s for s in RHS_symbols if s not in ('Q', 't')]
In [17]:
## create a dictionary assigning each symbol a random integer
plug_in_dict = {k: np.random.randint(10) for k in RHS_symbols }
{'a_6': 4, 'a_5': 7, 'a_4': 5, 'a_3': 0, 'a_2': 1, 'a_1': 0, 'a_0': 6}
In [18]:
## now plug in our values and let sympy simplify! Note, the variables we changed only appear on the RHS
RHS.subs(plug_in_dict)
4*t**2 + 7*t + log(Q**2) + 5*cos(2*pi*t) + 6
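
The extract, filter, and substitute steps above can be condensed into a few lines. The sketch below rebuilds the right-hand side directly from symbols so that it runs on its own, uses `free_symbols` in place of `atoms`, and plugs in the fixed values from the example output rather than new random draws. The variable names are illustrative.

```python
import sympy

# Rebuild the right-hand side directly from symbols (no parsing needed here).
Q, t = sympy.symbols("Q t")
a = [sympy.Symbol("a_%d" % i) for i in range(7)]  # a_0, a_1, ..., a_6
rhs = (a[0] + a[1] * sympy.log(Q) + a[2] * sympy.log(Q**2)
       + a[3] * sympy.sin(2 * sympy.pi * t) + a[4] * sympy.cos(2 * sympy.pi * t)
       + a[5] * t + a[6] * t**2)

# free_symbols returns Symbol objects; drop the ones we want to keep symbolic.
coefficients = sorted(rhs.free_symbols - {Q, t}, key=str)

# Fixed values copied from the example output above, instead of random draws.
values = dict(zip(coefficients, [6, 0, 1, 0, 5, 7, 4]))
result = rhs.subs(values)
print(result)
```

Working with Symbol objects instead of their string names avoids a round trip through `str`, although `subs` accepts either form as dictionary keys.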

Using our function above, let's convert and render the modified expression in TeX


a_6 = 4
a_5 = 7
a_4 = 5
a_3 = 0
a_2 = 1
a_1 = 0
a_0 = 6
In [19]:
print(sympy.latex(sympy.Eq(LHS, RHS.subs(plug_in_dict))))
\log{\left (Y \right )} = 4 t^{2} + 7 t + \log{\left (Q^{2} \right )} + 5 \cos{\left (2 \pi t \right )} + 6

$\log{\left (Y \right )} = 4 t^{2} + 7 t + \log{\left (Q^{2} \right )} + 5 \cos{\left (2 \pi t \right )} + 6 $


I hope this was useful to anyone trying to use Python to batch process strings into mathematical expressions and LaTeX. In my case, I needed to process many strings of this type that were output by a program that fits regression models to input data. As you can see, if you work with mathematical expressions of any kind and already know basic Python, SymPy is undoubtedly useful. If you liked this or have experimented with your own implementations of Python, regex, and/or SymPy to do cool and useful things, please share in the comments below.
