Python, regex, and SymPy to automate custom text conversions to LaTeX
A Jupyter Notebook that details the use of SymPy to create LaTeX formatted equations.
"""
Author: John Volk
Date: 10/10/2016
"""
from __future__ import print_function
from sympy.parsing.sympy_parser import (parse_expr, standard_transformations,
                                        implicit_multiplication, implicit_application)
import numpy as np
import sympy
import re
This post includes examples on how to:
Convert a text equation that is in a bad format for Python and SymPy
Convert a normal Python mathematical expression into a suitable form for SymPy's LaTeX printer
Use SymPy to produce LaTeX output
Create functions and data structures to make the process reusable and efficient to fit your needs
Let's start with the following string, assigned to the variable text, which represents a mathematical model but is in poor printing form:
text = """
Ln(Y) = a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""
text
However, we want this expression to look like:
$ \log{\left (Y \right )} = a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2} $
Observe the following differences between text and valid LaTeX:
Some variables and functions are concatenated, e.g. LnQ; correct LaTeX would be \log{Q}
Functions are not in proper LaTeX form (e.g. Sin = \sin, Ln = \log, ...)
Missing subscripts: a0 = a_0
Newline characters need to be removed
Some symbols need to be replaced: dtime = t
Python's symbolic math package SymPy can automate some of the transformations that we need, and SymPy has built-in LaTeX printing capabilities.
If you are not familiar with SymPy, take some time to familiarize yourself with it; its syntax takes some getting used to. The official SymPy documentation is well done and worth checking out.
First we need to convert the string (text) into valid SymPy input
Valid SymPy input includes valid Python math expressions, with added recognition of math functions such as sin, log, and sqrt. For example, the following expression can be parsed by SymPy without error:
exp = "(x + 4) * (x + sin(x**3) + log(x + 5*x) + 3*x - sqrt(y))"
sympy.expand(exp)
Converting a valid SymPy expression like exp above into LaTeX is easy:
print(sympy.latex(sympy.expand(exp)))
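As a small aside on the printer itself, the sketch below shows how sympy.latex can wrap its output in math delimiters through its mode argument. The expression and the variable names are made up for illustration and are not part of the original notebook.

```python
import sympy

x = sympy.Symbol("x")
expr = x**2 + 1  # illustrative expression, not the model from this post

# Default ("plain") mode returns bare LaTeX with no delimiters
plain = sympy.latex(expr)

# "inline" mode wraps the result in $...$ so it can be pasted into text
inline = sympy.latex(expr, mode="inline")

# "equation" mode wraps the result in an equation environment
display = sympy.latex(expr, mode="equation")

print(plain)
print(inline)
print(display)
```

The plain form is what the rest of this post prints; the other modes may be convenient when the strings go straight into a LaTeX document.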
Now back to the original text that we want to convert. We need to make some simple adjustments to turn the string into a valid SymPy expression
You have several options here; in this case I chose to use regular expressions (regex) to do basic string pattern substitutions. You will likely need to modify these operations or create alternative regexes to prepare your own text. If you do not know regex, you can probably get by with basic Python string methods instead.
## Note, I removed the LHS and the equal sign from the equation; SymPy requires special syntax for equations
## further explanation below
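For readers who prefer to avoid regex, here is a minimal sketch of the same kind of substitution using only str.replace. The shortened input string and the name converted are illustrative assumptions; the mapping mirrors the one used in this post.

```python
# Mapping from the custom notation to Python/SymPy-friendly text
symbol_map = {
    "^": "**",
    "Ln": "log ",
    "Sin": "sin ",
    "Cos": "cos ",
    "dtime": "t",
}

# Shortened, illustrative input
text = "a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime)"

# Apply each replacement in turn
converted = text
for old, new in symbol_map.items():
    converted = converted.replace(old, new)

print(converted)
```

One caveat with this approach: the replacements run one after another, so a later rule can rewrite the output of an earlier one. A single regex pass does not have that problem, because each position in the text is matched only once.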
text = """
a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""
## Make a dictionary to map our strings to standard python math or symbols as needed
symbol_map = {
    '^': '**',
    'Ln': 'log ',
    'Sin': 'sin ',
    'Cos': 'cos ',
    'dtime': 't'
}
## use the dictionary to compile a regex on the keys
## escape regex characters because ^ is one of the keys, (^ is a regex special character)
to_symbols = re.compile('|'.join(re.escape(key) for key in symbol_map.keys()))
# run through the text looking for keys (regex) and replacing them with the values from the dict
text = to_symbols.sub(lambda x: symbol_map[x.group()], text)
text
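Two details of the compiled pattern are worth a quick sketch: why the keys are escaped, and why key order can matter when one key is a prefix of another. The helper compile_map and the extra key 'd' are hypothetical and only serve to show the effect; they are not part of the original code.

```python
import re

def compile_map(symbol_map):
    """Compile one alternation over the keys, longest key first."""
    keys = sorted(symbol_map, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))

# 'd' is a made-up key that is a prefix of 'dtime'
symbol_map = {"d": "x", "dtime": "t", "^": "**"}

pattern = compile_map(symbol_map)
result = pattern.sub(lambda m: symbol_map[m.group()], "d + dtime^2")

# Without sorting, the shorter alternative wins and 'dtime' is split
naive = re.compile("d|dtime").sub(lambda m: symbol_map[m.group()], "dtime")

print(re.escape("^"))  # the caret is escaped so it is matched literally
print(result)
print(naive)
```

The mapping used in this post has no overlapping keys, so the ordering step is not needed there; it becomes relevant once the dictionary grows.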
## remove new line characters from the text
text = re.sub('\n', ' ', text)
text
## regex to replace coefficients a0, a1, ... with their equivalents with subscripts e.g. a0 = a_0
text = re.sub(r"\s+a(\d)", r"a_\1", text)
text
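The pattern above expects whitespace before each coefficient and a single digit after the letter a. If your text has other coefficient names or multi-digit indices, a word-boundary pattern is one possible alternative. The function add_subscripts below is a hypothetical sketch, not the regex used in this post.

```python
import re

def add_subscripts(text):
    """Insert an underscore between a single letter and its trailing digits."""
    # \b keeps the match to whole tokens such as a0 or b12,
    # so names like log10 are left alone
    return re.sub(r"\b([A-Za-z])(\d+)\b", r"\1_\2", text)

print(add_subscripts("a0 + a1 x + b12 y"))
print(add_subscripts("log10(x)"))
```

Unlike the original pattern, this one leaves the surrounding whitespace untouched.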
At this point text is almost ready for LaTeX...
The remaining issues are difficult to handle with string manipulation alone; SymPy's parser is perfect for the remaining conversions:
Instead of trying to figure out how to place asterisks everywhere that multiplication is implied and parentheses where function application is implied (e.g. log Q**2 should be log(Q**2)), we can use SymPy's parser, which is quite powerful.
We use implicit multiplication (self-explanatory) and implicit application for function applications that are missing parentheses; both are transformations provided by the SymPy parser. Remember that the parser still follows the mathematical order of operations (PEMDAS) when doing implicit application. The parser can handle additional cases as well, such as function exponentiation. Check the handy examples in the SymPy parsing documentation.
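To make the transformations above concrete, here is a minimal sketch that applies each one to a tiny input. The input strings and variable names are illustrative; the transformations themselves are imported from sympy.parsing.sympy_parser.

```python
import sympy
from sympy.parsing.sympy_parser import (
    parse_expr, standard_transformations,
    implicit_multiplication, implicit_application, function_exponentiation,
)

x, y, z = sympy.symbols("x y z")

# Implicit multiplication: adjacent terms are multiplied
mult = standard_transformations + (implicit_multiplication,)
product = parse_expr("3 x y", transformations=mult)

# Implicit application: missing parentheses are added after function names
appl = standard_transformations + (implicit_application,)
trig = parse_expr("cot z + csc z", transformations=appl)

# The exponent is applied before the function, so this is log(x**2)
log_sq = parse_expr("log x**2", transformations=appl)

# Function exponentiation: sin**4(x) means sin(x)**4
fexp = standard_transformations + (function_exponentiation,)
power = parse_expr("sin**4(x)", transformations=fexp)

print(product, trig, log_sq, power, sep="\n")
```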
## get the transformations we need (imported above) and place into a tuple that is required for the parser
transformations = standard_transformations + (implicit_multiplication, implicit_application, )
## parse the text by applying implicit multiplication and implicit (math function) application
expr = parse_expr(text, transformations=transformations)
expr
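One thing to keep in mind when parsing your own strings: names that already exist in SymPy's namespace are resolved to those objects rather than to new symbols. That is why pi comes out as the constant here, but it can be surprising for names such as E or I, and possibly others depending on the SymPy version. The sketch below, with made-up variable names, shows the default behavior and how the local_dict argument of parse_expr can override it.

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr

# By default, pi and E resolve to SymPy's built-in constants
two_pi = parse_expr("2*pi")
default_E = parse_expr("E")

# local_dict maps a name to the object the parser should use instead;
# here E is treated as an ordinary symbol (say, an energy)
energy = sympy.Symbol("E")
custom_E = parse_expr("E", local_dict={"E": energy})

print(two_pi, default_E, custom_E)
print(custom_E == sympy.E)
```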
We're done; just print using SymPy's LaTeX printer!
print(sympy.latex(expr))
SymPly amazing!!
$a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2} $
Now let's put this all together into a function:
## global variables for the function
symbol_map = {
    '^': '**',
    'Ln': 'log ',
    'Sin': 'sin ',
    'Cos': 'cos ',
    'dtime': 't'
}
transformations = standard_transformations + (implicit_multiplication, implicit_application, )
## the function
def translate(bad_text):
    """My custom string-to-LaTeX-ready SymPy expression translation function

    Arguments:
        bad_text (str): text in some bad format that requires string manipulation,
            including custom string modifications to math functions, symbols, and
            operators defined by the global symbol_map dictionary (for substitutions)
            and the regexes compiled herein. More advanced manipulations provided by
            SymPy are defined by the global variable `transformations`, which is
            passed to the SymPy parser.

    Returns:
        expr (sympy expression): A SymPy expression created by the SymPy expression
            parser after first doing custom string modifications to math functions,
            symbols, and operators
    """
    to_symbols = re.compile('|'.join(re.escape(key) for key in symbol_map.keys()))
    bad_text = to_symbols.sub(lambda x: symbol_map[x.group()], bad_text)
    # replace newlines with spaces (as in the step-by-step version) so that a
    # leading coefficient such as a0 is still preceded by whitespace
    bad_text = re.sub('\n', ' ', bad_text)
    text = re.sub(r"\s+a(\d)", r"a_\1", bad_text)
    expr = parse_expr(text, transformations=transformations)
    return expr
## very handy, now we just have to convert to TeX and print
print(sympy.latex(translate(text)))
What about the original text? It was an equation with a left-hand side:
Parse the LHS and RHS separately and combine them with SymPy's Eq class
text = """
Ln(Y) = a0 + a1 LnQ + a2 LnQ^2 + a3 Sin(2 pi dtime) + a4 Cos(2 pi dtime)
+ a5 dtime + a6 dtime^2"""
# split on the equal sign
t1 = text.split('=')[0]
t2 = text.split('=')[1]
## Use sympy.Eq(LHS,RHS)
LHS = translate(t1)
RHS = translate(t2)
print(sympy.latex(sympy.Eq(LHS, RHS)))
$\log{\left (Y \right )} = a_{0} + a_{1} \log{\left (Q \right )} + a_{2} \log{\left (Q^{2} \right )} + a_{3} \sin{\left (2 \pi t \right )} + a_{4} \cos{\left (2 \pi t \right )} + a_{5} t + a_{6} t^{2} $
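The split-and-combine step can also be wrapped in a small helper. The sketch below is an assumption-laden illustration: parse_equation is a hypothetical name, it uses plain parse_expr instead of the translate function from this post so that it stays self-contained, and the input equation is made up.

```python
import sympy
from sympy.parsing.sympy_parser import parse_expr

def parse_equation(text):
    """Split text on the first '=' and return a sympy.Eq of both sides."""
    lhs, sep, rhs = text.partition("=")
    if not sep:
        raise ValueError("expected an equation containing '='")
    return sympy.Eq(parse_expr(lhs.strip()), parse_expr(rhs.strip()))

x, y = sympy.symbols("x y")
eq = parse_equation("y = 2*x + 1")
print(sympy.latex(eq))
```

Using partition splits only on the first equal sign and makes the missing-equal-sign case explicit; swapping parse_expr for translate would give the behavior described in this post.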
SymPly fantastic!!!
## extract SymPy symbols from both sides of the equation
LHS_symbols = [str(x) for x in LHS.atoms(sympy.Symbol)]
RHS_symbols = [str(x) for x in RHS.atoms(sympy.Symbol)]
LHS_symbols
RHS_symbols
## remove Q and t from the RHS list because we do not want to plug values in for them
RHS_symbols.remove('Q')
RHS_symbols.remove('t')
## create a dictionary assigning a random integer (0-9) to each remaining symbol
plug_in_dict = {k: np.random.randint(10) for k in RHS_symbols}
print(plug_in_dict)
## now plug in our values and let SymPy simplify! Note, the variables we changed only appear on the RHS
RHS.subs(plug_in_dict)
print(sympy.latex(sympy.Eq(LHS, RHS.subs(plug_in_dict))))
$\log{\left (Y \right )} = 4 t^{2} + 7 t + \log{\left (Q^{2} \right )} + 5 \cos{\left (2 \pi t \right )} + 6 $
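The same plug-in idea can be sketched without going through strings, using the free_symbols attribute and fixed values so the result is reproducible. The expression, the symbol names, and the chosen values are illustrative assumptions, not the model from this post.

```python
import sympy

a, b, x = sympy.symbols("a b x")
expr = a * x + b * sympy.cos(x)  # illustrative expression

# Every free symbol except the independent variable x is a coefficient
coeffs = sorted(expr.free_symbols - {x}, key=str)

# Assign fixed values (2, 3, ...) instead of random ones
values = {sym: n for n, sym in enumerate(coeffs, start=2)}

result = expr.subs(values)
print(sympy.latex(result))
```

Working with the symbols directly avoids the round trip through str, and set difference replaces the manual removal of the variables you want to keep.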
Remarks
I hope this was useful to anyone trying to use Python to batch process strings into mathematical expressions and LaTeX. In my case I needed to process many of these types of strings that were output from a computer code that fits regression models to input data. As you can see, if you work with mathematical expressions of any kind and already know basic Python, SymPy is undoubtedly useful. If you liked this or have experimented with your own implementations of Python, regex, and/or SymPy to do cool and useful things please share in the comments below.