Python module#

Overview#

Manipulate LaTeX files and BibTeX databases#

`texplain.TeX`(text)	Interpret TeX file to allow simple manipulations.
`texplain.bib_select`(text, keys[, reorder])	Limit a BibTeX file to a list of keys.

Indent LaTeX files#

`texplain.indent`(text[, indentation, rstrip, ...])	Indent text.

Support functions#

`texplain.environments`(text)	Return list with present environments.
`texplain.Placeholder`(placeholder, content[, ...])	Placeholder for text.
`texplain.text_to_placeholders`(text, ptypes)	Replace text with placeholders.
`texplain.text_from_placeholders`(text, ...[, ...])	Replace placeholders with original text.
`texplain.find_commented`(text)	Find comments.
`texplain.is_commented`(text)	Per character if it corresponds to commented text.
`texplain.remove_comments`(text)	Remove comments from a string.

Details#

class texplain.GeneratePlaceholder(base: str, name: str, start: int = 0)#

Class to generate a new placeholder. The following placeholder is generated every time the object is called:

-{base}-{name}-{i:d}-

For example:

>>> gen = GeneratePlaceholder(base="foo", name="bar")
>>> gen()
'-foo-bar-1-'
>>> gen()
'-foo-bar-2-'

Parameters:

base – The base of the placeholder.
name – The name of the placeholder.
start – The starting index of the placeholder.

property search_placeholder: str#: Return the regex that can be used to search for the placeholder.
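
The counter behaviour described above can be illustrated with a small stand-alone sketch. It is not texplain's implementation: the class name `SketchPlaceholderGenerator` is hypothetical, and the exact form of the regex returned by `search_placeholder` is an assumption.

```python
import re


class SketchPlaceholderGenerator:
    """Hypothetical stand-in that mimics the documented behaviour."""

    def __init__(self, base: str, name: str, start: int = 0):
        self.base = base
        self.name = name
        self.i = start  # index of the most recently issued placeholder

    def __call__(self) -> str:
        # Increment first, so that start=0 yields "-base-name-1-" on the first call.
        self.i += 1
        return f"-{self.base}-{self.name}-{self.i:d}-"

    @property
    def search_placeholder(self) -> str:
        # Assumed form of the regex: literal base and name, any integer index.
        return f"-{re.escape(self.base)}-{re.escape(self.name)}-\\d+-"


gen = SketchPlaceholderGenerator(base="foo", name="bar")
first = gen()
second = gen()
found = re.findall(gen.search_placeholder, f"x {first} y {second} z")
```

The leading and trailing dashes keep the index unambiguous: `-foo-bar-1-` can never be mistaken for a prefix of `-foo-bar-10-`.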

class texplain.Placeholder(placeholder: str, content: str, space_front: str = None, space_back: str = None, ptype: PlaceholderType = None, search_placeholder: str = None)#

Placeholder for text. This class stores the text to be replaced by a placeholder and the placeholder itself. In addition, it can store the whitespace before and after the placeholder.

Parameters:

placeholder – The placeholder to use.
content – The text replaced by the placeholder.
space_front – The whitespace before the placeholder.
space_back – The whitespace after the placeholder.
ptype – The type of placeholder, see PlaceholderType.
search_placeholder – The regex used to search for the placeholder (optional, but it greatly speeds up batch searches).

classmethod from_text(placeholder: str, text: str, start: int, end: int, ptype: PlaceholderType = None, search_placeholder: str = None)#

Replace text with placeholder. Save the content and the current whitespace before and after the placeholder. To restore the original text precisely:

placeholder, text = Placeholder.from_text(placeholder, text, start, end)
text = placeholder.to_text(text)

Parameters:

placeholder – The placeholder to use.
text – The text to consider.
start – The start index of text to be replaced by the placeholder.
end – The end index of text to be replaced by the placeholder.
ptype – The type of placeholder, see PlaceholderType.
search_placeholder – The regex used to search the placeholder.

Returns:

(Placeholder, text) where in text the placeholder is inserted.

to_text(text: str, index: int = None, keep_placeholder: bool = False) → str#

Replace placeholder with content. If the whitespace before and after the placeholder is stored, it is restored.

Parameters:

text – Text.
index – The index of the placeholder.
keep_placeholder – If True keep the placeholder, change only the whitespace.

Returns:

Text with placeholder replaced by content.
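
The round trip between `from_text` and `to_text` can be pictured with the following simplified sketch. The class `SketchPlaceholder` is hypothetical and deliberately ignores the whitespace bookkeeping (`space_front`, `space_back`) that the real class performs; it only shows the replace-and-restore idea.

```python
from dataclasses import dataclass


@dataclass
class SketchPlaceholder:
    """Hypothetical stand-in: stores a placeholder and the text it replaced."""

    placeholder: str
    content: str

    @classmethod
    def from_text(cls, placeholder: str, text: str, start: int, end: int):
        # Cut text[start:end] out of the text and put the placeholder in its place.
        obj = cls(placeholder=placeholder, content=text[start:end])
        return obj, text[:start] + placeholder + text[end:]

    def to_text(self, text: str) -> str:
        # Put the stored content back (first occurrence only).
        return text.replace(self.placeholder, self.content, 1)


original = r"Energy is $E = m c^2$ by definition."
start = original.index("$")
end = original.rindex("$") + 1
ph, replaced = SketchPlaceholder.from_text("-SKETCH-INLINE-MATH-1-", original, start, end)
restored = ph.to_text(replaced)
```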

class texplain.PlaceholderType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)#

Type of placeholder. The placeholders’ practical definition is in text_to_placeholders(). The intended use is:

line: A single line of content (no newline).
inline_comment: A comment that is preceded by some content (no newline).
comment: A comment that is the only content on that line (no newline).
tabular: Block \begin{tabular} ... \end{tabular}.
math: Block of displaymath. E.g. \begin{equation} ... \end{equation}.
inline_math: Block of inline math. E.g. $ ... $.
math_line: A single line of content in math mode (no newline).
environment: Block of environment: \begin{...} ... \end{...}.
command: Block of command: \command[...]{...}.
curly_braced: Block of curly braced content: {...}.
command_like: Block of command or curly_braced.
texindent_block: Block of % \begin{texindent} ... % \end{texindent}.
noindent_block: Block of % \begin{noindent} ... % \end{noindent}.
verbatim: Block of \begin{verbatim} ... \end{verbatim}.
let_command: Definition \let....
newif_command: Definition \newif....

Except for line, comment, inline_comment, and math_line, all placeholders can span more than one line.

command = 9#

command_like = 11#

comment = 3#

curly_braced = 10#

environment = 8#

inline_comment = 2#

inline_math = 6#

let_command = 15#

line = 1#

math = 5#

math_line = 7#

newif_command = 16#

noindent_block = 13#

tabular = 4#

texindent_block = 12#

verbatim = 14#

class texplain.TeX(text: str)#

Interpret TeX file to allow simple manipulations. The manipulations are the member functions.

Parameters:: text – LaTeX code.

change_label(old_label: str, new_label: str, overwrite: bool = False)#

Change label in \label{...} and \ref{...} (-like) commands.

Parameters:

old_label – Old label.
new_label – New label.
overwrite – Overwrite existing labels.

changed()#: Check if the document has changed.

citation_keys() → list[str]#

Read the citation keys in the TeX file (keys in \cite{...}, \citet{...}, \citep{...}).

Returns:: Unique list of keys in the order of appearance.
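
As an illustration of "unique keys in order of appearance", here is a regex-based sketch that uses only the standard library. The function `sketch_citation_keys` is hypothetical and is not how texplain parses citations; it only handles the three commands named above, with optional `[...]` arguments.

```python
import re


def sketch_citation_keys(text: str) -> list[str]:
    # Collect keys from \cite{...}, \citet{...}, \citep{...} in order of appearance.
    keys = []
    # Optional [..] arguments between the command name and the braces are skipped.
    for match in re.finditer(r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]*)\}", text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


sample = r"See \cite{a2020, b2021} and \citep[p.~3]{a2020}; also \citet{c2022}."
keys = sketch_citation_keys(sample)
```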

config_files() → list[str]#

Read configuration files in the directory of the TeX file.

Returns:: List of filenames.

environments() → list[str]#: Return list with present environments (between \begin{...} ... \end{...}).

find_by_extension(ext: str) → list[str]#

Find all files with a certain extension in the directory of the TeX file.

Parameters:: ext – File extension.
Returns:: List of filenames.

fix_quotes()#

Replace:

"..." by ``...''.
'...' by `...'.
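
The two replacements can be approximated with regular expressions. This is a naive sketch under stated assumptions (quotes open and close on the same line, apostrophes inside words are not quotes); `sketch_fix_quotes` is a hypothetical name and not texplain's implementation.

```python
import re


def sketch_fix_quotes(text: str) -> str:
    # Double quotes: "..." becomes ``...''.
    text = re.sub(r'"([^"\n]*)"', r"``\1''", text)
    # Single quotes: '...' becomes `...'. The lookbehind rejects a quote that
    # follows a letter, a backtick, or another quote, so apostrophes inside
    # words (such as in "don't") and the '' written above are left alone.
    text = re.sub(r"(?<![A-Za-z`'])'([^'\n]*)'(?![A-Za-z'])", r"`\1'", text)
    return text


fixed = sketch_fix_quotes("He said \"hello\" and 'bye'.")
untouched = sketch_fix_quotes("don't")
```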

float_filenames(cmd: str = '\\includegraphics') → list[tuple[str]]#

Extract the keys of ‘float’ commands (e.g. \includegraphics{...}, \bibliography{...}) and reconstruct their filenames. This operation is read-only.

Parameters:: cmd – The command to look for.
Returns:: A list [('key', 'filename')] in order of appearance.

format_labels(prefix: str = None)#

Format all labels as:

sec:...: Section labels.
ch:...: Chapter labels.
fig:...: Figure labels.
tab:...: Table labels.
eq:...: Math labels.
note:...: Footnote.
misc:...: Anything else.

Parameters:: prefix – Add optional prefix. E.g. key:prefix:....

classmethod from_file(filename: str)#

Read from file.

Parameters:: filename – Path to the file to read.

get()#: Return document.

labels() → list[str]#: Return list of labels (in order of appearance).

remove_commentlines()#: Remove lines that are entirely a comment.

remove_comments()#: Remove comments from the main text.

rename_float(old: str, new: str, cmd: str = '\\includegraphics')#

Rename a key of a ‘float’ command (e.g. \includegraphics{...}, \bibliography{...}). This changes the TeX file.

Parameters:

old – Old key.
new – New key.
cmd – The command to look for.

replace_command(cmd: str, replace: str, ignore_commented: bool = False)#

Replace command. For example:

Remove the command:

replace_command(r"{\TG}[1]", "")

>>> This is a \TG{I would replace this} text.
<<< This is a  text.

Select a part of the command:

replace_command(r"{\TG}[2]", "#1")

>>> This is a \TG{text}{test}.
<<< This is a text.

Change the command:

replace_command(r"{\TG}[2]", r"\mycomment{#1}{#2}")

>>> This is a \TG{text}{test}.
<<< This is a \mycomment{text}{test}.

Parameters:

cmd – The command’s definition. Given \newcommand{cmd}[args]{def}, you should specify {cmd}[args], or {cmd} (or even cmd), which defaults to {cmd}[1].
replace – The def part (curly braces around it are optional). As in LaTeX, replacement is done on #1, #2, …
ignore_commented – If True the command is not replaced if it is commented out.
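
The substitution of `#1`, `#2`, … can be sketched without the library. The helpers below are hypothetical: `sketch_replace_command` takes the command name and the number of arguments separately (unlike the `{cmd}[args]` string above), and it ignores comments, escaped braces, and optional `[...]` arguments.

```python
def read_group(text: str, i: int) -> int:
    """Return the index of the '}' closing the '{' at text[i], or -1 if there is none."""
    if i >= len(text) or text[i] != "{":
        return -1
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    return -1


def sketch_replace_command(text: str, name: str, nargs: int, replace: str) -> str:
    out = []
    pos = 0
    while (start := text.find(name, pos)) >= 0:
        i = start + len(name)
        args = []
        for _ in range(nargs):
            j = read_group(text, i)
            if j < 0:
                break
            args.append(text[i + 1 : j])  # content without the outer braces
            i = j + 1
        if len(args) == nargs:
            filled = replace
            for k, arg in enumerate(args, start=1):
                filled = filled.replace(f"#{k}", arg)
            out.append(text[pos:start] + filled)
            pos = i
        else:
            # Incomplete match: keep the text as-is and search on after the name.
            out.append(text[pos : start + len(name)])
            pos = start + len(name)
    out.append(text[pos:])
    return "".join(out)


removed = sketch_replace_command(r"This is a \TG{I would replace this} text.", r"\TG", 1, "")
changed = sketch_replace_command(r"This is a \TG{text}{test}.", r"\TG", 2, r"\mycomment{#1}{#2}")
```

Note the raw strings: without the `r` prefix, Python would try to interpret sequences such as `\m` as escape sequences.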

use_cleveref()#

Replace:

Eq.~\eqref{...}
Fig.~\ref{...}
...

By:

\cref{...}

everywhere.
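
A rough picture of this replacement, as a regex sketch. The function `sketch_use_cleveref` and its list of prefixes (`Eq.`, `Fig.`, `Tab.`, `Sec.`) are assumptions made for illustration; the set of patterns that texplain actually recognises may differ.

```python
import re


def sketch_use_cleveref(text: str) -> str:
    # Prefixes such as "Eq." or "Fig.", an optional "~" or space, then
    # \ref{...} or \eqref{...}. The list of prefixes is an assumption.
    pattern = r"\b(?:Eqs?|Figs?|Tabs?|Secs?)\.[~ ]?\\(?:eq)?ref\{([^}]*)\}"
    return re.sub(pattern, lambda m: "\\cref{" + m.group(1) + "}", text)


converted = sketch_use_cleveref(r"See Eq.~\eqref{eq:a} and Fig.~\ref{fig:b}.")
```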

texplain.bib_select(text: str, keys: list[str], reorder: bool = False) → str#

Limit a BibTeX file to a list of keys.

Parameters:

text – The BibTeX file as string.
keys – The list of keys to select.
reorder – Reorder the entries in the bib-file to match the order of keys.

Returns:

The (reduced) BibTeX file, as string.
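
The effect of `keys` and `reorder` can be illustrated with a naive stand-alone sketch. `sketch_bib_select` is hypothetical and not texplain's parser: it assumes that every entry starts with `@type{key,` at the beginning of a line.

```python
import re


def sketch_bib_select(text: str, keys: list[str], reorder: bool = False) -> str:
    # Split the file into entries, each starting at "@type{" at the start of a line.
    starts = [m.start() for m in re.finditer(r"^@\w+\s*\{", text, flags=re.MULTILINE)]
    entries = {}
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        entry = text[begin:end].strip()
        match = re.match(r"@\w+\s*\{\s*([^,\s]+)", entry)
        if match:
            entries[match.group(1)] = entry
    if reorder:
        # Follow the order of 'keys'.
        selected = [entries[k] for k in keys if k in entries]
    else:
        # Keep the order of the file.
        selected = [entry for k, entry in entries.items() if k in keys]
    return "\n\n".join(selected) + "\n"


bib = """@article{a2020,
  title = {First},
}

@book{b2021,
  title = {Second},
}

@misc{c2022,
  title = {Third},
}
"""
kept = sketch_bib_select(bib, ["c2022", "a2020"])
ordered = sketch_bib_select(bib, ["c2022", "a2020"], reorder=True)
```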

texplain.environments(text: str) → list[str]#: Return list with present environments. This corresponds to the text between \begin{...} and \end{...}.

texplain.find_command(text: str, name: str = None, regex: str = '(?<!\\\\)(\\\\)([a-zA-Z\\@]+)(\\*?)', is_comment: list[bool] = None) → list[list[tuple[int]]]#

Find indices of commands, and their options, and arguments. The following pattern is searched for:

Backslash
Word
Any number of matching [] and {} (in any order).

Parameters:

text – Text.
name – Name of command without backslash (e.g. "textbf").
regex – Regex used to match the command name.
is_comment – Per character of text, True if the character is part of a comment. Default: search for comments using is_commented().

Returns:

List of indices of commands and their arguments: [[(name_start, name_end), (arg1_start, arg1_end), ...], ...] Note the definition is such that one can extract the j-th component of the i-th command as follows: text[cmd[i][j][0]:cmd[i][j][1]].

texplain.find_commented(text: str) → list[list[int]]#

Find comments.

The output is such that one can extract the text of each comment as follows:

for i, j in find_commented(text):
    print(text[i : j]) # i is the index of "%"

Parameters:: text – Text.
Returns:: List of indices of the beginning and end of the comments.
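
To make the index convention concrete, here is a regex sketch that returns the same kind of `[start, end]` pairs. `sketch_find_commented` is hypothetical; it assumes that a comment runs from an unescaped `%` up to, but not including, the newline, which may differ in detail from texplain.

```python
import re


def sketch_find_commented(text: str) -> list[list[int]]:
    # A comment starts at a "%" that is not escaped as "\%" and runs to the end of the line.
    # Naive: "\\%" (an escaped backslash followed by a comment) is not recognised.
    return [[m.start(), m.end()] for m in re.finditer(r"(?<!\\)%[^\n]*", text)]


sample = "a = 10\\% of b % first comment\n% second comment\nno comment here"
spans = sketch_find_commented(sample)
comments = [sample[i:j] for i, j in spans]
```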

texplain.find_matching(text: str, opening: str, closing: str, ignore_escaped: bool = True, ignore_commented: bool = False, escape: bool = True, opening_match: int = 0, closing_match: int = 0, return_array: bool = False) → dict#

Find matching ‘brackets’.

Parameters:

text – The string to consider.
opening – The opening bracket (e.g. “(”, “[”, “{”).
closing – The closing bracket (e.g. “)”, “]”, “}”).
ignore_escaped – Ignore escaped brackets (e.g. “\(”, “\[”, “\{”, “\)”, “\]”, “\}”).
ignore_commented – Ignore any text that is commented (e.g. “% …”).
escape – If True, opening and closing are escaped.
opening_match – Select index of begin (0) or end (1) of opening bracket match.
closing_match – Select index of begin (0) or end (1) of closing bracket match.
return_array – If True, return NumPy-array of indices instead of dictionary.

Returns:

Dictionary with {index_opening: index_closing}

Find matching ‘brackets’, based on a list of indices corresponding to opening and closing ‘brackets’.

Parameters:

opening – Indices of the opening brackets.
closing – Indices of the closing brackets.
return_array – If True, return NumPy-array of indices instead of dictionary.

Returns:

Dictionary with {index_opening: index_closing}
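
Both descriptions above (matching in a text, and matching from lists of indices) come down to pairing each closing bracket with the most recent unmatched opening bracket. A minimal stack-based sketch follows; the function names are hypothetical, only single-character brackets are handled, and comments are not considered.

```python
def sketch_match_indices(opening: list[int], closing: list[int]) -> dict[int, int]:
    """Pair each closing index with the most recent unmatched opening index."""
    events = sorted([(i, True) for i in opening] + [(i, False) for i in closing])
    stack, matches = [], {}
    for index, is_opening in events:
        if is_opening:
            stack.append(index)
        elif stack:
            matches[stack.pop()] = index
    return dict(sorted(matches.items()))


def sketch_find_matching(text: str, opening: str = "{", closing: str = "}") -> dict[int, int]:
    """Locate unescaped single-character brackets in text and pair them."""

    def unescaped(char: str) -> list[int]:
        return [i for i, c in enumerate(text) if c == char and (i == 0 or text[i - 1] != "\\")]

    return sketch_match_indices(unescaped(opening), unescaped(closing))


pairs = sketch_find_matching(r"\textbf{a {b} \{ c}")
direct = sketch_match_indices([0, 2], [3, 5])
```

In the first call the escaped `\{` is skipped, so only the two real brace pairs are reported.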

texplain.indent(text: str, indentation: str = ' ', rstrip: bool = True, lstrip: bool = True, squashlines: bool = True, squashspaces: bool = True, symbols: bool = True, environment: bool = True, argument: bool = True, inlinemath: bool = True, linebreak: bool = True, itemize: bool = True, sentence: bool = True, alignment: bool = True, texindent: bool = True, noindent: bool = True) → str#

Indent text.

Parameters:

text – The text to indent.
indentation –
Set indentation of lines between:
- \begin{...}[...]{...} and \end{...}.
- \[ and \].
- { and }.
- [ and ] (as command option).
Comment lines follow indentation. Requires: lstrip, inlinemath, environment. To switch off indentation, set indentation="".
rstrip – Remove trailing spaces on all lines.
lstrip – Remove all leading spaces before applying indentation.
squashlines – Reduce the maximum number of consecutive blank lines to 2.
squashspaces – Reduce the maximum number of consecutive spaces to 1.
symbols – In math-mode: all symbols are separated by a space.
environment – \begin{...}[...]{...} and \end{...} (and \[ and \]) are placed on separate lines.

argument –

Any option or argument that spans more than one line is placed on separate lines. For example:

xxx { This is a very long argument
that is more than one line long. } yyy

is formatted to:

xxx {
    This is a very long argument
    that is more than one line long.
} yyy

inlinemath – Inline math is placed on one line.
linebreak – \\ is followed by a newline.
itemize – Each \item is placed on a separate line.
sentence –
One sentence per line. Every sentence should start on a new line, and it should be (as much as possible) on a single line. The following rules of thumb are followed:
- A sentence ends with:
  - A period, question mark, or exclamation mark.
  - \begin{...} or \end{...}.
  - Two blank lines.
  - \\
  - The end of an argument (} or ]), see below.
  - A command on the next line.
- Commands and inline math are treated as a single word. Formatting is applied on the arguments of commands.
Requires: rstrip, lstrip, squashspaces.
alignment –
- If the resulting line is less than 100 characters long, columns in tabular environments are aligned at &, and \\ are aligned as well.
- In other cases single spaces are placed around & and before \\.
Requires: environment.
texindent –
Custom formatting in blocks:
```
% \begin{texindent}{...}
...
% \end{texindent}
```
where the {...} argument is a comma-separated list of options of this function; for example:
```
% \begin{texindent}{sentence=False, inlinemath=False}
...
% \end{texindent}
```
noindent –
Verbatim environments and everything between
```
% \begin{noindent}
...
% \end{noindent}
```
is not formatted.

Returns:

The indented text.
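
Three of the simpler options (`rstrip`, `squashspaces`, `squashlines`) can be pictured with plain regular expressions. The function `sketch_whitespace_cleanup` is hypothetical and only approximates what those options are documented to do; it does not reproduce the other formatting rules of `indent`.

```python
import re


def sketch_whitespace_cleanup(text: str) -> str:
    # rstrip: remove trailing spaces and tabs on all lines.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # squashspaces: at most one consecutive space (leading indentation is left alone here).
    text = re.sub(r"(?<=\S) {2,}", " ", text)
    # squashlines: at most two consecutive blank lines, i.e. three newlines in a row.
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    return text


messy = "First   sentence.   \n\n\n\n\nSecond  sentence.\n"
clean = sketch_whitespace_cleanup(messy)
```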

texplain.is_commented(text: str) → ndarray[Any, dtype[bool_]]#

Per character if it corresponds to commented text.

Parameters:: text – Text.
Returns:: Array of booleans of size len(text).

texplain.remove_comments(text: str) → str#

Remove comments from a string.

Parameters:: text – Text.
Returns:: Text without comments.

texplain.texcleanup(args: list[str])#: Command-line tool to copy to a clean output directory, see --help.

texplain.texindent_cli(args: list[str])#: Indent TeX file, see --help.

texplain.texplain(args: list[str])#: Command-line tool to copy to a clean output directory, see --help.

texplain.text_from_placeholders(text: str, placeholders: list[Placeholder], keep_placeholders: bool = False) → str#

Replace placeholders with original text. The whitespace before and after the placeholder is modified to match Placeholder.space_front and Placeholder.space_back.

Parameters:

text – Text with placeholders.
placeholders – List of placeholders.
keep_placeholders – If True, the placeholders are kept (they are merely positioned).

Returns:

Text with content of the placeholders.

texplain.text_to_placeholders(text: str, ptypes: list[PlaceholderType], base: str = 'TEXINDENT', placeholders_comments: list[Placeholder] = None) → tuple[str, list[Placeholder]]#

Replace text with placeholders. The following placeholders are supported:

PlaceholderType.noindent_block:

% \begin{noindent}
...
% \end{noindent}

is replaced with

-BASE-NOINDENT-1-

PlaceholderType.texindent_block:

% \begin{texindent}
...
% \end{texindent}

is replaced with

-BASE-TEXINDENT-1-

PlaceholderType.verbatim:

\begin{verbatim}
...
\end{verbatim}

is replaced with

-BASE-VERBATIM-1-

PlaceholderType.comment:

A comment on a line that contains no other text.
```
% ...
```
is replaced with
```
-BASE-COMMENT-1-
```
PlaceholderType.inline_comment:

A comment following some other text on the same line.
```
xxx % ...
```
is replaced with
```
xxx -BASE-INLINE-COMMENT-1-
```
PlaceholderType.inline_math:
```
$...$
```
is replaced with
```
-BASE-INLINE-MATH-1-
```
Also looks for \(...\) and \begin{math}...\end{math}.
PlaceholderType.math:
```
\begin{equation}
...
\end{equation}
```
is replaced with
```
-BASE-MATH-1-
```
Also looks for \[...\], \begin{equation*}...\end{equation*}, \begin{align}...\end{align}, and \begin{align*}...\end{align*}.

PlaceholderType.math_line:

A line of display mode math (see PlaceholderType.math)

\begin{equation}
...
...
\end{equation}

is replaced with

\begin{equation}
-BASE-MATH-LINE-1-
-BASE-MATH-LINE-2-
\end{equation}

PlaceholderType.environment:

\begin{...}
...
\end{...}

is replaced with

-BASE-ENVIRONMENT-1-

PlaceholderType.tabular:

\begin{tabular}
...
\end{tabular}

is replaced with

-BASE-TABULAR-1-

PlaceholderType.command:
```
\foo[...]{...}
```
is replaced with
```
-BASE-COMMAND-1-
```

PlaceholderType.command_like:

{...}
\foo[...]{...}
{\foo[...]{...}}

is replaced with

-BASE-COMMAND-1-
-BASE-COMMAND-2-
-BASE-COMMAND-3-

PlaceholderType.curly_braced:
```
{...}
```
is replaced with
```
-BASE-CURLY-BRACED-1-
```
PlaceholderType.let_command:
```
\let\iffoo
```
is replaced with
```
-BASE-LET-1-
```
PlaceholderType.newif_command:
```
\newif\iffoo
```
is replaced with
```
-BASE-NEWIF-1-
```

Parameters:

text – Text.
ptypes – List of placeholder types to replace.
base – Base string for placeholders.
placeholders_comments – List of placeholders that are comments (needed to search commands).

Returns:

(text, placeholders) with

text: Text with placeholders
placeholders: List of placeholders
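
The replace-and-restore cycle can be illustrated for a single placeholder type. The sketch below handles only `verbatim` blocks, uses hypothetical function names, and stores the content in a plain dictionary instead of `Placeholder` objects; whitespace around the placeholders is not tracked.

```python
import re


def sketch_to_placeholders(text: str, base: str = "SKETCH") -> tuple[str, dict[str, str]]:
    stored = {}

    def swap(match: re.Match) -> str:
        # Number the placeholders in order of appearance, starting at 1.
        placeholder = f"-{base}-VERBATIM-{len(stored) + 1}-"
        stored[placeholder] = match.group(0)
        return placeholder

    pattern = r"\\begin\{verbatim\}.*?\\end\{verbatim\}"
    return re.sub(pattern, swap, text, flags=re.DOTALL), stored


def sketch_from_placeholders(text: str, stored: dict[str, str]) -> str:
    for placeholder, content in stored.items():
        text = text.replace(placeholder, content)
    return text


source = "before\n\\begin{verbatim}\n  raw   text\n\\end{verbatim}\nafter\n"
masked, stored = sketch_to_placeholders(source)
restored = sketch_from_placeholders(masked, stored)
```

While the block is masked, any formatting applied to `masked` cannot touch the verbatim content, which is the purpose of the placeholder mechanism.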