Python module

Overview

Manipulate LaTeX files and BibTeX databases

texplain.TeX(text)

Simple TeX file manipulations.

texplain.bib_select(text, keys)

Limit a BibTeX file to a list of keys.

Indent LaTeX files

Support functions

texplain.environments(text)

Return list with present environments (between \begin{...} ... \end{...}).

texplain.Placeholder(placeholder, content[, ...])

Placeholder for text.

texplain.text_to_placeholders(text, ptypes)

Replace text with placeholders.

texplain.text_from_placeholders(text, ...[, ...])

Replace placeholders with original text.

texplain.find_commented(text)

Find commented bits of text.

texplain.is_commented(text)

Return an array that lists, per character, whether it is part of commented text.

texplain.remove_comments(text)

Remove comments from a string.

Documentation

class texplain.GeneratePlaceholder(base: str, name: str)

Class to generate a new placeholder. The following placeholder is generated every time the object is called:

-{base}-{name}-{i:d}-

For example:

>>> gen = GeneratePlaceholder(base="foo", name="bar")
>>> gen()
'-foo-bar-1-'
>>> gen()
'-foo-bar-2-'
Parameters
  • base – The base of the placeholder.

  • name – The name of the placeholder.

property search_placeholder: str

Return the regex that can be used to search for the placeholder.
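The following is a minimal sketch of a generator like this, assuming it simply counts calls. The class name MiniGeneratePlaceholder and the exact search regex are hypothetical and do not come from texplain's source.

```python
import re


class MiniGeneratePlaceholder:
    """Hypothetical stand-in for GeneratePlaceholder (not texplain's code)."""

    def __init__(self, base: str, name: str):
        self.base = base
        self.name = name
        self.i = 0  # number of placeholders generated so far

    def __call__(self) -> str:
        # Each call yields the next placeholder: -{base}-{name}-{i:d}-
        self.i += 1
        return f"-{self.base}-{self.name}-{self.i:d}-"

    @property
    def search_placeholder(self) -> str:
        # Assumed regex: fixed prefix and suffix, with any run of digits as the index.
        return rf"(-{re.escape(self.base)}-{re.escape(self.name)}-)([0-9]+)(-)"


gen = MiniGeneratePlaceholder(base="foo", name="bar")
first, second = gen(), gen()
found = re.findall(gen.search_placeholder, f"a {first} b {second}")
```

Escaping base and name keeps characters such as "." from being read as regex syntax. The trailing "-" in the format means a plain search for "-foo-bar-1-" cannot also hit "-foo-bar-10-".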

class texplain.Placeholder(placeholder: str, content: str, space_front: str = None, space_back: str = None, ptype: PlaceholderType = None, search_placeholder: str = None)

Placeholder for text. This class stores the text to be replaced by a placeholder and the placeholder itself. In addition, it can store the whitespace before and after the placeholder.

Parameters
  • placeholder – The placeholder to use.

  • content – The text replaced by the placeholder.

  • space_front – The whitespace before the placeholder.

  • space_back – The whitespace after the placeholder.

  • ptype – The type of placeholder.

  • search_placeholder – The regex used to search for the placeholder (optional, but greatly speeds up batch searches).

classmethod from_text(placeholder: str, text: str, start: int, end: int, ptype: PlaceholderType = None, search_placeholder: str = None)

Replace text with a placeholder, saving the replaced content and the whitespace before and after the placeholder. To restore the original text exactly:

placeholder, text = Placeholder.from_text(placeholder, text, start, end)
text = placeholder.to_text(text)
Parameters
  • placeholder – The placeholder to use.

  • text – The text to consider.

  • start – The start index of text to be replaced by the placeholder.

  • end – The end index of text to be replaced by the placeholder.

  • ptype – The type of placeholder.

  • search_placeholder – The regex used to search the placeholder.

Returns

(Placeholder, text), where text now contains the placeholder.

to_text(text: str, index: int = None, keep_placeholder: bool = False) → str

Replace placeholder with content. If the whitespace before and after the placeholder is stored, it is restored.

Parameters
  • text – Text.

  • index – The index of the placeholder.

  • keep_placeholder – If True, the placeholder is kept (it is merely positioned).

Returns

Text with placeholder replaced by content.
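The round trip described by from_text() and to_text() can be sketched as follows. This is a simplified, hypothetical re-implementation (MiniPlaceholder). It stores only the replaced content and leaves out the whitespace bookkeeping (space_front, space_back) that texplain's Placeholder performs.

```python
from dataclasses import dataclass


@dataclass
class MiniPlaceholder:
    """Hypothetical, simplified analogue of texplain.Placeholder."""

    placeholder: str
    content: str

    @classmethod
    def from_text(cls, placeholder: str, text: str, start: int, end: int):
        # Store text[start:end] and splice the placeholder in its place.
        obj = cls(placeholder=placeholder, content=text[start:end])
        return obj, text[:start] + placeholder + text[end:]

    def to_text(self, text: str) -> str:
        # Put the stored content back at the (first) placeholder occurrence.
        return text.replace(self.placeholder, self.content, 1)


original = "Energy $E = mc^2$ is conserved."
start = original.index("$")
end = original.index("$", start + 1) + 1  # include the closing "$"
ph, masked = MiniPlaceholder.from_text("-TEXINDENT-inline_math-1-", original, start, end)
restored = ph.to_text(masked)
```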

class texplain.PlaceholderType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Type of placeholder. The practical definition of each type is in text_to_placeholders(). The intended use is:

  • line: A single line of content (1 line).

  • inline_comment: A comment that is preceded by some content.

  • comment: A comment that is the only content on that line.

  • tabular: Entire block \begin{tabular} ... \end{tabular}.

  • math: Entire block of displaystyle math, e.g. \begin{equation} ... \end{equation}.

  • inline_math: Entire block of inline math, e.g. $ ... $.

  • math_line: A single line of content in math mode.

  • environment: Entire block of an environment: \begin{...} ... \end{...}.

  • command: Entire block of a command: \command[...]{...}.

  • curly_braced: Entire block of curly-braced content: {...}.

  • command_like: Entire block of command or curly_braced.

  • noindent_block: Entire block of % \begin{noindent} ... % \end{noindent}.

  • verbatim: Entire block of \begin{verbatim} ... \end{verbatim}.

  • let_command: Definition \let....

  • newif_command: Definition \newif....

Except for line, math_line, comment, and inline_comment, all placeholders can span more than one line.

command = 9
command_like = 11
comment = 3
curly_braced = 10
environment = 8
inline_comment = 2
inline_math = 6
let_command = 14
line = 1
math = 5
math_line = 7
newif_command = 15
noindent_block = 12
tabular = 4
verbatim = 13

class texplain.TeX(text: str)

Simple TeX file manipulations.

Parameters

text – LaTeX code.

change_label(old_label: str, new_label: str, overwrite: bool = False)

Change a label in \label{...} and \ref{...}-like commands.

Parameters
  • old_label – Old label.

  • new_label – New label.

  • overwrite – Overwrite existing labels.
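Below is a minimal sketch of the substitution behind change_label(). It assumes a fixed, hypothetical list of label-carrying commands (\label, \ref, \eqref, \cref, \Cref, \autoref) and ignores comments and the overwrite check. The helper name rename_label is illustrative only.

```python
import re

# Assumed set of commands whose single argument is a label (not texplain's list).
LABEL_COMMANDS = ("label", "ref", "eqref", "cref", "Cref", "autoref")


def rename_label(text: str, old_label: str, new_label: str) -> str:
    """Hypothetical helper: swap old_label for new_label in label/ref-like commands."""
    cmds = "|".join(LABEL_COMMANDS)
    # Group 1: backslash, command name, "{"; then the exact old label; group 2: "}".
    pattern = re.compile(r"(\\(?:" + cmds + r")\{)" + re.escape(old_label) + r"(\})")
    # A function replacement avoids backslash processing of new_label.
    return pattern.sub(lambda m: m.group(1) + new_label + m.group(2), text)


doc = r"See \cref{fig:a} and Fig.~\ref{fig:a}. \label{fig:a} \label{fig:ab}"
out = rename_label(doc, "fig:a", "fig:overview")
```

Because the closing brace is part of the pattern, a longer label such as fig:ab is left untouched.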

changed()

Check if the document has changed.

citation_keys() → list[str]

Read the citation keys in the TeX file (keys in \cite{...}, \citet{...}, \citep{...}).

Returns

Unique list of keys in the order of appearance.
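The "unique list in order of appearance" behaviour can be sketched with a regular expression plus dict.fromkeys, which drops duplicates while keeping insertion order. The sketch makes these assumptions:

- Keys are comma-separated inside \cite{...}, \citet{...}, or \citep{...}.
- Any [...] options come before the braces.

The function name find_citation_keys is hypothetical, and this is not texplain's implementation.

```python
import re

# \cite, \citet or \citep, optional star, any number of [...] options, then {keys}.
CITE = re.compile(r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def find_citation_keys(text: str) -> list[str]:
    """Hypothetical sketch: unique citation keys in order of first appearance."""
    keys = []
    for match in CITE.finditer(text):
        keys.extend(key.strip() for key in match.group(1).split(","))
    # dict.fromkeys removes duplicates but keeps the first-seen order.
    return list(dict.fromkeys(k for k in keys if k))


tex = r"As in \citet{smith2020}, see \citep[p.~3]{doe2019, smith2020} and \cite{lee2021}."
keys = find_citation_keys(tex)
```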

config_files() → list[str]

Read configuration files in the directory of the TeX file.

Returns

List of filenames.

environments() → list[str]

Return list with present environments (between \begin{...} ... \end{...}).

find_by_extension(ext: str) → list[str]

Find all files with a certain extension in the directory of the TeX file.

Parameters

ext – File extension.

Returns

List of filenames.

float_filenames(cmd: str = '\\includegraphics') → list[tuple[str]]

Extract the keys of ‘float’ commands (e.g. \includegraphics{...}, \bibliography{...}) and reconstruct their filenames. This operation is read-only.

Parameters

cmd – The command to look for.

Returns

A list [('key', 'filename')] in order of appearance.

format_labels(prefix: str = None)

Format all labels as:

  • sec:...: Section labels.

  • ch:...: Chapter labels.

  • fig:...: Figure labels.

  • tab:...: Table labels.

  • eq:...: Math labels.

  • note:...: Footnote labels.

  • misc:...: Anything else.

Parameters

prefix – Optional prefix to add, e.g. key:prefix:...

classmethod from_file(filename: str)

Read from file.

Parameters

filename – Path to the file to read.

get()

Return the document.

labels() → list[str]

Return list of labels (in order of appearance).

remove_commentlines()

Remove lines that are entirely a comment.

remove_comments()

Remove comments from the main text.

rename_float(old: str, new: str, cmd: str = '\\includegraphics')

Rename a key of a ‘float’ command (e.g. \includegraphics{...}, \bibliography{...}). This changes the TeX file.

Parameters
  • old – Old key.

  • new – New key.

  • cmd – The command to look for.

replace_command(cmd: str, replace: str, ignore_commented: bool = False)

Replace command. For example:

  • Remove the command:

    replace_command(r"{\TG}[1]", "")
    
        >>> This is a \TG{I would replace this} text.
        <<< This is a  text.
    
  • Select a part of the command:

    replace_command(r"{\TG}[2]", "#1")
    
        >>> This is a \TG{text}{test}.
        <<< This is a test.
    
  • Change the command:

    replace_command(r"{\TG}[2]", r"\mycomment{#1}{#2}")
    
        >>> This is a \TG{text}{test}.
        <<< This is a \mycomment{text}{test}.
    
Parameters
  • cmd – The command’s definition. Given \newcommand{cmd}[args]{def}, specify {cmd}[args], or {cmd} (or even cmd), which defaults to {cmd}[1].

  • replace – The def part (surrounding curly braces are optional). As in LaTeX, #1, #2, … are replaced by the command’s arguments.

  • ignore_commented – If True the command is not replaced if it is commented out.
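To illustrate the #1, #2, … substitution, here is a minimal sketch for a command with a fixed number of brace-delimited arguments (nested braces are allowed). The helpers substitute_command and read_group are hypothetical; this is not texplain's replace_command. It does not parse the {cmd}[args] definition syntax, skip comments, or handle optional [...] arguments.

```python
import re


def read_group(text: str, i: int) -> int:
    """Return the index just past the {...} group that starts at text[i] == '{'."""
    depth = 0
    for j in range(i, len(text)):
        escaped = j > 0 and text[j - 1] == "\\"
        if text[j] == "{" and not escaped:
            depth += 1
        elif text[j] == "}" and not escaped:
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced braces")


def substitute_command(text: str, name: str, nargs: int, replace: str) -> str:
    """Hypothetical sketch: replace \\name{a1}...{an} by `replace` with #1..#n filled in."""
    token = "\\" + name
    out, pos = [], 0
    while (start := text.find(token, pos)) != -1:
        end = start + len(token)
        if end < len(text) and text[end].isalpha():
            # A longer command such as \TGx when looking for \TG: leave it alone.
            out.append(text[pos:end])
            pos = end
            continue
        args = []
        while len(args) < nargs and end < len(text) and text[end] == "{":
            stop = read_group(text, end)
            args.append(text[end + 1 : stop - 1])  # argument without its braces
            end = stop
        if len(args) < nargs:
            out.append(text[pos:end])  # too few arguments: copy unchanged
        else:
            # As in LaTeX, #1..#9 refer to the arguments.
            body = re.sub(r"#([1-9])", lambda m: args[int(m.group(1)) - 1], replace)
            out.append(text[pos:start] + body)
        pos = end
    out.append(text[pos:])
    return "".join(out)


s1 = substitute_command(r"This is a \TG{I would replace this} text.", "TG", 1, "")
s2 = substitute_command(r"This is a \TG{text}{test}.", "TG", 2, "#2")
s3 = substitute_command(r"This is a \TG{text}{test}.", "TG", 2, r"\mycomment{#1}{#2}")
```

The three calls reproduce the three examples above.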

use_cleveref()

Replace:

Eq.~\eqref{...}
Fig.~\ref{...}
...

By:

\cref{...}

everywhere.
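A regex-based sketch of this rewrite follows. It assumes a small, hypothetical table of prefixes (Eq., Eqs., Fig., Figs., Tab., Sec., Ch.) followed by ~ and then \ref or \eqref. texplain's actual list of recognized prefixes may differ. The function name to_cleveref is illustrative.

```python
import re

# Assumed prefixes; the list texplain recognizes may differ.
PREFIXES = ("Eqs.", "Eq.", "Figs.", "Fig.", "Tab.", "Sec.", "Ch.")
PATTERN = re.compile(
    r"(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")~\\(?:eq)?ref\{([^}]*)\}"
)


def to_cleveref(text: str) -> str:
    """Hypothetical sketch: 'Eq.~\\eqref{x}' or 'Fig.~\\ref{x}' becomes '\\cref{x}'."""
    return PATTERN.sub(lambda m: r"\cref{" + m.group(1) + "}", text)


out = to_cleveref(r"By Eq.~\eqref{eq:energy} and Fig.~\ref{fig:setup}, see Sec.~\ref{sec:intro}.")
```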

texplain.align(text: str, environment: str, align: str = '<', base: str = 'TEXINDENT-ALIGN')

For all occurrences of an environment:

  • Place \begin{...}[...]{...} and \end{...} on their own lines.

  • Align & and \\ of all lines that contain those alignment characters.

Parameters
  • text – Text.

  • environment – Name of the environment.

  • align – Alignment of columns ("<", ">" or "^").

  • base – Base for temporary placeholders.

Returns

Formatted text.
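The second step (aligning & and \\) can be sketched for the body of an environment whose rows are already one per line. The sketch assumes that cells are separated by unescaped & and that rows end in \\. The helper align_rows is hypothetical. It maps align="<", ">" or "^" onto Python's format mini-language, and it omits the \begin/\end handling and the placeholder machinery (base) that texplain uses.

```python
import re


def align_rows(body: str, align: str = "<") -> str:
    """Hypothetical sketch: pad &-separated cells so columns and row ends line up."""
    rows = []
    for line in body.strip().splitlines():
        line = line.strip()
        ending = line.endswith(r"\\")
        if ending:
            line = line[:-2].rstrip()
        # Split on & unless it is escaped as \&.
        cells = [c.strip() for c in re.split(r"(?<!\\)&", line)]
        rows.append((cells, ending))
    ncol = max(len(cells) for cells, _ in rows)
    widths = [
        max(len(cells[k]) for cells, _ in rows if k < len(cells)) for k in range(ncol)
    ]
    out = []
    for cells, ending in rows:
        padded = [f"{c:{align}{widths[k]}}" for k, c in enumerate(cells)]
        line = " & ".join(padded)
        out.append(line + r" \\" if ending else line.rstrip())
    return "\n".join(out)


body = "\n".join([r"a & bb \\", r"ccc & d \\"])
aligned = align_rows(body)
```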

texplain.bib_select(text: str, keys: list[str]) → str

Limit a BibTeX file to a list of keys.

Parameters
  • text – The BibTeX file as string.

  • keys – The list of keys to select.

Returns

The (reduced) BibTeX file, as string.
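Here is a minimal sketch of the selection. It assumes each entry starts with @type{key, at the beginning of a line and ends just before the next such line. select_entries is a hypothetical helper. Unlike a real BibTeX parser, it ignores @string/@comment blocks and brace matching, and it keeps entries in file order.

```python
import re


def select_entries(text: str, keys: list[str]) -> str:
    """Hypothetical sketch: keep only BibTeX entries whose key is in `keys`."""
    wanted = set(keys)
    # Split just before every "@" that starts a line (one chunk per entry).
    chunks = re.split(r"(?m)^(?=@)", text)
    kept = []
    for chunk in chunks:
        match = re.match(r"@\w+\s*\{\s*([^,\s]+)\s*,", chunk)
        if match and match.group(1) in wanted:
            kept.append(chunk.strip())
    return "\n\n".join(kept) + "\n"


bib = """@article{a2020,
  title = {A},
}

@book{b2019,
  title = {B},
}

@misc{c2018,
  title = {C},
}
"""
selected = select_entries(bib, ["c2018", "a2020"])
```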

texplain.environments(text: str) → list[str]

Return list with present environments (between \begin{...} ... \end{...}).

texplain.find_command(text: str, name: str = None, regex: str = '(?<!\\\\)(\\\\)([a-zA-Z\\@]+)(\\*?)', is_comment: list[bool] = None) → list[list[tuple[int]]]

Find the indices of commands, their options, and their arguments.

Parameters
  • text – Text.

  • name – Name of command without backslash (e.g. "textbf").

  • regex – Regex used to search for the command name.

  • is_comment – Per character of text, True if the character is part of a comment. Default: search for comments using is_commented().

Returns

List of indices of commands and their arguments: [[(name_start, name_end), (arg1_start, arg1_end), ...], ...]. The definition is such that the j-th component of the i-th command can be extracted as text[cmd[i][j][0]:cmd[i][j][1]].

texplain.find_commented(text: str) → list[list[int]]

Find commented bits of text. The output is such that the commented text can be found as follows:

for i, j in find_commented(text):
    print(text[i : j]) # i is the index of "%"
Parameters

text – Text.

Returns

List of indices of the beginning and end of the comments.
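Below is a minimal sketch of this convention, where i is the index of "%" and j is the end of the comment. It assumes a comment runs from an unescaped % to the end of its line, excluding the newline. A % preceded by an odd number of backslashes is escaped, so \% is text while \\% starts a comment. The name commented_spans is hypothetical, and verbatim environments get no special treatment.

```python
def commented_spans(text: str) -> list[list[int]]:
    """Hypothetical sketch: [[i, j], ...] with text[i] == '%' and text[i:j] the comment."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            # Count the backslashes directly before "%"; an odd count means "\%".
            n = 0
            while i - n - 1 >= 0 and text[i - n - 1] == "\\":
                n += 1
            if n % 2 == 0:
                j = text.find("\n", i)
                j = len(text) if j == -1 else j
                spans.append([i, j])
                i = j
                continue
        i += 1
    return spans


tex = "50\\% off % comment\nno comment\\\\% trailing"
spans = commented_spans(tex)
```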

texplain.find_matching(text: str, opening: str, closing: str, ignore_escaped: bool = True, ignore_commented: bool = False, escape: bool = True, opening_match: int = 0, closing_match: int = 0, return_array: bool = False) → dict

Find matching ‘brackets’.

Parameters
  • text – The string to consider.

  • opening – The opening bracket (e.g. “(”, “[”, “{”).

  • closing – The closing bracket (e.g. “)”, “]”, “}”).

  • ignore_escaped – Ignore escaped brackets (e.g. “\(”, “\[”, “\{”, “\)”, “\]”, “\}”).

  • ignore_commented – Ignore any text that is commented (e.g. “% …”).

  • escape – If True, opening and closing are escaped.

  • opening_match – Select index of begin (0) or end (1) of opening bracket match.

  • closing_match – Select index of begin (0) or end (1) of closing bracket match.

  • return_array – If True, return a NumPy array of indices instead of a dictionary.

Returns

Dictionary with {index_opening: index_closing}
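The core of bracket matching is a stack: push the index of each opening bracket, and pop it at the next closing bracket. Here is a minimal sketch that returns {index_opening: index_closing}. It assumes single-character brackets and skips any bracket preceded by a backslash, which is roughly what ignore_escaped describes. matching_brackets is a hypothetical name, and comment handling is omitted.

```python
def matching_brackets(text: str, opening: str = "{", closing: str = "}") -> dict[int, int]:
    """Hypothetical sketch: map each opening-bracket index to its closing-bracket index."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, char in enumerate(text):
        if i > 0 and text[i - 1] == "\\":
            continue  # escaped bracket such as \{ or \}
        if char == opening:
            stack.append(i)
        elif char == closing:
            if not stack:
                raise IndexError(f"unmatched {closing!r} at index {i}")
            pairs[stack.pop()] = i
    if stack:
        raise IndexError(f"unmatched {opening!r} at index {stack[-1]}")
    return dict(sorted(pairs.items()))


tex = r"\frac{a}{b_{1}} \{x\}"
pairs = matching_brackets(tex)
```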

texplain.find_matching_index(opening: ArrayLike, closing: ArrayLike, return_array: bool = False) → dict

Find matching ‘brackets’, based on a list of indices corresponding to opening and closing ‘brackets’.

Parameters
  • opening – Indices of the opening brackets.

  • closing – Indices of the closing brackets.

  • return_array – If True, return a NumPy array of indices instead of a dictionary.

Returns

Dictionary with {index_opening: index_closing}

texplain.find_opening(text: str, opening: str, ignore_escaped: bool = True) → list[int]

Find opening ‘bracket’.

Parameters
  • text – The string to consider.

  • opening – The opening ‘bracket’ (e.g. “(”, “[”, “{”, but also “%”).

  • ignore_escaped – Ignore escaped ‘brackets’ (e.g. “\(”, “\[”, “\{”, “\%”).

Returns

List of indices of opening ‘brackets’ (sorted by definition).

texplain.indent(text: str, indent: str = '    ') → str

Indent text.

Parameters
  • text – The text to indent.

  • indent – The indentation to use.

Returns

The indented text.

texplain.is_commented(text: str) → ndarray[Any, dtype[bool_]]

Return an array that lists, per character, whether it is part of commented text.

Parameters

text – Text.

Returns

Array of booleans of size len(text).

texplain.remove_comments(text: str) → str

Remove comments from a string.

Parameters

text – Text.

Returns

Text without comments.

texplain.texcleanup(args: list[str])

Command-line tool to copy to a clean output directory; see --help.

texplain.texindent_cli(args: list[str])

Wrapper around latexindent.pl, see --help.

texplain.texplain(args: list[str])

Command-line tool to copy to a clean output directory; see --help.

texplain.text_from_placeholders(text: str, placeholders: list[texplain.Placeholder], keep_placeholders: bool = False) → str

Replace placeholders with original text. The whitespace before and after each placeholder is modified to match Placeholder.space_front and Placeholder.space_back.

Parameters
  • text – Text with placeholders.

  • placeholders – List of placeholders.

  • keep_placeholders – If True, the placeholders are kept (they are merely positioned).

Returns

Text with content of the placeholders.

texplain.text_to_placeholders(text: str, ptypes: list[texplain.PlaceholderType], base: str = 'TEXINDENT', placeholders_comments: list[texplain.Placeholder] = None) → tuple[str, list[texplain.Placeholder]]

Replace text with placeholders of the types listed in ptypes (see PlaceholderType).

Parameters
  • text – Text.

  • ptypes – List of placeholder types to replace.

  • base – Base string for placeholders.

  • placeholders_comments – List of placeholders that are comments (needed to search commands).

Returns

(text, placeholders) with:

  • text: Text with placeholders.

  • placeholders: List of placeholders.
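The overall masking workflow has three steps: replace some constructs by placeholders, transform the rest, then restore. It can be sketched for a single placeholder type, inline math $...$. Everything here is hypothetical:

- the helpers mask_inline_math and unmask;
- the placeholder names, which borrow the -{base}-{name}-{i:d}- pattern of GeneratePlaceholder;
- the simple non-nested $...$ regex.

texplain's own functions handle many more types and also restore the surrounding whitespace.

```python
import re


def mask_inline_math(text: str, base: str = "TEXINDENT") -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical sketch: swap each $...$ for a numbered placeholder."""
    placeholders: list[tuple[str, str]] = []

    def repl(match) -> str:
        name = f"-{base}-inline_math-{len(placeholders) + 1:d}-"
        placeholders.append((name, match.group(0)))
        return name

    # Unescaped "$", then anything but "$", then "$" (no $$ display math, no nesting).
    masked = re.sub(r"(?<!\\)\$[^$]+\$", repl, text)
    return masked, placeholders


def unmask(text: str, placeholders: list[tuple[str, str]]) -> str:
    """Put the original content back for every placeholder."""
    for name, content in placeholders:
        text = text.replace(name, content, 1)
    return text


src = "Let $x$ be real and $y = x^2$."
masked, phs = mask_inline_math(src)
edited = masked.replace("real", "positive")  # edits touch prose only, never the math
restored = unmask(edited, phs)
```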