Python module#

Overview#

Manipulate LaTeX files and BibTeX databases#

texplain.TeX(text)

Interpret TeX file to allow simple manipulations.

texplain.bib_select(text, keys[, reorder])

Limit a BibTeX file to a list of keys.

Indent LaTeX files#

texplain.indent(text[, indentation, rstrip, ...])

Indent text.

Support functions#

texplain.environments(text)

Return list with present environments.

texplain.Placeholder(placeholder, content[, ...])

Placeholder for text.

texplain.text_to_placeholders(text, ptypes)

Replace text with placeholders.

texplain.text_from_placeholders(text, ...[, ...])

Replace placeholders with original text.

texplain.find_commented(text)

Find comments.

texplain.is_commented(text)

Check, per character, whether it is part of commented text.

texplain.remove_comments(text)

Remove comments from a string.

Details#

class texplain.GeneratePlaceholder(base: str, name: str, start: int = 0)#

Class to generate a new placeholder. The following placeholder is generated every time the object is called:

-{base}-{name}-{i:d}-

For example:

>>> gen = GeneratePlaceholder(base="foo", name="bar")
>>> gen()
'-foo-bar-1-'
>>> gen()
'-foo-bar-2-'
Parameters:
  • base – The base of the placeholder.

  • name – The name of the placeholder.

  • start – The starting index of the placeholder.

property search_placeholder: str#

Return the regex that can be used to search for the placeholder.
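The counting behaviour described above can be illustrated with a small stand-alone sketch. This is not texplain's implementation: the class name `CounterPlaceholder` is hypothetical, and the form of the regex returned by `search_placeholder` is an assumption. Only the `-{base}-{name}-{i:d}-` format and the documented example output are taken from the description.

```python
import re


class CounterPlaceholder:
    """Hypothetical stand-in for GeneratePlaceholder (illustration only)."""

    def __init__(self, base: str, name: str, start: int = 0):
        self.base = base
        self.name = name
        self.i = start

    def __call__(self) -> str:
        # Increment first, so that the first call with start=0 yields index 1.
        self.i += 1
        return f"-{self.base}-{self.name}-{self.i:d}-"

    @property
    def search_placeholder(self) -> str:
        # Assumed form of the regex: fixed parts escaped, index as digits.
        return "-" + re.escape(self.base) + "-" + re.escape(self.name) + r"-\d+-"


gen = CounterPlaceholder(base="foo", name="bar")
first = gen()
second = gen()
matches = re.findall(gen.search_placeholder, f"a {first} b {second} c")
```

A single regex that matches every index is what makes one pass over the text enough to locate all placeholders of one kind.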

class texplain.Placeholder(placeholder: str, content: str, space_front: str = None, space_back: str = None, ptype: PlaceholderType = None, search_placeholder: str = None)#

Placeholder for text. This class stores the text to be replaced by a placeholder and the placeholder itself. In addition, it can store the whitespace before and after the placeholder.

Parameters:
  • placeholder – The placeholder to use.

  • content – The text replaced by the placeholder.

  • space_front – The whitespace before the placeholder.

  • space_back – The whitespace after the placeholder.

  • ptype – The type of placeholder, see PlaceholderType.

  • search_placeholder – The regex used to search for the placeholder (optional, but greatly speeds up batch searches).

classmethod from_text(placeholder: str, text: str, start: int, end: int, ptype: PlaceholderType = None, search_placeholder: str = None)#

Replace text with placeholder. Save the content and the current whitespace before and after the placeholder. To restore the original text precisely:

placeholder, text = Placeholder.from_text(placeholder, text, start, end)
text = placeholder.to_text(text)
Parameters:
  • placeholder – The placeholder to use.

  • text – The text to consider.

  • start – The start index of text to be replaced by the placeholder.

  • end – The end index of text to be replaced by the placeholder.

  • ptype – The type of placeholder, see PlaceholderType.

  • search_placeholder – The regex used to search the placeholder.

Returns:

(Placeholder, text) where in text the placeholder is inserted.

to_text(text: str, index: int = None, keep_placeholder: bool = False) str#

Replace placeholder with content. If the whitespace before and after the placeholder is stored, it is restored.

Parameters:
  • text – Text.

  • index – The index of the placeholder.

  • keep_placeholder – If True keep the placeholder, change only the whitespace.

Returns:

Text with placeholder replaced by content.
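The round trip of from_text() and to_text() can be pictured with plain string slicing. The sketch below is a simplified illustration, not the class itself: the helper names `insert_placeholder` and `restore_placeholder` and the placeholder string are hypothetical, and the whitespace bookkeeping (space_front, space_back) is left out.

```python
def insert_placeholder(text: str, placeholder: str, start: int, end: int) -> tuple[str, str]:
    """Replace text[start:end] by the placeholder; return (content, new_text)."""
    content = text[start:end]
    return content, text[:start] + placeholder + text[end:]


def restore_placeholder(text: str, placeholder: str, content: str) -> str:
    """Put the stored content back in place of the placeholder."""
    return text.replace(placeholder, content, 1)


original = r"Energy: $E = m c^2$ (Einstein)."
start = original.index("$")
end = original.rindex("$") + 1

# Hypothetical placeholder string, chosen only for this example.
name = "-TEXINDENT-INLINEMATH-1-"
content, masked = insert_placeholder(original, name, start, end)
restored = restore_placeholder(masked, name, content)
```

While the placeholder is in place, the masked text can be reformatted without touching the stored content.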

class texplain.PlaceholderType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)#

Type of placeholder. The placeholders’ practical definition is in text_to_placeholders(). The intended use is:

  • line: A single line of content (no newline).

  • inline_comment: A comment that is preceded by some content (no newline).

  • comment: A comment that is the only content on that line (no newline).

  • tabular: Block \begin{tabular} ... \end{tabular}.

  • math: Block of displaymath. E.g. \begin{equation} ... \end{equation}.

  • inline_math: Block of inline math. E.g. $ ... $.

  • math_line: A single line of content in math mode (no newline).

  • environment: Block of environment: \begin{...} ... \end{...}.

  • command: Block of command: \command[...]{...}.

  • curly_braced: Block of curly braced content: {...}.

  • command_like: Block of command or curly_braced.

  • texindent_block: Block of % \begin{texindent} ... % \end{texindent}.

  • noindent_block: Block of % \begin{noindent} ... % \end{noindent}.

  • verbatim: Block of \begin{verbatim} ... \end{verbatim}.

  • let_command: Definition \let....

  • newif_command: Definition \newif....

Except for line, math_line, comment, and inline_comment, all placeholders can span more than one line.

command = 9#
command_like = 11#
comment = 3#
curly_braced = 10#
environment = 8#
inline_comment = 2#
inline_math = 6#
let_command = 15#
line = 1#
math = 5#
math_line = 7#
newif_command = 16#
noindent_block = 13#
tabular = 4#
texindent_block = 12#
verbatim = 14#
class texplain.TeX(text: str)#

Interpret TeX file to allow simple manipulations. The manipulations are the member functions.

Parameters:

text – LaTeX code.

change_label(old_label: str, new_label: str, overwrite: bool = False)#

Change label in \label{...} and \ref{...} (-like) commands.

Parameters:
  • old_label – Old label.

  • new_label – New label.

  • overwrite – Overwrite existing labels.

changed()#

Check if the document has changed.

citation_keys() list[str]#

Read the citation keys in the TeX file (keys in \cite{...}, \citet{...}, \citep{...}).

Returns:

Unique list of keys in the order of appearance.
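As a rough illustration of "unique keys in order of appearance", the following sketch collects keys with a regular expression. It is an assumption-laden stand-in, not the method itself: the function name `sketch_citation_keys` is hypothetical, and optional arguments such as `\citep[p.~3]{...}` are not handled.

```python
import re


def sketch_citation_keys(text: str) -> list[str]:
    keys: list[str] = []
    # Matches \cite{...}, \citet{...} and \citep{...} without optional arguments.
    for match in re.finditer(r"\\cite[tp]?\{([^}]*)\}", text):
        for key in match.group(1).split(","):
            key = key.strip()
            # Keep only the first occurrence of every key.
            if key and key not in keys:
                keys.append(key)
    return keys


tex = r"See \cite{a,b} and \citet{c}; also \citep{b, d}."
keys = sketch_citation_keys(tex)
```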

config_files() list[str]#

Read configuration files in the directory of the TeX file.

Returns:

List of filenames.

environments() list[str]#

Return list with present environments (between \begin{...} ... \end{...}).

find_by_extension(ext: str) list[str]#

Find all files with a certain extension in the directory of the TeX file.

Parameters:

ext – File extension.

Returns:

List of filenames.

fix_quotes()#

Replace:

  • "..." by ``...''.

  • '...' by `...'.

float_filenames(cmd: str = '\\includegraphics') list[tuple[str]]#

Extract the keys of ‘float’ commands (e.g. \includegraphics{...}, \bibliography{...}) and reconstruct their filenames. This operation is read-only.

Parameters:

cmd – The command to look for.

Returns:

A list [('key', 'filename')] in order of appearance.

format_labels(prefix: str = None)#

Format all labels as:

  • sec:...: Section labels.

  • ch:...: Chapter labels.

  • fig:...: Figure labels.

  • tab:...: Table labels.

  • eq:...: Math labels.

  • note:...: Footnote.

  • misc:...: Anything else.

Parameters:

prefix – Add optional prefix. E.g. key:prefix:....

classmethod from_file(filename: str)#

Read from file.

Parameters:

filename – Path to the file to read.

get()#

Return document.

labels() list[str]#

Return list of labels (in order of appearance).

remove_commentlines()#

Remove lines that are entirely a comment.

remove_comments()#

Remove comments from the main text.

rename_float(old: str, new: str, cmd: str = '\\includegraphics')#

Rename a key of a ‘float’ command (e.g. \includegraphics{...}, \bibliography{...}). This changes the TeX file.

Parameters:
  • old – Old key.

  • new – New key.

  • cmd – The command to look for.

replace_command(cmd: str, replace: str, ignore_commented: bool = False)#

Replace command. For example:

  • Remove the command:

    replace_command(r"{\TG}[1]", "")
    
    >>> This is a \TG{I would replace this} text.
    <<< This is a  text.
    
  • Select a part of the command:

    replace_command(r"{\TG}[2]", "#1")
    
    >>> This is a \TG{text}{test}.
    <<< This is a text.
    
  • Change the command:

    replace_command(r"{\TG}[2]", r"\mycomment{#1}{#2}")
    
    >>> This is a \TG{text}{test}.
    <<< This is a \mycomment{text}{test}.
    
Parameters:
  • cmd – The command’s definition. Given \newcommand{cmd}[args]{def} you should specify {cmd}[args], or {cmd} (or even cmd), which defaults to {cmd}[1].

  • replace – The def part (the surrounding curly braces are optional). As in LaTeX, replacement is done on #1, #2, …

  • ignore_commented – If True the command is not replaced if it is commented out.
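The substitution of #1, #2, … can be sketched with regular expressions. The function below is a simplified illustration, not the method's implementation: `sketch_replace_command` and its `name`/`nargs` parameters are hypothetical, arguments may not contain nested braces, and commented text is not treated specially.

```python
import re


def sketch_replace_command(text: str, name: str, nargs: int, replace: str) -> str:
    # \name followed by exactly `nargs` brace groups without nested braces.
    pattern = re.compile(re.escape("\\" + name) + r"\{([^{}]*)\}" * nargs)

    def substitute(match: re.Match) -> str:
        # Fill #1, #2, ... in the template with the captured arguments.
        return re.sub(r"#(\d)", lambda m: match.group(int(m.group(1))), replace)

    return pattern.sub(substitute, text)


removed = sketch_replace_command(r"This is a \TG{I would replace this} text.", "TG", 1, "")
selected = sketch_replace_command(r"This is a \TG{text}{test}.", "TG", 2, "#2")
changed = sketch_replace_command(r"This is a \TG{text}{test}.", "TG", 2, r"\mycomment{#1}{#2}")
```

Note the raw string for the template: in a normal string literal, `"\m"` is an invalid escape sequence.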

use_cleveref()#

Replace:

Eq.~\eqref{...}
Fig.~\ref{...}
...

By:

\cref{...}

everywhere.
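A minimal sketch of this kind of rewrite, using a regular expression. The function name `sketch_use_cleveref` and the list of recognised prefixes (Eq, Fig, Sec, Tab, with optional plural "s") are assumptions for illustration; the method may recognise a different set.

```python
import re


def sketch_use_cleveref(text: str) -> str:
    # Optional prefix such as "Eq.~" or "Fig.~", followed by \ref{...} or \eqref{...}.
    pattern = r"(?:(?:Eq|Fig|Sec|Tab)s?\.~)?\\(?:eq)?ref\{([^}]*)\}"
    # A function is used as replacement so that the backslash is inserted literally.
    return re.sub(pattern, lambda m: "\\cref{" + m.group(1) + "}", text)


before = r"See Eq.~\eqref{eq:a} and Fig.~\ref{fig:b}."
after = sketch_use_cleveref(before)
```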

texplain.bib_select(text: str, keys: list[str], reorder: bool = False) str#

Limit a BibTeX file to a list of keys.

Parameters:
  • text – The BibTeX file as string.

  • keys – The list of keys to select.

  • reorder – Reorder the entries in the bib-file to match the order of keys.

Returns:

The (reduced) BibTeX file, as string.
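The effect of keys and reorder can be illustrated with a simplified stand-in. This sketch is not the function's implementation: `sketch_bib_select` is a hypothetical name, it assumes every entry starts with "@" at the beginning of a line, and it assumes that without reorder the entries keep their order in the file.

```python
import re


def sketch_bib_select(text: str, keys: list[str], reorder: bool = False) -> str:
    # Split before every "@" at the start of a line; each chunk is one entry.
    entries = [e.strip() for e in re.split(r"(?m)^(?=@)", text) if e.strip()]
    found = {}
    for entry in entries:
        # The key is the first token after "@type{".
        match = re.match(r"@\w+\s*\{\s*([^,\s]+)\s*,", entry)
        if match and match.group(1) in keys:
            found[match.group(1)] = entry
    # Either follow the order of `keys`, or keep the order of the file.
    order = [k for k in keys if k in found] if reorder else list(found)
    return "\n\n".join(found[k] for k in order) + "\n"


bib = (
    "@article{a,\n  title = {A},\n}\n\n"
    "@book{b,\n  title = {B},\n}\n\n"
    "@misc{c,\n  title = {C},\n}\n"
)
file_order = sketch_bib_select(bib, ["c", "a"])
key_order = sketch_bib_select(bib, ["c", "a"], reorder=True)
```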

texplain.environments(text: str) list[str]#

Return list with present environments. This corresponds to the text between \begin{...} and \end{...}.

texplain.find_command(text: str, name: str = None, regex: str = '(?<!\\\\)(\\\\)([a-zA-Z\\@]+)(\\*?)', is_comment: list[bool] = None) list[list[tuple[int]]]#

Find indices of commands, and their options, and arguments. The following pattern is searched for:

  • Backslash

  • Word

  • Any number of matching [] and {} (in any order).

Parameters:
  • text – Text.

  • name – Name of command without backslash (e.g. "textbf").

  • regex – Regex used to match the command name.

  • is_comment – Per character of text, True if the character is part of a comment. Default: search for comments using is_commented().

Returns:

List of indices of commands and their arguments: [[(name_start, name_end), (arg1_start, arg1_end), ...], ...] Note the definition is such that one can extract the j-th component of the i-th command as follows: text[cmd[i][j][0]:cmd[i][j][1]].
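To make the nested return structure concrete, here is a simplified finder that produces the same shape of output. It is only a sketch: `sketch_find_command` is a hypothetical name, arguments may not contain nested brackets, and it is an assumption that each span includes the backslash (for the name) and the surrounding brackets (for the arguments).

```python
import re

# Backslash (not escaped), command name, optional star.
NAME = re.compile(r"(?<!\\)\\[a-zA-Z@]+\*?")
# One {...} or [...] group without nested brackets.
ARGUMENT = re.compile(r"\{[^{}]*\}|\[[^\[\]]*\]")


def sketch_find_command(text: str) -> list[list[tuple[int, int]]]:
    commands = []
    for match in NAME.finditer(text):
        spans = [match.span()]
        pos = match.end()
        # Collect all bracket groups that directly follow the name.
        while (arg := ARGUMENT.match(text, pos)) is not None:
            spans.append(arg.span())
            pos = arg.end()
        commands.append(spans)
    return commands


text = r"A \textbf{bold} word and \section[short]{Long title}."
cmd = sketch_find_command(text)
# Extract the j-th component of the i-th command by slicing.
parts = [[text[a:b] for a, b in spans] for spans in cmd]
```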

texplain.find_commented(text: str) list[list[int]]#

Find comments.

The output is such that one can find the text of each comment as follows:

for i, j in find_commented(text):
    print(text[i : j]) # i is the index of "%"
Parameters:

text – Text.

Returns:

List of indices of the beginning and end of the comments.
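One way to picture the search is "an unescaped % up to the end of the line". The sketch below implements exactly that rule and nothing more; `sketch_find_commented` is a hypothetical name, and the function documented above may handle edge cases (such as a double backslash before "%") differently.

```python
import re


def sketch_find_commented(text: str) -> list[tuple[int, int]]:
    # An unescaped "%" starts a comment that runs up to (not including) the newline.
    return [m.span() for m in re.finditer(r"(?<!\\)%[^\n]*", text)]


text = "a = 1 % first\n100\\% sure\n% whole line\n"
comments = [text[i:j] for i, j in sketch_find_commented(text)]
```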

texplain.find_matching(text: str, opening: str, closing: str, ignore_escaped: bool = True, ignore_commented: bool = False, escape: bool = True, opening_match: int = 0, closing_match: int = 0, return_array: bool = False) dict#

Find matching ‘brackets’.

Parameters:
  • text – The string to consider.

  • opening – The opening bracket (e.g. “(”, “[”, “{”).

  • closing – The closing bracket (e.g. “)”, “]”, “}”).

  • ignore_escaped – Ignore escaped brackets (e.g. “\(”, “\[”, “\{”, “\)”, “\]”, “\}”).

  • ignore_commented – Ignore any text that is commented (e.g. “% …”).

  • escape – If True, opening and closing are escaped.

  • opening_match – Select index of begin (0) or end (1) of opening bracket match.

  • closing_match – Select index of begin (0) or end (1) of closing bracket match.

  • return_array – If True, return NumPy-array of indices instead of dictionary.

Returns:

Dictionary with {index_opening: index_closing}

texplain.find_matching_index(opening: ArrayLike, closing: ArrayLike, return_array: bool = False) dict#

Find matching ‘brackets’, based on a list of indices corresponding to opening and closing ‘brackets’.

Parameters:
  • opening – Indices of the opening brackets.

  • closing – Indices of the closing brackets.

  • return_array – If True, return NumPy-array of indices instead of dictionary.

Returns:

Dictionary with {index_opening: index_closing}
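Matching brackets from two index lists is a classic stack problem. The following is a minimal sketch of that technique, not the library's implementation: `sketch_find_matching_index` is a hypothetical name, it works on plain lists instead of arrays, and unmatched closing brackets are silently skipped.

```python
def sketch_find_matching_index(opening: list[int], closing: list[int]) -> dict[int, int]:
    # Merge both lists into one stream of (index, is_opening) events.
    events = sorted([(i, True) for i in opening] + [(i, False) for i in closing])
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for index, is_opening in events:
        if is_opening:
            stack.append(index)
        elif stack:
            # A closing bracket matches the most recent unmatched opening bracket.
            pairs[stack.pop()] = index
    return dict(sorted(pairs.items()))


text = "f(a[0], (b))"
opening = [i for i, c in enumerate(text) if c == "("]
closing = [i for i, c in enumerate(text) if c == ")"]
pairs = sketch_find_matching_index(opening, closing)
```

The result maps every opening index to its closing index, so `text[i : j + 1]` gives the bracketed text for each pair `(i, j)`.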

texplain.indent(text: str, indentation: str = '    ', rstrip: bool = True, lstrip: bool = True, squashlines: bool = True, squashspaces: bool = True, symbols: bool = True, environment: bool = True, argument: bool = True, inlinemath: bool = True, linebreak: bool = True, itemize: bool = True, sentence: bool = True, alignment: bool = True, texindent: bool = True, noindent: bool = True) str#

Indent text.

Parameters:
  • text – The text to indent.

  • indentation

    Set indentation of lines between:

    • \begin{...}[...]{...} and \end{...}.

    • \[ and \].

    • { and }.

    • [ and ] (as command option).

    Comment lines follow indentation. Requires: lstrip, inlinemath, environment. To switch off indentation, set indentation="".

  • rstrip – Remove trailing spaces on all lines.

  • lstrip – Remove all leading spaces before applying indentation.

  • squashlines – Reduce the maximum number of consecutive blank lines to 2.

  • squashspaces – Reduce the maximum number of consecutive spaces to 1.

  • symbols – In math-mode: all symbols are separated by a space.

  • environment – \begin{...}[...]{...} and \end{...} (and \[ and \]) are placed on separate lines.

  • argument

    Any option or argument that spans more than one line is placed on separate lines. For example:

    xxx { This is a very long argument
    that is more than one line long. } yyy
    

    is formatted to:

    xxx {
        This is a very long argument
        that is more than one line long.
    } yyy
    

  • inlinemath – Inline math is placed on one line.

  • linebreak – \\ is followed by a newline.

  • itemize – Each \item is placed on a separate line.

  • sentence

    One sentence per line. Every sentence should start on a new line, and it should be (as much as possible) on a single line. The following rules of thumb are followed:

    • A sentence ends with:

      • A period, question mark, or exclamation mark.

      • \begin{...} or \end{...}.

      • Two white lines.

      • \\

      • The end of an argument (} or ]), see below.

      • A command on the next line.

    • Commands and inline math are treated as a single word. Formatting is applied on the arguments of commands.

    Requires: rstrip, lstrip, squashspaces.

  • alignment

    • If the resulting line is shorter than 100 characters, columns in tabular environments are aligned at &, and \\ are aligned as well.

    • In other cases single spaces are placed around & and before \\.

    Requires: environment.

  • texindent

    Custom formatting in blocks:

    % \begin{texindent}{...}
    ...
    % \end{texindent}
    

    where the {...} argument is a comma-separated list of options of this function; for example:

    % \begin{texindent}{sentence=False, inlinemath=False}
    ...
    % \end{texindent}
    

  • noindent

    Verbatim environments and everything between

    % \begin{noindent}
    ...
    % \end{noindent}
    

    is not formatted.

Returns:

The indented text.
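The three whitespace options (rstrip, squashspaces, squashlines) are simple enough to sketch with regular expressions. This is an illustration of the rules as described, not the function's implementation; `sketch_whitespace_cleanup` is a hypothetical name, and leading indentation is deliberately left untouched here.

```python
import re


def sketch_whitespace_cleanup(text: str) -> str:
    # rstrip: remove trailing spaces on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # squashspaces: at most one consecutive space between words.
    text = re.sub(r"(?<=\S) {2,}", " ", text)
    # squashlines: at most two consecutive blank lines, i.e. three newlines in a row.
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    return text


cleaned = sketch_whitespace_cleanup("a   b  \n\n\n\n\nc")
```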

texplain.is_commented(text: str) ndarray[Any, dtype[bool_]]#

Check, per character, whether it is part of commented text.

Parameters:

text – Text.

Returns:

Array of booleans of size len(text).
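A per-character mask is convenient for filtering. The sketch below builds such a mask as a plain list of booleans (the function documented above returns a NumPy array) under the simple rule "an unescaped % up to the end of the line"; `sketch_is_commented` is a hypothetical name.

```python
import re


def sketch_is_commented(text: str) -> list[bool]:
    mask = [False] * len(text)
    # Mark every character from an unescaped "%" up to the end of the line.
    for match in re.finditer(r"(?<!\\)%[^\n]*", text):
        for i in range(match.start(), match.end()):
            mask[i] = True
    return mask


text = "x % note\ny"
mask = sketch_is_commented(text)
# Keep only the characters that are not part of a comment.
visible = "".join(c for c, commented in zip(text, mask) if not commented)
```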

texplain.remove_comments(text: str) str#

Remove comments from a string.

Parameters:

text – Text

Returns:

Text without comments.

texplain.texcleanup(args: list[str])#

Command-line tool to copy to clean output directory, see --help.

texplain.texindent_cli(args: list[str])#

Indent TeX file, see --help.

texplain.texplain(args: list[str])#

Command-line tool to copy to clean output directory, see --help.

texplain.text_from_placeholders(text: str, placeholders: list[Placeholder], keep_placeholders: bool = False) str#

Replace placeholders with original text. The whitespace before and after the placeholder is modified to match Placeholder.space_front and Placeholder.space_back.

Parameters:
  • text – Text with placeholders.

  • placeholders – List of placeholders.

  • keep_placeholders – If True, the placeholders are kept (they are merely positioned).

Returns:

Text with content of the placeholders.

texplain.text_to_placeholders(text: str, ptypes: list[PlaceholderType], base: str = 'TEXINDENT', placeholders_comments: list[Placeholder] = None) tuple[str, list[Placeholder]]#

Replace text with placeholders. The supported placeholder types are those listed in PlaceholderType.

Parameters:
  • text – Text.

  • ptypes – List of placeholder types to replace

  • base – Base string for placeholders

  • placeholders_comments – List of placeholders that are comments (needed to search commands)

Returns:

(text, placeholders) with
  • text: Text with placeholders

  • placeholders: List of placeholders
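The pairing of text_to_placeholders() and text_from_placeholders() can be pictured with a reduced example that masks only inline math. This is a sketch under stated assumptions, not the library's code: the function names, the `INLINEMATH` label, and the storage as `(placeholder, content)` tuples are hypothetical, and whitespace handling is omitted.

```python
import re


def sketch_to_placeholders(text: str, base: str = "TEXINDENT") -> tuple[str, list[tuple[str, str]]]:
    stored: list[tuple[str, str]] = []

    def mask(match: re.Match) -> str:
        # Number the placeholders in order of appearance, starting at 1.
        name = f"-{base}-INLINEMATH-{len(stored) + 1:d}-"
        stored.append((name, match.group(0)))
        return name

    # Unescaped $...$ on the same or several lines, without nested "$".
    masked = re.sub(r"(?<!\\)\$[^$]*\$", mask, text)
    return masked, stored


def sketch_from_placeholders(text: str, stored: list[tuple[str, str]]) -> str:
    for name, content in stored:
        text = text.replace(name, content)
    return text


original = r"If $a < b$ then $b > a$."
masked, stored = sketch_to_placeholders(original)
restored = sketch_from_placeholders(masked, stored)
```

Between the two calls, the masked text can be reformatted freely, because the math content is out of reach of any text-level rule.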