Python module
Overview
Manipulate LaTeX files and BibTeX databases
texplain.TeX | Interpret TeX file to allow simple manipulations.
texplain.bib_select | Limit a BibTeX file to a list of keys.
Indent LaTeX files
texplain.indent | Indent text.
Support functions
texplain.environments | Return list with present environments.
texplain.Placeholder | Placeholder for text.
texplain.text_to_placeholders | Replace text with placeholders.
texplain.text_from_placeholders | Replace placeholders with original text.
texplain.find_commented | Find comments.
texplain.is_commented | Per character, whether it corresponds to commented text.
texplain.remove_comments | Remove comments from a string.
Details
- class texplain.GeneratePlaceholder(base: str, name: str)#
Class to generate a new placeholder. The following placeholder is generated every time the object is called:
-{base}-{name}-{i:d}-
For example:
>>> gen = GeneratePlaceholder(base="foo", name="bar")
>>> gen()
'-foo-bar-1-'
>>> gen()
'-foo-bar-2-'
- Parameters:
base – The base of the placeholder.
name – The name of the placeholder.
- property search_placeholder: str#
Return the regex that can be used to search for the placeholder.
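The counter behaviour described above can be sketched without the library. CounterPlaceholder below is a hypothetical stand-in, not the library class, and the regex returned by its search_placeholder is an assumption about what such a pattern could look like.

```python
import re


class CounterPlaceholder:
    """Hypothetical stand-in for GeneratePlaceholder (not the library class)."""

    def __init__(self, base: str, name: str):
        self.base = base
        self.name = name
        self.i = 0

    def __call__(self) -> str:
        # Each call increments the counter and returns "-{base}-{name}-{i:d}-".
        self.i += 1
        return f"-{self.base}-{self.name}-{self.i:d}-"

    @property
    def search_placeholder(self) -> str:
        # Assumed pattern: the fixed prefix followed by any integer counter.
        return rf"-{re.escape(self.base)}-{re.escape(self.name)}-\d+-"


gen = CounterPlaceholder(base="foo", name="bar")
first = gen()
second = gen()
# One regex finds every placeholder generated by this object.
found = re.findall(gen.search_placeholder, f"a {first} b {second} c")
```

Keeping a single regex per generator is what makes a batch search over many placeholders cheap.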
- class texplain.Placeholder(placeholder: str, content: str, space_front: str | None = None, space_back: str | None = None, ptype: PlaceholderType | None = None, search_placeholder: str | None = None)#
Placeholder for text. This class stores the text to be replaced by a placeholder and the placeholder itself. In addition, it can store the whitespace before and after the placeholder.
- Parameters:
placeholder – The placeholder to use.
content – The text replaced by the placeholder.
space_front – The whitespace before the placeholder.
space_back – The whitespace after the placeholder.
ptype – The type of placeholder, see PlaceholderType.
search_placeholder – The regex used to search for the placeholder (optional, but it greatly speeds up batch searches).
- classmethod from_text(placeholder: str, text: str, start: int, end: int, ptype: PlaceholderType | None = None, search_placeholder: str | None = None)#
Replace text with placeholder. Save the content and the current whitespace before and after the placeholder. To restore the original text precisely:
placeholder, text = Placeholder.from_text(placeholder, text, start, end)
text = placeholder.to_text(text)
- Parameters:
placeholder – The placeholder to use.
text – The text to consider.
start – The start index of text to be replaced by the placeholder.
end – The end index of text to be replaced by the placeholder.
ptype – The type of placeholder, see PlaceholderType.
search_placeholder – The regex used to search for the placeholder.
- Returns:
(Placeholder, text), where the placeholder is inserted in text.
- to_text(text: str, index: int | None = None, keep_placeholder: bool = False) str #
Replace placeholder with content. If the whitespace before and after the placeholder is stored, it is restored.
- Parameters:
text – Text.
index – The index of the placeholder.
keep_placeholder – If True, keep the placeholder and change only the whitespace.
- Returns:
Text with placeholder replaced by content.
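The round trip of from_text and to_text can be illustrated with a minimal sketch. SimplePlaceholder is a hypothetical, simplified class written for this example; how the library itself stores and restores whitespace may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class SimplePlaceholder:
    """Hypothetical, simplified stand-in for Placeholder."""

    placeholder: str
    content: str
    space_front: str
    space_back: str

    @classmethod
    def from_text(cls, placeholder: str, text: str, start: int, end: int):
        # Record the whitespace directly before and after the replaced span.
        front = re.search(r"\s*$", text[:start]).group(0)
        back = re.match(r"\s*", text[end:]).group(0)
        obj = cls(placeholder, text[start:end], front, back)
        return obj, text[:start] + placeholder + text[end:]

    def to_text(self, text: str) -> str:
        # Replace the placeholder, and whatever whitespace now surrounds it,
        # by the stored whitespace and the stored content.
        pattern = r"\s*" + re.escape(self.placeholder) + r"\s*"
        restored = self.space_front + self.content + self.space_back
        return re.sub(pattern, lambda m: restored, text, count=1)


original = "foo  $a = b$\n bar"
start = original.index("$")
end = original.rindex("$") + 1
ph, replaced = SimplePlaceholder.from_text("-X-MATH-1-", original, start, end)
# Simulate a formatter that changed the whitespace around the placeholder.
reformatted = replaced.replace("  -X-MATH-1-\n ", " -X-MATH-1- ")
roundtrip = ph.to_text(reformatted)
```

Because the surrounding whitespace is stored with the placeholder, the original text is recovered even after the whitespace was altered in between.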
- class texplain.PlaceholderType(value)#
Type of placeholder. The placeholders’ practical definition is in text_to_placeholders(). The intended use is:
line: A single line of content (no newline).
inline_comment: A comment that is preceded by some content (no newline).
comment: A comment that is the only content on that line (no newline).
tabular: Block \begin{tabular} ... \end{tabular}.
math: Block of display math. E.g. \begin{equation} ... \end{equation}.
inline_math: Block of inline math. E.g. $ ... $.
math_line: A single line of content in math mode (no newline).
environment: Block of environment: \begin{...} ... \end{...}.
command: Block of command: \command[...]{...}.
curly_braced: Block of curly braced content: {...}.
command_like: Block of command or curly_braced.
texindent_block: Block of % \begin{texindent} ... % \end{texindent}.
noindent_block: Block of % \begin{noindent} ... % \end{noindent}.
verbatim: Block of \begin{verbatim} ... \end{verbatim}.
let_command: Definition \let...
newif_command: Definition \newif...
Except for line, math_line, comment, and inline_comment, all placeholders can span more than one line.
- command = 9#
- command_like = 11#
- comment = 3#
- curly_braced = 10#
- environment = 8#
- inline_comment = 2#
- inline_math = 6#
- let_command = 15#
- line = 1#
- math = 5#
- math_line = 7#
- newif_command = 16#
- noindent_block = 13#
- tabular = 4#
- texindent_block = 12#
- verbatim = 14#
- class texplain.TeX(text: str)#
Interpret TeX file to allow simple manipulations. The manipulations are the member functions.
- Parameters:
text – LaTeX code.
- change_label(old_label: str, new_label: str, overwrite: bool = False)#
Change label in \label{...} and \ref{...} (-like) commands.
- Parameters:
old_label – Old label.
new_label – New label.
overwrite – Overwrite existing labels.
- changed()#
Check if the document has changed.
- citation_keys() list[str] #
Read the citation keys in the TeX file (keys in \cite{...}, \citet{...}, \citep{...}).
- Returns:
Unique list of keys in the order of appearance.
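A regex-based sketch of collecting unique keys in order of appearance follows. The function name citation_keys_sketch and the pattern are assumptions made for illustration; the library may recognise more citation commands or handle edge cases differently.

```python
import re


def citation_keys_sketch(text: str) -> list[str]:
    """Collect unique citation keys in order of appearance (illustrative only)."""
    keys = []
    # Match \cite, \citet, \citep, optionally followed by [...] options.
    pattern = r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]*)\}"
    for match in re.finditer(pattern, text):
        # A single command can hold several comma-separated keys.
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


tex = r"See \cite{a,b}, \citet{c} and \citep[p.~2]{b, d}."
keys = citation_keys_sketch(tex)
```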
- config_files() list[str] #
Read configuration files in the directory of the TeX file.
- Returns:
List of filenames.
- environments() list[str] #
Return list with present environments (between \begin{...} ... \end{...}).
- find_by_extension(ext: str) list[str] #
Find all files with a certain extension in the directory of the TeX file.
- Parameters:
ext – File extension.
- Returns:
List of filenames.
- fix_quotes()#
Replace:
"..." by ``...''.
'...' by `...'.
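The two replacements can be sketched with regular expressions. fix_quotes_sketch is a hypothetical function; the rule used here to leave apostrophes (as in "don't") untouched is an assumption, and the library may decide this differently.

```python
import re


def fix_quotes_sketch(text: str) -> str:
    """Convert straight quotes to LaTeX quotes (illustrative only)."""
    # "..." -> ``...''
    text = re.sub(r'"([^"\n]*)"', r"``\1''", text)
    # '...' -> `...'
    # Assumed rule: the opening quote must not follow a word character, so
    # that apostrophes inside words are skipped.
    text = re.sub(r"(?<![\w'`])'([^'\n]+)'(?![\w'])", r"`\1'", text)
    return text


sample = "He said \"hello\" and 'bye'."
fixed = fix_quotes_sketch(sample)
```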
- float_filenames(cmd: str = '\\includegraphics') list[tuple[str]] #
Extract the keys of ‘float’ commands (e.g. \includegraphics{...}, \bibliography{...}) and reconstruct their filenames. This operation is read-only.
- Parameters:
cmd – The command to look for.
- Returns:
A list [('key', 'filename')] in order of appearance.
- format_labels(prefix: str | None = None)#
Format all labels as:
sec:...: Section labels.
ch:...: Chapter labels.
fig:...: Figure labels.
tab:...: Table labels.
eq:...: Math labels.
note:...: Footnote.
misc:...: Anything else.
- Parameters:
prefix – Add optional prefix. E.g. key:prefix:...
- classmethod from_file(filename: str)#
Read from file.
- Parameters:
filename – Path to the file to read.
- get()#
Return document.
- labels() list[str] #
Return list of labels (in order of appearance).
- remove_commentlines()#
Remove lines that are entirely a comment.
- remove_comments()#
Remove comments from the main text.
- rename_float(old: str, new: str, cmd: str = '\\includegraphics')#
Rename a key of a ‘float’ command (e.g. \includegraphics{...}, \bibliography{...}). This changes the TeX file.
- Parameters:
old – Old key.
new – New key.
cmd – The command to look for.
- replace_command(cmd: str, replace: str, ignore_commented: bool = False)#
Replace command. For example:
Remove the command:
replace_command(r"{\TG}[1]", "")
>>> This is a \TG{I would replace this} text.
<<< This is a text.
Select a part of the command:
replace_command(r"{\TG}[2]", "#1")
>>> This is a \TG{text}{test}.
<<< This is a text.
Change the command:
replace_command(r"{\TG}[2]", r"\mycomment{#1}{#2}")
>>> This is a \TG{text}{test}.
<<< This is a \mycomment{text}{test}.
- Parameters:
cmd – The command’s definition. Given \newcommand{cmd}[args]{def} you should specify {cmd}[args], or {cmd} (or even cmd), which defaults to {cmd}[1].
replace – The def part (curly braces around are optional). As in LaTeX, replacement is done on #1, #2, …
ignore_commented – If True, the command is not replaced if it is commented out.
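The substitution of #1, #2, … can be sketched with a small brace matcher. Both functions below are hypothetical and deliberately simplified: the command is given by name and argument count instead of the {cmd}[args] string, and optional arguments, comments, and escaped braces are not handled.

```python
def read_braced(text: str, start: int) -> int:
    """Return the index just past the "}" matching the "{" at ``start``."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")


def replace_command_sketch(text: str, name: str, nargs: int, replace: str) -> str:
    """Replace \\name{..}...{..} by ``replace``, filling in #1, #2, ..."""
    needle = "\\" + name
    out = []
    pos = 0
    while True:
        start = text.find(needle, pos)
        if start < 0:
            out.append(text[pos:])
            return "".join(out)
        cursor = start + len(needle)
        args = []
        # Read up to ``nargs`` consecutive {...} groups.
        while len(args) < nargs and text[cursor:cursor + 1] == "{":
            end = read_braced(text, cursor)
            args.append(text[cursor + 1:end - 1])
            cursor = end
        if len(args) < nargs:
            # Not a full match (e.g. a longer command name): copy and move on.
            out.append(text[pos:cursor])
            pos = cursor
            continue
        new = replace
        for i, arg in enumerate(args, start=1):
            new = new.replace(f"#{i}", arg)
        out.append(text[pos:start] + new)
        pos = cursor


source = r"This is a \TG{text}{test}."
selected = replace_command_sketch(source, "TG", 2, "#1")
changed = replace_command_sketch(source, "TG", 2, r"\mycomment{#1}{#2}")
```

Counting brace depth, rather than searching for the next "}", is what lets an argument contain nested groups.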
- use_cleveref()#
Replace:
Eq.~\eqref{...}
Fig.~\ref{...}
...
By:
\cref{...}
everywhere.
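A regex sketch of this replacement is given below. The function name and the list of recognised prefixes (Eq., Fig., Tab., Sec.) are assumptions for illustration; the set of patterns the library recognises is not stated here.

```python
import re


def use_cleveref_sketch(text: str) -> str:
    """Replace "Eq.~\\eqref{...}", "Fig.~\\ref{...}", ... by "\\cref{...}"."""
    # Assumed prefixes; a real implementation may cover more variants.
    pattern = r"\b(?:Eqs?|Figs?|Tabs?|Secs?)\.?[~ ]\\(?:eq)?ref\{([^}]*)\}"
    return re.sub(pattern, r"\\cref{\1}", text)


before = r"See Eq.~\eqref{eq:a} and Fig.~\ref{fig:b}."
after = use_cleveref_sketch(before)
```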
- texplain.bib_select(text: str, keys: list[str]) str #
Limit a BibTeX file to a list of keys.
- Parameters:
text – The BibTeX file as string.
keys – The list of keys to select.
- Returns:
The (reduced) BibTeX file, as string.
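Selecting entries by key can be sketched as follows. bib_select_sketch is a hypothetical function that assumes well-formed entries of the form @type{key, ...} with balanced braces; it is not the library's implementation.

```python
import re


def bib_select_sketch(text: str, keys: list[str]) -> str:
    """Keep only the BibTeX entries whose key is in ``keys`` (illustrative)."""
    selected = []
    for match in re.finditer(r"@\w+\s*\{\s*([^,\s]+)\s*,", text):
        if match.group(1) not in keys:
            continue
        # Walk forward until the brace that opened the entry is closed.
        depth = 0
        for i in range(match.start(), len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    selected.append(text[match.start():i + 1])
                    break
    return "\n\n".join(selected)


bib = """@article{foo,
  title = {A {B}},
}

@book{bar,
  title = {C},
}
"""
reduced = bib_select_sketch(bib, ["bar"])
```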
- texplain.environments(text: str) list[str] #
Return list with present environments. This corresponds to the text between \begin{...} and \end{...}.
- texplain.find_command(text: str, name: str | None = None, regex: str = '(?<!\\\\)(\\\\)([a-zA-Z\\@]+)(\\*?)', is_comment: list[bool] | None = None) list[list[tuple[int]]] #
Find the indices of commands, their options, and their arguments.
- Parameters:
text – Text.
name – Name of command without backslash (e.g. "textbf").
regex – Regex used to match the command name.
is_comment – Per character of text, True if the character is part of a comment. Default: search for comments using is_commented().
- Returns:
List of indices of commands and their arguments:
[[(name_start, name_end), (arg1_start, arg1_end), ...], ...]
Note the definition is such that one can extract the j-th component of the i-th command as follows: text[cmd[i][j][0]:cmd[i][j][1]].
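The index layout of the return value can be illustrated with a small sketch that reuses the default regex from the signature. find_command_sketch is hypothetical: it ignores options in [...] and comments, and whether the library's indices include the backslash and the braces is an assumption made here.

```python
import re

# The default regex from the signature, written as a raw string.
COMMAND = re.compile(r"(?<!\\)(\\)([a-zA-Z\@]+)(\*?)")


def find_command_sketch(text: str, name: str) -> list[list[tuple[int, int]]]:
    """Return [[(name_start, name_end), (arg1_start, arg1_end), ...], ...]."""
    found = []
    for match in COMMAND.finditer(text):
        if match.group(2) != name:
            continue
        components = [(match.start(), match.end())]
        cursor = match.end()
        # Collect consecutive {...} arguments by counting brace depth.
        while cursor < len(text) and text[cursor] == "{":
            depth = 0
            for end in range(cursor, len(text)):
                depth += {"{": 1, "}": -1}.get(text[end], 0)
                if depth == 0:
                    break
            components.append((cursor, end + 1))
            cursor = end + 1
        found.append(components)
    return found


text = r"A \textbf{bold {x}} and \emph{it}."
cmd = find_command_sketch(text, "textbf")
name_part = text[cmd[0][0][0]:cmd[0][0][1]]
arg_part = text[cmd[0][1][0]:cmd[0][1][1]]
```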
- texplain.find_commented(text: str) list[list[int]] #
Find comments.
The output is such that one can find the comment text as follows:
for i, j in find_commented(text):
    print(text[i:j])  # i is the index of "%"
- Parameters:
text – Text.
- Returns:
List of indices of the beginning and end of the comments.
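A regex sketch that yields index pairs of this shape is shown below. find_commented_sketch is hypothetical; it treats any "%" that is not preceded by a backslash as the start of a comment running to the end of the line, which is a simplification (for example, "\\%" is not handled).

```python
import re


def find_commented_sketch(text: str) -> list[list[int]]:
    """Return [[start, end], ...] for each comment (illustrative only)."""
    # "%" starts a comment unless escaped as "\%"; "." stops at the newline.
    return [[m.start(), m.end()] for m in re.finditer(r"(?<!\\)%.*", text)]


text = "a % one\nb \\% c % two\n"
indices = find_commented_sketch(text)
comments = [text[i:j] for i, j in indices]
```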
- texplain.find_matching(text: str, opening: str, closing: str, ignore_escaped: bool = True, ignore_commented: bool = False, escape: bool = True, opening_match: int = 0, closing_match: int = 0, return_array: bool = False) dict #
Find matching ‘brackets’.
- Parameters:
text – The string to consider.
opening – The opening bracket (e.g. “(”, “[”, “{”).
closing – The closing bracket (e.g. “)”, “]”, “}”).
ignore_escaped – Ignore escaped brackets (e.g. “\(”, “\[”, “\{”, “\)”, “\]”, “\}”).
ignore_commented – Ignore any text that is commented (e.g. “% …”).
escape – If True, opening and closing are escaped.
opening_match – Select index of begin (0) or end (1) of opening bracket match.
closing_match – Select index of begin (0) or end (1) of closing bracket match.
return_array – If True, return NumPy-array of indices instead of dictionary.
- Returns:
Dictionary with {index_opening: index_closing}.
- texplain.find_matching_index(opening: ArrayLike, closing: ArrayLike, return_array: bool = False) dict #
Find matching ‘brackets’, based on a list of indices corresponding to opening and closing ‘brackets’.
- Parameters:
opening – Indices of the opening brackets.
closing – Indices of the closing brackets.
return_array – If True, return NumPy-array of indices instead of dictionary.
- Returns:
Dictionary with {index_opening: index_closing}.
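Pairing opening and closing indices is a classic stack problem. The sketch below uses a hypothetical function match_indices; silently skipping unmatched closing brackets is an assumption, and the library may treat that case differently.

```python
def match_indices(opening: list[int], closing: list[int]) -> dict[int, int]:
    """Pair each opening index with its closing index using a stack."""
    # Merge both lists into one stream of events, sorted by position.
    events = sorted([(i, 1) for i in opening] + [(i, -1) for i in closing])
    stack = []
    pairs = {}
    for index, kind in events:
        if kind == 1:
            stack.append(index)
        elif stack:
            # A closing bracket matches the most recent unmatched opening one.
            pairs[stack.pop()] = index
    return dict(sorted(pairs.items()))


text = "a{b{c}d}e{f}"
opening = [i for i, c in enumerate(text) if c == "{"]
closing = [i for i, c in enumerate(text) if c == "}"]
matching = match_indices(opening, closing)
```

The stack guarantees that nested brackets are paired innermost first, so the outer pair spans the inner one.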
- texplain.indent(text: str, indentation: str = ' ', rstrip: bool = True, lstrip: bool = True, squashlines: bool = True, squashspaces: bool = True, symbols: bool = True, environment: bool = True, argument: bool = True, inlinemath: bool = True, linebreak: bool = True, itemize: bool = True, sentence: bool = True, alignment: bool = True, texindent: bool = True, noindent: bool = True) str #
Indent text.
- Parameters:
text – The text to indent.
indentation –
Set indentation of lines between:
\begin{...}[...]{...} and \end{...}.
\[ and \].
{ and }.
[ and ] (as command option).
Comment lines follow indentation. Requires: lstrip, inlinemath, environment. To switch off indentation, set indentation="".
rstrip – Remove trailing spaces on all lines.
lstrip – Remove all leading spaces before applying indentation.
squashlines – Reduce the maximum number of consecutive blank lines to 2.
squashspaces – Reduce the maximum number of consecutive spaces to 1.
symbols – In math-mode: all symbols are separated by a space.
environment –
\begin{...}[...]{...} and \end{...} (and \[ and \]) are placed on separate lines.
argument –
Any option or argument that spans more than one line is placed on separate lines. For example:
xxx { This is a very long argument that is more than one line long. } yyy
is formatted to:
xxx { This is a very long argument that is more than one line long. } yyy
inlinemath – Inline math is placed on one line.
linebreak – \\ is followed by a newline.
itemize – Each \item is placed on a separate line.
sentence –
One sentence per line. Every sentence should start on a new line, and it should be (as much as possible) on a single line. The following rules of thumb are followed:
A sentence ends with:
A period, question mark, or exclamation mark.
\begin{...} or \end{...}.
Two blank lines.
\\
The end of an argument (} or ]), see below.
A command on the next line.
Commands and inline math are treated as a single word. Formatting is applied on the arguments of commands.
Requires: rstrip, lstrip, squashspaces.
alignment –
If the resulting line is less than 100 characters, columns in tabular environments are aligned at &, and \\ is aligned as well. In other cases single spaces are placed around & and before \\.
Requires: environment.
texindent –
Custom formatting in blocks:
% \begin{texindent}{...}
...
% \end{texindent}
where the {...} argument is a comma-separated list of options of this function; for example:
% \begin{texindent}{sentence=False, inlinemath=False}
...
% \end{texindent}
noindent –
Verbatim environments and everything between % \begin{noindent} ... % \end{noindent} are not formatted.
- Returns:
The indented text.
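Three of the simpler options (rstrip, squashspaces, squashlines) can be sketched with regular expressions. squash_sketch is a hypothetical helper that follows the descriptions above; it does not reproduce the library's handling of math, comments, or verbatim blocks.

```python
import re


def squash_sketch(text: str) -> str:
    """Apply rstrip, squashspaces and squashlines (illustrative only)."""
    # rstrip: remove trailing spaces and tabs on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # squashspaces: reduce runs of spaces inside a line to a single space
    # (leading spaces are left alone in this sketch).
    text = re.sub(r"(?<=\S) {2,}", " ", text)
    # squashlines: allow at most 2 consecutive blank lines,
    # that is at most 3 consecutive newline characters.
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    return text


messy = "a   b  \n\n\n\n\nc"
clean = squash_sketch(messy)
```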
- texplain.is_commented(text: str) ndarray[Any, dtype[bool_]] #
Check, per character, whether it corresponds to commented text.
- Parameters:
text – Text.
- Returns:
Array of booleans of size len(text).
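The per-character mask can be sketched with a single pass over the text. is_commented_sketch is hypothetical and returns a plain list rather than a NumPy array to stay dependency-free; treating the newline itself as not commented is an assumption.

```python
def is_commented_sketch(text: str) -> list[bool]:
    """Return one boolean per character: True inside a comment."""
    flags = []
    in_comment = False
    previous = ""
    for char in text:
        if char == "\n":
            # A comment ends at the end of the line.
            in_comment = False
        elif char == "%" and previous != "\\":
            # An unescaped "%" starts a comment.
            in_comment = True
        flags.append(in_comment)
        previous = char
    return flags


text = "ab % c\nd \\% e"
flags = is_commented_sketch(text)
commented = "".join(c for c, f in zip(text, flags) if f)
```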
- texplain.remove_comments(text: str) str #
Remove comments from a string.
- Parameters:
text – Text.
- Returns:
Text without comments.
- texplain.texcleanup(args: list[str])#
Command-line tool to copy to a clean output directory, see --help.
- texplain.texindent_cli(args: list[str])#
Indent TeX file, see --help.
- texplain.texplain(args: list[str])#
Command-line tool to copy to a clean output directory, see --help.
- texplain.text_from_placeholders(text: str, placeholders: list[texplain.Placeholder], keep_placeholders: bool = False) str #
Replace placeholders with original text. The whitespace before and after the placeholder is modified to match Placeholder.space_front and Placeholder.space_back.
- Parameters:
text – Text with placeholders.
placeholders – List of placeholders.
keep_placeholders – If True, the placeholders are kept (they are merely positioned).
- Returns:
Text with content of the placeholders.
- texplain.text_to_placeholders(text: str, ptypes: list[texplain.PlaceholderType], base: str = 'TEXINDENT', placeholders_comments: list[texplain.Placeholder] | None = None) tuple[str, list[texplain.Placeholder]] #
Replace text with placeholders. The following placeholders are supported:
PlaceholderType.noindent_block:
% \begin{noindent}
...
% \end{noindent}
is replaced with
-BASE-NOINDENT-1-
PlaceholderType.texindent_block:
% \begin{texindent}
...
% \end{texindent}
is replaced with
-BASE-TEXINDENT-1-
PlaceholderType.verbatim:
\begin{verbatim} ... \end{verbatim}
is replaced with
-BASE-VERBATIM-1-
PlaceholderType.comment:
A comment on a line that contains no other text.
% ...
is replaced with
-BASE-COMMENT-1-
PlaceholderType.inline_comment:
A comment following some other text on the same line.
xxx % ...
is replaced with
xxx -BASE-INLINE-COMMENT-1-
PlaceholderType.inline_math:
$...$
is replaced with
-BASE-INLINE-MATH-1-
Also looks for \(...\) and \begin{math}...\end{math}.
PlaceholderType.math:
\begin{equation} ... \end{equation}
is replaced with
-BASE-MATH-1-
Also looks for \[...\], \begin{equation*}...\end{equation*}, \begin{align}...\end{align}, and \begin{align*}...\end{align*}.
PlaceholderType.math_line:
A line of display mode math (see PlaceholderType.math)
\begin{equation}
...
...
\end{equation}
is replaced with
\begin{equation}
-BASE-MATH-LINE-1-
-BASE-MATH-LINE-2-
\end{equation}
PlaceholderType.environment:
\begin{...} ... \end{...}
is replaced with
-BASE-ENVIRONMENT-1-
PlaceholderType.tabular:
\begin{tabular} ... \end{tabular}
is replaced with
-BASE-TABULAR-1-
PlaceholderType.command:
\foo[...]{...}
is replaced with
-BASE-COMMAND-1-
PlaceholderType.command_like:
{...}
\foo[...]{...}
{\foo[...]{...}}
is replaced with
-BASE-COMMAND-1-
-BASE-COMMAND-2-
-BASE-COMMAND-3-
PlaceholderType.curly_braced:
{...}
is replaced with
-BASE-CURLY-BRACED-1-
PlaceholderType.let_command:
\let\iffoo
is replaced with
-BASE-LET-1-
PlaceholderType.newif_command:
\newif\iffoo
is replaced with
-BASE-NEWIF-1-
- Parameters:
text – Text.
ptypes – List of placeholder types to replace.
base – Base string for placeholders.
placeholders_comments – List of placeholders that are comments (needed to search commands).
- Returns:
(text, placeholders) with
text: Text with placeholders.
placeholders: List of placeholders.
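The replace-and-restore cycle can be sketched for a single placeholder type, inline math written as $...$. Both functions are hypothetical and store plain (name, content) tuples instead of Placeholder objects; the other types and the whitespace bookkeeping are left out.

```python
import re


def inline_math_to_placeholders(text: str, base: str = "TEXINDENT"):
    """Replace each $...$ by -BASE-INLINE-MATH-i- (illustrative only)."""
    placeholders = []

    def substitute(match):
        name = f"-{base}-INLINE-MATH-{len(placeholders) + 1}-"
        placeholders.append((name, match.group(0)))
        return name

    # Unescaped "$", anything but "$" or a newline, then an unescaped "$".
    new = re.sub(r"(?<!\\)\$[^$\n]*(?<!\\)\$", substitute, text)
    return new, placeholders


def placeholders_to_text(text: str, placeholders) -> str:
    """Put the stored content back in place of each placeholder."""
    for name, content in placeholders:
        text = text.replace(name, content)
    return text


source = r"Let $a$ cost \$5 and $b = c$."
replaced, stored = inline_math_to_placeholders(source)
restored = placeholders_to_text(replaced, stored)
```

While the placeholders are in place, other formatting steps can treat each formula as a single word without looking inside it.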