What is the difference between the backslash followed by a space (\ ) and the tilde character (~)? Both seem to create a non-breaking space, but I'm unsure of how they differ in behavior. When should each of these be used, and how do they affect the spacing in the document?
-
What's the basis for the claim that the backslash-space control sequence creates a non-breaking space? – Mico, commented Dec 11 at 23:09
-
I think I got them confused :/ – Emanuele Nardi, commented Dec 11 at 23:34
-
I was considering their role within an expl3 class. – Emanuele Nardi, commented 2 days ago
4 Answers
\  (backslash followed by a space) creates a space that allows line breaks; it is mostly used after command names, where a space would normally be gobbled. Note that it always makes a "normal" interword space, even after end-of-sentence punctuation such as a period (.).
~ inserts a no-break penalty as well as a space (using \ ).
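A minimal sketch of the difference in everyday use, assuming the standard article class; the sentences are made up for illustration:

```latex
\documentclass{article}
\begin{document}
% Without "\ " the space after the command name is gobbled,
% so the logo runs straight into the following word.
\TeX is a typesetting system.

% "\ " restores an ordinary interword space; a line break is allowed there.
\TeX\ is a typesetting system.

% "~" also produces a space, but no line break can occur before the "1".
See Figure~1 for details.
\end{document}
```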
If you execute
\ShowCommand\ %
\ShowCommand~
in LaTeX you will see
> \ =\ .
<argument> \
l.3 \ShowCommand\
%
?
> ~=\protected macro:
->\ifincsname \expandafter ~\else \expandafter \nobreakspace \fi .
<argument> ~
l.4 \ShowCommand~
?
So \  is the TeX primitive, not defined by LaTeX. ~ is normally \nobreakspace, but inside a csname (which includes file name arguments) the ~ just expands to itself.
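The csname behavior can be tried with a small test file. This is only a sketch: it assumes a LaTeX kernel recent enough to show the \ifincsname definition above, and the name demo~name is a hypothetical control sequence name chosen for illustration.

```latex
\documentclass{article}
\begin{document}
% "demo~name" is a hypothetical name used only for this test.
% Inside \csname...\endcsname the active tilde expands to a plain
% tilde character, so it simply becomes part of the name.
\expandafter\def\csname demo~name\endcsname{tilde kept in the name}
\csname demo~name\endcsname

% In ordinary text the same character acts as \nobreakspace.
Figure~1
\end{document}
```

With an older kernel, where ~ has a different definition, the first two lines of the body may raise an error instead.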
A line break is definitely possible at \ . It's a TeX primitive that adds interword glue computed as if the space factor were 1000, so you don't get an extended end-of-sentence space in Dr.\ Treemunch, but no penalty is inserted to avoid or even discourage line breaks.
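One way to see the effect on the space after the period is to compare natural widths. The following is a sketch assuming the article class with its default \nonfrenchspacing; the three length names are hypothetical helpers.

```latex
\documentclass{article}
% Hypothetical length registers used only for this comparison.
\newlength\widthplain
\newlength\widthctrl
\newlength\widthtie
\begin{document}
\settowidth\widthplain{Dr. Treemunch}  % period followed by an ordinary space
\settowidth\widthctrl{Dr.\ Treemunch}  % period followed by a control space
\settowidth\widthtie{Dr.~Treemunch}    % period followed by a tie

Ordinary space: \the\widthplain\par
Control space: \the\widthctrl\par
Tie: \the\widthtie
\end{document}
```

Under these assumptions the first width should come out larger than the other two, which should agree with each other; after \frenchspacing all three should coincide.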
In plain TeX the definition of the active character ~ is
\nobreak\ %
(the percent character is just to make the space evident). The definition in LaTeX is basically the same, with some shenanigans in order to behave when writing onto auxiliary files or if found in labels.
By the way, \nobreak is a shorthand for \penalty\@M and \@M stands for 10000.
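To make the ingredients concrete, here is a hand-made tie built from the same pieces. It is a sketch, not the kernel's definition: \DemoTie is a hypothetical name, and the \leavevmode is added on the assumption that the command should also work at the start of a paragraph.

```latex
\documentclass{article}
% \DemoTie is a hypothetical command for illustration only:
% enter horizontal mode, forbid a break (penalty 10000), add a control space.
\newcommand*\DemoTie{\leavevmode\penalty10000\ }
\begin{document}
% In a narrow box, neither "Figure 1" should be split across lines.
\parbox{3cm}{Some text that wraps near Figure\DemoTie 1 and near Figure~1.}
\end{document}
```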
-
Vaguely related: At least with lualatex, I have discovered that \char160 or \char"00A0 inserts the actual non-breaking space character (from the font) there. Useful with a custom font that kerns against various spaces. Since ~ is not the font character, but is generated, it does not kern. Not sure if this is repeatable for all users, or just something that works with my other custom code. – rallg, commented 2 days ago
-
@rallg The character you mention doesn't participate in space stretching or shrinking and has fixed width. – egreg, commented 2 days ago
-
Yes, fixed width. For me, intended behavior. I can still use ~ when I want stretch/shrink. There are a few commercial fonts that kern against space characters, but TeX users are unlikely to use those fonts. – rallg, commented 2 days ago
In the following:
- ␣ denotes a space character.
- stuff like this denotes code written to a .tex-input file.
- things like \token and X12 denote tokens.
Tokens exist only while the TeX program is running.
Tokens can come into being in two ways:
- In the processing stage of tokenization, i.e., in the process of looking at the characters that occur in the .tex-input-file and taking them for instructions for creating explicit character tokens or gathering characters that form names of control sequence tokens and creating control sequence tokens and appending the tokens created to the token stream.
- In the processing stage of expansion when expandable tokens and tokens forming their arguments get removed from the token-stream and instead of them other constellations of tokens are inserted into the token-stream.
In the notation above, \token is a control sequence token whose name is "token", and X12 is an explicit character token. Character tokens have the properties "category" and "character code". The category of X12 is denoted by the subscript, i.e., it is 12(other). Its character code is the number of the codepoint of the character X in TeX's internal character encoding scheme.
With traditional TeX engines, TeX's internal character encoding scheme is ASCII. With the TeX engines XeTeX and LuaTeX, TeX's internal character encoding scheme is Unicode, of which ASCII is a strict subset.
The "class" of control sequences (without the word "tokens") at the time of running the TeX program and reading and tokenizing the .tex-input file divides into two "sub-classes": the "sub-class" of active character tokens and the "sub-class" of control sequence tokens (with the word "tokens"). The "sub-class" of control sequence tokens in turn divides into two "sub-sub-classes": the "sub-sub-class" of control word tokens and the "sub-sub-class" of control symbol tokens.
The name of a control symbol token, not counting the leading backslash (or other leading character of category 0(escape)), consists of a single character which is not of category 11(letter). The name of a control word token, again not counting the leading escape character, consists either of a single character of category 11(letter) or of several characters.
So during tokenization, right after encountering a backslash (a character of category 0(escape)), TeX gathers the characters that form the name of a control sequence token. The category of the first character of the name is the criterion by which TeX decides how to proceed: if it is not 11(letter), TeX is gathering the name of a control symbol token and stops after that one character; if it is 11(letter), TeX is gathering the name of a control word token and continues until it encounters a character whose category code is not 11(letter).
\␣ - ⟨control space⟩:
Under the usual category code régime, where the space character is not assigned the category code 11(letter) but the category code 10(space), \␣, i.e., ⟨control space⟩, gets tokenized as a control symbol token \␣.
When TeX writes control word tokens unexpanded to text file or to screen, it appends a space character. It does not do so with control symbol tokens.
Each ⟨control space⟩ in a sequence of ⟨control spaces⟩ is tokenized and counts. Be aware, however, that when TeX tokenizes a control word token or a ⟨control space⟩, the reading apparatus is switched to state S (skipping blanks); subsequent characters of category 10(space) occurring in the .tex-input-file are therefore skipped and do not append any tokens to the token-stream.
Tokenizing \␣␣\␣␣ and tokenizing \␣\␣ both yield the control symbol tokens \␣\␣.
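The skipping of blanks after a control space can be checked by comparing natural widths. This is a sketch assuming the article class; the two length names are hypothetical helpers.

```latex
\documentclass{article}
% Hypothetical length registers used only for this comparison.
\newlength\widthtight
\newlength\widthloose
\begin{document}
% Two control spaces, no extra blanks.
\settowidth\widthtight{a\ \ b}
% Two control spaces, each followed by extra blanks that TeX skips in state S.
\settowidth\widthloose{a\    \    b}
\ifdim\widthtight=\widthloose
  The widths agree: \the\widthtight.
\else
  The widths differ: \the\widthtight\ versus \the\widthloose.
\fi
\end{document}
```

If the extra blanks are indeed skipped, the first branch should be taken.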
When TeX tokenizes a control symbol token other than ⟨control space⟩, the reading apparatus is switched to state M (middle of line), so that a subsequent character of category 10(space) is not skipped but causes an explicit character token ␣10 of category 10(space) and character code 32 to be appended to the token-stream (32 is the number of the codepoint of the space character in TeX's internal character encoding scheme). In TeX jargon such a token is called an explicit space token.
(Note that during tokenization every character of category 10(space) gets tokenized as an explicit space token, i.e., as an explicit character token whose category is 10(space) and whose character code is 32, regardless of the number which the codepoint of the character in question has in TeX's internal character representation scheme. The exceptions are space characters that form the name of a control symbol token, space characters that are dropped because they occur behind a character of category 5(end of line) or 14(comment), which causes TeX to drop the remaining characters of the current line of .tex-input, and space characters that are skipped because TeX's reading apparatus is in state S (skipping blanks) or N (new line).)
So far control sequences have been classified according to tokenization and unexpanded writing.
You can also classify control sequences according to their meaning:
⟨control space⟩ usually is one of TeX's primitive control sequences.
The TeXbook says a few things about the primitive ⟨control space⟩:
In "Chapter 25: Summary of Horizontal Mode" you find:
\␣. A control-space command appends glue to the current list, using the same amount that a ⟨space token⟩ inserts when the space factor is 1000.
In "Chapter 13: Modes", somewhere right after exercise 13.1, you find:
TeX ignores blank spaces and blank lines (or \par commands) when it's in vertical or internal vertical mode, so you need not worry that such things might change the mode or affect a printed document. A control space (\␣) will, however, be regarded as the beginning of a paragraph; the paragraph will start with a blank space after the indentation.
In LaTeX ~ usually is an active character (category 13).
The last time this answer was updated, the definition of recent LaTeX 2ε-kernel's active tilde ~13 could be looked up in source2e.pdf, page 438, File 18: ltspace.dtx, Date: 2024/09/12 Version v1.3s.
Usually with recent LaTeX 2ε-kernels ~13 is defined as a \protected macro which in non-engine-protected expansion contexts expands to
\ifincsname\expandafter~12\else\expandafter\nobreakspace\fi
whereby ~12 behind \expandafter is of category 12(other). So inside \csname...\endcsname-expressions you just get a tilde character. Otherwise you get \nobreakspace which in turn is defined as a robust macro which in non-LaTeX-protected expansion contexts expands to
\leavevmode\nobreak\␣
so that ~ is basically the same as \␣.
But ~13
- due to \leavevmode, switches to horizontal mode (and thus starts a new paragraph) if TeX is in vertical mode or in internal vertical mode.
- due to \nobreak, forbids breaking the line (and discarding the horizontal glue) at the place where ~13 is encountered in the token-stream during typesetting.
When, due to tokenizing a control word token or a ⟨control space⟩ or an explicit space token, TeX's reading apparatus is in state S (skipping blanks), so that characters of category 10(space) occurring in the .tex-input file are skipped instead of leading to the appending of an explicit space token to the token stream, tilde characters occurring in the .tex-input-file are not skipped as long as the tilde-character is assigned a category other than 10(space) and other than 5(end of line) and other than 9(ignored) and other than 14(comment) and other than 15(invalid) while usually tilde is assigned the category 13(active). Thus under usual catcode régime each tilde of a sequence of tildes is tokenized and counts.
When TeX tokenizes ~13, the reading apparatus switches to state M(middle of line) so that a subsequent space character or tilde character occurring in the token stream is not skipped but does get tokenized.
Tokenizing ~␣~␣ yields the tokens ~13␣10~13␣10 while tokenizing ~~ yields the tokens ~13~13.
I.e., characters of category 10(space) occurring in the .tex-input-file behind ⟨control space⟩ are skipped while characters of category 10(space) occurring in the .tex-input-file behind ~ are tokenized as explicit space tokens.
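The same kind of width comparison illustrates that blanks behind a tilde are not skipped. Again a sketch assuming the article class, with hypothetical length names:

```latex
\documentclass{article}
% Hypothetical length registers used only for this comparison.
\newlength\widthties
\newlength\widthmixed
\begin{document}
% Two ties in a row: two unbreakable spaces.
\settowidth\widthties{a~~b}
% Tie, blank, tie, blank: each blank adds a further ordinary space.
\settowidth\widthmixed{a~ ~ b}
\ifdim\widthties=\widthmixed
  The widths agree: \the\widthties.
\else
  The widths differ: \the\widthties\ versus \the\widthmixed.
\fi
\end{document}
```

Because the blanks behind each tilde are tokenized as explicit space tokens, the second width should be larger and the second branch should be taken.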
Above it was said "In LaTeX ~ usually is an active character (category 13)".
In expl3-mode, i.e., after using \ExplSyntaxOn for switching to a category code régime more suitable for writing code using the routines of LaTeX's L3 programming layer, the character ~ is assigned category 10(space). Thus in expl3-mode, in the stage of tokenization, when TeX is not gathering the first character of the name of a control sequence token and when characters occurring in the .tex-input-file are not skipped/dropped otherwise, ~ is tokenized as an ordinary explicit space token ␣10
- which in the stage of expansion may serve as ⟨one optional space⟩.
- where in stages of typesetting in horizontal mode discardable horizontal glue is inserted whereby the current space factor is applied in the calculation of the length of that glue.
- where in verbatim-typesetting a space-character or a blank-character (␣) coming from the font for typesetting verbatim code is typeset.
- where a space character is written when writing to external text file or screen/terminal/console.
- ...
-
There are two redundant \expandafters in the macro beginning with \ifincsname. – wipet, commented 2 days ago
-
@wipet I don't fully understand. Maybe I made a mistake in trying to write up and explain things. The definition of current LaTeX 2e's active tilde, however, is in source2e.pdf, File 18: ltspace.dtx Date: 2024/09/12 Version v1.3s, page 438. It contains \expandafter - I guess for ensuring expansion of \ifincsname's matching \else/\fi. Probably there are situations where this is useful, I did not implement it this way, so I did not think about it. ;-) Commented 2 days ago
-
This is not your problem, this is a problem in the LaTeX code, I know. And it is only a little problem because the expandafters are "only" redundant here. There is no situation where this is useful. – wipet, commented 2 days ago
[Written; hopefully corrected; expanded; deleted; currently undeleted by request.]
Note that when expl3 syntax is active, e.g. in an expl3 class or package or after \ExplSyntaxOn, the tilde (~) is a regular space, which allows line breaking, and ordinary spaces and newlines are ignored.
expl3 syntax is not usually active in text mode, but this may be relevant if you are writing text in a class or package which uses expl3.
After complaining at length about the way its arguments are specified and spamming the terminal with warnings, latexdef \ExplSyntaxOn finally settles down and says this:
\ExplSyntaxOn:
\protected macro:->\bool_if:NF \l__kernel_expl_bool {
\cs_set_protected:Npe \ExplSyntaxOff {
\char_set_catcode:nn {9}{\char_value_catcode:n {9}}
\char_set_catcode:nn {32}{\char_value_catcode:n {32}}
\char_set_catcode:nn {34}{\char_value_catcode:n {34}}
\char_set_catcode:nn {58}{\char_value_catcode:n {58}}
\char_set_catcode:nn {94}{\char_value_catcode:n {94}}
\char_set_catcode:nn {95}{\char_value_catcode:n {95}}
\char_set_catcode:nn {124}{\char_value_catcode:n {124}}
\char_set_catcode:nn {126}{\char_value_catcode:n {126}}
\tex_endlinechar:D =\tex_the:D \tex_endlinechar:D \scan_stop:
\bool_set_false:N \l__kernel_expl_bool
\cs_set_protected:Npn \ExplSyntaxOff {}}
}
\char_set_catcode_ignore:n {9}
\char_set_catcode_ignore:n {32}
\char_set_catcode_other:n {34}
\char_set_catcode_letter:n {58}
\char_set_catcode_math_superscript:n {94}
\char_set_catcode_letter:n {95}
\char_set_catcode_other:n {124}
\char_set_catcode_space:n {126}
\tex_endlinechar:D =32\scan_stop:
\bool_set_true:N \l__kernel_expl_bool
Among other things, this means that
- underscores (_) and colons (:) can be used in command names because they are treated as letters;
- the tilde (~) is a regular space, which allows line breaks;
- regular spaces are ignored;
- \  is unchanged.
So
\ExplSyntaxOn
H
e
l
l
o ~ wo
r
l
d
!
is equivalent to
Hello world!
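For a complete file to experiment with, here is a minimal sketch. The names \demo_greeting: and \DemoGreeting are hypothetical, and loading the expl3 package is an assumption made so that the file also works with older kernels; on recent kernels it should be harmless.

```latex
\documentclass{article}
\usepackage{expl3} % only needed on older kernels
\ExplSyntaxOn
% Hypothetical function for illustration only.
% Ordinary spaces and newlines in this block are ignored;
% "~" yields a normal, breakable space; "\ " is still the control space.
\cs_new:Npn \demo_greeting:
  {
    Hello ~ world! ~ Dr.\ Treemunch.
  }
% Give the function a document-level name without ":" or "_".
\cs_new_eq:NN \DemoGreeting \demo_greeting:
\ExplSyntaxOff
\begin{document}
\DemoGreeting
\end{document}
```

Ending the block with \ExplSyntaxOff restores the usual category codes, so ~ is a tie again in the document body.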
-
(this is the answer I was secretly looking for, thank you) Commented Dec 12 at 1:46
-
@EmanueleNardi I thought that might be so, though the 'non-breaking space' hypothesis threw me somewhat. ;) – cfr, commented Dec 12 at 1:51
-
@cfr but your description seems wrong: \  has its normal meaning. Commented Dec 12 at 2:07