MANDOC_HTML(3) | Library Functions Manual | MANDOC_HTML(3) |
mandoc_html - internals of the mandoc HTML formatter
#include <sys/types.h>
#include "mandoc.h"
#include "roff.h"
#include "out.h"
#include "html.h"
void
print_gen_decls(struct html *h);

void
print_gen_comment(struct html *h, struct roff_node *n);

void
print_gen_head(struct html *h);

struct tag *
print_otag(struct html *h, enum htmltag tag, const char *fmt, ...);

void
print_tagq(struct html *h, const struct tag *until);

void
print_stagq(struct html *h, const struct tag *suntil);

void
html_close_paragraph(struct html *h);

enum roff_tok
html_fillmode(struct html *h, enum roff_tok tok);

int
html_setfont(struct html *h, enum mandoc_esc font);

void
print_text(struct html *h, const char *word);

void
print_tagged_text(struct html *h, const char *word, struct roff_node *n);

char *
html_make_id(const struct roff_node *n, int unique);

struct tag *
print_otag_id(struct html *h, enum htmltag tag, const char *cattr, struct roff_node *n);

void
print_endline(struct html *h);
The mandoc HTML formatter is not a formal library. However, as it is compiled into more than one program, in particular mandoc(1) and man.cgi(8), and because it may be security-critical in some contexts, some documentation is useful to help to use it correctly and to prevent XSS vulnerabilities.
The formatter produces HTML output on the standard output. Since proper escaping is usually required and best taken care of at one central place, the language-specific formatters (*_html.c, see FILES) are not supposed to print directly to stdout using functions like printf(3), putc(3), puts(3), or write(2). Instead, they are expected to use the output functions declared in html.h and implemented as part of the main HTML formatting engine in html.c.
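To illustrate why escaping is best handled in one central place, here is a minimal sketch of an escaping routine. The name sketch_html_escape() and its buffer-based interface are hypothetical and chosen for this example only; this is a simplified stand-in, not the code used by print_encode() in html.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: copy src into dst while
 * replacing the characters that are special in HTML text and
 * attribute values.  Returns the number of bytes written, not
 * counting the terminating NUL, or (size_t)-1 if dst is too small.
 */
static size_t
sketch_html_escape(char *dst, size_t dstsz, const char *src)
{
	size_t used = 0;

	assert(dst != NULL && src != NULL);
	for (; *src != '\0'; src++) {
		const char *rep;
		char one[2];
		size_t len;

		switch (*src) {
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '&': rep = "&amp;"; break;
		case '"': rep = "&quot;"; break;
		default:
			/* Ordinary byte: copy it unchanged. */
			one[0] = *src;
			one[1] = '\0';
			rep = one;
			break;
		}
		len = strlen(rep);
		if (used + len + 1 > dstsz)
			return (size_t)-1;
		memcpy(dst + used, rep, len);
		used += len;
	}
	if (used + 1 > dstsz)
		return (size_t)-1;
	dst[used] = '\0';
	return used;
}
```

If every piece of output passes through one routine of this kind, a language-specific formatter cannot forget to escape a string taken from the input document, which is the property the rule above is meant to protect.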
These structures are declared in html.h.
The function print_gen_decls() prints the opening ⟨!DOCTYPE⟩ declaration.
The function print_gen_comment() prints the leading comments, usually containing a Copyright notice and license, as an HTML comment. It is intended to be called right after opening the ⟨HTML⟩ element. Pass the first ROFFT_COMMENT node in n.
The function print_gen_head() prints the opening ⟨META⟩ and ⟨LINK⟩ elements for the document ⟨HEAD⟩, using the style member of h unless that is NULL. It uses print_otag(), which takes care of properly encoding attributes; this is relevant for the style link in particular.
The function print_otag() prints the start tag of an HTML element with the name tag, optionally including the attributes specified by fmt. If fmt is the empty string, no attributes are written. Each letter of fmt specifies one attribute to write. Most attributes require one char * argument which becomes the value of the attribute. The arguments have to be given in the same order as the attribute letters. If an argument is NULL, the respective attribute is not written.
c
    class attribute.
h
    href attribute. This attribute letter can optionally be followed by a modifier letter. If followed by R, it formats the link as a local one by prefixing a ‘#’ character. If followed by I, it interprets the argument as a header file name and generates a link using the mandoc(1) -O includes option. If followed by M, it takes two arguments instead of one, a manual page name and section, and formats them as a link to a manual page using the mandoc(1) -O man option.
i
    id attribute.
?
    An arbitrary attribute. This format letter requires two char * arguments, the attribute name and the value. The name must not be NULL.
s
    style attribute. If present, it must be the last format letter. It requires two char * arguments. The first is the name of the style property, the second its value. The name must not be NULL. The s fmt letter can be repeated, each repetition requiring an additional pair of char * arguments.
print_otag() uses the private function print_encode() to take care of HTML encoding. If required by the element type, it remembers in h that the element is open. The function print_tagq() is used to close out all open elements up to and including until; print_stagq() is a variant to close out all open elements up to but excluding suntil. The function html_close_paragraph() closes all open elements that establish phrasing context, thus returning to the innermost flow context.
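The difference between closing "up to and including" and "up to but excluding" an element can be sketched with a small stack of open elements. All names below (struct sketch_tag, sketch_open(), sketch_close_through(), sketch_close_before()) are hypothetical and exist only for this illustration; the real stack lives inside struct html and is managed by print_otag(), print_tagq(), and print_stagq().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an open-element record; not mandoc's struct tag. */
struct sketch_tag {
	const char        *name;
	struct sketch_tag *next;   /* element that was opened before this one */
};

static struct sketch_tag *sketch_top;  /* innermost open element */

/* Print a start tag and remember that the element is open. */
static struct sketch_tag *
sketch_open(const char *name)
{
	struct sketch_tag *t;

	if ((t = malloc(sizeof(*t))) == NULL)
		abort();
	t->name = name;
	t->next = sketch_top;
	sketch_top = t;
	printf("<%s>", name);
	return t;
}

/* Print the end tag of the innermost open element and free its record. */
static void
sketch_pop(void)
{
	struct sketch_tag *t = sketch_top;

	assert(t != NULL);
	sketch_top = t->next;
	printf("</%s>", t->name);
	free(t);
}

/* Close elements up to and including until. */
static void
sketch_close_through(const struct sketch_tag *until)
{
	while (sketch_top != NULL) {
		/* Compare before popping; the record is freed by the pop. */
		int last = (sketch_top == until);

		sketch_pop();
		if (last)
			break;
	}
}

/* Close elements up to but excluding suntil. */
static void
sketch_close_before(const struct sketch_tag *suntil)
{
	while (sketch_top != NULL && sketch_top != suntil)
		sketch_pop();
}
```

In this sketch, opening div, p, and b and then calling sketch_close_before() on the p record leaves p as the innermost open element, whereas sketch_close_through() on the same record would close p as well. A pointer handed to sketch_close_through() must not be used afterwards because its record has been freed.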
The function html_fillmode() switches to fill mode if tok is ROFF_fi or to no-fill mode if tok is ROFF_nf. Switching from fill mode to no-fill mode closes the current paragraph and opens a ⟨PRE⟩ element. Switching in the opposite direction closes the ⟨PRE⟩ element, but does not open a new paragraph. If tok matches the mode that is already active, no elements are closed or opened. If tok is TOKEN_NONE, the mode remains as it is.
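The mode switching rules can be sketched as a small state machine. The enum sketch_tok, struct sketch_state, and sketch_fillmode() below are hypothetical names that stand in for the roff tokens and for the state kept in struct html; the sketch also assumes that the previous mode is reported as a return value, and it omits the paragraph handling of the real function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tokens standing in for ROFF_fi, ROFF_nf, and TOKEN_NONE. */
enum sketch_tok { SKETCH_FI, SKETCH_NF, SKETCH_NONE };

struct sketch_state {
	int in_pre;   /* non-zero while a PRE element is open */
};

/*
 * Switch to the requested mode and return the mode that was
 * active before the call.
 */
static enum sketch_tok
sketch_fillmode(struct sketch_state *st, enum sketch_tok tok)
{
	enum sketch_tok had;

	assert(st != NULL);
	had = st->in_pre ? SKETCH_NF : SKETCH_FI;

	/* Nothing to do for a pure query or for the mode already active. */
	if (tok == SKETCH_NONE || tok == had)
		return had;

	if (tok == SKETCH_NF) {
		/* The real formatter would close the current paragraph first. */
		fputs("<pre>", stdout);
		st->in_pre = 1;
	} else {
		/* Back to fill mode: close PRE, open no new paragraph. */
		fputs("</pre>", stdout);
		st->in_pre = 0;
	}
	return had;
}
```

Passing the "none" token is a convenient way to query the current mode without changing it, and saving the return value lets a caller restore the previous mode later.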
The function html_setfont() selects the font, which can be ESCAPE_FONTROMAN, ESCAPE_FONTBOLD, ESCAPE_FONTITALIC, ESCAPE_FONTBI, or ESCAPE_FONTCW, for future text output and internally remembers the font that was active before the change. If the font argument is ESCAPE_FONTPREV, the current and the previous font are exchanged. This function only changes the internal state of the h object; no HTML elements are written yet. Subsequent text output will write font elements when needed.
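The bookkeeping described above, including the exchange of current and previous font, can be sketched as follows. The enum sketch_font, struct sketch_fonts, and sketch_setfont() are hypothetical names for this example; the sketch returns nothing, whereas html_setfont() returns an int that is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical font codes; SKETCH_PREV stands in for ESCAPE_FONTPREV. */
enum sketch_font {
	SKETCH_ROMAN, SKETCH_BOLD, SKETCH_ITALIC, SKETCH_BI, SKETCH_CW,
	SKETCH_PREV
};

struct sketch_fonts {
	enum sketch_font cur;   /* font for future text output */
	enum sketch_font prev;  /* font that was active before the last change */
};

/* Only bookkeeping happens here; nothing is printed. */
static void
sketch_setfont(struct sketch_fonts *f, enum sketch_font font)
{
	enum sketch_font old;

	assert(f != NULL);
	old = f->cur;
	/* Requesting the previous font swaps the two remembered fonts. */
	f->cur = (font == SKETCH_PREV) ? f->prev : font;
	f->prev = old;
}
```

Starting in roman, selecting bold and then requesting the previous font twice yields roman and then bold again, because each such request exchanges the two remembered fonts.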
The function print_text() prints HTML element content. It uses the private function print_encode() to take care of HTML encoding. If the document has requested a non-standard font, for example using a roff(7) \f font escape sequence, print_text() wraps word in an HTML font selection element using the print_otag() and print_tagq() functions.
The function print_tagged_text() is a variant of print_text() that wraps word in an ⟨A⟩ element of class "permalink" if n is not NULL and yields a segment identifier when passed to html_make_id().
The function html_make_id() allocates a string to be used for the id attribute of an HTML element and/or as a segment identifier for a URI in an ⟨A⟩ element. If n contains a tag attribute, it is used; otherwise, child nodes are used. If n is an Sh, Ss, Sx, SH, or SS node, the resulting string is the concatenation of the child strings; for other node types, only the first child is used. Bytes not permitted in URI-fragment strings are replaced by underscores. If any of the children to be used is not a text node, no string is generated and NULL is returned instead. If the unique argument is non-zero, deduplication is performed by appending an underscore and a decimal integer, if necessary. If the unique argument is 1, this is assumed to be the first call for this tag at this location, typically for use by NODE_ID, so the integer is incremented before use. If the unique argument is 2, this is assumed to be the second call for this tag at this location, typically for use by NODE_HREF, so the existing integer, if any, is used without incrementing it.
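The byte replacement and the deduplication suffix can be sketched as below. The function sketch_make_id() is hypothetical and much simpler than html_make_id(): it works on a plain string rather than a struct roff_node, it leaves the choice of the suffix number to the caller, and its allow-list (ASCII alphanumerics plus "-._~") is an assumption for this example that is stricter than what URI fragments permit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not mandoc's html_make_id(): return a newly
 * allocated copy of text in which every byte outside a conservative
 * allow-list is replaced by an underscore.  If suffix is positive,
 * an underscore and the decimal suffix are appended, as a
 * deduplicating caller might do.  The caller must free(3) the result.
 */
static char *
sketch_make_id(const char *text, int suffix)
{
	size_t len, sz;
	char *id, *cp;

	assert(text != NULL);
	len = strlen(text);
	sz = len + 16;  /* room for '_', a decimal int, and the NUL */
	if ((id = malloc(sz)) == NULL)
		abort();
	memcpy(id, text, len + 1);

	/* Replace every byte that is not on the allow-list. */
	for (cp = id; *cp != '\0'; cp++)
		if (!isalnum((unsigned char)*cp) &&
		    strchr("-._~", *cp) == NULL)
			*cp = '_';

	if (suffix > 0)
		snprintf(id + len, sz - len, "_%d", suffix);
	return id;
}
```

With this sketch, the text "RETURN VALUES" becomes "RETURN_VALUES", and a second occurrence could be given the suffix 2 to form "RETURN_VALUES_2". Calling it once for the id attribute and once for the matching link with the same suffix mirrors the reason the real function distinguishes a first call from a second call.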
The function print_otag_id() opens a tag element of class cattr for the node n. If the flag NODE_ID is set in n, it attempts to generate an id attribute with html_make_id(). If the flag NODE_HREF is set in n, an ⟨A⟩ element of class "permalink" is added: outside if n generates an element that can only occur in phrasing context, or inside otherwise. This function is a wrapper around html_make_id() and print_otag(), automatically choosing the unique argument appropriately and setting the fmt arguments to "chR" and "ci", respectively.
The function print_endline() makes sure subsequent output starts on a new HTML output line. If nothing was printed on the current output line yet, it has no effect. Otherwise, it appends any buffered text to the current output line, ends the line, and updates the internal state of the h object.
The functions print_eqn(), print_tbl(), and print_tblclose() are not yet documented.
The functions print_otag() and print_otag_id() return a pointer to a new element on the stack of HTML elements. When print_otag_id() opens two elements, a pointer to the outer one is returned. The memory pointed to is owned by the library and is automatically free(3)d when print_tagq() is called on it or when print_stagq() is called on a parent element.
The function html_fillmode() returns ROFF_fi if fill mode was active before the call or ROFF_nf otherwise.
The function html_make_id() returns a newly allocated string or NULL if n lacks text data to create the attribute from. The caller is responsible for free(3)ing the returned string after using it.
In case of malloc(3) failure, these functions do not return but call err(3).
br, ce, fi, ft, nf, rj, and sp.
The mandoc HTML formatter was written by Kristaps Dzonsons <kristaps@bsd.lv>. It is maintained by Ingo Schwarze <schwarze@openbsd.org>, who also wrote this manual.
April 24, 2020 | OpenBSD 6.7 |