systemd/tools/make-man-index.py

#!/usr/bin/env python3
# SPDX-License-Identifier: LGPL-2.1-or-later

import collections
import re
import sys

from xml_helper import tree, xml_parse, xml_print

MDASH = ' — ' if sys.version_info.major >= 3 else ' -- '

TEMPLATE = '''\
<refentry id="systemd.index">

  <refentryinfo>
    <title>systemd.index</title>
    <productname>systemd</productname>
  </refentryinfo>

  <refmeta>
    <refentrytitle>systemd.index</refentrytitle>
    <manvolnum>7</manvolnum>
  </refmeta>

  <refnamediv>
    <refname>systemd.index</refname>
    <refpurpose>List all manpages from the systemd project</refpurpose>
  </refnamediv>
</refentry>
'''

SUMMARY = '''\
  <refsect1>
    <title>See Also</title>
    <para>
      <citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry>
    </para>

    <para id='counts' />
  </refsect1>
'''

COUNTS = '\
This index contains {count} entries, referring to {pages} individual manual pages.'


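def check_id(page, t):
    """Raise ValueError unless the root element's id matches the page file name."""
    page_id = t.getroot().get('id')
    # re.escape() keeps dots in ids such as 'systemd.index' from matching any character
    if not re.search('/' + re.escape(page_id) + '[.]', page.translate(str.maketrans('@', '_'))):
        raise ValueError(f"id='{page_id}' is not the same as page name '{page}'")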

def make_index(pages):
    """Map each uppercased initial letter to (refname, section, purpose, first refname) tuples."""
    index = collections.defaultdict(list)
    for p in pages:
        t = xml_parse(p)
        check_id(p, t)
        section = t.find('./refmeta/manvolnum').text
        refname = t.find('./refnamediv/refname').text
        purpose_text = ' '.join(t.find('./refnamediv/refpurpose').itertext())
        purpose = ' '.join(purpose_text.split())
        for f in t.findall('./refnamediv/refname'):
            infos = (f.text, section, purpose, refname)
            index[f.text[0].upper()].append(infos)
    return index

def add_letter(template, letter, pages):
    """Append a <refsect1> titled with the letter, listing its pages sorted case-insensitively."""
    refsect1 = tree.SubElement(template, 'refsect1')
    title = tree.SubElement(refsect1, 'title')
    title.text = letter
    para = tree.SubElement(refsect1, 'para')
    for info in sorted(pages, key=lambda info: str.lower(info[0])):
        refname, section, purpose, _realname = info

        b = tree.SubElement(para, 'citerefentry')
        c = tree.SubElement(b, 'refentrytitle')
        c.text = refname
        d = tree.SubElement(b, 'manvolnum')
        d.text = section

        b.tail = MDASH + purpose

        tree.SubElement(para, 'sbr')

def add_summary(template, indexpages):
    """Append the 'See Also' section and fill in the entry and page counts."""
    count = 0
    pages = set()
    for group in indexpages:
        count += len(group)
        for info in group:
            _refname, section, _purpose, realname = info
            pages.add((realname, section))

    refsect1 = tree.fromstring(SUMMARY)
    template.append(refsect1)

    para = template.find(".//para[@id='counts']")
    para.text = COUNTS.format(count=count, pages=len(pages))

def make_page(*xml_files):
    """Build the systemd.index refentry tree from the given man page XML files."""
    template = tree.fromstring(TEMPLATE)
    index = make_index(xml_files)

    for letter in sorted(index):
        add_letter(template, letter, index[letter])

    add_summary(template, index.values())

    return template

if __name__ == '__main__':
    with open(sys.argv[1], 'wb') as file:
        file.write(xml_print(make_page(*sys.argv[2:])))