mirror of
https://sourceware.org/git/glibc.git
synced 2024-11-23 17:53:37 +08:00
25c9c3789e
= `Default_Ignorable_Code_Point`s should have width 0 =

Unicode specifies (https://www.unicode.org/faq/unsup_char.html#3) that characters
with the `Default_Ignorable_Code_Point` property

> should be rendered as completely invisible (and non advancing, i.e. “zero width”),
> if not explicitly supported in rendering.

Hence, `wcwidth()` should give them all a width of 0, with two exceptions:

- the soft hyphen (U+00AD SOFT HYPHEN) is assigned width 1 by longstanding precedent
- U+115F HANGUL CHOSEONG FILLER needs a carveout
  due to the unique behavior of the conjoining Korean jamo characters.
  One composed Hangul "syllable block" like 퓛
  is made up of two to three individual component characters, or "jamo".
  These are all assigned an `East_Asian_Width` of `Wide` by Unicode,
  which would normally mean they would all be assigned width 2 by glibc;
  a combination of (leading choseong jamo) + (medial jungseong jamo)
  + (trailing jongseong jamo) would then have width 2 + 2 + 2 = 6.
  However, glibc (and other wcwidth implementations)
  special-cases jungseong and jongseong, assigning them all width 0,
  to ensure that the complete block has width 2 + 0 + 0 = 2 as it should.
  U+115F is meant for use in syllable blocks
  that are intentionally missing a leading jamo;
  it must be assigned a width of 2 even though it has no visible display
  to ensure that the complete block has width 2.

However, `wcwidth()` currently (before this patch) incorrectly assigns
non-zero width to U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER;
this commit fixes that.

Unicode spec references:

- Hangul: §3.12 https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G24646
  and §18.6 https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf#G31028
- `Default_Ignorable_Code_Point`: §5.21
  https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf#G40095.
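The jamo width arithmetic described in this section can be checked with a small sketch. It is illustrative only and is not glibc's generator code: the helper names (`in_ranges`, `jamo_width`, `block_width`) are hypothetical, and the jamo ranges are hard-coded from Unicode's HangulSyllableType.txt rather than read from the data file.

```python
import unicodedata

# Jamo ranges as listed in Unicode's HangulSyllableType.txt
# (L = choseong, V = jungseong, T = jongseong). Hard-coded here for
# illustration; this is an assumption of the sketch, not glibc's approach.
CHOSEONG = [(0x1100, 0x115F), (0xA960, 0xA97C)]
JUNGSEONG = [(0x1160, 0x11A7), (0xD7B0, 0xD7C6)]
JONGSEONG = [(0x11A8, 0x11FF), (0xD7CB, 0xD7FB)]


def in_ranges(cp, ranges):
    """Return True if code point cp falls inside any inclusive range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def jamo_width(ch):
    """Width of one conjoining jamo under the rule from the commit message."""
    cp = ord(ch)
    if in_ranges(cp, JUNGSEONG) or in_ranges(cp, JONGSEONG):
        return 0  # medial and trailing jamo do not advance
    if in_ranges(cp, CHOSEONG):
        return 2  # leading jamo, including the filler U+115F
    raise ValueError(f"U+{cp:04X} is not a conjoining jamo")


def block_width(jamo):
    """Total width of a sequence of conjoining jamo."""
    return sum(jamo_width(ch) for ch in jamo)


# NFD splits a precomposed syllable (here U+D4DB) into its component jamo.
decomposed = unicodedata.normalize("NFD", "\uD4DB")
# A block with no real leading consonant uses the choseong filler instead.
filler_block = "\u115F\u1161\u11A8"

print(len(decomposed), block_width(decomposed), block_width(filler_block))
```

If U+115F were given width 0 like the other default-ignorable characters, `filler_block` would sum to 0, which is the reason for the carveout.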
= Non-`Default_Ignorable_Code_Point` format controls should be visible =

The Unicode Standard, §5.21 - Characters Ignored for Display
(https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf#G40095)
says the following:

> A small number of format characters (General_Category = Cf)
> are also not given the Default_Ignorable_Code_Point property.
> This may surprise implementers, who often assume
> that all format characters are generally ignored in fallback display.
> The exact list of these exceptional format characters
> can be found in the Unicode Character Database.
> There are, however, three important sets of such format characters to note:
>
> - prepended concatenation marks
> - interlinear annotation characters
> - Egyptian hieroglyph format controls
>
> The prepended concatenation marks always have a visible display.
> See “Prepended Concatenation Marks” in [*Section 23.2, Layout Controls*](https://www.unicode.org/versions/Unicode15.1.0/ch23.pdf#M9.35858.HeadingBreak.132.Layout.Controls)
> for more discussion of the use and display of these signs.
>
> The other two notable sets of format characters that exceptionally are not ignored
> in fallback display consist of the interlinear annotation characters,
> U+FFF9 INTERLINEAR ANNOTATION ANCHOR through
> U+FFFB INTERLINEAR ANNOTATION TERMINATOR,
> and the Egyptian hieroglyph format controls,
> U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER through
> U+1343F EGYPTIAN HIEROGLYPH END WALLED ENCLOSURE.
> These characters should have a visible glyph display for fallback rendering,
> because if they are not displayed,
> it is too easy to misread the resulting displayed text.
> See “Annotation Characters” in [*Section 23.8, Specials*](https://www.unicode.org/versions/Unicode15.1.0/ch23.pdf#M9.21335.Heading.133.Specials),
> as well as [*Section 11.4, Egyptian Hieroglyphs*](https://www.unicode.org/versions/Unicode15.1.0/ch11.pdf#M9.73291.Heading.1418.Egyptian.Hieroglyphs)
> for more discussion of the use and display of these characters.

glibc currently correctly assigns non-zero width to the prepended concatenation marks,
but it incorrectly gives zero width to the interlinear annotation characters
(which a generic terminal cannot interpret)
and the Egyptian hieroglyph format controls
(which are not widely supported in rendering implementations at present).
This commit fixes both these issues as well.

= Derive Hangul syllable type from Unicode data =

Previously, the jungseong and jongseong jamo ranges
were hard-coded into the script. With this commit,
they are instead parsed from the HangulSyllableType.txt data file published by Unicode.
This does not affect the end result.

Signed-off-by: Jules Bertholet <julesbertholet@quoi.xyz>
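As a rough illustration of deriving the jungseong and jongseong ranges from data instead of hard-coding them, the sketch below parses text in the `range ; type # comment` layout used by HangulSyllableType.txt. It is not the script changed by this commit: `parse_syllable_types` is a hypothetical name, and `SAMPLE` is an abbreviated excerpt with made-up comments standing in for the real file.

```python
# Abbreviated sample in the layout of HangulSyllableType.txt.
# The trailing comments are invented for this sketch.
SAMPLE = """\
# Layout: <code point or range> ; <type> # <comment>
1100..115F    ; L  # leading consonants (choseong)
1160..11A7    ; V  # medial vowels (jungseong)
D7B0..D7C6    ; V  # medial vowels, extended block
11A8..11FF    ; T  # trailing consonants (jongseong)
D7CB..D7FB    ; T  # trailing consonants, extended block
AC00          ; LV # a single precomposed syllable
"""


def parse_syllable_types(text):
    """Map each syllable type to a list of inclusive (start, end) ranges."""
    ranges = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        cps, kind = fields[0], fields[1]
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        ranges.setdefault(kind, []).append((start, end))
    return ranges


types = parse_syllable_types(SAMPLE)
# Types V (jungseong) and T (jongseong) are the ones given width 0.
zero_width = types["V"] + types["T"]
for start, end in zero_width:
    print(f"U+{start:04X}..U+{end:04X}")
```

Reading the ranges from the published data file means a Unicode update only requires refreshing the file, not editing range constants by hand.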
223 lines
6.4 KiB
Plaintext
# Files shared with other projects. Pass a file path to the
# get_glibc_shared_code() function in the python library
# scripts/glibc_shared_code.py to get a dict object with this information. See
# the library sources for more information.

# The headers on most of these files indicate that glibc is the canonical
# source for these files, although in many cases there seem to be useful
# changes in the gnulib versions that could be merged back in. Not all gnulib
# files contain such a header and it is not always consistent in its format, so
# it would be useful to make sure that all gnulib files that are using glibc as
# upstream have a greppable header.
#
# These files are quite hard to find without a header to grep for and each file
# has to be compared manually so this list is likely incomplete or may contain
# errors.
gnulib:
  argp/argp-ba.c
  argp/argp-eexst.c
  argp/argp-fmtstream.c
  argp/argp-fmtstream.h
  argp/argp-fs-xinl.c
  argp/argp-help.c
  argp/argp-namefrob.h
  argp/argp-parse.c
  argp/argp-pv.c
  argp/argp-pvh.c
  argp/argp-xinl.c
  argp/argp.h
  dirent/alphasort.c
  dirent/scandir.c
  # Merged from gnulib 2021-09-21
  include/intprops.h
  # Merged from gnulib 2021-09-21
  include/regex.h
  locale/programs/3level.h
  # Merged from gnulib 2014-6-23
  malloc/obstack.c
  # Merged from gnulib 2014-6-23
  malloc/obstack.h
  # Merged from gnulib 2014-07-10
  misc/error.c
  misc/error.h
  misc/getpass.c
  misc/mkdtemp.c
  # Merged from gnulib 2021-09-21
  misc/sys/cdefs.h
  posix/fnmatch_loop.c
  # Intended to be the same. Gnulib copy contains glibc changes.
  posix/getopt.c
  # Intended to be the same. Gnulib copy contains glibc changes.
  posix/getopt1.c
  # Intended to be the same. Gnulib copy contains glibc changes.
  posix/getopt_int.h
  posix/glob.c
  # Merged from gnulib 2021-09-21
  posix/regcomp.c
  # Merged from gnulib 2021-09-21
  posix/regex.c
  # Merged from gnulib 2021-09-21
  posix/regex.h
  # Merged from gnulib 2021-09-21
  posix/regex_internal.c
  # Merged from gnulib 2021-09-21
  posix/regex_internal.h
  # Merged from gnulib 2021-09-21
  posix/regexec.c
  posix/spawn.c
  posix/spawn_faction_addclose.c
  posix/spawn_faction_adddup2.c
  posix/spawn_faction_addopen.c
  posix/spawn_faction_destroy.c
  posix/spawn_faction_init.c
  posix/spawn_int.h
  posix/spawnattr_destroy.c
  posix/spawnattr_getdefault.c
  posix/spawnattr_getflags.c
  posix/spawnattr_getpgroup.c
  posix/spawnattr_getschedparam.c
  posix/spawnattr_getschedpolicy.c
  posix/spawnattr_getsigmask.c
  posix/spawnattr_init.c
  posix/spawnattr_setdefault.c
  posix/spawnattr_setflags.c
  posix/spawnattr_setpgroup.c
  posix/spawnattr_setschedparam.c
  posix/spawnattr_setschedpolicy.c
  posix/spawnattr_setsigmask.c
  posix/spawnp.c
  stdlib/atoll.c
  stdlib/getsubopt.c
  stdlib/setenv.c
  stdlib/strtoll.c
  stdlib/strtoul.c
  # Merged from gnulib 2014-6-26, needs merge back
  string/memchr.c
  string/memcmp.c
  string/memmem.c
  string/mempcpy.c
  string/memrchr.c
  string/rawmemchr.c
  string/stpcpy.c
  string/stpncpy.c
  string/str-two-way.h
  string/strcasestr.c
  string/strcspn.c
  string/strdup.c
  string/strndup.c
  string/strpbrk.c
  string/strsignal.c
  string/strstr.c
  string/strtok_r.c
  string/strverscmp.c
  # Merged from gnulib 2024-04-08 (gnulib commit 3238349628)
  stdio-common/tmpdir.c
  stdio-common/tmpdir.h
  sysdeps/generic/pty-private.h
  sysdeps/generic/siglist.h
  sysdeps/posix/euidaccess.c
  sysdeps/posix/gai_strerror.c
  sysdeps/posix/getcwd.c
  sysdeps/posix/pwrite.c
  sysdeps/posix/spawni.c
  # Merged from gnulib 2024-04-08 (gnulib commit 3238349628)
  sysdeps/posix/tempname.c
  # Merged from gnulib 2014-6-27
  time/mktime.c
  time/mktime-internal.h
  time/strptime.c
  time/timegm.c

# The last merge was 2014-12-11 and merged gettext 0.19.3 into glibc with a
# patch submitted to the gettext mailing list for changes that could be merged
# back.
#
# This commit was omitted from the merge as it does not appear to be compatible
# with how glibc expects things to work:
#
# commit 279b57fc367251666f00e8e2b599b83703451afb
# Author: Bruno Haible <bruno@clisp.org>
# Date: Fri Jun 14 12:03:49 2002 +0000
#
# Make absolute pathnames inside $LANGUAGE work.
gettext:
  intl/bindtextdom.c
  intl/dcgettext.c
  intl/dcigettext.c
  intl/dcngettext.c
  intl/dgettext.c
  intl/dngettext.c
  intl/explodename.c
  intl/finddomain.c
  intl/gettext.c
  intl/gettextP.h
  intl/gmo.h
  intl/hash-string.c
  intl/hash-string.h
  intl/l10nflist.c
  intl/loadinfo.h
  intl/loadmsgcat.c
  intl/locale.alias
  intl/localealias.c
  intl/ngettext.c
  intl/plural-exp.c
  intl/plural-exp.h
  intl/plural.y
  intl/textdomain.c

# The following files are shared with the upstream Unicode project and must be
# updated regularly to stay in sync with the upstream Unicode releases.
#
# Merged from Unicode 15.1.0 release.
unicode:
  localedata/unicode-gen/UnicodeData.txt
  localedata/unicode-gen/unicode-license.txt
  localedata/unicode-gen/DerivedCoreProperties.txt
  localedata/unicode-gen/EastAsianWidth.txt
  localedata/unicode-gen/HangulSyllableType.txt

# The following files are shared with the upstream tzcode project and must be
# updated regularly to stay in sync with the upstream releases.
#
# Currently synced to TZDB 2024a, announced and distributed here:
# https://mm.icann.org/pipermail/tz-announce/2024-February/000081.html
# https://data.iana.org/time-zones/releases/tzdb-2024a.tar.lz
tzcode:
  timezone/private.h
  timezone/tzfile.h
  timezone/tzselect.ksh
  timezone/version
  timezone/zdump.c
  timezone/zic.c

# The following files are shared with the upstream tzdata project but are not
# synchronized regularly. The data files themselves are used only for testing
# purposes and their data is never used to generate any output. We synchronize
# them only to stay on top of newer data that might help with testing.
#
# Currently synced to tzcode 2009i, announced and distributed here:
# https://mm.icann.org/pipermail/tz/2009-June/040697.html
# https://data.iana.org/time-zones/releases/tzdata2009i.tar.gz
tzdata:
  timezone/africa
  timezone/antarctica
  timezone/asia
  timezone/australasia
  timezone/europe
  timezone/northamerica
  timezone/southamerica
  timezone/pacificnew
  timezone/etcetera
  timezone/factory
  timezone/backward
  timezone/systemv
  timezone/solar87
  timezone/solar88
  timezone/solar89
  timezone/iso3166.tab
  timezone/zone.tab
  timezone/leapseconds
  # This is yearistype.sh in the parent project
  timezone/yearistype
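
The header comment of this file says that `get_glibc_shared_code()` in `scripts/glibc_shared_code.py` turns the listing into a dict. The sketch below shows one way such a format could be read. It is not the glibc implementation: `parse_shared_files` is a hypothetical name, it takes the file contents as a string rather than a path, and the rules it applies (project names end with a colon, `#` starts a comment, surrounding whitespace is ignored) are assumptions inferred from the layout of this file.

```python
def parse_shared_files(text):
    """Map each project name to the list of file paths listed under it."""
    projects = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":"):
            current = line[:-1]  # start of a new project section
            projects[current] = []
        elif current is not None:
            projects[current].append(line)
    return projects


# A short excerpt in the same layout as this file.
SAMPLE = """\
# Files shared with other projects.
gnulib:
  argp/argp.h
  # Merged from gnulib 2021-09-21
  include/regex.h

tzcode:
  timezone/zic.c
"""

shared = parse_shared_files(SAMPLE)
for project, files in shared.items():
    print(project, files)
```

Because comments are skipped entirely, the "Merged from" notes stay purely informational and never reach the resulting dict.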