Windows places filenames with a trailing dot or space in the Win32
namespace and allows setting DOS names on such files. This is true even
though on Windows such filenames can only be created and accessed using
WinNT-style paths and will confuse most Windows software. Regardless,
because libntfs-3g did not allow setting DOS names on such files, in
some cases it was impossible to correctly restore, using libntfs-3g, a
directory structure that was created under Windows.
Update ntfs_set_ntfs_dos_name() to permit operating on a file that has a
long name with a trailing dot or space. But continue to forbid creating
such names on a filesystem FUSE-mounted with the windows_name option.
Additionally, continue to forbid a trailing a dot or space in DOS names;
this matches the Windows behavior.
(contributed by Eric Biggers)
If an output buffer was provided, ntfs_utf16_to_utf8() limited the
output string length without the terminating null to 'outs_len'. This
was incorrect because a terminating null was always added to the string,
causing a buffer overrun if the output string happened to have exactly
the maximum length. This was a longstanding bug. Fix it by leaving
space for a terminating null.
(contributed by Eric Biggers)
utf16_to_utf8_size() was not guaranteed to fail with ENAMETOOLONG if the
computed length was greater than @outs_len. This could cause a buffer
overrun in ntfs_utf16_to_utf8().
(contributed by Eric Biggers)
- Update documentation for COLLATION_RULES
- Document how ntfs_names_full_collate() compares names
- Update comments and DEBUG code to reflect that ntfs_names_full_collate()
always access 'upcase', even in CASE_SENSITIVE mode
- Remove unneeded assignments to 'c1' and 'c2' in IGNORE_CASE mode
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
The upper case value for 0x1d79 is 0xa77d, so the difference is 0x8a04,
which overflows in the table which defines the computation of upper case
values. Rewriting this difference as -0x75fc leads to the same result
in an upper case table truncated to two bytes, and this avoid the
compiler warning.
In the mailing list discussion we came to the conclusion that there
doesn't seem to be any reason to keep these declarations separate since
they address the same issue, namely libntfs-3g's tolerance for bad
Unicode data in filenames and other UTF-16 strings in the file system,
so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
Windows filenames may contain invalid UTF-16 sequences (specifically
broken surrogate pairs), which cannot be converted to UTF-8 if we do
strict conversion.
This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
encoding any surrogate character that don't have a match into a separate
3-byte UTF-8 sequence.
This is "sort of" valid UTF-8, but not valid Unicode since the code
points used for surrogate pair encoding are not supposed to occur in a
valid Unicode string... but on the other hand the source UTF-16 data is
also broken, so we aren't really making things any worse.
This format is sometimes referred to as WTF-8 (Wobbly Translation
Format, 8-bit encoding) and is a common solution to represent broken
UTF-16 as UTF-8.
It is a lossless round-trip conversion, i.e converting from broken
UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
sequence. Because of this property it enables accessing these files
by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
expected).
To disable this behaviour you can pass the preprocessor/compiler flag
'-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.
Newer versions of Windows use more recent definitions of upper-case
table defined by the Unicode consortium. Now using the same table as
Windows 7, windows 8 and Windows 10. This only has an effect on file
systems newly created by mkntfs.
Windows applies legacy restrictions to file names, so when the option
windows_names is applied, reject the same reserved names, which are
CON, PRN, AUX, NUL, COM1..COM9, and LPT1..LPT9