35 Commits

Author SHA1 Message Date
Jean-Pierre André
e7c5950117 Silenced a truncation warning in upper case table
The upper case value for 0x1d79 is 0xa77d, so the difference is 0x8a04,
which overflows in the table which defines the computation of upper case
values. Rewriting this difference as -0x75fc leads to the same result
in an upper case table truncated to two bytes, and this avoids the
compiler warning.
2016-05-31 08:24:23 +02:00
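The arithmetic described in commit e7c5950117 can be checked with a small sketch. The helper name `sketch_apply_upcase_delta` is hypothetical and is not taken from unistr.c; it only assumes that the result is stored in a 16-bit value, as the message says.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from unistr.c): add a signed difference to a
 * UTF-16 code unit and keep only the low 16 bits of the result.
 */
uint16_t sketch_apply_upcase_delta(uint16_t ch, int delta)
{
	/* The cast truncates to two bytes, i.e. reduces modulo 0x10000. */
	return (uint16_t)(ch + delta);
}
```

Since 0x8a04 and -0x75fc differ by exactly 0x10000, both differences map 0x1d79 to 0xa77d once the sum is truncated to two bytes.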
Erik Larsson
f0370bfa9c unistr.c: Unify the two defines NOREVBOM and ALLOW_BROKEN_SURROGATES.
In the mailing list discussion we came to the conclusion that there
doesn't seem to be any reason to keep these declarations separate since
they address the same issue, namely libntfs-3g's tolerance for bad
Unicode data in filenames and other UTF-16 strings in the file system,
so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
2016-04-12 17:02:40 +02:00
Erik Larsson
d9c61dd60e unistr.c: Enable encoding broken UTF-16 into broken UTF-8, A.K.A. WTF-8.
Windows filenames may contain invalid UTF-16 sequences (specifically
broken surrogate pairs), which cannot be converted to UTF-8 if we do
strict conversion.

This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
encoding any surrogate character that doesn't have a match into a separate
3-byte UTF-8 sequence.

This is "sort of" valid UTF-8, but not valid Unicode since the code
points used for surrogate pair encoding are not supposed to occur in a
valid Unicode string... but on the other hand the source UTF-16 data is
also broken, so we aren't really making things any worse.

This format is sometimes referred to as WTF-8 (Wobbly Translation
Format, 8-bit encoding) and is a common solution to represent broken
UTF-16 as UTF-8.

It is a lossless round-trip conversion, i.e. converting from broken
UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
sequence. Because of this property it enables accessing these files
by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
expected).

To disable this behaviour you can pass the preprocessor/compiler flag
'-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.
2016-04-08 05:39:48 +02:00
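As an illustration of the encoding described in commit d9c61dd60e, the sketch below writes an unmatched surrogate as a generic 3-byte UTF-8 sequence and reads it back. The function names are hypothetical and this is not the code from unistr.c; it assumes the caller has already determined that the surrogate has no partner.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: encode a lone UTF-16 surrogate (0xD800..0xDFFF)
 * as a 3-byte UTF-8-style sequence. Returns the number of bytes written,
 * or 0 if the code unit is not a surrogate.
 */
size_t sketch_encode_lone_surrogate(uint16_t unit, unsigned char out[3])
{
	if (unit < 0xD800 || unit > 0xDFFF)
		return 0;
	out[0] = (unsigned char)(0xE0 | (unit >> 12));          /* 1110xxxx */
	out[1] = (unsigned char)(0x80 | ((unit >> 6) & 0x3F));  /* 10xxxxxx */
	out[2] = (unsigned char)(0x80 | (unit & 0x3F));         /* 10xxxxxx */
	return 3;
}

/* Hypothetical inverse: rebuild the 16-bit code unit from three bytes. */
uint16_t sketch_decode_three_bytes(const unsigned char in[3])
{
	return (uint16_t)(((in[0] & 0x0F) << 12) |
			  ((in[1] & 0x3F) << 6) |
			  (in[2] & 0x3F));
}
```

Decoding the three bytes returns the original code unit, which is the round-trip property the message relies on.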
Erik Larsson
9893ea9ee6 Merge endianness fixes.
Conflicts:
	libntfs-3g/attrib.c
2016-01-28 09:22:42 +01:00
Erik Larsson
9cf04fd2cd Fix incorrect usage of native/little-endian types, signed types, etc.
This is harmless with regard to code generation but if we turn on strict
type checking these type mismatches will result in errors.
2015-12-21 23:55:31 +01:00
Erik Larsson
dfa4a6647f Fix code to use const_cpu_to_X/const_X_to_cpu macros for constants.
This enables the compiler to optimize this code in cases where compiler
support for endianness swapping is not present.
2015-12-21 23:21:00 +01:00
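The idea behind commit dfa4a6647f can be sketched as follows. The macro below is a hypothetical stand-in, not the actual const_cpu_to_X definition from NTFS-3G; it only shows why a byte swap written as a plain constant expression can be folded at compile time.

```c
#include <stdint.h>

/*
 * Hypothetical macro (not the NTFS-3G definition): a 16-bit byte swap
 * built only from shifts and masks, so that a constant argument yields
 * a constant expression.
 */
#define SKETCH_CONST_SWAP16(x) \
	((uint16_t)((((uint16_t)(x) & 0x00ffU) << 8) | \
		    (((uint16_t)(x) & 0xff00U) >> 8)))

/*
 * Because the macro expands to a constant expression, it can be used in
 * a static initializer, which a call to a swapping function could not.
 */
const uint16_t sketch_swapped_constant = SKETCH_CONST_SWAP16(0x1234);
```

A function-based swap would have to run at run time unless the compiler recognises it, which is the case the message refers to.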
Erik Larsson
c9771d0509 unistr.c: Cleanup of OS X Unicode normalization code.
Normalize coding conventions to fit in with the rest of NTFS-3G,
including line breaks at column 80.
2015-06-23 06:43:17 +02:00
Jean-Pierre André
e40b86a86c Upgraded the upper-case table as defined by Windows 7
Newer versions of Windows use a more recent definition of the upper-case
table defined by the Unicode Consortium. Now using the same table as
Windows 7, Windows 8 and Windows 10. This only has an effect on file
systems newly created by mkntfs.
2015-04-17 11:03:58 +02:00
Jean-Pierre André
543b17b7ef Rejected reserved file names when option windows_names is set
Windows applies legacy restrictions to file names, so when the option
windows_names is applied, reject the same reserved names, which are
CON, PRN, AUX, NUL, COM1..COM9, and LPT1..LPT9
2014-03-11 10:56:31 +01:00
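A check for the reserved names listed in commit 543b17b7ef might look like the sketch below. The function name is hypothetical, and the sketch makes assumptions the message does not state: names are plain ASCII C strings, matching is case-insensitive, and only exact names are tested (no handling of extensions). The real code operates on the file system's UTF-16 names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n characters of a and b, ignoring ASCII case. */
static int sketch_equal_nocase_n(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Hypothetical check: returns 1 if name is exactly one of CON, PRN, AUX,
 * NUL, COM1..COM9 or LPT1..LPT9 (case-insensitive), otherwise 0.
 */
int sketch_is_reserved_name(const char *name)
{
	static const char *const fixed[] = { "CON", "PRN", "AUX", "NUL" };
	static const char *const numbered[] = { "COM", "LPT" };
	size_t len = strlen(name);
	size_t i;

	if (len == 3) {
		for (i = 0; i < sizeof(fixed) / sizeof(fixed[0]); i++)
			if (sketch_equal_nocase_n(name, fixed[i], 3))
				return 1;
	}
	/* COMn and LPTn are reserved only for the digits 1 to 9. */
	if (len == 4 && name[3] >= '1' && name[3] <= '9') {
		for (i = 0; i < sizeof(numbered) / sizeof(numbered[0]); i++)
			if (sketch_equal_nocase_n(name, numbered[i], 3))
				return 1;
	}
	return 0;
}
```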
Jean-Pierre André
4ce33daf6c Cosmetic: fixed an indentation in unistr 2012-01-23 17:09:19 +01:00
Jean-Pierre André
fa3d7a5728 minor : Fixed ntfs_upcase_build_default() returning garbage in error case (Fabian Keil) 2011-08-04 15:49:35 +02:00
Jean-Pierre André
82b00364a8 Fixed setting DOS names when defined with lower-case chars 2011-07-05 12:17:11 +02:00
Jean-Pierre André
a46a395006 Updated copyright notices 2011-02-08 13:52:12 +01:00
Jean-Pierre André
4c6cf9d977 Moved the knowledge of default upcase size to unistr.c 2011-02-08 13:52:12 +01:00
Jean-Pierre André
53599b1a98 Switched to the same Upcase table as Vista 2010-12-21 15:51:08 +01:00
Jean-Pierre André
8b910e9e80 Improved name comparison on big-endian computers 2010-10-26 08:59:51 +02:00
Jean-Pierre André
008d8c5df9 Fixed character translations when standard functions are not available 2010-08-28 13:59:43 +02:00
Jean-Pierre André
4d73c7c4f1 Fixed characters not allowed by Windows in names 2010-06-03 10:13:30 +02:00
Jean-Pierre André
693aa8780d Enabled case-insensitive file names in lowntfs-3g 2010-05-25 10:12:44 +02:00
jpandre
195945cdc0 Evaluated file name collations in a single parsing 2009-12-16 09:45:28 +00:00
jpandre
7a876eca36 Fixed possible memory leaks after char translation errors 2009-12-09 11:20:20 +00:00
jpandre
e23481624f Improved UTF8<-->UTF16 translations 2009-12-09 11:19:27 +00:00
jpandre
a75724fea8 Fixed a few misleading endianness types 2009-11-24 14:18:53 +00:00
jpandre
3af7bebe7b Mac OS X Unicode normalization form conversion (Erik Larsson) 2009-11-05 11:40:44 +00:00
jpandre
e4b3c59cb1 Accepted initial spaces in Win32/DOS names 2009-09-18 16:17:21 +00:00
jpandre
1d26eb2b97 Fixed checking spaces in Win32 names 2009-08-12 15:35:11 +00:00
jpandre
9a4672ca65 Developed getting and setting DOS names (short 8+3 names) 2009-07-01 19:45:59 +00:00
jpandre
fc78c03c39 Fixed an endianness error in default uppercase table 2009-04-20 15:27:03 +00:00
jpandre
11216c6942 Adapted to ntfs-3g-2009.1.1 2009-01-23 11:11:44 +00:00
jpandre
d3f3a19866 Adapted to ntfs-3g.1.5222-RC 2009-01-05 13:28:06 +00:00
jpandre
13552eba52 Integrated full utf-8 to utf-16le conversions, based on code by Bernhard Kaindl 2008-08-21 12:04:51 +00:00
szaka
1098244bbf copyright update 2008-06-29 23:13:32 +00:00
jpandre
53fa335624 Adapted to ntfs-3g.1.2310 2008-03-10 15:35:54 +00:00
jpandre
038156ba82 Reengineered LRU caches, made generic, and applied to finding inode numbers 2008-01-10 17:32:55 +00:00
szaka
ba63b7daca initial CVS import 2006-10-30 22:32:48 +00:00