Although ntfs_log_trace() is defined to a no-op in non-DEBUG builds,
ntfs_attr_name_get() is not. This function performs a string conversion
and a memory allocation, so it is nice to have the call to it compiled
out when not needed.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
It was possible for ntfs_attr_name_get() to set errno due to a wide
character string that could not be converted to a multibyte string. This
caused ntfs_delete() to fail.
Fix by checking for a nonzero return value specifically from
ntfs_attr_lookup(), rather than assuming that nothing else sets errno.
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Switch to the standard autoconf AC_HEADER_MAJOR macro which takes care
of the ugly details like when to use mkdev.h and when to use sysmacros.h.
(requires <sys/types.h> to be included)
Also include these in all files that use major/minor/makedev funcs.
(Contributed by Mike Frysinger)
The upper case value for 0x1d79 is 0xa77d, so the difference is 0x8a04,
which overflows in the table which defines the computation of upper case
values. Rewriting this difference as -0x75fc leads to the same result
in an upper case table truncated to two bytes, and this avoid the
compiler warning.
On Solaris/OpenIndiana, use the macroes makedev(), major() and minor()
from <sys/mkdev.h>. Those from <sys/sysmacros.h> are inappropriate for
current builds.
In the mailing list discussion we came to the conclusion that there
doesn't seem to be any reason to keep these declarations separate since
they address the same issue, namely libntfs-3g's tolerance for bad
Unicode data in filenames and other UTF-16 strings in the file system,
so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
Windows filenames may contain invalid UTF-16 sequences (specifically
broken surrogate pairs), which cannot be converted to UTF-8 if we do
strict conversion.
This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
encoding any surrogate character that don't have a match into a separate
3-byte UTF-8 sequence.
This is "sort of" valid UTF-8, but not valid Unicode since the code
points used for surrogate pair encoding are not supposed to occur in a
valid Unicode string... but on the other hand the source UTF-16 data is
also broken, so we aren't really making things any worse.
This format is sometimes referred to as WTF-8 (Wobbly Translation
Format, 8-bit encoding) and is a common solution to represent broken
UTF-16 as UTF-8.
It is a lossless round-trip conversion, i.e converting from broken
UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
sequence. Because of this property it enables accessing these files
by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
expected).
To disable this behaviour you can pass the preprocessor/compiler flag
'-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.
Prepare merging ntfsrecover.h into logfile.h by adding a usn field to
RESTART_PAGE_HEADER. As this changes the record size, ignore the new
field in existing code.
User extended attributes should only be set on files and directories,
not on symlinks, sockets, devices, etc. For safety they are also
forbidden on metadata files, but should be allowed on the root
directory. For files based on reparse points, requests are made
to the plugin to determine the type.
When creating a new MFT record, do not issue a warning if the current
record has bad fixups. These warnings are meaningless, difficult to
interpret and cause unneeded worries.
The new "system compression" files used by Windows 10 make use of reparse
points to record the compression parameters, and a specific named data
stream is used to store the compressed data. With this patch, processing
of reparse points can be done by an external plugin only loaded as needed.
Junctions and symlinks, which are also based on reparse points, are now
processed by "internal plugins".
For 64-bit (e.g. x86_64) Linux the 64-bit wide types resolve to long,
not long long as is the case in 32-bit (e.g. i386) Linux. So we need an
explicit cast to long long for 64-bit types since the format string must
specify the 'll' modifier in order to print 64-bit values.
These variable are only ever assigned to/from s64 values, so their type
should be s64, not u64. This fixes a compiler warning about
signed/unsigned comparison.
When looking up the lowercase equivalent of a Unicode character in
ntfs_fix_file_name, no byte swapping was performed on the ntfschar used
as index into the 'locase' array. This would lead to very strange
results on big-endian systems.
This commit addresses issues where little-endian variables are emitted
raw to a log or output stream which is to be interpreted by the user.
Outputting data in non-native endianness can cause confusion for anybody
attempting to debug issues with a file system.
When the unreadable directory has an ATTRIBUTE_LIST attribute and an
INDEX_ALLOCATION attribute occupying split over several extents, the first
of which defines a single cluster, the first INDEX_ALLOCATION extent has
lowest_vcn=0 and highest_vcn=0, and the second one has lowest_vcn=1.
This unusual case, which can be created by the combination of a small
volume and near-full MFT records, triggers some special-case behavior in
ntfs_mapping_pairs_decompress_i(). That behavior is incorrect if the
attribute's first extent only contains a single cluster, since in that case
highest_vcn=0 as well.
This configuration has been tested on Windows and it *is* able to
successfully read the directory. This supports the hypothesis that the
volume is valid and NTFS-3g has a bug on the read side.
This bug could, in theory, occur with any non-resident attribute, not just
INDEX_ALLOCATION attributes.
(Contributed by Eric Biggers)
Some constraints put on reparse points of unknown type (e.g. they cannot
be deleted) are not acceptable to archivers. This patch removes some
constraints.
Windows requires non-Microsoft reparse points (identified by having bit
31 of the reparse tag clear) to have a 16-byte GUID following the regular
reparse point header. This GUID is not, and cannot, be included in the
"reparse data length" field.
(Contributed by Eric Biggers)
Under some rare condition there is no space in an MFT entry to make
an index non-resident, and the index root has to be moved to an extent.
This fix cares for the situation when the attribute list was inserted
beforehand.
When writing to compressed data, the function ntfs_attr_pwrite()
cannot cross a compression block border. This is a problem for archivers
which rely on libntfs-3g, so the function is now wrapped in another one
which restarts the writing as needed.
ntfs_valid_sid() required that the subauthority count be between 1 and 8
inclusively. However, Windows permits more than 8 subauthorities as well
as 0 subauthorities:
- The install.wim file for the latest Windows 10 build contains a file
whose DACL contains a SID with 10 subauthorities.
ntfs_set_ntfs_acl() was failing on this file.
- The IsValidSid() function on Windows returns true for subauthority
less than or equal to 15, including 0.
There was actually already a another SID validation function that had the
Windows-compatible behavior, so I merged the two together.
Contributed by Eric Biggers
Compressed records may be written as full clusters even though cluster
tails are meaningless. This is to avoid the lower levels doing a read-
modify-write cycle. Be sure to zero the meaningless bytes to avoid
leaking information.
Contributed by Eric Biggers
When the owner and the group of a file have the same SID, and permissions
for the group is the same as permissions for other, no ACE is needed for
the group.
Newer versions of Windows use more recent definitions of upper-case
table defined by the Unicode consortium. Now using the same table as
Windows 7, windows 8 and Windows 10. This only has an effect on file
systems newly created by mkntfs.
An unused MFT record may show a bad length, leading to fetch fixups from
unallocated memory when allocating the record to a new file. So check
the length before applying the fixups. Such records have been found after
the MFT has been reallocated by a defragmenter, and they are not cleaned
by chkdsk.
When chmod'ing a file, no new ACL has to be created if the one needed
is already present in the cache. However the read-only flag may have
to be updated, so that it is kept as the opposite of S_IWUSR.
When the security attribute is present, chkdsk may set a null security id
in the standard attributes, and this should not be considered as an error.
(this partially reverts commit [70e5b1])
This patch changes the algorithm to use hash chains instead of binary
trees, with much stronger hashing. It also introduces useful (for
performance) parameters, such as the "nice match length" and "maximum
search depth", that are similar to those used in other commonly used
compression algorithms such as zlib's DEFLATE implementation.
The speed improvement is very significant, with some loss of compression
rate. The compression rate is still better than then Windows one.
Contributed by Eric Biggers
The new way goes via /sys/dev/block/MAJOR:MINOR to map partitions to
devices and get discard parameters of the parent device. It also ensures
that the partition is aligned to the discard block size.
Contributed by Richard W.M. Jones
fstrim(8) discards unused blocks on a mounted filesystem. It is useful for
solid-state drives (SSDs) and thinly-provisioned storage.
Only trimming the full device (with no option) is supported.
Contributed by Richard W.M. Jones