ntfsrecover -f -v <log file> receives a SIGSEGV because of trying to
read memory outside allocated buffer because of no sanity checks on
restart page header values. This happens on an empty $LogFile because
of no basic checks present. Attached patch adds basic checks similar
to those inside logfile library and allows tool to exit with more
suitable message.
(contributed by Rakesh Pandit)
In the mailing list discussion we came to the conclusion that there
doesn't seem to be any reason to keep these declarations separate since
they address the same issue, namely libntfs-3g's tolerance for bad
Unicode data in filenames and other UTF-16 strings in the file system,
so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
Windows filenames may contain invalid UTF-16 sequences (specifically
broken surrogate pairs), which cannot be converted to UTF-8 if we do
strict conversion.
This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
encoding any surrogate character that don't have a match into a separate
3-byte UTF-8 sequence.
This is "sort of" valid UTF-8, but not valid Unicode since the code
points used for surrogate pair encoding are not supposed to occur in a
valid Unicode string... but on the other hand the source UTF-16 data is
also broken, so we aren't really making things any worse.
This format is sometimes referred to as WTF-8 (Wobbly Translation
Format, 8-bit encoding) and is a common solution to represent broken
UTF-16 as UTF-8.
It is a lossless round-trip conversion, i.e converting from broken
UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
sequence. Because of this property it enables accessing these files
by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
expected).
To disable this behaviour you can pass the preprocessor/compiler flag
'-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.
These tools were originally developed for running on Windows and later
ported to libntfs-3g. This patch makes them similar to other ntfsprogs
tools, dropping the native Windows interfaces and using libntfs-3g on
all platforms.
There is no change in usage or supported features, only the command
names have changed.
These tools were developped before the ntfsprogs were merged into ntfs-3g,
redesigning them like the ntfsprogs make the code simpler.
Note : at this stage secaudit and usermap cannot be built any more.
Prepare merging ntfsrecover.h into logfile.h by adding a usn field to
RESTART_PAGE_HEADER. As this changes the record size, ignore the new
field in existing code.
User extended attributes should only be set on files and directories,
not on symlinks, sockets, devices, etc. For safety they are also
forbidden on metadata files, but should be allowed on the root
directory. For files based on reparse points, requests are made
to the plugin to determine the type.
Kernel cacheing of file attributes is usually not used by ntfs-3g,
because it has defects when dealing with hard linked files and directory
permission checks. Kernel cacheing is however possible when using
lowntfs-3g and not using Posix ACLs.
When creating a new MFT record, do not issue a warning if the current
record has bad fixups. These warnings are meaningless, difficult to
interpret and cause unneeded worries.
The new "system compression" files used by Windows 10 make use of reparse
points to record the compression parameters, and a specific named data
stream is used to store the compressed data. With this patch, processing
of reparse points can be done by an external plugin only loaded as needed.
Junctions and symlinks, which are also based on reparse points, are now
processed by "internal plugins".
Usually, only a few pages of the Windows log file are saved in an
ntfsclone image. This is inappropriate for building reference images
for recovering the log, and the --full-logfile option serves that
purpose.
When an INDX or MFT record could not be read while undoing the creation
of this record, there is nothing to do. However if this was undoing the
deletion of the last entry in an index, a new void index block has to be
created.
On the OpenIndiana Hipster distribution, compiling with GCC 4.9 would
fail because __BYTE_ORDER__ was defined but not to any of the values
assumed to be associated with this define (__LITTLE_ENDIAN__ or
__BIG_ENDIAN__). Instead it was defined to either
__ORDER_LITTLE_ENDIAN__ or __ORDER_BIG_ENDIAN__. This caused
compilation to fail.
Fixed by checking that all referenced defines are in fact defined
before using them and adding an additional #elif clause for this newly
discovered condition.
The previous fix for the warning referred to 'prevbuf' being used
uninitialized and this is also what the compiler says. However
initializing 'prevbuf' doesn't make the warning go away and further
testing revealed that it is really 'savebuf' being possibly used prior
to initialization that is the source of the warning (the incorrect
warning message is probably an optimization-related gcc bug). So replace
previous ineffective fix with explicit initialization of 'savebuf'.
For 64-bit (e.g. x86_64) Linux the 64-bit wide types resolve to long,
not long long as is the case in 32-bit (e.g. i386) Linux. So we need an
explicit cast to long long for 64-bit types since the format string must
specify the 'll' modifier in order to print 64-bit values.
Some compilers issue a warning when a pointer is initialized in
both alternatives of a condition. Force an extra initialization
to avoid such warnings.
Closing the volume is the way to sync the MFT to disk. When not doing
so, the MFT runlists in $DATA and $Bitmap are not synced if they have
been updated in the second resizing stage relative to runlists which
have grown outside their original MFT record.
Unlike in most cases, the bad sector inode has to be closed if it
was updated and required MFT extents (when there are a lot of bad
sectors and some of them were outside the truncated partition).
Not doing so causes the inode to not be fully synced to device.
This fixes compiler warnings emitted when you compare an le32 value with
e.g. 'const_cpu_to_le32(-1)' on a little-endian system, because
previously the expansion of the macro expression 'const_cpu_to_le32(-1)'
would be '(-1)' on a little-endian system but '(u32)((((u32)(-1) &
0xff000000u) >> 24) | (((u32)(-1) & 0x00ff0000u) >> 8) | (((u32)(-1) &
0x0000ff00u) << 8) | (((u32)(-1) & 0x000000ffu) << 24))' on a
big-endian system, i.e. the type of the expanded expression would be
'int' (signed) in the little-endian case but 'u32' (unsigned) in the
big-endian case.
With this commit the type of the expanded expression will be 'le32' in
both the little-endian and the big-endian case.