Now, we make sure that we update ref count and BCB list membership
with the BCB lock held, in a row.
This will avoid race conditions where the BCB was removed from the
list, then referenced again, leading to inconsistencies in memory
and crashes later on.
This could notably be triggered while building ReactOS on ReactOS
(one would call this a regression).
CORE-15235
- Rework CmpSetSystemValues() and remove its 1st-stage text-mode setup hack, since a real registry hive will be used for 1st-stage either.
- Lock, then unlock the registry in NtInitializeRegistry when initializing the hives & flusher.
- Call CmpInitializeHiveList() (i.e., initialize the other hives like \Software, \User, \.Default) only when we are not in setup-mode.
svn path=/branches/setup_improvements/; revision=74747
This reverts a7c26408 (r53255) and ff75ae1b (r53694), and a hack from 6075ae9a (r46690).
svn path=/branches/setup_improvements/; revision=74745
svn path=/branches/setup_improvements/; revision=74746
- Improve the capture of OBJECT_ATTRIBUTES parameters that are passed
(by pointer) to the Cm* helper functions, and the capture of
UNICODE_STRINGs.
- Correctly differentiate user-mode vs. kernel-mode root directory handles
(in OBJECT_ATTRIBUTES): note that most of the Cm* APIs expect their
parameters to be kernel-mode (pointers, handles...).
CORE-13448
- Validate the information class parameter in NtQueryValueKey().
- Call the post-callback in NtSetValueKey() only if the callback
has been registered and the CmSetValueKey() call is executed.
We won't reuse a BCB created for mapping, we will now have
our own dedicated BCB.
This allows having a bit more cleaner implementation of CcPinMappedData()
We now handle race conditions when creating BCB to avoid
having duplicated BCB per shared maps.
Also, we already specify whether the memory will be pinned
when creating the BCB, to avoid potential duplications or
BCB misuse.
If so, return such BCB instead of creating a new one. This will
allow (at some point) to be more consistent in case of concurrent
mapping.
This fixes a few CcMapData tests.
Given current ReactOS implementation, a VACB can be pinned
several times, with different BCB. In next commits, a single
BCB will be able to be pinned several times. That would
lead to severe inconsistencies in counting and thus corruption.
This quick check based on bits and operation is for 2^ based
sector sizes (most of the cases) and will perform faster than
the modulo operation which is still used in fallback in case
the sector size wouldn't be a power of 2.
IopCloseFile can be called by IopDeleteFile. In that situation, it
doesn't set any process as first parameter. Furthermore, we are in a
situation where it's not required to lock the file object (see the
assert before the call).
- SeIsTokenChild(): Correctly check whether a caller-provided token
is a child from the current process' primary token by looking at
its ParentTokenId member.
- Add a SeIsTokenSibling() helper to determine whether a caller-provided
token and the current process' primary token are siblings, by comparing
their ParentTokenId's and AuthenticationId's.
NOTE: Children tokens are created through CreateRestrictedToken();
sibling tokens are created through DuplicateToken() (amongst others).
See slide 49 of https://www.slideshare.net/Shakacon/social-engineering-the-windows-kernel-by-james-forshaw
or https://googleprojectzero.blogspot.com/2016/01/raising-dead.html
for some details.
This could be triggered when attempting to read/write to really big
files. It was causing an attempt to read 0 bytes in Cc, leading to
asserts failure in the kernel (and corrupted file).
CORE-15067
Currently, our CcMapData() behavior (same goes for CcPinRead()) is broken
and is the total opposite of what Windows kernel does. By default, the later
will let you map a view in memory without even attempting to bring its
data in memory. On first access, there will be a fault and memory will
be read from the hardware and brought to memory. If you want to force read
on mapping/pinning, you have to set the MAP_NO_READ (or PIN_NO_READ) flag
where kernel will fault on your behalf (hence the need for MAP_WAIT/PIN_WAIT).
On ReactOS, by default, on mapping (and thus pinning), we will force a view
read so that data is in memory. The way our cache memory is managed at the
moment seems not to allow to fault on invalid access and if we don't force
read, the memory content will just be zeroed.
So trying to match Windows behavior, by default, now CcMapData() will enforce
the MAP_NO_READ flag and warn once about this behavior change.
It's based on the code that was in CcPinRead() implementation. This
made no sense to have CcPinMappedData() doing nothing while implementing
everything in CcPinRead(). Indeed, drivers (starting with MS drivers)
can map data first and pin it afterwards with CcPinMappedData(). It was
leading to incorrect behavior with our previous noop implementation.
Short: The code was suffering from an off-by-one bug (inconsistency between inclusive end exclusive end address), which could lead to freeing one page above the initialization code. This led to freeing part of the kernel import section on x64. Now it is consistently using the aligned/exclusive end address.
Long:
* Initialization sections are freed both for the boot loaded images as well as for drivers that are loaded later. Obviously the second mechanism needs to be able to run at any time, so it is not initialization code itself. For some reason someone decided though, it would be a smart idea to implement the code twice, once for the boot loaded images, once for drivers and concluding that the former was initialization code itself and had to be freed.
* Since freeing the code that frees the initialization sections, while it is doing that is not possible, it uses a "smart trick", initially skipping that range, returning its start and end to the caller and have the caller free it afterwards.
* The code was using the end address in an inconsistent way, partly aligning it to the start of the following section, sometimes pointing to the last byte that should be freed. The function that freed each chunk was assuming the latter (i.e. that the end was included in the range) and thus freed the page that contained the end address. The end address for the range that was returned to the caller was aligned to the start of the next section, and the caller used it to free the range including the following page. On x64 this was the start of the import section of ntoskrnl. How that worked on x86 I don't even want to know.
The PROCESS_DEVICEMAP_INFORMATION union has 2 fields, one is a handle, the other one is a structure of 36 bytes (independent of architecture). The handle forces 64 bit alignment on 64 bit builds, making the structure 4 bytes bigger than on 32 bit builds. The site is checked in NtQueryInformationProcess (case ProcessDeviceMap). The expected size on x64 is the size of the Query structure without alignment. autocheck correctly passes the site of the Query union member, while smss passes the full size of PROCESS_DEVICEMAP_INFORMATION. Packing the structure is not an option, since it is defined in public headers without packing. Using the original headers sizeof(PROCESS_DEVICEMAP_INFORMATION) is 0x28, sizeof(PROCESS_DEVICEMAP_INFORMATION::Query) is 0x24.
- Rename ObDirectoryType to ObpDirectoryObjectType and remove it from NDK (this is not exported!)
- Rename ObSymbolicLinkType to ObpSymbolicLinkObjectType
- Remove duplicated ObpTypeObjectType from ob.h
Kernel stacks that re freed, can be placed on an SLIST for quick reuse. The old code was using a member of the PFN of the last stack page as the SLIST_ENTRY. This relies on the following (non-portable) assumptions:
- A stack always has a PTE associated with it.
- This PTE has a PFN associated with it.
- The PFN has an empty field that can be re-used as an SLIST_ENTRY.
- The PFN has another field that points back to the PTE, which then can be used to get the stack base.
Specifically: On x64 the PFN field is not 16 bytes aligned, so it cannot be used as an SLIST_ENTRY. (In a "usermode kernel" the other assumptions are also invalid).
The new code does what Windows does (and which seems absolutely obvious to do): Place the SLIST_ENTRY directly on the stack.
It's hardly understandable and doesn't really makes sense.
Furthermore, it breaks compatibility with 3rd party FSD that
don't implement such FSCTL.
Obviously, Windows doesn't do this.
For user mode, when probing output buffer, if it's null, length
will also be set to 0.
This avoids user mode applications being able to trigger various
asserts in ReactOS (and thus BSOD when no debugger is plugged ;-)).
NDK: Define PLUGPLAY_CONTROL_PROPERTY_DATA.Properties and PLUGPLAY_CONTROL_DEVICE_RELATIONS_DATA.Relations values.
NTOSKRNL: Map PLUGPLAY_CONTROL_PROPERTY_DATA.Properties values to IoGetDeviceProperty properties and add (dummy) code for unsupported cases.
UMPNPMGR: Use PLUGPLAY_CONTROL_PROPERTY_DATA.Properties values in PNP_GetDeviceRegProp.
- Overhaul SepCreateToken() and SepDuplicateToken() so that they
implement the "variable information area" of the token, where
immutable lists of user & groups and privileges reside, and the
"dynamic information area" (allocated separately in paged pool),
where mutable data such as the token's default DACL is stored.
Perform the necessary adaptations in SepDeleteToken() and in
NtSetInformationToken().
- Actually dereference the token's logon session, when needed, in the
'TokenSessionReference' case in NtSetInformationToken().
- Overhaul SepFindPrimaryGroupAndDefaultOwner() so that it returns
the indices of candidate primary group and default owner within the
token's user & groups array. This allows for fixing the 'TokenOwner'
and 'TokenPrimaryGroup' cases of NtSetInformationToken(), since the
owner or primary group being set *MUST* already exist in the token's
user & groups array (as a by-product, memory corruptions that existed
before due to the broken way of setting these properties disappear too).
- Lock tokens every time operations are performed on them (NOTE: we
still use a global token lock!).
- Touch the ModifiedId LUID member of tokens everytime a write operation
(property change, etc...) is made on them.
- Fix some group attributes in the SYSTEM process token, SepCreateSystemProcessToken().
- Make the SeCreateTokenPrivilege mandatory when calling NtCreateToken().
- Update the token pool tags.
- Explicitly use the Ex*ResourceLite() versions of the locking functions
in the token locking macros.
- Use TRUE/FALSE instead of 1/0 for booleans.
- Use NULL instead of 0 for null pointers.
- Print 0x prefix for hex values in DPRINTs.
- Use new annotations for SepCreateToken() and SepDuplicateToken().
Caught while debugging, in the case the ImpersonationLevel value was
uninitialized, due to the fact it was left untouched on purpose by
PsReferenceEffectiveToken().
kmtest:NtCreateSection calls CcInitializeCacheMap with a
NULL value for SectionObjectPointers. This will cause an exception when
trying to access it, which in Windows can be handled gracefully.
However accessing it while holding ViewLock means the lock will not be
released, leading to an APC_INDEX_MISMATCH bugcheck.
This solves the problem by allocating SharedCacheMap outside the lock,
then freeing it again under lock if another thread has updated SharedCacheMap
in the mean time. This is also What Windows Does(TM).
This avoids a really nasty race condition in our cache controler where
two concurrents could try to initialize cache on the same file.
This had two nasty effects: first shared map was purely leaked and erased
by the second one. And the private cache map, allocated on the first shared
cache map couldn't be freed and was leading to Mm BSOD (free in a middle of
a block).
This was often triggered while building ReactOS on ReactOS (with multi threads).
With that patch, I cannot crash anylonger while building ReactOS.
CORE-14634
Oneliner of the day... This typo just prevented the
whole feature to work properly. Because any allocated
work item would miserably fail to be freed.
This will obviously help real world FSD relying on
StackOverflow worker from FsRtl to work better!
CORE-14611
In the lazy writer run, first post items that are queued for this.
Only then, start executing deferred writes if any.
If there were any, reschedule immediately a lazy writer run, to keep
Cc warm and to make it unqueue write faster in case of high IOs situation.
To make second lazy writer run happen faster, we keep our state active to
use short delay (1s) instead of standard idle (3s).
Recent changes seem to show that it's not
required to be exclusive on VACB to be able
to flush it.
This commit goes with f2c44aa and fixes the
last issues going with copying huge files.
There are no longer BSODs (be it in Mm or Cc).
I could, with 750MB RAM extract a 2GB file from
a 53MB archive and copy a 2,5GB file from a VBox
share to the disk. Note that writes are often
deferred, so if copy works, it's not that fast for now.
Note that it also brings some beloved behavior from
Windows: copy times are totally unreliable now when
writes are deferred. Little remaining times when
actively copying, high remaining times when deferred
writes in action. And goes between both... Sorry! ;-)
https://xkcd.com/612/
CORE-9696
CORE-11175
Same mechanism exists in Windows (even their Cc
is way different from ours...) where when Cc is
out of memory (in their case, out of VACB), we
will start scavenge old & unused VACB to free
some of the memory.
It's useful in case we're operating we big files
operations, we may run out of memory where to map
VACB for them, so start to scavenge VACB to free
some of that memory.
With this, I am able to install Qt 4.8.6 with 2,5GB of RAM,
scavenging acting when needed!
CORE-12081
CORE-14582
Adjusting refcount and enabling lazy-write for pinned
VACB makes it actually more efficient, often purging
data to disk, reducing memory stress for the system.
This is required for defering writes.
This commit unfortunately (?) reverts a previous revert.
CORE-12081
CORE-14582
CORE-14313
When no name is set in the file object, try to read the name
from the FCB. We only support FastFAT (ours) FCB for now.
This is clearly a hack, but for a kdbg command, so ;-)
It seems that on process killing, some VACB may be deleted while
still mapped. With current reference counting, they will actually
not be deleted, but leaked, and an ASSERT will be triggered.
CORE-14578
- For non-PnP devices reported to the PnP manager through the
IoReportDetectedDevice() function, store the corresponding
service/driver name and (non-)legacy information inside their
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\Root\ entries.
- Drivers flagged as "DRVO_BUILTIN_DRIVER" (basically, only those
created via a IoCreateDriver() call) have their "Service" name that
contain "\Driver\", which should be stripped before being used in
building e.g. the corresponding "DETECTEDxxx" PnP compatible IDs.
CORE-14247
- Use explicit REG_OPTION_NON_VOLATILE flag where needed in the
IopCreateDeviceKeyPath() calls.
- Save NULL-terminated REG-SZ string properties in the enumeration tree
for each device enumerated inside \Enum\Root\.
- Always use upcased key name for the "LEGACY_***" elements in \Enum\Root\.
- Add a default "ConfigFlags" value for the legacy elements.
- Simplify few parts of code.
The size is in bytes, not in pages! On x86 we got away with it, since PEB and TEB require only a single page and the 1 passed to MiInsertVadEx() was aligned up to PAGE_SIZE. On x64 this doesn't work, since the size is 2 pages.
This has have several benefits for ReactOS Cc:
- It helps reducing potential deadlocks situations in Cc
- It speeds up ReactOS by reducing locks
- It gets us a bit closer to Windows VACB
CORE-14349
The avoids race conditions where attempts to read from disk to
not fully initialized VACB were performed.
Also, added more debug prints in such situations.
CORE-14349
Doing this is not only wrong because it acquires the same spinlock twice,
it also completely breaks the TLB flushing logic in MiMapPageInHyperSpace.
If the PTE with Offset 1 is still valid when a wrap-around to 0 happens,
the TLB flush on wrap-around will not clear the entry for this previous page.
After another loop around all hyperspace pages, page 1 is re-used but its
TLB entry has not been flushed, which may result into incorrect translation.
This allows being consistent between newly created and looked up
so that VACB can always safely be released.
Should really help with reference issues.
CORE-14481
CORE-14480
CORE-14482
This fixes various bugs linked to VACB counting:
- VACB not released when it should be
- Reference count expectations not being accurate
For the record, VACB should always have at least a reference
count of 1, unless you are to free it and removed it from
any linked list.
This commit also adds a bunch of asserts that should
help triggering invalid reference counting.
It should also fix numerous ASSERT currently triggered and
may help fixing random behaviours in Cc.
CORE-14285
CORE-14401
CORE-14293
Previously, we would keep sampling the CPU frequency until two subsequent
samples differed by at most 1 MHz. This could take several seconds, and would
unnecessarily delay boot.
Instead, if sampling is too unreliable, just give up and calculate the average
frequency from 10 samples. This is no worse than picking the frequency that
just happened to be returned twice in a row.
The fact that this method of sampling fails could indicate that there's a
problem with our performance counter implementation or timer interrupt,
but that's a separate issue...
This avoids locking Cc for too long by trying to read-ahead data which
is already in cache.
We now will only schedule a read ahead if next read should bring us
to a new VACB (perhaps not in cache).
This notably fixes Inkscape setup which was slown down by read-ahead
due to continous 1 byte reads.
Thanks to Thomas for his help on this issue.
CORE-14395
The reserve IRP is an IRP which is allocated on system boot and kept during
the whole system life. Its purpose is to allow page reads in case of
low-memory situations where the system doesn't have enough memory left
to allocate an IRP to read from the page file (would be catastrophic situation).
Standard shared cache map provides space for a private cache map, do the same
and make it available for the first handle. It avoids two allocations in a row.
This halfplements CcScheduleReadAhead() which is responsible for finding the next reads
to perform given last read and previous reads. I made it very basic for now, at least
to test the whole process.
This also introduces the CcExpressWorkQueue in the lazy writer which is responsible
for dealing with read ahead items and which is dealt with before the regular queue.
In CcCopyData(), if read was fine, schedule read ahead so that it can happen in background
without the FSD to notice it! Also, update the read history so that scheduling as a
bit of data.
Implement (à la "old Cc" ;-)) CcPerformReadAhead() which is responsible for performing
the read. It's only to be called by the worker thread.
Side note on the modifications done in CcRosReleaseFileCache(). Private cache map
is tied to a handle. If it goes away, private cache map gets deleted. Read ahead
can run after the handle was closed (and thus, private cache map deleted), so
it is mandatory to always lock the master lock before accessing the structure in
read ahead or before deleting it in CcRosReleaseFileCache(). Otherwise, you'll
just break everything. You've been warned!
This commit also partly reverts f8b5d27.
CORE-14312
before the call to CcRosReleaseFileCache() which expects to have it to properly clean the file.
So, move deletion code to CcRosReleaseFileCache() so that he's the only one to handle private map.
Should hopefully fix all the recent buildbots issues (and the universe perhaps, who knows?)
This reverts BCB being lazy written when marked dirty.
We'll go back to this behavior when this part will have been reworked and stabilized.
CORE-14263
CORE-14279
CORE-14285
We get rid of the old iLazyWriterNotify event in favor of work items
that contain an event that lazy writer will set once its done.
To implement this, we rely on the newly introduced CcPostTickWorkQueue work queue
that will contain work items that are to be queued once lazy writer is done.
Move the CcWaitForCurrentLazyWriterActivity() implementation to the
lazy writer file, and reimplemented it using the new support mechanisms
Instead move to a threading model like the Windows one.
We'll queue several work items to be executed in a system thread (Cc worker)
when there are VACB that have been marked as dirty. Furthermore, some delay
will be observed before action to avoid killing the system with IOs.
This new threading model opens way for read ahead and write behind implementation.
Also, moved the initialization of the lazy writer to CcInitializeCacheManager()
it has nothing to do with views and shouldn't be initialized there.
Also, moved the lazy writer implementation to its own file.
Modified CcDeferWrite() and CcRosMarkDirtyVacb() to take into account the new threading model.
Introduced new functions:
- CcPostWorkQueue(): post an item to be handled by Cc worker and spawn a worker if required
- CcScanDpc(): called after some time (not to have lazy writer always running) to queue a lazy scan
- CcLazyWriteScan(): the lazy writer we used to have
- CcScheduleLazyWriteScan(): function to call when you want to start a lazy writer run. It will make a DPC after some time and queue execution
- CcWorkerThread(): the worker thread that will handle lazy write, read ahead, and so on
This means that MmSystemCacheStart, MmSystemCacheEnd, MmSizeOfSystemCacheInPages
have now a valid value.
System cache is not used atm the moment though. MmMapViewInSystemCache() is to
be implemented, and Cc is to be made aware of this.
CORE-14259
- Change MM_SYSTEM_SPACE_START to 0xFFFFF88000000000
- Move MI_DEBUG_MAPPING to the end of the system PTE range
- Add MI_SYSTEM_CACHE_START and MI_SYSTEM_CACHE_END, which is in the range that Vista uses as dynamic VA space for cache and other allocations
- Wrap x86 specific code that makes now invalid assumptions about the address space layout in #ifdef _M_IX86
- CcUnpinDataForThread() only release VACB when the last BCB reference is gone. This avoids having a valid BCB with an invalid VACB
- CcRosMarkDirtyVacb() will only accept non-dirty VACB now. This avoids a major bug where a an already dirty VACB was over-dereferenced
- Thanks to previous point, simplify CcRosUnmapVacb(), CcRosReleaseVacb() implementation
- And only set VACB dirty once in CcSetDirtyPinnedData()
- Add a few sanity checks
With that I can again install ReactOS with 128MB RAM :-).
CORE-14263
CORE-14268