linux/tools/testing/selftests/kvm
Peter Xu 016ff1a442 KVM: selftests: Sync data verify of dirty logging with guest sync
This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
when the testing host is very busy.

A similar fix was attempted earlier [1], but it was not enough; the reason is
stated in the reply [2].

As a summary (partly quoting from [2]):

I think the problem is that one guest memory write operation (in this specific
test) consists of a few micro-steps while the page is under KVM dirty tracking
(here I'm only considering write-protect rather than PML, but PML should be
similar, at least when the log buffer is full):

  (1) Guest reads 'iteration' number into register, prepares to write, page fault
  (2) Set dirty bit in either dirty bitmap or dirty ring
  (3) Return to guest, data written

When we verify the data, we assume that all these steps are "atomic": once (1)
has happened for a page, we assume (2) and (3) must have happened too.  We had
a trick to work around the non-atomicity of these three steps: the previous
version of this patch tried to fix the atomicity of steps (2)+(3) by explicitly
letting the main thread wait for at least one vmenter of the vcpu thread, which
should work.  However, what I probably overlooked is that we still have a race
when the vcpu is interrupted between (1) and (2).

One example call trace where this can happen: we read an old iteration, then
get interrupted before even setting the dirty bit and flushing the data:

    __schedule+1742
    __cond_resched+52
    __get_user_pages+530
    get_user_pages_unlocked+197
    hva_to_pfn+206
    try_async_pf+132
    direct_page_fault+320
    kvm_mmu_page_fault+103
    vmx_handle_exit+288
    vcpu_enter_guest+2460
    kvm_arch_vcpu_ioctl_run+325
    kvm_vcpu_ioctl+526
    __x64_sys_ioctl+131
    do_syscall_64+51
    entry_SYSCALL_64_after_hwframe+68

This means the iteration number cached in the vcpu register can be very old by
the time the dirty bit is set and the data is flushed.

So far I don't see an easy way to guarantee the atomicity of steps (1)-(3)
other than to sync at the GUEST_SYNC() point of the guest code when we verify
the dirty bits, which is what this patch does.

[1] https://lore.kernel.org/lkml/20210413213641.23742-1-peterx@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210417143602.215059-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:02 -04:00
aarch64 KVM: arm64: selftests: Filter out DEMUX registers 2020-11-27 19:46:47 +00:00
include KVM: selftests: List all hugetlb src types specified with page sizes 2021-04-20 04:18:53 -04:00
lib KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers 2021-04-20 04:18:53 -04:00
s390x KVM: selftests: sync_regs test for diag318 2020-12-10 13:36:05 +01:00
x86_64 KVM: selftests: remove redundant semi-colon 2021-04-17 08:31:01 -04:00
.gitignore KVM: selftests: Add a test for kvm page table code 2021-04-20 04:18:53 -04:00
config selftests: kvm: Adding config fragments 2019-08-09 16:52:38 +02:00
demand_paging_test.c KVM: selftests: Add backing src parameter to dirty_log_perf_test 2021-02-04 05:27:19 -05:00
dirty_log_perf_test.c KVM: selftests: Disable dirty logging with vCPUs running 2021-02-04 05:27:20 -05:00
dirty_log_test.c KVM: selftests: Sync data verify of dirty logging with guest sync 2021-04-21 12:20:02 -04:00
hardware_disable_test.c selftests: kvm: add hardware_disable test 2021-02-15 11:42:36 -05:00
kvm_create_max_vcpus.c KVM: selftests: Convert some printf's to pr_info's 2020-03-16 17:57:07 +01:00
kvm_page_table_test.c KVM: selftests: Add a test for kvm page table code 2021-04-20 04:18:53 -04:00
Makefile KVM: selftests: Add a test for kvm page table code 2021-04-20 04:18:53 -04:00
memslot_modification_stress_test.c KVM: selftests: Add backing src parameter to dirty_log_perf_test 2021-02-04 05:27:19 -05:00
set_memory_region_test.c Merge branch 'kvm-master' into kvm-next 2021-01-07 18:06:52 -05:00
settings selftests: kvm: Raise the default timeout to 120 seconds 2021-02-09 08:17:08 -05:00
steal_time.c KVM: selftests: Rework timespec functions and usage 2020-03-18 14:08:56 +01:00