mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2025-01-05 21:35:04 +08:00
09f037aa48
update_permission_bitmask currently does a 128-iteration loop to, essentially, compute a constant array. Computing the 8 bits in parallel reduces it to 16 iterations, and is enough to speed it up substantially because many boolean operations in the inner loop become constants or simplify noticeably.

Because update_permission_bitmask is actually the top item in the profile for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand clock cycles, or up to 30%:

| Test | Before (cycles) | After (cycles) |
|---|---|---|
| cpuid | 35173 | 25954 |
| vmcall | 35122 | 27079 |
| inl_from_pmtimer | 52635 | 42675 |
| inl_from_qemu | 53604 | 44599 |
| inl_from_kernel | 38498 | 30798 |
| outl_to_kernel | 34508 | 28816 |
| wr_tsc_adjust_msr | 34185 | 26818 |
| rd_tsc_adjust_msr | 37409 | 27049 |
| mmio-no-eventfd:pci-mem | 50563 | 45276 |
| mmio-wildcard-eventfd:pci-mem | 34495 | 30823 |
| mmio-datamatch-eventfd:pci-mem | 35612 | 31071 |
| portio-no-eventfd:pci-io | 44925 | 40661 |
| portio-wildcard-eventfd:pci-io | 29708 | 27269 |
| portio-datamatch-eventfd:pci-io | 31135 | 27164 |

(I wrote a small C program to compare the tables for all values of CR0.WP, CR4.SMAP and CR4.SMEP, and they match).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

||
---|---|---|
cpuid.c | ||
cpuid.h | ||
debugfs.c | ||
emulate.c | ||
hyperv.c | ||
hyperv.h | ||
i8254.c | ||
i8254.h | ||
i8259.c | ||
ioapic.c | ||
ioapic.h | ||
irq_comm.c | ||
irq.c | ||
irq.h | ||
Kconfig | ||
kvm_cache_regs.h | ||
lapic.c | ||
lapic.h | ||
Makefile | ||
mmu_audit.c | ||
mmu.c | ||
mmu.h | ||
mmutrace.h | ||
mtrr.c | ||
page_track.c | ||
paging_tmpl.h | ||
pmu_amd.c | ||
pmu_intel.c | ||
pmu.c | ||
pmu.h | ||
svm.c | ||
trace.h | ||
tss.h | ||
vmx.c | ||
x86.c | ||
x86.h |