linux/arch/x86
Xiaochen Shen 06c5fe9b12 x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d852 (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
2020-12-10 17:52:37 +01:00
..
boot A set of fixes for x86: 2020-12-06 11:22:39 -08:00
configs * A defconfig fix, from Daniel Díaz. 2020-09-20 15:06:43 -07:00
crypto crypto: x86/poly1305 - add back a needed assignment 2020-10-24 09:38:32 +11:00
entry A couple of x86 fixes which missed rc1 due to my stupidity: 2020-10-27 14:39:29 -07:00
events perf/x86/intel: Check PEBS status correctly 2020-12-03 10:00:26 +01:00
hyperv hyperv-fixes for 5.10-rc3 2020-11-05 11:32:03 -08:00
ia32 x86: remove address space overrides using set_fs() 2020-09-08 22:21:36 -04:00
include x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP 2020-12-10 12:28:06 +01:00
kernel x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled 2020-12-10 17:52:37 +01:00
kvm ARM: 2020-11-27 11:04:13 -08:00
lib x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes 2020-12-06 10:03:08 +01:00
math-emu treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mm x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP 2020-12-10 12:28:06 +01:00
net bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth 2020-09-29 16:47:39 -07:00
oprofile
pci pci-v5.10-changes 2020-10-22 12:41:00 -07:00
platform efi/x86: Free efi_pgd with free_pages() 2020-11-10 19:18:11 +01:00
power Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
purgatory treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
ras treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
realmode x86/head/64: Don't call verify_cpu() on starting APs 2020-09-09 11:33:20 +02:00
tools x86/insn: Make inat-tables.c suitable for pre-decompression code 2020-09-07 19:45:24 +02:00
um arch/um: partially revert the conversion to __section() macro 2020-10-26 15:39:37 -07:00
video
xen xen: branch for v5.10-rc5 2020-11-20 10:30:48 -08:00
.gitignore
Kbuild
Kconfig kbuild: Hoist '--orphan-handling' into Kconfig 2020-12-01 22:45:36 +09:00
Kconfig.assembler
Kconfig.cpu treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Kconfig.debug x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() 2020-10-06 11:18:04 +02:00
Makefile kbuild: Hoist '--orphan-handling' into Kconfig 2020-12-01 22:45:36 +09:00
Makefile_32.cpu
Makefile.um