linux/kernel/trace
Steven Rostedt (Red Hat) 651e22f270 ring-buffer: Always reset iterator to reader page
When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with a empty page and this page that
was swapped out becomes the new reader page. The reader page
is owned by the reader and since it was swapped out of the ring
buffer, writers do not have access to it (there's an exception
to that rule, but it's out of scope for this commit).

When reading the "trace" file, it is a non consuming read, which
means that the data in the ring buffer will not be modified.
When the trace file is opened, a ring buffer iterator is allocated
and writes to the ring buffer are disabled, such that the iterator
will not have issues iterating over the data.

Although the ring buffer disabled writes, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

Which was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted. It happened with the rb_iter_reset().

As consuming reads may not read the full reader page, and only part
of it, there's a "read" field to know where the last read took place.
The iterator, must also start at the read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the iterator
would start at the head page within the ring buffer (where writes still
happen). But the mistake there was that it still used the "read" field
to start the iterator on the head page, where it should always start
at zero because readers never read from within the ring buffer where
writes occur.

I originally wrote a patch to have it set the iter->head to 0 instead
of iter->head_page->read, but then I questioned why it wasn't always
setting the iter to point to the reader page, as the reader page is
still valid.  The list_empty(reader_page->list) just means that it was
successful in swapping out. But the reader_page may still have data.

There was a bug report a long time ago that was not reproducible that
had something about trace_pipe (consuming read) not matching trace
(iterator read). This may explain why that happened.

Anyway, the correct answer to this bug is to always use the reader page
an not reset the iterator to inside the writable ring buffer.

Cc: stable@vger.kernel.org # 2.6.28+
Fixes: d769041f86 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06 16:06:34 -04:00
..
blktrace.c Most of the changes were largely clean ups, and some documentation. 2014-04-03 10:26:31 -07:00
ftrace.c tracing: Fix graph tracer with stack tracer on other archs 2014-07-15 11:10:25 -04:00
Kconfig tracing: Add tracepoint benchmark tracepoint 2014-05-29 22:49:54 -04:00
Makefile tracing: Add tracepoint benchmark tracepoint 2014-05-29 22:49:54 -04:00
power-traces.c PM / tracing: remove deprecated power trace API 2013-01-26 00:39:12 +01:00
ring_buffer_benchmark.c trace: Replace hardcoding of 19 with MAX_NICE 2014-02-27 12:41:03 +01:00
ring_buffer.c ring-buffer: Always reset iterator to reader page 2014-08-06 16:06:34 -04:00
rpm-traces.c PM / Runtime: Introduce trace points for tracing rpm_* functions 2011-09-27 22:53:27 +02:00
trace_benchmark.c tracing: Only calculate stats of tracepoint benchmarks for 2^32 times 2014-06-06 00:41:38 -04:00
trace_benchmark.h tracing: Add tracepoint benchmark tracepoint 2014-05-29 22:49:54 -04:00
trace_branch.c tracing: Update event filters for multibuffer 2013-11-05 16:50:20 -05:00
trace_clock.c tracing: Fix wraparound problems in "uptime" trace clock 2014-07-21 09:56:12 -04:00
trace_entries.h tracing: Add trace_puts() for even faster trace_printk() tracing 2013-03-15 00:35:55 -04:00
trace_event_perf.c kprobes, ftrace: Use NOKPROBE_SYMBOL macro in ftrace 2014-04-24 10:26:39 +02:00
trace_events_filter_test.h tracing/filter: Add startup tests for events filter 2011-08-19 14:35:59 -04:00
trace_events_filter.c tracing: Add and use generic set_trigger_filter() implementation 2013-12-21 22:02:17 -05:00
trace_events_trigger.c tracing: Use rcu_dereference_sched() for trace event triggers 2014-05-02 23:12:42 -04:00
trace_events.c tracing: instance_rmdir() leaks ftrace_event_file->filter 2014-07-14 14:41:13 -04:00
trace_export.c tracing: Fix anonymous unions in struct ftrace_event_call 2014-04-09 20:02:55 -04:00
trace_functions_graph.c tracing: Add funcgraph_tail option to print function name after closing braces 2014-05-20 23:29:32 -04:00
trace_functions.c tracing: Remove mock up poll wait function 2014-04-30 08:40:05 -04:00
trace_irqsoff.c tracing: Allow irq/preempt tracers to be used by instances 2014-04-21 13:59:29 -04:00
trace_kdb.c tracing: Consolidate max_tr into main trace_array structure 2013-03-15 00:35:40 -04:00
trace_kprobe.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
trace_mmiotrace.c tracing: Update event filters for multibuffer 2013-11-05 16:50:20 -05:00
trace_nop.c tracing: Remove mock up poll wait function 2014-04-30 08:40:05 -04:00
trace_output.c tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks 2014-05-15 11:29:37 -04:00
trace_output.h tracing: Rename trace_event_mutex to trace_event_sem 2013-03-15 13:22:10 -04:00
trace_printk.c tracing: Add __tracepoint_string() to export string pointers 2013-07-26 13:39:44 -04:00
trace_probe.c kprobes, ftrace: Use NOKPROBE_SYMBOL macro in ftrace 2014-04-24 10:26:39 +02:00
trace_probe.h kprobes, ftrace: Use NOKPROBE_SYMBOL macro in ftrace 2014-04-24 10:26:39 +02:00
trace_sched_switch.c tracing: Update event filters for multibuffer 2013-11-05 16:50:20 -05:00
trace_sched_wakeup.c tracing: Remove mock up poll wait function 2014-04-30 08:40:05 -04:00
trace_selftest_dynamic.c ftrace: Add self-tests for multiple function trace users 2011-05-18 19:24:51 -04:00
trace_selftest.c tracing: Add static to local functions 2014-04-21 14:00:46 -04:00
trace_stack.c tracing: Print max callstack on stacktrace bug 2014-06-02 16:43:49 -04:00
trace_stat.c trace/trace_stat: use rbtree postorder iteration helper instead of opencoding 2013-11-05 16:01:47 -05:00
trace_stat.h tracing/stat: Add stat_release() callback 2009-07-10 12:14:05 +02:00
trace_syscalls.c tracing: Consolidate event trigger code 2014-01-09 21:20:07 -05:00
trace_uprobe.c tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable() 2014-06-30 13:22:33 -04:00
trace.c tracing: Fix wraparound problems in "uptime" trace clock 2014-07-21 09:56:12 -04:00
trace.h tracing: Fix check of ftrace_trace_arrays list_empty() check 2014-06-10 13:53:50 -04:00