Normally a reader will stop once reaching a non-finalized
record. However, when a panic happens, writers from other CPUs
(or an interrupted context on the panic CPU) may have been
writing a record and were unable to finalize it. The panic CPU
will reserve/commit/finalize its panic records, but these will
be located after the non-finalized records. This results in
panic() not flushing the panic messages.
Extend _prb_read_valid() to skip over non-finalized records if
on the panic CPU.
Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-11-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Currently pr_flush() will only wait for records that were
available to readers at the time of the call (using
prb_next_seq()). But there may be more records (non-finalized)
that have following finalized records. pr_flush() should wait
for these to print as well. Particularly because any trailing
finalized records may be the messages that the calling context
wants to ensure are printed.
Add a new ringbuffer function prb_next_reserve_seq() to return
the sequence number following the most recently reserved record.
This guarantees that pr_flush() will wait until all current
printk() messages (completed or in progress) have been printed.
Fixes: 3b604ca812 ("printk: add pr_flush()")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-10-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
With the lockless ringbuffer, it is allowed that multiple
CPUs/contexts write simultaneously into the buffer. This creates
an ambiguity as some writers will finalize sooner.
The documentation for the prb_read functions is not clear as it
refers to "not yet written" and "no data available". Clarify the
return values and language to be in terms of the reader: records
available for reading.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-9-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
For empty line records, no data blocks are created. Instead,
these valid records are identified by special logical position
values (in fields of @prb_desc.text_blk_lpos).
Currently the macro NO_LPOS is used for empty line records.
This name is confusing because it does not imply _why_ there is
no data block.
Rename NO_LPOS to EMPTY_LINE_LPOS so that it is clear why there
is no data block.
Also add comments explaining the use of EMPTY_LINE_LPOS as well
as clarification to the values used to represent data-less
blocks.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-6-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Commit f244b4dc53 ("printk: ringbuffer: Improve
prb_next_seq() performance") introduced an optimization for
prb_next_seq() by using best-effort to track recently finalized
records. However, the order of finalization does not
necessarily match the order of the records. The optimization
changed prb_next_seq() to return inconsistent results, possibly
yielding sequence numbers that are not available to readers
because they are preceded by non-finalized records or they are
not yet visible to the reader CPU.
Rather than simply best-effort tracking recently finalized
records, force the committing writer to read records and
increment the last "contiguous block" of finalized records. In
order to do this, the sequence number instead of ID must be
stored because ID's cannot be directly compared.
A new memory barrier pair is introduced to guarantee that a
reader can always read the records up until the sequence number
returned by prb_next_seq() (unless the records have since
been overwritten in the ringbuffer).
This restores the original functionality of prb_next_seq()
while also keeping the optimization.
For 32bit systems, only the lower 32 bits of the sequence
number are stored. When reading the value, it is expanded to
the full 64bit sequence number using the 32bit seq macros,
which fold in the value returned by prb_first_seq().
Fixes: f244b4dc53 ("printk: ringbuffer: Improve prb_next_seq() performance")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-5-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Note: This change only applies to 32bit architectures. On 64bit
architectures the macros are NOPs.
Currently prb_next_seq() is used as the base for the 32bit seq
macros __u64seq_to_ulseq() and __ulseq_to_u64seq(). However, in
a follow-up commit, prb_next_seq() will need to make use of the
32bit seq macros.
Use prb_first_seq() as the base for the 32bit seq macros instead
because it is guaranteed to return 64bit sequence numbers without
relying on any 32bit seq macros.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240207134103.1357162-4-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
If an output buffer size exceeded U16_MAX, the min_t(u16, ...) cast in
copy_data() was causing writes to truncate. This manifested as output
bytes being skipped, seen as %NUL bytes in pstore dumps when the available
record size was larger than 65536. Fix the cast to no longer truncate
the calculation.
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Reported-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Link: https://lore.kernel.org/lkml/d8bb1ec7-a4c5-43a2-9de0-9643a70b899f@linux.microsoft.com/
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Reviewed-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
Tested-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230811054528.never.165-kees@kernel.org
prb_next_seq() always iterates from the first known sequence number.
In the worst case, it might loop 8k times for 256kB buffer,
15k times for 512kB buffer, and 64k times for 2MB buffer.
It was reported that polling and reading using syslog interface
might occupy 50% of CPU.
Speedup the search by storing @id of the last finalized descriptor.
The loop is still needed because the @id is stored and read in the best
effort way. An atomic variable is used to keep the @id consistent.
But the stores and reads are not serialized against each other.
The descriptor could get reused in the meantime. The related sequence
number will be used only when it is still valid.
An invalid value should be read _only_ when there is a flood of messages
and the ringbuffer is rapidly reused. The performance is the least
problem in this case.
Reported-by: Chunlei Wang <chunlei.wang@mediatek.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/1642770388-17327-1-git-send-email-quic_mojha@quicinc.com
Link: https://lore.kernel.org/lkml/YXlddJxLh77DKfIO@alley/T/#m43062e8b2a17f8dbc8c6ccdb8851fb0dbaabbb14
Counting text lines in a record simply involves counting the number
of newline characters (+1). However, it is searching the full data
block for newline characters, even though the text data can be (and
often is) a subset of that area. Since the extra area in the data
block was never initialized, the result is that extra newlines may
be seen and counted.
Restrict newline searching to the text data length.
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113144234.6545-1-john.ogness@linutronix.de
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl/A+VEACgkQUqAMR0iA
lPIwGw/9F/E2ZdX+Vgi3ZiR/5GdfVZeIW+QwhKXBQc8Jr9+p2JJ+UOPeeazKQA5l
bFt6GR67yjqtFS5gO76EPCQ6/Uu3cPA+A3HQRQZuE6p0zM+mrMXc/upLMy5DKi4Z
f4zkW8dYWSBpAWPvM9bb0gIKO9wVV6Aj1IyyZLfEghX/KrJPx0zutioO4ScYxhA9
YVITmnUQ6YzHEVE8CwWGV4lArC50ILGdIqNlZrkjuG3CuGTdyB2OY60P8XCy8bzn
W3WgRGI/bvfHwCPh8oYKm/5nM9JAVdhbEpoFQj8cMPKoH5DeSGNWfYXkali2gqhL
1Y2SntTcR7zclMcN0/gIn9ViVsma/eayAyawSYgQjmAdl6H/vv9B7x9ZswmK/b38
JzOzHwP+H3lXVg2yN4EbH3uDMTMjqflYuC7QiZ/HNa43KURXhoritw2hBRczhazp
mdyRQf4iv8NoYSthggD6LolCs+ay5NZpCeB3YXgnlpxiYFGCE+ykSz41AGdTyYTl
jOWVtK1VawFD0/FgpgF8XK7/gOXWeYb+4WeBYgGKgCJdneiB5eJt8eWT7zmpAPpG
FECexdAd4TAjD+EEbidiFWpMjJcY2TnOJp76O3/Wlo1QLbEgRHOklM/Rrq0zGg5b
vm3w0kobGZfpIJuzSOAHyErX0jGEVTq6yUi381jSQpf4bTttIpc=
=7IkU
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- do not lose trailing newline in pr_cont() calls
- two trivial fixes for a dead store and a config description
* tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: finalize records with trailing newlines
printk: remove unneeded dead-store assignment
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
A bunch of functions in the new ringbuffer code take both a
printk_ringbuffer struct and a separate prb_data_ring. This is a relic
from an earlier version of the code when a second data ring was present.
Since this is no longer the case remove the extra function argument
from:
- data_make_reusable()
- data_push_tail()
- data_alloc()
- data_realloc()
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
make clang-analyzer on x86_64 defconfig caught my attention with:
kernel/printk/printk_ringbuffer.c:885:3: warning:
Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores]
desc = to_desc(desc_ring, head_id);
^
Commit b6cf8b3f33 ("printk: add lockless ringbuffer") introduced
desc_reserve() with this unneeded dead-store assignment.
As discussed with John Ogness privately, this is probably just some minor
left-over from previous iterations of the ringbuffer implementation. So,
simply remove this unneeded dead assignment to make clang-analyzer happy.
As compilers will detect this unneeded assignment and optimize this anyway,
the resulting object code is identical before and after this change.
No functional change. No change to object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
If a reader provides a buffer that is smaller than the message text,
the @text_len field of @info will have a value larger than the buffer
size. If readers blindly read @text_len bytes of data without
checking the size, they will read beyond their buffer.
Add this check to record_print_text() to properly recognize when such
truncation has occurred.
Add a maximum size argument to the ringbuffer function to extend
records so that records can not be created that are larger than the
buffer size of readers.
When extending records (LOG_CONT), do not extend records beyond
LOG_LINE_MAX since that is the maximum size available in the buffers
used by consoles and syslog.
Fixes: f5f022e53b ("printk: reimplement log_cont using record extension")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200930090134.8723-2-john.ogness@linutronix.de
Since there is no code that will ever store anything into the dict
ring, remove it. If any future dictionary properties are to be
added, these should be added to the struct printk_info.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200918223421.21621-4-john.ogness@linutronix.de
The majority of the size of a descriptor is taken up by meta data,
which is often not of interest to the ringbuffer (for example,
when performing state checks). Since descriptors are often
temporarily stored on the stack, keeping their size minimal will
help reduce stack pressure.
Rather than embedding the printk_info into the descriptor, create
a separate printk_info array. The index of a descriptor in the
descriptor array corresponds to the printk_info with the same
index in the printk_info array. The rules for validity of a
printk_info match the existing rules for the data blocks: the
descriptor must be in a consistent state.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200918223421.21621-2-john.ogness@linutronix.de
Add support for extending the newest data block. For this, introduce
a new finalization state (desc_finalized) denoting a committed
descriptor that cannot be extended.
Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means transitioning from the
desc_committed state back to the desc_reserved state.
A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)
Four new memory barrier pairs are introduced. Two of them are
insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose, pairing, and
context of the two existing memory barrier pairs they provide an
alternate path for. The other two new memory barrier pairs are
significant additions:
desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.
_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914123354.832-6-john.ogness@linutronix.de
Rather than deriving the state by evaluating bits within the flags
area of the state variable, assign the states explicit values and
set those values in the flags area. Introduce macros to make it
simple to read and write state values for the state variable.
Although the functionality is preserved, the binary representation
for the states is changed.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914123354.832-5-john.ogness@linutronix.de
prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields.
In particular, for correctly filling in text and dict lengths.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914123354.832-4-john.ogness@linutronix.de
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914123354.832-3-john.ogness@linutronix.de
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914123354.832-2-john.ogness@linutronix.de
@state_var is copied as part of the descriptor copying via
memcpy(). This is not allowed because @state_var is an atomic type,
which in some implementations may contain a spinlock.
Avoid using memcpy() with @state_var by explicitly copying the other
fields of the descriptor. @state_var is set using atomic set
operator before returning.
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914094803.27365-2-john.ogness@linutronix.de
It is expected that desc_read() will always set at least the
@state_var field. However, if the descriptor is in an inconsistent
state, no fields are set.
Also, the second load of @state_var is not stored in @desc_out and
so might not match the state value that is returned.
Always set the last loaded @state_var into @desc_out, regardless of
the descriptor consistency.
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200914094803.27365-1-john.ogness@linutronix.de
With commit 896fbe20b4 ("printk: use the lockless ringbuffer"),
printk() started silently dropping messages without text because such
records are not supported by the new printk ringbuffer.
Add support for such records.
Currently dataless records are denoted by INVALID_LPOS in order
to recognize failed prb_reserve() calls. Change the ringbuffer
to instead use two different identifiers (FAILED_LPOS and
NO_LPOS) to distinguish between failed prb_reserve() records and
successful dataless records, respectively.
Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Fixes: https://lkml.kernel.org/r/20200718121053.GA691245@elver.google.com
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200721132528.9661-1-john.ogness@linutronix.de
Introduce a multi-reader multi-writer lockless ringbuffer for storing
the kernel log messages. Readers and writers may use their API from
any context (including scheduler and NMI). This ringbuffer will make
it possible to decouple printk() callers from any context, locking,
or console constraints. It also makes it possible for readers to have
full access to the ringbuffer contents at any time and context (for
example from any panic situation).
The printk_ringbuffer is made up of 3 internal ringbuffers:
desc_ring:
A ring of descriptors. A descriptor contains all record meta data
(sequence number, timestamp, loglevel, etc.) as well as internal state
information about the record and logical positions specifying where in
the other ringbuffers the text and dictionary strings are located.
text_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the text
string of the record.
dict_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the dictionary
string of the record.
The internal state information of a descriptor is the key element to
allow readers and writers to locklessly synchronize access to the data.
Co-developed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200709132344.760-3-john.ogness@linutronix.de