Document that lnotab can contain invalid bytecode offsets (because of terrible reasons that are difficult to fix). Make dis.findlinestarts() ignore invalid offsets in lnotab. All other uses of lnotab in CPython (various reimplementations of addr2line or line2addr in Python, C and gdb) already ignore this, because they take an address to look for, instead. Add tests for the result of dis.findlinestarts() on wacky constructs in test_peepholer.py, because it's the easiest place to add them.
All about co_lnotab, the line number table.

Code objects store a field named co_lnotab. This is an array of unsigned bytes
disguised as a Python bytes object. It is used to map bytecode offsets to
source code line #s for tracebacks and to identify line number boundaries for
line tracing. Because of internals of the peephole optimizer, it's possible
for lnotab to contain bytecode offsets that are no longer valid (for example
if the optimizer removed the last line in a function).

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs. The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
            0               1
            6               2
           50               7
          350             207
          361             208

Instead of storing these numbers literally, we compress the list by storing only
the difference from one row to the next. Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 200,  11, 1

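As a rough illustration of the compression step, the sketch below derives the
increments from the example rows. The names rows and deltas are made up for
this note and are not part of CPython; starting from (0, 0) is an assumption
chosen to match the conceptual list above.

```python
# Hypothetical helper, not part of CPython: turn absolute
# (bytecode offset, line number) rows into increments.
rows = [(0, 1), (6, 2), (50, 7), (350, 207), (361, 208)]


def deltas(rows):
    """Return (offset increment, line increment) pairs, starting from (0, 0)."""
    prev_addr = prev_line = 0
    out = []
    for addr, line in rows:
        out.append((addr - prev_addr, line - prev_line))
        prev_addr, prev_line = addr, line
    return out


print(deltas(rows))  # [(0, 1), (6, 1), (44, 5), (300, 200), (11, 1)]
```
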
The above doesn't really work, but it's a start. An unsigned byte (byte code
offset) can't hold negative values or values larger than 255, and a signed byte
(line number) can't hold values larger than 127 or less than -128; the above
example contains two such values (300 and 200). (Note that before 3.6, line
number was also encoded by an unsigned byte.) So we make two tweaks:

 (a) there's a deep assumption that byte code offsets increase monotonically,
 and
 (b) if byte code offset jumps by more than 255 from one row to the next, or if
 source code line number changes by more than 127 or by less than -128 from one
 row to the next, more than one pair is written to the table. In case (b),
 there's no way to know from looking at the table later how many were written.
 That's the delicate part. A user of co_lnotab desiring to find the source
 line number corresponding to a bytecode address A should do something like
 this:

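    lineno = addr = 0
    # co_lnotab is a flat byte string: addr increments sit at even indices,
    # line increments at odd indices.
    for addr_incr, line_incr in zip(co_lnotab[0::2], co_lnotab[1::2]):
        addr += addr_incr
        if addr > A:
            return lineno
        if line_incr >= 0x80:
            # reinterpret the unsigned byte as a signed increment
            line_incr -= 0x100
        lineno += line_incr
    # A lies at or beyond the last entry in the table
    return lineno
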
(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,
when the addr field increments by more than 255, the line # increment in each
pair generated must be 0 until the remaining addr increment is < 256. So, in
the example above, assemble_lnotab in compile.c should not split both
increments in lockstep (as was actually done until 2.2) and expand 300, 200 to
    255, 127,  45, 73,
but to
    255, 0,  45, 127,  0, 73.

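The splitting rule and the lookup loop can be tried together in a small
sketch. Everything below is hypothetical illustration code, not CPython's
assemble_lnotab or PyCode_Addr2Line(): encode_pair and addr2line are names
invented here, and line numbers start from 0 as in the conceptual example
above. The "bad" table splits both increments in lockstep, only to show why
that breaks the lookup.

```python
# Hypothetical helpers for illustration; these are not CPython APIs.

def encode_pair(addr_incr, line_incr):
    """Split one (addr increment, line increment) pair into byte-sized pairs."""
    out = []
    # Use up the address increment first, with a line increment of 0.
    while addr_incr > 255:
        out += [255, 0]
        addr_incr -= 255
    # Then split the line increment; extra pairs carry an addr increment of 0.
    while line_incr > 127:
        out += [addr_incr, 127]
        addr_incr = 0
        line_incr -= 127
    while line_incr < -128:
        out += [addr_incr, 0x80]  # 0x80 is -128 when read as a signed byte
        addr_incr = 0
        line_incr += 128
    out += [addr_incr, line_incr & 0xFF]
    return bytes(out)


def addr2line(lnotab, target):
    """Return the line number for bytecode offset `target`."""
    lineno = addr = 0
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if addr > target:
            return lineno
        if line_incr >= 0x80:
            line_incr -= 0x100
        lineno += line_incr
    return lineno


pairs = [(0, 1), (6, 1), (44, 5), (300, 200), (11, 1)]
good = b"".join(encode_pair(a, l) for a, l in pairs)
# Lockstep splitting of (300, 200), shown only to demonstrate the failure.
bad = bytes([0, 1, 6, 1, 44, 5, 255, 127, 45, 73, 11, 1])

print(list(encode_pair(300, 200)))  # [255, 0, 45, 127, 0, 73]
print(addr2line(good, 310))         # 7
print(addr2line(bad, 310))          # 134, although offset 310 belongs to line 7
```

With the lockstep table, the line number has already advanced by 127 when the
address has only reached 305, so any offset between 305 and 349 is reported on
a line that does not exist in the original table.
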
The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes. Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines. That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line. As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice. Even in that case, *instr_ub holds
the first index of the next line.

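One way to picture the cached bounds is the sketch below. It is an
assumption-laden illustration, not the C code: line_starts is written in the
spirit of dis.findlinestarts(), bounds is a name invented here, line numbers
start from 0 as in the conceptual example, and the end of the bytecode
(code_len) is used as the upper bound of the last line. Entries that repeat a
line number are merged, so the upper bound is always the start of the next
line.

```python
import bisect

# Hypothetical helpers for illustration; these are not CPython APIs.

def line_starts(lnotab, code_len=None):
    """Yield (offset, line) each time the line number actually changes."""
    lineno = addr = 0
    last = None
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        if addr_incr:
            if lineno != last:
                yield addr, lineno
                last = lineno
            addr += addr_incr
            if code_len is not None and addr >= code_len:
                # Remaining offsets point past the bytecode: ignore them.
                return
        if line_incr >= 0x80:
            line_incr -= 0x100
        lineno += line_incr
    if lineno != last:
        yield addr, lineno


def bounds(starts, offset, code_len):
    """Return (lb, ub, line) such that lb <= offset < ub."""
    offsets = [start for start, _ in starts]
    i = bisect.bisect_right(offsets, offset) - 1
    lb, line = starts[i]
    ub = offsets[i + 1] if i + 1 < len(offsets) else code_len
    return lb, ub, line


# Line 1 covers 300 bytes of bytecode, so it needs two pairs in the table.
table = bytes([0, 1, 255, 0, 45, 1])
starts = list(line_starts(table))
print(starts)                    # [(0, 1), (300, 2)]
print(bounds(starts, 260, 310))  # (0, 300, 1)
```
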
However, we don't *always* want to call the line trace function when the above
test fails.

Consider this code:

1: def f(a):
2:    while a:
3:       print(1)
4:       break
5:    else:
6:       print(2)

which compiles to this:

  2           0 SETUP_LOOP              26 (to 28)
        >>    2 LOAD_FAST                0 (a)
              4 POP_JUMP_IF_FALSE       18

  3           6 LOAD_GLOBAL              0 (print)
              8 LOAD_CONST               1 (1)
             10 CALL_FUNCTION            1
             12 POP_TOP

  4          14 BREAK_LOOP
             16 JUMP_ABSOLUTE            2
        >>   18 POP_BLOCK

  6          20 LOAD_GLOBAL              0 (print)
             22 LOAD_CONST               2 (2)
             24 CALL_FUNCTION            1
             26 POP_TOP
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab. For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).

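The rule can be replayed on the disassembly above with a small simulation.
This is only a sketch: LINE_STARTS is copied by hand from the listing,
line_events is a name invented here, and the initial "previous offset" of -1
is an assumption. The real logic lives in maybe_call_line_trace() and also
caches the bounds between instructions, which this sketch skips.

```python
import bisect

# (offset, line) pairs read off the disassembly of f() above.
LINE_STARTS = [(0, 2), (6, 3), (14, 4), (20, 6)]
OFFSETS = [offset for offset, _ in LINE_STARTS]


def line_events(path):
    """Return the lines reported for a sequence of executed offsets."""
    events = []
    prev = -1  # assumed starting value: no instruction executed yet
    for offset in path:
        i = bisect.bisect_right(OFFSETS, offset) - 1
        lb, line = LINE_STARTS[i]
        # Report a line at the start of a line, or after a backward jump.
        if offset == lb or offset < prev:
            events.append(line)
        prev = offset
    return events


# 'a' is false: POP_JUMP_IF_FALSE at 4 jumps forward to POP_BLOCK at 18.
print(line_events([0, 2, 4, 18, 20, 22, 24, 26, 28, 30]))  # [2, 6]
# 'a' is true: BREAK_LOOP at 14 jumps forward to 28.
print(line_events([0, 2, 4, 6, 8, 10, 12, 14, 28, 30]))    # [2, 3, 4]
```

In the first run no event is reported for line 4, because offset 18 is reached
by a forward jump and is not the start of a line. A backward jump, such as a
hypothetical path that ends with offset 16 followed by offset 2, would report
line 2 again even though offset 2 is not the first opcode of that line.
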
Why do we set f_lineno when tracing, and only just before calling the trace
function? Well, consider the code above when 'a' is true. If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event. This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice. By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.