Mirror of https://github.com/python/cpython.git (synced 2024-11-27 03:45:08 +08:00)
Just another intermediate version...
parent 1c462adaa8
commit 7b632a6073
Doc/ref.tex (128 lines changed)
@@ -1,5 +1,5 @@
 % Format this file with latex.
-
+
 \documentstyle[myformat]{report}

 \title{\bf
@@ -65,17 +65,18 @@ rather than formal specifications for everything except syntax and
 lexical analysis. This should make the document better understandable
 to the average reader, but will leave room for ambiguities.
 Consequently, if you were coming from Mars and tried to re-implement
-Python from this document alone, you might in fact be implementing
-quite a different language. On the other hand, if you are using
+Python from this document alone, you might have to guess things and in
+fact you would be implementing quite a different language.
+On the other hand, if you are using
 Python and wonder what the precise rules about a particular area of
-the language are, you should be able to find it here.
+the language are, you should definitely be able to find it here.

 It is dangerous to add too many implementation details to a language
 reference document -- the implementation may change, and other
 implementations of the same language may work differently. On the
 other hand, there is currently only one Python implementation, and
-particular quirks of it are sometimes worth mentioning, especially
-where it differs from the ``ideal'' specification.
+its particular quirks are sometimes worth being mentioned, especially
+where the implementation imposes additional limitations.

 Every Python implementation comes with a number of built-in and
 standard modules. These are not documented here, but in the separate
@@ -93,20 +94,20 @@ name: lcletter (lcletter | "_")*
 lcletter: "a"..."z"
 \end{verbatim}

-The first line says that a \verb\name\ is a \verb\lcletter\ followed by
-a sequence of zero or more \verb\lcletter\s and underscores. A
+The first line says that a \verb\name\ is an \verb\lcletter\ followed by
+a sequence of zero or more \verb\lcletter\s and underscores. An
 \verb\lcletter\ in turn is any of the single characters `a' through `z'.
 (This rule is actually adhered to for the names defined in syntax and
 grammar rules in this document.)

 Each rule begins with a name (which is the name defined by the rule)
-followed by a colon. Each rule is wholly contained on one line. A
-vertical bar (\verb\|\) is used to separate alternatives, it is the
-least binding operator in this notation. A star (\verb\*\) means zero
-or more repetitions of the preceding item; likewise, a plus (\verb\+\)
-means one or more repetitions and a question mark (\verb\?\) zero or
-one (in other words, the preceding item is optional). These three
-operators bind as tight as possible; parentheses are used for
+and a colon, and is wholly contained on one line. A vertical bar
+(\verb\|\) is used to separate alternatives; it is the least binding
+operator in this notation. A star (\verb\*\) means zero or more
+repetitions of the preceding item; likewise, a plus (\verb\+\) means
+one or more repetitions, and a question mark (\verb\?\) zero or one
+(in other words, the preceding item is optional). These three
+operators bind as tightly as possible; parentheses are used for
 grouping. Literal strings are enclosed in double quotes. White space
 is only meaningful to separate tokens.

@@ -117,7 +118,7 @@ characters. A phrase between angular brackets (\verb\<...>\) gives an
 informal description of the symbol defined; e.g., this could be used
 to describe the notion of `control character' if needed.

-Although the notation used is almost the same, there is a big
+Even though the notation used is almost the same, there is a big
 difference between the meaning of lexical and syntactic definitions:
 a lexical definition operates on the individual characters of the
 input source, while a syntax definition operates on the stream of
@@ -131,22 +132,22 @@ chapter describes how the lexical analyzer breaks a file into tokens.

 \section{Line structure}

-A Python program is divided in a number of logical lines. Statements
-do not straddle logical line boundaries except where explicitly
-indicated by the syntax (i.e., for compound statements). To this
-purpose, the end of a logical line is represented by the token
-NEWLINE.
+A Python program is divided in a number of logical lines. The end of
+a logical line is represented by the token NEWLINE. Statements cannot
+cross logical line boundaries except where NEWLINE is allowed by the
+syntax (e.g., between statements in compound statements).

 \subsection{Comments}

 A comment starts with a hash character (\verb\#\) that is not part of
-a string literal, and ends at the end of the physical line. Comments
-are ignored by the syntax.
+a string literal, and ends at the end of the physical line. A comment
+always signifies the end of the logical line. Comments are ignored by
+the syntax.

 \subsection{Line joining}

 Two or more physical lines may be joined into logical lines using
-backslash characters (\verb/\/), as follows: When physical line ends
+backslash characters (\verb/\/), as follows: when a physical line ends
 in a backslash that is not part of a string literal or comment, it is
 joined with the following forming a single logical line, deleting the
 backslash and the following end-of-line character.
@@ -160,13 +161,14 @@ terminates a multi-line statement.

 \subsection{Indentation}

-Spaces and tabs at the beginning of a logical line are used to compute
-the indentation level of the line, which in turn is used to determine
-the grouping of statements.
+Leading whitespace (spaces and tabs) at the beginning of a logical
+line is used to compute the indentation level of the line, which in
+turn is used to determine the grouping of statements.

-First, each tab is replaced by one to eight spaces such that the total
-number of spaces up to that point is a multiple of eight. The total
-number of spaces preceding the first non-blank character then
+First, tabs are replaced (from left to right) by one to eight spaces
+such that the total number of characters up to there is a multiple of
+eight (this is intended to be the same rule as used by UNIX). The
+total number of spaces preceding the first non-blank character then
 determines the line's indentation. Indentation cannot be split over
 multiple physical lines using backslashes.

@@ -185,6 +187,38 @@ popped off, and for each number popped off a DEDENT token is
 generated. At the end of the file, a DEDENT token is generated for
 each number remaining on the stack that is larger than zero.

+Here is an example of a correctly (though confusingly) indented piece
+of Python code:
+
+\begin{verbatim}
+def perm(l):
+    if len(l) <= 1:
+                  return [l]
+    r = []
+    for i in range(len(l)):
+             s = l[:i] + l[i+1:]
+             p = perm(s)
+             for x in p:
+              r.append(l[i:i+1] + x)
+    return r
+\end{verbatim}
+
+The following example shows various indentation errors:
+
+\begin{verbatim}
+ def perm(l):                       # error: first line indented
+for i in range(len(l)):             # error: not indented
+    s = l[:i] + l[i+1:]
+        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
+        for x in p:
+                r.append(l[i:i+1] + x)
+            return r                # error: inconsistent indent
+\end{verbatim}
+
+(Actually, the first three errors are detected by the parser; only the
+last error is found by the lexical analyzer -- the indentation of
+\verb\return r\ does not match a level popped off the stack.)
+
 \section{Other tokens}

 Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
@@ -205,12 +239,13 @@ uppercase: "A"..."Z"
 digit: "0"..."9"
 \end{verbatim}

-Identifiers are unlimited in length. Case is significant.
+Identifiers are unlimited in length. Case is significant. Keywords
+are not identifiers.

 \section{Keywords}

 The following identifiers are used as reserved words, or {\em
-keywords} of the language, and may not be used as ordinary
+keywords} of the language, and cannot be used as ordinary
 identifiers. They must be spelled exactly as written here:

 \begin{verbatim}
@@ -260,7 +295,7 @@ are:
 \verb/\'/ & Single quote (\verb/'/) \\
 \verb/\a/ & ASCII Bell (BEL) \\
 \verb/\b/ & ASCII Backspace (BS) \\
-\verb/\E/ & ASCII Escape (ESC) \\
+%\verb/\E/ & ASCII Escape (ESC) \\
 \verb/\f/ & ASCII Formfeed (FF) \\
 \verb/\n/ & ASCII Linefeed (LF) \\
 \verb/\r/ & ASCII Carriage Return (CR) \\
@@ -272,13 +307,13 @@ are:
 \end{tabular}
 \end{center}

-For compatibility with in Standard C, up to three octal digits are
+In strict compatibility with in Standard C, up to three octal digits are
 accepted, but an unlimited number of hex digits is taken to be part of
 the hex escape (and then the lower 8 bits of the resulting hex number
-are used...).
+are used in all current implementations...).

-All unrecognized escape sequences are left in the string {\em
-unchanged}, i.e., the backslash is left in the string. (This rule is
+All unrecognized escape sequences are left in the string unchanged,
+i.e., {\em the backslash is left in the string.} (This rule is
 useful when debugging: if an escape sequence is mistyped, the
 resulting output is more easily recognized as broken. It also helps a
 great deal for string literals used as regular expressions or
@@ -313,6 +348,18 @@ fraction: "." digit+
 exponent: ("e"|"E") ["+"|"-"] digit+
 \end{verbatim}

+Some examples of numeric literals:
+
+\begin{verbatim}
+1 1234567890 0177777 0x80000
+
+
+\end{verbatim}
+
+Note that the definitions for literals do not include a sign; a phrase
+like \verb\-1\ is actually an expression composed of the operator
+\verb\-\ and the literal \verb\1\.
+
 \section{Operators}

 The following tokens are operators:
@@ -336,13 +383,16 @@ meaning:
 ; , : . ` =
 \end{verbatim}

-The following printing ASCII characters are currently not used;
-their occurrence is an unconditional error:
+The following printing ASCII characters are not used in Python (except
+in string literals and in comments). Their occurrence is an
+unconditional error:

 \begin{verbatim}
 ! @ $ " ?
 \end{verbatim}

+They may be used by future versions of the language though!
+
 \chapter{Execution model}

 (XXX This chapter should explain the general model of the execution of
Doc/ref/ref.tex (128 lines changed)
@@ -1,5 +1,5 @@
 % Format this file with latex.
-
+
 \documentstyle[myformat]{report}

 \title{\bf
@@ -65,17 +65,18 @@ rather than formal specifications for everything except syntax and
 lexical analysis. This should make the document better understandable
 to the average reader, but will leave room for ambiguities.
 Consequently, if you were coming from Mars and tried to re-implement
-Python from this document alone, you might in fact be implementing
-quite a different language. On the other hand, if you are using
+Python from this document alone, you might have to guess things and in
+fact you would be implementing quite a different language.
+On the other hand, if you are using
 Python and wonder what the precise rules about a particular area of
-the language are, you should be able to find it here.
+the language are, you should definitely be able to find it here.

 It is dangerous to add too many implementation details to a language
 reference document -- the implementation may change, and other
 implementations of the same language may work differently. On the
 other hand, there is currently only one Python implementation, and
-particular quirks of it are sometimes worth mentioning, especially
-where it differs from the ``ideal'' specification.
+its particular quirks are sometimes worth being mentioned, especially
+where the implementation imposes additional limitations.

 Every Python implementation comes with a number of built-in and
 standard modules. These are not documented here, but in the separate
@@ -93,20 +94,20 @@ name: lcletter (lcletter | "_")*
 lcletter: "a"..."z"
 \end{verbatim}

-The first line says that a \verb\name\ is a \verb\lcletter\ followed by
-a sequence of zero or more \verb\lcletter\s and underscores. A
+The first line says that a \verb\name\ is an \verb\lcletter\ followed by
+a sequence of zero or more \verb\lcletter\s and underscores. An
 \verb\lcletter\ in turn is any of the single characters `a' through `z'.
 (This rule is actually adhered to for the names defined in syntax and
 grammar rules in this document.)

 Each rule begins with a name (which is the name defined by the rule)
-followed by a colon. Each rule is wholly contained on one line. A
-vertical bar (\verb\|\) is used to separate alternatives, it is the
-least binding operator in this notation. A star (\verb\*\) means zero
-or more repetitions of the preceding item; likewise, a plus (\verb\+\)
-means one or more repetitions and a question mark (\verb\?\) zero or
-one (in other words, the preceding item is optional). These three
-operators bind as tight as possible; parentheses are used for
+and a colon, and is wholly contained on one line. A vertical bar
+(\verb\|\) is used to separate alternatives; it is the least binding
+operator in this notation. A star (\verb\*\) means zero or more
+repetitions of the preceding item; likewise, a plus (\verb\+\) means
+one or more repetitions, and a question mark (\verb\?\) zero or one
+(in other words, the preceding item is optional). These three
+operators bind as tightly as possible; parentheses are used for
 grouping. Literal strings are enclosed in double quotes. White space
 is only meaningful to separate tokens.

@@ -117,7 +118,7 @@ characters. A phrase between angular brackets (\verb\<...>\) gives an
 informal description of the symbol defined; e.g., this could be used
 to describe the notion of `control character' if needed.

-Although the notation used is almost the same, there is a big
+Even though the notation used is almost the same, there is a big
 difference between the meaning of lexical and syntactic definitions:
 a lexical definition operates on the individual characters of the
 input source, while a syntax definition operates on the stream of
@@ -131,22 +132,22 @@ chapter describes how the lexical analyzer breaks a file into tokens.

 \section{Line structure}

-A Python program is divided in a number of logical lines. Statements
-do not straddle logical line boundaries except where explicitly
-indicated by the syntax (i.e., for compound statements). To this
-purpose, the end of a logical line is represented by the token
-NEWLINE.
+A Python program is divided in a number of logical lines. The end of
+a logical line is represented by the token NEWLINE. Statements cannot
+cross logical line boundaries except where NEWLINE is allowed by the
+syntax (e.g., between statements in compound statements).

 \subsection{Comments}

 A comment starts with a hash character (\verb\#\) that is not part of
-a string literal, and ends at the end of the physical line. Comments
-are ignored by the syntax.
+a string literal, and ends at the end of the physical line. A comment
+always signifies the end of the logical line. Comments are ignored by
+the syntax.

 \subsection{Line joining}

 Two or more physical lines may be joined into logical lines using
-backslash characters (\verb/\/), as follows: When physical line ends
+backslash characters (\verb/\/), as follows: when a physical line ends
 in a backslash that is not part of a string literal or comment, it is
 joined with the following forming a single logical line, deleting the
 backslash and the following end-of-line character.
@@ -160,13 +161,14 @@ terminates a multi-line statement.

 \subsection{Indentation}

-Spaces and tabs at the beginning of a logical line are used to compute
-the indentation level of the line, which in turn is used to determine
-the grouping of statements.
+Leading whitespace (spaces and tabs) at the beginning of a logical
+line is used to compute the indentation level of the line, which in
+turn is used to determine the grouping of statements.

-First, each tab is replaced by one to eight spaces such that the total
-number of spaces up to that point is a multiple of eight. The total
-number of spaces preceding the first non-blank character then
+First, tabs are replaced (from left to right) by one to eight spaces
+such that the total number of characters up to there is a multiple of
+eight (this is intended to be the same rule as used by UNIX). The
+total number of spaces preceding the first non-blank character then
 determines the line's indentation. Indentation cannot be split over
 multiple physical lines using backslashes.

@@ -185,6 +187,38 @@ popped off, and for each number popped off a DEDENT token is
 generated. At the end of the file, a DEDENT token is generated for
 each number remaining on the stack that is larger than zero.

+Here is an example of a correctly (though confusingly) indented piece
+of Python code:
+
+\begin{verbatim}
+def perm(l):
+    if len(l) <= 1:
+                  return [l]
+    r = []
+    for i in range(len(l)):
+             s = l[:i] + l[i+1:]
+             p = perm(s)
+             for x in p:
+              r.append(l[i:i+1] + x)
+    return r
+\end{verbatim}
+
+The following example shows various indentation errors:
+
+\begin{verbatim}
+ def perm(l):                       # error: first line indented
+for i in range(len(l)):             # error: not indented
+    s = l[:i] + l[i+1:]
+        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
+        for x in p:
+                r.append(l[i:i+1] + x)
+            return r                # error: inconsistent indent
+\end{verbatim}
+
+(Actually, the first three errors are detected by the parser; only the
+last error is found by the lexical analyzer -- the indentation of
+\verb\return r\ does not match a level popped off the stack.)
+
 \section{Other tokens}

 Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
@@ -205,12 +239,13 @@ uppercase: "A"..."Z"
 digit: "0"..."9"
 \end{verbatim}

-Identifiers are unlimited in length. Case is significant.
+Identifiers are unlimited in length. Case is significant. Keywords
+are not identifiers.

 \section{Keywords}

 The following identifiers are used as reserved words, or {\em
-keywords} of the language, and may not be used as ordinary
+keywords} of the language, and cannot be used as ordinary
 identifiers. They must be spelled exactly as written here:

 \begin{verbatim}
@@ -260,7 +295,7 @@ are:
 \verb/\'/ & Single quote (\verb/'/) \\
 \verb/\a/ & ASCII Bell (BEL) \\
 \verb/\b/ & ASCII Backspace (BS) \\
-\verb/\E/ & ASCII Escape (ESC) \\
+%\verb/\E/ & ASCII Escape (ESC) \\
 \verb/\f/ & ASCII Formfeed (FF) \\
 \verb/\n/ & ASCII Linefeed (LF) \\
 \verb/\r/ & ASCII Carriage Return (CR) \\
@@ -272,13 +307,13 @@ are:
 \end{tabular}
 \end{center}

-For compatibility with in Standard C, up to three octal digits are
+In strict compatibility with in Standard C, up to three octal digits are
 accepted, but an unlimited number of hex digits is taken to be part of
 the hex escape (and then the lower 8 bits of the resulting hex number
-are used...).
+are used in all current implementations...).

-All unrecognized escape sequences are left in the string {\em
-unchanged}, i.e., the backslash is left in the string. (This rule is
+All unrecognized escape sequences are left in the string unchanged,
+i.e., {\em the backslash is left in the string.} (This rule is
 useful when debugging: if an escape sequence is mistyped, the
 resulting output is more easily recognized as broken. It also helps a
 great deal for string literals used as regular expressions or
@@ -313,6 +348,18 @@ fraction: "." digit+
 exponent: ("e"|"E") ["+"|"-"] digit+
 \end{verbatim}

+Some examples of numeric literals:
+
+\begin{verbatim}
+1 1234567890 0177777 0x80000
+
+
+\end{verbatim}
+
+Note that the definitions for literals do not include a sign; a phrase
+like \verb\-1\ is actually an expression composed of the operator
+\verb\-\ and the literal \verb\1\.
+
 \section{Operators}

 The following tokens are operators:
@@ -336,13 +383,16 @@ meaning:
 ; , : . ` =
 \end{verbatim}

-The following printing ASCII characters are currently not used;
-their occurrence is an unconditional error:
+The following printing ASCII characters are not used in Python (except
+in string literals and in comments). Their occurrence is an
+unconditional error:

 \begin{verbatim}
 ! @ $ " ?
 \end{verbatim}

+They may be used by future versions of the language though!
+
 \chapter{Execution model}

 (XXX This chapter should explain the general model of the execution of