PEP 263 - Defining Python Source Code Encodings

Yongqiang Cheng · Published 2019-11-04 11:24:29

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Foreverstrong Cheng

PEPs: Python Enhancement Proposals

1. Defining the Encoding

Python will default to ASCII as the standard encoding if no other encoding hints are given. (This is the Python 2 behavior that PEP 263 describes; Python 3 defaults to UTF-8, as specified by PEP 3120.)

To define a source code encoding, a magic comment must be placed in the source file as either the first or the second line, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

More precisely, the first or second line must match the following regular expression:

^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)

The first group of this expression is then interpreted as the encoding name. If the encoding is unknown to Python, an error is raised during compilation. There must not be any Python statement on the line that contains the encoding declaration. If the first line matches, the second line is ignored.
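
The matching rule can be tried out directly with the `re` module. The following is a minimal sketch, not CPython's actual implementation: the helper name `declared_encoding` is hypothetical, and it checks only the regular expression, not whether Python knows the encoding.

```python
import re

# The regular expression quoted above from PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding named on line 1 or 2, or None if neither matches.

    Hypothetical helper for illustration only.
    """
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            # A match on the first line means the second is never examined.
            return match.group(1)
    return None


print(declared_encoding(["#!/usr/bin/python", "# -*- coding: latin-1 -*-"]))
print(declared_encoding(["# vim: set fileencoding=utf-8 :", "import os"]))
print(declared_encoding(["#!/usr/bin/python", "#", "# -*- coding: latin-1 -*-"]))
```

The first two calls find `latin-1` and `utf-8`; the third returns `None` because the comment sits on line 3.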

To aid with platforms such as Windows, which add a Unicode byte order mark (BOM) to the beginning of Unicode files, the UTF-8 signature \xef\xbb\xbf will be interpreted as a utf-8 encoding declaration as well (even if no magic encoding comment is given).

If a source file uses both the UTF-8 BOM signature and a magic encoding comment, the only encoding allowed in the comment is utf-8. Any other encoding will cause an error.
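
The BOM rules can be observed with the standard library's `tokenize.detect_encoding`, which applies the same checks to the first lines of a byte stream. This is a small sketch for Python 3; the wrapper name `detect` is an assumption made for the example.

```python
import io
import tokenize

BOM = b"\xef\xbb\xbf"  # the UTF-8 signature mentioned above


def detect(source_bytes):
    """Return the encoding that tokenize.detect_encoding reports."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# BOM alone: treated as UTF-8 (the standard library reports "utf-8-sig").
print(detect(BOM + b"import os\n"))

# BOM plus a matching magic comment is accepted.
print(detect(BOM + b"# -*- coding: utf-8 -*-\nimport os\n"))

# BOM plus a conflicting magic comment is rejected.
try:
    detect(BOM + b"# -*- coding: latin-1 -*-\nimport os\n")
except SyntaxError as exc:
    print("rejected:", exc)
```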

hint [hɪnt]: n. a suggestion or clue; v. to suggest or indicate indirectly

2. Examples

These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:

2.1 With interpreter binary and using Emacs-style file encoding comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...

2.2 Without interpreter line, using plain text:

# This Python file uses the following encoding: utf-8
import os, sys
...

2.3 Text editors might have different ways of defining the file's encoding, e.g.:

#!/usr/local/bin/python
# coding: latin-1
import os, sys
...

2.4 Without encoding comment, Python's parser will assume ASCII text:

#!/usr/local/bin/python
import os, sys
...
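
The fallback can be checked with `tokenize.detect_encoding`. Note that on a current Python 3 interpreter the fallback is UTF-8 rather than ASCII, because PEP 3120 changed the default; the snippet below is a sketch that assumes Python 3.

```python
import io
import tokenize

source = b"#!/usr/local/bin/python\nimport os, sys\n"

# No magic comment and no BOM: detect_encoding falls back to its default.
encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)         # the default encoding name
print(len(lines_read))  # at most two lines are inspected
```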

2.5 Encoding comments that don't work:

Missing “coding:” prefix:

#!/usr/local/bin/python
# latin-1
import os, sys
...

Encoding comment not on line 1 or 2:

#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...

Unsupported encoding:

#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...

There must be no whitespace between coding and =, or between coding and :.
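
These failure cases can be reproduced with the regular expression from section 1 and, for the unsupported encoding, with `tokenize.detect_encoding`. This is an illustrative sketch for Python 3; the name `CODING_RE` is chosen for this example only.

```python
import io
import re
import tokenize

CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

# Missing "coding:" prefix: the regular expression does not match.
print(CODING_RE.match("# latin-1"))

# Whitespace between "coding" and the separator: no match either.
print(CODING_RE.match("# coding = latin-1"))
print(CODING_RE.match("# coding : latin-1"))

# Whitespace after the separator is fine.
print(CODING_RE.match("# coding: latin-1").group(1))

# A name that fits the pattern but is unknown to Python is rejected.
bad = b"#!/usr/local/bin/python\n# -*- coding: utf-42 -*-\nimport os, sys\n"
try:
    tokenize.detect_encoding(io.BytesIO(bad).readline)
except SyntaxError as exc:
    print("rejected:", exc)
```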

3. Concepts

The complete Python source file should use a single encoding. Embedding differently encoded data is not allowed and will result in a decoding error during compilation of the Python source code.
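
The sketch below reproduces only the underlying decode step, not the compiler itself: a file declared as UTF-8 in which one line was (hypothetically) saved as Latin-1 cannot be decoded as a whole. The helper name `decodes_cleanly` is an assumption made for the example.

```python
def decodes_cleanly(data, encoding):
    """Return True if every byte of data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


header = "# -*- coding: utf-8 -*-\n".encode("utf-8")
utf8_line = "name = 'é'\n".encode("utf-8")
latin1_line = "city = 'é'\n".encode("latin-1")  # differently encoded data

print(decodes_cleanly(header + utf8_line, "utf-8"))
print(decodes_cleanly(header + utf8_line + latin1_line, "utf-8"))
```

The first call succeeds and the second fails, because the Latin-1 byte for é is not a valid UTF-8 sequence.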