Answer a question

The standard approach for using variable values in SQLite queries is the "question mark style", like this:

import sqlite3
with sqlite3.connect(":memory:") as connection:
    connection.execute("CREATE TABLE foo(bar)")
    connection.execute("INSERT INTO foo(bar) VALUES (?)", ("cow",))

    print(list(connection.execute("SELECT * from foo")))
    # prints [(u'cow',)]

However, this only works for substituting values into queries. It fails when used for table or column names:

import sqlite3
with sqlite3.connect(":memory:") as connection:
    connection.execute("CREATE TABLE foo(?)", ("bar",))
    # raises sqlite3.OperationalError: near "?": syntax error

Neither the sqlite3 module nor PEP 249 mention a function for escaping names or values. Presumably this is to discourage users from assembling their queries with strings, but it leaves me at a loss.

What function or technique is most appropriate for using variable names for columns or tables in SQLite? I'd would strongly prefer to do able to do this without any other dependencies, since I'll be using it in my own wrapper.

I looked for but couldn't find a clear and complete description of the relevant part of SQLite's syntax, to use to write my own function. I want to be sure this will work for any identifier permitted by SQLite, so a trial-and-error solution is too uncertain for me.

SQLite uses " to quote identifiers but I'm not sure that just escaping them is sufficient. PHP's sqlite_escape_string function's documentation suggests that certain binary data may need to be escaped as well, but that may be a quirk of the PHP library.

Answers

To convert any string into a SQLite identifier:

  • Ensure the string can be encoded as UTF-8.
  • Ensure the string does not include any NUL characters.
  • Replace all " with "".
  • Wrap the entire thing in double quotes.

Implementation

import codecs

def quote_identifier(s, errors="strict"):
    encodable = s.encode("utf-8", errors).decode("utf-8")

    nul_index = encodable.find("\x00")

    if nul_index >= 0:
        error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                   nul_index, nul_index + 1, "NUL not allowed")
        error_handler = codecs.lookup_error(errors)
        replacement, _ = error_handler(error)
        encodable = encodable.replace("\x00", replacement)

    return "\"" + encodable.replace("\"", "\"\"") + "\""

Given a string single argument, it will escape and quote it correctly or raise an exception. The second argument can be used to specify any error handler registered in the codecs module. The built-in ones are:

  • 'strict': raise an exception in case of an encoding error
  • 'replace': replace malformed data with a suitable replacement marker, such as '?' or '\ufffd'
  • 'ignore': ignore malformed data and continue without further notice
  • 'xmlcharrefreplace': replace with the appropriate XML character reference (for encoding only)
  • 'backslashreplace': replace with backslashed escape sequences (for encoding only)

This doesn't check for reserved identifiers, so if you try to create a new SQLITE_MASTER table it won't stop you.

Example Usage

import sqlite3

def test_identifier(identifier):
    "Tests an identifier to ensure it's handled properly."

    with sqlite3.connect(":memory:") as c:
        c.execute("CREATE TABLE " + quote_identifier(identifier) + " (foo)")
        assert identifier == c.execute("SELECT name FROM SQLITE_MASTER").fetchone()[0]

test_identifier("'Héllo?'\\\n\r\t\"Hello!\" -☃") # works
test_identifier("北方话") # works
test_identifier(chr(0x20000)) # works

print(quote_identifier("Fo\x00o!", "replace")) # prints "Fo?o!"
print(quote_identifier("Fo\x00o!", "ignore")) # prints "Foo!"
print(quote_identifier("Fo\x00o!")) # raises UnicodeEncodeError
print(quote_identifier(chr(0xD800))) # raises UnicodeEncodeError

Observations and References

  • SQLite identifiers are TEXT, not binary.
    • SQLITE_MASTER schema in the FAQ
    • Python 2 SQLite API yelled at me when I gave it bytes it couldn't decode as text.
    • Python 3 SQLite API requires queries be strs, not bytes.
  • SQLite identifiers are quoted using double-quotes.
    • SQL as Understood by SQLite
  • Double-quotes in SQLite identifiers are escaped as two double quotes.
  • SQLite identifiers preserve case, but they are case-insensitive towards ASCII letters. It is possible to enable unicode-aware case-insensitivity.
    • SQLite FAQ Question #18
  • SQLite does not support the NUL character in strings or identifiers.
    • SQLite Ticket 57c971fc74
  • sqlite3 can handle any other unicode string as long as it can be properly encoded to UTF-8. Invalid strings could cause crashes between Python 3.0 and Python 3.1.2 or thereabouts. Python 2 accepted these invalid strings, but this is considered a bug.
    • Python Issue #12569
    • Modules/_sqlite/cursor.c
    • I tested it a bunch.
Logo

Python社区为您提供最前沿的新闻资讯和知识内容

更多推荐