玩转 Postgres 事务隔离

Postgredaxiang

86人浏览 · 2022-08-16 19:32:15

Postgredaxiang · 2022-08-16 19:32:15 发布

在这篇文章中,我们将了解 Postgres 中的事务隔离级别。但首先,让我们简要回顾一下交易以及为什么需要交易。

什么是事务?

一个事务结合了多个读取和写入步骤。如果事务已启动但无法完成,则会回滚。回滚事务对数据库中的数据没有影响。这是一个全有或全无的操作。

示例

假设您有一张供在线游戏用户使用的表格:

所有者

平衡

丽莎

2000

用户可以在游戏中的东西上花费数字硬币。

您希望天平在任何时候都是正确的。任何用户都不应该花费超过他们拥有的钱,也不应该为一件商品支付两次费用。

当 Lisa 登录并以 1000 个硬币购买物品时,它的 sql 可能如下所示:

-- check the balance to see if Lisa has enough coins
select balance from accounts where owner = 'Lisa';

-- in your business logic between the queries: check if that's enough and
-- calculate the new balance (1000)

-- update Lisa's balance
update accounts set balance = 1000 where owner = 'Lisa';

这完全没问题,丽莎的新平衡是正确的。

但是,如果 Lisa 真的很聪明并且打开了两个会话,一个在手机上,一个在她的笔记本电脑上呢?

在这种情况下,她几乎可以同时点击两件商品的购买按钮:

在会话 A 中,丽莎用1000硬币购买了一件物品,因此新余额将为1000。在会话 B 中,她以1250硬币购买了一件物品,因此新余额将为750。

问题是两个会话都读取了 Lisa 的初始余额(2000硬币),因此允许各自的购买。最新的更新查询将她的余额设置为750硬币,而她确实应该拥有-250。她有效地偷走了1000硬币🙀

交易可以用来避免这种彻底的噩梦。将购买查询包装在事务中,Postgres 确保余额正确,即使购买同时发生:

-- begin transaction
begin transaction isolation level repeatable read;

-- check the balance to see if Lisa has enough coins
select balance from accounts where owner = 'Lisa';

-- in your business logic between the queries: check if that's enough and
-- calculate the new balance (1000)

-- update Lisa's balance
update accounts set balance = 1000 where owner = 'Lisa';

-- commit the transaction
commit;

让我们看看当我们回放两个并发购买的场景时会发生什么。为了模拟这种情况,我们创建并填充表accounts:

create table accounts (owner text primary key, balance integer not null);

insert into accounts values ('Lisa', 2000);

从现在开始我们需要两个终端来模拟Lisa的两个用户会话,终端A和终端B:

-- TERMINAL A

-- begin transaction
begin transaction isolation level repeatable read;

-- check the balance to see if Lisa has enough coins
select balance from accounts where owner = 'Lisa';

-- update Lisa's balance
update accounts set balance = 1000 where owner = 'Lisa';

请注意,我们还没有提交事务,因为我们想看看当我们让两个事务同时运行时会发生什么。所以这是终端B中的第二个:

-- TERMINAL B

-- begin transaction
begin transaction isolation level repeatable read;

-- check the balance to see if Lisa has enough coins
select balance from accounts where owner = 'Lisa';

-- update Lisa's balance
update accounts set balance = 750 where owner = 'Lisa';

Lisa 以2000硬币的余额开始。在终端 A,我们正在处理购买价值1000硬币的物品,因此我们将新余额设置为1000硬币。在终端 B,我们正在处理价值1250硬币的购买并将余额设置为750。

如果不将相应的查询包装在事务中,这实际上可以工作; Lisa 将获得两个项目和1000或750的最终余额,具体取决于哪个更新查询更快。

现在我们在终端 A 中提交事务:

-- TERMINAL A

-- commit the transaction
commit;

那应该工作得很好。但现在我们在终端 B 中看到了这一点:

-- TERMINAL B

ERROR:  could not serialize access due to concurrent update

如果我们尝试在终端 B 中使用commit;提交事务,我们会得到

-- TERMINAL B

ROLLBACK

Postgres 抛出错误,因为更新在终端 A 的事务中更新的同一行是不安全的。

只有终端 A 的交易成功,Lisa 的新余额现在是1000。凉爽的! Postgres 救了我们,这次 Lisa 的把戏没有奏效。

Postgres的三个隔离级别

您可能注意到我们在示例中使用事务隔离级别repeatable read启动了事务:

-- begin transaction
begin transaction isolation level repeatable read;

Postgres 具有三个事务隔离级别:

read committed
repeatable read
serializable

Read committed是 Postgres 中的默认选项,在我们的示例中它不会抛出错误。这就是我们选择repeatable read的原因,这就是为什么我认为检查这些隔离级别实际上为我们做了什么是个好主意。

需要担心的四件事

并发事务中有四件事需要担心:

脏读
可重复读取
幻读
序列化异常

这有点不匹配:三个隔离级别,但要担心四件事。其他数据库系统多了一个隔离级别:read uncommitted。

你实际上可以将事务的隔离级别设置为read uncommited,但是在Postgres内部,read committed和read uncommited没有区别。

以下是 Postgres 中的隔离级别允许和阻止的内容:

隔离级别

脏读

不可重复读取

幻读

序列化异常

读未提交

🚫

✅

已提交读(默认)

🚫

✅

可重复读取

🚫

✅

可序列化

🚫

如您所见,read uncommited与read commited做的事情完全相同,所以从现在开始我们将忽略read uncommited。

要了解不同隔离级别的作用,我们必须了解事务中要担心的四件事。

脏读

脏读是事务中的读取,它从另一个未提交的事务中读取数据。无论隔离级别如何,Postgres 事务中都不允许脏读。如果您尝试在事务中创建脏读,则会发生这种情况:

脏读.png

您会期望事务 T2 中的读取获得 1000 的余额,但这将是脏读,因为事务 T1 尚未提交。

不可重复读

不可重复读取是我们在本文开头的示例中试图避免的那种现象。我们使用事务级别repeatable read来确保 Lisa 的余额是正确的,即使是在同时进行购买时也是如此。

在示例中,我们有两个并发事务,如下所示:

-- begin transaction
begin transaction isolation level repeatable read;

-- check the balance to see if Lisa has enough coins
select balance from accounts where owner = 'Lisa';

-- update Lisa's balance
update accounts set balance = 1000 where owner = 'Lisa';

-- commit the transaction
commit;

当使用隔离级别repeatable read同时运行其中两个事务时,一旦提交 T1,您将在事务 T2 中收到错误。然后当您尝试提交 T2 时,T2 将执行rollback:

不可重复读取.png

即使您省略了选择语句...

-- begin transaction
begin transaction isolation level repeatable read;

-- update Lisa's balance
update accounts set balance = 500 where owner = 'Lisa';

-- commit the transaction
commit;

...尝试同时运行其中两个事务时,您将收到相同的错误 (could not serialize access due to concurrent update)。

但是如果没有 select 语句,我们是否还在做不可重复的read?是的!根据Postgres 文档:

UPDATE、DELETE、SELECT FOR UPDATE和SELECT FOR SHARE命令在搜索目标行方面的行为与SELECT相同 [...]

幻读

当事务 T1 从表中读取一些行时发生幻读,然后另一个并发事务 T2 添加将在 T1 选择中的行并提交,最后 T1 再次执行相同的选择,但由于 T2 的插入而返回更多行.

它作为一个句子超级复杂,但在图片中要简单得多:

幻读-1.png

让我们在终端中尝试一下:

-- (re-) create the accounts table from the first example
create table if not exists accounts (owner text primary key, balance integer not null);

-- wipe the table
delete from accounts;

-- insert the first user
insert into accounts values ('Lisa', 2000);

从现在开始,我们将再次使用两个单独的终端(A 和 B),以便我们可以创建并发事务。

-- TERMINAL A

-- start transaction that prevents phantom reads
begin transaction isolation level repeatable read;

-- select rows from the accounts table
select * from accounts where balance > 500;

-- it returns:
owner | balance
-------+---------
 Lisa  |    2000
(1 row)

现在我们切换到终端 B,开始一个事务,插入一个新行并提交:

-- TERMINAL B

-- start transaction
begin transaction isolation level repeatable read;

-- insert a new row
insert into accounts values ('John', 1250);

-- commit the transaction
commit;

现在回到终端 A,让我们看看同样的 select 语句返回了什么:

-- TERMINAL A

-- select rows from the accounts table
select * from accounts where balance > 500;

-- it returns:
owner | balance
-------+---------
 Lisa  |    2000
(1 row)

哈!我们只得到了丽莎的行。如果选择查询也返回了 John 的行,那将是幻读。但是我们选择的隔离级别 (repeatable read) 不允许这样做,所以只返回 Lisa 的行。

如果您对较低的隔离级别read committed执行相同的步骤,您也会得到 John 的行,因为read committed允许幻读:

序列化异常

Serializable 是 Postgres 中限制性最强的隔离级别。使用此级别时,Postgres 将防止序列化异常。这意味着它确保仅在结果相同时才允许并发事务,而不管事务处理的顺序如何。

让我们最后一次使用accounts表,看看它在 sql 中的样子。首先我们准备我们的设置:

-- (re-) create the accounts table from the first example
create table if not exists accounts (owner text primary key, balance integer not null);

-- wipe the table
delete from accounts;

-- insert the first user
insert into accounts values ('Lisa', 2000);

现在我们要执行两个产生不同结果的事务,这取决于它们的执行顺序。两个事务都将尝试向表中插入新行。余额值将等于表中所有余额的总和。

同样,我们正在使用两个终端(A 和 B)来启动两个并发事务:

-- TERMINAL A

-- start transaction
begin transaction isolation level serializable;

-- select rows from the accounts table
insert into accounts select 'transaction T1', sum(balance) from accounts;

-- TERMINAL B

-- start transaction
begin transaction isolation level serializable;

-- insert a new row
insert into accounts select 'transaction T2', sum(balance) from accounts;

根据哪个事务更快,我们期望以下结果之一:

-- if T1 commits before T2

select * from accounts;

owner             | balance
------------------+---------
 Lisa             |    2000
 transaction T1   |    2000
 transaction T2   |    4000

-- if T2 commits before T1

select * from accounts;

owner             | balance
------------------+---------
 Lisa             |    2000
 transaction T1   |    4000
 transaction T2   |    2000

如果 T1 在 T2 之前提交,则 T1 的总和应该是2000。到 T2 提交时,T2 应该得到4000作为所有余额的总和(每个Lisa和transaction T1的2000)。

但是事务级别serializable不允许两个事务都提交。提交的第一个事务会很好,但是对于您提交的第二个事务,您将得到:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.

以下是交易的时间表:

序列化-anomaly.png

如果您使用其他两个隔离级别之一执行相同的事务,则两个事务都可以提交,结果将是:

select * from accounts;

owner           | balance
----------------+---------
 Lisa           |    2000
 transaction T1 |    2000
 transaction T2 |    2000

这不是我们所期望的。transaction T1或transaction T1的余额应该是4000。

因为执行顺序对这两个事务的结果很重要,所以隔离级别serializable将只允许其中一个提交。

Postgres 再次阻止我们以错误的平衡告终。

我们现在可以重试失败的事务,并且鉴于没有其他干扰事务正在运行,它会提交就好了🚀

结论

事务隔离是一个强大的概念。它可以帮助您避免不一致的数据,而无需使用更多的业务逻辑代码。

PostgreSQL

PostgreSQL社区为您提供最前沿的新闻资讯和知识内容

更多推荐

PostgreSQL 计数查询效率,物化视图 [重复]

问题:PostgreSQL 计数查询效率,物化视图 [重复] 可能重复: PostgreSQL 计数查询优化使用 PostgreSQL 9.2,我们试图弄清楚是否有一种方法可以跟踪查询的结果数量,并以有效的方式返回该数字。这个查询应该每秒执行几次(可能几十到几百甚至几千次)。我们现在的查询看起来像这样,但我们想知道这是否效率低下: -- Get # of rows that do not hav

PostgreSQL

多对多中的唯一性

问题:多对多中的唯一性我无法弄清楚谷歌的哪些术语,所以帮助标记这个问题或只是以相关问题的方式向我指出会有所帮助。我相信我有一个典型的多对多关系: CREATE TABLE groups ( id integer PRIMARY KEY); CREATE TABLE elements ( id integer PRIMARY KEY); CREATE TABLE groups_elements

PostgreSQL

Django 与 postgresql - manage.py syncdb 返回错误

问题:Django 与 postgresql - manage.py syncdb 返回错误我从 Django 开始。我设置了一些使用 SQLite 工作的站点,但是在将 DB 引擎更改为 postgresql manage.py syncdb 后返回错误。我已经用谷歌搜索了 2 天,但对我仍然没有任何作用。Postgres 用户 'joe' 具有超级用户权限和本地 'joe ' 数据库存在。