Behaviour of bool_and and bool_or with NULL values

Postgredaxiang

Published 2022-09-22 05:53:04

Question

I am using the aggregate functions bool_or and bool_and to aggregate some records and see whether there is disagreement on a particular column. According to the official documentation:

bool_and(expression)    true if all input values are true, otherwise false
bool_or(expression)     true if at least one input value is true, otherwise false

However this test query:

SELECT bool_or(val),bool_and(val) FROM UNNEST(array[true,NULL]::bool[]) t(val)

Yields true for both columns.

I think bool_and is excluding NULL values. Is there any way to use built-in aggregation functions to make the above query return true and NULL?

Answers

Yes, it looks like NULL inputs are ignored by these aggregates.
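To make the difference concrete, here is a small comparison between the aggregates and the plain boolean operators on the same inputs. It is only an illustration; the column aliases are made up for this example.

```sql
-- The aggregates skip the NULL row, so only "true" is ever considered.
SELECT bool_and(val) AS agg_and,   -- true
       bool_or(val)  AS agg_or     -- true
FROM unnest(ARRAY[true, NULL]::boolean[]) AS t(val);

-- The binary operators follow three-valued logic instead.
SELECT (true AND NULL) AS op_and,  -- NULL
       (true OR NULL)  AS op_or;   -- true
```

The question is effectively asking for an aggregate that behaves like the second query.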

This kind of stupidity almost certainly comes straight from the SQL standard (though I'm not about to pay $200 to find out for sure). Other standard aggregates like sum(var) work this way, and it seems like they probably just extrapolated from there, without considering the inherent difference between arithmetic and boolean operations when it comes to handling null values.

I don't think there's any way to work around it; I believe the only way you can convince these functions to return a NULL is by feeding them an empty dataset (or one that contains nothing but NULLs). (As an aside, whoever insisted that the sum() of zero rows should be NULL rather than 0 ought to be committed...)
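A quick way to see the cases where the built-ins do return NULL, using an empty array and an all-NULL array as stand-ins for real tables (the aliases are illustrative):

```sql
-- Zero input rows: every aggregate here yields NULL, including sum().
SELECT bool_and(val)  AS agg_and,
       bool_or(val)   AS agg_or,
       sum(val::int)  AS agg_sum
FROM unnest(ARRAY[]::boolean[]) AS t(val);

-- Only NULL inputs: nothing non-null is left to aggregate, so NULL again.
SELECT bool_and(val) AS agg_and,
       bool_or(val)  AS agg_or
FROM unnest(ARRAY[NULL, NULL]::boolean[]) AS t(val);
```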

Luckily, Postgres is infinitely extensible, and defining your own aggregates is pretty trivial:

-- Deliberately not STRICT, so NULL inputs reach the AND instead of being skipped
CREATE FUNCTION boolean_and(boolean, boolean) RETURNS boolean AS
  'SELECT $1 AND $2'
LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE sensible_bool_and(boolean)
(
  STYPE = boolean,
  INITCOND = true,
  SFUNC = boolean_and,
  -- Optionally, to allow parallelisation:
  COMBINEFUNC = boolean_and,
  PARALLEL = SAFE
);
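With the aggregate above in place, a few sanity checks along these lines should show the three-valued behaviour. The arrays are just sample inputs.

```sql
-- true AND NULL is unknown, so the result is NULL.
SELECT sensible_bool_and(val)
FROM unnest(ARRAY[true, NULL]::boolean[]) AS t(val);

-- A single false decides the outcome regardless of NULLs: false.
SELECT sensible_bool_and(val)
FROM unnest(ARRAY[false, NULL]::boolean[]) AS t(val);

-- No rows at all: the result is the INITCOND, i.e. true rather than NULL.
SELECT sensible_bool_and(val)
FROM unnest(ARRAY[]::boolean[]) AS t(val);
```

Note the last case: unlike the built-in, this aggregate treats an empty input as true, because the state starts at INITCOND and there is no row to change it.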

If you just need this for a one-off query, and don't want to (or don't have the permissions to) add a new aggregate definition to the database, you can put these in your connection-local temp schema by defining and referring to them as pg_temp.boolean_and()/pg_temp.sensible_bool_and(). (If you're using a connection pool, you might want to drop them when you're done.)
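A sketch of what the temp-schema variant might look like, reusing the same definitions with a pg_temp qualifier and cleaning up afterwards. The parallel options are left out here for brevity.

```sql
CREATE FUNCTION pg_temp.boolean_and(boolean, boolean) RETURNS boolean AS
  'SELECT $1 AND $2'
LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE pg_temp.sensible_bool_and(boolean)
(
  STYPE = boolean,
  INITCOND = true,
  SFUNC = pg_temp.boolean_and
);

-- Temp-schema functions have to be schema-qualified when called.
SELECT pg_temp.sensible_bool_and(val)
FROM unnest(ARRAY[true, NULL]::boolean[]) AS t(val);

-- Clean up explicitly if the connection goes back to a pool.
DROP AGGREGATE pg_temp.sensible_bool_and(boolean);
DROP FUNCTION pg_temp.boolean_and(boolean, boolean);
```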

Note that this is ~10x slower than the built-in bool_and() (though that is unlikely to be the bottleneck in most realistic use cases). The built-in is a compiled C function, whereas boolean_and() is a SQL-language function that the aggregate has to invoke once per input row, and that per-call overhead adds up. If performance is a concern, and you're willing/able to build and deploy your own C module, then (as with most internal functions) you can pretty easily copy-paste the bool_and() implementation and tweak it to suit your needs.
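If you want to check the overhead on your own system, a rough comparison along these lines should do. It assumes sensible_bool_and() has been created as above; the row count and the test expression are arbitrary choices.

```sql
-- Built-in aggregate over a million generated rows.
EXPLAIN ANALYZE
SELECT bool_and(i % 2 = 0)
FROM generate_series(1, 1000000) AS g(i);

-- Same input through the SQL-function-backed aggregate.
EXPLAIN ANALYZE
SELECT sensible_bool_and(i % 2 = 0)
FROM generate_series(1, 1000000) AS g(i);
```

Compare the reported execution times over a few runs; the exact ratio will vary with version and hardware.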

But all of this is kind of overkill unless you have a real need for it. In practice, I would probably go for @Luke's solution instead.
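For reference, one way to get three-valued AND semantics out of the built-ins alone is to combine them in a CASE expression. This is only a sketch, and not necessarily what the other answer proposes.

```sql
SELECT CASE
         WHEN NOT bool_and(val)    THEN false  -- at least one false input
         WHEN bool_or(val IS NULL) THEN NULL   -- no false, but some NULL
         ELSE bool_and(val)                    -- all true (NULL if no rows)
       END AS three_valued_and
FROM unnest(ARRAY[true, NULL]::boolean[]) AS t(val);
```

For the sample input this returns NULL, and for an input containing false it returns false, matching what `AND` would do across the values.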
