Answer a question

I am working with the aggregation functions bool_or and bool_and to aggregate some records and see whether there is disgreement on a particular column. According to the official documentation:

bool_and(expression)    true if all input values are true, otherwise false
bool_or(expression)     true if at least one input value is true, otherwise false

However this test query:

SELECT bool_or(val),bool_and(val) FROM UNNEST(array[true,NULL]::bool[]) t(val)

Yields true for both columns.

I think bool_and is excluding NULL values. Is there any way to use built-in aggregation functions to make the above query return true and NULL?

Answers

Yes, it looks like NULL inputs are ignored by these aggregates.

This kind of stupidity almost certainly comes straight from the SQL standard (though I'm not about to pay $200 to find out for sure). Other standard aggregates like sum(var) work this way, and it seems like they probably just extrapolated from there, without considering the inherent difference between arithmetic and boolean operations when it comes to handling null values.

I don't think there's any way to work around it; I believe the only way you can convince these functions to return a NULL is by feeding them an empty dataset. (As an aside, whoever insisted that the sum() of zero rows should be NULL rather than 0 ought to be committed...)

Luckily, Postgres is infinitely extensible, and defining you own aggregates is pretty trivial:

CREATE FUNCTION boolean_and(boolean, boolean) RETURNS boolean AS
  'SELECT $1 AND $2'
LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE sensible_bool_and(boolean)
(
  STYPE = boolean,
  INITCOND = true,
  SFUNC = boolean_and,
  -- Optionally, to allow parallelisation:
  COMBINEFUNC = boolean_and, 
  PARALLEL = SAFE
);

If you just need this for a one-off query, and don't want to (or don't have the permissions to) add a new aggregate definition to the database, you can put these in your connection-local temp schema by defining and referring to them as pg_temp.boolean_and()/pg_temp.sensible_bool_and().
(If you're using a connection pool, you might want to drop them when you're done.)

Note that this is ~10x slower than the built-in bool_and() (though not likely to be the bottleneck in many realistic use-cases); SQL boolean values are heap-allocated and immutable, so boolean_and() needs to allocate a new one for each iteration, while LANGUAGE C functions are allowed to update an accumulator in-place. If performance is a concern, and you're willing/able to build and deploy your own C module, then (as with most internal functions) you can pretty easily copy-paste the bool_and() implementation and tweak it to suit your needs.

But all of this is kind of overkill unless you have a real need for it. In practice, I would probably go for @Luke's solution instead.

Logo

PostgreSQL社区为您提供最前沿的新闻资讯和知识内容

更多推荐