[BUGS] BUG #14235: inconsistencies with IS NULL / IS NOT NULL

Discussion:

(too old to reply)

a***@tao11.riddles.org.uk

2016-07-08 02:47:46 UTC

The following bug has been logged on the website:

Bug reference: 14235
Logged by: Andrew Gierth
Email address: ***@tao11.riddles.org.uk
PostgreSQL version: 9.6beta2
Operating system: any
Description:

While discussing the issues resulting from the spec's definition of IS NULL
and IS NOT NULL on composite values, we discovered the following
inconsistent cases, all of which appear to be the fault of
eval_const_expressions.

In short, the transformations applied to NullTest nodes change the
semantics, which can then make the final result depend on, for example,
whether function calls were inlined or not.

create type ct1 as (a integer, b integer);
create type ct2 as (a integer, b ct1);
create type ct3 as (a integer, b ct2);

create function make_ct1(integer,integer) returns ct1
language sql immutable
as $f$ select row($1,$2*$2)::ct1; $f$;

create function make_ct2(integer,integer) returns ct2
language sql immutable
as $f$ select row($1,row($2,$2)::ct1)::ct2; $f$;

create function make_ct3(integer,integer) returns ct3
language sql immutable
as $f$ select row($1,row($2,row($2,$2)::ct1)::ct2)::ct3; $f$;

create function stable_null() returns integer
language plpgsql stable cost 1
as $f$ begin return null; end; $f$;

create function volatile_null() returns integer
language plpgsql volatile cost 1000
as $f$ begin return null; end; $f$;

select row(null,null)::ct1 is null as ct1,
row(null,row(null,null)::ct1)::ct2 is null as ct2,
row(null,row(null,row(null,null)::ct1)::ct2)::ct3 is null as ct3;

select row(1,null)::ct1 is not null as row_cons,
'(1,)'::ct1 is not null as row_literal,
make_ct1(1,null) is not null as inlined_const,
make_ct1(1,stable_null()) is not null as inlined_func,
make_ct1(1,volatile_null()) is not null as called_func;

select row(1,row(null,null)::ct1)::ct2 is not null as row_cons,
'(1,"(,)")'::ct2 is not null as row_literal,
make_ct2(1,null) is not null as inlined_const,
make_ct2(1,stable_null()) is not null as inlined_func,
make_ct2(1,volatile_null()) is not null as called_func;

select row(1,row(null,row(null,null)::ct1)::ct2)::ct3 is not null as
row_cons,
'(1,"(,""(,)"")")'::ct3 is not null as row_literal,
make_ct3(1,null) is not null as inlined_const,
make_ct3(1,stable_null()) is not null as inlined_func,
make_ct3(1,volatile_null()) is not null as called_func;

select row(null,null)::ct1 is null as row_cons,
'(,)'::ct1 is null as row_literal,
make_ct1(null,null) is null as inlined_const,
make_ct1(null,stable_null()) is null as inlined_func,
make_ct1(null,volatile_null()) is null as called_func;

select row(null,row(null,null)::ct1)::ct2 is null as row_cons,
'(,"(,)")'::ct2 is null as row_literal,
make_ct2(null,null) is null as inlined_const,
make_ct2(null,stable_null()) is null as inlined_func,
make_ct2(null,volatile_null()) is null as called_func;

select row(null,row(null,row(null,null)::ct1)::ct2)::ct3 is null as
row_cons,
'(,"(,""(,)"")")'::ct3 is null as row_literal,
make_ct3(null,null) is null as inlined_const,
make_ct3(null,stable_null()) is null as inlined_func,
make_ct3(null,volatile_null()) is null as called_func;

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andrew Gierth

2016-07-08 03:10:10 UTC

Permalink

andrew> While discussing the issues resulting from the spec's
andrew> definition of IS NULL and IS NOT NULL on composite values, we
andrew> discovered the following inconsistent cases, all of which
andrew> appear to be the fault of eval_const_expressions.

andrew> In short, the transformations applied to NullTest nodes change
andrew> the semantics, which can then make the final result depend on,
andrew> for example, whether function calls were inlined or not.

And here's my analysis of what seems to be going on:

The executor, when doing IS [NOT] NULL on a composite value, looks at
each column to see if it is the null value. It does NOT recurse into
nested composite values, and my reading of the spec suggests that this
is correct.

But eval_const_expressions thinks it has license to change

ROW(a,b) IS NULL

into

(a IS NULL) AND (b IS NULL)

which has the effect of recursing one level into a nested composite
value.

It seems possible that this could be fixed by simply setting
argisrow=false for all the null tests generated in such cases.
Specifically I've tried this, which seems to fix all the inconsistent
cases:

--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3308,7 +3308,13 @@ eval_const_expressions_mutator(Node *node,
newntest = makeNode(NullTest);
newntest->arg = (Expr *) relem;
newntest->nulltesttype = ntest->nulltesttype;
- newntest->argisrow = type_is_rowtype(exprType(relem));
+ /*
+ * We must unconditionally set argisrow false here.
+ * Otherwise, we'd be treating a nested composite
+ * structure as though it were at top level, which
+ * would change the semantics of the test.
+ */
+ newntest->argisrow = false;
newntest->location = ntest->location;
newargs = lappend(newargs, newntest);
}

--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andrew Gierth

2016-07-08 03:58:44 UTC

Permalink

Here's that change as a proper patch with the original testcase added as
a regression test.

--
Andrew (irc:RhodiumToad)

Andrew Gierth

2016-07-13 02:56:32 UTC

Permalink

Andrew> It seems possible that this could be fixed by simply setting
Andrew> argisrow=false for all the null tests generated in such cases.

This turns out to be more somplicated than I thought. A lot of places
look at argisrow to distinguish "simple" null tests from the
standard-required logic for composite values. In some of the cases I've
looked at, fixing the bug actually looks like it improves things (for
example, in (row(1,a) IS NOT NULL), a can be deduced to be not null in
the "not the null value" sense); but some places like
match_clause_to_indexcol I'm less sure of.

--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

2016-07-22 20:21:44 UTC

Permalink

Post by Andrew Gierth
The executor, when doing IS [NOT] NULL on a composite value, looks at
each column to see if it is the null value. It does NOT recurse into
nested composite values, and my reading of the spec suggests that this
is correct.

Hmm. Of course the $64 question is whether that really is correct, or
sensible.

I went to look at the spec, and discovered that SQL:2011 actually has
wording that is different from SQL99, which I think is what we relied
on last time we considered this issue. Specifically, in 2011,
section 8.8 <null predicate> quoth:

<null predicate> ::= <row value predicand> <null predicate part 2>
<null predicate part 2> ::= IS [ NOT ] NULL

(Oddly, SQL does not seem to allow IS [NOT] NULL on non-composite values,
which is just silly.)

1) Let R be the <row value predicand> and let V be the value of R.

2) Case:
a) If V is the null value, then âR IS NULLâ is True and the value of
âR IS NOT NULLâ is False.
b) Otherwise:
i) The value of âR IS NULLâ is
Case:
1) If the value of every field of V is the null value, then True.
2) Otherwise, False.
ii) The value of âR IS NOT NULLâ is
Case:
1) If the value of no field of V is the null value, then True.
2) Otherwise, False.

NOTE 267 â For all R, âR IS NOT NULLâ has the same result as âNOT R IS
NULLâ if and only if R is of degree 1.

Rule (2a) was not there in SQL99. But look at what this is doing: it
is admitting straight out that a null composite value is not the same
as a composite value all of whose fields are null. It is only asserting
that a <null predicate> will not distinguish them. The implication is
that it's just fine if, say, COALESCE() doesn't act that way. Previously,
we thought this part of the spec was supposed to define what "V is null"
means everywhere else in the spec if V is composite; but now it seems
clear that "V is null" is a primitive test that is not the same as the
<null predicate> construct.

Post by Andrew Gierth
It seems possible that this could be fixed by simply setting
argisrow=false for all the null tests generated in such cases.

I concur that this is an appropriate fix if we believe that
ExecEvalNullTest's behavior is correct.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andrew Gierth

2016-07-22 22:27:32 UTC

Permalink

Post by Andrew Gierth
The executor, when doing IS [NOT] NULL on a composite value, looks
at each column to see if it is the null value. It does NOT recurse
into nested composite values, and my reading of the spec suggests
that this is correct.

Tom> Hmm. Of course the $64 question is whether that really is
Tom> correct, or sensible.

Tom> I went to look at the spec, and discovered that SQL:2011 actually
Tom> has wording that is different from SQL99, which I think is what we
Tom> relied on last time we considered this issue.

I've been mainly using the 2008 spec, but it seems to have identical
wording to the 2011 one.

My reading is as follows, based on these parts of the spec:

[Framework] 4.4.2 The null value
Every data type includes a special value, called the null value,
sometimes denoted by the keyword NULL. This value differs from other
values in the following respects: [...]

[Foundation] 8.8 <null predicate> (as you quoted)

It seems to me that where the spec refers to "the null value", this is
exactly what we do with the "isnull" flag, and that <null predicate> is
a separate thing.

Where the spec refers to the "fields" of a row, it at least strongly
implies that these are never taken recursively:

[Framework] 4.4.5.2 Row types
A row type is a sequence of one or more (field name, data type)
pairs, known as fields. A value of a row type consists of one value
for each of its fields.

Also in 6.2 <field definition> the "degree" of a row type is defined in
such a way that a row type with one declared field, whose type is also a
row type, has a degree of 1. So the comment in 8.8 about degree-1 rows
only makes sense if 8.8 isn't supposed to recurse.

Plus, if 8.8 had been intended to be recursive, it could easily have
been written that way.

Most of the references to nullity in the spec talk specifically about
"the null value" rather than referencing IS [NOT] NULL. In particular,
constructs like count(x) refer to "eliminating the null value", which in
turn implies that count(x) is not the same thing as count(*) filter
(where x is not null) if x is a row type. Also, IS DISTINCT FROM only
refers to "the null value", so (x IS DISTINCT FROM NULL) is not the same
as either (x IS NOT NULL) or (NOT (x IS NULL)). Likewise the rules for
<routine invocation> only talk about "the null value" in deciding
whether a routine declared RETURNS NULL ON NULL INPUT (i.e. STRICT) is
actually called.

The places that explicitly use IS [NOT] NULL are:

- some definitions about whether values are known-not-null, which is
ok because (x IS NOT NULL) clearly implies that x is not the null
value

- some rules for multiset operations where the operands must be
multisets, not rows

- NOT NULL constraints on columns are transformed into CHECK (col IS
NOT NULL), which isn't what we do: our not-null constraint currently
only excludes the null value, though this has been discussed in some
past threads

- COALESCE is implemented as CASE WHEN x IS NOT NULL THEN ... which
again is not what we do. The logic of the spec's version of this is
truly bizarre, and I can't imagine any possible use for it; having
COALESCE(row(1,null),row(2,null)) return (2,null) as the spec
demands seems to be all kinds of wrong.

Tom> <null predicate> ::= <row value predicand> <null predicate part 2>
Tom> <null predicate part 2> ::= IS [ NOT ] NULL

Tom> (Oddly, SQL does not seem to allow IS [NOT] NULL on non-composite
Tom> values, which is just silly.)

<row value predicand> can be a <row value constructor predicand> which
can be a <common value expression> or a <boolean predicand>, either of
which can be a <value expression primary>, and in both of these cases
the syntax rules specify that they are to be replaced by ROW(X). So
non-composite values are allowed and are indistinguishable from
composite values of degree 1.

(Note that <row value predicand> can also be a <row value special case>
which can be a <nonparenthesized value expression primary>, but that is
required to be of a row type. The spec here seems to rely on constraints
about the declared types to disambiguate between the case of a <row
value special case> and a <common value expression> which is a
<nonparenthesized value expression primary>.)

Tom> Rule (2a) was not there in SQL99. But look at what this is doing:
Tom> it is admitting straight out that a null composite value is not
Tom> the same as a composite value all of whose fields are null. It is
Tom> only asserting that a <null predicate> will not distinguish them.

Right, that's my reading too.

Tom> The implication is that it's just fine if, say, COALESCE() doesn't
Tom> act that way.

Not COALESCE, because COALESCE is defined as a syntactic transform to a
CASE expression that uses IS NOT NULL.

But I really don't think we should follow the spec on that one.

Post by Andrew Gierth
It seems possible that this could be fixed by simply setting
argisrow=false for all the null tests generated in such cases.

Tom> I concur that this is an appropriate fix if we believe that
Tom> ExecEvalNullTest's behavior is correct.

I believe it is, as explained above.

That said, there may also be merit in arguing that the spec is just
broken on this point. I think back when IS NULL was originally changed
to conform to the spec, it might have been better to just say "no,
that's silly" and stick with the old behavior.

We wouldn't strictly speaking be violating the spec if we did that,
because we don't claim to support feature T051 "Row types", and so we
can just say that our row types are a nonstandard extension without the
standard IS NULL behavior. We do in fact violate the spec here in many
ways; for example we treat types from CREATE TYPE foo AS (fields) as
being row types, but in fact the spec says these are user-defined
structured types and not row types.

But whether it would be a good idea to actually revert to the old
way... that's another question. Does anyone actually _want_ the spec
behavior of IS [NOT] NULL?

--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

2016-07-22 23:40:15 UTC

Permalink

Post by Andrew Gierth
That said, there may also be merit in arguing that the spec is just
broken on this point. I think back when IS NULL was originally changed
to conform to the spec, it might have been better to just say "no,
that's silly" and stick with the old behavior.
We wouldn't strictly speaking be violating the spec if we did that,
because we don't claim to support feature T051 "Row types", and so we
can just say that our row types are a nonstandard extension without the
standard IS NULL behavior. We do in fact violate the spec here in many
ways; for example we treat types from CREATE TYPE foo AS (fields) as
being row types, but in fact the spec says these are user-defined
structured types and not row types.
But whether it would be a good idea to actually revert to the old
way... that's another question. Does anyone actually _want_ the spec
behavior of IS [NOT] NULL?

Good question. The existing behavior seems to be traceable as far back as
68d9fbeb5511d846ce3a6f66b8955d3ca55a4b76, which implemented the specific
syntax ROW( ... ) IS [NOT] NULL by dint of breaking the ROW construct into
its field expressions and applying scalar NullTest to each one. I can't
find anything in the mail archives discussing it specifically, and it was
only a very minor element of that patch. I strongly suspect that Lockhart
just did what the spec said without thinking about it much; there is
certainly no trace of user requests for this behavior.

Later on we improved it to be less inconsistent with other ways of
specifying a rowtype argument, but again I don't see much indication of
user requests driving that, see here:
https://www.postgresql.org/message-id/flat/451BDB28.4050201%40sigaev.ru

But the other side of the coin is that the behavior does date back ten
years or more. Between backwards compatibility worries and the quite
clear language of the spec, it's a bit hard to summon the conviction to
say we ought to change it, much as I'd like to.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

2016-07-23 00:05:21 UTC

Permalink

Post by Tom Lane
Good question. The existing behavior seems to be traceable as far back as
68d9fbeb5511d846ce3a6f66b8955d3ca55a4b76, which implemented the specific
syntax ROW( ... ) IS [NOT] NULL by dint of breaking the ROW construct into
its field expressions and applying scalar NullTest to each one.

BTW, thinking about that a bit more, I'm pretty sure that back then we did
not have any robust notion of composite-type datums at all, and thus that
there simply would not have been any other way to implement this syntax.
It would not have worked to treat ROW(...) as a construct that delivered
a single Datum to a standard NullTest expression node. So it was pretty
convenient for Lockhart that the spec was written the way it was.

Also note the observation in the 2006 thread that the N-scalar-NullTests
implementation didn't get the semantics quite right anyway, which makes it
even less likely that anybody was depending on the spec semantics between
2002 and 2006.

But that still leaves us with ten years of history in which we *were*
conforming to the spec, modulo the very narrow corner case mentioned
in this thread.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andrew Gierth

2016-07-23 00:12:19 UTC

Permalink

Tom> But that still leaves us with ten years of history in which we
Tom> *were* conforming to the spec, modulo the very narrow corner case
Tom> mentioned in this thread.

Yeah, but the main visible effect of that has been a stream of "you have
to use NOT (x IS NULL) rather than (x IS NOT NULL)" responses to people
having trouble with this.

Is there a single reported case where anyone has actually needed the
spec's version of (x IS NOT NULL) for a composite type?

--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

2016-07-23 00:22:28 UTC

Permalink

Post by Andrew Gierth
Yeah, but the main visible effect of that has been a stream of "you have
to use NOT (x IS NULL) rather than (x IS NOT NULL)" responses to people
having trouble with this.

I don't offhand recall any such complaints on pgsql-bugs. Maybe there
have been some on IRC.

Post by Andrew Gierth
Is there a single reported case where anyone has actually needed the
spec's version of (x IS NOT NULL) for a composite type?

By definition, we get no "reports" for a case where something works
as someone expects. So you're demanding proof that cannot exist.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

David G. Johnston

2016-07-23 00:34:15 UTC

Permalink

Post by Tom Lane

Post by Andrew Gierth
Yeah, but the main visible effect of that has been a stream of "you have
to use NOT (x IS NULL) rather than (x IS NOT NULL)" responses to people
having trouble with this.

I don't offhand recall any such complaints on pgsql-bugs. Maybe there
have been some on IRC.

Post by Andrew Gierth
Is there a single reported case where anyone has actually needed the
spec's version of (x IS NOT NULL) for a composite type?

By definition, we get no "reports" for a case where something works
as someone expects. So you're demanding proof that cannot exist.

Yeah, I'd say we lump this into the same bucket as "unintentional
correlated subqueries" and "forgot to add a where clause to your delete
statement".

In short, don't use "IS NOT NULL". But, sorry, we cannot actually prohibit
itâ without upsetting lots of people.

David J.
â

Andrew Gierth

2016-07-23 01:37:09 UTC

Permalink

Post by Andrew Gierth
Yeah, but the main visible effect of that has been a stream of "you
have to use NOT (x IS NULL) rather than (x IS NOT NULL)" responses
to people having trouble with this.

Tom> I don't offhand recall any such complaints on pgsql-bugs. Maybe
Tom> there have been some on IRC.

Indeed so.

As an aside, while looking back through logs I found this, which was
reported as bug #7808 in 2013 and is still unfixed:

postgres=# select * from unnest(array[row('a',1)::c1,null::c1]) u;
ERROR: function returning set of rows cannot return null value

(it came up in the context of a "how to strip nulls from an array"
question; array(select u from unnest(a) u where u is not null) obviously
works only for scalars, and you need not (u is null) instead.)

Post by Andrew Gierth
Is there a single reported case where anyone has actually needed the
spec's version of (x IS NOT NULL) for a composite type?

Tom> By definition, we get no "reports" for a case where something
Tom> works as someone expects. So you're demanding proof that cannot
Tom> exist.

I meant "reported" in a more general sense that bug reports, obviously.

--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

2016-07-26 20:51:48 UTC

Permalink

Post by Andrew Gierth
Andrew> It seems possible that this could be fixed by simply setting
Andrew> argisrow=false for all the null tests generated in such cases.
This turns out to be more somplicated than I thought. A lot of places
look at argisrow to distinguish "simple" null tests from the
standard-required logic for composite values. In some of the cases I've
looked at, fixing the bug actually looks like it improves things (for
example, in (row(1,a) IS NOT NULL), a can be deduced to be not null in
the "not the null value" sense); but some places like
match_clause_to_indexcol I'm less sure of.

I looked around to see what other consequences there might be. I think
that match_clause_to_indexcol is fine, because it's mighty unlikely that
any index type would implement argisrow semantics for IS[NOT]NULL.
However, I found a couple of things that seem like bugs:

1. get_relation_constraints() converts attnotnull column markings into
"col IS NOT NULL" constraints, ie it sets argisrow true for a composite
column. Perhaps ideally that would be an accurate representation of the
constraint semantics, but today it is not; a correct representation would
be a "simple not null" constraint. We are overpromising what the
constraint implies.

2. ruleutils.c will print a NullTest node as "IS [NOT] NULL" regardless
of its argisrow setting. For the case with a rowtype argument and not
argisrow, that is an incorrect depiction of the semantics. Since such a
case can't arise in parser output, only in a plan, this doesn't break
view dumping, but it could lead to misleading EXPLAIN output.

3. postgres_fdw's deparse.c is likewise naive about NullTest, and here
that *can* lead to an indisputable bug if a const-folded condition is
being sent to the remote side. It'd be better to issue IS [NOT] DISTINCT
FROM NULL in such cases.

Problem #1 is a pre-existing error, but problems #2/#3 are newly
introduced by this patch, since up to now argisrow has merely been a
pre-cached indication of the argument type, not an independent variable.

So one solution approach is to say "okay, argisrow is now independent,
deal with it". In that case #1 could be fixed trivially by setting
argisrow = false in the generated NullTest, and I think we would have
to modify ruleutils.c and deparse.c to print scalar NullTest on a
composite value as IS [NOT] DISTINCT FROM NULL.

The other general answer is to revert to the previous belief that argisrow
is merely a helpful cache. In that case, ruleutils is fine, but both
eval_const_expressions and get_relation_constraints would need to be fixed
to emit DistinctExprs in the cases where they now want to emit an argisrow
= false NullTest for a composite input.

The first approach seems to carry a rather larger risk of breaking
third-party code, since it removes an invariant that used to exist.
(The fact that we have to touch postgres_fdw is certainly a red flag
that other FDWs might be affected.) On the other hand, the planner
generally has a lot more intelligence about NullTest than about
DistinctExpr, and so I'm worried that substituting the latter for the
former might result in some nasty performance regressions for specific
queries.

I'm leaning slightly to the first approach, but could probably be
talked out of it if anyone sees additional arguments. Comments?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs