添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

Stack Exchange Network

Stack Exchange network consists of 183 Q&A communities including Stack Overflow , the largest, most trusted online community for developers to learn, share their knowledge, and build their careers. Visit Stack Exchange

Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It only takes a minute to sign up.

Sign up to join this community

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I am writing a query that returns a single record from the parent table. I would like to also return in this query if it has any children. This is a one to many relationship.

parent:
 -parent_id
 -name
child:
-child_id
-name
-parent_id

My first instinct is to write the following query:

select name, (select count(child_id) from child c  where c.parent_id=p.parent_id) children
     from parent p
     where name like 'some name'

But I was wondering if there was a more efficient way to do this, since I don't actually care about the count, just whether or not it has children. Any pointers?

Exactly. Use exists instead of counting all the children. I often liken this concept to someone asking how many people are in a room, versus asking whether or not the room is occupied. While knowing that there are 437 people in a room is nice, it's overkill if you're just looking for an empty room to use for a meeting... – Colin 't Hart Aug 17, 2018 at 19:12

Don't forget that Postgres has a boolean datatype. The following is the most succinct way to express the query:

select
  parent_id,
  name,
  exists (select from child where parent_id = p.parent_id) as has_children
from parent p;

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=86748ba18ba8c0f31f1b77a74230f67b

Colin, it seems I did a poor job with my example. Let's say I am not selecting a specific parent, but rather many, will that sub query cause poor performance, or will it only run on parent records whose where clause matches? see updated question above – TheCatWhisperer Aug 17, 2018 at 20:10

Aggregate Method

The popular way we'll call it the aggregate method. Note bool_or(child_id IS NOT NULL) also works but was not any faster.

SELECT parent_id, count(*)>1 AS has_children
FROM parent
LEFT OUTER JOIN children
  USING (parent_id)
GROUP BY parent_id;

LEFT JOIN LATERAL with limit

But you may also try this, with LEFT JOIN LATERAL() like this..

SELECT parent_id, has_children
FROM parent AS p
LEFT JOIN LATERAL (
  SELECT true
  FROM children AS c
  WHERE c.parent_id = p.parent_id
  FETCH FIRST ROW ONLY
) AS t(has_children)
  ON (true);

EXISTS

Just FYI, you can use CROSS JOIN LATERAL with EXISTS too (which is I believe how it's planned). We'll call it the EXISTS method.

SELECT parent_id, has_children
FROM parent AS p
CROSS JOIN LATERAL (
  SELECT EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
) AS t(has_children);

Which is the same as,

SELECT parent_id, EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
) AS has_children
FROM parent AS p;

Benchmarks

Sample dataset

1000000 children, 2500 parents. Our sims get it done.

CREATE TABLE parent (
  parent_id int PRIMARY KEY
INSERT INTO parent
  SELECT x
  FROM generate_series(1,1e4,4) AS gs(x);
CREATE TABLE children (
  child_id int PRIMARY KEY,
  parent_id int REFERENCES parent
INSERT INTO children
  SELECT x, 1 + (x::int%1e4)::int/4*4
  FROM generate_series(1,1e6) AS gs(x);
VACUUM FULL ANALYZE children;
VACUUM FULL ANALYZE parent;

Results (pt1)

  • Aggregate method: 450ms,
  • LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY ): 850ms
  • EXISTS method: 850ms
  • Results (adding an index and running again)

    Now let's add an index

    CREATE INDEX ON children (parent_id);
    ANALYZE children;
    

    Now the timing profile is totally different,

  • Aggregate method: 450ms,
  • LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY ): 30ms
  • EXISTS method: 30ms
  • This is how I would do in SQL server (I don't have postgresql- I'm guessing it would be similiar)

    SELECT p.parent_id,
    CASE WHEN EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.ParentId)
                    THEN 'Yes'
                    ELSE 'No'
                    END as has_child,
    FROM Parent p
    --WHERE EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.ParentId)
                    There's no need to count all the children, so there's definitely room for performance improvement.
    – Colin 't Hart
                    Aug 17, 2018 at 19:10
            

    Thanks for contributing an answer to Database Administrators Stack Exchange!

    • Please be sure to answer the question. Provide details and share your research!

    But avoid

    • Asking for help, clarification, or responding to other answers.
    • Making statements based on opinion; back them up with references or personal experience.

    To learn more, see our tips on writing great answers.