What is a "bitmap index"?
Answer a question
I have a PostgreSQL query that is taking longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE and it mentions a Bitmap Index Scan. I've been searching the 'net and reading for about 10 minutes, but I cannot figure out:
Is a Bitmap Index a manufactured thing—something that I could improve if I added a real index to some column somewhere—or is it a specific type of a real index?
Here's the single table that I'm querying:
bugbot4b=> \d bug_snapshots
Table "public.bug_snapshots"
Column | Type | Modifiers
------------+-----------------------------+-----------
fixin_id | integer | not null
created_on | timestamp without time zone | not null
pain | integer | not null
status_id | integer | not null
Indexes:
"bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
"bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
"bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
And here's the result of analyzing the query. Note that there are about 3k distinct fixin_id literally within the query (elided in the below), and the table has 900k rows. Counting just those rows within the particular time range yields 15,000 rows.
EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
AND ("status_id" IN (2, 7, 5, 3))
AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";
Sort (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
Sort Key: created_on
Sort Method: quicksort Memory: 18kB
-> HashAggregate (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
-> Bitmap Heap Scan on bug_snapshots (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
-> Bitmap Index Scan on bug_snapshots_pkey (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)
Is the result of ANALYZE telling me that I need to add an index to the fixin_id (and/or other fields) to improve the speed? Or that this is just "slow" because of its size?
Answers
"Bitmap index scan"
Postgres doesn't have "bitmap indexes" per se. A "bitmap index scan" is an index access method that's allowed for certain index types (including default btree indexes). It's especially useful to combine multiple index lookups. The manual:
An index access method can support "plain" index scans, "bitmap" index scans, or both.
You can disable bitmap-scanning (for debugging purposes only!) by setting:
SET enable_bitmapscan = FALSE;
Optimize query performance
With long lists, joining to a derived table is often faster than a lengthy IN expression. You can use VALUES or unnest() for that purpose. Or even a temporary table, possibly with indexes. See:
- Query table by indexes from integer array
SELECT created_on, sum(pain) AS sum_pain
FROM unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN bug_snapshots USING (fixin_id)
WHERE status_id IN (2, 7, 5, 3)
AND created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND pain < 999
GROUP BY created_on
ORDER BY created_on;
A partial multicolumn index would probably help (a lot). That depends on details like data distribution, load, stable query conditions, etc. Most importantly, the selectivity of WHERE expressions: a partial index typically only makes sense if many or most rows are excluded. Something like:
CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE status_id IN (2, 7, 5, 3)
AND pain < 999;
The sequence of columns in the index matters. That is also true for your primary key, btw, which implements another multicolumn index. See:
- Is a composite index also good for queries on the first field?
db<>fiddle here
sqlfiddle
Performance testing in fiddles is hardly reliable. Run your own tests! Also there were many improvements to Postgres since 2013, when this answer was written!
timestamp [without time zone]
One more thing: bug_snapshots.created_on is type timestamp. Timestamps are interpreted according to your current time zone setting.
But in the query you try to compare to literals with time zone (timestamptz). This would work with an explicit cast:
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
Your literal would be cast do timestamptz and translated to your local time zone accordingly. However, since you don't provide the data type, Postgres casts your literal to the matching type timestamp (not timestamptz) ignoring the time zone offset. Most probably not your intention!
Consider this test:
SELECT min(created_on), max(created_on)
FROM bug_snapshots
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'
AND created_on <= '2013-11-07 15:42:26.994994-0800'
See:
- Ignoring time zones altogether in Rails and PostgreSQL
更多推荐
所有评论(0)