[PostgreSQL] Basic Query Tuning

2 minute read

Basic Query Tuning

  • Query Processing Pipeline
  • EXPLAIN & EXPLAIN ANALYZE
  • Meaning of Cost

Query Processing Pipeline

  • Parser
    • checks whether the query is valid
    • build a query tree
  • Rewriter
    • decomposes the query and modify if improvements can be made for efficiencies
  • Planner
    • evaluate the plans and choose the most efficient plan
  • Executor
    • runs the query

EXPLAIN & EXPLAIN ANALYZE

  • EXPLAIN: Build a query plan and display info about it
  • EXPLAIN ANALYZE: Build a query plan, run it, and display info about it
EXPLAIN ANALYZE
SELECT username, contents
FROM users AS u
INNER JOIN "comments" AS c ON u.id = c.user_id
WHERE username = 'Alyson14';

 /*
1) Hash Join  (cost=8.31..1756.11 rows=11 width=81) (actual time=0.359..19.250 rows=7 loops=1)
  Hash Cond: (c.user_id = u.id)
  2) ->  Seq Scan on comments c  (cost=0.00..1589.10 rows=60410 width=72) (actual time=0.011..8.107 rows=60410 loops=1)
  3) ->  Hash  (cost=8.30..8.30 rows=1 width=17) (actual time=0.032..0.034 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        4) ->  Index Scan using users_username_idx on users u  (cost=0.28..8.30 rows=1 width=17) (actual time=0.025..0.027 rows=1 loops=1)
              Index Cond: ((username)::text = 'Alyson14'::text)
Planning Time: 0.387 ms
Execution Time: 19.383 ms
*/
  • The top row and all the arrows(➡️) are query nodes, meaning that it is a step where postgres is trying to access some data stored inside the database or trying to do some processing
  • Sequence: 4) Index Scan ➡️ 2) Seq Scan ➡️ 3) Hash ➡️ 1) Hash Join
    • 4) Index Scan: accessing some data on the hard disk and emits the data to 3) Hash
    • 2) Seq Scan: Full Scan on a table and emits the data to 1) Hash Join
    • 3) Hash: hashtag process the data and emits the data to 1) Hash Join
    • 1) Hash Join: combines all the data from 3) Hash and 2) Seq Scan
-- Hash Join  (cost=8.31..1756.11 rows=11 width=81) (actual time=0.359..19.250 rows=7 loops=1)

/*
postgres can get the following information(rows, width) because postgres saves information about the table
*/
SELECT *
FROM pg_stats
WHERE tablename IN ('users', 'comments');
  • Hash Join: How this node is generating data
  • cost: Amount of processing power required for this step
    • the first number(e.g. 8.31): cost for this step to produce the first row
    • the second number(e.g. 1756.11): cost for this step to produce all row
    • data(rows) are emited to the next step right after the execution, however if the first, and second number is the same(e.g. 3) Hash (cost=8.30..8.30), it means that all the process has to be executed, and the rows are sent in altogether.
  • rows: Guess at how many rows this step will produce
  • width: Guess at the average number of bytes of each row

Meaning of Cost

  • seq_page_cost: pages(blocks) read sequentially(1.0)
  • random_page_cost: pages read at random(4.0)
  • cpu_tuple_cost: rows scanned(.01)
  • cpu_index_tuple_cost: index entries scanned(0.005)
  • cpu_operator_cost: time function/ operation evaluated(0.0025)
  • There are more sort of costs which can be found in 🔗 Postgres Query Planning 공식문서

ref