Notes on random stuff


Christophe Pettus: PostgreSQL Proficiency for Python People - PyCon 2014

  • initdb - creates the basic directory structure and the default databases; usually run automatically by the installer
  • createdb - create new database; is a wrapper for CREATE DATABASE ...
  • always use UTF-8 for encoding (not changeable later); see the sketch below
  • pg_ctl - start and stop PGSQL; usually wrapped in service postgresql start/stop (use these); always pass an explicit shutdown mode with -m (fast is the usual choice)
  • psql - most direct way for diagnosing database problems
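  • example (a minimal sketch; 'appdb' is just a placeholder name): creating a new UTF-8 database from the shell or from psql:

    createdb --encoding=UTF8 --template=template0 appdb
    CREATE DATABASE appdb ENCODING 'UTF8' TEMPLATE template0;   -- equivalent SQL from psql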

  • all data (configs, databases, etc.) is in one directory, e.g. on Windows: C:\Program Files\PostgreSQL\9.3\data

  • the actual table data lives in the base/ subdirectory of the data directory

  • pg_xlog/ - not to be messed with; despite the name it is not a log you can delete (it holds the WAL segments)

  • two important config files postgresql.conf (configuration of server parameters) and pg_hba.conf (authentication rules)

  • postgresql.conf:

    • best practice: leave everything in here as is, add include 'postgresql.conf.include' at the bottom, and put all overrides in that included file so you always know where they are
    • Logging: be generous with logging; the logs are the best source for locating problems
    • example:
      log_destination = 'csvlog'              # use CSV as the log format
      log_directory = 'pg_log'                # default
      logging_collector = on
      log_filename = 'postgres-%Y-%m-%d_%H%M%S'
      log_rotation_age = 1d                   # a new logfile every day
      log_rotation_size = 1GB                 # start a new file once the current one exceeds 1GB
      log_min_duration_statement = 250ms      # log every statement slower than 250ms; useful for finding slow queries
      log_checkpoints = on
      log_connections = on
      log_disconnections = on
      log_lock_waits = on
      log_temp_files = 0                      # log all temporary files

    • Memory:

    • shared_buffers (the RAM PostgreSQL uses for its own buffer cache): if system RAM < 2GB use 20% of it; if RAM < 32GB use 25%; if RAM > 32GB use 8GB

    • work_mem: start low (32-64MB); look for 'temporary file' entries in the logs and set it to 2-3x the size of the largest temp file - be a hero, because this can be a huge speed-up. Be careful though: this memory can be used several times per query, so if it is too large it will eat up the whole system's RAM (rule of thumb: 50% of free system RAM / active connections)

    • maintenance_work_mem: set to 10% of system RAM, capped at 1GB; maybe higher if VACUUM problems persist (see the sketch below)
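
    • example (sketch only - applying the rules above to a hypothetical 16GB box; adjust to your hardware):

      shared_buffers = 4GB                    # 25% of RAM (RAM < 32GB)
      work_mem = 64MB                         # start low; raise based on 'temporary file' entries in the logs
      maintenance_work_mem = 1GB              # ~10% of RAM, capped at 1GB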

    • Checkpoints: a checkpoint flushes all dirty buffers to disk, so a lot of I/O is possible

    • example:

      wal_buffers = 16MB
      checkpoint_completion_target = 0.9      # higher is better, but recovery after a crash may take longer
      checkpoint_timeout = 10min              # 10m-30m, depending on acceptable restart time; see the logs for that
      checkpoint_segments = 32                # to start with

    • Planner:

    • effective_io_concurrency - set to the number of I/O channels, e.g. a 4-disk RAID = 4 (see the sketch below)

    • random_page_cost // ?? check later
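
    • example (sketch; the values are illustrative, not from the talk):

      effective_io_concurrency = 4            # e.g. a 4-disk RAID
      random_page_cost = 1.1                  # default is 4.0; lowering it is common on SSD-backed storage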

  • pg_hba.conf:

    • role = a database object that can own other objects (e.g. a table) and that has privileges
    • user = a role that is also allowed to log in
    • postgres = superuser
    • by default traffic is not encrypted, turn on SSL
    • only let your web servers connect (see the sketch below)
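    • example pg_hba.conf entries (sketch; database, user and subnet are made up):

      # TYPE    DATABASE   USER      ADDRESS        METHOD
      hostssl   appdb      appuser   10.0.1.0/24    md5
      local     all        postgres                 peer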
  • WAL - write ahead log

    • key to many PGSQL operations (replication, crashes, etc.)
    • when a transaction commits, its WAL record is flushed to disk
    • continuous record of changes since the last checkpoint (meaning: after a crash you start from the last checkpoint and replay every commit the WAL has recorded since that checkpoint was made)
    • 16MB segment files in pg_xlog; never mess with them
    • archive_command - used for backing up WAL segments: each time a segment is complete it can run anything you want (e.g. rsync this segment to another machine); an important building block for backups that survive a complete crash of the DB (see the sketch below)
    • synchronous_commit - on: COMMIT does not return until the WAL flush is done; off: COMMIT returns as soon as the flush is queued --> with on, a commit is not acknowledged until it is safely in a WAL segment; with off you may lose the most recent commits after a crash because they may not have reached a WAL segment yet
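    • example (sketch; host name and archive path are placeholders):

      wal_level = archive                     # or hot_standby if streaming replication is also wanted
      archive_mode = on
      archive_command = 'rsync -a %p backuphost:/wal_archive/%f'   # %p = path to the segment, %f = its file name
      synchronous_commit = on                 # off only if losing the last few commits in a crash is acceptable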
  • Backup and Recovery

    • always have current backups of everything
    • pg_dump: logical snapshot of the database; does not lock the DB; low load on the DB; does some compression
    • use the custom format (pg_dump -Fc)
    • pg_restore: restores a pg_dump; quite slow (3-4x the time it took to write the dump, mostly because it rebuilds the indexes)
    • restore with --jobs, which sets how many parallel jobs are started; a good value is the number of cores plus one (see the sketch after this list)
    • alternative: Point-in-Time Recovery (PITR) backup / recovery
    • select pg_start_backup(...) - make sure WAL files are suitable for backup
    • rsync WAL files somewhere (via archive_command)
    • select pg_stop_backup() - marks the end of the base backup
    • make sure all WAL segments are in place
    • scripts to support this: wal-e (highly recommended, backups in Amazon S3)
    • restore: copy the segments somewhere on the machine, set up recovery.conf (see the sketch after this list), start PostgreSQL and let it recover (overall much faster than pg_restore)
    • simple recovery.conf example from MIT or this one
    • streaming replication possible and useful if one server goes down (more in the video between 1:06:00 and 1:21:00)
    • always have a disaster recovery strategy: geographical dispersion of servers, WAL archiving
    • pg_basebackup - take a base backup of a PostgreSQL cluster; mostly used for replication but also useful for backup
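    • example dump/restore (sketch; database name, file name and job count are made up - the rule of thumb above is cores + 1):

      pg_dump -Fc -f appdb.dump appdb              # custom format, compressed
      createdb appdb                               # on the restore side: the target DB must exist
      pg_restore --jobs=9 -d appdb appdb.dump      # e.g. 8 cores + 1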
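    • example recovery.conf for PITR (sketch; the archive path and timestamp are placeholders):

      restore_command = 'cp /wal_archive/%f %p'         # how to fetch archived WAL segments
      recovery_target_time = '2014-04-12 08:00:00'      # optional: recover up to this point in time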
  • Transactions, MVCC, VACUUM

    • no half-done transactions possible, no transaction can see progress of other transaction
    • everything runs inside a transaction, even schema changes
    • this means everything is rollback-able (as long as the transaction isn't finished); see the sketch after this list
    • also means: open transactions hold resources --> keep transactions short and to the point (avoid sitting idle in a transaction)
    • dirty reads (reading data that is in the middle of being updated) are prevented by Multi-Version Concurrency Control (MVCC), which keeps multiple versions of the data and gives each transaction its own consistent snapshot of the database
    • every snapshot is equally 'valid'; there is no 'real'/'original' version. As a consequence, readers never block readers, readers never block writers, and writers never block readers; only two writers touching the same rows block each other
    • VACUUM removes dead tuples/row versions - versions that are still physically present but no longer visible to any transaction
    • autovacuum runs at all times and will do this automatically
    • the planner needs statistics to plan queries effectively; ANALYZE creates them (autovacuum normally keeps them current), but it may have to be run manually after a big restore
    • if autovacuum seems not to be running because the DB keeps getting bloated, it probably is running; 20-50% dead space is quite usual; only worry if the bloat never stops growing
    • use select * from pg_stat_user_tables to check when the last vacuum or analyze ran; tables with lots of transactions and an old vacuum/analyze timestamp may be a problem (see the sketch after this list)
    • are there long-running/idle transactions? Is a program locking the table? Very many tables (10,000+)?
    • solutions: reduce the autovacuum sleep time, increase the number of autovacuum workers, do a manual vacuum, fix the locking sessions
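    • example of transactional DDL (sketch; the table name is a placeholder):

      BEGIN;
      ALTER TABLE some_table ADD COLUMN note text;   -- even schema changes run inside the transaction
      ROLLBACK;                                      -- the new column is gone again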
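    • example bloat/vacuum check (sketch; the table name is a placeholder):

      SELECT relname, n_dead_tup, last_vacuum, last_autovacuum, last_analyze
        FROM pg_stat_user_tables
       ORDER BY n_dead_tup DESC;                     -- which tables have dead rows and stale vacuum timestamps?
      VACUUM ANALYZE some_bloated_table;             -- manual vacuum + fresh planner statistics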
  • Schema Design

    • normalization is important but has limits (if someone asks "is this the 3rd normal form?", give them the look)
    • pick a naming scheme and stick with it
    • PostgreSQL is quite fast with joins
    • use the rich typing system of PostgreSQL
    • use constraints, they're cheap and fast
    • use TIMESTAMPTZ instead of TIMESTAMP: TIMESTAMPTZ stores an absolute point in time (normalized to UTC), while plain TIMESTAMP just stores whatever local wall-clock time was inserted, with no time zone attached (see the sketch below)
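    • example (sketch; table and column names are invented):

      CREATE TABLE payment (
          id          bigserial PRIMARY KEY,
          amount      numeric(10,2) NOT NULL CHECK (amount > 0),   -- constraints are cheap and fast
          email       text NOT NULL,
          created_at  timestamptz NOT NULL DEFAULT now()           -- TIMESTAMPTZ, not TIMESTAMP
      );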
  • Indexes

    • standard PostgreSQL index is a B-Tree
    • you can put a B-Tree index on a single column, on multiple columns (composite), or on functions/expressions (see the sketch after this list)
    • useful when the query: uses one of the comparison operators, selects at most 10-15% of the rows, and is run frequently
    • indexes can considerably accelerate joins and GROUP BY
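    • example (sketch; the table/column names continue the invented schema above):

      CREATE INDEX payment_email_idx ON payment (email);                        -- single column
      CREATE INDEX payment_created_amount_idx ON payment (created_at, amount);  -- composite
      CREATE INDEX payment_lower_email_idx ON payment (lower(email));           -- expression/function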

Resume at 2:05:00 (to come: more indexes, optimization, updating & practical stuff)