Postgres 19 Release Notes

Posted by Bruce Momjian in EDB on 2026-04-15 at 21:15

I have just completed the first draft of the Postgres 19 release notes. It includes little developer community feedback and still needs more XML markup and links. This year I have created a wiki page explaining the process I use.

The release note feature count is 212, which includes a strong list of administrative and monitoring features. Postgres 19 Beta 1 should be released in a few months. The final release is planned for September/October of this year.

Waiting for PostgreSQL 19 – Online enabling and disabling of data checksums

Posted by Hubert 'depesz' Lubaczewski on 2026-04-15 at 18:05

On 3rd of April 2026, Daniel Gustafsson committed patch: Online enabling and disabling of data checksums. This allows data checksums to be enabled, or disabled, in a running cluster without restricting access to the cluster during processing. Data checksums could prior to this only be enabled during initdb or when the cluster is …

Introducing Xata OSS: Postgres platform with branching, now Apache 2.0

Posted by Tudor Golubenco in Xata on 2026-04-15 at 12:30

Xata core is now available as open source under the Apache 2 license. It adds copy-on-write branching and scale-to-zero compute to Postgres.

pgEdge Vectorizer and RAG Server: Bringing Semantic Search to PostgreSQL (Part 2)

Posted by Ahsan Hadi in pgEdge on 2026-04-15 at 06:29

In my previous blog, I walked through setting up the pgEdge MCP Server with a distributed PostgreSQL cluster, and connecting Claude to live database data through natural language. In this blog I want to look at a different problem: how do you build AI-powered search over your own content, without adding a separate vector database to your infrastructure?

This is where the pgEdge Vectorizer and RAG Server come in. Together, they give you a complete open-source Retrieval-Augmented Generation (RAG) pipeline that runs entirely inside PostgreSQL. In this blog, I'll explain what each component does, how they work together, and walk through working examples that you can follow on your own PostgreSQL instance.

I am following the same pattern in this blog as in my other blogs. The goal is to explain each component and then provide real-world working examples so that the reader can better understand these concepts.

Please note: I am using my Rocky Linux VM for this installation and testing, and the Ollama embedding provider (installed on my VM) to generate the embeddings.

Background: The Problem With Keeping Vector Search In Sync

Most teams building AI-powered search hit the same wall. You set up a vector search pipeline, load your documents, generate embeddings, and everything works. Then someone updates a document or adds a new one - suddenly you need a process to detect the change, re-chunk the content, regenerate the embeddings, and update the index. Teams typically solve this with custom scripts, message queues, or external orchestration tools - all of which need to be built, maintained, and monitored separately from the database.

The pgEdge Vectorizer eliminates that problem entirely. It runs as a PostgreSQL background worker. Once you enable Vectorizer on a table, it monitors the source data through triggers, chunks and embeds new or modified rows automatically, and keeps the search index in sync without any external orchestration. The same transactional guarantees that PostgreSQL gives y[...]

Postgres performance regression: are we there yet?

Posted by Lætitia AVROT on 2026-04-15 at 00:00

Every year, PostgreSQL gets faster. Researchers benchmarking the optimizer from version 8 through 16 found an average 15% performance improvement per major release. That’s a decade of consistent, measurable progress. The project has been doing this since 1996. So when a headline claimed Linux 7.0 just halved PostgreSQL throughput, DBAs, Sys Admins, and DevOps started panicking (in particular, those working with Ubuntu 26.04 LTS, which plans to ship Linux kernel 7.

AI-Ready PostgreSQL 18 Is Out: Why AI Applications Win or Lose at the Seams

Posted by Vibhor Kumar on 2026-04-14 at 23:29

Most AI projects do not fail because the model is weak. They fail because the seams around the model break under real-world constraints such as data truth, governance, and production reality.

If you have shipped anything beyond a demo, you have seen the pattern. The embeddings look plausible, the chatbot sounds confident, and the prototype “works.” Then a user asks a normal question like: “Show me something like a leather jacket but lighter, under $150, and available right now.” If the system cannot enforce current pricing, availability reality, and access rules, the experience becomes untrustworthy. When trust breaks, architecture often splinters into extra systems, sync pipelines, and brittle glue code.

That is the motivation behind AI-Ready PostgreSQL 18: Building Intelligent Data Systems with Transactions, Analytics, and Vectors, which I coauthored with Marc Linster, with a foreword by Ed Boyajian. This book is built as a field guide. It includes working schemas, scripts, and production patterns—not just concepts—so builders can ship semantic search, recommendations, and assistants without splitting truth across systems.

This post is not a sales pitch. This post explains the core idea, shows a minimal hands-on demo using the open-source scripts, and gives you a practical checklist for what “AI-ready” means in production.

What you will get from this post

By the end of this post, you will understand three things clearly:

Why semantic search fails in production when it is not paired with relational truth.
What the “hybrid pattern” looks like: semantic candidates + SQL constraints in one flow.
How to try a working demo that returns both evidence rows and an LLM-generated explanation grounded in those rows.
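The "hybrid pattern" in the list above can be sketched in a few lines. This is a pure-Python stand-in rather than the book's scripts or a real SQL query: the product rows, the toy three-dimensional embeddings, and the `hybrid_search` helper are all hypothetical. It only illustrates the idea of applying relational constraints (price, availability) and semantic ranking in one flow.

```python
import math

# Hypothetical catalog rows: relational facts plus a toy 3-d embedding each.
PRODUCTS = [
    {"name": "leather jacket",      "price": 320.0, "in_stock": True,  "vec": [0.90, 0.10, 0.00]},
    {"name": "faux-leather bomber", "price": 120.0, "in_stock": True,  "vec": [0.80, 0.20, 0.10]},
    {"name": "suede jacket",        "price": 140.0, "in_stock": False, "vec": [0.85, 0.15, 0.05]},
    {"name": "rain poncho",         "price": 40.0,  "in_stock": True,  "vec": [0.10, 0.10, 0.90]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, max_price, k=3):
    """Keep only rows that satisfy the hard constraints, then rank by similarity."""
    # Operational truth: what is in stock and within budget right now.
    valid = [p for p in PRODUCTS if p["in_stock"] and p["price"] <= max_price]
    # Semantic recall: order the surviving rows by closeness to the query.
    ranked = sorted(valid, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return [p["name"] for p in ranked[:k]]


# Query vector standing in for "something like a leather jacket".
results = hybrid_search([0.90, 0.10, 0.00], max_price=150.0)
print(results)
```

The most similar item overall (the leather jacket itself) is excluded because it is over budget, and the suede jacket is excluded because it is out of stock; similarity alone would have returned both.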

TL;DR

AI systems succeed when meaning and truth stay close.

Vectors provide semantic recall (“what feels similar”). SQL enforces operational truth (“what is valid, current, allowed, and sellable”). When you keep embeddings in PostgreSQL with pgvec

[...]

Owning the pipe: physical replication, cloud neutrality, and the escape from DBaaS lock-in

Posted by Gabriele Bartolini in EDB on 2026-04-14 at 00:32

This article examines how managed database services deliberately suppress access to the physical replication stream, turning operational convenience into permanent lock-in. It makes the case for a cloud-neutral stack — PostgreSQL, Kubernetes, and CloudNativePG — as the only architecture that returns full operational sovereignty to the organisation that owns the data.

pg_clickhouse 0.2.0

Posted by David Wheeler on 2026-04-13 at 22:22

In response to a generous corpus of real-world user feedback, we’ve been hard at work the past week adding a slew of updates to pg_clickhouse, the query interface for ClickHouse from Postgres. As usual, we focused on improving pushdown, especially for various date and time, array, and regular expression functions.

Regular expressions prove to be a particular challenge, because while Postgres supports POSIX Regular Expressions, ClickHouse relies on RE2. For simple regular expressions that no doubt make up a huge number of use cases, the differences matter little or not at all. But these two engines take quite different approaches to regular expression evaluation, so issues will come up.

To address this, the new regular expression pushdown code examines the flags passed to the Postgres regular expression functions and refuses to push down in the presence of incompatible flags. It will push down compatible flags, though it takes pains to also pass (?-s) to disable the s flag, because ClickHouse enables s by default, contrary to the expectations of the Postgres regular expression user.

pg_clickhouse does not (yet?) examine the flags embedded in the regular expression, but v0.2.0 now provides the pg_clickhouse.pushdown_regex setting, which can disable regular expression pushdown:

SET pg_clickhouse.pushdown_regex = 'false';

My colleague Philip Dubé has also started work embedding ClickHouse-compatible regular expression functions that use re2 directly, to provide more options soon — not to mention a standalone extension with just those functions.

As with all pg_clickhouse releases to date, v0.2.0 does not break compatibility with previous versions at all: once the new library has been installed and reloaded, existing v0.1 releases get all the benefits. There is, however, a new function, pgch_version(), which requires an upgrade to use:

try=# ALTER EXTENSION pg_clickhouse UPDATE TO '0.2';
ALTER EXTENS

[...]

Contributions for week 14, 2026

Posted by Cornelia Biacsics in postgres-contrib.org on 2026-04-13 at 08:19

The Toulouse PostgreSQL User Group met on April 7, 2026, organized by:

Geoffrey Coulaud
Xavier SIMON
Jean-Christophe Arnu

Speakers:

Mohamed Nossirat
Jean-Christophe Arnu
Pierre Fersing

Claire Giordano and Aaron Wislang hosted and published a new podcast episode on April 10, 2026 "How I went from Oracle to Postgres (with a big NoSQL detour) with Gwen Shapira" from the Talking Postgres series.

Community Blog Posts:

Pat Wright about PGDay Paris including his video about the same conference

Understanding PostgreSQL Wait Events

Posted by Richard Yen on 2026-04-13 at 08:00

Introduction

One of the most useful debugging tools in modern PostgreSQL is the wait event system. When a query slows down or a database becomes CPU bound, a natural question is: “What are sessions actually waiting on?” Postgres exposes this information through the pg_stat_activity view via two columns:

wait_event_type
wait_event

These fields reveal what the backend process is blocked on at a given moment. Among the different wait types, one category tends to cause confusion:

LWLock

If you’ve ever seen dashboards full of LWLock waits, you’re not alone in wondering what they mean and whether they’re a problem.

Where Wait Events Appear

The easiest way to see wait events is:

SELECT pid,
       wait_event_type,
       wait_event,
       state,
       query
FROM pg_stat_activity
WHERE state != 'idle';

Example output might look like:

pid	wait_event_type	wait_event	state
1234	Lock	transactionid	active
5678	LWLock	buffer_content	active
9012	IO	DataFileRead	active

Each category represents a different kind of wait. Common types include:

Lock
LWLock
IO
Client
IPC
Activity

Among these, LWLock waits often appear during performance incidents.
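During an incident it is often more useful to count sessions per wait type than to read individual rows. The sketch below tallies the three sample rows from the example output above; the `summarize` helper is hypothetical, and in practice the rows would come from a database driver rather than a hard-coded list.

```python
from collections import Counter

# Sample rows copied from the example output:
# (pid, wait_event_type, wait_event, state)
rows = [
    (1234, "Lock", "transactionid", "active"),
    (5678, "LWLock", "buffer_content", "active"),
    (9012, "IO", "DataFileRead", "active"),
]


def summarize(rows):
    """Count active sessions per wait_event_type."""
    return Counter(wait_type for _pid, wait_type, _event, _state in rows)


summary = summarize(rows)
for wait_type, sessions in summary.most_common():
    print(f"{wait_type}: {sessions}")
```

Sampling this repeatedly over time, rather than once, gives a rough picture of which wait types dominate.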

What Is an LWLock?

LWLock stands for Lightweight Lock. These are internal Postgres synchronization primitives used to coordinate access to shared memory structures. Note that they are NOT related to lock contention on tables, or deadlocking when performing DML. LWLocks protect important internal structures such as:

shared buffers
WAL buffers
lock tables
SLRU caches

Because

[...]

Zero autovacuum_cost_delay, Write Storms, and You

Posted by Jeremy Schneider on 2026-04-13 at 05:10

A few days ago, Shaun Thomas published an article over on the pgEdge blog called [Checkpoints, Write Storms, and You]. Sadly a lot of corporate blogs don’t have comment functionality anymore. I left a few comments [on LinkedIn], but overall let me say this article is a great read, and I’m always happy to see someone dive into an important and overlooked topic, present a good technical description, and include real test results to illustrate the details.

I don’t have any reproducible real test results today. But I have a good story and a little real data.

Vacuum tuning in Postgres is considered by some to be a dark art. Few can confidently say: “Yes I know the right value for autovacuum_cost_delay.” The documentation gives guidance, blog posts give opinions, and sooner or later, you start thinking, “Surely I can just set this one to zero — what’s the worst that could happen?”

My own story starts with some unexplained, intermittent application performance problems. We were doing some internal benchmarking to see just how far we could push a particular stack and see how much throughput a specific application could get. Everything would hum along fine until, suddenly, latency would spike across the board and the application would choke, causing backlogs and work queues to blow up throughout the system.

Where do you start when you have application performance problems? Wait Events and Top SQL – always! I’m far from the first person to evangelize this idea; I’ve said many times that wait events and top SQL are almost always the fastest way to discover where the bottlenecks are when you see unexpected performance problems. My [2024 SCaLE talk about wait events] gets into this.

So naturally I dug into the wait events and top SQL – and I noticed these slowdowns lined up perfectly with spikes in COMMIT statements on IPC:SyncRep waits. This wait event is not well understood. Last October I published an article [Explaining IPC:SyncRep – Postgres Sync Replication is Not Actually Sync Replication] with more e

[...]

504 Extensions: Expand the PostgreSQL Landscape

Posted by Ruohang Feng on 2026-04-13 at 00:00

One GitHub issue turned into an extension sprint. 32 new additions, 504 in total, say a lot about where PostgreSQL is headed.

column_encrypt v4.0: A Simpler, Safer Model for Column-Level Encryption in PostgreSQL

Posted by Vibhor Kumar on 2026-04-12 at 20:47

There is a point in every security tool’s life where adding one more feature is less important than removing one more obstacle.

That is what makes column_encrypt v4.0 interesting.

This release is not trying to be louder. It is trying to be cleaner. It takes the capabilities built across earlier versions of the extension and distills them into a smaller, more coherent, more production-friendly interface. The headline changes say a great deal: all management functions now live under the encrypt schema, the old multi-role model has been replaced by a single column_encrypt_user role, automatic log masking removes a manual operational step, and the extension tightens its security posture with safer SECURITY DEFINER behavior and schema-qualified object handling. In other words, v4.0 is a simplification release in the best sense of the phrase: less ceremony, fewer sharp edges, stronger defaults.

At its core, column_encrypt remains focused on a very practical problem: how do you protect sensitive fields inside PostgreSQL without forcing every application team to reinvent encryption logic in application code? The extension provides transparent column-level encryption through custom data types such as encrypted_text and encrypted_bytea, while supporting wrapped key storage, session-scoped key loading, searchable blind indexes, verification, and key rotation. Those foundations were built over earlier releases, including the two-tier KEK/DEK model from v2.0 and multi-version key lifecycle support from v3.0. What v4.0 does is make that model easier to understand, easier to operate, and easier to trust in production.

Why column-level encryption matters

For many teams, the hardest part of data security is not agreeing that it matters. That argument ended a long time ago. The hard part is implementation.

A healthcare platform needs to protect patient identifiers, diagnoses, insurance records, and clinical notes. A financial platform needs to secure account identifiers, tax records, and payment met

[...]

Waiting for Postgres 19: Reduced timing overhead for EXPLAIN ANALYZE with RDTSC

Posted by Lukas Fittl on 2026-04-11 at 12:00

In today’s E121 of “5mins of Postgres” we're talking about the upcoming Postgres 19 release, and how a change in the Postgres instrumentation handling reduces overhead of timing measurements in EXPLAIN ANALYZE using the RDTSC instruction, and why this will allow turning it on for more workloads. We dive into the recently committed change that I (Lukas) authored together with Andres Freund and David Geier. See the full transcript with examples below.

Checkpoints, Write Storms, and You

Posted by Shaun Thomas in pgEdge on 2026-04-10 at 06:06

Every database has to reconcile two uncomfortable truths: memory is fast but volatile, and disk is slow but durable. Postgres handles this tension through its Write-Ahead Log (WAL), which records every change before it happens. But the WAL can't grow forever. At some point, Postgres needs to flush all those accumulated dirty pages to disk and declare a clean starting point. That process is called a checkpoint, and when it goes wrong, it can bring throughput to its knees.

A Bit About Checkpoints

Under normal operation, Postgres is remarkably polite about checkpoints. The checkpoint_timeout parameter (default 5 minutes) tells Postgres how often to perform a scheduled checkpoint, and checkpoint_completion_target (default 0.9) tells it to spread the resulting writes over 90% of that interval. So a checkpoint timeout of 5 minutes means Postgres trickles dirty pages to disk over roughly 4.5 minutes, keeping IO impact to a minimum.

This only applies to timed checkpoint behavior.

The max_wal_size parameter sets a soft limit on how much WAL can accumulate between checkpoints. When the WAL approaches that threshold (1GB by default), Postgres doesn't wait for the next scheduled checkpoint. Instead, it forces one immediately.

These forced (or requested) checkpoints do not honor that pacing. Postgres needs to reclaim WAL space, so it flushes every dirty buffer to disk as fast as the IO subsystem will allow. On a busy system with a large pool full of modified pages, this can completely saturate disk IO in seconds.

It's like trying to drink from a firehose.
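The arithmetic behind the "roughly 4.5 minutes" figure can be sketched as follows. This assumes the two settings described above are PostgreSQL's checkpoint_timeout and checkpoint_completion_target, whose documented defaults match the quoted values. The 1 GiB of dirty data is a hypothetical figure chosen only to show the resulting write rate.

```python
def checkpoint_spread_seconds(timeout_s: float, completion_target: float) -> float:
    """Window over which a timed checkpoint spreads its writes."""
    return timeout_s * completion_target


def average_write_rate_mib(dirty_bytes: int, timeout_s: float, completion_target: float) -> float:
    """Average MiB/s needed to flush dirty_bytes within the spread window."""
    spread = checkpoint_spread_seconds(timeout_s, completion_target)
    return dirty_bytes / spread / (1024 ** 2)


# Defaults quoted in the text: 5 minutes and 0.9.
spread = checkpoint_spread_seconds(300, 0.9)
print(f"spread window: {spread:.0f} s ({spread / 60:.1f} min)")

# Hypothetical: 1 GiB of dirty pages to flush during one timed checkpoint.
rate = average_write_rate_mib(1024 ** 3, 300, 0.9)
print(f"average write rate: {rate:.2f} MiB/s")
```

Spread over the full window, the write rate stays small. A forced checkpoint has no such window, so the same volume lands on the disk as fast as the IO subsystem accepts it.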

Rubber Meets the Road

To see this in action, we set up a modest test environment:

Hypervisor: Proxmox
CPU: 4x AMD EPYC 9454 cores
RAM: 4GB
DB Storage: 100GB @ 2,000 IOPS
WAL Storage: 100GB @ 2,000 IOPS
OS: Debian 12 Bookworm

We initialized the database with pgbench at a scale factor of 800, producing roughly 12GB of data (3x available RAM to reduce cache hits). We also followed the traditional advice of setting shared_buffers to 25% of RAM, or 1GB in this case. All other set[...]

Waiting for PostgreSQL 19 – new pg_get_*_ddl() functions

Posted by Hubert 'depesz' Lubaczewski on 2026-04-09 at 16:37

On 5th of April 2026, Andrew Dunstan committed patch: Add pg_get_database_ddl() function. Add a new SQL-callable function that returns the DDL statements needed to recreate a database. It takes a regdatabase argument and an optional VARIADIC text argument for options that are specified as alternating name/value pairs. The following options are supported: pretty (boolean) …

The 1 GB Limit That Breaks pg_prewarm at Scale

Posted by warda bibi in Stormatics on 2026-04-09 at 06:56

Recently, we encountered a production incident where PostgreSQL 16.8 became unstable, preventing the application from establishing database connections. The same behavior was independently reproduced in a separate test environment, ruling out infrastructure and configuration issues. Further investigation identified the pg_prewarm extension as the source of the problem.

This blog post breaks down the failure, the underlying constraint, why it manifests only under specific configurations, and the corresponding short-term mitigation and long-term fix.

What pg_prewarm Does

Every time PostgreSQL restarts, its shared buffer cache (the region of RAM it uses to hold frequently accessed data pages from disk) starts completely cold. Every query that touches data must go to disk first. On large production systems, this cold-start penalty can be severe, sometimes taking hours before the cache naturally warms through organic traffic.

pg_prewarm solves this in two ways. First, it gives you manual control so you can explicitly warm specific tables or indexes on demand, useful before a heavy batch job or a known query workload. Second, it ships with an autoprewarm mode that, when enabled, continuously tracks which pages are resident in shared buffers and automatically replays that list after a restart with no manual intervention required. For high-traffic systems with large shared_buffers, this is operationally critical.

The BUG

In order for autoprewarm to dump the list of cached pages, PostgreSQL must build an array in memory containing one entry per page currently in shared buffers. Each entry (a BlockInfoRecord) is 20 bytes.

palloc(NBuffers * sizeof(BlockInfoRecord))

NBuffers is the number of shared buffer slots, derived directly from the shared_buffers setting. PostgreSQL’s standard memory allocator, palloc, enforces a hard ceiling of 1 GB on any single allocation. Any palloc() call requesting more

[...]
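Using only the numbers quoted in the pg_prewarm post above (20 bytes per BlockInfoRecord, a 1 GB ceiling per palloc call), a rough sketch can estimate the shared_buffers size at which the allocation no longer fits. Two assumptions are not from the post: the default 8 kB block size, and the exact ceiling of 0x3FFFFFFF bytes, which is PostgreSQL's MaxAllocSize constant. The helper names are hypothetical.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # assumption: PostgreSQL's MaxAllocSize (1 GB minus one byte)
RECORD_SIZE = 20             # bytes per BlockInfoRecord, as stated in the post
BLOCK_SIZE = 8192            # assumption: default 8 kB block size
GIB = 1024 ** 3


def autoprewarm_alloc_bytes(shared_buffers_bytes: int) -> int:
    """Size of the one-entry-per-buffer array for a given shared_buffers."""
    nbuffers = shared_buffers_bytes // BLOCK_SIZE
    return nbuffers * RECORD_SIZE


def exceeds_palloc_limit(shared_buffers_bytes: int) -> bool:
    """True when the array would not fit in a single standard allocation."""
    return autoprewarm_alloc_bytes(shared_buffers_bytes) > MAX_ALLOC_SIZE


# Largest buffer count whose array still fits, and the matching shared_buffers.
max_nbuffers = MAX_ALLOC_SIZE // RECORD_SIZE
threshold_gib = max_nbuffers * BLOCK_SIZE / GIB
print(f"array fits up to about {threshold_gib:.1f} GiB of shared_buffers")

for size_gib in (64, 400, 410, 512):
    print(size_gib, "GiB ->", "too large" if exceeds_palloc_limit(size_gib * GIB) else "fits")
```

Under these assumptions the threshold sits a little below 410 GiB, which would explain why the failure appears only on very large configurations.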

pgcollection 2.0: Integer Keys, Range Deletes, and Oracle Parity

Posted by Jim Mlodgenski on 2026-04-09 at 00:01

In my first post about pgcollection, I introduced the collection type to address the challenge of migrating Oracle associative arrays keyed by strings to PostgreSQL. For integer-keyed associative arrays, I noted that native PostgreSQL arrays work well enough for simple cases. That holds true until the keys are sparse.

Consider this Oracle pattern:

DECLARE
  TYPE cache_t IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
  cache  cache_t;
BEGIN
  cache(1)       := 'first';
  cache(1000000) := 'millionth';
  DBMS_OUTPUT.PUT_LINE('Count: ' || cache.COUNT);  -- 2
END;

The equivalent attempt with a PostgreSQL array produces a different result:

DO $$
DECLARE
  a text[];
BEGIN
  a[1]       := 'first';
  a[1000000] := 'millionth';
  RAISE NOTICE 'Length: %', array_length(a, 1);  -- 1000000
END $$;

PostgreSQL fills positions 2 through 999,999 with NULLs. You asked for two entries and got a million-element array. Worse, it is impossible to distinguish between a key that was explicitly set to NULL and one that was never set at all. pgcollection now avoids both problems.
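The two problems can be illustrated outside the database. The sketch below is a Python analogy, not pgcollection code: a list stands in for the dense PostgreSQL array and a dict stands in for a sparse, key-addressed collection. The `dense_assign` helper is hypothetical.

```python
def dense_assign(entries):
    """Mimic array semantics: allocate every position up to the largest key."""
    size = max(entries)
    arr = [None] * size              # positions 1..size map to indexes 0..size-1
    for key, value in entries.items():
        arr[key - 1] = value
    return arr


entries = {1: "first", 1_000_000: "millionth"}

dense = dense_assign(entries)        # one slot per position, mostly None
sparse = dict(entries)               # only the keys that were actually set

print("dense length:", len(dense))
print("sparse count:", len(sparse))

# With the dense form, position 500 reads as None whether or not it was ever set.
print("dense[500] is None:", dense[500 - 1] is None)
# With the sparse form, membership answers the question directly.
print("key 500 exists:", 500 in sparse)
```

The dense form pays for a million slots to hold two values and cannot tell an unset key from one set to NULL; the sparse form avoids both issues.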

icollection

icollection is a 64-bit integer-keyed associative array that stores only the keys you set. The same pattern from above works as expected:

DO $$
DECLARE
  cache  icollection('text');
BEGIN
  cache[1]       := 'first';
  cache[1000000] := 'millionth';

  RAISE NOTICE 'Count: %', count(cache);  -- 2
  RAISE NOTICE 'Value: %', cache[1000000];
  RAISE NOTICE 'Key 500 exists: %', exist(cache, 500);  -- false
END $$;

icollection supports the same full set of operations as collection — subscript access, forward and reverse iteration, sorting, set-returning functions, and JSON casting — with bigint keys instead of text. It maps directly to Oracle’s TABLE OF ... INDEX BY PLS_INTEGER, with the keys widened to 64-bit so overflow is not a concern during migration.

The exist() function resolves the NULL ambiguity problem directly. With a PostgreSQL array, a[2] returns NULL whether the key was set to NULL or never set. With icollecti

[...]

AI at the Edge, Truth in Postgres

Posted by Vibhor Kumar on 2026-04-08 at 19:54

A practical blueprint for secure, private, high-performance AI systems

Edge AI is having its inevitable moment. Not because the cloud is going away, but because reality keeps interrupting theory. Networks drop. Latency matters. Privacy rules get sharper teeth. Regulators ask harder questions. And in that world, the winning architecture is rarely the one with the flashiest model. It is the one that can still make the right decision when the link is weak, the clock is drifting, and the audit trail needs to hold up in daylight. As of April 2026, PostgreSQL 18 is the current major release, with 18.3 already out, and the surrounding governance landscape has moved too: the EU AI Act is now in phased application, and its broader 2026 obligations are close enough that “we’ll add governance later” is no longer a serious sentence.

The core argument of this series still holds, and I would state it even more strongly now: at the edge, AI can be probabilistic, but your system of record cannot be. That is why PostgreSQL matters here. It is not just a database in this pattern. It is the local ledger, the policy boundary, the coordination plane, and often the simplest place to make trust real. PostgreSQL 18 strengthened that story with asynchronous I/O, OAuth authentication, continued row-level security capabilities, and ongoing logical replication improvements; meanwhile pgvector continues to make hybrid relational-plus-vector patterns more natural inside the same operational envelope.

Edge is not a location. It is a latency budget and a failure budget.

A lot of edge architecture still gets described as geography. Factory floor. Retail store. Branch office. Vehicle. Hospital wing. That is useful, but incomplete. Edge is really the place where your acceptable latency, privacy boundary, and resilience needs collide. If a decision must happen in tens of milliseconds, if the raw data should not leave the site, or if the system must keep working through intermittent connectivity, then the architecture ha

[...]

Contributions for week 13, 2026

Posted by Cornelia Biacsics in postgres-contrib.org on 2026-04-07 at 08:58

The Prague PostgreSQL Meetup met on March 30, 2026, organized by Gulcin Yildirim Jelinek and Mayur B.

Speakers:

Radim Marek
Mayur B.

Community Blog Posts:

Pat Wright about Nordic Pg Day 2026

Community Videos:

Pavlo Golub about SCALE 23x

Using the pgEdge MCP Server with a Distributed PostgreSQL Cluster

Posted by Ahsan Hadi in pgEdge on 2026-04-07 at 06:36

I recently wrapped up my blog series covering the exciting new features in PostgreSQL 18 - from Asynchronous I/O and Skip Scan to the powerful RETURNING clause enhancements. If you haven't had a chance to read them yet, head over to pgedge.com/blog where you'll also find some great content from my colleagues on how PostgreSQL is embracing the AI revolution.

Speaking of the AI revolution - in this blog I want to shift gears and dive into something I've been genuinely excited to explore: using the pgEdge MCP Server with a distributed PostgreSQL cluster. I'll explore one of those AI tools firsthand - the pgEdge MCP Server - and specifically what it looks like to connect it to a true distributed PostgreSQL cluster.

The Model Context Protocol (MCP) has quickly become the standard way to connect Large Language Models (LLMs) to external data sources and tools. With the release of the pgEdge Agentic AI Toolkit, PostgreSQL developers and DBAs can now connect AI assistants like Claude directly to their databases through the pgEdge Postgres MCP Server.

In this blog, I'll focus specifically on what makes using the MCP Server (used with a pgEdge Distributed PostgreSQL cluster) interesting and different from a single-node setup. I'll walk through the setup, and demonstrate practical examples where the MCP Server combined with a distributed cluster becomes a powerful tool for DBAs and developers alike.

A Quick Overview of the pgEdge MCP Server

The pgEdge Postgres MCP Server is part of the pgEdge Agentic AI Toolkit. It gives AI assistants secure, structured access to your PostgreSQL database - not just raw query execution, but deep schema introspection, performance metrics, and the ability to reason about your data model. Once connected, Claude (or other LLMs) can understand your schema, identify slow queries, inspect index usage, and help you write optimized SQL - all through natural language.

The following functionality sets the pgEdge MCP Server apart from other Postgres MCP servers:

Full schema introspection
— pr

[...]

Schemas in PostgreSQL and Oracle: what is the difference?

Posted by Laurenz Albe in Cybertec on 2026-04-07 at 05:28

Two engineers talking. One says, "... then you log into the database as schema SYSTEM ..." and the other one thinks, "We shouldn't have tried to build that tower in Babylon."
© Laurenz Albe 2026

Recently, somebody asked me for a reference to a blog or other resource that describes how schemas work differently in Oracle. I didn't have such a reference, so I'm writing this article. But rather than just describing Oracle schemas to a reader who knows PostgreSQL, I'll try to present the topic in a way that helps Oracle users understand schemas in PostgreSQL as well. Since I already wrote about the difference between database transactions in Oracle and PostgreSQL, perhaps this can turn into a series!

The common ground: what is a schema

Schemas are defined by the SQL standard, so it is no surprise that there are a lot of similarities. Essentially, a schema is the same thing in Oracle and PostgreSQL: a named collection of database objects. Database objects that reside in a schema are called schema objects. Schemas have nothing to do with object storage; they give the database logical structure, very similar to directories in the file system. A schema is also a namespace: there cannot be any two tables with the same name in the same schema.

In the rest of the article, I will explore the differences:

the relationship between users and schemas
the scope of a schema's namespace
schemas and privileges (permissions)
schemas and object ownership
the default schema for unqualified object names
system schemas

Users and schemas

This is probably the topic where the differences between both database management systems are most pronounced.

Oracle

Oracle has both database users and schemas, but it enforces a one-to-one correspondence between the two: Creating a user automatically creates a schema with the same name, and there is no other way to create a schema (the standard SQL statement CREATE SCHEMA exists, but doesn't create a schema). Because of that correspondence, many people don't clearly distinguish between “user” and “schema”. It is not unusual to hear an Oracle administrator say “Connect to the database as schema X” or “the tabl

[...]

pg_column_size(): What you see is not what you get

Posted by Lætitia AVROT on 2026-04-07 at 00:00

Thanks to my colleague Ozair, who sent me a JIRA ticket saying “I need to drop that huge column, what are the consequences?” My first question was: how huge? That’s when the rabbit hole opened. It looks simple. It is simple. Just use the administrative function pg_column_size(). Until you have toasted attributes. Then it gets interesting.

A bit of history

pg_column_size() was added in PostgreSQL 8.1 by Mark Kirkwood (commit a9236028).

pg_clickhouse 0.1.10

Posted by David Wheeler on 2026-04-06 at 21:38

Hi, it’s me, back again with another update to pg_clickhouse, the query interface for ClickHouse from Postgres. This release, v0.1.10, maintains binary compatibility with earlier versions but ships a number of significant improvements that increase compatibility of Postgres features with ClickHouse. Highlights include:

Mappings for the JSON and JSONB -> TEXT and ->> TEXT operators, as well as jsonb_extract_path_text() and jsonb_extract_path(), to be pushed down to ClickHouse using its sub-column syntax.
Mappings to push down the Postgres statement_timestamp(), transaction_timestamp(), and clock_timestamp() functions, as well as the Postgres “SQL Value Functions”, including CURRENT_TIMESTAMP, CURRENT_USER, and CURRENT_DATABASE.
And the big one: mappings to push down compatible window functions, including ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE, CUME_DIST, PERCENT_RANK, and MIN/MAX OVER.
Oh yeah, the other big one: added result set streaming to the HTTP driver. Rather than load all the results into memory at once, the driver now streams them. A test loading a 1GB table reduced memory consumption from over 1GB to 73MB peak.

We’ll work up a longer post to show off some of these features in the next week. But in the meantime, git it while it’s hot!

Thanks to my colleagues, Kaushik Iska and Philip Dubé, for the slew of pull requests I waded through this past week!

Don't let your AI touch production

Posted by Radim Marek on 2026-04-06 at 20:24

Not so long ago, the biggest threat to production databases was the developer who claimed it worked on their machine. If you've attended my sessions, you know this is a topic I'm particularly sensitive to.

These days, AI agents are writing your SQL. The models are getting incredibly good at producing plausible code. It looks right, it feels right, and often it passes a cursory glance. But "plausible" isn't a performance metric, and it doesn't care about your execution plan or locking strategy.

AI-generated SQL is syntactically correct, which is the easy part. The hard part is knowing what a statement does to a running system: which locks it takes, how long it holds them, whether it rewrites the table on disk.

The spectacular failures get the headlines. In July 2025, an AI coding agent wiped a production database during a code freeze -- ran destructive commands, panicked, then lied about what it had done.

But the biggest damage is quieter. It's the migration that passes every test, ships through CI and then locks a production table during peak traffic. The query written with random assumptions. Indexes added based on copy/paste from psql. The cumulative effect that builds over time, when nobody is looking.
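As a rough illustration of the "which locks does it take" question, the following Python sketch maps a few DDL statements to the table lock mode PostgreSQL documents for them. This is not the tool or method described in the post: the `RULES` table and the `lock_mode` and `blocks_writes` helpers are hypothetical, the pattern list is deliberately tiny, and it assumes the strongest default for ALTER TABLE even though some ALTER TABLE forms take weaker locks.

```python
import re
from typing import Optional

# Lock modes as listed in the PostgreSQL "Explicit Locking" documentation.
# Illustrative only: order matters, and the list is far from exhaustive.
RULES = [
    (r"^\s*CREATE\s+(UNIQUE\s+)?INDEX\s+CONCURRENTLY\b", "SHARE UPDATE EXCLUSIVE"),
    (r"^\s*CREATE\s+(UNIQUE\s+)?INDEX\b", "SHARE"),
    (r"^\s*(DROP\s+TABLE|TRUNCATE|VACUUM\s+FULL)\b", "ACCESS EXCLUSIVE"),
    (r"^\s*ALTER\s+TABLE\b", "ACCESS EXCLUSIVE"),  # assumed default; some forms are weaker
]

# Modes that conflict with ROW EXCLUSIVE, the lock taken by INSERT/UPDATE/DELETE.
BLOCKS_WRITES = {"SHARE", "ACCESS EXCLUSIVE"}


def lock_mode(statement: str) -> Optional[str]:
    """Return the table lock mode for a statement, or None if no rule matches."""
    for pattern, mode in RULES:
        if re.search(pattern, statement, flags=re.IGNORECASE):
            return mode
    return None


def blocks_writes(statement: str) -> bool:
    """True if the statement's lock would make concurrent writers wait."""
    return lock_mode(statement) in BLOCKS_WRITES


plain = "CREATE INDEX idx_orders_user ON orders (user_id)"
concurrent = "CREATE INDEX CONCURRENTLY idx_orders_user ON orders (user_id)"
```

Here `blocks_writes(plain)` is true while `blocks_writes(concurrent)` is false, which is exactly the kind of difference that passes every test on an empty table and only shows up under production traffic.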

How to give AI agent "eyes"

When you're running Claude Code or any other agentic coding tool, the bottleneck isn't the model's intelligence. It's the fidelity of the environment. Standard AI coding involves the agent guessing column names based on your description, or in better cases parsing a schema.sql file or using a local database with seed data.

None of these give the agent the one thing it actually needs: awareness of your production schema. The table sizes, the indexes, the constraints, the statistics that determine whether a query index-scans in 2ms or sequential-scans for 40 seconds.

The obvious fix is to give it a database connection. Let it query pg_catalog, read table sizes, check existing indexes. This is how Anthropic's reference PostgreSQL MCP server worked, and

[...]

WAL as a Data Distribution Layer

Posted by Richard Yen on 2026-04-06 at 08:00

Introduction

Every so often, I talk to someone working in data analytics who wants access to production data, or at least a snapshot of it. Sometimes, they tell me about their ETL setup, which takes hours to refresh and can be brittle, with a lot of monitoring around it. For them, it works, but it sometimes gets me wondering if they need all that plumbing to get a snapshot of their live dataset. Back at Turnitin, I set up a way to get people access to production data without having to snapshot nightly, and I thought maybe I should share it with people here.

Common Implementations and Their Risks

Typical solutions that we might encounter as we give people a little bit of access to production data:

1. Query the primary

This is generally a bad idea, since you don’t want users getting access to the production primary, lest they make some mistakes or do something to lock up tables that prevents customers from using your apps. Even with a read-only user, large data analytics queries could cause unwanted interference that negatively affects your uptime. This is almost certainly not the way to go.

2. Query a streaming replica

This is better, but doing this is not free. Long-running queries can create replay lag, vacuum conflicts can cancel queries, and I/O contention can affect the primary upstream. It’s safer since users are forced to be read-only, but that still carries risk.

3. Nightly snapshots / rebuilds

Having time-based snapshots and rebuilds is the most common form of getting data out to analysts. ETL queries run at night (or at some other specified regular interval) and provide the information needed to do the necessary work. This works, but it is another piece of software that produces somewhat stale data, depending on how much staleness can be tolerated.

Once Upon a Time, Before Streaming Replication

If you’ve spent any time in Postgres, you already understand streaming replication. Primary sends WAL to standby, and standby replays the WAL stream. All the tutorials tal

[...]

PAX: The Storage Engine Strikes Back

Posted by Lætitia AVROT on 2026-04-06 at 00:00

Thanks to Boris Novikov, who pointed me in the PAX direction in the first place and followed up with many insightful technical discussions. I’m grateful for all his time and the great conversations we’ve had and continue to have. To dive deeper into the mechanics of PAX, I highly recommend checking out my previous post: PAX: The cache performance you’re looking for. PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page.

Using non-ACID storage as a workaround for missing autonomous transactions

Posted by Pavel Stehule on 2026-04-03 at 05:57

When I was younger, the culture war (in my bubble) was about transactional versus non-transactional engines, Postgres versus MySQL (MyISAM). Naturally, I preferred the transactional concept. Data integrity and crash safety are super important. But they are not without costs. That was visible 30 years ago, when MySQL was a super fast database and PostgreSQL a super slow one. Today, on more powerful computers, it is still visible, though less strongly. And we still use non-transactional storage a lot: application logs.

There are some cases when performance wins over consistency, and that can be acceptable. When I thought about non-transactional storage, I got an idea: it could be a great replacement for the missing autonomous transactions. But how to test it? Fortunately, I found csv_tam, a storage implemented by Alexey Gordeev. This storage is mostly a concept with a lot of limits. But the idea is great: CSV is a strong protocol, it is not block based, and it has no row headers, so it can be very hard to support transactions. On the other hand, it is primitive, and without any buffering and with a forced sync after every row, it is mostly crash safe (against a Postgres crash). Sure, it is not as safe as block storage protected by WAL, but it can be safe enough: billions of applications rely on this level of safety for logging today.
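The "no buffering, sync after every row" idea can be sketched outside Postgres in a few lines of Python. This is only an analogy for the durability argument, not how csv_tam is implemented; `append_log_row`, `read_rows`, and the file name are hypothetical.

```python
import csv
import os
import tempfile


def append_log_row(path: str, row: list) -> None:
    """Append one CSV row and force it to disk before returning."""
    # newline="" is what the csv module documentation recommends for its files.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write its cache to stable storage


def read_rows(path: str) -> list:
    """Read back every row that has been appended so far."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


log_path = os.path.join(tempfile.mkdtemp(), "audit_log.csv")
append_log_row(log_path, ["2026-04-03", "step 1 started"])
append_log_row(log_path, ["2026-04-03", "step 1 failed, rolling back"])
```

Because each row is on disk as soon as `append_log_row` returns, nothing that happens afterwards in the caller (an exception, a rollback of other work) can take it back. That is the property that makes such storage interesting as a stand-in for autonomous transactions, at the price of one sync per row.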

I forked it and fixed the build on pg 17+. Now all types are supported, and parallel writes should be safe. It doesn't write to WAL, so these tables cannot be backed up and cannot be replicated. Supporting that could be a nice challenge; it is not easy to do in a non-block format. But for testing it is enough, and because this extension is very simple, I believe it is enough for non-critical environments. It is really very, very simple.

Postgres does not have autonomous transactions. There are some workarounds, like using dblink or pg_background. As usual, any workaround has some disadvantages and limits. pg_background looks good, but at the end, it doesn't ensure 100% suc

[...]

What is a Collation, and Why is My Data Corrupt

Posted by Shaun Thomas in pgEdge on 2026-04-03 at 05:36

The GNU C Library (glibc) version 2.28 entered the world on August 1st, 2018 and Postgres hasn't been the same since. Among its many changes was a massive update to locale collation data, bringing it in line with the 2016 Edition 4 release of the ISO 14651 standard and Unicode 9.0.0. This was not a subtle tweak. It was the culmination of roughly 18 years of accumulated locale modifications, all merged in a single release.

Nobody threw a party.

What followed was one of the most significant and insidious data integrity incidents in the history of Postgres. Indexes silently became corrupt, query results changed without warning, and unique constraints were no longer trustworthy. The worst part? You had to know to look for it. Postgres didn't complain. The operating system didn't complain. Everything appeared normal, right up until it wasn't.

This is the story of how a library upgrade quietly corrupted databases around the world, what the Postgres community did about it, and how to make sure it never happens to you again.
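Why a change in sort rules corrupts an index without any error can be shown with a small Python sketch. The two key functions below are hypothetical stand-ins for an "old" and a "new" collation; they are not glibc's actual rules. The sorted list plays the role of a B-tree index, and `index_lookup` is a plain binary search that trusts the list to be ordered by whatever comparison it is given.

```python
from typing import Callable, List


def old_key(s: str):
    """Hypothetical 'old' collation: compare by raw code points."""
    return s


def new_key(s: str):
    """Hypothetical 'new' collation: ignore case first, then code points."""
    return (s.casefold(), s)


def index_lookup(index: List[str], value: str, key: Callable) -> bool:
    """Binary search that assumes `index` is sorted by `key`."""
    lo, hi = 0, len(index)
    target = key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value


words = ["apple", "Banana", "cherry", "Zebra", "mango"]

# The index is built while the old rules are in force.
index = sorted(words, key=old_key)
found_before = index_lookup(index, "Zebra", old_key)

# The library is upgraded: same data on disk, new comparison rules.
found_after_upgrade = index_lookup(index, "Zebra", new_key)

# Rebuilding the index under the new rules repairs the lookup.
rebuilt = sorted(words, key=new_key)
found_after_rebuild = index_lookup(rebuilt, "Zebra", new_key)
```

After the simulated upgrade, the search walks to the wrong part of the list and reports that "Zebra" is absent, even though the row is still there. Nothing raises an error, and a uniqueness check built on the same lookup would happily accept a second "Zebra". Rebuilding the index under the new rules, which is what REINDEX does in Postgres, makes the lookup correct again.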

What even is a Collation?

Before we can understand what broke, we need to understand what a collation actually does. At its core, a collation defines how text is compared and sorted. That sounds simple enough, but collation rules become much more turbulent outside of the English alphabet.

Consider the German letter ß. Does it sort the same as "ss"? Usually. What about accented characters like é and è? Should they be treated as equivalent to "e" for sorting purposes, or should they have their own distinct positions? What about the Swedish alphabet, where ä and ö come after z rather than being treated as variants of a and o?

Every language has its own answer to these questions, and a collation encodes those answers into a set of rules for a database to follow. When Postgres needs to sort a column of text, enforce a unique constraint, or build a B-tree index, it asks the collation: "Which of these two strings comes first?" The collation's answer determines everything from query results to whether an i[...]

pg_clickhouse 0.1.6

Posted by David Wheeler on 2026-04-02 at 15:21

We fixed a few bugs this week in pg_clickhouse, the query interface for ClickHouse from Postgres. The fixes improve query cancellation and function & operator pushdown, including to_timestamp(float8), ILIKE, LIKE, and regex operators. Get the new v0.1.6 release from the usual places:

Thanks to my colleague, Kaushik Iska, for most of these fixes!
