THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Learning PostgreSql: old versions of rows are stored right in the table

PostgreSql features multi-version concurrency control aka MVCC. To implement MVCC, old versions of rows are stored right in the same table, and this is very different from what SQL Server does, and it leads to some very interesting consequences. Let us play with this thing a little bit, but first we need to set up some test data.

Setting up.

First of all, let us create a numbers table. Any production database must have it anyway:


INSERT INTO Numbers(i) VALUES(0),(1);

k IN 1..19 LOOP
INSERT INTO Numbers(i) SELECT i + POW(2,k) FROM Numbers;
LANGUAGE plpgsql 

SELECT PopulateNumbers();

Next, let us use the numbers table to create another test table:

CREATE TABLE public.TestUpdate
i, 2 AS SomeValue, CAST('Let''s add some space to make the table wider' AS VARCHAR) AS SpaceFiller
FROM public.Numbers;


Updates insert new versions of rows without modifying old ones.

Whenever we update a row, we essentially insert a new version, and the old version stays in the same table. This behavior is completely different from SQL Server, and leads to many consequences we need to be aware of.

For example, if we update all the rows in a table, it can grow twice as big. Let us see for ourselves.

At this time the test table's size is 84 Mb, as returned by the following script:

select pg_size_pretty( pg_relation_size('public.TestUpdate'));

Presumably the following query scans all the rows in the table:

SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;

This query returns 2 for both MIN and MAX - this is how we populated our test table.

EXPLAIN output clearly demonstrates that all 1M rows are scanned:

SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;

"  ->  Seq Scan on testupdate  (snip) rows=1048576 "

Let us update all but one rows in this table, as follows:

UPDATE public.TestUpdate SET SomeValue=i;

Query OK, 1048576 rows affected (execution time: 5.912 sec; total time: 5.912 sec)

After the update the test table's size is 169 Mb, as returned by the following script:

select pg_size_pretty( pg_relation_size('public.TestUpdate'));

Both uncommitted new versions and old versions are stored in the same table, which is why it uses up twice as much storage.

The following query, executed from the same connection, shows that although the table still has exactly 1M current rows, 2M-1 rows are scanned: both original data and new versions:

SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;

(snip) Rows=2097055

We can rerun this query from another connection, and see the original unupdated values, with MIN and MAX both equal to 2:

SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;

Yet if we EXPLAIN this query, we still see that 2M-1 rows are scanned.

Rollback is instantaneous.

It took the database engine almost 6 seconds to modify the table. Yet when we rollback the transaction, it rolls back immediately, because the original unupdated rows are still in the table:

Query OK, 0 rows affected (execution time: 0 ms; total time: 0 ms)

In SQL Server, one would expect the rollback to take considerable time, because the update took considerable time as well.

The table does not shrink, however: it's size is still 169 MB.

The extra storage temporarily used by the uncommitted transaction cannot be reused yet

At this time we have 1M current versions and 1M-1 old versions in the table. Let us update the table one more time:

UPDATE public.TestUpdate SET SomeValue=i;

The table size grows again to 253 MB. Even though we have rolled back the update, the storage used by rolled back versions is not available yet, so new storage is allocated for the new versions.

At this time, for every current row in the table we have two old versions. As a result, even though the table still has 1M rows, exactly as before, its size has grown three times.

Using VACUUM to reclaim unused storage

To reclaim the unused storage, we need to VACUUM the table, either explicitly, or via an automated process. Let us do it explicitly:

VACUUM public.TestUpdate;
Query OK, 0 rows affected (execution time: 31 ms; total time: 31 ms)

Having reclaimed the free space, we can add 1M rows, and the table size stays the same:

INSERT INTO public.TestUpdate(i, someValue, SpaceFiller)
SELECT i - 1048576, someValue, SpaceFiller
FROM public.TestUpdate;

Let us add 1M rows one more time, and the table size still stays the same, even though the table now has 3 M rows:

INSERT INTO public.TestUpdate(i, someValue, SpaceFiller)
SELECT i - 2*1048576, someValue, SpaceFiller
FROM public.TestUpdate
WHERE i>=0;

VACUUM and real execution costs

Currently our table has 3M rows. Let us delete 2M rows:

DELETE FROM public.TestUpdate WHERE i<0;

Even though the changes are committed, the selects against our table still have to scan all 3M rows:

SELECT MIN(SomeValue), MAX(SomeValue) FROM public.TestUpdate;

"  ->  Seq Scan on testupdate  (snip) rows=3735300 "

Let us VACUUM the table, and rerun the EXPLAIN. It will show that less rows are scanned:

(snip) rows=1591111

Yet the number of scanned rows is still about 50% more than the exactly 1M rows in our table. Clearly some space is not reclaimed yet. VACUUM FULL is our friend - after it completes, EXPLAINshows that exactly 1M rows are scanned.


We have seen several significant differences between PostgreSql and SQL Server, caused by the way MVCC is implemented in PostgreSql. As we continue migrating functionality to PostreSql, we need to be aware of these profound differences, so that the modifications against PostgreSql perform well.

On a side note, I am really impressed by the quality of PostgreSql online documentation.


Published Thursday, November 7, 2013 3:26 PM by Alexander Kuznetsov



Pavel Stehule said:

VACUUM FULL require exclusive locking, so it is not used on production often - only where is not enough space on disc. VACUUM is much lighter, and prepare free space map - so free space inside relations can be reused for data. Both VACUUM execution are automatized now - and it is called on background, when it is necessary - see a autovacuum. A main property of PG MVCC are facts: PostgreSQL has extremely fast ROLLBACK (zero time) and COMMIT (only fsync on log), but require more space on disc (when massive updates are processed). Cleaning is processed on background now - almost all PostgreSQL users doesn't know about it today.

November 8, 2013 4:45 AM

Alexander Kuznetsov said:


As a developer who worked with SQL Server for the last few years, I mostly concentrate on the differences between these two RDBMS. IMO most people who visit this blog are familiar with SQL Server, and it is the differences that need to be learned most of all so that the migration can succeed.

As such, I am very impressed by PostgreSql's immediate rollbacks - this is completely different from SQL Server. I am used to fast COMMITs, so I did not mention that - SQL Server commits fast as well.

November 8, 2013 11:35 AM

Raghav said:

Very nice comparison article. Thumbs up (y).

Rollback's  are very fast indeed in PG.. In PG there's no UNDO, hence when a transaction aborts it marks the row as Aborted. So, PG Table contain data everything COMMITTED / EXPIRED / ABORTED rows.  Cleaning process of Expired/Aborted are taken care by AUTOVACUUM/VACUUM(explicit).


November 9, 2013 10:49 PM

bhuvnesh said:

Does PostGreSql maitain any flag/column to distinguish between  ABORTED/EXPIRED/COMMITTED in A table ?

March 12, 2014 7:34 AM

William said:

Is the rollback command missing?

August 18, 2014 3:15 PM

Alexander Kuznetsov said:

There is no need to provide ROLLBACK command explicitly, it is very simple: ROLLBACK;

August 18, 2014 3:35 PM
New Comments to this post are disabled

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on and Currently he works as an agile developer.

This Blog


Privacy Statement