The pg_partman extension
Manage large Postgres tables using the PostgreSQL Partition Manager extension
pg_partman
is a Postgres extension that simplifies the management of partitioned tables. Partitioning refers to splitting a single table into smaller pieces called partitions
. This is done based on the values in a key column or set of columns. Even though partitions are stored as separate physical tables, the partitioned table can still be queried as a single logical table. This can significantly enhance query performance and also help you manage the data lifecycle of tables that grow very large.
While Postgres natively supports partitioning a table, pg_partman
helps set up and manage partitioned tables:
- Automated partition creation:
pg_partman
automatically creates new partitions as new records are inserted, based on a specified interval for the partition key. - Automated maintenance:
pg_partman
bundles a background worker process that manages maintenance tasks without needing an external scheduler or cron job. For example, it can automatically detach old partitions from the main table based on a retention policy, runanalyze
on partitions to update statistics, and more.
In this guide, we’ll learn how to set up and use the pg_partman
extension with your Neon Postgres project. We'll cover why partitioning is helpful, how to enable pg_partman
, creating partitioned tables, and automating partition maintenance.
note
pg_partman
is an open-source Postgres extension that can be installed in any Neon project using the instructions below. Detailed installation instructions and compatibility information can be found in the pg_partman documentation.
pg_partman
extension
Enable the You can enable the extension by running the following CREATE EXTENSION
statement in the Neon SQL Editor or from a client such as psql
that is connected to Neon. Creatig a partman
schema is optional (but recommended) and you can name the schema whatever you like, but it cannot be changed after installation.
The pg_partman
extension does not require a superuser to run, but it's recommended to create a dedicated role for running pg_partman
functions and to act as the owner of all partition sets that pg_partman
will maintain.
Here is a sample SQL script to create a dedicated role with the minimum required privileges, assuming that pg_partman
is installed to the partman
schema and the dedicated role is named partman_user
:
If the role needs to create schemas, you'll have to grant CREATE
on the database as well. This is only required if you give the role above the CREATE
privilege on pre-existing schemas that will contain partition sets.
When you create a new Neon
project, the default database name is neondb
and the default schema name is public
. Replace {WORKING_DATABASE_NAME}
and {WORKING_SCHEMA_NAME}
with the actual database and schema names you want to manage the partitioned tables in. To find out more about the privileges needed to run pg_partman
, refer to the pg_partman documentation.
For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql
client with Neon, see Connect with psql.
Version Compatibility:
pg_partman
works with Postgres 14 and above, complementing the native partitioning features introduced in these versions.
Why partition your data?
For tables that grow very large, partitioning offers several benefits:
- Faster queries: Partitioning allows Postgres to quickly locate and retrieve data within a specific partition, rather than scanning the entire table.
- Scalability: Partitioning makes database administration simpler. For example, smaller partitions are easier to load and delete or back up and recover.
- Managing the data lifecycle: Easier management of the data lifecycle by archiving or purging old partitions, which can be moved to cheaper storage options without affecting the active dataset.
Native partitioning vs pg_partman
Postgres supports partitioning tables natively, with the following strategies to divide the data:
- List partitioning: Data is distributed across partitions based on a list of values, such as a category or location.
- Range partitioning: Data is distributed across partitions based on ranges of values, such as dates or numerical ranges.
With native partitioning, you need to manually create and manage partitions for your table.
pg_partman
supports creating partitions that are number or time-based, with each partition covering a range of values. It is particularly useful when partitions need to be created automatically as new records come in. So, list partitioning isn't applicable since the partition key values are not known in advance.
Example: Partitioning user-activity data
Consider a social media platform that tracks user interactions in their website application, such as likes, comments, and shares. The data is stored in a table called user_activities
, where activity_type
stores the type of activity and the other columns store additional information about the activity.
Setting up a new partitioned table
Given the large volume of data generated by user interactions, partitioning the user_activities
table can help keep queries manageable. Recent activity data is typically the most interesting for both the platform and its users, so activity_time
is a good candidate to partition on.
We can create the partitioned table using the following SQL statement, similar to defining a native partitioned table:
To create a partition for each week of activity data, you can run the following query:
This will create a new partition for each week of data in the user_activities
table. We can insert some sample data into the table:
Querying partitioned tables
We can query against the user_activities
table as if it were a single table, and Postgres will automatically route the query to the correct partition(s) based on the activity_time
column.
This query returns the following results:
To see the list of all partitions created for the user_activities
table, you can run the following query:
This will return the following results:
pg_partman
automatically created tables for weekly intervals close to the current data. As more data is inserted, it will create new partitions. Additionally, there is a user_activities_default
table that stores data that doesn't fit into any of the existing partitions.
Data retention policies
To make sure that old data is automatically removed from the main table, you can set up a retention policy:
The background worker process that comes bundled with pg_partman
automatically detaches the old partitions that are older than 4 weeks from the main table. Since we've set retention_keep_table
to true
, the old partitions are kept as separate tables, and not dropped from the database.
Additional considerations
pg_partman
Partitioning an existing table with If you have an existing table that you want to partition, you can use pg_partman
for it. However, it isn't straightforward since it can't be directly altered into the parent table for a partition set. Instead, you need to create a new partitioned table and copy the data from the existing table into the new partitioned table.
We describe the offline
method here, where queries to the existing table are stopped while the data is being copied over to the new partitioned table. It is also possible to achieve this while keeping the existing table operational, but it involves more complex steps. For more details, refer to the pg_partman documentation.
Example: Partitioning an existing table
To illustrate, we recreate the test_user_activities
table from the previous example but without specifying partitioning:
Now, we'll partition the existing test_user_activities
table using pg_partman
.
- Rename the original table so that the partitioned table can be created with the original table's name:
- Create a new table with the same name as the original table, but with partitioning enabled:
We were using a SERIAL
column for activity_id
in the original table. If you want to keep the same sequence for the new table, you can set the sequence value to the last value of the original table:
In general, we also need to ensure other properties from the old table, such as privileges, constraints, defaults, indexes, etc. are also applied to the new table.
- Use the
create_parent()
function provided bypg_partman
to set up partitioning on the new table:
- Now, to we can migrate data from the old table to the new partitioned table in smaller batches:
This will move the data from old_user_activities
to the new test_user_activities
table in daily intervals, committing after each batch. The p_interval
parameter specifies the interval of values to select in each batch, and p_loop_count
specifies the total number of batches to move.
- After the data migration is complete, the old table should be empty, and the new partitioned table should contain all the data and child tables. You can verify this by counting the number of rows in both the tables:
This should return 15 and 0 rows, respectively.
- Finally, run
VACUUM ANALYZE
on the new partitioned table to update statistics:
The test_user_activities
table is now successfully partitioned using pg_partman
, with the data migrated from the old table to the new partitioned structure.
Uniqueness constraints for partitioned tables
This section applies to partitioned tables created natively in Postgres, as well as those created using pg_partman
.
Postgres doesn't support indexes or unique constraints that span multiple tables. Since a partitioned table is made up of multiple physical tables, you can't create a unique constraint that spans all the partitions. For example, the following query will fail:
It returns the following error:
However, when the unique constraint involves partition key columns, Postgres can guarantee uniqueness across all partitions. In this way, different partitions cannot share the same values for the partition key columns, which allows unique constraints to be enforced.
For example, including the activity_time
column in the unique constraint will work because activity_time
is a partition key column:
Conclusion
By leveraging pg_partman
, you can significantly enhance the native partitioning functionality of Postgres, particularly for large-scale and time-series datasets. The extension simplifies partition management, automates retention and archival tasks, and improves query performance.
Reference
Last updated on