2016-08-18

Getting started with Aiven Kafka

Apache Kafka is a high-throughput publish-subscribe message broker, perfect for providing messaging for microservices. Aiven offers fully managed Kafka-as-a-Service in Amazon Web Services, Google Compute Engine, DigitalOcean and UpCloud (Microsoft Azure support is coming during 2016 Q3!).

Apache Kafka


Apache Kafka is a popular open-source publish-subscribe message broker. Kafka is distributed by design and offers scalability, high-throughput and fault-tolerance. Kafka excels in streaming data workloads, ingesting and serving hundreds of megabytes per second from thousands of clients.

Apache Kafka deployment example



Apache Kafka was originally developed by LinkedIn and open sourced in 2011.

Kafka is often used as a central piece in analytics, telemetry and log aggregation workloads, where it captures and distributes event data at very high rates. It can also act as a communications hub for microservices, distributing work over a large cluster of nodes.

At Aiven, we use Kafka as a message bus between our cluster nodes as well as for delivering telemetry, statistics and logging data. Kafka's message delivery and fault-tolerance guarantees allow us to simplify and decouple service components.

What is Aiven Kafka


Aiven Kafka is our fully managed Kafka service. We take care of the deployment and operational burden of running your own Kafka service, and make sure your cluster stays available, healthy and always up-to-date. We ensure your data remains safe by encrypting it both in transit and at rest on disk.

We offer multiple plan types with different cluster sizes and capacities, and charge only for your actual usage on an hourly basis. Custom plans for deployments that are larger or have specific needs can also be requested. Aiven also makes it possible to migrate between plans with no downtime to address changes in your requirements.

Below, I'll walk you through setting up and running your first Aiven Kafka service.


Getting started with Aiven Kafka


Creating an Aiven Kafka service is easy: just select the Kafka service type from the drop-down menu in the new service creation dialog. You'll have the option of selecting three- or five-node cluster plans with the storage sizing of your choice. A larger node count allows for higher throughput or larger replication factors for mission-critical data. If unsure, pick a three-node cluster; you can always change the selected plan at a later time.



All Aiven services are offered over SSL-encrypted connections for your protection. With Kafka, you're also required to perform client authentication with the service certificates we provide. You can find and download these keys and certificates in the connection parameters section of the service details page: a client access key and certificate, plus the CA certificate you can use to verify the Aiven endpoint. Store these locally; we'll refer back to them in the code examples below (ca.crt, client.crt, client.key).
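If you'd like to sanity-check the downloaded files before writing any Kafka code, a plain TLS connection attempt is enough. The short Python sketch below does this with only the standard library; the host and port are the example values used later in this post, so substitute your own service URL.

import socket
import ssl

# Use the host and port shown on your own service details page
host = "getting-started-with-kafka.htn-aiven-demo.aivencloud.com"
port = 17705

# Trust the Aiven CA and present the client certificate and key
context = ssl.create_default_context(cafile="ca.crt")
context.load_cert_chain(certfile="client.crt", keyfile="client.key")

with socket.create_connection((host, port)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print("Connected using", tls.version())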




Finally, you can create the topics you'd like to use under the topics tab on the service details page. In Kafka terms, topics are logical channels that you send messages to and read them from. Topics themselves are divided into one or more partitions. Partitions can be used to handle higher read/write rates, but do note that Kafka's ordering guarantees only hold within a single partition.

When creating a topic, you can select the number of partitions, the number of replicas and how many hours messages are retained in the Kafka logs before deletion. You can also increase the number of partitions at a later time.
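The console is the primary way to manage topics, but for completeness, here is a sketch of doing the same programmatically with kafka-python's admin client. Note that the admin API is only available in newer kafka-python releases, and whether topic creation is permitted over the Kafka protocol depends on the broker version and service configuration, so treat this as an assumption to verify.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(
    bootstrap_servers="getting-started-with-kafka.htn-aiven-demo.aivencloud.com:17705",
    security_protocol="SSL",
    ssl_cafile="ca.crt",
    ssl_certfile="client.crt",
    ssl_keyfile="client.key",
)

# Three partitions for parallelism; remember that message ordering is
# only guaranteed within a single partition.
admin.create_topics([
    NewTopic(name="demo-topic", num_partitions=3, replication_factor=3)
])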


That's it! The service is up and running and ready to capture and distribute your messages. The Aiven team takes care of the operational burden of your cluster and ensures it remains available at all times. To use the service, we've included code examples in Python and Node.js below. Just make sure to replace the value of bootstrap_servers with the service URL from the service details page, and verify that the SSL settings point to the key and certificate files downloaded earlier.

Accessing Aiven Kafka in Python


Producing messages (the Kafka term for sending them):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="getting-started-with-kafka.htn-aiven-demo.aivencloud.com:17705",
    security_protocol="SSL",
    ssl_cafile="ca.crt",
    ssl_certfile="client.crt",
    ssl_keyfile="client.key",
)

for i in range(1, 4):
    message = "message number {}".format(i)
    print("Sending: {}".format(message))
    producer.send("demo-topic", message.encode("utf-8"))

# Wait for all messages to be sent
producer.flush()

Consuming (receiving) the same messages:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="getting-started-with-kafka.htn-aiven-demo.aivencloud.com:17705",
    client_id="demo-client-1",
    group_id="demo-group",
    # Read from the beginning of the topic when the group has no committed offsets yet
    auto_offset_reset="earliest",
    security_protocol="SSL",
    ssl_cafile="ca.crt",
    ssl_certfile="client.crt",
    ssl_keyfile="client.key",
)

for msg in consumer:
    print("Received: {}".format(msg.value))

Output from the producer above:

$ python kafka-producer.py
Sending: message number 1
Sending: message number 2
Sending: message number 3

And the consuming side:

$ python kafka-consumer.py
Received: message number 1
Received: message number 2
Received: message number 3

 

Accessing Aiven Kafka in Node.js


Here's a Node.js example utilizing the node-rdkafka module:

var Kafka = require('node-rdkafka');

var producer = new Kafka.Producer({
    'metadata.broker.list': 'getting-started-with-kafka.htn-aiven-demo.aivencloud.com:17705',
    'security.protocol': 'ssl',
    'ssl.key.location': 'client.key',
    'ssl.certificate.location': 'client.crt',
    'ssl.ca.location': 'ca.crt',
    'dr_cb': true
});

producer.connect();

producer.on('ready', function() {
    var topic = producer.Topic('demo-topic', {'request.required.acks': 1});
    producer.produce({
        message: new Buffer('Hello world!'),
        topic: topic,
    }, function(err) {
        if (err) {
            console.log('Failed to send message', err);
        } else {
            console.log('Message sent successfully');
        }
    });
});

And the consuming side:

var Kafka = require('node-rdkafka');

var consumer = new Kafka.KafkaConsumer({
    'metadata.broker.list': 'getting-started-with-kafka.htn-aiven-demo.aivencloud.com:17705',
    'group.id': 'demo-group',
    'security.protocol': 'ssl',
    'ssl.key.location': 'client.key',
    'ssl.certificate.location': 'client.crt',
    'ssl.ca.location': 'ca.crt',
});

var stream = consumer.getReadStream('demo-topic');

stream.on('data', function(data) {
    console.log('Got message:', data.message.toString());
});



Trying Aiven is free, no credit card required


Remember that trying Aiven is free: you will receive US$10 worth of free credits at sign-up which you can use to try any of our service plans. The offer works for all of our services: PostgreSQL, Redis, InfluxDB, Grafana, Elasticsearch and Kafka!

Go to https://aiven.io/ to get started!


Cheers,

    Team Aiven

2016-07-22

Backing up tablespaces and streaming WAL with PGHoard

We've just released a new version of PGHoard, the PostgreSQL cloud backup tool we initially developed for Aiven and later open sourced.

Version 1.4.0 comes with the following new features:
  • Support for PostgreSQL 9.6 beta3
  • Support for backing up multiple tablespaces
  • Support for StatsD and DataDog metrics collection
  • Basebackup restoration now shows download progress
  • Experimental new WAL streaming mode walreceiver, which reads the write-ahead log data directly from the PostgreSQL server using the streaming replication protocol
  • New status API in the internal REST HTTP server
Please see our previous blog post about PGHoard for more information about the tool and a guide for deploying it.

Backing up multiple tablespaces

This is the first version of PGHoard capable of backing up multiple tablespaces. Multiple tablespaces require using the new local-tar backup option, which reads files directly from disk instead of streaming them with pg_basebackup, as pg_basebackup doesn't currently allow streaming multiple tablespaces without writing them to the local filesystem.

The current version of PGHoard can use the local-tar backup mode only on a PostgreSQL master server: PostgreSQL versions prior to 9.6 don't allow users to run the necessary control commands on a standby server without using the pgespresso extension. pgespresso also required fixes, which we contributed, to support multiple tablespaces; once a fixed version has been released we'll add support for it to PGHoard.

The next version of PGHoard, due out by the time of PostgreSQL 9.6 final release, will support local-tar backups from standby servers, natively when running 9.6 and using the pgespresso extension when running older versions with the latest version of the extension.

A future version of PGHoard will support backing up and restoring PostgreSQL basebackups in parallel mode when using the local-tar mode.  This will greatly reduce the time required for setting up a new standby server or restoring a system from backups.

Streaming replication support

This version adds experimental support for reading PostgreSQL's write-ahead log directly from the server using the streaming replication protocol which is also used by PostgreSQL's native replication and related tools such as pg_basebackup and pg_receivexlog. The functionality currently depends on an unmerged psycopg2 pull request which we hope to see land in a psycopg2 release soon.

While the walreceiver mode is still experimental, it has a number of benefits over other methods of backing up the WAL and allows implementing new features in the future: the temporary, uncompressed files written by pg_receivexlog are no longer needed, saving disk space and I/O, and incomplete WAL segments can be archived at specified intervals or, for example, whenever a new COMMIT appears in the WAL stream.

New contributors

The following people contributed their first patches to PGHoard in this release:
  • Brad Durrow
  • Tarvi Pillessaar

PGHoard in Aiven.io

We're happy to talk more about PGHoard and help you set up your backups with it.  You can also sign up for a free trial of our Aiven.io PostgreSQL service where PGHoard will take care of your backups.


Cheers,
Team Aiven

2016-07-19

New, bigger InfluxDB plans now available

We're happy to announce the immediate availability of new, bigger InfluxDB plans in Aiven. The new plans allow you to store up to 750 gigabytes of time-series data in a fully-managed InfluxDB database.

InfluxDB can be used to store time-series data from various data sources using data collection tools like Telegraf. The collected data is typically operating system and application metrics such as CPU utilization and disk space usage, but we've also helped set up InfluxDB to host time-series data for an industrial manufacturing line, for example, with our Grafana service used for data visualization.
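As a rough sketch of what that looks like in practice, the snippet below writes a single CPU-utilization point with the influxdb Python client; the service URL, port, credentials and database name are placeholders rather than values from an actual Aiven service.

from influxdb import InfluxDBClient

# Placeholder connection details; copy the real ones from the service details page
client = InfluxDBClient(
    host="my-influx.example.aivencloud.com",
    port=12345,
    username="avnadmin",
    password="secret",
    database="defaultdb",
    ssl=True,
    verify_ssl=True,
)

# One measurement point: CPU usage for host "web-1"
client.write_points([
    {
        "measurement": "cpu",
        "tags": {"host": "web-1"},
        "fields": {"usage_percent": 42.5},
    }
])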

Our InfluxDB Startup-4 plan, available in all AWS, Google Cloud, UpCloud and DigitalOcean regions, was expanded to 16 gigabytes of storage space. We've also introduced all-new Startup-8, 16, 32 and 64 plans, available in all AWS, Google Cloud and UpCloud regions, with CPU counts ranging from 1 to 16, RAM from 4 to 64 gigabytes and storage space between 50 and 750 gigabytes.

Trying Aiven is free, no credit card required

Remember that trying Aiven is free: you will receive US$10 worth of free credits at sign-up which you can use to try any of our service plans.

Go to https://aiven.io/ to get started!


Cheers,

    Team Aiven

2016-07-13

Aiven Kafka now publicly available!

In a world filled with microservices we're delighted to announce yet another expansion of the Aiven service portfolio in the form of Aiven Kafka. Aiven Kafka adds streaming data capabilities in the form of a distributed commit log. For the last three months we've been offering Apache Kafka in private beta and now we're making it publicly available!


Aiven Kafka is a service that can be used to ingest and read back large quantities of log event data. This allows you to write your whole event stream durably, in a fire-hose-like fashion, and then process it at your leisure. Kafka is used in some of the largest companies on the planet for many mission-critical workloads. Besides streaming data, you can also use it as a message broker connecting your myriad services with each other.

Historically, Kafka itself, and especially its reliance on Apache ZooKeeper, has made setup time-consuming and required skilled staff to maintain and operate it. Aiven Kafka now makes it trivially easy to have your own managed Kafka cluster.

The easy streaming log service for your microservices

Our Web Console allows you to launch Aiven Kafka in any of our supported clouds and regions with a couple of clicks. All Aiven services are available in all Amazon Web Services, Google Cloud, DigitalOcean and UpCloud regions allowing you to launch services near you in minutes.



Aiven Kafka is a first-class service in Aiven, meaning we'll take care of fault-tolerance, monitoring and maintenance operations on your behalf. In case you need to get more performance out of your Kafka cluster, you can simply expand your cluster by selecting a bigger plan and all your data will be automatically migrated to beefier nodes without any downtime.

Our startup Kafka plan

If you want to try out Kafka on a modestly powered three-node cluster and don't need Kafka REST, our Startup-2 plan will get you started. You can easily upgrade to a larger plan later if needed.
  • Startup-2: 1 CPU, 2 GB RAM, 30 GB SSD at $200 / month ($0.274 / hour)

Our three node business Kafka plans

Our Business plans are three-node clusters which are deployed alongside Kafka REST to allow interacting with Kafka via HTTP REST calls.
  • Business-4: 1 CPU, 4 GB RAM, 200 GB SSD at $500 / month ($0.685 / hour)
  • Business-8: 2 CPU, 8 GB RAM, 400 GB SSD at $1000 / month ($1.370 / hour)
  • Business-16: 4 CPU, 16 GB RAM, 800 GB SSD at $2000 / month ($2.740 / hour)
 

Highly-available five node premium Kafka plans

If you want an even higher level of reliability and performance, our Premium Aiven Kafka plans are made for this. They all come with five (or more, for custom plans) Kafka broker nodes.
  • Premium-4: 1 CPU, 4 GB RAM, 200 GB SSD at $800 / month ($1.096 / hour)
  • Premium-8: 2 CPU, 8 GB RAM, 400 GB SSD at $1600 / month ($2.192 / hour)
  • Premium-16: 4 CPU, 16 GB RAM, 800 GB SSD at $3200 / month ($4.384 / hour)
If you need larger or otherwise customized plans, please don't hesitate to contact us.

 

Trying Aiven is free, no credit card required

Remember that trying Aiven is free: you will receive US$10 worth of free credits at sign-up which you can use to try any of our service plans.

Go to https://aiven.io/ to get started!

We value your feedback

We are always interested in ways of making our service better. Please send your feedback and suggestions via email, Facebook, LinkedIn or using our support system.

 Cheers,

   Aiven Team

2016-06-17

Even bigger PostgreSQL plans now available

We've just launched the new 64, 120 and 160 style PostgreSQL plans in multiple clouds. These new plans allow you to run larger and larger PostgreSQL instances in the cloud. The new plans are available in our Startup, Business and Premium flavors, supporting various levels of high availability. The number after the plan flavor designates the RAM available for the database in that plan. CPU count and storage also grow with larger plans, giving you more resources to run your transactions.


All of the new plans are available in both Amazon Web Services and Google Cloud, and the 64 and 120 plans are also available in UpCloud. We hope we can offer bigger plans in DigitalOcean in the near future as well.

The pricing for our new plans is available on our PostgreSQL service page. Remember that trying out Aiven is free: you'll receive US$10 of free credits at sign-up, which allows you to run one of our huge new plans for a few hours, or a small instance for a couple of weeks.

Cheers,
Team Aiven

2016-05-12

Help test PostgreSQL 9.6 via Aiven

The first beta of the upcoming PostgreSQL 9.6 major release was announced today with a number of important new features such as parallel queries, enhanced foreign data wrappers and various performance improvements for large databases.

To make it easier to test the new beta as well as to validate your application's compatibility with PostgreSQL 9.6, we've added support for it in Aiven. When creating a new service in the console you can now select "PostgreSQL 9.6 Beta" as your service type, but note that this is a beta version and there are no guarantees about data durability.

When the PostgreSQL 9.6 final release comes out - expected this September - we'll provide you with one-click minimal downtime upgrade functionality allowing you to upgrade your current PostgreSQL 9.5 production databases to the new version with little effort.  We'll also make it possible to "fork" a 9.5 production database into a new 9.6 database allowing you to perform further validation of the new version without disrupting the production system.

Test PostgreSQL 9.6 in Aiven for free

You can sign up for a free trial of the Aiven Cloud Database service at https://aiven.io/ and try out PostgreSQL 9.6 beta using the US$10 worth of free credits we provide you at sign-up.  Aiven PostgreSQL is available in all Amazon Web Services, Google Cloud, DigitalOcean and UpCloud regions.


Cheers,
Team Aiven

2016-04-28

PostgreSQL cloud backups with PGHoard

PGHoard is the cloud backup and restore solution we're using in Aiven. We started PGHoard development in early 2015 when the Aiven project was launched as a way to provide real-time streaming backups of PostgreSQL to a potentially untrusted cloud object storage.

PGHoard has an extensible object storage interface, which currently works with the following cloud object stores:
  • Amazon Web Services S3
  • Google Cloud Storage
  • OpenStack Swift
  • Ceph's RADOSGW utilizing either the S3 or Swift drivers 
  • Microsoft Azure Storage (currently experimental)

Data integrity

PostgreSQL backups consist of full database backups (basebackups) plus the write-ahead log (WAL) and related metadata. Both basebackups and WAL are required to create and restore a consistent database.

PGHoard handles both the full, periodic backups (driving pg_basebackup) as well as streaming the write-ahead-log of the database.  Constantly streaming WAL as it's generated allows PGHoard to restore a database to any point in time since the oldest basebackup was taken.  This is used to implement Aiven's Database Forks and Point-in-time-Recovery as described in our PostgreSQL FAQ.

To save disk space and reduce the data that needs to be sent over the network (potentially incurring extra costs), backups are compressed by default using Google's Snappy, a fast compression algorithm with a reasonable compression ratio. LZMA (a slower algorithm with a very high compression ratio) is also supported.
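As a quick illustration of the trade-off, you can compare the two algorithms yourself; this assumes the python-snappy package is installed and uses a local base.tar as a stand-in for any large file:

import lzma

import snappy  # provided by the python-snappy package

data = open("base.tar", "rb").read()  # any large-ish local file works

fast = snappy.compress(data)   # fast, reasonable ratio
small = lzma.compress(data)    # much slower, better ratio

print("original:", len(data), "snappy:", len(fast), "lzma:", len(small))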

To protect backups from unauthorized access and to ensure their integrity, PGHoard can also transparently encrypt and authenticate the data using RSA, AES and SHA256. Each basebackup and WAL segment gets a unique random AES key which is encrypted with RSA. HMAC-SHA256 is used for file integrity checking.
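To make the scheme concrete, here is a minimal, self-contained sketch of the envelope-encryption idea using the Python cryptography library. It only illustrates the general approach (fresh symmetric keys per file, wrapped with an RSA public key, HMAC-SHA256 over the ciphertext) and is not PGHoard's actual file format or key handling.

import os

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes, hmac
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_blob(data, rsa_public_key):
    # Fresh random keys for every file
    aes_key = os.urandom(32)
    hmac_key = os.urandom(32)
    iv = os.urandom(16)

    # Encrypt the payload with AES-256 in CTR mode
    encryptor = Cipher(algorithms.AES(aes_key), modes.CTR(iv),
                       backend=default_backend()).encryptor()
    ciphertext = encryptor.update(data) + encryptor.finalize()

    # Wrap the per-file keys with the RSA public key
    wrapped_keys = rsa_public_key.encrypt(
        aes_key + hmac_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None))

    # Authenticate the IV and ciphertext with HMAC-SHA256
    mac = hmac.HMAC(hmac_key, hashes.SHA256(), backend=default_backend())
    mac.update(iv + ciphertext)

    return wrapped_keys, iv, ciphertext, mac.finalize()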

Restoration is key

As noted in the opening paragraph, PGHoard is a backup and restore tool: backups are largely useless unless they can be restored.  Experience tells us that backups, even if set up at some point, are usually not restorable unless restore is routinely tested, but experience also shows that backup restoration is rarely practiced unless it's easy to do and automate.

This is why PGHoard also includes tooling to restore backups, allowing you to create new master or standby databases from the object store archives.  This makes it possible to set up a new database replica with a single command, which first restores the database basebackup from object storage and then sets up PostgreSQL's recovery.conf to fetch the remaining WAL files from the object storage archive and optionally connect to an existing master server after that.

Preparing PostgreSQL for PGHoard

First, we will need to create a replication user account. We'll just use the psql command-line client for this:

postgres=# CREATE USER backup WITH REPLICATION PASSWORD 'secret';
CREATE ROLE


We also need to allow this new user to make connections to the database. In PostgreSQL this is done by editing the pg_hba.conf configuration file and adding a line something like this:

host  replication  backup  127.0.0.1/32  md5

We'll also need to ensure our PostgreSQL instance is configured to allow WAL replication out from the server and that it has the appropriate wal_level setting. We'll open postgresql.conf and edit or add the following settings:

max_wal_senders = 2  # minimum two with pg_receivexlog mode!
wal_level = archive  # 'hot_standby' or 'logical' are also ok


Finally, since we have modified PostgreSQL configuration files, we'll need to restart PostgreSQL to take the new settings into use by running "pg_ctl restart", "systemctl restart postgresql" or "service postgresql restart", etc depending on the Linux distribution being used.  Note that it's not enough to "reload" PostgreSQL in case the WAL settings were changed.

Now we are ready on the PostgreSQL side and can move on to PGHoard.

Installing PGHoard

PGHoard's source distribution includes packaging scripts for Debian, Fedora and Ubuntu.  Instructions for building distribution specific packages can be found in the PGHoard README.  As PGHoard is a Python package it can also be installed on any system with Python 3 by running "pip3 install pghoard".

Taking backups with PGHoard

PGHoard provides a number of tools that can be launched from the command-line:
  • pghoard - The backup daemon itself, can be run under systemd or sysvinit
  • pghoard_restore - Backup restoration tool
  • pghoard_archive_sync - Command for verifying archive integrity
  • pghoard_create_keys - Backup encryption key utility
  • pghoard_postgres_command - Used as PostgreSQL's archive_command and restore_command
First, we will launch the pghoard daemon to start taking backups. pghoard requires a small JSON configuration file that contains the settings for the PostgreSQL connection and for the target backup storage. We'll name the file pghoard.json:

{
    "backup_location": "./metadata",
    "backup_sites": {
        "example-site": {
            "nodes": [
                {
                    "host": "127.0.0.1",
                    "password": "secret",
                    "port": 5432,
                    "user": "backup"
                }
            ],
            "object_storage": {
                "storage_type": "local",
                "directory": "./backups"
            }
        }
    }
}


In the above file we just list where pghoard keeps its local working directory (backup_location), our PostgreSQL connection settings (nodes) and where we want to store the backups (object_storage). In this example we'll just write the backup files to a local disk instead of remote cloud object storage.

Then we just need to run the pghoard daemon and point it to our configuration file:

$ pghoard --short-log --config pghoard.json
DEBUG   Loading JSON config from: './pghoard.json', signal: None
INFO    pghoard initialized, own_hostname: 'ohmu1', cwd: '/home/mel/backup'
INFO    Creating a new basebackup for 'example-site' because there are currently none
INFO    Started: ['/usr/bin/pg_receivexlog', '--status-interval', '1', '--verbose', '--directory', './metadata/example-site/xlog_incoming', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'"], running as PID: 8809
INFO    Started: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--progress', '--verbose', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'", '--pgdata', './metadata/example-site/basebackup_incoming/2016-04-28_0'], running as PID: 8815, basebackup_location: './metadata/example-site/basebackup_incoming/2016-04-28_0/base.tar'
INFO    Compressed 16777216 byte file './metadata/example-site/xlog_incoming/000000010000000000000025' to 805706 bytes (4%), took: 0.056s
INFO    'UPLOAD' transfer of key: 'example-site/xlog/000000010000000000000025', size: 805706, took 0.003s
INFO    Ran: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--progress', '--verbose', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'", '--pgdata', './metadata/example-site/basebackup_incoming/2016-04-28_0'], took: 0.331s to run, returncode: 0
INFO    Compressed 16777216 byte file './metadata/example-site/xlog_incoming/000000010000000000000026' to 797357 bytes (4%), took: 0.057s
INFO    'UPLOAD' transfer of key: 'example-site/xlog/000000010000000000000026', size: 797357, took 0.011s
INFO    Compressed 80187904 byte file './metadata/example-site/basebackup_incoming/2016-04-28_0/base.tar' to 15981960 bytes (19%), took: 0.335s
INFO    'UPLOAD' transfer of key: 'example-site/basebackup/2016-04-28_0', size: 15981960, took 0.026s



PGHoard automatically connected to the PostgreSQL database server, noticed that we don't have any backups and immediately created a new basebackup and started real-time streaming of WAL files (which act as incremental backups). Each file stored in the backups was first compressed to optimize transfer and storage costs.

As long as you keep PGHoard running, it will make full backups using the default schedule (once per 24 hours) and continuously stream WAL files.
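If you want a different schedule or retention, the backup frequency and the number of retained basebackups can be set per backup site in pghoard.json, for example under the "example-site" section used above. The key names below follow our reading of the PGHoard README and the values shown are the defaults; treat them as an assumption to verify against the PGHoard version you're running.

            "basebackup_count": 2,
            "basebackup_interval_hours": 24,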

Looking at the contents of the "backups" directory, we see that our backups now contain a full database backup plus a couple of WAL files, and some metadata for each of the files:

$ find backups/ -type f
backups/example-site/xlog/000000010000000000000025
backups/example-site/xlog/000000010000000000000025.metadata
backups/example-site/xlog/000000010000000000000026
backups/example-site/xlog/000000010000000000000026.metadata
backups/example-site/basebackup/2016-04-28_0
backups/example-site/basebackup/2016-04-28_0.metadata


Available backups can be listed with the pghoard_restore tool:

$ pghoard_restore list-basebackups --config pghoard.json
Available 'example-site' basebackups:

Basebackup                                Backup size    Orig size  Start time
----------------------------------------  -----------  -----------  --------------------
example-site/basebackup/2016-04-28_0            15 MB        76 MB  2016-04-28T06:40:46Z


Looks like we are all set. Now let's try a restore!

Restoring a backup

Restoring a backup is a matter of running a single command:

$ pghoard_restore get-basebackup --config pghoard.json --target-dir restore-test
Found 1 applicable basebackup

Basebackup                                Backup size    Orig size  Start time
----------------------------------------  -----------  -----------  --------------------
example-site/basebackup/2016-04-28_0            15 MB        76 MB  2016-04-28T06:40:46Z
    metadata: {'compression-algorithm': 'snappy', 'start-wal-segment': '000000010000000000000026', 'pg-version': '90406'}

Selecting 'example-site/basebackup/2016-04-28_0' for restore
Basebackup complete.
You can start PostgreSQL by running pg_ctl -D restore-test start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start


The pghoard_restore command automatically chooses the latest available backup, downloads, unpacks (and decompresses and decrypts, when those options are used) it to the specified target directory. The end result will be a complete PostgreSQL data directory (e.g. something like /var/lib/postgresql/9.5/main or /var/lib/pgsql/data, depending on the distro), ready to be used by a PostgreSQL instance.

There are more command-line options for more detailed control over the restoration process, for example restoring to a particular point in time or transaction (PITR) or choosing whether the restored database will be acting as a master or a standby.

Backup encryption

In order to encrypt our backups, we'll need to create an encryption key pair. PGHoard provides a handy command for automatically creating a key pair and storing it into our configuration file:

$ pghoard_create_keys --key-id example --config pghoard.json
Saved new key_id 'example' for site 'example-site' in 'pghoard.json'
NOTE: The pghoard daemon does not require the 'private' key in its configuration file, it can be stored elsewhere to improve security


Note that in most cases you will want to extract the private key from the configuration file and store it safely elsewhere, away from the machine that makes the backups. The pghoard daemon only needs the encryption public key during normal operation. The private key is only required by the restore tool, and by the daemon while restoring a backup.

Uploading backups to the cloud

Sending backups to object storage in the cloud is simple: we just need the cloud's access credentials, and we'll modify the object_storage section of pghoard.json:

            "object_storage": {
                "aws_access_key_id": "XXX",
                "aws_secret_access_key": "XXX",
                "bucket_name": "backups",
                "region": "eu-central-1",
                "storage_type": "s3"
            }


Now when we restart pghoard, the backups are sent to AWS S3 in Frankfurt:

$ pghoard --short-log --config pghoard.json
DEBUG   Loading JSON config from: './pghoard.json', signal: None
INFO    pghoard initialized, own_hostname: 'ohmu1', cwd: '/home/mel/backup'
INFO    Started: ['/usr/bin/pg_receivexlog', '--status-interval', '1', '--verbose', '--directory', './metadata/example-site/xlog_incoming', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'"], running as PID: 8001
INFO    Creating a new basebackup for 'example-site' because there are currently none
INFO    Started: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--progress', '--verbose', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'", '--pgdata', './metadata/example-site/basebackup_incoming/2016-04-28_1'], running as PID: 8014, basebackup_location: './metadata/example-site/basebackup_incoming/2016-04-28_1/base.tar'
INFO    Ran: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--progress', '--verbose', '--dbname', "dbname='replication' host='127.0.0.1' port='5432' replication='true' user='backup'", '--pgdata', './metadata/example-site/basebackup_incoming/2016-04-28_1'], took: 0.350s to run, returncode: 0
INFO    Compressed and encrypted 16777216 byte file './metadata/example-site/xlog_incoming/000000010000000000000027' to 799445 bytes (4%), took: 0.406s
INFO    Compressed and encrypted 16777216 byte file './metadata/example-site/xlog_incoming/000000010000000000000028' to 797784 bytes (4%), took: 0.137s
INFO    Compressed and encrypted 80187904 byte file './metadata/example-site/basebackup_incoming/2016-04-28_1/base.tar' to 15982372 bytes (19%), took: 0.417s
INFO    'UPLOAD' transfer of key: 'example-site/xlog/000000010000000000000028', size: 797784, took 0.885s
INFO    'UPLOAD' transfer of key: 'example-site/xlog/000000010000000000000027', size: 799445, took 1.104s
INFO    'UPLOAD' transfer of key: 'example-site/basebackup/2016-04-28_1', size: 15982372, took 4.911s



The restore tool works the same way regardless of where the backups are stored:

$ pghoard_restore list-basebackups --config pghoard.json
Available 'example-site' basebackups:

Basebackup                                Backup size    Orig size  Start time
----------------------------------------  -----------  -----------  --------------------
example-site/basebackup/2016-04-28_1            15 MB        76 MB  2016-04-28T09:39:37Z



PostgreSQL 9.2+ and Python 3.3+ required

Today we released PGHoard version 1.2.0 with support for Python 3.3 and PostgreSQL 9.2, plus enhanced support for handling network outages. These features were driven by external users; at Aiven we always use the latest PostgreSQL versions (9.5.2 at the time of writing) and access object storages near the database machines.


PGHoard in Aiven.io

We're happy to talk more about PGHoard and help you set up your backups with it.  You can also sign up for a free trial of our Aiven.io PostgreSQL service where PGHoard will take care of your backups.


Cheers,
Team Aiven