How to export Django data faster than ever before

Use our open-source software to take advantage of PostgreSQL’s powerful COPY TO command

Today the California Civic Data Coalition released a new open-source tool that enables the Django web framework to more quickly export comma-delimited data.

Version 2.0 of django-postgres-copy, now available on the Python Package Index, extends Django’s database tools to support PostgreSQL’s powerful COPY TO command.

Found only in PostgreSQL, COPY TO can write out tables with millions of rows in a matter of seconds.

Our code, crafted by the Coalition’s Lead Developer James Gordon, makes using it this easy:

Person.objects.to_csv("/path/to/your/export.csv")

The custom to_csv method does it all. To start using it yourself, all you need to do is install our library and add a custom manager to your model.

from django.db import models
from postgres_copy import CopyManager


class Person(models.Model):
    first_name = models.CharField(max_length=500)
    last_name = models.CharField(max_length=500)
    objects = CopyManager()

Once that’s done you’re ready to roll. You can even export database queries that include filters, groups or other Django database tricks. For instance, this will work:

Person.objects.exclude(first_name='BEN').to_csv('/path/to/your/export.csv')

And so will something like this:

Person.objects.annotate(name_count=Count('first_name')).to_csv('/path/to/your/export.csv')

In cases where your model is connected to other tables with a foreign key, you can increase the number of fields exported by listing them out and calling in related tables using Django’s double underscore notation.

Person.objects.to_csv(
    '/path/to/your/export.csv',
    'first_name',
    'last_name',
    'hometown__name'
)

Why do you need this?

The Coalition invented this tool as part of its open-source quest to master CAL-ACCESS, the jumbled, dirty and difficult government database tracking money in California politics.

We are nearing the completion of a pipeline of Python code that downloads, extracts, cleans, loads, transforms and republishes the state’s raw data as easy-to-understand spreadsheets. This new wrapper for COPY TO allows our pipeline to quickly and clearly export a set of simplified flat files for end users.

What else can it do?

The library has long supported swiftly importing data files with PostgreSQL’s COPY command. Starting today, that old tool is easier to access with a new from_csv method on our custom manager. Code like the following can load millions of records in your database in a matter of seconds.

Person.objects.from_csv(
    # The source file
    "/path/to/your/import.csv",
    # A crosswalk of model fields to CSV headers.
    dict(first_name='FIRST_NAME', last_name='LAST_NAME')
)

What now?

If you’d like to try our tool out for yourself, you can install it with Python’s pip like so:

$ pip install django-postgres-copy

To learn more about how it works, visit the technical documentation. There you’ll find a more complete explanation and information about some fancier tricks not covered here, like the capability to transform and clean data on-the-fly as it’s loaded into the database.

Since it was first released in 2015, django-posgres-copy has drawn contributions from coders around the world, including some major improvements from users in other fields. If there are improvements you’d like to see, go get involved on our GitHub repository.