Today’s post is (super) short: we’re going to look at how we can use the Postgres COPY command within an Ecto migration to populate a lookup table.
Motivation
Why would you want to do this? Well, to be honest, often you wouldn’t. If you have a lookup table with a modest amount of data, say a list of product categories, explicitly inserting the data within a migration works fine, e.g.:
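Something along these lines (a minimal sketch; the migration module, table, and category names are made up for illustration):

```elixir
defmodule MyApp.Repo.Migrations.PopulateProductCategories do
  use Ecto.Migration

  def up do
    # A handful of known values, inserted with plain SQL
    # (table and values are illustrative).
    execute """
    INSERT INTO product_categories (name)
    VALUES ('Books'), ('Clothing'), ('Electronics')
    """
  end

  def down do
    execute "DELETE FROM product_categories"
  end
end
```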
If your source data is in a file, you could still do something like the above. You would just read the data from the file instead of specifying it in the migration.
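For instance, a rough sketch along these lines, assuming a hypothetical one-column CSV of category names:

```elixir
def up do
  # Build one row map per line of the (hypothetical) CSV file,
  # then insert them in a single batch.
  rows =
    "priv/repo/data/product_categories.csv"
    |> File.stream!()
    |> Enum.map(fn line -> %{name: String.trim(line)} end)

  repo().insert_all("product_categories", rows)
end
```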
What if, however, you are dealing with a significant amount of data? Recently I needed to insert ~900k lookup records into a table. Inserting that many records one by one is going to take forever!
Using COPY within a migration
Postgres has a COPY command designed exactly for this purpose: importing or exporting large amounts of data to or from a file.
And luckily, it is pretty easy to use within a migration:
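Here is a sketch of the pattern (the table, columns, and CSV path are made up for illustration; `Ecto.Adapters.SQL.stream/2` and the surrounding transaction are the important parts):

```elixir
defmodule MyApp.Repo.Migrations.ImportLookupData do
  use Ecto.Migration

  def up do
    # COPY ... FROM STDIN reads the raw file contents off the connection,
    # so we stream the file into an Ecto SQL stream inside a transaction.
    # Table, columns and file path are assumptions for this sketch.
    copy = "COPY lookup_values (code, label) FROM STDIN WITH (FORMAT csv)"

    repo().transaction(
      fn ->
        "priv/repo/data/lookup_values.csv"
        |> File.stream!()
        |> Enum.into(Ecto.Adapters.SQL.stream(repo(), copy))
      end,
      # A large import can easily exceed the default query timeout.
      timeout: :infinity
    )
  end

  def down do
    execute "DELETE FROM lookup_values"
  end
end
```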
With the above, all we are doing is specifying the COPY command in our migration; we can use COPY just as we would with Postgres directly. By making use of the COPY command and an Ecto stream, we speed up the migration significantly.
When dealing with a large amount of data this is still going to be a slow operation: it takes around 5 minutes for the 900k rows of data I am dealing with to get into the database. That’s not great, but it is much better than using individual Repo.insert statements… which for this amount of data would not be realistic.
Summary
So as can be seen, it’s pretty easy to insert a large amount of data directly within a migration. I’m still not sure how I feel about it, however. It might make sense to create external scripts and keep the data insertion out of the migration. That way, running the migration stays quick… the downside being you now have more steps to get your database up and populated with data. Like most things in life, it’s a tradeoff!
Thanks for reading and I hope you enjoyed the post!