Turning PostgreSQL rows into arrays

May 23, 2014 . By Reuven

So far in this series, I have described how you can create arrays and retrieve information from them — both the actual data stored in the array, and information about the array, such as its length. But the coolest trick, or set of tricks, that I use in PostgreSQL is the ability to turn arrays into rows, and vice versa.  This is where, in my mind, arrays start to get truly powerful, and allow us to do all sorts of things that would otherwise be quite difficult, or require another language.

For example, let’s say that I want to keep track of my appointments in a database.  I create an Appointments table:

CREATE TABLE Appointments (id SERIAL,
 meeting_at TIMESTAMP,
 description TEXT
 );

And then I enter information into this table, such that I have the following appointments:

[local]/reuven=# select * from appointments order by meeting_at;
 +----+---------------------+---------------+
 | id | meeting_at          | description   |
 +----+---------------------+---------------+
 | 5 | 2014-05-22 12:00:00  | Dissertation  |
 | 4 | 2014-05-22 14:00:00  | Devel meeting |
 | 1 | 2014-05-23 15:00:00  | Dissertation  |
 | 2 | 2014-05-23 17:00:00  | Dinner        |
 | 3 | 2014-05-23 23:00:00  | Sleep         |
 +----+---------------------+---------------+
 (5 rows)

If I want to find out what I’m doing on May 23rd, I can simply say:

[local]/reuven=# select * from appointments where meeting_at::date = '2014-may-23';
 +----+---------------------+--------------+
 | id | meeting_at          | description  |
 +----+---------------------+--------------+
 | 1 | 2014-05-23 15:00:00  | Dissertation |
 | 2 | 2014-05-23 17:00:00  | Dinner       |
 | 3 | 2014-05-23 23:00:00  | Sleep        |
 +----+---------------------+--------------+

Notice that in the above example, I’m using PostgreSQL’s casting operator (::) to turn the meeting_at column, which contains a TIMESTAMP value, into a DATE value.  Without that,  the comparison would have been at midnight. So far, this isn’t very exciting or surprising.  But let’s say that I want to get all of the descriptions of today’s appointments in a comma-separated list.

Normally, I would need to retrieve all of the rows, and then use another language to piece them together.  With PostgreSQL’s arrays, though, we can do this all in the database — which executes much more quickly, among other things. The way to accomplish this is with the ARRAY function.  Now, don’t confuse the ARRAY[] constructor with the ARRAY function, although it’s hard not to.  (To be honest, I’m not sure how PostgreSQL can tell the difference; it might be the square brackets vs. parentheses that tell it to do different things.)

The ARRAY function is built into PostgreSQL, and turns a set of rows into an array.  That is, ARRAY expects to get rows as its input, and returns a single row, containing a single array, as its output. Let’s start with a set of rows, containing the descriptions of today’s appointments.  I’m going to use PostgreSQL’s concatenation operator (||) to combine the time (using the ::time cast) of the appointment with the appointment itself.  My query looks like:

SELECT meeting_at::time || ' ' || description
FROM Appointments
WHERE meeting_at::date = '2014-may-23'
ORDER BY meeting_at;

The result I get is:

 +-----------------------+
 | ?column?              |
 +-----------------------+
 | 15:00:00 Dissertation |
 | 17:00:00 Dinner       |
 | 23:00:00 Sleep        |
 +-----------------------+

Now that I have my rows, I can turn them into an array.  Now, the ARRAY function needs to be invoked via a SELECT.  Thus, using ARRAY means that we’re using a subselect.  The inner SELECT is what we did above.  The outer one is just our call to ARRAY:

SELECT ARRAY(SELECT meeting_at::time || ' ' || description 
FROM Appointments 
WHERE meeting_at::date = '2014-may-23'
ORDER BY meeting_at);

And sure enough, we get a one-row, one-column result:

 +--------------------------------------------------------------+
 | array                                                        |
 +--------------------------------------------------------------+
 | {"15:00:00 Dissertation","17:00:00 Dinner","23:00:00 Sleep"} |
 +--------------------------------------------------------------+

You can see that our three rows have been turned into three elements of the array.  I do this all of the time when creating reports or summaries of other data: I write a view or CTE that grabs the data from the rows that are of interest to me, and stuffs that data into an array.  I can then check the array’s contents, length, or anything else that might be useful in my analysis. But earlier, I said that I want to have a comma-separated list of the appointment.  I currently have them in an array.  Fortunately, PostgreSQL’s array_to_string can save the day, acting similarly to the “join” method that I know and love from such languages as Ruby and Python.  This function, wrapped around a call to ARRAY, wrapped around our subselect, will then look like:

SELECT array_to_string( 
  ARRAY(SELECT meeting_at::time || ' ' || description
FROM Appointments
WHERE meeting_at::date = '2014-may-23'ORDER BY meeting_at), ', ');

And sure enough, the result is as follows:

 +--------------------------------------------------------+
 | array_to_string                                        |
 +--------------------------------------------------------+
 | 15:00:00 Dissertation, 17:00:00 Dinner, 23:00:00 Sleep |
 +--------------------------------------------------------+

Not bad, for one query; we’ve taken our three rows, and turned them into a string, without having to use anything other than SQL!  There’s just one final problem here that I’d like to fix, namely that when you invoke a function as we did here, the function’s name becomes the name of the column.  I’d like to change that, so the column has a more reasonable name.  This is particularly important if you’re planning to read the column using a high-level language; your code will look awfully funny if it includes statements like:

puts row['array_to_string']

when it could instead read

puts row['appointments_for_today']

Here’s how I can use “AS” to change the name of the column, making it easier to read and work with:

SELECT array_to_string( 
  ARRAY(SELECT meeting_at::time || ' ' || description
FROM Appointments
WHERE meeting_at::date = '2014-may-23'ORDER BY meeting_at), ', ') 
AS appointments_for_today;

It is in these sorts of uses, to aggregate data and create reports, that I’ve found PostgreSQL’s arrays to be particularly useful and powerful. Next time, we’ll see how we can do the opposite, turning an array back into rows — and then, we’ll see how the combination of these conversions lets us perform all sorts of tricks.

Related Posts

Reminder: My “Intro to SQL” course is this Sunday!

Reminder: My “Intro to SQL” course is this Sunday!

Join the data revolution with my “Intro to SQL” course!

Join the data revolution with my “Intro to SQL” course!

Yes, you can master regular expressions!

Yes, you can master regular expressions!
  • What a hack man, it has been such a journey of browsing the internet, droping complex join alternatives, finding the array way, learning that you can’t have different data types in your arrays, and there getting here, just by chance, where the silver bullet finally was.
    Thanks for such enlightening article.

  • Narendar Reddy Madduru says:

    It Helps thanks

  • bingli224 says:

    Thanks a lot. This is what i’ve been looking for.

  • Thanks was looking everywhere for this.

  • It solved my problem exactly. Very well and simply explained. Thanks.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
    >