In my last blog post, I introduced the idea of a PostgreSQL array, and showed how we can insert data into a table using either the curly-brace {} syntax, or the ARRAY construction syntax. In this post, I want to talk about PostgreSQL indexes and length — what happens when we retrieve from indexes that exist (and don’t), how we can construct multidimensional arrays, and how we can ask PostgreSQL for the length of any dimension of our multidimensional array.
First of all, let’s talk about how you can retrieve data from arrays. if I still have my three-row table from last time:
[local]/reuven=# select id, stuff from foo; ┌────┬─────────────────────┐ │ id │ stuff │ ├────┼─────────────────────┤ │ 8 │ {abc} │ │ 9 │ {abc,def} │ │ 10 │ {abc,def,"ghi jkl"} │ └────┴─────────────────────┘
What if I want to get just the first value back from the “stuff” column? Well, then I have to take the first element of an array. Most modern languages start to number arrays with index 0; PostgreSQL, by contrast, starts to count them with 1. I can thus ask for just the first element of the “stuff” array from each row with:
[local]/reuven=# select id, stuff[1] from foo; ┌────┬───────┐ │ id │ stuff │ ├────┼───────┤ │ 8 │ abc │ │ 9 │ abc │ │ 10 │ abc │ └────┴───────┘
If we ask for an index that doesn’t exist, we get a NULL value back:
[local]/reuven=# select id, stuff[3] from foo; ┌────┬─────────┐ │ id │ stuff │ ├────┼─────────┤ │ 8 │ [null] │ │ 9 │ [null] │ │ 10 │ ghi jkl │ └────┴─────────┘
Note that I have configured my PostgreSQL client to show NULL values as “[null]”, by putting the following line in my ~/.psqlrc file:
\pset null [null]
Without the above line in your .psqlrc (or running that command in psql yourself manually), you might see blank space for row IDs 8 and 9.
Now, it’s pretty rare for me to pull out a particular value from a PostgreSQL array. Instead, I’m often finding out the lengths of the arrays on which I’m working. I can do this with the array_length function:
[local]/reuven=# select id, stuff, array_length(stuff, 1) from foo; ┌────┬─────────────────────┬──────────────┐ │ id │ stuff │ array_length │ ├────┼─────────────────────┼──────────────┤ │ 8 │ {abc} │ 1 │ │ 9 │ {abc,def} │ 2 │ │ 10 │ {abc,def,"ghi jkl"} │ 3 │ └────┴─────────────────────┴──────────────┘
Notice that array_length is a function that takes two parameters, an array and an integer. The array is what we want to measure. The integer describes which dimension should be measured in the array’s length. If you’re like me, and come from dynamic languages like Ruby and Python in which arrays (or lists) can be of any length, then you should realize here that PostgreSQL arrays can be multidimensional, but each inner array must be of the same length. So, for example, I can create a 2×3 array of integers with
[local]/reuven=# select ARRAY[ARRAY[1,1,1], ARRAY[2,2,2]]; ┌───────────────────┐ │ array │ ├───────────────────┤ │ {{1,1,1},{2,2,2}} │ └───────────────────┘
Trying to have a different inner dimension will not work:
[local]/reuven=# select ARRAY[ARRAY[1,1,1], ARRAY[2,2,2,2]]; ERROR: 2202E: multidimensional arrays must have array expressions with matching dimensions
Assuming that I have a legitimate array, I can get its length:
[local]/reuven=# select array_length(ARRAY[ARRAY[1,1,1], ARRAY[2,2,2]], 1); ┌──────────────┐ │ array_length │ ├──────────────┤ │ 2 │ └──────────────┘
Or I can get the length of the inner dimension:
[local]/reuven=# select array_length(ARRAY[ARRAY[1,1,1], ARRAY[2,2,2]], 2); ┌──────────────┐ │ array_length │ ├──────────────┤ │ 3 │ └──────────────┘
So, when retrieving our rows from the “foo” table:
[local]/reuven=# select id, stuff, array_length(stuff, 1) from foo; ┌────┬─────────────────────┬──────────────┐ │ id │ stuff │ array_length │ ├────┼─────────────────────┼──────────────┤ │ 8 │ {abc} │ 1 │ │ 9 │ {abc,def} │ 2 │ │ 10 │ {abc,def,"ghi jkl"} │ 3 │ └────┴─────────────────────┴──────────────┘
I can get the array length in a separate column, as in this example. Or I can even sort in descending order of the array length:
[local]/reuven=# select id, stuff, array_length(stuff, 1) from foo order by array_length(stuff, 1) desc; ┌────┬─────────────────────┬──────────────┐ │ id │ stuff │ array_length │ ├────┼─────────────────────┼──────────────┤ │ 10 │ {abc,def,"ghi jkl"} │ 3 │ │ 9 │ {abc,def} │ 2 │ │ 8 │ {abc} │ 1 │ └────┴─────────────────────┴──────────────┘
Notice that our ORDER BY clause has to repeat the function that we used to create the third column. Another way to do this is to declare an alias for the output of array_length, and then use the alias in ORDER BY:
select id, stuff, array_length(stuff, 1) len from foo order by len desc; ┌────┬─────────────────────┬─────┐ │ id │ stuff │ len │ ├────┼─────────────────────┼─────┤ │ 10 │ {abc,def,"ghi jkl"} │ 3 │ │ 9 │ {abc,def} │ 2 │ │ 8 │ {abc} │ 1 │ └────┴─────────────────────┴─────┘
Next time, we’ll look at how we can manipulate arrays using array functions.
[…] few blog posts in this series talking about PostgreSQL’s arrays — how to create them, query them, and even create them with the ARRAY function. But there are cases in which I would like to do […]
[…] I have described how you can create arrays and retrieve information from them — both the actual data stored in the array, and information about the array, such as its length. But the coolest trick, or set of tricks, that […]