In the database, I have various alpha-numeric strings in the following format:
10_asdaasda
100_inkskabsjd
11_kancaascjas
45_aksndsialcn
22_dsdaskjca
100_skdnascbka
I want them to essentially be sorted by the number in front of the string and then the string name itself, but of course, characters are compared one by one and so the result of Order by name produces:
10_asdaasda
100_inkskabsjd
100_skdnascbka
11_kancaascjas
22_dsdaskjca
45_aksndsialcn
instead of the order I'd prefer:
10_asdaasda
11_kancaascjas
22_dsdaskjca
45_aksndsialcn
100_inkskabsjd
100_skdnascbka
Honestly, I would be fine if the strings were just sorted by the number in front. I'm not too familiar with PostgreSQL, so I wasn't sure what the best way to do this would be. I'd appreciate any help!
You can use regular expressions with substrings
There is a way to do it with an index over an expression. It wouldn't be my preferred solution (I would go for Brad's) but you can create an index on the following expression (there are more ways to do it):
Then you can search and order by
CAST(SPLIT_PART(columname, '_', 1) AS integer)
every time you need the number before the underline character, such as:You can do the same to the string part by creating an index on
SPLIT_PART(columname, '_', 2)
, and then sort accordingly as well.As I said, however, I find this solution very ugly. I would definitely go with two other columns (one for the number and one for the string), then maybe even removing the column you mention here.
You should add a new column to the database which is has numeric data type and on persisting a new record set it to the same value as the prefix on the string value you have.
Then you can create an index on the properly typed numeric column for sorting.
The ideal way would be to normalize your data and split the two components of the column into two individual columns. One of type
integer
, onetext
.With the current table, you can do something like demonstrated here:
The same
substring()
expressions can be used to split the column.The regular expressions are somewhat fault tolerant:
The first regex picks the longest numeric string from the left,
NULL
if no digits are found, so the cast tointeger
can't go wrong.The second regex picks the rest of the string from the first character that is not a digit or '_'.
If the underscore is unambiguous as separator anyway,
split_part()
is faster:Answer for your example