Optimize escape JSON in PostgreSQL 9.0

2019-03-29 11:32发布

问题:

I'm currently using this JSON escaping function in PostgreSQL as a stand in for future native JSON support. While it works, it's also limiting our systems performance. How can I go about optimizing it? Maybe some kind of lookup array?

CREATE OR REPLACE FUNCTION escape_json(i_text TEXT) 
RETURNS TEXT AS
$body$                                                  
DECLARE
  idx INTEGER;
  text_len INTEGER;   
  cur_char_unicode INTEGER;
  rtn_value TEXT := i_text;
BEGIN
  -- $Rev: $ --
  text_len = LENGTH(rtn_value);
  idx = 1; 

  WHILE (idx <= text_len) LOOP
    cur_char_unicode = ASCII(SUBSTR(rtn_value, idx, 1));

    IF cur_char_unicode > 255 THEN
      rtn_value = OVERLAY(rtn_value PLACING (E'\\u' || LPAD(UPPER(TO_HEX(cur_char_unicode)),4,'0')) FROM idx FOR 1);
      idx = idx + 5;
      text_len = text_len + 5;
    ELSE
      /* is the current character one of the following: " \ / bs ff nl cr tab */
      IF cur_char_unicode IN (34, 92, 47, 8, 12, 10, 13, 9) THEN
        rtn_value = OVERLAY(rtn_value PLACING (E'\\' || (CASE cur_char_unicode
                                                         WHEN 34 THEN '"'
                                                         WHEN 92 THEN E'\\'
                                                         WHEN 47 THEN '/'
                                                         WHEN  8 THEN 'b'
                                                         WHEN 12 THEN 'f'
                                                         WHEN 10 THEN 'n'
                                                         WHEN 13 THEN 'r'
                                                         WHEN  9 THEN 't'
                                                          END)
                                        )
                                FROM idx FOR 1);

        idx = idx + 1;
        text_len = text_len + 1;
      END IF;
    END IF;

    idx = idx + 1;
  END LOOP;                   

  RETURN rtn_value;
END;
$body$
LANGUAGE plpgsql;

回答1:

All my approaches boil down to "do it some other way":

  • Write it in some other language, e.g. use pl/perl, pl/python, pl/ruby
  • Write a wrapper round some external JSON library written in C
  • Do the JSON escaping in the client rather than in the query (assuming your client has some good JSON escaping support)

In my experience pl/pgsql isn't fast at this sort of thing- its strength is in its integral support for exchanging data with the database, not as a general-purpose programming language.

Example:

create or replace function escape_json_perl(text) returns text 
  strict immutable
  language plperlu as $$
    use JSON;
    return JSON->new->allow_nonref->encode($_[0]);
  $$;

A quick test suggests this is on the order of 15x faster than the plpgsql function (although it returns quotes around the value which you probably want to strip off)



回答2:

Confession: I am the Google Summer of Code 2010 student who was going to try to bring JSON support to PostgreSQL 9.1. Although my code was fairly feature-complete , it wasn't completely ready for upstream, and the PostgreSQL development community was looking at some alternative implementations. However, with spring break coming up, I'm hoping to finish my rewrite and give it a final push this week.

In the mean time, you can download and install the work-in-progress JSON data type module, which should work on PostgreSQL 8.4.0 and up. It is a PGXS module, so you can compile and install it without having to compile all of PostgreSQL. However, you will need the PostgreSQL server development headers.

Installation goes something like this:

git clone git://git.postgresql.org/git/json-datatype.git
cd json-datatype/
USE_PGXS=1 make
sudo USE_PGXS=1 make install
psql -f json.sql <DBNAME1> # requires database superuser privileges

Although the build and install only needs to be done once, json.sql needs to be run on every database you plan to use the JSON data type on.

With that installed, you can now run:

=> SELECT to_json(E'"quotes and \n newlines"\n'::TEXT);
            to_json             
--------------------------------
 "\"quotes and \n newlines\"\n"
(1 row)

Note that this does not escape non-ASCII characters.



回答3:

I have found a PostgreSQL function implemented in C here : http://code.google.com/p/pg-to-json-serializer/

I have not compared it with your PLSQL method but it should be faster than any interpreted language.

Another one : http://miketeo.net/wp/index.php/projects/json-functions-for-postgresql