Knex with PostgreSQL: select query performance degrades extremely under many parallel requests

Published 2019-06-15 18:39

Question:

In brief

I'm developing a game (a dream project) and my backend stack is Node.js and PostgreSQL (9.6) with Knex. I hold all player data there and need to request it frequently. One of the requests needs to make 10 simple selects to pull the data, and this is where the problem starts: the queries are quite fast (~1 ms) if the server handles only one request at a time. But if the server handles many requests in parallel (100-400), query execution time degrades dramatically (up to several seconds per query).

Details

To be more objective, I will describe the server's request goal, the select queries, and the results I received.

About system

I'm running the Node code on a DigitalOcean droplet with 4 CPUs and 8 GB of RAM, and Postgres on a second droplet with the same configuration.

About request

The request needs to perform some gameplay actions, for which it selects data for two players from the DB.

DDL

Players' data is represented by 5 tables:

CREATE TABLE public.player_profile(
    id integer NOT NULL DEFAULT nextval('player_profile_id_seq'::regclass),
    public_data integer NOT NULL,
    private_data integer NOT NULL,
    current_active_deck_num smallint NOT NULL DEFAULT '0'::smallint,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_profile_pkey PRIMARY KEY (id),
    CONSTRAINT player_profile_private_data_foreign FOREIGN KEY (private_data)
        REFERENCES public.profile_private_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT player_profile_public_data_foreign FOREIGN KEY (public_data)
        REFERENCES public.profile_public_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_data(
    id integer NOT NULL DEFAULT nextval('player_character_data_id_seq'::regclass),
    owner_player integer NOT NULL,
    character_id integer NOT NULL,
    experience_counter integer NOT NULL,
    level_counter integer NOT NULL,
    character_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_character_data_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_data_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_cards(
    id integer NOT NULL DEFAULT nextval('player_cards_id_seq'::regclass),
    card_id integer NOT NULL,
    owner_player integer NOT NULL,
    card_level integer NOT NULL,
    first_deck boolean NOT NULL,
    consumables integer NOT NULL,
    second_deck boolean NOT NULL DEFAULT false,
    third_deck boolean NOT NULL DEFAULT false,
    quality character varying(10) COLLATE pg_catalog."default" NOT NULL DEFAULT 'none'::character varying,
    CONSTRAINT player_cards_pkey PRIMARY KEY (id),
    CONSTRAINT player_cards_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_equipment(
    id integer NOT NULL DEFAULT nextval('player_character_equipment_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    item_level integer NOT NULL,
    item_type character varying(20) COLLATE pg_catalog."default" NOT NULL,
    is_equipped boolean NOT NULL,
    slot_num integer,
    CONSTRAINT player_character_equipment_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_equipment_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_runes(
    id integer NOT NULL DEFAULT nextval('player_character_runes_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    slot_num integer,
    decay_start_timestamp bigint,
    CONSTRAINT player_character_runes_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_runes_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

With these partial indexes:

knex.raw('create index "player_cards_owner_player_first_deck_index" on "player_cards"("owner_player") WHERE first_deck = TRUE');
knex.raw('create index "player_cards_owner_player_second_deck_index" on "player_cards"("owner_player") WHERE second_deck = TRUE');
knex.raw('create index "player_cards_owner_player_third_deck_index" on "player_cards"("owner_player") WHERE third_deck = TRUE');
knex.raw('create index "player_character_equipment_owner_character_is_equipped_index" on "player_character_equipment" ("owner_character") WHERE is_equipped = TRUE');
knex.raw('create index "player_character_runes_owner_character_slot_num_not_null_index" on "player_character_runes" ("owner_character") WHERE slot_num IS NOT NULL');

The code

First query

async.parallel([
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', playerId)
        .limit(1)
        .asCallback(cb),
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', enemyId)
        .limit(1)
        .asCallback(cb)
], callbackFn);

Second query

async.parallel([
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', playerId)
        .asCallback(cb),
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', enemyId)
        .asCallback(cb)
], callbackFn);

Third query

const playerQ = { first_deck: true };
const enemyQ = { first_deck: true };
const MAX_CARDS_IN_DECK = 5;
async.parallel([
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', playerId)
        .andWhere(playerQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb),
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', enemyId)
        .andWhere(enemyQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb)
], callbackFn);

Fourth query

const MAX_EQUIPPED_ITEMS = 3;
async.parallel([
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', playerCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb),
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', enemyCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb)
], callbackFn);

Fifth query

const runeSlotsMax = 3;
async.parallel([
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', playerCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb),
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', enemyCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb)
], callbackFn);
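
All of the snippets above run against a tx handle whose creation the question never shows; presumably it is a Knex transaction along these lines (a minimal sketch, assuming knex.transaction is used; the connection setup and IDs are placeholders):

const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL }); // assumed setup
const playerId = 1; // placeholder

knex.transaction(tx => {
    // Everything issued through `tx` runs on a single pooled connection
    // until the returned promise resolves (commit) or rejects (rollback).
    return tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', playerId)
        .limit(1);
    // ...the remaining selects from the sections above would go here...
}).then(rows => console.log(rows))
  .catch(err => console.error(err));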

EXPLAIN(ANALYZE)

Only index scans, with <1 ms planning and execution times. I can publish the plans if needed (omitted here to save space); one way to reproduce them is sketched below.
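
To reproduce those plans, EXPLAIN can be passed through knex.raw (a sketch; EXPLAIN (ANALYZE, BUFFERS) is standard PostgreSQL syntax, and the bindings mirror the first select):

// Reproduce the plan for the first select. knex.raw() passes the
// statement through verbatim; `?` placeholders are bound positionally.
knex.raw(
    'EXPLAIN (ANALYZE, BUFFERS) SELECT character_id, id ' +
    'FROM player_character_data WHERE owner_player = ? LIMIT 1',
    [playerId]
).then(res => {
    // With the pg client, each plan line comes back as { 'QUERY PLAN': '...' }
    res.rows.forEach(r => console.log(r['QUERY PLAN']));
});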

Time itself

(total is the number of requests; min/max/avg/median are response times in ms)

  • 4 concurrent requests: { "total": 300, "avg": 1.81, "median": 2, "min": 1, "max": 6 }
  • 400 concurrent requests:
    • { "total": 300, "avg": 209.57666666666665, "median": 176, "min": 9, "max": 1683 } - first select
    • { "total": 300, "avg": 2105.9, "median": 2005, "min": 1563, "max": 4074 } - last select

I tried logging queries that took more than 100 ms: nothing showed up. I also tried increasing the connection pool size up to the number of parallel requests (see the configuration sketch below): nothing either.
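
For reference, this is how the pool is sized in Knex (a sketch with illustrative numbers; the default is min: 2, max: 10, so under 400 truly parallel requests queries would queue for a free connection, which alone inflates measured response times):

const knex = require('knex')({
    client: 'pg',
    connection: { host: 'db-host', user: 'app', database: 'game' }, // placeholder credentials
    // tarn.js pool under the hood; values here are illustrative, not a recommendation
    pool: { min: 2, max: 100 }
});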

Answer 1:

A solution was found quickly, but I forgot to respond here (I was busy, sorry).

There was no magic in the slow queries, only the nature of Node's event loop:

  • All similar requests were made in parallel;
  • I had a code block with a very slow execution time (~150-200 ms);
  • With ~800 parallel requests, a 150 ms code block turns into ~10,000 ms of event-loop lag;
  • All you see is the appearance of slow queries, but it's actually the lag before the callback runs, not the database;

Conclusion: use pgBadger to detect slow queries and the isBusy module to detect event-loop lag. A minimal lag probe is sketched below.
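
Event-loop lag can also be measured without any module (a minimal sketch: schedule a timer at a fixed interval and check how late it actually fires; sustained lateness means the loop is blocked by synchronous work, not by the database):

const INTERVAL_MS = 100;
let last = Date.now();

setInterval(() => {
    const now = Date.now();
    const lag = now - last - INTERVAL_MS; // how late this tick fired
    last = now;
    if (lag > 50) {
        console.warn(`event loop lag: ${lag}ms`);
    }
}, INTERVAL_MS);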



Answer 2:

I can see three potential problems here:

  1. 400 concurrent requests are actually quite a lot, and your machine spec is nothing to get excited about. Maybe this has more to do with my MSSQL background, but I would imagine this is a case where you may need to beef up the hardware.
  2. The communication between the two servers should be quite quick, but may account for some of the latency you're seeing. One powerful server may be a better solution.
  3. I assume you have reasonable data volumes (400 concurrent connections should have a lot to store). Perhaps posting some of the actual generated SQL would be useful; a lot depends on the SQL Knex comes up with, and there may be available optimizations. Indexing comes to mind, but one would need to see the SQL to say for sure. (A sketch of how to capture the generated SQL follows this list.)
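
To capture what Knex actually sends to Postgres, one option is the query event or toSQL() (both are standard Knex API; the logging destination here is just an assumption):

// Log every statement the knex instance emits, with its bindings.
knex.on('query', ({ sql, bindings }) => {
    console.log('SQL:', sql, 'bindings:', bindings);
});

// Or inspect a single builder without executing it:
const { sql, bindings } = knex('player_cards')
    .select('card_id', 'card_level')
    .where('owner_player', 1) // placeholder id
    .toSQL();
console.log(sql, bindings);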

Your test doesn't seem to include the network latency from a client, so this may be an additional problem you have not yet accounted for.