I am using Google BigQuery and executing some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test.com'). I am just checking whether the email exists in the table.
The table "emails" is empty for now. But still the PHP script takes around 4 minutes to check 175 emails on an empty table .. As I wish in future the table will be filled and will have 500 000 mails then I guess the request time will be longer.
Is that normal ? Or are there any ideas/solutions to improve the checking time ?
(P.S.: The table "emails" contains only 8 columns, all of string type.)
Thank you!
Here is sample code in PHP to do streaming inserts, using the official https://github.com/google/google-api-php-client:
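(A minimal sketch, assuming the v1 class names of the google-api-php-client; the autoload path, project and dataset IDs, service-account address, and key-file path are placeholders to replace.)

<?php
require_once 'google-api-php-client/autoload.php'; // adjust to wherever the library is installed

// Authenticate with a service account (address and key path are placeholders).
$client = new Google_Client();
$client->setAssertionCredentials(new Google_Auth_AssertionCredentials(
    'your-service-account@developer.gserviceaccount.com',
    array(Google_Service_Bigquery::BIGQUERY),
    file_get_contents('/path/to/privatekey.p12')
));

$bigquery = new Google_Service_Bigquery($client);

// Build one insertAll row per e-mail.
$rows = array();
foreach (array('mail1@test.com', 'mail2@test.com') as $email) {
    $row = new Google_Service_Bigquery_TableDataInsertAllRequestRows();
    $row->setJson(array('email' => $email));
    $row->setInsertId($email); // lets BigQuery de-duplicate retried inserts
    $rows[] = $row;
}

$request = new Google_Service_Bigquery_TableDataInsertAllRequest();
$request->setRows($rows);

// Stream the rows into your_dataset.emails (project and dataset IDs are placeholders).
$response = $bigquery->tabledata->insertAll('your-project-id', 'your_dataset', 'emails', $request);
var_dump($response->getInsertErrors()); // empty means all rows were accepted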
If you are just checking for the existence of a value, consider using
SELECT COUNT(*) FROM emails WHERE email='mail@test.com'
instead. This only requires reading a single field, so it will cost less and be marginally faster on large tables.
And as Pentium10 suggested, consider combining multiple lookups into a single query. You could do this like:
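(A sketch; the addresses and the size of the batch are purely illustrative.)

-- One query checks the whole batch of addresses at once.
SELECT COUNT(*)
FROM emails
WHERE email IN ('mail1@test.com', 'mail2@test.com', 'mail3@test.com')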
You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute since it only requires a scan of a single column in one pass.
Alternately, if you wanted the e-mails returned one per row, you could do something a little bit fancier, like:
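(Another sketch with the same illustrative addresses.)

-- Each address that exists comes back as its own row;
-- addresses that are absent simply don't appear in the output.
SELECT email, COUNT(*) AS matches
FROM emails
WHERE email IN ('mail1@test.com', 'mail2@test.com', 'mail3@test.com')
GROUP BY email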
As a further optimization, you could do it as a LEFT JOIN:
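(A sketch; it assumes the interesting_emails helper table described just below, and the "found" column alias is my own naming.)

SELECT
  i.email AS email,
  e.email IS NOT NULL AS found
FROM interesting_emails i
LEFT JOIN emails e
ON i.email = e.email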
If the interesting_emails table held the list of e-mails you wanted to check, like:
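(the addresses below are purely illustrative)

email
mail1@test.com
mail2@test.com
mail3@test.com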
If the emails table contained only mail1@ and mail2@, then you'd get back as results:
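(continuing the illustrative values and the "found" alias from the sketch above)

email            found
mail1@test.com   true
mail2@test.com   true
mail3@test.com   false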
The advantage of doing it this way is that it will scale up to billions of e-mails if needed (when the number gets large, you might consider using a JOIN EACH instead of a JOIN).
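(For reference, a sketch of the same query with the legacy-SQL EACH modifier; everything else is unchanged from the LEFT JOIN sketch above.)

SELECT
  i.email AS email,
  e.email IS NOT NULL AS found
FROM interesting_emails i
LEFT JOIN EACH emails e
ON i.email = e.email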