I have a database (Postgres 9.3.5) of 80 millions records, the insert
query below fails with:
ERROR: invalid input syntax for integer: ""
INSERT INTO DISCOGS.TRACK_DURATION
SELECT
track_id,
duration,
hours_as_seconds + minutes_as_seconds + seconds as total_seconds
FROM (
select
track_id,
duration,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 1))::bigint * 60 * 60
ELSE 0
END as hours_as_seconds,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 2))::bigint * 60
WHEN duration like '%:%' THEN (split_part(duration, ':', 1))::bigint * 60
ELSE 0
END as minutes_as_seconds,
CASE
WHEN duration like '%:%:%' THEN (split_part(duration, ':', 3))::bigint
WHEN duration like '%:%' THEN (split_part(duration, ':', 2))::bigint
ELSE 0
END as seconds
from discogs.track t1
where release_id < 10000000
and t1.duration!='' and t1.duration is not null
and t1.position!=''
) as s1
I can use the where release_id
to limit the number of records checked and with lower values its fine, so its bad data , but with so many records how do I find the problem data. Note I'm already filtering out values where duration is empty string and I also found a few records with bad data (such as %%%%) that I have changed but it is still failing.