I am working on a project that requires migrating existing Oracle BLOBs into files. The source environment is a shared Oracle 10gR2 server. I currently have a script that uses UTL_FILE, but the process is quite slow: it takes around 3 hours to extract 25 GB of sample data, and the actual data to be moved is on the order of 1 TB. I need help/advice on tuning this significantly.
Here is my process:
- Open a cursor to fetch the list of blob ids and names
- Start a loop to go through each blob
- Extract the blob using BLOB2FILE, a custom stored procedure (picked it up from a website and modified it slightly)
Here is the code:
CREATE OR REPLACE PROCEDURE BLOB2File(
  lngBlobID IN NUMBER,
  sFileName IN VARCHAR2,
  sDir      IN VARCHAR2)
AS
  iFileLen INTEGER;
  iLineLen INTEGER := 32000; -- max chunk size per utl_file.put_raw write
  vStart   NUMBER  := 1;
  vBlob    BLOB;
  l_output utl_file.file_type;
  my_vr    RAW(32000);
  iTmp     INTEGER;
BEGIN
  LOG_IT('Entered. Blob Id: ' || lngBlobID || ', File Name: ' || sFileName || ', Directory: ' || sDir);
  -- get the blob locator and its length in bytes
  SELECT blobData,
         dbms_lob.getlength(blobData)
    INTO vBlob,
         iFileLen
    FROM blobTable
   WHERE id = lngBlobID;
  LOG_IT('Acquired the blob. Blob size: ' || TO_CHAR(iFileLen));

  l_output := utl_file.fopen(sDir, sFileName, 'wb', iLineLen);
  vStart := 1;
  iTmp   := iFileLen;

  IF iFileLen < iLineLen THEN
    -- small enough for a single write
    utl_file.put_raw(l_output, vBlob);
    utl_file.fflush(l_output);
  ELSE
    -- write in 32000-byte pieces
    vStart := 1;
    -- <= rather than <, or the final byte is dropped whenever the blob
    -- length is one byte past a 32000-byte boundary
    WHILE vStart <= iFileLen AND iLineLen > 0
    LOOP
      dbms_lob.read(vBlob, iLineLen, vStart, my_vr);
      utl_file.put_raw(l_output, my_vr);
      utl_file.fflush(l_output);
      -- set the start position for the next chunk
      vStart := vStart + iLineLen;
      -- shrink the chunk size for the final, partial piece
      iTmp := iTmp - iLineLen;
      IF iTmp < iLineLen THEN
        iLineLen := iTmp;
      END IF;
    END LOOP;
  END IF;

  utl_file.fclose(l_output);
  LOG_IT('Exited');
EXCEPTION
  WHEN OTHERS THEN
    LOG_IT('**ERROR** ' || SQLERRM, SQLCODE, DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
END;
LOG_IT is a stored procedure that logs to a table; there should not be any significant hit there. I tried optimizing the first step (fetching the list of blob ids and names) by using BULK COLLECT instead of a normal FETCH, but it didn't yield any significant improvement.
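For reference, the driving loop is roughly this shape (a sketch; the fileName column and the MY_DIR directory are illustrative placeholders for my actual names):

DECLARE
  TYPE t_ids   IS TABLE OF blobTable.id%TYPE;
  TYPE t_names IS TABLE OF VARCHAR2(255);
  l_ids   t_ids;
  l_names t_names;
  CURSOR c IS
    SELECT id, fileName FROM blobTable;
BEGIN
  OPEN c;
  LOOP
    -- fetch the id/name pairs in batches of 100
    FETCH c BULK COLLECT INTO l_ids, l_names LIMIT 100;
    FOR i IN 1 .. l_ids.COUNT LOOP
      BLOB2File(l_ids(i), l_names(i), 'MY_DIR');
    END LOOP;
    EXIT WHEN c%NOTFOUND;
  END LOOP;
  CLOSE c;
END;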
Can anybody suggest any ideas for improvement or, even better, a more performant way of approaching this?
Assuming that your hardware is sufficient to handle far more than 8 GB/hour of sustained writes to sDir (and to handle reading a similar amount from blobTable, and to handle whatever other I/O your system needs), the simplest option would likely be to spawn a few parallel sessions, each of which calls this procedure. For example, if you wanted to run three jobs in parallel, each extracting one LOB, you could do something like this.
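A minimal sketch using DBMS_JOB (the blob IDs 1-3, the file names, and the MY_DIR directory are placeholders):

DECLARE
  l_jobno PLS_INTEGER;
BEGIN
  dbms_job.submit(l_jobno, 'begin BLOB2File(1, ''1.lob'', ''MY_DIR''); end;');
  dbms_job.submit(l_jobno, 'begin BLOB2File(2, ''2.lob'', ''MY_DIR''); end;');
  dbms_job.submit(l_jobno, 'begin BLOB2File(3, ''3.lob'', ''MY_DIR''); end;');
  COMMIT; -- the jobs only start running once the submitting session commits
END;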
In reality you probably don't want a separate job for every BLOB; instead, generate a smaller number of jobs and give each one a range of lngBlobID values to work on. The number of jobs Oracle will run at any one time is limited by the JOB_QUEUE_PROCESSES parameter, so you could submit thousands of jobs and just let Oracle limit how many run simultaneously.
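One way to implement the range idea (a sketch; BLOB2File_Range is a hypothetical wrapper, and the fileName column, the range size, and the MY_DIR directory are illustrative):

CREATE OR REPLACE PROCEDURE BLOB2File_Range(
  minId IN NUMBER,
  maxId IN NUMBER,
  sDir  IN VARCHAR2)
AS
BEGIN
  -- extract every blob whose id falls in [minId, maxId]
  FOR r IN (SELECT id, fileName
              FROM blobTable
             WHERE id BETWEEN minId AND maxId)
  LOOP
    BLOB2File(r.id, r.fileName, sDir);
  END LOOP;
END;
/

DECLARE
  l_jobno PLS_INTEGER;
BEGIN
  -- submit 100 jobs covering 10,000 ids each; JOB_QUEUE_PROCESSES caps
  -- how many of them actually run at once
  FOR i IN 0 .. 99 LOOP
    dbms_job.submit(
      l_jobno,
      'begin BLOB2File_Range(' || (i * 10000 + 1) || ', '
        || ((i + 1) * 10000) || ', ''MY_DIR''); end;');
  END LOOP;
  COMMIT;
END;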