Monday, April 20, 2009

SQL Loader Memory Parameters

There are a few command-line SQL*Loader parameters used for tuning direct path data loads. These parameters should be tuned only when there is a performance issue while loading data through SQL*Loader, since they directly impact client memory. This article is based on Oracle 10g, and these parameters might change in future Oracle versions.

The following parameters are used only for direct path data loads.

READSIZE is the size of the buffer used to read data from the input data file. The default size is 1 MB (1,048,576 bytes).

COLUMNARRAYROWS is the number of rows in a two-dimensional array used to hold field information; it holds field information for every row. The default is 5000 rows.

STREAMSIZE is the size of the buffer used to send data to the server for loading into the target table. The default is 256,000 bytes.
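
Here is a minimal sketch of how these three parameters are passed on the sqlldr command line for a direct path load; the username, control file name, and values below are only placeholders (these values simply double the defaults):

sqlldr userid=scott/tiger control=emp.ctl direct=true readsize=2097152 columnarrayrows=10000 streamsize=512000

The right values depend on the shape of the input data and how much client memory is available.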

MULTITHREADING is an option that allows concurrent execution of some SQL*Loader client operations with database server data loading. If multithreading is turned off, the SQL*Loader client will always wait until the server finishes a data load before continuing. If multithreading is turned on, some calls to the server will be executed by the client's "load thread" while the client's "main thread" continues converting data and building stream buffers.

When the MULTITHREADING command-line parameter is set to TRUE (the default on multi-CPU clients), SQL*Loader will overlap stream loading with stream conversion. If the main thread has converted data from a column array into a stream and the stream is filled before all data in the column array has been processed, the load thread will load that stream while the main thread continues to convert the column array into another stream buffer.
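
For example, if overlapping the load thread with the main thread is not desired, multithreading can be turned off explicitly (connection details are again placeholders):

sqlldr userid=scott/tiger control=emp.ctl direct=true multithreading=false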

DATE_CACHE is used in direct path loads. This parameter plays a role when date data from the input file is loaded into an Oracle table. When the input file contains a date value, SQL*Loader has to convert the date string into Oracle DATE format before loading it into the target table. When the input file contains many duplicate date strings, SQL*Loader keeps the converted date value in the date cache the first time it is seen and reuses it for every later occurrence of the same date string. This saves date conversion time when the same date string repeats in the input file.

If you know the number of duplicate date strings is small, or the number of unique date strings is very large, you can disable the date cache by specifying DATE_CACHE=0 on the command line. The default value for DATE_CACHE is 1000 entries. If the number of unique date values exceeds the size of the date cache, the cache is disabled. All the date columns share the same date cache.
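
As a sketch, either of the following could be used, depending on whether the input file has mostly unique dates (disable the cache) or a large but repetitive set of dates (enlarge the cache); the connection details and cache size here are only illustrative:

sqlldr userid=scott/tiger control=emp.ctl direct=true date_cache=0
sqlldr userid=scott/tiger control=emp.ctl direct=true date_cache=5000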

Please refer to the Oracle Database Utilities documentation if you need more information about DATE_CACHE.

Here is the process flow between these three memory parameters (READSIZE, COLUMNARRAYROWS, and STREAMSIZE) for a direct path data load.

Step1

SQL*Loader reads data from the input data file and loads it into a memory buffer. The size of this buffer is controlled by the READSIZE parameter.

Step2

SQL*Loader parses every logical record from Step 1 and isolates the fields in the COLUMNARRAY memory buffer based on the field definitions specified in the SQL*Loader control file (a sample control file is sketched after these steps). This process is called "field setting". The number of rows in the column array is controlled by the COLUMNARRAYROWS parameter.

Step3

Step 2 continues until the COLUMNARRAY buffer is full.

Step4

SQL*Loader converts the column array data into the STREAMBUFFER. The size of this buffer is controlled by the STREAMSIZE parameter. The stream buffer may become full before all of the data from the COLUMNARRAY buffer has been copied into it.

Step5

The STREAMBUFFER data is sent to the server.

Step6

The server parses the STREAMBUFFER data and loads it into the target table.

Step7

Once Step 6 is completed, the STREAMBUFFER is emptied.

Step8

If there is more data to be loaded from the COLUMNARRAY buffer, continue from Step 4. If there is no data left in the COLUMNARRAY buffer, continue with Step 9.

Step9

Once the COLUMNARRAY buffer data is completely loaded into the target table via the stream buffer, SQL*Loader looks for remaining data in the READBUFFER. If there is more data in the READBUFFER, continue from Step 2. Otherwise, continue with Step 10.

Step10

If all the records in the READBUFFER have been processed, load more records from the input data file into the READBUFFER, which effectively restarts the cycle from Step 1.
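
For reference, a minimal control file of the kind Step 2 relies on for its field definitions might look like the following; the data file, table name, column names, date mask, and delimiter are only placeholders:

OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE 'emp_data.dat'
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ','
(empno, ename, hiredate DATE "YYYY-MM-DD", sal)

The same control file would then be invoked with sqlldr and the tuning parameters shown earlier.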

Note: If you need more information on these direct load tuning parameters, please refer to the Oracle Database Utilities documentation.

Thursday, April 2, 2009

Integer Versus Number data type

What is the difference between INTEGER and NUMBER? When should we use NUMBER, and when should we use INTEGER? I just wanted to record my comments here.

NUMBER stores a value exactly as it was entered; its scale can range from -84 to 127. INTEGER, on the other hand, rounds the value to a whole number, because its scale is 0. INTEGER is equivalent to NUMBER(38,0). In other words, INTEGER is a constrained NUMBER: any decimal places are rounded away, while NUMBER is not constrained.

Inserting 12.2 into an INTEGER column => 12
Inserting 12.5 into an INTEGER column => 13
Inserting 12.9 into an INTEGER column => 13
Inserting 12.4 into an INTEGER column => 12
Inserting 12.2 into a NUMBER column => 12.2
Inserting 12.5 into a NUMBER column => 12.5
Inserting 12.9 into a NUMBER column => 12.9
Inserting 12.4 into a NUMBER column => 12.4
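
A quick sketch to verify this behavior from SQL*Plus; the table and column names are only placeholders:

CREATE TABLE num_vs_int_test (i INTEGER, n NUMBER);

INSERT INTO num_vs_int_test VALUES (12.5, 12.5);
INSERT INTO num_vs_int_test VALUES (12.4, 12.4);

-- The INTEGER column rounds to a whole number; the NUMBER column keeps the value as entered.
SELECT i, n FROM num_vs_int_test;

-- Expected result, based on the rounding described above:
--   I    N
--   13   12.5
--   12   12.4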

INTEGER is always slower than NUMBER, since INTEGER is a NUMBER with an added constraint, and it takes additional CPU cycles to enforce that constraint. I have never observed a noticeable difference myself, but there might be one when loading several million records into an INTEGER column. If we need to ensure that the input consists of whole numbers, then INTEGER is the best option; otherwise, we can stick with the NUMBER data type.