Friday, February 20, 2009

Character and Byte Semantics in Oracle

Historically Oracle character data type column have been defined by using number of bytes. This is good as long as total number of characters equal to total number of bytes. When we maintain multilingual version of application, the database needs to set as a multi byte character set. For instance, the column data type length is VARCHAR2(10). In this case, we can store 10 characters for single byte character set. When we moved to multi byte character set, we can not store 10 characters. Oracle9i has solved this problem with the introduction of character and byte length semantics. Oracle9i introduced NLS_LENGTH_SEMANTICS init parameter to solve this issue. NLS_LENGTH_SEMANTICS enables you to create CHAR and VARCHAR2 columns using either byte or character length semantics.

There are three different ways, we can declare CHAR/VARCHAR2 data type.
Let us take VARCHAR2(10) as an example here.

1. VARCHAR2(10)
2. VARCHAR2(10 BYTE)
3. VARCHAR2(10 CHAR)

What is the difference between VARCHAR2(10), VARCHAR2(10 BYTE) & VARCHAR2(10 CHAR).

VARCHAR2(10 BYTE)

When we declare the data type as VARCHAR2(10 BYTE), oracle stores only 10 bytes of data, regardless of how many characters this represents. This is perfect when the database has only single byte character set. Since total number of charater is equal to total number of bytes.. So this case, oracle stores 10 Characters.. But when database handles multilingual version of application, then oracle stores multi byte characters. In this case, oracle can store only 5 character.

VARCHAR2(10 CHAR)

This allows the specified number of characters to be stored in the column regardless of number of bytes this equates to.. Oracle can store 10 character regardless of single byte character set or multi byte character set.

VARCHAR2(10)

Oracle stores 10 Character or 10 Bytes depends on NLS_LENGTH_SEMANTICS parameter value. if it sets to BYTE, then we can store 10 bytes(if it is mutibyte character set, then we can store 5 characters). The default value is BYTE. When this parameter is CHAR, then we can store 10 characters regardless of single byte character set or multi byte character set. When we move the database to multibyte characterset, we can change NLS_LENGTH_SEMANTICS paramter to CHAR to resolve the storage issue. Existing columns will not be affected when we change the value for this parameter.

The default character semantics of the database or session can be altered using the NLS_LENGTH_SEMANTICS parameter like:

SQL> alter system set nls_length_semantics=char;

System altered.

SQL> alter system set nls_length_semantics=byte;

System altered.

SQL> alter session set nls_length_semantics=char;

Session altered.

SQL> alter session set nls_length_semantics=byte;

Session altered.

Note : NLS_LENGTH_SEMANTICS does not apply to tables in SYS and SYSTEM. The data dictionary always uses byte semantics.