Character Set Scanner

Run csminst.sql on the database you want to scan. The script:
1 Creates a user named CSMIG.
2 Grants the necessary privileges to CSMIG.
3 Assigns a default tablespace to CSMIG.
4 Connects as CSMIG.
5 Creates the Character Set Scanner system tables in the CSMIG schema.

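If you need to locate the script:

find / -name 'csminst.sql' -print 2>/dev/null

It is normally in $ORACLE_HOME/rdbms/admin. To install it:

cd $ORACLE_HOME/rdbms/admin
sqlplus "system/manager as sysdba"
SQL> START csminst.sql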

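Then run the scan. This example scans the full database (FULL=Y) for conversion to UTF8 (TOCHAR=utf8), using a 10240-byte array fetch buffer (ARRAY) and 3 concurrent scan processes (PROCESS):

csscan system/manager full=y tochar=utf8 array=10240 process=3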

localhost: ls -l scan*
-rw-r--r--   1 oracle     dba            915 Oct 29 16:08 scan.err
-rw-r--r--   1 oracle     dba           9067 Oct 29 16:08 scan.out
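-rw-r--r--   1 oracle     dba           5194 Oct 29 16:08 scan.txt

scan.txt is the Database Scan Summary Report, scan.err is the Individual Exception Report, and scan.out is the log of the scanner's screen output. The base file name comes from the LOG parameter, which defaults to scan.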

Notes:

There are three approaches to converting data from one character set to another:

1 Full Export and Import
2 ALTER DATABASE CHARACTER SET
3 ALTER DATABASE CHARACTER SET and Selective Imports

ALTER DATABASE CHARACTER SET is the fastest way to migrate, but can only be used if the new character set is a strict superset of the current character set.

The new character set is a strict superset of the current character set if:
1 Every character in the current character set is available in the new character set.
2 Every character in the current character set has the same code point value in the new character set.

For example, US7ASCII is a strict subset of many character sets.

Another restriction of the ALTER DATABASE CHARACTER SET statement is that it can be used only when the character set migration is between two single-byte character sets or between two multibyte character sets. If the planned character set migration is from a single-byte character set to a multibyte character set, then use the Export and Import utilities.
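The two rules above (strict superset, and no switch between single-byte and multibyte) can be sketched as a pre-check in Python. This is a minimal sketch under stated assumptions: Python codec names stand in for Oracle character sets (ascii for US7ASCII, latin-1 for WE8ISO8859P1, cp1252 for WE8MSWIN1252, utf-8 for UTF8), the MULTIBYTE set is a hypothetical hand-written classification, and the superset test only handles single-byte current character sets. It is not the Character Set Scanner's own logic.

```python
# Sketch only: Python codecs approximate Oracle character sets.
#   US7ASCII ~ "ascii", WE8ISO8859P1 ~ "latin-1",
#   WE8MSWIN1252 ~ "cp1252", UTF8 ~ "utf-8"

# Hypothetical classification used only for this sketch.
MULTIBYTE = {"utf-8"}


def is_strict_superset(current: str, new: str) -> bool:
    """True if every character of a single-byte `current` character set
    exists in `new` AND is encoded there with the same byte value."""
    for value in range(256):
        raw = bytes([value])
        try:
            char = raw.decode(current)
        except UnicodeDecodeError:
            continue  # byte value not defined in the current character set
        try:
            if char.encode(new) != raw:
                return False  # character exists, but at a different code point
        except UnicodeEncodeError:
            return False  # character missing from the new character set
    return True


def can_use_alter_database(current: str, new: str) -> bool:
    """Apply both rules: same single-byte/multibyte class, strict superset."""
    same_class = (current in MULTIBYTE) == (new in MULTIBYTE)
    return same_class and is_strict_superset(current, new)


if __name__ == "__main__":
    pairs = [("ascii", "latin-1"), ("latin-1", "utf-8"),
             ("cp1252", "latin-1"), ("ascii", "utf-8")]
    for cur, new in pairs:
        print(f"{cur} -> {new}: superset={is_strict_superset(cur, new)}, "
              f"ALTER allowed={can_use_alter_database(cur, new)}")
```

Note how latin-1 fails against utf-8 even though every Latin-1 character exists in Unicode: byte 0xE9 is é in Latin-1, but é is C3 A9 in UTF-8, so the code point rule is broken. Under the rules as written in these notes, ascii to utf-8 passes the superset test but is still rejected by the single-byte/multibyte restriction.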

Steps:

1. Shut down the database, using either a SHUTDOWN IMMEDIATE or a SHUTDOWN NORMAL statement.

2. Take a full backup of the database, because the ALTER DATABASE CHARACTER SET statement cannot be rolled back.

3. Run the following statements in SQL*Plus, connected AS SYSDBA:
   STARTUP MOUNT;
   ALTER SYSTEM ENABLE RESTRICTED SESSION;
   ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
   ALTER SYSTEM SET AQ_TM_PROCESSES=0;
   ALTER DATABASE OPEN;
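   ALTER DATABASE CHARACTER SET new_character_set;

4. Shut down and restart the database so that the new character set takes effect:
   SHUTDOWN IMMEDIATE;
   STARTUP;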

Questions?

Should the NLS_LANG Setting Match the Database Character Set?

The NLS_LANG character set should reflect the setting of the operating system
client. For example, if the database character set is UTF8 and the client has a
Windows operating system, you should not set UTF8 as the client character set
because there are no UTF8 WIN32 clients. Instead, the NLS_LANG setting should
reflect the code page of the client.

NLS_LANG is set as a local environment variable on UNIX platforms.

NLS_LANG is set in the registry on Windows platforms. For example, on an English
Windows client, the code page is 1252 (Oracle character set WE8MSWIN1252). An
appropriate setting for NLS_LANG is AMERICAN_AMERICA.WE8MSWIN1252.
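An NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, and each part may be omitted (for example _AMERICA or .UTF8). A minimal Python sketch that splits a value into its parts; parse_nls_lang is a hypothetical helper for illustration, not an Oracle API, and the fallback value is simply the Windows example above.

```python
import os


def parse_nls_lang(value: str) -> tuple[str, str, str]:
    """Split LANGUAGE_TERRITORY.CHARSET into (language, territory, charset).
    Missing parts come back as empty strings."""
    lang_territory, _, charset = value.partition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory, charset


if __name__ == "__main__":
    # On UNIX, NLS_LANG is read from the environment.
    value = os.environ.get("NLS_LANG", "AMERICAN_AMERICA.WE8MSWIN1252")
    print(parse_nls_lang(value))
```

The character set part is the one that must match the client code page; language and territory only affect messages, sorting, and formatting defaults.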

Setting NLS_LANG correctly allows proper conversion from the client operating
system code page to the database character set. When the NLS_LANG character set
is the same as the database character set, Oracle assumes that the data being
sent or received is already encoded in the database character set, so it
performs no validation or conversion. If the client code page is in fact
different, this leads to corrupt data, because a conversion that was needed
never happens.
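A minimal Python model of this behaviour, assuming a Windows client using code page 1252 and a UTF8 database. The store and read_back functions are hypothetical stand-ins for the client/server hand-off, and Python codec names approximate the Oracle character sets (cp1252 for WE8MSWIN1252, utf-8 for UTF8).

```python
def store(client_bytes: bytes, nls_lang_charset: str, db_charset: str) -> bytes:
    """Convert only when the NLS_LANG character set differs from the
    database character set; otherwise pass the bytes through unchecked."""
    if nls_lang_charset == db_charset:
        return client_bytes  # no validation, no conversion
    return client_bytes.decode(nls_lang_charset).encode(db_charset)


def read_back(db_bytes: bytes, db_charset: str) -> str:
    """Decode stored bytes as the database character set, marking bad bytes."""
    return db_bytes.decode(db_charset, errors="replace")


if __name__ == "__main__":
    db = "utf-8"
    client_bytes = "café €".encode("cp1252")       # what the client really sends
    good = store(client_bytes, "cp1252", db)       # NLS_LANG matches the client
    bad = store(client_bytes, "utf-8", db)         # NLS_LANG wrongly set to UTF8
    print(good, read_back(good, db))
    print(bad, read_back(bad, db))
```

In the second case the cp1252 bytes for é and € are stored unchanged in a UTF8 database, so any client that correctly expects UTF8 sees replacement characters instead of the original text.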

Character set  Unicode version  Encoding  Description              Euro sign (U+20AC)
AL32UTF8       3.1              UTF-8     Universal character set  E282AC
UTF8           3.0              UTF-8     Universal character set  E282AC
AL16UTF16      3.1              UTF-16    Universal character set  20AC
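The hex values in the last column of the table are the euro sign (U+20AC) in each encoding, which a couple of lines of Python can confirm. The sketch assumes AL16UTF16 corresponds to big-endian UTF-16 without a byte order mark, hence the utf-16-be codec; encoded_hex is a hypothetical helper.

```python
def encoded_hex(ch: str, codec: str) -> str:
    """Return the encoded bytes of `ch` as uppercase hex."""
    return ch.encode(codec).hex().upper()


if __name__ == "__main__":
    euro = "\u20ac"
    print("UTF-8  (AL32UTF8, UTF8):", encoded_hex(euro, "utf-8"))
    print("UTF-16 (AL16UTF16):     ", encoded_hex(euro, "utf-16-be"))
```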

Errors?

Connected to: Oracle8i Enterprise Edition Release 8.1.7.2.0 - 64bit Production
With the Partitioning option
JServer Release 8.1.7.2.0 - 64bit Production

IMP-00016: required character set conversion (type 31 to 871) not supported
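IMP-00000: Import terminated unsuccessfully

In IMP-00016, 31 and 871 are Oracle character set IDs: 31 is WE8ISO8859P1 and
871 is UTF8. The SQL function NLS_CHARSET_NAME(id) returns the name for an ID.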


AL32UTF8
The AL32UTF8 character set supports the latest version of the Unicode
standard. It encodes characters in one, two, or three bytes. Supplementary
characters require four bytes. It is for ASCII-based platforms.
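Because AL32UTF8 is standard UTF-8, Python's utf-8 codec can show the one-to-four byte range described above. The sample characters are arbitrary choices for illustration, and al32utf8_length is a hypothetical helper.

```python
def al32utf8_length(ch: str) -> int:
    """Bytes needed for one character in standard UTF-8 (AL32UTF8)."""
    return len(ch.encode("utf-8"))


SAMPLES = [
    ("A", "U+0041 Latin capital A (ASCII)"),
    ("\u00e9", "U+00E9 Latin small e with acute"),
    ("\u20ac", "U+20AC Euro sign"),
    ("\U0001d11e", "U+1D11E musical symbol G clef (supplementary)"),
]

if __name__ == "__main__":
    for ch, name in SAMPLES:
        print(f"{name}: {al32utf8_length(ch)} byte(s)")
```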

UTF8
The UTF8 character set encodes characters in one, two, or three bytes. It is for
ASCII-based platforms.
The UTF8 character set has supported Unicode 3.0 since Oracle8i release 8.1.7
and will continue to support Unicode 3.0 in future releases of the Oracle
database server. Although specific supplementary characters were not assigned
code points in Unicode until version 3.1, the code point range was allocated for
supplementary characters in Unicode 3.0. Inserting supplementary characters
into a UTF8 database therefore does not corrupt the data in the database. Each
supplementary character is treated as two separate, user-defined characters
(its UTF-16 surrogate pair, 3 bytes each) that together occupy 6 bytes. Oracle
Corporation recommends that you switch
to AL32UTF8 for full support of supplementary characters in the database
character set.
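The 6-byte figure can be reproduced in Python. This is a sketch of the scheme known as CESU-8, which is how the UTF8 character set stores supplementary characters: the character is split into a UTF-16 surrogate pair and each surrogate is encoded as its own 3-byte sequence. The function oracle_utf8_encode is hypothetical, assumes input without lone surrogates, and is not an Oracle API.

```python
def oracle_utf8_encode(text: str) -> bytes:
    """CESU-8 style encoding: BMP characters as in UTF-8, supplementary
    characters as two 3-byte surrogate sequences (6 bytes total)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                # "surrogatepass" lets Python encode a lone surrogate code point
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    clef = "\U0001d11e"  # U+1D11E, a supplementary character
    print("UTF8 (CESU-8 style):", oracle_utf8_encode(clef).hex().upper())
    print("AL32UTF8 (UTF-8):   ", clef.encode("utf-8").hex().upper())
```

For characters in the Basic Multilingual Plane the two encodings produce identical bytes; they differ only for supplementary characters, which is why the switch to AL32UTF8 matters only when such characters are stored.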