ITPub博客

首页 > 数据库 > Oracle > Oracle从non-Unicode到Unicode的转换

Oracle从non-Unicode到Unicode的转换

原创 Oracle 作者:dbs101 时间:2011-05-03 16:48:04 0 删除 编辑

曾经实施过一个Oracle从non-Unicode到Unicode的转换,这里列出主要的步骤:

1. Oracle Version

There are several migration methods to migrate the non-Unicode db to Unicode db for Oracle. the exp/imp is best choice for oracle Unicode migration. The oracle Unicode migration is to change the character set of database from single byte character set (such as US7ASCII, WE8ISO8859P1 and so on) to multiple bytes character set (AL32UTF8/UTF8)

Below are the steps.

1.1. Install csscan tool as sys user
cd $ORACLE_HOME/rdbms/admin
set oracle_sid=
sqlplus "sys as sysdba"
SQL>set TERMOUT ON
SQL>set ECHO ON
SQL>spool csminst.log
SQL>@?/rdbms/admin/csminst.sql
Check the csminst.log for errors

1.2. Run csscan with the following syntax:
csscan 'sys as sysdba' USER=accela FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=csscan CAPTURE=Y FEEDBACK=1000 ARRAY=1000000 PROCESS=2

this will create 3 files:

csscan.log a log of the output of csscan
csscan.txt a database scan summary report
csscan.err contains the rowid's of the the rows reported in csscan.txt

Please refer to appendix of how to handle the application data exception in csscan.err file before migration

1.3. Oracle Application Data Exception Handle
If there are not lossy data exception and data truncation exception in csscan.err report file, please skip this step.
1.3.1. Lossy Data Exception
1. run UTF8Mig.bat file with parameter csscan.err file, and generate two script. files csscan.log.size.sql and csscan.log.loss.sql
2. run UTF8Mig.bat file with parameter csscan.err file, and generate two script. files csscan.log.size.sql and csscan.log.loss.sql
3. csscan.log.loss.sql contains rows that has lossy conversion data potentially.
For example: char “…” the horizontal ellipsis, which is hex 85 in the Windows character set WE8MSWIN1252 character set, and it is hex E2 80 A6 in AL32UTF8/UTF8 character set. Now the char is stored in WE8ISO8859P1 character set db with hex 85. Because it is not valid char in WE8ISO8859P1 db, the csscan report it in application data exception report. The workaround is to change database character set to WE8MSWIN1252, below are detail steps

4. Change database character set to WE8MSWIN1252,
Run below command to check if all data in WE8MSWIN1252 character set
csscan 'sys as sysdba' FULL=Y FROMCHAR= WE8MSWIN1252 TOCHAR= WE8MSWIN1252 LOG=csscan CAPTURE=Y FEEDBACK=1000 ARRAY=1000000 PROCESS=2
If there is no application exception in csscan.err, use “@?/rdbms/admin/csalter.plb” in oracle 10g to change the database set to WE8MSWIN1252. Use “ALTER DATABASE CHARACTER SET WE8MSWIN1252” in oracle 9i.
Run below command to check if all data in WE8MSWIN1252 character set
csscan 'sys as sysdba' FULL=Y FROMCHAR= WE8MSWIN1252 TOCHAR= AL32UTF8 LOG=csscan CAPTURE=Y FEEDBACK=1000 ARRAY=1000000 PROCESS=2
The agencies in Appendix have been scanned. There are about 800 truncation exceptions in all agencies (It needs not much time to fix them manually). There is no lossy data exception in all agencies.
1.3.2. Data Truncation Exception
1. After conversion, one char maybe allocate more than one bytes in AL32UTF8/UTF8 character set db. The data maybe exceeds the maximum length 4000 of one VARCHAR2 column.
2. run UTF8Mig.bat file with parameter csscan.err file, and generate two script. files csscan.log.size.sql and csscan.log.loss.sql
3. csscan.log.size.sql is contains statements that retrieves rows with column size > 4000 bytes, after checked acclea host production db, 200 rows has this issue. The workaround is to remove space or useless character without much time.

1.4. Backup source database by doing full database export (full=y)

1.5. Export full source database without data (ROWS=N constraints=n indexes=n)

1.6. Recreate source database or new database with characterset = AL32UTF8/UTF8

1.7. Make sure to set NLS_LENGTH_SEMANTICS to CHAR in target database initialization file  (bounce DB)
   SQL> alter system set NLS_LENGTH_SEMANTICS=CHAR scope=both

1.8. Import export file of step 4 to create all database objects empty

1.9. Run script. nls_length_from_byte_to_char.txt, it will change accela columns containing CHAR, VARCHAR2 from byte to char semantics.

1.10. Import export file of step 3 to import all data (with IGNORE=Y because database objets already exist).
export NLS_LANG=SOURCE_DB_CHARACTERSET
The conversion only happens once during import process.

1.11. Recompile all invalid object if any

1.12. Do MAT test on new DB

1.13. Backup new db

1.14. Performance after Conversion

The performance will not degrade as the AL32UTF8/UTF8 char allocates one byte for ASCII chars.

1.15. DB Size
The DB Size is almost same as old db, as the Unicode char allocates two bytes and non-Unicode only allocates one byte. Please refer to the assessment report for the data

1.16. Data Loss
After tested in Acclea host production db, all rows in the report can be converted
Successfully by exp/imp

1.17. Data Truncation
Only 200 rows (>4000 bytes) has Data Truncation issue in accela production db, they can be handled manually without much time

2. Test Requirement
2.1. Test the migration steps for both Oracle and mssql dbs
2.2. Test the data conversion correctly from the UI and table data especially the data in csscan.err file for oracle version

3. Appendix

3.1. Oracle Application Data Exception Handle
Below agencies has been scanned.
csscan 'sys as sysdba' FULL=Y FROMCHAR= WE8MSWIN1252 TOCHAR= AL32UTF8 LOG=csscan CAPTURE=Y FEEDBACK=1000 ARRAY=1000000 PROCESS=2
There are about 800 truncation exceptions in all below agencies (It needs not much time to fix them manually). There is no lossy data exception in all below agencies.


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/25105315/viewspace-694347/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2010-12-18

  • 博文量
    92
  • 访问量
    437999