ITPub博客

首页 > 数据库 > Oracle > oracle数据库字符集characterset迁移及变更系列一

oracle数据库字符集characterset迁移及变更系列一

原创 Oracle 作者:wisdomone1 时间:2015-11-30 19:06:24 0 删除 编辑

背景

   oracle数据库字符集和应用数据密切相关,数据库迁移也会涉及到数据库字符集的转换,大家常常听到的乱码之类的,导出导入表发生列长度不足的错误,也和数据库字符集不无关系,
本文我沿袭前文:http://blog.itpub.net/9240380/viewspace-1849340/ , 
itpub网友问题之AL32UTF8与ZHS16GBK 2种数据库字符集database characterset,继续学习数据库字符集的相关知识。


结论

1,表名长度最大为30个字节
2,convert函数确实是个利器,可以验证2种字符集是否兼容,达到选择目标数据库字符集的作用
3,nls_lang是OS层面的环境变量,其最后部分指定客户端采用的数据库字符集,控制EXP及IMP和EXPDP及IMPDP的字符集
4,如果NLS_LANG及数据库字符集不一致,会提示可能产生字符集变更的信息,这个可能会引发数据损失
5,千万别太轻信网上的,运行ALERT DATABASE CHARECTERSET INTERNAL_USE变更 数据库字符集,这样会引发隐性或未知的风险
6,nls_length_semantics默认值为BYTE,值也可为CHAR,ORACLE不建议采用后者,引发会引发性能问题以及运行时间错误,可能也会产生BUFFER OVERFLOW




测试


SQL> select * from v$version where rownum=1;


BANNER
----------------------------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production


SQL> create table t_testlength(a int);


Table created.


SQL> select length(object_name) from user_objects where  lower(object_name)='t_testlength';


LENGTH(OBJECT_NAME)
-------------------
                 12


SQL> alter table t_testlength rename to t_testlength_testlength_testlengt;
alter table t_testlength rename to t_testlength_testlength_testlengt
                                   *
ERROR at line 1:
ORA-00972: identifier is too long




SQL> select length('t_testlength_testlength_testlengt') from dual;


LENGTH('T_TESTLENGTH_TESTLENGTH_TESTLENGT')
-------------------------------------------
                                         33


SQL> select length('t_testlength_testlength_testle') from dual;


LENGTH('T_TESTLENGTH_TESTLENGTH_TESTLE')
----------------------------------------
                                      30


可见表名长度最长为30个字节的长度
SQL> alter table t_testlength rename to t_testlength_testlength_testle;


Table altered.


SQL> alter table t_testlength_testlength_testle rename to t_testlength_testlength_testle1;
alter table t_testlength_testlength_testle rename to t_testlength_testlength_testle1
                                                     *
ERROR at line 1:
ORA-00972: identifier is too long                                                          




--当前数据库字符集为AL32UTF8
SQL> select parameter,value from nls_database_parameters where parameter='NLS_CHARACTERSET';


PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               AL32UTF8


--长度刻度为字节,还有个值也可以为CHAR
SQL> show parameter nls_length_semantics


NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
nls_length_semantics                 string                 BYTE


--查看与中文相关的数据库字符集
SQL> select distinct parameter from v$nls_valid_values;


PARAMETER
------------------------------
CHARACTERSET
SORT
TERRITORY
LANGUAGE


SQL> select parameter,value from v$nls_valid_values where parameter='CHARACTERSET' and VALUE  like '%ZH%';


PARAMETER                      VALUE
------------------------------ ------------------------------
CHARACTERSET                   ZHS16CGB231280
CHARACTERSET                   ZHS16MACCGB231280
CHARACTERSET                   ZHS16GBK
CHARACTERSET                   ZHS16DBCS
CHARACTERSET                   ZHS32GB18030
CHARACTERSET                   ZHT32EUC
CHARACTERSET                   ZHT32SOPS
CHARACTERSET                   ZHT16DBT
CHARACTERSET                   ZHT32TRIS
CHARACTERSET                   ZHT16DBCS
CHARACTERSET                   ZHT16BIG5


PARAMETER                      VALUE
------------------------------ ------------------------------
CHARACTERSET                   ZHT16CCDC
CHARACTERSET                   ZHT16MSWIN950
CHARACTERSET                   ZHT16HKSCS
CHARACTERSET                   ZHT16HKSCS31
CHARACTERSET                   ZHS16CGB231280FIXED
CHARACTERSET                   ZHS16GBKFIXED
CHARACTERSET                   ZHS16DBCSFIXED
CHARACTERSET                   ZHT32EUCFIXED
CHARACTERSET                   ZHT32TRISFIXED
CHARACTERSET                   ZHT16DBCSFIXED
CHARACTERSET                   ZHT16BIG5FIXED


22 rows selected.


好像英文字符在2种字符集占用空间是一样的
SQL> select 'abc',dump('abc'),dump(convert('abc','ZHS16GBK','AL32UTF8')) from dual;


'ABC'  DUMP('ABC')                                  DUMP(CONVERT('ABC','ZHS16GBK','AL32UTF8'))
------ -------------------------------------------- ------------------------------------------
abc    Typ=96 Len=3: 97,98,99                       Typ=1 Len=3: 97,98,99


SQL> insert into t_charset values('我们');


1 row created.


SQL> commit;


Commit complete.




SQL> select a,dump(a) from t_charset;


A                              DUMP(A)
------------------------------ --------------------------------------------------
我们                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


可见AL32UTF8不能正常转化ZHS16GBK,中文会出现问题,且前者占用字节长度为9个,而后者变成6,造成数据损失
SQL> select a,dump(a),convert(a,'ZHS16GBK','AL32UTF8') mig_char,dump(convert(a,'ZHS16GBK','AL32UTF8')) after_dump from t_charset;


A                    DUMP(A)                                            MIG_CHAR                       AFTER_DUMP
-------------------- -------------------------------------------------- ------------------------------ --------------------------------------------------
我们                 Typ=1 Len=9: 233,142,180,230,136,156,230,187,145                        Typ=1 Len=6: 230,136,145,228,187,172


看到没,如果从AL32UTF8转化为UTF8,不会造成数据损失,中文可以正常显示,可见CONVERT函数是个利器,对于字符集转化
SQL> select a,dump(a),convert(a,'UTF8','AL32UTF8') mig_char,dump(convert(a,'UTF8','AL32UTF8')) after_dump from t_charset;


A                    DUMP(A)                                            MIG_CHAR                       AFTER_DUMP
-------------------- -------------------------------------------------- ------------------------------ --------------------------------------------------
我们                 Typ=1 Len=9: 233,142,180,230,136,156,230,187,145   我们                           Typ=1 Len=9: 233,142,180,230,136,156,230,187,145


SQL> desc database_properties;
 Name              Null?    Type
 ----------------- -------- ------------
 PROPERTY_NAME     NOT NULL VARCHAR2(30)
 PROPERTY_VALUE             VARCHAR2(400
                            0)
 DESCRIPTION                VARCHAR2(400
                            0)


SQL> select count(*) from database_properties;


  COUNT(*)
----------
        36


SQL> select distinct property_name from database_properties;


PROPERTY_NAME
------------------------------
Flashback Timestamp TimeZone
DST_UPGRADE_STATE
NLS_CURRENCY
NLS_ISO_CURRENCY
NLS_DUAL_CURRENCY
DBTIMEZONE
DEFAULT_EDITION
NLS_CALENDAR
NLS_TIMESTAMP_FORMAT
EXPORT_VIEWS_VERSION
NO_USERID_VERIFIER_SALT


PROPERTY_NAME
------------------------------
DST_SECONDARY_TT_VERSION
NLS_TERRITORY
NLS_DATE_LANGUAGE
NLS_LENGTH_SEMANTICS
WORKLOAD_REPLAY_MODE
DEFAULT_PERMANENT_TABLESPACE
DEFAULT_TBS_TYPE
NLS_LANGUAGE
NLS_TIMESTAMP_TZ_FORMAT
NLS_NCHAR_CHARACTERSET
NLS_RDBMS_VERSION


PROPERTY_NAME
------------------------------
NLS_NUMERIC_CHARACTERS
NLS_TIME_FORMAT
NLS_NCHAR_CONV_EXCP
GLOBAL_DB_NAME
DEFAULT_TEMP_TABLESPACE
NLS_DATE_FORMAT
NLS_COMP
DICT.BASE
TDE_MASTER_KEY_ID
DST_PRIMARY_TT_VERSION
NLS_CHARACTERSET


PROPERTY_NAME
------------------------------
NLS_SORT
WORKLOAD_CAPTURE_MODE
NLS_TIME_TZ_FORMAT


36 rows selected.




SQL> select property_name,property_value from database_properties where property_name='NLS_LENGTH_SEMANTICS';


PROPERTY_NAME                  PROPERTY_VALUE
------------------------------ --------------------------------------------------
NLS_LENGTH_SEMANTICS           BYTE


--查官方手册,梳理上述参数nls_length_semantics的含义
Oracle? Database Globalization Support Guide 之
3 Setting Up a Globalization Support Environment


1,nls_length_semantics的可用值为:byte,char,默认值为byte
2,前值适用于单字节数据库字符集编码规则,后者适用于多字节数据库字符集编码规则


3,char,varchar2,long数据类型可以选取byte,也可以选择char
4,nchar,nvarchar2,clob,nclob,只能选用char
5,如果变更nls_length_semantics不会影响已存在的数据


6,nls_length_semantics可以在数据库级,实例级,会话级进行调整


7,sys用户下的数据只会采用byte


8,如果在客户端显示指定与服务器端不同的值,录入数据以客户端为准


9,为了数据兼容,ORACLE不建议配置值为CHAR,因为会导致产生很多问题:运行时错误,BUFFER OVERFLOW




---测试下zhs16gbk存储固定宽度的中文字符,然后转化为AL32UTF8
SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> ALTER DATABASE character set INTERNAL_USE ZHS16GBK;


Database altered


SQL> alter system disable restricted session;


System altered.


SQL> conn scott/system
Connected.




SQL> create table t_charset(a char(10));


Table created.


SQL> insert into t_charset values('伙伴');


1 row created.


SQL> commit;


Commit complete.
--可见占用10个字节
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- --------------------------------------------------
伙伴                 Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32


SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> alter database character set INTERNAL_USE AL32UTF8;


Database altered.




SQL> alter system disable restricted session;


System altered.


--看到没,转化AL32UTF8,中文显示不出来
SQL> conn scott/system
Connected.
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- --------------------------------------------------
           Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32




SQL> insert into t_charset values('伙伴');


1 row created.


SQL> commit;


Commit complete.


--可见基于同样的中文,ZHS16GBK与AL32UTF8,编码规则发生了变化,虽然占用的字节数相同,当然显示不出来了
SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- ----------------------------------------------------------------------------------------------------
           Typ=96 Len=10: 228,188,153,228,188,180,32,32,32,32
伙伴                 Typ=96 Len=10: 230,181,188,230,172,142,229,141,179,32










----在al32utf8导出测试表
可见导出失败了(这里失败是因为表中同时存储2种不同字符集的表数据)
[oracle@seconary ~]$ exp file=exp_t_charset.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:27:59 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.
[oracle@seconary ~]$ 






---删除测试表,然后调整数据库字符,再导入测试表
SQL> drop table t_charset purge;


Table dropped.


SQL> conn /as sysdba
Connected.
SQL> alter system enable restricted session;


System altered.


SQL> alter database character set INTERNAL_USE  ZHS16GBK;


Database altered.


SQL> alter system disable restricted session;


System altered.


[oracle@seconary ~]$ imp file=exp_t_charset.dmp userid=scott  tables=t_charset


Import: Release 11.2.0.1.0 - Production on Sun Nov 29 16:31:43 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


Export file created by EXPORT:V11.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
. importing SCOTT's objects into SCOTT
Import terminated successfully without warnings.




---我们重新测试上述的例子,清空测试表
---al32utf8
SQL> conn scott/system
Connected.
SQL> create table t_charset(a char(10));


Table created.


SQL> insert into t_charset values('伙伴');


1 row created.


SQL> commit;


Commit complete.


SQL> select a,dump(a) from t_charset;


A                    DUMP(A)
-------------------- ----------------------------------------------------------------------
伙伴                 Typ=96 Len=10: 230,181,188,230,172,142,229,141,179,32


可见现在表中1种数据库字符集,EXP还是报错,说明用alter database character set internal_use这种方式改数据库字符集,其实具备很大的风险,很可能会把数据库搞坏
[oracle@seconary ~]$ exp file=exp_t_charset_pure.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:38:37 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set  --可见导出是以ZHS16GBK,这个提示与OS的环境变量NLS_LANG有关
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.
[oracle@seconary ~]$ 


---查看NLS_LANG
[oracle@seconary ~]$ env|grep LANG
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
LANG=en_US.utf8


--调整NLS_LANG
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
[oracle@seconary ~]$ env|grep LANG
NLS_LANG=AMERICAN_AMERICA.AL32UTF8


---调整NLS_LANG中的数据库字符集与数据库中的字符集相同,导出还是报错
[oracle@seconary ~]$ exp file=exp_t_charset_pure.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 16:42:41 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and AL16UTF16 NCHAR character set


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET
EXP-00008: ORACLE error 6552 encountered
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized
Export terminated successfully with warnings.


借此学习下nls_lang中的字符集是否配置不同,会影响EXP导出的字符集导出的结果
可见确实会影响EXP导出的效果,且会影响SQLPLUS的运行
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.ZHT16DBCSFIXED
[oracle@seconary ~]$ exp file=exp_t_charset_pure_other.dmp userid=scott  tables=t_charset


BoBoBoBoBoBo: Release 11.2.0.1.0 - Production on 


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.




EXP-00092: BoBoBoBoBoBo@@BoBo@@BoBoBo@@BoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBoBo@@BoBo@@BoBoBoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBo
EXP-00000: BoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBoBo@@BoBoBoBoBoBoBoBoBoBoBoBoBoBoBo[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ 
[oracle@seconary ~]$ sqlplus '/as sysdba'
Error 19 initializing SQL*Plus
Invalid NLS character set for this OS environment


---可见NLS_LANG确实会影响EXP数据库字符集的选择,所以一定要理解此参数的含义,否则EXP IMP以及EXPDP和IMPDP会产生数据损失以及产生乱码
[oracle@seconary ~]$ export NLS_LANG=AMERICAN_AMERICA.UTF8
[oracle@seconary ~]$ exp file=exp_t_charset_pure_other.dmp userid=scott  tables=t_charset


Export: Release 11.2.0.1.0 - Production on Sun Nov 29 17:24:25 2015


Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.


Password: 


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in UTF8 character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)


About to export specified tables via Conventional Path ...
. . exporting table                      T_CHARSET          1 rows exported
Export terminated successfully without warnings.






来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/9240380/viewspace-1849953/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
提供针对oracle初学者及进阶的数据库培训,欢迎大家咨询: 微信: wisdomone 微信公众号: lovedb qq: 305076427 微博: wisdomone9

注册时间:2008-04-04

  • 博文量
    2149
  • 访问量
    11890834