ITPub博客

首页 > 数据库 > Oracle > [转载]Oracle数据库字符集问题解析2

[转载]Oracle数据库字符集问题解析2

原创 Oracle 作者:lastwinner 时间:2005-11-11 21:46:53 0 删除 编辑

实验结果分析五

quote:


SQL> INSERT INTO TEST VALUES('东北');

1 row created.

SQL> EXIT
更改客户端字符集为ZHS16GBK
D:>SET NLS_LANG=AMERICAN_AMERICA.ZHS16GBK


D:>SQLPLUS "/ AS SYSDBA"

无法显示用US7ASCII插入的字符集,但可以显示用ZHS16GBK插入的字符集
SQL> SELECT * FROM TEST;

R1
--------------------
东北
??
东北
6+11

SQL>
疑问3:US7ASCII为ZHS16GBK的子集,为何在US7ASCII环境下插入的数据无法显示? [/B]
[@more@]

在客户端字符集设置为US7ASCII时,向字符集为ZHS16GBK的数据库中插入“东北”,需要进行字符转换,“东北”的ZHS16GBK编码为182(10110110)、171(10101011)与177(10110001)、177(10110001),由于US7ASCII为7bit编码,Oracle将这两个汉字当作四个字符,并忽略各字节的最高位,从而存入数据库的编码就变成了54(00110110)、43(00101011)与49(00110001)、49(00110001),也就是“6+11”,原始信息被改变了。这时,将客户端字符集设置为ZHS16GBK再进行SELECT,数据库中的信息不需要改变传到客户端,第一、三行由于存入的信息没有改变能显示“东北”,而第二、四行由于插入数据时信息改变,所以不能显示原有信息了。

分析了这么多的内容,但实际上总结起来也很简单,要想在字符集方面少些错误与麻烦,需要坚持两条基本原则:
在数据库端:选择需要的字符集(通过create database中的CHARACTER SET与NATIONAL CHARACTER SET子句指定);
在客户端:设置操作系统实际使用的字符集(通过环境变量NLS_LANG设置)。

问题解答:
wyq21973:你是指大部分汉字在这两个标准中编码不一样,还是指ZHS16GBK所包含的汉字更多些?
如果是指前者那直接更改就会产生错误的结果。
如果我从ZHS16GBK的库导出,再导入UTF8的库中,会出现什么情形呢?(能不能成功,会不会出现部分汉字乱码?)
jeffli73: 两者编码完全不一样,但所能支持的汉字(专业的说法是字汇)基本相当,所以理论上如果设置得当EXP/IMP可能成功

wyq21973:下面是我刚测试的结果,导入时字符集转换不成功。
D:>imp userid=sys/wyq fromuser=gsm touser=gsm rows=n fil
e=c:f1.dmp,f2.dmp,f3.dmp

Import: Release 8.1.7.0.0 - Production on Tue Dec 28 09:25:12 2004

(c) Copyright 2000 Oracle Corporation. All rights reserved.


Connected to: Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
With the Partitioning option
JServer Release 8.1.7.0.0 - Production

IMP-00016: required character set conversion (type 852 to 871) not supported
IMP-00000: Import terminated unsuccessfully

sys@ORCL>Select Nls_Charset_Id(Value) Nls_Charset_Id, Value Nls_Charset_Name
2 From V$nls_Valid_Values
3 Where Parameter = 'CHARACTERSET'
4 And Nls_Charset_Id(Value) In (852, 871);

NLS_CHARSET_ID NLS_CHARSET_NAME
-------------- ----------------------------------------------------------------
852 ZHS16GBK
871 UTF8

这种情形该如何处理?

jeffli73:
有可能是Oracle 8 还不支持

看了wyq21973朋友的帖子,当时就有些疑问,因为上面我也提到从理论上讲汉字由ZHS16GBK到UTF8的转换是可能的,是不是因为wyq21973朋友用的Oracle8.1.7,还没提供这种转换呢,但由于当时也没条件试,就放了下来,最近看到论坛里关于字符集的问题又不少,于是决定做一下进一步的实验;
我的PC上本来已经装了一个数据库test1,字符集为ZHS16GBK,今天利用DBCA又建了一个test2,字符集选用UTF8。
实验所用的数据库版本为:Oracle9i Enterprise Edition Release 9.2.0.1.0

1.确认源数据库的字符集:
SCOTT@test1> select * from nls_database_parameters;

PARAMETER VALUE
------------------------------ ---------------------------------------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET ZHS16GBK
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------------ ---------------------------------------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.2.0.1.0

20 rows selected.

注意:是ZHS16GBK。

2.确认转换前的数据:
SCOTT@test1> select deptno,dname,loc,dump(loc) from dept where deptno=50;

DEPTNO DNAME LOC
---------- -------------- -------------
DUMP(LOC)
----------------------------------------------------------------------------------------------------
50 1 1中文
Typ=1 Len=5: 49,214,208,206,196


可以看到“中文”两个字占用4个字节(每个汉字2个字节),是ZHS16GBK编码;

3.导出源数据
D:oracleora92bin>set ORACLE_SID=test1

D:oracleora92bin>exp

Export: Release 9.2.0.1.0 - Production on Wed Mar 30 16:56:15 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Username: scott/tiger

Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
Enter array fetch buffer size: 4096 >

Export file: EXPDAT.DMP > dept1

(2)U(sers), or (3)T(ables): (2)U > t

Export table data (yes/no): yes >

Compress extents (yes/no): yes >

Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Table(T) or Partition(T:P) to be exported: (RETURN to quit) > dept

. . exporting table DEPT 8 rows exported
Table(T) or Partition(T:P) to be exported: (RETURN to quit) >

Export terminated successfully without warnings.

4.确认目标数据库的字符集
SCOTT1@test2> select * from nls_database_parameters;

PARAMETER VALUE
------------------------------------------------------------ ---------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------------------------------------------ ---------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.2.0.1.0

20 rows selected.

注意:是UTF8。

5.导入目标数据库
D:oracleora92bin>set ORACLE_SID=test2

D:oracleora92bin>imp

Import: Release 9.2.0.1.0 - Production on Wed Mar 30 16:58:16 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Username: scott1
Password:

Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

Import file: EXPDAT.DMP > dept1

Enter insert buffer size (minimum is 8192) 30720>

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by SCOTT, not by you

import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)

List contents of import file only (yes/no): no >

Ignore create error due to object existence (yes/no): no >

Import grants (yes/no): yes >

Import table data (yes/no): yes >

Import entire export file (yes/no): no >
Username: scott

Enter table(T) or partition(T:P) names. Null list means all tables for user
Enter table(T) or partition(T:P) name or . if done: dept

Enter table(T) or partition(T:P) name or . if done: .

. importing SCOTT's objects into SCOTT1
. . importing table "DEPT" 8 rows imported
Import terminated successfully without warnings.

D:oracleora92bin>

注意上面的提示:
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)

6.检查导入数据是否正确
SCOTT1@test2> select deptno,dname,loc,dump(loc) from dept where deptno=50;

DEPTNO DNAME LOC
---------- ---------------------------- --------------------------
DUMP(LOC)
----------------------------------------------------------------------------------------------------
50 1 1中文
Typ=1 Len=7: 49,228,184,173,230,150,135


可以看到“中文”两个字占用6个字节(每个汉字3个字节),是UTF8编码;

7.小结
在Oracle 9.2环境下,汉字可以由ZHS16GBK转换为UTF8;
数据库服务器选用的字符集虽然可以各不相同,但只要各种相关设置正确,Oracle总是尽可能地将正确的转换结果呈现给客户端的用户。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29867/viewspace-809870/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2007-12-12

  • 博文量
    223
  • 访问量
    2812978