9.2: convert ASCII to UTF8 (Welsh language)
Published: Wed, 13 Feb 2008 18:07:00 GMT; 9 comments
Hello,
I have a 9.2 ASCII database that I can't convert to UTF8 yet.
1. For an output file (UTL_FILE) I need to convert an ASCII text string to UTF-8 on export.
2. I have two characters that are not supported by ASCII, ŵ and ŷ; the users will represent these by typing w^ and y^.
I tried using UNISTR, but none of the characters below are correctly converted:
SELECT UNISTR(ASCIISTR('?a?êê?')) FROM DUAL;
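For context: ASCIISTR turns each non-ASCII character into a backslash followed by four hex digits (its UTF-16 code unit), and UNISTR turns such escapes back into characters, returning them as national-character-set data. The round trip therefore hands back the same characters it was given; it does not by itself produce a UTF-8 byte stream, and anything already turned into '?' on the way in stays '?'. A rough Python imitation of the escape format, assuming BMP characters and input without backslashes (`asciistr_like` and `unistr_like` are hypothetical helpers, not Oracle functions):

```python
def asciistr_like(text: str) -> str:
    """Escape every non-ASCII character as a backslash plus 4 uppercase hex digits.

    Assumes BMP characters only (one UTF-16 code unit each) and no backslashes
    in the input.
    """
    return "".join(ch if ord(ch) < 128 else "\\%04X" % ord(ch) for ch in text)


def unistr_like(text: str) -> str:
    """Turn each backslash + 4-hex-digit escape back into the character it names."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(chr(int(text[i + 1:i + 5], 16)))
            i += 5
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


sample = "t\u0177 \u00ea"              # "tŷ ê"
escaped = asciistr_like(sample)
print(escaped)                         # t\0177 \00EA
assert unistr_like(escaped) == sample  # same characters back, no charset change
```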
How would you recommend converting an ASCII (Latin-1 extended) string to UTF-8 for export?
Is it sensible to use the character replacement plan above for ŵ and ŷ?
Thanks,
James
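On the replacement question: since ŵ and ŷ cannot be stored in a Windows-1252 column, one option is to keep the typed w^ / y^ placeholders in the database and expand them only when the export is written as UTF-8. A minimal Python sketch of that idea, assuming the placeholder convention described in the question (the mapping, including the uppercase forms, and the function names are hypothetical):

```python
# Hypothetical placeholder convention: users type "w^" / "y^" for the Welsh
# circumflex letters that Windows-1252 cannot hold.
PLACEHOLDERS = {
    "w^": "\u0175",  # ŵ  LATIN SMALL LETTER W WITH CIRCUMFLEX
    "y^": "\u0177",  # ŷ  LATIN SMALL LETTER Y WITH CIRCUMFLEX
    "W^": "\u0174",  # Ŵ  LATIN CAPITAL LETTER W WITH CIRCUMFLEX
    "Y^": "\u0176",  # Ŷ  LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
}


def expand_placeholders(text: str) -> str:
    """Replace each placeholder token with the real Unicode character."""
    for token, char in PLACEHOLDERS.items():
        text = text.replace(token, char)
    return text


def export_utf8(text: str) -> bytes:
    """Expand placeholders, then encode the result as UTF-8 for the output file."""
    return expand_placeholders(text).encode("utf-8")


print(export_utf8("dw^r"))   # b'd\xc5\xb5r'  ("dŵr")
print(export_utf8("Ty^"))    # b'T\xc5\xb7'   ("Tŷ")
```

The main risk with this scheme is text in which a caret legitimately follows a w or y; if that can happen, a less likely token would be safer.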
http://oracle.itags.org/q_oracle_5814.html
- 9 Comments

- If your database character set is US7ASCII, no accented characters are supported; only the ASCII characters 0-127 are. String literals that include characters outside 7-bit ASCII won't be supported by a database with that character set.
If you have 7-bit ASCII data, there is no conversion necessary when you convert to UTF8 because US7ASCII is a strict binary subset of UTF8. The binary representation of all 128 characters is the same in the two character sets.
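A quick way to see the subset property outside the database, sketched in Python (the codec names are Python's, not Oracle's): every code point 0-127 encodes to the identical single byte in both ASCII and UTF-8, while a character such as ê has no ASCII encoding at all.

```python
# Every 7-bit ASCII code point encodes to the same single byte in UTF-8.
for code in range(128):
    ch = chr(code)
    assert ch.encode("ascii") == ch.encode("utf-8") == bytes([code])

# A character outside 0-127, such as ê (U+00EA), cannot be encoded as ASCII ...
try:
    "\u00ea".encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)

# ... and in UTF-8 it takes two bytes.
print("\u00ea".encode("utf-8"))   # b'\xc3\xaa'
```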
I would strongly suspect that you'd be better off changing the database character set to something reasonable up front.
Justin
#1; Sat, 23 Feb 2008 15:03:00 GMT

- Thanks for this.
The DB is WE8MSWIN1252,
but I don't think this supports accented characters either?
#2; Sat, 23 Feb 2008 15:04:00 GMT

- OK, that's a little better: Windows-1252 does support a number of accented characters. You can get the full set of supported characters from the Microsoft web site:
http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx
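Whether a given character is in Windows-1252 can also be checked programmatically. A small Python sketch using Python's built-in cp1252 codec (`in_cp1252` is a hypothetical helper): the Welsh circumflex vowels â ê î ô û are present, but ŵ (U+0175) and ŷ (U+0177) are not, which would explain why they need special handling in a WE8MSWIN1252 database.

```python
def in_cp1252(ch: str) -> bool:
    """Return True if the character can be encoded in Windows-1252."""
    try:
        ch.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


# â ê î ô û print True; ŵ and ŷ print False.
for ch in "\u00e2\u00ea\u00ee\u00f4\u00fb\u0175\u0177":
    print(ch, f"U+{ord(ch):04X}", in_cp1252(ch))
```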
If you want to convert data to UTF8 in the database, you can use the CONVERT function
SCOTT @ nx102 JCAVE9420> select convert( '?a?êê?', 'UTF8' ) from dual;
CONVERT('???
--
?a?êê?
Elapsed: 00:00:00.12
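At the byte level, converting Windows-1252 data to UTF8 means decoding each single byte to its character and re-encoding that character; anything outside 0-127 grows to two or more bytes. A Python sketch of the same step (`cp1252_to_utf8` is a hypothetical helper; it mirrors the idea of CONVERT, not Oracle's implementation):

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 bytes."""
    return data.decode("cp1252").encode("utf-8")


print(cp1252_to_utf8(b"\xea"))   # b'\xc3\xaa'      (ê: 1 byte -> 2 bytes)
print(cp1252_to_utf8(b"\x80"))   # b'\xe2\x82\xac'  (€: 1 byte -> 3 bytes)
print(cp1252_to_utf8(b"ABC"))    # b'ABC'           (ASCII is unchanged)
```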
If you are trying to generate UTF8 output at the client, however, you would probably want to set the client's NLS_LANG to request UTF8 data. Specifics of the client application's implementation can add further wrinkles...
If the client is using JDBC, for example, the conversion to Unicode happens automatically, but the data will be converted to UTF16. If you wanted to change the encoding, you'd have to do that at the output edge (i.e. when you create the output file in Java).
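The "convert at the output edge" approach looks much the same in any language that keeps text as Unicode internally: choose the encoding explicitly when the file is opened, rather than relying on the platform default (often cp1252 on Western-European Windows). A Python sketch of the idea (not Java, and not UTL_FILE; the file name and helper are hypothetical):

```python
import os
import tempfile


def write_utf8_export(path: str, rows: list) -> None:
    """Write one row per line, encoding the whole file explicitly as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            fh.write(row + "\n")


out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "export.txt")        # hypothetical output file
write_utf8_export(path, ["d\u0175r", "t\u0177"])  # "dŵr", "tŷ"

with open(path, "rb") as fh:
    print(fh.read())   # b'd\xc5\xb5r\nt\xc5\xb7\n'
```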
Justin
#3; Sat, 23 Feb 2008 15:05:00 GMT

- Hi, I'm still a bit confused, as what you suggest is what I have been trying.
Is there something else I am doing wrong?
SQL> select convert( '?a?êê?', 'UTF8' ) from dual;
CONVERT('BBNJJT','UTF8')
--
BbnJjt
The problem seems to be with how the data is loaded into CONVERT, not with the output?
Thanks,
James
#4; Sat, 23 Feb 2008 15:06:00 GMT

- The character set/language thing is something that always confuses people.
It's not just a case of what character set/language the database is set to, but also what your operating system is set to. The database might be using the correct character set, but when it comes to displaying the characters, the OS might have trouble doing so or might convert them to the wrong ones.
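A concrete example of the display problem: the same bytes look different depending on which character set the viewer assumes. In this Python sketch, the UTF-8 encoding of ê is shown by something that assumes Windows-1252, and each of its two bytes turns into a separate character:

```python
text = "\u00ea"                       # ê
utf8_bytes = text.encode("utf-8")     # b'\xc3\xaa'

# Read correctly as UTF-8: one character.
print(utf8_bytes.decode("utf-8"))     # ê

# Read as Windows-1252: two unrelated characters ("mojibake").
print(utf8_bytes.decode("cp1252"))    # Ãª
```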
One of the trainers at Oracle University (UK) is Welsh, and it's one of her bugbears that Oracle doesn't support Welsh as a language. They also don't support Klingon or Elvish, even though they've had requests to include both. :)
#5; Sat, 23 Feb 2008 15:07:00 GMT

- I had the same problem.
Here I found the solution:
http://www.databasejournal.com/features/oracle/article.php/3493691
Bye Alessandro
#6; Sat, 23 Feb 2008 15:08:00 GMT

- Hello, thanks for the advice, I'm almost there.
I set the NLS_LANG environment variable on my workstation to
AMERICAN_AMERICA.WE8MSWIN1252 to match the DB I am connecting to.
When I try to run a CONVERT I get:
SQL> SELECT CONVERT ('?a?êê?','WE8MSWIN1252','UTF8') FROM DUAL ;
CONVERT('???êê?','WE8MSWIN1252
--
???
So it looks like the string is correct on the way into the CONVERT function, but I guess that because NLS_LANG is now set to WE8MSWIN1252, the client can't display the output in UTF-8 format.
Any ideas?
#7; Sat, 23 Feb 2008 15:09:00 GMT
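One thing worth checking in the query above: Oracle's CONVERT is documented as CONVERT(char, dest_char_set, source_char_set), so the second argument is the character set to convert to and the third is the one to convert from. CONVERT(x, 'WE8MSWIN1252', 'UTF8') therefore reads the Windows-1252 bytes of x as if they were UTF8, and a byte such as 0xEA is not valid UTF-8 on its own. The Python sketch below imitates both argument orders with a hypothetical helper that follows the same (dest, source) order; Python's codec names stand in for Oracle's, and the replacement characters are Python's, not necessarily what Oracle emits:

```python
def convert(data: bytes, dest: str, source: str) -> bytes:
    """Hypothetical analogue of CONVERT(char, dest_char_set, source_char_set).

    Invalid input bytes become U+FFFD, and characters the destination cannot
    hold become '?', so the conversion is lossy rather than failing.
    """
    text = data.decode(source, errors="replace")
    return text.encode(dest, errors="replace")


data_1252 = "\u00ea\u00ea".encode("cp1252")      # "êê" stored as b'\xea\xea'

# As in the query: dest=cp1252, source=utf-8 -> the bytes are misread.
wrong = convert(data_1252, "cp1252", "utf-8")
print(wrong)                                      # only '?' bytes remain

# Swapped: dest=utf-8, source=cp1252 -> valid UTF-8 output.
right = convert(data_1252, "utf-8", "cp1252")
print(right)                                      # b'\xc3\xaa\xc3\xaa'
```

Even with the arguments the right way round, the result is UTF8 bytes inside a WE8MSWIN1252 session, so a client displaying it as Windows-1252 would likely show pairs of odd characters instead of ê; writing the bytes to a file and opening it as UTF-8 is a fairer test.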

- Probably the unconverted characters are not contained in the first charset.
If this is right (see the table at http://en.wikipedia.org/wiki/Windows-1252), there is no conversion for values outside the first charset.
But I may have made a mistake.
Are you sure ?, a, ?, ê, ê and ? are in the 1252 charset?
I can't tell whether the similar-looking characters in the Wikipedia table are the same as the ones you posted; that is why I ask.
Anyway, this output seems to support my suggestion:
Processing ...
SELECT convert ('?a?€êê?','WE8MSWIN1252','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨a?€¨ê?','WE8MSWIN1252','UTF8')
---
¨¨¨¨
1 row(s) retrieved
Processing ...
SELECT convert ('?a?€êê?','UTF8','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨a?€¨ê?','UTF8','UTF8')
--
¨a?€¨ê?
1 row(s) retrieved
Processing ...
SELECT convert ('?a?€êê?','UTF8','WE8MSWIN1252') FROM DUAL
Query finished, retrieving results...
CONVERT('¨a?€¨ê?','UTF8','WE8MSWIN1252')
---
?¨???????¨????
1 row(s) retrieved
Processing ...
SELECT convert ('?a?€êê?','WE8PC858','UTF8') FROM DUAL
Query finished, retrieving results...
CONVERT('¨a?€¨ê?','WE8PC858','UTF8')
--
??
1 row(s) retrieved
Processing ...
SELECT convert ('?a?€êê?','UTF8','WE8PC858') FROM DUAL
Query finished, retrieving results...
CONVERT('¨a?€¨ê?','UTF8','WE8PC858')
--
??????ˉ??????????
1 row(s) retrieved
Some characters are not supported on my DB, so try these queries on yours to confirm it.
SELECT convert ('?a?€êê?','WE8MSWIN1252','UTF8') FROM DUAL;
SELECT convert ('?a?€êê?','UTF8','UTF8') FROM DUAL;
SELECT convert ('?a?€êê?','UTF8','WE8MSWIN1252') FROM DUAL;
SELECT convert ('?a?€êê?','WE8PC858','UTF8') FROM DUAL;
SELECT convert ('?a?€êê?','UTF8','WE8PC858') FROM DUAL;
Bye Alessandro
#8; Sat, 23 Feb 2008 15:10:00 GMT

- Hello, thanks for looking at this.
?a?êê? are definitely in the Windows code page.
As I understand it, the character set you convert from must be the
NLS_CHARACTERSET of your database, in my case WE8MSWIN1252.
I tried all the converts you listed and got the same result.
The one I would have expected to work is:
SQL> SELECT CONVERT ('?a?êê?','WE8MSWIN1252','UTF8') FROM DUAL ;
But this returns:
CONVERT('???êê?','WE8MSWIN1252
--
???
I have the local environment variable NLS_LANG set to WE8MSWIN1252,
so I'm still puzzled.
#9; Sat, 23 Feb 2008 15:11:00 GMT