
9.2: convert ASCII to UTF8 for the Welsh language


Published: Wed, 13 Feb 2008 18:07:00 GMT; 9 Comments

hello

I have a 9.2 ASCII database that I can't convert to UTF8 yet.

1. For an output (util file) I need to convert an ASCII text string to UTF-8 on export.

2. I have two characters that are not supported by ASCII, ŵ and ŷ. The users will represent these by typing w^ and y^.

I tried using UNISTR, but none of the characters below are correctly converted:

SELECT UNISTR(ASCIISTR('?a?êê?')) FROM DUAL;

How would you recommend converting an ASCII / Latin-1 extended string to UTF-8 for export?

Is it sensible to use the character replacement plan above for ŵ and ŷ?

thanks

james
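
The replacement plan in point 2 could be applied at the export step, just before the text is encoded. Below is a minimal sketch in Python, not Oracle code, assuming the two characters are ŵ and ŷ and that users type them as `w^` and `y^`. The function name `welsh_export_bytes` and the mapping table are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from the typed digraphs to the Welsh letters.
# Assumption: the two unsupported characters are w-circumflex and y-circumflex.
WELSH_DIGRAPHS = {
    "w^": "\u0175",  # LATIN SMALL LETTER W WITH CIRCUMFLEX
    "y^": "\u0177",  # LATIN SMALL LETTER Y WITH CIRCUMFLEX
    "W^": "\u0174",  # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
    "Y^": "\u0176",  # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
}


def welsh_export_bytes(text: str) -> bytes:
    """Replace typed digraphs with the real letters, then encode as UTF-8."""
    for typed, letter in WELSH_DIGRAPHS.items():
        text = text.replace(typed, letter)
    return text.encode("utf-8")


if __name__ == "__main__":
    # Print the exported bytes as hex so the terminal encoding cannot hide them.
    print(welsh_export_bytes("dw^r").hex())
```

One caveat with this kind of plan: any text that legitimately contains `w^` or `y^` would be rewritten as well, so the digraphs need to be reserved for this purpose.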

All Comments

  • 9 Comments
    • If your database character set is US7ASCII, no accented characters are supported; only the ASCII characters 0-127 are. String literals that include characters outside of 7-bit ASCII won't be supported by the database with that character set.

      If you have 7-bit ASCII data, there is no conversion necessary when you convert to UTF8 because US7ASCII is a strict binary subset of UTF8. The binary representation of all 128 characters is the same in the two character sets.

      I would strongly suspect that you'd be better off changing the database character set to something reasonable up front.

      Justin

      #1; Sat, 23 Feb 2008 15:03:00 GMT
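
The subset claim in comment #1 can be checked outside the database. This is a small Python sketch, using Python's own `utf-8` and `ascii` codecs as stand-ins for Oracle's UTF8 and US7ASCII; the helper name is hypothetical.

```python
def ascii_bytes_unchanged_in_utf8() -> bool:
    """Return True if every 7-bit ASCII code point is the same single byte in UTF-8."""
    return all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))


if __name__ == "__main__":
    print(ascii_bytes_unchanged_in_utf8())
    # A character outside 7-bit ASCII cannot be encoded as ASCII at all.
    try:
        "\u00ea".encode("ascii")  # e with circumflex
    except UnicodeEncodeError as exc:
        print("not ASCII:", exc.reason)
```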
    • Thanks for this.

      The DB is WE8MSWIN1252, but I don't think this supports accented characters either?

      #2; Sat, 23 Feb 2008 15:04:00 GMT
    • OK. That's a little better: Windows-1252 does support a number of accented characters. You can get the full set of supported characters from the Microsoft web site:

      http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx

      If you want to convert data to UTF8 in the database, you can use the CONVERT function

      SCOTT@nx102 JCAVE9420> select convert( '?a?êê?', 'UTF8' ) from dual;

      CONVERT('???

      --

      ?a?êê?

      Elapsed: 00:00:00.12

      If you are trying to generate UTF8 output at the client, however, you would probably want to set the client's NLS_LANG to request UTF8 data. Specifics about the client application implementation, however, can add additional wrinkles...

      If the client is using JDBC, for example, the conversion to Unicode happens automatically, but the data will be converted to UTF16. If you wanted to change the encoding, you'd have to do that at the output edge (i.e. when you create the output file in Java).

      Justin

      #3; Sat, 23 Feb 2008 15:05:00 GMT
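
Comment #3 suggests changing the encoding at the output edge, when the file is created. A minimal sketch of that idea in Python rather than Java, assuming the client has received Windows-1252 bytes or an already decoded string; `cp1252_to_utf8` and `write_utf8_file` are hypothetical helper names.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes to text, then re-encode that text as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


def write_utf8_file(path: str, text: str) -> None:
    """Write already-decoded text to a file, choosing UTF-8 at the output edge."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    # 0xEA is e-circumflex in Windows-1252; UTF-8 needs two bytes for it.
    print(cp1252_to_utf8(b"\xea").hex())
```

The point of the design is that the in-memory string has no byte encoding of its own; the encoding is only fixed when bytes are written.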
    • Hi, I'm still a bit confused, as what you suggest is what I have been trying.

      Is there something else I am doing wrong?

      SQL> select convert( '?a?êê?', 'UTF8' ) from dual;

      CONVERT('BBNJJT','UTF8')

      --

      BbnJjt

      The problem seems to be with how the data is loaded into CONVERT, not the output?

      thanks

      james

      #4; Sat, 23 Feb 2008 15:06:00 GMT
    • The character set/language thing is something that always confuses people.

      It's not just a case of what character set/language the database is set to, but also what your operating system is set to. The database might be using the correct character set, but when it comes to displaying the characters the OS might have trouble doing that, or might convert them to the wrong things.

      One of the trainers at Oracle University (UK) is Welsh and it's one of her bugbears that Oracle doesn't support Welsh as a language. They also don't support Klingon or Elvish even though they've had requests to include both. :)

      #5; Sat, 23 Feb 2008 15:07:00 GMT
    • Hello, thanks for the advice. I'm almost there.

      I set the NLS_LANG environment variable on my workstation to AMERICAN_AMERICA.WE8MSWIN1252 to match the DB I am connecting to.

      When I try to run a convert I get:

      SQL> SELECT CONVERT ('?a?êê?','WE8MSWIN1252','UTF8') FROM DUAL ;

      CONVERT('???êê?','WE8MSWIN1252

      --

      ???

      So it looks like the string is correct on the way into the CONVERT function, but I guess that because NLS_LANG is now set to WE8MSWIN1252 the client can't display the output in UTF-8 format.

      Any ideas?

      #7; Sat, 23 Feb 2008 15:09:00 GMT
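
The display problem guessed at in comment #7 can be reproduced without a database. This is a hedged Python sketch of what a client that assumes Windows-1252 would show if it were handed UTF-8 bytes; the function name is hypothetical, and whether SQL*Plus behaves exactly this way is an assumption. Looking at the raw bytes in hex avoids relying on the display at all.

```python
def as_seen_by_cp1252_client(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    sample = "\u00ea"  # e with circumflex
    # The two UTF-8 bytes show up as two unrelated Windows-1252 characters.
    print(ascii(as_seen_by_cp1252_client(sample)))
    # Hex output is unambiguous regardless of how the terminal is configured.
    print(sample.encode("utf-8").hex())
```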
    • Probably the unconverted characters are not contained in the first charset.

      If this table is right:

      http://en.wikipedia.org/wiki/Windows-1252

      ...then there is no conversion for values outside the first charset.

      But I may have made a mistake.

      Are you sure ?, a, ?, ê, ê and ? are in the 1252 charset?

      I cannot tell whether the characters you posted differ from the similar ones in the table on Wikipedia, which is why I ask.

      Anyway, this output seems to confirm my suspicion.

      Processing ...

      SELECT convert ('?a?€êê?','WE8MSWIN1252','UTF8') FROM DUAL

      Query finished, retrieving results...

      CONVERT('¨a?€¨ê?','WE8MSWIN1252','UTF8')

      ---

      ¨¨¨¨

      1 row(s) retrieved

      Processing ...

      SELECT convert ('?a?€êê?','UTF8','UTF8') FROM DUAL

      Query finished, retrieving results...

      CONVERT('¨a?€¨ê?','UTF8','UTF8')

      --

      ¨a?€¨ê?

      1 row(s) retrieved

      Processing ...

      SELECT convert ('?a?€êê?','UTF8','WE8MSWIN1252') FROM DUAL

      Query finished, retrieving results...

      CONVERT('¨a?€¨ê?','UTF8','WE8MSWIN1252')

      ---

      ?¨???????¨????

      1 row(s) retrieved

      Processing ...

      SELECT convert ('?a?€êê?','WE8PC858','UTF8') FROM DUAL

      Query finished, retrieving results...

      CONVERT('¨a?€¨ê?','WE8PC858','UTF8')

      --

      ??

      1 row(s) retrieved

      Processing ...

      SELECT convert ('?a?€êê?','UTF8','WE8PC858') FROM DUAL

      Query finished, retrieving results...

      CONVERT('¨a?€¨ê?','UTF8','WE8PC858')

      --

      ??????ˉ??????????

      1 row(s) retrieved

      Some characters are not supported on my DB so try these queries on yours to prove it.

      SELECT convert ('?a?€êê?','WE8MSWIN1252','UTF8') FROM DUAL;

      SELECT convert ('?a?€êê?','UTF8','UTF8') FROM DUAL;

      SELECT convert ('?a?€êê?','UTF8','WE8MSWIN1252') FROM DUAL;

      SELECT convert ('?a?€êê?','WE8PC858','UTF8') FROM DUAL;

      SELECT convert ('?a?€êê?','UTF8','WE8PC858') FROM DUAL;

      Bye Alessandro

      #8; Sat, 23 Feb 2008 15:10:00 GMT
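
The question in comment #8, whether a given character exists in Windows-1252 at all, can be answered mechanically. A minimal Python sketch, assuming Python's `cp1252` codec matches the code page in question; `in_charset` is a hypothetical helper.

```python
def in_charset(ch: str, codec: str = "cp1252") -> bool:
    """Return True if the character can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # e-circumflex, a-circumflex, euro sign, w-circumflex, y-circumflex
    for ch in ["\u00ea", "\u00e2", "\u20ac", "\u0175", "\u0177"]:
        print(ascii(ch), in_charset(ch))
```

Under this check ê, â and € are in Windows-1252, while the Welsh ŵ and ŷ are not, which is consistent with needing a replacement scheme or a Unicode character set for those two letters.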
    • Hello, thanks for looking at this.

      ?a?êê? are definitely on the Windows code page.

      As I understand it, the thing you need to convert from must be the NLS_CHARACTERSET of your database, in my case WE8MSWIN1252.

      I tried all the converts you listed and got the same result.

      The one I would have expected to work would have been

      SQL> SELECT CONVERT ('?a?êê?','WE8MSWIN1252','UTF8') FROM DUAL ;

      but this returns

      CONVERT('???êê?','WE8MSWIN1252

      --

      ???

      I have the local environment variable for NLS_LANG set to WE8MSWIN1252,

      so I'm still puzzled.

      #9; Sat, 23 Feb 2008 15:11:00 GMT
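
One possible explanation for the result in comment #9, offered as an assumption rather than a diagnosis: Oracle documents the argument order as CONVERT(char, dest_char_set, source_char_set), so CONVERT(x, 'WE8MSWIN1252', 'UTF8') would treat the input as UTF8 and convert it to WE8MSWIN1252, the opposite of what was intended. The Python sketch below only imitates that direction mix-up with Python's own codecs; the helper names are hypothetical.

```python
def misread_cp1252_as_utf8(text: str) -> str:
    """Take Windows-1252 bytes and wrongly treat them as UTF-8 source data."""
    raw = text.encode("cp1252")
    # Invalid UTF-8 sequences are replaced with U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def cp1252_source_to_utf8(text: str) -> bytes:
    """The intended direction: Windows-1252 source, UTF-8 destination."""
    return text.encode("cp1252").decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    sample = "a\u00ea"  # 'a' followed by e-circumflex
    print(ascii(misread_cp1252_as_utf8(sample)))  # accented character is lost
    print(cp1252_source_to_utf8(sample).hex())    # accented character survives
```

Plain ASCII letters survive either direction, which would match the earlier observation that only the accented characters come back as replacement marks.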