Strings on Pocket PC, Unicode and ANSI

Vassili Philippov (vasja@spbteam.com), September 26, 2001.

Introduction

All WinAPI functions on Pocket PC work with Unicode strings. I think that was a great idea, but old habits remain: developers are used to working with ANSI strings from desktop development. This article describes the common problems that result.

Forget about ANSI strings

If you are a Pocket PC developer, you should get into the habit of wrapping every string constant in the _T macro. Here are two code samples that will not work:

AfxMessageBox("My message here");
TRACE("My debug message here");

The code above will not work because WinAPI and MFC functions expect Unicode strings, but it passes them ANSI strings. The correct code is the following:

AfxMessageBox(_T("My message here"));
TRACE(_T("My debug message here"));

_T, TEXT, L""

The _T and TEXT macros are standard macros that work on both Pocket PC and desktop computers. They produce a Unicode string when the project is built for Unicode (the _UNICODE and UNICODE symbols are defined, as they are for Pocket PC and usually for Windows NT) and an ANSI string when it is built for systems that do not use Unicode in WinAPI (Windows 95, Windows 98, etc.).
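
As a rough illustration of the mechanism, the sketch below mimics how a _T-style macro can switch between the two kinds of literal. MY_UNICODE, MY_T, my_tchar, my_tcslen and GetGreeting are hypothetical names invented for this example; the real definitions live in tchar.h and are driven by the _UNICODE symbol.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical build switch standing in for _UNICODE.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;        // wide characters in a Unicode build
#define MY_T_IMPL(x) L##x        // paste the L prefix onto the literal
#else
typedef char my_tchar;           // plain characters in an ANSI build
#define MY_T_IMPL(x) x           // leave the literal unchanged
#endif

// Second macro level, so that an argument that is itself a macro is
// expanded before the L prefix is pasted on.
#define MY_T(x) MY_T_IMPL(x)

// Length helper that follows the same switch.
inline std::size_t my_tcslen(const my_tchar *s)
{
#ifdef MY_UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}

// The same source line yields a wide or a narrow literal depending on the switch.
inline const my_tchar *GetGreeting()
{
    return MY_T("My message here");
}
```

Removing the #define MY_UNICODE line turns every MY_T literal into an ordinary narrow string without touching the rest of the code, which is the point of writing _T everywhere.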

L is the standard C++ prefix for wide-character string literals, which are the Unicode strings on Windows. So "string" is a constant ANSI string in C++, but L"string" is a constant Unicode string.
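
To make the difference concrete, here is a minimal sketch that compares the two literals. The names kNarrow, kWide and SameCharacterCount are made up for the example. The element size of wchar_t is 2 bytes on Windows but may differ on other platforms, so the sketch avoids assuming a fixed value.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// A narrow literal is an array of char; an L literal is an array of wchar_t.
const char    kNarrow[] = "string";
const wchar_t kWide[]   = L"string";

// Both arrays hold six characters plus the terminating null,
// so the character counts are the same.
inline bool SameCharacterCount()
{
    return std::strlen(kNarrow) == std::wcslen(kWide);
}

// The wide array occupies more bytes, because every element is a wchar_t.
inline std::size_t WideBytes()
{
    return sizeof(kWide);
}
```

This is why a wide string cannot simply be cast to char* and passed to an ANSI function: the characters are the same, but the memory layout is not.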

I prefer the _T macro because you only have to learn once to write _T around every string constant, and that reflex works when you develop for Pocket PC as well as for the desktop.

Convert from ANSI to Unicode

Sometimes you have an ANSI string and need to convert it to Unicode. This usually happens when you use a third-party library that works with ANSI strings. In most cases it is enough to create a new CString object with the ANSI string as the parameter. Here is a code sample:

const char *pAnsiString = "Some test string";
CString strUnicode = pAnsiString;

One thing to keep in mind is that the default code page is used for the conversion.
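
When the CString constructor is not available, the conversion can be written out by hand. The following is a minimal sketch using the standard C function mbstowcs, which converts according to the current C locale; the helper name WidenString is hypothetical. On Windows, MultiByteToWideChar plays the same role and takes the code page as its first argument.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow string to a wide string using the
// current C locale. Returns an empty string if the input cannot be converted.
std::wstring WidenString(const char *pNarrow)
{
    // Every character takes at least one byte in the narrow string, so the
    // wide result never has more elements than the input has bytes.
    const std::size_t nMax = std::strlen(pNarrow);
    std::vector<wchar_t> buffer(nMax + 1);

    const std::size_t nWritten = std::mbstowcs(&buffer[0], pNarrow, nMax + 1);
    if (nWritten == static_cast<std::size_t>(-1))
        return std::wstring();   // invalid byte sequence for this locale

    return std::wstring(&buffer[0], nWritten);
}
```

For example, WidenString("Some test string") yields the wide string L"Some test string".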

Convert from Unicode to ANSI

There are several ways to convert a Unicode string to ANSI. In any case, you should answer one question before coding: which code page should be used, and what should happen to characters that cannot be converted to ANSI in that code page?

wcstombs

The wcstombs function is the simplest way to convert a Unicode string to ANSI. It has no parameters such as the code page, so you cannot control how the conversion is done. Here is a code sample:

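// Returns a newly allocated ANSI copy of s; the caller must delete[] it.
char* GetAnsiString(const CString &s)
{
    // One character can need more than one byte (up to MB_CUR_MAX).
    int nSize = s.GetLength() * MB_CUR_MAX;
    char *pAnsiString = new char[nSize + 1];
    size_t nWritten = wcstombs(pAnsiString, s, nSize);
    if (nWritten == (size_t)-1)
        nWritten = 0; // conversion failed: return an empty string
    pAnsiString[nWritten] = '\0';
    return pAnsiString;
}

CString strUnicode = _T("Some test string");
char *pAnsiString = GetAnsiString(strUnicode);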
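
A variant of the function above that does not depend on MFC and hands back a std::string, so the caller does not have to free anything, could look like the following sketch. NarrowString is a hypothetical name. The function reports failure instead of returning a partial result, which is one possible answer to the question of what to do with unconvertible characters.

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a narrow string using the
// current C locale. Returns false when a character cannot be converted.
bool NarrowString(const std::wstring &wide, std::string &narrow)
{
    // MB_LEN_MAX is the largest number of bytes a single character can need
    // in any supported locale, so this buffer is always large enough.
    std::vector<char> buffer(wide.size() * MB_LEN_MAX + 1);

    const std::size_t nWritten =
        std::wcstombs(&buffer[0], wide.c_str(), buffer.size());
    if (nWritten == static_cast<std::size_t>(-1))
        return false;            // some character has no narrow equivalent

    narrow.assign(&buffer[0], nWritten);
    return true;
}
```

In the default "C" locale, plain ASCII text converts without loss, while a character such as the euro sign makes the function return false.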

WideCharToMultiByte

WideCharToMultiByte is a powerful function with many parameters. It lets you control everything: the code page, the default character used for unconvertible characters, and so on. Here is a code sample:

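// Returns a newly allocated ANSI copy of s in the given code page;
// the caller must delete[] it.
char* GetAnsiString(const CString &s, UINT nCodePage)
{
    // First call: ask how many bytes are needed, including the terminating null.
    int nSize = WideCharToMultiByte(nCodePage, 0, s, -1, NULL, 0, NULL, NULL);
    char *pAnsiString = new char[nSize + 1];
    // Second call: do the conversion.
    nSize = WideCharToMultiByte(nCodePage, 0, s, -1, pAnsiString, nSize, NULL, NULL);
    pAnsiString[nSize] = '\0'; // empty string if the conversion failed
    return pAnsiString;
}

CString strUnicode = _T("Some test string");
char *pAnsiString = GetAnsiString(strUnicode, CP_ACP);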

printf, scanf, etc

Functions like printf and scanf use ANSI strings. You can use the Unicode versions of these functions, wprintf, wscanf, etc. (just add the w prefix), but it is better to use _tprintf, _tscanf, etc., because they work in both Unicode and ANSI configurations.
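
As a small illustration of the wide-character family, the sketch below formats text into a wide buffer with swprintf, the wide counterpart of snprintf. The helper name FormatValue is hypothetical. It uses %ls for the wide string argument, because the meaning of a plain %s in wide format strings differs between Microsoft's runtime and the C standard.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: builds "name=value" as a wide string.
std::wstring FormatValue(const wchar_t *pName, int nValue)
{
    wchar_t buffer[64];
    // The second argument is the buffer size in characters, not in bytes.
    const int nWritten = std::swprintf(buffer,
                                       sizeof(buffer) / sizeof(buffer[0]),
                                       L"%ls=%d", pName, nValue);
    if (nWritten < 0)
        return std::wstring();   // formatting failed or output did not fit

    return std::wstring(buffer, nWritten);
}
```

Note that the format string itself must be a wide literal; passing a narrow format string to a wide function is the same mistake as passing "text" where _T("text") is expected.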

Conclusion

Using only Unicode strings simplifies development, but you must remember to put _T around every string constant. Some legacy libraries use ANSI strings; you can convert Unicode strings to ANSI and back, but keep the code page in mind.

Related resources: