Re: Transferring Unicode characters in socket programming!




[...]

B. If I send any Unicode string (????), it arrives as ????????????. <

You send 4 characters (????), which is 8 bytes. So in send(...) you pass a
pointer to the buffer that holds the ???? string, and you specify the correct
size for that buffer, i.e. 8 bytes, right? What is the return value of
send(...) in this case? It is strange that you receive 12 characters
(24 bytes) when you send only 4 characters (8 bytes).
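To make the byte arithmetic above concrete, here is a minimal sketch. It assumes 2-byte UTF-16 code units, as WCHAR is on Windows, and uses char16_t so the numbers come out the same on any platform. WideByteCount is a hypothetical helper name, not something from the code discussed in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes to pass to send() for a UTF-16 string.
// send() counts bytes, so the character count is multiplied by the size of
// one code unit (2 bytes for char16_t, matching WCHAR on Windows).
std::size_t WideByteCount(const std::u16string& text)
{
    return text.size() * sizeof(char16_t);
}
```

For a 4-character string this gives 8, which is the length that send(...) should be given and, on success, the value it would normally return.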

3) You are talking about conversion between MBCS (multibyte) and Unicode. I
agree with your point. But my problem is totally different: after
conversion I should substitute string handling <

Let's forget about this problem for now, and let's figure out why you
receive the string ???????????? when you send the string ????

Could you please show me how you initialize the array with data, and how
you pass it to send(...)?

--
Volodymyr
<raghupise@xxxxxxxxx> wrote in message
news:8d4266cd-ebac-4493-a903-2f38445f52d0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Dear Chris Becke,

2) I will explain clearly.

A. I am sending Unicode as a string, but it arrives as "??????????????
?ý"" in the buffer argument of the recv
function. This is surprising output.
B. If I send any Unicode string (????), it arrives as ????????????.
I believe you understand my problem.

3) You are talking about conversion between MBCS (multibyte) and Unicode.
I agree with your point. But my problem is totally different:
after conversion I should replace the string handling
functions (find, substr, compare) with wchar_t functions (if
possible).
If this is not possible, shall I try a string to wstring
conversion?
There, I believe, I will find string handling functions like find,
compare, etc.
If you still have doubts about my problem, you are most welcome to ask
any questions.

If your previous post answers the third question mentioned
above, please explain it to me in more depth.



Kind Regards,
Raghavendra L Pise..



Chris Becke wrote:
<raghupise@xxxxxxxxx> wrote in message
news:569ae0c7-42c8-4f5f-9bac-679a9099458c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr Shcherbyna,

1) I really made a mistake by asking for a code snippet and by posting a
long code listing.

2) My immediate doubt about my code is why recvbuf is returning double-byte
characters.
Actually I am sending a simple string, but it is returning double-byte
characters.

What do you mean in this context by double-byte characters? If you examine
the actual byte values sent and received, are they the same?

An easy way to check, if you are using Visual Studio 6 (this should also
work in other versions, but I know it works in VS6), is to place the
variable in the debug window with a ',m' suffix, which shows a memory dump
of the variable.

In your case you would enter

recvbuf,m

in your VS watch window to see the byte values.
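If a debugger is not at hand, the same comparison can be done in code. The following is only a sketch: HexDump is a hypothetical helper, not part of the poster's project, and it simply formats a buffer as hex bytes so that the data passed to send() and the data returned by recv() can be logged and compared.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a buffer as space-separated hex bytes so the
// sent and received data can be compared byte for byte.
std::string HexDump(const void* data, std::size_t length)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char cell[4];
    for (std::size_t i = 0; i < length; ++i) {
        std::snprintf(cell, sizeof(cell), "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) {
            out += ' ';
        }
        out += cell;
    }
    return out;
}
```

The ANSI string "ab" dumps as 61 62, while the same text in little-endian UTF-16 dumps as 61 00 62 00. Alternating zero bytes in the received data are therefore a sign that wide characters were put on the wire.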


bytesRecv = recv( ConnectSocket, (char*)recvbuf, sizeof(recvbuf), 0 );

3) In our project design, I have replaced string with the wchar_t data
type. In another class I have to replace
string functions like "find" and "substr" with wchar_t
substitute functions.
How can I achieve this? If possible, give me some hints.
Should I use std::wstring instead of wchar_t functions to resolve
the above problem (third question)?

Look. All these frameworks are obscuring the problem. It's usually much
clearer to deal with very simple C, rather than C++, when trying to
understand API issues... So, in that vein, in order to transform a string
from MBCS (multibyte) to Unicode and send it, you could make a function
like this

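int SendStringAsUnicode(SOCKET s, LPCSTR pszMbcsString)
{
    WCHAR szWideString[1024];
    // Convert to wide characters. With -1 as the input length, the returned
    // count includes the terminating null; 0 means the conversion failed.
    int cchWide =
        MultiByteToWideChar(CP_ACP,0,pszMbcsString,-1,szWideString,sizeof(szWideString)/sizeof(WCHAR));
    if(cchWide == 0)
        return SOCKET_ERROR;
    // send() takes a length in bytes, not in characters.
    return send(s,(char*)szWideString,cchWide*sizeof(WCHAR),0);
}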

To receive a Unicode string and transform it back into MBCS, you could do
something like

int RecvUnicodeAsAnsiString(SOCKET s,LPSTR pszBuffer,INT cchBuffer)
{
    WCHAR szWideString[1024];
    // recv() takes the buffer size in bytes and returns the number of bytes read.
    int cbRecv =
        recv(s,(char*)szWideString,sizeof(szWideString),0);
    if(cbRecv <=0)
        return cbRecv;
    // Convert only the characters actually received (bytes / sizeof(WCHAR)).
    int cch =
        WideCharToMultiByte(CP_ACP,0,szWideString,cbRecv/sizeof(WCHAR),pszBuffer,cchBuffer,NULL,NULL);
    return cch;
}

The receive function will work to test the basic idea, but I'm sure you
can see it has a huge number of issues: if an odd number of bytes is
received it will lose a byte and the next read will be misaligned, and if
the passed-in buffer is too small then characters will just be lost.
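One possible way around the odd-byte problem is to buffer whatever recv() returns and only hand out complete characters. The sketch below is an illustration under assumptions, not the approach used in the poster's project: WideReceiveBuffer is a hypothetical class, characters are assumed to be 2-byte code units (char16_t here, to keep the example platform-neutral), and sender and receiver are assumed to use the same byte order.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper class: collects raw bytes as they arrive from recv()
// and hands out only complete 2-byte code units. A trailing odd byte stays
// buffered until its partner arrives, so later reads are not misaligned.
class WideReceiveBuffer
{
public:
    // Append the bytes returned by one recv() call.
    void Feed(const char* data, std::size_t length)
    {
        pending_.insert(pending_.end(), data, data + length);
    }

    // Remove and return every complete code unit received so far.
    std::u16string TakeComplete()
    {
        const std::size_t units = pending_.size() / sizeof(char16_t);
        const std::size_t byteCount = units * sizeof(char16_t);
        std::u16string text(units, u'\0');
        if (units != 0) {
            std::memcpy(&text[0], pending_.data(), byteCount);
        }
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(byteCount));
        return text;
    }

private:
    std::vector<char> pending_;
};
```

If a read returns three bytes, TakeComplete() yields one character and keeps the third byte until the next Feed() supplies the rest of that character.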

Also note that, once in Unicode, you should really KEEP the string in
Unicode...
If you call GetACP() to get the system's ANSI code page identifier, and the
result on the server differs from the result on the client, then very
likely many characters in the transferred string will be replaced with
??'s.


4) I have replaced the following code with
"STRCAT(strRecvbuf.c_str(),recvbuf);":
strRecvbuf.append(recvbuf);

And the STRCAT definition is
inline void STRCAT(uchar* d,const uchar* s) {
pstrconcat(d,s);
}

I am asking question (4) because I am not sure whether this
conversion works correctly when we send single-byte or double-byte
characters.

Kind Regards,
Raghavendra L Pise.



On Dec 6, 12:32 pm, "Volodymyr Shcherbyna"
<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello,

You have to realize that we help here not by doing _your_ work ourselves.
We help by providing hints and solutions that allow you to solve the
problem yourself. More specifically, I give answers. I can also take a look
at a short piece of code and write the corrected version, but when the OP
just copies and pastes from his project into the NG, it shows that he does
not respect our time. Because:

- the code is usually formatted badly
- the amount of code is huge
- the code contains unnecessary stuff, which makes it difficult to
understand the cause of the problem
- no one ever attaches the source files to the messages

If you have questions, I can help, no problem. But the question should be
stated precisely, for example:

- How should I store Unicode strings, and what string functions should I
use to manipulate them?

You can also take a short piece of code that is clean and compilable and
that depicts your problem, and attach it to your post; I will check it. But
please, don't provide big listings.

--
Volodymyr, Windows SDK MVP

<raghup...@xxxxxxxxx> wrote in message

news:23967872-404f-4e3c-a2a6-08dfb82cc0ff@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx



Hi Volodymyr,

1) I am requesting you to look at my Parse function from the CParser
class. I posted it in
earlier messages.
I really have many problems to fix with Unicode in my project.
At the same time, I need your help.
You suggested that I look for appropriate Unicode functions for
string manipulation.
I wasn't able to work out your solution.
Please have a look at my code and make some modifications.
Here I used string.substr or string.find.
How can I substitute these with Unicode functions?
Here I am in trouble.

2) Also, I am still getting double-byte characters in recv() if I send
a simple string.
I am looking for your solution to this question.

Kind Regards,
Raghavendra L Pise.

On Dec 5, 12:51 pm, "Chris Becke" <chris.be...@xxxxxxxxx> wrote:
You need to write a small test application using blocking sockets with
minimal error handling just to get this concept down. Once the simple
example is working, port it to your application. At the moment it is
difficult to tell if the problem is in your conversion of ANSI to Unicode,
Unicode back to ANSI, or somewhere in your socket code.

<raghup...@xxxxxxxxx> wrote in message

news:7b4600b5-762b-45f5-9102-de56343e1e1c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Hi Volodymyr,

1) As you said, I have to use std::wstring for Unicode characters. But
how can I achieve this?
Which header file do I need to use?

2) What about the recv() function problem?

As I explained in my earlier post, I was getting junk Unicode
characters when I send simple characters.

Looking forward to your answers.

Kind Regards,
Raghavendra L Pise.

On Dec 4, 5:56 pm, "Volodymyr Shcherbyna"
<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello again ;)

As I see further in your code, you have more problems. You receive the data
in Unicode, but then you pass the resulting Unicode string to the Parse
method, which is defined as:

void CParser::Parse(string strRecvbuf, CMessageIn* mi)

And this is wrong, because this method takes a std::string object as its
input parameter, which is a wrapper over an ANSI string. You cannot simply
take a Unicode string and convert it into a corresponding std::string
object. You should use std::wstring for Unicode characters. Also, you will
need to use the appropriate Unicode functions for string manipulation.
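As a rough illustration of what that switch could look like: std::wstring comes from the standard <string> header and offers the same find, substr and compare members as std::string, with wide literals written as L"...". The function below is a hypothetical, simplified stand-in for the command extraction in the parser, assuming a message shaped like "/command(args)". It is not the poster's actual code.

```cpp
#include <string>

// Hypothetical wide-character counterpart of the command extraction:
// returns the text between a leading L'/' and the first L'(',
// or an empty string when the message does not start with L'/'.
std::wstring ExtractCommand(const std::wstring& message)
{
    if (message.compare(0, 1, L"/") != 0) {
        return std::wstring();
    }
    const std::wstring::size_type argsPos = message.find(L'(');
    if (argsPos == std::wstring::npos) {
        return message.substr(1);            // no arguments: rest of the message
    }
    return message.substr(1, argsPos - 1);   // text between '/' and '('
}
```

Storing the result of find() in std::wstring::size_type rather than int keeps the comparison with npos free of signed/unsigned conversions.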

--
Volodymyr

<raghup...@xxxxxxxxx> wrote in message

news:34aba993-c626-4a8b-bca6-720576948d29@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr,

1) As you said, I have made the modification given below.

recvbuf[bytesRecv / sizeof(wchar_t)] = '\0';

2) You asked me what the return value is after the recv()
function.
I am getting surprising output for this.
Earlier I was trying with these double-byte characters: "????".
But now I am sending a simple string and it arrives as
"??????
????????? ".
I don't know why it arrives as double-byte characters when I am sending a
simple string called "unicode1".
I believe double-byte characters are being stored somewhere
after the recv() function executes.

3) I have another problem,
in the line below from the Listen function.

parser.Parse((char*)strRecvbuf.c_str(), &mi); //Rlp has done to fix unicode

The server gets stuck at the line above.
The definition of Parse() is as follows.
Of course we use the string data type, and for that reason it gets
stuck.
But I need your help to fix this problem also.

void CParser::Parse(string strRecvbuf, CMessageIn* mi)
{
    //special extraction of arguments for tagstring, whichs argument can contain c_strArgDelim
    bool bArgHasDelim = 0;
    mi->SetCmd(op_error);
    string strCmd("");

    RetrieveCmd(strRecvbuf, &strCmd, mi);
    InterpretCmd(strCmd, mi);
    RetrieveArgs(strRecvbuf, mi);
}

int CParser::RetrieveCmd(string strRecvbuf, string* strCmd, CMessageIn* mi)
{
    //
    //Retrieve command
    //
    int nCmdStartPos = 0, nCmdEndPos;
    int nMessEndPos = strRecvbuf.find(c_strEndOfInMess);
    //Check if the message contains a command i.e starts with "/" or ">"
    if (strRecvbuf.substr(0,1) == c_strStartOfCmd ||
        strRecvbuf.substr(0,1) == c_strStartOfCmd2) {
        nCmdStartPos++;
        mi->SetShallReturnMessage(true);
        //"//" or ">>" gives silent mode (no answers)
        if (strRecvbuf.substr(1,1) == c_strStartOfCmd ||
            strRecvbuf.substr(1,1) == c_strStartOfCmd2) {
            nCmdStartPos++;
            mi->SetShallReturnMessage(false);
        }

        //Find start of arguments and extract the string inbetween
        nCmdEndPos = strRecvbuf.find(c_strStartOfArgs);
        if (nCmdEndPos != string::npos) {
            *strCmd = strRecvbuf.substr(nCmdStartPos, nCmdEndPos - nCmdStartPos);
        }
        //Otherwise the command MUST be terminated by c_strEndOfInMess
        else {
            *strCmd = strRecvbuf.substr(nCmdStartPos, nMessEndPos - nCmdStartPos);
        }
    } else {
        throw DocException("RetrieveCmd failed. No '/' in beginning of command.");
    }

    return 0;
}

int CParser::RetrieveArgs(string strRecvbuf, CMessageIn* mi)
{
    //
    //Retrieve arguments
    //
    string strExtracted("");
    string strArgs("");
    int nArgStartPos, nArgEndPos;
    nArgStartPos = strRecvbuf.find(c_strStartOfArgs);
    //do not include the "\r\n" at the end
    nArgEndPos = strRecvbuf.length() - c_strEndOfInMess.length() - 1;
    //Check if a '(' was found in the string
    if (nArgStartPos != string::npos) {
        strArgs = strRecvbuf.substr(nArgStartPos, nArgEndPos - nArgStartPos + 1);
        //RemoveBackSlash(&strArgs);
        //Check if there is an argument with delimiters
        if (mi->GetArgHasDelim()) {
            //extract one argument
            strExtracted = strArgs.substr(1, strArgs.size() - 2);
            mi->PushArg(strExtracted);
            CLogger::Log("argument with delimiters", strExtracted);
        }
        //Check if there is only an empty paranthesis
        else if (removeSigns(strArgs, c_strSpace) != c_strEmptyParenthesis) {
            //start retrieving several args between the paranthesis
            int nSpacePos = 0, nArgDelimPos = 0, nLastArgDelimPos = 0;
            bool bHasMoreArgs = true;
            nArgEndPos = strArgs.length() - 1;
            while (bHasMoreArgs) {
                //find the next delimiter
                nArgDelimPos = GetNextDelimPos(nLastArgDelimPos, strArgs);
                //check if there is only one argument left
                if (nArgDelimPos == string::npos) {
                    bHasMoreArgs = false;
                    nArgDelimPos = nArgEndPos;
                }

                //extract one argument
                strExtracted = strArgs.substr(nLastArgDelimPos + 1,
                                              nArgDelimPos - nLastArgDelimPos - 1);
                //Trim the arg from spaces
                strExtracted = removeEdgeSigns(strExtracted, c_strSpace);
                if (strExtracted.size() <= 0) {
                    throw DocException("RetrieveArgs failed. Arg no " +
                        toString(mi->GetNoOfArgs() + 1) + " has no length.");
                }

                //Remove the backslashes from the argument
                //RemoveBackSlash(&strExtracted);
                //save strExtacted in tne messagein vector
                mi->PushArg(strExtracted);
                nLastArgDelimPos = nArgDelimPos;
            }
        }
    }
    return 0;
}

Kind Regards,
Raghavendra L Pise.

On Dec 4, 12:42 pm, "Volodymyr Shcherbyna"

<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello,

I took a brief look at the code, and I have one simple question:

- if you send a string, let's say "sometest string", do you receive exactly
the same string in your receive and listen methods (after the recv method
returns)?

If so, then the problem with sending Unicode text is solved. Also, I
noticed that you truncate the Unicode strings incorrectly when receiving,
for example:

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 32, 0 );

[...]

recvbuf[bytesRecv] = '\0';

Here you truncate the Unicode buffer at the wrong position, because
bytesRecv is the number of bytes, while recvbuf is an array of wide
characters. Since you want to truncate at a specific character, and not at
a BYTE, you should divide bytesRecv by sizeof(wchar_t) and only then
truncate, so:

recvbuf[bytesRecv / sizeof(wchar_t)] = '\0';

would be correct. Also, why do you allow exactly 32 bytes to be written in
this (and other) lines? If

...
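The index arithmetic from the post above can be sketched as follows. TerminatorIndex is a hypothetical helper, and the clamp to the buffer capacity is an added assumption rather than something stated in the thread. Characters are taken to be 2 bytes wide (char16_t here, matching wchar_t on Windows).

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: index at which to write the terminating null in a
// wide receive buffer. bytesReceived comes from recv() and counts bytes;
// the buffer is indexed in characters, so divide by the character size.
// capacityChars is the number of elements in the buffer and must be > 0;
// the result is clamped so the terminator never lands outside the buffer.
std::size_t TerminatorIndex(std::size_t bytesReceived, std::size_t capacityChars)
{
    const std::size_t charCount = bytesReceived / sizeof(char16_t);
    return std::min(charCount, capacityChars - 1);
}
```

For the 8-character string "unicode1", recv() reports 16 bytes, so the terminator belongs at index 8. Writing it at index 16 instead would leave 8 uninitialised characters in front of it, which may be one source of the junk characters described in this thread.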



