Re: Transferring Unicode characters in socket programming!



Dear Chris Becke,

2) I will explain clearly.

A. I am sending Unicode as a string, but it arrives as "搾捯潟数⡮㩣慜换⹜湕捩摯ㅥ焮
灸Щ"" in the buffer argument of the recv
function. This output is surprising.
B. If I send any Unicode string (硩硻碒碨), it arrives as 搾捯潟数⡮㩣慜换㽜㼿⸿硱.
I believe you understand my problem.

3) You are talking about conversion between MBCS (multibyte) and Unicode.
I agree with your point, but my problem is quite different:
after the conversion I need to replace the string handling
functions (find, substr, compare) with wchar_t functions, if
possible.
If this is not possible, shall I try a string to wstring
conversion?
I believe wstring has string handling functions such as find and
compare.
If you still have doubts about my problem, you are most welcome to ask
any questions.

If your previous post answers the third question above,
please explain it in more depth.



Kind Regards,
Raghavendra L Pise.



Chris Becke wrote:
<raghupise@xxxxxxxxx> wrote in message
news:569ae0c7-42c8-4f5f-9bac-679a9099458c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr Shcherbyna,

1) I really made a mistake by asking for a code snippet and posting a
long listing of code.

2) My immediate doubt about my code is why recvbuf is returning
double-byte characters.
Actually I am sending a simple string, but it comes back as double-byte
characters.

What do you mean in this context by double byte characters? If you examine
the actual byte values sent and received, are they the same?

An easy way to check if you are using Visual Studio 6 (this should also work
in other versions but I know it works in VS6) is to place the variable in
the debug window with a ',m' - which shows a memory dump of the variable.

In your case you would enter

recvbuf,m

in your VS watch window to see the byte values.
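
The same check can also be done in code. The following is a minimal sketch, not code from this thread: HexDump and PackLittleEndian are hypothetical helper names. It formats the raw bytes of a buffer as hex, and it shows the 16-bit value that results when two single-byte characters are read as one little-endian wide character, which is one plausible way plain text can show up as "double byte" characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format the first `len` bytes of `data` as hex pairs
// separated by spaces, e.g. "61 62" for the two bytes of "ab".
std::string HexDump(const void* data, std::size_t len)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::string out;
    char tmp[4];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof(tmp), "%02X", static_cast<unsigned>(p[i]));
        if (i != 0)
            out += ' ';
        out += tmp;
    }
    return out;
}

// Hypothetical helper: the 16-bit value a little-endian machine sees when
// two consecutive bytes are read as a single wide character.
std::uint16_t PackLittleEndian(unsigned char first, unsigned char second)
{
    return static_cast<std::uint16_t>(first | (second << 8));
}
```

Calling HexDump(recvbuf, bytesRecv) right after recv() and comparing it with a dump of the buffer passed to send() answers the question directly. As an illustration of the pairing effect, PackLittleEndian('>', 'd') gives 0x643E, a code point in the CJK range.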


bytesRecv = recv( ConnectSocket, (char*)recvbuf, sizeof(recvbuf), 0 );

3) In our project design, I have replaced string with the wchar_t data
type. In another class, do I have to replace
string functions like "find" and "substr" with wchar_t
substitute functions?
How can I achieve this? If possible, give me some hints.
Should I use std::wstring instead of wchar_t functions to resolve
the above problem (the third question)?

Look. All these frameworks are obscuring the problem. It's usually much
clearer to deal with very simple C, rather than C++, when trying to
understand API issues... So, in that vein, in order to transform a string
from MBCS (multibyte) to Unicode and send it, you could make a function like
this

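int SendStringAsUnicode(SOCKET s, LPCSTR pszMbcsString)
{
    WCHAR szWideString[1024];
    // Returns the number of WCHARs written, including the terminating
    // null (because of the -1 length), or 0 on failure.
    int cchWide = MultiByteToWideChar(CP_ACP, 0, pszMbcsString, -1,
        szWideString, sizeof(szWideString)/sizeof(WCHAR));
    if (cchWide == 0)
        return SOCKET_ERROR;
    // send() takes a length in bytes, not in characters.
    return send(s, (const char*)szWideString, cchWide * (int)sizeof(WCHAR), 0);
}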

To receive a Unicode string and transform it back into MBCS, you could do
something like

int RecvUnicodeAsAnsiString(SOCKET s, LPSTR pszBuffer, INT cchBuffer)
{
    WCHAR szWideString[1024];
    // recv() takes the buffer size in bytes and returns the number of
    // bytes actually read.
    int cbRecv = recv(s, (char*)szWideString, sizeof(szWideString), 0);
    if (cbRecv <= 0)
        return cbRecv;
    // Convert only the whole WCHARs that were received.
    int cch = WideCharToMultiByte(CP_ACP, 0, szWideString,
        cbRecv/sizeof(WCHAR), pszBuffer, cchBuffer, NULL, NULL);
    return cch;
}

The receive function will work to test the basic idea, but I'm sure you
can see it has a huge number of issues: if an odd number of bytes is
received it will lose a byte and the next read will be misaligned, and if
the passed-in buffer is too small then characters will just be lost.
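
One way to deal with the odd-byte problem is to keep leftover bytes between reads. The following is a minimal sketch under stated assumptions, not code from this thread: WideAssembler is a hypothetical class, the wire format is assumed to be UTF-16 little-endian, and char16_t stands in for WCHAR so the sketch stays portable. It is fed whatever recv() returned and only hands back complete 16-bit units.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collects raw bytes from successive recv() calls and
// releases only complete 16-bit units, keeping a trailing odd byte for later.
class WideAssembler
{
public:
    // Append `len` raw bytes exactly as they arrived from the socket.
    void Feed(const char* bytes, std::size_t len)
    {
        m_pending.append(bytes, len);
    }

    // Return all complete UTF-16LE code units received so far
    // (assumption: the sender wrote little-endian 16-bit units).
    std::u16string Take()
    {
        std::u16string out;
        std::size_t whole = m_pending.size() / 2 * 2;
        for (std::size_t i = 0; i < whole; i += 2) {
            unsigned lo = static_cast<unsigned char>(m_pending[i]);
            unsigned hi = static_cast<unsigned char>(m_pending[i + 1]);
            out.push_back(static_cast<char16_t>(lo | (hi << 8)));
        }
        m_pending.erase(0, whole);
        return out;
    }

    // Number of bytes still waiting for their other half (0 or 1 after Take()).
    std::size_t PendingBytes() const { return m_pending.size(); }

private:
    std::string m_pending;
};
```

This only fixes the alignment issue. It does not say where one message ends and the next begins; for that the protocol would still need a terminator or a length prefix.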

Also note that, once in Unicode, you should really KEEP the string in
Unicode...
If you call GetACP() to get the system's ANSI code page identifier, and the
result is different on the server from on the client, then very likely many
characters in the transferred string will be replaced with ??'s.


4) I have replaced the following code with
"STRCAT(strRecvbuf.c_str(),recvbuf);"
strRecvbuf.append(recvbuf);

And the STRCAT definition is
inline void STRCAT(uchar* d,const uchar* s) {
pstrconcat(d,s);
}

I am asking question (4) above because I am not sure whether this
conversion works correctly when we send single-byte or double-byte
characters.

Kind Regards,
Raghavendra L Pise.



On Dec 6, 12:32 pm, "Volodymyr Shcherbyna"
<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello,

You have to realize that we help here not by doing _your_ work ourselves.
We help by providing hints and solutions which allow you to solve the
problem yourself. More specifically, I give answers. I can also take a
look at a short piece of code and write the corrected version, but when
the OP just copies and pastes from his project into the NG, it shows that
he does not respect our time. Because:

- the code is usually formatted badly
- the amount of code is huge
- the code contains unnecessary stuff which makes it difficult to
understand the cause of the problem
- no one ever attaches the source files to the messages

If you have questions, I can help, no problem. But the question should be
specific, for example,

- How would I store Unicode strings, and what string functions should I
use to manipulate them?

You can also take a short piece of code which is clean and compilable and
which depicts your problem, and attach it to your post; I will check it.
But please, don't provide big listings.

--
Volodymyr, Windows SDK MVP

<raghup...@xxxxxxxxx> wrote in message

news:23967872-404f-4e3c-a2a6-08dfb82cc0ff@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx



Hi Volodymyr,

1) I am requesting you to look at my Parse function from the CParser
class. I posted it in an
earlier message.
I have a lot of trouble fixing the Unicode problem in my project,
and I need your help.
You suggested that I look at the appropriate Unicode functions for
string manipulation.
I wasn't able to work out your solution.
Please have a look at my code and make some modifications.
Here I used string.substr and string.find.
How can I substitute these with Unicode functions?
This is where I am in trouble.

2) I am still getting double-byte characters from recv() when I send a
simple string.
I am looking for your solution to this question.

Kind Regards,
Raghavendra L Pise.

On Dec 5, 12:51 pm, "Chris Becke" <chris.be...@xxxxxxxxx> wrote:
You need to write a small test application using blocking sockets with
minimal error handling just to get this concept down. Once the simple
example is working, port it to your application. At the moment it is
difficult to tell if the problem is in your conversion of ANSI to
Unicode, Unicode back to ANSI, or somewhere in your socket code.

<raghup...@xxxxxxxxx> wrote in message

news:7b4600b5-762b-45f5-9102-de56343e1e1c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Hi Volodymyr,

1) As you said, I have to use std::wstring for Unicode characters.
But how can I achieve this?
Which header file do I need to use?

2) What about the recv() function problem?

As I explained in my earlier post, I was getting junk Unicode
characters when I sent a simple character.

Looking forward to your answers.

Kind Regards,
Raghavendra L Pise.

On Dec 4, 5:56 pm, "Volodymyr Shcherbyna"
<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello again ;)

As I see further in your code, you have more problems. You receive the
data in Unicode, but then you pass the resulting Unicode string to the
Parse method, which is defined as:

void CParser::Parse(string strRecvbuf, CMessageIn* mi)

And this is wrong, because this method takes as an input parameter a
std::string object, which is a wrapper over an ANSI string. You cannot
simply take a Unicode string and convert it into the corresponding
std::string object. You should use std::wstring for Unicode characters.
Also, you will need to use the appropriate Unicode functions for string
manipulation.
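
std::wstring is declared in the <string> header and has the same member functions as std::string (find, substr, compare), so parsing code can usually be ported by changing the types and prefixing literals with L. The following is a minimal sketch, not the poster's actual parser: ExtractCommand is a hypothetical function, and the '/' and '(' delimiters are assumptions taken from the comments in the posted code.

```cpp
#include <string>

// Hypothetical helper: return the command name between a leading '/' and
// the first '(' (or the end of the string if there is no '(').
// Returns an empty string when the message does not start with '/'.
std::wstring ExtractCommand(const std::wstring& message)
{
    const std::wstring startOfCmd = L"/";   // assumed command prefix
    const std::wstring startOfArgs = L"(";  // assumed start of arguments

    // compare() works on wide strings exactly as it does on std::string.
    if (message.compare(0, startOfCmd.size(), startOfCmd) != 0)
        return std::wstring();

    std::wstring::size_type start = startOfCmd.size();
    std::wstring::size_type end = message.find(startOfArgs, start);
    if (end == std::wstring::npos)
        return message.substr(start);
    return message.substr(start, end - start);
}
```

Note that positions are kept in std::wstring::size_type rather than int, so the comparison against npos does not depend on an implicit conversion.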

--
Volodymyr

<raghup...@xxxxxxxxx> wrote in message

news:34aba993-c626-4a8b-bca6-720576948d29@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr,

1) As you said, I have made the modification given below.

recvbuf[bytesRecv / sizeof(wchar_t)] = '\0';

2) You asked me what the return value is after the recv()
function.
I am getting surprising output for this.
Earlier I was trying with these double-byte characters "????".
But now, when I send a simple string, it arrives as
"??????
????????? ".
I don't know why double-byte characters are arriving when I send a
simple string called "unicode1".
I believe double-byte characters are being stored somewhere after
recv() executes.

3) I have another problem,
in the line below from the Listen function.

parser.Parse((char*)strRecvbuf.c_str(), &mi); //Rlp has done to fix unicode

The server gets stuck at the line above.
The definition of Parse() follows.
Of course we use the string data type, and for that reason it gets
stuck.
But I need your help to fix this problem too.

void CParser::Parse(string strRecvbuf, CMessageIn* mi)
{
    //special extraction of arguments for tagstring, whose argument can
    //contain c_strArgDelim
    bool bArgHasDelim = false;
    mi->SetCmd(op_error);
    string strCmd("");

    RetrieveCmd(strRecvbuf, &strCmd, mi);
    InterpretCmd(strCmd, mi);
    RetrieveArgs(strRecvbuf, mi);
}

int CParser::RetrieveCmd(string strRecvbuf, string* strCmd, CMessageIn* mi)
{
    //
    //Retrieve command
    //
    int nCmdStartPos = 0, nCmdEndPos;
    int nMessEndPos = strRecvbuf.find(c_strEndOfInMess);
    //Check if the message contains a command i.e starts with "/" or ">"
    if (strRecvbuf.substr(0,1) == c_strStartOfCmd ||
        strRecvbuf.substr(0,1) == c_strStartOfCmd2) {
        nCmdStartPos++;
        mi->SetShallReturnMessage(true);
        //"//" or ">>" gives silent mode (no answers)
        if (strRecvbuf.substr(1,1) == c_strStartOfCmd ||
            strRecvbuf.substr(1,1) == c_strStartOfCmd2) {
            nCmdStartPos++;
            mi->SetShallReturnMessage(false);
        }

        //Find start of arguments and extract the string inbetween
        nCmdEndPos = strRecvbuf.find(c_strStartOfArgs);
        if (nCmdEndPos != string::npos) {
            *strCmd = strRecvbuf.substr(nCmdStartPos, nCmdEndPos - nCmdStartPos);
        }
        //Otherwise the command MUST be terminated by c_strEndOfInMess
        else {
            *strCmd = strRecvbuf.substr(nCmdStartPos, nMessEndPos - nCmdStartPos);
        }
    } else {
        throw DocException("RetrieveCmd failed. No '/' in beginning of command.");
    }

    return 0;
}

int CParser::RetrieveArgs(string strRecvbuf, CMessageIn* mi)
{
    //
    //Retrieve arguments
    //
    string strExtracted("");
    string strArgs("");
    int nArgStartPos, nArgEndPos;
    nArgStartPos = strRecvbuf.find(c_strStartOfArgs);
    //do not include the "\r\n" at the end
    nArgEndPos = strRecvbuf.length() - c_strEndOfInMess.length() - 1;
    //Check if a '(' was found in the string
    if (nArgStartPos != string::npos) {
        strArgs = strRecvbuf.substr(nArgStartPos, nArgEndPos - nArgStartPos + 1);
        //RemoveBackSlash(&strArgs);
        //Check if there is an argument with delimiters
        if (mi->GetArgHasDelim()) {
            //extract one argument
            strExtracted = strArgs.substr(1, strArgs.size() - 2);
            mi->PushArg(strExtracted);
            CLogger::Log("argument with delimiters", strExtracted);
        }
        //Check if there is only an empty parenthesis
        else if (removeSigns(strArgs, c_strSpace) != c_strEmptyParenthesis) {
            //start retrieving several args between the parentheses
            int nSpacePos = 0, nArgDelimPos = 0, nLastArgDelimPos = 0;
            bool bHasMoreArgs = true;
            nArgEndPos = strArgs.length() - 1;
            while (bHasMoreArgs) {
                //find the next delimiter
                nArgDelimPos = GetNextDelimPos(nLastArgDelimPos, strArgs);
                //check if there is only one argument left
                if (nArgDelimPos == string::npos) {
                    bHasMoreArgs = false;
                    nArgDelimPos = nArgEndPos;
                }

                //extract one argument
                strExtracted = strArgs.substr(nLastArgDelimPos + 1,
                    nArgDelimPos - nLastArgDelimPos - 1);
                //Trim the arg from spaces
                strExtracted = removeEdgeSigns(strExtracted, c_strSpace);
                if (strExtracted.size() <= 0) {
                    throw DocException("RetrieveArgs failed. Arg no " +
                        toString(mi->GetNoOfArgs() + 1) + " has no length.");
                }

                //Remove the backslashes from the argument
                //RemoveBackSlash(&strExtracted);
                //save strExtracted in the messagein vector
                mi->PushArg(strExtracted);
                nLastArgDelimPos = nArgDelimPos;
            }
        }
    }
    return 0;
}

Kind Regards,
Raghavendra L Pise.

On Dec 4, 12:42 pm, "Volodymyr Shcherbyna"

<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello,

I took a brief look at the code. And I have one simple question,

- if you send a string, let's say "sometest string", do you receive
exactly the same string in your receive and listen methods (after the
recv method returns)?

If so, then the problem with sending Unicode text is solved. Also, I
noticed that you incorrectly truncate the Unicode strings when
receiving, for example,

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 32, 0 );

[...]

recvbuf[bytesRecv] = '\0';

Here you truncate the Unicode buffer at an incorrect position, because
bytesRecv is the number of bytes, and recvbuf is an array of wchar_t
elements. Since you want to truncate at a specific character, not at a
BYTE, you should divide bytesRecv by sizeof(wchar_t) and only then
truncate, so:

recvbuf[bytesRecv / sizeof(wchar_t)] = '\0';

would be correct. Also, why do you allow exactly 32 bytes to be
written in this (and other) lines. If

...
