Re: Transferring Unicode characters in socket programming!



Hello,

I took a brief look at the code, and I have one simple question:

- if you send a string, let's say "sometest string", do you receive exactly
the same string in your Recv and Listen methods (after the recv call
returns)?

If so, then the problem with sending Unicode text is solved. Also, I
noticed that you truncate the Unicode strings incorrectly when receiving,
for example:

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 32, 0 );

[...]

recvbuf[bytesRecv] = '\0';

Here you terminate the Unicode buffer at the wrong position, because
bytesRecv is the number of BYTES received, while recvbuf is an array of
wchar_t and is indexed by character. Since you want to terminate after the
last received character, not the last byte, you should divide bytesRecv by
sizeof(wchar_t) and only then terminate, so:

recvbuf[bytesRecv / sizeof(wchar_t)] = '\0';

would be correct. Also, why do you let recv write at most 32 bytes in this
(and other) lines? Is this part of your protocol specification?

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 32, 0 );

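The recvbuf is 64 characters long, which means 128 bytes (sizeof(wchar_t)
is 2 on Windows), so you should write

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 128, 0 ); or BETTER:

bytesRecv = recv( ConnectSocket, (char*)recvbuf, sizeof(recvbuf), 0 );

Note that if recv fills the whole buffer, there is no element left for the
terminating L'\0'; passing sizeof(recvbuf) - sizeof(wchar_t) as the length
leaves room for it.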
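A minimal sketch of the two points above: index the buffer by characters
rather than bytes, and derive the recv length from the buffer itself. The
names TerminateWide and DemoReceive are hypothetical and not part of the
posted code, and memcpy stands in for recv so the sketch runs without a
socket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: terminate a wchar_t receive buffer after recv().
// capacityChars is the number of wchar_t elements in buf, bytesRecv is the
// (non-negative) byte count that recv() returned.
// Returns the number of whole characters kept in the buffer.
std::size_t TerminateWide(wchar_t* buf, std::size_t capacityChars, int bytesRecv)
{
    assert(buf != nullptr && capacityChars > 0 && bytesRecv >= 0);
    // recv() counts BYTES; convert to a count of whole wchar_t elements.
    std::size_t chars = static_cast<std::size_t>(bytesRecv) / sizeof(wchar_t);
    // Never write past the end: the last element is reserved for L'\0'.
    if (chars > capacityChars - 1) {
        chars = capacityChars - 1;
    }
    buf[chars] = L'\0';
    return chars;
}

// Demo without a real socket: memcpy plays the role of recv().
std::wstring DemoReceive(const wchar_t* wire, int bytesOnWire)
{
    assert(wire != nullptr && bytesOnWire >= 0);
    wchar_t recvbuf[64] = L"";
    // A real call would look like:
    // recv(ConnectSocket, (char*)recvbuf, sizeof(recvbuf) - sizeof(wchar_t), 0);
    const int maxBytes = static_cast<int>(sizeof(recvbuf) - sizeof(wchar_t));
    const int bytesRecv = bytesOnWire < maxBytes ? bytesOnWire : maxBytes;
    std::memcpy(recvbuf, wire, static_cast<std::size_t>(bytesRecv));
    TerminateWide(recvbuf, sizeof(recvbuf) / sizeof(recvbuf[0]), bytesRecv);
    return std::wstring(recvbuf);
}
```

Note that a trailing partial character (a byte count that is not a multiple
of sizeof(wchar_t)) is simply cut off here; a real receiver would have to
keep those bytes for the next recv call.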

HTH

--
Volodymyr
<raghupise@xxxxxxxxx> wrote in message
news:23ff0665-d73e-40b7-9916-8a40a960d05b@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr Shcherbyna,

I have replaced string with the wchar_t data type in my project.
The problem is that the line below, in the Listen function, does not
give the expected result:

"if (strRecvbuf.substr(strRecvbuf.length() -
c_strEndOfInMess.length(), c_strEndOfInMess.length()) ==
c_strEndOfInMess)."

I have enclosed my code below, with the changes I have made.
This could really be a design-level change in our project, but I am
stuck on these issues, so I need your help to fix this problem.

If anything is wrong with my code snippet, you are most welcome to
rectify the errors.
Please ask if you have any questions.
Right now I am facing a problem in the Listen function.


1) Send Function.

int CConnection::Send(wchar_t *mess)//., CMessageIn* mi)
{
//mess += c_strEndOfOutMess;
STRCAT(mess,c_strEndOfOutMess.c_str()); //Rlp has done to fix Unicode problem...

int bytesSent = 0;
//bytesSent = send(ConnectSocket, mess.c_str(), mess.size(),0);

bytesSent = send(ConnectSocket, (const char *)mess,(wcslen(mess )*
sizeof(wchar_t)), 0 );

return bytesSent;

}

2) Recv function:
string CConnection::Recv(int nSecTimeout)
{
int bytesRecv;
string strRecvbuf("");
//char recvbuf[64] = ""; //Rlp has done to fix unicode problem
wchar_t recvbuf[64] = L" ";
bool bStopListen = false;


ResetTimer();
while(!bStopListen){
bytesRecv = SOCKET_ERROR;
while( bytesRecv == SOCKET_ERROR ) {

//bytesRecv = recv( ConnectSocket, recvbuf, 32, 0 );

bytesRecv = recv( ConnectSocket, (char*)recvbuf, 32, 0 );
if (bytesRecv == 0 ) {
CLogger::Log("bytesRecv == 0");
return strRecvbuf;
}
if (bytesRecv == WSAECONNRESET) {
CLogger::Log("WSAECONNRESET");
return strRecvbuf;
}
if (CheckTimeOut(nSecTimeout)) {
CLogger::Log("Timeout");
return strRecvbuf;
}
}
recvbuf[bytesRecv] = '\0';
STRCAT(strRecvbuf.c_str(), recvbuf);
//strRecvbuf.append(recvbuf);


if(strRecvbuf.length() < c_strEndOfInMess.length()){
bStopListen = false;
} else if (strRecvbuf.substr(strRecvbuf.length() -
c_strEndOfInMess.length(), c_strEndOfInMess.length()) ==
c_strEndOfInMess){
bStopListen = true;
}

}
return strRecvbuf.substr(0, strRecvbuf.length() -
c_strEndOfInMess.length());
}

3) Listen function

APIERR CConnection::Listen()
{
APIERR err = noErr;

//Receive variables
int bytesRecv = SOCKET_ERROR;
//Parser variables
CMessageIn mi;
CParser parser;
char sendbuf[256] = "";

wchar_t recvbuf[256] = L" ";
char peekbuf[1] = "";

string strRecvbuf("");
bool bStopListen = 0;

//Initiate the message
mi.Erase();

//No document has yet been opened
m_bDocIsOpen = false;

//Receive data
while(!bStopListen){

bytesRecv = SOCKET_ERROR;
ResetTimer();
while( bytesRecv == SOCKET_ERROR ) {
bytesRecv = recv( ConnectSocket,(char*)recvbuf, 64, 0 );
if ( bytesRecv == 0 || bytesRecv == WSAECONNRESET ||
CheckTimeOut()) {
return CloseListen();
}
}
//recvbuf[bytesRecv] = L'\0'; Rlp has done to fix unicode problem.
recvbuf[bytesRecv] =L'\0';
STRCAT(strRecvbuf.c_str(),recvbuf);
//strRecvbuf.append(recvbuf); Rlp has done to fix unicode problem.


//Check if the message is ended with \r\n

if (strRecvbuf.substr(strRecvbuf.length() -
c_strEndOfInMess.length(), c_strEndOfInMess.length()) ==
c_strEndOfInMess){

// Here my Quark server gets stuck. For reference, the constants are defined as:
// const string c_strEndOfInMess(string("")+char(4));
// const string c_strStartOfArgs("(");


try {
//Parse the message
parser.Parse(strRecvbuf,&mi); //Rlp has done to fix unicode....

//parser.Parse((char*)strRecvbuf.c_str(), &mi); //Rlp has done to fix unicode

//Handle the message
CMessageOut clMo = HandleRequest(&mi);

//Send(clMo.ToString());//., &mi); Rlp has done to fix unicode problem..

wchar_t unicode_data[32]= L"Ok"; // Rlp has done to fix unicode problem..

//Send(clMo.ToString()); // Rlp has done to fix unicode problem.
Send((wchar_t*)unicode_data[32]);

//Check if we got an exit message...

bStopListen = (mi.GetCmd() == op_exit);

//Clear the buffer
strRecvbuf.erase();
mi.Erase();

} catch (DocException e) {
//Log the error
CLogger::Log("Exception:", e.GetMessage());
uchar bufptr[256];
formaterror(XTOSErrToAPIERR(e.GetMessageNo()),bufptr);

//Send the error message to the client

//Send(e.GetMessage());//., &mi); Rlp has done to fix unicode problem.
Send((wchar_t*)e.GetUnicodeMessage());

//Clear the buffer
strRecvbuf.erase();
mi.Erase();
}
}
}
return CloseListen();
}



On Nov 30, 3:35 pm, raghup...@xxxxxxxxx wrote:
Dear Volodymyr Shcherbyna,

I will test this and let you know the result.

Thanks for your advice.

On Nov 30, 1:39 pm, "Volodymyr Shcherbyna"

<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Hello,

There is no need to edit the declarations of the winsock functions. These
functions are able to send _any_ type of information, whether 1-byte
strings, 2-byte strings, dwords or other types.

The only issue is how you interpret the data. Here are a few examples,

(pseudo code)

1. Sending a single-byte string,

[
const char * szSomeString = "string";

send(s, szSomeString, (int)strlen(szSomeString), 0);
]

The receiving side should have the following code:

[
char szSomeBuffer[1024] = {0};

recv(s, szSomeBuffer, sizeof(szSomeBuffer), 0);
]

2. Sending a double-byte string,

[
const wchar_t * szSomeString = L"string";

send(s, (const char *)szSomeString, (int)(wcslen(szSomeString) * sizeof(wchar_t)), 0);
]

The receiving side should have the following code:

[
wchar_t szSomeBuffer[1024] = {0};

recv(s, (char *)szSomeBuffer, sizeof(szSomeBuffer), 0);
]

As you can see from the code, if we send single-byte text I use a char array
to hold the data; if we send Unicode I use a wchar_t array to hold the data.
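One thing the examples above do not show: on a stream socket a single recv
may return only part of a message, possibly ending in the middle of a
character. A minimal sketch of one way to handle that, assuming both sides
use the same wchar_t size and byte order, and assuming messages end with a
marker character (the posted code uses char(4)). WideMessageBuffer is a
hypothetical class, not part of winsock or of the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical class: collects the raw bytes of successive recv() calls and
// hands back one complete wide-character message at a time.
class WideMessageBuffer {
public:
    explicit WideMessageBuffer(wchar_t endMarker) : m_end(endMarker) {}

    // Append whatever recv() delivered; the count may split a character.
    void Append(const char* data, std::size_t bytes)
    {
        assert(data != nullptr || bytes == 0);
        m_bytes.insert(m_bytes.end(), data, data + bytes);
    }

    // If a complete message (terminated by the end marker) is buffered,
    // store it in 'out' without the marker, drop it from the buffer and
    // return true. Otherwise leave everything untouched and return false.
    bool Extract(std::wstring& out)
    {
        const std::size_t chars = m_bytes.size() / sizeof(wchar_t);
        for (std::size_t i = 0; i < chars; ++i) {
            wchar_t ch;
            // memcpy avoids alignment problems when reading from a byte buffer.
            std::memcpy(&ch, &m_bytes[i * sizeof(wchar_t)], sizeof(wchar_t));
            if (ch == m_end) {
                out.assign(i, L'\0');
                if (i > 0) {
                    std::memcpy(&out[0], m_bytes.data(), i * sizeof(wchar_t));
                }
                const std::ptrdiff_t used =
                    static_cast<std::ptrdiff_t>((i + 1) * sizeof(wchar_t));
                m_bytes.erase(m_bytes.begin(), m_bytes.begin() + used);
                return true;
            }
        }
        return false;
    }

private:
    wchar_t m_end;
    std::vector<char> m_bytes;
};
```

In a receive loop you would call Append with the buffer and byte count of
each successful recv, then call Extract until it returns false.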

--
Volodymyr
<raghup...@xxxxxxxxx> wrote in message

news:5c699225-55bb-4749-a556-8d3a1e7d2863@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Volodymyr Shcherbyna/Chris Becke,

Thanks for your suggestions.
I am using the winsock2.h header file and the Ws2_32.lib library.
I can't see the body of the send or recv function; I can only see the
declarations of these functions.
So I believe I can't edit the "send" or "recv" function definitions.
I am in a dilemma about how to fix this problem, or whether I should use
a wrapper class.
I need some suggestions on this problem.
Looking forward to your answer.

I am enclosing the send and recv declarations below.
Of course, they do not use the wchar_t data type. They use the
const char FAR * buf and char FAR * buf data types.

send(
SOCKET s,
const char FAR * buf,
int len,
int flags
);

recv(
SOCKET s,
char FAR * buf,
int len,
int flags
);
#endif /

On Nov 29, 8:32 pm, "Volodymyr Shcherbyna"
<v_scherb...@xxxxxxxxxxxxxxx> wrote:
Short answer: make sure that your variable "recvbuf" is a pointer to a
Unicode string, i.e. wchar_t.
Long answer: read the reply from Chris Becke.

--
Volodymyr
<raghup...@xxxxxxxxx> wrote in message

news:167e508d-7f3b-4132-b0b7-3a078c669fa9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dear Chris,

Thanks for sending the information.
I really haven't worked with socket programming before.
I need to study Unicode and code page concepts.

It would be better if I could get a code snippet for any of your
solutions.

At least you could suggest where I can get a code snippet for this.

Thanks in advance.

On Nov 29, 5:12 pm, "Chris Becke" <chris.be...@xxxxxxxxx> wrote:
The socket library doesn't care what sort of data it's sending. It just takes
a buffer of bytes and ensures that it arrives on the other side. It's up to
the application to determine what those bytes mean.

Now, if you have a 64 byte buffer of "double byte" characters, that means one
of two things: you are using a multibyte / codepage encoding, or you are
using Windows Unicode - sometimes known as UCS-2 or UTF-16 - an encoding that
uses two bytes to encode a character (UTF-16 uses four for characters
outside the Basic Multilingual Plane).

Now, if you are getting ?'s, it implies that some part of the system is
having trouble translating / understanding the characters, which in turn
implies that Unicode isn't being used - instead there is a mismatch in the
ANSI code page of the two systems.

To resolve this sort of issue, you either need to
1. Send the codepage of the text across the wire with the text.
OR,
2. Standardise on a codepage to use. Because each language has a different
codepage, and characters from arbitrary languages can't necessarily be
encoded in each other's codepages, this is dangerous.
OR,
3. Instead of using an ANSI codepage, use the universal encoding that can
encode all characters in all languages (in use on the internet at least) -
Unicode. Unicode has a multibyte (1, 2 or more bytes per character) encoding
called UTF-8, and a 16 bit encoding called UTF-16.

The Windows API has some functions to help you:

GetACP() will get the local PC's ANSI code page identifier.
WideCharToMultiByte() will convert a UTF-16 Unicode string to a multibyte
ANSI OR UTF-8 string.
MultiByteToWideChar() will convert from a local system encoded string to
Unicode.
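To make option 3 concrete, here is a rough sketch of what a UTF-16 to UTF-8
conversion does before the bytes are handed to send. On Windows you would
normally call WideCharToMultiByte with CP_UTF8 instead; this version encodes
by hand, and uses char16_t rather than wchar_t, only so that it compiles and
behaves the same on any platform. Utf16ToUtf8 is a hypothetical helper name,
not a Windows API function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8 by hand.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute the replacement character
        }
        if (cp < 0x80) {                 // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can be passed to send as ordinary bytes
(data() and size()), and the receiver decodes it back, so neither side
depends on the other's ANSI code page.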

<raghup...@xxxxxxxxx> wrote in message

news:c284561a-d461-4dfc-89f5-392c9cde4994@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Hi Friends,

In socket programming I want to transfer double-byte characters.
Earlier we used the recv() function for single-byte characters.
If we use double-byte characters with the recv function below, they
come out as question mark symbols.
If anybody knows how I can handle double-byte characters in socket
programming (the recv function), please let me know.

My code snippet:

while( bytesRecv == SOCKET_ERROR ) {
bytesRecv = recv( ConnectSocket, recvbuf, 64, 0 );
if ( bytesRecv == 0 || bytesRecv == WSAECONNRESET ||
CheckTimeOut())
{
return CloseListen();
}
}

I am looking for a sample example.

Note: Here recvbuf works for single-byte, not for double-byte
characters.

Thanks in advance.



