Re: FTPFindFirstFile unicode




On Sep 30, 4:47 am, Joseph M. Newcomer <newco...@xxxxxxxxxxxx> wrote:
Check the WideCharToMultiByte option. Click the arrow that points down to the MultiByte
window; the hex for the text in the lower window appears below that window.

CP_UTF8 should be the last code page in the "Code Page" list.
                                joe





On Tue, 29 Sep 2009 15:07:47 -0700 (PDT), ksr <sujatha.kokkir...@xxxxxxxxx> wrote:
On Sep 29, 12:27 pm, Joseph M. Newcomer <newco...@xxxxxxxxxxxx> wrote:
It looks like you might have hit some buffer limitation on the other side of the
connection.  It has occurred to me that the buffer might be MAX_PATH but the FTP system
might choose to use UTF-8 encoding.  Therefore, a 260-character Japanese name would need
at least 520 bytes in UTF-8 (more, since most Japanese characters take three bytes each),
and this might be where the problem is.
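
To put numbers on the UTF-8 guess, here is a minimal portable sketch of the conversion. On Windows, WideCharToMultiByte with CP_UTF8 does this job; the helper to_utf8 and the sample character U+3042 (hiragana "a") are only illustrative assumptions, not anything taken from the failing filename.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode a UTF-16 string as UTF-8 (hypothetical helper for illustration).
// An unpaired surrogate is replaced by U+FFFD.
std::string to_utf8(const std::u16string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Usage sketch: byte counts for ASCII and for a 260-character kana name.
void utf8_size_examples() {
    assert(to_utf8(u"abc").size() == 3);
    assert(to_utf8(std::u16string(1, u'\u3042')) == "\xE3\x81\x82");
    assert(to_utf8(std::u16string(260, u'\u3042')).size() == 780);
}
```

Kana and the common kanji all sit at U+0800 or above, so each takes three bytes; a name made only of such characters needs three times its character count.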

You can check for surrogates (although I suspect this is now NOT the problem!) by getting
the file name as a string and printing out the bytes of the string.  Then look to see if
any of the bytes are in the surrogate range.  But I suspect that a UTF-8 encoding might be
the culprit.
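
The surrogate check can also be done in code rather than by eye. A minimal sketch, assuming the name is held as UTF-16 code units (on Windows a wchar_t string would be scanned the same way); has_surrogates is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// True if any UTF-16 code unit lies in the surrogate range 0xD800..0xDFFF,
// which means at least one character is outside the Basic Multilingual Plane
// (or the string is malformed).
bool has_surrogates(const std::u16string& s) {
    for (char16_t u : s) {
        if (u >= 0xD800 && u <= 0xDFFF) return true;
    }
    return false;
}

// Usage sketch with illustrative characters only.
void surrogate_examples() {
    assert(!has_surrogates(u"\u3042\u6F22"));  // hiragana + kanji, both in the BMP
    assert(has_surrogates(u"\U00020B9F"));     // U+20B9F needs a surrogate pair
}
```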

Try encoding the filename in UTF-8.  Note that you can do this by using my Locale
Explorer, choosing the MultiByte tab, pasting the Japanese filename in the top window, and
using WideCharToMultiByte with the UTF-8 code page selected.  Then look at the result and
see what would happen if you accepted, say, the first 260 UTF-8 bytes.  Since some UTF-8
encodings take more than 2 bytes per character, there is a possibility that this is what
you are seeing that creates the even smaller limit (128 vs. 130).
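
The "accept the first 260" experiment can be sketched like this. It is only an illustration: utf8_safe_prefix is a hypothetical helper, the input is assumed to be well-formed UTF-8, and nothing here is known about how the server or WinInet actually truncates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest prefix of `utf8` that fits in `limit` bytes
// without cutting a multi-byte sequence in half.
std::size_t utf8_safe_prefix(const std::string& utf8, std::size_t limit) {
    if (utf8.size() <= limit) return utf8.size();
    std::size_t n = limit;
    // If the first excluded byte is a continuation byte (10xxxxxx), the cut
    // falls inside a character, so back up to that character's lead byte.
    while (n > 0 && (static_cast<unsigned char>(utf8[n]) & 0xC0) == 0x80) {
        --n;
    }
    return n;
}

// Usage sketch: 100 copies of U+3042 (E3 81 82) is 300 bytes of UTF-8.
void truncation_examples() {
    std::string name;
    for (int i = 0; i < 100; ++i) name += "\xE3\x81\x82";
    assert(utf8_safe_prefix(name, 260) == 258);  // 86 whole characters
    assert(utf8_safe_prefix("abc", 260) == 3);
}
```

At three bytes per character only 86 whole characters fit in 260 bytes, so a blind cut at byte 260 would land in the middle of the 87th character.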

This is still just guesswork on my part, but I would rate it high in the list of probable
causes.
                                        joe

On Tue, 29 Sep 2009 09:20:23 -0700 (PDT), ksr <sujatha.kokkir...@xxxxxxxxxx> wrote:
On Sep 29, 1:38 am, "Mihai N." <nmihai_year_2...@xxxxxxxxx> wrote:
I am using the WinInet API FtpFindFirstFile to enumerate files and folders
on an FTP server. It works fine for filenames that have English
characters and a file path of up to 260 characters. But for filenames that
have Japanese characters it fails.
For Japanese filenames it works fine up to 128 characters, but fails on
longer filenames. It is a Unicode-compiled project. My question is:
why does it fail to read up to 260 characters for Japanese filenames?
I tried explicitly using FtpFindFirstFileW, but it does not work.
Please help.

I would try to connect to the FTP server with telnet and see if
it supports RFC 2640 ("Internationalization of the File Transfer Protocol").
Most servers don't.

If it is supported, then I would do some digging to see if FtpFindFirstFile
understands it. It is possible that it does not.

If it works for short Japanese file names, but not for longer ones,
I would suspect some buffer length parameter is wrong.

--
Mihai Nita [Microsoft MVP, Visual C++] http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

Thank you for your responses.

Joe, to your questions:

On a non-Japanese Windows, using characters A-Z0-9, I can read
filenames up to 260 characters (path+filename+ext). Explorer limits the
length to 260 characters and FTP can read this path.
On a Japanese Windows, using characters A-Z0-9, again I can read
filenames up to 260 characters (path+filename+ext).

However, on a Japanese Windows, using Japanese characters, I can
consistently see that it can read filenames up to 128 characters
(excluding extension). This is irrespective of path length (i.e., the path
could contain Japanese or A-Z0-9 characters). The number of characters
in the filename is 128 but the byte count is 256; if you include the file
extension, the number of characters is 132 and the byte count is 260.
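
The byte counts above can be reproduced with a small sketch. It rests on loud assumptions: the extension is taken to be four ASCII characters such as ".txt", and every non-ASCII character is assumed to cost two bytes, as kanji and full-width kana do in a double-byte code page like Shift-JIS. Under those assumptions 132 characters come to exactly 260 bytes, whereas the same name in UTF-16 would be 264 bytes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxPath = 260;  // value of MAX_PATH in the Windows headers

// Size in bytes when stored as UTF-16: two bytes per code unit.
std::size_t utf16_bytes(const std::u16string& s) { return s.size() * 2; }

// Rough size in a double-byte code page.
// ASSUMPTION: ASCII costs 1 byte, everything else costs 2 bytes.
std::size_t approx_dbcs_bytes(const std::u16string& s) {
    std::size_t bytes = 0;
    for (char16_t u : s) bytes += (u < 0x80) ? 1 : 2;
    return bytes;
}

// Usage sketch: 128 kana characters plus an assumed ".txt" extension.
void byte_count_examples() {
    std::u16string name(128, u'\u3042');
    assert(approx_dbcs_bytes(name) == 256);
    name += u".txt";
    assert(name.size() == 132);
    assert(approx_dbcs_bytes(name) == kMaxPath);  // 256 + 4 = 260
    assert(utf16_bytes(name) == 264);
}
```

If that reading is right, the ceiling would be a 260-byte buffer counted in multibyte bytes rather than in characters, but this is only an inference from the numbers reported.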

It looks like there is a bug.

Can you explain how I can check this?
"Also, check whether or not your  Japanese characters require Unicode
surrogates for UTF-16 encoding.   "

Let me know.

Thanks,
ksr

Joseph M. Newcomer [MVP]
email: newco...@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

In Locale Explorer, should I select UTF-8 under CodePage? I don't see
UTF-8 in the locale list. I pasted the 260-character Japanese filename
using WideCharToMultiByte. The hex values are showing in the window
below. Where will the result in UTF-8 be displayed?


When I pasted the Japanese filename and clicked the arrow to MultiByte,
the text that appears in the top window is not readable. The
hex values appear in the lower window, but I don't know how to convert
them to readable UTF-8 characters. Maybe I am missing something here?

The other test suggested by Mihai, connecting to the FTP server using
telnet, gave this result:

211-FEAT
SIZE
MDTM
211 END

So it looks like my FTP server does not support internationalization.
How can I make it support internationalization? Is there anything I
can install/download to enable this support?
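
For reference, a server that implements RFC 2640 is expected to list a "UTF8" line in its FEAT reply. A minimal sketch of that check, assuming the reply text has already been captured as a string (for example copied from the telnet session); feat_lists_utf8 is a hypothetical helper.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// True if any line of a FEAT reply is exactly "UTF8" once surrounding
// whitespace is trimmed; the comparison ignores case.
bool feat_lists_utf8(const std::string& reply) {
    std::istringstream in(reply);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t b = line.find_first_not_of(" \t");
        if (b == std::string::npos) continue;  // blank line
        std::size_t e = line.find_last_not_of(" \t\r");
        std::string feature = line.substr(b, e - b + 1);
        for (char& c : feature) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        if (feature == "UTF8") return true;
    }
    return false;
}

// Usage sketch: the reply quoted in this post, and a made-up one with UTF8.
void feat_examples() {
    assert(!feat_lists_utf8("211-FEAT\r\nSIZE\r\nMDTM\r\n211 END\r\n"));
    assert(feat_lists_utf8("211-Features:\r\n SIZE\r\n UTF8\r\n211 End\r\n"));
}
```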

Thanks,
ksr