Managed C++ code calling unmanaged C++ code using COM

From: Jerry Cain (jerry_at_cs.stanford.edu)
Date: 06/05/04


Date: Sat, 5 Jun 2004 13:31:04 -0700

My consulting work involves cannibalizing some third-party SDK code that translates Japanese to
English. That sample code uses COM, but not everything goes through COM; some plain C and C++
contributes to the implementation as well. In a nutshell, I've written a wrapper class in legacy
(unmanaged) C++ whose interface is more or less trivial. For the purposes of illustration, here it is:

class LegacyJETranslationEngine {
     public:
            // returns a pointer to memory that remains valid after the call returns
            const char *Translate(const char *japanese);

     private:
            // implementation details mostly irrelevant, except that COM is used
};

In particular, the implementation of Translate calls CLSIDFromProgID and CoCreateInstance to perform
the translation. I assume those details are irrelevant, so I'll omit them. An unmanaged C++ test harness
has more or less convinced me that this class does its job properly, provided the application calls
CoInitializeEx. (I don't call it in the Translate method itself, so from managed code I rely on the .NET
runtime to call it for me when execution first crosses into the unmanaged code that uses COM.)
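Translate hands back a const char * rather than a buffer the caller owns, so the returned memory has to outlive the call. One common way to arrange that is for the engine object to own the result buffer. The sketch below is a hypothetical stand-in, not the actual class: OwningTranslationEngine is an illustrative name, and an uppercase transform replaces the COM translation so that only the ownership pattern is shown.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for LegacyJETranslationEngine. The real class calls
// CLSIDFromProgID / CoCreateInstance; an uppercase transform replaces the
// COM translation here so only the ownership pattern is visible.
class OwningTranslationEngine {
public:
    // Returns a pointer into a buffer owned by this object. The pointer stays
    // valid until the next call to Translate or until the engine is destroyed,
    // so callers must copy the text before either of those happens.
    const char *Translate(const char *text) {
        lastResult_ = text;  // copy the input into storage the engine owns
        for (char &c : lastResult_) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        return lastResult_.c_str();
    }

private:
    std::string lastResult_;  // owns the memory behind the returned pointer
};
```

With this design, a caller that keeps the engine as a local variable (as the managed wrapper does) has to copy the result into its own string before the function returns.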

Because this functionality needs to be folded into a C# Windows application, I created a simple managed C++
class that marshals the incoming .NET string to a const char *, calls the unmanaged Translate method, and then
constructs a .NET string from the result. Since that code is more or less boilerplate, I'm not exposing anything
proprietary and can reproduce it here.
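A .NET String holds UTF-16 text, so marshaling it to a const char * means picking a narrow encoding. Marshal::StringToHGlobalAnsi converts to the system's ANSI code page, so what arrives on the unmanaged side depends on that code page: a Japanese code page such as 932 can represent the text, while a Western one cannot, and unrepresentable characters are typically replaced with '?'. The following is a simplified, hypothetical model of that lossy step, with 7-bit ASCII standing in for the code page; NarrowToAsciiCodePage is an illustrative name, not a real API.

```cpp
#include <string>

// Simplified model of a lossy UTF-16 to narrow ("ANSI") conversion, using a
// code page that covers only 7-bit ASCII. Real conversions use the system's
// ANSI code page but follow the same rule: a character the code page cannot
// represent is replaced, typically with '?'. Each UTF-16 code unit is handled
// on its own, so unlike a real converter this maps a surrogate pair to "??".
std::string NarrowToAsciiCodePage(const std::u16string &wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (char16_t unit : wide) {
        narrow.push_back(unit < 0x80 ? static_cast<char>(unit) : '?');
    }
    return narrow;
}
```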

.h

#pragma once
#using <mscorlib.dll>
using namespace System;

namespace TranslationEngines
{
     public __gc class JETranslationEngine
     {
          public:
               String *TranslateToEnglish(String *japanese);
     };
}

.cpp

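#include "JETranslationEngine.h"         // the managed declaration above (file name assumed)
#include "LegacyJETranslationEngine.h"   // the unmanaged wrapper (file name assumed)
using namespace System::Runtime::InteropServices;   // Marshal

namespace TranslationEngines
{
     String *JETranslationEngine::TranslateToEnglish(String *japanese)
     {
          LegacyJETranslationEngine engine;
          // copy the .NET string into an unmanaged, null-terminated ANSI buffer
          IntPtr japaneseRawPointer = Marshal::StringToHGlobalAnsi(japanese);
          const char *unmanagedJapaneseRep = (const char *) japaneseRawPointer.ToPointer();
          const char *result = engine.Translate(unmanagedJapaneseRep); // pointer to persistent memory, so nothing wrong with that
          Marshal::FreeHGlobal(japaneseRawPointer);
          // build the .NET string explicitly from the unmanaged result
          return Marshal::PtrToStringAnsi(IntPtr((void *) result));
     }
}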
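In the wrapper above, FreeHGlobal runs only if Translate returns normally; if the unmanaged call threw, the ANSI copy would leak. Managed C++ can cover that with try/__finally; in plain C++ the same guarantee is usually expressed as a scope guard. A minimal portable sketch follows, assuming malloc/free as a stand-in for the HGlobal allocation; ScopedCString is a hypothetical name, not part of Marshal.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical portable analogue of pairing Marshal::StringToHGlobalAnsi with
// Marshal::FreeHGlobal: copy a C string into a heap buffer and release the
// buffer when the guard leaves scope, even if code in between throws.
class ScopedCString {
public:
    explicit ScopedCString(const char *source)
        : buffer_(static_cast<char *>(std::malloc(std::strlen(source) + 1))) {
        if (buffer_ == nullptr) {
            throw std::bad_alloc();
        }
        std::strcpy(buffer_, source);
    }

    ~ScopedCString() { std::free(buffer_); }  // plays the role of FreeHGlobal

    // One owner per buffer: copying the guard would lead to a double free.
    ScopedCString(const ScopedCString &) = delete;
    ScopedCString &operator=(const ScopedCString &) = delete;

    const char *get() const { return buffer_; }

private:
    char *buffer_;
};
```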

My understanding is that the LegacyJETranslationEngine constructor is the first call into unmanaged code,
so the .NET runtime starts up COM by effectively calling CoInitializeEx for me, using the concurrency
model of the current thread, whatever that may be. My hunch is that the implementations of the
two classes I've mentioned (the unmanaged one above, and this managed one that exists only to do the
marshaling) are fine, because when a C# console application uses the managed class, I can translate Japanese
to English without any problems. The third-party COM objects and the unmanaged code deal with Unicode
characters (at least that's what their documentation claims), so the default marshaling of .NET strings
to pointers to null-terminated Unicode char arrays seems to work beautifully all by itself, but only
when exercised from a C# console application such as this one. (EJTranslationEngine is a separate class that
handles the English-to-Japanese direction, and it works fine.)

using System;
using TranslationEngines;   // JETranslationEngine (EJTranslationEngine assumed to live here too)

class TranslationHarness   // class name is illustrative
{
     [STAThread]
     static void Main(string[] args)
     {
          for (int i = 0; i < args.Length; i++)
               Console.WriteLine(args[i]);
          EJTranslationEngine ejEngine = new EJTranslationEngine();
          JETranslationEngine jeEngine = new JETranslationEngine();
          ExerciseTranslationEngines(ejEngine, jeEngine);
          Console.ReadLine();   // keep the console window open
     }

     static void ExerciseTranslationEngines(EJTranslationEngine ejEngine, JETranslationEngine jeEngine)
     {
          string salutation = "Good morning!";
          string japanese = ejEngine.TranslateToJapanese(salutation);   // English to Japanese
          string english = jeEngine.TranslateToEnglish(japanese);       // Japanese back to English
          Console.WriteLine("Original English: " + salutation);
          Console.WriteLine("Japanese Translation: " + japanese);
          Console.WriteLine("English Reverse Translation: " + english);
     }
}

The salutation and english strings are exactly the same, the intervening Japanese is what it's supposed to be, and
the layered abstraction of managed C++ over unmanaged C++ over third-party COM and C seems to work
as intended. Again, I'll emphasize that CoInitializeEx is called on my behalf; in this case, the call would
be CoInitializeEx(NULL, COINIT_APARTMENTTHREADED), since the Main method is decorated with [STAThread].
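The relevant documented rule is that CoInitializeEx initializes COM, and fixes the concurrency model, for the calling thread only: a repeat call on the same thread with the same model returns S_FALSE, and one with a different model fails with RPC_E_CHANGED_MODE. [STAThread] accordingly governs only the thread that runs Main. Below is a small portable model of those rules, with thread_local state standing in for COM's per-thread bookkeeping. All names are hypothetical; this is not the Win32 API, and CoUninitialize is not modeled.

```cpp
#include <thread>

enum class Apartment { None, STA, MTA };
enum class InitResult { Ok, AlreadyInitialized, ChangedMode };

// Each thread gets its own copy, just as each thread has its own apartment state.
thread_local Apartment currentApartment = Apartment::None;

// Model of the CoInitializeEx rules; `model` is expected to be STA or MTA.
InitResult ModelCoInitializeEx(Apartment model) {
    if (currentApartment == Apartment::None) {
        currentApartment = model;               // first call on this thread: S_OK
        return InitResult::Ok;
    }
    if (currentApartment == model) {
        return InitResult::AlreadyInitialized;  // same model again: S_FALSE
    }
    return InitResult::ChangedMode;             // different model: RPC_E_CHANGED_MODE
}

// Reports the apartment a freshly started thread sees before any call.
Apartment ApartmentSeenByNewThread() {
    Apartment seen = Apartment::None;
    std::thread worker([&seen] { seen = currentApartment; });
    worker.join();
    return seen;
}
```

In this model, initializing the main thread as STA leaves every other thread uninitialized, and each of those threads picks its own model on its first call.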

But the managed layer fails me when called from a C# Windows application, and I can only assume that's
because multiple threads are involved. Imagine three TextBox controls and a Translate Button: the user is asked
to type English into the first TextBox and then click the Button. Not surprisingly, the click fires an event,
and its handler populates the second and third TextBoxes with the forward (English to Japanese) and reverse
(Japanese back to English) translations.

In the console app scenario, "I went to the bank to deposit my paycheck." cycles back to "I went to the
bank to deposit my paycheck.", and we're happy. In the Windows app scenario, "I went to the bank to
deposit my paycheck." translates to the correct Japanese
("私は、私の給料支払い小切手を預けるために、銀行に行った。", for those with support for
Japanese on their system), but that Japanese translates back to garbage. Specifically, the garbage
is "??? I? A? A d a ~ e s E A a s E s? B". I include it because I'm hoping the spacing pattern
might be telling.

Debug output written with cout on both the unmanaged and managed sides confirms that the Japanese is
being marshaled properly, and that the garbage is the same on both sides as well. Naturally, when you see strings
like this, you assume it's a marshaling problem, but in this case I don't think it is. My educated
guess is that COM is just not being initialized properly during the first jump from managed to unmanaged code.
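Text printed with cout passes through the console's code page, so two different strings can look identical on screen (for instance when both come out as '?'). Comparing raw code units makes the "same on both sides" check exact. A minimal sketch of two debugging helpers, one for UTF-16 code units (what a .NET String holds) and one for the bytes behind a const char *; both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// UTF-16 code units as four hex digits each, separated by spaces.
std::string DumpCodeUnits(const std::u16string &text) {
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(text[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Raw bytes of a narrow string as two hex digits each, separated by spaces.
std::string DumpBytes(const std::string &bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // go through unsigned char so bytes >= 0x80 print correctly
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```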

I'm open to any and all brainstorms. I haven't spent this many days on a single bug in a while, so I'm eager to
hear any suggestions anyone may have.

Thanks in advance,
Jerry Cain
jerry@cs.stanford.edu

