Re: How does managed code work?
- From: Jeroen Mostert <jmostert@xxxxxxxxx>
- Date: Wed, 24 Jun 2009 01:33:34 +0200
Siegfried Heintze wrote:
Would like to pursue some questions I recently encountered in an interview -- someone might ask them again in another interview. I was not very satisfied with the answers I gave the interviewer.
(1) Can someone point me to a discussion of the managed stack? Does it work the same way as the native (CPU vendor implemented) stack with a frame pointer that is the head of a linked list of stack frames where each time we enter a function we create a new stack frame in which new variables are pushed and each time we exit a function the entire stack frame is popped?
(2) Can someone point me to a discussion of the managed heap? How does it work? Does it use counted pointers like COM often does? What, exactly, happens when I use operator= to (shallow) copy a SqlDataReader object from a stack local variable to a global variable? How does it prevent memory leaks that occur in COM when two objects reference each other and keep each other's reference count nonzero? How is the managed heap different from the native heap? I think the managed heap implements defragmentation automatically like Java. Does it use the mark and sweep algorithm or some other algorithm?
First of all, on the language level, there is no stack and there is no heap. There are only reference types and value types, and instances of both are objects. This focus on objects rather than memory is actually part of languages like C and C++ too (my copy of the C standard contains no instances at all of the words "heap" and "stack"), but since the mechanisms are ubiquitous and people usually work much closer to the metal there, this tends to be ignored. It bears mentioning because in managed languages it's both easier and more convenient to not focus on this at all.
This helps avoid the misleading detail in questions like "What, exactly, happens when I use operator= to (shallow) copy a SqlDataReader object from a stack local variable to a global variable?" What *exactly* happens is that the reference to the object is copied (not the object itself), so now there are two references to the object instead of one, and that's all that happens. The interesting part doesn't happen until later, when the garbage collector draws up the set of objects which are still reachable. That can be couched in terms of stacks and heap, but there's no need for that; local variables and fields will do.
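To make the "only the reference is copied" point concrete, here is a minimal sketch. It uses a StringBuilder instead of a SqlDataReader so it runs without a database, and the class and field names (ReferenceCopyDemo, shared) are made up for illustration; a static field stands in for the "global variable".

```csharp
using System;
using System.Text;

static class ReferenceCopyDemo
{
    // Static field standing in for the "global variable" from the question.
    static StringBuilder shared;

    public static bool CopyAndCompare()
    {
        StringBuilder local = new StringBuilder("rows");

        // Assignment copies the reference, not the object.
        shared = local;

        // A change made through one reference is visible through the other,
        // because both refer to the same object.
        shared.Append(" read");

        return object.ReferenceEquals(local, shared)
            && local.ToString() == "rows read";
    }

    static void Main()
    {
        Console.WriteLine(CopyAndCompare()); // prints True
    }
}
```

After the method returns, the local variable is gone but the static field still refers to the object, so the garbage collector considers it reachable.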
On the runtime instruction level it's more complicated: there's an evaluation stack for expressions, there are "slots" for local variables, and there are instructions for creating new objects, calling methods and returning from them. The "evaluation stack", "call stack" and "heap" exist only as concepts to facilitate this; they operate according to certain abstract rules but how they're implemented is of no concern.
Finally, when you get down to the implementation there is a heap and there are stacks (plural since we can have multiple threads), and if you're interested you can dig into the current implementation of the CLR, but it's of little consequence when you're programming. Managed code has no access to these details and unmanaged code only needs to bother if it's the CLR itself (or possibly when you're debugging interop scenarios).
FWIW, the current Windows implementation of the CLR uses call stacks which work the same as the native call stack (in fact, they use the same mechanism, so managed and unmanaged stack frames are mixed). The managed frames have their own format; they cannot be decoded like unmanaged frames. The managed heap on the other hand works quite differently from the native heap. http://msdn.microsoft.com/library/f144e03t explains it a lot better than I can. Summarizing: .NET uses a tracing garbage collector with generations. It marks the objects that are still reachable and reclaims the rest, usually compacting the survivors, which is the automatic defragmentation you mention. Since reachability rather than a count decides what stays alive, there are no reference counting issues like in COM.
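A rough sketch of the cycle case, assuming nothing beyond the standard GC and WeakReference classes. Node and CycleDemo are hypothetical names. The cycle is built in a separate, non-inlined method so that no local variable keeps it alive; exactly when an object is collected is still up to the runtime, so treat the result as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

class Node
{
    public Node Other;
}

static class CycleDemo
{
    // Builds two objects that reference each other and returns only a weak
    // reference, which does not keep its target alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference MakeCycle()
    {
        Node a = new Node();
        Node b = new Node();
        a.Other = b;
        b.Other = a;
        return new WeakReference(a);
    }

    public static bool CycleSurvivesCollection()
    {
        WeakReference weak = MakeCycle();

        // Force a full collection. Nothing reachable refers to the cycle,
        // so the mutual references alone do not keep it alive.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return weak.IsAlive;
    }

    static void Main()
    {
        Console.WriteLine("Generations: 0.." + GC.MaxGeneration);
        Console.WriteLine("Cycle still alive: " + CycleSurvivesCollection());
    }
}
```

With reference counting, each object would hold the other's count at one forever. Here the collector never looks at counts, only at what can be reached from live variables and fields.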
If you are interested in the nitty-gritty of .NET and you have a C/C++/Win32 background (as you seem to do) I can recommend "CLR via C#" by Jeffrey Richter; Jeffrey's an old hand at Win32 and he goes into much detail in a familiar way.
(3) Why do we have non-deterministic destructors in C#?
For "destructor", read "finalizer". We have it because .NET uses the philosophy that deterministic finalization is the minority case. There's certainly something to say for this, namely that non-deterministic finalization for memory, the most common resource, happens to be a good idea -- this is garbage collection.
Asked differently: Why do some classes, like the SQL data reader, need to have their dispose
function called explicitly? Why did not the language designers implement
deterministic destructors so we would not have to manually use the
"using" statement or (worse yet) manually call the dispose function when
the object goes out of scope?
Because objects don't go out of scope, they become unreachable. But when they are determined to be unreachable is left undefined. In most cases this additional freedom for the compiler and the runtime pays off.
There would be very little gain to forcing the compiler/runtime to treat the case of an object to which only local variables hold references (the only case to which we could realistically apply deterministic finalization) in a special manner. For starters, this would force you to spell out the rules for when a variable is live in the programming language itself (and the programmer would have to know them). Even if you let this coincide with scope (arguably the simplest way to do it), the case where a reference accidentally "escapes" and ceases to be deterministically finalizable would be trivial to overlook (in C++ this would be the classic "reference to local variable" problem). A "using" scope makes these things explicit.
Could not the language designers design the C# language so the compiler tells runtime: "hey! this sqldatareader is going out of scope so you better call the dispose function."
But this is exactly what finalization does. As soon as the object "goes out of scope" (is no longer reachable and is garbage collected) it's finalized (well, technically, it's put on the list to be finalized Real Soon Now). The issue is that in .NET, "going out of scope" does not have an exact timeframe associated with it like in C++. An object is determined to be unreachable at some unspecified time.
Associating an explicit timeframe (for purposes of cleaning up non-memory resources, at least) is what you do in C# with "using", and what you do in C++ by allocating the object on the stack (or wrapping a heap-allocated object in a smart pointer).
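As a small illustration of that explicit timeframe, the sketch below logs when Dispose runs relative to the surrounding code. TrackedResource is a made-up stand-in for something like a data reader; only IDisposable and the "using" statement are real framework features here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records when it is opened and disposed.
sealed class TrackedResource : IDisposable
{
    private readonly List<string> log;

    public TrackedResource(List<string> log)
    {
        this.log = log;
        log.Add("open");
    }

    public void Dispose()
    {
        log.Add("dispose");
    }
}

static class UsingDemo
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();

        using (TrackedResource resource = new TrackedResource(log))
        {
            log.Add("work");
        } // Dispose() is called here, even if the body throws.

        log.Add("after");
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
        // prints: open, work, dispose, after
    }
}
```

The compiler expands "using" into a try/finally that calls Dispose, so the cleanup point is fixed by the closing brace, not by the garbage collector.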
(4) Is there any circumstance where I would NOT want to call dispose (explicitly or via the "using" statement) on function local SQL Data Reader object when it is going out of scope?
Assuming you don't pass the reference off to another object (so you can no longer tell when it's "going out of scope"), which would probably be a bad idea, the answer is "no". Actually, you might want to dispose it *before* it goes out of scope, or better yet, combine the scope and the disposing... which is precisely what "using" does. In C++, you might introduce an artificial scope for this. In neither case can you afford to ignore what's happening when those little objects blink in and out of life, since they're using precious resources under the hood. The main difference is that C++ considers memory precious as well, while .NET treats it more like a renewable resource.
(5) How is the structure of a managed DLL different from a native DLL?
Managed code is organized in the form of assemblies (whether .DLL or .EXE, that's the same thing as far as managed code is concerned). As for the differences between assemblies and DLLs, I couldn't possibly do it justice here. Google is your friend. Structurally, the main difference is that assemblies have complete metadata (types, methods, argument lists) while DLLs have little to none (list of function names imported/exported, maybe type names for a C++ DLL, that's about it).
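One way to see that metadata is through reflection, which reads it back at run time. The sketch below is only an illustration: MetadataDemo and its Add method are invented names, and it inspects its own assembly rather than loading a DLL from disk.

```csharp
using System;
using System.Reflection;

static class MetadataDemo
{
    // An ordinary method whose signature will be read back from metadata.
    public static int Add(int left, int right)
    {
        return left + right;
    }

    public static string Describe()
    {
        // The return type, method name, parameter types and even parameter
        // names are all stored in the assembly and available at run time.
        MethodInfo method = typeof(MetadataDemo).GetMethod("Add");
        ParameterInfo[] parameters = method.GetParameters();

        return method.ReturnType.Name + " " + method.Name + "("
            + parameters[0].ParameterType.Name + " " + parameters[0].Name + ", "
            + parameters[1].ParameterType.Name + " " + parameters[1].Name + ")";
    }

    static void Main()
    {
        Console.WriteLine(Describe());
        // prints: Int32 Add(Int32 left, Int32 right)
    }
}
```

A native DLL's export table would give you the name "Add" at best; the argument list would have to come from a header file or documentation.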
(6) What choices of XML parser implementations do I have? I can call the native MSXML via COM interop or PInvoke and I can use the ones in System.Xml. What is the difference? Is System.XML just a wrapper for MSXML?
Using COM interop is a performance hit generally to be avoided, and no, System.Xml is not a wrapper around unmanaged code but a ground-up implementation in managed code. I can think of no scenario at all where P/Invoking to MSXML would make sense, except perhaps stringent backwards compatibility (but then, don't write managed code).
(7) What choices of XML parser types are there? There is SAX and DOM. Any other choices?
The framework has no SAX-like parser for XML. It has a DOM parser, it has lightweight pull-model parsers in the form of XPathNavigator and XmlReader and with the advent of .NET 3.5 it has a non-DOM but still in-memory model (albeit lazily evaluated) in the form of LINQ to XML.
This link explains why the framework has XmlReader but not SAX: http://msdn.microsoft.com/library/sbw89de7
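For a feel of the pull model, here is a minimal XmlReader sketch. The sample document and the PullParserDemo class are made up; XmlReader.Create, Read, NodeType and Name are the real API. The caller drives the loop and asks for the next node, where a SAX parser would call back into your handlers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class PullParserDemo
{
    // Collects element names in document order, reading forward only.
    public static List<string> ElementNames(string xml)
    {
        List<string> names = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Each Read() advances to the next node; nothing is kept in
            // memory except what the caller chooses to store.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }

        return names;
    }

    static void Main()
    {
        string xml = "<order><item>tea</item><item>milk</item></order>";
        Console.WriteLine(string.Join(", ", ElementNames(xml)));
        // prints: order, item, item
    }
}
```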
(8) What is the difference between using PInvoke to manipulate a semaphore or mutex and using System.Threading?
Using the Semaphore class from System.Threading is portable (.NET isn't just for Windows). Using P/Invoke is not. Other than that, as the class is at present a wrapper around the unmanaged functionality, there's no difference.
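A short sketch of the managed side, using an unnamed (process-local) Semaphore with a single slot. SemaphoreDemo is a made-up name; the constructor, WaitOne and Release are the real System.Threading API. A zero timeout is used so nothing ever blocks.

```csharp
using System;
using System.Threading;

static class SemaphoreDemo
{
    public static bool[] Run()
    {
        // Initial count 1, maximum count 1: one holder at a time.
        using (Semaphore gate = new Semaphore(1, 1))
        {
            bool first = gate.WaitOne(0);   // takes the only slot
            bool second = gate.WaitOne(0);  // no slot left, returns immediately
            gate.Release();                 // hand the slot back
            bool third = gate.WaitOne(0);   // slot is available again
            gate.Release();

            return new bool[] { first, second, third };
        }
    }

    static void Main()
    {
        bool[] results = Run();
        Console.WriteLine("{0} {1} {2}", results[0], results[1], results[2]);
        // prints: True False True
    }
}
```

The P/Invoke equivalent would declare CreateSemaphore, WaitForSingleObject and ReleaseSemaphore by hand and manage the handle itself, which ties the code to Windows.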
--
J.