Re: Unbox IL instruction question




Hi Barry,

Thanks, this info is really useful. Your thoroughness is much appreciated.
You've inspired me to read more about methodtables.

There was one thing you mentioned, though, that I have trouble understanding:

so why is this unbox instruction able to avoid
copying from the box and just return a pointer to what's in the box?

You may be calling an instance method on the value type. In that case,
argument 0 - i.e. the 'this' argument - needs to be a pointer.

Would an example of this be if you define a function within a struct? I'm
also not sure how this ties into the "this" pointer. Could you elaborate on
this?

Thanks...

-Ben

"Barry Kelly" wrote:

Ben R. <benr@xxxxxxxxxxxxxxxx> wrote:

http://msdn2.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox.aspx

Now a few things: It mentions an "object reference" and a "value type
pointer."

An object, traditionally, looks somewhat like this:

<vtable pointer>
<object data>
<object data...>

The equivalent in the CLR is:

<MethodTable pointer>
<object data>
<object data...>

So, if you have a boxed int, you've got this:

<System.Int32 MethodTable pointer>
<int32 m_value>

(You can view this 'm_value' field if you use reflection to look into a
boxed Int32.)
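As a minimal sketch of that reflection lookup: the class and method names below are hypothetical, and the field name 'm_value' is a runtime implementation detail rather than a documented API, so the code treats a missing field as a normal outcome.

```csharp
using System;
using System.Reflection;

public static class BoxedFieldDemo
{
    // Reads the private instance field of a boxed Int32 and returns it as text.
    // Returns null if this runtime has no field named "m_value" (assumption:
    // the name is an implementation detail and could differ between runtimes).
    public static string ReadBoxedValue(object boxedInt)
    {
        FieldInfo field = typeof(int).GetField(
            "m_value", BindingFlags.Instance | BindingFlags.NonPublic);
        return field?.GetValue(boxedInt)?.ToString();
    }

    public static void Main()
    {
        object boxed = 42; // boxing: MethodTable pointer followed by the int
        Console.WriteLine(ReadBoxedValue(boxed) ?? "no m_value field found");
    }
}
```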

So, to 'unbox' a boxed instance, all the CLR needs to do is increment
the pointer. A boxed instance looks like this (including the pointer):

-> <System.Int32 MethodTable pointer>
<int32 m_value>

... to this:

<System.Int32 MethodTable pointer>
-> <int32 m_value>

et voilà - a managed pointer to the value!
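The pointer adjustment can be observed from C# on newer runtimes. This is only a sketch: it assumes a runtime that ships System.Runtime.CompilerServices.Unsafe.Unbox, which is much newer than the 2.0 runtime discussed in this thread, and the class and method names are hypothetical. Writing through the returned reference changes the value inside the box, which suggests the reference points into the box rather than at a copy.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnboxPointerDemo
{
    // Boxes 6, obtains a managed pointer into the box, writes 7 through it,
    // and returns the same box so the caller can inspect its contents.
    public static object MutateThroughUnboxedRef()
    {
        object boxed = 6;
        ref int inside = ref Unsafe.Unbox<int>(boxed); // no copy is made here
        inside = 7;                                     // writes into the box
        return boxed;
    }

    public static void Main()
    {
        Console.WriteLine(MutateThroughUnboxedRef()); // prints 7
    }
}
```

Mutating a box this way is for illustration only; ordinary C# casts such as (int)boxed copy the value out instead.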

What do type O and type & refer to?

Type O is an object type, and will always point to a fully-fledged
object on the heap. A managed pointer, denoted by '&', may point to an
arbitrary typed location. For example, 'ref' and 'out' parameters are
implemented using '&', in which case the pointers might point to locals
on the stack.
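A small sketch of the 'ref' case, with hypothetical names: the callee receives a managed pointer (type &) to the caller's local, so the write is visible in the caller.

```csharp
using System;

public static class ManagedPointerDemo
{
    // 'target' is passed as a managed pointer (int& in IL), not as a copy.
    public static void AddOne(ref int target)
    {
        target += 1;
    }

    // Passes the address of a stack local and returns the updated local.
    public static int Run()
    {
        int local = 5;
        AddOne(ref local);
        return local;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 6
    }
}
```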

Another question:
"unbox is not required to copy the value type from the object. Typically it
simply computes the address of the value type that is already present inside
of the boxed object."

Hopefully this is clear from the adjustment above - typically, the CLR
simply needs to add IntPtr.Size to the pointer (4 on 32-bit, 8 on
64-bit).

It seems then that unbox which is supposed to return a value type is simply
returning a pointer to what is inside the box.

It loads a managed pointer onto the stack. You can then use ldobj to
dereference the managed pointer and convert it into a fully-fledged
value type on the stack. There is an instruction in 2.0, unbox.any,
which combines the two steps, and is pretty essential for generics to
work on both value types and reference types.
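To illustrate where unbox.any shows up, here is a sketch with hypothetical names. A cast from object to an unconstrained type parameter is compiled by the C# compiler to unbox.any, so the same method body serves a value type and a reference type; inspecting the compiled method with ildasm should confirm this.

```csharp
using System;

public static class UnboxAnyDemo
{
    // The cast (T)value compiles to 'unbox.any !!T'. For a value type it
    // unboxes and copies the value; for a reference type it behaves like a
    // checked cast.
    public static T Cast<T>(object value)
    {
        return (T)value;
    }

    public static void Main()
    {
        Console.WriteLine(Cast<int>(6));          // value type path
        Console.WriteLine(Cast<string>("hello")); // reference type path
    }
}
```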

I thought value types were
very simple: they contain only the actual data you're using. If you push the
integer 6 on the stack, you don't deal with addresses at all, you simply put
a 6 on the stack. right?

Yes.

so why is this unbox instruction able to avoid
copying from the box and just return a pointer to what's in the box?

You may be calling an instance method on the value type. In that case,
argument 0 - i.e. the 'this' argument - needs to be a pointer.
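A sketch of what that means in C#, using a hypothetical Counter struct and ICounter interface. An instance method on a struct receives 'this' as a managed pointer to the struct, so it can modify the instance in place. When the struct is boxed and called through an interface, the method still needs a pointer to the data inside the box, and the changes persist in the box between calls.

```csharp
using System;

public interface ICounter
{
    void Increment();
    int Count { get; }
}

public struct Counter : ICounter
{
    private int count;
    public int Count { get { return count; } }

    // 'this' is a managed pointer to the Counter, so count++ updates the
    // caller's instance rather than a copy.
    public void Increment() { count++; }
}

public static class ThisPointerDemo
{
    // 'this' points at a local on the stack.
    public static int IncrementLocal()
    {
        Counter c = new Counter();
        c.Increment();
        return c.Count;
    }

    // 'this' points at the value inside the box on the heap.
    public static int IncrementBoxed()
    {
        ICounter boxed = new Counter(); // boxes the struct
        boxed.Increment();
        boxed.Increment();
        return boxed.Count;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementLocal()); // prints 1
        Console.WriteLine(IncrementBoxed()); // prints 2
    }
}
```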

Also, I hasten to point out that the IL instructions are simply a
description of required semantics. I think it's wise not to think of the
CLI's abstract machine as an actual CPU with x cost for this instruction
and y cost for that instruction, but rather as input to a compiler and
optimizer which can figure things out properly. When considering how to
generate code, look at the output of the C# or C++/CLI compiler for an
example expression / statement. The C++/CLI compiler in particular seems
to sometimes generate slightly faster IL.

-- Barry

--
http://barrkel.blogspot.com/
