Re: Executable Memory in a Driver

From: Jan Bottorff (nospam4096_at_online.nospam)
Date: 02/16/05


Date: Tue, 15 Feb 2005 23:17:03 -0800


> Almost all the memory you typically allocate in a WDM kernel driver is
> sitting in places that are tracked by the OS as pointers (e.g. your device
> extension), or is sitting on the stack. These can't be garbage collected
> without disaster (and must be allocated in NPP, mostly), so you lose a
> majority of the benefit of the language being managed in the first place,
> at the expense of a ludicrous amount of complexity.

Let me speak from some years of experience as a pure object oriented
language virtual machine developer in a former life...

I expect you might use a garbage collector algorithm, like a treadmill
collector, that doesn't need to move memory. You might also keep an "object"
space that will get compacted periodically, but have objects that refer to
paged/non-paged pool memory.

A big misunderstanding about garbage collection is that it's more overall
work than the memory management you would need in say C to create similar
functionality. Lots of drivers I've worked on had to use reference counting
(a simplistic form of garbage collection). Managing those reference counts
takes CPU cycles. In complex systems, garbage collection built into the
language can be more efficient, and do a better job, than you can by making
your own simple garbage collection.
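
To make the cost concrete, here is a minimal sketch of the kind of hand-rolled
reference counting described above. The names (ref_obj, ref_obj_acquire,
ref_obj_release, destroyed_flag) are hypothetical and not taken from any real
driver or from the WDM API. The sketch is single-threaded; a real kernel driver
would need atomic updates (for example InterlockedIncrement and
InterlockedDecrement), which cost even more cycles.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; illustrative only. */
typedef struct ref_obj {
    long refs;            /* number of outstanding references */
    int *destroyed_flag;  /* set to 1 when freed; demonstration aid only */
} ref_obj;

/* Allocate an object that starts with one reference (the creator's). */
ref_obj *ref_obj_create(int *destroyed_flag) {
    ref_obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refs = 1;
    o->destroyed_flag = destroyed_flag;
    return o;
}

/* Every new holder of the pointer pays for one increment. */
void ref_obj_acquire(ref_obj *o) {
    assert(o->refs > 0);
    o->refs++;
}

/* Every holder pays for one decrement and a test on the way out.
   Returns 1 if this call freed the object, 0 otherwise. */
int ref_obj_release(ref_obj *o) {
    assert(o->refs > 0);
    if (--o->refs > 0) {
        return 0;
    }
    if (o->destroyed_flag != NULL) {
        *o->destroyed_flag = 1;
    }
    free(o);
    return 1;
}
```

Each acquire/release pair is bookkeeping the programmer must place correctly by
hand; a missed release leaks and an extra one frees memory still in use.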

The issue of kernel stack usage could be a total non-issue too. When
implementing a language, you can implement your "stack" frame other ways,
for example a linked list. In the Smalltalk system I worked on, activation
frames were dynamically shuffled between the processor stack and being full
objects living in the garbage collected heap space. It could consume some
memory (not that much) to recurse 8000 levels deep if you needed to, but
there was no concept of a stack overflow.
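
As a rough illustration of the linked-list idea (not the Smalltalk
implementation mentioned above, whose details are not given here), the sketch
below keeps each "activation frame" in a heap-allocated node. The frame struct
and sum_to function are hypothetical names. Depth is limited by available heap,
not by the size of the processor stack.

```c
#include <assert.h>
#include <stdlib.h>

/* One activation record, living on the heap instead of the CPU stack. */
typedef struct frame {
    struct frame *caller;  /* link back to the calling frame */
    long n;                /* this activation's argument */
} frame;

/* Computes 1 + 2 + ... + n the way a recursive function would, but with
   one heap frame per "call". Returns -1 if a frame cannot be allocated. */
long sum_to(long n) {
    frame *top = NULL;
    long result = 0;

    /* "Call" phase: push one frame per level of recursion. */
    for (long i = n; i > 0; i--) {
        frame *f = malloc(sizeof *f);
        if (f == NULL) {
            /* Out of memory: unwind the frames built so far. */
            while (top != NULL) {
                frame *c = top->caller;
                free(top);
                top = c;
            }
            return -1;
        }
        f->n = i;
        f->caller = top;
        top = f;
    }

    /* "Return" phase: pop frames, combining results on the way back. */
    while (top != NULL) {
        frame *c = top->caller;
        result += top->n;
        free(top);
        top = c;
    }
    return result;
}
```

Calling sum_to(8000) walks 8000 frames deep while using only a constant amount
of native stack.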

Having multiple classes of memory is an option too. You could design the
language implementation so that things like interrupt handlers are always
resident and accessible.

These are all just language implementation choices. There have been C
compilers that emit byte codes that get interpreted, and there have been
Smalltalk implementations that generate native processor code.

> Also, I don't know of any managed languages that are currently as robust
> as C or even C++ in terms of their compiler/interpreter implementation.

It sounds like the "there might be bugs in the execution system" argument.
I'd suggest that with a serious language project, the number of system
failures from things like invalid pointers or buffer overruns would go way
DOWN, as in most languages it's simply impossible to write code that can
corrupt memory. The use of C and C++ is the MAJOR cause of all these memory
corruption issues.

> That will change eventually, of course, but in the mean time it would be
> criminal to expose users to the added bluescreen and security risk.

You're joking about the security risk aspect, right? I believe analysis has
shown that 50% of security breaches are caused by exceeding array bounds
(i.e. buffer overruns). In a language that can't access outside an array,
that 50% of the security breaches simply can't exist.
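
A minimal sketch of what "can't access outside an array" means in practice,
written in C for comparison. The checked_buf type and its two accessors are
hypothetical; a language with built-in bounds checking would generate the
equivalent test on every array access without the programmer writing it.

```c
#include <assert.h>
#include <stddef.h>

/* A buffer that carries its own length, so every access can be checked. */
typedef struct checked_buf {
    unsigned char *data;
    size_t len;
} checked_buf;

/* Store one byte. Returns 0 on success, -1 if index is out of bounds. */
int checked_store(checked_buf *b, size_t index, unsigned char value) {
    assert(b != NULL && b->data != NULL);
    if (index >= b->len) {
        return -1;  /* the overrun is refused instead of corrupting memory */
    }
    b->data[index] = value;
    return 0;
}

/* Load one byte into *out. Returns 0 on success, -1 if out of bounds. */
int checked_load(const checked_buf *b, size_t index, unsigned char *out) {
    assert(b != NULL && b->data != NULL && out != NULL);
    if (index >= b->len) {
        return -1;
    }
    *out = b->data[index];
    return 0;
}
```

The cost is one compare and branch per access, which is the run-time checking
overhead being weighed against the class of bugs it removes.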

> Also, almost all of the problems that plague C++ in kernel mode apply
> double to a managed language.

Sounds like a pretty sweeping statement, can you be specific? I can totally
believe that kernel developers would go through a LOT of pain initially, as
many of their paradigms would have to change. You realize kernel developers'
paradigms may have to change anyway, as "secure computers" and "unlimited
access to everything" are mutually exclusive. Things like DRM (digital
rights management) may force things in this direction. For example, if you
had an OS (or processor) that always checked the digital signature of code
modules, and the only way to get those digital signatures was from a
compiler and API library that was digitally signed by the OS maker, you
could enforce almost any constraints desired in the compiler. My very fuzzy
memory of a Burroughs mainframe is that it had no user and kernel modes, as
all protection was enforced by the compiler, and the OS was written in an
Algol-like language.

> If you interpret the byte-code, it will be dog slow. If you JIT it, you're
> opening up a whole can of security holes by requiring that much of the
> memory used by driver images has to be (at least for a while) both
> writable and executable. Any bugs in the entire interpreter/JIT compiler
> now become kernel vulnerabilities. So you're going to end up statically
> compiling and linking it anyway, which removes the opportunities that many
> of these languages have to make your life easier by using dynamic code.

Like I said, there are MANY execution options. I think you leave out the
detail of who gets to write or execute. If ONLY the inner core kernel is
allowed to write/control execution, the risks are dramatically different.
For example, if the function return stacks are in a different memory space
than the temporary data, then you can't overrun a buffer and influence the
execution return path. In the microcontroller universe it's very common for
code and data to be in totally disjoint memory spaces. Actually, C compilers
could generate code that solves many of these issues today.
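
The separate-return-stack idea can be sketched with a toy interpreter. This is
an illustration under assumptions, not how any particular processor or
compiler does it: the four-instruction set, the vm struct and vm_run are all
hypothetical. The point is that a STORE can only name a location in data
space, so no data write, however wild the address, can reach a return address.

```c
#include <assert.h>
#include <stddef.h>

enum { DATA_WORDS = 8, RET_SLOTS = 8 };
enum op { OP_CALL, OP_RET, OP_STORE, OP_HALT };

/* One instruction: CALL a (target pc), STORE a (address), b (value). */
typedef struct insn { enum op op; int a; int b; } insn;

typedef struct vm {
    int data[DATA_WORDS];   /* data space: the only thing STORE can touch */
    size_t ret[RET_SLOTS];  /* return-address space: only CALL/RET touch it */
    size_t ret_top;
} vm;

/* Runs prog. Returns the number of instructions executed up to and
   including HALT, or -1 on a fault (bad target, stack over/underflow,
   unknown opcode, or running off the end of the program). */
int vm_run(vm *m, const insn *prog, size_t len) {
    size_t pc = 0;
    int steps = 0;
    while (pc < len) {
        const insn *i = &prog[pc];
        steps++;
        switch (i->op) {
        case OP_CALL:
            if (m->ret_top == RET_SLOTS || i->a < 0 || (size_t)i->a >= len) {
                return -1;
            }
            m->ret[m->ret_top++] = pc + 1;
            pc = (size_t)i->a;
            break;
        case OP_RET:
            if (m->ret_top == 0) {
                return -1;
            }
            pc = m->ret[--m->ret_top];
            break;
        case OP_STORE:
            /* Addresses are interpreted only within data space; an
               out-of-range address wraps around inside it. */
            m->data[(unsigned)i->a % DATA_WORDS] = i->b;
            pc++;
            break;
        case OP_HALT:
            return steps;
        default:
            return -1;
        }
    }
    return -1;
}
```

A callee that issues a STORE to an out-of-range address scribbles on some other
data word, which is still a bug, but the following RET goes back to the real
caller because the return address was never addressable as data.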

A BIG BIG problem is that many people are used to trading away secure
execution for performance at all costs. I'd bet MANY people today would trade
some performance for better security. It's a knob that could be turned.
Products such as VMware and Virtual Server take the attitude that the OS is so
insecure you had better only let it run in a totally isolated environment.
It's funny how people will take the large performance hit from processor
virtual environments like these, but would kick and scream if the compiler
suddenly generated code that ran 10% slower and did a lot of run-time
checking.

For some of you this is all probably obvious, and some of you probably think
I'm nuts.

- Jan


