Re: Compressed textures





"John Withe":
That is a problem, and one of the most frequently discussed topics here.

OK? I cannot seem to locate any discussions about it.
Would you care to elaborate?

Say you have a render-loop like this

1. beginscene draw endscene
2. present, goto 1

This guarantees (at most) that well-defined contents will be visible on the screen.
But you can't decide when this will actually happen. Any of the D3D calls may return
immediately, because they just add a token to the GPU's (or driver's) task list.
We have merely defined what will be visible on the screen, whether that happens now or some time later.
Assuming the loop can run faster on the CPU than the GPU can execute it, there will be
some kind of blocking call in the loop whenever the driver/GPU doesn't want to
accept more tasks (note that this blocking call isn't always present()!).
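
To make the "calls return immediately until the task list is full" point concrete, here is a small toy model in Python. It is only a sketch under my own assumptions: the function name, the fixed per-frame costs and the `queue_depth` parameter (how many unfinished frames the driver is willing to buffer) are hypothetical, and a real driver is free to behave differently.

```python
def simulate_submission(num_frames, cpu_ms, gpu_ms, queue_depth):
    """Return the CPU stall (ms) suffered in each frame.

    Toy model: the CPU needs cpu_ms to issue one frame, the GPU needs gpu_ms
    to execute it, and at most queue_depth (>= 1) submitted frames may be
    unfinished at any time.
    """
    gpu_done = []   # GPU completion time of every submitted frame
    stalls = []
    now = 0.0       # CPU clock
    for i in range(num_frames):
        now += cpu_ms  # issuing the calls only costs CPU time
        stall = 0.0
        # Some call blocks only when queue_depth frames are still in flight.
        if i >= queue_depth and gpu_done[i - queue_depth] > now:
            stall = gpu_done[i - queue_depth] - now
            now = gpu_done[i - queue_depth]
        # The GPU starts this frame once it is submitted and the GPU is free.
        start = max(now, gpu_done[-1] if gpu_done else 0.0)
        gpu_done.append(start + gpu_ms)
        stalls.append(stall)
    return stalls


# CPU is 4x faster than the GPU, driver buffers up to 3 frames.
print(simulate_submission(6, cpu_ms=1, gpu_ms=4, queue_depth=3))
```

In this model the first `queue_depth` frames cost the CPU nothing beyond issuing the calls; after that, every frame contains a stall of roughly `gpu_ms - cpu_ms`. The model doesn't say which call blocks, which is the point made above.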

Let's include a (discard-mode) lock,

1. lock, copy, unlock
2. beginscene draw endscene
3. present, goto 1

and assume the driver is able to create n shadow instances of the texture object,
where n is an integer >= 0. This number n might depend on things like driver configuration,
but also on the runtime situation (memory fragmentation, for example, or simply lazy
releasing of shadow instances). Remember that no one said this number n must be
greater than 0, so the driver is always free to ignore the discard locking mode.

If n is less than the number of frames the driver/GPU can fall behind the application,
and if the application feeds D3D tasks fast, the situation would become
almost identical to the case where n=0 (no discard mode).
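
The effect of n can be sketched with another toy model. The assumptions are mine: the texture has n + 1 physical buffers, a discard lock for frame i reuses the buffer of frame i - (n + 1) and has to wait until the GPU is done with that frame, and submitting work never blocks by itself. All names are hypothetical.

```python
def simulate_discard_lock(num_frames, copy_ms, gpu_ms, n_shadow):
    """Return (total_ms, total_lock_stall_ms) for a lock/copy/draw loop.

    Toy model: n_shadow + 1 physical buffers exist. The lock for frame i must
    wait until the GPU has finished frame i - (n_shadow + 1), which used the
    same buffer.
    """
    buffers = n_shadow + 1
    gpu_done = []       # GPU completion time of every frame
    now = 0.0           # CPU clock
    stall_total = 0.0
    for i in range(num_frames):
        # lock: wait for the GPU to release the buffer we are about to reuse
        if i >= buffers and gpu_done[i - buffers] > now:
            stall_total += gpu_done[i - buffers] - now
            now = gpu_done[i - buffers]
        now += copy_ms  # copy, unlock
        # draw + present: queued, executed when the GPU becomes free
        start = max(now, gpu_done[-1] if gpu_done else 0.0)
        gpu_done.append(start + gpu_ms)
    return gpu_done[-1], stall_total


for n in (0, 1):
    print(n, simulate_discard_lock(10, copy_ms=2, gpu_ms=8, n_shadow=n))
```

With n = 0 the copy and the GPU work are fully serialized (100 ms for ten frames in this example). With n = 1 the GPU stays busy (82 ms), but the lock still blocks in every frame once the CPU has run n + 1 frames ahead.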

But when a resource is logically in use by more than one
"process" (or processing unit) at the same time, only one process owns the resource,
and all other units have to wait until they get it. For a system with a
single-core CPU and one GPU, this means that we can waste up to

waste <= ( a*CPU + (1-a)*GPU ) * 100%

where a is some number in the range 0..1.


This situation changes if we modify the render loop to read:
1. lock, copy, unlock
2. beginscene draw_temp draw endscene
3. present, goto 1

where draw_temp copies the image from step 1 to an RT texture (render target) that will
be used in subsequent draw calls. This decouples the actual drawing
from the lockable resource, so that only an (initial) portion of step 2
makes use of it. So the waste is now limited to

waste <= ( a*CPU + (1-a)*GPU ) * 100% * q

where q is a number in the range 0..1, indicating the fraction of time
the GPU depends on owning the lockable resource.
However, this doesn't mean that the driver really keeps track of
the last occurrence of resource usage in a task stream. If it doesn't,
q is still 100%.
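
As a rough illustration of what q buys you, here is a toy model with a single buffer (n = 0). It assumes, hypothetically, that the driver releases the lockable texture as soon as the first q-fraction of the GPU frame (the draw_temp part) has finished. It doesn't reproduce the bound above; it only shows why shortening the dependency window helps. All names are my own.

```python
def simulate_with_temp_copy(num_frames, copy_ms, gpu_ms, q):
    """Return (total_ms, gpu_idle_ms) for the lock/copy/draw_temp/draw loop.

    Toy model: one physical buffer; the GPU reads the lockable texture only
    during the first q * gpu_ms of each frame, and the next lock may proceed
    as soon as that part is finished.
    """
    now = 0.0        # CPU clock
    gpu_done = 0.0   # time the GPU finishes the latest frame
    released = 0.0   # time the GPU last stopped reading the lockable texture
    for _ in range(num_frames):
        now = max(now, released)    # lock: wait until the texture is free
        now += copy_ms              # copy, unlock
        start = max(now, gpu_done)  # GPU starts when submitted and free
        released = start + q * gpu_ms
        gpu_done = start + gpu_ms
    return gpu_done, gpu_done - num_frames * gpu_ms


print(simulate_with_temp_copy(10, copy_ms=2, gpu_ms=8, q=1.0))
print(simulate_with_temp_copy(10, copy_ms=2, gpu_ms=8, q=0.25))
```

Passing q = 1.0 models a driver that doesn't track the last use of the resource: the GPU then idles during every copy. With q = 0.25 the only idle time left in this example is the very first copy.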

After all, you're probably right that on your computer it doesn't make
a difference for your application whether you lock(discard) a single "upload"
texture object or lock(discard) just one surface of a circular "upload"
texture collection.

However, if you compare the amount of data you actually upload
(100 MB/s?) against the available upload bandwidth (AGP 8X: 2 GB/s),
you'll probably wonder where all that bandwidth has gone, even if
you take some degrading factors (like suboptimal CPU<->AGP bus
coupling/usage and/or CPU usage for additional processing) into account.
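
Just to put a number on it, using the two figures above (the 100 MB/s is only my guess at your upload rate, and 2 GB/s is the nominal AGP 8X figure, taken here as 2000 MB/s):

```python
def bus_utilization(upload_mb_per_s, bus_mb_per_s):
    """Fraction of the nominal bus bandwidth that the upload actually uses."""
    return upload_mb_per_s / bus_mb_per_s


share = bus_utilization(100.0, 2000.0)
print(f"{share:.0%} of the nominal bandwidth")  # prints "5% of the nominal bandwidth"
```

So even a generous allowance for overhead leaves most of the bus unused.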

Regards

Jan Bruns





