Re: Advice using the movsb, movsw, movsd instructions
- From: "Jean-Marc" <jmnaste@xxxxxxxxxxx>
- Date: 1 Jul 2006 09:33:55 -0700
Andy Champ wrote:
Tim's right, use memcpy. If you pull it apart you'll find it does
clever things like copying a few bytes at the beginning and the end of
the block to make sure that the big lump in the middle is dword aligned,
which means each read and write takes one cycle only. If they are not
aligned they take two (although the 2nd one will be from cache, you
still want to avoid it). The other thing to do is to make sure both
your buffers are on the same alignment - if one buffer is at an address
ending zero, and the other ending one, there is no way to make the read
and the write be aligned.
Yes, I saw that too.
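For anyone reading along, the head/middle/tail idea can be sketched roughly as below. This is only an illustration of the technique, not the CRT's actual memcpy; copy_aligned and same_alignment are names I made up, and the sketch aligns the destination only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the real CRT memcpy): copy n bytes, aligning the
   destination to a 4-byte boundary before the bulk dword loop. */
void *copy_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is dword aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Middle: one dword per iteration. The fixed-size memcpy calls keep the
       access well defined even if the source is still misaligned; compilers
       typically reduce each one to a single load or store. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}

/* Source and destination can both be dword aligned in the middle loop only
   if their addresses share the same offset modulo 4. */
int same_alignment(const void *a, const void *b)
{
    return (((uintptr_t)a ^ (uintptr_t)b) & 3u) == 0;
}
```

If same_alignment() returns 0 for the two buffers, either the reads or the writes in the middle loop stay misaligned no matter how many head bytes are copied, which is Andy's point about buffer placement.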
That aside, are you sure you are optimising the right bit of your code?
Also, the clever trick will be to use MMX to pull the data out of the
source block and apply some sort of processing in one go.
Yes, there are a couple of graphs running, and memory copies are
used to bridge between them. It will save just a couple of percent of
CPU, but that already nearly doubles the free CPU time (it runs at over
80%). But as you say, memcpy (compiled with the right optimization)
does the job.
Optimising got a whole lot harder when they invented cache, and more so
with MMX. In my experience you'll get more from improved algorithms
than polishing the code.
Actually, Alessandro's reply (post 2) made me realize that I was testing
a non-optimized build. Setting the right compiler /O switch gave me
those few percent and brought CPU usage down to a bit over 80%, compared
to almost 90% before.
Regarding MMX, Jeremy suggests using IPP instead. I have to take a look
at it, because my understanding is that it is a higher level of
abstraction (over MMX?): more services, easier to program.
As I said earlier in the thread, I suspect the YUV to ARGB32 conversion
(actually done by the DV Decoder filter) consumes a significant amount
of CPU resources. Is it worth replacing this conversion with a custom
filter using IPP?
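To give an idea of the per-pixel work involved, here is a plain C sketch of the conversion using the usual BT.601 integer formulas. It is not IPP and not what the DV Decoder does internally; I am assuming packed YUY2 input (Y0 U Y1 V) and 0xAARRGGBB output, and the function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale a fixed-point value (8 fractional bits) down and clamp to 0..255.
   Negative inputs are clamped before the shift, so no negative value is
   ever right-shifted. */
static uint32_t clamp_shift8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return v > 255 ? 255u : (uint32_t)v;
}

/* Convert one BT.601 studio-range YUV sample to opaque ARGB32 (0xAARRGGBB). */
uint32_t yuv_to_argb32(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    uint32_t r = clamp_shift8(298 * c + 409 * e + 128);
    uint32_t g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    uint32_t b = clamp_shift8(298 * c + 516 * d + 128);

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

/* Convert one row of packed YUY2 (assumed layout: Y0 U Y1 V per pixel pair).
   width must be even; dst receives width ARGB32 pixels. */
void yuy2_row_to_argb32(const uint8_t *src, uint32_t *dst, size_t width)
{
    for (size_t x = 0; x + 1 < width; x += 2) {
        uint8_t y0 = src[0], u = src[1], y1 = src[2], v = src[3];
        dst[x]     = yuv_to_argb32(y0, u, v);
        dst[x + 1] = yuv_to_argb32(y1, u, v);
        src += 4;
    }
}
```

That is three multiply-accumulate chains and three clamps per pixel, which is the kind of loop MMX or a library like IPP is meant to speed up. Whether a custom filter actually beats the decoder's built-in conversion would have to be measured.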
Also, I need to rotate the image 90 degrees (around the Z axis), and I
sometimes need to capture the VMR9 output to a file. Both can be handled
with a custom allocator/presenter. Some threads on this newsgroup
suggest that a custom allocator/presenter is the best (if not the only)
solution for capturing VMR9 output. I am working on that now. Another
advantage for me is that applying the rotation at this later stage in
the graph removes the need to manage the rotation in upstream processing
(which makes things a lot easier for me).
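For reference, the index mapping for a 90 degree clockwise rotation of a 32-bit image looks like the sketch below. In an allocator/presenter the rotation would more likely be done by Direct3D when the surface is drawn, so this CPU version only illustrates the mapping; rotate90_cw is a made-up name and tightly packed rows (no stride padding) are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a w x h image of 32-bit pixels 90 degrees clockwise.
   src holds h rows of w pixels; dst receives w rows of h pixels.
   Assumes tightly packed rows (no stride padding) and that src and dst
   do not overlap. */
void rotate90_cw(const uint32_t *src, uint32_t *dst, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            /* Source pixel (x, y) lands in row x, column (h - 1 - y). */
            dst[x * h + (h - 1 - y)] = src[y * w + x];
        }
    }
}
```

Note that the destination is written with a stride of h pixels, which is hard on the cache for large frames. That is one more reason to leave the rotation to the video hardware if possible.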
Thanks for helping.
Jean-Marc.
Andy