Although I've had it installed for some time now, I've been avoiding using Visual Studio. The incremental improvements in the compiler simply aren't worth putting up with the braindead, butt-slow IDE. Thus, I've been continuing to use Visual C++ 6.0 SP5+PP.

The problem is that different drivers and applications are inconsistent about how they treat or format odd-width and odd-height YV12 images. Some support it by truncating the chroma planes (dumb). Now, if people had sense, they would have handled this the way that MPEG and JPEG do, and simply required that the bitmap always be padded to the nearest even boundaries and that the extra pixels be ignored on decoding.

If you have extra shifter or ALU bandwidth you can attack this by replacing , but you can't do this when the compiler is generating code from intrinsics. And before you say that performance doesn't matter so much, remember that the whole purpose of those intrinsics is to let you optimize hotspots with CPU-specific code.

This all only pertains to the Microsoft Visual C++ compiler, and as it turns out, the Intel C/C++ Compiler generates much better MMX and SSE2 code. As it stands right now, though, I still have to use Visual C++, and that means I'm still going to have to hand-roll a lot of assembly code for performance.
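Going back to the odd-sized YV12 issue for a moment, here is a rough sketch of the two conventions mentioned above: truncating the chroma planes versus padding the whole bitmap to even dimensions. The struct and function names are made up for illustration and don't correspond to any real driver or codec API, and the byte counts assume tightly packed planes with no per-row alignment, which real implementations may not use.

```cpp
#include <cstddef>

// Hypothetical description of a YV12 frame (one Y plane plus two
// half-resolution chroma planes). Illustrative only, not a real API.
struct YV12Layout {
    int lumaWidth;
    int lumaHeight;
    int chromaWidth;
    int chromaHeight;
    std::size_t totalBytes;   // Y + V + U, assuming no row padding
};

// Truncating convention: chroma dimensions are rounded down, so an odd
// width or height leaves the last luma column/row without chroma.
inline YV12Layout yv12LayoutTruncated(int width, int height) {
    YV12Layout l;
    l.lumaWidth    = width;
    l.lumaHeight   = height;
    l.chromaWidth  = width / 2;
    l.chromaHeight = height / 2;
    l.totalBytes   = std::size_t(l.lumaWidth) * l.lumaHeight
                   + 2 * std::size_t(l.chromaWidth) * l.chromaHeight;
    return l;
}

// Padding convention (in the spirit of MPEG/JPEG): store the bitmap rounded
// up to even dimensions; the decoder ignores the extra pixels.
inline YV12Layout yv12LayoutPadded(int width, int height) {
    YV12Layout l;
    l.lumaWidth    = (width  + 1) & ~1;   // round up to next even value
    l.lumaHeight   = (height + 1) & ~1;
    l.chromaWidth  = l.lumaWidth  / 2;
    l.chromaHeight = l.lumaHeight / 2;
    l.totalBytes   = std::size_t(l.lumaWidth) * l.lumaHeight
                   + 2 * std::size_t(l.chromaWidth) * l.chromaHeight;
    return l;
}
```

For a 5x3 frame, the truncating layout ends up with 2x1 chroma planes, while the padded layout stores 6x4 luma with 3x2 chroma. For even-sized frames the two agree, which is why the inconsistency only bites on odd sizes.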

There are two major bottlenecks to getting VirtualDub running smoothly on AMD64: the compiler doesn't support inline assembly, and the OS doesn't support MMX for 64-bit tasks.

The code has shipped and is in 1.5.10, but is hard-coded off in .

I might resurrect it again, as NVIDIA reportedly exposes a number of features of their hardware in OpenGL that are not available in Direct3D, such as the full register combiners, and particularly the final combiner.

Applying this both horizontally and vertically gives the bicubic filter.

The fact that you calculate the 2D filter as two 1D passes means that the 2D filter is separable; this reduces the number of effective taps for the 2D filter from 16 to 8.
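To make the two-pass idea concrete, here is a minimal sketch of a separable bicubic resize on a single-channel float image. It assumes Catmull-Rom weights (A = -0.5) and clamped edges, neither of which is stated above, and the function names are purely illustrative. Each output pixel costs 4 taps in the horizontal pass plus 4 in the vertical pass, rather than 16 for a direct 2D kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Four cubic filter weights for fractional offset t in [0,1).
// Catmull-Rom (A = -0.5) is an assumption; other cubics work the same way.
static void cubicWeights(float t, float w[4]) {
    w[0] = ((-0.5f * t + 1.0f) * t - 0.5f) * t;
    w[1] = (1.5f * t - 2.5f) * t * t + 1.0f;
    w[2] = ((-1.5f * t + 2.0f) * t + 0.5f) * t;
    w[3] = (0.5f * t - 0.5f) * t * t;
}

// 1D pass: resample n source samples into m destination samples.
// Strides let the same routine walk along rows or down columns.
static void resample1D(const float* src, int n, int srcStride,
                       float* dst, int m, int dstStride) {
    const float step = float(n) / float(m);
    for (int i = 0; i < m; ++i) {
        // Map the output sample center back into source coordinates.
        const float pos = (i + 0.5f) * step - 0.5f;
        const float fl  = std::floor(pos);
        const int base  = int(fl);
        float w[4];
        cubicWeights(pos - fl, w);
        float acc = 0.0f;
        for (int k = 0; k < 4; ++k) {
            // Clamp tap positions at the image edges.
            const int idx = std::min(std::max(base - 1 + k, 0), n - 1);
            acc += w[k] * src[idx * srcStride];
        }
        dst[i * dstStride] = acc;
    }
}

// 2D resize as two 1D passes: horizontal into a temporary, then vertical.
std::vector<float> resizeBicubic(const std::vector<float>& src,
                                 int sw, int sh, int dw, int dh) {
    std::vector<float> tmp(std::size_t(dw) * sh);
    std::vector<float> dst(std::size_t(dw) * dh);
    for (int y = 0; y < sh; ++y)   // each row: sw -> dw samples
        resample1D(&src[std::size_t(y) * sw], sw, 1,
                   &tmp[std::size_t(y) * dw], dw, 1);
    for (int x = 0; x < dw; ++x)   // each column: sh -> dh samples
        resample1D(&tmp[x], sh, dw, &dst[x], dh, dw);
    return dst;
}
```

This is only meant for enlarging or same-size resampling; shrinking an image with a fixed 4-tap kernel will alias, and a real implementation would widen the filter for that case.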

