Friday, April 15, 2016

Who watches the watchers?

While continuing on my quest of researching solutions to over-zealous track-pad scrolling in Chrome, I discovered a really disconcerting issue with the Direct Composition interface APIs. Let me frame this by pointing out this statement from the above linked page:
The DirectComposition API is intended for experienced and highly-capable graphics developers who know C/C++, have a solid understanding of the Component Object Model (COM), and are familiar with Windows programming concepts.
Well that's a scary statement for sure. It's intended to let you know that you really need to understand all these concepts before continuing, lest you'll be in for some frustration. Fair-enough. Little did I know where the actually "frustration" would be coming from.

I'm not sure I fit the "highly-capable graphics developers" requirement, but I think I can stumble along OK. The rest of the requirements aren't an issue at all. Familiar with Windows programming concepts? Check. Understanding C/C++? Mostly Check. Understanding COM? Check.

Unfortunately, I'm not sure I can say that the last item is true for the developers of these APIs. Now I know that COM is, for many (most? all?), a somewhat dense and obtuse model... but you'd think that the actual designer of COM would at least understand some of its basic tenets. At least have the code reviewed and corrected before being released. Who is watching the watchers?

If you have access to the Windows SDK headers (you can download and install it here) and understand C/C++ and COM, open up the dcomp.h header file. There are many interesting interfaces declared in this file, but I'd like to specifically point out one: IDCompositionRectangleClip. There are several others with the same issue, but I'll stick to that one to demonstrate the problem.

Take a look at that interface; notice the seemingly proper usage of the various preprocessor macros designed to make declaring these interfaces simpler. They also can expand to different code depending on whether the developers is building with C++ or C. There is also another subtle issue that can easily go unnoticed until you actually build and run an application based on these interfaces! At that point the application will crash down inside the dcomp.dll. Why?

As I was working with these APIs with my handy-dandy copy of C++Builder, the application would crash every time it called IDCompositionRectangleClip::SetLeft(float left);. Odd. So I busted out VC++ and (painfully ;-), built an application to call those same APIs. This time it worked. Huh?

After going back and forth between VC++ and C++Builder, I decided to look at the actual generated code at the call-sites. They were both calling into the same dcomp.dll, so the problem couldn't (shouldn't) be in there. Here's the code from C++Builder for a call to SetLeft(1.0f):
File1.cpp.37: p->SetLeft(1.0f);
00401227 680000803F       push $3f800000
0040122C 8B4508           mov eax,[ebp+$08]
0040122F 50               push eax
00401230 8B10             mov edx,[eax]
00401232 FF520C           call dword ptr [edx+$0c]
Take particular note of the call dword ptr [edx+$0c] instruction. It is doing an indirect call to the SetLeft() method by indexing into the virtual table pointed to by edx.

Now here's the code from VC++ for the same call:
p->SetLeft(1.0f); 00884346 mov esi,dword ptr [p] 00884349 push ecx 0088434A mov dword ptr [esp],3F800000h 00884351 push esi 00884352 mov eax,dword ptr [esi] 00884354 call dword ptr [eax+10h] 
In this instance, take note of the call dword ptr [eax+10h] instruction. This is doing an indirect call to the SetLeft() method by indexing into the virtual table pointed to by eax.

Notice something odd about the offsets in those respective instructions? Why is one 12 (0Ch) and the other 16 (10h)?

You may have already figured out what may be going on here if you looked over the declaration of the interface I referenced above. Let's look at the declaration for SetLeft():
    // Changes the value of the Left property.
        float left
        ) PURE;

    // Animates the value of the Left property.
        _In_ IDCompositionAnimation* animation
        ) PURE;
Waydaminnut! Why are there two SetLeft() methods? Clearly one is intended to explicitly set the Left property of the clipping rectangle to a specific value, and the other is intended to allow for the property to be animated (or have it's value change over time based on the IDCompositionAnimation instance). So the SetLeft() method is overloaded. At first blush, this shouldn't be a problem; the compiler will select the proper method to call based on the parameters.

The problem here is this is a published COM-style interface (it descends from IUknown), so it must adhere to some rules that ensure it is accessible from other languages and tools (such as C#, C++Builder, or even Delphi). Overloaded methods in a COM interface are frowned upon because, I'll show, there is no really easy way know what v-table slots get assigned to what methods when VC++ is involved.

Clearly, dcomp.dll is built with Microsoft's own tools, which would be VC++. Because of that, their compiler gets to decide how it's going to allocate the v-table slots for the implementation of IDCompositionRectangleClip. The problem is that the v-table ordering is not, I repeat, not declaration order! Let's let that sink in for a moment.

It turns out that VC++ reverses the v-table slot assignment ordering for a given overload "group" based not on the appearance of a virtual method with a given name, but with any named declaration nested within a class. In short, the header file is built upon a bed of lies!

So far, the only solution I've found to this is to literally change the header file and reverse the declared order of the above methods. Repeat for all other overloaded method groups in other interface declarations. The problem is there is no easy way to ensure all the interfaces are now correct since the implementing code in dcomp.dll is opaque. Short of calling each one and checking that it doesn't crash, but also that it does what is expected, verifying this is a nightmare.

I guess, as long as you only ever use VC++... you will never run into this. However, there is the clang-cl effort going on which is striving to have full ABI compatibility with VC++. This means that a VC++ built object file containing a class can be linked directly with clang-cl built object file containing a descendant of that class. After contacting them, they're clearly aware of this oddity, and must faithfully duplicate it. I really don't envy those guys.