Steve Gibson (of SpinRite fame) proposed a theory in his weekly Thursday-night
podcast last week that if true, would be the biggest scandal to ever hit Microsoft - that the Windows Metafile (WMF) vulnerability that drew so much media attention last month is actually a backdoor programmed intentionally by Microsoft for unknown reasons.
Slashdot picked up the story the next day and I received a flood of emails asking me to look into it. I finished my analysis, which Steve aided by sending me the source code to his WMF-vulnerability tester program (KnockKnock), over the weekend. In my opinion the backdoor is one caused by a security flaw and not one made for subterfuge. I sent my findings to both Steve and to Microsoft Monday morning, but because the issue continues to draw media attention I’ve decided to publicly document my investigation.
Understanding the WMF vulnerability requires a brief background in WMF files. A WMF file is a script for executing graphics commands, called graphics device interface (GDI) functions. Each command is stored as a record in the WMF file and examples of GDI functions include ones to draw lines, fill rectangles, and copy bitmaps. Image files like bitmaps, GIFs, or JPEGs, on the other hand, store the representation of pre-rendered images. Because an application can direct a WMF file’s execution at different devices, like screens and printers, with different graphics resolutions and color depths, their advantage over pre-rendered formats is that they scale to the capabilities of the target device. For this reason, many clipart images, including those used by Microsoft Office, are stored in WMF files.
WMF files originated with early 16-bit versions of Windows that implemented single-threaded cooperative multitasking. In that programming environment a process can’t perform two tasks, such as printing a document and displaying a print-cancel dialog, concurrently. Instead, they have to manually interleave the tasks, periodically polling to see if the user has asked to cancel the printing. The programming model for printing in Windows therefore has the concept of an abort procedure that an application can set before calling the printing API. If such a procedure is registered Windows periodically calls it to give an application a chance to signal that it wants the print job cancelled. Otherwise there would be no way to abort a long-running print job.
The WMF vulnerability stems from the fact that WMF supports the SetAbortProc API, which is the GDI call to set an abort procedure, that Windows expects abort procedure code to be stored directly in the SetAbortProc WMF record, and that Windows will invoke the procedure under certain conditions immediately after processing the record. Thus, if an attacker can get your computer to execute their WMF file through Internet Explorer or Outlook, for example, they can make your system execute arbitrary Windows commands, including downloading malicious applications and launching them.
Steve Gibson’s intentional backdoor theory is based on four suspicious observations he made regarding the vulnerability and the behavior of his tests with WMF files that contain a SetAbortProc record:
- There is no need for WMF files to include support for the SetAbortProc API.
- Even if an abort procedure is set by a WMF file, Windows shouldn’t execute it unless some abort condition is triggered, which should never occur when executing a WMF file.
- He could only get his WMF file’s abort procedure to execute when he specified certain invalid values for the size of the record containing the SetAbortProc command.
- Windows executes code embedded within the SetAbortProc record rather than expect the record to reference a procedure within the application executing the WMF file.
Steve’s belief that WMF files should not support the SetAbortProc API comes from the
documentation for how Windows calls an abort procedure registered via SetAbortProc:
It [the abort proc]
is called when a print job is to be cancelled during spooling.
The statement implies that Windows detects that a user or printer wants to cancel a print job and informs an application by executing the registered abort procedure. Steve echoes this understanding in a
posting on his website’s news group:
[the abort proc]
is the address of an application-provided "callback" -- a subroutine provided by the application that is expressly designed to asynchronously accept the news and notification of a printing job being aborted for whatever reason.Steve reasoned that WMF files execute to screens, not printers, and so it makes no sense to abort their execution. Further, his tests showed that Windows calls the abort procedure registered by a WMF file immediately, when there’s no apparent cause for cancellation.
WMF files can be directed at a printer, however, and not only that, but the abort procedure documentation is misleading. Its correct description is in the Microsoft
documentation that describes the basic steps for writing code that prints to a printer:
After the application registers the AbortProc abort procedure, GDI calls the function periodically during the printing process to determine whether to cancel the job.
Thus, the abort procedure really works both ways, providing Windows a way to notify an application of printing errors and the application a way to notify Windows that it wants to cancel printing. With this interpretation Windows’ execution of the abort procedure immediately after one is registered makes sense: Windows is calling the procedure to ask it if the playback of the rest of the procedure should be aborted.
Even still, the question remains as to why WMF files implement the SetAbortProc GDI function at all. My belief is that Microsoft developers decided to implement as much as the GDI function-set as possible. Including SetAbortProc makes sense for the same reason that abort procedures for printing make sense: WMF files can consist of many records containing complex GDI commands that can take along time to execute, especially when sent to a printer and on old hardware like the kind on which the cooperatively multitasked Windows 3.1 operating system ran. The abort procedure gives applications the ability to monitor the progress of a playback and to unilaterally abort it if a user makes UI choices that make a complete playback unnecessary. In addition, if a WMF file is sent to a printer and there’s a printer error Windows must have a way to know that an application wants to cancel WMF playback, which is another reason to invoke the abort procedure from within the PlayMetaFile loop. This Microsoft
article from 1992 confirms the behavior as designed.
I’ve addressed the first two of Steve’s observations, but what about his claim that the abort procedure only executes when the SetAbortProc record contains certain invalid record sizes? I’ve analyzed the control flow of the PlayMetaFile function that executes WMF file records and found that, if an abort procedure is registered, it calls it after executing each record except the last record of the file. That behavior makes sense since there’s no need to ask an application if playback should be aborted when the playback is already completed.
Steve’s example WMF file contains only one record, the one that specifies SetAbortProc, so under normal circumstances PlayMetaFile will never call his abort procedure. The record sizes that he found trigger its execution cause PlayMetaFile to incorrectly increment its pointer into the WMF file such that it believes that there are more records to process, whereas the values he used that don’t trigger the execution land it on data values that indicate there are no more records. So his assertion that only certain magic values open the backdoor is wrong.
The remaining question is why PlayMetaFile expects the abort procedure to be in-lined in the metafile. It’s that fact that allows a hacker to transport malicious code within a WMF file. The actual reason is lost with the original developer of the API, but my guess is that he or she was being as flexible as possible. When a WMF file is generated in memory and played back by the application in the same run, like it would to create a graphics macro and use it mulitple times, it generally makes no difference if the procedure is copied or not.
For the code in on-disk WMF files to work any references it makes to data or code, such as Windows functions, must be to hard-coded addresses. This means that WMF file code won’t work if Windows system DLLs change in size or load into different locations in memory and therefore WMF vulnerability exploits only work on specific patch-levels of Windows. While this might make an argument against a design that includes the abort code in the WMF file things were different when the format was architected. In the Windows 3.1 “large” memory model code is inherently location-independent and Windows was never patched, so both Windows and an application could simply copy an application function into the WMF file and assume it would work when played back by the same application in a later run session. In any case, its not clear that the developers envisioned applications creating on-disk metafiles with abort procedures. Also, as Microsoft’s Stephen Toulouse pointed out in
Microsoft’s rebuttal to Steve’s claims, the security landscape in the early 1990’s was very different than today and all code, including that stored in a WMF file, was inherently trusted.
The vulnerability is subtle enough that the WINE project, whose intent is to implement the Windows API for non-Windows environments,
copied it verbatim in their implementation of PlayMetaFile. A secret backdoor would probably have been noticed by the WINE group, and given a choice of believing there was malicious intent or poor design behind this implementation, I’ll pick poor design. After all, there are plenty of such examples all throughout the Windows API, especially in the part of the API that has its roots in Windows 3.1. The bottom line is that I'm convinced that this behavior, while intentional, is not a secret backdoor.