The above image is a screenshot of Soleau Software's The Mice Men, running in DOSbox.
You can probably tell it's not supposed to look like this, and if you run this same game under DOSbox yourself, it won't look like that. It'll look like this:
There's no deep mystery to what's going on here, and I didn't perform any great act of archaeology or technological research to discover the answer - I just found the explanation on a website after being puzzled for a day or two. But it's a pretty wild problem nonetheless, and I found it fun to figure out, even though it turned out to be not only a known issue, but intended behavior.
I learned the ultimate answer to this here, but I find my telling of the story and background more interesting, and I figure I'm not plagiarizing anything by describing known errata.
IBM primarily provided three graphics card designs for the PC: the CGA, EGA and VGA. There are lots of details you can read about the capabilities, limitations and similarities of these cards, and there were other IBM cards (some did text only, and some did very limited graphics and were hardly used by anyone), but these three were overwhelmingly the standards for PC video prior to the mid-90s.
Unlike game console graphics hardware, these cards provided very little other than the ability to read a chunk of memory, interpret it as a framebuffer, and plot its values as pixel colors onto a video display. The cards usually had their own video RAM, but you wrote things to the screen largely by just setting values in that RAM exactly as if it were normal system memory. EGA and VGA added some interesting features like redefinable palettes and video scrolling registers, but they hardly got used by most people. By and large, these cards were little more than digital-to-analog converters with increasing amounts of bit depth.
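To make that concrete, here's a minimal sketch of plotting a pixel the way DOS programs actually did it - old Borland-style C, using VGA's mode 13h because it's the simplest case. This is my illustration, not anything from the game:

    /* Set VGA mode 13h (320x200, 256 colors) via the BIOS, then draw a
       pixel by writing one byte straight into the framebuffer at A000h. */
    #include <dos.h>
    #include <conio.h>

    int main(void)
    {
        union REGS r;
        unsigned char far *vram = (unsigned char far *)MK_FP(0xA000, 0);

        r.x.ax = 0x0013;             /* INT 10h, AH=00h: set mode 13h */
        int86(0x10, &r, &r);

        vram[100 * 320 + 160] = 15;  /* white pixel at (160, 100) */

        getch();                     /* wait for a key... */
        r.x.ax = 0x0003;             /* ...then back to 80x25 text mode */
        int86(0x10, &r, &r);
        return 0;
    }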
One feature they did all have was a variety of text modes - the CGA provided an ugly 8x8 pixel font, the EGA offered a much more pleasant 8x14 font, and VGA introduced an 8x16 (sometimes 9x16) font:
These fonts were burned directly into ROM chips on the respective cards. CGA contained a ROM with its BIOS routines and one font, EGA had a table with two fonts, and VGA had three. There are two ways to use these.
The first method is to set the video card to a "text mode." In this mode, the contents of VRAM do not describe specific pixels on the screen. Instead, for each character position (either 40 or 80 columns wide, and 25 rows tall) you specify a byte that identifies a glyph to display from the character ROM, and then a set of attributes (its foreground and background colors, whether it should blink, whether it should be bright or dim). When the card generates the video image, it uses the character value as an index into the character ROM and draws the bitmap it finds there to the screen.
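In code, that looks something like this - a sketch assuming an 80-column color text mode, whose buffer lives at segment B800h:

    /* Write one character cell directly to text-mode VRAM. Each cell is
       two bytes: the glyph index, then an attribute byte (bits 0-3 set
       the foreground color, bits 4-6 the background, bit 7 blink). */
    #include <dos.h>

    void put_cell(int col, int row, unsigned char glyph, unsigned char attr)
    {
        unsigned char far *text = (unsigned char far *)MK_FP(0xB800, 0);
        unsigned int off = (row * 80 + col) * 2;

        text[off]     = glyph;   /* which entry in the character ROM */
        text[off + 1] = attr;    /* e.g. 0x1F = white on blue        */
    }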
On some cards you could redefine this character set by telling the card to use a different location in RAM, instead of ROM, for the glyphs. This allowed fancy serifed fonts and so on, but it was rarely used. Generally, software just used whatever font came with the card.
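For the curious, the standard route for this on EGA and VGA text modes was BIOS function INT 10h, AX=1100h, which loads glyph bitmaps from a buffer you supply. A sketch (again mine, using Borland's intr() because the call takes a pointer in ES:BP):

    /* Replace the glyph for 'A' with a solid 8x16 block. */
    #include <dos.h>
    #include <string.h>

    void load_custom_glyph(void)
    {
        static unsigned char glyph[16];
        struct REGPACK r;

        memset(glyph, 0xFF, sizeof glyph);  /* every pixel set */

        r.r_ax = 0x1100;                    /* load user font             */
        r.r_bx = 0x1000;                    /* BH=16 bytes/char, block 0  */
        r.r_cx = 1;                         /* one character...           */
        r.r_dx = 'A';                       /* ...starting at 'A'         */
        r.r_es = FP_SEG(glyph);
        r.r_bp = FP_OFF(glyph);
        intr(0x10, &r);
    }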
The second method is to switch to graphics mode, then draw text to the screen - usually because you want mixed graphics and text on the screen at the same time. See for instance the upper right corner of the game title screen:
The text here is unquestionably the distinctive IBM built-in ROM font, but it shares the screen with pixel-addressed bitmap graphics. You may think there's some great hardware trick behind this, but there isn't - you simply copy the font out of the ROM and draw it to the screen yourself.
Since the ROM is mapped into the address space just like RAM, you can simply read the pixels and plot them in code. The PC BIOS routines that draw text to the screen perform this same technique when you've selected a graphics mode. It's slower and uses more RAM, so you wouldn't want to do this unless you really do need mixed graphics and text.
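Here's roughly what that technique looks like - my sketch, not Soleau's actual code. It asks the BIOS for a pointer to the ROM 8x14 font (INT 10h, AX=1130h with BH=2 returns it in ES:BP, hence Borland's intr()) and plots the glyph's bits as mode 13h pixels. Note that plenty of EGA-era code skipped the BIOS call and just hardcoded the ROM address, which will matter later:

    /* Draw one 8x14 ROM glyph as pixels; assumes mode 13h is already set. */
    #include <dos.h>

    void draw_glyph_8x14(unsigned char ch, int x, int y, unsigned char color)
    {
        struct REGPACK r;
        unsigned char far *font;
        unsigned char far *vram = (unsigned char far *)MK_FP(0xA000, 0);
        int row, bit;

        r.r_ax = 0x1130;               /* INT 10h: get font information */
        r.r_bx = 0x0200;               /* BH=2: the ROM 8x14 font       */
        intr(0x10, &r);                /* table pointer is now in ES:BP */
        font = (unsigned char far *)MK_FP(r.r_es, r.r_bp) + ch * 14;

        for (row = 0; row < 14; row++)        /* one byte per row...   */
            for (bit = 0; bit < 8; bit++)     /* ...MSB is leftmost    */
                if (font[row] & (0x80 >> bit))
                    vram[(y + row) * 320 + (x + bit)] = color;
    }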
At this point you have enough information to surmise what the basic malfunction is - the print routines are hitting the wrong spot in memory. But despite the characters being garbled, you can see chunks of recognizable glyphs being drawn, not only in the big text but in the small text as well:
The 'g' and '*' seem to be valid, if incorrect, glyphs, while the '+' appears to just be shifted vertically from normal. I don't know what the "stretched apostrophe" started life as, but it's also clearly part of a real character. If the print routines were copying random garbage out of memory, like program code, you'd see total gibberish here. Instead, we appear to be hitting a font, just at the wrong offsets.
Why is the big text affected? My guess is that William Soleau, the author of the program, wrote a routine that copies the characters for his full screen banner text out of ROM, integer-scales them up, and then draws them to the screen twice in different colors to produce a faux-3D effect. Many people would have just made this banner in an image editor, saved it in the program folder and drawn it pixel by pixel to the screen, but Soleau had a... unique perspective on software development, and also probably wanted to save some space. Why ship a whole bitmap when the EGA ROM has a bitmap you can adapt?
The cause of this problem is extremely simple and, as I said, is completely by design.
When I discovered this issue, I wasn't running the program in DOSbox - indeed, it doesn't have this problem in any version of DOSbox I tried. I was running it on a real PC, albeit one much newer than what Soleau targeted. Instead of a PC-AT with an EGA card, I was running a Pentium 3 with a Radeon 7500. While modern graphics cards continue to retain support for all the 80s IBM graphics modes, it appeared that I had found a place where they had not bothered to test compatibility, but this isn't actually the case.
The actual story has to do with the Video Electronics Standards Association, a coalition of video hardware manufacturers who, in the mid-90s, came up with a way to unify graphics card APIs. Up until this point, most graphics cards depended in whole or in part on IBM's graphics card specifications - if you wanted software to be able to display an image with your "extended EGA" card in 1986, you had to put the video framebuffer at the same place in memory, support the same register values for mode selection, and so on. If that wasn't good enough, your only option was to make a card that couldn't do much without a special driver to enable its extended features.
In the 90s, after VGA came out, an enormous explosion of "super VGA" cards, which extended IBM's VGA in incompatible ways, prompted the creation of VESA, which in turn produced the VESA BIOS Extensions: a standard set of registers, memory locations, and so on that weren't restrictive in the way IBM's specifications had been. Instead of a graphics card vendor needing to break from the VGA spec the moment they wanted to provide, say, a resolution that VGA didn't have a special identifier number for, they could implement the VBE specification, and any software that implemented it too would be able to generically interface with that card.
VESA was very successful, and all graphics cards now implement it - it's the reason you can get 1024x768 video with no graphics card driver installed. But even your GeForce RTX 2080 still has those ancient VGA, EGA and even CGA modes built in and largely compatible - they're not very hard to support, and computers still often boot in these modes, so there's no way to drop them without causing lots of unnecessary problems.
However, they're only largely compatible. I'm not aware offhand of many breaks in compatibility, but the font issue is one of them. Here's the same quote Better Software used on their FIX8X14 page:
VESA BIOS EXTENSION (VBE) Core Functions Standard Version: 3.0 Date: September 16, 1998:
Removal of Unused VGA Fonts
VESA strongly recommends that removal of the 8x14 VGA font become a standard way of freeing up space for VBE 3.0 implementations. The removal of this font leaves 3.5K bytes of ROM space for new functions, and is probably the least painful way to free up such a large amount of space while preserving as much backwards compatibility as possible. The 8x14 font is normally used for VGA Modes 0, 3 and Mode 10h, which are 350-line or EGA compatible modes. When those are selected the 8x16 font may be chopped and used instead. When chopping a 16 point font to replace the 14 point, there are several characters (ones with descenders) that should be special cased.
Some applications which use the 8x14 font obtain a pointer to the font table through the standard VGA functions and then use this table directly. In such cases, no workaround using the 8x16 font is possible and a TSR with the 8x14 font is unavoidable. Some OEMs may find this situation unacceptable because of the potential for an inexperienced user to encounter "garbage" on the screen if the TSR is not present. However, OEMs may also find eventually that demand for VBE 3.0 services is great enough to justify the inconvenience associated with an 8x14 font TSR. To date, no compatibility problems are known to be caused by the use of such a TSR. VESA will make available a TSR that replaces the 8x14 font, please contact VESA for more information.
That's all there is to it. VESA simply recommended everyone delete this font, and (nearly) everyone did it - a few cards retained compatibility. So why didn't this immediately break a ton of software? Well, VESA's recommendation is to use the 8x16 font when in EGA modes and simply chop a couple of rows out of each glyph (presumably out of the middle) to make it fit, which makes sense and probably worked for nearly everything in practice.
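The chop itself is trivial. Exactly which two rows get dropped is my guess here - the spec only says that characters with descenders need special-casing - but it amounts to something like this:

    /* Derive a 14-row glyph from a 16-row one by dropping two rows;
       here, the top row and the bottom row. */
    void chop_16_to_14(const unsigned char *src16, unsigned char *dst14)
    {
        int row;
        for (row = 0; row < 14; row++)
            dst14[row] = src16[row + 1];   /* skip row 0, ignore row 15 */
    }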
But this presumably only works in text modes (where the graphics card can assert its own control over where the glyph ROM is found) or when the BIOS routines are used to draw text in graphics mode. If you manually try to pull the font out of ROM by accessing the specific place where you know it lives on EGA-compatible cards - whoops, you'll land in the middle of the 8x16 font that was shuffled up when the 8x14 one was removed.
That's why the characters are clearly valid but just shuffled up and down. The ROM font is stored as a contiguous series of one-byte strips, so if you want the first 8x14 character, you take the first 14 bytes, and if you want the tenth character, you skip forward nine characters' worth - 126 bytes - and then take 14. If you do that with the 8x16 ROM font, however, byte 126 lands partway through the eighth character, and that's how we end up with vertically shuffled glyphs.
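You can watch the arithmetic go wrong with a couple of printfs:

    /* Index with a 14-byte stride into a table that actually stores
       16 bytes per glyph, and see where you land. */
    #include <stdio.h>

    int main(void)
    {
        int glyph = 9;              /* the tenth character, zero-based */
        int off = glyph * 14;       /* 126: correct for an 8x14 table  */

        printf("8x14 table: glyph %d starts at byte %d\n", glyph, off);
        printf("8x16 table: byte %d is row %d of glyph %d\n",
               off, off % 16, off / 16);   /* row 14 of glyph 7 */
        return 0;
    }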
So this is completely by design, in the sense that the graphics card manufacturers changed how their cards work internally while not breaking the BIOS interface - the only "contract" they ever had with software developers. It was known and expected that in a few rare cases (which did in fact prove to be rare) a solution could be implemented in software, and it was, and it worked. The only problem is: faced with this, how on earth were you ever supposed to find out what was causing it? I mean, I have the full might and fury of the internet to hand, and it took me several days, plus help from friends.
As VESA suggested, TSRs are available that fix the problem. Better Software provides FIX8X14, which does in fact work on my actual machine. You can see from the short list of programs on that page that the frequency of this actually causing problems was probably pretty low, as predicted.
Why does it work in DOSbox, despite DOSbox implementing VBE? That's an interesting question I can only speculate on, but my guess is simply that DOSbox does not, generally speaking, emulate a specific graphics card. Its emulated graphics hardware is an amalgamation that works with most software, probably hacked to death to make certain programs work. And since its ROM routines presumably don't need to fit within the address space of the emulated machine, its developers had no reason to recover the couple of kilobytes the 8x14 font took up in order to make room for all the graphics routines DOSbox supports.
I did in fact obtain the DOSbox source code in order to explore and experiment with this, and found that, yes, it defines all three fonts in its address space, and that by changing the pointer of the 8x14 font to the location of the 8x16 font, the problem is replicated. You can find code and a compiled binary that replicates this problem here - look in src to find dosbox_8x14.exe.
If this was interesting to you, or if you did something interesting with it, email me: articles@gekk.info
If you like my work, consider tossing me a few bucks. It takes a lot of effort and payment helps me stay motivated.