Daggerfall:Text Record Format
While Daggerfall is a graphical CRPG (specifically a "first-person perspective" CRPG), it makes considerable use of text data for myriad displays and functions. Quest dialogue, conversations with NPCs, and reading books or letters are all implemented via a (mostly) uniform Text Record structure, complete with formatting information such as centering lines or displaying with different fonts. Further enriching the experience, each Text Record can have multiple Text Subrecords. These allow the engine to select one at random when displaying a message, and thus Quests can have multiple "offer" dialogs, NPCs can have multiple "greetings", and so on. It is the goal of this article to document the Daggerfall Text Record format in detail, pointing out where and how these records are used and by what subsystems of the engine.
Where possible, ABNF will be used to describe the format to facilitate hacking and reverse-engineering efforts. To remind the reader, text following a semicolon, ";", is a comment (not malformed wiki), and is present to offer additional description but is not considered part of the element specification.
This article will use a foundational approach by first describing the formatting characters, then building up to describing a Text Subrecord, and finally applying the previous two sections to describe the Text Record itself.
Text Record Format
First, a Text Record is a list of Text Subrecord structures. While most Text Records encountered in the various files contain only a single Text Subrecord, even they conform to this format description. To describe Text Records meaningfully we must therefore first describe Text Subrecords, and since it is the Text Subrecords which contain the actual text and formatting information, we must begin with the formatting characters.
There are a variety of characters which may be used within a Text Record to control how the text should be displayed. These characters control which font to use, whether to justify or center the text, and even whether a new "page" in a book should be started. I make use of the term "printing head position", which comes from my classical training in those halcyon (some would say "fossilized" or "ancient") days of Teletypes (TTY) and console-mode (no GUI). Be that as it may, just think of the "printing head position" as where the caret/cursor would be if you were typing into an edit box or word-processor.
- NewLineOffset := %x00 ; this character instructs the renderer to display the following text at the specified PositionX on a new line
- SameLineOffset := %x01 ; this value instructs the renderer to display the following text at the specified PositionX on the same line as the preceding text
- PullPreceeding := %x02-ff ; this value instructs the renderer to display the entire line, rather than only the following text, at the specified PositionX
- PositionPrefix := %xfb ; this character indicates the following text should be displayed at a specific position on the screen.
- PositionX := %x00-ff ; this indicates in which pixel-column (left-right) the following text should begin, relative to the current printing head position.
- PositionY := %x00-ff ; this seems to indicate in which pixel-row (top-bottom) the following text should begin, relative to the current printing head position. Until the exact mechanism is discovered, Book authors should always set this token to 0x00.
- PositionCode := ( NewLineOffset / SameLineOffset / PullPreceeding ) PositionPrefix PositionX PositionY ; this token is a four-character formatting code
- FontPrefix := %xf9 ; this indicates the following byte specifies which font should be used when rendering the text
- FontScript := %x02 ; this is the large, fancy-looking script with which many book titles are displayed.
- FontNormal := %x04 ; this is the default font used when displaying text, and is included primarily to recover from the FontScript.
- FontCode := FontPrefix ( FontScript / FontNormal )
- JustifyPreceedingLeft := %xfc ; indicates the preceding text should be left-justified
- CenterPreceeding := %xfd ; indicates the preceding text should be centered
- NewLine := %x00 ; indicates the following text should be displayed on a new line.
- EndOfLine := ( JustifyPreceedingLeft / CenterPreceeding ) NewLine ; line-breaks are a two-byte code. A NewLine token on its own is an empty string, and is rendered by displaying nothing (a no-op).
- EndOfPage := %xf6 ; indicates the end of the current page in a book or letter, and the following text should be displayed on a new page.
It should be noted that only the Daggerfall Book Display Subsystem makes any use of the PositionCode entries. To this author's knowledge they are not used at all by the Daggerfall Quest Subsystem, and their use there has not even been tried. Book authors should also be aware of the order in which the various tokens are evaluated. The first character of the string
fc fb 7f 00 is evaluated as a PullPreceeding token, not a NewLine. This is indicative of "greedy" regular expression parsing.
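To make the token rules above concrete, here is a minimal Python sketch of a tokenizer that follows the grammar literally. It is an illustration under stated assumptions, not a description of how the engine itself parses text: the function name `tokenize`, the tuple layout of the tokens, and the "Unknown" fallback are all invented for this example. In particular, the "greedy" rule is implemented on the assumption that any byte followed by PositionPrefix and two further bytes forms a PositionCode.

```python
POSITION_PREFIX = 0xFB
FONT_PREFIX = 0xF9
END_OF_PAGE = 0xF6
JUSTIFY = {0xFC: "left", 0xFD: "center"}  # JustifyPreceedingLeft / CenterPreceeding


def tokenize(data: bytes) -> list:
    """Split raw subrecord bytes into (hypothetical) token tuples."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if i + 3 < len(data) and data[i + 1] == POSITION_PREFIX:
            # Greedy rule (assumed): <any byte> fb X Y is a four-byte PositionCode.
            kind = {0x00: "NewLineOffset", 0x01: "SameLineOffset"}.get(b, "PullPreceeding")
            tokens.append(("PositionCode", kind, data[i + 2], data[i + 3]))
            i += 4
        elif b == FONT_PREFIX and i + 1 < len(data):
            # FontPrefix followed by the font selector byte.
            tokens.append(("FontCode", data[i + 1]))
            i += 2
        elif b in JUSTIFY and i + 1 < len(data) and data[i + 1] == 0x00:
            # Two-byte EndOfLine: justification byte followed by NewLine.
            tokens.append(("EndOfLine", JUSTIFY[b]))
            i += 2
        elif b == END_OF_PAGE:
            tokens.append(("EndOfPage",))
            i += 1
        elif 0x20 <= b <= 0x7F:
            # Printable character: merge runs into a single Text token.
            if tokens and tokens[-1][0] == "Text":
                tokens[-1] = ("Text", tokens[-1][1] + chr(b))
            else:
                tokens.append(("Text", chr(b)))
            i += 1
        else:
            # Anything the grammar above does not cover is kept for inspection.
            tokens.append(("Unknown", b))
            i += 1
    return tokens
```

With this sketch, the four bytes fc fb 7f 00 come back as a single PositionCode token of kind PullPreceeding, which matches the evaluation order described above.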
Now that we have defined how we can format text, we can define text itself. This was important because the above formatting characters will be found in almost every Text Subrecord we will encounter. Letters, Books, and even NPC dialogue will make use of them. In simple terms, a Text Subrecord contains one or more printable characters from the 7-bit ASCII encoding and/or one or more of the above formatting characters. One will encounter some Text Subrecords which appear to be blank lines. All that is necessary is at least something (be it a single printable character or a new-line alone), but on closer inspection most Text Subrecords which appear blank will be found to be an SP (%x20) followed by EndOfLine.
- Character = %x20-7f / PositionCode / FontCode ; these indicate the fundamental units (atoms) of Daggerfall Text
- Text = 1*Character ; a body of Daggerfall Text is at least one Daggerfall Character
- BlankLine = EndOfLine / EndOfPage
- TextLine = Text [ EndOfLine / EndOfPage ]
- TextSubrecord = 1*( TextLine / BlankLine ) ; a subrecord can consist of multiple lines, be they blank or textual.
Book authors should beware that while FontCode tokens can appear at any position within a Text element, the renderer will not cooperate. FontCode tokens are rendered with an implicit carriage-return but no line-feed, which makes Text elements containing embedded FontCode tokens almost unreadable, as characters to the left of the FontCode are over-written; i.e., displayed concurrently as multiple print-head strikes. FontCode tokens should only appear as the first token of a TextLine element.
Finally we arrive at the Text Record, which is a container of Text Subrecords. Here we will see there is a special token, SubrecordSeparator, which divides each of the Text Subrecords composing a Text Record. Every Text Subrecord must be terminated with the SubrecordSeparator, except the last, where the separator is optional. The engine seems capable of handling the presence or absence of a terminating SubrecordSeparator without any trouble.
- SubrecordSeparator = %xff
- EndOfRecord = %xfe
- TextRecord = *( TextSubrecord SubrecordSeparator ) TextSubrecord [SubrecordSeparator] EndOfRecord ; the terminating TextSubrecord can be followed by an optional SubrecordSeparator
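The TextRecord rule above can be sketched in Python as follows. This is a simplified illustration, and the function name `split_text_record` is invented for the example. It assumes that the bytes 0xfe and 0xff never occur inside a subrecord. A PositionCode may legally carry those values in its PositionX or PositionY byte, so a robust reader for book data would need to tokenize first rather than split on raw bytes.

```python
SUBRECORD_SEPARATOR = 0xFF
END_OF_RECORD = 0xFE


def split_text_record(data: bytes, start: int = 0) -> list:
    """Return the raw bytes of each Text Subrecord in the record at `start`.

    Simplifying assumption: 0xfe and 0xff do not appear inside a subrecord.
    """
    # Raises ValueError if no EndOfRecord byte is found.
    end = data.index(END_OF_RECORD, start)
    parts = data[start:end].split(bytes([SUBRECORD_SEPARATOR]))
    # The final SubrecordSeparator is optional; drop the empty tail it leaves.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts
```

Picking one entry from the returned list with `random.choice` would mimic the random selection between subrecords described at the top of this article.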
Text Record Database
TextRecord structures are also stored en masse in a uniform format, which this section describes. Because each TextRecord structure is assigned an ID value and a precise position within the containing file, and the count of TextRecord structures is also known a priori, we will refer to this as a Text Record Database. Numerous Daggerfall files are actually Text Record Databases, such as all the Quest QRC files, the TEXT.RSC file, and the book files.
A Text Record Database is composed of a TextRecordDatabaseHeader structure, followed by a TextRecordHeaderList structure. The offset field of each textRecordHeader element of the textRecordHeaderList identifies the position of the corresponding TextRecord as an offset from the start of the file.
Text Record Database Header
- [Bytes 0-1] unsigned short (UInt16) TextRecordHeaderLength
- Integer Length of header information in the file.
- Since each TextRecordHeader is 0x06 bytes long, textRecordCount = ( ( textRecordHeaderLength ÷ 6 ) - 1 ). There will be textRecordCount TextRecordHeader elements in the textRecordHeaderList and textRecordCount TextRecord entries in the textRecordList.
- Example: A2 00
- This would mean the textRecordHeaderLength is 0x00a2 (the bytes are stored low byte first), or 162 in decimal, and there are ( ( 162 ÷ 6 ) - 1 ) = ( 27 - 1 ) = 26 TextRecords in this file.
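The arithmetic above can be checked with a short Python sketch. The function name `text_record_count` is invented for this example, and the little-endian byte order is inferred from the example bytes A2 00 being read as 162.

```python
import struct


def text_record_count(file_bytes: bytes) -> int:
    """Derive textRecordCount from the UInt16 at bytes 0-1 of the file."""
    # "<H" = little-endian unsigned 16-bit integer (byte order inferred from the example).
    (header_length,) = struct.unpack_from("<H", file_bytes, 0)
    # Formula as given in this article: ( headerLength / 6 ) - 1.
    return header_length // 6 - 1
```

Calling `text_record_count(bytes.fromhex("A200"))` reproduces the worked example and yields 26.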
Text Record Header Element
Following the TextRecordDatabaseHeader is a list of TextRecordHeader elements. Each element is 0x06 bytes long, and there are textRecordCount entries in this list. First we will describe the format for the TextRecordHeader structure.
- [Bytes 0-1] unsigned short (UInt16) TextRecordId
- The unique ID number for this TextRecord.
- Each TextRecord (within this file) is uniquely identified by this textRecordId such that other systems (QBN files, for example) may refer to a specific TextRecord.
- The value 0xFFFF is invalid.
- Valid values range from 0x0000 through 0xfffe.
- [Bytes 2-5] unsigned long (UInt32) Offset
- The offset from the start of the file to where the textRecord described by this header begins.
- TextRecordHeader in ABNF
- TextRecordId = %x0000-fffe
- TextRecordOffset = %x00000000-ffffffff
- TextRecordHeader = TextRecordId TextRecordOffset
- Example: F2 03 54 01 00 00
- This would mean the text data related to TextRecordId 0x03f2 can be found 0x0154 bytes (340 in decimal) from the beginning of the file.
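As an illustration, a single six-byte TextRecordHeader can be decoded in Python with the standard `struct` module. The class and function names below are invented for this sketch, and little-endian byte order is assumed on the strength of the example above.

```python
import struct
from typing import NamedTuple


class TextRecordHeader(NamedTuple):
    record_id: int  # UInt16 TextRecordId (bytes 0-1)
    offset: int     # UInt32 Offset from the start of the file (bytes 2-5)


def parse_text_record_header(buf: bytes, pos: int = 0) -> TextRecordHeader:
    """Decode one 6-byte header located at `pos` within `buf`."""
    # "<HI" = little-endian UInt16 followed by UInt32, with no padding.
    record_id, offset = struct.unpack_from("<HI", buf, pos)
    if record_id == 0xFFFF:
        raise ValueError("0xFFFF is not a valid TextRecordId")
    return TextRecordHeader(record_id, offset)
```

Feeding it the example bytes F2 03 54 01 00 00 gives a record_id of 0x03f2 and an offset of 0x0154.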
Text Record Header List
The number of TextRecordHeader elements is equal to textRecordCount, computed from TextRecordDatabaseHeader's textRecordHeaderLength. Following the TextRecordDatabaseHeader is a list of TextRecordHeader elements, TextRecordHeaderList. In ABNF, this list takes the form:
- TextRecordHeaderList in ABNF
- TextRecordHeaderList = textRecordCount( TextRecordHeader ) %xffff
As seen above, the list is terminated with the literal value (UInt16) 0xFFFF, and contains a number of TextRecordHeader elements equal to textRecordCount. Thus, if there were 10 TextRecordHeader structures in the file, the list would be ( ( 10 × 6 ) + 2 ) = ( 60 + 2 ) = 62 bytes long (0x06 bytes per TextRecordHeader, plus 0x02 bytes of terminator). Since the termination is always 0xffff, this value is an invalid textRecordId.
- Example: E8 03 A4 00 00 00 E9 03 B9 00 00 00 FF FF
- This would indicate there are only two TextRecordHeader elements in the TextRecordHeaderList. The first (textRecordId 0x03e8) starts at offset 0xa4 from the start of the file, and the second (textRecordId 0x03e9) starts at offset 0xb9 from the start of the file.
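The list layout can also be read without knowing textRecordCount in advance, by walking the entries until the 0xFFFF terminator appears. The Python sketch below does exactly that. The function name `parse_header_list` and the (id, offset) tuple layout are invented for this example, and little-endian byte order is assumed as in the examples above.

```python
import struct


def parse_header_list(buf: bytes, pos: int = 0) -> list:
    """Collect (textRecordId, offset) pairs until the 0xFFFF terminator."""
    headers = []
    while True:
        # Peek at the next UInt16; struct.error is raised if the data runs out.
        (record_id,) = struct.unpack_from("<H", buf, pos)
        if record_id == 0xFFFF:
            break  # terminator reached, no further headers
        (offset,) = struct.unpack_from("<I", buf, pos + 2)
        headers.append((record_id, offset))
        pos += 6  # each TextRecordHeader is 0x06 bytes long
    return headers
```

Applied to the example bytes above, this returns two entries: (0x03e8, 0xa4) and (0x03e9, 0xb9). In a real file the list begins after the two-byte TextRecordDatabaseHeader, so `pos` would be 2.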
Areas of Use
This format for textual data is used by several parts of the engine.
- Quest Subsystem
- This system makes extensive use of Text Subrecords to offer variety within individual quests. Via multiple Text Subrecords the player may enjoy multiple log entries or multiple quest-offer dialogues, ensuring a diverse experience and giving the illusion of an infinity of quests. Some formatting characters, such as EndOfPage, may not be suitable for use with NPC conversations or log entries, so authors are advised to avoid using them.
- Book Display Subsystem
- This system relies heavily on the formatting characters to give the books and scrolls a "lifelike" experience. One is required to flip pages, and most display the titles in a "fancy" script.
- TEXT.RSC
- It is within this file that the text for regular (non-quest) NPC dialogue is stored, all of which makes good use of multiple Text Subrecords. Descriptions of the various items and artifacts are also present here, which typically only contain a single Text Subrecord. In theory one could extensively expand this file up to several gigabytes to offer an infinity of diversity since the offsets are all UInt32.