=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file.  It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that were linked
together to produce the program, and the source files that were used to build
it.  It also contains references to other streams that hold more detailed
information about each compiland, such as the CodeView symbol records it
contains and the source and line information for its functions and other
symbols.


.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:


.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning.  Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

  .. code-block:: c++

    enum class DbiStreamVersion : uint32_t {
      VC41 = 930803,
      V50 = 19960307,
      V60 = 19970606,
      V70 = 19990903,
      V110 = 20091201
    };

  Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
  ``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written.  Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

  .. code-block:: c++

    uint16_t MinorVersion : 8;
    uint16_t MajorVersion : 7;
    uint16_t NewVersionFormat : 1;

  For the purposes of LLVM, we assume ``NewVersionFormat`` is always ``true``.
  If it is ``false``, the layout above does not apply and the reader should consult
  the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
  further guidance.

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols.  Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB.  This does not apply to LLVM, which does not use ``mspdb.dll``.

- **SymRecordStream** - The stream containing all CodeView symbol records used
  by the program.  This is used for deduplication, so that many different
  compilands can refer to the same symbols without having to include the full record
  content inside each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

  .. code-block:: c++

    uint16_t WasIncrementallyLinked : 1;
    uint16_t ArePrivateSymbolsStripped : 1;
    uint16_t HasConflictingTypes : 1;
    uint16_t Reserved : 13;

  The only one of these that is not self-explanatory is ``HasConflictingTypes``.
  Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
  If it is passed to ``link.exe``, this field will be set.  Otherwise it will
  not be set.  It is unclear what this flag does, although it seems to have
  subtle implications for the algorithm used to look up type records.

- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration.  Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).  Note
  that these two values are the PE/COFF ``IMAGE_FILE_MACHINE_AMD64`` and
  ``IMAGE_FILE_MACHINE_I386`` constants rather than ``CV_CPU_TYPE_e`` constants.
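
The in-memory layout of C++ bitfields is implementation-defined, so a portable
reader would more likely extract the ``BuildNumber`` and ``Flags`` fields above
with shifts and masks.  The following is a minimal sketch.  It assumes that the
first-declared bitfield occupies the least significant bits, which matches how
MSVC allocates bitfields on x86 targets.  ``ToolchainVersion``, ``DbiFlags``,
``decodeBuildNumber`` and ``decodeDbiFlags`` are hypothetical names, not LLVM APIs.

.. code-block:: c++

  #include <cstdint>

  // Decoded form of BuildNumber (hypothetical helper type).
  struct ToolchainVersion {
    unsigned Major;
    unsigned Minor;
    bool NewVersionFormat;
  };

  // Assumes MinorVersion is in bits 0-7, MajorVersion in bits 8-14 and
  // NewVersionFormat in bit 15.
  constexpr ToolchainVersion decodeBuildNumber(uint16_t BuildNumber) {
    return {static_cast<unsigned>((BuildNumber >> 8) & 0x7F), // bits 8-14
            static_cast<unsigned>(BuildNumber & 0xFF),        // bits 0-7
            ((BuildNumber >> 15) & 1) != 0};                  // bit 15
  }

  // Decoded form of the header's Flags field (hypothetical helper type).
  struct DbiFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  // Bits 0, 1 and 2 in declaration order; the 13 reserved bits are ignored.
  constexpr DbiFlags decodeDbiFlags(uint16_t Flags) {
    return {(Flags & 0x1) != 0, (Flags & 0x2) != 0, (Flags & 0x4) != 0};
  }

With these masks, a ``BuildNumber`` of ``0x8C00`` would decode to version 12.0
with ``NewVersionFormat`` set.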

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
*substreams*.  The following ``7`` fields of the DBI Stream header specify the
number of bytes in the corresponding substream.  Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`.  The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum of
the following ``7`` fields.  The substreams appear in the order in which they
are described below, so the EC substream precedes the optional debug header
substream even though ``ECSubstreamSize`` follows ``OptionalDbgHeaderSize`` in
the header.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
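
The size fields above are enough to locate every substream.  The sketch below
is a minimal illustration, not LLVM's implementation.  It assumes the whole DBI
stream has already been read into a byte vector.  It reads each size field at
the byte offset implied by ``DbiStreamHeader`` (``ModInfoSize`` at ``24``,
``OptionalDbgHeaderSize`` at ``48``, ``ECSubstreamSize`` at ``52``, and so on),
and it lays the substreams out in the order in which the sections below describe
them.  ``DbiSubstreamLayout``, ``read32`` and ``computeDbiLayout`` are hypothetical
names.

.. code-block:: c++

  #include <array>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Offset and size of each substream, indexed in on-disk order:
  // 0 ModInfo, 1 SectionContribution, 2 SectionMap, 3 FileInfo,
  // 4 TypeServer, 5 EC, 6 OptionalDbgHeader.
  struct DbiSubstreamLayout {
    std::array<uint32_t, 7> Offset;
    std::array<uint32_t, 7> Size;
  };

  // Reads a little-endian uint32_t; the caller guarantees the bytes exist.
  static uint32_t read32(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 4 <= D.size() && "read past end of buffer");
    return static_cast<uint32_t>(D[Off]) |
           (static_cast<uint32_t>(D[Off + 1]) << 8) |
           (static_cast<uint32_t>(D[Off + 2]) << 16) |
           (static_cast<uint32_t>(D[Off + 3]) << 24);
  }

  std::optional<DbiSubstreamLayout>
  computeDbiLayout(const std::vector<uint8_t> &Stream) {
    constexpr size_t HeaderSize = 64;
    if (Stream.size() < HeaderSize)
      return std::nullopt;

    // Header offsets of the seven size fields, in on-disk substream order.
    // ECSubstreamSize (52) comes before OptionalDbgHeaderSize (48).
    constexpr size_t SizeFieldOffsets[7] = {24, 28, 32, 36, 40, 52, 48};

    DbiSubstreamLayout Layout{};
    uint64_t Cursor = HeaderSize;
    for (size_t I = 0; I < 7; ++I) {
      uint32_t Size = read32(Stream, SizeFieldOffsets[I]);
      if (Size > 0x7FFFFFFF) // The fields are int32_t; negative means corrupt.
        return std::nullopt;
      Layout.Offset[I] = static_cast<uint32_t>(Cursor);
      Layout.Size[I] = Size;
      Cursor += Size;
    }

    // The stream length must be 64 plus the sum of the seven sizes.
    if (Cursor != Stream.size())
      return std::nullopt;
    return Layout;
  }

A reader built this way would reject any stream for which ``computeDbiLayout``
returns ``std::nullopt``, because the sizes do not add up to the stream length.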

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`.  The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program.  Each
record is a ``ModInfo`` structure, which embeds a ``SectionContribEntry`` with
the following layout:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these fields are self-explanatory, the ``Characteristics`` field
warrants some elaboration.  It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure.

Each ``ModInfo`` record has the following layout:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];   // Null-terminated.
    char ObjFileName[];  // Null-terminated.
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

  .. code-block:: c++

    uint16_t Dirty : 1;  // true if this ModInfo has been written since reading the PDB.
    uint16_t EC : 1;     // true if EC information is present for this module. It is unknown what EC actually is.
    uint16_t Unused : 6;
    uint16_t TSM : 8;    // Type Server Index for this module.  It is unknown what this is used for, but it is not used by LLVM.

- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module.  This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information.  At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module.  All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information.  This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name.  This is usually either a full path to an
  object file (either passed directly to ``link.exe`` or taken from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name.  For a module that is passed directly
  to ``link.exe``, this is the same as **ModuleName**.  For a module that comes
  from an archive, this is usually the full path to the archive.
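
Walking the module info substream means reading the fixed ``64``-byte prefix of
each ``ModInfo`` (every field before ``ModuleName``) and then the two
null-terminated strings.  A minimal sketch follows.  It assumes the substream
bytes have already been sliced out of the DBI stream.  It also assumes that each
record is padded to a ``4``-byte boundary, which is how LLVM's reader advances
between records but is not stated above.  Only a handful of fields are
extracted.  ``ModuleSummary``, ``read16``, ``read32``, ``readCString`` and
``parseModInfo`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // A few fields of one ModInfo record (hypothetical summary type).
  struct ModuleSummary {
    uint8_t TypeServerIndex; // TSM: the high byte of Flags.
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    std::string ModuleName;
    std::string ObjFileName;
  };

  static uint16_t read16(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 2 <= D.size() && "read past end of buffer");
    return static_cast<uint16_t>(D[Off] | (D[Off + 1] << 8));
  }

  static uint32_t read32(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 4 <= D.size() && "read past end of buffer");
    return static_cast<uint32_t>(read16(D, Off)) |
           (static_cast<uint32_t>(read16(D, Off + 2)) << 16);
  }

  // Reads a null-terminated string at Off and advances Off past the terminator.
  static std::optional<std::string> readCString(const std::vector<uint8_t> &D,
                                                size_t &Off) {
    size_t End = Off;
    while (End < D.size() && D[End] != 0)
      ++End;
    if (End >= D.size())
      return std::nullopt; // Missing terminator.
    std::string S(D.begin() + Off, D.begin() + End);
    Off = End + 1;
    return S;
  }

  std::optional<std::vector<ModuleSummary>>
  parseModInfo(const std::vector<uint8_t> &Sub) {
    constexpr size_t FixedSize = 64; // Bytes preceding ModuleName.
    std::vector<ModuleSummary> Modules;
    size_t Off = 0;
    while (Off < Sub.size()) {
      if (Sub.size() - Off < FixedSize)
        return std::nullopt;
      ModuleSummary M;
      M.TypeServerIndex = static_cast<uint8_t>(read16(Sub, Off + 32) >> 8);
      M.ModuleSymStream = read16(Sub, Off + 34);
      M.SymByteSize = read32(Sub, Off + 36);
      M.C13ByteSize = read32(Sub, Off + 44);
      M.SourceFileCount = read16(Sub, Off + 48);

      size_t StrOff = Off + FixedSize;
      auto Name = readCString(Sub, StrOff);
      if (!Name)
        return std::nullopt;
      auto Obj = readCString(Sub, StrOff);
      if (!Obj)
        return std::nullopt;
      M.ModuleName = std::move(*Name);
      M.ObjFileName = std::move(*Obj);
      Modules.push_back(std::move(M));

      // Assumption: each record is padded to a 4-byte boundary.
      Off = (StrOff + 3) & ~static_cast<size_t>(3);
    }
    return Modules;
  }

The alignment is computed relative to the start of the substream.  This is
equivalent to aligning within the DBI stream, because the substream begins right
after the ``64``-byte header.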

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes.  This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far.  Following
this ``4`` byte field is an array of fixed-length structures.  If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures (``28`` bytes
each).  If the version is ``V2``, it is an array of ``SectionContribEntry2``
structures, defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field (``ISectCoff``) is not well understood.
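
Both entry types have a fixed size: ``28`` bytes for ``SectionContribEntry``
and ``32`` for ``SectionContribEntry2``.  As a result, the version word alone
determines how to step through the array.  The sketch below assumes the
substream bytes are already in memory.  It rejects unknown versions and reads
only the leading ``SectionContribEntry`` fields of each entry.
``SectionContrib``, ``read16``, ``read32`` and ``parseSectionContribs`` are
hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Leading fields of a SectionContribEntry (hypothetical summary type).
  struct SectionContrib {
    uint16_t Section;
    uint32_t Offset; // Stored as int32_t on disk.
    uint32_t Size;   // Stored as int32_t on disk.
    uint32_t Characteristics;
    uint16_t ModuleIndex;
  };

  constexpr uint32_t kVer60 = 0xeffe0000u + 19970605u;
  constexpr uint32_t kV2 = 0xeffe0000u + 20140516u;

  static uint16_t read16(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 2 <= D.size() && "read past end of buffer");
    return static_cast<uint16_t>(D[Off] | (D[Off + 1] << 8));
  }

  static uint32_t read32(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 4 <= D.size() && "read past end of buffer");
    return static_cast<uint32_t>(read16(D, Off)) |
           (static_cast<uint32_t>(read16(D, Off + 2)) << 16);
  }

  std::optional<std::vector<SectionContrib>>
  parseSectionContribs(const std::vector<uint8_t> &Sub) {
    if (Sub.size() < 4)
      return std::nullopt;
    size_t EntrySize;
    switch (read32(Sub, 0)) {
    case kVer60: EntrySize = 28; break; // SectionContribEntry
    case kV2: EntrySize = 32; break;    // SectionContribEntry2 adds ISectCoff
    default: return std::nullopt;       // Unknown version.
    }
    if ((Sub.size() - 4) % EntrySize != 0)
      return std::nullopt;

    std::vector<SectionContrib> Contribs;
    for (size_t Off = 4; Off < Sub.size(); Off += EntrySize) {
      SectionContrib C;
      C.Section = read16(Sub, Off);
      C.Offset = read32(Sub, Off + 4);
      C.Size = read32(Sub, Off + 8);
      C.Characteristics = read32(Sub, Off + 12);
      C.ModuleIndex = read16(Sub, Off + 16);
      Contribs.push_back(C);
    }
    return Contribs;
  }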


.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes.  This substream begins with a ``4``
byte header followed by an array of fixed-length (``20`` byte) records.  The header
and records have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.
                            // If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so they will not be discussed further.
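
A minimal sketch of reading this substream follows.  It assumes the substream
bytes are already in memory and that ``Count`` gives the number of
``SectionMapEntry`` records that follow the header.  ``LogCount`` is skipped.
The flag constants mirror ``SectionMapEntryFlags``.  ``SectionMapRecord``,
``read16``, ``read32`` and ``parseSectionMap`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Bit values from SectionMapEntryFlags.
  constexpr uint16_t kRead = 1 << 0, kWrite = 1 << 1, kExecute = 1 << 2,
                     kAddressIs32Bit = 1 << 3, kIsSelector = 1 << 8,
                     kIsAbsoluteAddress = 1 << 9, kIsGroup = 1 << 10;

  // A subset of SectionMapEntry (hypothetical summary type).
  struct SectionMapRecord {
    uint16_t Flags;
    uint16_t Frame;
    uint32_t Offset;
    uint32_t SectionLength;
    bool has(uint16_t Bit) const { return (Flags & Bit) != 0; }
  };

  static uint16_t read16(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 2 <= D.size() && "read past end of buffer");
    return static_cast<uint16_t>(D[Off] | (D[Off + 1] << 8));
  }

  static uint32_t read32(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 4 <= D.size() && "read past end of buffer");
    return static_cast<uint32_t>(read16(D, Off)) |
           (static_cast<uint32_t>(read16(D, Off + 2)) << 16);
  }

  std::optional<std::vector<SectionMapRecord>>
  parseSectionMap(const std::vector<uint8_t> &Sub) {
    constexpr size_t HeaderSize = 4, EntrySize = 20;
    if (Sub.size() < HeaderSize)
      return std::nullopt;
    size_t Count = read16(Sub, 0); // LogCount (offset 2) is not needed here.
    if (Sub.size() < HeaderSize + Count * EntrySize)
      return std::nullopt;

    std::vector<SectionMapRecord> Records;
    for (size_t I = 0; I < Count; ++I) {
      size_t Off = HeaderSize + I * EntrySize;
      SectionMapRecord R;
      R.Flags = read16(Sub, Off);
      R.Frame = read16(Sub, Off + 6);
      R.Offset = read32(Sub, Off + 12);
      R.SectionLength = read32(Sub, Off + 16);
      Records.push_back(R);
    }
    return Records;
  }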

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes.  This substream defines the mapping
from module to the source files that contribute to that module.  Multiple modules
can use the same source file (for example, a header file).  For that reason, this
substream uses a string table to store each unique file name only once, and each
module refers to file names by offset into that table instead of embedding the
strings directly.  The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;                  // Ignored; see below.

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles]; // Real count = sum of ModFileCounts.
    char NamesBuffer[];                       // Null-terminated file names.
  };

**NumModules** - The number of modules for which source file information is
contained within this substream.  Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information.  However, this field is
only ``16`` bits wide, so it cannot represent more than 64K source files.  In
early versions of the file format, this seems to have been an actual limit.  To
support more source files, this field is simply ignored, and the count is instead
computed dynamically by summing the values of the ``ModFileCounts`` array
(discussed below).  In short, this value should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
total across all modules may be greater than 64K.  The real number of source
files is thus computed by summing this array.  Note that summing this array
does not give the number of *unique* source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null-terminated string.

**NamesBuffer** - An array of null-terminated strings containing the actual source
file names.
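
The sketch below reads this substream as described above.  It ignores
``NumSourceFiles``, sums ``ModFileCounts`` to obtain the real count, and resolves
each entry of ``FileNameOffsets`` against ``NamesBuffer``.  It also makes an
additional assumption that is not stated above: each module's entries in
``FileNameOffsets`` are contiguous and stored in module order.  Under that
assumption, module ``i``'s files start at the sum of the counts of modules ``0``
through ``i - 1``, which is how LLVM's reader interprets the array.  ``read16``,
``read32`` and ``parseFileInfo`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  static uint16_t read16(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 2 <= D.size() && "read past end of buffer");
    return static_cast<uint16_t>(D[Off] | (D[Off + 1] << 8));
  }

  static uint32_t read32(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 4 <= D.size() && "read past end of buffer");
    return static_cast<uint32_t>(read16(D, Off)) |
           (static_cast<uint32_t>(read16(D, Off + 2)) << 16);
  }

  // Returns, for each module, the names of the source files contributing to it.
  std::optional<std::vector<std::vector<std::string>>>
  parseFileInfo(const std::vector<uint8_t> &Sub) {
    if (Sub.size() < 4)
      return std::nullopt;
    size_t NumModules = read16(Sub, 0); // NumSourceFiles (offset 2) is ignored.
    size_t CountsOff = 4 + 2 * NumModules;          // Skip ModIndices.
    size_t OffsetsOff = CountsOff + 2 * NumModules; // Start of FileNameOffsets.
    if (Sub.size() < OffsetsOff)
      return std::nullopt;

    // The real number of file entries is the sum of ModFileCounts.
    size_t NumSourceFiles = 0;
    for (size_t M = 0; M < NumModules; ++M)
      NumSourceFiles += read16(Sub, CountsOff + 2 * M);

    size_t NamesOff = OffsetsOff + 4 * NumSourceFiles; // Start of NamesBuffer.
    if (Sub.size() < NamesOff)
      return std::nullopt;

    std::vector<std::vector<std::string>> Files(NumModules);
    size_t FileIndex = 0; // Assumes each module's entries are contiguous.
    for (size_t M = 0; M < NumModules; ++M) {
      size_t Count = read16(Sub, CountsOff + 2 * M);
      for (size_t J = 0; J < Count; ++J, ++FileIndex) {
        size_t Start = NamesOff + read32(Sub, OffsetsOff + 4 * FileIndex);
        size_t End = Start;
        while (End < Sub.size() && Sub[End] != 0)
          ++End;
        if (End >= Sub.size())
          return std::nullopt; // Offset out of range or missing terminator.
        Files[M].emplace_back(Sub.begin() + Start, Sub.begin() + End);
      }
    }
    return Files;
  }

Because two modules can point at the same offset, a shared header such as
``common.h`` is stored once in ``NamesBuffer`` but appears in the result once
for each module that uses it.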

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``.  This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes.  Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes.  This substream is an array of
stream indices (i.e. ``uint16_t``'s), referred to below as ``DbgStreamArray``.
Each entry identifies a stream in the larger MSF file which contains some
additional debug information.  Each position of this array has a special
meaning, allowing one to determine what kind of debug information is at the
referenced stream.  ``11`` indices are currently understood, although there may
be more.  The layout of each stream generally corresponds exactly to a particular
type of debug data directory from the PE/COFF file.  The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``.  The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``.  This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``.  A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``.  The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``.  Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``.  A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``.  This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``.  The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``.  The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.  It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``.  Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
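
Looking up one of these streams amounts to indexing a ``uint16_t`` array.  A
minimal sketch follows, in which ``DbgHeaderSlot`` simply names the positions
listed above.  The sketch assumes (as LLVM does) that the value ``0xFFFF``
marks a slot with no stream; the description above does not state this.
``DbgHeaderSlot``, ``kNoStream``, ``read16`` and ``getDbgStream`` are
hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Positions in DbgStreamArray, as listed above.
  enum class DbgHeaderSlot : size_t {
    FPO = 0, Exception = 1, Fixup = 2, OmapToSrc = 3, OmapFromSrc = 4,
    SectionHeaders = 5, TokenRidMap = 6, Xdata = 7, Pdata = 8, NewFPO = 9,
    OriginalSectionHeaders = 10,
  };

  // Assumption: 0xFFFF marks an unused slot.
  constexpr uint16_t kNoStream = 0xFFFF;

  static uint16_t read16(const std::vector<uint8_t> &D, size_t Off) {
    assert(Off + 2 <= D.size() && "read past end of buffer");
    return static_cast<uint16_t>(D[Off] | (D[Off + 1] << 8));
  }

  // Returns the MSF stream index stored in Slot, or std::nullopt if the array
  // is too short to contain that slot or the slot is unused.
  std::optional<uint16_t> getDbgStream(const std::vector<uint8_t> &Sub,
                                       DbgHeaderSlot Slot) {
    size_t Off = 2 * static_cast<size_t>(Slot);
    if (Off + 2 > Sub.size())
      return std::nullopt;
    uint16_t Stream = read16(Sub, Off);
    if (Stream == kNoStream)
      return std::nullopt;
    return Stream;
  }

For example, ``getDbgStream(Sub, DbgHeaderSlot::SectionHeaders)`` would yield
the index of the stream holding the section header dump described above.  A
short array is treated the same way as an unused slot, since the number of
entries is simply ``Header->OptionalDbgHeaderSize / 2``.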