TRB Protocol

WARNING: Work in progress

Introduction

The widespread use of encryption means that network packet content is often opaque. Systems, subsystems, equipment and applications generate a lot of useful log data that can be used to provide the desired visibility, and generally enrich packet data.

attachment:trb_screenshot.png

Although there are many data analytics tools available, some are expensive and all require learning yet another tool. Wireshark has powerful features, such as filter and search, that would be very useful in the analysis of log data, particularly for engineers who are already familiar with Wireshark.

This page proposes Wireshark support for the analysis of text-based log data carried in pcapng files.

Objective

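The objective of the project is to add Wireshark support for the display, filtering, etc. of text log data (machine data). The data is presented to Wireshark in a pcapng file that contains two new block types: the Text Description Block (TDB), which defines the type and meaning of each field in a record format, and the Text Record Block (TRB), which carries the field values of a single log record.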

There is no proprietary content here, and the block formats are documented so that any suitable tool can be used to convert log data into pcapng format. It may be possible to add support to the Wiretap library to read log files directly.

Test pcapng Generation

The Babel function of TribeLab Workbench has been extended to convert log file data into TRBs and TDBs. Babel produces the pcapng file like this:

    log_file -----------------------------------> TRBs
                           ^
                           |
    apache-common.xml -----+--------------------> TDB

An XML file describes the format of the log file. The XML is used to generate the TDB, and some elements of it are used to help parse the log records to form TRBs.

Of course, any tool could be built to generate TRB files; something based on Logstash is an obvious choice here.

Text Description Block (TDB)

The TDB defines the type and meaning of each field using a Field Descriptor. The Field Descriptors can themselves be encoded in a number of field formats. The TRB plugin only supports one format, TDB_FD_FORMAT_WS, which is a native Wireshark format. The pcapng reader (e.g. Wireshark) should not generate an event list entry for this block.

TDB Block Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------------------------------------------------------+
 0 |                    Block Type = 0x80000010                    |
   +---------------------------------------------------------------+
 4 |                      Block Total Length                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 |            Version            |            Format             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 |          Scheme Index         |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 |                             GUID1                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
24 |                             GUID2                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32 |                             GUID3                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
40 |                             GUID4                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
44 |                           FD Length                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
48 /                                                               /
   /                       Field Descriptors                       /
   /              variable length, padded to 32 bits               /
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /                                                               /
   /                      Options (variable)                       /
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Block Total Length                       |
   +---------------------------------------------------------------+

| Name | Data Type | Description | Example |
| Version | UINT16 | The protocol version. This document describes TRB v3. | 3 |
| Format | UINT16 | The format of the Field Descriptors | 101 - TDB_FD_FORMAT_WS |
| Scheme Index | UINT16 | Used to relate TRBs with a particular record format | 0 |
| Reserved | UINT16 | Not used - must be set to 0 | 0 |
| GUID1-4 | UINT32 | Not used - must be set to 0 | 0 |
| FD Length | UINT32 | The length of the Field Descriptor data | 420 |
| Field Descriptors | Variable | Descriptors that describe the data type for each field - See ??? | |
| Options | Variable | TDB options - See below | |
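
As a rough illustration of the fixed part of this layout, the following C sketch reads the first 16 bytes of a TDB (Block Type through Reserved). The struct and function names are hypothetical and not part of any existing dissector. It assumes the block was written little-endian; a real reader would take the byte order from the pcapng Section Header Block. It deliberately stops before the GUID fields.

```c
#include <stddef.h>
#include <stdint.h>

#define TDB_BLOCK_TYPE 0x80000010u  /* Block Type from the diagram above */

/* Hypothetical holder for the fixed fields at offsets 0-15. */
struct tdb_fixed_header {
    uint32_t block_type;
    uint32_t block_total_length;
    uint16_t version;
    uint16_t format;
    uint16_t scheme_index;
    uint16_t reserved;
};

/* Assumption: the section is little-endian. */
static uint16_t tdb_rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t tdb_rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or is not a TDB. */
static int tdb_parse_fixed_header(const uint8_t *buf, size_t len,
                                  struct tdb_fixed_header *out)
{
    if (len < 16)
        return -1;
    out->block_type = tdb_rd32le(buf);
    if (out->block_type != TDB_BLOCK_TYPE)
        return -1;
    out->block_total_length = tdb_rd32le(buf + 4);
    out->version = tdb_rd16le(buf + 8);
    out->format = tdb_rd16le(buf + 10);
    out->scheme_index = tdb_rd16le(buf + 12);
    out->reserved = tdb_rd16le(buf + 14);
    return 0;
}
```

A caller would typically go on to check that version is 3 and format is 101 (TDB_FD_FORMAT_WS) before interpreting the Field Descriptors.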

Wireshark Native Field Descriptors

Wireshark Native Field Descriptors simplify Wireshark decoding of the TRB protocol. Wireshark's interpretation and rendering of each protocol field value is controlled by a header field (hf). A header field has a number of attributes:

| type | UINT32 | The type of value this field holds. See Appendix A for values. |
| display | UINT32 | The display field has a couple of overloaded uses. See Appendix B for values. |
| bitmask | UINT64 | If the field is a bitfield, then the bitmask is the mask which will leave only the bits needed to make the field when ANDed with a value. The proto_tree routines will calculate 'bitshift' automatically from 'bitmask', by finding the rightmost set bit in the bitmask. This shift is applied before applying string mapping functions or filtering. |
| name | String | A string representing the name of the field. This is the name that will appear in the graphical protocol tree. It must be a non-empty string. |
| abbrev_ending | String | This string is concatenated with "trb.group_name." to form a complete Wireshark abbreviation. The group_name value is carried in a Group Start field descriptor (see Groups below). Example: "trb" protocol plus "httpd" group name plus field name "host" will be rendered in Wireshark as "trb.httpd.host". |
| strings | Compound | Some integer fields, of type FT_UINT*, need labels to represent the true value of a field. You could think of those fields as having an enumerated data type, rather than an integral data type. |
| blurb | String | This is a string giving a proper description of the field. It should be at least one grammatically complete sentence, or NULL in which case the name field is used. |

A TDB contains these values serialised like this:

++-----------------------------------------------------++-----------------------------------------------------++----
||type|display|bitmask|name|abbrev_ending|strings|blurb||type|display|bitmask|name|abbrev_ending|strings|blurb|| etc
++-----------------------------------------------------++-----------------------------------------------------++----

The treatment of the abbrev field was tricky. The concern is that if two files are merged, they may contain abbrev values that clash, i.e. the same value with a different meaning. Because Wireshark forms the abbrev from the protocol name (trb), a group name value and abbrev_ending, it has an opportunity to adjust the group name value and so avoid the clash. For example, imagine that we need to merge two logs that both have records with a first level group name value of "websphere" and a "host" field; in one log this means the client IP and in the other it means the WebSphere instance. Wireshark could suffix the first level group name of the second log to give websphere_2, thereby avoiding the clash. Ultimately, the use of GUIDs rather than group names is preferable, but these would have to be administered, so this is not a simple matter.
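
The abbrev construction described above could be sketched in C as follows. The function names are hypothetical, only a single level of group name is handled, and the "_N" suffix format simply follows the websphere_2 example; how nested group names would be joined is not covered here.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "trb.<group_name>.<abbrev_ending>" into out.
 * Returns 0 on success, -1 if out is too small. */
static int trb_build_abbrev(char *out, size_t out_size,
                            const char *group_name, const char *abbrev_ending)
{
    int n = snprintf(out, out_size, "trb.%s.%s", group_name, abbrev_ending);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Builds a clash-free group name such as "websphere_2" for the second
 * merged log. Returns 0 on success, -1 if out is too small. */
static int trb_suffix_group_name(char *out, size_t out_size,
                                 const char *group_name, unsigned instance)
{
    int n = snprintf(out, out_size, "%s_%u", group_name, instance);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With these helpers, the group name "httpd" and the abbrev_ending "host" give "trb.httpd.host", and suffixing "websphere" with instance 2 before building the abbrev gives "trb.websphere_2.host".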

type

The encoded integer values for the field types are not the same as the integer values used within Wireshark. This is because the Wireshark types are generated via an enumerated list. A change to the list could change the enumerated values. If we used these values within the TDB, we would have compatibility problems. Wireshark field type values start at 0. The TDB field type values start at 1001.

The types map to native Wireshark field types with three important exceptions:

Field Descriptor Encoding

Field Descriptor values are encoded like this:

| value_type | UINT16 | See value types below |
| value_length | UINT16 | The length of value including zero terminator for a string but not padding |
| value | Various | An element of the Field Descriptor e.g. blurb |

Note that the value_type is the data type and is not related to the variable identity (so not name, abbrev_ending, blurb, etc.). The identity of each value is defined purely by its position.

The value_type values are:

| 1 | UINT8 |
| 2 | UINT16 |
| 4 | UINT32 |
| 8 | UINT64 |
| 10 | Zero-terminated string |
| 12 | Compound |
| 14 | Group Start |
| 16 | Group End |

All FD values are padded to a 4-byte boundary.
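
To make the encoding concrete, here is a minimal C sketch that writes one zero-terminated string as a Field Descriptor value: a UINT16 value_type, a UINT16 value_length, the value itself, and padding up to a 4-byte boundary. The function names are hypothetical. Little-endian byte order and zero-filled padding are assumptions, since neither is stated here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FD_VALUE_TYPE_STRINGZ 10  /* "Zero-terminated string" in the table above */

/* Rounds n up to the next multiple of 4. */
static size_t fd_pad4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Encodes s as one Field Descriptor value.
 * Assumptions: little-endian byte order, padding bytes set to zero.
 * Returns the number of bytes written, or 0 if out is too small or the
 * string does not fit in a UINT16 length. */
static size_t fd_encode_stringz(uint8_t *out, size_t out_size, const char *s)
{
    size_t value_length = strlen(s) + 1;       /* includes the terminator */
    size_t total = 4 + fd_pad4(value_length);  /* header plus padded value */

    if (value_length > 0xFFFFu || total > out_size)
        return 0;

    out[0] = (uint8_t)(FD_VALUE_TYPE_STRINGZ & 0xFF);
    out[1] = (uint8_t)(FD_VALUE_TYPE_STRINGZ >> 8);
    out[2] = (uint8_t)(value_length & 0xFF);
    out[3] = (uint8_t)(value_length >> 8);
    memset(out + 4, 0, total - 4);             /* zero the value area and padding */
    memcpy(out + 4, s, value_length);
    return total;
}
```

For example, encoding "host" gives a value_length of 5 (four characters plus the terminator) and 12 bytes in total: 4 bytes of header and the value padded from 5 to 8 bytes.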

The Field Descriptors must be defined in the same order as the fields appear in a row of log data. This allows for rapid matching of TRB fields with the correct Field Descriptor by indexing into an array of Field Descriptors. This also means that if a log data row has a missing value, it must be represented in the TRB. Two exceptions to this rule are Group Start and Group End, which bracket a related group of fields - see Groups below.

Compound Type

The strings value is a compound data type and presents three challenges:

There are seven variants of string value:

For full details see doc/README.dissector in the Wireshark code tree.

ToDo - The first release of the TRB dissector doesn't support strings values. Define the encoding and add support for strings.

Groups

Wireshark (or any protocol analyzer) rarely defines a packet decode as a flat list of field values. Related values are grouped so that they can be collapsed and expanded as necessary. Take a TCP/IP packet as an example. The decode groups the Ethernet fields together, the IP fields, and so on. The Ethernet destination address is further grouped into the Address, LG bit and IG bit.

The availability of this feature for text logs is attractive, and the program generating the TRBs needs to be able to define the group structure.

The rules of the use of groups are:

As a group is a construct that relates to the presentation of a set of fields, a group definition resides in the TDB only; there is no corresponding field in the log and so there is no related TRB field value.

In this way, Group Start should be thought of as a pseudo field descriptor; there is no corresponding field in the log.

Each field of a TRB maps to an entry in the TDB, for example:

| TDB Field Descriptor | fd_index | TRB Data | field_index |
| TS_FT_GRP_START - trb.iis | 0 | - | - |
| TS_FT_STRINGZ - trb.iis.date | 1 | 2018-06-06 | 0 |
| TS_FT_STRINGZ - trb.iis.time | 2 | 10:52:28 | 1 |
| TS_FT_STRINGZ - trb.iis.s-sitename | 3 | W3SVC1 | 2 |
| . | . | . | . |
| . | . | . | . |
| TS_FT_UINT32 - trb.iis.time-taken | 11 | 31 | 10 |
| TS_FT_GRP_END - trb.iis | 12 | - | - |

It's expected that a pcapng reader uses a field index (field_index) to process each field of a TRB and a field descriptor index (fd_index) to access the correct TDB entry. When the reader encounters the first TRB field it would go to fd_index 0 for descriptor information, and when it finds that the descriptor at this point is a Group Start, it would:

Wireshark represents a group by creating a subtree within the protocol tree. Therefore, the Wireshark TRB dissector should create a subtree when it sees a Group Start.

Processing a Group End should be the same.
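
The relationship between fd_index and field_index can be sketched in C as below. This is only an illustration under the assumption that a reader steps over Group Start and Group End descriptors without consuming a TRB field, as the mapping table in this section suggests. The function name and the array-based representation of the TDB are hypothetical.

```c
#include <stddef.h>

#define TS_FT_GRP_START 2046  /* values from Appendix A */
#define TS_FT_GRP_END   2047

/* For each TRB field (field_index), records the index of its descriptor in
 * the TDB (fd_index), stepping over group pseudo descriptors.
 * fd_types holds the type of every descriptor in TDB order.
 * Returns the number of real fields found; only the first map_size
 * entries are stored in field_to_fd. */
static size_t trb_map_fields(const unsigned *fd_types, size_t fd_count,
                             size_t *field_to_fd, size_t map_size)
{
    size_t field_index = 0;
    size_t fd_index;

    for (fd_index = 0; fd_index < fd_count; fd_index++) {
        if (fd_types[fd_index] == TS_FT_GRP_START ||
            fd_types[fd_index] == TS_FT_GRP_END)
            continue;  /* a group marker has no TRB field */
        if (field_index < map_size)
            field_to_fd[field_index] = fd_index;
        field_index++;
    }
    return field_index;
}
```

For a TDB that starts with a Group Start descriptor, the first TRB field (field_index 0) maps to fd_index 1, matching the trb.iis.date row above. A dissector would open or close a subtree at the points where this sketch simply continues.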

TDB Options

| Name | Code | Length | Multiple Allowed |
| opt_owner | 3 | Variable | No |
| opt_nativeformat | 4 | Variable | No |
| opt_missingvalue | 6 | 1 | No |
| opt_infocolumn | 7 | Variable | No |
| opt_summary | 8 | Variable | No |
| opt_delimiter | 9 | Variable | No |

opt_owner:

Details of the owner of this format description. It's envisaged that this would be presented as information only and would not have a direct effect on rendering the data.
Example: 'Paul Offord'

opt_nativeformat:

Native format of the log file. It's envisaged that this would be presented as information only and would not have a direct effect on rendering the data.

Example: LogFormat "%h %l %u %t \"%r\" %>s %b" common

opt_missingvalue:

A character that MAY be used to represent a missing string value.
Example: -

opt_infocolumn:

Defines the information in the Info column of the Wireshark Packet List. Can be constructed from a mixture of fixed text and field values. Field values are specified by the fully qualified abbrev prefixed with a percentage symbol, e.g. %trb.iis.cs-method
Example: 'HTTPD: %trb.httpd.request' would produce something like 'HTTPD: GET / HTTP/1.1'
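
One possible way to expand an opt_infocolumn template is sketched below in C. The function names, the callback type and the set of characters accepted in an abbrev are all assumptions made for illustration. trb_demo_lookup is a stand-in for whatever a real dissector would use to fetch a field's text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns the text of the named field, or NULL. */
typedef const char *(*trb_field_lookup)(const char *abbrev, void *ctx);

/* Characters accepted in an abbrev; an assumption based on the examples. */
static int trb_is_abbrev_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
}

/* Stand-in lookup that knows a single field, for demonstration only. */
static const char *trb_demo_lookup(const char *abbrev, void *ctx)
{
    (void)ctx;
    if (strcmp(abbrev, "trb.httpd.request") == 0)
        return "GET / HTTP/1.1";
    return NULL;
}

/* Expands "%abbrev" references in tmpl into out. Unknown fields expand to
 * nothing. Returns 0 on success, -1 if out is too small. */
static int trb_expand_info(const char *tmpl, trb_field_lookup lookup, void *ctx,
                           char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    while (*tmpl != '\0') {
        if (*tmpl == '%' && trb_is_abbrev_char(tmpl[1])) {
            char abbrev[128];
            size_t n = 0;
            const char *value;
            size_t vlen;

            tmpl++;  /* skip '%' */
            while (trb_is_abbrev_char(*tmpl)) {
                if (n + 1 < sizeof abbrev)
                    abbrev[n++] = *tmpl;  /* overlong names are truncated */
                tmpl++;
            }
            abbrev[n] = '\0';
            value = lookup(abbrev, ctx);
            vlen = value ? strlen(value) : 0;
            if (used + vlen >= out_size)
                return -1;
            if (vlen > 0)
                memcpy(out + used, value, vlen);
            used += vlen;
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *tmpl++;  /* fixed text is copied as-is */
        }
    }
    out[used] = '\0';
    return 0;
}
```

Calling trb_expand_info("HTTPD: %trb.httpd.request", trb_demo_lookup, NULL, buf, sizeof buf) fills buf with "HTTPD: GET / HTTP/1.1". One limitation of this sketch is that a full stop directly after a field reference is treated as part of the abbrev.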

opt_delimiter:

One or more characters that mark the boundary between fields, i.e. the variable delimiter. It's envisaged that this would be presented as information only and would not have a direct effect on rendering the data.
Example: " " (space character for a Space Separated Variable log record)

Field Descriptors as Options

The structure of the Field Descriptor is based on pcapng Options. This allows the possibility of simply renaming these fields as options, if it is appropriate to do so.

Text Record Block (TRB)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------------------------------------------------------+
 0 |                    Block Type = 0x80000011                    |
   +---------------------------------------------------------------+
 4 |                      Block Total Length                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 |            Version            |            Format             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 |          Scheme Index         |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 |                        Timestamp (High)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20 |                        Timestamp (Low)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
24 |                       Text Data Length                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
28 /                                                               /
   /                           Text Data                           /
   /              variable length, padded to 32 bits               /
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /                                                               /
   /                      Options (variable)                       /
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Block Total Length                       |
   +---------------------------------------------------------------+

| Name | Data Type | Description | Example |
| Version | UINT16 | The protocol version. This document describes TRB v3. | 3 |
| Format | UINT16 | The format of the Field Descriptors | 101 - TDB_FD_FORMAT_WS |
| Scheme Index | UINT16 | Used to relate this TRB to the TDB describing this record format | 0 |
| Reserved | UINT16 | Not used - must be set to 0 | 0 |
| Timestamp (High) | UINT32 | The high order 32 bits of a pcapng timestamp | |
| Timestamp (Low) | UINT32 | The low order 32 bits of a pcapng timestamp | |
| Text Data Length | UINT32 | The length of the field data | 968 |
| Text Data | Variable | Field values, e.g. encoded as per Wireshark Field Encoding below | |
| Options | Variable | TRB options - See below | |

Wireshark Field Encoding

Each field value is encoded like this:

| value_type | UINT32 | Values that map directly to native Wireshark field types, as shown in Appendix A |
| value_length | UINT32 | The length of value including zero terminator for a string |
| value | Various | The field value |

The field values are not padded.

The field values must be defined in the same order as the order the corresponding Field Descriptors appear in the TDB. Missing values must be represented in the TRB with a value length of zero.
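
A reader could walk the unpadded field values in the Text Data area along the lines of the C sketch below. The struct and function names are hypothetical, and little-endian byte order is assumed. data_len is meant to be the Text Data Length from the block header, not the padded length.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded TRB field; value points into the caller's buffer. */
struct trb_field {
    uint32_t value_type;    /* Appendix A code */
    uint32_t value_length;  /* 0 means the value is missing */
    const uint8_t *value;
};

/* Assumption: the section is little-endian. */
static uint32_t trb_rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads the field at *offset and advances *offset past it.
 * Returns 1 if a field was read, 0 at the end of the data,
 * -1 if the data is truncated. */
static int trb_next_field(const uint8_t *data, size_t data_len,
                          size_t *offset, struct trb_field *out)
{
    size_t pos = *offset;

    if (pos == data_len)
        return 0;
    if (pos > data_len || data_len - pos < 8)
        return -1;  /* no room for value_type and value_length */
    out->value_type = trb_rd32le(data + pos);
    out->value_length = trb_rd32le(data + pos + 4);
    pos += 8;
    if (out->value_length > data_len - pos)
        return -1;  /* value runs past the end of the data */
    out->value = data + pos;
    *offset = pos + out->value_length;  /* no padding between fields */
    return 1;
}
```

Starting with an offset of 0 and calling trb_next_field in a loop until it returns 0 visits the fields in TDB order. A field whose value_length is 0 is a missing value and still counts as one field.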

TRB Options

| Name | Code | Length | Multiple Allowed |
| opt_tsadjust | 3 | 8 | No |

opt_tsadjust:

Plus or minus adjustment (INT64) to the timestamp based on if_tsresol in the IDB.
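
As a small worked example, the C sketch below combines the two 32-bit timestamp halves of a TRB and applies an opt_tsadjust value. The function name is hypothetical, and the sketch assumes the adjustment is simply added to the timestamp. The result stays in the units given by if_tsresol.

```c
#include <stdint.h>

/* Combines Timestamp (High) and Timestamp (Low) into one 64-bit value and
 * applies the signed opt_tsadjust. Assumption: the adjustment is added. */
static uint64_t trb_timestamp(uint32_t ts_high, uint32_t ts_low,
                              int64_t ts_adjust)
{
    uint64_t ts = ((uint64_t)ts_high << 32) | ts_low;

    /* Unsigned arithmetic wraps modulo 2^64, so adding the converted
     * negative value subtracts its magnitude. */
    return ts + (uint64_t)ts_adjust;
}
```

For instance, a timestamp with a high word of 0 and a low word of 1000, adjusted by -250, gives 750.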

Appendix A - Wireshark Native Field Types

Prior to adding TRB support, pcapng files had no need to carry Wireshark Field Type values. Within Wireshark, the field types are enumerated values, and so if the enumeration list is changed the integer value for a field type may change. If we were to use the enumerated values in the TRB protocol, we could experience compatibility issues. Therefore, TRB defines its own set of values for Wireshark field types and the TRB dissector maps these to the enumerated values used in the Wireshark code. The following values are completely different from the enumerated values and this is intentional; the objective being to avoid confusion.

#define EVENT_DATETIME            1001
#define TS_FT_IPvx                1002  /* Special Case */

#define TS_FT_PROTOCOL            2001
#define TS_FT_BOOLEAN             2002  /* TRUE and FALSE come from <glib.h> */
#define TS_FT_UINT8               2003
#define TS_FT_UINT16              2004
#define TS_FT_UINT24              2005  /* really a UINT32 but displayed as 6 hex-digits if FD_HEX*/
#define TS_FT_UINT32              2006
#define TS_FT_UINT40              2007  /* really a UINT64 but displayed as 10 hex-digits if FD_HEX*/
#define TS_FT_UINT48              2008  /* really a UINT64 but displayed as 12 hex-digits if FD_HEX*/
#define TS_FT_UINT56              2009  /* really a UINT64 but displayed as 14 hex-digits if FD_HEX*/
#define TS_FT_UINT64              2010
#define TS_FT_INT8                2011
#define TS_FT_INT16               2012
#define TS_FT_INT24               2013  /* same as for UINT24 */
#define TS_FT_INT32               2014
#define TS_FT_INT40               2015   /* same as for UINT40 */
#define TS_FT_INT48               2016   /* same as for UINT48 */
#define TS_FT_INT56               2017   /* same as for UINT56 */
#define TS_FT_INT64               2018
#define TS_FT_IEEE_11073_SFLOAT   2019
#define TS_FT_IEEE_11073_FLOAT    2020
#define TS_FT_FLOAT               2021
#define TS_FT_DOUBLE              2022
#define TS_FT_ABSOLUTE_TIME       2024
#define TS_FT_RELATIVE_TIME       2025
#define TS_FT_STRING              2026
#define TS_FT_STRINGZ             2027  /* for use with proto_tree_add_item() */
#define TS_FT_UINT_STRING         2028  /* for use with proto_tree_add_item() */
#define TS_FT_ETHER               2029
#define TS_FT_BYTES               2030
#define TS_FT_UINT_BYTES          2031
#define TS_FT_IPv4                2032
#define TS_FT_IPv6                2033
#define TS_FT_IPXNET              2034
#define TS_FT_FRAMENUM            2035  /* a UINT32 but if selected lets you go to frame with that number */
#define TS_FT_PCRE                2036  /* a compiled Perl-Compatible Regular Expression object */
#define TS_FT_GUID                2037  /* GUID UUID */
#define TS_FT_OID                 2038          /* OBJECT IDENTIFIER */
#define TS_FT_EUI64               2039
#define TS_FT_AX25                2040
#define TS_FT_VINES               2041
#define TS_FT_REL_OID             2042  /* RELATIVE-OID */
#define TS_FT_SYSTEM_ID           2043
#define TS_FT_STRINGZPAD          2044  /* for use with proto_tree_add_item() */
#define TS_FT_FCWWN               2045
#define TS_FT_GRP_START           2046
#define TS_FT_GRP_END             2047

TS_FT_BOOLEAN is encoded as a UINT32; 0 = FALSE and 1 = TRUE.

TS_FT_GRP_START and TS_FT_GRP_END are both encoded as UINT32 values but the value is ignored. The value should be set to zero.
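
The mapping step mentioned in this appendix might look like the C sketch below. To keep it self-contained, demo_ftenum is a stand-in for Wireshark's own field type enumeration, and only three types are mapped; the function name is hypothetical. A real dissector would return Wireshark's enumerated values and cover the full list above.

```c
#define TS_FT_UINT32   2006  /* values from the list above */
#define TS_FT_STRINGZ  2027
#define TS_FT_IPv4     2032

/* Stand-in for Wireshark's internal field type enumeration. */
enum demo_ftenum {
    DEMO_FT_NONE,
    DEMO_FT_UINT32,
    DEMO_FT_STRINGZ,
    DEMO_FT_IPv4
};

/* Maps a TRB type code to the (stand-in) internal type.
 * Unknown codes map to DEMO_FT_NONE. */
static enum demo_ftenum trb_to_internal_type(unsigned ts_type)
{
    switch (ts_type) {
    case TS_FT_UINT32:
        return DEMO_FT_UINT32;
    case TS_FT_STRINGZ:
        return DEMO_FT_STRINGZ;
    case TS_FT_IPv4:
        return DEMO_FT_IPv4;
    default:
        return DEMO_FT_NONE;
    }
}
```

Because the lookup goes through named constants rather than raw enumeration values, reordering the internal enumeration does not change the values stored in a file.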

Appendix B - Wireshark Native Display Values

The rationale behind defining the display values is exactly the same as the reason for defining values for field types - see the explanation in Appendix A.

#define TS_BASE_NONE          0   /**< none */
#define TS_BASE_DEC        1001   /**< decimal */
#define TS_BASE_HEX        1002   /**< hexadecimal */
#define TS_BASE_OCT        1003   /**< octal */
#define TS_BASE_DEC_HEX    1004   /**< decimal (hexadecimal) */
#define TS_BASE_HEX_DEC    1005   /**< hexadecimal (decimal) */
#define TS_BASE_CUSTOM     1006   /**< call custom routine (in ->strings) to format */

Appendix C - XML Example

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<source>
  <info>
    <description>Descriptor file for Apache HTTPD access log in common format</description>
    <generator>Babel 3.0</generator>
    <gendate>2017-10-20</gendate>
    <gentime>19:18:22</gentime>
    <genzoffset>+1</genzoffset>
    <owner>Paul Offord</owner>
    <nativeformat>LogFormat "%h %l %u %t \"%r\" %>s %b" common</nativeformat>
    <example>192.168.1.87 - aslpjo [09/Jul/2012:08:25:35 +0100] "GET /Setup.php HTTP/1.1" 200 1824</example>
    <charencoding>ASCII</charencoding>
  </info>
  <records>
    <record type="1">
      <eols enforce="true">
        <eol>\n</eol>
        <eol>\r\n</eol>
      </eols>
      <delimiters>
        <delimiter> </delimiter>
      </delimiters>
      <missingvalues>
        <missingvalue>-</missingvalue>
      </missingvalues>
      <criteria>
        <criterium type="string" offset="*">*</criterium>
      </criteria>
      <columns definedby="position">
        <group name="httpd" label="HTTPD Log Record">
          <column>
            <informat quoted="false">%i</informat>
            <name>host</name>
            <abbrev>host</abbrev>
            <blurb>This is the IP address of the client (remote host) which made the request to the server.</blurb>
            <type quoted="false">FT_IPvx</type>
            <display>BASE_NONE</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="false">%s</informat>
            <name>identid</name>
            <abbrev>identid</abbrev>
            <blurb>The identity of the client determined by a request to the identd server on the client's machine.</blurb>
            <type quoted="false">FT_STRINGZ</type>
            <display>BASE_NONE</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="false">%s</informat>
            <name>userid</name>
            <abbrev>userid</abbrev>
            <blurb>This is the userid of the person requesting the document as determined by HTTP authentication.</blurb>
            <type quoted="false">FT_STRINGZ</type>
            <display>BASE_NONE</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="false" start-bracket="[" end-bracket="]">[%d/%b/%Y:%H:%M:%S %z]</informat>
            <name>datetime</name>
            <abbrev>datetime</abbrev>
            <blurb>The time that the request was received.</blurb>
            <type>EVENT_DATETIME</type>
            <display>BASE_NONE</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="true">%s</informat>
            <name>request</name>
            <abbrev>request</abbrev>
            <blurb>The request line from the client is given in double quotes.</blurb>
            <type>FT_STRINGZ</type>
            <display>BASE_NONE</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="false">%d</informat>
            <name>response code</name>
            <abbrev>response-code</abbrev>
            <blurb>This is the status code that the server sends back to the client.</blurb>
            <type>FT_UINT32</type>
            <display>BASE_DEC</display>
            <bitmask>0</bitmask>
          </column>
          <column>
            <informat quoted="false">%d</informat>
            <name>bytes returned</name>
            <abbrev>sc-bytes</abbrev>
            <blurb>This indicates the size of the object returned to the client, not including the response headers.</blurb>
            <type>FT_UINT32</type>
            <display>BASE_DEC</display>
            <bitmask>0</bitmask>
          </column>
        </group>
      </columns>
      <infofield>HTTPD: %trb.httpd.request</infofield>
    </record>
  </records>
</source>

Appendix D - Sample Pcapng File

(Use Wireshark View->Reload as File Format/Capture to view Pcapng contents)

Appendix E - Other Information


Imported from https://wiki.wireshark.org/TRB%20Protocol on 2020-08-11 23:26:54 UTC