I have a project to add support for two new block types to Wireshark. This doesn't seem to be documented anywhere and so I'm hoping that my notes here may help someone in the future.
This is work in progress and so the notes here are not complete. Also, I'm using this as a notepad and I may make mistakes which I'll correct later. If you notice mistakes, please feel free to update this page.
The objective of the project is to add Wireshark support for the display, filtering, etc. of text log data (machine data). The data is presented to Wireshark in a PCAP-NG file that contains two new block types:
- TSDB - Text Source Descriptor Block that defines the layout of the data records
- The data in the TSDB is used to define heading fields i.e. the heading fields aren't predefined as they typically are in dissectors, but rather defined at file load time (and cleared when the file is closed)
- This block is analogous to the Interface Descriptor Block found in a network packet capture
- TRB - Text Record Block that contains the log record data
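A minimal sketch of how a reader might step over the blocks in such a file, using the general PCAP-NG block framing (block type, block total length, body, then the total length repeated). The two block type codes and the function names are placeholders invented for this example, not the real TSDB and TRB values, and the code assumes a little-endian section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block type codes: these notes do not give the real values. */
#define BLOCK_TYPE_TSDB_EXAMPLE 0x80000001u
#define BLOCK_TYPE_TRB_EXAMPLE  0x80000002u

/* Read a 32-bit little-endian value (assumes a little-endian section). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Count the blocks of one type in a buffer of back-to-back blocks.
 * Returns -1 if any block is truncated or its two length fields disagree.
 */
static int count_blocks(const uint8_t *buf, size_t len, uint32_t wanted_type)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);
    while (off < len) {
        /* Smallest possible block: type + length + trailing length. */
        if (len - off < 12)
            return -1;
        uint32_t type = read_le32(buf + off);
        uint32_t total = read_le32(buf + off + 4);
        if (total < 12 || total % 4 != 0 || total > len - off)
            return -1;
        /* The total length is repeated at the end of every block. */
        if (read_le32(buf + off + total - 4) != total)
            return -1;
        if (type == wanted_type)
            count++;
        off += total;   /* unknown block types are simply skipped */
    }
    return count;
}
```

Because every block carries its own length, a reader that does not recognise a block type can still skip it, which is what makes adding new block types to an existing file format practical.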
The initial data being used is Apache HTTPD Common format log records, but I'm designing the solution so that any format of log data can be supported. I've started with the Apache HTTPD log data because it is a fairly simple format: space-separated variables in fixed columns.
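To make "space-separated variables in fixed columns" concrete, here is a small sketch that splits one Common format line into its columns. It is not how Babel parses the file; the function name, the fixed field size, and the rule of treating `[...]` and `"..."` as single fields are assumptions chosen to fit the Common format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_LEN 256

/*
 * Split one log line into fields. Fields are separated by single spaces;
 * a field that starts with '[' runs to the next ']' and a field that
 * starts with '"' runs to the next '"', so embedded spaces are preserved.
 * The enclosing brackets or quotes are dropped. Returns the number of
 * fields stored (at most max_fields).
 */
static size_t split_log_line(const char *line,
                             char fields[][MAX_FIELD_LEN],
                             size_t max_fields)
{
    size_t n = 0;
    const char *p = line;

    assert(line != NULL);
    while (*p != '\0' && *p != '\r' && *p != '\n' && n < max_fields) {
        char close = ' ';
        if (*p == '[') {
            close = ']';
            p++;
        } else if (*p == '"') {
            close = '"';
            p++;
        }

        /* Copy characters up to the closing delimiter or end of line. */
        size_t len = 0;
        while (*p != '\0' && *p != '\r' && *p != '\n' && *p != close) {
            if (len < MAX_FIELD_LEN - 1)
                fields[n][len++] = *p;
            p++;
        }
        fields[n][len] = '\0';
        n++;

        /* Skip the closing bracket or quote, then the separating space. */
        if (close != ' ' && *p == close)
            p++;
        if (*p == ' ')
            p++;
    }
    return n;
}
```

Run against the example record in Appendix A, this yields seven fields: host, identid, userid, the date-time without its brackets, the request without its quotes, the response code, and the byte count.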
Test PCAP-NG Generation
Of course, the above raises the question, "What creates the PCAP-NG file with the new blocks?". At this time I'm using the Babel function that comes with TribeLab Workbench. The project that should follow this one will be to write a Wiretap reader for log files.
Babel produces the PCAP-NG file like this:
log_file -----------------------------------> TRBs
                       ^
                       |
apache-common.xml -----+--------------------> TSDB
An XML file describes the format of the log file. The XML is used to generate the TSDB, and some elements of it are used to help parse the log records to form TRBs. See Appendix A below for an example of the XML file.
NB: Although I'm using Babel to generate the file, anyone can use any tool to generate a suitable file. There is nothing proprietary about the TSDB or TRB formats.
I'm trying to add this support entirely through the plugin framework, to avoid having to make any changes to core Wireshark code. There is an API to add support for new block types via plugins, but I think this may be the first project to use this functionality, so there could be bugs and it may not be complete.
Even though the code I am writing has nothing to do with network packets, Wireshark still refers to the list of events in the top pane as the Packet List, and various structures that we need to use refer to packets, most notably the wtap_pkthdr structure.
The TSDB defines the type and meaning of fields. Wireshark should not generate a "Packet List" entry for this block.
- So if you want to handle a new block type but not generate a packet list entry from it, think of the TSDB as being your template.
The TSDB defines each field through TLVs (type-length-value). The types map to native Wireshark field types with two important exceptions.
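The notes above do not spell out the TLV layout, so the following is only a sketch under a stated assumption: each TLV has a 16-bit type, a 16-bit length, and a value padded to a 32-bit boundary, in little-endian order, which is the layout PCAP-NG uses for its options. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded TLV; value points into the caller's buffer. */
struct tlv {
    uint16_t type;
    uint16_t length;
    const uint8_t *value;
};

/* Read a 16-bit little-endian value (byte order is an assumption). */
static uint16_t tlv_read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the TLV at *offset and advance *offset past it, including any
 * padding. Returns 1 on success, 0 at end of buffer or on truncated data.
 */
static int tlv_next(const uint8_t *buf, size_t len, size_t *offset,
                    struct tlv *out)
{
    assert(buf != NULL && offset != NULL && out != NULL);
    if (*offset > len || len - *offset < 4)
        return 0;
    out->type = tlv_read_le16(buf + *offset);
    out->length = tlv_read_le16(buf + *offset + 2);
    if (out->length > len - *offset - 4)
        return 0;
    out->value = buf + *offset + 4;

    /* Round the value length up to the next multiple of four. */
    size_t padded = ((size_t)out->length + 3u) & ~(size_t)3u;
    size_t next = *offset + 4 + padded;
    /* If the last TLV lacks its padding, stop at the end of the buffer. */
    *offset = next > len ? len : next;
    return 1;
}
```

A caller would start with an offset of 0 and call `tlv_next` in a loop until it returns 0, handling each `type` as it goes.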
Field Type Encoding
The encoded integer values for the field types are not the same as the integer values used within Wireshark. This is because the Wireshark types are generated via an enumerated list, and a change to the list could change the enumerated values. If we used these values within the TSDB, we would have compatibility problems. Wireshark field type values start at 0, whereas the TSDB field type values start at 1001.
The Wireshark field type values can be found in epan/ftypes/ftype.h. The mapping of TSDB values to Wireshark field values is in the array babeltowsft.
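The idea behind a table like babeltowsft can be sketched as follows. This is not the real array: the enum below is a stand-in for Wireshark's field type enum, and the TSDB codes other than the 1001 starting point are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Wireshark's field type enum; names and order are illustrative. */
enum example_ws_ftype {
    EX_FT_NONE,
    EX_FT_UINT32,
    EX_FT_STRINGZ,
    EX_FT_ABSOLUTE_TIME,
    EX_FT_IPv4,
    EX_FT_IPv6
};

/* Hypothetical TSDB codes: stable on-disk values starting at 1001. */
#define TSDB_FT_BASE 1001
enum example_tsdb_ftype {
    TSDB_EX_UINT32 = TSDB_FT_BASE,
    TSDB_EX_STRINGZ,
    TSDB_EX_ABSOLUTE_TIME,
    TSDB_EX_IPv4,
    TSDB_EX_IPv6
};

/* Lookup table indexed by (TSDB code - TSDB_FT_BASE). */
static const enum example_ws_ftype example_babeltowsft[] = {
    EX_FT_UINT32,
    EX_FT_STRINGZ,
    EX_FT_ABSOLUTE_TIME,
    EX_FT_IPv4,
    EX_FT_IPv6
};

/*
 * Translate a TSDB field type code into the in-memory field type.
 * Returns 1 and fills *out on success, 0 if the code is out of range.
 */
static int tsdb_to_ws_ftype(int tsdb_code, enum example_ws_ftype *out)
{
    size_t count = sizeof example_babeltowsft / sizeof example_babeltowsft[0];

    assert(out != NULL);
    if (tsdb_code < TSDB_FT_BASE ||
        (size_t)(tsdb_code - TSDB_FT_BASE) >= count)
        return 0;
    *out = example_babeltowsft[tsdb_code - TSDB_FT_BASE];
    return 1;
}
```

If the in-memory enum is reordered, only the table entries need to track it; the codes written to the file stay the same.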
We need to deal with two special cases. A log record could contain many date-time values. We need to indicate which value should be used in the Wireshark packet list. This is done through the EVENT_DATETIME field type.
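Whichever column is marked EVENT_DATETIME has to end up as a timestamp. As an illustration only, the sketch below converts an Apache Common format date-time (brackets already removed) into seconds since the Unix epoch in UTC. The function names are hypothetical, and the real conversion may well be done elsewhere, for example when the file is generated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Days from 1970-01-01 to the given date (proleptic Gregorian calendar). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= m <= 2;                       /* treat Jan/Feb as end of prior year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;       /* year of era, 0..399 */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/*
 * Parse text such as "09/Jul/2012:08:25:35 +0100".
 * Returns 1 and stores UTC epoch seconds in *epoch_out, or 0 on failure.
 */
static int parse_apache_time(const char *text, int64_t *epoch_out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    int day, year, hour, min, sec, zh, zm, month = 0;
    char mon[4], sign;

    assert(text != NULL && epoch_out != NULL);
    if (sscanf(text, "%2d/%3s/%4d:%2d:%2d:%2d %c%2d%2d",
               &day, mon, &year, &hour, &min, &sec, &sign, &zh, &zm) != 9)
        return 0;
    for (int i = 0; i < 12; i++) {
        if (strncmp(mon, months + 3 * i, 3) == 0)
            month = i + 1;
    }
    if (month == 0 || (sign != '+' && sign != '-'))
        return 0;

    /* Zone offset in seconds east of UTC; subtract it to get UTC. */
    int64_t offset = (int64_t)zh * 3600 + (int64_t)zm * 60;
    if (sign == '-')
        offset = -offset;
    *epoch_out = days_from_civil(year, month, day) * 86400
               + (int64_t)hour * 3600 + (int64_t)min * 60 + sec - offset;
    return 1;
}
```

For the example record in Appendix A, `09/Jul/2012:08:25:35 +0100` converts to 1341818735, which is 07:25:35 UTC on the same day.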
A log often mixes IPv4 and IPv6 addresses in the same column; both Apache HTTPD and Microsoft IIS do this. To accommodate this we have a TS_FT_IPvx field type.
Appendix A - XML Example
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<source>
  <header headerline="false" skipheaderlines="0">
    <description>Descriptor file for Apache access log in common format</description>
    <generator>Babel 3.0</generator>
    <gendate>2017-10-20</gendate>
    <gentime>19:18:22</gentime>
    <genzoffset>+1</genzoffset>
    <owner>Paul Offord</owner>
    <nativeformat>LogFormat "%h %l %u %t \"%r\" %>s %b" common</nativeformat>
    <example>192.168.1.87 - user01 [09/Jul/2012:08:25:35 +0100] "GET /Setup.php HTTP/1.1" 200 1824</example>
    <wsnamespace>apache</wsnamespace>
    <charencoding>ASCII</charencoding>
  </header>
  <records>
    <record type="1">
      <eols enforce="true">
        <eol>\n</eol>
        <eol>\r\n</eol>
      </eols>
      <delimiters>
        <delimiter> </delimiter>
      </delimiters>
      <missingvalues>
        <missingvalue>-</missingvalue>
      </missingvalues>
      <criteria>
        <criterium type="string" offset="*">*</criterium>
      </criteria>
      <columns>
        <column>
          <informat quoted="false">%i</informat>
          <name>host</name>
          <abbrev>bds.apache.host</abbrev>
          <blurb>This is the IP address of the client (remote host) which made the request to the server.</blurb>
          <type quoted="false">FT_IPvx</type>
          <display>BASE_NONE</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="false">%s</informat>
          <name>identid</name>
          <abbrev>bds.apache.identid</abbrev>
          <blurb>The identity of the client determined by a request to the identd server on the clients machine.</blurb>
          <type quoted="false">FT_STRINGZ</type>
          <display>BASE_NONE</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="false">%s</informat>
          <name>userid</name>
          <abbrev>bds.apache.userid</abbrev>
          <blurb>This is the userid of the person requesting the document as determined by HTTP authentication.</blurb>
          <type quoted="false">FT_STRINGZ</type>
          <display>BASE_NONE</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="false" start-bracket="[" end-bracket="]">[%d/%b/%Y:%H:%M:%S %z]</informat>
          <name>datetime</name>
          <abbrev>bds.apache.datetime</abbrev>
          <blurb>The time that the request was received.</blurb>
          <type>EVENT_DATETIME</type>
          <display>BASE_NONE</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="true">%s</informat>
          <name>request</name>
          <abbrev>bds.apache.request</abbrev>
          <blurb>The request line from the client is given in double quotes.</blurb>
          <type>FT_STRINGZ</type>
          <display>BASE_NONE</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="false">%d</informat>
          <name>response code</name>
          <abbrev>bds.apache.response-code</abbrev>
          <blurb>This is the status code that the server sends back to the client.</blurb>
          <type>FT_UINT32</type>
          <display>BASE_DEC</display>
          <bitmask>0</bitmask>
        </column>
        <column>
          <informat quoted="false">%d</informat>
          <name>bytes returned</name>
          <abbrev>bds.apache.sc-bytes</abbrev>
          <blurb>This indicates the size of the object returned to the client, not including the response headers.</blurb>
          <type>FT_UINT32</type>
          <display>BASE_DEC</display>
          <bitmask>0</bitmask>
        </column>
      </columns>
      <infofield>%4 - %5</infofield>
    </record>
  </records>
</source>