NAME

rwfileinfo - Print information about a SiLK file

SYNOPSIS

rwfileinfo [--fields=FIELDS] [--summary] [--no-titles]
      [--site-config-file=FILENAME]
      {--xargs | --xargs=FILENAME | FILE [FILE...]}

rwfileinfo --help

rwfileinfo --help-fields

rwfileinfo --version

DESCRIPTION

rwfileinfo prints information about a binary SiLK file that can be determined by reading the file's header and by moving quickly over the data blocks in the file.

rwfileinfo requires one or more filename arguments to be given on the command line or the use of the --xargs switch. When the --xargs switch is provided, rwfileinfo reads the names of the files to process from the named text file or from the standard input if no file name argument is provided to the switch. The input to --xargs must contain one file name per line. rwfileinfo does not read a SiLK file's content from the standard input by default, but it does when either - or stdin is given as a filename argument.

When the --summary switch is given, rwfileinfo first prints the information for each individual file and then prints the number of files processed, the sum of the individual file sizes, and the sum of the individual record counts.

Field Descriptions

By default, rwfileinfo prints the following information for each file argument. Use the --fields switch to modify which pieces of information are printed.

(rwfileinfo prints each field in the order in which support for that field was added to SiLK. The field descriptions are presented here in a more logical order.)

file-size

The size of the file on disk as reported by the operating system. rwfileinfo prints 0 for the file-size when reading from the standard input.

version

Every binary file written by SiLK has a version number field. Since SiLK 1.0.0, the version number field has been used to indicate the general structure (or layout) of the file. The file structure adopted in SiLK 1.0.0 uses a version number of 16 and has a header section and a data section. The header section begins with 16 bytes that specify well-defined values, and those bytes are followed by one or more variably-sized header entries. The specifics of the data section depend on the content of the file.

header-length

The header-length field shows the number of octets required by the header (i.e., the initial 16 bytes and the header entries). Since everything after the header is data, the header-length is the starting offset of the data section. The smallest header length is 24 bytes, but typically the header is padded to be an integer multiple of the record-length. The header-length that rwfileinfo prints for a file is determined dynamically by reading the file's header.
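As a rough illustration of the padding described above, the following Python sketch rounds a header length up to the next integer multiple of the record length. The function name padded_header_length and the unpadded length of 180 are hypothetical and chosen only for illustration; they are not taken from SiLK.

```python
def padded_header_length(unpadded: int, record_length: int) -> int:
    """Round unpadded up to the next integer multiple of record_length."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    # Ceiling division, then scale back up to a whole number of records.
    return -(-unpadded // record_length) * record_length


# A hypothetical 180-octet header in a file of 52-octet records.
print(padded_header_length(180, 52))
```

With a record-length of 1 (as for IPset or prefix map files) this padding changes nothing, since every length is already a multiple of 1.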

silk-version

When a SiLK tool creates a binary file, the tool writes the current SiLK release number (such as 3.9.0) into the file's header as a way to help diagnose issues should a bug with a particular release of SiLK be discovered in the future.

byte-order

Every SiLK file has a byte-order or endian field. SiLK uses the machine's native representation of integers when writing data, and this field shows what representation the file contains. BigEndian is network byte order and littleEndian is used by Intel chips. The rwswapbytes(1) tool changes a file's integer representation, and some tools have a --byte-order switch that allows the user to specify the integer representation of output files. The header-section of a file is always written in network byte order.
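The difference between the two representations can be seen with a short Python sketch that uses only the standard struct module. The integer value is arbitrary and chosen for illustration; nothing here reads or writes a SiLK file.

```python
import struct

value = 0x01020304  # an arbitrary 32-bit integer chosen for illustration

big = struct.pack(">I", value)     # network byte order (BigEndian)
little = struct.pack("<I", value)  # littleEndian

print(big.hex())     # most significant octet first
print(little.hex())  # least significant octet first
```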

compression

SiLK tools may use the zlib library (http://zlib.net/), the LZO library (http://www.oberhumer.com/opensource/lzo/), or the snappy library (http://google.github.io/snappy/) to compress the data section of a file. The compression field specifies which library (if any) was used to compress the data section. If a file is compressed with a library that was not included in an installation of SiLK, SiLK is unable to read the data section of the file. Many SiLK tools accept the --compression-method switch to choose a particular compression method. (The compression field does not indicate whether the entire file has been compressed with an external compression utility such as gzip(1).)

format

Every binary file written by SiLK has two fields in the header that specify exactly what the file contains: the format and the record-version. In general, the format indicates the content type of the file and the record-version indicates the evolution of that content.

The contents of a file whose format is FT_IPSET, FT_RWBAG, or FT_PREFIXMAP are fairly obvious (an IPset, a Bag, a prefix map).

There are many different file formats for writing SiLK Flow records, but the SiLK analysis tools largely use a single Flow file format. That format is FT_RWIPV6ROUTING if SiLK has been compiled with IPv6 support, or FT_RWGENERIC otherwise. A file that uses the FT_RWGENERIC format is only capable of holding IPv4 addresses.

The other SiLK Flow file formats are created by rwflowpack(8) as it writes flow records to the repository. These formats often omit fields and use reduced bit-sizes for fields to reduce the space required for an individual flow record.

The record-version field indicates changes within the general type specified by the format field. For example, SiLK incremented the record-version of the formats that hold flow records when the resolution of record timestamps changed from seconds to milliseconds and again from milliseconds to nanoseconds.

record-version

Together with the format field, the record-version specifies the contents of the file. See the discussion of format for details.

record-length

Files created by SiLK 1.0.0 and later have a record length field. This field contains the length of an individual record, and this value is dependent on the format and record-version fields described above. Some files (such as those containing IPsets or prefix maps) do not write individual records to the output, and the record length is 1 for these files.

count-records

The count-records field is generated dynamically by determining the length the data section would require if it were completely uncompressed and dividing it by the record-length. When the record-length is 1 (such as for IPset files), the count-records field does not provide much information beyond the length of the uncompressed data. For an uncompressed file, adding header-length to the product of count-records and record-length is equal to the file-size.
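The arithmetic above can be checked with a small Python sketch. The helper names are hypothetical; the numbers are those reported for the uncompressed file tcp-data.rw in the EXAMPLE section (header-length 208, record-length 52, file-size 572).

```python
def records_in_data(uncompressed_data_length: int, record_length: int) -> int:
    """Divide the uncompressed data length by the record length."""
    return uncompressed_data_length // record_length


def uncompressed_file_size(header_length: int, record_length: int,
                           record_count: int) -> int:
    """Header length plus the space needed by all records."""
    return header_length + record_count * record_length


# For an uncompressed file, the data section is everything after the header.
data_length = 572 - 208
count = records_in_data(data_length, 52)
print(count)
print(uncompressed_file_size(208, 52, count))
```

The relationship does not hold for a compressed file, because the data section on disk is then shorter than the uncompressed length used in the division.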

The fields given above are either present in the well-defined header or are computed by reading the file.

The following fields are generated by reading the header entries and determining if one or more header entries of the specified type are present. The field is not printed in the output when the header entry is not present in the file.

command-lines

Many of the SiLK tools write a header entry to the output file that contains the command line invocation used to create that file, and some of the SiLK tools also copy the command line history from their input files to the output file. (The --invocation-strip switch on the tools can be used to prevent copying and recording of the invocation.) The command lines are stored in individual header entries and this field displays those entries with the most recent invocation at the end of the list.

The command line history has a couple of issues:

  • When multiple input files are used to create a single output, the entries are stored as a list, and this makes it difficult to know which set of command line entries is associated with which input file.

  • When a SiLK tool creates multiple output files (e.g., when using both --pass and --fail to rwfilter(1)), the tool writes the same command line entry to each output file. Some context in addition to the command line history may be needed to know which branch of that tool a particular file represents.

annotations

Most of the SiLK tools that create binary output files provide the --note-add and --note-file-add switches, which allow an arbitrary annotation to be added to the header of a file. Some tools also copy the annotations from the source files to the destination files. The annotations are stored in individual header entries and this field displays those entries.

ipset

The IPset writing tools (rwset(1), rwsetbuild(1), rwsettool(1), rwaggbagtool(1), and rwbagtool(1)) support the following output formats for IPset data structures:

2

May hold only IPv4 addresses and does not have an ipset header entry.

3

May hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later. It contains a header entry that describes the IPset data structure, and the entry specifies the number of nodes, the number of branches from each node, the number of leaves, the size of the nodes and leaves, and which node is the root of the tree.

4

May hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. The file's header entry specifies whether the file contains IPv4 addresses or IPv6 addresses.

5

May hold only IPv6 addresses and is readable by SiLK 3.14 and later. The header entry specifies that the file contains IPv6 data.

bag

Since SiLK 3.0.0, the tools that write binary Bag files (rwbag(1), rwbagbuild(1), and rwbagtool(1)) have written a header entry that specifies the type and size of the key and of the counter in the file.

aggregate-bag

The tools rwaggbag(1), rwaggbagbuild(1), and rwaggbagtool(1) write a header entry that contains the field types that comprise the key and the counter.

prefix-map

When using rwpmapbuild(1) to create a prefix map file, a string that specifies a mapname may be provided. rwpmapbuild writes the mapname to a header entry in the prefix map file. The mapname is used to generate command line switches or field names when the --pmap-file switch is specified to several of the SiLK tools (see pmapfilter(3) for details). When displaying the mapname, rwfileinfo prefixes it with the string v1: which denotes a version number for the prefix-map header entry. (The version number is printed for completeness.)

packed-file-info

When rwflowpack(8) creates a SiLK Flow file for the repository, all the records in the file have the same starting hour, the same sensor, and the same flowtype (class/type pair). rwflowpack writes a header entry to the file that contains these values, and this field displays those values. (To print the names for the sensor and flowtype, the silk.conf(5) file must be accessible.)

probe-name

When flowcap(8) creates a SiLK flow file, it adds a header entry specifying the name of the probe from which the data was collected.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
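The abbreviation rule can be sketched in Python as follows. This is only an illustration of the rule as stated above, not SiLK's option parser; the function resolve_option is hypothetical, and the option list is taken from the SYNOPSIS.

```python
OPTIONS = [
    "fields", "summary", "no-titles", "site-config-file",
    "xargs", "help", "help-fields", "version",
]


def resolve_option(abbrev: str, options=OPTIONS) -> str:
    """Return the full option name for abbrev, or raise ValueError."""
    # An exact match always wins, even when it is a prefix of another option.
    if abbrev in options:
        return abbrev
    matches = [opt for opt in options if opt.startswith(abbrev)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: --{abbrev}")
    raise ValueError(f"ambiguous option: --{abbrev} matches {matches}")


print(resolve_option("help"))   # exact match despite --help-fields
print(resolve_option("field"))  # unique prefix of --fields
```

Under this rule an abbreviation such as --s is rejected because it matches both --summary and --site-config-file.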

--fields=FIELDS

Specify what information to print for each file argument on the command line. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive and may be shortened to a unique prefix. When the --fields option is not given, all fields are printed if the file contains the necessary information. The fields are always printed in the order they appear here regardless of the order they are specified in FIELDS.

The possible field values are given next with a brief description of each. For a full description of each field, see "Field Descriptions" above.

format,1

The contents of the file as a name and the corresponding hexadecimal ID.

version,2

An integer describing the layout or structure of the file.

byte-order,3

Either BigEndian or littleEndian to indicate the representation used to store integers in the file (network or non-network byte order).

compression,4

The compression library (if any) used to compress the data-section of the file, specified as a name and its decimal ID.

header-length,5

The octet length of the file's header; alternatively the offset where data begins.

record-length,6

The octet length of a single record or the value 1 if the file's content is not record-based.

count-records,7

The number of records in the file, computed by dividing the uncompressed data length by the record-length.

file-size,8

The size of the file on disk as reported by the operating system.

command-lines,9

The command line invocation used to generate this file.

record-version,10

The version of the records contained in the file.

silk-version,11

The release of SiLK that wrote this file.

packed-file-info,12

For a repository Flow file generated by rwflowpack(8), this prints the timestamp of the starting hour, the flowtype, and the sensor of each flow record in the file.

probe-name,13

For a Flow file generated by flowcap(8), the name of the probe where the flow records were initially collected.

annotations,14

The notes (annotations) that users have added to the file's header.

prefix-map,15

For a prefix map file, the mapname that was set when the file was created by rwpmapbuild(1).

ipset,16

For an IPset file whose record-version is 3, a description of the tree data structure. For an IPset file whose record-version is 4, the type of IP addresses (IPv4 or IPv6).

bag,17

For a bag file, the type and size of the key and of the counter.

aggregate-bag,18

For an aggregate bag file, the field types that comprise the key and the counter.
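The FIELDS syntax accepted by --fields (names, integers, and hyphenated integer ranges) can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not SiLK's parser: the function parse_fields is hypothetical, the name/integer table is copied from the list above, and returning the identifiers in ascending order is a choice made for this sketch only.

```python
FIELD_NAMES = [
    "format", "version", "byte-order", "compression", "header-length",
    "record-length", "count-records", "file-size", "command-lines",
    "record-version", "silk-version", "packed-file-info", "probe-name",
    "annotations", "prefix-map", "ipset", "bag", "aggregate-bag",
]


def parse_fields(spec: str) -> list:
    """Return the sorted field integers selected by a FIELDS string."""
    selected = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            raise ValueError("empty field in list")
        if token[0].isdigit():
            # A single integer such as "7" or a range such as "1-3".
            start, sep, end = token.partition("-")
            low = int(start)
            high = int(end) if sep else low
            if not 1 <= low <= high <= len(FIELD_NAMES):
                raise ValueError(f"bad field range: {token}")
            selected.update(range(low, high + 1))
        else:
            # Names are case-insensitive and may be a unique prefix.
            name = token.lower()
            matches = [i for i, full in enumerate(FIELD_NAMES, 1)
                       if full.startswith(name)]
            if len(matches) != 1:
                raise ValueError(f"unknown or ambiguous field: {token}")
            selected.add(matches[0])
    return sorted(selected)


print(parse_fields("1-3,File-Size"))
print(parse_fields("count"))
```

Because the result is a set of identifiers, listing a field twice or in a different order selects the same fields, which is consistent with the output order being independent of the order given in FIELDS.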

--summary

After the data for each individual file is printed, print a summary that shows the number of files processed, the sum of the individual file sizes, and the total number of records contained in those files.

--no-titles

Suppress printing of the file name and field names. The output contains only the values, where each value is printed left-justified on a single line.

--site-config-file=FILENAME

Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, rwfileinfo searches for the site configuration file in the locations specified in the "FILES" section.

--xargs
--xargs=FILENAME

Read the names of the input files from FILENAME or from the standard input if FILENAME is not provided. The input is expected to have one filename per line. rwfileinfo opens each named file in turn and prints its information as if the filenames had been listed on the command line. Since SiLK 3.15.0.
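One way to drive --xargs from a script is sketched below in Python. The helper names are hypothetical, and run_rwfileinfo_xargs assumes that an rwfileinfo binary from SiLK 3.15.0 or later is on the PATH; only the construction of the one-name-per-line input is exercised when the block is loaded.

```python
import subprocess


def xargs_input(filenames) -> str:
    """Build the text expected by --xargs: one file name per line."""
    for name in filenames:
        if "\n" in name:
            raise ValueError("a file name may not contain a newline")
    return "".join(name + "\n" for name in filenames)


def run_rwfileinfo_xargs(filenames, extra_args=()) -> str:
    """Run rwfileinfo with --xargs, passing the names on standard input."""
    cmd = ["rwfileinfo", *extra_args, "--xargs"]
    result = subprocess.run(cmd, input=xargs_input(filenames), text=True,
                            capture_output=True, check=True)
    return result.stdout


# The file names below are placeholders for illustration.
print(xargs_input(["a.rw", "b.rw"]), end="")
```

Passing the names on standard input avoids the command line length limits that can arise when many files are listed as arguments.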

--help

Print the available options and exit.

--help-fields

Print the name, alias, and description of each field, then exit.

--version

Print the version number and information about how SiLK was configured, then exit the application.

EXAMPLE

In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign represents the command line.

Get information about the file tcp-data.rw:

$ rwfileinfo tcp-data.rw
tcp-data.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       208
  record-length       52
  record-version      5
  silk-version        1.0.1
  count-records       7
  file-size           572
  command-lines
                   1  rwfilter --proto=6 --pass=tcp-data.rw ...
  annotations
                   1  This is some interesting TCP data

Return a single value which is the number of records in the file tcp-data.rw:

$ rwfileinfo --no-titles --fields=count-records tcp-data.rw
7
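When several values are needed at once, the default output shown for tcp-data.rw can be post-processed. The Python sketch below parses that sample text into a dictionary. The layout it assumes (a file name line ending in a colon, then indented "label  value" lines, with numbered entries under list-valued fields) is inferred only from the sample above, and the function parse_rwfileinfo is hypothetical.

```python
import re

SAMPLE = """\
tcp-data.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       208
  record-length       52
  record-version      5
  silk-version        1.0.1
  count-records       7
  file-size           572
  command-lines
                   1  rwfilter --proto=6 --pass=tcp-data.rw ...
  annotations
                   1  This is some interesting TCP data
"""


def parse_rwfileinfo(text: str) -> dict:
    """Map each file name to a dictionary of its printed fields."""
    files = {}
    info = None
    current_list = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace() and line.endswith(":"):
            # An unindented line ending in a colon starts a new file.
            info = files.setdefault(line[:-1], {})
            current_list = None
            continue
        if info is None:
            raise ValueError("field line seen before any file name")
        # Split the label from the value at the first run of 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit() and current_list is not None:
            current_list.append(parts[1])   # numbered entry of a list field
        elif len(parts) == 2:
            info[parts[0]] = parts[1]       # scalar field
            current_list = None
        else:
            current_list = info.setdefault(parts[0], [])  # list field header
    return files


info = parse_rwfileinfo(SAMPLE)["tcp-data.rw"]
print(info["count-records"], info["command-lines"])
```

All values are kept as strings; converting header-length, record-length, and count-records to integers is left to the caller.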

ENVIRONMENT

SILK_CONFIG_FILE

This environment variable is used as the value for the --site-config-file switch when that switch is not provided.

SILK_DATA_ROOTDIR

This environment variable specifies the root directory of the data repository. As described in the "FILES" section, rwfileinfo may use this environment variable when searching for the SiLK site configuration file.

SILK_PATH

This environment variable gives the root of the install tree. When searching for configuration files, rwfileinfo may use this environment variable. See the "FILES" section for details.

FILES

${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/share/silk/silk.conf
/usr/share/silk.conf

Possible locations for the SiLK site configuration file which are checked when the --site-config-file switch is not provided.
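The list of locations can be expressed as a short Python sketch that expands the environment variables into candidate paths. It assumes that the locations are checked in the order listed and that entries depending on an unset variable are skipped; neither assumption is confirmed here, and the function candidate_config_paths is hypothetical.

```python
import posixpath


def candidate_config_paths(env: dict) -> list:
    """Return candidate silk.conf locations in the order listed in FILES."""
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/share/silk/silk.conf")
    paths.append("/usr/share/silk.conf")
    return paths


# /opt/silk is a placeholder install root used only for illustration.
for path in candidate_config_paths({"SILK_PATH": "/opt/silk"}):
    print(path)
```

In a real script, passing os.environ as the env argument would reflect the current process environment.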

SEE ALSO

rwfilter(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagtool(1), rwbag(1), rwbagbuild(1), rwbagtool(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwsettool(1), rwswapbytes(1), silk.conf(5), pmapfilter(3), flowcap(8), rwflowpack(8), silk(7), gzip(1)