rwbagtool - Perform high-level operations on binary Bag files
rwbagtool { --add | --subtract | --minimize | --maximize
| --divide | --scalar-multiply=VALUE
| --compare={lt | le | eq | ge | gt} }
[--intersect=SETFILE | --complement-intersect=SETFILE]
[--mincounter=VALUE] [--maxcounter=VALUE]
[--minkey=VALUE] [--maxkey=VALUE]
[--invert] [--coverset] [--ipset-record-version=VERSION]
[--output-path=PATH [--modify-inplace [--backup-path=BACKUP]]]
[--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[BAGFILE[ BAGFILE...]]
rwbagtool --help
rwbagtool --version
rwbagtool performs various operations on binary Bag files (key-counter associations) and creates a new Bag file or an IPset file. rwbagtool can add Bags together, subtract a subset of data from a Bag, divide a Bag by another, compare the counters of two Bag files, perform key intersection of a Bag with an IPset, extract the keys of a Bag as an IPset, or filter Bag entries based on their key or counter values.
rwbagtool reads Bags from the files and named pipes specified on the command line. If no file names are given on the command line, rwbagtool attempts to read a Bag from the standard input. The names stdin
or -
may be used to force rwbagtool to read from the standard input. The resulting Bag or IPset is written to the location specified by the --output-path switch or to the standard output if that switch is not provided. If a BAGFILE does not contain a Bag or an attempt is made to read binary input or write binary output to the terminal,, rwbagtool prints an error to the standard error and exits abnormally.
In SiLK 3.21.0, rwbagtool added the --modify-inplace switch which correctly handles the case when an input file is also used as the output file. That switch causes rwbagtool to write the output to a temporary file first and then replace the original output file. The --backup-path switch may be used in conjunction with --modify-inplace to set the pathname where the original output file is copied.
A Bag is a set where each key is associated with a counter. rwbag(1) and rwbagbuild(1) are the primary tools used to create a Bag file. rwbagcat(1) prints a binary Bag file as text.
SiLK 3.15.0 introduced Aggregate Bags that are capable of storing multiple keys and counters. See rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), and rwaggbagtool(1) for more information.
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.
The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters in the Bag files are summed.
Sum the counters for each key for all Bag files given on the command line. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key is not present in an input file, a counter of zero is used. The result contains the union of the keys from the input Bag files. When no operation switch is specified on the command line, the add operation is the default. If addition causes a counter to exceed the maximum value, rwbagtool exits with an error.
Subtract from the first Bag file all subsequent Bag files. At least one Bag file must be specified, and any number of additional Bag files may be given. If a key does not appear in the first Bag file, rwbagtool assumes it has a value of 0. If subtracting a key's counters results in a non-positive number, the key does appear in the resulting Bag file. The result contains a subset of the keys in the first Bag file.
Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all input Bags do not appear in the output. At least one Bag file must be specified, and any number of additional Bag files may be given.
Cause the output to contain the maximum counter seen for each key. The output contains each key that appears in any input Bag. At least one Bag file must be specified, and any number of additional Bag files may be given.
Divide the first Bag file by the second Bag file. It is an error if only one Bag file or more than two Bag files are given. Every key in the first Bag file must appear in the second file; the second Bag may have keys that do not appear in the first, and those keys do not appear in the output. Since Bags do not support floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5
are rounded up). If the result of the division is less than 0.5, the key does not appear in the output.
Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range 1 to 18446744073709551614. This switch requires a single Bag as input. On overflow, the lower 64-bits of the result are used as the counter's value.
Compare the key/counter pairs in exactly two Bag files. It is an error if only one Bag file or more than two Bag files are specified. The keys in the output Bag are only those for which the comparison denoted by OPERATION is true when comparing the key's counter in the first Bag with the key's counter in the second Bag. The counters for all keys in the output have the value 1. Any key that does not appear in both input Bag files does not appear in the result. The possible OPERATION values are the strings:
lt
GetCounter(Bag1, key) < GetCounter(Bag2, key)
le
GetCounter(Bag1, key) <= GetCounter(Bag2, key)
eq
GetCounter(Bag1, key) == GetCounter(Bag2, key)
ge
GetCounter(Bag1, key) >= GetCounter(Bag2, key)
gt
GetCounter(Bag1, key) > GetCounter(Bag2, key)
The result of the above operation is an intermediate Bag file. The following switches are applied next to remove entries from the intermediate Bag:
Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a named pipe containing an IPset, or the name stdin
or -
to have rwbagtool read the IPset from the standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)
As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.
Cause the output to contain only those entries whose counter value is VALUE or higher. The allowable range is 1 to the maximum counter value (18446744073709551614); the default is 1.
Cause the output to contain only those entries whose counter value is VALUE or lower. The allowable range is 1 to the maximum counter value; the default is the maximum counter value.
Cause the output to contain only those entries whose key value is VALUE or higher. Default is 0 (or 0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.
Cause the output to contain only those entries whose key value is VALUE or higher. Default is 4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal notation.
The following switches control the output.
Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag {flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key is attributed to the counter for the maximum key; to prevent this, specify --maxcounter=4294967295
which removes all key-counter pairs whose counters do not fit into a key. (The --bin-ips switch on rwbagcat(1) allows one to invert a Bag file as it is being printed.) If inverting the Bag causes a counter to exceed the maximum value, rwbagtool exits with an error.
Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the intermediate Bag.
Specify the format of the IPset records that are written to the output when the --coverset switch is used. VERSION may be 2, 3, 4, 5 or the special value 0. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is checked for a version. The default version is 0. Since SiLK 3.11.0.
Use the default version for an IPv4 IPset and an IPv6 IPset. Use the --help switch to see the versions used for your SiLK installation.
Create a file that may hold only IPv4 addresses and is readable by all versions of SiLK.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.0 and later.
Create a file that may hold IPv4 or IPv6 addresses and is readable by SiLK 3.7 and later. These files are more compact that version 3 and often more compact than version 2.
Create a file that may hold only IPv6 addresses and is readable by SiLK 3.14 and later. When this version is specified, IPsets containing only IPv4 addresses are written in version 4. These files are usually more compact that version 4.
Write the resulting Bag or IPset to PATH, where PATH is a filename, a named pipe, the keyword stderr
to write the output to the standard error, or the keyword stdout
or -
to write the output to the standard output. If PATH names an existing file, rwbagtool exits with an error unless the --modify-inplace switch is given or the SILK_CLOBBER environment variable is set, in which case PATH is overwritten. If --output-path is not given, the output is written to the standard output. Attempting to write the binary output to a terminal causes rwbagtool to exit with an error.
Allow rwbagtool to overwrite an existing file and properly account for the output file (PATH) also being an input file. When this switch is given, rwbagtool writes the output to a temporary location first, then overwrites PATH. rwbagtool attempts to copy the permission, owner, and group from the original file to the new file. The switch is ignored when PATH does not exist or the output is the standard output or standard error. rwbagtool exits with an error when this switch is given and PATH is not a regular file. If rwbagtool encounters an error or is interrupted prior to closing the temporary file, the temporary file is removed. See also --backup-path. Since SiLK 3.21.0.
Move the file named by --output-path (PATH) to the path BACKUP immediately prior to moving the temporary file created by --modify-inplace over PATH. If BACKUP names a directory, the file is moved into that directory. This switch will overwrite an existing file. If PATH and BACKUP point to the same location, the output is written to PATH and no backup is created. If BACKUP cannot be created, the output is left in the temporary file and rwbagtool exits with a message and an error. rwbagtool exits with an error if this switch is given without --modify-inplace. Since SiLK 3.21.0.
Do not copy the notes (annotations) from the input files to the output file. Normally notes from the input files are copied to the output.
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
Open FILENAME and add the contents of that file to the header of the output file as an annotation. This switch may be repeated to add multiple annotations. Currently the application makes no effort to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file as an annotation.
Specify the compression library to use when writing output files. If this switch is not given, the value in the SILK_COMPRESSION_METHOD environment variable is used if the value names an available compression method. When no compression method is specified, output to the standard output or to named pipes is not compressed, and output to files is compressed using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined by which external libraries were found when SiLK was compiled. To see the available compression methods and the default method, use the --help or --version switch. SiLK can support the following COMP_METHOD values when the required libraries are available.
Do not compress the output using an external library.
Use the zlib(3) library for compressing the output, and always compress the output regardless of the destination. Using zlib produces the smallest output files at the cost of speed.
Use the lzo1x algorithm from the LZO real time compression library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead.
Use the snappy library for compression, and always compress the output regardless of the destination. This compression provides good compression with less memory and CPU overhead. Since SiLK 3.13.0.
Use lzo1x if available, otherwise use snappy if available, otherwise use zlib if available. Only compress the output when writing to a file.
Print the available options and exit.
Print the version number and information about how SiLK was configured, then exit the application.
In the following examples, the dollar sign ($
) represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash (\
) is used to indicate a wrapped line.
The examples assume the following contents for the files:
Bag1.bag Bag2.bag Bag3.bag Bag4.bag Mask.set
3| 10| 1| 1| 2| 8| 1| 1| 2
4| 7| 4| 2| 4| 10| 4| 3| 4
6| 14| 7| 32| 6| 14| 6| 4| 6
7| 23| 8| 2| 7| 12| 7| 4| 8
8| 2| 9| 8| 8| 6|
The examples use rwbagcat(1) to print the contents of the Bag files.
Adding Bag files produces a Bag whose keys are the set union of the keys in the input Bags. The counter for each key is the sum of the key's counters in each input Bag.
$ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
$ rwbagcat --key-format=decimal Bag-sum.bag
1| 1|
3| 10|
4| 9|
6| 14|
7| 55|
8| 4|
$ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
$ rwbagcat --key-format=decimal Bag-sum2.bag
1| 1|
2| 8|
3| 10|
4| 19|
6| 28|
7| 67|
8| 4|
9| 8|
The --subtract switch subtracts from the key/counter pairs in the first Bag file the key/counter pairs in all other Bag file arguments. Keys that are not present in the first argument are ignored. If subtraction results in a counter value of zero or less, the key is removed from the result.
$ rwbagtool --subtract Bag1.bag Bag2.bag > Bag-diff.bag
$ rwbagcat --key-format=decimal Bag-diff.bag
3| 10|
4| 5|
6| 14|
$ rwbagtool --subtract Bag2.bag Bag1.bag > Bag-diff2.bag
$ rwbagcat --key-format=decimal Bag-diff2.bag
1| 1|
7| 9|
The output produced by the --minimize switch contains only the keys that appear in all of input Bags. For each key, the counter is the minimum value for that key in any input Bag.
$ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
$ rwbagcat --key-format=decimal Bag-min.bag
4| 2|
7| 12|
The keys of the Bag file produced by --maximize are the same as the keys produced by --add; that is, the union of all keys in the input files. For each key, its counter is the maximum value seen for that key in any single input Bag file.
$ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
$ rwbagcat --key-format=decimal Bag-max.bag
1| 1|
2| 8|
3| 10|
4| 10|
6| 14|
7| 32|
8| 2|
9| 8|
The --divide switch requires exactly two Bag files as input. The keys in the first Bag argument must be either the same as or a subset of those in the second argument. The counter for each key in the first Bag file is divided by that key's counter in the second file. If the result of the division is less than 0.5, the key is not included in the output.
$ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
$ rwbagcat --key-format=decimal Bag-div1.bag
1| 1|
4| 1|
7| 8|
When the order of the Bag file arguments is reversed an error is reported.
$ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
rwbagtool: Error dividing bags; key 6 not in divisor bag
To work around this issue, use the --coverset switch to create a copy of Bag4.bag that contains only the keys in Bag2.bag.
$ rwbagtool --coverset Bag2.bag > Bag2-keys.set
$ rwbagtool --intersect=Bag2-keys.set Bag4.bag > Bag4-small.bag
$ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
$ rwbagcat --key-format=decimal Bag-div2.bag
1| 1|
4| 2|
8| 3|
The following command is the same as the above except the IPset and Bag files are piped between the tools instead of being written to disk:
$ rwbagtool --coverset Bag2.bag \
| rwbagtool --intersect=- Bag4.bag \
| rwbagtool --divide - Bag2.bag \
| rwbagcat --key-format=decimal
1| 1|
4| 2|
8| 3|
The --scalar-multiply switch multiplies each counter in the input Bag by the specified value. Exactly one Bag file argument is required.
$ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
$ rwbagcat --key-format=decimal Bag-multiply.bag
3| 70|
4| 49|
6| 98|
7| 161|
8| 14|
Use two rwbagtool commands if multiple operations are desired.
$ rwbagtool --add Bag1.bag Bag2.bag \
| rwbagtool --scalar-multiply=3 --output-path=Bag12-multi.bag
$ rwbagcat --key-format=decimal Bag12-multi.bag
1| 3|
3| 30|
4| 27|
6| 42|
7| 165|
8| 12|
The --compare switch takes an argument that specifies how to compare the counters in two Bag files, and it requires exactly two Bag files as input. For each key that appears in both Bag files, the counter value in the first file is compared to counter value in the second file. If the comparison is true, the key appears in the resulting Bag file with a counter of 1. If the comparison is false, the key is not present in the output file. Keys that appear in only one of the input files are ignored.
The following comparisons operate on Bag1.bag and Bag2.bag which have as common keys 4, 7, and 8.
Find counters in Bag1.bag that are less than those in Bag2.bag:
$ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
$ rwbagcat --key-format=decimal Bag-lt.bag
7| 1|
Find counters in Bag1.bag that are less than or equal to those in Bag2.bag:
$ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
$ rwbagcat --key-format=decimal Bag-le.bag
7| 1|
8| 1|
Find counters in Bag1.bag that are equal to those in Bag2.bag:
$ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
$ rwbagcat --key-format=decimal Bag-eq.bag
8| 1|
Find counters in Bag1.bag that are greater than or equal to those in Bag2.bag:
$ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
$ rwbagcat --key-format=decimal Bag-ge.bag
4| 1|
8| 1|
Find counters in Bag1.bag that are greater than those in Bag2.bag:
$ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
$ rwbagcat --key-format=decimal Bag-gt.bag
4| 1|
A cover set is an IPset file that contains the keys that are present in any of the input Bag files. In other words, it is the union of the keys converted to an IPset. Since an operation switch is not provided in this command, an implicit --add operation is performed on the Bag files prior to creating the cover set. (rwsetcat(1) prints the contents of an IPset file as text.)
$ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
$ rwsetcat --key-format=decimal Cover.set
1
2
3
4
6
7
8
9
One use of a cover set is to limit the contents of a Bag file to keys that are present in a second Bag file:
$ rwbagtool --coverset --output-path=Cover.set Bag1.bag
$ rwbagtool --intersect=Cover.set Bag2.bag > Bag1-mask-Bag2.bag
$ rwbagcat --key-format=decimal Bag1-mask-Bag2.bag
4| 2|
7| 32|
8| 2|
To mask the contents of Bag2.bag by the keys that are not present in Bag1.bag:
$ rwbagtool --complement-intersect=Cover.set Bag2.bag \
> Bag1-notmask-Bag2.bag
$ rwbagcat --key-format=decimal Bag1-notmask-Bag2.bag
1| 1|
The output of the --invert switch is a Bag file that counts the number of times each counter is present in the input Bag file.
$ rwbagtool --invert Bag1.bag > Bag-inv1.bag
$ rwbagcat --key-format=decimal Bag-inv1.bag
2| 1|
7| 1|
10| 1|
14| 1|
23| 1|
$ rwbagtool --invert Bag2.bag > Bag-inv2.bag
$ rwbagcat --key-format=decimal Bag-inv2.bag
1| 1|
2| 2|
32| 1|
$ rwbagtool --invert Bag3.bag > Bag-inv3.bag
$ rwbagcat --key-format=decimal Bag-inv3.bag
8| 2|
10| 1|
12| 1|
14| 1|
When multiple Bag files are specified on the command line, the files are added prior to creating the inverted Bag. Even though the counter 2 appears three times in the files Bag1.bag and Bag2.bag, the key 2 is not present in the following since the add operation is performed first.
$ rwbagtool --invert Bag1.bag Bag2.bag \
| rwbagcat --key-format=decimal
1| 1|
4| 1|
9| 1|
10| 1|
14| 1|
55| 1|
The --intersect switch takes an IPset file as an argument and limits the keys of the Bag produced by rwbagtool to only those keys that appear in the IPset file.
$ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
$ rwbagcat --key-format=decimal Bag-mask.bag
4| 7|
6| 14|
8| 2|
The --complement-intersect switch limits the output to only those keys that do not appear in the IPset file.
$ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
$ rwbagcat --key-format=decimal Bag-mask2.bag
3| 10|
7| 23|
See also the next section.
In addition to limiting the result of rwbagtool to keys that appear or do not appear in an IPset file (cf. previous section), numeric limits may be used to restrict the keys or counters that in the resulting Bag file with use of the --minkey, --maxkey, --mincounter, and --maxcounter switches.
$ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
$ rwbagcat --key-format=decimal Bag-res1.bag
1| 1|
3| 10|
4| 9|
$ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag
$ rwbagcat --key-format=decimal Bag-res2.bag
3| 10|
4| 9|
6| 14|
$ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
$ rwbagcat --key-format=decimal Bag-res3.bag
7| 55|
$ rwbagtool --subtract --maxcounter=9 Bag1.bag Bag2.bag \
> Bag-res4.bag
$ rwbagcat --key-format=decimal Bag-res4.bag
4| 5|
To share a Bag file with a user who has a version of SiLK that includes different compression libraries, it may be necessary to change the the compression-method of the Bag.
It is not possible to change the compression-method directly. A new file must be created first, and then you may then replace the old file with the new file.
To create a new file that uses a different compression-method of the Bag file A.bag, use rwbagtool with the --add switch and specify the desired argument:
$ rwbagtool --add --compression=none --output-path=A1.bag A.bag
Unfortunately, the Bag tools do not allow changing the key type or counter type of a Bag file. To change the types, use rwbagcat(1) to write the Bag as text and rwbagbuild(1) to convert the text back to a Bag file.
$ rwbagcat Bag1.bag \
| rwbagbuild --bag-input=- --output-path=Bag1-typed.bag \
--key-type=sport --counter-type=sum-bytes
Use rwfileinfo(1) to see the type of the key and counter.
$ rwfileinfo --field=bag Bag1-typed.bag
Bag1-typed.bag:
bag key: sPort @ 4 octets; counter: sum-bytes @ 8 octets
Alternatively, one may use PySiLK (see pysilk(3)) to modify the key type and counter type.
$ cat bag-type.py
import sys
from silk import *
key_type = sys.argv[1]
counter_type = sys.argv[2]
old_file = sys.argv[3]
new_file = sys.argv[4]
old = Bag.load(old_file, key_type=IPv4Addr)
new = Bag(old, key_type=key_type, counter_type=counter_type)
new.save(new_file)
$
$ python bag-type.py sipv4 sum-packets Bag1.bag Bag1-type2.bag
$ rwfileinfo --field=bag Bag1-type2.bag
Bag1-type2.bag:
bag key: sIPv4 @ 4 octets; counter: sum-packets @ 8 octets
This environment variable is used as the value for the --ipset-record-version when that switch is not provided. Since SiLK 3.7.0.
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
This environment variable is used as the value for --compression-method when that switch is not provided. Since SiLK 3.13.0.
The --modify-inplace switch was added in SiLK 3.21. When --backup-path is also given, there is a small time window when the original file does not exist: the time between moving the original file to the backup location and moving the temporary file into place.
rwbagtool should handle counter overflow more consistently and gracefully.
rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwaggbag(1), rwaggbagbuild(1), rwaggbagcat(1), rwaggbagtool(1), silk(7), zlib(3)