The netsa.script module provides a common framework for building SiLK-based analysis scripts. This framework is intended to make scripts re-usable and automatable without much extra work on the part of script authors. The primary concerns of the scripting framework are providing metadata for cataloging available scripts, standardizing handling of command-line arguments (particularly for flow data input), and locating output files.
Here’s an example of a simple Python script using the netsa.script framework.
First is a version without extensive comments, for reading clarity. Then the script is repeated with comments explaining each section.
#!/usr/bin/env python
# Import the script framework under the name "script".
from netsa import script
# Set up the metadata for the script, including the title, what it
# does, who wrote it, who to ask questions about it, etc.
script.set_title("Sample Framework Script")
script.set_description("""
An example script to demonstrate the basic features of the
netsa.script scripting framework. This script counts the
number of frobnitzim observed in each hour (up to a maximum
volume of frobs per hour.)
""")
script.set_version("0.1")
script.set_contact("H. Bovik <hbovik@example.org>")
script.set_authors(["H. Bovik <hbovik@example.org>"])
script.add_int_param("frob-limit",
"Maximum volume of frobs per hour to observe.",
default=10)
script.add_float_param("frobnitz-sensitivity",
"Sensitivity (between 0.0 and 1.0) of frobnitz categorizer.",
default=0.61, expert=True, minimum=0.0, maximum=1.0)
script.add_flow_params(require_pull=True)
script.add_output_file_param("output-path",
"Number of frobnitzim observed in each hour of the flow data.",
mime_type="text/csv")
# See the text for discussion of the next two functions.
def process_hourly_data(out_file, flow_params, frob_limit, frob_sense):
    ...
def main():
    frob_limit = script.get_param("frob-limit")
    frobnitz_sensitivity = script.get_param("frobnitz-sensitivity")
    out_file = script.get_output_file("output-path")
    for hour_params in script.get_flow_params().by_hour():
        process_hourly_data(out_file, hour_params, frob_limit,
                            frobnitz_sensitivity)
script.execute(main)
Let’s break things down by section:
#!/usr/bin/env python
from netsa import script
This is basic Python boilerplate. Any other libraries we use would also be imported at this time.
script.set_title("Sample Framework Script")
script.set_description("""
An example script to demonstrate the basic features of the
netsa.script scripting framework. This script counts the
number of frobnitzim observed in each hour (up to a maximum
volume of frobs per hour.)
""")
script.set_version("0.1")
script.set_contact("H. Bovik <hbovik@example.org>")
script.set_authors(["H. Bovik <hbovik@example.org>"])
Script metadata allows users to more easily find out information about a script, and browse available scripts stored in a central repository. The above calls define all of the metadata that the netsa.script framework currently supports. It is possible that a future version will include additional metadata fields.
script.add_int_param("frob-limit",
"Maximum volume of frobs per hour to observe.",
default=10)
script.add_float_param("frobnitz-sensitivity",
"Sensitivity (between 0.0 and 1.0) of frobnitz categorizer.",
default=0.61, expert=True, minimum=0.0, maximum=1.0)
Script parameters are defined by calling netsa.script.add_X_param (where X is a type) for each parameter. Depending on the type of the parameter, there may be additional configuration options (like minimum and maximum for the float parameter above) available. See the documentation for each function later in this document.
Expert parameters are tuning parameters that are intended for expert use only. An expert parameter is created by setting expert to True when creating a new parameter. This parameter will then be displayed only if the user asks for --help-expert, and the normal help will indicate that expert options are available.
script.add_flow_params(require_pull=True)
Parameters involving flow data are handled separately, in order to ensure that flows are handled consistently across all of our scripts. The netsa.script.add_flow_params function is used to add all of the flow-related command-line arguments at once. There is currently only one option. If the require_pull option is set, the flow data must come from an rwfilter data pull (including switches like --start-date, --end-date, --class, etc.). If require_pull is not set, then it is also possible for input files or pipes to be given on the command-line.
script.add_output_file_param("output-path",
"Number of frobnitzim observed in each hour of the flow data.",
mime_type="text/csv")
Every output file (not temporary working file) that the script produces must also be defined using calls to the framework. This ensures that when an automated tool is used to run the script, it can find all of the relevant output files. It’s preferable, but not required, to include a MIME content-type (like "text/csv") and a short description of the contents of the file.
def process_hourly_data(out_file, flow_params, frob_limit, frob_sense):
    ...
In this example, the process_hourly_data function would be expected to use the functions in netsa.util.shell to acquire and process flow data for each hour (based on the flow_params argument). The details have been elided for simplicity in this example.
def main():
    frob_limit = script.get_param("frob-limit")
    frobnitz_sensitivity = script.get_param("frobnitz-sensitivity")
    out_file = script.get_output_file("output-path")
    for hour_params in script.get_flow_params().by_hour():
        process_hourly_data(out_file, hour_params, frob_limit,
                            frobnitz_sensitivity)
It is important that no work is done outside the main function (which can be given any name you wish). If instead you do work in the body of the file outside of a function, that work will be done whether or not the script has actually been asked to do work. (For example, if the script is given --help, it will not normally call your main function.) So make sure everything is in here.
script.execute(main)
The final statement in the script should be a call to netsa.script.execute, as shown above. This allows the framework to process any command-line arguments (including producing help output, etc.), then call your main function, and finally do clean-up work after the completion of your script.
See the documentation for functions in this module for more details on individual features, including further examples.
This exception represents an error in the arguments provided to a script at the command-line. For example, ParamError('foo', '2x5', 'not a valid integer') is the exception generated when the value given for an integer param is not parsable, and will produce the following error output when thrown from a script’s main function:
<script-name>: Invalid foo '2x5': not a valid integer
This exception represents an error reported by the script that should be presented in a standard way. For example, UserError('your message here') will produce the following error output when thrown from a script’s main function:
<script-name>: your message here
This exception represents an error in script definition or an error in processing script data. This is thrown by some netsa.script calls.
The following functions define “metadata” for the script—they provide information about the name of the script, what the script is for, who to contact with problems, and so on. Automated tools can use this information to allow users to browse a list of available scripts.
Set the title for this script. This should be the human-readable name of the script, and denote its purpose.
Set the description for this script. This should be a longer human-readable description of the script’s purpose, including simple details of its behavior and required inputs.
Set the version number of this script. This can take any form, but the standard major.minor(.patch) format is recommended.
Set the package name for this script. This should be the human-readable name of a collection of scripts.
Set the point of contact email for support of this script, which must be a single string. The form should be suitable for treatment as an email address. The recommended form is a string containing:
Full Name <full.name@contact.email.org>
Set the list of authors for this script, which must be a list of strings. It is recommended that each author be listed in the form described for set_contact.
Add another author to the list of authors for this script, which must be a single string. See set_authors for notes on the content of this string.
These calls are used to add parameters to a script. When the script is called from the command-line, these are command-line arguments. When a GUI is used to invoke the script, the params might be presented in a variety of ways. This need to support both command-line and GUI access to script parameters is the reason that they’ve been standardized here. It’s also the reason that you’ll find no “add an argument with this arbitrary handler function” here.
If you do absolutely need deeper capabilities than are provided here, you can use one of the basic param types and then do additional checking in the main function. Note, however, that a GUI will not aid users in choosing acceptable values for params defined in this way. Also, make sure to raise ParamError with appropriate information when you reject a value, so that the error can be most effectively communicated back to the user.
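As a minimal sketch of such additional checking, the snippet below validates a value and rejects it with an exception. The ParamError class defined here is only a local stand-in that mirrors the (name, value, message) form shown earlier, so that the example runs without the framework installed. The parameter name "sample-rate" and the helper check_sample_rate are hypothetical. In a real script the value would come from script.get_param inside the main function, and the framework's own ParamError would be raised instead.

```python
class ParamError(Exception):
    """Local stand-in mirroring the ParamError(name, value, message) form."""

    def __init__(self, param_name, param_value, message):
        super().__init__(param_name, param_value, message)
        self.param_name = param_name
        self.param_value = param_value
        self.message = message

    def __str__(self):
        return "Invalid %s %r: %s" % (
            self.param_name, self.param_value, self.message)


def check_sample_rate(value):
    """Accept only powers of two, a rule no basic param type expresses."""
    if value < 1 or (value & (value - 1)) != 0:
        # Report the offending value and a reason the user can act on.
        raise ParamError("sample-rate", str(value), "not a power of two")
    return value
```

The check runs after the basic integer parsing has already succeeded, so the only job left for the script is the rule that the basic param type cannot express.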
Add a text parameter to this script. This parameter can later be fetched as a str by the script using netsa.script.get_param. The required, default, default_help, and expert arguments are used by all add_X_param calls, but each kind of parameter also has additional features that may be used. See below for a list of these features for text params.
Example: Add a new parameter which is required for the script to run.
add_text_param("graph-title",
"Display this title on the output graph.",
required=True)
It is an error if this parameter is not set, and the script will exit with a usage message when it is run at the command-line.
Example: Add a new parameter with a default value of “” (the empty string):
add_text_param("graph-comment",
"Display this comment on the output graph.",
default="")
If the parameter is not provided, the default value will be used.
Example: Display something different in the help text than the actual default value:
add_text_param("graph-date",
"Display data for the given date.",
default=date_for_today(), default_help="today")
Sometimes a default value should be computed but not displayed as the default to the user when they ask for help at the command-line. In this case, a default value should be provided (which will be displayed to users of a GUI), while a value for default_help will be presented in the --help output. In addition, GUIs will also display the value of default_help in some way next to the entry field for this parameter.
It is perfectly legal to provide a value for default_help and not provide a value for default. This makes sense when the only way to compute the default value for the field is at actual execution time. (For example, if the end-date defaults to be the same as the provided start-date.)
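A sketch of resolving such a default at execution time follows. It assumes, in line with the description of get_param later in this document, that an optional parameter with no default comes back as None. The helper resolve_end_date is hypothetical, and the plain datetime arguments stand in for values that a real script would fetch with script.get_param.

```python
from datetime import datetime


def resolve_end_date(start_date, end_date):
    """Fall back to start_date when no end date was supplied (None)."""
    return start_date if end_date is None else end_date


start = datetime(2009, 1, 1)
# No end date given: the start date is reused.
print(resolve_end_date(start, None))
# An explicit end date always wins over the fallback.
print(resolve_end_date(start, datetime(2009, 1, 3)))
```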
Example: Add a new “expert” parameter:
add_text_param("gnuplot-extra-commands",
"Give these extra commands to gnuplot when writing output.",
expert=True)
Expert parameters are not listed for users unless they explicitly ask for them. (For example, by using --help-expert at the command line.)
Other keyword arguments meaningful for text params:
- regex
- Require strings to match this regular expression.
Example: Add a new text parameter that is required to match a specific pattern for phone numbers:
add_text_param("phone-number",
"Send reports to this telephone number.",
regex=r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
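To see which values such a pattern accepts, the sketch below applies it with Python's re module. It assumes that the whole value must match the expression (re.fullmatch). This document does not say whether the framework anchors the pattern, so treat that as an assumption. The helper name matches_pattern is hypothetical.

```python
import re

# Same pattern as in the phone-number example above.
PHONE_PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"


def matches_pattern(value, pattern=PHONE_PATTERN):
    """Return True when the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


for candidate in ("412-555-0100", "4125550100", "412-555-01000"):
    print(candidate, matches_pattern(candidate))
```

If the framework turns out to search rather than anchor, a script that needs strict matching can add ^ and $ to its own pattern.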
Add an integer parameter to this script. This parameter can later be fetched as an int by the script using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Other keyword arguments meaningful for integer parameters:
- minimum
- Only values greater than or equal to this value are allowed for this param.
- maximum
- Only values less than or equal to this value are allowed for this param.
Example: Add a new int parameter which is required to be in the range 0 <= x <= 65535.
add_int_param("targeted-port",
"Search for attacks targeting this port number.",
required=True, minimum=0, maximum=65535)
Add a floating-point parameter to this script. This parameter can later be fetched as a float by the script using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Other keyword arguments meaningful for floating-point parameters:
- minimum
- Only values greater than or equal to this value are allowed for this param.
- maximum
- Only values less than or equal to this value are allowed for this param.
Add a date parameter to this script. This parameter can later be fetched by the script as a datetime.datetime object using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Add a label parameter to this script. This parameter can later be fetched by the script as a Python str using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Other keyword arguments meaningful for label params:
- regex
- Require strings to match this regular expression, instead of the default r"[^\S,]+" (no white space or commas).
Example: Add a new label parameter that is required to match a specific pattern for phone numbers:
add_label_param("output-label",
"Store output to the destination with this label.",
regex=r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
Add a file parameter to this script. This parameter can later be fetched by the script as a Python str filename using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
When the script is run at the command-line, an error will be reported to the user if they specify a file that does not exist, or the path of a directory.
Other keyword arguments meaningful for file params:
- mime_type
- The expected MIME Content-Type of the file, if any.
Add a directory parameter to this script. This parameter can later be fetched by the script as a Python str filename using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
When the script is run at the command-line, an error will be reported to the user if they specify a directory that does not exist, or the path of a file.
Add a path parameter to this script. This parameter can later be fetched by the script as a Python str using netsa.script.get_param. The required, default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Add a flag parameter to this script. This parameter can later be fetched by the script as a bool using netsa.script.get_param. The default, default_help, and expert arguments are described in the help for netsa.script.add_text_param.
Returns the value of the parameter given by the str argument name. This parameter will be in the type specified for the param when it was added (for example, date parameters will return a datetime.datetime object.) Note that a parameter with no default that is not required may return None.
Returns any extra un-named arguments from the command-line.
Returns the current verbosity level (default 0) for the script invocation. The message function may be used to automatically emit messages based on the verbosity level set for the script. Verbosity is set from the command-line via the --verbose or -v flags.
Writes the string text to stderr, as long as the script’s verbosity is greater than or equal to min_verbosity. Verbosity is set from the command-line via the --verbose or -v flags. The current verbosity level may be retrieved by using the get_verbosity function.
Use this function to write debugging or informational messages from your script for command-line use. For example, writing out which file you are processing, or what stage of processing is in progress.
Do not use it to write out important information such as error messages or actual output. (See UserError or add_output_file_param and add_output_dir_param for error messages and output.)
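The verbosity rule described above can be sketched with a small stand-in. The function emit_message below is hypothetical and is not the framework's message function: its name, its argument order, and the trailing newline it appends are assumptions made for illustration. The verbosity argument stands in for the level that the framework would take from --verbose or -v.

```python
import sys


def emit_message(min_verbosity, text, verbosity=0, stream=None):
    """Write text when verbosity >= min_verbosity; report whether it was written."""
    if stream is None:
        stream = sys.stderr  # informational output goes to stderr, not stdout
    if verbosity >= min_verbosity:
        stream.write(text + "\n")
        return True
    return False


# At verbosity 1, a level-1 message is shown and a level-2 message is not.
emit_message(1, "processing hour 2009/01/01T00", verbosity=1)
emit_message(2, "rwfilter arguments assembled", verbosity=1)
```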
In order to standardize the large number of scripts that work with network flow data using the SiLK tool suite, the following calls can be used to work with flow data input.
Add a note that will automatically be included in SiLK data pulls generated by this script. This will be included only by rwfilter pulls created by this script using Flow_params.
Add standard flow parameters to this script. The following params are added by default, but individual params may be disabled by including their names in the without_params argument. You might wish to disable the --type param, for example, if your script will run the same pull multiple times, once with --type=in,inweb, then again with --type=out,outweb. (Of course, you might then also want to add in-type and out-type params to the script.)
- --class
- Req Arg. Class of data to process
- --type
- Req Arg. Type(s) of data to process within the specified class. The type names and default type(s) vary by class. Use all to process every type for the specified class. Use rwfilter --help for details on valid class/type pairs.
- --flowtypes
- Req Arg. Comma separated list of class/type pairs to process. May use all for class and/or type. This is an alternate way to specify class/type; this switch cannot be used with --class and --type.
- --sensors
- Req Arg. Comma separated list of sensor names, sensor IDs, and ranges of sensor IDs. Valid sensors vary by class. Use mapsid to see a mapping of sensor names to IDs and classes.
- --start-date
- Req Arg. First hour of data to process. Specify date in YYYY/MM/DD[:HH] format: time is in UTC. When no hour is specified, the entire date is processed. Def. Start of today
- --end-date
- Req Arg. Final hour of data to process specified as YYYY/MM/DD[:HH]. When no hour specified, end of day is used unless start-date includes an hour. When switch not specified, defaults to value in start-date.
If the require_pull argument to netsa.script.add_flow_params is not True, input filenames may be specified bare on the command-line, and the following additional options are recognized:
- --input-pipe
- Req Arg. Read SiLK flow records from a pipe: stdin or path to named pipe. No default
- --xargs (expert)
- Req Arg. Read list of input file names from a file or pipe pathname or stdin. No default
The values of these parameters can later be retrieved as a netsa.script.Flow_params object using netsa.script.get_flow_params.
Returns a Flow_params object encapsulating the rwfilter flow selection parameters the script was invoked with. This object is filled in based on the command-line arguments described in add_flow_params.
This object represents the flow selection arguments to an rwfilter data pull. In typical use it is built automatically from command-line arguments by the netsa.script.get_flow_params call. Afterwards, methods such as by_hour are used to modify the scope of the data pull, and then the parameters are included in a call to rwfilter using the functions in netsa.util.shell.
Example: Process SMTP data from the user’s requested flow data:
netsa.util.shell.run_parallel(
["rwfilter %(flow_params)s --protocol=6 --aport=25 --pass=stdout",
"rwuniq --fields=sip",
">>output_file.txt"],
vars={'flow_params': script.get_flow_params()})
Example: Separately process each hour’s SMTP data from the user’s requested flow data:
flow_params = script.get_flow_params()
# Iterate over each hour individually
for hourly_params in flow_params.by_hour():
    # Format ISO-style datetime for use in a filename
    sdate = iso_datetime(hourly_params.get_start_date())
    netsa.util.shell.run_parallel(
        ["rwfilter %(flow_params)s --protocol=6 --pass=stdout",
         "rwuniq --fields=dport",
         ">>output_file_%(sdate)s.txt"],
        vars={'flow_params': hourly_params,
              'sdate': sdate})
Given a Flow_params object including a start-date and an end-date, returns an iterator yielding a Flow_params for each individual day in the time span.
If the original Flow_params starts or ends on an hour that is not midnight, the first or last yielded pulls will not be for full days. All of the other pulls will be full days stretching from midnight to midnight.
See also by_hour which iterates over the time span of the Flow_params by hours instead of days.
Raises a ScriptError if the Flow_params has no date information (for example, the script user specified input files rather than a data pull.) This can be prevented by using require_pull in your call to script.add_flow_params.
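The day-splitting behavior described above, with partial first and last days, can be sketched using plain datetime values. The function split_by_day is a hypothetical stand-in and not the framework's implementation. It assumes that both bounds are whole hours and that the end bound is inclusive, as with --start-date and --end-date.

```python
from datetime import datetime, timedelta


def split_by_day(start, end):
    """Yield (first_hour, last_hour) pairs, one per calendar day in the span."""
    if end < start:
        raise ValueError("end precedes start")
    current = start
    while current <= end:
        # Last hour (23:00) of the day that contains `current`.
        day_last_hour = current.replace(hour=23, minute=0, second=0,
                                        microsecond=0)
        # The final day may stop early if the span ends before 23:00.
        yield current, min(day_last_hour, end)
        # Continue from midnight of the following day.
        current = day_last_hour + timedelta(hours=1)


for first, last in split_by_day(datetime(2009, 1, 1, 22),
                                datetime(2009, 1, 3, 1)):
    print(first.strftime("%Y/%m/%dT%H"), "to", last.strftime("%Y/%m/%dT%H"))
```

Only the first and last pairs can be shorter than a full day. Every pair in between runs from hour 00 to hour 23.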
Given a Flow_params object including a start-date and an end-date, returns an iterator yielding a new Flow_params object for each hour in the time period. Each yielded object is identical to this one except that its date range is narrowed to that single hour.
Example (strings are schematic of the Flow_params involved):
>>> # Note: Flow_params cannot actually take a str argument like this.
>>> some_flows = Flow_params('--type in,inweb --start-date 2009/01/01T00 '
...                          '--end-date 2009/01/01T02')
>>> list(some_flows.by_hour())
[netsa.script.Flow_params('--type in,inweb --start-date 2009/01/01T00 '
'--end-date 2009/01/01T00'),
netsa.script.Flow_params('--type in,inweb --start-date 2009/01/01T01 '
'--end-date 2009/01/01T01'),
netsa.script.Flow_params('--type in,inweb --start-date 2009/01/01T02 '
'--end-date 2009/01/01T02')]
See also by_day which iterates over the time span of the Flow_params by days instead of hours.
Raises a ScriptError if the Flow_params has no date information (for example, the script user specified input files rather than a data pull.) This can be prevented by using require_pull in your call to script.add_flow_params.
Given a Flow_params object including a data pull, returns an iterator yielding a Flow_params for each individual sensor defined in the system.
Returns the bundle of flow selection parameters as a list of strings suitable for use as command-line arguments in an rwfilter call. This is automatically called by the netsa.util.shell routines when a Flow_params object is used as part of a command.
Returns the rwfilter pull --end-date argument as a datetime.datetime object.
Returns any files given on the command-line for an rwfilter pull as a str.
Returns the rwfilter pull --start-date argument as a datetime.datetime object.
Returns True if this Flow_params object represents processing of already retrieved files.
Returns True if this Flow_params object represents a data pull from the repository. (i.e. it contains selection switches.)
Returns a new Flow_params object in which the arguments in this call have replaced the parameters in self, but all other parameters are the same.
Raises a ScriptError if the new parameters are inconsistent or incorrectly typed.
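The copy-with-changes pattern described here can be sketched with a frozen dataclass. PullParams and its field names are hypothetical stand-ins. The keyword names that the real method accepts are not listed in this section, and the real method raises ScriptError where this sketch would raise TypeError for an unknown field.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PullParams:
    """Hypothetical stand-in for a bundle of flow selection parameters."""
    flow_class: Optional[str] = None
    flow_type: Optional[str] = None
    sensors: Optional[str] = None

    def using(self, **changes):
        """Return a copy with the given fields replaced; self is unchanged."""
        return replace(self, **changes)


base = PullParams(flow_class="all", flow_type="in,inweb")
# Derive a second pull that differs only in its type.
outbound = base.using(flow_type="out,outweb")
print(base)
print(outbound)
```

Because the original object is never modified, one set of user-supplied parameters can be reused for several pulls, such as the inbound and outbound pulls mentioned in the description of add_flow_params.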
Every output file that a script produces needs to be registered with the system, so that automated tools can be sure to collect everything. Some scripts produce one or more set outputs. For example “the report”, or “the HTML version of the report”. Others produce a number of outputs based on the content of the data they process. For example “one image for each host we identify as suspicious.”
Add an output file parameter to this script. This parameter can later be fetched by the script as a Python str filename or a Python file object using netsa.script.get_output_file_name or netsa.script.get_output_file. Note that if you ask for the file name, you may wish to handle the filenames stdout, stderr, and - specially to be consistent with other tools. (See the documentation of netsa.script.get_output_file_name for details.) Output file parameters are required by default. If an output file parameter is not required, the implication is that if the user does not specify this argument, then this output is not produced.
You should probably not use default values for output file parameters other than "stdout" and "stderr".
In keeping with the behavior of the SiLK tools, it is an error for the user to specify an output file that already exists. If the environment variable SILK_CLOBBER is set, this restriction is relaxed and existing output files may be overwritten.
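A script that writes files on its own, outside the framework's output parameters, could apply the same rule with a check like the sketch below. The helper check_output_allowed is hypothetical. It also assumes that the mere presence of SILK_CLOBBER in the environment counts as "set", since this document does not say how the variable's value is interpreted.

```python
import os


def check_output_allowed(path, environ=None):
    """Raise FileExistsError if path exists and clobbering is not enabled."""
    if environ is None:
        environ = os.environ
    # Assumption: any value, even an empty one, counts as "set".
    clobber = "SILK_CLOBBER" in environ
    if os.path.exists(path) and not clobber:
        raise FileExistsError(path)
```

Passing the environment as an argument keeps the check easy to test without changing the real process environment.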
The mime_type argument is advisory, but it should be set to an appropriate MIME content type for the output file. The framework will not report erroneous types, nor will it automatically convert from one type to another. Examples:
- text/plain
- Human readable text file.
- text/csv
- Comma-separated-value file.
- application/x-silk-flows
- SiLK flow data
- application/x-silk-ipset
- SiLK ipset data
- application/x-silk-bag
- SiLK bag data
- application/x-silk-pmap
- SiLK prefix map data
- image/png
- etc. Various standard formats, many of which are listed on IANA’s website.
It is by no means necessary to provide a useful MIME type, but it is helpful to automated systems that wish to interpret or display the output of your script.
The description argument may also be provided, with a long-form text description of the contents of this output file. Note that description describes the contents of the file, while help describes the meaning of the command-line argument.
Returns the filename for the output parameter name. Note that many SiLK tools treat the names stdout, stderr, and - as meaning something special. stdout and - imply the output should be written to standard out, and stderr implies the output should be written to standard error. It is not required that you handle these special names, but it helps with interoperability. Note that you may need to take care when passing these filenames to SiLK command-line tools for output or input locations, for the same reason.
If you use netsa.script.get_output_file, it will automatically handle these special filenames.
If this output file is optional, and the user has not specified a location for it, this function will return None.
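For a script that asks for the file name and wants to honor the special names itself, a minimal sketch of the mapping described above might look like this. The helper open_output is hypothetical and is not part of the framework. It simply treats stdout and - as standard output and stderr as standard error, and opens anything else as a regular file.

```python
import sys


def open_output(name, append=False):
    """Open an output location, honoring the names stdout, - and stderr."""
    if name in ("stdout", "-"):
        return sys.stdout
    if name == "stderr":
        return sys.stderr
    # Any other name is treated as an ordinary path.
    return open(name, "a" if append else "w")
```

A caller should avoid closing the returned object when it is one of the standard streams, since later output from the script would then fail.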
Returns an open file object for the output parameter name. The special names stdout and - are both translated to standard output, and stderr is translated to standard error.
If you need the output file name, use netsa.script.get_output_file_name instead.
If append is True, then the file is opened for append. Otherwise it is opened for write.
Add an output directory parameter to this script. This parameter can later be used to construct a str filename or a Python file object using netsa.script.get_output_dir_file_name or netsa.script.get_output_dir_file. Unlike most parameters, output directory parameters never have default values, and are required by default. If an output directory parameter is not required, the implication is that if the user does not specify this argument, then this output is not produced.
See add_output_file_param for the meanings of the description and mime_type arguments. In this context, these arguments provide default values for files created in this output directory. Each individual file can be given its own mime_type and description when using the netsa.script.get_output_dir_file_name and netsa.script.get_output_dir_file functions.
Returns the path for the file named file_name in the output directory specified by the parameter dir_name. Also lets the netsa.script system know that this output file is about to be used. If provided, the description and mime_type arguments have meanings as described in add_output_file_param. If these arguments are not provided, the defaults from the call where dir_name was defined in add_output_dir_param are used.
If the output directory parameter is optional, and the user has not specified a location for it, this function will return None.
Returns an open file object for the file named file_name in the output directory specified by the parameter dir_name. Also lets the netsa.script system know that this output file is about to be used. If provided, the description and mime_type arguments have meanings as described in add_output_file_param. If these arguments are not provided, the defaults from the call where dir_name was defined in add_output_dir_param are used.
If the output dir param is optional, and the user has not specified a location for it, this function will return None.
If append is True, the file is opened for append. Otherwise, the file is opened for write.
Executes the main function of a script. This should be called as the last line of any script, with the script’s main function (whatever it might be named) as its only argument.
It is important that all work in the script is done within this function. The script may be loaded in such a way that it is not executed, but only queried for metadata information. If the script does work outside of the main function, this will cause metadata queries to be very inefficient.