Mothra is a collection of libraries and tools for working with network flow data in the Apache Spark large-scale data analytics engine. Mothra currently supports Apache Spark versions 2.3, 2.4, and 3.
The Mothra libraries include Apache Spark data sources for reading IPFIX and SiLK flow data, as well as some additional SiLK data files. Since Mothra works with the Apache Spark data analytics engine, numerous other data sources and file formats may also be analyzed and cross-referenced with this network data.
Other Mothra libraries provide useful functions for working with information commonly present in network data, such as IP addresses, TCP flags, port numbers, and the like.
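To make the kinds of helpers described above concrete, here is a minimal sketch in plain Python using only the standard library. It is not Mothra's API: the names `tcp_flag_names`, `in_network`, and `is_well_known_port` are hypothetical and exist only for illustration. The flag bit values are the standard positions in the TCP header.

```python
import ipaddress

# Standard TCP header flag bits, lowest bit first.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
    (0x40, "ECE"),
    (0x80, "CWR"),
]


def tcp_flag_names(flags):
    """Return the names of the flags set in an 8-bit TCP flags value.

    Hypothetical helper for illustration; not part of Mothra.
    """
    return [name for bit, name in TCP_FLAGS if flags & bit]


def in_network(address, cidr):
    """Return True if the IP address falls inside the given CIDR block.

    Hypothetical helper for illustration; not part of Mothra.
    """
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def is_well_known_port(port):
    """Return True if the port is in the IANA system port range (0-1023).

    Hypothetical helper for illustration; not part of Mothra.
    """
    return 0 <= port <= 1023
```

For example, `tcp_flag_names(0x12)` decodes the flags of a SYN-ACK packet, and `in_network("192.0.2.17", "192.0.2.0/24")` tests membership in a documentation address block.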
The Mothra tools include software for loading IPFIX and SiLK data into HDFS storage for later analysis, and for partitioning IPFIX data as it is loaded to support more efficient queries.
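The general idea of partitioning at load time can be sketched as follows. This is an assumption-laden illustration in plain Python, not a description of how the Mothra tools actually lay out data. The record field `start_time`, the key format, and the functions `partition_key`, `partition_records`, and `query_hour` are all hypothetical. The sketch only shows why grouping records by a key lets a later query skip unrelated data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(record):
    """Hypothetical partition key: UTC date and hour of the flow start time.

    Assumes record["start_time"] holds seconds since the Unix epoch.
    """
    start = datetime.fromtimestamp(record["start_time"], tz=timezone.utc)
    return start.strftime("%Y/%m/%d/%H")


def partition_records(records):
    """Group records by partition key as they are loaded."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_key(record)].append(record)
    return dict(partitions)


def query_hour(partitions, key):
    """Return the records for one hour by reading only that partition."""
    return partitions.get(key, [])
```

A query restricted to a single hour then touches one partition instead of scanning every record, which is the benefit that partitioning is meant to provide.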