Arrow
VAST supports reading and writing data in the binary Arrow IPC columnar format, which is suitable for efficient handling of large data sets. For example, VAST's Python bindings use this format for high-bandwidth data exchange.
VAST translates its own types into Arrow extension types to properly describe domain-specific concepts like IP addresses or subnets. VAST's Python bindings come with the required tooling, so you can work with native types instead of relying on generic string or number representations.
Parser
The import arrow command imports Arrow IPC data. This allows for efficiently transferring data between VAST nodes:
VAST_SOURCE_HOST=localhost:5158
VAST_DESTINATION_HOST=localhost:42001
# Transfer all Zeek events from the VAST node at VAST_SOURCE_HOST to the VAST
# node at VAST_DESTINATION_HOST.
vast --endpoint=${VAST_SOURCE_HOST} export arrow '#type == /zeek.*/' \
| vast --endpoint=${VAST_DESTINATION_HOST} import arrow
Technically, the Arrow IPC format carries the schema alongside the data, so import arrow is self-contained and does not require an additional schema. However, the Arrow import is currently limited to Arrow data that was exported by VAST via the export arrow command. We plan to remove this restriction in the future, allowing the following Python code to work:
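# generate.py: write a small record batch five times as an Arrow IPC stream.
import sys

import pyarrow as pa

# Three columns of different types; None marks a null value.
data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True]),
]
batch = pa.record_batch(data, names=['a', 'b', 'c'])
# Write binary output to stdout so it can be piped into another process.
sink = pa.output_stream(sys.stdout.buffer)
with pa.ipc.new_stream(sink, batch.schema) as writer:
    for i in range(5):
        writer.write_batch(batch)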
python generate.py | vast import arrow
Output
Since Arrow IPC is self-contained and includes the full schema, you can use it to transfer data between VAST nodes, even if the target node is not aware of the underlying schema.
To export a query result as an Arrow IPC stream, use export arrow:
vast export arrow '1.2.3.4 || #type == "suricata.alert"'
Note that this generates binary output. Make sure you pipe the output to a tool that reads an Arrow IPC stream on stdin.
VAST's Python bindings use this method to retrieve data from a VAST server.