Skip to main content

· 6 min read
Matthias Vallentin
Thomas Peiselt

Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series we share our opinionated view on this question. In the next three blog posts, we

  1. describe how VAST uses Parquet and its little brother Feather
  2. benchmark the two formats against each other for typical workloads
  3. share our experience with all the engineering gotchas we encountered along the way

· 5 min read
Matthias Vallentin

The VAST project is roughly a decade old. But what happened over the last 10 years? This blog post looks back over time through the lens of the git merge commits.

Why merge commits? Because they represent a unit of completed contribution. Feature work takes place in dedicated branches, with the merge to the main branch sealing the deal. Some feature branches have just one commit, whereas others dozens. The distribution is not uniform. As of 6f9c84198 on Sep 2, 2022, there are a total of 13,066 commits, with 2,334 being merges (17.9%). We’ll take a deeper look at the merge commits.

· 5 min read
Matthias Vallentin

VAST's Sigma frontend now supports more modifiers. In the Sigma language, modifiers transform predicates in various ways, e.g., to apply a function over a value or to change the operator of a predicate. Modifiers are the customization point to enhance expressiveness of query operations.

The new pySigma effort, which will eventually replace the now-considered-legacy sigma project, comes with new modifiers as well. Most notably, lt, lte, gt, gte provide comparisons over value domains with a total ordering, e.g., numbers: x >= 42. In addition, the cidr modifier interprets a value as subnet, e.g., 10.0.0.0/8. Richer typing!

· 4 min read
Dominik Lohmann

VAST v2.1 is out! This release comes with a particular focus on performance and reducing the size of VAST databases. It brings a new utility for optimizing databases in production, allowing existing deployments to take full advantage of the improvements after upgrading.

· 6 min read
Matthias Vallentin

VAST bets on Apache Arrow as the open interface to structured data. By "bet," we mean that VAST does not work without Arrow. And we are not alone. Influx's IOx, DataDog's Husky, Anyscale's Ray, TensorBase, and others committed themselves to making Arrow a corner stone of their system architecture. For us, Arrow was not always a required dependency. We shifted to a tighter integration over the years as the Arrow ecosystem matured. In this blog post we explain our journey of becoming an Arrow-native engine.

· One min read
Benno Evers

Dear community, we are happy to announce the release of VAST v1.1.2, the latest release on the VAST v1.1 series. This release contains a fix for a race condition that could lead to VAST eventually becoming unresponsive to queries in large deployments.

· 6 min read
Dominik Lohmann

Dear community, we are excited to announce VAST v1.1, which ships with exciting new features: query language plugins to exchange the query expression frontend, and compaction as a mechanism for expressing fine-grained data retention policies and gradually aging out data instead of simply deleting it.