Firewall and Projection Queries
Firewall queries and projection queries are advanced optimization techniques that work together to provide fine-grained change detection and prevent unnecessary dirty propagation in dense dependency graphs. They’re particularly useful for expensive global analysis queries that have many dependents.
The Problem: Chokepoints
Consider a compiler that performs global type analysis. This analysis:
- Depends on many input files
- Produces a large type table
- Is depended upon by hundreds of downstream queries
When any input file changes:
- The global analysis is marked dirty
- All downstream queries are marked dirty (transitively)
- On next execution, hundreds of queries need verification
This creates a chokepoint where a single change causes excessive dirty propagation.
The Solution: Firewalls
A firewall query acts as a boundary that limits dirty propagation:
Input Changes
↓
Firewall Query (checked first)
↓ (only if result changed)
Downstream Queries
When an input changes:
- Dirty propagation stops at the firewall
- When the firewall is requested, it’s recomputed
- Only if the firewall’s output changes does dirty propagation continue
Defining a Firewall Query
Mark a query as a firewall by overriding execution_style():
#![allow(unused)]
fn main() {
use qbice::{Query, ExecutionStyle};
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GlobalTypeAnalysis {
// Input parameters
}
impl Query for GlobalTypeAnalysis {
type Value = TypeTable; // Large result
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Firewall
}
}
}
How Firewalls Work
Without Firewall
Input → Analysis → [100 downstream queries all marked dirty]
When input changes:
- Analysis marked dirty
- 100 queries marked dirty
- On next query, must verify all 100 queries
With Firewall
Input → Analysis [FIREWALL] → [100 downstream queries]
When input changes:
- Analysis marked dirty
- Downstream queries NOT marked dirty
- On next query:
- Analysis is recomputed
- If result unchanged, return cached value
- Downstream queries remain clean!
Example: Global Analysis
#![allow(unused)]
fn main() {
use std::collections::HashMap;
#[derive(Debug, Clone, StableHash)]
pub struct TypeTable {
pub types: HashMap<ID, Type>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GlobalTypeAnalysis;
impl Query for GlobalTypeAnalysis {
type Value = Arc<TypeTable>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Firewall
}
}
pub struct GlobalTypeAnalysisExecutor;
impl<C: Config> Executor<GlobalTypeAnalysis, C> for GlobalTypeAnalysisExecutor {
async fn execute(
&self,
_query: &GlobalTypeAnalysis,
engine: &TrackedEngine<C>,
) -> TypeTable {
// This is expensive and depends on many inputs
let files = engine.query(&GetAllSourceFiles).await;
let mut types = HashMap::new();
for file in files {
let file_types = engine.query(&ParseFile {
path: file.path.clone(),
}).await;
types.extend(file_types);
}
TypeTable { types }
}
}
}
Downstream Query
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetFunctionType {
pub function_name: ID,
}
impl Query for GetFunctionType {
type Value = Option<Type>;
}
impl<C: Config> Executor<GetFunctionType, C> for GetFunctionTypeExecutor {
async fn execute(
&self,
query: &GetFunctionType,
engine: &TrackedEngine<C>,
) -> Option<Type> {
// Depends on the firewall
let table = engine.query(&GlobalTypeAnalysis).await;
table.types.get(&query.function_name).cloned()
}
}
}
Transitive Firewall Edges
Queries that transitively depend on a firewall maintain special transitive firewall edges:
Firewall
↓
Query A
↓
Query B ← has transitive edge to Firewall
When Query B is requested:
- It first checks the Firewall directly
- If Firewall changed, dirty flags propagate from Firewall → A → B
- Then normal verification continues
This ensures correctness while limiting dirty propagation.
When to Use Firewalls
Firewall is best when there are many dependents on the firewall query, making dirty propagation traversing large portions of the graph costly.
When NOT to Use Firewalls
Avoid firewalls when:
- Too much firewalls - Extra memory overhead for maintaining firewall edges
- Few dependents - No chokepoint exists
Trade-offs
Benefits
- Reduced dirty propagation - Potentially thousands of queries stay clean
- Fewer verifications - Less work during repairation
Costs
- Increased complexity - More complex dependency tracking
- Memory overhead - Transitive firewall edges stored
Projection Queries
Projection queries work together with firewall queries to provide fine-grained change detection. They solve the over-invalidation problem by reading specific slices of large firewall results.
The Problem: Over-Invalidation
Consider a firewall query that produces a large result:
#![allow(unused)]
fn main() {
impl Query for GlobalTypeTable {
type Value = Arc<HashMap<ID, Type>>; // 1000+ entries
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Firewall
}
}
}
Downstream queries typically read just one entry:
#![allow(unused)]
fn main() {
impl<C: Config> Executor<GetFunctionType, C> for GetFunctionTypeExecutor {
async fn execute(&self, query: &GetFunctionType, engine: &TrackedEngine<C>) -> Option<Type> {
let table = engine.query(&GlobalTypeTable).await;
table.get(&query.function_name).cloned() // Only reads ONE entry!
}
}
}
The problem:
- A change affects one entry in the table
- The firewall result “changed” (different hash)
- All downstream queries are marked dirty
- In reality, only small subset of downstream queries are actually affected
The Solution: Projections
Projection queries extract specific data from firewall/projection results:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetFunctionType {
pub function_name: ID,
}
impl Query for GetFunctionType {
type Value = Option<Type>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection // Mark as projection!
}
}
}
Key properties:
- Must only depend on Firewall
- Should be very fast (memory access, hash lookup, field access)
- Extracts a small piece of a larger result
How Projections Work
Normal Flow (Without Projection)
Input changes
↓
Firewall recomputes (hash changes)
↓
All downstream marked dirty
With Projection
Input changes
↓
Firewall recomputes (hash changes)
↓
Projection recomputes (checks specific slice)
↓
Only if projection's result changed → mark downstream dirty
The projection is re-executed immediately when the firewall changes. If the projection’s specific slice hasn’t changed, dirty propagation stops.
Example: Type Table
Firewall Query
#![allow(unused)]
fn main() {
use std::collections::HashMap;
#[derive(Debug, Clone, StableHash)]
pub struct TypeTable {
pub types: HashMap<ID, Type>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GlobalTypeTable;
impl Query for GlobalTypeTable {
type Value = Arc<TypeTable>; // Use Arc for cheap cloning
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Firewall
}
}
}
Projection Query
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetFunctionType {
pub function_name: ID,
}
impl Query for GetFunctionType {
type Value = Option<Type>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
impl<C: Config> Executor<GetFunctionType, C> for GetFunctionTypeExecutor {
async fn execute(
&self,
query: &GetFunctionType,
engine: &TrackedEngine<C>,
) -> Option<Type> {
// Very fast: just a hash lookup
let table = engine.query(&GlobalTypeTable).await;
table.types.get(&query.function_name).cloned()
}
}
}
Downstream Query
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct TypeCheckFunction {
pub function_name: ID,
}
impl Query for TypeCheckFunction {
type Value = Result<(), TypeError>;
}
impl<C: Config> Executor<TypeCheckFunction, C> for TypeCheckFunctionExecutor {
async fn execute(
&self,
query: &TypeCheckFunction,
engine: &TrackedEngine<C>,
) -> Result<(), TypeError> {
// Depends on projection
let function_type = engine.query(&GetFunctionType {
function_name: query.function_name.clone(),
}).await;
// Type check using the function's type
type_check(function_type)
}
}
}
Behavior
When a source file changes:
GlobalTypeTable(firewall) recomputes- Only functions in that file have new types
GetFunctionType(projection) re-executes for each dependent- If a specific function’s type didn’t change, its projection returns the same value
- Most of the downstream
TypeCheckFunctionqueries remain clean
Result: Only functions with changed types are re-checked.
Projection Rules
1. Only Depend on Firewall
#![allow(unused)]
fn main() {
// ✓ GOOD - depends on firewall
impl<C: Config> Executor<MyProjection, C> for MyProjectionExecutor {
async fn execute(&self, query: &MyProjection, engine: &TrackedEngine<C>) -> Value {
let firewall_result = engine.query(&MyFirewall).await;
firewall_result.get_field()
}
}
// ✗ BAD - depends on normal query
impl<C: Config> Executor<BadProjection, C> for BadProjectionExecutor {
async fn execute(&self, query: &BadProjection, engine: &TrackedEngine<C>) -> Value {
let normal = engine.query(&NormalQuery).await; // NOT ALLOWED!
normal.field
}
}
}
2. Keep Projections Fast
Projections are re-executed during dirty propagation, so they must be very fast:
#![allow(unused)]
fn main() {
// ✓ GOOD - O(1) lookup
impl<C: Config> Executor<FastProjection, C> for FastProjectionExecutor {
async fn execute(&self, query: &FastProjection, engine: &TrackedEngine<C>) -> Value {
let table = engine.query(&FirewallTable).await;
table.get(&query.key).cloned() // Hash lookup
}
}
// ✓ GOOD - field access
impl<C: Config> Executor<FieldProjection, C> for FieldProjectionExecutor {
async fn execute(&self, query: &FieldProjection, engine: &TrackedEngine<C>) -> Value {
let data = engine.query(&FirewallData).await;
data.field.clone() // Direct field access
}
}
// ✗ BAD - expensive operation
impl<C: Config> Executor<ExpensiveProjection, C> for ExpensiveProjectionExecutor {
async fn execute(&self, query: &ExpensiveProjection, engine: &TrackedEngine<C>) -> Value {
let data = engine.query(&FirewallData).await;
data.items.iter()
.filter(|item| expensive_check(item)) // TOO SLOW!
.collect()
}
}
}
3. Extract Small Slices
Projections should return small portions of the firewall result:
#![allow(unused)]
fn main() {
// ✓ GOOD - returns one item
impl Query for GetItem {
type Value = Option<Item>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
// ✓ GOOD - returns small subset
impl Query for GetItemSubset {
type Value = Vec<Item>; // Small vec
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
// ✗ BAD - returns entire result
impl Query for GetAllItems {
type Value = Vec<Item>; // Entire table!
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection // Pointless!
}
}
}
Common Patterns
Map Lookup
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetValue {
pub key: ID,
}
impl Query for GetValue {
type Value = Option<Value>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
impl<C: Config> Executor<GetValue, C> for GetValueExecutor {
async fn execute(&self, query: &GetValue, engine: &TrackedEngine<C>) -> Option<Value> {
let map = engine.query(&GlobalMap).await;
map.get(&query.key).cloned()
}
}
}
Field Access
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetConfig {
pub field: ConfigField,
}
impl Query for GetConfig {
type Value = ConfigValue;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
impl<C: Config> Executor<GetConfig, C> for GetConfigExecutor {
async fn execute(&self, query: &GetConfig, engine: &TrackedEngine<C>) -> ConfigValue {
let config = engine.query(&GlobalConfig).await;
match query.field {
ConfigField::Timeout => config.timeout,
ConfigField::MaxRetries => config.max_retries,
// ...
}
}
}
}
Index Access
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, StableHash, Identifiable, Encode, Decode)]
pub struct GetArrayElement {
pub index: usize,
}
impl Query for GetArrayElement {
type Value = Option<Element>;
fn execution_style() -> ExecutionStyle {
ExecutionStyle::Projection
}
}
impl<C: Config> Executor<GetArrayElement, C> for GetArrayElementExecutor {
async fn execute(&self, query: &GetArrayElement, engine: &TrackedEngine<C>) -> Option<Element> {
let array = engine.query(&GlobalArray).await;
array.get(query.index).cloned()
}
}
}
Performance Trade-offs
Benefits
- Fine-grained invalidation - Only truly affected queries recompute
- Reduced recomputation - Could save thousands of verifications
- Composable - Build chains of projections
Costs
- Extra executions - Projections run during dirty propagation
- More queries - Need projection queries in addition to normal queries
- Increased complexity - More query types to manage
The trade-off is worth it when:
Cost of projection re-execution < Cost of recomputing downstream queries