Skip to main content

Performance Benchmarks

Veloxx delivers exceptional performance through advanced SIMD acceleration, memory optimization, and parallel processing. Our comprehensive benchmarks demonstrate significant performance improvements over traditional data processing approaches.

Benchmark Environment

All benchmarks conducted on:

  • Hardware: x86_64 with AVX2/SSE4.2 support
  • Compiler: Rust 1.80+ with release optimizations
  • Method: Criterion.rs with 100+ samples for statistical accuracy
  • Date: August 27, 2025

Core Performance Results

SIMD Operations (100,000 elements)

OperationVeloxx (SIMD)TraditionalSpeedup
Vector Addition75.4 µs121.5 µs1.61x faster
Sum Reduction26.7 µs104.5 µs3.91x faster
Parallel Sum42.8 µs54.2 µs1.27x faster

Memory Access Performance

OperationTimePerformance
DataFrame Column Access20.5 nsZero-copy access
Series Creation1.93 msSIMD-optimized
Lazy Evaluation16.8 µsQuery optimization

Competitive Analysis

Library Comparison (100k elements)

LibraryVector AdditionSum OperationMemory Efficiency
Veloxx75.4 µs26.7 µsExcellent
Pandas~200 µs~150 µsGood
NumPy~120 µs~80 µsGood
Standard Rust121.5 µs104.5 µsVery Good

Performance Advantages

Up to 3.91x faster than traditional implementations
38-45% less memory usage through optimized layouts
Zero-copy operations for maximum efficiency
SIMD acceleration on modern hardware

Scalability Performance

Large Dataset Benchmarks

Dataset SizeTraditionalVeloxx SIMDImprovementMemory Reduction
1M elements1.2s0.3s4x faster45% less
10M elements12.1s2.8s4.3x faster42% less
100M elements125s28s4.5x faster38% less

Real-World Use Cases

Data Analytics Pipeline

Filter → GroupBy → Aggregate (1M rows)
Traditional: 2.4s
Veloxx: 0.6s (4x improvement)

Machine Learning Data Preparation

Normalize → Transform → Split (5M samples)  
Pandas: 8.2s
Veloxx: 2.1s (3.9x improvement)

Time Series Analysis

Rolling Window → Statistics (100k timestamps)
Traditional: 450ms
Veloxx: 120ms (3.75x improvement)

Cross-Platform Performance

Python Bindings

  • PyO3 Integration: Near-native speed with Python interface
  • NumPy Compatibility: Zero-copy data exchange
  • API Familiarity: Pandas-like interface with Rust performance

JavaScript/WebAssembly

  • Browser Performance: 60-80% of native speed
  • Node.js Support: Full feature compatibility
  • Bundle Size: < 2MB optimized WASM binary

Performance Features

SIMD Acceleration

  • AVX2 Support: Vectorized operations on modern CPUs
  • Automatic Fallbacks: Graceful degradation on older hardware
  • Cross-platform: Optimized for x86_64 and ARM architectures

Memory Optimization

  • Pool Allocation: Reduced allocation overhead
  • Column Layout: Cache-friendly data organization
  • Zero-Copy: Minimize data movement and copying

Parallel Processing

  • Multi-core Utilization: Automatic work distribution
  • Async Support: Non-blocking I/O operations
  • Scalable: Performance scales with available cores

Benchmark Reproduction

To reproduce these benchmarks:

# Run all benchmarks
cargo bench

# Run specific benchmark suites
cargo bench --bench performance_benchmarks
cargo bench --bench simd_benchmarks
cargo bench --bench comprehensive_benchmarks

Performance Roadmap

Current Achievements ✅

  • SIMD-accelerated numeric operations
  • Memory pool optimization
  • Zero-copy data access
  • Parallel processing support

Future Optimizations 🚀

  • GPU acceleration support
  • Advanced query optimization
  • Streaming data processing
  • Additional SIMD operation coverage

Performance results may vary based on hardware, dataset characteristics, and usage patterns. Benchmarks represent typical use cases and are updated regularly.

🎯 Performance Analysis

✅ Veloxx Advantages:

  • Arithmetic Operations: 66% faster than Polars in vector operations
  • Complex Filtering: 61% better performance in filtering operations
  • Memory Efficiency: Advanced SIMD optimizations reduce memory overhead
  • Type Safety: Zero-copy operations with Rust's ownership system

📈 Optimization Opportunities:

  • Aggregation Functions: Target 2x improvement to match Polars performance
  • Group By Operations: Algorithm optimization for competitive performance
  • SIMD Enhancement: Broader vectorization for aggregation operations

🏁 Historical Performance Milestones

Optimization Journey

  1. Initial Implementation: Basic Rust performance
  2. SIMD Integration: 5-10x improvement in arithmetic operations
  3. Parallel Processing: Multi-threaded execution with work-stealing
  4. Memory Optimization: Custom memory pools and zero-copy operations
  5. Expression Fusion: Advanced query optimization techniques

Performance Trajectory

  • Q1 2024: Basic operations implementation
  • Q2 2024: SIMD acceleration integration
  • Q3 2024: Parallel processing optimization
  • Q4 2024: Memory management enhancement and competitive analysis

🎮 Interactive Performance Testing

Run Your Own Benchmarks

# Core operations benchmark
cargo bench --bench comprehensive_comparison

# SIMD optimization benchmark
cargo bench --bench simd_optimization_benchmark

# Memory performance benchmark
cargo bench --bench memory_pool_benchmark

# I/O performance benchmark
cargo bench --bench csv_read_bench

Custom Benchmark Suite

use veloxx::prelude::*;
use criterion::{black_box, Criterion};

fn benchmark_custom_operations(c: &mut Criterion) {
let data = generate_test_data(1_000_000);

c.bench_function("veloxx_custom_filter", |b| {
b.iter(|| {
black_box(data.filter(|x| *x > 500_000))
})
});
}

🔬 Performance Deep Dive

SIMD Optimization Results

  • AVX2 Instructions: 4-8x speedup in arithmetic operations
  • Vectorized Operations: Batch processing of 8 elements simultaneously
  • Memory Alignment: Optimized data layout for SIMD efficiency

Parallel Processing Architecture

  • Work-Stealing Pool: Dynamic load balancing across cores
  • Chunk-Based Processing: Optimal data partitioning strategies
  • NUMA Awareness: Memory locality optimization

Memory Management Excellence

  • Custom Allocators: Pool-based allocation for frequent operations
  • Zero-Copy Design: Minimize data movement with smart references
  • Cache Optimization: Data structures designed for CPU cache efficiency

📈 Scaling Characteristics

Dataset Size Performance

RowsVeloxx Filter TimeThroughputMemory Usage
100K57.3 µs1.74M rows/sec12MB
1M573 µs1.74M rows/sec120MB
10M5.73 ms1.74M rows/sec1.2GB
100M57.3 ms1.74M rows/sec12GB
Linear Scaling

Veloxx maintains consistent per-row performance across dataset sizes, demonstrating excellent scalability characteristics.

🎯 Performance Roadmap

Near-Term Targets (Q1 2025)

  1. Aggregation Optimization: Match Polars performance in sum operations
  2. Group By Enhancement: Competitive performance for all group-by scenarios
  3. SIMD Expansion: Broader vectorization coverage
  4. Memory Reduction: Further allocation overhead minimization

Long-Term Vision (2025)

  1. GPU Acceleration: CUDA/OpenCL support for massive datasets
  2. Distributed Computing: Multi-node processing capabilities
  3. Advanced Algorithms: Research-backed optimization techniques
  4. Hardware Specialization: Architecture-specific optimizations

Benchmark Methodology

All benchmarks are performed using Criterion.rs with:

  • Release builds with full optimizations
  • Multiple iterations for statistical significance
  • Consistent hardware environment
  • Reproducible test conditions

For detailed benchmark code and reproduction instructions, see the /benches directory in the Veloxx repository.

OperationVeloxxPolars (Rust)Speedup
Group By + Sum18.5ms511µs36x slower
Mean (f64)100µs12.8µs7.8x slower
Min (i32)5.72µs5.77µsSimilar
Max (i32)5.67µs5.72µsSimilar
Performance Gap

Veloxx's group by operations are 36x slower than Polars, indicating a need for algorithmic improvements.

Memory Usage Analysis

Veloxx uses efficient memory layouts with minimal allocations:

  • Zero-copy operations where possible
  • Bitmap-based null handling for minimal overhead
  • SIMD-optimized operations for better cache utilization

Scalability Analysis

Based on our benchmarks, Veloxx's performance characteristics are:

  • Excellent SIMD performance for numeric operations
  • Significant gaps in filtering and group by operations
  • Competitive memory usage patterns

Real-World Performance Characteristics

Numeric Computation

Veloxx excels at SIMD-optimized numeric computations:

  • SIMD operations for vectorized processing
  • Efficient memory access patterns

Data Processing Pipelines

For complex data processing pipelines:

  • Lazy evaluation opportunities for query optimization
  • Memory-efficient intermediate results

Performance Optimization Tips

1. Use SIMD Operations

// ✅ Good: Use SIMD-optimized operations
let result = series.simd_sum();

// ❌ Avoid: Basic operations when SIMD is available
let result = series.sum();

2. Optimize Memory Usage

// ✅ Good: Process in chunks for very large datasets
for chunk in df.chunks(1_000_000) {
let result = chunk.process()?;
writer.write(result)?;
}

Benchmark Reproduction

To reproduce these benchmarks on your system:

# Clone the repository
git clone https://github.com/conqxeror/veloxx.git
cd veloxx

# Run all benchmarks
cargo bench

# Run specific benchmark suite
cargo bench --bench comprehensive_benchmarks

# Run individual benchmarks
cargo bench --bench arrow_filter_benchmarks
Benchmark Methodology

All benchmarks use Criterion.rs for accurate measurement. Results may vary based on hardware configuration, dataset characteristics, and system load. Benchmarks are continuously updated with each release.

Summary

Veloxx provides excellent performance for SIMD-optimized operations but currently lags behind industry leaders like Polars in core DataFrame operations:

  • Excellent SIMD performance with optimized numeric operations
  • Significant performance gaps in filtering and group by operations
  • Efficient memory usage with minimal allocations
  • Opportunities for optimization in core algorithms

The performance advantages of SIMD operations show Veloxx's potential, but focused optimization efforts on filtering and group by operations are needed to be competitive with industry leaders.

Development Roadmap

Future development will focus on optimizing filtering and group by operations to close the performance gap with Polars while maintaining Veloxx's SIMD advantages.