Python API Reference
Complete API reference for Veloxx Python bindings.
Installation
pip install veloxx
Quick Start
import veloxx as vx
# Create a DataFrame
df = vx.PyDataFrame({
"name": vx.PySeries("name", ["Alice", "Bob", "Charlie"]),
"age": vx.PySeries("age", [25, 30, 35]),
"salary": vx.PySeries("salary", [50000.0, 75000.0, 60000.0])
})
# Basic operations
# Filter rows where age > 25
age_series = df.get_column("age")
filtered_indices = [i for i, age in enumerate(age_series.to_list()) if age is not None and age > 25]
filtered = df.filter(filtered_indices)
# Group by age and calculate mean salary
grouped = df.group_by(["age"])
mean_salary = grouped.agg([("salary", "mean")])
Core Classes
PyDataFrame
The main data structure for working with tabular data in Python.
Constructors
Creates a new DataFrame from a dictionary of column names to PySeries.
Parameters:
columns: dict - Dictionary mapping column names to PySeries objects
Example:
import veloxx as vx
df = vx.PyDataFrame({
"name": vx.PySeries("name", ["Alice", "Bob", "Charlie"]),
"age": vx.PySeries("age", [25, 30, 35]),
"salary": vx.PySeries("salary", [50000.0, 75000.0, 60000.0])
})
Class Methods
Loads a DataFrame from a CSV file with automatic type inference.
Parameters:
path: str - Path to the CSV file
Example:
df = vx.PyDataFrame.from_csv("data/employees.csv")
print(f"Loaded {df.row_count()} rows")
Loads a DataFrame from a JSON file.
Parameters:
path: str - Path to the JSON file
Example:
df = vx.PyDataFrame.from_json("data/users.json")
Properties
Returns the number of rows in the DataFrame.
Example:
print(f"DataFrame has {df.row_count()} rows")
Returns the number of columns in the DataFrame.
Example:
print(f"DataFrame has {df.column_count()} columns")
Returns a list of column names.
Example:
names = df.column_names()
for name in names:
print(f"Column: {name}")
Data Access
Gets a column by name.
Parameters:
name: str - Name of the column to retrieve
Example:
age_column = df.get_column("age")
if age_column:
print(f"Age column has {age_column.len()} values")
Gets a column using bracket notation (syntactic sugar).
Example:
# These are equivalent
age1 = df.get_column("age")
age2 = df["age"]
Data Manipulation
Filters rows by index positions.
Parameters:
row_indices: List[int] - List of row indices to keep
Example:
# Filter rows where age > 25
age_series = df.get_column("age")
indices = [i for i, age in enumerate(age_series.to_list()) if age is not None and age > 25]
filtered_df = df.filter(indices)
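This build-an-index-list pattern recurs throughout the API, so it can be wrapped in a small plain-Python helper (the `matching_indices` name is our own, not part of Veloxx):

```python
def matching_indices(values, predicate):
    """Return the indices of values that are non-null and satisfy predicate."""
    return [i for i, v in enumerate(values) if v is not None and predicate(v)]

ages = [25, 30, None, 35]
idx = matching_indices(ages, lambda age: age > 25)  # [1, 3]
# The resulting index list is what df.filter(idx) expects.
```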
Selects specific columns from the DataFrame.
Parameters:
names: List[str] - Names of columns to select
Example:
selected = df.select_columns(["name", "age"])
Removes specified columns from the DataFrame.
Parameters:
names: List[str] - Names of columns to drop
Example:
without_id = df.drop_columns(["id"])
Renames a column in the DataFrame.
Parameters:
old_name: str - Current name of the column
new_name: str - New name for the column
Example:
renamed = df.rename_column("age", "years")
Adds a new column or replaces an existing one using an expression.
Parameters:
name: str - Name of the new column
expr: PyExpr - Expression to compute the column values
Example:
# Add a column with salary + 1000 bonus
expr = vx.PyExpr.add(
vx.PyExpr.column("salary"),
vx.PyExpr.literal(1000.0)
)
with_bonus = df.with_column("salary_with_bonus", expr)
Grouping and Aggregation
Groups the DataFrame by specified columns.
Parameters:
by_columns: List[str] - Columns to group by
Example:
grouped = df.group_by(["department"])
result = grouped.mean()
Generates descriptive statistics for numeric columns.
Example:
stats = df.describe()
print(stats)
Statistical Methods
Calculates the Pearson correlation between two numeric columns.
Parameters:
col1_name: str - Name of the first column
col2_name: str - Name of the second column
Example:
corr = df.correlation("age", "salary")
print(f"Age-Salary correlation: {corr:.3f}")
Calculates the covariance between two numeric columns.
Parameters:
col1_name: str - Name of the first column
col2_name: str - Name of the second column
Example:
cov = df.covariance("age", "salary")
print(f"Age-Salary covariance: {cov:.2f}")
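For reference, covariance and Pearson correlation reduce to short formulas. The plain-Python sketch below uses the sample (n - 1) denominator; the document does not state which convention Veloxx uses internally:

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

correlation([1, 2, 3], [2, 4, 6])  # perfectly linear data correlates at 1.0
```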
Joining
Joins this DataFrame with another DataFrame.
Parameters:
other: PyDataFrame - DataFrame to join with
on_column: str - Column name to join on
join_type: PyJoinType - Type of join (Inner, Left, Right)
Example:
joined = df1.join(df2, "user_id", vx.PyJoinType.Inner)
Sorting and Ordering
Sorts the DataFrame by specified columns.
Parameters:
by_columns: List[str] - Columns to sort by
ascending: bool - Sort order (default: True)
Example:
sorted_df = df.sort(["age", "name"], ascending=True)
Data Cleaning
Removes rows containing any null values. If subset is provided, only nulls in those columns are considered.
Parameters:
subset: Optional[List[str]] - Column names to consider when dropping nulls. If None, all columns are considered.
Example:
clean_df = df.drop_nulls()
# Drop rows with nulls only in 'age' or 'salary'
clean_df_subset = df.drop_nulls(subset=['age', 'salary'])
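The subset semantics can be pictured with rows as dicts; the following is a hypothetical plain-Python equivalent, not the Veloxx implementation:

```python
def drop_nulls(rows, subset=None):
    """Keep rows (dicts) with no None in the subset columns (all columns if None)."""
    def row_ok(row):
        cols = subset if subset is not None else row.keys()
        return all(row[c] is not None for c in cols)
    return [row for row in rows if row_ok(row)]

rows = [{"age": 25, "city": None}, {"age": None, "city": "Oslo"}]
drop_nulls(rows, subset=["age"])  # keeps only the first row
```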
Fills null values with a specified value. Filling occurs only when the value's type matches the DataType of the column being processed.
Parameters:
value: Any - Value to use for filling nulls
Example:
filled = df.fill_nulls(0) # Fill with 0
filled_str = df.fill_nulls("Unknown") # Fill with string
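The type-matching rule can be sketched in plain Python: a fill value is applied only where it matches the column's element type. This is a hypothetical helper for illustration, not the Veloxx implementation:

```python
def fill_nulls_typed(values, fill):
    """Fill None entries only if fill's type matches the column's element type."""
    elem_types = {type(v) for v in values if v is not None}
    if elem_types and not isinstance(fill, tuple(elem_types)):
        return list(values)  # type mismatch: column left unchanged
    return [fill if v is None else v for v in values]

fill_nulls_typed([1.0, None, 3.0], 0.0)  # nulls filled with 0.0
fill_nulls_typed(["a", None], 0)         # unchanged: int fill vs str column
```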
I/O Operations
Writes the DataFrame to a CSV file.
Parameters:
path: str - Output file path
Example:
df.to_csv("output/results.csv")
Concatenation
Appends another DataFrame vertically.
Parameters:
other: PyDataFrame - DataFrame to append
Example:
combined = df1.append(df2)
PyGroupedDataFrame
Represents a grouped DataFrame for aggregation operations.
Aggregation Methods
Calculates the sum for each group.
Example:
grouped = df.group_by(["department"])
sums = grouped.sum()
Calculates the mean for each group.
Example:
averages = grouped.mean()
Counts values for each group.
Example:
counts = grouped.count()
Finds the minimum value for each group.
Example:
minimums = grouped.min()
Finds the maximum value for each group.
Example:
maximums = grouped.max()
Performs custom aggregations.
Parameters:
aggregations: List[Tuple[str, str]] - List of (column, aggregation_function) tuples
Example:
result = grouped.agg([
("salary", "mean"),
("age", "count"),
("experience", "max")
])
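Conceptually, grouped aggregation buckets row values per key and then reduces each bucket. A minimal plain-Python sketch of what `group_by(...).agg(...)` does for a single column:

```python
from collections import defaultdict

def group_agg(keys, values, how):
    """Group values by key, then reduce each group with the named function."""
    buckets = defaultdict(list)
    for k, v in zip(keys, values):
        buckets[k].append(v)
    reducers = {
        "sum": sum,
        "mean": lambda vs: sum(vs) / len(vs),
        "count": len,
        "min": min,
        "max": max,
    }
    return {k: reducers[how](vs) for k, vs in buckets.items()}

group_agg(["North", "South", "North"], [1200.0, 25.0, 75.0], "mean")
```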
PySeries
Represents a single column of data.
Constructors
Creates a new Series with automatic type inference.
Parameters:
name: str - Name of the series
data: List[Any] - List of values (supports None for nulls)
Example:
# Integer series
ages = vx.PySeries("age", [25, 30, None, 35])
# String series
names = vx.PySeries("name", ["Alice", "Bob", None, "Charlie"])
# Float series
salaries = vx.PySeries("salary", [50000.0, 75000.0, 60000.0])
# Boolean series
active = vx.PySeries("is_active", [True, False, True])
Properties
Returns the name of the Series.
Example:
print(f"Series name: {series.name()}")
Returns the length of the Series.
Example:
print(f"Series has {series.len()} values")
Checks if the Series is empty.
Example:
if series.is_empty():
print("Series is empty")
Returns the data type of the Series.
Example:
dtype = series.data_type()
print(f"Series type: {dtype}")
Data Access
Gets the value at a specific index.
Parameters:
index: int - Index of the value to retrieve
Example:
first_value = series.get_value(0)
print(f"First value: {first_value}")
Converts the Series to a Python list.
Example:
values = series.to_list()
for value in values:
if value is not None:
print(value)
Statistical Methods
Calculates the sum of numeric values.
Example:
total = series.sum()
print(f"Sum: {total}")
Calculates the mean of numeric values.
Example:
average = series.mean()
print(f"Average: {average}")
Calculates the median of numeric values.
Example:
median = series.median()
print(f"Median: {median}")
Finds the minimum value.
Example:
minimum = series.min()
print(f"Minimum: {minimum}")
Finds the maximum value.
Example:
maximum = series.max()
print(f"Maximum: {maximum}")
Calculates the standard deviation.
Example:
std_dev = series.std_dev()
print(f"Standard deviation: {std_dev}")
Counts non-null values.
Example:
non_null_count = series.count()
print(f"Non-null values: {non_null_count}")
Returns a Series with unique values.
Example:
unique_values = series.unique()
print(f"Unique values: {unique_values.len()}")
Calculates the Pearson correlation between two numeric Series.
Parameters:
other: PySeries - Other series to correlate with
Example:
corr = age_series.correlation(salary_series)
print(f"Correlation: {corr}")
Calculates the covariance between two numeric Series.
Parameters:
other: PySeries - Other series to calculate covariance with
Example:
cov = age_series.covariance(salary_series)
print(f"Covariance: {cov}")
Interpolates null values using linear interpolation for numeric Series.
Example:
s = vx.PySeries("data", [1, None, 3, None, 5])
interpolated_s = s.interpolate_nulls()
print(f"Interpolated: {interpolated_s.to_list()}")
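Linear interpolation fills each interior null from its nearest non-null neighbors on either side. A plain-Python sketch of the idea (leading and trailing nulls, which lack one neighbor, are left as None here; Veloxx's edge behavior is not specified in this document):

```python
def interpolate_nulls(values):
    """Linearly interpolate interior None runs between known neighbors."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out

interpolate_nulls([1, None, 3, None, 5])  # [1, 2.0, 3, 4.0, 5]
```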
Appends another Series to this one.
Parameters:
other: PySeries - Series to append
Example:
s1 = vx.PySeries("data", [1, 2])
s2 = vx.PySeries("data", [3, 4])
combined = s1.append(s2)
print(f"Combined: {combined.to_list()}")
Data Manipulation
Filters the Series by index positions.
Parameters:
row_indices: List[int] - List of indices to keep
Example:
filtered = series.filter([0, 2, 4]) # Keep indices 0, 2, 4
Fills null values with a specified value.
Parameters:
value: Any - Value to use for filling nulls
Example:
filled = series.fill_nulls(0)
PyExpr
Represents expressions for computed columns.
Static Methods
Creates a column reference expression.
Parameters:
name: str - Name of the column to reference
Example:
expr = vx.PyExpr.column("salary")
Creates a literal value expression.
Parameters:
value: Any - The literal value
Example:
expr = vx.PyExpr.literal(1000.0)
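Expressions form a small tree of column references, literals, and operators that is evaluated against each row. A plain-Python sketch of that idea, using closures as a toy mini-AST (our own illustration, not the PyExpr internals):

```python
def column(name):
    return lambda row: row[name]

def literal(value):
    return lambda row: value

def add(a, b):
    return lambda row: a(row) + b(row)

def greater_than(a, b):
    return lambda row: a(row) > b(row)

# salary + 1000, evaluated against one row
bonus = add(column("salary"), literal(1000.0))
bonus({"salary": 50000.0})  # 51000.0
```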
Comparison Operations
Creates an equality comparison expression.
Example:
expr = vx.PyExpr.equals(
vx.PyExpr.column("status"),
vx.PyExpr.literal("active")
)
Creates a not-equals comparison expression.
Creates a greater-than comparison expression.
Creates a less-than comparison expression.
Creates a greater-than-or-equal comparison expression.
Creates a less-than-or-equal comparison expression.
Logical Operations
Creates a logical AND expression.
Example:
expr = vx.PyExpr.and_(
vx.PyExpr.greater_than(vx.PyExpr.column("age"), vx.PyExpr.literal(18)),
vx.PyExpr.equals(vx.PyExpr.column("status"), vx.PyExpr.literal("active"))
)
Creates a logical OR expression.
Creates a logical NOT expression.
Parameters:
expr: PyExpr - The expression to negate
PyJoinType
Enumeration for join types.
class PyJoinType:
Inner = "Inner"
Left = "Left"
Right = "Right"
Example:
joined = df1.join(df2, "user_id", vx.PyJoinType.Left)
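An inner join keeps only rows whose key appears in both tables. In plain Python the idea looks like a hash join over rows-as-dicts (a conceptual sketch, not the Veloxx implementation):

```python
def inner_join(left, right, key):
    """Inner-join two lists of dicts on key, using a hash index on the right."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

users = [{"user_id": 1, "name": "Alice"}, {"user_id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "total": 99.0}]
inner_join(users, orders, "user_id")  # only Alice's row survives
```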
Usage Patterns
Basic Data Analysis
import veloxx as vx
# Create sample data
df = vx.PyDataFrame({
"product": vx.PySeries("product", ["Laptop", "Mouse", "Keyboard", "Monitor"]),
"region": vx.PySeries("region", ["North", "South", "North", "East"]),
"sales": vx.PySeries("sales", [1200.0, 25.0, 75.0, 300.0]),
"quantity": vx.PySeries("quantity", [2, 5, 3, 1]),
"customer_id": vx.PySeries("customer_id", [101, 102, 101, 103]),
})
# Basic info
print(f"Dataset: {df.row_count()} rows, {df.column_count()} columns")
print(f"Columns: {df.column_names()}")
# Filter high-value sales (sales > 100)
high_value_indices = []
amount_series = df.get_column("sales")
for i, amount in enumerate(amount_series.to_list()):
if amount is not None and amount > 100:
high_value_indices.append(i)
high_value_sales = df.filter(high_value_indices)
print("\nHigh-Value Sales:")
print(high_value_sales)
# Group by region and aggregate
summary = high_value_sales.group_by(["region"]).agg([
("sales", "sum"),
("sales", "mean"),
("customer_id", "count")
])
print("\nRegional Sales Summary:")
print(summary)
Advanced Analytics
import veloxx as vx
def analyze_customer_data():
# Create sample data for customers and orders
customers_df = vx.PyDataFrame({
"customer_id": vx.PySeries("customer_id", [1, 2, 3]),
"name": vx.PySeries("name", ["Alice", "Bob", "Charlie"]),
"segment": vx.PySeries("segment", ["Premium", "Standard", "Premium"]),
})
orders_df = vx.PyDataFrame({
"order_id": vx.PySeries("order_id", [101, 102, 103, 104]),
"customer_id": vx.PySeries("customer_id", [1, 2, 1, 3]),
"order_value": vx.PySeries("order_value", [100.0, 50.0, 150.0, 75.0]),
"order_frequency": vx.PySeries("order_frequency", [10, 5, 15, 8]),
})
# Join datasets
customer_orders = customers_df.join(orders_df, "customer_id", vx.PyJoinType.Inner)
# Calculate customer lifetime value
clv_expr = vx.PyExpr.multiply(
vx.PyExpr.column("order_value"),
vx.PyExpr.column("order_frequency")
)
with_clv = customer_orders.with_column("lifetime_value", clv_expr)
# Segment customers (lifetime_value > 1000)
high_value_indices = []
clv_series = with_clv.get_column("lifetime_value")
for i, clv in enumerate(clv_series.to_list()):
if clv is not None and clv > 1000:
high_value_indices.append(i)
high_value_customers = with_clv.filter(high_value_indices)
# Analyze by segment
segment_analysis = high_value_customers.group_by(["segment"]).agg([
("lifetime_value", "mean"),
("order_frequency", "mean"),
("customer_id", "count")
])
return segment_analysis
# Run analysis
results = analyze_customer_data()
print("\nAdvanced Analytics Results:")
print(results)
Data Cleaning Pipeline
import veloxx as vx
def clean_dataset(df):
"""Clean and prepare dataset for analysis"""
# Remove rows with missing critical data
clean_df = df.drop_nulls()
# Fill missing values in optional columns
filled_df = clean_df.fill_nulls("Unknown")
# Remove outliers (example: ages > 100)
age_series = filled_df.get_column("age")
valid_indices = []
for i, age in enumerate(age_series.to_list()):
if age is not None and 0 <= age <= 100:
valid_indices.append(i)
filtered_df = filled_df.filter(valid_indices)
# Standardize column names
standardized = filtered_df.rename_column("customer_name", "name")
standardized = standardized.rename_column("customer_age", "age")
return standardized
# Usage
raw_data = vx.PyDataFrame({
"customer_name": vx.PySeries("customer_name", ["Alice", None, "Charlie"]),
"customer_age": vx.PySeries("customer_age", [30, 150, 25]),
"product": vx.PySeries("product", ["A", "B", "C"]),
})
clean_data = clean_dataset(raw_data)
print("\nCleaned Data:")
print(clean_data)
Performance Tips
- Use appropriate data types: Let Veloxx infer types automatically for best performance
- Filter early: Apply filters before expensive operations like joins
- Use expressions for vectorized operations: Leverage the PyExpr system for efficient column-wise computations instead of Python loops.
- Process in chunks: For very large datasets, process in smaller chunks to manage memory.
- Minimize data copying: Chain operations when possible to avoid unnecessary data duplication.
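The chunking tip can be sketched in plain Python: slice the row range and feed each slice of indices to df.filter, rather than materializing everything at once (the `chunk_ranges` helper is our own):

```python
def chunk_ranges(n_rows, chunk_size):
    """Yield index lists covering n_rows in chunk_size pieces."""
    for start in range(0, n_rows, chunk_size):
        yield list(range(start, min(start + chunk_size, n_rows)))

# Each index list can be passed to df.filter(indices) to process one slice at a time.
for indices in chunk_ranges(10, 4):
    print(indices)
```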
Error Handling
Veloxx operations can raise VeloxxError exceptions. It is recommended to catch specific error types for robust error handling.
import veloxx as vx
from veloxx import VeloxxError
try:
# Example: Attempt to load a non-existent file
df = vx.PyDataFrame.from_csv("non_existent_file.csv")
print(df)
except VeloxxError as e:
print(f"Veloxx Error: {e}")
# You can check the error type for more specific handling
if "file not found" in str(e).lower():
print("Please ensure the CSV file exists.")
except Exception as e:
print(f"An unexpected error occurred: {e}")
Integration with Pandas
Veloxx provides seamless integration with Pandas DataFrames, allowing you to convert data between the two libraries.
import veloxx as vx
import pandas as pd
# Convert Pandas DataFrame to Veloxx PyDataFrame
def pandas_to_veloxx(pandas_df: pd.DataFrame) -> vx.PyDataFrame:
columns = {}
for col_name in pandas_df.columns:
# Convert the Pandas Series to a Python list, mapping NaN/NA values to None
data = pandas_df[col_name].astype(object).where(pandas_df[col_name].notna(), None).tolist()
columns[col_name] = vx.PySeries(col_name, data)
return vx.PyDataFrame(columns)
# Convert Veloxx PyDataFrame to Pandas DataFrame
def veloxx_to_pandas(veloxx_df: vx.PyDataFrame) -> pd.DataFrame:
data = {}
for col_name in veloxx_df.column_names():
series = veloxx_df.get_column(col_name)
if series:
data[col_name] = series.to_list()
return pd.DataFrame(data)
# Usage Example
# Create a sample Pandas DataFrame
pandas_df_original = pd.DataFrame({
"id": [1, 2, 3],
"value": [10.5, pd.NA, 30.0],
"category": ["A", "B", "A"]
})
print("\nOriginal Pandas DataFrame:")
print(pandas_df_original)
# Convert Pandas to Veloxx
veloxx_df_converted = pandas_to_veloxx(pandas_df_original)
print("\nConverted Veloxx DataFrame:")
print(veloxx_df_converted)
# Perform some Veloxx operations (e.g., fill nulls)
veloxx_df_processed = veloxx_df_converted.fill_nulls(0.0)
print("\nProcessed Veloxx DataFrame (nulls filled):")
print(veloxx_df_processed)
# Convert Veloxx back to Pandas
pandas_df_final = veloxx_to_pandas(veloxx_df_processed)
print("\nFinal Pandas DataFrame:")
print(pandas_df_final)