
Python API Reference

Complete API reference for Veloxx Python bindings.

Installation

pip install veloxx

Quick Start

import veloxx as vx

# Load data
df = vx.read_csv("data.csv")

# Basic operations
ages = df["age"].to_list()
filtered = df.filter([i for i, a in enumerate(ages) if a is not None and a > 25])
grouped = df.group_by(["department"]).mean()

Core Classes

PyDataFrame

The main data structure for working with tabular data in Python.

Constructors

PyDataFrame(columns: dict)

Creates a new DataFrame from a dictionary of column names to PySeries.

Parameters:

columns: dict - Dictionary mapping column names to PySeries objects

Example:

import veloxx as vx

df = vx.PyDataFrame({
    "name": vx.PySeries("name", ["Alice", "Bob", "Charlie"]),
    "age": vx.PySeries("age", [25, 30, 35]),
    "salary": vx.PySeries("salary", [50000.0, 75000.0, 60000.0])
})

Class Methods

@classmethod from_csv(path: str) -> PyDataFrame

Loads a DataFrame from a CSV file with automatic type inference.

Parameters:

path: str - Path to the CSV file

Example:

df = vx.PyDataFrame.from_csv("data/employees.csv")
print(f"Loaded {df.row_count()} rows")

@classmethod from_json(path: str) -> PyDataFrame

Loads a DataFrame from a JSON file.

Parameters:

path: str - Path to the JSON file

Example:

df = vx.PyDataFrame.from_json("data/users.json")

Properties

row_count() -> int

Returns the number of rows in the DataFrame.

Example:

print(f"DataFrame has {df.row_count()} rows")

column_count() -> int

Returns the number of columns in the DataFrame.

Example:

print(f"DataFrame has {df.column_count()} columns")

column_names() -> List[str]

Returns a list of column names.

Example:

names = df.column_names()
for name in names:
    print(f"Column: {name}")

Data Access

get_column(name: str) -> Optional[PySeries]

Gets a column by name.

Parameters:

name: str - Name of the column to retrieve

Example:

age_column = df.get_column("age")
if age_column:
    print(f"Age column has {age_column.len()} values")

__getitem__(key: str) -> PySeries

Gets a column using bracket notation (syntactic sugar).

Example:

# These are equivalent
age1 = df.get_column("age")
age2 = df["age"]

Data Manipulation

filter(row_indices: List[int]) -> PyDataFrame

Filters rows by index positions.

Parameters:

row_indices: List[int] - List of row indices to keep

Example:

# Filter rows where age > 25
age_series = df.get_column("age")
indices = [i for i, age in enumerate(age_series.to_list()) if age is not None and age > 25]
filtered_df = df.filter(indices)

select_columns(names: List[str]) -> PyDataFrame

Selects specific columns from the DataFrame.

Parameters:

names: List[str] - Names of columns to select

Example:

selected = df.select_columns(["name", "age"])

drop_columns(names: List[str]) -> PyDataFrame

Removes specified columns from the DataFrame.

Parameters:

names: List[str] - Names of columns to drop

Example:

without_id = df.drop_columns(["id"])

rename_column(old_name: str, new_name: str) -> PyDataFrame

Renames a column in the DataFrame.

Parameters:

old_name: str - Current name of the column

new_name: str - New name for the column

Example:

renamed = df.rename_column("age", "years")

with_column(name: str, expr: PyExpr) -> PyDataFrame

Adds a new column or replaces an existing one using an expression.

Parameters:

name: str - Name of the new column

expr: PyExpr - Expression to compute the column values

Example:

# Add a column with salary + 1000 bonus
expr = vx.PyExpr.add(
    vx.PyExpr.column("salary"),
    vx.PyExpr.literal(1000.0)
)
with_bonus = df.with_column("salary_with_bonus", expr)

Grouping and Aggregation

group_by(by_columns: List[str]) -> PyGroupedDataFrame

Groups the DataFrame by specified columns.

Parameters:

by_columns: List[str] - Columns to group by

Example:

grouped = df.group_by(["department"])
result = grouped.mean()
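Conceptually, `group_by(...).mean()` partitions rows by the key columns and averages each numeric column within every partition. A pure-Python sketch of that semantics (illustrative only; not the Veloxx implementation, which runs in Rust):

```python
from collections import defaultdict

# Illustrative (department, salary) rows
rows = [("Sales", 50000.0), ("Sales", 70000.0), ("Eng", 90000.0)]

# Partition by key, then average each group's values
groups = defaultdict(list)
for dept, salary in rows:
    groups[dept].append(salary)
means = {dept: sum(vals) / len(vals) for dept, vals in groups.items()}
# means == {"Sales": 60000.0, "Eng": 90000.0}
```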
describe() -> PyDataFrame

Generates descriptive statistics for numeric columns.

Example:

stats = df.describe()
print(stats)

Statistical Methods

correlation(col1_name: str, col2_name: str) -> float

Calculates the Pearson correlation between two numeric columns.

Parameters:

col1_name: str - Name of the first column

col2_name: str - Name of the second column

Example:

corr = df.correlation("age", "salary")
print(f"Age-Salary correlation: {corr:.3f}")

covariance(col1_name: str, col2_name: str) -> float

Calculates the covariance between two numeric columns.

Parameters:

col1_name: str - Name of the first column

col2_name: str - Name of the second column

Example:

cov = df.covariance("age", "salary")
print(f"Age-Salary covariance: {cov:.2f}")

Joining

join(other: PyDataFrame, on_column: str, join_type: PyJoinType) -> PyDataFrame

Joins this DataFrame with another DataFrame.

Parameters:

other: PyDataFrame - DataFrame to join with

on_column: str - Column name to join on

join_type: PyJoinType - Type of join (Inner, Left, Right)

Example:

joined = df1.join(df2, "user_id", vx.PyJoinType.Inner)

Sorting and Ordering

sort(by_columns: List[str], ascending: bool = True) -> PyDataFrame

Sorts the DataFrame by specified columns.

Parameters:

by_columns: List[str] - Columns to sort by

ascending: bool - Sort order (default: True)

Example:

sorted_df = df.sort(["age", "name"], ascending=True)

Data Cleaning

drop_nulls() -> PyDataFrame

Removes rows containing any null values.

Example:

clean_df = df.drop_nulls()

fill_nulls(value: Any) -> PyDataFrame

Fills null values with a specified value.

Parameters:

value: Any - Value to use for filling nulls

Example:

filled = df.fill_nulls(0)  # Fill with 0
filled_str = df.fill_nulls("Unknown")  # Fill with string

I/O Operations

to_csv(path: str) -> None

Writes the DataFrame to a CSV file.

Parameters:

path: str - Output file path

Example:

df.to_csv("output/results.csv")

Concatenation

append(other: PyDataFrame) -> PyDataFrame

Appends another DataFrame vertically.

Parameters:

other: PyDataFrame - DataFrame to append

Example:

combined = df1.append(df2)

PyGroupedDataFrame

Represents a grouped DataFrame for aggregation operations.

Aggregation Methods

sum() -> PyDataFrame

Calculates the sum for each group.

Example:

grouped = df.group_by(["department"])
sums = grouped.sum()

mean() -> PyDataFrame

Calculates the mean for each group.

Example:

averages = grouped.mean()

count() -> PyDataFrame

Counts values for each group.

Example:

counts = grouped.count()

min() -> PyDataFrame

Finds the minimum value for each group.

Example:

minimums = grouped.min()

max() -> PyDataFrame

Finds the maximum value for each group.

Example:

maximums = grouped.max()

agg(aggregations: List[Tuple[str, str]]) -> PyDataFrame

Performs custom aggregations.

Parameters:

aggregations: List[Tuple[str, str]] - List of (column, aggregation_function) tuples

Example:

result = grouped.agg([
    ("salary", "mean"),
    ("age", "count"),
    ("experience", "max")
])

PySeries

Represents a single column of data.

Constructors

PySeries(name: str, data: List[Any])

Creates a new Series with automatic type inference.

Parameters:

name: str - Name of the series

data: List[Any] - List of values (supports None for nulls)

Example:

# Integer series
ages = vx.PySeries("age", [25, 30, None, 35])

# String series
names = vx.PySeries("name", ["Alice", "Bob", None, "Charlie"])

# Float series
salaries = vx.PySeries("salary", [50000.0, 75000.0, 60000.0])

# Boolean series
active = vx.PySeries("is_active", [True, False, True])

Properties

name() -> str

Returns the name of the Series.

Example:

print(f"Series name: {series.name()}")

len() -> int

Returns the length of the Series.

Example:

print(f"Series has {series.len()} values")

is_empty() -> bool

Checks if the Series is empty.

Example:

if series.is_empty():
    print("Series is empty")

data_type() -> PyDataType

Returns the data type of the Series.

Example:

dtype = series.data_type()
print(f"Series type: {dtype}")

Data Access

get_value(index: int) -> Any

Gets the value at a specific index.

Parameters:

index: int - Index of the value to retrieve

Example:

first_value = series.get_value(0)
print(f"First value: {first_value}")

to_list() -> List[Any]

Converts the Series to a Python list.

Example:

values = series.to_list()
for value in values:
    if value is not None:
        print(value)

Statistical Methods

sum() -> float

Calculates the sum of numeric values.

Example:

total = series.sum()
print(f"Sum: {total}")

mean() -> float

Calculates the mean of numeric values.

Example:

average = series.mean()
print(f"Average: {average}")

median() -> float

Calculates the median of numeric values.

Example:

median = series.median()
print(f"Median: {median}")

min() -> Any

Finds the minimum value.

Example:

minimum = series.min()
print(f"Minimum: {minimum}")

max() -> Any

Finds the maximum value.

Example:

maximum = series.max()
print(f"Maximum: {maximum}")

std() -> float

Calculates the standard deviation.

Example:

std_dev = series.std()
print(f"Standard deviation: {std_dev}")

count() -> int

Counts non-null values.

Example:

non_null_count = series.count()
print(f"Non-null values: {non_null_count}")

unique() -> PySeries

Returns a Series with unique values.

Example:

unique_values = series.unique()
print(f"Unique values: {unique_values.len()}")

Data Manipulation

filter(row_indices: List[int]) -> PySeries

Filters the Series by index positions.

Parameters:

row_indices: List[int] - List of indices to keep

Example:

filtered = series.filter([0, 2, 4])  # Keep indices 0, 2, 4

fill_nulls(value: Any) -> PySeries

Fills null values with a specified value.

Parameters:

value: Any - Value to use for filling nulls

Example:

filled = series.fill_nulls(0)

PyExpr

Represents expressions for computed columns.

Static Methods

@staticmethod column(name: str) -> PyExpr

Creates a column reference expression.

Parameters:

name: str - Name of the column to reference

Example:

expr = vx.PyExpr.column("salary")

@staticmethod literal(value: Any) -> PyExpr

Creates a literal value expression.

Parameters:

value: Any - The literal value

Example:

expr = vx.PyExpr.literal(1000.0)

Arithmetic Operations

@staticmethod add(left: PyExpr, right: PyExpr) -> PyExpr

Creates an addition expression.

Example:

expr = vx.PyExpr.add(
    vx.PyExpr.column("base_salary"),
    vx.PyExpr.column("bonus")
)

@staticmethod subtract(left: PyExpr, right: PyExpr) -> PyExpr

Creates a subtraction expression.

Example:

expr = vx.PyExpr.subtract(
    vx.PyExpr.column("revenue"),
    vx.PyExpr.column("costs")
)

@staticmethod multiply(left: PyExpr, right: PyExpr) -> PyExpr

Creates a multiplication expression.

Example:

expr = vx.PyExpr.multiply(
    vx.PyExpr.column("quantity"),
    vx.PyExpr.column("price")
)

@staticmethod divide(left: PyExpr, right: PyExpr) -> PyExpr

Creates a division expression.

Example:

expr = vx.PyExpr.divide(
    vx.PyExpr.column("total_sales"),
    vx.PyExpr.column("num_customers")
)

PyJoinType

Enumeration for join types.

class PyJoinType:
    Inner = "Inner"
    Left = "Left"
    Right = "Right"

Example:

joined = df1.join(df2, "user_id", vx.PyJoinType.Left)
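The join types differ in which unmatched rows survive: Inner keeps only keys present on both sides, Left keeps every left row (null-filling the missing right side), and Right is the mirror image. A pure-Python sketch of those semantics over two key-to-value maps (illustrative only; column names are made up):

```python
left = {"u1": "Alice", "u2": "Bob"}   # user_id -> name
right = {"u1": 100, "u3": 7}          # user_id -> order_count

# Inner: keys present on both sides
inner = {k: (left[k], right[k]) for k in left if k in right}
# Left: every left key, None where the right side has no match
left_join = {k: (left[k], right.get(k)) for k in left}

assert inner == {"u1": ("Alice", 100)}
assert left_join == {"u1": ("Alice", 100), "u2": ("Bob", None)}
```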

Convenience Functions

Data Loading

read_csv(path: str) -> PyDataFrame

Convenience function to load CSV files.

Example:

import veloxx as vx

df = vx.read_csv("data.csv")

read_json(path: str) -> PyDataFrame

Convenience function to load JSON files.

Example:

df = vx.read_json("data.json")

Usage Patterns

Basic Data Analysis

import veloxx as vx

# Load data
df = vx.read_csv("sales_data.csv")

# Basic info
print(f"Dataset: {df.row_count()} rows, {df.column_count()} columns")
print(f"Columns: {df.column_names()}")

# Filter high-value sales
high_value_indices = []
amount_series = df.get_column("amount")
for i, amount in enumerate(amount_series.to_list()):
    if amount is not None and amount > 1000:
        high_value_indices.append(i)

high_value_sales = df.filter(high_value_indices)

# Group by and aggregate
summary = high_value_sales.group_by(["region"]).agg([
    ("amount", "sum"),
    ("amount", "mean"),
    ("customer_id", "count")
])

print(summary)

Advanced Analytics

import veloxx as vx

def analyze_customer_data():
    # Load customer data
    customers = vx.read_csv("customers.csv")
    orders = vx.read_csv("orders.csv")

    # Join datasets
    customer_orders = customers.join(orders, "customer_id", vx.PyJoinType.Inner)

    # Calculate customer lifetime value
    clv_expr = vx.PyExpr.multiply(
        vx.PyExpr.column("order_value"),
        vx.PyExpr.column("order_frequency")
    )

    with_clv = customer_orders.with_column("lifetime_value", clv_expr)

    # Segment customers
    high_value_indices = []
    clv_series = with_clv.get_column("lifetime_value")
    for i, clv in enumerate(clv_series.to_list()):
        if clv is not None and clv > 5000:
            high_value_indices.append(i)

    high_value_customers = with_clv.filter(high_value_indices)

    # Analyze by segment
    segment_analysis = high_value_customers.group_by(["customer_segment"]).agg([
        ("lifetime_value", "mean"),
        ("order_frequency", "mean"),
        ("customer_id", "count")
    ])

    return segment_analysis

# Run analysis
results = analyze_customer_data()
print(results)

Data Cleaning Pipeline

import veloxx as vx

def clean_dataset(df):
    """Clean and prepare a dataset for analysis."""

    # Remove rows with missing critical data
    clean_df = df.drop_nulls()

    # Fill missing values in optional columns
    filled_df = clean_df.fill_nulls("Unknown")

    # Remove outliers (example: keep ages between 0 and 100)
    age_series = filled_df.get_column("age")
    valid_indices = []
    for i, age in enumerate(age_series.to_list()):
        if age is not None and 0 <= age <= 100:
            valid_indices.append(i)

    filtered_df = filled_df.filter(valid_indices)

    # Standardize column names
    standardized = filtered_df.rename_column("customer_name", "name")
    standardized = standardized.rename_column("customer_age", "age")

    return standardized

# Usage
raw_data = vx.read_csv("raw_customer_data.csv")
clean_data = clean_dataset(raw_data)
clean_data.to_csv("clean_customer_data.csv")

Performance Tips

  1. Use appropriate data types: Let Veloxx infer types automatically for best performance
  2. Filter early: Apply filters before expensive operations like joins
  3. Use vectorized operations: Leverage expressions instead of loops
  4. Process in chunks: For very large datasets, process in smaller chunks
  5. Minimize data copying: Chain operations when possible
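Tips 2 and 3 in practice: build the keep-index list in a single comprehension before any join or aggregation, so downstream operations see fewer rows. The index construction itself is plain Python; the follow-up `df.filter(indices)` call (shown as a comment, since it needs a live DataFrame) is the Veloxx part:

```python
# Values as returned by series.to_list(); None marks a null
amounts = [900.0, 1200.0, None, 1500.0, 300.0]

# One comprehension instead of an explicit append loop
indices = [i for i, v in enumerate(amounts) if v is not None and v > 1000]

assert indices == [1, 3]
# filtered = df.filter(indices)  # narrow the DataFrame before joining
```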

Error Handling

import veloxx as vx

try:
    df = vx.read_csv("data.csv")
    result = df.group_by(["category"]).mean()
    result.to_csv("output.csv")
except FileNotFoundError:
    print("Input file not found")
except Exception as e:
    print(f"Error processing data: {e}")

Integration with Pandas

Convert between Veloxx and Pandas for interoperability:

import veloxx as vx
import pandas as pd

# Pandas to Veloxx
def pandas_to_veloxx(pandas_df):
    columns = {}
    for col in pandas_df.columns:
        data = pandas_df[col].tolist()
        # Convert NaN to None
        data = [None if pd.isna(x) else x for x in data]
        columns[col] = vx.PySeries(col, data)
    return vx.PyDataFrame(columns)

# Veloxx to Pandas
def veloxx_to_pandas(veloxx_df):
    data = {}
    for col_name in veloxx_df.column_names():
        series = veloxx_df.get_column(col_name)
        data[col_name] = series.to_list()
    return pd.DataFrame(data)

# Usage
pandas_df = pd.read_csv("data.csv")
veloxx_df = pandas_to_veloxx(pandas_df)

# Process with Veloxx (faster)
result = veloxx_df.group_by(["category"]).mean()

# Convert back to Pandas if needed
result_pandas = veloxx_to_pandas(result)