PyStormTracker Roadmap

This document outlines the strategic plan for improving PyStormTracker’s performance, CI/CD pipelines, and overall architecture, with a focus on high-resolution climate data scalability.

1. Performance & Scalability

Prevent CPU Oversubscription (Numba vs. Dask/MPI):
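- Current State: Dask/MPI orchestrates processes, but Numba kernels set no explicit thread limit. If a kernel is compiled with parallel=True, every worker process starts its own Numba thread pool (by default one thread per available core), so several processes on one node can oversubscribe the CPU cores and cause thrashing.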
- Action: Explicitly control thread topology inside worker tasks (e.g., numba.set_num_threads(1) when scaling via Dask/MPI processes).
Spatially Index the SimpleLinker:
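- Current State: Linking uses a vectorized Haversine distance matrix, but the cost remains \(O(N \times M)\), which can become a bottleneck as trajectory counts grow.
- Action: Use scipy.spatial.cKDTree for nearest-neighbor lookups between time steps, replacing the dense distance matrix with C-level tree queries. Because cKDTree measures Euclidean distance, centers would first need to be mapped to 3-D unit vectors so that chord length can stand in for great-circle distance.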
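The tree-based lookup could look roughly like the sketch below. It assumes centers are given as latitude/longitude in degrees and maps them to unit vectors, since cKDTree works in Euclidean space. The function names and the 6371 km Earth radius are illustrative choices, and a nearest-neighbor query is only one ingredient of a linker, not a replacement for its assignment logic.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def to_unit_vectors(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )


def nearest_centers(lat_a, lon_a, lat_b, lon_b, max_km):
    """For each center in step A, find the nearest center in step B.

    Returns (index, distance_km); index is -1 where nothing lies within max_km.
    """
    tree = cKDTree(to_unit_vectors(lat_b, lon_b))
    # A great-circle angle theta corresponds to a chord of length 2*sin(theta/2).
    theta_max = min(max_km / EARTH_RADIUS_KM, np.pi)
    max_chord = 2.0 * np.sin(theta_max / 2.0)
    chord, idx = tree.query(
        to_unit_vectors(lat_a, lon_a), k=1, distance_upper_bound=max_chord
    )
    found = np.isfinite(chord)  # misses are reported with an infinite distance
    dist_km = np.full(chord.shape, np.inf)
    dist_km[found] = 2.0 * EARTH_RADIUS_KM * np.arcsin(
        np.clip(chord[found] / 2.0, 0.0, 1.0)
    )
    return np.where(found, idx, -1), dist_km
```

Building the tree once per time step and querying it for all open trajectories avoids materializing the full distance matrix.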
Manage Memory Pressure (Chunking) (Completed):
- Implemented time-chunking across backends to prevent memory exhaustion on large datasets. This maintains optimal block-IO performance by avoiding metadata/locking overhead.
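As an illustration of the idea (not the project's actual implementation), time-chunking can be sketched as a generator of slices along the time axis, so that only one block is held in memory at a time. The names `iter_time_chunks` and `chunked_time_max` are hypothetical, and the per-step maximum is just a placeholder computation.

```python
import numpy as np


def iter_time_chunks(n_times, chunk_size):
    """Yield consecutive slices that cover the time axis in fixed-size blocks."""
    for start in range(0, n_times, chunk_size):
        yield slice(start, min(start + chunk_size, n_times))


def chunked_time_max(data, chunk_size):
    """Per-time-step field maximum, computed one time chunk at a time."""
    out = np.empty(data.shape[0], dtype=data.dtype)
    for sl in iter_time_chunks(data.shape[0], chunk_size):
        # Only this block is materialized; `data` may be a lazy on-disk array.
        block = np.asarray(data[sl])
        out[sl] = block.reshape(block.shape[0], -1).max(axis=1)
    return out
```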
Array-Backed Data Model (Completed):
- Transitioned from nested Python objects to flat, C-contiguous NumPy arrays for trajectories and centers.
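A flat layout of this kind is often organized as one array per attribute plus an offsets array marking where each trajectory starts. The sketch below shows that general pattern under assumed field names (`time_idx`, `lat`, `lon`, `offsets`); the real PyStormTracker layout may differ.

```python
import numpy as np

# Hypothetical layout: every center of every trajectory lives in flat arrays,
# and trajectory k owns the half-open range offsets[k]:offsets[k + 1].
time_idx = np.array([0, 1, 2, 1, 2], dtype=np.int64)
lat = np.array([30.0, 31.5, 33.0, -45.0, -44.0])
lon = np.array([140.0, 142.0, 145.0, 10.0, 13.0])
offsets = np.array([0, 3, 5], dtype=np.int64)


def trajectory_slice(k):
    """Slice selecting the centers that belong to trajectory k."""
    return slice(int(offsets[k]), int(offsets[k + 1]))


def trajectory_lengths():
    """Number of centers in each trajectory, without any Python-level loop."""
    return np.diff(offsets)
```

Because each attribute is a single contiguous array, it can be passed to Numba kernels or written to disk without walking a tree of Python objects.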
JIT-Optimized Kernels (Completed):
- Implemented core mathematical filters (Laplacian, Extrema, MGE, CCL) in GIL-free Numba JIT.
GPU-Accelerated Preprocessing & Detection (Experimental):
- Action: Expand JAX-native capabilities beyond spherical harmonic transforms and kinematic derivatives to include local extrema detection and Laplacian filtering. This will enable full end-to-end GPU/TPU acceleration for high-resolution datasets.
- Status: JAX-based spectral filtering and vector derivatives have been implemented as an experimental backend.
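A JAX-native local extrema detector could be built from shifted comparisons, which keeps the whole computation on the accelerator. The following is a rough sketch under stated assumptions: the field is a 2-D (lat, lon) array, longitude is periodic, the first and last latitude rows are excluded, and `local_minima_mask` is a hypothetical name rather than an existing PyStormTracker function.

```python
import jax
import jax.numpy as jnp


@jax.jit
def local_minima_mask(field):
    """True where a grid point is strictly lower than all 8 neighbors."""
    is_min = jnp.ones(field.shape, dtype=bool)
    # The Python loops are unrolled at trace time into 8 shifted comparisons.
    for dj in (-1, 0, 1):
        for di in (-1, 0, 1):
            if dj == 0 and di == 0:
                continue
            neighbor = jnp.roll(field, shift=(dj, di), axis=(0, 1))
            is_min = is_min & (field < neighbor)
    # jnp.roll also wraps in latitude, which is not physical: mask the edge rows.
    rows = jnp.arange(field.shape[0])[:, None]
    interior = (rows > 0) & (rows < field.shape[0] - 1)
    return is_min & interior
```

Returning a fixed-shape boolean mask, rather than a variable-length list of coordinates, keeps the function compatible with `jax.jit`.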

2. CI/CD & Testing

Implement Performance Regression Testing:
- Current State: No automated guardrails against JIT performance degradation.
- Action: Integrate pytest-benchmark with a deterministic synthetic dataset fixture. Add a CI job that fails if Numba kernel execution time increases significantly compared to main.
Dependency Audit:
- Action: Add a weekly scheduled CI job that runs uv sync --resolution lowest-direct followed by pytest, to verify that the minimum versions declared in pyproject.toml remain accurate.
Tiered Integration Testing (Completed):
- Implemented “Short” vs “Full” integration test suites to balance local dev speed with CI thoroughness.

3. Architecture

Idiomatic Xarray Integration (apply_ufunc):
- Current State: Xarray is primarily used as an I/O loader before dropping down to NumPy arrays and manual parallel orchestration.
- Action: Wrap core Numba filters inside xr.apply_ufunc(..., dask="parallelized"). This allows Xarray to natively handle chunking and distributed execution.
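The wrapping step could be sketched as below. The dimension names `lat` and `lon` and the function `smooth_lon` are assumptions for illustration; `smooth_lon` stands in for a Numba filter that operates on the trailing two axes of its input.

```python
import numpy as np
import xarray as xr


def smooth_lon(field):
    """Hypothetical stand-in filter: 3-point periodic mean along the last axis."""
    return (np.roll(field, 1, axis=-1) + field + np.roll(field, -1, axis=-1)) / 3.0


def apply_filter(da):
    """Apply the filter over (lat, lon) and let xarray handle all other dims."""
    return xr.apply_ufunc(
        smooth_lon,
        da,
        input_core_dims=[["lat", "lon"]],
        output_core_dims=[["lat", "lon"]],
        dask="parallelized",  # only takes effect for dask-backed inputs
        output_dtypes=[da.dtype],
    )
```

With a dask-backed input chunked along time only (for example `da.chunk({"time": 1})`), each chunk is filtered as a separate task. The core dimensions `lat` and `lon` need to sit in a single chunk for `dask="parallelized"` to work.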
Distributed Backends (Completed):
- Native support for Dask and MPI backends with automatic environment detection and fallback logic.
Modern CLI & API (Completed):
- Grouped, logical command-line interface with auto-configuration of parallel workers.
- Flexible Tracker Protocol for cross-algorithm support.
Remote Data Support (Completed):
- Native support for remote Zarr datasets via HTTP, S3, and GS protocols with automatic format detection.

4. Distribution & Ecosystem

Modular Dependencies (Completed):
- Optional dependency groups (e.g., [hodges], [mpi], [grib]) to minimize build-time requirements and simplify installation in constrained environments like ReadTheDocs.
Conda-forge Distribution (Completed):
- Available on conda-forge for easy cross-platform installation.

5. Feature Implementation

HodgesTracker Integration (Completed):
- Native Python/Numba implementation of the Modified Greedy Exchange (MGE) algorithm with algorithmic parity to TRACK-1.5.2.
Preprocessing (Completed):
HodgesTracker Refinement (In Progress):
- Action: Implement Dierckx B-spline surface fitting and evaluation in Numba to achieve bitwise-identical coordinates with the original TRACK software.
Postprocessing (Track Metrics):
- Action: Implement Accumulated Track Activity (ATA) and other storm track metrics from Yau and Chang (2020).
JAX-Based Feature Detection (Proposed):
- Action: Develop JAX-native implementations of the extrema detection and intensity refinement kernels to support high-throughput, GPU-resident tracking pipelines.