PyStormTracker Roadmap

This document outlines the strategic plan for improving PyStormTracker’s performance, CI/CD pipelines, and overall architecture, with a focus on high-resolution climate data scalability.

1. Performance & Scalability

  • Prevent CPU Oversubscription (Numba vs. Dask/MPI):

    • Current State: Dask/MPI orchestrates processes, but Numba kernels lack explicit thread constraints. If parallel=True is used in Numba, it will oversubscribe CPU cores and cause thrashing.

    • Action: Explicitly control thread topology inside worker tasks (e.g., numba.set_num_threads(1) when scaling via Dask/MPI processes).

  • Vectorize the SimpleLinker:

    • Current State: Linking uses a vectorized Haversine matrix but remains \(O(N \times M)\), which can be a bottleneck as trajectory counts scale.

    • Action: Leverage scipy.spatial.cKDTree for nearest-neighbor lookups across time steps to convert spatial proximity searches to highly optimized C-level trees.

  • Manage Memory Pressure (Chunking) (Completed):

    • Implemented time-chunking across backends to prevent memory exhaustion on large datasets. This maintains optimal block-IO performance by avoiding metadata/locking overhead.

  • Array-Backed Data Model (Completed):

    • Transitioned from nested Python objects to flat, C-contiguous NumPy arrays for trajectories and centers.

  • JIT-Optimized Kernels (Completed):

    • Implemented core mathematical filters (Laplacian, Extrema, MGE, CCL) in GIL-free Numba JIT.

  • GPU-Accelerated Preprocessing & Detection (Experimental):

    • Action: Expand JAX-native capabilities beyond spherical harmonic transforms and kinematic derivatives to include local extrema detection and Laplacian filtering. This will enable full end-to-end GPU/TPU acceleration for high-resolution datasets.

    • Status: JAX-based spectral filtering and vector derivatives have been implemented as an experimental backend.

2. CI/CD & Testing

  • Implement Performance Regression Testing:

    • Current State: No automated guardrails against JIT performance degradation.

    • Action: Integrate pytest-benchmark with a deterministic synthetic dataset fixture. Add a CI job that fails if Numba execution time drops significantly compared to main.

  • Dependency Audit:

    • Action: Add a weekly scheduled CI run of uv sync --resolution lowest-direct combined with pytest to ensure minimum versions in pyproject.toml remain accurate.

  • Tiered Integration Testing (Completed):

    • Implemented “Short” vs “Full” integration test suites to balance local dev speed with CI thoroughness.

3. Architecture

  • Idiomatic Xarray Integration (apply_ufunc):

    • Current State: Xarray is primarily used as an I/O loader before dropping down to NumPy arrays and manual parallel orchestration.

    • Action: Wrap core Numba filters inside xr.apply_ufunc(..., dask="parallelized"). This allows Xarray to natively handle chunking and distributed execution.

  • Distributed Backends (Completed):

    • Native support for Dask and MPI backends with automatic environment detection and fallback logic.

  • Modern CLI & API (Completed):

    • Grouped, logical command-line interface with auto-configuration of parallel workers.

    • Flexible Tracker Protocol for cross-algorithm support.

  • Remote Data Support (Completed):

    • Native support for remote Zarr datasets via HTTP, S3, and GS protocols with automatic format detection.

4. Distribution & Ecosystem

  • Modular Dependencies (Completed):

    • Optional dependency groups (e.g., [hodges], [mpi], [grib]) to minimize build-time requirements and simplify installation in constrained environments like ReadTheDocs.

  • Conda-forge Distribution (Completed):

    • Available on conda-forge for easy cross-platform installation.

5. Feature Implementation

  • HodgesTracker Integration (Completed):

    • Native Python/Numba implementation of the Modified Greedy Exchange (MGE) algorithm with algorithmic parity to TRACK-1.5.2.

  • Preprocessing (Completed):

  • HodgesTracker Refinement (In Progress):

    • Action: Implement Dierckx B-spline surface fitting and evaluation in Numba to achieve bit-wise coordinate identity with original TRACK software.

  • Postprocessing (Track Metrics):

    • Action: Implement Accumulated Track Activity (ATA) and other storm track metrics from Yau and Chang (2020).

  • JAX-Based Feature Detection (Proposed):

    • Action: Develop JAX-native implementations of the extrema detection and intensity refinement kernels to support high-throughput, GPU-resident tracking pipelines.