SmartKNN: A production-focused, feature-weighted KNN optimized for CPU

Hi HN,

I’ve been working on SmartKNN, a nearest-neighbor system designed specifically for production deployment rather than academic experimentation.

The goal was not to slightly tweak classical KNN, but to restructure it into a deployable, latency-aware system while preserving interpretability.

What it does differently

Traditional KNN is simple and interpretable, but in practice it struggles with:

- Inference latency as datasets grow
- Equal treatment of all features
- Fixed distance metrics
- Unpredictable performance under load

SmartKNN addresses these issues through:

1. Learned Feature Weighting

Feature importance is learned automatically and incorporated into the distance computation. This reduces noise and improves neighbor quality without manual tuning.

2. Adaptive Distance Behavior

Distance computation adapts to learned feature relevance instead of relying on a fixed metric like plain Euclidean.
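To make the idea in points 1 and 2 concrete, here is a minimal sketch of a feature-weighted distance. This is an illustration of the general technique, not SmartKNN's actual implementation; the weight values and the suggestion of how they might be learned are assumptions.

```python
import numpy as np

def weighted_distance(x, y, weights):
    """Euclidean distance where each squared feature difference is scaled
    by a learned per-feature weight."""
    diff = x - y
    return np.sqrt(np.sum(weights * diff * diff))

# Toy illustration: imagine the third feature is mostly noise. Weights
# could be learned from something like per-feature relevance scores.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 100.0])
uniform = np.ones(3)                     # plain Euclidean
learned = np.array([0.6, 0.3, 0.001])    # third feature judged noisy

print(weighted_distance(x, y, uniform))  # dominated by the noisy feature
print(weighted_distance(x, y, learned))  # noise contribution suppressed
```

With uniform weights the noisy third feature dominates the distance; with learned weights the informative features decide which points count as neighbors.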

3. Backend Selection

SmartKNN supports both brute-force and approximate nearest-neighbor strategies.

- Small datasets → brute-force
- Larger datasets → approximate candidate retrieval

Approximate search is used only to retrieve candidates. Final prediction always uses the learned distance function.
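The two-stage pattern described above (cheap candidate retrieval, then exact reranking with the learned metric) can be sketched as follows. This is a simplified illustration, not SmartKNN's code: the candidate stage here is plain unweighted distance for brevity, where a real system would use an ANN index such as HNSW.

```python
import numpy as np

def two_stage_knn(query, data, weights, k=5, n_candidates=50):
    """Retrieve candidates with a cheap metric, then rerank the candidates
    with the learned, feature-weighted distance."""
    # Stage 1: cheap candidate retrieval (stand-in for an ANN index).
    coarse = np.sum((data - query) ** 2, axis=1)
    candidates = np.argpartition(coarse, n_candidates)[:n_candidates]

    # Stage 2: exact reranking of only the candidates, using learned weights.
    diffs = data[candidates] - query
    fine = np.sum(weights * diffs * diffs, axis=1)
    order = np.argsort(fine)[:k]
    return candidates[order]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 8))
weights = rng.uniform(0.1, 1.0, size=8)
idx = two_stage_knn(data[0], data, weights, k=3)
print(idx)
```

The point of the split is that the expensive learned metric is evaluated on only `n_candidates` points instead of the full dataset, which is what keeps latency bounded as the dataset grows.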

4. CPU-Focused Design

The system is optimized for predictable CPU inference performance rather than GPU-heavy workflows. The focus is stable latency characteristics suitable for production workloads.

5. Unified API

Supports both classification and regression through a scikit-learn compatible interface.
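To illustrate what a scikit-learn compatible, unified classify/regress interface looks like, here is a tiny self-contained estimator following the `fit`/`predict` convention. The class name, `mode` parameter, and inverse-variance weight heuristic are all hypothetical stand-ins, not SmartKNN's actual API; check the repository for the real interface.

```python
import numpy as np

class TinyWeightedKNN:
    """Minimal sklearn-style estimator sketch (illustration only).

    mode='classify' takes a majority vote, mode='regress' averages,
    over the k nearest feature-weighted neighbors.
    """

    def __init__(self, k=3, mode="classify"):
        self.k = k
        self.mode = mode

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        # Hypothetical weight "learning": inverse per-feature variance,
        # a crude stand-in for a learned relevance score.
        self.w_ = 1.0 / (self.X_.var(axis=0) + 1e-12)
        return self

    def predict(self, X):
        preds = []
        for q in np.asarray(X, dtype=float):
            d = np.sum(self.w_ * (self.X_ - q) ** 2, axis=1)
            nn = np.argsort(d)[: self.k]
            if self.mode == "classify":
                vals, counts = np.unique(self.y_[nn], return_counts=True)
                preds.append(vals[np.argmax(counts)])
            else:
                preds.append(self.y_[nn].mean())
        return np.array(preds)

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = [0, 0, 1, 1]
clf = TinyWeightedKNN(k=3).fit(X, y)
print(clf.predict([[0.2, 0.3], [5.1, 5.2]]))  # → [0 1]
```

The same object shape serves both tasks, which is the appeal of the unified interface: swapping classification for regression changes a parameter, not the calling code.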

Performance

On structured/tabular datasets with strong local structure, SmartKNN achieves accuracy competitive with tree-based models.

It does not aim to replace tree models or neural networks universally. It performs best where neighborhood structure is meaningful and interpretability is desired.

Limitations

- Requires the dataset to remain in memory
- High-dimensional dense data can still challenge nearest-neighbor methods
- No online/incremental updates yet
- Backend preparation adds setup time for large datasets

Project Status

- Public release: 0.2.2
- Stable API
- Open source
- CPU-optimized core

Repository: https://github.com/thatipamula-jashwanth/smart-knn

I’d appreciate feedback, especially from people who have deployed nearest-neighbor systems in production.

Thanks.

- Jashwanth

1 point | by Jashwanth01 2 hours ago

1 comment

  • verdverm 2 hours ago
    1. just submit a title and link, what you have is unclickable

    2. don't put readme like content on HN, let the other side of the link speak for itself

    3. a blog post about the experience or lessons learned will often do much better

    4. do you have a peer reviewed paper to go with this?

    • Jashwanth01 2 hours ago
Thanks for the feedback; that makes sense.

      I’ve updated the post to include a direct link to the repository. I appreciate the note about keeping the HN submission lighter and letting the linked page speak for itself.

      This project is engineering-focused rather than academic research, so there isn’t a peer-reviewed paper at this stage. The goal was to explore practical deployment tradeoffs in nearest-neighbor systems.

I’ll consider writing a blog post focused on lessons learned and design decisions; that’s a good suggestion.

      • verdverm 2 hours ago
My PhD dissertation started as an engineer's frustration, the papers can come later

URLs in the text are not links, I believe you will need a new submission