How to reproduce `kneighbors_graph(include_self=True)` using `KNeighborsTransformer` in sklearn?

How to reproduce `kneighbors_graph(include_self=True)` using `KNeighborsTransformer` in sklearn?

  

My ultimate goal is replace some methods that use kneighbors_graph with transformers from the sklearn-ann package. All the methods in sklearn-ann are implemented as sklearn-compatible transformer objects. However, the function I'm trying to replace uses kneighbors_graph(mode="connectivity", include_self=True) and I'm having a hard time converting the distance output with include_self=False to this type of connectivity matrix. Not all the transformer objects allow for connectivity mode while including self but all provide access to distance calculations without self.

I'm able to reproduce the kneighbors_graph(mode="connectivity", include_self=True) from kneighbors_graph(mode="distance", include_self=True) (referring to as nn_with_self). However, I'm unable to reproduce it from kneighbors_graph(mode="distance", include_self=False) (referring to as nn_without_self) which is the same output as KNeighborsTransformer(mode="distance").fit_transform.

I see that the nn_without_self is a super set of nn_with_self but I don't know how the backend algorithm selects which fields are kept.

How can I recreate nn_with_self from the nn_without_self matrix below?

Further, how can I operate on sparse matrices the whole time without converting to dense matrices?

I tried looking at the backend code but it's like an inception of class inheritance and I find myself pouring through several files at the same time losing track on the GitHub.

from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors=3

# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True,n_jobs=-1).todense()
nn_without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=False,n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1).fit_transform(X)

np.all(nn_from_transformer == nn_without_self)
# True

np.all(nn_with_self == nn_without_self)
# False

# Is `nn_with_self` symmetric?
np.allclose(nn_with_self,nn_with_self.T)
# False

# Is `nn_without_self` symmetric?
np.allclose(nn_without_self,nn_without_self.T)
# False

Here are the actual arrays:

nn_with_self
# matrix([[0.        , 0.70550439, 0.        , 0.20463097, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.        , 0.51947869, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44145655],
#         [0.        , 0.        , 0.        , 0.        , 0.50025504,
#          0.        , 0.        , 0.        , 0.49481662, 0.        ],
#         [0.20463097, 0.51947869, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.50025504, 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.34132965, 0.        ],
#         [0.        , 0.88867318, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44956691],
#         [0.        , 0.        , 1.10390699, 0.        , 1.52953542,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.        , 0.        , 0.        ,
#          3.62670755, 0.        , 0.        , 0.        , 3.83571739],
#         [0.        , 0.        , 0.49481662, 0.        , 0.34132965,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.44145655, 0.        , 0.        , 0.        ,
#          0.44956691, 0.        , 0.        , 0.        , 0.        ]])

nn_without_self
# matrix([[0.        , 0.70550439, 0.        , 0.20463097, 1.02852831,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.70550439, 0.        , 0.        , 0.51947869, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44145655],
#         [0.        , 0.        , 0.        , 0.        , 0.50025504,
#          0.        , 1.10390699, 0.        , 0.49481662, 0.        ],
#         [0.20463097, 0.51947869, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.95611187],
#         [1.02852831, 0.        , 0.50025504, 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.34132965, 0.        ],
#         [0.        , 0.88867318, 0.        , 1.40547465, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44956691],
#         [0.        , 0.        , 1.10390699, 0.        , 1.52953542,
#          0.        , 0.        , 0.        , 1.59848513, 0.        ],
#         [0.        , 4.1280709 , 0.        , 0.        , 0.        ,
#          3.62670755, 0.        , 0.        , 0.        , 3.83571739],
#         [1.36553076, 0.        , 0.49481662, 0.        , 0.34132965,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.44145655, 0.        , 0.95611187, 0.        ,
#          0.44956691, 0.        , 0.        , 0.        , 0.        ]])

Answer

With the following code, the operations are performed using sparse matrices throughout the process. The resulting sparse matrix nn_with_self_sparse includes both the k-neighbors and self-connections without the need to convert to a dense matrix at any point.

>>import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsTransformer
from scipy.sparse import coo_matrix, diags

# Create a sample dataset
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3

# Generate the kneighbors graph without self-connections
knn_transformer = KNeighborsTransformer(n_neighbors=n_neighbors, mode="distance", metric="euclidean", n_jobs=-1)
knn_transformer.fit(X)
nn_without_self = knn_transformer.transform(X)

# Create a sparse diagonal matrix for self-connections with zero distance
num_samples = X.shape[0]
self_connections = diags([0] * num_samples, offsets=0, shape=(num_samples, num_samples), format='coo')

# Combine the k-neighbors graph without self-connections and the self-connections matrix
nn_with_self_sparse = nn_without_self + self_connections

print("Sparse matrix with self-connections:")
print(nn_with_self_sparse)


© 2024 Dagalaxy. All rights reserved.