# How to reproduce `kneighbors_graph(include_self=True)` using `KNeighborsTransformer` in sklearn?

```
```

My ultimate goal is replace some methods that use `kneighbors_graph`

with transformers from the sklearn-ann package. All the methods in `sklearn-ann`

are implemented as sklearn-compatible transformer objects. However, the function I'm trying to replace uses `kneighbors_graph(mode="connectivity", include_self=True)`

and I'm having a hard time converting the distance output with `include_self=False`

to this type of connectivity matrix. Not all the transformer objects allow for connectivity mode while including self but all provide access to distance calculations without self.

I'm able to reproduce the `kneighbors_graph(mode="connectivity", include_self=True)`

from `kneighbors_graph(mode="distance", include_self=True)`

(referring to as `nn_with_self`

). However, I'm unable to reproduce it from `kneighbors_graph(mode="distance", include_self=False)`

(referring to as `nn_without_self`

) which is the same output as `KNeighborsTransformer(mode="distance").fit_transform`

.

I see that the `nn_without_self`

is a super set of `nn_with_self`

but I don't know how the backend algorithm selects which fields are kept.

**How can I recreate nn_with_self from the nn_without_self matrix below?**

**Further, how can I operate on sparse matrices the whole time without converting to dense matrices?**

I tried looking at the backend code but it's like an inception of class inheritance and I find myself pouring through several files at the same time losing track on the GitHub.

```
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors=3
# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True,n_jobs=-1).todense()
nn_without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=False,n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1).fit_transform(X)
np.all(nn_from_transformer == nn_without_self)
# True
np.all(nn_with_self == nn_without_self)
# False
# Is `nn_with_self` symmetric?
np.allclose(nn_with_self,nn_with_self.T)
# False
# Is `nn_without_self` symmetric?
np.allclose(nn_without_self,nn_without_self.T)
# False
```

Here are the actual arrays:

```
nn_with_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 0. , 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [0. , 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0. , 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
nn_without_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 1.02852831,
# 0. , 0. , 0. , 0. , 0. ],
# [0.70550439, 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 1.10390699, 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.95611187],
# [1.02852831, 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 1.40547465, 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 1.59848513, 0. ],
# [0. , 4.1280709 , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [1.36553076, 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0.95611187, 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
```

## Answer

With the following code, the operations are performed using sparse matrices throughout the process. The resulting sparse matrix nn_with_self_sparse includes both the k-neighbors and self-connections without the need to convert to a dense matrix at any point.

`>````
>import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsTransformer
from scipy.sparse import coo_matrix, diags
# Create a sample dataset
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3
# Generate the kneighbors graph without self-connections
knn_transformer = KNeighborsTransformer(n_neighbors=n_neighbors, mode="distance", metric="euclidean", n_jobs=-1)
knn_transformer.fit(X)
nn_without_self = knn_transformer.transform(X)
# Create a sparse diagonal matrix for self-connections with zero distance
num_samples = X.shape[0]
self_connections = diags([0] * num_samples, offsets=0, shape=(num_samples, num_samples), format='coo')
# Combine the k-neighbors graph without self-connections and the self-connections matrix
nn_with_self_sparse = nn_without_self + self_connections
print("Sparse matrix with self-connections:")
print(nn_with_self_sparse)
```