Answer a question

I have a point

point = np.array([0.07852388, 0.60007135, 0.92925712, 0.62700219, 0.16943809,
       0.34235233])

And a pandas dataframe

           a           b           c           d           e           f
0   0.025641    0.554686    0.988809    0.176905    0.050028    0.333333
1   0.027151    0.520914    0.985590    0.409572    0.163980    0.424242
2   0.028788    0.478810    0.970480    0.288557    0.095053    0.939394
3   0.018692    0.450573    0.985910    0.178048    0.118399    0.484848
4   0.023256    0.787253    0.865287    0.217591    0.205670    0.303030

I would like to calculate the distance from every row of the dataframe to that specific point.

I tried

import numpy as np

d_all = []
# Loop over the rows one at a time and compute each distance separately
for index, row in df_scaled[cols_list].iterrows():
    d = np.linalg.norm(point - row.to_numpy())
    d_all.append(d)
df_scaled['distance_cluster'] = d_all

My solution is really slow, though (especially since I want to calculate the distance from other points as well).

Is there a way to do my calculations more efficiently?

Answers

You can compute vectorized Euclidean distance (L2 norm) using the formula

sqrt((a1 - b1)^2 + (a2 - b2)^2 + ...)

df.sub(point, axis=1).pow(2).sum(axis=1).pow(.5)

0    0.474690
1    0.257080
2    0.703857
3    0.503596
4    0.461151
dtype: float64

This gives the same output as your current code.


Or, using linalg.norm:

np.linalg.norm(df.to_numpy() - point, axis=1)
# array([0.47468985, 0.25707985, 0.70385676, 0.5035961 , 0.46115096])
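
Since you mention wanting distances from other points as well, the same idea extends to several points at once with broadcasting. This is only a minimal sketch: the second point (`other`, taken here as the column means) and the output column names are made up for illustration and are not part of your data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "a": [0.025641, 0.027151, 0.028788, 0.018692, 0.023256],
    "b": [0.554686, 0.520914, 0.478810, 0.450573, 0.787253],
    "c": [0.988809, 0.985590, 0.970480, 0.985910, 0.865287],
    "d": [0.176905, 0.409572, 0.288557, 0.178048, 0.217591],
    "e": [0.050028, 0.163980, 0.095053, 0.118399, 0.205670],
    "f": [0.333333, 0.424242, 0.939394, 0.484848, 0.303030],
})
point = np.array([0.07852388, 0.60007135, 0.92925712,
                  0.62700219, 0.16943809, 0.34235233])

# Hypothetical second point, used only to illustrate the multi-point case
other = df.mean().to_numpy()
points = np.vstack([point, other])            # shape (n_points, n_cols)

# Broadcast (n_rows, 1, n_cols) against (1, n_points, n_cols)
diff = df.to_numpy()[:, None, :] - points[None, :, :]
dists = np.linalg.norm(diff, axis=2)          # shape (n_rows, n_points)

# One distance column per point; the column names are arbitrary
dist_df = pd.DataFrame(dists, index=df.index,
                       columns=["dist_point_0", "dist_point_1"])
print(dist_df)
```

The first column should match the single-point results above. Note that the intermediate `diff` array holds n_rows * n_points * n_cols values, so with many points it may be better to loop over the points and use the one-liner for each.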