What is the fastest way to compute an RBF kernel in python?

Question 1

I would like to compute an RBF or "Gaussian" kernel for a data matrix X with n rows and d columns. The resulting square kernel matrix is given by:

K[i,j] = var * exp(-gamma * ||X[i] - X[j]||^2)

var and gamma are scalars.

What is the fastest way to do this in python?

Question 2

Well you are doing a lot of optimizations in your answer post. I would like to add few more (mostly tweaks). I would build upon the winner from the answer post, which seems to be numexpr based on.

Tweak #1

First off, np.sum(X ** 2, axis = -1) could be optimized with np.einsum. Though this part isn't the biggest overhead, but optimization of any sort won't hurt. So, that summation could be expressed as -

X_norm = np.einsum('ij,ij->i',X,X)

Tweak #2

Secondly, we could leverage Scipy supported blas functions and if allowed use single-precision dtype for noticeable performance improvement over its double precision one. Hence, np.dot(X, X.T) could be computed with SciPy's sgemm like so -

sgemm(alpha=1.0, a=X, b=X, trans_b=True)

Few more tweaks on rearranging the negative sign with gamma lets us feed more to sgemm. Also, we would push in gamma into the alpha term.

Tweaked implementations

Thus, with these two optimizations, we would have two more variants (if I could put it that way) of the numexpr method, listed below -

from scipy.linalg.blas import sgemm

def app1(X, gamma, var):
    X_norm = -np.einsum('ij,ij->i',X,X)
    return ne.evaluate('v * exp(g * (A + B + 2 * C))', {\
        'A' : X_norm[:,None],\
        'B' : X_norm[None,:],\
        'C' : np.dot(X, X.T),\
        'g' : gamma,\
        'v' : var\
    })

def app2(X, gamma, var):
    X_norm = -gamma*np.einsum('ij,ij->i',X,X)
    return ne.evaluate('v * exp(A + B + C)', {\
        'A' : X_norm[:,None],\
        'B' : X_norm[None,:],\
        'C' : sgemm(alpha=2.0*gamma, a=X, b=X, trans_b=True),\
        'g' : gamma,\
        'v' : var\
    })

Runtime test

Numexpr based one from your answer post -

def app0(X, gamma, var):
    X_norm = np.sum(X ** 2, axis = -1)
    return ne.evaluate('v * exp(-g * (A + B - 2 * C))', {
            'A' : X_norm[:,None],
            'B' : X_norm[None,:],
            'C' : np.dot(X, X.T),
            'g' : gamma,
            'v' : var
    })

Timings and verification -

In [165]: # Setup
     ...: X = np.random.randn(10000, 512)
     ...: gamma = 0.01
     ...: var = 5.0

In [166]: %timeit app0(X, gamma, var)
     ...: %timeit app1(X, gamma, var)
     ...: %timeit app2(X, gamma, var)
1 loop, best of 3: 1.25 s per loop
1 loop, best of 3: 1.24 s per loop
1 loop, best of 3: 973 ms per loop

In [167]: np.allclose(app0(X, gamma, var), app1(X, gamma, var))
Out[167]: True

In [168]: np.allclose(app0(X, gamma, var), app2(X, gamma, var))
Out[168]: True

What is the fastest way to compute an RBF kernel in python?

Mangs

Answer a question

Answers

Tweak #1

Tweak #2

Tweaked implementations

Runtime test

所有评论(0)

Mangs