Numpy and SIMD

Numpy is by its design a SIMD structure, which is best examplified by the list indexing feature:python - How to filter numpy array by list of indices? - Stack Overflowfilter_indices = [1,3,5]np.array(

EverNoob

1502人浏览 · 2022-02-17 11:02:35

EverNoob · 2022-02-17 11:02:35 发布

Indexing

Numpy is by its design a SIMD structure, which is best examplified by the list indexing feature:

python - How to filter numpy array by list of indices? - Stack Overflow

filter_indices = [1,3,5]
np.array([11,13,155,22,0xff,32,56,88])[filter_indices]

Together with ohther python structures, such as dictionary, the SIMD execution can be done, but with a more unintuitive grammatic style:

python - Fast replacement of values in a numpy array - Stack Overflow

python - How to use a dictionary to translate/replace elements of an array? - Stack Overflow

from numpy import copy

newArray = copy(theArray)
for k, v in d.iteritems(): newArray[theArray==k] = v

numpy.copy — NumPy v1.22 Manual

test code:

#!/usr/bin/env python2.7

from numpy import copy, random, arange

random.seed(0)
data = random.randint(30, size=10**5)

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
dk = d.keys()
dv = d.values()

def f1(a, d):
    b = copy(a)
    for k, v in d.iteritems():
        b[a==k] = v
    return b

def f2(a, d):
    for i in xrange(len(a)):
        a[i] = d.get(a[i], a[i])
    return a

def f3(a, dk, dv):
    mp = arange(0, max(a)+1)
    mp[dk] = dv
    return mp[a]


a = copy(data)
res = f2(a, d)

assert (f1(data, d) == res).all()
assert (f3(data, dk, dv) == res).all()

Any All in Python - GeeksforGeeks

python - Use a.any() or a.all() - Stack Overflow

==> obviously direct indexing filter by numpy arrays is going to be faster than numpy + dict;

===> use numpy arrays when you can; python has weaker SIMD support in general.

the indices slicing support makes SIMD easier and more versatile in implementation, but depending on the hardware support and numpy low-level implementations, i.e. how the hardware organize physical storage and accesses and how numpy utilize such features, different slicing directions might have drastically different performances.

==> in general assume row-major or C-like array storage structures, meaning slicing lower indices will be more efficient.

==> some ASICs might have much stronger support for higher indices slicing.

Some Tips

Numpy

While python is not structured for SIMD style prarallel programming, since you don't even have accesses to pointers, Numpy is. And this package has proven to be quite efficient (, power of modular design I guess), so try to base your SIMD coding with Numpy.

Slicing

Numpy slicing is not as straightforward as manipulating the memory buffers directly with pointers in C/C++, but still many slicing options are supported; there are 2 versions:

1. basic:

array[row][col][level]

this is a successive access, reducing array dimensionality by 1 per dereference action, [], if a non-full range of indices is given.

slicing in this mode is restrictive:

array[:][:col][:level]

is actually

array[:col][:level]

==> use this mode for clear, structured expressions of dereferences.

2. multi-directional access, or full slicing support

array[row, col, level]

this mode does the same for specified accesses, but differs in slicing behavior from successive derefences;

array[:, :col, :level]

actually works as intended, i.e. "for all rows, take all cols till col; for each of the taken cols, take all levels till level."

!!!!

Slicing combined with filter/list indexing as introduced in Indexing section is a powerful feature, which strongly mirrors a basic gather/scatter action in most SIMD ISAs, but they can be extremely confusing to use.

e.g.

for a 3D array a:

array([[[0.87330218, 0.348806 , 0.98876196, 0.44153593, 0.35657919],
[0.0591688 , 0.01207211, 0.76808385, 0.5382626 , 0.74737973],
[0.61562341, 0.49494463, 0.99326787, 0.78333718, 0.18965861],
[0.10603183, 0.78535426, 0.54849272, 0.6651616 , 0.99013694]],

[[0.42220155, 0.65080645, 0.92558894, 0.11468048, 0.70492543],
[0.58528903, 0.71053382, 0.96009024, 0.84545703, 0.89357304],
[0.61943998, 0.99428317, 0.54617109, 0.62770748, 0.39451982],
[0.94771556, 0.56667405, 0.18225097, 0.75520699, 0.99649013]],

[[0.05937206, 0.71885611, 0.08577789, 0.82468742, 0.61361646],
[0.13556848, 0.05283339, 0.63987149, 0.91302604, 0.37158879],
[0.37965324, 0.71274351, 0.19897426, 0.48187764, 0.55820695],
[0.20501126, 0.44322089, 0.90804689, 0.55505773, 0.66719231]]])

a[0][np.array((1,2)), np.array((3,4))] or a[0, np.array((1,2)), np.array((3,4))]

will give:

array([0.5382626 , 0.18965861])

add in slicing:

a[:2, np.array((1,2)), np.array((3,4))]

==>
array([[0.5382626 , 0.18965861],
[0.84545703, 0.39451982]])

while

a[:2][np.array((1,2)), np.array((3,4))]

==>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 2 is out of bounds for axis 0 with size 2

Maybe try:

==> try with a terminal and toy examples to help;

==> while slicing, except for only the lowest index, use the [,,,,,] mode/notation only, to avoid confusing yourself;

==> of course, you can always reshape whichever array you are dealing with into 1D and treat it as a C pointer.

Grammar

min vs. minima

minimum is the elementwise SIMD like comparison

numpy.minimum — NumPy v1.22 Manual

array.min/np.amin is a reduction operation, not elementwise

numpy.ndarray.min — NumPy v1.22 Manual

numpy.amin — NumPy v1.22 Manual

==> naturally the same applies to max/maximum

additionally, for the position of the extrema, see:

numpy.argmin — NumPy v1.22 Manual

Broadcast

for initialization use np.full():

python - NumPy array initialization (fill with identical values) - Stack Overflow

updates: array.fill()

numpy.ndarray.fill — NumPy v1.22 Manual

SIMD Alignment

pad an existing array

numpy.pad — NumPy v1.22 Manual

reshape and resize: reshape explicitly requires the new shape to be compatible with the old one

numpy.resize — NumPy v1.22 Manual

numpy.reshape — NumPy v1.22 Manual

==> by choosing order='C' / 'F' (row major or C-like vs. column major or Fortran-like) while casting arrays from nD to 1D or vice versa, you can achieve interleave/deinterleave effects.

Logical Comparison

numpy.greater — NumPy v1.22 Manual

更专业、系统、实战的高性能计算学习资源地

汇聚原天河团队并行计算工程师、中科院计算所专家以及头部AI名企HPC专家，助力解决“卡脖子”问题

更多推荐

计算机广东考研大学排名,广东省最值得考研的8所大学

本文由木哥原创，欢迎转载分享，转载须注明来自百家号“木哥说教”！今日话题：广东省最值得大家考研的8所大学！广东省虽然不是中国好大学最多的省份，但是却是很多名校毕业生最想去的地方，俗话说北上广深，广东就占据两位，足以看得出来广东是非常受大学生欢迎的一个城市，其实广东也有很多好的大学，对于一些想考研的学生也可以参考一下，木哥给大家梳理了8所比较不错的大学出来了，希望能帮到大家。第一所大学：中山大学又叫

高性能计算社区

【考研英语】词汇积累（详细全面，2023最新版）

缺课miss a class、旷课cut a class、辍学drop out、退学quit school、助学金assistantship、奖学金scholarship。老人the elderly、the old、the aged、the senior、aged parents、the old and infirm parents。只有only、nothing but、merely、大约abou

高性能计算社区

计算机类专业学习难度排行,全国33所名校计算机专业考研跨考难度系数排名

计算机系统结构前20名(总共47所高校)1.清华大学，2.华中科技大学，3.西安交通大学，4.上海交通大学，5.浙江大学，6.西安电子科技大学，7.武汉大学，8.复旦大学，9.哈尔滨工业大学，10.东北大学，11.北京大学，12.东南大学，13.北京航空航天大学，14.中国科学技术大学，15.电子科技大学，16.吉林大学，17.南京理工大学，18.重庆大学，19.北京科技大学，20.同济大学计算机