Metric Learning Series (3): Weakly Supervised Metric Learning

windSeS · Published 2019-11-23 11:27:06

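Weakly supervised metric learning algorithms work with weaker information about the data points than supervised ones. Instead of learning directly from labeled points, they learn from tuples of points that are known to be similar or dissimilar.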

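1. General API

1.1 Input data

This section introduces tuples. A tuple can be a pair, a triplet, a quadruplet, and so on; different metric learning algorithms expect different kinds of tuples.

1.1.1 Basic form

Every weakly supervised algorithm takes tuples as input, together with labels for those tuples when the algorithm needs them (a label describes the relationship between the points in a tuple). These tuples are also called constraints. A tuple holds the set of points to consider (two points, three points, etc.), and the label carries the information about that set (for example, "these two points are similar"). Note that the order of the points within a tuple can also carry information, as in quadruplet-based learners.

For a weakly supervised learner, the tuples argument plays the role that $x$ plays for a supervised learner. The second argument is the label of each tuple, and its meaning depends on the algorithm. For a pair learner, for instance, the label says whether the two points of the pair are similar or dissimilar.

The metric is then learned from the tuples and their labels: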

>>> my_algo.fit(tuples, y)

The tuples and their labels can also be split into a training set and a test set, just like an ordinary dataset:

>>> from sklearn.model_selection import train_test_split
>>> pairs_train, pairs_test, y_train, y_test = train_test_split(pairs, y)

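There are two ways to organize the tuples.

1.1.2 3D array of tuples

The most direct representation is a 3D array of shape (n_tuples, t, n_features), where n_tuples is the number of tuples, t is the number of points in each tuple, and n_features is the number of features of each point.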

>>> import numpy as np
>>> tuples = np.array([[[-0.12, -1.21, -0.20],
...                     [+0.05, -0.19, -0.05]],
...
...                    [[-2.16, +0.11, -0.02],
...                     [+1.58, +0.16, +0.93]],
...
...                    [[+1.58, +0.16, +0.93],  # same as tuples[1, 1, :]
...                     [+0.89, -0.34, +2.41]],
...
...                    [[-0.12, -1.21, -0.20],  # same as tuples[0, 0, :]
...                     [-2.16, +0.11, -0.02]]])  # same as tuples[1, 0, :]
>>> y = np.array([-1, 1, 1, -1])

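Note: this representation is not recommended for a large number of tuples. It is highly redundant, because the feature vector of a point is copied into every tuple position where that point appears, so it uses a lot of memory. For large tuple sets, the representation below is more efficient.

1.1.3 2D array of indices + preprocessor

With many tuples, it is more efficient to store the points once in a dataset X and to represent each tuple by the indices of its points in X.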

>>> X = np.array([[-0.12, -1.21, -0.20],
...               [+0.05, -0.19, -0.05],
...               [-2.16, +0.11, -0.02],
...               [+1.58, +0.16, +0.93],
...               [+0.89, -0.34, +2.41]])
>>>
>>> tuples_indices = np.array([[0, 1],
...                            [2, 3],
...                            [3, 4],
...                            [0, 2]])
>>> y = np.array([-1, 1, 1, -1])
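
To see that the two representations carry the same information, the 3D array can be rebuilt from X and the index array with NumPy integer indexing. The sketch below only illustrates that equivalence and the memory argument; it is not a description of what the library does internally. The names tuples_3d, X_big and idx_big, and the sizes of the synthetic example, are assumptions made for this illustration.

```python
import numpy as np

X = np.array([[-0.12, -1.21, -0.20],
              [+0.05, -0.19, -0.05],
              [-2.16, +0.11, -0.02],
              [+1.58, +0.16, +0.93],
              [+0.89, -0.34, +2.41]])
tuples_indices = np.array([[0, 1], [2, 3], [3, 4], [0, 2]])

# Indexing X with an (n_tuples, t) integer array gives (n_tuples, t, n_features).
tuples_3d = X[tuples_indices]
print(tuples_3d.shape)  # (4, 2, 3)

# Memory comparison on a larger synthetic example (sizes are arbitrary).
rng = np.random.RandomState(0)
X_big = rng.randn(100, 50)                    # 100 points, 50 features
idx_big = rng.randint(100, size=(10000, 2))   # 10000 pairs of indices
bytes_3d = X_big[idx_big].nbytes              # every point copied into its pairs
bytes_2d = X_big.nbytes + idx_big.nbytes      # points stored once, plus indices
print(bytes_3d, bytes_2d)
```

The gap grows with the number of tuples and with the number of features, since the 3D array stores a full feature vector per tuple slot while the index array stores a single integer.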

1.2 Fit, transform, and so on

The goal of a weakly supervised metric learner is to transform points into a new space in a way that respects the tuple constraints.

>>> from metric_learn import MMC
>>> mmc = MMC(random_state=42)
>>> mmc.fit(tuples, y)
MMC(A0='deprecated', convergence_threshold=0.001, diagonal=False,
  diagonal_c=1.0, init=None, max_iter=100, max_proj=10000,
  preprocessor=None, random_state=42, verbose=False)

Or, using a preprocessor:

>>> from metric_learn import MMC
>>> mmc = MMC(preprocessor=X, random_state=42)
>>> mmc.fit(tuples_indices, y)

The estimator is now fitted and can be used in several ways.

First, transform maps new data into the learned space. Here we transform two points into the new embedding space:

>>> X_new = np.array([[9.4, 4.1, 4.2], [2.1, 4.4, 2.3]])
>>> mmc.transform(X_new)
array([[-3.24667162e+01,  4.62622348e-07,  3.88325421e-08],
       [-3.61531114e+01,  4.86778289e-07,  2.12654397e-08]])

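As with the learners covered in the earlier posts of this series, the fitted learner has also learned a distance between points. It can be used in two ways: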

score_pairs

>>> mmc.score_pairs([[[3.5, 3.6, 5.2], [5.6, 2.4, 6.7]],
...                  [[1.2, 4.2, 7.7], [2.1, 6.4, 0.9]]])
array([7.27607365, 0.88853014])

get_metric

>>> metric_fun = mmc.get_metric()
>>> metric_fun([3.5, 3.6, 5.2], [5.6, 2.4, 6.7])
7.276073646278203

The Mahalanobis matrix itself is available through get_mahalanobis_matrix:

>>> mmc.get_mahalanobis_matrix()
array([[ 0.58603894, -5.69883982, -1.66614919],
       [-5.69883982, 55.41743549, 16.20219519],
       [-1.66614919, 16.20219519,  4.73697721]])
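
The numbers printed above fit together: for a Mahalanobis matrix M, the learned distance is d(x, y) = sqrt((x - y)^T M (x - y)). The check below recomputes the two score_pairs outputs from the printed matrix using NumPy only. The matrix is copied from the rounded output above, so agreement is only up to rounding, and the helper name mahalanobis_distance is made up for this sketch.

```python
import numpy as np

# Matrix copied from the get_mahalanobis_matrix() output above (rounded).
M = np.array([[ 0.58603894, -5.69883982, -1.66614919],
              [-5.69883982, 55.41743549, 16.20219519],
              [-1.66614919, 16.20219519,  4.73697721]])

def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

d1 = mahalanobis_distance([3.5, 3.6, 5.2], [5.6, 2.4, 6.7], M)
d2 = mahalanobis_distance([1.2, 4.2, 7.7], [2.1, 6.4, 0.9], M)
print(d1, d2)  # close to 7.276 and 0.889, the score_pairs values above
```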

1.3 Prediction and scoring

>>> from metric_learn import MMC
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> rng = np.random.RandomState(42)
>>> X, _ = load_iris(return_X_y=True)
>>> # let's sample 30 random pairs and labels of pairs
>>> pairs_indices = rng.randint(X.shape[0], size=(30, 2))
>>> y = 2 * rng.randint(2, size=30) - 1
>>> mmc = MMC(preprocessor=X)
>>> cross_val_score(mmc, pairs_indices, y)

2. Learning on pairs

2.1 Fitting

>>> from metric_learn import MMC
>>> pairs = np.array([[[1.2, 3.2], [2.3, 5.5]],
...                   [[4.5, 2.3], [2.1, 2.3]]])
>>> y_pairs = np.array([1, -1])
>>> mmc = MMC(random_state=42)
>>> mmc.fit(pairs, y_pairs)
MMC(A0='deprecated', convergence_threshold=0.001, diagonal=False,
    diagonal_c=1.0, init=None, max_iter=100, max_proj=10000, preprocessor=None,
    random_state=42, verbose=False)

Here we learn a metric that pulls the two points of the first pair closer together and pushes the two points of the second pair further apart.

2.2 Prediction

>>> mmc.predict([[[0.6, 1.6], [1.15, 2.75]],
...              [[3.2, 1.1], [5.4, 6.1]]])
array([1, -1])

A threshold has to be set before the computed distance can be turned into a similar/dissimilar decision. It can be set in three ways:

Calibration at fit time:

>>> mmc.fit(pairs, y) # will fit the threshold automatically after fitting

Calibration on validation set:

>>> mmc.calibrate_threshold(pairs, y)

Manual threshold:

>>> mmc.set_threshold(0.4)

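2.3 Scoring

Pair learners expose a decision_function that scores a set of pairs. The score is based on the distance between the two points in the learned space, and it is the value that gets thresholded to decide whether the two points belong to the same class. The outputs below are negative, which is consistent with the score being the opposite of the distance, so that a higher score means a more similar pair.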

>>> mmc.decision_function([[[0.6, 1.6], [1.15, 2.75]],
...                        [[3.2, 1.1], [5.4, 6.1]]])
array([-0.12811124, -0.74750256])
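
The link between scores, threshold and predictions can be sketched without the library. The code below assumes the rule "predict +1 when the score is at least the threshold, otherwise -1" and picks the threshold that maximizes accuracy on a few labeled pairs. Both helper functions are hypothetical and only mimic the idea behind threshold calibration; they are not metric-learn's implementation.

```python
import numpy as np

def predict_from_scores(scores, threshold):
    """Return +1 (similar) where score >= threshold, else -1 (dissimilar)."""
    return np.where(np.asarray(scores, dtype=float) >= threshold, 1, -1)

def best_accuracy_threshold(scores, labels):
    """Hypothetical helper: try one cut below, between and above the sorted
    scores, and keep the first cut with the highest accuracy."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    ordered = np.sort(scores)
    candidates = np.concatenate(([ordered[0] - 1.0],
                                 (ordered[:-1] + ordered[1:]) / 2.0,
                                 [ordered[-1] + 1.0]))
    accuracies = [np.mean(predict_from_scores(scores, t) == labels)
                  for t in candidates]
    return float(candidates[int(np.argmax(accuracies))])

scores = np.array([-0.12811124, -0.74750256])  # decision_function output above
labels = np.array([1, -1])                     # labels assumed for this sketch
threshold = best_accuracy_threshold(scores, labels)
predictions = predict_from_scores(scores, threshold)
print(threshold, predictions)
```

With only two scores the best cut is simply their midpoint; on a real validation set the same scan runs over many more candidate cuts.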

Other ways to evaluate:

>>> from sklearn.model_selection import cross_val_score
>>> pairs_test = np.array([[[0.6, 1.6], [1.15, 2.75]],
...                        [[3.2, 1.1], [5.4, 6.1]],
...                        [[7.7, 5.6], [1.23, 8.4]]])
>>> y_test = np.array([-1., 1., -1.])
>>> cross_val_score(mmc, pairs_test, y_test, scoring='accuracy')
array([1., 0., 1.])

>>> pairs_test = np.array([[[0.6, 1.6], [1.15, 2.75]],
...                        [[3.2, 1.1], [5.4, 6.1]],
...                        [[7.7, 5.6], [1.23, 8.4]]])
>>> y_test = np.array([1., -1., -1.])
>>> mmc.score(pairs_test, y_test)
1.0

2.4 Algorithms

2.4.1 ITML

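Information Theoretic Metric Learning (ITML)

ITML minimizes the relative entropy, also known as the Kullback-Leibler (KL) divergence, between two multivariate Gaussians under constraints on the Mahalanobis distance. This can be formulated as a Bregman optimization problem: minimizing the LogDet divergence subject to linear constraints. The algorithm can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Unlike some other methods, ITML does not rely on eigenvalue computations or semi-definite programming.

Given a Mahalanobis distance parameterized by a matrix $\mathbf{A}$, the corresponding multivariate Gaussian is:

$p(\mathbf{x} ; \mathbf{A})=\frac{1}{Z} \exp \left(-\frac{1}{2} d_{\mathbf{A}}(\mathbf{x}, \mu)\right)=\frac{1}{Z} \exp \left(-\frac{1}{2}(\mathbf{x}-\mu)^{T} \mathbf{A}(\mathbf{x}-\mu)\right)$

where $Z$ is a normalization constant and $\mathbf{A}^{-1}$, the inverse of the Mahalanobis matrix, is the covariance matrix of the Gaussian.

Given a set $S$ of similar pairs and a set $D$ of dissimilar pairs, metric learning becomes the problem of minimizing the LogDet divergence below, which is equivalent to minimizing $\mathbf{K L}\left(p\left(\mathbf{x} ; \mathbf{A}_{0}\right) \| p(\mathbf{x} ; \mathbf{A})\right)$:

$\begin{aligned} \min _{\mathbf{A}} D_{\ell \mathrm{d}}\left(A, A_{0}\right)=& \operatorname{tr}\left(A A_{0}^{-1}\right)-\log \operatorname{det}\left(A A_{0}^{-1}\right)-n \\ \text { subject to } & d_{\mathbf{A}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \leq u \quad\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \in S \\ & d_{\mathbf{A}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \geq l \quad\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \in D \end{aligned}$

where $u$ is the upper bound on the distance between similar pairs, $l$ is the lower bound on the distance between dissimilar pairs, $\mathbf{A}_0$ is the prior distance metric (the identity matrix by default), $n$ is the dimension of the data, and $D_{\ell \mathrm{d}}(\cdot, \cdot)$ is the LogDet divergence.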
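
The LogDet divergence in the objective is easy to evaluate numerically. The function below is a minimal NumPy sketch of the formula above, assuming both matrices are symmetric positive definite; the name logdet_divergence is chosen for this illustration and it is not part of metric-learn.

```python
import numpy as np

def logdet_divergence(A, A0):
    """tr(A A0^-1) - log det(A A0^-1) - n for positive definite A and A0."""
    A = np.asarray(A, dtype=float)
    A0 = np.asarray(A0, dtype=float)
    n = A.shape[0]
    P = A @ np.linalg.inv(A0)
    sign, logdet = np.linalg.slogdet(P)
    if sign <= 0:
        raise ValueError("A and A0 must be positive definite")
    return float(np.trace(P) - logdet - n)

I2 = np.eye(2)
print(logdet_divergence(I2, I2))      # 0.0: the metric equals the prior
print(logdet_divergence(2 * I2, I2))  # 2 - log(4), about 0.614
```

The divergence is zero when the metric equals the prior and grows as the two move apart, which is what keeps the learned matrix close to $\mathbf{A}_0$ unless the constraints require otherwise.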

from metric_learn import ITML

pairs = [[[1.2, 7.5], [1.3, 1.5]],
         [[6.4, 2.6], [6.2, 9.7]],
         [[1.3, 4.5], [3.2, 4.6]],
         [[6.2, 5.5], [5.4, 5.4]]]
y = [1, 1, -1, -1]

# in this task we want points where the first feature is close to be closer
# to each other, no matter how close the second feature is


itml = ITML()
itml.fit(pairs, y)

References:

[1] Jason V. Davis, et al. Information-theoretic Metric Learning. ICML 2007
[2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/

2.4.2 SDML

Sparse High-Dimensional Metric Learning (SDML)

SDML is an efficient sparse metric learning method for high-dimensional spaces. It uses a double regularization: an L1 penalty on the off-diagonal elements of the Mahalanobis matrix $\mathbf{M}$, and a log-determinant divergence between $\mathbf{M}$ and $\mathbf{M_0}$ (set to either $\mathbf{I}$ or $\mathbf{\Omega}^{-1}$, where $\mathbf{\Omega}$ is the covariance matrix).

The resulting optimization problem over the positive semi-definite matrix $\mathbf{M}$ is convex:
$\min _{\mathbf{M}} \operatorname{tr}\left(\left(\mathbf{M}_{0}+\eta \mathbf{X} \mathbf{L} \mathbf{X}^{T}\right) \cdot \mathbf{M}\right)-\log \operatorname{det} \mathbf{M}+\lambda\|\mathbf{M}\|_{1, o f f}$
where $\mathbf{X}=[\mathbf{x_1}, \mathbf{x_2}, ..., \mathbf{x_n}]$ is the training data, the incidence matrix entry $\mathbf{K}_{ij}=1$ indicates that $(\mathbf{x_i}, \mathbf{x_j})$ is a similar pair (otherwise the pair is dissimilar), $\mathbf{L}=\mathbf{D}-\mathbf{K}$ is the Laplacian matrix, $\mathbf{D}$ is a diagonal matrix whose diagonal elements are the row sums of $\mathbf{K}$, and $\|\cdot\|_{1, o f f}$ is the off-diagonal L1 norm.
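
The term $\mathbf{X} \mathbf{L} \mathbf{X}^{T}$ is less abstract than it looks: for a symmetric $\mathbf{K}$ it equals the sum, over constrained pairs, of $\mathbf{K}_{ij}$ times the outer product of the pair difference. The NumPy sketch below checks this on the four pairs used in the SDML example of this post. It assumes $\mathbf{K}_{ij}=-1$ for dissimilar pairs and 0 for pairs without a constraint, which the text above does not spell out, and the helper l1_off_diagonal is a name made up here.

```python
import numpy as np

# The eight points of the four example pairs; columns of X are points.
points = np.array([[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7],
                   [1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]])
X = points.T                           # shape (n_features, n_points)
pair_idx = [(0, 1), (2, 3), (4, 5), (6, 7)]
y = [1, 1, -1, -1]                     # +1 similar, -1 dissimilar

n = points.shape[0]
K = np.zeros((n, n))
for (i, j), label in zip(pair_idx, y):
    K[i, j] = K[j, i] = label          # assumption: -1 dissimilar, 0 unconstrained
D = np.diag(K.sum(axis=1))             # diagonal matrix of row sums of K
L = D - K                              # Laplacian

# X L X^T equals the label-weighted sum of outer products of pair differences.
lhs = X @ L @ X.T
rhs = sum(label * np.outer(points[i] - points[j], points[i] - points[j])
          for (i, j), label in zip(pair_idx, y))

def l1_off_diagonal(M):
    """Sum of absolute values of the off-diagonal entries of M."""
    M = np.asarray(M, dtype=float)
    return float(np.abs(M).sum() - np.abs(np.diag(M)).sum())

print(np.allclose(lhs, rhs), l1_off_diagonal(lhs))
```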

from metric_learn import SDML

pairs = [[[1.2, 7.5], [1.3, 1.5]],
         [[6.4, 2.6], [6.2, 9.7]],
         [[1.3, 4.5], [3.2, 4.6]],
         [[6.2, 5.5], [5.4, 5.4]]]
y = [1, 1, -1, -1]

# in this task we want points where the first feature is close to be closer
# to each other, no matter how close the second feature is

sdml = SDML()
sdml.fit(pairs, y)

References:

[1] Qi et al. An efficient sparse metric learning in high-dimensional space via L1-penalized log-determinant regularization. ICML 2009.
[2] Adapted from https://gist.github.com/kcarnold/5439945

2.4.3 RCA

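Relevant Components Analysis (RCA)
RCA learns a full-rank Mahalanobis distance metric from a weighted sum of in-chunklet covariance matrices. It applies a global linear transformation that assigns large weights to relevant dimensions and small weights to irrelevant ones. The relevant dimensions are estimated using "chunklets", subsets of points that are known to belong to the same class.

For a training set with $n$ points in $k$ chunklets, the algorithm is efficient because it only has to compute

$\mathbf{C}=\frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n_{j}}\left(\mathbf{x}_{j i}-\hat{\mathbf{m}}_{j}\right)\left(\mathbf{x}_{j i}-\hat{\mathbf{m}}_{j}\right)^{T}$
where chunklet $j$ consists of the points $\{\mathbf{x_{ji}}\}_{i=1}^{n_j}$ and has mean $\hat{\mathbf{m}}_j$. The inverse $\mathbf{C}^{-1}$ is used as the Mahalanobis matrix.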
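
The formula translates almost line by line into NumPy. The sketch below is an illustration of the computation, not metric-learn's implementation: it assumes every point belongs to exactly one chunklet and that $\mathbf{C}$ is invertible, and the function name rca_covariance and the toy data are made up for this example.

```python
import numpy as np

def rca_covariance(X, chunks):
    """Average within-chunklet covariance C from the formula above."""
    X = np.asarray(X, dtype=float)
    chunks = np.asarray(chunks)
    C = np.zeros((X.shape[1], X.shape[1]))
    for chunk_id in np.unique(chunks):
        members = X[chunks == chunk_id]
        centered = members - members.mean(axis=0)  # subtract the chunklet mean
        C += centered.T @ centered                 # sum of outer products
    return C / X.shape[0]

# Toy data: four chunklets of two points each.
X = np.array([[-0.05, 3.0], [0.05, -3.0],
              [0.1, -3.55], [-0.1, 3.55],
              [-0.95, -0.05], [0.95, 0.05],
              [0.4, 0.05], [-0.4, -0.05]])
chunks = np.array([0, 0, 1, 1, 2, 2, 3, 3])

C = rca_covariance(X, chunks)
M = np.linalg.inv(C)  # Mahalanobis matrix
print(C)
print(M)
```

Dimensions with a large within-chunklet spread get a small weight in M, because points known to be in the same class vary a lot along them.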

from metric_learn import RCA

# RCA is fitted on points and chunklet ids rather than on labeled pairs:
# chunks[i] is the id of the chunklet that X[i] belongs to.
X = [[-0.05, 3.0], [0.05, -3.0],
     [0.1, -3.55], [-0.1, 3.55],
     [-0.95, -0.05], [0.95, 0.05],
     [0.4, 0.05], [-0.4, -0.05]]
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

rca = RCA()
rca.fit(X, chunks)

References:

[1] Shental et al. Adjustment learning and relevant component analysis. ECCV 2002
[2] Bar-Hillel et al. Learning distance functions using equivalence relations. ICML 2003
[3] Bar-Hillel et al. Learning a Mahalanobis metric from equivalence constraints. JMLR 2005

2.4.4 MMC

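Metric Learning with Application for Clustering with Side Information (MMC)

MMC minimizes the sum of squared distances between similar points while enforcing that the sum of distances between dissimilar points is at least 1. This is a convex optimization problem, so it has no local minima and can be solved efficiently. The drawback is that solving it involves eigenvalue computations, which are the speed bottleneck of the algorithm. MMC was originally designed for clustering, so it assumes that each class forms a single cluster that follows its own distribution.

The optimization problem, where $d_{\mathbf{M}}$ is the Mahalanobis distance and $S$ and $D$ are the sets of similar and dissimilar pairs, is:

$\min _{\mathbf{M} \in \mathbb{S}_{+}^{d}} \sum_{\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \in S} d_{\mathbf{M}}^{2}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \quad \text { s.t. } \quad \sum_{\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \in D} d_{\mathbf{M}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \geq 1$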
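
To make the trade-off concrete, the two quantities described above (sum of squared distances over similar pairs, sum of distances over dissimilar pairs) can be evaluated for a candidate matrix. The NumPy sketch below does this for the example pairs of this section and for two hand-picked matrices; it does not solve the optimization problem, and mmc_terms is a hypothetical helper, not a metric-learn function.

```python
import numpy as np

pairs = np.array([[[1.2, 7.5], [1.3, 1.5]],
                  [[6.4, 2.6], [6.2, 9.7]],
                  [[1.3, 4.5], [3.2, 4.6]],
                  [[6.2, 5.5], [5.4, 5.4]]])
y = np.array([1, 1, -1, -1])  # +1 similar, -1 dissimilar

def mmc_terms(M, pairs, y):
    """Return (sum of squared distances over similar pairs,
    sum of distances over dissimilar pairs) under the metric M."""
    diffs = pairs[:, 0] - pairs[:, 1]
    # (x - y)^T M (x - y) for every pair at once
    sq_dists = np.einsum('ij,jk,ik->i', diffs, M, diffs)
    objective = float(sq_dists[y == 1].sum())
    constraint = float(np.sqrt(sq_dists[y == -1]).sum())
    return objective, constraint

obj_eucl, con_eucl = mmc_terms(np.eye(2), pairs, y)             # Euclidean metric
obj_first, con_first = mmc_terms(np.diag([1.0, 0.0]), pairs, y)  # first feature only
print(obj_eucl, con_eucl)
print(obj_first, con_first)
```

Both candidates satisfy the constraint, but the metric that looks only at the first feature has a far smaller objective, which matches the intent stated in the example's comment: points whose first feature is close should end up close.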

from metric_learn import MMC

pairs = [[[1.2, 7.5], [1.3, 1.5]],
         [[6.4, 2.6], [6.2, 9.7]],
         [[1.3, 4.5], [3.2, 4.6]],
         [[6.2, 5.5], [5.4, 5.4]]]
y = [1, 1, -1, -1]

# in this task we want points where the first feature is close to be closer
# to each other, no matter how close the second feature is

mmc = MMC()
mmc.fit(pairs, y)

References:

[1] Xing et al. Distance metric learning with application to clustering with side-information. NIPS 2002
[2] Adapted from Matlab code http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz

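3. Learning on quadruplets

Some metric learning algorithms learn from quadruplets of points. In this case the algorithm should be given n_samples quadruplets. The semantics of each quadruplet is that its first two points are closer to each other than its last two points.

3.1 Fitting

Here is an example of fitting a quadruplet learner: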

>>> from metric_learn import LSML
>>> quadruplets = np.array([[[1.2, 3.2], [2.3, 5.5], [2.4, 6.7], [2.1, 0.6]],
...                         [[4.5, 2.3], [2.1, 2.3], [0.6, 1.2], [7.3, 3.4]]])
>>> lsml = LSML(random_state=42)
>>> lsml.fit(quadruplets)
LSML(max_iter=1000, preprocessor=None, prior=None, random_state=42, tol=0.001,
   verbose=False)

Or, using a preprocessor:

>>> X = np.array([[1.2, 3.2],
...               [2.3, 5.5],
...               [2.4, 6.7],
...               [2.1, 0.6],
...               [4.5, 2.3],
...               [2.1, 2.3],
...               [0.6, 1.2],
...               [7.3, 3.4]])
>>> quadruplets_indices = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
>>> lsml = LSML(preprocessor=X, random_state=42)
>>> lsml.fit(quadruplets_indices)
LSML(max_iter=1000,
   preprocessor=array([[1.2, 3.2],
       [2.3, 5.5],
       [2.4, 6.7],
       [2.1, 0.6],
       [4.5, 2.3],
       [2.1, 2.3],
       [0.6, 1.2],
       [7.3, 3.4]]),
   prior=None, random_state=42, tol=0.001, verbose=False)

Here we want a metric under which, in each quadruplet, the first two points are closer to each other than the last two.

3.2 Prediction

>>> quadruplets_test = np.array(
... [[[5.6, 5.3], [2.2, 2.1], [0.4, 0.6], [1.2, 3.4]],
...  [[6.0, 4.2], [4.3, 1.2], [4.5, 0.6], [0.1, 7.8]]])
>>> lsml.predict(quadruplets_test)
array([-1.,  1.])

3.3 Scoring

>>> lsml.decision_function(quadruplets_test)
array([-1.75700306,  4.98982131])

>>> from sklearn.model_selection import cross_val_score
>>> cross_val_score(lsml, quadruplets, scoring='f1_score')  # this won't work

>>> lsml.score(quadruplets_test)
0.5
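
The three outputs above are consistent with a simple rule, sketched here with NumPy: the prediction is the sign of the decision function, and the score is the fraction of quadruplets predicted +1, that is, whose ordering the learned metric respects. This reading is inferred from the numbers shown in this post and should be treated as an assumption about the library's behaviour.

```python
import numpy as np

scores = np.array([-1.75700306, 4.98982131])  # decision_function output above
predictions = np.sign(scores)   # -1: ordering violated, +1: ordering respected
accuracy = float(np.mean(predictions == 1))
print(predictions, accuracy)
```

Since quadruplets carry no separate label, there is nothing to pass as y; the ordering inside each quadruplet plays the role of the ground truth.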

3.4 Algorithm: LSML

Metric Learning from Relative Comparisons by Minimizing Squared Residual (LSML)

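LSML is a simple yet effective algorithm that minimizes a convex objective corresponding to the sum of squared residuals of the constraints. It uses constraints in the form of relative distance comparisons, which is especially useful when pairwise constraints are not naturally available and pair-based algorithms are therefore hard to deploy. In addition, its sparsity extension makes the estimate more stable when the dimension is high and only a few constraints are given.
Each constraint $d(x_a, x_b)<d(x_c, x_d)$ contributes the following term to the loss:
$H\left(d_{\mathbf{M}}\left(\mathbf{x}_{a}, \mathbf{x}_{b}\right)-d_{\mathbf{M}}\left(\mathbf{x}_{c}, \mathbf{x}_{d}\right)\right)$

where $H(\cdot)$ is the squared hinge loss:
$H(x)=\left\{\begin{array}{cc}{0} & {x \leq 0} \\ {x^{2}} & {x>0}\end{array}\right.$
The total loss $L(C)$ sums these terms over all constraints $C=\{(x_a, x_b, x_c, x_d):d(x_a, x_b)<d(x_c,x_d)\}$. The original paper suggests weighting each constraint by a probability before summing. For simplicity, all weights are set to 1 here, which gives a plain sum.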
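
The loss is short enough to write out in full. The NumPy sketch below evaluates the unit-weight sum of squared hinge terms for a given matrix $\mathbf{M}$ on the four quadruplets of the LSML example in this post. It only evaluates the loss, without the regularizer and without any optimization, and the function names are chosen for this illustration.

```python
import numpy as np

quadruplets = np.array(
    [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
     [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
     [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
     [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]])

def squared_hinge(x):
    """H(x) = 0 for x <= 0 and x**2 for x > 0."""
    return np.maximum(x, 0.0) ** 2

def lsml_loss(M, quadruplets):
    """Sum of H(d_M(a, b) - d_M(c, d)) over all quadruplets, unit weights."""
    def dist(p, q):
        diff = p - q
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))
    d_ab = dist(quadruplets[:, 0], quadruplets[:, 1])
    d_cd = dist(quadruplets[:, 2], quadruplets[:, 3])
    return float(squared_hinge(d_ab - d_cd).sum())

loss = lsml_loss(np.eye(2), quadruplets)  # Euclidean metric as a baseline
print(loss)
```

Under the Euclidean metric the first and third quadruplets already satisfy their constraint and contribute nothing; only the second and fourth, where the first two points are further apart than the last two, add to the loss.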

The optimization problem is
$\min _{\mathbf{M}}\left(D_{l d}\left(\mathbf{M}, \mathbf{M}_{0}\right)+\sum_{\left(\mathbf{x}_{a}, \mathbf{x}_{b}, \mathbf{x}_{c}, \mathbf{x}_{d}\right) \in C} H\left(d_{\mathbf{M}}\left(\mathbf{x}_{a}, \mathbf{x}_{b}\right)-d_{\mathbf{M}}\left(\mathbf{x}_{c}, \mathbf{x}_{d}\right)\right)\right)$
where $\mathbf{M}_0$ is the prior metric matrix, the identity matrix by default, and $D_{ld}(\cdot, \cdot)$ is the LogDet divergence:
$D_{l d}\left(\mathbf{M}, \mathbf{M}_{0}\right)=\operatorname{tr}\left(\mathbf{M} \mathbf{M}_{0}^{-1}\right)-\log \operatorname{det}(\mathbf{M})$

from metric_learn import LSML

quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
               [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
               [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
               [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]

# we want to make closer points where the first feature is close, and
# further if the second feature is close

lsml = LSML()
lsml.fit(quadruplets)

References:

[1] Liu et al. Metric Learning from Relative Comparisons by Minimizing Squared Residual. ICDM 2012
[2] Adapted from https://gist.github.com/kcarnold/5439917
