This is a Ph.D. thesis from Stanford University (author: Richard Socher), 204 pages in total.

As the amount of unstructured text data that humanity produces, overall and on the Internet, grows, so does the need to intelligently process it and extract different types of knowledge from it. The research goal of this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning, in order to solve multiple higher-level language tasks.

There has been great progress in delivering natural language processing technologies such as information extraction, sentiment analysis, and grammatical analysis. However, existing solutions are often based on different machine learning models. The goal of this thesis is to develop general and scalable algorithms that can jointly solve such tasks and learn the necessary intermediate representations of the linguistic units involved. Furthermore, most standard approaches make strong simplifying assumptions about language and require well-designed feature representations. The models in this thesis address these two shortcomings: they provide effective and general representations for sentences without assuming word order independence, and they deliver state-of-the-art performance with no, or few, manually designed features.

The new family of models introduced in this thesis is summarized under the term Recursive Deep Learning. The models in this family are variations and extensions of unsupervised and supervised recursive neural networks (RNNs), which generalize deep learning and feature learning ideas to hierarchical structures. The RNN models of this thesis obtain state-of-the-art performance on paraphrase detection, sentiment analysis, relation classification, parsing, image-sentence mapping, and knowledge base completion, among other tasks.

Chapter 2 is an introductory chapter on general neural networks. The three main chapters of the thesis explore three recursive deep learning modeling choices. The first modeling choice investigated is the overall objective function, which crucially guides what the RNNs need to capture. Unsupervised, supervised, and semi-supervised learning are explored for structure prediction (parsing), structured sentiment prediction, and paraphrase detection.
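
To make these objective choices concrete, here is a minimal numpy sketch of an unsupervised reconstruction loss, a supervised softmax loss, and a semi-supervised mix of the two, all computed on one composed phrase vector. The shapes, names (`W`, `W_dec`, `W_label`), class count, gold label, and mixing weight `alpha` are all illustrative assumptions, not the thesis code:

```python
import numpy as np

d = 4                                    # illustrative embedding size
rng = np.random.default_rng(0)
c1, c2 = rng.standard_normal(d), rng.standard_normal(d)

# Compose two child vectors into one parent vector p.
W = rng.standard_normal((d, 2 * d)) * 0.1
p = np.tanh(W @ np.concatenate([c1, c2]))

# Unsupervised: a recursive-autoencoder-style reconstruction error that
# asks p to preserve enough information to rebuild its children.
W_dec = rng.standard_normal((2 * d, d)) * 0.1
recon = W_dec @ p
loss_unsup = np.sum((recon - np.concatenate([c1, c2])) ** 2)

# Supervised: a softmax classifier on p (e.g., a sentiment label),
# trained with cross-entropy.
W_label = rng.standard_normal((3, d)) * 0.1   # 3 illustrative classes
scores = W_label @ p
probs = np.exp(scores) / np.sum(np.exp(scores))
gold = 1                                      # hypothetical gold label
loss_sup = -np.log(probs[gold])

# Semi-supervised: a weighted sum of both objectives.
alpha = 0.5                                   # assumed mixing weight
loss_semi = alpha * loss_unsup + (1 - alpha) * loss_sup
```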

The next chapter explores the recursive composition function, which computes vectors for longer phrases from the words in a phrase. The standard RNN composition function is a single neural network layer that takes two phrase or word vectors as input and uses the same set of weights at every node in the parse tree to compute higher-order phrase vectors. This is not expressive enough to capture all types of composition, so several variants of the composition function were developed. The first variant represents every word and phrase in terms of both a meaning vector and an operator matrix. Two alternatives were then developed: the first conditions the composition function on the syntactic categories of the phrases being combined, which improved the widely used Stanford parser; the most recent and most expressive composition function is based on a new type of neural network layer called a recursive neural tensor network.
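
As a rough illustration of the difference, here is a minimal numpy sketch of two of these composition functions: the standard single-layer composition and a tensor-layer variant in the spirit of the recursive neural tensor network. The dimensions, initialization, and variable names are assumptions for the example, not the thesis implementation:

```python
import numpy as np

d = 4                                      # illustrative embedding size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1  # shared composition weights
b = np.zeros(d)

def compose(c1, c2):
    """Standard RNN: p = tanh(W [c1; c2] + b), reused at every tree node."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# Tensor variant: a third-order tensor V adds a bilinear interaction
# between the two children on top of the standard layer.
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.01

def compose_tensor(c1, c2):
    x = np.concatenate([c1, c2])
    bilinear = np.array([x @ V[k] @ x for k in range(d)])
    return np.tanh(bilinear + W @ x + b)

# Composing a tiny right-branching tree over three word vectors:
w1, w2, w3 = (rng.standard_normal(d) for _ in range(3))
p = compose(w1, compose(w2, w3))
```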

The third major dimension of exploration is the tree structure itself. Variants of tree structures are assumed to be given to the RNN model as input, which allows the model to focus solely on the semantic content of a sentence and the prediction task. In particular, dependency trees are explored as the underlying structure, which allows the final representation to focus on the main action (verb) of a sentence. This has been particularly effective for grounding semantics by mapping sentences into a joint sentence-image vector space. The model in the last section assumes the tree structure is the same for every input, which proves effective on the task of 3D object classification.
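
A hedged sketch of what composing over a given dependency tree can look like: each head word combines its own word vector with the already-composed vectors of its dependents, so the sentence vector is built around the root verb. The weight sharing and averaging scheme here are simplifying assumptions for illustration, not the exact model from the thesis:

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
W_word = rng.standard_normal((d, d)) * 0.1  # transforms the head's word vector
W_dep = rng.standard_normal((d, d)) * 0.1   # shared transform for dependents

def node_vector(word_vec, dep_vectors):
    """Combine a head word with its dependents' composed vectors."""
    total = W_word @ word_vec
    for h in dep_vectors:
        total = total + W_dep @ h
    return np.tanh(total / (1 + len(dep_vectors)))

# "the cat sat": "sat" heads "cat", "cat" heads "the".
the, cat, sat = (rng.standard_normal(d) for _ in range(3))
h_the = node_vector(the, [])
h_cat = node_vector(cat, [h_the])
h_sent = node_vector(sat, [h_cat])  # sentence vector centered on the verb
```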


1 Introduction
2 Deep Learning Background
3 Recursive Objective Functions
4 Recursive Composition Functions
5 Compositional Tree Structure Variants

Download link for the original English thesis:

http://page5.dfpan.com/fs/flec5jc2c2316239160/

