2007 Papers on Computational Linguistics

Selected Abstracts

Abstract 2007

by Liu Haitao

 

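Abstract: Language processing methods based on treebanks and machine learning are an active research topic in natural language processing. This paper explores whether linguistic means can be used to improve parsing accuracy. Chinese dependency parsing experiments were carried out with MaltParser and a self-built Chinese dependency treebank. An analysis of the parsing output identified the main factors affecting parsing accuracy, and the treatment of certain linguistic structures in the treebank was modified accordingly. The parsing data obtained after the modification were then analyzed further to determine the effectiveness of the method. The results show that unlabeled dependency parsing accuracy improved by 5.5 percentage points and labeled dependency parsing accuracy by 7.5 percentage points.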
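The abstract above reports unlabeled and labeled accuracy but does not spell out how they are computed. The following is a minimal sketch, assuming the usual attachment-score definitions: a token counts as correct for the unlabeled score if its head matches the gold head, and for the labeled score if both head and relation label match. The function name `attachment_scores`, the relation labels, and the toy sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head 0 denotes the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (unlabeled, labeled) attachment scores as fractions in [0, 1]."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # Unlabeled: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labeled: head and relation label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), full_hits / len(gold)


# Hypothetical four-token sentence with made-up labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "attr")]
pred = [(2, "subj"), (0, "root"), (2, "adv"), (2, "attr")]
uas, las = attachment_scores(gold, pred)
```

On this toy input three of four heads match and two of four head-label pairs match, which illustrates why the labeled score can never exceed the unlabeled one.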

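Abstract: Starting from Tesnière's 1959 work, this paper discusses the stemma, valency, dependency relations, and other basic components of structural syntax, and examines the formalization of Tesnière's structural syntax from the perspective of language information processing. Tesnière's structural syntax is a linguistic theory oriented toward analysis and understanding, a semantics-driven functional theory of syntax, and a theory based on virtual-stemma sentence patterns; it has been widely applied in computational linguistics, language teaching, and other fields.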

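Abstract: This paper introduces a dependency parsing method based on valency patterns and reports Chinese parsing experiments carried out with the XDG formalism and the XDK software package. The valency patterns proposed here include adjuncts as well as complements, and they take into account not only a word's capacity to govern other words but also its capacity to be governed.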

Abstract: This paper investigates probability distributions of dependency distances in six texts extracted from a Chinese dependency treebank. The fitting results reveal that the investigated distribution can be well captured by the right truncated Zeta distribution. In order to restrict the model only to natural language, two samples with randomly generated governors are investigated. One of them can be described, for example, by the Hyperpoisson distribution; the other satisfies the Zeta distribution. The paper also presents a study of the sequential plots and mean dependency distances of the six texts under three analyses (one syntactic and two random). Of these three analyses, the syntactic analysis yields the smallest mean dependency distance.
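For readers unfamiliar with the right truncated Zeta distribution named above, here is a minimal sketch of its probability mass function, assuming the common parameterization P(x) = x^(-a) / sum_{j=1..R} j^(-a) for x = 1, ..., R. The function name and the parameter values used below are illustrative assumptions; they are not the fitted values from the paper.

```python
from typing import List


def right_truncated_zeta_pmf(a: float, R: int) -> List[float]:
    """Return [P(1), ..., P(R)] with P(x) proportional to x**(-a)."""
    if R < 1:
        raise ValueError("truncation point R must be at least 1")
    # Unnormalized weights x**(-a) for dependency distances 1..R.
    weights = [x ** -a for x in range(1, R + 1)]
    total = sum(weights)  # normalizing constant over the truncated support
    return [w / total for w in weights]


# Hypothetical parameters, chosen only to show the shape of the distribution.
pmf = right_truncated_zeta_pmf(a=1.0, R=2)
long_pmf = right_truncated_zeta_pmf(a=1.8, R=40)
```

With a > 0 the probabilities decrease monotonically with distance, so short dependencies dominate, which is the qualitative pattern the abstract describes for the syntactic analysis.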

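Abstract: This paper reports treebank-based Chinese dependency parsing experiments carried out with MaltParser and the Harbin Institute of Technology (HIT) Chinese dependency treebank, with the aim of identifying the factors that affect the accuracy, efficiency, and connectedness of dependency parsing. The experimental results show that POS features make it possible to obtain acceptable results even with a small training set and play a key role in improving and stabilizing parser efficiency; LEX features help with the connectedness of sentences but also reduce parsing efficiency; DEP features cannot work on their own, but combined with the other two feature types they can improve the parser's accuracy and efficiency.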

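Abstract: This paper proposes a Probabilistic Valency Pattern (PVP) theory for natural language processing. PVP theory not only extends traditional valency theory but also adds a probabilistic component to valency patterns. The theory helps to explain language comprehension and generation from a probabilistic point of view, and it may also contribute to the search for better statistical natural language processing algorithms.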

Abstract: The dependency relation is the most essential ingredient in a dependency-based theory of syntax. This paper presents some statistical findings on the dependency relation extracted from a Chinese dependency treebank. A sentence in the proposed treebank can easily be converted into an SSyntS graph in Meaning-Text Theory. The statistics show that modifiers account for 55% of all dependencies and actants for the remaining 45%. The paper demonstrates that the active and passive valence information of a word (or word class) can be extracted from the treebank. It gives a formula for calculating the mean dependency distance (MDD) of a specific type of dependency relation in a language and obtains the MDD of all dependency types in Chinese. These figures show that some dependencies tend to span much greater distances than others, that dependency distance tends to be minimized, and that dependency types differ in their preferred dependency direction.
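The abstract mentions a formula for the MDD of a given dependency type without reproducing it. The sketch below is one plausible reading, assuming that dependency distance is the absolute difference between the linear positions of governor and dependent, and that the MDD of a type is the mean of those distances over all dependencies of that type. The function name `mdd_by_type`, the relation labels, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (dependent_position, governor_position, relation_label), positions are 1-based.
Dependency = Tuple[int, int, str]


def mdd_by_type(deps: List[Dependency]) -> Dict[str, float]:
    """Mean absolute dependency distance for each relation label."""
    distances = defaultdict(list)
    for dep_pos, gov_pos, label in deps:
        # Assumed definition: distance = |governor position - dependent position|.
        distances[label].append(abs(gov_pos - dep_pos))
    return {label: sum(d) / len(d) for label, d in distances.items()}


# Hypothetical dependencies with made-up labels.
deps = [(1, 2, "subj"), (4, 2, "obj"), (3, 4, "attr"), (5, 7, "attr")]
result = mdd_by_type(deps)
```

Keeping the sign of `gov_pos - dep_pos` instead of taking the absolute value would additionally show whether a dependency type tends to be governor-initial or governor-final, which relates to the direction preference mentioned in the abstract.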