2007 Papers in Computational Linguistics
Abstract 2007
by Liu Haitao
Abstract: This paper investigates the probability distributions of dependency distances in six texts extracted from a Chinese dependency treebank. The fitting results reveal that the investigated distributions are well captured by the right truncated Zeta distribution. To check whether the model is restricted to natural language, two samples with randomly generated governors are also investigated. One of them can be described by, e.g., the Hyperpoisson distribution; the other satisfies the Zeta distribution. The paper also presents a study of the sequential plots and mean dependency distances of the six texts under three analyses (one syntactic and two random). Of these three analyses, the syntactic analysis has the smallest mean dependency distance.
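To make the model named in the abstract above concrete, here is a minimal sketch of the probability mass function of a right truncated Zeta distribution, P(x) = x^(-a) / sum over j = 1..R of j^(-a), for x = 1..R. The function name and the parameter names `a` (exponent) and `R` (truncation point) are illustrative assumptions; the abstract does not give the paper's notation or its fitted parameter values.

```python
def right_truncated_zeta_pmf(a, R):
    """Return [P(1), ..., P(R)] for a right truncated Zeta distribution.

    a: exponent (assumed name); R: largest admissible distance (assumed name).
    """
    # Unnormalised weights x^(-a) for x = 1..R.
    weights = [x ** (-a) for x in range(1, R + 1)]
    # Normalising over 1..R is what makes the distribution "right truncated".
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters, chosen only for illustration.
pmf = right_truncated_zeta_pmf(a=1.5, R=10)
print(pmf[0], sum(pmf))
```

Under this model, a distance of 1 (adjacent words) is always the most probable value, and the probabilities fall off as a power of the distance.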
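Abstract: This paper proposes a Probabilistic Valency Pattern (PVP) theory for natural language processing. PVP theory not only extends traditional valency theory but also adds a probabilistic component to valency patterns. The theory helps explain the processes of language comprehension and generation from a probabilistic perspective, and it is also of some use in the search for better statistics-based natural language processing algorithms.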
Abstract: The dependency relation is the most essential ingredient in a dependency-based theory of syntax. This paper presents some statistical findings on the dependency relation extracted from a Chinese dependency treebank. A sentence in the proposed treebank can easily be converted into an SSyntS graph in Meaning-Text Theory. The statistics on the dependency relation show that modifiers make up 55% of all dependencies, while actants account for the remaining 45%. The paper demonstrates that it is possible to extract the active and passive valence information of a word (or word class) from the treebank. The paper gives a formula for calculating the mean dependency distance (MDD) of a specific type of dependency relation in a language and obtains the MDD of all dependency types in Chinese. These figures show that some dependencies tend to be much farther apart than others, and demonstrate that dependency distance tends to be minimized and that different dependency types have varying preferences for the direction of dependency.
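The abstract above does not reproduce the paper's MDD formula, so the following is only a sketch under a stated assumption: the dependency distance is taken to be the absolute difference between the linear positions of the governor and the dependent, and the MDD of a dependency type is the average of those distances over all dependencies of that type. The function name, the tuple layout, and the relation labels are hypothetical.

```python
from collections import defaultdict


def mean_dependency_distance_by_type(dependencies):
    """Average |governor position - dependent position| per dependency type.

    dependencies: iterable of (governor_pos, dependent_pos, dep_type) tuples,
    with 1-based word positions (assumed layout, not the treebank's format).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for governor_pos, dependent_pos, dep_type in dependencies:
        # Assumption: distance is the absolute positional difference.
        totals[dep_type] += abs(governor_pos - dependent_pos)
        counts[dep_type] += 1
    return {dep_type: totals[dep_type] / counts[dep_type] for dep_type in totals}


# Hypothetical four-word sentence; the labels are illustrative only.
sample = [(2, 1, "subj"), (2, 4, "obj"), (4, 3, "atr")]
print(mean_dependency_distance_by_type(sample))
```

Keeping the sign of the difference instead of its absolute value would additionally record whether the dependent precedes or follows its governor, which is the kind of information behind the direction preferences the abstract mentions.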