Using Distant Supervision to Build a Proposition Bank

Semantic role labeling has become a key module of many language processing applications. To build an unrestricted semantic role labeler, the first step is to develop a comprehensive proposition bank. However, building such a bank is a costly enterprise that has been completed for only a handful of languages. In this paper, we describe a technique to build proposition banks for new languages using distant supervision. Starting from PropBank in English and loosely parallel corpora such as versions of Wikipedia in different languages, we mapped semantic propositions extracted from English onto syntactic structures in Swedish using named entities. With this technique, we identified 2,333 predicate–argument frames in Swedish.