MULTILINGUAL DISTRIBUTIONAL LEXICAL SIMILARITY
Author: Kirk Baker
Publisher: Ohio State University
Publication date: 2008
Number of pages: 243
Size: 1.3 MB
One of the most fundamental problems in natural language processing involves unknown words, that is, words that are not in the dictionary. The supply of unknown words is virtually unlimited – proper names, technical jargon, foreign borrowings, newly created words, etc. – meaning that lexical resources like dictionaries and thesauri inevitably miss important vocabulary items. However, manually creating and maintaining broad-coverage dictionaries and ontologies for natural language processing is expensive and difficult.