Thursday, March 19, 2009

Automatic Language Identification using Python

Cognitive Science and Coding: Automatic Language Identification using Python: "I was playing around with the idea of a script for automatic language detection using n-grams. The idea was to use a sample corpus for each language to build language profiles. For a sentence whose language is to be detected, a profile consisting of n-grams with relative frequency scores is built and then compared to the existing language profiles. The output is a normalized ranking score for each language profile, with 100 being the score of the best match."
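The quoted approach can be sketched in a few lines of Python. This is a minimal illustration, not the original author's code. It assumes character n-grams of length 1 to 3 and scores a match by summing the smaller relative frequency of each n-gram shared between two profiles. The post doesn't say which n-gram lengths or comparison metric it uses, so both are assumptions, as are the helper names `ngram_profile`, `similarity` and `rank_languages`. Scores are normalized so that the best-matching language gets 100, as the post describes.

```python
from collections import Counter


def ngram_profile(text, n_max=3):
    """Map each character n-gram (lengths 1..n_max) to its relative frequency.

    n_max=3 is an assumed default, not taken from the original post.
    """
    # Lowercase, collapse whitespace, and pad with spaces so word
    # boundaries show up inside n-grams.
    text = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def similarity(p, q):
    """Assumed metric: overlap of two profiles (sum of min shared frequencies)."""
    return sum(min(freq, q[gram]) for gram, freq in p.items() if gram in q)


def rank_languages(sentence, profiles):
    """Score every language profile; the best match is normalized to 100."""
    sentence_profile = ngram_profile(sentence)
    raw = {lang: similarity(sentence_profile, prof)
           for lang, prof in profiles.items()}
    best = max(raw.values())
    if best == 0:
        return {lang: 0.0 for lang in raw}
    return {lang: s / best * 100 for lang, s in raw.items()}


# Tiny illustrative corpora; real profiles need far more text per language.
corpora = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
profiles = {lang: ngram_profile(text) for lang, text in corpora.items()}
scores = rank_languages("the dog and the fox", profiles)
for lang, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {score:.1f}")
```

Dividing by the top raw score instead of by a fixed maximum makes the numbers relative. They show how close each language comes to the winner, not how confident the match is in absolute terms.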