SourceForge Logo

Applications for N-Grams

This page is based on research on language recognition using N-grams. One of the theoretical foundations of this is the paper
"N-Gram-Based Text Categorization" by William B. Cavnar and John M. Trenkle.

Based on this article, Gertjan van Noord wrote the text_cat Perl script, which fairly reliably guesses the right language out of 70 natural languages when presented with a file containing text in that language.

Recently I did a quick and dirty port of this program to Java. (Actually it is not really a port, since it lacks some features and adds some others, but it uses the same basic principle.)

The most important points of the algorithm

Based on that algorithm I currently have two applications:

Application 1: Guessing a language

The Java program can do the same job as text_cat, i.e. guess the language of a text from a sample.

I'm sorry, I haven't prepared this as an online application yet. But it works the same way as the text_cat program linked above.
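To give an idea of the principle, here is a minimal sketch of rank-order n-gram guessing in the spirit of the Cavnar/Trenkle paper. It is not the NGramJ code or its API: the class name `LanguageGuessSketch`, the methods `profile`, `distance` and `guess`, the `_` padding and the tie-breaking rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of rank-order n-gram language guessing; not the NGramJ API. */
final class LanguageGuessSketch {

    /** Longest n-gram counted (assumption: 1 to 5 characters). */
    static final int MAX_N = 5;

    /** Builds a profile: the n-grams of the text, most frequent first, cut off at maxSize. */
    static List<String> profile(String text, int maxSize) {
        // Lower-case the text and collapse every run of non-letters into a single '_'.
        String padded = ("_" + text.toLowerCase(Locale.ROOT) + "_").replaceAll("[^\\p{L}]+", "_");
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Descending count; ties are broken alphabetically so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxSize, ranked.size())));
    }

    /** "Out-of-place" distance: summed rank differences; unknown n-grams cost a fixed penalty. */
    static int distance(List<String> sample, List<String> language) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < language.size(); i++) {
            rank.put(language.get(i), i);
        }
        int penalty = language.size();
        int sum = 0;
        for (int i = 0; i < sample.size(); i++) {
            Integer r = rank.get(sample.get(i));
            sum += (r == null) ? penalty : Math.abs(r - i);
        }
        return sum;
    }

    /** Returns the name of the language profile closest to the sample, or null if there is none. */
    static String guess(String sample, Map<String, List<String>> languages, int maxSize) {
        List<String> sampleProfile = profile(sample, maxSize);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : languages.entrySet()) {
            int d = distance(sampleProfile, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In use, you would build one profile per language from a large training text with `profile(trainingText, 300)`, put them into a map keyed by language name, and call `guess(sample, profiles, 300)`. The profile size of 300 is only a plausible starting value; real language resources need far more training text than a sentence or two.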

Phone keypad layout (digit and its letters):

1        2 abc    3 def
4 ghi    5 jkl    6 mno
7 pqrs   8 tuv    9 wxy
         0 _-.

Application 2: PHONER, generating memorable phone numbers

In the US it is common to buy a phone number that can be typed as a short piece of text. This is now possible in Germany as well. But what can you do with your old numbers? The answer: use PHONER. PHONER takes the N-gram information and your number and tries to find a word that is easier to memorize. You can test the algorithm with the form below:

Form fields: The Number, Language Resource
(Note: An answer may take minutes!)
Caution: Although the algorithm is perfectly capable of helping you memorize your bank card's PIN or the like, I do not encourage you to type it into the unsafe form above. This is the internet; someone could be listening.
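As a rough illustration of the idea, the following sketch picks one keypad character per digit so that neighbouring characters form frequent letter pairs. It is only a guess at the principle, not PHONER's actual code: the class `PhonerSketch`, the methods `bigrams`, `memorize` and `toDigits`, the restriction to bigrams, and the Viterbi-style search are all assumptions, and the real program presumably uses richer N-gram resources. The keypad assignment is the one shown on this page, so digit 1 carries no letters and stands for itself.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the PHONER idea; the real search strategy may differ. */
final class PhonerSketch {

    /** Keypad as shown on this page; the array index is the digit. */
    static final String[] KEYS = {"_-.", "1", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxy"};

    /** Returns the candidate characters for one digit. */
    static String keys(char digit) {
        if (digit < '0' || digit > '9') {
            throw new IllegalArgumentException("not a digit: " + digit);
        }
        return KEYS[digit - '0'];
    }

    /** Counts letter pairs in a training text (a stand-in for a real language resource). */
    static Map<String, Integer> bigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String t = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 2 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    /** Chooses one character per digit, maximising the summed log(1 + bigram count). */
    static String memorize(String number, Map<String, Integer> counts) {
        int n = number.length();
        if (n == 0) {
            return "";
        }
        double[][] score = new double[n][]; // best score of a string ending in this character
        int[][] back = new int[n][];        // index of the previous character on that best path
        for (int pos = 0; pos < n; pos++) {
            String cur = keys(number.charAt(pos));
            score[pos] = new double[cur.length()];
            back[pos] = new int[cur.length()];
            if (pos == 0) {
                continue; // every first character starts with score 0
            }
            String prev = keys(number.charAt(pos - 1));
            for (int c = 0; c < cur.length(); c++) {
                score[pos][c] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < prev.length(); p++) {
                    String pair = "" + prev.charAt(p) + cur.charAt(c);
                    double s = score[pos - 1][p] + Math.log(1 + counts.getOrDefault(pair, 0));
                    if (s > score[pos][c]) {
                        score[pos][c] = s;
                        back[pos][c] = p;
                    }
                }
            }
        }
        int best = 0;
        for (int c = 1; c < score[n - 1].length; c++) {
            if (score[n - 1][c] > score[n - 1][best]) {
                best = c;
            }
        }
        // Walk the back pointers from the last digit to the first.
        char[] out = new char[n];
        for (int pos = n - 1; pos >= 0; pos--) {
            out[pos] = keys(number.charAt(pos)).charAt(best);
            best = back[pos][best];
        }
        return new String(out);
    }

    /** Inverse direction: the digits you would dial for a given text. */
    static String toDigits(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toLowerCase(Locale.ROOT).toCharArray()) {
            int digit = -1;
            for (int d = 0; d < KEYS.length; d++) {
                if (KEYS[d].indexOf(ch) >= 0) {
                    digit = d;
                }
            }
            if (digit < 0) {
                throw new IllegalArgumentException("not on the keypad: " + ch);
            }
            sb.append(digit);
        }
        return sb.toString();
    }
}
```

For example, with bigram counts taken from the toy text "hello hello hello", `memorize("43556", counts)` returns "hello", and `toDigits("hello")` gives back "43556". With a real language resource the result is usually only word-like rather than an actual word, which is often still easier to remember than the bare digits.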

More Applications? More Ideas!


Download and Documentation

Go to the NGramJ SourceForge pages.
Contact: Frank Sven Nestel. Other experiments on this site: Frank's search engine.