
Tokenize English text in Java

aburkov edited this page Jun 27, 2015 · 2 revisions

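With xpresso, you can tokenize English text in Java with the simplicity of NLTK: one call splits the text into sentences, and each sentence can be iterated over as a sequence of tokens.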

Tokenization:

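import com.wantedtech.common.xpresso.x;
import com.wantedtech.common.xpresso.sentence.Sentence;
import com.wantedtech.common.xpresso.token.Token;

String text = "English is hard. It can be understood through tough thorough thought, though.";

// tokenize() yields sentences; each Sentence iterates over its tokens
for (Sentence s : x.String.EN.tokenize(text)) {
	for (Token t : s) {
		// print one token per line
		x.print(t);
	}
}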

Console:

English
is
hard
.
It
can
...
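
For comparison, here is a minimal sketch of the same two-level split (sentences first, then tokens) using only the JDK's java.text.BreakIterator. It is not xpresso's implementation, and its token boundaries may differ from xpresso's on harder input. The class name TokenizeSketch and the helper methods segments and tokenize are hypothetical names chosen for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: JDK-based sentence and word splitting,
// not the algorithm xpresso uses.
public class TokenizeSketch {

    // Walks the boundaries reported by the iterator and collects the
    // non-blank pieces of text between them.
    static List<String> segments(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Splits text into sentences, then each sentence into tokens.
    static List<List<String>> tokenize(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : segments(BreakIterator.getSentenceInstance(Locale.ENGLISH), text)) {
            sentences.add(segments(BreakIterator.getWordInstance(Locale.ENGLISH), sentence));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "English is hard. It can be understood through tough thorough thought, though.";
        for (List<String> sentence : tokenize(text)) {
            for (String token : sentence) {
                System.out.println(token);
            }
        }
    }
}
```

The sketch returns plain strings, whereas the xpresso snippet above works with Sentence and Token objects, so it only illustrates the splitting step.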