Simple language generator
Can be used to generate simple regular and context-free languages.
Usage: java grammar lngfile wordcount
lngfile: language description file
wordcount: number of words to be generated.
The program creates two files: language.dat and language.nnl.
language.dat is a text file containing the generated words, one per line,
with spaces between symbols.
language.nnl is a text file containing the probabilities associated with the
generated symbols; it can be used to calculate the negative log-likelihood
(an estimate of the source's entropy or of the compression ratio).
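As an illustration of how such probabilities turn into a negative log-likelihood, the following Java sketch sums -log2 p over a list of per-symbol probabilities and divides by the number of symbols to get bits per symbol. The exact layout of language.nnl is not described here, so the class NegLogLikelihood and its methods are hypothetical, and the probabilities are assumed to be already parsed into a list.

```java
import java.util.List;

// Hypothetical helper; it does not assume any particular layout of language.nnl.
public class NegLogLikelihood {

    private static final double LN2 = Math.log(2.0);

    // Total negative log-likelihood in bits: -sum(log2 p_i).
    static double nllBits(List<Double> probs) {
        double bits = 0.0;
        for (double p : probs) {
            if (!(p > 0.0 && p <= 1.0)) {
                throw new IllegalArgumentException("probability out of range: " + p);
            }
            bits -= Math.log(p) / LN2;
        }
        return bits;
    }

    // Average bits per symbol; an ideal entropy coder driven by the same
    // probabilities would need roughly this many bits per symbol.
    static double bitsPerSymbol(List<Double> probs) {
        return probs.isEmpty() ? 0.0 : nllBits(probs) / probs.size();
    }

    public static void main(String[] args) {
        List<Double> probs = List.of(0.5, 0.25, 0.125, 0.125);
        System.out.println("NLL (bits): " + nllBits(probs));
        System.out.println("bits/symbol: " + bitsPerSymbol(probs));
    }
}
```

Dividing the total by the number of symbols, rather than reporting the raw total, makes words of different lengths comparable.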
The language description file (lngfile) consists of production rules, for example:
###############################################################
# The simplest context-free language anbn
#
# first state : S
# terminals : a b
S : a b 10
S : a S b 5
###############################################################
The weight of a production rule (which determines the probability of applying
the rule) can be specified as the last element of the rule (see the example).
Note that the character '#' starts a comment, the first nonterminal symbol of
the first rule is the start symbol, and the sets of terminal and nonterminal
symbols are created implicitly from the production rules.
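A minimal sketch of how a rule file in this format could be parsed and used to generate words. It assumes that weights are relative (each rule for a nonterminal is chosen with probability weight / sum of weights of that nonterminal's rules), that a rule without a trailing number gets weight 1, and that any symbol never appearing on a left-hand side is a terminal. The class GrammarSketch and its methods are hypothetical; this is not the original program's code.

```java
import java.util.*;

// Hypothetical weighted-grammar generator; not the original program's code.
public class GrammarSketch {

    // One production: right-hand side symbols plus its relative weight.
    record Rule(List<String> rhs, double weight) {}

    final Map<String, List<Rule>> rules = new LinkedHashMap<>();
    String start;

    // Parses lines of the form "LHS : sym1 sym2 ... [weight]"; '#' starts a comment.
    static GrammarSketch parse(List<String> lines) {
        GrammarSketch g = new GrammarSketch();
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) continue;
            String[] sides = line.split(":", 2);
            if (sides.length != 2) throw new IllegalArgumentException("missing ':' in " + raw);
            String lhs = sides[0].trim();
            String body = sides[1].trim();
            List<String> rhs = body.isEmpty() ? new ArrayList<>()
                    : new ArrayList<>(Arrays.asList(body.split("\\s+")));
            double weight = 1.0; // assumption: default weight when none is given
            if (!rhs.isEmpty()) {
                try {
                    weight = Double.parseDouble(rhs.get(rhs.size() - 1));
                    rhs.remove(rhs.size() - 1); // the trailing number was a weight
                } catch (NumberFormatException e) {
                    // last token is a symbol; keep the default weight
                }
            }
            if (g.start == null) g.start = lhs; // first left-hand side is the start symbol
            g.rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(new Rule(rhs, weight));
        }
        return g;
    }

    // Picks one of nt's rules with probability proportional to its weight.
    Rule choose(String nt, Random rnd) {
        List<Rule> alts = rules.get(nt);
        double total = 0.0;
        for (Rule r : alts) total += r.weight();
        double x = rnd.nextDouble() * total;
        for (Rule r : alts) {
            x -= r.weight();
            if (x < 0) return r;
        }
        return alts.get(alts.size() - 1); // guard against rounding
    }

    // Leftmost derivation with an explicit stack; symbols without rules are terminals.
    List<String> generate(Random rnd, int maxLength) {
        if (start == null) throw new IllegalStateException("no rules");
        List<String> word = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String sym = stack.pop();
            if (!rules.containsKey(sym)) {
                word.add(sym);
                if (word.size() > maxLength) throw new IllegalStateException("word too long");
                continue;
            }
            List<String> rhs = choose(sym, rnd).rhs();
            for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i)); // keep left-to-right order
        }
        return word;
    }

    public static void main(String[] args) {
        GrammarSketch g = parse(List.of(
                "# The simplest context-free language anbn",
                "S : a b 10",
                "S : a S b 5"));
        Random rnd = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(String.join(" ", g.generate(rnd, 10_000)));
        }
    }
}
```

Printing each word with String.join(" ", word) gives the one-word-per-line, space-separated layout described for language.dat. Because a trailing number is read as a weight, this sketch cannot express terminals that are themselves numbers.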
While generating words, the program displays the average normalized negative
log-likelihood of the words generated so far. Hence, the program can be used
for Monte Carlo estimation of the language entropy.
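For the example grammar the entropy can also be worked out by hand, which makes it a useful check on a Monte Carlo estimate. With q = 5/15 = 1/3, the word a^n b^n has probability q^(n-1)(1 - q), so the expected negative log-likelihood per word is h(q)/(1 - q) bits (h being the binary entropy function) and the expected word length is 2/(1 - q) symbols, giving h(1/3)/2, about 0.459 bits per symbol. The sketch below assumes that "normalized" means total bits divided by total symbols; the class EntropyEstimate is hypothetical.

```java
import java.util.Random;

// Hypothetical Monte Carlo sketch for the example grammar S : a b 10 / S : a S b 5.
public class EntropyEstimate {

    // Estimated bits per symbol: total NLL of sampled words / total number of symbols.
    static double estimate(int words, long seed) {
        double pStop = 10.0 / 15.0;   // weight share of S : a b
        double pRecurse = 5.0 / 15.0; // weight share of S : a S b
        Random rnd = new Random(seed);
        double totalBits = 0.0;
        long totalSymbols = 0;
        for (int w = 0; w < words; w++) {
            int n = 1;
            while (rnd.nextDouble() < pRecurse) n++; // S : a S b applied (n - 1) times
            // P(a^n b^n) = pRecurse^(n-1) * pStop; the grammar is unambiguous,
            // so the derivation probability equals the word probability.
            double bits = -((n - 1) * Math.log(pRecurse) + Math.log(pStop)) / Math.log(2.0);
            totalBits += bits;
            totalSymbols += 2L * n;
        }
        return totalBits / totalSymbols;
    }

    // Closed form for comparison: h(1/3) / 2 bits per symbol.
    static double exact() {
        double q = 1.0 / 3.0;
        double h = -(q * Math.log(q) + (1 - q) * Math.log(1 - q)) / Math.log(2.0);
        return h / 2.0;
    }

    public static void main(String[] args) {
        System.out.printf("estimate = %.4f bits/symbol%n", estimate(100_000, 1L));
        System.out.printf("exact    = %.4f bits/symbol%n", exact());
    }
}
```

The estimate approaches the closed-form value as more words are sampled, which mirrors how the running average reported during generation is meant to converge.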