I want to generate some random text.
I tried writing a basic Java program:
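    import java.util.Random;

    public class RandomText {
        public static void main(String[] args) {
            Random r = new Random();
            char[] alphabet = "abcdefghijklmnopqrstuvwxyz".toCharArray();
            int nowords = r.nextInt(2000); // number of words, 0 to 1999
            for (int i = 0; i < nowords; i++) {
                int lengthofword = r.nextInt(10) + 2; // word length, 2 to 11
                for (int j = 0; j < lengthofword; j++) {
                    int ch = r.nextInt(26); // index of a random letter
                    System.out.print(alphabet[ch]);
                }
                System.out.print(" ");
            }
        }
    }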
and the result is something like:
tafawc flnqhabhv mqceuoqy rttzckzqa bdyxzod zbxweclvia wegmxvuoqez ijwauhmzw joxm zvphbs ogpjyip qxoymxkxv yrfoifig fbhecph izxcyfma xarzse srwic jgi fkbcdcydpz qpdvsz rqhjieqno fmelfmtgqe qozenjlxtg vfxd lkmkrksgw ytuaduknsl let ao bm lsfjednsa qouinii yrwzerdck yb kszttly zmwflwevyix kdg qpnkzuijva ssau yc wxews drqsdwbc glxb gokunixldec lznuwdvksx zkzhsirruxc sqplhv fzixywkaft fqdkumfgddn bcqp oiwwbo emhk kv qhm xkjp kacbmcd ojh wzvukx oztbexkf lylyv kdspqpa zbykj lnprtlxp af bne ryamumcg oyhldwdlq bqyfxrszuf wyrijnr ysnefsz lhhazrdwsev tll ikibsnpqwg ntzlgc aahfsdeups rushos ihqzyucd mjorscchszm tuppz hxi ssumrevg
It would be helpful if the text were at least readable.
I am thinking of taking English words and picking from them at random to make sentences. Where can I get a big list of words in the English language?
The Wordlist project has some lists. I think it's hard to find a complete list; natural languages don't work like that.
Download the OpenOffice dictionary:
http://wiki.services.openoffice.org/wiki/Dictionaries#English_.28AU.2CCA.2CGB.2CNZ.2CUS.2CZA.29
If you are on a Linux PC, try /usr/share/dict
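Once you have a word list, picking random words from it could look roughly like the sketch below. It assumes a UTF-8 text file with one word per line; the default path /usr/share/dict/words is an assumption (the file name varies between systems), and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomWords {
    // Loads one word per line, skipping blank lines (assumes UTF-8 input).
    static List<String> loadWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            String w = line.trim();
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Joins wordCount words picked uniformly at random and ends with a period.
    static String randomSentence(List<String> words, int wordCount, Random r) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("word list is empty");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words.get(r.nextInt(words.size())));
        }
        return sb.append('.').toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass another word list as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "/usr/share/dict/words");
        List<String> words = loadWords(file);
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            // Same length range as the question's code: 2 to 11 words.
            System.out.println(randomSentence(words, r.nextInt(10) + 2, r));
        }
    }
}
```

The output is real words in a random order, so it looks more like text than random letters but still will not read as meaningful sentences.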
You want to look up "Lorem Ipsum". There is bound to be some sort of library for generating it in Java.
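If you would rather not depend on a library, a hand-rolled version is short. This is only a sketch: it draws random words from a small slice of the traditional lorem ipsum vocabulary, and the class name, method names, and sentence lengths are my own choices, not any library's API.

```java
import java.util.Random;

public class LoremIpsum {
    // A small slice of the traditional placeholder vocabulary.
    static final String[] WORDS = {
        "lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing",
        "elit", "sed", "do", "eiusmod", "tempor", "incididunt", "ut", "labore",
        "et", "dolore", "magna", "aliqua"
    };

    // Returns one sentence of wordCount random words, capitalised, ending in a period.
    static String sentence(int wordCount, Random r) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            String w = WORDS[r.nextInt(WORDS.length)];
            if (i == 0) {
                w = Character.toUpperCase(w.charAt(0)) + w.substring(1);
            } else {
                sb.append(' ');
            }
            sb.append(w);
        }
        return sb.append('.').toString();
    }

    // Returns sentenceCount sentences, each 4 to 11 words long, separated by spaces.
    static String paragraph(int sentenceCount, Random r) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sentenceCount; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(sentence(r.nextInt(8) + 4, r));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(paragraph(5, new Random()));
    }
}
```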
The gold standard for natural language processing is WordNet at http://wordnet.princeton.edu/. It has an active user group, associates semantics and syntax with words, and interfaces with other NLP tools. If you are thinking of doing computation with the words, you should definitely have a look.
However, selecting words at random does not generate a useful sentence, and I suspect you will be disappointed with the results. Have a look at toolkits such as OpenNLP, which offer many tools, including part-of-speech (POS) tagging, which you will certainly need.
Even when you have sentences with valid syntax, they may still be meaningless, so you will need to read the work of Chomsky and others. His "Colorless green ideas sleep furiously" (http://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously) illustrates the problem.
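To make the point concrete, here is a minimal sketch of template-based generation. It does not use OpenNLP or WordNet: the four word lists are tiny, hand-tagged, and hypothetical, standing in for what a POS tagger would give you from a real word list, and the fixed pattern (adjective, adjective, noun, verb, adverb) is just one assumed sentence shape.

```java
import java.util.Random;

public class TemplateSentences {
    // Hand-tagged, hypothetical word lists; a real program would tag a large list.
    static final String[] ADJECTIVES = {"colorless", "green", "quiet", "random"};
    static final String[] NOUNS = {"ideas", "words", "sentences", "programs"};
    static final String[] VERBS = {"sleep", "run", "wander", "compile"};
    static final String[] ADVERBS = {"furiously", "slowly", "silently", "again"};

    // Picks one entry uniformly at random.
    static String pick(String[] options, Random r) {
        return options[r.nextInt(options.length)];
    }

    // Fills the fixed pattern: adjective adjective noun verb adverb.
    static String sentence(Random r) {
        String s = String.join(" ",
                pick(ADJECTIVES, r), pick(ADJECTIVES, r),
                pick(NOUNS, r), pick(VERBS, r), pick(ADVERBS, r));
        return Character.toUpperCase(s.charAt(0)) + s.substring(1) + ".";
    }

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(sentence(r));
        }
    }
}
```

Every sentence this produces is grammatical, and most of them are nonsense, which is exactly what the Chomsky example shows.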
I would suggest using a lorem ipsum generator. For Java there is this one. An online version is available here.