Question:
I want to generate some random text.
I tried writing a basic Java program:
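import java.util.Random;

public class RandomText {
    public static void main(String[] args) {
        // r and alphabet are not declared in the snippet as posted; these two declarations are assumed.
        Random r = new Random();
        char[] alphabet = "abcdefghijklmnopqrstuvwxyz".toCharArray();
        int nowords = r.nextInt(2000); // number of words to print, 0 to 1999
        for (int i = 0; i < nowords; i++) {
            int lengthofword = r.nextInt(10) + 2; // 2 to 11 letters per word
            for (int j = 0; j < lengthofword; j++) {
                int ch = r.nextInt(26); // index of a random letter
                System.out.print(alphabet[ch]);
            }
            System.out.print(" ");
        }
    }
}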
and the result is something like:
tafawc flnqhabhv mqceuoqy rttzckzqa
bdyxzod zbxweclvia wegmxvuoqez
ijwauhmzw joxm zvphbs ogpjyip
qxoymxkxv yrfoifig fbhecph izxcyfma
xarzse srwic jgi fkbcdcydpz qpdvsz
rqhjieqno fmelfmtgqe qozenjlxtg vfxd
lkmkrksgw ytuaduknsl let ao bm
lsfjednsa qouinii yrwzerdck yb kszttly
zmwflwevyix kdg qpnkzuijva ssau yc
wxews drqsdwbc glxb gokunixldec
lznuwdvksx zkzhsirruxc sqplhv
fzixywkaft fqdkumfgddn bcqp oiwwbo
emhk kv qhm xkjp kacbmcd ojh wzvukx
oztbexkf lylyv kdspqpa zbykj lnprtlxp
af bne ryamumcg oyhldwdlq bqyfxrszuf
wyrijnr ysnefsz lhhazrdwsev tll
ikibsnpqwg ntzlgc aahfsdeups rushos
ihqzyucd mjorscchszm tuppz hxi
ssumrevg
It would be helpful if the text were at least readable instead of this.
I am thinking of taking English words and picking from them at random to make sentences.
Where can I get a big list of words in the English language?
Answer 1:
The gold standard for natural language processing is WordNet at http://wordnet.princeton.edu/. It has an active user group, associates semantics and syntax with words, and interfaces with other NLP tools. If you are thinking of doing computation with the words, you should definitely have a look.
However, selecting words at random does not generate a useful sentence, and I suspect you will be disappointed with the results. Have a look at toolkits such as OpenNLP, which offer many tools, including part-of-speech (POS) tagging, which you will certainly need.
Even sentences with valid syntax can be meaningless, so you will need to read the work of Chomsky and others. His "Colorless green ideas sleep furiously" (http://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously) illustrates the problem.
Answer 2:
Look at Lorem Ipsum on http://www.lipsum.com/ for generating "void text".
There are a lot of generators on the net, for example http://loremipsum.sourceforge.net/
Reference text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed consectetur viverra fringilla. Donec at lectus at turpis bibendum placerat. Vivamus non nibh mauris. Nulla metus metus, sollicitudin nec egestas id, fermentum at nisl. Pellentesque at nisl est. In nec sem tellus, ac imperdiet lectus. Pellentesque tortor turpis, sagittis vel facilisis tristique, cursus in tortor. Mauris non neque magna, vel dignissim sem. Suspendisse interdum diam tempus dui mattis molestie. Donec in mauris urna, at vulputate ipsum. Sed sodales venenatis quam non tincidunt.
Answer 3:
I would suggest using a Lorem Ipsum generator. For Java there is this one. An online version is available here.
Answer 4:
The Wordlist project has some lists. I think it's hard to find a complete list; natural languages don't work like that.
Answer 5:
A big list I found in the FreeBSD CVS.
Answer 6:
CUVPlus is a good machine readable dictionary (the link goes straight to the download page). This is "for research purposes only" (non-commercial licence). It includes classification into nouns, verbs, and so on, so it may be more useful for generating random sentences than just a list of words.
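To show why part-of-speech classes help, here is a minimal sketch that fills a fixed template (adjective, plural noun, verb, adverb) with random words. The class name TemplateSentences and the four tiny word lists are made up for illustration; in practice the lists would be loaded from a tagged dictionary such as CUVPlus, whose file format is not covered here.

```java
import java.util.Random;

class TemplateSentences {
    // Tiny hand-picked lists, purely illustrative (assumption, not taken from any dictionary file).
    static final String[] ADJECTIVES = {"green", "quiet", "heavy", "bright"};
    static final String[] NOUNS = {"ideas", "rivers", "machines", "clouds"};
    static final String[] VERBS = {"sleep", "wander", "hum", "drift"};
    static final String[] ADVERBS = {"furiously", "slowly", "softly", "often"};

    // Picks one entry uniformly at random.
    public static String pick(String[] options, Random r) {
        return options[r.nextInt(options.length)];
    }

    // Fills the template "Adjective noun verb adverb." with random words.
    public static String sentence(Random r) {
        String adj = pick(ADJECTIVES, r);
        String capitalized = Character.toUpperCase(adj.charAt(0)) + adj.substring(1);
        return capitalized + " " + pick(NOUNS, r) + " " + pick(VERBS, r)
                + " " + pick(ADVERBS, r) + ".";
    }

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(sentence(r));
        }
    }
}
```

The output is grammatical but usually meaningless, which is the same limitation the "Colorless green ideas" example points at.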
Answer 7:
Download the OpenOffice dictionary:
http://wiki.services.openoffice.org/wiki/Dictionaries#English_.28AU.2CCA.2CGB.2CNZ.2CUS.2CZA.29
Answer 8:
If you are on a Linux PC, try /usr/share/dict.
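As a rough sketch of what the question asks for, the following reads a one-word-per-line list and prints randomly chosen words. The file name /usr/share/dict/words is an assumption: it is common on Linux systems but differs between distributions and may not be installed, so the code falls back to a small made-up list. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class RandomWords {
    // Joins `count` words drawn uniformly at random from `words`, separated by spaces.
    public static String randomText(List<String> words, int count, Random r) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words.get(r.nextInt(words.size())));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the system word list; adjust for your system.
        Path dict = Paths.get("/usr/share/dict/words");
        List<String> words = Files.exists(dict)
                ? Files.readAllLines(dict)
                : Arrays.asList("random", "text", "needs", "real", "words"); // made-up fallback
        System.out.println(randomText(words, 12, new Random()));
    }
}
```

The word list must not be empty, since nextInt requires a positive bound.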
Answer 9:
You want to look up "Lorem Ipsum". There is bound to be some sort of library for generating it in Java.
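If no library is at hand, filler text of this kind can be approximated in a few lines. The sketch below is not any particular library's API; the class name MiniLipsum is hypothetical, and the vocabulary is just a handful of words from the familiar Lorem Ipsum passage.

```java
import java.util.Random;

class MiniLipsum {
    // A few words from the familiar Lorem Ipsum passage; extend as needed.
    static final String[] WORDS = {
        "lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit",
        "sed", "viverra", "fringilla", "donec", "lectus", "turpis", "bibendum", "placerat"
    };

    // Builds one capitalized sentence of 4 to 11 random words, ending with a period.
    public static String sentence(Random r) {
        int n = 4 + r.nextInt(8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            String w = WORDS[r.nextInt(WORDS.length)];
            if (i == 0) {
                w = Character.toUpperCase(w.charAt(0)) + w.substring(1);
            } else {
                sb.append(' ');
            }
            sb.append(w);
        }
        return sb.append('.').toString();
    }

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 3; i++) {
            System.out.println(sentence(r));
        }
    }
}
```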
Answer 10:
The Scrabble word lists may be worth a look. There are two variants: SOWPODS (everywhere except the USA and Canada) and TWL (for the USA and Canada). Both word lists are readily downloadable from various sites.
However, for what you need, you may also want to consider using Lorem Ipsum (aka 'lipsum'). One popular Lipsum generator is here, although there are many others.
Answer 11:
When I did this in 12th grade, back in 1972, I made a list of all the possible second letters in English. In other words, a vector of 26 strings. The first string was all the possible letters that could follow A, the second was all the possible letters that could follow B, and so on.
I made the lists just by trying to think of a word with each possible two letter sequence, and if it was too hard to think of one, I did not include it. Therefore I ended up with all of the common two letter sequences in English.
I do remember that the generated text was pronounceable, and that there were often real words, or almost real words in it.
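The vector of 26 strings can be sketched in Java as follows. One deliberate difference from the description above: instead of typing the follower lists by hand, this version derives them from a small sample word list, which is an assumption made to keep the example short. The class name, method names, and sample words are all hypothetical.

```java
import java.util.Random;

class LetterPairs {
    // followers[i] holds every letter seen directly after ('a' + i) in the sample words.
    public static String[] buildFollowers(String[] sampleWords) {
        StringBuilder[] sets = new StringBuilder[26];
        for (int i = 0; i < 26; i++) {
            sets[i] = new StringBuilder();
        }
        for (String w : sampleWords) {
            for (int i = 0; i + 1 < w.length(); i++) {
                char a = w.charAt(i);
                char b = w.charAt(i + 1);
                if (a < 'a' || a > 'z' || b < 'a' || b > 'z') {
                    continue; // only lowercase a-z is handled
                }
                if (sets[a - 'a'].indexOf(String.valueOf(b)) < 0) {
                    sets[a - 'a'].append(b);
                }
            }
        }
        String[] followers = new String[26];
        for (int i = 0; i < 26; i++) {
            followers[i] = sets[i].toString();
        }
        return followers;
    }

    // Builds a word of `length` letters (at least one) by repeatedly picking an allowed follower.
    public static String word(String[] followers, int length, Random r) {
        char c = (char) ('a' + r.nextInt(26));
        StringBuilder sb = new StringBuilder().append(c);
        while (sb.length() < length) {
            String next = followers[c - 'a'];
            // Dead end (no known follower): fall back to any letter.
            c = next.isEmpty() ? (char) ('a' + r.nextInt(26))
                               : next.charAt(r.nextInt(next.length()));
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] sample = {"random", "text", "generator", "english", "words", "sentence",
                           "letter", "quick", "jumps", "lazy", "brown", "fox", "over", "zebra"};
        String[] followers = buildFollowers(sample);
        Random r = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.print(word(followers, r.nextInt(6) + 3, r) + " ");
        }
        System.out.println();
    }
}
```

A larger sample gives fuller follower lists and fewer dead ends.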
It was written on OCR mark sense cards in BASIC for the HP 2100A minicomputer with 8k of core memory.
I've since learned that you can usually identify a language by examining the frequency of letter triplets, so I suspect that if you do this to one more level, you will end up with a lot more real words, and a much greater eerie resemblance to some form of English.
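Taking this one level further might look like the sketch below: each pair of characters maps to the characters that followed it in a sample text, with repeats kept so that common triplets are chosen more often. The class name, the sample sentence, and the seed "th" are assumptions for illustration; a realistic run would train on a much larger English text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class LetterTriplets {
    // Maps each two-character context to every character that followed it in the text.
    // Duplicates are kept, so frequent triplets are drawn more often.
    public static Map<String, List<Character>> train(String text) {
        Map<String, List<Character>> model = new HashMap<>();
        for (int i = 0; i + 2 < text.length(); i++) {
            model.computeIfAbsent(text.substring(i, i + 2), k -> new ArrayList<>())
                 .add(text.charAt(i + 2));
        }
        return model;
    }

    // Extends `seed` (at least two characters) until `length` is reached
    // or the current context was never seen in the sample.
    public static String generate(Map<String, List<Character>> model, String seed,
                                  int length, Random r) {
        StringBuilder sb = new StringBuilder(seed);
        while (sb.length() < length) {
            List<Character> next = model.get(sb.substring(sb.length() - 2));
            if (next == null) {
                break;
            }
            char c = next.get(r.nextInt(next.size()));
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample; spaces are part of the model, so word breaks are generated too.
        String sample = "the quick brown fox jumps over the lazy dog and then the other "
                + "dog thinks that this thing is rather strange ";
        Map<String, List<Character>> model = train(sample);
        System.out.println(generate(model, "th", 80, new Random()));
    }
}
```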