Given a list of words, how would you go about arranging them into a crossword grid?
It wouldn't have to be like a "proper" crossword puzzle which is symmetrical or anything like that: basically just output a starting position and direction for each word.
Would there be any Java examples available?
Here is some JavaScript code based on nickf's answer and Bryan's Python code. Just posting it in case someone else needs it in JS.
Although this is an older question, I will attempt an answer based on similar work I have done.
There are many approaches to solving constraint problems (which are generally in the NP-complete complexity class).
This is related to combinatorial optimization and constraint programming. In this case the constraints are the geometry of the grid, the requirement that words are unique, and so on.
Randomization/annealing approaches can also work (although only in the proper setting).
Efficient simplicity might just be the ultimate wisdom!
The requirements were for a more or less complete crossword compiler and (visual WYSIWYG) builder.
Leaving aside the WYSIWYG builder part, the compiler outline was this:
1. Load the available wordlists (sorted by word length, i.e. 2, 3, ..., 20)
2. Find the wordslots (i.e. grid words) on the user-constructed grid (e.g. word at x,y with length L, horizontal or vertical) (complexity O(N))
3. Compute the intersecting points of the grid words (that need to be filled) (complexity O(N^2))
4. Compute the intersections of the words in the wordlists with the various letters of the alphabet used (this allows searching for matching words by using a template, e.g. Sik Cambon thesis as used by cwc) (complexity O(WL*AL))
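As a rough illustration of the slot-finding and intersection steps, here is a minimal Python sketch. It assumes the grid is a list of equal-length strings where '#' marks a blocked cell; the names find_slots, slot_cells and find_intersections are hypothetical and not taken from the original implementation.

```python
from itertools import combinations

def find_slots(grid):
    """Return slots as (row, col, direction, length); '#' is a blocked cell."""
    rows, cols = len(grid), len(grid[0])
    slots = []
    for r in range(rows):              # horizontal runs of open cells
        c = 0
        while c < cols:
            start = c
            while c < cols and grid[r][c] != '#':
                c += 1
            if c - start >= 2:
                slots.append((r, start, 'across', c - start))
            c += 1                     # skip the blocked cell (or leave the row)
    for c in range(cols):              # vertical runs of open cells
        r = 0
        while r < rows:
            start = r
            while r < rows and grid[r][c] != '#':
                r += 1
            if r - start >= 2:
                slots.append((start, c, 'down', r - start))
            r += 1
    return slots

def slot_cells(slot):
    """List the (row, col) cells covered by a slot, in reading order."""
    r, c, direction, length = slot
    if direction == 'across':
        return [(r, c + i) for i in range(length)]
    return [(r + i, c) for i in range(length)]

def find_intersections(slots):
    """Pairwise O(N^2) scan; yields (slot_a, pos_in_a, slot_b, pos_in_b)."""
    cells = [slot_cells(s) for s in slots]
    crossings = []
    for (i, a), (j, b) in combinations(enumerate(cells), 2):
        for cell in set(a) & set(b):
            crossings.append((i, a.index(cell), j, b.index(cell)))
    return crossings

grid = ["...",
        ".#.",
        "..."]
slots = find_slots(grid)
crossings = find_intersections(slots)
```

Each crossing records which letter position of one slot must equal which letter position of another, which is the information the later steps rely on.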
Steps 3 and 4 make the following possible:
a. The intersections of the grid words with each other make it possible to build a "template" for finding matches in the wordlist associated with a grid word (using the letters of the intersecting words that are already filled at a given step of the algorithm)
b. The intersections of the words in a wordlist with the alphabet make it possible to find the candidate words that match a given "template" (e.g. 'A' in 1st place and 'B' in 3rd place, etc.)
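One possible way to realise such a letter/position index is sketched below in Python. The data layout (a dictionary keyed by length, position and letter), the '.' wildcard and the function names are assumptions made for illustration, not the structures used in the original compiler.

```python
from collections import defaultdict

def build_letter_index(words):
    """Map (word length, position, letter) -> set of words with that letter there."""
    index = defaultdict(set)
    for word in words:
        for pos, letter in enumerate(word):
            index[(len(word), pos, letter)].add(word)
    return index

def match_template(template, words_by_length, index):
    """Return the words matching a template such as 'A.B' ('.' = unknown cell)."""
    candidates = set(words_by_length.get(len(template), ()))
    for pos, letter in enumerate(template):
        if letter != '.':
            # Intersect with the precomputed set for this fixed letter
            candidates &= index.get((len(template), pos, letter), set())
    return candidates

words = ["ABB", "ACB", "ABC", "CAB", "AB"]
words_by_length = defaultdict(set)
for w in words:
    words_by_length[len(w)].add(w)
index = build_letter_index(words)

matches = match_template("A.B", words_by_length, index)  # {'ABB', 'ACB'}
```

Because the index is built once per wordlist, each template lookup reduces to a few set intersections instead of a scan over every word.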
So with these data structures implemented, the algorithm used was something like this:
NOTE: if the grid and the database of words are constant, the previous steps only need to be done once.
1. Select an empty wordslot (grid word) at random and fill it with a candidate word from its associated wordlist (randomization makes it possible to produce different solutions in consecutive executions of the algorithm) (complexity O(1) or O(N))
2. For each still-empty wordslot (that intersects already filled wordslots), compute a constraint ratio (this can vary; something simple is the number of available solutions at that step) and sort the empty wordslots by this ratio (complexity O(NlogN) or O(N))
3. Loop through the empty wordslots computed in the previous step and for each one try a number of candidate solutions (making sure that "arc-consistency is retained", i.e. the grid still has a solution after this step if this word is used) and sort them according to maximum availability for the next step (i.e. the next step has the maximum number of possible solutions if this word is used at that time in that place, etc.) (complexity O(N*MaxCandidatesUsed))
4. Fill that word (mark it as filled and go to step 2)
5. If no word is found that satisfies the criteria of step 3, try to backtrack to another candidate solution of some previous step (criteria can vary here) (complexity O(N))
6. If a backtrack is found, use the alternative and optionally reset any already filled words that might need a reset (mark them as unfilled again) (complexity O(N))
7. If no backtrack is found, then no solution can be found (at least with this configuration, initial seed, etc.)
8. Otherwise, when all wordslots are filled, you have one solution
This algorithm does a random consistent walk of the solution tree of the problem. If at some point there is a dead end, it backtracks to a previous node and follows another route, until either a solution is found or the candidates for the various nodes are exhausted.
The consistency part makes sure that a solution found is indeed a solution, and the random part makes it possible to produce different solutions in different executions and also, on average, gives better performance.
PS. All this (and more) was implemented in pure JavaScript (with parallel processing and WYSIWYG capability)
PS2. The algorithm can easily be parallelized in order to produce more than one (different) solution at the same time
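A much simplified Python sketch of this kind of randomized backtracking walk is shown below. It is not the original implementation: it assumes each slot is given as a list of cell coordinates, picks the slot with the fewest remaining candidates instead of a separate constraint ratio, and uses plain chronological backtracking without the arc-consistency look-ahead described above. The names candidates_for and fill are hypothetical.

```python
import random

def candidates_for(cells, grid, words, used):
    """Unused words of the right length that agree with letters already placed."""
    return [w for w in words
            if len(w) == len(cells) and w not in used
            and all(grid.get(cell, ch) == ch for cell, ch in zip(cells, w))]

def fill(slots, words, assigned=None, grid=None, rng=random):
    """Return {slot index: word} for a consistent fill, or None if none is found."""
    assigned = {} if assigned is None else assigned
    grid = {} if grid is None else grid
    open_slots = [i for i in range(len(slots)) if i not in assigned]
    if not open_slots:
        return dict(assigned)          # every slot holds a word: one solution
    used = set(assigned.values())
    options = {i: candidates_for(slots[i], grid, words, used) for i in open_slots}
    # Most constrained slot first; zero candidates means an immediate dead end
    slot = min(open_slots, key=lambda i: len(options[i]))
    choices = options[slot]
    rng.shuffle(choices)               # the random part of the walk
    for word in choices:
        newly = [(cell, ch) for cell, ch in zip(slots[slot], word)
                 if cell not in grid]
        grid.update(newly)
        assigned[slot] = word
        result = fill(slots, words, assigned, grid, rng)
        if result is not None:
            return result
        # Dead end further down: undo this choice and try the next candidate
        del assigned[slot]
        for cell, _ in newly:
            del grid[cell]
    return None

# A fully open 2x2 grid: two across slots and two down slots
slots = [[(0, 0), (0, 1)], [(1, 0), (1, 1)],
         [(0, 0), (1, 0)], [(0, 1), (1, 1)]]
solution = fill(slots, ["AB", "CD", "AC", "BD"], rng=random.Random(0))
```

Running it with different seeds can give different fills, and a wordlist that cannot fill the grid makes fill return None once all candidates are exhausted.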
Hope this helps
I came up with a solution which probably isn't the most efficient, but it works well enough. Basically:
This makes a working, yet often quite poor crossword. There were a number of alterations I made to the basic recipe above to come up with a better result.
This algorithm creates 50 dense 6x9 arrow crosswords in 60 seconds. It uses a word database (with word+tips) and a board database (with pre-configured boards).
A bigger word database decreases generation time considerably, and some kinds of boards are harder to fill! Bigger boards require more time to be filled correctly!
Example:
Pre-Configured 6x9 Board:
(# means one tip in one cell, % means two tips in one cell, arrows not shown)
Generated 6x9 Board:
Tips [line,column]:
Why not just use a random probabilistic approach to start with? Start with a word, and then repeatedly pick a random word and try to fit it into the current state of the puzzle without breaking the constraints on the size etc. If you fail, just start all over again.
You will be surprised how often a Monte Carlo approach like this works.
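A minimal Python sketch of that idea follows, under some loudly stated assumptions: the grid is square, a word is accepted if it stays inside the grid, agrees with existing letters and crosses at least one of them, and words that merely end up adjacent to each other are not checked. The function names and the default limits are made up for illustration.

```python
import random

def can_place(grid, word, r, c, dr, dc, size):
    """True if word fits in the grid, agrees with placed letters and crosses one."""
    end_r, end_c = r + dr * (len(word) - 1), c + dc * (len(word) - 1)
    if not (0 <= r and 0 <= c and end_r < size and end_c < size):
        return False
    crossings = 0
    for i, ch in enumerate(word):
        existing = grid.get((r + dr * i, c + dc * i))
        if existing is None:
            continue
        if existing != ch:
            return False
        crossings += 1
    # Need at least one shared letter, but not a word lying fully on old letters
    return 0 < crossings < len(word)

def try_layout(words, size, rng, tries_per_word=200):
    """One random attempt; returns placements or None if some word never fits."""
    words = list(words)
    rng.shuffle(words)
    first = words[0]
    if len(first) > size:
        return None
    grid = {(size // 2, i): ch for i, ch in enumerate(first)}
    placements = [(first, size // 2, 0, 'across')]
    for word in words[1:]:
        for _ in range(tries_per_word):
            dr, dc = rng.choice([(0, 1), (1, 0)])
            r, c = rng.randrange(size), rng.randrange(size)
            if can_place(grid, word, r, c, dr, dc, size):
                for i, ch in enumerate(word):
                    grid[(r + dr * i, c + dc * i)] = ch
                placements.append((word, r, c, 'across' if dc else 'down'))
                break
        else:
            return None                # this word never fit: give up on this attempt
    return placements

def monte_carlo_layout(words, size=10, restarts=500, seed=None):
    """Start all over again until an attempt places every word, or give up."""
    rng = random.Random(seed)
    for _ in range(restarts):
        layout = try_layout(words, size, rng)
        if layout is not None:
            return layout
    return None

layout = monte_carlo_layout(["CAT", "TAR", "ART", "RAT"], size=8, seed=1)
```

The result is a list of (word, row, column, direction) tuples, i.e. a starting position and direction for each word, as asked in the question.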
I was playing around with a crossword generator engine, and I found these points the most important:
0. #!/usr/bin/python
a. allwords.sort(key=len, reverse=True)
b. Make some item/object like a cursor which will walk around the matrix for easy orientation, unless you want to iterate by random choice later on.
First, pick up the first pair of words and place them across and down from 0,0; store the first one as the current crossword 'leader'.
Move the cursor to the next empty cell, either in diagonal order or randomly with a greater probability for the diagonal.
Iterate over the words like this, using the free space length to define the max word length:
    temp = []
    for w_size in range(len(w_space), 2, -1):
        for w in [word for word in allwords if len(word) == w_size]:
            if w not in temp and putTheWord(w, w_space):
                temp.append(w)
To compare a word against the free space I used, e.g.:
After each successfully used word, change direction. Loop until all cells are filled OR you run out of words OR you hit the iteration limit, then:
    # CHANGE ALL WORDS LIST
    inexOf1stWord = allwords.index(leading_w)
    allwords = allwords[:inexOf1stWord+1][:] + allwords[inexOf1stWord+1:][:]
...and iterate again for a new crossword.
Build a scoring system based on ease of filling and some estimation calculations. Score the current crossword, and narrow later choices by appending it to the list of made crosswords only if the score satisfies your scoring system.
After the first iteration session, iterate again over the list of made crosswords to finish the job.
By using more parameters, speed can be improved by a huge factor.