Description
Given a dictionary, my program generates two output files, 'sequences.txt' and 'words.txt'.
- 'sequences' contain every sequence of four letters (A-z) that appear in exactly one word of the dictionary, one sequence per line.
- 'words' will contain the corresponding words that contain the sequences, in the same order, again one per line.
For example, given spec/fixtures/sample_words.txt
dictionary containing only
arrows
carrots
give
me
The outputs should be:
'sequences' 'words'
carr carrots
give give
rots carrots
rows arrows
rrot carrots
rrow arrows
Of course, 'arro' does not appear in the output since it is found in more than one word.
What I've come up with so far
Project structure:
├── Gemfile
├── Gemfile.lock
├── examples
│ └── dictionary.txt
├── lib
│ └── word_sequence_parser.rb
├── main.rb
├── output
├── readme.md
└── spec
├── fixtures
│ └── sample_words.txt
└── word_sequence_parser_spec.rb
To run the script:
ruby main.rb examples/dictionary.txt
main.rb
require_relative 'lib/word_sequence_parser.rb'
dict_path = ARGV.shift
if dict_path.nil?
dict_path = 'spec/fixtures/sample_words.txt'
end
parser = WordSequenceParser.new(dict_path)
# step 1 - Opens dictionary file and generates a new set of words
parser.set
# step 2 - Parses word sequences
parser.sequence
# step 3 - Prints to files in ./output
parser.dump_text
The script that works
word_sequence_parser.rb
require 'set'
class WordSequenceParser
def initialize(path)
@path = path
end
def set
set = Set.new
File.open(@path) do |f|
f.each_line do |line|
set.add(line.chomp.downcase)
end
end
set
end
def sequence
sequences = Set.new
words = Set.new
to_remove = Set.new
set.each do |w|
letters = w.split(//)
letters.each_cons(4) do |seq|
s = seq.join
if !words.add?(s)
to_remove.add(s)
end
sequences.add( {seq: s, word: w} )
end
end
sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
end
def dump_text
output_s = File.open( 'output/sequences.txt', 'w' )
output_w = File.open( 'output/words.txt', 'w' )
sequence.each do |hash|
output_s.puts("#{hash[:seq]}")
output_w.puts("#{hash[:word]}")
end
output_s.close
output_w.close
end
end
My shot at the script with memorization that doesn't work
require 'set'
class WordSequenceParser
def initialize(path)
@path = path
end
def set
set = Set.new
File.open(@path) do |f|
f.each_line do |line|
set.add(line.chomp.downcase)
end
end
set
end
def memoize
@set = set
end
def sequence
sequences = Set.new
words = Set.new
to_remove = Set.new
@set.each do |w|
letters = w.split(//)
letters.each_cons(4) do |seq|
s = seq.join
if !words.add?(s)
to_remove.add(s)
end
sequences.add( {seq: s, word: w} )
end
end
sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
end
def dump_text
output_s = File.open( 'output/sequences.txt', 'w' )
output_w = File.open( 'output/words.txt', 'w' )
sequence.each do |hash|
output_s.puts("#{hash[:seq]}")
output_w.puts("#{hash[:word]}")
end
output_s.close
output_w.close
end
end
I get this error message when trying to run the script.
../word_sequence_parser.rb:29:in `sequence': undefined method `each' for nil:NilClass (NoMethodError)
from main.rb:15:in `<main>'
I've read up on Justin Weiss' article on memoization and for the most part get it. Just having a hard time implementing this technique into something I've already written.