The objective of my code is to get each seperate word from a txt file and put it into a list and then making a binary search tree using that list to count the frequency of each word and printing each word in alphabetical order along with its frequency. Each word in the can only contain letters, numbers, -, or ' The part that I am unable to do with my beginner programming knowledge is to make the Binary Search Tree using the list I have (I am only able to insert the whole list in one Node instead of putting each word in a Node to make the tree). The code I have so far is this:
def read_words(filename):
openfile = open(filename, "r")
templist = []
letterslist = []
for lines in openfile:
for i in lines:
ii = i.lower()
letterslist.append(ii)
for p in letterslist:
if p not in ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',"'","-",' '] and p.isdigit() == False:
letterslist.remove(p)
wordslist = list("".join(letterslist).split())
return wordslist
class BinaryTree:
class _Node:
def __init__(self, value, left=None, right=None):
self._left = left
self._right = right
self._value = value
self._count = 1
def __init__(self):
self.root = None
def isEmpty(self):
return self.root == None
def insert(self, value) :
if self.isEmpty() :
self.root = self._Node(value)
return
parent = None
pointer = self.root
while (pointer != None) :
if value == pointer._value:
pointer._count += 1
return
elif value < pointer._value:
parent = pointer
pointer = pointer._left
else :
parent = pointer
pointer = pointer._right
if (value <= parent._value) :
parent._left = self._Node(value)
else :
parent._right = self._Node(value)
def printTree(self):
pointer = self.root
if pointer._left is not None:
pointer._left.printTree()
print(str(pointer._value) + " " + str(pointer._count))
if pointer._right is not None:
pointer._right.printTree()
def createTree(self,words):
if len(words) > 0:
for word in words:
BinaryTree().insert(word)
return BinaryTree()
else:
return None
def search(self,tree, word):
node = tree
depth = 0
count = 0
while True:
print(node.value)
depth += 1
if node.value == word:
count = node.count
break
elif word < node.value:
node = node.left
elif word > node.value:
node = node.right
return depth, count
def main():
words = read_words('sample.txt')
b = BinaryTree()
b.insert(words)
b.createTree(words)
b.printTree()
Since you're a beginner I'd advice to implement the tree methods with recursion instead of iteration since this will result to simpler implementation. While recursion might seem a bit difficult concept at first often it is the easiest approach.
Here's a draft implementation of a binary tree which uses recursion for insertion, searching and printing the tree, it should support the functionality you need.
class Node(object):
def __init__(self, value):
self.value = value
self.left = None
self.right = None
self.count = 1
def __str__(self):
return 'value: {0}, count: {1}'.format(self.value, self.count)
def insert(root, value):
if not root:
return Node(value)
elif root.value == value:
root.count += 1
elif value < root.value:
root.left = insert(root.left, value)
else:
root.right = insert(root.right, value)
return root
def create(seq):
root = None
for word in seq:
root = insert(root, word)
return root
def search(root, word, depth=1):
if not root:
return 0, 0
elif root.value == word:
return depth, root.count
elif word < root.value:
return search(root.left, word, depth + 1)
else:
return search(root.right, word, depth + 1)
def print_tree(root):
if root:
print_tree(root.left)
print root
print_tree(root.right)
src = ['foo', 'bar', 'foobar', 'bar', 'barfoo']
tree = create(src)
print_tree(tree)
for word in src:
print 'search {0}, result: {1}'.format(word, search(tree, word))
# Output
# value: bar, count: 2
# value: barfoo, count: 1
# value: foo, count: 1
# value: foobar, count: 1
# search foo, result: (1, 1)
# search bar, result: (2, 2)
# search foobar, result: (2, 1)
# search bar, result: (2, 2)
# search barfoo, result: (3, 1)
To answer your direct question, the reason why you are placing all of the words into a single node is because of the following statement inside of main()
:
b.insert(words)
The insert function creates a Node
and sets the value of the node to the item you pass in. Instead, you need to create a node for each item in the list which is what your createTree()
function does. The preceeding b.insert
is not necessary.
Removing that line makes your tree become correctly formed, but reveals a fundamental problem with the design of your data structure, namely the printTree()
method. This method seems designed to traverse the tree and recursively call itself on any child. In your initial version this function worked, because there the tree was mal-formed with only a single node of the whole list (and the print function simply printed that value since right and left were empty).
However with a correctly formed tree the printTree()
function now tries to invoke itself on the left and right descendants. The descendants however are of type _Node
, not of type BinaryTree
, and there is no methodprintTree()
declared for _Node
objects.
You can salvage your code and solve this new error in one of two ways. First you can implement your BinaryTree.printTree()
function as _Node.printTree()
. You can't do a straight copy and paste, but the logic of the function won't have to change much. Or you could leave the method where it is at, but wrap each _left
or _right
node inside of a new BinaryTree so that they would have the necessary printTree()
method. Doing this would leave the method where it is at, but you will still have to implement some kind of helper traversal method inside of _Node
.
Finally, you could change all of your _Node objects to be _BinaryTree objects instead.
The semantic difference between a node and a tree is one of scope. A node should only be aware of itself, its direct children (left and right), and possibly its parent. A tree on the other hand can be aware of any of its descendents, no matter how far removed. This is accomplished by treating any child node as its own tree. Even a leaf, without any children at all can be thought of as a tree with a depth of 0. This behavior is what lets a tree work recursively. Your code is mixing the two together.