Note: Classes do not meet June 29 or June 30. The midterm exam is tentatively scheduled for Thursday, July 2, 1998.

Trees

A tree is a generalization of a linked list. A linked list node can be thought of as having a single successor or child, the node that follows it. The last node in the list has no children. We can extend this concept to allow a node to have many children.

A program examining such a tree can decide which child it wants to follow based on the information in the node, or may choose to follow all the children.

Tree nodes that have no children are called leaf nodes.

Tree nodes that are not leaves are called internal nodes.

The number of children a node has is its degree.

The "top" node in a tree, the one leading (eventually) to all of the other nodes, is called the root node.

The depth of a node x is the number of edges on the path from the root to x. The root is said to be at depth 0; any children of the root are at depth 1, etc. The height of a tree is the maximum depth of any node. For convenience, the height of an empty tree is defined as -1. All the nodes with the same depth d are in a set called level d.

Often, we will use familial relationships (e.g., sibling, grandchild, parent, etc.) to describe relationships among tree nodes, with the obvious interpretations. Similarly, tree analogies will be used (e.g., a set of trees is a forest, a tree with lots of breadth is bushy, etc.).

In lecture 7, we saw an example of such a data structure: a parse tree, where the root represented an entire expression, internal nodes represented operations, and leaf nodes represented variable names. The tree was created by a recursive descent parser, which built the tree in a bottom up manner (i.e., from leaf nodes to the root), then evaluated it in a top down manner (i.e., from root to leaf).

Here's an example of an ordinary tree:

                                a                     Level 0
                               / \ 
                              /   \
                             /     \
                            /       \
                           b         c                Level 1
                         / | \       |
                        /  |  \      |
                       d   e   f     g                Level 2
                      / \  |  /|\   / \
                     h   i j k l m n   o              Level 3
                    / \          |    /|\
                   p   q         r   x y z            Level 4
                               / | \
                              / /|\ \
                             / | | | \
                            s  t u v  w               Level 5

Implementation of Trees

As in the other data structures we have seen, tree nodes are for storing some kind of keyed data. Typically this data is a record, like a student record with key=SSN, but we can simply consider trees where the data is a key, like an integer.

How can we represent trees? If we know the maximum degree (number of children) of any internal node, we can do something similar to a linked list, with an array of pointers to the next node, instead of just one, e.g.:

#define MAX_DEGREE	20 /* up to 20 children */

typedef struct _treenode {
	int		k;	/* information */
	struct _treenode *children[MAX_DEGREE];
} treenode;

and we can use NULL pointers to indicate no successor node.
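
As a rough sketch of how the array-of-children node might be used, here are three small helpers. The names new_treenode, add_child, and degree are made up for illustration; they are not part of any library, and a fixed MAX_DEGREE is assumed as above.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEGREE	20 /* up to 20 children */

typedef struct _treenode {
	int		k;	/* information */
	struct _treenode *children[MAX_DEGREE];
} treenode;

/* hypothetical helper: allocate a node with key k and no children */
treenode *new_treenode (int k) {
	treenode *p = malloc (sizeof (treenode));
	int i;

	if (!p) return NULL;	/* out of memory */
	p->k = k;
	for (i = 0; i < MAX_DEGREE; i++)
		p->children[i] = NULL;	/* NULL means "no child here" */
	return p;
}

/* hypothetical helper: store child in the first free slot of parent;
 * returns 0 on success, -1 if parent already has MAX_DEGREE children
 */
int add_child (treenode *parent, treenode *child) {
	int i;

	assert (parent != NULL && child != NULL);
	for (i = 0; i < MAX_DEGREE; i++) {
		if (!parent->children[i]) {
			parent->children[i] = child;
			return 0;
		}
	}
	return -1;
}

/* the degree of a node is the number of non-NULL child pointers */
int degree (treenode *t) {
	int i, n = 0;

	for (i = 0; i < MAX_DEGREE; i++)
		if (t->children[i]) n++;
	return n;
}
```

Note the cost of this layout: every node carries MAX_DEGREE pointers, even a leaf that uses none of them.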

If we don't know the maximum degree, then we can use a linked list of siblings, each of which has a link to a linked list of children:

typedef struct _treenode {
	int		k;	/* information */
	struct _treenode *children, /* list of children */
			 *next_sibling; /* list of siblings */
} treenode;

So next_sibling points to the next node in a list of siblings at the same level, and children points to a list of children (connected by each child's next_sibling field). (It turns out that this kind of node is a very useful general list processing structure; see the computer language LISP for more information.)

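For example, the tree whose root 1 has children 2, 3, and 4, where node 4 in turn has children 5 and 6, can be represented as follows. Each box holds the key, the children pointer, and the next_sibling pointer, in that order, and / marks a NULL pointer: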

                 _____
                |1| |/|
                  _|___   _____   _____
                 |2|/|-|-|3|/|-|-|4| |/|
                                   _|___   _____
                                  |5|/|-|-|6|/|/|

How do we insert into such trees? Well, it all depends on the nature of the values of keys and how they are related to the concept of successors to a node. That is, until we know what a key means and why we want to stick it into a tree, it doesn't make sense to talk about how to insert it into a tree. An easy example of insertion is binary search trees.

Binary Trees

A binary tree is a tree where the maximum degree of any internal node is 2. Thus, a node may have 0, 1, or 2 children. Here is a binary tree:

What is the height of this tree? What are the leaves? The internal nodes?

A special case of a binary tree is a complete binary tree. A complete binary tree is one in which:

Every internal node has degree exactly two, and
Every leaf node occurs at the same level.

Can you think of trees that satisfy one of these constraints, but not the other?

For example, this is a complete binary tree:

                       1
                      / \
                     /   \
                    2     3
                   / \   / \
                  4   5 6   7

Complete binary trees are important because they allow us to explore certain aspects of binary trees in a simple context.

How many leaf nodes are there in a complete binary tree of height h? Notice that, whatever the answer is at height h, it is twice the value for height h-1. We can look at this as a recursive function:

f(h) =

1, if h = 0 (i.e., just the root node by itself), or
2 f(h-1) otherwise.

This works out to exponentiation, i.e., f(h) is just 2^h. In class, we'll use this fact to figure out how many total nodes there are in a complete binary tree of height h. It turns out to be 2^(h+1) - 1.
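
The recurrence translates almost word for word into C. The sketch below is only meant to make the counting concrete; the function names leaves and total_nodes are made up here. The total is found by adding up the leaf counts of the complete trees of heights 0 through h, since level d of a complete binary tree holds exactly as many nodes as a complete tree of height d has leaves.

```c
#include <assert.h>

/* leaves in a complete binary tree of height h:
 * f(0) = 1, f(h) = 2 f(h-1)
 */
unsigned long leaves (int h) {
	assert (h >= 0);
	if (h == 0) return 1;	/* just the root node by itself */
	return 2 * leaves (h - 1);
}

/* total nodes in a complete binary tree of height h:
 * the sum of the sizes of levels 0 through h
 */
unsigned long total_nodes (int h) {
	unsigned long sum = 0;
	int d;

	for (d = 0; d <= h; d++)
		sum += leaves (d);
	return sum;
}
```

For the complete binary tree of height 2 drawn earlier, leaves (2) is 4 and total_nodes (2) is 7.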

What is the maximum height of a binary tree with n nodes? If the tree is stretched out like a linked list, e.g.

then the maximum height is just n-1. It will turn out later that we don't like trees like this; we prefer short trees.

A more interesting question is, what is the minimum height of a binary tree with n nodes? To answer this question, we must determine what configuration of nodes will yield the minimum height. The tree must be almost complete, that is, a complete binary tree with possibly some of its leaves missing. An almost complete binary tree can have any number of nodes in it (proof of that left as an exercise :-), so we can consider this kind of tree without loss of generality.

An almost complete binary tree of height h has the same height as a complete binary tree of height h, and is one level taller than a complete binary tree of height h-1. So it has more nodes than the latter, and no more nodes than the former.

How many nodes are there in a complete binary tree of height h-1? Exactly 2^h - 1. So an almost complete binary tree of height h must have between 2^h nodes, e.g.:

                          1
                         / \          h = 2, n = 2² = 4
                        2   3
                       / 
                      4

and 2^(h+1) - 1 nodes, e.g.:

                          1
                         / \          h = 2, n = 2²⁺¹-1 = 7
                        2   3
                       / \ / \
                      4  5 6  7

That is, n must lie between 2^h and 2^(h+1) - 1, inclusive; any almost complete binary tree with n in this range will have height h. So the minimum height of a binary tree with n nodes is h = floor(log₂ n), where floor(x) means the largest integer less than or equal to x.
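
As a quick check of that formula, floor(log₂ n) can be computed with integer arithmetic alone by counting how many times n can be halved before it reaches 1. This is just a sketch; the name min_height is an assumption of this example.

```c
#include <assert.h>

/* minimum height of a binary tree with n >= 1 nodes,
 * i.e., floor(log2 n), found by repeated halving
 */
int min_height (unsigned long n) {
	int h = 0;

	assert (n >= 1);
	while (n > 1) {
		n /= 2;	/* integer division drops the remainder */
		h++;
	}
	return h;
}
```

For the two pictures above, min_height (4) and min_height (7) are both 2; one more node, min_height (8), pushes the height to 3.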

How does this help us? Suppose we have important information in a data structure, and we want to find it. We know it is along a path from the beginning to a terminal node (leaf or end of list). If we're talking about a linked list, going from the head to the end of the list takes O(n) time. In an almost complete binary tree, going from the root to a leaf takes O(log n) time. Since log n is much less than n (e.g., log₂ 1,000,000,000 = about 30), we can exploit this property to create efficient searchable data structures.

Binary Search Trees

One such data structure is a binary search tree. The keys in the node must have a total ordering on them. For example, if the key is an integer or float, a total ordering is less than or equal. If the key is a character string, a total ordering is lexicographical ordering (just like alphabetical, except for all of ASCII, not just the letters). Let's just call this ordering <= with the understanding that it is up to the user (or programming language) to define it.

Then a binary search tree is a binary tree that is:

Empty, or
A leaf node, or
An internal node with key value x such that all keys in its left subtree are <= x, x <= all keys in its right subtree, and both subtrees are themselves binary search trees.

Here is an example of a binary search tree:

                            10
                          /    \
                         5     12
                       /   \     \
                      1     6     13

We'll think of other examples in class.
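
One way to test your understanding of the definition is to write a function that checks it. The sketch below does so for integer keys, using the same node layout as the bstreenode structure introduced in the next part of these notes. The names is_bst and bst_in_range are invented for this example. The idea is to carry down the range of keys allowed in each subtree and narrow it at every node.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

/* hypothetical helper: is t a binary search tree whose keys
 * all lie in the range lo..hi, inclusive?
 */
static int bst_in_range (bstree t, int lo, int hi) {
	assert (lo <= hi);
	if (!t) return 1;	/* the empty tree qualifies */
	if (t->k < lo || t->k > hi) return 0;

	/* left keys must be <= t->k, right keys must be >= t->k */
	return bst_in_range (t->left, lo, t->k)
	    && bst_in_range (t->right, t->k, hi);
}

int is_bst (bstree t) {
	return bst_in_range (t, INT_MIN, INT_MAX);
}
```

On the example tree above this returns 1. If the 6 were changed to 11, it would return 0, because 11 sits in the left subtree of 10.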

Searching in a binary search tree is simply binary search. If what you're looking for isn't in the current node, and it is less than the current node, look in the left subtree. Otherwise, look in the right subtree. If you get to a point where there's nowhere left to go, the item isn't in the tree.

Let's look at an implementation of binary search trees, with integer keys and the natural "less than or equal to" total ordering. The tree node structure has space for the key and pointers to the left and right subtrees, just like a normal binary tree. The nodes are unfortunately, but intuitively, called bstreenodes:

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode 
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

Searching a Binary Search Tree

Assuming we have some way to create and insert into a bstree, the following function will search for a key k in a bstree t. Again, NULL pointers indicate that there are no children in a particular direction, and we will return a pointer to the found object, or NULL if it isn't found:

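int *search_bstree (bstree t, int k) {
	/* bstree is already a pointer to a node, so t is passed by value */
	while (t) {
		if (t->k == k) return &(t->k); /* found it */
		if (k <= t->k) 
			t = t->left;	 /* go left */
		else 
			t = t->right; /* go right */
	}
	return NULL;	/* ran off the tree: k is not present */
}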

When the function reaches an empty subtree that should include k, it returns NULL. If it ever finds an internal node or leaf containing k, it returns a pointer to the k field there.

How could we write this recursively?
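
One possible answer, offered as a sketch rather than the only way to do it: treat the empty tree as the base case and let each recursive call search one subtree. The name search_bstree_rec is made up to keep it apart from the loop version.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

/* recursive search: returns a pointer to the matching key
 * field, or NULL if k is not in the tree
 */
int *search_bstree_rec (bstree t, int k) {
	if (!t) return NULL;			/* empty tree: not found */
	if (k == t->k) return &(t->k);		/* found it */
	if (k < t->k)
		return search_bstree_rec (t->left, k);	/* go left */
	return search_bstree_rec (t->right, k);		/* go right */
}
```

Each call does the work of one trip through the while loop, so the two versions visit exactly the same nodes.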

Inserting into a Binary Search Tree

Here is one way to insert into a bstree:

void insert_bstree (bstree *t, int k) {
	bstree	p;

	p = (bstree) malloc (sizeof (bstreenode));
	if (!p) return;	/* out of memory: leave the tree unchanged */
	p->left = NULL;
	p->right = NULL;
	p->k = k;
	while (*t) {
		if (k <= (*t)->k) 
			t = &(*t)->left;
		else 
			t = &(*t)->right;
	}
	*t = p;
}

How can we do this recursively? Is it any easier?
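
Here is one way it might look, as a sketch. The recursion walks down to the empty spot where the key belongs and hangs a new node there. The name insert_bstree_rec is invented for this example, and quietly returning when malloc fails is a simplification.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

/* recursive insert: t points at the pointer to update */
void insert_bstree_rec (bstree *t, int k) {
	assert (t != NULL);
	if (!*t) {
		/* empty spot: the new node goes here */
		bstree p = (bstree) malloc (sizeof (bstreenode));

		if (!p) return;	/* out of memory: tree unchanged */
		p->k = k;
		p->left = NULL;
		p->right = NULL;
		*t = p;
	} else if (k <= (*t)->k)
		insert_bstree_rec (&(*t)->left, k);	/* belongs on the left */
	else
		insert_bstree_rec (&(*t)->right, k);	/* belongs on the right */
}
```

Starting from an empty tree and inserting 10, 5, 12, 1, 6, 13 in that order produces the example binary search tree shown earlier. Whether this is easier than the loop is a matter of taste; the pointer-to-pointer trick is the same in both.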

Binary Search Tree Traversal

We would like to visit all of the nodes in a binary search tree. This is called a tree traversal. Since we know something about the order of the binary search tree, we can print the elements out in order by doing the following:

Visit everything in the left subtree.
Visit the root node.
Visit everything in the right subtree.

If we follow this rule for every node, everything is visited in order: everything on the left first, the current thing, then everything on the right. This is called doing an inorder traversal. It is frighteningly easy to code as a recursive C function. Here, we will print all the nodes in a binary search tree:

void traverse_tree (bstree t) {
	if (!t) return;
	traverse_tree (t->left);
	printf ("%d\n", t->k);
	traverse_tree (t->right);
}

That was too easy. Now let's see a nonrecursive version of that function. It will assume some implementation of stacks of tree node pointers:

	
void traverse_nr (bstree t) {
	do {
		while (t) {
			push (t);
			t = t->left;
		}
		if (!empty_stack()) {
			t = pop ();
			printf ("%d\n", t->k);
			t = t->right;
		}
	} while (t || !empty_stack());
}

Which one do you like better? The stack-based one can be more efficient, but it is harder to read and maintain. So, unless you want to squeeze every ounce of computing power out of the machine, I suggest you stick with the recursive version.
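
The nonrecursive version leaves the stack unspecified. A minimal sketch of one that would satisfy it is a fixed-size array with a counter. The function names push, pop, and empty_stack are the ones traverse_nr calls; the array, the counter, and the STACK_MAX bound are assumptions of this example, and a real program might prefer a stack that grows.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

#define STACK_MAX	1000 /* assumed: no tree is taller than this */

static bstree	stack[STACK_MAX];
static int	top = 0;	/* number of items now on the stack */

/* true if there is nothing on the stack */
int empty_stack (void) {
	return top == 0;
}

/* put a tree node pointer on top of the stack */
void push (bstree t) {
	assert (top < STACK_MAX);	/* sketch: no room to grow */
	stack[top++] = t;
}

/* remove and return the most recently pushed pointer */
bstree pop (void) {
	assert (top > 0);
	return stack[--top];
}
```

The stack never holds more than one pointer per level of the tree at a time, so its size needs to track the height of the tree, not the number of nodes.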

There are other kinds of traversals for trees. For instance, suppose we want to delete all the nodes in a tree, to free up storage when we're done. We can try to use an inorder traversal:

void delete_tree (bstree t) {
	if (!t) return;
	delete_tree (t->left);
	free (t);
	delete_tree (t->right);
}

but what is wrong with this? The node is freed before we read t->right from it. Using memory after it has been freed is undefined behavior in C; it may appear to work, or it may crash with a segmentation fault. If we reverse the order of the last two statements, the problem is taken care of. We are no longer visiting the nodes in order, but who cares, since we're deleting them anyway? Visiting a node after both of its subtrees is called a postorder traversal. A preorder traversal is one where the node is visited before the left and right subtrees.
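
To see the orders side by side, here is a sketch with the repaired delete_tree plus two small functions that record keys in preorder and postorder. Writing the keys into an array instead of printing them is a choice made for this example, and the names preorder and postorder are invented.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _bstreenode {
	int		k;	/* the key */
	struct _bstreenode
			*left,	/* left subtree */
			*right;	/* right subtree */
} bstreenode, *bstree;

/* postorder delete: free both subtrees, then the node itself */
void delete_tree (bstree t) {
	if (!t) return;
	delete_tree (t->left);
	delete_tree (t->right);
	free (t);	/* last, so t is never used after this */
}

/* preorder: node first, then left, then right.  keys are stored
 * in out starting at index n; returns the new count.  the caller
 * must make out big enough for every node.
 */
int preorder (bstree t, int *out, int n) {
	assert (out != NULL);
	if (!t) return n;
	out[n++] = t->k;
	n = preorder (t->left, out, n);
	return preorder (t->right, out, n);
}

/* postorder: left, then right, then the node */
int postorder (bstree t, int *out, int n) {
	assert (out != NULL);
	if (!t) return n;
	n = postorder (t->left, out, n);
	n = postorder (t->right, out, n);
	out[n++] = t->k;
	return n;
}
```

On the example binary search tree with root 10, preorder records 10, 5, 1, 6, 12, 13 and postorder records 1, 6, 5, 13, 12, 10. The root comes first in one and last in the other.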

A General Searching Data Structure

This is all well and good for integers, but we would like to be able to do this for arbitrary kinds of records, without having to recode binary search trees for each kind of data (this is a good goal, in general). This program, which we will go over in class, implements a general-purpose binary search tree class and uses it for a string sorting algorithm. This program involves passing function pointers as parameters, a technique you may not have seen before.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _gbstreenode {
	void	*k;		/* pointer to some stuff, 
				 * we don't know what.  the user has
				 * to provide storage here.
				 */
        struct _gbstreenode 
                        *left,  /* left subtree */
                        *right; /* right subtree */
} gbstreenode, *gbstree;

/* search a general tree */

void *search_gbstree (gbstree t, void *k, int (*compar)(void *, void *)) {
	/* compar is a function of two pointers a and b, returning
	 * < 0 if *a < *b, 0 if *a == *b, and > 0 if *a > *b
	 */

	int	c;

	while (t) {
		/* same argument order as insert_gbstree: the key
		 * we are looking for comes first
		 */
		c = compar (k, t->k);
		if (c == 0) return &(t->k); /* found it */
		if (c < 0)
			t = t->left;  /* go left */
		else 
			t = t->right; /* go right */
	}
	return NULL;
}

/* insert a node */

void insert_gbstree (gbstree *t, void *k, int (*compar)(void *, void *)) {
	gbstree	p;
	int	c;

	p = (gbstree) malloc (sizeof (gbstreenode));
	p->left = NULL;
	p->right = NULL;
	p->k = k;
	while (*t) {
		c = compar (k, (*t)->k);
		if (c < 0)
			t = &(*t)->left;
		else 
			t = &(*t)->right;
	}
	*t = p;
}

/* traverse a tree */

void gbsinorder_traverse_tree (gbstree t, void (*visit)(void *)) {
	/* visit is a function accepting a pointer, doing
	 * something (we don't care what) to the data it points to
	 */

	if (!t) return;
	gbsinorder_traverse_tree (t->left, visit);
	visit (t->k);
	gbsinorder_traverse_tree (t->right, visit);
}

void gbspostorder_traverse_tree (gbstree t, void (*visit)(void *)) {
	/* visit is a function accepting a pointer, doing
	 * something (we don't care what) to the data it points to
	 */

	if (!t) return;
	gbspostorder_traverse_tree (t->left, visit);
	gbspostorder_traverse_tree (t->right, visit);
	visit (t->k);
}

/* the 'visit' function for a tree of strings.  just prints out the string */

void print_string (char *s) {
	fputs (s, stdout);
}

/* strcmp(), already in the standard C library, serves for compar */

/* this program reads in strings, then prints them out in sorted order */
	
int main () {
	char	s[100], *p;
	gbstree	t;

	/* empty tree */

	t = NULL;

	/* loop until end of file */

	for (;;) {

		/* get a string */

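		/* fgets returns NULL at end of file (or on error);
		 * testing its result keeps a last line that has no
		 * trailing newline
		 */
		if (!fgets (s, 100, stdin)) break;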

		/* strdup() uses malloc to duplicate the string,
		 * so it will have its own storage in the tree
		 */
		p = strdup (s);

		/* insert, casting strcmp atrociously */

		insert_gbstree (&t, p, (int (*)(void*,void*))strcmp);
	}

	/* traverse tree, casting print_string */

	gbsinorder_traverse_tree (t, (void(*)(void*))print_string);

	/* traverse in postorder, freeing each string (visit gets t->k,
	 * so the tree nodes themselves are not freed here)
	 */

	gbspostorder_traverse_tree (t, (void(*)(void*))free);
	exit (0);
}
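
The casts in main work on most machines, but calling a function through a pointer of a different function type is, strictly speaking, undefined behavior in standard C. A tidier alternative, sketched here, is to write small wrappers whose signatures match exactly what insert_gbstree and the traversals expect, so no cast is needed. The names compare_strings, visit_print_string, and compare_ints are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* a compar for trees of strings: same signature as the compar
 * parameter, so it can be passed without a cast
 */
int compare_strings (void *a, void *b) {
	assert (a != NULL && b != NULL);
	return strcmp ((char *) a, (char *) b);
}

/* a visit function for trees of strings */
void visit_print_string (void *s) {
	fputs ((char *) s, stdout);
}

/* a compar for trees whose keys point at ints; comparing
 * instead of subtracting avoids overflow on extreme values
 */
int compare_ints (void *a, void *b) {
	int x = *(int *) a, y = *(int *) b;

	return (x > y) - (x < y);
}
```

With these, the calls in main could read insert_gbstree (&t, p, compare_strings) and gbsinorder_traverse_tree (t, visit_print_string), and the same tree code could sort integers by passing compare_ints instead.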