Earth Hour

Friday, November 27, 2009

Using BFS to solve the minimum number moves problems

Often in various programming contests we find problems related to calculating minimum moves for some toy problems like "8-puzzle problem", where the step cost to move a block from 1 location to other is constant for every move (or analogous). Generally such problems can be tackled by BFS (it means Breadth First Search, for my friends who didn't know earlier) technique. Well, the concept is pretty simple for BFS. it refers to traversing a tree level wise, i.e. first level 0, then level 1, then 2, 3...
/ \
2 3
/ \ / \
4 5 6 7
For a tree like the above, a BFS traversal guarantees the traversal order: 1->2->3->4->5->6->7
Though in a linked implementation of a tree, the BFS traversal is a bit trickier but for an array representation this is nothing but a straight traversal of the array sequentially.
Since in most of the programming problems, generally the tree is not known at the beginning, therefore the limit of the total number of nodes is uncertain. But, theres nothing to worry much, for the ArrayList in Java and vector classe in C++ are the way out. Well a generalized BFS traversal may look something like this, when the code to remove the repeated states is added to the main BFS logic:
(Note: its just a pseudo code)

fringe= a queue
closed= a set containing visited nodes
while( ! fringe.empty())
node n=fringe.pop();
return solution(n);
//add all children of n to fringe

Based on this below is my C++ implementation for finding out the minimum move sequence to solve the 8-puzzle problem. In this problem given a 3*3 matrix with numbers 1..8 filled in random 8 places among 9 in the matrix with 1 block empty. Our goal is to slide a block at a time in the empty place and finally arrange the puzzle back to sorted order, just like this
1 2 3
4 5 6
7 8 0
(Note: 0 indicates the empty block)

Here is my complete C++ implementation

Sunday, November 8, 2009

An Effective way of removing duplicates in a file

I remember just a few days back I was asked to design a program to remove duplicates from a file. Well the problem seems quite straight forward and simple to approach, but the real hidden question here lies in, how effective can anyone come up with an algorithm to solve this problem. Initially thinking on the tracks of using hashtables, maps, sets etc. I finally landed up designing a custom 'tree' implementation for getting almost the optimal solution which would remove duplicates from a file in a single iteration! (or in O(n) time, for my geek friends! )

Well while surfing internet, I later found that the tree concept I used to solve this problem is pretty much similar to a standard implementation called 'Suffix tree'. Here I am not going to explain you about suffix trees, you can find plenty of matter on this once you google it, rather, I would be explaining how my solution to this problem worked. Well, I don't claim it to be the most optimal implementation, I am sure, someone out there may come up with a solution better than mine. But this one solves the purpose just fine.

The logic here is to keep building the custom tree (which now we know is similar to suffix tree) as we go through the text. Here is the basic algorithm:
1) Iterate through the text and do the following steps
1.1) As the beginning of a new word is encountered, start matching it starting from the root node to its children, successively. along with buffering the current word as we proceed.
1.2) If a match is found, try to match the next character in the word with its children
1.3) If a match is not found, add another node representing the current character as a child to the current node.
1.4) When the word ends, and the last character is matched, check its 'final' tag.
1.4.1) if its true, it means the current word is the repeated one, so don't output the current buffer word
1.4.2) if its false, it means the current word is encountered for the first time, so output the current buffer and flush the buffer.

Eg. let the text be "CAT CAN BAT". The tree for this sentence will be -

/ \
/ \ \
T* N* T*

Note: the (*) mark represents the final state.

Here goes my C/C++ implementation