DWITE Online Computer Programming Contest
January 2008
Problem 3
Don't follow my links

There's a lot of spam on the internet: blog comments, forum posts, and so on, all posted to plant enough links to convince search engines such as Google that a certain page is more important than it really is. One solution is to mark untrusted links with the rel="nofollow" attribute, which tells search engine spiders to ignore the link. A sample link might look like:

<a href="http://compsci.ca/" title="Computer Science Canada" rel="nofollow">sample link</a>
		

The goal is to write a program that finds all the links in a text file and inserts nofollow properly. If a link has no rel attribute, rel="nofollow" should be added as its last attribute. If rel already exists, nofollow should be appended as the last value in the rel string, unless it is already present. A rel attribute may hold several values, separated by spaces. Refer to the sample input and output for examples.
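The rules above can be sketched in Python as a function that rewrites only the opening tag of a link. This is a minimal sketch, not an official solution: the name `add_nofollow` is hypothetical, and it assumes every attribute value is double-quoted, as in the samples.

```python
import re

# One double-quoted attribute, name="value". Assumption: every attribute is
# double-quoted, as in the sample input. finditer scans left to right and
# consumes each quoted value whole, so text like rel= inside another value is skipped.
ATTR = re.compile(r'([\w:-]+)="([^"]*)"')


def add_nofollow(open_tag: str) -> str:
    """Apply the nofollow rules to an opening tag such as '<a href="...">'."""
    for m in ATTR.finditer(open_tag):
        if m.group(1).lower() != "rel":
            continue
        value = m.group(2)
        # rel values are space-separated; leave the tag alone if nofollow is
        # already one of them (compared case-insensitively here).
        if "nofollow" in value.lower().split():
            return open_tag
        # Append nofollow as the last value, with no leading space if rel is empty.
        kept = value.rstrip()
        new_value = f"{kept} nofollow" if kept else "nofollow"
        return f'{open_tag[:m.start()]}rel="{new_value}"{open_tag[m.end():]}'
    # No rel attribute yet, so add it as the last attribute, just before '>'.
    return open_tag[:-1].rstrip() + ' rel="nofollow">'
```

Scanning attributes in order with `finditer`, rather than searching for the text `rel=` directly, avoids a false match inside another attribute's value, such as `title="x rel="`.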

The input file DATA3.txt will contain five lines of text, each containing one link of the form <a*>*</a>, where * stands for any text. Links might be surrounded by filler text. Each line will be no more than 255 characters long.
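Pulling the link out of its filler text can be done with a single regular-expression search. The sketch below is one possible approach, assuming no attribute value contains a `>` character; the helper name `extract_link` is hypothetical. It returns the opening tag separately, since that is the only part of the link that needs rewriting.

```python
import re

# An opening <a ...> tag followed by the shortest run of text up to the first
# </a>. Assumption: no attribute value contains a '>' character.
# The \b stops the pattern from matching tags such as <abbr>.
LINK = re.compile(r"(<a\b[^>]*>)(.*?</a>)", re.IGNORECASE)


def extract_link(line: str) -> tuple[str, str]:
    """Split the link out of one input line as (opening tag, text + closing tag)."""
    m = LINK.search(line)
    if m is None:
        raise ValueError(f"no link found in {line!r}")
    return m.group(1), m.group(2)
```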

The output file OUT3.txt will contain five lines, each holding only the processed link, with any filler text removed.

Sample Input:
This is a <a>sample link</a>.
<a rel="" href="http://dwite.org/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external">more rels</a>
text <a href="http://compsci.ca/v3/viewforum.php?f=131" title="">link</a> more text
		        
Sample Output:
<a rel="nofollow">sample link</a>
<a rel="nofollow" href="http://dwite.org/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external nofollow">more rels</a>
<a href="http://compsci.ca/v3/viewforum.php?f=131" title="" rel="nofollow">link</a>
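A complete program might combine link extraction and rel rewriting as below. It is a hedged sketch in Python rather than a reference solution: the function names `add_nofollow`, `process_line`, and `main` are hypothetical, and it assumes double-quoted attributes with no `>` inside attribute values.

```python
import re

# Assumptions, as in the samples: attributes are double-quoted and no attribute
# value contains '>'. Function names here are illustrative only.
ATTR = re.compile(r'([\w:-]+)="([^"]*)"')
LINK = re.compile(r"(<a\b[^>]*>)(.*?</a>)", re.IGNORECASE)


def add_nofollow(open_tag: str) -> str:
    """Add nofollow to the rel attribute of an opening <a ...> tag."""
    for m in ATTR.finditer(open_tag):
        if m.group(1).lower() == "rel":
            value = m.group(2)
            if "nofollow" in value.lower().split():
                return open_tag  # already marked: leave unchanged
            kept = value.rstrip()
            new_value = f"{kept} nofollow" if kept else "nofollow"
            return f'{open_tag[:m.start()]}rel="{new_value}"{open_tag[m.end():]}'
    return open_tag[:-1].rstrip() + ' rel="nofollow">'  # no rel: add it last


def process_line(line: str) -> str:
    """Return just the link from one input line, with nofollow applied."""
    m = LINK.search(line)
    if m is None:
        return line.strip()  # should not happen: every line holds one link
    return add_nofollow(m.group(1)) + m.group(2)


def main(in_path: str = "DATA3.txt", out_path: str = "OUT3.txt") -> None:
    """Read the five input lines and write the five processed links."""
    with open(in_path) as fin:
        lines = fin.read().splitlines()[:5]  # the spec guarantees five lines
    with open(out_path, "w") as fout:
        for line in lines:
            fout.write(process_line(line) + "\n")
```

Calling `main()` from a directory that contains DATA3.txt writes OUT3.txt. For the five sample input lines, `process_line` returns the five sample output lines shown above.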