Weeding out dead RSS feeds by processing OPML

RSS feeds are somewhat similar to bookmarks. Unlike bookmarks, they don’t require manual attention (that’s the point), so when you subscribe to dozens (or hundreds) of resources it’s easy to end up with plenty of dead feeds. Depending on the reader you use, dead feeds might lie low and be in no hurry to announce that they are no longer working.

Fortunately feeds are merely links to RSS files that can be processed and checked.

Process outline

Basically you need a tool that checks a list of links, and the list of links itself. The main problem is getting your feed list into a form that the checker accepts.

Exporting to OPML

OPML is an XML format that is often used for storing feed links. Most feed readers can import and export in this format; just look around in the menus or help. The result looks like this:

<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
  <head>
    <title>Newsfeeds exported from Opera Mail/9.52 (Win32)</title>
  </head>
  <body>
    <outline text="www.Rarst.net" type="rss"
             xmlUrl="http://feeds.rarst.net/rarst-posts" title="www.Rarst.net"/>
  </body>
</opml>

Converting to list of links

I am not aware of any dedicated link checkers that take OPML as input. Still, it’s text, and plain text is good. We need a list of URLs, which is simply a text file with one URL per line, and which most link checkers readily take as input.

This can be done manually with a text editor such as Notepad++ or with a bit of scripting. I made a simple AutoIt script.

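; Read the exported OPML file (expected next to the script)
$opml = FileRead("opera.opml")
; Capture the value of every xmlUrl attribute
$pattern = 'xmlUrl="(.*?)"'
; Flag 3 returns an array with all matches
$urls = StringRegExp($opml, $pattern, 3)
$txt = ""
; If nothing matched, $urls is not an array, UBound() returns 0 and the loop is skipped
For $i = 0 To UBound($urls) - 1
	$txt &= $urls[$i] & @CRLF
Next
; Save the list, one URL per line
FileWrite("opml.txt", $txt)
Exit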

Script: https://www.rarst.net/script/opml2list.au3
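If AutoIt is not an option (it is Windows-only), the same conversion can be sketched in Python. This is an illustration rather than a port of the script above: it uses an XML parser instead of a regular expression, so entities such as &amp; inside xmlUrl come out decoded. The function name extract_feed_urls and the SAMPLE data are made up for this example.

```python
import xml.etree.ElementTree as ET


def extract_feed_urls(opml_data):
    """Return the xmlUrl value of every <outline> element, in document order."""
    root = ET.fromstring(opml_data)
    # iter() also walks nested outlines, so feeds grouped into folders are included.
    return [
        node.attrib["xmlUrl"]
        for node in root.iter("outline")
        if "xmlUrl" in node.attrib
    ]


# Sample export, modelled on the Opera Mail file shown earlier.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
  <head>
    <title>Newsfeeds exported from Opera Mail/9.52 (Win32)</title>
  </head>
  <body>
    <outline text="www.Rarst.net" type="rss"
             xmlUrl="http://feeds.rarst.net/rarst-posts" title="www.Rarst.net"/>
  </body>
</opml>
"""

# One URL per line, the format most link checkers accept.
print("\n".join(extract_feed_urls(SAMPLE)))
```

For a real export, read the file in binary mode and pass its contents to the function, then write the joined result to a text file such as opml.txt.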

Checking list

Now you only need a decent link checker. AM-DeadLink will do just fine.
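If you would rather not install a separate checker, a rough check can also be scripted. The Python sketch below is not how AM-DeadLink works; it only requests each URL and reports anything that does not come back with HTTP 200 (redirects are followed automatically). The names check_feed and find_dead_feeds, and the opener parameter that lets you swap in a different fetch function, are assumptions of this example.

```python
import http.client
import urllib.error
import urllib.request


def check_feed(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code, or an "error: ..." string if no response arrived."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (OSError, ValueError, http.client.HTTPException) as err:
        # DNS failure, refused connection, timeout, malformed URL or broken reply.
        return f"error: {err}"


def find_dead_feeds(urls, **kwargs):
    """Return (url, result) pairs for every feed that did not answer with HTTP 200."""
    checked = ((url, check_feed(url, **kwargs)) for url in urls)
    return [(url, result) for url, result in checked if result != 200]
```

To run it against the list produced in the previous step, read opml.txt, split it into lines and pass them to find_dead_feeds. A feed flagged here is only a suspect: some servers are temporarily down or refuse scripted requests, so it is worth opening flagged URLs in a browser before unsubscribing.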

Note for publishers

It’s common sense to set up redirects when you move web content around. Unfortunately, most people forget to do the same for feeds. Don’t rely on the default feed link your CMS provides: you could be on a totally different CMS (or domain) in a year. Use FeedBurner or set up a gateway URL with a redirect from the start.
