I have developed an ASP.NET HttpHandler in C# that produces a Google sitemap. Originally I wrote it as a feature request for Subtext, but later realized it’s completely generic and reusable.
So I’m also publishing its code here: Download .NET 2.0 source code.
If you don’t know what a sitemap is, here is a quote from Google's site:
The Sitemap Protocol allows you to inform search engines about URLs on your websites that are available for crawling. In its simplest form, a Sitemap that uses the Sitemap Protocol is an XML file that lists URLs for a site. The protocol was written to be highly scalable so it can accommodate sites of any size. It also enables webmasters to include additional information about each URL (when it was last updated; how often it changes; how important it is in relation to other URLs in the site) so that search engines can more intelligently crawl the site.
Here is a sample sitemap for a taste. If you want to know more, be sure to read the protocol specification.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
If you want to see the real thing, you can take a look at my blog's sitemap: http://www.vidmar.net/weblog/sitemap.ashx
The whole thing is implemented as a custom HttpHandler. Let’s register the custom handler first so we don’t forget it. Put something like this in the HandlerConfiguration section of your web.config:
<HttpHandler pattern="Sitemap.ashx" type="Subtext.Web.SiteMap.SiteMapHttpHandler, Subtext.Web" handlerType="Direct"/>
I could have used XmlWriter, or even plain WriteLine() calls, to generate the XML (some other implementations you’ll find on the net take that approach), but I decided to do it "the right way". I represented the basic Url structure as a class and the collection as a List<>, then decorated both with XML serialization attributes. The attributes map the protocol's lowercase element names onto properly named classes and members, so the code doesn't have to violate good naming practices. Here is a sample, the “lastmod” field:
// Backing field for the property below.
private DateTime lastModified;
[XmlElementAttribute(ElementName = "lastmod", DataType="date")]
public DateTime LastModified
{
get { return lastModified; }
set { lastModified = value; }
}
When needed, I serialize the collection with XmlSerializer and get just the right output:
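// Serve the response as XML and serialize the collection straight into the response stream.
context.Response.ContentType = "text/xml";
XmlSerializer serializer = new XmlSerializer(typeof(UrlCollection));
XmlTextWriter xmlTextWriter = new XmlTextWriter(context.Response.OutputStream, Encoding.UTF8);
serializer.Serialize(xmlTextWriter, urlCollection);
xmlTextWriter.Flush(); // push any buffered output to the client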
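To make the attribute approach concrete, here is a minimal, self-contained sketch of how such a model could look. It is not the code from the download: the names SitemapUrlSet, SitemapUrl and SitemapSketch are hypothetical, the list is wrapped in a root class rather than derived from List<>, and it is written for a current C# compiler rather than .NET 2.0. Only the element names and the 0.84 namespace are taken from the sample sitemap above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical names throughout; the real classes in the download may differ.
public enum ChangeFrequency
{
    [XmlEnum("always")] Always,
    [XmlEnum("hourly")] Hourly,
    [XmlEnum("daily")] Daily,
    [XmlEnum("weekly")] Weekly,
    [XmlEnum("monthly")] Monthly,
    [XmlEnum("yearly")] Yearly,
    [XmlEnum("never")] Never
}

public class SitemapUrl
{
    // XmlSerializer requires a public parameterless constructor.
    public SitemapUrl() { }

    public SitemapUrl(string location, DateTime lastModified,
                      ChangeFrequency changeFrequency, decimal priority)
    {
        Location = location;
        LastModified = lastModified;
        ChangeFrequency = changeFrequency;
        Priority = priority;
    }

    // .NET-style property names in code, protocol element names in the XML.
    [XmlElement("loc")]
    public string Location { get; set; }

    [XmlElement("lastmod", DataType = "date")]
    public DateTime LastModified { get; set; }

    [XmlElement("changefreq")]
    public ChangeFrequency ChangeFrequency { get; set; }

    [XmlElement("priority")]
    public decimal Priority { get; set; }
}

[XmlRoot("urlset", Namespace = "http://www.google.com/schemas/sitemap/0.84")]
public class SitemapUrlSet
{
    // XmlElement on a list emits one <url> element per item, with no wrapper element.
    [XmlElement("url")]
    public List<SitemapUrl> Urls { get; set; } = new List<SitemapUrl>();
}

public static class SitemapSketch
{
    // Serializes the set to any stream, e.g. the response stream inside a handler.
    public static void Write(SitemapUrlSet urlSet, Stream output)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
        settings.Indent = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            new XmlSerializer(typeof(SitemapUrlSet)).Serialize(writer, urlSet);
        }
    }

    // Convenience wrapper for trying the sketch outside a web request.
    public static string ToXml(SitemapUrlSet urlSet)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            Write(urlSet, buffer);
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}
```

Calling SitemapSketch.ToXml on a set holding one entry for http://www.example.com/ should give a urlset document with loc, lastmod, changefreq and priority children like the sample sitemap. XmlSerializer typically also adds xsi and xsd namespace declarations to the root element.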
Using the class is really simple. Create an instance, fill UrlList with the pages you want to report to Google, figure out your priorities and last-changed dates, and you're set: submit your sitemap to Google and start enjoying better indexation! (Is that even a word?!)
UrlCollection urlCollection = new UrlCollection();
// Let's add home page
Url homePage = new Url(Config.CurrentBlog.HomeFullyQualifiedUrl, DateTime.Now, ChangeFrequency.Daily, 1.0M);
urlCollection.Add(homePage);
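Figuring out priorities and change frequencies is left to each site. As one possible starting point, the sketch below derives both from a page's age. The class SitemapHints, its method names, and every threshold are hypothetical and chosen only for illustration. The ChangeFrequency enum is redeclared so the snippet compiles on its own, and it lists only the values used here (the protocol also allows always, hourly and never).

```csharp
using System;

// Redeclared so the snippet compiles on its own; only the values used below are listed.
public enum ChangeFrequency { Daily, Weekly, Monthly, Yearly }

// Hypothetical helper: every threshold below is an arbitrary illustration.
public static class SitemapHints
{
    // Recently edited pages are assumed to keep changing often.
    public static ChangeFrequency GuessFrequency(DateTime lastModified, DateTime now)
    {
        double ageInDays = (now - lastModified).TotalDays;
        if (ageInDays < 7) return ChangeFrequency.Daily;
        if (ageInDays < 30) return ChangeFrequency.Weekly;
        if (ageInDays < 365) return ChangeFrequency.Monthly;
        return ChangeFrequency.Yearly;
    }

    // The protocol expects a priority between 0.0 and 1.0; newer pages rank higher here.
    public static decimal GuessPriority(DateTime lastModified, DateTime now)
    {
        double ageInDays = (now - lastModified).TotalDays;
        if (ageInDays < 30) return 0.8M;
        if (ageInDays < 365) return 0.5M;
        return 0.3M;
    }
}
```

A post could then be added with something like new Url(postUrl, lastModified, SitemapHints.GuessFrequency(lastModified, DateTime.Now), SitemapHints.GuessPriority(lastModified, DateTime.Now)), where postUrl and lastModified stand in for your own data, while the home page keeps its 1.0M as in the code above.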
And that’s it. I wanted to implement it on si.blogs tonight, but I realized that site still runs on .NET 1.1. Bummer. Well, some other winter night, I guess.