love to code .net

blogging mostly about one of my favorite subjects, programming

BlogEngine.NET Post List User Control Creates Blank Pages

Today I was looking at Google's Webmaster Tools for my blog. I noticed there are 11 URLs in my sitemap and only 6 are indexed. Wondering why, I started looking at what had been indexed. Google's index of my site contains 189 pages, which seemed like a lot considering that, counting this one, I've written only 10 posts. When I looked through the links, I noticed some huge page numbers.

Here's a screenshot of my posts tagged Windows Live Writer.

[Screenshot: BadCrawlPage]

Notice that it's page 246. When I looked at Google's cached copy of that page, there were no back or forward links, so I found a lower-numbered page of posts tagged Windows Live Writer.

[Screenshot: BadCrawlPage2]

That one did have the previous and next page links, but absolutely no post content on the page, as you can see here.

[Screenshot: GoogleCachePage12]

Fixing Paging

Here is the BindPosts method of the PostList user control (found in ~/User Controls/PostList.ascx.cs). It uses the page number to decide which posts to show and whether to show the next and previous page links.

1: /// <summary>
2: /// Binds the list of posts to individual postview.ascx controls
3: /// from the current theme.
4: /// </summary>
5: private void BindPosts()
6: {
7:     if (Posts == null || Posts.Count == 0)
8:     {
9:         hlPrev.Visible = false;
10:         return;
11:     }
12:
13:     List<IPublishable> visiblePosts = Posts.FindAll(delegate(IPublishable p) { return p.IsVisible; });
14:
15:     int count = Math.Min(BlogSettings.Instance.PostsPerPage, visiblePosts.Count);
16:     int page = GetPageIndex();
17:     int index = page * count;
18:     int stop = count;
19:     if (index + count > visiblePosts.Count)
20:         stop = visiblePosts.Count - index;
21:
22:     if (stop < 0 || stop + index > visiblePosts.Count)
23:     {
24:         hlPrev.Visible = false;
25:         hlNext.Visible = false;
26:         return;
27:     }
28:
29:     string query = Request.QueryString["theme"];
30:     string theme = !string.IsNullOrEmpty(query) ? query : BlogSettings.Instance.Theme;
31:     string path = Utils.RelativeWebRoot + "themes/" + theme + "/PostView.ascx";
32:     int counter = 0;
33:
34:     foreach (Post post in visiblePosts.GetRange(index, stop))
35:     {
36:         if (counter == stop)
37:             break;
38:
39:         PostViewBase postView = (PostViewBase)LoadControl(path);
40:         postView.ShowExcerpt = BlogSettings.Instance.ShowDescriptionInPostList;
41:         postView.Post = post;
42:         postView.ID = post.Id.ToString().Replace("-", string.Empty);
43:         postView.Location = ServingLocation.PostList;
44:         posts.Controls.Add(postView);
45:         counter++;
46:     }
47:
48:     if (index + stop == Posts.Count)
49:         hlPrev.Visible = false;
50: }

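Line 48 compares against Posts.Count, which also includes posts that aren't visible. Whenever a list contains hidden posts, index + stop can never reach that total, so the previous link keeps pointing at pages past the last visible post. Changing line 48 to compare against the visible posts prevents the previous link from being displayed when there are no more visible posts remaining.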

    if (index + stop >= visiblePosts.Count)
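The paging arithmetic is easier to see outside ASP.NET. Below is a minimal sketch that pulls the math from BindPosts into pure functions. The names PostPaging, Slice and OriginalShowPrev are hypothetical. The sketch assumes a zero-based page index, which is what the `page * count` arithmetic in BindPosts implies.

```csharp
using System;

// Illustrative sketch only: the paging arithmetic from BindPosts as pure
// functions. PostPaging, Slice and OriginalShowPrev are hypothetical names.
public static class PostPaging
{
    // Mirrors lines 15 to 20 (which visible posts a zero-based page shows),
    // plus the corrected line 48 deciding whether hlPrev should be visible.
    public static (int Start, int Take, bool ShowPrev) Slice(
        int pageIndex, int postsPerPage, int visibleCount)
    {
        int count = Math.Min(postsPerPage, visibleCount);
        int start = pageIndex * count;
        // Clamped at zero; an out-of-range page simply gets no posts.
        int take = Math.Max(0, Math.Min(count, visibleCount - start));
        // Fixed check: only visible posts are counted.
        bool showPrev = start + take < visibleCount;
        return (start, take, showPrev);
    }

    // The original line 48: hlPrev stays visible unless start + take equals
    // the count of ALL posts, hidden ones included.
    public static bool OriginalShowPrev(int start, int take, int allPostsCount)
    {
        return start + take != allPostsCount;
    }
}
```

Take 12 visible posts, 3 hidden ones and 10 posts per page. Page index 1 shows posts 10 and 11, and the fixed check hides the link. The original check (10 + 2 != 15) keeps it visible. A list whose posts are all hidden is worse. There `count` is 0, so every page index starts at 0, and the original check keeps offering another empty page. That is one way a list can grow an endless run of blank pages.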

Fixing Google

Now that the paging in the post list control is fixed, it would be nice to start cleaning up Google's cache the next time it crawls the site. Replacing lines 22 through 27 with the following redirects requests for bad page numbers to the error404.aspx page.

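    // index is the offset of the first post on the requested page. A negative
    // offset, or one at or past the end of the visible posts, can't be a real page.
    if (index < 0 || index >= visiblePosts.Count)
    {
        Response.Redirect(Utils.RelativeWebRoot + "error404.aspx", true);
    }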
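The same rule can also be checked before any paging math runs. This is a hypothetical sketch: GetPageIndex isn't shown in this post, so PageRequest.TryGetOffset simply assumes two things. The `page` query-string value is a 1-based page number, and a missing value means the first page. It returns false for anything that can't be a real page, which is the case the redirect above handles.

```csharp
using System;

// Hypothetical helper, not part of BlogEngine.NET: validates a raw "page"
// query-string value before any posts are bound.
public static class PageRequest
{
    // Assumes 1-based page numbers in the URL and that no value means page 1.
    // On success, offset is the zero-based index of the page's first post.
    public static bool TryGetOffset(string rawPage, int postsPerPage, int visibleCount, out int offset)
    {
        offset = 0;
        int pageNumber = 1;
        if (!string.IsNullOrEmpty(rawPage) && !int.TryParse(rawPage, out pageNumber))
            return false; // not a number at all

        if (pageNumber < 1 || postsPerPage < 1)
            return false;

        // long arithmetic so a huge page number can't overflow into a valid offset
        long start = (long)(pageNumber - 1) * postsPerPage;
        if (start >= visibleCount)
            return false; // same rule as index >= visiblePosts.Count

        offset = (int)start;
        return true;
    }
}
```

With 12 visible posts and 10 per page, pages 1 and 2 are accepted and page 3, page 246, page 0 and non-numeric values are all rejected.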

Now anyone who browses to one of the bad page numbers is redirected to the 404 error page. Unfortunately, this isn't enough to get Google to remove the page from its cache, although it is a friendly error for our human readers. Response.Redirect answers with a status of 302 (Found), and the error404.aspx page it redirects to returns a status of 200 (OK). To a crawler, the bad URL still looks like a working page. So we need to update the error404.aspx page to return a status of 404 (Not Found). That tells Google the page doesn't exist, which will eventually cause the page to be removed from its cache. Here is the Page_Load method in the error404.aspx.cs file.

1: protected void Page_Load(object sender, EventArgs e)
2: {
3:   if (Request.QueryString["aspxerrorpath"] != null && Request.QueryString["aspxerrorpath"].Contains("/post/"))
4:   {
5:     DirectHitSearch();
6:     divDirectHit.Visible = true;
7:   }
8:   else if (Request.UrlReferrer == null)
9:   {
10:     divDirectHit.Visible = true;
11:   }
12:   else if (Request.UrlReferrer.Host == Request.Url.Host)
13:   {
14:     divInternalReferrer.Visible = true;
15:   }
16:   else if (GetSearchKey() != string.Empty)
17:   {
18:     SearchTerm = GetSearchTerm(GetSearchKey());
19:     BindSearchResult();
20:     divSearchEngine.Visible = true;
21:   }
22:   else if (Request.UrlReferrer != null)
23:   {
24:     divExternalReferrer.Visible = true;
25:   }
26:
27:   Page.Title += Server.HtmlEncode(" - " + "Page not found");
28: }

Inserting the following as the new line 28, just before the method's closing brace, makes the page return a status code of 404 (Not Found). This tells Google and other search engines that the page does not exist.

  Page.Response.StatusCode = 404;
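To confirm what a crawler now sees, you can request one of the bad URLs without following redirects and note each status code. This is a minimal sketch, not part of BlogEngine.NET, and StatusTrace.FollowAsync is a hypothetical helper. It takes the request function as a parameter, so it can run against a real HttpClient created with AllowAutoRedirect = false, or against canned responses. With only the redirect in place you would expect a 302 followed by a 200. With the status code change as well, you would expect a 302 followed by a 404.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical diagnostic helper: follows redirects by hand and records the
// status code of every hop, so a "soft 404" (a redirect to a page that
// answers 200 OK) is easy to tell apart from a real 404.
public static class StatusTrace
{
    public static async Task<List<(Uri Url, HttpStatusCode Status)>> FollowAsync(
        Func<Uri, Task<HttpResponseMessage>> send, Uri start, int maxHops = 5)
    {
        var hops = new List<(Uri Url, HttpStatusCode Status)>();
        Uri current = start;
        for (int i = 0; i < maxHops; i++)
        {
            using (HttpResponseMessage response = await send(current))
            {
                hops.Add((current, response.StatusCode));
                int code = (int)response.StatusCode;
                Uri location = response.Headers.Location;
                if (code < 300 || code > 399 || location == null)
                    break; // not a redirect, so this is the final status

                // Location may be relative; resolve it against the current URL.
                current = new Uri(current, location);
            }
        }
        return hops;
    }
}

// Example use against a live site (the URL is a placeholder):
//   var client = new HttpClient(new HttpClientHandler { AllowAutoRedirect = false });
//   var hops = await StatusTrace.FollowAsync(u => client.GetAsync(u),
//       new Uri("http://example.com/?page=246"));
//   foreach (var hop in hops) Console.WriteLine($"{(int)hop.Status} {hop.Url}");
```

The maxHops limit keeps a redirect loop from running forever.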

License and Code

The code in this post is released under the Microsoft Reciprocal License, which is the license for BlogEngine.NET.

