June 19, 2004

What's up with MSN Search?

Posted at June 19, 2004 05:41 AM

Two of the most oft asked questions these days in the Webmaster community are "What's up with all of this spidering by MSN's bot?" followed quickly by "When will they start using their own data?"

The first answer is easy. Those MSNbot entries you see in your log files are Microsoft's prototype spider gathering information for their database of web sites. They need it both to build the database itself, since they're starting from scratch, and to have something they can use for internal testing.

MSNbot certainly appears to be a voracious little critter, snapping up everything it can find on a site, especially the first few times through. There have been numerous reports over the last few months of MSNbot crawling like a radioactive spider, jumping from page to page quickly and eating up a lot of bandwidth.

Some webmasters have complained about this, mainly because it's putting them over their bandwidth allowance. There have even been a few reports of MSNbot hitting a site so hard that it's slowing down the server. Most of that IMHO is because those sites are on crappy servers. ;-)

MSN has given you a way to better control how MSNbot behaves, short of simply blocking it from your site altogether. Basically, they've appropriated a non-standard robots.txt directive that you can use to slow the bot down. Usage is as follows in case you need it:

User-Agent: msnbot
Crawl-Delay: 20

The above instructs MSNbot to pause for 20 seconds between pages. So rather than grabbing several pages in 20 seconds' time, it will retrieve only one. You can of course use any numerical value in the Crawl-Delay entry.
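If you want to double-check how a robots.txt like the one above gets read, here is a minimal sketch using Python's standard-library urllib.robotparser (its crawl_delay method needs Python 3.6 or later). The helper name crawl_delay_for and the ROBOTS_TXT constant are my own, made up for illustration. This only shows how one well-behaved parser interprets the directive; it says nothing about how MSNbot handles it internally.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, held in a string for illustration.
ROBOTS_TXT = """\
User-Agent: msnbot
Crawl-Delay: 20
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-Delay (in seconds) that applies to user_agent, or None.

    Hypothetical helper: it just wraps the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    delay = crawl_delay_for(ROBOTS_TXT, "msnbot")
    print("msnbot delay in seconds:", delay)
    if delay:
        # One page per `delay` seconds puts a ceiling on daily fetches.
        print("max pages per day:", 86400 // delay)
    # A bot with no matching entry gets no delay at all.
    print("googlebot delay:", crawl_delay_for(ROBOTS_TXT, "googlebot"))
```

At 20 seconds per page, a bot that honors the directive tops out at 4,320 pages a day, so pick a value that still lets it get through your whole site in a reasonable amount of time.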

Personally, I've had no problems with MSNbot's crawl rate. Even with my dynamic sites it's not been putting a strain on my server. For a few weeks there it did grab an AWFUL LOT of information, but bandwidth is cheap these days so that's not an issue for me. Now that it has a good idea of what my sites are all about and how they're structured, things seem to be pretty normal. Still a little above what the other search engines are grabbing, but not out of line by any stretch of the imagination.

Last week's bandwidth stats from one site are:

MSNbot: 11.58 MB
Googlebot: 5.75 MB
Yahoo! Slurp: 4.05 MB
Teoma/Ask Jeeves: 1.09 MB

I can certainly live with that, especially considering this particular site uses over 700 megs of disk space. Plus, I'd be much more worried if MSNbot weren't spidering my sites completely! From the looks of it, they're certainly trying to make sure they have a large database when they start rolling out their own search technology.

Next comes the question of when that will happen.

Speculation has run rampant, even prior to MSNbot's first appearance earlier this year. Since its arrival on the scene, the rumors have reached an almost fever pitch.

Early on there were hints that it would be coming in 2005. The most recent official mention of a time frame I've seen is a post by MSNdude (a Microsoft employee) over at Webmaster World that simply stated "later this year."

If I were a betting man I would say it's going to be sooner rather than later, assuming all of the technology parts and pieces are in place. It's a many-pronged problem.

First, the database has to be in place. Microsoft has obviously spent the last few months scouring the web to build this database. With the varied reports of how active MSNbot has been, I have to believe that they're reasonably comfortable with that portion of the equation, but who knows. Maybe they want to make a really big splash by being able to say that they have twice as many documents in their index as Mighty Google.

The second part of the equation is the actual search technology: the algorithm, and the speed with which search results are returned. Can it stand up to the massive usage it will likely see from Day 1, both from normal users and from search engine marketers who are trying to get a handle on what MSN Search needs for a page to rank highly?

To my knowledge no one outside of Redmond has seen this side of things yet. However, it would not surprise me one bit if they're very, very close to having the search side of things ready to roll out. You have to remember that Microsoft has already been working on this for quite a while, not only for MSN Search but also for their Longhorn project, the next iteration of Windows.

They've got skilled programmers and they certainly have the funding to hire others if needed. And there are other factors in the mix that may spur a sooner-than-originally-predicted release.

Google's pending IPO. Yahoo's recent rollout of their own search technology. The AOL contract that should be expiring soon.

My thought is that if there is any way Microsoft can get MSN Search ready for prime time early, they'll do it. This isn't like rolling out a new operating system. Not only is it considerably less complex from the programming side of things, but there is also competition.

In the O/S market there is really no pressure for them to rush production because there is basically no competition. The same is not true in Search. There is plenty of competition, and the competitors are hungry to beat MS to the punch. If you don't believe that one, read Google's IPO filing, where they specifically mention Microsoft and the search tool they're building, and how this could have a detrimental effect on Google's business and corporate valuation.

Microsoft is coming into the search race in third place, as opposed to being the unchallenged champion they are in the O/S market. Being late to the game, MS certainly doesn't want to let everyone else steal all of the thunder for the rest of this year.

So while I don't have a specific date in mind, my gut tells me it'll be sooner rather than later.
