Google searches on wiki act dumb

Discussion about the site's wikis, including bugs/issues encountered.
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

I've noticed that Google searches on the Wiki using site:wiki.nesdev.com often act dumb. For example, when I search for MMC3, none of the 21 hits is the main MMC3 page, even though it has MMC3 in the title. Using intitle:MMC3 gives zero hits (even if I add MMC3 again as a normal search string).

Is there something telling Google to skip documents? Until at least a week or two ago, when I searched for things on nesdevwiki, Google returned lots of hits to the now-nonexistent site. Maybe that has something to do with it, like Google thinks the new site is a spam mirror or something, I dunno. I see no robots.txt, so a bad entry in one can't be the cause.
Banshaku
Posts: 2417
Joined: Tue Jun 24, 2008 8:38 pm
Location: Japan

Post by Banshaku »

I don't know enough about MediaWiki to give an answer, but could it be related to the recent essay spam links we received? Since those must be common spam links, they could affect how Google treats the site.

Maybe there is a way to check on Google how nesdev is affected by this, but I don't know about that either. I'll see if I can take a look.
tepples
Posts: 22994
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

The robots.txt is a 404, and there don't appear to be any robots directives in meta elements. Nor are internal links using nofollow.
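For anyone who wants to repeat the page-level part of that check, here is a minimal sketch using only Python's standard library. It scans one page's HTML for robots meta directives and for links marked rel="nofollow". The class and function names are made up for illustration, and the sketch does not cover the robots.txt lookup or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collect robots meta directives and rel="nofollow" links from one page."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []  # e.g. ["noindex", "nofollow"]
        self.nofollow_links = []   # href values of links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            content = attrs.get("content") or ""
            self.meta_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a":
            # rel is a space-separated list of link types.
            rel = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel:
                self.nofollow_links.append(attrs.get("href"))


def robots_hints(html_text):
    """Return (meta_directives, nofollow_links) found in html_text."""
    parser = RobotsHintParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta_directives, parser.nofollow_links
```

Two empty lists for a page's HTML would match the observation above for that page.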
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

I just noticed that Google is also getting hits within the skins/ directory on the Wiki. I'm thinking that should have an empty index.html in it. Otherwise one gets useless hits, even when restricting via a site: as mentioned in an earlier message.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

blargg wrote: I just noticed that Google is also getting hits within the skins/ directory on the Wiki. I'm thinking that should have an empty index.html in it. Otherwise one gets useless hits, even when restricting via a site: as mentioned in an earlier message.
This is because of the secure MPM we use for Apache. A more appropriate fix would be to place an .htaccess file in /home/ndwiki/www/w which contains:

Options -Indexes
...which disables automatic generation of directory listings for any directory therein that lacks an index.php/index.html/etc. document. The end result is the web client receiving an HTTP 403 Forbidden.

I've put said .htaccess in place; verified as working. It may take a few weeks before Google picks up the changes, as their crawler sometimes takes a while to notice such things.
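For reference, a check like this can be scripted. The sketch below uses Python's standard urllib to fetch a directory URL and report whether the server answers with 403 Forbidden. The function names and the example URL in the comment are hypothetical; the thread does not state which public URL maps to the skins/ directory.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is
        # still the value we want to inspect.
        return err.code


def listing_blocked(status):
    """True when the server refuses to list the directory (403 Forbidden)."""
    return status == 403


# Example call (hypothetical URL, substitute the real directory URL):
# listing_blocked(fetch_status("http://wiki.example.org/w/skins/"))
```

A 200 response with an auto-generated index page would suggest the Options -Indexes directive is not in effect for that directory.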