UESPWiki talk:Spam Blacklist
The UESPWiki – Your source for The Elder Scrolls since 1995
Just for my own future reference, IP addresses that (so far) have created w/index.php and similar spam pages have been:
- 58.26.9.158
- 61.55.135.167
- 64.34.173.105
- 66.150.105.20
- 66.232.102.40
- Used twice, has been permanently blocked
- 67.52.216.4
- 70.125.70.79
- 72.232.220.58
- Used twice, has been permanently blocked
- 84.122.153.198
- 85.141.160.197
- 87.7.50.126
- 91.124.102.108
- 91.124.105.245
- 91.124.107.107
- 91.124.216.239
- 125.241.82.132
- 193.138.70.54
- 200.57.146.153
- 201.141.59.181
- 202.205.109.57
- 203.26.171.235
- 211.237.184.25
- 213.140.58.187
- 213.183.252.218
- 218.72.250.54
- 219.235.2.72
- 221.185.124.84
- 222.149.135.210
The pages that have been created:
- Tamriel Talk:Books/w/index.php
- 5 times, about to create empty non-editable page
- UESPWiki talk:Community Portal/w/wiki/UESPWiki:Administrator Noticeboard/
- 5 times, about to create empty non-editable page
- User_talk:Hoggwild5/w/index.php
- 2 times
- User_talk:Nephele/Sandbox/w/index.php
- 2 times
- UESPWiki_talk:Community_Portal/w/index.php
- 6 times, blocked with empty page
- Talk:W/index.php
- 6 times, blocked with empty page
- Talk:Main_Page/
- 3 times, blocked with empty page
- User_talk:Booyah_boy/w/index.php
- 1 time (but this is the earliest instance: 27 November)
I scanned through the deletion logs back to November 3, which I'm pretty sure was before this spam vandalism pattern first appeared. (Because the created pages are being deleted, this can't easily be checked by looking at each IP address's edit history.)
If anyone has suggestions for how to deal with these guys, feel free to post them. These are the options that I can think of:
- We could block each IP address as soon as it's been used once for spam, but IP addresses are very rarely reused, so it seems pointless to block an address after the spammer has already used and abandoned it. We're blocking IP addresses after they've been used twice, but there are only two cases of that so far. I created this list just so I can check for re-use again more quickly in future.
- The only repeat offenders so far have been on UESPWiki_talk:Community_Portal/w/wiki/UESPWiki:Administrator_Noticeboard/, both times two days apart.
- We could create warnings on every IP address page the first time it's used for spam so it would be easier to identify repeat offenders. If we want to do this, we should probably set up a template to make it as easy as possible.
- I've created non-editable pages at three of these locations (and I'll do it for two more, too). At most it causes a brief pause in activity, then they pick a few new target pages.
- There have been a couple of intervals where I've added all the advertised websites to the spam blacklist page, but really they don't seem to repeat any more often than the IP addresses do. So I'm starting to feel like adding them all is just wasting the server's time (it has to check through this entire list every time anybody posts any edit).
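Checking the list for re-use is essentially a tally. A minimal sketch of that, assuming the deletion log has been copied by hand into (IP, page) pairs; the function name `repeat_offenders` and the pairing of addresses to pages in the sample are made up for illustration and are not taken from the actual logs:

```python
from collections import Counter

def repeat_offenders(log_entries, threshold=2):
    """Return the IPs that appear at least `threshold` times, sorted as strings."""
    counts = Counter(ip for ip, _page in log_entries)
    return sorted(ip for ip, n in counts.items() if n >= threshold)

# Hypothetical sample: which IP hit which page is invented for this example.
sample_log = [
    ("66.232.102.40", "Talk:W/index.php"),
    ("58.26.9.158", "Talk:Main_Page/"),
    ("66.232.102.40", "Tamriel Talk:Books/w/index.php"),
]

print(repeat_offenders(sample_log))
```

With a threshold of 2 this mirrors the current practice of blocking only after a second use.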
Other random thoughts:
- Wikipedia is evidently having similar problems. They've been creating non-editable pages at frequently targeted pages, too. But I don't see that they've got any great solution to this problem.
- The wiki software is not set up to block the creation of pages matching any particular pattern. At least not the last time I checked.... This vandalism pattern may cause that feature to get added to a future release, though.
- I don't see how the spammers are really benefiting from this activity. I'm not sure if the search engines (Google, Yahoo, etc.) would even download these files when they're first created (our robots.txt file disallows files with /w/ in their name... I need to research whether that just means files that start with /w/ or if it's anywhere in the name). Definitely as soon as someone blanks the page, it's disallowed to the robots (the only way to access the history is via a page name that starts with /w/). Of course, rogue badly-behaved robots/spiders/etc. could in theory grab the files anyway, but all the popular search engines follow the rules. And once they're deleted, they're definitely not available any more.
--Nephele 15:52, 17 February 2007 (EST)
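On the robots.txt question: under the standard rules a Disallow line is matched against the start of the URL path, not anywhere inside it. A small sketch using Python's standard `urllib.robotparser` shows the difference. The rule text below is an assumption about what the site's robots.txt contains, and the `/wiki/` path for ordinary page views is also assumed:

```python
from urllib.robotparser import RobotFileParser

# Assumed rule; the site's actual robots.txt may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url):
    """True if a rule-following robot may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch("*", url)

# History views go through /w/index.php, so the path starts with /w/.
history_url = "http://www.uesp.net/w/index.php?title=Talk:Main_Page&action=history"
# A spam page whose *title* contains /w/, viewed under the assumed /wiki/ path.
spam_page_url = "http://www.uesp.net/wiki/Talk:W/index.php"

print(allowed(history_url))
print(allowed(spam_page_url))
```

If the rule really is a plain `Disallow: /w/`, the history URL is blocked but the spam page itself is not, because its path starts with `/wiki/` rather than `/w/`.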
- The part that confuses me is the whole "index.php" thing. Admittedly, I know next to nothing about PHP, but how is it that using this name for the page makes any sense at all? They're not actually uploading PHP files, just creating pages with names that look like PHP files. Is this some attempt to fool the site into thinking this file is part of it? And does this site even have a "legit" index.php file associated with it? If not, it might be worth using that as a criterion for blocking (although you say that the wiki software cannot block pages matching a pattern like that - hopefully next version will allow this, especially if Wikipedia is having the same problem.) As far as another solution - what happened to the idea we had before of not allowing IP editors or new accounts to create pages? Seems to me that would be a pretty effective way of blocking these. Allow edits to anyone, but you must have an account registered for 3 days before being able to create new pages. (This would also help abate the problem of site-newbies creating pages in the wrong namespace and such, as well as preventing incidents like that vandal who impersonated you several months ago, since they couldn't create a new user page until they'd settled in.) --TheRealLurlock Talk 16:24, 17 February 2007 (EST)
- Yes, the site has an index.php file, which is actually the core file for the entire wiki software. If you look at the URL whenever you edit a page or look at the history or do anything other than a plain page view, you'll see that the URL is http://www.uesp.net/w/index.php (followed by a ? and then whatever parameters are being given to make up the command). And even on a plain page view, behind the scenes it gets converted into a call to index.php. So basically every single wiki request goes through the master index.php file.
- Not that that explains in any way why spammers want to create pages with index.php in their names... maybe they're hoping that someone will mistype the name of a real page and end up at their bogus page?
- Good point though about anonymous IPs not being allowed to create pages. I hadn't even remembered that ;) That has actually been implemented. But the loophole they're exploiting is that anonymous IPs are still allowed to create talk pages (which explains why all of these have been talk pages... I didn't clue into that until now). And I think we should be cautious about any change to that policy... I'd want a pretty serious discussion about the pros and cons beforehand. If you prevent anonymous IPs from creating a talk page then you're forcing them to post any comments they might have on the article page in any case where a talk page hasn't previously been created. So the solution ideally would be to only allow anons to create talk pages when a corresponding article already exists, but from what I remember seeing it hits the same problem as just blocking any page with /w/ in its title: the wiki software doesn't have a way to selectively prevent pages from being created. --Nephele 17:30, 17 February 2007 (EST)
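The "only when a corresponding article already exists" rule can at least be written down as logic, even though the wiki software has no hook for it. A rough sketch; `subject_title` and `anon_may_create` are hypothetical names, not part of the wiki software, and the set of existing pages is a stand-in for a real lookup:

```python
def subject_title(talk_title):
    """Map a talk-page title to its subject page, or None if it is not a talk page."""
    title = talk_title.replace("_", " ")
    namespace, sep, name = title.partition(":")
    if not sep:
        return None  # no namespace prefix, so not a talk page
    ns = namespace.strip()
    if ns.lower() == "talk":
        return name  # "Talk:X" belongs to the main-namespace page "X"
    if ns.lower().endswith(" talk"):
        return ns[: -len(" talk")] + ":" + name  # "User talk:X" -> "User:X"
    return None

def anon_may_create(title, existing_pages):
    """Anonymous users may create a talk page only if its subject page exists."""
    subject = subject_title(title)
    return subject is not None and subject in existing_pages

# Stand-in for the wiki's real list of pages.
existing = {"Main Page", "User:Nephele/Sandbox", "Tamriel:Books"}

print(anon_may_create("Talk:Main_Page", existing))
print(anon_may_create("Tamriel Talk:Books/w/index.php", existing))
```

Under this rule a legitimate first comment on an existing article still goes through, while the spam titles listed above fail because no article with that exact name exists.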
China
I really hate to say this, but could we just block ALL links to .cn sites? The vast majority of recent spam has been links to Chinese websites, and I really doubt that we have a significant readership in China. Even on the General:Links page, there is only one link to a Chinese site, way down at the bottom, and it does not use the .cn TLD. Other than that, I can't really think of any legitimate reason to post an external link to a Chinese site here, and it would be a much more efficient way of blocking a lot of this spam. Does this make sense? Or is it just a little too extreme? --TheRealLurlock Talk 11:19, 30 September 2007 (EDT)
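For reference, matching on the TLD alone takes a single pattern rather than one entry per advertised site. A sketch in Python; the spam blacklist's own syntax may differ, and the `CN_LINK` pattern and the example domains are illustrative assumptions:

```python
import re

# Matches http(s) links whose host name ends in .cn (case-insensitive).
# The lookahead keeps hosts like example.cnn.com from matching.
CN_LINK = re.compile(r"https?://[^/\s]*\.cn(?=[/:?#\s]|$)", re.IGNORECASE)

def has_cn_link(text):
    """True if `text` contains an external link to a .cn host."""
    return CN_LINK.search(text) is not None

print(has_cn_link("Cheap stuff at http://example.cn/page now"))
print(has_cn_link("See http://www.uesp.net/wiki/Main_Page"))
```

Because the check looks only at the host name, a Chinese site that does not use the .cn TLD, such as the one on the General:Links page, would not be affected.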
