Forum:SH:External link whitelist
This page is an archive of a community-wide discussion. This page is no longer live. Further comments or questions on this topic should be made in a new Senate Hall page rather than here so that this page is preserved as a historic record. —MJ— Holocomm 04:03, November 14, 2012 (UTC)
At the recent Mofference, we voted to create a whitelist of external links for this filter, which prevents external links from being added by anons and new users. It has come to my attention that this has not been done yet, so I'm creating this page to build the whitelist. We're looking for any site that could reasonably and commonly be used as a reference in articles. This is a list of domains, not individual pages, though if only part of a domain is useful, I believe the whitelist can specify one or two subfolders instead. If you think a site should be whitelisted and is not already listed here, just add it to the list with your signature; I know my list is nowhere near complete. If you disagree with whitelisting a site, start a threaded discussion below it, and the admins will decide based on the discussion whether to include it. —MJ— Training Room Saturday, March 17, 2012, 21:53 UTC
(struck sites have been resolved, either by addition to the filter or a decision not to)
starwars.com
theforce.net
boards.theforce.net (depending on how the code works, this may be redundant to the one above)
- Might be worth pointing out what sites have legitimate subdomains so that we can wildcard them (for example http://*.theforce.net). 1358 (Talk) 22:10, March 17, 2012 (UTC)
darkhorse.com
randomhouse.com
star-wars.suvudu.com —MJ— Training Room Saturday, March 17, 2012, 21:53 UTC
farawaypress.com (JJM's website, I'm sure other authors have websites with production info like his) nayayen★talk 21:56, March 17, 2012 (UTC)
bioware.com
lucasarts.com
swtor.com
en.wikipedia.org —Silly Dan (talk) 22:21, March 17, 2012 (UTC)
facebook.com Stake black msg 01:18, April 23, 2012 (UTC)
- Absolutely not. There are very few pages on Facebook that constitute valid sources; we can whitelist those individually. Whitelisting all of Facebook is asking for trouble; that would enable noobs to create pages on their non-notable Facebook fan groups, which was one purpose of the original filter in the first place. —MJ— Training Room Monday, April 23, 2012, 01:27 UTC
- Oops, sorry. I think for a moment I misunderstood the meaning of "whitelist". Kindly disconsider my previous addition. Stake black msg 01:31, April 23, 2012 (UTC)
- No problem. Struck as WONTFIX. —MJ— Jedi Council Chambers Monday, April 23, 2012, 17:29 UTC
facebook.com/darkhorsecomics
facebook.com/starwarsbooks —MJ— Comlink Saturday, March 17, 2012, 23:01 UTC
wikia.com (for the same reason as Wikipedia) —MJ— Holocomm Monday, March 19, 2012, 17:25 UTC
petroglyphgames.com (Empire at War developer)
lucasforums.com (Forums for various LucasArts games)
ravensoft.com (Jedi Outcast and Jedi Academy developer)
amazon.com -- I need a name (Complain here) 16:46, March 31, 2012 (UTC)
karentraviss.com (per false positive report) —MJ— Holocomm Tuesday, April 3, 2012, 23:54 UTC
- Done. grunny@wookieepedia:~$ 07:10, April 4, 2012 (UTC)
obsidianent.com OLIOSTER (talk) 19:36, April 25, 2012 (UTC)
disney.go.com/star-tours-adventures Stake black msg 19:46, April 25, 2012 (UTC)
Discussion
Discussion not about a specific site goes here. —MJ— Training Room Saturday, March 17, 2012, 21:53 UTC
- Alright, I believe I've made a working filter. 1358 (Talk) 22:10, March 17, 2012 (UTC)
- (To clarify, it's log only for now… should we wait for a comprehensive list or should I go ahead and enable it?) 1358 (Talk) 22:12, March 17, 2012 (UTC)
- We already have consensus to enable it, so I say go ahead. Just make sure the warning has the false positive link in it, and we can use those reports to help build the list. In terms of listing subdomains, why not look in added_links instead of added_lines? I would think that you could then drop the https?:// from the test string, eliminating the need to use wildcards to match subdomains. —MJ— War Room Saturday, March 17, 2012, 23:00 UTC
- I'm pretty sure added_links is the amount of added links, not their content. Besides, it's not that hard to just go with https?://*.url.com. It's easily done for every URL — https?://*.(url 1|url 2|...) 1358 (Talk) 23:05, March 17, 2012 (UTC)
- Actually, a quick check shows that it is actually a list of links (warning: long page). :) —MJ— Council Chambers Saturday, March 17, 2012, 23:08 UTC
- Yeah, I guess. Not sure what exactly the benefits are over the above mentioned syntax, though. (Especially considering I have no idea how to actually use it. added_links rlike "urls"?) 1358 (Talk) 23:14, March 17, 2012 (UTC)
- (That) is what I would assume. As for benefits, it eliminates the potential false negative that could result when a bad link is added to a line that already contains a whitelisted link (the preexisting link would show up in added_lines but not added_links). —MJ— Training Room Saturday, March 17, 2012, 23:18 UTC
- I guess this should do the trick, then? 1358 (Talk) 23:24, March 17, 2012 (UTC)
- Should work, though it might be worth testing to be sure. —MJ— Comlink Saturday, March 17, 2012, 23:27 UTC
- Prefixing it with a subdomain still isn't caught by the filter (using added_links). 1358 (Talk) 23:45, March 17, 2012 (UTC)
- I just created my own test wiki to try a few things out myself, but am waiting on a staff member to turn the abusefilter on. I'll see what I can come up with there. —MJ— Jedi Council Chambers Sunday, March 18, 2012, 04:25 UTC
- Alright, this is what I came up with. The check to see whether any links have been added now uses added_links to eliminate false negatives where an existing link is replaced with a bad link. Everything in the whitelist above is in there. Subdomains of whitelisted domains are allowed through, so it's not necessary to individually whitelist every single subdomain. If it becomes necessary to blacklist individual subdomains, that will have to be done via whitelisting only the good subdomains (an example of this can be seen in the linked filter on the starwars.com listing); I was not able to figure out how to blacklist an individual subdomain while still whitelisting the rest of the domain. —MJ— Holocomm Monday, March 19, 2012, 17:25 UTC
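(Editorial illustration, not part of the original thread.) The two false negatives mentioned above, a link swapped for another link and a bad link appended to a line that already holds a whitelisted one, can be reproduced in a small Python sketch. Everything here is an assumption made for illustration: the helper names, the simplified URL regex, and the sample wikitext are hypothetical, and the `added_links` function is only a rough stand-in for the AbuseFilter variable of the same name.

```python
import re

# Simplified URL matcher, assumed for this sketch only.
URL_RE = re.compile(r"https?://[^\s\]\[<>\"]+")

def links_in(text):
    """Return the set of external URLs found in a block of wikitext."""
    return set(URL_RE.findall(text))

def count_check(added_lines, removed_lines):
    """Count-based check: did the number of http(s):// occurrences go up?"""
    def n(text):
        return text.count("http://") + text.count("https://")
    return n(added_lines) > n(removed_lines)

def added_links(old_text, new_text):
    """Rough stand-in for added_links: URLs present after the edit but not before."""
    return links_in(new_text) - links_in(old_text)

# Case 1: an existing link is replaced with a different link.
old = "See [http://starwars.com/databank the Databank]."
new = "See [http://spam.example/buy the Databank]."
swap_count = count_check(new, old)   # link count is unchanged, so nothing is flagged
swap_links = added_links(old, new)   # the new URL is still reported

# Case 2: a link is appended to a line that already has a whitelisted link.
old2 = "Source: http://starwars.com/news"
new2 = "Source: http://starwars.com/news and http://spam.example/buy"
line_has_whitelisted = "starwars.com" in new2  # a test on the whole line passes
append_links = added_links(old2, new2)         # only the new URL is reported
```

In both cases the set of added links contains only the new URL, which is the property the comment above relies on.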
- So basically what we'd have is...
!("autoconfirmed" in user_groups|added_links like ""|lcase(added_links) rlike "((forums.|blogs.|www.|https?://)starwars.com|theforce.net|darkhorse.com|randomhouse.com|star-wars.suvudu.com|farawaypress.com|bioware.com|lucasarts.com|swtor.com|en.wikipedia.org|wikia.com|facebook.com/(darkhorsecomics|starwarsbooks))") & ((count("http://", string(added_lines)) + count("https://", string(added_lines))) > (count("http://", string(removed_lines)) + count("https://", string(removed_lines))))
Plus an exempt for the report page (article_prefixedtext == "Wookieepedia:Spamfilter problems"). 1358 (Talk) 21:22, March 20, 2012 (UTC)
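(Editorial illustration, not part of the original thread.) To see what the whitelist alternation in the filter above accepts, it can be tried in Python, under the assumption that `rlike` behaves like an unanchored regular-expression search; I have not verified that against AbuseFilter itself. The pattern is copied as-is, while `rlike`, `host_allowed`, `ALLOWED_HOSTS`, and the sample URLs are hypothetical names and data for this sketch.

```python
import re
from urllib.parse import urlsplit

# The whitelist alternation from the filter above, copied unchanged.
PATTERN = re.compile(
    r"((forums.|blogs.|www.|https?://)starwars.com|theforce.net|darkhorse.com|"
    r"randomhouse.com|star-wars.suvudu.com|farawaypress.com|bioware.com|"
    r"lucasarts.com|swtor.com|en.wikipedia.org|wikia.com|"
    r"facebook.com/(darkhorsecomics|starwarsbooks))"
)

def rlike(url):
    """Assumed equivalent of: lcase(added_links) rlike "<pattern>"."""
    return PATTERN.search(url.lower()) is not None

# Hypothetical stricter variant: compare the parsed hostname instead of
# searching the whole string. Only a subset of domains, for brevity.
ALLOWED_HOSTS = ("theforce.net", "darkhorse.com")

def host_allowed(url):
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_HOSTS)
```

Under that assumption, `rlike("http://boards.theforce.net/x")` is true without a wildcard, `rlike("http://shop.starwars.com/")` is false because only the listed starwars.com prefixes are allowed, and `rlike("http://theforce.net.spam.example/")` is also true because the search is not tied to the hostname. The `host_allowed` variant rejects that last URL; whether something similar can be expressed in the filter language is a separate question.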
- Almost. The & ((count("http://", string(added_lines)) + count("https://", string(added_lines))) > (count("http://", string(removed_lines)) + count("https://", string(removed_lines)))) is no longer necessary; I replaced it with the added_links like "" condition, which simply checks whether added_links is empty or not. That eliminates false negatives where an existing link is replaced with a bad link. I tossed the report page into the test filter linked in my previous comment here to show how it would look and quickly tested it to confirm it works. The (forums.|blogs.|www.|https?://) check was removed; it was merely to give an example of whitelisting specific subdomains and isn't needed at this time. —MJ— Council Chambers Tuesday, March 20, 2012, 21:33 UTC
- Ah, handy. Just so that I don't misunderstand: It exempts if (a) user is autoconfirmed (b) added_links = null (c) added_links contains any of the whitelisted URLs (d) the page in question is the report page. I'll edit the filter here and put up the report page tomorrow unless someone else gets to it. 1358 (Talk) 21:39, March 20, 2012 (UTC)
- Correct. The filter will stop the edit only if all four of those conditions are false. —MJ— Comlink Tuesday, March 20, 2012, 21:43 UTC
- It's been implemented. Thanks for all the help, and feel free to point out whatever I might've missed. :P Cheers, 1358 (Talk) 19:49, March 21, 2012 (UTC)
- I tweaked the report page a little and copied over Wikipedia's edit filter false positive report system, adjusting it for our purposes. This way, each report provides one-click access to the filter logs for both the user and the page, so admins can see exactly what the user was trying to do. It's also more user-friendly for newbies than a blank edit box. —MJ— Council Chambers Wednesday, March 21, 2012, 21:53 UTC
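(Editorial illustration, not part of the original thread.) The four exemptions (a) to (d) confirmed above can be written out as a small Python sketch. This is a paraphrase of the logic as described in the thread, not the filter itself: the function names are hypothetical, the whitelist is shortened to two domains, and the whitelist test is a plain substring check rather than the filter's regular expression.

```python
REPORT_PAGE = "Wookieepedia:Spamfilter problems"
WHITELIST = ("starwars.com", "theforce.net")  # shortened for the example

def is_exempt(user_groups, added_links, page_title):
    """True if any of the four exemptions described in the thread applies."""
    joined = " ".join(added_links).lower()
    return (
        "autoconfirmed" in user_groups            # (a) established user
        or not added_links                        # (b) no links were added
        or any(d in joined for d in WHITELIST)    # (c) a whitelisted URL is present
        or page_title == REPORT_PAGE              # (d) editing the report page
    )

def filter_stops_edit(user_groups, added_links, page_title):
    """The edit is stopped only when all four exemptions are false."""
    return not is_exempt(user_groups, added_links, page_title)
```

For example, `filter_stops_edit(["user"], ["http://spam.example/"], "Luke Skywalker")` is true, while the same edit by an autoconfirmed user, or on the report page, goes through. Note that (c) as worded is satisfied by any whitelisted URL among the added links, so in this sketch an edit adding one whitelisted and one non-whitelisted link is exempt.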
- I also just noticed that the original filter is still enabled; the whitelist filter won't work properly until the original one is disabled, as the original one could stop edits before they get to the whitelist filter. —MJ— Council Chambers Saturday, March 17, 2012, 23:30 UTC
- Yeah, I know—planning on just pasting the new filter over the old. 1358 (Talk) 23:45, March 17, 2012 (UTC)
- OK. :) —MJ— Jedi Council Chambers Saturday, March 17, 2012, 23:55 UTC
Another question: What would we name the false positive report page? Wookieepedia:Spamfilter problems maybe? 1358 (Talk) 15:03, March 19, 2012 (UTC)