Xwing328/Pywikipedia
These instructions should help you get started with the Python Wikipedia Robot Framework (Pywikipedia). Please keep in mind that I do not keep this page up to date, and both Pywikipedia and MediaWiki are constantly changing. Before proceeding, please visit Wookieepedia:Droids. If you are reading this from another wiki, replace 'starwars' in the following instructions with the name of your wiki. For example, if you are on Zeldapedia at http://zelda.wikia.com, replace it with 'zelda'. Also, the family names in starwars_family.py have to be configured individually for each wiki. Finally, be sure to have your community's consensus before attempting to create or run a bot.
Getting started
- Install Python version 2.6 (or a later 2.x release) from http://www.python.org
- I would suggest installing it at C:\Python26
- Download the latest CVS snapshot:
- Unzip the snapshot (use a program such as 7-Zip) and place the files at C:\Python26\pywikipedia
The following steps are out-of-date and should be used as guidelines only.
- Using Notepad or a similar text-editing program, create a file called user-config.py in the Pywikipedia folder you just created. You are in the correct directory if you see the file config.py.
- In the file, insert the following:
family = 'starwars'
mylang = 'en'
usernames['starwars']['en'] = 'Whistler'
- Replace Whistler with your bot's username and save (be sure NOT to save it as a .txt file).
- Locate the families directory under Pywikipedia. Create a file called starwars_family.py and insert the following content:
# -*- coding: utf-8 -*-
import family

# Wookieepedia, the Star Wars wiki

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'starwars'
        self.langs = {
            'en':'starwars.wikia.com',
            'bg':'bg.starwars.wikia.com',
            'cs':None,
            'da':'da.starwars.wikia.com',
            'de':None,
            'el':'el.starwars.wikia.com',
            'es':'es.starwars.wikia.com',
            'fi':'fi.starwars.wikia.com',
            'fr':'fr.starwars.wikia.com',
            'hr':'hr.starwars.wikia.com',
            'hu':'hu.starwars.wikia.com',
            'it':'it.starwars.wikia.com',
            'ja':'ja.starwars.wikia.com',
            'ko':'ko.starwars.wikia.com',
            'nl':'nl.starwars.wikia.com',
            'no':'no.starwars.wikia.com',
            'pl':None,
            'pt':'pt.starwars.wikia.com',
            'ro':'ro.starwars.wikia.com',
            'ru':'ru.starwars.wikia.com',
            'sk':'sk.starwars.wikia.com',
            'sl':'sl.starwars.wikia.com',
            'sv':'sv.starwars.wikia.com',
            'tr':'tr.starwars.wikia.com',
            'zh-hk':'zh-hk.starwars.wikia.com',
        }
        # Most namespaces are inherited from family.Family.
        self.namespaces[4] = {
            '_default': u'Wookieepedia',
        }
        self.namespaces[5] = {
            '_default': u'Wookieepedia talk',
        }
        self.namespaces[100] = {
            '_default': u'Forum',
        }
        self.namespaces[101] = {
            '_default': u'Forum talk',
        }
        alphabetic = [
            'en', 'bg', 'cs', 'da', 'de', 'el', 'es',
            'fr', 'hr', 'hu', 'it', 'ja', 'ko', 'nl',
            'no', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl',
            'fi', 'sv', 'tr', 'zh-hk',
        ]
        self.interwiki_putfirst = {
            'en': alphabetic,
        }
        self.disambiguationTemplates['en'] = ['disambig']

    def hostname(self, code):
        return 'starwars.wikia.com'

    def path(self, code):
        return '/index.php'

    def version(self, code):
        return "1.15.5"

    def scriptpath(self, code):
        return "/wiki"

    def apipath(self, code):
        return '/api.php'
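The language codes in the family file are typed out twice, once in self.langs and once in alphabetic, so the two can drift apart when you adapt the file for another wiki. The following is a minimal standalone sketch of a sanity check in plain Python. It is not part of Pywikipedia; the helper name check_family_codes and the shortened sample data are assumptions made for illustration.

```python
def check_family_codes(langs, alphabetic):
    """Return (missing, extra).

    missing: codes defined in langs but absent from alphabetic.
    extra:   codes listed in alphabetic but not defined in langs.
    """
    lang_codes = set(langs)
    listed = set(alphabetic)
    missing = sorted(lang_codes - listed)
    extra = sorted(listed - lang_codes)
    return missing, extra


# Shortened sample data modeled on the family file above (hypothetical mismatch).
sample_langs = {
    'en': 'starwars.wikia.com',
    'de': None,
    'fr': 'fr.starwars.wikia.com',
}
sample_alphabetic = ['en', 'de', 'es']

missing, extra = check_family_codes(sample_langs, sample_alphabetic)
print("missing from alphabetic:", missing)
print("not defined in langs:", extra)
```

Running the same check against the full self.langs and alphabetic from your own family file should report two empty lists if they agree.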
Bot bug fixes
- Bot adds unwanted text to every page. Fix courtesy of uberfuzzy at SourceForge.net.
- In wikipedia.py, find the line
        return self._contents
- On the line directly above it, add exactly the following (note: there should be exactly eight spaces before this line, with no tabs or other indentation):
        self._contents = re.sub('', '', self._contents)
- Trouble logging in
- Open login.py in a text-editing program. Locate the following line:
if len(L) == 4:
- Change the == 4 to >= 4
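The first fix above works by stripping a fragment from the page text with re.sub just before the text is returned. The sketch below shows that idea in isolation. The marker pattern is purely hypothetical, since the actual pattern is not shown on this page, and clean_contents is an illustrative name rather than a Pywikipedia function.

```python
import re

# Hypothetical marker: the real text the bot adds is not shown in this guide.
UNWANTED = re.compile(r'<!-- hypothetical marker -->')


def clean_contents(contents):
    """Remove every occurrence of the unwanted marker from the page text."""
    return UNWANTED.sub('', contents)


page_text = "Luke Skywalker<!-- hypothetical marker --> was a Jedi."
print(clean_contents(page_text))
```

Text that does not contain the marker passes through unchanged, which is why a fix of this kind is safe to apply to every page the bot loads.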
Running scripts
Now that your bot is installed and configured, try running some scripts. A summary of the command-line options can be found in the .py files themselves.
- Open Command Prompt (or a similar program).
- Start, Run, cmd, OK
- Navigate to the Pywikipedia directory:
- Type: cd C:\Python26\pywikipedia
- Log in to the wiki. This only needs to be done once, NOT every time you run your bot!
- Type: login.py (follow the prompts to enter the bot's password)
- You can now run any other script you desire. I would suggest starting small, by testing an edit to your user page.
Fixes.py
Fixes.py is a file that can be edited to hold customized sets of replacements. A fix is run by invoking the following Python command: replace.py -fix:FIX_NAME (replacing FIX_NAME with the appropriate name). The following is an example of a site-wide fix that I run. Keep in mind that it might not be up to date with current site standards. Please do not alter or run this fix unless you fully understand what it does and the possible consequences. Note that the formatting of fixes has changed in more recent versions of Pywikipedia. Additionally, fixes like the one below should now be placed in user-fixes.py.
# Add tag wookiee-safe replacements
# By en:User:Xwing328 - http://starwars.wikia.com/wiki/User:Xwing328
# python replace.py -fix:wookiee-safe -recursive -namespace:0 -start:!
'wookiee-safe':{
'regex': True,
'msg': {
'en':u'Droid: Cleanup, Formatting, Template fixes',
},
'replacements': [
#Capitalization - No known bugs
#Categories
(u'category:', u'Category:'),
(u'Category: ', u'Category:'),
(u'Category:a', u'Category:A'),
(u'Category:b', u'Category:B'),
(u'Category:c', u'Category:C'),
(u'Category:d', u'Category:D'),
(u'Category:e', u'Category:E'),
(u'Category:f', u'Category:F'),
(u'Category:g', u'Category:G'),
(u'Category:h', u'Category:H'),
(u'Category:i', u'Category:I'),
(u'Category:j', u'Category:J'),
(u'Category:k', u'Category:K'),
(u'Category:l', u'Category:L'),
(u'Category:m', u'Category:M'),
(u'Category:n', u'Category:N'),
(u'Category:o', u'Category:O'),
(u'Category:p', u'Category:P'),
(u'Category:q', u'Category:Q'),
(u'Category:r', u'Category:R'),
(u'Category:s', u'Category:S'),
(u'Category:t', u'Category:T'),
(u'Category:u', u'Category:U'),
(u'Category:v', u'Category:V'),
(u'Category:w', u'Category:W'),
(u'Category:x', u'Category:X'),
(u'Category:y', u'Category:Y'),
(u'Category:z', u'Category:Z'),
#Exceptions
(u'Category:ZZip', u'Category:zZip'),
#Puts categories on separate lines
(r'\[\[Category:(.*?)\]\] ?\[\[', r'[[Category:\1]]\n[['),
#Switching interwiki before cat to cat before interwiki
(r'\[\[([a-z][a-z]):(.*?)\](.*?)(\[\[|\n\[\[|\n\n\[\[)(C|c)ategory:(.*?)\]\]', r'[[Category:\6]]\n[[\1:\2]]'),
#Adds a line between categories and interwiki links
(r'\[\[Category:(.*?)\](.*?)\n\[\[([a-z][a-z]):', r'[[Category:\1]\2\n\n[[\3:'),
#Alphabetize categories (to some extent)
(r'\[\[Category:(B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:A(.*?)\]\]', r'[[Category:A\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:B(.*?)\]\]', r'[[Category:B\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:C(.*?)\]\]', r'[[Category:C\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:D(.*?)\]\]', r'[[Category:D\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:E(.*?)\]\]', r'[[Category:E\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:F(.*?)\]\]', r'[[Category:F\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:G(.*?)\]\]', r'[[Category:G\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:H(.*?)\]\]', r'[[Category:H\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:I(.*?)\]\]', r'[[Category:I\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:J(.*?)\]\]', r'[[Category:J\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:K(.*?)\]\]', r'[[Category:K\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:L(.*?)\]\]', r'[[Category:L\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(N|O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:M(.*?)\]\]', r'[[Category:M\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(O|P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:N(.*?)\]\]', r'[[Category:N\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(P|Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:O(.*?)\]\]', r'[[Category:O\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(Q|R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:P(.*?)\]\]', r'[[Category:P\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(R|S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:Q(.*?)\]\]', r'[[Category:Q\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(S|T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:R(.*?)\]\]', r'[[Category:R\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(T|U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:S(.*?)\]\]', r'[[Category:S\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(U|V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:T(.*?)\]\]', r'[[Category:T\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(V|W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:U(.*?)\]\]', r'[[Category:U\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(W|X|Y|Z)(.*?)\](.*?)\n\[\[Category:V(.*?)\]\]', r'[[Category:V\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(X|Y|Z)(.*?)\](.*?)\n\[\[Category:W(.*?)\]\]', r'[[Category:W\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(Y|Z)(.*?)\](.*?)\n\[\[Category:X(.*?)\]\]', r'[[Category:X\4]]\n[[Category:\1\2]]'),
(r'\[\[Category:(Z)(.*?)\](.*?)\n\[\[Category:Y(.*?)\]\]', r'[[Category:Y\4]]\n[[Category:\1\2]]'),
#Alphabetize interwiki links - (bg|da|de|es|fr|it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk)
(r'\[\[(da|de|es|fr|it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[bg:(.*?)\]\]', r'[[bg:\4]]\n[[\1:\2]]'),
(r'\[\[(de|es|fr|it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[da:(.*?)\]\]', r'[[da:\4]]\n[[\1:\2]]'),
(r'\[\[(es|fr|it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[de:(.*?)\]\]', r'[[de:\4]]\n[[\1:\2]]'),
(r'\[\[(fr|it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[es:(.*?)\]\]', r'[[es:\4]]\n[[\1:\2]]'),
(r'\[\[(it|ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[fr:(.*?)\]\]', r'[[fr:\4]]\n[[\1:\2]]'),
(r'\[\[(ja|hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[it:(.*?)\]\]', r'[[it:\4]]\n[[\1:\2]]'),
(r'\[\[(hu|nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[ja:(.*?)\]\]', r'[[ja:\4]]\n[[\1:\2]]'),
(r'\[\[(nl|pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[hu:(.*?)\]\]', r'[[hu:\4]]\n[[\1:\2]]'),
(r'\[\[(pl|pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[nl:(.*?)\]\]', r'[[nl:\4]]\n[[\1:\2]]'),
(r'\[\[(pt|ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[pl:(.*?)\]\]', r'[[pl:\4]]\n[[\1:\2]]'),
(r'\[\[(ru|sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[pt:(.*?)\]\]', r'[[pt:\4]]\n[[\1:\2]]'),
(r'\[\[(sl|fi|sv|zh-hk):(.*?)\](.*?)\n\[\[ru:(.*?)\]\]', r'[[ru:\4]]\n[[\1:\2]]'),
(r'\[\[(fi|sv|zh-hk):(.*?)\](.*?)\n\[\[sl:(.*?)\]\]', r'[[sl:\4]]\n[[\1:\2]]'),
(r'\[\[(sv|zh-hk):(.*?)\](.*?)\n\[\[fi:(.*?)\]\]', r'[[fi:\4]]\n[[\1:\2]]'),
(r'\[\[(zh-hk):(.*?)\](.*?)\n\[\[sv:(.*?)\]\]', r'[[sv:\4]]\n[[\1:\2]]'),
#Images
(u'image:', u'File:'),
(u'File: ', u'File:'),
(u'File:a', u'File:A'),
(u'File:b', u'File:B'),
(u'File:c', u'File:C'),
(u'File:d', u'File:D'),
(u'File:e', u'File:E'),
(u'File:f', u'File:F'),
(u'File:g', u'File:G'),
(u'File:h', u'File:H'),
(u'File:i', u'File:I'),
(u'File:j', u'File:J'),
(u'File:k', u'File:K'),
(u'File:l', u'File:L'),
(u'File:m', u'File:M'),
(u'File:n', u'File:N'),
(u'File:o', u'File:O'),
(u'File:p', u'File:P'),
(u'File:q', u'File:Q'),
(u'File:r', u'File:R'),
(u'File:s', u'File:S'),
(u'File:t', u'File:T'),
(u'File:u', u'File:U'),
(u'File:v', u'File:V'),
(u'File:w', u'File:W'),
(u'File:x', u'File:X'),
(u'File:y', u'File:Y'),
(u'File:z', u'File:Z'),
#Templates
(u'{{a', u'{{A'),
(u'{{b', u'{{B'),
(u'{{c', u'{{C'),
(u'{{d', u'{{D'),
(u'{{e', u'{{E'),
(u'{{f', u'{{F'),
(u'{{g', u'{{G'),
(u'{{h', u'{{H'),
(u'{{i', u'{{I'),
(u'{{j', u'{{J'),
(u'{{k', u'{{K'),
(u'{{l', u'{{L'),
(u'{{m', u'{{M'),
(u'{{n', u'{{N'),
(u'{{o', u'{{O'),
(u'{{p', u'{{P'),
(u'{{q', u'{{Q'),
(u'{{r', u'{{R'),
(u'{{s', u'{{S'),
(u'{{t', u'{{T'),
(u'{{u', u'{{U'),
(u'{{v', u'{{V'),
(u'{{w', u'{{W'),
(u'{{x', u'{{X'),
(u'{{y', u'{{Y'),
(u'{{z', u'{{Z'),
(u'{{Era\|', u'{{Eras|'),
#Heading fixes - No known bugs
# Everything case-insensitive (?i)
#(r'(?i)== ?Behind the scenes ?==', r'==Behind the scenes=='),
(r'(?i)== ?Behind the scenes ?==', r'==Behind the scenes=='),
(r'(?i)== ?(References|Notes and references) ?==', r'==Notes and references=='),
(r'(?i)== ?See also ?==', r'==See also=='),
(r'(?i)== ?External links? ?==', r'==External links=='),
(r'(?i)== ?Dramatis personae ?==', r'==Dramatis personae=='),
(r'(?i)== ?Other characters ?==', r'==Other characters=='),
(r'(?i)== ?Powers (and|&) abilities ?==', r'==Powers and abilities=='),
(r'(?i)== ?Personality (and|&) traits ?==', r'==Personality and traits=='),
(r'(?i)== ?Organizations (and|&) titles ?==', r'==Organizations and titles=='),
(r'(?i)== ?Starships (and|&) vehicles ?==', r'==Vehicles and vessels=='),
(r'(?i)== ?Vehicles (and|&) vessels ?==', r'==Vehicles and vessels=='),
(r'(?i)== ?Weapons (and|&) technology ?==', r'==Weapons and technology=='),
(r'(?i)== ?Sapient species ?==', r'==Sentient species=='),
#Miscellanea
(r'\|\}\}', u'}}'),
(u'\{\{Eras\}\}', u'{{Eras|}}'),
(u' {1,}=', u'='),
(u'= {1,}', u'='),
(r'\<=([0-9])', r'<= \1'), #Correcting space removal from equals sign
(r'\>=([0-9])', r'>= \1'), #Correcting space removal from equals sign
(r'\[\[ ([a-zA-Z0-9])', r'[[\1'), #Removes space before beginning of link
(r'([a-zA-Z0-9]) \]\]', r'\1]]'), #Removes space before end of link
#(r'\|height=(.*) meters', r'|height=\1 [[meter]]s'), #Add link to meters in infobox
#Quotes
(r'\{\{Quote\|(.*)((?<!\'))(")<br[ /]*>(")((?!\'))', r"{{Quote|\1\2''\3<br />\4''\5"), #Adds italics between line breaks in quotes
#Dashes
(r'\[\[(.*?) (A|B)BY\]\](—|-| - )\[\[(.*?) (A|B)BY\]\]', r'[[\1 \2BY]]–[[\4 \5BY]]'), #date-date NOT TESTED
(r'(?!( |\[|A|B))\?&(m|n)dash;\[\[([0-9]{1,3}) (BBY|ABY)\]\]', r'—[[\3 \4]]'), #?-date
(r'\[\[([0-9]{1,3}) (BBY|ABY)\]\]&(m|n)dash;\?(?!( |\[|A|B))', r'[[\1 \2]]—'), #date-?
(r'(BBY|ABY)\|([0-9]{1,3})\]\]—\[\[([0-9]{1,3}) (BBY|ABY)\]\]', r'\1|\2]]–[[\3 \4]]'), #Piped dates
(r'([0-9]{1,3}) (BBY|ABY)\]\]—\[\[([0-9]{1,3}) (BBY|ABY)\]\]', r'\1 \2]]–[[\3 \4]]'), #Non-piped dates
(r'([0-9]{1,3}) (BBY|ABY)\]\] - \[\[([0-9]{1,3}) (BBY|ABY)\]\]', r'\1 \2]]–[[\3 \4]]'), #Normal dash dates
#Date fixes - Removes grammatically incorrect commas between month and year, and adds links if needed
(r'(January|February|March|April|May|June|July|August|September|October|November|December), ([1-2][0-9][0-9][0-9])', r'[[\1]] [[\2]]'),
(r'(January|February|March|April|May|June|July|August|September|October|November|December), \[\[([1-2][0-9][0-9][0-9])\]\]', r'[[\1]] [[\2]]'),
(r'\[\[(January|February|March|April|May|June|July|August|September|October|November|December)\]\], ([1-2][0-9][0-9][0-9])', r'[[\1]] [[\2]]'),
(r'\[\[(January|February|March|April|May|June|July|August|September|October|November|December)\]\], \[\[([1-2][0-9][0-9][0-9])\]\]', r'[[\1]] [[\2]]'),
#Referencing fixes
(u'/>', u' />'),
(u'  />', u' />'), #Collapses the double space that the previous rule creates when a space was already present
(r' <ref ', r'<ref '),
(r'<div class="references-small"(.*)\n<references /(.*)\n</div>', r'{{Reflist}}'),
(r'<div class="references-small"><references /></div>', r'{{Reflist}}'),
(r'<references />', r'{{Reflist}}'),
#External links
(r'\[\[(?P<url>https?://[^\]]+?)\]\]', r'[\g<url>]'), # external link in double brackets
(r'\[\[(?P<url>https?://.+?)\]', r'[\g<url>]'), # external link starting with double bracket
# external link and description separated by a pipe, with whitespace in front of the pipe, so that it is clear that the pipe is not a legitimate part of the URL.
(r'\[(?P<url>https?://[^\|\] \r\n]+?) +\| *(?P<label>[^\|\]]+?)\]', r'[\g<url> \g<label>]'),
# pipe in external link, where the correct end of the URL can be detected from the file extension. It is very unlikely that this will cause mistakes.
(r'\[(?P<url>https?://[^\|\] ]+?(\.pdf|\.html|\.htm|\.php|\.asp|\.aspx|\.jsp)) *\| *(?P<label>[^\|\]]+?)\]', r'[\g<url> \g<label>]'),
],
'exceptions': {
'text-contains': [
r'\{\{[Ii]nuse',
],
}
},
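To see how an entry like the one above behaves, it can help to try a few of its rules on sample text before letting the bot loose. The following is a minimal standalone sketch, assuming that replacements are applied one after another in list order and that a page matching any 'text-contains' exception is skipped entirely. It only imitates that behavior with the re module; apply_fix and sample_fix are illustrative names, not part of replace.py.

```python
import re


def apply_fix(text, fix):
    """Apply each (pattern, replacement) pair in order.

    Pages whose text matches any 'text-contains' exception are returned
    unchanged (assumed behavior, modeled on the entry above).
    """
    for pattern in fix.get('exceptions', {}).get('text-contains', []):
        if re.search(pattern, text):
            return text  # page is excluded, leave it untouched
    for pattern, replacement in fix['replacements']:
        text = re.sub(pattern, replacement, text)
    return text


# A small subset of the rules above, copied for experimentation.
sample_fix = {
    'regex': True,
    'replacements': [
        (r'(?i)== ?See also ?==', r'==See also=='),
        (r'\[\[ ([a-zA-Z0-9])', r'[[\1'),
        (r'([a-zA-Z0-9]) \]\]', r'\1]]'),
    ],
    'exceptions': {
        'text-contains': [r'\{\{[Ii]nuse'],
    },
}

fixed = apply_fix("== see Also ==\n[[ Tatooine ]]", sample_fix)
skipped = apply_fix("{{Inuse}}\n== see Also ==", sample_fix)
print(fixed)
print(skipped)
```

Because the rules run in order, a later rule sees the output of every earlier one, so the order of the list matters when two rules touch the same text.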
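The 26 per-letter capitalization rules for categories and files all do the same job. In plain Python the same effect can be had from one pattern and a replacement function, as in the sketch below. This is standalone code for comparison only: capitalize_prefix is an illustrative name, it is not written in the fixes format, and it does not handle special cases such as the zZip exception above.

```python
import re


def capitalize_prefix(text, prefix):
    """Drop one optional space after the prefix and uppercase the next letter.

    Example prefixes: 'Category:' or 'File:'.
    """
    pattern = re.compile(re.escape(prefix) + r' ?([a-z])')
    return pattern.sub(lambda m: prefix + m.group(1).upper(), text)


sample = "[[Category: jedi]] [[Category:sith]]"
result = capitalize_prefix(sample, "Category:")
print(result)
```

Names that already start with a capital letter are left alone, since the pattern only matches a lowercase letter after the prefix.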