How to see the HTTP requests being made by pywikibot

To see exactly which HTTP requests pywikibot was making, I replaced its session object with one that records every request using betamax.

I was trying to debug an issue in some code for interacting with Wikimedia Commons (see village pump discussion). My own code, written with httpx, was broken, and another Wikimedian had given me working code that used pywikibot. I wanted to see the exact HTTP requests that pywikibot was making, so that I could compare them to the requests from my code.

I fiddled around with lots of print() statements for a while, before I had a much better idea.

If you dig through the pywikibot code, you end up in the file comms/http.py, which is where the HTTP request is actually made. It uses the requests library with a Session object, and the documentation for comms.http explains that you can swap out the Session object if necessary.

If you combine this with the betamax library, you can get an instance of pywikibot that will record all its HTTP requests:

import betamax
import requests
import pywikibot
from pywikibot.comms import http


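class RecordingSession(requests.Session):
    """A requests.Session that records each request with betamax."""

    def request(self, *args, **kwargs):
        # Attach a betamax recorder to this session; cassettes are
        # stored in the "cassettes" directory.
        recorder = betamax.Betamax(self, cassette_library_dir="cassettes")

        # record="all" asks betamax to make the real request and record
        # it, rather than replaying an earlier recording.
        with recorder.use_cassette("recorded_request", record="all"):
            return super().request(*args, **kwargs)


# Swap pywikibot's session for the recording one.
http.session = RecordingSession()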

site = pywikibot.Site("commons", "commons")
site.login()

(In case it’s important later, I’m using betamax 0.9.0 and pywikibot 9.0.0.)
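
Once a cassette has been written, you can skim it without opening the file by hand. This is a minimal sketch, not part of pywikibot or betamax. It assumes the cassette was saved as JSON at cassettes/recorded_request.json, with a top-level "http_interactions" list whose entries each have a "request" containing "method" and "uri"; check your own cassette if the keys differ. The function name summarize_cassette is my own invention.

```python
import json
from pathlib import Path


def summarize_cassette(path):
    """Return a (method, uri) pair for each interaction in a cassette file.

    Assumes a JSON cassette with a top-level "http_interactions" list.
    """
    data = json.loads(Path(path).read_text())
    return [
        (interaction["request"]["method"], interaction["request"]["uri"])
        for interaction in data.get("http_interactions", [])
    ]


if __name__ == "__main__":
    # Assumed location: the directory and cassette name used above,
    # plus a .json extension.
    cassette = Path("cassettes") / "recorded_request.json"

    if cassette.exists():
        for method, uri in summarize_cassette(cassette):
            print(method, uri)
```

The full request headers and bodies are in the same file, so this is only a quick way to get an overview before comparing individual requests in detail.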