Python snippet: dealing with query strings in URLs

Posted 31 August 2016

I spend a lot of time dealing with URLs: in particular, with URL query strings. The query string is the set of key-value pairs that comes after the question mark in a URL. For example:

http://example.net?name=alex&color=red

Typically I want to do one of two things: get the value(s) associated with a particular key, or create a new URL with a different key-value pair.

This is possible with the Python standard library’s urllib.parse module, but it’s a bit fiddly and requires chaining several functions together. Since I do this fairly often, I keep a pair of helper functions that I copy and paste into new projects. They’re fairly generic, so I thought they might be worth sharing more widely.
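To make the “chaining” concrete, here’s a minimal sketch of what the individual urllib.parse functions return for the example URL above. The variable names are mine, purely for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, urlparse

url = "http://example.net?name=alex&color=red"

# urlparse splits the URL into named components; the query string
# is returned as a single unparsed string.
components = urlparse(url)
print(components.query)  # name=alex&color=red

# parse_qs turns the query string into a dict of lists...
as_dict = parse_qs(components.query)
print(as_dict)  # {'name': ['alex'], 'color': ['red']}

# ...while parse_qsl returns a list of (key, value) pairs in order.
as_pairs = parse_qsl(components.query)
print(as_pairs)  # [('name', 'alex'), ('color', 'red')]
```

The dict form is handy for lookups; the list form preserves ordering and duplicates, which matters when rebuilding a URL.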

Get the value(s) associated with a particular key

It’s tempting to think of a query string as a Python dict: both are a series of key-value pairs. But unlike a dict, a query string can contain the same key more than once. For example, this is perfectly legal:

http://example.net?name=alex&name=anna

So we might have a single value associated with a given key, or multiple values. The most flexible approach is to return a list of all the values associated with a key, and let the caller decide which, if any, they care about. Doing so requires chaining together a couple of standard library functions, and looks like this:

from urllib.parse import parse_qs, urlparse


def get_query_field(url, field):
    try:
        return parse_qs(urlparse(url).query)[field]
    except KeyError:
        return []

I used the term “field” rather than “key” because that’s what the Wikipedia article uses, although I don’t actually see that terminology in RFC 3986.
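As a quick check of how the lookup behaves, here’s a sketch that repeats the function and calls it on a few URLs. One thing to be aware of: parse_qs drops fields with blank values by default, so a URL ending in ?name= gives an empty list unless you pass keep_blank_values=True to parse_qs.

```python
from urllib.parse import parse_qs, urlparse


def get_query_field(url, field):
    # Same function as above, repeated so this example runs on its own.
    try:
        return parse_qs(urlparse(url).query)[field]
    except KeyError:
        return []


# A repeated key returns every value, in order of appearance.
repeated = get_query_field("http://example.net?name=alex&name=anna", "name")
print(repeated)  # ['alex', 'anna']

# A key that appears once still comes back as a list.
single = get_query_field("http://example.net?name=alex&color=red", "color")
print(single)  # ['red']

# A key that isn't present gives an empty list rather than an error.
missing = get_query_field("http://example.net?name=alex", "color")
print(missing)  # []

# A key with a blank value is dropped by parse_qs's defaults.
blank = get_query_field("http://example.net?name=", "name")
print(blank)  # []
```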

Setting the value for a particular key

Setting a value takes slightly more work: quite a few calls into bits of urllib, then constructing an entirely new tuple of URL components to “unparse”. This is what the code looks like:

from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def set_query_field(url, field, value, replace=False):
    # Parse out the different parts of the URL.
    components = urlparse(url)
    query_pairs = parse_qsl(components.query)

    if replace:
        query_pairs = [(f, v) for (f, v) in query_pairs if f != field]
    query_pairs.append((field, value))

    new_query_str = urlencode(query_pairs)

    # Finally, construct the new URL
    new_components = (
        components.scheme,
        components.netloc,
        components.path,
        components.params,
        new_query_str,
        components.fragment
    )
    return urlunparse(new_components)

Because the same key can appear twice, when you’re adding a new key-value pair, you have to decide whether to just add another pair, or replace the existing pair (if it exists): that’s what the replace parameter is for. If it’s False, any existing pairs are preserved, and the new pair is added to the end of the query string. If it’s True, all the existing pairs with the same key are discarded before adding the new pair.
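Here’s a short sketch of the two modes side by side. To keep the example self-contained, it uses a condensed version of the function that swaps in the new query string with the named tuple’s _replace method instead of building the tuple by hand; that shortcut is my choice for brevity here, and the behaviour should be the same.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def set_query_field(url, field, value, replace=False):
    # Condensed version of the function above, for illustration.
    components = urlparse(url)
    query_pairs = parse_qsl(components.query)

    if replace:
        query_pairs = [(f, v) for (f, v) in query_pairs if f != field]
    query_pairs.append((field, value))

    # urlparse returns a named tuple, so _replace gives a copy with
    # only the query component changed.
    return urlunparse(components._replace(query=urlencode(query_pairs)))


url = "http://example.net?name=alex&color=red"

# replace=False keeps the existing pair and appends the new one.
appended = set_query_field(url, "name", "anna")
print(appended)  # http://example.net?name=alex&color=red&name=anna

# replace=True discards every existing "name" pair first.
replaced = set_query_field(url, "name", "anna", replace=True)
print(replaced)  # http://example.net?color=red&name=anna
```

Note that with replace=True the field moves to the end of the query string, since the old pairs are removed before the new one is appended.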

There are definitely some edge cases this doesn’t cover – for example, if the exact ordering matters, or you only want to replace the first instance of an existing key – but in practice, this covers all of my usage. I’ll need extra code if I want something more complicated.
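For what it’s worth, here’s a rough sketch of how one of those edge cases might be handled: replacing only the first instance of a key, and leaving everything else in its original position. The function name replace_first_query_field is hypothetical; it isn’t part of my helpers or the Gist.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def replace_first_query_field(url, field, value):
    # Hypothetical helper: swap the value of the first matching pair
    # in place, so the ordering of the query string is preserved.
    components = urlparse(url)
    query_pairs = parse_qsl(components.query)

    for index, (f, _) in enumerate(query_pairs):
        if f == field:
            query_pairs[index] = (field, value)
            break
    else:
        # No existing pair with this key, so append a new one.
        query_pairs.append((field, value))

    return urlunparse(components._replace(query=urlencode(query_pairs)))


first_only = replace_first_query_field(
    "http://example.net?name=alex&color=red&name=anna", "name", "sam"
)
print(first_only)  # http://example.net?name=sam&color=red&name=anna

added = replace_first_query_field("http://example.net?color=red", "name", "sam")
print(added)  # http://example.net?color=red&name=sam
```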

Getting the code

The complete code is in a Gist. That includes docstrings, usage examples, and imports for Python 2/3 compatibility. It only requires the standard library.

When I use this code, I just copy and paste it into the new project; there’s no PyPI module. It’s too small to matter, and in general I don’t like having lots of small dependencies.