Tagged with “tumblr”


Python snippets: Is a URL from a Tumblr post?

I’ve been writing some code recently that takes a URL, and performs some special actions if that URL is a Tumblr post. The problem is working out whether a given URL points to Tumblr.

Most Tumblrs use a consistent naming scheme: username.tumblr.com, so I can detect them with a regular expression. But some Tumblrs use custom URLs, and mask their underlying platform: for example, http://travelingcolors.net or http://wordstuck.co.vu. Unfortunately, I encounter enough of these that I can’t just hard-code them, and I really should handle them properly.
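As a minimal sketch of that first check, the regular expression could look something like this. It assumes Python 3 and that usernames contain only letters, digits and hyphens; the names `TUMBLR_HOST_RE` and `has_tumblr_hostname` are purely illustrative.

```python
import re
from urllib.parse import urlparse

# Hostnames of the form <username>.tumblr.com. The character class is an
# assumption about what a Tumblr username may contain.
TUMBLR_HOST_RE = re.compile(r'[a-z0-9-]+\.tumblr\.com')


def has_tumblr_hostname(url):
    """Return True if the URL's hostname looks like username.tumblr.com."""
    # urlparse lowercases the hostname and strips any port number.
    hostname = urlparse(url).hostname or ''
    return TUMBLR_HOST_RE.fullmatch(hostname) is not None
```

This only catches the standard naming scheme, so a custom domain such as http://travelingcolors.net comes back as `False`.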

So how can I know if an arbitrary URL belongs to Tumblr?

I’ve had to do this a couple of times now, so I thought it was worth writing up what to do – partly for my future reference, partly in case anybody else finds it useful.

In the HTTP headers on a Tumblr page, there are a couple of “X-Tumblr” headers. These are custom headers, defined by Tumblr – they aren’t part of the official HTTP spec. They aren’t documented anywhere, but it’s clear who’s sending them, and I’d be quite surprised to see another site send them. For my purposes, this is a sufficiently reliable indicator.

So this is the function I use to detect Tumblr URLs:

try:
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

import requests


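def is_tumblr_url(url):
    """Return True if the URL appears to be served by Tumblr."""
    # Fast path: the standard username.tumblr.com hostname.
    if urlparse(url).netloc.endswith('.tumblr.com'):
        return True

    # Custom domains: look for Tumblr's X-Tumblr-* response headers.
    # Header names are case-insensitive, so compare them in lowercase.
    resp = requests.head(url)
    return any(h.lower().startswith('x-tumblr') for h in resp.headers)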

It’s by no means perfect, but it’s a step up from basic string matching, and it’s accurate and fast enough that I can usually get by.
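If installing requests isn’t an option, the same idea can be sketched with only the standard library. This is a hypothetical variant rather than the function above: it assumes Python 3, the names `has_tumblr_headers` and `is_tumblr_url_stdlib` and the five-second timeout are made up for illustration, and it chooses to treat any network failure as “not Tumblr”.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def has_tumblr_headers(header_names):
    """Return True if any header name starts with X-Tumblr (in any case)."""
    return any(name.lower().startswith('x-tumblr') for name in header_names)


def is_tumblr_url_stdlib(url, timeout=5):
    """Hypothetical stdlib-only variant of is_tumblr_url."""
    hostname = urlparse(url).hostname or ''
    if hostname.endswith('.tumblr.com'):
        return True
    try:
        # Start with a HEAD request and inspect the response header names.
        request = Request(url, method='HEAD')
        with urlopen(request, timeout=timeout) as resp:
            return has_tumblr_headers(resp.headers.keys())
    except (OSError, ValueError):
        # Unreachable host, timeout, HTTP error status or malformed URL:
        # this sketch reports all of them as "not Tumblr".
        return False
```

Keeping the header check in its own function means it can be tested against a plain list of header names, without making any network requests.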


Finding even more untagged posts on Tumblr

When I wrote my original script for finding untagged Tumblr posts, I expected it to be a one-off. I never expected to write a dedicated site, or for that site to become the most popular thing I’ve ever made. I’ve been flattered by some of the emails and tweets I’ve received about the site.

But I’ve also been letting it stagnate. I’ve been putting off a steady trickle of bug reports and feature requests, and the site was getting rough around the edges. On Monday, I inadvertently broke the site completely with some changes on the blog, so I decided that it was finally time to fix it.

This is a fairly major update, which I’m calling “v2.0”. It’s a ground-up rewrite that makes the site much simpler and easier to maintain.

Along with a fresh coat of paint and lots of bug fixes, there are a few new features:

As always, the URL is http://finduntaggedtumblrposts.com and the code is on GitHub.

Feedback, bug reports, etc. can be sent via email or on Twitter.


Notes on Tumblr

The most popular thing I’ve ever written is my site for finding untagged Tumblr posts. I have a few small changes, a new way to filter posts, and some other thoughts on using Tumblr.

Read more →


Updates to my site for finding untagged Tumblr posts

About two weeks ago, I took a family holiday to Oslo. When I came back, I found that my site for finding untagged Tumblr posts had received a lot of traffic while I was gone. I’m flattered that so many people have found it useful.

This heavy usage also exposed several bugs in the original design. The site would become unresponsive if there were lots of untagged posts (sometimes in the tens of thousands). I’ve pushed out an update to fix this: you can click “Do you have lots of posts?” to limit the number of posts that get shown. This should fix any bugs with browsers freezing up.

If you have any other problems or suggestions, then please get in touch.

The rest of this post explains the major changes.

Read more →


Finding untagged posts on Tumblr, redux

One of the most popular posts on this site is Finding untagged posts on Tumblr, but it’s not exactly… friendly. Asking people to download a script and register an API key can look sufficiently daunting that a lot of people probably don’t try.

I wanted a simple turnkey solution. My idea was that people could go to a website, type in their Tumblr URL and click a single button to get a list of all their untagged posts. And now, that exists:

http://finduntaggedtumblrposts.com/

If you go to that URL, then you should get a nice list of all your untagged posts. I hope it’s useful.

If you find any bugs, or a page it doesn’t seem to work for, then please get in touch.

Read more →


Finding untagged posts on Tumblr

Yesterday one of my friends was going through her old Tumblr posts, trying to add tags to every post. If you have any more than a handful of posts, then this becomes tedious and difficult, and you’ve no guarantee that you tagged them all when you’re done. Tumblr doesn’t have a built-in way to list all of your untagged posts, so I wrote a script to poll the Tumblr API, and get a list of post URLs which didn’t have tags.

Doing a Google search for this topic, it seems that this is a fairly common problem, so I thought I’d post the script here for other people to use.
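As a rough illustration of the approach (not the script from the full post), polling the API could look something like the sketch below. It assumes Python 3, version 2 of the Tumblr API with its `/posts` endpoint, and an API key you have registered yourself; the function names and the `hostname` and `api_key` parameters are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = 'https://api.tumblr.com/v2/blog'
PAGE_SIZE = 20  # assumed maximum number of posts the API returns per request


def untagged_post_urls(posts):
    """Return the URLs of the posts whose tag list is empty or missing."""
    return [post['post_url'] for post in posts if not post.get('tags')]


def fetch_untagged_post_urls(hostname, api_key):
    """Page through every post on a blog and collect the untagged ones."""
    untagged = []
    offset = 0
    while True:
        query = urlencode(
            {'api_key': api_key, 'offset': offset, 'limit': PAGE_SIZE})
        with urlopen('%s/%s/posts?%s' % (API_ROOT, hostname, query)) as resp:
            posts = json.load(resp)['response']['posts']
        if not posts:
            break  # an empty page means we have walked past the last post
        untagged.extend(untagged_post_urls(posts))
        offset += len(posts)
    return untagged
```

A call such as `fetch_untagged_post_urls('example.tumblr.com', 'YOUR_API_KEY')` would then return a list of post URLs, where both arguments are placeholders.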

Read more →


Podcast feeds on Tumblr

On episode 12 of the Accidental Tech Podcast, Marco, Casey and John were discussing podcasts. They were comparing them to blogging, the way more people wanted to have their own podcast, and how the tools for making your own podcast compare to those for making your own blog. A little over an hour in, Marco mentioned a useful Easter egg in Tumblr for podcasters.

Marco was the co-founder and lead developer at Tumblr for about three and a half years. Every so often, he drops a nugget like this on one of his podcasts, telling you about an undocumented feature of Tumblr that turns out to be really useful. Since I haven’t seen this one mentioned anywhere else, I decided to break it out into a nice, Google-searchable blog post.

Skip to the 1 hr 3 min mark for the relevant segment. Marco:

Tumblr also supports podcast hosting if you host the files elsewhere. I believe if you go to any Tumblr site /podcast, or maybe /podcast/RSS, it will give you an iTunes compatible podcast feed of any audio posts that are externally hosted.

I went to have a play, and this feature still works. You want the first path, rather than the second. Adding /podcast to a Tumblr site gives you an RSS feed for the podcast. If you have a Tumblr site under your own domain name, then adding /podcast to that works just as well.

Handy!
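For scripting purposes, building the feed address from any URL on a blog is a one-liner with the standard library. This is a small sketch, assuming Python 3; `podcast_feed_url` is an illustrative name, and it only constructs the URL without checking that the feed exists.

```python
from urllib.parse import urlsplit, urlunsplit


def podcast_feed_url(blog_url):
    """Return the /podcast feed URL for the blog that blog_url belongs to."""
    parts = urlsplit(blog_url)
    # Keep the scheme and host, and replace any path, query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, '/podcast', '', ''))
```

Because only the scheme and host are kept, this works the same way for a username.tumblr.com address and for a custom domain.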

Read more →