Friday, 24 August 2007

Here in the Natural Search department we use Excel. A lot. And frequently we are taking data from the web (such as from the rather good Open Site Explorer) and manhandling it in order to make more sense of it. As part of this process, it’s often helpful to be able to quickly pull out the domain portion of a URL. For example, let’s say you have a list of URLs like this:

You might find yourself in a position where want to take this list and reduce it to just the domain portions like this:


It’s possible to use an Excel formula to do this. Here’s one I found by searching:


This is fine, but as you can see the formula is rather long and combining this with any further functions would be a real headache. To try to solve this, I wrote an add-In for Excel that adds a handy function to perform this step (along with a few others).

Download the URL Tools Excel Add-in HereTo use the Add-in, you’ll need to save it somewhere on your computer (we use ‘C:\Program Files\URL Tools\’ by convention, which helps when sharing workbooks) and install the add-in via the Excel Options menu as follows:
1. Click on the Office Button in Excel

2. Select ‘Excel Options’ from the bottom of the menu

3. Select ‘Add-Ins’ from the left hand menu in the resulting dialog box

4. Click ‘Go’ where it says ‘Manage Excel Add-ins’

5. Click ‘Browse’ and browse to wherever you saved the Add-in file

6 Click ‘OK‘ in the appropriate places to return to Excel

Hopefully, you will now have the Add-in installed. To test this, do the following: Put a URL in cell A1. Any URL, but try something long like Then, in cell A2 type ‘=wwwsubdomain(A1)’. All being well, this formula should return ‘’. If so, celebrate, otherwise go back and check the installation steps again.

There are a number of other functions included in the Add-in that I will summarise below. There are a couple of options for handling the domain extraction outlined above, but they are subtly different. If in doubt, you probably just want to use ‘=wwwsubdomain’.

  • subdomain() – returns the domain part of a URL including any subdomains e.g ‘’ becomes ‘
  • nowww() – removes ‘www.’ from the start of a URL (note that www must be at the very beginning of the URL)
  • wwwsubdomain() – basically, a combination of the above, equivalent to ‘nowww(subdomain(A1))’. This is what you want to use most of the time!
  • urlencode() – Encodes a URL string (try it on a string like ‘Hello World!’)
  • urldecode() – Un-encodes a URL string (try it on a string like ‘Hello%20World%21’)
  • tld() – Returns the TLD of a URL (not 100% perfect but pretty good)
  • geturl() – Extracts the URL from a Hyperlink
I hope you find this useful. It’s very much a work in progress so please let me have your feedback or questions in the comments section below.