
Creating a Java URL shortener

A long time ago I posted a brief article about using Google's URL shortening API to easily create shortened URLs for your application (e.g. if you wanted to post stuff server side to Twitter and cared about character usage, you could fire your URL over to Google's API and just use the result).

However, I recently had to think about how to shorten URLs myself, and really, it's pretty easy to implement.


Creating a unique, repeatable identifier for a URL
I think a lot of people's first instinct might be to go for hashing the URL string - this isn't a good idea for a few reasons though:
  • Length - most normal hashing algorithms (md5/sha-*) produce long strings, which kind of goes against the point of a url shortener
  • Uniqueness - Obviously, if this is going to be a URL identifier then it needs to be unique, and hashes by their very nature are not unique - which means you would need to handle the scenario where a URL produces an already used hash and needs an alternative
  • Look-up - as hashes are not (easily) reversible, you would need to look up the URL using the hash as the db key - which may not be ideal given a very large set of URLs (imagine the number of URLs bitly.com has)
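
To put a number on the length point, here is a minimal sketch that hex-encodes the MD5 and SHA-256 digests of a URL and prints their lengths. The class name HashLengthDemo and the example URL are placeholders of mine, not part of any real shortener.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashLengthDemo {

    // Returns the lowercase hex encoding of the digest of the given URL.
    static String hexDigest(String algorithm, String url) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] bytes = digest.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        String url = "http://example.com/some/long/path"; // placeholder URL
        System.out.println(hexDigest("MD5", url).length());     // 32
        System.out.println(hexDigest("SHA-256", url).length()); // 64
    }
}
```

Even the shorter of the two is 32 characters, which is longer than many of the URLs you would want to shorten.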


Thankfully there is a viable, easy solution available.


Let's first think about our database structure for persisting our URLs - in the simplest case we could probably get by with two columns:
  • id (DB generated sequence ID)
  • url - text field to capture the URL value

Generating the identifier from a DB
  1. Now, if you provide a String URL value, your code just needs to insert it into the table. This will create the row and the unique ID.
  2. Next, fetch that unique numeric ID and convert it to base-62. Rather than the normal base-10 digits, base-62 uses 0-9, a-z and A-Z as characters. This gives you an identifier in the form of "1jLPSIv", and it also provides a massive ID space that fits into relatively few characters (which will be part of the shortened URL): 6 characters of base-62 provide 62^6 different unique combinations (56,800,235,584 in total), and 7 characters provide 62^7 (roughly 3.5 trillion).
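
The two steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the class ShortenerSketch is hypothetical, an AtomicLong and a HashMap stand in for the database sequence and table, and the alphabet order (0-9, a-z, A-Z) is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class ShortenerSketch {

    // Any fixed ordering of 62 distinct characters works; this one follows the text.
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // In-memory stand-ins for the DB sequence and the (id, url) table.
    private final AtomicLong sequence = new AtomicLong(0);
    private final Map<Long, String> urlsById = new HashMap<>();

    // Converts a non-negative numeric ID to its base-62 representation.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }

    // Converts a base-62 identifier back to the numeric ID.
    static long decode(String key) {
        long id = 0;
        for (char c : key.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a base-62 character: " + c);
            }
            id = id * 62 + digit;
        }
        return id;
    }

    // Step 1: "insert" the URL to get an ID. Step 2: encode that ID.
    String shorten(String url) {
        long id = sequence.incrementAndGet();
        urlsById.put(id, url);
        return encode(id);
    }

    // Look-up is a decode followed by a primary-key fetch (null if unknown).
    String resolve(String key) {
        return urlsById.get(decode(key));
    }
}
```

Because the identifier decodes straight back to the row ID, resolving a short URL is a primary-key look-up rather than a search on a hash column.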

You may not want to directly leak the DB IDs into the URL, in which case you can easily salt the IDs in whatever way you see fit.
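
One very simple way to do that, purely as an illustration, is to XOR the ID with a fixed secret before base-62 encoding it and XOR again after decoding. The class IdObfuscator and the SECRET value below are made up for this sketch; this is light obfuscation, not security, and consecutive IDs still end up close to each other.

```java
class IdObfuscator {

    // Hypothetical secret; a real application would load this from configuration.
    private static final long SECRET = 0x5DEECE66DL;

    // Scrambles a DB ID before it is base-62 encoded into the short URL.
    static long obfuscate(long id) {
        return id ^ SECRET;
    }

    // Recovers the DB ID after the identifier from the URL has been decoded.
    static long reveal(long obfuscated) {
        return obfuscated ^ SECRET;
    }
}
```

XOR is its own inverse, so the mapping stays one-to-one and no two IDs can collide on the same identifier.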



Bit.ly example

Looking at bit.ly's shortened URLs, it would appear that they follow the same pattern. I shortened two of my previous blog post URLs one after another, and the URLs were generated as follows:

bit.ly/1oYgrsG

bit.ly/1tR63Zb

If we look at those two URL identifiers, they look a lot like base-62, so let's convert them from base-62 to base-10 (and from base-64 as well, just for funsies, in case they are using that):

URL - Base64>Base10 - Base62>Base10
1oYgrsG - 3685493160710 - 103119489480
1tR63Zb - 3690751293019 - 107587946123

As you can see, they look reasonable - there's a chance that there were a few URLs created between my two URLs, but I suspect there is a reasonable amount of salting going on, so it is not just a one-by-one increment. (Although interestingly, if you get a bit.ly URL and then just increase the last char by one - assuming base-62 - then it will likely provide a new URL, so you can have your own game of manual-webpage-roulette!)
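
If you want to check the base-62 column yourself, a small decoder is enough. In the sketch below (BitlyDecodeCheck is just a name I picked), the alphabet is ordered digits, then uppercase, then lowercase, which is the ordering that reproduces the values in the table; whether bit.ly actually uses that ordering is only a guess.

```java
class BitlyDecodeCheck {

    // Digits, then uppercase, then lowercase: this ordering matches the table above.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Reads the identifier as a base-62 number and returns its base-10 value.
    static long decode(String key) {
        long value = 0;
        for (char c : key.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a base-62 character: " + c);
            }
            value = value * 62 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode("1oYgrsG")); // 103119489480
        System.out.println(decode("1tR63Zb")); // 107587946123
    }
}
```

The two values are about 4.47 billion apart, so the identifiers are clearly not consecutive raw IDs.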

Google Shortening URLs

As another quick follow-on note from the below, one of the features that was built into the application was the ability to share your resume or particular achievements with your friends on Twitter. To do this, I obviously wanted to share a link back to the URL of the resume, so to maximise the space left for additional text I investigated URL shortening.

There is a bit.ly API that uses OAuth, but for what I wanted to do, I decided that was overkill, as I didn't necessarily need to associate the shortened URLs with a user's bit.ly account; all I really cared about was getting a shortened URL.

Fortunately, Google came to the rescue with their goo.gl URL shortening service that also exposes a public API without need for authentication.

So I simply wrote a service class that utilised the Spring RestTemplate class to shorten URLs:

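import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.http.converter.json.MappingJacksonHttpMessageConverter;
import org.springframework.social.support.ClientHttpRequestFactorySelector; // Spring Social
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service("urlShortenService")
public class UrlShortenService {

    private static final String API_URL = "https://www.googleapis.com/urlshortener/v1/url";

    private final RestTemplate restTemplate;

    public UrlShortenService() {
        restTemplate = new RestTemplate(ClientHttpRequestFactorySelector.getRequestFactory());
        // Converters for plain-text and JSON request/response bodies.
        List<HttpMessageConverter<?>> messageConverters = new ArrayList<HttpMessageConverter<?>>();
        messageConverters.add(new StringHttpMessageConverter());
        messageConverters.add(new MappingJacksonHttpMessageConverter());
        restTemplate.setMessageConverters(messageConverters);
    }

    // POSTs {"longUrl": url} to the API and returns the "id" field, which holds the short URL.
    @SuppressWarnings("unchecked")
    public String shortenUrl(String url) {
        Map<String, String> request = new HashMap<String, String>();
        request.put("longUrl", url);
        LinkedHashMap<String, String> shortUrl = restTemplate.postForObject(API_URL, request, LinkedHashMap.class);
        return shortUrl.get("id");
    }
}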


I didn't worry too much about validating that the string passed in was a URL for the time being, as I always had control of that, but it is something that would need to be considered.
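
If you did want a basic check before calling the API, something along these lines would be a starting point. It is a minimal sketch using java.net.URI; the UrlValidator class and the choice to accept only http and https are my own assumptions, not part of the service above.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlValidator {

    // Accepts only syntactically valid absolute http/https URLs that have a host.
    static boolean isHttpUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A call such as UrlValidator.isHttpUrl(url) could then guard shortenUrl, rejecting input like "not a url" or "ftp://example.com" before any request is made.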