Building a RESTful API like a Product Manager

Something I have been thinking about for a while is the idea of APIs as your Product - the idea that you should design, build and test your API as if it were a public-facing product (which it may well be, if it's a public API), and what lessons we should be learning from Product Managers about how to achieve this.

In the latest instalment of their "Technology Radar", ThoughtWorks recommended the concept "API as a Product" as ready to trial (although I think it's far beyond trial stage - I can see no real risk of adopting the lessons from Product Management for your API).



I have been interested in Product Management for several years. I think it's hard not to get interested in the science of Product Management when you are building side projects, as you naturally have to think about the product you are building. Sure, side projects can easily turn into vanity projects, where you just build what you think is cool, but I think it's still fun to think about real users and the good Product Management techniques that can be applied even to side projects.

Having tried to gain some understanding, I have turned to some classic material on the subject of Product Management: The Lean Startup, Don't Make Me Think and How to Build a Billion Dollar App. I have also been lucky enough to attend the Mind The Product conference twice (through volunteering, but I still managed to watch lots of the talks), and to have worked with several great Product Managers.

So what lessons can we start applying? I think, if one statement sums it up, it's an idea from Kathy Sierra:

Does your product make your users awesome?
This is a great way to think about APIs (I'm mostly talking about REST/GraphQL type APIs here - but this thinking is a great way of approaching design for normal class/interface/library APIs too).

As engineers, I'm sure we have all used APIs that don't achieve this - APIs that make us have to think, that make us feel stupid, that seem overly complicated (for example, the old Java Date library - the number of times I had to google just to find out how to simply create a specific date for something simple like a unit test is ridiculous). And hopefully, we have all used APIs that make us feel awesome - APIs that make us productive, that let us do what we expect to be able to do, and can (with the help of the IDE's auto-complete) let us guess what the method calls are.

Another idea Kathy Sierra mentions in that talk, also made famous as the title of the Product Management classic, is:

Don't make me think

I have written before about the StackOverflow and GitHub APIs, but they deserve another mention: if you have ever written an integration with them, you will know that the APIs are designed in a way that sensibly reflects the domain model.

Take StackOverflow, for example: if we were going to describe their domain model, what first-class objects would we have? I guess Questions, Answers, Comments etc - and sure enough, their API models these:

/questions - will get you all the questions on the site

/answers - will get you all the answers

Now, how do you think you might get all answers for a given question? Well, a list of answers should probably be a child of a question, so something that represents that, something like:

/questions/{Question ID}/answers

Sure enough, that's the URL - and the behaviour is consistent throughout the design. I'm sure you can guess how you might retrieve all comments for a given answer.

I'll be honest, developing against an API where I can guess the endpoints makes me feel pretty awesome.

MVP and the Lean Product

The idea of a Minimum Viable Product is a core concept in Lean Startup - the idea that you build the minimum version of the product that can get you feedback as to whether you are heading in the right direction before investing more effort in building out a fuller fledged product.

Whilst I think an appreciation of an MVP is good for any engineer to have, I have generally found it most useful in facilitating conversation when building a new product: if, as an engineer, you recognise that some new feature is going to be considerable effort, it's helpful to think about what hypothesis the feature is trying to test, and whether there is a simpler engineering solution to test that hypothesis.

However, the principle of not over-engineering the API is also a good one. Once you have a well designed domain model, it's quite tempting to get carried away with REST design best practices and start building endpoints for all the different variations of the model. Unless you need them, it's better to start with less. (If you are building a public REST API from the start then it might be harder to know what isn't needed, but you can still certainly build an MVP version of the API to test the hypothesis, e.g. whether people actually want to use your API.)

User Testing

Another important part of building a product is collecting user feedback. Whilst a REST API may not be well suited to A/B testing (your consumers won't be very happy if you switch the API design on them after they have implemented against a specific variation!), listening to user feedback is still important. Because the consumers will likely be engineers (writing the client code), they will hopefully be expecting REST standards and good API design principles. Another way to think about this is the importance of not making breaking changes to your API.

A technical way of thinking about this is Consumer Driven Contract Testing. As RESTful APIs are de-coupled from the client code, the API doesn't have any idea what clients are using or doing. This approach allows consumers to define what they expect from the API, and the API can then always be tested against these contracts, so you can be sure that any change to the API is still in keeping with its existing consumers, or users.
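
To make that a little more concrete, here is a minimal sketch of the idea, assuming a heavily simplified contract that just lists the response fields a consumer relies on. The class, the consumer name and the field names are all hypothetical, and real contract testing tools record far richer expectations than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified consumer contract: the fields that one
// consumer relies on in the response of one endpoint.
final class ConsumerContract {
    final String consumer;
    final String endpoint;
    final List<String> requiredFields;

    ConsumerContract(String consumer, String endpoint, String... requiredFields) {
        this.consumer = consumer;
        this.endpoint = endpoint;
        this.requiredFields = Arrays.asList(requiredFields);
    }

    // Returns the fields this consumer needs that the given response
    // no longer provides. An empty list means the contract still holds.
    List<String> missingFrom(Map<String, Object> response) {
        List<String> missing = new ArrayList<String>();
        for (String field : requiredFields) {
            if (!response.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

The API's build would run a check like missingFrom against a sample response for every registered contract, and fail if any of them comes back non-empty.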

Conclusion

I think, in general, all engineers can benefit from thinking about how good products are built and what concepts are important, and from being able to think about the products or services they are building in this way. That includes thinking about the actual implementation as a product that someone else will be the user of (externally if it's a RESTful API, but also internally if it's cross-team, or even just someone coming to take over your code once you have moved on).

RESTful API Design: An opinionated guide

This is very much an opinionated rant about APIs, so it's fine if you have a different opinion. These are just my opinions. Most of the examples I talk through are from the Stack Exchange or GitHub API - this is mostly just because I consider them to be well designed APIs that are well documented, have non-authenticated public endpoints and should be familiar domains to a lot of developers.

URL formats

Resources

Ok, let's get straight to one of the key aspects. Your API is a collection of URLs that represent resources in your system that you want to expose. The API should expose these as simply as possible - to the point that if someone was just reading the top level URLs they would get a good idea of the primary resources that exist in your data model (i.e. any object that you consider a first-class entity in itself). The Stack Exchange API is a great example of this. If you read through the top level URLs exposed you will probably find they match the kind of domain model you would have guessed:

/users
/questions
/answers
/tags
/comments
etc

And whilst there is no expectation that anyone will attempt to guess your URLs, I would say these are pretty obvious. What’s more, if I was a client using the API I could probably have a fair shot at understanding these URLs without any further documentation of any kind.

Identifying resources

To select a specific resource based on a unique identifier (an ID, a username etc), the identifier should be part of the URL. Here we are not attempting to search or query for something, rather we are attempting to access a specific resource that we believe should exist. For example, if I were to access the GitHub API for my username, https://api.github.com/users/robhinds, I am expecting the concrete resource to exist.

The pattern is as follows (elements in square braces are optional):
/RESOURCE/[RESOURCE IDENTIFIER]

Including an identifier returns just the identified resource, assuming one exists, or otherwise a 404 Not Found. This differs from filtering or searching, where we might return a 200 OK and an empty list. This can be flexible: if you prefer to return an empty list for identified resources that don’t exist, that is also a reasonable approach, as long as it is consistent across the API. The reason I go for a 404 when the ID is not found is that normally, if our system makes a request with an ID, it believes the ID is valid, so a missing resource is an unexpected exception. By comparison, if our system is filtering users by sign-up date, it's perfectly reasonable to expect the scenario where no user is found.
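
As a rough sketch of that distinction, assuming a hypothetical in-memory user store (the class name, method names and the signup_year parameter are all made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a user store; every name here is hypothetical.
final class UserResource {
    private final Map<String, Integer> signUpYearByUsername = new HashMap<String, Integer>();

    void add(String username, int signUpYear) {
        signUpYearByUsername.put(username, signUpYear);
    }

    // GET /users/{username}: the caller names a specific resource,
    // so a missing one is a 404 Not Found.
    int statusForLookup(String username) {
        return signUpYearByUsername.containsKey(username) ? 200 : 404;
    }

    // GET /users?signup_year=...: a filter that matches nothing is still
    // a successful request, so the caller gets a 200 with an empty list.
    List<String> filterBySignUpYear(int year) {
        List<String> matches = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : signUpYearByUsername.entrySet()) {
            if (entry.getValue() == year) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```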

Subresources

A lot of the time our data model will have natural hierarchies - for example StackOverflow Questions might have several child Answers etc. These nested hierarchies should be reflected in the URL hierarchy, for example, if we look at the Stack Exchange API for the previous example:
/questions/{ids}/answers

Again, the URL is (hopefully) clear without further documentation what the resource is: it is all answers that belong to the identified questions.

This approach naturally allows as many levels of nesting as necessary, but as many resources are top level entities as well, you rarely need to go much further than the second level. To illustrate, let’s consider we wanted to extend the query for all answers to a given question, to instead query all comments for an identified answer - we could naturally extend the previous URL pattern as follows:
/questions/{ids}/answers/{ids}/comments

But as you have probably recognised, we have /answers as a top level URL, so the additional prefixing of /questions/{ids} is surplus to our identification of the resource (and actually, supporting the unnecessary nesting would also mean additional code and validation to ensure that the identified answers are actually children of the identified questions).

There is one scenario where you may need this additional nesting, and that is when a child resource’s identifier is only unique in the context of its parent. A good example of this is GitHub’s user & repository pairing. My GitHub username is a global, unique identifier, but the names of my repositories are only unique to me (someone else could have a repository with the same name as one of mine, as is frequently the case when a repository is forked by someone). There are two good options for representing these resources:

  1. The nested approach described above, so for the GitHub example the URL would look like:
    /users/{username}/repos/{reponame}

    I like this as it is consistent with the recursive pattern defined previously and it is clear what each of the variable identifiers relates to.

  2. Another viable option, the approach that GitHub actually uses, is as follows:
    /repos/{username}/{reponame}

    This changes the repeating pattern of {RESOURCE}/{IDENTIFIER} (unless you just consider the two URL sections as the combined identifier), however the advantage is that the top level entity is what you are actually fetching - in other words, the URL is serving a repository, so that is the top level entity.

Both are reasonable options and really come down to preference, as long as it's consistent across your API then either is ok.
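
Either way, the server ends up pulling the same two identifiers out of the path. A small sketch of that, assuming simple regex matching rather than any particular web framework's routing (the RepoPaths class is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that accepts both URL styles described above.
final class RepoPaths {
    // Option 1: /users/{username}/repos/{reponame}
    private static final Pattern NESTED =
            Pattern.compile("^/users/([^/]+)/repos/([^/]+)$");
    // Option 2: /repos/{username}/{reponame}
    private static final Pattern FLAT =
            Pattern.compile("^/repos/([^/]+)/([^/]+)$");

    // Returns {username, reponame}, or null if the path matches neither form.
    static String[] parse(String path) {
        for (Pattern pattern : new Pattern[] {NESTED, FLAT}) {
            Matcher m = pattern.matcher(path);
            if (m.matches()) {
                return new String[] {m.group(1), m.group(2)};
            }
        }
        return null;
    }
}
```

In a real API you would pick one of the two styles and support only that one; both are handled here just to show that the identifying information is the same.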

Filtering & additional parameters

Hopefully the above is fairly clear and provides a high level pattern for defining resource URLs. Sometimes we want to go beyond this and filter our resources - for example we might want to filter StackOverflow questions by a given tag. As hinted at earlier, we are not sure of any resource's existence here, we are simply filtering - so unlike with an incorrect identifier we don’t want to return a 404 Not Found, but rather an empty list.

Filtering controls should be entered as part of the URL query parameters (i.e. after the first ? in the URL). Parameter names should be specific, understandable and lower case. For example:
/questions?tagged=java&site=stackoverflow

All the parameters are clear and make it easy for the client to understand what is going on (also worth noting that https://api.stackexchange.com/2.2/questions?tagged=awesomeness&site=stackoverflow for example returns an empty list, not a 404 Not Found). You should also keep your parameter names consistent across the API - for example if you support common functions such as sorting or paging on multiple endpoints, make sure the parameter names are the same.
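
From the client side, a minimal sketch of building such a URL might look like the following. The QueryUrls helper is hypothetical; the only real library call is java.net.URLEncoder, which keeps values like c# safe to put in a query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper for appending filters to a resource path.
final class QueryUrls {
    // Appends the given filters as URL-encoded query parameters,
    // in the iteration order of the map.
    static String withFilters(String resourcePath, Map<String, String> filters) {
        StringBuilder url = new StringBuilder(resourcePath);
        char separator = '?';
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            url.append(separator)
               .append(encode(filter.getKey()))
               .append('=')
               .append(encode(filter.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing an insertion-ordered map (such as a LinkedHashMap) keeps the parameters in the order you added them.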

Verbs

As should be obvious in the previous sections, we don’t want verbs in our URLs, so you shouldn’t have URLs like /getUsers or /users/list etc. The reason for this is the URL defines a resource not an action. Instead, we use the HTTP methods to describe the action: GET, POST, PUT, HEAD, DELETE etc.
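
To illustrate, here is a toy sketch of the same two URLs meaning different things depending on the HTTP method. The UserRoutes class and the action descriptions are purely illustrative; a real framework would dispatch to handler methods instead of returning strings.

```java
// Hypothetical dispatcher: the URL names the resource,
// the HTTP method names the action.
final class UserRoutes {
    static String actionFor(String method, String path) {
        if (path.equals("/users")) {
            if (method.equals("GET")) return "list users";
            if (method.equals("POST")) return "create user";
        } else if (path.matches("/users/[^/]+")) {
            if (method.equals("GET")) return "fetch user";
            if (method.equals("PUT")) return "replace user";
            if (method.equals("DELETE")) return "delete user";
        }
        // Anything else is not supported by this sketch.
        return "unsupported";
    }
}
```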

Versioning

Like many of the RESTful topics, this is hotly debated and pretty divisive. Very broadly speaking, the two approaches to API versioning are:
  • Part of the URL
  • Not part of the URL
Including the version in the URL will largely make it easier for developers to map their endpoints to versions etc, but for clients consuming the API it can make things harder (often they will have to go and find-and-replace API URLs to upgrade to a new version). It can also make HTTP caching harder: if a client POSTs to /v2/users then the underlying data will change, so the cache for GET-ting users from /v2/users is now invalid. However, the API versioning doesn’t affect the underlying data, so that same POST has also invalidated the cache for /v1/users etc. The Stack Exchange API uses this approach (as of writing, their API is based at https://api.stackexchange.com/2.2/).

If you choose not to include the version in your URL then two possible approaches are HTTP request headers or content negotiation. This can be trickier for the API developers (depending on framework support etc), and can also have the side effect of clients being upgraded without knowing it (e.g. if they don’t realise they can specify the version in the header, they will default to the latest). The GitHub API uses this approach: https://developer.github.com/v3/media/#request-specific-version
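
As a sketch of the content-negotiation flavour, the server can read the requested version out of a vendor media type in the Accept header, in the style of GitHub's application/vnd.github.v3+json. The ApiVersions class and the LATEST value below are assumptions for illustration, not anything GitHub publishes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical version resolver for header-based API versioning.
final class ApiVersions {
    // Assumption: the newest version this imaginary API serves.
    static final int LATEST = 3;

    // Matches vendor media types such as application/vnd.github.v3+json.
    // At most 9 digits are captured so that parseInt cannot overflow.
    private static final Pattern VENDOR_VERSION =
            Pattern.compile("application/vnd\\.[^;,\\s]*?\\.v(\\d{1,9})");

    // Falls back to the latest version when the client does not ask
    // for a specific one, which is how clients get upgraded silently.
    static int fromAcceptHeader(String acceptHeader) {
        if (acceptHeader == null) {
            return LATEST;
        }
        Matcher m = VENDOR_VERSION.matcher(acceptHeader);
        return m.find() ? Integer.parseInt(m.group(1)) : LATEST;
    }
}
```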

I think this sums it up quite nicely:


Response format

JSON is the de facto standard response format for RESTful APIs. If required you can also provide other formats (XML/YAML etc), which would normally be managed using content negotiation.

I always aim to return a consistent response message structure across an API. This is for ease of consumption and understanding across calling clients.

Normally when I build an API, my standard response structure looks something like this:

[ code: "200", response: [ /** some response data **/ ] ]

This does mean that any client always needs to navigate down one layer to access the payload, but I prefer the consistency this provides, and also leaves room for other metadata to be provided at the top level (for example, if you have rate limiting and want to provide information regarding remaining requests etc, this is not part of the payload but can consistently sit at the top level without polluting the resource data).

This consistent approach also applies to error messages - the code (mapping to HTTP Status codes) reflects the error, and the response in this case is the error message returned.
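
A minimal sketch of such an envelope, assuming the payload has already been serialised to JSON elsewhere. The ApiResponse class is hypothetical, and in practice you would let a JSON library do the serialisation rather than building strings by hand.

```java
// Hypothetical response envelope: every response, success or error,
// has the same two top-level fields.
final class ApiResponse {
    private final String code;     // mirrors the HTTP status code
    private final String response; // JSON payload, or a JSON string for errors

    private ApiResponse(String code, String response) {
        this.code = code;
        this.response = response;
    }

    // jsonPayload must already be valid JSON (an object, array, etc).
    static ApiResponse ok(String jsonPayload) {
        return new ApiResponse("200", jsonPayload);
    }

    static ApiResponse error(int status, String message) {
        return new ApiResponse(String.valueOf(status), quote(message));
    }

    String toJson() {
        return "{\"code\":" + quote(code) + ",\"response\":" + response + "}";
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }
}
```

Any extra metadata (rate limit information, for example) would become further top-level fields alongside code and response.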

Error handling

Make use of the HTTP status codes appropriately for errors: 2xx status codes for successful requests, 3xx status codes for redirecting, 4xx codes for client errors and 5xx codes for server errors (you should avoid ever intentionally returning a 500 error code - these should be reserved for when unexpected things go wrong within your application).

I combine the status code with the consistent JSON format described above.
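
One way to keep 500s for the genuinely unexpected is to map the failures you do anticipate to specific 4xx codes in one place. A rough sketch, where the choice of exception types is my own assumption rather than any standard mapping:

```java
import java.util.NoSuchElementException;

// Hypothetical mapping from application failures to HTTP status codes.
final class ErrorStatus {
    static int statusFor(Throwable error) {
        if (error instanceof NoSuchElementException) {
            return 404; // an identified resource does not exist
        }
        if (error instanceof IllegalArgumentException) {
            return 400; // the client sent something we cannot accept
        }
        return 500; // anything we did not anticipate
    }

    // Names the family a status code belongs to, based on its first digit.
    static String familyOf(int status) {
        switch (status / 100) {
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "other";
        }
    }
}
```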

Google Shortening URLs

As another quick follow on note from the below, one of the features that was built in to the application was the ability to share your resume or particular achievements with your friends on Twitter. To do this, I obviously wanted to share a link back to the URL of the resume, so to maximise the potential additional text I investigated URL shortening.

There is a bit.ly API that uses OAuth, but for what I wanted to do, I decided that was overkill, as I didn't necessarily need to associate the shortened URLs to a user's bit.ly account; all I really cared about was getting a shortened URL.

Fortunately, Google came to the rescue with their goo.gl URL shortening service that also exposes a public API without need for authentication.

So I simply wrote a service class that utilised the Spring RestTemplate class to shorten URLs:

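import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.http.converter.json.MappingJacksonHttpMessageConverter;
import org.springframework.social.support.ClientHttpRequestFactorySelector;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service("urlShortenService")
public class UrlShortenService {

    private final RestTemplate restTemplate;

    // Configure a RestTemplate that can send and receive both plain strings and JSON.
    public UrlShortenService() {
        restTemplate = new RestTemplate(ClientHttpRequestFactorySelector.getRequestFactory());
        List<HttpMessageConverter<?>> messageConverters = new ArrayList<HttpMessageConverter<?>>();
        messageConverters.add(new StringHttpMessageConverter());
        messageConverters.add(new MappingJacksonHttpMessageConverter());
        restTemplate.setMessageConverters(messageConverters);
    }

    // POST the long URL as JSON; the "id" field of the response holds the short URL.
    @SuppressWarnings("unchecked")
    public String shortenUrl(String url) {
        Map<String, String> request = new HashMap<String, String>();
        request.put("longUrl", url);
        LinkedHashMap<String, String> shortUrl = restTemplate.postForObject("https://www.googleapis.com/urlshortener/v1/url", request, LinkedHashMap.class);
        return shortUrl.get("id");
    }
}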


I didn't worry too much about validating that the string passed in was a URL for the time being, as I always had control of that, but it is something that would need to be considered.
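
If I did add that validation, a minimal sketch might look like this, assuming we only want to shorten absolute http or https URLs. The UrlChecks class is hypothetical and the rules are deliberately strict; it relies only on java.net.URI from the standard library.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical pre-check to run before calling the shortening service.
final class UrlChecks {
    // Accepts only absolute http/https URLs that have a host.
    static boolean isShortenable(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all (e.g. it contains spaces).
            return false;
        }
    }
}
```

The shortenUrl method could then reject anything that fails this check before making the remote call.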