The joyless world of a data-less startup

I recently read "The Joyless World of Data-Driven Startups" over on medium.

It's a nice article, and there are plenty of good points made - but my concern with the article is that it champions data-less decisions without covering the times when data-less or gut-instinct decisions are not appropriate.  In many ways it mirrors some conversations I have had over the past year or two regarding the benefits of data and using the Build-Measure-Learn approach from the Lean methodology.

I actually think the article hits the nail on the head, but really this post is to address the fact that some people may read what they want from it (and use it as an excuse not to measure).  I think really, the main take away point from the article is this:

Data-driven innovation sucks.

That is really what is at the crux of the article, using data to drive innovation is as close to paint-by-numbers as you can get.  Think about it, innovation is by its very nature a creative process - its the act of creating something new, or something that tackles a particular domain or problem in a new and original way - obviously not something that can be done formulaic-ally by numbers.  Creating a new and exciting product has to be creative by nature. Creating a clone of an existing product/service is not (that's not to say that it wouldn't still be a viable business though!).

The problem is, in the past I have ended up in lots of conversations as a proponent of Lean and the B-M-L iterative approach to attempting to better understand the solution and problem domain and have faced this very argument - that its OK that we aren't measuring result correctly(if at all!), and its OK that we are releasing multiple changes, and pulling lots of levers, all at once (thereby making data we do measure difficult to tie back to single changes), because just look at Steve Jobs! He didn't bother about asking what users wanted! he just gave them his vision!

And that is the main distinction I would like to highlight - there is a big difference between using data to drive your creative/innovative decisions and using data to learn from your innovations.  Yes its OK to be creative and out-of-the-box with innovation and new product features - that's how we got so many amazing products that we have today - but you need to be ruthless in your learning and understanding of the data.

To the Steve Jobs example - yes, he was a visionary and he ignored data/user research when designing products - Jobs/Apple are famous for their approach of just giving customers what they decide they want, and with great success - but they still use the data.  You think if the iPhone was a dismal failure they would have ignored that data and not changed it? You think Apple haven't killed off products that haven't performed well?  These are fairly extreme examples, as in both cases the data is very visible (e.g. sales figures) that people might not think of as post-product learning, but really its the same thing.

Essentially, the bottom line is whilst data-driven innovation sucks, data-driven learning is essential.

That's my opinion, anyway.


Spring MVC - Page caching and If-Modified-Since

Spring (and in general Java/groovy etc) support a range of caching layers for your web application - with options for Hibernate first/second level cache, Spring's @Cacheable and simple integration with lots of providers (EHCache, Redis, etc), but aside from server side caching, you can also implement page level caching using the standard If-Modified-Since cache headers.

If your web application is sitting behind an apache web server, then using this mechanism, you can set it up such that Apache will do a lot of the work, and a lot of requests never even trouble our web application.  This will of course only work if you have fairly static pages that don't update to frequently, and aren't user specific (e.g. so its actually possible to cache at a web-page level - user account pages obviously are harder to cache at this level as they are very user specific).  Even without Apache in the mix, most modern browsers are built to handle If-Modified-Since headers and 304 response codes, so using the approach can mean that browsers are less eager to even request the page from your server whilst a user is browsing the site if we have said its cache-able.

Thankfully, the machinery for this stuff is all baked into Spring MVC.

Updating the RequestMappingHandlerAdapter

This walkthrough is assuming that your app is using standard @Controller annotations and a standard RequestMappingHandlerAdapter to route the requests - although there are still relatively easy mechanisms to do this if you are using other Controller/HandlerAdapter patterns.

The RequestMappingHandlerAdapter class has a method that can be overridden called getLastModifiedInternal() - This method simply returns a long value (the epoch time) that the requested resource was last modified. All we need to do is extend the RequestMappingHandlerAdapter and implement this method to return a timestamp.  For example:

The above assumes we have initialised a timestamp at startup (easy to do using Java config) and assumes that if you have visited the site since last app startup, then there is no change (in reality, we will likely need something more complicated to calculate this)

And that's really all we need, as Spring will handle the rest for us - from this, if a brand new request comes in with no If-Modified-Since headers, then Spring will take this date/time stamp and return it with our response as the Last-Modified date (HTTP standard header - this will inform the browsers/web servers action next time).  If however, a user has already visited your site and the browser has received a valid Last-Modified timestamp, then on the next request it will include this value in the request If-Modified-Since header, when Spring recieves this request, it compares the timestamp against the value that is returned from our getLastModifiedInternal() method, and if there has been no change then Spring will automatically return the response to the client with a 304 response - so none of the Controller code will be executed.

As you can probably guess, this can provide huge efficiencies and improvement on latency, server overhead etc.

More complex Last Modified Dates

As mentioned, in all likelyhood you will need a more sophisticated mechanism than just checking the application start time - So we have plenty of options here: we could query the DB for changes, we could have other flags/properties set for when particular resources are set (bare in mind that if static resources like CSS are changed, these need to be considered)

If you need to consider a fairly unique LastModified date for every endpoint, then this solution isn't a good fit, as this is the more generic approach - you can alternatively implement a similar last modified method on your (every) controllers.

Another option that I have considered is a halfway compromise - where I need endpoints to have specific considerations, but I only have a set of 4-5 (or relatively manageable) specific queries/checks that need to happen, and every endpoint will just need to check some subset of these conditions - this solution involves custom annotations and marking up Controller methods with this annotation to signal to our HandlerAdapter what checks should be considered for the Last Modified timestamp.  This has the advantage of relatively little intrusion in all my controller endpoints, but granular enough to provide enough control for effective page caching.

A custom annotation

First we define a simple annotation that can be used to mark up Controller methods to indicate which conditions are important to the endpoint:

The above code shows the annotation code, a sample enum (just to provide some type safety whilst using the annotation - this could be a string or anything else) and an example of the annotation on a controller endpoint.

Updating our Last Modified method

Now, in our last modified method, we have access to the Handler (e.g. the controller method being invoked) so we can quite simply check the controller method for our annotation, and if found we can just check the values said and perform the appropriate checks to work out what the last modified date needs to consider:

As mentioned, this won't work if there are lots of pages with lots of different requirements, but if you have a manageable of changing entities in your application this can strike a nice balance between clean code and flexibility.


Spring Boot - Tomcat error handling

The default (maybe even recommended?) behaviour for Spring Boot is to package your web application as a JAR with a bundled Tomcat - following the increasingly popular pattern of "Application Servers are Dead" - a more extreme version of one-app-per-app-server pattern I guess.  However, by and large I am still using Tomcat instances (although usually still one-app-per-server) so I am building WAR files for deployment.

So error handling.  Normally, in our web.xml we would define errorPages for response codes, exception types etc, that Tomcat would be able to forward any response code/exception that comes from your application back into an end-point within your application so you can display a custom error page and handle any additional logging etc.

Now for Spring Boot (and Spring 4 generally) there is a shift away from XML and a lot of effort has been put in to make it possible to create web apps with out a single XML file in sight.  This includes the error page mappings - so if you want to define custom mappings you can simply create an error configuration file as so:

Simple right? Now that all makes sense if we are running tomcat bundled with our app - we can use this configuration to configure Tomcat and everything would work as expected - but what about if we are deploying a built WAR file to a Tomcat instance? How can Spring be configuring Tomcat's error page mapping?  To me it just seemed too magical, but as you might expect, there is a simple answer:  ErrorPageFilter - This is just a default Filter that is added that intercepts all requests with error codes/exceptions and forwards it to the correct endpoint.  From the comment at the top of the Filter:

* A special {@link AbstractConfigurableEmbeddedServletContainer} for non-embedded

* applications (i.e. deployed WAR files). It registers error pages and handles

* application errors by filtering requests and forwarding to the error pages instead of

* letting the container handle them. Error pages are a feature of the servlet spec but

* there is no Java API for registering them in the spec. This filter works around that by

* accepting error page registrations from Spring Boot's

* {@link EmbeddedServletContainerCustomizer} (any beans of that type in the context will

* be applied to this container).

A simple solution that will work for standalone JARs or deployed WARs.


Thoughts on Spring Boot

 Having long been a fan of the Spring framework (Spring MVC, Spring Social etc), I have recently started using Spring Boot for two different projects I am currently working on.
In both cases, they are web applications - both with some key differences in technology (in terms of what I am using for client side libraries and data persistence).

Spring Boot is an opinionated implementation of Spring - and it works really nicely (most of the time).  You simply add the Spring Boot dependencies to your build file (Gradle or Maven both well supported) and you can have an application up and running with a very small amount of code (you may have seen the Spring boot application in a Tweet a while ago)

Now of course, in reality you do need other configuration stuff, but the take away point is that if you are happy with Spring opinions, you can get an application up and running in pretty quick time.  For me, this is really a turning point for Java/Spring productivity - especially as Groovy has become more mature and sits so easily in Spring Boot, I don't think there is much to the argument that Java/Spring is too slow for early stage companies/prototypes/rapid development process (Aside: I know grails has been around for a while, but I think Spring and Java have stepped up here, plus I honestly have my suspicions that Spring Boot and Grails may clash at some point - they have certainly been on a collision course since Spring Boot was announced at Spring One).

The way it works is quite simple really, you add a relevant dependency to your buildfile, for example:


The first one is just your standard web application Spring Boot dependency, and the second is the opinionated version of Thymeleaf configuration.  Then, at startup Spring Boot has lots of AutoConfiguration files that check for the existence of key classes in the classpath, and if present it executes the auto configuration.

See here for the Thymeleaf autoconfiguration - you will see that there is heavy use of the @ConditionalOnClass annotation - which checks for relevant Thymeleaf classes, and if they are on the classpath it configures them.

Undoubtedly, there are a few times where you find yourself scratching your head at the magic, and I have on more than one occasion had to go through the Spring source code.  And sometimes, you want most of the auto configuration, but just want to tweak one or two properties, and you are left having to turn off the autoconfig and do it yourself, but for me the biggest take away point is the speed at which it is now possible to get up and running with Spring/Java/Groovy and have a decent web platform for building your product or company.