
An overview: How to implement Google's Structured Data Schemas

As a result of working on web products (both personal and professional), I have become quite au fait with the technical side of SEO - the purely technical things you can do to put yourself in good standing. Really, these are just web best practices for being a good web citizen: things we should be doing to make the web better anyway (proper use of HTTP headers, mobile-friendly designs, good performance for people on slower connections, etc.).

One thing that goes a bit above and beyond that is the use of Google's Structured Data. I will talk about what it is and what it does below, but if you are dynamically building web pages (that is, your website isn't just static HTML on a web server, but is either rendered by a server-side application or is an API-driven JavaScript application), then you are most likely well placed to start implementing it easily and immediately.


1. What is Structured Data?

Google has defined a set of Structured Data schemas for websites - a vocabulary that lets you explicitly define the key data points on a given page. It's a sensible move by Google and a natural progression for their search engine.

Think about it: there are millions of websites out there made up of sprawling HTML code and content - whilst HTML is a standard (more or less!) across the web, there are probably millions of different ways people use it. It would be nice if everyone used the H1 and other heading tags consistently, or if everyone used the <em> tag the same way (emphasis vs italics), but the reality is they don't - some sites use HTML as intended, but many, many more undoubtedly just rely on <span> or <div> tags combined with CSS classes to re-define every element they might need.

This is all fine: Google is smart enough to pull out the content for indexing - yes, if you use span elements with custom styling for headings rather than the H1+ tags then Google will penalise you, but it won't stop Google reading and indexing the site. What's more, it's getting smarter all the time - I'd probably back Google to be able to pull out relevant snippets or question/answer pairs directly in a fairly reliable way. However, they are Google, and much like the Microsoft/IE of the 90s they have the dominant market share, so they can define their own standards for the web if they want to. That's exactly what Structured Data is.

It's Google saying: 

Hey, if you provide some data in your code that looks like this, then we will read that, so we don't have to guess or work stuff out. Or we can just keep trying to work it out from your HTML content... it's your call


As mentioned earlier, if you have control over the source code and the data on your website, then this is really powerful. You can define specific (keyword-heavy) snippets of content and explicitly tell Google about them - what's more, the Structured Data schema lets you define content such as FAQs or how-to directions, so you can go beyond keyword-heavy snippets and actually create FAQs for questions that you have identified from Google search trends (or whatever SEO tools you might use).

Hopefully you get the picture by now - this is a pretty powerful tool.


2. Schema tags: FAQ and HowTo

Two specific parts of the Structured Data Schema stood out to me as quite universally useful for websites, and also produce decent tangible results: FAQ schema and HowTo schema.

  • FAQ schema allows high-level Q&A points to be provided in the metadata for Google - generally useful, as most sites will have some element of their pages that could be presented as FAQs

  • HowTo schema allows step-by-step how-to guides - less widely applicable, but if you have any website that provides how-to guides or anything with instructions, this is applicable.

What exactly do these tags do, and why do we care? Well, as well as trying to win favour with the all-seeing Google search bot, if it gets picked up it also means we get more search real estate and increased accessibility to our content, which should increase the chance of click-through conversion.

If you have ever seen search results like this:


These points are pulled out by Google from the site's schema tags - if you can automate the creation and inclusion of these for your site, then you are in a pretty good position to improve your SEO (relatively few sites implement these schemas so far).


3. Automating your schema tags

As I have mentioned a couple of times, and as you have hopefully realised, if you have a dynamic website then you are likely already taking structured data (from your database, for example, so reliably structured!) and building HTML pages - either server-side, or by sending the data as JSON to a JavaScript app for page creation client-side. Either way, we are starting off with structured data, and Google's Structured Data wants... you got it, structured data! So if you have the content, it really is a simple, generic transform between structured data formats.

Below is some example code - it is based on Jekyll, as that's what my most recent personal project has been, but it's also pretty close to pseudocode, so hopefully you can easily translate it to whatever tech you use:
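For instance, a FAQ block in a Jekyll layout might look something like this - the page.faqs front-matter field is an illustrative assumption, so plug in wherever your question/answer data actually lives:

{% if page.faqs %}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {% for faq in page.faqs %}
    {
      "@type": "Question",
      "name": {{ faq.question | jsonify }},
      "acceptedAnswer": {
        "@type": "Answer",
        "text": {{ faq.answer | jsonify }}
      }
    }{% unless forloop.last %},{% endunless %}
    {% endfor %}
  ]
}
</script>
{% endif %}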

As you can see, it's a simple JSON-based data structure, and you just fill in the relevant parts of the object with your data.

You can see the full code in my Jekyll food website over on GitHub - likewise, you can see the end result in action on the website (hosted by GitHub Pages, of course). The project was a food-science site covering science and recipes, so a perfect match for FAQ (science pages) and HowTo (recipe pages). For example, if you go to a chilli recipe page (which naturally has structured data for step-by-step instructions) and view the page source, you will see the JSON schema at the top of the page, using the HowTo schema elements to lay out the resources required and then the steps. Likewise, on the science of humidity in cooking page, you will see the JSON schema with the FAQ in the page source:









4. Conclusion

If you have control of the data being loaded onto the page (a custom site - something like WordPress or another off-the-shelf platform might make this harder, but at the same time there are undoubtedly lots of plugins for your given platform that already handle this for you, which makes it even easier!), and you are even vaguely interested in increasing organic traffic and search ranking, then I'd recommend implementing this. It's very easy to add (it's also primarily behind-the-scenes tech - it doesn't impact the visuals or design of the pages it's added to) and can only be beneficial.

As with anything aimed at improving SEO, it's always hard to measure and get concrete details on, but it's relatively low cost to add, with minimal ongoing maintenance, so I think it's worth a shot.


If you have had experiences with Google's Structured data, good or bad, then I'd love to hear about it!

Generic Programming with Scala & Shapeless part 2

Last year I spent some time playing with, and writing about, Scala & Shapeless - walking through the simple example of generating random test data for a case class.

Recently, I have played some more with Shapeless, this time with the goal of generating React (JavaScript) components for case classes. It was a very similar exercise, but this time I made use of the LabelledGeneric object so I could access the field names - so I thought I'd revisit here and talk a bit about some of the internals of what is going on.


Getting started

As before, I had to define implicits for the simple types I wanted to be able to handle, and the starting point is of course accepting a case class as input.
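The entry point looks something along these lines - this is a sketch rather than the exact code from the project: ComponentGenerator is my type class for producing the component output, ReactComponent here is just a stand-in for whatever the real output type is, and the third, React-specific implicit mentioned below is omitted:

import shapeless.{HList, LabelledGeneric}

// Stand-in for the real output type of the generated component code
trait ReactComponent

// Type class that knows how to build a React component for some representation A
trait ComponentGenerator[A] {
  def generate(a: A): ReactComponent
}

def caseClassToGenerator[A, Repr <: HList](a: A)(
    implicit generic: LabelledGeneric.Aux[A, Repr],
    gen: ComponentGenerator[Repr]): ReactComponent =
  gen.generate(generic.to(a))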

So there are a few interesting things going on here:

First of all, the method is parameterised with two types, caseClassToGenerator[A, Repr <: HList]: A is simply going to be our case class type, and Repr is going to be a Shapeless HList.

Next up, we are expecting several implicit method arguments (we will ignore the third implicit for now - that is an implicit I am using purely for the React side of things, and it can be skipped if the method handles everything itself):

implicit generic: LabelledGeneric.Aux[A, Repr], gen: ComponentGenerator[Repr],

Now, as this method's purpose is to handle the input of a case class, and as we are using Shapeless, we want to make sure that from that starting input we can transform it into a HList, so we can then deal with the fields one by one (in other words, this is the first step in converting a case class to a generic list that we can then handle element by element). In this setting, the second implicit argument is asking the compiler to check that we have also defined an appropriate ComponentGenerator (my custom type class for generating React components) that can handle the generic HList representation (it's no good being able to convert the case class to its generic representation if we then have no means to actually process a generic HList).

Straightforward so far?

The first implicit argument is a bit more interesting. Functionally, all LabelledGeneric.Aux[A, Repr] is doing is asking the compiler to make sure we have an implicit LabelledGeneric instance that can handle converting between our parameter A (the case class input type) and Repr (the HList representation). This implicit means that if we try to pass some type A to this method, the compiler will check that we have a Shapeless LabelledGeneric that can handle it - if not, we will get a compile error.

But things get more interesting if we look more at what the .Aux is doing!


Path dependent types & the Aux pattern

The best way to work out what is going on is to just jump into the Shapeless code and have a dig. I will use Generic as an example, as it's the simpler case, but it's the same for LabelledGeneric:
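Here is the Generic trait, lightly simplified from the Shapeless source:

trait Generic[T] {
  // The generic representation type for T
  type Repr

  // Convert an instance of the concrete type to the generic representation
  def to(t: T): Repr

  // Convert an instance of the generic representation back to the concrete type
  def from(r: Repr): T
}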

That's a lot simpler than I expected to find, to be honest, but as you can see from the above, there are two types involved: the trait parameter T and the inner type Repr, and the Generic trait is just concerned with converting between these two types.

The inner type, Repr, is what is called a path dependent type in Scala. That is, the type is dependent on the actual instance of the enclosing trait or class. This is a powerful mechanism in Scala (but one that can also catch you out if you are in the habit of defining classes etc. within other classes or traits). It is an important detail for our Generic here: it could be given any parameter T, so the corresponding HList could be anything, but this makes sure it must match the given case class T - that is, the Repr is dependent on what T is.

To try and get our head around it, let's take a look at an example:
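For example, in a REPL session (Person here is just an illustrative case class, and the output is trimmed):

scala> case class Person(name: String, age: Int, living: Boolean)
defined class Person

scala> val gen = Generic[Person]
gen: shapeless.Generic[Person]{type Repr = String :: Int :: Boolean :: shapeless.HNil} = ...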

Cool - so, as we expected, we can see in our Generic example that the type Repr has been defined to match the HList representation of our case class. It makes sense that we want the transformed output HList to have its own specific type (based on whatever input it was transforming), but it would be a real pain to have to actually define that as a type parameter on the class along with our case class type, so it uses this path-dependent type approach.

So, we still haven't got any closer to what this Aux type is doing, so let's dig into that..


The Aux Pattern

We can see from our code that Aux takes two type parameters: firstly A, which is the parameter that our Generic will take, but it also takes the parameter Repr - which we know (or at least can guess) corresponds to the path dependent type that is defined nested inside the Generic trait.

The best way to work out what is going on is to take a look at the Shapeless code!
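Here is the relevant part of the Generic companion object, trimmed down (the implicit macro that actually materialises instances is omitted):

object Generic {
  type Aux[T, Repr0] = Generic[T] { type Repr = Repr0 }

  // Summon the Generic for T, preserving its Repr type in the return type
  def apply[T](implicit gen: Generic[T]): Aux[T, gen.Repr] = gen
}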

As we can see, the Aux type (defined within the Generic object) is just an alias for a Generic[T] whose inner path-dependent type is pinned to Repr - they have a pretty decent explanation of what is going on in the comments, so I will reproduce that here:

(that is abbreviated for the more relevant bits - they have even more detail in the comments that can be read).

That pretty nicely sums up the Aux pattern - it allows us to essentially promote the result of a type-level computation to a type parameter. It can be used for a variety of things where we want to reason about path dependent types, but this is a common use of the pattern.



So that's all I wanted to get into for now - you can see the code here, and hopefully with this overview, and the earlier Shapeless overview, you can get an understanding of what the LabelledGeneric stuff is doing and how Shapeless is helping me generate React components.

Generic programming in Scala with Shapeless

In my last post about evolutionary computing, I mentioned I started building the project partly just so I could have a play with Shapeless. Shapeless is a generic programming library for Scala, which is growing in popularity - but can be fairly complicated to get started with.

I had used Shapeless from a distance up until this point - using libraries like Circe or Spray-Json-Shapeless that use Shapeless under the hood to do stuff like JSON de/serialisation without the boilerplate overhead of having to define serialisers for all the different case classes/sealed traits that we want to serialise.

You can read the previous post for more details (and the code) if you want to understand exactly what the goal was, but to simplify: the first thing I needed to achieve was, for a given case class, to generate a brand new instance.

Now I wanted the client of the library to be able to pass in any case class they so wished, so I needed to be able to generically inspect all attributes of any given case class* and then decide how to generate a new instance for it.

Thinking about this problem in traditional JVM terms, you may be thinking you could use reflection - which is certainly one option, although some people still have a strong dislike for reflection on the JVM, and it also has the downside that it's not type safe - that is, if someone passes in some attribute that you don't support (a DateTime, UUID, etc.) then it has to be handled at runtime (which also means more exception handling code, as well as losing the compile-time safety). Likewise, you could just ask clients of the code to pass in a Map of the attributes, which dodges the need for reflection but has no type safety either.



Enter Shapeless

Shapeless allows us to represent case classes and sealed traits in a generic structure, that can be easily inspected, transformed and put into new classes.

A case class, product and HList walk into a bar

A case class can be thought of as a product (in the functional programming, algebraic data type sense). That is, case class Person(name: String, age: Int, living: Boolean) is the product of name AND age AND living - every time you have an instance of that case class you will always have an instance of name AND age AND living. Great. As well as being a product, the case class in this example also carries semantic meaning in the type itself - for this instance, as well as having these three attributes, we also know an additional piece of information: that this is a Person. This is of course super helpful, and kinda central to the idea of a type system!

But maybe sometimes we want to be able to say, generically (but still type safely!), "I have some code that doesn't care about the semantics of what something is, but if it has three attributes of String, Int, Boolean, then I can handle it" - and that's what Shapeless provides. It provides a structure called a HList (a Heterogeneous List) that allows you to define a type as a list of different types, but in a compile-time checked way.

Rather than a standard List in Scala (or Java), where we define the single type that the list will contain, with a HList we can define the specific types at each position in the list, for example:
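A minimal sketch (the values are arbitrary):

import shapeless._

// An HList typed as exactly String :: Int :: Boolean :: HNil
val details: String :: Int :: Boolean :: HNil = "Dave" :: 38 :: true :: HNil

// This would not compile, as the types are in the wrong order:
// val wrong: String :: Int :: Boolean :: HNil = 38 :: "Dave" :: true :: HNil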
The above example allows us to define our HList with specific types, which provides our compile-time safety - the :: notation is some syntactic sugar provided by Shapeless to mirror normal Scala List behaviour, and HNil is the type for an empty HList.

Case class to HList and back again

Ok, cool - we have a way to explicitly define very specific heterogeneous lists of types - how can this help us? We might not want to be explicitly defining these lists everywhere. Once again Shapeless steps in and provides a type class called Generic.

The Generic type class allows conversion of a case class to a HList, and back again, for example:
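A sketch of summoning one, using the Person case class from above:

import shapeless._

case class Person(name: String, age: Int, living: Boolean)

// Shapeless derives a Generic instance for Person at compile time
val personGen = Generic[Person]
// personGen.Repr is String :: Int :: Boolean :: HNil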
Through compile-time macros, Shapeless provides these Generic instances for any case class. We can then use them to convert to and from case classes and HLists:
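Continuing the sketch above:

val dave = Person("Dave", 38, living = true)

val hlist = personGen.to(dave)     // "Dave" :: 38 :: true :: HNil
val dave2 = personGen.from(hlist)  // Person("Dave", 38, true)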



Back to the problem at hand

So, we now have the means to convert any given case class to a HList (that can be inspected, traversed, etc.) - which is great, as this means we can allow our client to pass in any case class and we just need to be able to handle generating to and from a HList.

To start with, we will define our type class for generation:
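Something like this (a sketch of the shape of the type class):

// Type class for generating a new instance of some type A
trait Generator[A] {
  def generate: A
}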

Now, in the companion object we will define a helper method that can be called without needing to pass anything into scope explicitly - the necessary Generator instance will be provided implicitly (and if there isn't an appropriate implicit, then we get our compile-time error, which is the desired behaviour):
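A sketch of that helper:

object Generator {
  // Summon the Generator for A from implicit scope and use it to build a new instance
  def generate[A](implicit gen: Generator[A]): A = gen.generate
}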

As you can see, we can just call the helper method with an appropriate type argument, and as long as there is an implicit Generator in scope for that type, it will invoke that generate method and *hopefully* generate a new instance of that type.

Next, we need to define some specific generators. For now, I will just add the implicits to support some basic types:
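Roughly like so (the random ranges are arbitrary choices):

import scala.util.Random

implicit val intGenerator: Generator[Int] = new Generator[Int] {
  override def generate: Int = Random.nextInt(100)
}

implicit val doubleGenerator: Generator[Double] = new Generator[Double] {
  override def generate: Double = Random.nextDouble()
}

implicit val booleanGenerator: Generator[Boolean] = new Generator[Boolean] {
  override def generate: Boolean = Random.nextBoolean()
}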
All simple so far - if we call Generator.generate[Int] then it gets the implicit Generator[Int] and generates a random Int (not that useful, perhaps, but it works!).

Now comes the part where our generate method can handle a case class being passed in, and subsequently the Shapeless HList representation of a case class (we will recursively traverse the HList structure using appropriate implicit generators). First up, let's look at how we can implicitly handle a case class being passed in - at first this may start to look daunting, but it's really not that bad.. honestly:
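A sketch of that implicit:

import shapeless._

implicit def caseClassGenerator[T, L <: HList](
    implicit generic: Generic.Aux[T, L],
    lGen: Generator[L]): Generator[T] =
  new Generator[T] {
    // Generate the generic HList representation, then convert it back to the case class
    override def generate: T = generic.from(lGen.generate)
  }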
So, what is going on there? The implicit is bound by two types, [T, L <: HList], which are then used in the implicit arguments passed into the method: implicit generic: Generic.Aux[T, L] and lGen: Generator[L]. This matches all types T for which there exists in scope a Shapeless Generic instance that can convert T into some HList L, and for which L an implicit Generator exists. Now, we know from our earlier look at Shapeless that it will provide a Generic instance for any case class (converting it into a HList), so this implicit will be used whenever we pass in a case class instance, as long as we have an implicit Generator in scope that can handle a HList - so let's add that next!

Let's start with the base case: as we saw earlier, HNil is the type for an empty HList, so let's start with that one:
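A sketch:

implicit val hnilGenerator: Generator[HNil] = new Generator[HNil] {
  override def generate: HNil = HNil
}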
Remember, the goal, from a given case class (and therefore a given HList), is to generate a new one, so for an empty HList we just want to return the same.

Next we need to handle a generator for a non-empty HList:
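A sketch:

implicit def hlistGenerator[H, T <: HList](
    implicit hGen: Generator[H],
    tGen: Generator[T]): Generator[H :: T] =
  new Generator[H :: T] {
    // Generate a head value and recursively generate the rest of the list
    override def generate: H :: T = hGen.generate :: tGen.generate
  }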
This case is also relatively simple. Because our HList is essentially like a linked list, we handle it recursively - the type parameters [H, T <: HList] represent the type of the head of the HList (probably a simple type, in which case it will resolve the implicit using our original Int/Boolean/Double generators) and the tail of the HList, which will be another HList (possibly HNil if we are at the end of the list, otherwise this will resolve recursively to this generator again). We grab the implicit Generator for the head and the tail, then simply call generate on both - and that's really all there is to it! We have defined the implicit for the top-level case class (which converts it to the generic HList and then calls generate on that), we have the implicit to recursively traverse the HList structure, and finally we have the implicits to handle the simple types that might be in our case class.

Now we can simply define a case class, pass it to our helper method and it will generate a brand new instance for us:
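For example (Candidate is just an illustrative case class using the basic types we defined generators for above):

case class Candidate(weight: Double, iterations: Int, active: Boolean)

// Resolves caseClassGenerator, then the HList generators, then the simple-type generators
val candidate: Candidate = Generator.generate[Candidate]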

I was using this approach to generate candidates for my evolutionary algorithm, however the exact same approach could be used easily to generate test data (or much more). I used the same approach documented here to later transform instances of case classes as well.


Here is the completed code put together:
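(A consolidated sketch of the pieces above - remember, the simple-type generators and the example case class are illustrative; the real code is on GitHub:)

import scala.util.Random
import shapeless._

trait Generator[A] {
  def generate: A
}

object Generator {
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // Simple type generators
  implicit val intGenerator: Generator[Int] = new Generator[Int] {
    override def generate: Int = Random.nextInt(100)
  }
  implicit val doubleGenerator: Generator[Double] = new Generator[Double] {
    override def generate: Double = Random.nextDouble()
  }
  implicit val booleanGenerator: Generator[Boolean] = new Generator[Boolean] {
    override def generate: Boolean = Random.nextBoolean()
  }

  // Base case: an empty HList
  implicit val hnilGenerator: Generator[HNil] = new Generator[HNil] {
    override def generate: HNil = HNil
  }

  // Recursive case: generate the head, then the rest of the list
  implicit def hlistGenerator[H, T <: HList](
      implicit hGen: Generator[H],
      tGen: Generator[T]): Generator[H :: T] =
    new Generator[H :: T] {
      override def generate: H :: T = hGen.generate :: tGen.generate
    }

  // Top level: convert the case class to its generic representation and generate that
  implicit def caseClassGenerator[T, L <: HList](
      implicit generic: Generic.Aux[T, L],
      lGen: Generator[L]): Generator[T] =
    new Generator[T] {
      override def generate: T = generic.from(lGen.generate)
    }
}

object Example extends App {
  case class Candidate(weight: Double, iterations: Int, active: Boolean)
  println(Generator.generate[Candidate])
}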


Footnote

I have mentioned so far that Shapeless provides the means to transform case classes and sealed traits to generic structures, but have only really talked about case classes. A sealed trait is a slightly different shape to a case class (it is analogous to the algebraic data type co-product) and accordingly, Shapeless has a different structure to handle it: rather than a HList it has a type called Coproduct. The above code can be extended to also handle sealed traits - to do so, you just need to add the implicit to handle the empty coproduct (CNil) and recursively handle the Coproduct.

Unsupervised Learning in Scala using Word2Vec

A pretty cool thing that has come out of recent machine learning advancements is the idea of "Word Embedding", specifically the advancements in the field made by Tomas Mikolov and his team at Google with the Word2Vec approach. Word Embedding is a language modelling approach that involves mapping words to vectors of numbers. If you imagine we are modelling every word in a given body of text as an N-dimension vector (it might be easier to visualise this as 2 dimensions - so each word is a pair of co-ordinates that can be plotted on a graph), then that could be useful for plotting words and starting to understand relationships between words given their proximity. What's more, if we can map words to sets of numbers, then we can start thinking about interesting arithmetic that we could perform on the words.

Sounds cool, right? Now of course, the tricky bit is how can you convert a word to a vector of numbers in such a way that it encapsulates the details behind this relationship? And how can we do it without painstaking manual work and trying to somehow indicate semantic relationships and meaning in the words?


Unsupervised Learning

Word2Vec relies on neural networks and trains on a large, un-labelled piece of text in a technique known as "unsupervised" learning.

Contrary to the last neural network I discussed, which was a "supervised" exercise (i.e. for every input record we had the expected output/answer), Word2Vec uses a completely "unsupervised" approach - in other words, the neural network simply takes a massive block of text with no markup or labels (broken into sentences or lines, usually) and then uses that to train itself.

This kind of unsupervised learning can seem a little unbelievable at first - getting your head around the idea that a network could train itself without even knowing the "answers" seemed a little strange to me the first time I heard the concept, especially as a fundamental requirement for a NN to converge on an optimum solution is a "cost function" (i.e. something we can use after each feed-forward step to tell us how right we are, and whether our NN is heading in the right direction).

But really, if we think back to the literal biological comparison with the brain, as people we learn through this unsupervised approach all the time - it's basically trial and error.


It's child's play

Imagine a toddler attempting to learn to use a smartphone or tablet: they likely don't get shown explicitly to press an icon, or to swipe to unlock, but they try combinations of power buttons, volume controls and swiping, see what happens (and whether it does what they are ultimately trying to do), and get feedback from the device - not direct feedback about what the correct gesture is, or how wrong they were, just the feedback that it doesn't do what they want. And if you have ever lived with a toddler who has got to grips with touchscreens, you may have noticed that when they then encounter a TV or laptop, they instinctively attempt to touch or swipe the things on the screen that they want (in NN terms this would be known as "overfitting" - they have trained on too specific a set of data, so are poor at generalising - luckily, the introduction of a non-touch screen such as a TV expands their training set and they continue to improve, getting better at generalising!).

So, this is basically how Word2Vec works. Which is pretty amazing if you think about it (well, I think it's neat).


Word2Vec approaches

So how does this apply to Word2Vec? Well, just like a smartphone gives implicit, indirect feedback to a toddler, the input data can provide feedback to itself. There are broadly two techniques for training the network:

Continuous Bag of Words (CBOW)

So, our NN has a large body of text broken up into sentences/lines - and just like in our last NN example, we take the first row from the training set, but we don't just push the whole sentence into the NN (after all, sentences are variable length, which would confuse our input neurons). Instead we take a set number of words - referred to as the "window size", let's say 5 - and feed those into the network. In this approach, the goal is for the NN to try and correctly guess the middle word in that window - that is, given a phrase of 5 words, the NN attempts to guess the word at position 3.

[It was ___ of those] days, not much to do

So it's unsupervised learning, as we haven't had to go through any data and label things, or do any additional pre-processing - we can simply feed in any large body of text and it can just try to guess the words given their context.

Skip-gram

The Skip-gram approach is similar, but the inverse - that is, given the word at position n, it attempts to guess the words at position n-2, n-1, n+1, n+2.

[__ ___ one __ _____] days, not much to do

The network is trying to work out which word(s) are missing, and just looks to the data itself to see if it can guess it correctly.


Word2Vec with DeepLearning4J

One popular deep-learning & Word2Vec implementation on the JVM is DeepLearning4J. It is pretty simple to use to get a feel for what is going on, and is pretty well documented (along with some good high-level overviews of some core topics). You can get up and running, playing with the library and some example datasets, pretty quickly by following their guide. Their NN setup is equally simple and worth playing with; their MNIST hello-world tutorial lets you get up and running with that dataset pretty quickly.

Food2Vec

A little while ago, I wrote a web crawler for the BBC food recipe archive, so I happened to have several thousand recipes sitting around and thought it might be fun to feed those recipes into Word2Vec to see if it could give any interesting results or if it was any good at recommending food pairings based on the semantic features the Word2Vec NN extracts from the data.

The first thing I tried was just using the ingredient list as a sentence - hoping that this would be better for extracting the relationships between ingredients - with each complete list of ingredients being input as a single sentence. My hope was that if I queried the trained model for "X is to beef as rosemary is to lamb", I would start to get some interesting results - or at least be able to enter an ingredient and get back similar ingredients to help identify possible substitutions.

As you can see, it has managed to extract some meaning from the data - for both pork and lamb, the nearest words do seem to be related to the target word, but not so closely that it could really be useful. Although this in itself is pretty exciting - it has taken an un-labelled body of text and has been able to learn some pretty accurate relationships between words.

Actually, on reflection, a list of ingredients isn't that great an input, as it isn't a natural sentence structure and there is no natural ordering of the words - a lot of meaning is captured in phrases rather than just lists of words.

So next up, I used the instructions for the recipes - each step in the recipe became a sentence for input, and minimal cleanup was needed. However, even with some basic tweaking (it's quite possible that if I played more with the Word2Vec configuration I could have got some improved results), the results weren't really much better, and for the same lamb & pork search this was the output:

Again, it's still impressive to see that some meaning has been found from these words, but is it better than the raw ingredient list? I think not - the pork one seems wrong, as it seems to have very much aligned pork with poultry (although maybe that is some meaningful insight that conventional wisdom just hasn't taught us yet!?).

Arithmetic

Whilst this is pretty cool, there is further fun to be had - in the form of simple arithmetic. A simple, often-quoted example is the case of countries and their capital cities - well-trained Word2Vec models have countries and their capital cities equal distances apart:

(graph taken from DeepLearning4J Word2Vec intro)

So could we extract similar relationships between foodstuffs? The short answer, with the models trained so far, was: kind of..

Word2Vec supports the idea of positive and negative matches when looking for nearest words - that allows you to find these kinds of relationships. So what we are looking for is something like "X is to lamb as thigh is to chicken" (i.e. hopefully this should find a part of the lamb), and hopefully we can use this to extract further information about ingredient relationships that could be useful in thinking about food.
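With DL4J, that kind of positive/negative query looks roughly like this (vec here is the trained Word2Vec model set up later in the post; the exact words and result count are just for illustration):

import scala.collection.JavaConverters._

// "X is to lamb as thigh is to chicken": lamb and thigh are positive terms, chicken is negative
val lambEquivalents = vec.wordsNearest(List("lamb", "thigh").asJava, List("chicken").asJava, 5)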

So, I ran that arithmetic against my two models.
The instructions-based model returned the following output:

Which is a pretty good effort - I think if I had to name a lamb equivalent of chicken thigh, lamb shank is probably what I would have gone for (top of the leg, both pieces of slow-twitch muscle, and both the more gamey, flavourful pieces of the animal - I will stop there, as we are getting into food-nerd territory).

I also ran the same query against the ingredients-based model (which, remember, did better on the basic nearest-words test):

Which, interestingly, doesn't seem as good. It has shin, which isn't too bad insofar as it's the leg of the animal, but not quite as good a match as the previous one.


Let us play

Once you have the input data, Word2Vec is super easy to get up and running. As always, the code is on GitHub if you want to see the build setup (I did have to fudge some dependencies and exclude some stuff to get it running on Ubuntu - you may get errors about javacpp or jnind4j not being available - but the build file has the required workarounds in place), but the interesting bit is as follows:
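A sketch of that setup using the DL4J API (the file name, stop words and exact configuration values are illustrative - tune them for your own data):

import java.io.File
import scala.collection.JavaConverters._

import org.deeplearning4j.models.word2vec.Word2Vec
import org.deeplearning4j.text.sentenceiterator.LineSentenceIterator
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory

// Measurement-type words we don't want to carry meaning
val stopWords = List("g", "kg", "ml", "tsp", "tbsp", "cup", "cups").asJava

// One recipe step (or ingredient list) per line in the input file
val sentenceIterator = new LineSentenceIterator(new File("recipes.txt"))

val tokenizerFactory = new DefaultTokenizerFactory()
tokenizerFactory.setTokenPreProcessor(new CommonPreprocessor())

val vec = new Word2Vec.Builder()
  .stopWords(stopWords)
  .minWordFrequency(5)   // ignore words that appear fewer than 5 times
  .iterations(1)         // training cycles over the data
  .layerSize(300)        // 300-dimension vector per word
  .seed(42)              // fixed seed for repeatable results
  .windowSize(5)         // number of words fed into the NN each time
  .iterate(sentenceIterator)
  .tokenizerFactory(tokenizerFactory)
  .build()

vec.fit()

// Query the trained model for the nearest words to a given ingredient
val nearestToLamb = vec.wordsNearest("lamb", 10)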
If we run through what we are setting up here:

  1. Stop words - these are words we know we want to ignore -  I originally ruled these out as I didn't want measurements of ingredients to take too much meaning. 
  2. Line iterator and tokenizer - these are just core DL4J classes that will take care of processing the text line by line, word by word. This makes things much easier for us, so we don't have to worry about that stuff
  3. Min word frequency - this is the threshold for words to be interesting to us - if a word appears less than this number of times in the text then we don't include the mapping (as we aren't confident we have a strong enough signal for it)
  4. Iterations - how many training cycles are we going to loop for
  5. Layer size - this is the size of the vector that we will produce for each word - in this case we are saying we want to map each word to a 300-dimension vector. You can consider each element of the vector a "feature" of the word that is being learnt; this is a part of the network that will really need to be tuned for each specific problem
  6. Seed - this is just used to "seed" the random numbers used in the network setup, setting this helps us get more repeatable results
  7. Window size - this is the number of words to use as input to our NN each time - relates to the CBOW/Skip-gram approaches described above.

And that's all you need to really get your first Word2Vec model up and running! So find some interesting data, load it in and start seeing what interesting stuff you can find.

So go have fun - try and find some interesting data sets of text stuff you can feed in and what you can work out about the relationships - and feel free to comment here with anything interesting you find.

Android: Building a cloud based quiz application

A long time ago, when Android was still in its infancy (1.5 I think..) I built and open sourced a basic quiz app.  The app was just a basic multiple choice question-answer app, driven from some questions in the database, but it did ok - it has had over 10k downloads on the app store, and the blog post tutorial here is still one of the most popular articles.

But here we are, Android 5.0 is released and the state of Android development is very different now. I didn't really want to just re-skin/tweak the old app and push it out again - and I also wanted to write up some notes on using parse.com as a backend service - so this seemed like a good opportunity.

The source code for the app is all on GitHub.


The goal

So the aim is to create an Android quiz game app, but rather than using local storage, using the cloud to power the questions. This avoids the need for lots of boilerplate DB code and also makes it easier for us to update questions. The tutorial will be broken into two parts - the first part will cover the basic quiz with cloud-powered questions, and the second part will enhance the app to support user login and track scores, to allow users to compete against each other.


Parse

Before you start the tutorial, you need to get an account set up at parse.com - it's a cloud DB/backend-as-a-service that was recently bought by Facebook. They allow really easy setup of DBs and provide a load of nice libraries/APIs to interact with their endpoints across lots of platforms (it's also incredibly well priced - the free tier is really good, and if you find your mobile app is going beyond that then you can probably think about monetising the app!). All you need to do is head over there, sign up and create a new app - just give it a name and hey presto! You can either make a note of the keys then, or come back and grab them later. There are no other changes you need to make there, as everything else is handled from our source code. The only other thing to do is to download the Parse Android library so you can make use of their SDK (if you have grabbed the source code from GitHub then you will not have to worry about this).



OK, let's get started!

Again, I am not going to spend a lot of time covering the real basics of Android - I will only really mention the things of note or that differ from standard application development - hopefully the code & general explanation will be clear enough to give an understanding of what is going on.


Android manifest
First, let's get our AndroidManifest.xml file configured. The only things to note here are the permissions we are setting - we will request internet access and network state permissions. Also worth noting that I have set the min SDK for my sample app at version 16.
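The relevant fragment of the manifest looks something like this:

<uses-sdk android:minSdkVersion="16" />

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />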


Our Application class
We will have to create a custom implementation of the Android Application class. This class is instantiated on application startup, and if you are looking at Android development you should hopefully be familiar with it. We will do a couple of things in this class:

  1. Register our parse.com application with our secret keys
  2. Initialise the Parse library and our domain objects
  3. Try to fetch all the questions for the quiz and store them for offline usage 
  4. Create a GamePlay object, that will keep track of the state of the current game in progress
First let's look at the Parse setup - this is standard Parse boilerplate and is covered in the Parse docs and sample apps - you just need to add your ID/key here (also note that we have registered the Parse object subclass Question - this is our domain object, like a JPA entity etc. - if we add more domain objects they need to be added here too):
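A sketch of that setup - the class name QuizApplication and the placeholder keys are illustrative; use your own application ID and client key from the Parse dashboard:

import android.app.Application;

import com.parse.Parse;
import com.parse.ParseObject;

// Remember to reference this class via android:name on <application> in the manifest
public class QuizApplication extends Application {

    @Override
    public void onCreate() {
        super.onCreate();

        // Register our domain objects before initialising Parse
        ParseObject.registerSubclass(Question.class);

        // Enable the local datastore so we can "pin" questions for offline use
        Parse.enableLocalDatastore(this);

        // Application ID and client key from the parse.com dashboard
        Parse.initialize(this, "YOUR_APPLICATION_ID", "YOUR_CLIENT_KEY");

        // Fetch and pin the questions - shown in the next snippet
        loadQuestions();
    }
}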

Next we will make a call to parse.com to fetch the questions from our cloud API - we will fetch them in the background (an asynchronous call) and "pin" them to make them available for offline usage. Also note that we do not un-pin the existing questions until we have successfully fetched new ones - that way users should always have questions once they have successfully loaded them the first time.
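A sketch of that fetch-and-pin logic, as a method on the Application class above (the "questions" pin label is an illustrative choice; it needs the com.parse.ParseQuery, FindCallback, DeleteCallback and ParseException imports):

private void loadQuestions() {
    ParseQuery<Question> query = ParseQuery.getQuery(Question.class);
    query.findInBackground(new FindCallback<Question>() {
        @Override
        public void done(final List<Question> questions, ParseException e) {
            if (e == null && questions != null && !questions.isEmpty()) {
                // Only un-pin the old questions once we have successfully fetched new ones
                ParseObject.unpinAllInBackground("questions", new DeleteCallback() {
                    @Override
                    public void done(ParseException unpinError) {
                        ParseObject.pinAllInBackground("questions", questions);
                    }
                });
            }
        }
    });
}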
Hopefully the above is quite clear - the Parse libraries are quite straightforward to understand - we create a query (typed to Question), call findInBackground and implement an on-success handler.


Domain objects: Question
The Parse library provides a nice way to create POJOs that model your domain - if you are familiar with JPA/Hibernate etc. and the approach of POJOs representing a domain model, it's much like that. Using these classes you can easily query/load/save data from the cloud. You will have spotted that the query we use in the Application class to load all our questions just runs a plain query with the Question class - this, as you would expect, will simply retrieve all Question objects from your cloud table in Parse. The domain model is just an annotated POJO, and you define appropriate getters/setters for the fields you want to include.
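A sketch of the Question domain object - the column names match the fields described later (question, option1 and so on):

import com.parse.ParseClassName;
import com.parse.ParseObject;

@ParseClassName("Question")
public class Question extends ParseObject {

    public String getQuestion() {
        return getString("question");
    }

    public void setQuestion(String question) {
        put("question", question);
    }

    public String getOption1() {
        return getString("option1");
    }

    public void setOption1(String option) {
        put("option1", option);
    }

    // ...similar getters/setters for the remaining option and answer columns
}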


Welcome activity
Now we are into Android basic stuff really - we have set up Parse and fetched the questions for local usage; now we just need a basic menu to start the quiz and some activities to display the questions and the end results.

We will just apply the layout and then implement a method to handle the button clicks. For this tutorial we are skipping the high score stuff and just playing the quiz.
All we need to do is reset the current GamePlay object and grab the questions from the local store (by this point they should have been updated from the cloud, so no problems), then kick off the quiz!
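A rough sketch of that button handler - GamePlay, QuizApplication, WelcomeActivity and QuestionActivity are the post's own classes, so the exact method and class names here are guesses:

public void playQuiz(View view) {
    // Reset the state of the current game
    ((QuizApplication) getApplication()).resetGamePlay();

    // Load the questions that were pinned to the local datastore on startup
    ParseQuery<Question> query = ParseQuery.getQuery(Question.class);
    query.fromLocalDatastore();
    query.findInBackground(new FindCallback<Question>() {
        @Override
        public void done(List<Question> questions, ParseException e) {
            if (e == null) {
                // Kick off the quiz (in the real app the fetched questions are handed to the game state here)
                startActivity(new Intent(WelcomeActivity.this, QuestionActivity.class));
            }
        }
    });
}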


Question activity
There is nothing really interesting to see here - it's all on GitHub if you want to take a look in detail (or you have already downloaded it and are working along) - this is just a standard Android activity that pulls out the question and possible answers and presents them.




This just progresses along fairly simply until it gets to the end of the quiz, and then it presents a simple screen showing the score - all this stuff can be tweaked/styled etc. - but those are the basics for a cloud-powered multiple choice quiz app!


Creating the questions

All the questions will be stored in the cloud on parse.com - once you have a look at the interface it should be pretty clear - you can easily create the data for questions either manually or by importing a CSV/JSON file.



You will need to log in to your Parse account, go to the quiz app and create a Question class. This will just match the domain POJO we have created. Just log in, go to "CORE" then "DATA", then select "+ Add Class" and add a custom class called "Question" (this must be exactly the same as the name provided in the POJO annotation for the Question class). Then select "Add Col" and add the fields to match the POJO (question[string], option1[string], etc). Once you have the class/table added on Parse, you can simply add data by selecting "Add Row" and manually entering the data, or by using the import function.



Source code