Ebooks, an evolving landscape

I read a lot. Amazon made a fortune out of me over the years as the most convenient place to buy technical books, but eventually I hit a tipping point. Physical books, particularly technical ones, take up a lot of space, and I literally ran out of room. Luckily technology has finally caught up, and I’ve bought into the ebook (r)evolution over the last year or so.

I decided to stick to the open ePub format, which meant leaving Amazon behind as they are the only major vendor not to support ePub. Amazon makes things very easy; unfortunately, if you decide to go elsewhere you may have to put in a little more effort.

I started with iBooks on my iPad, a great application. I bought a couple of books in the iBookStore, and all was well… until I tried to transfer them to a Kobo. Both use ePub, but Apple’s DRM means I can only view the books on the iPad or iPhone. No one has yet reliably cracked Apple’s DRM, so I cut my losses, and don’t buy there anymore.

These days I buy directly from publishers where I can. The upside is access to beta versions of some books and occasional chances to pick up ebook versions of physical books I bought long ago, sometimes at a huge discount. The downside is I now have at least 6 separate accounts to manage. You may find it worth your while to buy direct from the publisher even if you use Amazon, as they all support Amazon’s Mobi format as well as ePub.

For general books I tend to go to Kobo or Diesel. Another two accounts to manage… 😦

For managing books I use Calibre. It’s free, open source, works with just about all e-readers, and allows converting between most formats.

Most of the technical publishers’ books are either DRM free (O’Reilly) or watermarked (PragProg, Informit), which is great. Publishers that do use DRM tend to use Adobe’s, which is easy to crack if you are so inclined. You will probably have to install Adobe Digital Editions, so the procedure is: download to Adobe Digital Editions, then either use that as your library manager or fish the books out of there, remove the DRM, and pull them into Calibre.

An unexpected side effect is that I now buy even more books than before. However doing the above does require effort. No one else has yet replicated the magical mix of good hardware, ease of use and sheer breadth of content that Amazon has; though plenty are trying.

http://calibre-ebook.com

Favourite books – better programming

I didn’t do software engineering at college, so for a long time I worried about what I didn’t know, which translated into a continuing quest for knowledge. This and a few followup posts will list a few of my favourite books and other sources, with some brief explanations of why I think they are important. Your opinions may vary. 🙂

So here goes…

Programming

These are books about the technicalities of programming; about being better programmers. We can get away with minimal designs and documentation, and many products have shipped with little of either, but there is always code, which has to be maintained and extended. In fact the more successful a product is, the more likely it is to change.
All of the following cover (to greater or lesser extents) style, structure, debugging, documentation and low level design. The principles apply across many languages and environments, though the examples typically use the most popular language of the day.

The Practice of Programming (Kernighan and Pike)

Brian Kernighan is famous as the co-author of ‘The C Programming Language’, another of my favourite books, and his clear writing style is put to good use here. The tag line to this book is: Simplicity, Clarity, Generality. Written in 1999, its examples are mostly in C, C++ and Java, though there is a C and Unix bias, not surprising from people so heavily involved with both. And it’s short, only 288 pages.

Code Complete (McConnell)

A heavyweight tome. Very comprehensive and a little verbose. McConnell backs up his advice with plenty of citations from industry studies. This one tops many people’s favourite technical book lists. It’s best to pick up the second edition, which has more of an OO focus than the first. I’m hoping for a third edition one day incorporating more agile and lean practices.

Clean Code (Martin)

A book for the agile era, so an emphasis on things like unit testing and code as documentation. The cartoon about the only true measure of code quality being ‘WTFs per minute’ is a classic, and sets the tone. Bob Martin is very opinionated, so the book is a bit of a polemic, but the content (especially the early chapters) is excellent.
This book is required reading for all developers in our company.
There are many books that focus on a single practice (such as TDD) or best use of a given language, but surprisingly few books like the three above, that focus on the nuts and bolts of general programming. Hopefully there will soon be one incorporating functional programming elements…

NoSQL introduction

I’ve been researching data persistence recently, and wrote a few notes to try and organise my thoughts. Here’s a summary that might be useful, along with a list of useful sources at the end. I’ve missed out a huge amount (CAP theorem, map-reduce, etc), so this just scratches the surface.

Firstly, what is NoSQL?

NoSQL as a term was coined for a conference of next generation databases in 2009. However it’s not a particularly useful name. My favourite definition comes from the Fowler/Sadalage book ‘NoSQL Distilled’ where they define it as:
“an ill defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL”. Hmm…

Some typical characteristics of NoSQL databases are:

  • Most of them run well on clusters (except for graph databases) and are designed for very high scale
  • They don’t use SQL, though some are getting close
  • They don’t have a schema, so you can add new fields without having to define schema changes. This doesn’t mean that there is no schema. There is always a schema; it now lives in code instead of the database. So engineering best practice around schema migrations still applies…
  • They do have transactions but not in the SQL sense. For example a single write to a document database is a transaction which could persist one or many entities, depending upon the design of the document.
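
To illustrate the point about the schema living in code, here is a minimal Python sketch that uses plain dictionaries in place of a real database. The field names (schema_version, name, first_name, last_name) and the upgrade helper are made up for illustration; the idea is simply that old-shaped documents are migrated by application code when they are read.

```python
# Hypothetical example: documents are plain dicts; no real database is involved.
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of doc brought up to the current, code-defined schema."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)
    if version < 2:
        # Version 1 stored a single "name" field; version 2 splits it in two.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        doc["schema_version"] = CURRENT_VERSION
    return doc


# Two documents written at different times, so they have different shapes.
old_doc = {"id": 1, "name": "Bob Smith"}
new_doc = {"id": 2, "first_name": "Sue", "last_name": "Jones", "schema_version": 2}

migrated = [upgrade(d) for d in (old_doc, new_doc)]
```

The database accepted both shapes without complaint, which is exactly why the migration logic has to be written, tested and versioned like any other code.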

There are four main types of NoSQL database:

Key Value Store

Examples: Riak
I think of these as distributed Dictionaries or HashMaps. You store and retrieve values by key. The values are opaque to the database and joins across multiple values are typically not allowed. As a result they are very fast.
A classic use case is storing session data. They are not recommended when you have relationships between values.
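
The session use case can be sketched in a few lines of Python. This is a toy, single-process stand-in rather than any real product’s API: the KeyValueStore class, its method names and the key format are all assumptions. The point it tries to show is that the store only sees bytes, and the application decides what they mean.

```python
import json


class KeyValueStore:
    """Toy in-memory stand-in for a key value store (hypothetical, single node)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store only accepts bytes: it never looks inside the value.
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how a session is serialised.
session = {"user_id": 42, "basket": ["book-1", "book-2"]}
store.put("session:abc123", json.dumps(session).encode("utf-8"))

raw = store.get("session:abc123")
restored = json.loads(raw.decode("utf-8"))
```

Note that there is no way to ask the store for ‘all sessions containing book-1’ without fetching and decoding every value yourself, which is the trade-off described above.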

Document Database

Examples: MongoDB, CouchDB
A shorthand way to think of these is a bit like key value stores except that the values are not opaque. This opens up a world of possibilities. You can store complex structures and query them by something other than the primary key. So for example if you store addresses, keyed by ID, you could also query them by town or zip/post code.
Joins across documents are either not allowed or expensive. However a document can be arbitrarily complex, containing many nested elements; for example, a customer and their recent orders could be stored as a single document. By careful grouping of related information into the same document, lookups can be fast in typical business use cases, as all related information can be returned by a single query.
Document databases are being used for object persistence. Both MongoDB and CouchDB store documents as JSON (MongoDB uses BSON, a binary form of JSON, internally). They are also great for logging, where different log messages may have different contents.
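
As a rough Python sketch of the difference from a key value store, the example below keeps a few customer documents keyed by ID and queries them by a nested field. The documents, the find helper and the dotted path syntax are invented for illustration and are not the query API of MongoDB or CouchDB; a real database would also use an index rather than scanning every document.

```python
# Hypothetical documents, keyed by ID, as a document database might hold them.
customers = {
    "c1": {
        "name": "Bob",
        "address": {"town": "Oxford", "postcode": "OX1 1AA"},
        "recent_orders": [{"order_id": 101, "total": 25.0}],
    },
    "c2": {
        "name": "Sue",
        "address": {"town": "Reading", "postcode": "RG1 1AA"},
        "recent_orders": [],
    },
    "c3": {
        "name": "Ann",
        "address": {"town": "Oxford", "postcode": "OX2 2BB"},
        "recent_orders": [{"order_id": 102, "total": 10.0}],
    },
}

_MISSING = object()


def find(collection, path, value):
    """Return the sorted IDs of documents whose field at a dotted path equals value."""
    matches = []
    for doc_id, doc in collection.items():
        current = doc
        for key in path.split("."):
            # Walk down one level at a time; stop caring once a level is missing.
            current = current.get(key, _MISSING) if isinstance(current, dict) else _MISSING
        if current is not _MISSING and current == value:
            matches.append(doc_id)
    return sorted(matches)


# Query by something other than the primary key.
in_oxford = find(customers, "address.town", "Oxford")

# A single lookup by ID returns the customer and their recent orders together.
bob = customers["c1"]
```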

Column Database

Examples: Cassandra, HBase
In a column database a column consists of a key value pair. Collections of columns that are accessed together are called Column Families. Rows within a column family may have many columns, accessed together using a row key. Different rows may have different columns.
If that didn’t make much sense, another way to think of a Column family is as a Table, each of whose rows can have different columns.
It’s also possible to have super-columns, where the value of the key value pair is a collection of columns.
Logging is a great use case for Column databases and they are used heavily in content management systems.
Column databases are sensitive to the query used. Changing the queries may require changing the column family design.
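
The ‘table whose rows can have different columns’ picture can be sketched with nested dictionaries in Python. This is only a mental model, not how Cassandra or HBase are actually accessed; the row keys, column names and helper functions are all hypothetical.

```python
# A column family modelled as: row key -> {column name: value}.
# Each inner key/value pair plays the role of one column.
log_entries = {
    "2013-01-01T10:00:00": {"level": "INFO", "message": "started"},
    "2013-01-01T10:00:05": {
        "level": "ERROR",
        "message": "disk full",
        "host": "db1",       # only this row has these two columns
        "free_mb": "0",
    },
}


def get_row(column_family, row_key):
    """Fetch all the columns stored under one row key (empty if unknown)."""
    return dict(column_family.get(row_key, {}))


def columns_used(column_family):
    """List every column name that appears in at least one row."""
    names = set()
    for columns in column_family.values():
        names.update(columns)
    return sorted(names)


first = get_row(log_entries, "2013-01-01T10:00:00")
second = get_row(log_entries, "2013-01-01T10:00:05")
all_columns = columns_used(log_entries)
```

Everything here is reached through the row key, which hints at why changing the queries can force a redesign: a query that does not start from the row key has nothing efficient to start from.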

Graph Database

Examples: Neo4j, OrientDB
These are useful when you need to traverse relationships between entities. They allow both entities and relationships to be stored as first class elements, and both entities and relationships can have properties. Relationships can also have directionality. So for example given two people ‘Bob’ and ‘Sue’ with a ‘Likes’ relationship between them, it could be one way, so that ‘Bob likes Sue’ is true but ‘Sue likes Bob’ is not.
Graph databases make it easy to construct complex queries involving many entity types. In SQL those queries would involve many joins across different tables.
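
The Bob and Sue example can be written down as a small Python sketch. The Graph class below is a toy invented for illustration, not the API of any graph database; it just shows nodes and relationships as first class elements, each with properties, and relationships with a direction.

```python
class Graph:
    """Toy property graph: hypothetical, in-memory, no indexing."""

    def __init__(self):
        self.nodes = {}  # node name -> properties
        self.edges = []  # (source, relationship type, target, properties)

    def add_node(self, name, **properties):
        self.nodes[name] = properties

    def relate(self, source, rel_type, target, **properties):
        # Relationships are directed: source -> target.
        self.edges.append((source, rel_type, target, properties))

    def has_relationship(self, source, rel_type, target):
        return any(
            s == source and r == rel_type and t == target
            for s, r, t, _ in self.edges
        )

    def neighbours(self, source, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [t for s, r, t, _ in self.edges if s == source and r == rel_type]


graph = Graph()
graph.add_node("Bob", age=34)
graph.add_node("Sue", age=31)
graph.relate("Bob", "LIKES", "Sue", since=2012)

bob_likes_sue = graph.has_relationship("Bob", "LIKES", "Sue")
sue_likes_bob = graph.has_relationship("Sue", "LIKES", "Bob")
```

Here bob_likes_sue is true and sue_likes_bob is false, because only one directed ‘LIKES’ relationship was stored.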

Should you change?

As always, it depends. I attended an interesting talk by Lisa Phillips from Twitter recently where she said they have loads of cool NoSQL stuff, some of which they invented themselves, but still use MySQL very extensively, to the extent that the default position for all new projects is ‘use MySQL unless it can’t do what you want’.

It also looks like the two worlds are converging a little. For example recent additions to MySQL include a handler socket interface that bypasses the SQL layer, and MariaDB has also added dynamic columns. It’s also possible to use MySQL as a front end to InfiniDB, and the MariaDB team has experimented with using Cassandra as a storage engine.

More information:

NoSQL Distilled

An excellent introduction to the subject.

Seven Databases in Seven Weeks

Practical dives into seven real databases, illustrating the strengths of each. Covers: Postgres, Riak, HBase, MongoDB, CouchDB, Neo4j, Redis

Software Engineering Radio:

I also attended a surprisingly good conference in Oxford recently on Database technologies for developers, called All your Base, which I hope they run again next year.