The Art of Scalability – Dong Ping Zhang

The second edition of The Art of Scalability is my book this week. It is coauthored by Martin Abbott and Michael Fisher. As its subtitle suggests, the book is about building scalable web architecture, processes and organisations for the modern enterprise.

In this book, the authors argue that the three key components of scalability are people, process and technology. In the introduction video, the authors talk about how they initially thought technology was the key, only to realise from their consulting experience that people and process are no less important. These three components are covered in the first three parts of the book. More details on that to follow.

Before getting into the details, let me share what I like and do not like about this book. Its content is vast, fascinatingly relevant, and not dry at all. It is engaging enough that I have had no trouble enjoying many chapters from around 3am to dawn on nearly every day this week. Abandoning this book and picking up another one would have been trivial for me, but I did not. The discussions, technical or not, are very plainly written. The book broadened my view of how to scale. The quantitative approaches to project management and scalability topics are straightforward. The main negative attribute of this book is repetition; it could be shortened significantly. That said, repeating concepts covered in previous chapters certainly helps to refresh the reader’s memory and improve the understanding of the topic currently under discussion. The repetition could, therefore, be intentional on the authors’ part.

Do I recommend it? Yes. If you do not have a large chunk of time to peruse such a big book, browsing the conclusion and key points sections of each chapter on Safari Books Online can give you a quick overview of each chapter. The figures, tables, equations etc. are all beautifully presented online too.

Staffing a scalable organisation

In this part, the book discusses the necessary roles and their corresponding responsibilities in a scalable technology organisation. Roles that are not clearly defined, or that have overlapping responsibilities, can cause confusion and conflict.

The book then progresses to talk about the two key attributes of organisations: size and structure. Both can affect the communication, efficiency, quality and scalability of the organisation. The two traditional structures are functional and matrix. The third one, agile, is gaining traction for its increased innovation, measured by shorter time to market, quality of features and availability of services. There are pros and cons for both large and small teams. It is important to be aware of the specific pitfalls of each, know where your team is, and take the necessary steps to mitigate the negative effects of the team size.

Further, the book presents Leadership 101 and Management 101. I like the guidance on goal setting for leaders. The goals should be SMART: Specific, Measurable, Attainable (but aggressive), Realistic and Timely (or containing a component of time). One piece of advice stands out for me in Management 101: “spend only 5% of your project management time creating detailed project plans and 95% of your time developing the contingencies to those plans. Focus on achieving results in an appropriate time frame, rather than laboring to fit activities to the initial plan.” When it comes to people management, the gardening analogy is interesting: seeding (hiring), feeding (developing people) and weeding (removing underperforming people from the organisation).

Building Processes for Scale

The second part of the book covers processes. The general idea is to create the right set of processes to standardize the steps taken to perform certain tasks, eliminate confusion and unnecessary decision making, and hence free up the employees to focus on important work. The authors use the following figure to illustrate the different levels of process complexity.

In this part, the authors discuss processes that answer these questions:

How to properly control and identify change in a production environment?
What to do when things go wrong or when a crisis occurs?
How to design scalability into your products from the beginning?
How to understand and manage risk?
When to build and when to buy?
How to determine the amount of scale in your systems?
When to go forward with a release and when to wait?
When to roll back and how to prepare for that eventuality?

One chapter talks about headroom calculation. The authors advise planning to use only 50% of maximum capacity. Naturally we all know a discount factor should be used in estimating headroom, but the value to use for discounting is mostly informed by experience. This shows one great benefit of reading this book: it passes on what the authors have distilled from their combined decades of experience helping to scale businesses.
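To make the 50% rule concrete, here is a minimal sketch of a headroom estimate. It is my own illustration, not the book's formula: the function names, the linear growth model and the example numbers are all assumptions, and the book's calculation may include further terms.

```python
# Hypothetical headroom estimate built around the 50% planning rule.
# The linear growth model and all names here are assumptions for illustration.

def headroom(max_capacity, current_usage, growth_per_period, periods,
             ideal_usage_fraction=0.5):
    """Capacity left after `periods` of growth, counting only the planned share.

    A negative result means projected usage exceeds the planned share.
    """
    usable = ideal_usage_fraction * max_capacity
    projected_usage = current_usage + growth_per_period * periods
    return usable - projected_usage


def periods_until_exhausted(max_capacity, current_usage, growth_per_period,
                            ideal_usage_fraction=0.5):
    """Number of periods until usage reaches the planned share of capacity."""
    if growth_per_period <= 0:
        raise ValueError("growth_per_period must be positive")
    usable = ideal_usage_fraction * max_capacity
    return (usable - current_usage) / growth_per_period


if __name__ == "__main__":
    # Example: a system that tops out at 1000 requests/s, currently serving 300,
    # growing by 50 requests/s per quarter.
    print(headroom(1000, 300, 50, periods=2))
    print(periods_until_exhausted(1000, 300, 50))
```

The point of the discount is visible in the example: although the system nominally has 700 requests/s to spare, only 200 of that counts as plannable headroom.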

Architecting Scalable Solutions

The third part of this book discusses the differences between implementation and architecture, how to create fault-isolative architectures, the AKF scale cube, caching and asynchronous design for scale. The AKF scale cube method suggests scaling along three dimensions: cloning the entities or data and distributing unbiased work across workers, separating work biased by activity or data, and separating work biased by the requestor for whom the work is being performed. For illustration purposes, I cite the AKF scale cube figure from the book below.

The first two dimensions of the AKF scale cube approach are very similar to our scalability studies for Exascale Computing. The first provides more compute nodes and duplicates data and code on each of them to perform a chunk of work (that can equally be performed on other nodes). The second partitions the work and assigns each piece to its most suitable compute node in a heterogeneous environment, or partitions the data among a set of nodes and sends the corresponding compute to each node. The third dimension directs service requests to different subsets of nodes, based on the information available about the requests or requesters. The authors point out that these dimensions are often nested together.
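The nesting of the three dimensions can be sketched as a toy request router. This is only my reading of the idea, not code from the book: the class name, the hash-based split by requester and the round-robin choice of clone are all assumptions made for illustration.

```python
# Toy router nesting the three scale cube dimensions.
# Hypothetical illustration: names and strategies are assumptions.
import hashlib


class ScaleCubeRouter:
    def __init__(self, services, shards, clones):
        self.services = set(services)  # split by activity (second dimension)
        self.shards = shards           # split by requester (third dimension)
        self.clones = clones           # identical copies (first dimension)
        self._next_clone = {}          # round-robin position per (service, shard)

    @staticmethod
    def _stable_hash(text):
        # hashlib gives the same value across runs, unlike the built-in hash().
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def route(self, service, requester_id):
        """Return (service, shard, clone) for one request."""
        if service not in self.services:
            raise ValueError(f"unknown service: {service}")
        # Third dimension: the same requester always lands on the same shard.
        shard = self._stable_hash(requester_id) % self.shards
        # First dimension: spread unbiased work evenly over identical clones.
        key = (service, shard)
        position = self._next_clone.get(key, 0)
        self._next_clone[key] = position + 1
        return service, shard, position % self.clones


if __name__ == "__main__":
    router = ScaleCubeRouter(["search", "checkout"], shards=4, clones=2)
    for _ in range(3):
        print(router.route("search", "customer-42"))
```

Repeated requests from the same customer for the same service stay on one shard while rotating through that shard's clones, which is the nesting described above.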

The last part of the book covers the issue of having too much data, grid and cloud computing, monitoring applications and planning data centers. Do not miss the appendices: the examples given there on availability, capacity planning, and load and performance calculation are very illustrative.

There is a set of slides from Lorenzo Alberton available on SlideShare that covers the key concepts from this book.

Overall, I enjoyed learning about scalability and how to build scalable architecture through reading this book. It is thanks to reading books like this that the darkness of winter days is slightly more bearable than it would otherwise be.

Published by dpz