The Agile Architecture Guide (part 1: Product Development)

Introducing A New Organisation to Agile Product Development

As the new CTO, Chief Product Owner, Chief Architect, Lead Engineer or just natural leader it may be your responsibility to introduce an Agile way of working to a new Organisation. We are going to walk through the common steps to successfully introduce Agile methodology and we are going to do it using the most suitable place to introduce Agile which is the launch of a new Product.

I have worked in multiple companies, like BT, Bupa & Vodafone, who have tried multiple times to introduce Agile or one of its familial methodologies. These have varies in success and reach but the thing that most concentrates an organisation is the launch of a new product. This can range from SIM only propositions at Vodafone, 5G network releases at BT and dynamically priced PAYG treatments at Bupa. In each of these cases the product was a massively complex project that could not be delivered without breaking it down into its component parts. Also no one single individual could envisage all the necessary change required and what the end state will actually look like. These products lend themselves to Agile delivery.

I am quite loyal to Atlassian Products so I will be referencing Confluence and Jira liberally in this guide.

Starting with the Product Brief

There are multiple different product brief frameworks and templates available on the internet. Some it’s just easier to search by image to find what you’re looking for. However I have included some templates I have used before as examples.

The core concept of the product brief is that it literally must be brief. It is the elevator pitch of product opportunities. It’s not a manifesto or unchangeable constitutional document that defines the product going forward but the gist of what the product may be. It can be rejected very early on so don’t invest too much effort in it. Though I always recommend keeping a personal record of all your ideas.

The following sections describe the use of PowerPoint and Atlassian Confluence for documenting Product Briefs. Each have their own benefit.

A Simple PowerPoint Product Brief Template

I generally avoid using static documentation tools as they represent a pre-internet way of thinking. Product Briefs however fit quite well in a single slide PowerPoint template. The latter Confluence example is more complex, but a single page can capture the concept of the product.

The Brief template should always first capture the Product or Service Vision. This should be a concise description of the innovation (often the technology) and the benefit which it will bring.

Further sections can include:

  • detail of the Target Group describing the target market segment (B2C, B2B, B2B2C etc) and who would be the users (often as personas)
  • detail on the Needs that this product realises by answering the question ‘what benefit is solved by this product’
  • detail on the Product or Service and a description of how the product aligns with the Business Goals
  • a high-level competitor analysis is useful whilst remembering that the product can be an improvement of an existing product
  • any cost estimates at a high level such as potential Revenue and expected Cost Factors can be very useful at an early stage but a Product Brief is never expected to be fully costed as that activity can come out in a later articulation stage.
  • as I work in R&D some explanation of the Science behind the product helps explain the novelty and costs of the product
Example Product Brief Template

Atlassian Confluence Living Document Approach to a Product Brief

I personally use Atlassian Confluence for all my product design work. I maintain a folder structure in Confluence following The Open Group Architecture Forum’s (TOGAF) Business, Data, Application & Technology format to describe a complete enterprise architecture.

Product Briefs go the under the Business Architecture folder where I provide standard “Templates to be Completed” for all new Product Briefs. Because it is Confluence users have to copy the template and then create a new page under Product Briefs folder in the name of the product. Always remind your users

File structure providing Templates for a Product Brief

The organisations I work with are data centric and a lot of new products have an insights or machine learning capability. For this reason, I include sections in the template to capture the Data and algorithmic parts of the Product. It is beneficial though to keep the product description agnostic to the technology.

Example Product Brief Template Table

Product or Service

An important distinction in a SaaS environment is determining if the product is a single charge product or a recurring charge service. This does not have to be defined in the Product Brief stage but it is useful to get an idea of the nature of the product. A useful lesson I learned whilst working with R&D science start-ups is that the distinction between a product and a service is not clear cut. A science product, like a lab testing function can be expose a set of products with each having a shipped testing kit, imagine lateral flow testing kits. These products are crucial for controlling the rate of infection in a community from Covid-19. The data captured from the mass recording of lateral flow tests provides a set of insights which NHS-Digital (https://digital.nhs.uk/dashboards) used to analyse the R-value transmission rate in the UK. Genetic sequencing provided by NHS labs were able to provide more accurate R-value rates for different Covid variants and used these insights to inform the UK Government of the need for lockdown periods. The insights from these -omics analyses provided crucial insight services and show a data service can be built on top of a science product.

The What, Why & Who of a Product Vision

The vision of a product does not have to be some lofty ambitious epic of a transformational product. But it needs a definition of a What, Why & Who as early as possible. This is really important as otherwise you can rush into a wasted investment.

Personas are a good way of defining the interests of your users and a simple bit of celebrity alliteration (Stormzy the Scientist, Elton the Ecologist) can be a useful way of using characters in your stories. It’s useful to add some further colour to your personas by defining some non work items that they like and dislike. So for Stormzy the Scientist we added that they did not like having to scan barcodes on every sample and liked single click purchase solutions. For Elton we added that he did not like excessive packaging and preferred to order in bulk.

The Why of the Product is critical for understanding the benefit of the product. A recent example of poor understanding of the benefit of a product relates to an international hospital service provider in the UK. This hospital group made the decision to order one million Covid testing kits and four qPCR machines to provide a large testing capability for all doctors, nurses and visiting patients. This procurement activity was made without understanding the digital process for testing. When the solution was launched emails went out to internal staff who all arrived at the testing point at the same time causing a large queue and a potential mass spreading event. They had to go back quickly to the design process to arrange an end to end a registration, booking, sampling and results process to ensure that incoming patients could be properly tested. This design process took a month out of hospital operations during the early stage of the pandemic.

The What of the Product is an articulation of the deliverables and operations of the product. Examining and testing this early will help identify gaps in the product. It is to be expected to have gaps in the product at this early stage and investigation. Modelling the end to end process in a series of workshops will help fill in these gaps. Simple swim-lane process diagrams in Confluence can help articulate the end to end processes that are necessary for linking together stories in Jira at a later stage.

Gating the Product Brief Phase

The product design process is a continual activity and new concepts may arise from all layers of the organisation at any time. The product design process should not be the remit of a select few members of your organisation. Imagination should not be restricted to a strategy department, neither should anything else for that matter.

A gating process is required for Product Briefs where they are reviewed and handled when they are submitted. The gate then approves whether to progress the brief to the next stage or they are rejected early on. The whole process needs to be fast and transparent so that submitters get clear response as soon as possible. The submitter should be invited to the submissions process as otherwise the whole process can seem secret and bureaucratic.

In an agile methodology the aim is to determine success in as few iterations as possible in order to come to the appropriate conclusion. The aim of the product brief gating phase is to select those product briefs with the best hope of success that can be progressed to the subsequent articulation phase. The overhead of the articulation phase is that available resources are provided to support defining the next level of detail.

Articulating the Benefit

In Confluence I provide an articulation template for the next set of detail required. This provides the source of the first set of stories by highlighting and clicking text in Confluence to create Stories under the Product name Epic.

Many organisations skip a formal articulation phase and go straight to Story capture. There is nothing wrong with jumping this stage. My personal preference from working in scientific organisations is that an articulation stage is required to explain the science to the business and the business to the science. This also helps make Confluence more of a document store rather than maintaining assets in Office products. This increasingly becomes useful when Atlassian is your service desk and Confluence becomes your knowledge base for help issues.

The articulation template is more of an architectural high-level design in that it requests details around the Technology, Science (if you work for a lab science business like me), People, Operations, Data & Machine Learning requirements (grouped under Insights) and Finance. Diagrams including wireframes and flows are also useful at this stage so any links to diagramming tools like Miro or Lucid Chart.

A reiteration of the concept, like that in the Product Brief, may seem repetitive at this stage and if the concept has not changed then a link to the Brief can just be provided. Some product concepts may have evolved, and this therefore is a good opportunity to capture that change. Also any further details will help with the articulation of stories which can come from this document.

Articulation Template for Capturing More Detail

Describing Data & Machine Learning Requirements

Agile Machine Learning is a bit of a contradiction as design, training, testing and launch fit a more traditional waterfall approach. Product Management as a discipline sits at the intersection of business need, user experience and technology. A consistent Product Management strategy is necessary for delivering a viable and sustainable product. When the Product Management strategy deviates with every Product Manager hire then the focus and investment can become confused, and you end up a bit like Manchester United. With Machine Learning the requirement for a multi-disciplinary team becomes greater and necessitate ML/Ops, Data Science and hybrid development skills. Again, like Manchester United the hiring of ageing ‘superstars’ is never a good strategy. To be a successful Product Manager with Machine Learning requires flexibility and faith in an iterative process.

I have written on the modelling of Machine Learning operations as a Markov Chain here, as the software delivery model for Machine Learning has a greater number of state transition points than an agile digital delivery.

The epic and stories can frame the Machine Learning problem. The epic must explain the user-centric problem that the Machine Learning problem is trying to achieve. I have worked with 5G radio mast planning designs whilst at BT / EE in the UK. The first 5G sites were costing nearly £500k and had to provide considerable quality of service in dense urban environments. Mobile network planning uses reinforced learning techniques for training and predicting the best deployment model of multiple mobile masts. This is a very human and compute resource intensive process so any optimisation offers considerable benefit.

Machine Learning algorithm selection, in our case this included Artificial Bee Colony algorithms, were the output of a story testing and selecting the most appropriate algorithms during PoC stages. The selection of algorithms were based on comparisons with in-field tests and previous 4G model comparisons. All of this test data was then fed-back into the learning environment.

The nuance from an Agile point of view is that the time taken to attaining an optimal machine learning model cannot be easily predicted and that certain key stories such as model selection and training will run across multiple sprints. Sub-tasks are a good way of documenting the activities for a Machine Learning epic.

One last point to note in any Machine Learning delivery is that research scientists are generally unfamiliar with project management or Agile. In a research institute the time to discovery is does not have a regular cadence. But in a commercial organisation a regular manageable approach is required which can bit chunks out of the greater whole. For this reason, AWS and Azure offer improved visualisation tools for their machine learning capabilities as these lift the point of science away from the necessary infrastructure. If you can break your ML Epics into those that are infrastructure and data based away from those that are training and proof based, then you will be able to achieve success quicker.

Describing Operational Requirements

Products and services require operational support to deal with imperfections and to keep the customers happy. Don’t launch a product without an operational service wrap but also make sure you start capturing operational requirements at the Product Brief stage, because if you can’t support it then you can’t sell it. Capturing these requirements at an early stage is quite complex if you do not have an existing service wrap. If that is the case then simply document how customer issues will be captured, triaged and supported.

The Operations section of the template asks how the product will be supported. In a lab operations environment, the operational support model includes the processes implemented in the ERP and LIMS (Lab Information Mgmt System). These systems should have their own Standard Operating Procedures. So this section should not be a new domain

Conclusion

Introducing an Agile architecture to an organisation is actually very exciting. One of the most satisfying feelings I’ve ever had at work has been working with floating brain in vat scientists, is when they realise the benefits of Agile for working on a complex problem. There’s an enjoyment no matter your background in drawing UX designs and articulating simple stories. A good buddying system can work well, as an example I have paired an ecology university lecturer with a UX designer to define a geospatial planning application and they paper prototyped a highly intuitive solution. Scientists are very competitive for discovery so adding a quantified competitive element like number of story points designed drives the initial cadence and avoids inertia.

Operation Moonshot – A Costed Solution Implementation

The Francis Crick have successfully built a process for Covid PCR testing for patients and NHS Staff. The Crick have also validated a reverse transcription loop-mediated thermal amplification (RT-LAMP) method for 25-minute coronavirus testing. The best way to realise Operation Moonshot is to bring the two processes together and deliver across 200+ NHS trusts.

The following is a costed break-down of all of the necessary components within a solution architecture. I try to provide costed reasoning for all of my assumptions and to use fixed cost points and recent precedent. The costs are broken down into 5 areas: equipment, self-swabbing (as drive thru won’t scale), RT-LAMP testing, IT & processes and rollout. I believe that Operation Moonshot could be delivered for half the UK Government’s initial assessment.

  1. Testing Equipment: The UK Government has already invested in the novel RT-LAMP test capability. The highest throughput machine is the Oxford Nanopore PromethION 48 which can process 15000 RT-LAMP tests a day. Each machine costs just under half a million pounds meaning that handling 10million tests a day would require 667 machines at a non-discounted prices of a third of a billion pounds.
machine list price£476,145
tests per machine15,000
tests a day10,000,000
nhs trusts220
number of machines666.67
cost of machines£317,430,000
cost per trust£1,442,864
cost per trust for RT-LAMPore machines
  1. Testing Capacity Increase & Self-Swabbing Costs: The UK already has an appointment booking process for the national pillar 2 swab testing. These tests are carried out in car and involve bagging the swabs with pre-registered barcodes. There are 50 test sites in the UK which provide the majority of the UK’s capacity of 350k a day. Increasing the testing capacity to 10m a day would require nearly 1500 sites and tens of thousands of more testers.
current test capacity350,000
number of test sites50
site processing capacity7000
number of sites required1,429
Drive through testing capacity

The other testing approaches would be either localised testing making use of any medically trained personnel or through self-testing through posted self-test kits. We will examine the self-test model: Based on 10m test a day the whole UK population will be tested each week meaning that everybody in the UK should be receiving a number of tests through the post. Self-testing would have a lower rate of accuracy but this would be mitigated by the sheer size of the testing quorum. The collection of self-tests will need to be within 24-48 hours for the test to be valid and testing centres will be reliant on the immediate return of tests.

tests a day10,000,000
unit cost per kit£0.50
daily cost£5,000,000
kit test cost for 1 year£1,825,000,000
courier costs per kit£2.50
daily courier costs£25,000,000
courier cost for 1 year£9,125,000,000
total£10,950,000,000
Self-testing costs

The kit and courier costs of 10m tests a day would be in excess of a £10bn a year even with the lower possible unit prices for kits and couriers. To be cost effective the self-test model would need a local drop-off and collection area to lower the total courier costs. Based on a drop-off model of £10 per 100 tests the yearly cost would decrease to 2bn a year.

tests a day10,000,000
unit cost per kit£0.50
daily cost£5,000,000
kit test cost for 1 year£1,825,000,000
drop-off courier costs per 100£10.00
daily courier costs£1,000,000
courier cost for 1 year£365,000,000
total£2,190,000,000
Self-test drop-off costs
  1. RT-LAMP Testing & Results Process: RT-LAMP testing will unpack all self-swab packs and run each sample through the testing lifecycle producing a result within 25 minutes. Test results will need to be validated by medical professionals and positive tests need to be recorded against the summary care record and notified to the relevant Public Health Authority. If each NHS trust would have between 3-6 RT-LAMP machines handling tests and each machine would require a minimum staff of 6 people to continually operate and validate the test results. At an average cost of £40k per FTE this would cost more than £200m per year.
tests a day10,000,000
nhs trusts220
RT-LAMP machines per trust4
trust daily throughput45,455
daily FTE requirement24
extra staff requirement5280
staff costs£211,200,000
Assessment of staffing costs for handing RT-LAMP process
  1. Central IT Costs, Notifications and Mobile App: The national roll-out of a 10 million a day testing service would be vastly complex, far more complex than mere rocket science! Achieving such a service would require both centralised common processes and local variations to succeed. A good example of local variances would be the designing of the self-swap collection locations. A successful would also need the IT and process functions to be right first time, including the mobile app launch. The IT functions could be realised within a multi-tenanted ITIL compliant solution (e.g. ServiceNow) which would allow centralisation and local variances. Such a solution would also allow for stock and asset management. All test records could be centralised from the RT-LAMP machines and then fed to the relevant PHA’s by integration with the final notifications going to the public via a mobile app. Staffing would manage the end to end processes and the criticality of the data demands a security overhead. It is not unreasonable to include a 30% contingency on the total.
centralised IT process£5,000,000
localisations budget£10,000,000
staffing£2,500,000
mobile app£2,000,000
PHA integrations£2,000,000
security£3,000,000
contingency£7,350,000
total£31,850,000
Assessment of IT and process costs
  1. Rollout Process: Rollout costs should be viewed separately as deployments would take time to bed in and would need a degree of local stock and asset management. Precedent suggests that getting to 100k daily tests would have more easily achieved with a lot of small ships rather than following a centralised model. It is therefore not unreasonable to suggest a £10m budget per trust for rollout processes. If the rollout were to include many more smaller GPs then that budget would have to be increased, for this reason I’ve included a 30% contingency.
number of trusts220
cost per roll-out£10,000,000
contingency 30%£660,000,000
total£2,860,000,000
Roll out costs
  • Total: The total cost assessment is for one year only but is approximately half of the UK Government’s assessment of £10bn. The most accurate costs are for the Testing Equipment based on the capacity and list prices of the Oxford Nanopore equipment. The self-swabbing approach is based on a collective drop-off solution as otherwise another £8bn could be spent on individual collection of swabs. The RT-LAMP costs as predominantly staff costs for 5000 new staff. The IT costs include a 30% contingency and are based on the UK Government getting its IT right first time. The rollout costs are the the highest individual costs but should be a year one only cost and do not include any economy of scale across multiple NHS trusts who may be able to work together.
Testing Equipment£317,430,000
Testing Capacity Increase & Self-Swabbing Costs£2,190,000,000
RT-LAMP testing£211,200,000
Central IT Costs, Notifications and Mobile App£31,850,000
Rollout£2,860,000,000
Total£5,610,480,000
Total cost assessment

Operation Moonshot has not published any assumptions, cost validation or time period for its £10 billion total cost. The above costs are all based on my recent previous experience of Covid-19 PCR testing. It is not unfeasible that Operation Moonshot could be achieved for half the costs currently being claimed.

IT Spend Analysis of UK Government U-Turns in 2020 (so far)

There have been 10 UK Government U-Turns so far in 2020. Each change will have had an associated IT change cost. This is my best personal assessment of what each of these changes would likely have cost. I will provide justification for each of my assumptions and will tend towards a lower possible range. I will t-shirt size each U-turn using Low (£500k), Medium (£2-5m), High (£10m) and Very High (£50m+) as thresholds.

U-Turn Number 1: Testing In The Community 12th March – IT cost assessment: Low (circa less than £500k sunk cost)

  • This U-Turn was a retraction towards testing in hospitals rather than testing in the community. There would have been a ‘sunk’ IT cost for the testing in the community work. This testing would have involved Public Health England implementing a field service for remote swab testing and delivery of those swabs to test centres. The IT required would have extended PHE’s time booking system and resource planning. IT changes to these systems would have had IT costs. As this was scrapped relatively early we can assume that there would have been no further licence of infrastructure costs.

U-Turn Number 2: Face Coverings – IT cost assessment: Zero

  • No IT changes as this was a policy and information change.

U-Turn Number 3: NHS visa surcharge – IT cost assessment: Medium (£2-5 million sunk cost)

  • The NHS surcharge has been around since 2015 and is paid when applying for a UK visa. There are a number of applicants who do not have to pay it. The payment method is an online transaction (or cash if from North Korea). The government U-turn means scrapping an existing process and an IT solution that is less than 5 years old. Making the assumption that any online electronic payment solution (at UK government rates) would cost at minimum £0.5m to implement added to the integration costs (£0.75m) with UK visa system and vetting services within (another £0.75m) NHS trusts it is not unreasonable to expect a £2million sunk cost. The service is still available here.

U-Turn Number 4: NHS Staff Bereavement Scheme – IT Cost Assessment: Low (£500k as predominantly configuration changes)

  • The bereavement scheme, introduced in April, initially excluded cleaners, porters and social care workers. Introducing more groups would have incurred some configuration changes to the claims process and new infrastructure costs. £100k would be a low assessment for implementing these changes.

U-Turn Number 5: MP Proxy voting – IT Cost Assessment: Zero (no actual change)

  • The government had to U-turn to allow shielding MPs to vote by proxy. The remote proxy voting system will have had an IT cost but as no IT systems were removed there is no sunk cost for this U-turn. The introduction of a secure proxy voting system will have a necessary cost.

U-Turn Number 6: Re-opening schools – IT Cost Assessment: Medium (£2-5m as schools will have scaled IT for different re-openings)

  • The school re-opening would have forced each individual school to scale its IT solutions according to the expected demand. Centralisation of IT across the UK’s 33,000 schools provides an economy of scale but there will still have been significant unnecessary overspend caused by a late U-turn.

U-Turn Number 7: National school meal vouchers – IT Cost Assessment: Medium (£2-5m for claims process and roll-out)

  • The introduction of a national school meal voucher system required an immediate build of an IT claims and spend system. It will also have required IT investment in each supplier’s ability to scale. As this was predominantly a procedural and sizing change we can assume that the IT impact would have been relative to the size of the roll-out. For this reason I’m assessing this as having a medium impact.

U-Turn Number 8: UK Contact-tracing app – IT Cost Assessment: High (£10m+ major investment on a non-usable disliked technology)

  • As of June 2020 we know that the cost of the UK tracing app was £11.8m. It is not unreasonable to expect further costs to have been spent on testing across the Isle of Wight and preparing for national rollout.

U-Turn 9: Local contact tracers – IT Cost Assessment: Very High (£50m+ with major write-off of a centralised contract tracing service)

  • The centralised contact tracing model had its own IT solutions which are now inappropriate for scaled local use. The centralised solution had scaled infrastructure and licences for 18,000 users. It would have had a communications service equivalently scaled. The local authority solutions could not have easily been separated from the national solution meaning a lot of new build and completely new infrastructure. The costs would be very high because it has to include the completely throw away nature of the national solution and the costs of multiple stand-alone local authority solutions.

U-Turn 10: A-level and GCSE results – IT Cost Assessment: Medium (£2-5m for building and implementing algorithm and significant testing costs)

  • The government was forced to act after A-level grades were downgraded through a controversial algorithm developed by the Office of Qualifications and Examinations Regulation, leading to almost 40 % of grades awarded being worse than expected by pupils, parents and teachers. This service would have had to incur a cost to model, develop and test. It would needed to have been developed in less that 3 months and to be applied across a large data set of disparate data feeds. Different algorithms, and builds, would need to have been applied for GCSEs and A-levels.

Conclusion: Total Cost Very High (Low estimate £150m+)

Change is the most expensive process in IT. Fast change is even more expensive. Waste also incurs a missed opportunity cost of what else could be done with the capital investment. It also creates a culture of inefficiency where requirements become designed to handle all possible future change rather than focusing on immediate deliverables. All of these U-turn costs were avoidable. All governments should be held to account on the waste associated with U-turn changes.