mustnotgrumble

How do Covid-19 tracing apps work and how do they comply with EU-GDPR?

The UK is trialling its Covid-19 contact tracing application which tracks human interactions. The app uses Bluetooth Low Energy (BLE) communications between smartphones for registering handshakes’ duration and distance. This data is then uploaded to a centralised database so that if a user self-registers as Covid-19 positive, the centralised service can push notifications to all ‘contacts’. This is a highly centralised model based around relaying all users ‘Contact Events’ together with user self-assessments of Covid-19 symptoms.

A user can self-report having the symptoms of Coronavirus. They cannot report a positive test for Coronavirus as there is no way of entering either an NHS_ID or a Test_ID. Technically the UK mobile app does not match the mobile App ID against the user’s NHS ID and there is no mapping between the app and NHS England’s Epic system. This approach allows for greater anonymity as the centralised database will not be recording a user’s NHS identifier. The downside of this approach will be a higher percentage of false positives and contact notifications .

NHSX’s Covid-19 App requires GDPR Consent as the legal basis to enable permissions

NHSX and Pivotal, the software development firm, have published the App’s source code and the App’s Data Protection Impact Assessment. The latter is a mandatory document within the EU-GDPR framework. The user provides ‘Consent’ as the legal basis for data processing of the first three characters on their postcode and the enabling of permissions. The NHS Covid-19 app captures the first part of the user’s postcode as personal data. It then requests permissions for Bluetooth connectivity necessary for handshakes and Push notifications necessary for file transfer.

The UK NHS app captures ‘Contact Events’ between enabled devices using Bluetooth. The app records and uploads Bluetooth Low Energy handshakes based on the Bluetooth Received Signal Strength Indication (RSSI) measure for determining proximity. Not all RSSI values are the same as chip manufacturers and firmware are different. The RSSI value differs between different radio circuits. Two different models of iPhones will have similar internal bluetooth components whereas on Android devices there will be a large variation of devices and chipsets. For Android devices it will be harder to absolutely measure a consistent RSSI across millions of handshakes. A Covid-19 proximity virus transfer predictor should take into account the variances between BLE chipsets.

All of the ‘Contact Events’ are stored in the centralised NHSX database. This datastore will most likely hold simple document records for each event, its duration, the average proximity, the postcode (first three characters) only where that occurred and which devices were involved. It will then run queries against that database whenever a user self-registers their Covid-19 symptoms. The centralised server will push notification messages to all registered app users returned in that query. The logic in the server will most likely take a positive / inclusive approach to notification so that anybody within a 2 metre RSSI range for more than 1 second of a person with Covid-19 symptoms will be notified.

All EU countries must comply with EU-GDPR and all are currently launching their Covid-19 tracking applications. These applications because they require user downloading can only use ‘Consent’ as the legal basis for data capture and require a register of the user’s consent. The user must also be able to revoke ‘Consent’ through the simple step of deleting the app on their device. A more pertinent challenge is within a corporate or public work environment where there can be a ‘Legitimate Interest’ legal basis for capturing user’s symptoms. For example a care home could have legitimate interest in knowing the Covid-19 symptoms of its employees. It is likely we may see the growth in the use of private apps encouraged by employers if the national centralised government apps do not reach a critical mass. Either way we live in a smartphone world and bluetooth’s ubiquity is now certain.

Unit Tests for Covid-19 Polymerase Chain Reaction Software

Featured mustnotgrumbleLeave a comment

The Francis Crick Institute has repurposed its laboratories as an emergency Covid-19 testing facility. The Crick is helping combat the spread of infection and allow key workers to perform lifesaving duties and remain safe.

The scientists who made a 'home-brew' coronavirus test - BBC News — Crick repurposed PCR testing lab

One of the main technologies the Crick is using in this effort are Polymerase Chain Reaction machines. PCR machines test for the presence of a specific nucleic acid. The end to end process involves capturing molecules on a swab that are then broken down into genetic code, using special chemicals and liquid handling robots. The PCR (polymerase chain reaction) machine can then make billions of copies of DNA strands from the original swab. The PCR machine tests for the presence for the Covid-19 RNA. This is done on a series of 94 Wells (each containing an individual swab) on a Plate within the ThermoFisher PCR machines. The final step involves specialist clinicians making the decision on whether the sample contains sufficient RNA to justify the presence of Coronavirus.

The qPCR test produces a graph showing the exponential progress (or not) of the Cq (Ct) value as it traverses the threshold. The Cq value is the cycle quantification value of the PCR cycle number at which the sample’s reaction curve intersects the threshold line. This value tells how many cycles it took to detect a real signal from your samples. Real-Time PCR runs will have a reaction curve for each sample, and therefore many C_q values. Your cycler’s software calculates and charts the Cq value for each of your sample

Normal Fluorescence Graph in a PCR test

To help support the clinician diagnostic phase we have written a series of complementary tests. These seven tests (github) test each Well, each Plate and a series of Plates. The test data comes from the the ThermoFisher PCR machines and QuantStudio software.

T#	Test Name	Possible Outcomes	Scope	Data Required
1	Ct Range	Ct value ? < Threshold : > Threshold	Per Well	Ct Value, Exponential Phase, Pre-Exponential Phase, Threshold
2	Amplification Effect	Measure of exponential phase approximation to 100%. If Slope <0.01 test failed	Per Well	Intercept, Plot, Slope
3	R^2	R² is the Coefficient of Determination of a whole plate with a maximum and perfect value of 1 R Squared < X == failed test	Per Well & Per Plate-	R²value
4	Plate Standard Curve	Standard curve for baseline wells (A1 and H8)	Per Plate	Test well positions and reference gene
5	DeltaDelta Ct	Very similar to ANCOVA. Requires Treatment, Control and Reference genes. Difference between Delta CT Values	Multiple Plates Multiple Samples	Compare DeltaCT across multiple Gene samples. Requires a Reference Gene and a Reference Group
6	ANCOVA	Covariance analysis – what is the Ct difference of target gene value between treatment and control sample after corrected by reference gene	Multiple Plates (need reference & target, treatement & control)	CT and Concentration.
7	Efficiency Model	Log score per Well between 90% & 110% Across multiple Plates a value of Rsquared greater than >0.99	Per Well & Multiple Plates	Per Well: Column R in Results Sheet provides efficiency score per WellMultiple Plates: Slope: ~ –3.3R² >0.99

Seven logical tests implemented in code

The outcome of these tests can then be used in conjunction by the clinician reviewing 94 individual wells (each representing a unique swab). The intent is that this helps reduce human error and can improve the clinical throughput.

My Personal Experience of Working on 5G with Huawei

Featured mustnotgrumbleLeave a comment

The following blogpost explains my experiences with Huawei on 5G for the UK’s largest mobile operator. I was the lead architect responsible for the IT functions for their 5G deployment. I had a relatively close working relationship with Huawei. In summary I did not see any security issues with Huawei beyond the normal human security risks that apply to all vendors. I saw a vendor with strong investments in 5G and with good case studies from existing deployments. The removal of Huawei from the acceptable list of vendors will be to the technical detriment of my previous employer. I have since left BT and now work in biomedical research. My comments here are my own and are not influenced by any other party.

Case For:

In late 2018 BT & EE took the decision to not invite Huawei to respond to our new 5G mobile core RFP. I personally believed this was a mistake, as Huawei were one of the two incumbent suppliers of EE’s 4G LTE network core. EE currently use this 4G LTE network core to support the UK’s Emergency Services Network. I always believe that BT’s fiduciary duty is towards its shareholders and Huawei had provided a reliable & secure 4G core at a competitive price. Removing Huawei from the RFP increased the migration complexity and shrank the pool of possible vendors.

EE had the relevant software engineering skills to make a relevant technology assessment of any risks associated with Huawei. EE definitely had stronger domain knowledge and technical skills as GCHQ. However, since the acquisition of EE by BT those skills have started to leave the business. BT has been off-shoring key technical roles: preferring to keep its ‘business architects’ on-shore, and to move their technical skills off-shore. This has had an impact that there is now a shortage of on-shore technical skills within BT relating to 5G.

Huawei have invested very strongly in 5G technologies as was evident from previous demos of their technologies I have seen. Their reference case scale is also incredibly impressive: China Telecom are deploying one hundred thousand 5G masts. The equivalent in the UK, would be 5000 masts by the end 2020, and that would be across all four operators. All UK deployments of 5G could have benefitted from this 5G technical domain knowledge sharing.

Case Against:

One argument I have heard is around a 5G security access risk from the user plane accessing a backdoor into the control plane. When pressed this scenario involves a secret code being passed over the network, like a specific ‘secret’ telephone number, that opens a backdoor port into the mobile core. This is spurious for two reasons, the network implements a control plane and user plane split that makes this impossible. CUPS (control user plane split) is also one of the main architectures of 5G. The second reason is that any control to user plane integration would be network monitored and discovered by the operators.

Telecom operators invest in their network monitoring and reporting technologies. These allow the operator to see the heath of the network and to visualise the traffic flows within the network. Access to the internet is always through Internet Peering Points to which the control plane is not connected. If there was an open connection between the control plane and an internet peering point then it would be either monitored or discovered by the mobile operator

A continual security issue is a traditional issue with an industry with so many technologies and processes originating from during the Cold War. This is an issue of spies or operatives working with direct access to the telecommunications network and having the skills to eavesdrop on communications. This will always be a risk but risks can be mitigated by appropriate processes.

Conclusion:

I do not believe the back-door theories spread by certain security experts. The architecture of 5G control user plane split makes any back-door harder to access. Any tracking issues . Human risks are always present but can be mitigated. Access to the mobile core requires vetted clearance and UK tier 2 visas for Chinese workers are for only 3 months so Huawei employees never had direct access to live systems.

What is likely to occur is that telecom operators losing technical skills will become more reliant upon the OEMs for domain knowledge. If the largest OEM is excluded then the operators will either deliver things slower or at greater cost.

Mobile Network Operators and UK Open Banking – Role of Password-less Multi-Factor Authentication and 5G Network Slicing (of course)

Featured mustnotgrumbleLeave a comment

In 2017, 22 million people managed their current account on their phone which is predicted to increase to 35 million customers using mobile banking applications by 2023. The mobile phone, rather than internet or retail banking, is also the de facto standard for mobile banking services with more than 250 million Apple Pay users.

UK Open Banking is intended to create a FinTech market similar to a 1980s consumer credit boom by decoupling the underlying bank from the service provider. Open Banking promotes an aggregated single view of all of a customer’s accounts in one place as well as aggregated personal finance and debt management tools. This creates an opportunity for the Mobile Network Operator interested in providing financial services without undertaking a full banking licence.

Open APIs and security are critical to Open Banking. The Open APIs enable third-party developers to extend the services of financial institutions. Open Banking effectively supports and extends the European PSD2 directive, how non-Brexity!. In Open Banking, the UK CMA introduced rules that mean that banks must allow the customer to share their financial information with other AUTHORISED providers. These are known as Account Information Service Providers (AISPs) and are regulated by the FCA. This requirement creates an opportunity for the Mobile Network Operator to either become a Mobile Banking AISP and / or to be a more general provider of Security Services to AISPs and Banks. Both options benefit from specific technologies that the MNO can provide. These include:

a 5G Network Slice dedicated to “Mobile Banking”
the exposure of Risk Evaluation services based on fraud prevention and location data
the implementation of Passwordless Multi-Factor authentication service

Network services that increase the quality and security of mobile banking

Users of any service do not like service continuity issues. This discontent is greater when the interaction is form based and stateful; and the worry is higher if the session drops during a mobile banking transaction. For example, it can be peeving when session interruption affects transferring money whilst in the back of a taxi on the way to an airport. Mobile applications can handle session management issues more gracefully than mobile browsers. Nevertheless there will always be customer dissatisfaction associated with session drops when using mobile banking services.

5G provides improved session and service continuity. One of the key features of a 5G data service is session and service continuity, it ensures uninterrupted service experience to the user regardless whether there is any change of UE (User Equipment) IP address or change in the core network anchor point (4G LTE evolved packet system only provides continuity of IP session). This means that the Mobile Network Operator can provide a chargeable “Mobile Banking” Network Slice; or consume the service itself as a Open Banking service provider.

A 5G Mobile Network Slice dedicated to “Mobile Banking” can also provide enhanced user security as unique security parameters can be defined for network slices individually.

Multi Factor Authentication mechanisms provided by the Mobile Network Operator

The MNO can provide enhanced security based on location based services (subject to GDPR & customer approval). The MNO can provide a risk score based on location of the customer.

The Mobile operator knows through the National Device Register if the device has been stolen. The MNO can provide improved 2^nd and 3^rd Factor authentication protection through the Equipment Identity Register. This is important as finger print spoofing is a known and achievable process; and an amputated digit injected with Botox will continue to provide a useable finger print for two weeks!

Mobile operator understands the roaming likelihood and can quantify the risk Matching spend and location reduces fraud. Hence the Apple Pay contactless system does not have a £30 limit. In fact it is even safer as a physical card can be cloned and a four digit pin can be noticed.

The MNO can also wrap 2^nd and 3^rd factor authentication into its mobile app as an identity provider in the Open Banking universe. And it can provide commercial Risk and Location based APIs consumable by Open Banking service providers.

How Open Banking Implements Multi Factor Authentication and Strong Customer Authentication

UK Open Banking can implement Multi-Factor Authentication including Passwordless authentication mechanisms as part of Account Information Service Provider and Payment Initiation Service Provider flows. UK Open Banking uses OAuth 2.0, OpenID Connect and the Financial API specifications from the Open ID Foundation. This extends the PS2 OAuth 2.0 flow where the providing bank must use Strong Customer Authentication to authenticate the user.

This can be a Username / Password combination or a higher factor of authentication. More interestingly this can also be Passwordless (finger-print recognition) authentication by seamlessly pushing authentication to the bank’s mobile app (if on a mobile device). Alternatively this push can be to Account Information Service Providers’ authentication service. The Mobile Network Operator can be a UK Open Banking Account Information Service Provider using a 3 Factor authentication in a single passwordless action supplemented by the MNO’s own location based and fraud detection services

Use of Open Banking in the Internet of Things

The Mobile operator can also support an AISP model when supporting consumer Internet of Things propositions. As an example, the consumer with a listed Airbnb property that includes a number of smart devices may choose to manage the IoT contracts through a separate bank account whilst managing all their accounts through a single AISP. This creates a nice up-sell loop for the Mobile Network Operator providing AISP capabilities alongside IoT propositions.

Conclusion

Trust is critical for the success of mobile banking. Security breaches can lower the adoption of online banking services. The most effective mobile banking service is the one that integrates all of the available security tools together. This is one that the Mobile Network Operator already does well and can do better with 5G Network Slices and the use of Passwordless 3 Factor Authentication.

Good Data Governance is required to gather and store customer consent as part of Auditing phase of implementing Open Banking. The flow to secure the relationship between the Bank and the Open Banking provider must be Multi-Factor Authentication mechanism. The only way to make mass market 3-Factor Authentication any stronger is to utilise the MNOs location services.

Finally, Mobile Networks Operators have historically made poor banks but with Open Banking they do not need to take that long step. Instead they can aggregate their customer’s existing banking providers through Open Banking.

Telco Use Cases for Google Cloud Dataproc – managed Spark & Hadoop for Mobile Network Performance Data

Featured mustnotgrumbleLeave a comment

As a Google Cloud Platform certified architect I really should blog some more about my actual usage of GCP. One of my favourite tools is Dataproc as it provides a managed Spark & Hadoop environment and enables a lambda architecture suitable for complex network event processing and function remediation.

A mobile radio network is a dynamical system that can be modelled ergodically. Meaning that the radio network performance in geometrical space should be observed and modelled over a period of time. Storing this sort of data requires a geospatial datastore and a timeseries datastore. It is a huge amount of data stored as a nested map. This is why Dataproc’s ability to provide a probabilistic approach to testing a deterministic system is really useful in a remediating / self-healing mobile network.

Apache Spark provides the parallel processing of the variant datastores as Resilient Distributed Datasets (RDDs). Modelling the baseline data for geospatial topology, coverage and time-based trials is not trivial. But the fundamental processing of huge datasets for improved RAN distribution is highly challenging but eventually highly beneficial.

Guaranteeing Network Slices: The Role of 5G WAN Optimisation for Small Cells

Featured mustnotgrumbleLeave a comment

The City of Sacramento has deployed 300+ small cells as part of a 5G Fixed Wireless Access deployment with Verizon. These deployments can only provide partial 5G coverage of a city like Sacramento. This is because of the relative transmission range of 26Ghz and 28Ghz spectrum. In fill is required with further small cells and coverage fill is required at mid-band Spectrums like 3.5Ghz.

The most effective way of delivering backhaul to multiple small cells sites is to use SD-WAN technologies over either Ethernet or microwave links. WAN Optimisation requires an intelligent-path control mechanism for improving application delivery and WAN efficiency. This intelligent path control and management of VPN tunnels needs to be integrated into the network slice management control plane function in order to guarantee the mission critical services.

The Network Slice Management control plane needs to manage end to end the latencies and traffic shaping. To do this the SD-WAN component for small cell backhaul must be an integral part of the end to end network orchestration.

Master Orchestrator Problems

The challenge for telcos face is how to integrate technology specific orchestrators. A 5G SD-WAN small cell solution could involve four unique orchestrators:

small cell orchestrator
with a 5G core orchestrator
a network slice orchestrator (NSSMF)
and multiple existing SD-WAN orchestrators

Most telcos have already deployed a SD-WAN products, involving multiple SD-WAN CPE vendors, where each CPE vendor provides a bespoke orchestrator. Industry examples include, the Cisco Viptela SD-WAN solution which uses a vManage network management solution within the orchestration / management plane and the Nokia Nuage SD-WAN solution that follows the same pattern.

To break this predominance of orchestrators (with lots of compensating logic) it is important to seek integration by API direct to the control plane. To be successful telcos may wish to examine how a vendor agnostic Network as a Service may improve their 5G orchestration strategy.

A 5G Data Fabric

February 28, 2019 mustnotgrumbleLeave a comment

Most 5G deployments will not be greenfield. But a successful 5G deployment is not limited to simply deploying new radio on existing sites. It requires a new approach to telecom IT that can both simplify the telco’s estate and prepare for the new business opportunities of 5G. A complete data fabric (for 5G or for everything) will support both the business opportunities and the network complexities of 5G.

A data fabric includes all of the necessary data services for operating a mobile network and providing connectivity and ‘beyond connectivity’ services. This means offering the many different persistence storage toolsets to your business logic layer (as represented by micro-services in docker). An application can then utilise the most appropriate persistence technology for their requirements. For example, this could mean exposing a RDBMS for structured data, a Graph for modelling topologies and document storage for persisting YANG documents.

Data Fabric for a Micro Service Architecture

The business value of the data fabric is that it allows the clever telco to disassociate their software requirements’ from the data plane. Thus enabling a micro-service architecture that can manage a virtualised network; and then on top of that expose services to their customers.

Key data fabric use cases for 5G include:

A network planning architecture for geo-planning cell site deployments including in-building
A network topology architecture that can model a highly complex network and enable Self Organising Networks
A time series streaming architecture that can model events coming off a network and a customer’s deployments and enable effective Machine Learning driven autonomic improvements
A network orchestration architecture for a virtualised network (full or partial)
A network slice management and guarantee architecture (with support for a blockchain based service level guarantee)
A subscriber data management architecture for unified value added services and subscription

The following is my description of a logical data fabric for a 5G implementation. I am publishing it because it can help network operators to push their software vendors to decouple the software’s logic from its data persistence. Below are all the logical tools needed:

RDBMS for ACID based transactions that are useful for physical inventories, managing subscription updates (less so reads) and all other structure data
Graph database for modelling network topologies, relationships and dependencies. Very useful for machine learning, root cause analysis and spotting previously unknown interconnected loops between items
Wide column database for dealing with unstructured extensible datasets that include all the different devices supported on a 5G network. Very useful within IoT and customer network experience.
OLTP NoSQL database for offline analytical processing including network topology efficiency modelling and network performance analysis as part of an ITIL Problem / Change Management process
Document datastore for managing Infrastructure as Code and Virtual Network Function deployment descriptors in the form of YANG documents. Useful in blockchain contracts and services.
In Memory Datastore for fast reads and data caches
Geo-spatial database for modelling RAN deployments and radio propagation. Incredibly important as RAN efficiencies have a major bottom line impact. Increasingly need to support in-building information for small cell deployments. Needs to work together with other radio technologies including 5G.
Time series database for performance monitoring which can be implemented within the customer network experience function of a wide column database and with use of Grafana and Prometheus

Some 5G Data Fabric Use Cases:

	RDBMS Database	Graph Database	Wide Column	OLTP	Document Data Store	In Memory Database	Geo-spatial Database
Network Plan & Build and Analysis	Y			Y			Y
Physical Network & Static Inventory	Y						Y
Virtual Network & Dynamic Inventory		Y			Y
Fast Read Inventory		Y				Y
Streaming Fast Analysis		Y	Y
Offline Event Analysis		Y		Y
Subscription Management & Entitlements	Y					Y

In conclusion, most telcos have bought siloed commercial off the shelf products for individual specific use cases. This has meant that the telco has often only used as little as 40% of the intrinsic value of their commercial software licences. The cost of building 5G will be high, and the greater share of the prize will go to the most agile operators. It is therefore incumbent on mobile operators to drive the greatest efficiencies from their software investments.

5G is a great driver for change. The most effective 5G operators will be those that can get their data architecture right first time. Telecom operators must start moving to a data fabric.

Economic Analysis of mmWave Fixed Wireless Access as an Alternative for FTT/x

January 26, 2019 mustnotgrumbleLeave a comment

FWA is not a new idea with 5G and has been available to anybody tethering since 3G. FWA is comparable to Fibre-to-the-Home as both are connectivity solutions for the edge of the network. 5G mmWave (~25Ghz and above) is promising an alternative to FTTH, with 1Gb per second download speeds. It is therefore worth understanding the technologies and engineering necessary to make FWA a viable or better alternative to fibre.

Verizon has targeted FWA as an alternative to FTTx with its 5G Home service launched across Houston, Indianapolis, Los Angeles and Sacramento in October 2018. Verizon estimates the 5G mmWave FWA addressable market to include 30 million premises. To be successful Verizon’s FWA has to be cheaper than the delivery of FTTx and will have to overcome some quite considerable engineering challenges. These include the roll-out of multiple 5G antennas with small-cell front-haul for extended coverage, the deployment of external to home 5G receivers, a distributed core that can host Mobile Service Edge and CDNs close to the 5G Cell Towers, and a new 3GPP Release 16 Core that can support network slicing for the 28Ghz spectrum.

The above diagram shows a logical architecture for a 3GPP Release 16 compliant new mobile core connected through multiple distributed sites connected to radio site gNodeBs delivering FWA service to the home. A new core is not fully necessary, as Verizon are launching already using their channel coding, multiplexing and interleaving technologies. A new mobile core will be advantageous in guaranteeing the QoS for mmWave FWA slices.

The majority cost for FWA is in the delivery of the radio network and mmWave antenna. Higher costs will always be incurred if RAN planning has not been optimised and necessitates 5G small cell in-fill. For this reason mmWave may be better deployed as new sites in a standalone Model 2x configuration. Other costs include upgrading the mobile core but this cost is shared with other 5G use cases. Spectrum licencing is another important cost. Currently mmWave licence spectrum is relatively available, hence lower cost, and more extremely high frequency is being released by national regulators.

To be competitive FWA must be economically viable against fibre delivered to the home. This includes internet peering & CDNs. In regulated territories like the UK that already have Local Loop Unbundling the competitor CSP can consume service from the distributed site. This has been part of the US regulatory framework since the US Telecommunications Act of 1996 that requires ILECs to lease local loops to competitors (CLECs). In an all fibre model the cost of connection is to the premise (FTTP) or home (FTTH). If regulatory dark fibre or open ducts are in place then the competing CSP can consume those services at a regulatory defined price. In the UK that model is only being developed after initial regulatory challenges and in the US the FCC has not extended enforcement of dark fiber offering since 2014. It is therefore suitable for a US mobile carrier to consider 28Ghz as a more efficient distribution mechanism than FTTH if there are no regulated dark fibre or open-duct solutions available. It is also worth considering that the civils part of the delivery of fibre (the dotted FTTH line in the below diagram) can cost as much as 90% of the total service delivery cost.

A final comparison between FTTH and FWA:

Same Costs: Network spine, backhaul and equivalent equipment are the same for FTTH & FWA
Higher FWA Costs: The spectrum licence costs are unique to FWA but due to spectrum availability may not be prohibitive, power & cooling costs are higher for FWA and the maintenance cost of FWA should be higher for exposed antennae
Higher FTTH Costs: The only cost that is higher with FTTH is the civils part of delivery. This cost can be very high because of the complexity of getting wayleaves and permissions and digging up roads.
In conclusion, FWA should be more efficient and cheaper service to deliver as long as the network planning is accurate and does not necessitate continual modification based on further cell deployments.

Monitoring Micro-Service Applications across Hybrid Clouds using Istio service mesh multi-clusters, Kiali observability, Zipkin tracing, Prometheus events and Grafana visualisations

January 23, 2019January 24, 2019 mustnotgrumble1 Comment

Most enterprises have complex application deployments across their own internal data centres and commercial clouds. I am using Google Cloud Platform and AWS in this example. Where I work, we traditionally monitored logs and configured alarms for network and infrastructure monitoring. This approach was disjointed and slow to react. The enterprise moved to cloud hosting with elastic scalability a few years ago which led to multiple stove pipes of monitoring capability and a heavy dependency on VPC interconnects. We wanted to move to a multi-cloud environment whilst maintaining the benefits of a centralised technology operations centre.

We quickly realised that we had specific workloads running in different environments with no common mechanism for monitoring & reporting. This led us to examine open-source monitoring architectures based on Netflix’s Keystone Pipeline. Our requirements were for a universal data visualisation and observation of our application based on Grafana, Zipkin and Kiali.

Logical architecture and open source technologie

This architecture is based on open-source projects that we can use across GCP, AWS and internally. Everything is predicated on Docker containers and Kubernetes container orchestration. Istio provides the policy and load-balancing functions of a service mesh and GRPC provides the low latency integrations between the micro-services. These technologies provide the enablers for the monitoring & visualisation capabilities of Kiali, Zipkin and Grafana.

The following diagram shows the open-source component architecture to support different internal data centres (one for IT running Pivotal and one for mobile network IT running Openstack), Google App Engine and AWS Kubernetes service EKS on EC2. This logical architecture has the intention of a single pane of glass for service management toolkit technologies.

Open Source Monitoring Toolset across Hybrid Clouds

To achieve a single pane of glass across multi clouds requires the need of a aggregation function that can integrate the control plane of multiple Kubernetes container orchestrations. Istio achieves this by supporting multicluster deployments across hybrid clouds by deploying a control plane to each Kubernetes cluster. Kiali can provide service mesh observability of a Istio multi-cluster environment. A Helm variable global.remoteZipkinAddress can be used to connect Zipkin distributed tracing to the Istio cluster.

All of this together enables a Kubernetes control plane on each hybrid cloud environment to be interconnected to the master visualisation technology operations centre environment.

The traffic flow of a Kube ingress allows the ELB using GRPC to integrate multiple clusters where the Prometheus collection agents are deployed. These can then be aggregated together through the Prometheus server in the logical control plane.

Note that the HELM Tiller deployments to each cluster support the multi-cluster control plane as described here.

Kubernetes and Istio Mixer Control Plane for Multicluster Deployments

Prometheus provides the time series of events for the multiple clusters that can then be queried by any Grafana server which treats storage backends as time series data (Data Source). Each Data Source has a specific Query Editor that is customized for the features and capabilities that the particular Data Source exposes. Grafana can also consume StackDriver, CloudWatch and Ceilometer for Openstack.

In conclusion:

Istio, Helm & Tiller can manage a multi-cluster hybrid cloud deployment
moving to a hybrid cloud requires a visualisation of complex integrations which is where Istio and Kiali service mesh observability are strong
hybrid cloud monitoring can be achieved by deployment of agents including Prometheus collection agents to individual clusters and connected to a Prometheus server which in turn is rendered by a Grafana server
Zipkin provides distributed tracing and integrates with the Istio managed cluster

One point not described is the requirement for a technical inventory that describes the individual micro-services and the toolsets that can be deployed to each container, but i’ll save for another blog.

Finally, there are technology alternatives to Kiali, Zipkin, Grafana and Prometheus such as included Logstash & ELK, FluentD and commercial solutions like Datadog.

5G and TM Forum Digital Transformation Middle East

January 21, 2019 mustnotgrumbleLeave a comment

I’m talking at the TM Forum Middle East Digital Transformation event https://dtme.tmforum.org/speakers/charles-gibbons/ on 5G. It’s great to be invited to share my knowledge of 5G architecture and delivery. I will be covering the roll out of 5G service in the UK and will be specifically covering how knowledge share is critical for successful implementations of 5G.

EE is launching 5G in the UK in 2019 across 16 cities: https://newsroom.ee.co.uk/ee-announces-5g-launch-locations-for-2019/

Focus on 5G Monetisation and the business value and the need for Open APIs for an ecosystem architecture. Telcos do not have a domain right to provide IoT services over 5G. It is important that all CSPs support open APIs for their 5G services including TM Forum, GSMA OpenAPIs, ETSI Mobile Edge Compute APIs, NIST and other more commercial offerings.