Guaranteeing Network Slices: The Role of 5G WAN Optimisation for Small Cells

The City of Sacramento has deployed 300+ small cells as part of a 5G Fixed Wireless Access deployment with Verizon. These deployments can only provide partial 5G coverage of a city like Sacramento because of the relatively short transmission range of 26 GHz and 28 GHz spectrum. Infill is required with further small cells, and coverage fill is required at mid-band spectrum such as 3.5 GHz.

Sacramento 5G FWA Coverage

The most effective way of delivering backhaul to multiple small cell sites is to use SD-WAN technologies over either Ethernet or microwave links. WAN Optimisation requires an intelligent path control mechanism for improving application delivery and WAN efficiency. This intelligent path control and management of VPN tunnels needs to be integrated into the network slice management control plane function in order to guarantee mission critical services.

The Network Slice Management control plane needs to manage latencies and traffic shaping end to end. To do this, the SD-WAN component for small cell backhaul must be an integral part of the end to end network orchestration.
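As a rough illustration of what slice-aware path control might look like, the sketch below picks a backhaul path per slice from measured latency and loss. It is a minimal sketch only: the `Path` and `SliceSla` types, the `select_path` function and all of the numbers are hypothetical and do not come from any vendor's SD-WAN API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    name: str          # e.g. an Ethernet or microwave backhaul link
    latency_ms: float  # most recent measured latency (illustrative value)
    loss_pct: float    # most recent measured packet loss (illustrative value)


@dataclass(frozen=True)
class SliceSla:
    name: str
    max_latency_ms: float
    max_loss_pct: float


def select_path(paths: list[Path], sla: SliceSla) -> Optional[Path]:
    """Return the lowest-latency path that meets the slice SLA, or None."""
    eligible = [
        p for p in paths
        if p.latency_ms <= sla.max_latency_ms and p.loss_pct <= sla.max_loss_pct
    ]
    return min(eligible, key=lambda p: p.latency_ms, default=None)


# Hypothetical measurements for two backhaul links of one small cell site.
paths = [
    Path("ethernet", latency_ms=4.0, loss_pct=0.01),
    Path("microwave", latency_ms=9.0, loss_pct=0.20),
]
mission_critical = SliceSla("mission-critical", max_latency_ms=5.0, max_loss_pct=0.05)

chosen = select_path(paths, mission_critical)
print(chosen.name if chosen else "no path meets the SLA")
```

When `select_path` returns `None`, no path can carry the slice within its SLA. That is the kind of signal the slice management control plane would need to see, which is why the path control cannot sit in an isolated SD-WAN orchestrator.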

Master Orchestrator Problems

The challenge telcos face is how to integrate technology specific orchestrators. A 5G SD-WAN small cell solution could involve four unique orchestrators:

  1. a small cell orchestrator
  2. a 5G core orchestrator
  3. a network slice orchestrator (NSSMF)
  4. multiple existing SD-WAN orchestrators

Most telcos have already deployed SD-WAN products, involving multiple SD-WAN CPE vendors, where each CPE vendor provides a bespoke orchestrator. Industry examples include the Cisco Viptela SD-WAN solution, which uses vManage for network management within the orchestration / management plane, and the Nokia Nuage SD-WAN solution, which follows the same pattern.

To break this proliferation of orchestrators (with lots of compensating logic) it is important to seek integration by API direct to the control plane. To be successful, telcos may wish to examine how a vendor agnostic Network as a Service may improve their 5G orchestration strategy.

BT & EE’s First To 5G Trial in Canary Wharf

BT has started its first live UK trial of 5G based technology in Montgomery Square, Canary Wharf. This is a high capacity zone test, as Montgomery Square includes a London Underground entrance and high rise offices. The footfall is in excess of 150k people per day.

High capacity zone testing is a critical part of EE’s 5G launch program, with the first phase of its 5G roll-out targeting “hotspots” across the UK – the places that have the greatest number of people using the most mobile data.

The test hardware and spectrum are much closer to the final commercial deployments that will begin in 2019. Key to the test is a successful FCAPS deployment for live monitoring and reporting on the site and its associated backhaul. BT & EE handle 15 million network reporting events a day as part of their streaming architecture.

Edge SDN as a Service

Not all micro-services can be stateless lambda functions. Some services must maintain state. A good example is the management of autonomous vehicle platooning functions across multiple radio network sites.

A challenge for this distributed statefulness is that, if the stateful micro-services are running in a specific container, the SDN controller has to manage networking to that specific container. This requires attaching the SDN networking at the container rather than the host level, something that is possible with Amazon EC2 Container Service.
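One way this can look on Amazon ECS is the `awsvpc` network mode, where each task gets its own network interface and private IP address rather than sharing the host's. The task definition below is a minimal sketch built as a plain Python dictionary; the family name, container name, image and port are hypothetical placeholders.

```python
import json

# Hypothetical ECS task definition for a stateful micro-service.
# "awsvpc" attaches networking at the task (container) level, not the host level.
task_definition = {
    "family": "platooning-state",            # hypothetical name
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["EC2"],
    "containerDefinitions": [
        {
            "name": "platooning-state",      # hypothetical name
            "image": "example/platooning-state:latest",  # placeholder image
            "memory": 512,
            "essential": True,
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        }
    ],
}

print(json.dumps(task_definition, indent=2))
```

Because the address belongs to the task and not to the host, a controller can apply policy to the stateful service directly.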

If Tier-1 telcos are serious about providing Network as a Service or Edge Compute as a Service then they must provide the join between data centre and network operator. To do this they can either be the edge landlord to Amazon, Google and Facebook or, if they are truly ambitious, provide an SDN Edge themselves.

Charles Gibbons is talking about Future of NFV / SDN at Digital Transformation World this week in Nice.

A Reference Architecture for Cloud Operational Support Systems

Most telecoms operators have multiple stove-piped networks, each with its own historic OSS. All CSPs want the agility of Web Scale firms and view OSS and Cloud provision as complementary technologies. The challenge for CSPs is to move from legacy vertical pillars to a horizontal platform model. Trying to achieve this with a simple OSS refresh will be a mere shim. For CSPs to be revolutionary they must consider the viability of a Cloud OSS as a way of externalising the orchestration & management of their network resources.

Currently it’s quite easy to find major components of a SaaS BSS (for example Salesforce). However, it is much harder to find an equivalent within the OSS domain. The primary reason for the lack of SaaS in this domain is the nicheness of OSS (discussed previously here: IoT Don’t Need No OSS). This nicheness is changing as AWS, GCP and Azure essentially offer IoT OSS. There’s currently no ONAP SaaS, but I wouldn’t be surprised if ONAP matured into a SaaS offering at some point. The other major area of concern is security, which can be mitigated through policy & control. Lastly, there are concerns around the throughput / latency of Resource Performance Management, which is a specific topic covered later.

There’s also increasing CSP interest in Open Source OSS (OSS2 maybe?) with Open Source MANO, ONAP and the new TM Forum ODA architecture (for which I’m partly responsible). These OSSs provide functions that are componentised in their design.

I’ve personally been looking at putting together a best of breed architecture based on OSM, ONAP and some Netflix OSS on a cloud-hosted environment to support multiple operational networks. In doing this work I’m trying to answer the following questions:

  1. What is a suitable logical architecture for a Cloud OSS?
  2. And if it can’t all be externally hosted then what would be a suitable logical hybrid architecture?

In order to answer these questions let’s decompose the functions of OSS and compare which parts are most suitable for being cloud hosted. Let’s break it down (using eTOM’s service and resource domains) into nine logical packages for further investigation.

Functions for Cloud OSS

I’ve categorised the functions by Cloud Nativeness (how easy it is to port these functions to the Cloud and how many SaaS offerings are available) against Network Interactivity (be it throughput of data or proximity to element managers). It is fairly self-evident that certain functions are cloud native (service management) whilst others (order to activation) require close deployment to the network and have specific security constraints.

By grouping the logical functions we end up with three groups: Cloud Native Solutions (those that already run well in the cloud), Not Cloud Native Solutions (those that can’t be externalised to the Cloud) and a middle group of Either / Or that could be either internally managed or externalised.

The Either / Or group is the newest area, covering Machine Learning, Autonomics and MANO for NFV / SDN. These could be either natively deployed (for example a local deployment of FlinkML on top of a performance management solution) or a cloud hosted solution (e.g. Google Cloud Platform’s TensorFlow deployment).

Cloud Native Solutions: 

Service and Incident Management systems include perennial favourites ServiceNow, BMC Remedy & Cherwell. As cloud hosted solutions, these tools require feeds from alarm management systems. As the architecture orients itself to data streaming and machine learning, the incident management system handles fewer tickets and works more on auto-remediation. This model necessitates the closed loop remediation function sitting within the network. I would expect a streaming flow in and out of the network boundary, and this will obviously be the biggest of the pipes (and the most risky). Network & Service Operations provides a specialism for service & incident management and includes resource alarm management; solutions like EMC Smarts & IBM Netcool increasingly offer cloud based operation consoles for alarm management.

Field Management systems together with Resource Plan and Build are easily managed from a public cloud. These systems have limited access to the operational network and normally have to manage internal and 3rd party resource to complete field operations. Systems like ESRI and Trimble fit in this space. These systems predominantly need access to resource and service inventories, and resource tools (such as HR systems, maps and skills bases).

Strategy systems are an interesting case of specialist planning, delivery and product lifecycle tools within eTOM. They cover service development & retirement, capability delivery and strategic planning. These functions are all loosely coupled to the network: they require inventory detail, resource detail and a big data store of network performance, but they can be hosted externally and are not mission critical systems. So for our OSS these should be Cloud Native.

Not Cloud Native Solutions:

Order 2 Activation covers the activation systems for management of the network, which handle either subscription or resource activation. There is a distinction here between the provisioning controller and the intent based network choreographer (which passes intent and policy to the network).

Performance Management covers real time operational systems that predominantly take data streams from the network. They require local deployment, as network functions need low latency if incidents are to be managed immediately.

The Interesting Either / Or Group:

MANO for NFV / SDN can either be a localised solution or be cloud hosted, in the case of a master orchestrator implementing intent based models. This model makes sense when the orchestration involves third party service orchestration, and is partially covered by the TM Forum ODA. The challenge would be organising the split between VNF Management and NFV Orchestration. The security controls will need to close off the attack vector to the client VNF Manager running inside a CSP’s network.

It is likely that CSPs will investigate this model going forward as they look to benefit from the opportunity of providing Mobile Edge Compute as an integrated PaaS.

Machine Learning & Autonomic Remediation is partially dependent upon the NFVO cloud architecture, as the remediation engine needs services to be exposed in order to act. If the NFVO is already cloud hosted then remediation is a natural continuation of its capabilities. The Machine Learning capability is a driver for the remediation engine, constantly looking for situational improvements for specific conditions. Machine Learning can be deployed locally on a CSP’s own infrastructure or use the scaling capabilities of tools like TensorFlow on GCP. The decision CSPs make here will be about scaling the intelligence to provide usable conditions that can be implemented within the remediation engine. A CSP with good skills in this area will have a technology advantage.
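The three groups described above can be written down as a simple placement map. This is a sketch under stated assumptions: the short package names are illustrative labels, and the rule that Either / Or packages follow the hosting of the NFVO is a simplification of the reasoning above rather than a fixed design.

```python
from enum import Enum


class Group(Enum):
    CLOUD_NATIVE = "cloud native"
    NOT_CLOUD_NATIVE = "not cloud native"
    EITHER_OR = "either / or"


# Grouping as described in the text; the keys are illustrative labels.
OSS_PACKAGES = {
    "service_and_incident_management": Group.CLOUD_NATIVE,
    "network_and_service_operations": Group.CLOUD_NATIVE,
    "field_management": Group.CLOUD_NATIVE,
    "resource_plan_and_build": Group.CLOUD_NATIVE,
    "strategy": Group.CLOUD_NATIVE,
    "order_to_activation": Group.NOT_CLOUD_NATIVE,
    "performance_management": Group.NOT_CLOUD_NATIVE,
    "mano_nfv_sdn": Group.EITHER_OR,
    "ml_and_autonomic_remediation": Group.EITHER_OR,
}


def placement(package: str, nfvo_cloud_hosted: bool) -> str:
    """Return 'cloud' or 'on-net' for a logical OSS package.

    Assumption: Either / Or packages are placed wherever the NFVO is hosted.
    """
    group = OSS_PACKAGES[package]
    if group is Group.CLOUD_NATIVE:
        return "cloud"
    if group is Group.NOT_CLOUD_NATIVE:
        return "on-net"
    return "cloud" if nfvo_cloud_hosted else "on-net"


for name in OSS_PACKAGES:
    print(f"{name}: {placement(name, nfvo_cloud_hosted=True)}")
```

Flipping `nfvo_cloud_hosted` to `False` moves only the Either / Or packages, which reflects the hybrid architecture question posed earlier.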

Next Steps:

I will be updating this stream as I believe there is a genuine future for a Cloud Native OSS. So please keep following this blog and ping me @apicrazy if you’re on the same journey.

12 Reasons Why Cloud OSS hasn’t happened so far

I am regularly asked why there are so few Cloud OSS, or OSS as a Service, options when AWS / GCP and Azure all have IoT plays. I have also wondered why no systems integrator has deployed ONAP on AWS (or other). The following are the main reasons why I think such an option has not yet become popular for CSPs & vendors.

  1. Network operators are risk averse
    • That’s a very good thing, as CSPs protect your data in flight and at rest. Security is critical for CSPs. However, this does not mean that a Cloud OSS cannot be used, just that the appropriate security measures need to be in place.
  2. Network operators have customers that are even more risk averse
    • That’s a very good thing too, and CSPs have to take account of their customers’ requirements. However, a private cloud or a public cloud can be secured in the same way as a private data centre. The OSS must make sure that it is not persisting customer data or exposing network functions.
  3. Cloud OSS creates another attack vector and dude we’ve got enough of those
    • We sure do. But internally hosted OSS is itself a risk / attack vector. The benefit of Cloud OSS is that it should allow a simplification / reduction of the number of OSS stacks within the CSP
  4. OSS must be internal because of Data regulation and on-shoring / safe-harbouring of data
    • OSS systems should not be persisting customer data (EVER, even static IP addresses!). So data regulation requirements will only have limited application. OSS data must be secured at rest and in transit. The low latency requirements of OSS will require hosting near the network.
  5. Few network operators have sufficient levels of virtualised network functions
    • This is changing rapidly and 5G technologies will be predominantly virtualised
  6. The cost of the OSS is always a low proportion of the costs of the network
    • This is true but does not stop the need to gain greater platform efficiencies.
  7. Moving to the cloud will not wipe away the legacy
    • Of course it won’t, but it will help focus on the future and pass management of VNFs to a single master. PNF management will always be a challenge.
  8. The OPEX model is not always beneficial
    • This is true but OSS stovepipes are not cheap. Best of breed SaaS will help spread the cost and not create a lock in to a single technology version.
  9. It’s the OSS, those guys don’t move quickly
    • A classic refrain but not a reason not to move to a Cloud OSS
  10. The streaming data pipe will be too fat and the latency will be too high to fix items quickly
    • This is a genuine concern and will require a data pipeline architecture with streaming inside the network and OSS components residing outside. Intent based programming with specific levels of management at the different layers will be key to answering the low latency requirement, especially when control is part of a network slice management function.
  11. The BSS will never be in the Cloud
    • Salesforce, GCP, AWS, Pega, Oracle Cloud, Azure are all changing that model. Especially in the IoT space.
  12. The OSS will never be in the Cloud
    • Watch this space….