Most enterprises have complex application deployments across their own internal data centres and commercial clouds. I am using Google Cloud Platform and AWS in this example. Where I work, we traditionally monitored logs and configured alarms for network and infrastructure monitoring. This approach was disjointed and slow to react. The enterprise moved to cloud hosting with elastic scalability a few years ago which led to multiple stove pipes of monitoring capability and a heavy dependency on VPC interconnects. We wanted to move to a multi-cloud environment whilst maintaining the benefits of a centralised technology operations centre.
We quickly realised that we had specific workloads running in different environments with no common mechanism for monitoring & reporting. This led us to examine open-source monitoring architectures based on Netflix’s Keystone Pipeline. Our requirements were for a universal data visualisation and observation of our application based on Grafana, Zipkin and Kiali.
This architecture is based on open-source projects that we can use across GCP, AWS and internally. Everything is predicated on Docker containers and Kubernetes container orchestration. Istio provides the policy and load-balancing functions of a service mesh and GRPC provides the low latency integrations between the micro-services. These technologies provide the enablers for the monitoring & visualisation capabilities of Kiali, Zipkin and Grafana.
The following diagram shows the open-source component architecture to support different internal data centres (one for IT running Pivotal and one for mobile network IT running Openstack), Google App Engine and AWS Kubernetes service EKS on EC2. This logical architecture has the intention of a single pane of glass for service management toolkit technologies.
To achieve a single pane of glass across multi clouds requires the need of a aggregation function that can integrate the control plane of multiple Kubernetes container orchestrations. Istio achieves this by supporting multicluster deployments across hybrid clouds by deploying a control plane to each Kubernetes cluster. Kiali can provide service mesh observability of a Istio multi-cluster environment. A Helm variable global.remoteZipkinAddress can be used to connect Zipkin distributed tracing to the Istio cluster.
All of this together enables a Kubernetes control plane on each hybrid cloud environment to be interconnected to the master visualisation technology operations centre environment.
The traffic flow of a Kube ingress allows the ELB using GRPC to integrate multiple clusters where the Prometheus collection agents are deployed. These can then be aggregated together through the Prometheus server in the logical control plane.
Note that the HELM Tiller deployments to each cluster support the multi-cluster control plane as described here.
Prometheus provides the time series of events for the multiple clusters that can then be queried by any Grafana server which treats storage backends as time series data (Data Source). Each Data Source has a specific Query Editor that is customized for the features and capabilities that the particular Data Source exposes. Grafana can also consume StackDriver, CloudWatch and Ceilometer for Openstack.
- Istio, Helm & Tiller can manage a multi-cluster hybrid cloud deployment
- moving to a hybrid cloud requires a visualisation of complex integrations which is where Istio and Kiali service mesh observability are strong
- hybrid cloud monitoring can be achieved by deployment of agents including Prometheus collection agents to individual clusters and connected to a Prometheus server which in turn is rendered by a Grafana server
- Zipkin provides distributed tracing and integrates with the Istio managed cluster
One point not described is the requirement for a technical inventory that describes the individual micro-services and the toolsets that can be deployed to each container, but i’ll save for another blog.
Finally, there are technology alternatives to Kiali, Zipkin, Grafana and Prometheus such as included Logstash & ELK, FluentD and commercial solutions like Datadog.