A Brief Introduction to Istio

Building a Service Mesh


Microservices architectures solve some problems, but they also create new ones. Splitting applications into independent services makes them easier to develop, update, and scale. At the same time, it leaves you with many more moving parts to connect and secure. Managing all the networking concerns - load balancing, traffic management, authentication and authorization, and so on - can become incredibly complex.

The larger and more intertwined a microservices architecture becomes, the harder it is to keep track of. The resulting network of interconnected services is referred to as a service mesh. Many such architectures are now developed cloud-natively, which places special demands on the teams that have to operate them. Istio is a tool designed to help those teams keep track of everything.

Service Mesh

The term service mesh describes a network of microservices that interact with each other via network calls and thus form an integrated application. As the number of services increases, the calling behavior of the individual components becomes more complex and thus more difficult to understand and control.

A microservice application consisting of a double-digit number of microservices, for example, can already become difficult to manage. Typically, each new version of a microservice is deployed in production in parallel with existing versions of the same service, because retiring an old version is rarely straightforward: it is easier, and causes fewer side effects, to bring the new version into production without making adjustments to the older ones. The trade-off is that multiple versions of a microservice exist in parallel. If, on average, each service exists in two or three versions, which is not uncommon, the number of deployed services doubles or triples quite quickly. A microservice application that originally started with twenty services quickly approaches fifty.

What is often neglected in this count are the necessary database systems, since a microservice usually also has persistence requirements. The database instances involved are therefore also part of the service mesh. Here you can count yourself lucky if several versions of a microservice can share the same database, because the database schema has been kept backward compatible. Extending the example above, at least twenty database services join the nearly fifty application services, for an intermediate total of about seventy processes.

And that is still not all: the distributed application should also have sufficient reserves in terms of load and reliability. Doubling the number of instances to increase availability is the minimum that should be planned for, which pushes the roughly seventy processes to around 140. Thus, in our example, the magic limit of one hundred process instances is exceeded, and that with only twenty microservices at the outset. Additional factors, such as staging environments or multi-tenancy requirements, can increase the number significantly or result in several complex service meshes that all have to be managed.

A service mesh of this size implies a whole range of functionality that a management tool must provide. It starts with service discovery and load balancing, plus resilience and failure recovery; that covers the bare minimum of service-to-service communication. For failure analysis, the call chains across the individual services must be traceable, which is usually achieved with tracing tools. For the operations team, metrics and monitoring are essential for stable operation, and very few applications can do without security (access control and end-to-end authentication). Rate limiting is sometimes necessary as well, to avoid overloading the distributed application. On top of that come requirements derived from concepts such as A/B testing or canary releasing, which bring new versions of microservices into production in an orderly manner. There may also be a desire for content-based routing, i.e. directing calls between services based on the content of the request. Finally, for development, testing, and the reproduction of misbehavior, it is often necessary to inject errors or timeouts in a targeted manner.

In more detail, a service mesh consists of two additional layers that sit between an orchestrator like Kubernetes and the applications. A service mesh provides each service instance with a so-called sidecar container. In Kubernetes, this is placed in the same pod, the deployment unit of Kubernetes. Both containers therefore run on the same host, share a network interface, and can share file systems. Inside the sidecar container runs a service proxy through which all inbound and outbound communication passes. Together, the service proxies form the decentralized data plane, which not only captures values such as the source and destination of a request, latency, and error codes, but can also control and manipulate traffic. The second layer that a service mesh adds is the control plane. It contains central components that configure the sidecar proxies on the data plane: how the collected data should be processed, but also how network traffic should be inspected, modified, or directed based on rules.
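
To make this concrete, the following is a minimal sketch of what a pod with a sidecar could look like. The container and image names are illustrative assumptions; in practice, Istio generates the sidecar definition itself during injection and additionally adds an init container that rewrites the IP tables.

# sidecar-pod.yaml - illustrative sketch, not a generated manifest

apiVersion: v1
kind: Pod
metadata:
  name: svc-a
  labels:
    app: svcA
spec:
  containers:
  - name: svc-a                 # the actual microservice
    image: example/svc-a:1.0    # hypothetical application image
    ports:
    - containerPort: 8080
  - name: istio-proxy           # Envoy sidecar, shares the pod's network
    image: istio/proxy_debug    # proxy image; name and tag vary by release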

The sidecar approach means that the actual microservice is not aware of any of this: the service developer does not have to write a single line of code to enable these functionalities. This has the enormous advantage that the same technology works for a polyglot microservice application, and developers can devote their time to the actual implementation of the microservice. Special frameworks (service discovery, resilience, circuit breakers, ...) are no longer necessary.

What is Istio?

Istio is a merger of several open source projects, the product of a joint effort between Google, IBM, and Lyft. The name was taken from Google's Istio project; IBM participates with Amalgam8, and Lyft brings Envoy into the cooperation. What is genuinely new about today's Istio is therefore the interaction of the individual components.

In detail, the parties brought the following functionality into the merger. Google's Istio provides content-based request routing between different versions of a microservice: based on geographic data, or simply on the logged-in user, it can decide which version of a service should handle a call. In addition, it contributes rate limiting, evaluation of access control lists, and collection of telemetry data; most importantly, it integrates Istio into the Kubernetes runtime environment. IBM's Amalgam8 contributes, among other things, service discovery and fault tolerance, along with resilience testing and load balancing; its enhanced content-based routing also enables canary releasing. Lyft delivers the Envoy proxy, the sidecar and thus the heart of Istio, whose development placed a strong focus on performance and security; a connection to the Zipkin tracing system is also included in the package. According to Lyft, Envoy can serve more than 10,000 virtual machines running more than 100 microservices without much effort. The art of the merger, which is currently under way, is to unify the individual projects into a coherent whole while eliminating duplicate functionality.

Istio, which started on the Kubernetes runtime platform, can now also run with Nomad and Consul; support for other platforms, such as Cloud Foundry and Apache Mesos, is planned. According to Istio's homepage, vendors such as Red Hat and Pivotal have decided to replace their existing service discovery mechanisms with Istio, which will make Istio compatible with Kubernetes, Cloud Foundry, and OpenShift alike. Powerful players in this market segment are thus backing Istio's deployment capabilities.

Istio can be run on the most common Kubernetes cluster systems, either in the cloud or on-premises.

Istio Architecture

Istio logically divides the service mesh into two planes:

  1. Data Plane
  2. Control Plane

The data plane is composed of the Envoy proxies, which are deployed as sidecars next to the microservices. All inbound and outbound network traffic of a microservice is routed through its sidecar.

The control plane takes care of the management and configuration of the proxies, allowing centralized control of security and network traffic within the service mesh.

The following diagram shows the architecture of Istio.

The architecture of Istio / Source: Istio

Envoy Proxy

The Envoy proxy, developed in C++ for performance reasons, is deployed as a separate (Docker) container together with the microservice container in a Kubernetes pod. By changing the IP tables settings, the sidecar joins the communication as a "man-in-the-middle", imperceptibly to the microservice. With this model, Istio functionality can be retrofitted into existing microservice applications without changing the microservices themselves. The Envoy is in close communication with the Mixer: the two exchange telemetry information, which allows the Mixer to determine the desired policies and forward them to all participating Envoys. In addition, Envoy takes care of service discovery, load balancing, health checks, and metrics. To improve resilience, Envoy also handles circuit-breaker tasks, along with the ability to test them via fault injection.
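
In the current releases, the sidecar is added to an existing deployment with istioctl kube-inject, which rewrites the manifest to include the Envoy container (deployment.yaml stands for your own deployment manifest):

istioctl kube-inject -f deployment.yaml | kubectl apply -f -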

Mixer

The Mixer enforces policies in the service mesh and processes the telemetry data supplied by the Envoy proxies. A further task of the Mixer is to establish the platform independence of Istio: it keeps the properties and peculiarities of the runtime environment away from Envoy and the other control components. The collected telemetry data is sent to monitoring systems to provide them with the necessary information about the behavior of the service mesh.

Istio - Mixer

The backend services include advanced functionality such as access control, quota enforcement, or billing. Without the Envoy proxy and the Mixer, microservice developers would be forced to integrate these systems in their own code. Istio handles these parts as well, keeping the services free of dependencies on those systems. This makes the work of the operations team easier, as it can manage this functionality from a central location without having to touch each microservice separately.
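
As a sketch of how such a central policy looks in the v1alpha2 configuration format also used further below: the following rate limit binds a memquota handler to a quota instance. The names handler and requestcount are examples and assume that a matching quota instance has been defined elsewhere.

# rate-limit.yaml - sketch of a Mixer rate limit; names are examples

apiVersion: config.istio.io/v1alpha2
kind: memquota
metadata:
  name: handler
  namespace: default
spec:
  quotas:
  - name: requestcount.quota.default
    maxAmount: 500       # allow at most 500 requests ...
    validDuration: 1s    # ... per second
---
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: default
spec:
  actions:
  - handler: handler.memquota   # the handler defined above
    instances:
    - requestcount.quota        # the assumed quota instance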

Pilot

Service discovery is handled by the Pilot, which provides the Envoy proxies with the necessary information. This information enables Envoy to implement, among others, the resilience patterns (such as timeout, retry, and circuit breaker); A/B deployments and canary releasing are also enabled by this interaction. Each Envoy maintains the load-balancing information it receives from the Pilot, allowing it to distribute load optimally across the service mesh. Here, too, platform-specific service discovery mechanisms are abstracted away and translated into an Istio-internal format that can be communicated to the Envoys. This allows Istio to be connected to any runtime environment by means of suitable adapters, including for traffic management, and gives the operations team a uniform admin API.

Istio - Pilot

The Platform Adapter Layer takes care of the communication, for example with the Kubernetes API, and obtains information about pod registration in order to learn about the existing runtime instances. The Abstract Model transforms this into Istio's canonical model, which is passed to the Envoys together with the Rules.
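
As a sketch of the kind of rule Pilot distributes, the following sets a timeout and retries for calls to a service svcB, in the same v1alpha2 RouteRule format used in the traffic management section below. Names and values are examples chosen for illustration.

# svcb-resilience.yaml - sketch of timeout and retry settings

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: svcb-resilience
  namespace: default
spec:
  destination:
    name: svcB
  precedence: 1
  route:
  - labels:
      version: current_version
  httpReqTimeout:
    simpleTimeout:
      timeout: 2s        # abort calls that take longer than two seconds
  httpReqRetries:
    simpleRetry:
      attempts: 3        # retry a failed call up to three times
      perTryTimeout: 1s  # each attempt gets one second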

Istio-Auth

The third component of the Istio control plane is responsible for security. Istio already offers service-to-service and end-user authentication based on TLS, so admins can activate rudimentary policies for the time being. Unfortunately, finer-grained access control together with auditing will only arrive in future releases; until then, the services themselves have to implement the required access checks and audit output.
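
In the Istio releases available at the time of writing, mutual TLS between the sidecars is switched on at installation time by choosing the auth variant of the installation manifest; the path below refers to the unpacked Istio release archive:

kubectl apply -f install/kubernetes/istio-auth.yaml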

Traffic Management

The core task of Istio is the intelligent management of the calls between the services in the service mesh. For this purpose, the traffic flow is deliberately decoupled from the scaling of the infrastructure: you only specify the call behavior you want to achieve, without specifying which Kubernetes pod or VM should process these calls. Pilot and Envoy interact intelligently to derive the call behavior and assign the calls to the runtime instances.

Istio Traffic Management / Source: Istio

The example in the diagram above shows that a simple Istio configuration suffices to redirect 5% of the calls to a new version of service B (svcB'), i.e. canary releasing. One does not have to worry about routing to the actual pods of svcB', since Kubernetes itself provides the required number of canary instances. Equally simple settings control routing based on request content: for example, one can specify that calls from Android devices are routed to service B while only calls from iPhones are sent to the new version (svcB').

To implement these examples, a RouteRule is created in YAML format. For the canary release example, the corresponding Istio rule is:

# canary-release.yaml

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: canary-release
  namespace: default
spec:
  destination:
    name: svcB                 # target service of this rule
  precedence: 1                # evaluation order relative to other rules
  route:
  - labels:
      version: current_version
    weight: 95                 # 95% of requests stay on the current version
  - labels:
      version: canary_version
    weight: 5                  # 5% are redirected to the canary version

Istio rules are assigned to a target service (in this case svcB) via the destination parameter. The evaluation order of the different rules is defined via priorities (precedence). Routing itself is defined via the version labels assigned to the service instances: in this case, 95% of the requests are distributed across all instances of svcB carrying the version label current_version, and the remaining 5% reach the instances labeled canary_version. The rule can be installed using the command line tool istioctl:

istioctl create -f canary-release.yaml
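
The second example from the diagram, routing based on request content, can be expressed in the same format. The following sketch sends requests whose user-agent header identifies an iPhone to the canary version; the header match is an assumption for illustration:

# iphone-routing.yaml - sketch of content-based routing

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: iphone-to-canary
  namespace: default
spec:
  destination:
    name: svcB
  precedence: 2             # evaluated before the weighted rule above
  match:
    request:
      headers:
        user-agent:
          regex: ".*iPhone.*"
  route:
  - labels:
      version: canary_version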

With a few exceptions (e.g., sidecar injection into a service), the command-line tool kubectl can be used equivalently in a Kubernetes environment; istioctl, however, additionally performs extended schema validation. Once a rule has been created, Pilot distributes the new routing information to all Envoy proxies, enabling the new behavior.
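
The targeted fault injection mentioned at the beginning follows the same pattern. As a sketch, the following rule delays a share of the calls to svcB, for example to test timeout handling; the values are chosen purely for illustration:

# svcb-delay.yaml - sketch of fault injection via an artificial delay

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: svcb-delay
  namespace: default
spec:
  destination:
    name: svcB
  precedence: 3
  route:
  - labels:
      version: current_version
  httpFault:
    delay:
      percent: 10        # delay ten percent of all requests ...
      fixedDelay: 5s     # ... by five seconds each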

Without a tool like Istio, you are unlikely to get a handle on a service mesh. The strengths of a transparent sidecar infrastructure are obvious: application development is freed from repetitive tasks and can focus on the essentials, the business logic. Via Pilot, Mixer, and the Envoy proxy, Istio transparently provides the functions necessary for operating and monitoring a service mesh, and existing applications can be migrated quite easily. All features are available regardless of programming language and runtime environment. The operations team, in turn, gains a powerful tool to closely monitor applications and to interact with the service mesh in a targeted manner via the rule set. With powerful community members such as Google and IBM involved, Istio promises great potential and rapid progress toward production readiness. At its core, Istio already relies on production-proven projects and the integration of existing tools. The upcoming support for more platforms, including non-Kubernetes-based ones, is another big advantage and preserves independence in the choice of a target platform.