Integration between Cilium and SPIFFE - Part 1

This is the first part of a series of blog posts about the integration between the Cilium and SPIFFE projects. This part explains the current Cilium identity model and how it could be extended to work with SPIFFE, which provides a universal identity control plane for distributed systems. In the next parts we will explain this integration in more detail, along with the real-world challenges that we want to solve.

Introduction

Cilium is an open source project that provides networking, security, and observability for cloud native environments such as Kubernetes clusters and other container orchestration platforms [1]. Cilium uses eBPF, a Linux kernel technology that allows programs (called eBPF programs) to be dynamically inserted into the Linux kernel and executed safely. Cilium operates as a CNI (Container Network Interface) plugin running on each node of the cluster.

Cilium identity model

When a pod or container is created, Cilium generates an endpoint, which logically represents that pod or container. An identity is derived from the k8s labels associated with the endpoint. The identity is a unique number that is mapped to a set of k8s labels. From then on, this numeric identity is used in the eBPF data path (at the Linux kernel level) to enforce L3/L4 policy on a per-packet basis. The identity mechanism improves scalability and performance compared with policy enforcement based on network addresses, such as iptables.

The identity number is shared and synchronized across the cluster using a key-value store (KVStore), so any node of the cluster can use it for policy enforcement. Figure 1 shows an example of a table that maps sets of k8s labels to their respective identity numbers. This map is shared among all three nodes (A, B and C) of the cluster, each of which runs an instance of Cilium.
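The mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cilium's implementation: the `IdentityAllocator` class, its in-memory dictionary (standing in for the shared key-value store) and the starting number 10 are all assumptions made for the example.

```python
class IdentityAllocator:
    """Toy stand-in for the shared store: label set -> numeric identity."""

    def __init__(self, first_id=10):
        # first_id is an arbitrary starting point chosen for this example.
        self._by_labels = {}
        self._next_id = first_id

    def allocate(self, labels):
        # The identity depends only on the set of labels, not on the pod
        # or on the node that asks for it.
        key = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = self._next_id
            self._next_id += 1
        return self._by_labels[key]


# A single allocator object plays the role of the store shared by A, B and C.
store = IdentityAllocator(first_id=10)
frontend_on_a = store.allocate({"role": "frontend"})
backend_on_b = store.allocate({"role": "backend"})
frontend_on_c = store.allocate({"role": "frontend"})
print(frontend_on_a, backend_on_b, frontend_on_c)
```

Because every node consults the same table, a frontend pod scheduled on node C resolves to the same number as the frontend pod on node A.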

Figure 1 - Identity management in the cluster [2].

This identity-based approach makes it possible to implement network security without depending on network addresses, which brings flexibility and scalability - another big advantage in cloud environments. When the number of pods or nodes increases, the number of rules used for policy enforcement does not increase, because pods can be grouped by their labels (and consequently by their identity number). In the example of Figure 1, if we add a new pod with the same set of labels (role=frontend), Cilium will reuse the identity (number 10) that was previously generated for another pod with the same labels.
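As a rough illustration of why the rule count stays constant, the sketch below keys a single allow rule on identities rather than on addresses. The pod IP addresses, the backend identity 20 and the helper `is_allowed` are hypothetical values and names chosen for the example; only the frontend identity 10 comes from Figure 1.

```python
# One rule, expressed as (source identity, destination identity).
ALLOWED = {(10, 20)}  # frontend (10) may talk to backend (20, assumed)

# Illustrative table from pod IP to numeric identity.
pod_identity = {
    "10.0.1.5": 10,  # frontend pod
    "10.0.2.7": 20,  # backend pod
}


def is_allowed(src_ip, dst_ip):
    """Allow traffic only if the identity pair matches a rule."""
    src = pod_identity.get(src_ip)
    dst = pod_identity.get(dst_ip)
    return (src, dst) in ALLOWED


rules_before = len(ALLOWED)
# Scale out: a second frontend pod gets the same identity (10).
pod_identity["10.0.1.6"] = 10
rules_after = len(ALLOWED)

print(is_allowed("10.0.1.6", "10.0.2.7"), rules_before, rules_after)
```

The new pod is covered by the existing rule as soon as it is mapped to identity 10; an address-based rule set would have needed a new entry for it.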

One of the current drawbacks of the identity mechanism used by Cilium is that it is limited to Kubernetes clusters and other container orchestration platforms, and this is where SPIFFE comes into the picture. SPIFFE is a universal identity control plane with strong identity attestation procedures, which makes it possible to support identity and trust across different platforms and cloud providers [3]. In the next section we explain the current limitations of Cilium and how we can address these challenges using SPIFFE.

Extend Cilium identity model using SPIFFE

SPIFFE (Secure Production Identity Framework for Everyone) is a set of specifications that cover how a workload should retrieve and use its identity [4]. If your security identity model is based on SPIFFE, it is possible to establish a trust model between workloads/services running on different platforms, cloud providers, or even edge devices, as long as they are also based on SPIFFE. SPIRE (SPIFFE Runtime Environment) is an implementation of the SPIFFE APIs that performs platform and workload attestation in order to securely issue SVIDs (SPIFFE Verifiable Identity Documents) to workloads and verify the SVIDs of other workloads, based on a predefined set of conditions [5].

In Figure 2 the basic components of identity are divided into four groups and related to both projects. This division helps us understand how each project approaches each component of identity and what the advantages of this integration are.

Figure 2 - Components of Identity between both projects.

  1. Identity Attributes & Attestation: Every identity system depends on a set of attributes and on the attestation of those attributes. The attestation procedure ensures that the listed attributes indeed belong to the workload that claims them. Cilium does not employ explicit workload attestation procedures and uses only k8s-labels to calculate the endpoint identity - the Kubernetes control-plane takes care of label management and Cilium just consumes the labels. SPIFFE, on the other hand, provides a strong mechanism to perform identity attestation and can use the Kubernetes plugin [6] to attest k8s-labels, or use other information such as location (node), container/pod names, and container/pod images to compose the endpoint identity. Using the SPIFFE identity mechanism, Cilium can derive the endpoint identity in a more secure way, and the attestation procedures can be based on an extensive attribute list [7].
  2. Identity Mapping: The set of attributes may be mapped to an intermediate representation which essentially serves as "the ID". Cilium maps the numeric identity to k8s-labels, whereas SPIFFE maps the workload attributes to a SPIFFE ID, which takes the form of a URI (Uniform Resource Identifier). An example of a SPIFFE ID is “spiffe://acme.com/billing/payments”. The identity is delivered in a document called an SVID, which is essentially a signed X.509 certificate with a few mandatory fields, such as a SAN (Subject Alternative Name) that carries the SPIFFE ID.
  3. Identity Carrier: The application network connection needs to communicate the ID to the remote peer. Cilium uses IPCache, a table that maps pod IP addresses to identities. In this way, the identity of a remote peer can be looked up from its IP address. In the case of SPIFFE, an mTLS handshake carries the identity: by the end of the handshake, both peers know each other's SPIFFE ID, which is carried as part of the certificate. Using the SPIFFE approach, it's possible to carry an identity beyond Cilium’s boundaries - between a k8s and a non-k8s workload, for example.
  4. Identity Derivatives: The identity attestation procedure might eventually result in the derivation of other credentials (such as certs or tokens) which could be used by other applications. The identity composed by Cilium can only be used by Cilium itself, in this case by CNPs (Cilium Network Policies) for authorization, and there is no identity derivation. SPIRE is able to derive X.509 certificates that can be used for mTLS or for IPSec/WireGuard based authentication procedures. The identity can also be carried in JWT tokens to do policy enforcement based on micro-segmentation [8].
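To make the SPIFFE ID format from item 2 concrete, the following sketch splits an ID into its trust domain and workload path using Python's standard URL parser. The function name `parse_spiffe_id` is ours, and the checks are deliberately minimal; this is not a complete validator for the SPIFFE ID specification.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID into (trust domain, path). Minimal checks only."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("SPIFFE IDs must use the spiffe:// scheme")
    if not parts.netloc:
        raise ValueError("SPIFFE IDs must name a trust domain")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs carry no query or fragment")
    return parts.netloc, parts.path


trust_domain, path = parse_spiffe_id("spiffe://acme.com/billing/payments")
print(trust_domain, path)
```

For the example ID used above, the trust domain is acme.com and the path /billing/payments identifies the workload within that domain.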

Benefits SPIFFE could provide to Cilium:

  1. Strong attestation and authentication procedures for Identity. Strong cryptographic protection for the identity value.
  2. Generic Identity solution which extends to non-k8s workloads and to edge/IoT/endpoint scenarios as well.
  3. Ability to federate identity across multiple service providers. For example, if one service provider uses an Istio based service mesh and another uses Cilium+SPIFFE, it will be possible to federate the identity between them.
  4. Ability to use the SVIDs for other purposes such as transparent encryption and WireGuard/IPSec tunnels. This solves the problem of certificate management consistently across all services and use-cases.
  5. Single identity across all policy enforcement engines, such as: network, system and data.
  6. Ability to extend the identity solution with hardware based attestation service using confidential computing (enclaves, TPMs).

Considerations with this integration:

  1. Ability to fall back to the classic Cilium identity solution - which maps the set of k8s-labels to an identity value that is directly used for authorization.
  2. No impact (performance or functional) on Cilium data-path handling of identities.
  3. Ability to use both per-packet identity and the mTLS handshake for authorization.

High level overview of the integration

Figure 3 shows a high level overview of a Kubernetes environment with Cilium and SPIRE deployed based on the integration. In step 1, SPIRE is deployed in the cluster, and in step 2 two registration entries are created for two different pods - podfoo and poddefault. After the registration entries are created in spire-server, they are cached by all spire-agents. Steps 1 and 2 are common whenever SPIRE is deployed in a cluster. The integration comes into the picture at step 3. When both pods are deployed in the cluster, Cilium creates an endpoint to represent each pod and generates a numeric identity based on the labels used by the pod. In addition, Cilium connects to SPIRE (through a Delegated Identity API, also created as part of this integration) and, on behalf of the pods, asks SPIRE to attest the pods’ attributes (step 3.1).

In our example, there are two related entries already created in SPIRE for both pods. The selectors of each registration entry match the attributes used to create podfoo and poddefault. Once a match is found, an X.509 SVID is returned (step 3.2), and Cilium uses it to create a label for each pod containing the SPIFFE ID URI. As soon as this label is created, Cilium uses it to compute a numeric identity. Finally, the label can be used by a CNP (step 4) to do L3/L4 policy enforcement. It is also possible to do L7 policy enforcement using the returned X.509 SVID, together with the cilium-envoy proxy, and to use it to upgrade a non-secure connection to mTLS. This example shows a connection upgrade between two workloads running in Kubernetes, but it's also possible to use the certificate to upgrade a connection between a k8s and a non-k8s workload.
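The selector matching and label creation in steps 3.1 and 3.2 can be approximated with the sketch below. It is a simplified model under several assumptions, not the code of the integration: the trust domain example.org, the selector values, the label key "spiffe-id" and the helpers `attest` and `labels_with_spiffe_id` are all hypothetical, and only the first matching entry is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationEntry:
    spiffe_id: str
    selectors: frozenset


# Hypothetical entries for the two pods of the example (step 2).
ENTRIES = [
    RegistrationEntry(
        "spiffe://example.org/podfoo",
        frozenset({"k8s:ns:default", "k8s:pod-label:app:foo"}),
    ),
    RegistrationEntry(
        "spiffe://example.org/poddefault",
        frozenset({"k8s:ns:default", "k8s:pod-label:app:default"}),
    ),
]


def attest(pod_selectors):
    """Return the SPIFFE ID of the first entry whose selectors all match."""
    for entry in ENTRIES:
        if entry.selectors <= pod_selectors:
            return entry.spiffe_id
    return None


def labels_with_spiffe_id(labels, pod_selectors):
    """Add the SPIFFE ID as an extra label; keep classic labels otherwise."""
    spiffe_id = attest(pod_selectors)
    if spiffe_id is None:
        return dict(labels)
    return {**labels, "spiffe-id": spiffe_id}  # label key is an assumption


foo_labels = labels_with_spiffe_id(
    {"app": "foo"},
    frozenset({"k8s:ns:default", "k8s:sa:default", "k8s:pod-label:app:foo"}),
)
other_labels = labels_with_spiffe_id(
    {"app": "other"},
    frozenset({"k8s:ns:default", "k8s:pod-label:app:other"}),
)
print(foo_labels, other_labels)
```

A pod with no matching entry keeps only its original labels, which corresponds to falling back to the classic label-based identity.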

Figure 3 - High level overview.

To sum up, an attestation process happens on behalf of each pod (performed by Cilium) and the pod receives a new label (SPIFFE ID) based on its own attributes. Then this new label can be used by a CNP to do L3/L4/L7 policy enforcement.

In part 2 of this series, we will explain in detail how we built this integration and how Cilium and SPIRE were modified to make it possible. We will also show an example of L3/L4/L7 policy enforcement using the SPIFFE ID, together with an example of upgrading a non-secure connection to mTLS.

References:

[1] https://cilium.io/learn

[2] https://docs.cilium.io/en/v1.10/concepts/terminology/

[3] https://www.youtube.com/watch?v=0LSaNrOabH4

[4] https://spiffe.io/docs/latest/spiffe-about/overview/

[5] https://spiffe.io/docs/latest/spire-about/spire-concepts/

[6] https://github.com/spiffe/spire/blob/main/doc/plugin_agent_workloadattestor_k8s.md

[7] https://spiffe.io/docs/latest/deploying/registering/#2-defining-the-spiffe-id-of-the-workload

[8] https://blog.accuknox.com/identity-based-micro-segmentation-using-jwt-tokens/