IPv4/IPv6 Dual-Stack in Kubernetes

 Today, a teammate asked me to help troubleshoot an issue related to dual-stack networking in Kubernetes. It turned out that I was quite unfamiliar with the subject; I didn't even know some of the basics. So I spent some time studying it, and this post is a summary of what I learned.

What is IPv4/IPv6 dual-stack?

IPv4/IPv6 dual-stack networking enables the allocation of both IPv4 and IPv6 addresses to Pods and Services.

Dual-stack networking has been enabled by default in Kubernetes clusters since version 1.21, so IPv4 and IPv6 addresses can be assigned at the same time.

IPv4/IPv6 dual-stack on a Kubernetes cluster provides the following features:

  •     Dual-stack Pod networking (a single IPv4 and IPv6 address assignment per Pod)
  •     IPv4- and IPv6-enabled Services
  •     Pod off-cluster egress routing (e.g. the Internet) via both IPv4 and IPv6 interfaces

IPv4/IPv6 Dual-Stack Prerequisites

Kubernetes 1.20 or later

Provider support for dual-stack networking (the cloud provider or other infrastructure must be able to provide Kubernetes nodes with routable IPv4/IPv6 network interfaces)

A network plugin that supports dual-stack networking.

Configure IPv4/IPv6 dual-stack

To configure IPv4/IPv6 dual-stack, set dual-stack cluster network assignments:

  •     kube-apiserver:
                    --service-cluster-ip-range=<IPv4 CIDR>,<IPv6 CIDR>

  •     kube-controller-manager:
                   --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>
                   --service-cluster-ip-range=<IPv4 CIDR>,<IPv6 CIDR>
                   --node-cidr-mask-size-ipv4|--node-cidr-mask-size-ipv6 defaults to /24 for IPv4 and /64 for IPv6
    
  •     kube-proxy:
                   --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>

  •     kubelet:
                   --node-ip=<IPv4 IP>,<IPv6 IP>
                        This option is required for bare metal dual-stack nodes (nodes that do not define a cloud provider with the --cloud-provider flag). If you are using a cloud provider and choose to override the node IPs chosen by the cloud provider, set the --node-ip option.
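Since every one of these flags expects a comma-separated IPv4/IPv6 pair, a quick sanity check can catch mistakes before a component is restarted. The following is only a rough sketch: checkDualStackCIDRs is a hypothetical helper, not something Kubernetes ships, and the fd00:10:244::/56 range is an illustrative value I made up.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// checkDualStackCIDRs is a hypothetical helper (not part of Kubernetes).
// It returns an error unless value holds exactly one IPv4 prefix and one
// IPv6 prefix, separated by a comma, in either order.
func checkDualStackCIDRs(value string) error {
	parts := strings.Split(value, ",")
	if len(parts) != 2 {
		return fmt.Errorf("expected 2 CIDRs, got %d", len(parts))
	}
	var v4, v6 int
	for _, p := range parts {
		// No whitespace trimming on purpose: a space after the comma
		// would also split the flag into two shell arguments.
		prefix, err := netip.ParsePrefix(p)
		if err != nil {
			return fmt.Errorf("invalid CIDR %q: %w", p, err)
		}
		if prefix.Addr().Is4() {
			v4++
		} else {
			v6++
		}
	}
	if v4 != 1 || v6 != 1 {
		return fmt.Errorf("need one IPv4 and one IPv6 CIDR, got %d IPv4 and %d IPv6", v4, v6)
	}
	return nil
}

func main() {
	for _, v := range []string{
		"10.244.0.0/16,fd00:10:244::/56",  // illustrative dual-stack pair
		"10.244.0.0/16",                   // single-stack
		"10.244.0.0/16, fd00:10:244::/56", // stray space after the comma
	} {
		fmt.Printf("%q -> %v\n", v, checkDualStackCIDRs(v))
	}
}
```

The check deliberately rejects a value with a space after the comma, because the shell would hand such a value to the component as two separate arguments.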


You can check this configuration in your own Kubernetes cluster.
First, list the nodes:

        kubectl get nodes

Then pick a control-plane node, log in to it, and run the following command:

         ps -ef | grep kube

You will see the command lines of the kube-apiserver, kube-proxy, kubelet, and kube-controller-manager processes, for example:

eric@eric~$ ps -ef | grep kube
root        2972    1054  1 11:09 ?        00:09:51 /var/lib/minikube/binaries/v1.31.0/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/var/lib/kubelet/config.yaml --hostname-override=minikube --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=192.168.49.2
root        3516    3496  1 11:09 ?        00:08:52 etcd --advertise-client-urls=https://192.168.49.2:2379 --cert-file=/var/lib/minikube/certs/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/minikube/etcd --experimental-initial-corrupt-check=true --experimental-watch-progress-notify-interval=5s --initial-advertise-peer-urls=https://192.168.49.2:2380 --initial-cluster=minikube=https://192.168.49.2:2380 --key-file=/var/lib/minikube/certs/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://192.168.49.2:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://192.168.49.2:2380 --name=minikube --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/var/lib/minikube/certs/etcd/peer.key --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt --proxy-refresh-interval=70000 --snapshot-count=10000 --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
root        3580    3542  2 11:09 ?        00:15:19 kube-apiserver --advertise-address=192.168.49.2 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/var/lib/minikube/certs/ca.crt --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --enable-bootstrap-token-auth=true --etcd-cafile=/var/lib/minikube/certs/etcd/ca.crt --etcd-certfile=/var/lib/minikube/certs/apiserver-etcd-client.crt --etcd-keyfile=/var/lib/minikube/certs/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/var/lib/minikube/certs/apiserver-kubelet-client.crt --kubelet-client-key=/var/lib/minikube/certs/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/var/lib/minikube/certs/front-proxy-client.crt --proxy-client-key-file=/var/lib/minikube/certs/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/var/lib/minikube/certs/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=8443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/minikube/certs/sa.pub --service-account-signing-key-file=/var/lib/minikube/certs/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/var/lib/minikube/certs/apiserver.crt --tls-private-key-file=/var/lib/minikube/certs/apiserver.key
root        3674    3624  1 11:09 ?        00:06:42 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/var/lib/minikube/certs/ca.crt --cluster-cidr=10.244.0.0/16 --cluster-name=mk --cluster-signing-cert-file=/var/lib/minikube/certs/ca.crt --cluster-signing-key-file=/var/lib/minikube/certs/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=false --requestheader-client-ca-file=/var/lib/minikube/certs/front-proxy-ca.crt --root-ca-file=/var/lib/minikube/certs/ca.crt --service-account-private-key-file=/var/lib/minikube/certs/sa.key --service-cluster-ip-range=10.96.0.0/12 --use-service-account-credentials=true
root        3716    3691  0 11:09 ?        00:00:56 kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=false
root        4779    4683  0 11:09 ?        00:00:04 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=minikube
root       45275       1  0 14:42 ?        00:00:01 snapfuse /var/lib/snapd/snaps/kubeadm_3513.snap /snap/kubeadm/3513 -o ro,nodev,allow_other,suid
eric      125908     687  0 21:22 pts/0    00:00:00 grep kube
eric@E-5CG1422YTH:~$ ps -ef | grep kube | grep range
root        3580    3542  2 11:09 ?        00:15:20 kube-apiserver --advertise-address=192.168.49.2 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/var/lib/minikube/certs/ca.crt --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --enable-bootstrap-token-auth=true --etcd-cafile=/var/lib/minikube/certs/etcd/ca.crt --etcd-certfile=/var/lib/minikube/certs/apiserver-etcd-client.crt --etcd-keyfile=/var/lib/minikube/certs/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/var/lib/minikube/certs/apiserver-kubelet-client.crt --kubelet-client-key=/var/lib/minikube/certs/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/var/lib/minikube/certs/front-proxy-client.crt --proxy-client-key-file=/var/lib/minikube/certs/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/var/lib/minikube/certs/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=8443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/minikube/certs/sa.pub --service-account-signing-key-file=/var/lib/minikube/certs/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/var/lib/minikube/certs/apiserver.crt --tls-private-key-file=/var/lib/minikube/certs/apiserver.key
root        3674    3624  1 11:09 ?        00:06:42 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/var/lib/minikube/certs/ca.crt --cluster-cidr=10.244.0.0/16 --cluster-name=mk --cluster-signing-cert-file=/var/lib/minikube/certs/ca.crt --cluster-signing-key-file=/var/lib/minikube/certs/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=false --requestheader-client-ca-file=/var/lib/minikube/certs/front-proxy-ca.crt --root-ca-file=/var/lib/minikube/certs/ca.crt --service-account-private-key-file=/var/lib/minikube/certs/sa.key --service-cluster-ip-range=10.96.0.0/12 --use-service-account-credentials=true
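
Reading these long command lines by eye is error-prone, so here is a small sketch that pulls a single flag out of one of them. flagValue is a hypothetical helper of mine, and it only understands the --flag=value form that appears in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// flagValue returns the value of "--name=value" from a process command
// line as printed by ps. Hypothetical helper: it does not handle the
// "--name value" form or quoted arguments.
func flagValue(cmdline, name string) (string, bool) {
	prefix := "--" + name + "="
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, prefix) {
			return strings.TrimPrefix(field, prefix), true
		}
	}
	return "", false
}

func main() {
	// Shortened copy of the kube-controller-manager line shown above.
	cmdline := "kube-controller-manager --allocate-node-cidrs=true " +
		"--cluster-cidr=10.244.0.0/16 --service-cluster-ip-range=10.96.0.0/12"

	for _, name := range []string{"cluster-cidr", "service-cluster-ip-range"} {
		value, found := flagValue(cmdline, name)
		entries := 0
		if found {
			entries = len(strings.Split(value, ","))
		}
		fmt.Printf("%s: value=%q found=%v entries=%d\n", name, value, found, entries)
	}
}
```

In the minikube output above, both --cluster-cidr and --service-cluster-ip-range carry a single IPv4 CIDR, so that cluster is running single-stack. Note that kube-proxy is started with --config there, so its cluster CIDR is presumably set in that file rather than on the command line.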

Note:

An example of an IPv4 CIDR: 10.244.0.0/16

An example of an IPv6 CIDR: fdXY:IJKL:MNOP:15::/64 (this only shows the format; the letters are placeholders, so it is not a valid address)
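
To get a feel for what the node CIDR mask sizes mean, the IPv4 example can be combined with the default IPv4 node mask of /24 mentioned earlier. The sketch below just does the arithmetic; nodeRanges is a hypothetical helper, and it does not model any additional limits that Kubernetes itself may enforce.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nodeRanges is a hypothetical helper. It returns how many per-node ranges
// with a mask of nodeMaskBits fit into clusterCIDR, which is
// 2^(nodeMaskBits - clusterBits).
func nodeRanges(clusterCIDR string, nodeMaskBits int) (uint64, error) {
	prefix, err := netip.ParsePrefix(clusterCIDR)
	if err != nil {
		return 0, err
	}
	diff := nodeMaskBits - prefix.Bits()
	if diff < 0 || diff > 63 || nodeMaskBits > prefix.Addr().BitLen() {
		return 0, fmt.Errorf("node mask /%d does not fit cluster CIDR %s", nodeMaskBits, clusterCIDR)
	}
	return uint64(1) << uint(diff), nil
}

func main() {
	n, err := nodeRanges("10.244.0.0/16", 24)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 2^(24-16) = 256
}
```

So a 10.244.0.0/16 cluster CIDR with /24 node ranges leaves room for 256 node ranges of 256 addresses each.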


Services

You can create Services that can use IPv4, IPv6, or both.

The address family of a Service defaults to the address family of the first service cluster IP range (configured via the --service-cluster-ip-range flag to the kube-apiserver).

When you define a Service you can optionally configure it as dual stack. To specify the behavior you want, you set the .spec.ipFamilyPolicy field to one of the following values:

  •     SingleStack: Single-stack service. The control plane allocates a cluster IP for the Service, using the first configured service cluster IP range.

  •     PreferDualStack: Allocate both IPv4 and IPv6 cluster IPs for the Service when dual-stack is enabled. If dual-stack is not enabled or supported, it falls back to single-stack behavior.

  •     RequireDualStack: Allocate Service .spec.clusterIPs from both IPv4 and IPv6 address ranges when dual-stack is enabled. If dual-stack is not enabled or supported, creation of the Service API object fails.
            The control plane selects the .spec.clusterIP from the list of .spec.clusterIPs based on the address family of the first element in the .spec.ipFamilies array.

If you would like to define which IP family to use for single stack or define the order of IP families for dual-stack, you can choose the address families by setting an optional field, .spec.ipFamilies, on the Service.
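
To make the two fields concrete, here is a sketch that builds a Service manifest with both of them set and prints it as JSON, which kubectl accepts just like YAML. The structs are a hand-written minimal subset of the Service schema, not the real Kubernetes API types, and the name, selector, and port are made-up values. It assumes a cluster where both IP families are configured.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal hand-written subset of the Service schema, for illustration only.
type servicePort struct {
	Port int `json:"port"`
}

type serviceSpec struct {
	IPFamilyPolicy string            `json:"ipFamilyPolicy,omitempty"`
	IPFamilies     []string          `json:"ipFamilies,omitempty"`
	Selector       map[string]string `json:"selector,omitempty"`
	Ports          []servicePort     `json:"ports"`
}

type service struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       serviceSpec       `json:"spec"`
}

// dualStackService returns a Service that asks for dual-stack when it is
// available and lists IPv6 first, making IPv6 the primary family.
func dualStackService(name string) service {
	return service{
		APIVersion: "v1",
		Kind:       "Service",
		Metadata:   map[string]string{"name": name},
		Spec: serviceSpec{
			IPFamilyPolicy: "PreferDualStack",
			IPFamilies:     []string{"IPv6", "IPv4"},
			Selector:       map[string]string{"app": "my-app"}, // hypothetical label
			Ports:          []servicePort{{Port: 80}},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(dualStackService("my-service"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The output can be saved to a file and applied with kubectl apply -f.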

NOTE:
The .spec.ipFamilies field is conditionally mutable: you can add or remove a secondary IP address family, but you cannot change the primary IP address family of an existing Service.

You can set .spec.ipFamilies to any of the following array values:

  •     ["IPv4"]
  •     ["IPv6"]
  •     ["IPv4", "IPv6"]
  •     ["IPv6", "IPv4"]

The first family you list is used for the legacy .spec.clusterIP field.
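
The mutability rule from the note can be modelled in a few lines. This is my own simplified sketch, not the validation code Kubernetes actually runs: checkIPFamiliesUpdate is a hypothetical function that only looks at the two ipFamilies lists and ignores ipFamilyPolicy.

```go
package main

import (
	"errors"
	"fmt"
)

// checkIPFamiliesUpdate is a simplified, hypothetical model of the rule
// "the secondary family may be added or removed, the primary may not change".
func checkIPFamiliesUpdate(oldFamilies, newFamilies []string) error {
	if len(oldFamilies) == 0 || len(newFamilies) == 0 {
		return errors.New("both lists need at least one family")
	}
	if len(newFamilies) > 2 {
		return errors.New("at most two families are allowed")
	}
	if len(newFamilies) == 2 && newFamilies[0] == newFamilies[1] {
		return errors.New("the two families must differ")
	}
	if oldFamilies[0] != newFamilies[0] {
		return fmt.Errorf("primary family cannot change from %s to %s", oldFamilies[0], newFamilies[0])
	}
	return nil
}

func main() {
	fmt.Println(checkIPFamiliesUpdate([]string{"IPv4"}, []string{"IPv4", "IPv6"}))         // add secondary
	fmt.Println(checkIPFamiliesUpdate([]string{"IPv4", "IPv6"}, []string{"IPv4"}))         // remove secondary
	fmt.Println(checkIPFamiliesUpdate([]string{"IPv4", "IPv6"}, []string{"IPv6", "IPv4"})) // swap primary
}
```

The first two calls are accepted and the third is rejected. In the real initIPFamilyFields code, removing the secondary family additionally requires setting ipFamilyPolicy to SingleStack, which this sketch leaves out.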

The process described above is implemented in the initIPFamilyFields function of alloc.go in the Kubernetes source code:

// attempts to default service ip families according to cluster configuration
// while ensuring that provided families are configured on cluster.
func (al *Allocators) initIPFamilyFields(after After, before Before) error {
    oldService, service := before.Service, after.Service

    // can not do anything here
    if service.Spec.Type == api.ServiceTypeExternalName {
        return nil
    }

    // We don't want to auto-upgrade (add an IP) or downgrade (remove an IP)
    // PreferDualStack services following a cluster change to/from
    // dual-stackness.
    //
    // That means a PreferDualStack service will only be upgraded/downgraded
    // when:
    // - changing ipFamilyPolicy to "RequireDualStack" or "SingleStack" AND
    // - adding or removing a secondary clusterIP or ipFamily
    if isMatchingPreferDualStackClusterIPFields(after, before) {
        return nil // nothing more to do.
    }

    // If the user didn't specify ipFamilyPolicy, we can infer a default.  We
    // don't want a static default because we want to make sure that we never
    // change between single- and dual-stack modes with explicit direction, as
    // provided by ipFamilyPolicy.  Consider these cases:
    //   * Create (POST): If they didn't specify a policy we can assume it's
    //     always SingleStack.
    //   * Update (PUT): If they didn't specify a policy we need to adopt the
    //     policy from before.  This is better than always assuming SingleStack
    //     because a PUT that changes clusterIPs from 2 to 1 value but doesn't
    //     specify ipFamily would work.
    //   * Update (PATCH): If they didn't specify a policy it will adopt the
    //     policy from before.
    if service.Spec.IPFamilyPolicy == nil {
        if oldService != nil && oldService.Spec.IPFamilyPolicy != nil {
            // Update from an object with policy, use the old policy
            service.Spec.IPFamilyPolicy = oldService.Spec.IPFamilyPolicy
        } else if service.Spec.ClusterIP == api.ClusterIPNone && len(service.Spec.Selector) == 0 {
            // Special-case: headless + selectorless defaults to dual.
            requireDualStack := api.IPFamilyPolicyRequireDualStack
            service.Spec.IPFamilyPolicy = &requireDualStack
        } else {
            // create or update from an object without policy (e.g.
            // ExternalName) to one that needs policy
            singleStack := api.IPFamilyPolicySingleStack
            service.Spec.IPFamilyPolicy = &singleStack
        }
    }
    // Henceforth we can assume ipFamilyPolicy is set.

    // Do some loose pre-validation of the input.  This makes it easier in the
    // rest of allocation code to not have to consider corner cases.
    // TODO(thockin): when we tighten validation (e.g. to require IPs) we will
    // need a "strict" and a "loose" form of this.
    if el := validation.ValidateServiceClusterIPsRelatedFields(service); len(el) != 0 {
        return errors.NewInvalid(api.Kind("Service"), service.Name, el)
    }

    //TODO(thockin): Move this logic to validation?
    el := make(field.ErrorList, 0)

    // Update-only prep work.
    if oldService != nil {
        if getIPFamilyPolicy(service) == api.IPFamilyPolicySingleStack {
            // As long as ClusterIPs and IPFamilies have not changed, setting
            // the policy to single-stack is clear intent.
            // ClusterIPs[0] is immutable, so it is safe to keep.
            if sameClusterIPs(oldService, service) && len(service.Spec.ClusterIPs) > 1 {
                service.Spec.ClusterIPs = service.Spec.ClusterIPs[0:1]
            }
            if sameIPFamilies(oldService, service) && len(service.Spec.IPFamilies) > 1 {
                service.Spec.IPFamilies = service.Spec.IPFamilies[0:1]
            }
        } else {
            // If the policy is anything but single-stack AND they reduced these
            // fields, it's an error.  They need to specify policy.
            if reducedClusterIPs(After{service}, Before{oldService}) {
                el = append(el, field.Invalid(field.NewPath("spec", "ipFamilyPolicy"), service.Spec.IPFamilyPolicy,
                    "must be 'SingleStack' to release the secondary cluster IP"))
            }
            if reducedIPFamilies(After{service}, Before{oldService}) {
                el = append(el, field.Invalid(field.NewPath("spec", "ipFamilyPolicy"), service.Spec.IPFamilyPolicy,
                    "must be 'SingleStack' to release the secondary IP family"))
            }
        }
    }

    // Make sure ipFamilyPolicy makes sense for the provided ipFamilies and
    // clusterIPs.  Further checks happen below - after the special cases.
    if getIPFamilyPolicy(service) == api.IPFamilyPolicySingleStack {
        if len(service.Spec.ClusterIPs) == 2 {
            el = append(el, field.Invalid(field.NewPath("spec", "ipFamilyPolicy"), service.Spec.IPFamilyPolicy,
                "must be 'RequireDualStack' or 'PreferDualStack' when multiple cluster IPs are specified"))
        }
        if len(service.Spec.IPFamilies) == 2 {
            el = append(el, field.Invalid(field.NewPath("spec", "ipFamilyPolicy"), service.Spec.IPFamilyPolicy,
                "must be 'RequireDualStack' or 'PreferDualStack' when multiple IP families are specified"))
        }
    }

    // Infer IPFamilies[] from ClusterIPs[].  Further checks happen below,
    // after the special cases.
    for i, ip := range service.Spec.ClusterIPs {
        if ip == api.ClusterIPNone {
            break
        }

        // We previously validated that IPs are well-formed and that if an
        // ipFamilies[] entry exists it matches the IP.
        fam := familyOf(ip)

        // If the corresponding family is not specified, add it.
        if i >= len(service.Spec.IPFamilies) {
            // Families are checked more later, but this is a better error in
            // this specific case (indicating the user-provided IP, rather
            // than than the auto-assigned family).
            if _, found := al.serviceIPAllocatorsByFamily[fam]; !found {
                el = append(el, field.Invalid(field.NewPath("spec", "clusterIPs").Index(i), service.Spec.ClusterIPs,
                    fmt.Sprintf("%s is not configured on this cluster", fam)))
            } else {
                // OK to infer.
                service.Spec.IPFamilies = append(service.Spec.IPFamilies, fam)
            }
        }
    }

    // If we have validation errors, bail out now so we don't make them worse.
    if len(el) > 0 {
        return errors.NewInvalid(api.Kind("Service"), service.Name, el)
    }

    // Special-case: headless + selectorless.  This has to happen before other
    // checks because it explicitly allows combinations of inputs that would
    // otherwise be errors.
    if service.Spec.ClusterIP == api.ClusterIPNone && len(service.Spec.Selector) == 0 {
        // If IPFamilies was not set by the user, start with the default
        // family.
        if len(service.Spec.IPFamilies) == 0 {
            service.Spec.IPFamilies = []api.IPFamily{al.defaultServiceIPFamily}
        }

        // this follows headful services. With one exception on a single stack
        // cluster the user is allowed to create headless services that has multi families
        // the validation allows it
        if len(service.Spec.IPFamilies) < 2 {
            if *(service.Spec.IPFamilyPolicy) != api.IPFamilyPolicySingleStack {
                // add the alt ipfamily
                if service.Spec.IPFamilies[0] == api.IPv4Protocol {
                    service.Spec.IPFamilies = append(service.Spec.IPFamilies, api.IPv6Protocol)
                } else {
                    service.Spec.IPFamilies = append(service.Spec.IPFamilies, api.IPv4Protocol)
                }
            }
        }

        // nothing more needed here
        return nil
    }

    //
    // Everything below this MUST happen *after* the above special cases.
    //

    // Demanding dual-stack on a non dual-stack cluster.
    if getIPFamilyPolicy(service) == api.IPFamilyPolicyRequireDualStack {
        if len(al.serviceIPAllocatorsByFamily) < 2 {
            el = append(el, field.Invalid(field.NewPath("spec", "ipFamilyPolicy"), service.Spec.IPFamilyPolicy,
                "this cluster is not configured for dual-stack services"))
        }
    }

    // If there is a family requested then it has to be configured on cluster.
    for i, ipFamily := range service.Spec.IPFamilies {
        if _, found := al.serviceIPAllocatorsByFamily[ipFamily]; !found {
            el = append(el, field.Invalid(field.NewPath("spec", "ipFamilies").Index(i), ipFamily, "not configured on this cluster"))
        }
    }

    // If we have validation errors, don't bother with the rest.
    if len(el) > 0 {
        return errors.NewInvalid(api.Kind("Service"), service.Name, el)
    }

    // nil families, gets cluster default
    if len(service.Spec.IPFamilies) == 0 {
        service.Spec.IPFamilies = []api.IPFamily{al.defaultServiceIPFamily}
    }

    // If this service is looking for dual-stack and this cluster does have two
    // families, append the missing family.
    if *(service.Spec.IPFamilyPolicy) != api.IPFamilyPolicySingleStack &&
        len(service.Spec.IPFamilies) == 1 &&
        len(al.serviceIPAllocatorsByFamily) == 2 {

        if service.Spec.IPFamilies[0] == api.IPv4Protocol {
            service.Spec.IPFamilies = append(service.Spec.IPFamilies, api.IPv6Protocol)
        } else if service.Spec.IPFamilies[0] == api.IPv6Protocol {
            service.Spec.IPFamilies = append(service.Spec.IPFamilies, api.IPv4Protocol)
        }
    }

    return nil
}
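
The function is long, so here is a condensed standalone model of its final part, the one that applies to ordinary Services with cluster IPs. It is a sketch for study purposes only: resolveFamilies and its string-based parameters are my own invention, it skips the update path and the headless special case, and it returns on the first error instead of collecting them all.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveFamilies is a hypothetical, simplified model of the tail of
// initIPFamilyFields.
//   policy          "SingleStack", "PreferDualStack" or "RequireDualStack"
//   requested       the user-provided ipFamilies (may be empty)
//   clusterFamilies families configured on the cluster, default family first
func resolveFamilies(policy string, requested, clusterFamilies []string) ([]string, error) {
	if len(clusterFamilies) == 0 {
		return nil, errors.New("no IP family configured on this cluster")
	}
	configured := map[string]bool{}
	for _, f := range clusterFamilies {
		configured[f] = true
	}

	if policy == "SingleStack" && len(requested) == 2 {
		return nil, errors.New("policy must be RequireDualStack or PreferDualStack for two IP families")
	}
	// Demanding dual-stack on a non dual-stack cluster.
	if policy == "RequireDualStack" && len(configured) < 2 {
		return nil, errors.New("this cluster is not configured for dual-stack services")
	}
	// Every requested family has to be configured on the cluster.
	for _, f := range requested {
		if !configured[f] {
			return nil, fmt.Errorf("%s is not configured on this cluster", f)
		}
	}

	// No families requested: start from the cluster default.
	families := append([]string{}, requested...)
	if len(families) == 0 {
		families = []string{clusterFamilies[0]}
	}
	// Dual-stack wanted and available: append the missing family.
	if policy != "SingleStack" && len(families) == 1 && len(configured) == 2 {
		if families[0] == "IPv4" {
			families = append(families, "IPv6")
		} else {
			families = append(families, "IPv4")
		}
	}
	return families, nil
}

func main() {
	dual := []string{"IPv4", "IPv6"}
	single := []string{"IPv4"}

	fmt.Println(resolveFamilies("PreferDualStack", nil, single))
	fmt.Println(resolveFamilies("PreferDualStack", nil, dual))
	fmt.Println(resolveFamilies("RequireDualStack", nil, single))
	fmt.Println(resolveFamilies("SingleStack", []string{"IPv6"}, dual))
}
```

Running it shows the fallback behaviour described earlier: PreferDualStack quietly ends up with one family on a single-stack cluster, while RequireDualStack is rejected there.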


