Don’t Write a (builtin) API, Write a CRD!

Who am I?

Solly Ross (@directxman12 / metamagical.dev)

KubeBuilder Maintainer and Software Engineer on GKE

My mission is to make writing Kubernetes extensions less arcane

So… adding APIs to Kubernetes

a (snarky) Odyssey

How does one add a (built-in) API to Kubernetes?

Make a folder in vendor/k8s.io/api/<group>/v<version>, and write your public types, with JSON and proto tags.
Copy those types to pkg/apis/<group> to make your internal types, which are probably exactly the same as your external types, except without JSON tags.
Make sure you’ve ~~engraved all the magic runes~~ added all the xyz-gen markers to your types
Write defaulting code in defaults.go
Generate automatic conversions between the internal type and the external type (hopefully conversion-gen doesn’t explode or fail to run because the file doesn’t exist)
Write manual conversions (if your internal and external types don’t match)
Write validation code in validation.go
Start code generation (go-to-protobuf, client-gen, lister-gen, informer-gen) and go make some tea 🍵
Create a registry & storage implementation in pkg/registry by copying somebody else’s registry cause it’s probably the same. You’ll also add your print columns and such here.
Make sure you’ve got an install package in your group directories, and add your API version to the API server’s list, scheme, etc.
Edit a plethora of bookkeeping files, including (but not limited to): hack/lib/init.sh, hack/update-generated-protobuf-dockerized.sh 1, various linter files, and various linter exception files 2.
Update the fuzzer if you have things like tagged unions.
Start the non-code generators (generated-docs, generated-swagger-docs, openapi-spec) and go make another cup of tea 🍵
Edit the cmd tests to make sure kubectl gives proper output.
Edit kubectl to implement a new describe command for your type.
Optional, but suggested: realize that you messed something up, make some changes, forget to run an update, have some test break subtly. Run ./hack/update-all.sh in anger, go bake a cake 🍰, and come back to find that your laptop has overheated.

not to be confused with hack/update-generated-protobuf.sh, which runs in a docker container. ↩
at any given point in time, it’s fairly likely that some generated file violates the project’s linter rules. ↩

So, what are the issues here?

high barrier to entry/long iteration cycle
tightly coupled to Kubernetes releases
lots of bespoke code, little declarative information (things like validation and defaulting are raw Go code, and don’t show up in the OpenAPI)

Enter CRDs!

Make API folder anywhere
Add external types with JSON tags
Add markers to your fields for declarative defaulting, validation, print columns
Run code generation if you really must 3
Run controller-gen to generate your CRDs
Add some tests to make sure your examples work and give the right output
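As a rough sketch of the steps above, the file below shows a hypothetical Widget type with JSON tags and declarative markers, plus a go:generate line that invokes controller-gen. The Widget and WidgetSpec names, their fields, and the output directory are assumptions made for illustration. A real root type would also embed metav1.TypeMeta and metav1.ObjectMeta, which are left out here so the file compiles with only the standard library.

```go
// Package main stands in for an API package that could live anywhere.
package main

import (
	"encoding/json"
	"fmt"
)

// Assumes controller-gen is on PATH; the output directory is only an example.
//go:generate controller-gen crd paths=./... output:crd:artifacts:config=config/crd/bases

// WidgetSpec is a hypothetical external type. The markers are plain comments,
// so they cost nothing at compile time; controller-gen reads them later.
type WidgetSpec struct {
	// +optional
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=1
	Size int32 `json:"size,omitempty"`

	// +kubebuilder:validation:MaxLength=63
	DisplayName string `json:"displayName"`
}

// Widget is the hypothetical root type (TypeMeta and ObjectMeta omitted).
// +kubebuilder:printcolumn:name="Size",type=integer,JSONPath=".spec.size"
type Widget struct {
	Spec WidgetSpec `json:"spec"`
}

// marshalExample shows the wire format that the JSON tags produce.
func marshalExample() string {
	w := Widget{Spec: WidgetSpec{Size: 3, DisplayName: "demo"}}
	out, err := json.Marshal(w)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(marshalExample())
}
```

Running go generate in the package would then call controller-gen, which is the only generation step needed unless you also want clients and informers.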

i.e. you want generated clients, informers, etc ↩

What does this bring us?

Faster iteration/experimentation: it’s easier to rapidly iterate on changes, share experiments, etc, since features are contained within dynamically loadable YAML, as opposed to compiling an entire API server.
Decoupled releases: Hypothetically, types can be released both with Kubernetes releases and independently of them.
Declarative Features (better OpenAPI spec): since things like validation and defaulting are done through the OpenAPI spec, you expose a better experience to clients: all of your validation information transfers to the exposed OpenAPI spec for external tooling to consume.
(project-selfish) Dogfood: We want to make CRDs a good, full-featured experience for users. If we can’t develop our own features with CRDs, there’s a decent chance that users will have issues too.

But I don’t want to write OpenAPI by hand!

Enter controller-gen!

Part of the KubeBuilder SIG API Machinery subproject
Already in use in Cluster API, Volume Snapshots, and other Kubernetes projects
Generates your CRDs (including print columns, validating, defaulting, etc) from your Go code (quickly, too)

But what if I need …

complex validation?
declarative validation is generally sufficient.
OpenAPI’s validation is fairly robust (regexes, OneOf, format for things like date).
really complex cross-field validation?
Use a validating webhook (another extensibility feature).
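For the cross-field case, the core of a validating webhook is usually a small function like the one sketched below. ScaleBounds and its fields are hypothetical, and the HTTP and AdmissionReview plumbing that a real webhook needs is deliberately omitted; this only shows the kind of rule that a per-field schema cannot express.

```go
package main

import (
	"errors"
	"fmt"
)

// ScaleBounds is a hypothetical spec fragment with two related fields.
type ScaleBounds struct {
	MinReplicas int32 `json:"minReplicas"`
	MaxReplicas int32 `json:"maxReplicas"`
}

// errBounds is returned when the two fields contradict each other.
var errBounds = errors.New("minReplicas must not exceed maxReplicas")

// validateBounds compares two fields against each other, which is the part
// that per-field declarative validation does not cover.
func validateBounds(b ScaleBounds) error {
	if b.MinReplicas > b.MaxReplicas {
		return errBounds
	}
	return nil
}

func main() {
	fmt.Println(validateBounds(ScaleBounds{MinReplicas: 1, MaxReplicas: 3}))
	fmt.Println(validateBounds(ScaleBounds{MinReplicas: 5, MaxReplicas: 3}))
}
```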

If we have problems with the existing validation, users will have them too! We should extend the format values for hard-to-validate common types.

declarative validation

// +kubebuilder:validation:Format="date-time"
type DateTime string

type Spec struct {
    // +optional
    // +kubebuilder:validation:Minimum=0
    Replicas *int32 `json:"replicas"`

    Next DateTime `json:"next"`

    // +kubebuilder:validation:Pattern=`^\w+-\d{2,}$`
    Prefix string `json:"prefix"`
}

type Status struct {
    Previous DateTime `json:"previous"`
}
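To make the markers above concrete, here is a sketch of roughly what they ask the API server to enforce, written as ordinary Go. checkSpec is a hypothetical helper, not something controller-gen generates; the real checks run server-side against the OpenAPI schema, and treating date-time as RFC 3339 is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// prefixPattern mirrors the Pattern marker on Spec.Prefix.
var prefixPattern = regexp.MustCompile(`^\w+-\d{2,}$`)

// checkSpec approximates the three declarative rules from the slide:
// Minimum=0 on replicas, Format=date-time on next, Pattern on prefix.
func checkSpec(replicas *int32, next, prefix string) []string {
	var problems []string
	// replicas is optional, so nil is accepted.
	if replicas != nil && *replicas < 0 {
		problems = append(problems, "replicas must be >= 0")
	}
	if _, err := time.Parse(time.RFC3339, next); err != nil {
		problems = append(problems, "next must be an RFC 3339 date-time")
	}
	if !prefixPattern.MatchString(prefix) {
		problems = append(problems, "prefix must match "+prefixPattern.String())
	}
	return problems
}

func main() {
	negative := int32(-1)
	fmt.Println(checkSpec(nil, "2020-03-01T12:00:00Z", "web-01"))
	fmt.Println(checkSpec(&negative, "tomorrow", "web-1"))
}
```

The point of the declarative form is that none of this code has to be written or kept in sync with the published schema.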

But what if I need …

immutability? KEP-1265 Coming in 1.18+ 4
defaulting? Use declarative defaulting, which lets you apply defaults to arbitrary fields (including structs and lists).
special apply behavior? use the server-side apply merge configuration markers (CRDs don’t support strategic merge patch, but new consumers should be using server-side apply anyway).

Immutability, Defaulting, & Apply

type Spec struct {
    // +immutable
    // coming soon™
    DeepLink string `json:"deepLink"`

    // +default={fixedReplicas: 33}
    ScaleOptions *ScaleOptions `json:"scaleOptions"`

    // +listType=map
    // +listMapKey=key
    ActuallyAMap []Item `json:"actuallyAMap"`
}

type ScaleOptions struct {
    FixedReplicas *int32 `json:"fixedReplicas"`
    Burst int32 `json:"burst"`
}

type Item struct {
    Key string `json:"key"`
    Value string `json:"value"`
}
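The listType=map and listMapKey markers tell server-side apply to treat the list as keyed entries instead of positional ones. The sketch below illustrates that merge behavior in plain Go. mergeByKey is a hypothetical, simplified stand-in: the real implementation also tracks field ownership and can remove entries, which is not modeled here.

```go
package main

import "fmt"

// Item matches the slide's list element; Key is the listMapKey.
type Item struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// mergeByKey overlays applied onto live, matching entries by Key rather
// than by position. Unmatched applied entries are appended in order.
func mergeByKey(live, applied []Item) []Item {
	merged := make([]Item, len(live))
	copy(merged, live)

	// Remember where each key currently sits in the merged list.
	index := make(map[string]int, len(live))
	for i, it := range merged {
		index[it.Key] = i
	}

	for _, it := range applied {
		if i, ok := index[it.Key]; ok {
			merged[i] = it // same key: update in place
			continue
		}
		index[it.Key] = len(merged)
		merged = append(merged, it) // new key: add an entry
	}
	return merged
}

func main() {
	live := []Item{{Key: "a", Value: "1"}, {Key: "b", Value: "2"}}
	applied := []Item{{Key: "b", Value: "20"}, {Key: "c", Value: "3"}}
	fmt.Println(mergeByKey(live, applied))
}
```

Without the markers, the list would be treated as atomic and the applied list would replace the live one wholesale.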

at this point, @directxman12 looks around the room to SIG API Machinery for confirmation ↩

But what if I need conversion?

Conversion webhooks

Conversion webhooks

spec:
  conversion:
    strategy: Webhook
    webhookClientConfig:
      caBundle: ¯\_(ツ)_/¯ 
      service:
        namespace: kube-system
        name: my-crd-webhook
        path: /convert

type HubVersion struct { ... }
func (*HubVersion) Hub() {}

// +kubebuilder:storageversion
type OtherVersion struct { ... }
func (v *OtherVersion) ConvertTo(dst conversion.Hub) error {
    ...
}
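The elided bodies above follow a hub-and-spoke pattern: every version converts to and from one hub version. Below is a stdlib-only sketch of that pattern. The Hub interface here is a local stand-in for controller-runtime's conversion.Hub, and WidgetV1 and WidgetV2 are hypothetical types; it is not the code from the talk.

```go
package main

import "fmt"

// Hub is a local stand-in for the marker interface that the hub version implements.
type Hub interface{ Hub() }

// WidgetV2 is the hypothetical hub version.
type WidgetV2 struct {
	Replicas int32
	Burst    int32
}

// Hub marks WidgetV2 as the version all others convert through.
func (*WidgetV2) Hub() {}

// WidgetV1 is a hypothetical older version that only knows about Replicas.
type WidgetV1 struct {
	Replicas int32
}

// ConvertTo copies this version's fields into the hub.
func (v *WidgetV1) ConvertTo(dst Hub) error {
	hub, ok := dst.(*WidgetV2)
	if !ok {
		return fmt.Errorf("unexpected hub type %T", dst)
	}
	hub.Replicas = v.Replicas
	return nil
}

// ConvertFrom fills this version from the hub. Burst has no v1 equivalent
// and is dropped in this sketch.
func (v *WidgetV1) ConvertFrom(src Hub) error {
	hub, ok := src.(*WidgetV2)
	if !ok {
		return fmt.Errorf("unexpected hub type %T", src)
	}
	v.Replicas = hub.Replicas
	return nil
}

func main() {
	old := &WidgetV1{Replicas: 3}
	hub := &WidgetV2{}
	if err := old.ConvertTo(hub); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *hub)
}
```

A conversion that drops fields, as ConvertFrom does here, will not round-trip; real implementations generally need a place to stash the lost data.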

But what if I need cross-field defaulting?

Please don’t

But what if I need to embed a built-in API type?

controller-gen will generate a basic validation schema for those types, but you won’t get full validation until pod create time. Sometimes, this is desirable.

Otherwise, you’ll need to use a validating webhook.

We could improve the situation by any of:

adding declarative validation information to built-in types,
adding schema references 5

cue concerned glances from SIG API Machinery ↩

But what if I need subresources?

Scale and Status have built-in support in CRDs

Scale & Status subresources


// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=".spec.replicas",statuspath=".status.replicas"
type MyKind struct {}

No, I mean <insert bespoke subresource here>

Did you check with API review 🤨?

Yes!

You’ll need a built-in API

If we get too many of these, we should eventually consider supporting these in CRDs as well.

But what if I need high read-write performance?

You’ll need a built-in API

We don’t have proto support yet for CRDs.

We’ll want to add proto support to CRDs eventually

But what if I need field selectors?

Are you Pod, coming from the glorious future where everything is CRDs and we wear shiny eyemasks while we drive our hovercars?

No, but…

Use an informer with an indexer.
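The idea behind an indexer is to precompute a lookup from a field value to the objects that carry it, so a client can answer selector-style queries from its local cache. In client-go this role is played by cache.Indexers and Indexer.ByIndex; the code below is a stdlib-only stand-in with hypothetical names (Widget, Team, fieldIndex), not the client-go API itself.

```go
package main

import "fmt"

// Widget is a hypothetical cached object.
type Widget struct {
	Name string
	Team string
}

// indexFunc returns the index values for one object, similar in spirit
// to an index function registered on an informer.
type indexFunc func(w Widget) []string

// fieldIndex maps each index value to the objects that produced it.
type fieldIndex struct {
	fn      indexFunc
	entries map[string][]Widget
}

func newFieldIndex(fn indexFunc) *fieldIndex {
	return &fieldIndex{fn: fn, entries: make(map[string][]Widget)}
}

// add records the object under every value its index function returns.
func (ix *fieldIndex) add(w Widget) {
	for _, value := range ix.fn(w) {
		ix.entries[value] = append(ix.entries[value], w)
	}
}

// byIndex returns the objects recorded under value, in insertion order.
func (ix *fieldIndex) byIndex(value string) []Widget {
	return ix.entries[value]
}

func main() {
	byTeam := newFieldIndex(func(w Widget) []string { return []string{w.Team} })
	byTeam.add(Widget{Name: "alpha", Team: "storage"})
	byTeam.add(Widget{Name: "beta", Team: "network"})
	byTeam.add(Widget{Name: "gamma", Team: "storage"})
	fmt.Println(byTeam.byIndex("storage"))
}
```

The trade-off compared to a server-side field selector is that the client still watches and caches every object; only the lookup is cheap.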

But what if I need my types to be present for the APIServer to boot?

You’ll want a built-in API

It’s technically possible to overcome this bootstrap problem in some cases, but you’ll need to be extra careful.

But what if I need to actually install my types on cluster boot?

🤷

RuntimeClass and CSIDriver had issues since they had in-tree controllers depending on their CRDs, and we never quite solved them.

Talk to SIG Cluster Lifecycle…

So… have people actually done this?

Volume Snapshots
Cluster API
RuntimeClass: alpha with CRDs, migrated to a built-in API for beta
CSIDriver: alpha with CRDs, migrated to a built-in API for beta

TL;DW

CRDs have fast iteration time and good declarative features
We need to dogfood CRDs to ensure that they’re a good experience for users
There are still a few advanced use cases relating to performance, bootstrapping, and uncommon features that necessitate built-in APIs
All the problems (with the possible exception of bootstrapping) are solvable if we ~~believe in ourselves 🌈~~ write some KEPs

Any Questions?

controller-gen: book.kubebuilder.io/reference/generating-crd.html