Deploying and Scaling Microservices
with Kubernetes
Go easy on the WiFi!
Don't use your hotspot.
Don't stream videos or download big files during the training.
djalal, Nanterre, September 13
Hello, I am:
This workshop will run from 9am to 5pm.
The lunch break will be from 12pm to 1:30pm.
(With two coffee breaks, at 10:30am and 3pm!)
Feel free to interrupt me with your questions at any time.
Especially when you see full-screen pictures of shipping containers!
Live reactions, questions, requests for help:
on https://tinyurl.com/docker-w-djalal
This was initially written by Jérôme Petazzoni to support in-person, instructor-led workshops and tutorials
Credit is also due to multiple contributors — thank you!
You can also follow along on your own, at your own pace
We included as much information as possible in these slides
We recommend having a mentor to help you ...
... Or be comfortable spending some time reading the Kubernetes documentation ...
... And looking for answers on StackOverflow and other outlets
All the content is available in a public GitHub repository:
You can get an up-to-date version of these slides here:
https://container.training/ (English) or https://docker.djal.al/ (French)
👇 Try it! The source code will be displayed, and you can open it on GitHub to read it and fix it.
This slide has a little magnifying glass in its top left corner.
That magnifying glass means the slide provides extra details.
You can skip those slides if:
you are in a hurry;
you are new to all this and worried about cognitive overload;
you only want the essential information.
You can always come back to them later; they will be waiting for you here ☺
(auto-generated TOC)
Prerequisites
(automatically generated title slide)
Be comfortable with the UNIX command line:
navigating directories
editing files
a little bit of bash-fu (environment variables, loops)
Some Docker knowledge:
docker run, docker ps, docker build
ideally, you know how to write a Dockerfile and build it
(even if it's just a FROM line and a couple of RUN commands)
It's totally OK if you are not a Docker expert!
Tell me and I forget.
Teach me and I remember.
Involve me and I learn.
Misattributed to Benjamin Franklin
(Most likely inspired by the Chinese Confucian philosopher Xunzi)
This workshop is entirely hands-on
We are going to build, ship, and run containers!
You are invited to reproduce all the demos
All hands-on sections are clearly identified by the grey rectangle below
This is the stuff you're supposed to do!
Go to http://container.training/ to view these slides
Join the chat room: In person!
Each person gets a private cluster of cloud VMs (not shared with anybody else)
The VMs will stay up for the whole duration of the training
You should have a little card with a login, a password, and IP addresses
You can automatically SSH from one VM to another
The nodes have aliases: node1, node2, etc.
Installing that tooling can be hard on some machines
(32-bit CPU or OS... laptops without admin access, etc.)
"The whole team downloaded all these container images over the WiFi!"
... and it went great (said literally nobody)
All you need is a computer (or even a tablet) with:
an internet connection
a web browser
an SSH client
On Linux, OS X, FreeBSD... you are probably all set
On Windows, get one of these programs:
On Android, JuiceSSH (available from the Play Store) works pretty well
Nice to have: Mosh instead of SSH, if your internet connection tends to lose packets
You don't have to use Mosh or even know about it to follow along.
We're just telling you about it because some of us think it's cool!
Mosh is "the mobile shell"
It is essentially SSH over UDP, with roaming features
It retransmits packets quickly, so it works great even on lossy connections
(Like hotel or conference WiFi)
It has intelligent local echo, so it works great even in high-latency connections
(Like hotel or conference WiFi)
It supports transparent roaming when your client IP address changes
(Like when you hop from hotel to conference WiFi)
To install it: (apt|yum|brew) install mosh
It has been pre-installed on the VMs that we are using
To connect to a remote machine: mosh user@host
(It is going to establish an SSH connection, then hand off to UDP)
It requires UDP ports to be open
(By default, it uses a UDP port between 60000 and 61000)
Log into the first VM (node1) with your SSH client
Check that you can SSH (without a password) to node2:
ssh node2
Type exit or ^D to come back to node1
If anything goes wrong - ask for help!
Use something like Play-With-Docker or Play-With-Kubernetes
Zero setup effort; but environments are short-lived and might have limited resources
Create your own cluster (local or cloud VMs)
Small setup effort; small cost; flexible environments
Create a bunch of clusters for you and your friends (instructions)
Bigger setup effort; ideal for group training
These remarks only apply when we have multiple nodes, of course.
Unless instructed otherwise, all commands are run from the first VM, node1
All code will be retrieved on node1 only.
During normal operations, we do not need access to the other nodes.
If we had to troubleshoot issues, we would use a combination of:
SSH (to access system logs, daemon status, etc.)
the Docker API (to check running containers and the state of the container engine)
Once in a while, the instructions will say:
"Open a new terminal."
There are multiple ways to do this:
create a new window or tab on your machine, and SSH into the VM;
use screen or tmux on the VM and open a new window from there.
You are welcome to use the method that you feel the most comfortable with.
Tmux is a terminal multiplexer like screen.
You don't have to use it or even know about it to follow along.
But some of us like to use it to switch between terminals.
It has been preinstalled on your workshop nodes.
kubectl version
docker version
docker-compose -v
Kubernetes 1.13.x is only validated with Docker Engine versions up to 18.06
Kubernetes 1.14 is validated with Docker Engine versions up to 18.09
(the latest stable version at the time Kubernetes 1.14 was released)
Are we living dangerously by installing a Docker Engine that is "too recent"?
Not at all!
"Validated" = passes the very extensive (and expensive) continuous integration tests
The Docker API is versioned, and offers very strong backward compatibility
(If a client "speaks" API v1.25, the Docker Engine will keep behaving the same way)
Our sample application
(automatically generated title slide)
We will clone the GitHub repository onto our node1
The repository also contains scripts and tools that we will use throughout the training
Clone the repository on node1:
git clone https://github.com/jpetazzo/container.training
(You can also fork the repository on GitHub and clone your fork if you prefer.)
Let's start the application before diving into it, since the download can take a little while...
Go to the dockercoins directory in the cloned repository:
cd ~/container.training/dockercoins
Use Compose to build and run all the containers:
docker-compose up
Compose tells Docker to build all the container images (pulling the corresponding base images), then starts all the containers and displays the aggregated logs.
It's a DockerCoin miner! 💰🐳📦🚢
No, we will not be paying for coffee with DockerCoins
How DockerCoins works:
generate a few random bytes
compute a hash of these bytes
increment a counter (to keep track of the speed)
repeat forever!
DockerCoins is not a cryptocurrency
(the only common points are "randomness", "hashing", and "coins" in the name)
DockerCoins is made of 5 services:
rng = web service generating random bytes
hasher = web service computing a hash of POSTed data
worker = background process using rng and hasher
webui = web interface to watch progress
redis = data store (holds a counter, updated by worker)
These 5 services are visible in the application's Compose file, docker-compose.yml
worker invokes the rng web service to generate a few random bytes
worker invokes the hasher web service to compute a hash of these bytes
worker loops over these two tasks forever
every second, worker updates redis to indicate how many loops were done
webui queries redis, then computes and exposes the "hashing speed" in our browser
(See the diagram on the next slide!)
How does each service find the address of the others?
We do not hard-code IP addresses in the code.
We do not hard-code FQDNs in the code, either.
We just connect to a service name, and container magic does the rest
(And by container magic, we mean "a crafty, dynamic, embedded DNS server")
worker/worker.py
redis = Redis("redis")

def get_random_bytes():
    r = requests.get("http://rng/32")
    return r.content

def hash_bytes(data):
    r = requests.post("http://hasher/",
                      data=data,
                      headers={"Content-Type": "application/octet-stream"})
    return r.content
(Full source code available here)
Containers can have network aliases (resolved by DNS)
Compose file version 2+ makes each container reachable through its service name
Compose file version 1 required a "links" section instead
Network aliases are automatically namespaced:
you can have multiple apps declaring and using a service called database
containers in the blue app will resolve database to the IP of the blue database
containers in the green app will resolve database to the IP of the green database
(A quick sketch of that namespacing follows below.)
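As a hedged illustration of that namespacing (the project names blue and green are made up, and this is not part of the original exercises), running the same Compose file under two different project names gives each copy its own network:
# Two copies of the same app, under two different Compose project names
docker-compose -p blue up -d
docker-compose -p green up -d
# Each project gets its own dedicated network; within it, the name "database"
# resolves to that project's own database container.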
You can open the GitHub repository with all the materials of this workshop:
https://github.com/jpetazzo/container.training
The application is in the dockercoins subdirectory
The Compose file (docker-compose.yml) lists the 5 services
redis uses an official image from the Docker Hub
hasher, rng, worker, webui are built from a Dockerfile
Each service's Dockerfile and source code are stored in its own directory
(hasher is in the hasher directory, rng is in the rng directory, etc.)
This is only relevant if you used Compose before 2016...
Compose 1.6 introduced support for a new Compose file format (aka "v2")
Services are no longer at the top level, but in a services section
There has to be a version key at the top of the file, with the value "2" (the string, not the number)
Containers are placed on a dedicated network, making links unnecessary
There are other minor differences, but the upgrade is easy and straightforward.
(A minimal sketch of the v2 layout follows below.)
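As a hedged sketch of that layout (the real dockercoins file lists more services and build options), a minimal v2 Compose file could look like this:
version: "2"        # a string, not a number

services:
  redis:
    image: redis    # official image from the Docker Hub
  worker:
    build: worker   # built from the Dockerfile in the worker directory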
On the left-hand side, the "rainbow" strip shows the container names
On the right-hand side, we see the output of our containers
We can see the worker service making requests to rng and hasher
For rng and hasher, we can see their HTTP access logs
"Logs are exciting and fun!" (said nobody, ever, really)
The webui container exposes a web dashboard; let's go take a look at it
With a web browser, connect to node1 on port 8000
Reminder: the nodeX aliases are valid only on the nodes themselves
In your browser, you need to enter the IP address of your node
A diagram should show up, and after a few seconds, a blue graph will appear.
It looks like the speed is approximately 4 hashes/second
Or more precisely: 4 hashes/second, with regular dips down to zero
Why?
The app actually has a constant, steady speed of 3.33 hashes/second
(which corresponds to 1 hash every 0.3 seconds, for reasons)
Yes, and?
The worker doesn't update the counter after every loop, but at most once per second.
The speed is computed by the browser, which checks the counter about once per second.
Between two consecutive updates, the counter will increase either by 4, or by 0 (zero).
The perceived speed will therefore be 4 - 4 - 0 - 4 - 4 - 0, etc.
What can we conclude from all this?
If we stop Compose (with ^C), it will politely ask the Docker Engine to stop the app
The Docker Engine will send a TERM signal to the containers
If the containers do not exit in a timely manner, the Engine sends them a KILL signal
Stop the application by hitting ^C
Some containers exit immediately, others take longer.
The containers that do not handle SIGTERM end up being killed after about 10 seconds. If we are very impatient, we can hit ^C a second time!
To clean up, we can then run:
docker-compose down
Kubernetes concepts
(automatically generated title slide)
Kubernetes is a container management system
It runs and manages containerized applications on a cluster
What does that really mean?
Start 5 containers using image atseashop/api:v1.3
Place an internal load balancer in front of these containers
Start 10 containers using image atseashop/webfront:v1.3
Place a public load balancer in front of these containers
It's Black Friday (or Christmas!), traffic spikes, grow our cluster and add containers
New release! Replace my containers with the new image atseashop/webfront:v1.4
Keep processing requests during the upgrade; update my containers one at a time
Basic autoscaling
Blue/green deployment, canary deployment
Long-running services, but also batch (one-off) jobs
Overcommit our cluster and evict low-priority jobs
Run services with stateful data (databases, etc.)
Fine-grained access control defining what can be done, by whom, on which resources
Integrating third-party services (service catalog)
Automating complex tasks (operators)
Ha ha ha ha
OK, I was just trying to scare you, it's much simpler than that ❤️
The first schema is a Kubernetes cluster with storage backed by multi-path iSCSI
(Courtesy of Yongbok Kim)
The second one is a simplified representation of a Kubernetes cluster
(Courtesy of Imesh Gunaratne)
The nodes running our containers also run a collection of services:
a container engine (typically Docker)
kubelet (the "node agent")
kube-proxy (a necessary but not sufficient network component)
Nodes were formerly called "minions"
(You might still come across that word in older articles or documentation)
The Kubernetes logic (its "brains") is a collection of services:
the API server (our entry point to everything!)
core services like the scheduler and the controller manager
etcd (a highly available key/value store; the "database" of Kubernetes)
Together, these services form the control plane of our cluster
The control plane is also called the "master"
It is common to reserve a dedicated node for the control plane
(Except for single-node development clusters, like when using minikube)
This node is then called a "master"
(Yes, this is ambiguous: is the "master" a node, or the whole control plane?)
Normal applications are not allowed to run on this node
(Using a mechanism called "taints")
For high availability, each service of the control plane must be resilient
The control plane is then replicated on multiple nodes
(This is sometimes called a "multi-master" setup)
The control plane services can run in or out of containers
For example: since etcd is a critical service, some people deploy it directly on a dedicated cluster (without containers)
(This is illustrated on the first "super complicated" schema)
In some hosted Kubernetes offerings (e.g. AKS, GKE, EKS), the control plane is invisible
(We only "see" a Kubernetes API endpoint)
In that case, there is no "master" node
For this reason, it is more accurate to say "control plane" rather than "master".
No!
By default, Kubernetes uses the Docker Engine to run containers
We could also use rkt ("Rocket") from CoreOS
Or leverage other engines through the Container Runtime Interface
(like CRI-O, or containerd)
Yes!
In this workshop, we will run our app on a single node first
We will need to build images and ship them around
We could do these things without Docker
(and get diagnosed with NIH¹ syndrome)
Docker is still the most stable container engine today
(but the alternatives are maturing very quickly)
On our development environments, CI pipelines ... :
Yes, almost certainly
On our production servers:
Yes (for today)
Probably not (in the future)
More information about CRI on the Kubernetes blog
We interact with Kubernetes mostly through a RESTful API
The Kubernetes API defines a lot of objects called resources
These resources are organized by type, or Kind (in the API)
The API lets us create, read, update, and delete resources
A few common resource types are:
And much more!
We can see the full list by running kubectl api-resources
The first diagram is courtesy of Lucas Käldström, in this presentation
The second diagram is courtesy of Weave Works
a pod can have multiple containers working together (a minimal sketch follows below)
IP addresses are associated with pods, not with individual containers
Both diagrams are used with permission of their authors.
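To make the "multiple containers per pod" idea concrete, here is a minimal, hypothetical pod manifest; the names and images are arbitrary examples, not something used later in this workshop:
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar      # hypothetical name
spec:
  containers:
  - name: web
    image: nginx              # main container
  - name: sidecar
    image: busybox            # helper container sharing the pod's network and volumes
    command: ["sleep", "3600"]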
Declarative vs imperative
(automatically generated title slide)
Our container orchestrator puts a very strong emphasis on being declarative
Declarative:
I would like a cup of tea.
Imperative:
Boil some water. Pour it in a teapot. Add tea leaves. Steep for a while. Serve in a cup.
Declarative seems simpler at first...
... as long as you know how to brew tea
What declarative mode really should be:
I want a cup of tea, obtained by pouring an infusion¹ of tea leaves in a cup.
¹An infusion is obtained by letting the object steep a few minutes in hot water².
²Hot liquid is obtained by pouring it in an appropriate container³ and setting it on a stove.
³Ah, finally, containers! Something we know about. Let's get to work, shall we?
Did you know there is an ISO standard specifying how to brew tea?
Imperative systems:
simpler
if a task is interrupted, we have to restart from scratch
Declarative systems:
if a task is interrupted (or if we show up to the party half-way through), we can figure out what's missing and do only what's needed
we need to be able to observe the system
... and to compute a "diff" between what is currently running and what we want
Virtually everything that we run on Kubernetes is declared with a spec
All we can do is write a spec and push it to the API server
(by declaring resources such as Pod or Deployment)
The API server will validate that spec (and reject it if it's invalid)
Then it will store it in etcd
A controller will "notice" that spec and act upon it
Watch for the spec fields in the YAML files later!
The spec describes how we want the thing to run
Kubernetes will reconcile the current state with the spec
(technically, this is done by a number of controllers)
When we want to change a resource, we update the spec
Kubernetes will then converge that resource
(A minimal example of such a spec follows below.)
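As a hedged illustration, here is roughly what such a spec could look like for the pingpong pod used later in this workshop; the spec: section is the desired state that the controllers keep reconciling:
apiVersion: v1
kind: Pod
metadata:
  name: pingpong
spec:                          # desired state: what we want to run
  containers:
  - name: pingpong
    image: alpine
    command: ["ping", "1.1.1.1"]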
The Kubernetes network model
(automatically generated title slide)
In a nutshell:
Our cluster (nodes and pods) is one big flat IP network.
In detail:
all nodes must be able to reach each other, without NAT
all pods must be able to reach each other, without NAT
pods and nodes must be able to reach each other, without NAT
each pod is aware of its own IP address (no NAT)
IP addresses are assigned by the network implementation (the plugin)
Kubernetes doesn't mandate any particular implementation
Everything can reach everything
No address translation
No port translation
No new protocol
The network implementation can decide how to allocate addresses
IP addresses don't have to be "portable" from a node to another
(For example, we can use a subnet per node and a simple routed topology)
The specification is simple enough to allow many, varied implementations
Everything can reach everything:
if you want security, you need to add network policies
the network implementation that you use needs to support them
There are literally dozens of implementations out there
(No fewer than 15 are listed in the Kubernetes documentation)
Pods have layer 3 (IP) connectivity, but services are layer 4 (TCP or UDP)
(Services map to a single TCP or UDP port; no port ranges or arbitrary IP packets)
kube-proxy is on the data path when connecting to a pod or container,
and it's not particularly fast (it relies on userland proxying or iptables)
The nodes that we have at our disposal use Weave
We don't particularly endorse Weave; it just Works For Us
Don't worry about the warning about kube-proxy performance
Unless you:
If necessary, there are alternatives to kube-proxy, such as:
kube-router
CNI is a complete specification for network plugins.
When a new pod is created, Kubernetes delegates the network setup to CNI plugins.
(It can be a single plugin, or a combination of plugins, each doing one task)
Typically, a CNI plugin will:
allocate an IP address (by calling an IPAM plugin)
add a network interface into the pod's network namespace
configure that interface as well as the required routes, etc.
Not all CNI plugins are created equal
(e.g. they don't all implement network policies, which are required to isolate pods)
The "pod-to-pod network" or "pod network":
provides communication between pods and nodes
is generally implemented with CNI plugins
The "pod-to-service network":
provides internal communication and load balancing
is generally implemented with kube-proxy (or e.g. kube-router)
Network policies:
provide firewalling and isolation (a minimal example follows after this list)
can be bundled with the "pod network" or provided by another component
Inbound traffic can be handled by multiple components:
something like kube-proxy or kube-router (for NodePort services)
load balancers (ideally, connected to the pod network)
In theory, it is possible to use multiple pod networks in parallel
(with "meta-plugins" like CNI-Genie or Multus)
Some solutions can fill multiple roles
(e.g. kube-router can be set up to provide the pod network and/or network policies and/or replace kube-proxy)
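For instance, here is a minimal, hypothetical network policy that denies all inbound traffic to every pod in its namespace (an empty podSelector matches all pods); the name is arbitrary:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress       # arbitrary example name
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes:
  - Ingress                    # no ingress rules are listed, so all inbound traffic is denied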
First contact with kubectl
(automatically generated title slide)
kubectl
kubectl is (almost) the only tool we will need to talk to Kubernetes
It is a very rich CLI tool built around the Kubernetes API
(Everything you can do with kubectl, you can do directly through the API)
On our machines, there is a ~/.kube/config file with:
the Kubernetes API address
the path to our TLS certificates used to identify us
You can also use the --kubeconfig flag to pass a specific config file
Or pass --server, --user, etc. directly
kubectl can be pronounced "Cube C T L", "Cube cuttle", "Cube cuddle"...
kubectl get
Let's look at our Node resources with kubectl get!
Look at the composition of our cluster:
kubectl get node
These commands are equivalent:
kubectl get no
kubectl get node
kubectl get nodes
kubectl get can output JSON, YAML, or custom formats
Give us more information about the nodes:
kubectl get nodes -o wide
Let's get some YAML:
kubectl get no -o yaml
See that kind: List at the very end? It's the type of our result!
kubectl and jq
kubectl get nodes -o json | jq ".items[] | {name:.metadata.name} + .status.capacity"
kubectl has pretty solid introspection capabilities
We can list all available resource types by running kubectl api-resources
(On Kubernetes 1.10 and earlier, you had to run kubectl get)
To get the details of a resource type, use:
kubectl explain type
The definition of a field within a resource type can be shown with:
kubectl explain node.spec
kubectl explain node --recursive
We can access the same information by reading the API documentation
The documentation is usually easier to read, but:
kubectl api-resources and kubectl explain perform introspection
(they rely on the API server to obtain exact type definitions)
Most common resource types have up to three forms of their name:
singular (e.g. node, service, deployment)
plural (e.g. nodes, services, deployments)
short (e.g. no, svc, deploy)
Some resources do not have a short name
Endpoints only have a plural form
(because even a single Endpoints resource is actually a list of endpoints)
We can use kubectl get -o yaml to see all the details of a resource
However, the YAML output is often both too verbose and incomplete
For example, kubectl get node node1 -o yaml is:
too verbose (e.g. it includes the list of images available on that node)
incomplete (because it doesn't show the pods running on that node)
hard to read for a human operator
For a comprehensive overview, we can use kubectl describe instead.
kubectl describe
kubectl describe needs a resource type and (optionally) a resource name
It is possible to provide a resource name prefix
(all matching objects will be displayed)
kubectl describe will retrieve some extra information about the resource
Look at the information available for node1 with one of these commands:
kubectl describe node/node1
kubectl describe node node1
(We should see a bunch of control plane pods.)
A service is a stable endpoint to connect to "something"
(In the initial proposal, they were called "portals")
List the services on our cluster with one of these commands:
kubectl get services
kubectl get svc
There is already one service on our cluster: the Kubernetes API itself.
A ClusterIP service is internal, available from the cluster only
This is useful for introspection from within containers.
Try to connect to the API:
curl -k https://10.96.0.1
-k is used to skip certificate verification
Make sure to replace 10.96.0.1 with the CLUSTER-IP shown by kubectl get svc
Note: on Docker Desktop, the API is only reachable at https://localhost:6443/
The error that we see is expected: the Kubernetes API requires authentication.
Containers are manipulated through pods.
A pod is a group of containers:
running together (on the same node)
sharing resources (RAM, CPU; but also network and volumes)
List the pods on our cluster:
kubectl get pods
These are not the pods we're looking for. But where are they, then?!?
List the namespaces on our cluster with one of these commands:
kubectl get namespaces
kubectl get namespace
kubectl get ns
You know what... That kube-system thing looks suspicious.
In fact, I'm pretty sure it showed up earlier, when we did:
kubectl describe node node1
By default, kubectl uses the... default namespace
We can show resources in all namespaces with --all-namespaces
List the pods in all namespaces:
kubectl get pods --all-namespaces
Since Kubernetes 1.14, we can also use -A as a shorter version:
kubectl get pods -A
Here are our system pods!
etcd is our etcd server
kube-apiserver is the API server
kube-controller-manager and kube-scheduler are other control plane components
coredns provides DNS-based service discovery (it replaces kube-dns as of 1.11)
kube-proxy runs on each node and manages port mappings and such
weave is the component managing the overlay network on each node
the READY column indicates the number of containers in each pod
pods with a name ending in -node1 are the control plane components
(they have been specifically "pinned" to the master node)
Why are all these pods in kube-system (and not in default)?
List only the pods in the kube-system namespace:
kubectl get pods --namespace=kube-system
kubectl get pods -n kube-system
Namespaces and kubectl
We can use -n/--namespace with almost every kubectl command
Example:
kubectl create --namespace=X to create something in namespace X
We can use -A/--all-namespaces with most commands that manipulate multiple objects at once
Examples:
kubectl delete can delete resources across multiple namespaces
kubectl label can add/remove labels across multiple namespaces
What about kube-public?
List the pods in the kube-public namespace:
kubectl -n kube-public get pods
Nothing!
kube-public is created by kubeadm and is used to establish basic security bootstrapping
What's in kube-public?
kube-public contains a ConfigMap named cluster-info
List the ConfigMaps in the kube-public namespace:
kubectl -n kube-public get configmaps
Inspect cluster-info:
kubectl -n kube-public get configmap cluster-info -o yaml
Note the selfLink URI: /api/v1/namespaces/kube-public/configmaps/cluster-info
We might need that!
Accessing cluster-info
Earlier, when we queried the API server, we got a Forbidden response
But cluster-info is readable by everyone (even without authentication)
Retrieve cluster-info:
curl -k https://10.96.0.1/api/v1/namespaces/kube-public/configmaps/cluster-info
We were able to access cluster-info (without authentication)
It contains a kubeconfig file
We can easily extract the kubeconfig file from this ConfigMap
Display the kubeconfig file:
curl -sk https://10.96.0.1/api/v1/namespaces/kube-public/configmaps/cluster-info \
  | jq -r .data.kubeconfig
This file holds the canonical address of the API server, and the public key of the CA.
This file does not hold client keys or tokens.
This is not sensitive information, but it is essential to establish a secure connection.
What about kube-node-lease?
Starting with Kubernetes 1.14, there is a kube-node-lease namespace
(or starting with 1.13 if the NodeLease feature was enabled)
That namespace contains one Lease object per node
Node leases are a new way to implement node heartbeats
(i.e. a node regularly pinging the control plane to say "I'm alive!")
For more details, see KEP-0009 or the node controller documentation
(a quick way to look at these leases is sketched below)
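As a quick, hedged aside (assuming a cluster recent enough to have this namespace), you can peek at these objects with:
kubectl -n kube-node-lease get leases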
Installing Kubernetes
(automatically generated title slide)
We used kubeadm on freshly installed VMs running Ubuntu LTS
Install Docker
Install the Kubernetes packages
Run kubeadm init on the first node (this deploys the control plane on that node)
Set up Weave (the overlay network)
(that step is just one kubectl apply command; discussed later)
Run kubeadm join on the other nodes (with the token produced by kubeadm init)
Copy the configuration file generated by kubeadm init
Check the VM installation README for more details.
(A condensed sketch of those commands follows below.)
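Roughly, and glossing over package repositories and exact version pins, the sequence looks like the sketch below; the Weave manifest URL and the join parameters shown here are illustrative and should be taken from your own distribution and from the output of kubeadm init:
# On the first node: install a container engine and the Kubernetes packages,
# then bootstrap the control plane
apt-get install -y docker.io kubelet kubeadm kubectl
kubeadm init

# Still on the first node: install the overlay network (Weave, in our case)
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

# On each other node: join the cluster with the token printed by kubeadm init
kubeadm join <api-server-address>:6443 --token <token> --discovery-token-ca-cert-hash <hash>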
kubeadm
Does not install Docker or any other container engine
Does not install the overlay network
Does not install multi-master setups (no high availability)
(At least... not yet! Though this is available as an experimental feature in 1.12.)
"It's still twice as much work as setting up a Swarm cluster 😕" -- Jérôme
If you are on Azure: AKS
If you are on Google Cloud: GKE
Cloud-agnostic (AWS/DO/GCE (beta)/vSphere (alpha)): kops
On your local machine: minikube, kubespawn, Docker Desktop
If you have specific deployment needs: kubicorn
Probably the closest thing to a multi-cloud/hybrid solution today, but still under development.
If you like Ansible: kubespray
If you like Terraform: typhoon
If you like Terraform and Puppet: tarmak
You can also learn how to install every component manually, with the excellent tutorial Kubernetes The Hard Way
Kubernetes The Hard Way is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster.
There are also plenty of commercial options available!
For a more complete list, please consult the Kubernetes documentation:
it has a great guide to pick the right solution.
Running our first containers on Kubernetes
(automatically generated title slide)
First things first: we cannot run "a" container
We are going to run a pod, and in that pod, we will run a single container
In that container, in the pod, we are going to run a simple ping command
Then we are going to start additional copies of the pod
kubectl run
Let's ping 1.1.1.1, Cloudflare's public DNS resolver:
kubectl run pingpong --image alpine ping 1.1.1.1
(Starting with Kubernetes 1.12, we get a message telling us that kubectl run is deprecated. Let's ignore it for now.)
What did kubectl run create?
List the resources that were created:
kubectl get all
We should see something like:
deployment.apps/pingpong (the deployment that we just created)
replicaset.apps/pingpong-xxxxxxxxxx (a replica set created by that deployment)
pod/pingpong-xxxxxxxxxx-yyyyy (a pod created by the replica set)
Note: as of 1.10.1, resource types are displayed in more detail.
A deployment is a high-level construct:
allows scaling, rolling updates, rollbacks
multiple deployments can be used together to implement a canary deployment
delegates pod management to replica sets
A replica set is a low-level construct:
makes sure that a given number of identical pods are running
allows scaling
is rarely used directly
Our pingpong deployment
kubectl run created a deployment, deployment.apps/pingpong
NAME                       DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/pingpong   1       1       1          1         10m
That deployment created a replica set, replicaset.apps/pingpong-xxxxxxxxxx
NAME                                  DESIRED CURRENT READY AGE
replicaset.apps/pingpong-7c8bbcd9bc   1       1       1     10m
That replica set created a pod, pod/pingpong-xxxxxxxxxx-yyyyy
NAME                            READY STATUS  RESTARTS AGE
pod/pingpong-7c8bbcd9bc-6c9qz   1/1   Running 0        10m
We will see later how these folks play together for:
Let's use the kubectl logs command
We will pass it either a pod name, or a type/name
(E.g. if we specify a deployment or replica set, it will show the first pod in it)
Unless specified otherwise, it will only show logs of the first container in the pod
(Good thing there's only one in ours!)
View the result of our ping command:
kubectl logs deploy/pingpong
Just like docker logs, kubectl logs supports convenient options:
-f/--follow to stream logs in real time (à la tail -f)
--tail to indicate how many lines you want to see (from the end)
--since to show logs only after a given timestamp
View the latest logs of our ping command:
kubectl logs deploy/pingpong --tail 1 --follow
Scaling with kubectl scale
Scale our pingpong deployment:
kubectl scale deploy/pingpong --replicas 3
Note that this other command does exactly the same thing:
kubectl scale deployment pingpong --replicas 3
Note: what if we tried to scale replicaset.apps/pingpong-xxxxxxxxxx?
We could! But the deployment would notice it right away, and scale it back to the initial level.
The pingpong deployment watches its replica set
The replica set ensures that the right number of pods are running
What happens if pods disappear unexpectedly?
In a separate window, watch the pods:
kubectl get pods -w
Destroy a pod:
kubectl delete pod pingpong-xxxxxxxxxx-yyyyy
What if we wanted to start a "one-shot" container that doesn't get restarted?
We could use kubectl run --restart=OnFailure or kubectl run --restart=Never
These commands would create jobs or pods instead of deployments
Under the hood, kubectl run invokes "generators" to create resource descriptions
We could also write these resource descriptions ourselves (typically in YAML),
and create them on the cluster with kubectl apply -f (as we will see later)
With kubectl run --schedule=..., we can also create cronjobs
As we saw on the previous slides, kubectl run can do many things
The exact type of resource created is not obvious
To make things more explicit, it is better to use kubectl create:
kubectl create deployment to create a deployment
kubectl create job to create a job
kubectl create cronjob to run a job periodically (since Kubernetes 1.14)
Eventually, kubectl run will only be used to start one-shot pods
To create other resources, use kubectl create <resource>,
or kubectl create -f foo.yaml / kubectl apply -f foo.yaml
(A few hedged examples of these commands follow below.)
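As a hedged illustration of those more explicit commands (the names, image, and schedule below are arbitrary examples, not part of the original exercises):
# Create a deployment
kubectl create deployment web --image=nginx

# Create a one-shot job (everything after "--" is the command to run)
kubectl create job hello --image=alpine -- echo hello

# Create a cronjob running every minute (Kubernetes 1.14 and later)
kubectl create cronjob hello --image=alpine --schedule="*/1 * * * *" -- echo hello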
When we specify a deployment name, only one single pod's logs are shown
We can view the logs of multiple pods by adding a selector
A selector is a logic expression using labels
Conveniently, when we run kubectl run somename, the associated objects get a run=somename label
View the last log line of all pods with the run=pingpong label:
kubectl logs -l run=pingpong --tail 1
What if we want to stream the logs of all our pingpong pods?
Combine the -l and -f flags:
kubectl logs -l run=pingpong --tail 1 -f
Note: combining -l and -f is only possible since Kubernetes 1.14!
Let's try to understand why ...
Scale up our deployment:
kubectl scale deployment pingpong --replicas=8
Stream the logs:
kubectl logs -l run=pingpong --tail 1 -f
We should see a message like this one:
error: you are attempting to follow 8 log streams,
but maximum allowed concurrency is 5,
use --max-log-requests to increase the limit
kubectl opens one connection to the API server per pod
For each pod, the API server opens one extra connection to the corresponding kubelet
If there are 1000 pods in our deployment, that's 1000 inbound connections + 1000 connections to the API server
This could easily overload the API server
Prior to Kubernetes 1.14, it was decided not to allow multiple connections
Starting with 1.14, multiple connections are allowed, but capped to 5
(this can be changed with --max-log-requests)
For more details about the rationale, see PR #67573
Shortcomings of kubectl logs
We don't see which pod sent which log line
If pods are restarted / replaced, the log stream stops
If new pods are added, we don't see their logs
To stream the logs of multiple pods, we need to write a selector
There are external tools that address these shortcomings
(e.g.: Stern)
kubectl logs -l ... --tail N
If we run this command with Kubernetes 1.12, it prints multiple lines
This is a regression when --tail is used together with -l/--selector
It always shows the last 10 lines of output for each container
(instead of the number of lines specified on the command line)
The problem was fixed in Kubernetes 1.13
See #70554 for details.
If you think about it, it's a good question!
However, don't worry:
APNIC's research group held the IP addresses 1.1.1.1 and 1.0.0.1. While the addresses were valid, so many people had entered them into various random systems that they were continuously overwhelmed by a flood of garbage traffic. APNIC wanted to study this garbage traffic, but every time they tried to announce the IPs, the flood would overwhelm any conventional network.
It is very unlikely that our combined pings manage to produce even a modest blip at Cloudflare's NOC!
They say, "a picture is worth one thousand words."
The following 19 slides show what really happens when we run:
kubectl run web --image=nginx --replicas=3
Exposing containers
(automatically generated title slide)
kubectl expose creates a service for existing pods
A service is a stable address for a pod (or a bunch of pods)
If we want to connect to our pods, we need to create a service
Once a service is created, CoreDNS will allow us to resolve it by name
(i.e. after creating the service hello, the name hello will resolve to something)
There are different types of services, detailed on the following slides:
ClusterIP, NodePort, LoadBalancer, ExternalName
ClusterIP (default type): a virtual IP address is allocated for the service, reachable only from inside the cluster
NodePort: a port is allocated for the service and exposed on all nodes
These service types are always available.
Under the hood: kube-proxy uses a userland proxy and a bunch of iptables rules.
LoadBalancer: an external load balancer is provisioned for the service (a NodePort service is created, and the load balancer sends traffic to that port)
ExternalName: the DNS entry managed by CoreDNS will just be a CNAME to a provided record
Since ping doesn't have anywhere to connect to, we'll run something else instead
We could use the official nginx image, but...
... we wouldn't be able to tell one backend from another!
We are going to use jpetazzo/httpenv, a tiny HTTP server written in Go
jpetazzo/httpenv listens on port 8888
It serves its environment variables in JSON format
The environment variables will include HOSTNAME, which will be the pod name
(and therefore will be different on each backend)
We could run kubectl run httpenv --image=jpetazzo/httpenv ...
But since kubectl run is being deprecated, let's see how to use kubectl create instead
In another window, watch the pods (to see when they are created):
kubectl get pods -w
Create a deployment for this very lightweight HTTP server:
kubectl create deployment httpenv --image=jpetazzo/httpenv
Scale the deployment to 10 replicas:
kubectl scale deployment httpenv --replicas=10
We will create a default ClusterIP service
Expose the HTTP port of our server:
kubectl expose deployment httpenv --port 8888
Look up which IP address was allocated:
kubectl get service
We can assign IP addresses to services, but they are still layer 4
(i.e. a service is not just an IP address; it's an IP address + protocol + port)
This is caused by the current implementation of kube-proxy
(it relies on mechanisms that don't support layer 3)
As a result: you have to indicate the port number for your service
Running services with arbitrary ports (or port ranges) requires hacks
(such as host networking mode)
Get the IP address that was allocated for our service:
IP=$(kubectl get svc httpenv -o go-template --template '{{ .spec.clusterIP }}')
Send a few requests:
curl http://$IP:8888/
Too much output? Filter it with jq:
curl -s http://$IP:8888/ | jq .HOSTNAME
Try it a few times! Our requests are load balanced across multiple pods.
Sometimes, we want to access our services directly:
if we want to save a tiny little bit of latency (typically less than 1ms)
if we need to connect over arbitrary ports (instead of a few fixed ones)
if we need to communicate over another protocol than UDP or TCP
if we want to decide how to balance the requests client-side
...
In that case, we can use a "headless service"
A headless service is obtained by setting the clusterIP field to None
(Either with --cluster-ip=None, or by providing a custom YAML)
Since there is no virtual IP address, there is no load balancer either
CoreDNS will return the pods' IP addresses as multiple A records
This gives us an easy way to discover all the replicas of a deployment.
(A minimal headless service sketch follows below.)
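Here is a hedged sketch of what such a YAML could look like for the httpenv deployment used in these exercises; it assumes the app=httpenv label that kubectl create deployment sets, and the service name is arbitrary:
apiVersion: v1
kind: Service
metadata:
  name: httpenv-headless    # arbitrary example name
spec:
  clusterIP: None           # this is what makes the service "headless"
  selector:
    app: httpenv            # matches the pods created by the httpenv deployment
  ports:
  - port: 8888              # httpenv listens on port 8888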
A service has a number of "endpoints"
Each endpoint is a host + port combination where the service is available
The endpoints are maintained and updated automatically by Kubernetes
Check the endpoints associated with our httpenv service:
kubectl describe service httpenv
In the output, there will be a line starting with Endpoints:
That line lists a bunch of addresses in host:port format.
When there are many endpoints, the display commands truncate the list
kubectl get endpoints
To see the full list, we can use one of the following commands:
kubectl describe endpoints httpenv
kubectl get endpoints httpenv -o yaml
These commands will show us a list of IP addresses
These IP addresses should match the addresses of the corresponding pods:
kubectl get pods -l app=httpenv -o wide
endpoints, not endpoint
endpoints is the only resource that cannot be written in singular form
$ kubectl get endpoint
error: the server doesn't have a resource type "endpoint"
This is because the type itself is plural (unlike every other resource)
There is no endpoint object: type Endpoints struct
The type doesn't represent a single endpoint, but a list of endpoints
The default type (ClusterIP) only works for internal traffic
If we want to accept external traffic, we can use one of these:
NodePort (expose a service on a TCP port between 30000 and 32768)
LoadBalancer (if our cloud provider supports it)
ExternalIP (use the external IP address of a node)
Ingress (a special mechanism for HTTP services)
We will cover NodePorts and Ingresses in more detail later.
Shipping images with a registry
(automatically generated title slide)
Initially, our app was running on a single node
We could build and run in the same place
Therefore, we did not need to ship anything
Now that we want to run on a cluster, things are different
The easiest way to ship container images is to use a registry
What happens when we execute docker run alpine?
If the Engine needs to pull the alpine image, it expands it into library/alpine
library/alpine is expanded into index.docker.io/library/alpine
The Engine communicates with index.docker.io to retrieve library/alpine:latest
To use something else than index.docker.io, we specify it in the image name
Examples:
docker pull gcr.io/google-containers/alpine-with-bash:1.0
docker build -t registry.mycompany.io:5000/myimage:awesome .
docker push registry.mycompany.io:5000/myimage:awesome
Create one deployment for each component
(hasher, redis, rng, webui, worker)
Expose deployments that need to accept connections
(hasher, redis, rng, webui)
For redis, we can use the official redis image
For the 4 others, we need to build images and push them to some registry
There are many options!
Manually:
build locally (with docker build
or otherwise)
push to the registry
Automatically:
build and test locally
when ready, commit and push a code repository
the code repository notifies an automated build system
that system gets the code, builds it, pushes the image to the registry
There are SAAS products like Docker Hub, Quay ...
Each major cloud provider has an option as well
(ACR on Azure, ECR on AWS, GCR on Google Cloud...)
There are also commercial products to run our own registry
(Docker EE, Quay...)
And open source options, too!
When picking a registry, pay attention to its build system
(when it has one)
For everyone's convenience, we took care of building DockerCoins images
We pushed these images to the DockerHub, under the dockercoins user
These images are tagged with a version number, v0.1
The full image names are therefore:
dockercoins/hasher:v0.1
dockercoins/rng:v0.1
dockercoins/webui:v0.1
dockercoins/worker:v0.1
$REGISTRY and $TAG
In the upcoming exercises and labs, we use a couple of environment variables:
$REGISTRY as a prefix to all image names
$TAG as the image version tag
For example, the worker image is $REGISTRY/worker:$TAG
If you copy-paste the commands in these exercises:
make sure that you set $REGISTRY and $TAG first!
For example:
export REGISTRY=dockercoins TAG=v0.1
(this will expand $REGISTRY/worker:$TAG to dockercoins/worker:v0.1)
Running our application on Kubernetes
(automatically generated title slide)
Deploy redis:
kubectl create deployment redis --image=redis
Deploy everything else:
set -u
for SERVICE in hasher rng webui worker; do
  kubectl create deployment $SERVICE --image=$REGISTRY/$SERVICE:$TAG
done
After waiting for the deployment to complete, let's look at the logs!
(Hint: use kubectl get deploy -w to watch deployment events)
kubectl logs deploy/rng
kubectl logs deploy/worker
🤔 rng is fine ... But not worker.
💡 Oh right! We forgot to expose.
Three deployments need to be reachable by others: hasher, redis, rng
worker doesn't need to be exposed
webui will be dealt with later
kubectl expose deployment redis --port 6379
kubectl expose deployment rng --port 80
kubectl expose deployment hasher --port 80
worker has an infinite loop, that retries 10 seconds after an error
Stream the worker's logs:
kubectl logs deploy/worker --follow
(Give it about 10 seconds to recover)
We should now see the worker, well, working happily.
Now we would like to access the Web UI
We will expose it with a NodePort
(just like we did for the registry)
Create a NodePort service for the Web UI:
kubectl expose deploy/webui --type=NodePort --port=80
Check the port that was allocated:
kubectl get svc
Yes, this may take a little while to update. (Narrator: it was DNS.)
Alright, we're back to where we started, when we were running on a single node!
Accessing the API with kubectl proxy
(automatically generated title slide)
kubectl proxy
The API requires us to authenticate¹
There are many authentication methods available, including:
TLS client certificates
(that's what we've used so far)
HTTP basic password authentication
(from a static file; not recommended)
various token mechanisms
(detailed in the documentation)
¹OK, we lied. If you don't authenticate, you are considered to be user system:anonymous, which doesn't have any access rights by default.
curl
Retrieve the ClusterIP allocated to the kubernetes service:
kubectl get svc kubernetes
Replace the IP below and try to connect with curl:
curl -k https://10.96.0.1/
The API will tell us that user system:anonymous cannot access this path.
If we wanted to talk to the API, we would need to:
extract our TLS key and certificate information from ~/.kube/config
(the information is in PEM format, encoded in base64)
use that information to present our certificate when connecting
(for instance, with openssl s_client -key ... -cert ... -connect ...
)
figure out exactly which credentials to use
(once we start juggling multiple clusters)
change that whole process if we're using another authentication method
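For the curious, here is a rough, untested sketch of that process (it assumes the first user entry in ~/.kube/config holds our client certificate and key, as generated by kubeadm):
kubectl config view --raw -o json \
  | jq -r '.users[0].user["client-certificate-data"]' | openssl base64 -d -A > /tmp/client.crt
kubectl config view --raw -o json \
  | jq -r '.users[0].user["client-key-data"]' | openssl base64 -d -A > /tmp/client.key
curl -k --cert /tmp/client.crt --key /tmp/client.key https://10.96.0.1/api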
🤔 There has to be a better way!
kubectl proxy for authentication
kubectl proxy runs a proxy in the foreground
This proxy lets us access the Kubernetes API without authentication
(kubectl proxy
adds our credentials on the fly to the requests)
This proxy lets us access the Kubernetes API over plain HTTP
This is a great tool to learn and experiment with the Kubernetes API
... And for serious uses as well (suitable for one-shot scripts)
For unattended use, it's better to create a service account
Trying kubectl proxy
Let's start kubectl proxy and then do a simple request with curl!
Start kubectl proxy in the background:
kubectl proxy &
Access the API's default route:
curl localhost:8001
Terminate the proxy:
kill %1
The output is a list of available API routes.
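While the proxy is running, any API route works the same way; for instance, this sketch lists the pods of the default namespace:
kubectl proxy &
curl localhost:8001/api/v1/namespaces/default/pods
kill %1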
kubectl proxy is intended for local use
By default, the proxy listens on port 8001
(But this can be changed, or we can tell kubectl proxy
to pick a port)
By default, the proxy binds to 127.0.0.1
(Making it unreachable from other machines, for security reasons)
By default, the proxy only accepts connections from:
^localhost$,^127\.0\.0\.1$,^\[::1\]$
This is great when running kubectl proxy
locally
Not-so-great when you want to connect to the proxy from a remote machine
kubectl proxy on a remote machine
If we wanted to connect to the proxy from another machine, we would need to:
bind to INADDR_ANY
instead of 127.0.0.1
accept connections from any address
This is achieved with:
kubectl proxy --port=8888 --address=0.0.0.0 --accept-hosts=.*
Do not do this on a real cluster: it opens full unauthenticated access!
Running kubectl proxy
openly is a huge security risk
It is slightly better to run the proxy where you need it
(and copy credentials, e.g. ~/.kube/config
, to that place)
It is even better to use a limited account with reduced permissions
kubectl proxy
also gives access to all internal services
Specifically, services are exposed as such:
/api/v1/namespaces/<namespace>/services/<service>/proxy
We can use kubectl proxy
to access an internal service in a pinch
(or, for non HTTP services, kubectl port-forward
)
This is not very useful when running kubectl
directly on the cluster
(since we could connect to the services directly anyway)
But it is very powerful as soon as you run kubectl
from a remote machine
Controlling the cluster remotely
(automatically generated title slide)
All the operations that we do with kubectl
can be done remotely
In this section, we are going to use kubectl
from our local machine
The exercises in this chapter should be done on your local machine.
kubectl
is officially available on Linux, macOS, Windows
(and unofficially anywhere we can build and run Go binaries)
You may skip these exercises if you are following along from:
a tablet or phone
a web-based terminal
an environment where you can't install and run new binaries
Installing kubectl
If you already have kubectl on your local machine, you can skip this.
Note: if you are following along with a different platform (e.g. Linux on an architecture different from amd64, or with a phone or tablet), installing kubectl might be more complicated (or even impossible), so feel free to skip this section.
Testing kubectl
Check that kubectl works correctly
(before even trying to connect to a remote cluster!)
Ask kubectl to show its version number:
kubectl version --client
The output should look like this:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0",GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean",BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc",Platform:"linux/amd64"}
~/.kube/config
If you already have a ~/.kube/config
file, rename it
(we are going to overwrite it in the following slides!)
If you never used kubectl
on your machine before: nothing to do!
Make a copy of ~/.kube/config
; if you are using macOS or Linux, you can do:
cp ~/.kube/config ~/.kube/config.before.training
If you are using Windows, you will need to adapt this command
node1
The ~/.kube/config
file that is on node1
contains all the credentials we need
Let's copy it over!
Copy the file from node1
; if you are using macOS or Linux, you can do:
scp USER@X.X.X.X:.kube/config ~/.kube/config
# Make sure to replace X.X.X.X with the IP address of node1,
# and USER with the user name used to log into node1!
If you are using Windows, adapt these instructions to your SSH client
There is a good chance that we need to update the server address
To know if it is necessary, run kubectl config view
Look for the server:
address:
if it matches the public IP address of node1
, you're good!
if it is anything else (especially a private IP address), update it!
To update the server address, run:
kubectl config set-cluster kubernetes --server=https://X.X.X.X:6443
# Make sure to replace X.X.X.X with the IP address of node1!
Generally, the Kubernetes API uses a certificate that is valid for:
kubernetes
kubernetes.default
kubernetes.default.svc
kubernetes.default.svc.cluster.local
the ClusterIP address of the kubernetes service
the hostname and IP address of the node hosting the control plane (e.g. node1)
On most clouds, the IP address of the node is an internal IP address
... And we are going to connect over the external IP address
... And that external IP address was not used when creating the certificate!
We need to tell kubectl
to skip TLS verification
(only do this with testing clusters, never in production!)
The following command will do the trick:
kubectl config set-cluster kubernetes --insecure-skip-tls-verify
Check the versions of the local client and remote server:
kubectl version
View the nodes of the cluster:
kubectl get nodes
We can now utilize the cluster exactly as we did before, except that it's remote.
Accessing internal services
(automatically generated title slide)
When we are logged in on a cluster node, we can access internal services
(by virtue of the Kubernetes network model: all nodes can reach all pods and services)
When we are accessing a remote cluster, things are different
(generally, our local machine won't have access to the cluster's internal subnet)
How can we temporarily access a service without exposing it to everyone?
kubectl proxy
: gives us access to the API, which includes a proxy for HTTP resources
kubectl port-forward
: allows forwarding of TCP ports to arbitrary pods, services, ...
The exercises in this section assume that we have set up kubectl
on our
local machine in order to access a remote cluster.
We will therefore show how to access services and pods of the remote cluster, from our local machine.
You can also run these exercises directly on the cluster (if you haven't
installed and set up kubectl
locally).
Running commands locally will be less useful
(since you could access services and pods directly),
but keep in mind that these commands will work anywhere as long as you have
installed and set up kubectl
to communicate with your cluster.
kubectl proxy in theory
Running kubectl proxy gives us access to the entire Kubernetes API
The API includes routes to proxy HTTP traffic
These routes look like the following:
/api/v1/namespaces/<namespace>/services/<service>/proxy
We just add the URI to the end of the request, for instance:
/api/v1/namespaces/<namespace>/services/<service>/proxy/index.html
We can access services
and pods
this way
kubectl proxy in practice
Let's access the webui service through kubectl proxy
Run an API proxy in the background:
kubectl proxy &
Access the webui
service:
curl localhost:8001/api/v1/namespaces/default/services/webui/proxy/index.html
Terminate the proxy:
kill %1
kubectl port-forward in theory
What if we want to access a TCP service?
We can use kubectl port-forward
instead
It will create a TCP relay to forward connections to a specific port
(of a pod, service, deployment...)
The syntax is:
kubectl port-forward service/name_of_service local_port:remote_port
If only one port number is specified, it is used for both local and remote ports
kubectl port-forward in practice
Forward connections from local port 10000 to remote port 6379:
kubectl port-forward svc/redis 10000:6379 &
Connect to the Redis server:
telnet localhost 10000
Issue a few commands, e.g. INFO server
then QUIT
kill %1
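If you happen to have redis-cli installed locally, you can use it instead of telnet (a sketch):
kubectl port-forward svc/redis 10000:6379 &
redis-cli -p 10000 INFO server
kill %1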
The Kubernetes dashboard
(automatically generated title slide)
Kubernetes resources can also be viewed with a web dashboard
That dashboard is usually exposed over HTTPS
(this requires obtaining a proper TLS certificate)
Dashboard users need to authenticate
We are going to take a dangerous shortcut
We could (and should) use Let's Encrypt ...
... but we don't want to deal with TLS certificates
We could (and should) learn how authentication and authorization work ...
... but we will use a guest account with admin access instead
Yes, this will open our cluster to all kinds of shenanigans. Don't do this at home.
We are going to deploy that dashboard with one single command
This command will create all the necessary resources
(the dashboard itself, the HTTP wrapper, the admin/guest account)
All these resources are defined in a YAML file
All we have to do is load that YAML file with kubectl apply -f
kubectl apply -f ~/container.training/k8s/insecure-dashboard.yaml
kubectl get svc dashboard
You'll want the 3xxxx
port.
The dashboard will then ask you which authentication you want to use.
We have three authentication options at this point:
token (associated with a role that has appropriate permissions)
kubeconfig (e.g. using the ~/.kube/config
file from node1
)
"skip" (use the dashboard "service account")
Let's use "skip": we're logged in!
By the way, we just added a backdoor to our Kubernetes cluster!
The steps that we just showed you are for educational purposes only!
If you do that on your production cluster, people can and will abuse it
For an in-depth discussion about securing the dashboard,
check this excellent post on Heptio's blog
Security implications of kubectl apply
(automatically generated title slide)
kubectl apply
When we do kubectl apply -f <URL>
, we create arbitrary resources
Resources can be evil; imagine a deployment
that ...
starts bitcoin miners on the whole cluster
hides in a non-default namespace
bind-mounts our nodes' filesystem
inserts SSH keys in the root account (on the node)
encrypts our data and ransoms it
☠️☠️☠️
kubectl apply
is the new curl | sh
curl | sh
is convenient
It's safe if you use HTTPS URLs from trusted sources
kubectl apply -f
is convenient
It's safe if you use HTTPS URLs from trusted sources
Example: the official setup instructions for most pod networks
It introduces new failure modes
(for instance, if you try to apply YAML from a link that's no longer valid)
Scaling our demo app
(automatically generated title slide)
Our ultimate goal is to get more DockerCoins
(i.e. increase the number of loops per second shown on the web UI)
Let's look at the architecture again:
The loop is done in the worker; perhaps we could try adding more workers?
Scaling the worker Deployment
In separate terminals, watch pods and deployments:
kubectl get pods -w
kubectl get deployments -w
Now, scale up the number of worker replicas:
kubectl scale deployment worker --replicas=2
After a few seconds, the graph in the web UI should show up.
Scale the worker Deployment further:
kubectl scale deployment worker --replicas=3
The graph in the web UI should go up again.
(This is looking great! We're gonna be RICH!)
Scale the worker Deployment to a bigger number:
kubectl scale deployment worker --replicas=10
The graph will peak at 10 hashes/second.
(We can add as many workers as we want: we will never go past 10 hashes/second.)
It may look like it, because the web UI shows instant speed
The instant speed can briefly exceed 10 hashes/second
The average speed cannot
The instant speed can be biased because of how it's computed
The instant speed is computed client-side by the web UI
The web UI checks the hash counter once per second
(and does a classic (h2-h1)/(t2-t1) speed computation)
The counter is updated once per second by the workers
These timings are not exact
(e.g. the web UI check interval is client-side JavaScript)
Sometimes, between two web UI counter measurements,
the workers are able to update the counter twice
During that cycle, the instant speed will appear to be much bigger
(but it will be compensated by lower instant speed before and after)
If this was high-quality, production code, we would have instrumentation
(Datadog, Honeycomb, New Relic, statsd, Sumologic, ...)
It's not!
Perhaps we could benchmark our web services?
(with tools like ab
, or even simpler, httping
)
We want to check hasher
and rng
We are going to use httping
It's just like ping
, but using HTTP GET
requests
(it measures how long it takes to perform one GET
request)
It's used like this:
httping [-c count] http://host:port/path
Or even simpler:
httping ip.ad.dr.ess
We will use httping
on the ClusterIP addresses of our services
We can simply check the output of kubectl get services
Or do it programmatically, as in the example below
HASHER=$(kubectl get svc hasher -o go-template={{.spec.clusterIP}})
RNG=$(kubectl get svc rng -o go-template={{.spec.clusterIP}})
Now we can access the IP addresses of our services through $HASHER
and $RNG
.
Check the response times of hasher and rng:
httping -c 3 $HASHER
httping -c 3 $RNG
hasher
is fine (it should take a few milliseconds to reply)
rng
is not (it should take about 700 milliseconds if there are 10 workers)
Something is wrong with rng
, but ... what?
The bottleneck seems to be rng.
What if, by any chance, we didn't have enough entropy and couldn't generate enough random numbers?
We need to scale the rng service across multiple machines!
Note: this is a fiction! We have plenty of entropy. But we need a pretext to scale out.
(In reality, the rng code uses /dev/urandom, which never runs out of entropy...
...and which is just as good as /dev/random.)
Daemon sets
(automatically generated title slide)
We want to scale rng
in a way that is different from how we scaled worker
We want one (and exactly one) instance of rng
per node
What if we just scale up deploy/rng
to the number of nodes?
nothing guarantees that the rng
containers will be distributed evenly
if we add nodes later, they will not automatically run a copy of rng
if we remove (or reboot) a node, one rng
container will restart elsewhere
Instead of a deployment
, we will use a daemonset
Daemon sets are great for cluster-wide, per-node processes:
kube-proxy
weave
(our overlay network)
monitoring agents
hardware management tools (e.g. SCSI/FC HBA agents)
etc.
They can also be restricted to run only on some nodes
Unfortunately, as of Kubernetes 1.14, the CLI cannot create daemon sets
More precisely: it doesn't have a subcommand to create a daemon set
But any kind of resource can always be created by providing a YAML description:
kubectl apply -f foo.yaml
How do we create the YAML file for our daemon set?
option 1: read the docs
option 2: vi
our way out of it
Dump the rng resource in YAML:
kubectl get deploy/rng -o yaml > rng.yml
Edit rng.yml
What if we just changed the kind
field?
(It can't be that easy, right?)
Change kind: Deployment to kind: DaemonSet
Save, quit
Try to create our new resource:
kubectl apply -f rng.yml
We all knew this couldn't be that easy, right!
error validating data:
[ValidationError(DaemonSet.spec): unknown field "replicas" in io.k8s.api.extensions.v1beta1.DaemonSetSpec, ...]
Obviously, it doesn't make sense to specify a number of replicas for a daemon set
Workaround: fix the YAML
remove the replicas field
remove the strategy field (which defines the rollout mechanism for a deployment)
remove the progressDeadlineSeconds field (also used by the rollout mechanism)
remove the status: {} line at the end
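After those edits, the remaining YAML should look roughly like this (a sketch; your dump will contain more fields, and possibly a different apiVersion):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rng
  labels:
    app: rng
spec:
  selector:
    matchLabels:
      app: rng
  template:
    metadata:
      labels:
        app: rng
    spec:
      containers:
      - name: rng
        image: dockercoins/rng:v0.1
        ports:
        - containerPort: 80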
Or, we could also ...
Use the --force, Luke
We could also tell Kubernetes to ignore these errors and try anyway
The --force
flag's actual name is --validate=false
kubectl apply -f rng.yml --validate=false
🎩✨🐇
Wait ... Now, can it be that easy?
Did we transform our deployment into a daemonset?
kubectl get all
We have two resources called rng
:
the deployment that was existing before
the daemon set that we just created
We also have one too many pods.
(The pod corresponding to the deployment still exists.)
deploy/rng
and ds/rng
You can have different resource types with the same name
(i.e. a deployment and a daemon set both named rng
)
We still have the old rng
deployment
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rng   1         1         1            1           18m
But now we have the new rng
daemon set as well
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/rng   2         2         2       2            2           <none>          9s
If we check with kubectl get pods
, we see:
one pod for the deployment (named rng-xxxxxxxxxx-yyyyy
)
one pod per node for the daemon set (named rng-zzzzz
)
NAME                   READY   STATUS    RESTARTS   AGE
rng-54f57d4d49-7pt82   1/1     Running   0          11m
rng-b85tm              1/1     Running   0          25s
rng-hfbrr              1/1     Running   0          25s
[...]
The daemon set created one pod per node, except on the master node.
The master node has taints preventing pods from running there.
(To schedule a pod on this node anyway, the pod will require appropriate tolerations.)
(Off by one? We don't run these pods on the node hosting the control plane.)
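To see these taints for yourself, you can run something like this (node names may differ on your cluster):
kubectl describe node node1 | grep -i taint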
Look at the web UI
The graph should now go above 10 hashes per second!
It looks like the newly created pods are serving traffic correctly
How and why did this happen?
(We didn't do anything special to add them to the rng
service load balancer!)
Labels and selectors
(automatically generated title slide)
The rng
service is load balancing requests to a set of pods
That set of pods is defined by the selector of the rng
service
Check the rng service definition:
kubectl describe service rng
The selector is app=rng
It means "all the pods having the label app=rng
"
(They can have additional labels as well, that's OK!)
We can use selectors with many kubectl
commands
For instance, with kubectl get
, kubectl logs
, kubectl delete
... and more
List all the pods with the label app=rng:
kubectl get pods -l app=rng
kubectl get pods --selector app=rng
But ... why do these pods (in particular, the new ones) have this app=rng
label?
When we create a deployment with kubectl create deployment rng
,
this deployment gets the label app=rng
The replica sets created by this deployment also get the label app=rng
The pods created by these replica sets also get the label app=rng
When we created the daemon set from the deployment, we re-used the same spec
Therefore, the pods created by the daemon set get the same labels
Note: when we use kubectl run stuff
, the label is run=stuff
instead.
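You can verify these labels directly, for instance:
kubectl get pods --show-labels -l app=rng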
We would like to remove a pod from the load balancer
What would happen if we removed that pod, with kubectl delete pod ...
?
It would be re-created immediately (by the replica set or the daemon set)
What would happen if we removed the app=rng
label from that pod?
It would also be re-created immediately
Why?!?
The "mission" of a replica set is:
"Make sure that there is the right number of pods matching this spec!"
The "mission" of a daemon set is:
"Make sure that there is a pod matching this spec on each node!"
The "mission" of a replica set is:
"Make sure that there is the right number of pods matching this spec!"
The "mission" of a daemon set is:
"Make sure that there is a pod matching this spec on each node!"
In fact, replica sets and daemon sets do not check pod specifications
They merely have a selector, and they look for pods matching that selector
Yes, we can fool them by manually creating pods with the "right" labels
Bottom line: if we remove our app=rng
label ...
... The pod "disappears" for its parent, which re-creates another pod to replace it
Since both the rng
daemon set and the rng
replica set use app=rng
...
... Why don't they "find" each other's pods?
Replica sets have a more specific selector, visible with kubectl describe
(It looks like app=rng,pod-template-hash=abcd1234
)
Daemon sets also have a more specific selector, but it's invisible
(It looks like app=rng,controller-revision-hash=abcd1234
)
As a result, each controller only "sees" the pods it manages
Currently, the rng
service is defined by the app=rng
selector
The only way to remove a pod is to remove or change the app
label
... But that will cause another pod to be created instead!
What's the solution?
We need to change the selector of the rng
service!
Let's add another label to that selector (e.g. enabled=yes
)
If a selector specifies multiple labels, they are understood as a logical AND
(In other words: the pods must match all the labels)
Kubernetes has support for advanced, set-based selectors
(But these cannot be used with services, at least not yet!)
Add the label enabled=yes
to all our rng
pods
Update the selector for the rng
service to also include enabled=yes
Toggle traffic to a pod by manually adding/removing the enabled
label
Profit!
Note: if we swap steps 1 and 2, it will cause a short service disruption, because there will be a period of time during which the service selector won't match any pod. During that time, requests to the service will time out. By doing things in the order above, we guarantee that there won't be any interruption.
We want to add the label enabled=yes
to all pods that have app=rng
We could edit each pod one by one with kubectl edit
...
... Or we could use kubectl label
to label them all
kubectl label
can use selectors itself
Add enabled=yes to all pods that have app=rng:
kubectl label pods -l app=rng enabled=yes
We need to edit the service specification
Reminder: in the service definition, we will see app: rng
in two places
the label of the service itself (we don't need to touch that one)
the selector of the service (that's the one we want to change)
Add enabled: yes to its selector:
kubectl edit service rng
... And then we get the weirdest error ever. Why?
YAML parsers try to help us:
xyz
is the string "xyz"
42
is the integer 42
yes
is the boolean value true
If we want the string "42"
or the string "yes"
, we have to quote them
So we have to use enabled: "yes"
For a good laugh: if we had used "ja", "oui", "si" ... as the value, it would have worked!
enabled: "yes"
to its selector:kubectl edit service rng
This time it should work!
If we did everything correctly, the web UI shouldn't show any change.
We want to disable the pod that was created by the deployment
All we have to do, is remove the enabled
label from that pod
To identify that pod, we can use its name
... Or rely on the fact that it's the only one with a pod-template-hash
label
Good to know:
kubectl label ... foo=
doesn't remove a label (it sets it to an empty string)
to remove label foo
, use kubectl label ... foo-
to change an existing label, we would need to add --overwrite
In one window, check the logs of that pod:
POD=$(kubectl get pod -l app=rng,pod-template-hash -o name)
kubectl logs --tail 1 --follow $POD
(We should see a steady stream of HTTP logs)
In another window, remove the label from the pod:
kubectl label pod -l app=rng,pod-template-hash enabled-
(The stream of HTTP logs should stop immediately)
There might be a slight change in the web UI (since we removed a bit
of capacity from the rng
service). If we remove more pods,
the effect should be more visible.
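To put that pod back into rotation later, we could simply re-add the label (a sketch):
kubectl label pod -l app=rng,pod-template-hash enabled=yes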
If we scale up our cluster by adding new nodes, the daemon set will create more pods
These pods won't have the enabled=yes
label
If we want these pods to have that label, we need to edit the daemon set spec
We can do that with e.g. kubectl edit daemonset rng
Reminder: a daemon set is a resource that creates more resources!
There is a difference between:
the label(s) of a resource (in the metadata
block in the beginning)
the selector of a resource (in the spec
block)
the label(s) of the resource(s) created by the first resource (in the template
block)
We would need to update the selector and the template
(metadata labels are not mandatory)
The template must match the selector
(i.e. the resource will refuse to create resources that it will not select)
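Schematically, the three places look like this in a daemon set manifest (a simplified sketch):
metadata:
  labels:          # labels of the daemon set itself
    app: rng
spec:
  selector:        # which pods this daemon set manages
    matchLabels:
      app: rng
  template:
    metadata:
      labels:      # labels applied to the pods it creates (must match the selector)
        app: rng
        enabled: "yes"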
When a pod is misbehaving, we can delete it: another one will be recreated
But we can also change its labels
It will be removed from the load balancer (it won't receive traffic anymore)
Another pod will be recreated immediately
But the problematic pod is still here, and we can inspect and debug it
We can even re-add it to the rotation if necessary
(Very useful to troubleshoot intermittent and elusive bugs)
Conversely, we can add pods matching a service's selector
These pods will then receive requests and serve traffic
Examples:
one-shot pod with all debug flags enabled, to collect logs
pods created automatically, but added to rotation in a second step
(by setting their label accordingly)
This gives us building blocks for canary and blue/green deployments
Rolling updates
(automatically generated title slide)
By default (without rolling updates), when a scaled resource is updated:
new pods are created
old pods are terminated
... all at the same time
if something goes wrong, ¯\_(ツ)_/¯
With rolling updates, when a resource is updated, it happens progressively
Two parameters determine the pace of the rollout: maxUnavailable
and maxSurge
They can be specified in absolute number of pods, or percentage of the replicas
count
At any given time ...
there will always be at least replicas
-maxUnavailable
pods available
there will never be more than replicas
+maxSurge
pods in total
there will therefore be up to maxUnavailable
+maxSurge
pods being updated
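For example, with 10 replicas and both parameters at 25%: maxUnavailable rounds down to 2 (so at least 8 pods stay available), while maxSurge rounds up to 3 (so there can be up to 13 pods in total during the rollout).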
We have the possibility of rolling back to the previous version
(if the update fails or is unsatisfactory in any way)
Check the rollout parameters of our deployments with kubectl and jq:
kubectl get deploy -o json | jq ".items[] | {name:.metadata.name} + .spec.strategy.rollingUpdate"
As of Kubernetes 1.8, we can do rolling updates with:
deployments
, daemonsets
, statefulsets
Editing one of these resources will automatically result in a rolling update
Rolling updates can be monitored with the kubectl rollout
subcommand
Updating the worker service
Only run these commands if you have built and pushed DockerCoins to a local registry.
If you are using images from the Docker Hub (dockercoins/worker:v0.1
), skip this.
Go to the stacks
directory (~/container.training/stacks
)
Edit dockercoins/worker/worker.py
; update the first sleep
line to sleep 1 second
Build a new tag and push it to the registry:
#export REGISTRY=localhost:3xxxx
export TAG=v0.2
docker-compose -f dockercoins.yml build
docker-compose -f dockercoins.yml push
Rolling out the new worker service
In separate terminals, watch:
kubectl get pods -w
kubectl get replicasets -w
kubectl get deployments -w
Update worker either with kubectl edit, or by running:
kubectl set image deploy worker worker=$REGISTRY/worker:$TAG
That rollout should be pretty quick. What shows in the web UI?
At first, it looks like nothing is happening (the graph remains at the same level)
According to kubectl get deploy -w
, the deployment
was updated really quickly
But kubectl get pods -w
tells a different story
The old pods
are still here, and they stay in Terminating
state for a while
Eventually, they are terminated; and then the graph decreases significantly
This delay is due to the fact that our worker doesn't handle signals
Kubernetes sends a "polite" shutdown request to the worker, which ignores it
After a grace period, Kubernetes gets impatient and kills the container
(The grace period is 30 seconds, but can be changed if needed)
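That grace period can be tuned per pod; in a deployment, it would look roughly like this (a sketch):
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 5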
Update worker
by specifying a non-existent image:
export TAG=v0.3
kubectl set image deploy worker worker=$REGISTRY/worker:$TAG
Check what's going on:
kubectl rollout status deploy worker
Our rollout is stuck. However, the app is not dead.
(After a minute, it will stabilize to be 20-25% slower.)
Why is our app a bit slower?
Because MaxUnavailable=25%
... So the rollout terminated 2 replicas out of 10 available
Okay, but why do we see 5 new replicas being rolled out?
Because MaxSurge=25%
... So in addition to replacing 2 replicas, the rollout is also starting 3 more
It rounded down the number of MaxUnavailable pods conservatively,
but the total number of pods being rolled out is allowed to be 25+25=50%
We start with 10 pods running for the worker
deployment
Current settings: MaxUnavailable=25% and MaxSurge=25%
When we start the rollout, 2 old replicas are terminated (maxUnavailable=25% of 10, rounded down) and 5 new pods are created (2 replacements plus a surge of 3, since maxSurge=25% rounds up)
Now we have 8 old replicas up and running, and 5 new pods being deployed
Our rollout is stuck at this point!
If you didn't deploy the Kubernetes dashboard earlier, just skip this slide.
kubectl -n kube-system get svc socat
Note the 3xxxx
port.
We could push some v0.3
image
(the pod retry logic will eventually catch it and the rollout will proceed)
Or we could invoke a manual rollback
kubectl rollout undo deploy worker
kubectl rollout status deploy worker
We want to:
revert to v0.1
be conservative on availability (always have the desired number of workers available)
go slow on rollout speed (only start one pod at a time)
give some time to our workers to "warm up" before starting more
The corresponding changes can be expressed in the following YAML snippet:
spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
We could use kubectl edit deployment worker
But we could also use kubectl patch
with the exact YAML shown before
kubectl patch deployment worker -p "
spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
"
kubectl rollout status deployment worker
kubectl get deploy -o json worker | jq "{name:.metadata.name} + .spec.strategy.rollingUpdate"
Namespaces
(automatically generated title slide)
We would like to deploy another copy of DockerCoins on our cluster
We could rename all our deployments and services:
hasher → hasher2, redis → redis2, rng → rng2, etc.
That would require updating the code
There has to be a better way!
As hinted by the title of this section, we will use namespaces
We cannot have two resources with the same name
(or can we...?)
We cannot have two resources of the same kind with the same name
(but it's OK to have an rng
service, an rng
deployment, and an rng
daemon set)
We cannot have two resources of the same kind with the same name in the same namespace
(but it's OK to have e.g. two rng
services in different namespaces)
Except for resources that exist at the cluster scope
(these do not belong to a namespace)
For namespaced resources:
the tuple (kind, name, namespace) needs to be unique
For resources at the cluster scope:
the tuple (kind, name) needs to be unique
To see which resource types are namespaced (and which are not):
kubectl api-resources
If we deploy a cluster with kubeadm
, we have three or four namespaces:
default
(for our applications)
kube-system
(for the control plane)
kube-public
(contains one ConfigMap for cluster discovery)
kube-node-lease
(in Kubernetes 1.14 and later; contains Lease objects)
If we deploy differently, we may have different namespaces
We can use kubectl create namespace
:
kubectl create namespace blue
Or we can construct a very minimal YAML snippet:
kubectl apply -f- <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: blue
EOF
We can pass a -n
or --namespace
flag to most kubectl
commands:
kubectl -n blue get svc
We can also change our current context
A context is a (user, cluster, namespace) tuple
We can manipulate contexts with the kubectl config
command
kubectl config get-contexts
The current context (the only one!) is tagged with a *
What are NAME, CLUSTER, AUTHINFO, and NAMESPACE?
NAME is an arbitrary string to identify the context
CLUSTER is a reference to a cluster
(i.e. API endpoint URL, and optional certificate)
AUTHINFO is a reference to the authentication information to use
(i.e. a TLS client certificate, token, or otherwise)
NAMESPACE is the namespace
(empty string = default
)
We want to use a different namespace
Solution 1: update the current context
This is appropriate if we need to change just one thing (e.g. namespace or authentication).
Solution 2: create a new context and switch to it
This is appropriate if we need to change multiple things and switch back and forth.
Let's go with solution 1!
This is done through kubectl config set-context
We can update a context by passing its name, or the current context with --current
Update the current context to use the blue
namespace:
kubectl config set-context --current --namespace=blue
Check the result:
kubectl config get-contexts
Verify that we are in the new namespace (it should be empty so far):
kubectl get all
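If we preferred solution 2 (a dedicated context), it could look like this (a sketch; the cluster and user names below match a typical kubeadm setup and may differ on yours):
kubectl config set-context blue --cluster=kubernetes --user=kubernetes-admin --namespace=blue
kubectl config use-context blue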
The jpetazzo/kubercoins repository contains everything we need!
Clone the kubercoins repository:
cd ~
git clone https://github.com/jpetazzo/kubercoins
Create all the DockerCoins resources:
kubectl create -f kubercoins
If the argument behind -f
is a directory, all the files in that directory are processed.
The subdirectories are not processed, unless we also add the -R
flag.
Retrieve the port number allocated to the webui
service:
kubectl get svc webui
Point our browser to http://X.X.X.X:3xxxx
If the graph shows up but stays at zero, give it a minute or two!
Namespaces do not provide isolation
A pod in the green
namespace can communicate with a pod in the blue
namespace
A pod in the default
namespace can communicate with a pod in the kube-system
namespace
CoreDNS uses a different subdomain for each namespace
Example: from any pod in the cluster, you can connect to the Kubernetes API with:
https://kubernetes.default.svc.cluster.local:443/
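More generally, a service named redis in the blue namespace can be reached from any pod as redis.blue, or fully qualified as redis.blue.svc.cluster.local.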
Actual isolation is implemented with network policies
Network policies are resources (like deployments, services, namespaces...)
Network policies specify which flows are allowed:
between pods
from pods to the outside world
and vice-versa
Switch back out of the blue namespace:
kubectl config set-context --current --namespace=
Note: we could have used --namespace=default
for the same result.
We can also use a little helper tool called kubens
:
# Switch to namespace foo
kubens foo
# Switch back to the previous namespace
kubens -
On our clusters, kubens
is called kns
instead
(so that it's even fewer keystrokes to switch namespaces)
kubens
and kubectx
With kubens
, we can switch quickly between namespaces
With kubectx
, we can switch quickly between contexts
Both tools are simple shell scripts available from https://github.com/ahmetb/kubectx
On our clusters, they are installed as kns
and kctx
(for brevity and to avoid completion clashes between kubectx
and kubectl
)
kube-ps1
It's easy to lose track of our current cluster / context / namespace
kube-ps1
makes it easy to track these, by showing them in our shell prompt
It's a simple shell script available from https://github.com/jonmosco/kube-ps1
On our clusters, kube-ps1
is installed and included in PS1
:
[123.45.67.89] (kubernetes-admin@kubernetes:default) docker@node1 ~
(The highlighted part is context:namespace
, managed by kube-ps1
)
Highly recommended if you work across multiple contexts or namespaces!
Kustomize
(automatically generated title slide)
Kustomize lets us transform YAML files representing Kubernetes resources
The original YAML files are valid resource files
(e.g. they can be loaded with kubectl apply -f
)
They are left untouched by Kustomize
Kustomize lets us define overlays that extend or change the resource files
Helm charts use placeholders {{ like.this }}
Kustomize "bases" are standard Kubernetes YAML
It is possible to use an existing set of YAML as a Kustomize base
As a result, writing a Helm chart is more work ...
... But Helm charts are also more powerful; e.g. they can:
use flags to conditionally include resources or blocks
check if a given Kubernetes API group is supported
Kustomize needs a kustomization.yaml
file
That file can be a base or a variant
If it's a base: it lists the YAML resource files to use
If it's a variant (or overlay):
it refers to (at least) one base
and some patches
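For illustration, a variant's kustomization.yaml could look like this (a sketch; the file names and paths are made up):
# kustomization.yaml (overlay)
bases:
- ../../base
patchesStrategicMerge:
- worker-replicas.yaml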
We are going to use Replicated Ship to experiment with Kustomize
The Replicated Ship CLI has been installed on our clusters
Replicated Ship has multiple workflows; here is what we will do:
initialize a Kustomize overlay from a remote GitHub repository
customize some values using the web UI provided by Ship
look at the resulting files and apply them to the cluster
We need to run ship init
in a new directory
ship init
requires a URL to a remote repository containing Kubernetes YAML
It will clone that repository and start a web UI
Later, it can watch that repository and/or update from it
We will use the jpetazzo/kubercoins repository
(it contains all the DockerCoins resources as YAML files)
ship init
Change to a new directory:
mkdir ~/kustomcoins
cd ~/kustomcoins
Run ship init
with the kustomcoins repository:
ship init https://github.com/jpetazzo/kubercoins
ship init
tells us to connect on localhost:8800
We need to replace localhost
with the address of our node
(since we run on a remote machine)
Follow the steps in the web UI, and change one parameter
(e.g. set the number of replicas in the worker Deployment)
Complete the web workflow, and go back to the CLI
Look at the content of our directory
base
contains the kubercoins repository + a kustomization.yaml
file
overlays/ship
contains the Kustomize overlay referencing the base + our patch(es)
rendered.yaml
is a YAML bundle containing the patched application
.ship
contains a state file used by Ship
We can kubectl apply -f rendered.yaml
(on any version of Kubernetes)
Starting with Kubernetes 1.14, we can apply the overlay directly with:
kubectl apply -k overlays/ship
But let's not do that for now!
We will create a new copy of DockerCoins in another namespace
Create a new namespace:
kubectl create namespace kustomcoins
Deploy DockerCoins:
kubectl apply -f rendered.yaml --namespace=kustomcoins
Or, with Kubernetes 1.14, you can also do this:
kubectl apply -k overlays/ship --namespace=kustomcoins
Retrieve the NodePort number of the web UI:
kubectl get service webui --namespace=kustomcoins
Open it in a web browser
Look at the worker logs:
kubectl logs deploy/worker --tail=10 --follow --namespace=kustomcoins
Note: it might take a minute or two for the worker to start.
Healthchecks
(automatically generated title slide)
Kubernetes provides two kinds of healthchecks: liveness and readiness
Healthchecks are probes that apply to containers (not to pods)
Each container can have two (optional) probes:
liveness = is this container dead or alive?
readiness = is this container ready to serve traffic?
Different probes are available (HTTP, TCP, program execution)
Let's see the difference and how to use them!
Indicates if the container is dead or alive
A dead container cannot come back to life
If the liveness probe fails, the container is killed
(to make really sure that it's really dead; no zombies or undeads!)
What happens next depends on the pod's restartPolicy
:
Never
: the container is not restarted
OnFailure
or Always
: the container is restarted
To indicate failures that can't be recovered
deadlocks (causing all requests to time out)
internal corruption (causing all requests to error)
If the liveness probe fails N consecutive times, the container is killed
N is the failureThreshold
(3 by default)
Indicates if the container is ready to serve traffic
If a container becomes "unready" (let's say busy!) it might be ready again soon
If the readiness probe fails:
the container is not killed
if the pod is a member of a service, it is temporarily removed
it is re-added as soon as the readiness probe passes again
To indicate temporary failures
the application can only service N parallel connections
the runtime is busy doing garbage collection or initial data load
The container is marked as "not ready" after failureThreshold
failed attempts
(3 by default)
It is marked again as "ready" after successThreshold
successful attempts
(1 by default)
HTTP request
specify URL of the request (and optional headers)
any status code between 200 and 399 indicates success
TCP connection
arbitrary exec
a command is executed in the container
exit status of zero indicates success
Rolling updates proceed when containers are actually ready
(as opposed to merely started)
Containers in a broken state get killed and restarted
(instead of serving errors or timeouts)
Overloaded backends get removed from load balancer rotation
(thus improving response times across the board)
Here is a pod template for the rng
web service of the DockerCoins app:
apiVersion: v1
kind: Pod
metadata:
  name: rng-with-liveness
spec:
  containers:
  - name: rng
    image: dockercoins/rng:v0.1
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 1
If the backend serves an error, or takes longer than 1s, 3 times in a row, it gets killed.
Here is a pod template for a Redis server:
apiVersion: v1
kind: Pod
metadata:
  name: redis-with-liveness
spec:
  containers:
  - name: redis
    image: redis
    livenessProbe:
      exec:
        command: ["redis-cli", "ping"]
If the Redis process becomes unresponsive, it will be killed.
Probes are executed at intervals of periodSeconds
(default: 10)
The timeout for a probe is set with timeoutSeconds
(default: 1)
A probe is considered successful after successThreshold
successes (default: 1)
A probe is considered failing after failureThreshold
failures (default: 3)
If a probe is not defined, it's as if there was an "always successful" probe
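Putting it together, a readiness probe with explicit timing parameters might look like this (a sketch):
readinessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 5
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 3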
Accessing logs from the CLI
(automatically generated title slide)
The kubectl logs
command has limitations:
it cannot stream logs from multiple pods at a time
when showing logs from multiple pods, it mixes them all together
We are going to see how to do it better
We could (if we were so inclined) write a program or script that would:
take a selector as an argument
enumerate all pods matching that selector (with kubectl get -l ...
)
fork one kubectl logs --follow ...
command per container
annotate the logs (the output of each kubectl logs ...
process) with their origin
preserve ordering by using kubectl logs --timestamps ...
and merge the output
We could do it, but thankfully, others did it for us already!
Stern is an open source project by Wercker.
From the README:
Stern allows you to tail multiple pods on Kubernetes and multiple containers within the pod. Each result is color coded for quicker debugging.
The query is a regular expression so the pod name can easily be filtered and you don't need to specify the exact id (for instance omitting the deployment id). If a pod is deleted it gets removed from tail and if a new pod is added it automatically gets tailed.
Exactly what we need!
Run stern
(without arguments) to check if it's installed:
$ stern
Tail multiple pods and containers from Kubernetes
Usage:
  stern pod-query [flags]
If it is not installed, the easiest method is to download a binary release
The following commands will install Stern on a Linux Intel 64 bit machine:
sudo curl -L -o /usr/local/bin/stern \
  https://github.com/wercker/stern/releases/download/1.10.0/stern_linux_amd64
sudo chmod +x /usr/local/bin/stern
There are two ways to specify the pods whose logs we want to see:
-l
followed by a selector expression (like with many kubectl
commands)
with a "pod query," i.e. a regex used to match pod names
These two ways can be combined if necessary
View the logs of all the pods whose name contains rng:
stern rng
The --tail N
flag shows the last N
lines for each container
(Instead of showing the logs since the creation of the container)
The -t
/ --timestamps
flag shows timestamps
The --all-namespaces
flag is self-explanatory
View the logs of the weave system containers:
stern --tail 1 --timestamps --all-namespaces weave
When specifying a selector, we can omit the value for a label
This will match all objects having that label (regardless of the value)
Everything created with kubectl run
has a label run
We can use that property to view the logs of all the pods created with kubectl run
Similarly, everything created with kubectl create deployment
has a label app
View the logs for all the pods created with kubectl create deployment:
stern -l app
Centralized logging
(automatically generated title slide)
Using kubectl
or stern
is simple; but it has drawbacks:
when a node goes down, its logs are not available anymore
we can only dump or stream logs; we want to search/index/count...
We want to send all our logs to a single place
We want to parse them (e.g. for HTTP logs) and index them
We want a nice web dashboard
We are going to deploy an EFK stack
EFK is three components:
ElasticSearch (to store and index log entries)
Fluentd (to get container logs, process them, and put them in ElasticSearch)
Kibana (to view/search log entries with a nice UI)
The only component that we need to access from outside the cluster will be Kibana
Deploy the EFK stack:
kubectl apply -f ~/container.training/k8s/efk.yaml
If we look at the YAML file, we see that it creates a daemon set, two deployments, two services, and a few roles and role bindings (to give fluentd the required permissions).
A container writes a line on stdout or stderr
Both are typically piped to the container engine (Docker or otherwise)
The container engine reads the line, and sends it to a logging driver
The timestamp and stream (stdout or stderr) are added to the log line
With the default configuration for Kubernetes, the line is written to a JSON file
(/var/log/containers/pod-name_namespace_container-id.log
)
That file is read when we invoke kubectl logs
; we can access it directly too
Fluentd runs on each node (thanks to a daemon set)
It bind-mounts /var/log/containers
from the host (to access these files)
It continuously scans this directory for new files; reads them; parses them
Each log line becomes a JSON object, fully annotated with extra information:
container id, pod name, Kubernetes labels...
These JSON objects are stored in ElasticSearch
ElasticSearch indexes the JSON objects
We can access the logs through Kibana (and perform searches, counts, etc.)
Kibana offers a web interface that is relatively straightforward
Let's check it out!
Check which NodePort
was allocated to Kibana:
kubectl get svc kibana
With our web browser, connect to Kibana
Note: this is not a Kibana workshop! So this section is deliberately very terse.
The first time you connect to Kibana, you must "configure an index pattern"
Just use the one that is suggested, @timestamp*
Then click "Discover" (in the top-left corner)
You should see container logs
Advice: in the left column, select a few fields to display, e.g.:
kubernetes.host
, kubernetes.pod_name
, stream
, log
*If you don't see @timestamp
, it's probably because no logs exist yet.
Wait a bit, and double-check the logging pipeline!
We are using EFK because it is relatively straightforward to deploy on Kubernetes, without having to redeploy or reconfigure our cluster. But it doesn't mean that it will always be the best option for your use-case. If you are running Kubernetes in the cloud, you might consider using the cloud provider's logging infrastructure (if it can be integrated with Kubernetes).
The deployment method that we will use here has been simplified: there is only one ElasticSearch node. In a real deployment, you might use a cluster, both for performance and reliability reasons. But this is outside of the scope of this chapter.
The YAML file that we used creates all the resources in the
default
namespace, for simplicity. In a real scenario, you will
create the resources in the kube-system
namespace or in a dedicated namespace.
Authentication and authorization
(automatically generated title slide)
And first, a little refresher!
Authentication = verifying the identity of a person
On a UNIX system, we can authenticate with login+password, SSH keys ...
Authorization = listing what they are allowed to do
On a UNIX system, this can include file permissions, sudoer entries ...
Sometimes abbreviated as "authn" and "authz"
In good modular systems, these things are decoupled
(so we can e.g. change a password or SSH key without having to reset access rights)
When the API server receives a request, it tries to authenticate it
(it examines headers, certificates... anything available)
Many authentication methods are available and can be used simultaneously
(we will see them on the next slide)
It's the job of the authentication method to produce the user name and the list of groups
The API server doesn't interpret these; that'll be the job of authorizers
TLS client certificates
(that's what we've been doing with kubectl
so far)
Bearer tokens
(a secret token in the HTTP headers of the request)
HTTP basic authentication
(carrying user and password in an HTTP header)
Authentication proxy
(sitting in front of the API and setting trusted headers)
If any authentication method rejects a request, it's denied
(401 Unauthorized
HTTP code)
If a request is neither rejected nor accepted by anyone, it's anonymous
the user name is system:anonymous
the list of groups is [system:unauthenticated]
By default, the anonymous user can't do anything
(that's what you get if you just curl
the Kubernetes API)
This is enabled in most Kubernetes deployments
The user name is derived from the CN
in the client certificates
The groups are derived from the O
fields in the client certificate
From the point of view of the Kubernetes API, users do not exist
(i.e. they are not stored in etcd or anywhere else)
Users can be created (and added to groups) independently of the API
The Kubernetes API can be set up to use your custom CA to validate client certs
Let's inspect the CN and O fields of our certificate:
kubectl config view \
        --raw \
        -o json \
        | jq -r .users[0].user[\"client-certificate-data\"] \
        | openssl base64 -d -A \
        | openssl x509 -text \
        | grep Subject:
Let's break down that command together! 😅
kubectl config view shows the Kubernetes user configuration
--raw includes certificate information (which shows as REDACTED otherwise)
-o json outputs the information in JSON format
| jq ... extracts the field with the user certificate (in base64)
| openssl base64 -d -A decodes the base64 format (now we have a PEM file)
| openssl x509 -text parses the certificate and outputs it as plain text
| grep Subject: shows us the line that interests us
→ We are user kubernetes-admin, in group system:masters.
(We will see later how and why this gives us the permissions that we have.)
The Kubernetes API server does not support certificate revocation
(see issue #18982)
As a result, we don't have an easy way to terminate someone's access
(if their key is compromised, or they leave the organization)
Option 1: re-create a new CA and re-issue everyone's certificates
→ Maybe OK if we only have a few users; no way otherwise
Option 2: don't use groups; grant permissions to individual users
→ Inconvenient if we have many users and teams; error-prone
Option 3: issue short-lived certificates (e.g. 24 hours) and renew them often
→ This can be facilitated by e.g. Vault or by the Kubernetes CSR API
Tokens are passed as HTTP headers:
Authorization: Bearer and-then-here-comes-the-token
Tokens can be validated through a number of different methods:
static tokens hard-coded in a file on the API server
bootstrap tokens (special case to create a cluster or join nodes)
OpenID Connect tokens (to delegate authentication to compatible OAuth2 providers)
service accounts (these deserve more details, coming right up!)
A service account is a user that exists in the Kubernetes API
(it is visible with e.g. kubectl get serviceaccounts
)
Service accounts can therefore be created / updated dynamically
(they don't require hand-editing a file and restarting the API server)
A service account is associated with a set of secrets
(the kind that you can view with kubectl get secrets
)
Service accounts are generally used to grant permissions to applications, services...
(as opposed to humans)
We are going to list existing service accounts
Then we will extract the token for a given service account
And we will use that token to authenticate with the API
The resource name is serviceaccount (or sa for short):
kubectl get sa
There should be just one service account in the default namespace: default
.
List the secrets of the default service account:
kubectl get sa default -o yaml
SECRET=$(kubectl get sa default -o json | jq -r .secrets[0].name)
It should be named default-token-XXXXX
.
View the secret:
kubectl get secret $SECRET -o yaml
Extract the token and decode it:
TOKEN=$(kubectl get secret $SECRET -o json \
        | jq -r .data.token | openssl base64 -d -A)
Find the ClusterIP for the kubernetes
service:
kubectl get svc kubernetes
API=$(kubectl get svc kubernetes -o json | jq -r .spec.clusterIP)
Connect without the token:
curl -k https://$API
Connect with the token:
curl -k -H "Authorization: Bearer $TOKEN" https://$API
In both cases, we will get a "Forbidden" error
Without authentication, the user is system:anonymous
With authentication, it is shown as system:serviceaccount:default:default
The API "sees" us as a different user
But neither user has any rights, so we can't do nothin'
Let's change that!
There are multiple ways to grant permissions in Kubernetes, called authorizers:
Node Authorization (used internally by kubelet; we can ignore it)
Attribute-based access control (powerful but complex and static; ignore it too)
Webhook (each API request is submitted to an external service for approval)
Role-based access control (associates permissions to users dynamically)
The one we want is the last one, generally abbreviated as RBAC
RBAC allows to specify fine-grained permissions
Permissions are expressed as rules
A rule is a combination of:
verbs like create, get, list, update, delete...
resources (as in "API resource," like pods, nodes, services...)
resource names (to specify e.g. one specific pod instead of all pods)
in some cases, subresources (e.g. logs are subresources of pods)
A role is an API object containing a list of rules
Example: role "external-load-balancer-configurator" can:
A rolebinding associates a role with a user
Example: rolebinding "external-load-balancer-configurator":
Yes, there can be users, roles, and rolebindings with the same name
It's a good idea for 1-1-1 bindings; not so much for 1-N ones
API resources Role and RoleBinding are for objects within a namespace
We can also define API resources ClusterRole and ClusterRoleBinding
These are a superset, allowing us to:
specify actions on cluster-wide objects (like nodes)
operate across all namespaces
We can create Role and RoleBinding resources within a namespace
ClusterRole and ClusterRoleBinding resources are global
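To make this concrete, here is a minimal sketch of a namespaced Role and its RoleBinding (the names pod-reader and alice, and the namespace default, are made up for this example):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]             # "" is the core API group (pods, services...)
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: pod-reader
subjects:
- kind: User
  name: alice                 # the user name produced by the authentication step
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io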
A pod can be associated with a service account
by default, it is associated with the default
service account
as we saw earlier, this service account has no permissions anyway
The associated token is exposed to the pod's filesystem
(in /var/run/secrets/kubernetes.io/serviceaccount/token
)
Standard Kubernetes tooling (like kubectl
) will look for it there
So Kubernetes tools running in a pod will automatically use the service account
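For reference, here is a minimal sketch of a pod spec selecting a specific service account (the pod name is arbitrary; viewer is the service account we create in the next steps):
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: viewer    # defaults to "default" if omitted
  containers:
  - name: main
    image: alpine
    command: ["sleep", "3600"]
    # the token is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token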
We are going to create a service account
We will use a default cluster role (view
)
We will bind together this role and this service account
Then we will run a pod using that service account
In this pod, we will install kubectl
and check our permissions
We will call the new service account viewer
(note that nothing prevents us from calling it view
, like the role)
Create the new service account:
kubectl create serviceaccount viewer
List service accounts now:
kubectl get serviceaccounts
Binding a role = creating a rolebinding object
We will call that object viewercanview
(but again, we could call it view
)
kubectl create rolebinding viewercanview \
        --clusterrole=view \
        --serviceaccount=default:viewer
It's important to note a couple of details in these flags...
We used --clusterrole=view
What would have happened if we had used --role=view
?
we would have bound the role view
from the local namespace
(instead of the cluster role view
)
the command would have worked fine (no error)
but later, our API requests would have been denied
This is a deliberate design decision
(we can reference roles that don't exist, and create/update them later)
We used --serviceaccount=default:viewer
What would have happened if we had used --user=default:viewer
?
we would have bound the role to a user instead of a service account
again, the command would have worked fine (no error)
...but our API requests would have been denied later
What about the default:
prefix?
that's the namespace of the service account
yes, it could be inferred from context, but... kubectl
requires it
We will run an alpine pod and install kubectl there.
Run a one-time pod:
kubectl run eyepod --rm -ti --restart=Never \
        --serviceaccount=viewer \
        --image alpine
Install curl
, then use it to install kubectl
:
apk add --no-cache curl
URLBASE=https://storage.googleapis.com/kubernetes-release/release
KUBEVER=$(curl -s $URLBASE/stable.txt)
curl -LO $URLBASE/$KUBEVER/bin/linux/amd64/kubectl
chmod +x kubectl
Using kubectl in the pod
We will try to use our view permissions, then to create an object.
Check that we can, indeed, view things:
./kubectl get all
But that we can't create things:
./kubectl create deployment testrbac --image=nginx
Exit the container with exit
or ^D
Testing directly with kubectl
We can also check for permissions with kubectl auth can-i:
kubectl auth can-i list nodes
kubectl auth can-i create pods
kubectl auth can-i get pod/name-of-pod
kubectl auth can-i get /url-fragment-of-api-request/
kubectl auth can-i '*' services
And we can check permissions on behalf of other users:
kubectl auth can-i list nodes \
        --as some-user
kubectl auth can-i list nodes \
        --as system:serviceaccount:<namespace>:<name-of-service-account>
Where does the view role come from?
Kubernetes defines a number of ClusterRoles intended to be bound to users
cluster-admin
can do everything (think root
on UNIX)
admin
can do almost everything (except e.g. changing resource quotas and limits)
edit
is similar to admin
, but cannot view or edit permissions
view
has read-only access to most resources, except permissions and secrets
In many situations, these roles will be all you need.
You can also customize them!
If you need to add permissions to these default roles (or others),
you can do it through the ClusterRole Aggregation mechanism
This happens by creating a ClusterRole with the following labels:
metadata:
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
This ClusterRole's permissions will be added to admin/edit/view respectively
This is particularly useful when using CustomResourceDefinitions
(since Kubernetes cannot guess which resources are sensitive and which ones aren't)
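For instance, here is a sketch of a ClusterRole that would add read access to a hypothetical custom resource (the API group example.com and resource things are made up) to the built-in view role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: view-things
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["example.com"]
  resources: ["things"]
  verbs: ["get", "list", "watch"]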
When interacting with the Kubernetes API, we are using a client certificate
We saw previously that this client certificate contained:
CN=kubernetes-admin
and O=system:masters
Let's look for these in existing ClusterRoleBindings:
kubectl get clusterrolebindings -o yaml | grep -e kubernetes-admin -e system:masters
(system:masters
should show up, but not kubernetes-admin
.)
Where does this match come from?
The system:masters group
If we eyeball the output of kubectl get clusterrolebindings -o yaml, we'll find out!
It is in the cluster-admin
binding:
kubectl describe clusterrolebinding cluster-admin
This binding associates system:masters
with the cluster role cluster-admin
And the cluster-admin
is, basically, root
:
kubectl describe clusterrole cluster-admin
For auditing purposes, sometimes we want to know who can perform an action
There is a proof-of-concept tool by Aqua Security which does exactly that:
This is one way to install it:
docker run --rm -v /usr/local/bin:/go/bin golang \
       go get -v github.com/aquasecurity/kubectl-who-can
This is one way to use it:
kubectl-who-can create pods
The CSR API
(automatically generated title slide)
The Kubernetes API exposes CSR resources
We can use these resources to issue TLS certificates
First, we will go through a quick reminder about TLS certificates
Then, we will see how to obtain a certificate for a user
We will use that certificate to authenticate with the cluster
Finally, we will grant some privileges to that user
TLS (Transport Layer Security) is a protocol providing:
encryption (to prevent eavesdropping)
authentication (using public key cryptography)
When we access an https:// URL, the server authenticates itself
(it proves its identity to us; as if it were "showing its ID")
But we can also have mutual TLS authentication (mTLS)
(client proves its identity to server; server proves its identity to client)
To authenticate, someone (client or server) needs:
a private key (that remains known only to them)
a public key (that they can distribute)
a certificate (associating the public key with an identity)
A message encrypted with the private key can only be decrypted with the public key
(and vice versa)
If I use someone's public key to encrypt/decrypt their messages,
I can be certain that I am talking to them / they are talking to me
The certificate proves that I have the correct public key for them
This is what I do if I want to obtain a certificate.
Create public and private keys.
Create a Certificate Signing Request (CSR).
(The CSR contains the identity that I claim and a public key.)
Send that CSR to the Certificate Authority (CA).
The CA verifies that I can claim the identity in the CSR.
The CA generates my certificate and gives it to me.
The CA (or anyone else) never needs to know my private key.
The Kubernetes API has a CertificateSigningRequest resource type
(we can list them with e.g. kubectl get csr
)
We can create a CSR object
(= upload a CSR to the Kubernetes API)
Then, using the Kubernetes API, we can approve/deny the request
If we approve the request, the Kubernetes API generates a certificate
The certificate gets attached to the CSR object and can be retrieved
We will show how to use the CSR API to obtain user certificates
This will be a rather complex demo
... And yet, we will take a few shortcuts to simplify it
(but it will illustrate the general idea)
The demo also won't be automated
(we would have to write extra code to make it fully functional)
We will create a Namespace named "users"
Each user will get a ServiceAccount in that Namespace
That ServiceAccount will give read/write access to one CSR object
Users will use that ServiceAccount's token to submit a CSR
We will approve the CSR (or not)
Users can then retrieve their certificate from their CSR object
...And use that certificate for subsequent interactions
For a user named jean.doe
, we will have:
ServiceAccount jean.doe
in Namespace users
CertificateSigningRequest users:jean.doe
ClusterRole users:jean.doe
giving read/write access to that CSR
ClusterRoleBinding users:jean.doe
binding ClusterRole and ServiceAccount
If you want to use another name than jean.doe
, update the YAML file!
Create the global namespace for all users:
kubectl create namespace users
Create the ServiceAccount, ClusterRole, ClusterRoleBinding for jean.doe
:
kubectl apply -f ~/container.training/k8s/users:jean.doe.yaml
Let's obtain the user's token and give it to them
(the token will be their password)
List the user's secrets:
kubectl --namespace=users describe serviceaccount jean.doe
Show the user's token:
kubectl --namespace=users describe secret jean.doe-token-xxxxx
Configuring kubectl to use the token
Add a new identity to our kubeconfig file:
kubectl config set-credentials token:jean.doe --token=...
Add a new context using that identity:
kubectl config set-context jean.doe --user=token:jean.doe --cluster=kubernetes
Try to access any resource:
kubectl get pods
(This should tell us "Forbidden")
Try to access "our" CertificateSigningRequest:
kubectl get csr users:jean.doe
(This should tell us "NotFound")
There are many tools to generate TLS keys and CSRs
Let's use OpenSSL; it's not the best one, but it's installed everywhere
(many people prefer cfssl, easyrsa, or other tools; that's fine too!)
openssl req -newkey rsa:2048 -nodes -keyout key.pem \
        -new -subj /CN=jean.doe/O=devs/ -out csr.pem
The command above generates:
a private key (key.pem) and a certificate signing request (csr.pem)
the CSR claims the identity jean.doe, in group devs
The Kubernetes CSR object is a thin wrapper around the CSR PEM file
The PEM file needs to be encoded to base64 on a single line
(we will use base64 -w0
for that purpose)
The Kubernetes CSR object also needs to list the right "usages"
(these are flags indicating how the certificate can be used)
kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: users:jean.doe
spec:
  request: $(base64 -w0 < csr.pem)
  usages:
  - digital signature
  - key encipherment
  - client auth
EOF
By default, the CSR API generates certificates valid for 1 year
We want to generate short-lived certificates, so we will lower that to 1 hour
For now, this is configured through an experimental controller manager flag
Edit the static pod definition for the controller manager:
sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml
In the list of flags, add the following line:
- --experimental-cluster-signing-duration=1h
Switch back to cluster-admin
:
kctx -
Inspect the CSR:
kubectl describe csr users:jean.doe
Approve it:
kubectl certificate approve users:jean.doe
Switch back to the user's identity:
kctx -
Retrieve the updated CSR object and extract the certificate:
kubectl get csr users:jean.doe \
        -o jsonpath={.status.certificate} \
        | base64 -d > cert.pem
Inspect the certificate:
openssl x509 -in cert.pem -text -noout
Add the key and certificate to kubeconfig:
kubectl config set-credentials cert:jean.doe --embed-certs \
        --client-certificate=cert.pem --client-key=key.pem
Update the user's context to use the key and cert to authenticate:
kubectl config set-context jean.doe --user cert:jean.doe
Confirm that we are seen as jean.doe
(but don't have permissions):
kubectl get pods
We have just shown, step by step, a method to issue short-lived certificates for users.
To be usable in real environments, we would need to add:
a kubectl helper to automatically generate the CSR and obtain the cert
(and transparently renew the cert when needed)
a Kubernetes controller to automatically validate and approve CSRs
(checking that the subject and groups are valid)
a way for the users to know the groups to add to their CSR
(e.g.: annotations on their ServiceAccount + read access to the ServiceAccount)
Larger organizations typically integrate with their own directory
The general principle, however, is the same:
users have long-term credentials (password, token, ...)
they use these credentials to obtain other, short-lived credentials
This provides enhanced security:
the long-term credentials can use long passphrases, 2FA, HSM...
the short-term credentials are more convenient to use
we get strong security and convenience
Systems like Vault also have certificate issuance mechanisms
Pod Security Policies
(automatically generated title slide)
By default, our pods and containers can do everything
(including taking over the entire cluster)
We are going to show an example of a malicious pod
Then we will explain how to avoid this with PodSecurityPolicies
We will enable PodSecurityPolicies on our cluster
We will create a couple of policies (restricted and permissive)
Finally we will see how to use them to improve security on our cluster
For simplicity, let's work in a separate namespace
Let's create a new namespace called "green"
Create the "green" namespace:
kubectl create namespace green
Change to that namespace:
kns green
Create a Deployment using the official NGINX image:
kubectl create deployment web --image=nginx
Confirm that the Deployment, ReplicaSet, and Pod exist, and that the Pod is running:
kubectl get all
We will now show an escalation technique in action
We will deploy a DaemonSet that adds our SSH key to the root account
(on each node of the cluster)
The Pods of the DaemonSet will do so by mounting /root
from the host
Check the file k8s/hacktheplanet.yaml
with a text editor:
vim ~/container.training/k8s/hacktheplanet.yaml
If you would like, change the SSH key (by changing the GitHub user name)
Create the DaemonSet:
kubectl create -f ~/container.training/k8s/hacktheplanet.yaml
Check that the pods are running:
kubectl get pods
Confirm that the SSH key was added to the node's root account:
sudo cat /root/.ssh/authorized_keys
Remove the DaemonSet:
kubectl delete daemonset hacktheplanet
Remove the Deployment:
kubectl delete deployment web
To use PSPs, we need to activate their specific admission controller
That admission controller will intercept each pod creation attempt
It will look at:
who/what is creating the pod
which PodSecurityPolicies they can use
which PodSecurityPolicies can be used by the Pod's ServiceAccount
Then it will compare the Pod with each PodSecurityPolicy one by one
If a PodSecurityPolicy accepts all the parameters of the Pod, it is created
Otherwise, the Pod creation is denied and it won't even show up in kubectl get pods
With RBAC, using a PSP corresponds to the verb use
on the PSP
(that makes sense, right?)
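As a sketch, a ClusterRole granting that use permission on a policy named restricted could look like this (an approximation of what the k8s/psp-restricted.yaml file provides, not its exact content):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:restricted
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["restricted"]
  verbs: ["use"]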
If no PSP is defined, no Pod can be created
(even by cluster admins)
Pods that are already running are not affected
If we create a Pod directly, it can use a PSP to which we have access
If the Pod is created by e.g. a ReplicaSet or DaemonSet, it's different:
the ReplicaSet / DaemonSet controllers don't have access to our policies
therefore, we need to give access to the PSP to the Pod's ServiceAccount
We are going to enable the PodSecurityPolicy admission controller
At that point, we won't be able to create any more pods (!)
Then we will create a couple of PodSecurityPolicies
...And associated ClusterRoles (giving use
access to the policies)
Then we will create RoleBindings to grant these roles to ServiceAccounts
We will verify that we can't run our "exploit" anymore
To enable Pod Security Policies, we need to enable their admission plugin
This is done by adding a flag to the API server
On clusters deployed with kubeadm
, the control plane runs in static pods
These pods are defined in YAML files located in /etc/kubernetes/manifests
Kubelet watches this directory
Each time a file is added/removed there, kubelet creates/deletes the corresponding pod
Updating a file causes the pod to be deleted and recreated
Have a look at the static pods:
ls -l /etc/kubernetes/manifests
Edit the one corresponding to the API server:
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
There should already be a line with --enable-admission-plugins=...
Let's add PodSecurityPolicy
on that line
Locate the line with --enable-admission-plugins=
Add PodSecurityPolicy
It should read: --enable-admission-plugins=NodeRestriction,PodSecurityPolicy
Save, quit
The kubelet detects that the file was modified
It kills the API server pod, and starts a new one
During that time, the API server is unavailable
Try to create a Pod directly:
kubectl run testpsp1 --image=nginx --restart=Never
Try to create a Deployment:
kubectl run testpsp2 --image=nginx
Look at existing resources:
kubectl get all
We can get hints at what's happening by looking at the ReplicaSet and Events.
We will create two policies:
privileged (allows everything)
restricted (blocks some unsafe mechanisms)
For each policy, we also need an associated ClusterRole granting use
We have a couple of files, each defining a PSP and associated ClusterRole:
k8s/psp-privileged.yaml: policy privileged, role psp:privileged
k8s/psp-restricted.yaml: policy restricted, role psp:restricted
kubectl create -f ~/container.training/k8s/psp-restricted.yaml
kubectl create -f ~/container.training/k8s/psp-privileged.yaml
The privileged policy comes from the Kubernetes documentation
The restricted policy is inspired by that same documentation page
We haven't bound the policy to any user yet
But cluster-admin
can implicitly use
all policies
Check that we can now create a Pod directly:
kubectl run testpsp3 --image=nginx --restart=Never
Create a Deployment as well:
kubectl run testpsp4 --image=nginx
Confirm that the Deployment is not creating any Pods:
kubectl get all
We can create Pods directly (thanks to our root-like permissions)
The Pods corresponding to a Deployment are created by the ReplicaSet controller
The ReplicaSet controller does not have root-like permissions
We need to either:
grant permissions to the ReplicaSet controller
or
grant permissions to the Pods' ServiceAccount
The first option would allow anyone to create pods
The second option will allow us to scope the permissions better
Let's bind the role psp:restricted
to ServiceAccount green:default
(aka the default ServiceAccount in the green Namespace)
This will allow Pod creation in the green Namespace
(because these Pods will be using that ServiceAccount automatically)
kubectl create rolebinding psp:restricted \
        --clusterrole=psp:restricted \
        --serviceaccount=green:default
The Deployments that we created earlier will eventually recover
(the ReplicaSet controller periodically retries creating the Pods)
If we create a new Deployment now, it should work immediately
Create a simple Deployment:
kubectl create deployment testpsp5 --image=nginx
Look at the Pods that have been created:
kubectl get all
Create a hostile DaemonSet:
kubectl create -f ~/container.training/k8s/hacktheplanet.yaml
Look at the state of the namespace:
kubectl get all
The restricted PSP is similar to the one provided in the docs, but:
it allows containers to run as root
it doesn't drop capabilities
Many containers run as root by default, and would require additional tweaks
Many containers use e.g. chown
, which requires a specific capability
(that's the case for the NGINX official image, for instance)
We still block: hostPath, privileged containers, and much more!
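For illustration, a restricted policy along those lines might look like this sketch (field values are assumptions, not the exact content of k8s/psp-restricted.yaml):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false            # no privileged containers
  hostNetwork: false
  hostPID: false
  hostIPC: false
  runAsUser:
    rule: RunAsAny             # root is still allowed (see above)
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                     # note that hostPath is not in this list
  - configMap
  - secret
  - emptyDir
  - downwardAPI
  - projected
  - persistentVolumeClaim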
If we list the pods in the kube-system
namespace, kube-apiserver
is missing
However, the API server is obviously running
(otherwise, kubectl get pods --namespace=kube-system
wouldn't work)
The API server Pod is created directly by kubelet
(without going through the PSP admission plugin)
Then, kubelet creates a "mirror pod" representing that Pod in etcd
That "mirror pod" creation goes through the PSP admission plugin
And it gets blocked!
This can be fixed by binding psp:privileged
to group system:nodes
Our cluster is currently broken
(we can't create pods in namespaces kube-system, default, ...)
We need to either:
disable the PSP admission plugin
allow use of PSP to relevant users and groups
For instance, we could:
bind psp:restricted
to the group system:authenticated
bind psp:privileged
to the ServiceAccount kube-system:default
Exposing HTTP services with Ingress resources
(automatically generated title slide)
Services give us a way to access a pod or a set of pods
Services can be exposed to the outside world:
with type NodePort
(on a port >30000)
with type LoadBalancer
(allocating an external load balancer)
What about HTTP services?
how can we expose webui
, rng
, hasher
?
the Kubernetes dashboard?
a new version of webui
?
If we use NodePort
services, clients have to specify port numbers
(i.e. http://xxxxx:31234 instead of just http://xxxxx)
LoadBalancer
services are nice, but:
they are not available in all environments
they often carry an additional cost (e.g. they provision an ELB)
they require one extra step for DNS integration
(waiting for the LoadBalancer
to be provisioned; then adding it to DNS)
We could build our own reverse proxy
There are many options available:
Apache, HAProxy, Hipache, NGINX, Traefik, ...
(look at jpetazzo/aiguillage for a minimal reverse proxy configuration using NGINX)
Most of these options require us to update/edit configuration files after each change
Some of them can pick up virtual hosts and backends from a configuration store
Wouldn't it be nice if this configuration could be managed with the Kubernetes API?
Enter¹ Ingress resources!
¹ Pun maybe intended.
Kubernetes API resource (kubectl get ingress
/ingresses
/ing
)
Designed to expose HTTP services
Basic features:
load balancing
SSL termination
name-based virtual hosting
Can also route to different services depending on:
the request path (e.g. /api → api-service, /static → assets-service)
Step 1: deploy an ingress controller
ingress controller = load balancer + control loop
the control loop watches over ingress resources, and configures the LB accordingly
Step 2: set up DNS
Step 3: create ingress resources
Step 4: profit!
We will deploy the Traefik ingress controller
this is an arbitrary choice
maybe motivated by the fact that Traefik releases are named after cheeses
For DNS, we will use nip.io
*.1.2.3.4.nip.io
resolves to 1.2.3.4
We will create ingress resources for various HTTP services
We want our ingress load balancer to be available on port 80
We could do that with a LoadBalancer
service
... but it requires support from the underlying infrastructure
We could use pods specifying hostPort: 80
... but with most CNI plugins, this doesn't work or requires additional setup
We could use a NodePort
service
... but that requires changing the --service-node-port-range
flag in the API server
Last resort: the hostNetwork
mode
Without hostNetwork
Normally, each pod gets its own network namespace
(sometimes called sandbox or network sandbox)
An IP address is assigned to the pod
This IP address is routed/connected to the cluster network
All containers of that pod are sharing that network namespace
(and therefore using the same IP address)
With hostNetwork: true
No network namespace gets created
The pod is using the network namespace of the host
It "sees" (and can use) the interfaces (and IP addresses) of the host
The pod can receive outside traffic directly, on any port
Downside: with most network plugins, network policies won't work for that pod
most network policies work at the IP address level
filtering that pod = filtering traffic from the node
The Traefik documentation tells us to pick between Deployment and Daemon Set
We are going to use a Daemon Set so that each node can accept connections
We will do two minor changes to the YAML provided by Traefik:
enable hostNetwork
add a toleration so that Traefik also runs on node1
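Concretely, the patched pod template ends up with something like this (a sketch of the two changes; see k8s/traefik.yaml for the full manifest, and note that the image reference is an assumption):
spec:
  template:
    spec:
      hostNetwork: true                       # use the node's network namespace
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule                    # tolerate the control plane taint
      containers:
      - name: traefik
        image: traefik
        ports:
        - containerPort: 80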
A taint is an attribute added to a node
It prevents pods from running on the node
... Unless they have a matching toleration
When deploying with kubeadm
:
a taint is placed on the node dedicated to the control plane
the pods running the control plane have a matching toleration
kubectl get node node1 -o json | jq .spec
kubectl get node node2 -o json | jq .spec
We should see a result only for node1
(the one with the control plane):
"taints": [ { "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" } ]
The key
can be interpreted as:
a reservation for a special set of pods
(here, this means "this node is reserved for the control plane")
an error condition on the node
(for instance: "disk full," do not start new pods here!)
The effect
can be:
NoSchedule
(don't run new pods here)
PreferNoSchedule
(try not to run new pods here)
NoExecute
(don't run new pods and evict running pods)
kubectl -n kube-system get deployments coredns -o json | jq .spec.template.spec.tolerations
The result should include:
{ "effect": "NoSchedule", "key": "node-role.kubernetes.io/master" }
It means: "bypass the exact taint that we saw earlier on node1
."
Check the tolerations of kube-proxy:
kubectl -n kube-system get ds kube-proxy -o json | jq .spec.template.spec.tolerations
The result should include:
{ "operator": "Exists" }
This one is a special case that means "ignore all taints and run anyway."
We provide a YAML file (k8s/traefik.yaml
) which is essentially the sum of:
Traefik's Daemon Set resources (patched with hostNetwork
and tolerations)
Traefik's RBAC rules allowing it to watch necessary API objects
kubectl apply -f ~/container.training/k8s/traefik.yaml
curl localhost
We should get a 404 page not found
error.
This is normal: we haven't provided any ingress rule yet.
To make our lives easier, we will use nip.io
Check out http://cheddar.A.B.C.D.nip.io
(replacing A.B.C.D with the IP address of node1
)
We should get the same 404 page not found
error
(meaning that our DNS is "set up properly", so to speak!)
Traefik provides a web dashboard
With the current install method, it's listening on port 8080
Connect to http://node1:8080 (replacing node1 with its IP address)
We are going to use the errm/cheese images
(there are 3 tags available: wensleydale, cheddar, stilton)
These images contain a simple static HTTP server sending a picture of cheese
We will run 3 deployments (one for each cheese)
We will create 3 services (one for each deployment)
Then we will create 3 ingress rules (one for each service)
We will route <name-of-cheese>.A.B.C.D.nip.io
to the corresponding deployment
Run all three deployments:
kubectl create deployment cheddar --image=errm/cheese:cheddar
kubectl create deployment stilton --image=errm/cheese:stilton
kubectl create deployment wensleydale --image=errm/cheese:wensleydale
Create a service for each of them:
kubectl expose deployment cheddar --port=80
kubectl expose deployment stilton --port=80
kubectl expose deployment wensleydale --port=80
Here is a minimal host-based ingress resource:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cheddar
spec:
  rules:
  - host: cheddar.A.B.C.D.nip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: cheddar
          servicePort: 80
(It is in k8s/ingress.yaml
.)
Edit the file ~/container.training/k8s/ingress.yaml
Replace A.B.C.D with the IP address of node1
Apply the file
(An image of a piece of cheese should show up.)
Edit the file ~/container.training/k8s/ingress.yaml
Replace cheddar
with stilton
(in name
, host
, serviceName
)
Apply the file
Check that stilton.A.B.C.D.nip.io
works correctly
Repeat for wensleydale
You can have multiple ingress controllers active simultaneously
(e.g. Traefik and NGINX)
You can even have multiple instances of the same controller
(e.g. one for internal, another for external traffic)
The kubernetes.io/ingress.class
annotation can be used to tell which one to use
It's OK if multiple ingress controllers configure the same resource
(it just means that the service will be accessible through multiple paths)
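For example, here is a sketch of pinning an ingress resource to one controller (the class name traefik is an example; each controller documents the class it watches for):
metadata:
  name: cheddar
  annotations:
    kubernetes.io/ingress.class: "traefik"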
The traffic flows directly from the ingress load balancer to the backends
it doesn't need to go through the ClusterIP
in fact, we don't even need a ClusterIP
(we can use a headless service)
The load balancer can be outside of Kubernetes
(as long as it has access to the cluster subnet)
This allows the use of external (hardware, physical machines...) load balancers
Annotations can encode special features
(rate-limiting, A/B testing, session stickiness, etc.)
Aforementioned "special features" are not standardized yet
Some controllers will support them; some won't
Even relatively common features (stripping a path prefix) can differ:
This should eventually stabilize
(remember that ingresses are currently apiVersion: extensions/v1beta1
)
Collecting metrics with Prometheus
(automatically generated title slide)
Prometheus is an open-source monitoring system including:
multiple service discovery backends to figure out which metrics to collect
a scraper to collect these metrics
an efficient time series database to store these metrics
a specific query language (PromQL) to query these time series
an alert manager to notify us according to metrics values or trends
We are going to use it to collect and query some metrics on our Kubernetes cluster
We don't endorse Prometheus more or less than any other system
It's relatively well integrated within the cloud-native ecosystem
It can be self-hosted (this is useful for tutorials like this)
It can be used for deployments of varying complexity:
one binary and 10 lines of configuration to get started
all the way to thousands of nodes and millions of metrics
Prometheus obtains metrics and their values by querying exporters
An exporter serves metrics over HTTP, in plain text
This is what the node exporter looks like:
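Here is an illustrative excerpt of that plain text format (the metric names are real node exporter metrics, but the values are made up):
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 14783.54
node_cpu_seconds_total{cpu="0",mode="user"} 382.31
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21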
Prometheus itself exposes its own internal metrics, too:
If you want to expose custom metrics to Prometheus:
serve a text page like these, and you're good to go
libraries are available in various languages to help with quantiles etc.
The Prometheus server will scrape URLs like these at regular intervals
(by default: every minute; can be more/less frequent)
If you're worried about parsing overhead: exporters can also use protobuf
The list of URLs to scrape (the scrape targets) is defined in configuration
This is maybe the simplest configuration file for Prometheus:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
In this configuration, Prometheus collects its own internal metrics
A typical configuration file will have multiple scrape_configs
In this configuration, the list of targets is fixed
A typical configuration file will use dynamic service discovery
This configuration file will leverage existing DNS A
records:
scrape_configs:
  - ...
  - job_name: 'node'
    dns_sd_configs:
      - names: ['api-backends.dc-paris-2.enix.io']
        type: 'A'
        port: 9100
In this configuration, Prometheus resolves the provided name(s)
(here, api-backends.dc-paris-2.enix.io
)
Each resulting IP address is added as a target on port 9100
In the DNS example, the names are re-resolved at regular intervals
As DNS records are created/updated/removed, scrape targets change as well
Existing data (previously collected metrics) is not deleted
Other service discovery backends work in a similar fashion
Prometheus can connect to e.g. a cloud API to list instances
Or to the Kubernetes API to list nodes, pods, services ...
Or a service like Consul, Zookeeper, etcd, to list applications
The resulting configurations files are way more complex
(but don't worry, we won't need to write them ourselves)
We could wonder, "why do we need a specialized database?"
One metrics data point = metrics ID + timestamp + value
With a classic SQL or noSQL data store, that's at least 160 bits of data + indexes
Prometheus is way more efficient, without sacrificing performance
(it will even be gentler on the I/O subsystem since it needs to write less)
Would you like to know more? Check this video:
Storage in Prometheus 2.0 by Goutham V at DC17EU
Look for services labeled app=prometheus across all namespaces:
kubectl get services --selector=app=prometheus --all-namespaces
If we see a NodePort
service called prometheus-server
, we're good!
(We can then skip to "Connecting to the Prometheus web UI".)
We need to:
Run the Prometheus server in a pod
(using e.g. a Deployment to ensure that it keeps running)
Expose the Prometheus server web UI (e.g. with a NodePort)
Run the node exporter on each node (with a Daemon Set)
Set up a Service Account so that Prometheus can query the Kubernetes API
Configure the Prometheus server
(storing the configuration in a Config Map for easy updates)
To make our lives easier, we are going to use a Helm chart
The Helm chart will take care of all the steps explained above
(including some extra features that we don't need, but won't hurt)
Install Tiller (Helm's server-side component) on our cluster:
helm init
Give Tiller permission to deploy things on our cluster:
kubectl create clusterrolebinding add-on-cluster-admin \
        --clusterrole=cluster-admin --serviceaccount=kube-system:default
Skip this if we already installed Prometheus earlier
(in doubt, check with helm list
)
helm upgrade prometheus stable/prometheus \
     --install \
     --namespace kube-system \
     --set server.service.type=NodePort \
     --set server.service.nodePort=30090 \
     --set server.persistentVolume.enabled=false \
     --set alertmanager.enabled=false
Curious about all these flags? They're explained in the next slide.
helm upgrade prometheus
→ upgrade release "prometheus" to the latest version...
(a "release" is a unique name given to an app deployed with Helm)
stable/prometheus
→ ... of the chart prometheus
in repo stable
--install
→ if the app doesn't exist, create it
--namespace kube-system
→ put it in that specific namespace
And set the following values when rendering the chart's templates:
server.service.type=NodePort → expose the Prometheus server with a NodePort
server.service.nodePort=30090 → set the specific NodePort number to use
server.persistentVolume.enabled=false → do not use a PersistentVolumeClaim
alertmanager.enabled=false → disable the alert manager entirely
Figure out the NodePort that was allocated to the Prometheus server:
kubectl get svc --all-namespaces | grep prometheus-server
With your browser, connect to that port
sum by (instance) (
  irate(
    container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m]
  )
)
Click on the blue "Execute" button and on the "Graph" tab just below
We see the cumulated CPU usage of worker pods for each node
(if we just deployed Prometheus, there won't be much data to see, though)
We can't learn PromQL in just 5 minutes
But we can cover the basics to get an idea of what is possible
(and have some keywords and pointers)
We are going to break down the query above
(building it one step at a time)
This query will show us CPU usage across all containers:
container_cpu_usage_seconds_total
The suffix of the metrics name tells us:
the unit (seconds of CPU)
that it's the total used since the container creation
Since it's a "total," it is an increasing quantity
(we need to compute the derivative if we want e.g. CPU % over time)
We see that the metrics retrieved have tags attached to them
This query will show us only metrics for worker containers:
container_cpu_usage_seconds_total{pod_name=~"worker.*"}
The =~
operator allows regex matching
We select all the pods with a name starting with worker
(it would be better to use labels to select pods; more on that later)
The result is a smaller set of containers
This query will show us CPU usage % instead of total seconds used:
100*irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])
The irate
operator computes the "per-second instant rate of increase"
rate
is similar but allows decreasing counters and negative values
with irate
, if a counter goes back to zero, we don't get a negative spike
The [5m]
tells how far to look back if there is a gap in the data
And we multiply with 100*
to get CPU % usage
This query sums the CPU usage per node:
sum by (instance) (
  irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])
)
instance
corresponds to the node on which the container is running
sum by (instance) (...)
computes the sum for each instance
Note: all the other tags are collapsed
(in other words, the resulting graph only shows the instance
tag)
PromQL supports many more aggregation operators
Node metrics (related to physical or virtual machines)
Container metrics (resource usage per container)
Databases, message queues, load balancers, ...
(check out this list of exporters!)
Instrumentation (=deluxe printf
for our code)
Business metrics (customers served, revenue, ...)
CPU, RAM, disk usage on the whole node
Total number of processes running, and their states
Number of open files, sockets, and their states
I/O activity (disk, network), per operation or volume
Physical/hardware (when applicable): temperature, fan speed...
...and much more!
Similar to node metrics, but not totally identical
RAM breakdown will be different
I/O activity is also harder to track
For details about container metrics, see:
http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/
Arbitrary metrics related to your application and business
System performance: request latency, error rate...
Volume information: number of rows in database, message queue size...
Business data: inventory, items sold, revenue...
Prometheus can leverage Kubernetes service discovery
(with proper configuration)
Services or pods can be annotated with:
prometheus.io/scrape: true to enable scraping
prometheus.io/port: 9090 to indicate the port number
prometheus.io/path: /metrics to indicate the URI (/metrics by default)
Prometheus will detect and scrape these (without needing a restart or reload)
What if we want to get metrics for containers belonging to a pod tagged worker
?
The cAdvisor exporter does not give us Kubernetes labels
Kubernetes labels are exposed through another exporter
We can see Kubernetes labels through metrics kube_pod_labels
(each container appears as a time series with constant value of 1
)
Prometheus kind of supports "joins" between time series
But only if the names of the tags match exactly
The cAdvisor exporter uses tag pod_name
for the name of a pod
The Kubernetes service endpoints exporter uses tag pod
instead
See this blog post or this other one to see how to perform "joins"
Alas, Prometheus cannot "join" time series with different labels
(see Prometheus issue #2204 for the rationale)
There is a workaround involving relabeling, but it's "not cheap"
see this comment for an overview
or this blog post for a complete description of the process
Grafana is a beautiful (and useful) frontend to display all kinds of graphs
Not everyone needs to know Prometheus, PromQL, Grafana, etc.
But in a team, it is valuable to have at least one person who knows them
That person can set up queries and dashboards for the rest of the team
It's a little bit like knowing how to optimize SQL queries, Dockerfiles...
Don't panic if you don't know these tools!
...But make sure at least one person in your team is on it 💯
Volumes
(automatically generated title slide)
Volumes are special directories that are mounted in containers
Volumes can have many different purposes:
share files and directories between containers running on the same machine
share files and directories between containers and their host
centralize configuration information in Kubernetes and expose it to containers
manage credentials and secrets and expose them securely to containers
store persistent data for stateful services
access storage systems (like Ceph, EBS, NFS, Portworx, and many others)
Kubernetes and Docker volumes are very similar
(the Kubernetes documentation says otherwise ...
but it refers to Docker 1.7, which was released in 2015!)
Docker volumes allow us to share data between containers running on the same host
Kubernetes volumes allow us to share data between containers in the same pod
Both Docker and Kubernetes volumes enable access to storage systems
Kubernetes volumes are also used to expose configuration and secrets
Docker has specific concepts for configuration and secrets
(but under the hood, the technical implementation is similar)
If you're not familiar with Docker volumes, you can safely ignore this slide!
Volumes and Persistent Volumes are related, but very different!
Volumes:
appear in Pod specifications (see next slide)
do not exist as API resources (cannot do kubectl get volumes
)
Persistent Volumes:
are API resources (can do kubectl get persistentvolumes
)
correspond to concrete volumes (e.g. on a SAN, EBS, etc.)
cannot be associated with a Pod directly; but through a Persistent Volume Claim
won't be discussed further in this section
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-volume
spec:
  volumes:
  - name: www
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html/
We define a standalone Pod
named nginx-with-volume
In that pod, there is a volume named www
No type is specified, so it will default to emptyDir
(as the name implies, it will be initialized as an empty directory at pod creation)
In that pod, there is also a container named nginx
That container mounts the volume www
to path /usr/share/nginx/html/
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-volume
spec:
  volumes:
  - name: www
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html/
  - name: git
    image: alpine
    command: [ "sh", "-c", "apk add --no-cache git && git clone https://github.com/octocat/Spoon-Knife /www" ]
    volumeMounts:
    - name: www
      mountPath: /www/
  restartPolicy: OnFailure
We added another container to the pod
That container mounts the www
volume on a different path (/www
)
It uses the alpine
image
When started, it installs git
and clones the octocat/Spoon-Knife
repository
(that repository contains a tiny HTML website)
As a result, NGINX now serves this website
Create the pod by applying the YAML file:
kubectl apply -f ~/container.training/k8s/nginx-with-volume.yaml
Check the IP address that was allocated to our pod:
kubectl get pod nginx-with-volume -o wide
IP=$(kubectl get pod nginx-with-volume -o json | jq -r .status.podIP)
Access the web server:
curl $IP
The default restartPolicy
is Always
This would cause our git
container to run again ... and again ... and again
(with an exponential back-off delay, as explained in the documentation)
That's why we specified restartPolicy: OnFailure
There is a short period of time during which the website is not available
(because the git
container hasn't done its job yet)
This could be avoided by using Init Containers
(we will see a live example in a few sections)
The lifecycle of a volume is linked to the pod's lifecycle
This means that a volume is created when the pod is created
This is mostly relevant for emptyDir
volumes
(other volumes, like remote storage, are not "created" but rather "attached" )
A volume survives across container restarts
A volume is destroyed (or, for remote storage, detached) when the pod is destroyed
Managing configuration
(automatically generated title slide)
Some applications need to be configured (obviously!)
There are many ways for our code to pick up configuration:
command-line arguments
environment variables
configuration files
configuration servers (getting configuration from a database, an API...)
... and more (because programmers can be very creative!)
How can we do these things with containers and Kubernetes?
There are many ways to pass configuration to code running in a container:
baking it into a custom image
command-line arguments
environment variables
injecting configuration files
exposing it over the Kubernetes API
configuration servers
Let's review these different strategies!
Put the configuration in the image
(it can be in a configuration file, but also ENV
or CMD
actions)
It's easy! It's simple!
Unfortunately, it also has downsides:
multiplication of images
different images for dev, staging, prod ...
minor reconfigurations require a whole build/push/pull cycle
Avoid doing it unless you don't have the time to figure out other options
Pass options to args
array in the container specification
Example (source):
args:
- "--data-dir=/var/lib/etcd"
- "--advertise-client-urls=http://127.0.0.1:2379"
- "--listen-client-urls=http://127.0.0.1:2379"
- "--listen-peer-urls=http://127.0.0.1:2380"
- "--name=etcd"
The options can be passed directly to the program that we run ...
... or to a wrapper script that will use them to e.g. generate a config file
Works great when options are passed directly to the running program
(otherwise, a wrapper script can work around the issue)
Works great when there aren't too many parameters
(to avoid a 20-lines args
array)
Requires documentation and/or understanding of the underlying program
("which parameters and flags do I need, again?")
Well-suited for mandatory parameters (without default values)
Not ideal when we need to pass a real configuration file anyway
Pass options through the env
map in the container specification
Example:
env:
- name: ADMIN_PORT
  value: "8080"
- name: ADMIN_AUTH
  value: Basic
- name: ADMIN_CRED
  value: "admin:0pensesame!"
value
must be a string! Make sure that numbers and fancy strings are quoted.
🤔 Why this weird {name: xxx, value: yyy}
scheme? It will be revealed soon!
In the previous example, environment variables have fixed values
We can also use a mechanism called the downward API
The downward API allows exposing pod or container information
either through special files (we won't show that for now)
or through environment variables
The value of these environment variables is computed when the container is started
Remember: environment variables won't (can't) change after container start
Let's see a few concrete examples!
- name: MY_POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
Useful to generate FQDN of services
(in some contexts, a short name is not enough)
For instance, the two commands should be equivalent:
curl api-backend
curl api-backend.$MY_POD_NAMESPACE.svc.cluster.local
- name: MY_POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
Useful if we need to know our IP address
(we could also read it from eth0
, but this is more solid)
- name: MY_MEM_LIMIT
  valueFrom:
    resourceFieldRef:
      containerName: test-container
      resource: limits.memory
Useful for runtimes where memory is garbage collected
Example: the JVM
(the memory available to the JVM should be set with the -Xmx
flag)
Best practice: set a memory limit, and pass it to the runtime
(see this blog post for a detailed example)
This documentation page tells more about these environment variables
And this one explains the other way to use the downward API
(through files that get created in the container filesystem)
Works great when the running program expects these variables
Works great for optional parameters with reasonable defaults
(since the container image can provide these defaults)
Sort of auto-documented
(we can see which environment variables are defined in the image, and their values)
Can be (ab)used with longer values ...
... You can put an entire Tomcat configuration file in an environment ...
... But should you?
(Do it if you really need to, we're not judging! But we'll see better ways.)
Sometimes, there is no way around it: we need to inject a full config file
Kubernetes provides a mechanism for that purpose: configmaps
A configmap is a Kubernetes resource that exists in a namespace
Conceptually, it's a key/value map
(values are arbitrary strings)
We can think about them in (at least) two different ways:
as holding entire configuration file(s)
as holding individual configuration parameters
Note: to hold sensitive information, we can use "Secrets", which are another type of resource behaving very much like configmaps. We'll cover them just after!
In this case, each key/value pair corresponds to a configuration file
Key = name of the file
Value = content of the file
There can be one key/value pair, or as many as necessary
(for complex apps with multiple configuration files)
Examples:
# Create a configmap with a single key, "app.conf"
kubectl create configmap my-app-config --from-file=app.conf

# Create a configmap with a single key, "app.conf", but another file
kubectl create configmap my-app-config --from-file=app.conf=app-prod.conf

# Create a configmap with multiple keys (one per file in the config.d directory)
kubectl create configmap my-app-config --from-file=config.d/
In this case, each key/value pair corresponds to a parameter
Key = name of the parameter
Value = value of the parameter
Examples:
# Create a configmap with two keys
kubectl create cm my-app-config \
        --from-literal=foreground=red \
        --from-literal=background=blue

# Create a configmap from a file containing key=val pairs
kubectl create cm my-app-config \
        --from-env-file=app.conf
Configmaps can be exposed as plain files in the filesystem of a container
this is achieved by declaring a volume and mounting it in the container
this is particularly effective for configmaps containing whole files
Configmaps can be exposed as environment variables in the container
this is achieved with the downward API
this is particularly effective for configmaps containing individual parameters
Let's see how to do both!
We will start a load balancer powered by HAProxy
We will use the official haproxy
image
It expects to find its configuration in /usr/local/etc/haproxy/haproxy.cfg
We will provide a simple HAProxy configuration, k8s/haproxy.cfg
It listens on port 80, and load balances connections between IBM and Google
Go to the k8s
directory in the repository:
cd ~/container.training/k8s
Create a configmap named haproxy
and holding the configuration file:
kubectl create configmap haproxy --from-file=haproxy.cfg
Check what our configmap looks like:
kubectl get configmap haproxy -o yaml
We are going to use the following pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
spec:
  volumes:
  - name: config
    configMap:
      name: haproxy
  containers:
  - name: haproxy
    image: haproxy
    volumeMounts:
    - name: config
      mountPath: /usr/local/etc/haproxy/
Apply the definition in k8s/haproxy.yaml, then check the pod's IP address:
kubectl apply -f ~/container.training/k8s/haproxy.yaml
kubectl get pod haproxy -o wide
IP=$(kubectl get pod haproxy -o json | jq -r .status.podIP)
The load balancer will send:
half of the connections to Google
the other half to IBM
curl $IP
curl $IP
curl $IP
We should see connections served by Google, and others served by IBM.
(Each server sends us a redirect page. Look at the URL that they send us to!)
We are going to run a Docker registry on a custom port
By default, the registry listens on port 5000
This can be changed by setting environment variable REGISTRY_HTTP_ADDR
We are going to store the port number in a configmap
Then we will expose that configmap as a container environment variable
Our configmap will have a single key, http.addr
:
kubectl create configmap registry --from-literal=http.addr=0.0.0.0:80
Check our configmap:
kubectl get configmap registry -o yaml
We are going to use the following pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: registry
spec:
  containers:
  - name: registry
    image: registry
    env:
    - name: REGISTRY_HTTP_ADDR
      valueFrom:
        configMapKeyRef:
          name: registry
          key: http.addr
Apply the definition in k8s/registry.yaml:
kubectl apply -f ~/container.training/k8s/registry.yaml
Check the IP address allocated to the pod:
kubectl get pod registry -o wide
IP=$(kubectl get pod registry -o json | jq -r .status.podIP)
Confirm that the registry is available on port 80:
curl $IP/v2/_catalog
For sensitive information, there is another special resource: Secrets
Secrets and Configmaps work almost the same way
(we'll expose the differences on the next slide)
The intent is different, though:
"You should use secrets for things which are actually secret like API keys, credentials, etc., and use config map for not-secret configuration data."
"In the future there will likely be some differentiators for secrets like rotation or support for backing the secret API w/ HSMs, etc."
(Source: the author of both features)
Secrets are base64-encoded when shown with kubectl get secrets -o yaml
keep in mind that this is just encoding, not encryption
it is very easy to automatically extract and decode secrets
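For instance, here is a minimal sketch (the secret name and key below are made up for illustration):

# Create a throwaway secret, then read the value back and decode it
kubectl create secret generic my-secret --from-literal=password=letmein
kubectl get secret my-secret -o jsonpath='{.data.password}' | base64 -d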
With RBAC, we can authorize a user to access configmaps, but not secrets
(since they are two different kinds of resources)
Stateful sets
(automatically generated title slide)
Stateful sets are a type of resource in the Kubernetes API
(like pods, deployments, services...)
They offer mechanisms to deploy scaled stateful applications
At a first glance, they look like deployments:
a stateful set defines a pod spec and a number of replicas R
it will make sure that R copies of the pod are running
that number can be changed while the stateful set is running
updating the pod spec will cause a rolling update to happen
But they also have some significant differences
Pods in a stateful set are numbered (from 0 to R-1) and ordered
They are started and updated in order (from 0 to R-1)
A pod is started (or updated) only when the previous one is ready
They are stopped in reverse order (from R-1 to 0)
Each pod know its identity (i.e. which number it is in the set)
Each pod can discover the IP address of the others easily
The pods can persist data on attached volumes
🤔 Wait a minute ... Can't we already attach volumes to pods and deployments?
Volumes are used for many purposes:
sharing data between containers in a pod
exposing configuration information and secrets to containers
accessing storage systems
Let's see examples of the latter usage
There are many types of volumes available:
public cloud storage (GCEPersistentDisk, AWSElasticBlockStore, AzureDisk...)
private cloud storage (Cinder, VsphereVolume...)
traditional storage systems (NFS, iSCSI, FC...)
distributed storage (Ceph, Glusterfs, Portworx...)
Using a persistent volume requires:
creating the volume out-of-band (outside of the Kubernetes API)
referencing the volume in the pod description, with all its parameters
Here is a pod definition using an AWS EBS volume (that has to be created first):
apiVersion: v1
kind: Pod
metadata:
  name: pod-using-my-ebs-volume
spec:
  containers:
  - image: ...
    name: container-using-my-ebs-volume
    volumeMounts:
    - mountPath: /my-ebs
      name: my-ebs-volume
  volumes:
  - name: my-ebs-volume
    awsElasticBlockStore:
      volumeID: vol-049df61146c4d7901
      fsType: ext4
Here is another example using a volume on an NFS server:
apiVersion: v1
kind: Pod
metadata:
  name: pod-using-my-nfs-volume
spec:
  containers:
  - image: ...
    name: container-using-my-nfs-volume
    volumeMounts:
    - mountPath: /my-nfs
      name: my-nfs-volume
  volumes:
  - name: my-nfs-volume
    nfs:
      server: 192.168.0.55
      path: "/exports/assets"
Their lifecycle (creation, deletion...) is managed outside of the Kubernetes API
(we can't just use kubectl apply/create/delete/... to manage them)
If a Deployment uses a volume, all replicas end up using the same volume
That volume must then support concurrent access
some volumes do (e.g. NFS servers support multiple read/write access)
some volumes support concurrent reads
some volumes support concurrent access for colocated pods
What we really need is a way for each replica to have its own volume
To abstract the different types of storage, a pod can use a special volume type
This type is a Persistent Volume Claim
A Persistent Volume Claim (PVC) is a resource type
(visible with kubectl get persistentvolumeclaims or kubectl get pvc)
A PVC is not a volume; it is a request for a volume
Using a Persistent Volume Claim is a two-step process:
creating the claim
using the claim in a pod (as if it were any other kind of volume)
A PVC starts by being Unbound (without an associated volume)
Once it is associated with a Persistent Volume, it becomes Bound
A Pod referring to an unbound PVC will not start
(but as soon as the PVC is bound, the Pod can start)
A Kubernetes controller continuously watches PV and PVC objects
When it notices an unbound PVC, it tries to find a satisfactory PV
("satisfactory" in terms of size and other characteristics; see next slide)
If no PV fits the PVC, a PV can be created dynamically
(this requires configuring a dynamic provisioner; more on that later)
Otherwise, the PVC remains unbound indefinitely
(until we manually create a PV or set up dynamic provisioning)
At the very least, the claim should indicate:
the size of the volume (e.g. "5 GiB")
the access mode (e.g. "read-write by a single pod")
Optionally, it can also specify a Storage Class
The Storage Class indicates:
which storage system to use (e.g. Portworx, EBS...)
extra parameters for that storage system
e.g.: "replicate the data 3 times, and use SSD media"
A Storage Class is yet another Kubernetes API resource
(visible with e.g. kubectl get storageclass
or kubectl get sc
)
It indicates which provisioner to use
(which controller will create the actual volume)
And arbitrary parameters for that provisioner
(replication levels, type of disk ... anything relevant!)
Storage Classes are required if we want to use dynamic provisioning
(but we can also create volumes manually, and ignore Storage Classes)
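As a rough sketch, a storage class could look like this (the provisioner and parameters below are illustrative, for EBS gp2 volumes; your storage system will use different values):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2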
Here is a minimal PVC:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Here is a Pod definition like the ones shown earlier, but using a PVC:
apiVersion: v1
kind: Pod
metadata:
  name: pod-using-a-claim
spec:
  containers:
  - image: ...
    name: container-using-a-claim
    volumeMounts:
    - mountPath: /my-vol
      name: my-volume
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-claim
The pods in a stateful set can define a volumeClaimTemplate
A volumeClaimTemplate will dynamically create one Persistent Volume Claim per pod (see the sketch after this list)
Each pod will therefore have its own volume
These volumes are numbered (like the pods)
When updating the stateful set (e.g. image upgrade), each pod keeps its volume
When pods get rescheduled (e.g. node failure), they keep their volume
(this requires a storage system that is not node-local)
These volumes are not automatically deleted
(when the stateful set is scaled down or deleted)
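Here is a minimal sketch of a stateful set using a volumeClaimTemplate (the names, image, and size are illustrative, not taken from the repository):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db        # headless service used for pod discovery
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: ...
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi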
A stateful set manages a number of identical pods
(like a Deployment)
These pods are numbered, and started/upgraded/stopped in a specific order
These pods are aware of their number
(e.g., #0 can decide to be the primary, and #1 can be secondary)
These pods can find the IP addresses of the other pods in the set
(through a headless service; see the sketch after this list)
These pods can each have their own persistent storage
(Deployments cannot do that)
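For reference, such a headless service could look like this (a sketch; the name and label are illustrative and would match the stateful set above):

apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None      # headless: no virtual IP; DNS returns the pod IPs
  selector:
    app: db

With a headless service, each pod of the set gets a stable DNS name of the form db-0.db.<namespace>.svc.cluster.local.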
Running a Consul cluster
(automatically generated title slide)
Here is a good use-case for Stateful sets!
We are going to deploy a Consul cluster with 3 nodes
Consul is a highly-available key/value store
(like etcd or Zookeeper)
One easy way to bootstrap a cluster is to tell each node:
the addresses of other nodes
how many nodes are expected (to know when quorum is reached)
After reading the Consul documentation carefully (and/or asking around), we figure out the minimal command-line to run our Consul cluster.
consul agent -data-dir=/consul/data -client=0.0.0.0 -server -ui \
    -bootstrap-expect=3 \
    -retry-join=X.X.X.X \
    -retry-join=Y.Y.Y.Y
Replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes
The same command-line can be used on all nodes (convenient!)
Since version 1.4.0, Consul can use the Kubernetes API to find its peers
This is called Cloud Auto-join
Instead of passing an IP address, we need to pass a parameter like this:
consul agent -retry-join "provider=k8s label_selector=\"app=consul\""
Consul needs to be able to talk to the Kubernetes API
We can provide a kubeconfig file
If Consul runs in a pod, it will use the service account of the pod
We need to create a service account for Consul
We need to create a role that can list and get pods
We need to bind that role to the service account
And of course, we need to make sure that Consul pods use that service account
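As a rough sketch, these resources could look like this (the names and namespace are illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: consul
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: consul
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: consul
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: consul
subjects:
- kind: ServiceAccount
  name: consul
  namespace: default    # the namespace where the Consul pods run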
The file k8s/consul.yaml defines the required resources
(service account, cluster role, cluster role binding, service, stateful set)
It has a few extra touches:
a podAntiAffinity prevents two pods from running on the same node (see the sketch after this list)
a preStop hook makes the pod leave the cluster when it is shut down gracefully
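For instance, the anti-affinity constraint mentioned above could look like this in the pod template of the stateful set (a sketch; the label value is illustrative):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: consul
      topologyKey: kubernetes.io/hostname   # "same node" = same hostname label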
This was inspired by this excellent tutorial by Kelsey Hightower. Some features from the original tutorial (TLS authentication between nodes and encryption of gossip traffic) were removed for simplicity.
Create the stateful set and associated service:
kubectl apply -f ~/container.training/k8s/consul.yaml
Check the logs as the pods come up one after another:
stern consul
kubectl exec consul-0 consul members
We haven't used a volumeClaimTemplate here
That's because we don't have a storage provider yet
(except if you're running this on your own and your cluster has one)
What happens if we lose a pod?
a new pod gets rescheduled (with an empty state)
the new pod tries to connect to the two others
it will be accepted (after 1-2 minutes of instability)
and it will retrieve the data from the other pods
What happens if we lose two pods?
manual repair will be required
we will need to instruct the remaining one to act solo
then rejoin new pods
What happens if we lose three pods? (aka all of them)
If we run Consul without persistent storage, backups are a good idea!
Local Persistent Volumes
(automatically generated title slide)
We want to run that Consul cluster and actually persist data
But we don't have a distributed storage system
We are going to use local volumes instead
(similar conceptually to hostPath
volumes)
We can use local volumes without installing extra plugins
However, they are tied to a node
If that node goes down, the volume becomes unavailable
We will deploy a Consul cluster with persistence
That cluster's StatefulSet will create PVCs
These PVCs will remain unbound¹ until we create local volumes manually
(we will basically do the job of the dynamic provisioner)
Then, we will see how to automate that with a dynamic provisioner
¹Unbound = without an associated Persistent Volume.
The labs in this section assume that we do not have a dynamic provisioner
If we do have one, we need to disable it
Check if we have a dynamic provisioner:
kubectl get storageclass
If the output contains a line with (default), run this command:
kubectl annotate sc storageclass.kubernetes.io/is-default-class- --all
Check again that it is no longer marked as (default)
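For instance, listing the storage classes again should now show no (default) marker:

kubectl get storageclass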
Create a new namespace:
kubectl create namespace orange
Switch to that namespace:
kns orange
Make sure to call that namespace orange: it is hardcoded in the YAML files.
We will use a slightly different YAML file
The only differences between that file and the previous one are:
a volumeClaimTemplate defined in the Stateful Set spec
the corresponding volumeMounts in the Pod spec
the namespace orange used for discovery of Pods
kubectl apply -f ~/container.training/k8s/persistent-consul.yaml
Check that we now have an unbound Persistent Volume Claim:
kubectl get pvc
We don't have any Persistent Volume:
kubectl get pv
The Pod consul-0 is not scheduled yet:
kubectl get pods -o wide
Hint: leave these commands running with -w in different windows.
In a Stateful Set, the Pods are started one by one
consul-1 won't be created until consul-0 is running
consul-0 has a dependency on an unbound Persistent Volume Claim
The scheduler won't schedule the Pod until the PVC is bound
(because the PVC might be bound to a volume that is only available on a subset of nodes; for instance, EBS volumes are tied to an availability zone)
Let's create 3 local directories (/mnt/consul) on node2, node3, node4
Then create 3 Persistent Volumes corresponding to these directories
Create the local directories:
for NODE in node2 node3 node4; do
  ssh $NODE sudo mkdir -p /mnt/consul
done
Create the PV objects:
kubectl apply -f ~/container.training/k8s/volumes-for-consul.yaml
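One of these PV objects might look roughly like this (a sketch; the actual definitions are in k8s/volumes-for-consul.yaml, and the size is illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: consul-node2
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  local:
    path: /mnt/consul
  nodeAffinity:            # local volumes must declare which node they live on
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node2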
The PVs that we created will be automatically matched with the PVCs
Once a PVC is bound, its pod can start normally
Once the pod consul-0 has started, consul-1 can be created, etc.
Eventually, our Consul cluster is up, and backed by "persistent" volumes
kubectl exec consul-0 consul members
The size of the Persistent Volumes is bogus
(it is used when matching PVs and PVCs together, but there is no actual quota or limit)
This specific example worked because we had exactly 1 free PV per node:
if we had created multiple PVs per node ...
we could have ended with two PVCs bound to PVs on the same node ...
which would have required two pods to be on the same node ...
which is forbidden by the anti-affinity constraints in the StatefulSet
To avoid that, we need to associate the PVs with a Storage Class that has:
volumeBindingMode: WaitForFirstConsumer
(this means that a PVC will be bound to a PV only after being used by a Pod)
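A sketch of such a storage class (this is the usual pattern for local volumes, with no dynamic provisioner behind it; the name is illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning; PVs are created manually
volumeBindingMode: WaitForFirstConsumer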
See this blog post for more details
It's not practical to manually create directories and PVs for each app
We could pre-provision a number of PVs across our fleet
We could even automate that with a Daemon Set:
creating a number of directories on each node
creating the corresponding PV objects
We also need to recycle volumes
... This can quickly get out of hand
We could also write our own provisioner, which would:
watch the PVCs across all namespaces
when a PVC is created, create a corresponding PV on a node
Or we could use one of the dynamic provisioners for local persistent volumes
(for instance the Rancher local path provisioner)
Remember, when a node goes down, the volumes on that node become unavailable
High availability will require another layer of replication
(like what we've just seen with Consul; or primary/secondary; etc)
Pre-provisioning PVs makes sense for machines with local storage
(e.g. cloud instance storage; or storage directly attached to a physical machine)
Dynamic provisioning makes sense for large numbers of applications
(when we can't or won't dedicate a whole disk to a volume)
It's possible to mix both (using distinct Storage Classes)
Static pods
(automatically generated title slide)
Hosting the Kubernetes control plane on Kubernetes has advantages:
we can use Kubernetes' replication and scaling features for the control plane
we can leverage rolling updates to upgrade the control plane
However, there is a catch:
deploying on Kubernetes requires the API to be available
the API won't be available until the control plane is deployed
How can we get out of that chicken-and-egg problem?
Since each component of the control plane can be replicated...
We could set up the control plane outside of the cluster
Then, once the cluster is fully operational, create replicas running on the cluster
Finally, remove the replicas that are running outside of the cluster
What could possibly go wrong?
What if anything goes wrong?
(During the setup or at a later point)
Worst case scenario, we might need to:
set up a new control plane (outside of the cluster)
restore a backup from the old control plane
move the new control plane to the cluster (again)
This doesn't sound like a great experience
Pods are started by kubelet (an agent running on every node)
To know which pods it should run, the kubelet queries the API server
The kubelet can also get a list of static pods from:
a directory containing one (or multiple) manifests, and/or
a URL (serving a manifest)
These "manifests" are basically YAML definitions
(As produced by kubectl get pod my-little-pod -o yaml)
Kubelet will periodically reload the manifests
It will start/stop pods accordingly
(i.e. it is not necessary to restart the kubelet after updating the manifests)
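For instance, when the kubelet is configured through a KubeletConfiguration file, the static pod directory is set like this (a sketch; the path shown is the conventional one used by kubeadm):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests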
When connected to the Kubernetes API, the kubelet will create mirror pods
Mirror pods are copies of the static pods
(so they can be seen with e.g. kubectl get pods)
We can run control plane components with these static pods
They can start without requiring access to the API server
Once they are up and running, the API becomes available
These pods are then visible through the API
(We cannot upgrade them from the API, though)
This is how kubeadm has initialized our clusters.
The API only gives us read-only access to static pods
We can kubectl delete a static pod...
...But the kubelet will re-mirror it immediately
Static pods can be selected just like other pods
(So they can receive service traffic)
A service can select a mixture of static and other pods
Once the control plane is up and running, it can be used to create normal pods
We can then set up a copy of the control plane in normal pods
Then the static pods can be removed
The scheduler and the controller manager use leader election
(Only one is active at a time; removing an instance is seamless)
Each instance of the API server adds itself to the kubernetes service
Etcd will typically require more work!
Alright, but what if the control plane is down and we need to fix it?
We restart it using static pods!
This can be done automatically with the Pod Checkpointer
The Pod Checkpointer automatically generates manifests of running pods
The manifests are used to restart these pods if API contact is lost
(More details in the Pod Checkpointer documentation page)
This technique is used by bootkube
Is it better to run the control plane in static pods, or normal pods?
If I'm a user of the cluster: I don't care, it makes no difference to me
What if I'm an admin, i.e. the person who installs, upgrades, repairs... the cluster?
If I'm using a managed Kubernetes cluster (AKS, EKS, GKE...) it's not my problem
(I'm not the one setting up and managing the control plane)
If I already picked a tool (kubeadm, kops...) to set up my cluster, the tool decides for me
What if I haven't picked a tool yet, or if I'm installing from scratch?
static pods = easier to set up, easier to troubleshoot, less risk of outage
normal pods = easier to upgrade, easier to move (if nodes need to be shut down)
The staticPodPath is /etc/kubernetes/manifests
ls -l /etc/kubernetes/manifests
We should see YAML files corresponding to the pods of the control plane.
Copy a manifest to the directory:
sudo cp ~/container.training/k8s/just-a-pod.yaml /etc/kubernetes/manifests
Check that it's running:
kubectl get pods
The output should include a pod named hello-node1.
In the manifest, the pod was named hello.
apiVersion: v1
kind: Pod
metadata:
  name: hello
  namespace: default
spec:
  containers:
  - name: hello
    image: nginx
The -node1 suffix was added automatically by kubelet.
If we delete the pod (with kubectl delete), it will be recreated immediately.
To delete the pod, we need to delete (or move) the manifest file.
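For example (assuming the manifest we copied earlier):

# Moving the manifest out of staticPodPath makes the kubelet stop the pod
sudo mv /etc/kubernetes/manifests/just-a-pod.yaml /tmp/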
Next steps
(automatically generated title slide)
Alright, how do I get started and containerize my apps?
Suggested containerization checklist:
And then it is time to look at orchestration!
Get a managed cluster from a major cloud provider (AKS, EKS, GKE...)
(price: $, difficulty: medium)
Hire someone to deploy it for us
(price: $$, difficulty: easy)
Do it ourselves
(price: $-$$$, difficulty: hard)
Yes, it is possible to have prod+dev in a single cluster
(and implement good isolation and security with RBAC, network policies...)
But it is not a good idea to do that for our first deployment
Start with a production cluster + at least a test cluster
Implement and check RBAC and isolation on the test cluster
(e.g. deploy multiple test versions side-by-side)
Make sure that all our devs have usable dev clusters
(whether it's a local minikube or a full-blown multi-node cluster)
Namespaces let you run multiple identical stacks side by side
Two namespaces (e.g. blue and green) can each have their own redis service
Each of the two redis services has its own ClusterIP
CoreDNS creates two entries, mapping to these two ClusterIP addresses: redis.blue.svc.cluster.local and redis.green.svc.cluster.local
Pods in the blue namespace get a search suffix of blue.svc.cluster.local
As a result, resolving redis from a pod in the blue namespace yields the "local" redis
This does not provide isolation! That would be the job of network policies.
(covers permissions model, user and service accounts management ...)
As a first step, it is wiser to keep stateful services outside of the cluster
Exposing them to pods can be done with multiple solutions (see the sketches after this list):
ExternalName services (redis.blue.svc.cluster.local will be a CNAME record)
ClusterIP services with explicit Endpoints
(instead of letting Kubernetes generate the endpoints from a selector)
Ambassador services
(application-level proxies that can provide credentials injection and more)
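Here are sketches of the first two options (names, namespaces, ports, and addresses are illustrative):

# Option 1: ExternalName service (DNS alias to a server outside the cluster)
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: blue
spec:
  type: ExternalName
  externalName: redis.acme.internal    # DNS name of the external Redis server
---
# Option 2: ClusterIP service with explicit Endpoints
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: green
spec:
  ports:
  - port: 6379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: redis           # must have the same name as the service
  namespace: green
subsets:
- addresses:
  - ip: 192.168.0.55    # address of the external Redis server
  ports:
  - port: 6379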
If we want to host stateful services on Kubernetes, we can use:
a storage provider
persistent volumes, persistent volume claims
stateful sets
Good questions to ask:
what's the operational cost of running this service ourselves?
what do we gain by deploying this stateful service on Kubernetes?
Relevant sections: Volumes | Stateful Sets | Persistent Volumes
Excellent blog post tackling the question: “Should I run Postgres on Kubernetes?”
Services are layer 4 constructs
HTTP is a layer 7 protocol
It is handled by ingresses (a different resource kind)
Ingresses allow, among other things, virtual host routing, path-based routing, and TLS termination
This section shows how to expose multiple HTTP apps using Træfik
Logging is delegated to the container engine
Logs are exposed through the API
Logs are also accessible through local files (/var/log/containers
)
Log shipping to a central platform is usually done through these files
(e.g. with an agent bind-mounting the log directory)
This section shows how to do that with Fluentd and the EFK stack
The kubelet embeds cAdvisor, which exposes container metrics
(cAdvisor might be separated in the future for more flexibility)
It is a good idea to start with Prometheus
(even if you end up using something else)
Starting from Kubernetes 1.8, we can use the Metrics API
Heapster was a popular add-on
(but is being deprecated starting with Kubernetes 1.11)
Two constructs are particularly useful: secrets and config maps
They let us expose arbitrary information to our containers
Avoid storing configuration in container images
(There are some exceptions to that rule, but it's generally a Bad Idea)
Never store sensitive information in container images
(It's the container equivalent of the password on a post-it note on your screen)
This section shows how to manage app config with config maps (among others)
The best deployment tool will vary, depending on:
A few examples:
Sorry Star Trek fans, this is not the federation you're looking for!
(If I add "Your cluster is in another federation" I might get a 3rd fandom wincing!)
Kubernetes master operation relies on etcd
etcd uses the Raft protocol
Raft recommends low latency between nodes
What if our cluster spreads to multiple regions?
Break it down in local clusters
Regroup them in a cluster federation
Synchronize resources across clusters
Discover resources across clusters
We've put this last, but it's pretty important!
How do you on-board a new developer?
What do they need to install to get a dev stack?
How does a code change make it from dev to prod?
How does someone add a component to a stack?
Links and resources
(automatically generated title slide)
All things Kubernetes:
All things Docker:
Everything else:
These slides (and future updates) are on → http://container.training/