[EKS] Autoscaling

728x90

Amazon EKS에서의 Autoscaling은 Kubernetes 자체의 Autoscaling 기능과 AWS의 클라우드 인프라를 결합하여 더욱 강력하고 유연한 확장성하기 때문에 리소스를 효율적으로 관리하고 비용을 최적화할 수 있습니다.

HPA (Horizontal Pod Autoscaler)
- CPU 사용량과 같은 메트릭을 기반으로 파드의 수를 자동으로 조정하여, 트래픽 변화에 따라 적절한 서비스 수준을 유지합니다.
KEDA (Kubernetes Event-Driven Autoscaling)
- 이벤트 기반의 소스로부터 메트릭을 수집하여 HPA를 확장합니다. 이를 통해 이벤트 기반의 작업량에 효과적으로 대응할 수 있습니다.
VPA (Vertical Pod Autoscaler)
- 파드의 CPU와 메모리 할당량을 자동으로 조정하여, 각 파드의 자원 사용을 최적화합니다.
Cluster Autoscaler
- 클러스터 자체의 크기를 동적으로 조정하여, 현재 필요에 따라 노드를 자동으로 추가하거나 제거합니다.
Cluster Proportional Autoscaler
- 클러스터 규모에 비례하여 파드의 복제본 수를 조정하는 컴포넌트입니다. 클러스터의 크기 변화에 따라 자동으로 스케일링합니다.
Karpenter
- Karpenter는 AWS에서 개발한 오픈 소스 자동 스케일링 도구로, 워크로드 요구 사항에 따라 적절한 유형과 수의 인스턴스를 신속하게 시작하고 관리합니다.

이러한 도구들은 EKS 환경에서 자원을 더욱 유연하게 조절하여 애플리케이션의 성능을 극대화하고 비용을 절감하는 데 중요한 역할을 합니다.

1. EKS Node Viewer & Metrics Server 설치

EKS Node Viewer란?
- EKS 클러스터의 각 노드에 대한 실시간 메트릭과 상태 정보를 제공하는 도구입니다.
- 사용자는 EKS Node Viewer를 통해 각 노드의 CPU, 메모리, 스토리지 사용량과 같은 중요 지표를 확인할 수 있어, 시스템의 건강과 성능을 모니터링하는 데 유용합니다.
Metrics Server란?
- 쿠버네티스 클러스터의 파드와 노드에서 리소스 사용 데이터를 수집하는 경량 서버입니다.
- 이 서버는 클러스터의 메트릭을 수집하여 HPA(Horizontal Pod Autoscaler)와 같은 오토스케일링 도구가 사용할 수 있도록 합니다.
- Metrics Server는 클러스터의 리소스 사용량을 파악하고, 부하에 따라 자동으로 스케일링을 결정하는 데 필수적인 서비스입니다.
- HPA(horizontal pod autoscaler)나 kubectl top 명령어를 사용하려면 metrics-server를 사용해야 합니다.
EKS Node Viewer와 Metrics Server의 설치 및 설정은 쿠버네티스 클러스터의 성능 모니터링과 자원 관리를 위해 필수적인 단계이며, 이를 통해 시스템 관리자와 개발자는 EKS 환경을 더 효과적으로 운영할 수 있습니다.

1-1) EKS Node Viewer 설치

노드 할당 가능 용량과 요청 request 리소스 표시, 실제 파드 리소스 사용량 X - 링크

# go 설치
wget https://go.dev/dl/go1.22.1.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version
go version go1.22.1 linux/amd64

# EKS Node Viewer 설치 : 약 2분 이상 소요
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest

# [신규 터미널] EKS Node Viewer 접속
cd ~/go/bin && ./eks-node-viewer
혹은
cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

명령 샘플
# Standard usage
./eks-node-viewer

# Display both CPU and Memory Usage
./eks-node-viewer --resources cpu,memory

# Karenter nodes only
./eks-node-viewer --node-selector "karpenter.sh/provisioner-name"

# Display extra labels, i.e. AZ
./eks-node-viewer --extra-labels topology.kubernetes.io/zone

# Specify a particular AWS profile and region
AWS_PROFILE=myprofile AWS_REGION=us-west-2

기본 옵션
# select only Karpenter managed nodes
node-selector=karpenter.sh/provisioner-name

# display both CPU and memory
resources=cpu,memory

./eks-node-viewer --resources cpu,memory

1-2) Metrics Server 설치

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

kubectl top node

1-3) krew 설치

kubernetes cli 도구 kubectl의 플러그인을 관리하는 패키지 매니저 krew를 사용하면, 손쉽게 kubernetes 관련 플러그인들을 설치가 가능합니다.

set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew

export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
kubectl krew

kubectl krew install neat

2. HPA (Horizontal Pod Autoscaler) 실습

실습 사전 준비
- Grafana에서 Horizontal Pod Autoscaler 용 대시보드를 생성합니다
- 17125_rev1.json : 대시보드 → Import : JSON 내용 복붙!

{
  "__inputs": [],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "6.1.6"
    },
    {
      "type": "panel",
      "id": "graph",
      "name": "Graph",
      "version": ""
    },
    {
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "1.0.0"
    },
    {
      "type": "panel",
      "id": "singlestat",
      "name": "Singlestat",
      "version": ""
    }
  ],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": 17125,
  "graphTooltip": 0,
  "id": null,
  "iteration": 1558717029334,
  "links": [],
  "panels": [
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "id": 5,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": true
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_desired_replicas{job=\"kube-state-metrics\", namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Desired Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 6,
        "y": 0
      },
      "id": 6,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": true
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\", namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Current Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 12,
        "y": 0
      },
      "id": 7,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_spec_min_replicas{job=\"kube-state-metrics\",  namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Min Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 18,
        "y": 0
      },
      "id": 8,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Max Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "$datasource",
      "fill": 0,
      "gridPos": {
        "h": 12,
        "w": 24,
        "x": 0,
        "y": 3
      },
      "id": 9,
      "legend": {
        "alignAsTable": false,
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "paceLength": 10,
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "repeat": null,
      "seriesOverrides": [
        {
          "alias": "Max",
          "color": "#C4162A"
        },
        {
          "alias": "Min",
          "color": "#1F60C4"
        }
      ],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_desired_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "Desired",
          "refId": "B"
        },
        {
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "Running",
          "refId": "C"
        },
        {
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "instant": false,
          "intervalFactor": 2,
          "legendFormat": "Max",
          "refId": "A"
        },
        {
          "expr": "kube_horizontalpodautoscaler_spec_min_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "instant": false,
          "intervalFactor": 2,
          "legendFormat": "Min",
          "refId": "D"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Replicas",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "10s",
  "schemaVersion": 18,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": null,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": null,
        "current": {},
        "datasource": "$datasource",
        "definition": "label_values(kube_horizontalpodautoscaler_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
        "hide": 0,
        "includeAll": false,
        "label": "Namespace",
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": "label_values(kube_horizontalpodautoscaler_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {},
        "datasource": "$datasource",
        "definition": "label_values(kube_horizontalpodautoscaler_labels{job=\"kube-state-metrics\", namespace=\"$namespace\"}, horizontalpodautoscaler)",
        "hide": 0,
        "includeAll": false,
        "label": "Name",
        "multi": false,
        "name": "horizontalpodautoscaler",
        "options": [],
        "query": "label_values(kube_horizontalpodautoscaler_labels{job=\"kube-state-metrics\", namespace=\"$namespace\"}, horizontalpodautoscaler)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "Kubernetes / Horizontal Pod Autoscaler",
  "uid": "alJY6yWZz",
  "version": 10,
  "description": "A quick and simple dashboard for viewing how your horizontal pod autoscaler is doing."
}

2-1) 부하 테스트용 Pod 생성

테스트용 Pod 생성

# Run and expose php-apache server
curl -s -O https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/php-apache.yaml
cat php-apache.yaml | yh
kubectl apply -f php-apache.yaml

# 확인
kubectl exec -it deploy/php-apache -- cat /var/www/html/index.php

# POD IP 확인
PODIP=$(kubectl get pod -l run=php-apache -o jsonpath={.items[0].status.podIP})
echo $PODIP

# 워커노드 접속
ssh -i ./test.pem ec2-user@10.143.7.111

# 워커노드에서 POD IP 호출 테스트
PODIP=10.143.8.178
curl -s $PODIP; echo

2-2) HPA 생성 및 부하 발생 후 오토 스케일링 테스트

증가 시 기본 대기 시간(30초), 감소 시 기본 대기 시간(5분) → 조정 가능

# Create the HorizontalPodAutoscaler : requests.cpu=200m - 알고리즘
# Since each pod requests 200 milli-cores by kubectl run, this means an average CPU usage of 100 milli-cores.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl describe hpa

# HPA 설정 확인
kubectl get hpa php-apache -o yaml | kubectl neat | yh

# 반복 접속 1 (파드1 IP로 접속) >> 증가 확인 후 중지
while true;do curl -s $PODIP; sleep 0.5; done

# 반복 접속 2 (서비스명 도메인으로 파드들 분산 접속) >> 증가 확인(몇개까지 증가되는가? 그 이유는?) 후 중지 >> 중지 5분 후 파드 갯수 감소 확인
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

3. KEDA (Kubernetes based Event Driven Autoscaler) 실습

KEDA AutoScaler 소개 - Docs DevOcean
기존의 HPA(Horizontal Pod Autoscaler)는 리소스(CPU, Memory) 메트릭을 기반으로 스케일 여부를 결정하게 됩니다.
반면에 KEDA는 특정 이벤트를 기반으로 스케일 여부를 결정할 수 있습니다.
예를 들어 airflow는 metadb를 통해 현재 실행 중이거나 대기 중인 task가 얼마나 존재하는지 알 수 있습니다.
이러한 이벤트를 활용하여 worker의 scale을 결정한다면 queue에 task가 많이 추가되는 시점에 더 빠르게 확장할 수 있습니다.

3-1) KEDA with Helm : 특정 이벤트(cron 등)기반의 파드 오토 스케일링

- Chart Grafana Cron SQS_Scale aws-sqs-queue

실습 사전 준비

Grafana에서 KEDA 용 대시보드를 생성합니다
keda-dashboard.json : 대시보드 → Import : JSON 내용 복붙!

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "description": "Visualize metrics provided by KEDA",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1653,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 8,
      "panels": [],
      "title": "Metric Server",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The total number of errors encountered for all scalers.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaledObject"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "keda-system/keda-operator-metrics-apiserver"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 0,
        "y": 1
      },
      "id": 4,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(job) (rate(keda_scaler_errors{}[5m]))",
          "legendFormat": "{{ job }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Total Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaler.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "prometheusScaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 8,
        "y": 1
      },
      "id": 3,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaler) (rate(keda_scaler_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\", scaler=~\"$scaler\"}[5m]))",
          "legendFormat": "{{ scaler }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaled object.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 16,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaledObject) (rate(keda_scaled_object_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\"}[5m]))",
          "legendFormat": "{{ scaledObject }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaled Object Errors",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 10,
      "panels": [],
      "title": "Scale Target",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The current value for each scaler’s metric that would be used by the HPA in computing the target average.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 11
      },
      "id": 5,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(metric) (keda_scaler_metrics_value{exported_namespace=~\"$namespace\", metric=~\"$metric\", scaledObject=\"$scaledObject\"})",
          "legendFormat": "{{ metric }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Metric Value",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 21,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineStyle": {
              "fill": "solid"
            },
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 20
      },
      "id": 13,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": "current_replicas",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "hide": false,
          "instant": false,
          "legendFormat": "max_replicas",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas (time based)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "continuous-GrYlRd"
          },
          "custom": {
            "fillOpacity": 70,
            "lineWidth": 0,
            "spanNulls": false
          },
          "mappings": [
            {
              "options": {
                "0": {
                  "color": "green",
                  "index": 0,
                  "text": "No scaling"
                }
              },
              "type": "value"
            },
            {
              "options": {
                "from": -200,
                "result": {
                  "color": "light-red",
                  "index": 1,
                  "text": "Scaling down"
                },
                "to": 0
              },
              "type": "range"
            },
            {
              "options": {
                "from": 0,
                "result": {
                  "color": "semi-dark-red",
                  "index": 2,
                  "text": "Scaling up"
                },
                "to": 200
              },
              "type": "range"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 28
      },
      "id": 16,
      "options": {
        "alignValue": "left",
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": false,
          "width": 0
        },
        "mergeValues": true,
        "rowHeight": 1,
        "showValue": "never",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "delta(kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}[1m])",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": ".",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Changes in replicas",
      "type": "state-timeline"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "percentage",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 36
      },
      "id": 15,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "/^current_replicas$/",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "9.5.2",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "instant": true,
          "legendFormat": "current_replicas",
          "range": false,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "hide": false,
          "instant": true,
          "legendFormat": "max_replicas",
          "range": false,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas",
      "type": "gauge"
    }
  ],
  "refresh": "1m",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "bhe-test",
          "value": "bhe-test"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active,exported_namespace)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active,exported_namespace)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
        "hide": 0,
        "includeAll": true,
        "multi": true,
        "name": "scaledObject",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "cronScaler",
          "value": "cronScaler"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "scaler",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "s0-cron-Etc-UTC-40xxxx-55xxxx",
          "value": "s0-cron-Etc-UTC-40xxxx-55xxxx"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "metric",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "KEDA",
  "uid": "asdasd8rvmMxdVk",
  "version": 8,
  "weekStart": ""
}

KEDA Helm install

# KEDA 설치
cat <<EOT > keda-values.yaml
metricsServer:
  useHostNetwork: true

prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  operator:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true

  webhooks:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus webhooks
      enabled: true
EOT

kubectl create namespace keda
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --version 2.13.0 --namespace keda -f keda-values.yaml

KEDA 설치 확인

# KEDA 설치 확인
kubectl get all -n keda
kubectl get validatingwebhookconfigurations keda-admission
kubectl get validatingwebhookconfigurations keda-admission | kubectl neat | yh
kubectl get crd | grep keda

keda 네임스페이스에 디플로이먼트 생성

kubectl apply -f php-apache.yaml -n keda
kubectl get pod -n keda

ScaledObject 정책 생성 : cron

# ScaledObject 정책 생성 : cron
cat <<EOT > keda-cron.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  minReplicaCount: 0
  maxReplicaCount: 2
  pollingInterval: 30
  cooldownPeriod: 300
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"
EOT
kubectl apply -f keda-cron.yaml -n keda

위 keda-cron.yaml 구성은 KEDA를 사용하여 php-apache 배포를 위한 ScaledObject를 정의합니다.
특정 시간에 따라 파드의 수를 자동으로 조정합니다. minReplicaCount와 maxReplicaCount는 각각 최소 및 최대 복제본 수를 정의합니다.
pollingInterval은 KEDA가 이 구성을 폴링하는 시간 간격이며, cooldownPeriod는 스케일 다운 작업 사이의 시간 간격입니다.
triggers 섹션에서는 cron 트리거를 사용하여 서울 시간대 기준 매 15분마다 시작하여 5분간 작업을 수행하도록 설정하고, 이 기간 동안 desiredReplicas를 1로 유지하여 파드가 스케일 업되도록 합니다.

Grafana에서 위와 같이 00분, 15분에 php-apache pod가 생성되는 것을 확인 가능합니다.

4. VPA (Vertical Pod Autoscaler) 실습

VPA - 링크 : pod resources.request을 최대한 최적값으로 수정, HPA와 같이 사용 불가능, 수정 시 파드 재실행

실습 사전 준비

Grafana에서 대시보드 생성 - 링크 14588

# 코드 다운로드
git clone https://github.com/kubernetes/autoscaler.git
cd ~/autoscaler/vertical-pod-autoscaler/
tree hack

# openssl 버전 확인
openssl version
OpenSSL 1.0.2k-fips  26 Jan 2017

# openssl 1.1.1 이상 버전 확인
yum install openssl11 -y
openssl11 version
OpenSSL 1.1.1g FIPS  21 Apr 2020

# 스크립트파일내에 openssl11 수정
sed -i 's/openssl/openssl11/g' ~/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh

# Deploy the Vertical Pod Autoscaler to your cluster with the following command.
watch -d kubectl get pod -n kube-system
cat hack/vpa-up.sh
./hack/vpa-up.sh
kubectl get crd | grep autoscaling
kubectl get mutatingwebhookconfigurations vpa-webhook-config
kubectl get mutatingwebhookconfigurations vpa-webhook-config -o json | jq

# 모니터링
watch -d "kubectl top pod;echo "----------------------";kubectl describe pod | grep Requests: -A2"

# 공식 예제 배포
cd ~/autoscaler/vertical-pod-autoscaler/
cat examples/hamster.yaml | yh
kubectl apply -f examples/hamster.yaml && kubectl get vpa -w

VPA은 설정해 놓은 request, limit 값보다 큰 리소스가 요구가 되면 기존 Pod의 리소스 할당량을 조정하기 위해 해당 Pod를 재시작하면서 새로운 리소스 할당량을 적용합니다.
이 과정에서 기존 Pod는 종료되고, 새로운 리소스 할당량을 가진 Pod가 생성되는 것을 확인할 수있습니다.
위 캡쳐와 같이 특정 리소스가 요구되었을 때 pod가 재생성된 것을 확인하실 수 있습니다.

5. CA (Cluster Autoscaler) 실습

https://catalog.us-east-1.prod.workshops.aws/workshops/9c0aa9ab-90a9-44a6-abe1-8dff360ae428/ko-KR/100-scaling/200-cluster-scaling

Cluster Autoscale 동작을 하기 위한 cluster-autoscaler 파드(디플로이먼트)를 배치합니다.
Cluster Autoscaler(CA)는 pending 상태인 파드가 존재할 경우, 워커 노드를 스케일 아웃합니다.
특정 시간을 간격으로 사용률을 확인하여 스케일 인/아웃을 수행합니다. 그리고 AWS에서는 Auto Scaling Group(ASG)을 사용하여 Cluster Autoscaler를 적용합니다.

사전 확인
- EKS 노드에 이미 아래 tag가 들어가 있음을 확인합니다.
- k8s.io/cluster-autoscaler/enabled : true
- k8s.io/cluster-autoscaler/myeks : owned

aws ec2 describe-instances  --filters Name=tag:eks:nodegroup-name,Values=junho-test-eks
ng --query "Reservations[*].Instances[*].Tags[*]" --output yaml | yh

현재 autoscaling(ASG) 정보 확인

aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table

MaxSize 6개로 수정

# MaxSize 6개로 수정
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 3 --desired-capacity 3 --max-size 6

# 확인
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

배포 : Deploy the Cluster Autoscaler (CA)

CLUSTER_NAME=junho-test-eks
curl -s -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
sed -i "s/<YOUR CLUSTER NAME>/$CLUSTER_NAME/g" cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml

생성 오류 발생
authorized to perform: autoscaling:DescribeAutoScalingGroups

NodeGroup Role 다음과 같이 AutoSacling policy 추가

생성확인

kubectl get pod -n kube-system | grep cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler | grep node-group-auto-discovery

(옵션) cluster-autoscaler 파드가 동작하는 워커 노드가 퇴출(evict) 되지 않게 설정

kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

SCALE A CLUSTER WITH Cluster Autoscaler(CA) - 링크

# 모니터링 
kubectl get nodes -w
while true; do kubectl get node; echo "------------------------------" ; date ; sleep 1; done
while true; do aws ec2 describe-instances --query "Reservations[*].Instances[*].{PrivateIPAdd:PrivateIpAddress,InstanceName:Tags[?Key=='Name']|[0].Value,Status:State.Name}" --filters Name=instance-state-name,Values=running --output text ; echo "------------------------------"; date; sleep 1; done

5-1) 부하 테스트용 Pod 생성

# Deploy a Sample App
# We will deploy an sample nginx application as a ReplicaSet of 1 Pod
cat <<EoF> nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EoF

kubectl apply -f nginx.yaml
kubectl get deployment/nginx-to-scaleout

# Scale our ReplicaSet
# Let’s scale out the replicaset to 15
kubectl scale --replicas=15 deployment/nginx-to-scaleout && date

scale-out을 통해 Pod의 replicas를 15개로 늘리면서 node의 리소스 요청량이 많아진 모습을 확인 가능합니다.

ASG를 통해 Worker Node의 수를 늘리는 모습을 확인 가능합니다.

cluster-autoscaler를 통해 ASG의 Worker Node의 수를 늘려 해당 Pod가 모두 배포된 모습을 확인할 수 있습니다.

위 실습 중 디플로이먼트 삭제 후 10분 후 노드 갯수 축소되는 것을 확인

# Cluster Autoscaler 삭제
kubectl delete -f cluster-autoscaler-autodiscover.yaml

5-2) CA 문제점

하나의 자원에 대해 두군데 (AWS ASG vs AWS EKS)에서 각자의 방식으로 관리 ⇒ 관리 정보가 서로 동기화되지 않아 다양한 문제 발생합니다.
CA 문제점 : ASG에만 의존하고 노드 생성/삭제 등에 직접 관여하지 않습니다.
EKS에서 노드를 삭제 해도 인스턴스는 삭제되지 않습니다.
노드 축소 될 때 특정 노드가 축소 되도록 하기 매우 어려운 문제가 발생합니다.
- pod이 적은 노드 먼저 축소, 이미 드레인 된 노드 먼저 축소
특정 노드를 삭제 하면서 동시에 노드 개수를 줄이기 어려운 문제가 발생합니다.
- 줄일때 삭제 정책 옵션이 다양하지 않습니다.
- 정책 미지원 시 삭제 방식(예시) : 100대 중 미삭제 EC2 보호 설정 후 삭제 될 ec2의 파드를 이주 후 scaling 조절로 삭제 후 원복해야 합니다.
특정 노드를 삭제하면서 동시에 노드 개수를 줄이기 어렵습니다.
폴링 방식이기에 너무 자주 확장 여유를 확인 하면 API 제한에 도달할 수 있습니다.
스케일링 속도가 매우 느립니다.
Cluster Autoscaler 는 쿠버네티스 클러스터 자체의 오토 스케일링을 의미하며, 수요에 따라 워커 노드를 자동으로 추가하는 기능입니다.
언뜻 보기에 클러스터 전체나 각 노드의 부하 평균이 높아졌을 때 확장으로 보입니다. → 함정! 🚧
Pending 상태의 파드가 생기는 타이밍에 처음으로 Cluster Autoscaler 이 동작합니다.
- 즉, Request 와 Limits 를 적절하게 설정하지 않은 상태에서는 실제 노드의 부하 평균이 낮은 상황에서도 스케일 아웃이 되거나, 부하 평균이 높은 상황임에도 스케일 아웃이 되지 않습니다!
기본적으로 리소스에 의한 스케줄링은 Requests(최소)를 기준으로 이루어진다. 다시 말해 Requests 를 초과하여 할당한 경우에는 최소 리소스 요청만으로 리소스가 꽉 차 버려서 신규 노드를 추가해야만 하며, 이때 실제 컨테이너 프로세스가 사용하는 리소스 사용량은 고려되지 않습니다.
반대로 Request 를 낮게 설정한 상태에서 Limit 차이가 나는 상황을 생각해보면 각 컨테이너는 Limits 로 할당된 리소스를 최대로 사용게 되는데 리소스 사용량이 높아졌더라도 Requests 합계로 보면 아직 스케줄링이 가능하기 때문에 클러스터가 스케일 아웃하지 않는 상황이 발생합니다.
여기서는 CPU 리소스 할당을 예로 설명했지만 메모리의 경우도 마찬가지입니다.

6. CPA (Cluster Proportional Autoscaler) 실습

노드 수 증가에 비례하여 성능 처리가 필요한 애플리케이션(컨테이너/파드)를 수평으로 자동 확장 ex. coredns - Github
CPA(Cluster Proportional Autoscaler)는 즉, Cluster 내의 Node수나 총 CPU, Memory 등의 크기에 따라 애플리케이션를 Scaling하는 도구입니다.

6-1) CPA 설치

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler

# CPA규칙을 설정하고 helm차트를 릴리즈 필요
helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler

# nginx 디플로이먼트 배포
cat <<EOT > cpa-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            cpu: "100m"
            memory: "64Mi"
          requests:
            cpu: "100m"
            memory: "64Mi"
        ports:
        - containerPort: 80
EOT
kubectl apply -f cpa-nginx.yaml

# CPA 규칙 설정
cat <<EOF > cpa-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [1, 1]
      - [2, 2]
      - [3, 3]
      - [4, 3]
      - [5, 5]
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF

# helm 업그레이드
helm upgrade --install cluster-proportional-autoscaler -f cpa-values.yaml cluster-proportional-autoscaler/cluster-proportional-autoscaler

6-2) 노드 3개에서 5개로 증가 후 Pod 개수 확인 테스트

# 노드 5개로 증가
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 5 --desired-capacity 5 --max-size 5

# 확인
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

기존엔 Node의 수가 는 3개이고, Pod의 수도 3개였지만 Node가 5개로 늘면서 nginx Pod 또한 5개로 늘어난 것을 확인 가능합니다.

6-3) 노드 5개에서 4개로 감소 후 Pod 개수 확인 테스트

# 노드 4개로 감소
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 4 --desired-capacity 4 --max-size 4

# 확인
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='junho-test-eks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

설정한 CPA 정책에 맞게 Node의 수가 4개가 되었으니 5개였던 Pod의 수가 3개로 감소되었음을 확인 가능합니다.

7. Karpenter (K8S Native AutoScaler) 실습

(Terraform으로 Karpenter 테스트 실패.. 추후 보완예정)

Karpenter를 이용하여 관리할 WorkerNode에서 사용할 Role에 다음과 같은 Policy를 추가합니다.

AmazonEKSWorkerNodePolicy
AmazonEKS_CNI_Policy
AmazonEC2ContainerRegistryReadOnly
AmazonSSMManagedInstanceCore

SSMManagedInstanceCore권한은 Karpenter가 공용 SSM 매개변수 저장소에서 최신 EKS 최적화 AMI를 가져오는 데 사용됩니다.

Karpenter 에 필요한 Role 및 IAM Profile 생성 (w. terraform)

data "aws_iam_policy_document" "karpenter_role_assume_role_policy" {
  statement {
    sid     = ""
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.oidc_cluster.url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.oidc_cluster.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.oidc_cluster.arn]
    }
  }
}

resource "aws_iam_role" "karpenter_role" {
  assume_role_policy = data.aws_iam_policy_document.karpenter_role_assume_role_policy.json
  name               = "${local.company_name}-karpenter-role"

  depends_on = [aws_iam_openid_connect_provider.oidc_cluster]

  tags = {
    "ServiceAccount"          = "karpenter"
    "ServiceAccountNameSpace" = "karpenter"
  }
}

data "aws_iam_policy_document" "karpenter_policy_document" {
  statement {
    actions = [
      "ssm:GetParameter",
      "ec2:DescribeImages",
      "ec2:RunInstances",
      "ec2:DescribeSubnets",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeLaunchTemplates",
      "ec2:DescribeInstances",
      "ec2:DescribeInstanceTypes",
      "ec2:DescribeInstanceTypeOfferings",
      "ec2:DescribeAvailabilityZones",
      "ec2:DeleteLaunchTemplate",
      "ec2:CreateTags",
      "ec2:CreateLaunchTemplate",
      "ec2:CreateFleet",
      "ec2:DescribeSpotPriceHistory",
      "pricing:GetProducts"
    ]
    effect    = "Allow"
    resources = ["*"]
    sid       = "Karpenter"
  }

  statement {
    actions   = ["ec2:TerminateInstances"]
    effect    = "Allow"
    resources = ["*"]
    sid       = "ConditionalEC2Termination"
    condition {
      test     = "StringLike"
      variable = "ec2:ResourceTag/karpenter.sh/provisioner-name"
      values   = ["*"]
    }
  }

  statement {
    actions   = ["iam:PassRole"]
    effect    = "Allow"
    resources = ["arn:aws:iam::${var.AWS_ACCOUNT_ID}:role/${aws_iam_role.worker_ng_role.name}"]
    sid       = "PassNodeIAMRole"
  }

  statement {
    actions   = ["eks:DescribeCluster"]
    effect    = "Allow"
    resources = ["arn:aws:eks:${var.AWS_REGION}:${var.AWS_ACCOUNT_ID}:cluster/${aws_eks_cluster.cluster.name}"]
    sid       = "EKSClusterEndpointLookup"
  }
}


resource "aws_iam_policy" "karpenter_policy" {
  name   = "${local.company_name}-karpenter-policy"
  policy = data.aws_iam_policy_document.karpenter_policy_document.json
}

resource "aws_iam_role_policy_attachment" "karpenter_policy_attach" {
  policy_arn = aws_iam_policy.karpenter_policy.arn
  role       = aws_iam_role.karpenter_role.name
}

resource "aws_iam_instance_profile" "karpenter_node_instance_role_profile" {
  name = "${local.company_name}-karpenter-role"
  role = aws_iam_role.karpenter_role.name
}

Karpenter CRD & Helm chart 설치

resource "kubernetes_namespace" "karpenter" {
  metadata {
    name = "karpenter"
  }
}

resource "helm_release" "karpenter_crd" {
  namespace        = kubernetes_namespace.karpenter.id
  create_namespace = true
  name                = "karpenter-crd"
  repository          = "oci://public.ecr.aws/karpenter"
  chart               = "karpenter-crd"
  version             = "v0.32.1"
  replace             = true
  force_update        = true

  depends_on = [module.eks]
}

resource "helm_release" "karpenter" {
  namespace = kubernetes_namespace.karpenter.id
  name       = "karpenter"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "v0.32.1"

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.eks.karpenter_role_arn
  }

  set {
    name  = "settings.aws.clusterName"
    value = module.eks.eks_name
  }

  set {
    name  = "settings.aws.clusterEndpoint"
    value = module.eks.eks_host
  }

  set {
    name  = "settings.aws.defaultInstanceProfile"
    value = module.eks.karpenter_node_instance_role_profile.name
  }

  values = [
    templatefile("${path.module}/custom-yaml/karpenter_values.yaml",
      {
        NodeGroup_Name = module.eks.aws_eks_node_group_name,
      }
    )
  ]

  depends_on = [module.eks, helm_release.karpenter_crd]
}

export K8S_VERSION="1.28"
export ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"

export CLUSTER_NAME=junho-test-eks
export NODE_GROUP_ROLE=junho-test-role-eksng

cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "${NODE_GROUP_ROLE}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
#   - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI 
#   - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.
EOF

Scale up deployment

KARPENTER_NAMESPACE=karpenter

# pause 파드 1개에 CPU 1개 최소 보장 할당
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF

# Scale up
kubectl get pod
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller | jq '.'

다음과 같은오류 발생

"error":"no subnets exist given constraints [{map[karpenter.sh/discovery:junho-test-eks] }]; no security groups exist given constraints

junho-karpenter-role/1712412563536894548 is not authorized to perform: iam:GetInstanceProfile on resource

오류 해결
subnet 및 SG에 해당 Tag값 추가 (karpenter.sh/discovery:junho-test-eks)

다음과 같은오류 발생

{"level":"ERROR","time":"2024-04-06T15:09:01.485Z","logger":"controller","message":"Reconciler error","commit":"1072d3b","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"default-t5ln7"},"namespace":"","name":"default-t5ln7","reconcileID":"39287041-5c9c-481c-9540-78cb10619c6a","error":"launching nodeclaim, creating instance, with fleet error(s), AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances."}

ServiceLinkedRoleCreateNotPermitted 오류가 발생하여 iam 관련 권한을 이것저것 넣어봤지만, 실패..
추후.. 다시 Kapenter는 테스트 예정입니다.

728x90

저작자표시 비영리 변경금지

'EKS' 카테고리의 다른 글

[EKS] CI/CD (Jenkins, ArgoCD) (1)	2024.04.20
[EKS] Security (0)	2024.04.14
[EKS] Observability (0)	2024.03.28
[EKS] AWS EKS의 Storage & Nodegroup (1)	2024.03.23
[EKS] AWS EKS의 Networking (VPC CNI, LB Controller) (0)	2024.03.16

준호의 발자취

[EKS] Autoscaling

1. EKS Node Viewer & Metrics Server 설치

1-1) EKS Node Viewer 설치

1-2) Metrics Server 설치

1-3) krew 설치

2. HPA (Horizontal Pod Autoscaler) 실습

2-1) 부하 테스트용 Pod 생성

2-2) HPA 생성 및 부하 발생 후 오토 스케일링 테스트

3. KEDA (Kubernetes based Event Driven Autoscaler) 실습

3-1) KEDA with Helm : 특정 이벤트(cron 등)기반의 파드 오토 스케일링

- Chart Grafana Cron SQS_Scale aws-sqs-queue

4. VPA (Vertical Pod Autoscaler) 실습

5. CA (Cluster Autoscaler) 실습

5-1) 부하 테스트용 Pod 생성

5-2) CA 문제점

6. CPA (Cluster Proportional Autoscaler) 실습

6-1) CPA 설치

6-2) 노드 3개에서 5개로 증가 후 Pod 개수 확인 테스트

6-3) 노드 5개에서 4개로 감소 후 Pod 개수 확인 테스트

7. Karpenter (K8S Native AutoScaler) 실습

'EKS' 카테고리의 다른 글

티스토리툴바

[EKS] Autoscaling

1. EKS Node Viewer & Metrics Server 설치

1-1) EKS Node Viewer 설치

1-2) Metrics Server 설치

1-3) krew 설치

2. HPA (Horizontal Pod Autoscaler) 실습

2-1) 부하 테스트용 Pod 생성

2-2) HPA 생성 및 부하 발생 후 오토 스케일링 테스트

3. KEDA (Kubernetes based Event Driven Autoscaler) 실습

3-1) KEDA with Helm : 특정 이벤트(cron 등)기반의 파드 오토 스케일링

- Chart Grafana Cron SQS_Scale aws-sqs-queue

4. VPA (Vertical Pod Autoscaler) 실습

5. CA (Cluster Autoscaler) 실습

5-1) 부하 테스트용 Pod 생성

5-2) CA 문제점

6. CPA (Cluster Proportional Autoscaler) 실습

6-1) CPA 설치

6-2) 노드 3개에서 5개로 증가 후 Pod 개수 확인 테스트

6-3) 노드 5개에서 4개로 감소 후 Pod 개수 확인 테스트

7. Karpenter (K8S Native AutoScaler) 실습

'EKS' 카테고리의 다른 글

'EKS' Related Articles

티스토리툴바