Prometheus Helm Values Reference
Overview
This document provides the configuration reference for the Prometheus monitoring system deployed in the Fawkes platform. Prometheus collects metrics from platform services and applications for observability and alerting.
Helm Chart: prometheus-community/prometheus (Prometheus Community chart)
Chart Repository: https://prometheus-community.github.io/helm-charts
Values File Location: platform/apps/prometheus/values.yaml
Server Configuration
server.resources
Resource allocation for the Prometheus server.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.resources.requests.cpu | String | No | 100m | CPU request (millicores). |
| server.resources.requests.memory | String | No | 256Mi | Memory request. |
| server.resources.limits.cpu | String | No | 200m | CPU limit; usage above this is throttled. |
| server.resources.limits.memory | String | No | 512Mi | Memory limit; the container is OOM-killed if it exceeds this. |
Fawkes Defaults:
```yaml
server:
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m
```
Scaling Guidelines:
| Environment | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|
| Development | 100m | 256Mi | 200m | 512Mi |
| Staging | 200m | 512Mi | 500m | 1Gi |
| Production | 500m | 1Gi | 2 | 4Gi |
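For example, a staging override that applies the Staging row of the table above might look like the following. The file name values-staging.yaml is an assumption; Fawkes may organize per-environment overrides differently.

```yaml
# values-staging.yaml (hypothetical file name): Staging row of the scaling table
server:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi
```

Helm merges maps from successive `-f` values files, with later files taking precedence, so passing this file after the base values.yaml changes only the keys it sets.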
server.service
Kubernetes Service configuration for the Prometheus server.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.service.type | String | No | ClusterIP | Service type: ClusterIP, NodePort, or LoadBalancer. |
| server.service.port | Integer | No | 80 | Service port. |
| server.service.targetPort | Integer | No | 9090 | Container port. |
Fawkes Default:
```yaml
server:
  service:
    type: ClusterIP
```
Access Pattern: Prometheus is accessed via Ingress, not directly exposed.
server.persistentVolume
Persistent storage for metrics data.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.persistentVolume.enabled | Boolean | No | true | Enable persistent storage. |
| server.persistentVolume.size | String | No | 8Gi | Storage size. |
| server.persistentVolume.storageClass | String | No | standard | StorageClass name. |
| server.persistentVolume.accessModes | Array[String] | No | ["ReadWriteOnce"] | Volume access modes. |
Retention Recommendations:
| Retention Period | Recommended Size | Use Case |
|---|---|---|
| 7 days | 8Gi | Development |
| 15 days | 20Gi | Staging |
| 30 days | 50Gi | Production |
| 90 days | 150Gi | Long-term analysis |
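These sizes can be sanity-checked with the rough capacity formula from the Prometheus storage documentation, where compressed samples typically take 1 to 2 bytes each. The ingestion rate of 10,000 samples per second below is an assumed figure for illustration, not a measured Fawkes value; the actual rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[1h])`.

```latex
\begin{aligned}
\text{disk} &\approx t_{\text{retention}} \times r_{\text{samples}} \times b_{\text{sample}} \\
&= (30 \times 86{,}400\,\text{s}) \times 10{,}000\,\tfrac{\text{samples}}{\text{s}} \times 2\,\tfrac{\text{B}}{\text{sample}} \\
&= 5.184 \times 10^{10}\,\text{B} \approx 48.3\,\text{GiB}
\end{aligned}
```

At that assumed rate, the 30-day, 50Gi production recommendation leaves only a small margin, so a higher measured ingestion rate calls for a larger volume.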
Example:
```yaml
server:
  persistentVolume:
    enabled: true
    size: 50Gi
    storageClass: gp3
```
server.retention
Metrics retention configuration.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.retention | String | No | 15d | Time-based retention (e.g., 7d, 30d). |
| server.retentionSize | String | No | - | Size-based retention (e.g., 10GB, 50GB). If both limits are set, whichever is reached first triggers deletion of the oldest data. |
Example:
```yaml
server:
  retention: "30d"
  retentionSize: "45GB"
```
Alerting Configuration
alertmanager.enabled
Enable or disable Alertmanager integration.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| alertmanager.enabled | Boolean | No | true | Deploy Alertmanager for alert routing. |
Fawkes Default: true
alertmanagerFiles
Alert routing configuration.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| alertmanagerFiles.alertmanager.yml | Object | No | {} | Alertmanager configuration (receivers, routes). |
Example:
```yaml
alertmanagerFiles:
  alertmanager.yml:
    global:
      resolve_timeout: 5m
    route:
      group_by: ["alertname", "cluster", "service"]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: "mattermost"
    receivers:
      - name: "mattermost"
        webhook_configs:
          - url: "https://mattermost.fawkes.example.com/hooks/alerts"
```
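Alertmanager routes each alert down a tree: child routes under the top-level route are checked in order, and alerts that match none stay with the top-level receiver. Child routes inherit the parent's grouping settings unless they override them. A sketch of sending critical alerts to a separate channel, showing only the keys that change; the receiver name mattermost-critical, its webhook URL, and the severity label are assumptions rather than existing Fawkes settings.

```yaml
alertmanagerFiles:
  alertmanager.yml:
    route:
      receiver: "mattermost"
      routes:
        # Hypothetical child route: alerts labelled severity="critical"
        # match here; all other alerts stay with the parent receiver.
        - matchers:
            - severity="critical"
          receiver: "mattermost-critical"
          repeat_interval: 1h
    receivers:
      # receivers is a list, so an override must repeat existing entries.
      - name: "mattermost"
        webhook_configs:
          - url: "https://mattermost.fawkes.example.com/hooks/alerts"
      - name: "mattermost-critical"   # hypothetical receiver
        webhook_configs:
          # Hypothetical URL, shown for illustration only.
          - url: "https://mattermost.fawkes.example.com/hooks/critical-alerts"
```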
Scrape Configurations
serverFiles.prometheus.yml
Prometheus scrape configuration.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| serverFiles.prometheus.yml.scrape_configs | Array[Object] | No | Chart-provided jobs (see Default Scrape Targets) | List of scrape jobs. |
Default Scrape Targets:
| Job | Target | Metrics |
|---|---|---|
| prometheus | Prometheus itself | Self-monitoring metrics. |
| kubernetes-apiservers | Kubernetes API server | Control plane metrics. |
| kubernetes-nodes | Kubelet on each node | Node metrics (CPU, memory, disk). |
| kubernetes-pods | Pods with the prometheus.io/scrape: "true" annotation | Application metrics. |
| kubernetes-service-endpoints | Service endpoints | Service-level metrics. |
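For the kubernetes-pods job, a workload opts in through annotations on its pod template. A minimal sketch, assuming the chart's default relabeling also reads the optional prometheus.io/port and prometheus.io/path annotations; the Deployment name, image, and port 8080 are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Annotations go on the pod template, not the Deployment metadata,
        # because the kubernetes-pods job discovers pods.
        prometheus.io/scrape: "true"    # annotation values must be quoted strings
        prometheus.io/port: "8080"      # assumed metrics port
        prometheus.io/path: "/metrics"  # assumed metrics path
    spec:
      containers:
        - name: app
          image: example/app:1.0        # hypothetical image
          ports:
            - containerPort: 8080
```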
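Example Custom Scrape Job:

```yaml
serverFiles:
  prometheus.yml:
    scrape_configs:
      - job_name: "jenkins"
        static_configs:
          - targets: ["jenkins.jenkins.svc.cluster.local:8080"]
        metrics_path: "/prometheus"
        scrape_interval: 30s
```

Helm replaces lists instead of merging them, so a scrape_configs list set in a values file replaces all of the chart's default jobs listed above. Copy any default jobs you still need into the same list.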
Service Discovery
kubeStateMetrics.enabled
Enable Kube State Metrics for Kubernetes resource metrics.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| kubeStateMetrics.enabled | Boolean | No | true | Deploy kube-state-metrics. |
Metrics Collected:
- Deployment status (replicas, available replicas)
- Pod status (phase, container restarts, resource requests and limits)
- Node status (allocatable resources, conditions)
- PersistentVolumeClaim status (phase, capacity)
nodeExporter.enabled
Enable Node Exporter for node-level metrics.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| nodeExporter.enabled | Boolean | No | true | Deploy node-exporter DaemonSet. |
Metrics Collected:
- CPU usage and load average
- Memory usage (total, available, buffers, caches)
- Disk I/O and space
- Network traffic and errors
Ingress Configuration
server.ingress
Ingress configuration for accessing Prometheus UI.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.ingress.enabled | Boolean | No | false | Enable Ingress resource creation. |
| server.ingress.annotations | Object | No | {} | Ingress annotations (e.g., cert-manager.io/cluster-issuer). |
| server.ingress.hosts | Array[String] | No | [] | Hostnames for Ingress rules. |
| server.ingress.tls | Array[Object] | No | [] | TLS configuration. |
Example:
```yaml
server:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      nginx.ingress.kubernetes.io/auth-type: basic
      nginx.ingress.kubernetes.io/auth-secret: prometheus-basic-auth
    hosts:
      - prometheus.fawkes.example.com
    tls:
      - secretName: prometheus-tls
        hosts:
          - prometheus.fawkes.example.com
```
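The nginx.ingress.kubernetes.io/auth-secret annotation points to a Secret that ingress-nginx expects to hold htpasswd-formatted entries under the key auth. A sketch, assuming the Secret lives in the same namespace as the Ingress (the monitoring namespace is an assumption) and leaving the password hash as a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-basic-auth   # matches the auth-secret annotation
  namespace: monitoring         # assumed; must match the Ingress namespace
type: Opaque
stringData:
  # ingress-nginx reads the key named "auth"; the value is one htpasswd
  # line per user. Replace the placeholder with real output, for example
  # from `htpasswd -n admin`.
  auth: "admin:<htpasswd-hash>"
```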
Federation Configuration
server.global
Global Prometheus settings, including the external labels that identify this instance when it is federated.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.global.scrape_interval | String | No | 15s | Default scrape interval. |
| server.global.scrape_timeout | String | No | 10s | Default scrape timeout; must not exceed the scrape interval. |
| server.global.evaluation_interval | String | No | 15s | Rule evaluation interval. |
| server.global.external_labels | Object | No | {} | Labels attached to series and alerts sent to external systems (federation, remote storage, Alertmanager), e.g., cluster, environment. |
Example:
```yaml
server:
  global:
    scrape_interval: 15s
    evaluation_interval: 15s
    external_labels:
      cluster: "fawkes-prod"
      environment: "production"
```
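On the receiving side, a central Prometheus that federates from this instance would scrape its /federate endpoint, following the pattern in the Prometheus federation documentation; series pulled this way carry the cluster and environment labels set above. This snippet belongs in the central server's prometheus.yml, not in this chart's values, and the job name, match[] selector, target host, and credentials are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: "federate-fawkes-prod"      # hypothetical job name
    scrape_interval: 30s
    honor_labels: true                    # keep the source job/instance labels
    metrics_path: "/federate"
    params:
      "match[]":
        - '{job="kubernetes-pods"}'       # hypothetical selector
    scheme: https
    basic_auth:
      # Assumes the Ingress basic auth from the Ingress example; both values are hypothetical.
      username: "federation"
      password_file: /etc/prometheus/secrets/federation-password
    static_configs:
      - targets: ["prometheus.fawkes.example.com:443"]
```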
Security Configuration
server.podSecurityPolicy
Pod Security Policy for Prometheus server.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| server.podSecurityPolicy.enabled | Boolean | No | false | Enable PodSecurityPolicy (deprecated in Kubernetes 1.21 and removed in 1.25). |
Note: Use Kyverno policies instead for modern Kubernetes versions.
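As one sketch of such a replacement, the Kyverno ClusterPolicy below audits pods in the Prometheus namespace for runAsNonRoot. The policy name, the monitoring namespace, and the choice of check are assumptions for illustration, not an existing Fawkes policy.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: prometheus-require-non-root   # hypothetical policy name
spec:
  # Audit reports violations without blocking admission;
  # switch to Enforce once existing pods comply.
  validationFailureAction: Audit
  rules:
    - name: require-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - monitoring          # assumed Prometheus namespace
      validate:
        message: "Pods in the monitoring namespace must set spec.securityContext.runAsNonRoot to true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```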
Complete Example
```yaml
server:
  resources:
    requests:
      memory: 1Gi
      cpu: 500m
    limits:
      memory: 4Gi
      cpu: 2
  service:
    type: ClusterIP
  persistentVolume:
    enabled: true
    size: 50Gi
    storageClass: gp3
  retention: "30d"
  global:
    scrape_interval: 15s
    external_labels:
      cluster: "fawkes-prod"
  ingress:
    enabled: true
    hosts:
      - prometheus.fawkes.example.com
    tls:
      - secretName: prometheus-tls
        hosts:
          - prometheus.fawkes.example.com
alertmanager:
  enabled: true
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
```