Next.js 16 consuming 1+ CPU core per pod at idle on k3s - constant crash loops
Unanswered
Pacific herring posted this in #help-forum
Pacific herring (OP)
I'm running Next.js 16.0.10 in production on a k3s cluster and experiencing severe performance issues that I didn't have before migrating to Kubernetes.
The problem:
* Each pod consumes ~1100m CPU (1+ core) constantly, even with zero traffic
* This causes readiness/liveness probes to timeout → pod restarts
* 124+ restarts in 22 hours, creating an endless crash loop
* The app starts fine (Ready in 153ms) but immediately spins CPU to 100%
Current metrics (with 0 traffic):
NAME CPU(cores) MEMORY(bytes)
web-app-xxx 1098m 339Mi
web-app-yyy 1177m 280Mi
Inside the pod (top):
PID 1 next-server 29% CPU VSZ 11.1g
Deployment config (rough sketch of the relevant manifest below):
* Resources: 500m CPU request, 2Gi memory limit
* NODE_OPTIONS=--max-old-space-size=1536
* Using emptyDir for .next/cache (20Gi limit)
* Production build with output: 'standalone'
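Roughly, the relevant fragment of the Deployment spec looks like this (container name, image, and mount path are placeholders, not the real values):
```yaml
# Fragment of spec.template.spec -- names, image, and paths are placeholders
containers:
  - name: web-app
    image: registry.example.com/web-app:latest  # image built from the standalone output, runs node server.js
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=1536"
    resources:
      requests:
        cpu: 500m
      limits:
        memory: 2Gi
    volumeMounts:
      - name: next-cache
        mountPath: /app/.next/cache   # .next/cache backed by the emptyDir below
volumes:
  - name: next-cache
    emptyDir:
      sizeLimit: 20Gi
```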
What I've tried:
* Adjusting probe timeouts (no effect; see the probe sketch after this list)
* Lowering/raising memory limits
* Scaling to 1 pod vs multiple pods (same behavior)
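To be concrete about the probe-timeout attempts above, the adjustments were along these lines (path, port, and numbers are illustrative, not the exact values):
```yaml
# Illustrative probe settings only -- shows which fields were being tuned
readinessProbe:
  httpGet:
    path: /
    port: 3000
  periodSeconds: 10
  timeoutSeconds: 5        # raised from the 1s default
  failureThreshold: 6
livenessProbe:
  httpGet:
    path: /
    port: 3000
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 6
```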
This is a production app that's currently unusable. The app runs perfectly fine locally in development and when I build it locally with next build && next start, so I have no way to reproduce this behavior outside of the k3s environment. I'm stuck debugging in production, which is not ideal.
Any insights would be greatly appreciated. I can provide additional logs, configs, or metrics if needed.
2 Replies
Pacific herring (OP)
I'm getting lots of error logs like this:
⨯ Error: {"message":"TypeError: fetch failed","details":"TypeError: fetch failed\n\nCaused by: AggregateError: (ETIMEDOUT)\nAggregateError: \n at internalConnectMultiple (node:net:1122:18)\n at internalConnectMultiple (node:net:1190:5)\n at Timeout.internalConnectMultipleTimeout (node:net:1716:5)\n at listOnTimeout (node:internal/timers:583:11)\n at process.processTimers (node:internal/timers:519:7)","hint":"","code":""}
at ignore-listed frames {
digest: '3713074019'
}
⨯ Error: {"message":"TypeError: fetch failed","details":"TypeError: fetch failed\n\nCaused by: AggregateError: (ETIMEDOUT)\nAggregateError: \n at internalConnectMultiple (node:net:1122:18)\n at internalConnectMultiple (node:net:1190:5)\n at Timeout.internalConnectMultipleTimeout (node:net:1716:5)\n at listOnTimeout (node:internal/timers:583:11)\n at process.processTimers (node:internal/timers:519:7)","hint":"","code":""}
at ignore-listed frames {
digest: '3713074019'
}
⨯ Error: {"message":"TypeError: fetch failed","details":"TypeError: fetch failed\n\nCaused by: AggregateError: (ETIMEDOUT)\nAggregateError: \n at internalConnectMultiple (node:net:1122:18)\n at internalConnectMultiple (node:net:1190:5)\n at Timeout.internalConnectMultipleTimeout (node:net:1716:5)\n at listOnTimeout (node:internal/timers:583:11)\n at process.processTimers (node:internal/timers:519:7)","hint":"","code":""}
at ignore-listed frames {
digest: '3713074019'
}Saint Hubert Jura Hound
Try requesting 2 cores, don't set any mem or cpu limits, also remove the liveness probe for now, lemme know what happens
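Something like this in the container spec, as a sketch (container name is a placeholder):
```yaml
# Sketch of the suggested change -- container name is a placeholder
containers:
  - name: web-app
    resources:
      requests:
        cpu: "2"       # request a full 2 cores
      # no cpu or memory limits for now, so the pod isn't throttled or OOM-killed
    # livenessProbe removed entirely while debugging
```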