K3s Production Crash Loop - `ETIMEDOUT` & Heap Out of Memory

Unanswered

California pilchard posted this in #help-forum

California pilchard (OP)

2025-12-12T15:01:55.600Z

Hi! I'm completely stuck with a production crash loop. My app works perfectly with docker-compose locally, but in my K3s cluster it works for a few minutes, then becomes unresponsive and crashes.

My Stack
- Next.js: 16.0.7 (App Router)
- Deployment: K3s Cluster on Hetzner Cloud
- Data Fetching: Server-side fetching via supabase-js

The Symptoms
- The pod starts and works fine for a few minutes.
- Suddenly, the logs are flooded with `TypeError: fetch failed` and `ETIMEDOUT` errors.
- The app becomes unresponsive, causing `Readiness probe failed: context deadline exceeded`.
- The pod is killed and restarts, creating a crash loop. Eventually, it crashes with `JavaScript heap out of memory`.

What I've Already Ruled Out
- Basic Networking: The pod can successfully run `wget google.com`.
- CPU/Memory Limits: Increased to 2 CPU / 3Gi RAM, but probes still fail. The limits being hit looks like a symptom, not the cause.
- IPv6: `ping6` fails, so I tried forcing IPv4 with a custom `/etc/gai.conf`. The `ETIMEDOUT` errors still come back.
- File Descriptor Leak: Checked open FDs for the Node process; the count is very low (~35).

Well I have no idea what is wrong rn...

5 Replies


B33fb0n3

2025-12-12T18:50:00.640Z

Can you check whether you've patched the newest React CVE, published today?

The fixed React versions are:

> Versions 19.0.3, 19.1.4, 19.2.3 are safe.

And for Next.js:

| Version | DoS (CVE-2025-55184) | Source Code Exposure (CVE-2025-55183) | Fixed In |
| --- | --- | --- | --- |
| >=13.3 | ✓ | — | Upgrade to 14.2.35 |
| 14.x | ✓ | — | 14.2.35 |
| 15.0.x | ✓ | ✓ | 15.0.7 |
| 15.1.x | ✓ | ✓ | 15.1.11 |
| 15.2.x | ✓ | ✓ | 15.2.8 |
| 15.3.x | ✓ | ✓ | 15.3.8 |
| 15.4.x | ✓ | ✓ | 15.4.10 |
| 15.5.x | ✓ | ✓ | 15.5.9 |
| 15.x canary | ✓ | ✓ | 15.6.0-canary.60 |
| 16.0.x | ✓ | ✓ | 16.0.10 |
| 16.x canary | ✓ | ✓ | 16.1.0-canary.19 |

CVE-2025-55184 is specifically a DoS vulnerability, so check that you've really upgraded.
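
If it helps, here is a minimal sketch of that check for the stable release lines listed above. It is only an illustration: `FIXED_IN`, `parseVersion` and `isPatched` are hypothetical names, the canary rows are left out, and the patch number is compared numerically because a plain string comparison would rank "16.0.10" below "16.0.7".

```typescript
// Minimum patched release per stable Next.js line, copied from the advisory table.
// Canary lines are intentionally omitted from this sketch.
const FIXED_IN: Record<string, string> = {
  "14.2": "14.2.35",
  "15.0": "15.0.7",
  "15.1": "15.1.11",
  "15.2": "15.2.8",
  "15.3": "15.3.8",
  "15.4": "15.4.10",
  "15.5": "15.5.9",
  "16.0": "16.0.10",
};

// Split "16.0.10" into [16, 0, 10].
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => Number.parseInt(part, 10));
}

// Returns true/false for a release line in the table,
// or undefined when the line is not covered by this sketch.
function isPatched(installed: string): boolean | undefined {
  const [major = 0, minor = 0, patch = 0] = parseVersion(installed);
  const fixed = FIXED_IN[`${major}.${minor}`];
  if (fixed === undefined) return undefined;
  const fixedPatch = parseVersion(fixed)[2] ?? 0;
  return patch >= fixedPatch;
}

console.log(isPatched("16.0.7"));  // false
console.log(isPatched("16.0.10")); // true
```

The version string to pass in is whatever is actually installed in the deployed image, which can differ from `package.json` if the image was built before the upgrade.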


California pilchard (OP)

2025-12-12T19:03:01.827Z

I have:

"next": "16.0.10",
"react": "19.2.3",


B33fb0n3

2025-12-12T20:02:54.193Z

I am confused. You said in your initial message that you have 16.0.7. Did you just upgrade? If yes, is the issue still there?


California pilchard (OP)

2025-12-12T20:14:40.169Z

Yes, sorry, I've updated, but the issue still happens.

My issue only happens in the K3s cluster. That's weird, because in Docker (production build) RAM stays around 300 MB. But in the K3s cluster, my app climbs to 2 GB within just 5 minutes and never goes back down...
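
One way to compare the two environments more precisely would be to log the Node process's own memory numbers at a fixed interval, to see whether the growth is in the JavaScript heap or elsewhere in the process. This is a rough sketch and not something confirmed in this thread: `formatMemory` and `startMemoryLogging` are hypothetical helper names, the 30-second interval is arbitrary, and calling it from the `register()` function in `instrumentation.ts` is only an assumption about where it would fit in a Next.js app.

```typescript
// Fields of process.memoryUsage() that this sketch reports (all in bytes).
interface MemorySample {
  rss: number;
  heapUsed: number;
  heapTotal: number;
  external: number;
}

// Convert bytes to whole megabytes for readable log lines.
const toMb = (bytes: number): number => Math.round(bytes / 1024 / 1024);

// Build one compact log line from a memory sample.
function formatMemory(sample: MemorySample): string {
  return (
    `rss=${toMb(sample.rss)}MB ` +
    `heapUsed=${toMb(sample.heapUsed)}MB ` +
    `heapTotal=${toMb(sample.heapTotal)}MB ` +
    `external=${toMb(sample.external)}MB`
  );
}

// Log memory usage every intervalMs milliseconds.
// unref() keeps this timer from holding the process open on shutdown.
function startMemoryLogging(intervalMs = 30_000): ReturnType<typeof setInterval> {
  const timer = setInterval(() => {
    console.log(`[mem] ${formatMemory(process.memoryUsage())}`);
  }, intervalMs);
  timer.unref();
  return timer;
}
```

Running the same build with this logging in both Docker and K3s would show whether `heapUsed` itself climbs toward 2 GB or whether only `rss` does.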