Resolved

Post-Mortem: Phala Cloud API Authorization Vulnerability (June 1, 2026)

Jun 6, 2026 at 1:57am UTC

Affected services

Phala Cloud API

Status: Resolved & patched · Severity: P0 · Window: May 31 – June 1, 2026 (UTC) · This is a detailed follow-up to our June 1 security notice.

On June 1, 2026, an attacker used a bug in a Phala Cloud API endpoint to change CVMs they didn't own. We've fixed the bug, undone every unauthorized change, banned the accounts involved, and emailed affected customers directly.

This post explains what happened, what was and wasn't affected, what we did, and what we're changing so it doesn't happen again. It goes into full technical detail, including the attacker's script.

In short — A few of our CVM "change" endpoints let someone edit a CVM by its ID without checking that they owned it. An attacker used this to replace the pre-launch script on other people's CVMs with a script that copies their secrets to a server the attacker controls. Saving that change also restarts the CVM, so the script ran and the secrets on those CVMs were exposed. Only Offchain KMS CVMs were affected. Onchain KMS CVMs and the TEE / attestation chain were not.

If we emailed you that your CVM was affected: treat that CVM as compromised and redeploy it, then rotate the secrets in its encrypted environment variables and any registry credentials it used. Full checklist is near the end. We've contacted every affected customer directly — if you didn't get an email from us, your CVMs were not affected.

What happened

Phala Cloud groups your resources into workspaces. Every CVM belongs to one workspace, and our API is supposed to check that the person making a change belongs to the workspace that owns the CVM.

A few of the endpoints that change a CVM skipped this check. They found the CVM by its ID and made the change without checking who owned it. So anyone who knew a CVM's ID could change CVMs in other people's workspaces.

The endpoint that mattered most was the one that sets a CVM's pre-launch script — a script that runs as root inside the secure VM, just before your containers start, with access to your decrypted secrets. This is a normal, useful feature for the owner, but the same access makes it a high-value target if someone can change it without permission. The attacker used the bug to replace this script with one that copies secrets to a server they control.

Saving a new pre-launch script also restarts the CVM to apply it. So the script ran during that restart, and the secrets on the affected CVMs were exposed to the attacker.

Who was affected

Area	Status
Offchain KMS CVMs	Affected — the ones we emailed about
Onchain KMS CVMs	Not affected (these use a different path the bug couldn't reach)
TEE hardware & attestation chain	Not affected (the bug was in the cloud API, not the secure hardware)
Phala platform infrastructure (KMS, control plane)	Not affected (the attacker probed some platform CVMs, but none were modified; KMS keys and the attestation trust root are intact)
Phala account logins & billing data	Not affected (the vulnerability could not reach account or billing systems)

The attacker first tested on their own CVMs, then focused on one customer's workspace (three CVMs), and finally ran two automated rounds — one aimed at 170 CVM IDs, the other at 165 — pushing the bad pre-launch script across the platform. The change was applied about 130 times across the two rounds; after removing duplicates, that's roughly 100 distinct CVMs. Because saving the change restarts the CVM, the script ran on those CVMs, so we treat all of them as compromised.

We identified the affected CVMs and emailed those customers directly with what to do.

Timeline (all times UTC)

May 31, 15:04 — Attacker starts scanning workspaces and apps, then spends about 7 hours building and testing their tools on their own CVMs.
May 31, 22:26 — First confirmed unauthorized change to a customer CVM.
May 31 → June 1 — Attacker moves through several customer CVMs, then runs two automated rounds pushing the bad pre-launch script to many CVM IDs, keeping a deliberately slow pace to avoid notice.
June 1, 14:47 — A customer reports something odd. (Thank you — this is how we found out.)
June 1, 14:54 — Our team responds; we open a dedicated incident channel at 15:31.
June 1, 15:47 — Bug fixed. This is also the last request we saw from the attacker.
June 1, 15:53 — All affected CVMs stopped and the unauthorized changes undone.
June 1, 18:50 — Public notice published.

The attack ran for about a day before we caught it, and a customer reported it before our own monitoring did.

Why it happened

Our code had two ways to find a CVM by its ID:

a workspace-check version that says "not found" if the CVM isn't in your workspace, and
a no-check version that returns any CVM by its ID.

The affected endpoints used the no-check version. For a request from the CVM's owner, both versions behave the same: the request succeeds and the right CVM is changed. They differ only when the caller is in another workspace, which normal functional tests don't exercise, so the missing check didn't show up in testing. The common name for this kind of bug is Broken Access Control (OWASP A01).
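To make the difference concrete, here is a minimal Python sketch of the two lookups. It is an illustration under assumptions, not the production code: the names `get_cvm_unchecked`, `get_cvm_for_workspace`, `set_pre_launch_script`, and the in-memory `CVMS` store are all hypothetical.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Raised when a CVM does not exist or is not visible to the caller."""


@dataclass
class Cvm:
    cvm_id: str
    workspace_id: str
    pre_launch_script: str = ""


# Hypothetical in-memory store standing in for the real database.
CVMS = {
    "cvm-a": Cvm("cvm-a", workspace_id="ws-alice"),
    "cvm-b": Cvm("cvm-b", workspace_id="ws-bob"),
}


def get_cvm_unchecked(cvm_id: str) -> Cvm:
    """No-check lookup: returns any CVM by ID, whoever owns it."""
    try:
        return CVMS[cvm_id]
    except KeyError:
        raise NotFound(cvm_id) from None


def get_cvm_for_workspace(cvm_id: str, caller_workspace_id: str) -> Cvm:
    """Workspace-check lookup: a CVM in another workspace looks like 'not found'."""
    cvm = get_cvm_unchecked(cvm_id)
    if cvm.workspace_id != caller_workspace_id:
        raise NotFound(cvm_id)
    return cvm


def set_pre_launch_script(cvm_id: str, caller_workspace_id: str, script: str) -> Cvm:
    """Endpoint body using the safe lookup (the vulnerable pattern used the unchecked one)."""
    cvm = get_cvm_for_workspace(cvm_id, caller_workspace_id)
    cvm.pre_launch_script = script
    return cvm
```

In this sketch, a caller in `ws-alice` can update `cvm-a`, while the same call against `cvm-b` raises `NotFound` and leaves the script untouched. Swapping in `get_cvm_unchecked` would make both calls succeed, and a test that only exercises the owner's request would pass either way.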

Two things made it possible, and we want to be straight about both:

The safe check was optional. Each endpoint had to choose the safe version itself. The unsafe version was just as easy to pick, and nothing stopped code from shipping with the wrong one.
The endpoint was written with help from an AI coding assistant, which used the simpler no-check version. The generated code worked and passed our tests; the missing ownership check doesn't show up in normal functional testing, so neither our tests nor our review caught it.

How the attacker found targets

To change a CVM, the attacker first needed its ID (App ID). They gathered IDs from a public listing endpoint that returned apps running on the platform, and from places where app IDs are openly available (for example, public attestation data).

One point is worth being clear about: a CVM or App ID is a public identifier, not an access credential. Knowing one was never meant to let anyone act on your CVM. We've also retired the deprecated public listing endpoint the attacker used to gather IDs in bulk, which removes an easy way to scan for targets.

What the script could reach

The script the attacker added was a broad "grab everything" tool. It ran in the background, so the CVM still started up normally and there was no obvious sign, and it sent whatever it found to a server the attacker controlled.

On the CVM itself, it could read:

Decrypted environment variables and KMS-derived app keys — the most sensitive items, and the main exposure here
Docker registry credentials (for example, AWS ECR)
Your deployment configuration (the compose file and app settings)
Basic network and process details from the VM

The script also tried to grab other things — TLS and SSH keys, wallet and keystore files, and stray .env/secrets files. Those normally live inside your containers, not on the CVM itself. The pre-launch script runs before your containers start, so at that point the containers aren't running and their files aren't mounted yet — the attacker tried for them but couldn't reach them through this path. The full script is in Appendix B if you want to see exactly what it looked for.

What we did right away

Fixed the affected endpoints so they use the workspace check. The safe version already existed, so the fix was about one line per endpoint.
Stopped every affected CVM and reverted the malicious pre-launch scripts to their original state. Every CVM change on Phala Cloud is saved with full before/after history, which let us find and undo the changes quickly. (Reverting stops any further exfiltration, but it can't undo secrets that were already exposed — which is why affected CVMs still need a clean redeploy.)
Found all the accounts the attacker controlled and banned them, and preserved the incident logs.
Emailed affected customers with clear next steps.
Published a public notice the same day.
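As a rough illustration of how before/after history makes a revert mechanical, the following Python sketch picks, for each CVM touched by a banned account, the script that was in place before the first unauthorized change. The `ChangeRecord` shape and the `plan_reverts` helper are assumptions made for this example, not the actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    seq: int      # order in which changes were saved
    cvm_id: str
    actor: str    # account that made the change
    before: str   # pre-launch script before the change
    after: str    # pre-launch script after the change


def plan_reverts(history: list[ChangeRecord], banned_actors: set[str]) -> dict[str, str]:
    """Map each CVM changed by a banned actor to the script to restore:
    the 'before' value of the earliest unauthorized change."""
    restore: dict[str, str] = {}
    for rec in sorted(history, key=lambda r: r.seq):
        if rec.actor in banned_actors and rec.cvm_id not in restore:
            restore[rec.cvm_id] = rec.before
    return restore
```

This simple version restores the state from just before the attacker's first change, so it would also discard any legitimate edit an owner made afterwards. A real revert would need to handle that case explicitly.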

What we're changing

We're addressing this at four levels: preventing the bug, detecting an attack sooner, limiting the damage if something gets through, and responding faster.

Prevention

The root problem was that the safe ownership check was optional, so a single endpoint could skip it. We're removing that choice:

Build-time checks (✅ done). Our test suite now inspects every CVM endpoint and fails the build if one that operates on a specific CVM doesn't go through the workspace ownership check. This would have caught the exact bug behind this incident.
Full endpoint review and isolation tests (✅ done). We reviewed all of our API endpoints again to confirm each one requires the right login and permissions, and added automated tests that a user in one workspace cannot reach another workspace's CVM. New endpoints are covered automatically.
Safe-by-default routing (🔧 in progress). We're reorganizing the API so an endpoint's location sets its security level automatically. "Remember to add the check" stops being a manual step anyone can forget.
Focused review of sensitive code (🔧 in progress). We're grouping the most security-critical code so it gets focused human review. As more of our code is written with AI assistance, concentrating these critical paths for deeper review matters more, not less.
A separate permission service (🗓️ planned). Our product changes quickly, but the code that decides who is allowed to do what doesn't need to. By moving these permission checks into a separate service that changes rarely and is reviewed more strictly, we keep this critical logic stable and lower the chance that fast day-to-day development breaks it.
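The build-time check described above can be pictured with a short Python sketch. It assumes a hypothetical route registry in which each route declares the checks it applies; the `Route` type and the `workspace_owner` guard name are invented for illustration, and the path shape follows the endpoint listed in Appendix A.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    method: str
    path: str
    guards: set[str] = field(default_factory=set)  # checks this route declares


def routes_missing_ownership_check(routes: list[Route]) -> list[str]:
    """Return 'METHOD path' for every route that addresses one specific CVM
    (path contains /cvms/{id}) but does not declare the ownership guard."""
    return [
        f"{r.method} {r.path}"
        for r in routes
        if "/cvms/{id}" in r.path and "workspace_owner" not in r.guards
    ]


def assert_every_cvm_route_is_guarded(routes: list[Route]) -> None:
    """Intended to run in CI: any unguarded route fails the build."""
    missing = routes_missing_ownership_check(routes)
    assert not missing, f"routes without ownership check: {missing}"
```

The point of the design is that the check looks at what a route declares rather than at how it behaves for the owner, so an endpoint that works correctly in functional tests but skips the ownership guard still fails the build.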

Detection

This is where we fell short. The attack ran for about a day, and a customer found it before we did. We keep a full audit log of every action, which let us reconstruct what happened, but we weren't monitoring it in real time. That changes:

Alerts on cross-workspace attempts (🗓️ planned). Acting on a CVM outside your own workspace is now blocked, and we're adding alerts on any such attempt.
Automated review of API access patterns (🗓️ planned). The attacker kept a slow pace on purpose (about 3–5 requests a minute) to stay under simple rate limits, so rate limits alone won't catch this. We plan to use an LLM-based system to scan API access patterns — for example, one account editing the pre-launch script across many CVMs, or scanning a lot of resources — and flag risky behavior for review.
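To show why a pattern-based signal can catch what a rate limit misses, here is a minimal Python sketch that counts how many distinct CVMs one account has edited within a time window. This is a simple rule-based stand-in, not the LLM-based system we plan to build; the `FanOutDetector` class, its thresholds, and the event format are assumptions for illustration.

```python
from collections import defaultdict, deque


class FanOutDetector:
    """Flags an account that edits pre-launch scripts on many distinct CVMs
    within a time window, however slowly the requests arrive."""

    def __init__(self, window_seconds: float, max_distinct_cvms: int) -> None:
        self.window_seconds = window_seconds
        self.max_distinct_cvms = max_distinct_cvms
        # Per-account queue of (timestamp, cvm_id) events, oldest first.
        self._events = defaultdict(deque)

    def record(self, account: str, cvm_id: str, timestamp: float) -> bool:
        """Record one edit; return True if the account should be flagged."""
        events = self._events[account]
        events.append((timestamp, cvm_id))
        # Drop events that have aged out of the window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            events.popleft()
        distinct = {cvm for _, cvm in events}
        return len(distinct) > self.max_distinct_cvms
```

With a one-hour window and a limit of 10 distinct CVMs, an account editing a different CVM every 15 seconds (4 requests a minute, inside the attacker's pace) is flagged on its 11th CVM, while an owner repeatedly editing the same CVM is never flagged.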

The aim is to catch this kind of activity much sooner than we did this time.

Mitigation

Even with the bug, a few things would have reduced the impact. We're adding them:

Confirmation for sensitive changes (🗓️ planned). Optional email confirmation or 2FA before high-impact Offchain KMS CVM changes, like editing the pre-launch script. A valid session alone wouldn't be enough.
Outbound firewall rules (🗓️ mid-to-long term). The script worked by sending data to an outside server. Adding network ACLs to limit where a CVM can send traffic would reduce this kind of leak even if a malicious script runs. This is a larger change, so we're tracking it as longer-term work.
Faster recovery (🗓️ planned). Better tooling to rotate secrets and redeploy affected CVMs, so cleanup takes minutes.
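The outbound firewall idea comes down to a default-deny decision per destination. The Python sketch below shows that decision only; the function name and the example allow-list are hypothetical, and real enforcement would happen in the network layer rather than in application code.

```python
import ipaddress


def is_egress_allowed(dest_ip: str, allowed_networks: list[str]) -> bool:
    """Default-deny check: allow only destinations inside an allow-listed CIDR."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)
```

With an allow-list such as `["10.0.0.0/8", "203.0.113.0/24"]`, a connection to the attacker's server at 199.91.221.65 would be refused, because that address falls outside every listed network.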

Response

Once we knew, we moved quickly (see "What we did right away"). Two things made that possible:

The safe mechanism already existed, so the fix was about one line per endpoint and shipped roughly an hour after the report.
Every CVM change is saved with full before/after history, so we could undo the attacker's changes cleanly and completely.

We're also adding:

A written incident playbook and clearer on-call (🗓️ planned) so this response time is consistent.
An audit log in your dashboard (🗓️ planned) so you can see changes made to your own CVMs and check what happened yourself, without waiting on us.

What you should do

We've identified the affected CVMs and emailed those customers directly. If you didn't get an email from us about this, your CVMs were not affected and there's nothing you need to do.

If we did notify you, please:

Redeploy the affected CVM from a clean start, and treat the old one as compromised. Redeploying generates fresh KMS-derived app keys; once the old CVM is destroyed, the exposed keys can no longer be used.
Revoke and replace every secret in its encrypted environment variables — API keys, tokens, database passwords, private keys. Assume the old values are already in the attacker's hands, so invalidate them; don't just issue new ones alongside.
Follow the blast radius downstream. For each exposed secret, check what it unlocks — external APIs, databases, wallets, signing keys — and rotate or lock those too.
Rotate the registry credentials (AWS ECR and any others) the CVM used.
If you keep egress/network logs for the CVM (many setups don't), check them for connections to the attacker's server or any other unexpected address (see Appendix A). If you don't have such logs, assume exposure and complete the steps above.

If you'd like to confirm the impact yourself, check whether the CVM restarted during the incident window (May 31–June 1 UTC) and inspect its pre-launch script for the injected block shown in Appendix B.

Not sure about anything, or want a hand working through it? Email us at cloud@phala.com and we'll go through your setup with you.

Our commitment

We're sorry this happened, and we don't take your trust for granted. The part we're least happy about is the detection gap — that the attack ran for a day and a customer found it before we did. That's why detection and response are front and center in the plan above, alongside the prevention work, and we'll keep you posted as the rest of it lands.

We know how much you rely on this, and we're committed to doing better.

Appendix A — Indicators of compromise (IOCs)

If you'd like to check your own logs:

Attacker's server: http://199.91.221.65:9998 (data was sent to paths like /x/<app_id>/...)
Endpoint that was abused: PATCH /api/v1/cvms/{id}/pre-launch-script (also /docker-compose and /envs)
Attacker source IPs: 160.202.160.203, 160.202.160.239, 160.202.160.248, 160.202.160.56, 61.97.243.9, 195.154.153.4, 199.91.221.65 (the last address is also the attacker's server listed above)
User agents seen: curl/7.81.0, python-requests/2.31.0, python-requests/2.33.1, node

If you see traffic from one of your CVMs going to the attacker's server, treat that CVM's secrets as exposed.
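If your logs are plain text, a short script can do the search for you. The Python sketch below scans log lines for the IP addresses listed above. The helper name `find_ioc_lines` is an assumption for this example, and since log formats vary, it only looks for IPv4 addresses anywhere in each line.

```python
import re

# Attacker server and source IPs, copied from the list above.
ATTACKER_IPS = {
    "160.202.160.203",
    "160.202.160.239",
    "160.202.160.248",
    "160.202.160.56",
    "61.97.243.9",
    "195.154.153.4",
    "199.91.221.65",
}

# Matches a whole dotted-quad, so 160.202.160.56 does not match inside 160.202.160.561.
_IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")


def find_ioc_lines(log_lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, ip) for every listed attacker IP found in the log."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        for ip in _IPV4.findall(line):
            if ip in ATTACKER_IPS:
                hits.append((lineno, ip))
    return hits
```

To use it, read your log file into a list of lines, for example with `open(path).read().splitlines()`, and pass that list in. An empty result only means these addresses don't appear in the logs you searched; it doesn't prove a CVM we notified you about is safe.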

This is a customer-facing post-mortem. For questions or to schedule a review of your setup, contact the Phala Cloud team at cloud@phala.com.