Host LLM in GPU TEE
Private AI, also called confidential AI, addresses critical concerns such as data privacy, secure execution, and computation verifiability, making it indispensable for sensitive applications. As illustrated in the diagram below, users currently cannot fully trust the responses returned by LLM services such as OpenAI or Meta, because there is no cryptographic verification. By running the LLM inside a TEE, we can attach a verification primitive, known as a Remote Attestation (RA) report, to the returned response. This allows users to verify the AI generation results locally without relying on any third parties.
The implementation for running LLMs in GPU TEE is available in the private-ml-sdk GitHub repository. The project is built by Phala Network and was made possible through a grant from NEARAI. The SDK provides the tools and infrastructure needed to deploy and run LLMs securely within a GPU TEE.
Redpill is a model marketplace that supports private AI inference. It currently supports two models running in GPU TEE; you can view them on the models page by selecting the GPU TEE checkbox:
After signing in to Redpill, you can get your API key from here.
The first step is to verify that the LLM is running in a GPU TEE before you chat with it. This is done by verifying its attestation report. To get the attestation report of the LLM inference, send a POST request to the Redpill API endpoint as shown below:
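A minimal sketch of that request using only the Python standard library. The endpoint path and model name here are assumptions for illustration; check the Redpill API reference for the exact URL and your model's identifier:

```python
import json
import urllib.request

# Hypothetical endpoint path -- consult the Redpill API docs for the
# exact URL of the attestation report endpoint.
ATTESTATION_URL = "https://api.redpill.ai/v1/attestation/report"

def build_attestation_request(api_key: str, model: str) -> urllib.request.Request:
    """Build the POST request that fetches the attestation report for a model."""
    body = json.dumps({"model": model}).encode()
    return urllib.request.Request(
        ATTESTATION_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually fetch the report:
# req = build_attestation_request("sk-...", "phala/deepseek-r1-70b")
# report = json.loads(urllib.request.urlopen(req).read())
```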
The response will look like this:
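The field names below come from the surrounding text; the values are illustrative placeholders, not real attestation data:

```json
{
  "signing_address": "0x1234...abcd",
  "nvidia_payload": "{\"nonce\": \"...\", \"evidence_list\": [...]}",
  "intel_quote": "0x0400..."
}
```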
The signing_address is the account address generated inside the TEE; it will be used to sign chat messages later. You can go to https://etherscan.io/verifiedSignatures, click Verify Signature, and paste the signing_address and the message to verify it.
nvidia_payload and intel_quote are the attestation reports from the NVIDIA TEE and the Intel TEE respectively. You can use them to verify the integrity of the TEE. See Verify the Attestation in the Redpill docs for more details.
Note: The trust chain works as follows: when you verify the attestation report, you trust the model provider (Redpill) and the TEE providers (NVIDIA and Intel). You then trust the open-source, reproducible code by verifying the source code here. Finally, you trust the cryptographic key derived inside the TEE. This is why we only need to verify the signature of the message during chat.
We provide an OpenAI-compatible API for sending chat requests to the LLM running inside the TEE: you just need to point your client at the Redpill API endpoint instead of OpenAI's. The request format follows the OpenAI chat completions API (https://platform.openai.com/docs/api-reference/chat). A simple request looks like this:
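A minimal sketch of such a chat request, again with the standard library only. The base URL is an assumption; substitute the chat completions endpoint from the Redpill docs:

```python
import json
import urllib.request

# Assumed Redpill base URL -- replace with the endpoint given in the
# Redpill docs. The request body follows the OpenAI chat completions format.
CHAT_URL = "https://api.redpill.ai/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# resp = json.loads(urllib.request.urlopen(
#     build_chat_request("sk-...", "phala/deepseek-r1-70b", "hello")).read())
# The assistant's reply is at resp["choices"][0]["message"]["content"].
```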
Check the Redpill model page of each model for more details, for example deepseek-r1-70b.