GPU TEE Inference API
Last updated
Was this helpful?
Last updated
Was this helpful?
Phala Cloud provides LLM inference API by connecting to Redpill. By navigating to Dashboard->GPU TEE API page, you will see details.
In order to use the API, user needs to deposit at least $5 in advance to enable a Redpill account if your balance is not sufficient. You can head to Dashboard->Billing page then click Deposit button to top-up with your bank card or crypto wallet.
At the bottom of the page, click the Enable button to connect your Cloud account with a Redpill account, and then click the button Create New API Key. Copy the key to use when you interact with Redpill API.
We provide OpenAI-compatible API for you to send chat requests to the LLM running inside TEE, where you just need to use the API endpoint https://api.red-pill.ai/v1/chat/completions
. A simple request could be like:
Sample Response
You can verify if the LLM is running in GPU TEE. This can be done by verifying its attestation report. To get the attestation report of the LLM inference, you can do this by sending a POST request to the Redpill API endpoint like below:
The response will be like:
The signing_address
is the account address generated inside TEE that will be used to sign the chat message later.
The all_attestations
is the list of all the attestations of all GPU nodes since we add more TEE nodes to serve the inference requests. You can utilize the signing_address
from the all_attestations
to select the appropriate TEE node for verifying its integrity.
You can copy the value of nvidia_payload as the whole payload as followed to verify:
The signing_address
is the account address generated inside TEE that will be used to sign the chat response. You can go to https://etherscan.io/verifiedSignatures, click Verify Signature, and paste the signing_address
and message response to verify it.
nvidia_payload
and intel_quote
are the attestation report from NVIDIA TEE and Intel TEE respectively. You can use them to verify the integrity of the TEE. See Verify the Attestation for more details.
If you chat with the LLM, the response will contain an id
which you can use to get the chat Signature later.
Sample Request
That sha256 of the request body is e5542b0757e0b9d05bfa4a15da7bac97a03bd35d21b648ec492152708e795ff9
(note: in this example, there is no new line in the end of request)
Simple Response
The sha256sum of response body is 7a97926adb2044fd598b392eee98ad8f7c39ea3a47747ca968ef755bbf57c211
(note: in this example, there are two new line in the end of response)
The id
is calculated by sha256sum(sha256sum(request_body) + sha256sum(response_body)).
By default, you can query another API with the value of id in the response in 30 minutes.
Request
GET https://api.red-pill.ai/v1/signature/{request_id}?model={model_id}&signing_algo=ecdsa
For example, the response in the previous section, the id is chatcmpl-0cdf7629fcfa4135bbdb9936e737e95c
:
Response
text: the message you may want to verify. It is joined by the sha256 of the HTTP request body, and of the HTTP response body, separated by a colon :.
signature: the signature data.
signing_algo: The cryptographic scheme that the signer private key generated.
Exactly match the value we calculated in the sample in previous section.
Limitation
Since the resource limitation, the signature will be kept in the memory for 5 minutes since the response is generated.
Go to https://etherscan.io/verifiedSignatures, click Verify Signature:
Address: You can get the address from the attestation API. The address should be same if the service did not restart.
Message: see the Response of the Signature section. You can also calculate the sha256 by yourself.
Signature Hash: See the Signature section.
is a models marketplace that supports private AI inference. It currently supports two models that are running in GPU TEE, you can view them in the models page by clicking the GPU TEE
checkbox:
You can verify the Intel TDX Attestation Report, aka quote with the value of intel_quote
at .
Note: The trust chain works as follows: when you verify the attestation report, you trust the model provider (Redpill) and the TEE providers (NVIDIA and Intel). You then trust the open-source, reproducible code by verifying the source code . Finally, you trust the cryptographic key derived inside the TEE. This is why we only need to verify the signature of the message during chat.