Introduction
Welcome back to the second part of our series on leveraging Apigee as a powerful API Management platform for Large Language Models (LLMs)! In Part 1 of this guide, we explored the compelling reasons and architectural advantages of positioning Apigee as a unified gateway for interacting with diverse LLM providers. We discussed how Apigee enhances security, streamlines integration, optimizes performance, and provides crucial analytics and monetization capabilities for your AI-powered applications.
Now, it’s time to get hands-on! This tutorial will guide you through the practical steps of implementing the solution we discussed. We’ll start by setting up the backend service that directly interacts with various LLMs (Gemini, OpenAI, Anthropic Claude), then configure Apigee to act as the intelligent gateway, and finally, connect a simple frontend application to demonstrate the end-to-end flow.
Target Audience: This tutorial is designed for developers, API architects, and AI engineers who have a good understanding of Google Cloud Platform (GCP), Apigee concepts, and Python.
Prerequisites:
- A Google Cloud Platform (GCP) project with billing enabled.
- An Apigee instance provisioned in your GCP project.
- Basic familiarity with the gcloud CLI.
- Python 3.8+ installed on your local machine.
- API keys for Gemini, OpenAI and Anthropic (if you plan to use those LLMs).
By the end of this tutorial, you’ll have a functional Apigee-powered LLM gateway, ready to be extended and customized for your specific needs. Below is the architecture we will be implementing.

The complete source code for this tutorial is available on GitHub:
https://github.com/apigeekk/apigee-llms-gw
Before you begin, clone this repository to your local machine:
git clone https://github.com/apigeekk/apigee-llms-gw.git
cd apigee-llms-gw
⚠️ Disclaimer: The code above is provided for demo purposes only. It is not intended for production use. Use at your own risk.
1. The LLM Integration Backend Service (Google Cloud Run)
At the heart of our LLM gateway solution is a lightweight backend service responsible for abstracting the direct interactions with different LLM providers. This service, deployed on Google Cloud Run for its scalability and serverless nature, acts as a centralized point for sending prompts and receiving responses from Gemini, OpenAI’s GPT models, and Anthropic’s Claude.
1.1. Service Overview
This backend service is powered by the Python code in main.py, which uses Flask to handle incoming HTTP requests and route them to the appropriate LLM.
The full source code of this service is available in the GitHub repository under the cloudrun-integration/ directory.
Key Responsibilities:
- LLM Routing & Interaction: our service in
main.pydefines separate Flask routes (/gemini,/openai,/anthropic) to handle requests for different LLM providers. Each route calls a dedicated function (e.g.,get_gemini_response,get_chatgpt_response,get_calude_response) that implements the specific API client calls and data parsing for its respective LLM. For Google Gemini, OpenAI (GPT models), and Anthropic (Claude models), theapi_keyis passed in the request payload and used by the respective API clients. - Prompt Sanitization Utility (
/sanitizePromptendpoint): our Cloud Run service also exposes a/sanitizePromptendpoint. This utility endpoint is designed to interact with Google’s Model Armor API to perform real-time scanning of user prompts. Crucially, the LLM-specific handlers (/gemini,/openai,/anthropic) in thismain.pydo not directly call this sanitization logic; instead, Apigee will orchestrate the sanitization step as part of its pre-processing. This will be covered later on. - Response Normalization: Each LLM interaction function extracts the relevant response text and token usage, wrapping them in a consistent JSON format (
response,tokens_count).
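To make that layout concrete, here is a minimal, runnable sketch of the routing and response-normalization pattern described above. It is illustrative only: the handler body is a placeholder, and the real implementation (including the actual LLM client calls and the /sanitizePrompt logic) lives in cloudrun-integration/main.py.

# Illustrative sketch of the route layout; not the repository's main.py.
from flask import Flask, request, jsonify

app = Flask(__name__)

def normalized(text, tokens_count):
    # Every LLM handler returns the same shape: response text plus token usage.
    return jsonify({"response": text, "tokens_count": tokens_count})

@app.route("/gemini", methods=["POST"])
def gemini():
    body = request.get_json(silent=True) or {}
    # Placeholder: the real handler calls the Gemini client with the prompt,
    # model, and API key carried in the request, then extracts token usage.
    return normalized(f"(gemini reply to: {body.get('prompt')})", 0)

# /openai, /anthropic, and /sanitizePrompt follow the same pattern in main.py.

if __name__ == "__main__":
    app.run(port=8080)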
The full list of Python libraries necessary for the Flask application to run can be found in the requirements.txt file.
The Procfile specifies the command that Gunicorn should execute to start the web service. Cloud Run automatically detects and uses this file.
1.2. Deploying to Google Cloud Run with Authentication
Now, let’s deploy this service to Google Cloud Run, ensuring it only accepts authenticated requests. This assumes you have the Google Cloud SDK installed and configured.
- Enable Necessary APIs: Ensure the Cloud Run API, Artifact Registry API, and Cloud Build API are enabled in your GCP project.
gcloud services enable run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com
- Navigate to Your Project Directory: Open your terminal or command prompt and navigate to the cloudrun-integration directory within the cloned GitHub repository.
cd cloudrun-integration
- Deploy the Service (Requiring Authentication): To make your Cloud Run service require authentication, omit the --allow-unauthenticated flag. Cloud Run services are private by default when this flag is not used.
gcloud run deploy <SERVICE_NAME> --source . --region <YOUR_GCP_REGION>
- <SERVICE_NAME>: the name of your Cloud Run service.
- --source: tells Cloud Run to build and deploy from the current directory.
- <YOUR_GCP_REGION>: your preferred GCP region (e.g., us-central1, europe-west1).
Important: After this deployment, direct calls to your Cloud Run service URL without proper authentication (e.g., an ID Token) will be rejected. This is the desired secure behavior.
Once deployed, the command output will provide the Service URL for your Cloud Run service. It will look something like https://service_name-xxxxxxx-uc.a.run.app. Make sure to copy this URL, as you’ll need it when configuring your Apigee proxy.
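Optionally, you can smoke-test the private service before configuring Apigee by minting an ID token yourself and calling it directly. Below is a minimal sketch using the google-auth and requests libraries; it assumes Application Default Credentials that resolve to a service account (for example via GOOGLE_APPLICATION_CREDENTIALS, or when run on GCP), and the /gemini payload shape shown is an assumption based on section 1.1.

# Hedged smoke test of the authenticated Cloud Run service (illustrative only).
# pip install google-auth requests
import requests
import google.auth.transport.requests
from google.oauth2 import id_token

SERVICE_URL = "https://service_name-xxxxxxx-uc.a.run.app"  # your Cloud Run URL

# Cloud Run expects a Google-signed ID token whose audience is the service URL.
auth_request = google.auth.transport.requests.Request()
token = id_token.fetch_id_token(auth_request, SERVICE_URL)

resp = requests.post(
    f"{SERVICE_URL}/gemini",
    headers={"Authorization": f"Bearer {token}"},
    json={"model": "gemini-1.5-flash", "prompt": "Hello"},  # assumed payload shape
    timeout=60,
)
print(resp.status_code, resp.text)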
1.3. Service Account Permissions for Model Armor
The interaction with Google Gemini (via Vertex AI) and Model Armor within your Cloud Run service typically leverages the Cloud Run service’s default service account. For these interactions to succeed, this service account (usually in the format PROJECT_NUMBER-compute@developer.gserviceaccount.com) needs the appropriate IAM role:
Model Armor User (roles/modelarmor.user)
You can grant this role via the GCP Console (Navigate to IAM & Admin > IAM) or using the gcloud CLI as shown below:
gcloud projects add-iam-policy-binding <YOUR_GCP_PROJECT_ID> \
--member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
--role="roles/modelarmor.user"
Note: It is considered best practice not to use the default service account. Instead, create a dedicated service account for this purpose and assign it the required role. The default service account is used here for demonstration purposes only.
2. The Apigee API Platform Layer
With our LLM integration service deployed and securely configured on Cloud Run, the next critical step is to set up the Apigee API Platform. Apigee will act as the intelligent intermediary, handling client requests, applying security policies, generating authentication tokens for the Cloud Run service, routing to the correct LLM backend, and orchestrating advanced features like prompt sanitization and token accounting.
2.1. Apigee API Proxy Structure
The apigee/apiproxy/apigee-llms-gw-api.xml file defines the overall structure of our Apigee API proxy. This is the main definition that ties together all the policies, proxy endpoints, and target endpoints.
Note: This API proxy also includes a shared flow used for prompt sanitization, which we will cover in detail below. First, you need to create the shared flow in Apigee, and then create the API proxy using the assets provided in the GitHub repository.
Here is an overview of our API proxy:

How does it work?
In short, the request first arrives and is checked for a valid API key. It then goes through a rate limiting policy, after which relevant inputs such as the user prompt and LLM type are extracted. If any of these required inputs are missing, an error is thrown and returned to the API client.
If the inputs are valid, Apigee checks whether prompt sanitization is needed based on the API products associated with the provided API key. If sanitization is required, a shared flow is triggered, which performs prompt sanitization by making a service callout to our previously deployed Cloud Run service. This service uses Model Armor to evaluate the prompt and returns a final verdict to Apigee.
Depending on the request endpoint (/gemini, /openai, /anthropic), the relevant API key is retrieved from an encrypted Key Value Map (KVM). The request is then forwarded to our Cloud Run service, which routes it to the appropriate LLM, retrieves the response, and sends it back to Apigee. Finally, Apigee extracts token usage (if the request was successful) and returns the response to the API client.
2.2. Apigee Proxy Policies: In-Depth Explanation
Now let’s delve into the individual Apigee policies that enforce and manage the flows within our apigee-llms-gw-api proxy. The full configuration for each policy is available in the GitHub repository under apigee/apiproxy/policies/ and apigee/sharedflowbundle/policies/.
- Proxy Endpoint Configuration (apigee/apiproxy/proxies/default.xml): This XML defines the client-facing interface of your API proxy. It specifies the base paths and sets up conditional flows that direct incoming requests to different LLM-specific processing paths based on the URI suffix (e.g., /gemini, /openai) and HTTP method. It also attaches policies to these flows, controlling the execution order for request pre-processing and response post-processing.
- VA-VerifyApiKey: This VerifyAPIKey policy is fundamental for client authentication. It validates the API key provided by the client against registered developer apps and API products in Apigee, ensuring only authorized applications can access your LLM gateway.
<VerifyAPIKey continueOnError="false" enabled="true" name="VA-VerifyApiKey">
<DisplayName>VA-VerifyApiKey</DisplayName>
<APIKey ref="request.header.apikey"/>
</VerifyAPIKey>
- DynamicQuota: This Quota policy dynamically enforces request limits based on the attributes defined in the API Product associated with the client’s API key. This allows for tiered access where different products can have different maximum requests per interval. If the quota is exceeded, the proxy raises an RF-QuotaExceeded fault.
<Quota continueOnError="false" enabled="true" name="DynamicQuota" type="calendar">
<DisplayName>DynamicQuota</DisplayName>
<Interval ref="verifyapikey.VA-VerifyApiKey.apiproduct.developer.quota.interval">1</Interval>
<TimeUnit ref="verifyapikey.VA-VerifyApiKey.apiproduct.developer.quota.timeunit">minute</TimeUnit>
<Allow countRef="verifyapikey.VA-VerifyApiKey.apiproduct.developer.quota.limit"/>
</Quota>
- SA-SpikeArrest: This SpikeArrest policy protects our Cloud Run service from sudden traffic surges. It limits the number of requests processed within a very short window, preventing your LLM services from being overwhelmed by unexpected bursts of traffic. If the limit (30 requests per minute in our case, which Apigee smooths to roughly one request every two seconds) is exceeded, an RF-TooManyRequests fault is raised.
<SpikeArrest continueOnError="false" enabled="true" name="SA-SpikeArrest">
<DisplayName>SA-SpikeArrest</DisplayName>
<Rate>30pm</Rate>
</SpikeArrest>
- GetGeminiAPIKey (and similar for OpenAI, Anthropic): These KeyValueMapOperations policies are used to securely retrieve LLM API keys from an Apigee encrypted Key Value Map (KVM) named hib-kvm. This practice avoids hardcoding sensitive API keys directly in the proxy configuration.
<KeyValueMapOperations name="GetGeminiAPIKey" mapIdentifier="hib-kvm">
<DisplayName>KVM-GetGeminiAPIKey</DisplayName>
<Scope>environment</Scope>
<Get assignTo="private.gemini_apikey">
<Key><Parameter>gemini_api_key</Parameter></Key>
</Get>
</KeyValueMapOperations>
Note: You must create an encrypted KVM (e.g., hib-kvm) in your Apigee environment and populate it with the keys gemini_api_key, openai_api_key, and anthropic_api_key.
- Assign-Message-Gemini (and similar for Assign-Message-Open, Assign-Message-Anthropic): This AssignMessage policy is responsible for dynamically setting the target.url to the correct Cloud Run backend endpoint (e.g., /gemini). This allows Apigee to route the request to the appropriate LLM handler in our Cloud Run service.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage continueOnError="false" enabled="true" name="Assign-Message-Gemini">
<DisplayName>Assign Message-Gemini</DisplayName>
<Properties/>
<AssignVariable>
<Name>target.url</Name>
<Value>[CLOUD_RUN_SERVICE_URL]/gemini</Value>
<Ref/>
</AssignVariable>
<IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
<AssignTo createNew="false" transport="http" type="request"/>
</AssignMessage>
Note: Replace CLOUD_RUN_SERVICE_URL with your actual Cloud Run service URL from the previous step. Do the same for Assign-Message-Open, Assign-Message-Anthropic.
- AM-Gemini (and similar for AM-Open, AM-Anthropic): This AssignMessage policy is used to construct the JSON request payload that will be sent to our Cloud Run backend. It injects necessary parameters such as the LLM model, the user prompt, the API key (retrieved from the KVM), the max_output_tokens limit (from API Product attributes), and the Model Armor template URL.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage continueOnError="false" enabled="true" name="AM-Gemini">
<DisplayName>AM-Gemini</DisplayName>
<Properties/>
<Add>
<Headers>
<Header name="X-Api-Key">{private.gemini_apikey}</Header>
<Header name="X-Max-Output-Tokens">{verifyapikey.VA-VerifyApiKey.apiproduct.operation.attributes.max_output_tokens}</Header>
</Headers>
</Add>
<Set>
<Payload contentType="application/json">
{
"model": "{urirequest.model}",
"prompt":"{urirequest.prompt}"
}
</Payload>
</Set>
<IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
<AssignTo createNew="false" transport="http" type="request"/>
</AssignMessage>
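For context, the snippet below is a hypothetical illustration of how the Cloud Run side could read the values AM-Gemini sends: the X-Api-Key and X-Max-Output-Tokens headers plus the JSON body. Note that section 1.1 describes the API key arriving in the request payload, so treat the repository’s main.py as authoritative for the actual contract; this is only a sketch.

# Hypothetical handler fragment reading what AM-Gemini sends (illustrative only).
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/gemini", methods=["POST"])
def gemini():
    body = request.get_json(silent=True) or {}
    api_key = request.headers.get("X-Api-Key")               # from the encrypted KVM, via Apigee
    max_tokens = request.headers.get("X-Max-Output-Tokens")  # from the API Product attribute
    prompt, model = body.get("prompt"), body.get("model")
    # ... call the Gemini client here with api_key, model, prompt, max_tokens ...
    return jsonify({"response": f"echo: {prompt}", "tokens_count": 0})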
- DC-TokensCount: This DataCapture policy extracts the tokens_count from the backend’s successful response. This value is then recorded into a Data Collector named dc_llm_used_tokens for analytics and potential monetization purposes.
<DataCapture name="DC-TokensCount" continueOnError="false" enabled="true">
<Capture>
<DataCollector>dc_llm_used_tokens</DataCollector>
<Collect ref="apigee.tokens_count"/>
</Capture>
<Condition>response.status.code = 200 </Condition>
</DataCapture>
- DC-ModelType: This DataCapture policy captures the LLM model type used in the request (e.g., gemini-1.5-flash, gpt-4). This data is stored in the dc_model_type Data Collector, allowing you to analyze usage patterns per model.
- Fault Handling Policies (RF-NotFound, RF-MissingInputs, RF-TooManyRequests, RF-QuotaExceeded): These RaiseFault policies are used to handle various error conditions gracefully. When a specific error occurs (e.g., API key missing, quota exceeded, or internal server errors), these policies construct and return an appropriate HTTP status code and custom error message to the client, providing clear feedback. The full XMLs for these policies are in the GitHub repo.
- EV-Inputs & EV-TokenCounts: These ExtractVariables policies parse and extract specific data from incoming request parameters (e.g., prompt, model) and outgoing response data (e.g., tokens_count before it is passed to DC-TokensCount). This allows Apigee to create accessible flow variables from parts of the request or response. The full XMLs are in the GitHub repo.
2.3. Apigee Service Account for Cloud Run Invocation
Since our Cloud Run service is configured to accept only authenticated requests, Apigee needs a way to authenticate itself when calling it. This is achieved by having Apigee generate a Google ID token signed by a specific Google Cloud Service Account and including it in the Authorization header of the request to Cloud Run.
Steps to prepare your Service Account:
- Create a dedicated Service Account for Apigee: It’s a best practice to create a service account specifically for Apigee to invoke your Cloud Run service. Navigate to IAM & Admin > Service Accounts in the GCP Console and create a new service account (e.g., apigee-cloudrun-invoker).
- Grant the Cloud Run Invoker role: Grant this new service account the Cloud Run Invoker role (roles/run.invoker) on your GCP project. This permission allows it to make authenticated calls to your Cloud Run services. Replace <YOUR_GCP_PROJECT_ID> with your actual project ID.
gcloud projects add-iam-policy-binding <YOUR_GCP_PROJECT_ID> \
--member="serviceAccount:apigee-cloudrun-invoker@<YOUR_GCP_PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/run.invoker"
Important Note for Deployment: When deploying your Apigee proxy, you will need to provide the email of this service account (apigee-cloudrun-invoker@<YOUR_GCP_PROJECT_ID>.iam.gserviceaccount.com) in the Apigee UI (or via the CLI/Maven tooling, if you use those) so that Apigee can use it to generate ID tokens for calls to our Cloud Run service.
2.4. Configuring Model Armor Template for Prompt Sanitization
Before setting up your Apigee proxy and API Products, you need to configure your Model Armor template. This template defines the specific content filters and confidence levels that will be applied to user prompts for sanitization. Our Cloud Run backend’s /sanitizePrompt endpoint will call this Model Armor service.
- Set Environment Variables: Open your terminal and set the following environment variables.
export TEMPLATE_ID=<TEMPLATE_ID> # e.g. my-apigee-llm-template2
export LOCATION=<GCP_REGION> # or your desired region for Model Armor (check the documentation)
export PROJECT_ID=<PROJECT_ID> # Replace with your GCP Project ID
export GCLOUD_AUTH_TOKEN=$(gcloud auth print-access-token)
echo "Using Project ID: $PROJECT_ID"
echo "Using Location: $LOCATION"
echo "Generated Access Token: $GCLOUD_AUTH_TOKEN"
- Create a Model Armor Template: Initially, create a basic Model Armor template. This task can also be done manually in the GCP console.
curl -X POST \
-d "{'filter_config': {} }" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GCLOUD_AUTH_TOKEN" \
"https://modelarmor.us-central1.rep.googleapis.com/v1alpha/projects/$PROJECT_ID/locations/$LOCATION/templates?template_id=$TEMPLATE_ID"
- Update the Model Armor Template with Filters: Now, update the template with specific content filters and their confidence levels, enabling PI & Jailbreak filtering, malicious URI filtering, and SDP settings. Under modelarmor/template.json you will find the template I used for this tutorial.
# See modelarmor/template.json in the repo for the full filter configuration
curl -X PATCH \
-H "Authorization: Bearer $GCLOUD_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "filterConfig": { "raiSettings": { ... }, "piAndJailbreakFilterSettings": { ... }, ... } }' \
"https://modelarmor.us-central1.rep.googleapis.com/v1alpha/projects/$PROJECT_ID/locations/$LOCATION/templates/$TEMPLATE_ID?update_mask=filter_config"
- Note the Model Armor Template URL: The full URL to this configured Model Armor template’s sanitizeUserPrompt endpoint will be crucial. This is the URL that our Apigee API Product will store and pass to the Cloud Run service. It has the following format: https://modelarmor.<LOCATION>.rep.googleapis.com/v1alpha/projects/<YOUR_GCP_PROJECT_ID>/locations/<LOCATION>/templates/<TEMPLATE_ID>:sanitizeUserPrompt. A hedged Python sketch of calling this endpoint follows below.
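To make the contract concrete, here is a sketch of how the backend’s /sanitizePrompt logic could call this endpoint with an access token. The request/response field names (user_prompt_data, sanitizationResult, filterMatchState) reflect the Model Armor REST API as I understand it; treat them as assumptions and verify against the repository code and the official documentation.

# Hypothetical sketch of calling the Model Armor sanitizeUserPrompt endpoint.
# Field names are assumptions; verify against the Model Armor documentation.
import requests
import google.auth
import google.auth.transport.requests

def sanitize_prompt(prompt: str, template_url: str) -> dict:
    # template_url is the full ...:sanitizeUserPrompt URL passed in by Apigee.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    credentials.refresh(google.auth.transport.requests.Request())

    resp = requests.post(
        template_url,
        headers={"Authorization": f"Bearer {credentials.token}"},
        json={"user_prompt_data": {"text": prompt}},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json().get("sanitizationResult", {})
    # Map the filter match state onto the verdict values the Apigee shared flow expects.
    verdict = result.get("filterMatchState", "ERROR")  # e.g. MATCH_FOUND / NO_MATCH_FOUND
    return {"verdict": verdict, "message": str(result.get("filterResults", ""))}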
2.5. Prompt Sanitization: The PromptSanitization Shared Flow and API Product Tiering
This feature centralizes prompt validation and offers differentiated services based on API Products. Apigee invokes a Shared Flow named PromptSanitization for this task.

Our shared flow has the steps/policies shown below:
<SharedFlow name="default">
<Step><Name>SC-SanitizePrompt</Name></Step>
<Step><Name>EV-ExtractModelArmorResponse</Name></Step>
<Step><Name>Raise-Fault-1</Name>
<Condition>(modelarmor.verdict == "MATCH_FOUND") or (modelarmor.verdict == "ERROR")</Condition>
</Step>
</SharedFlow>
Here’s a detailed explanation of the policies within this Shared Flow:
- Conditional Execution of FC-SanitizePrompt: The FC-SanitizePrompt policy in our main proxy’s flow has a condition checking the ModelArmorTemplatePath custom attribute from the API Product. If an API key from the “Advanced” API product is used, this attribute is present and the Shared Flow is executed. Otherwise, it is skipped.
- SC-SanitizePrompt: This ServiceCallout policy makes an authenticated HTTP POST request to our Cloud Run /sanitizePrompt endpoint. This is where the prompt is sent to our Cloud Run backend service for Model Armor sanitization. It includes a Google ID Token for secure, service-to-service authentication.
<ServiceCallout continueOnError="false" enabled="true" name="SC-SanitizePrompt">
<DisplayName>SC-SanitizePrompt</DisplayName>
<Request clearPayload="true" variable="myRequest">
<Set><Payload contentType="application/json">
{ "prompt": "{urirequest.prompt}", "model_armor_template_url": "{verifyapikey.VA-VerifyApiKey.apiproduct.ModelArmorTemplatePath}" }
</Payload></Set>
</Request>
<Response>calloutResponse</Response>
<HTTPTargetConnection>
<Authentication><GoogleIDToken><Audience useTargetUrl="true"/></GoogleIDToken></Authentication>
<URL>[CLOUD_RUN_SERVICE_URL]/sanitizePrompt</URL>
</HTTPTargetConnection>
</ServiceCallout>
Note: replace the CLOUD_RUN_SERVICE_URL with your Cloud Run service URL.
- EV-ExtractModelArmorResponse: This ExtractVariables policy parses the JSON response received from the SC-SanitizePrompt ServiceCallout. It extracts the verdict (e.g., NO_MATCH_FOUND, MATCH_FOUND, ERROR) and any message from Model Armor’s response, making these values available as Apigee flow variables (modelarmor.verdict, modelarmor.message) for subsequent policy decisions.
<ExtractVariables name="EV-ExtractModelArmorResponse">
<Source>calloutResponse</Source>
<VariablePrefix>modelarmor</VariablePrefix>
<JSONPayload>
<Variable name="verdict"><JSONPath>$.verdict</JSONPath></Variable>
<Variable name="message"><JSONPath>$.message</JSONPath></Variable>
</JSONPayload>
</ExtractVariables>
- Raise-Fault-1: This RaiseFault policy is conditionally executed based on the modelarmor.verdict variable. If Model Armor flags a sensitive prompt (MATCH_FOUND) or an ERROR occurred during sanitization, this policy is triggered. It then returns a 400 Bad Request HTTP status code and an informative error payload to the client, preventing the harmful or invalid prompt from reaching the LLM.
<RaiseFault continueOnError="false" enabled="true" name="Raise-Fault-1">
<DisplayName>Raise Fault-1</DisplayName>
<FaultResponse>
<Set>
<Payload contentType="application/json">{ "error": "{modelarmor.message}" }</Payload>
<StatusCode>400</StatusCode><ReasonPhrase>Bad Request</ReasonPhrase>
</Set>
</FaultResponse>
<Condition>(modelarmor.verdict == "MATCH_FOUND") or (modelarmor.verdict == "ERROR")</Condition>
</RaiseFault>
2.6. Defining API Products: Tiered Access and Capabilities
A key strength of Apigee is its ability to define and manage API Products. These products allow you to bundle your LLM APIs and offer varying levels of service, features, and quotas to different developer segments or use cases. For this tutorial, we define two distinct API products to showcase tiered access:
- HIB_LLM_API_Product_Basic: Standard tier, basic access.
- HIB_LLM_API_Product_Advanced: Premium tier, enhanced capabilities like prompt sanitization and higher quota limits.
Apigee intelligently detects which API Product is being used at runtime based on the API key (via VA-VerifyApiKey) provided by the API client.
Steps to create your API Products in Apigee:
- Navigate to Publish > API Products in your Apigee UI.
- Click “+ API Product” to create each product.
- Configure each product as shown below, focusing on Operations (paths, methods, quota) and Custom Attributes.
Below is the configuration of our basic API product.

The advanced API product below includes a custom attribute that points to the Model Armor template used for prompt sanitization. You will need to replace this with your own template URL.

By configuring API Products this way, Apigee dynamically applies policies like quota and conditionally executes prompt sanitization based on the API key.
3. The Frontend Application (Streamlit on Google Cloud Run)
Our frontend is an interactive Streamlit application that acts as the client and provides the user interface for interacting with your Apigee LLM Gateway. It allows users to select LLMs, input prompts, choose API product tiers, and send requests to your Apigee proxy.

The full source code for the Streamlit frontend is available in the GitHub repository under the frontend/ directory.
Key Responsibilities (Referencing app.py):
- User Interface: Provides UI widgets for selecting LLMs, API products, and inputting prompts.
- API Product Selection: Allows choosing the “Basic” or “Advanced” API Product, which selects the corresponding API key from config.py, demonstrating tiered capabilities.
- Apigee Proxy Interaction: The call_api function sends POST requests to our Apigee proxy, including the API key in the apikey header. A hedged sketch of such a helper follows below.
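Below is an illustrative sketch of what such a call_api helper might look like. The real implementation lives in frontend/app.py, and the exact request shape (query parameters vs. JSON body, parameter names) must match what the proxy’s EV-Inputs policy expects, so treat the payload here as an assumption.

# Illustrative client helper (the real one is frontend/app.py's call_api).
import requests

def call_api(apigee_endpoint: str, llm: str, model: str, prompt: str, api_key: str) -> dict:
    """POST a prompt to the Apigee LLM gateway for the given provider path."""
    resp = requests.post(
        f"{apigee_endpoint}/{llm}",                  # e.g. .../gemini, .../openai, .../anthropic
        headers={"apikey": api_key},                 # verified by VA-VerifyApiKey
        params={"model": model, "prompt": prompt},   # assumed shape; must match EV-Inputs
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()                               # {"response": ..., "tokens_count": ...}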
For this Streamlit application to call Apigee, it needs valid API keys linked to your API Products. As the next step, we will create two developer apps in Apigee in order to obtain API keys.
A. Create Developer Apps:
- Navigate to Publish > Developer Apps in your Apigee UI.
- Create two apps: “LLM App Basic App” (associate it with HIB_LLM_API_Product_Basic) and “LLM App Advanced App” (associate it with HIB_LLM_API_Product_Advanced).
- Copy the API key for each app.
- Update your frontend/config.py file with these actual API keys and your Apigee Proxy Endpoint URL.
This config.py file stores sensitive or environment-specific information for local development and testing only.
- apigee_endpoint: Base URL of your deployed Apigee proxy. Update this with your actual proxy host.
- apigee_api_key_basic: API key for HIB_LLM_API_Product_Basic.
- apigee_api_key_secure: API key for HIB_LLM_API_Product_Advanced.
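For local development, config.py can be as simple as the following sketch (placeholder values shown; never commit real keys):

# frontend/config.py -- local development only; placeholder values shown.
apigee_endpoint = "https://<YOUR_APIGEE_HOST>/<PROXY_BASE_PATH>"   # your proxy's base URL
apigee_api_key_basic = "REPLACE_WITH_BASIC_APP_KEY"      # key from "LLM App Basic App"
apigee_api_key_secure = "REPLACE_WITH_ADVANCED_APP_KEY"  # key from "LLM App Advanced App"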
Important Security Note: For production, externalize these sensitive values (API keys and endpoint URLs) using Google Cloud Secret Manager (most secure and recommended) or Cloud Run environment variables. Do not commit them to a GitHub repository.
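If you go the Secret Manager route, a minimal sketch of loading a key at startup could look like this. It assumes the google-cloud-secret-manager client library and a secret you have created yourself (the secret name below is hypothetical).

# Hedged sketch: load an API key from Secret Manager instead of config.py.
# Requires: pip install google-cloud-secret-manager, and the runtime service
# account needs roles/secretmanager.secretAccessor on the secret.
from google.cloud import secretmanager

def load_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

# Example (hypothetical secret name):
# apigee_api_key_basic = load_secret("<YOUR_GCP_PROJECT_ID>", "apigee-api-key-basic")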
B. Deploying the Streamlit Frontend to Google Cloud Run:
Deploying the Streamlit application to Cloud Run makes it accessible as a serverless web application.
Navigate to the frontend/ directory and deploy the service using the commands below:
cd frontend
gcloud run deploy <YOUR_SERVICE_NAME> --source . --region <YOUR_GCP_REGION> --allow-unauthenticated
Once deployed, open the Service URL for your Cloud Run service in your browser. The UI is shown in the screenshot below.
Note: Unauthenticated requests are allowed here for demonstration purposes only. For production environments, I recommend securing your Cloud Run service using GCP IAM or Identity-Aware Proxy (IAP).
4. Analytics and Monitoring: Gaining Insights into LLM Usage
Apigee’s analytics capabilities are crucial for managing LLM usage, controlling costs, and understanding consumption patterns.
4.1. Data Capture Policies for LLM Metrics
Our proxy includes DataCapture policies to extract and record key metrics:
- DC-TokensCount: Captures total tokens used per interaction into a Data Collector named dc_llm_used_tokens.
<DataCapture name="DC-TokensCount" continueOnError="false" enabled="true">
<Capture>
<DataCollector>dc_llm_used_tokens</DataCollector>
<Collect ref="apigee.tokens_count"/>
</Capture>
<Condition>response.status.code = 200 </Condition>
</DataCapture>
- DC-ModelType: Captures the LLM model type for analytics. The full XML is in apiproxy/policies/DC-ModelType.xml.
For the DataCapture policies above to work, we need to create two Data Collectors in Apigee as described below:
Data Collector dc_llm_used_tokens
- Name: dc_llm_used_tokens
- Display Name: LLM Used Tokens
- Type: INTEGER
Data Collector dc_model_type
- Name: dc_model_type
- Display Name: LLM Model Type
- Type: STRING
4.2. Leveraging Custom Reports for Insights
Once the data collectors are active, we can create one or more custom analytics reports.
Create a new Custom Report:
- Metrics: Select LLM Used Tokens (dc_llm_used_tokens).
- Dimensions: Add API Product, Developer App, Developer Email, LLM Model Type (dc_model_type), etc., to slice the data.
- Filters: Apply filters to narrow the report to specific products, apps, or models.
This visibility helps answer questions like:
- How many tokens does each API Product tier consume?
- Which apps or developers use the most tokens?
- What is the token breakdown per LLM model?
This deep visibility allows informed decisions on pricing, resource allocation, and API strategy. Furthermore, dc_llm_used_tokens is a powerful asset for monetization. By tracking token consumption, you can integrate this data with Apigee’s monetization features to establish usage-based pricing models, directly translating API usage into business value.
5. Testing the End-to-End Setup
Now that all components are deployed and configured, it’s time to test the entire LLM Gateway solution end-to-end.
1. Access the Frontend Application:
- Open your web browser and navigate to the Cloud Run Service URL of your Streamlit frontend (e.g., https://llms-frontend-xxxxxxx-uc.a.run.app).
2. Initial LLM Interaction:
- Select an API Product: Choose “Basic LLM API Product”.
- Select an LLM: Choose gemini-1.5-flash.
- Enter a Prompt: Type a simple, harmless prompt, e.g., “Tell me a fun fact about giraffes.”
- Click “Submit”.
- Verify Result: You should see a response from the Gemini LLM. Note the “Total tokens used”.
3. Monitor Apigee Trace Session (Highly Recommended!):
- In the Apigee UI, navigate to Develop > API Proxies > apigee-llms-gw-api > “Trace” tab.
- Click “Start Trace Session”.
- Send a few more requests from the Streamlit app.
- Observe the trace to see: Policy Execution, Flow Variables, and Target Request/Response.
- Stop Trace Session when done.
4. Test Quota Limits:
- In the Streamlit frontend, ensure “Basic LLM API Product” is selected.
- Rapidly send more requests than its quota (e.g., 5 requests/minute).
- Expected Result: An Apigee error (e.g., “HTTP error occurred: 500 Server Error”) due to RF-QuotaExceeded. Check the Apigee Trace.
5. Test Rate Limiting (Spike Arrest):
- Attempt to send a very high volume of requests in a short burst (faster than the smoothed SA-SpikeArrest rate of 30 requests per minute, roughly one every two seconds, in our case).
- Expected Result: You should receive 429 Too Many Requests errors from Apigee when the SA-SpikeArrest limit is hit. Verify RF-TooManyRequests in the Apigee Trace. A small burst-test sketch follows at the end of this section.
6. Test Prompt Sanitization (Advanced API Product):
- In the Streamlit frontend, select “Advanced LLM API Product”.
- Try a “sensitive” prompt: Enter a prompt that your Model Armor template should flag (e.g., “What should I do with my NVDA call options?” for the FINANCE filter).
- Expected Result: An error from Apigee (e.g., “HTTP error occurred: Prompt failed sanity check due to the following filters: FINANCE”).
- Verify in Trace: See the FC-SanitizePrompt, SC-SanitizePrompt, EV-ExtractModelArmorResponse, and Raise-Fault-1 execution.
7. Compare Quotas and Max Output Tokens (Basic vs. Advanced):
- Send requests with both “Advanced” and “Basic” API Products.
- Expected Result: The “Advanced” product should allow higher quotas and potentially a larger max_output_tokens, as configured.
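To trigger the quota and spike-arrest behaviour from outside the Streamlit UI, a small request loop is enough. The sketch below is illustrative only: the endpoint, key, and request shape are placeholders and must match your proxy (see the call_api sketch in section 3).

# Hedged burst test: fire requests at the gateway and tally the status codes.
# Expect 429s once SA-SpikeArrest kicks in and quota faults once DynamicQuota
# is exhausted.
import collections
import requests

APIGEE_ENDPOINT = "https://<YOUR_APIGEE_HOST>/<PROXY_BASE_PATH>"  # placeholder
API_KEY = "REPLACE_WITH_BASIC_APP_KEY"                            # placeholder

counts = collections.Counter()
for _ in range(20):
    r = requests.post(
        f"{APIGEE_ENDPOINT}/gemini",
        headers={"apikey": API_KEY},
        params={"model": "gemini-1.5-flash", "prompt": "ping"},  # assumed shape
        timeout=60,
    )
    counts[r.status_code] += 1

print(dict(counts))  # expect a mix of 200s and 429/quota errors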
Conclusion
Congratulations! You have successfully implemented a robust and intelligent LLM Gateway using Apigee API Management. Throughout this two-part series, we’ve moved from understanding the architectural advantages to a hands-on deployment, demonstrating how Apigee acts as a powerful intermediary for your AI initiatives.
You’ve learned how to:
- Integrate diverse LLMs (Gemini, OpenAI, Anthropic) through a unified Cloud Run backend service.
- Securely expose LLM APIs via Apigee, leveraging API key verification and ID token authentication for Cloud Run.
- Implement sophisticated traffic management with dynamic quotas and spike arrest policies, adaptable to different API product tiers.
- Orchestrate advanced features like real-time prompt sanitization using Google’s Model Armor, ensuring responsible AI usage.
- Gain deep insights into LLM consumption through Apigee’s custom analytics and data collectors, paving the way for cost management and potential monetization.
This Apigee-powered LLM gateway provides a scalable, secure, and observable foundation for your AI-powered applications. It abstracts the complexity of multiple LLM providers, centralizes governance, and offers granular control over access and usage.
I encourage you to explore the provided GitHub repository, adapt this solution to your specific use cases, and further leverage Apigee’s extensive capabilities to unlock the full potential of your LLM integrations.
If you found this useful, please share it. Don’t hesitate to reach out with any questions.