Your Own DevOps Agent: The future of observability
Wed 28 May 2025
Introduction
Google has recently released the Agent Development Kit (ADK), a framework designed to simplify the development and deployment of AI-driven agents. This framework is remarkably user-friendly. In this context, an agent is an interactive API that leverages a Large Language Model (LLM), like Gemini, and is further empowered by your custom scripts to generate responses based on their output.
In the DevSecOps world, we’re already equipped with a multitude of tools and scripts (CLI, API, and more). Now, imagine equipping an AI with all of these capabilities.
As DevOps engineers, we often find ourselves manually reading logs, describing resources to find statuses, and watching events. What if each of us had a dedicated assistant to perform these initial diagnostic steps for us and point directly to the relevant information?
This article explores how to build and utilize an ADK (Python-based) Kubernetes agent. This agent will act as a diagnostic tool, callable via chat or an API, to quickly gather crucial information about our cluster and provide initial diagnostic insights.
Try it yourself; personally, I am convinced that this is the future of observability.
ADK agent development quick start
Getting started with the Agent Development Kit (ADK) is straightforward. This section will guide you through how to set up and create a (very) basic agent.
Prerequisites
- Before you begin, ensure you have Python installed (including `venv` for virtual environments and `pip` for package installation).
- You will also need a Google Cloud Platform (GCP) project set up to use Vertex AI for calling the Gemini model.
- Create a folder for your project:
    mkdir kubernetes-admin && cd kubernetes-admin
- Initialize a virtual environment:
    python -m venv venv
- Activate the virtual environment:
  - On macOS and Linux:
    source venv/bin/activate
  - On Windows:
    .\venv\Scripts\activate
- Install Poetry:
    pip install poetry
Basic structure
- Create the following files and directories within the `kubernetes-admin` directory you created earlier:
    kubernetes_admin/
    |- agent.py       # Agent instructions: initial prompt and parameters
    |- __init__.py    # A default init file
    |- .env
    |- tools/         # Hosts future tool interfaces
       |- __init__.py # A default init file for the tools package
- Navigate into the inner `kubernetes_admin` directory (this is where your Python package will reside):
    cd kubernetes_admin
- Initialize the Poetry project within this `kubernetes_admin` directory:
    poetry init
  Follow the prompts. You can accept the defaults for most of them, but ensure the package name is `kubernetes_admin`.
- Add the ADK dependency to your project:
    poetry add google-adk
  *(Note: The package name on PyPI is `google-adk`; it provides the `google.adk` module that `agent.py` imports.)*
- Populate the `kubernetes_admin/__init__.py` file. This makes the `agent` module accessible.
    # Writes kubernetes_admin/kubernetes_admin/__init__.py
    echo "from . import agent" > __init__.py
- The `kubernetes_admin/tools/__init__.py` file can remain empty for now.
    # In kubernetes_admin/kubernetes_admin/tools/
    touch __init__.py
- Create an empty `.env` file in the `kubernetes_admin/kubernetes_admin/` directory. This can be used later for environment-specific configurations if needed.
    # In kubernetes_admin/kubernetes_admin/
    touch .env
Navigate back to the root kubernetes-admin project directory:
cd ..
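The same layout can also be created from Python rather than by hand. The following is only a minimal sketch, assuming the file names listed in this section; the `scaffold` helper is a hypothetical name introduced for illustration and is not part of ADK or Poetry.

```python
from pathlib import Path


def scaffold(root: Path) -> list[Path]:
    """Create the kubernetes_admin package skeleton under root.

    Returns the files that were newly created; existing files are left untouched.
    """
    package = root / "kubernetes_admin"
    tools = package / "tools"
    tools.mkdir(parents=True, exist_ok=True)

    # File contents mirror the manual steps: only __init__.py has content.
    files = {
        package / "__init__.py": "from . import agent\n",
        package / "agent.py": "",
        package / ".env": "",
        tools / "__init__.py": "",
    }
    created = []
    for path, content in files.items():
        if not path.exists():  # never overwrite existing work
            path.write_text(content)
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` from the project directory creates any missing files; running it a second time returns an empty list because nothing is overwritten.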
Giving Agent Instructions
The agent.py file (located at kubernetes-admin/kubernetes_admin/agent.py) is crucial in the Agent Development Kit (ADK). It defines the initial prompt and core instructions for the AI agent. Later, it will be expanded to include the specific tools the agent can use and the logic for how it should utilize them.
You can use AI to help generate the initial prompt and instructions, or write them yourself.
Create/edit kubernetes-admin/kubernetes_admin/agent.py with the following content:
# In kubernetes_admin/kubernetes_admin/agent.py
"""Kubernetes_Admin: Agent for interacting with and managing a Kubernetes cluster."""
from google.adk.agents import LlmAgent
MODEL = "gemini-2.5-pro-preview-05-06"
KUBERNETES_ADMIN_INSTRUCTION = """
You are Kubernetes Admin, an AI assistant designed to help users manage and understand their Kubernetes (K8s) clusters.
Your primary goal is to help users by:
- Answering questions about their Kubernetes cluster.
- Retrieving information about resources like pods, nodes, services, deployments, namespaces, etc. using the available tools.
- Explaining Kubernetes concepts relevant to their queries.
- Assisting in understanding resource status and configurations.
Always strive to be clear, concise, and helpful in your responses.
Example interactions:
- User: "How many pods are running in the 'default' namespace?"
- You: (After using a tool to get pod count) "There are X pods currently running in the 'default' namespace."
- User: "What's the status of the pod named 'my-app-pod-123'?"
- You: (After using a tool to get pod status) "The pod 'my-app-pod-123' is currently in a 'Running' state."
"""
kubernetes_admin_agent = LlmAgent(
    name="kubernetes_admin_agent",
    model=MODEL,
    description=(
        "An AI agent that helps users explore and understand their Kubernetes cluster by answering questions and retrieving information about resources."
    ),
    instruction=KUBERNETES_ADMIN_INSTRUCTION,
    tools=[],  # No tools yet; the kubectl tool is added in a later section
)
root_agent = kubernetes_admin_agent
Testing the agent
Before running a test, you must configure your cloud environment:
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT=<Complete with your GCP project ID>
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_CLOUD_STORAGE_BUCKET=${GOOGLE_CLOUD_PROJECT}-adk-bucket-$(openssl rand -hex 4) # generate a unique bucket name
gcloud auth application-default login
gcloud storage buckets create gs://$GOOGLE_CLOUD_STORAGE_BUCKET
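A missing variable usually only shows up as an authentication or configuration error once the agent makes its first model call, so a small pre-flight check can save a round trip. The sketch below assumes that the three `GOOGLE_*` variables exported above are the ones that matter for Vertex AI; `missing_vars` is a hypothetical helper, not an ADK function.

```python
import os
from typing import Mapping

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "GOOGLE_GENAI_USE_VERTEXAI",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list from `missing_vars()` means all three variables are present in the current shell environment.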
Now you can run a test with the following command:
adk run .
If everything is ok, you will see the following prompt:
Log setup complete: /tmp/agents_log/agent.20250528_200909.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
Running agent kubernetes_admin_agent, type exit to exit.
[user]: hello
[kubernetes_admin_agent]: Hello! I'm Kubernetes Admin, your AI assistant for managing and understanding your Kubernetes cluster.
How can I help you today? For example, you can ask me about pods, nodes, services, or any other Kubernetes resources.
[user]: how many pod in my kubernetes clusters
[kubernetes_admin_agent]: I can help you with that! To tell you how many pods are in your cluster, I need to check all namespaces.
Should I proceed to get the list of all pods across all namespaces in your cluster?
Giving tools to my agent
First, let’s write a Python function to interact with kubectl and wrap it as an ADK tool.
- Create/edit `kubernetes-admin/kubernetes_admin/tools/kubectl_tool.py` with the following content:
# In kubernetes_admin/kubernetes_admin/tools/kubectl_tool.py
"""Tool for executing kubectl commands."""
import subprocess  # For actual kubectl calls

from google.adk.tools import FunctionTool


def run_kubectl(verbs: str = "get", resource_type: str = "pods", resource_name: str = "", namespace: str = "default") -> str:
    """
    Runs a kubectl command with the specified verb, resource type, optional resource name, and namespace.

    Examples:
        To get the configuration of a specific pod use verbs: "describe", namespace: "<namespace>", resource_type: "pods", resource_name: "<custom pod name>"
        To get logs of a specific pod use verbs: "logs", namespace: "<namespace>", resource_type: "", resource_name: "<custom pod name>"
        To list pods in all namespaces use verbs: "get", namespace: "all", resource_type: "pods", resource_name: ""

    Args:
        verbs: The kubectl verb to use (e.g., 'get', 'describe', 'logs', 'top').
        resource_type: The type of Kubernetes resource (e.g., 'pods', 'nodes', 'services', 'deployments'). Leave empty for 'logs'.
        resource_name: Optional. The specific name of the resource. If empty, lists all resources of the type.
        namespace: Optional. The Kubernetes namespace, or 'all' for all namespaces. Defaults to 'default'.

    Returns:
        A string containing the output of the kubectl command or an error message.
    """
    try:
        command = ["kubectl", verbs]
        if resource_type and resource_name:
            command.append(f"{resource_type}/{resource_name}")
        elif resource_name or resource_type:
            # Only one of the two is set, e.g. 'logs' takes just a pod name.
            command.append(resource_name or resource_type)
        if namespace != "all":
            command.extend(["-n", namespace])
        else:
            command.append("--all-namespaces")
        result = subprocess.run(command, capture_output=True, text=True, check=True, timeout=30)
        return result.stdout
    except FileNotFoundError:
        return "Error: kubectl command not found. Please ensure kubectl is installed and in your PATH."
    except subprocess.CalledProcessError as e:
        return f"Error executing kubectl command: {e.stderr}"
    except subprocess.TimeoutExpired:
        return "Error: kubectl command timed out."
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"


kubectl_tool = FunctionTool(func=run_kubectl)
- Now, let’s update kubernetes-admin/kubernetes_admin/agent.py to include this new tool. Import the tool:
from .tools.kubectl_tool import kubectl_tool # Import the FunctionTool instance
...
...
kubernetes_admin_agent = LlmAgent(
    name="kubernetes_admin_agent",
    model=MODEL,
    description=(
        "An AI agent that helps users explore and understand their Kubernetes cluster by answering questions and retrieving information about resources."
    ),
    instruction=KUBERNETES_ADMIN_INSTRUCTION,
    tools=[kubectl_tool],  # Pass the FunctionTool instance
)
- Update the agent’s instructions within KUBERNETES_ADMIN_INSTRUCTION in kubernetes-admin/kubernetes_admin/agent.py to inform the LLM about the new tool and how to use it:
KUBERNETES_ADMIN_INSTRUCTION = """
...
...
You have a tool named 'run_kubectl' to fetch information from the Kubernetes cluster by executing kubectl commands.
Tool 'run_kubectl' arguments:
- 'verbs': (string, required) The kubectl verb to use (e.g., 'get', 'describe', 'logs', 'top').
- 'resource_type': (string, required) The type of Kubernetes resource (e.g., 'pods', 'nodes', 'services', 'deployments', 'namespaces').
- 'resource_name': (string, optional) The specific name of the resource. If not provided, all resources of the given type in the namespace will be listed.
- 'namespace': (string, optional, defaults to 'default') The Kubernetes namespace to query.
When a user asks for information that requires querying the cluster (e.g., "list pods", "get node status", "describe service my-service", "get logs for pod xyz"), you MUST use the 'run_kubectl' tool.
Clearly state that you are using the tool, what you are querying for, and then present the information returned by the tool.
Determine the correct 'verbs' argument based on the user's request (e.g., "list" or "how many" implies 'get', "what's the status" implies 'get', "describe" implies 'describe', "show logs" implies 'logs').
"""
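Because the `verbs` argument of `run_kubectl` is chosen by the model, it can help to separate command construction from execution and to reject anything outside a read-only allowlist before a subprocess is started. The following is a sketch of that idea rather than part of the original tool: `build_kubectl_command` and `READ_ONLY_VERBS` are hypothetical names, and the allowlist simply reuses the verbs mentioned in the tool's docstring.

```python
# Verbs the docstring of run_kubectl already mentions; all of them only read state.
READ_ONLY_VERBS = frozenset({"get", "describe", "logs", "top"})


def build_kubectl_command(
    verb: str,
    resource_type: str = "pods",
    resource_name: str = "",
    namespace: str = "default",
) -> list[str]:
    """Build the kubectl argument list, rejecting verbs outside the allowlist."""
    if verb not in READ_ONLY_VERBS:
        raise ValueError(f"verb {verb!r} is not allowed")

    command = ["kubectl", verb]
    if resource_type and resource_name:
        command.append(f"{resource_type}/{resource_name}")
    elif resource_name or resource_type:
        # Only one of the two is set, e.g. 'logs' takes just a pod name.
        command.append(resource_name or resource_type)

    if namespace == "all":
        command.append("--all-namespaces")
    else:
        command.extend(["-n", namespace])
    return command
```

Keeping this function free of side effects means it can be unit-tested without a cluster; the tool would then pass the returned list to `subprocess.run` as before.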
And now real testing
Now, let’s perform a real test to see if our agent uses kubectl as expected. (Ensure you have a valid kubeconfig file and are connected to a Kubernetes cluster for this test.)
$ adk run .
...
[user]: how many pod in my cluster all namespaces
[kubernetes_admin_agent]: Okay, I can help you with that. I will use the `run_kubectl` tool to list all pods in all namespaces and then count them for you.
[kubernetes_admin_agent]: I have retrieved the list of all pods across all namespaces. There are a lot of them!
Based on the output, there are 119 pods in your cluster across all namespaces.
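In the transcript above, the model counts the pods itself from the raw listing. For long listings, a deterministic helper is easier to verify and could be exposed as a second tool. The sketch below assumes the default tabular output of `kubectl get` (one header row followed by one row per resource, no `--no-headers` flag); `count_resources` is a hypothetical name.

```python
def count_resources(kubectl_output: str) -> int:
    """Count data rows in tabular `kubectl get` output, skipping the header row."""
    rows = [line for line in kubectl_output.splitlines() if line.strip()]
    if not rows:
        return 0  # empty output: nothing was listed
    return len(rows) - 1  # the first non-empty line is the column header
```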
Kubernetes integration
We will now deploy our agent into a Kubernetes cluster so that other users can use it.
Build a Docker container
Here’s a Dockerfile to build a container image for your Python agent. This Dockerfile:
- Uses a slim Python base image.
- Installs `kubectl` (as your agent’s tool uses it) and Poetry.
- Sets up a non-root user for better security.
- Installs your project dependencies using Poetry.
- Sets the default command to run your ADK agent using its built-in HTTP server.
```Dockerfile
# In kubernetes_admin/Dockerfile
FROM python:3.13-slim
WORKDIR /app

RUN pip install poetry
ENV PORT=8080
RUN apt update && apt install -y kubernetes-client/stable prometheus/stable && apt clean
COPY . .

RUN adduser --disabled-password --gecos "" myuser && \
    chown -R myuser:myuser /app

USER myuser

RUN poetry install

ENV PATH="/home/myuser/.local/bin:$PATH"
CMD poetry run uvicorn main:app --host 0.0.0.0 --port $PORT
```
- (One-time setup) Create an Artifact Registry repository if you haven’t already.
    # Replace YOUR_REGION and YOUR_GOOGLE_CLOUD_PROJECT with your details.
    gcloud artifacts repositories create adk-repo --repository-format=docker \
      --location=YOUR_REGION \
      --description="ADK agent repository" \
      --project=YOUR_GOOGLE_CLOUD_PROJECT
- Build the Docker image and push it to Artifact Registry.
    # Replace YOUR_REGION and YOUR_GOOGLE_CLOUD_PROJECT with your details.
    gcloud builds submit --tag YOUR_REGION-docker.pkg.dev/YOUR_GOOGLE_CLOUD_PROJECT/adk-repo/k8s-admin-agent:latest \
Add RBAC
The following manifest creates a Kubernetes ServiceAccount in the kubernetes-admin-agent namespace. It also defines a ClusterRole with read-only permissions (get, list, watch) across all API groups and resources, and then uses a ClusterRoleBinding to grant these permissions to the created ServiceAccount. The namespace must already exist; if it does not, create it first with kubectl create ns kubernetes-admin-agent.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubernetes-admin-agent
  namespace: kubernetes-admin-agent
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubernetes-admin-agent
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows the kubernetes-admin-agent ServiceAccount to read all resources in any namespace.
kind: ClusterRoleBinding
metadata:
  name: kubernetes-admin-agent-global
subjects:
  - kind: ServiceAccount
    name: kubernetes-admin-agent
    namespace: kubernetes-admin-agent
roleRef:
  kind: ClusterRole
  name: kubernetes-admin-agent
  apiGroup: rbac.authorization.k8s.io
EOF
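A wildcard rule is convenient, but it is worth checking what it actually grants. The sketch below inspects RBAC rules expressed as plain Python dictionaries (the same shape as the `rules` section of the manifest above) and reports two things: any verb beyond get, list, and watch, and whether the rules allow reading Secrets, which the `"*"` resource wildcard does. Both function names are hypothetical, and the check deliberately ignores finer points such as `resourceNames` restrictions.

```python
READ_ONLY = {"get", "list", "watch"}


def non_read_only_verbs(rules: list[dict]) -> set[str]:
    """Return every verb in the rules that is not get, list, or watch."""
    found: set[str] = set()
    for rule in rules:
        found.update(set(rule.get("verbs", [])) - READ_ONLY)
    return found


def can_read_secrets(rules: list[dict]) -> bool:
    """Return True if any rule grants a read verb on Secrets (core API group)."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = set(rule.get("verbs", []))
        in_core_group = "" in groups or "*" in groups  # Secrets live in the core ("") group
        covers_secrets = "secrets" in resources or "*" in resources
        reads = bool(verbs & (READ_ONLY | {"*"}))
        if in_core_group and covers_secrets and reads:
            return True
    return False
```

For the ClusterRole above, `non_read_only_verbs` returns an empty set while `can_read_secrets` returns `True`, so anyone who can chat with the agent can potentially ask it to display Secret contents. Listing explicit resources instead of `"*"` is one way to narrow this.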
Add a secret for Google credentials
To allow your agent running in Kubernetes to authenticate with Google Cloud services (specifically Vertex AI for Gemini), you need to create a GCP service account, grant it the necessary permissions, generate a key for it, and store this key as a Kubernetes secret.
Make sure your GOOGLE_CLOUD_PROJECT environment variable is set correctly before running these commands.
# 1. Create the service account
gcloud iam service-accounts create kubernetes-admin-agent
# 2. Grant the service account the "Vertex AI User" role on your project
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
--member="serviceAccount:kubernetes-admin-agent@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
# 3. Create and download a key for the service account
gcloud iam service-accounts keys create key.json \
--iam-account="kubernetes-admin-agent@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com"
# 4. Create a Kubernetes namespace for the agent (if not already created)
kubectl create ns kubernetes-admin-agent
# 5. Create a Kubernetes secret from the downloaded key file
kubectl create secret generic google-cloud-key -n kubernetes-admin-agent --from-file=key.json=key.json
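Once the secret is mounted into the agent's pod as a file, Google client libraries locate it through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which must hold the path to the key file. The sketch below is a start-up sanity check under that assumption; the mount path is whatever your Deployment defines (not shown in this article), and `service_account_email` is a hypothetical helper.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def service_account_email(env: Mapping[str, str] = os.environ) -> str:
    """Return the client_email of the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    key = json.loads(Path(key_path).read_text())
    # Service account key files carry a "type" field set to "service_account".
    if key.get("type") != "service_account":
        raise RuntimeError(f"{key_path} is not a service account key")
    return key["client_email"]
```

Logging the returned email at start-up makes it easy to confirm that the pod is using the `kubernetes-admin-agent` service account created above; the private key itself is never printed.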
Cloud Ops Chronicles