Agents, prompts, and audit patterns

The AI work I've done hasn't been chatbot-shaped. The problems I was trying to solve were assessment-shaped: collect data from an Azure environment, evaluate it against a framework, produce output that an engineer or a governance review could use. That framing changes almost every design decision — what the agent needs to do, how the system prompt is written, when tool calling matters more than generation.

This page covers the patterns I've built and iterated on: the multi-agent Azure Resource Graph system, the environment assessment application design, and where Semantic Kernel fits in.


The core design principle: no ungrounded responses

The most important rule I established for any Azure-facing agent is this: never report results that haven't been successfully retrieved. An agent that answers from general knowledge when a query fails is worse than useless — it produces plausible-sounding wrong answers about the specific environment you're investigating.

The system prompt pattern for any Azure query agent includes these explicit constraints:

## Core Principles (ALL AGENTS)
- Never report results that haven't been successfully retrieved
- Always verify query syntax before execution
- Properly handle and report errors
- Never respond with information from general AI knowledge when query execution fails
- Always use the appropriate resource table based on the resource type

This sounds obvious in principle. In practice, language models will helpfully fill in what they know about Azure when they can't execute a query — and that behaviour needs to be explicitly suppressed in the system prompt, not just hoped away.


Multi-agent Azure Resource Graph system

The first multi-agent system I built was for Azure Resource Graph queries. The motivation was that a single agent trying to generate, execute, and interpret KQL was too error-prone — specifically, it would generate syntactically wrong queries and then report on the results as if the queries had worked.

The three-agent design separates the failure modes:

Query Generator Agent

Responsibilities:

  • Parse user intent and translate to valid Azure Resource Graph KQL
  • Select the correct table (Resources, ResourceContainers, AuthorizationResources, etc.) based on the resource type
  • Verify syntax before passing to the Execution Agent
  • Return a corrected query if validation fails

The key failure this agent was built to prevent: using the wrong table. It's easy to query resources for something that only exists in authorizationResources, and get an empty result that looks like "no assignments found" rather than "wrong query."
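To make that failure concrete, here is the shape of the mistake in Azure Resource Graph KQL (the table and type names are real ARG names; the specific queries are an illustrative sketch, not lifted from the production prompts):

```kql
// Wrong table: role assignments don't live in resources,
// so this returns zero rows and reads as "no assignments found"
resources
| where type == "microsoft.authorization/roleassignments"

// Right table: authorizationresources holds role assignments
authorizationresources
| where type == "microsoft.authorization/roleassignments"
| project id, properties.principalId, properties.roleDefinitionId, properties.scope
```

The first query is syntactically valid, which is exactly why syntax checking alone isn't enough: the Generator has to validate table choice against resource type, not just parse the KQL.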

Execution Agent

Responsibilities:

  • Execute the query from the Query Generator
  • Handle pagination and result size limits
  • Return raw results or a structured error — nothing else
  • Flag partial results distinctly from empty results
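One way to make those distinctions machine-checkable (the field names here are illustrative, not the original system's schema) is for the Execution Agent to always return a small result envelope, where `status` is one of `ok`, `partial`, `empty`, or `error`, and `skipToken` carries Resource Graph's continuation token when results were truncated:

```json
{
  "status": "partial",
  "rows": ["...raw query results, untouched..."],
  "totalRecords": 1450,
  "returnedRecords": 1000,
  "skipToken": "...",
  "error": null
}
```

An envelope like this is what lets the Lead Agent distinguish "the query ran and found nothing" from "the query never ran" without parsing free text.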

Lead Agent

Responsibilities:

  • Coordinate the workflow between Generator and Execution
  • Verify that results came from actual query execution
  • Interpret results in context of the user's original request
  • Present final response — or explicitly state that the query failed and why

The Lead Agent's verify step is where the "no ungrounded responses" rule is enforced operationally. It checks that the Execution Agent returned actual data before allowing a response to be formed.


Azure environment assessment application

The larger project was a full assessment application that evaluates Azure environments against the Cloud Adoption Framework (CAF) and Well-Architected Framework (WAF). The design was application-shaped rather than chat-shaped — multi-tenant, read-only, structured output.

Architecture decisions

Authentication: OAuth 2.0/OIDC with both delegated permissions and service principal support. Multi-tenant consent flow. Read-only scope to Azure Resource Graph — the assessment application should never have write access to the environments it's assessing.

Data collection: Azure Resource Graph for all resource discovery. Query batching and pagination for large environments. State caching for incremental scans — you don't want to re-query everything on every run.

Assessment framework structure: Five WAF pillars as the top-level assessment categories:

  • Cost Optimization
  • Operational Excellence
  • Performance Efficiency
  • Reliability
  • Security

CAF domains mapped separately: Governance, Management, Platform architecture, Landing zone configurations, DevOps practices.

Rules engine

The rules engine design used severity levels (Critical, High, Medium, Low) with each rule defining:

  • The Resource Graph query to collect evidence
  • The evaluation logic (what constitutes a pass/fail)
  • The recommendation text
  • The CAF/WAF pillar mapping

Separating the query from the evaluation logic matters when you're building for scale across many tenants. The same query structure works across tenants; the evaluation threshold might vary by policy.
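A rule in that shape might look like the following sketch (the rule ID, field names, and example check are illustrative, not the production schema):

```json
{
  "id": "storage-https-only",
  "pillar": "Security",
  "severity": "High",
  "query": "resources | where type == 'microsoft.storage/storageaccounts' | where properties.supportsHttpsTrafficOnly != true | project id, name",
  "evaluation": "fail if the query returns any rows",
  "recommendation": "Enable 'Secure transfer required' on the listed storage accounts.",
  "mapping": { "waf": "Security", "caf": "Governance" }
}
```

Because the query and the evaluation are separate fields, a tenant-specific policy can tighten or relax the evaluation without touching the evidence collection.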

AI layer in the assessment

The AI component sits on top of the rules engine, not instead of it. The rules engine produces structured findings; the AI layer:

  • Groups related findings into coherent remediation themes
  • Prioritises remediation order based on dependencies
  • Drafts output suitable for engineering backlogs (work items, not reports)
  • Translates technical findings into governance language for review meetings

The pattern of using AI to transform structured output — rather than to generate findings directly — produces more reliable results. The findings are grounded in actual query data; the AI's job is presentation and prioritisation.


Semantic Kernel for document processing

A separate use case: using Semantic Kernel in .NET to loop through a screenshots folder, analyse each image with a local OpenAI-compatible vision model, and dump structured output to CSV.

The pattern:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Configure kernel with a local OpenAI-compatible endpoint
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "llava",       // or whichever local vision model
        apiKey: "not-needed",   // local endpoints ignore the key, but the connector requires one
        httpClient: new HttpClient { BaseAddress = new Uri("http://localhost:11434/v1") })
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();

// Loop screenshots and process each
foreach (var file in Directory.GetFiles(screenshotFolder, "*.png"))
{
    var imageData = await File.ReadAllBytesAsync(file);

    // Vision input goes into the chat message itself, not a prompt-template variable
    var history = new ChatHistory();
    history.AddUserMessage(new ChatMessageContentItemCollection
    {
        new TextContent(
            "Analyse this screenshot. Extract: category, key UI elements, " +
            "any error messages, application name if visible. Return as JSON."),
        new ImageContent(imageData, "image/png"),
    });

    var result = await chat.GetChatMessageContentAsync(history);

    // Write structured output to CSV
    csvWriter.WriteRecord(new { FileName = Path.GetFileName(file), Analysis = result.Content });
    csvWriter.NextRecord();
}

The local model endpoint (http://localhost:11434/v1 for Ollama) is drop-in compatible with the OpenAI SDK and Semantic Kernel's OpenAI connector, so the same processing loop runs against a local model or a hosted OpenAI-compatible endpoint: the only changes are the base URL, API key, and model ID. Azure OpenAI uses its own connector registration (AddAzureOpenAIChatCompletion, keyed on a deployment name), but everything after the builder stays the same.


Prompt design principles

Patterns that held up across all the agent work:

Role-specific framing, not generic assistant framing. An agent with the role "Azure Resource Graph query specialist" produces better KQL than one told to "help with Azure." The narrower the role, the tighter the output.

Explicit output schema. When the output needs to be machine-readable (JSON for a CSV dump, structured findings for a rules engine), include the schema in the system prompt with an example. "Return JSON with fields: category, severity, detail" produces more consistent output than hoping the model infers it.
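A minimal version of that schema-plus-example pattern, as it might appear in a system prompt (the field values and wording are illustrative):

```text
Return ONLY a JSON object with these fields:
  category  - one of: "error", "settings", "dashboard", "other"
  severity  - one of: "Critical", "High", "Medium", "Low"
  detail    - one sentence describing what was observed

Example:
{"category": "error", "severity": "High", "detail": "Login page showing a 500 error banner."}
```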

Tool calling for facts. Anything that should come from execution — query results, API responses, environment data — should come through a tool, not from the model's training data. The system prompt should explicitly state this as a requirement.

Separate reasoning from evidence collection. The multi-agent pattern is one expression of this: have one agent collect evidence, have another reason over it. Even in a single-agent design, structuring the prompt as "first execute this query, then interpret the results" produces better output than asking for both at once.

Fail explicitly. When a tool call fails or returns nothing, a well-designed agent reports that failure clearly rather than continuing with a response. "The query returned no results" is a valid and useful output. "I couldn't find any VMs in your subscription" fabricated from a failed query is not.