When to Write a Linting Rule vs. Defer to AI Tooling

This guidance helps authors decide whether a given Azure API design guideline should be implemented as a deterministic linting rule, deferred to AI authoring/validation tools, or addressed with a hybrid approach.

Prefer deterministic linting when the guideline can be expressed as a stable, reproducible check with high confidence and actionable diagnostics. Prefer AI when the guideline requires semantic judgment, domain intent, content generation, or evaluation of qualitative adequacy. Use hybrid approaches when deterministic rules can identify objective gaps and AI can help interpret, prioritize, or remediate them.

The decision to create a linting rule hinges primarily on determinism of detection — can you reliably identify violations with low false positives? Whether the resolution requires judgment is a separate question that determines whether the rule is standalone or hybrid, not whether it should exist.

All linter rules emit diagnostics with warning severity. This determines how rules behave in practice:

  • Warnings are suppressible — authors can add a suppression comment with justification
  • Unsuppressed warnings block CI — the compiler treats unsuppressed diagnostics as errors during validation
  • Errors (non-linter) indicate inconsistent code or code that will not compile and are never suppressible

Linter rules are never classed as errors because they enforce design guidelines, not language correctness. The suppression mechanism exists precisely because even high-confidence rules occasionally have legitimate exceptions that require human judgment.
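The severity behavior above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the compiler's implementation: `LintDiagnostic`, `blocksCi`, and `ciPasses` are hypothetical names, and treating an empty justification as "not suppressed" is an assumption made here for the sake of the example.

```typescript
// Hypothetical, simplified shapes for illustration; not the real compiler API.
type Severity = "warning" | "error";

interface LintDiagnostic {
  code: string;
  severity: Severity;
  // Present when the author suppressed the diagnostic with a justification.
  suppression?: { justification: string };
}

// A diagnostic blocks CI unless it is a warning carrying a justification.
// Assumption: an empty justification does not count as a suppression.
function blocksCi(d: LintDiagnostic): boolean {
  if (d.severity === "error") {
    return true; // errors are never suppressible
  }
  const justification = (d.suppression?.justification ?? "").trim();
  return justification.length === 0;
}

// CI passes only when no diagnostic blocks it.
function ciPasses(diagnostics: LintDiagnostic[]): boolean {
  return !diagnostics.some(blocksCi);
}
```

Under this model, a suppressed warning passes validation, an unsuppressed warning fails it, and an error fails it regardless of any suppression attached.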

When evaluating whether a guideline should become a linting rule, consider two independent questions:

  1. Can violations be detected deterministically with high fidelity? (detection axis)
  2. Can violations be resolved without contextual judgment? (resolution axis)

These combine into four categories:

| Detection | Resolution | Approach |
| --- | --- | --- |
| Deterministic, low false positives | Unambiguous or few options | Standalone linting rule: detects and may offer a fix |
| Deterministic, low false positives | Requires judgment/context | Hybrid: rule detects, AI assists resolution |
| Requires semantic judgment | Requires judgment/context | Defer to AI tooling |
| Requires semantic judgment | Unambiguous or few options | Defer to AI tooling (detection is the bottleneck) |

The key insight: detection determinism decides whether a rule should exist. Resolution complexity decides whether it needs AI assistance, not whether it’s worth building.
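The two axes can be expressed as a small function, which makes the asymmetry explicit: the detection axis is checked first and alone decides whether a rule exists. The type and function names below are hypothetical and exist only for this sketch.

```typescript
// Hypothetical names for the two axes and the resulting approach.
type Detection = "deterministic" | "semantic";
type Resolution = "unambiguous" | "needs-judgment";
type Approach = "standalone-rule" | "hybrid" | "defer-to-ai";

function chooseApproach(detection: Detection, resolution: Resolution): Approach {
  // Detection decides whether a rule should exist at all.
  if (detection === "semantic") {
    return "defer-to-ai";
  }
  // Resolution only decides standalone vs. hybrid.
  return resolution === "unambiguous" ? "standalone-rule" : "hybrid";
}
```

Note that the `resolution` argument is never consulted when detection requires semantic judgment, which mirrors the last two rows of the table.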

These criteria all relate to the reliability of detection:

1. The violation is mechanically identifiable from the AST

You can write a predicate over the TypeSpec syntax tree that reliably identifies violations. The check relies on structural or syntactic properties, not on understanding what the API means.

  • “No resource type should use the suffix ‘Resource’” — name check
  • “All operations must have an api-version parameter” — parameter presence check
  • “Enums must use the extensible pattern” — structural pattern check
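As a rough illustration, the first two checks above reduce to predicates over node shape. The node interfaces here are simplified, hypothetical stand-ins and are not the TypeSpec compiler's types; a real rule would visit the compiler's own type graph. The `wireName` field is likewise an assumption about how the parameter's serialized name might be exposed.

```typescript
// Hypothetical, simplified stand-ins for syntax-tree nodes.
interface ParameterNode {
  wireName: string; // assumed: the name as sent on the wire
}

interface OperationNode {
  name: string;
  parameters: ParameterNode[];
}

interface ModelNode {
  name: string;
  isResource: boolean;
}

// Name check: a resource type name should not end in "Resource".
function hasResourceSuffix(model: ModelNode): boolean {
  return model.isResource && /Resource$/.test(model.name);
}

// Parameter presence check: every operation needs an api-version parameter.
function lacksApiVersion(op: OperationNode): boolean {
  return !op.parameters.some((p) => p.wireName === "api-version");
}
```

Neither predicate needs to know what the API means. Both inspect only names and structure, which is what makes them candidates for linting rules.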

2. The false positive rate is very low

The rule should almost never flag code that is actually correct. If you cannot distinguish violations from valid patterns mechanically, the rule will produce noise that erodes author trust in the linter.
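One way to make "almost never" concrete is to run a candidate rule over a hand-labeled sample of existing specs and measure how many of its findings were wrong. The sketch below assumes such a sample exists. `LabeledFinding` and the `maxShare` threshold are hypothetical, and this guidance does not prescribe a specific number.

```typescript
// Hypothetical: one reviewed item from a hand-labeled sample.
interface LabeledFinding {
  ruleFlagged: boolean;      // did the candidate rule report it?
  actuallyViolates: boolean; // did a human reviewer agree it is a violation?
}

// Share of the rule's findings that were actually valid code.
function falsePositiveShare(sample: LabeledFinding[]): number {
  const flagged = sample.filter((s) => s.ruleFlagged);
  if (flagged.length === 0) {
    return 0; // nothing flagged, so nothing was flagged wrongly
  }
  const wrong = flagged.filter((s) => !s.actuallyViolates);
  return wrong.length / flagged.length;
}

// maxShare is an assumption chosen by the rule author, not a documented limit.
function isQuietEnough(sample: LabeledFinding[], maxShare: number): boolean {
  return falsePositiveShare(sample) <= maxShare;
}
```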

3. Exceptions are rare and handleable via suppression

The guideline doesn’t need to be exceptionless. Linter warnings are suppressible by design. A rule is appropriate as long as exceptions are infrequent enough that suppression is a reasonable mechanism (not a routine annoyance).

4. The diagnostic is actionable

The author can understand what’s wrong from the diagnostic message. They may or may not know how to fix it without help; that determines standalone vs. hybrid, not whether the rule should exist.

5. Fast feedback matters

Linting rules provide instant, in-editor feedback. For guidelines where catching violations early prevents expensive rework, deterministic detection is strongly preferred over asynchronous AI review.

These criteria indicate that detection itself requires judgment:

1. Identifying the violation requires semantic or domain understanding

The guideline’s applicability depends on what the API means, not just its structural shape. No AST predicate can reliably determine whether the code violates the guideline.

  • 🤖 “Resource types should have clear, concise names” — “clear” depends on domain context
  • 🤖 “Choose appropriate HTTP methods for operations” — requires understanding intent
  • 🤖 “Model structure should reflect the resource lifecycle” — requires domain modeling judgment

2. Detection would produce unacceptable false positives

If the mechanical proxy for the guideline would flag many valid patterns, it should not be a linting rule. Authors trained to suppress noise will also suppress legitimate findings.

3. The guideline evaluates subjective quality

Assessing whether something is good enough rather than present or structurally correct is inherently non-deterministic.

  • 🤖 “Documentation is clear, accurate, and useful”
  • 🤖 “The API surface is ergonomic for the target scenarios”
  • 🤖 “Naming clearly communicates purpose to consumers”

Hybrid Approach: Deterministic Detection + AI-Assisted Resolution

This is the middle ground where a linting rule should exist because detection is reliable, but the fix requires contextual judgment. The rule identifies the problem; AI helps solve it.

| Pattern | Rule Detects (deterministic) | AI Assists (judgment) |
| --- | --- | --- |
| Missing documentation | @doc decorator absent | Generates appropriate text |
| Naming violation | Suffix/prefix/casing wrong | Suggests contextually appropriate name |
| Missing pagination | List operation lacks paging | Helps structure the paging model |
| Overly broad type | Record<unknown> used | Suggests appropriate typed alternative |

Why the rule still matters in hybrid cases:

  • It provides a structured, reliable signal that AI tools can consume
  • It ensures the issue is never silently ignored (unsuppressed warnings block CI)
  • It gives instant in-editor feedback even when AI tools aren’t active
  • It makes the guideline auditable — you can count violations, track suppressions, measure compliance
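The auditability and "structured signal" points can be sketched as follows. The `RuleFinding` shape and both functions are hypothetical. They assume findings are available as plain records, which is a simplification of whatever a real pipeline would emit.

```typescript
// Hypothetical record of one rule finding in one spec.
interface RuleFinding {
  rule: string;
  target: string;      // e.g. the model or operation that was flagged
  suppressed: boolean;
}

interface RuleSummary {
  rule: string;
  open: number;
  suppressed: number;
}

// Count open and suppressed findings per rule, in first-seen order.
function summarize(findings: RuleFinding[]): RuleSummary[] {
  const summaries: RuleSummary[] = [];
  for (const f of findings) {
    let entry = summaries.filter((s) => s.rule === f.rule)[0];
    if (!entry) {
      entry = { rule: f.rule, open: 0, suppressed: 0 };
      summaries.push(entry);
    }
    if (f.suppressed) {
      entry.suppressed += 1;
    } else {
      entry.open += 1;
    }
  }
  return summaries;
}

// Open findings are the structured work items an AI assistant could pick up.
function aiWorkItems(findings: RuleFinding[]): RuleFinding[] {
  return findings.filter((f) => !f.suppressed);
}
```

Because the input is produced deterministically, the same spec always yields the same counts, which is what makes compliance trackable over time.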

❌ Skipping a rule because the fix is hard

If detection is reliable, create the rule even if the fix requires judgment. The hybrid pattern exists for exactly this case. A diagnostic that says “this model uses Record<unknown>, which limits SDK usability” is valuable even without an auto-fix.

❌ Rules whose easiest fix is meaningless mechanical compliance

If the rule incentivizes authors to add useless placeholders (e.g., @doc("The Foo property")) just to silence the warning, the rule needs complementary AI review of content quality. The rule is still worth having — it catches absence — but it shouldn’t be the only check.
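The gap between "present" and "useful" is easy to see in code. In the sketch below, `hasDoc` is the kind of check a deterministic rule can make, and a placeholder satisfies it. `restatesName` is a deliberately crude, hypothetical heuristic (including its filler-word list) that could route suspicious docs to review. It is an assumption for illustration and is not a substitute for judging content quality.

```typescript
// Hypothetical, simplified view of a documented property.
interface DocumentedProperty {
  name: string;
  doc?: string;
}

// What a presence rule can verify: some documentation text exists.
function hasDoc(p: DocumentedProperty): boolean {
  return (p.doc ?? "").trim().length > 0;
}

// Illustrative filler words; the list is an assumption, not a standard.
const FILLER_WORDS = ["the", "a", "an", "of", "property", "value", "field"];

// Crude heuristic: after dropping filler words, does the doc only repeat the name?
function restatesName(p: DocumentedProperty): boolean {
  const words = (p.doc ?? "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && FILLER_WORDS.indexOf(w) === -1);
  return words.length > 0 && words.join("") === p.name.toLowerCase();
}
```

A property documented as "The Foo property" passes `hasDoc` and is caught by `restatesName`, but a slightly more elaborate placeholder would slip past both, which is why content review remains necessary.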

❌ Rules encoding service-specific policy as universal Azure policy

A pattern that’s wrong for one service may be correct for another. Universal rules should reflect truly universal guidelines.

❌ Rules that require expensive whole-program or cross-version analysis

If the rule needs to compare against previous API versions or analyze the entire spec graph, it may be too expensive for real-time editor feedback. Consider running such checks only in CI rather than in-editor.

❌ AI as the sole enforcement for consistently-applied guidelines

AI validation is non-deterministic and hard to audit. If a guideline must be reproducibly enforced across all services, a deterministic rule (even a simple structural proxy) provides the necessary backstop.

| Factor | Favors Linting Rule | Favors AI Tooling |
| --- | --- | --- |
| Detection reliability | AST predicate, high confidence | Requires semantic understanding |
| False positive rate | Very low | Moderate to high |
| Knowledge for detection | Local structural/syntactic | Domain, historical, or cross-service |
| Speed importance | Critical (in-editor feedback) | Async/advisory acceptable |
| Guideline maturity | Well-established, stable | Evolving, subjective |
| Auditability need | Must track compliance | Advisory is sufficient |
| Maintenance cost | Simple AST check, stable APIs | Complex inference, frequent exceptions |

Note: fix complexity does not appear in this table — it affects whether the rule is standalone or hybrid, not whether it should exist.

When a new rule produces diagnostics on existing specs, each violation must be individually addressed before the rule can ship in a ruleset. All rules in a ruleset apply universally to all specs that ruleset covers — there is no mechanism to exempt individual specs from specific rules.

For each existing violation, the author must decide:

  1. Fix the violation (preferred) — If the fix is API-neutral (doesn’t require service-specific knowledge and doesn’t meaningfully change downstream artifacts), apply the fix directly.

  2. Suppress with FIXME — If the violation cannot be fixed without service-specific knowledge or would cause meaningful downstream changes, suppress it with a FIXME comment indicating why and what would be needed to resolve it.
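The fix-or-suppress decision above can be sketched as a triage function. The `ExistingViolation` fields and the FIXME wording are hypothetical. In practice the two flags come from a human reading the spec, not from tooling.

```typescript
// Hypothetical description of one violation found in an existing spec.
interface ExistingViolation {
  spec: string;
  needsServiceKnowledge: boolean;      // fix requires service-specific knowledge
  changesDownstreamArtifacts: boolean; // fix meaningfully changes downstream artifacts
}

type Triage =
  | { action: "fix" }
  | { action: "suppress"; fixme: string };

function triage(v: ExistingViolation): Triage {
  // API-neutral fixes are applied directly (the preferred path).
  if (!v.needsServiceKnowledge && !v.changesDownstreamArtifacts) {
    return { action: "fix" };
  }
  // Otherwise suppress, recording why and what is needed to resolve it.
  const reasons: string[] = [];
  if (v.needsServiceKnowledge) {
    reasons.push("needs service-specific knowledge");
  }
  if (v.changesDownstreamArtifacts) {
    reasons.push("would change downstream artifacts");
  }
  return { action: "suppress", fixme: "FIXME: " + reasons.join("; ") };
}
```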

The External Integration check (int:azure-specs) identifies which existing specs produce new diagnostics. See the Creating Linter Rules guide for the workflow of submitting spec fixes alongside linter rule PRs.

Can the violation be identified from AST/structure alone?
├── No → Defer to AI tooling
└── Yes → Would it produce high false positives?
    ├── Yes → Defer to AI tooling
    └── No → CREATE A LINTING RULE ✅
        └── Can the fix be applied without contextual judgment?
            ├── Yes → Standalone rule (may include code fix)
            └── No → Hybrid: rule detects, AI assists resolution