Apache Kafka Governance (B2B): Improving data visibility

Project overview

This case study shows how we introduced Kafka governance at Aiven in a way that’s secure, discoverable, and easy to adopt. It focuses on onboarding, navigation, and developer workflows, and on the key trade-offs I made to add governance without getting in the way of existing Kafka usage.

As Kafka adoption scaled across enterprise customers, governance gaps created operational risk: teams could not reliably discover ownership, validate schemas, or enforce access policies. This slowed onboarding, increased support load, and blocked broader platform adoption.

The core functionality includes the following:

  • Improve data discoverability: Enable users to explore data through a topic-centric catalog view.

  • Secure topic creation: Require service owner approval for new topics to ensure compliance and accountability.

  • Clarify topic ownership: Assign clear topic owners for easier management and administration.

  • Enforce topic policies: Define default and maximum configuration limits to ensure consistent and efficient resource usage.

  • Streamline approvals: Replace direct topic creation with structured request and approval flows for better governance.

My role

  • Led problem framing and UX strategy for Kafka governance

  • Led MVP scope definition and success metrics in the absence of a PM

  • Facilitated alignment across Platform, Billing, and SAs

  • Owned end-to-end design from discovery to rollout

Challenges

We identified three governance failures, but intentionally scoped the first release to governance activation, topic discoverability, and ownership distribution, as these were the highest-leverage blockers to adoption and a prerequisite for enforcement workflows.

Prototype

To align the team before development kicked off, I prototyped the main flows.

🔗 Review

Key design decisions

Below is a selection of the design challenges I tackled during the project. The list is not exhaustive, but it illustrates the range of problems I focused on.

Decision 1: Opt-in governance setup to reduce production risk and remove the main obstacle to feature adoption

Why do we need an activation flow instead of enabling governance by default?

Introducing governance through passive discovery rather than mandatory configuration lowers friction and increases the likelihood that teams return and gradually adopt advanced controls. It also allows the product to better serve different customer personas: small teams (1–5 developers) can start with lightweight visibility, while larger or enterprise teams can progressively activate more complex governance features as their needs mature.

After interviewing our enterprise customers, I identified several concerns that recurred across all participants:

  • Users were hesitant to enable governance on production clusters.

  • Topic catalog attracted initial interest but failed to retain engagement.

  • Tool placement lacked a clear purpose, reducing discovery.

  • Governance workflows needed to balance visibility with core service functionalities.

As a result, I prioritised an opt-in flow that allows customers to select only test clusters from the outset. Below are screenshots of the winning version of the activation flow:

Note: The production version of the video differs slightly from the designs, as we opted to use existing components to ensure faster development.

Decision 2: Low-friction onboarding to drive early adoption

🔍

Goal: Create a minimal, frictionless onboarding process that improves adoption while keeping development effort to a minimum.

As a result, I created a simple animation that is presented to the user through an existing design system component.

Decision 3: Governance placement within existing navigation

🔍

Goal: When I joined the project, an MVP of the Topic Catalog already existed in the Tools menu. The objective was to evaluate whether its placement supported adoption and, if needed, propose a better location.

📖

Context: User retention was low—only ~3–9% of users returned in week 1 after their first visit, dropping to ~0–3% by weeks 2–3—suggesting that initial interest wasn’t translating into sustained value.

Options Considered:

  1. Move Kafka Governance to the top-level navigation. This improved visibility but risked implying that Kafka is Aiven’s main product, misaligning with the platform’s broader positioning across databases and streaming services.

  2. Create a new structure by regrouping Admin features into more granular categories.

To validate the idea, I compiled a set of potential feature groups based on a competitor review and a series of brainstorming sessions with colleagues. Using the final selection of categories, I conducted a quick tree-sorting exercise with 10 internal users to evaluate the new structure and identify the optimal hierarchy.

The first round of validation showed a low agreement rate even among people well informed about the functionality, so I rejected this grouping option.

Final implementation

As a result, I decided to build on top of the existing structure until the team is ready to commit to a more substantial navigation redesign. The functionality was divided into two areas aligned with the main user personas: Admin (for setup and configuration) and Tools (for development and operations).

Nevertheless, the solution comes with a number of trade-offs, outlined below:

  • The topic catalog initially attracts user interest but fails to retain engagement. User drop-off is likely due to limited functionality and value delivery.

  • The current implementation preserves existing workflows but at the cost of utility.

  • There is an opportunity to reimagine the topic catalog with a clearer user value proposition.

  • Tool placement and organization lack a clear purpose and information architecture.

  • The placement helps to balance visibility with the priority of core service functionality.

  • Functionality remains hidden from users, reducing discovery.

Decision 4: Extending Governance to Terraform

For our largest enterprise customers, Terraform is the primary interface for managing Kafka resources. While the Console supports discovery and operational tasks, governance adoption required first-class support for existing Terraform workflows.

This was reinforced by two signals:

  • Customer workshops: A recurring theme was the need for governance support directly in Terraform to make adoption viable.

  • Usage data: The majority of API requests from enterprise accounts originated from Terraform, not the Console.

Due to limited Terraform engineering capacity, I split the solution into two phases.

Phase 1: Low Effort, High Return

  • Added an ownership field to Terraform resources to establish accountability

  • Introduced configuration and naming validation to catch policy violations early

# Topic Terraform resource
resource "aiven_kafka_topic" "kafka-topic1" {
  project                = data.aiven_project.example_project.project
  service_name           = aiven_kafka.example_kafka.service_name
  topic_name             = "kafka-topic1"
  # New field: the user group that owns the topic
  owner_user_group_id    = aiven_organization_user_group.group_example_one.id
  partitions             = 5
  replication            = 3
  termination_protection = true

  config {
    flush_ms       = 10
    cleanup_policy = "compact,delete"
  }

  timeouts {
    create = "1m"
    read   = "5m"
  }
}
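To make the idea of catching policy violations early more concrete, here is a minimal sketch of naming and configuration checks written as plain Terraform variable validation. It is not the validation shipped in the provider, whose rules are not described in this case study. The variable names, the `<team>.<domain>.<name>` naming convention, and the partition limit of 12 are all assumptions chosen for illustration.

```hcl
# Hypothetical policy values: the naming convention and limits below are
# assumptions for illustration, not the rules used in the actual product.
variable "topic_name" {
  type        = string
  description = "Kafka topic name, assumed to follow <team>.<domain>.<name>."

  validation {
    # Three dot-separated segments of lowercase letters, digits, or hyphens.
    condition     = can(regex("^[a-z0-9-]+\\.[a-z0-9-]+\\.[a-z0-9-]+$", var.topic_name))
    error_message = "Topic name must look like <team>.<domain>.<name>, using lowercase letters, digits, and hyphens."
  }
}

variable "partitions" {
  type        = number
  description = "Number of partitions for the topic."
  default     = 3 # assumed default

  validation {
    # Reject values outside the assumed allowed range at plan time.
    condition     = var.partitions >= 1 && var.partitions <= 12
    error_message = "Partitions must be between 1 and 12 (assumed maximum)."
  }
}
```

With checks like these, a violating value fails at `terraform plan` rather than after a topic has been created, which is the kind of early feedback Phase 1 aimed for.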

Phase 2: Advanced approval flow

In collaboration with engineering, we explored a Git-integrated approval model to ensure ACL changes require approval from the designated topic owner. This phase was scoped as a future enhancement due to implementation complexity.

Below is the artifact I used to align with the team:

Kate Lozanova

Results & Impact

The changes increased adoption, with five major customers beginning to test the governance features.

  • This early usage confirmed there is strong demand for the functionality.

  • As a result, the organization made expanding the feature set and growing the user base a key priority for the upcoming quarter.

What’s next?

  • Reevaluate the navigation decision when the feature set is more mature.

  • Focus on delivering access management, the feature most requested by clients.

  • Explore integrating governance flows into onboarding, starting from the creation of the first Kafka cluster.