Dronk
All posts
Engineering

How Dronk reads your Postgres and MySQL schema

A tour of the indexing layer that turns a live database into the context an agent needs to write correct SQL.

RC

Ryan Chandler

3 min read

Before Dronk writes a single query, it has to understand the database it's talking to. That understanding doesn't come from a config file you maintain by hand. It comes from the database itself. We read your schema, build a snapshot of it, and that snapshot is what the agent explores when it answers a question.

Start with the catalog

Both Postgres and MySQL expose their structure through queryable metadata. We lean on information_schema for the portable parts. Tables, columns, types, and nullability all come straight from the source of truth:

SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;

We pull more than that per column (defaults, length, precision, scale, collation) and we read indexes, primary keys, and unique and check constraints too, so the agent knows not just what exists but how it's shaped.

Relationships are the real prize

Columns tell you what exists. Relationships tell you how it connects, and joins are where generated SQL most often goes wrong. We pull declared foreign keys from the catalog:

SELECT
    kcu.table_name  AS from_table,
    kcu.column_name AS from_column,
    ref.table_name  AS to_table,
    ref.column_name AS to_column
FROM information_schema.key_column_usage kcu
JOIN information_schema.referential_constraints rc
  ON rc.constraint_name = kcu.constraint_name
JOIN information_schema.key_column_usage ref
  ON ref.constraint_name = rc.unique_constraint_name;

With that graph in hand, "revenue by customer" knows it has to travel from orders to customers on the right key, without anyone spelling it out.

When the foreign keys aren't there

Plenty of real databases don't declare every foreign key. Maybe they were never added, maybe the app enforces relationships in code instead. So we don't stop at what the catalog declares. We also infer relationships from naming.

If a table has a user_id column and there's a users table, that's almost certainly a relationship, even with no constraint backing it. Dronk spots that pattern and adds the relationship with a confidence score, so the agent can use it while still knowing it was inferred rather than declared:

orders.user_id  ->  users.id     declared foreign key
orders.team_id  ->  teams.id     inferred from naming (confidence 75)

We skip columns that look polymorphic (a commentable_id paired with a commentable_type), because the target isn't a single table and guessing one would do more harm than good.

Reading it once

Introspection isn't free, so we don't redo it on every question. We index the schema into a stored snapshot, and the agent reads from that. When your structure changes, you re-index and the snapshot catches up. The common path stays fast, and every answer is grounded in your real schema rather than a hand-written description that drifts out of date.

The unglamorous plumbing here is exactly what makes the magic feel reliable.