Dronk
All posts
Engineering

What happens when you ask for a million rows

Some questions touch a lot of data. Here's how Dronk returns a useful answer without letting a careless query take down your database.

RC

Ryan Chandler

3 min read

"Ask your database anything" runs into a hard reality pretty quickly. Some questions touch a lot of rows, and "show me every event we've ever recorded" is a fine thing to type but a terrible thing to run as-is against production. So Dronk never runs a generated query exactly as written. It wraps it, bounds it, and pages through it.

Every query gets a ceiling

Whatever the agent writes, we run it as a subquery with a limit on top. We ask for one more row than the page size so we can tell you there's more without a separate COUNT:

select * from (
    -- the agent's generated query goes here, untouched
) as dronk_results
limit 26;   -- a page of 25, plus one to detect "has more"

The page size defaults to 25, and there's a hard ceiling of 1,000 rows that a single result can request, no matter what. You get a fast, readable answer instead of a browser trying to render half a million rows.

More than a page? Page through it

When there is more, you don't lose it. We track the offset and fetch the next page on demand, so a large result becomes something you move through rather than something that arrives all at once:

Field Meaning
per_page How many rows came back this page
page Which page you're looking at
has_more Whether another page is waiting

Nothing is truncated silently. You can see there's more and go get it.

A clock on every query

A small LIMIT doesn't help if the query itself is expensive to compute. A join across three unindexed tables can chew through CPU long before it ever produces a row. So the connection that runs your query has a timeout baked into the session, and the database enforces it, not us:

-- Postgres
SET statement_timeout = 15000;        -- 15 seconds, in milliseconds

-- MySQL
SET SESSION max_execution_time = 15000;

If a query blows past that, the database cancels it and you get a clear error instead of a hung connection. The timeout is configurable, but the default is deliberately short. A question should feel quick, and the ones that can't be answered quickly should fail fast and tell you so.

The point of a limit

None of these bounds are there to be hit in normal use. They're there so the worst case is boring. Ask for a million rows and you don't melt anything. You get the first page, a note that there's more, and the option to keep going. Speed you can trust is worth more than speed that occasionally pages you at 2am.