What happens when you ask for a million rows
Some questions touch a lot of data. Here's how Dronk returns a useful answer without letting a careless query take down your database.
Ryan Chandler
3 min read
"Ask your database anything" runs into a hard reality pretty quickly. Some questions touch a lot of rows, and "show me every event we've ever recorded" is a fine thing to type but a terrible thing to run as-is against production. So Dronk never runs a generated query exactly as written. It wraps it, bounds it, and pages through it.
Every query gets a ceiling
Whatever the agent writes, we run it as a subquery with a limit on top. We ask
for one more row than the page size so we can tell you there's more without a
separate COUNT:
select * from (
-- the agent's generated query goes here, untouched
) as dronk_results
limit 26; -- a page of 25, plus one to detect "has more"
The page size defaults to 25, and there's a hard ceiling of 1,000 rows that a single result can request, no matter what. You get a fast, readable answer instead of a browser trying to render half a million rows.
More than a page? Page through it
When there is more, you don't lose it. We track the offset and fetch the next page on demand, so a large result becomes something you move through rather than something that arrives all at once:
| Field | Meaning |
|---|---|
per_page |
How many rows came back this page |
page |
Which page you're looking at |
has_more |
Whether another page is waiting |
Nothing is truncated silently. You can see there's more and go get it.
A clock on every query
A small LIMIT doesn't help if the query itself is expensive to compute. A join
across three unindexed tables can chew through CPU long before it ever produces a
row. So the connection that runs your query has a timeout baked into the session,
and the database enforces it, not us:
-- Postgres
SET statement_timeout = 15000; -- 15 seconds, in milliseconds
-- MySQL
SET SESSION max_execution_time = 15000;
If a query blows past that, the database cancels it and you get a clear error instead of a hung connection. The timeout is configurable, but the default is deliberately short. A question should feel quick, and the ones that can't be answered quickly should fail fast and tell you so.
The point of a limit
None of these bounds are there to be hit in normal use. They're there so the worst case is boring. Ask for a million rows and you don't melt anything. You get the first page, a note that there's more, and the option to keep going. Speed you can trust is worth more than speed that occasionally pages you at 2am.