Estimating with ranges: why "2 days, give or take" beats "2 days"

Forecasting · 2026-05-29

Ask someone how long a task will take and they will give you a number. Ask them whether they would bet a month's salary on that number and they will immediately give you a range. The range was in their head the whole time. Most planning tools simply refuse to record it.

That refusal has a cost. When every task in a plan is a single number, the plan's total is a single number too, and every downstream decision (the commitment to a client, the hiring call, the launch date) inherits false precision. The fix is small: store the spread alongside the centre. This article is about what that looks like and why it works.

Two numbers, not one

In TOL, Topolog's planning language, an estimate looks like this:

task t_record "Record the lessons" { agent: me; estimate: 20h cv 0.5 }

The 20h is the centre. The cv 0.5 is a coefficient of variation: the standard deviation divided by the mean. It is the honest part of the estimate, and you can read it as a fluency scale:

cv	What it feels like	Typical work
0.10	You have done this exact thing many times	Routine, repeated production
0.20	Familiar work, modest unknowns	Steady knowledge work, admin
0.30	Normal project work	Most tasks, most plans
0.50	Real unknowns in the middle	Integration, debugging, first drafts
0.80+	Genuine research	"We will know when we get there"

Two tasks with the same 20-hour centre and different cv values are different planning objects. One is a brick; the other is a cloud shaped like a brick. A forecast that treats them identically will be confidently wrong about one of them.
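To make the brick-versus-cloud difference concrete, here is a minimal Python sketch that turns a centre and a cv into lognormal parameters and reads off quantiles. It assumes the centre is the mean of the distribution (the convention this article uses later); the function names are illustrative, not part of TOL or the Topolog engine.

```python
import math
from statistics import NormalDist

def lognormal_params(mean_hours: float, cv: float) -> tuple[float, float]:
    """Return (mu, sigma) of the underlying normal for a lognormal
    with the given mean and coefficient of variation."""
    sigma2 = math.log(1.0 + cv * cv)           # variance on the log scale
    mu = math.log(mean_hours) - sigma2 / 2.0   # shift so the mean stays put
    return mu, math.sqrt(sigma2)

def quantile(mean_hours: float, cv: float, p: float) -> float:
    """Duration the task stays under with probability p."""
    mu, sigma = lognormal_params(mean_hours, cv)
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Same 20-hour centre, very different planning objects.
for cv in (0.10, 0.80):
    p50 = quantile(20.0, cv, 0.50)
    p90 = quantile(20.0, cv, 0.90)
    print(f"20h cv {cv:.2f}: P50 = {p50:.1f}h, P90 = {p90:.1f}h")
```

Under these assumptions the cv 0.10 task has a P90 of roughly 23 hours, while the cv 0.80 task has a P90 of roughly 38 hours: the same centre, with very different amounts of protection needed.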

Why lognormal, specifically

Once a task has a centre and a spread, the engine needs a distribution to sample from, and the shape matters. Task durations have three stubborn empirical properties: they are never negative, they cluster somewhere near the estimate, and they have a long right tail. A 20-hour task can blow out to 80 hours, a 60-hour overrun; it cannot come in 60 hours early, because a task cannot take negative time.

The lognormal distribution has exactly these properties, and the empirical literature backs it as the right family for activity durations. Sonmez (2004), in the ARCOM proceedings, anchored construction estimation on lognormal with low, medium, and high variability brackets near cv 0.10, 0.20, and 0.30. Ballesteros-Perez et al. (2020) measured actual-versus-planned duration ratios across more than 5,000 construction activities and confirmed both the family and the regimes. Trietsch and Baker (2012) found software and engineering activity times lognormal in shape as well. This research is not background reading; it is compiled into the engine as per-area defaults, and the next section prints the whole table.

The asymmetry is the part worth internalising. A lognormal with mean 20 hours and cv 0.5 puts meaningful probability on 35 hours and essentially none on 5. Your intuition already knows this ("it might balloon, it will not finish itself") and a symmetric plus-or-minus range fails to capture it. The long tail is not pessimism; it is the actual shape of work.
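The asymmetry can be checked with a few lines of Python. This is a sketch under the assumptions stated in the paragraph above (mean 20 hours, cv 0.5); the symmetric comparison is a normal distribution with the same mean and standard deviation, added here purely as a contrast and not something the engine uses.

```python
import math
from statistics import NormalDist

MEAN_HOURS, CV = 20.0, 0.5

# Lognormal parameters on the log scale for the given mean and cv.
sigma = math.sqrt(math.log(1.0 + CV ** 2))
mu = math.log(MEAN_HOURS) - sigma ** 2 / 2.0
log_duration = NormalDist(mu, sigma)

# Tail probabilities 15 hours either side of the centre.
p_over_35 = 1.0 - log_duration.cdf(math.log(35.0))
p_under_5 = log_duration.cdf(math.log(5.0))

# Symmetric "20 plus or minus" model with the same mean and spread.
symmetric = NormalDist(MEAN_HOURS, MEAN_HOURS * CV)
sym_over_35 = 1.0 - symmetric.cdf(35.0)
sym_under_5 = symmetric.cdf(5.0)

print(f"lognormal: P(>35h) = {p_over_35:.3f}, P(<5h) = {p_under_5:.4f}")
print(f"symmetric: P(>35h) = {sym_over_35:.3f}, P(<5h) = {sym_under_5:.3f}")
```

The lognormal puts roughly 8% of its mass above 35 hours and well under 1% below 5 hours. The symmetric model assigns the same probability to both, which is exactly the mistake the plus-or-minus habit bakes in.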

Eighteen areas, each with an evidence-anchored default

You do not have to pick every cv from intuition, because most tasks belong to a recognisable type of activity, and activity types have measurable variability signatures. Topolog ships a curated taxonomy of eighteen areas, designed to cover all known types of activity, each carrying a default cv anchored in the duration-variability literature. Tag a task with its area and it starts from the evidence; your own cv (and, later, your own actuals) refine from there.

The full table, exactly as the engine uses it:

Area	Default cv	Anchor
`software_engineering`	0.50	Trietsch and Baker 2012; Hofstadter's law
`systems_admin`	0.40	DevOps retrospectives (extrapolated from software)
`design`	0.55	Creative-iteration variance
`writing`	0.45	te Braak et al. 2023
`research`	0.60	Open-ended investigation (high-variability regime)
`music_production`	0.45	Audio-engineering retrospectives
`visual_art`	0.55	Studio-time tracking literature
`construction`	0.20	Ballesteros-Perez et al. 2020; Sonmez 2004
`cooking`	0.15	Recipe execution (low-variability bracket)
`sport_training`	0.20	Periodisation literature
`health_medical`	0.30	Clinical task-duration studies
`education_teaching`	0.30	Lesson prep and delivery
`admin`	0.20	Sonmez 2004 "medium" analogue
`finance_analysis`	0.35	Bracketed knowledge work (modelling, reporting)
`legal`	0.40	Contract-review studies
`negotiation`	0.50	Deal-cycle variance
`manufacturing`	0.15	Sonmez 2004 "low variability"
`universe_external`	0.05	Physical processes (near-deterministic)

The table rewards a minute of reading. Research sits at 0.60 and manufacturing at 0.15, a four-fold difference in relative spread, which is the quantitative version of something every planner knows qualitatively: an hour of lab work and an hour of production-line work are not the same kind of hour. Writing and music production land together at 0.45 (creative work with a craft floor); legal and systems administration share 0.40 (procedural work that occasionally opens a trapdoor). And `universe_external` at 0.05 covers the things that are not work at all, like paint drying and packages shipping: a schedule must contain them, but their durations barely vary.

The full citation list, with links to the underlying papers, lives in the handbook's references section; the same table is the canonical source inside the engine, so the defaults you read here are the defaults the forecast actually uses.
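As a rough illustration of how a per-area default and a per-task override might fit together, here is a Python sketch. The cv values are copied from the table above; the dictionary name, the `resolve_cv` function, and the choice to reject unknown areas are assumptions made for this example, not the engine's actual API.

```python
from typing import Optional

# Default cv per area, copied from the table in this article.
AREA_DEFAULT_CV = {
    "software_engineering": 0.50,
    "systems_admin": 0.40,
    "design": 0.55,
    "writing": 0.45,
    "research": 0.60,
    "music_production": 0.45,
    "visual_art": 0.55,
    "construction": 0.20,
    "cooking": 0.15,
    "sport_training": 0.20,
    "health_medical": 0.30,
    "education_teaching": 0.30,
    "admin": 0.20,
    "finance_analysis": 0.35,
    "legal": 0.40,
    "negotiation": 0.50,
    "manufacturing": 0.15,
    "universe_external": 0.05,
}

def resolve_cv(area: str, task_cv: Optional[float] = None) -> float:
    """Hypothetical lookup: an explicit task cv wins, else the area default."""
    if task_cv is not None:
        return task_cv
    try:
        return AREA_DEFAULT_CV[area]
    except KeyError:
        raise ValueError(f"unknown area: {area!r}") from None

print(resolve_cv("research"))               # area default
print(resolve_cv("research", task_cv=0.3))  # the planner's own judgement wins
```

The ordering reflects the idea in the text: the area supplies a starting point from the evidence, and anything you know about the specific task refines it.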

Ranges compound; points lie

The real payoff arrives when estimates combine. Add ten point estimates and you get one number that is wrong in an unknowable way. Sample ten lognormals through a dependency graph thousands of times, as the Monte Carlo engine does, and you get the full distribution of plan completion, including the effects no point arithmetic captures: parallel branches where the slowest one sets the pace, gates that branch the future, and the way independent uncertainties partially cancel while chained ones compound.
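The sampling idea fits in a few dozen lines of Python. What follows is a simplified sketch, not the Topolog engine: the four-task plan is invented for illustration, each centre is treated as a lognormal mean, tasks are assumed independent, and parallel branches are assumed to run at the same time (so the slowest prerequisite sets the start of the next task). Gates and resource contention are left out.

```python
import math
import random

# Hypothetical plan: name -> (mean hours, cv, prerequisites).
# Tasks are listed in dependency order, so a single pass is enough.
PLAN = {
    "script": (8.0, 0.3, []),
    "record": (20.0, 0.5, ["script"]),
    "slides": (6.0, 0.3, ["script"]),
    "edit": (10.0, 0.5, ["record", "slides"]),
}

def sample_duration(rng: random.Random, mean: float, cv: float) -> float:
    """Draw one duration from a lognormal with the given mean and cv."""
    sigma2 = math.log(1.0 + cv * cv)
    return rng.lognormvariate(math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2))

def simulate_once(rng: random.Random) -> float:
    """One possible future: elapsed hours until the last task finishes."""
    finish = {}
    for name, (mean, cv, deps) in PLAN.items():
        start = max((finish[d] for d in deps), default=0.0)  # slowest branch wins
        finish[name] = start + sample_duration(rng, mean, cv)
    return max(finish.values())

def completion_quantiles(runs: int = 10_000, seed: int = 1) -> dict:
    """Empirical quantiles of plan completion over many sampled futures."""
    rng = random.Random(seed)
    totals = sorted(simulate_once(rng) for _ in range(runs))
    return {p: totals[int(p * (runs - 1))] for p in (0.50, 0.80, 0.95)}

for p, hours in completion_quantiles().items():
    print(f"P{int(p * 100)}: {hours:.1f}h")
```

Point arithmetic along the longest path of this toy plan gives 8 + 20 + 10 = 38 hours. The sampled quantiles show how far above that number a safe commitment has to sit, and the `max` over prerequisites is where the "slowest branch sets the pace" effect enters.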

We walked through a concrete example of this in the completion spectrum article: four tasks summing to 44 estimated hours produced a P95 of 65 hours, purely from the per-task cv values. Nothing pathological, just compounding spread made visible.

There is also a correction the engine applies that deserves its own mention. People do not just estimate with spread; they estimate with bias, systematically low (the planning fallacy, the subject of its own article). Topolog separates the two: a Bayesian multiplier layer learns your systematic bias from completed work, while cv captures the residual randomness. Conflating them, as "just pad the estimate" does, fixes neither.

How to actually do it

Estimating with ranges sounds like more work. In practice it is about thirty seconds per task, because you are recording a judgement you already have:

1. Estimate the centre the way you always did. The typical case, not the best case.
2. Pick the cv off the fluency scale above. Have I done this before? Are the unknowns in the middle of it or at the edges?
3. Do not pad. Padding moves the centre to hide the spread, which destroys both numbers. Record the honest centre and the honest spread; let the quantiles do the protecting.
4. Let actuals correct you. As you complete work, recorded durations recalibrate the model (how that works). Your first cv guesses do not need to be right; they need to be written down.
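To show why written-down estimates are enough to start correcting, here is a deliberately simple Python sketch that separates bias from spread using the log of each actual-to-estimate ratio. It is a plain moment fit under a lognormal assumption, not the Bayesian multiplier layer mentioned earlier, and the function name and sample numbers are invented for illustration.

```python
import math
from statistics import mean, stdev

def recalibrate(estimates: list, actuals: list) -> tuple:
    """Return (bias_multiplier, cv) fitted to completed tasks.

    Assumes the ratios actual / estimate are lognormal, so their logs
    are roughly normal with mean m and standard deviation s.
    """
    logs = [math.log(a / e) for e, a in zip(estimates, actuals)]
    m, s = mean(logs), stdev(logs)
    bias = math.exp(m + s * s / 2.0)      # mean of the ratio: systematic bias
    cv = math.sqrt(math.exp(s * s) - 1.0)  # residual relative spread
    return bias, cv

# Hypothetical history: estimated hours and what the work actually took.
estimated = [4.0, 10.0, 6.0, 20.0, 8.0]
actual = [5.0, 16.0, 6.5, 31.0, 9.0]

bias, cv = recalibrate(estimated, actual)
print(f"multiply future centres by about {bias:.2f}; residual cv about {cv:.2f}")
```

The two outputs answer different questions. The multiplier says how far your centres run low on average; the cv says how much scatter remains once that bias is removed. Padding tries to answer both with one number.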

The deepest objection to ranges is that they feel like an admission of not knowing. They are the opposite. "Two days" pretends to knowledge nobody has. "Two days, cv 0.3" states precisely what you know and precisely how well you know it. In planning, as in measurement, the error bar is not a confession. It is the signature of someone who has done this before.
