Confidence Score | DimensionsPot

What the Confidence Score measures, how it is calibrated, and how to use it in your application.

What it is

The Confidence Score is a proprietary heuristic reliability index [0–100] attached to every output dimension. It reflects prediction uncertainty based on two factors:

Anchor tier — the quality and completeness of the measurements you supplied
Dimension type — BONE (skeletal) dimensions are more predictable than FLESH (soft-tissue) dimensions

It is not a frequentist p-value or a statistical confidence interval. The 95% prediction interval (range_95) is the statistically valid interval; the Confidence Score is an interpretability layer on top.

Calibration guarantee

The system is calibrated to never over-promise: actual 95% prediction interval coverage is always ≥ the stated confidence score. If the API reports a score of 78, the empirical coverage of range_95 in held-out validation data was ≥ 78%.

Empirical calibration evidence

Track 7 of the v1.4.0 validation tested this guarantee under the hardest realistic condition: a single non-primary anchor with no height and no mass. The test covered 14 anchor types with 100 subjects each (1,400 calls in total).

Result: 14 / 14 CALIBRATED. Zero optimistic verdicts.

Anchor	Tier	Stated confidence	Actual 95% PI coverage
`foot_length`	SECONDARY	36.3%	79.0%
`hand_length`	SECONDARY	31.7%	78.2%
`sitting_height`	SECONDARY	41.4%	74.0%
`span`	SECONDARY	47.5%	77.8%
`hip_circumference`	SECONDARY	15.1%	84.5%
`waist_circumference`	SECONDARY	11.6%	82.4%
`neck_circumference`	SECONDARY	12.0%	81.6%
`wrist_circumference`	SECONDARY	22.1%	79.3%
`head_circumference`	TERTIARY	14.2%	78.6%
`biacromial_breadth`	TERTIARY	23.5%	76.9%
`forearm_circumference`	TERTIARY	14.6%	80.1%
`ankle_circumference`	TERTIARY	9.4%	76.5%
`calf_circumference`	TERTIARY	13.8%	83.1%
`thigh_circumference`	TERTIARY	9.5%	78.2%

In every case, actual interval coverage exceeded the API’s stated confidence. The gap between stated and actual coverage is by design: the engine treats uncertainty conservatively. An application that treats confidence_score as a lower bound on range_95 coverage will not find actual coverage falling short of it.
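The comparison behind each verdict can be sketched in a few lines of Python. This is a minimal illustration, not the validation harness itself: the rows are copied from four lines of the Track 7 table, while the function names and the "OPTIMISTIC" label for a failing row are assumptions made for this example.

```python
# Rows copied from the Track 7 table:
# (anchor, stated confidence %, actual 95% PI coverage %).
TRACK7_ROWS = [
    ("foot_length", 36.3, 79.0),
    ("sitting_height", 41.4, 74.0),
    ("span", 47.5, 77.8),
    ("ankle_circumference", 9.4, 76.5),
]


def verdict(stated: float, actual: float) -> str:
    """Return 'CALIBRATED' when actual coverage is at least the stated score.

    The 'OPTIMISTIC' label for the failing case is an assumed name.
    """
    return "CALIBRATED" if actual >= stated else "OPTIMISTIC"


def margins(rows):
    """Map each anchor to (verdict, actual minus stated) in percentage points."""
    return {name: (verdict(s, a), round(a - s, 1)) for name, s, a in rows}


results = margins(TRACK7_ROWS)
for name, (label, gap) in results.items():
    print(f"{name}: {label} (margin {gap:+.1f} pts)")
```

The margin shows how conservative the stated score is for each anchor: the larger it is, the further actual coverage sits above the stated score.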

Note: stated confidence in Track 7 is an aggregate across all 130 output dimensions for a single-anchor input. High-confidence dimensions (directly constrained by the anchor) pull the average up; low-signal dimensions pull it down. The aggregate score is conservative by design.

Score by anchor tier

Tier	BONE score	FLESH score
`PRIMARY_RICH`	~87	~80
`PRIMARY_BOTH`	~85	~78
`PRIMARY_ONE`	~79	~62
`SECONDARY`	~74	~67
`TERTIARY`	~69	~62

Scores drop by up to 10 points when primary anchors are derived via imputation rather than supplied directly.
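For planning purposes, the tier table and the imputation penalty can be combined into a rough lookup. The sketch below is an assumption-laden illustration: the scores are the approximate values from the table, the function name is hypothetical, and the engine's actual per-dimension scores may differ.

```python
# Approximate scores copied from the tier table (values are marked "~" there).
TIER_SCORES = {
    "PRIMARY_RICH": {"BONE": 87, "FLESH": 80},
    "PRIMARY_BOTH": {"BONE": 85, "FLESH": 78},
    "PRIMARY_ONE": {"BONE": 79, "FLESH": 62},
    "SECONDARY": {"BONE": 74, "FLESH": 67},
    "TERTIARY": {"BONE": 69, "FLESH": 62},
}

# The text states a reduction of up to 10 points for imputed primary anchors.
MAX_IMPUTATION_PENALTY = 10


def expected_score_range(tier: str, dim_type: str, imputed: bool = False):
    """Return (lowest, highest) approximate score for a tier and dimension type.

    With imputed=True the lower bound applies the full penalty; the real
    penalty may be smaller, so the result is a range rather than one value.
    """
    base = TIER_SCORES[tier][dim_type]
    low = base - MAX_IMPUTATION_PENALTY if imputed else base
    return (low, base)


print(expected_score_range("PRIMARY_BOTH", "FLESH", imputed=True))
```

A range is returned because the text only gives an upper limit on the penalty, not its exact size for a given request.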

Supplied anchors score 100

Any measurement you provide as an anchor is returned with confidence_score: 100 and type: "MEASURED". The engine returns your input as-is — it does not adjust or replace it.
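Because supplied anchors come back tagged type: "MEASURED", a client can separate them from predicted dimensions before display. The following is a minimal sketch that assumes the response is a flat mapping from dimension name to an object; the helper name and the body_height value are hypothetical.

```python
def split_by_source(dimensions: dict):
    """Split dimensions into (measured, predicted) using the 'type' field.

    Anything not tagged "MEASURED" is treated as predicted; the tag the API
    uses for predicted dimensions is not assumed here.
    """
    measured, predicted = {}, {}
    for name, dim in dimensions.items():
        target = measured if dim.get("type") == "MEASURED" else predicted
        target[name] = dim
    return measured, predicted


# Hypothetical response fragment: body_height stands in for a supplied anchor.
response = {
    "body_height": {"value": 1755.0, "confidence_score": 100, "type": "MEASURED"},
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
}

measured, predicted = split_by_source(response)
print(sorted(measured), sorted(predicted))
```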

Pediatric engine scores

Dimension source	Score
LMS dimensions (body_height, body_mass, head_circumference ≤ 36 months)	99
Ridge hybrid predictions (all remaining pediatric dimensions)	≤ 80

Using the threshold filter

Filter out low-confidence dimensions automatically:

"output_format": {
  "confidence_score_threshold": 75
}

Dimensions with confidence_score < 75 are excluded from the response. Useful when you only want to surface high-reliability predictions in your UI.
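If you need the unfiltered response as well (for logging, say), the same rule can be applied on the client. This is a sketch of equivalent client-side filtering, not the server's implementation; the function name, the flat response shape, and the waist_circumference entry are assumptions for illustration.

```python
def filter_by_confidence(dimensions: dict, threshold: float) -> dict:
    """Keep only dimensions whose confidence_score is at least the threshold.

    Mirrors the documented rule: scores below the threshold are excluded.
    """
    return {
        name: dim
        for name, dim in dimensions.items()
        if dim["confidence_score"] >= threshold
    }


# chest_circumference is taken from the example on this page;
# waist_circumference is a made-up low-confidence entry.
dimensions = {
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
    "waist_circumference": {"value": 880.0, "confidence_score": 67},
}

visible = filter_by_confidence(dimensions, threshold=75)
print(list(visible))
```

A dimension scoring exactly 75 is kept, since only scores strictly below the threshold are dropped.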

The 95% prediction interval

range_95 gives the lower and upper bound of the 95% prediction interval for each dimension:

"chest_circumference": {
  "value": 1012.4,
  "confidence_score": 78,
  "range_95": [954.1, 1070.7]
}

Enable it with include_range_95: true in output_format. Use it when you need to communicate prediction uncertainty to users or downstream systems.
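One way to present range_95 to users is as a half-width relative to the predicted value. The sketch below uses the chest_circumference example from this page; the helper names are hypothetical, and no unit is printed because none is stated here.

```python
def interval_stats(dim: dict):
    """Return (half_width, relative_half_width) for a dimension's range_95."""
    low, high = dim["range_95"]
    half_width = (high - low) / 2
    return half_width, half_width / dim["value"]


def describe(name: str, dim: dict) -> str:
    """Format a one-line summary of the prediction and its 95% interval."""
    low, high = dim["range_95"]
    _, relative = interval_stats(dim)
    return (
        f"{name}: {dim['value']:.1f} "
        f"(95% PI {low:.1f} to {high:.1f}, about ±{relative:.1%})"
    )


chest = {"value": 1012.4, "confidence_score": 78, "range_95": [954.1, 1070.7]}
half_width, relative = interval_stats(chest)
summary = describe("chest_circumference", chest)
print(summary)
# chest_circumference: 1012.4 (95% PI 954.1 to 1070.7, about ±5.8%)
```

The relative half-width makes intervals comparable across dimensions of very different size.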