Confidence Score

What the Confidence Score measures, how it is calibrated, and how to use it in your application.

What it is

The Confidence Score is a proprietary heuristic reliability index, on a scale from 0 to 100, attached to every output dimension. It reflects prediction uncertainty based on two factors:

  1. Anchor tier — the quality and completeness of the measurements you supplied
  2. Dimension type — BONE (skeletal) dimensions are more predictable than FLESH (soft-tissue) dimensions

It is not a frequentist p-value or a statistical confidence interval. The 95% prediction interval (range_95) is the statistically valid interval; the Confidence Score is an interpretability layer on top.


Calibration guarantee

The system is calibrated so that it does not over-promise: the empirical coverage of the 95% prediction interval is at least the stated confidence score. For example, if the API reports a score of 78, range_95 contained the true value in at least 78% of cases in held-out validation data.
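
One way to check this guarantee against your own held-out measurements is sketched below in Python. The function names empirical_coverage and is_calibrated are illustrative and not part of the API; the only assumption about the response is that range_95 is a [lower, upper] pair, and the sample numbers are made up for the example.

```python
from typing import Sequence, Tuple


def empirical_coverage(
    truths: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> float:
    """Return the percentage of true values that fall inside their range_95."""
    if len(truths) == 0 or len(truths) != len(intervals):
        raise ValueError("need at least one pair, and one interval per true value")
    hits = sum(1 for t, (lo, hi) in zip(truths, intervals) if lo <= t <= hi)
    return 100.0 * hits / len(truths)


def is_calibrated(stated_score: float, coverage_pct: float) -> bool:
    """The guarantee holds when observed coverage is at least the stated score."""
    return coverage_pct >= stated_score


# Illustrative numbers only (not validation data): four measured values
# checked against the same predicted interval.
measured = [1000.0, 960.0, 1100.0, 1010.0]
predicted = [(954.1, 1070.7)] * 4
coverage = empirical_coverage(measured, predicted)  # 3 of 4 values fall inside
```

In this toy sample the coverage is 75%, so a stated score of 70 would pass the check and a stated score of 78 would not. A real check needs far more than four subjects to be meaningful.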


Empirical calibration evidence

Track 7 of the v1.4.0 validation tested this guarantee under the hardest realistic condition: a single non-primary anchor with no height and no mass. The test covered 14 anchor types with 100 subjects each (1,400 calls in total).

Result: 14 / 14 CALIBRATED. Zero optimistic verdicts.

Anchor                 Tier       Stated confidence  Actual 95% PI coverage
foot_length            SECONDARY  36.3%              79.0%
hand_length            SECONDARY  31.7%              78.2%
sitting_height         SECONDARY  41.4%              74.0%
span                   SECONDARY  47.5%              77.8%
hip_circumference      SECONDARY  15.1%              84.5%
waist_circumference    SECONDARY  11.6%              82.4%
neck_circumference     SECONDARY  12.0%              81.6%
wrist_circumference    SECONDARY  22.1%              79.3%
head_circumference     TERTIARY   14.2%              78.6%
biacromial_breadth     TERTIARY   23.5%              76.9%
forearm_circumference  TERTIARY   14.6%              80.1%
ankle_circumference    TERTIARY   9.4%               76.5%
calf_circumference     TERTIARY   13.8%              83.1%
thigh_circumference    TERTIARY   9.5%               78.2%

In every case, actual interval coverage exceeded the API’s stated confidence. The gap between stated and actual coverage is by design: the engine treats uncertainty conservatively. An application that treats confidence_score as a lower bound on range_95 coverage would not have been caught out in any of these 14 conditions.

Note: stated confidence in Track 7 is an aggregate across all 130 output dimensions for a single-anchor input. High-confidence dimensions (directly constrained by the anchor) pull the average up; low-signal dimensions pull it down. The aggregate score is conservative by design.
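
The Track 7 verdict can be reproduced from the published figures. The short Python sketch below copies the stated and actual percentages from the table above and checks each row; the variable names are illustrative.

```python
# (stated confidence %, actual 95% PI coverage %) per anchor, from the Track 7 table.
TRACK_7 = {
    "foot_length": (36.3, 79.0),
    "hand_length": (31.7, 78.2),
    "sitting_height": (41.4, 74.0),
    "span": (47.5, 77.8),
    "hip_circumference": (15.1, 84.5),
    "waist_circumference": (11.6, 82.4),
    "neck_circumference": (12.0, 81.6),
    "wrist_circumference": (22.1, 79.3),
    "head_circumference": (14.2, 78.6),
    "biacromial_breadth": (23.5, 76.9),
    "forearm_circumference": (14.6, 80.1),
    "ankle_circumference": (9.4, 76.5),
    "calf_circumference": (13.8, 83.1),
    "thigh_circumference": (9.5, 78.2),
}

# A row is calibrated when actual coverage is at least the stated confidence.
calibrated = [name for name, (stated, actual) in TRACK_7.items() if actual >= stated]

# Margin between actual coverage and stated confidence, in percentage points.
margins = {name: actual - stated for name, (stated, actual) in TRACK_7.items()}
tightest = min(margins, key=margins.get)

print(f"{len(calibrated)} / {len(TRACK_7)} calibrated")
print(f"smallest margin: {tightest} ({margins[tightest]:.1f} points)")
```

With the table values this reports 14 / 14 calibrated and identifies span as the anchor with the smallest margin, at 30.3 points.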


Score by anchor tier

Tier          BONE score  FLESH score
PRIMARY_RICH  ~87         ~80
PRIMARY_BOTH  ~85         ~78
PRIMARY_ONE   ~79         ~62
SECONDARY     ~74         ~67
TERTIARY      ~69         ~62

Scores decrease by up to 10 points when primary anchors are derived via imputation rather than supplied directly.
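
For planning purposes (for example, deciding which anchors to collect), the approximate scores can be kept in a small lookup. The Python sketch below is a client-side aid, not the engine's scoring logic: the helper name expected_score_range is hypothetical, the base values are the approximate figures from the table, and applying the full 10-point reduction as a worst case for any tier is an assumption.

```python
from typing import Dict, Tuple

# Approximate scores by anchor tier and dimension type, from the table above.
APPROX_SCORES: Dict[str, Dict[str, int]] = {
    "PRIMARY_RICH": {"BONE": 87, "FLESH": 80},
    "PRIMARY_BOTH": {"BONE": 85, "FLESH": 78},
    "PRIMARY_ONE": {"BONE": 79, "FLESH": 62},
    "SECONDARY": {"BONE": 74, "FLESH": 67},
    "TERTIARY": {"BONE": 69, "FLESH": 62},
}

# Largest documented reduction when primary anchors are imputed.
MAX_IMPUTATION_PENALTY = 10


def expected_score_range(
    tier: str, dim_type: str, primary_imputed: bool = False
) -> Tuple[int, int]:
    """Return an approximate (lowest, highest) score to expect.

    Assumption: the imputation reduction lies anywhere between 0 and 10
    points, so the low end subtracts the full penalty.
    """
    base = APPROX_SCORES[tier][dim_type]
    low = base - MAX_IMPUTATION_PENALTY if primary_imputed else base
    return (low, base)


flesh_range = expected_score_range("PRIMARY_ONE", "FLESH", primary_imputed=True)
```

Here flesh_range is (52, 62): a FLESH dimension with one primary anchor scores about 62, and could drop to roughly 52 if that anchor was imputed.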


Supplied anchors score 100

Any measurement you provide as an anchor is returned with confidence_score: 100 and type: "MEASURED". The engine returns your input as-is — it does not adjust or replace it.
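
A client can use this behavior as a sanity check on a response. The Python sketch below assumes that each returned dimension is keyed by the same name as the supplied anchor and carries value, confidence_score, and type fields; the helper name find_altered_anchors and the sample foot_length value are illustrative.

```python
from typing import Any, Dict, List


def find_altered_anchors(
    supplied: Dict[str, float],
    dimensions: Dict[str, Dict[str, Any]],
) -> List[str]:
    """Return names of supplied anchors that were not echoed back unchanged."""
    problems = []
    for name, value in supplied.items():
        dim = dimensions.get(name)
        if (
            dim is None
            or dim.get("type") != "MEASURED"
            or dim.get("confidence_score") != 100
            or dim.get("value") != value
        ):
            problems.append(name)
    return problems


# Illustrative request anchors and response fragment (values are made up).
supplied_anchors = {"foot_length": 262.0}
response_dimensions = {
    "foot_length": {"value": 262.0, "confidence_score": 100, "type": "MEASURED"},
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
}
altered = find_altered_anchors(supplied_anchors, response_dimensions)
```

An empty list means every supplied anchor came back as MEASURED with a score of 100 and its original value.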


Pediatric engine scores

Dimension source                                                         Score
LMS dimensions (body_height, body_mass, head_circumference ≤ 36 months)  99
Ridge hybrid predictions (all remaining pediatric dimensions)            ≤ 80

Using the threshold filter

Filter out low-confidence dimensions automatically:

"output_format": {
  "confidence_score_threshold": 75
}

With this setting, dimensions with confidence_score < 75 are excluded from the response. This is useful when you only want to surface high-reliability predictions in your UI.
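
If you prefer to request every dimension and filter later, for instance to apply different thresholds in different views, the same rule can be applied client-side. The Python sketch below assumes the response maps each dimension name to an object with a confidence_score field; the helper name filter_by_confidence and the waist and hip values are illustrative.

```python
from typing import Any, Dict


def filter_by_confidence(
    dimensions: Dict[str, Dict[str, Any]], threshold: int
) -> Dict[str, Dict[str, Any]]:
    """Keep dimensions whose score is at or above the threshold.

    Mirrors the server-side rule: scores strictly below the threshold are dropped.
    """
    return {
        name: dim
        for name, dim in dimensions.items()
        if dim["confidence_score"] >= threshold
    }


# Illustrative response fragment (values are made up for the example).
dimensions = {
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
    "waist_circumference": {"value": 880.0, "confidence_score": 67},
    "hip_circumference": {"value": 990.0, "confidence_score": 75},
}
reliable = filter_by_confidence(dimensions, threshold=75)
```

With a threshold of 75, chest_circumference (78) and hip_circumference (75) are kept and waist_circumference (67) is dropped.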


The 95% prediction interval

range_95 gives the lower and upper bound of the 95% prediction interval for each dimension:

"chest_circumference": {
  "value": 1012.4,
  "confidence_score": 78,
  "range_95": [954.1, 1070.7]
}

Enable it with include_range_95: true in output_format. Use it when you need to communicate prediction uncertainty to users or downstream systems.
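
When presenting the interval, it often helps to express it as margins around the predicted value. The Python sketch below works on the chest_circumference example above; the helper name describe_interval and its output keys are illustrative, and units are left as returned by the API.

```python
from typing import Any, Dict


def describe_interval(dim: Dict[str, Any]) -> Dict[str, float]:
    """Summarize range_95 as margins below and above the predicted value."""
    value = dim["value"]
    lower, upper = dim["range_95"]
    return {
        # Distance from the prediction to each bound, rounded for display.
        "below": round(value - lower, 1),
        "above": round(upper - value, 1),
        # Full interval width as a percentage of the predicted value.
        "relative_width_pct": round(100.0 * (upper - lower) / value, 1),
    }


chest = {"value": 1012.4, "confidence_score": 78, "range_95": [954.1, 1070.7]}
summary = describe_interval(chest)
```

For this example the interval extends 58.3 below and 58.3 above the predicted value, a total width of about 11.5% of the prediction.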