What it is
The Confidence Score is a proprietary heuristic reliability index [0–100] attached to every output dimension. It reflects prediction uncertainty based on two factors:
- Anchor tier — the quality and completeness of the measurements you supplied
- Dimension type — BONE (skeletal) dimensions are more predictable than FLESH (soft-tissue) dimensions
It is not a frequentist p-value or a statistical confidence interval. The 95% prediction interval (range_95) is the statistically valid interval; the Confidence Score is an interpretability layer on top.
Calibration guarantee
The system is calibrated so that it does not over-promise: the actual coverage of the 95% prediction interval (range_95) is at least the stated Confidence Score. For example, if the API reports a score of 78, range_95 contained the true value in at least 78% of cases in held-out validation data.
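The comparison behind this guarantee can be sketched in a few lines. This is a minimal illustration, not the engine's validation code: the function names and the sample data are hypothetical, and it assumes you hold your own ground-truth measurements alongside the range_95 values returned by the API.

```python
from typing import Sequence, Tuple


def empirical_coverage(
    true_values: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> float:
    """Return the percentage of true values that fall inside their interval."""
    if len(true_values) != len(intervals):
        raise ValueError("true_values and intervals must have the same length")
    if not true_values:
        raise ValueError("at least one observation is required")
    hits = sum(
        1 for y, (lower, upper) in zip(true_values, intervals) if lower <= y <= upper
    )
    return 100.0 * hits / len(true_values)


def is_calibrated(stated_score: float, coverage_pct: float) -> bool:
    """A score counts as calibrated when actual coverage is at least the score."""
    return coverage_pct >= stated_score


# Hypothetical held-out data: four subjects, same interval for simplicity.
truth = [1000.0, 960.0, 1080.0, 1010.0]
ranges = [(954.1, 1070.7)] * 4

coverage = empirical_coverage(truth, ranges)  # three of four values are inside
print(coverage, is_calibrated(70, coverage), is_calibrated(78, coverage))
```

With only four observations the estimate is far too noisy to draw conclusions; the sample is there only to show the mechanics.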
Empirical calibration evidence
Track 7 of the v1.4.0 validation tested this guarantee under the hardest realistic condition: a single non-primary anchor with no height and no mass. The run covered 14 anchor types × 100 subjects each (1,400 calls in total).
Result: 14 / 14 CALIBRATED. Zero optimistic verdicts.
| Anchor | Tier | Stated confidence | Actual 95% PI coverage |
|---|---|---|---|
| foot_length | SECONDARY | 36.3% | 79.0% |
| hand_length | SECONDARY | 31.7% | 78.2% |
| sitting_height | SECONDARY | 41.4% | 74.0% |
| span | SECONDARY | 47.5% | 77.8% |
| hip_circumference | SECONDARY | 15.1% | 84.5% |
| waist_circumference | SECONDARY | 11.6% | 82.4% |
| neck_circumference | SECONDARY | 12.0% | 81.6% |
| wrist_circumference | SECONDARY | 22.1% | 79.3% |
| head_circumference | TERTIARY | 14.2% | 78.6% |
| biacromial_breadth | TERTIARY | 23.5% | 76.9% |
| forearm_circumference | TERTIARY | 14.6% | 80.1% |
| ankle_circumference | TERTIARY | 9.4% | 76.5% |
| calf_circumference | TERTIARY | 13.8% | 83.1% |
| thigh_circumference | TERTIARY | 9.5% | 78.2% |
In every case, actual interval coverage exceeded the API's stated confidence. The gap between stated and actual coverage is by design: the engine treats uncertainty conservatively. An application that treats confidence_score as a lower bound on range_95 coverage should therefore not see coverage fall below the stated score.
Note: stated confidence in Track 7 is an aggregate across all 130 output dimensions for a single-anchor input. High-confidence dimensions (directly constrained by the anchor) pull the average up; low-signal dimensions pull it down. The aggregate score is conservative by design.
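The verdict can be re-derived from the published figures. The sketch below copies the stated and actual percentages from the Track 7 table into a dictionary (the variable names are illustrative) and checks that no anchor has actual coverage below its stated confidence.

```python
# (stated confidence %, actual 95% PI coverage %) per anchor, from the Track 7 table.
TRACK7 = {
    "foot_length": (36.3, 79.0),
    "hand_length": (31.7, 78.2),
    "sitting_height": (41.4, 74.0),
    "span": (47.5, 77.8),
    "hip_circumference": (15.1, 84.5),
    "waist_circumference": (11.6, 82.4),
    "neck_circumference": (12.0, 81.6),
    "wrist_circumference": (22.1, 79.3),
    "head_circumference": (14.2, 78.6),
    "biacromial_breadth": (23.5, 76.9),
    "forearm_circumference": (14.6, 80.1),
    "ankle_circumference": (9.4, 76.5),
    "calf_circumference": (13.8, 83.1),
    "thigh_circumference": (9.5, 78.2),
}

# An "optimistic" verdict would be any anchor whose actual coverage is below stated.
optimistic = [name for name, (stated, actual) in TRACK7.items() if actual < stated]

# The anchor with the smallest safety margin (actual minus stated).
tightest = min(TRACK7, key=lambda name: TRACK7[name][1] - TRACK7[name][0])

print(f"{len(TRACK7) - len(optimistic)} / {len(TRACK7)} calibrated")
print(f"smallest margin: {tightest}")
```

The smallest margin belongs to span, which also has the highest stated confidence in the table.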
Score by anchor tier
| Tier | BONE score | FLESH score |
|---|---|---|
| PRIMARY_RICH | ~87 | ~80 |
| PRIMARY_BOTH | ~85 | ~78 |
| PRIMARY_ONE | ~79 | ~62 |
| SECONDARY | ~74 | ~67 |
| TERTIARY | ~69 | ~62 |
Scores decrease by up to 10 points when primary anchors are derived via imputation rather than supplied directly.
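For planning purposes, the table and the imputation penalty can be folded into a small lookup. This is only a rough sketch: the values are the approximate figures from the table, the function name is hypothetical, and the API's per-dimension confidence_score remains the authoritative value.

```python
# Approximate scores from the tier table; actual per-dimension scores vary.
APPROX_SCORES = {
    "PRIMARY_RICH": {"BONE": 87, "FLESH": 80},
    "PRIMARY_BOTH": {"BONE": 85, "FLESH": 78},
    "PRIMARY_ONE": {"BONE": 79, "FLESH": 62},
    "SECONDARY": {"BONE": 74, "FLESH": 67},
    "TERTIARY": {"BONE": 69, "FLESH": 62},
}

# Imputed primary anchors lower the score by up to this many points.
MAX_IMPUTATION_PENALTY = 10


def approx_score_range(tier: str, dim_type: str, imputed_primary: bool = False):
    """Return a (low, high) estimate of the score for a tier and dimension type."""
    base = APPROX_SCORES[tier][dim_type]
    low = base - MAX_IMPUTATION_PENALTY if imputed_primary else base
    return (low, base)


print(approx_score_range("PRIMARY_BOTH", "FLESH"))
print(approx_score_range("PRIMARY_BOTH", "FLESH", imputed_primary=True))
```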
Supplied anchors score 100
Any measurement you provide as an anchor is returned with confidence_score: 100 and type: "MEASURED". The engine returns your input as-is — it does not adjust or replace it.
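A client may want to separate echoed inputs from predictions before displaying them. The sketch below assumes the response dimensions arrive as a mapping from dimension name to an object carrying confidence_score and, for supplied anchors, type: "MEASURED"; the mapping shape, the function name, and the sample values are assumptions made for illustration.

```python
def split_measured(dimensions: dict) -> tuple:
    """Partition dimensions into (measured, predicted) by the "type" field."""
    measured, predicted = {}, {}
    for name, dim in dimensions.items():
        # Supplied anchors are returned with type "MEASURED"; everything else
        # is treated here as a prediction.
        target = measured if dim.get("type") == "MEASURED" else predicted
        target[name] = dim
    return measured, predicted


# Hypothetical response fragment.
response_dimensions = {
    "body_height": {"value": 1755.0, "confidence_score": 100, "type": "MEASURED"},
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
}

measured, predicted = split_measured(response_dimensions)
print(sorted(measured), sorted(predicted))
```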
Pediatric engine scores
| Dimension source | Score |
|---|---|
| LMS dimensions (body_height, body_mass, head_circumference ≤ 36 months) | 99 |
| Ridge hybrid predictions (all remaining pediatric dimensions) | ≤ 80 |
Using the threshold filter
Filter out low-confidence dimensions automatically:

    "output_format": {
      "confidence_score_threshold": 75
    }
Dimensions with confidence_score < 75 are excluded from the response. This is useful when you only want to surface high-reliability predictions in your UI.
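If you prefer to receive every dimension and decide later, the same rule is easy to apply on the client. The sketch below mirrors the documented behaviour (anything below the threshold is dropped); the response shape, the function name, and the sample values are assumptions.

```python
def filter_by_confidence(dimensions: dict, threshold: float) -> dict:
    """Keep only dimensions whose confidence_score is at least the threshold."""
    return {
        name: dim
        for name, dim in dimensions.items()
        if dim["confidence_score"] >= threshold
    }


# Hypothetical response fragment.
dimensions = {
    "chest_circumference": {"value": 1012.4, "confidence_score": 78},
    "waist_circumference": {"value": 880.0, "confidence_score": 62},
    "body_height": {"value": 1755.0, "confidence_score": 100},
}

kept = filter_by_confidence(dimensions, 75)
print(sorted(kept))
```

Filtering on the client lets different views use different thresholds without repeating the API call.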
The 95% prediction interval
range_95 gives the lower and upper bound of the 95% prediction interval for each dimension:

    "chest_circumference": {
      "value": 1012.4,
      "confidence_score": 78,
      "range_95": [954.1, 1070.7]
    }
Enable it with include_range_95: true in output_format. Use it when you need to communicate prediction uncertainty to users or downstream systems.
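One way to present the interval downstream is as a half-width and a relative uncertainty. The sketch below works on the chest_circumference example shown earlier; the function name and the returned keys are illustrative, and the code does not assume the interval is symmetric around value.

```python
def summarize_interval(dim: dict) -> dict:
    """Derive simple uncertainty figures from a dimension's range_95."""
    lower, upper = dim["range_95"]
    half_width = (upper - lower) / 2
    return {
        "lower": lower,
        "upper": upper,
        "half_width": half_width,
        # Half-width as a percentage of the predicted value.
        "relative_half_width_pct": 100 * half_width / dim["value"],
    }


chest = {"value": 1012.4, "confidence_score": 78, "range_95": [954.1, 1070.7]}
summary = summarize_interval(chest)
print(round(summary["half_width"], 1), round(summary["relative_half_width_pct"], 2))
```

For this example the half-width is about 58.3, or roughly 5.8% of the predicted value, in the same units as the value field.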