Confidence Scores in Anthropometric APIs: Dev Guide

When you get a body dimension prediction from an anthropometric API, you typically see something like this:

"chest_circumference": {
  "value": 921.4,
  "unit": "mm",
  "type": "FLESH",
  "confidence_score": 78,
  "range_95": [869.0, 973.8]
}

Three numbers: a value, a confidence score, and a prediction interval. Most integrations read value and ignore the other two, which throws away everything the API reports about uncertainty. This guide explains what these numbers actually mean and how to use them.
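As a minimal sketch of reading all three numbers instead of only value, the snippet below parses the example payload shown above. Wrapping the fragment in an outer JSON object and the helper name summarize_dimension are assumptions made for illustration; a real response may be nested differently.

```python
import json

# Example payload from above, wrapped in an outer object so it parses as JSON.
# The outer wrapping is an assumption; the real response shape may differ.
RAW = """
{
  "chest_circumference": {
    "value": 921.4,
    "unit": "mm",
    "type": "FLESH",
    "confidence_score": 78,
    "range_95": [869.0, 973.8]
  }
}
"""


def summarize_dimension(name, dim):
    """Return a one-line summary built from value, confidence_score, and range_95."""
    low, high = dim["range_95"]
    return (
        f"{name}: {dim['value']} {dim['unit']} "
        f"(score {dim['confidence_score']}, 95% range {low} to {high})"
    )


response = json.loads(RAW)
for name, dim in response.items():
    print(summarize_dimension(name, dim))
# prints: chest_circumference: 921.4 mm (score 78, 95% range 869.0 to 973.8)
```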

What confidence_score is not

confidence_score is not a probability. It is not a percentage likelihood that the prediction is correct. It is a reliability index — a composite heuristic that reflects how much statistical information the model had available when producing this prediction.

Specifically, it is derived from the 95% prediction interval: a narrower interval relative to the predicted value maps to a higher score. The theoretical ceiling of 100 would require a prediction interval of width zero — only possible if the dimension were directly measured, not predicted.

Think of it as an answer to: “How much should I trust this number for a critical decision?” A score of 85 means “high confidence, use it.” A score of 65 means “use with caution, widen your tolerances.”

The two dimension types: BONE and FLESH

Every output dimension is either BONE or FLESH. This distinction is more important than the confidence score for understanding prediction accuracy.

BONE dimensions are skeletal measurements: sitting height, biacromial breadth (shoulder width), forearm-hand length, crotch height, foot length. These are governed by skeletal geometry, which varies smoothly and predictably with body height. Ridge regression captures this relationship with high fidelity.

FLESH dimensions are soft-tissue measurements: chest circumference, waist, hip, thigh, upper arm. These depend on fat distribution, muscle mass, and body composition, all of which vary substantially at a given height and weight. Two people who are both 175cm and 75kg can have very different waist measurements, depending on how that mass is split between fat and muscle.

For height + weight input:

BONE confidence scores: typically 83–87
FLESH confidence scores: typically 76–80

BONE predictions are more reliable. When building sizing features, be aware that circumference predictions carry more inherent uncertainty than skeletal predictions.

Anchor tiers and how they affect confidence

The confidence score is strongly influenced by which inputs you provide. The system recognizes three tiers:

PRIMARY_BOTH — height and weight provided. This is the baseline. BONE ~85, FLESH ~78.

PRIMARY_RICH — height, weight, and at least one circumference anchor. Adding one circumference (chest, waist, hip, neck, or wrist) directly reduces FLESH prediction variance for that body region. BONE ~87, FLESH ~80.

# PRIMARY_RICH request — add waist to improve torso dimension confidence
"anchors": {
    "body_height": 1780,
    "body_mass": 82.0,
    "waist_circumference_omphalion": 880.0
}

SECONDARY — a single non-primary anchor. If the user provides only foot length, or only hand length, the engine first imputes height and weight from that anchor, then runs the adult model. Confidence drops: BONE ~74, FLESH ~67.

# SECONDARY tier — imputed from foot length alone
"anchors": {
    "foot_length": 275
}

The practical implication for product design: if your onboarding flow can ask one additional question beyond height and weight, make it a circumference measurement. Waist size is the highest-value single anchor for most sizing use cases (it directly constrains trouser, dress, and outerwear sizing). Many users know their waist from existing clothing.
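To see which tier a request is likely to land in before sending it, the three tier descriptions above can be sketched as a small classifier. The key names come from the request examples above; the function name classify_tier, the set of primary keys, and the substring test for circumference anchors are assumptions for illustration, not the API's own logic.

```python
# Key names follow the request examples above. Treating any key that contains
# "circumference" as a circumference anchor is an assumption for illustration.
PRIMARY_KEYS = {"body_height", "body_mass"}


def classify_tier(anchors):
    """Guess the anchor tier of a request from the keys it provides."""
    keys = set(anchors)
    has_primary = PRIMARY_KEYS <= keys
    extras = keys - PRIMARY_KEYS
    has_circumference = any("circumference" in key for key in extras)

    if has_primary and has_circumference:
        return "PRIMARY_RICH"
    if has_primary and not extras:
        return "PRIMARY_BOTH"
    if len(keys) == 1 and not (keys & PRIMARY_KEYS):
        return "SECONDARY"
    # Combinations the text does not describe (for example height only)
    # are left undecided rather than guessed.
    return None


print(classify_tier({"body_height": 1780, "body_mass": 82.0}))
print(classify_tier({"body_height": 1780, "body_mass": 82.0,
                     "waist_circumference_omphalion": 880.0}))
print(classify_tier({"foot_length": 275}))
```

A check like this can drive onboarding: if the result is PRIMARY_BOTH, the flow can prompt for a waist measurement before submitting.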

What range_95 actually means

range_95 is the 95% prediction interval — derived analytically from the Standard Error of the Estimate for each dimension and the input leverage.

"range_95": [869.0, 973.8]

This means: of 100 people with the same height, weight, and demographic profile, about 95 are expected to have an actual chest circumference between 869.0mm and 973.8mm. The interval width quantifies the prediction uncertainty directly.

A 104.8mm interval on a 921.4mm chest prediction is a ±5.7% uncertainty band. For fit recommendation purposes — “is this a size M or L?” — this is often precise enough. For bespoke tailoring requiring ±3mm accuracy, it is not.
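The arithmetic behind those figures is short enough to check directly. The helper name interval_stats is hypothetical; the inputs are the example values from above.

```python
def interval_stats(value, range_95):
    """Return (interval width, half-width as a percentage of the predicted value)."""
    low, high = range_95
    width = high - low
    half_width_pct = (width / 2) / value * 100
    return round(width, 1), round(half_width_pct, 1)


width, pct = interval_stats(921.4, [869.0, 973.8])
print(width, pct)  # 104.8 5.7
```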

Using range_95 in your product:

For size recommendation: take the predicted value and check which size bucket it falls into. If the range_95 spans two size buckets, show the user both options and let them choose.

For ergonomic design: use the upper bound of range_95 (the 97.5th percentile of the predicted distribution for those inputs) for clearance dimensions, where you need to accommodate the largest realistic user. Use the lower bound (the 2.5th percentile) for reach dimensions, where you need to accommodate the smallest realistic user.

For display to end users: don’t show them range_95 directly. Translate it into language they understand: “Your recommended size is M, with L as a likely fit if you prefer more room.”
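The size-recommendation and display advice above can be sketched as follows. The size chart is entirely hypothetical (real charts vary by brand and garment), and the function names sizes_for and user_message are illustrative, not part of the API.

```python
# Hypothetical chest size chart in mm: (label, lower bound inclusive, upper bound exclusive).
SIZE_CHART = [
    ("S", 760.0, 860.0),
    ("M", 860.0, 960.0),
    ("L", 960.0, 1060.0),
    ("XL", 1060.0, 1160.0),
]


def sizes_for(dim, chart=SIZE_CHART):
    """Return (primary size, alternative sizes) for one predicted dimension."""
    low, high = dim["range_95"]
    primary = None
    alternatives = []
    for label, lo, hi in chart:
        if lo <= dim["value"] < hi:
            primary = label  # bucket containing the point prediction
        elif low < hi and high >= lo:
            alternatives.append(label)  # bucket overlapped by the 95% range
    return primary, alternatives


def user_message(primary, alternatives):
    """Translate the result into plain language instead of showing range_95."""
    if primary is None:
        return "We could not match this measurement to a size."
    if not alternatives:
        return f"Your recommended size is {primary}."
    options = " or ".join(alternatives)
    return f"Your recommended size is {primary}, with {options} as a possible alternative."


chest = {"value": 921.4, "range_95": [869.0, 973.8]}
print(user_message(*sizes_for(chest)))
# prints: Your recommended size is M, with L as a possible alternative.
```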

The biological_limit_status field

Each dimension also includes biological_limit_status, which is one of "OK", "WARNING", or "EXCEEDED".

This flags predictions that approach or exceed documented biological limits for the species — extreme values that suggest an unusual combination of inputs. An "EXCEEDED" status means the predicted value is outside the observed range in the training data and should be treated with caution.

In practice, this triggers for:

BMI values below 15 or above 50
Heights below 145cm or above 210cm in combination with typical weight ranges
Inputs that create physiologically implausible dimension combinations

If you see biological_limit_status: "WARNING" in a response, log it for review and add wider tolerances in your downstream logic.
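One way to act on this field downstream is sketched below. The 1.5 widening factor, the logger name, and the choice to discard "EXCEEDED" predictions entirely are assumptions for illustration; the text above only asks for caution, so discarding is a deliberately conservative reading.

```python
import logging

logger = logging.getLogger("anthropometry")  # hypothetical logger name

WARNING_WIDEN_FACTOR = 1.5  # illustrative value, not taken from the API


def effective_range(name, dim):
    """Return a (low, high) tolerance band, or None if the prediction is not used."""
    status = dim["biological_limit_status"]
    low, high = dim["range_95"]
    if status == "OK":
        return (low, high)

    # Anything other than "OK" is logged for later review.
    logger.warning("%s has biological_limit_status=%s", name, status)

    if status == "WARNING":
        center = (low + high) / 2
        half = (high - low) / 2 * WARNING_WIDEN_FACTOR
        return (center - half, center + half)

    # "EXCEEDED" (or an unrecognized status): conservative choice, do not use.
    return None
```

Returning None forces the caller to handle the unusable case explicitly, for example by asking the user to confirm their inputs.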

Practical thresholds for product decisions

Here are the confidence thresholds we’ve found most useful when building with anthropometric predictions:

Score range	Interpretation	Recommended use
85–100	High confidence	Use directly for sizing decisions
75–84	Moderate confidence	Use with ±1 size buffer
65–74	Lower confidence (SECONDARY tier)	Use as rough estimate, show alternatives
< 65	Low confidence	Don’t use for sizing; surface to user for correction
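The table translates directly into a lookup. This is a minimal sketch: the function name recommended_use and the short return labels are illustrative, while the cut-offs are the ones in the table.

```python
def recommended_use(score):
    """Map a confidence_score to the recommended use from the table above."""
    if score >= 85:
        return "use directly"
    if score >= 75:
        return "use with size buffer"
    if score >= 65:
        return "rough estimate, show alternatives"
    return "do not use for sizing"


print(recommended_use(78))  # use with size buffer
```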

The API’s confidence_score_threshold output filter lets you request only dimensions at or above a minimum score:

"output_format": {
    "confidence_score_threshold": 75,  # only return dimensions with score >= 75
    "include_range_95": True
}

This is useful for targeted queries where you only care about high-confidence dimensions.
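If you already hold a full response, the same cut can be applied client-side. This sketch assumes the response is a dict mapping dimension names to objects like the chest example above; filter_by_score is a hypothetical helper, not an API call.

```python
def filter_by_score(dimensions, threshold):
    """Keep only dimensions whose confidence_score is at or above threshold."""
    return {
        name: dim
        for name, dim in dimensions.items()
        if dim["confidence_score"] >= threshold
    }


dimensions = {
    "chest_circumference": {"value": 921.4, "confidence_score": 78},
}
print(list(filter_by_score(dimensions, 75)))  # ['chest_circumference']
print(list(filter_by_score(dimensions, 80)))  # []
```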

Understanding these three numbers — value, confidence_score, and range_95 — turns a body measurement API from a black box into a statistical tool you can reason about. The uncertainty is real; the confidence score makes it visible. Products that surface this uncertainty appropriately (offering multiple size options, noting “best estimate”) will serve users better than products that present predictions as absolute facts.
