Schema Identification
Detect which schema unknown data belongs to, then transform it.
Sometimes you have data but don't know which schema it came from: an API payload, a deserialized blob, a message from a queue. You need to figure out what it is before you can transform it.
The identify config option adds two methods to your registry: identify() to detect the schema, and identifyAndTransform() to detect and transform in one call. Both are fully opt-in. If you don't configure identify, these methods don't exist on the registry type at all.
Two forms
The identify option accepts either a guard map or a function. Pick the one that fits your data.
Guard map
Map schema keys to predicates. Each predicate receives an unknown value and returns true if it belongs to that schema:
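import { createRegistry, match } from 'dobajs'
import { z } from 'zod'

const registry = createRegistry({
  schemas: {
    database: z.object({
      id: z.string(),
      email: z.string(),
      passwordHash: z.string(),
      role: z.enum(['admin', 'user']),
    }),
    frontend: z.object({
      id: z.string(),
      email: z.string(),
      createdAt: z.string(),
      role: z.enum(['admin', 'user']),
    }),
    ai: z.object({
      id: z.string(),
      email: z.string(),
      isAdmin: z.boolean(),
    }),
  },
  migrations: {
    'database->frontend': (v) => ({
      id: v.id,
      email: v.email,
      createdAt: new Date().toISOString(),
      role: v.role,
    }),
    'frontend->ai': (v) => ({
      id: v.id,
      email: v.email,
      isAdmin: v.role === 'admin',
    }),
  },
  identify: {
    database: match.field('passwordHash'),
    ai: match.field('isAdmin'),
    frontend: match.fields('createdAt', 'role').test((v) => {
      return !('passwordHash' in (v as object)) && !('isAdmin' in (v as object))
    }),
  },
})
Guards run in definition order, and the first guard that returns true wins. You don't have to cover every schema: unlisted schemas are simply not identifiable.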
Guard map keys are typed against your schema map. Writing identify: { typo: match.field('x') } is a compile-time error if "typo" isn't a registered schema key.
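The first-match rule is easy to model. The sketch below is a self-contained illustration, not doba's implementation. `Guard`, `GuardMap`, `identifyWithGuards`, and `hasField` are hypothetical names, and the sketch relies on JavaScript iterating string object keys in insertion order. Because the map type is `Partial<Record<K, Guard>>`, a key outside the schema union is rejected at compile time, while leaving keys out is allowed.

```typescript
// Hypothetical: a guard is a synchronous predicate over unknown input.
type Guard = (value: unknown) => boolean

// Keys are limited to the schema keys K; missing keys are allowed, unknown keys are not.
type GuardMap<K extends string> = Partial<Record<K, Guard>>

// Checks guards in definition (insertion) order; the first key whose guard
// returns true wins. Returns null when no guard matches.
function identifyWithGuards<K extends string>(guards: GuardMap<K>, value: unknown): K | null {
  for (const key of Object.keys(guards) as K[]) {
    const guard: Guard | undefined = guards[key]
    if (guard !== undefined && guard(value)) {
      return key
    }
  }
  return null
}

// Hypothetical helper standing in for match.field().
const hasField =
  (name: string): Guard =>
  (value) =>
    typeof value === 'object' && value !== null && name in value

type SchemaKey = 'database' | 'frontend' | 'ai'

const guards: GuardMap<SchemaKey> = {
  database: hasField('passwordHash'),
  ai: hasField('isAdmin'),
  // 'frontend' has no guard, so it can never be identified.
}

identifyWithGuards(guards, { passwordHash: 'x', isAdmin: true }) // "database" (first match wins)
identifyWithGuards(guards, { isAdmin: false }) // "ai"
identifyWithGuards(guards, { createdAt: '2024-01-01' }) // null
```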
Function form
For pattern-based discrimination (like versioned data with a type field), pass a single function that returns the schema key or null:
import { createRegistry, byField } from 'dobajs'

const registry = createRegistry({
  schemas: { v1: v1Schema, v2: v2Schema, v3: v3Schema },
  migrations: { /* ... */ },
  identify: byField('version', { prefix: 'v' }),
})
// { version: "2" } -> "v2" -> matches the "v2" schema key
Or write the function yourself for full control:
const registry = createRegistry({
  schemas: { database: dbSchema, ai: aiSchema, frontend: feSchema },
  migrations: { /* ... */ },
  identify: (value: unknown) => {
    if (typeof value !== 'object' || value === null) {
      return null
    }
    if ('passwordHash' in value) {
      return 'database'
    }
    if ('isAdmin' in value) {
      return 'ai'
    }
    if ('createdAt' in value) {
      return 'frontend'
    }
    return null
  },
})
Return null when nothing matches. Returned keys are verified against the schema map at runtime. If the function returns a string that isn't a registered schema key, identify() returns an identify_failed error.
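A discriminator can return any string, so its answer has to be checked against the registered keys. Here is a minimal sketch of that runtime check. It assumes a result shape with `ok`, `value`, and `issues[].code`, modeled on the examples on this page. `verifyIdentified` is hypothetical, and the failure message text is only illustrative.

```typescript
// Assumed result shape, modeled on the ok / value / issues[].code fields on this page.
type Issue = { code: 'identify_failed'; message: string }
type IdentifyResult<K extends string> =
  | { ok: true; value: K }
  | { ok: false; issues: Issue[] }

// Runs a discriminator and accepts only keys that exist in the schema map.
function verifyIdentified<K extends string>(
  schemaKeys: readonly K[],
  discriminate: (value: unknown) => string | null,
  value: unknown,
): IdentifyResult<K> {
  const key = discriminate(value)
  // Both "no match" (null) and an unregistered string become identify_failed.
  if (key === null || !(schemaKeys as readonly string[]).includes(key)) {
    return {
      ok: false,
      issues: [{ code: 'identify_failed', message: 'no schema matched the provided value' }],
    }
  }
  return { ok: true, value: key as K }
}

const keys = ['database', 'ai', 'frontend'] as const

// A buggy discriminator that can return a key nobody registered.
const discriminate = (value: unknown): string | null => {
  if (typeof value !== 'object' || value === null) return null
  if ('passwordHash' in value) return 'database'
  if ('legacyId' in value) return 'legacy' // not a registered schema key
  return null
}

verifyIdentified(keys, discriminate, { passwordHash: 'x' }) // ok: true, value "database"
verifyIdentified(keys, discriminate, { legacyId: 1 }) // ok: false, code "identify_failed"
```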
Using identify
identify()
Detects which schema a value belongs to:
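const result = await registry.identify({ passwordHash: 'abc123', id: '1', email: 'a@b.com' })
if (result.ok) {
  result.value // identifies the "database" schema for this input
} else {
  result.issues[0].code // e.g. "identify_failed"
}
identifyAndTransform()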
The primary use case. Detect the source schema and transform to a target in one call:
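const result = await registry.identifyAndTransform(data, 'ai')
if (result.ok) {
  result.value // the data, transformed into the "ai" schema
  // the result also reports the detected source schema (from) and the migration path taken (path)
}
This runs identify() first, then feeds the result into transform(). The path is resolved automatically through the migration graph. All transform options work here: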
await registry.identifyAndTransform(data, 'ai', {
  validate: 'each',
  pathStrategy: 'direct',
})
Helpers
match
Chainable predicate builder. Each method adds an AND condition. The result is both chainable (add more conditions) and callable as (value: unknown) => boolean:
import { match } from 'dobajs'

// single checks
match.field('passwordHash') // field exists
match.field('version', 2) // field equals value (strict ===)
match.fields('displayName', 'avatar') // all fields present
match.type('string') // typeof check
match.test((v) => Array.isArray(v)) // arbitrary predicate

// chaining = AND
match.field('passwordHash').field('email')
// true only if both passwordHash AND email exist

// complex guard
match.type('object')
  .fields('id', 'email')
  .test((v) => !('passwordHash' in (v as object)))
byField
Reads a field from the value and derives a schema key. Handles the common case where data has a type, version, or kind field:
import { byField } from 'dobajs'

// value.version matches schema key directly
byField('version')
// { version: "v1" } -> "v1"

// prefix/suffix for naming conventions
byField('version', { prefix: 'v' })
// { version: "2" } -> "v2"
byField('kind', { prefix: 'schema_', suffix: '_legacy' })
// { kind: "user" } -> "schema_user_legacy"

// explicit mapping when convention doesn't fit
byField('type', { map: { UserDB: 'database', UserFE: 'frontend' } })
// { type: "UserDB" } -> "database"
// { type: "Unknown" } -> null
prefix/suffix and map are mutually exclusive. Field values are converted to strings via String(). Returns null if the value isn't an object or the field is missing.
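To pin down the rules above, here is a hypothetical re-implementation called `byFieldSketch`. It illustrates the documented behavior under assumptions and is not doba's source. The discriminated options type is one way to make prefix/suffix and map mutually exclusive at compile time. Details such as the own-property lookup in map are guesses.

```typescript
// The two option modes are mutually exclusive, as described above.
type ByFieldOptions =
  | { prefix?: string; suffix?: string; map?: undefined }
  | { map: Record<string, string>; prefix?: undefined; suffix?: undefined }

// Hypothetical re-implementation of the documented byField behavior.
function byFieldSketch(field: string, options: ByFieldOptions = {}) {
  return (value: unknown): string | null => {
    // Not an object, or the field is missing: nothing to derive a key from.
    if (typeof value !== 'object' || value === null || !(field in value)) {
      return null
    }
    // Field values go through String(), so the number 2 becomes "2".
    const raw = String((value as Record<string, unknown>)[field])
    const map = options.map
    if (map !== undefined) {
      // Explicit mapping: values without an entry produce null.
      return Object.prototype.hasOwnProperty.call(map, raw) ? (map[raw] ?? null) : null
    }
    return `${options.prefix ?? ''}${raw}${options.suffix ?? ''}`
  }
}

const byVersion = byFieldSketch('version', { prefix: 'v' })
const byKind = byFieldSketch('kind', { prefix: 'schema_', suffix: '_legacy' })
const byType = byFieldSketch('type', { map: { UserDB: 'database', UserFE: 'frontend' } })

byVersion({ version: 2 }) // "v2"
byKind({ kind: 'user' }) // "schema_user_legacy"
byType({ type: 'UserDB' }) // "database"
byType({ type: 'Unknown' }) // null
byVersion('not an object') // null
```

Passing both `prefix` and `map` matches neither branch of `ByFieldOptions`, so the exclusivity rule is caught at compile time in this sketch.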
firstMatch
Composes multiple discriminator functions. Tries each in order, returns the first non-null result:
import { byField, firstMatch } from 'dobajs'

// inside the createRegistry config:
identify: firstMatch(
  byField('_tag'), // try tagged data first
  byField('version', { prefix: 'v' }), // then version field
  (v) => typeof v === 'string' ? 'name' : null, // then typeof
)
tryParse
A sentinel value. When used as a guard, doba validates the value against that schema's ~standard.validate() instead of running a sync predicate. Import it from dobajs:
import { createRegistry, match, tryParse } from 'dobajs'

const registry = createRegistry({
  schemas: { cat: catSchema, dog: dogSchema, fish: fishSchema },
  migrations: { /* ... */ },
  identify: {
    cat: match.field('indoor'), // cheap sync check
    dog: tryParse, // validate against dogSchema
    fish: tryParse, // validate against fishSchema
  },
})
How it works:
- Sync guards run first (cheap, one function call each).
- If a sync guard matches, that's the result. tryParse schemas are never checked.
- If no sync guard matched, all tryParse schemas are validated in parallel.
- If exactly one validates, that's the match.
- If multiple validate, you get identify_ambiguous.
- If none validate, you get identify_failed.
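The resolution order above can be sketched as a single async function. The sketch assumes tryParse schemas follow the Standard Schema interface, where `~standard.validate()` returns a result (or a promise of one) that carries an `issues` array on failure. `resolveIdentity`, the `tryParse` symbol, `Outcome`, and the `requireFields` toy schemas are hypothetical stand-ins, not doba internals.

```typescript
// Minimal slice of the Standard Schema interface: validate() may return a
// result or a promise of one, and failures carry an `issues` array.
type ValidateResult =
  | { value: unknown; issues?: undefined }
  | { issues: ReadonlyArray<{ message: string }> }
interface StandardLike {
  readonly '~standard': {
    validate: (value: unknown) => ValidateResult | Promise<ValidateResult>
  }
}

type Guard = (value: unknown) => boolean
const tryParse: symbol = Symbol('tryParse') // hypothetical stand-in for doba's sentinel

type Outcome =
  | { ok: true; key: string }
  | { ok: false; code: 'identify_failed' | 'identify_ambiguous'; matches: string[] }

async function resolveIdentity(
  guards: Record<string, Guard | symbol>,
  schemas: Record<string, StandardLike>,
  value: unknown,
): Promise<Outcome> {
  const parseKeys: string[] = []
  // 1. Sync guards first; a match returns immediately, so no tryParse schema runs.
  for (const [key, guard] of Object.entries(guards)) {
    if (typeof guard === 'function') {
      if (guard(value)) return { ok: true, key }
    } else if (guard === tryParse) {
      parseKeys.push(key)
    }
  }
  // 2. No sync match: validate every tryParse schema in parallel.
  const validated = await Promise.all(
    parseKeys.map(async (key) => {
      const schema = schemas[key]
      if (schema === undefined) return null
      const result = await schema['~standard'].validate(value)
      return result.issues === undefined ? key : null
    }),
  )
  const matches = validated.filter((key): key is string => key !== null)
  // 3. Exactly one -> match; several -> identify_ambiguous; none -> identify_failed.
  if (matches.length === 1) return { ok: true, key: matches[0] as string }
  if (matches.length > 1) return { ok: false, code: 'identify_ambiguous', matches }
  return { ok: false, code: 'identify_failed', matches }
}

// Toy schemas that accept any object containing every listed field.
const requireFields = (...fields: string[]): StandardLike => ({
  '~standard': {
    validate: (value: unknown) => {
      if (typeof value !== 'object' || value === null) {
        return { issues: [{ message: 'expected an object' }] }
      }
      const obj: object = value
      return fields.every((f) => f in obj)
        ? { value }
        : { issues: [{ message: `missing one of: ${fields.join(', ')}` }] }
    },
  },
})

const schemas = {
  cat: requireFields('indoor'),
  dog: requireFields('name', 'barks'),
  fish: requireFields('name', 'fins'),
}
const guards: Record<string, Guard | symbol> = {
  cat: (v) => typeof v === 'object' && v !== null && 'indoor' in v, // cheap sync check
  dog: tryParse,
  fish: tryParse,
}

// dog and fish both accept this value: ok false, code "identify_ambiguous"
void resolveIdentity(guards, schemas, { name: 'Nemo', barks: false, fins: 2 })
```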
Prefer sync guards over tryParse. Sync strategies (guard map, function, byField) run at roughly 120 to 205ns per call. tryParse runs actual schema validation and costs ~435ns, about 2 to 3.5x slower. Reserve it for schemas that are structurally hard to tell apart.
Error handling
Two issue codes specific to identification:
identify_failed
No guard matched and no tryParse schema validated the value:
const result = await registry.identify({ completely: 'unknown' })
if (!result.ok) {
  result.issues[0].code // "identify_failed"
  result.issues[0].message // "no schema matched the provided value"
}
identify_ambiguous
Multiple tryParse schemas validated the same value. This only happens with tryParse. Sync guards use first-match-wins, so they can never produce ambiguity.
const result = await registry.identify(ambiguousData)
if (!result.ok) {
  result.issues[0].code // "identify_ambiguous"
  result.issues[0].message // "multiple schemas matched: cat, dog"
  result.issues[0].meta // { matches: ["cat", "dog"] }
}
Fix ambiguity by adding a sync guard for one of the conflicting schemas, or by making the schemas more specific so they don't both validate the same input.
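Both failures arrive as issues with a code, so callers can branch on the code and, for ambiguity, read the conflicting keys from meta. In this sketch the `IdentifyIssue` type is an assumption inferred from the code, message, and meta fields shown above, and `explainFailure` is a hypothetical helper.

```typescript
// Assumed issue shape, inferred from the code / message / meta fields shown above.
type IdentifyIssue =
  | { code: 'identify_failed'; message: string }
  | { code: 'identify_ambiguous'; message: string; meta: { matches: string[] } }

// Hypothetical helper: turn an identification issue into an actionable log line.
function explainFailure(issue: IdentifyIssue): string {
  switch (issue.code) {
    case 'identify_failed':
      return `unrecognized payload (${issue.message})`
    case 'identify_ambiguous':
      // Only tryParse can cause this, so a sync guard for one candidate resolves it.
      return `ambiguous between ${issue.meta.matches.join(' and ')}: add a sync guard for one of them`
  }
}

explainFailure({
  code: 'identify_ambiguous',
  message: 'multiple schemas matched: cat, dog',
  meta: { matches: ['cat', 'dog'] },
})
// "ambiguous between cat and dog: add a sync guard for one of them"
```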
Performance
All identify strategies are fast, but they aren't equal. Benchmarks on Apple M3 Pro:
| Operation | Time | Throughput |
|---|---|---|
| Guard map (match) | ~120ns | 8.4M ops/sec |
| Function form | ~195ns | 5.1M ops/sec |
| byField | ~205ns | 4.9M ops/sec |
| tryParse (schema validation) | ~435ns | 2.3M ops/sec |
| identifyAndTransform (1 hop) | ~816ns | 1.2M ops/sec |
| identifyAndTransform (2 hops) | ~773ns | 1.3M ops/sec |
Build match chains once. Building a chain such as match.field('x').field('y') allocates new arrays and closures (~1.6µs), so constructing it on every call is wasteful. Assign the chain to a variable and reuse it. Executing an already built chain takes ~10 to 30ns.
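The cost difference comes from construction versus execution. Below is a minimal chainable, callable predicate builder in the spirit of match. `Matcher` and `createMatcher` are hypothetical and only approximate doba's internals. Each chained call copies the condition array and creates a new function, while calling a finished chain just loops over closures that already exist.

```typescript
type Predicate = (value: unknown) => boolean

// Callable like a predicate, with chainable methods that AND in more conditions.
interface Matcher extends Predicate {
  field(name: string): Matcher
  test(pred: Predicate): Matcher
}

function createMatcher(conditions: readonly Predicate[] = []): Matcher {
  // Calling the matcher runs every stored condition; all must pass.
  const run = (value: unknown): boolean => conditions.every((cond) => cond(value))
  // Each chained call copies the array and builds a new matcher (the allocation cost).
  return Object.assign(run, {
    field: (name: string): Matcher =>
      createMatcher([
        ...conditions,
        (v: unknown) => typeof v === 'object' && v !== null && name in v,
      ]),
    test: (pred: Predicate): Matcher => createMatcher([...conditions, pred]),
  })
}

const m = createMatcher()

// Avoid: rebuilds the chain (new arrays and closures) on every call.
const isDbRowSlow = (v: unknown) => m.field('passwordHash').field('email')(v)

// Prefer: build once, then reuse the finished predicate.
const isDbRow = m.field('passwordHash').field('email')

const rows: unknown[] = [{ passwordHash: 'x', email: 'a@b.com' }, { email: 'a@b.com' }]
rows.map(isDbRow) // [true, false]
```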
identifyAndTransform has minimal overhead compared to calling identify then transform separately. Use whichever reads better in your code.
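Conceptually, identifyAndTransform is identify, then a path search over the migration graph, then one migration per hop. The sketch below shows that composition with a breadth-first search over `'from->to'` keys. BFS (fewest hops) is an assumption made for illustration, and doba's pathStrategy options may pick paths differently. `shortestPath` and `identifyThenTransform` are hypothetical local helpers, not registry methods.

```typescript
// Migrations keyed as 'from->to'; `any` keeps typed migration parameters assignable.
type Migration = (value: any) => unknown
type Migrations = Record<string, Migration>

// Breadth-first search: the fewest-hop list of schema keys from `from` to `to`.
function shortestPath(migrations: Migrations, from: string, to: string): string[] | null {
  const edges = Object.keys(migrations).map((key) => key.split('->'))
  const previous = new Map<string, string | null>([[from, null]])
  const queue: string[] = [from]
  while (queue.length > 0) {
    const node = queue.shift() as string
    if (node === to) {
      // Walk back through the predecessors to rebuild the path.
      const path: string[] = []
      let current: string | null = node
      while (current !== null) {
        path.unshift(current)
        current = previous.get(current) ?? null
      }
      return path
    }
    for (const [a, b] of edges) {
      if (a === node && b !== undefined && !previous.has(b)) {
        previous.set(b, node)
        queue.push(b)
      }
    }
  }
  return null
}

// Hypothetical composition: identify the source, find a path, apply each hop in order.
function identifyThenTransform(
  identify: (value: unknown) => string | null,
  migrations: Migrations,
  value: unknown,
  target: string,
): { from: string; path: string[]; value: unknown } | null {
  const from = identify(value)
  if (from === null) return null
  const path = shortestPath(migrations, from, target)
  if (path === null) return null
  let current: unknown = value
  for (let i = 0; i + 1 < path.length; i++) {
    const step = migrations[`${path[i]}->${path[i + 1]}`]
    if (step === undefined) return null
    current = step(current)
  }
  return { from, path, value: current }
}

const migrations: Migrations = {
  'v1->v2': (v: { name: string; admin: boolean }) => ({
    firstName: v.name.split(' ')[0] ?? v.name,
    lastName: v.name.split(' ')[1] ?? '',
    role: v.admin ? 'admin' : 'user',
  }),
  'v2->v3': (v: { firstName: string; lastName: string; role: string }) => ({
    displayName: `${v.firstName} ${v.lastName}`.trim(),
    role: v.role,
    email: 'unknown@example.com',
  }),
}

// A one-rule discriminator standing in for identify().
const identify = (v: unknown) => (typeof v === 'object' && v !== null && 'admin' in v ? 'v1' : null)

identifyThenTransform(identify, migrations, { name: 'Alice Smith', admin: true }, 'v3')
// from "v1", path ["v1", "v2", "v3"],
// value { displayName: "Alice Smith", role: "admin", email: "unknown@example.com" }
```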
Conditional types
When identify is not in the config, the methods don't exist on the registry type. This is enforced at the type level via function overloads:
// Without identify: methods don't exist
const plain = createRegistry({ schemas, migrations })
// plain.identify -> type error: property does not exist
// plain.identifyAndTransform -> type error: property does not exist

// With identify: methods are present
const reg = createRegistry({ schemas, migrations, identify: { /* ... */ } })
await reg.identify(data) // works
await reg.identifyAndTransform(data, 'ai') // works
The existing transform(), validate(), has(), findPath(), and explain() methods work exactly the same whether identify is configured or not.
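The overload technique is plain TypeScript. This hypothetical sketch uses made-up names (`Config`, `BaseRegistry`, `IdentifyMethods`, and `createRegistrySketch`) and is much simpler than doba's types. The first overload only accepts configs that include identify, so only its return type carries the extra method. The second overload covers everything else.

```typescript
type Guard = (value: unknown) => boolean

interface Config {
  schemas: Record<string, unknown>
  identify?: Record<string, Guard>
}

interface BaseRegistry {
  has(key: string): boolean
}

interface IdentifyMethods {
  identify(value: unknown): string | null
}

// Overload 1: identify present -> the return type gains identify().
function createRegistrySketch(config: Config & { identify: Record<string, Guard> }): BaseRegistry & IdentifyMethods
// Overload 2: no identify -> the plain registry type.
function createRegistrySketch(config: Omit<Config, 'identify'>): BaseRegistry
// One implementation; the extra method is attached only when guards are configured.
function createRegistrySketch(config: Config): BaseRegistry | (BaseRegistry & IdentifyMethods) {
  const base: BaseRegistry = { has: (key) => key in config.schemas }
  const guards = config.identify
  if (guards === undefined) return base
  return {
    ...base,
    identify: (value: unknown) => {
      for (const [key, guard] of Object.entries(guards)) {
        if (guard(value)) return key
      }
      return null
    },
  }
}

const plain = createRegistrySketch({ schemas: { a: {}, b: {} } })
// plain.identify('x') would not compile: identify does not exist on BaseRegistry.

const withIdentify = createRegistrySketch({
  schemas: { a: {}, b: {} },
  identify: { a: (v) => typeof v === 'string' },
})
withIdentify.identify('hello') // returns "a"
```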
Full example
Putting it all together with a versioned API:
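import { createRegistry, match } from 'dobajs'
import { z } from 'zod'

const v1Schema = z.object({ name: z.string(), admin: z.boolean() })
const v2Schema = z.object({ firstName: z.string(), lastName: z.string(), role: z.enum(['admin', 'user']) })
const v3Schema = z.object({ displayName: z.string(), role: z.enum(['admin', 'user']), email: z.string() })

const registry = createRegistry({
  schemas: { v1: v1Schema, v2: v2Schema, v3: v3Schema },
  migrations: {
    'v1->v2': (v) => ({
      firstName: v.name.split(' ')[0] ?? v.name,
      lastName: v.name.split(' ')[1] ?? '',
      role: v.admin ? 'admin' as const : 'user' as const,
    }),
    'v2->v3': (v) => ({
      displayName: `${v.firstName} ${v.lastName}`.trim(),
      role: v.role,
      email: 'unknown@example.com',
    }),
  },
  identify: {
    v1: match.field('admin'),
    v2: match.fields('firstName', 'lastName'),
    v3: match.field('displayName'),
  },
})

// Unknown data comes in from an API
const incoming: unknown = { name: 'Alice Smith', admin: true }
const result = await registry.identifyAndTransform(incoming, 'v3')
if (result.ok) {
  result.value // { displayName: 'Alice Smith', role: 'admin', email: 'unknown@example.com' }
  // identified as "v1", then migrated v1 -> v2 -> v3 (reported via from and path)
}
For the full API reference including all types and method signatures, see the Identify API reference.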