The Fix
Adds a regression test for the cloudpickle issue with models that have definitions, addressing the problem where deserialized models come back missing their defined attributes.
Based on closed pydantic/pydantic issue #12696; fix PR: pydantic/pydantic#12712 (the issue thread credits pydantic-core#1895 with the underlying change).
Excerpt from the regression-test diff (truncated):

@@ -1,7 +1,11 @@
 import gc
 import pickle
+import platform
+import subprocess
 import sys
The full reproduction script, with its expected-vs-actual output, appears under "Minimal Reproduction" below.
Option A — Apply the official fix
Upgrade to a pydantic release that includes the fix; pydantic/pydantic#12712 adds the regression test for the cloudpickle issue with models that have definitions, and the thread credits pydantic-core#1895 with the underlying change. Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
When NOT to use: skip this option if the model structure changes significantly or if backward compatibility is a concern.
Why This Fix Works in Production
- Trigger: a model that references the same nested model from more than one field (Bar in the reproduction references Foo twice) is built with shared definitions in its core schema; validating it inside a cloudpickled function in another process (here, a Spark UDF) yields empty instances. A model with a single nested reference plus primitive fields (Baz) is unaffected. A hedged introspection sketch follows this list.
- Mechanism: the cross-process cloudpickle round-trip loses state that a definitions-bearing schema depends on, so validation succeeds but the resulting instances have no attributes.
- If left unfixed, this can cause silent data inconsistencies that propagate (bad cache entries, incorrect downstream decisions).
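The sketch below pokes at why Bar and Baz behave differently. It relies on __pydantic_core_schema__, a private pydantic attribute whose layout can change between versions, and the expected output is an assumption based on the issue description, not a verified result:

import pydantic as pd

class Foo(pd.BaseModel):
    foo: int

class Bar(pd.BaseModel):   # references Foo twice -> shared definition expected
    bar1: list[Foo] | None
    bar2: Foo

class Baz(pd.BaseModel):   # references Foo once -> plain model schema expected
    baz1: Foo
    baz2: int

# Expected on affected versions (an assumption): 'definitions' for Bar,
# 'model' for Baz.
print(Bar.__pydantic_core_schema__["type"], Baz.__pydantic_core_schema__["type"])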
Why This Breaks in Prod
- Shows up under Python 3.13.3 in real deployments (not just unit tests): Spark ships UDF closures to executor processes via cloudpickle, and that cross-process hop is what triggers the bug. A cross-process sketch follows this list.
- Deserialization of pydantic models whose schema carries shared definitions fails silently: the class rebuilds, validation runs, and the instances come back with missing attributes.
- Production symptom (often without a traceback): reprs like Bar() with no fields, then AttributeErrors or missing data downstream. A same-process dumps/loads round-trip does not reproduce it, so local tests pass.
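Below is a minimal cross-process sketch of the failing path, not a confirmed reproduction (the thread notes a pure-Python repro was not immediately found). It assumes cloudpickle is installed and that pickling the class by value, as Spark does for closures captured from the driver script, is the relevant mechanism:

import base64
import subprocess
import sys

import cloudpickle
import pydantic as pd

class Foo(pd.BaseModel):
    foo: int

class Bar(pd.BaseModel):
    bar1: list[Foo] | None
    bar2: Foo

# Serialize the class itself by value (run this file as a script so the
# classes live in __main__ and cloudpickle pickles them by value).
blob = base64.b64encode(cloudpickle.dumps(Bar)).decode()

# Rebuild the class in a fresh interpreter and validate there.
child = """
import base64, pickle, sys
Bar = pickle.loads(base64.b64decode(sys.argv[1]))
print(repr(Bar.model_validate_json('{"bar1": [{"foo": 1}], "bar2": {"foo": 2}}')))
"""
out = subprocess.run([sys.executable, "-c", child, blob], capture_output=True, text=True)

# An affected build would print 'Bar()' here instead of the full repr.
print(out.stdout.strip() or out.stderr)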
Proof / Evidence
- GitHub issue: #12696
- Fix PR: https://github.com/pydantic/pydantic/pull/12712
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.80
- Did this fix it?: Yes (upstream fix exists)
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“We recently had a similar issue fixed in https://github.com/pydantic/pydantic-core/pull/1693”
“Thanks for the quick response. I've not yet been able to reproduce in pure Python, without a Spark cluster, but I'll keep working on that.”
“The issue does not seem to occur with dumps/loads in the same process”
“Thanks @psavalle, I [now] understand where the issue is coming from. I think https://github.com/pydantic/pydantic-core/pull/1895 might actually fix it, cc @lmmx”
Failure Signature (Search String)
- SchemaSerializer(serializer=Prebuilt(
Copy-friendly signature
Failure Signature
-----------------
SchemaSerializer(serializer=Prebuilt(
Error Message
-------------
Signature-only (no traceback captured). The failure manifests as empty model instances, not an exception; the string above comes from the repr of the model's serializer on affected builds.
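A quick local check for the signature, as a sketch: it relies on the private __pydantic_serializer__ attribute (internal and version-dependent), and the substring to look for is taken from the signature above:

import pydantic as pd

class Foo(pd.BaseModel):
    foo: int

class Bar(pd.BaseModel):
    bar1: list[Foo] | None
    bar2: Foo

# On an affected build the repr is expected to contain the failure
# signature 'SchemaSerializer(serializer=Prebuilt('.
print(repr(Bar.__pydantic_serializer__))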
Minimal Reproduction
# /// script
# dependencies = [
#     "pydantic==2.12.5",
#     "pyspark[connect]==3.5.3",
#     "setuptools==77.0.3",
# ]
# ///
import pydantic as pd
from pyspark.sql import SparkSession, functions as F


class Foo(pd.BaseModel):
    foo: int


class Bar(pd.BaseModel):
    bar1: list[Foo] | None
    bar2: Foo


class Baz(pd.BaseModel):
    baz1: Foo
    baz2: int


if __name__ == "__main__":
    s = SparkSession.builder.master("local[1]").getOrCreate()

    # Bar references Foo twice, so its core schema carries shared
    # definitions; on affected versions the UDF returns an empty repr.
    (
        s.range(1)
        .select(
            F.udf(
                lambda: repr(Bar.model_validate_json('{"bar1": [{"foo": 1}], "bar2": {"foo": 2}}'))
            )()
        )
        .show(truncate=False)
    )
    # +----------+
    # |<lambda>()|
    # +----------+
    # |Bar()     |
    # +----------+

    # Baz references Foo only once and is unaffected.
    (
        s.range(1)
        .select(
            F.udf(
                lambda: repr(Baz.model_validate_json('{"baz1": {"foo": 1}, "baz2": 2}'))
            )()
        )
        .show(truncate=False)
    )
    # +----------------------------+
    # |<lambda>()                  |
    # +----------------------------+
    # |Baz(baz1=Foo(foo=1), baz2=2)|
    # +----------------------------+
Environment
- Python: 3.13.3
- Pydantic: 2.12.5 (as pinned in the reproduction)
What Broke
Models rebuilt on Spark executors via cloudpickle validate without error but come back empty (Bar() instead of Bar(bar1=[Foo(foo=1)], bar2=Foo(foo=2))), leading to AttributeErrors when attributes are accessed.
Why It Broke
Cross-process cloudpickle round-trips of pydantic models whose core schema carries shared definitions (the same nested model referenced from more than one field) lose state the rebuilt class needs, so validation produces instances with missing attributes. Same-process dumps/loads is unaffected, and the thread points at pydantic-core's Prebuilt serializer path (fixed in pydantic-core#1895); see the sketches under "Why This Fix Works in Production" and "Why This Breaks in Prod" above.
Fix Options (Details)
Option A — Apply the official fix
Upgrade to a release containing the fix; pydantic/pydantic#12712 adds the regression test for the cloudpickle issue with models that have definitions.
Option D — Guard side-effects with a OnceOnly guardrail
Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.
- Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
- Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
- Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
- This does NOT fix data corruption; it only prevents duplicate side-effects.
Example snippet (a sketch; handle_event is a placeholder for your own handler):
from onceonly import OnceOnly
import os

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)

def process_webhook(event_id: str) -> dict:
    # Stable idempotency key per real side-effect. Use a request id /
    # job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"

    # Acquire a one-hour lock; a duplicate means this event was already
    # handled (or is being handled) elsewhere.
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}

    # Safe to execute the side-effect exactly once.
    handle_event(event_id)  # placeholder: your actual event handler
    return {"status": "processed"}
Fix reference: https://github.com/pydantic/pydantic/pull/12712
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- This fix should not be used if the model structure changes significantly or if backward compatibility is a concern.
- Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
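After upgrading, an assertion like the following can be bolted onto the reproduction (a sketch; s, Bar, and F come from the Minimal Reproduction script above):

expected = "Bar(bar1=[Foo(foo=1)], bar2=Foo(foo=2))"
row = (
    s.range(1)
    .select(
        F.udf(
            lambda: repr(Bar.model_validate_json('{"bar1": [{"foo": 1}], "bar2": {"foo": 2}}'))
        )().alias("out")
    )
    .first()
)
# On a fixed build the UDF returns the full repr instead of 'Bar()'.
assert row["out"] == expected, f"still broken: {row['out']}"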
Prevention
- Add a CI check that diffs key outputs after upgrades (OpenAPI schema snapshots, JSON payload shapes, CLI output); see the sketch after this list.
- Upgrade behind a canary and run integration tests against the canary before 100% rollout.
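A sketch of such a snapshot check for the first bullet (pytest-style; the snapshot path and the myapp.models import are illustrative):

import json
import pathlib

from myapp.models import Bar  # hypothetical import; use your own model

def test_bar_schema_snapshot():
    # Fail CI when a dependency upgrade silently changes the JSON schema
    # your payloads are built from.
    snapshot = pathlib.Path("tests/snapshots/bar_schema.json")
    current = json.dumps(Bar.model_json_schema(), indent=2, sort_keys=True)
    if not snapshot.exists():
        snapshot.write_text(current)  # first run: record the baseline
    assert current == snapshot.read_text(), "schema drifted after upgrade"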
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.