Intuition and Insights

introducing pydantic - the data savior

authored by: @himanshustwts

hello! hope you're doing good. a very happy teacher's day as well to all the teachers out there. the context for writing about pydantic is similar to the last blog. if you haven't checked it out yet, read it here: decoding-how-spell-checkers-work

last night, my team was discussing how to bring JSON serialization into our model (it's a complex one) and incorporate external APIs. i looked into JSON mode (supported in chat completion APIs) and found "pydantic" there.

let's discuss the problem first.
as you all know, python is dynamically typed (the types of variables aren't declared, unlike in java or c).

x = 4 in python
int x = 4 in java/c

we can infer from here that python also lets us rebind a variable to a value of a different type. meaning, if 4 (int) is assigned to a variable x, we can later assign "four" (string) to the same x. basically, 4 got overwritten by "four".
It may look easier as we get started, but it causes a lot of problems later on.
As the application gets bigger, it becomes harder to keep track of all the variables and what types they should be.
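
to make this concrete, here's a tiny sketch using the same x as above. python accepts the rebinding silently, and the mistake only surfaces later, at the point where the value is actually used as a number. the `result` variable is just something i added for illustration.

```python
x = 4            # x refers to an int
x = "four"       # the same name now refers to a str; python doesn't complain

try:
    result = x + 1   # fails only here, when x is used as a number
except TypeError as err:
    result = None
    print(f"TypeError: {err}")
```

the line that causes the bug (the reassignment) and the line that crashes can be far apart, which is exactly what makes this hard to track in bigger applications.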

let's look at some example code:

smith = Person("smith", 24)
smith = Person("smith", "24")

the biggest downside of this approach is that it lets us create invalid objects.
here, when we try to use the "age" attribute as a number, it will fail (it now holds a string).
it's also quite hard to debug this in a large codebase.
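
the snippet above doesn't show how Person is defined, so here's a minimal sketch assuming it's a plain class with `name` and `age` attributes (that definition, and the `valid` / `invalid` names, are assumptions for illustration only). it shows that the invalid object is created without any complaint and only breaks when age is used.

```python
class Person:
    # hypothetical plain class: nothing checks the types of name or age
    def __init__(self, name, age):
        self.name = name
        self.age = age


valid = Person("smith", 24)
invalid = Person("smith", "24")   # accepted silently, even though age is a string

print(valid.age + 1)              # int + int works

try:
    print(invalid.age + 1)        # str + int raises TypeError
except TypeError as err:
    print(f"TypeError: {err}")
```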

Python already has a lot of tools to address this kind of problem, like dataclasses, type hints, etc.

First, we're going to look at pydantic, then we'll compare it with existing tools.
what is pydantic?
in the simplest terms, it is a data validation library for python. You'll find pydantic used in top python libraries like huggingface/transformers, fastapi, langchain, airflow etc.
It has several benefits:

  1. IDE Type Hints
  2. Data Validation
  3. JSON Serialization

how to use pydantic?

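# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr, field_validator

class User(BaseModel):
    name: str
    email: EmailStr
    account_id: int

    # Data Validation (pydantic v2 style; the older @validator is deprecated)
    @field_validator("account_id")
    @classmethod
    def validate_account_id(cls, value):
        if value <= 0:
            raise ValueError(f"account_id must be positive: {value}")
        return value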



# creating an instance of the model
user1 = User(
    name="himanshu",
    email="hdubey261102@gmail.com",
    account_id=20211234,
)
print(user1)

user2 = User(name="xyz", email="abc@gmail.com", account_id=2345)
print(user2.email)

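# a numeric string is still accepted: pydantic's default (lax) mode coerces "12346" to the int 12346
user2 = User(name="abc", email="pqr@gmail.com", account_id="12346")
print(user2)

# a value that can't be parsed as an int (e.g. account_id="abc") raises a ValidationError instead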

# JSON Serialization
# to convert a pydantic model to JSON, call model_dump_json() on the model instance
# (pydantic v2; the older .json() and .dict() still work there but are deprecated).
# this returns a JSON string representation of the model's data

user_json_str = user1.model_dump_json()
print(user_json_str)

# if you don't want a JSON string but a plain python dictionary
user_json_obj = user1.model_dump()
print(user_json_obj)

# if you have a JSON string and want to convert it back into a pydantic model
json_str = '{"name":"xyz","email":"abc@gmail.com","account_id":2345}'
user3 = User.model_validate_json(json_str)
print(user3)
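
to see what data validation looks like when it actually fails, here's a small sketch. the `Account` model and the `failed_fields` list are hypothetical names used only here, and i've left out EmailStr so the example doesn't need the extra email-validator package. it shows two things: a numeric string gets coerced to int, while a non-numeric string raises a ValidationError that tells us which field was wrong.

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):
    # hypothetical model used only for this sketch
    name: str
    account_id: int


# a numeric string is coerced to int in the default (lax) mode
ok = Account(name="abc", account_id="12346")
print(type(ok.account_id))

# a non-numeric string cannot be coerced, so validation fails
try:
    Account(name="abc", account_id="not-a-number")
    failed_fields = []
except ValidationError as err:
    # err.errors() returns a list of dicts; "loc" holds the path to the bad field
    failed_fields = [e["loc"][0] for e in err.errors()]
    print(failed_fields)
```

the important bit is that the error is raised at the moment the object is created, not later when the field is used.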

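let's compare pydantic with a dataclass. a dataclass gives us type hints but doesn't validate them at runtime, and converting to and from JSON takes a few manual steps.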

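import json
from dataclasses import dataclass, asdict

@dataclass
class Product:
    name: str
    price: float
    in_stock: bool

product1 = Product("PS5", 70000.0, True)

# dataclass -> dict -> JSON string
product1_dict = asdict(product1)
product1_json = json.dumps(product1_dict)

# JSON string -> dict -> dataclass (no validation happens at this step)
new_product1_dict = json.loads(product1_json)
new_product1 = Product(**new_product1_dict)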
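
for comparison, here's a sketch of the same product written as a pydantic model. it assumes pydantic v2 (for `model_dump_json` and `model_validate_json`), and the `rejected` flag is just something i added to show the outcome. the JSON round trip needs no manual dict handling, and bad data is rejected when the object is created.

```python
from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


product1 = Product(name="PS5", price=70000.0, in_stock=True)

# model -> JSON string -> model, without going through json.dumps / json.loads by hand
product1_json = product1.model_dump_json()
new_product1 = Product.model_validate_json(product1_json)
print(new_product1 == product1)

# a price that can't be parsed as a float is rejected at construction time
try:
    Product(name="PS5", price="seventy thousand", in_stock=True)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

a dataclass would accept price="seventy thousand" without complaint, since its type hints are not checked at runtime.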

that's pydantic in brief. refer to the official docs for further use cases: Pydantic Docs

cyaa :)