improvements of subclasses and jsonization time by add two options in global_config #521

democrazyx · 2024-02-28T13:40:16Z

This PR makes two contributions:

Add class info to the serialized output so that subclasses can be restored from JSON. Enable it by setting global_config.include_class_info = True.
Save time by caching the results of type checking. Enable it by setting global_config.enable_cache = True.
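
To illustrate the idea behind the first option, here is a minimal standalone sketch that does not use dataclasses_json and is not the code in this PR. The `REGISTRY` dict, the `register` decorator, the `__class__` key and both helper functions are hypothetical names chosen for illustration. It only handles flat dataclasses.

```python
from dataclasses import dataclass, asdict, fields

# Hypothetical registry mapping class names to classes (illustration only).
REGISTRY = {}


def register(cls):
    """Remember the class under its name so it can be looked up on decode."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
@dataclass
class Animal:
    id: int = 0
    health: int = 100


@register
@dataclass
class Cat(Animal):
    age: int = 1


def to_tagged_dict(obj):
    """Serialize a flat dataclass and record its concrete class name."""
    data = asdict(obj)
    data["__class__"] = type(obj).__name__  # assumed key name, not the PR's format
    return data


def from_tagged_dict(data, default_cls):
    """Rebuild the class named in the payload, or fall back to default_cls."""
    data = dict(data)  # copy so the caller's dict is not mutated
    cls = REGISTRY.get(data.pop("__class__", None), default_cls)
    known = {f.name for f in fields(cls)}
    # Keys that the chosen class does not declare are silently dropped.
    return cls(**{k: v for k, v in data.items() if k in known})


# With the tag, the subclass and its extra field survive the round trip.
restored = from_tagged_dict(to_tagged_dict(Cat(age=3)), Animal)

# Without the tag, the payload is decoded as the declared type and `age` is lost.
untagged = from_tagged_dict({"id": 0, "health": 100, "age": 3}, Animal)
```

In this sketch the tag is what lets the decoder pick `Cat` instead of the declared `Animal`; without it, the extra field has nowhere to go.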

For detailed usage and a comparison, open the Jupyter notebook file.

The following code is derived from the .ipynb file.

# %% [markdown]
# # 1. Include class info in the JSON result

# %%
from dataclasses import dataclass, field

from dataclasses_json import dataclass_json, global_config


@dataclass_json
@dataclass
class Animal:
    id: int = 0
    health: int = 100


@dataclass_json
@dataclass
class Cat(Animal):
    age: int = 1

@dataclass_json
@dataclass
class Dog(Animal):
    age: int = 1

@dataclass_json
@dataclass
class PetCat(Cat):
    name: str = ''

@dataclass_json
@dataclass
class Person:
    name: str = 'zyx'
    animals: list[Animal] = field(default_factory=list)


# %%
p1 = Person(animals=[Animal(), Cat(), PetCat()])
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
p2.to_dict()

# %% [markdown]
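# Some fields are missing: every list item is rebuilt as the declared type `Animal`, so the subclass fields `age` and `name` are dropped.
# 
# To solve this, we need to include class info in the result.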

# %%
global_config.include_class_info = True
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
global_config.include_class_info = False
p2.to_dict()

# %% [markdown]
# Now all the fields are restored!

# %% [markdown]
# # 2. Use the cache to save time

# %% [markdown]
# When thousands of objects are serialized, the code spends much of its time looking up dataclass info, even though that info does not change during serialization.

# %%
import cProfile
import pstats

# Baseline: profile to_json() with the cache disabled.
global_config.enable_cache = False
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_without_cache = p3.to_json()
pr.disable()

# Save the profile, then print it sorted by cumulative time.
pr.dump_stats('profile_stats1')
stats = pstats.Stats('profile_stats1')
stats.sort_stats('cumulative')
stats.print_stats()

# %%
import cProfile
import pstats

# Same workload as above, this time with the cache enabled.
global_config.enable_cache = True
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_with_cache = p3.to_json()
pr.disable()
pr.dump_stats('profile_stats2')
stats = pstats.Stats('profile_stats2')
stats.sort_stats('cumulative')
stats.print_stats()

# %% [markdown]
# The speedup is substantial: from 6.6 s to 2.5 s on my laptop.
# 
# Now let's check that the results are the same.

# %%
result_with_cache == result_without_cache
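
The idea behind the second option can be sketched without dataclasses_json. The example below is an illustration under assumptions, not the caching code in this PR: `cached_field_names` and `to_plain_dict` are hypothetical helpers, and `functools.lru_cache` with an unbounded size stands in for whatever cache the PR uses. It shows that per-class information only needs to be computed once, however many instances are serialized.

```python
import dataclasses
from dataclasses import dataclass
from functools import lru_cache


@dataclass
class Animal:
    id: int = 0
    health: int = 100


@lru_cache(maxsize=None)  # unbounded cache; an assumption made for this sketch
def cached_field_names(cls):
    """Compute the field names of a dataclass type once per class."""
    return tuple(f.name for f in dataclasses.fields(cls))


def to_plain_dict(obj):
    """Serialize a flat dataclass using the cached per-class field list."""
    return {name: getattr(obj, name) for name in cached_field_names(type(obj))}


animals = [Animal(id=i) for i in range(1000)]
dicts = [to_plain_dict(a) for a in animals]

# One miss for the first Animal, then a cache hit for every later instance.
info = cached_field_names.cache_info()
```

Because the cache key is the class rather than the instance, the lookup cost is paid once per dataclass type, which is why the benefit grows with the number of objects.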


zyx added 2 commits February 28, 2024 21:28

1. add class info to restore subclasses from json

5ba014d

2. save time by cache the result of type checking

_cache max size bug fixed

f2e5a95
