improvements of subclasses and jsonization time by add two options in global_config #521

Open
wants to merge 2 commits into
base: master

democrazyx

This PR makes two contributions:

  1. Add class info to the serialized output so that subclasses can be restored from JSON. Enable it by setting global_config.include_class_info = True.
  2. Save time by caching the result of type checking. Enable it by setting global_config.enable_cache = True.

For detailed usage and a comparison, open the Jupyter notebook file.

The following code is derived from the ipynb file.
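To illustrate the idea behind the first option, here is a minimal standard-library sketch of tagging each serialized object with its class name and using that tag to pick the concrete class on load. It is not the implementation in this PR: the `CLASS_KEY` name, the `REGISTRY` dict, and the `register`, `to_tagged`, and `from_tagged` helpers are all hypothetical, and the format the PR actually emits may differ.

```python
from dataclasses import dataclass, field, fields, is_dataclass

# Hypothetical key name and registry; the PR's actual format may differ.
CLASS_KEY = "__class__"
REGISTRY = {}


def register(cls):
    """Record a dataclass under its class name so it can be looked up on load."""
    REGISTRY[cls.__name__] = cls
    return cls


def to_tagged(obj):
    """Recursively convert dataclass instances to dicts tagged with the class name."""
    if is_dataclass(obj) and not isinstance(obj, type):
        out = {f.name: to_tagged(getattr(obj, f.name)) for f in fields(obj)}
        out[CLASS_KEY] = type(obj).__name__
        return out
    if isinstance(obj, list):
        return [to_tagged(v) for v in obj]
    return obj


def from_tagged(data):
    """Rebuild objects, using the tag to choose the concrete (sub)class."""
    if isinstance(data, dict) and CLASS_KEY in data:
        cls = REGISTRY[data[CLASS_KEY]]
        kwargs = {k: from_tagged(v) for k, v in data.items() if k != CLASS_KEY}
        return cls(**kwargs)
    if isinstance(data, list):
        return [from_tagged(v) for v in data]
    return data


@register
@dataclass
class Animal:
    id: int = 0
    health: int = 100


@register
@dataclass
class Cat(Animal):
    age: int = 1


@register
@dataclass
class Person:
    name: str = "zyx"
    animals: list[Animal] = field(default_factory=list)


original = Person(animals=[Animal(), Cat(age=3)])
restored = from_tagged(to_tagged(original))
```

Because the tag travels with the data, the `Cat` element comes back as a `Cat` even though the field is annotated as a list of `Animal`. A name-based registry like this one assumes class names are unique; a real implementation would likely need a module-qualified name.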

# %% [markdown]
# # 1. Include class info in the JSON result

# %%
from dataclasses import dataclass, field

from dataclasses_json import dataclass_json, global_config


@dataclass_json
@dataclass
class Animal:
    id: int = 0
    health: int = 100


@dataclass_json
@dataclass
class Cat(Animal):
    age: int = 1

@dataclass_json
@dataclass
class Dog(Animal):
    age: int = 1

@dataclass_json
@dataclass
class PetCat(Cat):
    name: str = ''

@dataclass_json
@dataclass
class Person:
    name: str = 'zyx'
    animals: list[Animal] = field(default_factory=list)


# %%
p1 = Person(animals=[Animal(), Cat(), PetCat()])
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
p2.to_dict()

# %% [markdown]
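# Some fields are missing! The field is annotated as list[Animal], so every element is rebuilt as a plain Animal and the subclass fields are dropped.
# 
# To solve this, we need to include class info in the result.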

# %%
global_config.include_class_info = True
p1.to_dict()

# %%
p2 = Person.from_dict(p1.to_dict())
global_config.include_class_info = False  # reset the global flag after the round trip
p2.to_dict()

# %% [markdown]
# Now all the fields are restored!

# %% [markdown]
# # 2. Use a cache to save time

# %% [markdown]
# When serializing thousands of objects, the code spends much of its time looking up dataclass info, even though that info does not change during serialization.

# %%
import cProfile
import pstats
global_config.enable_cache=False
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_without_cache = p3.to_json()
pr.disable()
pr.dump_stats('profile_stats1')
stats = pstats.Stats('profile_stats1')
stats.sort_stats('cumulative')
stats.print_stats()

# %%
import cProfile
import pstats
global_config.enable_cache=True
p3 = Person(animals=[Animal() for _ in range(100000)])

pr = cProfile.Profile()
pr.enable()
result_with_cache = p3.to_json()
pr.disable()
pr.dump_stats('profile_stats2')
stats = pstats.Stats('profile_stats2')
stats.sort_stats('cumulative')
stats.print_stats()

# %% [markdown]
# The speedup is substantial: from 6.6 s to 2.5 s on my laptop.
# 
# Now let's check whether the results are the same.

# %%
result_with_cache==result_without_cache
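
The caching idea from section 2 can be sketched independently of the library. The example below is an assumption-laden illustration, not the code in this PR: `cached_field_names` and `to_plain_dict` are hypothetical helpers, and `functools.lru_cache` stands in for whatever cache the PR actually uses. It shows that per-class metadata only needs to be computed once, no matter how many instances are serialized.

```python
from dataclasses import dataclass, fields
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_field_names(cls):
    """Return the field names of a dataclass type, computed once per class."""
    return tuple(f.name for f in fields(cls))


def to_plain_dict(obj):
    """Convert a flat dataclass instance to a dict using the cached lookup."""
    return {name: getattr(obj, name) for name in cached_field_names(type(obj))}


@dataclass
class Animal:
    id: int = 0
    health: int = 100


# Serialize many instances of the same class; the field lookup runs only once.
rows = [to_plain_dict(Animal(id=i)) for i in range(1000)]
info = cached_field_names.cache_info()
```

Inspecting `info.hits` and `info.misses` shows how often the lookup was served from the cache. Memoizing on the class is safe only as long as the class definition is not modified after the first lookup.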
zyx added 2 commits February 28, 2024 21:28
2. save time by cache the result of type checking