Python dataclass from dict
The standard library in 3.7 can recursively convert a dataclass into a dict (example from the docs):

from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp

I am looking for a way to turn a dict back into a dataclass when there is nesting. Something like C(**tmp) only works if the fields of the data class are simple types and not themselves dataclasses. I am familiar with jsonpickle, which however comes with a prominent security warning.
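A quick check of what C(**tmp) produces in the nested case, using the same classes as the example. The call itself succeeds, because dataclasses do not enforce annotations at runtime, but the list items stay plain dicts:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# The constructor accepts the dict as-is; field annotations are not checked.
naive = C(**tmp)

# The nested items are still dicts, not Point instances.
assert isinstance(naive.mylist[0], dict)
assert naive != C([Point(0, 0), Point(10, 4)])
```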
python python-3.x python-dataclasses
The question this is marked as a duplicate of is indeed asking the same, but the answer given there does not work for this particular example. I've left a comment there and am still looking for a more general answer.
– mbatchkarov, Nov 21 '18 at 9:51
Could you make that difference explicit here? It looks like you may have to add an elif to that if that checks for various hints. I'm not sure how you would generalize it to arbitrary type hints though (Dict and Tuple in addition to List, for example).
– Patrick Haugh, Nov 21 '18 at 14:40
asdict is losing information. It would not be possible to do this in the general case.
– wim, Nov 26 '18 at 18:44
Specifically, asdict doesn't store any information about what class the dict was produced from. Given class A: x: int and class B: x: int, should {'x': 5} be used to create an instance of A or B? You seem to be making the assumption that the list of attribute names uniquely defines a class, and that there is an existing mapping of names to data classes that could be used to select the correct class.
– chepner, Nov 26 '18 at 18:50
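The ambiguity described in the comment above can be checked directly. A minimal sketch, with A and B defined as in that comment:

```python
from dataclasses import asdict, dataclass


@dataclass
class A:
    x: int


@dataclass
class B:
    x: int


# Both instances collapse to the same plain dict, so the originating
# class cannot be recovered from the dict alone.
assert asdict(A(5)) == asdict(B(5)) == {'x': 5}
assert type(asdict(A(5))) is dict
```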
I would recommend you to check out this library.
– Abdul Niyas P M, Nov 28 '18 at 8:48
asked Nov 19 '18 at 13:51 by mbatchkarov; edited Nov 27 '18 at 16:52 by wim
4 Answers
Below is the CPython implementation of asdict, or more specifically the internal recursive helper function _asdict_inner that it uses:

# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

asdict simply calls the above with some assertions, and dict_factory=dict by default.
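As a small illustration of the dict_factory parameter, the sketch below passes OrderedDict instead of the default dict. Segment is a hypothetical class introduced only for this example:

```python
from collections import OrderedDict
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:  # hypothetical class, only for this illustration
    start: Point
    end: Point


result = asdict(Segment(Point(0, 0), Point(10, 4)), dict_factory=OrderedDict)

# dict_factory is applied at every dataclass level, including nested ones.
assert type(result) is OrderedDict
assert type(result['start']) is OrderedDict
assert result == {'start': {'x': 0, 'y': 0}, 'end': {'x': 10, 'y': 4}}
```

Note that the factory receives only (name, value) pairs, which is why it cannot attach the class of the object being converted without a modified helper.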
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict:

class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)
        if not isinstance(t, type):
            raise TypeError("t must be a type")
        self._type = t

    @property
    def type(self):
        return self._type

Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclass instances:

# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass
# thanks to Patrick Haugh
from typing import *
# deepcopy
import copy
Functions used:
# equivalent of the internal function _is_dataclass_instance:
# is_dataclass() is also true for dataclass types, so exclude those
def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)

# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:

def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)
    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

Functions used:

def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
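One caveat worth spelling out: the type tag lives only on the in-memory TypeDict object. The sketch below, which assumes the data is passed through the standard json module, uses a condensed TypeDict and a stand-in Point class to show that the tag does not survive serialisation:

```python
import json


class TypeDict(dict):
    """Condensed version of the wrapper: a dict that remembers a type."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.type = t


class Point:  # stand-in class; any type can serve as the tag
    pass


tagged = TypeDict(Point, [('x', 0), ('y', 0)])
assert tagged.type is Point

# json treats TypeDict as an ordinary dict, so the tag is not written out...
text = json.dumps(tagged)
assert text == '{"x": 0, "y": 0}'

# ...and loading gives back a plain dict with no type attribute.
restored = json.loads(text)
assert type(restored) is dict
assert not hasattr(restored, 'type')
```

So this approach round-trips within one process, but anything that crosses a serialisation boundary needs the class recorded in the data itself.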
TL;DR, +1 for the comprehensiveness of the answer.
– iBug, Nov 27 '18 at 11:28
+0: +1 for trying it, but -1 because it's basically a bad idea in the first place.
– wim, Nov 27 '18 at 16:54
@wim I'd agree tbh - can't see it as much more than a theoretical exercise (which at least shows that dataclass plays well with existing object types).
– meowgoesthedog, Nov 27 '18 at 16:57
I'm going to accept this as it's the most comprehensive answer that helps future users understand the core of the issue. I ended up with something closer to @Martijn's suggestion as I did indeed want JSON. Thank you everyone for your answers.
– mbatchkarov, Dec 11 '18 at 9:56
I'm the author of dacite - the tool that simplifies creation of data classes from dictionaries.
This library has only one function, from_dict - this is a quick example of usage:

from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)
assert user == User(name='john', age=30, is_active=True)

Moreover, dacite supports the following features:
- nested structures
- (basic) type checking
- optional fields (i.e. typing.Optional)
- unions
- collections
- value casting and transformation
- remapping of field names
... and it's well tested - 100% code coverage!
To install dacite, simply use pip (or pipenv):
$ pip install dacite
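To give a feel for what support for nested structures and collections involves, here is a minimal stdlib-only sketch driven by type hints. It is not dacite's implementation: build is a hypothetical helper, it assumes Python 3.8+ (for typing.get_origin and typing.get_args), and it handles only dataclasses, List[...] and Dict[...], with no type checking of leaf values.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, get_args, get_origin, get_type_hints


def build(tp: Any, value: Any) -> Any:
    """Hypothetical helper: rebuild `value` according to the type hint `tp`."""
    if is_dataclass(tp) and isinstance(value, dict):
        hints = get_type_hints(tp)
        # Missing keys are left to the constructor (defaults or TypeError).
        kwargs = {f.name: build(hints[f.name], value[f.name])
                  for f in fields(tp) if f.name in value}
        return tp(**kwargs)
    origin, args = get_origin(tp), get_args(tp)
    if origin is list and args:
        return [build(args[0], v) for v in value]
    if origin is dict and len(args) == 2:
        return {build(args[0], k): build(args[1], v) for k, v in value.items()}
    # Anything else (int, str, unparameterised containers, ...) passes through.
    return value


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert build(C, tmp) == C([Point(0, 0), Point(10, 4)])
```

Unlike the type-tagging approach, this relies on the caller naming the target class, which sidesteps the ambiguity of deciding which class a bare dict came from.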
You can use mashumaro to create a dataclass object from a dict according to its schema. The mixin from this library adds convenient from_dict and to_dict methods to dataclasses:

from dataclasses import dataclass
from typing import List
from mashumaro import DataClassDictMixin

@dataclass
class Point(DataClassDictMixin):
    x: int
    y: int

@dataclass
class C(DataClassDictMixin):
    mylist: List[Point]

p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c
If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.
jsonpickle is not safe because it stores references to arbitrary Python objects and passes in data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call) and then verify that the object is an actual dataclass type, before you use it.
The framework can be made generic enough but still limited only to JSON-serialisable types plus dataclass-based instances:
import dataclasses
import importlib
import json
import sys

def dataclass_object_dump(ob):
    datacls = type(ob)
    if not dataclasses.is_dataclass(datacls):
        raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
    mod = sys.modules.get(datacls.__module__)
    if mod is None or not hasattr(mod, datacls.__qualname__):
        raise ValueError(f"Can't resolve '{datacls!r}' reference")
    ref = f"{datacls.__module__}.{datacls.__qualname__}"
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}

def dataclass_object_load(d):
    ref = d.pop('__dataclass__', None)
    if ref is None:
        return d
    try:
        modname, hasdot, qualname = ref.rpartition('.')
        module = importlib.import_module(modname)
        datacls = getattr(module, qualname)
        if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
            raise ValueError
        return datacls(**d)
    except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
        raise ValueError(f"Invalid dataclass reference {ref!r}") from None
This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).
Use these as the default and object_hook arguments to json.dump[s]() and json.load[s]():

>>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
{
    "mylist": [
        {
            "x": 0,
            "y": 0,
            "__dataclass__": "__main__.Point"
        },
        {
            "x": 10,
            "y": 4,
            "__dataclass__": "__main__.Point"
        }
    ],
    "__dataclass__": "__main__.C"
}
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
True

or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.
Instead of using fully qualified module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding to ensure you don't forget to register dataclasses as you develop.
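A rough sketch of that registry variant follows. The names REGISTRY, register, dump_hook and load_hook are hypothetical, chosen for this illustration, and the class name alone is used as the key, which assumes names are unique across the registered dataclasses:

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List

# Hypothetical registry: maps a public type name to a permitted dataclass.
REGISTRY = {}


def register(cls):
    """Class decorator that whitelists a dataclass for (de)serialisation."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    REGISTRY[cls.__name__] = cls
    return cls


def dump_hook(ob):
    # Checked on encoding, so unregistered classes are caught early.
    name = type(ob).__name__
    if REGISTRY.get(name) is not type(ob):
        raise TypeError(f"{name!r} is not a registered dataclass")
    data = {f.name: getattr(ob, f.name) for f in dataclasses.fields(ob)}
    data['__dataclass__'] = name
    return data


def load_hook(d):
    name = d.pop('__dataclass__', None)
    if name is None:
        return d
    # Checked again on decoding: only registered names are ever instantiated.
    if name not in REGISTRY:
        raise ValueError(f"Unknown dataclass name {name!r}")
    return REGISTRY[name](**d)


@register
@dataclass
class Point:
    x: int
    y: int


@register
@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
text = json.dumps(c, default=dump_hook)
assert json.loads(text, object_hook=load_hook) == c
```

Because decoding only looks names up in the registry, nothing is imported based on the incoming data.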
Below is the CPython implementation of asdict
– or specifically, the internal recursive helper function _asdict_inner
that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
if _is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _asdict_inner(getattr(obj, f.name), dict_factory)
result.append((f.name, value))
return dict_factory(result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
# [large block of author comments]
return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
elif isinstance(obj, (list, tuple)):
# [ditto]
return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_asdict_inner(k, dict_factory),
_asdict_inner(v, dict_factory))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
asdict
simply calls the above with some assertions, and dict_factory=dict
by default.
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict
:
class TypeDict(dict):
def __init__(self, t, *args, **kwargs):
super(TypeDict, self).__init__(*args, **kwargs)
if not isinstance(t, type):
raise TypeError("t must be a type")
self._type = t
@property
def type(self):
return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclass
-es:
# only use dict for now; easy to add back later
def _todict_inner(obj):
if is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _todict_inner(getattr(obj, f.name))
result.append((f.name, value))
return TypeDict(type(obj), result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
return type(obj)(*[_todict_inner(v) for v in obj])
elif isinstance(obj, (list, tuple)):
return type(obj)(_todict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_todict_inner(k), _todict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass
# thanks to Patrick Haugh
from typing import *
# deepcopy
import copy
Functions used:
# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
return is_dataclass(obj) and not is_dataclass(obj.type)
# the adapted version of asdict
def todict(obj):
if not is_dataclass_instance(obj):
raise TypeError("todict() should be called on dataclass instances")
return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict
can be re-used for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
# reconstruct the dataclass using the type tag
if is_dataclass_dict(obj):
result = {}
for name, data in obj.items():
result[name] = _fromdict_inner(data)
return obj.type(**result)
# exactly the same as before (without the tuple clause)
elif isinstance(obj, (list, tuple)):
return type(obj)(_fromdict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
return isinstance(obj, TypeDict)
def fromdict(obj):
if not is_dataclass_dict(obj):
raise TypeError("fromdict() should be called on TypeDict instances")
return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
6
TL;DR, +1 for the comprehensiveness for the answer.
– iBug
Nov 27 '18 at 11:28
2
+0: +1 for trying it, but -1 because it's basically a bad idea in the first place.
– wim
Nov 27 '18 at 16:54
1
@wim I'd agree tbh - can't see it as much more than a theoretical exercise (which at least shows thatdataclass
plays well with existing object types).
– meowgoesthedog
Nov 27 '18 at 16:57
I'm going to accept this as it's the most comprehensive answer that helps future users understand the core of the issue. I ended up with something closer to @Martijn's suggestion as I did indeed want JSON. Thank you everyone for your answers
– mbatchkarov
Dec 11 '18 at 9:56
add a comment |
Below is the CPython implementation of asdict
– or specifically, the internal recursive helper function _asdict_inner
that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
if _is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _asdict_inner(getattr(obj, f.name), dict_factory)
result.append((f.name, value))
return dict_factory(result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
# [large block of author comments]
return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
elif isinstance(obj, (list, tuple)):
# [ditto]
return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_asdict_inner(k, dict_factory),
_asdict_inner(v, dict_factory))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
asdict
simply calls the above with some assertions, and dict_factory=dict
by default.
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict
:
class TypeDict(dict):
def __init__(self, t, *args, **kwargs):
super(TypeDict, self).__init__(*args, **kwargs)
if not isinstance(t, type):
raise TypeError("t must be a type")
self._type = t
@property
def type(self):
return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclass
-es:
# only use dict for now; easy to add back later
def _todict_inner(obj):
if is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _todict_inner(getattr(obj, f.name))
result.append((f.name, value))
return TypeDict(type(obj), result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
return type(obj)(*[_todict_inner(v) for v in obj])
elif isinstance(obj, (list, tuple)):
return type(obj)(_todict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_todict_inner(k), _todict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass
# thanks to Patrick Haugh
from typing import *
# deepcopy
import copy
Functions used:
# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
return is_dataclass(obj) and not is_dataclass(obj.type)
# the adapted version of asdict
def todict(obj):
if not is_dataclass_instance(obj):
raise TypeError("todict() should be called on dataclass instances")
return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict
can be re-used for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
# reconstruct the dataclass using the type tag
if is_dataclass_dict(obj):
result = {}
for name, data in obj.items():
result[name] = _fromdict_inner(data)
return obj.type(**result)
# exactly the same as before (without the tuple clause)
elif isinstance(obj, (list, tuple)):
return type(obj)(_fromdict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
return isinstance(obj, TypeDict)
def fromdict(obj):
if not is_dataclass_dict(obj):
raise TypeError("fromdict() should be called on TypeDict instances")
return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
6
TL;DR, +1 for the comprehensiveness for the answer.
– iBug
Nov 27 '18 at 11:28
2
+0: +1 for trying it, but -1 because it's basically a bad idea in the first place.
– wim
Nov 27 '18 at 16:54
1
@wim I'd agree tbh - can't see it as much more than a theoretical exercise (which at least shows thatdataclass
plays well with existing object types).
– meowgoesthedog
Nov 27 '18 at 16:57
I'm going to accept this as it's the most comprehensive answer that helps future users understand the core of the issue. I ended up with something closer to @Martijn's suggestion as I did indeed want JSON. Thank you everyone for your answers
– mbatchkarov
Dec 11 '18 at 9:56
add a comment |
Below is the CPython implementation of asdict
– or specifically, the internal recursive helper function _asdict_inner
that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
if _is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _asdict_inner(getattr(obj, f.name), dict_factory)
result.append((f.name, value))
return dict_factory(result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
# [large block of author comments]
return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
elif isinstance(obj, (list, tuple)):
# [ditto]
return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_asdict_inner(k, dict_factory),
_asdict_inner(v, dict_factory))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
asdict
simply calls the above with some assertions, and dict_factory=dict
by default.
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict
:
class TypeDict(dict):
def __init__(self, t, *args, **kwargs):
super(TypeDict, self).__init__(*args, **kwargs)
if not isinstance(t, type):
raise TypeError("t must be a type")
self._type = t
@property
def type(self):
return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclass
-es:
# only use dict for now; easy to add back later
def _todict_inner(obj):
if is_dataclass_instance(obj):
result =
for f in fields(obj):
value = _todict_inner(getattr(obj, f.name))
result.append((f.name, value))
return TypeDict(type(obj), result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
return type(obj)(*[_todict_inner(v) for v in obj])
elif isinstance(obj, (list, tuple)):
return type(obj)(_todict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_todict_inner(k), _todict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass
# thanks to Patrick Haugh
from typing import *
# deepcopy
import copy
Functions used:
# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
return is_dataclass(obj) and not is_dataclass(obj.type)
# the adapted version of asdict
def todict(obj):
if not is_dataclass_instance(obj):
raise TypeError("todict() should be called on dataclass instances")
return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict
can be re-used for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
# reconstruct the dataclass using the type tag
if is_dataclass_dict(obj):
result = {}
for name, data in obj.items():
result[name] = _fromdict_inner(data)
return obj.type(**result)
# exactly the same as before (without the tuple clause)
elif isinstance(obj, (list, tuple)):
return type(obj)(_fromdict_inner(v) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
return isinstance(obj, TypeDict)
def fromdict(obj):
if not is_dataclass_dict(obj):
raise TypeError("fromdict() should be called on TypeDict instances")
return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
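One caveat, sketched here as an illustration under my own assumptions rather than as part of the approach above: the type tag lives only on the in-memory TypeDict object, so it probably will not survive serialisation. The snippet below re-declares a TypeDict like the one above and uses a stand-in Point class (hypothetical, only there to have a type to tag with) to show what a JSON round trip leaves behind.

```python
import json


class TypeDict(dict):
    """A dict that remembers a type, mirroring the wrapper defined above."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not isinstance(t, type):
            raise TypeError("t must be a type")
        self._type = t

    @property
    def type(self):
        return self._type


class Point:
    """Stand-in class used only as a tag for this illustration."""


tagged = TypeDict(Point, {'x': 10, 'y': 20})

# json.dumps treats the dict subclass like any other dict, so the
# tag is not written out, and json.loads returns a plain dict.
round_tripped = json.loads(json.dumps(tagged))

print(tagged.type is Point)            # the tag exists in memory
print(type(round_tripped) is dict)     # the round-tripped value is a plain dict
print(hasattr(round_tripped, 'type'))  # so the tag is no longer available
```

So fromdict() can only rebuild objects from TypeDict values that never left the process, which fits the view in the comments that this is mostly a theoretical exercise.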
edited Nov 29 '18 at 15:05
answered Nov 27 '18 at 11:25
meowgoesthedog
6
TL;DR, +1 for the comprehensiveness for the answer.
– iBug
Nov 27 '18 at 11:28
2
+0: +1 for trying it, but -1 because it's basically a bad idea in the first place.
– wim
Nov 27 '18 at 16:54
1
@wim I'd agree tbh - can't see it as much more than a theoretical exercise (which at least shows that dataclass plays well with existing object types).
– meowgoesthedog
Nov 27 '18 at 16:57
I'm going to accept this as it's the most comprehensive answer that helps future users understand the core of the issue. I ended up with something closer to @Martijn's suggestion as I did indeed want JSON. Thank you everyone for your answers
– mbatchkarov
Dec 11 '18 at 9:56
I'm the author of dacite, a tool that simplifies the creation of data classes from dictionaries.
This library has only one function, from_dict. Here is a quick example of usage:
from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}
user = from_dict(data_class=User, data=data)
assert user == User(name='john', age=30, is_active=True)
Moreover, dacite supports the following features:
- nested structures
- (basic) type checking
- optional fields (i.e. typing.Optional)
- unions
- collections
- value casting and transformation
- remapping of field names
... and it's well tested - 100% code coverage!
To install dacite, simply use pip (or pipenv):
$ pip install dacite
answered Dec 10 '18 at 20:42
Konrad Hałas
You can use mashumaro to create a dataclass object from a dict according to its schema. The mixin from this library adds convenient from_dict and to_dict methods to dataclasses:
from dataclasses import dataclass
from typing import List
from mashumaro import DataClassDictMixin

@dataclass
class Point(DataClassDictMixin):
    x: int
    y: int

@dataclass
class C(DataClassDictMixin):
    mylist: List[Point]

p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p
c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c
answered Nov 28 '18 at 11:10
tikhonov_a
If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.
jsonpickle is not safe because it stores references to arbitrary Python objects and passes in data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call) and then verify that the object is an actual dataclass type, before you use it.
The framework can be made generic enough but still limited only to JSON-serialisable types plus dataclass-based instances:
import dataclasses
import importlib
import json
import sys

def dataclass_object_dump(ob):
    datacls = type(ob)
    if not dataclasses.is_dataclass(datacls):
        raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
    # only emit a reference that can be resolved again on loading
    mod = sys.modules.get(datacls.__module__)
    if mod is None or not hasattr(mod, datacls.__qualname__):
        raise ValueError(f"Can't resolve '{datacls!r}' reference")
    ref = f"{datacls.__module__}.{datacls.__qualname__}"
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}

def dataclass_object_load(d):
    ref = d.pop('__dataclass__', None)
    if ref is None:
        # a plain JSON object without a class hint
        return d
    try:
        modname, hasdot, qualname = ref.rpartition('.')
        module = importlib.import_module(modname)
        datacls = getattr(module, qualname)
        # import only; never call anything that isn't a dataclass type
        if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
            raise ValueError
        return datacls(**d)
    except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
        raise ValueError(f"Invalid dataclass reference {ref!r}") from None
This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).
Use these as the default and object_hook arguments to json.dump[s]() and json.load[s](), respectively:
>>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
{
    "mylist": [
        {
            "x": 0,
            "y": 0,
            "__dataclass__": "__main__.Point"
        },
        {
            "x": 10,
            "y": 4,
            "__dataclass__": "__main__.Point"
        }
    ],
    "__dataclass__": "__main__.C"
}
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
True
or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.
Instead of using fully qualified module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding, to ensure you don't forget to register dataclasses as you develop.
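A minimal sketch of that registry idea, under assumptions of my own: the names _REGISTRY, register, registry_dump and registry_load are hypothetical, and the bare class name is used as the tag, so two registered classes must not share a name.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List

# Hypothetical registry: tag name -> explicitly permitted dataclass.
_REGISTRY = {}


def register(cls):
    """Class decorator (assumed name) that permits a dataclass for JSON use."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    _REGISTRY[cls.__name__] = cls
    return cls


def registry_dump(ob):
    """`default` hook: refuse anything that was not registered."""
    cls = type(ob)
    if _REGISTRY.get(cls.__name__) is not cls:
        raise TypeError(f"{cls!r} is not a registered dataclass")
    result = {f.name: getattr(ob, f.name) for f in dataclasses.fields(ob)}
    result['__dataclass__'] = cls.__name__
    return result


def registry_load(d):
    """`object_hook`: only names present in the registry are instantiated."""
    name = d.pop('__dataclass__', None)
    if name is None:
        return d
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown dataclass tag {name!r}") from None
    return cls(**d)


@register
@dataclass
class Point:
    x: int
    y: int


@register
@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
text = json.dumps(c, default=registry_dump)
restored = json.loads(text, object_hook=registry_load)
assert restored == c
```

With this variant nothing is imported on loading: a tag either names a class that was registered on purpose or is rejected, and forgetting to register a class shows up as a TypeError on encoding.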
edited Nov 27 '18 at 18:23
answered Nov 27 '18 at 17:58
Martijn Pieters
The question this is marked as a duplicate of is indeed asking the same, but the answer given there does not work for this particular example. I've left a comment there and still looking for a more general answer.
– mbatchkarov
Nov 21 '18 at 9:51
Could you make that difference explicit here? It looks like you may have to add an elif to that if that checks for various hints. I'm not sure how you would generalize it to arbitrary type hints though (Dict and Tuple in addition to List, for example)
– Patrick Haugh
Nov 21 '18 at 14:40
5
asdict is losing information. It would not be possible to do this in the general case.
– wim
Nov 26 '18 at 18:44
6
Specifically, asdict doesn't store any information about what class the dict was produced from. Given class A: x: int and class B: x: int, should {'x': 5} be used to create an instance of A or B? You seem to be making the assumption that the list of attribute names uniquely defines a list, and that there is an existing mapping of names to data classes that could be used to select the correct class.
– chepner
Nov 26 '18 at 18:50
1
I would recommend you to check out this library.
– Abdul Niyas P M
Nov 28 '18 at 8:48
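The ambiguity raised in the comments by wim and chepner can be checked directly. This is only a small illustration; the classes A and B follow chepner's example, with the dataclass decorator added.

```python
from dataclasses import asdict, dataclass


@dataclass
class A:
    x: int


@dataclass
class B:
    x: int


dict_a = asdict(A(5))
dict_b = asdict(B(5))

# Both instances flatten to the same plain dict, so the dict alone
# cannot say which class it came from.
assert dict_a == dict_b == {'x': 5}

# The instances themselves are not equal: dataclass equality only
# compares objects of the same class.
assert A(5) != B(5)
```

Any way back from the dict therefore needs extra information, either a type tag stored next to the data or a target class supplied by the caller.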