This is sort of an afterthought of the MetaProgramming article.
I realized that while I teased a lot, I never really implemented anything non-trivial. So that’s what we are going to do today: we’re going to implement a dataclass-like module to help simplify our class definitions.
Say you have a Person class that has a name and an age.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
Just look at the code duplication! The words “name” and “age” appear in six places between them. Sure, this isn’t so bad, but it becomes painful to maintain when you have a dozen or so attributes.
The right way to do this is to use a dataclass.
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
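As a quick illustration of what the decorator generates (the values here are made up), the class gets an __init__, a readable repr, and field-by-field equality for free:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


p = Person("Ada", 36)          # generated __init__ accepts both fields
print(p)                       # Person(name='Ada', age=36)
print(p == Person("Ada", 36))  # True: generated __eq__ compares the fields
```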
But we are not going to do that today. Instead, we’ll try to implement our own version of dataclasses. Of course, dataclasses support lots of features, but we’re going to focus on the basics. Here’s what our dataclass will support:
Enough talking. Now let’s get to the code.
Note: Don’t expect this to be a step-by-step guide. It’s just a quick and dirty implementation that I hacked together in an hour and wanted to share.
Let’s start by taking a look at our descriptor:
from typing import Any, Callable


class NoDefault:
    ...


# Sentinel meaning "this field has no default".
NO_DEFAULT = NoDefault()


class Descriptor:
    def __init__(self, name, kind, default: Callable | NoDefault):
        self.name = name
        self.kind = kind
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself (e.g. Person.name): return the descriptor.
            return self
        if (
            self.name not in instance.__dict__
            and self.default is not NO_DEFAULT
        ):
            # Nothing stored yet: fall back to the default factory.
            return self.default()  # type: ignore
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if isinstance(value, self.kind):
            instance.__dict__[self.name] = value
            return
        raise TypeError(
            f"Expected '{self.kind.__name__}' "
            f"got '{type(value).__name__}' for {self.name}"
        )
Descriptors are out of the scope of this article, but here’s the gist for those unfamiliar with them. Descriptors allow you to (as is commonly said) “own the dot”.
Say you write obj.x. Normally, this would look up x in the object’s dictionary. But if x is a descriptor defined on the object’s class, Python calls __get__ on it instead.
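To make “owning the dot” concrete, here is a minimal sketch that is separate from the dataclass code. The names Loud and Thing are made up for illustration.

```python
class Loud:
    """A toy descriptor that reports every read of the attribute."""

    def __get__(self, instance, owner):
        print(f"__get__ called on {owner.__name__}")
        return 42


class Thing:
    x = Loud()  # the descriptor lives on the class, not on the instance


t = Thing()
value = t.x   # the dot triggers Loud.__get__(t, Thing)
print(value)  # 42
```

Nothing is ever stored on the instance here; the value comes entirely from the descriptor’s __get__.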
What our descriptor does is straightforward. Let’s take a look at its two methods.
The __get__ method checks whether the attribute is in the instance’s dictionary. If it is, the user has already set a value, so that value is returned. If it isn’t, the method checks whether a default was supplied and, if so, returns the result of calling it. With neither a stored value nor a default, the lookup raises a KeyError.
The __set__ method checks whether the value is of the correct type. If it is, it stores the value in the instance’s dictionary. If it isn’t, it raises a TypeError.
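One consequence of this __get__ is easy to miss: a default is computed on every read and never stored. The sketch below uses a trimmed-down copy of the descriptor (type checking omitted; the class names Field and Bag are invented for the example) to show what that means for a mutable default.

```python
class Field:
    """Trimmed-down copy of the article's descriptor: defaults only."""

    def __init__(self, name, default):
        self.name = name
        self.default = default  # a zero-argument callable

    def __get__(self, instance, owner):
        if self.name not in instance.__dict__:
            return self.default()  # computed on each read, never stored
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Bag:
    items = Field("items", default=list)


bag = Bag()
bag.items.append("apple")  # mutates a throwaway list
print(bag.items)           # []

bag.items = ["apple"]      # an explicit assignment is stored
print(bag.items)           # ['apple']
```

For immutable defaults such as 18 or "Hello" this makes no visible difference, which is why the examples in this article are unaffected.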
Now that we have a descriptor, we need a way to tell our code which attributes should become descriptors. We do this by creating a Type class.
class Type:
    def __init__(self, kind, **kwargs) -> None:
        self.kind = kind
        if "default" in kwargs:
            self.default = kwargs["default"]
        else:
            self.default = NO_DEFAULT
All it does is take in the kind of the attribute and an optional default. The kind can be any type. For example, to create a str attribute, we would write Type(str).
default is a function that, when called, returns the default value. For example, to create a str attribute with a default value of “Hello”, we would write Type(str, default=lambda: "Hello").
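Taking a callable rather than a plain value mirrors the default_factory argument of the standard library’s dataclasses.field. A short sketch of the standard-library behaviour, for comparison only (the Team class is invented for the example):

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    # default_factory is called once per instance, so lists are not shared
    members: list = field(default_factory=list)


a = Team()
b = Team()
a.members.append("Ada")
print(a.members)  # ['Ada']
print(b.members)  # []
```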
And now finally, we can create our metaclass!
class MetaClass(type):
    def __new__(cls, name, bases, attrs):
        new_attrs = {}
        for key, value in attrs.items():
            if isinstance(value, Type):
                new_attrs[key] = Descriptor(
                    key,
                    value.kind,
                    value.default
                )
            else:
                new_attrs[key] = value
        return super().__new__(cls, name, bases, new_attrs)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Fields are the descriptors defined directly on this class.
        params = {
            key: value
            for key, value in self.__dict__.items()
            if isinstance(value, Descriptor)
        }
        if (len(args) + len(kwargs)) > len(params):
            raise TypeError("Too many arguments")

        # Bind positional arguments to fields in definition order.
        items = iter(params.items())
        for arg in args:
            key, _ = next(items)
            if key in kwargs:
                raise TypeError(
                    f"Duplicate argument - {key} "
                    "passed both by position and by name"
                )
            kwargs[key] = arg

        # Every remaining field needs a keyword argument or a default.
        for param in items:
            key, value = param
            if key not in kwargs and value.default is NO_DEFAULT:
                raise TypeError(f"Missing argument - {key}")

        def __init__(instance, *vargs, **kvargs):
            for key, value in kwargs.items():
                setattr(instance, key, value)

        self.__init__ = __init__
        return super().__call__()
A metaclass is a class that is used to create a class. Its __new__ method must return that newly created class. In it, we loop over the attributes defined in the class body.
for key, value in attrs.items():
    if isinstance(value, Type):
        new_attrs[key] = Descriptor(
            key,
            value.kind,
            value.default
        )
    else:
        new_attrs[key] = value
If the attribute is a Type instance, we create a Descriptor instance and add it to the new class. Remember: even though descriptors belong to the class, they intercept attribute access on its instances.
If it is not a Type instance, we copy it into the new attribute dictionary unchanged.
Then, we call the superclass’s __new__ method, passing in the attributes that we created.
return super().__new__(cls, name, bases, new_attrs)
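To see what __new__ receives, here is a small standalone sketch. The metaclass Upper and the class Config are invented for the example; the metaclass rewrites the attribute dictionary before the class is created, in the same spirit as MetaClass swapping Type markers for descriptors.

```python
class Upper(type):
    """Toy metaclass: upper-cases every public attribute name."""

    def __new__(cls, name, bases, attrs):
        new_attrs = {}
        for key, value in attrs.items():
            if key.startswith("__"):
                new_attrs[key] = value          # leave dunder entries alone
            else:
                new_attrs[key.upper()] = value  # rewrite before creation
        return super().__new__(cls, name, bases, new_attrs)


class Config(metaclass=Upper):
    debug = True


print(Config.DEBUG)              # True
print(hasattr(Config, "debug"))  # False
print(type(Config) is Upper)     # True
```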
Next, we create the __call__ method. This method is called when an instance of the class is created. For example, if we do Person(), this method is called.
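A standalone sketch of that hook (the names Counting and Widget are invented): because Widget is an instance of its metaclass, calling Widget(...) goes through the metaclass’s __call__ first, and only then reaches the class’s own __new__ and __init__.

```python
class Counting(type):
    """Toy metaclass that counts how many instances each class creates."""

    def __call__(cls, *args, **kwargs):
        cls.created = getattr(cls, "created", 0) + 1
        # type.__call__ runs the class's __new__ and __init__ as usual
        return super().__call__(*args, **kwargs)


class Widget(metaclass=Counting):
    def __init__(self, label):
        self.label = label


first = Widget("a")
second = Widget("b")
print(Widget.created)  # 2
print(second.label)    # b
```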
Here, we do some basic housekeeping. First, we ensure that we have not been given too many arguments. Then, we match the positional arguments to the fields in definition order and fold them into kwargs.
params = {
    key: value
    for key, value in self.__dict__.items()
    if isinstance(value, Descriptor)
}
if (len(args) + len(kwargs)) > len(params):
    raise TypeError("Too many arguments")

items = iter(params.items())
for arg in args:
    key, _ = next(items)
    if key in kwargs:
        raise TypeError(
            f"Duplicate argument - {key} "
            "passed both by position and by name"
        )
    kwargs[key] = arg

for param in items:
    key, value = param
    if key not in kwargs and value.default is NO_DEFAULT:
        raise TypeError(f"Missing argument - {key}")
We also take care of missing and duplicate arguments. A field that has a default is allowed to be missing, so no error is raised for it.
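The standard library can do similar bookkeeping. As an alternative sketch, not what the metaclass in this article does, inspect.Signature.bind applies Python’s normal argument-binding rules to a hand-built signature. The field list is hard-coded here to mirror a class with a required name and an age that defaults to 18, and the default is a plain value rather than a callable.

```python
import inspect

# Hypothetical field list: name is required, age defaults to 18.
signature = inspect.Signature([
    inspect.Parameter("name", inspect.Parameter.POSITIONAL_OR_KEYWORD),
    inspect.Parameter("age", inspect.Parameter.POSITIONAL_OR_KEYWORD, default=18),
])

bound = signature.bind("Shashwat")
bound.apply_defaults()        # fill in age from its default
print(dict(bound.arguments))  # {'name': 'Shashwat', 'age': 18}

# Missing, duplicate, and surplus arguments all raise TypeError.
bad_calls = [((), {}), (("a",), {"name": "b"}), (("a", 1, 2), {})]
for bad_args, bad_kwargs in bad_calls:
    try:
        signature.bind(*bad_args, **bad_kwargs)
    except TypeError as exc:
        print("TypeError:", exc)
```

The hand-rolled loop keeps the article’s code dependency-free and its error messages custom; the sketch simply shows that the same three checks exist off the shelf.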
And of course, we must set the __init__ method of the class. We already have all the arguments we need; we just have to make sure that when __init__ is called, it assigns them to the instance.
def __init__(instance, *vargs, **kvargs):
    for key, value in kwargs.items():
        setattr(instance, key, value)

self.__init__ = __init__
return super().__call__()
Yes, that’s perfectly legal. You can set __init__ on a class (which in this case is self).
And that’s it! I know that was a lot of work, but I think it’s worth it.
Let’s consider the same example as before.
class Person(MyDataClass):
    name = Type(str)
    age = Type(int, default=lambda: 18)
So we have a class called Person that has a name attribute (which is a str) and an age attribute (which is an int with a default value of 18).
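MyDataClass is not defined in any of the snippets above. Presumably it is an empty base class that carries the metaclass, so that subclasses pick it up without repeating metaclass=MetaClass. A sketch of that pattern, with a stand-in metaclass so the block runs on its own:

```python
class MetaClass(type):
    """Stand-in for the article's metaclass; it only tags the class."""

    def __new__(cls, name, bases, attrs):
        attrs["built_by"] = cls.__name__
        return super().__new__(cls, name, bases, attrs)


# Assumed definition: an empty base class that carries the metaclass.
class MyDataClass(metaclass=MetaClass):
    pass


class Person(MyDataClass):  # no metaclass keyword needed here
    pass


print(type(Person) is MetaClass)  # True
print(Person.built_by)            # MetaClass
```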
Now, let’s create a new instance of Person.
>>> p = Person()
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    p = Person()
  File "/mnt/Programming/Python/lib.py", line 70, in __call__
    raise TypeError(f"Missing argument - {key}")
TypeError: Missing argument - name
>>> p = Person("Shashwat")
>>> p.name
'Shashwat'
>>> p.age
18
Looks good so far! Let’s try the type checking.
>>> p = Person("Shashwat", age="25")
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    p = Person("Shashwat", age="25")
  File "/mnt/Programming/Python/lib.py", line 77, in __call__
    return super().__call__()
  File "/mnt/Programming/Python/lib.py", line 74, in __init__
    setattr(instance, key, value)
  File "/mnt/Programming/Python/lib.py", line 26, in __set__
    raise TypeError(
TypeError: Expected 'int' got 'str' for age
We can also set the attributes after the instance has been created.
>>> p = Person("Shashwat", 20)
>>> p.age
20
>>> p.age = 10
>>> p.age
10
>>> p.age = 10.5
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    p.age = 10.5
  File "/mnt/Programming/Python/lib.py", line 26, in __set__
    raise TypeError(
TypeError: Expected 'int' got 'float' for age
And the arguments passed to the constructor are validated as well:
>>> p = Person("Shashwat", name="Another")
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    p = Person("Shashwat", name="Another")
  File "/mnt/Programming/Python/lib.py", line 62, in __call__
    raise TypeError(
TypeError: Duplicate argument - name passed both by position and by name
>>> p = Person("Shashwat", 10, 20)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    p = Person("Shashwat", 10, 20)
  File "/mnt/Programming/Python/lib.py", line 57, in __call__
    raise TypeError("Too many arguments")
TypeError: Too many arguments
That’s a lot of neat functionality for a class that is only 3 lines long.
If your head hasn’t exploded yet, you can try using inheritance to see if this works there as well!
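As a hint for that experiment: __call__ collects fields from self.__dict__, which only holds attributes defined directly on the class, so fields declared on a parent class would not be seen. One possible fix, sketched here with a stand-in Field marker and invented class names, is to walk the method resolution order:

```python
class Field:
    """Stand-in for the article's Descriptor."""


class Base:
    name = Field()


class Child(Base):
    age = Field()


def collect_fields(cls):
    """Gather Field attributes from cls and its parents, parents first."""
    fields = {}
    for klass in reversed(cls.__mro__):
        for key, value in vars(klass).items():
            if isinstance(value, Field):
                fields[key] = value
    return fields


own = [k for k, v in vars(Child).items() if isinstance(v, Field)]
print(own)                          # ['age']
print(list(collect_fields(Child)))  # ['name', 'age']
```

Going through the MRO in reverse keeps parent fields ahead of child fields, which matters because positional arguments are bound in that order.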
If you’ve never heard of metaclasses or descriptors, this might feel weird and confusing, and you might wonder why you’d want to use them at all. After all, Python has dataclasses, so why reinvent the wheel?
The answer to that is obvious: you should never create your own version of dataclasses. But the concepts we’ve discussed today are quite useful, and you’ll find these techniques used in a lot of frameworks to make things easier.
Take this example from the Flask-SQLAlchemy documentation:
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)
The reason it’s possible to define your models in such a concise manner is that the library itself is using descriptors and metaclasses to do the heavy lifting.
Using them in user code is probably something you’d never want to do, but if you’re writing your own framework, they come in incredibly handy.
Here are some things you might find helpful if you want to learn more about what we’ve discussed:
You can find the entire code for this article here.
Hope you learned something useful!