One new and exciting feature coming in Python 3.7 is the data class. A data class is a class typically containing mainly data, although there aren’t really any restrictions. It is created using the new @dataclass decorator, as follows:
Note: This code, as well as all other examples in this tutorial, will only work in Python 3.7 and above.

A data class comes with basic functionality already implemented. For instance, you can instantiate, print, and compare data class instances straight out of the box:
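A minimal sketch of such a data class and its out-of-the-box behavior, assuming a playing card with rank and suit fields (the class name DataClassCard is illustrative):

```python
from dataclasses import dataclass


@dataclass
class DataClassCard:
    # Each annotated name becomes a field and an __init__ parameter.
    rank: str
    suit: str


queen_of_hearts = DataClassCard("Q", "Hearts")
print(queen_of_hearts.rank)  # Q
print(queen_of_hearts)  # DataClassCard(rank='Q', suit='Hearts')
print(queen_of_hearts == DataClassCard("Q", "Hearts"))  # True
```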
Compare that to a regular class. A minimal regular class would look something like this:
While this is not much more code to write, you can already see signs of the boilerplate pain: rank and suit are both repeated three times simply to initialize an object. Furthermore, if you try to use this plain class, you’ll notice that the representation of the objects is not very descriptive, and for some reason a queen of hearts is not the same as a queen of hearts:
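A sketch of such a plain class and the behavior described, assuming the same rank and suit attributes (RegularCard is an illustrative name):

```python
class RegularCard:
    def __init__(self, rank, suit):
        # rank and suit each appear three times in this method alone.
        self.rank = rank
        self.suit = suit


queen_of_hearts = RegularCard("Q", "Hearts")
print(queen_of_hearts.rank)  # Q

# The default repr only shows the type and a memory address.
print(queen_of_hearts)

# Without __eq__, comparison falls back to object identity.
print(queen_of_hearts == RegularCard("Q", "Hearts"))  # False
```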
Seems like data classes are helping us out behind the scenes. By default, data classes implement a .__repr__() method to provide a nice string representation and an .__eq__() method that can do basic object comparisons. For the regular class to imitate the data class above, you need to add these methods as well:
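One way to write those two methods by hand is sketched here. It is an approximation of what a data class provides, not the code that the dataclasses module actually generates, and RegularCard is again an illustrative name:

```python
class RegularCard:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __repr__(self):
        # Show the class name and every attribute, like a data class does.
        return (f"{self.__class__.__name__}"
                f"(rank={self.rank!r}, suit={self.suit!r})")

    def __eq__(self, other):
        # Only compare with objects of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)


queen_of_hearts = RegularCard("Q", "Hearts")
print(queen_of_hearts)  # RegularCard(rank='Q', suit='Hearts')
print(queen_of_hearts == RegularCard("Q", "Hearts"))  # True
```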
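In this tutorial, you will learn exactly which conveniences data classes provide. In addition to nice representations and comparisons, you’ll see how to add default values to fields, how data classes allow objects to be ordered, how to represent immutable data, and how data classes handle inheritance.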
We will soon dive deeper into those features of data classes. However, you might be thinking that you have already seen something like this before.

Alternatives to Data Classes

For simple data structures, you have probably already used a tuple or a dict. You could represent the queen of hearts card in either of the following ways:
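It works. However, it puts a lot of responsibility on you as a programmer: you need to remember what the variable represents, the order of the attributes in the tuple version, and the exact spelling of the attribute names in the dict version.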
Furthermore, using these structures is not ideal:
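A short sketch of both representations and why they are awkward to use, assuming rank and suit as the two attributes (the variable names are illustrative):

```python
queen_of_hearts_tuple = ("Q", "Hearts")
queen_of_hearts_dict = {"rank": "Q", "suit": "Hearts"}

# Tuple access depends on remembering the position of each attribute.
print(queen_of_hearts_tuple[0])  # Q

# Dict access depends on spelling every key consistently,
# and attribute-style access (card.suit) is not available.
print(queen_of_hearts_dict["suit"])  # Hearts
```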
A better alternative is the namedtuple. It has long been used to create readable small data structures. We can in fact recreate the data class example above using a namedtuple like this:
This named tuple definition will give the exact same output as our data class example did:
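A sketch of the named tuple version, assuming the same rank and suit fields (NamedTupleCard is an illustrative name):

```python
from collections import namedtuple

NamedTupleCard = namedtuple("NamedTupleCard", ["rank", "suit"])

queen_of_hearts = NamedTupleCard("Q", "Hearts")
print(queen_of_hearts.rank)  # Q
print(queen_of_hearts)  # NamedTupleCard(rank='Q', suit='Hearts')
print(queen_of_hearts == NamedTupleCard("Q", "Hearts"))  # True
```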
So why even bother with data classes? First of all, data classes come with many more features than you have seen so far. At the same time, the namedtuple has some other features that are not necessarily desirable. By design, a namedtuple is a regular tuple. This can be seen in comparisons, for instance:
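The tuple behavior can be seen in a small sketch. NamedTupleCard and Person are illustrative names, not part of any library:

```python
from collections import namedtuple

NamedTupleCard = namedtuple("NamedTupleCard", ["rank", "suit"])
Person = namedtuple("Person", ["first_initial", "last_name"])

# A named tuple compares equal to a plain tuple with the same values.
queen_of_hearts = NamedTupleCard("Q", "Hearts")
print(queen_of_hearts == ("Q", "Hearts"))  # True

# Two unrelated named tuple classes also compare equal by value.
ace_of_spades = NamedTupleCard("A", "Spades")
print(ace_of_spades == Person("A", "Spades"))  # True
```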
While this might seem like a good thing, this lack of awareness about its own type can lead to subtle and hard-to-find bugs, especially since it will also happily compare two different namedtuple classes.

The namedtuple also comes with some restrictions. For instance, it is hard to add default values to some of the fields in a namedtuple. A namedtuple is also by nature immutable. That is, the value of a namedtuple can never change. In some applications, this is an awesome feature, but in other settings, it would be nice to have more flexibility.

Data classes will not replace all uses of namedtuple. For instance, if you need your data structure to behave like a tuple, then a named tuple is a great alternative!

Another alternative, and one of the inspirations for data classes, is the attrs project. With attrs installed (pip install attrs), you can write a card class that can be used in exactly the same way as the data class and named tuple examples earlier. The attrs project is great and does support some features that data classes do not, including converters and validators. Furthermore, attrs has been around for a while and is supported in Python 2.7 as well as Python 3.4 and up. However, as attrs is not a part of the standard library, it does add an external dependency to your projects. Through data classes, similar functionality will be available everywhere.

In addition to tuple, dict, namedtuple, and attrs, there are many other similar projects. While data classes are a great new alternative, there are still use cases where one of the older variants fits better, for instance if you need compatibility with a specific API expecting tuples or need functionality not supported in data classes.

Basic Data Classes

Let us get back to data classes. As an example, we will create a class that will represent geographic positions with a name as well as the latitude and longitude.

What makes this a data class is the @dataclass decorator just above the class definition. Beneath the class line, you simply list the fields you want in your data class.
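A sketch of such a position class. The class name Position, the field names lon and lat, and the sample coordinates are assumptions chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float
    lat: float


pos = Position("Oslo", 10.8, 59.9)
print(pos)  # Position(name='Oslo', lon=10.8, lat=59.9)
print(pos.lat)  # 59.9
```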
The name: str notation used for the fields is using a new feature in Python 3.6 called variable annotations. We will soon talk more about this notation and why we specify data types like str and float.

Those few lines of code are all you need. The new class is ready for use.

You can also create data classes similarly to how named tuples are created, using the make_dataclass() function. The result is (almost) equivalent to the class definition above.

A data class is a regular Python class. The only thing that sets it apart is that it has basic data model methods like .__init__(), .__repr__(), and .__eq__() implemented for you.

Default Values

It is easy to add default values to the fields of your data class. This works exactly as if you had specified the default values in the definition of the .__init__() method of a regular class. Later you will learn about default_factory, which gives a way to provide more complicated default values.

Type Hints

So far, we have not made a big fuss of the fact that data classes support typing out of the box. You have probably noticed that we defined the fields with a type hint: name: str says that name should be a text string (str type).

In fact, adding some kind of type hint is mandatory when defining the fields in your data class. Without a type hint, the field will not be a part of the data class. However, if you do not want to add explicit types to your data class, use typing.Any.

While you need to add type hints in some form when using data classes, these types are not enforced at runtime. Code that passes values of the wrong type runs without any problems. This is how typing in Python usually works: Python remains a dynamically typed language. To actually catch type errors, type checkers like Mypy can be run on your source code.

Adding Methods

You already know that a data class is just a regular class. That means that you can freely add your own methods to a data class. As an example, let us calculate the distance between one position and another, along the Earth’s surface.
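A sketch of such a distance method, assuming the haversine formula for great-circle distance on a spherical Earth of radius 6371 km. The class name Position, the field names, and the method name distance_to are illustrative choices:

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371  # mean Earth radius; the Earth is treated as a sphere


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

    def distance_to(self, other: "Position") -> float:
        """Return the great-circle distance to other in kilometers."""
        lam_1, lam_2 = radians(self.lon), radians(other.lon)
        phi_1, phi_2 = radians(self.lat), radians(other.lat)
        # Haversine of the central angle between the two points.
        h = (sin((phi_2 - phi_1) / 2) ** 2
             + cos(phi_1) * cos(phi_2) * sin((lam_2 - lam_1) / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


origin = Position("Origin", 0.0, 0.0)
quarter = Position("Quarter turn east", 90.0, 0.0)

# A quarter of the equator: radius * pi / 2.
print(round(origin.distance_to(quarter)))  # 10008
```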
One way to do this is by using the haversine formula. You can add a distance method to your data class just like you can with normal classes, and it works as you would expect.

More Flexible Data Classes

So far, you have seen some of the basic features of the data class: it gives you some convenience methods, and you can still add default values and other methods. Now you will learn about some more advanced features like parameters to the @dataclass decorator and the field() function. Together, they give you more control when creating a data class.

Let us return to the playing card example you saw at the beginning of the tutorial and add a class containing a deck of cards while we are at it. A simple deck containing only two cards can then be created by passing a list of two cards to the deck class.

Advanced Default Values

Say that you want to give a default value to the deck. It would for example be convenient if creating a deck without arguments produced a regular (French) deck of 52 playing cards. First, specify the different ranks and suits. Then, add a function that creates a list of card instances.

For fun, the four different suits are specified using their Unicode symbols.
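A sketch of the ranks, the suits, and the deck-building function. The names PlayingCard, RANKS, SUITS, and make_french_deck are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List

RANKS = "2 3 4 5 6 7 8 9 10 J Q K A".split()
SUITS = "♣ ♢ ♡ ♠".split()


@dataclass
class PlayingCard:
    rank: str
    suit: str


def make_french_deck() -> List[PlayingCard]:
    """Build all 52 cards, suit by suit, each suit in rank order."""
    return [PlayingCard(rank, suit) for suit in SUITS for rank in RANKS]


deck = make_french_deck()
print(len(deck))  # 52
print(deck[0])  # PlayingCard(rank='2', suit='♣')
```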
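To simplify comparisons of cards later, the ranks and suits are also listed in their usual order.

In theory, you could now use this function to specify a default value for the cards in the deck by calling it directly in the field definition. Don’t do this! This introduces one of the most common anti-patterns in Python: using mutable default arguments. The problem is that all instances of the deck would use the same list object as the default value of the cards property. This means that if, say, one card is removed from one deck, then it disappears from all other deck instances as well. Actually, data classes try to prevent you from doing this, and such code will raise a ValueError.

Instead, data classes use something called a default_factory to handle mutable default values. To use default_factory (and many other cool features of data classes), you need to use the field() specifier. The argument to default_factory can be any zero-parameter callable. Now it is easy to create a full deck of playing cards.

The field() specifier is used to customize each field of a data class individually. You will see some other examples later. For reference, these are the parameters field() supports:

- default: Default value of the field
- default_factory: Function that returns the initial value of the field
- init: Use the field in the .__init__() method? (Default is True.)
- repr: Use the field in the repr of the object? (Default is True.)
- compare: Include the field in comparisons? (Default is True.)
- hash: Include the field when calculating hash()? (Default is to use the same as for compare.)
- metadata: A mapping with information about the field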
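The default_factory approach can be sketched as follows. The names PlayingCard, Deck, SharedDeck, and make_french_deck are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import List

RANKS = "2 3 4 5 6 7 8 9 10 J Q K A".split()
SUITS = "♣ ♢ ♡ ♠".split()


@dataclass
class PlayingCard:
    rank: str
    suit: str


def make_french_deck() -> List[PlayingCard]:
    return [PlayingCard(rank, suit) for suit in SUITS for rank in RANKS]


@dataclass
class Deck:
    # The factory runs once per new Deck, so no list is shared.
    cards: List[PlayingCard] = field(default_factory=make_french_deck)


first, second = Deck(), Deck()
first.cards.pop()
print(len(first.cards), len(second.cards))  # 51 52

# Passing the list itself as a default is rejected at class creation.
try:
    @dataclass
    class SharedDeck:
        cards: List[PlayingCard] = make_french_deck()
except ValueError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```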
In the position example, you saw how to add simple default values by assigning them directly in the field definition. However, if you also want to customize the field, for instance to hide it in the repr, you need to use the default parameter of field() instead. You may not specify both default and default_factory.

The metadata parameter is not used by the data classes themselves but is available for you (or third party packages) to attach information to fields. In the position example, you could for instance specify that latitude and longitude should be given in degrees. The metadata (and other information about a field) can be retrieved using the fields() function (note the plural s).

You Need Representation?

Recall that we can create decks of cards out of thin air. While the default representation of a deck is explicit and readable, it is also very verbose. On an 80-column display, simply printing the full deck takes up 22 lines! Let us add a more concise representation. In general, a Python object has two different string representations:

- repr(obj) is defined by obj.__repr__() and should return a developer-friendly representation of obj. If possible, this should be code that can recreate obj. Data classes do this.
- str(obj) is defined by obj.__str__() and should return a user-friendly representation of obj. Data classes do not implement a .__str__() method, so Python will fall back to the .__repr__() method.
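A sketch of the two representations on a small card class, assuming a hand-written .__str__() that prints the suit followed by the rank (the PlayingCard name and this output format are illustrative choices):

```python
from dataclasses import dataclass


@dataclass
class PlayingCard:
    rank: str
    suit: str

    def __str__(self):
        # User-friendly form; the generated __repr__ is left untouched.
        return f"{self.suit}{self.rank}"


card = PlayingCard("Q", "♡")
print(repr(card))  # PlayingCard(rank='Q', suit='♡')
print(str(card))  # ♡Q
print(f"{card!r} vs {card!s}")  # PlayingCard(rank='Q', suit='♡') vs ♡Q
```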
Let us implement a user-friendly representation of a card by adding a .__str__() method to the card class. The cards now look much nicer, but the deck is still as verbose as ever.

To show that it is possible to add your own .__repr__() method as well, we will violate the principle that it should return code that can recreate an object. Practicality beats purity after all. We add a more concise representation of the deck, using the !s specifier in its format string. The specifier means that we explicitly want to use the str() representation of each card. With the new .__repr__(), the representation of the deck is easier on the eyes.

This is a nicer representation of the deck. However, it comes at a cost. You’re no longer able to recreate the deck by executing its representation. Often, you’d be better off implementing the same representation with .__str__() instead.

Comparing Cards

In many card games, cards are compared to each other. For instance in a typical trick-taking game, the highest card takes the trick. As it is currently implemented, the card class does not support this kind of comparison. This is, however, (seemingly) easy to rectify.

The @dataclass decorator has two forms. So far you have seen the simple form where @dataclass is specified without any parentheses and parameters. However, you can also give parameters to the @dataclass() decorator in parentheses. The following parameters are supported:

- init: Add .__init__() method? (Default is True.)
- repr: Add .__repr__() method? (Default is True.)
- eq: Add .__eq__() method? (Default is True.)
- order: Add ordering methods? (Default is False.)
- unsafe_hash: Force the addition of a .__hash__() method? (Default is False.)
- frozen: If True, assigning to fields raises an exception. (Default is False.)
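A sketch of both decorator forms, assuming illustrative class names UnorderedCard and PlayingCard. The plain form rejects ordering comparisons, while the parenthesized form with order=True generates them:

```python
from dataclasses import dataclass


@dataclass
class UnorderedCard:
    rank: str
    suit: str


@dataclass(order=True)
class PlayingCard:
    rank: str
    suit: str


# Without order=True, only equality is generated.
try:
    UnorderedCard("Q", "♡") > UnorderedCard("A", "♠")
except TypeError:
    ordering_supported = False
else:
    ordering_supported = True
print(ordering_supported)  # False

# With order=True, instances compare like tuples of their fields.
queen_of_hearts = PlayingCard("Q", "♡")
ace_of_spades = PlayingCard("A", "♠")
print(queen_of_hearts > ace_of_spades)  # True
```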
See PEP 557 for more information about each parameter. After setting order=True, instances of the card class can be compared.

How are the two cards compared though? You have not specified how the ordering should be done, and for some reason Python seems to believe that a Queen is higher than an Ace. It turns out that data classes compare objects as if they were tuples of their fields. In other words, a Queen is higher than an Ace because 'Q' comes after 'A' in the alphabet.

That does not really work for us. Instead, we need to define some kind of sort index that uses the order of the ranks and suits.

For the card class to use this sort index for comparisons, we need to add a sort index field to the class. However, this field should be calculated from the other fields, rank and suit, automatically. This is exactly what the special method .__post_init__() is for. It allows for special processing after the regular .__init__() method is called.

Note that the sort index is added as the first field of the class. That way, the comparison is first done using the sort index and only if there are ties are the other fields used. Using field(), you must also specify that the sort index should not be included as a parameter in the .__init__() method (because it is calculated from the rank and suit fields). To avoid confusing the user about this implementation detail, it is probably also a good idea to remove the sort index from the repr of the class.

Finally, aces are high. You can now easily create a sorted deck. Or, if you don’t care about sorting, you can draw a random hand of 10 cards. Of course, you don’t need order=True for that.

Immutable Data Classes

One of the defining features of the namedtuple you saw earlier is that it is immutable. That is, the value of its fields may never change. For many types of data classes, this is a great idea! To make a data class immutable, set frozen=True when you create it.
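A minimal sketch of a frozen data class, assuming a Position class with name, lon, and lat fields (names and sample values are illustrative). Assignment after creation raises FrozenInstanceError, which the dataclasses module defines as a subclass of AttributeError:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


pos = Position("Oslo", 10.8, 59.9)
print(pos.name)  # Oslo

try:
    pos.name = "Stockholm"
except FrozenInstanceError:
    assignment_blocked = True
else:
    assignment_blocked = False
print(assignment_blocked)  # True
```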
For example, you can make an immutable version of the position class by passing frozen=True to the decorator. In a frozen data class, you cannot assign values to the fields after creation.

Be aware though that if your data class contains mutable fields, those might still change. This is true for all nested data structures in Python. Consider a frozen card class and a frozen deck class that holds its cards in a list. Even though both classes are immutable, the list holding the cards is not. You can therefore still change the cards in the deck.

To avoid this, make sure all fields of an immutable data class use immutable types (but remember that types are not enforced at runtime). The immutable deck should be implemented using a tuple instead of a list.

Inheritance

You can subclass data classes quite freely. As an example, we will extend our position example with a country field and use it to record capitals. In this simple example, everything works without a hitch.

The country field of the subclass is added after the three original fields of the position class. Things get a little more complicated if any fields in the base class have default values. If the longitude and latitude fields of the base class have defaults but the new country field does not, the code will immediately crash with a TypeError complaining that “non-default argument ‘country’ follows default argument.” The data class will try to write an .__init__() method in which a parameter without a default follows parameters with defaults.

However, this is not valid Python. If a parameter has a default value, all following parameters must also have a default value. In other words, if a field in a base class has a default value, then all new fields added in a subclass must have default values as well.

Another thing to be aware of is how fields are ordered in a subclass. Starting with the base class, fields are ordered in the order in which they are first defined. If a field is redefined in a subclass, its order does not change. For example, if the subclass adds a country field and also redefines one of the base class fields with a new default, then the redefined field keeps its original position ahead of country. However, its default value will be the one given in the subclass.

Optimizing Data Classes

I’m going to end this tutorial with a few words about slots.
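A sketch of a slots data class next to a regular one, assuming illustrative class names SimplePosition and SlotPosition and declaring __slots__ by hand in the class body:

```python
from dataclasses import dataclass


@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float


@dataclass
class SlotPosition:
    # Only these attribute names may exist on instances.
    __slots__ = ["name", "lon", "lat"]
    name: str
    lon: float
    lat: float


simple = SimplePosition("London", -0.1, 51.5)
slot = SlotPosition("London", -0.1, 51.5)

# A slots instance has no per-instance __dict__.
print(hasattr(simple, "__dict__"))  # True
print(hasattr(slot, "__dict__"))  # False

# Attributes outside __slots__ cannot be added.
try:
    slot.country = "UK"
except AttributeError:
    extra_attribute_allowed = False
else:
    extra_attribute_allowed = True
print(extra_attribute_allowed)  # False
```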
Slots can be used to make classes faster and use less memory. In Python 3.7, data classes have no explicit syntax for working with slots, but the normal way of creating slots works for data classes as well. (They really are just regular classes!) Python 3.10 later added a slots=True parameter to the @dataclass decorator.

Essentially, slots are defined using .__slots__ to list the variables on a class. Variables or attributes not present in .__slots__ may not be defined. Furthermore, a slots class may not have default values.

The benefit of adding such restrictions is that certain optimizations may be done. For instance, slots classes take up less memory, as can be measured using Pympler. Similarly, slots classes are typically faster to work with. Measuring the speed of attribute access on a slots data class and a regular data class with timeit from the standard library shows that, in this particular example, the slots class is about 35% faster.

Conclusion & Further Reading

Data classes are one of the new features of Python 3.7. With data classes, you do not have to write boilerplate code to get proper initialization, representation, and comparisons for your objects. You have seen how to define your own data classes, as well as how to add default values to fields, customize the ordering of objects, work with immutable data classes, and use inheritance.
If you want to dive into all the details of data classes, have a look at PEP 557 as well as the discussions in the original GitHub repo. In addition, Raymond Hettinger’s PyCon 2018 talk Dataclasses: The code generator to end all code generators is well worth watching.

If you do not yet have Python 3.7, there is also a data classes backport for Python 3.6. And now, go forth and write less code!

About Geir Arne Hjelle: Geir Arne is an avid Pythonista and a member of the Real Python tutorial team.