Serializing Legal Rules with Pydantic
I’ve released version 0.9 of AuthoritySpoke. In my last blog post about AuthoritySpoke, I wrote that I had decided not to migrate all its data serialization code to Pydantic. In this post, I’ll explain why I changed my mind and carried out the migration after all.
Basically, I became tired of the proliferation of messy data loading code in the AuthoritySpoke repository. That repository was the core of my “legal rule automation” project, but it was beginning to look like a cluttered workshop full of odds and ends. Every time a part of AuthoritySpoke started to look neat and coherent, I bundled it up as a separate Python package with separate documentation and moved it to a separate GitHub repository, leaving behind the messier code that didn’t quite fit together or that was hard to use.
When I created the judicial opinion download library Justopinion, I was able to choose a serializer without the burden of supporting legacy code, and Pydantic felt like the right choice, so I went with it. But then Justopinion became a dependency of AuthoritySpoke, which meant AuthoritySpoke had to import Pydantic to run. That put me on the path to adopting Pydantic for the entire AuthoritySpoke project.
The major design difference between Pydantic and Marshmallow, the serializer I previously used, is that with Pydantic the information needed to serialize objects to JSON is stored on the objects themselves, rather than in separate serializer classes. As a result, I was able to delete a lot of old code I’d written, including several whole modules, and replace it with Pydantic’s built-in functionality.
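To make that design difference concrete, here is a minimal sketch of a Pydantic model. The Party class and its fields are hypothetical, invented for illustration, and are not taken from AuthoritySpoke. The point is only that the type annotations on the class itself act as the loading and validation schema, so no separate serializer class is needed.

```python
from pydantic import BaseModel, ValidationError


class Party(BaseModel):
    """Hypothetical data class; not part of AuthoritySpoke."""

    name: str
    is_plaintiff: bool = False


# The annotations above are all Pydantic needs to load and validate raw data.
party = Party(**{"name": "Alice", "is_plaintiff": True})

# Data that does not fit the annotations is rejected when the object is created.
try:
    Party(**{"is_plaintiff": True})  # the required "name" field is missing
    rejected = False
except ValidationError:
    rejected = True
```

With a separate-schema design like Marshmallow’s, the field definitions would instead live in a second class that has to be kept in sync with the data class by hand.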
My biggest fear about the transition was that because I’d have to make changes to all the Python classes in my project that stored data, the change might introduce bugs that I wouldn’t be able to fix, and the migration to Pydantic would simply fail. But in the end I was able to migrate every feature to Pydantic, while removing both Marshmallow and its associated API documentation library Apispec from the list of dependencies that have to be installed along with AuthoritySpoke.
The new version 0.9 of AuthoritySpoke has been mainly about reducing the amount of code and improving its organization, without introducing many new features. But as a result of the Pydantic transition, nearly all AuthoritySpoke classes have newly-added .dict() and .json() methods for serializing to generic datatypes, as well as .schema() and .schema_json() methods for generating JSON Schema API documentation. These serialization methods are easier to use and understand than the alternatives that existed in the past. Overall, version 0.9 is more consistent, more maintainable, less buggy, and more suitable for larger projects.
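Here is a rough sketch of what those four methods look like on a Pydantic model. The Holding class below is a hypothetical stand-in, much simpler than any real AuthoritySpoke class, and its field names are assumptions made for the example. The method names are the Pydantic 1 ones mentioned above. In Pydantic 2 they still work but are deprecated in favor of model_dump(), model_dump_json(), and model_json_schema().

```python
import json

from pydantic import BaseModel


class Holding(BaseModel):
    """Hypothetical stand-in; real AuthoritySpoke classes have more fields."""

    summary: str
    exclusive: bool = False


holding = Holding(summary="A database of facts is not copyrightable.")

# Serialize the instance to generic Python and JSON datatypes.
as_dict = holding.dict()
as_json = holding.json()

# Generate JSON Schema documentation from the class itself.
schema = Holding.schema()
schema_json = Holding.schema_json()

# The JSON string round-trips to the same data as the dict.
round_trip = json.loads(as_json)
```

Because .schema() is derived from the same annotations used for loading, the generated documentation does not need to be maintained separately from the class definition.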