<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://mscarey.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mscarey.github.io/" rel="alternate" type="text/html" /><updated>2025-08-26T04:28:56+00:00</updated><id>https://mscarey.github.io/feed.xml</id><title type="html">Python for Law</title><subtitle>A blog about Python libraries for working with legal datasets including legislation, caselaw, regulations, and contracts</subtitle><author><name>Matt Carey</name></author><entry><title type="html">Answer Set Programming for Legal Analysis</title><link href="https://mscarey.github.io/2025/08/25/answer-set-programming.html" rel="alternate" type="text/html" title="Answer Set Programming for Legal Analysis" /><published>2025-08-25T03:00:00+00:00</published><updated>2025-08-25T03:00:00+00:00</updated><id>https://mscarey.github.io/2025/08/25/answer-set-programming</id><content type="html" xml:base="https://mscarey.github.io/2025/08/25/answer-set-programming.html"><![CDATA[<p>This blog post was inspired by a paper presented at <a href="https://sites.northwestern.edu/icail2025/">ICAIL 2025</a> by <a href="https://josephinedik.github.io/">Josephine Dik</a> and <a href="https://rekamarkovich.github.io/#publications">Réka Markovich</a>, called “Judicial Discretion as Normative Reasoning: Deontic Characterization of Judicial Decision Making with Answer Set Programming”. It describes the use of the <a href="https://pypi.org/project/clingo/">clingo</a> Answer Set Programming (ASP) solver to weigh various considerations in a hypothetical child custody case and to prescribe a suggested exercise of discretion for the judge.</p>

<p>Unfortunately I don’t know of any way to read this year’s ICAIL papers, even through a library, other than requesting access to the preprint. But I’d still like to comment on the paper a little, because I think of answer set programming as a slightly enigmatic and under-documented part of the Python world, but still one that could be promising for legal analysis. The <a href="https://github.com/josephinedik/ICAIL2025">logic program from the paper is on GitHub</a>.</p>

<p>The answer set program in the Dik and Markovich paper defines a model in which a set of judges can each determine “each parent’s properties, whether these are relevant, if so, whether they are positive or negative, and how relevant these properties are.” The conclusions that the “judges” reach in the program are defeasible, which means the model can reach a conclusion based on incomplete information, and that adding new facts to the model can change the conclusion. The paper gives the example of reaching a conclusion on child custody in the mother’s favor, but then adding the fact “the mother is in jail” to change the outcome.</p>
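
<p>To make “defeasible” concrete, here’s a plain-Python sketch of a default rule with an exception. It isn’t ASP and it isn’t the paper’s program: the fact names and the rule itself are invented for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def custody_outcome(facts):
    """Apply a default rule that an exception can defeat.

    Hypothetical rule: custody goes to the mother if she is the primary
    caregiver, unless it is known that she is in jail.
    """
    if "mother_in_jail" in facts:
        return "father"
    if "mother_primary_caregiver" in facts:
        return "mother"
    return "undetermined"


facts = {"mother_primary_caregiver"}
print(custody_outcome(facts))  # a conclusion from incomplete information

facts.add("mother_in_jail")
print(custody_outcome(facts))  # the new fact changes the conclusion
</code></pre></div></div>

<p>In an ASP solver like clingo, I’d expect a rule of roughly this shape to be written declaratively, something like <code class="language-plaintext highlighter-rouge">custody(mother) :- caregiver(mother), not in_jail(mother).</code>, with the solver working out which conclusions survive as facts are added.</p>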

<p>The ASP system supports both strong and weak negation, represented by a minus sign and the word “not” respectively. The paper suggests that these forms of negation should be used to represent the concepts “a judge has stated that a fact is false” and “a judge has not stated that a fact is true.” Apparently, because the formal logic statements include a reference to the judge, it’s possible to use some “negation” statements to describe the point of view of one judge, while using other negation statements for other judges. I wasn’t really able to understand the paper well enough to know if this is viable or not, but it seems odd. It’s easier for me to imagine the negation operators creating statements like “it’s false that the judge has stated that the fact is true” and “it’s not known that the judge has stated that the fact is true.”</p>
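
<p>The gap between the two kinds of negation is easier for me to see in a toy example. This is only a sketch in plain Python, under the assumption that each judge’s explicit statements can be stored as two sets; the fact names are made up and nothing here comes from the paper’s code.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical record of what one judge has explicitly stated.
stated_true = {"parent_has_stable_income"}
stated_false = {"parent_has_criminal_record"}


def strongly_negated(fact):
    """Strong negation: the fact has been explicitly stated to be false."""
    return fact in stated_false


def weakly_negated(fact):
    """Weak negation: the fact has not been stated to be true."""
    return fact not in stated_true


for fact in ("parent_has_criminal_record", "parent_lives_nearby"):
    print(fact, strongly_negated(fact), weakly_negated(fact))
</code></pre></div></div>

<p>A fact that nobody has addressed, like the second one in the loop, is weakly negated without being strongly negated.</p>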

<p>I think the “judges” in the ASP system should be thought of as agents in a very abstract simulation, so the model doesn’t include a very complete depiction of the actions a judge might take in real life. But the paper does describe a way for one “judge” to be considered the “appellate judge” that “reviews” the decisions of the other judges.</p>

<p>Whether or not this model is adequate to describe what happens in real courtrooms, I think there’s a lot of potential in using answer set programming to discover possible outcomes given a set of legal “facts” or “statements”. The most accessible presentation I’ve seen about using solvers in Python (not specific to the legal domain) was Raymond Hettinger’s PyCon talk “<a href="https://www.youtube.com/watch?v=_GP9OpZPUYc">Modern solvers: Problems well-defined are problems solved</a>” (presentation at https://rhettinger.github.io/). But even that deals primarily with <a href="https://en.wikipedia.org/wiki/SAT_solver">SAT solvers</a>, not ASP solvers.</p>

<p>I’m left with a few questions:</p>

<ul>
  <li>
    <p>How can legal analysts put themselves in enough of a “logic programming” mindset to recognize when solvers are a good solution for a problem?</p>
  </li>
  <li>
    <p>Does legal procedure require so many changes of frame of reference that it can’t usefully fit into a logic programming model?</p>
  </li>
  <li>
    <p>Can we break down small parts of legal analysis tasks to be handled with solvers, while still doing most of our data modeling with traditional imperative programming (preferably in Python)?</p>
  </li>
</ul>]]></content><author><name>Matt Carey</name></author><category term="caselaw" /><summary type="html"><![CDATA[This blog post was inspired by a paper presented at ICAIL 2025 by Josephine Dik and Réka Markovich, called “Judicial Discretion as Normative Reasoning: Deontic Characterization of Judicial Decision Making with Answer Set Programming”. It describes the use of the clingo Answer Set Programming (ASP) solver to weigh various considerations in a hypothetical child custody case and to prescribe a suggested exercise of discretion for the judge.]]></summary></entry><entry><title type="html">CATLEX: Populating a Factor Flowchart Using LLMs</title><link href="https://mscarey.github.io/2025/06/29/catlex.html" rel="alternate" type="text/html" title="CATLEX: Populating a Factor Flowchart Using LLMs" /><published>2025-06-29T17:00:00+00:00</published><updated>2025-06-29T17:00:00+00:00</updated><id>https://mscarey.github.io/2025/06/29/catlex</id><content type="html" xml:base="https://mscarey.github.io/2025/06/29/catlex.html"><![CDATA[<p>At the 2025 <a href="https://sites.northwestern.edu/icail2025/accepted-papers/#fullpapers">International Conference on Artificial Intelligence and Law</a>, I was very impressed by a demonstration of <a href="https://github.com/hwestermann/CATLEX">CATLEX</a>, which includes LLM-assisted extraction of legal doctrines, and a flowchart-like view that diagrams a court’s legal reasoning process in resolving a legal issue.</p>

<p>CATLEX relies on a corpus of <a href="https://github.com/LLTLab/VetClaims-JSON">disability-claim decisions issued by the Board of Veterans’ Appeals</a>, which was assembled by the <a href="https://www.lltlab.org">Research Laboratory for Law, Logic and Technology (LLT Lab)</a> at Hofstra. CATLEX was presented by <a href="https://cris.maastrichtuniversity.nl/en/persons/hannes-westermann">Hannes Westermann</a>, and the other two authors of the conference paper were <a href="https://www.lltlab.org/author/vern-walker/">Vern Walker</a> of the LLT Lab, and <a href="https://www.cs.cmu.edu/~jsavelka/">Jaromir Savelka</a>.</p>

<p>The project depended on a description of the elements of a claim of a service-related disability, in the form of a JSON “rule tree” that I think was created entirely by hand. Then the authors used the <a href="http://pypi.org/project/litellm/">litellm</a> Python client to present LLMs, including Claude 3.5 Sonnet, with Board decisions from the LLT Lab’s collection. The LLMs were prompted to determine whether the Board of Veterans’ Appeals found each element to be present or absent in each case. The Board’s determinations could then be plotted onto a diagram showing the steps needed to establish the claim, along with the LLM’s explanations of its interpretations of the Board’s reasoning. The screenshots of the resulting Vue app are neat-looking, but I’m not posting them because the paper is still a preprint and I don’t think the authors have published the screenshots online. Probably the screenshots could be replicated by running the code in the GitHub repo linked above.</p>

<p>The model may oversimplify somewhat, in condensing the range of possible decisions on a factor to just “True” or “False”. In the example data, when the Board assumed a factor was satisfied, that was considered “True”. When the Board didn’t mention a factor, that was considered “False”.</p>
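
<p>Here’s a rough sketch of what that simplification looks like when a rule tree gets evaluated. The tree layout, the element labels, and the <code class="language-plaintext highlighter-rouge">evaluate</code> function are all my own assumptions for illustration; CATLEX’s real JSON schema and code may look quite different.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical rule tree; CATLEX's real JSON schema may differ.
rule_tree = {
    "label": "service-connected disability",
    "op": "and",
    "children": [
        {"label": "current disability"},
        {"label": "in-service event"},
        {"label": "nexus between event and disability"},
    ],
}

# Findings extracted for one decision. An element the Board assumed is
# recorded as True; an element the Board never mentioned is just missing.
findings = {"current disability": True, "in-service event": True}


def evaluate(node, findings):
    """Evaluate a node, treating unmentioned leaf elements as False."""
    children = node.get("children")
    if not children:
        return findings.get(node["label"], False)
    results = [evaluate(child, findings) for child in children]
    return all(results) if node["op"] == "and" else any(results)


print(evaluate(rule_tree, findings))
</code></pre></div></div>

<p>Because the missing element defaults to “False”, the whole claim evaluates as not established, even though the Board may simply not have discussed that element.</p>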

<p>I think CATLEX’s rule visualization by itself could be an interesting avenue for future open source tools, even if the content was created by hand. And the LLM data extraction is also exciting, since it shows how researchers are starting to break down judicial reasoning into modular parts, assigning some of those parts to AI agents, and measuring the accuracy of the results. But in this project the AI hasn’t fully taken over, since the legal analysis is still tethered to human annotations describing the structure of a very specific legal issue.</p>]]></content><author><name>Matt Carey</name></author><category term="caselaw" /><summary type="html"><![CDATA[At the 2025 International Conference on Artificial Intelligence and Law, I was very impressed by a demonstration of CATLEX, which includes LLM-assisted extraction of legal doctrines, and a flowchart-like view that diagrams a court’s legal reasoning process in resolving a legal issue.]]></summary></entry><entry><title type="html">Comparing Some Opinion Text with CollateX</title><link href="https://mscarey.github.io/2024/11/17/collatex.html" rel="alternate" type="text/html" title="Comparing Some Opinion Text with CollateX" /><published>2024-11-17T20:00:00+00:00</published><updated>2024-11-17T20:00:00+00:00</updated><id>https://mscarey.github.io/2024/11/17/collatex</id><content type="html" xml:base="https://mscarey.github.io/2024/11/17/collatex.html"><![CDATA[<p>I tried out the <a href="https://interedition.github.io/collatex/pythonport.html">CollateX Python library</a> to see if it seems useful for visualizing similar text passages about legal doctrines, especially caselaw. I used a very simple example dataset consisting of a holding from <a href="https://scholar.google.com/scholar_case?case=6039670076559479890">Loper Bright Enterprises v. 
Raimondo</a> (the opinion that abolished <em>Chevron</em> deference) and two passages from later opinions restating the holding of <em>Raimondo</em> (<a href="https://scholar.google.com/scholar_case?case=18318116451583080884">Tennessee v. Becerra</a> and <a href="https://scholar.google.com/scholar_case?case=2905254164667079068">United States v. Trumbull</a>). You can find the exact code I used in <a href="https://github.com/mscarey/collatex-example/blob/main/example.ipynb">this Jupyter Notebook</a>.</p>

<p>I thought CollateX’s out-of-the-box “html” mode visualization was potentially useful. In this case it might help readers notice that while the Supreme Court said a certain “argument” was not a basis for overruling a holding, one of the lower courts restated this to assert that an “error” was not a basis for overruling.</p>

<p>Here’s the table CollateX produced to compare the three passages.</p>

<table>
    <thead>
        <tr>
            <th>Raimondo</th>
            <th>Trumbull</th>
            <th>Becerra</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>The</td>
            <td>The</td>
            <td>The</td>
        </tr>
        <tr>
            <td>holdings of those<br />cases that specific<br />agency actions are<br />lawful—including the<br />Clean Air Act<br />holding of</td>
            <td>-</td>
            <td>Supreme</td>
        </tr>
        <tr>
            <td>-</td>
            <td>Court</td>
            <td>Court</td>
        </tr>
        <tr>
            <td>-</td>
            <td>acknowledged its &quot;</td>
            <td>cautioned litigants<br />hoping to rehash or<br />relitigate<br />previously settled<br />issues decided on</td>
        </tr>
        <tr>
            <td>Chevron</td>
            <td>-</td>
            <td>Chevron</td>
        </tr>
        <tr>
            <td>itself—are still<br />subject to statutory<br />stare decisis<br />despite our</td>
            <td>-</td>
            <td>that &quot;[m]ere</td>
        </tr>
        <tr>
            <td>change in<br />interpretive<br />methodology</td>
            <td>change in<br />interpretive<br />methodology</td>
            <td>-</td>
        </tr>
        <tr>
            <td>. Mere</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>reliance on Chevron<br />cannot constitute a</td>
            <td>-</td>
            <td>reliance on Chevron<br />cannot constitute a</td>
        </tr>
        <tr>
            <td>&quot;</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>special<br />justification</td>
            <td>-</td>
            <td>special<br />justification</td>
        </tr>
        <tr>
            <td>&quot;</td>
            <td>&quot;</td>
            <td>-</td>
        </tr>
        <tr>
            <td>for overruling such<br />a holding</td>
            <td>meant that these<br />precedents were<br />&quot;wrongly decided,&quot;<br />but explained</td>
            <td>for overruling such<br />a holding</td>
        </tr>
        <tr>
            <td>, because to say a<br />precedent relied on<br />Chevron is</td>
            <td>-</td>
            <td>.&quot; To argue as much,<br />the Court continued</td>
        </tr>
        <tr>
            <td>,</td>
            <td>-</td>
            <td>,</td>
        </tr>
        <tr>
            <td>-</td>
            <td>-</td>
            <td>would &quot;</td>
        </tr>
        <tr>
            <td>at best</td>
            <td>-</td>
            <td>at best</td>
        </tr>
        <tr>
            <td>, &quot;</td>
            <td>-</td>
            <td>,&quot; be &quot;</td>
        </tr>
        <tr>
            <td>just an argument</td>
            <td>-</td>
            <td>just an argument</td>
        </tr>
        <tr>
            <td>that</td>
            <td>that</td>
            <td>that</td>
        </tr>
        <tr>
            <td>the precedent was<br />wrongly decided</td>
            <td>mere error</td>
            <td>the precedent was<br />wrongly decided</td>
        </tr>
        <tr>
            <td>-</td>
            <td>-</td>
            <td>.&quot; Id</td>
        </tr>
        <tr>
            <td>.</td>
            <td>-</td>
            <td>.</td>
        </tr>
        <tr>
            <td>That</td>
            <td>-</td>
            <td>And this, as the<br />majority concluded,</td>
        </tr>
        <tr>
            <td>is</td>
            <td>is</td>
            <td>is</td>
        </tr>
        <tr>
            <td>-</td>
            <td>&quot;</td>
            <td>&quot;</td>
        </tr>
        <tr>
            <td>not enough to<br />justify overruling a<br />statutory precedent</td>
            <td>not enough to<br />justify overruling a<br />statutory precedent</td>
            <td>not enough to<br />justify overruling a<br />statutory precedent</td>
        </tr>
        <tr>
            <td>.</td>
            <td>.&quot;</td>
            <td>.&quot;</td>
        </tr>
    </tbody>
</table>

<p>I thought the “svg” and “svg_simple” modes were not very readable, because they were a little too chaotic and cluttered with arrows and labels. I didn’t find any option for the svg charts to be vertically oriented, so even with this relatively short text I got an extremely wide image, which Jupyter and Jekyll both wanted to shrink down to an unreadable size.</p>

<p><img src="/assets/img/2024-11-17-Digraph.gv.svg" alt="Left-to-right flowchart comparing the three opinion snippets" /></p>

<p>(CollateX also has an “html2” mode that uses eyeball-melting background colors to “draw your attention” to cells containing unique text.)</p>

<p>CollateX has a “fuzzy” matching mode, where similar strings can be shown side-by-side in the visualization. However, the fuzzy mode is only available when CollateX is set to put only a single word or “token” in each cell of the table, which seems like it would be far less readable. There are also JSON and XML output modes that could be used to export the data to alternate visualization tools.</p>
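
<p>To get a feel for what a token-level alignment involves, here’s a small sketch that uses the standard library’s <code class="language-plaintext highlighter-rouge">difflib</code> instead of CollateX. It’s not CollateX’s algorithm, it only handles two texts, and the token lists are simplified from the passages in the table above with the punctuation dropped.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from difflib import SequenceMatcher

# Simplified tokens from two of the passages (punctuation removed).
raimondo = (
    "that the precedent was wrongly decided That is not enough "
    "to justify overruling a statutory precedent"
).split()
trumbull = (
    "that mere error is not enough "
    "to justify overruling a statutory precedent"
).split()

matcher = SequenceMatcher(a=raimondo, b=trumbull, autojunk=False)
opcodes = matcher.get_opcodes()

for tag, a1, a2, b1, b2 in opcodes:
    # tag is "equal", "replace", "delete", or "insert"
    left = " ".join(raimondo[a1:a2]) or "-"
    right = " ".join(trumbull[b1:b2]) or "-"
    print(tag, "|", left, "|", right)
</code></pre></div></div>

<p>Each opcode plays the role of one row in a collation table: matching runs of tokens line up, and the “replace” row is where the wording diverges.</p>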

<p>The primary use case of CollateX is comparing alternate versions of the same work, which is not exactly what I did here. While a later court opinion might choose to restate a significant passage from an earlier opinion, it might also insert commentary or quote parts of the earlier opinion out of order. If the two texts aren’t intended to be the same, then it may not be useful for a collation to show that there are lots of differences. To determine whether it’s appropriate to show a text collation to a user, a publisher would need a reliable way of detecting situations where a later court is trying to restate a holding from an earlier opinion, but not using the exact same words.</p>

<p>CollateX is available under GNU General Public License v3.0.</p>]]></content><author><name>Matt Carey</name></author><category term="caselaw" /><category term="collation" /><summary type="html"><![CDATA[I tried out the CollateX Python library to see if it seems useful for visualizing similar text passages about legal doctrines, especially caselaw. I used a very simple example dataset consisting of a holding from Loper Bright Enterprises v. Raimondo (the opinion that abolished Chevron deference) and two passages from later opinions restating the holding of Raimondo (Tennessee v. Becerra and United States v. Trumbull). You can find the exact code I used in this Jupyter Notebook.]]></summary></entry><entry><title type="html">A Python Package for Legal Case Based Reasoning</title><link href="https://mscarey.github.io/2023/07/02/legal-case-based-reasoning.html" rel="alternate" type="text/html" title="A Python Package for Legal Case Based Reasoning" /><published>2023-07-02T20:00:00+00:00</published><updated>2023-07-02T20:00:00+00:00</updated><id>https://mscarey.github.io/2023/07/02/legal-case-based-reasoning</id><content type="html" xml:base="https://mscarey.github.io/2023/07/02/legal-case-based-reasoning.html"><![CDATA[<p>In June I was lucky enough to be sent to Braga, Portugal to represent <a href="https://www.law.cornell.edu/">Cornell Legal Information Institute</a> at the <a href="https://icail2023.di.uminho.pt/">19th ICAIL conference</a> (International Conference on Artificial Intelligence and Law). The core of this conference is an academic community rooted in knowledge-heavy AI approaches, many of them with lineage extending back at least to the 1980s. I’ve been reading work by many of these researchers for many years, so it was great to meet them in person and hear what they’ve been up to lately.</p>

<p>Only a few of the research projects presented had code available to run in Python. Out of those that did, I think the standout was a <a href="https://git.science.uu.nl/D.Odekerken/lcbr/">Legal Case-Based Reasoning</a> system presented by <a href="https://webspace.science.uu.nl/~3827887/">Daphne Odekerken</a> of Utrecht University. The paper was coauthored with <a href="https://www.florisbex.com/">Floris Bex</a> and <a href="https://webspace.science.uu.nl/~prakk101/publications.html">Henry Prakken</a>, and it also cited inspiration from formal logic models of precedent developed by <a href="http://www.horty.umiacs.io/publications.html">John Horty</a>.</p>

<p>The Python package, which is just named with the initials LCBR, lets users define a legal issue and a list of factors that contribute to determining the issue’s outcome. A factor’s value can be set as a boolean, or as a number in a range. For instance, the demo application is about whether a retail sales website should be investigated as fraudulent. Boolean factors include whether the site has a terms and conditions page, and whether it has a non-functioning payment link. Numeric factors include the number of days the page has been online. If I understand right, setting up a user application with the package requires identifying the full list of factors that can be used to decide the legal issue, and also requires labeling each of the factors as “pro” or “contra”. For instance, the number of days online would be a “contra” factor where the website becomes less suspicious the longer it’s been online. I don’t think there’s any way to say that a “pro” factor can become a “contra” factor in the presence of specific other factors, so the system would only work when every factor has a certain <strong>polarity</strong> that never changes.</p>

<p>One great feature of LCBR is that it might not require you to have all the possible information about a particular case before you can get an answer about the outcome. If there’s no possible factor that will <strong>distinguish</strong> your case from other cases that reached a certain outcome, then the system can reach a <strong>stable</strong> conclusion that the same outcome will be reached even if new information is added later. For example, if there’s no way the facts of the current case can turn out to be more favorable to the defendant on any dimension compared to the facts of a prior case that reached a decision against a defendant, then the system reasons that the current case will go against the defendant as well. As that example suggests, LCBR does assume that all the cases in the database are consistent with one another. If not enough factors have been determined to reach a stable conclusion, LCBR also has an algorithm to determine which other factors are most relevant to decide the issue. This seems similar to the function that <a href="https://docassemble.org/">Docassemble</a> uses to determine which question to present to the user next during a guided interview.</p>
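
<p>Here’s how I picture that reasoning, as a heavily simplified sketch. None of these names come from the LCBR package, and the real algorithm is more general; I’m assuming two made-up factors with fixed polarities and known value ranges, with booleans stored as 0 or 1.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from operator import ge, le

# Hypothetical factors for the fraud example. A "pro" factor favors an
# investigation more as it increases; a "contra" factor favors it less.
POLARITY = {"broken_payment_link": "pro", "days_online": "contra"}
# Assumed range of possible values for each factor.
RANGE = {"broken_payment_link": (0, 1), "days_online": (0, 3650)}


@dataclass
class Precedent:
    facts: dict
    outcome: str  # "investigate" or "dismiss"


def at_least_as_strong(factor, value, precedent_value):
    """True if value favors investigating as much as precedent_value does."""
    compare = ge if POLARITY[factor] == "pro" else le
    return compare(value, precedent_value)


def weakest_completion(known):
    """Fill each unknown factor with its least investigation-friendly value."""
    filled = dict(known)
    for factor, (low, high) in RANGE.items():
        if factor not in filled:
            filled[factor] = low if POLARITY[factor] == "pro" else high
    return filled


def stably_investigate(known, precedent):
    """True if even the weakest completion of the known facts is at least
    as strong as a precedent that ended in an investigation."""
    worst = weakest_completion(known)
    return precedent.outcome == "investigate" and all(
        at_least_as_strong(f, worst[f], precedent.facts[f]) for f in POLARITY
    )


precedent = Precedent({"broken_payment_link": 1, "days_online": 30}, "investigate")
print(stably_investigate({"broken_payment_link": 1, "days_online": 10}, precedent))
print(stably_investigate({"broken_payment_link": 1}, precedent))
</code></pre></div></div>

<p>The first call can reach a stable conclusion because every factor is known and at least as strong as in the precedent. The second can’t, because the site might turn out to have been online for years. Checking only the weakest completion works here because each factor’s polarity never changes.</p>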

<p>The LCBR package has a UI that can be spun up using Flask and <a href="https://dash.plotly.com/">Dash</a>, but I wouldn’t exactly call it user-friendly because it seems to assume the user has a lot of knowledge about the theoretical concepts that inform the case-based reasoning algorithm. However, a version of the same tool was used to create a <a href="https://aangifte.politie.nl/iaai-preintake/#/">user-facing intake system</a> for the National Police AI Lab of the Netherlands.</p>

<p>I’d love to see a system like LCBR expanded beyond its current limitations. When creating a model of a legal issue, I’d prefer not to have to list all factors that can bear on a determination in advance, because sometimes newly-added cases will also identify new factors. I’d also prefer not to have to identify the polarity of every factor in advance. Even the <a href="https://git.science.uu.nl/D.Odekerken/lcbr/-/blob/master/ExtendedPaperWithProofs.pdf">paper introducing LCBR</a> concedes that the assumption of a consistent case base is “quite a strong assumption.” Instead of making that assumption, I’d like to use a system that can provide rules of precedent that start with an inconsistent set of cases, and then show how certain cases can be overruled or disregarded until all the cases that remain are consistent. (But when I suggest features like that, I’m thinking of what would be useful for simulating common law jurisprudence, which probably wouldn’t meet the needs of government agencies in the Netherlands.) And finally, instead of having models of “cases” covering only one legal issue, it would be nice to model a collection of rules showing ways to reach multiple legal conclusions, some of which are factors potentially supporting further conclusions. That would allow the system to evolve beyond simple one-step legal determinations such as whether to open a fraud investigation, so that the system could model more complex processes such as litigation.</p>]]></content><author><name>Matt Carey</name></author><category term="reasoning" /><category term="explainability" /><category term="ICAIL" /><summary type="html"><![CDATA[In June I was lucky enough to be sent to Braga, Portugal to represent Cornell Legal Information Institute at the 19th ICAIL conference (International Conference on Artificial Intelligence and Law). 
The core of this conference is an academic community rooted in knowledge-heavy AI approaches, many of them with lineage extending back at least to the 1980s. I’ve been reading work by many of these researchers for many years, so it was great to meet them in person and hear what they’ve been up to lately.]]></summary></entry><entry><title type="html">Serializing Legal Rules with Pydantic</title><link href="https://mscarey.github.io/2021/10/27/serializing-legal-rules-with-pydantic.html" rel="alternate" type="text/html" title="Serializing Legal Rules with Pydantic" /><published>2021-10-27T14:00:00+00:00</published><updated>2021-10-27T14:00:00+00:00</updated><id>https://mscarey.github.io/2021/10/27/serializing-legal-rules-with-pydantic</id><content type="html" xml:base="https://mscarey.github.io/2021/10/27/serializing-legal-rules-with-pydantic.html"><![CDATA[<p>I’ve released <a href="https://pypi.org/project/AuthoritySpoke/">version 0.9 of AuthoritySpoke</a>. In <a href="/2021/05/24/design-notes-authorityspoke-schemas.html">my last blog post about AuthoritySpoke</a>, I wrote that I had decided not to migrate all its data serialization code to <a href="https://pydantic-docs.helpmanual.io/">Pydantic</a>. In this post, I’ll explain why I changed my mind and did just that.</p>

<p>Basically, I became tired of the proliferation of messy data loading code in <a href="https://github.com/mscarey/AuthoritySpoke">the AuthoritySpoke repository</a>. That repository was the core of my “legal rule automation” project, but it was beginning to look like a cluttered workshop full of odds and ends. Every time a part of AuthoritySpoke started to look neat and coherent, I bundled it up as a separate Python package with separate documentation and moved it to a separate GitHub repository, leaving behind the messier code that didn’t quite fit together or that was hard to use.</p>

<p>When I created the judicial opinion download library <a href="https://github.com/mscarey/justopinion">Justopinion</a>, I was able to choose a serializer without the burden of supporting legacy code, and Pydantic felt like the right choice, so I went with it. But then Justopinion became a dependency of AuthoritySpoke, which meant AuthoritySpoke had to import Pydantic to run. That put me on the path to adopting Pydantic for the entire AuthoritySpoke project.</p>

<p>The major design difference between Pydantic and the serializer I previously used, Marshmallow, is that with Pydantic the information needed to serialize objects to JSON is stored on the objects themselves, rather than in separate serializer classes. The result was that I was also able to delete a lot of old code I’d written, including several whole modules, and replace it with Pydantic’s built-in functionality.</p>

<p>My biggest fear about the transition was that because I’d have to make changes to all the Python classes in my project that stored data, the change might introduce bugs that I wouldn’t be able to fix, and the migration to Pydantic would simply fail. But in the end I was able to migrate every feature to Pydantic, while removing both Marshmallow and its associated API documentation library Apispec from the list of dependencies that have to be imported when AuthoritySpoke is installed.</p>

<p>The new version 0.9 of AuthoritySpoke has been mainly about reducing the amount of code and improving its organization, without introducing many new features. But as a result of the Pydantic transition, nearly all AuthoritySpoke classes have newly-added <code class="language-plaintext highlighter-rouge">.dict()</code> and <code class="language-plaintext highlighter-rouge">.json()</code> methods <a href="https://pydantic-docs.helpmanual.io/usage/exporting_models/">for serializing to generic datatypes</a>, as well as <code class="language-plaintext highlighter-rouge">.schema()</code> and <code class="language-plaintext highlighter-rouge">.schema_json()</code> methods <a href="https://pydantic-docs.helpmanual.io/usage/schema/">for generating JSON Schema API documentation</a>. These serialization methods are easier to use and understand than the alternatives that existed in the past. Overall, version 0.9 is more consistent, more maintainable, less buggy, and more suitable for larger projects.</p>]]></content><author><name>Matt Carey</name></author><category term="AuthoritySpoke" /><category term="APIs" /><summary type="html"><![CDATA[I’ve released version 0.9 of AuthoritySpoke. In my last blog post about AuthoritySpoke, I wrote that I had decided not to migrate all its data serialization code to Pydantic. 
In this post, I’ll explain why I changed my mind and did just that.]]></summary></entry><entry><title type="html">Using the Caselaw Access Project API</title><link href="https://mscarey.github.io/2021/08/15/caselaw-access-project-api.html" rel="alternate" type="text/html" title="Using the Caselaw Access Project API" /><published>2021-08-15T06:00:00+00:00</published><updated>2021-08-15T06:00:00+00:00</updated><id>https://mscarey.github.io/2021/08/15/caselaw-access-project-api</id><content type="html" xml:base="https://mscarey.github.io/2021/08/15/caselaw-access-project-api.html"><![CDATA[<p>The <a href="https://case.law/">Caselaw Access Project</a> is one of the two best resources for free programmatic access to American caselaw data (along with <a href="https://www.courtlistener.com/">CourtListener</a>). It has a great, user-friendly website, and thoughtful documentation aimed at several different audiences. And it has a more dramatic story than most legal tech projects, in which archivists at Harvard’s law library cut the spines off of every book in an exhaustive law library collection, digitally scanned them all, but subjected the resulting archive to access restrictions for seven years after the end of <a href="http://www.nytimes.com/2015/10/29/us/harvard-law-library-sacrifices-a-trove-for-the-sake-of-a-free-database.html">the scanning project</a>. (Beware of clicking that link, if like me you jealously guard your monthly allocation of free New York Times articles.)</p>

<p>In the years since the API launched, it’s become significantly more useful with the addition of <a href="https://lil.law.harvard.edu/blog/2020/04/22/caselaw-access-project-citation-graph/">citation graph data</a>. But it’s also important to recognize the <a href="https://case.law/about/#scope-limits">limits on the API’s scope</a>: it only includes cases published in print, and only cases published in bound volumes through 2018, when the scanning project took place. The API also limits public users to 500 API calls per day for most jurisdictions.</p>

<p>I created a Python module called <a href="https://justopinion.readthedocs.io/en/latest/">Justopinion</a> with a few utility functions for getting opinions from the Caselaw Access Project API. It’s mostly designed around the use case of downloading a judicial decision with a known citation, getting the text of the opinions in the case, and then downloading any other decisions cited within those opinions.</p>

<p>Here’s an example from Justopinion’s <a href="https://justopinion.readthedocs.io/en/latest/guides/getting_started.html">getting started guide</a> that roughly follows that workflow:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">justopinion</span> <span class="kn">import</span> <span class="n">CAPClient</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">CAPClient</span><span class="p">(</span><span class="n">api_token</span><span class="o">=</span><span class="n">CAP_API_KEY</span><span class="p">)</span>
<span class="n">thornton</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">read_cite</span><span class="p">(</span><span class="s">"1 Breese 34"</span><span class="p">,</span> <span class="n">full_case</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<p>The text that gets passed to the <code class="language-plaintext highlighter-rouge">CAPClient.read_cite</code> method (such as “1 Breese 34”) can be normalized as a recognizable citation <a href="/2021/05/12/trying-out-eyecite.html">thanks to the Eyecite package</a> from the Free Law Project.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>thornton.casebody.data.parties[0]
'John Thornton and others, Appellants, v. George Smiley and John Bradshaw, Appellees.'
</code></pre></div></div>

<p>The case is loaded as a <a href="https://pydantic-docs.helpmanual.io/">Pydantic</a> model, so any static analysis tools you use on your Python code should understand the data types for each field. The <a href="https://case.law/docs/learning_tracks/APIs/in_depth">case.law API documentation</a> describes what you should expect the API to deliver.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">len</span><span class="p">(</span><span class="n">thornton</span><span class="p">.</span><span class="n">cites_to</span><span class="p">)</span>
<span class="mi">1</span>
<span class="nb">str</span><span class="p">(</span><span class="n">thornton</span><span class="p">.</span><span class="n">cites_to</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="s">'Citation to 15 Ill., 284'</span>
</code></pre></div></div>

<p>We can see that <em>Thornton v. Smiley</em> cites to only one other case. By passing the citation to the <code class="language-plaintext highlighter-rouge">CAPClient.read_cite</code> method, we can download JSON representing the cited decision and turn it into another instance of the <code class="language-plaintext highlighter-rouge">Decision</code> class.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cited</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">read_cite</span><span class="p">(</span><span class="n">thornton</span><span class="p">.</span><span class="n">cites_to</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">full_case</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="nb">str</span><span class="p">(</span><span class="n">cited</span><span class="p">)</span>
<span class="s">'Marsh v. People, 15 Ill. 284 (1853-12-01)'</span>
</code></pre></div></div>

<p>We can also locate text within an opinion we downloaded, and generate an <a href="https://anchorpoint.readthedocs.io/en/latest/">Anchorpoint</a> selector to reference a passage from the opinion.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">thornton</span><span class="p">.</span><span class="n">opinions</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">locate_text</span><span class="p">(</span><span class="s">"The court knows of no power in the administrator"</span><span class="p">)</span>
<span class="n">TextPositionSet</span><span class="p">{</span><span class="n">TextPositionSelector</span><span class="p">[</span><span class="mi">22</span><span class="p">,</span> <span class="mi">70</span><span class="p">)}</span>
</code></pre></div></div>
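
<p>The half-open interval in that output can be read as a pair of character offsets into the opinion text. As a rough illustration only (this is not how Anchorpoint or Justopinion implement it, and <code>locate_span</code> is a hypothetical helper invented for this sketch), a similar start/end pair can be computed with plain string search. The sample text is made up, so the offsets differ from the real opinion.</p>

```python
from typing import Optional, Tuple


def locate_span(text: str, passage: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first match of passage, or None.

    The interval is half-open, so text[start:end] == passage.
    """
    start = text.find(passage)
    if start == -1:
        return None
    return start, start + len(passage)


# Invented sample text, not the real opinion in Thornton v. Smiley
opinion_text = "Per curiam. The court knows of no power in the administrator to act."
passage = "The court knows of no power in the administrator"
span = locate_span(opinion_text, passage)
```

<p>Because the end offset is exclusive, slicing the text with the two numbers gives back exactly the quoted passage.</p>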

<p>Of course, Justopinion isn’t necessary for accessing the Caselaw Access Project API from Python. The API’s documentation gives this example of downloading a case using <a href="https://docs.python-requests.org/en/master/user/quickstart/">requests</a>, which is more flexible but may require writing more code in some situations.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
    <span class="s">'https://api.case.law/v1/cases/435800/?full_case=true'</span><span class="p">,</span>
    <span class="n">headers</span><span class="o">=</span><span class="p">{</span><span class="s">'Authorization'</span><span class="p">:</span> <span class="s">'Token abcd12345'</span><span class="p">}</span>
<span class="p">)</span>
</code></pre></div></div>
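
<p>If you go the do-it-yourself route, it can help to keep the URL and header construction in one small function. The following is only a sketch: it uses the standard library’s <code>urllib</code> in place of requests so that it has no dependencies, <code>build_case_request</code> is a hypothetical helper, and reading the token from an environment variable named <code>CAP_API_KEY</code> is my assumption rather than anything the API requires. The URL pattern is simply copied from the example above.</p>

```python
import os
import urllib.request

API_ROOT = "https://api.case.law/v1"


def build_case_request(case_id: int, token: str, full_case: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a request for a single case."""
    url = f"{API_ROOT}/cases/{case_id}/"
    if full_case:
        url += "?full_case=true"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


# Read the token from the environment instead of hard-coding it;
# the fallback is the placeholder token from the example above.
token = os.environ.get("CAP_API_KEY", "abcd12345")
request = build_case_request(435800, token)
# urllib.request.urlopen(request) would actually send it; omitted here
# so the sketch does not use up any of the daily API calls.
```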

<p>Justopinion originated as part of my other Python library AuthoritySpoke, and as of <a href="https://authorityspoke.readthedocs.io/en/latest/">AuthoritySpoke version 0.8</a>, Justopinion is a dependency that gets imported as part of AuthoritySpoke’s setup process. Justopinion is still in an early state, and there are lots of features that could still be added. I decided to use the generic name Justopinion instead of naming the package after the CAP API because I’m considering also adding support for the CourtListener API, and possibly some use cases that don’t depend on an API. If you have any comments or requests about Justopinion, please post them at <a href="https://github.com/mscarey/justopinion">its GitHub repo</a>.</p>]]></content><author><name>Matt Carey</name></author><category term="caselaw" /><category term="APIs" /><category term="Justopinion" /><summary type="html"><![CDATA[The Caselaw Access Project is one of the two best resources for free programmatic access to American caselaw data (along with CourtListener). It has a great, user-friendly website, and thoughtful documentation aimed at several different audiences. And it has a more dramatic story than most legal tech projects, in which archivists at Harvard’s law library cut the spines off of every book in an exhaustive law library collection, digitally scanned them all, but subjected the resulting archive to access restrictions for seven years from the end of the scanning project. 
(Beware of clicking that link, if like me you jealously guard your monthly allocation of free New York Times articles.)]]></summary></entry><entry><title type="html">Docassemble-L4: Deep and Meticulous Abstractions for Legal Rules</title><link href="https://mscarey.github.io/2021/06/08/docassemble-l4.html" rel="alternate" type="text/html" title="Docassemble-L4: Deep and Meticulous Abstractions for Legal Rules" /><published>2021-06-08T14:00:00+00:00</published><updated>2021-06-08T14:00:00+00:00</updated><id>https://mscarey.github.io/2021/06/08/docassemble-l4</id><content type="html" xml:base="https://mscarey.github.io/2021/06/08/docassemble-l4.html"><![CDATA[<p>Creating a data schema for legal analysis involves plunging into abstraction. How deeply abstract the schema becomes probably depends more than we want to admit on the temperament of the person creating the schema. The more abstraction, the more powerful and expressive the schema can be, but also the greater the risk the schema will crumple under the pressure of the analyst’s assumptions or unprovable metaphysical beliefs. At the <a href="https://cclaw.smu.edu.sg/">Singapore Management University Center for Computational Law</a>, Principal Investigator Meng Weng Wong, <a href="https://medium.com/computational-law-diary/introducing-l4-docassemble-69ce4b1fb1e7">Jason Morris</a>, and the rest of their team went extremely deep to create an extension to <a href="https://docassemble.org">Docassemble</a> called <a href="https://github.com/smucclaw/docassemble-l4">Docassemble-L4</a>. I salute their bravery.</p>

<p>Like most Docassemble extensions, Docassemble-L4 uses Python code to create an interactive interview to help apply a legal standard to a user’s fact pattern. What makes Docassemble-L4 unique is that it lets you create that Python code by translating it automatically from a different language called <a href="https://gitlab.software.imdea.org/ciao-lang/sCASP">s(CASP)</a>, which appears to be closely related to Prolog. The intended way to get that s(CASP) code is by translating it automatically from <a href="https://github.com/smucclaw/baby-l4">L4</a>. I understand L4 is a programming language that <a href="https://github.com/smucclaw/dsl">hasn’t been released yet</a>, but it will leverage the <a href="https://github.com/Z3Prover/z3">Z3</a> theorem prover. To use Docassemble-L4, you must provide not just the s(CASP) code, but also a YAML document in a special format called <a href="https://github.com/smucclaw/docassemble-l4/blob/main/docassemble/l4/data/static/r34.yml">LExSIS</a> (no relation to the legal publisher), which tells Docassemble how to create an interview interface to elicit information relevant to the declarative logic statements in the s(CASP) code.</p>

<p>And the data input syntax is verbose. It suffers from a degree of <a href="https://medium.com/computational-law-diary/legal-drafting-to-avoid-computational-complexity-115a55818493">combinatorial explosion</a> since s(CASP)’s inference rules don’t seem to have a proper “or” syntax, as shown in this <a href="https://github.com/smucclaw/docassemble-l4/blob/main/docassemble/l4/data/static/r34.pl">excerpt from the example data</a>:</p>

<div class="language-prolog highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">company</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">corporation</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">partnership</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">llp</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">soleprop</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business_trust</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
<span class="ss">business_entity</span><span class="p">(</span><span class="nv">X</span><span class="p">)</span> <span class="p">:-</span> <span class="ss">carries_on</span><span class="p">(</span><span class="nv">X</span><span class="p">,</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">business</span><span class="p">(</span><span class="nv">Y</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">law_practice_in_singapore</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">joint_law_venture</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">formal_law_alliance</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">foreign_law_practice</span><span class="p">(</span><span class="nv">X</span><span class="p">),</span> <span class="ss">not</span> <span class="ss">third_schedule_institution</span><span class="p">(</span><span class="nv">X</span><span class="p">).</span>
</code></pre></div></div>
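
<p>The first six rules in that excerpt differ only in the predicate naming the type of entity, so one way to cope with the repetition would be to generate them from a template. Here is a minimal sketch in Python. It isn’t part of Docassemble-L4, and the function and constant names are invented for illustration. It deliberately leaves out the seventh rule, which has no entity-type predicate.</p>

```python
ENTITY_TYPES = ["company", "corporation", "partnership", "llp", "soleprop", "business_trust"]
EXCLUSIONS = [
    "law_practice_in_singapore",
    "joint_law_venture",
    "formal_law_alliance",
    "foreign_law_practice",
    "third_schedule_institution",
]


def business_entity_rule(entity_type: str) -> str:
    """Build one s(CASP)-style rule for a single entity type."""
    negations = ", ".join(f"not {name}(X)" for name in EXCLUSIONS)
    return f"business_entity(X) :- carries_on(X,Y), business(Y), {entity_type}(X), {negations}."


# One rule per entity type, standing in for the missing "or" syntax
rules = [business_entity_rule(entity_type) for entity_type in ENTITY_TYPES]
print("\n".join(rules))
```

<p>Generating the rules this way doesn’t make the logic program any shorter, but it keeps the list of alternatives in one place where it’s easier to review.</p>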

<p>So Docassemble-L4 is very challenging to use. But still, it’s a significant accomplishment because it moves Docassemble further from single-purpose interviews, and toward generating conclusions by searching a large collection of legal rules derived from legislation. That echoes the <a href="https://docassemble.org/docs/logic.html#bpsharing">Best Practices section of the Docassemble documentation</a>, which recommends distributing the “mandatory” block that defines the objective of a particular interview separately from the file that defines the rest of the rules, so that the parts of the interview that are more likely to be reusable are easy to find in a separate module. Another benefit of Docassemble-L4’s roots in logic programming is that it can generate explanations for its conclusions, with links to the relevant passages of legislation. Maybe more of Docassemble-L4’s workflow can be automated in future versions.</p>

<p>Although I’ve been critical enough already, I have one more quibble with Docassemble-L4’s design philosophy. I think Docassemble-L4 was written with the idea that each rule written in s(CASP) will correspond to a separately-numbered section of the published legislation. I think that’s a little bit of an artificial restriction, and I think it resulted in the example data containing a lot of shorter rules connected by meta-rules about how one rule “overrides” another. If instead the legislation was thought of as containing larger rules that could span multiple numbered paragraphs, it would no longer look like there were so many contradictions within the same legal code, and there would be less need for the user to create meta-rules about which rules take precedence over others in the event of a conflict. A really expressive schema for legal analysis should include a rich syntax to describe the relationship between a legal rule and the legal documents that enable it.</p>]]></content><author><name>Matt Carey</name></author><category term="Docassemble" /><category term="L4" /><category term="Z3" /><summary type="html"><![CDATA[Creating a data schema for legal analysis involves plunging into abstraction. How deeply abstract the schema becomes probably depends more than we want to admit on the temperament of the person creating the schema. The more abstraction, the more powerful and expressive the schema can be, but also the greater the risk the schema will crumple under the pressure of the analyst’s assumptions or unprovable metaphysical beliefs. At the Singapore Management University Center for Computational Law, Principal Investigator Meng Weng Wong, Jason Morris, and the rest of their team went extremely deep to create an extension to Docassemble called Docassemble-L4. 
I salute their bravery.]]></summary></entry><entry><title type="html">Design Notes: YAML and JSON Schemas in AuthoritySpoke</title><link href="https://mscarey.github.io/2021/05/24/design-notes-authorityspoke-schemas.html" rel="alternate" type="text/html" title="Design Notes: YAML and JSON Schemas in AuthoritySpoke" /><published>2021-05-24T14:00:00+00:00</published><updated>2021-05-24T14:00:00+00:00</updated><id>https://mscarey.github.io/2021/05/24/design-notes-authorityspoke-schemas</id><content type="html" xml:base="https://mscarey.github.io/2021/05/24/design-notes-authorityspoke-schemas.html"><![CDATA[<p>AuthoritySpoke version 0.7 is <a href="https://pypi.org/project/AuthoritySpoke/">available on PyPI</a>, bringing with it a new data input format using YAML files. For documentation on that feature, check out the just-published <a href="https://authorityspoke.readthedocs.io/en/latest/guides/load_yaml_holdings.html">user guide</a> or the <a href="https://authorityspoke.readthedocs.io/en/latest/io/loaders.html#authorityspoke.io.loaders.read_anchored_holdings_from_file">API documentation</a>. With this blog post I’ll go more into my reasoning in making the changes, and where I see AuthoritySpoke going next.</p>

<p>I planned for AuthoritySpoke to load two kinds of data: machine-serialized JSON objects, and also handmade test data. I wanted this handmade data to allow various kinds of abbreviations, and even to be tolerant of certain kinds of errors. And then I made the fateful decision to create just one set of data loading schemas for loading both kinds of data. That was probably the most costly design mistake I’ve made on AuthoritySpoke (that I know of!) so far.</p>

<p>The functions that expanded abbreviated text in input files turned out to be easily the most finicky and error-prone parts of AuthoritySpoke. They also had a tendency to break in inscrutable ways when I modified functions in far-away parts of AuthoritySpoke that I had assumed were safely isolated from the text expansion functions. And they caused workflows that should have been simple to become lengthy and hard to debug. It was as if all the data that I loaded into AuthoritySpoke first had to be placed on a very long conveyor belt (or to be more literal, a very tall call stack) where the data would be poked and tweaked and adjusted by a long series of functions that corrected typos, expanded abbreviations, and the like. When something went wrong, I’d have to inspect all of the functions along the conveyor belt until I found the one that wasn’t working as designed. The <a href="https://marshmallow.readthedocs.io/en/stable/">Marshmallow</a> data serialization library was permissive enough to let me introduce all kinds of anomalies into the data loading process, but in some ways I used that freedom to shoot myself in the foot. And of course, when I tried to use open source libraries to automatically generate a publishable OpenAPI specification by analyzing the schemas I’d written, the result made no sense because I’d used the serializers in nonstandard ways. (AuthoritySpoke’s <a href="https://authorityspoke.readthedocs.io/en/latest/guides/create_holding_data.html#json-api-specification">current OpenAPI specification</a> is better, I think.)</p>

<p>Also, the first process I established for loading handmade data was for the user to create a JSON file. But really, nobody wants to create JSON files by hand without purpose-built tools. So in version 0.7, my solution is to create a separate data loading workflow for handmade data, which should now be in YAML instead of in JSON. Here’s an example of a YAML file using the new data input format, with one of the rules from the “Beard Act” test dataset that <a href="/2020/11/30/a-test-rubric-for-legal-rule-automation.html">I posted about before</a>.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">holdings</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">inputs</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">fact</span>
          <span class="na">content</span><span class="pi">:</span> <span class="s2">"</span><span class="s">{the</span><span class="nv"> </span><span class="s">suspected</span><span class="nv"> </span><span class="s">beard}</span><span class="nv"> </span><span class="s">was</span><span class="nv"> </span><span class="s">facial</span><span class="nv"> </span><span class="s">hair"</span>
        <span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">fact</span>
          <span class="na">content</span><span class="pi">:</span> <span class="s">the length of the suspected beard was &gt;= 5 millimetres</span>
        <span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">fact</span>
          <span class="na">content</span><span class="pi">:</span> <span class="s">the suspected beard occurred on or below the chin</span>
      <span class="na">outputs</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">type</span><span class="pi">:</span> <span class="s">fact</span>
          <span class="na">content</span><span class="pi">:</span> <span class="s">the suspected beard was a beard</span>
      <span class="na">enactments</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">node</span><span class="pi">:</span> <span class="s">/test/acts/47/4</span>
          <span class="na">exact</span><span class="pi">:</span>
            <span class="s2">"</span><span class="s">In</span><span class="nv"> </span><span class="s">this</span><span class="nv"> </span><span class="s">Act,</span><span class="nv"> </span><span class="s">beard</span><span class="nv"> </span><span class="s">means</span><span class="nv"> </span><span class="s">any</span><span class="nv"> </span><span class="s">facial</span><span class="nv"> </span><span class="s">hair</span><span class="nv"> </span><span class="s">no</span><span class="nv"> </span><span class="s">shorter</span><span class="nv"> </span><span class="s">than</span><span class="nv"> </span><span class="s">5</span><span class="nv"> </span><span class="s">millimetres</span>
            <span class="s">in</span><span class="nv"> </span><span class="s">length</span><span class="nv"> </span><span class="s">that:</span><span class="nv"> </span><span class="s">occurs</span><span class="nv"> </span><span class="s">on</span><span class="nv"> </span><span class="s">or</span><span class="nv"> </span><span class="s">below</span><span class="nv"> </span><span class="s">the</span><span class="nv"> </span><span class="s">chin"</span>
      <span class="na">universal</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>

<p>(Nobody wants to create YAML files by hand either, but that’s a problem for another day.)</p>

<p>The YAML data loading module can now be kept separate from the rest of AuthoritySpoke, where it’ll be less likely to hurt anyone, and the workflow for loading data from JSON won’t include any features for handling abbreviations or typos. Most importantly for me, I’ll be able to write unit tests that get closer to isolating just the functions they’re really trying to test, without touching the text formatting functions.</p>
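
<p>To make that separation concrete, here’s a minimal sketch of the idea using only the standard library. It is not AuthoritySpoke’s actual code: the <code>Fact</code> class is a stand-in, and <code>load_fact</code> and <code>expand_handmade</code> are hypothetical names. The point is that the permissive step runs first and hands a fully explicit record to a strict loader that never guesses.</p>

```python
from dataclasses import dataclass


@dataclass
class Fact:
    content: str
    truth: bool = True


def load_fact(record: dict) -> Fact:
    """Strict loader for machine-serialized data: no guessing, no repairs."""
    if record.get("type") != "fact":
        raise ValueError("expected a record with type 'fact'")
    return Fact(content=record["content"], truth=record.get("truth", True))


def expand_handmade(record) -> dict:
    """Permissive step for handmade data: expand shorthand into a full record."""
    if isinstance(record, str):  # a bare string is shorthand for a fact
        record = {"content": record}
    expanded = {"type": "fact", **record}
    expanded["content"] = expanded["content"].strip()
    return expanded


strict = load_fact({"type": "fact", "content": "the suspected beard was a beard"})
handmade = load_fact(expand_handmade("  the suspected beard was a beard "))
```

<p>With this layout, unit tests for <code>load_fact</code> never have to touch the abbreviation-handling code at all.</p>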

<p>I considered switching from Marshmallow to the trendier Pydantic serializer, but I decided against it for two related reasons. First, the AuthoritySpoke classes that represent units of legal analysis already have a very complicated subclass inheritance pattern. Pydantic requires any class that’s going to be serialized to also inherit from a Pydantic serialization parent class. I was afraid that inheriting from yet another parent class would add even more complexity, with unforeseen consequences. Second, I’ve had good experiences applying the design concept of <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">dependency inversion</a>. I want to think of serialization libraries as implementation details, not as core features of AuthoritySpoke. By sticking with Marshmallow, I can keep the serialization schemas in their own modules separate from the core business logic. The core modules of AuthoritySpoke don’t have to “know about” the serializer classes, and I can write unit tests for the core business logic that don’t touch Marshmallow in any way.</p>

<p>The biggest challenge remaining in AuthoritySpoke’s data schema (including the simpler non-YAML schema) is that it’s a polymorphic schema, meaning more than one object schema can occur in the same place. For instance, an “input” or “output” for AuthoritySpoke’s <a href="https://authorityspoke.readthedocs.io/en/latest/api/holdings.html#authorityspoke.holdings.Holding">Holding</a> class could be a Fact, or it could be an item of Evidence, or other things. In order to implement the feature of polymorphism, AuthoritySpoke needs to import not just Marshmallow but also a related library called <a href="https://github.com/marshmallow-code/marshmallow-oneofschema">marshmallow-oneofschema</a>. I’ve learned that I should get nervous when I import a software package without a large and active community, and for me the easiest way to measure that community is GitHub stars, which basically correspond to satisfied users. Marshmallow has <a href="https://github.com/marshmallow-code/marshmallow/stargazers">5,500 stars</a>, which is not that high compared to the <a href="https://github.com/encode/django-rest-framework/stargazers">21,000 stars</a> that its competitor Django Rest Framework has. (Pydantic has 6,500.) But if I want to generate an OpenAPI specification for my Marshmallow schema, I have to also download apispec, which has <a href="https://github.com/marshmallow-code/apispec/stargazers">859 stars</a> at the time of writing. Then my polymorphic schema requires me to grab marshmallow-oneofschema, which has a mere <a href="https://github.com/marshmallow-code/marshmallow-oneofschema/stargazers">96 stars</a>. And then the polymorphic part of my schema needs to be included in the OpenAPI specification too, so I have to import <a href="https://github.com/timakro/apispec-oneofschema">apispec-oneofschema</a>, which has just <a href="https://github.com/timakro/apispec-oneofschema/stargazers">eight stars including mine</a>. Pretty scary. 
These libraries could have trouble in the future, and I expect to be relying on them a lot as I move forward with AuthoritySpoke.</p>
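
<p>For readers who haven’t run into polymorphic schemas before, the core idea is just dispatching on a discriminator field. The sketch below shows that idea with the standard library only. It is not the marshmallow-oneofschema API, and the classes, their fields, and the <code>load_factor</code> function are simplified stand-ins I made up for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class Fact:
    content: str


@dataclass
class Evidence:
    exhibit: str


# Registry mapping the "type" discriminator to the class that loads it
FACTOR_TYPES = {"fact": Fact, "evidence": Evidence}


def load_factor(record: dict):
    """Pick a class from the record's "type" key and build it from the rest."""
    fields = dict(record)  # copy so the caller's dict is not mutated
    type_name = fields.pop("type")
    try:
        factor_class = FACTOR_TYPES[type_name]
    except KeyError:
        raise ValueError(f"unknown factor type: {type_name!r}") from None
    return factor_class(**fields)


inputs = [
    load_factor({"type": "fact", "content": "the suspected beard was facial hair"}),
    load_factor({"type": "evidence", "exhibit": "a photograph of the suspected beard"}),
]
```

<p>The hard part that the extra libraries take on is everything around this dispatch step, such as validation and describing the alternatives in an OpenAPI specification.</p>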

<p>The future of AuthoritySpoke depends on getting it working with web APIs. Not to mention a web user interface. Version 0.7 does a lot to simplify one of AuthoritySpoke’s data models to make it suitable for the web. An even simpler data model would be better, but I think the foundation exists to design ways to share and organize judicial rule models on <a href="https://authorityspoke.com/">authorityspoke.com</a>.</p>]]></content><author><name>Matt Carey</name></author><category term="AuthoritySpoke" /><summary type="html"><![CDATA[AuthoritySpoke version 0.7 is available on PyPI, bringing with it a new data input format using YAML files. For documentation on that feature, check out the just-published user guide or the API documentation. With this blog post I’ll go more into my reasoning in making the changes, and where I see AuthoritySpoke going next.]]></summary></entry><entry><title type="html">Trying Out Eyecite, Free Law Project’s New Case Citation Tool</title><link href="https://mscarey.github.io/2021/05/12/trying-out-eyecite.html" rel="alternate" type="text/html" title="Trying Out Eyecite, Free Law Project’s New Case Citation Tool" /><published>2021-05-12T19:00:00+00:00</published><updated>2021-05-12T19:00:00+00:00</updated><id>https://mscarey.github.io/2021/05/12/trying-out-eyecite</id><content type="html" xml:base="https://mscarey.github.io/2021/05/12/trying-out-eyecite.html"><![CDATA[<p>Around the beginning of 2021, the <a href="https://free.law/">Free Law Project</a> extracted the code that it’s been using to link case citations within <a href="https://www.courtlistener.com/">CourtListener</a>, and released it as a new open source Python package called Eyecite. I think <a href="https://github.com/freelawproject/eyecite">Eyecite</a> could become the most widely useful open source legal analysis tool to be released by anyone so far. 
It seems to have incredible potential for citation network analysis, and for preparing caselaw for natural language processing. I’m sure tools like this have existed inside commercial publishers for a long time, but providing these capabilities to open source developers could make a huge difference in expanding access to law.</p>

<p>Eyecite is built atop two arduous research projects that were themselves released as Python packages: <a href="https://github.com/freelawproject/courts-db">Courts-DB</a> and <a href="https://github.com/freelawproject/reporters-db">Reporters-DB</a>. These provide the data that lets Eyecite know which strings are valid case citations, and what courts published the opinions at each citation. Courts-DB and Reporters-DB were also created by the Free Law Project, building on earlier work by Frank Bennett and the <a href="https://juris-m.github.io/posts/2021-01-17-1.html">Legal Resource Registry</a>.</p>

<p>I’ll use the rest of this blog post to try out Eyecite’s basic features and give my first impressions. Eyecite is still under active development and I’m testing the version on the current master branch, which isn’t an official release version so it could be extra-buggy.</p>

<h2 id="detecting-citations">Detecting Citations</h2>

<p>I tested Eyecite’s citation detection feature on the first paragraph of the discussion section of the US Supreme Court’s recent opinion in <a href="https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_Inc.">Google v. Oracle America</a>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">eyecite</span>
<span class="n">text_from_opinion</span> <span class="o">=</span> <span class="s">"""Copyright and patents, the Constitution says,
    are to “promote the Progress of Science and useful Arts,
    by securing for limited Times to Authors and Inventors the
    exclusive Right to their respective Writings and Discoveries.”
    Art. I, §8, cl. 8. Copyright statutes and case law have made
    clear that copyright has practical objectives. It grants an
    author an exclusive right to produce his work (sometimes for
    a hundred years or more), not as a special reward, but in order
    to encourage the production of works that others might reproduce
    more cheaply. At the same time, copyright has negative features.
    Protection can raise prices to consumers. It can impose special
    costs, such as the cost of contacting owners to obtain reproduction
    permission. And the exclusive rights it awards can sometimes stand
    in the way of others exercising their own creative powers. See
    generally Twentieth Century Music Corp. v. Aiken, 422 U. S. 151,
    156 (1975); Mazer v. Stein, 347 U. S. 201, 219 (1954)."""</span>
</code></pre></div></div>

<p>Eyecite successfully discovered all three citations in the paragraph.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">get_citations</span><span class="p">(</span><span class="n">text_from_opinion</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">citations</span><span class="p">)</span>
<span class="mi">3</span>
</code></pre></div></div>

<p>Eyecite also correctly recognized that the first citation wasn’t a citation to a case.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">NonopinionCitation</span><span class="p">(</span>
    <span class="n">token</span><span class="o">=</span><span class="n">SectionToken</span><span class="p">(</span>
        <span class="n">data</span><span class="o">=</span><span class="s">'§8,'</span><span class="p">,</span>
        <span class="n">start</span><span class="o">=</span><span class="mi">254</span><span class="p">,</span>
        <span class="n">end</span><span class="o">=</span><span class="mi">257</span><span class="p">),</span>
    <span class="n">index</span><span class="o">=</span><span class="mi">93</span><span class="p">,</span>
    <span class="n">span_start</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">span_end</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
</code></pre></div></div>

<p>The only slight problem was that Eyecite captured just three characters of the non-opinion citation. If I needed to exclude non-opinion citations from the text for some reason, it would have been better if it had found the full citation text “Art. I, §8, cl. 8”.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">token</span><span class="p">.</span><span class="n">data</span>
<span class="s">'§8,'</span>
</code></pre></div></div>

<p>Eyecite identified the other two citations in the paragraph as case citations. It came up with an amazing amount of information about them, almost all of which looks correct (it only came up with “Corp.” for the plaintiff’s name).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">FullCaseCitation</span><span class="p">(</span>
    <span class="n">token</span><span class="o">=</span><span class="n">CitationToken</span><span class="p">(</span>
        <span class="n">data</span><span class="o">=</span><span class="s">'422 U. S. 151'</span><span class="p">,</span>
        <span class="n">start</span><span class="o">=</span><span class="mi">984</span><span class="p">,</span>
        <span class="n">end</span><span class="o">=</span><span class="mi">997</span><span class="p">,</span>
        <span class="n">volume</span><span class="o">=</span><span class="s">'422'</span><span class="p">,</span>
        <span class="n">reporter</span><span class="o">=</span><span class="s">'U. S.'</span><span class="p">,</span>
        <span class="n">page</span><span class="o">=</span><span class="s">'151'</span><span class="p">,</span>
        <span class="n">exact_editions</span><span class="o">=</span><span class="p">(),</span>
        <span class="n">variation_editions</span><span class="o">=</span><span class="p">(</span>
            <span class="n">Edition</span><span class="p">(</span>
                <span class="n">reporter</span><span class="o">=</span><span class="n">Reporter</span><span class="p">(</span>
                    <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
                    <span class="n">name</span><span class="o">=</span><span class="s">'United States Supreme Court Reports'</span><span class="p">,</span>
                    <span class="n">cite_type</span><span class="o">=</span><span class="s">'federal'</span><span class="p">,</span>
                    <span class="n">is_scotus</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
                <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
                <span class="n">start</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1875</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
                <span class="n">end</span><span class="o">=</span><span class="bp">None</span><span class="p">),),</span>
        <span class="n">short</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
        <span class="n">extra_match_groups</span><span class="o">=</span><span class="p">{}),</span>
    <span class="n">index</span><span class="o">=</span><span class="mi">365</span><span class="p">,</span>
    <span class="n">span_start</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">span_end</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">reporter</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
    <span class="n">page</span><span class="o">=</span><span class="s">'151'</span><span class="p">,</span>
    <span class="n">volume</span><span class="o">=</span><span class="s">'422'</span><span class="p">,</span>
    <span class="n">canonical_reporter</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
    <span class="n">plaintiff</span><span class="o">=</span><span class="s">'Corp.'</span><span class="p">,</span>
    <span class="n">defendant</span><span class="o">=</span><span class="s">'Aiken,'</span><span class="p">,</span>
    <span class="n">pin_cite</span><span class="o">=</span><span class="s">'156'</span><span class="p">,</span>
    <span class="n">extra</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">court</span><span class="o">=</span><span class="s">'scotus'</span><span class="p">,</span>
    <span class="n">year</span><span class="o">=</span><span class="mi">1975</span><span class="p">,</span>
    <span class="n">parenthetical</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">reporter_found</span><span class="o">=</span><span class="s">'U. S.'</span><span class="p">,</span>
    <span class="n">exact_editions</span><span class="o">=</span><span class="p">(),</span>
    <span class="n">variation_editions</span><span class="o">=</span><span class="p">(</span>
        <span class="n">Edition</span><span class="p">(</span>
            <span class="n">reporter</span><span class="o">=</span><span class="n">Reporter</span><span class="p">(</span>
                <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
                <span class="n">name</span><span class="o">=</span><span class="s">'United States Supreme Court Reports'</span><span class="p">,</span>
                <span class="n">cite_type</span><span class="o">=</span><span class="s">'federal'</span><span class="p">,</span>
                <span class="n">is_scotus</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
            <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
            <span class="n">start</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1875</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
            <span class="n">end</span><span class="o">=</span><span class="bp">None</span><span class="p">),),</span>
    <span class="n">all_editions</span><span class="o">=</span><span class="p">(</span>
        <span class="n">Edition</span><span class="p">(</span>
            <span class="n">reporter</span><span class="o">=</span><span class="n">Reporter</span><span class="p">(</span>
                <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
                <span class="n">name</span><span class="o">=</span><span class="s">'United States Supreme Court Reports'</span><span class="p">,</span>
                <span class="n">cite_type</span><span class="o">=</span><span class="s">'federal'</span><span class="p">,</span>
                <span class="n">is_scotus</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
            <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
            <span class="n">start</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1875</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
            <span class="n">end</span><span class="o">=</span><span class="bp">None</span><span class="p">),),</span>
    <span class="n">edition_guess</span><span class="o">=</span><span class="n">Edition</span><span class="p">(</span>
        <span class="n">reporter</span><span class="o">=</span><span class="n">Reporter</span><span class="p">(</span>
            <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
            <span class="n">name</span><span class="o">=</span><span class="s">'United States Supreme Court Reports'</span><span class="p">,</span>
            <span class="n">cite_type</span><span class="o">=</span><span class="s">'federal'</span><span class="p">,</span>
            <span class="n">is_scotus</span><span class="o">=</span><span class="bp">True</span><span class="p">),</span>
        <span class="n">short_name</span><span class="o">=</span><span class="s">'U.S.'</span><span class="p">,</span>
        <span class="n">start</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1875</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">end</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="p">)</span>
</code></pre></div></div>

<p>Of course, the court that issued the cited opinion, and the reporter where it was published, are identified correctly.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">court</span>
<span class="s">'scotus'</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">reporter</span>
<span class="s">'U.S.'</span>
</code></pre></div></div>

<p>Eyecite can’t extract the exact date of the cited case, but it can get the start and end dates for the reporter series where the case was published, and it can also get the year from the parenthetical in the citation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">citations</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">year</span>
<span class="mi">1975</span>
</code></pre></div></div>
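
<p>As a rough illustration of how the year and the reporter dates could be used together, here’s a minimal sketch that checks whether a citation’s year falls inside an edition’s publication range. The <code class="language-plaintext highlighter-rouge">year_in_edition</code> helper is a hypothetical name of my own and isn’t part of Eyecite; the dates are copied from the <code class="language-plaintext highlighter-rouge">Edition</code> objects shown in this post, and I’m assuming that <code class="language-plaintext highlighter-rouge">end=None</code> means the reporter series is still being published.</p>

```python
from datetime import datetime
from typing import Optional


def year_in_edition(year: int, start: datetime, end: Optional[datetime]) -> bool:
    """Return True if `year` falls within the edition's publication range.

    Hypothetical helper, not part of Eyecite. An `end` of None is
    assumed to mean the reporter series is still being published.
    """
    return year >= start.year and (end is None or end.year >= year)


# Dates for the 'U.S.' edition, copied from the output above.
print(year_in_edition(1975, datetime(1875, 1, 1), None))

# Dates for the 'F. Supp. 2d' edition, which ran from 1988 to 2014.
print(year_in_edition(2012, datetime(1988, 1, 1), datetime(2014, 8, 21)))
print(year_in_edition(1975, datetime(1988, 1, 1), datetime(2014, 8, 21)))
```

<p>A check like this could help flag a citation whose year doesn’t fit the reporter series it was matched to.</p>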

<h2 id="cleaning-up-opinion-text">Cleaning up Opinion Text</h2>

<p>It’s also worth noticing how Eyecite handles “Id.” citations. I grabbed a paragraph from the Facts section of <em>Google v. Oracle America</em> with an example of an “Id.” citation. But this time, because the text looks like it probably has a problem with line breaks or whitespace, I’ll also try out Eyecite’s utility function for cleaning up opinion text.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">facts_section</span> <span class="o">=</span> <span class="s">"""Google envisioned an Android platform that was free and
    open, such that software developers could use the tools
    found there free of charge. Its idea was that more and more
    developers using its Android platform would develop ever
    more Android-based applications, all of which would make
    Google’s Android-based smartphones more attractive to ultimate consumers.
    Consumers would then buy and use ever
    more of those phones. Oracle America, Inc. v. Google Inc.,
    872 F. Supp. 2d 974, 978 (ND Cal. 2012); App. 111, 464.
    That vision required attracting a sizeable number of skilled
    programmers.
    At that time, many software developers understood and
    wrote programs using the Java programming language, a
    language invented by Sun Microsystems (Oracle’s predecessor). 872 F. Supp. 2d, at 975, 977. About six million programmers had spent considerable time learning, and then
    using, the Java language. App. 228. Many of those programmers used Sun’s own popular Java SE platform to develop new programs primarily for use in desktop and laptop
    computers. Id., at 151–152, 200. That platform allowed
    developers using the Java language to write programs that
    were able to run on any desktop or laptop computer, regardless of the underlying hardware (i.e., the programs were in
    large part “interoperable”). 872 F. Supp. 2d, at 977. Indeed, one of Sun’s slogans was “‘write once, run anywhere.’”
    886 F. 3d, at 1186."""</span>
</code></pre></div></div>

<p>To use the <code class="language-plaintext highlighter-rouge">clean_text</code> function, you pass the text along with a list containing the names of the cleaning steps you want to apply.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">clean_facts_section</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">clean_text</span><span class="p">(</span><span class="n">facts_section</span><span class="p">,</span> <span class="p">[</span><span class="s">"all_whitespace"</span><span class="p">])</span>
</code></pre></div></div>

<p>I can verify that the cleaning function removed some whitespace by comparing the lengths of the two text strings.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">facts_section</span><span class="p">)</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">clean_facts_section</span><span class="p">)</span>
<span class="mi">78</span>
</code></pre></div></div>
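
<p>To give a sense of what this kind of cleanup involves, here’s a minimal standard-library sketch that collapses every run of whitespace into a single space. It’s a stand-in written for illustration, not Eyecite’s own implementation, and the <code class="language-plaintext highlighter-rouge">collapse_whitespace</code> name is hypothetical.</p>

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace each run of spaces, tabs, and newlines with one space.

    Illustrative stand-in only; Eyecite's own cleaner may differ.
    """
    return re.sub(r"\s+", " ", text)


# Two lines from the facts section, joined by a newline and indentation.
sample = "developers could use the tools\n    found there free of charge."
cleaned = collapse_whitespace(sample)

print(cleaned)
# The newline plus four spaces became one space, so four characters are gone.
print(len(sample) - len(cleaned))
```

<p>Comparing string lengths before and after, as I did above, is a quick sanity check that the cleanup changed something.</p>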

<h2 id="handling-short-and-id-citations">Handling Short and “Id.” Citations</h2>

<p>Running the <code class="language-plaintext highlighter-rouge">get_citations</code> function again, this time on the cleaned text, I found that it discovered all 5 citations.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">facts_section_citations</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">get_citations</span><span class="p">(</span><span class="n">clean_facts_section</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">facts_section_citations</span><span class="p">)</span>
<span class="mi">5</span>
</code></pre></div></div>

<p>Eyecite has special <code class="language-plaintext highlighter-rouge">ShortCitation</code> and <code class="language-plaintext highlighter-rouge">IdCitation</code> classes that will capture all the information available from a citation even when it’s not a full citation. Eyecite’s string representation of the <code class="language-plaintext highlighter-rouge">ShortCitation</code> class still looks a little wonky in the version I’m testing…</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">facts_section_citations</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="bp">None</span><span class="p">,</span> <span class="mi">872</span> <span class="n">F</span><span class="p">.</span> <span class="n">Supp</span><span class="p">.</span> <span class="mi">2</span><span class="n">d</span><span class="p">,</span> <span class="n">at</span> <span class="mi">975</span>
</code></pre></div></div>

<p>…but by looking at the <code class="language-plaintext highlighter-rouge">token</code> attribute I can see that Eyecite found a lot of useful information.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">facts_section_citations</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">token</span>
<span class="n">CitationToken</span><span class="p">(</span>
    <span class="n">data</span><span class="o">=</span><span class="s">'872 F. Supp. 2d, at 975'</span><span class="p">,</span>
    <span class="n">start</span><span class="o">=</span><span class="mi">757</span><span class="p">,</span>
    <span class="n">end</span><span class="o">=</span><span class="mi">780</span><span class="p">,</span>
    <span class="n">volume</span><span class="o">=</span><span class="s">'872'</span><span class="p">,</span>
    <span class="n">reporter</span><span class="o">=</span><span class="s">'F. Supp. 2d'</span><span class="p">,</span>
    <span class="n">page</span><span class="o">=</span><span class="s">'975'</span><span class="p">,</span>
    <span class="n">exact_editions</span><span class="o">=</span><span class="p">(</span>
        <span class="n">Edition</span><span class="p">(</span>
            <span class="n">reporter</span><span class="o">=</span><span class="n">Reporter</span><span class="p">(</span>
                <span class="n">short_name</span><span class="o">=</span><span class="s">'F. Supp.'</span><span class="p">,</span>
                <span class="n">name</span><span class="o">=</span><span class="s">'Federal Supplement'</span><span class="p">,</span>
                <span class="n">cite_type</span><span class="o">=</span><span class="s">'federal'</span><span class="p">,</span>
                <span class="n">is_scotus</span><span class="o">=</span><span class="bp">False</span><span class="p">),</span>
            <span class="n">short_name</span><span class="o">=</span><span class="s">'F. Supp. 2d'</span><span class="p">,</span>
            <span class="n">start</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1988</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
            <span class="n">end</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2014</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">21</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)),),</span>
    <span class="n">variation_editions</span><span class="o">=</span><span class="p">(),</span>
    <span class="n">short</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">extra_match_groups</span><span class="o">=</span><span class="p">{})</span>
</code></pre></div></div>

<p>In the short citation <code class="language-plaintext highlighter-rouge">872 F. Supp. 2d, at 975, 977</code>, the start page of the cited opinion is omitted, but Eyecite has recognized the pin cite to two different pages.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">facts_section_citations</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">pin_cite</span>
<span class="s">'975, 977'</span>
</code></pre></div></div>
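
<p>Since the pin cite comes back as a plain string, a little post-processing is needed to get page numbers out of it. Here’s a minimal sketch of one way to do that with a regular expression. The <code class="language-plaintext highlighter-rouge">pages_from_pin_cite</code> helper is hypothetical, not something Eyecite provides.</p>

```python
import re


def pages_from_pin_cite(pin_cite: str) -> list[int]:
    """Return every page number found in a pin cite string, in order.

    Hypothetical helper, not part of Eyecite. A page range such as
    "151-152" would come back as its two endpoints.
    """
    return [int(number) for number in re.findall(r"\d+", pin_cite)]


print(pages_from_pin_cite("975, 977"))
print(pages_from_pin_cite("at 151"))
```

<p>This treats every number in the string as a page, which is good enough for the pin cites in this post but would need more care for pin cites that mention footnotes or paragraph numbers.</p>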

<p>The next citation is an “Id.” citation, which provides even less information than a ShortCitation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">facts_section_citations</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="n">Id</span><span class="p">.,</span> <span class="n">at</span> <span class="mi">151</span>
</code></pre></div></div>

<p>It looks like Eyecite wasn’t able to collect much from the “Id.” citation, other than the pin cite and the position of the citation in the text I provided.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">facts_section_citations</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">__dict__</span>
<span class="p">{</span>
    <span class="s">'token'</span><span class="p">:</span> <span class="n">IdToken</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="s">'Id.,'</span><span class="p">,</span> <span class="n">start</span><span class="o">=</span><span class="mi">1041</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="mi">1045</span><span class="p">),</span>
    <span class="s">'index'</span><span class="p">:</span> <span class="mi">310</span><span class="p">,</span>
    <span class="s">'span_start'</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
    <span class="s">'span_end'</span><span class="p">:</span> <span class="mi">1052</span><span class="p">,</span>
    <span class="s">'pin_cite'</span><span class="p">:</span> <span class="s">'at 151'</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It might look like we’re going to have to match that “Id.” citation to the case it references manually. But no! Eyecite has another trick up its sleeve. If we pass an ordered list of citations to Eyecite’s <code class="language-plaintext highlighter-rouge">resolve_citations</code> function, it’ll match up the “Id.” citation to the case cited by its antecedent citation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">resolved_citations</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">resolve_citations</span><span class="p">(</span><span class="n">facts_section_citations</span><span class="p">)</span>
</code></pre></div></div>

<p>Basically, Eyecite will use the citations it recognizes to create <code class="language-plaintext highlighter-rouge">Resource</code> objects, and then those Resources become keys for a lookup table to get all the citations that match the same Resource. When you look up the correct Resource in <code class="language-plaintext highlighter-rouge">resolved_citations</code>, it gives you all the citations that refer to that Resource, including any “Id.” citations. I think this feature is still under development, and honestly I’d like to see more documentation about how to use it efficiently. But there are definitely great gains to be made from a tool that can understand “Id.” and “Supra” citations automatically.</p>
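
<p>The lookup-table idea can be sketched in a few lines of plain Python. This is a toy model written for illustration, not Eyecite’s actual resolution logic: the <code class="language-plaintext highlighter-rouge">ToyCitation</code> class and <code class="language-plaintext highlighter-rouge">resolve</code> function are hypothetical, and I’m assuming that a volume and reporter pair is enough to identify a resource and that an “Id.” citation always refers to the citation just before it.</p>

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCitation:
    """A stripped-down citation (hypothetical, not an Eyecite class)."""

    kind: str  # "full", "short", or "id"
    text: str
    volume: Optional[str] = None
    reporter: Optional[str] = None


def resolve(citations: list[ToyCitation]) -> dict[tuple, list[ToyCitation]]:
    """Group an ordered list of citations by the resource they refer to."""
    resolved = defaultdict(list)
    previous_key = None
    for cite in citations:
        if cite.kind == "id":
            # Assumption: "Id." points at whatever was cited just before it.
            key = previous_key
        else:
            # Assumption: volume plus reporter identifies the resource.
            key = (cite.volume, cite.reporter)
        if key is None:
            continue  # an "Id." with no antecedent can't be resolved
        resolved[key].append(cite)
        previous_key = key
    return dict(resolved)


facts = [
    ToyCitation("full", "872 F. Supp. 2d 974", "872", "F. Supp. 2d"),
    ToyCitation("short", "872 F. Supp. 2d, at 975", "872", "F. Supp. 2d"),
    ToyCitation("id", "Id., at 151"),
    ToyCitation("short", "872 F. Supp. 2d, at 977", "872", "F. Supp. 2d"),
]
lookup = resolve(facts)
for cite in lookup[("872", "F. Supp. 2d")]:
    print(cite.text)
```

<p>In this toy version all four citations end up under the same key, including the “Id.” citation, which carries no volume or reporter of its own.</p>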

<h2 id="annotating-citations-in-text">Annotating Citations in Text</h2>

<p>Eyecite’s <code class="language-plaintext highlighter-rouge">annotate</code> function is exciting for anybody publishing caselaw online. It can add HTML links or other markup to the text that Eyecite just searched through for citations. CourtListener’s URL structure doesn’t seem to lend itself to automatically creating links, so instead I’ll give an example of automatically creating links to Harvard’s <a href="https://case.law/">case.law</a> website. I’ll start by getting a list of citations again.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">discussion_text</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">clean_text</span><span class="p">(</span><span class="n">text_from_opinion</span><span class="p">,</span> <span class="p">[</span><span class="s">"all_whitespace"</span><span class="p">])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">discussion_citations</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">get_citations</span><span class="p">(</span><span class="n">discussion_text</span><span class="p">)</span>
</code></pre></div></div>

<p>Next, I need a function that can generate the URL for a court opinion on <code class="language-plaintext highlighter-rouge">case.law</code> based on its CaseCitation object. Unfortunately Eyecite’s CaseCitation object doesn’t provide the same abbreviation style that case.law uses for the names of reporter volumes, so I had to add a mockup of a conversion table using the <code class="language-plaintext highlighter-rouge">reporter_abbreviations</code> variable. But the CaseCitation object does supply the <code class="language-plaintext highlighter-rouge">volume</code> and <code class="language-plaintext highlighter-rouge">page</code> fields for the reporter where the case is published, and the <code class="language-plaintext highlighter-rouge">pin_cite</code> field seems to be easy to transform into the format case.law needs.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urlunparse</span><span class="p">,</span> <span class="n">ParseResult</span>
<span class="kn">from</span> <span class="nn">eyecite.models</span> <span class="kn">import</span> <span class="n">CaseCitation</span>

<span class="k">def</span> <span class="nf">url_from_citation</span><span class="p">(</span><span class="n">cite</span><span class="p">:</span> <span class="n">CaseCitation</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="s">"""Make a URL for linking to an opinion on case.law."""</span>
    <span class="n">reporter_abbreviations</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'U.S.'</span><span class="p">:</span> <span class="s">"us"</span><span class="p">,</span>
        <span class="s">"F. Supp."</span><span class="p">:</span> <span class="s">"f-supp"</span>
    <span class="p">}</span>
    <span class="n">reporter</span> <span class="o">=</span> <span class="n">reporter_abbreviations</span><span class="p">[</span><span class="n">cite</span><span class="p">.</span><span class="n">canonical_reporter</span><span class="p">]</span>

    <span class="k">if</span> <span class="n">cite</span><span class="p">.</span><span class="n">pin_cite</span><span class="p">:</span>
        <span class="c1"># Assumes that the first number in the pin_cite field is
</span>        <span class="c1"># the correct HTML fragment identifier for the URL.
</span>        <span class="n">page_number</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s">'\d+'</span><span class="p">,</span> <span class="n">cite</span><span class="p">.</span><span class="n">pin_cite</span><span class="p">).</span><span class="n">group</span><span class="p">()</span>
        <span class="n">fragment</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"p</span><span class="si">{</span><span class="n">page_number</span><span class="si">}</span><span class="s">"</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">fragment</span> <span class="o">=</span> <span class="s">""</span>

    <span class="n">url_parts</span> <span class="o">=</span> <span class="n">ParseResult</span><span class="p">(</span>
        <span class="n">scheme</span><span class="o">=</span><span class="s">'https'</span><span class="p">,</span>
        <span class="n">netloc</span><span class="o">=</span><span class="s">'cite.case.law'</span><span class="p">,</span>
        <span class="n">path</span><span class="o">=</span><span class="sa">f</span><span class="s">'/</span><span class="si">{</span><span class="n">reporter</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">cite</span><span class="p">.</span><span class="n">volume</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">cite</span><span class="p">.</span><span class="n">page</span><span class="si">}</span><span class="s">/'</span><span class="p">,</span>
        <span class="n">params</span><span class="o">=</span><span class="s">''</span><span class="p">,</span>
        <span class="n">query</span><span class="o">=</span><span class="s">''</span><span class="p">,</span>
        <span class="n">fragment</span><span class="o">=</span><span class="n">fragment</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">urlunparse</span><span class="p">(</span><span class="n">url_parts</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">url_from_citation</span><span class="p">(</span><span class="n">citations</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="s">'https://cite.case.law/us/347/201/#p219'</span>
</code></pre></div></div>
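<p>The function above raises an error if the reporter isn’t in the table, or if the pin cite contains no digits. Here’s a minimal sketch of a more forgiving variant. It takes plain strings instead of a CaseCitation object so that it runs without Eyecite installed. The names <code class="language-plaintext highlighter-rouge">caselaw_url</code> and <code class="language-plaintext highlighter-rouge">REPORTER_SLUGS</code> are hypothetical, and the table still covers only the two reporters from the mockup.</p>

```python
import re
from typing import Optional
from urllib.parse import urlunparse

# Hypothetical table: the same two-entry mockup as above.
# A real conversion table would need many more reporters.
REPORTER_SLUGS = {"U.S.": "us", "F. Supp.": "f-supp"}


def caselaw_url(
    reporter: str, volume: str, page: str, pin_cite: Optional[str] = None
) -> Optional[str]:
    """Build a case.law URL, or return None if the reporter is unknown."""
    slug = REPORTER_SLUGS.get(reporter)
    if slug is None:
        return None
    # Use the first run of digits in the pin cite, if there is one,
    # as the HTML fragment identifier.
    match = re.search(r"\d+", pin_cite or "")
    fragment = f"p{match.group()}" if match else ""
    return urlunparse(
        ("https", "cite.case.law", f"/{slug}/{volume}/{page}/", "", "", fragment)
    )


print(caselaw_url("U.S.", "347", "201", "219"))
print(caselaw_url("F.3d", "1", "1"))
```

<p>Returning None for an unknown reporter lets the calling code skip that citation instead of stopping partway through a document.</p>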

<p>Now I can write a short function to make annotations in the expected format, and then use Eyecite to insert these links in the text anywhere that Eyecite finds a case citation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">make_annotations</span><span class="p">(</span>
    <span class="n">citations</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">CaseCitation</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]]:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">cite</span> <span class="ow">in</span> <span class="n">citations</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">cite</span><span class="p">,</span> <span class="n">CaseCitation</span><span class="p">):</span>
            <span class="n">caselaw_url</span> <span class="o">=</span> <span class="n">url_from_citation</span><span class="p">(</span><span class="n">cite</span><span class="p">)</span>
            <span class="n">result</span><span class="p">.</span><span class="n">append</span><span class="p">(</span>
                <span class="p">(</span><span class="n">cite</span><span class="p">.</span><span class="n">span</span><span class="p">(),</span>
                <span class="sa">f</span><span class="s">'&lt;a href="</span><span class="si">{</span><span class="n">caselaw_url</span><span class="si">}</span><span class="s">"&gt;'</span><span class="p">,</span>
                <span class="s">"&lt;/a&gt;"</span><span class="p">)</span>
            <span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">annotations</span> <span class="o">=</span> <span class="n">make_annotations</span><span class="p">(</span><span class="n">discussion_citations</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">annotated_text</span> <span class="o">=</span> <span class="n">eyecite</span><span class="p">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">discussion_text</span><span class="p">,</span> <span class="n">annotations</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">annotated_text</span><span class="p">)</span>
<span class="n">Copyright</span> <span class="ow">and</span> <span class="n">patents</span><span class="p">,</span> <span class="n">the</span> <span class="n">Constitution</span> <span class="n">says</span><span class="p">,</span> <span class="n">are</span> <span class="n">to</span> <span class="err">“</span><span class="n">promote</span> <span class="n">the</span> <span class="n">Progress</span> <span class="n">of</span> <span class="n">Science</span> <span class="ow">and</span> <span class="n">useful</span> <span class="n">Arts</span><span class="p">,</span> <span class="n">by</span> <span class="n">securing</span> <span class="k">for</span> <span class="n">limited</span> <span class="n">Times</span> <span class="n">to</span> <span class="n">Authors</span> <span class="ow">and</span> <span class="n">Inventors</span> <span class="n">the</span> <span class="n">exclusive</span> <span class="n">Right</span> <span class="n">to</span> <span class="n">their</span> <span class="n">respective</span> <span class="n">Writings</span> <span class="ow">and</span> <span class="n">Discoveries</span><span class="p">.</span><span class="err">”</span> <span class="n">Art</span><span class="p">.</span> <span class="n">I</span><span class="p">,</span> <span class="err">§</span><span class="mi">8</span><span class="p">,</span> <span class="n">cl</span><span class="p">.</span> <span class="mf">8.</span> <span class="n">Copyright</span> <span class="n">statutes</span> <span class="ow">and</span> <span class="n">case</span> <span class="n">law</span> <span class="n">have</span> <span class="n">made</span> <span class="n">clear</span> <span class="n">that</span> <span class="n">copyright</span> <span class="n">has</span> <span class="n">practical</span> <span class="n">objectives</span><span class="p">.</span> <span class="n">It</span> <span class="n">grants</span> <span class="n">an</span> <span class="n">author</span> <span class="n">an</span> <span class="n">exclusive</span> <span 
class="n">right</span> <span class="n">to</span> <span class="n">produce</span> <span class="n">his</span> <span class="n">work</span> <span class="p">(</span><span class="n">sometimes</span> <span class="k">for</span> <span class="n">a</span> <span class="n">hundred</span> <span class="n">years</span> <span class="ow">or</span> <span class="n">more</span><span class="p">),</span> <span class="ow">not</span> <span class="k">as</span> <span class="n">a</span> <span class="n">special</span> <span class="n">reward</span><span class="p">,</span> <span class="n">but</span> <span class="ow">in</span> <span class="n">order</span> <span class="n">to</span> <span class="n">encourage</span> <span class="n">the</span> <span class="n">production</span> <span class="n">of</span> <span class="n">works</span> <span class="n">that</span> <span class="n">others</span> <span class="n">might</span> <span class="n">reproduce</span> <span class="n">more</span> <span class="n">cheaply</span><span class="p">.</span> <span class="n">At</span> <span class="n">the</span> <span class="n">same</span> <span class="n">time</span><span class="p">,</span> <span class="n">copyright</span> <span class="n">has</span> <span class="n">negative</span> <span class="n">features</span><span class="p">.</span> <span class="n">Protection</span> <span class="n">can</span> <span class="k">raise</span> <span class="n">prices</span> <span class="n">to</span> <span class="n">consumers</span><span class="p">.</span> <span class="n">It</span> <span class="n">can</span> <span class="n">impose</span> <span class="n">special</span> <span class="n">costs</span><span class="p">,</span> <span class="n">such</span> <span class="k">as</span> <span class="n">the</span> <span class="n">cost</span> <span class="n">of</span> <span class="n">contacting</span> <span class="n">owners</span> <span class="n">to</span> <span class="n">obtain</span> <span class="n">reproduction</span> <span class="n">permission</span><span 
class="p">.</span> <span class="n">And</span> <span class="n">the</span> <span class="n">exclusive</span> <span class="n">rights</span> <span class="n">it</span> <span class="n">awards</span> <span class="n">can</span> <span class="n">sometimes</span> <span class="n">stand</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">way</span> <span class="n">of</span> <span class="n">others</span> <span class="n">exercising</span> <span class="n">their</span> <span class="n">own</span> <span class="n">creative</span> <span class="n">powers</span><span class="p">.</span> <span class="n">See</span> <span class="n">generally</span> <span class="n">Twentieth</span> <span class="n">Century</span> <span class="n">Music</span> <span class="n">Corp</span><span class="p">.</span> <span class="n">v</span><span class="p">.</span> <span class="n">Aiken</span><span class="p">,</span> <span class="o">&lt;</span><span class="n">a</span> <span class="n">href</span><span class="o">=</span><span class="s">"https://cite.case.law/us/422/151/#p156"</span><span class="o">&gt;</span><span class="mi">422</span> <span class="n">U</span><span class="p">.</span> <span class="n">S</span><span class="p">.</span> <span class="mi">151</span><span class="o">&lt;/</span><span class="n">a</span><span class="o">&gt;</span><span class="p">,</span> <span class="mi">156</span> <span class="p">(</span><span class="mi">1975</span><span class="p">);</span> <span class="n">Mazer</span> <span class="n">v</span><span class="p">.</span> <span class="n">Stein</span><span class="p">,</span> <span class="o">&lt;</span><span class="n">a</span> <span class="n">href</span><span class="o">=</span><span class="s">"https://cite.case.law/us/347/201/#p219"</span><span class="o">&gt;</span><span class="mi">347</span> <span class="n">U</span><span class="p">.</span> <span class="n">S</span><span class="p">.</span> <span class="mi">201</span><span class="o">&lt;/</span><span class="n">a</span><span 
class="o">&gt;</span><span class="p">,</span> <span class="mi">219</span> <span class="p">(</span><span class="mi">1954</span><span class="p">).</span>
</code></pre></div></div>

<p>We can see that the <code class="language-plaintext highlighter-rouge">annotate</code> function has inserted hyperlink markup around the citations near the end of the text passage. And by displaying the text as Markdown, we can verify that the generated links go to the right places on case.law.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">display</span><span class="p">,</span> <span class="n">Markdown</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">display</span><span class="p">(</span><span class="n">Markdown</span><span class="p">(</span><span class="n">annotated_text</span><span class="p">))</span>
</code></pre></div></div>

<p>Copyright and patents, the Constitution says, are to “promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” Art. I, §8, cl. 8. Copyright statutes and case law have made clear that copyright has practical objectives. It grants an author an exclusive right to produce his work (sometimes for a hundred years or more), not as a special reward, but in order to encourage the production of works that others might reproduce more cheaply. At the same time, copyright has negative features. Protection can raise prices to consumers. It can impose special costs, such as the cost of contacting owners to obtain reproduction permission. And the exclusive rights it awards can sometimes stand in the way of others exercising their own creative powers. See generally Twentieth Century Music Corp. v. Aiken, <a href="https://cite.case.law/us/422/151/#p156">422 U. S. 151</a>, 156 (1975); Mazer v. Stein, <a href="https://cite.case.law/us/347/201/#p219">347 U. S. 201</a>, 219 (1954).</p>
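<p>To make the annotation format concrete, here’s a simplified stand-in for the insertion step. This is only a sketch: <code class="language-plaintext highlighter-rouge">apply_annotations</code> is a hypothetical helper, not Eyecite’s implementation. It assumes the spans don’t overlap and that the offsets refer to exactly the text being annotated. To keep the example short it wraps the citation in double square brackets rather than HTML tags.</p>

```python
def apply_annotations(text, annotations):
    """Wrap each (start, end) span of text in its before and after strings."""
    # Work from the last span to the first, so that inserting text
    # never shifts the offsets of spans still waiting to be processed.
    for (start, end), before, after in sorted(annotations, reverse=True):
        text = text[:start] + before + text[start:end] + after + text[end:]
    return text


sample = "See Mazer v. Stein, 347 U. S. 201, 219 (1954)."
start = sample.index("347")
end = start + len("347 U. S. 201")
print(apply_annotations(sample, [((start, end), "[[", "]]")]))
```

<p>Each annotation is a tuple of a span, a string to insert before the span, and a string to insert after it, which is the same shape that <code class="language-plaintext highlighter-rouge">make_annotations</code> returns.</p>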

<p>Overall, <a href="https://github.com/freelawproject/eyecite">Eyecite</a> is a powerful tool with great potential to help the legal field gain the benefits of Python’s data analysis and data science ecosystem.</p>]]></content><author><name>Matt Carey</name></author><category term="Eyecite" /><category term="Courts-DB" /><category term="Reporters-DB" /><category term="caselaw" /><summary type="html"><![CDATA[Around the beginning of 2021, the Free Law Project extracted the code that it’s been using to link case citations within CourtListener, and released it as a new open source Python package called Eyecite. I think Eyecite could become the most widely useful open source legal analysis tool to be released by anyone so far. It seems to have incredible potential for citation network analysis, and for preparing caselaw for natural language processing. I’m sure tools like this have existed inside commercial publishers for a long time, but providing these capabilities to open source developers could make a huge difference in expanding access to law.]]></summary></entry><entry><title type="html">Using Context to Automate Legal Comparisons and Explanations</title><link href="https://mscarey.github.io/2021/04/09/context-for-comparisons.html" rel="alternate" type="text/html" title="Using Context to Automate Legal Comparisons and Explanations" /><published>2021-04-09T19:00:00+00:00</published><updated>2021-04-09T19:00:00+00:00</updated><id>https://mscarey.github.io/2021/04/09/context-for-comparisons</id><content type="html" xml:base="https://mscarey.github.io/2021/04/09/context-for-comparisons.html"><![CDATA[<p>One of the most difficult AuthoritySpoke features for users to understand has been the ability for the Factors of legal rules to have “generic context” affecting how the rules can be compared to one another. 
This article will try to make that concept a little clearer, and also describe how to use contexts with comparison methods like <code class="language-plaintext highlighter-rouge">.means()</code>, <code class="language-plaintext highlighter-rouge">.implies()</code>, and <code class="language-plaintext highlighter-rouge">.contradicts()</code>.</p>

<h2 id="generic-in-authorityspoke">“Generic” in AuthoritySpoke</h2>

<p>As <a href="/2021/01/25/python-template-strings.html">mentioned before</a>, the text of a Factor is like a phrase, and the terms of the Factor are like the nouns that function as the subject and objects of the phrase. If a Factor’s terms are labeled as “generic”, then the Factor can be considered to have the same meaning as another Factor that has different terms, as long as the other Factor’s terms are also labeled “generic”.</p>

<p>Here’s an example (based on the <a href="https://nettlesome.readthedocs.io/en/latest/guides/context.html">documentation for Nettlesome</a>, which is a dependency of AuthoritySpoke). Each Entity object is “generic” by default, so all four of the “terms” in the example are considered “generic”. I’ve also given each Entity a generic-sounding name, instead of a proper noun, to emphasize that each Fact could refer to many different people. However, nothing in AuthoritySpoke will stop you from using proper nouns as the names of generic terms. All the examples in this article were tested on version 0.6.0 of AuthoritySpoke.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">authorityspoke</span> <span class="kn">import</span> <span class="n">Fact</span><span class="p">,</span> <span class="n">Entity</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">poet_payment</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span><span class="s">"$payor made a payment to $payee"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"the fierce philanthropist"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"the starving poet"</span><span class="p">)])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">henchman_payment</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span><span class="s">"$payor made a payment to $payee"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"the affable spy"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"the devious henchman"</span><span class="p">)])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">henchman_payment</span><span class="p">)</span>
<span class="n">the</span> <span class="n">fact</span> <span class="n">that</span> <span class="o">&lt;</span><span class="n">the</span> <span class="n">affable</span> <span class="n">spy</span><span class="o">&gt;</span> <span class="n">made</span> <span class="n">a</span> <span class="n">payment</span> <span class="n">to</span> <span class="o">&lt;</span><span class="n">the</span> <span class="n">devious</span> <span class="n">henchman</span><span class="o">&gt;</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">.means()</code> method can be used to determine whether two Facts have the same meaning.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">poet_payment</span><span class="p">.</span><span class="n">means</span><span class="p">(</span><span class="n">henchman_payment</span><span class="p">)</span>
<span class="bp">True</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">.explain_same_meaning()</code> method generates an Explanation showing why the two Facts can be considered to have the same meaning.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">poet_payment</span><span class="p">.</span><span class="n">explain_same_meaning</span><span class="p">(</span><span class="n">henchman_payment</span><span class="p">))</span>
<span class="s">"""Because &lt;the fierce philanthropist&gt; is like &lt;the affable spy&gt;, and &lt;the starving poet&gt; is like &lt;the devious henchman&gt;,
  the fact that &lt;the fierce philanthropist&gt; made a payment to &lt;the starving poet&gt;
MEANS
  the fact that &lt;the affable spy&gt; made a payment to &lt;the devious henchman&gt;"""</span>
</code></pre></div></div>

<p>The illustration above shows that when AuthoritySpoke finds two Facts to be the same, it means that the relationships they describe are the same. It doesn’t at all mean that the identities of the “generic” Entities are the same.</p>

<h2 id="identifying-parallel-generic-terms">Identifying Parallel Generic Terms</h2>

<p>In AuthoritySpoke, the “context” of a comparison dictates which generic terms are considered parallel to one another. If no context parameter is passed in to a comparison method like .means(), then the comparison method will try every permutation of both Facts’ generic terms to find ways to match them all. Passing in a context parameter limits the ways that the generic terms can be matched.</p>

<p>In this example, the two Facts are compared in a context where it has been established that the philanthropist is more like the henchman, and the poet is more like the spy. (After all, to AuthoritySpoke these names are just strings, and AuthoritySpoke has no idea that they don’t sound similar.) To create this context, we pass in a tuple of two lists: a list of terms on the left that will be replaced, and a list of replacements from the right in the corresponding order. Because the context precludes the first term of <code class="language-plaintext highlighter-rouge">poet_payment</code> from being matched to the first term of <code class="language-plaintext highlighter-rouge">henchman_payment</code>, the method that checks whether these Facts have the same meaning now returns False.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">poet_payment</span><span class="p">.</span><span class="n">means</span><span class="p">(</span><span class="n">henchman_payment</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">context</span><span class="o">=</span><span class="p">(</span>
<span class="p">...</span>         <span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"the fierce philanthropist"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"the starving poet"</span><span class="p">)],</span>
<span class="p">...</span>         <span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"the devious henchman"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"the affable spy"</span><span class="p">)])</span>
<span class="p">...</span>     <span class="p">)</span>
<span class="bp">False</span>
</code></pre></div></div>
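<p>To illustrate why the context changes the answer, here’s a toy model in plain Python. It is not AuthoritySpoke’s actual algorithm, and the function <code class="language-plaintext highlighter-rouge">same_meaning</code> and the tuple representation of a Fact are assumptions made up for this sketch. It treats a context as a set of pairings that are already fixed, and then checks whether pairing the terms position by position is consistent with them.</p>

```python
def same_meaning(left, right, context=None):
    """Toy check: same template, and terms pair up consistently with the context."""
    left_template, left_terms = left
    right_template, right_terms = right
    if left_template != right_template or len(left_terms) != len(right_terms):
        return False
    # Start from the pairings the context has already established.
    mapping = dict(context or {})
    for left_term, right_term in zip(left_terms, right_terms):
        # setdefault records a new pairing, or returns the existing one.
        if mapping.setdefault(left_term, right_term) != right_term:
            return False
    return True


template = "$payor made a payment to $payee"
poet = (template, ("the fierce philanthropist", "the starving poet"))
henchman = (template, ("the affable spy", "the devious henchman"))
swapped = {
    "the fierce philanthropist": "the devious henchman",
    "the starving poet": "the affable spy",
}
print(same_meaning(poet, henchman))
print(same_meaning(poet, henchman, context=swapped))
```

<p>With no context the terms pair up in order and the check succeeds. With the swapped context, the philanthropist is already paired with the henchman, so it can’t also be paired with the spy, and the check fails.</p>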

<p>Instead of using <code class="language-plaintext highlighter-rouge">.means()</code>, we could also have used the context parameter in the same format to test whether the first Factor <code class="language-plaintext highlighter-rouge">.implies()</code> or <code class="language-plaintext highlighter-rouge">.contradicts()</code> the other. There’s more information about using a context parameter with comparison methods <a href="https://nettlesome.readthedocs.io/en/latest/guides/context.html#formatting-contexts-for-comparison-methods">in the Nettlesome documentation</a>.</p>

<h2 id="using-context-to-generalize-legal-holdings">Using Context to Generalize Legal Holdings</h2>

<p>Up to now, we’ve only compared Factors outside the scope of any legal rule or judicial holding. To build on the ideas above and do a little real legal analysis, let’s see how generic context works in an example Holding based on <a href="https://www.courtlistener.com/opinion/4843029/united-states-v-harmon/">United States v. Harmon</a>, a recent case from the U.S. District Court for the District of Columbia.</p>

<p>First, we use AuthoritySpoke’s legislation download client to get the statute being interpreted. If you try out this code yourself, you’ll need to follow the <a href="https://legislice.readthedocs.io/en/latest/guides/downloading.html#using-an-api-token">directions for creating an environment variable called “LEGISLICE_API_TOKEN”</a>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">os</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">dotenv</span> <span class="kn">import</span> <span class="n">load_dotenv</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">authorityspoke.io.downloads</span> <span class="kn">import</span> <span class="n">Client</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">load_dotenv</span><span class="p">()</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">LEGISLICE_API_TOKEN</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s">"LEGISLICE_API_TOKEN"</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">CLIENT</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">api_token</span><span class="o">=</span><span class="n">LEGISLICE_API_TOKEN</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">offense_statute</span> <span class="o">=</span> <span class="n">CLIENT</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="s">"/us/usc/t18/s1960/a"</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">offense_statute</span><span class="p">)</span>
<span class="s">'"Whoever knowingly conducts, controls, manages, supervises, directs, or owns all or part of an unlicensed money transmitting business, shall be fined in accordance with this title or imprisoned not more than 5 years, or both." (/us/usc/t18/s1960/a 2013-07-18)'</span>
</code></pre></div></div>

<p>Next, we create the Facts that the court found to be relevant to the elements of a criminal offense, and we combine them into a Holding.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">authorityspoke</span> <span class="kn">import</span> <span class="n">Entity</span><span class="p">,</span> <span class="n">Fact</span><span class="p">,</span> <span class="n">Holding</span><span class="p">,</span> <span class="n">Predicate</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">no_license</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span>
<span class="p">...</span>     <span class="s">"$business was licensed as a money transmitting business"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">truth</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Helix"</span><span class="p">))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">operated</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span>
<span class="p">...</span>     <span class="s">"$person operated $business as a business"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Harmon"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"Helix"</span><span class="p">)])</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">transmitting</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span>
<span class="p">...</span>     <span class="s">"$business was a money transmitting business"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Helix"</span><span class="p">))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">offense</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span>
<span class="p">...</span>     <span class="s">"$person committed the offense of conducting an unlicensed money transmitting business"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Harmon"</span><span class="p">))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">offense_holding</span> <span class="o">=</span> <span class="n">Holding</span><span class="p">.</span><span class="n">from_factors</span><span class="p">(</span>
<span class="p">...</span>     <span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">operated</span><span class="p">,</span> <span class="n">transmitting</span><span class="p">,</span> <span class="n">no_license</span><span class="p">],</span>
<span class="p">...</span>     <span class="n">outputs</span><span class="o">=</span><span class="n">offense</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">enactments</span><span class="o">=</span><span class="n">offense_statute</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">universal</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<p>This Holding simply says that if a person has committed the elements of the offense, the court may convict the person of the offense.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">offense_holding</span><span class="p">)</span>
<span class="s">"""the Holding to ACCEPT
  the Rule that the court MAY ALWAYS impose the
    RESULT:
      the fact that &lt;Harmon&gt; committed the offense of conducting an
      unlicensed money transmitting business
    GIVEN:
      the fact that &lt;Harmon&gt; operated &lt;Helix&gt; as a business
      the fact that &lt;Helix&gt; was a money transmitting business
      the fact it was false that &lt;Helix&gt; was licensed as a money
      transmitting business
    GIVEN the ENACTMENT:
      "Whoever knowingly conducts, controls, manages, supervises, directs, or owns all or part of an unlicensed money transmitting business, shall be fined in accordance with this title or imprisoned not more than 5 years, or both." (/us/usc/t18/s1960/a 2013-07-18)"""</span>
</code></pre></div></div>

<p>And then we can create the more important Holding of the case, in which the court found that a bitcoin transmitting business met the statutory definition of a “money transmitting” business requiring a license.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">definition_statute</span> <span class="o">=</span> <span class="n">CLIENT</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="s">"/us/usc/t18/s1960/b/2"</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">bitcoin</span> <span class="o">=</span> <span class="n">Fact</span><span class="p">(</span>
<span class="p">...</span>     <span class="s">"$business transferred bitcoin on behalf of the public"</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">terms</span><span class="o">=</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Helix"</span><span class="p">))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">bitcoin_holding</span> <span class="o">=</span> <span class="n">Holding</span><span class="p">.</span><span class="n">from_factors</span><span class="p">(</span>
<span class="p">...</span>     <span class="n">inputs</span><span class="o">=</span><span class="n">bitcoin</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">outputs</span><span class="o">=</span><span class="n">transmitting</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">enactments</span><span class="o">=</span><span class="n">definition_statute</span><span class="p">,</span>
<span class="p">...</span>     <span class="n">universal</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">bitcoin_holding</span><span class="p">)</span>
<span class="s">"""the Holding to ACCEPT
  the Rule that the court MAY ALWAYS impose the
    RESULT:
      the fact that &lt;Helix&gt; was a money transmitting business
    GIVEN:
      the fact that &lt;Helix&gt; transferred bitcoin on behalf of the public
    GIVEN the ENACTMENT:
      "the term “money transmitting” includes transferring funds on behalf of the public by any and all means including but not limited to transfers within this country or to locations abroad by wire, check, draft, facsimile, or courier; and" (/us/usc/t18/s1960/b/2 2013-07-18)"""</span>
</code></pre></div></div>

<p>By adding the two Holdings above, we get a new Holding indicating that if a person operated a business that transferred bitcoin on behalf of the public without a “money transmitting business” license, the person may be found guilty of the offense. To generate this Holding, AuthoritySpoke finds that the term named “Helix” in <code class="language-plaintext highlighter-rouge">bitcoin_holding</code> can be matched to the term with the same name in <code class="language-plaintext highlighter-rouge">offense_holding</code>, so the output of the first Holding satisfies the “money transmitting business” input of the second. The term “Harmon” appears only in <code class="language-plaintext highlighter-rouge">offense_holding</code> and carries over unchanged.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">bitcoin_holding</span> <span class="o">+</span> <span class="n">offense_holding</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="s">"""the Holding to ACCEPT
  the Rule that the court MAY ALWAYS impose the
    RESULT:
      the fact that &lt;Harmon&gt; committed the offense of conducting an
      unlicensed money transmitting business
      the fact that &lt;Helix&gt; was a money transmitting business
    GIVEN:
      the fact that &lt;Harmon&gt; operated &lt;Helix&gt; as a business
      the fact it was false that &lt;Helix&gt; was licensed as a money
      transmitting business
      the fact that &lt;Helix&gt; transferred bitcoin on behalf of the public
    GIVEN the ENACTMENTS:
      "Whoever knowingly conducts, controls, manages, supervises, directs, or owns all or part of an unlicensed money transmitting business, shall be fined in accordance with this title or imprisoned not more than 5 years, or both." (/us/usc/t18/s1960/a 2013-07-18)
      "the term “money transmitting” includes transferring funds on behalf of the public by any and all means including but not limited to transfers within this country or to locations abroad by wire, check, draft, facsimile, or courier; and" (/us/usc/t18/s1960/b/2 2013-07-18)"""</span>
</code></pre></div></div>
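
<p>The way addition moves the “money transmitting business” Fact out of the GIVEN list can be illustrated with a minimal plain-Python sketch. This is not AuthoritySpoke’s implementation: <code class="language-plaintext highlighter-rouge">SimpleRule</code> is a hypothetical class that compares facts as exact strings, whereas AuthoritySpoke matches Factors through their generic terms. Returning <code class="language-plaintext highlighter-rouge">None</code> when the rules cannot be chained is a choice made for this sketch.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRule:
    """Hypothetical stand-in for a Holding, with plain-string facts."""

    inputs: frozenset
    outputs: frozenset

    def __add__(self, other):
        # Chaining only works if this rule's outputs supply
        # at least one of the other rule's inputs.
        supplied = self.outputs.intersection(other.inputs)
        if not supplied:
            return None
        # Inputs already established by this rule are no longer required.
        remaining = other.inputs.difference(supplied)
        return SimpleRule(
            inputs=self.inputs.union(remaining),
            outputs=self.outputs.union(other.outputs),
        )


bitcoin_rule = SimpleRule(
    inputs=frozenset({"Helix transferred bitcoin on behalf of the public"}),
    outputs=frozenset({"Helix was a money transmitting business"}),
)
offense_rule = SimpleRule(
    inputs=frozenset({
        "Harmon operated Helix as a business",
        "Helix was a money transmitting business",
        "Helix was not licensed as a money transmitting business",
    }),
    outputs=frozenset({"Harmon committed the offense"}),
)

combined = bitcoin_rule + offense_rule
for fact in sorted(combined.inputs):
    print("GIVEN:", fact)
for fact in sorted(combined.outputs):
    print("RESULT:", fact)
```

<p>In this sketch the order of the operands matters: <code class="language-plaintext highlighter-rouge">offense_rule + bitcoin_rule</code> yields <code class="language-plaintext highlighter-rouge">None</code>, because the offense rule’s output does not supply any input of the bitcoin rule.</p>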

<p>Finally, once that Holding is created, we can generalize it by supplying a new context, which applies the same Holding to different generic terms.</p>
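
<p>Conceptually, a context is a mapping from placeholders to the terms that fill them. The plain-Python sketch below shows that idea with the standard library’s <code class="language-plaintext highlighter-rouge">string.Template</code>. The names <code class="language-plaintext highlighter-rouge">FACT_TEMPLATES</code> and <code class="language-plaintext highlighter-rouge">apply_context</code> are hypothetical and are not part of AuthoritySpoke.</p>

```python
from string import Template

# Hypothetical fact templates; "$person" and "$business" mark the generic terms.
FACT_TEMPLATES = [
    Template("$person operated $business as a business"),
    Template("$business transferred bitcoin on behalf of the public"),
]


def apply_context(templates, context):
    """Fill every template with the terms named in the context mapping."""
    return [template.substitute(context) for template in templates]


original = apply_context(
    FACT_TEMPLATES, {"person": "Harmon", "business": "Helix"})
generalized = apply_context(
    FACT_TEMPLATES, {"person": "Schrute", "business": "Schrute Bucks"})

for before, after in zip(original, generalized):
    print(before, "becomes:", after)
```

<p>The AuthoritySpoke call that follows applies the same idea to the combined Holding.</p>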

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="n">result_with_new_context</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">new_context</span><span class="p">(</span>
<span class="p">...</span>     <span class="p">([</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Harmon"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"Helix"</span><span class="p">)],</span>
<span class="p">...</span>      <span class="p">[</span><span class="n">Entity</span><span class="p">(</span><span class="s">"Schrute"</span><span class="p">),</span> <span class="n">Entity</span><span class="p">(</span><span class="s">"Schrute Bucks"</span><span class="p">)]))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span><span class="p">(</span><span class="n">result_with_new_context</span><span class="p">)</span>
<span class="s">"""the Holding to ACCEPT
  the Rule that the court MAY ALWAYS impose the
    RESULT:
      the fact that &lt;Schrute&gt; committed the offense of conducting an
      unlicensed money transmitting business
      the fact that &lt;Schrute Bucks&gt; was a money transmitting business
    GIVEN:
      the fact that &lt;Schrute&gt; operated &lt;Schrute Bucks&gt; as a business
      the fact it was false that &lt;Schrute Bucks&gt; was licensed as a money
      transmitting business
      the fact that &lt;Schrute Bucks&gt; transferred bitcoin on behalf of the
      public
    GIVEN the ENACTMENTS:
      "Whoever knowingly conducts, controls, manages, supervises, directs, or owns all or part of an unlicensed money transmitting business, shall be fined in accordance with this title or imprisoned not more than 5 years, or both." (/us/usc/t18/s1960/a 2013-07-18)
      "the term “money transmitting” includes transferring funds on behalf of the public by any and all means including but not limited to transfers within this country or to locations abroad by wire, check, draft, facsimile, or courier; and" (/us/usc/t18/s1960/b/2 2013-07-18)"""</span>
</code></pre></div></div>]]></content><author><name>Matt Carey</name></author><category term="Nettlesome" /><category term="AuthoritySpoke" /><summary type="html"><![CDATA[One of the most difficult AuthoritySpoke features for users to understand has been the ability for the Factors of legal rules to have “generic context” affecting how the rules can be compared to one another. This article will try to make that concept a little clearer, and also describe how to use contexts with comparison methods like .means(), .implies(), and .contradicts().]]></summary></entry></feed>