I tried out the CollateX Python library to see whether it would be useful for visualizing similar text passages about legal doctrines, especially in caselaw. I used a very simple example dataset: a holding from Loper Bright Enterprises v. Raimondo (the opinion that abolished Chevron deference) and two passages from later opinions that restate the holding of Raimondo (Tennessee v. Becerra and United States v. Trumbull). You can find the exact code I used in this Jupyter Notebook.

I thought the visualization from CollateX’s out-of-the-box “html” output mode was potentially useful. In this case, it might help readers notice that while the Supreme Court said a certain “argument” was not a basis for overruling a holding, one of the lower courts (Trumbull) restated this as saying that “mere error” was not a basis for overruling.

Here’s the table CollateX produced to compare the three passages.

| Raimondo | Trumbull | Becerra |
|---|---|---|
| The | The | The |
| holdings of those cases that specific agency actions are lawful—including the Clean Air Act holding of | - | Supreme |
| - | Court | Court |
| - | acknowledged its “ | cautioned litigants hoping to rehash or relitigate previously settled issues decided on |
| Chevron | - | Chevron |
| itself—are still subject to statutory stare decisis despite our | - | that “[m]ere |
| change in interpretive methodology | change in interpretive methodology | - |
| . Mere | - | - |
| reliance on Chevron cannot constitute a | - | reliance on Chevron cannot constitute a |
| special justification | - | special justification |
| for overruling such a holding | meant that these precedents were “wrongly decided,” but explained | for overruling such a holding |
| , because to say a precedent relied on Chevron is | - | .” To argue as much, the Court continued |
| , | - | , |
| - | - | would “ |
| at best | - | at best |
| , “ | - | ,” be “ |
| just an argument | - | just an argument |
| that | that | that |
| the precedent was wrongly decided | mere error | the precedent was wrongly decided |
| - | - | .” Id |
| . | - | . |
| That | - | And this, as the majority concluded, |
| is | is | is |
| not enough to justify overruling a statutory precedent | not enough to justify overruling a statutory precedent | not enough to justify overruling a statutory precedent |
| . | .” | .” |
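As a rough illustration of the kind of alignment shown in the table, here is a minimal stdlib-only sketch. It aligns two shortened excerpts word by word and marks gaps with “-”. It uses Python’s `difflib.SequenceMatcher`, not CollateX’s own alignment algorithm, and it only handles two witnesses. The helper name `align_pair` is hypothetical, and the excerpts have their quotation marks removed.

```python
from difflib import SequenceMatcher

# Shortened excerpts from the table above (quotation marks removed).
RAIMONDO_EXCERPT = "just an argument that the precedent was wrongly decided. That is not enough"
TRUMBULL_EXCERPT = "that mere error is not enough"


def align_pair(a: str, b: str) -> list[tuple[str, str]]:
    """Align two passages word by word (hypothetical helper, not CollateX).

    Returns (a_segment, b_segment) rows. Runs of identical words appear in
    both columns; a side with no corresponding words gets "-", imitating
    the gap marker in CollateX's tables.
    """
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    rows = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = " ".join(a_words[i1:i2]) or "-"
        right = " ".join(b_words[j1:j2]) or "-"
        rows.append((left, right))
    return rows


if __name__ == "__main__":
    for left, right in align_pair(RAIMONDO_EXCERPT, TRUMBULL_EXCERPT):
        print(f"{left:<42} | {right}")
```

The third row this produces pairs Raimondo’s “the precedent was wrongly decided. That” with Trumbull’s “mere error”. This is roughly the same substitution the CollateX table makes visible, although CollateX groups the surrounding words differently.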

I found the “svg” and “svg_simple” modes not very readable, because the graphs were chaotic and cluttered with arrows and labels. I didn’t find any option to orient the SVG graphs vertically, so even this relatively short text produced an extremely wide image, which both Jupyter and Jekyll shrank to an unreadable size.

[Figure: left-to-right flowchart comparing the three opinion snippets]

(CollateX also has an “html2” mode that uses eyeball-melting background colors to “draw your attention” to cells containing unique text.)

CollateX has a “fuzzy” matching mode, in which similar but non-identical strings can be aligned side by side in the visualization. However, fuzzy matching is only available when CollateX is set to put a single word, or “token,” in each cell of the table, which would probably make the table far less readable. There are also JSON and XML output modes that could be used to export the data to other visualization tools.
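To make token-level fuzzy matching more concrete, here is a minimal stdlib sketch. It scores string similarity with `difflib.SequenceMatcher.ratio()`, which is a stand-in and not necessarily the measure CollateX uses. The function name `near_match_token` and the 0.75 threshold are hypothetical choices. In the CollateX Python package this mode appears to correspond to the `near_match=True` option used together with `segmentation=False`, but treat that as an assumption to check against the package documentation. The example uses Becerra’s bracketed “[m]ere,” which ended up in a different row from Raimondo’s “Mere” in the table.

```python
from difflib import SequenceMatcher
from typing import Optional


def near_match_token(token: str, candidates: list[str], threshold: float = 0.75) -> Optional[str]:
    """Return the candidate most similar to `token`, or None if no candidate
    reaches `threshold`. Similarity is difflib's ratio on lowercased strings
    (an assumption; CollateX's own near matching may score differently)."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, token.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Becerra's bracketed "[m]ere" vs. Raimondo's "Mere": the ratio is 0.8.
print(near_match_token("[m]ere", ["Mere", "reliance", "Chevron"]))  # Mere
# Trumbull's "error" is not close to any of these words.
print(near_match_token("error", ["argument", "wrongly", "decided"]))  # None
```

Because the comparison is made one token at a time, this kind of matching fits naturally with a one-token-per-cell table. That is consistent with the readability trade-off described above.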

The primary use case for CollateX is comparing alternate versions of the same work, which is not exactly what I did here. A later court opinion might restate a significant passage from an earlier opinion, but it might also insert commentary or quote parts of the earlier opinion out of order. If two texts aren’t meant to be the same, a collation showing that they have lots of differences may not be useful. To decide whether it’s appropriate to show a text collation to a user, a publisher would need a reliable way of detecting when a later court is trying to restate a holding from an earlier opinion without using the exact same words.
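One naive starting point for that detection problem is to measure how much of the earlier passage’s wording reappears, in order, in the later passage. The sketch below does this with `difflib.SequenceMatcher` over lowercased words. The names `restatement_score` and `looks_like_restatement` and the 0.5 threshold are hypothetical. A purely lexical score like this would tend to miss paraphrases, which is where a more reliable method would be needed.

```python
from difflib import SequenceMatcher


def restatement_score(earlier: str, later: str) -> float:
    """Fraction of the earlier passage's words that reappear, in order,
    in the later passage (hypothetical heuristic, lexical only)."""
    earlier_words = earlier.lower().split()
    later_words = later.lower().split()
    if not earlier_words:
        return 0.0
    matcher = SequenceMatcher(None, earlier_words, later_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(earlier_words)


def looks_like_restatement(earlier: str, later: str, threshold: float = 0.5) -> bool:
    """Flag a later passage as a possible restatement (threshold is an assumption)."""
    return restatement_score(earlier, later) >= threshold


raimondo = "That is not enough to justify overruling a statutory precedent."
becerra = "And this, as the majority concluded, is not enough to justify overruling a statutory precedent."
print(restatement_score(raimondo, becerra))  # 0.9
```

Now compare Raimondo’s “just an argument that the precedent was wrongly decided” with Trumbull’s “that mere error is not enough.” The two passages share only the word “that” in order, so this heuristic would not flag Trumbull’s paraphrase.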

CollateX is available under GNU General Public License v3.0.