Comparing Some Opinion Text with CollateX
I tried out the CollateX Python library to see whether it would be useful for visualizing similar text passages about legal doctrines, especially caselaw. I used a very simple example dataset: a holding from Loper Bright Enterprises v. Raimondo (the opinion that abolished Chevron deference) and two passages from later opinions, Tennessee v. Becerra and United States v. Trumbull, that restate that holding. You can find the exact code I used in this Jupyter Notebook.
I thought CollateX’s out-of-the-box “html” mode visualization was potentially useful. In this case it might help readers notice that while the Supreme Court said a certain “argument” was not a basis for overruling a holding, one of the lower courts restated this to assert that an “error” was not a basis for overruling.
Here’s the table CollateX produced to compare the three passages.
Raimondo | Trumbull | Becerra
---|---|---
The | The | The
holdings of those cases that specific agency actions are lawful—including the Clean Air Act holding of | - | -
- | Supreme | -
- | Court | Court
- | acknowledged its “ | cautioned litigants hoping to rehash or relitigate previously settled issues decided on
Chevron | - | Chevron
itself—are still subject to statutory stare decisis despite our | - | -
- | - | that “[m]ere
change in interpretive methodology | change in interpretive methodology | -
. Mere | - | -
reliance on Chevron cannot constitute a | - | reliance on Chevron cannot constitute a
” | - | -
special justification | - | special justification
” | ” | -
for overruling such a holding | meant that these precedents were “wrongly decided,” but explained | for overruling such a holding
, because to say a precedent relied on Chevron is | - | .” To argue as much, the Court continued
, | - | ,
- | - | would “
at best | - | at best
, “ | - | ,” be “
just an argument | - | just an argument
that | that | that
the precedent was wrongly decided | mere error | the precedent was wrongly decided
- | - | .” Id
. | - | .
That | - | And this, as the majority concluded,
is | is | is
- | ” | ”
not enough to justify overruling a statutory precedent | not enough to justify overruling a statutory precedent | not enough to justify overruling a statutory precedent
. | .” | .”
I thought the “svg” and “svg_simple” modes were not very readable, because they were a little too chaotic and cluttered with arrows and labels. I didn’t find any option for the svg charts to be vertically oriented, so even with this relatively short text I got an extremely wide image, which Jupyter and Jekyll both wanted to shrink down to an unreadable size.
(CollateX also has an “html2” mode that uses eyeball-melting background colors to “draw your attention” to cells containing unique text.)
CollateX has a “fuzzy” matching mode, where similar strings can be shown side-by-side in the visualization. However, fuzzy matching is available only when CollateX is set to put a single word or “token” in each cell of the table, which seems like it would be far less readable. There are also JSON and XML output modes that could be used to export the data to alternate visualization tools.
The primary use case of CollateX is comparing alternate versions of the same work, which is not exactly what I did here. While a later court opinion might choose to restate a significant passage from an earlier opinion, it might also insert commentary or quote parts of the earlier opinion out of order. If the two texts aren’t intended to be the same, then it may not be useful for a collation to show that there are lots of differences. To determine whether it’s appropriate to show a text collation to a user, a publisher would need a reliable way of detecting situations where a later court is trying to restate a holding from an earlier opinion, but not using the exact same words.
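As a rough illustration of what that detection step might look like, here is a sketch using Python's standard-library difflib rather than anything CollateX provides; the function name and threshold are my own invention:

```python
import difflib

def looks_like_restatement(earlier: str, later: str, threshold: float = 0.6) -> bool:
    """Heuristic: treat the later passage as a candidate restatement when
    its word sequence overlaps heavily with the earlier holding."""
    matcher = difflib.SequenceMatcher(
        None, earlier.lower().split(), later.lower().split()
    )
    return matcher.ratio() >= threshold

holding = "mere reliance on chevron cannot constitute a special justification"
restated = "reliance on chevron cannot constitute a special justification for overruling"
unrelated = "the statute of limitations barred the claim entirely"

print(looks_like_restatement(holding, restated))   # True: heavy word overlap
print(looks_like_restatement(holding, unrelated))  # False: little overlap
```

A real publisher would presumably need something more robust (quotation marks, citation metadata, and so on), but a similarity gate like this could at least decide when a collation view is worth showing.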
CollateX is available under GNU General Public License v3.0.