Thursday, January 24, 2008

PreCYdent legal search engine
Tom Smith

Here is the link to the alpha version of the PreCYdent legal search engine created by the startup I co-founded with Antonio Tomarchio, a mathematical engineer from the Politecnico di Milano, and a team of very dedicated engineers in Italy.

Right now our library consists of all US Supreme Court cases and US Court of Appeals cases going back to the 1950s (i.e. F.3d and F.2d).  Automatic updaters are in place, so new cases are uploaded in slip opinion form as soon as they are released by these courts.  We are working on having the last ten years of cases from all 50 states available soon.  Everything is in XML.

It's free.  We believe that all law that is in the public domain should be available to everybody for free.   Personally, I think I paid for it once already around April 15th or so.

We are especially proud of our search technology.  It is based on the legal citation network in a way somewhat analogous to how Google's PageRank is based on the link network of the Web.    However, the legal citation network is its own animal, so a lot of work was required to create an algorithm that would exploit the unique characteristics of the Web of Law.  PreCYdent ranks results by "authority" -- something only we do.  It is much, much more sophisticated than mere citation count, and it appears to work really well.
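To make the PageRank analogy concrete, here is a minimal sketch of plain PageRank run over a toy citation graph. It is not PreCYdent's algorithm, which as noted above differs from a straight port of the Web approach. The function name `pagerank`, the case labels, the damping factor of 0.85, and the iteration count are all illustrative assumptions.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Score cases by who cites them. `citations` maps a case to the cases it cites.

    Textbook PageRank by power iteration; all parameters are assumed values.
    """
    # Collect every case that appears as a citing or a cited opinion.
    cases = set(citations)
    for cited in citations.values():
        cases.update(cited)
    cases = sorted(cases)
    n = len(cases)

    # Start with equal authority for every case.
    rank = {case: 1.0 / n for case in cases}

    for _ in range(iterations):
        new_rank = {case: (1.0 - damping) / n for case in cases}
        for case in cases:
            cited = citations.get(case, [])
            if cited:
                # A case passes its authority, split evenly, to the cases it cites.
                share = damping * rank[case] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A case that cites nothing spreads its authority over all cases.
                share = damping * rank[case] / n
                for target in cases:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-case network: later cases cite earlier ones.
toy_citations = {"B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
scores = pagerank(toy_citations)
for case in sorted(scores, key=scores.get, reverse=True):
    print(case, round(scores[case], 3))
```

In this toy network the oldest case, cited by all the others, comes out on top, which is the intuition behind ranking by authority rather than by raw citation count: a citation from a heavily cited case is worth more than one from an obscure case.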

Search quality is hard to measure, but based on our tests, we believe our search recall and precision are on the order of three to four times better than those of the market leader's natural language search.  Try it out and see what you think.  You can do this experiment at home:  Write down a Google-style search string ("takings private use commercial development" or whatever), then write down the most important cases you would expect that search to return (preferably 20, but as many as you can).  Run that search string on PreCYdent (using the "authority" ranking) and also on the natural language search of the leading online legal research service (which, unless you are at a big law firm or are a law student or law professor, won't be free).  Count how many of the cases you wrote down appear in the first 20 results on their service and in the first 20 results on PreCYdent.  We did that with 200 searches and were very pleased with the results.
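If you want to keep score for several searches, the tally described above can be sketched in a few lines. This is only an illustration of the counting step, not our test harness: the function names, the placeholder case names, and the assumption that case names match exactly as strings are all hypothetical.

```python
def hits_in_top_k(expected_cases, results, k=20):
    """Count how many of the cases you wrote down appear in the first k results."""
    top = set(results[:k])
    return sum(1 for case in set(expected_cases) if case in top)


def recall_at_k(expected_cases, results, k=20):
    """Fraction of the cases you wrote down that appear in the first k results."""
    expected = set(expected_cases)
    if not expected:
        return 0.0
    return hits_in_top_k(expected, results, k) / len(expected)


# Placeholder data: the list you wrote down, and ranked results from two engines.
expected = ["Case A", "Case B", "Case C", "Case D"]
results_by_engine = {
    "engine_one": ["Case A", "Case X", "Case B", "Case C"],
    "engine_two": ["Case X", "Case Y", "Case A"],
}

for engine, results in results_by_engine.items():
    print(engine, hits_in_top_k(expected, results), recall_at_k(expected, results))
```

Dividing the hit count by 20 instead of by the length of your list gives a rough precision figure for the same search, with your list standing in for the set of relevant cases.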

There's a feedback button on the upper right of results pages.  This feedback goes directly to the team in Italy and they will take your comments very seriously.  This is a true alpha;  we are still very much in development.  There are plenty of rough spots, but we think early users will help us fix those.



I'll try this out on my Lawyering Skills appellate brief... looks like it would be useful!

Posted by: Andrew | Jan 24, 2008 8:43:01 PM


I'm also looking forward to the "introduction to the law" section being populated.

Do you make your money via advertising?

Corkie the Dog

Posted by: Corkie the Dog | Jan 25, 2008 10:50:48 AM

Yes -- the model is to be ad-supported and Web 2.0. Even 3.0 if we manage to deploy some things we are working on.

We are going to populate the "upload documents" section with a bunch of stuff that is in the public domain but not that easy to access or search, then invite people to upload away. Intro to law will be a lot of stuff for people just looking to find out the basics about areas of law of interest to the general public.

Posted by: Tom Smith | Jan 25, 2008 3:19:54 PM

If it has easily browsable versions of FRCP / FRAP / USC / etc, I'll use it constantly. Both Lexis and Westlaw make it far too time consuming to just find the text of a simple rule / statute - I usually end up Googling and reading it off the Cornell website or something similar.

Posted by: Andrew | Jan 25, 2008 3:39:09 PM

Professor --

Here's one thing you should fix -- searches involving hyphenated terms are, in essence, not allowed. For example, I tried a search for "courts-martial jurisdiction over civilians." The PreCYdent search reads this the same as a search for "courts -martial" -- and excludes any opinions with the word "martial" therein. The space is the key -- without it, the search engine should search for the term as it is written, hyphenated. With the space, the search should exclude the term following the "negative" sign. This is how Google works (contrast a search for courts-martial with a search for courts -martial).
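A minimal sketch of that rule, assuming a simple whitespace tokenizer. The function name `parse_query` is hypothetical, and this is not PreCYdent's or Google's actual parser, only an illustration of treating a leading hyphen as "exclude" and an internal hyphen as part of the term:

```python
def parse_query(query):
    """Split a query into terms to include and terms to exclude.

    A hyphen at the start of a token (i.e. right after a space) marks an
    excluded term; a hyphen inside a token is kept as part of the term.
    """
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            # "courts -martial": drop the sign and exclude "martial".
            exclude.append(token[1:].lower())
        else:
            # "courts-martial": keep the hyphenated term as written.
            # A lone "-" also falls through here and is kept as a literal.
            include.append(token.lower())
    return include, exclude


print(parse_query("courts-martial jurisdiction over civilians"))
print(parse_query("courts -martial"))
```

The first query keeps "courts-martial" as a single search term with nothing excluded; the second searches for "courts" and excludes "martial".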

Posted by: USD 3L | Jan 26, 2008 1:54:36 PM

USD 3L -- we will check that out

Posted by: Tom Smith | Jan 26, 2008 7:16:24 PM