A few weeks ago, Dennis Crouch shed light on the fact that the USPTO web server housing decisions of the Board of Patent Appeals and Interferences blocks search engine bots from crawling and indexing the decisions. Dennis linked directly to the server’s robots.txt file, which, as of today, still blocks access by all bots.
I had the chance to exchange emails with Hal Wegner about this problem and a proposed solution over the weekend, and figured I’d summarize our discussion on the blog in hopes of eliciting further input on a proposed solution.
The problem – search engines are actively prohibited from crawling and indexing all decisions of the Board of Patent Appeals and Interferences. The Office does provide a search feature on its ‘BPAI Reading Room’ page, but this solution suffers from many drawbacks:
- First, the search interface is clunky and confusing (paralleling the rest of the PTO website, I suppose)
- Second, it’s not clear if the search engine accurately parses boolean search strings
- Third, results of text searches do not provide ‘results snippets’ with highlighted search terms
- Fourth, the search page is incredibly slow, even when handling relatively simple queries
Together, these drawbacks make it difficult at best to use the PTO solution for research purposes.
The cause – as highlighted by Dennis, the server has a robots.txt file that prevents search engine bots from crawling and indexing the decision files. The use of robots.txt files is a standard (and good) practice in the maintenance of a web server. But the blocking that these files implement can be tailored so that search engine bots can access (and index) the content in particular directories on a web server while being blocked from others.
(As an example of such a ‘tailored’ robots.txt file, feel free to view mine from Promote the Progress, which is always available for anyone to see: http://promotetheprogress.com/robots.txt)
Unfortunately, the current PTO robots.txt file (http://des.uspto.gov/robots.txt) for the PTO server housing BPAI decisions blocks access by all bots to all directories on the server.
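To make the effect concrete, here is a minimal Python sketch using the standard library's urllib.robotparser module. It assumes the file takes the conventional block-everything form (a wildcard user agent followed by a Disallow rule for the root path); the exact contents of the PTO file are not reproduced here, and the decision URL and the third bot name are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the conventional "block every bot from everything" form.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a bot that honors robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical decision URL, for illustration only.
url = "http://des.uspto.gov/example/decision.pdf"
for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    print(agent, is_allowed(BLOCK_ALL, agent, url))
```

Under that assumption, every compliant bot is refused for every URL on the server, regardless of which directory the file sits in.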
This easy-to-understand technical primer may help in understanding robots.txt files: http://www.robotstxt.org/robotstxt.html
(As an aside, the presence of this robots.txt file on the BPAI server is particularly troublesome when you consider a bit of history. As recently as a few years ago, the PTO apparently did not use such bot-blocking tactics – users were able to search BPAI decisions using Google by limiting their search to the old decisions directory. Now, this hack returns an extremely limited result set – today, an unrestricted search on that directory returns only 137 results. It appears, therefore, that a decision was made to actively remove Board decisions from the internet search space in favor of the flawed USPTO search page.)
A proposed solution – A workable solution would be easy to implement – the Office could place all Board decisions in a directory on a single server or cluster of servers (which is, most likely, already the case) and alter the applicable robots.txt file to permit the search engines to crawl and index all files in that directory, while blocking other directories that are (legitimately) off-limits.
Once the search engine bots see this change, they will crawl and index all BPAI decisions, making them fully searchable right from Google, Yahoo! and others.
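As a rough sketch of what such a tailored file might look like, the Python snippet below builds one and checks it with the standard library's urllib.robotparser. The directory names (/decisions/, /internal/, /staging/) are hypothetical; I don't know how the PTO actually lays out its server. The file lists only the off-limits directories, so everything else, including the decisions directory, stays open to crawlers by default.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: decisions live under /decisions/, while
# /internal/ and /staging/ hold content that should stay unindexed.
TAILORED = """\
User-agent: *
Disallow: /internal/
Disallow: /staging/
"""


def crawlable(robots_txt: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether a compliant bot may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch("Googlebot", path) for path in paths}


result = crawlable(TAILORED, [
    "/decisions/2009/example.pdf",   # hypothetical decision file
    "/internal/drafts/example.pdf",  # hypothetical off-limits file
    "/staging/index.html",           # hypothetical off-limits file
])
for path, ok in result.items():
    print("crawl" if ok else "skip", path)
```

Only the decision file comes back as crawlable; the other two paths are skipped.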
From the technical side of things, it really is this simple. The use of robots.txt files is such standard practice that no web-savvy IT department can legitimately claim ignorance as to their use and effect. PTO has a significant web presence and, as such, clearly has qualified staff that understands the technical issues involved.
As evidence of PTO’s knowledge on the subject, you need look no further than the robots.txt file on the main uspto.gov web server: http://uspto.gov/robots.txt
The contents of that file, reproduced in toto below (accessed at the time of writing this post), demonstrate that someone at PTO knows how to set up such files to selectively expose directories on the web server to search engine bots:
User-agent: *
Disallow: /web/offices/dcom/olia/trilat/
Disallow: /web/offices/ac/ahrpa/ohr/employment/
Disallow: /web/offices/dcom/olia/oed/roster/
Disallow: /web/offices/nonpto/ptos/leg/bills/
Disallow: /web/gifs/
Disallow: /web/access_login/
Disallow: /web/offices/cio/tempsitp/
Disallow: /web/access_login/trilateral/
Disallow: /web/main/faq_bak/
Disallow: /ebc/efs-test/
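The selective effect of those rules can be checked with a short Python sketch, again using the standard library's urllib.robotparser. The directives are the ones quoted above; the individual file names tested (logo.gif, page.html, and so on) are hypothetical and chosen only to exercise the rules.

```python
from urllib.robotparser import RobotFileParser

# The directives quoted in the post, one per line.
USPTO_MAIN = """\
User-agent: *
Disallow: /web/offices/dcom/olia/trilat/
Disallow: /web/offices/ac/ahrpa/ohr/employment/
Disallow: /web/offices/dcom/olia/oed/roster/
Disallow: /web/offices/nonpto/ptos/leg/bills/
Disallow: /web/gifs/
Disallow: /web/offices/cio/tempsitp/
Disallow: /web/access_login/
Disallow: /web/access_login/trilateral/
Disallow: /web/main/faq_bak/
Disallow: /ebc/efs-test/
"""

parser = RobotFileParser()
parser.parse(USPTO_MAIN.splitlines())


def blocked(path: str) -> bool:
    """Return True if a compliant bot must not fetch path."""
    return not parser.can_fetch("Googlebot", path)


# Hypothetical file names under listed and unlisted directories.
for path in ("/web/gifs/logo.gif", "/ebc/efs-test/page.html",
             "/ebc/index.html", "/web/patents/index.html"):
    print("blocked" if blocked(path) else "open", path)
```

The first two paths fall under listed directories and are blocked; the last two are not listed and remain open. This parser matches rules by path prefix, so the /web/access_login/trilateral/ line is already covered by the /web/access_login/ line.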
Interestingly, both full text servers for issued patents (http://patft.uspto.gov/robots.txt) and published applications (http://appft1.uspto.gov/robots.txt) block access to all files by all bots, just like the BPAI server.
Benefits of this approach – Adopting this approach would benefit both the patent community and the PTO:
Practitioners and others would have reliable, standard, and efficient access to search functionality of Board decisions.
PTO would benefit through decreased server load (the ‘BPAI Reading Room’ page would have limited, if any, utility once Google and other search engines indexed all decisions) and maintenance. PTO would no doubt recognize some degree of bandwidth savings, too, which is a legitimate concern of the Office lately.
I suspect that, with ready access to all BPAI decisions, everyone would benefit through increased accountability for the Board. Furthermore, it’s not a huge stretch, at least in my mind, to imagine that quality of both patent applications and Office action responses would increase over time as practitioners develop the habit of researching BPAI decisions as part of their existing workflows.
But wait, there’s more – As Hal pointed out in our email exchange over the weekend, the PTO could extend this approach to petition decisions as well. As Hal put it:
“Here, the PTO could end much if not all of the mystery overnight by placing all decisions on petitions – final or interlocutory – that are in files open to the public under 35 USC § 122 on a search site at the PTO that would be easy to search.”
The knowledge that could be unlocked if PTO adopted both of these changes (BPAI decisions and petitions) is nothing short of amazing. Imagine a world where you’re able to easily research BPAI decisions on a legal issue before drafting an application, filing a response or interviewing an Examiner… imagine being able to review petition decisions before deciding to file your own… or before advising a client on a particular issue. Clearly, as I said before, the ability of such access to positively impact application and response quality is not insignificant.
Making these changes could also go a long way to repairing the strain that’s affected the Office’s relationship with the community of late. As Hal stated in his email:
“In the modern, open government era of internet searching, this would be an easy exercise for the PTO to accomplish, one that could be a signal of open government, a promise of cooperation for the future.”
Violating a duty? – The patent statute imposes a dissemination duty on the PTO for patent-related information. 35 U.S.C. §2(a) provides that:
“The United States Patent and Trademark Office, subject to the policy direction of the Secretary of Commerce – … (2) shall be responsible for disseminating to the public information with respect to patents and trademarks.”
I am not aware of any caselaw or other legal authority that addresses the scope of this ‘dissemination duty,’ but it’s hard to imagine that an active shielding of BPAI decisions from all modern search engines is congruent with the legislative intent behind §2.
Regarding scope of the duty, at least one PTO official seems to believe the duty doesn’t extend beyond the threshold obligations of FOIA. From Hal’s email this weekend, quoting a “well-placed PTO official” responding to Dennis’ original post on the matter:
“I don’t fully understand what problem Professor Crouch is referencing.
“If his point is that the Annual Report indicates that we decide far more petitions than the Office posts on our e-FOIA web page, the reason for this is that the vast majority of petition decisions are not final in nature and are thus exempt from indexing under FOIA. And the USPTO does not index petition decisions that are interlocutory in nature. See for example, Leeds v. Quigg, 745 F. Supp. 1 (D.D.C. 1990).”
PTO not alone – Sadly, the PTO is not alone in its use of a broad robots.txt file to exclude search engine bots from content on government servers. Back in 2007, Declan McCullagh, CNET News’ chief political correspondent, reported on his study that found several government servers using broad bot exclusions (USPTO is noticeably absent from Declan’s list).
Declan’s proposed solution is a bit bolder than mine – bots can ignore the instructions in robots.txt files, and he suggests that search engine bots should start doing so for government servers:
“Search engines should ignore robots.txt when a government agency is trying to use it to keep its entire Web site hidden from the public.”
Perhaps, if PTO doesn’t change its approach to robots.txt files in the near future, Google and other search engines will instruct their bots to do exactly that.










