6-Sigma Patent Data Quality




Patent databases that have traditionally been considered the “standard of practice” contain thousands of data errors, and are increasingly being shown to have created false confidence. Relying on such poor-quality data creates an unacceptably high risk of patent infringement, invalidity, and lost licensing opportunities, along with legal liability and significant financial loss. Despite what seasoned researchers believe about their old “tried and true” patent databases, the facts contradict these quality assumptions.

Patent data is generated by scanning paper patents, then using OCR technology to extract searchable patent text. Even at its best, OCR technology has an error rate of nearly 3% (applied to the US patent database, that is tens of thousands of patent data errors). Many foreign patent databases have higher error rates.

A recent random sampling on a popular online commercial patent search engine found more than a dozen US patents whose full text contained NO claims where claims should have appeared, and one issued patent that was missing from the database entirely. Practically speaking, when conducting a Boolean search on the claims, these patents “do not exist” and would not be returned in the search results.
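To make the effect concrete, here is a minimal Python sketch of a Boolean AND search restricted to the claims field. The record layout, the `search_claims` helper, and the sample patents are all hypothetical and are not taken from any real search engine; the point is only that a record with an empty claims field can never match.

```python
def search_claims(patents, required_terms):
    """Return IDs of patents whose claims text contains every term (Boolean AND)."""
    terms = [term.lower() for term in required_terms]
    return [
        patent["id"]
        for patent in patents
        if all(term in patent["claims"].lower() for term in terms)
    ]


# Hypothetical records: patent "B" lost its claims text during data conversion.
patents = [
    {"id": "A", "claims": "A fastener comprising a threaded shaft and a locking collar."},
    {"id": "B", "claims": ""},
]

hits = search_claims(patents, ["threaded", "collar"])
print(hits)
```

Patent "B" is absent from the results no matter which terms are searched, even if its real claims would have been relevant.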

The average data error rate of nearly ALL searchable patent databases is only slightly better than 4-Sigma, or about 2,500 critical errors per million patents! Searching patents on 4-Sigma quality databases means that on any given patent search, there is a high likelihood of not finding up to 6,500 relevant patents depending on search criteria.

PatentCafe’s premier ICO Global Patent Database is the global patent repository originally built and maintained to a 6-SIGMA quality target. 6-Sigma, which has become the worldwide quality standard adopted by industry, is a statistical quality level characterized by 3.4 errors per million.
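For readers who want to relate sigma levels to error counts, the sketch below uses the conventional Six Sigma calculation: the tail of a standard normal distribution after a 1.5-sigma long-term shift. That shift is a common convention and an assumption here, not something stated on this page, and the function names are illustrative only.

```python
from statistics import NormalDist

# Assumption: the conventional 1.5-sigma long-term shift used in Six Sigma tables.
SIGMA_SHIFT = 1.5


def errors_per_million(sigma_level: float) -> float:
    """Expected errors per million records at the given sigma level."""
    tail = 1.0 - NormalDist().cdf(sigma_level - SIGMA_SHIFT)
    return tail * 1_000_000


def sigma_level_for(errors_per_million_records: float) -> float:
    """Inverse calculation: the sigma level implied by an observed error rate."""
    good_fraction = 1.0 - errors_per_million_records / 1_000_000
    return NormalDist().inv_cdf(good_fraction) + SIGMA_SHIFT


if __name__ == "__main__":
    # About 3.4 errors per million at 6-Sigma.
    print(round(errors_per_million(6.0), 1))
    # Roughly 4.3 sigma for 2,500 errors per million.
    print(round(sigma_level_for(2500), 2))
```

Under this convention, 2,500 errors per million corresponds to a level a little above 4-Sigma, which matches the "slightly better than 4-Sigma" description above.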

All of PatentCafe’s patent data is converted to a standard XML format. Each patent is parsed into nearly 100 separate data fields. These data fields are then verified to exist, checked for completeness, then imported into our database (over 20 million patents) using our proprietary data processing software.

Every single patent not meeting our 6-SIGMA quality target is flagged. We then manually repair, replace, re-scan, or otherwise correct every critical data error we identify.
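The verify-and-flag step described above can be pictured with a short Python sketch. This is not PatentCafe's proprietary software: the element names, the `REQUIRED_FIELDS` list, and the `find_critical_errors` helper are hypothetical, and a real schema would cover far more than the five fields shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the actual schema of nearly 100 fields is not published here.
REQUIRED_FIELDS = ("publication_number", "title", "abstract", "claims", "description")


def find_critical_errors(patent_xml: str) -> list[str]:
    """Return a list of problems found in one patent record; empty means it passes."""
    try:
        root = ET.fromstring(patent_xml)
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]

    errors = []
    for name in REQUIRED_FIELDS:
        element = root.find(name)
        if element is None:
            # The field does not exist at all.
            errors.append(f"missing field: {name}")
        elif not "".join(element.itertext()).strip():
            # The field exists but carries no text.
            errors.append(f"empty field: {name}")
    return errors


sample = """<patent>
  <publication_number>US0000000</publication_number>
  <title>Example widget</title>
  <abstract>An example.</abstract>
  <claims></claims>
</patent>"""

problems = find_critical_errors(sample)
print(problems)  # ['empty field: claims', 'missing field: description']
```

A record that returns a non-empty list would be flagged for manual correction rather than imported as-is.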

See the 6-SIGMA Patent Quality White Paper (HERE).
Single User Accounts – Sign up Immediately Online (HERE)
Multi-user Accounts – Request Demo or Product Information (HERE)
