ST22 revision proposal
![Page 1: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/1.jpg)
ST22 revision proposal
June 2006, WIPO SDWG meeting, Geneva
![Page 2: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/2.jpg)
Agenda
• Reasons for the revision of the ST22
  – Age of current standard
  – Expected benefits
  – PCT International Bureau experience
  – Examples of pages difficult to OCR
  – Conclusion
• Discussion / Questions
![Page 3: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/3.jpg)
Age of current standard
• Inadequate title: “Recommendation for the presentation of patent applications typed in optical character recognition (OCR) format”
• Contains valid recommendations, but they are expressed in old-fashioned terminology (ribbons, typewriter, …). Some recommendations need to be made more precise.
• A few new recommendations should be added to take into account the progress in OCR technology in the last 10 years.
• Not followed widely enough by agents/applicants: some promotion is required
![Page 4: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/4.jpg)
Expected benefits
• Experience shows that if documents follow simple layout rules, the automatic OCR procedures are sufficiently effective to yield a satisfactory result for full-text search purposes (i.e. an average accuracy above 98.5%).
• An updated standard ST22 would lead to:
  – Significant reductions in cost for the OCR procedures performed by the IP regional/national offices and the IB
  – Better quality for the full-text published documents built from OCR procedures
  – More efficient and precise search procedures for the IP community
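
The slides do not say how the 98.5% accuracy figure is measured. As an illustration only, the sketch below assumes a character-level accuracy computed as 1 minus the edit distance divided by the length of a hand-verified reference text. The function names and the choice of metric are assumptions, not part of the proposal.

```python
# Illustrative sketch: character-level OCR accuracy against a reference text.
# The metric (1 - edit distance / reference length) is an assumption; the
# slides only state the 98.5% target, not how it is computed.

TARGET_ACCURACY = 0.985  # threshold quoted on the slide


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered correctly, clamped at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def meets_target(reference: str, ocr_output: str,
                 target: float = TARGET_ACCURACY) -> bool:
    """True if the OCR output reaches the target accuracy."""
    return character_accuracy(reference, ocr_output) >= target
```

Under this assumed metric, a 200-character page with 2 wrong characters scores 99% and passes, while the same page with 4 wrong characters scores 98% and falls below the target.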
![Page 5: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/5.jpg)
PCT International Bureau experience
• An internal automatic OCR system and a Quality Checking system have been developed by the PCT
• The system was tested for 6 months and then put into production. It has been in operation since January 1, 2006, and OCRs the pamphlets published weekly by the PCT.
![Page 6: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/6.jpg)
Internal OCR: key points
• Use an off-the-shelf commercial product and adapt it to the PCT needs
• Build a generic and scalable service so that the OCR function can be used from different applications (online or batch) and fulfill future PCT needs
• Operate the service in-house to reduce costs and gain flexibility in the publication process (discontinue the outsourcing contract)
![Page 7: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/7.jpg)
Internal OCR: key points
• OCR the description and claims sections of the published PCT pamphlets each week (circa 50’000 pages to OCR weekly)
• Provide the results as ST36 XML files that are used to feed the indexing engine of the Patentscope site and the espacenet site (see http://www.wipo.int/pctdb/en/browse.jsp)
• Enrich the PCT electronic products with the results of the OCR (searchable PDFs added to the Rule 87 DVD)
![Page 8: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/8.jpg)
Internal OCR: some figures
• With our hardware configuration, the OCR of a complete publication week takes around 16 hours (it runs during weekends).
• Five staff members perform part-time Quality Checking operations every Monday (around 3 to 4 person-days are spent each week on quality checking) in order to correct the worst cases.
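
For a sense of scale, the figures quoted on the slides (about 50,000 pages per week, about 16 hours of OCR) can be turned into an approximate throughput. This is back-of-the-envelope arithmetic on rounded numbers; the function names are illustrative.

```python
# Rough throughput implied by the figures on the slides. Both inputs are
# approximate ("circa 50,000 pages", "around 16 hours"), so the results are
# order-of-magnitude estimates only.

PAGES_PER_WEEK = 50_000  # pages OCRed weekly, as quoted
OCR_HOURS = 16           # duration of the weekly OCR run, as quoted


def pages_per_hour(pages: int = PAGES_PER_WEEK, hours: float = OCR_HOURS) -> float:
    """Average number of pages processed per hour of OCR run time."""
    return pages / hours


def seconds_per_page(pages: int = PAGES_PER_WEEK, hours: float = OCR_HOURS) -> float:
    """Average processing time per page, in seconds."""
    return hours * 3600 / pages


if __name__ == "__main__":
    print(f"{pages_per_hour():.0f} pages/hour")
    print(f"{seconds_per_page():.2f} seconds/page")
```

That works out to roughly 3,100 pages per hour, or a little over one second per page on average.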
![Page 9: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/9.jpg)
Quality Checking system
![Page 10: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/10.jpg)
Quality Checking system
![Page 11: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/11.jpg)
Some examples of difficult pages, submitted on paper or in image form, that the revised ST22 standard should discourage...
![Page 12: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/12.jpg)
Narrow fonts, justified paragraphs
![Page 13: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/13.jpg)
Underline, italic, bold text
![Page 14: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/14.jpg)
Subscripts too small
![Page 15: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/15.jpg)
Mathematical formulae embedded in text
![Page 16: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/16.jpg)
Handwritten text or cursive fonts
![Page 17: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/17.jpg)
Gray or coloured backgrounds
![Page 18: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/18.jpg)
Conclusion
We invite the SDWG:
(a) to consider the proposal to revise WIPO Standard ST.22; and
(b) to consider establishing a task for the revision of WIPO Standard ST.22 and to set up a Task Force to handle such revision.
![Page 19: ST22 revision proposal](https://reader034.vdocuments.us/reader034/viewer/2022052317/56814732550346895db470da/html5/thumbnails/19.jpg)
Agenda
• Reasons for the review of the ST22
  – Age of current standard
  – Expected benefits
  – PCT International Bureau experience
  – Examples of applications difficult to OCR
  – Conclusion
• Discussion / Questions