patstat beyond europe

Patstat beyond Europe. An insight into Patstat data from patent authorities other than EPO. By Gianluca Tarasconi, Madrid, 9/12/2010


Page 1: Patstat beyond Europe


Patstat beyond Europe

An insight into Patstat data from patent authorities other than EPO

By Gianluca Tarasconi, Madrid, 9/12/2010

Page 2: Patstat beyond Europe

What is PATSTAT

PATSTAT stands for EPO Worldwide Patent Statistical Database.

It contains a snapshot of the EPO master documentation database (DOCDB), which holds data from about 90 national and international patent offices with differing degrees of coverage.

The data include bibliographic data, citations and family links. The database is designed for statistical research and requires the data to be loaded into the customer's own database.

http://www.epo.org/patents/patent-information/raw-data/test/product-14-24.html

http://forums.epo.org/epo-patstat-faqs/


Page 3: Patstat beyond Europe

Non-EPO data vs APE-INV Name Game

Data from other patent authorities may help to:

Validate algorithms against other spellings/conventions;

Fill in missing data or correct wrong data (e.g. address/city) using data from equivalents;

Use Patent Family(1) data to improve algorithms, drawing on the other family members' data to compute a similarity score;

(1) For a list of patent family definitions see: C. Martinez, Insight into Different Types of Patent Families, STI Working Paper 2010/2
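As a rough illustration of the last point, the sketch below raises a plain name-similarity score when two inventor records belong to the same patent family. It is only a sketch under stated assumptions: the record layout, the `similarity` function and the `family_bonus` weight of 0.1 are hypothetical and are not taken from Patstat or from the APE-INV algorithms; record `c` is an invented cross-family record used for contrast.

```python
from difflib import SequenceMatcher


def similarity(rec_a: dict, rec_b: dict, family_bonus: float = 0.1) -> float:
    """Name similarity in [0, 1], raised when both records share a family."""
    name_score = SequenceMatcher(
        None, rec_a["name"].upper(), rec_b["name"].upper()
    ).ratio()
    same_family = rec_a["family_id"] == rec_b["family_id"]
    # The bonus is an arbitrary illustrative weight, capped so the score stays <= 1.
    return min(1.0, name_score + (family_bonus if same_family else 0.0))


a = {"name": "ROBERTS, TONY G.", "family_id": 75}
b = {"name": "Roberts, Tony Gordon", "family_id": 75}
c = {"name": "Roberts, Tony Gordon", "family_id": 88}  # hypothetical record

in_family = similarity(a, b)      # same spelling pair, same family
cross_family = similarity(a, c)   # same spelling pair, different family
print(in_family > cross_family)
```

In practice the weight would have to be calibrated against a validated benchmark rather than fixed by hand.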


Page 4: Patstat beyond Europe

Example (I): INPADOC family #75, Mr Roberts

PUBLN_AUTH | PUBLN_NR | INVT_SEQ_NR | CTRY_CODE | LAST_NAME / FIRST_NAME / ADDRESS / CITY
BG | 98254 | 2 | GB | ROBERTS, TONY G.
DK | 0517145 | 2 | GB | ROBERTS, TONY GORDON
EP | 0517145 | 1 | GB | Roberts, Tony Gordon, Glaxo Group Research Limited Park Road, Ware Hertfordshire, SG12 0DG
IE | 921780 | 2 | | TONY GORDON ROBERTS
RU | 2102393 | 5 | | TONI GORDON ROBERTS
US | 5905082 | 2 | GB | Roberts Tony Gordon Ware
WO | 9221676 | 2 | GB | ROBERTS, TONY, GORDON GLAXO GROUP RESEARCH LIMITED;PARK ROAD;WARE HERTFORDSHIRE SG12 0DG

6 different spellings of the name, 3 different addresses

In this case name and city are parsed best in the US equivalent patent data;
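The spelling variants above can be grouped mechanically. The following is a minimal sketch, assuming the name strings are cut as shown (for the EP record the name is assumed to end before the company name); the helper `token_key` is hypothetical and simply ignores case, punctuation and token order.

```python
import re

# Name strings as they appear in the family members listed above.
spellings = [
    "ROBERTS, TONY G.",       # BG
    "ROBERTS, TONY GORDON",   # DK
    "Roberts, Tony Gordon",   # EP (assumed to end before the company name)
    "TONY GORDON ROBERTS",    # IE
    "TONI GORDON ROBERTS",    # RU
    "Roberts Tony Gordon",    # US
    "ROBERTS, TONY, GORDON",  # WO
]


def token_key(name: str) -> tuple:
    """Order-insensitive key: upper-case alphabetic tokens, sorted."""
    return tuple(sorted(re.findall(r"[A-Z]+", name.upper())))


case_insensitive = {s.upper() for s in spellings}
token_groups = {token_key(s) for s in spellings}

print(len(case_insensitive))  # 6
print(len(token_groups))      # 3
```

Ignoring case alone leaves the six spellings counted on the slide. Ignoring punctuation and word order as well leaves three groups: the full name, the variant with the initial "G." and the transliteration "TONI", which would still need fuzzy matching.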

Page 5: Patstat beyond Europe

Example (II): INPADOC family #88, Mr Newman

WO patent data confirm that the correct address is 43111 Robbins Street

The US patent tells us that A. stands for Anthony

PUBLN_AUTH | PUBLN_NR | INVT_SEQ_NR | CTRY_CODE | LAST_NAME | ADDRESS | CITY
EP | 0605442 | 1 | US | NEWMAN, Roland, A. | 43111 Robbins Street | San Diego, CA 92122
EP | 0854885 | 2 | US | NEWMAN, Roland, A. | 4311 Robbins Street | San Diego, CA 92122
WO | 9302108 | 1 | US | NEWMAN, ROLAND, A. | 43111 ROBBINS STREET;SAN DIEGO, CA 92122 |
US | 6136310 | 2 | US | Newman, Roland Anthony | | San Diego
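One simple way to let equivalents arbitrate between conflicting addresses is a majority vote over the family members. The sketch below is only an illustration under assumptions: the helper `street_line` is hypothetical, it takes the part of the address before the first ";" and compares it case-insensitively.

```python
from collections import Counter

# (PUBLN_AUTH, PUBLN_NR, ADDRESS) for the family members that carry an address.
records = [
    ("EP", "0605442", "43111 Robbins Street"),
    ("EP", "0854885", "4311 Robbins Street"),
    ("WO", "9302108", "43111 ROBBINS STREET;SAN DIEGO, CA 92122"),
]


def street_line(address: str) -> str:
    """Keep the part before the first ';' and upper-case it for comparison."""
    return address.split(";")[0].strip().upper()


votes = Counter(street_line(addr) for _, _, addr in records)
best_street, best_votes = votes.most_common(1)[0]

print(best_street, best_votes)  # 43111 ROBBINS STREET 2
```

With only three records the vote is fragile, so a real procedure would probably also weigh how reliable each authority's address data are.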

Page 6: Patstat beyond Europe

What countries (I)

Patstat contains 92 application authorities: 45 inside Europe and 47 outside Europe.

It includes regional/international authorities (WIPO, ARIPO…).

It also includes ‘terminated’ authorities (German Democratic Republic, Soviet Union).

Page 7: Patstat beyond Europe

What countries (II)

1 Albania (AL)
2 ARIPO (AP)
3 Argentina (AR)
4 Austria (AT)
5 Australia (AU)
6 Bosnia and Herzegovina (BA)
7 Belgium (BE)
8 Bulgaria (BG)
9 Brazil (BR)
10 Canada (CA)
11 Switzerland (CH)
12 Chile (CL)
13 China (CN)
14 Costa Rica (CR)
15 Czechoslovakia (CS)
16 Cuba (CU)
17 Cyprus (CY)
18 Czech Republic (CZ)
19 German Democratic Republic (DD)
20 Germany (DE)
21 Denmark (DK)
22 Algeria (DZ)
23 Eurasia (EA)
24 Ecuador (EC)
25 Estonia (EE)
26 Egypt (EG)
27 European Patent Office (EP)
28 Spain (ES)
29 Finland (FI)
30 France (FR)
31 Great Britain (GB)
32 Gulf Cooperation Council (GC)
33 Georgia (GE)
34 Greece (GR)
35 Hong Kong S.A.R (HK)
36 Croatia (HR)
37 Hungary (HU)
38 Indonesia (ID)
39 Ireland (IE)
40 Israel (IL)
41 India (IN)
42 Iceland (IS)
43 Italy (IT)
44 Japan (JP)
45 Kenya (KE)
46 Korea (South) (KR)
47 Liechtenstein (LI)
48 Lithuania (LT)
49 Luxembourg (LU)
50 Latvia (LV)
51 Morocco (MA)
52 Monaco (MC)
53 Moldova (MD)
54 Republic of Montenegro (ME)
55 Former Yugoslav Republic of Macedonia (MK)
56 Mongolia (MN)
57 Malta (MT)
58 Malawi (MW)
59 Mexico (MX)
60 Malaysia (MY)
61 Nicaragua (NI)
62 Netherlands (NL)
63 Norway (NO)
64 New Zealand (NZ)
65 OAPI (OA)
66 Panama (PA)
67 Peru (PE)
68 The Philippines (PH)
69 Poland (PL)
70 Portugal (PT)
71 Romania (RO)
72 Republic of Serbia (RS)
73 Russia (RU)
74 Sweden (SE)
75 Singapore (SG)
76 Slovenia (SI)
77 Slovakia (SK)
78 San Marino (SM)
79 Soviet Union (SU)
80 El Salvador (SV)
81 Tajikistan (TJ)
82 Turkey (TR)
83 Taiwan (TW)
84 Ukraine (UA)
85 United States of America (US)
86 Uruguay (UY)
87 Viet Nam (VN)
88 World Intellectual Property Organization (WO)
89 Former Serbia and Montenegro (YU)
90 South Africa (ZA)
91 Zambia (ZM)
92 Zimbabwe (ZW)

(last updated 19.4.2010)

Page 8: Patstat beyond Europe

What dimensions are relevant

A) Data coverage (% of coverage by year)

Are data from patent authority X 100% included in Patstat from year W to year Z?

B) Data transmission delays

How long does it take a non-EPO patent to reach PATSTAT?

C) Completeness of geographic data

How good are the quality and coverage of address / city / country code?

Page 9: Patstat beyond Europe

Data coverage (I)

The EPO gives only partial information:

http://www.epo.org/patents/patent-information/data-quality.html

http://www.epo.org/patents/patent-information/raw-data/useful-tables.html

The total number of applications is given, but not the % of all applications it represents (the EPO gives what it gets)

Page 10: Patstat beyond Europe

Data coverage (II): example on India

CC | Authority | DATE | NUMBERS | Kind of data | DOCDB KIND CODE | Kind Group | Last input week
IN | India | 02/08/1975, 11/05/2007 | 137485, 203704 | Patent | A1, E | P | 2005/52

Patstat reports 66219 Indian applications received via the EPO

The Indian Patent Office reports 28,882 applications filed in 2006 alone
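The gap can be made concrete with a little arithmetic. The sketch below assumes that the two NUMBERS values in the EPO table are the first and last document numbers; their difference happens to equal the 66219 figure, although whether the slide derived it that way is not stated.

```python
first_number, last_number = 137485, 203704  # NUMBERS values from the EPO table
patstat_total = 66219                       # Indian applications in Patstat
filed_2006 = 28882                          # Indian Patent Office filings, 2006

number_span = last_number - first_number
share_of_total = filed_2006 / patstat_total

print(number_span)              # 66219
print(f"{share_of_total:.1%}")  # 43.6%
```

In other words, a single year of filings reported by the Indian office amounts to roughly 44% of everything Patstat holds for India over more than three decades, which points to very partial coverage.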

Page 11: Patstat beyond Europe

Data Transmission delays (I)

We study the 2003-2008 time series for BR, CN, JP, DE, KR and IN compared to EP;

Differences between the graphs suggest that publication lags and data transmission lags differ from country to country;

Time series may also highlight ‘holes’ or changes of population (e.g. USPTO from 2000 onward)

Year | BR | CN | DE | EP | IN | JP | KR
2003 | 20878 | 205557 | 134623 | 137230 | 1047 | 432789 | 108922
2004 | 22811 | 235189 | 111554 | 145312 | 1115 | 443034 | 129515
2005 | 23922 | 287662 | 105002 | 154398 | 1687 | 447845 | 160590
2006 | 13414 | 341493 | 95404 | 160288 | 1966 | 428966 | 183037
2007 | 9197 | 382948 | 83663 | 160275 | 2195 | 405234 | 187712
2008 | 7340 | 404476 | 73819 | 139610 | 2493 | 356748 | 175785
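One quick way to read the table is to compare each authority's 2008 count with its own peak year. This is a sketch only; the helper `last_vs_peak` is hypothetical, and a low ratio cannot by itself distinguish a genuine decline in filings from records that had not yet reached Patstat.

```python
years = [2003, 2004, 2005, 2006, 2007, 2008]
counts = {
    "BR": [20878, 22811, 23922, 13414, 9197, 7340],
    "CN": [205557, 235189, 287662, 341493, 382948, 404476],
    "DE": [134623, 111554, 105002, 95404, 83663, 73819],
    "EP": [137230, 145312, 154398, 160288, 160275, 139610],
    "IN": [1047, 1115, 1687, 1966, 2195, 2493],
    "JP": [432789, 443034, 447845, 428966, 405234, 356748],
    "KR": [108922, 129515, 160590, 183037, 187712, 175785],
}


def last_vs_peak(series):
    """Return (peak year, last-year count as a fraction of the peak count)."""
    peak = max(series)
    return years[series.index(peak)], series[-1] / peak


summary = {auth: last_vs_peak(series) for auth, series in counts.items()}
for auth, (peak_year, ratio) in summary.items():
    print(f"{auth}: peak {peak_year}, 2008 at {ratio:.0%} of peak")
```

BR, for example, peaks in 2005 and then falls to under a third of that level by 2008, while CN and IN are still rising in the last year shown.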

Page 12: Patstat beyond Europe

Data Transmission delays (II)

Page 13: Patstat beyond Europe

Completeness of geographic data

Table for the top 20 authorities by inventor count;

13 authorities have more than 80% of records with no country code;

12 authorities have 0% of records with address/city;

However, in many cases address data are inside the first name field (e.g. DE)

(data from Patstat 09/2009)

APPLNAUTH | inventors | no state | no zip | no country | no address | no city
US | 5960856 | 86% | 98% | 21% | 97% | 25%
EP | 3705123 | 100% | 100% | 0% | 1% | 1%
DE | 2750079 | 100% | 100% | 33% | 100% | 100%
JP | 1798271 | 100% | 100% | 98% | 99% | 100%
CN | 1537587 | 100% | 100% | 2% | 100% | 100%
CA | 1120490 | 100% | 100% | 45% | 100% | 100%
AU | 1087573 | 100% | 100% | 98% | 100% | 100%
SU | 968915 | 100% | 100% | 41% | 100% | 100%
AT | 653048 | 100% | 100% | 29% | 100% | 100%
KR | 637296 | 100% | 100% | 14% | 100% | 100%
FR | 565254 | 100% | 100% | 98% | 99% | 100%
GB | 531087 | 100% | 100% | 70% | 65% | 100%
RU | 394691 | 100% | 100% | 29% | 100% | 100%
CH | 338739 | 100% | 100% | 11% | 100% | 100%
BR | 292047 | 100% | 100% | 89% | 100% | 100%
SE | 256248 | 100% | 100% | 85% | 98% | 100%
FI | 212722 | 100% | 100% | 11% | 43% | 100%
IT | 192460 | 100% | 100% | 74% | 100% | 100%
ES | 133471 | 100% | 100% | 17% | 100% | 100%
DD | 129845 | 100% | 100% | 7% | 97% | 100%
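The address/city figure can be re-derived from the table itself. The sketch below transcribes the rows and lists the authorities where 100% of records lack both address and city; the tuple layout is just a convenience chosen for this example.

```python
# (authority, inventors, % no state, % no zip, % no country, % no address, % no city)
rows = [
    ("US", 5960856, 86, 98, 21, 97, 25),
    ("EP", 3705123, 100, 100, 0, 1, 1),
    ("DE", 2750079, 100, 100, 33, 100, 100),
    ("JP", 1798271, 100, 100, 98, 99, 100),
    ("CN", 1537587, 100, 100, 2, 100, 100),
    ("CA", 1120490, 100, 100, 45, 100, 100),
    ("AU", 1087573, 100, 100, 98, 100, 100),
    ("SU", 968915, 100, 100, 41, 100, 100),
    ("AT", 653048, 100, 100, 29, 100, 100),
    ("KR", 637296, 100, 100, 14, 100, 100),
    ("FR", 565254, 100, 100, 98, 99, 100),
    ("GB", 531087, 100, 100, 70, 65, 100),
    ("RU", 394691, 100, 100, 29, 100, 100),
    ("CH", 338739, 100, 100, 11, 100, 100),
    ("BR", 292047, 100, 100, 89, 100, 100),
    ("SE", 256248, 100, 100, 85, 98, 100),
    ("FI", 212722, 100, 100, 11, 43, 100),
    ("IT", 192460, 100, 100, 74, 100, 100),
    ("ES", 133471, 100, 100, 17, 100, 100),
    ("DD", 129845, 100, 100, 7, 97, 100),
]

# Authorities for which every record lacks both address and city.
no_address_or_city = [r[0] for r in rows if r[5] == 100 and r[6] == 100]

print(len(no_address_or_city))  # 12
print(no_address_or_city)
```

Since the percentages are rounded, "100%" here may still hide a handful of populated records.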

Page 14: Patstat beyond Europe

Conclusions

Non-EPO data have coverage, quality and ‘spelling’ that may vary a lot from one patent authority to another;

The data can be used as an additional source of information but not as the main source (BONUS not MALUS);

The EPO could probably improve the quality of these data, especially by adding more addresses (e.g. in April 2011 it will release WO address data); it is up to users to demand more on this topic.