Text mining techniques for patent analysis

Yuen Hsien Tseng, Chi Jen Lin, Yu I. Lin

Research output: Contribution to journal (Article)

440 Citations (Scopus)

Abstract

Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools that assist patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. Both efficiency and effectiveness are considered in the design of these techniques. Important features of the proposed methodology include a rigorous approach to verifying the usefulness of segment extracts as document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure for creating generic cluster titles to ease interpretation of the results. The techniques were evaluated, and the results confirm that machine-generated summaries preserve more important content words for classification than some other sections do. To demonstrate feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.
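
As an illustration of the term-association step named in the abstract, the following is a minimal sketch of co-word analysis, assuming each patent has already been reduced to a list of extracted terms. It scores term pairs with the Dice coefficient over document co-occurrence, which is one common choice and not necessarily the measure used in the paper. The function name `coword_associations`, the `min_df` threshold, and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coword_associations(docs, min_df=2):
    """Score term pairs by the Dice coefficient of their document co-occurrence.

    docs   : list of term lists, one per document (assumed pre-extracted).
    min_df : hypothetical threshold; terms seen in fewer documents are skipped.
    """
    df = Counter()        # number of documents containing each term
    pair_df = Counter()   # number of documents containing both terms of a pair
    for terms in docs:
        unique = sorted(set(terms))  # count each term once per document
        df.update(unique)
        pair_df.update(combinations(unique, 2))

    scores = {}
    for (a, b), both in pair_df.items():
        if df[a] >= min_df and df[b] >= min_df:
            # Dice coefficient: 2 * |A and B| / (|A| + |B|)
            scores[(a, b)] = 2 * both / (df[a] + df[b])
    return scores


# Toy corpus, invented for illustration only.
docs = [
    ["patent", "claim", "cluster"],
    ["patent", "claim", "summary"],
    ["cluster", "summary", "topic"],
    ["patent", "topic"],
]
scores = coword_associations(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

Pairs with high scores tend to appear in the same documents, so they can serve as candidate related terms; a real system would need a more scalable counting strategy than enumerating every pair in every document.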

Original language: English
Pages (from-to): 1216-1247
Number of pages: 32
Journal: Information Processing and Management
Volume: 43
Issue number: 5
DOI: 10.1016/j.ipm.2006.11.011
Publication status: Published - 2007 Sep 1

Fingerprint

patent
Terminology
Glossaries
Feature extraction
Engineers
Patent analysis
Patents
Text mining
methodology
research results
dictionary
technical language
decision maker
engineer
art
organization
efficiency
interpretation
demand
evaluation

Keywords

  • Clustering
  • Co-word analysis
  • Phrase extraction
  • Summarization
  • Topic mapping

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences

Cite this

Text mining techniques for patent analysis. / Tseng, Yuen Hsien; Lin, Chi Jen; Lin, Yu I.

In: Information Processing and Management, Vol. 43, No. 5, 01.09.2007, p. 1216-1247.

Tseng, Yuen Hsien ; Lin, Chi Jen ; Lin, Yu I. / Text mining techniques for patent analysis. In: Information Processing and Management. 2007 ; Vol. 43, No. 5. pp. 1216-1247.
@article{34dc74fd42594ecb814920c7237721a2,
title = "Text mining techniques for patent analysis",
abstract = "Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools that assist patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. Both efficiency and effectiveness are considered in the design of these techniques. Important features of the proposed methodology include a rigorous approach to verifying the usefulness of segment extracts as document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure for creating generic cluster titles to ease interpretation of the results. The techniques were evaluated, and the results confirm that machine-generated summaries preserve more important content words for classification than some other sections do. To demonstrate feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.",
keywords = "Clustering, Co-word analysis, Phrase extraction, Summarization, Topic mapping",
author = "Tseng, {Yuen Hsien} and Lin, {Chi Jen} and Lin, {Yu I.}",
year = "2007",
month = "9",
day = "1",
doi = "10.1016/j.ipm.2006.11.011",
language = "English",
volume = "43",
pages = "1216--1247",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Limited",
number = "5",
}

TY - JOUR

T1 - Text mining techniques for patent analysis

AU - Tseng, Yuen Hsien

AU - Lin, Chi Jen

AU - Lin, Yu I.

PY - 2007/9/1

Y1 - 2007/9/1

AB - Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools that assist patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. Both efficiency and effectiveness are considered in the design of these techniques. Important features of the proposed methodology include a rigorous approach to verifying the usefulness of segment extracts as document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure for creating generic cluster titles to ease interpretation of the results. The techniques were evaluated, and the results confirm that machine-generated summaries preserve more important content words for classification than some other sections do. To demonstrate feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.

KW - Clustering

KW - Co-word analysis

KW - Phrase extraction

KW - Summarization

KW - Topic mapping

UR - http://www.scopus.com/inward/record.url?scp=34247500870&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247500870&partnerID=8YFLogxK

U2 - 10.1016/j.ipm.2006.11.011

DO - 10.1016/j.ipm.2006.11.011

M3 - Article

AN - SCOPUS:34247500870

VL - 43

SP - 1216

EP - 1247

JO - Information Processing and Management

JF - Information Processing and Management

SN - 0306-4573

IS - 5

ER -