Nitish Manocha. Platforms: AIX Workstation, OS/390, Sun Solaris, Windows NT
Post on 20-Dec-2015
Tools to Use
Clustering Tool (Finding Similar Information)
  Dividing documents into groups
  Identifying hidden similarities in documents
  Identifying duplicate documents from a collection
  Finding documents that are out of place
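To make the duplicate-detection use case concrete, here is a minimal sketch that flags near-duplicate documents by Jaccard similarity over their word sets. The slides do not describe how the Clustering Tool works internally, so this is a swapped-in textbook technique, and the function names, sample documents and 0.8 threshold are all hypothetical.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.8):
    """Return sorted pairs of document names whose similarity is >= threshold."""
    sets = {name: word_set(text) for name, text in docs.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        if jaccard(sets[left], sets[right]) >= threshold:
            pairs.append((left, right))
    return pairs


# Hypothetical sample collection: a.txt and b.txt differ only in case and punctuation.
docs = {
    "a.txt": "The faculty meeting is on Monday at noon.",
    "b.txt": "The faculty meeting is on monday at NOON",
    "c.txt": "Quarterly sales figures for the northern region.",
}
print(near_duplicates(docs))
```

A document whose similarity to every other document stays low would be a candidate for the "out of place" case.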
Text Analysis Tool
Using the Feature Extraction tool to extract names:
imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
Tools to Use
Language Identification Tool
  Organize a collection of documents by language
  Restrict search results to documents in a particular language
Text Analysis Tool
Language Identification Tool Results
  Supports 13 languages; new languages can be trained
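The idea of training a new language can be illustrated with a small character-trigram identifier. The slides do not say which method the Language Identification Tool uses, so the approach below is an assumption (trigram profiling is simply a common technique), and the class name, method names and training sentences are hypothetical.

```python
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


class LanguageIdentifier:
    """Toy identifier: one trigram profile per trained language."""

    def __init__(self):
        self.profiles = {}

    def train(self, language, sample):
        """Add the trigrams of a sample text to the profile of a language."""
        self.profiles.setdefault(language, Counter()).update(trigrams(sample))

    def identify(self, text):
        """Return the language whose profile covers the most trigrams of text."""
        if not self.profiles:
            raise ValueError("no languages have been trained")
        grams = trigrams(text)
        total = sum(grams.values())

        def coverage(profile):
            hits = sum(count for gram, count in grams.items() if gram in profile)
            return hits / total if total else 0.0

        return max(self.profiles, key=lambda lang: coverage(self.profiles[lang]))


identifier = LanguageIdentifier()
# Tiny illustrative samples; a usable profile needs far more training text.
identifier.train("english", "the quick brown fox jumps over the lazy dog and the cat sat on the mat")
identifier.train("german", "der schnelle braune fuchs und der faule hund und die katze sitzt auf der matte")

print(identifier.identify("the dog and the cat"))
print(identifier.identify("der hund und die katze"))
```

Adding another language is a further call to `train` with sample text in that language.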
Tools to Use
Web Crawler
  Follows the link topology for a fast search
  Produces a web site map
  Can be used to recognize the authoritative pages
  Provides a filtered collection of pages
Web Crawler
imyclean - to define a web space
  Created include.re, exclude.re, types.re
imycrawl - to crawl a defined web space
  imycrawl url webspace
imystat - to track what happens during a crawl
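The file names include.re, exclude.re and types.re suggest that a web space is bounded by pattern lists. The sketch below shows that filtering idea with Python regular expressions. It does not reproduce the crawler's actual file syntax or matching rules, which the slides do not show; the patterns, the example host and the function name are hypothetical.

```python
import re

# Hypothetical stand-ins for the contents of include.re, exclude.re and types.re.
INCLUDE = [r"^https?://www\.example\.edu/"]   # pages that belong to the web space
EXCLUDE = [r"/cgi-bin/", r"\?"]               # scripts and query URLs to skip
TYPES = [r"\.html?$", r"/$"]                  # document types worth fetching


def in_web_space(url, include=INCLUDE, exclude=EXCLUDE, types=TYPES):
    """True if url matches an include pattern, no exclude pattern, and a type pattern."""
    if not any(re.search(pattern, url) for pattern in include):
        return False
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    return any(re.search(pattern, url) for pattern in types)


candidates = [
    "http://www.example.edu/faculty.htm",
    "http://www.example.edu/cgi-bin/search.html",
    "http://www.example.edu/logo.gif",
    "http://other.example.com/index.html",
]
for url in candidates:
    print(url, in_web_space(url))
```

Checking exclusions before types keeps the rule order easy to read: a URL must first be inside the space, then not be ruled out, then be of a wanted type.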
Tools to Use
Text Search Engine
  Complicated text search
  Powerful linguistic capabilities
  Fuzzy searches
  Queries based on the structure of the document
Text Search Engine
Types of Index
  Linguistic Index (bought as buy)
  Feature Index (Linguistics + Names)
  Precise Index (bought as bought)
  Normalized Precise Index (Case Insensitive)
  Ngram Index
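The difference between the index types comes down to which key a word is stored under. The sketch below shows one plausible key function per type for the word "Bought". It is an illustration only: the lemma table stands in for a real morphological dictionary, the trigram size and the lowercasing in the linguistic and n-gram cases are assumptions, and none of the function names come from the product.

```python
# Tiny stand-in for a morphological dictionary (assumption for illustration).
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def linguistic_key(word):
    """Linguistic index: store the base form, so 'bought' is indexed as 'buy'."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def precise_key(word):
    """Precise index: store the word exactly as written."""
    return word


def normalized_precise_key(word):
    """Normalized precise index: exact form, but case-insensitive."""
    return word.lower()


def ngram_keys(word, n=3):
    """Ngram index: store overlapping character n-grams of the word."""
    lowered = word.lower()
    if len(lowered) <= n:
        return [lowered]
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


word = "Bought"
print("linguistic:        ", linguistic_key(word))
print("precise:           ", precise_key(word))
print("normalized precise:", normalized_precise_key(word))
print("ngram:             ", ngram_keys(word))
```

Under these assumptions a query for "buy" would match "Bought" only through the linguistic key, while a misspelled query such as "bougt" still shares trigrams with it, which is how n-gram keys can support fuzzy searches.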
Combining Tools for Solutions
  Searching with categories by combining the Text Search Engine and the Topic Categorization Tool
  Surviving a flood of email by using the Topic Categorization Tool
  Selectively indexing web pages by combining the Web Crawler, the Topic Categorization Tool and the Text Search Engine