yappo groonga - with japanese search software history @ osdc.tw 2011
DESCRIPTION
http://www.youtube.com/watch?v=e9lxTRTKHWU
TRANSCRIPT
Groonga
OSDC.tw 2011 Yappo(大沢和宏)
with japanese search software history
yappo {aT} shibuya {dOt} pl
http://blog.yappo.jp/
http://github.com/yappo/
http://search.cpan.org/~yappo/
http://twitter.com/yappo
Monday, March 28, 2011
Profile
• Yappo
• from Tokyo
employer
our service is
• Ficia
• Pikubo
my latest topics
• iPhone web site development
• jQuery Mobile hack
• tiny Perl hack
agenda
• yappo with search software
• japanese search software's topic
• Groonga
yappo with search software
Since 1997
I started the search engine service 'Yappo' in 1997. 'Yappo' is the service name.
It was very cheap and simply used grep.
I ran it on a rental server, and I was banned from the server because the service caused a high load average.
Since 1998
I built search engine software for an ISP at work.
It was more modern than grep.
I wrote the indexer and searcher in C.
Since 1999
I started the search engine service 'iYappo' in 1999. iYappo is for Japanese mobile devices (i-mode).
I developed the crawler, indexer, and searcher myself.
But I switched to other software, because maintenance was very hard.
I have been using search software since ancient times.
history of search software in japan
use grep
One of the easiest ways to search is grep. It's sometimes called "idiot search" in Japan.
It's not good for searching lots of documents... and it's slow.
However, it does have merit; it's easy to implement.
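To make that trade-off concrete, here is a minimal Python sketch of a grep-style search. The sample `docs` and the function name `grep_search` are hypothetical and only for illustration. Every query rereads every document, which is why this approach is easy to write but slow on a large collection.

```python
def grep_search(docs, keyword):
    """Return the ids of documents whose text contains keyword.

    No index is built: every call scans every document from start to end.
    """
    return [doc_id for doc_id, text in docs.items() if keyword in text]


# Hypothetical sample documents, for illustration only.
docs = {
    1: "today is rainy.",
    2: "今日は雨です。",
    3: "tomorrow is sunny.",
}

print(grep_search(docs, "rainy"))  # [1]
print(grep_search(docs, "雨"))     # [2]
```

Because it is a plain substring match, it works for Japanese without any tokenizer; the cost is that the work grows with the total size of all documents on every query.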
using index
You need to know which document contains which word to search things quickly. It's easy for English. But it's very difficult for Japanese.
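The "which document contains which word" table is usually called an inverted index. The following is a minimal Python sketch of one, assuming English-like text that can be split on white space; the sample documents and the name `build_index` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Splitting on white space is only good enough for English-like text.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents, for illustration only.
docs = {
    1: "today is rainy",
    2: "today is sunny",
    3: "tomorrow is rainy",
}
index = build_index(docs)

print(sorted(index["rainy"]))                   # [1, 3]
print(sorted(index["today"] & index["rainy"]))  # [1]
```

A lookup no longer touches the document text at all, so the hard part moves to the `split()` call: the index is only as good as the tokenizer that feeds it.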
word separation
This is because English sentences are basically separated by white space. (You need to handle declension and conjugation, though.)
You can't easily tell which character belongs to which word in Japanese.
in english
• "today is rainy."
• "today", "is", "rainy"
So you can write a simple tokenizer for English by splitting sentences on white space.
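A minimal Python sketch of such a tokenizer follows. Splitting on white space alone would leave the period attached to "rainy.", so this sketch also strips surrounding punctuation; the function name `tokenize_english` is hypothetical, and declension and conjugation are not handled.

```python
import string


def tokenize_english(sentence):
    """Split on white space, then strip punctuation around each word."""
    words = sentence.split()
    stripped = [word.strip(string.punctuation) for word in words]
    # Drop tokens that consisted of punctuation only.
    return [word for word in stripped if word]


print(tokenize_english("today is rainy."))  # ['today', 'is', 'rainy']
```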
in japanese
• "今日は雨です。"
• "今日", "は", "雨", "です", "。"
Japanese sentences are not separated by white space. So you need to know the meaning and context of the words in the sentence.
You can't split based on whether a character is Kanji or Kana.
in japanese2
• "私ははずかしいです。"
• "私", "は", "はずかしい", "です", "。" (in English: "I am ashamed")
In Japanese you can write a word in Kana instead of Kanji. This makes tokenizing more difficult.
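The two example sentences show why splitting on the Kanji/Kana boundary is not enough. Below is a minimal Python sketch of that naive approach, which cuts the sentence wherever the script changes. The helper names `script_of` and `split_by_script` are hypothetical, and the classification relies on Unicode character names.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Classify a character by the start of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    return "other"


def split_by_script(sentence):
    """Cut the sentence wherever the script changes."""
    return ["".join(group) for _, group in groupby(sentence, key=script_of)]


print(split_by_script("今日は雨です。"))
# ['今日', 'は', '雨', 'です', '。']
print(split_by_script("私ははずかしいです。"))
# ['私', 'ははずかしいです', '。']
```

The first sentence happens to come out right because Kanji and Kana alternate at every word boundary. In the second sentence "は", "はずかしい" and "です" are all written in Hiragana, so they collapse into one long token.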
Solution
Morphological analysis (形態素解析 / 詞素解析)
KAKASI often tokenizes incorrectly, because it uses a longest-match-first search algorithm.
Morphological analysis has high precision, because it uses a learned grammar.
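To illustrate how a longest-match-first search can go wrong, here is a toy Python sketch. It is not KAKASI's actual code, and the tiny dictionary is hypothetical: it contains "はは" ("mother" written in Hiragana) next to the words from the earlier example sentence.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy tokenizer: at each position take the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary:
                break
        else:
            # Nothing matched: emit the single character as an unknown token.
            candidate = text[pos]
        tokens.append(candidate)
        pos += len(candidate)
    return tokens


# Hypothetical toy dictionary, for illustration only.
dictionary = {"私", "は", "はは", "はずかしい", "です", "。"}

print(longest_match_tokenize("私ははずかしいです。", dictionary))
# ['私', 'はは', 'ず', 'か', 'し', 'い', 'です', '。']
```

The greedy step prefers "はは" over "は", after which "はずかしい" can no longer be found. A morphological analyzer avoids this by scoring whole candidate segmentations instead of committing to the longest word at each position.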
n-gram
But MeCab has limits as a tokenizer; even a sophisticated algorithm has its limits.
MeCab does not know new words, because it relies on a dictionary.
- ex. けいおん, K-ON
Japanese people love to create new words, so the dictionary cannot keep up.
- ex. Twitter -> ヒウィッヒヒー
N-gram is sometimes used to work around these drawbacks.
3-gram example
• けいおん -> "けいお", "いおん"
• K-ON -> "K-O", "-ON"
• Twitter -> "Twi", "wit", "itt", "tte", "ter"
• ヒウィッヒヒー -> "ヒウィ", "ウィッ", "ィッヒ", "ッヒヒ", "ヒヒー"
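The splits above can be reproduced with a few lines of Python. This is a minimal sketch of character n-grams; the function name `ngrams` is hypothetical, and it does no normalization.

```python
def ngrams(text, n=3):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("けいおん"))  # ['けいお', 'いおん']
print(ngrams("K-ON"))      # ['K-O', '-ON']
print(ngrams("Twitter"))   # ['Twi', 'wit', 'itt', 'tte', 'ter']
```

No dictionary is involved, so a newly coined word is indexed just like any other string. The price is a larger index and matches that ignore word boundaries.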
summary of word separation
• Japanese is too hard
• but we have solutions
  - Morphological analysis
  - n-gram
other search software in japan
Namazu
http://www.namazu.org/
Namazu means catfish (鮎) in English.
Namazu has been developed in Japan since the early days and comes with an indexer and a searcher. It is not suitable for embedding in another system.
HyperEstraier
http://fallabs.com/hyperestraier/
Released in 2004, with a crawler, an indexer, and a searcher.
Suitable for embedding in another system.
Rast
http://projects.netlab.jp/rast/
Released in 2005, developed by NaCl, suitable for embedding.
I wrote a Perl binding for it. It was used in iYappo, but I stopped using it because of trouble with indexing.
Rast is now deprecated.
Senna
http://qwik.jp/senna/
Released in 2005, suitable for embedding. A Perl binding was written by @lestrrat.
Senna can be integrated into MySQL's full-text search system.
tritonn
http://qwik.jp/tritonn/
tritonn is a project that maintains a MySQL patch to integrate Senna.
> SELECT * FROM tbl WHERE MATCH(col) AGAINST("検索キーワード");
> INSERT INTO t1 VALUES (3, "東京特許許可局");
It was used in iYappo and was very useful. But Senna is deprecated.
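Next-generation search systems
HyperEstraier, Senna, and Rast were released around the same time, and a boom in next-generation search systems arrived in Japan as well. Because all of them are provided as easy-to-use libraries, hackers became able to add search features casually. For example, Plagger got Plagger::Plugin::Search::Estraier, Plagger::Plugin::Search::Rast, and Plagger::Plugin::Search::Senna.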
good book
A good book on search problems, published in Japan.
This book was written by a Senna/Groonga developer.
You probably have a question: "What about Lucene and the others?"
Lucene is not domestic Japanese software. Therefore, I will not cover it in this talk.
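Groonga
http://groonga.org/
Groonga is new search software created by the developers of Senna. I am not involved in its development; sorry, I am not a developer. It makes up for Senna's shortcomings while adding more features. It is the familiar story of "it would be better to rewrite it from scratch". It implements the features that their company's products use.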
Groonga spec
• bundled groonga daemon (HTTP, memcached protocol, groonga protocol)
• suitable for embedding
• geolocation search
• fast aggregation queries
• groonga has no English documentation ;( (I was surprised)
• hacking is too hard, because the documentation is thin
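As a rough illustration of the HTTP interface, the Python sketch below only builds the URL for a `select` command and sends nothing. It assumes the `/d/<command>` path layout and the default port 10041 of the groonga HTTP server as I understand them; the table name `Entries` and the column name `body` are hypothetical, so check the Groonga documentation before relying on any of this.

```python
from urllib.parse import urlencode


def build_select_url(table, match_columns, query, host="localhost", port=10041):
    """Build a URL for groonga's select command over HTTP (no request is sent)."""
    params = urlencode({
        "table": table,
        "match_columns": match_columns,
        "query": query,
    })
    return "http://%s:%d/d/select?%s" % (host, port, params)


# "Entries" and "body" are hypothetical table and column names.
print(build_select_url("Entries", "body", "rainy"))
# http://localhost:10041/d/select?table=Entries&match_columns=body&query=rainy
```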
install
$ wget http://groonga.org/files/groonga/groonga-1.1.0.tar.gz
$ tar zxvf groonga-1.1.0.tar.gz && cd groonga-1.1.0
$ ./configure --prefix=/usr --localstatedir=/var
$ make && sudo make install
tiny demos for groonga CLI
MySQL Storage
https://github.com/mroonga
Groonga has a MySQL storage engine plugin, similar to Tritonn.
example 1/2
mysql> CREATE TABLE t1 (
     >   c1 INT PRIMARY KEY,
     >   c2 TEXT,
     >   _score FLOAT,
     >   FULLTEXT INDEX (c2)
     > ) ENGINE = groonga DEFAULT CHARSET utf8;
Query OK, 0 rows affected (0.22 sec)
example 2/2
mysql> insert into t1 values(1, "aa ii uu ee oo", null);
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values(2, "aa ii ii ii oo", null);
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values(3, "dummy", null);
Query OK, 1 row affected (0.00 sec)
mysql> select * from t1 where match(c2) against("ii") order by _score desc;
+----+----------------+--------+
| c1 | c2             | _score |
+----+----------------+--------+
|  2 | aa ii ii ii oo |      3 |
|  1 | aa ii uu ee oo |      1 |
+----+----------------+--------+
2 rows in set (0.00 sec)
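In this result the `_score` values equal the number of times "ii" occurs in `c2`. The Python sketch below reproduces those numbers with a plain term count. It is an assumption made for illustration only: the storage engine's real scoring is not described on the slide and may differ, and the names `score` and `search` are hypothetical.

```python
def score(text, keyword):
    """Count how many white-space separated tokens equal the keyword."""
    return text.split().count(keyword)


def search(rows, keyword):
    """Return (c1, c2, score) for matching rows, highest score first."""
    hits = [(c1, c2, score(c2, keyword)) for c1, c2 in rows]
    hits = [hit for hit in hits if hit[2] > 0]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# The same three rows as in the slide.
rows = [(1, "aa ii uu ee oo"), (2, "aa ii ii ii oo"), (3, "dummy")]

for c1, c2, s in search(rows, "ii"):
    print(c1, c2, s)
# 2 aa ii ii ii oo 3
# 1 aa ii uu ee oo 1
```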
bindings
• Python
• PHP
• Ruby (rroonga/Ranguba)
http://groonga.rubyforge.org/
perl binding
https://github.com/yappo/p5-Groonga
I wrote a Perl binding for Groonga. I'm still working on it; it's not yet complete.
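package main {
    no utf8;
    use feature 'say';  # added: `say` below needs this feature enabled

    # Loading the p5-Groonga binding and its GRN_* constants is not shown on the slide.
    my $path = 'tag_keys.db';
    my $pat  = Groonga::PatriciaTrie->new;

    # Open the existing database, or create it on the first run.
    if (! $pat->open($path)) {
        $pat->create($path, 1024, 1024, GRN_OBJ_KEY_VAR_SIZE | GRN_OBJ_KEY_NORMALIZE)
            or die 'Groonga::PatriciaTrie create error';
    }

    # Register the keywords to tag.
    $pat->add('ガッ', '');
    $pat->add('muteki', '');
    $pat->add('yappo', '');

    # 'マッチしない' means "does not match".
    my $text = 'muTEki マッチしない Yappo <> ガッ';

    # Wrap every keyword found in the text; keys are normalized, so case is ignored.
    my $replace = $pat->tag_keys($text, sub {
        my($record, $word, $record_id) = @_;
        sprintf '<span class="keyword">%s(%s)</span>', $record, $word;
    });

    say $replace;
}

__END__
<span class="keyword">muTEki(muteki)</span> マッチしない <span class="keyword">Yappo(yappo)</span> <> <span class="keyword">ガッ(ガッ)</span>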
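For readers without the binding installed, the Python sketch below imitates the behaviour shown in that output: find registered keys in a text while ignoring case, and pass the matched text and the registered key to a callback. It uses a regular expression instead of a patricia trie and does not reproduce Groonga's normalization; the function `tag_keys` here is a hypothetical stand-in, not the binding's API.

```python
import re


def tag_keys(text, keys, callback):
    """Replace each case-insensitive occurrence of a key via callback(matched, key)."""
    if not keys:
        return text
    by_lower = {key.lower(): key for key in keys}
    # Longer keys first, so the longest key wins when several start at one position.
    ordered = sorted(keys, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in ordered), re.IGNORECASE)

    def replace(match):
        matched = match.group(0)
        return callback(matched, by_lower[matched.lower()])

    return pattern.sub(replace, text)


keys = ["ガッ", "muteki", "yappo"]
text = "muTEki マッチしない Yappo <> ガッ"
result = tag_keys(
    text, keys,
    lambda matched, key: '<span class="keyword">%s(%s)</span>' % (matched, key),
)
print(result)
```

With these inputs the printed line is the same as the one after `__END__` in the Perl example.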
Summary of this talk
• Making search engines in Japanese.
• One of the hot topics is Groonga.
• There's no English documentation ;( We'll write it in the near future.
• I'm happy if it interests you
Thank you (謝謝)