
1

Goooogle at MIT

IT Partners, April 2005

Suzana Lisanti, Hubert Pham

google@mit.edu

2

Google Session

1. About MIT’s Google Search Appliance (GSA)

2. Adding Google search to your web site

3. Customizing search results

4. Tips on improving a site’s rankings

5. Q&A – actually, ask questions anytime!

3

MIT's Google Configuration

• MIT's license covers 3M documents

• Two collections of 1.5M documents each

• MIT has over 1M web pages on 1,000 web servers

• Google crawls by following links from the MIT home page

• web.mit.edu – crawled three times a week

• Other MIT web servers – crawled twice a week

4

What MIT Google does

• Performs twice as well as Inktomi in a “blind test”

• Indexes 220 different file formats

• Provides control over our own crawling schedule

• Allows user customization of search results format

• Can index certificate-restricted content (not yet implemented)

5

MIT Google does NOT

• Cache old pages

• Index image files (our decision)

• Index image ALT tags (Google’s decision)

• Allow us to adjust the relevance algorithm

• Tell you “who’s linking to my page” because the GSA does not share that information across collections.

When your pages move, we recommend using a 301 (Moved Permanently) redirect, which tells crawlers that the page has a new permanent address.
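A minimal sketch of what a 301 redirect looks like at the HTTP level, written as a standard-library WSGI app. The old path and new URL in `MOVED` are hypothetical placeholders; in practice you would normally configure the redirect in your web server rather than write code for it.

```python
# Hypothetical mapping of moved pages: old path -> new absolute URL.
MOVED = {
    "/mydept/old-page.html": "http://web.mit.edu/mydept/new-page.html",
}


def app(environ, start_response):
    """Send a 301 for pages listed in MOVED, a 404 for everything else."""
    path = environ.get("PATH_INFO", "/")
    new_url = MOVED.get(path)
    if new_url is not None:
        # 301 = Moved Permanently; the Location header carries the new address.
        start_response("301 Moved Permanently",
                       [("Location", new_url), ("Content-Type", "text/plain")])
        return [b"Moved to " + new_url.encode("ascii")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

To try it locally, serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request the old path.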

6

MIT Google does NOT index

• Java, Perl, Python documentation

• Debian, GNU/Linux mirrors

• URLs containing these strings:

– sipb.mit.edu

– dev.mit.edu

– net.mit.edu

– lees.mit.edu

– ops.mit.edu

– classics.mit.edu

– hypermail

– pipermail

• Certificate-protected pages

• Sites marked "no robots" and pages marked "no index"

• Dynamically generated pages containing '?', except by request

• URLs containing cgi-bin

• URLs containing /afs/
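The string-based rules in this list can be modeled as a simple URL filter. This is a hypothetical sketch of the slide's rules only (substring matches plus the '?' rule), not the appliance's actual configuration, which the MIT Google team maintains; documentation mirrors, certificate-protected pages and robots settings are not modeled.

```python
# Substrings from the slide's exclusion list (hypothetical encoding).
# Plain substring matching mirrors "URLs containing these strings".
EXCLUDED_SUBSTRINGS = (
    "sipb.mit.edu", "dev.mit.edu", "net.mit.edu", "lees.mit.edu",
    "ops.mit.edu", "classics.mit.edu", "hypermail", "pipermail",
    "cgi-bin", "/afs/",
)


def is_excluded(url: str, dynamic_allowed: bool = False) -> bool:
    """Return True if url matches one of the string-based exclusion rules."""
    if any(s in url for s in EXCLUDED_SUBSTRINGS):
        return True
    # Dynamically generated pages ('?' in the URL) are skipped unless requested.
    if "?" in url and not dynamic_allowed:
        return True
    return False
```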

7

Telling Google not to index

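• No robots for a whole server (a robots.txt file)

• No robots for a locker or directory

• No robots for a single HTML file (a robots meta tag)

• "noindex, follow": do not index the page, but still follow its links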
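A short sketch of the two standard mechanisms, using Python's `urllib.robotparser` to show how a crawler reads a server's robots.txt, and a helper that builds the page-level robots meta tag. The robots.txt contents and URLs are hypothetical examples. Under the robots exclusion convention, crawlers look for robots.txt only at the server root, so the meta tag is the per-page option.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a server: keep all crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, agent: str = "*") -> bool:
    """Check a URL against the robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def robots_meta(index: bool, follow: bool) -> str:
    """Build a robots meta tag to place in an HTML file's <head>."""
    content = ", ".join(["index" if index else "noindex",
                         "follow" if follow else "nofollow"])
    return f'<meta name="robots" content="{content}">'
```

For the "no index, follow" case, `robots_meta(False, True)` produces `<meta name="robots" content="noindex, follow">`.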

8

Chart: Avg. daily views by hour of day, January 2005

Total queries Jan 1 - 26: 340,656
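Spread evenly over the 26 days, that total works out to roughly:

```latex
\frac{340{,}656 \text{ queries}}{26 \text{ days}} \approx 13{,}102 \text{ queries per day}
```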

9

Gooogle search forms

10

Simple search form

11

Sample search code

<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- XSLT stylesheet used to format the results -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <!-- Restrict the search to one directory tree; delete both lines to search all of MIT -->
  <input type='hidden' name='as_dt' value='i'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>
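Because the form uses method='get', submitting it simply requests a URL with the form fields as query parameters. This sketch builds that URL from the same fields; `search_url` is a hypothetical helper for testing links, not part of the Google service.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "http://gb-server.mit.edu/search"


def search_url(query: str,
               sitesearch: Optional[str] = "web.mit.edu/newsoffice",
               stylesheet: str = "http://web.mit.edu/xsl/google-mit.xsl") -> str:
    """Build the GET URL the sample form above would submit."""
    params = [
        ("q", query),
        ("btnG", "Search"),
        ("site", "mit"),
        ("client", "mit"),
        ("proxystylesheet", stylesheet),
        ("output", "xml_no_dtd"),
    ]
    if sitesearch:
        # Same as the two hidden fields that restrict the search to one tree.
        params += [("as_dt", "i"), ("as_sitesearch", sitesearch)]
    return SEARCH_URL + "?" + urlencode(params)
```

Passing `sitesearch=None` corresponds to deleting the as_sitesearch field so the whole MIT web is searched.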


12

Restrict to one directory tree

• name='as_sitesearch' value='<yoururl>'

– Use the full host name: web.mit.edu/newsoffice, not web/newsoffice

• The trailing slash matters:

– web.mit.edu/newsoffice includes sub-directories

– web.mit.edu/newsoffice/ excludes sub-directories

• as_sitesearch lets you specify one directory (and all its sub-directories) as the domain to be searched. You cannot specify multiple disparate directories with this option.

• If you want the search feature on your site to search the entire MIT web site, delete this parameter.
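The trailing-slash rule can be modeled as a small matching function. This is a hypothetical sketch of the behavior the slide describes, not the appliance's code; it treats a value without a trailing slash as "this directory and everything below it" and a value with one as "files directly in this directory".

```python
from urllib.parse import urlsplit


def in_scope(url: str, sitesearch: str) -> bool:
    """Model of the as_sitesearch rule described above (an assumption)."""
    parts = urlsplit(url)
    location = parts.netloc + parts.path  # e.g. "web.mit.edu/newsoffice/a.html"
    if sitesearch.endswith("/"):
        # Trailing slash: only pages directly inside this directory.
        if not location.startswith(sitesearch):
            return False
        return "/" not in location[len(sitesearch):]
    # No trailing slash: the directory and all of its sub-directories.
    return location == sitesearch or location.startswith(sitesearch + "/")
```

Under this model, a value such as web/newsoffice matches nothing, because no crawled URL starts with that host name.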


13

Restrict to multiple directories or servers


• Contact google@mit.edu and we will create a subcollection for you.

• A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
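Conceptually, a subcollection is a name mapped to a list of URL patterns. The sketch below models that idea with shell-style wildcards from Python's `fnmatch`; the patterns and the wildcard syntax are hypothetical (the appliance uses its own URL pattern syntax, and google@mit.edu creates the real subcollections).

```python
import fnmatch

# Hypothetical subcollection: one name referring to several URL patterns.
SUBCOLLECTIONS = {
    "Library": [
        "libraries.mit.edu/*",
        "web.mit.edu/library/*",
    ],
}


def in_subcollection(url: str, name: str) -> bool:
    """True if url (scheme stripped) matches any pattern in the named subcollection."""
    location = url.split("://", 1)[-1]
    return any(fnmatch.fnmatchcase(location, pattern)
               for pattern in SUBCOLLECTIONS[name])
```

Unlike as_sitesearch, one name here can cover several servers and unrelated directories.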

14

Advanced search example

15

Gooogle Custom Results

You can customize the look and feel of Google’s search results by providing a stylesheet.

16

Site-wide MIT template

17

IS&T custom results

18

IS&T Search

19

IS&T Custom Results

20

Customizing results

• You provide the header and footer (HTML) wrapper, and any desired content formatting

• Google provides the raw data (XML)

Diagram: Google Results Data, wrapped in Your HTML header/footer

21

Results content “title” only

22

How customization works

• The form points to an XSLT stylesheet

• Google returns the query results in XML

• The XSLT stylesheet translates the XML into your custom HTML

Diagram: Search Query → MIT-Google Index → Search Results (XML) + XSLT Stylesheet = HTML Results
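On the server, this translation is done by the XSLT stylesheet named in proxystylesheet. Purely to illustrate the XML-in, HTML-out step, this sketch swaps in Python's standard `xml.etree.ElementTree`. It assumes the appliance's XML layout (a GSP root, R elements for results, U for the URL, T for the title); check the element names against the official XML specification. The header, footer and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample in the assumed GSA layout: GSP > RES > R (U = URL, T = title).
SAMPLE_XML = """<GSP VER="3.2">
  <Q>admissions</Q>
  <RES SN="1" EN="2">
    <R N="1"><U>http://web.mit.edu/admissions/</U><T>MIT Admissions</T></R>
    <R N="2"><U>http://web.mit.edu/newsoffice/</U><T>MIT News Office</T></R>
  </RES>
</GSP>"""

HEADER = "<html><body><h1>My Department Search</h1>"  # your header (hypothetical)
FOOTER = "</body></html>"                                # your footer (hypothetical)


def render(xml_text: str) -> str:
    """Wrap a title-only list of results in your own header and footer."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        items.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    return HEADER + "<ul>" + "".join(items) + "</ul>" + FOOTER
```

This mirrors the "title only" style of results: everything outside the list comes from your wrapper, and Google supplies only the data.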

23

Notes

• It is not necessary to customize the results.

– You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet.

• Updates to the Google service may require you to make changes in your stylesheet.

– Subscribe to google-partners@mit.edu

• WCS will provide fee-based production services for custom search results.

24

How to customize the results

• Plan how you want the results to look

• Copy the MIT Google XSLT stylesheet: http://web.mit.edu/xsl/google-mit.xsl

• Save it to web-readable space, naming it google-mysite.xsl

25

Point to your XSL

<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>

• Update the proxystylesheet value in your search form so the MIT Google server uses your custom XSLT stylesheet.

26

Step-by-step customization

See

http://web.mit.edu/ist/google/stylesheets.html

27

Documentation

• http://web.mit.edu/ist/google/

(Includes the “official” Google documentation, including their XML specification; also XSLT tips.)

• Search Engine Submission Tips: http://searchenginewatch.com/webmasters/

• Using SS for an Effective SEO Campaign: http://www.alistapart.com/articles/seo/

28

Support

• The MIT Google team will support your creating a Google search form and answer queries sent to google@mit.edu

• WCS offers fee-based production services for custom search results


29

Q&A
