test


Upload: lia-thomas

Post on 12-Mar-2016


Unit 5 Part 2: More?

What more is there?

Search engines are not the only search tools; the term "search tool" includes what we call search engines but is not limited to them. Other popular search tools include subject directories (like LII, discussed in Unit 4), federated or meta-search engines, and now social networking sites like Facebook, Digg, Del.icio.us, Flickr, and YouTube, where you can search by following a tag, a link, or someone you know.

In general, search tools are the way we find things via the Internet that are stored on someone else's computer. For search engines, this is done by a computer program using something called a search algorithm, which we started talking about in the lecture on Search and continue below. For subject directories and social networking sites, humans play a MUCH more active role, although an algorithm of sorts is often still used to order the results.

There are two other library classes, LR 011 and LR 001, that discuss other search tools in more depth, but for this class we are going to stick with search engines.

When someone says search engine, you think Google, Yahoo, AOL, or something like that, don't you? Most people do. However, it is an imprecise term: search engines are actually just pieces of code that allow a regular person to give an instruction to a machine, requesting the retrieval of a document. For most of this conversation we will go ahead with the popular definition, but keep the other one in mind.

The ALGORITHM

Because the world seems to be getting more and not less "google-ized" (more tools operate like Google), it is good to understand a bit more about the algorithms behind the search box. The search algorithm takes what you type in and turns it into something called a search "string", which is what the machine understands. A very VERY simplified search algorithm would go something like this. You type global warming into the search engine's text box.

The search engine interface translates that into the following (hint: read the next lines aloud, saying "plus" for each +, even if it feels dumb; I think it will make more sense):

1. Search for string object g+l+o+b+a+l AND string object w+a+r+m+i+n+g

2. IF the complete search string is found, put the DOCUMENT in this pile

3. Repeat for every web page in the database
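The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the "database" is a hypothetical dictionary mapping made-up page addresses to page text, and matching is a simple case-insensitive substring test. Real search engines look terms up in a prebuilt index instead of scanning every page at search time.

```python
def search(query: str, database: dict[str, str]) -> list[str]:
    """Return the addresses of pages whose text contains every word of the query."""
    # Step 1: turn what was typed into search strings,
    # e.g. "global warming" -> ["global", "warming"].
    strings = query.lower().split()
    pile = []
    # Step 3: repeat for every web page in the (hypothetical) database.
    for url, text in database.items():
        page = text.lower()
        # Step 2: if every search string is found, put the page in the pile.
        if all(s in page for s in strings):
            pile.append(url)
    return pile


# Hypothetical three-page "database", for illustration only.
pages = {
    "example.org/a": "Global warming and sea level rise",
    "example.org/b": "Warming up before exercise",
    "example.org/c": "A global map of ocean warming",
}
print(search("global warming", pages))  # ['example.org/a', 'example.org/c']
```

Notice that example.org/c lands in the pile even though its two words are not next to each other; sorting out which pages in the pile are most relevant is the ranking algorithm's job.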


THEN, once the search algorithm has done its job and you have 200,000 pages that include the search string g+l+o+b+a+l w+a+r+m+i+n+g, the ranking algorithm takes over (for Google it is called the PageRank algorithm) and puts the pile in some sort of order for you to view. That happens like this (again, this is VERY simplified; the actual algorithm is a VERY, VERY LONG mathematical formula):

IF both string objects appear together, in the same order typed, give 5 points AND

IF both strings appear in the order typed in the title of the page, give 2 points AND

IF both strings appear in the order typed in the metadata about the page, give 4 points AND

IF both strings appear in the order typed twice, give 0.1 point AND

IF both strings appear in the order typed 20 times, give 3 points AND

IF the web page with the correct string is linked to Harvard University, give 1 point AND

IF the web page with the correct string is linked to Stanford University, give 1 point AND

IF the web page with the correct string is linked to the National Enquirer, subtract 3 points

THEN put the web page with the most points first on the page of results, and list the rest in descending order.
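Here is the toy point system above written out as a Python sketch. It is only this lecture's made-up scoring, not anything Google actually uses. Several details are assumptions: the page record and its fields are hypothetical, "twice" and "20 times" are read as "at least twice" and "at least 20 times", and Harvard, Stanford, and the National Enquirer are represented by illustrative domain names.

```python
from dataclasses import dataclass, field

# Hypothetical link bonuses standing in for "linked to Harvard / Stanford /
# National Enquirer"; the domain names are illustrative assumptions.
LINK_BONUS = {"harvard.edu": 1.0, "stanford.edu": 1.0, "nationalenquirer.com": -3.0}


@dataclass
class Page:
    """A hypothetical page record from the pile the search step produced."""
    url: str
    text: str
    title: str = ""
    metadata: str = ""
    links_to: list[str] = field(default_factory=list)


def score(page: Page, query: str) -> float:
    """Apply the lecture's toy point rules to one page."""
    phrase = query.lower()              # both strings together, in the order typed
    count = page.text.lower().count(phrase)
    points = 0.0
    if count >= 1:
        points += 5                     # phrase appears in the page
    if phrase in page.title.lower():
        points += 2                     # phrase appears in the title
    if phrase in page.metadata.lower():
        points += 4                     # phrase appears in the metadata
    if count >= 2:
        points += 0.1                   # assumption: "twice" means at least twice
    if count >= 20:
        points += 3                     # assumption: "20 times" means at least 20
    for site, bonus in LINK_BONUS.items():
        if site in page.links_to:
            points += bonus             # linked to Harvard, Stanford, or the Enquirer
    return points


def rank(pile: list[Page], query: str) -> list[Page]:
    """Most points first, the rest in descending order."""
    return sorted(pile, key=lambda p: score(p, query), reverse=True)


pile = [
    Page("example.org/b", "Global warming, explained.",
         metadata="global warming", links_to=["nationalenquirer.com"]),
    Page("example.org/a", "Global warming is real. More on global warming.",
         title="Global Warming Basics", links_to=["harvard.edu", "stanford.edu"]),
    Page("example.org/c", "A global map of ocean warming.", links_to=["stanford.edu"]),
]
print([p.url for p in rank(pile, "global warming")])
# ['example.org/a', 'example.org/b', 'example.org/c']
```

Page a earns 9.1 points (phrase, title, a second mention, two good links), page b earns 6 (the Enquirer link costs it 3), and page c earns only 1 because its two words never appear together.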

The actual Google PageRank algorithm, the source of their fame and fortune, is very closely guarded. Before the Google founders came up with their ranking system, results from a search engine were almost random. This was one of the reasons human-built directories were popular, and they are still extremely useful. If you are comfortable with math, you will find the article The Google Page Rank® Algorithm and How It Works (http://www.alvit.de/vf/en/web-development-the-google-pagerank-algorithm-and-how-it-works.html) by Ian Rogers interesting (it is interesting even if you can't quite follow the finer points).
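For the mathematically curious, here is a small Python sketch of the link-counting formula from the original published PageRank paper: PR(A) = (1 - d) + d × (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A, C(T) is the number of links going out of page T, and d is a damping factor usually set to 0.85. This is only the published core idea, not Google's current (secret) ranking. The three-page "web" is hypothetical, and the sketch assumes every page has at least one outgoing link.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for every page A.

    `links` maps each page to the pages it links to. Assumes every page has at
    least one outgoing link (no dead ends); this toy version does not handle them.
    """
    pr = {page: 1.0 for page in links}          # start every page at 1.0
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Each page T that links here passes on PR(T) divided by C(T),
            # its number of outgoing links.
            incoming = sum(pr[t] / len(out) for t, out in links.items() if page in out)
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}   # hypothetical three-page web
ranks = pagerank(web)
for page, value in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(value, 2))
```

Page C comes out on top because two pages link to it, and in this version the scores always add up to the number of pages (here, 3). That is the basic insight behind link-based ranking: a link works like a vote, and a vote from a highly ranked page counts for more.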

Google

Google, the Internet search engine. Please read this Wikipedia page carefully:

http://en.wikipedia.org/wiki/Google_(search_engine)

It is part of the lecture. While it describes Google specifically, you can apply most of what is in the article to other search engines with a few alterations. Most people don't know how a search engine operates, even though they use one all the time. Magic? NOT.

Most people search Google by typing a bunch of words in the text box. As you probably now know from reading the Wikipedia article, that works because of Google's ranking algorithm, also discussed above. When you search this way, your search is done the way Google wants to do it, and Google may not have the same criteria or needs you do. As a for-profit business, its goal has to be making money from your visit. Your goal is to get what you want and get it now. So sometimes trying something other than typing a bunch of words is helpful.

This is where an advanced search page or Boolean operators can give you better results.

Boolean operators were introduced in the first lecture. Here we'll see them in action. Boolean operators and advanced search pages let you create better, more targeted searches. For instance, if you are looking for material on global warming, there is a synonym: climate change. Some of the best resources could be about "climate change" and never mention global warming. To avoid missing those good results, you could type this into the search box:

"global warming" OR "climate change"

Below are pictures showing the difference in results between just "global warming" and "global warming" OR "climate change". Notice that when you say you want X OR Y, you get many more results than if you just want X. What happens when you say you want both X AND Y?

Search results for "global warming" OR "climate change".


Search results for "global warming" only.
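One way to picture why OR returns more results than a single phrase, and AND returns fewer, is to treat each phrase's results as a set of pages. The page IDs below are made up for illustration; a real search engine's index is far larger and is not literally a Python set.

```python
# Hypothetical sets of page IDs that match each phrase (made up for illustration).
global_warming = {"p1", "p2", "p3"}
climate_change = {"p3", "p4", "p5", "p6"}

either = global_warming | climate_change  # OR: pages matching X, Y, or both (union)
both = global_warming & climate_change    # AND: only pages matching X and Y (intersection)

print(len(global_warming), len(either), len(both))  # 3 6 1
```

However the sets are filled, OR can never return fewer pages than X alone, and AND can never return more.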

NOT is another Boolean command. In Google, put a minus sign immediately in front of a word you don't want to see (for example, -greenhouse). Here is a picture of the results for "global warming" OR "climate change" without documents containing the words green, house, or greenhouse.

Another thing you can do with the Boolean NOT is eliminate all of the dot-com sites; for an academic research project this is usually a VERY good way to go. You do that by adding this exact string to your search: -site:.com

Here are the results you get this way.
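Both uses of NOT can be thought of as filters that throw results away. The Python sketch below works on a hypothetical list of results; Google applies these operators inside its own index, not after the fact like this. The sketch also assumes that excluded words match whole words only, which would explain why the example above lists green, house, and greenhouse separately.

```python
import re
from urllib.parse import urlparse

# Hypothetical (url, page text) results, made up for illustration.
results = [
    ("https://climate.example.org/gases", "Greenhouse gases and climate change"),
    ("https://blog.example.com/warming", "My thoughts on global warming"),
    ("https://library.example.edu/report", "A global warming report"),
]


def exclude(results, words=(), site_suffix=None):
    """Keep only results that contain none of `words` (like -greenhouse)
    and whose host does not end with `site_suffix` (like -site:.com)."""
    banned = {w.lower() for w in words}
    kept = []
    for url, text in results:
        tokens = set(re.findall(r"[a-z]+", text.lower()))  # whole words only
        if tokens & banned:
            continue
        host = urlparse(url).hostname or ""
        if site_suffix and host.endswith(site_suffix):
            continue
        kept.append(url)
    return kept


print(exclude(results, words=("green", "house", "greenhouse")))
print(exclude(results, site_suffix=".com"))
```

The first call drops the greenhouse-gas page; the second drops the dot-com blog and keeps the .org and .edu pages.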

Of course, no one will ever look at even 3 million websites, but learning how to refine your search strategies will make the top 20 results more relevant to you, and you might actually look at those.


Beyond Boolean

I have a running joke with some friends for when we are trying to be helpful but maybe not quite making it: we say "helping" in a sort of sing-song voice, and then we laugh. That is what I think about as I watch the evolution of search and search tools. Search engines in particular are always looking for new ways of "helping", and a lot of them are actually great once you notice they are there. Google has recently put out some new "help" for you: on a typical results page there are now lots of ways to narrow a search even if you don't know Boolean.

New toys from Google are always popping up. In this case you are looking at the top of a search results page, which has some new options: you can search web results, video, news, etc. Always important, in fact maybe the most important thing you will learn in this class, is to pay attention to what you are seeing. Buried here in the lecture in our seventh or eighth week is "The most important thing". Study the page you are viewing. Pay attention to the links on the edges. Take a quick look at the ads; you can tell a lot about ads just by looking at them. AND THINK. Pay attention and THINK. Then, once you do that, explore. The web is pretty wondrous. People are amazing and inventive, and things change all the time. So pay attention, THINK, and then explore. Check out the new stuff.


This is the bottom of a search results page in October of 2008. Check out the extras.

One of the reasons to pay attention, think, and explore is that it keeps you in the right frame of mind to live today. We all live in a time of change, sometimes frivolous change, but often substantive and real change that affects the way we shop, vote, love, work, and play. If you are always watching and learning, you don't necessarily expect things to be the same today as yesterday, and you are more ready to cope with the inevitable and take advantage of it. That attitude of being ready for change, and able to analyze and explore new tools, is invaluable.

But how does it relate to academic research? The ways we present, store, and explore information are always changing. Yesterday you needed a library unless you were very wealthy. Today you need a computer with a fast Internet connection, a few tools, and a very good brain. But the tools keep changing, as we saw on the Google results page. That page is now a tool, not just a list of links. If I am going to write a paper on global warming, everything I need to get started is right on that page: images, articles, links to books, etc. Last month I would have had to run 3 or 4 separate searches to get to all of those things.

Google Advanced Search


Google's advanced search can be accessed by a link on their regular page, just to the right of the search button. The link says "Advanced Search". Advanced search pages are often very useful when you are not getting the results you want, because they lay out your options, usually in plain English. Here is a picture of the Google advanced search page, where I have highlighted what I typed in. Up at the top is what I think of as the translation: Google is showing you the Boolean statement it is creating from your search terms.
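The "translation" Google shows at the top of its advanced search page can be imitated with a short Python sketch that assembles a query string from form-style fields. The field names (all_words, exact_phrase, any_of, none_of) are hypothetical labels, not Google's actual form parameters; the operators used (quotation marks, OR, the minus sign, site:) are the ones described in this lecture.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in quotation marks so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(all_words: str = "", exact_phrase: str = "",
                any_of: tuple[str, ...] = (), none_of: tuple[str, ...] = ()) -> str:
    """Translate hypothetical advanced-search fields into one Boolean query string."""
    parts = []
    if all_words:
        parts.append(all_words.strip())                      # plain words, treated as AND here
    if exact_phrase:
        parts.append(f'"{exact_phrase.strip()}"')            # quotation marks: exact phrase
    if any_of:
        parts.append(" OR ".join(quote(t) for t in any_of))  # OR: at least one of these
    parts.extend(f"-{t}" for t in none_of)                   # minus sign: NOT
    return " ".join(parts)


print(build_query(any_of=("global warming", "climate change"),
                  none_of=("green", "house", "greenhouse")))
# "global warming" OR "climate change" -green -house -greenhouse
print(build_query(exact_phrase="global warming", none_of=("site:.com",)))
# "global warming" -site:.com
```

The two calls rebuild the searches used earlier in this lecture, which is the point of the advanced search page: you fill in plain-English boxes and it writes the Boolean for you.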

Another way to find information with Google is to use some of its other databases. If you look down at the bottom of the picture above, they list some of their "topic-specific search engines". Google doesn't just search the web; it also has a book search, a journal and magazine search, and a news and news archive search. Each of these searches a separate database not accessed by the regular searches. What follows is a brief description of each.

Google Books

Google Books is an attempt by Google to index, AND scan, all the books ever printed. Think about that for a minute. They have teamed up with several libraries to make this happen and are in the process of dealing with many of the legal issues around copyright that currently limit their ability to do more. Currently you can search their book database in 2 ways. One is to look for mentions of your terms in all of the pages they have scanned, which includes many books still protected by copyright and so not viewable in whole; in that case you are only able to read a page or two of the book itself. The best reason to do this is to find out what books exist on a topic, and then you can head to your friendly librarian to get the book for you.

The other way to search, especially useful if you are a procrastinator, is to click on the full view option and search only through books in the public domain, which are fully and freely available. The difference between the results is stark: looking in all of them, you find 7160 books; looking in only those you can view for free, you get 813.


Google Scholar

Google Scholar is Google's attempt to make magazine and journal articles visible and sometimes available to you. Most journal and magazine articles are not free, although some are. What Google has done is index articles, make them searchable, and then link you to the publisher's webpage where you can purchase them. Because we signed up with Google Scholar, you can access many of our articles using the Google Scholar search. If you are on campus, you will automatically see the Full Text @ My Library links; when we have an article, they take you into our library databases, where it is free for you. If you are off campus, you have to set up your preferences in Google Scholar to look at our databases.


The other option, if you find something you want, is to copy the information and ask one of us for it. We'll get you almost any article you can find, usually for free, via Interlibrary Loan.

Bonus points! What does the motto of Google Scholar, "Stand on the shoulders of giants", mean?

Google News and the News Archive

Google News is a special database that houses the results from visits to 4,500 news sites around the world. This database refreshes itself every 15 minutes, rather than on the 4-6 week cycle of most of the other databases you search, like regular Google searches, Google Scholar, etc. You can read the very interesting story about how it all started in this article by Richard Wiggins:

http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/890/799

The News archive is a newish addition, and it does for newspapers and news magazines something similar to what Google Scholar does for journals: it indexes them and either links you to a free article or to the publisher, where you can pay for it. Many of these articles are free in our library databases, and the rest can be obtained via Interlibrary Loan, so DON'T use this to pay for articles! Both of the fee-based articles shown below can be found in our library databases. The news database does not link into our databases, but that just means you need to go looking yourself. Next week we will begin discussing database searching in detail, and you'll be an expert soon.

There is much more to Google than what we've put here. If you have time and are curious, you can find much more about Google and its tools on the Google Web Search Features page (http://www.google.com/help/features.html). You can also check out Google products (http://www.google.com/intl/en/options/), which describes all of the different tools Google offers, far beyond the classic search box. To see what is on the horizon, check out the Labs!

And once you are a true advanced Google user, try ASK (ask.com).