Unit 5 Part 2: More?
What more is there?
Search engines are not the only search tools. The term "search tools" includes what we call
search engines but is not limited to them. Other popular search tools are subject directories (like LII,
discussed in Unit 4), federated or meta-search engines, and now social networking sites like
Facebook, Digg, Del.icio.us, Flickr, and YouTube, where you can search by following a tag, a
link, or someone you know.
In general: search tools are the way in which we can find things via the Internet that are stored
on someone else's computer. For search engines, this is done by a computer program using
something called a search algorithm which we started talking about in the lecture on Search
and continue below. For subject directories, and social networking sites, humans play a MUCH
more active role, but then there is often an algorithm of sorts used to order results.
There are two other library classes, LR 011 and LR 001, that discuss other search tools in more
depth, but for this class, we are going to stick with search engines.
When someone says search engine, you think Google, Yahoo, AOL, or something like that, don't
you? Most people do. However, it is an imprecise term. Search engines are actually just pieces
of code that allow a regular person to give an instruction to a machine, requesting the retrieval
of a document. For the purposes of most of this conversation we will go ahead with the
popular definition, but keep the other in mind.
The ALGORITHM
Because the world seems to be getting more and not less "google-ized", in that more tools are
operating like Google, it is good to understand a bit more about the algorithms behind the
search box. The search algorithm takes what you type in and turns it into something called a
search "string", which is what the machine understands. A very VERY simplified search algorithm
would go something like this. You type global warming into the search engine text box.
The search engine interface translates that into the following (hint: read the next lines aloud,
where + = plus, even if it feels dumb -- it will make more sense, I think):
1. Search for string object g+l+o+b+a+l AND string object w+a+r+m+i+n+g
2. IF complete search string is found put DOCUMENT in this pile
3. Repeat for every web page in the database
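The three steps above can be sketched in a few lines of Python. This is only a toy version of the
simplified algorithm described here, not how any real search engine works. The pages, their
addresses, and the function name find_matches are all invented for the example.

```python
# Toy "database" of web pages: address -> page text (all invented for illustration).
pages = {
    "example.org/a": "Global warming is changing the climate.",
    "example.org/b": "Warming up before exercise is a good idea.",
    "example.org/c": "A global view of warming oceans.",
}


def find_matches(query, pages):
    """Return the addresses of pages that contain every word in the query."""
    words = query.lower().split()            # "global warming" -> ["global", "warming"]
    pile = []
    for address, text in pages.items():      # step 3: repeat for every page
        page_words = text.lower().replace(".", " ").split()
        if all(word in page_words for word in words):   # step 1: word AND word
            pile.append(address)             # step 2: put the document in the pile
    return pile


print(find_matches("global warming", pages))
```

Notice that the third page ends up in the pile even though the two words are not next to each
other. Sorting out which matches are better is the job of the ranking step.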
THEN once the search algorithm has done its job and you have 200,000 pages that include the
search string g+l+o+b+a+l w+a+r+m+i+n+g, the ranking algorithm takes over (for Google it is
called the PageRank algorithm) and puts the pile in some sort of order for you to view. That
happens (again, this is VERY simplified, the actual algorithm is a VERY, VERY LONG
mathematical formula) like this:
IF both string objects appear together, in the same order typed, give 5 points AND
IF both strings appear in the order typed, in title of page, give 2 points AND
IF both strings appear in the order typed in metadata about page, give 4 points AND
IF both strings appear in the order typed twice give .1 point AND
IF both strings appear in the order typed 20 times give 3 points AND
IF web page with correct string is linked to Harvard University, give 1 point AND
IF web page with correct string is linked to Stanford University, give 1 point AND
IF web page with correct string is linked to National Enquirer, subtract 3 points AND
THEN put the web page with most points first on the page of results, and list the rest in
descending order.
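Here is one way the made-up point system above could look as code. It is a sketch of this
lecture's invented rules only, and has nothing to do with Google's real formula. The Page record,
the sample pages, and the choice to read "twice" as "at least twice" and "20 times" as "at least
20 times" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A made-up record of what the ranker knows about one web page."""
    address: str
    title: str
    metadata: str
    text: str
    links: set = field(default_factory=set)   # sites this page is linked with


def score(page, phrase):
    """Add up points using the invented rules from the list above."""
    phrase = phrase.lower()
    count = page.text.lower().count(phrase)   # times the exact phrase appears
    points = 0.0
    if count >= 1:
        points += 5
    if phrase in page.title.lower():
        points += 2
    if phrase in page.metadata.lower():
        points += 4
    if count >= 2:        # assumption: "twice" means at least twice
        points += 0.1
    if count >= 20:       # assumption: "20 times" means at least 20 times
        points += 3
    if "Harvard University" in page.links:
        points += 1
    if "Stanford University" in page.links:
        points += 1
    if "National Enquirer" in page.links:
        points -= 3
    return points


def rank(pages, phrase):
    """Return the pages sorted so the highest score comes first."""
    return sorted(pages, key=lambda page: score(page, phrase), reverse=True)


# Two invented pages to rank.
pile = [
    Page("example.org/tabloid", "Shock!", "",
         "Global warming! Global warming!", {"National Enquirer"}),
    Page("example.org/report", "Global Warming Report", "global warming data",
         "Global warming is measured in many ways.", {"Harvard University"}),
]

for page in rank(pile, "global warming"):
    print(page.address, round(score(page, "global warming"), 1))
```

The report page comes out on top because it earns points for its title, its metadata, and its
Harvard link, while the tabloid page loses points for its National Enquirer link.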
The full ranking algorithm Google actually uses, the source of their fame and fortune, is very closely guarded.
Before the Google founders came up with their ranking system, results from a search engine were
almost random. This was one of the reasons human directories were popular, and still are extremely
useful. If you are comfortable with math, you will find this article, The Google Page Rank®
Algorithm and How It Works
(http://www.alvit.de/vf/en/web-development-the-google-pagerank-algorithm-and-how-it-works.html)
by Ian Rogers, interesting (it is interesting even if you can't quite get the finer points).
Google, the Internet search engine
Please read this Wikipedia page carefully:
http://en.wikipedia.org/wiki/Google_(search_engine)
It is part of the lecture. While it describes Google specifically, you can apply most of what is in
the article to other search engines with a few alterations. Most people don't know how a
search engine operates, even though they use one all the time. Magic? NOT.
Most people search Google by typing a bunch of words in the text box. As you probably now
know from reading the Wikipedia article, that works because of Google's ranking algorithm, also
discussed above. When you search this way, your search is done the way Google
wants to do it, and they may not have the same criteria or needs you do. Their goal as a
for-profit business has to be to make money from your visit. But your goal is to get what you want
and get it now. So sometimes trying something other than typing a bunch of words is helpful.
This is where you see better results using an advanced search page or Boolean operators.
Boolean operators were introduced in the first lecture. Here we'll see them in action. Boolean
operators and advanced search pages offer you the ability to create better, more targeted
searches. For instance, if you are looking for things on global warming, there is a synonym:
climate change. Some of the best resources could be about "climate change" and not mention
global warming. To avoid missing some good results, you could type this into the search box:
"global warming" OR "climate change"
Below are pictures of the difference in results between just "global warming" and "global
warming" OR "climate change". Notice that when you say you want X OR Y, you get many more
results than if you just want X. What happens when you say you want both X AND Y?
Search results for "global warming" OR "climate change".
Search results for "global warming" only.
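If you like to see the logic spelled out, Boolean OR and AND behave like combining two piles of
pages. The small sketch below uses Python sets. The page names are invented, and real result
counts are in the millions rather than a handful.

```python
# Invented example: which pages mention each phrase.
global_warming = {"page1", "page2", "page3"}
climate_change = {"page3", "page4", "page5"}

either = global_warming | climate_change   # X OR Y: pages with at least one phrase
both = global_warming & climate_change     # X AND Y: pages with both phrases

print(len(global_warming), "results for X alone")
print(len(either), "results for X OR Y")
print(len(both), "results for X AND Y")
```

OR can only add pages to the pile, and AND can only take pages away, which is why OR searches
return more results and AND searches return fewer.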
NOT is another Boolean command. For Google, use the minus sign immediately in front of a
word you don't want to see. Here is a picture of the results for "global warming" OR "climate
change" without documents containing the words green, house or greenhouse.
Another thing you can do with the Boolean command NOT is to eliminate all of the dot com
sites; for an academic research project this is usually a VERY good way to go. You do that by
adding this exact string -site:.com to your search.
Here are the results you get this way.
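The idea behind NOT can be sketched the same way: start with a pile of results and throw out the
ones you don't want. The addresses, page text, and the function name apply_not below are invented
for illustration. Google does this filtering on its own servers, and this is not its actual code.

```python
from urllib.parse import urlparse

# Invented results: web address -> a few words from the page.
results = {
    "http://www.example.com/warming": "global warming gadgets for sale",
    "http://www.example.edu/gases": "greenhouse gases and climate change",
    "http://www.example.org/overview": "climate change and global warming overview",
}


def apply_not(results, unwanted_words=(), unwanted_site_ending=None):
    """Drop results that contain an unwanted word or come from an unwanted kind of site."""
    kept = []
    for address, text in results.items():
        host = urlparse(address).hostname or ""
        if unwanted_site_ending and host.endswith(unwanted_site_ending):
            continue                       # like adding -site:.com
        if any(word in text.lower().split() for word in unwanted_words):
            continue                       # like adding -greenhouse
        kept.append(address)
    return kept


print(apply_not(results, unwanted_words=["greenhouse"]))
print(apply_not(results, unwanted_site_ending=".com"))
```

Each NOT you add shrinks the pile, so it is worth checking that you are not throwing out pages
you would actually want.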
Of course no one will ever look at even 3 million websites, but learning how to refine your
search strategies will make the top 20 results more relevant to you - and you might look at
them.
Beyond Boolean
I have this joke with some friends, when we are trying to be helpful but maybe not quite
making it. We say in a sort of sing song voice "helping". And then we laugh. This is what I think
about as I watch the evolution of search. And search tools. Search engines in particular are
always looking at new ways of "helping". A lot of them are actually great, once you notice they
are there. Google has recently put some new "help" out there for you. On a typical results page
there are now lots of ways to narrow a search even if you don't know Boolean.
New toys from Google are always popping up. In this case you are looking at the top of a search
results page. There are some new options: you can search web results, video, news, etc. Always
important, in fact maybe the most important thing you will learn in this class, is to pay attention
to what you are seeing. Buried here in the lecture in our seventh or eighth week is "The most
important thing". Study the page you are viewing. Pay attention to the links on the edges. Take
a quick look at the ads; you can tell a lot about ads just by looking at them. AND THINK. Pay
attention and THINK. And then once you do that, explore. The web is pretty wondrous. People
are amazing and inventive and things change all the time. So pay attention, THINK and then
explore. Check out the new stuff.
This is the bottom of a search results page in October of 2008. Check out the extras.
One of the reasons to pay attention, think, and explore is that it keeps you in the right frame of
mind to live today -- we all live in a time of change, sometimes frivolous change, but often
substantive and real changes. They affect the way we shop, vote, love, work and play. If you are
always watching, and learning, you don't necessarily expect things to be the same today as
yesterday, and you are more ready to cope with the inevitable and take advantage of it. And
that attitude of being ready for change and able to analyze and explore new tools, is invaluable.
But how does it relate to academic research? The ways we present, store and explore
information are always changing. Yesterday you needed a library unless you were very wealthy.
Today you need a computer with a fast internet connection, a few tools and a very good brain.
But the tools keep changing, like we saw on the Google results page. That page is now a tool,
not just a list of links. If I am going to write a paper on Global Warming, right on that page is
everything I need to get started. There are images, articles, links to books, etc. Last month I
would have had to run 3 or 4 separate searches to get to all of those things.
Google Advanced Search
Google's advanced search can be accessed by a link on their regular page, just to the right of
the search button. The link says "Advanced Search". Advanced search pages are often very
useful when you are not getting the results you want because they will lay out your
options, usually in plain English. Here is a picture of the Google advanced search page, where I
have highlighted what I typed in. Up at the top is what I think of as the translation: Google is
showing you the Boolean statement they are creating from your search terms.
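That "translation" step can be imitated in a few lines. The sketch below turns the kinds of boxes
an advanced search page offers into one search string, using the operators covered earlier in
this lecture (quotation marks, OR, the minus sign, and -site:). The function name build_query and
its parameter names are hypothetical. They are not Google's own field names.

```python
def build_query(all_words=(), exact_phrase="", any_of=(), none_of=(), exclude_site=""):
    """Combine advanced-search style fields into a single search string."""
    parts = list(all_words)                          # plain words: all must appear
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')            # quotation marks keep a phrase together
    if any_of:
        # Put multi-word terms in quotes, then join the alternatives with OR.
        quoted = [f'"{term}"' if " " in term else term for term in any_of]
        parts.append(" OR ".join(quoted))
    parts.extend(f"-{word}" for word in none_of)     # minus sign means NOT
    if exclude_site:
        parts.append(f"-site:{exclude_site}")        # leave out a whole kind of site
    return " ".join(parts)


print(build_query(any_of=["global warming", "climate change"],
                  none_of=["greenhouse"],
                  exclude_site=".com"))
```

Reading the string the form builds for you is a good way to learn the operators, so that you can
eventually type them straight into the regular search box.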
Another way you can use Google to find information is to use some of their other databases. If
you look down at the bottom of the picture above, they list some of their "topic-specific
search engines". Google doesn't just search the web; it also has a book search, a journal and
magazine search, and a news and news archive search. Each of these will search a separate
database not accessed by the regular searches. What follows is a brief description of each.
Google Books
Google Books is an attempt by Google to index, AND scan, all the books ever printed. Think
about that for a minute. They have teamed up with several libraries to make this happen and
are in the process of dealing with many of the legal issues around copyright that currently limit
their ability to do more. What you can currently do is search their book database in 2 ways.
One is to look for mentions of your terms in all of the pages they have scanned, which includes
many still protected by copyright and so not viewable in whole -- in that case you are only
able to read a page or two of the book itself. The best reason to do this is to find what books
exist on a topic; then you can head to your friendly librarian to get the book for you.
The other way to search, especially useful if you are a procrastinator, is to click on the full view
option and search only through books in the public domain, fully and freely available. The
difference between the results is stark: looking in all of them, you find 7160 books; looking in
only those you can view for free, you get 813.
Google Scholar
Google Scholar is Google's attempt to make magazine and journal articles visible and
sometimes available to you. Most journal and magazine articles are not free - although some
are. What Google has done is to index articles, make them searchable, and then link you to the
publisher's webpage where you can purchase them. Because we signed up with Google Scholar,
you can access many of our articles using the Google Scholar search. If you are on campus,
you will automatically see links to Full Text @ My Library, which take you into our
library databases when we have the articles, and then they are free for you. If you are off campus,
you have to set up your preferences in Google Scholar to look at our databases.
The other option if you find something you want is to copy the information and ask one of us
for it. We'll get you almost any article you can find, usually for free, via Interlibrary Loan.
Bonus points! What does the motto of Google Scholar, "Stand on the shoulders of giants",
mean?
Google News and the News Archive
Google News is a special database that houses the results from visits to 4,500 news sites around
the world. This database refreshes itself every 15 minutes, rather than on the 4-6 week cycle of
most of the other databases you search, like regular Google search, Google Scholar, etc. You
can read the very interesting story about how it all started in this article by Richard Wiggins:
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/890/799
The News archive is a newish addition and it does with newspapers and news magazines
something similar to what Google Scholar does for journals - it indexes them and either links
you to a free article, or to the publisher where you can pay for it. Many of the articles you can
find in our library databases for free, and the rest can be procured via Interlibrary Loan - so
DON'T use this to pay for articles! Both of the articles shown below for a fee can be found in
our library databases. The news database does not link into our databases, but that just means
you need to go looking yourself. Next week we will begin discussing database searching in
detail; you'll be an expert soon.
There is much more to Google than what we've put here. If you have time and are curious,
much more about Google and Google tools can be found on this page, Google Web Search
Features (http://www.google.com/help/features.html). And you can check out Google
products (http://www.google.com/intl/en/options/), which describes all of the different tools
Google offers -- so far beyond the classic search box. To see what is on the horizon, check out
the Labs!
And once you are a true advanced Google user, try ASK (ask.com).