Mining the “Deep Web”
Mary Ellen Bates
Reluctant-Entrepreneur.com
August 16, 2017



What’s the “Dark Web”?

Accessible only through anonymous networks

Black-market content (drugs, hacking software, porn)
Free-speech forums
Drop-sites for leaks


Search engines are blocked by…


The “Deep Web” is…

Info that search engines can’t easily find, get into or read

Databases
Images, multimedia, statistics
Books, articles
Facebook and other social media*

*tomorrow at 10:15


Deep Web strategies

Look for the next lead, not The Answer

Treat it as a treasure hunt
Watch for clues, lists of resources

Build your own “library” of sources


Horticultural resources

plants.usda.gov

garden.org

catalog.extension.org


Business resources

SCORE.org

SBA.gov

inc.com/grow

Pixabay.com (free images)


Finding deep web content

Use search engines for leads:
Keywords (database OR dataset OR archive)
Keywords (portal OR resources OR “online tool”)
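
The two lead-finding patterns above can be generated for any topic. This is only a minimal sketch: the function names `or_group` and `lead_queries` and the sample topic "horticulture" are illustrative assumptions, not part of the talk.

```python
def or_group(terms):
    """Join terms into a parenthesized OR group, quoting multi-word terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def lead_queries(topic):
    """Return the two lead-finding queries from the slide for one topic."""
    data_words = ["database", "dataset", "archive"]
    portal_words = ["portal", "resources", "online tool"]
    return [
        f"{topic} {or_group(data_words)}",
        f"{topic} {or_group(portal_words)}",
    ]


if __name__ == "__main__":
    # "horticulture" is a sample topic chosen for illustration.
    for query in lead_queries("horticulture"):
        print(query)
```

Paste each printed line into a search engine. The results are leads to follow, not final answers.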


Finding deep web content

Start with one known source (association, agency, non-profit, library, etc.)

Then find their links to other resources

Look for mentions OF that site
e.g. “consumerhort.org”
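
One way to look for mentions of a known site is to search for its domain as an exact phrase. The sketch below also excludes the site's own pages with the `-site:` operator. That exclusion is an added assumption rather than something stated on the slide, and `mentions_query` is a hypothetical helper name.

```python
def mentions_query(domain, exclude_own_site=True):
    """Build a query that finds pages mentioning a domain.

    The domain is quoted so it is matched as an exact phrase.
    Excluding the site's own pages is an assumption added here,
    so that only third-party mentions come back.
    """
    query = f'"{domain}"'
    if exclude_own_site:
        query += f" -site:{domain}"
    return query


if __name__ == "__main__":
    print(mentions_query("consumerhort.org"))
    print(mentions_query("consumerhort.org", exclude_own_site=False))
```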


Find “similar” sites

SimilarSites.com
Based on content, link analysis, user behavior, etc.
Use to find other good sites


We librarians

Librarians build “libguides”
Road map for better research on a topic
Loaded with deep web links

inurl:libguides ("garden center" OR gardening OR horticulture)
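
The libguides search can be turned into a link to bookmark or share. This is a rough sketch that assumes the standard Google search URL with a `q` parameter. `libguides_search_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def libguides_search_url(terms):
    """Build a Google search URL for libguides pages matching any term."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    query = "inurl:libguides (" + " OR ".join(quoted) + ")"
    # urlencode percent-encodes the quotes, colon and parentheses.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    url = libguides_search_url(["garden center", "gardening", "horticulture"])
    print(url)
    # Decode the query string again to confirm it survived encoding.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

Swapping in other topic terms produces the same kind of search for a different subject area.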


Libraries have deep web!

Stick around for “Hidden Databases: Accessing Priceless Market Research… Without Spending a Dime”


Insights from other shows

ID relevant conferences
("garden show" OR "horticultural show") trends

Be sure to limit the search to this year
Then…


Insights from other shows

Find the web site for that show
Scan workshops, keynotes
Look up those speakers, see their web pages


Insights from expen$ive reports

ID the title of a useful report

Google mentions of that report and the word trends

"according to the 2017 National Gardening Survey" trends


Has a page disappeared?

Try the Wayback Machine (archive.org)

Copies of the page over time
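
Wayback Machine links follow a predictable pattern, so they can be built directly from the address of the missing page. This sketch uses the `web.archive.org/web/<timestamp>/<url>` form. The function names and the sample address `example.com/old-page` are placeholders, not sources from the talk.

```python
def wayback_calendar_url(page_url):
    """URL of the calendar view listing every saved copy of a page."""
    # The "*" stands in for "all timestamps".
    return f"https://web.archive.org/web/*/{page_url}"


def wayback_snapshot_url(page_url, timestamp):
    """URL of the saved copy closest to a timestamp.

    timestamp is a digit string such as "2017" or "20170816";
    the Wayback Machine redirects to the nearest capture it holds.
    """
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


if __name__ == "__main__":
    # example.com/old-page is a placeholder for the page that vanished.
    print(wayback_calendar_url("example.com/old-page"))
    print(wayback_snapshot_url("example.com/old-page", "20170816"))
```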


Has a page disappeared?

Try the Google cached copy
Copy the URL into Google’s search box
Click the [icon] next to the link


The Deep Web = 2nd page of Google search results

Go deeper!

Change your settings to 100 results


“Hidden” search results

Use Millionshort.com to see other results

“Long-tail” search engine
Eliminates the most popular sites
Find obscure and less-commercial sites


Slide deck is at

Reluctant-Entrepreneur.com/extras (or just text me)

Mary Ellen Bates
+1 303 772 [email protected]: @mebs
