What can a business do with a web index?

From Trust Flows Understanding: The Deloitte Fast 50 Big Data Company You Never Heard Of… Until Now.

Upload: dixon-jones

Post on 14-Aug-2015





  1. From Trust Flows Understanding: The Deloitte Fast 50 Big Data Company You Never Heard Of… Until Now.
  2. @tryMajestic Some Stuff You'll Learn: How we built a search engine without $30 billion. How you can use it to make lots of: Predictions, Insights, Money, Data Stories.
  3. Reaching for the Stars
  4. An Inspiration of a Search Engine
  5. Majestic is a Specialist Search Engine: Digital knowledge on a grand scale. Dixon Jones
  6. The BIG specialist search engine: Twitter has 500,000,000 Tweets per day on average. In the same day, Majestic crawls well over 2,000,000,000 NEW URLs (and sees 7 billion).
  7. How do they do that? Information Retrieval in the Zeta age: 1. Data Collection; 2. Data Grouping; 3. Data Indexing; 4. Data Matching.
  8. How to Collect 7 Billion URLs a Day?
  9. How to Analyze 200 Billion URLs a Day?
  10. Groups Make Search Much Better: Find a Fact; Find a Friend; Find a Customer; Finding Anything. (Image: Library of Congress, circa 1940.) Research at: info.majestic.com/groupresearch
  11. We Group AND ANALYSE pages: Topical Trust Flow using decay Algorithm ???
  12. The Index: For every page we know its influence in a simple score; its context; its context by keyword; its influence in context! In a series of simple 0-100 scores.
  13. Works best with Universal Data set: Every signal is small. Individually prone to error or opinion. At scale the error decreases. Confidence increases. http://info.majestic.com/universal
  14. Data Matching
  15. Our Data Stack (For the Techies): Crawler: C# .net / Mono; NoSQL; Read only file system; Java Interrogation; Dynamic Front End; Perl/Ruby etc; Hadoop coming soon.
  16. So we built it. Now imagine: what COULD you do with it?
  17. 1: Compare Competitor Backlinks
  18. Who is more popular on Twitter? 2: Finding influencers. Lady Gaga? Barack Obama? Trust Flow 74, Trust Flow 70
  19. 3: Predicting Elections: Boris v Ken; Obama v Romney
  20. 4: Lobbying Senators
  21. 5: Data Art (Profiling Companies)
  22. What if we Pivot? Hadoop. Imagine your OWN version of our web index? A subset of the data, prepopulated for your needs. Updated Daily / Weekly / Monthly. Stored in Open Source Hadoop instances ready for easy interrogation. What could you do then?
  23. Data Store Examples
  24. (No text on this slide.)
  25. (No text on this slide.)
  26. (No text on this slide.)
  27. Ways you could segment the web: All domains hosted in [Choose country or City Here]; Most influential sites about [Insert 800 Topics Here]; Best Web Pages for [Choose 50 Million Phrases Here]; Spammiest pages about [Insert 800 Topics Here]; Most influential Pages on [Choose any set of sites]; Create a set of pages with [Choose properties here]. Got a plan? We have the starting point for web data.
  28. Some Takeaways: How we built a search engine without $30 billion. How you can use it to make lots of: Predictions, Insights, Money, Data Stories.
  29. Out of Trust Flows understanding: Real insight into the world wide web from Majestic, the specialist search engine
  30. From Trust Flows Understanding
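
Slide 13 argues that each signal is small and error-prone on its own, but that error shrinks and confidence grows at scale. The slides do not give a statistical model, so the sketch below is only an illustration under a loud assumption: every signal is an independent, unbiased noisy reading of one true score. Under that assumption the standard error of the average falls as 1/sqrt(n). The names `standard_error` and `mean_abs_error`, the true score of 50 and the noise level of 20 are all hypothetical, not Majestic's method or data.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical standard error of the mean of n independent signals."""
    return sigma / math.sqrt(n)


def mean_abs_error(n: int, trials: int, true_score: float = 50.0,
                   sigma: float = 20.0, seed: int = 0) -> float:
    """Average |estimate - truth| when n noisy signals are averaged.

    Assumption (not from the slides): signals are independent Gaussian
    readings centred on the true score.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        signals = [rng.gauss(true_score, sigma) for _ in range(n)]
        errors.append(abs(statistics.fmean(signals) - true_score))
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Compare theory and simulation as the number of signals grows.
    for n in (10, 1_000, 100_000):
        print(n, round(standard_error(20.0, n), 3),
              round(mean_abs_error(n, trials=20), 3))
```

The averaging only helps when the errors are independent and unbiased; signals that share a systematic bias would not cancel out however many are collected.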