NUWeb: A New Web System, Sun Wu, NUWeb Team, 2007/05/20

Page 1

NUWeb: A New Web System

Sun Wu

NUWeb Team

2007/05/20

Page 2

Birth of WWW

• 1989: Berners-Lee's WWW proposal
  – Proposed a networked hypertext information system
  – Later refined into:
    • HTML, HyperText Markup Language
    • HTTP, HyperText Transfer Protocol
    • URI, Uniform Resource Identifier
• 1993:
  – Mosaic (National Center for Supercomputing Applications, University of Illinois; team led by Marc Andreessen)
• 1994:
  – Netscape, Yahoo, WebCrawler, Lycos, …
  – The year the WWW revolution swept over the world

Page 3

WWW Architecture

• Web server (e.g., Apache, IIS)
• Browser (e.g., IE, Firefox)
• Addressing and information channel (DNS, URL, search engine)
• Abstract model:
  – Provider (server), consumer (client), channel
  – Client-server architecture, centralized service

Page 4

Factors that made the WWW so successful

• Marriage of hypertext to the net
• Free and open
  – When the WWW started to boom, Gopher started to charge.
• The introduction of a multimedia UI: Mosaic
• An unsung hero:
  – The design of CGI made the WWW a service provision platform, rather than only an information provision platform.
• …

Page 5

Pre-WWW history

• The Internet was popular only in academia
  – Mail, BBS, newsgroups, list servers, Talk, IRC, anonymous FTP, Gopher, Archie, WAIS
• Hypertext and SGML in document processing

-> The WWW is a milestone integration of existing concepts into a new model of information system.

Page 6

Starring Techs/Services on the Web after 1994

• Java, JavaScript, ASP, Perl, PHP, MySQL, RoR, …
• Search engines, cache, proxy, portals, info agents
• Webmail, Web-based communities, …
• EC, e-learning, info-matching (e.g., job matching), …
• EIP, KM, Web apps, Internet appliances, …
• Information security:
  – Firewall, antispam, antispyware, content filtering, …
• Authoring systems/services/tools:
  – Dreamweaver, FrontPage, …
  – Blog, wiki, web-based editors
• CSP: community service provider
  – Blog, Wikipedia, YouTube, Flickr, …
• Semantic Web, Web mining, information extraction, XML, CSS, DHTML, AJAX
• …

Page 7

Birth of P2P

• P2P is another history-making innovation that has had a tremendous impact on the information age
  – 1996 ICQ, 1999 Napster, 2000 Gnutella, 2001 BitTorrent, …
• P2P is not a WWW system; it uses different approaches to information sharing.
  – No HTTP, no web server, no web pages, …
  – Peer network, multi-source download, filename index
  – A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network (quoted from Wikipedia)
• P2P is now used mainly for sharing files in the user’s PC space and for communication between peer users.
• P2P networks are being used to provide very-large-scale services, e.g., broadcast streaming, …
• File sharing: Foxy, BT, eDonkey, …
• IM/VoIP: ICQ, MSN-IM, YIM, Skype, …

Page 8

WWW vs P2P

• Grounded vs underground file sharing
• Explicit caching/proxy vs implicit caching/proxy
• Centralized vs decentralized
• Between them, the two roughly split the bandwidth usage of the Internet.
• Commercial value:
  – WWW: highly explored (because of web services)
  – P2P: under-explored
• Technologically, P2P has higher scalability and cost-effectiveness; however, it is much more limited in applications, because of the following difference:
• Service shipping vs information shipping
  – P2P: information shipping
  – WWW: info/service shipping

Page 9

Information sharing in the current age

• Publishing and public sharing through Web sharing platforms: websites, portals, blogs, YouTube, Wikipedia, …
• Search engines are the major vehicles for finding wanted information
• Dark-net search engines and file sharing through P2P peer-space search-and-grab tools such as BT, Foxy, eDonkey
• Communication and private peer-to-peer sharing through IM-style software such as MSN-IM, YM, Skype, …

Page 10

Problems in current information sharing

• Sharing through Web space:
  – Websites are the critical platform for sharing in Web space, but individuals in general do not own websites
  – Public sharing platforms have limitations such as size and autonomy
  – Upload/download is not efficient or convenient
• Sharing through P2P:
  – Security threat: files retrieved from who-knows-who could contain malicious content
  – Copyright infringement
  – No browsing/searching function on the peer node

Page 11

Problems of the WWW due to the fundamental design

• Naming/addressing problem:
  – Physical naming/addressing
  – Static binding through DNS
  – URL may not be a good design (hard to remember)
  – DNS could be slow
• Information flow organization was not designed in from the start
  – Hotspot bottleneck problem, bandwidth waste problem
  – Cache and proxy technologies were added separately afterwards
• Linkrot problem
  – Dead links, wrong links, faked links
  – Affects approximately up to 15% of links
• Need a static IP, need to apply for a URL, need knowledge of building and managing websites
  – Creating and maintaining a website is costly
  – Webpage creation is not easy
• Divides the computer world into two hierarchies
  – Server: website owners, service providers
  – Client: ordinary users

Page 12

Some problems in the current web era

• Sharing is limited and sometimes inconvenient or infeasible
  – Size limitation is a headache
  – If one takes gigabytes of pictures or video on a trip with many friends, how to share them is a headache
  – Websites are the main vehicle for sharing, but general users do not own websites.
• Information power and webilization are extremely centralized in the super portals
• Users are homeless and powerless, with no autonomy and their privacy at stake; personal data could be utilized/monetized
• There is not yet a powerful and complete information/web management platform.

Page 13

Personal Information Management problem

• Information is scattered across different places in the web space and on the PC.
• The UI experience differs between the Web and the PC’s windowing system.
• It is hard to manage one’s own information space.
• Although browsing/consuming the web is easy now, building/weaving a personal web is not easy yet. Wikis and blogs are a big improvement, but still not good enough.

Page 14

Weaving the Web (quoted from Wikipedia)

• In Berners-Lee's book, Weaving the Web, several recurring themes are apparent:
  – It is just as important to be able to edit the Web as browse it. Wikis are a step in this direction, although Berners-Lee considers them merely a shadow of the WYSIWYG functionality of his first browser.
  – Computers can be used for background tasks that enable humans to work better in groups.
  – Every aspect of the Internet should function as a Web, rather than a hierarchy. Notable current exceptions are the Domain Name System and the domain naming rules managed by ICANN.
  – Computer scientists have a moral responsibility as well as a technical responsibility.

Page 15

Unification?

• Can we put all information sharing, information service, information production, and information management under a unified framework?
• Can we make the PC’s information space a personal Web space?
• Can we integrate the two mainstream information systems into one?
  – Integrating the Web with P2P!
• If we are to design a new Web system, what shall we do?

Page 16

Our Answer

• NUWeb
• Net User’s Web
• A New Web System

Page 17

What Is NUWeb?

• Marriage of WWW with P2P
• Technologically:
  – NUWeb = WebServer + Browser + WNS + SearchEngine + Proxy/Cache + WebBuilder + Blog + CommunityEngine + KIM + P2P – URL – DNS – Cost
• Logically:
  – A new Web system for any net user to build his/her own web in an extremely easy-to-use way.
  – A platform for web-building, information sharing, information management, community, and service management
• A platform for Webilization
• A project to pursue Wemocracy

Page 18

Webilization

– Webilization refers to the degree of web civilization on the Internet. We say that a service is webilized if it is implemented through the web platform. The process of digitizing information and activities and putting them on the web as services can be viewed as a process of webilization.

Page 19

Wemocracy

– Wemocracy stands for web democracy. It is an era where users are granted, by nature, the basic information right and power to own and manage their own web space in cyberspace, with unlimited sharing capability and uncompromised autonomy. It is an ideal web era where the web belongs to all net users, rather than being strongly centralized and controlled by the super empires. It is a web for the user, by the user, and of the user.
– We envision Web 3.0 to be an era where the degree of webilization is balanced and high, and the idealism of wemocracy is achieved!

Page 20

NUWeb Features (1)

• Can set up one’s own website on one’s PC for free
  – No need to apply for a URL and no need for a static IP
  – Can be set up in a few minutes, like setting up an instant messenger
  – No knowledge of web server administration needed
  – Content can be cached, so it stays accessible even if the PC is offline
  – Can create web pages in an extremely easy way
• Can share publicly like YouTube, Flickr, Blogger, etc., with a powerful full-text search engine.
• Can set up a blog on one’s own website, with content cached in NUWeb space
• Can share directly with friends through a PC-to-PC connection without size limitation.

Page 21

NUWeb Features (2)

• Can easily manage the shared directories: what to share and whom to share it with
• Can set up a community for friends/relatives on one’s own PC.
• A browser with information management functions.
• Can set up a portal for a community such as a school
• A decentralized portal: a federation of collaborative regional portals and personal portals, where the center is a community center and search engine for the NUWeb space.
• A platform for sharing, searching, web service, community, and management

Page 22

NUWeb Software Architecture

• The NUWeb system is composed of three subsystems
  – NUWeb.CC (CyberCenter)
    • WNS (web name service)
    • Search engine, cache
    • Community services (Photo, Blog, Video, …)
  – NUWeb CP (Community Portal)
    • Community services (Blog, Photo, Video, …)
    • Search engine service
    • Proxy and cache
  – NUWeb PP (Personal Portal)
    • NUWeb browser, KIM
    • NUWeb server
    • NUWeb personal portal/blog builder

Page 23

Page 24

How it works

• Personal Web server on the Windows platform
  – Auto indexing, thumbnails
  – Auto page generation and run-time rendering
  – Auto caching
  – Bundled with a PHP/Perl platform
• Registration with the WNS during setup
  – Site name, user account, SiteKey, …
• UPnP to handle firewall/NAT
• Each time a client gets online, it sends its current IP and name/key info to the WNS center.
• A connection request to a personal site first sends the name of the site to the WNS to get the IP of the target site (dynamic binding)
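
The registration and dynamic-binding steps above can be sketched as a small in-memory name service. This is only an illustration under stated assumptions: the class and method names (`WebNameService`, `register`, `report_online`, `resolve`) and the record fields are hypothetical, and the slides do not describe the real WNS protocol or storage.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SiteRecord:
    """One WNS entry; the field names are illustrative, not from the slides."""
    site_key: str                 # secret agreed on at registration
    ip: Optional[str] = None      # last reported address, None until first report
    last_seen: float = 0.0        # time of the last report (epoch seconds)


class WebNameService:
    """Toy in-memory WNS: maps a site name to its most recently reported IP."""

    def __init__(self) -> None:
        self._sites: Dict[str, SiteRecord] = {}

    def register(self, site_name: str, site_key: str) -> None:
        # Done once during setup; each site name can be claimed only once.
        if site_name in self._sites:
            raise ValueError(f"site name already taken: {site_name}")
        self._sites[site_name] = SiteRecord(site_key=site_key)

    def report_online(self, site_name: str, site_key: str, ip: str) -> None:
        # Sent by the personal server each time it gets online.
        record = self._sites.get(site_name)
        if record is None or record.site_key != site_key:
            raise PermissionError("unknown site or wrong key")
        record.ip = ip
        record.last_seen = time.time()

    def resolve(self, site_name: str) -> Optional[str]:
        # Dynamic binding: the name is mapped to an IP at connection time.
        record = self._sites.get(site_name)
        return record.ip if record else None


wns = WebNameService()
wns.register("alice", "s3cret")
wns.report_online("alice", "s3cret", "203.0.113.7")
print(wns.resolve("alice"))  # 203.0.113.7
```

Because the binding is refreshed on every report, a site whose IP changes between sessions stays reachable under the same name, which is why no static IP is needed.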

Page 25

Naming and Dynamic Addressing

– A page is a textual web document. It contains UltraLinks or tags, and displaying such a page might trigger the display of other objects such as included images.
– An object is either a rich-text document such as PDF, MS-DOC, MS-PPT, etc., a multimedia file, or any single file that can be accessed in the web space.
– A resource is either a page or an object
– GRN, global resource naming
  • SiteUniqName#objectname[#class#type#location]
– A fixed IP is not necessary
– ABN (AddressByName), ABI (AddressById), ABC (AddressByContent)
– USI (UniversalSiteId)
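
As a rough illustration of the GRN syntax above, the following sketch splits a name into its parts. It assumes that the bracketed suffix is all-or-nothing (either two or five `#`-separated fields) and that `#` never occurs inside a field; the slides state neither, and the `GRN` class and `parse_grn` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GRN:
    site: str                          # SiteUniqName
    object_name: str                   # objectname
    res_class: Optional[str] = None    # optional #class
    res_type: Optional[str] = None     # optional #type
    location: Optional[str] = None     # optional #location


def parse_grn(text: str) -> GRN:
    """Parse 'SiteUniqName#objectname[#class#type#location]'."""
    parts = text.split("#")
    # Assumption: the bracketed suffix is all-or-nothing, so 2 or 5 fields.
    if len(parts) not in (2, 5):
        raise ValueError(f"expected 2 or 5 '#'-separated fields: {text!r}")
    if not parts[0] or not parts[1]:
        raise ValueError(f"site and object name must be non-empty: {text!r}")
    return GRN(*parts)


print(parse_grn("alice#photos/trip.jpg"))
```

The site part is what a name service would resolve to an address; the remaining fields identify the resource within that site.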

Page 26

NUWeb CyberCenter

• GRI: Global Resource Index
  – A distributed index structure for objects/pages in the NUWeb space
  – Uses a hash data structure
• Search engine
• Collaborative proxy
  – Content enhancement
  – Info filtering, content switching
  – Relay casting
  – Hierarchical search
  – Collaborative cache (super cache)
• P2P UMTP protocol

Page 27

Site Initialization

• When a new site is installed:
  – Register the following info
    • SiteUniqName, the name the center uses to interact with the site
    • Titles of the site (at most T bytes)
    • Abstract of the site (at most P bytes)
    • Tags (inappropriate tags, such as ones infringing others’ rights, will be removed by the center)
    • Country/city/county, real-world geographic info
    • Profile of personal info
    • Residents: SUN.resident will identify a user
  – Decide which directories to open to the public
  – Decide which directories to open to private connections
  – Decide whether to allow caching of the public directory

Page 28

Site Initialization

• The server builds an index of the pages/objects covered by the site. The indices for the public and private areas are kept separate so that privacy is protected.
• The index covers names and signatures, plus the content of pages; support for indexing object content, such as MS-DOC and PDF files, is optional.
• After the site is set up, the user is asked to provide a list of friends to whom the system will send invitation letters.
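
A minimal sketch of the name-and-signature index, kept separately for public and private directories, might look like this. It assumes MD5 as the signature (as on the later caching slides) and invents the function names `build_index` and `build_site_indices`; page-content indexing is left out.

```python
import hashlib
from pathlib import Path


def build_index(directories):
    """Map each file's name (relative to its shared root) to an MD5 signature."""
    index = {}
    for root in map(Path, directories):
        for path in sorted(root.rglob("*")):
            if path.is_file():
                name = f"{root.name}/{path.relative_to(root).as_posix()}"
                index[name] = hashlib.md5(path.read_bytes()).hexdigest()
    return index


def build_site_indices(public_dirs, private_dirs):
    # Two independent indices, so that entries for private files are never
    # mixed into the index that describes the public area.
    return {
        "public": build_index(public_dirs),
        "private": build_index(private_dirs),
    }
```

Keeping the two dictionaries apart means that anything exported to a shared index can be taken from the `"public"` entry alone.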

Page 29

NUWeb PP

• Service manager (starting from search)
• Grabbing, caching
• Personal portal, blog, …
• NUMail, P2P secure mail system
• Sharing, file transfer
• Information watchdog
• Information filter
• Information/knowledge management
• Relay casting, streaming
• Website builder, page creator

Page 30

NUWeb CP

• A suite of programs for setting up a Community Portal in NUWeb space
• Serves as the proxy and caching nodes in the NUWeb cyberspace
• Community services:
  – Mail, Blog, BBS, …
  – Web HD
  – Search engine, …

Page 31

Searching

• Search in the NUWeb center includes:
  – Search for pages/objects by name (WNS)
  – Page content search
  – * Attribute search, for example, search for pages authored by Hamming
• The indexer in each nusite sends a raw index to the center, and the center builds an index from it. The raw index is a record containing the indexable text for each page or object. A text extractor is used to extract text from rich-text documents such as MS-DOC/PPT documents. The raw index is uploaded only after the user approves.
• Before rendering a search result to the user, the searcher needs to check whether the result page/object exists at that moment.
• It uses the SSN to check the SiteDB and see whether the site is available. It also uses the GRN to check whether the resource is available in the cache.
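
The availability check described above could be sketched as follows. The data shapes are assumptions made for illustration: each hit is a dict with `ssn` and `grn` keys, `site_db` maps an SSN to a liveness flag, and `cached_grns` is the set of GRNs known to be cached. The slides do not expand "SSN", so it is treated here as an opaque site key.

```python
def annotate_availability(hits, site_db, cached_grns):
    """Tag each search hit with where it can be fetched right now."""
    annotated = []
    for hit in hits:
        if site_db.get(hit["ssn"], False):
            source = "origin"          # the owning site is online
        elif hit["grn"] in cached_grns:
            source = "cache"           # site is offline, but a cached copy exists
        else:
            source = "not available"   # neither the site nor a cache has it
        annotated.append({**hit, "source": source})
    return annotated


hits = [
    {"ssn": 1, "grn": "alice#a.html"},
    {"ssn": 2, "grn": "bob#b.html"},
    {"ssn": 3, "grn": "carol#c.html"},
]
for row in annotate_availability(hits, {1: True, 2: False}, {"bob#b.html"}):
    print(row["grn"], "->", row["source"])
```

Running the check at render time, rather than at index time, keeps the index itself free of short-lived online/offline state.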

Page 32

Caching

• Caching
  – Every site page is cached automatically unless caching is explicitly disabled
  – In the first phase, caching is done in the center and in the NUWeb CP cache spaces. Objects are cached when accessed
    • The client caches the object in its cache spool and sends an index entry to the center to notify it that the client holds the object in cache.
  – In the second phase, caching is also done collaboratively in the P2P space, assuming that some of the personal sites are willing to participate.
  – Cached objects are indexed by GRN and MD5
  – Note that modifying an object triggers an update to the global cache space that removes the original cached copy indexed by GRN
  – Each cached object records a timestamp of the content (the time the content was created).
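
One way to picture a cache indexed by both GRN and MD5, with removal of the stale copy on modification, is the single-process sketch below. The `CacheIndex` class and its methods are hypothetical; the real system is described as distributed across the center, community portals, and peers.

```python
import hashlib
import time


class CacheIndex:
    """Toy cache keyed by GRN (name) and by MD5 (content signature)."""

    def __init__(self):
        self._md5_by_grn = {}   # GRN -> MD5 of the currently cached content
        self._entries = {}      # MD5 -> (content bytes, content timestamp)

    def put(self, grn, content, created=None):
        # A modified object replaces the copy previously indexed by this GRN.
        self.invalidate(grn)
        digest = hashlib.md5(content).hexdigest()
        stamp = time.time() if created is None else created
        self._md5_by_grn[grn] = digest
        self._entries.setdefault(digest, (content, stamp))
        return digest

    def invalidate(self, grn):
        old = self._md5_by_grn.pop(grn, None)
        # Drop the content only if no other GRN still points at it.
        if old is not None and old not in self._md5_by_grn.values():
            del self._entries[old]

    def get_by_grn(self, grn):
        digest = self._md5_by_grn.get(grn)
        return None if digest is None else self._entries[digest][0]

    def get_by_md5(self, digest):
        entry = self._entries.get(digest)
        return None if entry is None else entry[0]
```

Looking up by MD5 lets identical content published under different names share one cached copy, while looking up by GRN always returns the latest version of a named resource.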

Page 33

GRI & Collaborative Proxy

• GRI:
  – Objects indexed by MD5 signature & GRN
  – Home pages indexed by GRN
  – Instances indexed by MD5
• Syntax:
  – GRN: SUN#OBN
• Distributed/collaborative GRI
• Multi-tier collaborative proxy

Page 34

Indices (1)

• In the NUWeb center, there are several indices:
  – SiteDB: indexed by SSN
    • Last live time, access count, data size
    • While alive, each site periodically sends alive info to the center (every K minutes)
  – NameDB: indexed using gaisindex
    • Each name is associated with an SSN by which we can check whether the page/object exists.
    • Each name has a record containing an SSN value and a GRN cache flag
    • In NameDB search results, a record that has no online instance (neither the original site nor a cached copy) carries a flag indicating “not available”
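
The periodic alive message suggests a simple liveness rule: a site counts as alive if its last report is recent enough. The sketch below assumes a grace factor of two missed intervals, which is not in the slides, and uses hypothetical names (`SiteDB`, `heartbeat`, `is_alive`); timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
class SiteDB:
    """Toy SiteDB keyed by SSN, tracking the last live time of each site."""

    def __init__(self, interval_minutes, grace=2):
        # A site is considered offline after `grace` missed intervals
        # (the grace factor is an assumption, not taken from the slides).
        self.timeout = interval_minutes * 60 * grace
        self._last_live = {}

    def heartbeat(self, ssn, now):
        # Called when the center receives a site's periodic alive info.
        self._last_live[ssn] = now

    def is_alive(self, ssn, now):
        last = self._last_live.get(ssn)
        return last is not None and now - last <= self.timeout


db = SiteDB(interval_minutes=5)
db.heartbeat(42, now=0.0)
print(db.is_alive(42, now=300.0))   # True
print(db.is_alive(42, now=601.0))   # False
```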

Page 35

Indices (2)

– MD5 index: objects/pages indexed by MD5 signature. Each site produces an MD5 signature for each object, and the (GRN, MD5) info is sent to the center to be indexed. An MD5 lookup returns the source SSN/IP or the IPs of the cache sites
– Page/document content index
  • Indexed through the GAIS search engine
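
The MD5 index can be pictured as a map from a content signature to every place that holds that content. In this sketch the holder is just an IP string, and `MD5Index`, `add`, `lookup`, and `signature` are illustrative names rather than the system's actual interface.

```python
import hashlib
from collections import defaultdict


def signature(content: bytes) -> str:
    """MD5 signature of an object's content, as a hex string."""
    return hashlib.md5(content).hexdigest()


class MD5Index:
    """Toy content index: MD5 signature -> holders of that content."""

    def __init__(self):
        self._sources = defaultdict(set)   # md5 -> {(grn, holder), ...}

    def add(self, grn, md5, holder):
        # `holder` stands for the source site's IP or a cache site's IP.
        self._sources[md5].add((grn, holder))

    def lookup(self, md5):
        # Return every distinct holder of content with this signature.
        return sorted({holder for _, holder in self._sources.get(md5, ())})


index = MD5Index()
sig = signature(b"holiday photo bytes")
index.add("alice#photo.jpg", sig, "203.0.113.7")
index.add("bob#copy.jpg", sig, "198.51.100.4")
print(index.lookup(sig))  # ['198.51.100.4', '203.0.113.7']
```

Because the key is derived from the content rather than the name, the same bytes stored under different GRNs resolve to a common list of holders, which is what makes multi-source retrieval possible.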

Page 36

NUWeb Portal Service

• Search engine for the NUWeb cyberspace
  – Websites, pages, pictures, videos, documents, articles, …
• Browsing and viewing
  – What’s hot, what’s new, what’s cool
  – Generated automatically by a page rendering tool based on a CountDB and a list manager.
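
The slides do not say how the CountDB or the list manager work. Purely as an illustration, the sketch below assumes that "what's hot" means most accessed and "what's new" means most recently first seen; `CountDB`, `record_access`, `whats_hot`, and `whats_new` are hypothetical names.

```python
from collections import Counter


class CountDB:
    """Toy access counter from which 'hot' and 'new' lists are derived."""

    def __init__(self):
        self._hits = Counter()    # GRN -> number of accesses
        self._first_seen = {}     # GRN -> time of first access

    def record_access(self, grn, now):
        self._hits[grn] += 1
        self._first_seen.setdefault(grn, now)

    def whats_hot(self, n=10):
        # Most accessed resources first.
        return [grn for grn, _ in self._hits.most_common(n)]

    def whats_new(self, n=10):
        # Most recently first-seen resources first.
        ordered = sorted(self._first_seen, key=self._first_seen.get, reverse=True)
        return ordered[:n]
```

A page rendering tool could then turn the returned lists into portal pages on a schedule or on demand.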

Page 37

Thanks

http://www.nuweb.cc