Upload: ashlee-marshall

Post on 25-Dec-2015


NUWeb System

sw@gais.cs.ccu.edu

WWW Architecture

• Web Server (e.g., Apache, IIS)
• Browser (e.g., IE, Firefox)
• Addressing and Information Channel (DNS, URL, Search Engine)
• Abstract Model:
  – Provider (server), Consumer (client), Channel
  – Client-Server architecture, Centralized Service

Problems of the WWW due to the fundamental design

• Naming/Addressing problem:
  – Physical naming/addressing
  – Static binding through DNS
  – URLs may not be a good design (hard to remember)
  – DNS can be slow
• Information flow organization was not designed in from the start
  – Hotspot bottleneck problem, bandwidth waste problem
  – Cache and proxy technologies were added separately afterwards
• Linkrot problem
  – Dead links, wrong links, faked links
  – Approximately up to 15% of links
• Need a static IP, need to apply for a URL, need knowledge of building and managing websites
  – Creating and maintaining a website is costly
  – Webpage creation is not easy
• Divides the computer world into two hierarchies
  – Server: website owners, service providers
  – Client: ordinary users

Weaving the Web (quoted from Wikipedia)

• In Berners-Lee's book, Weaving the Web, several recurring themes are apparent:
  – It is just as important to be able to edit the Web as browse it. Wikis are a step in this direction, although Berners-Lee considers them merely a shadow of the WYSIWYG functionality of his first browser.
  – Computers can be used for background tasks that enable humans to work better in groups.
  – Every aspect of the Internet should function as a Web, rather than a hierarchy. Notable current exceptions are the Domain Name System and the domain naming rules managed by ICANN.
  – Computer scientists have a moral responsibility as well as a technical responsibility.

What Is NUWeb?

• Marriage of WWW with P2P
• Technologically:
  – NUWeb = WebServer + Browser + WNS + SearchEngine + Proxy/Cache + WebBuilder + Blog + CommunityEngine + KIM + P2P - URL - DNS - Cost
• Logically:
  – A new Web system for any net user to build his/her own web in an extremely easy-to-use way
  – A platform for web-building, information sharing, information management, community, and service management
• A platform for Webilization
• A project to pursue Wemocracy

NUWeb Functions

• A platform for Public Sharing and Publishing
  – Personal website/blog
  – Public community
  – Search engine
• A platform for Private Sharing and Community
  – Personal community builder
  – Sharing management
• A platform for personal information/knowledge management, content engine

NUWeb Software Architecture

• The NUWeb system is composed of three subsystems
  – NUWeb.CC CyberCenter
    • WNS (Web Name Service)
    • Search engine, cache
    • Community services (Photo, Blog, Video…)
  – NUWeb CP (Community Portal)
    • Community services (Blog, Photo, Video…)
    • Search engine service
    • Proxy and cache
  – NUWeb PP (Personal Portal)
    • NUWeb browser, KIM
    • NUWeb server
    • NUWeb personal portal/blog builder

How it works

• Personal Web server on the Windows platform
  – Auto indexing, thumbnails
  – Auto page generation and run-time rendering
  – Auto caching
  – Bundled with a PHP/Perl platform
• Registration with the WNS during setup
  – Site name, user account, SiteKey, …
• UPnP to handle firewall/NAT
• Packet-forwarding proxy to handle the cases where UPnP does not work correctly

How it works (2)

• Each time a client comes online, it sends its current IP and name/key info to the WNS center.
• A connection request to a personal site first sends the name of the site to the WNS to get the IP of the target site (dynamic binding).
• If the requested site is not online, the center redirects the request to the cache server.
• If the site is connected through a proxy, the connection goes through the relay proxy.
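
The resolution flow on this slide can be sketched as a small in-memory name service. This is only an illustration under stated assumptions: the class and method names (`WNS`, `announce`, `resolve`), the addresses, and the key check are hypothetical and are not taken from the NUWeb implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteRecord:
    key: str                   # SiteKey agreed at registration (hypothetical field)
    ip: Optional[str] = None   # current IP; None while the site is offline
    via_proxy: bool = False    # True if the site is reachable only through the relay


class WNS:
    """Toy name service illustrating dynamic binding; not the real WNS."""

    def __init__(self, cache_addr, relay_addr):
        self.cache_addr = cache_addr
        self.relay_addr = relay_addr
        self.sites = {}

    def register(self, name, key):
        # Done once, at site setup.
        self.sites[name] = SiteRecord(key=key)

    def announce(self, name, key, ip, via_proxy=False):
        # Called each time the site comes online with its current IP.
        rec = self.sites.get(name)
        if rec is None or rec.key != key:
            raise PermissionError("unknown site or wrong key")
        rec.ip, rec.via_proxy = ip, via_proxy

    def resolve(self, name):
        # Name -> where the request should go right now.
        rec = self.sites[name]
        if rec.ip is None:
            return ("cache", self.cache_addr)   # site offline: use the cache server
        if rec.via_proxy:
            return ("relay", self.relay_addr)   # behind NAT: go through the relay proxy
        return ("direct", rec.ip)
```

A browser would call `resolve("alice")` before every connection, so a site whose IP changes between sessions stays reachable under the same name.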

Naming and Dynamic Addressing

– A page is a textual web document. It contains UltraLinks or tags, and displaying a page may trigger the display of other objects, such as included images.
– An object is either a rich-text document (PDF, MS-DOC, MS-PPT, etc.), a multimedia file, or any single file that can be accessed in the web space.
– A resource is either a page or an object
– GRN (Global Resource Naming)
  • SiteUniqName#objectname[#class#type#location]
– A fixed IP is not necessary
– ABN (AddressByName), ABI (AddressById), ABC (AddressByContent)
– USI (UniversalSiteId)
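
A minimal sketch of parsing the GRN syntax shown above. It assumes that the three optional fields appear together or not at all, and that names never contain "#"; the slide states neither, and the `GRN` type and `parse_grn` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class GRN(NamedTuple):
    site: str                        # SiteUniqName
    obj: str                         # objectname
    res_class: Optional[str] = None  # optional #class
    res_type: Optional[str] = None   # optional #type
    location: Optional[str] = None   # optional #location


def parse_grn(text: str) -> GRN:
    """Split 'SiteUniqName#objectname[#class#type#location]' into fields."""
    parts = text.split("#")
    # Either the two mandatory fields, or all five; empty fields are rejected.
    if len(parts) not in (2, 5) or not all(parts):
        raise ValueError(f"not a valid GRN: {text!r}")
    return GRN(*parts)
```

For example, `parse_grn("alice#photos/cat.jpg")` yields a `GRN` whose `site` is `"alice"` and whose optional fields are `None`.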

NUWeb CyberCenter

• GRI: Global Resource Index
  – A distributed index structure for objects/pages in the NUWeb space
  – Uses a hash data structure
• Search engine, Community Service, Portal for NUWeb
• Proxy & Caching
  – Auto backup and versioning
  – Info filtering, content switching
  – Packet forwarding, center relay
  – Relay casting, media streaming
  – Hierarchical search
  – Collaborative cache (super cache)

Site Initialization

• When a new site is installed:
  – Register the following info
    • SiteUniqName, the name the center uses to interact with the site
    • Titles of the site (at most T bytes)
    • Abstract of the site (at most P bytes)
    • Tags (inappropriate tags, such as those infringing others' rights, will be removed by the center)
    • Country/city/county: real-world geographic info
    • Profile of personal info
    • Residents: SUN.resident identifies a user
  – Decide which directories to open to the public
  – Decide which directories to open to private connections
  – Decide whether to enable caching of the public directory
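
The registration record could look roughly like the sketch below. The field names are hypothetical, the limits T and P are passed in because the slide leaves them unspecified, and measuring length in UTF-8 bytes is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SiteRegistration:
    site_uniq_name: str
    title: str
    abstract: str
    tags: list = field(default_factory=list)
    region: str = ""                       # country/city/county
    public_dirs: list = field(default_factory=list)
    private_dirs: list = field(default_factory=list)
    cache_public: bool = True              # caching of the public directory

    def validate(self, max_title_bytes, max_abstract_bytes):
        """Return a list of problems; an empty list means the record is acceptable."""
        errors = []
        if not self.site_uniq_name:
            errors.append("SiteUniqName is required")
        # Limits are in bytes, so measure the encoded form (UTF-8 assumed).
        if len(self.title.encode("utf-8")) > max_title_bytes:
            errors.append("title too long")
        if len(self.abstract.encode("utf-8")) > max_abstract_bytes:
            errors.append("abstract too long")
        return errors
```

Checking bytes rather than characters matters for non-ASCII titles: a two-character Chinese title already takes six bytes in UTF-8.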

Site Initialization

• The server builds an index for the pages/objects covered by the site. The indexes for the public and private areas are kept separate so that privacy is protected.
• The index covers names and signatures, plus the content of pages. Content indexing for objects such as MS-DOC and PDF files is optional.
• After the site is set up, the user is asked to provide a list of friends, to whom the system sends invitation letters.

NUWeb Services

• NUSite, NUBlog
• NUSearch, NUSM
• NUCommunity, NUBBS
• NUBot, NUWatch, NUPush
• NUCache, NUProxy
• NUPedia, knowledge authoring/manager
• NUMail, P2P secure mail system
• NUJournal

Searching

• Search in the NUWeb center includes:
  – Searching pages/objects by name (WNS)
  – Page content search
  – * Attribute search, for example, searching for pages authored by Hamming
• The indexer in each NUSite sends the raw index to the center, and the center builds an index. The raw index is a record containing the indexable text for each page or object. A text extractor is used to extract text from rich-text documents such as MS-DOC/PPT documents. Uploading the raw index requires the user's approval first.
• Before rendering a search result to the user, the searcher needs to check whether the result page/object exists at that moment.
• It uses the SSN to check the SiteDB and see whether the site is available. It also uses the GRN to check where the resource is available in the cache.
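
The availability check before rendering results might be sketched as follows. The function name, the tuple layout of a hit, and the use of plain sets for "live sites" and "cached GRNs" are illustrative assumptions; the real system consults the SiteDB and the cache index.

```python
def filter_available(results, live_sites, cached_grns):
    """Annotate each (ssn, grn) hit with where it can be served from right now."""
    annotated = []
    for ssn, grn in results:
        if ssn in live_sites:
            status = "origin"          # the owning site is online
        elif grn in cached_grns:
            status = "cache"           # site offline, but a cached copy exists
        else:
            status = "not available"   # no online instance at all
        annotated.append((grn, status))
    return annotated
```

The searcher can then hide or flag the "not available" hits instead of returning dead links.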

Caching

• Caching
  – Every site page is automatically cached unless caching is explicitly disabled
  – In the first phase, caching is done in the center and in the NUWeb CP cache spaces. Objects are cached when accessed
    • The client caches the object in its cache spool and sends an index entry to the center to notify it that the object is in its cache.
  – In the second phase, caching is also done collaboratively in the P2P space, assuming some personal sites are willing to participate.
  – Cached objects are indexed by GRN and MD5
  – Note that modifying an object triggers an update to the global cache space that removes the original cache entry indexed by GRN
  – Each cached object records a timestamp of the content (the time the content was created)
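
A minimal sketch of a cache index keyed by both GRN and MD5, in which storing a modified object first removes the stale entry. The class name and the in-memory dictionaries are assumptions made for illustration; only `hashlib.md5` is a real library call.

```python
import hashlib
import time


class CacheIndex:
    def __init__(self):
        self.by_grn = {}   # GRN -> (md5 hex digest, content timestamp)
        self.by_md5 = {}   # md5 hex digest -> set of GRNs holding that content

    def put(self, grn, content, created=None):
        """Index a cached object; a modified object replaces its old entry."""
        self.invalidate(grn)
        digest = hashlib.md5(content).hexdigest()
        stamp = created if created is not None else time.time()
        self.by_grn[grn] = (digest, stamp)
        self.by_md5.setdefault(digest, set()).add(grn)
        return digest

    def invalidate(self, grn):
        """Remove the entry indexed by this GRN, if any."""
        old = self.by_grn.pop(grn, None)
        if old is not None:
            grns = self.by_md5[old[0]]
            grns.discard(grn)
            if not grns:
                del self.by_md5[old[0]]
```

Keeping both keys lets a request be answered by name (GRN) while identical content stored under different names is still recognizable by its MD5.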

GRI & Collaborative Proxy

• GRI:
  – Object indexed by MD5 signature & GRN
  – Home page indexed by GRN
  – Instance indexed by MD5
• Syntax:
  – GRN: SUN#OBN
• Distributed/Collaborative GRI
• Multi-tier Collaborative Proxy

Indices (1)

• In the NUWeb center, there are several indices:
  – SiteDB: indexed by SSN
    • Last live time, access count, data size
    • While alive, each site periodically sends alive info to the center (every K minutes)
  – NameDB: indexed using gaisindex
    • Each name is associated with an SSN, by which we can check whether the page/object exists.
    • Each name has a record, which holds an SSN value and a GRN cache flag
    • In NameDB search results, a record with no online instance (neither the original site nor a cached copy) carries a flag indicating "not available"
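
The periodic "alive" message suggests a simple last-seen table. The sketch below is an assumption-laden illustration: the slide gives neither the value of K nor a timeout rule, so the interval and the grace factor of two missed intervals are hypothetical parameters.

```python
class SiteDB:
    """Toy liveness table keyed by SSN."""

    def __init__(self, interval_s, grace=2):
        self.interval_s = interval_s   # K minutes, expressed in seconds
        self.grace = grace             # missed intervals tolerated (assumption)
        self.last_live = {}            # SSN -> time of the last alive message

    def heartbeat(self, ssn, now):
        self.last_live[ssn] = now

    def is_alive(self, ssn, now):
        seen = self.last_live.get(ssn)
        return seen is not None and now - seen <= self.grace * self.interval_s
```

Passing `now` explicitly keeps the logic testable; a deployment would supply the current clock time.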

Indices (2)

– MD5 index: objects/pages indexed by MD5 signature. Each site produces an MD5 signature for each object, and the (GRN, MD5) info is sent to the center to be indexed. An MD5 lookup returns the source SSN/IP or the IP(s) of the cache site(s).
– Page/document content index
  • Indexed through the GAIS search engine
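
As an illustration of the MD5 index, the sketch below computes a signature with Python's `hashlib` and maps it to the holders of that content. The `MD5Index` class, the holder tuple layout, and the rule of listing the source before cache copies are assumptions, not details from the slides.

```python
import hashlib
from collections import defaultdict


def md5_signature(content: bytes) -> str:
    """Hex MD5 digest used as the content signature."""
    return hashlib.md5(content).hexdigest()


class MD5Index:
    def __init__(self):
        self.holders = defaultdict(list)   # md5 -> [(kind, address, grn)]

    def add(self, grn, md5, address, kind="source"):
        # kind is "source" for the owning site or "cache" for a cache site.
        self.holders[md5].append((kind, address, grn))

    def lookup(self, md5):
        """Return all holders of this content, the source site first."""
        found = self.holders.get(md5, [])
        return sorted(found, key=lambda h: h[0] != "source")
```

Each site would send its `(grn, md5_signature(content))` pairs to the center, which calls `add`; a later `lookup` then answers "who has this exact content?" regardless of name.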

NUWeb Portal Service

• Search engine for the NUWeb cyberspace
  – Websites, pages, pictures, videos, documents, articles, etc.
• Browsing and Viewing
  – What's hot, what's new, what's cool
  – Automatically generated by a page rendering tool based on a CountDB and list manager

NUWeb DB

• The NUWeb cache is implemented through the NUWeb DB system.
• NUWeb DB stores Web objects and their relationships and provides a search function.
  – Web DB:
    • ODB (Object DB)
    • NDB (Name DB)
    • IDB (Index DB)
    • TDB (Term DB)
    • UDB (User DB)
    • SDB (Site DB)
    • Page Engine
    • Access Log DB (PV DB)
    • Access Control
    • Query Interface (including SQL) *

Web DB implementation

• ODB and NDB are the kernel storage DBs
• The key technique used in ODB and NDB is the Hash DB, which needs to minimize disk seeks and maximize memory usage.
• PV DB (Access Log DB) is implemented on top of ODB and NDB.
• Term DB is implemented on top of ODB too. Term DB records term frequency, term score, and related information.

Web DB implementation (2)

• Site DB records site info such as access frequency, size, dynamics, etc.
• IDB is a real-time index engine for all the objects stored in Web DB.
• Access Control:
  – Authorization: permission-list based
  – Authentication: through an authentication center in the WNS server
• SQL is not supported yet; it is on the to-do list.

NUDB

• Net User's DataBase
• Easy to use
  – No database background is needed
  – No need to program
  – Define the spec and start using it
    • The spec can be adjusted flexibly
  – Scalable
• Combines the advantages of table-processing software such as Excel and of database systems
• Portable, computable, mergeable

NUDB implementation

• Physical DB Kernel
  – Hash DB
  – Inverted Index
  – Pattern Matching
• Schema Layer and Query Processing
• User Interface Layer
  – Data Presentation Management
  – DUA (Database User Agent, similar to an MUA)

NUBlog

• AJAX-Based Blog System
• Personal Blog Home Base
  – Can have multiple copies on the web
  – Creation, Management, Posting
• Import, Export:
  – XML-RPC
  – Robot, simulating browser behaviour

NUWatch

• Personal Web Agent
• Event Watch, News Watch
• Service Watch
• Site Watch
• Commerce Watch

NUWatch Implementation

• Personal Profile Manager
• Matching Platform
  – On-the-fly matching
  – Batch-mode matching through searching
• Data Source Agents
  – Per-user agent
  – Centralized agent (can reduce overhead)
• Notification Agent
  – Relay casting to speed up delivery
  – Gateway to message system

NUCommunity

• Personal and Regional Community Engine
  – Forum, Vote
  – Calendar, File Sharing
  – Address Book, DB, …
  – Interaction mechanism (auto notification, …)
• A community is conceptually a NUWeb site
• A community is treated like a user in the NUWeb space's authentication and authorization

Access Control

• Supports both password-based and membership-based protection.
• Each directory is associated with a protection data structure
• Authentication is done in the WNS server
• Uses the permission-list technique for membership-based protection
• Protection is per directory; no inheritance is assumed.
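
A minimal sketch of membership-based protection with one permission list per directory and no inheritance. The class name, the path normalization, and the `default_allow` policy for directories without a list are assumptions; password-based protection and the WNS authentication step are left out.

```python
class DirectoryACL:
    def __init__(self, default_allow=False):
        self.default_allow = default_allow   # policy for unlisted directories (assumption)
        self.protection = {}                 # directory path -> set of allowed members

    def protect(self, directory, members):
        self.protection[directory.rstrip("/")] = set(members)

    def allowed(self, user, directory):
        # Exact-path lookup only: a parent's list is never consulted,
        # so protection is not inherited by subdirectories.
        members = self.protection.get(directory.rstrip("/"))
        if members is None:
            return self.default_allow
        return user in members
```

Because the lookup uses the exact directory path, protecting `/friends` says nothing about `/friends/2007`; that subdirectory needs its own permission list.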

NUJournal

• Why is publication done through paper?!
  – Traditionally, publications HAD TO BE published on paper
  – A journal is both a channel and a barrier
  – Most papers enter a dead state once published
• A new model of publication
  – Separate the concepts of publication and evaluation
  – Publication is an autonomous decision, and can happen through one's own website, to be reviewed and commented on by readers or reviewers
  – A journal is a marketplace that gathers and guides access to publications, and a place to comment on and evaluate them
  – A publication can be a long-lived object
  – Other authors can join a published work over time if they make substantial contributions to it
  – A publication is evaluated by its contribution and impact

Thanks!