
International Journal of Computer Trends and Technology (IJCTT) - Volume 4 Issue 4 – April 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 630

Hypermedia Structure: Document Composition and Migration Path for Rich Set of Presentation

R. N. Jugele* and Dr. V. N. Chavan 

*Department of Computer Science, Science College, Congress Nagar, Nagpur, Maharashtra

Head, Department of Computer Science, S. K. Porwal College, Kamptee, Dist: Nagpur, Maharashtra

Abstract - Original paper documents are employed for archiving, and different hypertext structures are encountered in them. Different methods for analyzing document structure are presented, and this structure is used to present the content of the document to the user. The hypermedia research community has found that it is necessary to establish a reference architecture for hypermedia systems in order to make progress on defining a protocol that enables third-party applications to access link services. There is a need to extend the scope of these requirements. This paper presents an overall architecture for the integration of existing hypermedia systems in a distributed, collaborative model and provides a clear evolution path towards achieving this goal.

Keywords – Hypermedia, link, document, logical, geometric, protocol, object, virtual, runtime.

I. INTRODUCTION

A working group is establishing a protocol for hypermedia systems; the aim of this protocol is to enable applications to access hypermedia link service functionality in a consistent and standard manner. It has been observed that it is difficult to make progress on defining the Hypertext Protocol without first establishing a reference architecture for hypermedia systems. The Dexter Model[8] attempts to provide a standard hypermedia terminology coupled with a formal model of the common abstractions found within contemporary hypermedia systems. It presents a three-layer conceptual data model without any suggestion of an architecture for realizing the model. The Flag Taxonomy[16] describes the functionality and interaction of hypermedia systems in a manner that aids classification. Establishing an inclusive reference architecture for hypermedia systems requires work in the following areas:

• Agreement upon a specification for location specifiers (LocSpecs)[6]: Reich[11] and Rutledge[15] propose solutions for this issue of open location specifications.
• A reference architecture for hypermedia systems: Grønbæk and Wiil[7] propose a synthesis architecture based on the conceptual layers of the Dexter Model and introduce three protocols for integrating with external entities.
• A vision of a globally distributed and collaborative model with a clear evolution path toward this goal: the present model illustrates how hypermedia systems can be integrated in a manner that provides a powerful, distributed and collaborative architecture.
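The Dexter Model's three layers are the runtime layer, the storage layer and the within-component layer. As a rough illustration of the storage layer that such architectures build on, the sketch below models components with unique identifiers, anchors, and link specifiers carrying a direction, plus an accessor from identifier to component. It is a minimal sketch under stated assumptions: the Python class names, the string-typed component kinds and the `link_destinations` helper are hypothetical, and the runtime and within-component layers are left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    FROM = "FROM"
    TO = "TO"
    BIDIRECT = "BIDIRECT"
    NONE = "NONE"


@dataclass
class Specifier:
    """One end point of a link: which component, which anchor, which direction."""
    component_uid: str
    anchor_id: str
    direction: Direction


@dataclass
class Component:
    uid: str
    kind: str                                       # "atomic", "composite" or "link"
    content: object = None                          # opaque to the storage layer
    anchors: dict = field(default_factory=dict)     # anchor id -> within-component value
    specifiers: list = field(default_factory=list)  # used only by link components


class StorageLayer:
    def __init__(self):
        self._components = {}

    def add(self, component):
        self._components[component.uid] = component

    def accessor(self, uid):
        """Map a unique identifier to its component."""
        return self._components[uid]

    def link_destinations(self, link_uid):
        """Components a link leads to (its TO or BIDIRECT end points)."""
        link = self.accessor(link_uid)
        if link.kind != "link":
            raise ValueError(f"{link_uid} is not a link component")
        return [self.accessor(s.component_uid) for s in link.specifiers
                if s.direction in (Direction.TO, Direction.BIDIRECT)]
```

Keeping anchor values opaque mirrors the Dexter idea that only the within-component layer knows how to interpret a location inside a node.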

II. PAPER AS STRUCTURED DOCUMENT

Fig. 3. Paper document to structured hyperdocument

The document model defined here forms the basis for algorithms that convert paper documents into structured hyperdocuments. These algorithms require several processing phases, each addressing a different aspect of the document structure and content[14]. The processing steps are distinguished by the representation levels shown in Fig. 3. The method described below is easily tailored for use in other applications.

A. Paper to image objects

The scanned pages are binarized using the Isodata thresholding technique[12], and the resulting binary images are analysed. To speed up processing, the original image can be reduced to lower resolutions. All resolutions are mapped to a common document reference coordinate system, in which each image object is described by its geometric features. For each image object, a set of geometric features is defined, i.e. width, height and aspect ratio.
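The thresholding and feature steps above can be sketched in plain Python. The paper does not spell out its Isodata variant, so the iterative mean-of-means form used here (start from the global mean, then repeatedly set the threshold halfway between the mean gray levels below and above it) is an assumption, as are the function names and the dark-ink-on-light-paper convention. Extracting connected image objects from the binary image is omitted.

```python
def isodata_threshold(pixels, tol=0.5, max_iter=100):
    """Iterative Isodata (mean-of-means) threshold for a flat list of gray levels."""
    t = sum(pixels) / len(pixels)                  # start from the global mean
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:                    # uniform image: nothing to split
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:                   # converged
            return new_t
        t = new_t
    return t


def binarize(image, t):
    """1 = ink, 0 = background; assumes dark print on light paper."""
    return [[1 if p <= t else 0 for p in row] for row in image]


def geometric_features(x_min, x_max, y_min, y_max):
    """Width, height and aspect ratio of an image object's box (inclusive pixel coordinates)."""
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    return {"width": width, "height": height, "aspect_ratio": width / height}
```

For example, `isodata_threshold([0, 0, 0, 0, 100])` moves from the mean 20 to 50 and then stops, placing the threshold midway between the two gray-level clusters.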

B. Image objects to basic geometric objects

Image objects are grouped into segments, and each segment is classified into one of a set of geometric object classes using a decision tree method[18]. The class labels are {text, figure, horizontal line, vertical line, noise}. The features of a segment are the minimum, maximum, average or modal values of the features of the image objects in the group. The values $x_i^{\min}$, $x_i^{\max}$, $y_i^{\min}$ and $y_i^{\max}$ define the bounding box of segment $i$.

Three kinds of segment characteristics are used:

• Features of the individual segments
• Relations between pairs of segments
• Characteristics based on the whole set of segments

The individual characteristics used are the width, height and position of a segment on the page. This selection-and-action method forms the basis for further document analysis aimed at deriving the basic geometric objects. The image objects are usually the smallest items in the image that can be given an interpretation in document terms, such as characters and parts of figures. They do not correspond to the basic components required in the geometric structure, which are single paragraphs and complete figures.
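A hand-written decision tree over a segment's bounding box illustrates the idea. This is only a sketch: the paper's tree comes from [18] and its split features and thresholds are not given, so `noise_area`, `line_ratio` and `max_char_height` are hypothetical placeholders, not trained values. The segment box is the union of its members' boxes, and the modal object height stands in for the "modal value" features mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box with inclusive pixel coordinates."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    @property
    def width(self):
        return self.x_max - self.x_min + 1

    @property
    def height(self):
        return self.y_max - self.y_min + 1


def segment_bbox(objects):
    """x_i^min, x_i^max, y_i^min, y_i^max of a segment: the union of its objects' boxes."""
    return Box(min(o.x_min for o in objects), max(o.x_max for o in objects),
               min(o.y_min for o in objects), max(o.y_max for o in objects))


def classify_segment(objects, noise_area=20, line_ratio=20, max_char_height=40):
    """Hand-written decision tree; the thresholds are placeholders, not trained values."""
    box = segment_bbox(objects)
    if box.width * box.height < noise_area:
        return "noise"
    if box.width >= line_ratio * box.height:
        return "horizontal line"
    if box.height >= line_ratio * box.width:
        return "vertical line"
    # Modal height of the member objects: text consists of many character-sized objects.
    heights = [o.height for o in objects]
    modal_height = max(set(heights), key=heights.count)
    return "text" if modal_height <= max_char_height else "figure"
```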

C. Geometric objects to geometric structure

For multi-column documents, the geometric structure is mostly concerned with the column structure. For two-column documents, segments are classified into {centered, left column, right column}.

The column of a segment $s$ is computed by considering whether it is intersected by the middle line of the page. If not, the column is obvious; otherwise the following rule is used:

$$
\mathrm{column}(s) =
\begin{cases}
\text{left column} & \text{if } c(s) < -\epsilon \\
\text{centered} & \text{if } -\epsilon \le c(s) \le \epsilon \\
\text{right column} & \text{if } c(s) > \epsilon
\end{cases}
$$

where $\epsilon$ is the parameter deciding when an element is considered to be centered and $c(s)$ is the centrality of segment $s$. This method is not well suited for centered segments in the document, as it depends on the alignment of the bounding boxes in the vertical direction.
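The column rule can be sketched as follows. The text does not define the centrality $c(s)$, so the version here is an assumption: it is -1 for a segment lying entirely left of the middle line, +1 for one entirely right of it, and 0 for one split evenly. The parameter `eps` plays the role of $\epsilon$; its default of 0.1 is a placeholder.

```python
def centrality(x_min, x_max, page_width):
    """Hypothetical c(s) for a segment spanning x_min..x_max (requires x_max > x_min):
    -1 entirely left of the middle line, +1 entirely right, 0 split evenly."""
    mid = page_width / 2
    return (x_min + x_max - 2 * mid) / (x_max - x_min)


def column(x_min, x_max, page_width, eps=0.1):
    mid = page_width / 2
    if x_max <= mid:                 # not intersected by the middle line: obvious case
        return "left column"
    if x_min >= mid:
        return "right column"
    c = centrality(x_min, x_max, page_width)
    if c < -eps:
        return "left column"
    if c > eps:
        return "right column"
    return "centered"
```

A segment from x = 200 to 520 on a 1000-unit page crosses the middle line but has c(s) = -0.875, so it is still assigned to the left column.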

D. Geometric to logical objects

The basic objects carry a geometric label. There are one or two headers at the top of a page, page numbers are at the bottom of the page, and title pages have a title above and a footer below the text body. The classification strategy is shown in the following table.

Predicate → New type
top most(text) → header
vertical overlap(text, header) → header
bottom most(text) → page number
in margin(figure, text) → caption
segment centered(text) ∧ above middle(text) → title
segment centered(text) ∧ below middle(text) → footer

The predicate segment centered(text) is evaluated in the same way as deciding whether a column is centered. The algorithm is suited to the title page, the purely textual pages and the combined text/figure pages present in the document.
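The table can be read as an ordered rule list in which the first predicate that holds assigns the new type. The sketch below is one possible reading under stated assumptions: "in margin(figure, text)" is interpreted as text starting within a hypothetical `gap` below a figure, "segment centered" reuses a centrality test with an assumed threshold of 0.1, and the page coordinates have y growing downwards. The names `Obj` and `relabel` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    label: str     # geometric label such as "text" or "figure"
    x_min: float
    x_max: float
    y_min: float   # y grows downwards from the top of the page
    y_max: float


def v_overlap(a, b):
    return a.y_min <= b.y_max and b.y_min <= a.y_max


def centered(o, page_width, eps=0.1):
    mid = page_width / 2
    if not (o.x_min < mid < o.x_max):
        return False
    return abs((o.x_min + o.x_max - 2 * mid) / (o.x_max - o.x_min)) <= eps


def relabel(objs, page_width, page_height, gap=15):
    """Apply the table's rules in order; the first predicate that holds gives the new type."""
    texts = [o for o in objs if o.label == "text"]
    figures = [o for o in objs if o.label == "figure"]
    if not texts:
        return objs
    top = min(texts, key=lambda o: o.y_min)
    bottom = max(texts, key=lambda o: o.y_max)
    mid_y = page_height / 2
    rules = [
        (lambda o: o is top, "header"),                                               # top most(text)
        (lambda o: v_overlap(o, top), "header"),                                      # vertical overlap(text, header)
        (lambda o: o is bottom, "page number"),                                       # bottom most(text)
        (lambda o: any(0 <= o.y_min - f.y_max <= gap for f in figures), "caption"),  # in margin(figure, text)
        (lambda o: centered(o, page_width) and o.y_max < mid_y, "title"),             # centered and above middle
        (lambda o: centered(o, page_width) and o.y_min > mid_y, "footer"),            # centered and below middle
    ]
    for o in texts:
        for holds, new_type in rules:
            if holds(o):
                o.label = new_type
                break
    return objs
```

Text that matches no rule, such as a body paragraph spanning the middle of the page, keeps its geometric label "text".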

E. Basic objects to content

To extract the content of figures in a hypertext context, the focus is on the labels in the figure, which fall into four kinds:

• plain alphanumeric labels: facsimiles of their corresponding ASCII strings
• alphanumeric template labels: text strings derived from a template, where the variable part is a plain alphanumeric label and the fixed part is some visual shape
• icon labels: non-alphanumeric labels distinguished by their shape alone
• legend labels: icon labels with an associated textual definition

The content of figures is analyzed at full resolution to avoid losing important details. The resulting segments are sent to the figure analysis package, and the raw text is again tokenized for use in further analysis.
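The four label kinds and the tokenization step can be captured in a small data structure. This is a sketch with hypothetical names: the `shape` field is an assumed identifier for the visual part of template and icon labels, and taking legend tokens from the textual definition rather than the icon is a design assumption, not something the text specifies.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LabelKind(Enum):
    PLAIN = "plain alphanumeric"        # facsimile of an ASCII string
    TEMPLATE = "alphanumeric template"  # variable text inside a fixed visual shape
    ICON = "icon"                       # identified by shape alone
    LEGEND = "legend"                   # icon with a textual definition


@dataclass
class FigureLabel:
    kind: LabelKind
    text: str = ""                    # recognized characters (empty for pure icons)
    shape: Optional[str] = None       # hypothetical shape identifier for template/icon labels
    definition: Optional[str] = None  # legend text, for LEGEND labels only

    def tokens(self) -> List[str]:
        """Lower-cased word tokens used by later matching steps."""
        source = self.definition if self.kind is LabelKind.LEGEND else self.text
        return re.findall(r"[a-z0-9]+", (source or "").lower())
```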

F. Layout and content to logical structure

Logical segments have their own meaning and no direct relation to other objects. They can be found by starting with the layout information[18] and then applying rules that capture knowledge of layout conventions. The following regular expressions are used, where * means zero or more occurrences and + means at least one occurrence:

- chapter: <start-of-line><numeral><,><word>+<end-of-line>
- section: <start-of-line><numeral><,><numeral><word>+<end-of-line>
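The two heading patterns translate directly into Python regular expressions. Several details are assumptions: `<numeral>` is taken to be Arabic digits, the separator written as `<,>` is accepted as either a comma or a period, and a chapter title must begin with a letter so that a section number such as "2.1" is not misread as chapter 2. The more specific section pattern is tried first.

```python
import re

SECTION = re.compile(r"^(\d+)[.,](\d+)\.?\s+(\S+(?:\s+\S+)*)$")
CHAPTER = re.compile(r"^(\d+)[.,]\s*([A-Za-z]\S*(?:\s+\S+)*)$")


def classify_heading(line):
    """Return ("section", (n, m), title), ("chapter", (n,), title) or None."""
    line = line.strip()
    m = SECTION.match(line)          # try the more specific pattern first
    if m:
        return "section", (int(m.group(1)), int(m.group(2))), m.group(3)
    m = CHAPTER.match(line)
    if m:
        return "chapter", (int(m.group(1)),), m.group(2)
    return None
```

Applied line by line, the returned numbers give each heading's position in the document hierarchy.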

Text labels in a figure have the geometric classification text, but they can also have a logical classification indicating their meaning. As in the logical labeling of basic objects, this is domain specific and requires knowledge about the content of the figures.

Three logical classes can be distinguished:

• title: figures can have a label of class title
• note: labels of class note provide contextual information about the figure
• name: labels of class name each name a part of an object in the figure

G. Logical structure to hypertext

The logical structure provides the hierarchical structure of the hyperdocument and the linear structures required for the reading order and for accessing the figures in the document[13]. Computing a standard index structure based on the labels in the figures is also straightforward. An index of important keywords in the text can be found automatically from statistics of their occurrence in the text[3].
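The paper only says that keywords are chosen from occurrence statistics. One common choice in Salton's line of work is tf-idf weighting, and the sketch below assumes that scheme: a word scores highly in a section when it is frequent there but rare in other sections. The `keyword_index` name, the `top_k` cut-off and the optional stop-word set are illustrative assumptions.

```python
import math
import re
from collections import Counter


def keyword_index(sections, top_k=3, stopwords=frozenset()):
    """Map each section id to its top_k words ranked by tf-idf (ties broken alphabetically)."""
    tokenized = {
        sid: [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords]
        for sid, text in sections.items()
    }
    n = len(tokenized)
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))            # document frequency: count each word once per section
    index = {}
    for sid, words in tokenized.items():
        tf = Counter(words)
        scores = {w: count * math.log(n / df[w]) for w, count in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        index[sid] = [w for w in ranked[:top_k] if scores[w] > 0]  # words in every section score 0
    return index
```

Words that occur in every section, such as "the" or "shaft" in the test data, get an idf of zero and never become keywords.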

The cross-group structure between the set of figures and the text can be found when there is some explicit way of referring to figures. These references can be found by searching for patterns such as:

- "Note"<:> "Reference Figure" <numeral>
- "Note"<:> "Reference Figures" <<numeral>,>+ "and" <numeral>

Other common ways of referring to figures are "see figure <numeral>", "as shown in figure <numeral>", "fig. <numeral> illustrates", "(fig. <numeral>)", etc. The values of the numerals are used to derive the links of the cross-group structure that relates the text to the set of figures.
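A single case-insensitive regular expression can cover the reference phrasings listed above and turn each match into (paragraph, figure number) links. The pattern is a hypothetical approximation of those phrasings, not the paper's exact grammar; in particular, it captures a comma-separated number list with an optional trailing "and <numeral>".

```python
import re

FIG_REF = re.compile(
    r"\b(?:reference\s+figures?|see\s+figure|as\s+shown\s+in\s+figure|fig\.)\s*"
    r"(\d+(?:\s*,\s*\d+)*(?:\s*,?\s*and\s+\d+)?)",
    re.IGNORECASE,
)


def figure_links(paragraphs):
    """Return (paragraph index, figure number) pairs for every explicit figure reference."""
    links = []
    for p_idx, text in enumerate(paragraphs):
        for m in FIG_REF.finditer(text):
            for num in re.findall(r"\d+", m.group(1)):   # "1, 2 and 4" -> 1, 2, 4
                links.append((p_idx, int(num)))
    return links
```

Each pair is one link of the cross-group structure from a paragraph to a figure.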

To find the cross-group structure for a specific figure and its scope in the text, the tokenized text of each label in the figure is searched for in the corresponding text.

Characteristics of the labels:

• The labels in the figure consist of multiple words
• The text in both the label and the associated text
• The text of the labels does not necessarily appear in the same order or with exactly the same words in the text
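Because label words may appear in a different order and not all of them may occur, an order-independent overlap test is a natural fit. The sketch below assumes a bag-of-words comparison with a hypothetical `min_overlap` fraction of 0.6; stemming and fuzzy matching of inexact words are omitted.

```python
import re


def tokens(text):
    """Set of lower-cased word tokens; word order is deliberately ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_scope(label, paragraphs, min_overlap=0.6):
    """Indices of paragraphs containing at least min_overlap of the label's words, in any order."""
    label_tokens = tokens(label)
    if not label_tokens:
        return []
    return [i for i, text in enumerate(paragraphs)
            if len(label_tokens & tokens(text)) / len(label_tokens) >= min_overlap]
```

For the label "Inlet pressure valve", the sentence "The inlet valve is closed." shares two of three words, so it falls inside the label's scope even though "pressure" is missing.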

Finally, to identify whether a superscript is part of the text, a formula or a footnote, formulas and footnotes also have to be incorporated in the classification of logical basic objects. As no semantic linking is considered, there is no remaining cross-reference structure.

III. COMPOSITIONS WITH VARIOUS ENTRY POINTS

The models MOAP, I-HTSPN and Madeus allow a composition to be the end point of a relationship, but not a component inside a composition. Different entry points into a composition are desirable because they allow different presentations of the nodes that are recursively contained in the composition. NCM is an example of a model that offers this facility, since a link can point into nested compositions, as specified by the node list of the link's end point.

In Fig. 2, the presentation of composition C2 can be started through link l1 or link l3, both coming from other parts of the document. When C2 starts through link l1, nodes V1 (video), A1 (background audio) and A2 (voice) must start at the same time. If C2 starts through link l3, nodes V1 and A2 must start at the same time, without the background audio. The presentation therefore depends on the external context, that is, on the navigation that led to the presentation of the composite node.
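The Fig. 2 scenario can be modelled by giving each link end point as a node list (a path from the outermost composition inwards), in the spirit of NCM. The classes and the `start_via` helper below are hypothetical illustrations, not NCM's actual data model; they only show how the same composition yields a different set of simultaneously started nodes depending on the link used to enter it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Media:
    name: str
    kind: str  # e.g. "video" or "audio"


@dataclass
class Composite:
    name: str
    children: Dict[str, Union["Media", "Composite"]] = field(default_factory=dict)

    def add(self, child):
        self.children[child.name] = child
        return child


def resolve(root: Composite, path: Tuple[str, ...]):
    """Follow a node list (outermost composition first) into nested compositions."""
    node = root
    for name in path:
        node = node.children[name]  # raises KeyError for a dangling end point
    return node


def start_via(root: Composite, links: Dict[str, List[Tuple[str, ...]]], link_id: str) -> List[str]:
    """Names of the nodes that must start at the same time when entering through link_id."""
    return [resolve(root, path).name for path in links[link_id]]
```

With C2 containing V1, A1 and A2, entering through l1 starts all three nodes, while l3 starts only V1 and A2.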

Fig. 2. Hypermedia document

IV. A PROTOCOL ALONE IS INSUFFICIENT

Most system designers have developed their own proprietary protocols for communicating with a link server, and adopting a new standard protocol would involve a major re-implementation of the system. Davis et al.[2] suggested that the differences between system protocols could be resolved if each system provided a protocol shim residing between the application and the link server, as shown in Fig. 3. Anderson[1] offers a critique of the Hypertext Protocol and makes pragmatic recommendations for improving its syntax and semantics.
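A protocol shim is essentially an adapter: the application keeps speaking its own format, and the shim translates each request into the link server's protocol and translates the reply back. In the sketch below both message formats are hypothetical: the legacy side uses text commands such as "FOLLOW <anchor>", and the "standard" side uses `("FollowLink", {"anchor": ...})` tuples. Neither is taken from the actual Open Hypermedia Protocol.

```python
class LinkServer:
    """Toy link server that understands one 'standard' request: ("FollowLink", {"anchor": ...})."""

    def __init__(self, links):
        self.links = links  # anchor id -> list of destination ids

    def handle(self, request):
        verb, args = request
        if verb == "FollowLink":
            return "Destinations", self.links.get(args["anchor"], [])
        return "Error", f"unknown request {verb}"


class ProtocolShim:
    """Translates a legacy application's text commands into the server's protocol and back."""

    def __init__(self, server):
        self.server = server

    def send(self, legacy_command):
        command, _, anchor = legacy_command.partition(" ")   # legacy format: "FOLLOW <anchor>"
        if command != "FOLLOW":
            return "ERR unsupported command"
        status, payload = self.server.handle(("FollowLink", {"anchor": anchor}))
        if status != "Destinations":
            return "ERR " + payload
        return ("OK " + ",".join(payload)) if payload else "NONE"
```

Only the shim changes when the server protocol changes, which is the point of Davis et al.'s proposal: the application itself does not have to be rewritten.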

Fig. 3: Hypermedia Protocol architecture

The aim of the Hypermedia Protocol initiative is to enrich the user's environment by integrating third-party applications with existing link services. This integration should not, however, reduce the effectiveness of link services by rendering the functionality of their associated tools inaccessible to the end user. To overcome this problem, there is general agreement that some form of runtime on the user's machine is necessary; the extended model is shown in Fig. 4. It uses a Java virtual machine[4] and a framework that allows additional tools and functionality to be dynamically downloaded to the user's machine. The protocol shim functionality will be incorporated within the runtime component.
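The runtime's two duties described here, loading tools on demand and hosting the shim, can be sketched as a small class. The `fetch_tool` callable stands in for the download mechanism and is purely hypothetical, as is the text command passed to the shim; the embedded shim can be any object with a `send(str) -> str` method. Caching means the download penalty is paid only on a tool's first use.

```python
class Runtime:
    """Client-side runtime that downloads tools on first use and hosts a protocol shim.

    fetch_tool: hypothetical callable returning a tool (any callable) for a tool name.
    shim: any object with a send(str) -> str method, or None."""

    def __init__(self, fetch_tool, shim=None):
        self._fetch_tool = fetch_tool
        self._tools = {}
        self.shim = shim
        self.downloads = 0

    def use(self, name, *args):
        if name not in self._tools:              # first use pays the download penalty
            self._tools[name] = self._fetch_tool(name)
            self.downloads += 1
        return self._tools[name](*args)

    def follow(self, anchor):
        """Forward a link request through the embedded shim."""
        if self.shim is None:
            raise RuntimeError("no link service configured")
        return self.shim.send(f"FOLLOW {anchor}")
```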

Fig. 4: Introduced runtime component

A further requirement identified is multimedia document/object management that is open enough to allow developers to utilize third-party products. The resulting reference architecture, shown in Fig. 5, is intended to provide direction for the Hypermedia Protocol initiative and to enhance both current and future developments in the field of hypermedia.

Fig. 5 Reference architecture

V. RUNTIME INVENTION

The runtime acts as a mediator between the viewers and the link server. The following are various approaches to providing a runtime component that offers a rich set of presentation, authoring, navigation and hypermedia link service tools:

• Implementing a new runtime: the runtime and the client-side hypermedia tools are implemented from scratch, which signifies a complete re-invention and involves an unreasonable amount of effort. The result is platform dependent, but only one implementation per platform is needed, and it provides a consistent user interface across platforms.
• Virtual machine: a minimal runtime component is written in a byte-code interpreted language, which makes it extremely versatile. The user can incorporate any custom-written tools into the runtime to supplement those provided by the link server. Although this offers great flexibility and a zero-administration client, each link server must assume that the runtime component has no local hypermedia tools of its own and should therefore offer to provide them. This demands a complete re-invention for each different link server, and because each server supplies its own client-side hypermedia tools, interface inconsistency may occur. An additional penalty is also incurred each time a new tool is dynamically downloaded prior to use.
• Reusing existing hypermedia systems as runtimes: this strategy promotes the wholesale re-use of existing and familiar client-side hypermedia tools that are sufficiently open, integrating and combining the previous approaches. It gives developers and users complete freedom over their choice of runtime, which would be their favorite hypermedia system. The approach is designed to accommodate the differences between systems and allow them to co-exist, so that a hypermedia system with its own set of proprietary viewers can utilize a third-party remote link service. A full definition of the essential components and protocols is required to achieve this.

Allowing a hypermedia system to act as a runtime component within the model means that the hypermedia system can augment locally provided link services with those of a remote link service. If a runtime is represented by a hypermedia system with a link service, there is no reason why the runtime cannot also act as a link service. Conversely, if a link service is represented by a hypermedia system, there is no reason why the link service cannot also act as a runtime. This blurs the distinction between the two entities, as a client runtime can masquerade as a link server and a link server can masquerade as a client runtime. This dual role allows greater scope for configuration.
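The dual role can be illustrated with a single hypothetical class that exposes both a link-service method (`query`) and a runtime method (`follow`). As a design choice in this sketch, a system answering a remote query returns only its local links, which prevents endless forwarding when two systems are configured as each other's remote link service.

```python
class HypermediaSystem:
    """Hypothetical peer that is both a link service (answers queries) and a runtime (follows links)."""

    def __init__(self, name, local_links=None):
        self.name = name
        self.local_links = dict(local_links or {})  # anchor id -> list of destinations
        self.remotes = []                           # other systems used as remote link services

    def query(self, anchor):
        """Link-service role: answer from local links only."""
        return list(self.local_links.get(anchor, []))

    def follow(self, anchor):
        """Runtime role: augment local links with those of every remote link service."""
        results = self.query(anchor)
        for remote in self.remotes:
            for dest in remote.query(anchor):
                if dest not in results:
                    results.append(dest)
        return results
```

Here system A can use B as its remote link service while B uses A in the same way, so each is simultaneously a client runtime and a link server.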

VI. HYPERMEDIA REFERENCE ARCHITECTURE

This section identifies the protocols required to connect the components and presents a reference architecture for hypermedia, so that the role of each individual component within the architecture can be discussed. The protocols give the developers of each component a choice as to which aspects of the reference architecture they wish to adopt, and each protocol defines a pattern of interaction. The related protocols are as follows:

• Viewer Protocol: it has a purpose identical to that of the Hypermedia Protocol, in that it will enable third-party applications to communicate with the runtime component. The following issues need to be addressed:
  o Ratification of the way in which viewers can determine the hypermedia and collaboration services available
  o The adoption of a sufficiently open and versatile specification of location specifiers
• Hypermedia Protocol: it provides an interface for communicating with a link server. The following issues need to be addressed:
  o Ratification of the way in which link servers advertise the services they offer
  o The adoption of a sufficiently versatile specification of location specifiers
  o Provision of locking for hypermedia objects
• Collaboration Service Protocol: the systems DHM[5], HyperDisco[19], SP3[9] and Sepia[17] provide support for collaboration among users. By incorporating an additional component, many of the common services necessary to support collaborative working practices can be provided. The following issues need to be addressed:
  o Support for tight and loose modes of collaboration
  o Interaction with the Document Management System to provide object locking
  o Event notification subscription/unsubscription and delivery
  o Interaction with the Link Service and Document Management System to support versioning
• Document Management Service Protocol: the Open Document Management API (ODMA)[10] defines a common interface to commercial document management systems and promotes interoperability. This standard also addresses issues such as heterogeneity and unique, portable document identifiers. However, the ODMA standard makes no mention of support for streaming multimedia objects, whereas DHM[5], HyperDisco[19] and SP3[9] provide proprietary solutions for document management and version control. The following issues need to be addressed:
  o A globally unique and portable document naming scheme
  o Adding, removing and modifying documents
  o Document retrieval
  o Support for versioning
  o Document locking
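The operations listed for the Document Management Service Protocol can be sketched as an in-memory store. This is not ODMA's API; the class, its method names and the lock-before-modify policy are hypothetical. Globally unique, portable names are approximated with random UUIDs in URN form from Python's standard `uuid` module, and every modification appends a new version rather than overwriting the old one.

```python
import uuid


class LockError(Exception):
    pass


class DocumentStore:
    """In-memory sketch of the document-management operations listed above (hypothetical API)."""

    def __init__(self):
        self._versions = {}   # doc id -> list of contents, oldest first
        self._locks = {}      # doc id -> user holding the lock

    def add(self, content):
        doc_id = uuid.uuid4().urn            # globally unique, portable name: "urn:uuid:..."
        self._versions[doc_id] = [content]
        return doc_id

    def retrieve(self, doc_id, version=-1):
        """Latest version by default; 0 is the original."""
        return self._versions[doc_id][version]

    def lock(self, doc_id, user):
        holder = self._locks.get(doc_id)
        if holder not in (None, user):
            raise LockError(f"{doc_id} is locked by {holder}")
        self._locks[doc_id] = user

    def unlock(self, doc_id, user):
        if self._locks.get(doc_id) == user:
            del self._locks[doc_id]

    def modify(self, doc_id, content, user):
        if self._locks.get(doc_id) != user:
            raise LockError(f"{user} must hold the lock on {doc_id} to modify it")
        self._versions[doc_id].append(content)   # each modification creates a new version

    def remove(self, doc_id, user):
        holder = self._locks.get(doc_id)
        if holder not in (None, user):
            raise LockError(f"{doc_id} is locked by {holder}")
        del self._versions[doc_id]
        self._locks.pop(doc_id, None)
```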

VII. CONCLUSION

For the layout and logical analysis, one page of each class is used for optimizing the parameters that were not fixed beforehand, such as sh, ov, sw and oh, used in grouping segments, and the centering parameter used in defining columns. For optimizing the parameters used in detecting text labels in figures, the selected figure page is used; this figure contains both parentheses and dashes in its textual labels. The first structure is the hierarchical structure. For the cross-group structure between the set of figures and the text, all references to figures are found correctly. In the logical classification of the content of figures, the system makes no errors in identifying titles and notes.

The model provides a reference architecture for the integration of differing hypermedia systems in a powerful, distributed and collaborative framework, and alternative strategies for achieving this end have been described. This allows users to continue to enjoy the rich functionality of the existing and familiar client-side hypermedia tools available within their chosen hypermedia system. Without prior agreement upon clear roles for the architectural components, a unilateral attempt at defining any of the four protocols identified by the authors would be unproductive, and as such these protocols remain undefined. If a reference architecture can help guide the way towards the global integration of hypermedia systems, then the research community can look forward to exploring emerging technologies and their potential for easing the non-trivial task of distributed information management.

REFERENCES

[1] Anderson, K. M., A Critique of the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp. 1-4, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/anderson.html

[2] Davis, H. C., Lewis, A. J. and Rizk, A., OHP: A Draft Proposal for an Open Hypermedia Protocol. In Proceedings of the 2nd Workshop on Open Hypermedia Systems, Technical Report UCI-ICS 96-10. http://www.daimi.aau.dk/~kock/OHS-HT96/Documents/ohp.html

[3] Salton, G., Another Look at Automatic Text-Retrieval Systems. Communications of the ACM, 29(7), pp. 648-656, 1986.

[4] Gosling, J. and McGilton, H., The Java Language Environment: A White Paper, 1995. http://java.sun.com/whitePaper/java-whitepaper-1.html

[5] Grønbæk, K. and Trigg, R. H., Design Issues for a Dexter-Based Hypermedia System. In Proceedings of the ACM Hypertext '92 Conference, Milano, Italy, pp. 191-200, November 1992.

[6] Grønbæk, K. and Trigg, R. H., Toward a Dexter-based Model for Open Hypermedia: Unifying Embedded References and Link Objects. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp. 149-160, March 1996.

[7] Grønbæk, K. and Wiil, U. K., Towards a Reference Architecture for Open Hypermedia. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp. 31-38, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/gronbak.html

[8] Halasz, F. G. and Schwartz, M., The Dexter Hypertext Reference Model. Communications of the ACM, 37(2), pp. 30-39, February 1994.

[9] Leggett, J. J. and Schnase, J. L., Dexter With Open Eyes. Communications of the ACM, 37(2), pp. 77-86, February 1994.

[10] ODMA, Association for Information and Image Management (AIIM). http://www.aiim.org/odma

[11] Reich, S., How OHP's LocSpecs Could Benefit From ISO/IEC 10744. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp. 54-59, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/reich.ps

[12] Duda, R. O. and Hart, P. E., Pattern Classification and Scene Analysis. Wiley, 1973.

[13] Jugele, R. N. and Chavan, V. N., ODA: Processing Model Design for Linking Document. International Journal of Engineering and Computer Science, ISSN: 2319-7242, Vol. 2, Issue 3, pp. 806-810, March 2013.

[14] Jugele, R. N. and Chavan, V. N., ODA: A Study of Document Design. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), ISSN: 2278-6856, Vol. 2, Issue 1, pp. 194-198, January-February 2013.

[15] Rutledge, L. and Hardman, L., Applying the HyTime Model to the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp. 63-65, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/rutledge.html

[16] Østerbye, K. and Wiil, U. K., The Flag Taxonomy of Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp. 129-139, March 1996.

[17] Streitz, N., Haake, J., Hannemann, J., Lemke, A., Schuler, W., Schütt, H. and Thüring, M., SEPIA: A Cooperative Hypermedia Authoring Environment. In Hypertext: Concepts, Systems and Applications, Proceedings of the Hypertext '90 Conference, INRIA, France, pp. 11-22, November 1990.

[18] Tsujimoto, S. and Asada, H., Major Components of a Complete Text Reading System. Proceedings of the IEEE, 80(7), pp. 1133-1149, 1992.

[19] Wiil, U. K. and Leggett, J. J., The HyperDisco Approach to Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp. 140-148, March 1996.

Books:

1. Parekh, R., Principles of Multimedia. Tata McGraw-Hill.

2. Nielsen, J., Hypertext and Hypermedia. Academic Press.