combining documentation.docx

Upload: brandy-boone

Post on 04-Apr-2018


  • 7/29/2019 combining documentation.docx

    1/56

    INTRODUCTION


    INTRODUCTION

    SYNOPSIS


    PROJECT DESCRIPTION

    Project description:

    Module Names:

    1. Search (user)

    2. Query result page

    3. Novel data extraction & alignment

    4. Nested structure algorithm

    5. Accurate data extraction.

    Module Description:

    1. Search (user): Web users gather information from web databases using keyword queries. A web database returns a number of pages, called output pages, to the querying user. Many users want to collect useful information from these returned pages. However, not every returned page contains exactly the information sought; the pages also contain auxiliary or irrelevant information, and the accurate information may not all be available on a single page.

    2. Query result page:

    Web databases generate query result pages based on a user's query. Automatically extracting the data from these query result pages is very important for many applications, such as data integration, which need to cooperate with multiple web databases. The data in a query result page may be actual results or auxiliary content. The actual results matter most to users, but separating the two is a difficult task.

    3. Novel data extraction & alignment:

    In general, a query result page contains not only the actual data, but also other

    information, such as navigational panels, advertisements, comments, information about

    hosting sites, and so on. The goal of web database data extraction is to remove any


    irrelevant information from the query result page, extract the query result records (QRRs)

    from the page, and align the extracted QRRs into a table such that the data values

    belonging to the same attribute are placed into the same table column.

    We employ the following two-step method, called Combining Tag and Value Similarity

    (CTVS), to extract the QRRs from a query result page p.

    1. Record extraction identifies the QRRs in p and involves data region identification and

    the actual segmentation step.

    2. Record alignment aligns the data values of the QRRs in p into a table so that data

    values for the same attribute are aligned into the same table column.
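    The record alignment step above can be illustrated with a small sketch. It is not the CTVS algorithm itself: it assumes each extracted record is already a list of (tag path, value) pairs and uses tag path equality as a stand-in for the combined tag and value similarity. The function name align_records and the sample records are hypothetical.

```python
def align_records(records):
    """Align records into a table, one column per distinct tag path.

    Each record is a list of (tag_path, value) pairs. Assumption: values
    with the same tag path belong to the same attribute.
    """
    # Collect column keys in order of first appearance.
    columns = []
    for record in records:
        for tag_path, _ in record:
            if tag_path not in columns:
                columns.append(tag_path)

    # Build one row per record, leaving missing attributes empty.
    table = []
    for record in records:
        by_tag = dict(record)
        table.append([by_tag.get(col, "") for col in columns])
    return columns, table


# Hypothetical records; the second one has an extra attribute.
records = [
    [("td.title", "Book A"), ("td.price", "$5")],
    [("td.title", "Book B"), ("td.author", "X"), ("td.price", "$7")],
]
columns, table = align_records(records)
for row in table:
    print(row)
```

    Records that lack an attribute get an empty cell, so every row has the same number of columns.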

    4. Nested structure algorithm:

    The proposed nested structure algorithm is mainly used to extract more than one relevant data value from the query result pages. When a query result page contains additional values that are relevant to the search keyword, the algorithm stores values that are similar and share the same tag within the same column.

    5. Accurate data extraction:

    Our proposed technique produces accurate results even when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement. It also handles any nested structure that may exist in the QRRs. We also design a new record alignment algorithm that aligns the attributes in a record, first pairwise and then holistically, by combining the tag and data value similarity information. Our results show that CTVS achieves high precision and outperforms existing data extraction methods.
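    To make the idea of combining tag and data value similarity concrete, here is a minimal sketch. The scoring formula is an assumption for illustration only and is not the measure used by CTVS: it mixes an exact tag match with a string similarity ratio from Python's difflib. The function name combined_similarity and the tag_weight parameter are hypothetical.

```python
from difflib import SequenceMatcher


def combined_similarity(a, b, tag_weight=0.5):
    """Score two (tag, value) pairs between 0.0 and 1.0.

    Assumption: tag similarity is 1.0 for identical tags and 0.0 otherwise;
    value similarity is the difflib ratio of the two value strings.
    """
    tag_a, value_a = a
    tag_b, value_b = b
    tag_sim = 1.0 if tag_a == tag_b else 0.0
    value_sim = SequenceMatcher(None, value_a, value_b).ratio()
    return tag_weight * tag_sim + (1.0 - tag_weight) * value_sim


# Same tag and same value score highest; a shared tag alone scores half.
print(combined_similarity(("td", "abc"), ("td", "abc")))
print(combined_similarity(("td", "abc"), ("td", "xyz")))
print(combined_similarity(("td", "abc"), ("div", "xyz")))
```

    A pairwise alignment step could use such a score to decide which values of two records belong to the same attribute, before the holistic pass reconciles all records.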


    SYSTEM ANALYSIS


    Existing System:

    Online databases, called web databases, comprise the deep web. Compared with web pages in the surface web, which can be accessed by a unique URL, pages in the deep web are dynamically generated in response to a user query submitted through the query interface of a web database. Upon receiving a user's query, a web database returns the relevant data, either structured or semistructured, encoded in HTML pages. Many web applications, such as metaquerying, data integration, and comparison shopping, need the data from multiple web databases. For these applications, automatic data extraction is necessary.

    Only when the data are extracted and organized in a structured manner, such as tables, can they be compared and aggregated. Hence, accurate data extraction is vital for these applications to perform correctly. The problem is to automatically extract the data records that are encoded in the query result pages generated by web databases. State-of-the-art data extraction methods do not produce the expected results.

    Disadvantages:

    Auxiliary information and advertisements are also extracted into the results.

    Data regions cannot be identified.

    Nested information cannot be stored in the results.

    Methods that employ wrapper induction can perform poorly when the format of a query result page changes.


    Proposed System:

    This paper focuses on the problem of automatically extracting data records that are

    encoded in the query result pages generated by web databases. In general, a query result page

    contains not only the actual data, but also other information, such as navigational panels,

    advertisements, comments, information about hosting sites, and so on. The goal of web database

    data extraction is to remove any irrelevant information from the query result page, extract the

    query result records (referred to as QRRs in this paper) from the page, and align the extracted

    QRRs into a table such that the data values belonging to the same attribute are placed into the

    same table column.

    We employ the following two-step method, called Combining Tag and Value Similarity (CTVS),

    to extract the QRRs from a query result page p.

    1. Record extraction identifies the QRRs in p and involves two substeps: data region

    identification and the actual segmentation step.

    2. Record alignment aligns the data values of the QRRs in p into a table so that data values for

    the same attribute are aligned into the same table column.

    Advantages:

    CTVS improves data extraction accuracy in the following ways.

    1. An adapted data region identification method and merge method

    2. A novel record alignment method

    3. A new nested-structure processing algorithm

    4. Accurate data extraction results


    TECHNICAL FEASIBILITY


    SYSTEM SPECIFICATION


    SOFTWARE SPECIFICATION

    FRONT END

    ASP.NET

    ASP.NET is part of the .NET framework. ASP.NET programs are centralized

    applications hosted on one or more Web servers that respond dynamically to client requests. The

    responses are dynamic because ASP.NET intercepts requests for pages with a specific extension

    (.aspx or .ascx) and hands off the responsibility for answering those requests to just-in-time (JIT)

    compiled code files that can build a response on-the-fly.

    ASP.NET deals specifically with configuration (web.config and machine.config) files, Web Services (ASMX) files, and Web Forms (ASPX) files. The server doesn't serve any of these file types; it returns the appropriate content type to the client. The configuration file types contain initialization and settings for a specific application or portion of an application. Another configuration file, called machine.config, contains machine-level initialization and settings. The server ignores requests for configuration files, because serving them might constitute a security breach.

    Client requests for these file types cause the server to load, parse, and execute code

    to return a dynamic response. For Web Forms, the response usually consists of HTML or WML.

    Web Forms maintain state by round-tripping user interface and other persistent values between

    the client and the server automatically for each request.

    A request for a Web Form can use View State, Session State, or Application State to maintain values between requests. Both Web Forms and Web Services requests can take advantage of ASP.NET's integrated security and data access through ADO.NET, and can run code that uses system services to construct the response. So the major difference between a static request and a dynamic request is that a static request references a file whose contents the server simply reads and returns, whereas a dynamic request causes the server to run code that builds the response.


    4.1.2 ASP.NET Events:

    Every time an ASP.NET page is viewed, many tasks are performed behind the scenes. Tasks are performed at key points ("events") of the page's execution lifecycle. The most common events are:

    In Figure 1-1, the layer on top of the CLR is a set of framework

    base classes, followed by an additional layer of data and XML classes, plus another

    layer of classes intended for web services, Web Forms, and Windows Forms.

    Collectively, these classes are known as the Framework Class Library (FCL), one of the

    largest class libraries in history and one that provides an object-oriented API to all the

    functionality that the .NET platform encapsulates. With more than 4,000 classes, the

    FCL facilitates rapid development of desktop, client/server, and other web services and

    applications.

    4.1.2.1 OnInit

    The first event in our list to be raised is OnInit. When this event is raised, all of the page's server controls are initialized with their property values. Postback values are not applied to the controls at this time.


    ASP.NET supports all the .NET languages (currently C#, C++, VB.NET, and JScript,

    but there are well over 20 different languages in development for .NET), so you will eventually

    be able to write Web applications in your choice of almost any modern programming language.

    The .NET Framework sits on top of the operating system, which can be any flavor of Windows,

    and consists of a number of components. Currently, the .NET Framework consists

    Fig 4.1.1. Interoperability

    In addition to increases in speed and power, ASP.NET provides substantial development improvements, such as seamless server-to-client debugging and automatic validation of form data.

    4.1.2.2 OnLoad

    The next event to be raised is OnLoad, which is the most important event of them all, because all the page's server controls will have their postback values by now. When the page posts data to the database, we can use the OnLoad method. It runs when a user action, such as clicking a button, posts the page back to the server.

    4.1.2.3 Postback Events


    Rich library of Web Controls

    Separation of layout (HTML) and logic (e.g. C#)

    Compiled languages instead of interpreted languages

    GUI can be composed interactively with Visual Studio .NET

    Better state management

    4.1.4 Namespaces

    ASP.NET uses a concept called namespaces. Namespaces are hierarchical object models that support various properties and methods. For example, HTML server controls reside in the "System.Web.UI.HtmlControls" namespace, web server controls reside in the "System.Web.UI.WebControls" namespace, and ADO+ resides in the "System.Data" namespace.

    4.1.4 Language Independent

    An ASP.NET page can be created in any language supported by .NET framework.

    Currently .NET framework supports VB, C#, JScript and Managed C++. .NET includes a

    Common Language Specification (CLS), which provides a series of basic rules that are required

    for language integration.

    4.1.5 ASP.NET Server controls

    Using ASP.NET server controls, browser variation is handled because these controls output the HTML themselves based on the browser requesting the page. Even if we plan to use web controls exclusively, it's worth reading through this section to master the basics of HTML controls. Along the way, you'll get an introduction to a few ASP.NET essentials that apply to all kinds of server controls, including view state, postbacks, and event handling.

    4.1.6 Types of controls


    ASP.NET has two basic types of controls: HTML server controls and Web server controls. HTML server controls are generated around specific HTML elements, and the ASP.NET engine changes the attributes of the elements based on server-side code that you provide. Web server controls revolve more around the functionality you need on the page. The ASP.NET engine takes the extra step of deciding, based on the container of the requester, what HTML to output.

    Figure: web controls

    Server Explorer

    The Server Explorer window enables you to perform a number of functions such as database connectivity, performance monitoring, and interacting with event logs. By using Server Explorer you can log on to a remote server and view database and system data about that server. Many of the functions that are performed with the Enterprise Manager in SQL Server can now be executed in the Server Explorer.

    Solution Explorer

    Solution Explorer provides an organized view of the projects in the application. The

    toolbar within the Solution Explorer enables to


    The Properties window provides the properties of an item that is part of the application. This enables you to control the style and behavior of the selected item.

    Dynamic Help

    The Dynamic Help window shows a list of help topics. The help topics change based on the item selected or the action being taken. For example, when a Button control on the page is selected, the Dynamic Help window shows the help items for that control. After the item is selected, a list of targeted help topics is displayed. The topics are organized as a list of links. Clicking one of the links in the Dynamic Help window opens the selected help topic, with its full details, in the Document window. The result is produced in the lower pane of the Document window.

    Document window


    The Document window is the main window within Visual Studio.NET where the

    applications are built. The Document window shows open files in either Design or HTML mode.

    Each open file is represented by a tab at the top of the Document window. Any number of files

    can be kept open at the same time, and you can switch between the open files by clicking the

    appropriate tab.

    Design mode versus HTML mode

    Visual Studio .NET offers two modes for viewing and building files: Design and HTML. By clicking the Design tab at the bottom of the Document window, you can see how the page will appear to the user. The page is built in Design mode by dragging and dropping elements directly onto the design page or form. Visual Studio .NET automatically generates the appropriate code. When the page is viewed in HTML mode, it shows the code for the page. This mode lets you modify the code directly to change the way in which the page is presented.

    Working with SQL Server through the Server Explorer

    Using Visual Studio .NET, there is no need to open the Enterprise Manager from SQL Server. Visual Studio .NET has a SQL Servers tab within the Server Explorer that lists all the connected servers that have SQL Server installed. Opening a particular server tab gives five options:

    Database Diagrams

    Tables

    Views

    Stored Procedures

    Functions

    Database Diagrams


    To create a new diagram, right-click Database Diagrams and select New Diagram. The Add Tables dialog lets you select any or all of the tables that you want in the visual diagram you are going to create. Visual Studio .NET looks at all the relationships between the tables and then creates a diagram that opens in the Document window. Each table is represented in the diagram along with a list of all the columns that are available in that particular table. Each relationship between tables is represented by a connection line between those tables. The properties of the relationship can be viewed by right-clicking the relationship line.

    Tables

    The Server Explorer allows working directly with the tables in SQL Server. It gives a list of tables contained in the selected database.

    Double-clicking one of the tables opens it in the Document window. This grid of data shows all the columns and rows of data contained in the table. Data can be added to or deleted from the table grid directly in the Document window. To add a new row of data, move to the bottom of the table, select the first column of the first blank row, and type in the new row. You can also delete a row of data from the table by right-clicking the gray box at the left end of the row and selecting Delete.

    By right-clicking the gray box at the far left end of the row, the primary key can be set for that particular column. The relationships to columns in other tables can be set by selecting the Relationships option. To create a new table, right-click the Tables section within the Server Explorer and select New Table. This gives the design view, which lets you start specifying the columns and column details of the table.

    To run queries against the tables in Visual Studio .NET, open the query toolbar by choosing View->Toolbars->Query. To query a specific table, open that table in the Document window.

    Views


    To create a new view, right-click the Views node and select New View. The Add Table dialog box lets you select the tables from which the view is produced. The next pane lets you customize the appearance of the data in the view.

    C#.NET:

    The C# language is disarmingly simple, with only about 80 keywords and a dozen

    built-in data types, but C# is highly expressive when it comes to implementing modern

    programming concepts. C# includes all the support for structured, component-based, object-

    oriented programming that one expects of a modern language built on the shoulders of C++ and

    Java.

    The C# language was developed by a small team led by two distinguished

    Microsoft engineers, Anders Hejlsberg and Scott Wiltamuth. Hejlsberg is also known for

    creating Turbo Pascal, a popular language for PC programming, and for leading the team that

    designed Borland Delphi, one of the first successful integrated development environments for

    Client/server programming. At the heart of any object-oriented language is its support for

    defining and working with classes. Classes define new types, allowing you to extend the

    language to better model the problem you are trying to solve.

    C# contains keywords for declaring new classes and their methods and properties,

    and for implementing encapsulation, inheritance, and polymorphism, the three pillars of object-

    oriented programming. In C# everything pertaining to a class declaration is found in the

    declaration itself. C# class definitions do not require separate header files or Interface Definition

    Language (IDL) files. Moreover, C# supports a new XML style of inline documentation that

    greatly simplifies the creation of online and print reference documentation for an application. C#

    also supports interfaces, a means of making a contract with a class for services that the interface

    stipulates. In C#, a class can inherit from only a single parent, but a class can implement multiple

    interfaces. When it implements an interface, a C# class in effect promises to provide the

    functionality the interface specifies.

    The two dominant languages for Windows development in the pre-.NET world were

    C++ and Visual Basic 6 (VB6). Both had sizable user populations (although VB6's user base was

    much larger), and so Microsoft needed to find a way to make both groups as happy as possible


    with their new environment. How, for example, could the large number of Windows developers

    who know (and love) C++ be brought forward to use the .NET Framework? One answer is to

    extend C++, an option described later. Another approach, one that has proven more appealing for

    most C++ developers, is to create a new language based on the CLR but with a syntax derived

    from C++. This is exactly what Microsoft did in creating C#.

    The results of this commitment to date are impressive. For one thing, the scope of .NET is huge. The platform consists of four separate product groups:

    1. A set of languages, including C# and Visual Basic .NET; a set of development tools, including Visual Studio .NET; a comprehensive class library for building web services and web and Windows applications; as well as the Common Language Runtime (CLR) to execute objects built within this framework.

    2. A set of .NET Enterprise Servers, formerly known as SQL Server 2000, Exchange 2000, BizTalk 2000, and so on, that provide specialized functionality for relational data storage, email, B2B commerce, etc.

    3. An offering of commercial web services, called .NET My Services; for a fee, developers can use these services in building applications that require knowledge of user identity, etc.

    4. New .NET-enabled non-PC devices, from cell phones to game boxes.

    C# provides component-oriented features, such as properties, events, and declarative constructs (called attributes). Component-oriented programming is supported by the CLR's support for storing metadata with the code for the class. The metadata describes the class, including its methods and properties, as well as its security needs and other attributes, such as whether it can be serialized; the code contains the logic necessary to carry out its functions.

    The Common Language Runtime

    The Common Language Runtime (CLR) is the foundation for everything else in the .NET Framework. To understand .NET languages such as C# and Visual Basic (VB), you must understand the CLR. To understand the .NET Framework class library (ASP.NET, ADO.NET, and the rest), you must understand the CLR. And since the .NET Framework has become the default foundation for new Windows software, anybody who plans to work in the Microsoft environment needs to come to grips with the CLR.

    Software built on the CLR is referred to as managed code, and the CLR provides a range of things that support creating and running this code. Perhaps the most fundamental is a standard set of types that are used by languages built on the CLR, along with a standard format for metadata, which is information about software built using those types. The CLR also provides technologies for packaging managed code and a runtime environment for executing managed code. As the most elemental part of the .NET Framework, the CLR is unquestionably the place to start in understanding what the Framework offers.

    FORMS

    A form is used to view and edit information in the database record by record. A form displays only the information we want to see, in the way we want to see it. Forms use familiar controls such as textboxes and checkboxes. This makes viewing and entering data easy.

    Views of Form

    We can work with forms in several views; primarily there are two:


    1. Design View

    2. Form View

    Design View

    To build or modify the structure of a form, we work in the form's Design view. We can add controls to the form that are bound to fields in a table or query, including textboxes, option buttons, graphs, and pictures. Form view displays the finished design of the form, so the user sees the document the way we intended.

    Main Features of ASP .NET

    Successor of Active Server Pages (ASP), but with a completely different architecture.

    1. Object-oriented, event-based

    2. Rich library of web controls, better state management

    3. Separation of layout (HTML) and logic

    4. Compiled languages instead of interpreted languages

    5. GUI can be composed interactively with Visual Studio .NET

    An assembly is a collection of files that appear to the programmer to be a single dynamic link library (DLL) or executable (EXE). In .NET, an assembly is the basic unit of reuse, versioning, security, and deployment. The CLR provides a number of classes for manipulating assemblies.

    REPORT


    A report is used to view and print information from the database. The report can group records into many levels and compute totals and averages by checking values from many records at once. The report is also attractive and distinctive because we have control over its size and appearance.

    MODULE & MACRO

    A macro is a set of actions. Each action in a macro does something, such as opening a form or printing a report. We write macros to automate common tasks, which makes the work easier and saves time.

    Modules are units of code written in the Access Basic language. We can write and use modules to automate and customize the database in very sophisticated ways.

    A final note about C# is that it also provides support for directly accessing

    memory using C++ style pointers and keywords for bracketing such operations as

    unsafe, and for warning the CLR garbage collector not to collect objects referenced by

    pointers until they are released.

    4.2 BACK END

    4.2.1 SQL Server 2005:

    Several new features and capabilities have been added to SQL Server 2005. Some of

    the most notable features include native XML storage and query support, and integration with

    the .NET Common Language Runtime. The comparative editions of this version of SQL Server

    haven't really changed much. In addition to the Standard, Developer, and Enterprise editions,

    there is a variety of the product called the SQL Server 2005 Express Edition. This is essentially

    the replacement for the SQL Server 2000 Desktop Engine (MSDE) that shipped with versions of

    Office and Access in the past. It's a lightweight version of the SQL Server engine, intended to

    run on a desktop computer with a limited number of connections. As our friends at Microsoft

    continue to gently nudge users away from the Access JET database engine and toward SQL

    Server, their products will continue to become more aligned and standardized. Like the more serious editions, SQL Server Express can be managed from within Access, Visual Studio, or the SQL Server client tools. The SQL language has been enhanced in a few places but is generally unchanged. Because Transact-SQL conforms to the ANSI SQL standard, you will find only a few minor additions to the supported syntax in SQL Server 2005.

    A generation of smaller-scale database products evolved to fill the void left for the casual application developer and power user. Products such as the following became the norm for department-level applications because they were accessible and inexpensive.

    1. dBase

    2. FoxPro

    3. Paradox

    4. Clipper

    5. Clarion

    6. FileMaker

    7. Access

    The big databases were in another class and were simply not available outside of formal IT circles. They were complicated and expensive. Database administrators and designers used cumbersome command-line scripts to create and manage databases.

    It was a full-time job; DBAs wrote the scripts to manage the databases, and application developers wrote the code for the applications that ran against them. Life was good.

    Everyone was happy.

    However, there is only one real constant in the IT world and that is change. In the

    past five years, there have been significant changes in the world of application development,

    database design, and management.

    The most popular language for querying and manipulating databases is SQL,

    usually pronounced "sequel." SQL is a declarative language, as opposed to a procedural

    language, and it can take a while to get used to working with a declarative language when you

    are used to languages such as C#. The heart of SQL is the query. A query is a statement that


    returns a set of records from the database. For example, you might like to see all the Company

    Names and CustomerIDs of every record in the Customers table in which the customer's address

    is in London. To do so, write:

    SELECT CustomerID, CompanyName FROM Customers WHERE City = 'London'
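    As a quick way to try a query of this shape, the sketch below runs it against an in-memory SQLite database from Python rather than SQL Server. The table contents are made-up sample rows, and the ORDER BY clause is added only so the output order is predictable.

```python
import sqlite3

# In-memory database with a small, hypothetical Customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("C1", "Alpha Ltd", "London"),
        ("C2", "Beta Ltd", "Paris"),
        ("C3", "Gamma Ltd", "London"),
    ],
)

# The query describes what to return, not how to find it.
rows = conn.execute(
    "SELECT CustomerID, CompanyName FROM Customers "
    "WHERE City = 'London' ORDER BY CustomerID"
).fetchall()

for customer_id, company_name in rows:
    print(customer_id, company_name)
```

    Only the two London customers come back; the declarative query leaves the choice of access path to the database engine.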

    SQL joins are inner joins by default. Writing JOIN Orders is the same as writing INNER JOIN Orders. A longer SQL statement can go on to ask the database to create an inner join with Products, getting every row in which the ProductID in the Products table is the same as the ProductID in the Order Details table, and then create an inner join with Customers for those rows where the CustomerID is the same in both the Orders table and the Customers table.
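    The equivalence of JOIN and INNER JOIN can be checked with a small sketch. It uses an in-memory SQLite database from Python rather than SQL Server, and the two tables and their rows are hypothetical, reduced to the columns needed for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('C1', 'Alpha Ltd'), ('C2', 'Beta Ltd');
    INSERT INTO Orders VALUES (1, 'C1'), (2, 'C1'), (3, 'C9');
    """
)

# Same query text, differing only in the join keyword.
query = (
    "SELECT c.CompanyName, o.OrderID FROM Customers c "
    "{join} Orders o ON c.CustomerID = o.CustomerID ORDER BY o.OrderID"
)
plain = conn.execute(query.format(join="JOIN")).fetchall()
inner = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(plain)
print(plain == inner)
```

    Order 3 refers to a customer that does not exist and customer C2 has no orders, so neither appears in the result: an inner join keeps only rows with a match on both sides.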

    The ADO.NET Object Model

    The ADO.NET object model is rich, but at its heart it is a fairly straightforward set of

    classes. The most important of these is the DataSet. The DataSet represents a subset of the entire

    database, cached on your machine without a continuous connection to the database. Periodically,

    you'll reconnect the DataSet to its parent database, update the database with changes you've

    made to the DataSet, and update the DataSet with changes in the database made by other

    processes.

    This is highly efficient, but to be effective the DataSet must be a robust subset of the database, capturing not just a few rows from a single table, but also a set of tables with all the metadata necessary to represent the relationships and constraints of the original database. This is, not surprisingly, what ADO.NET provides.

    The DataSet is composed of DataTable objects as well as DataRelation objects. These are accessed as properties of the DataSet object. The Tables property returns a DataTableCollection, which in turn contains all the DataTable objects.

    DataTables and DataColumns


    The DataTable can be created programmatically or as a result of a query against the database. The DataTable has a number of public properties, including the Columns collection, which returns the DataColumnCollection object, which in turn consists of DataColumn objects. Each DataColumn object represents a column in a table.

    DataRelations

    In addition to the Tables collection, the DataSet has a Relations property, which

    returns a DataRelationCollection consisting of DataRelation objects. Each DataRelation

    represents a relationship between two tables through DataColumn objects. For example, in the

    Northwind database the Customers table is in a relationship with the Orders table through the

    CustomerID column. The nature of the relationship is one-to-many, or parent-to-child. For any

    given order, there will be exactly one customer, but any given customer might be represented in

    any number of orders.

    Rows

    The DataTable's Rows collection returns a set of rows for any given table. Use this
    collection to examine the results of queries against the database, iterating through the rows
    to examine each record in turn. Programmers experienced with ADO are often confused by the
    absence of the RecordSet with its moveNext and movePrevious commands. With ADO.NET,
    you do not iterate through the DataSet; instead, you access the table you need and then
    iterate through its Rows collection, typically with a foreach loop.

    DataAdapter

    The DataSet is an abstraction of a relational database. ADO.NET uses a DataAdapter as
    a bridge between the DataSet and the data source, which is the underlying database. The
    DataAdapter provides the Fill() method to retrieve data from the database and populate the
    DataSet.
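
    The disconnected pattern described here (fill an in-memory cache, drop the connection, then
    pick a table and walk its rows) can be illustrated with a loose Python analogy. This is not
    ADO.NET code: the fill helper and the dictionary standing in for a DataSet are hypothetical,
    and the sample rows are invented.

```python
import sqlite3


def fill(conn, table_names):
    """Hypothetical stand-in for an adapter's fill step: copy each named
    table into memory as a list of dict rows keyed by column name."""
    dataset = {}
    for name in table_names:
        # Table names come from a fixed list in this sketch, not from user input.
        cursor = conn.execute(f"SELECT * FROM {name}")
        columns = [col[0] for col in cursor.description]
        dataset[name] = [dict(zip(columns, row)) for row in cursor]
    return dataset


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("AAAAA", "London"), ("BBBBB", "Berlin")],
)

dataset = fill(conn, ["Customers"])
conn.close()  # the cached rows stay usable without an open connection

# Select the table first, then iterate through its rows.
cities = [row["City"] for row in dataset["Customers"]]
print(cities)
```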

    DBCommand and DBConnection

    The DBConnection object represents a connection to a data source. This connection can be

    shared among different command objects. The DBCommand object allows you to send a

    command (typically a SQL statement or a stored procedure) to the database. Often these objects

    are implicitly created when you create your DataSet, but you can explicitly access these objects,

    as you'll see in a subsequent example.

    The DataAdapter Object

    Rather than tie the DataSet object too closely to your database architecture, ADO.NET

    uses a DataAdapter object to mediate between the DataSet object and the database. This

    decouples the DataSet from the database and allows a single DataSet to represent more than one

    database or other data source.

    SYSTEM DESIGN

    DATA FLOW DIAGRAM

    1. Search (user)
    2. Query result page
    3. Novel data extraction & alignment
    4. Nested structure algorithm
    5. Accurate data extraction

    UML DIAGRAMS

    USE CASE DIAGRAM

    SEQUENCE DIAGRAM:

    Diagram labels: User, Browser, Set user query, Extract user query, Web database

    Participants: User, Browser, Web database

    Messages, in order:
    1. Search user query
    2. Search results/response
    3. Extract query result records
    4. Perform actual segmentation
    5. Align data values
    6. Exact results

    COLLABORATION DIAGRAM:

    Objects and labels: User, Browser, Web database, Extract query result page,
    CTVS (Combining tag and value similarity), Accurate data extraction

    Messages:
    1. Search user query
    1.1 Search query
    2.1 Result page 1
    2.2 Result page 2
    3.1 Actual segmentation
    3.2 Align data values
    4.1 Final search results

    SYSTEM TESTING

    Debugging is eliminating the cause of known errors. Commonly used debugging

    techniques are induction, deduction and backtracking. Debugging by induction involves the

    following steps:

    1. Collect all the information about test details and test results

    2. Look for patterns

    3. Form one or more hypotheses and rank/classify them.

    4. Prove/disprove the hypotheses. Re-examine.

    5. Implement appropriate corrections

    6. Verify the corrections. Rerun the system and test again until the results are satisfactory.

    Debugging by deduction involves the following steps:

    1. List possible causes for observed failure

    2. Use the available information to eliminate various hypotheses

    3. Prove/disprove the remaining hypotheses

    4. Determine the appropriate corrections

    5. Carry out the corrections and verify

    Debugging by backtracking involves working backward in the source code from the point
    where the error was observed. Run additional test cases and collect more information.

    SYSTEM TESTING

    System testing involves two activities: integration testing and acceptance testing.
    An integration strategy stresses the order in which modules are written, debugged, and unit
    tested. Acceptance testing involves functional tests, performance tests, and stress tests to
    verify that the requirements are fulfilled. System testing checks the interfaces, decision
    logic, control flow, recovery procedures, throughput, capacity, and timing characteristics of
    the entire system.

    INTEGRATION TESTING

    Integration testing strategies include bottom-up (traditional), top-down, and sandwich
    strategies. Bottom-up integration consists of unit testing, followed by subsystem testing,
    followed by testing of the entire system. Unit testing tries to discover errors in modules.
    Modules are tested independently in an artificial environment known as a test harness. Test
    harnesses provide data environments and calling sequences for the routines and subsystems
    that are being tested in isolation.
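
    A test harness of this kind can be sketched with Python's unittest module. The
    align_records function is a hypothetical module under test, invented for illustration and
    not taken from the project code; the harness supplies fixed input data (the data
    environment) and calls the module directly (the calling sequence).

```python
import unittest


def align_records(records):
    """Hypothetical module under test: pad each record with empty
    strings so that all records have the same number of values."""
    width = max((len(r) for r in records), default=0)
    return [list(r) + [""] * (width - len(r)) for r in records]


class AlignRecordsHarness(unittest.TestCase):
    """Harness: supplies the data environment and the calling sequence."""

    def setUp(self):
        # Data environment: fixed input instead of a live query result page.
        self.records = [["Title A", "10.00"], ["Title B"]]

    def test_rows_are_padded_to_equal_width(self):
        aligned = align_records(self.records)
        self.assertEqual(aligned, [["Title A", "10.00"], ["Title B", ""]])

    def test_empty_input_gives_empty_output(self):
        self.assertEqual(align_records([]), [])


def run_harness():
    """Load and run the harness, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlignRecordsHarness)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_harness()
```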

    A disadvantage of bottom-up testing is harness preparation, which can sometimes take
    50% or more of the coding and debugging effort for a smaller product. After all the modules
    have been tested independently and in isolation, they are linked and executed in one single
    integration run. This is known as the big bang approach to integration testing. Isolating
    sources of errors is difficult in the big bang approach.

    Top-down integration starts with the main routine and one or two immediately lower-level
    routines. After thorough checking, the top level becomes a test harness for its immediate
    subordinate routines. Top-down integration offers the following advantages:

    1. System integration is distributed throughout the implementation phase. Modules are

    integrated as they are developed.

    2. Top-level interfaces are tested first.

    3. Top-level routines provide a natural test harness for lower-level routines.

    4. Errors are localized to the new modules and interfaces that are being added.

    Although top-down integration seems to offer more advantages, it may not be applicable
    in certain situations. Sometimes it may be necessary to test certain critical low-level
    modules first. In such situations, a sandwich strategy is preferable. Sandwich integration is
    mostly top-down, but bottom-up techniques are used on some modules and subsystems. This
    mixed approach retains the advantages of both strategies.
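
    In top-down integration, lower-level routines that are not yet written are commonly
    replaced by stubs while the top level is exercised. The sketch below shows the idea with
    unittest.mock; the search and extract_records functions are hypothetical names chosen for
    illustration.

```python
from unittest import mock


def extract_records(page):
    """Lower-level routine, assumed not to be written yet."""
    raise NotImplementedError


def search(query, fetch_page, extract=extract_records):
    """Top-level routine: fetch a result page, then extract its records."""
    page = fetch_page(query)
    return extract(page)


# Stubs stand in for the subordinate routines while the top level is tested.
fetch_stub = mock.Mock(return_value="<html>stub page</html>")
extract_stub = mock.Mock(return_value=[{"title": "stub record"}])

result = search("databases", fetch_page=fetch_stub, extract=extract_stub)

# The top level must pass the query down and hand the page to the extractor.
fetch_stub.assert_called_once_with("databases")
extract_stub.assert_called_once_with("<html>stub page</html>")
print(result)
```

    Once the real extract_records exists, it replaces the stub, so any new failure is
    localized to the module that was just added.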

    ACCEPTANCE TESTING

    Acceptance testing involves planning and execution of functional tests, performance tests,
    and stress tests in order to check whether the implemented system satisfies the requirements
    specification. Quality assurance people as well as customers may simultaneously develop
    acceptance tests and run them. In addition to functional and performance tests, stress tests
    are performed to determine the limitations of the system. For example, a compiler may be
    tested for symbol table overflows, or a real-time system may be tested to find how it responds
    to multiple interrupts of different or the same priorities.

    Acceptance test tools include a test coverage analyzer, a timing analyzer, and a coding
    standards checker. A test coverage analyzer records the control paths followed for each test
    case. A timing analyzer reports the time spent in various regions of the source code under
    different test cases. Coding standards are stated in the product requirements. Manual
    inspection is usually not an adequate mechanism for detecting violations of coding standards.
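
    The idea behind a timing analyzer can be sketched in a few lines of Python. This is a
    hand-rolled illustration rather than a real analysis tool; the region names and the
    run_test_case function are invented for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named code region.
region_seconds = defaultdict(float)


@contextmanager
def timed(region):
    """Add the time spent inside the with-block to the named region."""
    start = time.perf_counter()
    try:
        yield
    finally:
        region_seconds[region] += time.perf_counter() - start


def run_test_case(n):
    """Hypothetical test case with two separately timed regions."""
    with timed("build"):
        values = [i * i for i in range(n)]
    with timed("sum"):
        return sum(values)


total = run_test_case(1000)
for region, seconds in sorted(region_seconds.items()):
    print(f"{region}: {seconds:.6f} s")
```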

    SYSTEM TESTING

    Software testing is an important element of software quality assurance and represents the
    ultimate review of specification, design, and coding. The user tests the developed system and
    changes are made according to the user's needs. The testing phase involves testing the
    developed system with various kinds of data. Elaborate test data is prepared and the system is
    tested using that data. The results of testing are noted and corrections are made.

    Testing Objectives

    Testing is a process of executing a program with the intent of finding an error.

    A good test is one that has a high probability of finding an undiscovered error.

    Testing is vital to the success of the system. System testing is the stage of implementation
    that ensures that the system works accurately before live operations commence. System testing
    makes the logical assumption that if all parts of the system are correct, the goals will be
    successfully achieved.

    EFFECTIVE TESTING PREREQUISITES

    1) Types of Testing Done

    Integration Testing

    An overall test plan for the project is prepared before the start of coding.

    Validation Testing

    The project is tested with sample data to confirm that it produces the correct sample output.

    Recovery Testing

    The project is tested with correct input data to confirm that it produces correct, valid
    output without any errors.

    Security Testing

    The project uses a password to secure the data.

    TEST DATA AND INPUT

    The tests above are carried out with various types of data. Preparation of test data plays a
    vital role in system testing. After the test data has been prepared, the system under study is
    tested with it, and the errors found are corrected using the methods described above. The
    system has been verified and validated by running it with both:

    i) Run with live data

    ii) Run with test data

    RUN WITH TEST DATA

    In this case the system was run with some sample data. Specification testing was also
    done for each condition and each combination of conditions.

    RUN WITH LIVE DATA

    The system was tested with the data of the old system for a particular period. The new
    reports were then verified against the old ones.

    TEST CASES

    A test case in software engineering is a set of conditions or variables under which a

    tester will determine whether an application or software system is working correctly or not. The

    mechanism for determining whether a software program or system has passed or failed such a

    test is known as a test oracle. In some settings, an oracle could be a requirement or use case,

    while in others it could be a heuristic. It may take many test cases to determine that a software

    program or system is considered sufficiently scrutinized to be released. Test cases are often

    referred to as test scripts, particularly when written. Written test cases are usually collected into

    test suites.

    FORMAL TEST CASES

    In order to fully test that all the requirements of an application are met, there must be at

    least two test cases for each requirement: one positive test and one negative test. If a requirement

    has sub-requirements, each sub-requirement must have at least two test cases. Keeping track of

    the link between the requirement and the test is frequently done using a traceability matrix.
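
    A traceability matrix can be as simple as a mapping from requirements to the tests that
    cover them. The sketch below checks the rule of one positive and one negative test per
    requirement; the requirement IDs and test names are hypothetical.

```python
# Hypothetical requirement IDs mapped to the test cases that cover them.
traceability = {
    "REQ-1": [
        ("test_search_returns_results", "positive"),
        ("test_search_rejects_empty_query", "negative"),
    ],
    "REQ-2": [
        ("test_records_are_aligned", "positive"),
    ],
}


def uncovered(matrix):
    """Return the requirements that lack a positive or a negative test,
    mapped to the kinds of test that are missing."""
    gaps = {}
    for requirement, tests in matrix.items():
        kinds = {kind for _, kind in tests}
        missing = sorted({"positive", "negative"} - kinds)
        if missing:
            gaps[requirement] = missing
    return gaps


gaps = uncovered(traceability)
print(gaps)
```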

    Written test cases should include a description of the functionality to be tested, and the
    preparation required to ensure that the test can be conducted. A formal written test case is
    characterized by a known input and by an expected output, which is worked out before the test
    is executed. The known input should test a precondition and the expected output should test a
    postcondition.
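
    A minimal sketch of formal test cases in Python follows, with one positive and one negative
    case, each pairing a known input with an expected outcome fixed before the test is run. The
    parse_price function is a hypothetical unit under test, not part of the project code.

```python
def parse_price(text):
    """Hypothetical function under test: convert a price such as '12.50'
    to an integer number of cents. Raises ValueError on malformed input."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("price must be a non-empty string")
    units, _, fraction = text.strip().partition(".")
    if not units.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError(f"not a price: {text!r}")
    # Pad or truncate the fractional part to exactly two digits.
    return int(units) * 100 + int(fraction.ljust(2, "0")[:2])


test_cases = [
    # (description, known input, expected output or expected exception type)
    ("positive: well-formed price", "12.50", 1250),
    ("negative: non-numeric text", "abc", ValueError),
]


def run_case(known_input, expected):
    """Return True if the function behaves as the test case expects."""
    if isinstance(expected, type) and issubclass(expected, Exception):
        try:
            parse_price(known_input)
        except expected:
            return True
        return False
    return parse_price(known_input) == expected


results = {desc: run_case(inp, exp) for desc, inp, exp in test_cases}
for description, passed in results.items():
    print("PASS" if passed else "FAIL", description)
```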

    IMPLEMENTATION

    SAMPLE CODING

    SCREEN LAYOUTS

    FUTURE ENHANCEMENT

    CONCLUSION

    BIBLIOGRAPHY

    BOOK REFERENCE

    WEB REFERENCE