flat model - web viewq1. explain four characteristics of data warehousing with respect to (a)...


Upload: dodiep

Post on 30-Jan-2018


Page 1: Flat model -    Web viewQ1. Explain four characteristics of Data Warehousing with respect to (a) Subject Oriented, (b) Integrated, (c) Time variant and (d) Non Volatile (15M)

Q1. Explain four characteristics of Data Warehousing with respect to (a) Subject Oriented, (b) Integrated, (c) Time variant and (d) Non Volatile (15M)

Answer:

- Data Warehousing is a program dedicated to the delivery of information, which advances decision making, improves business practices and enables knowledge workers.

- It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

- It plays a functional role in any organization in the form of an analytical tool.

- More generally, data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker, such as executive, manager, and analyst, to arrive at better and faster decisions.

- Data warehouses provide access to data for complex analysis, knowledge discovery, and decision-making

- In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

The fundamental characteristics of a data warehouse are:

- Subject Oriented : Data organized by subject

- Integrated : Consistency of defining parameters

- Time variant : Timeliness of data and access terms

- Non Volatile : Stable data storage medium

Subject Oriented

- Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

- A data warehouse is organized around high-level business groupings called subjects. It does not have the same atomic entity focus as an OLTP system.


Integrated

- The data in the warehouse must be integrated and consistent. That is, if two different source systems store conflicting data about entities, or attributes of an entity, the differences need to be resolved during the process of transforming the source data and loading it into the data warehouse.

- Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
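The integration step described above can be sketched as follows. This is a minimal illustration, assuming two hypothetical source systems that name the customer key differently and record weight in different units; the field names (`cust_id`, `customer_no`, `weight_lb`, `weight_kg`) and both helper functions are assumptions made for this example, not part of any particular ETL tool.

```python
# Conversion factor: one international pound is defined as 0.45359237 kg.
LB_TO_KG = 0.45359237


def from_system_a(row):
    """Hypothetical source A: key is 'cust_id', weight is in pounds."""
    return {
        "customer_id": row["cust_id"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }


def from_system_b(row):
    """Hypothetical source B: key is 'customer_no', weight is already in kg."""
    return {
        "customer_id": row["customer_no"],
        "weight_kg": round(row["weight_kg"], 3),
    }


# After transformation, both sources share one naming scheme and one unit.
warehouse_rows = [
    from_system_a({"cust_id": 17, "weight_lb": 10.0}),
    from_system_b({"customer_no": 42, "weight_kg": 4.5}),
]
```

Resolving the naming conflict and the unit conflict before loading is what lets a single warehouse query compare rows that came from either source.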

Time Variant

- In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

- Typically, data flows from one or more online transaction processing (OLTP) databases into a data warehouse on a monthly, weekly, or daily basis. The data is normally processed in a staging file before being added to the data warehouse. Data warehouses commonly range in size from tens of gigabytes to a few terabytes. Usually, the vast majority of the data is stored in a few very large fact tables.

Non Volatile

- Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

- The content of OLTP systems is, by its nature, continuously changing. Inserts, deletes, and updates form the basis of a large volume of business transactions that result in a very volatile set of data. By contrast, data warehouses are static. The data in the warehouse is read-only; updates or refreshes of the data occur on a periodic incremental or full refresh basis.


Q2. Describe in detail the three major database models with suitable examples. What is ODBMS? How is it similar to ORDBMS? (15M)

Answer:

A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Flat model


The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, a system security database might use columns for name and password. Each row would have the specific password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This tabular format is a precursor to the relational model.


Hierarchical model


In a hierarchical model, data is organized into a tree-like structure, implying a single parent for each record. A sort field keeps sibling records in a particular order. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one one-to-many relationship between two types of data. This structure is very efficient to describe many relationships in the real world; recipes, table of contents, ordering of paragraphs/verses, any nested and sorted information.

This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the data structure using pointers combined with sequential accessing. Because of this, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record. Such limitations have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.
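The single-parent tree, the sort field, and the upward link can be sketched in a few lines. This is only an in-memory illustration under assumed names (`Record`, `add_child`, `path`); it does not model how IMS or any other product stores records.

```python
class Record:
    """One record in a hierarchy: exactly one parent, ordered children."""

    def __init__(self, key, parent=None):
        self.key = key          # sort field that orders sibling records
        self.parent = parent    # upward link; None only for the root
        self.children = []

    def add_child(self, key):
        child = Record(key, parent=self)
        self.children.append(child)
        # Keep siblings in sort-field order, as the model requires.
        self.children.sort(key=lambda record: record.key)
        return child

    def path(self):
        """Rebuild the full path by following upward links to the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.key)
            node = node.parent
        return "/".join(reversed(parts))


book = Record("book")
chapter_2 = book.add_child("chapter-2")
chapter_1 = book.add_child("chapter-1")
section = chapter_1.add_child("section-1")
```

Here `section.path()` has to walk every upward link to the root, which hints at why operations that need the full path are costly when it is not stored with each record.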

Relational model


The relational model was introduced by E.F. Codd in 1970[1] as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory, and systems implementing it have been used by mainframe, midrange and microcomputer systems.

The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in rows (also called tuples) and columns. Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.
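The idea of a relation as a set of tuples can be made concrete with plain Python sets. The sketch below is illustrative only: the employee and department data are invented, attributes are addressed by position rather than by name, and the `project` and `join` helpers are simplified stand-ins for the relational operators of the same names.

```python
# Each relation is a set of tuples; each tuple is one entity instance.
employees = {          # attributes: (emp_id, name, dept_id)
    (1, "Asha", 10),
    (2, "Ravi", 20),
    (3, "Meena", 10),
}
departments = {        # attributes: (dept_id, dept_name)
    (10, "Sales"),
    (20, "Finance"),
}


def project(relation, *indexes):
    """Keep only the chosen attribute positions; duplicates collapse."""
    return {tuple(row[i] for i in indexes) for row in relation}


def join(left, right, left_idx, right_idx):
    """Combine tuples whose join attributes hold equal values."""
    return {
        l_row + r_row
        for l_row in left
        for r_row in right
        if l_row[left_idx] == r_row[right_idx]
    }


dept_ids = project(employees, 2)
staff_with_dept = join(employees, departments, 2, 0)
```

Because a relation is a set, projecting the three employees onto `dept_id` yields two tuples rather than three.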

Advantages of the relational approach:

- Ease of use: Representing information as tables consisting of rows and columns is quite natural, so even first-time users find it attractive.

- Flexibility: Different tables from which information has to be linked and extracted can be easily manipulated by operators such as project and join to give information in the form in which it is desired.

- Security: Security control and authorization can also be implemented more easily by moving sensitive attributes in a given table into a separate relation with its own authorization controls. If authorization requirements permit, a particular attribute can be joined back with the others to enable full information retrieval.

- Data independence: Data independence is achieved more easily with the normalized structure used in a relational database than with the more complicated tree or network structure.

Disadvantages of the relational model:

- Hardware overheads: Relational database systems hide the implementation complexities and the physical data storage details from the user. To do this, they need more powerful computers and data storage devices.

- Ease of design can lead to bad design: A relational database is easy to design and use, and the user does not need to know the complexities of the data storage. This ease of design and use can lead to the development and implementation of very poorly designed databases.


Object-oriented database models


In the 1990s, the object-oriented programming paradigm was applied to database technology, creating a new database model known as object databases. This aims to avoid the object-relational impedance mismatch - the overhead of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). Even further, the type system used in a particular application can be defined directly in the database, allowing the database to enforce the same data integrity invariants. Object databases also introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Network model

The network model expands upon the hierarchical structure, allowing many-to-many relationships in a tree-like structure that allows multiple parents. It was the most popular model before being replaced by the relational model, and is defined by the CODASYL specification.

The network model organizes data using two fundamental concepts, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually in each record type) or by navigation in the circular linked lists.
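The owner/member relationship described above can be sketched as follows. This is a simplified illustration: ordinary Python lists and dictionaries stand in for the circular linked lists, and the record and set names (`NetRecord`, `connect`, `SUPPLIER-SHIPMENT`, `PART-SHIPMENT`) are invented for the example rather than taken from the CODASYL specification.

```python
class NetRecord:
    """A record that can own several sets and be a member of several sets."""

    def __init__(self, name):
        self.name = name
        self.owns = {}       # set name -> list of member records
        self.member_of = {}  # set name -> the single owner in that set


def connect(set_name, owner, member):
    """Add `member` to the occurrence of `set_name` owned by `owner`."""
    owner.owns.setdefault(set_name, []).append(member)
    member.member_of[set_name] = owner


supplier = NetRecord("Acme")
part = NetRecord("Bolt")
shipment = NetRecord("Shipment-1")

# One shipment record is a member of two different sets, so it has two
# "parents". That is what a strict hierarchy cannot express.
connect("SUPPLIER-SHIPMENT", supplier, shipment)
connect("PART-SHIPMENT", part, shipment)

# Navigation: from a supplier, through its shipments, to the parts supplied.
parts_from_acme = [
    s.member_of["PART-SHIPMENT"].name for s in supplier.owns["SUPPLIER-SHIPMENT"]
]
```

Linking two record types through a shared member record in this way is how many-to-many relationships are built from one-to-many sets.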


ODBMS

- Most modern programming languages are object oriented, while most mainstream databases are relational. The programmer therefore has to sit on two chairs and work with two data models, relational and object. This significantly complicates application design, because the system architect has to work with different notions representing the same entities.

- An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases, which are table-oriented.

- ODBMS supports the modeling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects.

Features of ODBMS

- The object is the basic notion in an object-oriented system. It is the basic unit of data storage in the database. Each object has a unique OID, which is automatically generated by the system.

- Direct representation of references between objects.

- Support of inheritance and polymorphism.

- Tight integration with at least one object-oriented programming language.

- Support of traditional DBMS features: ACID transactions, backups, import-export utilities, schema evolution, etc.

ORDBMS

- An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with proper relational systems, it supports extension of the data model with custom data-types and methods.

- An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language.


Q3. How does partitioning (both vertical and horizontal) provide data granularity? What is the advantage of creating granular database for a typical retail enterprise or an airline company? (15M)

Answer:

- A partition is a division of a logical database or its constituting elements into distinct independent parts. Database partitioning is normally done for manageability, performance or availability reasons.

- The partitioning can be done by either building separate smaller databases (each with its own tables, indices, and transaction logs), or by splitting selected elements, for example just one table.

Horizontal partitioning

- It involves putting different rows into different tables.

- E.g. customers with ZIP codes less than 5000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 5000 are stored in CustomersWest.

- The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.
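The CustomersEast / CustomersWest example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table names and the 5000 threshold come from the example above, while the column names, the `Customers` view name, the `insert_customer` helper and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each partition table only accepts rows in its own ZIP range.
    CREATE TABLE CustomersEast (
        id INTEGER PRIMARY KEY, name TEXT, zip INTEGER CHECK (zip < 5000));
    CREATE TABLE CustomersWest (
        id INTEGER PRIMARY KEY, name TEXT, zip INTEGER CHECK (zip >= 5000));

    -- A view with a union presents the partitions as one logical table.
    CREATE VIEW Customers AS
        SELECT id, name, zip FROM CustomersEast
        UNION ALL
        SELECT id, name, zip FROM CustomersWest;
""")


def insert_customer(cust_id, name, zip_code):
    """Route a row to the partition that matches its ZIP code."""
    table = "CustomersEast" if zip_code < 5000 else "CustomersWest"
    conn.execute(
        f"INSERT INTO {table} (id, name, zip) VALUES (?, ?, ?)",
        (cust_id, name, zip_code),
    )


insert_customer(1, "Asha", 1200)
insert_customer(2, "Ravi", 7300)

all_customers = conn.execute(
    "SELECT id, name, zip FROM Customers ORDER BY id"
).fetchall()
east_count = conn.execute("SELECT COUNT(*) FROM CustomersEast").fetchone()[0]
```

Queries that only concern one region can read the smaller partition table directly, while reports across all customers go through the view.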

Vertical partitioning

- It involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.

- Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device, for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row splitting" (the row is split by its columns).

- A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static. Creating a view across the two newly created tables restores the original table, with a performance penalty; however, performance will increase when accessing the static data alone, e.g. for statistical analysis.
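The static/dynamic split can be sketched with `sqlite3` in the same way. The product tables, their columns, and the `product` view below are hypothetical names chosen for this illustration; only the idea of splitting columns across two tables and rejoining them through a view comes from the text.

```python
import sqlite3

vconn = sqlite3.connect(":memory:")
vconn.executescript("""
    -- Rarely changing columns live in one table...
    CREATE TABLE product_static (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- ...and frequently changing columns in another, sharing the same key.
    CREATE TABLE product_dynamic (
        product_id INTEGER PRIMARY KEY REFERENCES product_static(product_id),
        stock INTEGER, last_price REAL);

    -- The view joins the halves back into the original wide row.
    CREATE VIEW product AS
        SELECT s.product_id, s.name, s.category, d.stock, d.last_price
        FROM product_static AS s
        JOIN product_dynamic AS d ON s.product_id = d.product_id;

    INSERT INTO product_static VALUES (1, 'Kettle', 'Kitchen');
    INSERT INTO product_dynamic VALUES (1, 40, 19.5);
""")

# Reading only static columns touches the narrow table and skips the join.
static_row = vconn.execute(
    "SELECT name, category FROM product_static WHERE product_id = 1"
).fetchone()

# Reading the full row goes through the view and pays for the join.
full_row = vconn.execute("SELECT * FROM product WHERE product_id = 1").fetchone()
```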


Data Granularity

The granularity of data refers to the fineness with which data fields are sub-divided. For example, a postal address can be recorded, with low granularity, as a single field:

1. address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA

or with high granularity, as multiple fields:

1. street address = 200 2nd Ave. South #358
2. city = St. Petersburg
3. postal code = FL 33701-4313
4. country = USA

Higher granularity has overheads for data input and storage. It does however offer benefits in flexibility of data processing in treating each data field in isolation if required. A performance problem caused by excessive granularity may not reveal itself until scalability becomes an issue.

As the above example suggests, partitioning can be used to make a database more finely granular for ease of processing.

- When the address is broken down into different columns, it is vertical partitioning.

- When the data in a single table is broken down into multiple partition tables based on specific criteria, it is horizontal partitioning.

Advantage of using Granular database in Retail/Airline application.

1. A granular database assists in fine-grained parallelism. This means individual tasks are relatively small in terms of code size and execution time.

2. The finer the granularity, the greater the potential for parallelism and hence speed-up, but the greater the overheads of synchronization and communication.

3. Since an airline or retail application would be accessed by many users across different locations, the best parallel performance is attained by the best balance between load and communication overhead. If the granularity is too fine, performance can suffer from the increased communication overhead. On the other hand, if the granularity is too coarse, performance can suffer from load imbalance.


Q4. What are virtual machines? Explain briefly its working mechanism with suitable example. List two advantages of using virtual machines in our day to day business / work environment. (15M)

A virtual machine is a tightly isolated software container that can run its own operating system and applications as if it were a physical computer. A virtual machine behaves exactly like a physical computer and contains its own virtual (i.e., software-based) CPU, RAM, hard disk and network interface card (NIC).

An operating system can’t tell the difference between a virtual machine and a physical machine, nor can applications or other computers on a network. Even the virtual machine thinks it is a “real” computer. Nevertheless, a virtual machine is composed entirely of software and contains no hardware components whatsoever. As a result, virtual machines offer a number of distinct advantages over physical hardware.

Working Mechanism

Advantages

Compatibility - Just like a physical computer, a virtual machine hosts its own guest operating system and applications, and has all the components found in a physical computer (motherboard, VGA card, network card controller, etc). As a result, virtual machines are completely compatible with all standard x86 operating systems, applications and device drivers, so you can use a virtual machine to run all the same software that you would run on a physical x86 computer.

Isolation - While virtual machines can share the physical resources of a single computer, they remain completely isolated from each other as if they were separate physical machines. If, for example, there are four virtual machines on a single physical server and one of the virtual machines crashes, the other three virtual machines remain available. Isolation is an important reason why the availability and security of applications running in a virtual environment are far superior to those of applications running in a traditional, non-virtualized system.

Encapsulation - A virtual machine is essentially a software container that bundles or “encapsulates” a complete set of virtual hardware resources, as well as an operating system and all its applications, inside a software package. Encapsulation makes virtual machines incredibly portable and easy to manage. For example, you can move and copy a virtual machine from one location to another just like any other software file, or save a virtual machine on any standard data storage medium, from a pocket-sized USB flash memory card to enterprise storage area networks (SANs).

Hardware Independence - Virtual machines are completely independent from their underlying physical hardware. For example, you can configure a virtual machine with virtual components (eg, CPU, network card, SCSI controller) that are completely different from the physical components that are present on the underlying hardware. Virtual machines on the same physical server can even run different kinds of operating systems (Windows, Linux, etc).

When coupled with the properties of encapsulation and compatibility, hardware independence gives you the freedom to move a virtual machine from one type of x86 computer to another without making any changes to the device drivers, operating system, or applications. Hardware independence also means that you can run a heterogeneous mixture of operating systems and applications on a single physical computer.


Q5. How does a web application work? Explain with suitable diagram how a single tier and multi tier application work? What is two phase commit? How does rollback, roll forward and commit work?

Answer

How a web application works

Client/server is a technology that separates computers and application software into two categories, clients and servers, to better employ available computing resources and share data processing loads. This model was developed at Xerox PARC during the 1970s.

The model assigns one of two roles to the computers in a network: client or server.

- A server is a computer system that selectively shares its resources. It might provide high-volume storage capacity, heavy data crunching, and/or high-resolution graphics.

- A client is a computer or computer program that initiates contact with a server in order to make use of a resource. Data, CPUs, printers, and data storage devices are some examples of resources. A client computer provides the user interaction facility (interface) and some or all application processing.


Typically, several client computers are connected through a network (or networks) to a server, which could be a large PC, a minicomputer, or a mainframe computer.

The following are the examples of client/server architectures.

Two tier architectures

Two-tier architecture is where a client talks directly to a server, with no intervening server. It is typically used in small environments (fewer than 50 users).

In two-tier client/server architectures, the user interface is placed in the user's desktop environment and the database management system services are usually on a server, a more powerful machine that provides services to many clients.

Information processing is split between the user system interface environment and the database management server environment.

Three tier architectures


Three-tier architecture was introduced to overcome the drawbacks of the two-tier architecture. In the three-tier architecture, middleware is used between the user system interface client environment and the database management server environment.

The middleware is implemented in a variety of ways, such as transaction processing monitors, message servers or application servers. It performs the functions of queuing, application execution and database staging. In addition, the middleware adds scheduling and prioritization for work in progress.

The three-tier client/server architecture is used to improve performance for a large number of users, and it also improves flexibility when compared to the two-tier approach.

The drawback of three tier architectures is that the development environment is more difficult to use than the development of two tier applications.

The widespread use of the term 3-tier architecture also denotes the following architectures:

- Application sharing between a client, middleware and enterprise server

- Application sharing between a client, application server and enterprise database server.

Three tier with message server.

In this architecture, messages are processed and prioritized asynchronously. Messages have headers that include priority information, address and identification number. The message server links to the relational DBMS and other data sources. Messaging systems are an alternative for wireless infrastructures.

Three tier with an application server

This architecture allows the main body of an application to run on a shared host rather than in the user system interface client environment. The application server shares business logic, computations and a data retrieval engine. In this architecture applications are more scalable and installation costs are less on a single server than maintaining each on a desktop client.


3-tier architecture provides:

- A greater degree of flexibility

- Increased security, as security can be defined for each service, and at each level

- Increased performance, as tasks are shared between servers

Two Phase Commit

A commit operation is, by definition, an all-or-nothing affair. If a series of operations bound as a transaction cannot be completed, the rollback must restore the system (or cooperating systems) to the pre-transaction state.

In order to ensure that a transaction can be rolled back, a software system typically logs each operation, including the commit operation itself. A transaction/recovery manager uses the log records to undo (and possibly redo) a partially completed transaction.

When a transaction involves multiple distributed resources, for example, a database server on each of two different network hosts, the commit process is somewhat complex because the transaction includes operations that span two distinct software systems, each with its own resource manager, log records, and so on. (In this case, the distributed resources are the database servers.)

Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers. With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers.The commit process proceeds as follows:

Phase 1: Each participating resource manager coordinates local operations and forces all log records out:

- If successful, respond "OK"
- If unsuccessful, either allow a time-out or respond "OOPS"

Phase 2: If all participants respond "OK":

- Coordinator instructs participating resource managers to "COMMIT"
- Participants complete the operation, writing the log record for the commit

Otherwise:

- Coordinator instructs participating resource managers to "ROLLBACK"
- Participants complete their respective local undos
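The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production protocol: the `ResourceManager` class, its `can_prepare` flag and the in-memory `log` list are hypothetical stand-ins for real database servers, and time-outs are not modelled (a missing reply would simply be treated like "OOPS").

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical participant; can_prepare simulates local success or failure."""
    name: str
    can_prepare: bool = True
    log: list = field(default_factory=list)
    state: str = "ACTIVE"

    def prepare(self) -> str:
        # Phase 1: force log records out, then vote.
        if self.can_prepare:
            self.log.append("PREPARED")
            return "OK"
        return "OOPS"

    def commit(self) -> None:
        # Phase 2, success path: write the commit log record.
        self.log.append("COMMIT")
        self.state = "COMMITTED"

    def rollback(self) -> None:
        # Phase 2, failure path: perform the local undo.
        self.log.append("ROLLBACK")
        self.state = "ROLLED_BACK"


def two_phase_commit(participants) -> bool:
    """Coordinator: commit only if every participant votes "OK"."""
    votes = [rm.prepare() for rm in participants]
    if all(vote == "OK" for vote in votes):
        for rm in participants:
            rm.commit()
        return True
    for rm in participants:
        rm.rollback()
    return False
```

Calling `two_phase_commit` with two healthy participants commits both; if any one of them is created with `can_prepare=False`, every participant is rolled back.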

Roll Back

The DB server reads back over the transaction log entries for the transaction that needs to be rolled back and generates compensating operations (operations that reverse the effect of each logged change), which it then logs and executes.
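As a rough sketch of this idea, the function below walks a log backwards and applies compensating operations. The log layout `(txn_id, key, before_image, after_image)` and the use of a plain dictionary as the "database" are assumptions made for illustration only.

```python
def rollback(db: dict, log: list, txn_id: str) -> list:
    """Undo every logged change of txn_id, newest first.

    Each log entry is (txn_id, key, before_image, after_image); a
    before_image of None means the key did not exist before the change.
    Returns the compensating entries that were logged and executed.
    """
    compensations = []
    for entry_txn, key, before, after in reversed(list(log)):
        if entry_txn != txn_id:
            continue
        # Compensating operation: restore the before image.
        if before is None:
            db.pop(key, None)
        else:
            db[key] = before
        compensations.append((txn_id, key, after, before))
    # The compensating operations are themselves written to the log.
    log.extend(compensations)
    return compensations
```

Reading the log in reverse matters: if a transaction changed the same item twice, the oldest before image must be the last one restored.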


Roll Forward

It is also possible to keep a separate journal of all modifications to a database (sometimes called after images). This is not required for rollback of failed transactions but it is useful for updating the database in the event of a database failure, so some transaction-processing systems provide it. If the database fails entirely, it must be restored from the most recent back-up. The back-up will not reflect transactions committed since the back-up was made. However, once the database is restored, the journal of after images can be applied to the database (rollforward) to bring the database up to date. Any transactions in progress at the time of the failure can then be rolled back. The result is a database in a consistent, known state that includes the results of all transactions committed up to the moment of failure.
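A minimal sketch of rollforward, assuming the journal is a list of `(txn_id, key, after_image)` tuples and the set of committed transactions is known. As a simplification, after images of uncommitted transactions are skipped rather than applied and then rolled back; the end state is the same for this toy model.

```python
def roll_forward(backup: dict, journal: list, committed: set) -> dict:
    """Rebuild a database from a backup plus a journal of after images.

    Journal entries are (txn_id, key, after_image) in the order they were
    written. Only entries of transactions in `committed` are applied.
    """
    db = dict(backup)  # start from the restored backup, leave it untouched
    for txn_id, key, after in journal:
        if txn_id in committed:
            db[key] = after
    return db
```

For example, restoring a backup `{"balance": 100}` and applying a journal in which only `T1` (balance 80) and `T3` (bonus 10) committed yields `{"balance": 80, "bonus": 10}`.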

Commit

Commit is the exact opposite of a rollback. In this case the record is saved so that the changed data is available to all other users, and the transaction's log entries are no longer needed for rollback.


Q6. Explain each of the following with a suitable example. (a) Multi-Processing (b) Multi Tasking (c) Multi Threading (d) Multi programming (15M)

Answer

Multi Processing

- Multiprocessing is the coordinated processing of programs by more than one computer processor. Multiprocessing is a general term that can mean the dynamic assignment of a program to one of two or more computers working in tandem or can involve multiple computers working on the same program at the same time (in parallel).

- With the advent of parallel processing, multiprocessing is divided into symmetric multiprocessing (SMP) and massively parallel processing (MPP).

- In symmetric (or "tightly coupled") multiprocessing, the processors share memory and the I/O bus or data path. A single copy of the operating system is in charge of all the processors. SMP, also known as a "shared everything" system, does not usually exceed 16 processors.

- In massively parallel (or "loosely coupled") processing, up to 200 or more processors can work on the same application. Each processor has its own operating system and memory, but an "interconnect" arrangement of data paths allows messages to be sent between processors. Typically, the setup for MPP is more complicated, requiring thought about how to partition a common database among processors and how to assign work among the processors. An MPP system is also known as a "shared nothing" system.

- Example: processing two MS Word documents at the same time, each on its own processor

Multi Tasking

- Multitasking (sometimes incorrectly called multiprocessing) refers to an Operating System's ability to handle multiple concurrent processes that are launched by different running applications.

- In a single processor computer the CPU can execute one task at a time, but the Operating System manages which task should access the CPU. In a synergy between hardware and operating system, the CPU is allocated to different processes/applications several times per second in a process called 'time-slicing'. This enables many programs to run on your computer at once with apparently instant user responsiveness, even though the single-core CPU can do only one thing at a time.

- This technology has been around since the 1960s and is not to be confused with Multi-threading.

- Example : Watching a movie while downloading a song
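Time-slicing can be illustrated with a small round-robin simulation. This is only a sketch: real schedulers also deal with priorities and blocking, and the task names and the `quantum` parameter here are made up for the example.

```python
from collections import deque


def round_robin(tasks: dict, quantum: int) -> list:
    """Simulate time-slicing on one CPU.

    `tasks` maps a task name to the CPU time units it still needs.
    Returns the (task, units_run) slices in the order they were executed.
    """
    ready = deque(tasks.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the ready queue.
            ready.append((name, remaining - run))
    return timeline
```

With `round_robin({"movie": 3, "download": 5}, quantum=2)` the CPU alternates between the two tasks in slices of at most two units until both are finished, which is why both appear to progress at once.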


Multi Threading

- Multi-threading is the program's ability to break itself down to multiple concurrent threads that can be executed separately by the computer.

- A multiprocessor computer can run two or more of the threads at a time, which means that the program "runs faster" on a multiprocessor machine than a single-processor machine.

- On a single processor machine, while a multi-threaded program will run no faster, a multi-threaded application can appear to be more responsive to user interaction, because the operating system can give the illusion that multiple activities within the same program are running at the same time.

- "Traditional" single-thread applications cannot make use of two processors; therefore they don't run faster on multiprocessor machines.

- Example : In a typical chatting application we have two threads running, one listening to incoming messages and the one pushing the typed messages over the network
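The chat example can be sketched with Python's standard `threading` and `queue` modules. The function name `run_chat` and the two lists standing in for the network and the keyboard are assumptions for illustration; no real network traffic takes place.

```python
import queue
import threading


def run_chat(incoming: list, typed: list) -> tuple:
    """Run a listener thread and a sender thread side by side.

    `incoming` and `typed` stand in for the network and the keyboard.
    Returns (messages received, messages sent).
    """
    received, sent = [], []
    outbox = queue.Queue()

    def listener():
        for msg in incoming:      # pretend each item arrives from the network
            received.append(msg)

    def sender():
        while True:
            msg = outbox.get()
            if msg is None:       # sentinel: no more typed messages
                break
            sent.append(msg)      # pretend this pushes msg over the network

    threads = [threading.Thread(target=listener), threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for msg in typed:             # the main thread plays the user typing
        outbox.put(msg)
    outbox.put(None)
    for t in threads:
        t.join()
    return received, sent
```

Each thread owns one job, so a slow network read in the listener does not stop typed messages from being sent.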

Multi Programming

- Early computers ran one process at a time. While the process waited for servicing by another device, the CPU was idle. In an I/O intensive process, the CPU could be idle as much as 80% of the time.

- Advancements in operating systems led to computers that load several independent processes into memory and switch the CPU from one job to another when the first becomes blocked while waiting for servicing by another device.

- This idea of multiprogramming reduces the idle time of the CPU. Multiprogramming accelerates the throughput of the system by efficiently using the CPU time.

- Programs in a multiprogrammed environment appear to run at the same time. Processes running in a multiprogrammed environment are called concurrent processes. In actuality, the CPU processes one instruction at a time, but can execute instructions from any active process.

- Example: Simultaneously chatting on Yahoo as well as MSN messenger in a single core desktop machine
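The effect of multiprogramming on CPU idle time can be estimated with a common textbook approximation that is not part of the notes above: if each process waits for I/O a fraction p of the time, and the waits are assumed to be independent, the CPU is idle only when all n processes wait at once, so utilization is roughly 1 - p^n.

```python
def cpu_utilization(io_wait_fraction: float, n_processes: int) -> float:
    """Approximate CPU utilization as 1 - p**n.

    Assumes every process waits for I/O the same fraction p of the time
    and that the waits are independent (a simplification).
    """
    return 1.0 - io_wait_fraction ** n_processes
```

With p = 0.8 (the 80% idle case mentioned above), one process keeps the CPU about 20% busy, while four such processes in memory raise the estimate to about 59%.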


Q7. What is an operating system? Name major components of operating system and explain briefly roles of any two.

Answer:

An operating system is the most important software that runs on a computer. It manages the computer's memory, processes, and all of its software and hardware. It also allows you to communicate with the computer without knowing how to speak the computer's "language."

The operating system (OS) is the first thing loaded onto the computer.

Not all computers have operating systems. The computer that controls the microwave oven in your kitchen, for example, doesn't need an operating system. It has one set of tasks to perform, very straightforward input to expect (a numbered keypad and a few pre-set buttons) and simple, never-changing hardware to control.

For other devices, an operating system creates the ability to:
o serve a variety of purposes
o interact with users in more complicated ways
o keep up with needs that change over time

Most commonly available families of operating systems:
o Windows family of operating systems, developed by Microsoft
o Macintosh operating systems, developed by Apple
o UNIX family of operating systems

Operating System Functions

It manages the hardware and software resources of the system. In a desktop computer, these resources include such things as the processor, memory, disk space and more (On a cell phone, they include the keypad, the screen, the address book, the phone dialer, the battery and the network connection).

It provides a stable, consistent way for applications to deal with the hardware without having to know all the details of the hardware.

Types Of Operating system

Real-time operating system

Real-time operating systems (RTOS) are used to control machinery, scientific instruments and industrial systems.

It has very little user-interface capability, and no end-user utilities, since the system will be a "sealed box" when delivered for use.

A very important part of an RTOS is managing the resources of the computer so that a particular operation executes in precisely the same amount of time, every time it occurs.


In a complex machine, having a part move more quickly just because system resources are available may be just as catastrophic as having it not move at all because the system is busy.

Single-user, single task

This operating system is designed to manage the computer so that one user can effectively do one thing at a time.

The Palm OS for Palm handheld computers is a good example of a modern single-user, single-task operating system.

Single-user, multi-tasking

This is the type of operating system most people use on their desktop and laptop computers today. Microsoft's Windows and Apple's MacOS platforms are both examples of operating systems that will let a single user have several programs in operation at the same time. For example, it's entirely possible for a Windows user to be writing a note in a word processor while downloading a file from the Internet while printing the text of an e-mail message.

Multi-user

A multi-user operating system allows many different users to take advantage of the computer's resources simultaneously.

The operating system must make sure that the requirements of the various users are balanced, and that each of the programs they are using has sufficient and separate resources so that a problem with one user doesn't affect the entire community of users.

Unix, VMS and mainframe operating systems, such as MVS, are examples of multi-user operating systems.

Components of Operating System

Process Management

The operating system manages many kinds of activities ranging from user programs to system programs like printer spooler, name servers, file server etc. Each of these activities is encapsulated in a process.

A process includes the complete execution context (code, data, PC, registers, OS resources in use etc.). It is important to note that a process is not a program. A process is only ONE instance of a program in execution. Many processes can be running the same program.

The five major activities of an operating system in regard to process management are:
o Creation and deletion of user and system processes.
o Suspension and resumption of processes.
o A mechanism for process synchronization.
o A mechanism for process communication.
o A mechanism for deadlock handling.

Main-Memory Management

Primary-Memory or Main-Memory is a large array of words or bytes. Each word or byte has its own address. Main-memory provides storage that can be accessed directly by the CPU. That is to say, for a program to be executed, it must be in main memory.

The major activities of an operating system in regard to memory management are:
o Keep track of which parts of memory are currently being used and by whom.
o Decide which processes are loaded into memory when memory space becomes available.
o Allocate and de-allocate memory space as needed.
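These activities can be illustrated with a toy allocator. The sketch below uses a first-fit policy, which is one of several possible strategies and is chosen here only for brevity; the `MemoryManager` class and its method names are hypothetical.

```python
class MemoryManager:
    """Toy first-fit allocator over a fixed number of memory units."""

    def __init__(self, size: int):
        self.size = size
        self.blocks = []  # list of (start, length, owner), sorted by start

    def allocate(self, owner: str, length: int):
        """Place `owner` in the first free hole that is big enough.

        Returns the start address, or None if no hole fits.
        """
        cursor = 0
        for i, (start, block_len, _) in enumerate(self.blocks):
            if start - cursor >= length:
                self.blocks.insert(i, (cursor, length, owner))
                return cursor
            cursor = start + block_len
        if self.size - cursor >= length:
            self.blocks.append((cursor, length, owner))
            return cursor
        return None

    def free(self, owner: str) -> None:
        """De-allocate every block held by `owner`."""
        self.blocks = [b for b in self.blocks if b[2] != owner]

    def usage(self) -> dict:
        """Report which owner currently holds how many units."""
        totals = {}
        for _, length, owner in self.blocks:
            totals[owner] = totals.get(owner, 0) + length
        return totals
```

The `blocks` list is the bookkeeping ("which parts are used and by whom"), `allocate` and `free` cover allocation and de-allocation, and a `None` result signals that a process has to wait until space becomes available.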

File Management

A file is a collection of related information defined by its creator. Computers can store files on disk (secondary storage), which provides long-term storage.

Some examples of storage media are magnetic tape, magnetic disk and optical disk. Each of these media has its own properties like speed, capacity, data transfer rate and access methods.

File systems are normally organized into directories to ease their use. These directories may contain files and other directories.

The five major activities of an operating system in regard to file management are:
o The creation and deletion of files.
o The creation and deletion of directories.
o The support of primitives for manipulating files and directories.
o The mapping of files onto secondary storage.
o The backup of files on stable storage media.

I/O System Management

I/O subsystem hides the peculiarities of specific hardware devices from the user. Only the device driver knows the peculiarities of the specific device to which it is assigned.

Secondary-Storage Management

Generally speaking, systems have several levels of storage, including primary storage, secondary storage and cache storage.

Instructions and data must be placed in primary storage or cache to be referenced by a running program.

Because main memory is too small to accommodate all data and programs, and its data are lost when power is lost, the computer system must provide secondary storage to back up main memory.

Secondary storage consists of tapes, disks, and other media designed to hold information that will eventually be accessed in primary storage. Storage (primary, secondary, cache) is ordinarily divided into bytes or words consisting of a fixed number of bytes. Each location in storage has an address; the set of all addresses available to a program is called an address space.

The three major activities of an operating system in regard to secondary storage management are:
o Managing the free space available on the secondary-storage device.
o Allocation of storage space when new files have to be written.
o Scheduling the requests for disk access.

Networking

A distributed system is a collection of processors that do not share memory, peripheral devices, or a clock. The processors communicate with one another through communication lines that form a network. The communication-network design must consider routing and connection strategies, and the problems of contention and security.

Protection System

If a computer system has multiple users and allows the concurrent execution of multiple processes, then the various processes must be protected from one another's activities.

Protection refers to a mechanism for controlling the access of programs, processes, or users to the resources defined by a computer system.

Command Interpreter System

A command interpreter is an interface between the operating system and the user. The user gives commands, which are executed by the operating system (usually by turning them into system calls).

The main function of a command interpreter is to get and execute the next user-specified command. The command interpreter is usually not part of the kernel, since multiple command interpreters may be supported by an operating system, and they do not really need to run in kernel mode.

There are two main advantages to separating the command interpreter from the kernel.
o If we want to change the way the command interpreter looks, i.e. change its interface, we can do so when the command interpreter is separate from the kernel. We cannot change the code of the kernel, so an interpreter built into it could not be modified.
o If the command interpreter is a part of the kernel, it is possible for a malicious process to gain access to parts of the kernel that it should not have. To avoid this scenario, it is advantageous to keep the command interpreter separate from the kernel.
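A user-space command interpreter can be sketched as an ordinary function that parses a line and asks the operating system to do the work. The three commands below (`pwd`, `ls`, `echo`) and the function name `execute` are chosen only for illustration; Python's `os` module calls end up as system calls underneath.

```python
import os


def execute(line: str) -> str:
    """Parse one command line and carry it out through OS services."""
    parts = line.split()
    if not parts:
        return ""
    command, args = parts[0], parts[1:]
    if command == "pwd":
        return os.getcwd()  # ask the OS for the current directory
    if command == "ls":
        return "\n".join(sorted(os.listdir(args[0] if args else ".")))
    if command == "echo":
        return " ".join(args)
    return f"unknown command: {command}"
```

Because this interpreter is an ordinary program, its interface can be changed or replaced without touching the kernel, which is the first advantage listed above.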


Q8. Write short note on OLAP and OLTP (10M)

Answer:

OLTP (online transaction processing)

OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturers. Probably the most widely installed OLTP product is IBM's CICS (Customer Information Control System).

Today's online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new OLTP software uses client/server processing and brokering software that allows transactions to run on different computer platforms in a network.

OLAP (online analytical processing)

OLAP is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub-attributes.

OLAP can be used for data mining or the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP.
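The idea of locating the intersection of dimensions can be sketched with a dictionary keyed by (product, state, month). The sales figures below are invented for illustration, and a real OLAP engine would use far more elaborate storage and indexing.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, state, month, revenue)
SALES = [
    ("beach ball", "Florida", "July", 1200.0),
    ("beach ball", "Florida", "September", 300.0),
    ("sunscreen", "Florida", "July", 900.0),
    ("beach ball", "Texas", "July", 500.0),
    ("sunscreen", "Florida", "September", 400.0),
]


def build_cube(rows):
    """Aggregate revenue at each intersection of the three dimensions."""
    cube = defaultdict(float)
    for product, state, month, revenue in rows:
        cube[(product, state, month)] += revenue
    return cube


def slice_total(cube, product=None, state=None, month=None) -> float:
    """Sum every cell matching the given dimension values (None means all)."""
    wanted = (product, state, month)
    return sum(
        value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )
```

Fixing all three dimensions returns a single cell (beach balls in Florida in July), while leaving a dimension as `None` aggregates across it, for example all products sold in Florida in July.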


OLTP vs OLAP

Definition
- OLTP: OLTP stands for On Line Transaction Processing and is a data modeling approach typically used to facilitate and manage usual business applications. Most of the applications you see and use are OLTP based.
- OLAP: OLAP stands for On Line Analytic Processing and is an approach to answering multi-dimensional queries. OLAP was conceived for Management Information Systems and Decision Support Systems.

Horizon
- OLTP: An OLTP system deals with operational data. Operational data are those data involved in the operation of a particular system.
- OLAP: OLAP deals with historical or archival data. Historical data are those data that are archived over a long period of time.

Refresh
- OLTP: OLTP requires instant update. When you withdraw cash from an ATM, your balance must be updated immediately.
- OLAP: OLAP does not require instant refresh. Nobody needs instant information to make a strategic business decision.

Data Model & Schema
- OLTP: OLTP fits traditional entity-relationship or object-oriented models well. We usually refer to information as attributes related to entities, objects or classes, like product price, invoice amount or client name. Mapping can be done with a simple, one-argument function.
- OLAP: OLAP solutions use a hybrid approach built on conventional relational technology. This model employs a so-called star schema instead of traditional normalization.

Emphasis
- OLTP: The emphasis of OLTP is on update. Transaction-level isolation assures that the database is always in a consistent state. This can imply some overhead to coordinate concurrent updates, but it is necessary even in small applications.
- OLAP: OLAP can be updated by periodic (daily) processes that work in standalone mode, so consistency can be assured through the update process.

Example
- OLTP: In a banking system, you withdraw an amount through an ATM. The account number, ATM PIN, amount you are withdrawing, balance in the account, etc. are operational data elements. Typical queries: What is the salary of Mr. John? What is the address and email id of the person who is the head of the maths department?
- OLAP: If we collect the last 10 years of data about flight reservations, the data can give us meaningful information such as the trends in reservation. This may reveal the peak time of travel, what kinds of people are traveling in the various classes (Economy/Business), etc. Typical queries: How is the profit changing over the years across different regions? Is it financially viable to continue the production unit at location X?
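The contrast between the two workloads can be shown with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `accounts` and `withdrawals` tables, their columns and the helper names are hypothetical, and a real data warehouse would keep its historical data in a separate store rather than in the transactional database.

```python
import sqlite3


def make_bank():
    """Create a tiny in-memory bank with one account (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute(
        "CREATE TABLE withdrawals (account_id INTEGER, year INTEGER, amount REAL)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 500.0)")
    conn.commit()
    return conn


def withdraw(conn, account_id: int, amount: float, year: int) -> bool:
    """OLTP-style work: a short transaction that updates the balance at once."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, account_id, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds or unknown account")
            conn.execute(
                "INSERT INTO withdrawals VALUES (?, ?, ?)",
                (account_id, year, amount),
            )
        return True
    except ValueError:
        return False


def yearly_totals(conn):
    """OLAP-style work: a read-only aggregate over historical rows."""
    return conn.execute(
        "SELECT year, SUM(amount) FROM withdrawals GROUP BY year ORDER BY year"
    ).fetchall()
```

`withdraw` touches one account and must be visible immediately, whereas `yearly_totals` scans history to answer a trend question and could just as well run against a copy refreshed once a day.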


Q9. Write short note on Centralized processing and Decentralized processing (10M)

Centralized processing

Centralized processing environments maintain all data & perform all data processing at a central location.

Mainframe & large server computing applications are examples of centralized processing.

Decentralized (Distributed) Processing

Decentralized processing occurs when computing power, applications, & "work" are spread out (or distributed) over many locations (e.g., via a LAN or WAN).

Decentralized processing environments often use distributed processing techniques, where each remote computer performs a portion of the processing, thus reducing the processing burden on a central computer.

Distributed systems are workstations placed in geographically remote locations & linked to a centralized computer.

- Advantages of Centralized Processing
o Data is secured better, once received.
o Processing is consistent.

- Disadvantages of Centralized Processing
o High cost of transmitting large numbers of detailed transactions
o Increased processing power & data storage needs at a central location
o There is a reduction in local accountability.
o Input/output bottlenecks can occur at high traffic times.
o Lack of ability to respond in a timely manner to information requests from remote locations.

Decentralized (Distributed) Processing

Advantages

A distributed database management system (DDBMS) has many advantages. Data is located near the greatest demand site, access is faster, processing is faster due to several sites spreading out the work load, new sites can be added quickly and easily, communication is improved, operating costs are reduced, it is user friendly, there is less danger of a single-point failure, and it has process independence.

Several reasons why businesses and organizations move to distributed databases include organizational and economic reasons, reliable and flexible interconnection of existing databases, and future incremental growth.


Data can physically reside nearest to where it is most often accessed, thus providing users with local control of the data that they interact with. This results in local autonomy of the data, allowing users to enforce locally the policies regarding access to their data.

One reason to consider a parallel architecture is to improve the reliability and availability of the data in a scalable system. In a distributed system, with some care, it is possible to access some, or possibly all, of the data in a failure mode if there is sufficient data replication.

A DDBMS also has a few disadvantages.

Managing and controlling it is complex, and there is less security because data is held at so many different sites.

Distributed databases provide more flexible access, which increases the chance of security violations since the database can be accessed from every site within the network.

The ability to ensure the integrity of the database in the presence of unpredictable failures of both hardware and software components is also an important feature of any distributed database management system. The integrity of a database is concerned with its consistency, correctness, validity, and accuracy. The integrity controls must be built into the structure of software, databases, and involved personnel.

If there are multiple copies of the same data, then this duplicated data introduces additional complexity in ensuring that all copies are updated for each update. Concurrency control and recoverability consume much of the research effort in the area of distributed database theory. Increasing reliability and performance is the goal, not the status quo.