dbms notes

Upload: akash-pal

Post on 18-Oct-2015

66 views

Category:

Documents


0 download

DESCRIPTION

Data Base Management System

TRANSCRIPT

DBMS Notes

31DBMS Notes GIET2013EC4SA

DataBase :A database is a collection of stored operational data used by various applications and/or users by some particular enterprise or by a set of outside authorized applications and authorized users.

DataBase Management System:A DataBase Management System (DBMS) is a software system that manages execution of users applications to access and modify database data so that the data security, data integrity, and data reliability is guaranteed for each application and each application is written with an assumption that it is the only application active in the database.

What Is Data ?Different viewpoints:A sequence of characters stored in computer memory or storageInterpreted sequence of characters stored in computer memory or storageInterpreted set of objects Database supports a concurrent access to the dataFile Systems :File is uninterrupted, unstructured collection of informationFile operations: delete, catalog, create, rename, open, close, read, write, find, Access methods: Algorithms to implement operations along with internal file organizationExamples: File of Customers, File of Students; Access method: implementation of a set of operations on a file of students or customers.

File Management System Problems:Data redundancyData Access: New request-new programData is not isolated from the access implementationConcurrent program execution on the same fileDifficulties with security enforcementIntegrity issues.

Database Applications:Airline Reservation Systems Data items are: single passenger reservations; Information about flights and airports; Information about ticket prices and tickets restrictions.Banking Systems Data items are accounts, customers, loans, mortgages, balances, etc. Failures are not tolerable. Concurrent access must be providedCorporate Records Data items are: sales, accounts, bill of materials records, employee and their dependentsADVANTAGES OF A DBMS:Data independence: Application programs should be as independent as possible from details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details.Client data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to deferent classes of users.Data administration: When several users share the data, centralizing the administrationof data can or signi cant improvements. Experienced professionals who understand the nature of the data being managed, and how deferent groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for ne-tuning the storage of the data to make retrieval efficient.concarence recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the eects of system failures.

Reduced application development time: Clearly, the DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch because many important tasks are handled by the DBMS instead of being implemented by the application.Data Levels and their Roles :

Physical corresponds to the first view of data:How data is stored, how is it accessed, how data is modified, is data ordered, how data is allocated to computer memory and/or peripheral devices, how data items are actually represented (ASCI, EBCDIC,) .The physical schema species additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. We must decide what le organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations.

Conceptual corresponds to the second view of data:What we want the data to express and what relationships between data we must express, what story data tells, are all data necessary for the story are discussed.The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students' enrollment in courses. All student entities can be described using records in a Students relation, as we saw earlier. In fact, each collection of entities and each collection of relationships can be described as a relation, leading to the following conceptual schema:Students(sid: string, name: string, login: string, age: integer, gpa: real)Faculty( d: string, fname: string, sal: real)Courses(cid: string, cname: string, credits: integer)Rooms(rno: integer, address: string, capacity: integer)Enrolled(sid: string, cid: string, grade: string)Teaches( d: string, cid: string)Meets In(cid: string, rno: integer, time: string)The choice of relations, and the choice of elds for each relation, is not always obvious,and the process of arriving at a good conceptual schema is called conceptualdatabase design.

View corresponds to the third view of data: What part of the data is seen by a specific application. External schemas, which usually are also in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users. The external schema design is guided by end user requirements. For example, we might ant to allow students to and out the names of faculty members teaching courses, as well as course enrollments. This can be done by de ning the following view:Course info(cid: string, fname: string, enrollment: integer)

STRUCTURE OF A DBMS:

When a user issues a query, the parsed query is presented to a query optimizer, which uses information about how the data is stored to produce an efficient execution plan for evaluating the query. An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational operators.The code that implements relational operators sits on top of the le and access methods layer. This layer includes a variety of software for supporting the concept of a le, which, in a DBMS, is a collection of pages or a collection of records. This layer typically supports a heap le, or le of unordered pages, as well as indexes. In addition to keeping track of the pages in a le, this layer organizes the information within a page. The les and access methods layer code sits on top of the buer manager, which brings pages in from disk to main memory as needed in response to read requests.The lowest layer of the DBMS software deals with management of space on disk, where the data is stored. Higher layers allocate, deallocate, read, and write pages through (routines provided by) this layer, called the disk space manager.The DBMS supports concurrency and crash recovery by carefully scheduling user requests and maintaining a log of all changes to the database. DBMS components associated with concurrency control and recovery include the transaction manager, which ensures that transactions request and release locks according to a suitable locking protocol and schedules the execution transactions; the lock manager, which keeps track of requests for locks and grants locks on database objects when they become available; and the recovery manager, which is responsible for maintaining a log, and restoring the system to a consistent state after a crash. The disk space manager, buer manager, and le and access method layers must interact with these components.Data Models:A collection of tools for describing ...... Data. Data relationships. Data semantics. Data constraints.Relational model..........Entity-Relationship data model (mainly for database design) .Object-based data models (Object-oriented and Object-relational). Semi structured data model (XML).Other older models:......... Network model . Hierarchical model.Database Access from Application Programs:To access the database, DML statements need to be executed from the host language. There are two ways to do this: By providing an application program interface (set of procedures) that can be used to send DML and DDL statements to the database, and retrieve the results. The Open Database Connectivity (ODBC) standard defined by Microsoft for use with the C language is a commonly used application program interface standard. The Java Database Connectivity (JDBC) standard provides corresponding features to the Java language. By extending the host language syntax to embed DML calls within the host language program. Usually, a special character prefaces DML calls, and a preprocessor, called the DML precompiled, converts the DML statements to normal procedure calls in the host language.Database Users and Administrators: Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. Application programmers are computer professionals who write application programs. Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall in this category. Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework. Database Administrator: A person who has such central control over the system is called a database administrator (DBA) Schema definition. The DBA creates the original database schema by executing a set of data definition statements in the DDL. Storage structure and access-method definition. Schema and physical-organization modification. The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance. Granting of authorization for data access. Routine maintenance. Data Model:A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.Entity: An entity is a thing or object in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity.Entity set: An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer. Similarly, the entity set loan might represent the set of all loans awarded by a particular bank.An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. The designation of an attribute for an entity set expresses that the database stores similar information concerning each entity in the entity set; however, each entity may have its own value for each attribute.Simple and composite attributes: the attributes have been simple; that is, they are not divided into subparts is called as "simple attributes". on the other hand, can be divided into subparts is called as "composite attributes". For example, an attribute name could be structured as a composite attribute consisting of first-name, middle-initial, and last-name.Single-valued and multivalve attributes: For instance, the loan-number attribute for a specific loan entity refers to only one loan number. Such attributes are said to be single valued. There may be instances where an attribute has a set of values for a specific entity. Consider an employee entity set with the attribute phone-number. An employee may have zero, one, or several phone numbers, and different employees may have different numbers of phones.This type of attribute is said to be multivalve.Derived attribute: The value for this type of attribute can be derived from the values of other related attributes or entities. For instance, let us say that the customer entity set has an attribute loans-held, which represents how many loans a customer has from the bank. We can derive the value for this attribute by counting the number of loan entities associated with that customer.Relationship Sets: A relationship is an association among several entities. A relationship set is a set of relationships of the same type.Mapping Cardinalities: Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets, although they can contribute to the description of relationship sets that involve more than two entity sets. One to one. An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. One to many. An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A. Many to one. An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A. Many to many. An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.Keys:A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other.Super key: A super key is a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set. For example, the customer-id attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus, customer-id is a super key. Similarly, the combination of customer-name and customer-id is a super key for the entity set customer. The customer-name attribute of customer is not a super key, because several people might have the same name.candidate key: minimal super keys are called candidate keys. If K is a super key, then so is any superset of K. We are often interested in super keys for which no proper subset is a super key.It is possible that several distinct sets of attributes could serve as a candidate key. Suppose that a combination of customer-name and customer-street is sufficient to distinguish among members of the customer entity set. Then, both {customer-id} and {customer-name, customer-street} are candidate keys. Although the attributes customer id and customer-name together can distinguish customer entities, their combination does not form a candidate key, since the attribute customer-id alone is a candidate key.primary key:which denotes the unique identity is called as primary key. primary key to denote a candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set. A key (primary, candidate, and super) is a property of the entity set, rather than of the individual entities. Any two individual entities in the set are prohibited from having the same value on the key attributes at the same time. The designation of a key represents a constraint in the real-world enterprise being modeled.Weak Entity Sets:An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is termed a strong entity set.

For a weak entity set to be meaningful, it must be associated with another entity set, called the identifying or owner entity set. Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship. The identifying relationship is many to one from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total.In our example, the identifying entity set for payment is loan, and a relationship loan-payment that associates payment entities with their corresponding loan entities is the identifying relationship.Although a weak entity set does not have a primary key, we nevertheless need a means of distinguishing among all those entities in the weak entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined diamond indicates the corresponding identifying relationship. in fig the weak entity set payment depends on the strong entity set loan via the relationship set loan-payment.

The figure also illustrates the use of double lines to indicate total participationthe of the (weak) entity set payment in the relationship loan-payment is total, meaning that every payment must be related via loan-payment to some loan. Finally, the arrow from loan-payment to loan indicates that each payment is for a single loan. The discriminator of a weak entity set also is underlined, but with a dashed, rather than a solid, line.

Specialization: An entity set may include sub groupings of entities that are distinct in some way from other entities in the set. For instance, a subset of entities within an entity set may have attributes that are not shared by all the entities in the entity set. The E-R model provides a means for representing these distinctive entity groupings. Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following: customer employeeEach of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes. For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary. The process of designating sub groupings within an entity set is called specialization. The specialization of person allows us to distinguish among persons according to whether they are employees or customers.Generalization: mThe design process may also proceed in a bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the basis of common features. The database designer may have first identified a customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary. There are similarities between the customer entity set and the employee entity set in the sense that they have several attributes in common. This commonality can be expressed by generalization, which is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets. In our example, person is the higher-level entity set and customer and employee are lower-level entity sets.Higher- and lower-level entity sets also may be designated by the terms super class and subclass, respectively. The person entity set is the superclass of the customer and employee subclasses. For all practical purposes, generalization is a simple inversion of specialization. We will apply both processes, in combination, in the course of designing the E-R schema for an enterprise. In terms of the E-R diagram itself, we do not distinguish between specialization and generalization. New levels of entity representation will be distinguished (specialization) or synthesized (generalization) as the design schema comes to express fully the database application and the user requirements of the database. Differences in the two approaches may be characterized by their starting point and overall goal. Generalization proceeds from the recognition that a number of entity sets share some common features (namely, they are described by the same attributes and participate in the same relationship sets).

Aggregation:Aggregation is an abstraction in which relationship sets (along with their associated entity sets) are treated as higher-level entity sets, and can participate in relationships.

Symbols used in the E-R notation:

ER Model For a college DB:Assumptions : A college contains many departments Each department can offer any number of courses Many instructors can work in a department An instructor can work only in one department For each department there is a Head An instructor can be head of only one department Each instructor can take any number of courses A course can be taken by only one instructor A student can enroll for any number of courses Each course can have any number of studentsSteps in ER Modeling: Identify the Entities Find relationships Identify the key attributes for every Entity Identify other relevant attributes Draw complete E-R diagram with all attributes including Primary KeyStep 1: Identify the Entities: DEPARTMENT STUDENT COURSE INSTRUCTORStep 2: Find the relationships: One course is enrolled by multiple students and one student enrolls for multiple courses,hence the cardinality between course and student is Many to Many. The department offers many courses and each course belongs to only one department,hence the cardinality between department and course is One to Many. One department has multiple instructors and one instructor belongs to one and only onedepartment , hence the cardinality between department and instructor is one to Many. Each department there is a Head of department and one instructor is Head ofdepartment ,hence the cardinality is one to one . One course is taught by only one instructor, but the instructor teaches many courses,hence the cardinality between course and instructor is many to one.Step 3: Identify the key attributes Dept name is the key attribute for the Entity Department, as it identifies the Department uniquely. Course# (CourseId) is the key attribute for Course Entity. Student# (Student Number) is the key attribute for Student Entity. Instructor Name is the key attribute for Instructor Entity. Step 4: Identify other relevant attributesFor the department entity, the relevant attribute is location For course entity, course name, duration, prerequisite For instructor entity, room#, telephone# For student entity, student name, date of birth

ER model for Banking Business :Assumptions : There are multiple banks and each bank has many branches. Each branch has multiple customers Customers have various types of accounts Some Customers also had taken different types of loans from these bank branches One customer can have multiple accounts and LoansStep 1: Identify the Entities BANK BRANCH LOAN ACCOUNT CUSTOMERStep 2: Find the relationships One Bank has many branches and each branch belongs to only one bank, hence thecardinality between Bank and Branch is One to Many. One Branch offers many loans and each loan is associated with one branch, hence thecardinality between Branch and Loan is One to Many. One Branch maintains multiple accounts and each account is associated to one andonly one Branch, hence the cardinality between Branch and Account is One to Many One Loan can be availed by multiple customers, and each Customer can avail multipleloans, hence the cardinality between Loan and Customer is Many to Many. One Customer can hold multiple accounts, and each Account can be held by multipleCustomers, hence the cardinality between Customer and Account is Many to ManyStep 3: Identify the key attributes Bank Code (Bank Code) is the key attribute for the Entity Bank, as it identifies the bankuniquely. Branch# (Branch Number) is the key attribute for Branch Entity. Customer# (Customer Number) is the key attribute for Customer Entity. Loan# (Loan Number) is the key attribute for Loan Entity. Account No (Account Number) is the key attribute for Account Entity.Step 4: Identify other relevant attributes For the Bank Entity, the relevant attributes other than Bank Code would be Nameand Address. For the Branch Entity, the relevant attributes other than Branch# would be Nameand Address. For the Loan Entity, the relevant attribute other than Loan# would be Loan Type. For the Account Entity, the relevant attribute other than Account No would beAccount Type. For the Customer Entity, the relevant attributes other than Customer# would beName, Telephone# and Address.E-R diagram with all attributes including Primary Key:

Normalization

While designing a database out of an entityrelationship model, the main problem existing in that raw database is redundancy. Redundancy is storing the same data item in more one place. A redundancy creates several problems like the following:1. Extra storage space: storing the same data in many places takes large amount of disk space.2. Entering same data more than once during data insertion.3. Deleting data from more than one place during deletion.4. Modifying data in more than one place.5. Anomalies may occur in the database if insertion, deletion, modification etc are no done properly. It creates inconsistency and unreliability in the database.To solve this problem, the raw database needs to be normalized. This is a step by step process of removing different kinds of redundancy and anomaly at each step. At each step a specific rule is followed to remove specific kind of impurity in order to give the database a slim and clean look.

Un-Normalized Form (UNF)If a table contains non-atomic values at each row, it is said to be in UNF. Anatomic valueis something that can not be further decomposed. Anon-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:Emp-IdEmp-NameMonthSalesBank-IdBank-Name

E01AAJan1000B01SBI

Feb1200

Mar850

E02BBJan2200B02UTI

Feb2500

E03CCJan1700B01SBI

Feb1800

Mar1850

Apr1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot give us the unique identification facility for any single row. Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a unique combination of values. The above table in UNF can be processed to create the following table in 1NF.Emp-IdEmp-NameMonthSalesBank-IdBank-Name

E01AAJan1000B01SBI

E01AAFeb1200B01SBI

E01AAMar850B01SBI

E02BBJan2200B02UTI

E02BBFeb2500B02UTI

E03CCJan1700B01SBI

E03CCFeb1800B01SBI

E03CCMar1850B01SBI

E03CCApr1725B01SBI

As you can see now, each row contains unique combination of values. Unlike in UNF, this relation contains only atomic values, i.e. the rows can not be further decomposed, so the relation is now in 1NF.

Second Normal Form (2NF)A relation is said to be in 2NF f if it is already in 1NF and each and every attribute fully depends on the primary key of the relation. Speaking inversely, if a table has some attributes which is not dependant on the primary key of that table, then it is not in 2NF.Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this position can be removed into another related relation, it would come to 2NF.Emp-IdEmp-NameMonthSalesBank-Id

E01AAJAN1000B01

E01AAFEB1200B01

E01AAMAR850B01

E02BBJAN2200B02

E02BBFEB2500B02

E03CCJAN1700B01

E03CCFEB1800B01

E03CCMAR1850B01

E03CCAPR1726B01

Bank-IdBank-Name

B01SBI

B02UTI

After removing the portion into another relation we store lesser amount of data in two relations without any loss information. There is also a significant reduction in redundancy.

Third Normal Form (3NF)A relation is said to be in 3NF, if it is already in 2NF and there exists notransitive dependencyin that relation. Speaking inversely, if a table contains transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.What is a transitive dependency? Within a relation if we seeA B [B depends on A]AndB C [C depends on B]Then we may deriveA C[C depends on A]Such derived dependencies hold well in most of the situations. For example if we haveRoll MarksAndMarks GradeThen we may safely deriveRoll Grade.This third dependency was not originally specified but we have derived it.The derived dependency is called a transitive dependency when such dependency becomes improbable. For example we have been givenRoll CityAndCity STD CodeIf we try to derive Roll STD Code it becomes a transitive dependency, because obviously the STDCode of a city cannot depend on the roll number issued by a school or college. In such a case the relation should be broken into two, each containing one of these two dependencies:Roll CityAndCity STD code

Boyce-Code Normal Form (BCNF)A relationship is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. These could be same situation when a 3NF relation may not be in BCNF the following conditions are found true.1. The candidate keys are composite.2. There are more than one candidate keys in the relation.3. There are some common attributes in the relation.

Professor CodeDepartmentHead of Dept.Percent Time

P1PhysicsGhosh50

P1MathematicsKrishnan50

P2ChemistryRao25

P2PhysicsGhosh75

P3MathematicsKrishnan100

Consider, as an example, the above relation. It is assumed that:1. A professor can work in more than one department2. The percentage of the time he spends in each department is given.3. Each department has only one Head of Department.The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of Department of Chemistry.The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. form the given relation. The normalized relations are shown in the following.Professor CodeDepartmentPercent Time

P1Physics50

P1Mathematics50

P2Chemistry25

P2Physics75

P3Mathematics100

DepartmentHead of Dept.

PhysicsGhosh

MathematicsKrishnan

ChemistryRao

See the dependency diagrams for these new relations.

Fourth Normal Form (4NF)When attributes in a relation have multi-valued dependency, further Normalization to 4NF and 5NF are required. Let us first find out what multi-valued dependency is.Amulti-valued dependencyis a typical kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key.We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:1. A vendor is capable of supplying many items.2. A project uses many items.3. A vendor supplies to many projects.4. An item may be supplied by many vendors.A multi valued dependency exists here because all the attributes depend upon the other and yet none of them is a primary key having unique value.Vendor CodeItem CodeProject No.

V1I1P1

V1I2P1

V1I1P3

V1I2P3

V2I2P1

V2I3P1

V3I1P2

V3I1P3

The given relation has a number of problems. For example:1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code has to be introduced.2. The information about item I1 is stored twice for vendor V3.Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned above. The problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi valued dependency or one independent multi valued dependency with a functional dependency.The table can be expressed as the two 4NF relations given as following. The fact that vendors are capable of supplying certain items and that they are assigned to supply for some projects in independently specified in the 4NF relation.Vendor-SupplyVendor CodeItem Code

V1I1

V1I2

V2I2

V2I3

V3I1

Vendor-ProjectVendor CodeProject No.

V1P1

V1P3

V2P1

V3P2

Fifth Normal Form (5NF)These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in the 4NF, although we have preserved the dependencies between Vendor Code and Item code in the first table and Vendor Code and Item code in the second table, we have lost the relationship between Item Code and Project No. If there were a primary key then this loss of dependency would not have occurred. In order to revive this relationship we must add a new table like the following. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.Project No.Item Code

P111

P112

P211

P311

P313

Let us finally summarize the normalization steps we have discussed so far.Input RelationTransformationOutput Relation

All RelationsEliminate variable length record. Remove multi-attribute lines in table.1NF

1NF RelationRemove dependency of non-key attributes on part of a multi-attribute key.2NF

2NFRemove dependency of non-key attributes on other non-key attributes.3NF

3NFRemove dependency of an attribute of a multi attribute key on an attribute of another (overlapping) multi-attribute key.BCNF

BCNFRemove more than one independent multi-valued dependency from relation by splitting relation.4NF

4NFAdd one relation relating attributes with multi-valued dependency.5NF

Primitive data type

primitive data type abasic typeis adata typeprovided by aprogramming languageas a basic building block. Most languages allow more complicated composite typesto be recursively constructed starting from basic types. abuilt-in typeis a data type for which the programming language provides built-in support.In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is not basic should be considered "primitiveDepending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are Integer addition, for example, can be performed as a single machine instruction, and someprocessorsoffer specific instructions to process sequences of characters with a single instruction. In particular, theCstandard mentions that "a 'plain' int object has the natural size suggested by the architecture of the execution environment". This means thatintis likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost alwaysvalue types.Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be modified by programs. Exceptions includeSmalltalk, which permits all data types to be extended within a program, adding to the operations that can be performed on them or even redefining the built-in operations.

Composite typesare derived from more than one primitive type. This can be done in a number of ways. The ways they are combined are calleddata structures. Composing a primitive type into a compound type generally results in a new type, e.g.array-of-integeris a different type tointeger. Anarraystores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable. Record(also called tuple or struct) Records are among the simplestdata structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually calledfieldsor members. Union. A union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast with arecord, which could be defined to contain a floatandan integer; whereas, in a union, there is only one value at a time. Atagged union(also called avariant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety. Asetis anabstract data structurethat can store certain values, without any particularorder, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a boolean "in" or "not in". Anobjectcontains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are calledplain old data structure.Many others are possible, but they tend to be further variations and compounds of the above.Logical and Physical Database RequirementsThe requirements for a logical and physical database vary by size and design parameters. A logical database must be able to access and identify all files within the storage system to operate correctly, whereas a physical database manages a much smaller field of information. Sometimes, a physical database stores only a single file with one value or word in it.

Logical Database Definition

A logical database is the collected information stored on multiple physical disk files and hard drives within a computer. This database provides a structure to house all the accumulated information within the device and determines the relationships between different types of files and programs. A logical database determines these relationships through a series of highly structured tables designed to categorize information into groups for easier accessibility. Without this categorization, accessing different files within a computer would take additional time as the system searched each file for the appropriate match.

Logical Database RequirementsA logical database can stretch over multiple physical hard disks and information files. The data storage unit is still a single database for information retrieval purposes. To have a logical database, all given hard disks and information files must be accessible from a single source. An example would be a personal computer able to access its information files stored on multiple hard drives from a single user interface. According to Microsoft, when a logical database is successful, the user sees a coherent list of information from a central location that draws from the many file sources tied into the storage system.

Physical Database DefinitionA physical database is both the actual device housing the information files and the search paths used to access information between each source. According to Microsoft, the term "database" refers only to the logical database controlling information files for the entire system. A physical database is technically a smaller unit of storage referred to as either a company, field, record or table, depending on how much information the physical storage device contains. A field is the smallest unit of storage housing only a single file. A company is the largest -- next to a database -- housing separate, large groups of data.

Physical Storage RequirementsThe requirements for a physical database vary by the parameters of the storage device in question. For example, a flash drive designed to hold up to 2 gigabytes of information needs a personal computer or another USB-connected device to allow access to the information stored on the equipment. A physical database also needs a power source to access information. A computer hard drive cannot function without electricity. A flash drive cannot operate without a device with an adequate power source.

Two types of data modeling are as follows: Logical modeling Physical modelingIf you are going to be working with databases, then it is important to understand the difference between logical and physical modeling, and how they relate to one another. Logical and physical modeling are described in more detail in the following subsections.

Logical ModelingLogical modeling deals with gathering business requirements and converting those requirements into a model. The logical model revolves around the needs of the business, not the database, although the needs of the business are used to establish the needs of the database. Logical modeling involves gathering information about business processes, business entities (categories of data), and organizational units. After this information is gathered, diagrams and reports are produced including entity relationship diagrams, business process diagrams, and eventually process flow diagrams. The diagrams produced should show the processes and data that exists, as well as the relationships between business processes and data. Logical modeling should accurately render a visual representation of the activities and data relevant to a particular business.The diagrams and documentation generated during logical modeling is used to determine whether the requirements of the business have been completely gathered. Management, developers, and end users alike review these diagrams and documentation to determine if more work is required before physical modeling commences.Typical deliverables of logical modeling include Entity relationship diagramsAn Entity Relationship Diagram is also referred to as an analysis ERD. The point of the initial ERD is to provide the development team with a picture of the different categories of data for the business, as well as how these categories of data are related to one another. Business process diagramsThe process model illustrates all the parent and child processes that are performed by individuals within a company. The process model gives the development team an idea of how data moves within the organization. Because process models illustrate the activities of individuals in the company, the process model can be used to determine how a database application interface is design. User feedback documentationPhysical ModelingPhysical modeling involves the actual design of a database according to the requirements that were established during logical modeling. Logical modeling mainly involves gathering the requirements of the business, with the latter part of logical modeling directed toward the goals and requirements of the database. Physical modeling deals with the conversion of the logical, or business model, into a relational database model. When physical modeling occurs, objects are being defined at the schema level. A schema is a group of related objects in a database. A database design effort is normally associated with one schema.During physical modeling, objects such as tables and columns are created based on entities and attributes that were defined during logical modeling. Constraints are also defined, including primary keys, foreign keys, other unique keys, and check constraints. Views can be created from database tables to summarize data or to simply provide the user with another perspective of certain data. Other objects such as indexes and snapshots can also be defined during physical modeling. Physical modeling is when all the pieces come together to complete the process of defining a database for a business.Physical modeling is database software specific, meaning that the objects defined during physical modeling can vary depending on the relational database software being used. For example, most relational database systems have variations with the way data types are represented and the way data is stored, although basic data types are conceptually the same among different implementations. Additionally, some database systems have objects that are not available in other database systems.Typical deliverables of physical modeling include the following: Server model diagramsThe server model diagram shows tables, columns, and relationships within a database. User feedback documentation Database design documentationConclusionUnderstanding the difference between logical and physical modeling will help you build better organized and more effective database systems.

Data independence

Data independenceis the type ofdatatransparency that matters for a centralizedDBMS. It refers to the immunity of user applicationsto make changes in the definition and organization of data.Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues, since there is no difference in the operation carried out against the data.The data independence and operation independence together gives the feature of data abstraction. There are two levels of data independence.First levelThelogicalstructure of the data is known as theschema definition. In general, if a user application operates on a subset of the attributesof arelation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing schemas.Second levelThe physical structure of the data is referred to as "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data. There are two types of data independence:1. Logical data independence: The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas or having to rewrite existing application programs.2. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence. For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas.3. View level data independence: always independent no effect, because there doesn't exist any other level above view level.Data Independence TypesData independence has two types. They are:1. Physical Independence2. Logical Independence.Data independence can be explained as follows: Each higher level of the data architecture is immune to changes of the next lower level of the architecture.Physical Independence:The logical scheme stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. In this external schema does not change. In this internal schema changes may be required due to some physical schema were reorganized here. Physical data independence is present in most databases and file environment in which hardware storage of encoding, exact location of data on disk, merging of records, so on this are hidden from user.Logical Independence:The external scheme may stay unchanged for most changes of the logical scheme. This is especially desirable as the application software does not need to be modified or newly translated.

Data abstractionIncomputer science,abstractionis the process by whichdataandprogramsare defined with arepresentationsimilar in form to its meaning (semantics), while hiding away theimplementationdetails. Abstraction tries to reduce and factor out details so that the programmercan focus on a few concepts at a time. A system can have severalabstraction layerswhereby different meanings and amounts of detail are exposed to the programmer. For example,low-levelabstraction layers expose details of thecomputer hardware where the program isrun, while high-level layers deal with thebusiness logicof the program.The following English definition of abstraction helps to understand how this term applies to computer science, IT and objects:abstraction - a concept or idea not associated with any specific instanceAbstraction captures only those details about an object that are relevant to the current perspective. The concept originated by analogy withabstraction in mathematics. The mathematical technique of abstraction begins with mathematicaldefinitions, making it a more technical approach than the general concept ofabstraction in philosophy. For example, in both computing and in mathematics, numbersare concepts in theprogramming languages, as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.Incomputer programming, abstraction can apply to control or to data:Control abstractionis the abstraction of actions whiledata abstractionis that of data structures. Control abstraction involves the use ofsubprogramsand related conceptscontrol flows Data abstraction allows handling data bits in meaningful ways. For example, it is the basic motivation behinddatatype.One can regard the notion of anobject(fromobject-oriented programming) as an attempt to combine abstractions of data and code.The same abstract definition can be used as a commoninterfacefor a family of objects with different implementations and behaviors but which share the same meaning. Theinheritancemechanism in object-oriented programming can be used to define anabstract classas the common interface.The recommendation that programmers use abstractions whenever suitable in order to avoid duplication (usuallyof code) is known as theabstraction principle. The requirement that a programming language provide suitable abstractions is also called the abstraction principle.

Data abstraction enforces a clear separation between theabstractproperties of adata typeand theconcretedetails of its implementation. The abstract properties are those that are visible to client code that makes use of the data typetheinterfaceto the data typewhile the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.For example, one could define anabstract data typecalledlookup tablewhich uniquely associateskeyswithvalues, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as ahash table, abinary search tree, or even a simple linearlistof (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. As one way to look at this: the interface forms acontracton agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.Languages that implement data abstraction includeAdaandModula-2.Object-orientedlanguages are commonly claimed[to offer data abstraction; however, theirinheritanceconcept tends to put information in the interface that more properly belongs in the implementation; thus, changes to such information ends up impacting client code, leading directly to theFragile binary interface problem.SQLSQL is a standard language for accessing databases.Our SQL tutorial will teach you how to use SQL to access and manipulate data in: MySQL, SQL Server, Access, Oracle, Sybase, DB2, and other database systems.SQL is a standard language for accessing and manipulating databases.

What is SQL? SQL stands for Structured Query Language SQL lets you access and manipulate databases SQL is an ANSI (American National Standards Institute) standard

What Can SQL do? SQL can execute queries against a database SQL can retrieve data from a database SQL can insert records in a database SQL can update records in a database SQL can delete records from a database SQL can create new databases SQL can create new tables in a database SQL can create stored procedures in a database SQL can create views in a database SQL can set permissions on tables, procedures, and views

SQL is a Standard - BUT....Although SQL is an ANSI (American National Standards Institute) standard, there are different versions of the SQL language.However, to be compliant with the ANSI standard, they all support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.Note:Most of the SQL database programs also have their own proprietary extensions in addition to the SQL standard!

Using SQL in Your Web SiteTo build a web site that shows data from a database, you will need: An RDBMS database program (i.e. MS Access, SQL Server, MySQL) To use a server-side scripting language, like PHP or ASP To use SQL to get the data you want To use HTML / CSS

RDBMSRDBMS stands for Relational Database Management System.RDBMS is the basis for SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.The data in RDBMS is stored in database objects called tables.A table is a collection of related data entries and it consists of columns and rows.Database TablesA database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows) with data.In this tutorial we will use the well-known North wind sample database (included in MS Access and MS SQL Server).Below is a selection from the "Customers" table:CustomerIDCustomerNameContactNameAddressCityPostalCodeCountry

1Alfreds FutterkisteMaria AndersObere Str. 57Berlin12209Germany

2Ana Trujillo Emparedados y heladosAna TrujilloAvda. de la Constitucin 2222Mxico D.F.05021Mexico

3Antonio Moreno TaqueraAntonio MorenoMataderos 2312Mxico D.F.05023Mexico

4Around the HornThomas Hardy120 Hanover Sq.LondonWA1 1DPUK

5Berglunds snabbkpChristina BerglundBerguvsvgen 8LuleS-958 22Sweden

The table above contains five records (one for each customer) and seven columns (CustomerID, CustomerName, ContactName, Address, City, PostalCode, and Country).

SQL StatementsMost of the actions you need to perform on a database are done with SQL statements.The following SQL statement selects all the records in the "Customers" table:ExampleSELECT * FROM Customers; In this tutorial we will teach you all about the different SQL statements.

Keep in Mind That... SQL is NOT case sensitive: SELECT is the same as select

Semicolon after SQL Statements?Some database systems require a semicolon at the end of each SQL statement.Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server.In this tutorial, we will use semicolon at the end of each SQL statement.

Some of The Most Important SQL Commands SELECT- extracts data from a database UPDATE- updates data in a database DELETE- deletes data from a database INSERT INTO- inserts new data into a database CREATE DATABASE- creates a new database ALTER DATABASE- modifies a database CREATE TABLE- creates a new table ALTER TABLE- modifies a table DROP TABLE- deletes a table CREATE INDEX- creates an index (search key) DROP INDEX- deletes an indexFile organizationFile organization is the methodology which is applied to structured computer files. Files contain computer records which can be documents or information which is stored in a certain way for later retrieval. File organization refers primarily to the logical arrangement of data (which can itself be organized in a system of records with correlation between the fields/columns) in a file system. It should not be confused with the physical storage of the file in some types of storage media. There are certain basic types of computer file, which can include files stored as blocks of data and streams of data, where the information streams out of the file while it is being read until the end of the file is encountered.We will look at two components of file organization here:1. The way the internal file structure is arranged and2. The external file as it is presented to the O/S or program that calls it. Here we will also examine the concept of file extensions.We will examine various ways that files can be stored and organized. Files are presented to the application as a stream of bytes and then an EOF (end of file) condition.A program that uses a file needs to know the structure of the file and needs to interpret its contents.

Internal File Structure - Methods and Design ParadigmIt is a high-level design decision to specify a system of file organization for a computer software program or a computer system designed for a particular purpose. Performance is high on the list of priorities for this design process, depending on how the file is being used. The design of the file organization usually depends mainly on the system environment. For instance, factors such as whether the file is going to be used for transaction-oriented processes like OLTP or Data Warehousing, or whether the file is shared among various processes like those found in a typical distributed system or standalone. It must also be asked whether the file is on a network and used by a number of users and whether it may be accessed internally or remotely and how often it is accessed.However, all things considered the most important considerations might be:1. Rapid access to a record or a number of records which are related to each other.2. The Adding, modification, or deletion of records.3. Efficiency of storage and retrieval of records.4. Redundancy, being the method of ensuring data integrity.A file should be organized in such a way that the records are always available for processing with no delay. This should be done in line with the activity and volatility of the information.Types of File OrganizationOrganizing a file depends on what kind of file it happens to be: a file in the simplest form can be a text file, (in other words a file which is composed of ascii (American Standard Code for Information Interchange) text.) Files can also be created as binary or executable types (containing elements other than plain text.) Also, files are keyed with attributes which help determine their use by the host operating system.Techniques of File OrganizationThe three techniques of file organization are:1. Heap (unordered)2. Sorted1. Sequential (SAM)2. Line Sequential (LSAM)3. Indexed Sequential (ISAM)3. Hashed or DirectIn addition to the three techniques, there are four methods of organizing files. They are sequential, line-sequential, indexed-sequential, inverted listanddirect or hashed access organization.Sequential OrganizationA sequential file contains records organized in the order they were entered. The order of the records is fixed. The records are stored and sorted in physical, contiguous blocks within each block the records are in sequence.Records in these files can only be read or written sequentially.Once stored in the file, the record cannot be made shorter, or longer, or deleted. However, the record can beupdatedif the length does not change. (This is done by replacing the records by creating a new file.) New records will always appear at the end of the file.If theorder of the recordsin a file is not important,sequential organizationwill suffice, no matter how many records you may have. Sequential output is also useful for report printing or sequential readswhich some programs prefer to do.Line-Sequential OrganizationLine-sequential files are like sequential files, except that the records can contain only characters as data. Line-sequential files are maintained by the native byte stream files of the operating system.In the COBOL environment, line-sequential files that are created with WRITE statements with the ADVANCING phrase can be directed to a printer as well as to a disk.Indexed-Sequential OrganizationKey searches are improved by this system too. The single-level indexing structure is the simplest one where a file, whose records are pairs, contains a key pointer. Thispointeris the position in the data file of the record with the given key. A subset of the records, which are evenly spaced along the data file, is indexed, in order to mark intervals of data records.This is how a key search is performed: the search key is compared with the index keys to find the highest index key coming in front of the search key, while a linear search is performed from the record that the index key points to, until the search key is matched or until the record pointed to by the next index entry is reached. Regardless of double file access (index + data) required by this sort of search, the access time reduction is significant compared with sequential file searches.Let's examine, for sake of example, a simple linear search on a 1,000 recordsequentially organizedfile. An average of 500 key comparisons are needed (and this assumes the search keys are uniformly distributed among the data keys). However, using an index evenly spaced with 100 entries, the total number of comparisons is reduced to 50 in the index file plus 50 in the data file: a five to one reduction in the operations count!Hierarchical extension of this scheme is possible since an index is a sequential file in itself, capable of indexing in turn by another second-level index, and so forth and so on. And the exploit of the hierarchical decomposition of the searches more and more, to decrease the access time will pay increasing dividends in the reduction of processing time. There is however a point when this advantage starts to be reduced by the increased cost of storage and this in turn will increase the index access time.Hardware for Index-Sequential Organization is usually Disk-based, rather than tape. Records are physically ordered by primary key. And the index gives the physical location of each record. Records can be accessed sequentially or directly, via the index. The index is stored in a file and read into memory at the point when the file is opened. Also, indexes must be maintained.Life sequential organization the data is stored in physical contiguous box. How ever the difference is in the use of indexes. There are three areas in the disc storage: Primary Area:-Contains file records stored by key or ID numbers. Overflow Area:-Contains records area that cannot be placed in primary area. Index Area:-It contains keys of records and there locations on the disc.Inverted ListIn file organization, this is a file that is indexed on many of the attributes of the data itself. The inverted list method has a single index for each key type. The records are not necessarily stored in a sequence. They are placed in the are data storage area, but indexes are updated for the record keys and location.Here's an example, in a company file, an index could be maintained for allproducts, another one might be maintained forproduct types. Thus, it is faster to search the indexes than every record. These types of file are also known as"inverted indexes."Nevertheless,inverted list filesuse more media space and the storage devices get full quickly with this type of organization. The benefits are apparent immediately because searching is fast. However, updating is much slower.Content-based queries in text retrieval systems useinverted indexesas their preferred mechanism. Data items in these systems are usually storedcompressedwhich would normally slow the retrieval process, but the compression algorithm will be chosen to support this technique.When querying a file there are certain circumstances when the query is designed to bemodal which means that rules are set which require that different information be held in the index. Here's an example of this modality: when phrase querying is undertaken, the particular algorithm requires that offsets to word classifications are held in addition to document numbers.Direct or Hashed AccessWithdirect or hashed accessa portion of disk space is reserved and a hashing algorithm computes the record address. So there is additional space required for this kind of file in the store. Records are placed randomly through out the file. Records are accessed by addresses that specify their disc location. Also, this type of file organization requires a disk storage rather than tape. It has an excellent search retrieval performance, but care must be taken to maintain the indexes. If the indexes become corrupt, what is left may as well go to the bit-bucket, so it is as well to have regular backups of this kind of file just as it is for all stored valuable data!External File Structure and File ExtensionsMicrosoft WindowsandMS-DOSFile SystemsThe external structure of a file depends on whether it is being created on aFATorNTFS partition. The maximum filename length on aNTFSpartition is 256 characters, and 11 characters onFAT(8 character name+"."+3 character extension.)NTFSfilenameskeep their case,whereasFATfilenameshave no concept of case(but case is ignored when performing a search underNTFSOperating System). Also, there is the newVFATwhich permits 256 character filenames.UNIXandApple MacintoshFile SystemsThe concept of directories and files is fundamental to the UNIX operating system. OnMicrosoft Windows-based operating systems, directories are depicted asfoldersand moving about is accomplished by clicking on the different icons. InUNIX, the directories are arranged as a hierarchy with theroot directorybeing at the top of the tree. Therootdirectory is always depicted as/. Within the/directory, there are subdirectories (e.g.: etc and sys).Filescan be written to any directory depending on the permissions. Files can bereadable,writableand/or executable.Organizing files using LibrariesWith the advent of Microsoft Windows 7 the concept of file organization and management has improved drastically by way of use of powerful tool called Libraries. A Library is file organization system to bring together related files and folders stored in different locations of the local as well as network computer such that these can be accessed centrally through a single access point. For instance, various images stored in different folders in the local computer or/and across a computer network can be accumulated in an Image Library. Aggregation of similar files can be manipulated, sorted or accessed conveniently as and when required through a single access point on a computer desktop by use of a Library. This feature is particularly very useful for accessing similar content of related content, and also, for managing projects using related and common data.