computingnow - uptraining.up.nic.in/coursematerial/imp_url_imortant_topics.pdf · 9/13/2016 · on...
https://www.computer.org/web/computingnow
http://guidelines.gov.in/tools.php
http://inoc.nic.in/
https://security.nic.in/docs/procedure%20for%20cleaning%20of%20web%20sites%20from%20malicious%20files.docx
https://security.nic.in/docs/NIC_RHEL-7_Hardening_V-1.0.pdf
What are malware, viruses, spyware, and cookies, and what differentiates them?
"Malware" is short for malicious software and is used as an umbrella term for viruses, spyware, worms, etc. Malware is
designed to cause damage to a standalone computer or a networked PC. So wherever the term malware is used, it means a
program designed to damage your computer; it may be a virus, worm, or Trojan.
Worms:-
Worms are malicious programs that make copies of themselves again and again on the local drive, network shares, etc. The
only purpose of a worm is to reproduce itself again and again; it does not harm any data/files on the computer. Unlike a
virus, it does not need to attach itself to an existing program. Worms spread by exploiting vulnerabilities in operating
systems.
Examples of worm are: - W32.SillyFDC.BBY
Packed.Generic.236
W32.Troresba
Due to its replicating nature, a worm takes up a lot of space on the hard drive and consumes more CPU time, which in turn
makes the PC too slow; it also consumes more network bandwidth.
Virus:-
A virus is a program written to enter your computer and damage/alter your files/data. A virus might corrupt or delete data
on your computer. Viruses can also replicate themselves. A computer virus is more dangerous than a computer worm, as it
makes changes to or deletes your files, while a worm only replicates itself without changing your files/data.
Examples of virus are: - W32.Sfc!mod
ABAP.Rivpas.A
Accept.3773
Viruses can enter your computer as attachments to images, greetings, or audio/video files. Viruses also enter through
downloads from the Internet; they can be hidden in free/trial software or other files that you download.
So before you download anything from the Internet, be sure about it first. Almost all viruses are attached to an executable
file, which means the virus may exist on your computer but cannot actually infect it unless you run or open the malicious
program. It is important to note that a virus cannot spread without a human action, such as running an infected program.
Viruses are of different types, which are as follows.
1) File viruses
2) Macro viruses
3) Master boot record viruses
4) Boot sector viruses
5) Multipartite viruses
6) Polymorphic viruses
7) Stealth viruses
File Virus:- This type of virus normally infects program files such as .exe, .com, and .bat. Once this virus is resident in
memory, it tries to infect all programs that are loaded into memory.
Macro Virus:- This type of virus infects Word, Excel, PowerPoint, Access, and other data files. Once these files are
infected, repairing them is very difficult.
Master boot record virus:- MBR viruses are memory-resident viruses that copy themselves to the first sector of a storage
device, which is used for partition tables or OS loading programs. An MBR virus infects this particular area of the storage
device instead of normal files. The easiest way to remove an MBR virus is to clean the MBR area.
Boot sector virus:- A boot sector virus infects the boot sector of an HDD or FDD. These are also memory-resident in
nature. As soon as the computer starts, it gets infected from the boot sector.
Cleaning this type of virus is very difficult.
Multipartite virus:- A hybrid of boot and program/file viruses. They infect program files, and when the infected program
is executed, these viruses infect the boot record. When you boot the computer the next time, the virus from the boot record
loads into memory and then starts infecting other program files on disk.
Polymorphic viruses:- A virus that can encrypt its code in different ways so that it appears different in each infection.
These viruses are more difficult to detect.
Stealth viruses:- These viruses use various techniques to avoid detection. They either redirect the disk head to read
another sector instead of the one in which they reside, or they alter the infected file's size shown in the directory
listing. For example, the Whale virus adds 9216 bytes to an infected file and then subtracts the same number of bytes
(9216) from the size given in the directory.
Trojans:- A Trojan horse is not a virus. It is a destructive program that looks like a genuine application. Unlike viruses,
Trojan horses do not replicate themselves, but they can be just as destructive. Trojans also open a backdoor entry to your
computer, which gives malicious users/programs access to your system, allowing confidential and personal information to
be stolen.
Example: - JS.Debeski.Trojan
Trojan horses are classified based on how they infect systems and the damage they cause. The seven main types of Trojan
horses are:
• Remote Access Trojans
• Data Sending Trojans
• Destructive Trojans
• Proxy Trojans
• FTP Trojans
• Security software disabler Trojans
• Denial-of-service (DoS) attack Trojans
Adware:- Generically, adware is a software application that displays advertising banners while a program is running.
Adware can be downloaded to your system automatically while browsing any website and can appear through pop-up windows or
through a bar on the computer screen. Adware is used by companies for marketing purposes.
Spyware:- Spyware is a type of program that is installed, with or without your permission, on your personal computer to
collect information about users, their computers, or their browsing habits. It tracks everything that you do without your
knowledge and sends it to a remote user. It can also download other malicious programs from the Internet and install them
on the computer. Spyware works like adware but is usually a separate program that is installed unknowingly when you install
another freeware-type program or application.
Spam:- Spamming is the practice of flooding the Internet with copies of the same message. Most spam messages are
commercial advertisements sent as unwanted email to users. Spam is also known as electronic junk mail or junk newsgroup
postings. These spam mails are very annoying, as they keep coming every day and keep your mailbox full.
Tracking cookies:- A cookie is a plain text file that is stored on your computer in a cookies folder, and it stores data
about your browsing session. Cookies are used by many websites to track visitor information. A tracking cookie is a cookie
that keeps track of all your browsing information; this can be used by attackers and companies to learn personal details
like your bank account details, your credit card information, etc., which is dangerous.
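To see that a cookie really is just plain text, the sketch below uses Python's standard http.cookies module to build one and print the header a website would send; the names "session_id" and its value are invented for illustration:

```python
# A minimal sketch showing that a cookie is plain-text key=value data.
# The cookie name and value here are made up for illustration.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"        # value stored on the visitor's machine
cookie["session_id"]["path"] = "/"     # attribute controlling where it is sent

# The header a website sends to set this cookie is readable plain text:
header = cookie.output()
print(header)   # Set-Cookie: session_id=abc123; Path=/
```

Nothing in the file is executable; a tracking cookie is dangerous only because of what the values record about your browsing, not because of any code it contains.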
Misleading applications:- Misleading applications mislead you about the security status of your computer, claiming that it
is infected by some malware and that you must download a tool to remove the threat. When you download the tool, it reports
some threats on your computer, and to remove them you have to buy the product, for which it asks for personal information
like credit card details, which is dangerous.
1. Repeater – A repeater operates at the physical layer. Its job is to regenerate the signal over the same network before
the signal becomes too weak or corrupted, so as to extend the length over which the signal can be transmitted on the same
network. An important point to note about repeaters is that they do not amplify the signal. When the signal becomes weak,
they copy it bit by bit and regenerate it at the original strength. It is a 2-port device.
2. Hub – A hub is basically a multiport repeater. A hub connects multiple wires coming from different branches, for
example the connector in a star topology which connects different stations. Hubs cannot filter data, so data packets are
sent to all connected devices. In other words, the collision domain of all hosts connected through a hub remains one.
Also, hubs do not have the intelligence to find the best path for data packets, which leads to inefficiencies and wastage.
3. Bridge – A bridge operates at the data link layer. A bridge is a repeater with the added functionality of filtering
content by reading the MAC addresses of the source and destination. It is also used for interconnecting two LANs working
on the same protocol. It has a single input and a single output port, making it a 2-port device.
4. Switch – A switch is a multiport bridge with a buffer and a design that can boost its efficiency (a large number of
ports implies less traffic) and performance. A switch is a data link layer device. A switch can perform error checking
before forwarding data, which makes it very efficient: it does not forward frames that have errors, and it forwards good
frames selectively to the correct port only. In other words, a switch divides the collision domain of hosts, but the
broadcast domain remains the same.
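The error checking described above relies on a checksum appended to each frame: the sender computes it, and the receiver recomputes it and drops the frame on mismatch. A minimal sketch of the idea, using Python's zlib.crc32 (a CRC-32, the same family of checksum Ethernet uses as its frame check sequence); the frame contents are invented for illustration:

```python
# Sketch of frame error checking: sender appends a checksum, receiver
# recomputes it and discards the frame on mismatch.
import zlib

frame = b"destination|source|payload"   # made-up frame contents
fcs = zlib.crc32(frame)                 # checksum computed by the sender

# Receiver side: recompute and compare before forwarding.
def frame_ok(data, checksum):
    return zlib.crc32(data) == checksum

print(frame_ok(frame, fcs))                           # True: forward the frame
print(frame_ok(b"destination|source|payl0ad", fcs))   # False: drop the frame
```

A single flipped bit changes the checksum, which is why the switch can refuse to forward corrupted frames.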
5. Routers – A router is a device, like a switch, that routes data packets based on their IP addresses. A router is mainly
a network layer device. Routers normally connect LANs and WANs together and have a dynamically updated routing table based
on which they make decisions on routing the data packets. Routers divide the broadcast domains of the hosts connected
through them.
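A routing table can be sketched as a mapping from destination networks to outgoing interfaces, with the router choosing the most specific (longest-prefix) match. The networks and interface names below are invented for illustration:

```python
# Longest-prefix-match lookup: the core of how a router picks a route.
import ipaddress

routing_table = {
    ipaddress.ip_network("10.0.0.0/8"):  "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"):   "eth0",  # default route
}

def route(destination):
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routing_table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # most specific wins
    return routing_table[best]

print(route("10.1.2.3"))   # eth2 (the /16 is more specific than the /8)
print(route("8.8.8.8"))    # eth0 (only the default route matches)
```

"Dynamically updating" means routing protocols add and remove entries from this table as links come and go; the lookup itself stays the same.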
6. Gateway – A gateway, as the name suggests, is a passage that connects two networks that may use different networking
models. Gateways basically work as messenger agents that take data from one system, interpret it, and transfer it to
another system. Gateways are also called protocol converters and can operate at any network layer. Gateways are generally
more complex than switches or routers.
COMPUTER NETWORKS
http://www.studytonight.com/computer-networks/osi-model-network-layer
Introduction To Computer Networks
Today the world scenario is changing. Data communication and networks have changed the way business and other daily
affairs work; they now rely on computer networks and internetworks. A set of devices, often referred to as nodes, connected
by media links is called a network. A node can be any device capable of sending or receiving data generated by other nodes
on the network, such as a computer or printer. The links connecting the devices are called communication channels.
A computer network is a telecommunication channel through which we can share data. It is also called a data network. The
best example of a computer network is the Internet. A computer network does not mean a system with one control unit and
other systems as its slaves; it is a distributed system.
A network must be able to meet certain criteria; these are mentioned below:
1. Performance
2. Reliability
3. Security
Performance
It can be measured in the following ways :
Transit time : It is the time taken to travel a message from one device to another.
Response time : It is defined as the time elapsed between enquiry and response.
Other ways to measure performance are :
1. Efficiency of software
2. Number of users
3. Capability of connected hardware
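Response time as defined above can be measured directly: record the clock at the enquiry, again at the response, and take the difference. A small sketch; the request function here is a hypothetical stand-in for any real network call:

```python
# Measuring response time: the elapsed time between enquiry and response.
import time

def measure_response_time(request):
    """Return the time between issuing a request and getting its response, in seconds."""
    start = time.perf_counter()
    request()                      # stand-in for a real network request
    return time.perf_counter() - start

# Example with a dummy workload in place of a network call:
elapsed = measure_response_time(lambda: sum(range(100_000)))
print(f"response time: {elapsed:.6f} s")
```

Transit time would be measured the same way, but between the moment a message leaves one device and the moment it arrives at another, which requires clocks at both ends.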
Reliability
It measures the frequency at which network failures take place: the more failures there are, the less reliable the network.
Security
It refers to the protection of data from unauthorised users or access. While travelling through the network, data passes
through many layers, and it can be intercepted if attempted. Hence security is also a very important characteristic of
networks.
Properties of Good Network
1. Interpersonal Communication : We can communicate with each other efficiently and easily, for example via emails, chat
rooms, video conferencing etc.
2. Resources can be shared : We can use the resources provided by network such as printers etc.
3. Sharing files, data : Authorised users are allowed to share the files on the network.
Basic Communication Model
A communication model is used to exchange data between two parties, for example communication between a computer, a server,
and a telephone (through a modem).
Source
Data to be transmitted is generated by this device, example: telephones, personal computers etc.
Transmitter
The data generated by the source system is not transmitted directly in the form in which it is generated. The transmitter
transforms and encodes the information into electromagnetic waves or signals.
Transmission System
A transmission system can be a single transmission line or a complex network connecting source and destination.
Receiver
Receiver accepts the signal from the transmission system and converts it to a form which is easily managed by the
destination device.
Destination
Destination receives the incoming data from the receiver.
Data Communication
The exchange of data between two devices through a transmission medium is data communication. The data is exchanged in the
form of 0s and 1s. A typical transmission medium is wire cable. For data communication to occur, the communicating devices
must be part of a communication system. Data communication has two types, local and remote, which are discussed below :
Local :
Local communication takes place when the communicating devices are in the same geographical area, such as the same
building, or face-to-face between individuals.
Remote :
Remote communication takes place over a distance, i.e. the devices are farther apart. The effectiveness of data
communication can be measured through the following features :
1. Delivery : Delivery should be done to the correct destination.
2. Timeliness : Delivery should be on time.
3. Accuracy : Data delivered should be accurate.
Components of Data Communication
1. Message : It is the information to be delivered.
2. Sender : Sender is the person who is sending the message.
3. Receiver : The receiver is the person to whom the message is to be delivered.
4. Medium : It is the channel through which the message is sent, for example a cable or telephone line.
5. Protocol : These are some set of rules which govern data communication.
Line Configuration in Computer Networks
A network is a connection made through connection links between two or more devices. Devices can be computers, printers,
or any other devices capable of sending and receiving data. There are two ways to connect the devices :
1. Point-to-Point connection
2. Multipoint connection
Point-To-Point Connection
It provides a dedicated communication link between two devices and is simple to establish. The most common example of a
point-to-point connection is a computer connected by a telephone line. We can connect the two devices by means of a pair
of wires or using a microwave or satellite link.
Example: Point-to-Point connection between remote control and Television for changing the channels.
MultiPoint Connection
It is also called Multidrop configuration. In this connection two or more devices share a single link.
There are two kinds of Multipoint Connections :
If the links are used simultaneously by many devices, it is a spatially shared line configuration.
If users take turns while using the link, it is a time-shared (temporal) line configuration.
Types of Network Topology
Network Topology is the schematic description of a network arrangement, connecting various nodes(sender and receiver)
through lines of connection.
BUS Topology
Bus topology is a network type in which every computer and network device is connected to a single cable. When it has
exactly two endpoints, it is called linear bus topology.
Features of Bus Topology
1. It transmits data only in one direction.
2. Every device is connected to a single cable
Advantages of Bus Topology
1. It is cost effective.
2. The cable required is the least compared to other network topologies.
3. Used in small networks.
4. It is easy to understand.
5. Easy to expand by joining two cables together.
Disadvantages of Bus Topology
1. If the cable fails, the whole network fails.
2. If network traffic is heavy or there are more nodes, the performance of the network decreases.
3. Cable has a limited length.
4. It is slower than the ring topology.
RING Topology
It is called ring topology because it forms a ring: each computer is connected to another computer, with the last one
connected to the first. Each device has exactly two neighbours.
Features of Ring Topology
1. A number of repeaters are used in a ring topology with a large number of nodes, because if someone wants to send
some data to the last node in a ring topology with 100 nodes, the data has to pass through 99 nodes to reach the
100th node. Hence, to prevent data loss, repeaters are used in the network.
2. The transmission is unidirectional, but it can be made bidirectional by having 2 connections between each network
node; this is called Dual Ring Topology.
3. In Dual Ring Topology, two ring networks are formed, and data flow is in opposite direction in them. Also, if one
ring fails, the second ring can act as a backup, to keep the network up.
4. Data is transferred in a sequential manner, that is, bit by bit. Transmitted data has to pass through each node of
the network until it reaches the destination node.
Advantages of Ring Topology
1. The network is not affected by high traffic or by adding more nodes, as only the node holding the token can
transmit data.
2. Cheap to install and expand
Disadvantages of Ring Topology
1. Troubleshooting is difficult in ring topology.
2. Adding or deleting the computers disturbs the network activity.
3. Failure of one computer disturbs the whole network.
STAR Topology
In this type of topology, all the computers are connected to a single hub through a cable. This hub is the central node,
and all other nodes are connected to it.
Features of Star Topology
1. Every node has its own dedicated connection to the hub.
2. Hub acts as a repeater for data flow.
3. Can be used with twisted pair, Optical Fibre or coaxial cable.
Advantages of Star Topology
1. Fast performance with few nodes and low network traffic.
2. Hub can be upgraded easily.
3. Easy to troubleshoot.
4. Easy to setup and modify.
5. Only the node that has failed is affected; the rest of the nodes can work smoothly.
Disadvantages of Star Topology
1. Cost of installation is high.
2. Expensive to use.
3. If the hub fails, the whole network stops, because all the nodes depend on the hub.
4. Performance depends on the hub, that is, on its capacity.
MESH Topology
In mesh topology, every node has a point-to-point connection to every other node. A mesh has n(n-1)/2 physical channels to
link n devices.
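The n(n-1)/2 figure follows from each of the n nodes linking to the other n-1 nodes, with every link otherwise counted twice. A quick check in code:

```python
def mesh_links(n):
    # Each of the n nodes links to n-1 others; dividing by 2 avoids
    # counting each link twice (A-B and B-A are the same channel).
    return n * (n - 1) // 2

print(mesh_links(2))   # 1
print(mesh_links(5))   # 10
print(mesh_links(10))  # 45
```

The quadratic growth in link count is exactly why full mesh becomes expensive to cable as n grows.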
There are two techniques to transmit data over the Mesh topology, they are :
1. Routing
2. Flooding
Routing
In routing, the nodes have routing logic, as per the network requirements, such as logic to direct the data to its
destination using the shortest distance, or logic that has information about broken links and avoids those nodes. We can
even have routing logic to reconfigure around failed nodes.
Flooding
In flooding, the same data is transmitted to all the network nodes, hence no routing logic is required. The network is
robust, and it is very unlikely to lose data, but flooding leads to unwanted load on the network.
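Flooding can be sketched as each node forwarding incoming data to all its neighbours, remembering what it has already seen so the process terminates. The four-node graph below is invented for illustration:

```python
# Sketch of flooding: forward to every neighbour, skip what was already seen.
from collections import deque

def flood(graph, source):
    """Return the set of nodes that receive the data when flooding from source."""
    received = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in received:   # only forward data not seen before
                received.add(neighbour)
                queue.append(neighbour)
    return received

# A small mesh: every node is reached without any routing logic.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(sorted(flood(graph, "A")))   # ['A', 'B', 'C', 'D']
```

Every node receives the data as long as the network is connected, which is the robustness the text describes; the cost is that every link carries a copy.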
Types of Mesh Topology
1. Partial Mesh Topology : In this topology some of the systems are connected in the same fashion as in a full mesh,
but some devices are connected to only two or three other devices.
2. Full Mesh Topology : Each and every node or device is connected to every other.
Features of Mesh Topology
1. Fully connected.
2. Robust.
3. Not flexible.
Advantages of Mesh Topology
1. Each connection can carry its own data load.
2. It is robust.
3. Fault is diagnosed easily.
4. Provides security and privacy.
Disadvantages of Mesh Topology
1. Installation and configuration is difficult.
2. Cabling cost is more.
3. Bulk wiring is required.
TREE Topology
It has a root node, and all other nodes are connected to it, forming a hierarchy. It is also called hierarchical topology.
It should have at least three levels in the hierarchy.
Features of Tree Topology
1. Ideal if workstations are located in groups.
2. Used in Wide Area Network.
Advantages of Tree Topology
1. Extension of bus and star topologies.
2. Expansion of nodes is possible and easy.
3. Easily managed and maintained.
4. Error detection is easily done.
Disadvantages of Tree Topology
1. Heavily cabled.
2. Costly.
3. If more nodes are added maintenance is difficult.
4. If the central hub fails, the network fails.
HYBRID Topology
It is a mixture of two or more different topologies. For example, if one department of an office uses a ring topology and
another uses a star topology, connecting these topologies results in a hybrid topology (ring topology and star topology).
Features of Hybrid Topology
1. It is a combination of two or more topologies
2. Inherits the advantages and disadvantages of the topologies included
Advantages of Hybrid Topology
1. Reliable, as error detection and troubleshooting are easy.
2. Effective.
3. Scalable as size can be increased easily.
4. Flexible.
Disadvantages of Hybrid Topology
1. Complex in design.
2. Costly.
Transmission Modes in Computer Networks
Transmission mode refers to the direction in which data is transferred between two devices. It is also called
communication mode. There are three types of transmission mode. They are :
Simplex Mode
Half duplex Mode
Full duplex Mode
SIMPLEX Mode
In this transmission mode, data can be sent in only one direction, i.e. communication is unidirectional. We cannot send a
message back to the sender. Unidirectional communication is done in simplex systems.
Examples of simplex mode are a loudspeaker, television broadcasting, television and remote, and keyboard and monitor.
HALF DUPLEX Mode
In a half duplex system we can send data in both directions, but only one direction at a time: while the sender is sending
data, we cannot send our own message back. At any moment, data flows in one direction only.
An example of half duplex is a walkie-talkie, in which messages are sent one at a time but in both directions.
FULL DUPLEX Mode
In a full duplex system we can send data in both directions simultaneously, as it is bidirectional. We can send and
receive data at the same time.
An example of full duplex is a telephone network, in which two persons communicate over a telephone line and both can talk
and listen at the same time.
In a full duplex system there can be two lines, one for sending data and the other for receiving data.
Transmission Mediums in Computer Networks
Data is represented by computers and other telecommunication devices using signals. Signals are transmitted in the form of
electromagnetic energy from one device to another. Electromagnetic signals travel through vacuum, air, or other
transmission media from one point to another (from source to receiver).
Electromagnetic energy (includes electrical and magnetic fields) includes power, voice, visible light, radio waves,
ultraviolet light, gamma rays etc.
The transmission medium is the means through which we send data from one place to another. The first layer (physical
layer) of the OSI seven-layer model of communication networks is dedicated to transmission media; we will study the OSI
model later.
Factors to be considered while choosing Transmission Medium
1. Transmission Rate
2. Cost and Ease of Installation
3. Resistance to Environmental Conditions
4. Distances
Bounded/Guided Transmission Media
It is transmission media in which signals are confined to a specific path using wire or cable. The types of
bounded/guided media are discussed below.
Twisted Pair Cable
This cable is the most commonly used and is cheaper than the others. It is lightweight, cheap, easily installed, and
supports many different types of networks. Some important points :
Its frequency range is 0 to 3.5 kHz.
Typical attenuation is 0.2 dB/km at 1 kHz.
Typical delay is 50 µs/km.
Repeater spacing is 2 km.
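The attenuation figure can be turned into an end-to-end loss: decibels add linearly with distance, and every 10 dB of loss means a tenfold drop in power. A quick sketch using the 0.2 dB/km figure above (the 10 km distance is an arbitrary example):

```python
# Signal loss over a twisted-pair run, using the 0.2 dB/km figure at 1 kHz.
attenuation_db_per_km = 0.2
distance_km = 10

total_loss_db = attenuation_db_per_km * distance_km   # 2.0 dB over 10 km
power_remaining = 10 ** (-total_loss_db / 10)         # fraction of power left

print(f"loss over {distance_km} km: {total_loss_db} dB")
print(f"power remaining: {power_remaining:.2%}")      # about 63%
```

This accumulating loss is why the table quotes a repeater spacing: past a certain distance, the signal must be regenerated before it becomes too weak to decode.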
Twisted Pair is of two types :
Unshielded Twisted Pair (UTP)
Shielded Twisted Pair (STP)
Unshielded Twisted Pair Cable
It is the most common type of telecommunication cable when compared with shielded twisted pair. It consists of two
conductors, usually copper, each with its own coloured plastic insulator; identification is the reason behind the coloured
plastic insulation.
UTP cables consist of 2 or 4 pairs of twisted cable. Cable with 2 pairs uses an RJ-11 connector, and 4-pair cable uses an
RJ-45 connector.
Advantages :
Installation is easy
Flexible
Cheap
It has high-speed capacity
100-metre segment limit
Higher grades of UTP are used in LAN technologies like Ethernet.
It consists of two insulated copper wires (1 mm thick). The wires are twisted together in helical form to reduce
electrical interference from similar pairs.
Disadvantages :
Bandwidth is low when compared with Coaxial Cable
Provides less protection from interference.
Shielded Twisted Pair Cable
This cable has a metal foil or braided-mesh covering that encases each pair of insulated conductors. The metal casing
prevents the penetration of electromagnetic noise. Shielding also reduces crosstalk (explained in the KEY TERMS chapter).
It has the same attenuation as unshielded twisted pair. It is faster than unshielded twisted pair and coaxial cable. It is
more expensive than coaxial and unshielded twisted pair.
Advantages :
Easy to install
Performance is adequate
Can be used for Analog or Digital transmission
Increases the signalling rate
Higher capacity than unshielded twisted pair
Reduces crosstalk
Disadvantages :
Difficult to manufacture
Heavy
Coaxial Cable
Coaxial cable gets its name because it contains two conductors that share a common axis. Copper is used as the centre
conductor, which can be a solid wire or a stranded one. It is surrounded by PVC insulation and a sheath, which is encased
in an outer conductor of metal foil, braid, or both.
Outer metallic wrapping is used as a shield against noise and as the second conductor which completes the circuit. The
outer conductor is also encased in an insulating sheath. The outermost part is the plastic cover which protects the whole
cable.
Here are the most common coaxial standards:
50-Ohm RG-7 or RG-11 : used with thick Ethernet.
50-Ohm RG-58 : used with thin Ethernet
75-Ohm RG-59 : used with cable television
93-Ohm RG-62 : used with ARCNET.
There are two types of Coaxial cables :
BaseBand
This is a 50-ohm (Ω) coaxial cable used for digital transmission. It is mostly used for LANs. Baseband transmits a single
signal at a time at very high speed. The major drawback is that it needs amplification after every 1000 feet.
BroadBand
This uses analog transmission over standard cable television cabling. It transmits several simultaneous signals using
different frequencies. It covers a large area when compared with baseband coaxial cable.
Advantages :
Bandwidth is high
Used in long distance telephone lines.
Transmits digital signals at a very high rate of 10 Mbps.
Much higher noise immunity
Data transmission without distortion.
They can span longer distances at higher speeds, as they have better shielding when compared to twisted pair cable
Disadvantages :
Single cable failure can fail the entire network.
Difficult to install and expensive when compared with twisted pair.
If the shield is imperfect, it can lead to ground loops.
Fiber Optic Cable
These are similar in construction to coaxial cable, but they use light signals instead of electric signals to transmit
data. At the centre is the glass core through which light propagates.
In multimode fibres the core is 50 microns, and in single mode fibres the thickness is 8 to 10 microns.
The core in fiber optic cable is surrounded by glass cladding with lower index of refraction as compared to core to keep all
the light in core. This is covered with a thin plastic jacket to protect the cladding. The fibers are grouped together in
bundles protected by an outer shield.
Fiber optic cable has bandwidth of more than 2 Gbps (gigabits per second)
Advantages :
Provides high quality transmission of signals at very high speed.
These are not affected by electromagnetic interference, so noise and distortion are minimal.
Used for both analog and digital signals.
Disadvantages :
It is expensive
Difficult to install.
Maintenance is expensive and difficult.
Do not allow complete routing of light signals.
UnBounded/UnGuided Transmission Media
Unguided or wireless media send data through the air (or water), where it is available to anyone who has a device capable
of receiving it. Types of unguided/unbounded media are discussed below :
Radio Transmission
MicroWave Transmission
Radio Transmission
Its frequency range is 10 kHz to 1 GHz. It is simple to install and has high attenuation. These waves are used for
multicast communications.
Types of Propagation
Radio transmission utilizes different types of propagation :
Troposphere : The lowest portion of the earth's atmosphere, extending outward approximately 30 miles from the
earth's surface. Clouds, jet planes, and wind are found here.
Ionosphere : The layer of the atmosphere above troposphere, but below space. Contains electrically charged
particles.
Microwave Transmission
It travels at higher frequencies than radio waves. It requires the sender and receiver to be in line of sight. It operates
in a system with a low gigahertz range. It is mostly used for unicast communication.
There are 2 types of Microwave Transmission :
1. Terrestrial Microwave
2. Satellite Microwave
Advantages of Microwave Transmission
Used for long distance telephone communication
Carries 1000s of voice channels at the same time
Disadvantages of Microwave Transmission
It is very costly
Terrestrial Microwave
To increase the distance served by terrestrial microwave, repeaters can be installed with each antenna. The signal
received by an antenna is converted into transmittable form and relayed to the next antenna, as shown in the figure below.
Telephone systems all over the world use this technique.
There are two types of antennas used for terrestrial microwave communication :
1. Parabolic Dish Antenna
In this antenna, every line parallel to the line of symmetry reflects off the curve at an angle such that all lines
intersect at a common point called the focus. This antenna is based on the geometry of a parabola.
2. Horn Antenna
It is like a gigantic scoop. The outgoing transmissions are broadcast up a stem and deflected outward in a series of
narrow parallel beams by the curved head.
Satellite Microwave
This is a microwave relay station placed in outer space. The satellites are launched either by rockets or carried up by
space shuttles.
These are positioned about 36,000 km above the equator with an orbital speed that exactly matches the rotation speed of
the earth. As the satellite is positioned in a geosynchronous orbit, it is stationary relative to the earth and always
stays over the same point on the ground. This allows ground stations to aim their antennas at a fixed point in the sky.
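The altitude of a geostationary orbit follows from requiring the orbital period to equal one sidereal day: Kepler's third law gives the orbital radius, and subtracting the earth's equatorial radius leaves roughly 36,000 km. A sketch of the arithmetic, using standard textbook constants:

```python
# Geostationary altitude from Kepler's third law.
import math

MU = 3.986004418e14     # Earth's gravitational parameter GM, in m^3/s^2
T = 86164.1             # one sidereal day, in seconds
EARTH_RADIUS = 6.378e6  # equatorial radius, in metres

# Kepler's third law: T^2 = 4*pi^2 * r^3 / GM  =>  r = (GM*T^2 / 4pi^2)^(1/3)
r = (MU * T**2 / (4 * math.pi**2)) ** (1 / 3)
altitude_km = (r - EARTH_RADIUS) / 1000

print(f"geostationary altitude: {altitude_km:,.0f} km")   # about 35,786 km
```

Only at this one altitude does the orbital period match the earth's rotation, which is why all geostationary satellites share the same ring above the equator.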
Features of Satellite Microwave :
Bandwidth capacity depends on the frequency used.
Deploying a satellite into orbit is difficult.
Advantages of Satellite Microwave :
The transmitting station can receive back its own transmission and check whether the satellite has transmitted the
information correctly.
A single microwave relay station is visible from any point in its coverage area.
Disadvantages of Satellite Microwave :
Satellite manufacturing cost is very high
The cost of launching a satellite is very high
Transmission depends highly on weather conditions; it can go down in bad weather
Types of Communication Networks
Local Area Network (LAN)
It is also called a LAN and is designed for small physical areas such as an office, a group of buildings or a factory.
LANs are widely used because they are easy to design and troubleshoot. Personal computers and workstations are connected
to each other through LANs. A LAN can use different types of topologies, such as Star, Ring, Bus and Tree.
A LAN can be as simple as two computers connected to share files with each other, or as complex as a network
interconnecting an entire building.
LANs are also widely used to share resources like printers, shared hard drives etc.
Applications of LAN
One of the computers in a network can become a server, serving all the remaining computers, called clients. Software
can be stored on the server and used by the remaining clients.
Connecting all the workstations in a building locally so that they can communicate with each other without any
internet access.
Sharing common resources like printers is another common application of a LAN.
Metropolitan Area Network (MAN)
It is basically a bigger version of a LAN. It is also called a MAN and uses similar technology to a LAN. It is designed to
extend over an entire city. It may connect a number of LANs into a larger network, or it may be a single cable. It is
mainly owned and operated by a single private or public organization.
Wide Area Network (WAN)
It is also called a WAN. A WAN can be a private or a public leased network. It is used for networks that cover large
distances, such as the states of a country. It is not easy to design and maintain. Communication media used by WANs are
PSTN or satellite links. WANs operate at low data rates.
Wireless Network
It is the fastest growing segment of computer networking. Wireless networks are becoming very important in our daily life
because wired connections are not possible in cars or aeroplanes. We can access the Internet at any place, avoiding
wire-related troubles. Wireless networks can also be used when the telephone system gets destroyed due to some
calamity/disaster. They are really important nowadays.
Inter Network
When we connect two or more networks, the result is called an internetwork, or internet. We can join two or more
individual networks to form an internetwork through devices like routers, gateways or bridges.
Connection Oriented and Connectionless Services
These are the two kinds of services that a layer can give to the layer above it:
1. Connection Oriented Service
2. Connectionless Service
Connection Oriented Services
There is a sequence of operations to be followed by the users of a connection oriented service:
1. Connection is established
2. Information is sent
3. Connection is released
In connection oriented service we have to establish a connection before starting the communication. When connection is
established we send the message or the information and then we release the connection.
Connection oriented service is more reliable than connectionless service. In a connection oriented service the message
can be resent if there is an error at the receiver's end. An example of a connection oriented protocol is TCP
(Transmission Control Protocol).
Connection Less Services
It is similar to the postal services, as it carries the full address where the message (letter) is to be carried. Each message is
routed independently from source to destination. The order of message sent can be different from the order received.
In a connectionless service the data is transferred in one direction from source to destination without checking whether
the destination is still there or whether it is prepared to accept the message. Authentication is not needed. An example
of a connectionless protocol is UDP (User Datagram Protocol).
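A minimal sketch of this connectionless pattern, using Python's standard socket module over the loopback interface: no connection is established, and each datagram carries the full destination address, like a letter in the post.

```python
import socket

# Receiver: a UDP socket bound to a free loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # OS picks a free port
receiver.settimeout(5)
addr = receiver.getsockname()

# Sender: no connect(), no handshake -- just address the datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, src = receiver.recvfrom(1024)      # UDP itself does not guarantee delivery
print(data)                              # b'hello'
sender.close()
receiver.close()
```

Note that nothing in the exchange confirms the receiver exists; a datagram sent to a dead address is simply lost, which is exactly the behaviour described above.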
Difference between Connection oriented service and Connectionless service
1. In connection oriented service authentication is needed, while connectionless service does not need any authentication.
2. A connection oriented protocol makes a connection, checks whether the message is received, and sends it again if an error occurs; a connectionless protocol does not guarantee delivery.
3. Connection oriented service is more reliable than connectionless service.
4. The connection oriented service interface is stream based; the connectionless interface is message based.
Service Primitives
A service is specified by a set of primitives. A primitive means an operation. A user process accesses the service by
calling these primitives. The primitives are different for connection oriented service and connectionless service. There
are five types of service primitives :
1. LISTEN : When a server is ready to accept an incoming connection it executes the LISTEN primitive. It blocks, waiting for an incoming connection.
2. CONNECT : The client executes CONNECT to establish a connection with the server, then awaits the response.
3. RECEIVE : The server's RECEIVE call then blocks, waiting for the client's request.
4. SEND : The client executes SEND to transmit its request, followed by RECEIVE to get the reply.
5. DISCONNECT : This primitive is used for terminating the connection; after it no further messages can be sent. When the client sends a DISCONNECT packet, the server also sends a DISCONNECT packet to acknowledge the client. When the server's packet is received by the client, the connection is terminated.
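These primitives map naturally onto Berkeley socket calls. A sketch of one request/reply exchange over the loopback interface, with the mapping annotated in comments:

```python
import socket
import threading

def server(listener):
    conn, _ = listener.accept()            # LISTEN: block for a connection
    request = conn.recv(1024)              # RECEIVE: block for the request
    conn.sendall(b"reply:" + request)      # SEND: transmit the reply
    conn.close()                           # DISCONNECT: release the connection

listener = socket.socket()                 # TCP socket (connection oriented)
listener.bind(("127.0.0.1", 0))            # OS picks a free port
listener.listen(1)
t = threading.Thread(target=server, args=(listener,))
t.start()

client = socket.socket()
client.connect(listener.getsockname())     # CONNECT: establish the connection
client.sendall(b"ping")                    # SEND: transmit the request
reply = client.recv(1024)                  # RECEIVE: block for the reply
client.close()                             # DISCONNECT
t.join()
listener.close()
print(reply)   # b'reply:ping'
```

The three-step sequence from the previous section (establish, exchange, release) is visible in the client's connect / sendall+recv / close calls.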
Connection Oriented Service Primitives
There are 4 types of primitives for Connection Oriented Service :
CONNECT This primitive makes a connection
DATA, DATA-ACKNOWLEDGE, EXPEDITED-DATA Data and information is sent using these primitives
DISCONNECT Primitive for closing the connection
RESET Primitive for resetting the connection
Connectionless Service Primitives
The primitives for Connectionless Service are :
UNIDATA This primitive sends a packet of data
FACILITY, REPORT Primitive for enquiring about the performance of the network, like delivery statistics.
Relationship of Services to Protocol
Services
These are the operations that a layer can provide to the layer above it. A service defines the operations the layer is
ready to perform, but it does not specify anything about the implementation of these operations.
Protocols
These are sets of rules that govern the format and meaning of the frames, messages or packets that are exchanged between
the communicating peers, such as a server and a client.
Reference Models in Communication Networks
The most important reference models are :
1. OSI reference model
2. TCP/IP reference model
Introduction to ISO-OSI Model:
There are many users of computer networks, located all over the world. To ensure national and worldwide data
communication, ISO (the International Organization for Standardization) developed this model. It is called the model for
Open System Interconnection (OSI) and is normally known as the OSI model. The OSI model architecture consists of seven
layers. It defines seven layers or levels in a complete communication system. The OSI reference model is explained in
another chapter.
Introduction to TCP/IP REFERENCE Model
TCP/IP stands for Transmission Control Protocol and Internet Protocol. Protocols are sets of rules which govern every
possible communication over the internet. These protocols describe the movement of data between the host computers and
the internet, and they offer simple naming and addressing schemes.
The TCP/IP reference model is explained in detail in another chapter.
ISO/OSI Model in Communication Networks
There are a number of computer network users located all over the world. To ensure national and worldwide data
communication, systems must be developed which are compatible enough to communicate with each other. ISO, the
International Organization for Standardization, developed such a model. It is called the model for Open
System Interconnection (OSI) and is commonly known as the OSI model.
The ISO-OSI model is a seven layer architecture. It defines seven layers or levels in a complete communication system.
Feature of OSI Model :
1. The big picture of communication over a network is understandable through this OSI model.
2. It shows how hardware and software work together.
3. It helps us understand new technologies as they are developed.
4. Troubleshooting is easier because the network is separated into layers.
5. It can be used to compare basic functional relationships on different networks.
Functions of Different Layers :
Layer 1: The Physical Layer :
1. It is the lowest layer of the OSI Model.
2. It activates, maintains and deactivates the physical connection.
3. It is responsible for transmission and reception of the unstructured raw data over the network.
4. The voltages and data rates needed for transmission are defined in the physical layer.
5. It converts the digital/analog bits into electrical signal or optical signals.
6. Data encoding is also done in this layer.
Layer 2: Data Link Layer :
1. Data link layer synchronizes the information which is to be transmitted over the physical layer.
2. The main function of this layer is to make sure data transfer is error free from one node to another, over the
physical layer.
3. Transmitting and receiving data frames sequentially is managed by this layer.
4. This layer sends acknowledgements for frames received and expects acknowledgements for frames sent. Resending of
frames for which no acknowledgement is received is also handled by this layer.
5. This layer establishes a logical link between two nodes and also manages frame traffic control over the
network. It signals the transmitting node to stop when the frame buffers are full.
Layer 3: The Network Layer :
1. It routes the signal through different channels from one node to other.
2. It acts as a network controller. It manages the Subnet traffic.
3. It decides which route the data should take.
4. It divides the outgoing messages into packets and assembles the incoming packets into messages for higher levels.
Layer 4: Transport Layer :
1. It decides if data transmission should be on parallel path or single path.
2. Functions such as Multiplexing, Segmenting or Splitting on the data are done by this layer
3. It receives messages from the Session layer above it, converts the messages into smaller units and passes them on to
the Network layer.
4. Transport layer can be very complex, depending upon the network requirements.
Transport layer breaks the message (data) into small units so that they are handled more efficiently by the network layer.
Layer 5: The Session Layer :
1. The session layer manages and synchronizes the conversation between two different applications.
2. During transfer of data from source to destination, the session layer marks and resynchronizes the streams of data
properly, so that the ends of the messages are not cut prematurely and data loss is avoided.
Layer 6: The Presentation Layer :
1. Presentation layer takes care that the data is sent in such a way that the receiver will understand the information
(data) and will be able to use the data.
2. While receiving the data, presentation layer transforms the data to be ready for the application layer.
3. The languages (syntax) of the two communicating systems can be different. In this case the presentation layer
plays the role of a translator.
4. It performs data compression, data encryption, data conversion etc.
Layer 7: Application Layer :
1. It is the topmost layer.
2. Transferring files and distributing the results to the user are also done in this layer. Mail services, directory
services, network resources etc. are services provided by the application layer.
3. This layer mainly holds the application programs that act upon the received data and the data to be sent.
Merits of OSI reference model:
1. OSI model distinguishes well between the services, interfaces and protocols.
2. Protocols of OSI model are very well hidden.
3. Protocols can be replaced by new protocols as technology changes.
4. Supports connection oriented services as well as connectionless service.
Demerits of OSI reference model:
1. The model was devised before the protocols were invented.
2. Fitting the protocols into the model is a tedious task.
3. It is just used as a reference model.
PHYSICAL Layer - OSI Model
The physical layer is the lowest layer of all. It is responsible for sending bits from one computer to another. This
layer is not concerned with the meaning of the bits; it deals with the physical connection to the network and with
transmission and reception of signals.
This layer defines electrical and physical details such as how a 0 or a 1 is represented, how many pins a network
connector contains, when data can be transmitted, and how data is synchronized.
FUNCTIONS OF PHYSICAL LAYER:
1. Representation of Bits: Data in this layer consists of a stream of bits. The bits must be encoded into signals for transmission. This layer defines the type of encoding, i.e. how 0's and 1's are changed to signals.
2. Data Rate: This layer defines the rate of transmission, which is the number of bits per second.
3. Synchronization: It deals with the synchronization of the transmitter and receiver. The sender and receiver are synchronized at bit level.
4. Interface: The physical layer defines the transmission interface between devices and the transmission medium.
5. Line Configuration: This layer connects devices with the medium: point-to-point configuration and multipoint configuration.
6. Topologies: Devices can be connected using the following topologies: Mesh, Star, Ring and Bus.
7. Transmission Modes: The physical layer defines the direction of transmission between two devices: Simplex, Half Duplex, Full Duplex.
8. It deals with baseband and broadband transmission.
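To make the bit-encoding function concrete, here is a sketch of one common scheme, Manchester encoding. The polarity used (0 as a high-to-low transition, 1 as low-to-high) follows the IEEE 802.3 convention; other conventions invert it.

```python
# Sketch of Manchester encoding (IEEE 802.3 convention): each bit becomes
# a pair of signal levels with a transition in the middle of the bit
# period -- 0 -> (1, 0), 1 -> (0, 1). The guaranteed mid-bit transition
# lets the receiver recover the sender's clock from the signal itself.
def manchester_encode(bits):
    signal = []
    for b in bits:
        signal += [1, 0] if b == 0 else [0, 1]
    return signal

def manchester_decode(signal):
    # Each level pair (1, 0) decodes to 0 and (0, 1) decodes to 1.
    return [0 if pair == (1, 0) else 1
            for pair in zip(signal[::2], signal[1::2])]

data = [1, 0, 1, 1, 0]
encoded = manchester_encode(data)
assert manchester_decode(encoded) == data
print(encoded)  # [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

The cost of the scheme is visible in the output: two signal elements per bit, i.e. the baud rate is double the bit rate.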
DATA LINK Layer - OSI Model
The data link layer provides reliable node-to-node delivery of data. It forms frames from the packets that are received
from the network layer and gives them to the physical layer. It also synchronizes the information which is to be
transmitted over the physical layer. Error control is done at this layer. The encoded data is then passed to the
physical layer.
Error detection bits are used by the data link layer, and it also corrects errors. Outgoing messages are assembled into
frames. Then the system waits for the acknowledgements to be received after the transmission. This makes message
delivery reliable.
FUNCTIONS OF DATA LINK LAYER:
1. Framing: The stream of bits received from the network layer is divided into manageable data units called frames. This division of the stream of bits is done by the data link layer.
2. Physical Addressing: The data link layer adds a header to the frame in order to define the physical address of the sender or receiver of the frame, if the frames are to be distributed to different systems on the network.
3. Flow Control: A flow control mechanism prevents a fast transmitter from overrunning a slow receiver by buffering the extra bits. This prevents a traffic jam at the receiver side.
4. Error Control: Error control is achieved by adding a trailer at the end of the frame. The data link layer also adds a mechanism to prevent duplication of frames.
5. Access Control: When two or more devices are connected to the same link, protocols of this layer determine which device has control over the link at any given time.
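Framing (point 1) and error control (point 4) can be sketched together: frames are delimited by a FLAG byte, payload bytes that collide with the delimiter are byte-stuffed, and a one-byte checksum serves as the trailer. Real links use CRCs rather than a modular sum; the simple checksum here just keeps the idea visible.

```python
# Sketch of data link framing with byte stuffing and a checksum trailer.
FLAG, ESC = 0x7E, 0x7D

def frame(payload: bytes) -> bytes:
    checksum = sum(payload) % 256          # trailer for error detection
    body = bytearray()
    for byte in payload + bytes([checksum]):
        if byte in (FLAG, ESC):
            body += bytes([ESC, byte ^ 0x20])  # stuff bytes that look special
        else:
            body.append(byte)
    return bytes([FLAG]) + bytes(body) + bytes([FLAG])

def deframe(data: bytes) -> bytes:
    body, i = bytearray(), 1               # skip the opening FLAG
    while data[i] != FLAG:
        if data[i] == ESC:
            i += 1
            body.append(data[i] ^ 0x20)    # undo the stuffing
        else:
            body.append(data[i])
        i += 1
    payload, checksum = bytes(body[:-1]), body[-1]
    assert sum(payload) % 256 == checksum, "frame corrupted"
    return payload

msg = bytes([0x01, 0x7E, 0x42])            # payload containing a FLAG byte
assert deframe(frame(msg)) == msg
```

The stuffing step is what lets the receiver find frame boundaries unambiguously even when the payload happens to contain the delimiter.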
Network Layer - OSI Model
The main aim of this layer is to deliver packets from source to destination across multiple links (networks). If two
computers (system) are connected on the same link then there is no need for a network layer. It routes the signal through
different channels to the other end and acts as a network controller.
It also divides the outgoing messages into packets and assembles the incoming packets into messages for higher levels.
FUNCTIONS OF NETWORK LAYER:
1. It translates logical network address into physical address. Concerned with circuit, message or packet switching.
2. Routers and gateways operate in the network layer. Mechanism is provided by Network Layer for routing the
packets to final destination.
3. Connection services are provided including network layer flow control, network layer error control and packet
sequence control.
4. Breaks larger packets into small packets.
Transport Layer - OSI Model
The main aim of the transport layer is to deliver the entire message from source to destination. The transport layer
ensures that the whole message arrives intact and in order, handling both error control and flow control at the
source-to-destination level. It decides if data transmission should be on a parallel path or a single path.
Transport layer breaks the message (data) into small units so that they are handled more efficiently by the network layer
and ensures that message arrives in order by checking error and flow control.
FUNCTIONS OF TRANSPORT LAYER:
1. Service Point Addressing : The transport layer header includes the service point address, which is the port address. This layer gets the message to the correct process on the computer, unlike the network layer, which gets each packet to the correct computer.
2. Segmentation and Reassembly : A message is divided into segments; each segment contains a sequence number, which enables this layer to reassemble the message. The message is reassembled correctly upon arrival at the destination, and packets lost in transmission are replaced.
3. Connection Control : It includes 2 types :
o Connectionless Transport Layer : Each segment is considered an independent packet and delivered to the transport layer at the destination machine.
o Connection Oriented Transport Layer : Before delivering packets, a connection is made with the transport layer at the destination machine.
4. Flow Control : In this layer, flow control is performed end to end.
5. Error Control : Error control is performed end to end in this layer to ensure that the complete message arrives at the receiving transport layer without any error. Error correction is done through retransmission.
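Segmentation and reassembly (point 2) can be sketched in a few lines: the message is split into numbered segments which may arrive out of order, and the sequence numbers let the receiver rebuild the original message.

```python
import random

# Sketch of transport-layer segmentation and reassembly.
def segment(message: bytes, size: int):
    # Tag each chunk with a sequence number.
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(segments):
    # Sort by sequence number before joining, so arrival order is irrelevant.
    return b"".join(data for _, data in sorted(segments))

msg = b"the quick brown fox jumps over the lazy dog"
segments = segment(msg, 8)
random.shuffle(segments)          # simulate out-of-order arrival
assert reassemble(segments) == msg
```

Detecting a *missing* sequence number is also how the layer knows which lost segment to request again.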
Session Layer - OSI Model
Its main aim is to establish, maintain and synchronize the interaction between communicating systems. The session layer
manages and synchronizes the conversation between two different applications. During transfer of data from source to
destination, the session layer marks and resynchronizes the streams of data properly, so that the ends of the messages
are not cut prematurely and data loss is avoided.
FUNCTIONS OF SESSION LAYER:
1. Dialog Control : This layer allows two systems to start communication with each other in half-duplex or full-duplex mode.
2. Synchronization : This layer allows a process to add checkpoints, which are considered synchronization points, into a stream of data. Example: If a system is sending a file of 800 pages, adding checkpoints after every 50 pages is recommended. This ensures that each 50-page unit is successfully received and acknowledged. This is beneficial at the time of a crash: if a crash happens at page number 110, there is no need to retransmit pages 1 to 100.
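The 800-page example works out as follows; a trivial sketch, but it shows why checkpoints bound the amount of retransmission:

```python
# Worked sketch of the checkpoint example: with a checkpoint acknowledged
# every 50 pages, a crash at page 110 means only pages 101-110 were
# unconfirmed, so transmission resumes after page 100.
def resume_point(crash_page: int, checkpoint_every: int) -> int:
    # Last checkpoint at or before the crash; pages up to it are safe.
    return (crash_page // checkpoint_every) * checkpoint_every

print(resume_point(110, 50))   # 100: retransmit from page 101, not page 1
```

Without checkpoints the worst case is retransmitting the whole file; with them it is at most one checkpoint interval.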
Presentation Layer - OSI Model
The primary goal of this layer is to take care of the syntax and semantics of the information exchanged between two
communicating systems. The presentation layer takes care that the data is sent in such a way that the receiver will
understand the information (data) and will be able to use it. The languages (syntax) of the two communicating
systems can be different. In this case the presentation layer plays the role of a translator.
FUNCTIONS OF PRESENTATION LAYER:
1. Translation : Before being transmitted, information in the form of characters and numbers is changed into bit streams. The presentation layer is responsible for interoperability between encoding methods, as different computers use different encoding methods. It translates data between the format the network requires and the format the computer uses.
2. Encryption : It carries out encryption at the transmitter and decryption at the receiver.
3. Compression : It carries out data compression to reduce the bandwidth of the data to be transmitted. The primary role of data compression is to reduce the number of bits to be transmitted. It is important in transmitting multimedia such as audio, video and text.
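Translation (point 1) and compression (point 3) can be sketched with Python's standard library; encryption would slot in between the two steps on each side.

```python
import zlib

# Sketch of presentation-layer duties on one message: translate text to a
# byte stream (UTF-8), compress it for transmission, then reverse both
# steps on the receiving side.
message = "привет means hello"            # non-ASCII text needing translation

encoded = message.encode("utf-8")          # translation: characters -> bits
compressed = zlib.compress(encoded)        # compression: fewer bits to send

# Receiving side: decompress, then decode back to characters.
received = zlib.decompress(compressed).decode("utf-8")
assert received == message
```

Both endpoints must agree on the encoding and compression scheme, which is exactly the interoperability concern this layer addresses.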
Application Layer - OSI Model
It is the topmost layer of the OSI model. Manipulation of data (information) in various ways is done in this layer, which
enables users or software to get access to the network. Some services provided by this layer include e-mail, transferring
of files, distributing the results to the user, directory services, network resources etc.
FUNCTIONS OF APPLICATION LAYER:
1. Mail Services : This layer provides the basis for E-mail forwarding and storage.
2. Network Virtual Terminal : It allows a user to log on to a remote host. The application creates a software
emulation of a terminal at the remote host. The user's computer talks to the software terminal, which in turn talks to
the host, and vice versa. The remote host believes it is communicating with one of its own terminals and allows the
user to log on.
3. Directory Services : This layer provides access for global information about various services.
4. File Transfer, Access and Management (FTAM) : It is a standard mechanism to access and manage files.
Users can access and manage files on a remote computer. They can also retrieve files from a remote computer.
The TCP/IP Reference Model
TCP/IP stands for Transmission Control Protocol and Internet Protocol. It is the network model used in the current
Internet architecture as well. Protocols are sets of rules which govern every possible communication over a network.
These protocols describe the movement of data between the source and destination over the internet, and they offer
simple naming and addressing schemes.
Overview of TCP/IP reference model
TCP/IP, that is, Transmission Control Protocol and Internet Protocol, was developed by the Department of Defence's
Advanced Research Projects Agency (ARPA, later DARPA) as part of a research project on network interconnection, to
connect remote machines.
The features that stood out during the research, which led to making the TCP/IP reference model were:
Support for a flexible architecture: adding more machines to a network was easy.
The network was robust, and connections remained intact as long as the source and destination machines were functioning.
The overall idea was to allow an application on one computer to talk to (send data packets to) another application
running on a different computer.
Description of different TCP/IP protocols
Layer 1: Host-to-network Layer
1. Lowest layer of all.
2. A protocol is used to connect to the host, so that the packets can be sent over it.
3. Varies from host to host and network to network.
Layer 2: Internet layer
1. The selection of a packet switching network based on a connectionless internetwork layer is called the internet layer.
2. It is the layer which holds the whole architecture together.
3. It helps the packets travel independently to the destination.
4. The order in which packets are received can be different from the order in which they are sent.
5. IP (Internet Protocol) is used in this layer.
Layer 3: Transport Layer
1. It decides if data transmission should be on a parallel path or a single path.
2. Functions such as multiplexing and segmenting or splitting of the data are done by the transport layer.
3. The applications can read and write to the transport layer.
4. The transport layer adds header information to the data.
5. The transport layer breaks the message (data) into small units so that they are handled more efficiently by the network layer.
6. The transport layer also arranges the packets to be sent in sequence.
Layer 4: Application Layer
The TCP/IP specifications described many applications at the top of the protocol stack, among them TELNET, FTP, SMTP and
DNS.
1. TELNET is a two-way communication protocol which allows connecting to a remote machine and running applications on it.
2. FTP (File Transfer Protocol) is a protocol that allows file transfer amongst computer users connected over a network. It is reliable, simple and efficient.
3. SMTP (Simple Mail Transfer Protocol) is a protocol used to transport electronic mail between a source and a destination, directed via a route.
4. DNS (Domain Name System) resolves a hostname (textual address) into an IP address for hosts connected over a network.
Merits of TCP/IP model
1. It operates independently.
2. It is scalable.
3. Client/server architecture.
4. It supports a number of routing protocols.
5. It can be used to establish a connection between two computers.
Demerits of TCP/IP
1. The transport layer does not guarantee delivery of packets.
2. The model cannot be used in any other application.
3. Replacing a protocol is not easy.
4. It has not clearly separated its services, interfaces and protocols.
Comparison of OSI Reference Model and TCP/IP Reference Model
Following are some major differences between OSI Reference Model and TCP/IP Reference Model, with diagrammatic
comparison below.
OSI (Open System Interconnection) vs TCP/IP (Transmission Control Protocol / Internet Protocol)
1. OSI is a generic, protocol-independent standard, acting as a communication gateway between the network and the end user. TCP/IP is based on the standard protocols around which the Internet has developed; it is a communication protocol which allows connection of hosts over a network.
2. In the OSI model the transport layer guarantees the delivery of packets. In the TCP/IP model the transport layer does not guarantee delivery of packets; still, the TCP/IP model is more reliable.
3. OSI follows a vertical approach; TCP/IP follows a horizontal approach.
4. The OSI model has a separate Presentation layer and Session layer; TCP/IP does not have a separate Presentation layer or Session layer.
5. OSI is a reference model around which networks are built and is generally used as a guidance tool; the TCP/IP model is, in a way, an implementation of the OSI model.
6. The network layer of the OSI model provides both connection oriented and connectionless service; the network layer in the TCP/IP model provides only connectionless service.
7. The OSI model has a problem of fitting the protocols into the model; the TCP/IP model does not fit any protocol.
8. Protocols are hidden in the OSI model and are easily replaced as the technology changes; in TCP/IP, replacing a protocol is not easy.
9. The OSI model defines services, interfaces and protocols very clearly and makes a clear distinction between them; it is protocol independent. In TCP/IP, services, interfaces and protocols are not clearly separated; it is also protocol dependent.
10. OSI has 7 layers; TCP/IP has 4 layers.
Diagrammatic Comparison between OSI Reference Model and TCP/IP Reference Model
KEY TERMS in Computer Networks
Following are some important terms, which are frequently used in context of Computer Networks.
Terms Definition
1. ISO The OSI model is a product of the Open Systems Interconnection project at the International
Organization for Standardization. ISO is a voluntary organization.
2. OSI Model Open System Interconnection is a model consisting of seven logical layers.
3. TCP/IP Model Transmission Control Protocol and Internet Protocol Model is based on four layer model which is based
on Protocols.
4. UTP Unshielded Twisted Pair cable is a wired/guided medium which consists of two conductors, usually
copper, each with its own coloured plastic insulator.
5. STP Shielded Twisted Pair cable is a wired/guided medium which has a metal foil or braided-mesh covering that
encases each pair of insulated conductors. Shielding also eliminates crosstalk.
6. PPP Point-to-Point connection is a protocol which is used as a communication link between two devices.
7. LAN Local Area Network is designed for small areas such as an office, group of building or a factory.
8. WAN Wide Area Network is used for networks that cover large distances, such as the states of a country.
9. MAN Metropolitan Area Network uses the similar technology as LAN. It is designed to extend over the entire
city.
10. Crosstalk
Undesired effect of one circuit on another circuit. It can occur when one line picks up some signals
travelling down another line. Example: telephone conversation when one can hear background
conversations. It can be eliminated by shielding each pair of twisted pair cable.
11. PSTN
Public Switched Telephone Network consists of telephone lines, cellular networks, satellites for
communication, fiber optic cables etc. It is the combination of world’s (national, local and regional)
circuit switched telephone network.
12. File Transfer, Access
and Management (FTAM)
Standard mechanism to access files and manages it. Users can access files in a remote computer and
manage it.
13. Analog Transmission The signal is continuously variable in amplitude and frequency. Power requirement is high when
compared with Digital Transmission.
14. Digital Transmission It is a sequence of voltage pulses. It is basically a series of discrete pulses. Security is better than Analog
Transmission.
Cloud Computing
Cloud computing is a general term for the delivery of hosted services over the internet.
Cloud computing enables companies to consume a compute resource, such as a virtual machine (VM), storage or an
application, as a utility -- just like electricity -- rather than having to build and maintain computing infrastructures in house.
Cloud computing boasts several attractive benefits for businesses and end users. Three of the main benefits of cloud
computing are:
Self-service provisioning: End users can spin up compute resources for almost any type of workload on demand. This eliminates the traditional need for IT administrators to provision and manage compute resources.
Elasticity: Companies can scale up as computing needs increase and scale down again as demands decrease. This eliminates the need for massive investments in local infrastructure which may or may not remain active.
Pay per use: Compute resources are measured at a granular level, allowing users to pay only for the resources and workloads they use.
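Pay-per-use billing is easy to see with an invented per-hour rate; the figure below is purely illustrative, not any provider's actual price.

```python
# Hypothetical pay-per-use calculation: granular (per-second) metering
# means a VM used for 90 minutes a day is billed only for that usage.
RATE_PER_HOUR = 0.10                      # assumed price, not a real quote

seconds_used = 90 * 60                    # 90 minutes of compute per day
daily_cost = RATE_PER_HOUR * seconds_used / 3600
print(f"${daily_cost:.2f} per day")       # $0.15 per day
```

Compare this with owning the hardware: the same machine sitting idle for the other 22.5 hours would still cost its full purchase and maintenance price.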
Cloud computing deployment models
Cloud computing services can be private, public or hybrid.
Private cloud services are delivered from a business' data center to internal users. This model offers versatility and
convenience, while preserving the management, control and security common to local data centers. Internal users may or
may not be billed for services through IT chargeback.
In the public cloud model, a third-party provider delivers the cloud service over the internet. Public cloud services are sold
on demand, typically by the minute or hour. Customers only pay for the CPU cycles, storage or bandwidth they consume.
Leading public cloud providers include Amazon Web Services (AWS), Microsoft Azure, IBM SoftLayer and Google
Compute Engine.
Hybrid cloud is a combination of public cloud services and on-premises private cloud -- with orchestration and automation
between the two. Companies can run mission-critical workloads or sensitive applications on the private cloud while using
the public cloud for bursting workloads that must scale on demand. The goal of hybrid cloud is to create a unified,
automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide while still
maintaining control over mission-critical data.
Cloud computing service categories
Although cloud computing has changed over time, it has been divided into three broad service categories: infrastructure as
a service (IaaS), platform as a service (PaaS) and software as a service (SaaS).
IaaS providers, such as AWS, supply a virtual server instance and storage, as well as application program interfaces (APIs)
that let users migrate workloads to a virtual machine. Users have an allocated storage capacity and can start, stop, access
and configure the VM and storage as desired. IaaS providers offer small, medium, large, extra-large and memory- or
compute-optimized instances, in addition to customized instances, for various workload needs.
In the PaaS model, providers host development tools on their infrastructures. Users access these tools over the internet
using APIs, web portals or gateway software. PaaS is used for general software development, and many PaaS providers
will host the software after it's developed. Common PaaS providers include Salesforce.com's Force.com, AWS Elastic
Beanstalk and Google App Engine.
SaaS is a distribution model that delivers software applications over the internet; these applications are often called web
services. Microsoft Office 365 is a SaaS offering for productivity software and email services. Users can access SaaS
applications and services from any location using a computer or mobile device that has internet access.
Cloud computing security
Security remains a primary concern for businesses contemplating cloud adoption -- especially public cloud adoption.
Public cloud providers share their underlying hardware infrastructure between numerous customers, as the public cloud is a multi-tenant environment. This environment demands strict isolation between logical compute resources. At the same time, access to public cloud storage and compute resources is guarded by account logon credentials.
Many organizations bound by complex regulatory obligations and governance standards are still hesitant to place data or
workloads in the public cloud for fear of outages, loss or theft. However, this resistance is fading as logical isolation has
proven reliable and the addition of data encryption and various identity and access management (IAM) tools has improved
security within the public cloud.
This was last updated in October 2016
Digital Signature
The digital equivalent of a handwritten signature or stamped seal, but offering far more inherent security, a digital
signature is intended to solve the problem of tampering and impersonation in digital communications. Digital signatures
can provide added assurance of the origin, identity and status of an electronic document, transaction or
message, as well as acknowledging informed consent by the signer.
In many countries, including the United States, digital signatures have the same legal significance as the more traditional
forms of signed documents. The United States Government Printing Office publishes electronic versions of the budget,
public and private laws, and congressional bills with digital signatures.
How digital signatures work
Digital signatures are based on public key cryptography, also known as asymmetric cryptography. Using a public key
algorithm such as RSA, one can generate two keys that are mathematically linked: one private and one public. To create a
digital signature, signing software (such as an email program) creates a one-way hash of the electronic data to be signed.
The private key is then used to encrypt the hash. The encrypted hash -- along with other information, such as the hashing
algorithm -- is the digital signature. The reason for encrypting the hash instead of the entire message or document is that a
hash function converts an arbitrary input into a fixed-length value, which is usually much shorter. This saves time, since
hashing and then encrypting the short digest is much faster than encrypting the entire document.
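The hash-then-encrypt flow above can be sketched with textbook RSA and deliberately tiny primes. This is illustration only: the helper names are ours, real keys are thousands of bits long, and real signature schemes add padding:

```python
import hashlib

# Toy RSA key pair (tiny primes, for illustration only -- never use in practice)
p, q = 61, 53
n = p * q                 # public modulus, 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+)

def sign(message: bytes) -> int:
    # 1. One-way hash of the data to be signed (reduced mod n for the toy key)
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    # 2. "Encrypt" the digest with the private key -- that is the signature
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    # Recompute the hash and compare it with the signature decrypted
    # using the public key.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"hello")
print(verify(b"hello", sig))    # True
# Any change to the message almost certainly changes the digest,
# so verification fails: verify(b"tampered", sig)
```

Because only the short digest is exponentiated, signing stays fast regardless of document size, which is exactly the point made above.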
Esign http://cca.gov.in/cca/sites/default/files/files/ESIGNFAQFeb26022015.pdf
SMS Gateway
An SMS gateway permits your computer to send and receive Short Message Service (SMS) transmissions to or from a
telecom service provider. Most messages are routed into mobile phone networks. Many SMS gateways also support media
conversion, translating messages from email and other formats.
SMS is one of the most commonly used methods of communication in the modern business world: information is
exchanged without physical contact, in real time, quickly and cheaply. An SMS message travels through a series of
connections, and the network facility that sends and receives these messages is what telecom companies call an SMS
gateway. It supports two-way messaging that can be routed through mobile phones, computers or laptops; a phone can
connect to a PC via cable, infrared or Bluetooth.
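Aggregator-style gateways typically expose an HTTP API on top of their SMSC connection. The sketch below is hedged: the URL, parameter names and API-key scheme are invented for illustration, since every real gateway defines its own interface:

```python
import urllib.parse
import urllib.request

# Hypothetical gateway endpoint -- real providers document their own URLs
# and parameter names.
GATEWAY_URL = "https://sms-gateway.example.com/send"

def send_sms(api_key: str, to: str, text: str) -> bytes:
    """POST one message to the (hypothetical) HTTP SMS gateway endpoint."""
    data = urllib.parse.urlencode(
        {"key": api_key, "to": to, "message": text}).encode()
    with urllib.request.urlopen(GATEWAY_URL, data=data) as resp:
        return resp.read()   # gateway's delivery receipt / message id
```

From the sender's point of view the gateway hides all SMSC details: the application just makes an HTTP call and the gateway handles protocol translation and routing.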
There are several types of SMS gateway, each with its own way of transmitting SMS messages. They are as follows:
1. Direct-to-mobile gateway. This type of SMS gateway has built-in GSM (Global System for Mobile Communication)
connectivity, which permits sending and receiving SMS text messages via email, web pages and other software apps
through the use of a SIM (Subscriber Identity Module) card. A direct-to-mobile gateway differs from an SMS aggregator
in that it is installed in the organization's own network and connects to a local mobile network. It works with a SIM card
acquired from the network provider for the gateway installation.
2. Direct-to-SMSC gateway. This device also allows sending and receiving SMS text messages via email, web pages and
other software apps, translating between protocols to relay the SMS transmission. It connects directly to the operator's
SMSC (short message service center) through a leased line or the internet. The SMSC handles operation by storing
messages and routing and forwarding them to their desired endpoints; the gateway converts each SMS into SMSC format
for device compatibility. SMS aggregators use this form of gateway to deliver SMS services to customers, owing to the
great volume of messaging it can support and its direct connection to the mobile operator.
POP - Post Office Protocol
(1) POP is short for Post Office Protocol, a protocol used to retrieve e-mail from a mail server. Most e-mail applications
(sometimes called e-mail clients) use the POP protocol, although some can use the newer IMAP (Internet Message
Access Protocol).
There are two versions of POP. The first, called POP2, became a standard in the mid-1980s and requires SMTP to send
messages. The newer version, POP3, can be used with or without SMTP.
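As a sketch, POP3 retrieval can be done with Python's standard poplib module. The host and credentials below are placeholders, and the function is only defined, not executed:

```python
import poplib
from email import message_from_bytes

def fetch_latest_subject(host: str, user: str, password: str) -> str:
    """Retrieve the newest message via POP3 over SSL and return its Subject.

    POP3's model is simple: log in, see what is in the mailbox, download
    whole messages. (Compare IMAP, which can search on the server.)
    """
    conn = poplib.POP3_SSL(host)        # POP3-over-SSL usually on port 995
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()      # number of messages, mailbox size
        _resp, lines, _octets = conn.retr(count)   # RETR the last message
        msg = message_from_bytes(b"\r\n".join(lines))
        return msg["Subject"]
    finally:
        conn.quit()

# Usage (placeholder values):
# fetch_latest_subject("pop.example.com", "user@example.com", "secret")
```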
(2) POP is also short for point of presence, an access point to the Internet. ISPs typically have multiple POPs. A point of
presence is a physical location, either part of the facilities of a telecommunications provider that the ISP rents or a separate
location from the telecommunications provider, that houses servers, routers, ATM switches and digital/analog call
aggregators.
IMAP - Internet Message Access Protocol
Short for Internet Message Access Protocol, a protocol for retrieving e-mail messages. The latest version, IMAP4, is similar
to POP3 but supports some additional features. For example, with IMAP4 you can search through your e-mail messages
for keywords while the messages are still on the mail server, and then choose which messages to download to your
machine.
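The server-side search that IMAP4 adds can be sketched with Python's standard imaplib; again the connection details are placeholders and the function is only defined here:

```python
import imaplib

def search_message_ids(host: str, user: str, password: str, keyword: str):
    """Find message ids whose text contains keyword, WITHOUT downloading
    the mailbox -- the SEARCH command runs on the IMAP server itself."""
    conn = imaplib.IMAP4_SSL(host)      # IMAP over SSL, port 993 by default
    try:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)   # messages stay on the server
        status, data = conn.search(None, "TEXT", f'"{keyword}"')
        return data[0].split() if status == "OK" else []
    finally:
        conn.logout()

# Usage (placeholder values):
# search_message_ids("imap.example.com", "user@example.com", "secret", "invoice")
```

The ids returned can then be fetched selectively, which is exactly the "choose which messages to download" behaviour described above.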
IMAP was developed at Stanford University in 1986.
e-mail client
An application that runs on a personal computer or workstation and enables you to send, receive and organize e-mail. It's
called a client because e-mail systems are based on a client-server architecture: mail is sent from many clients to a central
server, which re-routes the mail to its intended destination.
peer-to-peer architecture
Often referred to simply as peer-to-peer, or abbreviated P2P, a type of network in which each workstation has equivalent
capabilities and responsibilities. This differs from client/server architectures, in which some computers are dedicated to
serving the others. Peer-to-peer networks are generally simpler, but they usually do not offer the same performance under
heavy loads.
QR code
QR code (abbreviated from Quick Response Code) is the trademark for a type of matrix barcode (or two-dimensional
barcode) first designed for the automotive industry in Japan. A barcode is a machine-readable optical label that contains
information about the item to which it is attached.
What is a QR Code?
So you may have heard that QR Codes are set to become the 'next big thing', but are thinking to yourself: what is a QR
Code? QR or Quick Response Codes are a type of two-dimensional barcode that can be read using smartphones and
dedicated QR reading devices, and that link directly to text, emails, websites, phone numbers and more! You may even
have got to this site by scanning a QR code!
QR codes are huge in Japan and across the East, and are slowly beginning to become commonplace in the West. Soon
enough you will see QR codes on product packaging, shop displays, printed and billboard advertisements as well as in
emails and on websites. The scope of use for QR codes really is huge, particularly for the marketing and advertising of
products, brands, services and anything else you can think of.
Why should I care about QR Codes?
With as many as half of us now owning smartphones, and that number growing on a daily basis, QR Codes have the
potential to have a major impact upon society and particularly in advertising, marketing and customer service with a
wealth of product information just one scan away.
How is a QR Code different from a normal 1D UPC barcode?
Ordinarily we think of a barcode as a collection of vertical lines; 2D Barcodes or QR Codes are different in that the data is
stored in both directions and can be scanned vertically OR horizontally.
Whilst a standard 1D Barcode (UPC/EAN) stores up to 30 numbers, a QR Barcode can store up to a massive 7,089
characters! It is this massive amount of data that enables links to such things as videos, Facebook or Twitter pages or a
plethora of other website pages.
How do I scan a QR Code?
If you have a smartphone like an iPhone, Android or Blackberry then there are a number of different barcode scanner
applications, such as Red Laser, Barcode Scanner and QR Scanner, that can read and decode data from a QR code. The
majority of these are completely FREE, and all you have to do once you install one is use your phone's camera to scan
the barcode, which will then automatically load the encoded data for you.
What can be encoded into a QR Code?
In its simplest sense a QR Code is an 'image-based hypertext link' that can be used offline – any URL can be encoded into
a QR Code so essentially any webpage can be opened automatically as a result of scanning the barcode. If you want to
encourage someone to like your Facebook page – have your Facebook profile page as the URL. Want your video to go
viral – encode the URL in your QR Code. The options are endless.
In addition to website URLs a QR Code can also contain a phone number – so when it is scanned it prompts the user to call
a particular number. Similarly you can encode an SMS text message, V-card data or just plain alphanumeric text. The
smartphone or 2D barcode reading device will automatically know which application to use to open the content embedded
within the QR Code.
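The way a scanner knows "which application to use" is usually a scheme prefix in the encoded text. A minimal sketch (the prefix table reflects common de facto conventions such as `tel:` and `SMSTO:`, not a formal standard):

```python
# Map a content type to the scheme prefix that tells the scanning app
# how to handle the payload. These are common de facto conventions.
PREFIXES = {"url": "", "phone": "tel:", "sms": "SMSTO:", "email": "mailto:"}

def qr_payload(kind: str, value: str) -> str:
    """Build the text string that would be encoded into the QR Code."""
    return PREFIXES[kind] + value

print(qr_payload("phone", "+911234567890"))        # tel:+911234567890
print(qr_payload("url", "https://example.com"))    # https://example.com
```

Encoding this string into the actual barcode image is then the job of a QR generator such as those listed below.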
Where can QR Codes be placed?
The answer to this is almost anywhere! QR Code printing can be done in newspapers, magazines, brochures, leaflets and
on business cards. Further to this they can be put on product packaging or labels, or on billboards or even walls. You could
even tattoo a QR Code on your body – now that would be an interesting take on giving a girl/guy your number in a bar!
You can use QR Codes on a website but they should not generally be used as a substitute for an old-fashioned hyperlink
because obviously the user is already online and doesn't really want to fiddle around with their phone only to find a
website they could have just clicked through to in half the time.
How can I make a QR Code?
You can make your own QR Codes using designated 2D barcode generators, some of which are listed below; however you
should first consider why it is that you want a QR Code and how you will use it. See the 'QR Codes for Marketing' section
below for more information on this.
QR Code generators that are currently available include:
http://www.qrstuff.com/ http://qrcode.kaywa.com/ http://quikqr.com/
What size does a QR Code have to be?
Generally speaking, the larger the QR Code, the easier it is to scan; however, most QR reading devices are able
to scan images small enough to fit on a business card, for example. This of course assumes that the quality of the image
is good.
QR Code File Formats
You can use the following file formats when creating a QR Code:
HTML Code, PNG File, TIFF File, SVG, EPS
PNG files work particularly well as they can be resized very easily, meaning that you can easily scale the QR Code
depending on where you want to put it.
QR Codes for Marketing
If you want to use QR Codes for business or marketing purposes then you should consider that people have higher
expectations from scanning a QR Code than they do simply clicking a link on a website. You should offer something
special or unique to people that have taken the time and effort to scan the barcode. For ideas of what this could be, or just
for more information about QR Code Marketing have a look at Piranha Internet who have successfully incorporated the
use of QR Codes into several marketing strategies for their clients.
Also remember that many people won't know what a QR Code is or how to use it. Up until their use is more widespread
you will need to provide instructions about what to do with a QR Code.
Who invented the QR Code?
Denso-Wave - a subsidiary of the Toyota Group - is credited with the creation of the QR Code as far back as 1994.
Originally it was designed to track parts in the vehicle manufacturing industry, but its use has since grown
tremendously.
Other 2D Barcode Formats
QR Codes are just one type of 2D Barcode, although they are probably the most popular. Other popular 2D Barcode
formats are:
Microsoft Tag – Microsoft have their very own 2D barcode format known as a High Capacity Colour Barcode, or 'Tag'. The main benefits of this are that you can easily customise your tag – adding colour and making it match your brand. You can also "dynamically change your data source" meaning that you can change the URL that the tag directs to. The main drawback of Microsoft Tag is that they can only be read using Microsoft's own tag reader.
Data Matrix – This is probably the most similar format to the QR Code and is commonly used on small electrical components because it can be read even when only 2-3mm in size.
EZcode – This system is a little different in that the data is not actually stored within the code itself, but on the Scanbuy server. A code index is sent from a mobile device to the server, which queries a database and returns the information. The problem with such a system is that it is wholly reliant upon the Scanbuy servers.
GSM stands for Global System for Mobile Communication, and unless you live in the United States or Russia, this is
probably the technology your phone network uses, given it's the standard system for most of the world. GSM networks use
TDMA, which stands for Time Division Multiple Access. TDMA works by assigning time slots to multiple conversation
streams, alternating them in sequence and switching between each conversation in very short intervals. During these
intervals, phones can transmit their information. In order for the network to know which users are connected to the
network, each phone uses a subscriber identification module card, or SIM card.
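The time-slot rotation TDMA uses can be sketched as follows. This is a simplification: real TDMA frames also carry control channels and guard intervals:

```python
from itertools import cycle

def tdma_schedule(phones, n_slots):
    """Assign n_slots transmission slots to phones in strict rotation.

    Each phone transmits only in its own slots; switching is fast enough
    that every conversation appears continuous.
    """
    rotation = cycle(phones)
    return [(slot, next(rotation)) for slot in range(n_slots)]

print(tdma_schedule(["A", "B", "C"], 7))
# [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'A'), (4, 'B'), (5, 'C'), (6, 'A')]
```

The SIM card's role, as described above, is to identify which subscriber is behind each of those slots.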
SIM cards are one of the key features of GSM networks. They house your service subscription, network identification, and
address book information. The cards are also used to assign time slots to the phone conversation, and moreover, they tell
the network what services you have access to. They store your address book, too, along with relative contact information.
They can even be used to pass information between phones, if a carrier allows it.
Read more: http://www.digitaltrends.com/mobile/cdma-vs-gsm-differences-explained/#ixzz4Y0MhzIqq
When you're looking at buying a new phone, you might find that there are way too many acronyms to choose from,
between CDMA, GSM, LTE, and WiMax, and the list goes on. Instead, it can be easier to focus simply on the differences
in these networks as they apply to you directly. The simplest explanation is that the "G" in 4G stands for generation,
because 4G is the fourth generation of mobile data technology, as defined by the radio sector of the International
Telecommunication Union (ITU-R). LTE stands for "Long Term Evolution" and applies more generally to the idea of
improving wireless broadband speeds to meet increasing demand.
What is 3G?
When 3G networks started rolling out, they replaced the 2G system, a network protocol that only allowed the most basic of
what we would now call smartphone functionality. Most 2G networks handled phone calls, basic text messaging, and small
amounts of data over a protocol called MMS. With the introduction of 3G connectivity, a number of larger data formats
became much more accessible, including standard HTML pages, videos, and music. The speeds were still pretty slow, and
mostly required pages and data specially formatted for these slower wireless connections. By 2G standards, the new
protocol was speedy, but still didn't come anywhere close to replacing a home broadband connection.
What is 4G?
The ITU-R set standards for 4G connectivity in March of 2008, requiring all services described as 4G to adhere to a set of
speed and connection standards. For mobile use, including smartphones and tablets, connection speeds need to have a peak
of at least 100 megabits per second, and for more stationary uses such as mobile hotspots, at least 1 gigabit per second.
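To make those ITU-R figures concrete, here is a quick back-of-the-envelope comparison at the two peak rates (ideal conditions only; real-world throughput is far lower):

```python
# ITU-R 4G peak rates: 100 Mbit/s for mobile use, 1 Gbit/s for stationary use.
MOBILE_PEAK = 100e6        # bits per second
STATIONARY_PEAK = 1e9      # bits per second

def seconds_to_download(size_bytes: float, bits_per_second: float) -> float:
    """Ideal transfer time: bytes -> bits, divided by the link rate."""
    return size_bytes * 8 / bits_per_second

movie = 1.5e9   # a ~1.5 GB video file
print(round(seconds_to_download(movie, MOBILE_PEAK)))      # 120 seconds
print(round(seconds_to_download(movie, STATIONARY_PEAK)))  # 12 seconds
```

Actual speeds depend on signal quality, congestion and protocol overhead, so these are upper bounds, not expectations.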
Aadhaar Seeding
The UIDAI does not collect or store any additional personal information or linking data, such as PAN number, Driver's License
numbers, details of caste, creed, religion, income level or health status, etc. UIDAI has created a seeding ecosystem, where different
partners can leverage various tools offered by UIDAI to link Aadhaar in their respective service delivery databases.
Aadhaar seeding is a process by which UIDs of residents are accurately included in the service delivery database of service providers
for enabling Aadhaar based authentication during service delivery. The seeding process is accomplished in two steps. In the first step,
Aadhaar details need to be collected from the beneficiary. The service provider or seeding agency is expected to disclose the
purpose of collecting Aadhaar details and to take informed consent from the Aadhaar number holder or beneficiary. The second
step involves verification of the collected Aadhaar details. Once verification succeeds against UIDAI's CIDR database, the
Aadhaar is linked to the beneficiary record in the domain database of the service provider.
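The two-step process above (collect with consent, then verify before linking) can be sketched with mock data. The function names and the mocked CIDR lookup are our illustrative assumptions, not UIDAI APIs:

```python
# Mocked stand-in for the CIDR verification step -- illustration only.
MOCK_CIDR = {"999941057058": {"name": "Shivshankar Choudhury"}}

def verify_with_cidr(aadhaar: str, name: str) -> bool:
    """Step 2: check the collected details against the (mock) CIDR record."""
    record = MOCK_CIDR.get(aadhaar)
    return record is not None and record["name"] == name

def seed(service_db: dict, beneficiary_id: str, aadhaar: str, name: str) -> bool:
    """Link the Aadhaar number into the service-delivery database,
    but only after verification succeeds."""
    if not verify_with_cidr(aadhaar, name):
        return False                      # verification failed; do not link
    service_db[beneficiary_id]["aadhaar"] = aadhaar
    return True

db = {"B-001": {"name": "Shivshankar Choudhury"}}
print(seed(db, "B-001", "999941057058", "Shivshankar Choudhury"))  # True
```

The key property the sketch shows is ordering: the service database is only updated after the details verify, so unverified numbers never get seeded.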
UIDAI has undertaken multiple activities to ensure Aadhaar seeding is facilitated in various scheme databases. The Aadhaar seeding
framework includes:
A Standard Protocol Covering the Approach & Process for Seeding Aadhaar in Service Delivery Databases is available on UIDAI website.
UIDAI has launched various services like DBT Seeding Data Viewer (DSDV), an authentication process to verify seeding, e-Aadhaar download, EID-UID search, demographic authentication, advanced search, etc., through the resident portal and its ecosystem partners to facilitate the seeding process.
UIDAI has conducted multiple Aadhaar seeding workshops at UIDAI HQ for ministries and departments and at UIDAI ROs for local administration in states.
Empanelled 48 seeding agencies to undertake seeding on behalf of central and state departments.
Developed classroom and computer-based training content for various stakeholders.
Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged
in order to accommodate that growth.[1] For example, it can refer to the capability of a system to increase its total output under an increased load when resources (typically hardware) are added. An analogous meaning is implied when the word is used in an economic context, where scalability of a company implies that the underlying business model offers the potential for economic growth within the company.
Scalability, as a property of systems, is generally difficult to define[2] and in any particular case it is necessary to define the specific requirements for scalability on those dimensions that are deemed important. It is a highly significant issue in electronics systems, databases, routers, and networking. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.
An algorithm, design, networking protocol, program, or other system is said to scale if it is suitably efficient and practical when applied to large situations (e.g. a large input data set, a large number of outputs or users, or a large number of participating nodes in the case of a distributed system). If the design or system fails when a quantity increases, it does not scale. In practice,
if there are a large number of things (n) that affect scaling, then resource requirements (for example, algorithmic
time-complexity) must grow less than n² as n increases. An example is a search engine, which must scale not only for the
number of users, but also for the number of objects it indexes. Scalability refers to the ability of a site to increase in size as demand warrants.[3]
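The sub-quadratic growth rule above can be illustrated numerically: an n log n cost stays practical as n grows, while an n² cost does not (the two cost functions are generic illustrations, not any particular system):

```python
import math

def cost_sorting(n: int) -> float:
    """e.g. a comparison sort or indexed lookup path: O(n log n) -- scales."""
    return n * math.log2(n)

def cost_pairwise(n: int) -> float:
    """e.g. comparing every item with every other: O(n^2) -- does not scale."""
    return float(n * n)

for n in (1_000, 1_000_000):
    print(f"n={n:>9}: n log n = {cost_sorting(n):.0f}, n^2 = {cost_pairwise(n):.0f}")
```

At n = 1,000 the two costs differ by a factor of about 100; at n = 1,000,000 the gap is about 50,000x, which is why the quadratic design "fails when a quantity increases".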
The concept of scalability is desirable in technology as well as business settings. The base concept is consistent – the ability for a business or technology to accept increased volume without impacting the contribution margin (= revenue − variable costs). For example, a given piece of equipment may have a capacity for 1–1000 users, while beyond 1000 users additional equipment is needed or performance will decline (variable costs will increase and reduce contribution margin).
Database scalability
A number of different approaches enable databases to grow to very large size while supporting an ever-increasing rate of transactions per second. Not to be discounted, of course, is the rapid pace of hardware advances in both the speed and capacity of mass storage devices, as well as similar advances in CPU and networking speed.
One technique supported by most of the major database management system (DBMS) products is the partitioning of large tables, based on ranges of values in a key field. In this manner, the database can be scaled out across a cluster of separate database servers. Also, with the advent of 64-bit microprocessors, multi-core CPUs, and large SMP multiprocessors, DBMS vendors have been at the forefront of supporting multi-threaded implementations that substantially scale up transaction processing capacity.
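The key-range partitioning technique described above can be sketched in a few lines; the split points and server names are illustrative, and real DBMSs handle this routing internally:

```python
import bisect

# Upper bounds (inclusive) of the key ranges owned by the first three shards;
# keys above the last split point go to the final shard.
SPLIT_POINTS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["db-server-a", "db-server-b", "db-server-c", "db-server-d"]

def shard_for(key: int) -> str:
    """Route a row to the database server whose key range contains it."""
    return SHARDS[bisect.bisect_left(SPLIT_POINTS, key)]

print(shard_for(500_000))     # db-server-a
print(shard_for(2_500_000))   # db-server-c
print(shard_for(9_999_999))   # db-server-d
```

Because each shard holds only its slice of the table, adding servers (and split points) scales the database out horizontally.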
Network-attached storage (NAS) and Storage area networks (SANs) coupled with fast local area networks and Fibre Channel technology enable still larger, more loosely coupled configurations of databases and distributed computing power. The widely supported X/Open XA standard employs a global transaction monitor to coordinate distributed transactions among semi-autonomous XA-compliant database resources. Oracle RAC uses a different model to achieve scalability, based on a "shared-everything" architecture that relies upon high-speed connections between servers.
While DBMS vendors debate the relative merits of their favored designs, some companies and researchers question the inherent limitations of relational database management systems. GigaSpaces, for example, contends that an entirely different model of distributed data access and transaction processing, space-based architecture, is required to achieve the highest performance and scalability. On the other hand, Base One makes the case for extreme scalability without departing from mainstream relational database technology.[7] For specialized applications, NoSQL architectures such as Google's BigTable can further enhance scalability. Google's massively distributed Spanner technology, positioned as a successor to BigTable, supports general-purpose database transactions and provides a more conventional SQL-based query language.[8]
Definition - What does Data Redundancy mean?
Data redundancy is a condition created within a database or data storage technology in which the same piece of data is held in two separate places.
This can mean two different fields within a single database, or two different spots in multiple software environments or platforms. Whenever data is repeated, this basically constitutes data redundancy. This can occur by accident, but is also done deliberately for backup and recovery purposes.
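A tiny sketch of why deliberate redundancy needs care: a redundant copy kept for backup and recovery can silently drift from the primary if an update misses it. The data and field names are invented for illustration:

```python
# Primary record and a deliberately redundant copy kept for recovery.
primary = {"cust_42": "alice@example.com"}
backup = dict(primary)                    # second copy of the same data

# An update that touches only the primary leaves the copies inconsistent.
primary["cust_42"] = "alice@newmail.example"

# Detect keys whose redundant copies no longer agree.
stale = {k for k in primary if backup.get(k) != primary[k]}
print(stale)   # {'cust_42'} -- the redundant copies have diverged
```

This divergence risk is the classic downside of redundancy, and it is why databases prefer a single authoritative copy (normalization) unless the duplication is managed, as in replication or backups.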
public key infrastructure (PKI)
A public key infrastructure (PKI) is a set of roles, policies, and procedures needed to create, manage, distribute, use,
store, and revoke digital certificates and manage public-key encryption. The purpose of a PKI is to facilitate the secure
electronic transfer of information for a range of network activities such as e-commerce, internet banking and confidential
email. It is required for activities where simple passwords are an inadequate authentication method and more rigorous
proof is required to confirm the identity of the parties involved in the communication and to validate the information being
transferred.[1]
In cryptography, a PKI is an arrangement that binds public keys with respective identities of entities (like persons and
organizations). The binding is established through a process of registration and issuance of certificates at and by a
certificate authority (CA). Depending on the assurance level of the binding, this may be carried out by an automated
process or under human supervision.
The PKI role that assures valid and correct registration is called a registration authority (RA). An RA is responsible for
accepting requests for digital certificates and authenticating the entity making the request.[2]
In a Microsoft PKI, a registration authority is usually called a subordinate CA.[3]
An entity must be uniquely identifiable within each CA domain on the basis of information about that entity. A third-party
validation authority (VA) can provide this entity information on behalf of the CA.
Design
Public key cryptography is a cryptographic technique that enables entities to securely communicate on an insecure public
network, and reliably verify the identity of an entity via digital signatures.[4]
A public key infrastructure (PKI) is a system for the creation, storage, and distribution of digital certificates which are used
to verify that a particular public key belongs to a certain entity. The PKI creates digital certificates which map public keys
to entities, securely stores these certificates in a central repository and revokes them if needed.[5][6][7]
A PKI consists of:[6][8][9]
A certificate authority (CA) that stores, issues and signs the digital certificates
A registration authority which verifies the identity of entities requesting their digital certificates to be stored at the CA
A central directory -- i.e., a secure location in which to store and index keys
A certificate management system managing things like access to stored certificates or the delivery of the certificates to be issued
A certificate policy
Methods of certification
Broadly speaking, there have traditionally been three approaches to getting this trust: certificate authorities (CAs), web of
trust (WoT), and simple public key infrastructure (SPKI).[citation needed]
Certificate authorities
The primary role of the CA is to digitally sign and publish the public key bound to a given user. This is done using the
CA's own private key, so that trust in the user key relies on one's trust in the validity of the CA's key. When the CA is a
third party separate from the user and the system, then it is called the Registration Authority (RA), which may or may not
be separate from the CA.[10]
The key-to-user binding is established, depending on the level of assurance the binding has, by
software or under human supervision.
The term trusted third party (TTP) may also be used for certificate authority (CA). Moreover, PKI is itself often used as a
synonym for a CA implementation.[11]
Issuer market share
In this model of trust relationships, a CA is a trusted third party - trusted both by the subject (owner) of the certificate and
by the party relying upon the certificate.
Netcraft [2], the industry standard for monitoring active TLS certificates, states that "Although the global
[TLS] ecosystem is competitive, it is dominated by a handful of major CAs — three certificate authorities (Symantec,
Comodo, GoDaddy) account for three-quarters of all issued [TLS] certificates on public-facing web servers. The top spot
has been held by Symantec (or VeriSign before it was purchased by Symantec) ever since [our] survey began, with it
currently accounting for just under a third of all certificates. To illustrate the effect of differing methodologies, amongst the
million busiest sites Symantec issued 44% of the valid, trusted certificates in use — significantly more than its overall
market share."
Temporary certificates and single sign-on
This approach involves a server that acts as an offline certificate authority within a single sign-on system. A single sign-on
server will issue digital certificates into the client system, but never stores them. Users can execute programs, etc. with the
temporary certificate. It is common to find this solution variety with X.509-based certificates.[12]
Web of trust
Main article: Web of trust
An alternative approach to the problem of public authentication of public key information is the web-of-trust scheme,
which uses self-signed certificates and third party attestations of those certificates. The singular term "web of trust" does
not imply the existence of a single web of trust, or common point of trust, but rather one of any number of potentially
disjoint "webs of trust". Examples of implementations of this approach are PGP (Pretty Good Privacy) and GnuPG (an
implementation of OpenPGP, the standardized specification of PGP). Because PGP and other implementations allow the use of
e-mail digital signatures for self-publication of public key information, it is relatively easy to implement one's own web of
trust.[13]
One of the benefits of the web of trust, such as in PGP, is that it can interoperate with a PKI CA fully trusted by all parties
in a domain (such as an internal CA in a company) that is willing to guarantee certificates, as a trusted introducer. If the
"web of trust" is completely trusted then, because of the nature of a web of trust, trusting one certificate is granting trust to
all the certificates in that web. A PKI is only as valuable as the standards and practices that control the issuance of
certificates, and including PGP or a personally instituted web of trust can significantly degrade the trustworthiness of that
enterprise's or domain's implementation of PKI.[14]
The web of trust concept was first put forth by PGP creator Phil Zimmermann in 1992 in the manual for PGP version 2.0:
As time goes on, you will accumulate keys from other people that you may want to designate as trusted introducers.
Everyone else will each choose their own trusted introducers. And everyone will gradually accumulate and distribute with
their key a collection of certifying signatures from other people, with the expectation that anyone receiving it will trust at
least one or two of the signatures. This will cause the emergence of a decentralized fault-tolerant web of confidence for all
public keys.
Simple public key infrastructure
Another alternative, which does not deal with public authentication of public key information, is the simple public key
infrastructure (SPKI) that grew out of three independent efforts to overcome the complexities of X.509 and PGP's web of
trust. SPKI does not associate users with persons, since the key is what is trusted, rather than the person. SPKI does not use
any notion of trust, as the verifier is also the issuer. This is called an "authorization loop" in SPKI terminology, where
authorization is integral to its design.[citation needed]
Blockchain-based PKI
An emerging approach for PKI is to use the blockchain technology commonly associated with modern cryptocurrency.
Since blockchain technology aims to provide a distributed and unalterable ledger of information, it has qualities considered
highly suitable for the storage and management of public keys. Emercoin is an example of a blockchain-based
cryptocurrency that supports the storage of different public key types (SSH, GPG, RFC 2230, etc.) and provides open
source software that directly supports PKI for OpenSSH servers.[citation needed]
History
Developments in PKI occurred in the early 1970s at the British intelligence agency GCHQ, where James Ellis, Clifford
Cocks and others made important discoveries related to encryption algorithms and key distribution.[15]
However, as
developments at GCHQ are highly classified, the results of this work were kept secret and not publicly acknowledged until
the mid-1990s.
The public disclosure of both secure key exchange and asymmetric key algorithms in 1976 by Diffie, Hellman, Rivest,
Shamir, and Adleman changed secure communications entirely. With the further development of high-speed digital
electronic communications (the Internet and its predecessors), a need became evident for ways in which users could
securely communicate with each other, and as a further consequence of that, for ways in which users could be sure with
whom they were actually interacting.
Assorted cryptographic protocols were invented and analyzed within which the new cryptographic primitives could be
effectively used. With the invention of the World Wide Web and its rapid spread, the need for authentication and secure
communication became still more acute. Commercial reasons alone (e.g., e-commerce, online access to proprietary
databases from web browsers) were sufficient. Taher Elgamal and others at Netscape developed the SSL protocol ('https' in
Web URLs); it included key establishment, server authentication (prior to v3, one-way only), and so on. A PKI structure
was thus created for Web users/sites wishing secure communications.
Vendors and entrepreneurs saw the possibility of a large market, started companies (or new projects at existing
companies), and began to agitate for legal recognition and protection from liability. An American Bar Association
technology project published an extensive analysis of some of the foreseeable legal aspects of PKI operations (see ABA
digital signature guidelines), and shortly thereafter, several U.S. states (Utah being the first in 1995) and other jurisdictions
throughout the world began to enact laws and adopt regulations. Consumer groups raised questions about privacy, access,
and liability considerations, which were taken into consideration more in some jurisdictions than in others.
The enacted laws and regulations differed, there were technical and operational problems in converting PKI schemes into
successful commercial operation, and progress has been much slower than pioneers had imagined it would be.
By the first few years of the 21st century, the underlying cryptographic engineering was clearly not easy to deploy
correctly. Operating procedures (manual or automatic) were not easy to correctly design (nor even if so designed, to
execute perfectly, which the engineering required). The standards that existed were insufficient.
PKI vendors have found a market, but it is not quite the market envisioned in the mid-1990s, and it has grown both more
slowly and in somewhat different ways than were anticipated.[16]
PKIs have not solved some of the problems they were
expected to, and several major vendors have gone out of business or been acquired by others. PKI has had the most success
in government implementations; the largest PKI implementation to date is the Defense Information Systems Agency
(DISA) PKI infrastructure for the Common Access Cards program.
Uses
PKIs of one type or another, and from any of several vendors, have many uses, including providing public keys and
bindings to user identities which are used for:
- Encryption and/or sender authentication of e-mail messages (e.g., using OpenPGP or S/MIME)
- Encryption and/or authentication of documents (e.g., the XML Signature [3] or XML Encryption [4] standards if documents are encoded as XML)
- Authentication of users to applications (e.g., smart card logon, client authentication with SSL). There is experimental usage for digitally signed HTTP authentication in the Enigform and mod_openpgp projects
- Bootstrapping secure communication protocols, such as Internet Key Exchange (IKE) and SSL. In both of these, initial set-up of a secure channel (a "security association") uses asymmetric-key (public-key) methods, whereas actual communication uses faster symmetric-key (secret-key) methods
- Mobile signatures, which are electronic signatures created using a mobile device and relying on signature or certification services in a location-independent telecommunication environment[17]
Open source implementations
- OpenSSL is the simplest form of CA and tool for PKI. It is a toolkit, developed in C, that is included in all major Linux distributions, and can be used both to build your own (simple) CA and to PKI-enable applications. (Apache licensed)
- EJBCA is a full-featured, enterprise-grade CA implementation developed in Java. It can be used to set up a CA both for internal use and as a service. (LGPL licensed)
- OpenCA is a full-featured CA implementation using a number of different tools. OpenCA uses OpenSSL for the underlying PKI operations.
- XCA is a graphical interface and database. XCA uses OpenSSL for the underlying PKI operations.
- TinyCA was a graphical interface for OpenSSL. (Discontinued)
- XiPKI,[18] a CA and OCSP responder. With SHA-3 support, OSGi-based (Java).
Criticism
Some argue that purchasing certificates for securing websites by SSL and securing software by code signing is a costly
venture for small businesses.[19]
Presently Symantec holds a major share of the PKI certificate market, having sold one third of all certificates issued
globally in 2013.[20] HTTP/2, the latest version of the HTTP protocol, allows unsecured connections in theory; in practice,
major browser vendors have made it clear that they will support this state-of-the-art protocol only over a PKI-secured TLS
connection.[21] Web browser implementations of HTTP/2, including Microsoft's Edge, Google's Chrome, Mozilla's Firefox,
and Opera, support HTTP/2 only over TLS, using the ALPN extension of the TLS protocol. This means that, to get the
speed benefits of HTTP/2, website owners are forced to purchase SSL certificates controlled by corporations such as
Symantec.
Current web browsers carry pre-installed intermediary certificates issued and signed by certificate authorities. This means
browsers must trust a large number of certificate providers, increasing the risk of a key compromise.
Furthermore, governments can force certificate providers to hand over their root certificate keys, which in turn would allow
them to decrypt traffic by performing a man-in-the-middle (MITM) attack.
When a key is known to be compromised it could be fixed by revoking the certificate, but such a compromise is not easily
detectable and can be a huge security breach. Browsers have to issue a security patch to revoke intermediary certificates
issued by a compromised root certificate authority.[22]
Some practical security vulnerabilities of X.509 certificates and
known cases where keys were stolen from a major certificate authority are listed below.
HASHING
Hashing, Hash Data Structure and Hash Table
Hashing is the process of mapping a large amount of data to a smaller table with the help of a hashing function. The
essence of hashing is to enable a faster searching method compared with linear or binary search. The advantage of this
searching method is its efficiency in handling vast numbers of data items in a given collection.
Due to this hashing process, the result is a hash data structure that can store or retrieve data items in an average time that
does not depend on the collection size.
A hash table is the result of storing the hash data structure in a smaller table which incorporates the hash function within
itself. The hash function is primarily responsible for the mapping between the original data item and the smaller table. The
mapping takes place via an output integer in a consistent range, produced when a given data item (of any data type) is
supplied for storage; this output integer determines the location in the smaller table for the data item. In terms of
implementation, the hash table is constructed with the help of an array, and the indices of this array are associated with the
output integer range.
Hash Table Example :
Here, we construct a hash table for storing and retrieving data related to the citizens of a country, and the social-security
numbers of the citizens are used as the keys. Let's assume that the table size is 12; therefore the hash function is the
value modulo 12.
Hence, the Hash Function would equate to:
(sum of numeric values of the characters in the data item) %12
Note! % is the modulus operator
Let us consider the following social-security numbers and produce a hashcode:
120388113D => 1+2+0+3+8+8+1+1+3+13=40
Hence, (40)%12 => Hashcode=4
310181312E => 3+1+0+1+8+1+3+1+2+14=34
Hence, (34)%12 => Hashcode=10
041176438A => 0+4+1+1+7+6+4+3+8+10=44
Hence, (44)%12 => Hashcode=8
Therefore, the Hashtable content would be as follows:
-----------------------------------------------------
0:empty
1:empty
2:empty
3:empty
4:occupied Name:Drew Smith SSN:120388113D
5:empty
6:empty
7:empty
8:occupied Name:Andy Conn SSN:041176438A
9:empty
10:occupied Name:Igor Barton SSN:310181312E
11:empty
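The worked example above can be sketched in Python. The function name `hashcode` and the table size of 12 are taken from the text; the letter mapping (A=10, B=11, ..., so D=13 and E=14) is inferred from the worked sums, and the `int(ch, 36)` trick is just one convenient way to express it:

```python
def hashcode(ssn: str, table_size: int = 12) -> int:
    # Sum the numeric values of the characters: digits keep their value,
    # letters continue the sequence (A=10, B=11, ..., D=13, E=14),
    # which is exactly what base-36 digit values give us.
    return sum(int(ch, 36) for ch in ssn) % table_size

# Build the table from the example's three records.
table = [None] * 12
for name, ssn in [("Drew Smith", "120388113D"),
                  ("Andy Conn", "041176438A"),
                  ("Igor Barton", "310181312E")]:
    table[hashcode(ssn)] = (name, ssn)

print(hashcode("120388113D"))  # 4
print(hashcode("310181312E"))  # 10
print(hashcode("041176438A"))  # 8
```

Note that this toy function takes no care to avoid collisions; two SSNs whose digit sums agree modulo 12 would map to the same slot.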
A hash function is a mathematical function that converts a numerical input value into another compressed numerical
value. The input to the hash function is of arbitrary length but output is always of fixed length. Values returned by a hash
function are called message digest or simply hash values.
Hash functions are extremely useful and appear in almost all information security applications.
Features of Hash Functions
The typical features of hash functions are −
Fixed Length Output (Hash Value)
o Hash function converts data of arbitrary length to a fixed length. This process is often referred to as hashing
the data.
o In general, the hash is much smaller than the input data, hence hash functions are sometimes called
compression functions.
o Since a hash is a smaller representation of a larger data, it is also referred to as a digest.
o Hash function with n bit output is referred to as an n-bit hash function. Popular hash functions generate
values between 160 and 512 bits.
Efficiency of Operation
o Generally for any hash function h with input x, computation of h(x) is a fast operation.
o Computationally hash functions are much faster than a symmetric encryption.
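Both features can be observed directly with Python's standard `hashlib` module: no matter how long the input, SHA-256 always produces a 256-bit (64 hex character) digest, and computing it is fast even for large inputs.

```python
import hashlib

# Inputs of wildly different lengths...
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 1_000_000).hexdigest()

# ...always yield a fixed-length, 64-hex-character (256-bit) digest.
print(len(short_digest), len(long_digest))  # 64 64
```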
Properties of Hash Functions
In order to be an effective cryptographic tool, the hash function is desired to possess following properties −
Pre-Image Resistance
o This property means that it should be computationally hard to reverse a hash function.
o In other words, if a hash function h produced a hash value z, then it should be a difficult process to find any
input value x that hashes to z.
o This property protects against an attacker who only has a hash value and is trying to find the input.
Second Pre-Image Resistance
o This property means given an input and its hash, it should be hard to find a different input with the same
hash.
o In other words, if a hash function h for an input x produces hash value h(x), then it should be difficult to
find any other input value y such that h(y) = h(x).
o This property of hash function protects against an attacker who has an input value and its hash, and wants to
substitute different value as legitimate value in place of original input value.
Collision Resistance
o This property means it should be hard to find two different inputs of any length that result in the same hash.
This property is also referred to as collision free hash function.
o In other words, for a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y).
o Since a hash function is a compressing function with fixed output length, it is impossible for it to have no
collisions. The collision-resistance property only requires that such collisions be hard to find.
o This property makes it very difficult for an attacker to find two input values with the same hash.
o Also, if a hash function is collision-resistant then it is second pre-image resistant.
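Why the output length matters can be seen by deliberately weakening a real hash. The sketch below (Python; the 16-bit truncation is purely illustrative) finds a collision in a SHA-256 digest truncated to 16 bits within a few hundred tries, whereas the full 256-bit output makes the same search infeasible:

```python
import hashlib

def h16(data: bytes) -> bytes:
    # Truncate SHA-256 to 16 bits, ONLY so that collisions become
    # findable by brute force; never do this in a real system.
    return hashlib.sha256(data).digest()[:2]

# Birthday-style search: hash successive inputs until two collide.
seen = {}
i = 0
while True:
    candidate = str(i).encode()
    digest = h16(candidate)
    if digest in seen:
        a, b = seen[digest], candidate
        break
    seen[digest] = candidate
    i += 1

# Two different inputs, same (truncated) hash.
print(a != b, h16(a) == h16(b))  # True True
```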
Design of Hashing Algorithms
At the heart of a hashing is a mathematical function that operates on two fixed-size blocks of data to create a hash code.
This hash function forms the part of the hashing algorithm.
The size of each data block varies depending on the algorithm. Typically the block sizes are from 128 bits to 512 bits.
A hashing algorithm involves rounds of the above hash function, like a block cipher. Each round takes an input of a fixed
size, typically a combination of the most recent message block and the output of the last round.
This process is repeated for as many rounds as are required to hash the entire message.
The hash value of the first message block becomes an input to the second hash operation, the output of which alters the
result of the third operation, and so on. This effect is known as the avalanche effect of hashing.
The avalanche effect results in substantially different hash values for two messages that differ by even a single bit of data.
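The avalanche effect is easy to observe with Python's `hashlib`: the inputs below, `"message"` and `"massage"`, differ in a single bit ('e' is 0x65, 'a' is 0x61), yet roughly half of the 256 output bits flip.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    # Count differing bits between two equal-length digests.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.sha256(b"message").digest()
d2 = hashlib.sha256(b"massage").digest()  # input differs in one bit

# Expect a value near 128 out of 256 bits.
print(bit_diff(d1, d2))
```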
It is important to distinguish between the hash function and the hashing algorithm. The hash function generates a hash
code by operating on two blocks of fixed-length binary data.
The hashing algorithm is the process that uses the hash function, specifying how the message is broken up and how the
results from previous message blocks are chained together.
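The block splitting and chaining described above can be sketched as a toy construction (Python). The compression function and constants here are invented for illustration and are NOT cryptographically secure; real designs such as MD5 or SHA-2 use far more elaborate mixing:

```python
def compress(block: int, chaining: int) -> int:
    # Toy compression function on two fixed-size 32-bit values.
    # The constants are arbitrary; this is for illustration only.
    return ((chaining * 31) ^ block) * 2654435761 % 2**32

def toy_hash(message: bytes, iv: int = 0x12345678) -> int:
    # Split the message into fixed-size 4-byte blocks (zero-padded),
    # then chain: each round mixes one block with the previous output.
    padded = message + b"\x00" * (-len(message) % 4)
    state = iv
    for i in range(0, len(padded), 4):
        block = int.from_bytes(padded[i:i + 4], "big")
        state = compress(block, state)
    return state

print(hex(toy_hash(b"hello world")))
```

Because each round's output feeds the next, a change in any block alters every subsequent chaining value, which is exactly how the avalanche effect propagates through the whole message.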
Popular Hash Functions
Let us briefly see some popular hash functions −
Message Digest (MD)
MD5 was the most popular and widely used hash function for quite some years.
The MD family comprises the hash functions MD2, MD4, MD5 and MD6. MD5 was adopted as Internet Standard RFC
1321. It is a 128-bit hash function.
MD5 digests have been widely used in the software world to provide assurance about the integrity of transferred files.
For example, file servers often provide a pre-computed MD5 checksum for their files, so that a user can compare the
checksum of the downloaded file against it.
In 2004, collisions were found in MD5. An analytical attack was reported to succeed in only an hour using a computer
cluster. This collision attack compromised MD5, and it is hence no longer recommended for use.
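The file-server checksum scenario looks like this in Python (MD5 is used here only because the text discusses it; for new systems prefer SHA-256, since MD5's collision resistance is broken):

```python
import hashlib

def md5_checksum(data: bytes) -> str:
    # Compute the hex MD5 digest, as a file server would publish it.
    return hashlib.md5(data).hexdigest()

# The server publishes a pre-computed checksum alongside the file.
published = md5_checksum(b"release-1.0 contents")

# The downloader recomputes the checksum and compares.
downloaded = b"release-1.0 contents"
print(md5_checksum(downloaded) == published)  # True

# A corrupted or tampered download fails the comparison.
tampered = b"release-1.0 contents!"
print(md5_checksum(tampered) == published)  # False
```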
Secure Hash Function (SHA)
The SHA family comprises four algorithms: SHA-0, SHA-1, SHA-2, and SHA-3. Though from the same family, they are
structurally different.
The original version, SHA-0, a 160-bit hash function, was published by the National Institute of Standards and
Technology (NIST) in 1993. It had a few weaknesses and did not become very popular. Later, in 1995, SHA-1 was
designed to correct the alleged weaknesses of SHA-0.
SHA-1 is the most widely used of the existing SHA hash functions. It is employed in several widely used
applications and protocols, including Secure Sockets Layer (SSL) security.
In 2005, a method was found for uncovering collisions for SHA-1 within a practical time frame, making the long-term
employability of SHA-1 doubtful.
The SHA-2 family has four further variants, SHA-224, SHA-256, SHA-384, and SHA-512, named for the number of bits
in their hash values. No successful attacks have yet been reported on the SHA-2 hash functions.
SHA-2 is a strong hash function, but though significantly different, its basic design still follows that of SHA-1.
Hence, NIST called for new competitive hash function designs.
In October 2012, the NIST chose the Keccak algorithm as the new SHA-3 standard. Keccak offers many benefits,
such as efficient performance and good resistance for attacks.
RIPEMD
RIPEMD is an acronym for RACE Integrity Primitives Evaluation Message Digest. This set of hash functions was
designed by the open research community and is generally known as a family of European hash functions.
The set includes RIPEMD, RIPEMD-128, and RIPEMD-160. There also exist 256- and 320-bit versions of this
algorithm.
The original RIPEMD (128 bit) is based upon the design principles used in MD4 and was found to provide questionable
security. The RIPEMD 128-bit version came as a quick-fix replacement to overcome the vulnerabilities of the original
RIPEMD.
RIPEMD-160 is an improved version and the most widely used version in the family. The 256- and 320-bit versions
reduce the chance of accidental collision, but do not have higher levels of security as compared to RIPEMD-128
and RIPEMD-160, respectively.
Whirlpool
This is a 512-bit hash function.
It is derived from a modified version of the Advanced Encryption Standard (AES). One of its designers was Vincent
Rijmen, a co-creator of the AES.
Three versions of Whirlpool have been released; namely WHIRLPOOL-0, WHIRLPOOL-T, and WHIRLPOOL.
Applications of Hash Functions
There are two direct applications of hash function based on its cryptographic properties.
Password Storage
Hash functions provide protection to password storage.
Instead of storing passwords in the clear, almost all logon processes store the hash values of passwords in a file.
The password file consists of a table of pairs of the form (user id, h(P)).
An intruder can only see the hashes of passwords, even if he accesses the password file. He can neither log on using a
hash nor derive the password from the hash value, since the hash function possesses the property of pre-image
resistance.
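A minimal sketch of hashed password storage, using Python's standard `hashlib.pbkdf2_hmac` (the iteration count and the example passwords are illustrative; real systems tune the work factor and often use dedicated schemes such as bcrypt or Argon2):

```python
import hashlib
import hmac
import os

def store_password(password: str) -> tuple[bytes, bytes]:
    # A random per-user salt prevents identical passwords from
    # producing identical records and defeats precomputed tables.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest  # this pair is what the password file stores

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, digest)

salt, digest = store_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

Note that PBKDF2 deliberately makes each hash slow (many iterations), so that even an attacker who steals the file cannot cheaply brute-force the passwords.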
Data Integrity Check
Data integrity checking is the most common application of hash functions. It is used to generate checksums on data files.
This application provides assurance to the user about the correctness of the data.
The integrity check helps the user detect any changes made to the original file. It does not, however, provide any assurance
about origin. An attacker, instead of modifying the file data, can replace the entire file, compute an altogether new hash,
and send it to the receiver. This integrity-check application is useful only if the user is sure about the originality of the file.
Modern Cryptography
Modern cryptography is the cornerstone of computer and communications security. Its foundation is based on various
concepts of mathematics such as number theory, computational-complexity theory, and probability theory.
Characteristics of Modern Cryptography
There are three major characteristics that separate modern cryptography from the classical approach.
- Classic cryptography manipulates traditional characters, i.e., letters and digits, directly. Modern cryptography operates on binary bit sequences.
- Classic cryptography is mainly based on 'security through obscurity': the techniques employed for coding were kept secret, and only the parties involved in communication knew about them. Modern cryptography relies on publicly known mathematical algorithms for coding the information; secrecy is obtained through a secret key which is used as the seed for the algorithms. The computational difficulty of the algorithms, the absence of the secret key, etc., make it impossible for an attacker to obtain the original information even if he knows the algorithm used for coding.
- Classic cryptography requires the entire cryptosystem for communicating confidentially. Modern cryptography requires the parties interested in secure communication to possess only the secret key.
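The first two contrasts can be illustrated with a small sketch (Python; both ciphers here are toys for illustration, not for real use). A Caesar shift manipulates letters directly and its "secret" is the method itself, whereas a keyed XOR operates on bits with a fully public algorithm and keeps only the key secret:

```python
# Classic style: shift each (uppercase) letter by a fixed amount.
# The security rests entirely on the method being kept secret.
def caesar(text: str, shift: int = 3) -> str:
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)

# Modern style: operate on bytes with a public algorithm; here a
# simple repeating-key XOR stands in for a real cipher such as AES.
# Only the key is secret.
def xor_bytes(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

print(caesar("ATTACK"))                         # DWWDFN
ciphertext = xor_bytes(b"ATTACK", b"k3y")
print(xor_bytes(ciphertext, b"k3y"))            # b'ATTACK'
```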
Context of Cryptography
Cryptology, the study of cryptosystems, can be subdivided into two branches −
Cryptography Cryptanalysis
What is Cryptography?
Cryptography is the art and science of making a cryptosystem that is capable of providing information security.
Cryptography deals with the actual securing of digital data. It refers to the design of mechanisms based on mathematical
algorithms that provide fundamental information security services. You can think of cryptography as the establishment of a
large toolkit containing different techniques in security applications.
What is Cryptanalysis?
The art and science of breaking the cipher text is known as cryptanalysis.
Cryptanalysis is the sister branch of cryptography and they both co-exist. The cryptographic process results in the cipher
text for transmission or storage. It involves the study of cryptographic mechanism with the intention to break them.
Cryptanalysis is also used during the design of the new cryptographic techniques to test their security strengths.
Note − Cryptography concerns with the design of cryptosystems, while cryptanalysis studies the breaking of
cryptosystems.
Security Services of Cryptography
The primary objective of using cryptography is to provide the following four fundamental information security services.
Let us now see the possible goals intended to be fulfilled by cryptography.
Confidentiality
Confidentiality is the fundamental security service provided by cryptography. It is a security service that keeps the
information from an unauthorized person. It is sometimes referred to as privacy or secrecy.
Confidentiality can be achieved through numerous means starting from physical securing to the use of mathematical
algorithms for data encryption.
Data Integrity
It is a security service that deals with identifying any alteration to the data. The data may get modified by an unauthorized
entity intentionally or accidentally. The integrity service confirms whether data is intact or not since it was last created,
transmitted, or stored by an authorized user.
Data integrity cannot prevent the alteration of data, but provides a means for detecting whether data has been manipulated
in an unauthorized manner.
Authentication
Authentication provides the identification of the originator. It confirms to the receiver that the data received has been sent
only by an identified and verified sender.
Authentication service has two variants −
Message authentication identifies the originator of the message without any regard to the router or system that has sent
the message.
Entity authentication is assurance that data has been received from a specific entity, say a particular website.
Apart from the originator, authentication may also provide assurance about other parameters related to data such as the
date and time of creation/transmission.
Non-repudiation
It is a security service that ensures that an entity cannot refuse the ownership of a previous commitment or an action. It is
an assurance that the original creator of the data cannot deny the creation or transmission of the said data to a recipient or
third party.
Non-repudiation is a property that is most desirable in situations where there are chances of a dispute over the exchange of
data. For example, once an order is placed electronically, a purchaser cannot deny the purchase order, if non-repudiation
service was enabled in this transaction.
Cryptography Primitives
Cryptography primitives are nothing but the tools and techniques in Cryptography that can be selectively used to provide a
set of desired security services −
Encryption Hash functions Message Authentication codes (MAC) Digital Signatures
The following table shows the primitives that can achieve a particular security service on their own.
Note − Cryptographic primitives are intricately related and they are often combined to achieve a set of desired security
services from a cryptosystem.
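One of the primitives listed above, the message authentication code (MAC), can be illustrated with Python's standard `hmac` module (the key and message values here are made up):

```python
import hashlib
import hmac

key = b"shared-secret-key"
message = b"transfer 100 to account 42"

# The sender attaches a MAC computed over the message with the shared key.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The receiver recomputes the MAC; a match authenticates the message.
expected = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))  # True

# Any modification of the message produces a different tag.
forged = hmac.new(key, b"transfer 999 to account 7", hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, forged))  # False
```

This also shows how primitives combine: a MAC is built from a hash function plus a secret key, providing both integrity and message authentication in one construction.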
Cryptographic Attacks
In the present era, not only business but almost all aspects of human life are driven by information. Hence, it has
become imperative to protect useful information from malicious activities such as attacks. Let us consider the types of
attacks to which information is typically subjected.
Attacks are typically categorized based on the action performed by the attacker. An attack, thus, can be passive or active.
Passive Attacks
The main goal of a passive attack is to obtain unauthorized access to the information. For example, actions such as
intercepting and eavesdropping on the communication channel can be regarded as passive attack.
These actions are passive in nature, as they neither affect information nor disrupt the communication channel. A passive
attack is often seen as stealing information. The only difference between stealing physical goods and stealing information
is that theft of data still leaves the owner in possession of that data. Passive information attacks are thus more dangerous
than the stealing of goods, as information theft may go unnoticed by the owner.
Active Attacks
An active attack involves changing the information in some way by conducting some process on the information. For
example,
Modifying the information in an unauthorized manner.
Initiating unintended or unauthorized transmission of information.
Alteration of authentication data such as originator name or timestamp associated with information
Unauthorized deletion of data.
Denial of access to information for legitimate users (denial of service).
Cryptography provides many tools and techniques for implementing cryptosystems capable of preventing most of the
attacks described above.
Assumptions of Attacker
Let us see the prevailing environment around cryptosystems followed by the types of attacks employed to break these
systems −
Environment around Cryptosystem
While considering possible attacks on the cryptosystem, it is necessary to know the cryptosystem's environment. The
attacker's assumptions and knowledge about the environment decide his capabilities.
In cryptography, the following three assumptions are made about the security environment and the attacker's capabilities.
Details of the Encryption Scheme
The design of a cryptosystem is based on the following two cryptography algorithms −
Public Algorithms − With this option, all the details of the algorithm are in the public domain, known to everyone.
Proprietary algorithms − The details of the algorithm are only known by the system designers and users.
In the case of proprietary algorithms, security is ensured through obscurity. Private algorithms may not be the strongest,
as they are developed in-house and may not be extensively investigated for weaknesses.
Secondly, they allow communication only among a closed group. Hence they are not suitable for modern communication,
where people communicate with a large number of known or unknown entities. Also, according to Kerckhoffs's principle,
the algorithm is preferred to be public, with the strength of encryption lying in the key.
Thus, the first assumption about security environment is that the encryption algorithm is known to the attacker.
Availability of Ciphertext
We know that once the plaintext is encrypted into ciphertext, it is put on an insecure public channel (say, e-mail) for
transmission. Thus, the attacker can obviously be assumed to have access to the ciphertext generated by the
cryptosystem.
Availability of Plaintext and Ciphertext
This assumption is not as obvious as the others. However, there may be situations where an attacker can have access to
plaintext and corresponding ciphertext. Some such possible circumstances are −
The attacker influences the sender to convert plaintext of his choice and obtains the ciphertext.
The receiver may divulge the plaintext to the attacker inadvertently. The attacker has access to corresponding
ciphertext gathered from open channel.
In a public-key cryptosystem, the encryption key is in the open domain and is known to any potential attacker. Using
this key, he can generate pairs of corresponding plaintexts and ciphertexts.
Cryptographic Attacks
The basic intention of an attacker is to break a cryptosystem and to find the plaintext from the ciphertext. To obtain the
plaintext, the attacker only needs to find out the secret decryption key, as the algorithm is already in public domain.
Hence, he applies maximum effort towards finding out the secret key used in the cryptosystem. Once the attacker is able to
determine the key, the attacked system is considered as broken or compromised.
Based on the methodology used, attacks on cryptosystems are categorized as follows −
Ciphertext Only Attacks (COA) − In this method, the attacker has access to a set of ciphertext(s). He does not
have access to corresponding plaintext. COA is said to be successful when the corresponding plaintext can be
determined from a given set of ciphertext. Occasionally, the encryption key can be determined from this attack.
Modern cryptosystems are guarded against ciphertext-only attacks.
Known Plaintext Attack (KPA) − In this method, the attacker knows the plaintext for some parts of the
ciphertext. The task is to decrypt the rest of the ciphertext using this information. This may be done by determining
the key or via some other method. The best example of this attack is linear cryptanalysis against block ciphers.
Chosen Plaintext Attack (CPA) − In this method, the attacker has the text of his choice encrypted. So he has the
ciphertext-plaintext pair of his choice. This simplifies his task of determining the encryption key. An example of
this attack is differential cryptanalysis applied against block ciphers as well as hash functions. A popular public key
cryptosystem, RSA is also vulnerable to chosen-plaintext attacks.
Dictionary Attack − This attack has many variants, all of which involve compiling a 'dictionary'. In the simplest
form of this attack, the attacker builds a dictionary of ciphertexts and the corresponding plaintexts that he has learnt over
a period of time. In future, when the attacker obtains a ciphertext, he refers to the dictionary to find the corresponding
plaintext.
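The two phases of the attack can be sketched in a few lines of Python. The toy Caesar cipher, the shift value, and the observed plaintexts below are illustrative assumptions, not part of any real cryptosystem:

```python
def toy_encrypt(plaintext: str, shift: int = 3) -> str:
    """Toy Caesar cipher standing in for the victim's (unknown) cryptosystem."""
    return "".join(chr((ord(c) - 97 + shift) % 26 + 97) for c in plaintext)

# Phase 1: build the dictionary from pairs observed over a period of time
known_plaintexts = ["attack", "retreat", "hold"]
dictionary = {toy_encrypt(p): p for p in known_plaintexts}

# Phase 2: a fresh ciphertext arrives; look it up instead of breaking the cipher
intercepted = toy_encrypt("retreat")
print(dictionary.get(intercepted, "<not in dictionary>"))   # retreat
```

The lookup succeeds only for ciphertexts the attacker has already catalogued, which is exactly the limitation of this attack.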
Brute Force Attack (BFA) − In this method, the attacker tries to determine the key by attempting all possible
keys. If the key is 8 bits long, then the number of possible keys is 2^8 = 256. The attacker knows the ciphertext and
the algorithm; he now attempts all 256 keys one by one for decryption. The time to complete the attack would
be very high if the key is long.
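The exhaustive search over an 8-bit key space can be sketched as follows; the single-byte XOR cipher and the known plaintext fragment (crib) are assumptions made for illustration:

```python
def xor_encrypt(data: bytes, key: int) -> bytes:
    """Toy cipher: XOR every byte with one 8-bit key (encrypting twice decrypts)."""
    return bytes(b ^ key for b in data)

ciphertext = xor_encrypt(b"meet at dawn", 0x5A)   # the key is unknown to the attacker

# Try all 2^8 = 256 keys; recognize the right one by a known plaintext fragment (crib)
recovered_key = None
for key in range(256):
    if xor_encrypt(ciphertext, key).startswith(b"meet"):
        recovered_key = key
        break

print(hex(recovered_key))   # 0x5a
```

Doubling the key length squares the search space, which is why modern ciphers use keys of 128 bits or more.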
Birthday Attack − This attack is a variant of the brute-force technique. It is used against cryptographic hash
functions. When students in a class are asked about their birthdays, each answer is one of 365 possible dates. By
the birthday paradox, we expect to find two students who share a birthday after asking only about
1.25 × √365 ≈ 24 students.
Similarly, if the hash function produces 64-bit hash values, there are 2^64 ≈ 1.8 × 10^19 possible hash values. By
repeatedly evaluating the function for different inputs, the same output is expected to be obtained after about
1.25 × √(2^64) ≈ 5.4 × 10^9 random inputs.
If the attacker is able to find two different inputs that give the same hash value, it is a collision, and that hash
function is said to be broken.
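The collision search can be demonstrated by truncating SHA-256 to 24 bits as a stand-in for a weak hash (the truncation width and the counter-style inputs are assumptions for the demo). By the birthday bound, a collision is expected after roughly 1.25 × √(2^24) ≈ 5,100 inputs:

```python
import hashlib

def weak_hash(data: bytes, bits: int = 24) -> int:
    """SHA-256 truncated to `bits` bits -- a deliberately weak hash for the demo."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

seen = {}            # hash value -> input that produced it
collision = None
count = 0
while collision is None:
    msg = f"message-{count}".encode()
    h = weak_hash(msg)
    if h in seen:
        collision = (seen[h], msg)   # two different inputs, same hash value
    seen[h] = msg
    count += 1

print(f"collision after {count} inputs: {collision}")
```

Note that the attack needs far fewer evaluations than the 2^24 a naive preimage search would require, which is the whole point of the birthday bound.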
Man-in-the-Middle Attack (MITM) − The targets of this attack are mostly public-key cryptosystems where key
exchange is involved before communication takes place.
o Host A wants to communicate with host B, and hence requests the public key of B.
o An attacker intercepts this request and sends his own public key instead, so that A takes the attacker's key
as if it came from B.
o Thus, whatever host A sends to host B, the attacker is able to read, since it is encrypted with the attacker's key.
o In order to maintain the communication, the attacker decrypts and reads the data, re-encrypts it with
B's real public key, and forwards it to B.
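The interception can be simulated with textbook RSA on tiny primes. Every number here is a toy value chosen for illustration; in practice, certificates that bind a public key to an identity are what prevent this substitution:

```python
# Toy RSA keypairs on tiny primes -- illustration only, nowhere near secure
def make_keys(p, q, e=17):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse (Python 3.8+)
    return (e, n), (d, n)               # (public key, private key)

def encrypt(m, pub):
    e, n = pub
    return pow(m, e, n)

def decrypt(c, priv):
    d, n = priv
    return pow(c, d, n)

pub_b, priv_b = make_keys(61, 53)       # B's real keypair
pub_m, priv_m = make_keys(89, 97)       # the attacker's keypair

# A requests B's public key; the attacker intercepts and substitutes his own
key_received_by_a = pub_m

message = 42
c = encrypt(message, key_received_by_a)  # A unknowingly encrypts to the attacker
stolen = decrypt(c, priv_m)              # the attacker reads the plaintext
relayed = encrypt(stolen, pub_b)         # re-encrypts with B's real key and forwards
print(stolen, decrypt(relayed, priv_b))  # 42 42 -- B is unaware of the interception
```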
Side Channel Attack (SCA) − This type of attack is not against any particular type of cryptosystem or algorithm.
Instead, it is launched to exploit weaknesses in the physical implementation of the cryptosystem.
Timing Attacks − These exploit the fact that different computations take different amounts of time on a processor.
By measuring such timings, it is possible to learn which computation the processor is carrying out.
For example, if the encryption takes a longer time, it can indicate that the secret key is long.
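The leak can be shown deterministically by counting byte comparisons instead of measuring wall-clock time (the secret and the guesses below are made-up values). An early-exit comparison does more work the more leading bytes a guess gets right, which is exactly why constant-time comparisons such as Python's hmac.compare_digest exist:

```python
SECRET = b"hunter2!"   # made-up secret the attacker tries to guess

def naive_compare(guess, secret):
    """Early-exit comparison; returns (equal, number of byte comparisons made)."""
    ops = 0
    for x, y in zip(guess, secret):
        ops += 1
        if x != y:
            return False, ops
    return len(guess) == len(secret), ops

# The comparison count (a proxy for elapsed time) leaks how many leading bytes match
_, ops_bad = naive_compare(b"zzzzzzzz", SECRET)    # wrong from the first byte
_, ops_good = naive_compare(b"huntzzzz", SECRET)   # first 4 bytes correct
print(ops_bad, ops_good)   # 1 5
```

By measuring which guesses take longer, an attacker can recover the secret one byte at a time instead of brute-forcing the whole value.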
Power Analysis Attacks − These attacks are similar to timing attacks except that the amount of power
consumption is used to obtain information about the nature of the underlying computations.
Fault analysis Attacks − In these attacks, errors are induced in the cryptosystem and the attacker studies the
resulting output for useful information.
Practicality of Attacks
The attacks on cryptosystems described here are largely academic, as the majority of them come from the academic
community. In fact, many academic attacks involve quite unrealistic assumptions about the environment as well as the
capabilities of the attacker. For example, in a chosen-ciphertext attack, the attacker requires an impractically large
number of deliberately chosen plaintext-ciphertext pairs, which may not be obtainable in practice.
Nonetheless, the fact that any attack exists should be a cause of concern, particularly if the attack technique has the
potential for improvement.
Data Encryption Standard (DES)
Data Encryption Standard (DES) is a symmetric-key block cipher published by the National Institute of Standards and
Technology (NIST).
DES is an implementation of a Feistel cipher and uses a 16-round Feistel structure. The block size is 64 bits. Though the
key length is 64 bits, DES has an effective key length of 56 bits, since 8 of the 64 bits of the key are not used by the
encryption algorithm (they function as check bits only). The general structure of DES is depicted in the following illustration −
Since DES is based on the Feistel cipher, all that is required to specify DES is −
Round function
Key schedule
Any additional processing − initial and final permutation
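The Feistel structure itself can be sketched generically. The toy round function and stand-in key schedule below are assumptions (DES's real f uses the expansion, S-boxes, and permutation described next), but the encrypt/decrypt symmetry is exactly the Feistel property: decryption is the same network run with the round keys reversed.

```python
MASK32 = 0xFFFFFFFF

def round_fn(half: int, round_key: int) -> int:
    """Toy stand-in for DES's f (the real one uses expansion, S-boxes, permutation)."""
    return ((half * 0x9E3779B1) ^ round_key) & MASK32

def feistel_encrypt(block: int, round_keys) -> int:
    left, right = block >> 32, block & MASK32
    for k in round_keys:
        left, right = right, left ^ round_fn(right, k)
    return (right << 32) | left              # final swap, as in DES

def feistel_decrypt(block: int, round_keys) -> int:
    # Decryption is the same network with the round keys in reverse order
    return feistel_encrypt(block, list(reversed(round_keys)))

keys = [(0xABCD1234 * (i + 1)) & 0xFFFFFFFFFFFF for i in range(16)]  # stand-in schedule
plaintext = 0x0123456789ABCDEF
ciphertext = feistel_encrypt(plaintext, keys)
print(hex(feistel_decrypt(ciphertext, keys)))   # 0x123456789abcdef
```

Note that round_fn never needs to be invertible; the XOR structure of the network provides invertibility for free.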
Initial and Final Permutation
The initial and final permutations are straight permutation boxes (P-boxes) that are inverses of each other. They have no
cryptographic significance in DES. The initial and final permutations are shown as follows −
Round Function
The heart of this cipher is the DES function, f. The DES function applies a 48-bit key to the rightmost 32 bits to produce a
32-bit output.
Expansion Permutation Box − Since the right input is 32 bits and the round key is 48 bits, we first need to expand the
right input to 48 bits. The permutation logic is graphically depicted in the following illustration −
The graphically depicted permutation logic is generally described as a table in the DES specification, as shown
−
XOR (Whitener) − After the expansion permutation, DES performs an XOR operation on the expanded right section and
the round key. The round key is used only in this operation.
Substitution Boxes − The S-boxes carry out the real mixing (confusion). DES uses 8 S-boxes, each with a 6-bit
input and a 4-bit output. Refer to the following illustration −
The S-box rule is illustrated below −
There are a total of eight S-box tables. The output of all eight S-boxes is then combined into a 32-bit section.
Straight Permutation − The 32-bit output of the S-boxes is then subjected to the straight permutation with the rule shown
in the following illustration:
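The S-box rule (outer two bits select the row, middle four bits select the column) can be demonstrated with S1, the first of the eight tables from the DES specification (FIPS 46-3):

```python
# S-box S1 from the DES specification (FIPS 46-3); the other seven follow the same rule
S1 = [
    [14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7],
    [ 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8],
    [ 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0],
    [15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13],
]

def sbox_lookup(box, six_bits: int) -> int:
    """Outer two bits select the row, middle four bits select the column."""
    row = ((six_bits >> 4) & 0b10) | (six_bits & 0b01)
    col = (six_bits >> 1) & 0b1111
    return box[row][col]

# Worked example from the specification: input 011011 -> row 01, column 1101
print(f"{sbox_lookup(S1, 0b011011):04b}")   # 0101
```

The 6-to-4-bit compression is what makes the S-boxes non-linear and non-invertible, the source of DES's confusion.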
Key Generation
The round-key generator creates sixteen 48-bit keys out of a 56-bit cipher key. The process of key generation is depicted in
the following illustration −
The logic for Parity drop, shifting, and Compression P-box is given in the DES description.
DES Analysis
DES satisfies both of the desired properties of a block cipher. These two properties make the cipher very strong.
Avalanche effect − A small change in the plaintext results in a very great change in the ciphertext.
Completeness − Each bit of the ciphertext depends on many bits of the plaintext.
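The avalanche effect can be observed with any strong primitive; here SHA-256 stands in (Python's standard library has no DES), with two inputs that differ in a single bit:

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# 'e' (0x65) and 'd' (0x64) differ in exactly one bit, yet the two digests
# differ in roughly half of their 256 bits
h1 = hashlib.sha256(b"avalanche").digest()
h2 = hashlib.sha256(b"avalanchd").digest()
print(f"{bit_diff(h1, h2)} of 256 bits flipped")
```

For a primitive with a good avalanche effect, about half the output bits flip on average for any single-bit input change.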
Over the years, cryptanalysts have found some weaknesses in DES when the keys selected are weak keys. Such keys
must be avoided.
DES has proved to be a very well designed block cipher. There have been no significant cryptanalytic attacks on DES
other than exhaustive key search.
Message Authentication Code (MAC)
Another type of threat that exists for data is the lack of message authentication. In this threat, the user is not sure about the
originator of the message. Message authentication can be provided using the cryptographic techniques that use secret keys
as done in case of encryption.
Message Authentication Code (MAC)
MAC algorithm is a symmetric key cryptographic technique to provide message authentication. For establishing MAC
process, the sender and receiver share a symmetric key K.
Essentially, a MAC is an encrypted checksum generated on the underlying message that is sent along with a message to
ensure message authentication.
The process of using MAC for authentication is depicted in the following illustration −
Let us now try to understand the entire process in detail −
The sender uses some publicly known MAC algorithm, inputs the message and the secret key K and produces a
MAC value.
Similar to a hash, the MAC function also compresses an arbitrarily long input into a fixed-length output. The major
difference between a hash and a MAC is that the MAC uses a secret key during the compression.
The sender forwards the message along with the MAC. Here, we assume that the message is sent in the clear, as we
are concerned with providing message origin authentication, not confidentiality. If confidentiality is required, then the
message needs encryption.
On receipt of the message and the MAC, the receiver feeds the received message and the shared secret key K into
the MAC algorithm and re-computes the MAC value.
The receiver now checks equality of freshly computed MAC with the MAC received from the sender. If they
match, then the receiver accepts the message and assures himself that the message has been sent by the intended
sender.
If the computed MAC does not match the MAC sent by the sender, the receiver cannot determine whether it is the
message that has been altered or the origin that has been falsified. As a bottom line, the receiver safely assumes
that the message is not genuine.
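The whole exchange maps directly onto HMAC, a standard MAC construction available in Python's standard library (the key and the messages below are made-up examples):

```python
import hashlib
import hmac

key = b"shared-secret-K"                 # symmetric key K established beforehand
message = b"transfer 100 to account 42"

# Sender: compute the MAC and forward (message, tag) in the clear
tag = hmac.new(key, message, hashlib.sha256).digest()

# Receiver: recompute the MAC over what was received and compare
def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison

print(verify(key, message, tag))                        # True: accept the message
print(verify(key, b"transfer 900 to account 13", tag))  # False: altered or forged
```

Note the use of hmac.compare_digest rather than ==, so that verification itself does not leak timing information.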
Limitations of MAC
There are two major limitations of MAC, both due to its symmetric nature of operation −
Establishment of Shared Secret
o It can provide message authentication only among pre-decided legitimate users who have a shared key.
o This requires establishment of shared secret prior to use of MAC.
Inability to Provide Non-Repudiation
o Non-repudiation is the assurance that a message originator cannot deny any previously sent messages and
commitments or actions.
o MAC technique does not provide a non-repudiation service. If the sender and receiver get involved in a
dispute over message origination, MACs cannot provide a proof that a message was indeed sent by the
sender.
o Though no third party can compute the MAC, the sender could still deny having sent the message and claim
that the receiver forged it, as it is impossible to determine which of the two parties computed the MAC.
Both these limitations can be overcome by using the public-key based digital signatures discussed in the following section.
Cryptography – Digital Signatures
Digital signatures are the public-key primitives of message authentication. In the physical world, it is common to use
handwritten signatures on handwritten or typed messages. They are used to bind the signatory to the message.
Similarly, a digital signature is a technique that binds a person/entity to digital data. This binding can be independently
verified by the receiver as well as any third party.
Digital signature is a cryptographic value that is calculated from the data and a secret key known only by the signer.
In the real world, the receiver of a message needs assurance that the message belongs to the sender, and the sender should
not be able to repudiate the origination of that message. This requirement is very crucial in business applications, since the
likelihood of a dispute over exchanged data is very high.
Model of Digital Signature
As mentioned earlier, the digital signature scheme is based on public key cryptography. The model of digital signature
scheme is depicted in the following illustration −
The following points explain the entire process in detail −
Each person adopting this scheme has a public-private key pair.
Generally, the key pairs used for encryption/decryption and signing/verifying are different. The private key used
for signing is referred to as the signature key and the public key as the verification key.
Signer feeds data to the hash function and generates hash of data.
Hash value and signature key are then fed to the signature algorithm which produces the digital signature on given
hash. Signature is appended to the data and then both are sent to the verifier.
Verifier feeds the digital signature and the verification key into the verification algorithm. The verification
algorithm gives some value as output.
Verifier also runs same hash function on received data to generate hash value.
For verification, this hash value and output of verification algorithm are compared. Based on the comparison result,
verifier decides whether the digital signature is valid.
Since the digital signature is created with the 'private' key of the signer and no one else can have this key, the signer cannot
repudiate signing the data in the future.
It should be noted that instead of signing the data directly with the signing algorithm, usually a hash of the data is created.
Since the hash of the data is a unique representation of the data, it is sufficient to sign the hash in place of the data. The most
important reason for using the hash instead of the data directly for signing is the efficiency of the scheme.
Let us assume RSA is used as the signing algorithm. As discussed in public key encryption chapter, the encryption/signing
process using RSA involves modular exponentiation.
Signing large data through modular exponentiation is computationally expensive and time consuming. The hash of the data
is a relatively small digest of the data, hence signing a hash is more efficient than signing the entire data.
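The hash-then-sign flow can be sketched with textbook RSA on tiny primes. Every number here is a toy value for illustration; real schemes use large keys and a padding scheme such as RSASSA-PSS.

```python
import hashlib

# Toy RSA signature keypair (tiny primes -- illustration only)
p, q, e = 10007, 10009, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # signature key, kept secret by the signer

def sign(data: bytes) -> int:
    """Hash-then-sign: only the small digest goes through modular exponentiation."""
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(h, d, n)

def verify(data: bytes, signature: int) -> bool:
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(signature, e, n) == h     # the verification key e is public

msg = b"pay 500 to Bob"
sig = sign(msg)
print(verify(msg, sig))                  # True
print(verify(b"pay 5000 to Bob", sig))   # False: the data was modified
```

Signing the fixed-size digest instead of the full message is precisely the efficiency argument made above: the modular exponentiation cost no longer grows with the size of the data.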
Importance of Digital Signature
Out of all cryptographic primitives, the digital signature using public-key cryptography is considered a very important and
useful tool for achieving information security.
Apart from ability to provide non-repudiation of message, the digital signature also provides message authentication and
data integrity. Let us briefly see how this is achieved by the digital signature −
Message authentication − When the verifier validates the digital signature using the public key of the sender, he is
assured that the signature has been created only by the sender, who possesses the corresponding secret private key, and
no one else.
Data Integrity − In case an attacker has access to the data and modifies it, the digital signature verification at the
receiver's end fails. The hash of the modified data and the output provided by the verification algorithm will not match.
Hence, the receiver can safely reject the message, assuming that data integrity has been breached.
Non-repudiation − Since it is assumed that only the signer has knowledge of the signature key, only he can
create a unique signature on given data. Thus the receiver can present the data and the digital signature to a third party
as evidence if any dispute arises in the future.
By adding public-key encryption to digital signature scheme, we can create a cryptosystem that can provide the four
essential elements of security namely − Privacy, Authentication, Integrity, and Non-repudiation.
Encryption with Digital Signature
In many digital communications, it is desirable to exchange encrypted messages rather than plaintext to achieve
confidentiality. In a public-key encryption scheme, the public (encryption) key of the receiver is available in the open
domain, and hence anyone can spoof the sender's identity and send an encrypted message to the receiver.
This makes it essential for users employing PKC for encryption to seek digital signatures along with encrypted data to be
assured of message authentication and non-repudiation.
This can be achieved by combining digital signatures with the encryption scheme. Let us briefly discuss how to achieve this
requirement. There are two possibilities: sign-then-encrypt and encrypt-then-sign.
However, the cryptosystem based on sign-then-encrypt can be exploited by the receiver to spoof the identity of the sender
and send that data to a third party. Hence, this method is not preferred. The process of encrypt-then-sign is more reliable and
widely adopted. This is depicted in the following illustration −
The receiver, after receiving the encrypted data and the signature on it, first verifies the signature using the sender's public
key. After ensuring the validity of the signature, he then retrieves the data through decryption using his private key.
Cryptography – Benefits & Drawbacks
Nowadays, networks have gone global and information has taken the digital form of bits and bytes. Critical
information now gets stored, processed, and transmitted in digital form on computer systems and open communication
channels.
Since information plays such a vital role, adversaries are targeting the computer systems and open communication
channels to either steal the sensitive information or to disrupt the critical information system.
Modern cryptography provides a robust set of techniques to ensure that the malevolent intentions of the adversary are
thwarted while ensuring the legitimate users get access to information. Here in this chapter, we will discuss the benefits
that we draw from cryptography, its limitations, as well as the future of cryptography.
Cryptography – Benefits
Cryptography is an essential information security tool. It provides the four most basic services of information security −
Confidentiality − Encryption techniques can guard information and communication from unauthorized
disclosure and access.
Authentication − The cryptographic techniques such as MAC and digital signatures can protect information
against spoofing and forgeries.
Data Integrity − The cryptographic hash functions play a vital role in assuring users of data
integrity.
Non-repudiation − The digital signature provides the non-repudiation service to guard against disputes that may
arise due to the sender's denial of having passed on a message.
All these fundamental services offered by cryptography have enabled the conduct of business over networks using
computer systems in an extremely efficient and effective manner.
Cryptography – Drawbacks
Apart from the four fundamental elements of information security, there are other issues that affect the effective use of
information −
Strongly encrypted, authentic, and digitally signed information can be difficult to access even for a legitimate
user at a crucial time of decision-making. The network or the computer system can be attacked and rendered non-
functional by an intruder.
High availability, one of the fundamental aspects of information security, cannot be ensured through the use of
cryptography. Other methods are needed to guard against the threats such as denial of service or complete
breakdown of information system.
Another fundamental need of information security, selective access control, also cannot be realized through the
use of cryptography. Administrative controls and procedures are required to be exercised for the same.
Cryptography does not guard against the vulnerabilities and threats that emerge from the poor design of
systems, protocols, and procedures. These need to be fixed through proper design and setting up of a defensive
infrastructure.
Cryptography comes at a cost, in terms of time and money −
o The addition of cryptographic techniques to information processing leads to delay.
o The use of public-key cryptography requires the setting up and maintenance of a public-key infrastructure,
which demands a substantial financial budget.
The security of cryptographic technique is based on the computational difficulty of mathematical problems. Any
breakthrough in solving such mathematical problems or increasing the computing power can render a cryptographic
technique vulnerable.
Future of Cryptography
Elliptic Curve Cryptography (ECC) has already been invented, but its advantages and disadvantages are not yet fully
understood. ECC allows encryption and decryption to be performed in drastically less time, thus allowing a higher amount
of data to be passed with equal security. However, like other methods of encryption, ECC must also be tested and proven
secure before it is accepted for governmental, commercial, and private use.
Quantum computation is a new phenomenon. While modern computers store data using a binary format called a "bit"
in which a "1" or a "0" can be stored; a quantum computer stores data using a quantum superposition of multiple states.
These multiple valued states are stored in "quantum bits" or "qubits". This allows the computation of numbers to be several
orders of magnitude faster than traditional transistor processors.
To comprehend the power of a quantum computer, consider RSA-640, a number with 193 digits, which was factored by
eighty 2.2 GHz computers over the span of 5 months; a single quantum computer could factor it in less than 17 seconds.
Numbers that would typically take billions of years to compute could take only a matter of hours or even minutes with a
fully developed quantum computer.
In view of these facts, modern cryptography will have to look for computationally harder problems or devise completely
new techniques for achieving the goals presently served by modern cryptography.
Behavioral questions will be experience-based and you need a lot of practice to be able to answer them in a satisfactory
manner.
STAR Technique
To answer Behavioral Questions, employ the STAR technique −
S = Situation − (recall an incident in your life that suits the situation)
T = Task − (recall an incident in your life that suits the task)
A = Action − (mention the course of action you opted to address the situation or task)
R = Result − (mention the result of your action and the outcome)
Q − Tell me about an incident where you worked effectively under pressure.
Remember that these are only sample interview answers meant to give a general idea on the approach to
Behavioral Interviews. You need to formulate your own answers to suit the context and scenario asked in the
question.
Sample Behavioral Interview Questions
Q1 − Describe a bad experience you had working with your ex-employer.
Q2 − Describe how you handle disagreement.
Q3 − Explain a situation when you explained a complex idea simply.
Q4 − Describe a time when you had to adapt to a change at work.
Q5 − Describe a time when you made a mistake.
Q6 − Describe a time when you delegated tasks to team-mates.
Q7 − Describe when you were blamed for somebody else's mistake.
Q8 − Describe a difficult situation that you faced and how you handled it.
Q9 − Describe a new suggestion that you had made to your supervisor.
Q10 − Describe when you had to take a judgement on a difficult decision.
It is always advisable to memorize a few keywords on the company‘s needs, problems, or goals. Make sure you visit the
company‘s website before the interview to uncover the needs of this specific job profile, instead of the generalized needs
of the industry.
Sample General Interview Questions
Q1 − Tell me about yourself.
Q2 − What are your greatest strengths?
Q3 − What are your greatest weaknesses?
Q4 − Tell me about an incident you are ashamed of speaking about.
Q5 − Why did you leave (or plan to leave) your present employer?
Q6 − The Silent Treatment.
Q7 − Why should I hire you?
Q8 − Where do you see yourself five years from now?
Q9 − Why do you want to work at our company?
Q10 − Would you lie for the company?
Q11 − Questions on confidential matters.
In Case Interviews, interviewers tend not to mention important figures and details. They want to see if you have a clear
idea of the industry and of the assumptions on which you will solve the problem. In these situations, it's okay to work with
assumed data, but it needs to be based on facts and logic.
Answering Case Interview Questions
Answering case interview questions can be tricky, especially when you don‘t get the facts right. Do use the following tips
to tackle such questions −
Listen carefully − Paraphrasing helps in understanding the question completely before answering.
Take time to think − Because of the sheer number of parameters needed to tackle the issue, candidates are
expected to take some time to ponder the scenario; however, anything more than five minutes would be
excessive.
Ask questions − Interviewers deliberately give incomplete questions to check the candidates‘ understanding of
relevant parameters, so they expect a lot of questions from you which makes the entire interview quite interactive.
Use a logical framework − Apply the principles you learned in business colleges as a framework. Examples
include Porter's Five Forces and the SWOT analysis.
Prioritize objectives − Start addressing the most important objectives and concerns and gradually move towards
relatively non-priority topics.
Try and think outside the box − Many interviewers are on the lookout for employees who can bring in creativity
to their problem-solving process.
Exhibit enthusiasm − Behaving as though you feel it's fun to tackle this kind of problem is integral to showing
how well you'd fit in as a consultant or whatever position you're interviewing for.
Standard Case Interview Questions
Market-Sizing Case Interview Questions
Market-Sizing Case Interview Questions need the candidates to guess the market size for a specific product. To answer
these questions, you need to have a close idea on the population of the country, the male-female ratio, different
demographics, among many other parameters. A few popular examples are −
Q. How many light bulbs are there in Delhi?
Q. How many people read gossip magazines in Mumbai?
Q. How many photocopies are taken in Odisha each year?
Q. How much beer is consumed in the city of Chandigarh?
Business Case Interview Questions
These questions need knowledge on the internal working of a company. Visit their website and collect as much
information as possible on their way of operations.
Q. You are working directly with <company’s name> management team. It is organizing a project designed to increase
the revenue significantly. If you were provided with data and asked to supervise the project, what steps would you take to
ensure its success?
Q. The firm has assigned you to consult <company’s name> intending to drop a product or expand into new markets in
order to increase revenue. What steps would you take to help this company achieve its objective?
Q. You have been assigned to consult <shoe retailer’s name> with stores throughout the nation. Since its revenue is
dropping, the company has proposed to sell food at its stores. How would you advise this client?
Logic Problems
Questions involving logic problems require you to be able to perform mental arithmetic quickly. The following are a few
logic problems.
Q1 − At 3:15, how many degrees are between the two hands of a clock?
Q2 − A firefighter has to get to a burning building as quickly as he can. There are three paths he can take. He can take his
fire engine over a large hill (5 miles) at 8 miles per hour. He can take his fire engine through a windy road (7 miles) at 9
miles per hour. Or he can drive his fire engine along a dirt road (8 miles) at 12 miles per hour. Which way should he
choose?
Q3 − You spend 21 dollars on vegetables at the store. You buy carrots, onions, and celery. The celery cost half the cost of
the onions. The onions cost half the cost of the carrots. How much did the onions cost?
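The arithmetic behind all three answers can be checked in a few lines:

```python
# Q1: angle between the hands at 3:15
hour_angle = (3 + 15 / 60) * 30        # hour hand moves 30 degrees per hour
minute_angle = 15 * 6                  # minute hand moves 6 degrees per minute
print(abs(hour_angle - minute_angle))  # 7.5

# Q2: time in hours for each route; the shortest time wins
routes = {"hill": 5 / 8, "windy road": 7 / 9, "dirt road": 8 / 12}
print(min(routes, key=routes.get))     # hill

# Q3: celery = onions / 2 and onions = carrots / 2, so
# carrots + onions + celery = 2*onions + onions + onions/2 = 3.5 * onions = 21
print(21 / 3.5)                        # 6.0
```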
Network Security
Network Security is the process of taking physical and software preventative measures to protect the underlying
networking infrastructure from unauthorized access, misuse, malfunction, modification, destruction, or improper
disclosure, thereby creating a secure platform for computers, users, and programs to perform.
What is network security?
Network Security is an organization‘s strategy and provisions for ensuring the security of its assets and of all network
traffic. Network security is manifested in an implementation of security hardware, and software. For the purposes of this
discussion, the following approach is adopted in an effort to view network security in its entirety:
1. Policy
2. Enforcement
3. Auditing
Policy
The IT Security Policy is the principal document for network security. Its goal is to outline the rules for ensuring the
security of organizational assets. Employees today utilize several tools and applications to conduct business productively.
Policy that is driven from the organization‘s culture supports these routines and focuses on the safe enablement of these
tools to its employees. The enforcement and auditing procedures for any regulatory compliance an organization is required
to meet must be mapped out in the policy as well.
Enforcement
Most definitions of network security are narrowed to the enforcement mechanism. Enforcement concerns analyzing all
network traffic flows and should aim to preserve the confidentiality, integrity, and availability of all systems and
information on the network. These three principles compose the CIA triad:
Confidentiality - involves the protection of assets from unauthorized entities
Integrity - ensuring the modification of assets is handled in a specified and authorized manner
Availability - a state of the system in which authorized users have continuous access to said assets
Strong enforcement strives to provide CIA to network traffic flows. This begins with a classification of traffic flows by
application, user, and content. As the vehicle for content, all applications must first be identified by the firewall regardless
of port, protocol, evasive tactic, or SSL. Proper application identification allows for full visibility of the content it carries.
Policy management can be simplified by identifying applications and mapping their use to a user identity while inspecting
the content at all times for the preservation of CIA.
The concept of defense in depth is observed as a best practice in network security, prescribing for the network to be
secured in layers. These layers apply an assortment of security controls to sift out threats trying to enter the network:
Access control
Identification
Authentication
Malware detection
Encryption
File type filtering
URL filtering
Content filtering
These layers are built through the deployment of firewalls, intrusion prevention systems (IPS), and antivirus components.
Among the components for enforcement, the firewall (an access control mechanism) is the foundation of network security.
Providing CIA for network traffic flows was difficult to accomplish with previous technologies. Traditional firewalls were
plagued by controls that relied on port/protocol to identify applications (which have since developed evasive
characteristics to bypass those controls) and by the assumption that an IP address equates to a user's identity.
The next generation firewall retains an access control mission, but reengineers the technology; it observes all traffic across
all ports, can classify applications and their content, and identifies employees as users. This enables access controls
nuanced enough to enforce the IT security policy as it applies to each employee of the organization, with no compromise
to security.
Additional services for layering network security to implement a defense in depth strategy 8have been incorporated to the
traditional model as add-on components. Intrusion prevention systems (IPS) and antivirus, for example, are effective tools
for scanning content and preventing malware attacks. However, organizations must be cautious of the complexity and cost
that additional components may add to their network security and, more importantly, must not depend on these additional
components to do the core job of the firewall.
Auditing
The auditing process of network security requires checking back on enforcement measures to determine how well they
have aligned with the security policy. Auditing encourages continuous improvement by requiring organizations to reflect
on the implementation of their policy on a consistent basis. This gives organizations the opportunity to adjust their policy
and enforcement strategy in areas of evolving need.
Introduction to Networking A basic understanding of computer networks is requisite in order to understand the principles of network security. In this section, we'll cover some of the foundations of computer networking, then move on to an overview of some popular networks. Following that, we'll take a more in-depth look at TCP/IP, the network protocol suite that is used to run the Internet and many intranets.
Once we've covered this, we'll go back and discuss some of the threats that managers and administrators of computer
networks need to confront, and then some tools that can be used to reduce the exposure to the risks of network computing.
What is a Network? A ``network'' has been defined[1] as ``any set of interlinking lines resembling a net, a network of roads || an interconnected system, a network of alliances.'' This definition suits our purpose well: a computer network is simply a system of interconnected computers. How they're connected is irrelevant, and as we'll soon see, there are a number of ways to do this.
The ISO/OSI Reference Model The International Standards Organization (ISO) Open Systems Interconnect (OSI) Reference Model defines seven layers of communications types, and the interfaces among them. (See Figure 1.) Each layer depends on the services provided by the layer below it, all the way down to the physical network hardware, such as the computer's network interface card, and the wires that connect the cards together.
An easy way to look at this is to compare this model with something we use daily: the telephone. In order for you and me to
talk when we're out of earshot, we need a device like a telephone. (In the ISO/OSI model, this is at the application layer.)
The telephones, of course, are useless unless they have the ability to translate the sound into electronic pulses that can be
transferred over wire and back again. (These functions are provided in layers below the application layer.) Finally, we get
down to the physical connection: both must be plugged into an outlet that is connected to a switch that's part of the
telephone system's network of switches.
If I place a call to you, I pick up the receiver, and dial your number. This number specifies the central office to which to
send my request, and then which phone from that central office to ring. Once you answer the phone, we begin talking, and
our session has begun. Conceptually, computer networks function exactly the same way.
It isn't important for you to memorize the ISO/OSI Reference Model's layers; but it's useful to know that they exist, and
that each layer cannot work without the services provided by the layer below it.
Figure 1: The ISO/OSI Reference Model
What are some Popular Networks? Over the last 25 years or so, a number of networks and network protocols have been defined and used. We're going to look at two of these networks, both of which are ``public'' networks. Anyone can connect to either of these networks, or they can use types of networks to connect their own hosts (computers) together, without connecting to the public networks. Each type takes a very different approach to providing network services.
UUCP
UUCP (Unix-to-Unix CoPy) was originally developed to connect Unix (surprise!) hosts together. UUCP has since been ported to many different architectures, including PCs, Macs, Amigas, Apple IIs, VMS hosts, everything else you can name, and even some things you can't. Additionally, a number of systems have been developed around the same principles as UUCP.
Batch-Oriented Processing.
UUCP and similar systems are batch-oriented systems: everything that they have to do is added to a queue, and then at some specified time, everything in the queue is processed.
Implementation Environment.
UUCP networks are commonly built using dial-up (modem) connections. This doesn't have to be the case though: UUCP can be used over any sort of connection between two computers, including an Internet connection.
Building a UUCP network is a simple matter of configuring two hosts to recognize each other, and know how to get in
touch with each other. Adding on to the network is simple; if hosts called A and B have a UUCP network between them,
and C would like to join the network, then it must be configured to talk to A and/or B. Naturally, anything that C talks to
must be made aware of C's existence before any connections will work. Now, to connect D to the network, a connection
must be established with at least one of the hosts on the network, and so on. Figure 2 shows a sample UUCP network.
Figure 2: A Sample UUCP Network
In a UUCP network, users are identified in the format host!userid. The ``!'' character (pronounced ``bang'' in networking
circles) is used to separate hosts and users. A bangpath is a string of host(s) and a userid like A!cmcurtin or
C!B!A!cmcurtin. If I am a user on host A and you are a user on host E, I might be known as A!cmcurtin and you as
E!you. Because there is no direct link between your host (E) and mine (A), in order for us to communicate, we need to do
so through a host (or hosts!) that has connectivity to both E and A. In our sample network, C has the connectivity we need.
So, to send me a file, or piece of email, you would address it to C!A!cmcurtin. Or, if you feel like taking the long way
around, you can address me as C!B!A!cmcurtin.
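The bangpath construction just described can be sketched as a shortest-path search over the host links. The link table below is an assumption standing in for Figure 2 (which is not reproduced here), chosen so that C connects A's side of the network to E, as the text describes.

```python
from collections import deque

# Hypothetical UUCP link topology standing in for Figure 2.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "E"],
    "D": ["C"],
    "E": ["C"],
}

def bangpath(graph, src, dst, userid):
    """Breadth-first search for a route, rendered as host!...!userid.
    The sender's own host is dropped, since a UUCP address is written
    relative to the sender."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return "!".join(path[1:] + [userid])
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connectivity between src and dst
```

With these assumed links, a user on E addressing cmcurtin on A gets the shortest path C!A!cmcurtin; the longer C!B!A!cmcurtin route from the text is equally valid, just not what a shortest-path search would pick.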
The ``public'' UUCP network is simply a huge worldwide network of hosts connected to each other.
Popularity.
The public UUCP network has been shrinking in size over the years, with the rise of the availability of inexpensive Internet connections. Additionally, since UUCP connections are typically made hourly, daily, or weekly, there is a fair bit of delay in getting data from one user on a UUCP network to a user on the other end of the network. UUCP isn't very flexible, as it's used for simply copying files (which can be netnews, email, documents, etc.) Interactive protocols (that make applications such as the World Wide Web possible) have become much more the norm, and are preferred in most cases.
However, there are still many people whose needs for email and netnews are served quite well by UUCP, and its
integration into the Internet has greatly reduced the amount of cumbersome addressing that had to be accomplished in
times past.
Security.
UUCP, like any other application, has security tradeoffs. Some strong points for its security are that it is fairly limited in what it can do, and it's therefore more difficult to trick into doing something it shouldn't; it's been around a long time, and most of its bugs have been discovered, analyzed, and fixed; and because UUCP networks are made up of occasional connections to other hosts, it isn't possible for someone on host E to directly make contact with host B, and take advantage of that connection to do something naughty.
On the other hand, UUCP typically works by having a system-wide UUCP user account and password. Any system that
has a UUCP connection with another must know the appropriate password for the uucp or nuucp account. Identifying a
host beyond that point has traditionally been little more than a matter of trusting that the host is who it claims to be, and
that a connection is allowed at that time. More recently, there has been an additional layer of authentication, whereby both
hosts must have the same sequence number, that is, a number that is incremented each time a connection is made.
Hence, if I run host B, I know the uucp password on host A. If, though, I want to impersonate host C, I'll need to connect,
identify myself as C, hope that I've done so at a time that A will allow it, and try to guess the correct sequence number for
the session. While this might not be a trivial attack, it isn't considered very secure.
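A minimal sketch of that sequence-number check, under the assumed semantics above (both ends keep a counter that must match and is bumped after each successful call; the password values and peer table are invented):

```python
# Sketch of UUCP-style password-plus-sequence-number authentication.
# Passwords and counters below are hypothetical.

class HostRecord:
    def __init__(self, password, seq=0):
        self.password = password
        self.seq = seq

# Host A's table of peers it accepts calls from.
PEERS = {"B": HostRecord("uucp-secret-b", seq=41),
         "C": HostRecord("uucp-secret-c", seq=7)}

def accept_call(claimed_host, password, claimed_seq):
    """Accept only if the password AND the sequence number both match,
    then increment the stored counter for the next call."""
    rec = PEERS.get(claimed_host)
    if rec is None or password != rec.password or claimed_seq != rec.seq:
        return False
    rec.seq += 1  # both ends bump their counter after a successful call
    return True
```

An impersonator who knows the password but not the current counter is rejected, and even replaying an observed, once-valid call fails because the counter has since moved on.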
The Internet
Internet: This is a word that I've heard way too often in the last few years. Movies, books, newspapers, magazines, television programs, and practically every other sort of media imaginable has dealt with the Internet recently.
What is the Internet?
The Internet is the world's largest network of networks. When you want to access the resources offered by the Internet, you don't really connect to the Internet; you connect to a network that is eventually connected to the Internet backbone, a network of extremely fast (and incredibly overloaded!) network components. This is an important point: the Internet is a network of networks -- not a network of hosts.
A simple network can be constructed using the same protocols and such that the Internet uses without actually connecting
it to anything else. Such a basic network is shown in Figure 3.
Figure 3: A Simple Local Area Network
I might be allowed to put one of my hosts on one of my employer's networks. We have a number of networks, which are
all connected together on a backbone, that is, a network of our networks. Our backbone is then connected to other
networks, one of which belongs to an Internet Service Provider (ISP) whose backbone is connected to other networks, one of
which is the Internet backbone.
If you have a connection ``to the Internet'' through a local ISP, you are actually connecting your computer to one of their
networks, which is connected to another, and so on. To use a service from my host, such as a web server, you would tell
your web browser to connect to my host. Underlying services and protocols would send packets (small datagrams) with
your query to your ISP's network, and then a network they're connected to, and so on, until it found a path to my
employer's backbone, and to the exact network my host is on. My host would then respond appropriately, and the same
would happen in reverse: packets would traverse all of the connections until they found their way back to your computer,
and you were looking at my web page.
In Figure 4, the network shown in Figure 3 is designated ``LAN 1'' and shown in the bottom-right of the picture. This
shows how the hosts on that network are provided connectivity to other hosts on the same LAN, within the same company,
outside of the company, but in the same ISP cloud, and then from another ISP somewhere on the Internet.
Figure 4: A Wider View of Internet-connected Networks
The Internet is made up of a wide variety of hosts, from supercomputers to personal computers, including every
imaginable type of hardware and software. How do all of these computers understand each other and work together?
TCP/IP: The Language of the Internet TCP/IP (Transmission Control Protocol/Internet Protocol) is the ``language'' of the Internet. Anything that can learn to ``speak TCP/IP'' can play on the Internet. This is functionality that occurs at the Network (IP) and Transport (TCP) layers in the ISO/OSI Reference Model. Consequently, a host that has TCP/IP functionality (such as Unix, OS/2, MacOS, or Windows NT) can easily support applications (such as Netscape's Navigator) that use the network.
Open Design One of the most important features of TCP/IP isn't a technological one: The protocol is an ``open'' protocol, and anyone who wishes to implement it may do so freely. Engineers and scientists from all over the world participate in the IETF (Internet Engineering Task Force) working groups that design the protocols that make the Internet work. Their time is typically donated by their companies, and the result is work that benefits everyone.
IP As noted, IP is a ``network layer'' protocol. This is the layer that allows the hosts to actually ``talk'' to each other. Its duties include carrying datagrams, mapping Internet addresses (such as 10.2.3.4) to physical network addresses (such as 08:00:69:0a:ca:8f), and routing, which takes care of making sure that all of the devices that have Internet connectivity can find their way to each other.
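Routing, in particular, comes down to longest-prefix matching: a router forwards each datagram according to the most specific route that covers the destination address. A toy sketch using Python's `ipaddress` module (the prefixes and interface names are made up):

```python
import ipaddress

# Toy longest-prefix-match routing table; prefixes and next hops are invented.
ROUTES = [
    (ipaddress.ip_network("10.2.0.0/16"), "eth0"),
    (ipaddress.ip_network("10.2.3.0/24"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "isp-uplink"),  # default route
]

def next_hop(addr):
    """Pick the most specific (longest-prefix) matching route, as IP routers do."""
    dest = ipaddress.ip_address(addr)
    matches = [(net, hop) for net, hop in ROUTES if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

The address 10.2.3.4 matches all three routes, but the /24 is the most specific, so it wins; an address outside every specific prefix falls through to the default route.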
Understanding IP
IP has a number of very important features which make it an extremely robust and flexible protocol. For our purposes, though, we're going to focus on the security of IP, or more specifically, the lack thereof.
Attacks Against IP
A number of attacks against IP are possible. Typically, these exploit the fact that IP does not provide a robust mechanism for authentication, that is, proving that a packet came from where it claims it did. A packet simply claims to originate from a given address, and there isn't a way to be sure that the host that sent the packet is telling the truth. This isn't necessarily a weakness, per se, but it is an important point, because it means that the facility of host authentication has to be provided at a higher layer on the ISO/OSI Reference Model. Today, applications that require strong host authentication (such as cryptographic applications) do this at the application layer.
IP Spoofing.
This is where one host claims to have the IP address of another. Since many systems (such as router access control lists) define which packets may and which packets may not pass based on the sender's IP address, this is a useful technique to an attacker: he can send packets to a host, perhaps causing it to take some sort of action.
Additionally, some applications allow login based on the IP address of the person making the request (such as the Berkeley
r-commands)[2]. These are both good examples of how trusting untrustworthy layers can provide security that is -- at best --
weak.
IP Session Hijacking.
This is a relatively sophisticated attack, first described by Steve Bellovin [3]. This is very dangerous, however, because there are now toolkits available in the underground community that allow otherwise unskilled bad-guy-wannabes to perpetrate this attack. IP Session Hijacking is an attack whereby a user's session is taken over and placed under the control of the attacker. If the user was in the middle of email, the attacker is looking at the email, and then can execute any commands he wishes as the attacked user. The attacked user simply sees his session dropped, and may simply login again, perhaps not even noticing that the attacker is still logged in and doing things.
For the description of the attack, let's return to our large network of networks in Figure 4. In this attack, a user on host A is
carrying on a session with host G. Perhaps this is a telnet session, where the user is reading his email, or using a Unix
shell account from home. Somewhere in the network between A and G sits host H which is run by a naughty person. The
naughty person on host H watches the traffic between A and G, and runs a tool which starts to impersonate A to G, and at the
same time tells A to shut up, perhaps trying to convince it that G is no longer on the net (which might happen in the event of
a crash, or major network outage). After a few seconds of this, if the attack is successful, naughty person has ``hijacked''
the session of our user. Anything that the user can do legitimately can now be done by the attacker, illegitimately. As far as
G knows, nothing has happened.
This can be solved by replacing standard telnet-type applications with encrypted versions of the same thing. In this case,
the attacker can still take over the session, but he'll see only ``gibberish'' because the session is encrypted. The attacker will
not have the needed cryptographic key(s) to decrypt the data stream from G, and will, therefore, be unable to do anything
with the session.
TCP TCP is a transport-layer protocol. It needs to sit on top of a network-layer protocol, and was designed to ride atop IP. (Just as IP was designed to carry, among other things, TCP packets.) Because TCP and IP were designed together and wherever you have one, you typically have the other, the entire suite of Internet protocols are known collectively as ``TCP/IP.'' TCP itself has a number of important features that we'll cover briefly.
Guaranteed Packet Delivery
Probably the most important is guaranteed packet delivery. Host A sending packets to host B expects to get acknowledgments back for each packet. If B does not send an acknowledgment within a specified amount of time, A will resend the packet.
Applications on host B will expect a data stream from a TCP session to be complete, and in order. As noted, if a packet is
missing, it will be resent by A, and if packets arrive out of order, B will arrange them in proper order before passing the
data to the requesting application.
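That receiver-side behavior (buffer out-of-order segments, drop retransmitted duplicates, and release only a contiguous in-order stream to the application) can be sketched as follows. This is an illustration of the idea, not of any real TCP implementation; sequence numbers here count whole segments rather than bytes.

```python
# Sketch of TCP-style receiver reassembly: segments arrive in any order,
# possibly duplicated, and are released to the application in order.

def reassemble(segments):
    """segments: iterable of (seq_number, data), numbered 0, 1, 2, ...
    Returns the data delivered to the application, in order, each once."""
    buffer = {}
    expected = 0
    delivered = []
    for seq, data in segments:
        if seq >= expected:               # drop retransmits already delivered
            buffer.setdefault(seq, data)  # drop duplicates still buffered
        while expected in buffer:         # release any contiguous run
            delivered.append(buffer.pop(expected))
            expected += 1
    return delivered
```

Notice that segment 2, arriving before segment 1, is held back until the gap is filled, and a late duplicate of segment 2 is simply discarded.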
This is suited well toward a number of applications, such as a telnet session. A user wants to be sure every keystroke is
received by the remote host, and that it gets every packet sent back, even if this means occasional slight delays in
responsiveness while a lost packet is resent, or while out-of-order packets are rearranged.
It is not suited well toward other applications, such as streaming audio or video, however. In these, it doesn't really matter
if a packet is lost (a lost packet in a stream of 100 won't be distinguishable) but it does matter if they arrive late (i.e.,
because of a host resending a packet presumed lost), since the data stream will be paused while the lost packet is being
resent. Once the lost packet is received, it will be put in the proper slot in the data stream, and then passed up to the
application.
UDP UDP (User Datagram Protocol) is a simple transport-layer protocol. It does not provide the same features as TCP, and is thus considered ``unreliable.'' Again, although this is unsuitable for some applications, it does have much more applicability in other applications than the more reliable and robust TCP.
Lower Overhead than TCP
One of the things that makes UDP nice is its simplicity. Because it doesn't need to keep track of the sequence of packets, whether they ever made it to their destination, etc., it has lower overhead than TCP. This is another reason why it's more suited to streaming-data applications: there's less screwing around that needs to be done with making sure all the packets are there, in the right order, and that sort of thing.
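The contrast shows up directly in the sockets API: a UDP exchange is just a datagram fired at an address, with no handshake, acknowledgment, or retransmission. A minimal sketch over the loopback interface (where delivery is dependable; across a real network, the datagram could simply be lost and `recvfrom` would wait forever without a timeout):

```python
import socket

def udp_roundtrip(message: bytes) -> bytes:
    """Send one UDP datagram from one socket to another on loopback."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))       # let the OS pick a free port
    addr = receiver.getsockname()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(message, addr)          # fire and forget: no handshake
    data, _ = receiver.recvfrom(1024)     # one datagram in, one datagram out
    receiver.close()
    sender.close()
    return data
```

Compare this with TCP, where the same exchange would require `connect`/`accept` to establish a session before any data moved.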
Risk Management: The Game of Security It's very important to understand that in security, one simply cannot say ``what's the best firewall?'' There are two extremes: absolute security and absolute access. The closest we can get to an absolutely secure machine is one unplugged from the network, unplugged from its power supply, locked in a safe, and thrown to the bottom of the ocean. Unfortunately, it isn't terribly useful in this state. A machine
with absolute access is extremely convenient to use: it's simply there, and will do whatever you tell it, without questions, authorization, passwords, or any other mechanism. Unfortunately, this isn't terribly practical, either: the Internet is a bad neighborhood now, and it isn't long before some bonehead will tell the computer to do something like self-destruct, after which, it isn't terribly useful to you.
This is no different from our daily lives. We constantly make decisions about what risks we're willing to accept. When we
get in a car and drive to work, there's a certain risk that we're taking. It's possible that something completely out of control
will cause us to become part of an accident on the highway. When we get on an airplane, we're accepting the level of risk
involved as the price of convenience. However, most people have a mental picture of what an acceptable risk is, and won't
go beyond that in most circumstances. If I happen to be upstairs at home, and want to leave for work, I'm not going to
jump out the window. Yes, it would be more convenient, but the risk of injury outweighs the advantage of convenience.
Every organization needs to decide for itself where between the two extremes of total security and total access they need to
be. A policy needs to articulate this, and then define how that will be enforced with practices and such. Everything that is
done in the name of security, then, must enforce that policy uniformly.
Types And Sources Of Network Threats Now, we've covered enough background information on networking that we can actually get into the security aspects of all of this. First of all, we'll get into the types of threats there are against networked computers, and then some things that can be done to protect yourself against various threats.
Denial-of-Service DoS (Denial-of-Service) attacks are probably the nastiest, and most difficult to address. They are so nasty because they're very easy to launch, difficult (sometimes impossible) to track, and it isn't easy to refuse the requests of the attacker without also refusing legitimate requests for service.
The premise of a DoS attack is simple: send more requests to the machine than it can handle. There are toolkits available in
the underground community that make this a simple matter of running a program and telling it which host to blast with
requests. The attacker's program simply makes a connection on some service port, perhaps forging the packet's header
information that says where the packet came from, and then dropping the connection. If the host is able to answer 20
requests per second, and the attacker is sending 50 per second, obviously the host will be unable to service all of the
attacker's requests, much less any legitimate requests (hits on the web site running there, for example).
Such attacks were fairly common in late 1996 and early 1997, but are now becoming less popular.
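The 20-versus-50 arithmetic above is worth making explicit: under sustained overload, the unserved backlog grows linearly and without bound, so no amount of queueing saves the host.

```python
# Back-of-the-envelope model of the example above: 50 requests/second
# arriving at a host that can service only 20 per second.

def backlog_after(seconds, arrival_rate=50, service_rate=20):
    """Unserved requests accumulated after `seconds` of sustained overload."""
    return max(0, (arrival_rate - service_rate) * seconds)
```

After just one minute of such an attack, the host is 1,800 requests behind; legitimate requests queue behind (or are dropped along with) the attacker's.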
Some things that can be done to reduce the risk of being stung by a denial of service attack include:
Not running your visible-to-the-world servers at a level too close to capacity
Using packet filtering to prevent obviously forged packets from entering into your network address space. Obviously forged packets would include those that claim to come from your own hosts, addresses reserved for private networks as defined in RFC 1918 [4], and the loopback network (127.0.0.0).
Keeping up-to-date on security-related patches for your hosts' operating systems.
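A sketch of such an ingress filter, using Python's `ipaddress` module. The `OUR_NETWORKS` value is a placeholder (a documentation range standing in for your real address space); note that `is_private` covers RFC 1918 plus other special-use ranges, which is fine for this purpose.

```python
import ipaddress

# Placeholder for your own address space (TEST-NET-3 documentation range).
OUR_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def obviously_forged(src: str) -> bool:
    """True if an *inbound* packet's claimed source address could not
    legitimately originate outside: private/special-use ranges, loopback,
    or our own hosts."""
    addr = ipaddress.ip_address(src)
    if addr.is_private or addr.is_loopback:
        return True
    return any(addr in net for net in OUR_NETWORKS)
```

A border router applying this test drops packets claiming to come from 10.x.x.x, 127.x.x.x, or your own prefix before they ever reach an internal host.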
Unauthorized Access ``Unauthorized access'' is a very high-level term that can refer to a number of different sorts of attacks. The goal of these attacks is to access some resource that your machine should not provide the attacker. For example, a host might be a web server, and should provide anyone with requested web pages. However, that host should not provide command shell access without being sure that the person making such a request is someone who should get it, such as a local administrator.
Executing Commands Illicitly
It's obviously undesirable for an unknown and untrusted person to be able to execute commands on your server machines. There are two main classifications of the severity of this problem: normal user access, and administrator access. A normal user can do a number of things on a system (such as read files, mail them to other people, etc.) that an attacker should not be able to do. This might, then, be all the access that an attacker needs. On the other hand, an attacker might wish to make configuration changes to a host (perhaps changing its IP address, putting a start-up script in place to cause the machine to shut down every time it's started, or something similar). In this case, the attacker will need to gain administrator privileges on the host.
Confidentiality Breaches
We need to examine the threat model: what is it that you're trying to protect yourself against? There is certain information that could be quite damaging if it fell into the hands of a competitor, an enemy, or the public. In these cases, it's possible that compromise of a normal user's account on the machine can be enough to cause damage (perhaps in the form of PR, or obtaining information that can be used against the company, etc.)
While many of the perpetrators of these sorts of break-ins are merely thrill-seekers interested in nothing more than to see a
shell prompt for your computer on their screen, there are those who are more malicious, as we'll consider next.
(Additionally, keep in mind that it's possible that someone who is normally interested in nothing more than the thrill could
be persuaded to do more: perhaps an unscrupulous competitor is willing to hire such a person to hurt you.)
Destructive Behavior
Among the destructive sorts of break-ins and attacks, there are two major categories.
Data Diddling.
The data diddler is likely the worst sort, since the fact of a break-in might not be immediately obvious. Perhaps he's toying with the numbers in your spreadsheets, or changing the dates in your projections and plans. Maybe he's changing the account numbers for the auto-deposit of certain paychecks. In any case, rare is the case when you'll come in to work one day, and simply know that something is wrong. An accounting procedure might turn up a discrepancy in the books three or four months after the fact. Trying to track the problem down will certainly be difficult, and once that problem is discovered, how can any of your numbers from that time period be trusted? How far back do you have to go before you think that your data is safe?
Data Destruction.
Some of those who perpetrate attacks are simply twisted jerks who like to delete things. In these cases, the impact on your computing capability -- and consequently your business -- can be nothing less than if a fire or other disaster caused your computing equipment to be completely destroyed.
Where Do They Come From? How, though, does an attacker gain access to your equipment? Through any connection that you have to the outside world. This includes Internet connections, dial-up modems, and even physical access. (How do you know that one of the temps that you've brought in to help with the data entry isn't really a system cracker looking for passwords, data phone numbers, vulnerabilities and anything else that can get him access to your equipment?)
In order to be able to adequately address security, all possible avenues of entry must be identified and evaluated. The
security of that entry point must be consistent with your stated policy on acceptable risk levels.
Lessons Learned From looking at the sorts of attacks that are common, we can divine a relatively short list of high-level practices that can help prevent security disasters, and to help control the damage in the event that preventative measures were unsuccessful in warding off an attack.
Hope you have backups
This isn't just a good idea from a security point of view. Operational requirements should dictate the backup policy, and this should be closely coordinated with a disaster recovery plan, such that if an airplane crashes into your building one night, you'll be able to carry on your business from another location. Similarly, these can be useful in recovering your data in the event of an electronic disaster: a hardware failure, or a break-in that changes or otherwise damages your data.
Don't put data where it doesn't need to be
Although this should go without saying, this doesn't occur to lots of folks. As a result, information that doesn't need to be accessible from the outside world sometimes is, and this can needlessly increase the severity of a break-in dramatically.
Avoid systems with single points of failure
Any security system that can be broken by breaking through any one component isn't really very strong. In security, a degree of redundancy is good, and can help you protect your organization from a minor security breach becoming a catastrophe.
Stay current with relevant operating system patches
Be sure that someone who knows what you've got is watching the vendors' security advisories. Exploiting old bugs is still one of the most common (and most effective!) means of breaking into systems.
Watch for relevant security advisories
In addition to watching what the vendors are saying, keep a close watch on groups like CERT and CIAC. Make sure that at least one person (preferably more) is subscribed to these mailing lists.
Have someone on staff be familiar with security practices
Having at least one person who is charged with keeping abreast of security developments is a good idea. This need not be a technical wizard, but could be someone who is simply able to read advisories issued by various incident response teams, and keep track of various problems that arise. Such a person would then be a wise one to consult with on security related issues, as he'll be the one who knows if web server software version such-and-such has any known problems, etc.
This person should also know the ``dos'' and ``don'ts'' of security, from reading such things as the ``Site Security
Handbook.''[5]
Firewalls As we've seen in our discussion of the Internet and similar networks, connecting an organization to the Internet provides a two-way flow of traffic. This is clearly undesirable in many organizations, as proprietary information is often displayed freely within a corporate intranet (that is, a TCP/IP network, modeled after the Internet that only works within the organization).
In order to provide some level of separation between an organization's intranet and the Internet, firewalls have been
employed. A firewall is simply a group of components that collectively form a barrier between two networks.
A number of terms specific to firewalls and networking are going to be used throughout this section, so let's introduce
them all together.
Bastion host. A general-purpose computer used to control access between the internal (private) network (intranet) and the Internet (or any other untrusted network). Typically, these are hosts running a flavor of the Unix operating system that has been customized in order to reduce its functionality to only what is necessary to support its functions. Many of the general-purpose features have been turned off, and in many cases, completely removed, in order to improve the security of the machine.
Router.
A special purpose computer for connecting networks together. Routers also handle certain functions, such as routing, or managing the traffic on the networks they connect.
Access Control List (ACL). Many routers now have the ability to selectively perform their duties, based on a number of facts about a packet that comes to it. This includes things like origination address, destination address, destination service port, and so on. These can be employed to limit the sorts of packets that are allowed to come in and go out of a given network.
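The first-match-wins evaluation of an ACL, with the implicit deny at the end, can be sketched as follows. The rules below are invented for illustration (192.0.2.0/24 is a documentation range standing in for a DMZ).

```python
import ipaddress

# Toy router ACL: first matching rule wins; unmatched traffic is denied.
# Each rule is (action, source network, destination network, dest port),
# where a port of None means "any port". All values are hypothetical.
ACL = [
    ("permit", "0.0.0.0/0",    "192.0.2.0/24", 80),   # web traffic to DMZ
    ("deny",   "0.0.0.0/0",    "192.0.2.0/24", 23),   # no telnet from outside
    ("permit", "192.0.2.0/24", "0.0.0.0/0",    None), # DMZ hosts may talk out
]

def acl_decision(src, dst, port):
    """Evaluate rules top to bottom; return the first match's action."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for action, src_net, dst_net, rule_port in ACL:
        if (s in ipaddress.ip_network(src_net)
                and d in ipaddress.ip_network(dst_net)
                and (rule_port is None or rule_port == port)):
            return action
    return "deny"  # implicit deny at the end of every ACL
```

Ordering matters: a packet matching an early permit never reaches a later deny, and anything the list doesn't mention at all is silently dropped.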
Demilitarized Zone (DMZ). The DMZ is a critical part of a firewall: it is a network that is neither part of the untrusted network, nor part of the trusted network. But, this is a network that connects the untrusted to the trusted. The importance of a DMZ is tremendous: someone who breaks into your network from the Internet should have to get through several layers in order to successfully do so. Those layers are provided by various components within the DMZ.
Proxy. This is the process of having one host act on behalf of another. A host that has the ability to fetch documents from the Internet might be configured as a proxy server, and hosts on the intranet might be configured as proxy clients. In this situation, when a host on the intranet wishes to fetch the <http://www.interhack.net/> web page, for example, the browser will make a connection to the proxy server, and request the given URL. The proxy server will fetch the document, and return the result to the client. In this way, all hosts on the intranet are able to access resources on the Internet without having the ability to talk directly to the Internet.
Types of Firewalls There are three basic types of firewalls, and we'll consider each of them.
Application Gateways
The first firewalls were application gateways, and are sometimes known as proxy gateways. These are made up of bastion hosts that run special software to act as a proxy server. This software runs at the Application Layer of our old friend the ISO/OSI Reference Model, hence the name. Clients behind the firewall must be proxitized (that is, must know how to use the proxy, and be configured to do so) in order to use Internet services. Traditionally, these have been the most secure, because they don't allow anything to pass by default, but need to have the programs written and turned on in order to begin passing traffic.
Figure 5: A sample application gateway
These are also typically the slowest, because more processes need to be started in order to have a request serviced. Figure 5 shows an application gateway.
Packet Filtering
Packet filtering is a technique whereby routers have ACLs (Access Control Lists) turned on. By default, a router will pass all traffic sent to it, and will do so without any sort of restrictions. Employing ACLs is a method for enforcing your security policy with regard to what sorts of access you allow the outside world to have to your internal network, and vice versa.
There is less overhead in packet filtering than with an application gateway, because the feature of access control is
performed at a lower ISO/OSI layer (typically, the transport or session layer). Due to the lower overhead and the fact that
packet filtering is done with routers, which are specialized computers optimized for tasks related to networking, a packet
filtering gateway is often much faster than its application layer cousins. Figure 6 shows a packet filtering gateway.
Because we're working at a lower level, supporting new applications either comes automatically, or is a simple matter of allowing a specific packet type to pass through the gateway. (Note that the mere possibility of something doesn't automatically make it a good idea; opening things up this way might very well compromise your level of security below what your policy allows.)
There are problems with this method, though. Remember, TCP/IP has absolutely no means of guaranteeing that the source
address is really what it claims to be. As a result, we have to use layers of packet filters in order to localize the traffic. We
can't get all the way down to the actual host, but with two layers of packet filters, we can differentiate between a packet
that came from the Internet and one that came from our internal network. We can identify which network the packet came
from with certainty, but we can't get more specific than that.
Hybrid Systems
In an attempt to marry the security of the application layer gateways with the flexibility and speed of packet filtering, some vendors have created systems that use the principles of both.
Figure 6: A sample packet filtering gateway
In some of these systems, new connections must be authenticated and approved at the application layer. Once this has been
done, the remainder of the connection is passed down to the session layer, where packet filters watch the connection to
ensure that only packets that are part of an ongoing (already authenticated and approved) conversation are being passed.
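The hybrid behavior described above can be sketched as follows. This is a conceptual Python model (the credential check and tuple layout are illustrative): a connection is approved once at the application layer, after which only cheap per-packet state lookups are needed:

```python
# Connection state for the stateful filter: tuples identifying approved conversations.
approved = set()   # (src, dst, dport)

def authenticate(src, dst, dport, credentials):
    """Stand-in for the one-time, expensive application-layer authentication."""
    if credentials == "valid-token":           # illustrative check only
        approved.add((src, dst, dport))
        return True
    return False

def pass_packet(src, dst, dport):
    """Fast path: subsequent packets only need a connection-state lookup."""
    return (src, dst, dport) in approved

authenticate("10.0.0.5", "192.0.2.7", 443, "valid-token")
print(pass_packet("10.0.0.5", "192.0.2.7", 443))   # True: part of an approved conversation
print(pass_packet("10.0.0.9", "192.0.2.7", 443))   # False: never authenticated
```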
Other possibilities include using both packet filtering and application layer proxies. The benefits here include providing a measure of protection for your machines that provide services to the Internet (such as a public web server), as well as providing the security of an application layer gateway to the internal network. Additionally, using this method, an attacker, in order to get to services on the internal network, will have to break through the access router, the bastion host, and the choke router.
So, what's best for me? Lots of options are available, and it makes sense to spend some time with an expert, either in-house, or an experienced consultant who can take the time to understand your organization's security policy, and can design and build a firewall architecture that best implements that policy. Other issues like services required, convenience, and scalability might factor in to the final design.
Some Words of Caution The business of building firewalls is in the process of becoming a commodity market. Along with commodity markets come lots of folks who are looking for a way to make a buck without necessarily knowing what they're doing. Additionally, vendors compete with each other to claim the greatest security, the easiest administration, and the least visibility to end users. In order to try to quantify the potential security of firewalls, some organizations have taken to firewall certifications. The certification of a firewall means nothing more than the fact that it can be configured in such a way that it can pass a series of tests. Similarly, claims about meeting or exceeding U.S. Department of Defense ``Orange Book'' standards, C-2, B-1, and such all simply mean that an organization was able to configure a machine to pass a series of tests. This doesn't mean that it was loaded with the vendor's software at the time, or that the machine was even usable. In fact, one vendor claiming its operating system is ``C-2 Certified'' didn't mention that the operating system only passed the C-2 tests without being connected to any sort of network devices.
Such gauges as market share, certification, and the like are no guarantees of security or quality. Taking a little bit of time
to talk to some knowledgeable folks can go a long way in providing you a comfortable level of security between your
private network and the big, bad Internet.
Additionally, it's important to note that many consultants these days have become much less the advocate of their clients,
and more of an extension of the vendor. Ask any consultants you talk to about their vendor affiliations, certifications, and
whatnot. Ask what difference it makes to them whether you choose one product over another, and vice versa. And then ask
yourself if a consultant who is certified in technology XYZ is going to provide you with competing technology ABC, even
if ABC best fits your needs.
Single Points of Failure
Many ``firewalls'' are sold as a single component: a bastion host, or some other black box that you plug your networks into and get a warm-fuzzy, feeling safe and secure. The term ``firewall'' refers to a number of components that collectively provide the security of the system. Any time there is only one component paying attention to what's going on between the internal and external networks, an attacker has only one thing to break (or fool!) in order to gain complete access to your internal networks.
See the Internet Firewalls FAQ for more details on building and maintaining firewalls.
Secure Network Devices It's important to remember that the firewall is only one entry point to your network. Modems, if you allow them to answer incoming calls, can provide an easy means for an attacker to sneak around (rather than through ) your front door (or, firewall). Just as castles weren't built with moats only in the front, your network needs to be protected at all of its entry points.
Secure Modems; Dial-Back Systems If modem access is to be provided, this should be guarded carefully. The terminal server , or network device that provides dial-up access to your network needs to be actively administered, and its logs need to be examined for strange behavior. Its passwords need to be strong -- not ones that can be guessed. Accounts that aren't actively used should be disabled. In short, it's the easiest way to get into your network from remote: guard it carefully.
There are some remote access systems that have the feature of a two-part procedure to establish a connection. The first part
is the remote user dialing into the system, and providing the correct userid and password. The system will then drop the
connection, and call the authenticated user back at a known telephone number. Once the remote user's system answers that
call, the connection is established, and the user is on the network. This works well for folks working at home, but can be
problematic for users wishing to dial in from hotel rooms and such when on business trips.
Other possibilities include one-time password schemes, where the user enters his userid, and is presented with a
``challenge,'' a string of between six and eight numbers. He types this challenge into a small device that he carries with him
that looks like a calculator. He then presses enter, and a ``response'' is displayed on the LCD screen. The user types the
response, and if all is correct, the login will proceed. These are useful devices for solving the problem of good passwords,
without requiring dial-back access. However, these have their own problems, as they require the user to carry them, and
they must be tracked, much like building and office keys.
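One common way to implement such a challenge-response token is a keyed hash shared between the server and the hand-held device: both compute a function of the challenge and a shared secret, and the login succeeds if the results match. This Python sketch assumes an HMAC-based scheme; the secret, challenge format, and response length are all illustrative:

```python
import hashlib
import hmac

# Shared secret provisioned into both the server and the hand-held device
# (illustrative value only).
SECRET = b"shared-device-secret"

def response(challenge: str, secret: bytes = SECRET) -> str:
    """What the device computes and shows on its LCD for a given challenge."""
    digest = hmac.new(secret, challenge.encode(), hashlib.sha256).hexdigest()
    return digest[:8]          # a short, typeable response

def verify(challenge: str, user_response: str) -> bool:
    """Server-side check: recompute the expected response and compare safely."""
    return hmac.compare_digest(response(challenge), user_response)

challenge = "48152342"         # the six-to-eight digit challenge shown to the user
print(verify(challenge, response(challenge)))   # True: correct device and secret
print(verify(challenge, "zzzzzzzz"))            # False: wrong device or no secret
```

Because each challenge is fresh, an eavesdropper who captures one response cannot replay it against a later challenge, which is the property that makes these schemes stronger than static passwords.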
No doubt many other schemes exist. Take a look at your options, and find out how the vendors' offerings can help you enforce your security policy effectively.
Crypto-Capable Routers A feature being built into some routers is the ability to use session encryption between specified routers. Because traffic traveling across the Internet can be seen by people in the middle who have the resources (and time) to snoop around, these are advantageous for providing secure connectivity between two sites.
See the Snake Oil FAQ [6] for a description of cryptography, ideas for evaluating cryptographic products, and how to
determine which will most likely meet your needs.
Virtual Private Networks Given the ubiquity of the Internet, and the considerable expense in private leased lines, many organizations have been building VPNs (Virtual Private Networks). Traditionally, for an organization to provide connectivity between a main office and a satellite one, an expensive data line had to be leased in order to provide direct connectivity between the two offices. Now, a solution that is often more economical is to provide both offices connectivity to the Internet. Then, using the Internet as the medium, the two offices can communicate.
The danger in doing this, of course, is that there is no privacy on this channel, and it's difficult to provide the other office
access to ``internal'' resources without providing those resources to everyone on the Internet.
VPNs provide the ability for two offices to communicate with each other in such a way that it looks like they're directly
connected over a private leased line. The session between them, although going over the Internet, is private (because the
link is encrypted), and the link is convenient, because each can see each others' internal resources without showing them
off to the entire world.
A number of firewall vendors are including the ability to build VPNs in their offerings, either directly with their base
product, or as an add-on. If you have need to connect several offices together, this might very well be the best way to do it.
Conclusions Security is a very difficult topic. Everyone has a different idea of what ``security'' is, and what levels of risk are acceptable. The key for building a secure network is to define what security means to your organization . Once that has been defined, everything that goes on with the network can be evaluated with respect to that policy. Projects and systems can then be broken down into their components, and it becomes much simpler to decide whether what is proposed will conflict with your security policies and practices.
Many people pay great amounts of lip service to security, but do not want to be bothered with it when it gets in their way.
It's important to build systems and networks in such a way that the user is not constantly reminded of the security system
around him. Users who find security policies and systems too restrictive will find ways around them. It's important to get
their feedback to understand what can be improved, and it's important to let them know why what's been done has been, the
sorts of risks that are deemed unacceptable, and what has been done to minimize the organization's exposure to them.
Security is everybody's business, and only with everyone's cooperation, an intelligent policy, and consistent practices, will
it be achievable.
Session in Java
Calling request.getSession(false) will return null if the session ID is not found or refers to an invalid session; calling request.getSession() (equivalent to request.getSession(true)) will instead create a new session in that case rather than returning null. There is a single HTTP session per visit, as Java session cookies are not stored permanently in the browser.
Tokens are the various Java program elements identified by the compiler. A token is the smallest element of a program that is meaningful to the compiler. Tokens supported in Java include keywords, identifiers, literals (constants), operators, and separators.
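The idea of a compiler splitting source text into categorized tokens can be illustrated with a toy lexer. This Python sketch uses a small made-up keyword set and a handful of token classes; it is not Java's actual lexical grammar:

```python
import re

# Illustrative subset of keywords; a real language defines many more.
KEYWORDS = {"int", "if", "else", "return"}

# One alternation per token class: numeric constant, name, single-char operator.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w+|[A-Za-z_])|([+\-*/=;(){}]))")

def tokenize(source: str):
    """Split source into (category, text) pairs, the way a lexer feeds a parser."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, name, symbol = m.groups()
        if number:
            tokens.append(("CONSTANT", number))
        elif name:
            tokens.append(("KEYWORD" if name in KEYWORDS else "IDENTIFIER", name))
        else:
            tokens.append(("OPERATOR", symbol))
        pos = m.end()
    return tokens

print(tokenize("int x = 42;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='), ('CONSTANT', '42'), ('OPERATOR', ';')]
```

The same character sequence lands in different categories depending on context: "int" is a keyword because it is in the reserved set, while "x" falls through to an identifier.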
Android The Android SDK includes a mobile device emulator — a virtual mobile device that runs on your computer. The
emulator lets you develop and test Android applications without using a physical device.
Mobile application development is a term used to denote the act or process by which application software is developed
for mobile devices, such as personal digital assistants, enterprise digital assistants or mobile phones. These applications
can be pre-installed on phones during manufacturing, or delivered as web applications using server-side or
client-side processing (e.g., JavaScript) to provide an "application-like" experience within a Web browser. Application
software developers also must consider a long array of screen sizes, hardware specifications, and configurations because of
intense competition in mobile software and changes within each of the platforms.[1]
Mobile app development has been
steadily growing, in revenues and jobs created. A 2013 analyst report estimates there are 529,000 direct app economy jobs
within the EU 28 members, 60% of which are mobile app developers.[2]
As part of the development process, mobile user interface (UI) design is also essential in the creation of mobile apps.
Mobile UI considers constraints, contexts, screen, input, and mobility as outlines for design. The user is often the focus of
interaction with their device, and the interface entails components of both hardware and software. User input allows for the
users to manipulate a system, and device's output allows the system to indicate the effects of the users' manipulation.
Mobile UI design constraints include limited attention and form factors, such as a mobile device's screen size for a user's
hand(s). Mobile UI contexts signal cues from user activity, such as location and scheduling that can be shown from user
interactions within a mobile application. Overall, mobile UI design's goal is mainly for an understandable, user-friendly
interface. The UI of mobile apps should: consider users' limited attention, minimize keystrokes, and be task-oriented with a
minimum set of functions. This functionality is supported by mobile enterprise application platforms or integrated
development environments (IDEs).
Mobile UIs, or front-ends, rely on mobile back-ends to support access to enterprise systems. The mobile back-end
facilitates data routing, security, authentication, authorization, working off-line, and service orchestration. This
functionality is supported by a mix of middleware components including mobile application servers, mobile backend as a
service (MBaaS), and service-oriented architecture (SOA) infrastructure.
Criteria for selecting a development platform usually contains the target mobile platforms, existing infrastructure and
development skills. When targeting more than one platform with cross-platform development it is also important to
consider the impact of the tool on the user experience. Performance is another important criterion, as research on mobile
applications indicates a strong correlation between application performance and user satisfaction. Along with performance
and other criteria, the availability of the technology and the project's requirement may drive the development between
native and cross-platform environments. To aid the choice between native and cross-platform environments, some
guidelines and benchmarks have been published. Typically, cross-platform environments are reusable across multiple
platforms, leveraging a native container while using HTML, CSS, and JavaScript for the user interface. In contrast, native
environments are targeted at one platform for each of those environments. For example, Android development occurs in
the Eclipse IDE using Android Developer Tools (ADT) plugins, Apple iOS development occurs using Xcode IDE with
Objective-C and/or Swift, and Windows and BlackBerry each have their own development environments.
Mobile application testing
Mobile applications are first tested within the development environment using emulators and later subjected to field
testing. Emulators provide an inexpensive way to test applications on mobile phones to which developers may not have
physical access. The following are examples of tools used for testing application across the most popular mobile operating
systems.
Google Android Emulator - an Android emulator that is patched to run on a Windows PC as a standalone app, without having to download and install the complete and complex Android SDK. It can be installed and Android compatible apps can be tested on it.
The official Android SDK Emulator - a mobile device emulator which mimics all of the hardware and software features of a typical mobile device (without the calls).
MobiOne Developer - a mobile Web integrated development environment (IDE) for Windows that helps developers to code, test, debug, package and deploy mobile Web applications to devices such as iPhone, BlackBerry, Android, and the Palm Pre. MobiOne Developer was officially declared End of Life by the end of 2014.
TestiPhone - a web browser-based simulator for quickly testing iPhone web applications. This tool has been tested and works using Internet Explorer 7, Firefox 2 and Safari 3.
iPhoney - gives a pixel-accurate web browsing environment and it is powered by Safari. It can be used while developing web sites for the iPhone. It is not an iPhone simulator but instead is designed for web developers who want to create 320 by 480 (or 480 by 320) websites for use with iPhone. iPhoney will only run on OS X 10.4.7 or later.
BlackBerry Simulator - There are a variety of official BlackBerry simulators available to emulate the functionality of actual BlackBerry products and test how the device software, screen, keyboard and trackwheel will work with application.
Windows UI Automation - Testing applications that use the Microsoft UI Automation technology requires Windows Automation API 3.0. It is pre-installed on Windows 7, Windows Server 2008 R2 and later versions of Windows. On other operating systems, it can be installed using Windows Update or downloaded from the Microsoft Web site.
Tools include
eggPlant: A GUI-based automated test tool for mobile applications across all operating systems and devices.
Ranorex: Test automation tools for mobile, web and desktop apps.
Testdroid: Real mobile devices and test automation tools for testing mobile and web apps.
Front-end development tools
Front-end development tools are focused on the user interface and user experience (UI-UX) and provide the following
abilities:
UI design tools
SDKs to access device features
Cross-platform accommodations/support
Back-end servers
Back-end tools pick up where the front-end tools leave off, and provide a set of reusable services that are centrally
managed and controlled and provide the following abilities:
Integration with back-end systems
User authentication-authorization
Data services
Reusable business logic
Security add-on layers
With bring your own device (BYOD) becoming the norm within more enterprises, IT departments often need stop-gap,
tactical solutions that layer atop existing apps, phones, and platform components. Features include:
App wrapping for security
Data encryption
Client actions
Reporting and statistics
Project Management Life Cycle
The Project Management Life Cycle has four phases: Initiation, Planning, Execution and Closure. Each project life cycle
phase is described below, along with the tasks needed to complete it. You can click the links provided, to view more
detailed information on the project management life cycle.
Initiation: Develop a Business Case, Undertake a Feasibility Study, Establish the Project Charter, Appoint the Project Team, Set up the Project Office, Perform Phase Review
Planning: Create a Project Plan, Create a Resource Plan, Create a Financial Plan, Create a Quality Plan, Create a Risk Plan, Create an Acceptance Plan, Create a Communications Plan, Create a Procurement Plan, Contract the Suppliers (Define the Tender Process, Issue a Statement of Work, Issue a Request for Information, Issue a Request for Proposal, Create Supplier Contract), Perform Phase Review
Execution: Build Deliverables; Monitor and Control (Perform Time, Cost, Quality, Change, Risk, Issue, Procurement, Acceptance and Communications Management)
Closure: Perform Project Closure, Review Project Completion
The Project Management Template kit contains all of the tools and templates you need, to complete the project
management life cycle. It also contains a free Project Management Book to help you manage projects. It takes you through
the project lifecycle step-by-step, helping you to deliver projects on time and within budget.
It's also unique, because it:
Applies to all project types and industries
Is used to manage projects of any size
Gives you the complete set of project templates
Explains every step in the project lifecycle in depth!
The Project Management Kit helps:
Project Managers to deliver projects
Consultants to manage client projects
Trainers to teach project management
Students to learn how to manage projects
Project Offices to monitor and control projects
Senior Managers to improve the success of projects.
Project Initiation Phase
The Project Initiation Phase is the 1st phase in the Project Management Life Cycle, as it involves starting up a new
project. You can start a new project by defining its objectives, scope, purpose and deliverables to be produced. You'll also
hire your project team, setup the Project Office and review the project, to gain approval to begin the next phase.
Overall, there are six key steps that you need to take to properly initiate a new project. These Project Initiation steps and
their corresponding templates are shown in the following diagram. Click each link below, to learn how Method123
templates help you to initiate projects.
Activities
1.Develop a Business Case
2.Undertake a Feasibility Study
3.Establish the Project Charter
4.Appoint the Project Team
5.Set up the Project Office
6.Perform a Phase Review
Templates
Business Case
Feasibility Study
Project Charter
Job Description
Project Office Checklist
Phase Review Form
The Project Initiation Phase is the most crucial phase in the Project Life Cycle, as it's the phase in which you define your
scope and hire your team.
Project Planning Phase
The Project Planning Phase is the second phase in the project life cycle. It involves creating a set of plans to help
guide your team through the execution and closure phases of the project.
The plans created during this phase will help you to manage time, cost, quality, change, risk and issues. They will also help
you manage staff and external suppliers, to ensure that you deliver the project on time and within budget.
There are 10 Project Planning steps you need to take to complete the Project Planning Phase efficiently. These steps and
the templates needed to perform them, are shown in the following diagram.
Click each link in the diagram below, to learn how these templates will help you to plan projects efficiently.
Activities
1.Create a Project Plan
2.Create a Resource Plan
3.Create a Financial Plan
4.Create a Quality Plan
5.Create a Risk Plan
6.Create an Acceptance Plan
7.Create a Communications Plan
8.Create a Procurement Plan
9.Contract the Suppliers
10.Perform a Phase Review
Templates
Project Plan
Resource Plan
Financial Plan
Quality Plan
Risk Plan
Acceptance Plan
Communications Plan
Procurement Plan
Tender Process
Statement of Work
Request for Information
Request for Proposal
Supplier Contract
Tender Register
Phase Review Form
The Project Planning Phase is often the most challenging phase for a Project Manager, as you need to make an educated
guess of the staff, resources and equipment needed to complete your project. You may also need to plan your
communications and procurement activities, as well as contract any 3rd party suppliers.
Project Execution Phase
The Project Execution Phase is the third phase in the project life cycle. In this phase, you will build the physical project
deliverables and present them to your customer for signoff. The Project Execution Phase is usually the longest phase in the
project life cycle and it typically consumes the most energy and the most resources.
To enable you to monitor and control the project during this phase, you will need to implement a range of management
processes. These processes help you to manage time, cost, quality, change, risks and issues. They also help you to manage
procurement, customer acceptance and communications.
The project management activities and templates which help you complete them are shown in the following diagram. Click
the links below to learn how these templates help you to execute projects more efficiently than before
Activities
1.Perform Time Management
2.Perform Cost Management
3.Perform Quality Management
4.Perform Change Management
5.Perform Risk Management
6.Perform Issue Management
7.Perform Procurement Management
8.Perform Acceptance Management
9.Perform Communications Management
10.Perform a Phase Review
Templates
Time Management Process, Timesheet Form, Timesheet Register
Cost Management Process, Expense Form, Expense Register
Quality Management Process, Quality Review Form, Deliverables Register
Change Management Process, Change Request Form, Change Register
Risk Management Process, Risk Form, Risk Register
Issue Management Process, Issue Form, Issue Register
Procurement Management Process, Purchase Order Form, Procurement Register
Acceptance Management Process, Acceptance Form, Acceptance Register
Communications Management Process, Project Status Report, Communications Register
Phase Review Form
By using these templates to monitor and control the Project Execution Phase, you will improve your chances of delivering
your project on time and within budget.
Project Closure Phase
The Project Closure Phase is the fourth and last phase in the project life cycle. In this phase, you will formally close your
project and then report its overall level of success to your sponsor.
Project Closure involves handing over the deliverables to your customer, passing the documentation to the business,
cancelling supplier contracts, releasing staff and equipment, and informing stakeholders of the closure of the project.
After the project has been closed, a Post Implementation Review is completed to determine the project's success and identify the lessons learned.
The activities taken to close a project and the templates which help you to complete each activity, are shown in the
following diagram. Click the links below to learn how these templates can help you to close projects efficiently.
Activities
1.Perform Project Closure
2.Review Project Completion
Templates
Project Closure Report
Post Implementation Review
The first step taken when closing a project is to create a Project Closure Report. It is extremely important that you list
every activity required to close the project within this Project Closure report, to ensure that project closure is completed
smoothly and efficiently. Once the report has been approved by your sponsor, the closure activities stated in the report are
actioned.
Between one and three months after the project has been closed and the business has begun to experience the benefits
provided by the project, you need to complete a Post Implementation Review. This review allows the business to identify
the level of success of the project and list any lessons learned for future projects.
Data are simply facts or figures — bits of information, but not information itself. When data are processed, interpreted,
organized, structured or presented so as to make them meaningful or useful, they are called information. Information
provides context for data.
A digital signature refers to the encryption / decryption technology on which an electronic signature solution is built. Digital signature encryption secures the data associated with a signed document and helps verify the authenticity of a signed record.
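The sign-with-the-private-key, verify-with-the-public-key idea can be illustrated with textbook RSA on deliberately tiny numbers. This Python sketch is purely pedagogical: real signature schemes use large keys and padding (e.g., RSA-PSS), and the key values here are the classic toy example, not anything usable in practice:

```python
import hashlib

# Toy RSA key: classic textbook numbers. NEVER use keys this small for real.
p, q = 61, 53
n = p * q        # 3233, the public modulus
e = 17           # public exponent
d = 2753         # private exponent: (d * e) mod lcm(p-1, q-1) == 1

def digest(message: bytes) -> int:
    # Reduce the hash into the tiny modulus; a real scheme never does this.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Signing: 'encrypt' the digest with the PRIVATE key."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Verification: 'decrypt' with the PUBLIC key and compare digests."""
    return pow(signature, e, n) == digest(message)

sig = sign(b"signed record")
print(verify(b"signed record", sig))      # True: signature matches the data
print(verify(b"signed record", sig + 1))  # False: signature was tampered with
```

Anyone holding the public key (e, n) can check the signature, but only the holder of d could have produced it; and because the signature is bound to the digest, any change to the signed data invalidates it.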
IPv4 and IPv6 are two generations of Internet Protocols where IPv4 stands for Internet Protocol version 4 and IPv6 for Internet Protocol version 6.
IPv4 is a protocol for use on packet-switched Link Layer networks (e.g. Ethernet). It is one of the core protocols of standards-based inter-networking methods in the Internet, and was the first version deployed for production in the ARPANET in 1983. IPv4 uses 32-bit source and destination address fields which limits the address space to 4.3 billion addresses. This limitation stimulated the development of IPv6 in the 1990s.
IPv6 is more advanced and has better features compared to IPv4. It has the capability to provide an infinite number of addresses. It is replacing IPv4 to accommodate the growing number of networks worldwide and help solve the IP address exhaustion problem. IPv6 was developed by the Internet Engineering Task Force (IETF).
II.10 DIFFERENCE BETWEEN IPv4 AND IPv6
IPv6 is based on IPv4; it is an evolution of IPv4. So many things that we find with IPv6 are familiar to us. The main differences are illustrated in the table below:
Address size: an IPv4 address is 32 bits; an IPv6 address is 128 bits.
Address space: IPv4 supports 4.3×10^9 (4.3 billion) addresses, which is inadequate to give one (or more, if they possess more than one device) to every living person. IPv6 supports about 3.4×10^38 addresses, or 5×10^28 (50 octillion) for each of the roughly 6.5 billion people alive today.
Header: the IPv4 header is 20 bytes and has many fields (13); the IPv6 header is double that, at 40 bytes, but has fewer fields (8).
Addressing: IPv4 is subdivided into classes A-E; IPv6 is classless, using a prefix and an interface identifier (ID).
Subnetting: an IPv4 address uses a subnet mask; IPv6 uses a prefix length.
Security: IPv4 was never designed to be secure (it was originally designed for an isolated military network, then adapted for a public educational and research network); IPv6 has built-in security features such as encryption and authentication.
ISP support: ISPs have IPv4 connectivity, or both IPv4 and IPv6; many ISPs still don't have IPv6 connectivity.
Distribution: IPv4 addresses are unevenly distributed geographically (more than 50% in the USA); IPv6 has no geographic limitation.
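Several of these differences can be observed directly with Python's standard ipaddress module; the addresses below are documentation-range examples:

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")
print(v4.version, v4.max_prefixlen)   # 4 32   (32-bit addresses)
print(v6.version, v6.max_prefixlen)   # 6 128  (128-bit addresses)

# IPv4 networks may be written with a subnet mask; IPv6 always uses a prefix length.
net4 = ipaddress.ip_network("192.0.2.0/255.255.255.0")   # mask form, accepted for IPv4
net6 = ipaddress.ip_network("2001:db8::/64")
print(net4.prefixlen, net4.num_addresses)   # 24 256
print(net6.prefixlen, net6.num_addresses)   # 64 18446744073709551616

# Total address space: 2**32 versus 2**128 (about 3.4e38).
print(2 ** 32, float(2 ** 128))
```

Note that even a single /64 IPv6 subnet, the standard size for one LAN, contains 2^64 addresses, which is more than four billion times the entire IPv4 address space.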
API vs Web Service
API and Web service serve as a means of communication. The only difference is that a
Web service facilitates interaction between two machines over a network. An API acts as
an interface between two different applications so that they can communicate with each
other. An API is a method by which the third-party vendors can write programs that
interface easily with other programs. A Web service is designed to have an interface that
is depicted in a machine-processable format usually specified in Web Service Description
Language (WSDL). Typically, “HTTP” is the most commonly used protocol for
communication. Web service also uses SOAP, REST, and XML-RPC as a means of
communication. API may use any means of communication to initiate interaction
between applications. For example, the system calls are invoked using interrupts by
the Linux kernel API.
An API exactly defines the methods for one software program to interact with the other.
When this action involves sending data over a network, Web services come into the
picture. An API generally involves calling functions from within a software program.
In case of Web applications, the API used is web based. Desktop applications such as
spreadsheets and word documents use VBA and COM-based APIs which don’t involve
Web service. A server application such as Joomla may use a PHP-based API present
within the server which doesn’t require Web service.
A Web service is merely an API wrapped in HTTP. An API doesn’t always need to be web
based. An API consists of a complete set of rules and specifications for a software
program to follow in order to facilitate interaction. A Web service might not contain a
complete set of specifications and sometimes might not be able to perform all the tasks
that may be possible from a complete API.
The APIs can be exposed in a number of ways which include: COM objects, DLL and .H
files in C/C++ programming language, JAR files or RMI in Java, XML over HTTP, JSON
over HTTP, etc. The method used by Web service to expose the API is strictly through a
network.
Summary:
1. All Web services are APIs but all APIs are not Web services.
2. Web services might not perform all the operations that an API would perform.
3. A Web service uses only three styles of use: SOAP, REST and XML-RPC for
communication whereas API may use any style for communication.
4. A Web service always needs a network for its operation whereas an API doesn’t need
a network for its operation.
5. An API facilitates interfacing directly with an application, whereas a Web service is a means of interaction between two machines over a network.
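The point that "a Web service is merely an API wrapped in HTTP" can be sketched with only Python's standard library. The add function and its /add endpoint below are hypothetical, invented purely for illustration:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

# The plain API: a function another program can call directly.
def add(a, b):
    return a + b

# The same API "wrapped in HTTP": a minimal web service.
class AddHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expects a path like /add?a=2&b=3
        qs = parse_qs(urlparse(self.path).query)
        result = add(int(qs["a"][0]), int(qs["b"][0]))
        body = json.dumps({"result": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), AddHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

local_result = add(2, 3)  # direct API call, no network involved
url = f"http://127.0.0.1:{server.server_port}/add?a=2&b=3"
remote_result = json.loads(urlopen(url).read())["result"]  # same call over HTTP
print(local_result, remote_result)  # 5 5
server.shutdown()
```

Both calls invoke the same underlying API; only the second one needs a network, which is exactly the distinction drawn in points 4 and 5 above.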
XML vs. XSD
XML, or the Extensible Markup Language, is a standard or set of rules that governs the
encoding of documents into an electronic format. XML goes hand in hand with HTML in
internet usage. XML defines the structure of the document, but not the way the document
is displayed; this is handled by HTML. XSD stands for XML Schema Definition, and is
one of the several XML schema languages that define what can be included inside the
document. An aspect of XSD that people find to be one of its strengths is that it's written
in XML. This means that users who know XML are already familiar with XSD,
eliminating the need to learn another language.
XML does not define any elements or tags that are usable within your document. You can
create any tag to describe any element on your XML document, as long as you follow the
correct structure. An XSD defines elements that can be used in the documents, relating to the actual data with which it is to be encoded. Another positive aspect of having defined elements and data types is that the information will be properly interpreted, because the sender and the receiver know the format of the content. A good example of this is the date. A date expressed as 1/12/2010 can mean either January 12 or December 1st. Declaring a date data type in an XSD document ensures that it follows the format dictated by XSD.
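The date ambiguity above is easy to demonstrate with a short sketch:

```python
from datetime import datetime

s = "1/12/2010"
us = datetime.strptime(s, "%m/%d/%Y")  # read as month/day: January 12
eu = datetime.strptime(s, "%d/%m/%Y")  # read as day/month: December 1
print(us.strftime("%B %d"), "vs", eu.strftime("%B %d"))

# An XSD date (xs:date) is always ISO 8601 (YYYY-MM-DD), so it is unambiguous:
iso = datetime.strptime("2010-12-01", "%Y-%m-%d")
print(iso.date())  # 2010-12-01
```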
As an XSD document still follows the XML structure, it is still validated as an XML
document. In fact, you can use XML parsers to parse XSD documents, and it will perform
flawlessly, and produce the right information from the file. The reverse is not necessarily
true, as an XML document may contain elements that an XSD parser may not recognize.
XML only checks how well-formed the document is. This can be a problem, as a well-
formed document can still contain errors. XSD validating software often catches the
errors that XML validating software might miss.
Summary:
1. XSD is based and written on XML.
2. XSD defines elements and structures that can appear in the document, while XML
does not.
3. XSD ensures that the data is properly interpreted, while XML does not.
4. An XSD document is validated as XML, but the opposite may not always be true.
5. XSD is better at catching errors than XML
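Point 5 can be sketched concretely: a plain well-formedness check happily accepts nonsense content that only a schema validator would reject. The document below is hypothetical, and since Python's standard library cannot validate against an XSD, that step is only noted in a comment:

```python
import xml.etree.ElementTree as ET

# Well-formed XML with nonsense content: a plain XML parser accepts it.
doc = "<order><date>13/45/2010</date><qty>not-a-number</qty></order>"
root = ET.fromstring(doc)  # parses fine: only well-formedness is checked
print(root.find("date").text)

# Malformed XML (mismatched tags) is rejected even without a schema:
try:
    ET.fromstring("<order><date>2010-12-01</order>")
except ET.ParseError as e:
    print("not well-formed:", e)

# Checking that <date> is a real xs:date or <qty> an xs:integer requires
# schema validation, e.g. lxml.etree.XMLSchema(...) from the third-party
# lxml package -- the standard library alone does not validate against XSD.
```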
SOAP (Simple Object Access Protocol) and REST (Representational State Transfer) are popular with developers working on system-integration projects. Software architects design the application from various perspectives and also decide, based on various factors, which approach to take to expose a new API to third-party applications. As a software architect, it is good practice to involve your development team lead in the system architecture process. This article, based on my experience, discusses when to use SOAP or REST web services to expose your API to third-party clients.
Web Services Demystified
Web services are part of Service-Oriented Architecture. Web services are used as the model for process decomposition and assembly. I have been involved in discussions where there was some misconception about the difference between web services and web APIs.
The W3C defines a Web Service generally as:
A software system designed to support interoperable machine-to-machine interaction over a network.
Web API also known as Server-Side Web API is a programmatic interface to a defined request-response message system, typically expressed in JSON or XML, which is exposed
via the web – most commonly by means of an HTTP-based web server. (extracted from Wikipedia)
Based on the above definitions, one might infer when SOAP should be used instead of REST and vice versa, but it is not as simple as it looks. We can agree that web services are not the same as web APIs. Accessing an image over the web is not calling a web service, but retrieving a web resource using its Uniform Resource Identifier. HTTP has a well-defined, standard approach to serving resources to clients and does not require the use of a web service to fulfill their requests.
Why Use REST over SOAP
Developers are passionate people. Let's briefly analyze some of the reasons they mention when considering REST over SOAP:
REST is easier than SOAP
I'm not sure what developers refer to when they argue that REST is easier than SOAP. Based on my experience, depending on the requirements, developing REST services can quickly become just as complex as any other SOA project. What is your service abstracting from the client? What level of security is required? Is your service a long-running asynchronous process? These and many other requirements will increase the level of complexity.
Testability: apparently it is easier to test RESTful web services than their SOAP counterparts. This is only partially true. For simple REST services, developers only have to point their browser to the service endpoint and a result is returned in the response. But what happens once you need to add HTTP headers, pass tokens, validate parameters… This is still testable, but chances are you will require a browser plugin in order to test those features. If a plugin is required, then the ease of testing is exactly the same as using SoapUI to test SOAP-based services.
RESTful Web Services serve JSON, which is faster to parse than XML
This so-called "benefit" relates to consuming web services in a browser. RESTful web services can also serve XML and any other MIME type that you desire. This article is not focused on discussing JSON vs XML, and I wouldn't write a separate article on the topic. JSON relates to JavaScript, and as JS is very close to the web, as in providing interaction on the web alongside HTML and CSS, most developers automatically assume that it is also linked to interacting with RESTful web services. If you didn't know before, I'm sure you can guess that RESTful web services are language agnostic.
Regarding the speed of processing XML markup as opposed to JSON, a performance test conducted by David Lead, Lead Engineer at MarkLogic Inc., found this to be a myth.
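Rather than taking either side on parsing speed, you can measure it on your own payloads with the standard library; the stock-quote record below is a hypothetical example:

```python
import json
import timeit
import xml.etree.ElementTree as ET

# The same (hypothetical) stock-quote record serialized both ways.
json_doc = '{"ticker": "BARC", "price": 154.2, "currency": "GBP"}'
xml_doc = ("<quote><ticker>BARC</ticker><price>154.2</price>"
           "<currency>GBP</currency></quote>")

# Time 10,000 parses of each; results depend on payload shape and parser.
t_json = timeit.timeit(lambda: json.loads(json_doc), number=10_000)
t_xml = timeit.timeit(lambda: ET.fromstring(xml_doc), number=10_000)
print(f"JSON: {t_json:.4f}s   XML: {t_xml:.4f}s for 10,000 parses")
```

For small documents like this the difference is rarely the bottleneck; profile before choosing a format on speed alone.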
REST is built for the Web
Well, this is true according to Roy Fielding's dissertation; after all, he is credited with the creation of the REST architectural style. REST, unlike SOAP, uses the underlying technology for transport and communication between clients and servers. The architectural style is optimized for the modern web architecture. The web has outgrown its initial requirements, and this can be seen through HTML5 and WebSockets standardization. The web has become a platform in its own right, maybe a WebOS. Some applications will require server-side state saving, from financial applications to e-commerce.
Caching
When using REST over HTTP, it will utilize the features available in HTTP, such as caching, security in terms of TLS, and authentication. Architects know that dynamic resources should not be cached. Let's discuss this with an example: we have a RESTful web service that serves stock quotes when provided with a stock ticker. Stock quotes change every millisecond; if we make a request for BARC (Barclays Bank), there is a chance that the quote we received a minute ago will be different two minutes later. This shows that we cannot always use the caching features implemented in the protocol. HTTP caching can be useful for client requests of static content, but if the caching feature of HTTP is not enough for your requirements, then you should also evaluate SOAP, as you will be building your own cache either way, not relying on the protocol.
HTTP Verb Binding
HTTP verb binding is supposedly a feature worth discussing when comparing REST vs SOAP. Many public-facing APIs referred to as RESTful are really only REST-like and do not implement all HTTP verbs in the manner they are supposed to. For example, when creating new resources, most developers use POST instead of PUT; even deletions are often sent as POST requests instead of DELETE.
SOAP also defines a binding to the HTTP protocol. When bound to HTTP, all SOAP requests are sent as POST requests.
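The verb-binding difference can be sketched without sending any network traffic; the requests below are only built, never opened, and the example.com URLs are hypothetical placeholders:

```python
from urllib.request import Request

# REST maps each operation onto an HTTP verb.
create = Request("http://api.example.com/stocks/BARC", data=b"{}", method="PUT")
fetch = Request("http://api.example.com/stocks/BARC")  # GET is the default
delete = Request("http://api.example.com/stocks/BARC", method="DELETE")

# SOAP bound to HTTP sends every operation as a POST (which is also
# urllib's default whenever a request carries a body and no explicit method):
soap = Request("http://api.example.com/soap", data=b"<Envelope/>")

print(create.get_method(), fetch.get_method(),
      delete.get_method(), soap.get_method())  # PUT GET DELETE POST
```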
Security
Security is never mentioned when discussing the benefits of REST over SOAP. Two simple security measures are provided at the HTTP protocol layer: basic authentication and communication encryption through TLS. SOAP security is well standardized through WS-Security. HTTP by itself is not secure, as seen in the news all the time, so web services relying on the protocol need to implement their own rigorous security. Security goes beyond simple authentication and confidentiality; it also includes authorization and integrity. When it comes to ease of implementation, I believe SOAP is at the forefront.
Conclusion
This was meant to be a short blog post, but it seems we got too passionate about the subject.
I accept that there are many other factors to consider when choosing SOAP vs REST, but I will oversimplify it here. For machine-to-machine communications, such as business processing with BPEL, transaction security and integrity, I suggest using SOAP. SOAP binding to HTTP is possible, and XML parsing is not noticeably slower than JSON in the browser. For building a public-facing API, REST is not the undisputed champion. Consider the actual application requirements and evaluate the benefits. The argument that REST is protocol agnostic and works on anything that has a URI is beside the point. According to its creator, REST was conceived for the evolution of the web. Most so-called RESTful web services available on the internet are really only REST-like, as they do not follow the principles of the architectural style. One good thing about working with REST is that applications do not need a service contract a la SOAP (WSDL). WADL was never standardized, and I do not believe developers would implement it; I remember looking for Twitter's WADL in order to integrate with it.
I will leave you to make your own conclusion. There is so much I can write in a blog post. Feel free to leave any comments to keep the discussion going.
REST stands for Representational State Transfer. (It is sometimes spelled "ReST".) It relies on a stateless, client-server, cacheable communications protocol -- and in virtually all cases, the HTTP protocol is used. REST is an architecture style for designing networked applications.
Normalization in DBMS: 1NF, 2NF, 3NF and BCNF
Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.
Normalization of Database
Database normalization is a technique of organizing the data in a database. It is a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used for mainly two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.
Problem Without Normalization
Without normalization, it becomes difficult to handle and update the database without facing data loss. Insertion, update and deletion anomalies are very frequent if the database is not normalized. To understand these anomalies, let us take the example of a Student table.
S_id S_Name S_Address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physics
Update Anomaly: To update the address of a student who occurs twice or more in the table, we will have to update the S_Address column in all of those rows, else the data will become inconsistent.
Insertion Anomaly: Suppose for a new admission we have a student's id (S_id), name and address, but the student has not opted for any subjects yet; then we have to insert NULL there, leading to an insertion anomaly.
Deletion Anomaly: If (S_id) 401 has only one subject and temporarily drops it, then when we delete that row the entire student record will be deleted along with it.
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
First Normal Form (1NF)
As per First Normal Form, no two rows of data may contain a repeating group of information, i.e. each set of columns must have a unique value, such that multiple columns cannot be used to fetch the same row. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
The Primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example consider a table which is not in First normal form
Student Table :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column in which more than one value is saved (e.g. values separated by commas). Instead, we must separate such data into multiple rows.
Student Table following 1NF will be :
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Under First Normal Form, data redundancy increases, as the same data appears in multiple rows, but each row as a whole is unique.
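The 1NF transformation above is mechanical; a small sketch using the same Student data:

```python
# Each unnormalized row holds a comma-separated list of subjects.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# 1NF: split every multi-valued Subject column into separate rows.
first_nf = [
    (student, age, subject.strip())
    for student, age, subjects in unnormalized
    for subject in subjects.split(",")
]

for row in first_nf:
    print(row)
# ('Adam', 15, 'Biology') ... ('Stuart', 17, 'Maths'): 4 rows in total
```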
Second Normal Form (2NF)
As per Second Normal Form, there must not be any partial dependency of any column on the primary key. This means that for a table with a concatenated primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, the table fails Second Normal Form.
In the First Normal Form example there are two rows for Adam, to include the multiple subjects that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient use of space. Also, in the above table in First Normal Form, the candidate key is {Student, Subject}, yet Age depends only on the Student column, which violates Second Normal Form. To achieve Second Normal Form, we split the subjects out into an independent table and match them up using the student names as foreign keys.
New Student Table following 2NF will be :
Student Age
Adam 15
Alex 14
Stuart 17
In the Student table the candidate key will be the Student column, because the only other column, Age, is dependent on it.
New Subject Table introduced for 2NF will be :
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the Subject table the candidate key is the {Student, Subject} combination. Now both of the above tables qualify for Second Normal Form and will never suffer from update anomalies. (There are a few complex cases in which a table in Second Normal Form still suffers update anomalies; Third Normal Form exists to handle those scenarios.)
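The 2NF decomposition can be sketched with an in-memory SQLite database, using the same data. A join reconstructs the 1NF view without storing Age redundantly, and updating Adam's age now touches exactly one row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 2NF: Age depends only on Student, so it moves to its own table.
cur.execute("CREATE TABLE student (name TEXT PRIMARY KEY, age INTEGER)")
cur.execute("CREATE TABLE subject (student TEXT, subject TEXT, "
            "PRIMARY KEY (student, subject))")

cur.executemany("INSERT INTO student VALUES (?, ?)",
                [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
cur.executemany("INSERT INTO subject VALUES (?, ?)",
                [("Adam", "Biology"), ("Adam", "Maths"),
                 ("Alex", "Maths"), ("Stuart", "Maths")])

# A join reconstructs the original 1NF view.
rows = cur.execute("""
    SELECT s.name, s.age, j.subject
    FROM student s JOIN subject j ON j.student = s.name
    ORDER BY s.name, j.subject
""").fetchall()
for row in rows:
    print(row)

# The update anomaly is gone: changing Adam's age touches one row only.
cur.execute("UPDATE student SET age = 16 WHERE name = 'Adam'")
```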
Third Normal Form (3NF)
Third Normal Form requires that every non-prime attribute of a table be dependent on the primary key directly; in other words, no non-prime attribute may be determined by another non-prime attribute. Such a transitive functional dependency should be removed from the table, and the table must also be in Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table :
Student_id Student_name DOB Street city State Zip
In this table Student_id is the primary key, but Street, City and State depend upon Zip. The dependency between Zip and the other fields is called a transitive dependency. Hence, to achieve 3NF, we need to move Street, City and State to a new table, with Zip as its primary key.
New Student_Detail Table :
Student_id Student_name DOB Zip
Address Table :
Zip Street city state
The advantages of removing transitive dependency are:
The amount of data duplication is reduced.
Data integrity is achieved.
Boyce and Codd Normal Form (BCNF)
Boyce-Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is already in BCNF. For a table R to be in BCNF, the following conditions must be satisfied:
R must be in Third Normal Form,
and, for each functional dependency (X -> Y), X should be a super key.
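The "X should be a super key" condition rests on functional dependencies, which can be checked mechanically: X -> Y holds when equal X values always come with equal Y values. A sketch with hypothetical Student_Detail rows, where Zip -> City is the transitive dependency from the 3NF example:

```python
# Check whether the functional dependency X -> Y holds in a set of rows.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in x)
        val = tuple(row[c] for c in y)
        # If this X value was seen before with a different Y value, FD fails.
        if seen.setdefault(key, val) != val:
            return False
    return True

student_detail = [
    {"id": 1, "name": "Adam", "zip": "201301", "city": "Noida"},
    {"id": 2, "name": "Alex", "zip": "132103", "city": "Panipat"},
    {"id": 3, "name": "Stuart", "zip": "180001", "city": "Jammu"},
    {"id": 4, "name": "Adam", "zip": "201301", "city": "Noida"},
]

print(fd_holds(student_detail, ["zip"], ["city"]))  # True: zip -> city
print(fd_holds(student_detail, ["city"], ["id"]))   # False: city repeats with different ids
```

Since zip -> city holds but zip is not a super key of this relation, the table violates BCNF (and 3NF), which is exactly why the address columns move to their own table.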
RDBMS Concepts
A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E.F. Codd. In the relational model, data is represented in terms of tuples (rows).
An RDBMS is used to manage a relational database: an organized collection of tables from which data can be accessed easily. The relational database is the most commonly used kind of database. It consists of a number of tables, and each table has its own primary key.
What is Table ?
In a relational database, a table is a collection of data elements organised in terms of rows and columns. A table is also considered a convenient representation of relations. But a table can have duplicate tuples, while a true relation cannot. The table is the simplest form of data storage. Below is an example of an Employee table.
What is a Column ?
In a relational table, a column is a set of values of a particular type. The term attribute is also used to represent a column. For example, in the Employee table, Name is a column that represents the names of employees.
Database Keys
Keys are a very important part of a relational database. They are used to establish and identify relations between tables. They also ensure that each record within a table can be uniquely identified by a combination of one or more fields within the table.
Super Key
A super key is defined as a set of attributes within a table that uniquely identifies each record in that table. A super key is a superset of a candidate key.
Candidate Key
Candidate keys are defined as the set of fields from which the primary key can be selected. A candidate key is an attribute, or set of attributes, that can act as a primary key for a table, uniquely identifying each record in that table.
Primary Key
A primary key is the candidate key that is most appropriate to be the main key of the table. It is a key that uniquely identifies each record in the table.
Composite Key
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute that makes up a composite key is a simple key in its own right.
Secondary or Alternative key
The candidate keys which are not selected as the primary key are known as secondary keys or alternate keys.
Non-key Attribute
Non-key attributes are attributes other than candidate key attributes in a table.
Non-prime Attribute
Non-prime attributes are attributes that are not part of any candidate key (i.e. not prime attributes).
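A sketch of how a primary key enforces unique identification, using an in-memory SQLite table with the S_id values from the earlier Student example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (401, 'Adam')")

# The primary key uniquely identifies each record, so a duplicate key
# is rejected by the database itself.
try:
    cur.execute("INSERT INTO student VALUES (401, 'Alex')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # UNIQUE constraint violated on student.s_id
```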
E-R Diagram ER-Diagram is a visual representation of data that describes how data is related to each other.
1) Entity
An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented using rectangles. Consider an example of an Organisation. Employee, Manager, Department, Product and many more can be taken as entities from an Organisation.
Weak Entity
A weak entity is an entity that depends on another entity. Weak entities don't have a key attribute of their own. A double rectangle represents a weak entity.
2) Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age, Address etc. can be attributes of a Student. An attribute is represented using an ellipse.
Big data analytics is the process of examining large datasets to uncover hidden patterns, unknown
correlations, market trends, customer preferences and other useful business information
The primary goal of big data analytics is to help companies make more informed business
decisions by enabling data scientists, predictive modelers and other analytics professionals to
analyze large volumes of transaction data, as well as other forms of data that may be untapped by
conventional business intelligence (BI) programs. That could include Web server logs and
Internet clickstream data, social media content and social network activity reports, text from
customer emails and survey responses, mobile-phone call detail records and machine data
captured by sensors connected to the Internet of Things.
Semi-structured and unstructured data may not fit well in traditional data warehouses based
on relational databases. Furthermore, data warehouses may not be able to handle the processing
demands posed by sets of big data that need to be updated frequently or even continually -- for
example, real-time data on the performance of mobile applications or of oil and gas pipelines. As a
result, many organizations looking to collect, process and analyze big data have turned to a newer
class of technologies that includes Hadoop and related tools such
as YARN, MapReduce, Spark, Hive and Pig as well as NoSQL databases. Those technologies
form the core of an open source software framework that supports the processing of large and
diverse data sets across clustered systems.
What is big data technology? Big data is a term for data sets that are so large or complex that traditional data-processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.
What are the three V's of big data? The 3Vs (volume, variety and velocity) are three defining properties or dimensions of big data. Volume refers to the amount of data, variety refers to the number of types of data, and velocity refers to the speed of data processing.
What is data science and analytics? Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics, similar to ..
50 Big Data Platforms and Big Data Analytics Software (the first few):
IBM Big Data Analytics
HP Big Data
SAP Big Data Analytics
Microsoft Big Data
Oracle Big Data Analytics
Talend Open Studio
Teradata Big Data Analytics
SAS Big Data Analytics
What are data analytics tools? Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organizations to make better business decisions, and in the sciences to verify or disprove existing models or theories.
Cloud Foundry is an open source cloud platform as a service (PaaS) on which developers can build, deploy, run and scale applications on public and private cloud models. VMware originally created Cloud Foundry and it is now part of Pivotal Software.
What is Cloud Foundry? Key benefits and a real use case
September 8, 2015 by Vineet Badola
Unlike most other Cloud Computing platform services, which are tied to particular cloud providers, Cloud Foundry is available as a stand-alone software package. You can, of course, deploy it on Amazon's AWS, but you can also host it yourself on your own OpenStack server, or through HP's Helion or VMware's vSphere.
First of all though, to be completely clear, just what is a Cloud Computing platform? There are,
broadly speaking, three major categories of Cloud Computing:
Infrastructure as a Service (IaaS), which provides only a base infrastructure, leaving the end
user responsible for platform and environment configuration necessary to deploy applications.
Amazon‘s AWS and Microsoft Azure are prime examples of IaaS.
Software as a Service (SaaS) like Gmail or Salesforce.com.
Platform as a Service (PaaS), which helps to reduce the development overhead (environment
configuration) by providing a ready-to-use platform. PaaS services can be hosted on top of
infrastructure provided by an IaaS.
Since it's easy to become a bit confused when thinking about cloud platforms, it's important to be able to visualize exactly which elements of the compute ecosystem are whose responsibilities. While there is no precise definition, it's reasonable to say that a platform requires only that you take care of your applications.
With that in mind, the platform layer should be able to provide:
A suitable environment to run an application.
Application life cycle management.
Self-healing capacity.
Centralized management of applications.
Distributed environment.
Easy integration.
Easy maintenance (upgrades etc).
What is Cloud Foundry
Cloud Foundry is an open source cloud computing platform originally developed in-house at
VMware. It is now owned by Pivotal Software, which is a joint venture made up of VMware, EMC,
and General Electric.
Cloud Foundry is optimized to deliver…
Fast application development and deployment.
Highly scalable and available architecture.
DevOps-friendly workflows.
Reduced chance of human error.
Multi-tenant compute efficiencies.
Not only can Cloud Foundry lighten developer workloads but, since Cloud Foundry handles so much
of an application‘s resource management, it can also greatly reduce the overhead burden on your
operations team.
Cloud Foundry's architectural structure includes components and a high-enough level of interoperability to permit…
Integration with development tools.
Application deployment.
Application lifecycle management.
Integration with various cloud providers.
Application Execution.
Although Cloud Foundry supports many languages and frameworks, including Java, Node.js, Go, PHP, Python, and Ruby, not all applications will be a good fit. As with all modern software applications, your project should attempt to follow the Twelve-Factor App standards.
Key benefits of Cloud Foundry:
Application portability.
Application auto-scaling.
Centralized platform administration.
Centralized logging.
Dynamic routing.
Application health management.
Integration with external logging components like Elasticsearch and Logstash.
Role based access for deployed applications.
Provision for vertical and horizontal scaling.
Infrastructure security.
Support for various IaaS providers.
Getting Started with Cloud Foundry
Before deciding whether Cloud Foundry is for you, you'll have to try actually deploying a real application. As I already mentioned, to set up a suitable environment, you will need an infrastructure layer. As of now, Cloud Foundry supports AWS, VMware and OpenStack. Setting up Cloud Foundry on top of VMware might not be the best choice for us, since we'd probably prefer to avoid the extra complexity. Instead, we'll work with Pivotal Web Services (PWS).
PWS provides "Cloud Foundry as a web service," deployed on top of AWS. You'll just need to create an account and you'll automatically get a sixty-day free trial. Getting started isn't a big deal at all.
Hosting Static files in Cloud Foundry
Once you've created your account and set up the command line interface tool, you'll be ready to deploy your application. We're going to use some static files, which means we'll need one folder and a few HTML files. Make sure there's an index.html file among them.
Normally, deploying static files requires a web server like Apache or Nginx. But we're not going to have to worry about that: the platform will automatically take care of any Internet-facing configuration we'll need. You only need to push your application files to the Cloud Foundry environment and everything else will be taken care of.
Now, copy the folder with your files to the machine where you've installed the CLI and log in to the CLI using this API endpoint:
You may need to provide some information
1. Username (the username you used to to log in to your PWS account).
2. Password (the PWS password you created).
3. Organization name (any name will work).
4. Space (select any space where you want your application to be deployed).
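As a rough sketch, the login-and-push flow above can be scripted from Python. The endpoint URL, app name, and folder path below are placeholders, and the choice of the staticfile buildpack is an assumption for serving plain HTML:

```python
import subprocess

API_ENDPOINT = "https://api.run.pivotal.io"  # placeholder PWS API endpoint
APP_NAME = "my-static-site"                  # placeholder app name

def cf_command(*args):
    """Build the argument list for one Cloud Foundry CLI invocation."""
    return ["cf", *args]

def deploy(folder="./static-site"):
    # Log in interactively (the CLI prompts for username, password,
    # organization, and space), then push the folder of static files.
    # The staticfile buildpack serves index.html with no web server setup.
    subprocess.run(cf_command("login", "-a", API_ENDPOINT), check=True)
    subprocess.run(cf_command("push", APP_NAME, "-p", folder,
                              "-b", "staticfile_buildpack"), check=True)
```

Calling `deploy()` on the machine where the CLI is installed runs steps 1-4 above in one go.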
Cloud servers can be configured to provide levels of performance, security and control similar to those of a dedicated server. But instead of being hosted on physical hardware that's solely dedicated to you, they reside on a shared "virtualized" environment that's managed by your cloud hosting provider.
5 Differences between Cloud and Dedicated Servers
There's been a rapid expansion in the number of businesses getting online, and the hosting industry now offers multiple solutions to help them host their data on the right server for their needs. If you are a startup, there are two major hosting options available to you: cloud servers and dedicated servers.
With a cloud server, you don't need to buy or maintain any hardware, as everything is handled by the service provider; with a dedicated server, the user rents or buys the server, software and other resources from the web hosting provider.
To decide which option is right for your business, it is essential to understand the basic differences between them:
1. Availability
Cloud servers rarely go down: in case of any issue, one of the multiple nodes automatically takes over the workload of the failed node, ensuring minimal downtime and maximum network uptime for your website and application.
With dedicated servers, there's a risk of downtime and hardware failure, as they do not have multiple nodes to share the load.
2. Scalability of resources
Increasing or decreasing allotted resources (computing cores, RAM, and storage) to match workloads is quick and simple with a cloud server. ZNetLive's cloud servers have scalable RAM, CPU and storage, plus strong technical resources to boost your website performance.
With a dedicated server, specifications are rigid, and scaling resources is a difficult and time-consuming task.
3. Safety and security
With cloud servers, you have to trust your provider to deliver the services and to take adequate security measures. Cloud service providers ensure data safety through dedicated IT support, secure and encrypted solutions, and firewalls, and they facilitate backup recoveries.
With dedicated servers, you yourself need to take the essential measures, from monitoring server resources to upgrading your dedicated server, to secure your sensitive and confidential business information.
4. Cost-efficiency
One big benefit of cloud server hosting is hourly, resource-based billing: you pay as you go, meaning you pay only for the computing resources you actually use. With cloud servers, the bandwidth, SQL storage and disk space offered are a bit expensive, whereas they are relatively cheap and abundant with dedicated servers.
Dedicated servers are generally billed monthly, and you have to pay a consistent amount irrespective of how much of the server and its resources you actually use.
5. Level of control
With a cloud server, one does not have complete control and is limited to the offerings provided by the service provider.
A dedicated server, however, offers complete control over the machine, as one can add applications, programs and performance-enhancing measures to it.
Should I go for cloud server or dedicated server?
The selection of a server depends entirely upon your business goals and objectives. A cloud server is best suited to e-commerce websites with unpredictable, fluctuating demand; its cost efficiency suits SMB websites; and it is ideal for web hosting providers and for testing new and basic websites.
But if you are aiming for high performance, resilience, reliability and full control, then dedicated servers should be your default choice.
Digital Signatures
A digital signature is a type of electronic signature that offers more security than a traditional electronic signature. When you sign a document with a digital signature, the signature links a “fingerprint” of the document to your identity. Then that information is permanently embedded into the document, and the document will show if someone comes in and tries to tamper with it after you’ve signed it.
"Digital signatures offer tamper evidence, independent verification and a strict adherence to standards, meaning our customers are not left having to rely on us being around simply to prove that signatures took place."
Strictly speaking, a digital signature refers to the encryption/decryption technology on which an electronic signature solution is built. Digital signature encryption secures the data associated with a signed document and helps verify the authenticity of a signed record. Used alone, it cannot capture a person's intent to sign a document or be legally bound to an agreement or contract.
What is an electronic signature? An electronic signature is a way of representing your signature on a computerized document, for
example a delivery slip. The term 'electronic signature' can refer to several different methods of
capturing a signature on a document or device. This includes methods such as using a tablet or mobile
app to capture an image of a handwritten signature. It can also be simply typing your name into a
signature box. A common example of an electronic signature is when you sign for a
delivery on the courier's digital device.
What is a digital signature? A digital signature is much more than an electronic signature. Digital signatures become
intrinsically linked to the content of the digital document using encryption.
Anyone digitally signing a document needs a digital certificate, which is unique to that individual. The certificate contains a public and a private key, known as a 'key pair'. Digital signature software works by performing these steps:
1. The software creates a ‘hash’ of the document content. Hashes are representations of the whole content, including images.
2. The signatory's certificate is then used to encrypt the hash. This combination of hashing and encryption creates an intrinsic connection between the document and the signatory; digital signing in this way ties the two together.
3. The document hash is checked using the public key of the certificate to make sure it can be decrypted. It can be decrypted only if the public key corresponds to the private key used to encrypt the hash.
4. When the signature is checked using the digital signing software, the original document is hashed again and both the original and signed hash are crosschecked. If there’s a difference between them, then the signature is invalidated.
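The four steps above can be sketched with a toy example. The RSA numbers below are textbook-sized and purely illustrative (a real implementation would use a 2048-bit key pair from a proper cryptography library), and the document hash is reduced modulo the toy key so it fits:

```python
import hashlib

# Toy RSA key pair: modulus n, public exponent e, private exponent d.
# Textbook-sized numbers for illustration only.
n, e, d = 3233, 17, 2753

def doc_hash(content: bytes) -> int:
    # Step 1: hash the whole document content (reduced mod n for the toy key).
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % n

def sign(content: bytes) -> int:
    # Step 2: encrypt the hash with the signatory's private key.
    return pow(doc_hash(content), d, n)

def verify(content: bytes, signature: int) -> bool:
    # Steps 3-4: decrypt with the public key and cross-check a fresh hash.
    return pow(signature, e, n) == doc_hash(content)

document = b"I agree to the terms."
sig = sign(document)
print(verify(document, sig))                  # True: the document is intact
print(verify(b"I agree to the termz.", sig))  # tamper check on altered text
```

Verification fails whenever the recomputed hash no longer matches the one encrypted in the signature, which is how changing the document content invalidates the signature.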
Because a digital signature is effectively 'wrapped up' in the content of the document, if anyone tries to change anything about that document content, the signature will also change. This effectively invalidates the signature and indicates that the document has been tampered with.
What's the difference between a digital signature and an electronic signature?
The table below shows a quick, at-a-glance view of some of the key differences between digital signatures and electronic signatures:

Digital Signature | Electronic Signature
Digital signatures are like a lock on a document. If the document changes after the signature is applied, it will show up as an invalidated signature. | Electronic signatures are open to tampering.
Digital signatures are very secure. Hashes cannot be easily undone, and encryption using a digital certificate is highly secure. | Electronic signatures are not based on standards and tend to use proprietary methods, so they are intrinsically less secure.
A digital signature is hard to deny. This is also known as non-repudiation. A digital signature is associated with an individual's private key of a digital certificate. This identifies them as being the signatory, as it is unique. | Electronic signatures are much harder to verify.
Digital signatures are nearly always time stamped. This is very useful in a court of law to tie a person to a signature at a specific day and time. | Electronic signatures can have a time and date associated with the signature, but it is held separate from the signature itself, so is open to abuse.
Digital signatures can hold logs of events, showing when each signature was applied. In advanced digital signature products like ApproveMe, this audit trail can even send out alerts if the log is tampered with. | Audit logs are not easily applied to electronic signatures.
The digital certificates representing the individual signatories give details of the person signing the document, such as full name, email address and company name; they are tied to the document signature through the certificate. | If details of the person placing an electronic signature on a device or document are required, they have to be placed separately from the signature and are not held with the signature itself, and are therefore more open to abuse.
eSign is an online electronic signature service in India that enables an Aadhaar holder to digitally sign a
document. With an Aadhaar number and OTP or biometric authentication, an Indian citizen can sign a document remotely without being physically
present.
eSign – Online Digital Signature Service
Introduction
For creating electronic signatures, the signer is required to obtain a Digital Signature Certificate (DSC) from a Certifying Authority (CA) licensed by the Controller of Certifying Authorities (CCA) under the Information Technology (IT) Act, 2000. Before a CA issues a DSC, the identity and address of the signer must be verified. The private key used for creating the electronic signature is stored in a hardware cryptographic token, which is secured with a password/PIN. This current scheme of in-person physical presence, paper-based identity and address verification, and issuance of hardware cryptographic tokens does not scale to a billion people. For offering fully paperless citizen services, mass adoption of digital signatures is necessary. A simple-to-use online service is required to give everyone the ability to digitally sign electronic documents.
eSign
eSign is an online electronic signature service which can be integrated with service delivery applications via an open API to enable an Aadhaar holder to digitally sign a document. The online electronic signature service is facilitated by authenticating the Aadhaar holder through the Aadhaar e-KYC service.
Salient Features of eSign
Save cost and time
Improve user convenience
Easily apply Digital Signature
Verifiable Signatures and Signatory
Legally recognized
Managed by Licensed CAs
Privacy concerns addressed
Simple Signature verification
Short validity certificates
Aadhaar e-KYC based authentication
Mandatory Aadhaar ID
Biometric or OTP based authentication
Flexible and fast integration with application
Suitable for individual, business and Government
API subscription model
Assured integrity with complete audit trail
Immediate destruction of keys after usage
No concerns regarding key storage and key protection
Easy and secure way to digitally sign information anywhere, anytime - eSign is an online service for electronic signatures without using a physical cryptographic token. Application service providers use the Aadhaar e-KYC service to authenticate signers and facilitate digital signing of documents.
Facilitates legally valid signatures - The eSign process includes signer consent, the Digital Signature Certificate issuance request, Digital Signature creation and affixing, as well as Digital Signature Certificate acceptance, in accordance with the provisions of the Information Technology Act. It enforces compliance through the API specification and licensing model of the APIs. A comprehensive digital audit trail, built in to confirm the validity of transactions, is also preserved.
Flexible and easy to implement - eSign provides configurable authentication options in line with the Aadhaar e-KYC service and also records the Aadhaar ID used to verify the identity of the signer. The authentication options for e-KYC include biometric (fingerprint or iris scan) or OTP (through the mobile number registered in the Aadhaar database). eSign gives millions of Aadhaar holders easy access to a legally valid Digital Signature service.
Respecting privacy - eSign ensures the privacy of the signer by requiring that only the thumbprint (hash) of the document be submitted for the signature function instead of the whole document.
Secure online service - The eSign service is governed by e-authentication guidelines. While authentication of the signer is carried out using Aadhaar e-KYC services, the signature on the document is carried out on a backend server of the eSign provider. eSign services are facilitated by trusted third-party service providers - currently Certifying Authorities (CAs) licensed under the IT Act. To enhance security and prevent misuse, Aadhaar holders' private keys are created on a Hardware Security Module (HSM) and destroyed immediately after one-time use.
http://www.cca.gov.in/cca/?q=eSign.html
Empanelled eSign Service Providers
List of Providers
eMudhra Ltd.
C-DAC
(n)Code Solutions
NSDL e-Governance Infrastructure Ltd
Careers in Emerging Technology: Databases and Data Science
By Lori Cameron
For this issue of ComputingEdge, we asked Andy Pavlo—assistant professor of databaseology in Carnegie Mellon University's Computer Science Department—about career opportunities in emerging technology fields involving databases and data science. Pavlo's research interests are database management systems—specifically main memory, nonrelational, and transaction-processing systems—and large-scale data analytics. He authored the article "Emerging Hardware Trends in Large-Scale Transaction Processing" in IEEE Internet Computing's May/June 2015 issue.
ComputingEdge: What careers in emerging technologies in your field will see the most growth in the next several years?
Pavlo: Artificial intelligence, more specifically machine learning, will continue to be the hot growth area for the foreseeable future in database- and data-science-related fields. Developers who can design high-performance systems to support complex, data-intensive applications will surely be in demand for several years.
ComputingEdge: What would you advise college students to give them an advantage over the competition?
Pavlo: No company or organization starts a new software project from scratch. Thus, it's good to have the ability to work on existing code bases with little or no guidance or documentation. The ideal employees are those who can start quickly on a project that consists of a large amount of existing code they didn't write. The best way to learn this skill is through practice.
ComputingEdge: What advice would you give people changing careers midstream?
Pavlo: You must always work hard. And you have to stay up to date with the latest database systems, machine-learning tools, and data-analysis frameworks. Luckily, we live in an era where everyone is open-sourcing their software, so it is easier for people to try things out at home. The best way to pick up new skills is to pick a hobby project and then build it out using a new piece of software that you want to learn more about.
ComputingEdge: What do you consider to be the best strategies for professional networking?
Pavlo: You need to be visible. Making a LinkedIn page isn't enough. You must advertise what you have to offer. This means you should write a blog, build out your GitHub portfolio, contribute to open source projects, attend and give talks at meet-ups, and/or volunteer for hackathons. All of this shows potential employers that you are enthusiastic about computers and technology. Every little bit helps.
ComputingEdge: What should applicants keep in mind when applying for emerging-technology jobs?
Pavlo: The field is moving fast, but having a good computer-science foundation will serve you well no matter what the current technology trend is.
Cloud computing is on the cusp of a revolution as companies need faster computing resources to process, store and distribute large amounts of data efficiently. Cloud adoption helps big data businesses deal with terabytes of data through a shared infrastructure.
As technology continues to evolve, the cloud computing market is expected to accelerate. According to Gartner, over US$1 trillion in Information Technology (IT) spending will be directly or indirectly affected by the shift to cloud in the next five years. The move from on-premise to cloud infrastructure will be aided by a number of industry developments in 2017. Here are the top five trends that will shape the cloud computing space in 2017:
Cloud security to be the top priority
Security remains a serious concern with the growing adoption of cloud-based infrastructure globally. The vast amount of data stored on remote servers poses an enormous risk and necessitates implementation of broad security policies across organizations. In the business world, a data breach can mean the loss of millions of customers, their identities, and a company's reputation. Data breach investigations can result in millions in fines and can destroy a business all at one go.
A market research firm estimates the size of the cloud security market at US$8.7 billion in 2019. With the increase in cloud-based infrastructure, cloud security will become an integral part of big data management strategy in 2017.
Cloud-based IT infrastructure to traverse further
The hardware infrastructure for cloud computing will witness considerable investment as enterprises move more workloads off-premise. In the next two years, about one-third of all organizations will be entirely based on the cloud, according to IDG's Enterprise Cloud Computing Survey, 2016. Servers and Ethernet will constitute the majority of spending on cloud-based infrastructure. On the contrary, spending on traditional IT infrastructure is expected to decline in some organizations across geographies.
IDC expects that cloud-based IT infrastructure spending will register a compound annual growth rate (CAGR) of 13.6% to reach US$60.8 billion by 2020. "Demand for cloud services will continue to drive the underlying shift in IT infrastructure spending from on-premise to off-premise deployments," said Natalya Yezhkova, Research Director, Storage Systems at IDC.
Public and Hybrid cloud computing set to grow rapidly
Public cloud is gaining traction as enterprises evince keen interest in hosting their software applications on it. Worldwide public cloud growth will come mainly from infrastructure as a service (IaaS) and software as a service (SaaS), which will account for about 17% and 74% of total workloads, respectively, by 2020. The overall spending on public cloud will be worth US$195 billion by 2020, according to IDC.
As more and more organizations move their applications to the public cloud to save cost, big data vendors will evolve to work in these IT environments. However, productivity, vendor lock-in, and security and privacy concerns will push organizations to embrace a hybrid cloud strategy. This will enable them to shuffle between private and public clouds depending on the workload and business scenario. The hybrid cloud market is expected to register a CAGR of 29% through 2019.
Internet of Things (IoT) and cloud computing to go hand in hand
The convergence of IoT and cloud is opening up new horizons in the technological landscape. Connected things will generate a large amount of data through the cloud. While vendors achieve higher economies of scale, customer costs are reduced. Companies are now able to deploy applications worldwide and save costs on data centers. Moreover, the times ahead will see an upsurge in the management of smart mobile devices in IoT topologies.
The Platform as a Service (PaaS) category, which enables the combination of these two technologies, is expected to see huge growth in a few years. Database, analytics and IoT workloads will account for 22% of total business workloads by 2020. As the industry grows, we are likely to see more strategic collaborations between companies to drive IoT-optimized infrastructure services.
Machine learning to proliferate in cloud
Machine learning databases, applications, and algorithms are becoming pervasive in the cloud platforms. As cloud computing expands, organizations are developing different tools to create intelligent applications and incorporate machine learning in their software services. Infrastructure to support machine learning workloads such as natural language processing and neural networks will be of paramount importance to IT firms in the coming time.
All these developments will spread machine learning into a wide variety of uses and drive the next generation of applications. However, this is just the beginning. Eric Schmidt, Executive Chairman of Alphabet, said "bringing machine learning to the cloud will be a game changer". The path is filled with challenges, and it will be interesting to see how these developments pan out in 2017.
The outlook
Cloud computing has become a viable and mainstream solution to store and process large amounts of data. It holds a tremendous future, with a growing number of applications running on the cloud and offering unlimited, elastic data storage capabilities. Cloud computing is helping enterprises gain efficiencies as they move their operations off-premise to better serve their customers. This transition will positively impact a large number of organizations globally over the next five years and will provide a fillip to the global economy.
References
[1] http://www.gartner.com/newsroom/id/3384720
[2] http://www.marketsandmarkets.com/PressReleases/cloud-security.asp
[3] http://www.idgenterprise.com/resource/research/2016-idg-enterprise-cloud-computing-survey/?utm_campaign=Cloud%20Computing%20Survey%202016&utm_medium=Press%20Release&utm_source=Press%20Release
[4] http://www.einnews.com/pr_news/354178469/global-hybrid-cloud-market-trends-demand-and-analysis-by-2027
[5] http://www.informationweek.com/cloud/infrastructure-as-a-service/gartner-sees-$1-trillion-shift-in-it-spending-to-cloud/d/d-id/1326372
[6] http://blogs.wsj.com/cio/2016/10/05/cloud-it-infrastructure-spending-up/
Sentiment analysis of social media data using Big Data Processing Techniques
Nov 24, 2016, 22:56
Introduction
With the extensive growth in the usage of online social media, a vast amount of data is available reflecting users' preferences regarding products, services provided by various organizations, and political issues. Microblogs and forums are also available where internet users can express their opinions. Since mobile devices can access the network easily from anywhere, social media is becoming more and more popular. The number of people using social media is increasing day by day as they share their personal feelings, and reviews are created at large scale. Opinions and reviews are expressed online every minute, and potential users rely on the reviews, opinions and feedback of other users to make decisions about purchasing an item or, in the case of an organization that provides services, about developing software. Analyzing these reviews, opinions and feedback is therefore of utmost importance.

Evaluating these reviews and opinions is not as easy as it appears to be: it requires performing sentiment analysis. Sentiment analysis greatly helps us in understanding customer behavior. The biggest challenge is processing social data that is in unstructured or semi-structured form, which older technologies fail to handle effectively. So there is a need for a highly optimized, scalable and efficient technology to process the abundant data being produced at a high rate. The Hadoop framework effectively analyzes data in unstructured and semi-structured form.

With the increasing use of Hadoop for processing huge data sets in various fields, maintaining the overall performance of Hadoop becomes inevitable. This is made possible by various open source tools supported by Hadoop, such as Spark, Hive, Flume, Oozie, Zookeeper and Sqoop, which make it even more powerful.
SENTIMENT ANALYSIS
Sentiment is defined as an expression or opinion by an author about an object or an aspect of it. Analyzing, investigating and extracting users' opinions, sentiments and preferences from subjective text is known as sentiment analysis. The main focus of sentiment analysis is parsing the text. In simple terms, sentiment analysis can be defined as detecting the polarity of a text: positive, negative or neutral. It is also referred to as opinion mining, as it derives the opinion of the user. Opinions vary from user to user, and sentiment analysis greatly helps in understanding each user's perspective. A sentiment can be:
Direct opinion: As the name suggests, the opinion about an object is given directly, and it may be either positive or negative. For example, "The video clarity of the cellphone is poor" expresses a direct opinion.
Comparison opinion: A comparative statement consisting of a comparison between two similar objects. The statement "The picture quality of camera-x is better than that of camera-y" is one possible example of a comparative opinion.
Sentiment analysis is performed at three different levels:
Sentiment analysis at sentence level identifies whether the given sentence is subjective or objective. Analysis at sentence level assumes that the sentence contains only one opinion.
Sentiment analysis at document level classifies the opinion about a particular entity. The entire document contains opinions about a single object from a single opinion holder.
Sentiment analysis at feature level extracts the features of a particular object from the reviews and determines whether the stated opinion on each feature is positive or negative. The extracted features are then grouped and a summarized report is produced.
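A sentence-level classifier in the spirit described above can be sketched with a small lexicon-count approach. The positive/negative word lists here are invented placeholders, not a real sentiment lexicon:

```python
# Minimal lexicon-based polarity detection: a simplified sketch of
# sentence-level sentiment analysis.
POSITIVE = {"good", "great", "excellent", "better", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "worse", "awful"}

def polarity(sentence: str) -> str:
    """Classify a sentence as positive, negative or neutral by counting
    lexicon hits (assumes one opinion per sentence, as noted above)."""
    words = sentence.lower().strip(".!?").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("The video clarity of the cellphone is poor"))  # negative
print(polarity("The picture quality of camera-x is better"))   # positive
```

Real systems replace the word sets with a full sentiment lexicon or a trained model, but the polarity decision itself works the same way.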
Architecture components of Big Data Ecosystem
With the explosive growth of data on the Internet and the improvement of corpora, sentiment analysis systems need big data processing techniques to complete their tasks. The term Big Data is represented by three V's: Volume, Variety and Velocity. Volume represents the amount of data used for summarization. Variety represents the different types of data (structured, semi-structured and unstructured) extracted from various sources. Velocity represents the speed of data generation on the internet. For processing large data sets in parallel across a cluster of nodes, Apache came up with an open source framework known as Hadoop.
The major components of Hadoop are the Hadoop distributed file system (HDFS) and the MapReduce programming model. Hadoop is accessible because it runs on cloud computing services or on clusters of commodity machines. It is able to handle failures efficiently even though it is intended to run on commodity hardware, which makes it robust. Any number of nodes can be added to a Hadoop cluster in order to deal with huge data in parallel. Hadoop is simple in that a user can write straightforward parallel code. Data is distributed to each and every node, and hence operations are performed in parallel across the Hadoop cluster. Hadoop tolerates hardware failure by keeping multiple copies of data. The modules of the Hadoop ecosystem are as follows:
1. Hadoop common utilities
Hadoop modules require operating-system-level and file-system-level abstractions, which are provided by Java libraries and utilities. Execution of Hadoop is carried out by the Java files and scripts facilitated by the Hadoop common utilities.
2. Hadoop Distributed file system (HDFS)
Hadoop provides its own filesystem, known as the Hadoop distributed file system, for storing huge data sets. It is based on the Google File System (GFS) and is highly fault-tolerant. HDFS follows a master/slave architecture: the master node manages the file system, while storage of the actual data is taken care of by the slave nodes. A file in the HDFS namespace is divided into several blocks, and these blocks are stored in DataNodes. The mapping of blocks to DataNodes is maintained by the NameNode. The DataNodes perform the read and write operations.
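The block-splitting and NameNode bookkeeping can be illustrated with a toy simulation. The block size, replication factor and DataNode names below are made up for the demonstration (real HDFS defaults to 128 MB blocks and 3 replicas):

```python
import itertools

BLOCK_SIZE = 4          # bytes; tiny, for demonstration only
REPLICATION = 2
DATANODES = ["dn1", "dn2", "dn3"]

def place_blocks(data: bytes):
    """Return a NameNode-style mapping: block index -> (block, replica nodes)."""
    nodes = itertools.cycle(DATANODES)   # round-robin placement, a simplification
    table = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        replicas = [next(nodes) for _ in range(REPLICATION)]
        table[i // BLOCK_SIZE] = (block, replicas)
    return table

for idx, (block, replicas) in place_blocks(b"hello hadoop").items():
    print(idx, block, replicas)
```

Losing one DataNode leaves every block recoverable from its other replica, which is how HDFS survives the hardware failures mentioned above.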
3. MapReduce
MapReduce is the distributed data processing framework of Apache Hadoop. It enables applications to be written effectively and processes huge data sets in parallel. The MapReduce paradigm has two different tasks:
The Map task: The Map task takes the input data and converts it into intermediate tuples, each forming a key/value pair.
The Reduce task: The input to the Reduce task is the output from the Map task. The tuples produced by the Map task are combined to form a smaller set of tuples. The Map task is always followed by the Reduce task.
The MapReduce component of the Hadoop framework schedules and monitors the tasks, and re-executes any failed task. The MapReduce paradigm has a single JobTracker, acting as master, and one TaskTracker per cluster node, acting as slave. The master JobTracker directs the slave TaskTrackers to execute tasks, and it also manages resources and tracks resource distribution, consumption and availability. The TaskTrackers, in turn, report status information to the JobTracker.
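The two tasks can be simulated in a few lines of ordinary Python. The review labels are invented sample data; on a real cluster the same map/shuffle/reduce logic would run as a Hadoop job across many nodes:

```python
from itertools import groupby
from operator import itemgetter

reviews = ["positive", "negative", "positive", "neutral", "positive"]

# Map task: emit a (key, value) pair for each input record.
mapped = [(label, 1) for label in reviews]

# Shuffle: group the intermediate pairs by key
# (Hadoop performs this between the Map and Reduce phases).
mapped.sort(key=itemgetter(0))
grouped = groupby(mapped, key=itemgetter(0))

# Reduce task: combine each key's values into a smaller set of tuples.
counts = {key: sum(v for _, v in pairs) for key, pairs in grouped}
print(counts)  # {'negative': 1, 'neutral': 1, 'positive': 3}
```

Because each (key, value) pair is processed independently in the Map phase and each key independently in the Reduce phase, both phases parallelize naturally across cluster nodes.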
4. Hadoop Yarn framework
YARN provides the computational resources required for application execution. It enables dynamic resource utilization on the Hadoop framework, as users can run various Hadoop applications without having to worry about increasing workloads. YARN has a Resource Manager and Node Managers for scheduling jobs and managing cluster resources. The master is the Resource Manager: it schedules resources by knowing where the slaves are located and how many resources they have. The slave of the infrastructure is the Node Manager: when it starts, it announces itself to the Resource Manager and periodically sends it a heartbeat.
Hadoop proves to be a reliable framework, and it processes huge data sets in a fault-tolerant manner, which makes it efficient. To make Hadoop function methodically, various open source technologies such as Spark, Flume, Hive, Mahout, Sqoop, Oozie and Zookeeper have been developed on top of Hadoop. Collectively called the Hadoop ecosystem, they improve the overall performance of Hadoop.
Figure: Hadoop Ecosystem
Data Access Components of Hadoop Ecosystem
Data access components of Hadoop are Apache Pig and Hive. They are used for analyzing large data sets without the low level work with Map
reduce. Apache Pig is a platform for analysing large data sets . Pig‘s infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs and Pig's language layer currently consists of a textual language called Pig Latin. Hive is a data warehouse system for Hadoop for querying, Summarizing and analysis of large data sets stored in HDFS. It provides SQL like interface. The data stored in HDFS is queried by Hive with the help of HiveQL.
Data Integration Components of Hadoop Ecosystem
Data integration components of the Hadoop ecosystem are Flume and Sqoop. Flume is used for populating Hadoop with data; the collection, aggregation and movement of data is Flume's responsibility. Sqoop (alongside REST and ODBC connectivity) is a tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop.
Cyclomatic Complexity – 40 Years Later
The criticality and risk of software is defined by its complexity. Forty years ago, McCabe introduced his famous cyclomatic complexity (CC)
metric. Today, it is still one of the most popular and meaningful measurements for analyzing code. Read this blog about the measurement and
its value for improving code quality and maintainability...
— Christof Ebert
It is of great benefit for projects to be able to predict software components likely to have a high defect rate or which might be difficult to test and
maintain. It is of even more value having an indicator which can provide constructive guidance on how to improve the quality of code. This is
what the cyclomatic complexity (CC) metric gives us.
The CC metric is simple to calculate and intuitive to understand. It can be taught quickly. Control flows in code are analyzed by counting the
decisions, i.e., the number of linear independent paths through the code under scrutiny. Too many nested decisions make the code more difficult
to understand due to the many potential flows and possibilities of passing through it. In addition, the CC value of a module correlates directly
with the number of test cases necessary for path coverage, so even a rough indication given by the CC metric is of high value to a developer or
project manager.
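Counting decisions can be automated directly from source code. The sketch below uses Python's ast module with deliberately simplified counting rules (if/elif, loops, and/or operators, exception handlers); real tools implement McCabe's full graph-based definition and the switch/case refinements discussed later in this piece.

```python
import ast

# Simplified cyclomatic complexity: CC = number of decision points + 1.
# The counting rules here are a common approximation, not McCabe's
# complete edges - nodes + 2 graph formulation.

def cyclomatic_complexity(source):
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):      # each extra and/or operand
            decisions += len(node.values) - 1   # adds one decision
    return decisions + 1

src = '''
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for i in range(3):
        if i and x:
            x -= 1
    return "positive"
'''
cc = cyclomatic_complexity(src)   # 3 ifs + 1 for + 1 'and' -> CC of 6
```

A CC of 6 here also hints at the number of test cases needed for path coverage of `classify`, which is exactly the link to testing effort made above.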
A high CC thus implies high criticality and the code will have a higher defect density (vis-à-vis code with a relatively lower CC); test effort is
higher and maintainability severely reduced. These relationships are intuitive for students as well as experts and managers and this is another
appealing feature of the CC metric. It is small wonder therefore that CC, unlike many other metrics which have been proposed over the past
decades, is still going strong and is used in almost all tools for criticality prediction and static code analysis.
CC, together with change history, past defects and a selection of design metrics (e.g., level of uninitialized data, method overriding and God
classes) can be used to build a prediction model. Based on a ranked list of module criticality used in a build, different mechanisms
namely refactoring, re-design, thorough static analysis and unit testing with different coverage schemes can then be applied. The CC metric
therefore gives us a starting point for remedial maintenance effort.
Instead of predicting the number of defects or changes (i.e., algorithmic relationships) we consider assignments to classes (e.g., "defect-prone").
While the first goal can be achieved more or less successfully with regression models or neural networks mainly in finished projects, the latter
goal seems to be adequate for predicting potential outliers in running projects, where precision is too expensive and not really necessary for
decision support.
While the benefits of CC are clear, it does need clear counting rules. These days, for instance, we do not count simple "switch" or "case"
statements as multiplicities of "if, then, else" decisions. Moreover, the initial proposal to limit CC to seven plus/minus two per entity is no longer
taken as a hard rule, because boundaries for defect-prone components are rather fuzzy and multi-factorial.
Having identified such overly critical modules, risk management must be applied. The most critical and most complex of the analyzed modules,
for instance, the top 5, are candidates for redesign. For cost reasons mitigation is not only achieved with redesign. The top 20% should have a
thorough static code analysis, and the top 80% should be at least unit tested with C0 coverage of 100%. By concentrating on these critical
components the productivity of quality assurance is increased.
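The triage just described (top 5 for redesign, top 20% for static analysis, top 80% for unit testing with C0 coverage) can be sketched as a ranking over per-module CC values. The module names, CC numbers and threshold defaults below are invented for illustration.

```python
# Rank modules by cyclomatic complexity, then assign remedial actions
# by rank, mirroring the tiered mitigation scheme described above.

def triage(cc_by_module, redesign_top=5, analyse_frac=0.2, test_frac=0.8):
    ranked = sorted(cc_by_module, key=cc_by_module.get, reverse=True)
    n = len(ranked)
    return {
        "redesign": ranked[:redesign_top],
        "static_analysis": ranked[:max(1, int(n * analyse_frac))],
        "unit_test_c0": ranked[:max(1, int(n * test_frac))],
    }

# Hypothetical build with ten modules and their measured CC values.
modules = {f"mod{i}": cc for i, cc in enumerate(
    [42, 35, 28, 27, 21, 19, 15, 12, 9, 5], start=1)}
plan = triage(modules)
```

In practice the ranking would combine CC with change history and design metrics, as the prediction-model paragraph above suggests, rather than CC alone.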
Critical modules should at least undergo a flash review and subsequent refactoring, redesign or rewriting – depending on their complexity, age
and reuse in other projects. Refactoring includes reducing size, improving modularity, balancing cohesion and coupling, and so on. For instance,
apply thorough unit testing with 100 percent C0 coverage (statement coverage) to those modules ranked most critical. Investigate the details of
the selected modules' complexity measurements to determine the redesign approach. Typically, the different complexity measurements will
indicate the approach to follow. Static control flow analysis tools incorporating CC can also find security vulnerabilities such as dead code, often
used as backdoors for hijacking software.
Our own data but also many published empirical studies demonstrate that a high decision-to-decision path coverage or C1 coverage will find
over 50% of defects, thus yielding a strong business case in favor of using CC. On the basis of the results from many of our client projects and
taking a conservative ratio of only 40 percent defects in critical components, criticality prediction can yield at least a 20 percent cost reduction for
defect correction.
The additional costs for the criticality analysis and corrections are in the range of a few person-days per module. The necessary tools, such
as Coverity, Klocwork, Lattix, Structure 101, SonarX and SourceMeter, are off the shelf and account for even less per project. These
criticality analyses provide numerous other benefits, such as the removal of specific code-related risks and defects that otherwise are hard to
identify (for example, security flaws).
CC clearly has its value for criticality prediction and thus for improving code quality and reducing technical debt. Four decades of validity and usage
is a tremendous time in software, and I congratulate McCabe for such a ground-breaking contribution.
More:
Read selected white papers on quality practices from our media-center:
http://consulting.vector.com/vc_download_en.html?product=quality
Read our full article on static code analysis technologies in IEEE Software:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4814967
Author:
Christof Ebert is the managing director of Vector Consulting Services. He is on the IEEE Software editorial board and teaches at the University of
Stuttgart and the Sorbonne in Paris.
DevOps Practice
DevOps breaks organizational silos and thus accelerates delivery. DevOps principles apply not only for cloud and IT services but for most
industries, including critical systems. Read the blog and learn from a recent case study of using DevOps methodologies in critical domains...
— Christof Ebert
DevOps is an organizational shift where instead of distributed silo-like functions cross-functional teams work on continuous operational feature
deliveries. Teams thus deliver value in a faster and continuous way, reducing problems generated by miscommunication between team members
and enhancing a faster resolution of problems. It obviously means a culture shift towards collaboration between development, quality assurance
and operations. At Vector we have supported a number of companies on improving efficiency with DevOps and continuous delivery. Here is a brief
case study from a domain with high safety and security requirements.
A global supplier of critical infrastructure solutions faced overly long cycle time and high rework of delivered upgrades. The overall delivery
process from development to the field took 18 months for new products and up to 3 months for upgrades thus being far too long, even in this
domain. We introduced a DevOps model tailored for these specific environmental constraints. The figure below shows the eight focus areas
mapped to the v-shaped lifecycle abstraction. The key change was the enhanced requirements engineering and delivery model (numbers 1 and
2 in the picture below). By running the automated tests and static and runtime analysis with every check into automatic build management, our
client obtained the capability to discover defects early in the development cycle. Fewer changes during the feature development phase and less
rework due to quality issues directly impacted ROI. Software releases became more consistent and less painful, because tests were run early
and often. The company gained an overall end-to-end cycle time improvement towards 12 months for products and few days for small upgrades
due to better quality and fewer changes.
DevOps principles apply to different delivery models and industries, but must be tailored to the environment and product architecture. Continuous
deliveries are difficult in distributed and critical systems, such as automotive, railway or medical. Nevertheless, delivery processes can be
facilitated in a fast and reliable scheme, as software over-the-air (OTA) upgrades in these industries show. Obviously such delivery models
need dedicated architecture and hardware changes, for instance secure delivery schemes and a hot-swap controller concept, where one half is
operational and the other half builds the next update, which is swapped to active mode after in-depth security checks and verification.
DevOps for such critical systems is more challenging than for cloud and IT services due to the dependence on legacy code and
architecture, and the difficulty of fitting it into a continuous delivery approach.
Mutual understanding from requirements onwards to maintenance, service and product evolution will yield typically a cycle time improvement of
10-30% and cost reduction of up to 20%. As products and life-cycle processes vary, each company needs its own approach towards
a DevOps environment, from architecture to tools and culture.
More:
Read selected white papers on agile practices from our media-center
Directly proceed to the white papers…
Read our full article on DevOps tools and technologies in IEEE Software, May 2016
DevOps is about fast and flexible development and delivery business processes. The blog provides a brief overview on most recent DevOps
technologies and what it means for industry projects. Learn about some best practices for DevOps in this blog...
— Christof Ebert
DevOps efficiently integrates development, delivery and operations and thus facilitates a lean and fluid connection of these traditionally separated silos. It is a software practice that integrates the two worlds of development and operations with automated development, deployment and infrastructure monitoring. It‘s an organizational shift where instead of distributed silo-like functions cross-functional teams work on continuous operational feature deliveries. This integrative approach helps teams deliver value in a faster and continuous way, reducing problems generated by miscommunication between team members and enhancing a faster resolution of problems.
DevOps means a culture shift towards collaboration between development, quality assurance and operations. The generic process is indicated in the figure below. Its promise and goal is to better integrate the development, production and operations business processes with adequate technology, not settling for highly artificial process concepts that will never fly, but rather setting up a continuous delivery process with small upgrades. Companies such as Amazon and Google have led this approach, achieving cycle times of minutes. This obviously depends on the deployment model: a single cloud service is easier to facilitate than actual software deliveries to real products.
DevOps applies to these very different delivery models, but must be tailored to the environment and product architecture. Not all products facilitate continuous deliveries, for instance safety-critical systems. Nevertheless, upgrades can be planned and delivered in a fast and reliable scheme, as the recent evolution of automotive software over-the-air (OTA) upgrades shows. Aside from the highly secured cloud-based delivery model, such delivery models also need dedicated architecture and hardware changes, for instance a hot-swap controller concept, where one half is operational and the other half builds the next update, which is swapped to active mode after in-depth security checks and verification. DevOps for embedded systems is more challenging than for cloud and IT services due to the dependence on legacy code and architecture, and the difficulty of fitting these into a continuous delivery approach.
Modern tools are mandatory to implement a DevOps pipeline, and choosing the right tools for your environment or project is an important step when moving to a DevOps practice. In the build phase the tools need to support fast workflows: build tools help achieve fast iteration by reducing manual, time-consuming tasks, and continuous integration tools merge code from all developers and check for broken
code, improving software quality. During the deployment phase the most important shift is to treat infrastructure as code. With this approach, infrastructure can be shared, tested and version-controlled. A homogeneous infrastructure is shared between development and production, reducing the problems and bugs caused by differences in infrastructure configuration.
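The fail-fast behavior of such a continuous integration pipeline can be sketched as below. The stage names are illustrative and not tied to any particular CI tool; a real pipeline would invoke build systems and test runners instead of toy callables.

```python
# Toy fail-fast pipeline runner: each stage is a callable returning
# True/False; the run stops at the first broken stage, mirroring how
# a CI pipeline rejects a merge as soon as build or tests fail.

def run_pipeline(stages):
    completed = []
    for name, stage in stages:
        if not stage():
            return completed, name        # report the failed stage
        completed.append(name)
    return completed, None                # everything passed

stages = [
    ("build", lambda: True),
    ("unit-tests", lambda: True),
    ("static-analysis", lambda: False),   # simulate a failing check
    ("deploy", lambda: True),
]
done, failed = run_pipeline(stages)       # deploy is never reached
```

Stopping before "deploy" is the whole point: broken code never reaches the shared, version-controlled infrastructure described above.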
At Vector we have supported a number of companies on improving efficiency with DevOps and continuous delivery. A key learning for all companies is that the culture shift should not be underestimated. There are four major challenges which we face in all DevOps projects, namely:
• Break complex architectures and feature sets towards small chunks that can be produced and deployed independently.
• Maintain a configuration and build environment which provides visibility at all times about what is currently deployed with which versions and dependencies.
• Introduce a purpose-built development and production environment from legacy ALM/PLM environments.
• Bridge the traditional silo-type cultures of development (perceived by operations in its thoroughness as cumbersome and expensive) and operations (perceived by developers as quick and dirty).
DevOps is a paradigm shift impacting the entire software and IT industry. Building upon lean and agile practices, DevOps means end-to-end automation in software development and delivery. Hardly anybody will be able to approach it with a cookbook-style approach, but most will benefit from better connecting the previously isolated silos of development and operations. Mutual understanding from requirements onwards to maintenance, service and product evolution will typically yield a cycle time improvement of 10-30% and cost reduction of up to 20%. Major drivers are fewer requirements changes, focused testing and quality assurance, and much faster delivery cycles with feature-driven teams. As products and life-cycle processes vary, each company needs its own approach towards a DevOps environment, from architecture to tools and culture.
Contact me at [email protected] for more information or to discuss these trends.
Key differences between MySQL vs PostgreSQL
MySQL is a relational database management system (RDBMS) currently developed by Oracle with open-source code. This code is available for free under the GNU General Public License, and commercial versions of MySQL are also available under various proprietary agreements. PostgreSQL is an object-relational DBMS (ORDBMS) developed by the PostgreSQL Global Development Group. Its source code is also open, released under the permissive PostgreSQL License. The differences between MySQL and PostgreSQL include the following key categories:
Governance
Supported platforms
Access Methods
Partitioning
Replication
Governance
The governance model around MySQL and PostgreSQL is one of the more significant differences between
the two database technologies. MySQL is controlled by Oracle, whereas Postgres is available under an open-source license from the PostgreSQL Global Development Group. Both are open source, but partly because of this difference in stewardship there has been increasing interest in Postgres over the past few years.
Supported Platforms
Both MySQL and PostgreSQL can run on the Linux, OS X, Solaris and Windows operating systems (OSs). Linux is an open-source OS, OS X is developed by Apple, Solaris is developed by Oracle and Windows is developed by Microsoft. MySQL also supports the FreeBSD OS, which is open source. PostgreSQL supports the HP-UX operating system, which is developed by Hewlett Packard, and the open-source Unix OS.
Access Methods
Access methods that are common to both MySQL and PostgreSQL include ADO.NET, JDBC and ODBC. ADO.NET is a set of Application Programmer Interfaces (APIs) that programmers use to access data based on XML. JDBC is an API for the Java programming language that accesses databases, while ODBC is a standard API for accessing databases. PostgreSQL can also be accessed with routines from the platform's native C library as well as streaming APIs for large objects.
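ADO.NET, JDBC and ODBC all expose the same connect/execute/fetch pattern, and Python's DB-API does too. The sketch below uses the built-in sqlite3 module purely because it needs no server; with a PostgreSQL driver such as psycopg2 or a MySQL connector, essentially only the `connect()` call would differ. The table and data are invented.

```python
import sqlite3

# The generic database access pattern shared by ODBC/JDBC-style APIs:
# open a connection, execute parameterized statements, fetch results.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Parameterized query: the driver handles quoting, avoiding injection.
name = conn.execute(
    "SELECT name FROM users WHERE id = ?", (1,)
).fetchone()[0]
conn.commit()
```

The parameter placeholder syntax (`?` here, `%s` in psycopg2) is one of the few visible differences between drivers following this pattern.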
Partitioning
MySQL and PostgreSQL differ significantly with respect to their partitioning methods, which determine how data is stored on different nodes of the database. MySQL uses a proprietary technology called MySQL Cluster to perform horizontal clustering, which consists of creating multiple clusters with a single cluster instance within each node. PostgreSQL doesn't implement true partitioning, although it can provide a similar capability with table inheritance. This task involves using a separate sub-table to control each "partition."
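The sub-table-per-partition idea can be sketched without a PostgreSQL server. sqlite3 has no table inheritance, so this stand-in emulates it by routing rows to per-year sub-tables and querying a UNION ALL view; in PostgreSQL the child tables would inherit from a parent and the routing would typically be done by triggers. Table names and data are invented.

```python
import sqlite3

# Emulating inheritance-style partitioning: one physical sub-table
# per "partition", plus a combined view for queries over all of them.

conn = sqlite3.connect(":memory:")
for year in (2015, 2016):
    conn.execute(f"CREATE TABLE sales_{year} (year INTEGER, amount REAL)")
conn.execute("CREATE VIEW sales AS "
             "SELECT * FROM sales_2015 UNION ALL SELECT * FROM sales_2016")

def insert_sale(year, amount):
    # Application-side routing to the right partition sub-table.
    conn.execute(f"INSERT INTO sales_{year} VALUES (?, ?)", (year, amount))

insert_sale(2015, 10.0)
insert_sale(2016, 20.0)
insert_sale(2016, 5.0)
total_2016 = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE year = 2016").fetchone()[0]
```

The payoff of the scheme is that dropping a whole partition is just `DROP TABLE sales_2015`, far cheaper than deleting rows from one big table.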
Replication
A database may use multiple methods to store redundant data across multiple nodes. MySQL uses master-master replication, in which each node can update the data. Both MySQL and PostgreSQL can perform master-slave replication, where one node controls the storage of data by the other nodes. PostgreSQL can also handle other types of replication with the implementation of third-party extensions.
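The master-slave arrangement described above can be sketched as a toy in-memory model: writes go to the master, which pushes each change to its replicas, and reads can then be served from any slave. This deliberately ignores failures, replication lag and conflict handling, all of which dominate real replication design.

```python
# Toy master-slave replication: the master owns writes and propagates
# them synchronously to every slave (for brevity; real systems are
# usually asynchronous and must handle lag and failover).

class Node:
    def __init__(self):
        self.data = {}

class Master(Node):
    def __init__(self, slaves):
        super().__init__()
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:      # push the change to each replica
            slave.data[key] = value

slaves = [Node(), Node()]
master = Master(slaves)
master.write("config", "v2")
reads = [s.data["config"] for s in slaves]   # any slave can serve reads
```

Master-master replication would give every node a `write` path, which is exactly what forces the conflict-resolution machinery the simple scheme above avoids.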
Smartphones and tablets are still regarded as the latest craze in technology, but what we should
really turn our attention to is the Internet of Things and the devices that are shaping the future for
it. Those of you who are aware of what it means know that the Internet of Things is just around the
corner and we’re excited to see it evolve with the help of these ingenious technical products. There are
many definitions for the IoT, probably because it’s still a pioneering idea, but I’ll try my best to explain
it as plain as possible.
The Internet of Things is an environment where objects, animals or people have unique identifiers that
allow them to transfer data over a network on their own. The evolution of wireless technologies has
made it possible for the IoT to see rapid growth. Also, the huge address space of IPv6 is another catalyst in the
development of the Internet of Things.
20 Devices to show how the Internet of Things will be
Within eight years, it is expected that various organizations ranging from government to business will
create a market of almost $9 trillion which will be constituted from 212 billion “things” making up the
global Internet of Things in 2020. Another good definition for the IoT comes from Techtarget:
A thing, in the Internet of Things, can be a person with a heart monitor implant, a farm animal with
a biochip transponder, an automobile that has built-in sensors to alert the driver when tire pressure
is low — or any other natural or man-made object that can be assigned an IP address and provided
with the ability to transfer data over a network. So far, the Internet of Things has been most closely
associated with machine-to-machine (M2M) communication in manufacturing and power, oil and
gas utilities. Products built with M2M communication capabilities are often referred to as being
smart.
In our article, we won’t talk about the role of the Internet of Things in such big areas as manufacturing,
oil and gas industries; instead we'll discuss the Internet of Things devices that we come in contact
with every day, such as a smart thermometer, a smart scale, pretty much everything that makes a smart
home.
An ideal Internet of Things scenario would suggest something like this: as you wake up, a smart coffee
maker will start brewing your coffee, when you want to watch a movie, a smart light system would turn
down the lights. Before we start the list with some of the best Internet of Things devices, you
might want to have one more look at this infographic to understand just how big and important the IoT
really is.
Internet of Things Platforms and Networks
Those of you acquainted with the Internet of Things notion and devices have probably heard about Z-Wave.
To put it very simply, Z-Wave is the wireless standard for IoT devices, as most of them have Z-Wave
chips inside. It's the same case with ZigBee, which, albeit newer to this game, is betting on its
low-power approach. The ZigBee and Z-Wave standards, if you will, are the backbones behind most of
the devices present here, but are more used in platforms and networks.
SmartThings
Special attention, I think, needs to be given to the SmartThings start-up, which works like a platform
for the growing number of devices that are connected to the Internet. It works like an online
marketplace where users can buy starter kits to transform their homes into a unified IoT system. So, if
you’re looking for some cool IoT devices, you should also have a look there, where you will find lots
of Internet of Things devices, such as smart lights, switches, doors and locks. Developers will find the
right tools while users will get smart apps for the devices they buy from SmartThings.
Revolv
Revolv is another platform that brings more devices together under a single command center. Just like
the above video says, you can control the lighting in your home (such as the Philips Hue light bulb that
we'll talk about below), your Apple TV, your heating and much more! The Revolv IoT
service is first expected to ship for $299 before the end of this fall. You will get with your purchase the
Revolv Hub, Revolv App, and for a limited time, also the free lifetime service plan that comes with
GeoSense automation.
Securifi
Another IoT device that was born thanks to Kickstarter, Securifi is
somehow an IoT platform as well as a standalone device, but we've decided to put it in this
category. The Almond+ is the latest version of their touchscreen router, an Internet of Things device
that can wirelessly connect a 5,000 square foot home, and is four times faster than your average
wireless router. Also, thanks to its touchscreen, the Almond+ can be set up without having to use a PC
or a smartphone. It also works with ZigBee and Z-Wave standards so Almond+ supports hundreds of
existing sensors in the market.
Xfinity Home products
Comcast’s range of Xfinity
Home products is focused on providing users with a smart way to control their homes. Like a true
service of its kind, Comcast doesn’t sell products on a pay-once basis, but it comes with monthly plans
that vary according to your needs. The Secure and Control plan protects against fire and break-ins,
while providing automation for lights, temperature, and more. The Home Control plan, which is also
cheaper, lets you control lights, temperature, and other home features to save
money on your bills.
WeMo Belkin Home Automation
The WeMo family of IoT products from Belkin is composed of light and insight switches, motion
sensor, baby monitor. Though not included in the WeMo range of products, Belkin also has two
NetCams that will let you watch what happens inside your home. Compared to other similar home
automation systems, Belkin’s products seem to have a lower price-point. Even more, WeMo also works
with IFTTT, so you could do much more amazing stuff with Belkin’s IoT devices.
Ninja Blocks
We also talked about
Ninja Blocks when we were sharing with you some of the best weather gadgets there are in the
market. Using the Ninja Blocks IoT system, you can do many things like watching over the
temperature and humidity levels, turning on the lights when you're not at home or even sending an SMS
when someone is at your front door. It's a smart mix between IFTTT functionality and the power of the
Internet of Things. A wireless Window & Door sensor can let you know when your door is opened or
images of someone who is moving in front of your door can be stored to your Dropbox account. How
cool is that!
Internet of Things Devices
These IoT devices cover many fields, with a special attention, of course, to home automation and the
functionality of your house. Thermostats, smoke detectors, smart music systems, smart light bulbs –
we have it all here. But Internet of Things devices are also present in the interaction with the human
body, whether we’re talking about fitness trackers, smart body scales or even baby monitors.
Fitbit Aria Wi-Fi Smart Scale
We’ve talked about Fitbit’s Aria smart
scale before and we’ll do it again. Aria tracks your weight, body fat percentage, body mass index and
lets you watch these values over the long term. It wirelessly syncs and auto-uploads your stats to an
online graph that you can always access to check how you’re doing. Fitbit’s Aria smart
scale recognizes up to eight people but it does so discreetly, as all information is kept private. To
keep you motivated, you can earn badges as you go. Besides this, if you want, you can receive alerts on
your smartphone when you’re nearing your goals.
Withings Smart Body Analyzer
Withings is a company well-known for making gadgets to look after your health and help you keep fit.
The Smart Body Analyzer is a smart scale that can do much more than just letting you keep an
eye on your weight. It comes with an impressive set of features: full body knowledge, heart
measurement, weight goals and long-term progress graph and indoor air quality monitoring. This
amazing Internet of Things device can be yours for $150 from Amazon.
Nest
Nest is one of the companies that releases products with impressive but simple designs. It probably has
to do with the fact that the company was co-founded by former Apple
engineers Tony Fadell and Matt Rogers back in 2010. They have only two products, but they seem to be
very well-built and show us that this is a company that will definitely be present in the future with even
more Internet of Things devices.
Thermostat – we’ve featured the Nest Thermostat in our top with the best there are in the market, so
have a look at it if you're interested in such devices. The Nest Learning Thermostat is so called
because it can learn your schedule and program itself to increase or reduce the heating
levels, thus reducing your bills. Obviously, you can control it from your phone.
Protect smoke detector – the Nest Protect is the second
product of the company; this is a smoke and carbon monoxide detector. It is smart because it doesn't
immediately turn on a loud alarm when it detects some smoke from the toaster. The Protect will give an
early warning through flashing yellow lights and a warning spoken in a human voice. It will tell you where
the smoke or carbon monoxide is, so you can decide yourself whether it is an emergency or just a nuisance
alarm. And if so, you'll cancel the alarm just by standing under the Nest Protect and waving your arm.
Sonos Music System
Music definitely needs to be taken into consideration with the rise of IoT devices. With an appealing
design, Sonos is a system of HiFi wireless speakers and audio components that combines all your
music collection, radio or podcasts in a single app. You can choose to play what you want in different
rooms by using a dedicated wireless network. Sonos has a wide collection of products for music fans,
from speakers to sound bars, so head over to their website to choose what you like.
Philips Hue light bulb
Probably because it follows the same product principles as Apple's products, Philips' Hue smart
bulb is available to buy from the Apple Store, but also from Amazon. You can choose to buy a starter pack
that includes the Hue bridge and three bulbs for $200, or a single connected bulb for $60. A single bridge
will let you control up to 50 bulbs and you will be able to create lighting scenes based on your favorite
photos. Obviously, the lighting is controllable from your smartphone or tablet. The bulbs are said to help
you use 80% less energy than traditional bulbs. If you’re interested in this IoT device, you could have a
look at a similar smart light bulb, the LiFx.
Lockitron
Lockitron is not the only smart lock out there, but it's our favorite and definitely one of the most
innovative IoT devices. Another product that was made real thanks to the power of
crowdfunding, Lockitron ensures keyless entry inside your home using only your phone. You can also
monitor to see if the door is locked when you’re gone. And if not, it will send a notification when it is
unlocked. The single drawback is that it works only with iPhone 4S or iPhone 5, but support for iPhone
5s and 5c should be coming soon, as well. Lockitron has an intelligent power management that makes
its batteries last for up to one year.
LG Smart Thinq
LG wants to be one of the first companies to deploy the power of the Internet of Things in home
appliances. That’s why it has launched the Smart Thinq line of products. It currently contains only
four categories of products, but more will be added as consumer interest increases over time.
At the moment, you can find refrigerators, washing machines, dryers and ovens that are connected to
the Internet to help you be in better control and save money.
AirQuality egg
I have always wondered – what is the
quality of the air that I am breathing? Living in a city has always had me longing for the fresh air in the
small village I was born in. The AirQuality Egg is a sensing device that measures the air quality in
your environment and lets you share that information with an online community in real-time. The
whole system is composed of outdoor sensors that have an RF transmitter which sends the air quality
data wirelessly to an Egg-shaped base station inside. The Egg hub then sends the data to Xively which
stores it and shares it with the community.
Smart baby monitor
We think Smart baby monitors are some very useful IoT devices, especially for tech savvy parents, and
that's why we compiled, a while ago, a list with some of the best ones to use. The Mimo baby
monitor is another such useful device that lets you watch over your little one's respiration and check
the temperature in the room. Also, it can tell you if your baby is asleep or how active he is.
Through the logical address the system identifies a network (source to destination). After identifying the network, the physical
address is used to identify the host on that network. The port address is used to identify the particular application running
on the destination machine.
Logical address: The IP address of a system is called its logical address. This address is the combination of a Net ID and a Host ID. It
is used by the network layer to identify a particular network (source to destination) among the networks. Because it can be
changed by changing the host's position on the network, it is called a logical address.
Physical address: Each system has a NIC (Network Interface Card) through which two systems are physically connected to each
other by cables. The address of the NIC is called the physical or MAC address. It is assigned by the manufacturer of
the card and is used by the data link layer.
Port address: There are many applications running on a computer, and each application runs with a (logical) port number. The
port number for an application is assigned by the kernel of the OS. This port number is called the port address.
Suppose you have to reach your friend's house. You first go to the area or street where the house is, then to the house number, and finally your friend is a particular person within that house. In technical terms, the logical address defines the area or street, the physical address defines the house number, and the port address identifies your particular friend within that house.
Street = logical address (network)
House number = physical address (host)
Your friend = port address (service point address)
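The three address levels can be seen from Python's standard library. This is a small illustrative sketch, not a definitive recipe: the loopback IP is hard-coded so the snippet stays self-contained, and uuid.getnode() may fall back to a random value on machines where the MAC cannot be read.

```python
import socket
import uuid

# Physical address: the 48-bit MAC of a NIC, as reported by uuid.getnode().
mac = uuid.getnode()
mac_str = ":".join(f"{(mac >> s) & 0xff:02x}" for s in range(40, -1, -8))

# Logical address: an IP address; loopback is used here to stay self-contained.
ip = "127.0.0.1"

# Port address: bind to port 0 and let the OS kernel pick a free ephemeral
# port, exactly as it does for ordinary applications.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((ip, 0))
    port = s.getsockname()[1]

print("physical (MAC) address:", mac_str)
print("logical (IP) address:", ip)
print("port address:", port)
```

Note how each level narrows the scope: the IP selects the host on a network, the MAC identifies its interface on the link, and the port selects one application on that host.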
Consequences
A memory leak reduces the performance of the computer by reducing the amount of available memory. Eventually, in the worst case, too much of the available memory may become allocated and all or part of the system or device stops working correctly, the application fails, or the system slows down vastly due to thrashing.
Memory leaks may not be serious or even detectable by normal means. In modern operating systems, normal memory used by an application is released when the application terminates. This means that a memory leak in a program that only runs for a short time may not be noticed and is rarely serious.
Much more serious leaks include those:
where the program runs for an extended time and consumes additional memory over time, such as background tasks on servers, but especially in embedded devices which may be left running for many years
where new memory is allocated frequently for one-time tasks, such as when rendering the frames of a computer game or animated video
where the program can request memory — such as shared memory — that is not released, even when the program terminates
where memory is very limited, such as in an embedded system or portable device
where the leak occurs within the operating system or memory manager
when a system device driver causes the leak
running on an operating system that does not automatically release memory on program termination.
An example of a memory leak
The following example, written in pseudocode, is intended to show how a memory leak can come about, and its effects, without needing any programming knowledge. The program in this case is part of some very simple software designed to control an elevator. This part of the program is run whenever anyone inside the elevator presses the button for a floor.
When a button is pressed:
Get some memory, which will be used to remember the floor number
Put the floor number into the memory
Are we already on the target floor?
If so, we have nothing to do: finished
Otherwise:
Wait until the lift is idle
Go to the required floor
Release the memory we used to remember the floor number
The memory leak would occur if the floor number requested is the same floor that the elevator is on; the condition for releasing the memory would be skipped. Each time this case occurs, more memory is leaked.
Cases like this wouldn't usually have any immediate effects. People do not often press the button for the floor they are already on, and in any case, the elevator might have enough spare memory that this could happen hundreds or thousands of times. However, the elevator will eventually run out of memory. This could take months or years, so it might not be discovered despite thorough testing.
The consequences would be unpleasant; at the very least, the elevator would stop responding to requests to move to another floor (like when you call the elevator or when someone is inside and presses the floor buttons). If other parts of the program need memory (a part assigned to open and close the door, for example), then someone may be trapped inside, or if no one is in, then no one would be able to use the elevator since the software cannot open the door.
The memory leak lasts until the system is reset. For example: if the elevator's power were turned off or in a power outage, the program would stop running. When power was turned on again, the program would restart and all the memory would be available again, but the slow process of memory leak would restart together with the program, eventually prejudicing the correct running of the system.
The leak in the above example can be corrected by bringing the 'release' operation outside of the conditional:
When a button is pressed:
Get some memory, which will be used to remember the floor number
Put the floor number into the memory
Are we already on the target floor?
If not:
Wait until the lift is idle
Go to the required floor
Release the memory we used to remember the floor number
Programming issues
Memory leaks are a common error in programming, especially when using languages that have no built-in automatic garbage collection, such as C and C++. Typically, a memory leak occurs because dynamically allocated memory has become unreachable. The prevalence of memory leak bugs has led to the development of a number of debugging tools to detect unreachable memory. BoundsChecker, Deleaker, IBM Rational Purify, Valgrind, Parasoft Insure++, Dr. Memory and memwatch are some of the more popular memory debuggers for C and C++ programs. "Conservative" garbage collection capabilities can be added to any programming language that lacks them as a built-in feature, and libraries for doing this are available for C and C++ programs. A conservative collector finds and reclaims most, but not all, unreachable memory.
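Python's standard-library tracemalloc works on the same snapshot-and-compare principle as the C/C++ memory debuggers named above. This sketch manufactures a deliberate leak and then locates the source line responsible for the growth:

```python
import tracemalloc

leaked = []  # module-level list standing in for allocations that are never freed

def do_work():
    # Each call keeps a 10,000-element list alive forever: a deliberate leak.
    leaked.append([0] * 10_000)

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(50):
    do_work()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Compare the two snapshots to see which source lines grew the most.
stats = after.compare_to(before, "lineno")
top = stats[0]
print("top growth:", top.size_diff, "bytes in", top.count_diff, "new blocks")
```

The largest positive size_diff points straight at the `leaked.append(...)` line, which is exactly how unreachable-memory detectors narrow a leak down to the allocating code.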
Although the memory manager can recover unreachable memory, it cannot free memory that is still reachable and therefore potentially still useful. Modern memory managers therefore provide techniques for programmers to semantically mark memory with varying levels of usefulness, which correspond to varying levels of reachability. The memory manager does not free an object that is strongly reachable. An object is strongly reachable if it is reachable either directly by a strong reference or indirectly by a chain of strong references. (A strong reference is a reference that, unlike a weak reference, prevents an object from being garbage collected.) To prevent this, the developer is responsible for cleaning up references after use, typically by setting the reference to null once it is no longer needed and, if necessary, by deregistering any event listeners that maintain strong references to the object.
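The strong-versus-weak distinction described above maps directly onto Python's weakref module. This small sketch shows that an object survives as long as any strong reference exists and is reclaimed once the last one is cleared (the Node class is invented for illustration):

```python
import gc
import weakref

class Node:
    """Stands in for any heap object tracked by the memory manager."""

obj = Node()
strong = obj               # a second strong reference
weak = weakref.ref(obj)    # a weak reference does not keep the object alive

del obj                    # one strong reference remains, so the object survives
gc.collect()
print("alive while strongly reachable:", weak() is not None)

strong = None              # clean up the last strong reference, as the text advises
gc.collect()
print("reclaimed once only weakly reachable:", weak() is None)
```

Setting the reference to None is precisely the cleanup duty the paragraph describes: the memory manager cannot free what the program still strongly reaches.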
Drupal is free, open source software that can be used by individuals or groups of users -- even those lacking technical skills -- to easily create and manage many types of Web sites. The application includes a content management platform and a development framework.
Technology professionals look for reliability, security, and the flexibility to create the
features they want without weighty features they don’t need. They require a platform with a
strong architecture, integrating with third-party applications. Drupal provides all this and
more, conforming to their technical and business requirements, not the other way around.
Features and Benefits
Drupal is an “out of the box” web
content management tool as well as a customizable platform -- to help you
build the right tool to serve your content management strategy. Business and
technology leaders use Drupal to create real-world enterprise solutions that
empower web innovation. When assessing Drupal, it’s important to envision your
goals and ask “Can Drupal be used to build this?” The answer nearly always is
“yes”. Drupal offers limitless potential with native features and module
extensions -- it’s a platform for the next disruptive technology, without
disruption to your business.
Highly Scalable
Drupal’s scalability means it can manage the largest, most high-traffic sites in the world. Sites that experience daily
high traffic, like Weather.com, and sites that see periodic spikes in traffic, like Grammy.com and the publications of
Time, Inc. (like SI.com) all use Drupal to ensure scalability as traffic and content grows.
Mobile-First
Build responsive sites and also create web applications that deliver optimal visitor experiences, no matter what
device they’re on. Drupal supports responsive design best practices and ensures your users get a seamless content
experience every time, on every device.
Integrated Digital Applications
Drupal integrates easily with a wide ecosystem of digital marketing technology and other business applications, so
you can use the best set of tools today, and flex with new tools tomorrow. And, Drupal’s API-first focus means
connecting content to other sites and applications, making content more powerful.
Security
Drupal’s community provides countless eyes and ears to help keep Drupal sites secure. Rely on your team, but also
on the open source community to identify vulnerabilities and create/deliver patches automatically to protect your
sites and your business. And never lose a night’s sleep.
Flexible Content Architecture
Create the right content architecture using the Admin Interface or do it programmatically. Display only the content
appropriate for each context with powerful display mode tools and Views. Include a variety of media types (images,
video, pdfs, etc.). Customizable menus create a comfortable user experience, creating paths to content across
multiple devices.
Tools for Business, with No Limitations
Drupal doesn’t dictate to the business; the business dictates what it needs from Drupal. Too many CMS platforms
impose their will on your business, forcing you to conform to their way of doing things. Drupal acts the opposite
way: use Drupal to create a solution that supports your specific business needs. Drupal creates a foundation for
limitless solutions.
Easy Content Authoring
Essential tools for content creation and publishing, like a customizable WYSIWYG editor for content and marketing
pros. Authentication and permissions for managing editorial workflows as well as content. Authors, publishers, site
admins and developers all use Drupal to meet their requirements, with a workflow that offers them just enough
access to features they need.
Multisite
Manage many sites across your organization, brands, geographies and campaigns on a single platform that allows
quick, easy site creation and deployment.
Community of Talent and Experience
The worldwide Drupal community shares its secrets on how to get things done, right. If you have a question,
someone has the answer. Leverage the power of open source by building on previously-created solutions. Drupal
developers have access to worldwide community experience. When’s the last time your software provider gave you
this much support?
Content as a Service
With Drupal’s structured data model you can display content in multiple layouts for the responsive web, or export it
to any app or client with built-in REST services. Drupal’s open architecture and APIs provide developers a
framework and tools to build using Drupal and to connect to other sources of data, content, and application
functionality, including marketing technology tools. Content is decoupled from delivery: content can be presented
anywhere, any channel, in any format.
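Modern Drupal exposes content over a JSON:API-style REST interface. The payload below is a hand-written sample in that general shape (the field values and the "node--article" type are illustrative, not taken from a live site), showing how a decoupled client can consume the structured content with nothing but a JSON parser:

```python
import json

# Hand-written sample shaped like a JSON:API response; values are illustrative.
payload = """
{
  "data": [
    {"type": "node--article",
     "id": "a1b2c3",
     "attributes": {"title": "Hello from Drupal", "status": true}}
  ]
}
"""

doc = json.loads(payload)

# The client decides presentation: here we just extract the published titles.
titles = [item["attributes"]["title"]
          for item in doc["data"]
          if item["attributes"]["status"]]
print(titles)
```

Because content is decoupled from delivery, the same response could just as well feed a mobile app, a kiosk, or another site.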
Multilingual
Architect and configure Drupal to deliver sites to a global, multilingual audience as part of your localization
strategy. Drupal makes it easy to create and manage sites for different regions and geographies, and support one
to many languages across all of your sites, translating and localizing your content and experiences.
Strong Stack Foundation
Drupal lives on a modern LAMP technology stack: Linux, Apache, MySQL and PHP, which together are meeting the
needs of fast-moving, flexible, agile enterprises and brands building next generation digital platforms.
SDLC, the Software Development Life Cycle, is a process used by the software industry to design,
develop and test high-quality software. The SDLC aims to produce high-quality software that
meets or exceeds customer expectations and reaches completion within time and cost estimates.
SDLC is the acronym for Software Development Life Cycle.
It is also called the software development process.
The software development life cycle (SDLC) is a framework defining tasks performed at each step in the
software development process.
ISO/IEC 12207 is an international standard for software life-cycle processes. It aims to be the standard
that defines all the tasks required for developing and maintaining software.
What is SDLC?
SDLC is a process followed for a software project within a software organization. It consists of a detailed plan describing how to develop, maintain, replace and alter or enhance specific software. The life cycle defines a methodology for improving the quality of software and the overall development process.
The following figure is a graphical representation of the various stages of a typical SDLC.
A typical Software Development life cycle consists of the following stages:
Stage 1: Planning and Requirement Analysis
Requirement analysis is the most important and fundamental stage in SDLC. It is performed by
the senior members of the team with inputs from the customer, the sales department, market
surveys and domain experts in the industry. This information is then used to plan the basic
project approach and to conduct product feasibility study in the economical, operational, and
technical areas.
Planning for the quality assurance requirements and identification of the risks associated with
the project is also done in the planning stage. The outcome of the technical feasibility study is to
define the various technical approaches that can be followed to implement the project
successfully with minimum risks.
Stage 2: Defining Requirements
Once the requirement analysis is done, the next step is to clearly define and document the product requirements and get them approved by the customer or the market analysts. This is done through the SRS (Software Requirement Specification) document, which consists of all the product requirements to be designed and developed during the project life cycle.
Stage 3: Designing the product architecture
The SRS is the reference for product architects to come out with the best architecture for the
product to be developed. Based on the requirements specified in SRS, usually more than one
design approach for the product architecture is proposed and documented in a DDS - Design
Document Specification.
This DDS is reviewed by all the important stakeholders and, based on various parameters such as risk assessment, product robustness, design modularity, budget and time constraints, the best design approach is selected for the product.
A design approach clearly defines all the architectural modules of the product along with its
communication and data flow representation with the external and third party modules (if any).
The internal design of all the modules of the proposed architecture should be clearly defined
with the minutest of the details in DDS.
Stage 4: Building or Developing the Product
In this stage of SDLC the actual development starts and the product is built. The programming
code is generated as per DDS during this stage. If the design is performed in a detailed and
organized manner, code generation can be accomplished without much hassle.
Developers have to follow the coding guidelines defined by their organization and programming
tools like compilers, interpreters, debuggers etc are used to generate the code. Different high
level programming languages such as C, C++, Pascal, Java, and PHP are used for coding. The
programming language is chosen with respect to the type of software being developed.
Stage 5: Testing the Product
This stage is usually a subset of all the stages, as in modern SDLC models the testing activities are mostly involved in all stages of SDLC. However, this stage refers to the testing-only stage of the product, where product defects are reported, tracked, fixed and retested until the product reaches the quality standards defined in the SRS.
Stage 6: Deployment in the Market and Maintenance
Once the product is tested and ready to be deployed, it is released formally in the appropriate market. Sometimes product deployment happens in stages as per the organization's business strategy. The product may first be released in a limited segment and tested in the real business environment (UAT - User Acceptance Testing).
Then, based on the feedback, the product may be released as is or with suggested enhancements in the targeted market segment. After the product is released in the market, its maintenance is done for the existing customer base.
SDLC Models
There are various software development life cycle models defined and designed which are followed during the software development process. These models are also referred to as "Software Development Process Models". Each process model follows a series of steps unique to its type, in order to ensure success in the process of software development.
Following are the most important and popular SDLC models followed in the industry:
Waterfall Model
Iterative Model
Spiral Model
V-Model
Big Bang Model
The other related methodologies are the Agile Model, the RAD (Rapid Application Development) Model, and Prototyping Models.
The Waterfall Model was the first Process Model to be introduced. It is also referred to as a linear-sequential life cycle model. It is very simple to understand and use. In a waterfall model, each
phase must be completed before the next phase can begin and there is no overlapping in the
phases.
The Waterfall model is the earliest SDLC approach that was used for software development.
The waterfall Model illustrates the software development process in a linear sequential flow;
hence it is also referred to as a linear-sequential life cycle model. This means that any phase in
the development process begins only if the previous phase is complete. In waterfall model
phases do not overlap.
Waterfall Model design
The Waterfall approach was the first SDLC model to be used widely in software engineering to ensure the success of a project. In the Waterfall approach, the whole process of software development is divided into separate phases. In the Waterfall model, typically, the outcome of one phase acts as the input for the next phase sequentially.
Following is a diagrammatic representation of different phases of waterfall model.
The sequential phases in Waterfall model are:
Requirement Gathering and analysis: All possible requirements of the system to be developed are
captured in this phase and documented in a requirement specification doc.
System Design: The requirement specifications from first phase are studied in this phase and system
design is prepared. System Design helps in specifying hardware and system requirements and also
helps in defining overall system architecture.
Implementation: With inputs from system design, the system is first developed in small programs
called units, which are integrated in the next phase. Each unit is developed and tested for its
functionality which is referred to as Unit Testing.
Integration and Testing: All the units developed in the implementation phase are integrated into a
system after testing of each unit. Post integration the entire system is tested for any faults and failures.
Deployment of system: Once the functional and non-functional testing is done, the product is
deployed in the customer environment or released into the market.
Maintenance: There are some issues which come up in the client environment. To fix those issues
patches are released. Also to enhance the product some better versions are released. Maintenance is
done to deliver these changes in the customer environment.
All these phases are cascaded to each other in which progress is seen as flowing steadily
downwards (like a waterfall) through the phases. The next phase is started only after the
defined set of goals are achieved for previous phase and it is signed off, so the name "Waterfall
Model". In this model phases do not overlap.
Waterfall Model Application
Every software product developed is different and requires a suitable SDLC approach to be followed based on internal and external factors. Some situations where the use of the Waterfall model is most appropriate are:
Requirements are very well documented, clear and fixed.
Product definition is stable.
Technology is understood and is not dynamic.
There are no ambiguous requirements.
Ample resources with required expertise are available to support the product.
The project is short.
Waterfall Model Pros & Cons
Advantage
The advantage of waterfall development is that it allows for departmentalization and control. A
schedule can be set with deadlines for each stage of development and a product can proceed
through the development process model phases one by one.
Development moves from concept, through design, implementation, testing, installation,
troubleshooting, and ends up at operation and maintenance. Each phase of development
proceeds in strict order.
Disadvantage
The disadvantage of waterfall development is that it does not allow much reflection or revision. Once an application is in the testing stage, it is very difficult to go back and change something that was not well documented or thought through in the concept stage.
The following table lists out the pros and cons of the Waterfall model:
Pros:
- Simple and easy to understand and use.
- Easy to manage due to the rigidity of the model; each phase has specific deliverables and a review process.
- Phases are processed and completed one at a time.
- Works well for smaller projects where requirements are very well understood.
- Clearly defined stages.
- Well-understood milestones.
- Easy to arrange tasks.
- Process and results are well documented.
Cons:
- No working software is produced until late during the life cycle.
- High amounts of risk and uncertainty.
- Not a good model for complex and object-oriented projects.
- Poor model for long and ongoing projects.
- Not suitable for projects where requirements are at a moderate to high risk of changing, so risk and uncertainty are high with this process model.
- It is difficult to measure progress within stages.
- Cannot accommodate changing requirements.
- Adjusting scope during the life cycle can end a project.
- Integration is done as a "big bang" at the very end, which does not allow identifying any technological or business bottlenecks or challenges early.
In the Iterative model, the iterative process starts with a simple implementation of a small set of the software requirements and iteratively enhances the evolving versions until the complete system is implemented and ready to be deployed.
An iterative life cycle model does not attempt to start with a full specification of requirements.
Instead, development begins by specifying and implementing just part of the software, which is
then reviewed in order to identify further requirements. This process is then repeated, producing
a new version of the software at the end of each iteration of the model.
Iterative Model design
The iterative process starts with a simple implementation of a subset of the software requirements
and iteratively enhances the evolving versions until the full system is implemented. At each
iteration, design modifications are made and new functional capabilities are added. The basic
idea behind this method is to develop a system through repeated cycles (iterative) and in
smaller portions at a time (incremental).
Following is the pictorial representation of Iterative and Incremental model:
Iterative and Incremental development is a combination of both iterative design or iterative
method and incremental build model for development. "During software development, more
than one iteration of the software development cycle may be in progress at the same time." and
"This process may be described as an "evolutionary acquisition" or "incremental build"
approach."
In the incremental model, the whole requirement is divided into various builds. During each iteration,
the development module goes through the requirements, design, implementation and testing
phases. Each subsequent release of the module adds function to the previous release. The
process continues till the complete system is ready as per the requirement.
The key to successful use of an iterative software development lifecycle is rigorous validation of
requirements, and verification & testing of each version of the software against those
requirements within each cycle of the model. As the software evolves through successive cycles,
tests have to be repeated and extended to verify each version of the software.
Iterative Model Application
Like other SDLC models, Iterative and incremental development has some specific applications
in the software industry. This model is most often used in the following scenarios:
Requirements of the complete system are clearly defined and understood.
Major requirements must be defined; however, some functionalities or requested enhancements may
evolve with time.
There is a time-to-market constraint.
A new technology is being used and is being learnt by the development team while working on the
project.
Resources with needed skill set are not available and are planned to be used on contract basis for
specific iterations.
There are some high risk features and goals which may change in the future.
Iterative Model Pros and Cons
The advantage of this model is that there is a working model of the system at a very early stage of development, which makes it easier to find functional or design flaws. Finding issues at an early stage of development enables corrective measures to be taken within a limited budget.
The disadvantage with this SDLC model is that it is applicable only to large and bulky software
development projects. This is because it is hard to break a small software system into further
small serviceable increments/modules.
The following table lists out the pros and cons of the Iterative and Incremental SDLC Model:
Pros:
- Some working functionality can be developed quickly and early in the life cycle.
- Results are obtained early and periodically.
- Parallel development can be planned.
- Progress can be measured.
- Less costly to change the scope/requirements.
- Testing and debugging during a smaller iteration is easy.
- Risks are identified and resolved during iteration, and each iteration is an easily managed milestone.
- Easier to manage risk - the high-risk part is done first.
- With every increment an operational product is delivered.
- Issues, challenges and risks identified in each increment can be applied to the next increment.
- Risk analysis is better.
- It supports changing requirements.
- Initial operating time is less.
- Better suited for large and mission-critical projects.
- During the life cycle, software is produced early, which facilitates customer evaluation and feedback.
Cons:
- More resources may be required.
- Although the cost of change is lower, it is not very suitable for changing requirements.
- More management attention is required.
- System architecture or design issues may arise because not all requirements are gathered at the beginning of the entire life cycle.
- Defining increments may require definition of the complete system.
- Not suitable for smaller projects.
- Management complexity is more.
- The end of the project may not be known, which is a risk.
- Highly skilled resources are required for risk analysis.
- The project's progress is highly dependent upon the risk analysis phase.
The spiral model combines the idea of iterative development with the systematic, controlled
aspects of the waterfall model.
Spiral model is a combination of iterative development process model and sequential linear
development model i.e. waterfall model with very high emphasis on risk analysis.
It allows for incremental releases of the product, or incremental refinement through each
iteration around the spiral.
Spiral Model design
The spiral model has four phases. A software project repeatedly passes through these phases in
iterations called Spirals.
Identification: This phase starts with gathering the business requirements in the baseline spiral. In the
subsequent spirals as the product matures, identification of system requirements, subsystem
requirements and unit requirements are all done in this phase.
This also includes understanding the system requirements by continuous communication between the
customer and the system analyst. At the end of the spiral the product is deployed in the identified
market.
Design: The design phase starts with the conceptual design in the baseline spiral and involves architectural
design, logical design of modules, physical product design and final design in the subsequent spirals.
Construct or Build: The construct phase refers to production of the actual software product at every spiral.
In the baseline spiral when the product is just thought of and the design is being developed a POC
(Proof of Concept) is developed in this phase to get customer feedback.
Then in the subsequent spirals with higher clarity on requirements and design details a working model
of the software called build is produced with a version number. These builds are sent to customer for
feedback.
Evaluation and Risk Analysis: Risk analysis includes identifying, estimating, and monitoring technical
feasibility and management risks, such as schedule slippage and cost overrun. After testing the build,
at the end of first iteration, the customer evaluates the software and provides feedback.
Following is a diagrammatic representation of spiral model listing the activities in each phase:
Based on the customer evaluation, software development process enters into the next iteration
and subsequently follows the linear approach to implement the feedback suggested by the
customer. The process of iterations along the spiral continues throughout the life of the
software.
Spiral Model Application
The Spiral Model is very widely used in the software industry as it is in sync with the natural development process of any product, i.e. learning with maturity, and it involves minimum risk for the customer as well as the development firms. Following are the typical uses of the Spiral model:
When there is a budget constraint and risk evaluation is important.
For medium to high-risk projects.
Long-term project commitment because of potential changes to economic priorities as the requirements
change with time.
The customer is not sure of their requirements, which is usually the case.
Requirements are complex and need evaluation to get clarity.
New product line which should be released in phases to get enough customer feedback.
Significant changes are expected in the product during the development cycle.
Spiral Model Pros and Cons
The advantage of the spiral lifecycle model is that it allows elements of the product to be added when they become available or known. This assures that there is no conflict with previous requirements and design.
This method is consistent with approaches that have multiple software builds and releases and
allows for making an orderly transition to a maintenance activity. Another positive aspect is that
the spiral model forces early user involvement in the system development effort.
On the other side, it takes very strict management to complete such products, and there is a risk of running the spiral in an indefinite loop. So the discipline of change and the extent to which change requests are taken is very important to develop and deploy the product successfully.
The following table lists out the pros and cons of the Spiral SDLC Model:
Pros:
- Changing requirements can be accommodated.
- Allows for extensive use of prototypes.
- Requirements can be captured more accurately.
- Users see the system early.
- Development can be divided into smaller parts, and the more risky parts can be developed earlier, which helps better risk management.
Cons:
- Management is more complex.
- The end of the project may not be known early.
- Not suitable for small or low-risk projects, and could be expensive for small projects.
- The process is complex.
- The spiral may go on indefinitely.
- A large number of intermediate stages requires excessive documentation.
The V-Model is an SDLC model where the execution of processes happens in a sequential manner in
a V-shape. It is also known as the Verification and Validation model.
The V-Model is an extension of the waterfall model and is based on the association of a testing phase
with each corresponding development stage. This means that for every single phase in the
development cycle there is a directly associated testing phase. This is a highly disciplined model,
and the next phase starts only after completion of the previous phase.
V-Model Design Under the V-Model, the testing phase corresponding to each development phase is planned in parallel.
So there are Verification phases on one side of the V and Validation phases on the other side.
The Coding phase joins the two sides of the V-Model.
The below figure illustrates the different phases in V-Model of SDLC.
Verification Phases Following are the Verification phases in V-Model:
Business Requirement Analysis: This is the first phase in the development cycle where the product
requirements are understood from the customer's perspective. This phase involves detailed
communication with the customer to understand their expectations and exact requirements. This is a very
important activity and needs to be managed well, as most customers are not sure about what
exactly they need. Acceptance test design planning is done at this stage, as the business requirements
can be used as an input for acceptance testing.
System Design: Once you have clear and detailed product requirements, it is time to design the
complete system. The system design comprises understanding and detailing the complete hardware
and communication setup for the product under development. The system test plan is developed based on
the system design. Doing this at an earlier stage leaves more time for actual test execution later.
Architectural Design: Architectural specifications are understood and designed in this phase. Usually
more than one technical approach is proposed and based on the technical and financial feasibility the
final decision is taken. System design is broken down further into modules taking up different
functionality. This is also referred to as High Level Design (HLD).
The data transfer and communication between the internal modules and with the outside world (other
systems) is clearly understood and defined in this stage. With this information, integration tests can be
designed and documented during this stage.
Module Design: In this phase, the detailed internal design for all the system modules is specified,
referred to as Low Level Design (LLD). It is important that the design is compatible with the other
modules in the system architecture and the other external systems. Unit tests are an essential part of
any development process and help eliminate the maximum number of faults and errors at a very early stage.
Unit tests can be designed at this stage based on the internal module designs.
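The pairing between development phases and their planned test phases described above can be sketched as a simple lookup table (the phase names follow the text; the function itself is illustrative):

```python
# Each V-Model verification (development) phase has a validation (test)
# phase planned alongside it; the pairs below follow the phases above.
V_MODEL_PAIRS = {
    "Business Requirement Analysis": "Acceptance Testing",
    "System Design": "System Testing",
    "Architectural Design (HLD)": "Integration Testing",
    "Module Design (LLD)": "Unit Testing",
}

def planned_test_phase(dev_phase: str) -> str:
    """Return the validation phase planned during the given verification phase."""
    return V_MODEL_PAIRS[dev_phase]
```

Walking the dictionary top to bottom traces the left arm of the V; walking it bottom to top gives the order in which the tests are later executed.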
Coding Phase The actual coding of the system modules designed in the design phase is taken up in the Coding
phase. The most suitable programming language is decided based on the system and
architectural requirements. The coding is performed based on the coding guidelines and
standards. The code goes through numerous code reviews and is optimized for best performance
before the final build is checked into the repository.
Validation Phases Following are the Validation phases in V-Model:
Unit Testing: Unit tests designed in the module design phase are executed on the code during this
validation phase. Unit testing is testing at the code level and helps eliminate bugs at an early stage,
though not all defects can be uncovered by unit testing.
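As a concrete illustration of testing at the code level, the sketch below unit-tests a small hypothetical function using Python's standard `unittest` module (both the function and its tests are invented for this example):

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit under test: apply a percentage discount to a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    # Each test checks one behaviour, so a failing test pinpoints the fault.
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

Running the file with `python -m unittest` executes both tests, so a defect in `apply_discount` surfaces here, well before integration.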
Integration Testing: Integration testing is associated with the architectural design phase. Integration
tests are performed to test the coexistence and communication of the internal modules within the
system.
System Testing: System testing is directly associated with the System design phase. System tests
check the entire system functionality and the communication of the system under development with
external systems. Most of the software and hardware compatibility issues can be uncovered during
system test execution.
Acceptance Testing: Acceptance testing is associated with the business requirement analysis phase
and involves testing the product in the user environment. Acceptance tests uncover compatibility issues
with the other systems available in the user environment. They also discover non-functional issues
such as load and performance defects in the actual user environment.
V-Model Application The V-Model's application is almost the same as the waterfall model's, as both models are of the
sequential type. Requirements have to be very clear before the project starts, because it is usually
expensive to go back and make changes. This model is used in the medical development field,
as it is a strictly disciplined domain. Following are the suitable scenarios for using the V-Model:
Requirements are well defined, clearly documented and fixed.
Product definition is stable.
Technology is not dynamic and is well understood by the project team.
There are no ambiguous or undefined requirements.
The project is short.
V-Model Pros and Cons The advantage of the V-Model is that it is very easy to understand and apply. The simplicity of this
model also makes it easier to manage. The disadvantage is that the model is not flexible to
changes, and in case there is a requirement change, which is very common in today's
dynamic world, it becomes very expensive to make the change.
The following table lists the pros and cons of the V-Model:
Pros:
- This is a highly disciplined model, and phases are completed one at a time.
- Works well for smaller projects where requirements are very well understood.
- Simple and easy to understand and use.
- Easy to manage due to the rigidity of the model; each phase has specific deliverables
  and a review process.
Cons:
- High risk and uncertainty.
- Not a good model for complex and object-oriented projects.
- Poor model for long and ongoing projects.
- Not suitable for projects where requirements are at a moderate to high risk of changing.
- Once an application is in the testing stage, it is difficult to go back and change functionality.
- No working software is produced until late in the life cycle.
The Big Bang model is an SDLC model where we do not follow any specific process. The
development just starts with the required money and effort as the input, and the output is the
developed software, which may or may not be as per customer requirements.
The Big Bang Model is an SDLC model where no formal development process is followed and very little
planning is required. Even the customer is not sure about what exactly they want, and the
requirements are implemented on the fly without much analysis.
Usually this model is followed for small projects where the development teams are very small.
Big Bang Model Design and Application The Big Bang model comprises focusing all possible resources on software development and
coding, with very little or no planning. The requirements are understood and implemented as
they come. Any change required may or may not need a revamp of the complete software.
This model is ideal for small projects with one or two developers working together and is also
useful for academic or practice projects. It is an ideal model for products where requirements
are not well understood and no final release date is given.
Big Bang Model Pros and Cons The advantage of the Big Bang model is that it is very simple and requires very little or no planning.
It is easy to manage, and no formal procedures are required.
However, the Big Bang model is a very high-risk model, and changes in the requirements or
misunderstood requirements may even lead to a complete reversal or scrapping of the project. It is
ideal for repetitive or small projects with minimal risks.
The following table lists the pros and cons of the Big Bang Model:
Pros:
- This is a very simple model.
- Little or no planning is required.
- Easy to manage.
- Very few resources are required.
- Gives flexibility to developers.
- A good learning aid for newcomers or students.
Cons:
- Very high risk and uncertainty.
- Not a good model for complex and object-oriented projects.
- Poor model for long and ongoing projects.
- Can turn out to be very expensive if requirements are misunderstood.
The Agile SDLC model is a combination of iterative and incremental process models with a focus on
process adaptability and customer satisfaction through rapid delivery of a working software product.
Agile Methods break the product into small incremental builds. These builds are provided in
iterations. Each iteration typically lasts from about one to three weeks. Every iteration involves
cross functional teams working simultaneously on various areas like planning, requirements
analysis, design, coding, unit testing, and acceptance testing.
At the end of the iteration a working product is displayed to the customer and important
stakeholders.
What is Agile? The Agile model believes that every project needs to be handled differently and that existing methods
need to be tailored to best suit the project requirements. In Agile, the tasks are divided into time
boxes (small time frames) to deliver specific features for a release.
An iterative approach is taken, and a working software build is delivered after each iteration. Each
build is incremental in terms of features; the final build holds all the features required by the
customer.
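The idea that each build is incremental and the final build holds all required features can be sketched as follows (the backlog contents and timebox size are illustrative):

```python
def incremental_builds(feature_backlog, features_per_iteration=2):
    """Yield the cumulative working build after each timeboxed iteration.

    Each iteration adds a slice of the backlog, so every build is
    incremental and the last build contains every required feature.
    """
    build = []
    for start in range(0, len(feature_backlog), features_per_iteration):
        build = build + feature_backlog[start:start + features_per_iteration]
        yield list(build)  # working product shown to the customer
```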
Here is a graphical illustration of the Agile Model:
The Agile thought process started early in software development and became
popular over time due to its flexibility and adaptability.
The most popular agile methods include Rational Unified Process (1994), Scrum (1995), Crystal
Clear, Extreme Programming (1996), Adaptive Software Development, Feature Driven
Development, and Dynamic Systems Development Method (DSDM) (1995). These are now
collectively referred to as agile methodologies, after the Agile Manifesto was published in 2001.
Following are the Agile Manifesto values:
Individuals and interactions - in agile development, self-organization and motivation are important,
as are interactions like co-location and pair programming.
Working software - Demo working software is considered the best means of communication with the
customer to understand their requirement, instead of just depending on documentation.
Customer collaboration - As the requirements cannot be gathered completely in the beginning of the
project due to various factors, continuous customer interaction is very important to get proper product
requirements.
Responding to change - agile development is focused on quick responses to change and continuous
development.
Agile vs. Traditional SDLC Models Agile is based on adaptive software development methods, whereas traditional SDLC
models like the Waterfall model are based on a predictive approach.
Predictive teams in the traditional SDLC models usually work with detailed planning and have a
complete forecast of the exact tasks and features to be delivered in the next few months or
during the product life cycle. Predictive methods entirely depend on the requirement analysis
and planning done in the beginning of cycle. Any changes to be incorporated go through a strict
change control management and prioritization.
Agile uses an adaptive approach where there is no detailed planning, and there is clarity on future
tasks only with respect to what features need to be developed. There is feature-driven
development and the team adapts to the changing product requirements dynamically. The
product is tested very frequently, through the release iterations, minimizing the risk of any
major failures in future.
Customer interaction is the backbone of Agile methodology, and open communication with
minimum documentation are the typical features of Agile development environment. The agile
teams work in close collaboration with each other and are most often located in the same
geographical location.
Agile Model Pros and Cons Agile methods have recently been widely accepted in the software world; however, this method
may not always be suitable for all products. Here are some pros and cons of the Agile model.
The following table lists the pros and cons of the Agile Model:
Pros:
- A very realistic approach to software development.
- Promotes teamwork and cross-training.
- Functionality can be developed rapidly and demonstrated.
- Resource requirements are minimal.
- Suitable for fixed or changing requirements.
- Delivers early partial working solutions.
- A good model for environments that change steadily.
- Minimal rules; documentation is easily employed.
- Enables concurrent development and delivery within an overall planned context.
- Little or no planning is required.
- Easy to manage.
- Gives flexibility to developers.
Cons:
- Not suitable for handling complex dependencies.
- More risk to sustainability, maintainability, and extensibility.
- An overall plan, an agile leader, and agile PM practice are a must, without which it will not work.
- Strict delivery management dictates the scope, the functionality to be delivered, and the
  adjustments needed to meet the deadlines.
- Depends heavily on customer interaction, so if the customer is not clear, the team can be
  driven in the wrong direction.
- There is very high individual dependency, since minimal documentation is generated.
- Transfer of technology to new team members may be quite challenging due to the lack of
  documentation.
The RAD (Rapid Application Development) model is based on prototyping and iterative
development with no specific planning involved. The process of writing the software itself
involves the planning required for developing the product.
Rapid Application development focuses on gathering customer requirements through workshops
or focus groups, early testing of the prototypes by the customer using iterative concept, reuse of
the existing prototypes (components), continuous integration and rapid delivery.
What is RAD? Rapid application development (RAD) is a software development methodology that uses minimal
planning in favor of rapid prototyping. A prototype is a working model that is functionally
equivalent to a component of the product.
In RAD model the functional modules are developed in parallel as prototypes and are integrated
to make the complete product for faster product delivery.
Since there is no detailed preplanning, it is easier to incorporate changes within the
development process. RAD projects follow the iterative and incremental model and have small
teams comprising developers, domain experts, customer representatives, and other IT
resources working progressively on their component or prototype.
The most important aspect for this model to be successful is to make sure that the prototypes
developed are reusable.
RAD Model Design The RAD model distributes the analysis, design, build, and test phases into a series of short, iterative
development cycles. Following are the phases of the RAD Model:
Business Modeling: The business model for the product under development is designed in terms of
flow of information and the distribution of information between various business channels. A complete
business analysis is performed to find the vital information for business, how it can be obtained, how
and when is the information processed and what are the factors driving successful flow of information.
Data Modeling: The information gathered in the Business Modeling phase is reviewed and analyzed to
form sets of data objects vital to the business. The attributes of all data sets are identified and defined.
The relations between these data objects are established and defined in detail in relevance to the
business model.
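The data objects, attributes, and relations produced by this phase can be sketched, for instance, as simple record types; the Customer/Order entities below are invented purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    """Hypothetical data object with its defined attributes."""
    order_id: int
    amount: float

@dataclass
class Customer:
    """Hypothetical data object holding a one-to-many relation to Order."""
    customer_id: int
    name: str
    orders: list = field(default_factory=list)

    def total_spend(self) -> float:
        # A derived value that the Process Modeling phase could build on.
        return sum(order.amount for order in self.orders)
```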
Process Modeling: The data object sets defined in the Data Modeling phase are converted to establish
the business information flow needed to achieve specific business objectives as per the business model.
The process model for any changes or enhancements to the data object sets is defined in this phase.
Process descriptions for adding, deleting, retrieving, or modifying a data object are given.
Application Generation: The actual system is built and coding is done by using automation tools to
convert process and data models into actual prototypes.
Testing and Turnover: The overall testing time is reduced in the RAD model, as the prototypes are
independently tested during every iteration. However, the data flow and the interfaces between all the
components need to be thoroughly tested with complete test coverage. Since most of the programming
components have already been tested, the risk of any major issues is reduced.
Following image illustrates the RAD Model:
RAD Model vs. Traditional SDLC The traditional SDLC follows rigid process models with a high emphasis on requirement analysis
and gathering before the coding starts. It puts pressure on the customer to sign off the
requirements before the project starts, and the customer does not get a feel for the product, as
there is no working build available for a long time.
The customer may need some changes after actually getting to see the software; however, the
change process is quite rigid, and it may not be feasible to incorporate major changes in the
product in the traditional SDLC.
The RAD model focuses on iterative and incremental delivery of working models to the customer.
This results in rapid delivery to the customer and customer involvement during the complete
development cycle of the product, reducing the risk of non-conformance with the actual user
requirements.
RAD Model Application The RAD model can be applied successfully to projects in which clear modularization is possible.
If the project cannot be broken into modules, RAD may fail. Following are the typical scenarios
where RAD can be used:
RAD should be used only when a system can be modularized to be delivered in an incremental manner.
It should be used if there is high availability of designers for modeling.
It should be used only if the budget permits the use of automated code-generating tools.
The RAD SDLC model should be chosen only if domain experts are available with relevant business
knowledge.
It should be used where the requirements change during the course of the project and working prototypes
are to be presented to the customer in small iterations of 2-3 months.
RAD Model Pros and Cons The RAD model enables rapid delivery, as it reduces the overall development time due to the reusability
of components and parallel development.
RAD works well only if highly skilled engineers are available and the customer is also committed
to achieving the targeted prototype in the given time frame. If commitment is lacking on
either side, the model may fail.
The following table lists the pros and cons of the RAD Model:
Pros:
- Changing requirements can be accommodated.
- Progress can be measured.
- Iteration time can be short with the use of powerful RAD tools.
- Productivity with fewer people in a short time.
- Reduced development time.
- Increases the reusability of components.
- Quick initial reviews occur.
- Encourages customer feedback.
- Integration from the very beginning solves a lot of integration issues.
- Suitable for systems that are component-based and scalable.
- Suitable for projects requiring shorter development times.
Cons:
- Dependency on technically strong team members for identifying business requirements.
- Only systems that can be modularized can be built using RAD.
- Requires highly skilled developers/designers.
- High dependency on modeling skills.
- Inapplicable to cheaper projects, as the cost of modeling and automated code
  generation is very high.
- Management complexity is greater.
- Requires user involvement throughout the life cycle.
Software Prototyping refers to building software application prototypes that display the
functionality of the product under development but may not actually hold the exact logic of the
original software.
Software prototyping is becoming very popular as a software development model, as it enables
understanding customer requirements at an early stage of development. It helps get valuable
feedback from the customer and helps software designers and developers understand
exactly what is expected from the product under development.
What is Software Prototyping? A prototype is a working model of software with some limited functionality.
The prototype does not always hold the exact logic used in the actual software application, and it is an
extra effort to be considered under effort estimation.
Prototyping is used to allow users to evaluate developer proposals and try them out before
implementation.
It also helps in understanding requirements that are user-specific and may not have been considered by
the developer during product design.
Following is the stepwise approach to design a software prototype:
Basic Requirement Identification: This step involves understanding the very basic product
requirements, especially in terms of the user interface. The more intricate details of the internal design
and external aspects like performance and security can be ignored at this stage.
Developing the Initial Prototype: The initial prototype is developed in this stage, where the very
basic requirements are showcased and user interfaces are provided. These features may not work
in exactly the same manner internally in the actual software developed, and workarounds are used to
give the same look and feel to the customer in the prototype.
Review of the Prototype: The prototype developed is then presented to the customer and the other
important stakeholders in the project. The feedback is collected in an organized manner and used for
further enhancements in the product under development.
Revise and Enhance the Prototype: The feedback and the review comments are discussed during
this stage, and some negotiations happen with the customer based on factors like time and budget
constraints and the technical feasibility of the actual implementation. The changes accepted are
incorporated into the new prototype, and the cycle repeats until customer expectations are
met.
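The review-and-revise cycle above can be sketched as a loop; `review` and `revise` here are placeholder callables standing in for the human feedback steps, and the round limit stands in for time and budget constraints:

```python
def refine_prototype(prototype, review, revise, max_rounds=5):
    """Repeat the review/revise cycle until customer expectations are met.

    review(prototype) returns a list of change requests; an empty list
    means the customer accepts the prototype. revise(prototype, feedback)
    returns the enhanced prototype. Both callables are placeholders for
    the activities described in the text.
    """
    for _ in range(max_rounds):
        feedback = review(prototype)
        if not feedback:  # no change requests: customer expectations met
            return prototype
        prototype = revise(prototype, feedback)
    return prototype  # stop once the time/budget constraint is reached
```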
Prototypes can have horizontal or vertical dimensions. A horizontal prototype displays the user
interface for the product and gives a broader view of the entire system, without concentrating
on internal functions. A vertical prototype, on the other hand, is a detailed elaboration of a specific
function or subsystem in the product.
The purposes of horizontal and vertical prototypes are different. Horizontal prototypes are used
to get more information at the user interface level and on the business requirements. They can even
be presented in sales demos to get business in the market. Vertical prototypes are technical
in nature and are used to get details of the exact functioning of the subsystems, for example,
database requirements, interaction, and data processing loads in a given subsystem.
Software Prototyping Types There are different types of software prototypes used in the industry. Following are the major
software prototyping types used widely:
Throwaway/Rapid Prototyping: Throwaway prototyping is also called rapid or close-ended
prototyping. This type of prototyping uses very little effort with minimal requirement analysis to build
a prototype. Once the actual requirements are understood, the prototype is discarded, and the actual
system is developed with a much clearer understanding of the user requirements.
Evolutionary Prototyping: Evolutionary prototyping, also called breadboard prototyping, is based on
building actual functional prototypes with minimal functionality in the beginning. The prototype
developed forms the heart of the future prototypes, on top of which the entire system is built. With
evolutionary prototyping, only well-understood requirements are included in the prototype, and further
requirements are added as and when they are understood.
Incremental Prototyping: Incremental prototyping refers to building multiple functional prototypes of
the various sub systems and then integrating all the available prototypes to form a complete system.
Extreme Prototyping: Extreme prototyping is used in the web development domain. It consists of
three sequential phases. First, a basic prototype with all the existing pages is presented in HTML
format. Then the data processing is simulated using a prototype services layer. Finally, the services are
implemented and integrated into the final prototype. The process is called Extreme Prototyping to
draw attention to the second phase of the process, where a fully functional UI is developed with very
little regard to the actual services.
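The second phase's simulated services layer can be sketched as a stub that returns canned data, so the fully functional UI can be built against it before the real services exist (the class and data below are invented for illustration):

```python
class StubOrderService:
    """Phase-2 prototype services layer: simulates data processing with
    canned responses instead of a real database or remote service.
    In phase 3, the UI's calls stay the same and this stub is swapped
    for the real implementation."""

    _CANNED_ORDERS = {42: {"id": 42, "status": "shipped"}}

    def get_order(self, order_id):
        # Fixed lookup stands in for the real data-processing logic.
        return self._CANNED_ORDERS.get(
            order_id, {"id": order_id, "status": "unknown"}
        )
```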
Software Prototyping Application Software prototyping is most useful in the development of systems having a high level of user
interaction, such as online systems. Systems that need users to fill out forms or go through
various screens before data is processed can use prototyping very effectively to give the exact
look and feel even before the actual software is developed.
Software that involves too much data processing, where most of the functionality is internal with
very little user interface, does not usually benefit from prototyping. Prototype development could
be an extra overhead in such projects and may need a lot of extra effort.
Software Prototyping Pros and Cons Software prototyping is useful in specific cases, and the decision should be taken very carefully so
that the effort spent in building the prototype adds considerable value to the final software
developed. The model has its own pros and cons, discussed below.
The following table lists the pros and cons of the Prototyping Model:
Pros:
- Increased user involvement in the product even before implementation.
- Since a working model of the system is displayed, the users get a better understanding
  of the system being developed.
- Reduces time and cost, as defects can be detected much earlier.
- Quicker user feedback is available, leading to better solutions.
- Missing functionality can be identified easily.
- Confusing or difficult functions can be identified.
Cons:
- Risk of insufficient requirement analysis owing to too much dependency on the prototype.
- Users may get confused between the prototypes and the actual system.
- Practically, this methodology may increase the complexity of the system, as the scope of
  the system may expand beyond the original plans.
- Developers may try to reuse the existing prototypes to build the actual system, even
  when it is not technically feasible.
- The effort invested in building prototypes may be too much if not monitored properly.
This was an overview of the various SDLC models available and the scenarios in which these SDLC
models are used. The information in this tutorial will help project managers decide which
SDLC model would be suitable for their project, and it will also help developers and testers
understand the basics of the development model being used for their project.
We have discussed all the popular SDLC models in the industry, both traditional and modern.
This tutorial also gives you an insight into the pros and cons and the practical applications of the
SDLC models discussed.
The Waterfall and V-Models are traditional SDLC models and are of the sequential type. Sequential means
that the next phase can start only after the completion of the previous phase. Such models are suitable
for projects with very clear product requirements, where the requirements will not change
dynamically during the course of the project.
The Iterative and Spiral models are more accommodating in terms of change and are suitable for
projects where the requirements are not so well defined, or where the market requirements change
quite frequently.
The Big Bang model is a random approach to software development and is suitable for small or
academic projects.
Agile is the most popular model used in the industry. Agile introduces the concept of fast
delivery to customers using prototype approach. Agile divides the project into small iterations
with specific deliverable features. Customer interaction is the backbone of Agile methodology,
and open communication with minimum documentation are the typical features of Agile
development environment.
RAD (Rapid Application Development) and Software Prototyping are modern techniques for
understanding the requirements better early in the project cycle. These techniques work
on the concept of providing a working model to the customer and stakeholders to give the look
and feel and to collect feedback. This feedback is used in an organized manner to improve the
product.
The Useful Resources section lists some suggested books and online resources to gain further
understanding of the SDLC concepts.
SDLC Overview SDLC, the Software Development Life Cycle, is a process used by the software industry to design,
develop, and test high-quality software. The SDLC aims to produce high-quality software that
meets or exceeds customer expectations and reaches completion within time and cost estimates.
SDLC is the acronym for Software Development Life Cycle.
It is also called the software development process.
The software development life cycle (SDLC) is a framework defining tasks performed at each step in the
software development process.
ISO/IEC 12207 is an international standard for software life-cycle processes. It aims to be the standard
that defines all the tasks required for developing and maintaining software.
A typical Software Development life cycle consists of the following stages:
Stage 1: Planning and Requirement Analysis
Stage 2: Defining Requirements
Stage 3: Designing the product architecture
Stage 4: Building or Developing the Product
Stage 5: Testing the Product
Stage 6: Deployment in the Market and Maintenance
SDLC Models There are various software development life cycle models defined and designed which are
followed during the software development process. These models are also referred to as "Software
Development Process Models". Each process model follows a series of steps unique to its type,
in order to ensure success in the process of software development.
Following are the most important and popular SDLC models followed in the industry:
Waterfall Model
Iterative Model
Spiral Model
V-Model
Big Bang Model
Other related methodologies are the Agile Model, the RAD (Rapid Application Development) Model,
and the Prototyping Model.
SDLC Waterfall Model Following is a diagrammatic representation of different phases of waterfall model.
The sequential phases in Waterfall model are:
Requirement Gathering and analysis: All possible requirements of the system to be developed are
captured in this phase and documented in a requirement specification doc.
System Design: The requirement specifications from first phase are studied in this phase and system
design is prepared. System Design helps in specifying hardware and system requirements and also
helps in defining overall system architecture.
Implementation: With inputs from system design, the system is first developed in small programs
called units, which are integrated in the next phase. Each unit is developed and tested for its
functionality which is referred to as Unit Testing.
Integration and Testing: All the units developed in the implementation phase are integrated into a
system after testing of each unit. Post integration the entire system is tested for any faults and failures.
Deployment of System: Once the functional and non-functional testing is done, the product is
deployed in the customer environment or released into the market.
Maintenance: There are some issues which come up in the client environment. To fix those issues
patches are released. Also to enhance the product some better versions are released. Maintenance is
done to deliver these changes in the customer environment.
All these phases are cascaded to each other, in which progress is seen as flowing steadily
downwards (like a waterfall) through the phases. The next phase is started only after the
defined set of goals is achieved for the previous phase and it is signed off, hence the name "Waterfall
Model". In this model, phases do not overlap.
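The sign-off gating just described can be sketched as a strictly ordered pipeline; the phase names follow the list above, while `signed_off` is a placeholder for the review that closes each phase:

```python
PHASES = ["Requirements", "System Design", "Implementation",
          "Integration and Testing", "Deployment", "Maintenance"]

def run_waterfall(phases, signed_off):
    """Run phases strictly in order, stopping at the first phase whose
    goals are not signed off; phases never overlap."""
    completed = []
    for phase in phases:
        if not signed_off(phase):
            break  # cannot cascade further until this phase is signed off
        completed.append(phase)
    return completed
```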
SDLC Iterative Model Following is the pictorial representation of Iterative and Incremental model:
This model is most often used in the following scenarios:
Requirements of the complete system are clearly defined and understood.
Major requirements must be defined; however, some functionalities or requested enhancements may
evolve with time.
There is a time-to-market constraint.
A new technology is being used and is being learnt by the development team while working on the
project.
Resources with the needed skill sets are not available and are planned to be used on a contract basis for
specific iterations.
There are some high risk features and goals which may change in the future.
SDLC Spiral Model The spiral model has four phases. A software project repeatedly passes through these phases in
iterations called Spirals.
Identification: This phase starts with gathering the business requirements in the baseline spiral. In the
subsequent spirals as the product matures, identification of system requirements, subsystem
requirements and unit requirements are all done in this phase.
This also includes understanding the system requirements by continuous communication between the
customer and the system analyst. At the end of the spiral the product is deployed in the identified
market.
Design: The design phase starts with the conceptual design in the baseline spiral and involves architectural
design, logical design of modules, physical product design and final design in the subsequent spirals.
Construct or Build: The construct phase refers to production of the actual software product at every spiral.
In the baseline spiral when the product is just thought of and the design is being developed a POC
(Proof of Concept) is developed in this phase to get customer feedback.
Then in the subsequent spirals with higher clarity on requirements and design details a working model
of the software called build is produced with a version number. These builds are sent to customer for
feedback.
Evaluation and Risk Analysis: Risk analysis includes identifying, estimating, and monitoring technical
feasibility and management risks, such as schedule slippage and cost overrun. After testing the build,
at the end of first iteration, the customer evaluates the software and provides feedback.
Following is a diagrammatic representation of spiral model listing the activities in each phase:
V-Model The V-model is an SDLC model where execution of processes happens in a sequential manner in
a V-shape. It is also known as the Verification and Validation model.
V - Model is an extension of the waterfall model and is based on association of a testing phase
for each corresponding development stage. This means that for every single phase in the
development cycle there is a directly associated testing phase. This is a highly disciplined model
and next phase starts only after completion of the previous phase.
The below figure illustrates the different phases in V-Model of SDLC.
SDLC Big Bang Model The Big Bang model is an SDLC model in which no specific process is followed. The
development simply starts with the required money and effort as the input, and the output is the
developed software, which may or may not match the customer's requirements.
The Big Bang Model is an SDLC model in which no formal development process is followed and very little
planning is required. Even the customer is not sure about what exactly he wants, and the
requirements are implemented on the fly without much analysis.
Usually this model is followed for small projects where the development teams are very small.
Agile Model Here is a graphical illustration of the Agile Model:
Agile thought process had started early in the software development and started becoming
popular with time due to its flexibility and adaptability.
The most popular agile methods include Rational Unified Process (1994), Scrum (1995), Crystal
Clear, Extreme Programming (1996), Adaptive Software Development, Feature Driven
Development, and Dynamic Systems Development Method (DSDM) (1995). These are now
collectively referred to as agile methodologies, after the Agile Manifesto was published in 2001.
Following are the Agile Manifesto principles
Individuals and interactions: In agile development, self-organization and motivation are important,
as are interactions like co-location and pair programming.
Working software: Demonstrating working software is considered the best means of communication with the
customer to understand their requirements, instead of just depending on documentation.
Customer collaboration: As the requirements cannot be gathered completely at the beginning of the
project due to various factors, continuous customer interaction is very important to get proper product
requirements.
Responding to change: Agile development is focused on quick responses to change and continuous
development.
RAD Model Following image illustrates the RAD Model:
Following are the typical scenarios where RAD can be used:
RAD should be used only when a system can be modularized to be delivered in an incremental manner.
It should be used if there's high availability of designers for modeling.
It should be used only if the budget permits use of automated code-generating tools.
The RAD SDLC model should be chosen only if domain experts are available with relevant business
knowledge.
It should be used where the requirements change during the course of the project and working prototypes
are to be presented to the customer in small iterations of 2-3 months.
Software Prototyping Software prototyping refers to building software application prototypes that display the
functionality of the product under development but may not actually hold the exact logic of the
original software.
Software prototyping is becoming very popular as a software development model, as it enables
teams to understand customer requirements at an early stage of development. It helps get valuable
feedback from the customer and helps software designers and developers understand
exactly what is expected from the product under development.
Following is the stepwise approach to design a software prototype:
Basic Requirement Identification: This step involves understanding the very basic product
requirements, especially in terms of user interface. The more intricate details of the internal design and
external aspects like performance and security can be ignored at this stage.
Developing the initial Prototype: The initial Prototype is developed in this stage, where the very
basic requirements are showcased and user interfaces are provided. These features may not exactly
work in the same manner internally in the actual software developed and the workarounds are used to
give the same look and feel to the customer in the prototype developed.
Review of the Prototype:The prototype developed is then presented to the customer and the other
important stakeholders in the project. The feedback is collected in an organized manner and used for
further enhancements in the product under development.
Revise and enhance the Prototype: The feedback and the review comments are discussed during
this stage and some negotiations happen with the customer based on factors like , time and budget
constraints and technical feasibility of actual implementation. The changes accepted are again
incorporated in the new Prototype developed and the cycle repeats until customer expectations are
met.
Summary This was about the various SDLC models available and the scenarios in which these SDLC
models are used. The information in this tutorial will help the project managers decide what
SDLC model would be suitable for their project and it would also help the developers and testers
understand basics of the development model being used for their project.
We have discussed all the popular SDLC models in the industry, both traditional and modern.
This tutorial also gives you an insight into the pros and cons and the practical applications of the
SDLC models discussed.
Waterfall and V model are traditional SDLC models and are of sequential type. Sequential means
that the next phase can start only after the completion of the previous phase. Such models are suitable
for projects with very clear product requirements and where the requirements will not change
dynamically during the course of project completion.
Iterative and Spiral models are more accommodative in terms of change and are suitable for
projects where the requirements are not so well defined, or the market requirements change
quite frequently.
Big Bang model is a random approach to Software development and is suitable for small or
academic projects.
Agile is the most popular model used in the industry. Agile introduces the concept of fast
delivery to customers using a prototype approach. Agile divides the project into small iterations
with specific deliverable features. Customer interaction is the backbone of Agile methodology,
and open communication with minimum documentation are the typical features of Agile
development environment.
RAD (Rapid Application Development) and Software Prototype are modern techniques to
understand the requirements in a better way early in the project cycle. These techniques work
on the concept of providing a working model to the customer and stakeholders to give the look
and feel and collect feedback. This feedback is used in an organized manner to improve the
product.
In the world of cloud computing, one of the biggest features is the ability to scale. There are different ways to accomplish scaling, which is a transformation that enlarges or diminishes capacity. One is vertical scaling and the other is horizontal scaling. What is the difference between the two? If you look at just the definitions of vertical and horizontal, you might see the following:
• Vertical: something that stands directly upright, at a right angle to the flat ground
• Horizontal: something that is parallel to the horizon (the area where the sky seems to meet the earth)
If you are a visual kind of person, you may already be able to picture this. Let's add some technology to it and see what we get.
Vertical scaling essentially resizes your server with no change to your code. It is the ability to increase the capacity of existing hardware or software by adding resources. Vertical scaling is limited by the fact that you can only get as big as the size of the server.
Horizontal scaling affords the ability to scale wider to deal with traffic. It is the ability to connect multiple hardware or software entities, such as servers, so that they work as a single logical unit. This kind of scale cannot be implemented at a moment's notice.
Having said all that, I always like to provide an example that you might be able to visualize. Imagine an apartment building that has many rooms and floors, where people move in and out all the time. In this apartment building, 200 spaces are available, but not all are taken at any one time. So, in a sense, the apartment building scales vertically as more people come and there are rooms to accommodate them. As long as the 200-space capacity is not exceeded, life is good. The same applies to a restaurant: you have seen the signs that tell you how many people the establishment can hold. As more patrons come in, more tables may be set up and more chairs added (scaling vertically).
However, when capacity is reached, no more patrons can fit; you can only be as big as the building and patio of the restaurant. This is much like your cloud environment, where you can add more hardware to the existing machine (RAM and hard drive space) but you are limited by the capacity of the actual machine.
On the horizontal scaling side, imagine a two-lane expressway. The expressway comfortably handles the 2,000 or so vehicles that travel it. As commerce begins to expand, more buildings are constructed and more homes are built. As a result, the expressway that once handled 2,000 or so vehicles now sees 8,000, causing major traffic jams during rush hour. To alleviate the traffic jams and the increase in accidents, the expressway can be scaled horizontally by constructing more lanes and quite possibly adding an overpass. In this example the construction will take some time. Much like scaling your cloud horizontally, you add additional machines to your environment (scaling wider). This requires planning, making sure you have resources available, and making sure your architecture can handle the scalability.
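To make the distinction concrete in code, here is a minimal sketch in Python. The toy `Server` class and its capacities are hypothetical stand-ins, not a real cloud API: vertical scaling raises the capacity of one server in place, while horizontal scaling adds servers and spreads requests across them round-robin.

```python
from itertools import cycle

class Server:
    """A toy server with a fixed request capacity (hypothetical example)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.load = 0

    def handle(self, requests):
        accepted = min(requests, self.capacity - self.load)
        self.load += accepted
        return accepted  # number of requests actually served

# Vertical scaling: one server, grow its capacity in place.
big = Server(capacity=200)
big.capacity += 100          # "add RAM/CPU" -- limited by the machine's maximum

# Horizontal scaling: add more servers and rotate requests across them.
pool = [Server(capacity=100) for _ in range(3)]
servers = cycle(pool)
for _ in range(250):         # 250 incoming requests
    next(servers).handle(1)

total_served = sum(s.load for s in pool)
```

The pool of three 100-request servers absorbs all 250 requests, while the single vertically scaled server tops out at whatever its hardware allows.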
Five easy steps to improve your database performance
January 30, 2015: Based on reader feedback, section 4 "Do you have enough database connections?" has been revised.
Database access is a core feature of most applications. Based on our experience, it seems that for at least 80% of all applications we see, simple database performance tuning can speed up applications significantly. Fortunately, there isn't a lot of rocket science involved until you get really deep under the hood of database tuning. When you're ready to take database tuning to the next level, there are many great tools around for you to consider, for example from our friends at Vivid Cortex. For this post however, we will only focus on quick wins that you can easily achieve without any help from third parties.
Step 1. Is your database server healthy?
First and foremost, make sure that the host that's serving your database process has sufficient resources available. This includes CPU, memory, and disk space.
CPU
CPU will most likely not be a bottleneck, but database servers induce a continuous base load on machines. To keep the host responsive, make sure that it has at the very least two CPU cores available. I will assume that at least some of your hosts are virtualized. As a general rule of thumb, when monitoring virtual machines, also monitor the virtual host that the machines run on. CPU metrics of individual virtual machines won't show you the full picture. Numbers like CPU ready time are of particular importance.
CPU ready time is a considerable factor when assigning CPU time to virtual machines
Memory
Keep in mind that memory usage is not the only metric to keep an eye on. Memory usage does not tell you how much additional memory may be needed. The important number to look at is page faults per second.
Page faults per second is the real indicator when it comes to your host's memory requirements.
Having thousands of page faults per second indicates that your hosts are out of memory (this is when you start to hear your server's hard drive grinding away).
Disk space
Because of indices and other performance improvements, databases use up a LOT more disk space than the actual data itself requires. NoSQL databases in particular (Cassandra and MongoDB, for instance) eat up a lot more disk space than you would expect. MongoDB takes up less RAM than a common SQL database, but it's a real disk space hog.
I can't emphasize this enough: make sure you have lots of disk space available on your hard drive. Also, make sure your database runs on a dedicated hard drive, as this keeps disk fragmentation caused by other processes to a minimum.
Disk latency is an indicator of overloaded hard drives.
One number to keep an eye on is disk latency. Depending on hard drive load, disk latency will increase, leading to a decrease in database performance. What can you do about this? Firstly, try to leverage your application's and database's caching mechanisms as much as possible; there is no quicker or more cost-effective way of moving the needle. If that still does not yield the expected performance, you can always add additional hard drives. Read performance can be multiplied by simply mirroring your hard drives. Write performance really benefits from using RAID 1 or RAID 10 instead of, let's say, RAID 6. If you want to get your hands dirty on this subject, read up on disk latency and I/O issues.
Step 2. Who is accessing the database?
Once your database is residing on healthy hardware you should take a look at which applications are actually accessing the database. If one of your applications or services suffers from bad database performance, do not jump to the conclusion that you know which application or service is responsible for the bad performance.
Knowing which services access a database is vital for finding database performance bottlenecks
When talking about inferior database performance, you're really talking about two different things. On one hand, the database as a whole may be affected; on the other hand, the bad performance may be confined to a single client service.
If all of the database's clients experience bad performance, go back and check whether your host is truly healthy. Chances are that your hardware is not up to the challenge. If only a single service is suffering from bad database response times, dig deeper into that service's metrics to find out what's causing the problem.
3. Understand the load and individual response time of each service
If an individual service is having bad database performance, you should take a deeper look into the service's communication with the database. Which queries are executed? How often are the queries executed per request? How many rows do they return?
You should know which kinds of commands affect the database performance the most.
It‘s important to know that issues that materialize on the database level may be rooted elsewhere. Very often there is an issue related to the way a database is accessed.
Look at how often queries are called per request. Maybe you can reduce the number of actual database queries by improving the database cache of your service. Question everything. Is there any reason why a single query should be executed more than once per request? If there is, maybe you can unlock some potential performance by applying smart caching strategies.
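The per-request caching idea above can be sketched in a few lines of Python. The `run_query` function and its call counter are hypothetical stand-ins for a real database client; the point is simply that identical queries within one request should hit the database only once.

```python
# Minimal sketch of a per-request query cache. `run_query` is a hypothetical
# stand-in for a real database round-trip; query_count tracks those round-trips.
query_count = 0

def run_query(sql):
    global query_count
    query_count += 1          # pretend this is a round-trip to the database
    return f"rows for: {sql}"

def handle_request(queries):
    cache = {}                # lives only for the duration of this request
    results = []
    for sql in queries:
        if sql not in cache:  # only query the database on a cache miss
            cache[sql] = run_query(sql)
        results.append(cache[sql])
    return results

# One request that would naively issue 5 queries now issues only 2.
results = handle_request(["SELECT * FROM users WHERE id = 1"] * 4 +
                         ["SELECT * FROM accounts WHERE id = 7"])
```

Real frameworks offer this as first-class functionality (e.g. ORM-level query caching), but the principle is the same: question every repeated query.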
4. Do you have enough database connections?
Even if the way you query your database is perfectly fine, you may still experience inferior database performance. If this is your situation, it's time to check that your application's database connection pool is correctly sized.
Check to see if connection acquisition time comprises a large percentage of your database's response time.
When configuring a connection pool there are two things to consider:
1) What is the maximum number of connections the database can handle?
2) What is the correct size connection pool required for your application?
Why shouldn't you just set the connection pool size to the maximum? Because your application may not be the only client connected to the database. If your application takes up all the connections, the database server won't be able to perform as expected. However, if your application is the only client connected to the database, then go for it!
How to find out the maximum number of connections
You already confirmed in Step 1 that your database server is healthy. The maximum number of connections to the database is a function of the resources on the database server. So, to find the maximum number of connections, gradually increase load and the number of allowed connections to your database. While doing this, keep an eye on your database server's metrics. Once they max out (either CPU, memory, or disk performance), you know you've reached the limit. If the number of available connections you reach is not enough for your application, then it's time to consider upgrading your hardware.
Determine the correct size for your application's connection pool
The number of allowed concurrent connections to your database is equivalent to the amount of parallel load that your application applies to the database server. There are tools available to help you determine the correct number here. For Java, you might want to give log4jdbc a try.
Increasing load will lead to higher transaction response times, even if your database server is healthy. Measure the transaction response time end-to-end to see if connection acquisition time takes up increasingly more time under heavy load. If it does, then you know that your connection pool is exhausted. If it doesn't, have another look at your database server's metrics to determine the maximum number of connections that your database can handle.
By the way, a good rule of thumb to keep in mind here: a connection pool's size should be constant, not variable, so set the minimum and maximum pool sizes to the same value.
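As a rough illustration of why an exhausted pool shows up as connection acquisition time, here is a toy fixed-size pool in Python. The "connections" are dummy strings, not real database handles; production pools (HikariCP, psycopg2's pool, etc.) work on the same blocking-queue principle.

```python
import queue
import time

class FixedPool:
    """Toy fixed-size connection pool: acquire() blocks when the pool is exhausted."""
    def __init__(self, size):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"conn-{i}")   # dummy connection objects

    def acquire(self):
        start = time.monotonic()
        conn = self._conns.get()           # blocks if no connection is free
        return conn, time.monotonic() - start

    def release(self, conn):
        self._conns.put(conn)

pool = FixedPool(size=2)       # min size == max size, per the rule of thumb above
c1, wait1 = pool.acquire()     # pool not exhausted: acquisition is near-instant
c2, wait2 = pool.acquire()
pool.release(c1)               # free a connection...
c3, wait3 = pool.acquire()     # ...so the next acquire is near-instant again
```

Under heavy load with all connections checked out, `acquire()` would block, and that blocked time is exactly the "Connection Acquisition" slice you should watch in your transaction response times.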
5. Don't forget about the network
We tend to forget about the physical constraints faced by our virtualized infrastructure. Nonetheless, there are physical constraints: cables fail and routers break. Unfortunately, the line between "works" and "doesn't work" is often blurry. This is why you should keep an eye on your network metrics. If problems suddenly appear after months or even years of operating flawlessly, chances are that your infrastructure is suffering from a non-virtual, physical problem. Check your routers, check your cables, and check your network interfaces. It's best to do this as early as possible following the first sign of trouble, because this may be the point in time when you can fix a problem before it impacts your business.
Retransmissions seriously impact network performance
Very often, over-stressed processes start to drop packets due to depleted resources. Just in case your network issue is not a hardware problem, process level visibility can definitely come in handy in identifying a failing component.
Database performance wrap up
Databases are sophisticated applications, and they are not built to perform badly or to fail. Make sure your databases are securely hosted and adequately resourced so that they can perform at their best.
Here's what you'll need to optimize your database:
• Server data to check host health
• Hypervisor and virtual machine metrics to ensure that your virtualization is okay
• Application data to optimize database access
• Network data to analyze the network impact of database communication
There are many tools that can provide you with this information. I used Ruxit for my examples here because it provides all the data I need in a single tool. Though, obviously, I am a bit biased.
Give it a try! Ruxit is free to use for 30 days! The trial stops automatically; no credit card is required. Just enter your email address, choose your cloud location, and install our agent. Monitoring your database performance was never easier!
10 Steps to better postgresql performance
Christophe Pettus
PostgreSQL guy
Done PostgreSQL for over 10 years
Django for 4 years
Not going to explain why things work great, just will provide good options. Ask him later for details
http://thebuild.com/presentations/not-your-job.pdf
Note
Christophe talks super fast and I can’t keep up.
PostgreSQL features
Robust, feature-rich, fully ACID compliant database
Very high performance, can handle hundreds of terabytes
Default database with Django
PostgreSQL negatives
Configuration is hard
Installation is hard on anything but Linux
Not NoSQL
Configuration
Logging
Be generous with logging; it’s very low-impact on the system
Locations for logs
o syslog
o standard format to files
o Just paste the following:
log_destination = 'csvlog'
log_directory = 'pg_log'
TODO - get rest from Christophe
Shared_buffers
TODO - get this
work_mem
Start low: 32-64MB
Look for ‘temporary file’ lines in logs
set to 2-3x the largest temp file you see
Can cause a huge speed-up if set properly
Be careful: it can use that amount of memory per query
maintenance_work_mem
Set to 10% of system memory, up to 1GB
effective_cache_size
Set to the amount of file system cache available
If you don’t know it, set it to 50% of the available memory
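Pulled together, the settings above might look like the following in postgresql.conf. The values assume a hypothetical host with 16 GB of RAM, about half of it free for file system cache; they are starting points to be tuned against your own logs, not gospel.

```
# postgresql.conf sketch for a hypothetical 16 GB host -- starting points only
work_mem = 64MB                 # start low; raise to 2-3x the largest temp file seen in logs
maintenance_work_mem = 1GB      # ~10% of system memory, capped at 1GB
effective_cache_size = 8GB      # ~50% of RAM if the file system cache size is unknown
```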
Checkpointing
A complete flush of dirty buffers to disk
Potentially a lot of I/O
Done when the first of two thresholds is hit:
o A particular...
Note
Didn’t get any of this part of things.
Easy performance boosts
Don’t run anything else on your PostgreSQL server
If PostgreSQL is in a VM, remember all of the other VMs on the same host
Disable the Linux OOM killer
Stupid Database Tricks
Don’t put your sessions in the database
Avoid constantly-updated accumulator records.
Don’t put the task queues in the database
Don’t use the database as a filesystem
Don’t use frequently-locked singleton records
Don’t use very long-running transactions
Don’t mix transactional and data-warehouse queries on the same database
One schema trick
If one model has a constantly-updated section and a rarely-updated section
o last-seen on site field
o cut out that field into a new model
SQL Pathologies
Gigantic IN clauses (a typical Django anti-pattern) are problematic
Unanchored text queries like ‘%this%’ run slow
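As a sketch of the first pathology (table and column names here are hypothetical), a gigantic IN list can usually be rewritten as a join against a VALUES list, a temp table, or a subquery, which the planner handles far better:

```sql
-- Anti-pattern: a huge literal IN list (often generated by an ORM)
SELECT * FROM orders WHERE user_id IN (1, 2, 3 /* ... thousands more ... */);

-- Better: join against a VALUES list (or a temp table / subquery)
SELECT o.*
FROM orders o
JOIN (VALUES (1), (2), (3)) AS ids(user_id)
  ON o.user_id = ids.user_id;
```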
Indexing
A good index
o Has high selectivity on commonly-used data
o Returns a small number of records
o Is determined by analysis, not guessing
Use pg_stat_user_tables - shows sequential scans
Use pg_stat_user_indexes - shows index usage
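A quick way to act on these views (a minimal sketch; the thresholds are arbitrary) is to ask Postgres which tables are being sequentially scanned most, and which indexes are never used:

```sql
-- Tables with many sequential scans relative to index scans:
-- candidates for a new index (thresholds here are arbitrary).
SELECT relname, seq_scan, idx_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan AND seq_scan > 100
ORDER BY seq_tup_read DESC;

-- Indexes that are never used: candidates for removal.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```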
Vacuuming
autovacuum slowing the system down?
o increase autovacuum_vacuum_cost_limit in small increments
Or if the load is periodic
o Do manual VACUUMing instead at low-low times
o You must VACUUM on a regular basis
Analyze your vacuum
o Collect statistics on the data to help the planner choose a good plan
o Done automatically as part of autovacuum
On-going maintenance
keeping it running
monitoring
Keep track of disk space and system load
memory and I/O utilization is very handy
1 minute bnts
check_postgres.pl at bucardo.org
Backups
pg_dump
Easiest backup tool for PostgreSQL
Low impact on a running database
Makes a copy of the database
becomes impractical for large databases
Streaming replication
Best solution for large databases
Easy to set up
Maintains an exact copy of the database on a different host
Does not guard against application-level failures, however
Can be used for read-only queries
If you are getting query cancellations, bump up the relevant config setting
Is all-or-nothing
If you need partial replication, you need to use Slony or Bucardo
o Warning: partial replication is a full-time effort
WAL Archiving
Maintains a set of base backups and WAL segments on a remote server
Can be used for point-in-time recovery in case of an application (or DBA) failure
Slightly more complex to set up
Encodings
Character encoding is fixed in a database when created
The defaults are not what you want
Use UTF-8 encoding
Migrations
All modifications to a table take an exclusive lock on that table while the modification is being done.
If you add a column with a default value, the table will be rewritten
Migrating a big table
o Create the column as nullable, with no default
o Add the NOT NULL constraint later, once the field is populated
o Note
I’ve done this a lot.
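Sketching that big-table pattern for a hypothetical users table with a new status column (names are illustrative; note that on PostgreSQL 11 and later, adding a column with a constant default no longer rewrites the table, but the pattern below is still the safe default for older versions and volatile defaults):

```sql
-- Safe way to add a column to a big table (hypothetical names).
-- 1. Add the column nullable, with no default: cheap, no table rewrite.
ALTER TABLE users ADD COLUMN status text;

-- 2. Backfill in batches to keep lock time and WAL volume manageable.
UPDATE users SET status = 'active' WHERE id BETWEEN 1 AND 10000;
-- ... repeat for further id ranges ...

-- 3. Only once every row is populated, add the constraint.
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
```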
Vacuum FREEZE
Once in a while PostgreSQL needs to scan every table
This can be a very big surprise
Run VACUUM manually periodically
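For example, a periodic manual pass run during off-peak hours (the table name is illustrative; schedule it from cron or similar) keeps an anti-wraparound scan from kicking in at the worst possible time:

```sql
-- Run during low-traffic windows so a forced wraparound vacuum
-- never surprises you under peak load.
VACUUM (FREEZE, ANALYZE) events;
```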
Hardware
Get lots of ECC RAM
CPU is not as vital as RAM
Use a RAID
AWS Survival Guide
Biggest instance you can afford
EBS for the data and transaction logs
Set up streaming replication
Handling Very Large Tables in Postgres Using Partitioning September 13, 2016 by Rimas Silkaitis
One of the interesting patterns that we've seen, as a result of managing one of the largest fleets of Postgres databases, is one or two tables growing at a rate that's much larger and faster than the rest of the tables in the database. In terms of absolute numbers, a table that grows sufficiently large is on the order of hundreds of gigabytes to terabytes in size. Typically, the data in this table tracks events in an application or is analogous to an application log.
Having a table of this size isn't a problem in and of itself, but it can lead to other issues: query performance can start to degrade and indexes can take much longer to update. Maintenance tasks, such as vacuum, can also become inordinately long. Depending on how you need to work with the information being stored, Postgres table partitioning can be a great way to restore query performance and deal with large volumes of data over time without having to resort to changing to a different data store.
We use pg_partman ourselves in the Postgres database that backs the control plane that maintains the fleet of Heroku Postgres, Heroku Redis, and Heroku Kafka stores. In our control plane, we have a table that tracks all of the state transitions for any individual data store. Since we don't need that information to stick around after a couple of weeks, we use table partitioning. This allows us to drop tables after the two-week window, and we can keep queries blazing fast.
To understand how to get better performance with a large dataset in Postgres, we need to understand how Postgres does inheritance, how to set up table partitions manually, and then how to use the Postgres extension, pg_partman, to ease the partitioning setup and maintenance process.
Let's Talk About Inheritance First
Postgres has basic support for table partitioning via table inheritance. Inheritance for tables in Postgres is much like inheritance in object-oriented programming. A table is said to inherit from another one when it maintains the same data definition and interface. Table inheritance for Postgres has been around for quite some time, which means the functionality has had time to mature. Let's walk through a contrived example to illustrate how inheritance works:
CREATE TABLE products (
id BIGSERIAL,
price INTEGER,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
);
CREATE TABLE books (
isbn TEXT,
author TEXT,
title TEXT
) INHERITS (products);
CREATE TABLE albums (
artist TEXT,
length INTEGER,
number_of_songs INTEGER
) INHERITS (products);
In this example, both books and albums inherit from products. This means that if a record was inserted into the books table, it would have all the same characteristics of the products table plus those of the books table. If a query was issued against the products table, that query would reference information in the products table plus all of its descendants. For this example, the query would reference products, books, and albums. That's the default behavior in Postgres. But you can also issue queries against any of the child tables individually.
Setting up Partitioning Manually
Now that we have a grasp on inheritance in Postgres, we'll set up partitioning manually. The basic premise of partitioning is that a master table exists that all other children inherit from. We'll use the phrases 'child table' and 'partition' interchangeably throughout the rest of the setup process. Data should not live on the master table at all. Instead, when data gets inserted into the master table, it gets redirected to the appropriate child partition table. This redirection is usually defined by a trigger that lives in Postgres. On top of that, CHECK constraints are put on each of the child tables so that if data were to be inserted directly into a child table, only the correct information is accepted; that way, data that doesn't belong in the partition won't end up in there.
When doing table partitioning, you need to figure out what key will dictate how information is partitioned across the child tables. Let's go through the process of partitioning a very large events table in our Postgres database. For an events table, time is the key that determines how to split out information. Let's also assume that our events table gets 10 million INSERTs in any given day. This is our original events table schema:
CREATE TABLE events (
uuid text,
name text,
user_id bigint,
account_id bigint,
created_at timestamptz
);
Let's make a few more assumptions to round out the example. The aggregate queries that run against the events table only have a time frame of a single day, which means our aggregations are split up by hour for any given day. Our usage of the data in the events table only spans a couple of days; after that time, we don't query the data any more. On top of that, we have 10 million events generated a day. Given these extra assumptions, it makes sense to create daily partitions. The key that we'll use to partition the data will be the time at which the event was created (i.e. created_at):
CREATE TABLE events (
uuid text,
name text,
user_id bigint,
account_id bigint,
created_at timestamptz
);
CREATE TABLE events_20160801 (
CHECK (created_at >= '2016-08-01 00:00:00' AND created_at < '2016-08-02 00:00:00')
) INHERITS (events);
CREATE TABLE events_20160802 (
CHECK (created_at >= '2016-08-02 00:00:00' AND created_at < '2016-08-03 00:00:00')
) INHERITS (events);
Our master table has been defined as events, and we have two partitions ready to accept upcoming data: events_20160801 and events_20160802. We've also put CHECK constraints on them to make sure that only data for the matching day ends up in each partition. Now we need to create a trigger to make sure that any data entered on the master table gets directed to the correct partition:
CREATE OR REPLACE FUNCTION event_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.created_at >= '2016-08-01 00:00:00' AND
NEW.created_at < '2016-08-02 00:00:00' ) THEN
INSERT INTO events_20160801 VALUES (NEW.*);
ELSIF ( NEW.created_at >= '2016-08-02 00:00:00' AND
NEW.created_at < '2016-08-03 00:00:00' ) THEN
INSERT INTO events_20160802 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the event_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_event_trigger
BEFORE INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE event_insert_trigger();
Great! The partitions have been created, the trigger function defined, and the trigger added to the events table. At this point, our application can insert data on the events table and the data will be directed to the appropriate partition. Unfortunately, table partitioning set up this way is a very manual process fraught with chances for failure: it requires us to go into the database every so often to update the partitions and the trigger, and we haven't even talked about removing old data from the database yet. This is where pg_partman comes in.
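One payoff of the CHECK constraints above is constraint exclusion: the planner can skip any partition whose constraint rules out the query's date range. A quick way to verify this against the tables defined above (a sketch; the exact plan output will vary by Postgres version):

```sql
-- Make sure the planner uses CHECK constraints to prune partitions.
-- 'partition' is the default setting in modern Postgres versions.
SET constraint_exclusion = partition;

-- The resulting plan should scan only events_20160801,
-- skipping events_20160802 entirely.
EXPLAIN SELECT count(*)
FROM events
WHERE created_at >= '2016-08-01 00:00:00'
  AND created_at <  '2016-08-02 00:00:00';
```

If the plan still shows every partition being scanned, the usual culprit is a WHERE clause that the planner cannot compare against the constraints, such as one wrapping created_at in a function.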
Implementing pg_partman

pg_partman is a partition management extension for Postgres that makes the process of creating and managing table partitions easier for both time- and serial-based partition sets. Compared to partitioning a table manually, pg_partman greatly reduces the code necessary to run partitioning outside of the database. Let's run through an example from scratch. First, let's load the extension and create our events table. If you already have a big table defined, the pg_partman documentation has guidance on how to convert that table into one that uses table partitioning.
$ heroku pg:psql -a sushi
sushi::DATABASE=> CREATE EXTENSION pg_partman;
sushi::DATABASE=> CREATE TABLE events (
id bigint,
name text,
properties jsonb,
created_at timestamptz
);
Let's reuse the assumptions we made about our event data earlier: 10 million events are created a day, and our queries aggregate on a daily basis. Because of this, we're going to create daily partitions.
sushi::DATABASE=> SELECT create_parent('public.events', 'created_at', 'time', 'daily');
This command tells pg_partman that we're going to use time-series based partitioning, that created_at is the column we'll partition on, and that we want daily partitions for our master events table. Amazingly, everything that was done to manually set up partitioning is completed in this one command. But we're not finished: we need to make sure that maintenance runs on the partitions at regular intervals so that new tables get created and old ones get removed.
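To confirm that create_parent() actually built the child tables, one option is to query the pg_inherits system catalog (a sketch; pg_partman also records its settings in its own part_config table):

```sql
-- List every partition that inherits from the master events table.
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'public.events'::regclass
ORDER BY 1;
```

For daily time-based partitioning, this should return a set of tables named like events_p2016_08_01, covering a window of days around the current date.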
sushi::DATABASE=> SELECT run_maintenance();
The run_maintenance() command will instruct pg_partman to look through all of the tables that were partitioned and identify if new partitions should be created and old partitions destroyed. Whether or not a partition should be destroyed is determined by the retention configuration options. While this command can be run via a terminal session, we need to set this up to run on a regular basis. This is a great opportunity to use Heroku Scheduler to accomplish the task.
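The retention options mentioned above live in pg_partman's part_config table. For example, to drop partitions once their data is a few days old, something along these lines should work (the column names follow the pg_partman docs, but treat the exact settings as an assumption to verify against your installed version):

```sql
-- Keep only the last 3 days of partitions; older ones are
-- dropped (not merely detached) on the next run_maintenance() call.
UPDATE partman.part_config
SET retention = '3 days',
    retention_keep_table = false
WHERE parent_table = 'public.events';
```

Setting retention_keep_table to true instead would uninherit old partitions but leave their tables in place, which is the safer choice if the data might still need to be archived.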
Scheduling run_maintenance() to run on an hourly basis double-checks the partitions in the database. Hourly checks might be overkill in this scenario, but since Heroku Scheduler is a best-effort service, running it hourly will not cause any performance impact on the database. That's it! We've set up table partitioning in Heroku Postgres and it will run on its own with very little maintenance on our part. This setup only scratches the surface of what's possible with pg_partman; check out the extension's documentation for the details.
Should I Use Table Partitioning?

Table partitioning allows you to break one very large table into many smaller tables, dramatically increasing performance. As pointed out in the 'Setting up Partitioning Manually' section, many challenges exist when trying to create and use table partitioning on your own, but pg_partman can ease that operational burden. Even so, table partitioning shouldn't be the first solution you reach for when you run into problems. Ask the following questions to determine whether table partitioning is the right fit:
1. Do you have a sufficiently large data set stored in one table, and do you expect it to grow significantly over time?
2. Is the data immutable, that is, will it never be updated after being initially inserted?
3. Have you done as much optimization as possible on the big table with indexes?
4. Do you have data that has little value after a period of time?
5. Is there a small range of data that has to be queried to get the results needed?
6. Can data that has little value be archived to a slower, cheaper storage medium, or can the older data be stored in aggregate or “rolled up”?
http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Index-only_scans
https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/
https://robots.thoughtbot.com/advanced-postgres-performance-tips
http://dba.stackexchange.com/questions/42290/configuring-postgresql-for-read-performance