Security and Disaster Recovery Policy
Index
Overall Description
    TK Infrastructure Security
        Physical Access to Infrastructure
        Remote Access to Infrastructure
    Applicational Security
    Data Security
    Network Security
Current Security Plan
    Development Team
    Audit Team
    Security Plan for QA
        Discovery
        IT Department and Development Team
        Correction and Retesting
    Security Plan for Production
        Discovery
        IT Department and Development Team
        Correction and Retesting
        Assessment and Reporting
    Security Breach
        Discovery
        Initial Report
        Further Reports
        Correction and Recovery
        Final Report
Disaster Recovery and Business Continuity
    High Availability and Fault Tolerance
        Load Balancing
        Data Transmission Loss Prevention
        Stored Data Loss Prevention
        Offsite Data Replication
    Monitoring of all Components
    Business Continuity Plan
        Discovery
        Minor Infrastructure Issue
        Major Infrastructure Issue
        Infrastructure Outage
        Correction Followup
        Reporting
        Support Documentation and Process Update
    Disaster Recovery Plan
        Discovery
        Disaster Recovery
        Return to Normal Operation
        Reporting
    Future Disaster Recovery Improvements
Due to the nature of the services provided by True-Kare, security is at the top of our list of
concerns. The entire True-Kare infrastructure was designed from the ground up, taking into
consideration all aspects related to:
- Infrastructure Security
- Data Security
- Application Security
Overall Description
TK Infrastructure Security
The True-Kare infrastructure is hosted on the Amazon Web Services Elastic Cloud in Ireland.
Amazon has many years of experience in designing, constructing, and operating large-scale
data centers. This experience has been applied to the AWS platform and infrastructure,
which True-Kare now uses to provide its service.
Physical Access to Infrastructure
The following article from Amazon Web Services describes in detail the security processes
that govern access to the physical infrastructure hosting True-Kare:
http://aws.amazon.com/articles/1697
All True-Kare infrastructure is tested continuously, as described in a later section of this
document.
Remote Access to Infrastructure
Remote access to the True-Kare infrastructure servers and databases is allowed only to a very
restricted set of senior staff in the True-Kare IT department.
Access to servers uses encrypted connections, which are restricted to specific originating
locations and require certificate-based authentication.
All connections to the servers, including failed attempts, are monitored and logged.
Applicational Security
Amazon Web Services provides a complete firewall solution. The mandatory inbound firewall is
configured in default-deny mode, and True-Kare engineers explicitly open specific ports to
allow inbound traffic for specific protocols. More information on these features can be
found in the Amazon Web Services security documentation.
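As an illustration of the default-deny model described above, the following Python sketch evaluates inbound traffic against an explicit allow list. It is a minimal sketch only: the rule set, port numbers, address ranges and the names `AllowRule` and `is_allowed` are hypothetical and do not reflect the actual True-Kare or AWS configuration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    protocol: str  # e.g. "tcp"
    port: int      # destination port that is opened
    source: str    # CIDR block allowed to connect


# Hypothetical rule set: anything not listed here is denied.
ALLOW_RULES = [
    AllowRule("tcp", 443, "0.0.0.0/0"),      # HTTPS from anywhere
    AllowRule("tcp", 22, "203.0.113.0/24"),  # SSH from one documentation-range network
]


def is_allowed(protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: return True only if an explicit rule matches."""
    addr = ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and addr in ip_network(rule.source)
        for rule in ALLOW_RULES
    )
```

The design point is that there is no "deny" rule at all: traffic is rejected simply because no allow rule matches it.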
All Web Applications and Daemons related to the True-Kare infrastructure are protected not
only by this firewall but also by load balancers, which help mitigate DoS attacks and
allow the infrastructure to grow immediately when needed.
All Web Applications and Daemons were designed from the ground up with security in mind.
Access to the Web Applications requires username/password authentication, and activity is
monitored so that suspicious events are logged, including but not limited to access attempts
and SQL injection attempts.
All True-Kare software is tested continuously, as described in a later section of this
document.
Data Security
True-Kare customer and business data is stored and transmitted in encrypted format whenever
possible and viable.
Data is replicated in two different AWS Availability Zones in the same Region. This allows
recovery from the failure of a single location and maintains an up-to-date copy of all
production data.
Furthermore, backups are performed daily and stored not only inside the Amazon Web Services
infrastructure but also off-site at True-Kare installations under strict security measures,
where only specific senior staff of the True-Kare IT department have access. Every week these
backups are imported into a database at True-Kare installations to verify their integrity.
Network Security
The entire True-Kare infrastructure sits behind the mandatory Amazon Web Services firewalls. Together with the innate protection provided by AWS, this means True-Kare is aware of, and protected against, the traditional network security issues (from Amazon Web Services: Overview of Security Processes):
● Distributed Denial Of Service (DDoS) Attacks: AWS API endpoints are hosted on the
same Internet-scale, world class infrastructure that supports the Amazon.com retail site.
Standard DDoS mitigation techniques such as syn cookies and connection limiting are
used. To further mitigate the effect of potential DDoS attacks, Amazon maintains internal
bandwidth which exceeds its provider-supplied Internet bandwidth.
● IP Spoofing: Amazon EC2 instances cannot send spoofed traffic. The Amazon-controlled,
host-based firewall infrastructure will not permit an instance to send traffic
with a source IP or MAC address other than its own.
● Port Scanning: Port scans by Amazon EC2 customers are a violation of the Amazon
EC2 Acceptable Use Policy (AUP). Violations of the AUP are taken seriously, and every
reported violation is investigated. When Port scanning is detected it is stopped and
blocked. Port scans of Amazon EC2 instances are generally ineffective because, by
default, all inbound ports on Amazon EC2 instances are closed.
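To make the "connection limiting" technique named in the list above more concrete, here is a minimal Python sketch of a per-source limiter using a sliding time window. It is a generic illustration and not a description of how AWS implements its mitigation; the class name `ConnectionLimiter` and its parameters are assumptions made for this example.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Allow at most `max_conns` new connections per source within `window` seconds."""

    def __init__(self, max_conns: int, window: float):
        self.max_conns = max_conns
        self.window = window
        self._seen = defaultdict(deque)  # source IP -> timestamps of accepted connections

    def allow(self, source_ip: str, now: float) -> bool:
        times = self._seen[source_ip]
        # Drop timestamps that have fallen out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_conns:
            return False  # over the limit: refuse this connection
        times.append(now)
        return True
```

Each source is tracked independently, so one noisy address cannot use up the allowance of another.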
Current Security Plan
Traditionally, companies perform security audits and penetration tests once per year, if at
all. For True-Kare, once or twice per year is not enough to ensure that all data from our
clients and partners is safe and secure.
The strategy implemented by True-Kare is a continuous one that focuses on keeping the
infrastructure secure at all times against all threats, old and new. Instead of running
security tests once a year, we run them continuously.
True-Kare has two Security Plans in place, which are explained later in this document.
Customized Security Plans for specific Partners and Clients are possible, although they are
usually not necessary.
Development Team
True-Kare is unusual with respect to the technologies used to provide our services. Two main
components make up the True-Kare services: the hardware devices and the Platform Software,
both of which were developed entirely by the True-Kare engineering team and IT department.
Because all the technology used was developed and implemented by True-Kare, our engineering
teams are always improving these technologies and pushing them to the next level, and that
includes all aspects of security and safety.
True-Kare does not depend on third-party providers to implement solutions and corrections to
possible security and safety problems. Our engineering teams work constantly with the audit
teams on security enhancements, which have the highest priority and are addressed as soon as
they are reported, ahead of any other task.
Audit Team
Because the security of our platform is of the utmost importance, and because it is always best
to have a fresh pair of eyes, True-Kare security audits are performed by an independent party.
True-Kare uses the Keep-IT-Secure-24 service by Integrity Sa., which provides True-Kare with
continuous security assessment and penetration testing.
The True-Kare platform is tested at all times for security vulnerabilities. These include
infrastructure problems related to the way servers are installed and configured, as well as
application vulnerabilities such as unsafe checking of data, SQL injection and many other
potentially disruptive problems.
In addition to continuously testing the True-Kare Production environment, True-Kare maintains
a Testing and QA environment on which the latest and upcoming versions of the True-Kare
platform are deployed. This environment is a scaled-down replica of the Production
environment, with exactly the same components and environment configurations, and allows
future versions and configuration changes to be tested. This environment is also continuously
tested using the Keep-IT-Secure-24 service by Integrity Sa.
Security Plan for QA
The True-Kare QA environment is continually tested by the Keep-IT-Secure-24 service by
Integrity Sa. This environment runs the upcoming, not yet released versions of the True-Kare
platform.
Discovery
On discovery of a potential safety or security threat, a report is produced by the
Keep-IT-Secure-24 service by Integrity Sa. and passed immediately to the True-Kare IT
Department through the Keep-IT-Secure-24 portal.
The reports, depending on their nature, usually contain specific details regarding the security
problem, sometimes solutions, and further notes from the Testing consultant involved in that
particular test.
On receiving the report, senior personnel of the True-Kare IT Department involve members of
the IT Department and Development Team as necessary to resolve the potential problem.
Keep-IT-Secure-24 is also asked to verify whether the security problem exists on the
Production environment, where it might not be present because Production runs a stable,
already tested version of the True-Kare Platform.
IT Department and Development Team
Once a possible security problem has been reported, resolving it takes precedence over all
other development activities.
Depending on the specific security threat, different personnel will be involved and assigned
the task of resolving the security problem.
From that point onwards, communication between the Testing Consultant and the IT and Dev Team
takes place directly through the Keep-IT-Secure-24 Portal.
For problems on the QA platform, the time from discovery of a potential threat to assignment
of the task to the IT and Dev Team at True-Kare is usually less than 24 hours. Security is
given priority over new features and enhancements.
Correction and Retesting
Several rounds of interaction usually take place between the Testing team at Keep-IT-Secure-24
and the True-Kare engineers.
Once a correction is in place, a re-test is requested, and the Task is closed only when the
Testing team at Keep-IT-Secure-24 is satisfied.
Security Plan for Production
The True-Kare production platform is continually tested by the Keep-IT-Secure-24 service by
Integrity Sa. This environment runs the current versions of the True-Kare platform and is the
same environment used by our clients and partners.
Discovery
On discovery of a potential safety or security threat, a report is produced by the
Keep-IT-Secure-24 service by Integrity Sa. and passed immediately to the True-Kare IT
Department through the Keep-IT-Secure-24 portal.
The reports, depending on their nature, usually contain specific details regarding the security
problem, sometimes solutions, and further notes from the Testing consultant involved in that
particular test.
Some security issues can be detected through the True-Kare infrastructure monitoring and
logging services, or may be reported by other sources, such as public reports of security
problems in components that might be in use on the True-Kare platform.
On receiving the report, senior personnel of the True-Kare IT Department involve members of
the IT Department and Development Team as necessary to resolve the potential problem and
assess any security breach.
IT Department and Development Team
Any security or safety problem reported or discovered on the production environment requires
an assessment of whether a security breach has occurred.
Depending on the specific security threat, different personnel will be involved and assigned
the task of resolving the security problem.
If the security problem was raised by a Keep-IT-Secure-24 report, communication between
the Testing Consultant and the IT and Dev Team takes place directly through the
Keep-IT-Secure-24 Portal from that point onwards.
For problems on the production platform, the time from discovery of a potential threat to
assignment of the task to the IT and Dev Team at True-Kare is usually less than 4 hours.
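The usual assignment times stated in this document (less than 24 hours for QA, less than 4 hours for production) can be tracked with a simple check such as the Python sketch below. Treating these figures as fixed targets, and the names `ASSIGNMENT_TARGET` and `assignment_within_target`, are assumptions made for illustration; the policy itself describes them only as typical durations.

```python
from datetime import datetime, timedelta

# Usual discovery-to-assignment times from this policy, treated here as targets.
ASSIGNMENT_TARGET = {
    "qa": timedelta(hours=24),
    "production": timedelta(hours=4),
}


def assignment_within_target(environment: str, discovered: datetime, assigned: datetime) -> bool:
    """Return True if the task was assigned within the usual time for that environment."""
    return assigned - discovered < ASSIGNMENT_TARGET[environment]
```

For example, a production issue discovered at 09:00 and assigned at 12:00 is within the usual time, while the same issue assigned at 14:00 is not.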
Correction and Retesting
On correction, a test or re-test is requested and the Task is closed only when the Testing team
at Keep-IT-Secure-24 is satisfied that the security problem is resolved.
Assessment and Reporting
For each potential security issue on the True-Kare production environment, a report is produced
that contains:
● General description of the security issue
● Time of discovery and time of implementation of the fix
● Assessment of the security problem
● Assessment of the breach, if one occurred
All security issues related to customer and partner data are always disclosed to our partners,
together with the measures implemented to mitigate such threats.
Security Breach
The True-Kare platform is constantly monitored and logged, and is constantly tested by an
independent company for potential security problems. A security breach nevertheless remains
possible, and True-Kare has a plan to deal with it.
Discovery
On detection of a security breach or potential security breach, the findings are immediately
reported to the Keep-IT-Secure-24 service from Integrity Sa. for further, concurrent analysis
of the threat, together with logs and a report on how the breach was discovered.
A task force is created with senior staff from the True-Kare IT Department and the True-Kare
Dev Team to investigate the causes and the extent of the damage, including but not limited to
data compromise.
Initial Report
On creation of the Security Breach Task Force, an initial report is created and all True-Kare
partners are immediately notified of the security breach and of all initial findings, provided
that disclosing these findings does not compromise the work of the Security Breach Task Force
in assessing the damage, implementing corrections and carrying out further monitoring.
A senior True-Kare staff member will be appointed as the single point of contact for all
Partners on issues related to the incident. This is the person a Partner can contact for
further information on the work being performed to address the issue.
Further Reports
The True-Kare Single Point of Contact for the incident will issue regular reports on the
work being performed to address the issue at hand.
These reports will include new findings that are pertinent to our Partners and Clients, details
of the corrections being implemented and, when possible, expected resolution times.
At least one report per day will be sent regarding the issue at hand.
Correction and Recovery
The Security Breach Task Force can request at any time that further members of the True-Kare
IT Department and True-Kare Development Team be assigned to address specific areas of the
issue at hand.
Security and safety issues take precedence over any and all New Features and Enhancement
tasks.
The Security Breach Task Force will work continuously with the Keep-IT-Secure-24 service
team to assess and correct the issue, remove the security threat, assess the damage caused by
the security breach, and correct any problems that might have resulted from it.
Final Report
Once the security threat has been corrected and closed, the True-Kare Single Point of Contact
will compose a report that informs all our Partners of:
● the extent of the damage
● the security problems that were discovered
● the security features that were implemented to address further problems
● all steps taken to address the security problem
The True-Kare Single Point of Contact will also directly contact all partners whose data might
have been compromised in any way, in order to explain further what steps were taken to correct
the problem and, if necessary, to set up a Task Force to work with the Partner to review any
and all data problems that might have arisen from this issue.
Disaster Recovery and Business Continuity
All True-Kare Platform components are redundant, and some also have failover components
that take over operation in the event of a catastrophic failure.
To achieve Business Continuity at all times, the True-Kare infrastructure contains no single
point of failure, and the use of Amazon Web Services helps to ensure this.
High Availability and Fault Tolerance
The True-Kare Platform was designed from the ground up using as reference the Amazon Web
Services Reference Architectures for Web Application Hosting, Fault Tolerance and High-
Availability.
Load Balancing
All traffic to the True-Kare Platform is distributed across multiple components using
Elastic Load Balancers and other software load-balancing solutions. All load balancers use a
very aggressive configuration so that problems are detected before they affect any user.
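One common way to make a load balancer "aggressive" is to mark a backend unhealthy after only a few consecutive failed health checks. The Python sketch below illustrates that idea in general terms. The class `HealthTracker`, its threshold values and its behaviour are assumptions for illustration and are not taken from the True-Kare configuration.

```python
class HealthTracker:
    """Track one backend; flip its state after enough consecutive contrary checks."""

    def __init__(self, unhealthy_threshold: int = 2, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold  # failures needed to mark unhealthy
        self.healthy_threshold = healthy_threshold      # successes needed to mark healthy
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result and return the current state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= limit:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

Lower thresholds (and shorter check intervals) remove a failing backend from rotation sooner, at the cost of reacting to brief, harmless glitches.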
Redundant components of the True-Kare Platform exist in different Availability Zones within
the same Region. This allows fast communication between all components of the True-Kare
Platform while also achieving High-Availability and Fault Tolerance, because the Platform does
not depend on the correct working of any specific Availability Zone.
Data Transmission Loss Prevention
The True-Kare Platform, and in particular the daemons that communicate with devices, was
designed from the ground up with high availability and fault tolerance in mind.
Any communication with the True-Kare Platform that is not acknowledged must be repeated; this
applies to both the True-Kare Platform and the True-Kare Devices. All data is verified for
consistency to avoid data duplication where possible.
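The combination described above, repeating unacknowledged transmissions while avoiding duplicated data, can be sketched in Python as follows. The actual True-Kare protocol is not described in this document, so the sender-assigned message ID, the `Receiver` class and the `send_with_retry` helper are all hypothetical.

```python
class Receiver:
    """Receiving side: stores each message once, keyed by a sender-assigned ID."""

    def __init__(self):
        self.stored = {}

    def receive(self, message_id: str, payload: str) -> bool:
        # A repeated ID is acknowledged again but not stored twice.
        self.stored.setdefault(message_id, payload)
        return True  # acknowledgement


def send_with_retry(send, message_id: str, payload: str, max_attempts: int = 5) -> bool:
    """Repeat the transmission until it is acknowledged or attempts run out."""
    for _ in range(max_attempts):
        try:
            if send(message_id, payload):
                return True
        except ConnectionError:
            pass  # treated as "not acknowledged"; try again
    return False
```

The important case is a message that arrives while its acknowledgement is lost: the sender repeats it, and the receiver recognises the ID and keeps a single copy.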
Data Synchronization processes are monitored automatically by the True-Kare IT Department,
with alerts for anomalous situations. These situations are reported to the Engineering and
Development Teams at True-Kare, which work together to correct any problems that have been
detected.
This is only possible because True-Kare has its own Engineering and Development team, which
controls all aspects of the services being developed without requiring third-party companies
to intervene.
Stored Data Loss Prevention
All True-Kare Platform Databases are configured, as stated before, in a High-Availability and
Fault Tolerance setup.
All databases are replicated to nodes in different Availability Zones within the same Region.
Backups are performed daily inside the Amazon Web Services infrastructure and stored there
for 1 month.
All backups are also transmitted to, and stored off-site at, the True-Kare office for even
longer periods of time.
Every week, backups are imported and tested, not only to verify their correctness and
integrity but also to measure the time needed for a total recovery of the Database.
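A weekly restore test of the kind described above can be sketched in Python as shown below. This document does not name the database engine, so SQLite from the Python standard library is used purely as a stand-in; the function `restore_and_verify` and the expected row counts are assumptions for illustration.

```python
import sqlite3
import time


def restore_and_verify(dump_sql: str, expected_rows: dict) -> float:
    """Import a SQL dump into a scratch database, check it, and return elapsed seconds."""
    started = time.perf_counter()
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(dump_sql)
        # Built-in SQLite consistency check; yields the single value "ok" when healthy.
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"integrity check failed: {status}")
        # Compare row counts against figures recorded when the backup was taken.
        for table, count in expected_rows.items():
            found = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            if found != count:
                raise RuntimeError(f"{table}: expected {count} rows, found {found}")
    finally:
        conn.close()
    return time.perf_counter() - started
```

The returned duration is the "time measurement" part of the test: recorded week after week, it shows how long a full recovery would take.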
Offsite Data Replication
True-Kare is continuously improving not only its services and products but also all the
processes and methodologies used internally to address issues that might arise.
Asynchronous replication of database data to an off-site infrastructure is currently being
tested on the True-Kare QA environment. This infrastructure is hosted on a cloud other than
Amazon Web Services, in Amsterdam, the Netherlands.
In the future this will allow a quicker recovery of the entire platform in the case of a
disastrous failure within the Amazon Web Services cloud. True-Kare has never experienced such
a failure, but it is a possible scenario and is being addressed.
Monitoring of all Components
All True-Kare Platforms and individual components are under continuous, uninterrupted
monitoring, with automatic alarms for any and all problems or threshold conditions of the
servers and of the services provided by those servers.
Because all components of the True-Kare Platform exist in a redundant High Availability and
Fault Tolerant setup, alarms and threshold conditions usually have no impact on customers and
partners.
Monitoring of the True-Kare Platform is performed 24/7, 365 days per year. Apart from the
Support Engineer assigned to monitoring, there is always:
● a Senior IT Engineer on call 24/7, 365 days per year
● a Development Engineer on call 24/7, 365 days per year
Situations that cannot be resolved by the Support Team on call are escalated quickly to the
department that can address the issue in question.
Business Continuity Plan
The True-Kare Business Continuity Plan, which is part of the Support Activities of the
True-Kare Support and IT Department, is the plan by which True-Kare ensures that all services
continue to operate even when major failures occur.
Discovery
Issues that affect the normal service of the True-Kare platform in any way are usually
discovered in one of two ways:
● Monitoring
● Reporting
The True-Kare Platform components are continuously monitored by the True-Kare Support and
IT Department. All components have a set of alarms configured to detect active faults or to
help prevent a fault in the near future.
Occasionally, and most often when new versions of the True-Kare Platform are deployed into
production, issues appear that were not detected during the Testing and QA phase of
development. These issues are sometimes reported by partners or clients, but more often by
True-Kare Support.
Due to the nature, design and configuration of the True-Kare Platform, most reported issues
have minimal impact on the service itself, the most common impact being some degradation of
performance.
Upon discovery of an issue, the Support team categorizes it and, depending on the type of
issue, either tries to remedy the situation immediately, if that is at all possible, or
involves any parties necessary to address it.
Minor Infrastructure Issue
A Minor Infrastructure Issue is an issue that:
● Has low or no impact on the True-Kare Platform (small degradation of performance)
● Has no visibility to clients and partners
The Support Team to which the issue was allocated corrects the issue following all
documentation available to the Support Team, performs a root cause analysis and reports to
the True-Kare IT Department Head.
If the root cause cannot be established, this is reported to the True-Kare IT Department Head,
who then assigns the case to a senior IT staff member for further analysis.
Major Infrastructure Issue
A Major Infrastructure Issue is an issue that has any of the following:
● Has impact on the True-Kare Platform (high degradation of performance, or total loss of a
layer)
● Has visibility to clients and partners
These issues are always allocated to a senior IT staff member, and all work is performed under
their supervision.
The issue is addressed by following all documentation available or by involving the necessary
on-call staff of the True-Kare Development Team.
Infrastructure Outage
An Infrastructure Outage Issue, or Fatal Issue, is an issue that has high impact on the
True-Kare Platform, such as no service being provided, and that affects clients and partners.
These issues are always allocated to a senior IT staff member, and all work is performed under
their supervision.
The issue is addressed by following all documentation available or by involving the necessary
staff of the True-Kare IT and Development Team, whether on call or not.
A senior True-Kare staff member is assigned as Single Point of Contact for Partners, and the
True-Kare Partners are contacted immediately and informed of the incident by means of a
report.
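The three issue categories defined above (Minor, Major, Outage) can be summarised as a small decision function. The Python sketch below is one possible reading of those definitions; the input flags and the names `Severity` and `classify` are assumptions for illustration and are not part of the True-Kare support tooling.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "Minor Infrastructure Issue"
    MAJOR = "Major Infrastructure Issue"
    OUTAGE = "Infrastructure Outage"


def classify(service_available: bool, high_degradation: bool, visible_to_partners: bool) -> Severity:
    """Map the observed impact of an issue to one of the three categories.

    `high_degradation` covers both high degradation of performance and the
    total loss of a layer, as in the Major Infrastructure Issue definition.
    """
    if not service_available:
        return Severity.OUTAGE  # no service is being provided
    if high_degradation or visible_to_partners:
        return Severity.MAJOR   # either condition alone is enough
    return Severity.MINOR       # low or no impact, not visible externally
```

The order of the checks matters: an outage is tested first because it would also satisfy the conditions for a major issue.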
Correction Followup
Upon correction of any incident on the True-Kare Platform, root cause analyses are performed,
even if there is no visibility to the Clients or Partners.
Several tasks arise from these root cause analyses:
● A further, more definitive correction of the issue
● Updates to monitoring to allow prevention or early detection of the issue
● Updates to Support Documentation on recovery
● Updates to Support Processes
True-Kare takes very seriously any and all issues that affect the True-Kare Platform and
Service, even if those issues are not visible to the public in general.
True-Kare Support Documentation is constantly being updated, and processes refined to allow
for prevention, early detection and fast resolution of issues.
Reporting
For any issue that is visible to our Clients and Partners, a report is made and Partners and
Clients are informed.
Follow-up reports detail the root cause analyses and the steps taken to prevent such issues,
detect them earlier and correct them faster.
Until the issue is closed, at least one report per day is sent to our Partners to keep them
informed of the efforts being made to minimize any and all impact on the service.
Support Documentation and Process Update
The True-Kare Support Documentation is continuously updated with new information and
resolution data.
On discovery of issues and their resolutions, the Support Documentation is modified and
adapted to address such issues faster.
Monitoring Alarms and Triggers are also constantly updated with more variables, and as the
True-Kare Platform improves, new and more aggressive detection strategies are constantly put
to use.
Disaster Recovery Plan
Due to the way the True-Kare Platform was designed and implemented, it actually works as two
different and independent infrastructures that are load balanced through several devices,
such as Elastic Load Balancers.
All True-Kare Platform components exist in at least 2 different Availability Zones inside the
Amazon Web Services European Region. Each Availability Zone is isolated and is connected to
the other Availability Zones in the same Region through low-latency links.
In the event of an Availability Zone failure, all traffic is automatically redirected to the
remaining available Availability Zone. Such an event reduces the performance of the overall
True-Kare Platform and Service to, at worst, half of normal. In some cases the reduction is
smaller, because some components are replicated across 3 Availability Zones.
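The capacity figures above follow from simple arithmetic if load is assumed to be spread evenly across Availability Zones: losing one of two zones leaves one half of the capacity, and losing one of three leaves two thirds. The Python sketch below works this out; the even-distribution assumption and the function name `remaining_capacity` are introduced here for illustration only.

```python
from fractions import Fraction


def remaining_capacity(zones: int, failed: int = 1) -> Fraction:
    """Fraction of capacity left after `failed` of `zones` equally loaded zones go down."""
    if zones < 1 or not 0 <= failed <= zones:
        raise ValueError("need zones >= 1 and 0 <= failed <= zones")
    return Fraction(zones - failed, zones)
```

With these assumptions, `remaining_capacity(2)` is 1/2 and `remaining_capacity(3)` is 2/3, which matches the statement that components spread over 3 zones lose less than half.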
In the event of a complete and catastrophic failure of all Availability Zones of Amazon Web
Services for the European Region, the Disaster Recovery Plan is put into effect.
As well as having a redundant infrastructure setup at Amazon Web Services, True-Kare also
has another infrastructure in the Netherlands.
Discovery
The Disaster Recovery Plan is executed when a Complete Infrastructure Outage cannot be
resolved.
At this point, all Senior True-Kare Support Staff, the Senior True-Kare IT Department and
Partners have already been aligned and informed about the outage.
Disaster Recovery
To start the Disaster Recovery process, the following steps are performed in accordance with
the True-Kare Infrastructure documentation:
● The Netherlands True-Kare infrastructure is deployed with the Production Version of the
True-Kare Platform, using scripts that are already in place.
● The latest DB backup is imported into the database, using scripts that are already in
place.
● Communication Daemons are started with the latest version, using scripts that are already
in place.
● DNS records are modified in accordance with the True-Kare Infrastructure
Documentation.
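The steps above can be sketched as a simple ordered runbook. This is only an illustration: it
assumes the steps run in the listed order and that each depends on the previous one, and the
placeholder actions stand in for the already in place scripts, whose names and interfaces are
not described in this policy.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action returning True on success.
Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # assumed: later steps depend on earlier ones
        completed.append(name)
    return completed


# Hypothetical placeholders; each would wrap one of the existing scripts.
DR_STEPS: List[Step] = [
    ("deploy production version to Netherlands infrastructure", lambda: True),
    ("import latest DB backup", lambda: True),
    ("start communication daemons", lambda: True),
    ("modify DNS records", lambda: True),
]

for finished in run_recovery(DR_STEPS):
    print("done:", finished)
```

Stopping at the first failed step keeps DNS from being redirected to an infrastructure that is
not yet ready to serve traffic.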
DNS records for True-Kare are kept with low TTLs in order to allow quick redirection of traffic.
Databases are set up with several types of logs that allow for recovery and avoid loss of data
when returning to normal operations.
Return to Normal Operation
On returning to Normal Operation, the Amazon Web Services platform is started, all True-Kare
Platform software is loaded in Write-Only Emergency Mode, and DNS records are changed back to
point to the normal True-Kare Platform.
The Write-Only Emergency Mode has the following particularities:
● No updates are allowed from any True-Kare Platform component
● No updates are allowed from Devices
● Only Emergency actions from Devices are recorded and acted upon.
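The three rules above amount to a simple acceptance check. The sketch below is a minimal
illustration; the Mode enum, the is_accepted function and the "platform"/"device" and
"update"/"emergency" labels are assumptions made for the example, not names taken from the
True-Kare Platform.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    EMERGENCY = "write-only emergency"


def is_accepted(mode: Mode, source: str, kind: str) -> bool:
    """Decide whether an incoming operation is processed immediately.

    source is "platform" or "device"; kind is "update" or "emergency".
    These labels are hypothetical and used only for illustration.
    """
    if mode is Mode.NORMAL:
        return True
    # In emergency mode only emergency actions from devices are processed.
    return source == "device" and kind == "emergency"


print(is_accepted(Mode.EMERGENCY, "device", "emergency"))  # True
print(is_accepted(Mode.EMERGENCY, "device", "update"))     # False
print(is_accepted(Mode.EMERGENCY, "platform", "update"))   # False
```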
During this stage, the Database is updated with the new events that occurred on the Disaster
Recovery Platform, using the database logs and the operation logs of the True-Kare
Communication Daemons. This brings the normal True-Kare Platform up to date with all changes
performed on the Disaster Recovery Platform.
The time needed to bring the database up to date depends on how long the Disaster Recovery
Platform was in use.
Once the Production Database is up to date, the True-Kare Platform is set to Normal Mode,
which allows normal usage of the True-Kare Platform and allows queued events from
devices to be processed.
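The catch-up stage can be pictured as replaying two logs in time order. The sketch below is a
simplified illustration only: the (timestamp, key, value) record shape, the replay function and
the sample keys are assumptions, and the real database and Communication Daemon logs are not
described in this policy.

```python
import heapq
from typing import Dict, Iterable, Tuple

# Hypothetical, simplified change record: (timestamp, key, value).
Change = Tuple[int, str, str]


def replay(db_log: Iterable[Change], daemon_log: Iterable[Change],
           production: Dict[str, str]) -> Dict[str, str]:
    """Apply changes from both logs to production in timestamp order.

    Each log must already be sorted by timestamp.
    """
    for _timestamp, key, value in heapq.merge(db_log, daemon_log):
        production[key] = value  # later changes overwrite earlier ones
    return production


# Sample data invented for the example.
db_log = [(1, "user:7:phone", "111"), (4, "user:7:phone", "222")]
daemon_log = [(2, "device:3:status", "online"), (3, "device:3:status", "alarm")]

state = replay(db_log, daemon_log, {"user:7:phone": "000"})
print(state["user:7:phone"], state["device:3:status"])
```

Merging by timestamp means the most recent change to each record wins, whichever log it came
from.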
Reporting
It is True-Kare policy to be as open and direct as possible with its Partners; throughout all
activities, update reports are prepared and sent to all Partners.
As with any issue with the True-Kare Platform, a root cause analysis is performed in order to
ascertain the causes of the outage, ways to mitigate and prevent future problems, and
adjustments to be made both to the True-Kare Infrastructure Documentation and to the True-Kare
Support Documentation.
Any and all process improvements that are found to improve the True-Kare responsiveness to
such events are made and documented.
All of these improvements are communicated to our Partners.
Future Disaster Recovery Improvements
As stated before, True-Kare is absolutely committed to improving all areas of our service and
that obviously includes the responsiveness to catastrophic events that might lead to a Disaster
Recovery Scenario.
Because of this, True-Kare is at this time testing Asynchronous Replication of all the Production
databases to our Netherlands off-site infrastructure.
Once this testing is complete, and if it proves to be a feasible and better alternative to the
existing Disaster Recovery Plan, the entire Disaster Recovery Plan will be rewritten to take
into account that a live, almost up-to-date database replica (a delta on the order of seconds)
exists.
This will allow quicker recovery from a catastrophic event and a faster return to normal
operation, since only the production database will need to be synchronized.
Partners are informed proactively of all new modifications and improvements being made.