Cloud Storage Oriented Cipher-text Search Protocol
Catalogue
1. Introduction
1.1 Background
1.2 Purpose
1.3 Application
1.4 Terminology
1.5 Symbol description
1.6 Normative reference
2. Overview
2.1 Protocol overview
2.2 Design Philosophy
2.3 Requirements for Design
3. Data Types
3.1 Definition
3.1.1 File
3.1.2 Array
3.1.3 Index
3.1.4 Token
3.1.5 Proof
3.1.6 Hash table
3.1.7 Merkle Hash Tree
3.2 Implementation
4. Message Types
5. File Storage
5.1 Overview
5.2 Generate index
5.3 Generate keys
5.4 Encryption
5.5 Generate
5.6 Upload file
5.7 Store files
6. File Search
6.1 Overview
6.2 Generate search token
6.3 Send search token
6.4 Search
6.5 Return result
6.6 Decryption
7. Challenge and Proof
7.1 Overview
7.2 Generate challenge
7.3 Send challenge
7.4 Generate proof
7.5 Send proof
7.6 Validate proof
8. File Update
8.1 Overview
8.2 Generate Keys
8.3 Generate update token
8.4 Send token/file
8.5 Update files
8.6 Update index
8.7 Update Search Authentication Token
8.8 Return new DSA
8.9 Update DSA
9. Error Handling
1. Introduction
1.1 Background
In recent years, with the rapid development of cloud computing, cloud storage,
one of its most important parts, has become a research hotspot. Technically
speaking, cloud storage refers to a system consisting of large numbers of network
storage devices of different types working together. These devices use cluster
applications, grid technology and distributed file systems to provide storage and
business access.
At present, the rapid development of computer technology and Internet
applications causes data to grow exponentially, and the demand for storage keeps
increasing. Under this trend, the emergence and development of cloud storage not
only brings people cheap storage services but also challenges traditional data storage
services. As a new storage method, cloud storage has significant advantages over
traditional storage. First, in cloud storage, storage exists as a service. When a user
needs storage, he applies to the cloud storage service provider for space of the
appropriate size and avoids constructing and managing a storage platform himself.
This ensures full utilization of storage resources and reduces the user’s storage
cost. Second, cloud storage can provide data backup, disaster recovery, load
balancing and other functions, so when some storage nodes are upgraded or damaged, it
can still serve users normally and avoid interruption of service.
In addition, authorized users can access the cloud storage service from any place
through the network. This flexibility will play a great role in promoting the
development of the mobile Internet. The scalability, low cost, unrestricted access and
easy management of cloud storage pose a great challenge to the traditional
storage method.
At the same time, cloud storage also has problems. In cloud storage, all data
are handed over to the cloud storage provider (CSP), and users lose absolute control
of the data, which inevitably makes them concerned about data security. Cloud storage
provides security protection measures, such as the SSL (Secure Sockets Layer) and
TLS (Transport Layer Security) protocols for data transmission, data encryption and
firewall settings. However, because management is centralized at the CSP, data
security depends entirely on the security of the cloud storage system, the quality of
the data administrators and other factors under the provider’s control. In addition,
data stored in the cloud has become a primary target for malicious users and hackers,
and if data isolation fails, the private data of users may be leaked. Although cloud
storage providers offer an SLA (Service Level Agreement) to describe the service
levels they provide, various uncontrolled factors still worry users. Data security is
a key problem in cloud storage. A 2012 survey by TwinStrata shows that only 20% of
users are willing to store their private data in the cloud, and about 50% of people
are willing to use cloud storage for data backup, archival storage, disaster recovery
and so on. Thus, the problem of data security hinders the adoption of cloud storage.
Cloud storage vendors such as Microsoft, VMware, Amazon and Google have all
launched their own cloud storage services and give certain assurances of data
security, such as a variety of encryption and authentication means, to protect the
privacy of users. But security incidents still occur. In 2005, encrypted tapes
belonging to Bank of America were lost, resulting in the disclosure of a large amount
of customer information. In April 2011, the information of nearly 77 million online
customers of Sony was stolen, including credit card data. In June of the same year,
Google was attacked and the mailbox accounts of some important personnel were
compromised. These security events have caused users to lose trust in these cloud
service providers.
1.2 Purpose
This protocol constructs a dynamic cipher-text search model and a search
verification model based on cloud storage from the perspective of protecting data
privacy. These models enable users to store their private data with an untrusted
party. Even if the data are stolen, no information about the plaintext of the data is
disclosed. The models also support keyword-based search and the dynamic addition and
deletion of files.
1.3 Application
A system based on this protocol can be used in confidential departments
and sensitive commercial sectors. These departments and sectors can store a large
amount of secret information in cipher-text form on the cloud server and retrieve it
at any time as necessary.
1.4 Terminology
Keyword. A keyword is a word that indicates the content of a file; it generalizes
and concentrates the information in the file. In this protocol it refers to the
words selected as the identifications of the files.
Linked list. A linked list is a storage structure whose physical storage units
are non-contiguous and non-sequential. The logical sequence of the data elements is
implemented by the order in which pointers link them. The list consists of a series
of nodes (each element in the list is called a node), which can be generated
dynamically at runtime. Each node consists of two parts: the data field, which stores
the data element, and the pointer field, which stores the address of the next
node. Compared with a sequential list, a linked list is more convenient
for insertion and deletion.
Array. An array organizes variables of the same type in order so that they
can be handled easily.
Pseudo-random function. A pseudo-random function (PRF) is a keyed function whose
outputs cannot be efficiently distinguished from those of a truly random function by
anyone who does not know the key.
Token. In networking, a token is a special frame that controls which station may
occupy the medium, as distinct from data frames and other control frames. In this
protocol a token is a data format used to represent the transmission type of a
message and the transmitted data.
Inverted index. An inverted index is derived from practical applications that
need to find records according to the value of an attribute. Each entry of the index
table includes an attribute value and the addresses of all records having that
attribute value. The position of a record is determined from the attribute value,
rather than the attribute value from the record.
Encryption. Encryption turns the original plaintext files or data into
unreadable code, often referred to as "cipher-text", according to a certain algorithm.
The original content can be recovered only after the corresponding key is input. In
this way, data is protected from being stolen or read by unauthorized people.
MD5. MD5 (Message Digest Algorithm, version 5) is a hash function widely used
in the computer security field to check the integrity of messages. The algorithm
transforms a byte string of arbitrary length into a fixed-length 128-bit value.
Cloud storage. Cloud storage is developed on the basis of cloud computing. It
focuses on providing users with online storage services over the Internet. Cloud
storage uses software to make a large number of storage devices of different types
cooperate in providing external data storage services.
Digital signature. A digital signature is data appended to a data unit,
or a cryptographic transformation applied to the data unit. This data or
transformation allows the recipient of the data unit to confirm its source and
integrity and protects the data from being forged.
Key. A key is a parameter that is input to the algorithm
converting plaintext into cipher-text or cipher-text into plaintext.
1.5 Symbol description
Symbol    Description
F         The file set including n files
#F        The number of files
          The files including m keywords
W         The set of keywords
#W        The number of keywords
          The file set including the keyword w
          The number of files including the keyword w
          The linked list made up of files including keyword w
          The linked list made up of all keywords in file f
          Inverted index
free      A special keyword, satisfied
As        Search array, used to store the keyword linked lists
Ts        Dictionary, used to record the head nodes of the keyword linked lists
Ad        Deletion array, used to store the file linked lists
Td        Dictionary, used to record the head nodes of the file linked lists
γ         Encrypted index, defined as
Function description:
The algorithms used in the search part:
Description of algorithm functions
: Runs on the client; generates the keys for the symmetric encryption algorithm and
the pseudo-random function.
: Uses the key to encrypt the user’s files and keyword information into the
cipher-text and the encrypted index.
: Uses the user’s keyword to generate the corresponding search token.
: Uses the search token to perform the search operation on the encrypted index.
: Generates the add token according to the files to be added and the corresponding
keywords.
: Uses the received add token to add files and update the stored encrypted index.
: Generates the delete token according to the files to be deleted.
: Uses the received delete token and the files to be deleted to update the stored
encrypted index.
The algorithms used in the authentication part:
Description of algorithm functions
: Generates the key used in the algorithm; the client is responsible for keeping the
key.
: Generates the search authenticator when files are stored.
: Run by the client; generates the challenge for a search on some keyword.
: Run by the server; generates the authentication path for a given search.
: The verification algorithm. Run by the client; verifies the proof sent by the
server.
: Generates the add token according to the files to be added and the corresponding
keywords.
: Uses the received add token to add files and update the stored encrypted index.
: Generates the delete token according to the files to be deleted.
: Uses the received delete token and the files to be deleted to update the stored
encrypted index.
: Updates the DSA state.
1.6 Normative reference
Kamara S, Lauter K. Cryptographic cloud storage[J]. Financial Cryptography and Data Security, 2010: 136-149.
2. Overview
2.1 Protocol overview
The system uses a C/S (client/server) architecture and is composed of two entities, the client and the server. The main functions of the client are key generation, data encryption/decryption, authenticator generation, token generation and so on. The main functions of the server are searching, proof generation, update operations and so on. The overall framework is shown in Figure 2.1.
Figure 2.1 The frame structure of the cloud storage system
Figure 2.1: ① file encryption/decryption at the client; ② generation of the search
token from keywords at the client; ③ generation of add/delete tokens from the files
to be added/deleted at the client; ④ storage of the files the user uploads at the
server; ⑤ search on the encrypted index according to the received search token at the
server; ⑥ update of files according to the received add/delete token at the server;
⑦ data interaction between the client and the server.
As can be seen from figure 2.1, the main function of the client is obtaining the
original data from the users (including the files to be uploaded, the searching
keywords, and the files to be updated), processing data and uploading data to the
cloud. The main function of the server is receiving the data sent by client and doing
the corresponding operations, mainly including storing, searching, and updating.
2.2 Design Philosophy
The protocol analyzes the security of existing cloud storage systems and puts
forward a secure framework for a cloud storage system.
Fig. 2.2 Complete Security Model of Cloud Storage
The construction of the secure cloud storage system model is based on the
Searchable Symmetric Encryption (SSE) algorithm, combined with the secure cloud
storage system architecture. Through the SSE algorithm, the user can encrypt the data
and the index, and send the cipher-text and the secure index to the cloud service
provider for storage. When a search is executed, the cloud service provider searches
the secure index using the search token generated by the user and returns the
cipher-text set to the user. The user can then decrypt the received result and get
the plaintext files corresponding to the search keyword. In addition, users can add
and delete files at any time, and the correctness of the index is still guaranteed.
In order to verify the search results returned by the server, a dynamic search
authentication (DSA) algorithm is designed. The algorithm is based on an improved
Merkle authentication dictionary and can validate the correctness of the search
result. It also supports token-based update operations and aims for high efficiency
in communication and computation.
Searchable encryption
The model involves only two entities. One is the owner of the confidential
data, who hopes to store the data in the cloud and prevent illegal access to it.
This entity is called the user (Client). The other entity is the
cloud storage service provider, which exposes a storage interface, stores the data
and performs specific search operations on the data. It is called the server (Server).
As described above, in order to guarantee the security of the data to
the maximum extent, essentially all operations that process user data are placed at
the client, including encryption of the user’s files, encryption of the file index
and processing of keywords. The server only needs to store the files and provide a
limited retrieval function.
Fig. 2.3 Searchable Encryption Model of Cloud Environment
As can be seen from the figure, the user selects the file set to be stored,
preprocesses the files and then uploads them to the cloud. The preprocessing of the
files is divided into two parts that execute simultaneously. One part uses a
symmetric encryption algorithm to encrypt the file set into the cipher-text set,
which is then uploaded to the cloud storage server. The other part constructs the
index from the keywords of the files, encrypts the index using a special encryption
method and stores the result, called the encrypted index, in the cloud. The storage
of the files and the index is managed by the cloud storage service provider. Users
only need to upload the files, without caring about the details of file storage.
When a user searches for a keyword, the client generates the search token
corresponding to the keyword using the method provided by the algorithm and
sends the token to the cloud storage server. The server then performs the search
operation and returns the result.
The key to constructing the searchable encryption algorithm lies in the
encryption of the file index. In order to obtain a better search experience, this
protocol uses keywords specified by the user in advance. After the keyword
information is obtained, the keywords are preprocessed. For each keyword, a keyword
linked list is constructed from the files containing that keyword, and the file
identifiers are written into the corresponding nodes of the list. All of the keyword
linked lists form the inverted index. In order to ensure that the server cannot
obtain useful information from the index, pseudo-random functions (PRFs) are used to
encrypt the inverted index. The encrypted nodes are stored at random positions of the
search array As, and the head node of each list is stored in the dictionary Ts (also
called the search table). The processed arrays and dictionaries are stored on the
server. Because their elements are all encrypted, the server cannot obtain plaintext
information directly from the search array and the search table. When the user
searches for a keyword, the client processes the keyword to get the search token,
which contains the information designating the position of the keyword in the
encrypted index. After the server receives the search token, it reads the user’s
encrypted index, performs the search operation, gets the file identifiers, and sends
the corresponding cipher-texts to the client.
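The index construction and token-based lookup described above can be sketched in
Python as follows. This is a minimal illustration under stated assumptions, not the
protocol's actual algorithm: HMAC-SHA256 stands in for the pseudo-random function,
each keyword's file list is masked with a PRF-derived keystream instead of being
stored as a linked list at random positions of As, and the names prf, keystream,
build_index, search_token and search are hypothetical.

```python
import hashlib
import hmac


def prf(key, msg):
    """Assumed PRF: HMAC-SHA256 (the source does not name a concrete PRF)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def keystream(key, n):
    """Expand a key into n pseudo-random bytes by running the PRF in counter mode."""
    out = b""
    counter = 0
    while len(out) < n:
        out += prf(key, counter.to_bytes(4, "big"))
        counter += 1
    return out[:n]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def build_index(files, k1, k2):
    """Client side. files maps file identifier -> list of keywords."""
    inverted = {}
    for file_id, keywords in files.items():
        for w in keywords:
            inverted.setdefault(w, []).append(file_id)
    encrypted = {}
    for w, ids in inverted.items():
        label = prf(k1, w.encode())            # hides the keyword itself
        payload = "\n".join(ids).encode()      # identifiers must not contain "\n"
        mask = keystream(prf(k2, w.encode()), len(payload))
        encrypted[label] = xor(payload, mask)  # hides the file identifiers
    return encrypted


def search_token(w, k1, k2):
    """Client side: everything the server needs for one keyword, and nothing else."""
    return prf(k1, w.encode()), prf(k2, w.encode())


def search(encrypted_index, token):
    """Server side: locate and unmask one entry without learning the keyword."""
    label, mask_key = token
    blob = encrypted_index.get(label)
    if blob is None:
        return []
    return xor(blob, keystream(mask_key, len(blob))).decode().split("\n")
```

Without a token the server sees only pseudo-random labels and masked payloads; with a
token it learns which file identifiers match, which corresponds to the information
the text says the server is allowed to obtain.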
Users of cloud storage may add or delete files at any time, so the
protocol must support dynamic addition and deletion. The previous discussion shows
that the key to the search lies in the construction of the encrypted index. To ensure
that the search operation remains efficient and correct after the user adds or
deletes files, the encrypted index must be updated whenever files are added or
deleted. When the user adds a file, each keyword the file contains may already exist
or may be new. In either case, only the corresponding keyword linked list needs to be
updated, and the operation is not difficult. When the user deletes a file, the file
contains different keywords, and its node may be at any position in each keyword
linked list. Every node of each linked list for those keywords must therefore be
traversed. After the node is deleted, the continuity of the linked list must also be
preserved. The deletion operation is thus complex and inefficient.
In order to update the encrypted index more efficiently when a file is deleted, a
file linked list is constructed from the keywords of each file. All the file linked
lists form the file index. The index is encrypted and stored at random positions of
an array called the deletion array Ad, and the head node of each linked list is
stored in the dictionary Td (deletion table). When a file is deleted, the positions
in As of the nodes corresponding to the file are found through the deletion array, As
is updated accordingly, and the corresponding file linked list is deleted from Ad. In
order to ensure that the server cannot get the user’s file information from the
arrays, random strings are used to fill the unused cells. At the same time, in order
to be able to find a free node in As when adding a file, the idle nodes of the array
need to be recorded. This protocol uses a special keyword to construct a linked list
of the idle nodes and stores the head node of this list in the search table, in the
same way as for the inverted index.
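The bookkeeping described above (random placement in As, random padding of unused
cells, a record of idle cells, and per-file information that makes deletion cheap)
can be modelled with the following plaintext Python sketch. It is a simplification
under stated assumptions: nodes are not encrypted, the idle cells are tracked in a
shuffled Python list rather than in a linked list under the special keyword, nodes
carry both prev and next pointers so that a node can be unlinked without traversal,
and Td maps a file directly to the As positions of its nodes instead of pointing into
a separate array Ad. The class name DynamicIndex and its methods are hypothetical.

```python
import os
import random


class DynamicIndex:
    """Plaintext model of As/Ts with a per-file record in the role of Ad/Td."""

    def __init__(self, size, cell_len=16):
        self.cell_len = cell_len
        # Unused cells hold random bytes so that they look like encrypted nodes.
        self.As = [os.urandom(cell_len) for _ in range(size)]
        self.free = list(range(size))
        random.shuffle(self.free)      # nodes land at random positions
        self.Ts = {}                   # keyword -> position of head node in As
        self.Td = {}                   # file id -> [(keyword, position in As)]

    def add(self, file_id, keywords):
        """Insert one node per (distinct) keyword at the head of its list."""
        for w in keywords:
            pos = self.free.pop()      # raises IndexError when As is full
            head = self.Ts.get(w)
            self.As[pos] = {"file": file_id, "prev": None, "next": head}
            if head is not None:
                self.As[head]["prev"] = pos
            self.Ts[w] = pos
            self.Td.setdefault(file_id, []).append((w, pos))

    def delete(self, file_id):
        """Unlink every node of the file without traversing the keyword lists."""
        for w, pos in self.Td.pop(file_id, []):
            node = self.As[pos]
            if node["prev"] is None:           # node was the head of its list
                if node["next"] is None:
                    del self.Ts[w]
                else:
                    self.Ts[w] = node["next"]
            else:
                self.As[node["prev"]]["next"] = node["next"]
            if node["next"] is not None:
                self.As[node["next"]]["prev"] = node["prev"]
            self.As[pos] = os.urandom(self.cell_len)   # re-pad the freed cell
            self.free.append(pos)

    def search(self, w):
        """Follow the keyword linked list from its head node."""
        result, pos = [], self.Ts.get(w)
        while pos is not None:
            result.append(self.As[pos]["file"])
            pos = self.As[pos]["next"]
        return result
```

Because Td records where each of a file's nodes lives, the cost of a deletion grows
with the number of keywords in the file rather than with the length of the keyword
lists, which is the efficiency gain the deletion index is meant to provide.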
Search authentication
The cloud storage model introduced above realizes keyword-based cipher-text
search. However, it lacks a verification mechanism for the search operation, so the
model is incomplete. It is therefore extended next with a function for verifying
search results.
The protocol uses the Merkle hash tree (MHT) as the basic authentication
structure. The file linked list associated with each keyword serves as the data
source of a leaf node in the MHT. The value of each node is calculated using a
one-way hash function, and a full binary tree is constructed from these values (to
simplify the operations). The value of the root node of the authentication tree
serves as the verification value and is stored by the user for subsequent
verification. The authentication tree itself is the authenticator and is stored by
the server. When a user searches for the files corresponding to a keyword, the search
token and the challenge for this keyword are generated at the same time. The server
performs the search operation according to the search token and, at the same time,
generates the verification path according to the challenge. After the user obtains
the search results and the proof, he decrypts the result and computes the value of
the leaf node in the MHT. He then calculates the final verification value from this
leaf value and the proof, and compares it with the value stored at the client. If
they are the same, the verification passes. Otherwise the verification fails and the
operation is terminated.
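The challenge, proof and verification steps above can be sketched in Python as
follows. This is an illustrative sketch under stated assumptions rather than the
protocol's DSA algorithm: SHA-256 is used as the one-way hash function, the challenge
is simply the index of the leaf, the number of leaves is a power of two, and the
function names build_tree, gen_proof and verify are hypothetical.

```python
import hashlib


def h(data):
    """Assumed one-way hash function (the choice of SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all levels of the tree: levels[0] are leaf hashes, levels[-1] is [root]."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def gen_proof(levels, index):
    """Server side: collect the sibling hash at every level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])   # sibling of the current node
        index //= 2                     # move to the parent
    return path


def verify(root, leaf, index, path):
    """Client side: recompute the root from the leaf data and the sibling path."""
    node = h(leaf)
    for sibling in path:
        if index % 2 == 0:
            node = h(node + sibling)    # current node is a left child
        else:
            node = h(sibling + node)    # current node is a right child
        index //= 2
    return node == root
```

The client keeps only the root value, and a proof contains one hash per tree level,
so its size grows with the height of the tree rather than with the number of
keywords.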
2.3 Requirements for Design
In order to make full use of the storage service provided by cloud storage, let
the server perform the search operation, and ensure that the server cannot get any
useful information during the interaction, this protocol designs a cipher-text search
method.
First of all, the user selects the files to be stored and adds some keywords
describing each file. The keyword index is then constructed from this keyword
information. In order to ensure that the index does not reveal file information, the
index must be encrypted with a special process. The user’s files are encrypted with a
symmetric encryption algorithm such as AES, and the cipher-text and the encrypted
index are sent together to the cloud storage server for storage.
When the user searches for a keyword, he inputs the keyword, and the client
processes it to get the keyword token and sends the token to the server. After
receiving the keyword token, the server searches the user’s encrypted index, finds
the cipher-texts corresponding to the token, and returns the result to the user. Note
that in this process the server does not know which keyword the user specified. The
only useful information it can obtain is which files correspond to a specific token.
Based on this idea, a cipher-text search method supporting keyword search is
constructed. It satisfies the user’s demand for storing confidential data in cloud
storage and gives the server the ability to search transparently.
3. Data Types
3.1 Definition
3.1.1 File
This section introduces the file types supported in the protocol. The file
operations in the protocol are file upload and file update.
The file types supported in this protocol are: document files (files with
suffixes such as .txt, .doc, .docx, .pptx and .xls), sound files (files with suffixes
such as .mp3 and .wav), and video files (files with suffixes such as .avi and .mp4).
3.1.2 Array
As: Search array. The linked list indexed by a keyword of the files is called a keyword linked list. The file identifiers are written into the corresponding nodes of the list. All the keyword linked lists form the inverted index. In order to ensure that the server cannot acquire any useful information from the index, a pseudo-random function is used to encrypt the inverted index. The encrypted index is stored at random positions of the search array As, and the head node of each linked list is stored in the dictionary Ts (Search Table).
Ad: Deletion array. In order to update the encrypted index efficiently, a linked list is constructed from the keywords of each file and is called a file linked list. All the file linked lists form the file index. The index is encrypted and then stored at random positions of the array called the deletion array Ad. The head node of each linked list is stored in the dictionary Td.
3.1.3 Index
This protocol contains two kinds of indexes: the inverted index and the encrypted index.
The inverted index is constructed using the keyword information of the files. The encrypted index is the inverted index encrypted using a special method.
3.1.4 Token
This protocol uses tokens to perform search and update operations. represents the tokens, which include the search token, the add token and the deletion token. The
search token is defined as . The add token is defined as , and the deletion token is
defined as .
Search token. The format of the search token is: , in
which w represents the keyword and k represents the key. When the user retrieves files containing a certain keyword, he first processes the keyword to get the corresponding search token. After the server receives the search token, it reads the user’s encrypted index and calls the corresponding search algorithm to obtain the cipher-text set corresponding to the token. Finally, the cipher-text set is sent to the user.
Add token. The format of the add token is: , in
which f represents the files to be added. When the user adds a file, the client first generates the add token using the file and its keyword information. The client then encrypts the file and sends the add token and the encrypted file to the server. The server receives the encrypted file, reads the user’s encrypted index and updates the encrypted index using the add token.
Deletion token. The format of the deletion token is:
, in which f represents the files to be deleted. The
process of deleting files is similar to that of adding files. The user may not have a local copy of the file when he wants to delete it, so the deletion algorithm needs to download the file to be deleted from the server and then generate the deletion token using the file.
3.1.5 Proof
Proof. The format of the proof is: , in which h represents the height of the authenticator . When a user searches for files with a certain keyword, he generates the search token and the challenge corresponding to this search. The server executes the search operation according to the search token and at the same time generates the certification path according to the challenge. This certification path is the proof.
3.1.6 Hash table
A hash table is a data structure that provides direct access based on a key value. It maps the key value to a position in the table where the record is stored, which speeds up the search. The mapping function is called the hash function, and the array which stores the records is called the hash table.
3.1.7 Merkle Hash Tree
The Merkle hash tree is an authentication structure based on a tree. It can be used to verify data integrity, and it uses only a one-way hash function in its computation.
The Merkle hash tree used in this protocol has 2^l leaf nodes, so it is a full binary tree. Because such a tree is also a complete binary tree, this document sometimes describes the Merkle hash tree as a complete binary tree.
To initialize the Merkle hash tree, the documents to be authenticated are mapped to the leaf nodes, and the tree is built upward from the leaves to the root using the hash function. The verifier then only needs to record the value of the root node and send the hash tree, as the authenticator, to the untrusted server. In the verification stage, the verifier generates a verification challenge for some leaf node. The server receives the challenge, generates the verification path corresponding to the position of the challenged leaf node, and transmits it to the verifier. The verifier checks the path against the stored root value. The computational and communication cost of this method is far lower than that of retrieving the complete data and validating it directly.
The figure below is an example showing how to generate a hash tree:
Fig.2.3 the structure of Merkle hash tree
Assume that the data set is and that each element Yi of the set can be the data source of a leaf node. The value of each leaf node of the hash tree is calculated with the one-way hash function F, and the calculation can be expressed as . After each leaf node value has been calculated, every two sibling node values are mapped, using the one-way hash function, to a single value that becomes the value of their parent node. The whole hash authentication tree is constructed in this way. When the one-way hash function F takes two node values as input, it outputs a fixed-length value, and the calculation can be expressed as . The root node value of the hash tree is expressed using the symbol and is stored by the verifier as the verification value. The hash tree itself, however, is stored by the third-party server.
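The bottom-up construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: MD5 is assumed for F because the reference structures in this document use 16-byte values, the parent value is assumed to be F applied to the concatenation of the two child values, and the names `build_merkle_tree` and `levels` are hypothetical.

```python
import hashlib


def F(data: bytes) -> bytes:
    """One-way hash function (assumption: MD5, giving 16-byte node values)."""
    return hashlib.md5(data).digest()


def build_merkle_tree(items):
    """Return the tree as a list of levels, leaves first and root last.

    Assumes the number of items is a power of two (2^l leaf nodes).
    """
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    level = [F(y) for y in items]          # leaf values F(Yi)
    levels = [level]
    while len(level) > 1:
        # Assumed combination rule: parent = F(left || right).
        level = [F(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


data = [b"Y1", b"Y2", b"Y3", b"Y4"]
levels = build_merkle_tree(data)
root = levels[-1][0]   # kept by the verifier; the tree itself goes to the server
```

Only `root` has to stay with the verifier, which is what keeps the client-side state small.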
3.2 Implementation
The index in the protocol can be implemented using a two-dimensional array. The Merkle hash tree can be implemented using a full binary tree. The remaining data structures can be implemented with ordinary variables. The reference implementation of the data structures in the protocol is given below.
Data structure | Definition | Description
Index | CArray<CStringArray*,CStringArray*> | A two-dimensional array, used to store the file index and the inverted index.
Hash table | hash_map<CString,char[16]> | Hash table storing the correspondence between each element and its MD5 value, to avoid repeated computation of the hash value.
MHT node | {char[16]} | The structure of an MHT node.
Search array(1) | char fileID[8] | File ID.
 | short loc_pre | The position of the previous node in the list.
 | short loc_next | The position of the next node in the list.
Search array(2) | bool flag | Indicates whether an array node has been used.
 | short loc_d_next | The position of the next node in the list.
 | short loc_d_dual_pre | The coordinate of the previous node of the dual node in Ad.
Deletion array(1) | short loc_d_dual_next | The coordinate of the next node of the dual node in Ad.
 | short loc_s | The coordinate of the dual node in As.
 | short loc_s_pre | The coordinate of the previous node of the dual node.
 | short loc_s_next | The coordinate of the next node of the dual node.
 | char fk1w[16] | The record of the value
Deletion array(1) | bool flag | Identifier.
 | char complexStr[32] | The record of the XOR value.
 | char randomStr[16] | The record of the random string.
The entrance address of MHT leaf | char fk1w[16] | The record of the entrance address of the MHT leaf node.
 | int loc |
proof | char updateInfo[16] | The definition of the structure of proof.
 | char prove[ProveSize] |
Search token | char fk1w[16] | The definition of the structure of search token.
 | char gk2w[16] |
 | char pk3w[16] |
Add token(1) | char fk1w[16] | The definition of the structure of add token.
 | char gk2w[16] |
 | char complexStr[16] |
 | char randomStr[16] |
Add token(2) | int count | The structure of add token, in which count represents the number of keywords in the file.
 | char fk1free[16] |
 | char gk2free[16] |
 | char fk1f[16] |
 | char gk2f[16] |
 | char pk3f[16] |
Deletion token(1) | char fk1f[16] | The definition of the structure of deletion token when every file is calculated.
 | char gk2f[16] |
 | char pk3f[16] |
 | char fk1free[16] |
 | char gk2free[16] |
Deletion token(2) | int count | The structure of deletion token, in which count represents the number of files.
4. Message Types
In the interaction between the client and the server, the format of the transmitted message is defined as follows.
The message contains the following fields:
1. Message type
The Message type field indicates the type of the transmitted message. The field uses 8 bits, and the first 4 bits are used to distinguish between an operation message and a notification message.
If it is an operation message, the information carried in the message is a specific operation and the first 4 bits are set to 0000. If it is a notification message, the message is used to notify whether the operation has been performed correctly and the first 4 bits are set to 0001.
(a) Add files (first upload). When the client adds files for the first time, it sends the encrypted files and indexes to the server. The field of the message is set to 0x01.
(b) Search operation. When the user performs a search operation, the client generates the search token and sends it to the server. The field of the message is set to 0x02. When the server returns the search results to the user, the field of the message is set to 0x03.
(c) Authentication operation. When authentication of the search data is requested, the client sends the challenge to the server and the field of the message is set to 0x04. When the server has generated the proof, it sends the proof to the client and the field of the message is set to 0x05.
(d) Update operation. When the user needs to add files to the server (not the first upload), the client sends the new files and the add token to the server. The server updates the files and the index using them. The field of the message is set to 0x06. When the user needs to delete files on the server, the client sends the deletion token to the server. The field of the message is set to 0x07. When the server executes a DSA status update, it sends the new DSA state to the client for authentication. The field of the message is set to 0x08.
(e) Operation tips. These are used by the server to notify whether the operation was successful. If the operation is successful, the field of the message is set to 0x00. If the operation fails, the server returns an error message and the field of the message is set to 0x01.
2. Length
This field represents the size, in bytes, of the data part of the transmission.
3. Direction
This field represents the direction of message transmission. When the client sends the message to the server, the field is set to 0x00. When the server sends the message to the client, the field is set to 0x01.
4. Type
This field represents the type of the transmitted data.
If the Data field in the transmission is an encrypted file, the field is set to 0x00.
If the Data field in the transmission is an index, the field is set to 0x01.
If the Data field in the transmission is an add token, the field is set to 0x02.
If the Data field in the transmission is a deletion token, the field is set to 0x03.
If the Data field in the transmission is a search token, the field is set to 0x04.
If the Data field in the transmission is a challenge, the field is set to 0x05.
If the Data field in the transmission is a proof, the field is set to 0x06.
If the Data field in the transmission is a search authenticator, the field is set to 0x07.
If the Data field in the transmission is a DSA state, the field is set to 0x08.
If the Data field in the transmission is error information, the field is set to 0x09.
5. Data
This field stores the data to be transmitted.
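As an illustration of the five fields above, the following sketch packs and unpacks one message. Only the 8-bit width of the Message type field is given in the text. The widths assumed here for the other fields (4-byte big-endian Length, 1-byte Direction, 1-byte Type), the field order on the wire, and the names `pack_message` and `unpack_message` are assumptions made for this example.

```python
import struct

# Assumed wire layout: message type (1 byte), length (4 bytes, big-endian),
# direction (1 byte), data type (1 byte), followed by the data itself.
HEADER = struct.Struct(">BIBB")

MSG_SEARCH = 0x02              # search operation, client sends search token
DIR_CLIENT_TO_SERVER = 0x00
DATA_SEARCH_TOKEN = 0x04


def pack_message(msg_type: int, direction: int, data_type: int, data: bytes) -> bytes:
    """Prefix the data with a header whose Length field is len(data)."""
    return HEADER.pack(msg_type, len(data), direction, data_type) + data


def unpack_message(raw: bytes):
    """Split a raw message back into (msg_type, direction, data_type, data)."""
    msg_type, length, direction, data_type = HEADER.unpack_from(raw)
    data = raw[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated message")
    return msg_type, direction, data_type, data


raw = pack_message(MSG_SEARCH, DIR_CLIENT_TO_SERVER, DATA_SEARCH_TOKEN, b"token-bytes")
```

Carrying the length in the header lets the receiver read exactly one message from a stream before interpreting the Data field.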
5. File Storage
5.1 Overview
When the user uploads files, he first chooses the files to upload from the local disk and attaches a keyword description (specified by the user) to each file. After the files have been chosen, the client preprocesses the data, which includes generating the encrypted index, the search authenticator and the encrypted files, and uploads them to the server for storage.
Fig.5.1.1 The flow chart of file storage section
The flow chart of the file storage section is shown in Figure 5.1.1. First, the client generates the keyword index according to the keywords. Then the client encrypts the index and the files to get the cipher-text c and the encrypted index λ. The client generates the Merkle hash tree, also known as the search authenticator, and sends the hash tree, the cipher-text c and the encrypted index λ to the server. The server receives these files, stores them locally, and returns a message informing the client whether the operation has been performed successfully.
5.2 Generate inverted index
The operation of generating the inverted index is performed locally on the client.
According to the keyword information of the files, the file linked lists are constructed based on the keywords. All the file linked lists form the inverted index. Encrypting the inverted index yields the encrypted index. The flow chart of generating the inverted index is shown in Figure 5.2.1.
Fig.5.2.1 The flow chart of generating the inverted index
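The construction of the inverted index from keyword information can be sketched as below. The file identifiers, the keywords and the function name `build_inverted_index` are hypothetical, and a Python list stands in for each linked list.

```python
from collections import defaultdict


def build_inverted_index(files):
    """Map each keyword to the list of files that carry it.

    `files` maps a file identifier to the keywords the user attached to it.
    """
    index = defaultdict(list)
    for file_id, keywords in files.items():
        for w in sorted(set(keywords)):   # ignore duplicate keywords per file
            index[w].append(file_id)
    return dict(index)


files = {"f1": ["w1", "w2"], "f2": ["w1", "w3"], "f3": ["w2"]}
inverted = build_inverted_index(files)
```

Each entry of `inverted` corresponds to one keyword list of the inverted index.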
5.3 Generate keys
The operation of generating keys is performed locally on the client.
Key generation process
: 1^k, the system security parameter, is the input of the function. Select three k-bit strings K1, K2, K3 at random as the keys of the pseudorandom functions. Compute as the key of the symmetric encryption algorithm. The algorithm outputs the key .
runs on the client to generate the keys of the symmetric encryption algorithm and the pseudorandom functions. The generated key is used only locally on the client, so key management on the client is very simple: the client only needs to store the key, and no key distribution is involved. Notably, file encryption, index encryption and update operations all require the key. If the user's key is lost, the user will be unable to retrieve their data from the cloud storage server.
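A minimal sketch of the key generation step follows. It assumes that the symmetric encryption key is also a random k-bit string, since the text does not preserve how that key is computed. The names `gen` and `K4` are hypothetical.

```python
import secrets


def gen(k_bits: int = 128):
    """Return (K1, K2, K3, K4) for a security parameter of k_bits bits.

    K1, K2, K3 key the pseudorandom functions. K4 stands in for the
    symmetric encryption key (assumption: drawn at random like the others).
    """
    if k_bits <= 0 or k_bits % 8:
        raise ValueError("k_bits must be a positive multiple of 8")
    n = k_bits // 8
    K1, K2, K3 = (secrets.token_bytes(n) for _ in range(3))
    K4 = secrets.token_bytes(n)
    return K1, K2, K3, K4


K = gen(128)
```

The keys never leave the client, so the only operational requirement is to store them safely.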
5.4 Encryption
The operation of encryption is performed locally on the client.
The process of encryption includes file encryption and index encryption.
After the encrypted index and the authenticator have been generated, the user's plaintext files are encrypted. File encryption is relatively simple: the client loops over the collection of files and encrypts each one with the symmetric encryption algorithm. Before encryption, the file name and the keyword information are appended to the end of the plaintext file so that they can be used to generate deletion tokens later.
To generate the encrypted index, first traverse the inverted index to generate the MD5 values of the files and keywords, and fill the search array and the search table. Then traverse the file linked lists and use the MD5 values of the files and keywords to fill in the deletion array and its search table. After the traversal is complete, construct the free linked list and store it in the array. Finally, write the generated search array, deletion array and two search tables to a file and save them on disk.
The flow chart of generating the encrypted index is shown in Figure 5.4.1.
Fig.5.4.1 The flow chart of generating encrypted index
The encryption algorithm is described in detail below through a formal definition.
Encryption algorithm process
: Input key K, file set , inverted index , and process as follows:
1. Initialize array , and dictionary , .
2. For each keyword , process as follows:
(a) Create the linked list , and the list contains nodes . These nodes will be stored in the array randomly. Define .
Among them, represent the ith document identification and is a random string to be filled. and are all defined as 0.
(b) Store the address of the head node of each linked-list in the search table . The structure of is defined as . represents the coordinate of the dual node of in the array .
3. For each in every file, process as follows:
(a) Construct the linked-list which contains nodes and store the nodes in the array randomly. Notice that every node is associated with a keyword , therefore it is also associated with a node in the linked list . and are defined as the previous node and the next node of in the keyword linked-list. The structure of the node is defined as:
represents the random string to be filled. is defined as 0.
(b) Store the address of the head node of each linked-list in the search table . The structure of is defined as .
4. Select unused nodes and randomly each from the free array . For each node , the structure of is defined as: . Set as 0 and store the head node of the linked-list in the search table . The structure is defined as .
5. Fill the other unused nodes in the array and with random strings.
6. Encrypt each file and get the cipher-text .
7. The algorithm outputs the encrypted file set and the encrypted index .
Note that the MD5 values of all the files and keywords have already been calculated in the process of generating the encrypted index. To improve processing efficiency, the data and their MD5 values are written into the hash table (used as a dictionary) for later use.
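The reuse of MD5 values through a hash table can be sketched as follows. A Python dict plays the role of the `hash_map<CString,char[16]>` listed in the data structure table. The class name `Md5Cache` and its `misses` counter are hypothetical and exist only to make the saving visible.

```python
import hashlib


class Md5Cache:
    """Remember the MD5 value of each element so it is computed only once."""

    def __init__(self):
        self._table = {}   # element -> 16-byte MD5 value
        self.misses = 0    # number of times MD5 actually had to be computed

    def digest(self, element: str) -> bytes:
        value = self._table.get(element)
        if value is None:
            value = hashlib.md5(element.encode("utf-8")).digest()
            self._table[element] = value
            self.misses += 1
        return value


cache = Md5Cache()
first = cache.digest("w1")
second = cache.digest("w1")   # served from the table, no recomputation
```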
To aid understanding, a simple example of constructing an encrypted index is given here. Suppose that there are three files to be uploaded: , , , where w represents a keyword of the file. First, the inverted index is constructed using the files and the keyword information. Then the inverted index and the files are used to construct the encrypted index. The results are shown in Figure 5.4.2.
Fig.5.4.2 Structure of the encrypted index
The detailed construction process is as follows:
① Construct the file linked lists, distinguished by keyword, according to the keyword information. All the linked lists together form the inverted index.
② Define two fixed-length arrays As and Ad, and initialize them.
③ For every node in the inverted index, compute the contents of the node according to the formula and write them into the array As. Write the coordinate of the head node of each linked list into Ts.
④ For each keyword of each file, calculate the contents of the node and write them into the array Ad. Write the coordinate of the head node of each file linked list into Td.
⑤ Construct the free list and write the coordinate of the head node of the linked list into Ts.
5.5 Generate search authenticator
The operation of generating the search authenticator is performed locally on the client.
After the encrypted index has been generated, the inverted index and the dictionary which records all the MD5 values are readily available, so the search authenticator can be constructed quickly from this information. Note that each node in the arrays As and Ad carries an identification flag indicating whether the node has been used. In order to hide this information from the server, the flag is omitted and only the other contents are written to the files when saving to disk.
The key step in generating the search authenticator is calculating the values of the leaf nodes. First initialize the MHT array according to the number of keywords in the inverted index, and then traverse the inverted index. For each linked list in the inverted index, read the corresponding MD5 values, compute the leaf value, and write the result into the corresponding leaf node position of the MHT array. After the inverted index has been traversed, compute upward from the leaf nodes to obtain the whole MHT authentication tree. Then write the tree to a file, set the value of the root node of the MHT as the authentication value, and write it into the user's key file. The process flow of generating the search authenticator is shown in Figure 5.5.1.
Fig.5.5.1 The process flow of generating the search authenticator
The following describes the process of generating the search authenticator in detail through a formal definition.
The process of generating the search authenticator
: Input the user key , the file set , the inverted index , and process as follows:
1. For each keyword , compute .
2. Set as leaf node to construct MHT. stands for MHT and stands for the value of root node of MHT.
3. The algorithm outputs authenticator and DSA state .
Execute the algorithm to get the search authenticator and the authentication value . The client is responsible for storing . The search authenticator and the results generated by are stored on the server.
5.6 Upload file
The operation of uploading files is completed by both the client and the server, which is an interactive process.
After the client has encrypted the files and generated the search authenticator and the encrypted index, it uploads the encrypted files, the encrypted index and the search authenticator to the server for storage.
The data filled into each field of the message differs depending on what is uploaded.
When the client sends the encrypted files to the server, each field of the message is filled as follows:
Message type field: 0x01, it indicates the operation is adding files the first time.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x00, it indicates the data portion carries the encrypted files.
Length field and Data field will be filled based on the actual situation.
When the client sends the encrypted index to the server, each field of the message is filled as follows:
Message type field: 0x01, it indicates the operation is adding files the first time.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x01, it indicates the data portion carries the index.
Length field and Data field will be filled based on the actual situation.
When the client sends the search authenticator to the server, each field of the message is filled as follows:
Message type field: 0x01, it indicates the operation is adding files the first time.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x07, it indicates the data portion carries the search authenticator.
Length field and Data field will be filled based on the actual situation.
5.7 Store files
The operation of storing files is performed locally on the server.
When the server receives a connection request from the client, it first checks whether the user exists. If the user exists, the server stores the received files directly in the user's corresponding folder. If the user does not exist, the server creates a new folder and stores the files in it.
6. File Search
6.1 Overview
When the user retrieves files that include some keyword, the client first uses algorithm to process the keyword and get the corresponding search token. When the server gets the search token, it reads the user's encrypted index and uses algorithm to search. The server then finds the cipher-text set corresponding to the search token and sends the result to the user. The user receives and decrypts the cipher-text. Through this process, the user obtains the file set corresponding to the keyword without divulging any useful information.
Fig. 6.1.1 Sequence Diagram of Search Operation
6.2 Generate search token
The operation of generating the search token is performed locally on the client.
: Input the keyword w and the key K. Output the search token .
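A possible shape of search token generation is sketched below. The three components mirror the fields `fk1w`, `gk2w` and `pk3w` of the search token structure in the data structure table. Reading them as three keyed pseudorandom functions applied to the keyword is an interpretation of those field names. HMAC-SHA256 truncated to 16 bytes is an assumed stand-in for the pseudorandom functions, and the names `prf` and `search_token` are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    """Assumed pseudorandom function: HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def search_token(K, w: str):
    """Derive (fk1w, gk2w, pk3w) for keyword w from the keys K = (K1, K2, K3)."""
    K1, K2, K3 = K
    m = w.encode("utf-8")
    return prf(K1, m), prf(K2, m), prf(K3, m)


K = (b"\x01" * 16, b"\x02" * 16, b"\x03" * 16)   # fixed demo keys, not for real use
token = search_token(K, "cloud")
```

Because the token is deterministic in the keyword and the key, the server can use it for lookup without learning the keyword itself.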
6.3 Send search token
The operation of sending the search token is completed by both the client and the server, which is an interactive process. The search token generated by the client is sent to the server.
When the client sends the search token to the server, each field of the message is filled as follows:
Message type field: 0x02, it indicates the operation of sending the search token belongs to search part.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x04, it indicates the data portion carries the search token.
Length field and Data field will be filled based on the actual situation.
6.4 Search
At any time, the user can input a keyword and send a request to the server to query all the files which contain the keyword.
: Input the encrypted index , the search token and the cipher-text set , and process as follows:
1. Compute , find the coordinate of the head node of the keyword linked list . stands for the coordinate of in and stands for the coordinate of in .
2. Express the content of in as , compute and get the file description corresponding to the node and the coordinate of the next node in .
3. If is not equal to zero, repeat step 2. If is equal to zero, the algorithm stops.
stands for the file identifier set which has been found. Find the cipher-text corresponding to each identifier and output .
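The walk through the keyword linked list in steps 1 to 3 can be sketched as below. Since the node formulas are not preserved in the text, the encoding used here is an assumption: each node of As holds a file ID and the position of the next node, XOR-masked with a hash of `pk3w` and a per-node random string, and Ts maps `fk1w` to the head position masked with `gk2w`. Position 0 is reserved to mean "end of list". All function names are hypothetical.

```python
import hashlib
import hmac
import random
import secrets
import struct

NODE = struct.Struct(">8sH")   # assumed node layout: fileID[8], next position


def prf(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def mask(pk3w, r):
    return hashlib.sha256(pk3w + r).digest()[:NODE.size]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def token(K, w):
    return tuple(prf(k, w.encode()) for k in K)   # (fk1w, gk2w, pk3w)


def build(K, inverted, size):
    """Client side: scatter the nodes of every keyword list over As."""
    # Unused slots keep random filler so they look like used ones.
    As = [(secrets.token_bytes(NODE.size), secrets.token_bytes(16)) for _ in range(size)]
    free = list(range(1, size))                   # slot 0 is never used
    random.shuffle(free)
    Ts = {}
    for w, ids in inverted.items():
        fk1w, gk2w, pk3w = token(K, w)
        slots = [free.pop() for _ in ids]
        for i, fid in enumerate(ids):
            nxt = slots[i + 1] if i + 1 < len(slots) else 0
            r = secrets.token_bytes(16)
            As[slots[i]] = (xor(NODE.pack(fid.encode(), nxt), mask(pk3w, r)), r)
        Ts[fk1w] = xor(struct.pack(">H", slots[0]), gk2w)
    return As, Ts


def search(As, Ts, tok):
    """Server side: follow the list for one token and collect file IDs."""
    fk1w, gk2w, pk3w = tok
    entry = Ts.get(fk1w)
    if entry is None:
        return []
    (pos,) = struct.unpack(">H", xor(entry, gk2w))
    found = []
    while pos != 0:
        masked, r = As[pos]
        fid, pos = NODE.unpack(xor(masked, mask(pk3w, r)))
        found.append(fid.rstrip(b"\x00").decode())
    return found


K = (b"\x01" * 16, b"\x02" * 16, b"\x03" * 16)    # fixed demo keys
As, Ts = build(K, {"w1": ["f1", "f2"], "w2": ["f1", "f3"], "w3": ["f2"]}, 16)
result = search(As, Ts, token(K, "w1"))
```

The server only ever sees masked nodes and the token, and it can unmask exactly the nodes of the list the token points to.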
6.5 Return result
The operation of returning the result is completed by both the client and the server, which is an interactive process. The search result generated by the server is sent to the client.
When the server sends the search result to the client, each field of the message is filled as follows:
Message type field: 0x03, it indicates the operation of returning the search result belongs to the search part.
Direction field: 0x01, it indicates the message is sent from the server to the client.
Type field: 0x00, it indicates the data portion carries the cipher-text.
Length field and Data field will be filled based on the actual situation.
6.6 Decryption
The operation of decryption is performed locally on the client.
Because the files which the user receives from the server are encrypted, the client needs to use the same symmetric encryption algorithm to decrypt them.
: Input the key and the cipher-text set by
7. Challenge and Proof
7.1 Overview
A verify operation must always be performed together with a search operation. The diagram below is a sequence diagram of the authentication process. The process includes two parts: challenge and proof. First, the client generates the challenge corresponding to a search for some keyword and sends it to the server. After the server receives the challenge, it reads the user's search authenticator and generates the proof for that search. After the client gets the proof, it decrypts the file set returned by the search process and validates the result. Finally, the client judges whether the server behaved legitimately according to the output of the algorithm.
Fig. 7.1.1 Sequence Diagram of Authentication Algorithm
7.2 Generate challenge
The operation of generating the challenge is performed locally on the client.
: Input the key and the keyword w being searched, and then compute and output the challenge .
7.3 Send challenge
The operation of sending the challenge is completed by both the client and the server, which is an interactive process. The challenge generated by the client is sent to the server.
When the client sends the challenge to the server, each field of the message is filled as follows:
Message type field: 0x04, it indicates the operation of sending challenge belongs to the authentication part.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x05, it indicates the data portion carries the challenge.
Length field and Data field will be filled based on the actual situation.
7.4 Generate proof
The operation of generating the proof is performed locally on the server.
: Input the search authenticator , the challenge , and process as follows:
1. Traverse , and find the first leaf node M whose element is .
2. Traverse from node M to the root node, and record the sibling node values of all the nodes on the traversal path.
3. Output the proof , where h is the height of the authenticator .
7.5 Send proof
The operation of sending the proof is completed by both the client and the server, which is an interactive process. The proof generated by the server is sent to the client.
When the server sends the proof to the client, each field of the message is filled as follows:
Message type field: 0x05, it indicates the operation of sending proof belongs to the authentication part.
Direction field: 0x01, it indicates the message is sent from the server to the client.
Type field: 0x06, it indicates the data portion carries the proof.
Length field and Data field will be filled based on the actual situation.
7.6 Validate proof
The operation of validating the proof is performed locally on the client.
: Input the file , the proof and the state got by search, and process as follows:
1. Compute to get the value of leaf node .
2. Compute when , to get the validation value .
3. If , the validation passes and the algorithm outputs 1. Otherwise it outputs 0.
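The proof generation of section 7.4 and the validation steps above can be sketched together as follows. MD5 and the concatenation rule for parents are assumptions, as is the choice to store a left/right marker next to each sibling value in the proof. The names `build_levels`, `prove` and `verify` are hypothetical.

```python
import hashlib


def F(data: bytes) -> bytes:
    return hashlib.md5(data).digest()   # assumed hash function


def build_levels(leaves):
    """Server-held authenticator: list of levels, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([F(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels, challenge):
    """Find the first leaf equal to the challenge and record its sibling path."""
    idx = levels[0].index(challenge)    # raises ValueError if no leaf matches
    path = []
    for level in levels[:-1]:
        sibling = idx ^ 1               # the other child of the same parent
        path.append((level[sibling], sibling < idx))   # (value, sibling is left)
        idx //= 2
    return path                          # h entries, h = height of the tree


def verify(leaf, path, root):
    """Recompute the root from the leaf and the path; output 1 or 0."""
    value = leaf
    for sibling, sibling_is_left in path:
        value = F(sibling + value) if sibling_is_left else F(value + sibling)
    return 1 if value == root else 0


leaves = [F(f"w{i}".encode()) for i in range(4)]   # demo leaf values
levels = build_levels(leaves)
root = levels[-1][0]                               # state kept by the client
proof = prove(levels, leaves[2])
ok = verify(leaves[2], proof, root)
```

The client needs only the leaf value it recomputes from the search result, the h sibling values and its stored root, so the check costs h hash evaluations.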
8. File Update
8.1 Overview
The timing diagram of the update operation is shown below.
Fig. 8.1.1 Sequence Diagram of Add Files
When a user adds a file, the client first uses the file and its keywords to generate the add tokens (including the SSE token and the DSA token) and encrypts the file. Then the client uploads the cipher-text and the add tokens. The server updates the encrypted index and the search authenticator using the tokens and at the same time stores the cipher-text. It then returns the update information to the client, and the client updates the local DSA state using the update information.
The process of deleting files is similar to that of adding files. When deleting files, the availability of a local copy must be considered: if the user has no local copy, the deletion algorithm first needs to download the files to be deleted from the server, and then the client generates the deletion token using the files. We assume that the client has copies of the files to be removed.
8.2 Generate Keys
Key generation process
: is the security parameter of the system. Select two k-bit strings randomly according to the security parameter.
The key generation algorithm , which runs on the client, is used for generating the key of the DSA algorithm. The generated key consists of two parts, which are the keys of two pseudorandom functions. As with the key generation of the SSE algorithm, the DSA key is generated and used only on the client, and the client is responsible for keeping the key.
8.3 Generate update token
The operation of generating the update token is performed locally on the client.
The update operation includes the add operation and the deletion operation, so the update token also includes the add token and the deletion token.
Generating the add token:
: Input the key K, the files which are to be added, the inverted index , and process as follows:
1. For each keyword of the files ( ), compute , and are fixed-length random strings.
2. Compute , according to the result of step 1.
3. Encrypt the file .
4. The algorithm outputs the add token and the cipher-text .
Generating the deletion token:
: Input the key , the file , and compute . The algorithm outputs the deletion token .
8.4 Send token/file
After generating the update token, the client sends the token to the server.
The operation of sending the token is completed by both the client and the server, which is an interactive process. The token generated by the client is sent to the server. When the client adds files, it needs to upload not only the add token but also the encrypted files.
Add files:
The process of adding files is divided into two sub-processes: sending the token and sending the files.
When the client sends the token to the server, each field of the message is filled as follows:
Message type field: 0x06, it indicates the operation of adding files belongs to the updating part.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x02, it indicates the data portion carries the add token.
Length field and Data field will be filled based on the actual situation.
When the client sends the encrypted file to the server, each field of the message is filled as follows:
Message type field: 0x06, it indicates the operation of adding files belongs to the updating part.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x00, it indicates the data portion carries the encrypted file.
Length field and Data field will be filled based on the actual situation.
Delete files:
When the client sends the token to the server, each field of the message is filled as follows:
Message type field: 0x07, it indicates the operation of deleting files belongs to the updating part.
Direction field: 0x00, it indicates the message is sent from the client to the server.
Type field: 0x03, it indicates the data portion carries the deletion token.
Length field and Data field will be filled based on the actual situation.
8.5 Update files
The operation of updating files is performed locally on the server.
When the server receives the files from the client, it stores them locally.
8.6 Update index
The operation of updating the index is performed locally on the server.
Update index (add files operation)
Adding file process
: Input the cipher-text , the encrypted index , the add token , and process as follows:
1. Store the cipher-text: .
2. Set as , for each , process as follows:
(a) Find the coordinate of the head node M of the free list in through .
(b) Compute to find the coordinate of the next free node, and the coordinate of the dual node of M in , .
(c) Set and update the linked-list .
(d) Compute , find the coordinate of the head node N of the linked-list in , and the coordinate of dual node of M.
(e) Let represent for , set , and set node M as the head node of the linked-list .
(f) Set , update the content of node M.
(g) Set , update the search table .
(h) Let represent for , set , and update the content of node .
(i) Set , update the content of node
(j) Set , update the search list .
3. The algorithm outputs the new encrypted file set and the updated encrypted index .
Update index (delete file operation)
Deleting file process
: Input the encrypted index , the encrypted file set , the deletion token , and process as follows:
1. Let be , compute , and find the coordinate of the head node of the linked-list in .
2. For each node in the linked-list , process as follows:
(a) Compute , in which .
(b) Use random string to fill .
(c) Compute , find the address of the head node of the linked-list .
(d) Set , let the node be the head node of the linked-list .
(e) Set , update the content of N which is the dual node of , let it in the linked-list .
(f) Let be the previous node of in the keyword linked-list. Set , in which . Set , in which .
(g) Let be the next node of in the keyword linked-list. Set , in which . Set , in which .
(h) Set , and execute (a)
3. Delete the files with the identification in the encrypted file set, .
4. The algorithm outputs the new encrypted file set and the updated encrypted index .
The process flow for deleting files is approximately the same as for adding files. The client generates the deletion token from the files and the keywords and sends it to the server. The server reads the encrypted index, updates it using the deletion token, and deletes the corresponding files from the encrypted file set.
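The pointer updates performed during deletion can be sketched on a plain, unencrypted model. The sketch below is an illustration under stated assumptions, not the protocol's algorithm: encryption, masking, and token handling are omitted, the dictionary `by_file` stands in for the per-file list of dual nodes, and all names (`PlainDynamicIndex`, `add_file`, `delete_file`, `search`) are hypothetical. For every node belonging to the deleted file, the previous and next nodes in the keyword list are re-linked around it, and the node is returned to the head of the free list.

```python
END = -1  # marks the end of a linked list


class PlainDynamicIndex:
    """Unencrypted model of keyword lists with a free list (illustration only)."""

    def __init__(self, capacity):
        self.nodes = [{"file": None, "prev": END, "next": i + 1} for i in range(capacity)]
        if capacity:
            self.nodes[-1]["next"] = END
        self.free_head = 0 if capacity else END
        self.heads = {}    # keyword -> head node of its list
        self.by_file = {}  # file id -> [(keyword, node index)], stand-in for dual nodes

    def add_file(self, file_id, keywords):
        for word in keywords:
            m = self.free_head
            if m == END:
                raise MemoryError("free list is empty")
            self.free_head = self.nodes[m]["next"]
            old_head = self.heads.get(word, END)
            self.nodes[m] = {"file": file_id, "prev": END, "next": old_head}
            if old_head != END:
                self.nodes[old_head]["prev"] = m
            self.heads[word] = m
            self.by_file.setdefault(file_id, []).append((word, m))

    def delete_file(self, file_id):
        for word, m in self.by_file.pop(file_id, []):
            prev, nxt = self.nodes[m]["prev"], self.nodes[m]["next"]
            if prev != END:
                self.nodes[prev]["next"] = nxt  # previous node skips the deleted one
            elif nxt != END:
                self.heads[word] = nxt          # deleted node was the head of the list
            else:
                del self.heads[word]            # the list is now empty
            if nxt != END:
                self.nodes[nxt]["prev"] = prev
            # Clear the slot (the protocol fills it with a random string instead)
            # and make it the new head of the free list.
            self.nodes[m] = {"file": None, "prev": END, "next": self.free_head}
            self.free_head = m

    def search(self, word):
        result, i = [], self.heads.get(word, END)
        while i != END:
            result.append(self.nodes[i]["file"])
            i = self.nodes[i]["next"]
        return result


index = PlainDynamicIndex(capacity=4)
index.add_file("f1", ["cloud", "cipher"])
index.add_file("f2", ["cloud"])
index.delete_file("f1")
print(index.search("cloud"))
```

Returning freed nodes to the free list lets later add operations reuse the same slots, so the array does not grow as files are deleted and re-added.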
8.7 Update Search Authenticator
The search authenticator update operation is performed locally on the client.
So that the user can still validate search results after files are added or deleted, the search authenticator must be updated after each such operation. The DSA state (hash tree) stored by the user must be updated correspondingly.
When the user needs to add a file, the user first generates an add token from the files and the keywords, and then encrypts the files. Finally, the user sends the add token and the encrypted files to the server. On receiving the data, the server reads the user's search authenticator, updates it according to the add token, and stores the user's encrypted files. The process flow for deleting files is roughly the same, except that the file storage operation is replaced by a file deletion operation. The specific operations are as follows:
Generate update information:
Input: the search authenticator and the add token. Process as follows:
1. Set the add token as .
2. When ,
(a) Find the first leaf node whose value is in the authenticator .
(b) Set as . Compute .
(c) Record the critical path from M to the root.
3. Reconstruct the MHT from the node and the unchanged nodes to obtain the new authenticator .
4. Output the new authenticator and the update information .
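The leaf replacement and path recording can be sketched with a generic Merkle hash tree (MHT). This is a minimal sketch under assumptions that the text here does not confirm: SHA-256 is used as the hash function, a parent is the hash of its left child followed by its right child, the number of leaves is a power of two, and the update information is taken to be the leaf position together with its sibling hashes. The function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # ASSUMPTION: SHA-256 as the tree hash


def build_levels(leaves):
    """Return every level of the tree: hashed leaves first, root level last."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of leaves")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels, index):
    """Sibling hashes along the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the lowest bit
        index //= 2
    return path


def update_first_leaf(leaves, old_value, new_value):
    """Replace the first leaf equal to old_value and rebuild the tree."""
    index = leaves.index(old_value)  # raises ValueError if no such leaf exists
    path = sibling_path(build_levels(leaves), index)
    new_leaves = list(leaves)
    new_leaves[index] = new_value
    new_root = build_levels(new_leaves)[-1][0]
    return new_leaves, new_root, (index, path)


leaves = [b"a", b"", b"c", b""]
new_leaves, new_root, (position, path) = update_first_leaf(leaves, b"", b"b")
print(position, len(path), new_root.hex())
```

Only the hashes on the path from the changed leaf to the root differ between the old and new trees, which is why the sibling hashes along that path are enough for another party to recompute the new root.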
8.8 Return new DSA
The operation of returning the DSA state is completed jointly by the client and the server in an interactive process. The new DSA state generated by the server is sent to the client.
When the server sends the DSA state to the client, the fields of the message are filled as follows:
Message type field: 0x07, which indicates that the operation of returning the DSA state belongs to the update part.
Direction field: 0x01, which indicates that the message is sent from the server to the client.
Type field: 0x00, which indicates that the data portion carries the DSA state.
The Length field and Data field are filled according to the actual data.
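As an illustration of how these fields might be filled, the following minimal sketch packs and unpacks a "return DSA" message. The header layout used here (one byte each for the message type, direction, and type fields, followed by a 4-byte big-endian length) is an assumption made for the example; the actual field widths are those of the protocol's message format. The names `pack_message`, `unpack_message`, and the constants are hypothetical.

```python
import struct

# Field values taken from the description above.
MSG_RETURN_DSA = 0x07        # message type: returning DSA (update part)
DIR_SERVER_TO_CLIENT = 0x01  # direction: server -> client
TYPE_DSA_STATE = 0x00        # data portion carries the DSA state

# ASSUMPTION: 1-byte message type, 1-byte direction, 1-byte type,
# 4-byte big-endian length. The real field widths may differ.
HEADER = struct.Struct(">BBBI")


def pack_message(msg_type, direction, data_type, data):
    """Prefix the data with a header whose Length field is len(data)."""
    return HEADER.pack(msg_type, direction, data_type, len(data)) + data


def unpack_message(raw):
    msg_type, direction, data_type, length = HEADER.unpack_from(raw)
    data = raw[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated message")
    return msg_type, direction, data_type, data


dsa_state = bytes(32)  # placeholder for a 32-byte root hash
message = pack_message(MSG_RETURN_DSA, DIR_SERVER_TO_CLIENT, TYPE_DSA_STATE, dsa_state)
print(message[:7].hex())
```

The Length field is derived from the data at packing time rather than supplied by the caller, which avoids a mismatch between the declared and actual data size.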
8.9 Update DSA
The DSA state update operation is performed locally on the client.
The DSA state is the root node value. It is stored locally on the client and used in the authentication operation.
Input: the DSA state, the update information, and the add token. Process as follows:
1. Let the token be , the update information be , and the leaf node corresponding to be .
2. Validate using state , when . If the validation passes, continue; otherwise the algorithm outputs .
3. Compute , when .
4. Use to update .
5. Output new state value .
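The validate-then-update logic can be sketched as follows. This is a minimal illustration, not the protocol's exact computation: it assumes SHA-256, assumes a parent is the hash of its left child followed by its right child, and assumes the update information consists of the leaf position and the sibling hashes on the path to the root. The names `root_from_path` and `update_state` are hypothetical, and `None` stands for the failure output.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # ASSUMPTION: SHA-256 as the tree hash


def root_from_path(leaf_value, index, path):
    """Recompute the root from a leaf value, its position, and sibling hashes."""
    node = h(leaf_value)
    for sibling in path:
        # An even position is a left child, an odd position is a right child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node


def update_state(state, old_leaf, new_leaf, index, path):
    """Check the old leaf against the stored root, then derive the new root."""
    if root_from_path(old_leaf, index, path) != state:
        return None  # validation failed: reject the update information
    return root_from_path(new_leaf, index, path)


# Four-leaf example: the leaf at position 1 changes from b"" to b"b".
left = h(h(b"a") + h(b""))
right = h(h(b"c") + h(b"d"))
state = h(left + right)        # root value currently held by the client
path = [h(b"a"), right]        # sibling hashes from the leaf up to the root

new_state = update_state(state, b"", b"b", 1, path)
rejected = update_state(state, b"x", b"b", 1, path)
print(new_state.hex(), rejected)
```

Validating the old leaf first ensures that the sibling hashes really belong to the tree whose root the client holds, so a server cannot make the client adopt a root for a different tree.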
9 Error Handling
Error definition | Reason | Result and action
Log-on error | The password is incorrect, or the client cannot connect to the Internet | A message pops up; enter the password again or check the Internet connection
Uploading error | The client cannot connect to the Internet, or the file is occupied | A message pops up; check the network or close the occupied file
Operation error | The content of the message is filled in incorrectly (for example, the message transmission direction) | A message pops up; check the filled-in content