
Upload: vuongkhuong

Post on 13-Apr-2018


Cloud Storage Oriented Cipher-text Search Protocol

Contents
1. Introduction
1.1 Background
1.2 Purpose
1.3 Application
1.4 Terminology
1.5 Symbol description
1.6 Normative reference
2. Overview
2.1 Protocol overview
2.2 Design Philosophy
2.3 Requirements for Design
3. Data Types
3.1 Definition
3.1.1 File
3.1.2 Array
3.1.3 Index
3.1.4 Token
3.1.5 Proof
3.1.6 Hash table
3.1.7 Merkle Hash Tree
3.2 Implementation
4. Message Types
5. File Storage
5.1 Overview
5.2 Generate index
5.3 Generate keys
5.4 Encryption
5.5 Generate
5.6 Upload file
5.7 Store files
6. File Search
6.1 Overview
6.2 Generate search token
6.3 Send search token
6.4 Search
6.5 Return result
6.6 Decryption
7. Challenge and Proof
7.1 Overview
7.2 Generate challenge
7.3 Send challenge
7.4 Generate proof
7.5 Send proof
7.6 Validate proof
8. File Update
8.1 Overview
8.2 Generate Keys
8.3 Generate update token
8.4 Send token/file
8.5 Update files
8.6 Update index
8.7 Update Search Authentication Token
8.8 Return new DSA
8.9 Update DSA
9. Error Handling

1. Introduction

1.1 Background

In recent years, with the rapid development of cloud computing, cloud storage, one of its most important components, has become a research hotspot. Technically, cloud storage refers to a system in which large numbers of network storage devices of different types work together. These devices use cluster applications, grid technology, and distributed file systems to provide storage and business access.

At present, the rapid development of computer technology and Internet applications is causing data to grow exponentially, and the demand for storage keeps rising. Under this trend, cloud storage not only brings people cheap storage services but also challenges traditional data storage services. As a new storage method, cloud storage has major advantages over traditional storage. First, storage exists as a service. A user who needs storage requests an appropriately sized space from a cloud storage service provider and avoids building and managing a storage platform. This ensures full utilization of storage resources and reduces the user's storage cost. Second, cloud storage can provide data backup, disaster recovery, load balancing, and other functions, so that when some storage nodes are upgraded or damaged, the service continues without interruption. In addition, authorized users can access the cloud storage service from anywhere through the network. This flexibility will greatly promote the development of the mobile Internet. The scalability, low cost, unrestricted access, and easy management of cloud storage pose a great challenge to traditional storage methods.

At the same time, cloud storage also has problems. In cloud storage, all data are handed over to the cloud storage provider (CSP), and users lose absolute control of their data, which inevitably makes them worry about data security. Cloud storage does provide security measures, such as the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols for data transmission, data encryption, and firewall settings. Because of the centralized management of the CSP, however, data security depends entirely on the security of the cloud storage system, the quality of the data administrators, and other factors under the CSP's control. In addition, data stored in the cloud have become a primary target for malicious users and hackers, and if data isolation fails, users' private data may be leaked. Although cloud storage providers offer an SLA (Service Level Agreement) to state the service grades they provide, various uncontrolled factors still cause concern among users. Data security is a key problem in cloud storage. A 2012 TwinStrata survey shows that only 20% of users are willing to store their private data in the cloud, while about 50% are willing to use cloud storage for data backup, archival storage, disaster recovery, and so on. The problem of data security thus hinders the wider adoption of cloud storage.

Cloud storage vendors such as Microsoft, VMware, Amazon, and Google have all launched their own cloud storage services and give certain assurances of data security, such as a variety of encryption and authentication measures, to protect the privacy of users. Nevertheless, security incidents still occur. In 2005, encrypted tapes belonging to Bank of America were lost, resulting in the disclosure of a large amount of customer information. In April 2011, the information of nearly 77 million online customers of Sony was stolen, including credit card data. In June of the same year, Google was attacked and the mailbox accounts of some important personnel were compromised. These security events have caused users to lose trust in these cloud service providers.

1.2 Purpose

From the perspective of data privacy protection, this protocol constructs a dynamic cipher-text search model and a search verification model for cloud storage. The model enables users to store their private data with an untrusted party. Even if the data are stolen, no information about the plaintext is disclosed. The model also supports keyword-based search and the dynamic addition and deletion of files.

1.3 Application

A system based on this protocol can be used in confidential departments and sensitive commercial sectors. These departments and sectors can store large amounts of secret information in cipher-text form on the cloud server and retrieve it whenever necessary.

1.4 Terminology

Keyword. A keyword is a word that summarizes and condenses the content of a file. In this protocol it refers to the words selected to identify the files.

Linked list. A linked list is a storage structure whose physical storage units are non-contiguous and non-sequential. The logical order of the data elements is given by the order in which pointers link them. The list consists of a series of nodes (each element of the list is called a node), which can be generated dynamically at runtime. Each node has two parts: a data field, which stores the data element, and a pointer field, which stores the address of the next node. Compared with a sequential list, a linked list makes insertion and deletion more convenient.

Array. An array organizes variables of the same type in order so that they are easy to handle.

Pseudo-random function. A pseudo-random function (PRF) is a keyed, efficiently computable function whose outputs cannot be distinguished from those of a truly random function by anyone who does not know the key.

Token. In networking, a token is a special frame that controls which station may occupy the medium, as distinct from data frames and other control frames. In this protocol a token is a data format that represents the type of a transmitted message together with the transmitted data.

Inverted index. An inverted index arises in applications that need to find records by the value of an attribute. Each entry of the index table holds an attribute value and the addresses of all records that have that value. The position of a record is determined from the attribute value, rather than the attribute value being determined from the record.

Encryption. Encryption transforms the original plaintext files or data, according to a certain algorithm, into unreadable code, often referred to as "cipher-text". The original content can be recovered only after the corresponding key is supplied. In this way the data are protected from being stolen or read by unauthorized people.

MD5. MD5 (Message Digest Algorithm, version 5) is a hash function widely used in the computer security field to check message integrity. The algorithm transforms a byte string of arbitrary length into a fixed-length 128-bit value.

Cloud storage. Cloud storage developed on the basis of cloud computing. It focuses on providing users with online storage services over the Internet. Cloud storage uses software to make a large number of storage devices of different types cooperate in providing data storage services to the outside.

Digital signature. A digital signature is data appended to a data unit, or a cryptographic transformation applied to the data unit. It allows the recipient of the data unit to confirm the source and integrity of the data and protects the data from forgery.

Key. A key is a parameter supplied as input to the algorithm that converts plaintext into cipher-text or cipher-text into plaintext.

1.5 Symbol description

Symbol    Description
F         The file set, containing n files
#F        The number of files
[...]     A file containing m keywords
W         The set of keywords
#W        The number of keywords
[...]     The set of files containing the keyword w
[...]     The number of files containing the keyword w
[...]     The linked list made up of the files containing the keyword w
[...]     The linked list made up of all keywords in file f
[...]     Inverted index
free      A special keyword, satisfying [...]
As        Search array, used to store the keyword linked lists
Ts        Dictionary, used to record the head node of each keyword linked list
Ad        Deletion array, used to store the file linked lists
Td        Dictionary, used to record the head node of each file linked list
γ         Encrypted index, defined as [...]

Function description:

Algorithms used in the search part:

[...]: Runs on the client. Generates the keys for the symmetric encryption algorithm and the pseudo-random function.
[...]: Uses the key to encrypt the user's files and keyword information into the cipher-text and the encrypted index.
[...]: Uses the user's keyword to generate the corresponding search token.
[...]: Uses the search token to perform the search operation on the encrypted index.
[...]: Generates the add token from the files to be added and the corresponding keywords.
[...]: Uses the received add token to add files and update the stored encrypted index.
[...]: Generates the deletion token from the files to be deleted.
[...]: Uses the received deletion token and the files to be deleted to update the stored encrypted index.

Algorithms used in the authentication part:

[...]: Generates the key used in the algorithm. The client is responsible for keeping the key.
[...]: Generates the search authenticator when files are stored.
[...]: Run by the client. Generates the challenge for a search on some keywords.
[...]: Run by the server. Generates the authentication path for a given search.
[...]: The verification algorithm. Run by the client to verify the proof sent by the server.
[...]: Generates the add token from the files to be added and the corresponding keywords.
[...]: Uses the received add token to add files and update the stored encrypted index.
[...]: Generates the deletion token from the files to be deleted.
[...]: Uses the received deletion token and the files to be deleted to update the stored encrypted index.
[...]: Updates the DSA state.

1.6 Normative reference

Kamara S, Lauter K. Cryptographic cloud storage[J]. Financial Cryptography and Data Security, 2010: 136-149.

2. Overview

2.1 Protocol overview

The system uses a client/server (C/S) architecture and is composed of two entities, the client and the server. The main functions of the client are key generation, data encryption and decryption, authenticator generation, token generation, and so on. The main functions of the server are searching, proof generation, update operations, and so on. The overall framework is shown in Figure 2.1.

Figure 2.1 Frame structure of the cloud storage system

Figure 2.1: ① file encryption/decryption at the client; ② generation of the search token from keywords at the client; ③ generation of add/deletion tokens from the files to be added/deleted at the client; ④ storage at the server of the files the user uploads; ⑤ search on the encrypted index at the server according to the received search token; ⑥ update of files at the server according to the received add/deletion token; ⑦ data exchange between the client and the server.

As Figure 2.1 shows, the main function of the client is to obtain the original data from the user (the files to be uploaded, the search keywords, and the files to be updated), process the data, and upload them to the cloud. The main function of the server is to receive the data sent by the client and perform the corresponding operations, mainly storing, searching, and updating.

2.2 Design Philosophy

The protocol analyzes the security of existing cloud storage systems and puts forward a secure framework for a cloud storage system.

Fig. 2.2 Complete Security Model of Cloud Storage

The model of the secure cloud storage system is built on the Searchable Symmetric Encryption (SSE) algorithm, combined with the secure cloud storage system architecture. With SSE, the user encrypts the data and the index and sends the cipher-text and the secure index to the cloud service provider for storage. To search, the cloud service provider runs the search on the secure index using the search token generated by the user and returns the matching cipher-text set. The user then decrypts the result to obtain the plaintext files corresponding to the search keyword. In addition, users can add and delete files at any time, and the correctness of the index is still guaranteed. To verify the search results returned by the server, a dynamic search authentication (DSA) algorithm is designed. The algorithm is based on an improved Merkle authentication dictionary and can validate the correctness of a search result. It also supports token-based update operations and achieves high efficiency in both communication and computation.

Searchable encryption

The model involves only two kinds of entities. One is the owner of the confidential data, who wants to store the data in the cloud and prevent illegal access to them. This entity is called the user (Client). The other is the cloud storage service provider, which exposes a storage interface, stores the data, and performs specific search operations on them. This entity is called the server (Server).

As mentioned above, to guarantee the security of the data to the greatest possible extent, almost all operations that process user data are placed at the client, including file encryption, file index encryption, and keyword processing. The server only needs to store the files and perform a limited retrieval function.

Fig. 2.3 Searchable Encryption Model of Cloud Environment

As the figure shows, the user selects on a computer the file set to be stored, preprocesses the files, and then uploads them to the cloud. The preprocessing consists of two parts that execute simultaneously. One part encrypts the file set with a symmetric encryption algorithm to obtain the cipher-text set and uploads it to the cloud storage server. The other part constructs the index from the keywords of the files, encrypts the index with a special encryption method, and stores the result, called the encrypted index, in the cloud. The storage of the files and the index is managed by the cloud storage service provider. Users only need to upload the files and need not care about the details of file storage. When a user searches for a keyword, the client generates the search token corresponding to the keyword using the method provided by the algorithm and sends the token to the cloud storage server. The server then performs the search operation and returns the result.

The key to constructing the searchable encryption algorithm lies in the encryption of the file index. To provide a better search experience, this protocol uses keywords specified by the user in advance. Once the keyword information has been obtained, the keywords are preprocessed. A keyword linked list is constructed from the files that contain the same keyword, and the identifier of each such file is written into the corresponding node of the list. All the keyword linked lists together form the inverted index. To ensure that the server cannot obtain useful information from the index, a pseudo-random function (PRF) is used to encrypt the inverted index. The encrypted index is stored at random positions of the search array As, and the head node of each list is stored in the dictionary Ts (also called the search table). The processed arrays and dictionaries are stored at the server. Because all their elements are encrypted, the server cannot obtain plaintext information directly from the search array and the search table. When the user searches for a keyword, the client processes the keyword to obtain the search token, which contains the information that designates the position of the keyword in the encrypted index. After the server receives the search token, it reads the user's encrypted index, performs the search operation, obtains the file identifiers, and sends the corresponding cipher-text to the client.
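
The index construction and token lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the protocol's actual construction: HMAC-SHA256 stands in for the pseudo-random function, a Python dictionary stands in for the search array and search table, file identifiers are left unencrypted, and the names prf, build_inverted_index, build_search_table, search_token, and server_search are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def prf(key, message):
    """HMAC-SHA256 used as a stand-in pseudo-random function (assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def build_inverted_index(files):
    """Map each keyword to the identifiers of the files that contain it."""
    index = defaultdict(list)
    for file_id, keywords in files.items():
        for keyword in keywords:
            index[keyword].append(file_id)
    return dict(index)


def build_search_table(key, index):
    """Client side: replace every plaintext keyword with its PRF value.

    Simplification: the file identifiers stay in the clear here, whereas
    the protocol encrypts the list nodes as well.
    """
    return {prf(key, kw.encode()): ids for kw, ids in index.items()}


def search_token(key, keyword):
    """Client side: derive the token for one keyword."""
    return prf(key, keyword.encode())


def server_search(table, token):
    """Server side: look up the token without ever seeing the keyword."""
    return table.get(token, [])
```

With a 32-byte key and two files that share the keyword "cloud", server_search(table, search_token(key, "cloud")) returns both identifiers, while the table itself holds only PRF outputs in place of keywords.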

Users of cloud storage may add or delete files at any time, so the protocol must support dynamic addition and deletion. The previous discussion shows that the key to search lies in the construction of the encrypted index. To ensure that search remains efficient and correct after the user adds or deletes files, the encrypted index must be updated whenever files are added or deleted. When the user adds a file, each keyword the file contains may be either existing or new. In either case, only the corresponding update of the keyword linked list is needed, which is not difficult. When the user deletes a file, however, the file contains several keywords, and its node may sit anywhere in each of the corresponding keyword linked lists. Every node of every linked list for those keywords must therefore be traversed, and after a node is deleted the continuity of the list must be restored. Deletion based on the keyword linked lists alone is thus complex and inefficient.

To update the encrypted index more efficiently when a file is deleted, a file linked list is constructed from the keywords of each file. All the file linked lists form the file index. The file index is encrypted and stored at random positions of an array called the deletion array Ad, and the head node of each file linked list is stored in the dictionary Td (the deletion table). When a file is deleted, the deletion array is used to find the positions in As of the keywords corresponding to the file. As is updated accordingly, and the file's linked list is deleted from Ad. To ensure that the server cannot learn the user's file information from the arrays, random strings fill the unused cells of the arrays. At the same time, so that a free node can be found in As when a file is added, the idle nodes of the array must be recorded. This protocol uses a special keyword to build a linked list of idle nodes and stores its head node in the search table, in the same way as for the inverted index.
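
The benefit of keeping a second, per-file index can be illustrated with a plaintext sketch. It assumes Python dictionaries and sets in place of the encrypted arrays and their tables, omits encryption, random padding, and the idle-node list, and uses a hypothetical class name DualIndex.

```python
from collections import defaultdict


class DualIndex:
    """Plaintext sketch of a keyword index paired with a per-file index."""

    def __init__(self):
        # Plays the role of the search structures (As and Ts).
        self.by_keyword = defaultdict(set)
        # Plays the role of the deletion structures (Ad and Td).
        self.by_file = {}

    def add(self, file_id, keywords):
        """Record the file under each keyword and remember its keyword list."""
        self.by_file[file_id] = set(keywords)
        for keyword in keywords:
            self.by_keyword[keyword].add(file_id)

    def delete(self, file_id):
        """Remove a file by visiting only its own keywords.

        Without by_file, every keyword list would have to be scanned.
        """
        for keyword in self.by_file.pop(file_id, set()):
            self.by_keyword[keyword].discard(file_id)
            if not self.by_keyword[keyword]:
                del self.by_keyword[keyword]

    def search(self, keyword):
        """Return the identifiers of the files containing the keyword."""
        return sorted(self.by_keyword.get(keyword, set()))
```

The per-file index lets delete touch only the entries of the keywords that the deleted file actually contains, which is the motivation given above for the deletion array.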

Search verification

The cloud storage model introduced above realizes keyword-based cipher-text search. Because it lacks a verification mechanism for the search operation, however, the model is incomplete. It is therefore extended next with a function that verifies search results.

The protocol uses the Merkle hash tree (MHT) as the basic authentication structure. The file linked list associated with each keyword is the data source of one leaf node of the MHT. The value of each node is computed with a one-way hash function, and a full binary tree (chosen to simplify the operations) is built from these values. The root of the authentication tree is the verification value, which the cloud storage user keeps for subsequent verification. The authentication tree itself is the authenticator and is stored by the server. When a user searches for the files corresponding to a keyword, the challenge for this keyword and the search token are generated at the same time. The server performs the search operation according to the search token and, at the same time, generates the verification path according to the challenge. After the user obtains the search result and the proof, the user decrypts the result and computes the value of the corresponding leaf node of the MHT, and then computes the final verification value from that leaf value and the proof. This value is compared with the value stored at the client. If they are the same, verification passes. Otherwise verification fails and the operation is terminated.
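
The verification flow above can be sketched as follows. This is a generic Merkle tree illustration, not the protocol's improved authentication dictionary: SHA-256 is assumed as the one-way hash function, the leaf encoding is invented for the example, and the names h, build_tree, make_proof, and verify are hypothetical.

```python
import hashlib


def h(data):
    """One-way hash function (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Server side: return all tree levels, leaf hashes first, root last."""
    # The sketch requires a power-of-two number of leaves.
    assert leaves and len(leaves) & (len(leaves) - 1) == 0
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def make_proof(levels, index):
    """Server side: collect the sibling hashes on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, leaf_data, proof):
    """Client side: recompute the root from the leaf data and the proof."""
    node = h(leaf_data)
    for sibling_hash, sibling_is_left in proof:
        node = h(sibling_hash + node) if sibling_is_left else h(node + sibling_hash)
    return node == root
```

The client keeps only the root (levels[-1][0]). A proof for a tree with four leaves contains two sibling hashes, and verify rejects any leaf data that differ from what was hashed when the tree was built.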

2.3 Requirements for Design

To make full use of the storage service provided by cloud storage, let the server perform the search operation, and ensure that the server cannot obtain any useful information during the interaction, this protocol designs a cipher-text search method.

First, the user selects the files to be stored and adds keywords describing each file. The keyword index is then constructed from this keyword information. To ensure that the index does not reveal file information, the index must be encrypted with a special process. The user's files are encrypted with a symmetric encryption algorithm such as AES, and the cipher-text and the encrypted index are sent together to the cloud storage server for storage.

When the user searches for a keyword, the user inputs the keyword, the client processes it to obtain the keyword token, and the token is sent to the server. After receiving the keyword token, the server searches the user's encrypted index, finds the cipher-text corresponding to the token, and returns the result to the user. Note that in this process the server does not learn which keyword the user specified. The only useful information it can obtain is which files correspond to which token.

On this basis, a cipher-text search method supporting keyword search is constructed. It satisfies the user's need to store confidential data in cloud storage and gives the server the ability to search transparently.

3. Data Types

3.1 Definition

3.1.1 File

This section introduces the file types supported by the protocol. The file operations in the protocol are file upload and file update.

The supported file types are: text files (files with suffixes such as .txt, .doc, .docx, .pptx, and .xls), sound files (files with suffixes such as .mp3 and .wav), and video files (files with suffixes such as .avi and .mp4).

3.1.2 Array

As: Search array. The linked list built for each keyword from the files containing it is called the keyword linked list. The file identifier is written into the corresponding node of the list. All the keyword linked lists form the inverted index. To ensure that the server cannot acquire any useful information from the index, the inverted index is encrypted with the pseudo-random function. The encrypted index is stored at random positions of the search array As, and the head node of each linked list is stored in the dictionary Ts (Search Table).

Ad: Deletion array. To update the encrypted index efficiently, a linked list is constructed from the keywords of each file. It is called the file linked list. All the file linked lists form the file index. The file index is encrypted and then stored at random positions of an array called the deletion array Ad. The head node of each file linked list is stored in the dictionary Td.
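
The array layout described above, linked-list nodes scattered over random positions with a table of head positions, can be sketched in plaintext. The sketch assumes an unencrypted array, leaves unused cells as None instead of filling them with random strings, and uses the hypothetical names build_search_array and walk.

```python
import random


def build_search_array(index, size, rng):
    """Scatter the nodes of every keyword list over random array positions.

    Each occupied cell holds (file_id, next_position). The returned table
    maps a keyword to the position of its head node.
    """
    total = sum(len(ids) for ids in index.values())
    if total > size:
        raise ValueError("array too small for the index")
    free = list(range(size))
    rng.shuffle(free)  # random placement of the nodes
    array = [None] * size
    table = {}
    for keyword, ids in index.items():
        positions = [free.pop() for _ in ids]
        table[keyword] = positions[0] if positions else None
        for i, (pos, file_id) in enumerate(zip(positions, ids)):
            nxt = positions[i + 1] if i + 1 < len(positions) else None
            array[pos] = (file_id, nxt)
    return array, table


def walk(array, table, keyword):
    """Follow the list from its head node and collect the file identifiers."""
    result = []
    pos = table.get(keyword)
    while pos is not None:
        file_id, pos = array[pos]
        result.append(file_id)
    return result
```

Because the nodes of one list are not adjacent, the array alone does not show which cells belong together. In the protocol the nodes and the table entries are additionally encrypted, which this sketch omits.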

3.1.3 Index

This protocol contains two kinds of indexes: the inverted index and the encrypted index.

The inverted index is constructed from the keyword information of the files. The encrypted index is the inverted index encrypted with a special method.

3.1.4 Token

This protocol uses tokens for search operations and update operations. [...] represents the tokens, which include the search token, the add token, and the deletion token. The search token is defined as [...], the add token as [...], and the deletion token as [...].

Search token. The format of the search token is: [...], in which w represents the keyword and k represents the key. When the user searches for files containing a certain keyword, the client first processes the keyword to obtain the corresponding search token. After the server receives the search token, it reads the user's encrypted index and calls the corresponding algorithm to obtain the cipher-text set corresponding to the token. Finally, the cipher-text set is sent to the user.

Add token. The format of the add token is: [...], in which f represents the files to be added. When the user adds a file, the client first generates the add token from the file and its keyword information. The client encrypts the file and sends the add token and the encrypted file to the server. The server receives the encrypted file, reads the user's encrypted index, and updates the encrypted index using the add token.

Deletion token. The format of the deletion token is: [...], in which f represents the files to be deleted. The process of deleting files is similar to that of adding files. The user may not have a local copy of the file to be deleted, so the deletion algorithm first downloads the file from the server and then generates the deletion token from it.

3.1.5 Proof

Proof. The proof is the certification path through the authenticator: the sibling node values recorded from the challenged leaf node up to the root node, one per level, where h represents the height of the authenticator. When a user searches for files with a certain keyword, the client generates the search token and the challenge corresponding to this search. The server executes the search operation according to the search token and at the same time generates the certification path according to the challenge. This path is the proof.

3.1.6 Hash table

A hash table is a data structure that provides direct access based on a key value. That is, it maps the key value to a position in the table in order to speed up the lookup of records. The mapping function is called the hash function, and the array which stores the records is called the hash table.

3.1.7 Merkle Hash Tree

The Merkle hash tree is a tree-based authentication structure that can be used to verify data integrity.

In this protocol the Merkle hash tree is a full binary tree, and its computation uses only a one-way hash function. Because the tree used in the protocol has 2^l leaf nodes, it is also a complete binary tree, so the two terms are used interchangeably.

Initializing the Merkle hash tree requires mapping the documents to be authenticated to leaf nodes and building upwards with the hash function until the complete hash tree is constructed. The verifier then only needs to record the value of the root node and sends the hash tree, as the authenticator, to the untrusted server. In the verification stage, the verifier generates a verification challenge for some leaf node. The server receives the challenge, generates the verification path corresponding to the position of the challenged leaf node, and transmits it to the verifier. The verifier recomputes the root value from the path and compares it with the stored root value. This method costs far less in computation and communication than retrieving the complete data and recomputing the hash over it.

The figure below is an example showing how to generate a hash tree:

Fig.2.3 the structure of Merkle hash tree

Assume a data set in which each element Yi is the data source of one leaf node. The value of each leaf node is calculated by applying the one-way hash function F to Yi, that is, as F(Yi). After the leaf node values have been calculated, every pair of sibling node values is mapped, again using the one-way hash function F, to a single value that becomes the value of their parent node: F takes the two child node values as input and outputs a fixed-length value. Repeating this step up to the root constructs the whole hash authentication tree. The root node value of the hash tree is stored by the verifier as the verification value, while the hash tree itself is stored by the third-party server.
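The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: MD5 is used for F because the MHT node in Section 3.2 is 16 bytes wide, the parent value is taken as F applied to the concatenation of the two child values, and the function name build_merkle_tree and the sample data are hypothetical.

```python
import hashlib


def F(data: bytes) -> bytes:
    """One-way hash function F (assumed to be MD5, giving 16-byte node values)."""
    return hashlib.md5(data).digest()


def build_merkle_tree(items):
    """Return every level of the tree: leaves first, the root level last."""
    if not items or len(items) & (len(items) - 1):
        raise ValueError("the number of leaves must be a power of two")
    level = [F(y) for y in items]            # leaf value: F(Yi)
    levels = [level]
    while len(level) > 1:
        # parent value: F(left child value || right child value)
        level = [F(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


levels = build_merkle_tree([b"Y1", b"Y2", b"Y3", b"Y4"])
root = levels[-1][0]   # kept by the verifier; the tree itself goes to the server
```

With four data items the tree has three levels, and only the 16-byte root needs to stay with the verifier.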

3.2 Implementation

The index in the protocol can be implemented using a two-dimensional array, and the Merkle hash tree using a full binary tree. The remaining data structures can be implemented as plain variables. A reference implementation of the data structures in the protocol is given below.

Data structure | Definition | Description

Index | CArray<CStringArray*,CStringArray*> | A two-dimensional array, used to store the file index and the inverted index

Hash table | hash_map<CString,char[16]> | Hash table storing the correspondence between the elements and their MD5 values, to avoid repeated computation of the hash value

MHT node | {char[16]} | The structure of an MHT node

Search array(1) | char fileID[8] | File ID
Search array(1) | short loc_pre | The position of the previous node in the list
Search array(1) | short loc_next | The position of the next node in the list

Search array(2) | bool flag | Indicates whether an array node has been used

Deletion array(1) | short loc_d_next | The position of the next node in the list
Deletion array(1) | short loc_d_dual_pre | The coordinate of the previous node of the dual node in Ad
Deletion array(1) | short loc_d_dual_next | The coordinate of the next node of the dual node in Ad
Deletion array(1) | short loc_s | The coordinate of the dual node in As
Deletion array(1) | short loc_s_pre | The coordinate of the previous node of the dual node
Deletion array(1) | short loc_s_next | The coordinate of the next node of the dual node
Deletion array(1) | char fk1w[16] | The record of the value

Deletion array(2) | bool flag | Identifier
Deletion array(2) | char complexStr[32] | The record of the XOR value
Deletion array(2) | char randomStr[16] | The record of the random string

The entrance address of MHT leaf | char fk1w[16]; int loc | The record of the entrance address of the MHT leaf node

proof | char updateInfo[16]; char prove[ProveSize] | The definition of the structure of proof

Search token | char fk1w[16]; char gk2w[16]; char pk3w[16] | The definition of the structure of search token

Add token(1) | char fk1w[16]; char gk2w[16]; char complexStr[16]; char randomStr[16] | The definition of the structure of add token

Add token(2) | int count; char fk1free[16]; char gk2free[16]; char fk1f[16]; char gk2f[16]; char pk3f[16] | The structure of add token, in which count represents the number of keywords in the file

Deletion token(1) | char fk1f[16]; char gk2f[16]; char pk3f[16]; char fk1free[16]; char gk2free[16] | The definition of the structure of deletion token, calculated for every file

Deletion token(2) | int count | The structure of deletion token, in which count represents the number of the files

4. Message Types

In the interaction between the client and the server, the format of a transmission message is defined as follows:

The message contains the following fields:

1. Message type

The Message type field indicates the type of the transmission message. The field uses 8 bits, and the first 4 bits are used to distinguish between an operation message and a notification message.

If it is an operation message, the message carries a specific operation and the first 4 bits are set to 0000. If it is a notification message, the message reports whether an operation has been performed correctly and the first 4 bits are set to 0001.

(a) Add files (first upload). When the client adds files for the first time, it sends the encrypted files and indexes to the server. The field is set to 0x01.

(b) Search operation. When the user performs a search operation, the client generates the search token and sends it to the server. The field is set to 0x02. When the server returns the search results to the user, the field is set to 0x03.

(c) Authentication operation. When authentication of the search data is requested, the client sends the challenge to the server and the field is set to 0x04. When the server has generated the proof, it sends the proof to the client and the field is set to 0x05.

(d) Update operation. When the user needs to add files to the server (not the first upload), the client sends the new files and the add token to the server, and the server uses them to update the files and the index. The field is set to 0x06. When the user needs to delete files on the server, the client sends the deletion token to the server. The field is set to 0x07. When the server executes a DSA state update, it sends the new DSA state to the client for authentication. The field is set to 0x08.

(e) Operation tips. These are used by the server to notify the client whether the operation was successful. If the operation is successful, the field is set to 0x00. If the operation fails, the server returns an error message and the field is set to 0x01.

2. Length

This field represents the size, in bytes, of the data part of the transmission.

3. Direction

This field represents the direction of message transmission. When the client sends the message to the server, the field is set to 0x00. When the server sends the message to the client, the field is set to 0x01.

4. Type

This field represents the type of the transmitted data.

If the Data field carries an encrypted file, the field is set to 0x00.

If the Data field carries an index, the field is set to 0x01.

If the Data field carries an add token, the field is set to 0x02.

If the Data field carries a deletion token, the field is set to 0x03.

If the Data field carries a search token, the field is set to 0x04.

If the Data field carries a challenge, the field is set to 0x05.

If the Data field carries a proof, the field is set to 0x06.

If the Data field carries a search authenticator, the field is set to 0x07.

If the Data field carries a DSA state, the field is set to 0x08.

If the Data field carries error information, the field is set to 0x09.

5. Data

This field stores the data to be transmitted.
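As an illustration of the message format, the following sketch packs and unpacks a message carrying a search token. Only the 8-bit width of the Message type field is given by this section; the one-byte Direction and Type fields, the 4-byte big-endian Length field, and the names pack_message and unpack_message are assumptions made for the example.

```python
import struct

# Assumed layout: message type (1 byte), length (4 bytes, big-endian),
# direction (1 byte), data type (1 byte), followed by the data itself.
HEADER = struct.Struct(">BIBB")

MSG_SEARCH = 0x02              # search operation, client request
DIR_CLIENT_TO_SERVER = 0x00
DATA_SEARCH_TOKEN = 0x04


def pack_message(msg_type: int, direction: int, data_type: int, data: bytes) -> bytes:
    """Build header + data; Length is the size of the data part in bytes."""
    return HEADER.pack(msg_type, len(data), direction, data_type) + data


def unpack_message(raw: bytes):
    """Split a received message back into its fields."""
    msg_type, length, direction, data_type = HEADER.unpack_from(raw)
    data = raw[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated message")
    return msg_type, direction, data_type, data


# A 48-byte placeholder stands in for the three 16-byte search token fields.
raw = pack_message(MSG_SEARCH, DIR_CLIENT_TO_SERVER, DATA_SEARCH_TOKEN, b"\x00" * 48)
fields = unpack_message(raw)
```

An explicit byte order in the format string avoids platform-dependent padding, so the header is always 7 bytes under these assumptions.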

5. File Storage

5.1 Overview

When the user uploads files, he first chooses the files to upload from the local disk and attaches a keyword description (specified by the user) to each file. After the files have been chosen, the client preprocesses the data, generating the encrypted index, the search authenticator and the encrypted files, and uploads them to the server for storage.

Fig.5.1.1 The flow chart of file storage section

The flow chart of the file storage section is shown in Figure 5.1.1. First the client generates the keyword index according to the keywords. Then the client encrypts the index and the files to obtain the cipher-text c and the encrypted index λ. The client generates the Merkle hash tree, also known as the search authenticator, and sends the hash tree, the cipher-text c and the encrypted index λ to the server. The server receives these files, stores them locally, and returns a message informing the client whether the operation has been performed successfully.

5.2 Generate inverted index

The operation of generating the inverted index is performed locally on the client. According to the keyword information of the files, construct for each keyword the linked list of the files that contain it. All of these linked lists form the inverted index. Encrypting the inverted index yields the encrypted index. The flow chart of generating the inverted index is shown in Figure 5.2.1.

Fig.5.2.1 The flow chart of generating the inverted index
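A minimal sketch of this step, using dictionaries of lists in place of linked lists, is shown here. The function name build_indexes, the file identifiers and the keywords are hypothetical; the sketch also returns the per-file keyword lists (the file index of Section 3.1.2), since both are needed later.

```python
from collections import defaultdict


def build_indexes(files):
    """files maps a file identifier to the keywords attached by the user."""
    inverted = defaultdict(list)          # keyword -> identifiers of files containing it
    file_index = {}                       # file identifier -> its keywords
    for file_id, keywords in files.items():
        unique = list(dict.fromkeys(keywords))   # drop duplicates, keep order
        file_index[file_id] = unique
        for w in unique:
            inverted[w].append(file_id)
    return dict(inverted), file_index


inverted, file_index = build_indexes({
    "f1": ["w1", "w2"],
    "f2": ["w2"],
    "f3": ["w1", "w3"],
})
```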

5.3 Generate keys

The operation of generating keys is performed locally on the client.

Key generation process

Input the system security parameter 1^k and process as follows:

Select three k-bit strings K1, K2, K3 at random as the keys of the pseudo-random function. Generate the key of the symmetric encryption algorithm.

The algorithm outputs the key K, which consists of K1, K2, K3 and the symmetric encryption key.

The key generation algorithm runs on the client to generate the keys of the symmetric encryption algorithm and the pseudo-random function. The generated key is only used locally on the client, so key management on the client is very simple: the client only needs to store the key, and no key distribution is involved. Notably, file encryption, index encryption and update operations all require the key, so if the user loses the key, the data can no longer be retrieved from the cloud storage server.
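Key generation could look roughly like the following sketch. It assumes k = 128 bits, draws every key from the operating system's random source, and treats the symmetric encryption key as one more random string; the function name gen and the dictionary key "K_enc" are hypothetical.

```python
import secrets


def gen(k_bits: int = 128):
    """Return the client key K as a dictionary of independent random keys."""
    if k_bits <= 0 or k_bits % 8:
        raise ValueError("k must be a positive multiple of 8 in this sketch")
    n = k_bits // 8
    return {
        "K1": secrets.token_bytes(n),      # keys of the pseudo-random functions
        "K2": secrets.token_bytes(n),
        "K3": secrets.token_bytes(n),
        "K_enc": secrets.token_bytes(n),   # key of the symmetric encryption algorithm
    }


K = gen(128)
```

The result never leaves the client, which matches the statement above that no key distribution is needed.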

5.4 Encryption

The operation of encryption is performed locally on the client. The process of encryption includes file encryption and index encryption.

After the encrypted index and the authenticator have been generated, the user’s plaintext files are encrypted. File encryption is relatively simple: loop over the collection of files and encrypt each one with the symmetric encryption algorithm. Before encryption, the file name and the keyword information are appended to the end of the plaintext file so that deletion tokens can be generated from it later.

To generate the encrypted index, first traverse the inverted index to compute the MD5 values of the files and keywords and to fill in the search array and the search table. Then traverse the file linked lists and use the MD5 values of the files and keywords to fill in the deletion array and its search table. After the traversal is complete, construct the free linked list and store it in the array. Finally, write the generated search array, deletion array and the two search tables to files and save them on disk.

The flow chart of generating the encrypted index is shown in Figure 5.4.1.

Fig.5.4.1 The flow chart of generating encrypted index

The encryption algorithm is described in detail below through its formal definition.

Encryption algorithm process

: Input key K, file set , inverted index , and process as follows:

1. Initialize the arrays As and Ad and the dictionaries Ts and Td.

2. For each keyword ,process as follows:

(a) Create the linked list , and the list contains nodes .

These nodes will be stored in the array randomly. Define

.

Among them, represent the ith document identification and is a random

string to be filled. and are all defined as 0.

(b) Store the address of the head node of each linked-list in the search

table . The structure of is defined as

. represents the coordinate of

the dual node of in the array .

3.for each in the every file, process as follows:

(a) Construct the linked-list which contains nodes and

store the nodes in the array randomly. Notice that every node is associated

with a keyword , therefore it is also associated with a node in the linked list .

and are defined as the previous node and the next node of in the

keyword linked-list. The structure of the node is defined as:

represents the random string to be filled. is defined as 0.

(b) Store the address of the head node in the each linked-list in the search

table . The structure of is defined as .

4. Select unused nodes and randomly each from the

free array . For each node , the structure of is defined as:

.set as 0 and store the head node of the

linked-list in the search table .the structure is defined as .

5. Fill the other unused nodes in the array and with random strings.

6. Encrypt each file and get the cipher-text .

7. The algorithm outputs the encrypted file set and the encrypted

index .

Note that the MD5 values of all the files and keywords are calculated in the process of generating the encrypted index. To improve efficiency, the data and their MD5 values are written into the hash table (used as a dictionary) for later use.

To aid understanding, a simple example of constructing an encrypted index is given here. Suppose that there are three files to be uploaded, each with its keywords w. First, the inverted index is constructed from the files and the keyword information. Then the inverted index and the files are used to construct the encrypted index. The results are shown in Figure 5.4.2.

Fig.5.4.2 Structure of the encrypted index

The detailed construction process is as follows:

① Construct the linked list of files for each keyword according to the keyword information. All the linked lists together form the inverted index.

② Define two fixed-length arrays As and Ad, and initialize them.

③ For every node in the inverted index, compute the contents of the node according to the formula and write them into the array As. Write the coordinate of the head node of each linked list into Ts.

④ For each keyword of each file, calculate the contents of the node and write them into the array Ad. Write the coordinate of the head node of each file linked list into Td.

⑤ Construct the free list and write the coordinate of the head node of the linked list into Ts.

5.5 Generate search authenticator

The operation of generating the search authenticator is performed locally on the client.

Once the encrypted index has been generated, the inverted index and the dictionary recording all the MD5 values are readily available, so the search authenticator can be constructed quickly from this information. Note that each node of the arrays As and Ad carries an identification flag indicating whether the node has been used. To hide this information from the server, the flag is omitted and only the other contents are written out when the arrays are written to disk.

The key step in generating the search authenticator is calculating the values of the leaf nodes. First initialize the MHT array according to the number of keywords in the inverted index, then traverse the inverted index. For each linked list in the inverted index, read the corresponding MD5 values, compute the leaf value and write the result into the corresponding leaf node position of the MHT array. After the inverted index has been traversed, compute upwards from the leaf nodes to obtain the whole MHT authentication tree. Then write the tree to a file, take the value of the root node of the MHT as the authentication value and write it into the user’s key file. The process flow of generating the search authenticator is shown in Figure 5.5.1.

Fig.5.5.1 The process flow of generating the search authenticator

The following describes the process of generating the search authenticator in detail through its formal definition.

The process of generating the search authenticator

: Input the user key , the file set , the inverted index , and

process as follows:

1. For each keyword , compute .

2. Set as leaf node to construct MHT. stands for MHT and

stands for the value of root node of MHT.

3. The algorithm outputs authenticator and DSA state .

Execute the algorithm to get the search authenticator and the

authentication value . The client is responsible for the storing . The search

authenticator and the results generated by are stored in the Server.

5.6 Upload file

The operation of uploading files is completed by both the client and the server, as an interactive process.

After the client has encrypted the files and generated the search authenticator and the encrypted index, it uploads the encrypted files, the encrypted index and the search authenticator to the server for storage.

The data filled into each field of the message depends on what is being uploaded.

When the client sends the encrypted files to the server, each field of the message is filled as follows:

Message type field: 0x01, indicating that files are being added for the first time.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x00, indicating that the data portion carries the encrypted files.

Length field and Data field are filled based on the actual situation.

When the client sends the encrypted index to the server, each field of the message is filled as follows:

Message type field: 0x01, indicating that files are being added for the first time.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x01, indicating that the data portion carries the index.

Length field and Data field are filled based on the actual situation.

When the client sends the search authenticator to the server, each field of the message is filled as follows:

Message type field: 0x01, indicating that files are being added for the first time.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x07, indicating that the data portion carries the search authenticator.

Length field and Data field are filled based on the actual situation.

5.7 Store files

The operation of storing files is performed locally on the server. When the server receives a connection request from the client, it first checks whether the user exists. If the user exists, the server stores the received files directly in the user's corresponding folder. If the user does not exist, the server creates a new folder and stores the files in it.

6. File Search

6.1 Overview

When the user retrieves files that include some keywords, the client first runs the search token generation algorithm on the keywords to get the corresponding search token. When the server gets the search token, it reads the user's encrypted index and runs the search algorithm. The server then finds the cipher-text set corresponding to the search token and sends the result to the user. The user receives and decrypts the cipher-text. Through this process, the user obtains the file set corresponding to the keywords without divulging any useful information.

Fig. 6.1.1 Sequence Diagram of Search Operation

6.2 Generate search token

The operation of generating the search token is performed locally on the client.

Input the keyword w and the key K; output the search token, whose three values are stored in the fields fk1w, gk2w and pk3w (Section 3.2).
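A sketch of search token generation follows. The field names fk1w, gk2w and pk3w come from Section 3.2; the choice of HMAC-SHA256 truncated to 16 bytes as the pseudo-random function, the one-letter domain labels and the helper name prf are assumptions made for the example, and the fixed keys are for demonstration only.

```python
import hashlib
import hmac


def prf(key: bytes, label: bytes, msg: bytes) -> bytes:
    """Stand-in pseudo-random function: HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, label + msg, hashlib.sha256).digest()[:16]


def search_token(K, w: str):
    """Derive the three 16-byte token fields from the keyword w and the key K."""
    m = w.encode("utf-8")
    return (
        prf(K["K1"], b"F", m),   # fk1w: locates the entry in the search table
        prf(K["K2"], b"G", m),   # gk2w: unmasks the head node address
        prf(K["K3"], b"P", m),   # pk3w: unmasks the nodes of the linked list
    )


K = {"K1": b"\x01" * 16, "K2": b"\x02" * 16, "K3": b"\x03" * 16}   # demo keys only
token = search_token(K, "cloud")
```

The token is deterministic for a given keyword and key, so repeating a search produces the same token while revealing nothing about w without K.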

6.3 Send search token

The operation of sending the search token is completed by both the client and the server, which is an interactive process. The search token generated by the client is sent to the server.

When the client sends the search token to the server, each field of the message is filled as follows:

Message type field: 0x02, indicating that the operation of sending the search token belongs to the search part.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x04, indicating that the data portion carries the search token.

Length field and Data field are filled based on the actual situation.

6.4 Search

At any time, the user can input the keyword and send the request to the server to query all the files which contain the keywords.

: input the encrypted index , the search token and the

cipher-text set , and process as follows:

1. Compute , find the coordinate of the head node of the

keyword linked list . stands for the coordinate of in and stands for the

coordinate of in .

2. Express the content of in as , compute

and get the file description corresponding to

the node and the coordinate of the next node in .

3. If is not equal to zero, execute the method of step 2. If is

equal to zero, the algorithm stops.

stands for the file identifier set which has been searched. Find

cipher-text corresponding to each identifier and output .
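The traversal in steps 1 to 3 can be illustrated with a deliberately simplified sketch. It keeps only the search array As and the search table Ts and leaves out the deletion array, the dual nodes and the free list. The masking scheme (XOR with a SHA-256 digest of a per-keyword secret and a random string), the node layout (8-byte file ID plus a 2-byte next address, with address 0 as the end of the list) and all function names are assumptions made for the example, not the protocol's exact formulas.

```python
import hashlib
import hmac
import secrets
import struct

NODE = struct.Struct(">8sH")     # fileID (8 bytes) and coordinate of the next node


def prf(key, label, msg):
    return hmac.new(key, label + msg, hashlib.sha256).digest()[:16]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def mask(secret, r):
    return hashlib.sha256(secret + r).digest()[:NODE.size]


def build(K, inverted, size=64):
    """Client side: store masked nodes at random positions, heads in Ts."""
    As = [None] * size                      # coordinate 0 is reserved for "end of list"
    free = list(range(1, size))
    secrets.SystemRandom().shuffle(free)
    Ts = {}
    for w, ids in inverted.items():
        m = w.encode()
        f, g, p = prf(K[0], b"F", m), prf(K[1], b"G", m), prf(K[2], b"P", m)
        addrs = [free.pop() for _ in ids]
        for i, file_id in enumerate(ids):
            nxt = addrs[i + 1] if i + 1 < len(addrs) else 0
            r = secrets.token_bytes(16)
            As[addrs[i]] = (xor(NODE.pack(file_id.encode(), nxt), mask(p, r)), r)
        Ts[f] = xor(struct.pack(">H", addrs[0]), g)
    # unused cells are filled with random strings
    As = [c or (secrets.token_bytes(NODE.size), secrets.token_bytes(16)) for c in As]
    return As, Ts


def search(As, Ts, token):
    """Server side: unmask the head coordinate, then follow the list until 0."""
    f, g, p = token
    if f not in Ts:
        return []
    (addr,) = struct.unpack(">H", xor(Ts[f], g))
    found = []
    while addr != 0:
        masked, r = As[addr]
        file_id, addr = NODE.unpack(xor(masked, mask(p, r)))
        found.append(file_id.rstrip(b"\0").decode())
    return found


K = (b"\x01" * 16, b"\x02" * 16, b"\x03" * 16)            # demo keys only
As, Ts = build(K, {"w1": ["f1", "f3"], "w2": ["f1", "f2"]})
token_w1 = tuple(prf(k, l, b"w1") for k, l in zip(K, (b"F", b"G", b"P")))
result = search(As, Ts, token_w1)
```

The server only ever sees masked nodes and the token, and it learns the identifiers for one keyword at a time as it walks the list.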

6.5 Return result

The operation of returning the result is completed by both the client and the server, which is an interactive process. The search result generated by the server is sent to the client.

When the server sends the search result to the client, each field of the message is filled as follows:

Message type field: 0x03, it indicates the operation of returning the search result belongs to the search part.

Direction field: 0x01, it indicates the message is sent from the server to the client.

Type field: 0x00, it indicates the data portion carries the cipher-text.

Length field and Data field will be filled based on the actual situation.

6.6 Decryption

The operation of decryption is performed locally on the client. Because the files which the user receives from the server have been encrypted, the client needs to use the same symmetric encryption algorithm to decrypt them.

: Input the key and the cipher-text set by

,and then compute to get the plaintext.

7. Challenge and Proof

7.1 Overview

The verify operation must accompany a search operation. The diagram below is a sequence diagram of the authentication process, which includes two parts: challenge and proof. First, the client generates the challenge corresponding to a particular keyword search and sends it to the server. After the server receives the challenge, it reads the user's search authenticator and generates the proof for that search. After the client gets the proof, it decrypts the file set returned by the search process and validates the result. Finally, the client judges whether the server has behaved legitimately according to the output of the validation algorithm.

Fig. 7.1.1 Sequence Diagram of Authentication Algorithm

7.2 Generate challenge

The operation of generating the challenge is performed locally on the client.

: Input the key and the keyword for searching w , and then

compute and output the challenge .

7.3 Send challenge

The operation of sending the challenge is completed by both the client and the server, as an interactive process. The challenge generated by the client is sent to the server.

When the client sends the challenge to the server, each field of the message is filled as follows:

Message type field: 0x04, indicating that the operation of sending the challenge belongs to the authentication part.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x05, indicating that the data portion carries the challenge.

Length field and Data field are filled based on the actual situation.

7.4 Generate proof

The operation of generating the proof is performed locally on the server.

: Input the search authenticator , the challenge , and process as

follows:

1. Traverse , and find the first leaf node M whose element is .

2. Traverse from node M to the root node, and record the sibling node value

of all the node in the traversal path.

3. Output the proof , and h is the height of the authenticator .

7.5 Send proof

The operation of sending the proof is completed by both the client and the server, as an interactive process. The proof generated by the server is sent to the client.

When the server sends the proof to the client, each field of the message is filled as follows:

Message type field: 0x05, indicating that the operation of sending the proof belongs to the authentication part.

Direction field: 0x01, indicating that the message is sent from the server to the client.

Type field: 0x06, indicating that the data portion carries the proof.

Length field and Data field are filled based on the actual situation.

7.6 Validate proof

The operation of validating the proof is performed locally on the client.

: Input the file , the proof and the state got by search, and

process as follows:

1. Compute to get the value of leaf node .

2. Compute when , to get the validation value .

3. If , the validation is passed and outputs 1. Otherwise outputs 0.
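The challenge, proof and validation steps of Sections 7.4 and 7.6 can be sketched together as follows. The sketch assumes MD5 as the hash function, a power-of-two number of leaves, and leaf values computed directly from hypothetical keywords; how the protocol actually derives the leaf value from the search result is not reproduced here. Each proof entry carries the sibling value together with a flag saying whether the sibling lies on the left, which is one possible encoding.

```python
import hashlib


def F(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def build(leaves):
    """Build all tree levels from ready-made leaf values (leaves first)."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([F(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def gen_proof(levels, challenge):
    """Server side: find the challenged leaf, record siblings up to the root."""
    index = levels[0].index(challenge)        # first leaf equal to the challenge
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))   # (value, sibling is left)
        index //= 2
    return path


def verify(leaf, proof, root):
    """Client side: recompute the root from the leaf and the proof; 1 = accepted."""
    value = leaf
    for sibling, sibling_is_left in proof:
        value = F(sibling + value) if sibling_is_left else F(value + sibling)
    return 1 if value == root else 0


leaves = [F(w) for w in (b"w1", b"w2", b"w3", b"w4")]
levels = build(leaves)
root = levels[-1][0]                          # the state kept by the client
proof = gen_proof(levels, leaves[2])
ok = verify(leaves[2], proof, root)
```

The proof holds one sibling value per level below the root, so its length equals the height h of the authenticator rather than the number of leaves.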

8. File Update

8.1 Overview

The timing diagram of the update operation is shown below.

Fig. 8.1.1 Sequence Diagram of Add Files

When a user adds a file, the client first uses the file and its keywords to generate the add tokens (including the SSE token and the DSA token) and encrypts the file. Then the client uploads the cipher-text and the add token. The server updates the encrypted index and the search authenticator using the token and stores the cipher-text at the same time. It then returns the update information to the client, and the client uses it to update the local DSA state.

The process of deleting files is similar to adding files. Generating the deletion token requires the file itself, so if the client has no local copy, the deletion algorithm first needs to download the files that the user wants to delete from the server. The client then generates the deletion token using the files. We assume that the client has copies of the files to be removed.

8.2 Generate Keys

Key generation process

: is the security parameter of the system. Select two k-bit length strings

randomly according to the safety parameter.

The key generation algorithm runs on the client and generates the key of the DSA algorithm. The generated key consists of two parts, which are the keys of two pseudo-random functions. As with the key generation of the SSE algorithm, the DSA key is generated and used only on the client, and the client is responsible for keeping it.

8.3 Generate update token

The operation of generating the update token is performed locally on the client. The update operation includes the add operation and the deletion operation, so the update token also includes the add token and the deletion token.

Generating add token:

: Input the key K, the files which are to be added, the

inverted index , and process as follows:

1. For each keyword of the files ( ), compute

, and are fixed-length random strings.

2. Compute , according to the result of step 1.

3. Encrypt the file .

4. The algorithm outputs the add token and the cipher-text .

Generating the deletion token:

: input the key , the file , and compute

, the algorithm output the deletion token .

8.4 Send token/file

After generating the update token, the client sends the token to the server. The operation of sending the token is completed by both the client and the server, as an interactive process. When the client adds a file, it needs to upload not only the add token but also the encrypted file.

Add files:

The process of adding files is divided into two sub-processes: sending the token and sending the files.

When the client sends the add token to the server, each field of the message is filled as follows:

Message type field: 0x06, indicating that the operation of adding files belongs to the update part.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x02, indicating that the data portion carries the add token.

Length field and Data field are filled based on the actual situation.

When the client sends the encrypted file to the server, each field of the message is filled as follows:

Message type field: 0x06, indicating that the operation of adding files belongs to the update part.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x00, indicating that the data portion carries the encrypted file.

Length field and Data field are filled based on the actual situation.

Delete files:

When the client sends the deletion token to the server, each field of the message is filled as follows:

Message type field: 0x07, indicating that the operation of deleting files belongs to the update part.

Direction field: 0x00, indicating that the message is sent from the client to the server.

Type field: 0x03, indicating that the data portion carries the deletion token.

Length field and Data field are filled based on the actual situation.

8.5 Update files

The operation of updating files is performed locally on the server. When the server receives the files from the client, it stores them locally.

8.6 Update index

The operation of updating the index is performed locally on the server.

Update index (add files operation)

Adding file process

: Input the cipher-text , the encrypted index , the add token , and process as follows:

1. Store the cipher-text: .

2. Set as , for each , process as follows:

(a) Find the coordinate of the head node M of the free list in through .

(b) Compute to find the coordinate of the next free node, and the coordinate of the dual node of M in , .

(c) Set and update the linked-list .

(d) Compute , find the coordinate of the head node N of the linked-list in , and the coordinate of dual node of M.

(e) Let represent for , set , and set node M as the head node of the linked-list .

(f) Set , update the content of node M.

(g) Set , update the search table .

(h) Let represent for , set , and update the content of node .

(i) Set , update the content of node

(j) Set , update the search list .

3. The algorithm outputs the new encrypted file set and the updated encrypted index .

Update index (delete file operation)

Deleting file process

: Input the encrypted index , the encrypted file set , the deletion token , and process as follows:

1. Let be , compute , and find the coordinate of the head node of the linked-list in .

2. For each node in the linked-list , process as follows:

(a) Compute , in which .

(b) Use random string to fill .

(c) Compute , find the address of the head node of the linked-list .

(d) Set , let the node be the head node of the linked-list .

(e) Set , update the content of N which is the dual node of , let it in the linked-list .

(f) Let be the previous node of in the keyword linked-list. Set , in which . Set , in which .

(g) Let be the next node of in the keyword linked-list. Set , in which . Set , in which .

(h) Set , and execute (a)

3. Delete the files with the identification in the encrypted file set, .

4. The algorithm outputs the new encrypted file set and the updated encrypted index .

The process flow for deleting files is approximately the same as that for adding files. The client generates the deletion token from the files and the keywords and sends it to the server. The server reads the encrypted index, updates it using the deletion token, and deletes the corresponding files from the encrypted file set.
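The linked-list bookkeeping behind the add and delete operations can be sketched in plaintext. The following Python model is a simplification under loud assumptions: nothing is encrypted or masked, the "dual node" array is replaced by an ordinary per-file list of node coordinates, and the class and method names (`PlainIndex`, `add_file`, `delete_file`, `search`) are hypothetical. It shows only how nodes move between the free list and the keyword linked-lists.

```python
class PlainIndex:
    """Unencrypted toy model of the search array, search table and free list."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # Each node is [file_id, prev, next]; free nodes are chained through `next`.
        self.nodes = [[None, None, i + 1] for i in range(capacity)]
        self.nodes[-1][2] = None
        self.free_head = 0
        self.heads = {}        # keyword -> coordinate of the head node of its list
        self.file_nodes = {}   # file_id -> [(keyword, node coordinate), ...]

    def add_file(self, file_id, keywords):
        for w in keywords:
            m = self.free_head                    # head node M of the free list
            if m is None:
                raise MemoryError("free list is empty")
            self.free_head = self.nodes[m][2]     # next free node becomes the free head
            n = self.heads.get(w)                 # current head node N of the keyword list
            self.nodes[m] = [file_id, None, n]    # M points forward to the old head
            if n is not None:
                self.nodes[n][1] = m              # old head points back to M
            self.heads[w] = m                     # M becomes the head of the keyword list
            self.file_nodes.setdefault(file_id, []).append((w, m))

    def delete_file(self, file_id):
        for w, m in self.file_nodes.pop(file_id, []):
            _, prev, nxt = self.nodes[m]
            if prev is not None:
                self.nodes[prev][2] = nxt         # previous node skips over M
            elif nxt is not None:
                self.heads[w] = nxt               # M was the head; next node takes over
            else:
                del self.heads[w]                 # the keyword list is now empty
            if nxt is not None:
                self.nodes[nxt][1] = prev         # next node points back past M
            # Wipe M and push it onto the free list.
            self.nodes[m] = [None, None, self.free_head]
            self.free_head = m

    def search(self, w):
        out, i = [], self.heads.get(w)
        while i is not None:
            out.append(self.nodes[i][0])
            i = self.nodes[i][2]
        return out
```

In the protocol itself the server performs the equivalent pointer updates on masked entries using values carried in the token, so it never sees keywords or file identifiers in the clear as this model does.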

8.7 Update Search Authenticator

The operation of updating the search authenticator is performed locally on the server.

So that the user can still validate search results after files are added or deleted, the search authenticator must be updated after each such operation. The DSA state (hash tree) stored at the user must be updated correspondingly.

When the user needs to add a file, he first generates the add token from the files and the keywords, then encrypts the files, and finally sends the add token and the encrypted files to the server. After the server receives the data, it reads the user's search authenticator, updates the search authenticator according to the add token, and stores the user's encrypted files. The process flow for deleting files is roughly the same, except that the file storage operation is replaced by a file deletion operation. The specific operations are as follows:

Generate update information:

: Input the search authenticator and add token , and process as follows:

1. Set the add token as .

2. When ,

(a) Find the first leaf node whose value is in the authenticator

(b) Set as . Compute .

(c) Record the critical path from M to the root.

3. Reconstruct MHT with the node and the nodes that do not change, to get the new authenticator .

4. Output the new authenticator and the update information .
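The server-side steps above (locate a leaf, record the path to the root, rebuild the affected part of the MHT) can be sketched as follows. This is an illustrative Python sketch, not the specification's algorithm: SHA-256 as the hash function, the all-zero `EMPTY` placeholder for an unused leaf, a power-of-two number of leaves, and the function names are all assumptions.

```python
import hashlib

EMPTY = b"\x00" * 32   # assumed placeholder value for an unused leaf


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_tree(leaves):
    """Return every level of the tree, leaves first and root last."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def insert_leaf(levels, value: bytes):
    """Fill the first empty leaf with `value`.

    Returns (leaf index, sibling path recorded before the change). The path
    plays the role of the update information sent back to the client.
    """
    index = levels[0].index(EMPTY)       # raises ValueError if no leaf is free
    path, i = [], index
    for level in levels[:-1]:            # record siblings from leaf level upward
        path.append(level[i ^ 1])
        i //= 2
    levels[0][index] = value
    i = index
    for depth in range(1, len(levels)):  # recompute only nodes on the path to the root
        i //= 2
        below = levels[depth - 1]
        levels[depth][i] = h(below[2 * i], below[2 * i + 1])
    return index, path
```

Only the nodes on the path from the changed leaf to the root are recomputed; all other nodes are reused unchanged, and the new root is `levels[-1][0]`.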

8.8 Return new DSA

The operation of returning the DSA state is an interactive process completed jointly by the client and the server: the new DSA state generated by the server is sent to the client.

When the server sends the DSA state to the client, each field of the message is filled as follows:

Message type field: 0x07, it indicates the operation of returning DSA belongs to the update part.

Direction field: 0x01, it indicates the message is sent from the server to the client.

Type field: 0x00, it indicates the data portion carries the DSA state.

Length field and Data field will be filled based on the actual situation.

8.9 Update DSA

The operation of updating the DSA state is performed locally on the client.

The DSA state is the value of the root node. It is stored locally at the client and used in the authentication operation.

: Input DSA state , update information , add token , and process as follows:

1. Let the token be , the update information be , and the leaf node corresponding to be .

2. Validate using state , when . If the validation is passed, continue. Otherwise the algorithm outputs .

3. Compute , when .

4. Use to update .

5. Output new state value .
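The validate-then-update logic of the steps above can be sketched for a single leaf. This Python sketch assumes that the update information consists of the leaf index, the old leaf value and the sibling hashes on the path to the root, that SHA-256 is the hash function, and that `None` stands for the failure output; the function names are hypothetical.

```python
import hashlib
import hmac


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def root_from_path(leaf: bytes, index: int, path) -> bytes:
    """Recompute the root from a leaf and its sibling path (leaf level first)."""
    node = leaf
    for sibling in path:
        # An odd index means the current node is a right child.
        node = h(sibling, node) if index & 1 else h(node, sibling)
        index >>= 1
    return node


def update_state(state: bytes, index: int, old_leaf: bytes, new_leaf: bytes, path):
    """Validate the path against the stored root, then return the new root.

    Returns None when the update information does not match the stored state.
    """
    if not hmac.compare_digest(root_from_path(old_leaf, index, path), state):
        return None
    return root_from_path(new_leaf, index, path)
```

The client keeps only the root value. It accepts the new root only if the same sibling path reproduces the old root from the old leaf value, so a server that supplies an inconsistent path is detected. When a token changes several leaves, each path would have to be checked against the state produced by the previous step.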

9 Error Handling

Error Definition

Result | Reason | Action

Log-on error | The password is incorrect, or the client cannot connect to the Internet | A message pops up; enter the password again or check the Internet connection

Uploading error | The client cannot connect to the Internet, or the file is occupied | A message pops up; check the network or close the occupied file

Operation error | A message field is filled in incorrectly (for example, the message transmission direction) | A message pops up; check the filled-in content