

CUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSINGAND COMMUNICATION OF REMOTE SENSING DATA ON CUBESAT CLUSTERS

By

OBULAPATHI NAYUDU CHALLA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOLOF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013


© 2013 Obulapathi Nayudu Challa


I dedicate this to my family, my wife Sreevidya Inturi, my mother Rangamma Challa, my

father Ananthaiah Challa, my sister Sreelatha Chowdary Lingutla, my brothers-in-law Ramesh Naidu Lingutla and Sreekanth Chowdary Inturi, my father-in-law Sreenivasulu

Chowdary Inturi, my mother-in-law Venkatalakshmi Inturi, my brothers Akshay Kumar

Anugu and Dheeraj Kota and my uncle Venkatanarayana Pattipaati, for all their love and

support.


ACKNOWLEDGMENTS

It has been a great experience being a student of Dr. Janise Y. McNair for the last

five and a half years. There was never a time that I did not feel cared for, thanks to her

constant support and guidance.

I would like to thank Dr. Xiaolin (Andy) Li, Dr. Norman G. Fitz-Coy and Dr. Haniph A. Latchman for agreeing to serve on my committee and for providing valuable feedback in completing my dissertation. Thanks to

the professors at University of Florida Dr. Patrick Oscar Boykin, Ms. Wenhsing Wu, Dr.

Ramakant Srivastava, Dr. Erik Sander, Dr. A. Antonio Arroyo, Dr. Jose A. B. Fortes, Dr.

John M. Shea, Dr. Greg Stitt, Dr. Sartaj Sahni and Dr. Shigang Chen, for teaching me much of what I know today. Thanks to the staff at the University of Florida, Ray E. McClure II, Jason

Kawaja, Shannon M Chillingworth, Cheryl Rhoden and Stephenie A. Sparkman, for

their patience with my countless requests and administrative questions. I would like to

take this opportunity to thank all my Wireless and Mobile Group colleagues, past and

present, for being there with me and helping me all along in one way or other. I would

like to thank Alexander Verbitski for his mentorship during my internship.

I would like to thank my teachers Sreedevi, Uma Kantha, Nalini Sreenivasan, K.

Ramakrishna, K. Bhaskar Naidu, Sambasiva Reddy, A Koteswar Rao, A. K. Rama

Rao, Dr. Vijay Kumar Chakka, Dr. Gautam Dutta and Dr. Prabhat Ranjan who greatly

influenced my life. Internet and Open Source have made this world a true Vasudhaika

Kutumbam for me. I would like to thank Linus Torvalds, creator of Linux; Richard Matthew Stallman, founder of GNU; Vint Cerf, a father of the Internet; Tim Berners-Lee, inventor of the World Wide Web; Guido van Rossum, creator of the Python programming language; Satoshi Nakamoto, inventor of Bitcoin; Masashi Kishimoto, creator of Naruto; Mark Shuttleworth, founder of Ubuntu; and Tim O'Reilly, founder of O'Reilly Media.

Life at the University of Florida has always been fun and exciting, thanks to the

wonderful friends around here: Dan Trevino, Dante Buckley, Gokul Bhat, Hrishikesh


Pendurkar, Jimmy (Tzu Yu) Lin, Karthik Talloju, Krishna Chaitanya, Kishore Yalamanchili,

Madhulika Dandina, Manu Rastogi, Paul Muri, Rakesh Chalasani, Ravi Shekhar, Seshu

Pria, Shruthi Venkatesh, Subhash Guttikonda, Udayan Kumar, Vaibhav Garg, Vijay

Bhaskar Reddy and Vivek Anand. I would like to thank Mr. Iqbal Qaiyumi, Dr. Shaheda

Qaiyumi, Mr. Jagat Desai and Mrs. Vatsala Desai for taking care of me like their son.

Thanks to my long-distance friends Bhargavi Vanga, Praveen Kumar, Radha Vummadi,

Uday Kumar, Uzumaki Naruto and Vijay Kumar, who have been close even when they

were far.

Lastly, I would like to thank my family - my wife Sreevidya Inturi, my mother

Rangamma Challa, my father Ananthaiah Challa, my sister Sreelatha Chowdary

Lingutla, my brothers-in-law Ramesh Naidu Lingutla and Sreekanth Chowdary Inturi, my

father-in-law Sreenivasulu Chowdary Inturi, my mother-in-law Venkatalakshmi Inturi,

my brothers Akshay Kumar Anugu and Dheeraj Kota and my uncle Venkatanarayana

Pattipaati. Their endless love and support throughout the years have meant more to me

than words can express. I would like to dedicate my dissertation to them.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 CubeSat Cloud

2 BACKGROUND
   2.1 Remote Sensing
   2.2 Evolution of CubeSat Networks
       2.2.1 Summary and Limitations of CubeSat Communications
   2.3 Distributed Satellite Systems
   2.4 Classification of Distributed Satellite Systems
   2.5 Related Work
       2.5.1 Distributed Storage Systems
       2.5.2 Distributed Computing Techniques

3 NETWORK ARCHITECTURE OF CUBESAT CLOUD
   3.1 Components of the CubeSat Network
       3.1.1 Space Segment
       3.1.2 Ground Segment
   3.2 System Communication
       3.2.1 Cluster Communication
       3.2.2 Space Segment to Ground Segment Communication
       3.2.3 Ground Segment Network Communication
   3.3 CubeSat Cloud
       3.3.1 Storage, Processing and Communication of Remote Sensing Data on CubeSat Clusters
       3.3.2 Source Coding, Storing and Downlinking of Remote Sensing Data on CubeSat Clusters

4 DISTRIBUTED STORAGE OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS
   4.1 Key Design Points
       4.1.1 Need for Simple Design
       4.1.2 Low Bandwidth Operation
       4.1.3 Network Partition Tolerant
       4.1.4 Autonomous
       4.1.5 Data Integrity
   4.2 Shared Goals Between CDFS, GFS and HDFS
       4.2.1 Component Failures are Norm
       4.2.2 Small Number of Large Files
       4.2.3 Immutable Files and Non-existent Random Read Writes
   4.3 Architecture of CubeSat Distributed File System
       4.3.1 File System Namespace
       4.3.2 Heartbeats
   4.4 File Operations
       4.4.1 Create a File
       4.4.2 Writing to a File
       4.4.3 Deleting a File
   4.5 Enhancements and Optimizations
       4.5.1 Bandwidth and Energy Efficient Replication
           4.5.1.1 Number of nodes on communication path = replication factor
           4.5.1.2 Number of nodes on communication path > replication factor
           4.5.1.3 Number of nodes on communication path < replication factor
       4.5.2 Load Balancing
       4.5.3 Chunk Size and Granularity
       4.5.4 Fault Tolerance
       4.5.5 Master Failure
       4.5.6 Worker Failure
       4.5.7 Chunk Corruption
       4.5.8 Inter CubeSat Link Failure
       4.5.9 Network Partitioning
   4.6 Simulation Results
   4.7 Summary of CubeSat Distributed File System

5 DISTRIBUTED PROCESSING OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS
   5.1 CubeSat MapMerge
   5.2 Command and Data Flow during a CubeSat MapMerge Job
   5.3 Fault Tolerance, Failures, Granularity and Load Balancing
       5.3.1 Fault Tolerance
       5.3.2 Master Failure
       5.3.3 Worker Failure
       5.3.4 Task Granularity and Load Balancing
   5.4 Simulation Results
   5.5 Summary of CubeSat MapMerge

6 DISTRIBUTED COMMUNICATION OF REMOTE SENSING IMAGES FROM CUBESAT CLUSTERS
   6.1 CubeSat Torrent
   6.2 Command and Data Flow During a Torrent Session
   6.3 Enhancements and Optimizations
       6.3.1 Improve Storage Reliability and Decrease Storage Overhead
       6.3.2 Using Source Coding to Improve Downlink Time
       6.3.3 Improving the Quality of Service for Real-time Traffic Applications Like VoIP
   6.4 Fault Tolerance, Failures, Granularity and Load Balancing
       6.4.1 Fault Tolerance
       6.4.2 Master Failure
       6.4.3 Worker Failure
       6.4.4 Task Granularity
       6.4.5 Tail Effect and Backup Downloads
   6.5 Simulation Results and Summary of CubeSat Torrent

7 SIMULATOR, EMULATOR AND PERFORMANCE ANALYSIS
   7.1 Hardware and Software of Master and Worker CubeSats for Emulator
   7.2 Hardware and Software of Server and Ground Station for Emulator
   7.3 Network Programming Frameworks
       7.3.1 Twisted
       7.3.2 Eventlet
       7.3.3 PyEv
       7.3.4 Asyncore
       7.3.5 Tornado
       7.3.6 Concurrence
   7.4 Twisted Framework
   7.5 Network Configuration
   7.6 CubeSat Cloud Emulator Setup
   7.7 CubeSat Cloud Simulator Setup
   7.8 CubeSat Reliability Model
   7.9 Simulation and Emulation Results
       7.9.1 Profiling Reading and Writing of Remote Sensing Data Chunks on Raspberry Pi
       7.9.2 Processing, CubeSat to CubeSat and CubeSat to Ground Station Chunk Communication Time
       7.9.3 Storing Remote Sensing Images using CubeSat Cloud
       7.9.4 Processing Remote Sensing Images using CubeSat Cloud
       7.9.5 Speedup and Efficiency of CubeSat MapMerge
       7.9.6 Downlinking Remote Sensing Images Using CubeSat Cloud
       7.9.7 Speedup and Efficiency of CubeSat Torrent
       7.9.8 Copy On Transmit Overhead
       7.9.9 Source Coding Overhead
       7.9.10 Metadata and Control Traffic Overhead
       7.9.11 Comparison of CDFS with GFS and HDFS
       7.9.12 Simulator vs Emulator
   7.10 Summary of Simulation Results

8 SUMMARY AND FUTURE WORK
   8.1 Future Work

REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

2-1 CubeSat data speeds and downloads


LIST OF FIGURES

1-1 CubeSat
2-1 Generations of CubeSat networks
2-2 Architecture of GENSO
2-3 Architectural overview of the Google File System
2-4 Architectural overview of the Hadoop Distributed File System
3-1 Architecture of a CubeSat network
3-2 Architecture of a CubeSat cluster
3-3 A blown-up picture of the ESTCube-1 CubeSat, showing its subsystems
3-4 Ground station
3-5 Ground station antenna
3-6 Overview of CubeSat Cloud and its component frameworks
3-7 Integration of CubeSat Distributed File System and CubeSat Torrent
4-1 Architecture of CubeSat Distributed File System
4-2 Bandwidth and energy efficient replication
4-3 Copy on transmit
5-1 Example of CubeSat MapMerge
5-2 Overview of execution of CubeSat MapMerge on a CubeSat cluster
6-1 Overview of CubeSat Torrent
7-1 Raspberry Pi mini computer
7-2 CubeSat Cloud emulator
7-3 CubeSat Cloud simulator
7-4 Lifetimes of CubeSats
7-5 Read and write times of a chunk
7-6 CubeSat to CubeSat and CubeSat to ground station chunk communication profiling
7-7 File distribution time for various file sizes and cluster sizes
7-8 File processing time for various file sizes and cluster sizes
7-9 Speedup of CubeSat MapMerge
7-10 Efficiency of CubeSat MapMerge
7-11 File downlinking time for various file sizes and cluster sizes
7-12 Speedup of CubeSat Torrent
7-13 Efficiency of CubeSat Torrent
7-14 Bandwidth overhead due to replication
7-15 Bandwidth overhead due to source coding
7-16 Bandwidth and energy overhead
7-17 Bandwidth consumption of CDFS vs GFS and HDFS
7-18 Write time of CDFS vs GFS and HDFS
7-19 Energy consumption of CDFS vs GFS and HDFS
7-20 Simulator vs emulator


Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

CUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSINGAND COMMUNICATION OF REMOTE SENSING DATA ON CUBESAT CLUSTERS

By

Obulapathi Nayudu Challa

December 2013

Chair: Janise Y. McNair
Major: Electrical and Computer Engineering

CubeSat Cloud is a novel vision for a space based remote sensing network that

includes a collection of small satellites (including CubeSats), ground stations, and a

server, where a CubeSat is a miniaturized satellite with a volume of a 10x10x10 cm

cube and a weight of approximately 1 kg. The small form factor of CubeSats limits their processing and communication capabilities. Implemented and deployed CubeSats

have demonstrated about 1 GHz processing speed and 9.6 kbps communication speed.

A CubeSat in its current state can take hours to process a 100 MB image and more than

a day to downlink the same, which makes remote sensing impractical given the limited ground station access time of a CubeSat.

This dissertation designs an architecture and supporting networking protocols to

create CubeSat Cloud, a distributed processing, storage and communication framework

that will enable faster execution of remote sensing missions on CubeSat clusters. The

core components of CubeSat Cloud are CubeSat Distributed File System, CubeSat

MapMerge, and CubeSat Torrent. The CubeSat Distributed File System has been

created for distributing large amounts of data among the satellites in the cluster. Once the data is distributed, CubeSat MapMerge has been created to process the data

in parallel, thereby reducing the processing load for each CubeSat. Finally, CubeSat

Torrent has been created to downlink the data at each CubeSat to a distributed set of

ground stations, enabling faster asynchronous downloads. Ground stations send the


downlinked data to the server to reconstruct the original image and store it for later

retrieval.

Analysis of the proposed CubeSat Cloud architecture was performed using a

custom-designed simulator, called CubeNet, and an emulation testbed using Raspberry

Pi devices. Results show that for cluster sizes ranging from 5 to 25 small satellites, download speeds 4 to 22 times faster than that of a single CubeSat can be achieved when using CubeSat Cloud. These improvements are achieved at an almost negligible bandwidth and memory overhead (about 1%).


CHAPTER 1
INTRODUCTION

A CubeSat is a miniaturized satellite primarily used for university space research

[1]. It has a volume of exactly one litre, weighs no more than one kilogram and is built

using commercial off-the-shelf components [2]. Future satellite systems are envisioned

to be made up of clusters or constellations of smaller satellites like CubeSats operating alongside large monolithic satellites, together forming a distributed space network. However,

weight, volume, power and geometry constraints of CubeSats must be overcome in

order to provide the required processing, storage and communication capabilities. Figure

1-1 shows a picture of a CubeSat. A CubeSat has only about a 1 GHz processor, 1 GB of RAM, 32 - 64 GB of flash memory and a 9.6 kbps communication link [2] [3]. On the other hand, remote sensing missions like weather monitoring, flood monitoring

and volcanic activity monitoring require intensive processing or downlinking large

amounts of data. With its limited resources, a CubeSat can take hours to process one

remote sensing image and days to downlink the same [4] [5]. Thus, processing and

communication systems have become bottlenecks for employing CubeSats on remote

sensing missions.
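
For illustration, the following back-of-envelope Python sketch estimates the single-CubeSat downlink time using the nominal figures above (the 100 MB image size and 9.6 kbps link rate quoted in this dissertation; the roughly 25 minutes per day of ground station contact is the figure discussed in Chapter 2):

    # Back-of-envelope downlink estimate for a single CubeSat, using the
    # nominal figures quoted in this dissertation. Illustrative only.
    IMAGE_SIZE_BYTES = 100 * 10**6      # 100 MB remote sensing image
    LINK_RATE_BPS = 9600                # typical CubeSat downlink rate
    CONTACT_S_PER_DAY = 25 * 60         # ~25 minutes of contact per day

    link_seconds = IMAGE_SIZE_BYTES * 8 / LINK_RATE_BPS
    print(f"Continuous link time: {link_seconds / 3600:.1f} hours")
    print(f"At ~25 min/day of contact: {link_seconds / CONTACT_S_PER_DAY:.0f} days")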

The advantages of CubeSats are their low cost, the low round-trip time for communication with ground stations, and the ease of experimenting with them. The manufacturing cost of a

typical large satellite weighing about 1000 kg is on the order of hundreds of millions

of dollars [6] because all the components are custom made and need to be tested

extensively before launch. However, most of the components of a CubeSat are

commercial off-the-shelf (COTS) components; only the payload is custom designed from the ground up. Thus, CubeSats can be engineered at a price of about half a million to a few million dollars, orders of magnitude less than the cost of a typical large satellite [7] [3]. Moreover, CubeSats can be launched in groups on a single rocket.


Figure 1-1. CubeSat

Image courtesy of NASA. Picture by Paul Adams.

Large satellites are launched into geostationary Earth orbit (GEO), highly elliptical orbit or high Earth orbit (HEO), at altitudes of roughly 36,000 km to 50,000 km. As a result of the long distance between Earth and the satellite, the one-way signal propagation delay is about 120 ms and the round-trip time (RTT) is about 250 ms. CubeSats are launched into low Earth orbit (LEO), about 600 - 800 km from Earth. As a result, the RTT drops to about 10 ms, which enables better quality of service for applications like real-time tracking and voice, compared to the RTT of a GEO or HEO satellite.
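
These delay figures follow directly from the propagation distance, as the short sketch below shows (the altitudes are the round numbers above; processing and queuing delays, which account for the remainder of the quoted RTTs, are ignored):

    # One-way propagation delay is distance / speed of light; the RTT is
    # roughly twice that, plus processing overheads (ignored here).
    C_KM_PER_S = 3.0e5                  # speed of light in km/s

    def one_way_delay_ms(distance_km: float) -> float:
        return distance_km / C_KM_PER_S * 1000

    print(f"GEO (~36,000 km): {one_way_delay_ms(36000):.0f} ms one way")
    print(f"LEO (~700 km):    {one_way_delay_ms(700):.1f} ms one way")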

Finally, since a CubeSat mission costs half a million to a few million dollars and CubeSats can be launched in large numbers using a single rocket, mission failure is not fatal. Because failure is survivable and costs are low, new technologies can easily be inserted into an existing space network via CubeSats.

CubeSats have very limited resources to accomplish meaningful remote sensing

missions. A typical CubeSat has about a 1 GHz processor and 1 GB of RAM. As a result, the computational power of a CubeSat is not sufficient for executing


processing-intensive remote sensing missions. CubeSats use structurally simple, low-gain antennas like monopoles or dipoles and have a limited power budget of about 500 mW for the communication system. The typical communication data rate between a CubeSat

and a ground station is about 9.6 kbps [5]. As a result, large amounts of data cannot

be downloaded to ground stations in a reasonable amount of time. CubeSats have low

memory, processing, battery power and communication capabilities. Timing constraints

are too tight to have long communication windows. Each CubeSat is controlled

individually. Currently, there is no meaningful way of controlling multiple CubeSats

using a unified control mechanism. As a result, a single CubeSat cannot perform

processing- and communication-intensive remote sensing missions in a reasonable time.

1.1 CubeSat Cloud

In this work we propose CubeSat Cloud, a framework for distributed storage,

processing and communication of remote sensing data. We demonstrate that CubeSat

Cloud can store remote sensing data on a CubeSat cluster in a distributed fashion to

allow the possibility of distributed computation and communication, speeding up remote

sensing missions. For distributing remote sensing data, CubeSat Cloud uses the CubeSat Distributed File System. For distributed processing and communication, CubeSat Cloud uses CubeSat MapMerge and CubeSat Torrent, respectively. We reduce bandwidth and energy consumption through energy-efficient replication and linear block source coding.

In Chapter 2 we outline the evolution of CubeSat networks and present relevant

background in distributed satellite systems, storage systems and computing techniques.

In Chapter 3, we describe the architecture of the CubeSat Network, which consists of two segments, namely a space segment and a ground segment. The space segment is designed to be a CubeSat cluster with a radius of about 100 km. It consists of Sensor nodes and Worker nodes that are interconnected using high-speed communication links. Sensor CubeSats have a sensing subsystem and act as the Master of the cluster while executing remote sensing missions. Worker nodes are typical 1U CubeSats (10 x 10 x


10 cm cube) with standard subsystems. The ground segment is made up of a ground station server and several ground stations. Ground stations are connected to the server via the Internet and act as relays between the server and the CubeSats.

On top of the described network architecture, we build the CubeSat Cloud platform.

In Chapters 4 through 6, we describe the three core components of CubeSat Cloud, namely the CubeSat Distributed File System (CDFS), CubeSat MapMerge and CubeSat Torrent.

CubeSat Distributed File System is used for distributing the remote sensing data to the

nodes in the CubeSat cluster. Once the remote sensing data is distributed, CubeSat

MapMerge and CubeSat Torrent are used for processing and downlinking the remote

sensing data.

CDFS splits large remote sensing data files into chunks and distributes the chunks to the Worker nodes in the cluster. CDFS uses "Copy-On-Transmit" for creating replicas

with very low bandwidth and energy overhead. Source coding is used for reducing

the storage and bandwidth overhead for missions which require only downlinking of

remote sensing data. We demonstrate that CDFS can store data reliably, without loss

of any data, even if a limited number of CubeSats go offline. Distributing the remote

sensing data to nodes in the cluster allows the possibility of distributed processing and

distributed communication for speeding up the remote sensing missions.
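
The chunking step can be illustrated with a minimal Python sketch; the 1 MB chunk size and the round-robin placement below are illustrative assumptions, not the actual CDFS parameters or placement policy (which are described in Chapter 4):

    # Minimal sketch of splitting remote sensing data into chunks and
    # assigning them to Worker nodes. Chunk size and round-robin placement
    # are illustrative assumptions, not CDFS defaults.
    CHUNK_SIZE = 1 * 1024 * 1024        # hypothetical 1 MB chunks

    def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
        return [data[i:i + size] for i in range(0, len(data), size)]

    def assign_round_robin(chunks, workers):
        # Mapping kept by the Master: worker id -> list of chunk indices.
        assignment = {w: [] for w in workers}
        for idx in range(len(chunks)):
            assignment[workers[idx % len(workers)]].append(idx)
        return assignment

    image = bytes(10 * 1024 * 1024)     # stand-in for a 10 MB sensor image
    chunks = split_into_chunks(image)
    print(assign_round_robin(chunks, ["worker1", "worker2", "worker3"]))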

In Chapter 5, we describe the working of CubeSat MapMerge in detail. CubeSat

MapMerge is a distributed processing framework inspired by Google MapReduce [8].

Worker nodes process the chunks stored with them in parallel. Failures are detected

using the Heartbeat mechanism and failed executions are re-scheduled on other worker

nodes. We demonstrate that CubeSat MapMerge can speed up processing of large volumes of remote sensing data on CubeSat clusters by a factor of the cluster size (i.e.,

the number of CubeSats in the cluster) and is resilient to worker and communication link

failures.
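
The map-then-merge pattern can be sketched as follows, with a thread pool standing in for the Worker CubeSats; the bright-pixel count is a hypothetical per-chunk task, not a mission from this dissertation:

    # Minimal sketch of the MapMerge pattern: workers map over their chunks
    # independently and the partial results are merged at the end.
    from concurrent.futures import ThreadPoolExecutor

    def map_chunk(chunk: bytes) -> int:
        # Hypothetical per-chunk task: count pixels brighter than 200.
        return sum(1 for b in chunk if b > 200)

    def merge(partials) -> int:
        # Merge step: combine the per-chunk results into one answer.
        return sum(partials)

    chunks = [bytes([(i * 37) % 256] * 1024) for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(map_chunk, chunks))
    print("Bright pixels:", merge(partials))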


In Chapter 6 we explain the downlink process. Multiple raw or processed chunks

are downlinked in parallel to ground stations. Once a chunk is downlinked to a ground

station, it is forwarded to the ground station server. After receiving all the chunks, the Server uses them to reproduce the sensor image. We demonstrate that CubeSat Torrent

can speed up the downlinking of large files by a factor of the cluster size (number of

CubeSats in the cluster) and is resilient to worker and communication link failures.
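
Server-side reassembly can be sketched as follows; the class and ground station names are hypothetical, and the actual command and data flow is given in Chapter 6:

    # Minimal sketch of server-side reassembly: chunks arrive out of order
    # from different ground stations and the image is rebuilt once all of
    # them are present. Names are illustrative.
    class ImageReassembler:
        def __init__(self, total_chunks: int):
            self.total = total_chunks
            self.received = {}          # chunk index -> chunk bytes

        def on_chunk(self, index: int, data: bytes, station: str):
            self.received[index] = data # duplicate arrivals overwrite harmlessly
            print(f"chunk {index} via {station} ({len(self.received)}/{self.total})")

        def reconstruct(self) -> bytes:
            assert len(self.received) == self.total, "missing chunks"
            return b"".join(self.received[i] for i in range(self.total))

    r = ImageReassembler(total_chunks=3)
    r.on_chunk(2, b"...C", "gs-tokyo")
    r.on_chunk(0, b"A...", "gs-gainesville")
    r.on_chunk(1, b".B..", "gs-tartu")
    print(r.reconstruct())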

To test the performance of the system, we built a CubeSat Cloud simulation

framework and a CubeSat Cloud testbed for emulation. We describe the testbed and simulation setup in detail in Chapter 7. CubeSats are emulated using Raspberry

Pi mini-computers, while the terrestrial Server and ground stations are emulated

using standard desktop computers. CubeSat Cloud is written in the Python programming language using Twisted, an event-based asynchronous network programming framework.
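
To give a flavor of this event-driven style, here is a minimal, self-contained Twisted listener (an illustrative acknowledge-everything protocol, not the dissertation's actual protocol code; the port number is arbitrary):

    # Minimal Twisted example: an event-driven TCP server that acknowledges
    # whatever it receives. Illustrative only.
    from twisted.internet import protocol, reactor

    class AckProtocol(protocol.Protocol):
        def dataReceived(self, data):
            # Called by the reactor whenever bytes arrive on the connection.
            self.transport.write(b"ACK " + data)

    class AckFactory(protocol.Factory):
        def buildProtocol(self, addr):
            return AckProtocol()

    reactor.listenTCP(8007, AckFactory())   # arbitrary port
    reactor.run()                           # start the event loop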

Simulation results indicate that, for cluster sizes in the range of 5 - 25 CubeSats, CubeSat MapMerge and CubeSat Torrent together enable 4.75 - 23.15 times faster processing and downlinking of large remote sensing data sets compared to a single CubeSat. This speedup is achieved at an almost negligible bandwidth and memory overhead (about 1%). Emulation results from the CubeSat Cloud testbed agree with the simulation results and indicate that our proposed CubeSat Cloud can speed up remote sensing

missions by a factor of the size of the CubeSat cluster with minimal overhead, while

achieving asynchronous download with short communication windows.


CHAPTER 2
BACKGROUND

The CubeSat concept was initiated by Professor Twiggs as a teaching tool to

help students learn the process of developing, launching and operating satellites.

CubeSats are currently designed for low Earth orbits. They are well suited for distributed

sensing applications and low data rate communications applications. Unlike the large

monolithic satellites, CubeSats are built to a large degree using commercial off the shelf

components (COTS). Engineering CubeSats using COTS equipment and following

standards in design and development have shortened the development cycles and

reduced costs. CubeSats are typically launched and deployed using a mechanism called

P-POD [9], developed and built by Cal Poly. P-PODs are mounted to a launch vehicle

carrying CubeSats and deploy them once the proper signal is received from the launch

vehicle. The P-POD Mk III has a capacity of three 1U CubeSats: it can deploy three 1U CubeSats, or one 1U and one 2U, or a single 3U CubeSat.

CubeSats carry one or two scientific payloads, such as a magnetic field sensor, an image sensor or an ion concentration finder. Several companies and research institutes offer

regular launch opportunities in clusters of several cubes. CubeSat as a specification for constructing and deploying pico satellites accomplishes the following goals:

1. Encapsulation of the launcher-payload interface: The CubeSat standard eliminates a significant amount of managerial work and makes it easy to mate a piggyback satellite with its launcher.

2. Unification among payloads and launchers: Satellites adhering to the CubeSat standard can be interchanged quickly for one another, which enables utilization of launch opportunities on short notice.

3. Simplification of pico satellite infrastructure: The CubeSat standard makes it possible to design and produce an operational small satellite at a very low cost.

2.1 Remote Sensing

Acquiring information about an object without making physical contact with it is called remote sensing. Usually it refers to gathering information about the atmosphere and the Earth's surface using satellites. Remote sensing can be performed using either passive or


active sensors. Passive remote sensors use natural radiation reflected by the object

under observation. Film photography, infrared, charge-coupled devices and radiometers

are examples of passive remote sensors. Active remote sensors make use of a radiation

source and observe objects using the scattered or reflected radiation. Examples

of active remote sensors include RADAR and LiDAR. It is easy to collect the data

from inaccessible and dangerous places using remote sensing. Weather monitoring,

deforestation monitoring, glacial activity monitoring, volcano monitoring, flood and other

disaster monitoring are some examples of remote sensing applications. Each data point collected by remote sensors is typically anywhere between 10 MB and 100 MB. The resolution of a remote sensor varies from 1 m to 1000 m per pixel depending on the sensor. Remote sensing data is immutable: it does not change after acquisition.

With its given resources, a single CubeSat can take about 10 hours for processing

a remote sensing image and 2 days to downlink the same [5]. Execution of remote

sensing missions using CubeSats can be sped up by parallelizing the processing

and downlinking of the remote sensing images using a cluster of CubeSats.

2.2 Evolution of CubeSat Networks

Since the launch of the first CubeSat into space in 2003, CubeSat communication

networks have evolved in several ways. Very early CubeSats communicated only with their home ground station, as shown in Figure 2-1. These networks can

be classified as generation 1 CubeSat networks. A typical CubeSat in a 600 - 800 km orbit has a pass window of about 8 minutes and comes into contact with its ground station about 4 times a day, limiting the communication window to about 25 minutes per day. First generation CubeSats operated at speeds of 1.2 kbps, which limits the total downlink capacity to about 1.8 megabits (roughly 0.22 MB) per day. However, no CubeSat achieved even this capacity, due to various limitations including inefficient protocols, large amounts of beacon data, power constraints and unreliable on-board communication systems. As a


result, most missions collected only a modest amount of data, a few MB (<12 MB), over their whole lifetime [10].
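
The capacity figure follows from the link rate and the daily contact time; a quick check:

    # Generation 1 link budget check: 1.2 kbps over ~25 minutes of
    # contact per day, under ideal conditions.
    RATE_BPS = 1200
    CONTACT_S_PER_DAY = 25 * 60

    bits_per_day = RATE_BPS * CONTACT_S_PER_DAY
    print(f"{bits_per_day / 1e6:.1f} Mbit/day "
          f"(~{bits_per_day / 8 / 1e6:.2f} MB/day)")
    # -> 1.8 Mbit/day, i.e. about 0.22 MB/day.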

With the introduction of MoreDBs (Massive operations, recording, and experimentation

Database System) [11], CubeSat networks made the next significant step in their

evolution. MoreDBs is a system to manage all data generated by Cal Poly small

satellites. It is an attempt to consolidate all satellite information into a single, readily

accessible location to make data analysis more efficient. Using networks like MoreDBs,

mission controllers can collect beacons from their small satellites that were received by other amateur radio operators, as shown in Figure 2-1. This significantly increased the amount of small satellite health information available and also served to track the whereabouts of small satellites. These efforts brought CubeSat networks into generation

2.

However, MoreDBs has two significant limitations. First, the MoreDBs architecture requires mission-specific software to be developed and distributed to the ground station facilities. Second, any modification to the packet format requires an upgrade of software

at all the ground stations. This is cumbersome and error prone.

As a result of these limitations, the solution is not scalable to a large number of

CubeSats. In order to overcome these limitations, Space Systems Group (SSG) [12]

and Wireless And Mobile Lab [13] at University of Florida developed A Cloud Computing

Architecture for Spacecraft Telemetry Collection (T-C3) [14], a scalable and flexible

means of collecting the telemetry data. T-C3 is an effort to improve MoreDB and make it

a universal telemetry decoding solution solution for CubeSats. Instead of decoding the

received beacon at the amateur radio station directly, T-C3 forwards the beacon to the

T-C3 central server which fingerprints the beacon and decodes the beacon using that

satellites telemetry format, making it a much more scalable and flexible solution [14].

GENSO (Global Educational Network for Satellite Operations) [15] is the next

significant milestone in the evolution of CubeSat networks. GENSO was founded to


Figure 2-1. Generation 1 (a) and Generation 2 (b) CubeSat Networks

Image courtesy of Space Systems Group. Picture by Tzu Yu (Jimmy) Lin.

create a network of amateur radio stations around the world to support the small satellite

operations of various universities. GENSO has been designed as a distributed system

connected via the Internet, as shown in Figure 2-2. The satellite can communicate with its main base station through any available relay station. With a single ground station, a university can gather about 25 minutes of data from a CubeSat per day. Using the GENSO network, mission controllers can gather hours' worth of data per day by receiving data via hundreds of networked radio stations around the world. It also allows them to command their spacecraft from other ground stations.

GENSO and other similar efforts can be classified as generation 3 CubeSat networks.

GENSO plans to have a built-in database of all the satellites. This database can be

used to predict and automate the tracking of the satellites to collect the telemetry in

an efficient way. Once the data is downlinked, it is provided to the respective mission controllers.


Figure 2-2. Architecture of GENSO

2.2.1 Summary and Limitations of CubeSat Communications

Table 2-1 summarizes the data speeds and data downloads of CubeSat communication

systems. Typical characteristics of a CubeSat communication subsystem can be summarized as follows: a data rate of 9600 baud, a power rating of 500 mW with an efficiency of about 25%, and a total download of 12 MB achieved so far across 13 satellites over a period of 5 years. As one can see, communication is the primary bottleneck for emerging remote sensing missions. In order to improve the downlink speed we developed CubeSat Torrent, a distributed communications framework for


CubeSat clusters. This dissertation envisions the next generation of CubeSat networks: the distributed satellite system.

Table 2-1. CubeSat data speeds and downloads
Parameter       Min        Max        Average
Speed           1200 bps   38.4 kbps  9600 bps
Power           350 mW     1500 mW    500 mW
Frequency       433 MHz    900 MHz    NA
Total Download  320 KB     6.77 MB    0.5 - 5 MB

2.3 Distributed Satellite Systems

A distributed system is a collection of independent components that work together to perform a desired task and appear to the end user as a single coherent system. Examples of distributed systems include the World Wide Web (WWW), clusters, networks of workstations or embedded systems, and the Cell processor. These distributed systems are fueled by the availability of powerful, low-cost microprocessors and high-speed communication technologies like the Local Area Network (LAN). As the price-to-performance ratio of microprocessors drops and the speed of communication networks increases, distributed computing systems attain a much better price-performance ratio than a single large centralized system.

As more and more CubeSats are launched, it is becoming apparent that some

space research needs may be better met by a group of small satellites, rather than by

a single large satellite. This is akin to the paradigm shift that happened in the computer

industry a few decades ago: shift of focus from large, expensive mainframes to using

smaller, cheaper, more adaptable sets of distributed computers for solving challenging

problems [16].

Distributed satellite systems have their own advantages and challenges. Due to the

advances in modern VLSI technology that create integrated circuits with lower power consumption and smaller size, and due to subsystems like the RelNAV Software Defined Radio [17]

that have enabled high speed satellite communication, distributed satellite systems


potentially have a much better price to performance ratio than a single large monolithic

satellite. Applications like weather monitoring and tracking are inherently distributed

in nature and may be better served by a distributed system than a centralized system.

Monolithic satellite architecture requires that each satellite must have all of the sensing,

processing, storage and communication peripherals on board. Distributed satellite

systems can share resources like sensing, memory, processing and communications,

as well as information. The multiplicity of sensors, storage devices, processors and

communication devices means there is no single point of failure. Critical information

can be duplicated allowing the system to continue to work even if some components

fail. Similarly, distributed satellite systems may have better availability than centralized

satellite systems, due to the ability of the system to work at reduced capacity when

components fail.

Finally, distributed satellite systems enjoy the advantage of incremental growth.

The functionality of a distributed satellite system can be gradually increased by adding more satellites as the need arises. Distributed small satellite systems rely on what

is called horizontal scaling, where one employs more satellites to serve an increased

need.

On the other hand, distributed satellite systems are more complex and difficult to build than monolithic satellite systems. Several challenges, such as orbit planning, resource management, communication and data management, and security, must be

addressed [16]. There is very little or no support for distributed data storage, processing

or communications for distributed satellite systems. Distributed satellite systems need a fast, low-power backbone network for data and control information exchange. The

backbone network must be reliable and should prevent problems such as message

loss, overloading and saturation. Distributed systems store data in several places, which provides more access points to critical information. As a result, additional security

measures need to be taken to safeguard data and systems. Finally, diagnosing problems


in distributed satellite systems and troubleshooting them requires detailed analysis of

each satellite and communication between them.

2.4 Classification of Distributed Satellite Systems

Constellation, Formation Flying and Swarm / Cluster are three main types of

distributed satellite systems [16]. A group of satellites in similar orbits with coordinated

ground coverage complementing each other is called a Constellation. Satellites in a

constellation do not have on-board control of their relative positions and are controlled

separately from ground control stations. Iridium and Teledesic are well known examples

of satellite constellations. A group of satellites with coordinated motion control, based

on their relative positions, to preserve the topology is called a Flying Formation. The position of a satellite in a flying formation is controlled by an onboard closed-loop mechanism.

Satellites of a flying formation work together to perform the function of a single, large,

virtual instrument. TICS, F6 and Orbital Express are well known examples of flying

formations. A group of satellites, without fixed absolute or relative positions, working

together to achieve a joint goal is called a Cluster or Swarm. More about satellite

clusters is presented in Chapter 3.

2.5 Related Work

2.5.1 Distributed Storage Systems

Below we present an overview of related work done in the fields of distributed

storage, processing and communications. We surveyed some well-known distributed file systems such as the Google File System (GFS) [18], the Hadoop Distributed File System (HDFS) [19], Coda [20] and Lustre [21]. Owing to their simplicity, fault-tolerant design and scalability, the architectures of GFS and HDFS are well suited for distributed storage on CubeSat clusters. Below we present an overview of the two, GFS and HDFS.

The Google File System is the major storage engine for large-scale data at Google. A brief summary of GFS is as follows. The architecture of the Google File System is shown in Figure 2-3. GFS consists of two types of components: a master and one or more chunk servers.


GFS functions similarly to a standard POSIX file library but is not POSIX compatible.

Each file is split into fixed size blocks called chunks. Each chunk is 64 MB in size, by

default. Chunks are stored on chunk servers. Metadata, such as the constituent chunks of a file, the file-to-chunk mapping and the chunk-to-chunk-server mapping, is stored with the master. Chunk servers store the actual data in the form of chunks. When clients interact with the Google File System, the large share of communication is between the client and the chunk servers. This keeps the master from becoming the bottleneck for transferring large files in and out of the Google File System.

A client machine communicates with the Google File System through the client class library, which translates open, read, write and delete file system calls into Google File System calls. The client library communicates with the master for metadata operations and with the chunk servers for actual data operations. The interface of the GFS client is very similar to that of a POSIX file system: no knowledge about distributed systems is required to work with GFS, since the GFS client abstracts away the distributed details. However, some chunk locality information is used when scheduling MapReduce jobs on nodes to improve efficiency. Each of the open, read, write and delete operations is implemented in the following way: the GFS client requests metadata from the master, including the file-to-chunk and chunk-to-chunk-server mappings, and then communicates with the chunk servers for the actual data. Once the operation is complete, the master's metadata is updated to reflect the new state of the file system.
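
This read path can be sketched in a few lines of Python; all class and method names below are illustrative stand-ins, not the real GFS API:

    # Minimal sketch of a GFS-style read: metadata comes from the master,
    # bulk data comes directly from the chunk servers. Names are illustrative.
    class Master:
        def __init__(self):
            # file name -> ordered list of (chunk id, chunk server id)
            self.metadata = {"image.raw": [(0, "cs1"), (1, "cs2")]}

        def lookup(self, name):
            return self.metadata[name]

    class ChunkServer:
        def __init__(self, chunks):
            self.chunks = chunks        # chunk id -> chunk bytes

        def read(self, chunk_id):
            return self.chunks[chunk_id]

    servers = {"cs1": ChunkServer({0: b"hello "}),
               "cs2": ChunkServer({1: b"world"})}
    master = Master()

    def gfs_read(name):
        # Metadata from the master, then data straight from chunk servers.
        return b"".join(servers[sid].read(cid) for cid, sid in master.lookup(name))

    print(gfs_read("image.raw"))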

The Hadoop Distributed File System (HDFS) is an open-source implementation of the Google File System. Hadoop is written in the Java programming language. It can be

interfaced with C++, Python, Ruby and many other programming languages using its

Thrift [22] interface. It is designed for storing hundreds of gigabytes or even petabytes

of data and for fast streaming access to the application data. Similar to GFS, HDFS

supports write-once-read-many semantics on files.


Figure 2-3. Architectural overview of the Google File System


HDFS also uses a master/slave architecture. The architecture of the Hadoop Distributed File System is shown in Figure 2-4. In HDFS, the NameNode plays the role of the GFS master node. It controls the namespace and implements the access control mechanism for data stored in HDFS. DataNodes take care of managing the local storage hardware. In order to bring the cost of the implementation down, these nodes run an open source operating system, typically GNU/Linux. When a file is copied into HDFS, it is split into blocks and distributed to DataNodes. Fault tolerance of the stored data is achieved through replication; each chunk is replicated 3 times, by default. HDFS documentation is available on the project website [23].

There are several limitations of GFS and HDFS for storing remote sensing data

on CubeSat clusters. Unlike wired communication channels, wireless communication channels in space are unreliable; communication links break often, leading to network partitions, and GFS and HDFS are not partition tolerant. Nor are GFS and HDFS optimized for power consumption, while for CubeSat clusters power is a very scarce resource. Also, the cost of communication is significantly higher for wireless links compared

to wired links. GFS and HDFS are designed as generic data storage platforms. They

are not tailored for storing remote sensing data, and their generic design is too complex for CubeSat clusters. Using GFS or HDFS causes a lot of overhead in terms of processing,

memory, bandwidth and power. We designed the CubeSat Distributed File System to overcome the above-mentioned problems and tailored it for storing remote sensing data

on CubeSat clusters by using large chunk sizes and load balancing.

2.5.2 Distributed Computing Techniques

Distributed computing is a form of computing where processing is performed

simultaneously on many nodes. The key principle behind distributed computing is that most large problems can be divided into smaller problems, which can be solved concurrently. Cheap computing nodes are connected using a high-speed backbone network to form a cluster that executes the smaller problems. We surveyed distributed


Figure 2-4. Architectural overview of the Hadoop Distributed File System


computing techniques such as the Common Object Request Broker Architecture (CORBA),

Web services, Remote Procedure Call (RPC), Remote Method Invocation (RMI) and

MapReduce that are used for distributed processing on computing machines. Below we

present a brief overview of them.

The distributed objects technique involves distributed objects communicating via messages. The Common Object Request Broker Architecture (CORBA), Java Remote Method Invocation (RMI), IBM Websphere MQ, Apple's NSProxy, GNUstep, Microsoft's

Distributed Component Object Model (DCOM) and .Net are well known examples of

this model. Owing to its platform independence and interoperable nature, CORBA

programs can work together regardless of the programming languages used. Java RMI,

IBM Websphere MQ, Apple’s NSProxy, Microsoft’s DCOM and .Net are proprietary

technologies. They are not independent of the programming language and are not quite

versatile.

Web services are the means through which web-based applications operate over the HTTP protocol. Web services use the Simple Object Access Protocol (SOAP) for exchanging structured information between web components; JavaScript Object Notation (JSON) to exchange data between web services in a human-readable format; the Web Services Description Language (WSDL) to provide a machine-readable description of a web service; and Universal Description, Discovery and Integration (UDDI) for describing web services.

Message Passing Interface (MPI), Open Message Passing Interface (OpenMPI),

Open Multi Processing (OpenMP) and Parallel Virtual Machine (PVM) are the prevalent technologies in the message passing category. These technologies are used in

massively parallel applications and supercomputing when data needs to be distributed

and communicated efficiently.

Sockets are a popular option for client-server architectures, such as mail servers and web servers. Their availability on any system equipped with a TCP/IP stack makes them an attractive option. Traffic can easily be re-routed to different ports using secure shell (SSH), Secure Sockets Layer (SSL) or Virtual Private Network (VPN) connections.

MapReduce is a recent development in the field of distributed computing. Introduced by Google Inc. in 2004 [18], its design simplicity, fault tolerance and ease of implementation make it an attractive candidate for large scale distributed processing. It is based on the map and reduce primitives of functional languages like Lisp. MapReduce programs are highly parallelizable and thus can be used for large-scale data processing on large clusters of computing nodes. Companies like Google, Yahoo! and Facebook use MapReduce to process many terabytes of data on large clusters containing thousands of cheap computing machines. MapReduce performs large-scale distributed computation while hiding the complications of parallelization, data distribution, synchronization, locking, load balancing and fault tolerance.
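To make the map and reduce primitives concrete, the following minimal Python sketch (our own toy illustration, not code from [18]) counts word occurrences across a set of documents. A real MapReduce system runs the same three phases, map, shuffle and reduce, in parallel across a cluster.

    from collections import defaultdict

    def map_phase(document):
        # Map: emit a (key, value) pair for every word in a document.
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_phase(key, values):
        # Reduce: combine all values emitted for the same key.
        return (key, sum(values))

    def mapreduce(documents):
        # Shuffle: group intermediate values by key before reducing.
        groups = defaultdict(list)
        for doc in documents:
            for key, value in map_phase(doc):
                groups[key].append(value)
        return dict(reduce_phase(k, v) for k, v in groups.items())

    print(mapreduce(["remote sensing data", "sensing data downlink"]))
    # {'remote': 1, 'sensing': 2, 'data': 2, 'downlink': 1}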

We studied the advantages and disadvantages of the above mentioned distributed computing techniques in detail. None of them account for the salient and unique features of CubeSats and CubeSat clusters: power, memory and communication constraints, unreliable wireless communication, high cost of communication and the need for tight locality optimization of data storage and operations. Owing to its simplicity and fault tolerant design, MapReduce suits large scale distributed processing of remote sensing data on CubeSat clusters well. Based on MapReduce, and to overcome the above mentioned limitations, we designed CubeSat MapMerge and tailored it for processing remote sensing data on CubeSat clusters.


CHAPTER 3
NETWORK ARCHITECTURE OF CUBESAT CLOUD

The architecture of the CubeSat Network is shown in Figure 3-1. The CubeSat Network consists of a space segment and a ground segment. The space segment is a CubeSat Cluster, whose architecture is shown in Figure 3-2. A CubeSat cluster has a radius of about 25 km. It consists of Sensor nodes and Worker nodes inter-connected using high speed communication links. Worker nodes are CubeSats with storage, processing, communication and other standard subsystems. In addition to the standard subsystems, a Sensor CubeSat has a sensing subsystem. Sensor nodes act as the Master of the cluster while orchestrating remote sensing missions. The ground segment is composed of a Server and several ground stations. CubeSat to CubeSat communication links are short distance, reliable, directional, low power and high speed. CubeSat to ground station communication links are long distance, high power, low speed and unreliable. Each CubeSat is connected to a ground station. Ground stations are connected to the Server via the Internet and act as relays between the Server and the CubeSats.

Figure 3-1. Architecture of a CubeSat network


Figure 3-2. Architecture of a CubeSat cluster

3.1 Components of the CubeSat Network

3.1.1 Space Segment

A worker CubeSat is typically a 1U CubeSat with dimensions of 10 cm x 10 cm x 10 cm, a volume of exactly one litre and a mass of about one kilogram. However, it does not need to be a 1U CubeSat; it can be 2U, 3U or any other form factor. Worker CubeSats need to have storage, processing and communication subsystems. Other standard subsystems include the Satellite Bus, Electrical Power, Structural and Thermal, and Attitude Determination and Control subsystems. A worker CubeSat has about a 1 GHz processor, 1 GB of RAM and 32 - 64 GB of flash storage. CubeSat to ground station communication speed is about 9.6 kbps. Sensor CubeSats have a sensing module in addition to the above mentioned subsystems. Figure 3-3 shows a blown-up view of the ESTCube-1 CubeSat and its various subsystems.

A Sensor node is equipped with sensing hardware and performs the sensing (taking an image or doing a radar scan). While orchestrating a mission, a Sensor node acts as the Master node for the CubeSat Cluster. When not orchestrating a mission, a Sensor node performs the role of a worker node.

Figure 3-3. A blown-up picture of the ESTCube-1 CubeSat, showing its subsystems
Image courtesy of University of Tartu. Picture by Andreas Valdmann.

The Master node is the primary center for receiving commands from the server and issuing subcommands to worker CubeSats in the

cluster. It keeps track of all metadata related to the mission, including the list of participating nodes, their resource capabilities, map jobs, merge jobs, downlink jobs and their status. It keeps track of all the resources available in the cluster and their state, and tracks the available resources on each node. It is also responsible for making scheduling decisions, i.e., which job needs to be scheduled on which node and when. Worker nodes have the limited role of executing the processing and downlinking jobs assigned to them by the Master node.

3.1.2 Ground Segment

A ground station or amateur radio station is an installation that enables communication with CubeSats. Figure 3-4 shows ground station control equipment and Figure 3-5 shows the highly directional Yagi antenna used for communicating with satellites. A ground station contains high gain directional antennas, like Yagi or parabolic dish antennas, communication equipment like modems, and computers to send, capture and analyse the data received. There are several types of amateur radio stations, including fixed ground stations, mobile stations, space stations and temporary field stations. Most radio stations are established for educational and recreational purposes, providing technical expertise, skills and volunteer staffing, and promoting communications education for the public.

The ground station server is a dedicated computer system that is connected to ground stations through the Internet. It receives commands from the Administrator and uplinks the commands to CubeSats. Once a mission is executed, the resulting data is downlinked to the Server and is stored on its local storage disk. The Server acts as the command center for the CubeSat network. The Administrator issues commands to the Server, which then forwards the commands to the Master CubeSat through a ground station. The Server stores all the downlinked mission data and thus acts as the storage node for the data downlinked from the CubeSat cluster. Ground stations act as relays between the CubeSats and the Server.


Figure 3-4. Ground station

Image courtesy of Gator Amateur Radio Club. Image by Tzu Yu (Jimmy) Lin.

Ground stations downlink data from the CubeSats and send it to the Server. They also uplink commands and data from the Server to worker CubeSats, which forward them to the Master CubeSat.

3.2 System Communication

3.2.1 Cluster Communication

CubeSats are connected to each other through a high speed (>1 Mbps), low power backbone network. High gain directed antennas, such as patch antennas, or LASER links [24] are used for inter-satellite communication within the cluster. Vescent Photonics [25] is developing extremely small, low power optical communication modules for CubeSats.

CubeSats. There has been research on using tethers for low distance, high speed

communication between satellites. RelNav demonstrated a spacecraft subsystem call

38

Page 39: CUBESAT CLOUD: A FRAMEWORK FOR …ufdcimages.uflib.ufl.edu/UF/E0/04/61/39/00001/CHALLA_O.pdfCUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSING AND COMMUNICATION OF REMOTE

Figure 3-5. Ground station antenna

Image courtesy of Gator Amateur Radio Club. Image by Tzu Yu (Jimmy) Lin.

SWIFT Software Defined Radio (SDR) [26] [17] that will enable a flock of satellites.

SWIFT SDR subsystem demonstrated by RelNav provides provide following services:

• 1 Mbps inter-satellite communication link for data exchange between CubeSats.

• Relative position and orientation for formation flight.

• Cluster synchronization and timing for coordinated operations and coherent sensing.

3.2.2 Space Segment to Ground Segment Communication

CubeSat geometry prohibits the use of complex antennas [27]. As a result, CubeSats are connected to ground stations through simple antennas, like monopoles or dipoles. Coupled with stringent power constraints and distances on the order of 600 - 800 km, this results in low speed links between CubeSats and ground stations. The typical CubeSat to ground station speed is about 9.6 kbps [10].


3.2.3 Ground Segment Network Communication

Ground stations and the Server are connected via the Internet. The Internet provides a high speed (10 Mbps) and reliable wired communication medium between the Server and the ground stations. Power is not a constraint for the Server and ground stations as they are connected to the electrical grid.

3.3 CubeSat Cloud

We propose CubeSat Cloud, a framework for distributed storage, processing and communication of remote sensing data on CubeSat Clusters. CubeSat Cloud uses the CubeSat Distributed File System for distributed storage of remote sensing data on CubeSat Clusters. CubeSat MapMerge is the distributed processing framework used for processing remote sensing data stored in CDFS. CubeSat Torrent is the distributed communications framework used for downlinking raw or partially processed remote sensing data from CubeSat Clusters. Below we describe how remote sensing missions are executed using the CubeSat Cloud framework.

3.3.1 Storage, Processing and Communication of Remote Sensing Data on CubeSat Clusters

CubeSat Cloud, as a generic framework, can be used for storing, processing and downlinking remote sensing data from CubeSat Clusters. Once a remote sensing operation is performed, the obtained sensor data is stored on the CubeSat cluster using the CubeSat Distributed File System. After the data is stored on the cluster, it is processed using CubeSat MapMerge and the obtained results are downlinked using CubeSat Torrent. Figure 3-6 shows an overview of the CubeSat Cloud framework, consisting of the CubeSat Distributed File System, CubeSat MapMerge and CubeSat Torrent. Below is a detailed description of how a remote sensing mission is executed using CubeSat Cloud.

1. Server sends the SENSE and STORE command to the Master. Upon receiving the SENSE and STORE command, Master performs the remote sensing operation and stores the sensor data on the local file system. Server and Master do not need to have a direct communication link; the command is relayed through the ground station network and the space segment.


2. Master node then splits the file into chunks C1, C2, C3, . . . Cn. The size of each chunk is about 64 KB. Master node distributes the chunks to the worker nodes. The file to chunk mapping, chunk to worker mapping and other metadata are stored on the Master. Splitting the remote sensing data into chunks, distributing them and storing them on worker nodes is achieved using the CubeSat Distributed File System. Distributing the data across the worker nodes in the cluster makes it possible to process and downlink the data in a distributed fashion.

3. Server sends the PROCESS command to the Master. The Master CubeSat commands the Worker CubeSats to process their stored chunks to produce partial results. The obtained partial results are stored on the local file systems of the worker nodes.

4. Server sends the DOWNLINK command to the Master, which then commands the worker nodes in the cluster to downlink the processed chunks to ground stations. Downlinking the processed chunks to the Server is achieved through CubeSat Torrent.

5. Once the Server receives all the processed chunks, it stitches them into the full solution. Processing of chunks to produce partial results on Worker nodes and stitching of partial results into the complete solution on the Server constitute CubeSat MapMerge.
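The chunking in step 2 is straightforward to state precisely. The short Python sketch below (our illustration, not the flight software; the 64 KB chunk size and the round-robin assignment follow the description in this chapter) splits a sensor file into chunks and assigns each chunk to a worker.

    CHUNK_SIZE = 64 * 1024  # about 64 KB, the CDFS default

    def split_into_chunks(data, chunk_size=CHUNK_SIZE):
        # Split raw sensor data into fixed-size chunks (the last may be short).
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def assign_round_robin(num_chunks, workers):
        # Assign chunk ids to workers in round-robin order.
        return {cid: workers[cid % len(workers)] for cid in range(num_chunks)}

    data = bytes(300 * 1024)                     # a dummy 300 KB sensor file
    chunks = split_into_chunks(data)             # 5 chunks: 4 full + 1 partial
    print(assign_round_robin(len(chunks), ["W1", "W2", "W3"]))
    # {0: 'W1', 1: 'W2', 2: 'W3', 3: 'W1', 4: 'W2'}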

3.3.2 Source Coding, Storing and Downlinking of Remote Sensing Data on CubeSat Clusters

A large number of missions require only downlinking of the remote sensing data, without processing it. For these missions, worker nodes do not require access to the raw data. This is an opportunity to optimize CubeSat Torrent missions by improving the quality of service and reducing the storage overhead. Below we present in detail how we utilize source coding to improve the quality of service and reduce the storage overhead. Figure 3-7 shows an overview of how a downlink-only remote sensing mission is executed using CubeSat Cloud; the steps are described below in detail.

1. Server sends the SENSE, CODE and STORE command to the Master.

2. Upon receiving the SENSE, CODE and STORE command from the Server, Master performs the remote sensing operation and stores the data from the sensor on the local file system.

3. Master node splits the remote sensing data file into chunks C1, C2, C3 . . . Cn. The size of each chunk is about 64 KB. Then, based on the required redundancy, it creates coded chunks C1', C2', C3' . . . Cm', where m > n.

4. Master distributes the coded chunks C1', C2', C3' . . . Cm' to the Worker nodes. Master stores the metadata, which includes the file to chunk mapping, the chunk to Worker mapping and the chunk status. Splitting the remote sensing data into chunks, coding them, distributing them and storing them on Worker nodes is performed by the CubeSat Distributed File System.

5. Server then sends the DOWNLINK command to the Master, which then commands the Worker nodes in the cluster to downlink the coded chunks to ground stations. Downlinking the chunks to the Server is performed by CubeSat Torrent.

6. Once the Server receives n out of the m chunks, it stitches them into the original file. As long as n out of m chunks are available, the original data can still be recovered. Details about the performance analysis of source coding are discussed in Chapter 8.

Figure 3-6. Overview of CubeSat Cloud and its component frameworks
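The dissertation treats the (n, m) code abstractly; as a minimal illustration of the recovery property, the sketch below uses a single XOR parity chunk (so m = n + 1 and any one lost chunk can be rebuilt). A deployed system would use a stronger erasure code, such as Reed-Solomon, to tolerate more losses.

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(chunks):
        # Append one XOR parity chunk: m = n + 1 coded chunks tolerate one loss.
        parity = chunks[0]
        for c in chunks[1:]:
            parity = xor_bytes(parity, c)
        return chunks + [parity]

    def recover(received):
        # XOR of the n surviving chunks rebuilds the single missing data chunk.
        missing = received[0]
        for c in received[1:]:
            missing = xor_bytes(missing, c)
        return missing

    data_chunks = [b"C1..", b"C2..", b"C3.."]        # n = 3 equal-size chunks
    coded = encode(data_chunks)                      # m = 4 chunks for the workers
    survivors = [coded[0], coded[2], coded[3]]       # chunk C2 never arrived
    assert recover(survivors) == data_chunks[1]      # original data recovered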


Figure 3-7. CubeSat Cloud: Integration of CubeSat Distributed File System and CubeSat Torrent


CHAPTER 4
DISTRIBUTED STORAGE OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS

The CubeSat Distributed File System (CDFS) is built for storing large remote sensing files on small satellite clusters in a distributed fashion. While satisfying the goals of scalability, reliability and performance, CDFS is designed for CubeSat clusters, which use a wireless backbone network, are partition prone and have severe power and bandwidth constraints. CDFS meets the scalability, performance and reliability goals while adhering to the constraints posed by the harsh environment and limited resources. It is used as the storage layer for distributed processing and distributed communication on CubeSat clusters. In this chapter, we present its architecture, file system design and several optimizations.

4.1 Key Design Points

4.1.1 Need for Simple Design

A typical CubeSat has about 1 GHz of processing capability, 1 GB of RAM, 32 GB of flash storage, 1 Mbps inter-cluster communication speed, 9.6 kbps ground communication capability and 2 W of power generation capability [5]. For CubeSats, processing, bandwidth and battery power are scarce resources, so the system design needs to be simple.

4.1.2 Low Bandwidth Operation

The CubeSat network is built using long distance wireless links (10 km for inter-cluster links and 600 km for CubeSat to ground station links). As a result, the cost of communication is very high, and data and control traffic needs to be reduced as much as possible.

4.1.3 Network Partition Tolerant

The backbone medium of communication is wireless and the space environment is harsh. The high velocity of satellites in LEO (relative to ground stations) makes satellite to ground station link failures very common. The topology of a CubeSat cluster is also very dynamic, causing inter-satellite links to break frequently. Sometimes, nodes go into sleep mode to conserve power. All of the above factors can cause frequent


breaking of communication links. As a result, if a node is temporarily unreachable, the system should not treat it as a node failure. The system should be tolerant to temporary network failures and partitions.

4.1.4 Autonomous

Most of the time, individual CubeSats and the CubeSat cluster as a whole are inaccessible to human operators, so the software design should account for all failure scenarios. A reset mechanism should be provided at both the node and network level: if all the fault tolerance mechanisms fail, the system undergoes a reset and starts working again. The distributed file system should therefore be able to operate completely autonomously, without human intervention.

4.1.5 Data Integrity

Memory failures are fatal for satellite missions. Even though memory chips for satellites are radiation hardened, high energy cosmic rays can sometimes cause trouble. For example, the Mars rover Curiosity suffered a significant setback because of damage to the memory of its primary computer caused by a high-energy particle. Hence, data integrity must not be violated.

4.2 Shared Goals Between CDFS, GFS and HDFS

Along with the above design points, CDFS shares additional design points with GFS and HDFS, which are highlighted below.

4.2.1 Component Failures are Norm

Given a large number of CubeSats and communication links, failures are the norm rather than the exception. Therefore, constant monitoring, error detection, fault tolerance and automatic recovery must be integral to the system.

4.2.2 Small Number of Large Files

Files are huge by traditional standards: images and remote sensing data generated by satellites tend to be on the order of hundreds of megabytes.

46

Page 47: CUBESAT CLOUD: A FRAMEWORK FOR …ufdcimages.uflib.ufl.edu/UF/E0/04/61/39/00001/CHALLA_O.pdfCUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSING AND COMMUNICATION OF REMOTE

4.2.3 Immutable Files and Non-existent Random Read Writes

Random writes within a file are practically non-existent. Once written, files are only read, and often only sequentially. These access patterns are common for imaging and remote sensing missions and for programs like MapReduce that process this data and generate new data.

CDFS shares the goals of availability, performance, scalability and reliability with GFS and HDFS. Owing to its radically different operating environment, however, the system design points and constraints are very different for CDFS. GFS and HDFS were designed for non-power-constrained clusters of computers connected using high speed wired media. CDFS is meant for distributed data storage on CubeSat clusters, which use a wireless communication medium for exchanging data and have severe power and bandwidth constraints. The design of CDFS should be simple, consume very little bandwidth and operate autonomously without the need for human intervention. It should be tolerant to network partitions, temporary link failures and node failures, and should preserve the integrity of the stored data.

4.3 Architecture of CubeSat Distributed File System

Figure 4-1 shows the architecture of CDFS. A CDFS cluster consists of Sensor CubeSats and worker CubeSats. Sensor nodes are equipped with a sensing module and thus perform sensing. While orchestrating a mission, a Sensor node plays the role of the Master (M) node. Worker nodes aid the Master node in processing or downlinking large files. Here is how CDFS stores a file on the cluster. The Administrator issues a remote sensing command to the central server (as shown in Figure 4-1). The central server transmits the command to a relay ground station, which uplinks it to the Master CubeSat. Upon receiving the command, the Master CubeSat performs sensing, like taking an image or doing a radar scan. The sensing operation generates a large amount of data (about 100 MB), which is stored in a local file on the Master node.


Figure 4-1. Architecture of CubeSat Distributed File System


The Master node (M) splits this file into blocks called chunks and stores them on worker CubeSats. Each chunk is identified by a unique chunk id. For reliability, each chunk is replicated on multiple workers. By default, CDFS creates two replicas (a primary replica and a secondary replica), along with an implicit replica stored on the Master node, so in effect there are three replicas. Along with the implicit replicas, the Master CubeSat holds all metadata for the file system. Metadata includes the mapping from files to chunks, the location of these chunks on the various workers, the namespace and access control information. The workers store the actual data, keeping chunks as regular files in local flash memory. As shown in the figure, the cluster is organized as a tree with the Master node as the root.

4.3.1 File System Namespace

CDFS supports a traditional hierarchical file organization in which a user or an application can create directories and store files inside them. The file system namespace hierarchy is similar to that of Linux file systems [28]. The root directory is "/" and is empty by default. One can create, rename, relocate and remove files. CDFS supports the hidden files and directories concept in a way similar to Linux file systems. Hidden files or directories start with "." (period) and contain metadata, error detection and correction information, configuration information and other miscellaneous information required by CDFS. These hidden files are stored as regular files rather than distributed files, since they are very small and are used by the system locally. One can refer to the files stored on a server using the notation "cdfs://server/filepath", where filepath looks like "/directory1/directory2/ . . . /filename".
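As a small illustration, the "cdfs://server/filepath" notation can be split with Python's standard URL parser (the server name and path below are invented for the example):

    from urllib.parse import urlparse

    def parse_cdfs_path(url):
        # Split a cdfs:// reference into its server and file path parts.
        parsed = urlparse(url)
        assert parsed.scheme == "cdfs", "not a CDFS reference"
        return parsed.netloc, parsed.path

    server, path = parse_cdfs_path("cdfs://master1/missions/2013/image042.raw")
    print(server, path)   # master1 /missions/2013/image042.raw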

4.3.2 Heartbeats

Several problems can cause loss of data or loss of connectivity between the Master and worker nodes. Problems are diagnosed using Heartbeat messages. Once every 10 minutes, worker nodes send a Heartbeat message to the Master node. The Heartbeat message contains the worker's current status and problems, if any. The Master periodically inspects the received Heartbeat messages to detect problems and rectify them if possible. If a worker does not send a Heartbeat message within 10 minutes, the Master marks the node as a temporary failure. If there is no Heartbeat message from a worker within 30 minutes, the Master marks the worker as a permanent failure. When a node is marked as a temporary failure, the data chunks assigned to the worker are not replicated on other nodes; instead, the secondary replicas are marked as primary. After a permanent failure, the Master marks the node as dead. When a node contacts the Master after recovering from a permanent failure, the Master refreshes its metadata to reflect the change.
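The timeout policy above is easy to capture in code. The following sketch (our simplified model; the 10 and 30 minute thresholds are from the text) classifies a worker from the time since its last Heartbeat.

    import time

    TEMP_FAILURE_S = 10 * 60   # no Heartbeat for 10 minutes: temporary failure
    PERM_FAILURE_S = 30 * 60   # no Heartbeat for 30 minutes: permanent failure

    def classify_worker(last_heartbeat, now=None):
        # Return the worker state implied by its last Heartbeat timestamp.
        now = time.time() if now is None else now
        silence = now - last_heartbeat
        if silence >= PERM_FAILURE_S:
            return "permanent-failure"  # mark dead; refresh metadata on recovery
        if silence >= TEMP_FAILURE_S:
            return "temporary-failure"  # promote secondary replicas to primary
        return "alive"

    print(classify_worker(last_heartbeat=0, now=15 * 60))  # temporary-failure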

4.4 File Operations

CDFS has a simple interface. It supports the file operations create, write, read and delete. In the next sections we describe in detail what happens when each of these operations is performed.

4.4.1 Create a File

Once the Master performs a remote sensing operation (taking an image or doing a radar scan), it generates a huge amount of sensor data. Initially this data is stored in a local file on the Master node; the typical size of this file is about 100 MB. The Master stores this file on the CubeSat cluster using CDFS in order to perform distributed processing or distributed downlinking. The following actions are performed in sequence when a CDFS file is created by the Master node. Creation requires the filename and the chunk size as parameters. The chunk size is optional and defaults to 64 KB.

1. Master calculates the number of chunks based on the file size and chunk size (number of chunks = file size / chunk size).

2. Master generates chunk identifiers called chunk ids and assigns one to each chunk. A chunk id is immutable.

3. Master assigns the chunks to worker nodes. Each chunk is assigned to one worker node in a round robin fashion. The copy of the chunk stored at the selected node is called the primary replica of the chunk.


4. Master stores the above metadata (filename, number of chunks, chunk to chunk id mapping and chunk id to worker node mapping) in its permanent storage and communicates the same to the backup Master.
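A minimal sketch of the create operation follows (our illustration; the ceiling division and the metadata fields mirror the four steps above, while the function name and worker list are invented for the example).

    from itertools import count

    CHUNK_SIZE = 64 * 1024
    _next_chunk_id = count()   # immutable, monotonically increasing chunk ids

    def make_file_metadata(filename, file_size, workers, chunk_size=CHUNK_SIZE):
        # Build the metadata the Master records when a CDFS file is created.
        num_chunks = -(-file_size // chunk_size)     # ceiling division
        chunk_ids = [next(_next_chunk_id) for _ in range(num_chunks)]
        primary = {cid: workers[i % len(workers)]    # round-robin assignment
                   for i, cid in enumerate(chunk_ids)}
        return {"filename": filename, "num_chunks": num_chunks,
                "chunk_ids": chunk_ids, "primary_replica": primary}

    meta = make_file_metadata("image042.raw", 100 * 1024 * 1024, ["W1", "W2", "W3"])
    print(meta["num_chunks"])   # 1600 chunks for a 100 MB file with 64 KB chunks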

4.4.2 Writing to a File

The write operation is performed by the Master node when it wants to copy a local file (on the Master) to CDFS. Files in CDFS are immutable: they can be written only once, after they are created. The inputs for writing a file are the source filename on the Master and the destination filename on CDFS. The following actions happen in sequence when the Master writes a local file to a CDFS file.

1. For each chunk, Master performs the actions described in steps 2, 3, 4 and 5.

2. Master looks up the metadata of the destination file on CDFS to find the worker node responsible for storing the chunk.

3. Master determines the transmission path (from Master to the worker node) using a tree-based routing algorithm.

4. From the nodes on the transmission path (excluding the Master and the destination worker node), Master randomly picks a node to be the secondary replica of the chunk and notifies it.

5. Master transmits the chunk to the primary replica node. While the chunk is being transmitted to the primary replica node, the secondary replica node copies the chunk and stores it in its local storage.

6. After storing all the chunks on the cluster, Master commits the metadata to its memory and communicates the same to the Server.

4.4.3 Deleting a File

The following actions are performed in sequence when a file is deleted.

1. Administrator issues the delete file command to the Server.

2. Server uplinks the command to the Master CubeSat through a relay ground station.

3. Master node looks up the metadata for the file and sends the delete chunk command to all the primary and secondary replica nodes.


4. Once a worker deletes its chunks, it sends an ACK to the Master.

5. Once ACKs are received from all worker CubeSats, Master deletes the metadata for the file.

6. Master CubeSat sends a SUCCESS message to the Server through a relay ground station.

4.5 Enhancements and Optimizations

CDFS serves well as distributed data storage on CubeSat Clusters. However, CubeSats have stringent energy constraints and CubeSat clusters have severe bandwidth constraints, so there is a dire need to reduce energy and bandwidth consumption. Below we describe the methods we employ to reduce energy and bandwidth consumption.

4.5.1 Bandwidth and Energy Efficient Replication

To ensure the reliability of stored data, CDFS uses redundancy. Each chunk has three replicas stored on three different nodes, called replica nodes. But creating replicas consumes both energy and bandwidth, and for a CubeSat cluster both are precious. To reduce energy and bandwidth consumption, the Master node (the source node) can be used as the Super Replica Node (a node which stores replicas of all chunks). Since the Master node performs the sensing and initially has all the data, the implicit replicas on the Master node are created without any energy or bandwidth consumption. Using the Master node as a Super Replica Node essentially means that CDFS needs to create only two additional replicas. It also means that the Master node must be equipped with enough storage to hold all chunks, but this is a small cost compared to the energy and bandwidth saved. The data on the source node is accessed only if the other two replicas are unavailable, in order to conserve the power of the source node.

For the two additional replicas, any random selection of worker nodes will achieve reliability. But if the replica nodes are carefully selected, energy and bandwidth consumption can be significantly reduced. Consider the two scenarios A and B depicted in Figure 4-2. In scenario A, the chunk is replicated on nodes M (the Master node), A and B2. In scenario B, the chunk is replicated on nodes M, B and B1. The cost of communication (bandwidth and energy consumed) in the first scenario is 3 times the average link communication cost (from M to A and from M to B to B2). In the second scenario, the cost is only 2 times the average link communication cost (from M to B to B1). Storing a chunk on nodes that are on the same communication path, or on nodes located close to each other, yields the best energy and bandwidth efficiency. Exploiting this observation, we designed a novel method for providing reliability with low power and bandwidth consumption. This technique is called Copy-on-transmit.

Figure 4-2. Bandwidth and energy efficient replication

When the source node transmits data to a destination, it goes through multiple hops. Selected nodes on the communication path copy the data while it is being transmitted. This method is very convenient for replicating data in wireless networks without incurring additional energy or bandwidth consumption. Consider the scenarios shown in Figure 4-3. In all cases the source node M transmits data to the destination node Z. Below we describe how we replicate data using copy-on-transmit for different communication path lengths, for a replication factor of 3 (1 implicit replica and 2 explicit replicas).

4.5.1.1 Number of nodes on communication path = replication factor

In this case, we replicate the chunk on all nodes along the path, including the source and destination. When the chunk is being transmitted from Node M to Node Z through Node A, Node A makes a copy of the chunk and stores it in its memory. The chunk now has three replicas, one each at M, A and Z.

4.5.1.2 Number of nodes on communication path > replication factor

In this case, we replicate the chunk on the Master node M, the destination node Z and a random node on the path. When the chunk is being transmitted from Node M to Node Z through Nodes A, B, C, D and E, Node C makes a copy of the chunk and stores it in its memory. The chunk now has three replicas, one each at M, C and Z.

4.5.1.3 Number of nodes on communication path < replication factor

In this case, we replicate the data on all nodes on the communication path (Node M and Node Z) and on some additional nodes. This scenario has two sub-scenarios: (a) the destination node is not a leaf node (it has children) and (b) the destination is a leaf node (no children). These are discussed as Case 3(a) and Case 3(b) below.

Case 3(a) Destination node is not a leaf node: First we replicate the data on all nodes along the path (Node M and Node Z). To meet the replication requirement, the communication path is extended beyond the destination node Z to store the data on Node A. This ensures that the required number of replicas exists.

Case 3(b) Destination node is a leaf node: First we replicate the data on all nodes along the path (Node M and Node Z). To meet the replication requirement, one more replica of the chunk needs to be created. Master randomly selects another node A and stores the chunk on it, ensuring that the required number of replicas exists.

Figure 4-3. Copy on transmit
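The three cases can be folded into one placement rule. The sketch below is our simplification of the scheme in Figure 4-3: for long paths it keeps the source, the destination and a random intermediate hop, and for short paths it adds off-path nodes, standing in for the path-extension or random-pick step of Case 3.

    import random

    def choose_replica_nodes(path, replication_factor=3, all_nodes=()):
        # `path` runs from the Master (source) to the destination worker.
        if len(path) >= replication_factor:
            # Cases 1 and 2: source, destination and random middle hops suffice.
            extra = random.sample(path[1:-1], replication_factor - 2)
            return {path[0], path[-1], *extra}
        # Case 3: every node on the path copies; pick extra nodes off the path.
        replicas = set(path)
        candidates = [n for n in all_nodes if n not in replicas]
        replicas.update(random.sample(candidates, replication_factor - len(path)))
        return replicas

    print(choose_replica_nodes(["M", "A", "B", "C", "Z"]))   # e.g. {'M', 'B', 'Z'}
    print(choose_replica_nodes(["M", "Z"], all_nodes=["M", "Z", "A", "B"]))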

4.5.2 Load Balancing

The goal of load balancing is to distribute data to the nodes in the cluster so as to balance one or several criteria, such as storage, processing, communication or power consumption. When a file is created, the number of chunks assigned to a worker node is proportional to the value of LBF(node), where LBF is the load balancing function. Below we explain how the load balancing function is determined for uniform storage, proportional storage and several other criteria. A custom load balancing function can be used to perform load balancing according to the user's wishes. However, it needs to be noted that distributing data for uniform storage might result in uneven load balancing for processing or communication, and vice versa. N is the total number of


worker nodes in the cluster and LBF is the load balancing function. The following load balancing functions are available in CDFS:

• Uniform storage / processing / communication per node: LBF(node) = 1 / N

• In proportion to the storage capacity of the node: LBF(node) = Storage capacity of the node / Total storage capacity of the cluster

• In proportion to the processing capacity of the node: LBF(node) = Processing power of the node / Total processing power of the cluster

• In proportion to the communication capacity of the node: LBF(node) = Communication speed of the node / Total communication speed of the cluster

• In proportion to the power generation capacity of the node: LBF(node) = Power generation capability of the node / Total power generation capability of the cluster

• Hybrid: LBF(node) = a * LBF_storage(node) + b * LBF_processing(node) + c * LBF_communication(node) + d * LBF_power(node), where a, b, c and d are normalized proportion coefficients and a + b + c + d = 1.

For missions that are processing intensive, it is desirable that the number of chunks stored on a node be proportional to the node's processing power. For communication intensive missions, it is desirable that the number of chunks stored on a node be proportional to the communication capabilities of the node. For missions that are both processing and communication intensive, the hybrid function can be used. Additionally, in order not to overload nodes, a cap on the number of chunks stored per node per file is suggested.
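A small sketch of the hybrid load balancing function follows (our illustration; the node attributes and coefficient values are made up for the example).

    def hybrid_lbf(node, cluster, a=0.4, b=0.3, c=0.2, d=0.1):
        # Fraction of a file's chunks assigned to `node` (a + b + c + d = 1).
        totals = {k: sum(n[k] for n in cluster)
                  for k in ("storage", "cpu", "comm", "power")}
        return (a * node["storage"] / totals["storage"]
                + b * node["cpu"] / totals["cpu"]
                + c * node["comm"] / totals["comm"]
                + d * node["power"] / totals["power"])

    cluster = [{"storage": 32, "cpu": 1.0, "comm": 1.0, "power": 2.0},
               {"storage": 64, "cpu": 1.0, "comm": 1.0, "power": 2.0}]
    shares = [hybrid_lbf(n, cluster) for n in cluster]
    print(shares, sum(shares))   # shares differ only in the storage term; sum is 1.0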

4.5.3 Chunk Size and Granularity

Splitting files into a large number of chunks improves granularity. Small chunks ensure better storage balancing, especially for small files. However, as the number of chunks increases, so do the amount of metadata, the number of metadata operations and the number of control messages, which decreases system performance. To strike a balance between the advantages of large chunks and the advantages of granularity, we selected a chunk size of about 64 KB.


4.5.4 Fault Tolerance

CDFS is designed to be tolerant of temporary and permanent CubeSat failures, and its performance degrades gracefully with component, machine or link failures. A CubeSat cluster can contain up to about a hundred CubeSats, interconnected with roughly the same number of high speed wireless links. Because of the large number of components and the harsh space environment, some CubeSats or wireless links may face intermittent problems, and some may face fatal errors from which they cannot recover unless hard reset by a ground station. The source of a problem can be in the application, operating system, memory, connectors or networking. So failures should be treated as the norm rather than the exception. To avoid system downtime or corruption of data, the system should be designed to handle failures, and its performance should degrade gracefully when they occur. Below we discuss how we handle these errors when they come up.

4.5.5 Master Failure

The Master node stores metadata, which consists of the mapping between files and chunks and between chunks and worker nodes. If the Master node fails, the mission will fail. To avoid mission failure in case of Master failure, the metadata is written to the Master's non-volatile memory, such as flash, and the same is communicated to the Server. If the Master reboots because of a temporary failure, a new copy is started from the last known state stored in the Master's non-volatile memory. In case of Master failure, worker nodes wait until a new Master resumes.

4.5.6 Worker Failure

Worker nodes send Heartbeat messages to the Master once every 10 minutes. If a worker reports a fatal error, the Master marks the worker node as failed. If a worker does not send a Heartbeat message within 10 minutes, the Master marks the node as a temporary failure. If the Master does not receive a Heartbeat message within 30 minutes, it marks the worker as failed. Once the failed node comes back online, the Master's metadata is refreshed to reflect the change.

4.5.7 Chunk Corruption

The harsh space environment and cosmic rays lead to frequent memory corruption. One of the computer systems of the Mars rover Curiosity had a memory problem due to high energy particles, resulting in a major setback for the mission. Thus, ensuring the integrity of data stored on CDFS is of paramount importance. CDFS uses checksums of data for detecting bad data. Performing data integrity operations on an entire chunk is inefficient: if a chunk is found to be corrupt, discarding the whole chunk leads to a lot of wasted IO, and reading the whole chunk (64 KB) to verify its integrity requires a lot of time and memory. Thus, each chunk is split into blocks of 512 bytes. CDFS stores a CRC for each block of data and performs checksum validation at the block level. When a read operation is performed on a chunk, it is read block by block, and each block is verified for integrity by comparing the stored checksum with a newly computed checksum. This way, if one of the blocks is found to be corrupt, only that block is marked bad and can be read from another healthy replica of the chunk. Employing data integrity checks at the block level ensures that partial IO or downlinking done before the corruption was detected does not go to waste. It also increases the availability of data.
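The block-level checksum scheme can be sketched as follows (our illustration, using Python's zlib.crc32 as the CRC; the 512-byte block size is from the text, but the exact CRC used by CDFS is not specified here).

    import zlib

    BLOCK_SIZE = 512   # CDFS validates integrity per 512-byte block

    def block_checksums(chunk):
        # Compute one CRC per 512-byte block of a chunk.
        return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
                for i in range(0, len(chunk), BLOCK_SIZE)]

    def find_bad_blocks(chunk, stored_crcs):
        # Return indices of blocks whose CRC no longer matches.
        return [i for i, crc in enumerate(block_checksums(chunk))
                if crc != stored_crcs[i]]

    chunk = bytes(64 * 1024)                  # a healthy 64 KB chunk
    crcs = block_checksums(chunk)             # 128 CRCs stored with the chunk
    corrupted = chunk[:1000] + b"\xff" + chunk[1001:]   # flip one byte in block 1
    print(find_bad_blocks(corrupted, crcs))   # [1] -> re-read only this block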

4.5.8 Inter CubeSat Link Failure

Owing to the harsh space environment, communication links fail often. If a CubeSat to CubeSat link fails, the child node in the routing tree retries connecting to its parent. If link re-establishment is not successful, or the link quality is bad, the child node pings its neighbours, searches for a new parent node and rejoins the cluster.

4.5.9 Network Partitioning

Sometimes a single CubeSat, or several CubeSats, may get separated from the CubeSat cluster. This phenomenon is called network partitioning. In either case, the data stored on the separated nodes is retained and remains available for downlinking to the ground stations. Using the downlinked metadata, separated CubeSats can be contacted by the Server via ground stations to downlink their data.

4.6 Simulation Results

We simulated the CubeSat Distributed File System with a CubeSat cluster consisting of one Master node and 5 - 25 worker nodes. Each CubeSat has a processor clocked at 1 GHz, 1 GB of RAM, 32 GB of flash storage, a 1 Mbps inter-cluster communication link and a 9.6 kbps CubeSat to ground station data rate. Our simulation results indicate that the time to store a 100 MB file on a cluster of size 10 is about 12.96 minutes. Since the file storing time is only a few minutes, it is negligible compared to the file processing and file downlinking times, which are measured in hours.
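As a rough sanity check on this number (a back-of-the-envelope estimate of ours that ignores replication and protocol overheads): pushing 100 MB, i.e. 800 megabits, over a single 1 Mbps backbone link takes about 800 seconds, or roughly 13.3 minutes, so the reported 12.96 minutes is consistent with the store operation being limited by the Master's outgoing link.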

4.7 Summary of CubeSat Distributed File System

We built the CubeSat Distributed File System to store large files in a distributed fashion and thus enable distributed applications like CubeSat MapMerge [4] and CubeSat Torrent [5] on CubeSat Clusters. It treats component and system failures as the norm rather than the exception and is optimized for satellite images and remote sensing data, which are huge by nature. CDFS provides fault tolerance through constant monitoring and replication of crucial data, and performs automatic recovery.

In CubeSat Clusters, network bandwidth and power are scarce resources. A number of optimizations in our system are therefore targeted at reducing the amount of data and control messages sent across the network. Copy-on-transmit enables making replicas with little or no additional bandwidth or energy consumption. Failures are detected using the Heartbeat mechanism. CDFS has built-in load balancers for several use cases, like CubeSat MapMerge and CubeSat Torrent, and allows the use of user defined custom load balancers.


CHAPTER 5
DISTRIBUTED PROCESSING OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS

The processing power of a CubeSat is about 1 GHz. The lack of available power and of active cooling for the microprocessor further restricts the usable processing power. As a result, processing intensive remote sensing applications cannot be performed on individual CubeSats in a meaningful amount of time. Distributed computing offers a solution to this problem: by pooling the processing power of the individual CubeSats in a cluster, the processing of large remote sensing files can be sped up. CubeSat Cloud uses CubeSat MapMerge to process remote sensing data on CubeSat Clusters.

5.1 CubeSat MapMerge

CubeSat MapMerge is inspired by MapReduce and is tailored for CubeSat clusters. The Master node orchestrates CubeSat MapMerge: it commands the worker nodes to process the chunks stored with them. Worker nodes process the chunks and produce intermediate results. As soon as the workers process their chunks, they downlink the partial solutions to the Server. Once the Server has all the results, it stitches the intermediate solutions into the full solution. The Master node takes care of scheduling map tasks, monitoring them and re-executing failed tasks. The worker nodes execute the subtasks as directed by the Master. Figure 5-1 shows an overview of how an image can be processed using CubeSat MapMerge, explained briefly in the following steps.

1. Master node splits the image into chunks and distributes them to the worker nodes in the cluster using CDFS.

2. Worker nodes process the splits given to them to produce partial solutions and downlink the solutions to the Server.

3. Server stitches the downlinked partial solutions into the full solution.
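As a concrete illustration of a map task, the sketch below applies Sobel edge detection to image strips using the Scikit image processing library mentioned in Section 5.4 (the function names and the stitching-by-rows layout are our own simplifications; CubeSat MapMerge itself is described in [4]).

    import numpy as np
    from skimage import filters

    def map_task(chunk):
        # Map: process one image chunk on a worker (here, Sobel edge detection).
        return filters.sobel(chunk)

    def merge(partials):
        # Merge: the Server stitches downlinked partial results back together.
        return np.vstack(partials)

    image = np.random.rand(256, 256)            # a dummy remote sensing image
    chunks = np.array_split(image, 4, axis=0)   # 4 strips, one per worker
    partials = [map_task(c) for c in chunks]    # done in parallel on the cluster
    print(merge(partials).shape)                # (256, 256)

Note that splitting before filtering introduces small artifacts at the strip boundaries; a careful implementation would overlap the strips slightly.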


Figure 5-1. Example of CubeSat MapMerge

5.2 Command and Data Flow during a CubeSat MapMerge Job

Figure 5-2 shows the flow of data and commands during a CubeSat MapMerge operation. When the Administrator issues a process command to the Server (e.g., process image.jpg), the following actions occur in the sequence noted.

1. Uplinking the command: Administrator issues a command to the Server. Server forwards the command to a ground station, which uplinks the command to the Master CubeSat (e.g., take an image of a particular area and process it).

2. Work assignment: Master node commands the worker nodes to process the chunks stored with them and downlink the results.

3. Map phase: Worker nodes process the chunks stored with them and store the results locally.

4. Downlinking the results: As and when a worker node processes a chunk, it downlinks the partial solution to a ground station. The ground station forwards the solution to the Server. Downlinking of results is achieved through CubeSat Torrent.

5. Merge phase: Once the Server receives all the partial solutions, it stitches them into the full solution.

Figure 5-2. Overview of execution of CubeSat MapMerge on CubeSat cluster

More details about CubeSat MapMerge are presented in the paper CubeSat

MapMerge [4].

5.3 Fault Tolerance, Failures, Granularity and Load Balancing

CubeSat MapMerge is tolerant to temporary and permanent CubeSat failures, and its performance degrades gracefully with component, machine or link failures. Metadata is replicated to avoid mission failure in case the Master node fails. Worker failures are detected using the Heartbeat mechanism. If a worker node fails, the tasks assigned to it are rescheduled on other worker nodes. Data is split into a large number of chunks to improve granularity and load balancing. The chunk size is selected to be about 64 KB in order to balance the advantages of granularity against the control traffic overhead.

5.3.1 Fault Tolerance

CubeSat MapMerge is designed to be tolerant to temporary and permanent CubeSat failures, and its performance degrades gracefully with component, machine or link failures. A CubeSat cluster can contain up to about a hundred CubeSats, interconnected with roughly the same number of high speed wireless links. Because of the large number of components and the harsh space environment, some CubeSats or wireless links may face intermittent problems, and some may face fatal errors from which they cannot recover unless hard reset by a ground station. So failures should be treated as the norm rather than the exception. To avoid system downtime or corruption of data, the system should be designed to handle failures, and its performance should degrade gracefully when they occur. Below we discuss how we handle these errors when they come up.

5.3.2 Master Failure

The Master node stores metadata, which consists of the mapping from map jobs to worker nodes and the state of the map jobs. To avoid mission failure in case the Master node fails, the metadata is periodically written to the Master's non-volatile memory, such as flash, and the same is communicated to the Server. If the Master reboots because of a temporary failure, a new copy is started from the last known state stored in the Master's non-volatile memory. If the Master cannot recover from the error, the MapMerge mission is aborted and the raw data can be downlinked to the Server.

5.3.3 Worker Failure

Worker nodes periodically send Heartbeat messages containing their status and problems, if any. If a worker reports a fatal error, the Master marks the worker node as failed. The processing task assigned to the worker node is reset back to its initial idle state and is scheduled on another worker node containing a replica of the chunk.

5.3.4 Task Granularity and Load Balancing

Splitting the data into a large number of pieces improves task granularity. CubeSats with a faster processor or special hardware like a GPU, DSP or FPGA can process an order of magnitude more map tasks than a standard CubeSat. Fine task granularity ensures better load balancing. However, as the number of chunks increases, so do the metadata operations and control messages, leading to a decrease in system performance. To balance the advantages of granularity against the control traffic overhead, the chunk size is selected to be about 64 KB.

5.4 Simulation Results

We simulated CubeSat MapMerge with a CubeSat cluster consisting of one Master node and 5 - 25 worker nodes. Each CubeSat has a processor clocked at 1 GHz, 1 GB of RAM, 32 GB of flash storage, a 1 Mbps inter-cluster communication link and a 9.6 kbps CubeSat to ground station data rate. We processed images using de-noising, entropy, peak detection, segmentation and Sobel edge detection algorithms, implemented with the Scikit Python image processing library. Our simulations indicate that CubeSat MapMerge, with cluster sizes in the range of 5 - 25 CubeSats, can process images about 4.8 - 23.4 times faster than an individual CubeSat. These results indicate that CubeSat MapMerge can speed up processing intensive remote sensing missions by a factor close to the size of the cluster. More detailed results are presented in Chapter 8.

5.5 Summary of CubeSat MapMerge

CubeSat MapMerge is a very simple, yet efficient, distributed processing framework for processing remote sensing images on CubeSat clusters. It treats node and link failures as the norm rather than the exception and is optimized for processing remote sensing images. It provides fault tolerance by constant monitoring, replication of crucial data, and fast, automatic recovery. With the Heartbeat mechanism to detect failures and redundant execution to recover from them, the design is fault tolerant. The choice of chunk size balances the advantages of granularity against the control traffic overhead. Load balancing takes nodes with multi-core processors, graphics processing units, digital signal processors and FPGAs into account and distributes the data accordingly. CubeSat MapMerge can speed up processing intensive remote sensing missions by a factor close to the size of the cluster.


CHAPTER 6
DISTRIBUTED COMMUNICATION OF REMOTE SENSING IMAGES FROM CUBESAT CLUSTERS

Due to stringent space constraints, CubeSats typically use monopole, dipole and turnstile antennas. As a result, a typical CubeSat to ground station link has a data rate of 9.6 kbps. Low speed data communication is one of the major bottlenecks for remote sensing missions that require downlinking large amounts of data. For emerging remote sensing missions, the communication bottleneck poses a severe threat, as connectivity with ground stations is very limited and intermittent, and comes at a very high price. As a result, data intensive remote sensing applications cannot be performed using individual CubeSats in a meaningful amount of time. Distributed communication offers a solution to this problem: by pooling the communication resources of the individual CubeSats in a cluster, the downlinking of large remote sensing images can be sped up.
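To put the bottleneck in numbers (a back-of-the-envelope estimate of ours, using only the figures already given): downlinking a 100 MB image over a single 9.6 kbps link takes 100 x 8 x 10^6 / 9600, which is about 83,000 seconds, or roughly 23 hours of contact time; ten CubeSats downlinking distinct chunks in parallel would cut this to about 2.3 hours.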

We studied CubeSat communication protocols including AX.25 [29] and the CubeSat Space Protocol (CSP) [30]. All of these protocols are point-to-point and do not support any form of distributed communication for faster downloading of large data files like images or videos. Currently there are no protocols for downloading data from CubeSat clusters in a distributed fashion. So we designed CubeSat Torrent, based on the Torrent communication protocol, to speed up remote sensing missions that require downlinking large amounts of data. CubeSat Cloud uses CubeSat Torrent for distributed downlinking of remote sensing data from CubeSat Clusters.

6.1 CubeSat Torrent

CubeSat Torrent [5] is a distributed communications framework inspired by Torrent [31]. CubeSat Torrent works in the following way. The Master node plays the role of the tracker: it keeps track of all the worker nodes in the cluster and their available downlink capacity. When the Server requests a file to be downlinked, the Master node commands the worker nodes in the cluster to downlink the chunks or partial solutions stored with them. Worker nodes simultaneously downlink chunks to various ground stations. The ground stations forward the chunks to the Server. Once the Server receives all the chunks, it stitches them together to regenerate the original file. Figure 6-1 shows an overview of how CubeSat Torrent works.

6.2 Command and Data Flow During a Torrent Session

1. Uplinking the command: The Server sends a file downlink command to the ground station, which uplinks it to the Master.

2. Distributing the subcommands: The Master issues subcommands to the worker nodes storing the chunks of the file, instructing them to downlink the chunks.

3. Downlinking the chunks: When a worker gets a chunk downlink command, it reads the chunk from its local file system and starts downlinking it to the connected ground station.

4. Notification: Upon successfully downlinking a chunk, the worker notifies the Master and continues to the next chunk. This process repeats until all chunks are downlinked.

5. Forwarding of chunks: Once a ground station receives a chunk, it forwards the chunk to the Server.

6. Reconstructing the original file: Once all the chunks are downlinked to the Server, the Server stitches the chunks back into the original image (see the sketch after this list).
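As a concrete illustration, the following minimal Python sketch walks through steps 1-6 above under simplifying assumptions: the class and method names (Master, Worker, downlink_chunk, and so on) are hypothetical, radio and Internet links are replaced by direct function calls, and chunks live in in-memory dictionaries rather than on flash.

# Hypothetical sketch of a CubeSat Torrent session; illustration only.

class Server(object):
    def __init__(self):
        self.chunks = {}                       # chunk_id -> bytes

    def reconstruct(self, n_chunks):
        # Step 6: stitch the chunks back together once all have arrived.
        assert len(self.chunks) == n_chunks
        return b"".join(self.chunks[i] for i in range(n_chunks))

class GroundStation(object):
    def __init__(self, server):
        self.server = server

    def receive(self, chunk_id, data):
        # Step 5: forward each received chunk to the Server.
        self.server.chunks[chunk_id] = data

class Worker(object):
    def __init__(self, chunks):
        self.chunks = chunks                   # chunks stored on this worker

    def downlink_chunk(self, chunk_id, ground_station):
        # Steps 3 and 4: read the chunk locally, downlink it, then notify.
        ground_station.receive(chunk_id, self.chunks[chunk_id])
        return chunk_id                        # notification to the Master

class Master(object):
    def __init__(self, placement):
        self.placement = placement             # worker -> list of chunk ids

    def downlink_file(self, stations):
        # Steps 1 and 2: on a downlink command, issue subcommands so each
        # worker downlinks its chunks to its connected ground station.
        for (worker, chunk_ids), gs in zip(self.placement.items(), stations):
            for cid in chunk_ids:
                worker.downlink_chunk(cid, gs)

# Usage: two workers, two ground stations, a four-chunk file.
server = Server()
stations = [GroundStation(server), GroundStation(server)]
w0 = Worker({0: b"AA", 1: b"BB"})
w1 = Worker({2: b"CC", 3: b"DD"})
Master({w0: [0, 1], w1: [2, 3]}).downlink_file(stations)
assert server.reconstruct(4) == b"AABBCCDD"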

6.3 Enhancements and Optimizations

We made several enhancements and optimizations to CubeSat Cloud to improve

performance. Below we present these enhancements, focusing on remote sensing missions that require only downlinking of data.

6.3.1 Improve Storage Reliability and Decrease Storage Overhead

CubeSat Cloud uses redundancy to provide reliability. Each chunk is replicated 3 times, so that even if a CubeSat fails or loses a chunk, the chunk is still available on two other CubeSats. Replication provides access to raw data at each worker node, so that data can be processed before it is downlinked to the ground station. However, it also leads to considerable communication and storage overhead: replicating each chunk 3 times leads to 200% storage overhead and 10% - 25% communication and energy overhead.

Figure 6-1. Overview of CubeSat Torrent


More details about the overhead resulting from replication are discussed in the paper Distributed Data Storage for CubeSat Clusters [32]. For remote sensing missions that only need to downlink the data, there is no advantage in having access to raw data, since the worker nodes do not process it. This presents an opportunity to reduce the storage and communication overhead. Once the Master performs sensing, it creates chunks C1, C2, C3, ..., Cn of raw data. Then, based on the required redundancy, it creates coded chunks C1', C2', C3', ..., Cm', where m > n. The Master node then distributes the coded chunks to the worker nodes. As long as n out of m chunks are downlinked, the original image can be recovered. A performance analysis of source coding is presented in Chapter 8.
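To make the coding step concrete, here is a toy Python sketch of systematic source coding with a single XOR parity chunk (m = n + 1), so that any n of the m coded chunks suffice to recover the image. The dissertation's linear block code generalizes this idea to larger m; the function names here are illustrative, not the actual implementation.

# Toy erasure code: n data chunks plus one XOR parity chunk (m = n + 1).
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    # Append one parity chunk: the XOR of all n equal-sized data chunks.
    return chunks + [reduce(xor_bytes, chunks)]

def decode(received, n):
    # Recover the n data chunks from any n of the m = n + 1 coded chunks.
    # `received` maps chunk index (0..n-1 data, index n = parity) to bytes.
    missing = [i for i in range(n) if i not in received]
    if missing:
        assert len(missing) == 1 and n in received, "one erasure is repairable"
        # XOR of everything received cancels out all but the missing chunk.
        received[missing[0]] = reduce(xor_bytes, list(received.values()))
    return [received[i] for i in range(n)]

data = [b"abcd", b"efgh", b"ijkl"]                    # n = 3 chunks
coded = encode(data)                                  # m = 4 coded chunks
survivors = {0: coded[0], 2: coded[2], 3: coded[3]}   # chunk 1 lost
assert decode(survivors, 3) == data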

6.3.2 Using Source Coding to Improve Downlink Time

Some worker nodes take an unusually long time to downlink a chunk. These nodes are called straggler nodes. There can be several reasons for this, such as a bad antenna, cache failures, scheduling of intensive background tasks, or a very low-speed link. If raw data is downlinked directly, the downlink is not complete until the last chunk reaches the Server. If a straggler node takes a very long time to downlink a chunk, then no matter how fast the other nodes downlink the rest of the chunks, the file downlink is still slowed down by the straggler's chunk. To mitigate the risk of a file downlink being slowed by stragglers, CubeSat Cloud uses duplicate downlinking of the last few chunks, as explained in CubeSat Torrent. However, for missions that require only downlinking of remote sensing data, the efficiency of this mitigation mechanism can be further improved by use of source coding. After the Master performs sensing, it creates chunks C1, C2, C3, ..., Cn of raw data. Then, based on the required redundancy, it creates coded chunks C1', C2', C3', ..., Cm', where m > n. The Master node then distributes these chunks to the worker nodes. When the Master receives the DOWNLINK command from the Server, it starts downlinking the chunks C1', C2', ... in the usual way until N - W chunks are downlinked to the Server, where N is the number of chunks and W is the number of worker nodes in the cluster. At that point, only W chunks need to be downlinked to the Server to complete the file download. If any of those W chunk downloads takes unusually long, the whole file download takes more time. In order to prevent a slowdown due to stragglers, the Master schedules more than W chunk downloads. As a result, even if straggler nodes slow down the downlinking of some chunks, the required number of chunks (N) will be downlinked to the Server at the highest possible speed. Once the Server receives N chunks, it undoes the source coding to recreate the original file from the downlinked chunks. A performance analysis of source coding is presented in Chapter 8.
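A sketch of this over-scheduling rule is shown below; the function and its arguments are hypothetical and only illustrate when the Master switches from one worker per chunk to duplicate assignments.

# Hypothetical tail-scheduling rule for coded chunks (illustration only).
def schedule_round(pending, idle_workers, arrived, N, W, copies_in_tail=2):
    # pending: coded chunk ids not yet received; arrived: chunks received.
    # Until N - W chunks have arrived, one worker per chunk is enough; in
    # the tail, assign each remaining chunk to several workers so that a
    # single straggler cannot stall the whole file.
    copies = copies_in_tail if (N - arrived) <= W else 1
    assignments, free = [], list(idle_workers)
    for chunk in pending:
        for _ in range(copies):
            if not free:
                return assignments
            assignments.append((chunk, free.pop()))
    return assignments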

6.3.3 Improving the Quality of Service for Real-time Traffic Applications Like VoIP

Real-time traffic applications like VoIP need high quality of service. Traditional methods provide better quality of service through the use of forward error correction and/or retransmission. Given that bandwidth is at a premium for CubeSat communications, large amounts of forward error correction data mean high overhead and thus less bandwidth for actual data, while retransmissions lead to increased downlink times. Other methods for providing quality of service include the use of multiple channels to send copies of packets, creating redundant transmissions. Although these methods are computationally less intensive, they do not ensure resilience to losses and they reduce the overall throughput of the system.

Consider a scenario where CubeSat Torrent is used for streaming data from the Master (sensor) node. As explained before, the Master node splits the raw data into chunks. Suppose that the data frame (D) for time t_i is split into chunks C1, C2, C3, ..., Cn. The Master uses these chunks to create linearly coded chunks C1', C2', C3', ..., Cm', where m > n. The Master forwards the coded packets to the worker nodes, which downlink them to the ground stations. In the process of this downlinking, some packets are lost. The rest of the packets reach the Server, which then stitches them back into D, the original data frame. The Server can recover D from the coded packets as long as it receives at least n of them, i.e., if at most r = m - n packets are lost in transmission. If the Master node notices that more than r packets are being lost on their way to the Server, it increases the redundancy by increasing m, and thus r. More details and results about our source coding technique are presented in the paper Robust Communications for CubeSat Cluster using Network Coding [33].
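A minimal sketch of this adaptation rule, with hypothetical names, is:

# Grow the number of coded packets m when observed losses exceed the
# tolerated budget r = m - n; shrink slowly when the channel is clean.
def adapt_redundancy(n, m, observed_losses, m_max):
    r = m - n                                  # losses tolerated per frame
    if observed_losses > r:
        m = min(m + (observed_losses - r), m_max)
    elif observed_losses < r // 2 and m > n + 1:
        m -= 1                                 # trim overhead
    return m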

6.4 Fault Tolerance, Failures, Granularity and Load Balancing

CubeSat Torrent incorporates several mechanisms to make itself tolerant to temporary and permanent CubeSat failures, and its performance degrades gracefully with communication link failures. Worker node failures are detected using the Heartbeat mechanism: if a worker node fails, the downlink tasks assigned to it are rescheduled on other worker nodes. The data chunk size is selected to be about 64 KB, balancing the advantages of fine granularity and good load balancing against metadata and control traffic overhead.

6.4.1 Fault Tolerance

CubeSat Torrent is designed to be tolerant to temporary and permanent CubeSat failures, and its performance degrades gracefully with machine or link failures. Failures are the norm rather than the exception. A cluster can contain up to a hundred nodes and is connected to roughly the same number of ground stations through long-distance wireless links. The quantity and quality of the links virtually guarantee that some links break intermittently and are not functional at any given time, and that some will not recover from their failures. Problems can be caused by human errors, CubeSat mobility, bad antennas, communication system bugs, memory failures, faulty connectors and other networking hardware. Such failures can result in unavailable communication links or can lead to data corruption. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be a part of the system.


Below we discuss how we meet these challenges and how we resolve the problems

when they occur.

6.4.2 Master Failure

The Master writes periodic checkpoints of all the master data structures. The Master node represents the single point of failure for CubeSat Torrent. In order to avoid mission failure in case the master node fails, metadata is periodically written to the master's nonvolatile memory, such as flash, and the same is communicated to the Server. If the master reboots because of a temporary failure, a new copy is started from the last known state stored in the master's nonvolatile memory. In case of a permanent master failure, data can still be downlinked from the worker nodes to the Server.
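A minimal checkpointing sketch in Python is given below, assuming the master's metadata fits in a small dictionary and its flash is visible as a local file; the file path and the metadata layout are assumptions for illustration.

# Sketch of master checkpointing to nonvolatile memory (illustration only).
import json, os, tempfile

CHECKPOINT = "/flash/master_checkpoint.json"   # assumed flash mount point

def write_checkpoint(metadata):
    # Write atomically: dump to a temporary file, then rename it over the
    # old checkpoint, so a reboot mid-write never leaves a torn file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CHECKPOINT))
    with os.fdopen(fd, "w") as f:
        json.dump(metadata, f)
    os.replace(tmp, CHECKPOINT)

def restore_checkpoint():
    # On reboot, restart from the last known state if one exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"files": {}, "chunk_locations": {}}   # fresh initial state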

6.4.3 Worker Failure

Workers periodically send a Heartbeat message to the Master node. The Heartbeat message contains the status of the worker and its problems, if any. If the Master does not receive a Heartbeat message from a worker within 30 minutes, it marks the worker as failed. Downlink tasks assigned to the worker are reset back to their initial idle state and scheduled on other worker nodes. If a worker loses its connection with a ground station, it retries with the same or a different ground station. If it cannot connect to any ground station within a certain amount of time, it signals failure to the Master, which marks the worker as a temporarily failed node. If the worker still cannot connect to any ground station, the Master marks it as a failed node and reschedules the downlinking job assigned to it to another worker.
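The detection side of the Heartbeat mechanism can be sketched as follows; the data structures are hypothetical, while the 30-minute timeout follows the text.

# Sketch of heartbeat-based worker failure detection on the Master.
import time

HEARTBEAT_TIMEOUT = 30 * 60                    # seconds, per the text

class FailureDetector(object):
    def __init__(self, workers):
        now = time.monotonic()
        self.last_seen = {w: now for w in workers}

    def on_heartbeat(self, worker, status):
        # Heartbeats carry worker status and problems, if any.
        self.last_seen[worker] = time.monotonic()

    def failed_workers(self):
        # Workers silent for longer than the timeout are marked failed; the
        # Master then resets their downlink tasks and reschedules them.
        now = time.monotonic()
        return [w for w, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]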

6.4.4 Task Granularity

The Master divides the file to be downlinked into C chunks. Ideally, C should be much larger than the number of worker machines: having each worker download many different chunks improves dynamic load balancing. However, as C increases, so do the amount of control traffic and the delays resulting from the exchange of control information. In order to balance the advantages of granularity against the overhead incurred by control traffic, the chunk size is chosen to be about 64 KB, which fixes C for a given file size.
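A short back-of-the-envelope check (a sketch, not framework code) shows that the 64 KB chunk size keeps C far above typical worker counts:

CHUNK_SIZE = 64 * 1024                         # bytes

def chunk_count(file_size_bytes):
    # Ceiling division: the last chunk may be partial.
    return -(-file_size_bytes // CHUNK_SIZE)

print(chunk_count(100 * 1024 * 1024))          # 1600 chunks for a 100 MB image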

6.4.5 Tail Effect and Backup Downloads

Some nodes take an unusually long time to downlink a chunk. These nodes are called stragglers. The reasons behind them could be a bad antenna or a very low-speed link. To mitigate the risk of a file downlink or uplink being slowed by stragglers, CubeSat Torrent uses backup downloads. When a file downlink or uplink operation is close to completion, the master schedules backup downlinking tasks for the remaining in-progress chunks. A chunk is marked as downlinked whenever either the primary or the backup worker finishes downlinking it. This is currently a design feature only and is not implemented.

6.5 Simulation Results and Summary of CubeSat Torrent

We simulated CubeSat Torrent on a CubeSat cluster consisting of one master, 5 - 25 workers and 5 - 25 ground stations. Each CubeSat has a processing speed of 1 GHz, 1 GB of RAM, 32 GB of flash storage and a ground communication data rate of 9.6 kbps. CubeSats in the cluster are connected to each other through 1 Mbps high-speed intra-cluster communication links. Our simulation results indicate that CubeSat Torrent, with cluster sizes in the range of 5 - 25 CubeSats, enables 4.71 - 22.93 times faster downlinking of remote sensing data compared to a single CubeSat. CubeSat Torrent can thus potentially speed up CubeSat missions requiring remote sensing data downlinking by a factor approaching the size of the cluster.

CubeSat Torrent demonstrates the essential qualities for downlinking large remote sensing data from CubeSat clusters: it is fault tolerant and scalable. It provides fault tolerance through constant monitoring, replication of crucial data, and fast, automatic recovery. The chunk size balances the overhead from control message traffic against the advantages of granularity, and checksumming is used to detect data corruption. The proposed design delivers the high aggregate throughput required for a variety of missions. We achieve this by splitting the file into chunks and downlinking them in parallel from workers to ground stations. The simplified design and minimal metadata operations result in very low overhead.


CHAPTER 7
SIMULATOR, EMULATOR AND PERFORMANCE ANALYSIS

For simulating and measuring the performance of CubeSat Cloud, we created a CubeSat Cloud simulator. For verifying the simulation results, we also created a CubeSat Cloud testbed consisting of 5 CubeSats. We used the Raspberry Pi mini single-board computer to emulate a CubeSat and desktop computers to emulate the Server and ground stations. Below is a detailed description of the CubeSat Cloud simulator and emulator.

7.1 Hardware and Software of Master and Worker CubeSats for Emulator

The Master and Workers are emulated using the Raspberry Pi, a mini single-board computer developed by the Raspberry Pi Foundation. Figure 7-1 shows the various components of the Raspberry Pi. It has a Broadcom BCM2835 system on a chip (SoC), 512 MB of RAM, and uses an SD card for booting and long-term storage. Debian and Arch Linux ARM distributions are available for the Raspberry Pi. Python is the primary programming language advocated for the platform, although support for BBC BASIC, C, and Perl is also available. Below are more detailed specifications of a Raspberry Pi Model B single-board computer.

• Processor: The Raspberry Pi runs on the Broadcom BCM2835 SoC, which includes an ARM1176JZF-S processor clocked at 700 MHz, a floating point unit, and a VideoCore IV GPU.

• Graphics: With the VideoCore GPU, the Raspberry Pi provides hardware-accelerated graphics capable of rendering 1 Gpixel/s.

• SDRAM: Model B comes with 512 MB of RAM, shared with the GPU and generally clocked at 400 MHz to 500 MHz.

• Storage: There is no onboard bootable flash disk; the board boots from a pluggable SD card. A minimum of 2 GB is required, but more than 4 GB is suggested.


• Power ratings: The Raspberry Pi draws about 300 mA (1 W) in idle mode and about 700 mA (2.2 W) when all peripherals are active.

• Ports: The Raspberry Pi comes with 10/100 BaseT Ethernet, HDMI and 2 USB ports. It is powered through a micro USB interface. Its size is roughly 9 x 6 x 2 cm.

• Low-level peripherals: It has 8 General Purpose IO (GPIO) pins, a UART, an I2C bus, an SPI bus with two chip selects, and I2S audio.

Figure 7-1. Raspberry Pi mini computer

Image courtesy of Matthew Murray

The specifications in terms of processing power and memory are very similar to those of a CubeSat, so we used the Raspberry Pi to emulate a CubeSat in the CubeSat Cloud testbed.


7.2 Hardware and Software of Server and Ground Station for Emulator

The Server and ground station hardware are implemented using Dell Optiplex 755 desktop computers. Below are the specifications of these machines:

• Processor: It comes with an Intel Core 2 Duo E8400 CPU (two cores) clocked at 3.00 GHz.

• Memory: It comes with 4 GiB of RAM.

• Graphics: It is powered by an ATI RV610 graphics card.

• OS Type: It is configured to run Ubuntu 12.04.3 LTS (Precise Pangolin), 32-bit version.

• Disk: It has a 240 GB disk for the OS and permanent storage.

We used the open source Ubuntu 12.04.3 Long Term Support (LTS) release as the base operating system for the Server and ground stations. For running Twisted applications, we used Python 2.7.3, as Python 3.x did not yet have full support for the Twisted framework. We used Twisted 11.1.0, installed from Ubuntu's python-twisted package.

7.3 Network Programming Frameworks

In order to develop the CubeSat Cloud framework, we researched the available network programming frameworks in Python, including Twisted, Eventlet, PyEv, Asyncore, Tornado and Concurrence. Below is a brief description of each of these frameworks.

7.3.1 Twisted

Twisted is considered one of the best reactor frameworks available in Python. It is somewhat complex and has a steep learning curve, but it is elegant and provides all the features required for developing asynchronous applications.

7.3.2 Eventlet

Eventlet was developed by Linden Lab. It is based on the greenlet framework and is geared towards asynchronous network applications. However, it is not PEP 8 compliant, its logging mechanism is not implemented to the full extent, and its API is somewhat inconsistent.


7.3.3 PyEv

PyEv is based on the libevent framework. It needs considerably more development to be considered a serious competitor to the other network programming frameworks, and no large companies appear to be using it at present.

7.3.4 Asyncore

Asyncore is part of the Python standard library and is a very low-level framework. There is not much support for high-level network operations, so a lot of boilerplate code needs to be written just to get started with network applications.

7.3.5 Tornado

Tornado is a very simple Python server meant for developing dynamic websites. It features an asynchronous HTTP client and a simple IOLoop. It is simple, but it does not provide the callback features required to be a candidate for implementing CubeSat Cloud.

7.3.6 Concurrence

Concurrence is a networking framework for creating massively concurrent network applications in Python. It exposes a high-level synchronous API on top of low-level asynchronous IO using libevent. It runs using either Stackless Python or Greenlets. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it is nearly as efficient as a real asynchronous server; it is similar to Eventlet in this way. The downside is that its API is quite different from Python's sockets and threading modules.

7.4 Twisted Framework

Of the frameworks we researched, Twisted was best suited for our job, since it provides handy features like callbacks and deferreds, along with strong community support. It is an asynchronous, event-based network programming framework, implemented in the Python programming language and licensed under the open source MIT license. Callbacks are the core of the Twisted framework: users write callbacks and register them to be called when events happen (a connection is made, a message is received, or a connection is lost).
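The minimal example below, written against Twisted's public protocol API, shows this callback style; the protocol itself (a server that just reports received bytes) is illustrative and is not part of CubeSat Cloud.

# Minimal Twisted example of the callback style (illustration only).
from twisted.internet import protocol, reactor

class ChunkReceiver(protocol.Protocol):
    def connectionMade(self):
        print("connection made")               # called when a peer connects

    def dataReceived(self, data):
        print("received %d bytes" % len(data)) # called on each message

    def connectionLost(self, reason):
        print("connection lost")               # called when the link drops

class ChunkReceiverFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return ChunkReceiver()

reactor.listenTCP(8000, ChunkReceiverFactory())
reactor.run()                                  # event loop dispatches callbacks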

7.5 Network Configuration

The CubeSat to ground station communication link is modelled with a data rate of 9600 bps and a delay of 2 ms with 200 us of normally distributed jitter. We modelled the intra-cluster communication links using the specifications of RelNav [26]: a data rate of 1 Mbps and a link communication delay of 0.1 ms. The packet loss rate was set at 0.3%, with a 25% loss correlation in order to simulate packet burst losses. We used Hierarchical Token Bucket (HTB) queuing and the tc networking tool on Linux to shape the network traffic to these requirements.
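As a hedged illustration, the shaping can be driven from Python with subprocess calls to tc; the interface name eth0 is an assumption, the commands must run as root, and this sketch mirrors (rather than reproduces) our actual configuration scripts.

# Sketch: shape a CubeSat-to-ground-station link with HTB + netem via tc.
import subprocess

def tc(args):
    subprocess.run(["tc"] + args.split(), check=True)   # requires root

tc("qdisc add dev eth0 root handle 1: htb default 10")
tc("class add dev eth0 parent 1: classid 1:10 htb rate 9600bit")
tc("qdisc add dev eth0 parent 1:10 handle 10: netem "
   "delay 2ms 200us distribution normal loss 0.3% 25%")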

7.6 CubeSat Cloud Emulator Setup

The CubeSat Cloud emulator consists of one Server, one Master, 5 worker nodes and 5 ground stations, and is shown in Figure 7-2. The Master and worker CubeSats are emulated using Raspberry Pis, since a CubeSat's footprint (processing power and RAM) matches that of the Raspberry Pi. The Server and ground stations are emulated using Dell Optiplex computers. All components are connected using a Gigabit Ethernet switch. CubeSat to CubeSat and CubeSat to ground station communication links are configured as described in Section 7.5.

7.7 CubeSat Cloud Simulator Setup

The CubeSat Cloud simulator consists of one Server, one Master, and 5 - 25 worker nodes and ground stations. The system architecture of the CubeSat Cloud simulator is shown in Figure 7-3. The simulation is run on the Dell Optiplex computer described in Section 7.2. The Master and worker CubeSats are simulated using the profiling results obtained from the emulator. Components communicate with each other using TCP/IP sockets on the localhost interface. CubeSat to CubeSat and CubeSat to ground station communication links are configured as described in Section 7.5. Simulation results are presented below.


Figure 7-2. CubeSat Cloud emulator


Figure 7-3. CubeSat Cloud simulator


7.8 CubeSat Reliability Model

Data reliability is achieved through replication. Each remote sensing image is split into chunks and distributed to the worker nodes, and each chunk is replicated on multiple CubeSats, so that if some CubeSats fail, the data is still available on other CubeSats. The number of replicas per chunk is primarily governed by the required availability of the data and the node failure rate. The availability of an image (A) is given by

A = (1 - f^R)^C × 100

where f is the probability of failure of a node, R is the number of replicas of each chunk and C is the number of chunks of the file. To find the CubeSat failure probability, we collected data about the lifetimes of the CubeSats launched so far. Figure 7-4 shows a summary of the lifetimes of launched CubeSats; more details can be obtained from "A Survey of Communication Sub-systems for Inter-satellite Linked Systems and CubeSat Missions" [34]. Using the above data, we calculated that the mean lifetime of a CubeSat is about 1204 days. Depending on the downlink speeds and the mission data size (about 100 MB), a remote sensing mission can take about 1 day, so the probability of failure of a CubeSat during a mission (f) is about 10^-3. The typical number of chunks per file (C) is about 1000. With a redundancy of 1 (2 replicas of each chunk) CDFS provides an availability of 99.98%, and with a redundancy of 2 (3 replicas) CDFS provides an availability of 99.9999%. We targeted an availability of 99.9999%, so each chunk needs to be replicated 3 times.
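The availability model can be evaluated directly; the following sketch reproduces the 3-replica case from the parameters quoted above (f = 10^-3, C = 1000).

# Availability A = (1 - f^R)^C x 100 for R replicas of each of C chunks.
def availability(f, R, C):
    return (1.0 - f ** R) ** C * 100.0

print(availability(1e-3, 3, 1000))             # ~99.9999% with 3 replicas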

7.9 Simulation and Emulation Results

7.9.1 Profiling Reading and Writing of Remote Sensing Data Chunks on Raspberry Pi

In order to build the simulation framework, we profiled the reading and writing of remote sensing data chunks on the flash storage of the Raspberry Pi single-board mini-computer. The profiling results are reported in Figure 7-5. The average reading and writing times for a chunk of size 64 KB are 4.91 ms and 15.66 ms, respectively.


Figure 7-4. Lifetimes of CubeSats

This shows that reading and writing a 100 MB file will take about 8 and 25 seconds, respectively. Compared to this, processing and downlinking an image take on the order of hours, so reading and writing times are negligible compared to the time taken for processing and downlinking a remote sensing image.

7.9.2 Processing, CubeSat to CubeSat and CubeSat to Ground Station Chunk Communication Time

We profiled the processing time, the CubeSat to CubeSat communication time and the CubeSat to ground station communication time. We processed images using de-noise, entropy, peak detection, segmentation and Sobel edge detection algorithms, using the scikit-image Python image processing library. The processing time is the average time taken by the Raspberry Pi to process a chunk using the above mentioned image processing algorithms. Communication links are simulated using the parameters specified in Section 7.5, Network Configuration. The profiling results are reported in Figure 7-6. The average CubeSat to CubeSat chunk communication time is


Figure 7-5. Read and write times of a chunk

about 1.19 seconds, the average processing time is 15.62 seconds, and the average CubeSat to ground station communication time is about 68.29 seconds. This result shows that the chunk processing time and the chunk communication time from CubeSat to ground station are more than an order of magnitude larger than the chunk communication time between CubeSats. As a result, distributing a file on the cluster is much faster than processing and downlinking the file. These results also indicate that processing an image of size 100 MB on a single CubeSat will take about 7 hours and downlinking the same will take about 30 hours. Hence we need to parallelize the processing and downlinking of remote sensing images.

7.9.3 Storing Remote Sensing Images using CubeSat Cloud

Figure 7-7 shows the time taken for storing an image on the CubeSat cluster (splitting the image into chunks and distributing the chunks to the worker nodes) for various cluster and image sizes.

Figure 7-6. CubeSat to CubeSat and CubeSat to ground station chunk communication profiling

For a cluster size of 1 (a single CubeSat), the file storing time is almost zero (11 seconds for a 100 MB file), since the file only needs to be split into chunks and does not need to be distributed over the network. The average storing time for a 100 MB file on a cluster of size 10 is about 12.96 minutes. Since the file storing time is only a few minutes, it is negligible compared to the file processing and downlinking times, which are measured in hours.

7.9.4 Processing Remote Sensing Images using CubeSat Cloud

Figure 7-8 shows the image processing times for various cluster and image sizes. We processed images using de-noise, entropy, peak detection, segmentation and Sobel edge detection algorithms, using the scikit-image Python image processing library. The processing time is the average time taken by CubeSat Cloud to process the remote sensing images using the above mentioned algorithms. For a cluster size of 1 (a single CubeSat), the file processing time is 448 minutes. The average processing time for a 100 MB file on clusters of size 10 and 25 is about 47 and 19 minutes, respectively. This results in a savings of 401 minutes when processing a file on a cluster of size 10, and 429 minutes on a cluster of size 25.


Figure 7-7. File distribution time for various file sizes and cluster sizes

CubeSat MapMerge reduces the processing time from about 8 hours to less than an hour, and is thus attractive for processing large remote sensing images.

Figure 7-8. File processing time for various file sizes and cluster sizes

7.9.5 Speedup and Efficiency of CubeSat MapMerge

We studied the variation of the speedup and efficiency of CubeSat MapMerge with cluster size. Speedup is defined as the ratio of the time taken by a single CubeSat to process an image to the time taken by the cluster to process the same image. Efficiency is defined as the ratio of the speedup of the cluster to the cluster size, expressed as a percentage. Figure 7-9 shows the variation of processing speedup with cluster size for large files (>10 MB): for cluster sizes of 10 and 25 the speedup is 9.54 and 23.40, respectively. Figure 7-10 shows the variation of processing efficiency with cluster size for large files (>10 MB): for cluster sizes of 10 and 25 the efficiency is 95.38% and 93.61%, respectively.
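These definitions can be checked against the measured times from Section 7.9.4; the helper functions below are an illustration, not framework code.

# Speedup and efficiency from measured processing times (100 MB image).
def speedup(t_single, t_cluster):
    return t_single / t_cluster

def efficiency(t_single, t_cluster, cluster_size):
    return speedup(t_single, t_cluster) / cluster_size * 100.0

print(speedup(448.0, 47.0))                    # ~9.5, consistent with Figure 7-9
print(efficiency(448.0, 47.0, 10))             # ~95%, consistent with Figure 7-10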

Figure 7-9. Speedup of CubeSat MapMerge

7.9.6 Downlinking Remote Sensing Images Using CubeSat Cloud

Figure 7-11 shows the image downlinking time for various cluster and file sizes. The downlinking time is the time taken by a CubeSat or a CubeSat cluster to downlink a remote sensing image to the Server. A single CubeSat takes 1 day and 6 hours of connectivity time to downlink a 100 MB file.


Figure 7-10. Efficiency of CubeSat MapMerge

In comparison, downlinking a 100 MB file on a cluster of size 10 needs only about 3 hours and 13 minutes of connectivity. This results in a savings of about 27 hours when downlinking a 100 MB file. CubeSat Torrent reduces the image downlinking time approximately by a factor of the size of the cluster.

7.9.7 Speedup and Efficiency of CubeSat Torrent

We studied the variation of the speedup and efficiency of CubeSat Torrent with cluster size. Speedup is defined as the ratio of the time taken by a single CubeSat to downlink an image to the time taken by the cluster to downlink the same image. Efficiency is defined as the ratio of the total effective data speed of the cluster to the total raw data speed of the cluster, expressed as a percentage. Figure 7-12 shows the variation of downlinking speedup with cluster size for large files (>10 MB): for cluster sizes of 10 and 25 the speedup is 9.35 and 22.93, respectively. Figure 7-13 shows the variation of downlinking


Figure 7-11. File downlinking time for various file sizes and cluster sizes

efficiency with cluster size for large files (>10 MB): for cluster sizes of 10 and 25 the efficiency is 71.95% and 70.59%, respectively.

7.9.8 Copy On Transmit Overhead

Figure 7-14 shows the bandwidth overhead due to replication using copy-on-transmit for various cluster sizes; the energy overhead is the same as the bandwidth overhead. The bandwidth overhead of copy-on-transmit for cluster sizes of 10 and 25 is 35.71% and 9.61%, respectively. Copy-on-transmit leads to 200% storage overhead, as it creates two explicit replicas.

7.9.9 Source Coding Overhead

Figure 7-15 shows the bandwidth overhead of source coding with single and double redundancy for various cluster sizes. With single redundancy, data can be recovered if one CubeSat fails; with double redundancy, data can be recovered even if two CubeSats fail. The bandwidth overhead of source coding for cluster sizes of 10 and 25 varies from about 5 - 25%, depending on the number of redundant chunks and the cluster size. The energy overhead is the same as the bandwidth overhead.


Figure 7-12. Speedup of CubeSat Torrent

7.9.10 Metadata and Control Traffic Overhead

Figure 7-16 shows the bandwidth overhead due to metadata and other control information for various cluster and file sizes. The bandwidth overhead is about 0.4 - 1%. The overhead percentage is mostly independent of the file size and varies primarily with the cluster size.

7.9.11 Comparison of CDFS with GFS and HDFS

Bandwidth and energy are very limited on a CubeSat cluster. CDFS uses several enhancements, such as using the Master node as a super replica node, copy-on-transmit, and linear block source coding, to reduce energy and bandwidth consumption. Figure 7-17 shows the bandwidth required by CDFS and by GFS (as well as HDFS) for writing a 100 MB file to the cluster: CDFS consumes about 35 - 40% less bandwidth than GFS and HDFS. Figure 7-18 shows the time taken by CDFS and by GFS (as well as HDFS)


Figure 7-13. Efficiency of CubeSat Torrent

for writing a 100 MB file to the cluster: CDFS writes are about 50% faster than GFS and HDFS because of the super replica node and the reduced bandwidth requirements. Figure 7-19 shows the energy required by CDFS and by GFS (as well as HDFS) for writing a 100 MB file to the cluster: CDFS consumes about 40% less energy than GFS and HDFS.

7.9.12 Simulator vs Emulator

Figure 7-20 shows the time required for writing, processing and downlinking a remote sensing image of size 100 MB. The simulator results are about 5 - 12% higher than the emulator results. This discrepancy might be attributed to delays in the simulation framework caused by the large number of threads running simultaneously.


Figure 7-14. Bandwidth overhead due to replication

Figure 7-15. Bandwidth overhead due to source coding


Figure 7-16. Bandwidth and energy overhead

Figure 7-17. Bandwidth consumption of CDFS vs GFS and HDFS


Figure 7-18. Write time of CDFS vs GFS and HDFS

Figure 7-19. Energy consumption of CDFS vs GFS and HDFS


Figure 7-20. Simulator vs emulator

7.10 Summary of Simulation Results

We simulated the CubeSat Cloud framework on the CubeSat Cloud testbed. The CubeSat Cloud framework was developed using the Python programming language. We simulated CubeSat Torrent on a CubeSat cluster consisting of one master, 5 - 25 workers and 5 - 25 ground stations. Each CubeSat has a processor running at 1 GHz, 1 GB of RAM, 32 GB of non-volatile memory, a 1 Mbps intra-cluster communication link and a 9.6 kbps ground station data rate. The Server and ground stations are connected to each other via the Internet through 10 Mbps communication links.

We simulated CubeSat Cloud with various cluster sizes. Our simulation results indicate that for cluster sizes in the range of 5 to 25 CubeSats, processing and downlinking of remote sensing images can be made 4.75 - 23.15 times faster than on a single CubeSat. The simulation results closely match the results from the testbed.


CHAPTER 8
SUMMARY AND FUTURE WORK

Weight, power and geometry constraints severely limit a CubeSat's processing and communication capabilities. A CubeSat has about 1 GHz of processing capability, 1 GB of RAM, 32 - 64 GB of flash memory and a CubeSat to ground station communication data rate of 9.6 kbps. As a result, processing and communication intensive remote sensing missions, which generate about 100 MB per sensing operation, cannot be completed in a meaningful amount of time. Processing a remote sensing image of size 100 MB takes about 8 hours, and downlinking it takes a day and a quarter with the current infrastructure. We consider the possibility of using distributed storage, processing and communication for faster execution of remote sensing missions.

We propose CubeSat Cloud, a framework for distributed storage, processing and communication of remote sensing data on CubeSat clusters. CubeSat Cloud is optimized for storing, processing and downlinking large remote sensing data, which is of the order of hundreds of megabytes. CubeSat Cloud uses the CubeSat Distributed File System for storing remote sensing data in a distributed fashion on the cluster: it splits the large remote sensing data into chunks and distributes them to the worker nodes. Metadata consisting of the file-to-chunk mapping and the chunk-to-worker-node mapping is stored with the Master node. For processing the distributed data, CubeSat Cloud uses CubeSat MapMerge: worker nodes process the chunks stored with them and store the results on their local file systems. Once the chunks are processed, the results are downlinked to the Server using CubeSat Torrent, and the Server stitches the partial solutions into the full solution. Component and link failures are treated as the norm rather than exceptions: failures are detected using the Heartbeat mechanism, and the system is tolerant to component and link failures. CubeSat Cloud implements several enhancements, including copy-on-transmit and linear block source coding, to reduce the consumption of scarce resources like power and bandwidth.


For simulating CubeSat Cloud, we created the CubeSat Cloud testbed. We emulated CubeSats using Raspberry Pis, and the testbed was written using python-twisted, an event-based asynchronous network programming framework. Simulation results indicate that CubeSat MapMerge and CubeSat Torrent, with cluster sizes in the range of 5 - 25 CubeSats, enable 4.75 - 23.15 times faster processing and downlinking of large remote sensing data compared to a single CubeSat. All of this speedup is achieved at an almost negligible bandwidth and memory overhead (about 1%). These results indicate that CubeSat Cloud can speed up remote sensing missions by a factor approaching the size of the cluster.

8.1 Future Work

Below is an overview of future work as an extension to CubeSat Cloud. Launching and deploying CubeSats into a CubeSat cluster and maintaining the cluster over long time periods need to be investigated. CubeSat Cloud was designed using the Python programming language in order to support rapid prototyping; a flight-ready system could be built using C++, and the network stack could be optimized for the characteristics of the CubeSat communication channel. A link layer communication protocol could be integrated into CubeSat Torrent to improve the efficiency of the downloads. From a CubeSat subsystems perspective, a lightweight, short-distance, high-speed CubeSat to CubeSat laser communication module would significantly enhance the efficiency of the system and lead to reduced energy consumption.


REFERENCES

[1] H. Heidt, J. Puig-Suari, A. Moore and R. Twiggs, "CubeSat: A New Generation of Picosatellite for Education and Industry Low-Cost Space Experimentation," Proceedings of the Utah State University Small Satellite Conference, Logan, UT, Citeseer, p. 12, 2001.

[2] Andrew E. Kalman (2010, Jan 15), "CubeSat Kit: Commercial Off-the-Shelf Components for CubeSats," Retrieved July 16, 2012, from http://www.cubesatkit.com/docs/datasheet/.

[3] J. Gozalvez, "Smartphones Sent Into Space [Mobile Radio]," Vehicular Technology Magazine, IEEE, vol. 8, no. 3, pp. 13–18, 2013.

[4] Obulapathi N. Challa and Janise Y. McNair, "Distributed Computing on CubeSat Clusters using MapReduce," iCubeSat, The Interplanetary CubeSat Workshop, 2012.

[5] Obulapathi N. Challa and Janise Y. McNair, "CubeSat Torrent: Torrent like Distributed Communications for CubeSat Satellite Clusters," Military Communications Conference, pp. 1–6, 2012.

[6] D.E. Koelle and R. Janovsky, "Development and Transportation Costs of Space Launch Systems," DGLR/CEAS European Air and Space Conference, 2007.

[7] Kirk Woellert, Pascale Ehrenfreund, Antonio J. Ricco and Henry Hertzfeld, "CubeSats: Cost-effective Science and Technology Platforms for Emerging and Developing Nations," Advances in Space Research, vol. 47, no. 4, pp. 663–684, 2011.

[8] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

[9] S. Lee and J. Puig-Suari, Coordination of Multiple CubeSats on the Dnepr Launch Vehicle, M.S. Thesis, California Polytechnic State University, December 2006.

[10] B. Klofas, J. Anderson and K. Leveque, "A Survey of CubeSat Communication Systems," 5th Annual CubeSat Workshop, Cal Poly, 2008.

[11] MoreDBs team at Cal Poly, "Massive Operations, Recording, and Experimentation Database System (2011, April 15)," Retrieved July 16, 2012, from http://moredbs.atl.calpoly.edu/, 2008.

[12] Norman G. Fitz-Coy, "Space Systems Group (SSG) (2008, Aug 26)," Retrieved July 16, 2012, from http://www2.mae.ufl.edu/ssg/.

[13] Janise Y. McNair, "Wireless and Mobile Systems Laboratory (WAM) (2008, Aug 26)," Retrieved July 16, 2012, from http://www.wam.ece.ufl.edu/.


[14] Tzu Yu Lin, Takashi Hiramatsu, Narendran Sivasubramanian and Norman G. Fitz-Coy, "T-C3: A Cloud Computing Architecture for Spacecraft Telemetry Collection," Retrieved July 16, 2012, from http://www.swampsat.com/tc3, 2011.

[15] GENSO Consortium, "Global Educational Network for Satellite Operations (2009, Jun 20)," Retrieved July 16, 2012, from http://www.genso.org/, 2009.

[16] R. Scrofano, P.R. Anderson, J.P. Seidel, J.D. Train, G.H. Wang, L.R. Abramowitz, J.A. Bannister and D. Borgeson, "Space-based Local Area Network," Military Communications Conference, 2009, pp. 1–7, 2009.

[17] Nestor Voronka, Tyrel Newton, Alan Chandler and Peter Gagnon, "Improving CubeSat Communications," CubeSat Developers Workshop, Cal Poly, 2013.

[18] Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung, "The Google File System," SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 29–43, 2003.

[19] K. Shvachko, Hairong Kuang, S. Radia and R. Chansler, "The Hadoop Distributed File System," IEEE Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10, 2010.

[20] Mahadev Satyanarayanan, J.J. Kistler, P. Kumar, M.E. Okasaki, E.H. Siegel and D.C. Steere, "Coda: A Highly Available File System for a Distributed Workstation Environment," IEEE Transactions on Computers, pp. 447–459, 1990.

[21] Sun Jian, Li Zhan-huai and Zhang Xiao, "The Performance Optimization of Lustre File System," 7th International Conference on Computer Science Education (ICCSE), pp. 214–217, 2012.

[22] Apache Software Foundation, "Apache Thrift," Retrieved July 16, 2012, from http://thrift.apache.org/, January 2012.

[23] Apache Software Foundation (2012, Feb 6), "HDFS: Hadoop Distributed File System," Retrieved July 16, 2012, from http://hadoop.apache.org/, June 2012.

[24] "Florida University SATellite V (FUNSAT V) Competition," Retrieved July 16, 2012, from https://vivo.ufl.edu/display/n958538186, 2009.

[25] "Tethers SDR (2012): Software Defined Radio (SWIFT SDR) Based Communication Downlinks for CubeSats," Retrieved July 16, 2012, from http://goo.gl/Q5fut, 2012.

[26] "RelNav: Relative Navigation, Timing and Data Communications for CubeSat Clusters," Retrieved July 16, 2012, from http://www.tethers.com/SpecSheets/RelNavSheet.pdf.

[27] Paul Muri, Obulapathi N. Challa and Janise Y. McNair, "Enhancing Small Satellite Communication Through Effective Antenna System Design," Military Communications Conference, 2010, pp. 347–352, 2010.


[28] R. Russell, D. Quinlan and C. Yeoh, "Filesystem Hierarchy Standard," Retrieved July 16, 2012, from http://refspecs.linuxfoundation.org/FHS_2.3/fhs-2.3.pdf, January 2003.

[29] A. William Beech, D. E. Nielsen and J. Taylor, "AX.25 Link Access Protocol for Amateur Packet Radio," Retrieved July 16, 2012, from http://www.tapr.org/pdf/AX25.2.2.pdf, 1998.

[30] "CubeSat Space Protocol: A Small Network-layer Delivery Protocol Designed for CubeSats," Retrieved July 16, 2012, from https://github.com/GomSpace/libcsp, April 2010.

[31] B. Cohen, "The BitTorrent Protocol Specification Standard," Retrieved July 16, 2012, from http://www.bittorrent.org/beps/bep_0003.html, January 2008.

[32] Obulapathi N. Challa and Janise Y. McNair, "Distributed Data Storage on CubeSat Clusters," Advances in Computing, pp. 36–49, 2013.

[33] Gokul Bhat, Obulapathi Challa, Paul Muri and Janise McNair, "Robust Communications for CubeSat Cluster using Network Coding," 3rd Interplanetary CubeSat Workshop, 2013.

[34] Paul Muri and Janise McNair, "A Survey of Communication Sub-systems for Intersatellite Linked Systems and CubeSat Missions," Journal of Communications, vol. 7, no. 4, pp. 290–308, 2012.


BIOGRAPHICAL SKETCH

Dr. Obulapathi N. Challa was born and brought up in India. He received a B.S. in Information and Communication Technology from DA-IICT in India, and an M.S. in Computer Engineering and a Ph.D. in Cloud Computing from the University of Florida. He worked as a Research Assistant with Dr. Janise McNair and was a part of the Wireless and Mobile Systems Laboratory and the Small Satellite Group at the University of Florida. His interests include Cloud Computing, Big Data, Small Satellites, Open Source and Distributed Systems.
