CUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSING AND COMMUNICATION OF REMOTE SENSING DATA ON CUBESAT CLUSTERS
By
OBULAPATHI NAYUDU CHALLA
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2013
© 2013 Obulapathi Nayudu Challa
I dedicate this to my family, my wife Sreevidya Inturi, my mother Rangamma Challa, my
father Ananthaiah Challa, my sister Sreelatha Chowdary Lingutla, my brothers-in-law
Ramesh Naidu Lingutla, Sreekanth Chowdary Inturi, my father-in-law Sreenivasulu
Chowdary Inturi, my mother-in-law Venkatalakshmi Inturi, my brothers Akshay Kumar
Anugu and Dheeraj Kota and my uncle Venkatanarayana Pattipaati, for all their love and
support.
ACKNOWLEDGMENTS
It has been a great experience being a student of Dr. Janise Y. McNair for the last
five and a half years. There was never a time that I did not feel cared for, thanks to her
constant support and guidance.
I would like to thank Dr. Xiaolin (Andy) Li, Dr. Norman G. Fitz-Coy
and Dr. Haniph A. Latchman for agreeing to serve on my committee and for
providing valuable feedback on my dissertation. Thanks to
the professors at the University of Florida: Dr. Patrick Oscar Boykin, Ms. Wenhsing Wu, Dr.
Ramakant Srivastava, Dr. Erik Sander, Dr. A. Antonio Arroyo, Dr. Jose A. B. Fortes, Dr.
John M. Shea, Dr. Greg Stitt, Dr. Sartaj Sahni and Dr. Shigang Chen, for teaching me
all that I know today. Thanks to the staff at the University of Florida: Ray E. McClure II, Jason
Kawaja, Shannon M Chillingworth, Cheryl Rhoden and Stephenie A. Sparkman, for
their patience with my countless requests and administrative questions. I would like to
take this opportunity to thank all my Wireless and Mobile Group colleagues, past and
present, for being there with me and helping me all along in one way or another. I would
like to thank Alexander Verbitski for his mentorship during my internship.
I would like to thank my teachers Sreedevi, Uma Kantha, Nalini Sreenivasan, K.
Ramakrishna, K. Bhaskar Naidu, Sambasiva Reddy, A Koteswar Rao, A. K. Rama
Rao, Dr. Vijay Kumar Chakka, Dr. Gautam Dutta and Dr. Prabhat Ranjan who greatly
influenced my life. The Internet and Open Source have made this world a true Vasudhaika
Kutumbam (one world family) for me. I would like to thank Linus Torvalds, creator of Linux; Richard
Matthew Stallman, founder of GNU; Vint Cerf, a father of the Internet; Tim Berners-Lee,
inventor of the World Wide Web; Guido van Rossum, creator of the Python programming
language; Satoshi Nakamoto, inventor of Bitcoin; Masashi Kishimoto, creator of Naruto;
Mark Shuttleworth, founder of Ubuntu and Tim O’Reilly, the founder of O’Reilly Media.
Life at the University of Florida has always been fun and exciting, thanks to the
wonderful friends around here: Dan Trevino, Dante Buckley, Gokul Bhat, Hrishikesh
Pendurkar, Jimmy (Tzu Yu) Lin, Karthik Talloju, Krishna Chaitanya, Kishore Yalamanchili,
Madhulika Dandina, Manu Rastogi, Paul Muri, Rakesh Chalasani, Ravi Shekhar, Seshu
Pria, Shruthi Venkatesh, Subhash Guttikonda, Udayan Kumar, Vaibhav Garg, Vijay
Bhaskar Reddy and Vivek Anand. I would like to thank Mr. Iqbal Qaiyumi, Dr. Shaheda
Qaiyumi, Mr. Jagat Desai and Mrs. Vatsala Desai for taking care of me like their son.
Thanks to my long-distance friends Bhargavi Vanga, Praveen Kumar, Radha Vummadi,
Uday Kumar, Uzumaki Naruto and Vijay Kumar, who have been close even when they
were far.
Lastly, I would like to thank my family: my wife Sreevidya Inturi, my mother
Rangamma Challa, my father Ananthaiah Challa, my sister Sreelatha Chowdary
Lingutla, my brothers-in-law Ramesh Naidu Lingutla, Sreekanth Chowdary Inturi, my
father-in-law Sreenivasulu Chowdary Inturi, my mother-in-law Venkatalakshmi Inturi,
my brothers Akshay Kumar Anugu and Dheeraj Kota and my uncle Venkatanarayana
Pattipaati. Their endless love and support throughout the years have meant more to me
than words can express. I dedicate this dissertation to them.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
CHAPTER
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1 CubeSat Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
   2.1 Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
   2.2 Evolution of CubeSat Networks . . . . . . . . . . . . . . . . . . . . . . . . 21
      2.2.1 Summary and Limitations of CubeSat Communications . . . . . . . . 24
   2.3 Distributed Satellite Systems . . . . . . . . . . . . . . . . . . . . . . . . . 25
   2.4 Classification of Distributed Satellite Systems . . . . . . . . . . . . . . . . 27
   2.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
      2.5.1 Distributed Storage Systems . . . . . . . . . . . . . . . . . . . . . . 27
      2.5.2 Distributed Computing Techniques . . . . . . . . . . . . . . . . . . . 30
3 NETWORK ARCHITECTURE OF CUBESAT CLOUD . . . . . . . . . . . . . . . 34
   3.1 Components of the CubeSat Network . . . . . . . . . . . . . . . . . . . . . 35
      3.1.1 Space Segment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
      3.1.2 Ground Segment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
   3.2 System Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
      3.2.1 Cluster Communication . . . . . . . . . . . . . . . . . . . . . . . . . 38
      3.2.2 Space Segment to Ground Segment Communication . . . . . . . . . 39
      3.2.3 Ground Segment Network Communication . . . . . . . . . . . . . . . 40
   3.3 CubeSat Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
      3.3.1 Storage, Processing and Communication of Remote Sensing Data on CubeSat Clusters . . 40
      3.3.2 Source Coding, Storing and Downlinking of Remote Sensing Data on CubeSat Clusters . . 41
4 DISTRIBUTED STORAGE OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS . . . 45
   4.1 Key Design Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
      4.1.1 Need for Simple Design . . . . . . . . . . . . . . . . . . . . . . . . . 45
      4.1.2 Low Bandwidth Operation . . . . . . . . . . . . . . . . . . . . . . . . 45
      4.1.3 Network Partition Tolerant . . . . . . . . . . . . . . . . . . . . . . . 45
      4.1.4 Autonomous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
      4.1.5 Data Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
   4.2 Shared Goals Between CDFS, GFS and HDFS . . . . . . . . . . . . . . . . 46
      4.2.1 Component Failures are Norm . . . . . . . . . . . . . . . . . . . . . 46
      4.2.2 Small Number of Large Files . . . . . . . . . . . . . . . . . . . . . . 46
      4.2.3 Immutable Files and Non-existent Random Read Writes . . . . . . . 47
   4.3 Architecture of CubeSat Distributed File System . . . . . . . . . . . . . . . 47
      4.3.1 File System Namespace . . . . . . . . . . . . . . . . . . . . . . . . . 49
      4.3.2 Heartbeats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
   4.4 File Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
      4.4.1 Create a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
      4.4.2 Writing to a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
      4.4.3 Deleting a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
   4.5 Enhancements and Optimizations . . . . . . . . . . . . . . . . . . . . . . . 52
      4.5.1 Bandwidth and Energy Efficient Replication . . . . . . . . . . . . . . 52
         4.5.1.1 Number of nodes on communication path = replication factor . . 54
         4.5.1.2 Number of nodes on communication path > replication factor . . 54
         4.5.1.3 Number of nodes on communication path < replication factor . . 54
      4.5.2 Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
      4.5.3 Chunk Size and Granularity . . . . . . . . . . . . . . . . . . . . . . . 56
      4.5.4 Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
      4.5.5 Master Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
      4.5.6 Worker Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
      4.5.7 Chunk Corruption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
      4.5.8 Inter CubeSat Link Failure . . . . . . . . . . . . . . . . . . . . . . . 58
      4.5.9 Network Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
   4.6 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
   4.7 Summary of CubeSat Distributed File System . . . . . . . . . . . . . . . . 59
5 DISTRIBUTED PROCESSING OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS . . . 60
   5.1 CubeSat MapMerge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
   5.2 Command and Data Flow during a CubeSat MapMerge Job . . . . . . . . . 61
   5.3 Fault Tolerance, Failures, Granularity and Load Balancing . . . . . . . . . . 63
      5.3.1 Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
      5.3.2 Master Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
      5.3.3 Worker Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
      5.3.4 Task Granularity and Load Balancing . . . . . . . . . . . . . . . . . . 64
   5.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
   5.5 Summary of CubeSat MapMerge . . . . . . . . . . . . . . . . . . . . . . . 65
6 DISTRIBUTED COMMUNICATION OF REMOTE SENSING IMAGES FROM CUBESAT CLUSTERS . . . 66
   6.1 CubeSat Torrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
   6.2 Command and Data Flow During a Torrent Session . . . . . . . . . . . . . 67
   6.3 Enhancements and Optimizations . . . . . . . . . . . . . . . . . . . . . . . 67
      6.3.1 Improve Storage Reliability and Decrease Storage Overhead . . . . . 67
      6.3.2 Using Source Coding to Improve Downlink Time . . . . . . . . . . . . 69
      6.3.3 Improving the Quality of Service for Real-time Traffic Applications Like VoIP . . 70
   6.4 Fault Tolerance, Failures, Granularity and Load Balancing . . . . . . . . . . 71
      6.4.1 Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
      6.4.2 Master Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
      6.4.3 Worker Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
      6.4.4 Task Granularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
      6.4.5 Tail Effect and Backup Downloads . . . . . . . . . . . . . . . . . . . 73
   6.5 Simulation Results and Summary of CubeSat Torrent . . . . . . . . . . . . 73
7 SIMULATOR, EMULATOR AND PERFORMANCE ANALYSIS . . . . . . . . . . . 75
   7.1 Hardware and Software of Master and Worker CubeSats for Emulator . . . 75
   7.2 Hardware and Software of Server and Ground Station for Emulator . . . . . 77
   7.3 Network Programming Frameworks . . . . . . . . . . . . . . . . . . . . . . 77
      7.3.1 Twisted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
      7.3.2 Eventlet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
      7.3.3 PyEv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
      7.3.4 Asyncore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
      7.3.5 Tornado . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
      7.3.6 Concurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
   7.4 Twisted Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
   7.5 Network Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
   7.6 CubeSat Cloud Emulator Setup . . . . . . . . . . . . . . . . . . . . . . . . 79
   7.7 CubeSat Cloud Simulator Setup . . . . . . . . . . . . . . . . . . . . . . . . 79
   7.8 CubeSat Reliability Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
   7.9 Simulation and Emulation Results . . . . . . . . . . . . . . . . . . . . . . . 82
      7.9.1 Profiling Reading and Writing of Remote Sensing Data Chunks on Raspberry Pi . . 82
      7.9.2 Processing, CubeSat to CubeSat and CubeSat to Ground Station Chunk Communication Time . . 83
      7.9.3 Storing Remote Sensing Images using CubeSat Cloud . . . . . . . . 84
      7.9.4 Processing Remote Sensing Images using CubeSat Cloud . . . . . . 85
      7.9.5 Speedup and Efficiency of CubeSat MapMerge . . . . . . . . . . . . 86
      7.9.6 Downlinking Remote Sensing Images Using CubeSat Cloud . . . . . 87
      7.9.7 Speedup and Efficiency of CubeSat Torrent . . . . . . . . . . . . . . 88
      7.9.8 Copy On Transmit Overhead . . . . . . . . . . . . . . . . . . . . . . 89
      7.9.9 Source Coding Overhead . . . . . . . . . . . . . . . . . . . . . . . . 89
      7.9.10 Metadata and Control Traffic Overhead . . . . . . . . . . . . . . . . 90
      7.9.11 Comparison of CDFS with GFS and HDFS . . . . . . . . . . . . . . 90
      7.9.12 Simulator vs Emulator . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.10 Summary of Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 95
8 SUMMARY AND FUTURE WORK . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
LIST OF TABLES
Table page
2-1 CubeSat data speeds and downloads . . . . . . . . . . . . . . . . . . . . . . . 25
LIST OF FIGURES
Figure page
1-1 CubeSat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2-1 Generations of CubeSat networks . . . . . . . . . . . . . . . . . . . . . . . . . 23
2-2 Architecture of GENSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2-3 Architectural overview of the Google File System . . . . . . . . . . . . . . . . . 29
2-4 Architectural overview of the Hadoop Distributed File System . . . . . . . . . . 31
3-1 Architecture of a CubeSat network . . . . . . . . . . . . . . . . . . . . . . . . . 34
3-2 Architecture of a CubeSat cluster . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3-3 A blown up picture of ESTCube-I CubeSat, showing its subsystems . . . . . . 36
3-4 Ground station . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3-5 Ground station antenna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3-6 Overview of CubeSat Cloud and its component frameworks . . . . . . . . . . . 42
3-7 Integration of CubeSat Distributed File System and CubeSat Torrent . . . . . . 44
4-1 Architecture of CubeSat Distributed File System . . . . . . . . . . . . . . . . . 48
4-2 Bandwidth and energy efficient replication . . . . . . . . . . . . . . . . . . . . . 53
4-3 Copy on transmit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5-1 Example of CubeSat MapMerge . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5-2 Overview of execution of CubeSat MapMerge on CubeSat cluster . . . . . . . 62
6-1 Overview of CubeSat Torrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7-1 Raspberry Pi mini computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7-2 CubeSat Cloud emulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7-3 CubeSat Cloud simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7-4 Lifetimes of CubeSats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7-5 Read and write times of a chunk . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7-6 CubeSat to CubeSat and CubeSat to ground station chunk communication profiling . . . 85
7-7 File distribution time for various file sizes and cluster sizes . . . . . . . . . . . . 86
7-8 File processing time for various file sizes and cluster sizes . . . . . . . . . . . . 86
7-9 Speedup of CubeSat MapMerge . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7-10 Efficiency of CubeSat MapMerge . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7-11 File downlinking time for various file sizes and cluster sizes . . . . . . . . . . . 89
7-12 Speedup of CubeSat Torrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7-13 Efficiency of CubeSat Torrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7-14 Bandwidth overhead due to replication . . . . . . . . . . . . . . . . . . . . . . . 92
7-15 Bandwidth overhead due to source coding . . . . . . . . . . . . . . . . . . . . . 92
7-16 Bandwidth and energy overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7-17 Bandwidth consumption of CDFS vs GFS and HDFS . . . . . . . . . . . . . . . 93
7-18 Write time of CDFS vs GFS and HDFS . . . . . . . . . . . . . . . . . . . . . . 94
7-19 Energy consumption of CDFS vs GFS and HDFS . . . . . . . . . . . . . . . . . 94
7-20 Simulator vs emulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
CUBESAT CLOUD: A FRAMEWORK FOR DISTRIBUTED STORAGE, PROCESSING AND COMMUNICATION OF REMOTE SENSING DATA ON CUBESAT CLUSTERS
By
Obulapathi Nayudu Challa
December 2013
Chair: Janise Y. McNair
Major: Electrical and Computer Engineering
CubeSat Cloud is a novel vision for a space based remote sensing network that
includes a collection of small satellites (including CubeSats), ground stations, and a
server, where a CubeSat is a miniaturized satellite with a volume of a 10x10x10 cm
cube and a weight of approximately 1 kg. The small form factor of CubeSats limits
their processing and communication capabilities. Implemented and deployed CubeSats
have demonstrated about 1 GHz processing speed and 9.6 kbps communication speed.
A CubeSat in its current state can take hours to process a 100 MB image and more than
a day to downlink the same, which prohibits remote sensing, considering the limitations
in ground station access time for a CubeSat.
This dissertation designs an architecture and supporting networking protocols to
create CubeSat Cloud, a distributed processing, storage and communication framework
that will enable faster execution of remote sensing missions on CubeSat clusters. The
core components of CubeSat Cloud are CubeSat Distributed File System, CubeSat
MapMerge, and CubeSat Torrent. The CubeSat Distributed File System has been
created for distributing large amounts of data among the satellites in the cluster. Once
the data is distributed, CubeSat MapMerge has been created to process the data
in parallel, thereby reducing the processing load for each CubeSat. Finally, CubeSat
Torrent has been created to downlink the data at each CubeSat to a distributed set of
ground stations, enabling faster asynchronous downloads. Ground stations send the
downlinked data to the server to reconstruct the original image and store it for later
retrieval.
Analysis of the proposed CubeSat Cloud architecture was performed using a
custom-designed simulator called CubeNet, and an emulation test bed using Raspberry
Pi devices. Results show that for cluster sizes ranging from 5 to 25 small satellites,
download speeds 4 to 22 times faster than that of a single CubeSat can be achieved
when using CubeSat Cloud. These improvements are achieved at
an almost negligible bandwidth and memory overhead (1%).
CHAPTER 1
INTRODUCTION
A CubeSat is a miniaturized satellite primarily used for university space research
[1]. It has a volume of exactly one litre, weighs no more than one kilogram and is built
using commercial off-the-shelf components [2]. Future satellite systems are envisioned
to be made up of clusters or constellations of smaller satellites like CubeSats working
alongside large monolithic satellites, together forming a distributed space network. However,
the weight, volume, power and geometry constraints of CubeSats must be overcome in
order to provide the required processing, storage and communication capabilities. Figure
1-1 shows a picture of a CubeSat. A CubeSat has only about a 1 GHz processor, 1
GB of RAM, 32 - 64 GB of flash memory and 9.6 kbps communication capability [2]
[3]. On the other hand, remote sensing missions like weather monitoring, flood monitoring
and volcanic activity monitoring require intensive processing or downlinking large
amounts of data. With its limited resources, a CubeSat can take hours to process one
remote sensing image and days to downlink the same [4] [5]. Thus, processing and
communication systems have become bottlenecks for employing CubeSats on remote
sensing missions.
The advantages of CubeSats are their low cost, the low round-trip time for
communication with ground stations, and the ease of experimenting with them. The
manufacturing cost of a typical large satellite weighing about 1000 kg is on the order
of hundreds of millions of dollars [6] because all the components are custom made
and need to be tested extensively before launch. However, most of the components
of a CubeSat are commercial off-the-shelf (COTS) components. Only the payload is
custom designed from the ground up. Thus, CubeSats can be engineered at a price of
about half a million to a few million dollars. This cost is orders of magnitude less than
the cost of a typical large satellite [7] [3]. Furthermore, multiple CubeSats can be
launched together at one time.
Figure 1-1. CubeSat
Image courtesy of NASA. Picture by Paul Adams.
Large satellites are launched into geostationary earth orbit (GEO), highly elliptical
orbit, or high earth orbit (HEO), at altitudes on the order of 36000 km to 50000 km. As a result of
the long distance between the earth and the satellite, the one-way signal propagation delay is
about 120 ms and the round trip time (RTT) is about 250 ms. CubeSats are launched into low
earth orbit (LEO), which is about 600 - 800 km from earth. As a result, the RTT for
the signal reduces to about 10 ms, which enables better quality of service for applications
like real-time tracking and voice, compared to the RTT of a GEO or
HEO satellite.
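As a sanity check on these figures, the short sketch below computes the straight-line propagation delay from the quoted altitudes; the slightly larger delays cited above also account for slant range and transceiver processing.

```python
# Back-of-envelope propagation delays for the orbits discussed above.
SPEED_OF_LIGHT = 3.0e8  # m/s

def one_way_delay_ms(altitude_km):
    """Straight-down propagation delay in milliseconds."""
    return altitude_km * 1e3 / SPEED_OF_LIGHT * 1e3

for orbit, altitude_km in [("GEO", 36000), ("LEO", 700)]:
    d = one_way_delay_ms(altitude_km)
    print(f"{orbit}: one-way ~{d:.1f} ms, RTT ~{2 * d:.1f} ms")
# GEO: one-way ~120.0 ms, RTT ~240.0 ms
# LEO: one-way ~2.3 ms, RTT ~4.7 ms
```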
Finally, since a CubeSat mission costs half a million to a few million dollars and can
be launched in large numbers using a single rocket, mission failure is not fatal. Since
mission failure is not fatal, and costs are lower, new technologies can be easily inserted
into an existing space network via CubeSats.
CubeSats have very limited resources to accomplish meaningful remote sensing
missions. A typical CubeSat has about a 1 GHz processor and 1 GB of RAM. As a
result, the computational power of a CubeSat is not sufficient for executing
processing intensive remote sensing missions. CubeSats use structurally simple low
gain antennas like monopoles or dipoles and have a limited power budget of about 500 mW
for the communication system. The typical communication data rate between a CubeSat
and a ground station is about 9.6 kbps [5]. As a result, large amounts of data cannot
be downloaded to ground stations in a reasonable amount of time. CubeSats have low
memory, processing, battery power and communication capabilities. Timing constraints
are too tight to have long communication windows. Each CubeSat is controlled
individually. Currently, there is no meaningful way of controlling multiple CubeSats
using a unified control mechanism. As a result, a single CubeSat cannot perform
processing and communication intensive remote sensing missions in a meaningful time.
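The downlink bottleneck is easy to quantify. A minimal sketch, assuming a 100 MB image and the 9.6 kbps link quoted above:

```python
# Time for a single CubeSat to downlink a 100 MB image at 9.6 kbps.
IMAGE_BYTES = 100 * 1024 * 1024
LINK_BPS = 9600

continuous_s = IMAGE_BYTES * 8 / LINK_BPS
print(f"continuous link: {continuous_s / 86400:.1f} days")   # ~1.0 days

# With only ~25 minutes of ground station contact per day, it is far worse.
WINDOW_S_PER_DAY = 25 * 60
print(f"window-limited: {continuous_s / WINDOW_S_PER_DAY:.0f} days")  # ~58 days
```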
1.1 CubeSat Cloud
In this work we propose CubeSat Cloud, a framework for distributed storage,
processing and communication of remote sensing data. We demonstrate that CubeSat
Cloud can store remote sensing data on a CubeSat cluster in a distributed fashion to
allow the possibility of distributed computation and communication, speeding up remote
sensing missions. For distributing remote sensing data, CubeSat Cloud uses CubeSat
Distributed File System. For distributed processing and communication, CubeSat Cloud
uses CubeSat MapMerge and CubeSat Torrent, respectively. We reduce the bandwidth
and energy consumption through energy efficient replication and linear block source coding.
In Chapter 2 we outline the evolution of CubeSat networks and present relevant
background in distributed satellite systems, storage systems and computing techniques.
In Chapter 3, we describe the architecture of the CubeSat Network, which consists of two
segments, namely a space segment and a ground segment. The space segment is
designed to be a CubeSat cluster with a radius of about 100 km. It consists of Sensor
nodes and Worker nodes which are inter-connected using high speed communication
links. A Sensor CubeSat has a sensing subsystem and acts as the Master of the cluster while
executing remote sensing missions. Worker nodes are typical 1U CubeSats (10 x 10 x
10 cm cube) with standard subsystems. The ground segment is made up of a ground station
server and several ground stations. Ground stations are connected to the server via the
Internet. Ground stations act as relays between the ground station server and CubeSats.
On top of the described network architecture, we build the CubeSat Cloud platform.
In Chapters 4 through 6, we describe the three core components of CubeSat Cloud, namely
the CubeSat Distributed File System (CDFS), CubeSat MapMerge and CubeSat Torrent.
The CubeSat Distributed File System is used for distributing the remote sensing data to the
nodes in the CubeSat cluster. Once the remote sensing data is distributed, CubeSat
MapMerge and CubeSat Torrent are used for processing and downlinking the remote
sensing data.
CDFS splits large sized remote sensing data into chunks and distributes the chunks
to the Worker nodes in the cluster. CDFS uses "Copy-On-Transmit" for creating replicas
with very low bandwidth and energy overhead. Source coding is used for reducing
the storage and bandwidth overhead for missions which require only downlinking of
remote sensing data. We demonstrate that CDFS can store data reliably, without loss
of any data, even if a limited number of CubeSats go offline. Distributing the remote
sensing data to nodes in the cluster allows the possibility of distributed processing and
distributed communication for speeding up the remote sensing missions.
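One way to read Copy-On-Transmit, consistent with the path-length cases analyzed in Chapter 4: as a chunk is relayed hop by hop, each node on the communication path retains a copy, so replicas cost no extra transmissions. A minimal sketch under that reading, with hypothetical node names (when the path has fewer nodes than the replication factor, additional transfers would be needed):

```python
# Minimal sketch of Copy-On-Transmit, under the assumption that replicas are
# the copies retained by nodes on the chunk's relay path.
class Node:
    def __init__(self, name):
        self.name = name
        self.chunks = []

    def store(self, chunk):
        self.chunks.append(chunk)

def copy_on_transmit(chunk, path, replication_factor):
    """Relay `chunk` along `path`; every node it passes through keeps a copy,
    so reaching the replication factor costs no extra transmissions."""
    holders = []
    for node in path:
        node.store(chunk)
        holders.append(node)
        if len(holders) == replication_factor:
            break
    return holders

path = [Node(f"worker{i}") for i in range(4)]
print([n.name for n in copy_on_transmit("C1", path, replication_factor=3)])
# ['worker0', 'worker1', 'worker2']
```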
In Chapter 5, we describe the working of CubeSat MapMerge in detail. CubeSat
MapMerge is a distributed processing framework inspired by Google MapReduce [8].
Worker nodes process the chunks stored with them in parallel. Failures are detected
using the Heartbeat mechanism and failed executions are re-scheduled on other worker
nodes. We demonstrate that CubeSat MapMerge can speed up the processing of large
remote sensing data sets on CubeSat clusters by a factor of the cluster size (i.e.,
the number of CubeSats in the cluster) and is resilient to worker and communication link
failures.
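A toy sketch of the map-and-merge pattern (not the dissertation's actual code): each worker applies a processing kernel to its locally stored chunks in parallel and the partial results are merged; a failed worker's chunks would simply be re-mapped to another node.

```python
# Toy sketch of the map-and-merge pattern: workers process their local chunks
# in parallel and the partial results are merged. The clipping "map" stands in
# for a real image processing kernel; chunk values are fake pixels.
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    """Process one locally stored chunk, e.g. clip pixel values to 8 bits."""
    return [min(pixel, 255) for pixel in chunk]

def merge(partials):
    """Concatenate the partial results back into a single image."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

chunks = [[10, 300, 42], [7, 999, 0], [128, 256, 1]]
with ThreadPoolExecutor() as pool:          # each chunk maps on a "worker"
    partials = list(pool.map(map_chunk, chunks))
print(merge(partials))  # [10, 255, 42, 7, 255, 0, 128, 255, 1]
```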
In Chapter 6 we explain the downlink process. Multiple raw or processed chunks
are downlinked in parallel to ground stations. Once a chunk is downlinked to a ground
station, it is forwarded to the ground station server. After receiving all the chunks, the Server
uses them to reproduce the sensor image. We demonstrate that CubeSat Torrent
can speed up the downlinking of large files by a factor of the cluster size (number of
CubeSats in the cluster) and is resilient to worker and communication link failures.
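A first-order model of this speedup, assuming the file is spread evenly over N CubeSats downlinking to N ground stations in parallel, with the ~1% overhead reported in Chapter 7:

```python
# First-order model of the CubeSat Torrent speedup: a file spread over n
# CubeSats downlinking to n ground stations in parallel. The 1% overhead is
# the figure reported in Chapter 7; everything else is idealized.
def downlink_time_s(file_mb, n_cubesats, rate_bps=9600, overhead=0.0):
    bits = file_mb * 1e6 * 8 * (1 + overhead)
    return bits / (rate_bps * n_cubesats)

single = downlink_time_s(100, 1)
for n in (5, 15, 25):
    t = downlink_time_s(100, n, overhead=0.01)
    print(f"{n:2d} CubeSats: {t / 3600:5.1f} h, speedup {single / t:4.1f}x")
# 5 -> ~5.0x, 15 -> ~14.9x, 25 -> ~24.8x; the simulated 4.75 - 23.15x is
# slightly lower because of scheduling and tail effects.
```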
To test the performance of the system, we built a CubeSat Cloud simulation
framework and a CubeSat Cloud testbed for emulation. We describe the testbed
and simulation setup in detail in Chapter 7. CubeSats are emulated using Raspberry
Pi mini-computers, while the terrestrial Server and ground stations are emulated
using standard desktop computers. CubeSat Cloud is written in the Python programming
language using Twisted, an event-based asynchronous network programming framework.
Simulation results indicate that, with cluster sizes in the range of 5 - 25 CubeSats,
CubeSat MapMerge and CubeSat Torrent together enable 4.75 - 23.15 times faster
processing and downlinking of large remote sensing data sets, compared to a single
CubeSat. This speedup is achieved at an almost negligible bandwidth and memory
overhead (1%). Emulation results from the CubeSat Cloud testbed agree with the simulation
results and indicate that our proposed CubeSat Cloud can speed up remote sensing
missions by a factor of the size of the CubeSat cluster with minimal overhead, while
achieving asynchronous download with short communication windows.
CHAPTER 2
BACKGROUND
The CubeSat concept was initiated by Professor Twiggs as a teaching tool to
help students learn the process of developing, launching and operating a satellite.
CubeSats are currently designed for low Earth orbits. They are well suited for distributed
sensing applications and low data rate communications applications. Unlike large
monolithic satellites, CubeSats are built to a large degree using commercial off-the-shelf
(COTS) components. Engineering CubeSats using COTS equipment and following
standards in design and development has shortened the development cycles and
reduced costs. CubeSats are typically launched and deployed using a mechanism called
the P-POD [9], developed and built by Cal Poly. P-PODs are mounted to a launch vehicle
carrying CubeSats and deploy them once the proper signal is received from the launch
vehicle. The P-POD Mk III has a capacity of three 1U CubeSats: it can deploy
three 1U CubeSats, one 1U and one 2U CubeSat, or one 3U CubeSat.
CubeSats carry one or two scientific payloads, like a magnetic field sensor, an image
sensor or an ion concentration finder. Several companies and research institutes offer
regular launch opportunities in clusters of several cubes. CubeSat, as a specification for
constructing and deploying pico satellites, accomplishes the following goals:
1. Encapsulation of the launcher-payload interface: the CubeSat standard eliminates a significant amount of managerial work and makes it easy to mate a piggyback satellite with its launcher.
2. Unification among payloads and launchers: satellites adhering to the CubeSat standard can be interchanged quickly for one another, which enables utilization of launch opportunities on short notice.
3. Simplification of pico satellite infrastructure: the CubeSat standard makes it possible to design and produce an operational small satellite at a very low cost.
2.1 Remote Sensing
Acquiring information about an object without making physical contact with it is
called remote sensing. Usually it refers to gathering information about the atmosphere and
the earth's surface using satellites. Remote sensing can be performed using either passive or
active sensors. Passive remote sensors use natural radiation reflected by the object
under observation. Film photography, infrared sensors, charge-coupled devices and radiometers
are examples of passive remote sensors. Active remote sensors make use of a radiation
source and observe objects using the scattered or reflected radiation. Examples
of active remote sensors include RADAR and LiDAR. Remote sensing makes it easy to collect
data from inaccessible and dangerous places. Weather monitoring,
deforestation monitoring, glacial activity monitoring, volcano monitoring, flood and other
disaster monitoring are some examples of remote sensing applications. Each
data point collected by remote sensors is typically anywhere between 10 MB and 100
MB. The resolution of a remote sensor varies from 1 m to 1000 m per pixel depending on
the sensor. Remote sensing data is immutable: it does not change after acquisition.
With its given resources, a single CubeSat can take about 10 hours to process
a remote sensing image and 2 days to downlink the same [5]. Execution of remote
sensing missions using CubeSats can be sped up by parallelizing the processing
and downlinking of the remote sensing images using a cluster of CubeSats.
2.2 Evolution of CubeSat Networks
Since the launch of the first CubeSat into space in 2003, CubeSat communication
networks have evolved in several ways. Very early CubeSats used to communicate
with their home ground station only, as shown in Figure 2-1. These networks can
be classified as generation 1 CubeSat networks. A typical CubeSat in a 600 - 800 km
orbit has a pass window of about 8 minutes and comes into contact with its ground station
about 4 times a day. This limited the communication window to about 25 minutes per day.
The first generation CubeSats operated at speeds of 1.2 kbps. This limits the total
downlink capacity to 1.8 MB. However, no CubeSat achieved this downlink or uplink
bandwidth in practice, due to various limitations including inefficient protocols, large amounts of
beacon data, power constraints and unreliable communication systems on board. As a
result, most missions collected a modest amount of data, a few MB (<12 MB),
over their whole lifetime [10].
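The arithmetic behind these window-limited capacities is simple; the sketch below computes the daily downlink budget at the 1.2 kbps first-generation rate and at the 9.6 kbps rate typical of later systems (note that 25 minutes at 9.6 kbps works out to exactly 1.8 MB per day):

```python
# Daily downlink budget for a single CubeSat with ~25 minutes of ground
# station contact per day (about 4 passes of ~8 minutes, rounded down).
WINDOW_S_PER_DAY = 25 * 60

for rate_bps in (1200, 9600):
    capacity_mb = rate_bps * WINDOW_S_PER_DAY / 8 / 1e6
    print(f"{rate_bps:4d} bps -> {capacity_mb:.3f} MB/day")
# 1200 bps -> 0.225 MB/day
# 9600 bps -> 1.800 MB/day  (the 1.8 MB figure quoted above)
```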
With the introduction of MoreDBs (Massive operations, recording, and experimentation
Database System) [11], CubeSat networks made the next significant step in their
evolution. MoreDBs is a system to manage all data generated by Cal Poly small
satellites. It is an attempt to consolidate all satellite information into a single, readily
accessible location to make data analysis more efficient. Using networks like MoreDBs,
mission controllers can collect the beacons from their small satellites received by other
amateur radio operators, as shown in Figure 2-1. This significantly increased the
amount of small satellite health information available and also served to track the
whereabouts of small satellites. These efforts brought CubeSat networks into generation
2.
However, MoreDBs has the following two significant limitations. First, the MoreDBs architecture
requires mission specific software to be developed and distributed to the ground station
facilities. Second, any modification to the packet format requires an upgrade of the software
at all the ground stations. This is cumbersome and error prone.
As a result of these limitations, the solution is not scalable to a large number of
CubeSats. In order to overcome these limitations, the Space Systems Group (SSG) [12]
and the Wireless And Mobile Lab [13] at the University of Florida developed A Cloud Computing
Architecture for Spacecraft Telemetry Collection (T-C3) [14], a scalable and flexible
means of collecting the telemetry data. T-C3 is an effort to improve MoreDBs and make it
a universal telemetry decoding solution for CubeSats. Instead of decoding the
received beacon at the amateur radio station directly, T-C3 forwards the beacon to the
T-C3 central server, which fingerprints the beacon and decodes it using that
satellite's telemetry format, making it a much more scalable and flexible solution [14].
Figure 2-1. Generation 1 (a) and Generation 2 (b) CubeSat Networks
Image courtesy of Space Systems Group. Picture by Tzu Yu (Jimmy) Lin.
GENSO (Global Educational Network for Satellite Operations) [15] is the next
significant milestone in the evolution of CubeSat networks. GENSO was founded to
create a network of amateur radio stations around the world to support the small satellite
operations of various universities. GENSO has been designed as a distributed system
connected via the Internet, as shown in Figure 2-2. The satellite can communicate
with the main base station through any arbitrarily available relay station. With a single
ground station, a university can gather about 25 minutes of data from a CubeSat in a
day. Using the GENSO network, mission controllers can gather hours' worth of data
per day by receiving data via hundreds of networked radio stations around the world.
It also allows them to command their spacecraft from other ground stations.
GENSO and other similar efforts can be classified as generation 3 CubeSat networks.
GENSO plans to have a built-in database of all the satellites. This database can be
used to predict and automate the tracking of the satellites to collect the telemetry in
an efficient way. Once the data is downlinked, it will be provided to the respective
mission controllers.
Figure 2-2. Architecture of GENSO
2.2.1 Summary and Limitations of CubeSat Communications
Table 2-1 summarizes the data speeds and data downloads of CubeSat communication
systems. The typical characteristics of a CubeSat communication subsystem can be
summarized as follows: the data rate is 9600 baud and the power rating is 500 mW, with an
efficiency of about 25%; a total download of 12 MB has been achieved so far, using 13
satellites over a period of 5 years. As one can see, communication is the primary
bottleneck for emerging remote sensing missions. In order to improve the downlink
speed we developed CubeSat Torrent, a distributed communications framework for
CubeSat clusters. This dissertation envisions the next generation of CubeSat networks,
which is the distributed satellite system.
Table 2-1. CubeSat data speeds and downloads
Parameter        Min        Max        Average
Speed            1200 bps   38.4 kbps  9600 bps
Power            350 mW     1500 mW    500 mW
Frequency        433 MHz    900 MHz    NA
Total Download   320 KB     6.77 MB    0.5 - 5 MB
2.3 Distributed Satellite Systems
A distributed system is a collection of independent components that work together
to perform a desired task and appear to the end user as a single coherent system.
Examples of distributed systems include the World Wide Web (WWW), clusters,
networks of workstations or embedded systems, and the Cell processor. These distributed
systems are fueled by the availability of powerful and low cost microprocessors and
high speed communication technologies like the Local Area Network (LAN). As the price
to performance ratio of microprocessors drops and the speed of communication networks
increases, distributed computing systems attain a much better price-performance ratio than
a single large centralized system.
As more and more CubeSats are launched, it is becoming apparent that some
space research needs may be better met by a group of small satellites, rather than by
a single large satellite. This is akin to the paradigm shift that happened in the computer
industry a few decades ago: a shift of focus from large, expensive mainframes to using
smaller, cheaper, more adaptable sets of distributed computers for solving challenging
problems [16].
Distributed satellite systems have their own advantages and challenges. Due to
advances in modern VLSI technology that create integrated circuits with lower power
consumption and smaller size, and due to subsystems like the RelNav Software Defined Radio [17]
that have enabled high speed satellite communication, distributed satellite systems
potentially have a much better price to performance ratio than a single large monolithic
satellite. Applications like weather monitoring and tracking are inherently distributed
in nature and may be better served by a distributed system than a centralized system.
A monolithic satellite architecture requires that each satellite have all of the sensing,
processing, storage and communication peripherals on board. Distributed satellite
systems can share resources like sensing, memory, processing and communications,
as well as information. The multiplicity of sensors, storage devices, processors and
communication devices means there is no single point of failure. Critical information
can be duplicated allowing the system to continue to work even if some components
fail. Similarly, distributed satellite systems may have better availability than centralized
satellite systems, due to the ability of the system to work at reduced capacity when
components fail.
Finally, distributed satellite systems enjoy the advantage of incremental growth.
The functionality of a distributed satellite system can be gradually increased by adding
more satellites as and when the need arises. Distributed small satellite systems rely on what
is called horizontal scaling, where one employs more satellites to serve an increased
need.
On the other hand, distributed satellite systems are more complex and difficult to
build than monolithic satellite systems. Several challenges, such as orbit planning,
resource management, communication and data management, and security, must be
addressed [16]. There is very little or no support for distributed data storage, processing
or communications for distributed satellite systems. Distributed satellite systems need a
fast and low power backbone network for data and control information exchange. The
backbone network must be reliable and should prevent problems such as message
loss, overloading and saturation. Distributed systems store data at several places, which
provides more access points for critical information. As a result, additional security
measures need to be taken to safeguard data and systems. Finally, finding problems
in distributed satellite systems and troubleshooting them requires detailed analysis of
each satellite and of the communication between them.
2.4 Classification of Distributed Satellite Systems
Constellation, Formation Flying and Swarm / Cluster are the three main types of
distributed satellite systems [16]. A group of satellites in similar orbits with coordinated
ground coverage complementing each other is called a Constellation. Satellites in a
constellation do not have on-board control of their relative positions and are controlled
separately from ground control stations. Iridium and Teledesic are well known examples
of satellite constellations. A group of satellites with coordinated motion control, based
on their relative positions, to preserve the topology is called a Flying Formation. The position
of a satellite in a flying formation is controlled by an onboard closed-loop mechanism.
Satellites of a flying formation work together to perform the function of a single, large,
virtual instrument. TICS, F6 and Orbital Express are well known examples of flying
formations. A group of satellites, without fixed absolute or relative positions, working
together to achieve a joint goal is called a Cluster or Swarm. More about satellite
clusters is presented in Chapter 3.
2.5 Related Work
2.5.1 Distributed Storage Systems
Below we present an overview of related work in the fields of distributed
storage, processing and communications. We surveyed some well known distributed file
systems, such as the Google File System (GFS) [18], the Hadoop Distributed File System
(HDFS) [19], Coda [20] and Lustre [21]. Owing to their simplicity, fault tolerant design and
scalability, the architectures of GFS and HDFS are well suited for distributed storage on CubeSat
clusters. We present an overview of GFS and HDFS below.
The Google File System is the major storage engine for large scale data at Google. A
brief summary of GFS is as follows. The architecture of the Google File System is shown in
Figure 2-3. GFS consists of two components: a master and one or more chunk servers.
GFS functions similarly to a standard POSIX file library but is not POSIX compatible.
Each file is split into fixed size blocks called chunks. Each chunk is 64 MB in size, by
default. Chunks are stored on chunk servers. Metadata, such as the constituent
chunks of a file, the file to chunk mapping and the chunk to chunk server mapping, is stored
on the master. Chunk servers store the actual data in the form of chunks. When clients interact
with the Google File System, the large share of communication is between the client and
the chunk servers. This prevents the master from becoming the bottleneck for transferring
large files in and out of the Google File System.
A client machine communicates with the Google File System through the client class
library. It translates the open, read, write and delete file system calls into Google File
System calls. The client library communicates with the master for metadata operations and
with chunk servers for actual data operations. The interface of the GFS client is very similar to
that of a POSIX file system. To work with GFS, no knowledge about distributed systems
is required; the GFS client abstracts away all the required distributed systems knowledge. However,
some localised chunk information is used for scheduling MapReduce jobs on nodes to improve
efficiency. Each of the open, read, write and delete operations is implemented in
the following way. The GFS client requests the master for metadata, including the file to
chunk mapping and the chunk to chunk server mapping, and communicates with chunk servers
for the actual data. Once the operation is complete, the master's metadata is updated to
reflect the new state of the file system.
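The metadata/data split can be made concrete with a small sketch (an illustration of the architecture, not Google's actual API): the master answers only location queries, and clients fetch the bytes from chunk servers directly.

```python
# Sketch of the GFS metadata/data split: the master answers "where are the
# chunks of this file?", and clients read the bytes from chunk servers.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB default chunk size

class Master:
    def __init__(self):
        self.file_chunks = {}    # filename -> list of chunk ids
        self.chunk_servers = {}  # chunk id -> list of chunk server addresses

    def locate(self, filename):
        """Metadata-only lookup; no file data flows through the master."""
        return [(cid, self.chunk_servers[cid])
                for cid in self.file_chunks[filename]]

master = Master()
master.file_chunks["image.raw"] = ["c1", "c2"]
master.chunk_servers.update({"c1": ["cs-a", "cs-b"], "c2": ["cs-b", "cs-c"]})
for chunk_id, servers in master.locate("image.raw"):
    print(chunk_id, "->", servers)  # the client reads the bytes from cs-*
```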
The Hadoop Distributed File System (HDFS) is an open-source implementation of
the Google File System. Hadoop is written in the Java programming language. It can be
interfaced with C++, Python, Ruby and many other programming languages using its
Thrift [22] interface. It is designed for storing hundreds of gigabytes or even petabytes
of data and for fast streaming access to application data. Similar to GFS, HDFS
supports write-once-read-many semantics on files.
Figure 2-3. Architectural overview of the Google File System
HDFS also uses a master/slave architecture. The architecture of the Hadoop Distributed File
System is shown in Figure 2-4. In HDFS, the NameNode plays the role of the master
node of GFS. It controls the namespace and implements the access control mechanism
for data stored in HDFS. DataNodes take care of managing the local storage hardware.
In order to bring the cost of the implementation down, these nodes run an open source
operating system, typically a GNU/Linux system. When a file is copied into HDFS, it is split
into blocks and distributed to DataNodes. Fault tolerance of the stored data is achieved
through replication. Each chunk is replicated 3 times, by default. HDFS documentation
is available at their website [23].
There are several limitations of GFS and HDFS for storing remote sensing data
on CubeSat clusters. Unlike wired communication channels, the wireless communication
channels in space are unreliable. Communication links break often, leading to
network partitions; GFS and HDFS are not partition tolerant. GFS and HDFS are also
not optimized for power consumption, and for CubeSat clusters power is a very scarce
resource. Also, the cost of communication is significantly higher for wireless links compared
to wired links. GFS and HDFS are designed as generic data storage platforms. They
are not tailored for storing remote sensing data, and their generic design is too complex for
CubeSat clusters. Using GFS or HDFS causes a lot of overhead in terms of processing,
memory, bandwidth and power. We designed the CubeSat Distributed File System to
overcome the above mentioned problems and tailored it for storing remote sensing data
on CubeSat clusters by using large chunk sizes and load balancing.
2.5.2 Distributed Computing Techniques
Distributed computing is a form of computing where processing is performed
simultaneously on many nodes. The key principle behind distributed computing is that
most large problems can be divided into smaller problems, which can be solved
concurrently. Cheap computing nodes are connected using a high speed backbone
network to form a cluster that executes the smaller problems. We surveyed distributed
computing techniques such as the Common Object Request Broker Architecture (CORBA),
Web services, Remote Procedure Call (RPC), Remote Method Invocation (RMI) and
MapReduce, which are used for distributed processing on computing machines. Below we
present a brief overview of them.
Figure 2-4. Architectural overview of the Hadoop Distributed File System
The distributed objects technique involves distributed objects communicating via
messages. The Common Object Request Broker Architecture (CORBA), Java Remote
Method Invocation (RMI), IBM WebSphere MQ, Apple's NSProxy, GNUstep, Microsoft's
Distributed Component Object Model (DCOM) and .NET are well known examples of
this model. Owing to its platform independence and interoperable nature, CORBA
programs can work together regardless of the programming languages used. Java RMI,
IBM WebSphere MQ, Apple's NSProxy, Microsoft's DCOM and .NET are proprietary
technologies. They are not independent of the programming language and are not as
versatile.
Web services are the way web based applications operate via
the HTTP protocol. Web services use the Simple Object Access Protocol (SOAP) for
exchanging structured information between web components; JavaScript Object
Notation (JSON) to exchange data between web services in a human readable format;
the Web Services Description Language (WSDL) to provide a machine-readable
description of a web service; and Universal Description, Discovery and Integration
(UDDI) for describing web services.
Message Passing Interface (MPI), Open Message Passing Interface (OpenMPI),
Open Multi Processing (OpenMP) and Parallel Virtual Machine (PVM) are the prevalent
technologies in the message passing category. These technologies are used in
massively parallel applications and supercomputing when data needs to be distributed
and communicated efficiently.
Sockets are a popular option for client-server architectures, like mail servers
and web servers. Their availability on any system equipped with a TCP/IP stack makes them
an attractive option. Traffic can easily be re-routed to different ports using secure shell
(SSH), Secure Sockets Layer (SSL) or Virtual Private Network (VPN) connections.
MapReduce is a recent development in the field of distributed computing.
Introduced by Google Inc. in 2004 [8], its design simplicity, fault tolerance and ease of
implementation make it an attractive candidate for large scale distributed processing.
It is based on the map and reduce primitives of functional languages like Lisp.
MapReduce programs are highly parallelizable and thus can be used for large-scale
data processing by employing large clusters of computing nodes. Companies like Google,
Yahoo! and Facebook use MapReduce to process many terabytes of data on large
clusters containing thousands of cheap computing machines. MapReduce performs
large-scale distributed computation while hiding the complications of parallelization, data
distribution, synchronization, locking, load balancing and fault tolerance.
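The canonical illustration of the map and reduce primitives is word counting; a serial sketch follows (a real deployment shards the map calls across machines and shuffles intermediate pairs by key):

```python
# Canonical MapReduce example: word count, run serially for illustration.
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) pairs, like a MapReduce map task."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Sum the counts per key, like a MapReduce reduce task."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["remote sensing data", "sensing data chunks"]
pairs = [p for doc in docs for p in map_phase(doc)]
print(reduce_phase(pairs))
# {'remote': 1, 'sensing': 2, 'data': 2, 'chunks': 1}
```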
We studied in detail the advantages and disadvantages of the above
mentioned distributed computing techniques. They do not account for the salient and unique
features of CubeSats and CubeSat clusters, such as power, memory and communications
constraints, unreliable wireless communication, the high cost of communication and the need
for tight locality optimization of data storage and operations. Owing to its simplicity
and fault tolerant design, MapReduce suits large scale distributed processing
of remote sensing data on CubeSat clusters well. To overcome the above
mentioned limitations, we designed CubeSat MapMerge, based on MapReduce and
tailored for processing remote sensing data on CubeSat clusters.
CHAPTER 3
NETWORK ARCHITECTURE OF CUBESAT CLOUD
The architecture of the CubeSat Network is shown in Figure 3-1. The CubeSat Network
consists of a space segment and a ground segment. The space segment is a CubeSat cluster,
whose architecture is shown in Figure 3-2. A CubeSat cluster has a
radius of about 25 km. It consists of Sensor nodes and Worker nodes inter-connected
using high speed communication links. Worker nodes are CubeSats with storage,
processing, communication and other standard subsystems. In addition to the standard
subsystems, a Sensor CubeSat has a sensing subsystem. Sensor nodes act as the Master of
the cluster while orchestrating remote sensing missions. The ground segment is composed
of the Server and several ground stations. CubeSat to CubeSat communication links are
short distance, reliable, directional, low power and high speed. CubeSat to ground
station communication links are long distance, high power, low speed and unreliable.
Each CubeSat is connected to a ground station. Ground stations are connected to the
Server via the Internet. Ground stations act as relays between the Server and the CubeSats.
Figure 3-1. Architecture of a CubeSat network
Figure 3-2. Architecture of a CubeSat cluster
3.1 Components of the CubeSat Network
3.1.1 Space Segment
A worker CubeSat is a typical 1U CubeSat: it has dimensions of 10 cm x 10 cm
x 10 cm, a volume of exactly one litre, and weighs about one kilogram. However, it does not
need to be a 1U CubeSat; it can be a 2U, 3U or any other form factor. Worker
CubeSats need to have storage, processing and communication subsystems. Other
standard subsystems include the Satellite Bus, Electrical Power, Structural and Thermal,
and Attitude Determination and Control subsystems. A Worker CubeSat has about a 1 GHz
processor, 1 GB of RAM and 32 - 64 GB of memory. The CubeSat to ground station
communication speed is about 9.6 kbps. Sensor CubeSats have a sensing module in addition
to the above mentioned subsystems. Figure 3-3 shows a blown up view of the ESTCube-I
CubeSat with its various subsystems.
A Sensor node is equipped with sensing hardware. It performs sensing (taking an
image or doing a radar scan). While orchestrating a mission, a Sensor node acts as the
Master node for the CubeSat cluster. When not orchestrating a mission, a Sensor node
performs the role of a worker node.
Figure 3-3. A blown up picture of ESTCube-I CubeSat, showing its subsystems
Image courtesy of University of Tartu. Picture by Andreas Valdmann.
The Master node is the primary center for receiving commands from the server and
issuing subcommands to worker CubeSats in the cluster. It keeps track of all the
metadata related to the mission, including the list of participating nodes, their resource
capabilities, map jobs, merge jobs, downlink jobs and their status. It keeps track of all
the resources available in the cluster and their state, and tracks the available resources
on each node. It is also responsible for making scheduling decisions, such as which job
needs to be scheduled on which node and when. Worker nodes have the limited role of
executing the processing and downlinking jobs assigned to them by the Master node.
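The Master's bookkeeping amounts to a small metadata store plus a scheduling policy; a sketch with hypothetical field names follows (the dissertation does not specify this schema).

```python
# Sketch of the mission metadata a Master node tracks, per the description
# above. Field names are illustrative, not the dissertation's actual schema.
from dataclasses import dataclass, field

@dataclass
class MissionState:
    nodes: dict = field(default_factory=dict)        # node id -> {"load": ...}
    map_jobs: dict = field(default_factory=dict)     # job id -> assignment
    merge_jobs: dict = field(default_factory=dict)
    downlink_jobs: dict = field(default_factory=dict)

    def schedule(self, job_id, jobs):
        """Assign a pending job to the least loaded known node."""
        node = min(self.nodes, key=lambda n: self.nodes[n]["load"])
        jobs[job_id] = {"node": node, "status": "scheduled"}
        self.nodes[node]["load"] += 1

state = MissionState(nodes={"w1": {"load": 0}, "w2": {"load": 1}})
state.schedule("map-1", state.map_jobs)
print(state.map_jobs)  # {'map-1': {'node': 'w1', 'status': 'scheduled'}}
```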
3.1.2 Ground Segment
A ground station or amateur radio station is an installation that enables communication
with a CubeSat. Figure 3-4 shows ground station control equipment and Figure
3-5 shows the highly directional Yagi antenna used for communicating with satellites. A ground
station contains high gain directional antennas, like Yagi or parabolic dish antennas,
communication equipment like modems, and computers to send, capture and analyse the
data received. There are several types of amateur radio stations, including fixed ground
stations, mobile stations, space stations, and temporary field stations. Most radio stations
are established for educational and recreational purposes: providing technical expertise,
skills and volunteer manning, promoting attendance by the public, and providing
communications education for the public.
The ground station server is a dedicated computer system that is connected to ground
stations through the Internet. It receives commands from the Administrator and uplinks the
commands to CubeSats. Once a mission is executed, the resulting data is downlinked to
the Server and stored on its local storage disk. The Server acts as the command center for
the CubeSat network. The Administrator issues commands to the Server, which then forwards
the commands to the Master CubeSat through a ground station. The Server node stores all
the downlinked mission data and thus acts as the storage node for the downlinked data
from the CubeSat cluster. Ground stations act as relays between the CubeSats and the server.
Figure 3-4. Ground station
Image courtesy of Gator Amateur Radio Club. Image by Tzu Yu (Jimmy) Lin.
They downlink the data from CubeSats and send it to the server. They upload commands
and data from the server to worker CubeSats, which forward them to the Master CubeSat.
3.2 System Communication
3.2.1 Cluster Communication
CubeSats are connected to each other through a high speed (>1 Mbps), low power
backbone network. High gain directed antennas, such as patch antennas or lasers [24],
are used for inter-cluster communication. Vescent Photonics [25] is developing
extremely small and low power optical communication modules for CubeSats. There
has also been research on using tethers for short distance, high speed
communication between satellites. RelNav demonstrated a spacecraft subsystem
called the SWIFT Software Defined Radio (SDR) [26] [17] that will enable a flock of
satellites to operate as a cluster. The SWIFT SDR subsystem demonstrated by RelNav
provides the following services:
• A 1 Mbps inter-satellite communication link for data exchange between CubeSats.
• Relative position and orientation determination for formation flight.
• Cluster synchronization and timing for coordinated operations and coherent sensing.

Figure 3-5. Ground station antenna
Image courtesy of Gator Amateur Radio Club. Image by Tzu Yu (Jimmy) Lin.
3.2.2 Space Segment to Ground Segment Communication
CubeSat geometry prohibits the use of complex antennas [27]. As a result,
CubeSats are connected to ground stations through simple antennas such as
monopoles or dipoles. Coupled with stringent power constraints and distances on the
order of 600 - 800 km, this results in low speed links between CubeSats and ground
stations. A typical CubeSat to ground station speed is about 9.6 kbps [10].
3.2.3 Ground Segment Network Communication
Ground stations and the Server are connected via the Internet. The Internet provides
a high speed (about 10 Mbps) and reliable wired communication medium between the
Server and the ground stations. Power is not a constraint for the Server and ground
stations, as they are connected to the electrical grid.
3.3 CubeSat Cloud
We propose CubeSat Cloud, a framework for distributed storage, processing and
communication of remote sensing data on CubeSat Clusters. CubeSat Cloud uses
the CubeSat Distributed File System (CDFS) for distributed storage of remote sensing
data on CubeSat Clusters. CubeSat MapMerge is the distributed processing framework
used for processing remote sensing data stored in CDFS. CubeSat Torrent is the
distributed communication framework used for downlinking raw or partially processed
remote sensing data from CubeSat Clusters. Below we describe how remote sensing
missions are executed using the CubeSat Cloud framework.
3.3.1 Storage, Processing and Communication of Remote Sensing Data on CubeSat Clusters
CubeSat Cloud, as a generic framework, can be used for storing, processing
and downlinking of remote sensing data from CubeSat Clusters. Once a remote sensing
operation is performed, the obtained sensor data is stored on the CubeSat cluster using
the CubeSat Distributed File System. After storing the data on the cluster, it is processed
using CubeSat MapMerge and the obtained results are downlinked using CubeSat
Torrent. Figure 3-6 shows an overview of the CubeSat Cloud framework consisting of
the CubeSat Distributed File System, CubeSat MapMerge and CubeSat Torrent. Below
is a detailed description of how a remote sensing mission is executed using CubeSat
Cloud.
1. The Server sends the SENSE and STORE command to the Master. Upon receiving the SENSE and STORE command, the Master performs the remote sensing operation and stores the sensor data on the local file system. The Server and Master do not need to have a direct communication link: the command will be relayed through the ground station network and space segment.
2. The Master node then splits the file into chunks C1, C2, C3, . . . Cn. The size of each chunk is about 64 KB. The Master node distributes the chunks to the worker nodes. The file to chunk mapping, chunk to worker mapping and other metadata are stored on the Master. Splitting the remote sensing data into chunks, distributing them and storing them on worker nodes is achieved using the CubeSat Distributed File System. Distributing the data across the worker nodes in the cluster makes it possible to process and downlink the data in a distributed fashion.
3. The Server sends the PROCESS command to the Master. The Master CubeSat commands the worker CubeSats to process the stored chunks to produce partial results. The obtained partial results are stored on the local file systems of the worker nodes.
4. The Server sends the DOWNLINK command to the Master, which then commands the worker nodes in the cluster to downlink the processed chunks to ground stations. Downlinking the processed chunks to the Server is achieved through CubeSat Torrent.
5. Once the Server receives all the processed chunks, it stitches them into the full solution, as illustrated in the sketch below. Processing of chunks to produce partial results on worker nodes and stitching of partial results into the complete solution on the Server constitute CubeSat MapMerge.
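As a concrete illustration of step 5, the following minimal Python sketch shows how the
Server could reassemble processed chunks once all of them have arrived. It is a sketch
under assumed names (stitch, chunks_by_index), not the actual Server code.

    # Minimal sketch: reassemble processed chunks on the Server (assumed names).
    def stitch(chunks_by_index):
        """chunks_by_index: dict mapping chunk index -> processed chunk bytes."""
        return b"".join(chunks_by_index[i] for i in range(len(chunks_by_index)))

    result = stitch({0: b"part0-", 1: b"part1-", 2: b"part2"})
    assert result == b"part0-part1-part2"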
3.3.2 Source Coding, Storing and Downlinking of Remote Sensing Data on CubeSat Clusters
A large number of missions require only downlinking of the remote sensing data
without processing it. For these missions, worker nodes do not require access to the
raw data. This can be used as an opportunity to optimize CubeSat Torrent missions
by improving the quality of service and reducing the storage overhead. Below we
present in detail how we utilize source coding to improve the quality of service and
reduce the storage overhead. Figure 3-7 shows an overview of how downlink-only
remote sensing missions are executed using CubeSat Cloud, described below in
detail.
1. The Server sends the SENSE, CODE and STORE command to the Master.
2. Upon receiving the SENSE, CODE and STORE command from the Server, the Master performs the remote sensing operation and stores the data from the sensor on the local file system.
3. The Master node splits the remote sensing data file into chunks C1, C2, C3 . . . Cn. The size of each chunk is about 64 KB. Then, based on the required redundancy, it creates coded chunks C1’, C2’, C3’ . . . Cm, where m > n.
4. The Master distributes the coded chunks C1’, C2’, C3’ . . . Cm to the worker nodes. The Master stores the metadata, which includes the file to chunk mapping, chunk to worker mapping and chunk status. Splitting the remote sensing data into chunks, performing coding, distributing the chunks and storing them on worker nodes is performed by the CubeSat Distributed File System.
5. The Server then sends the DOWNLINK command to the Master, which then commands the worker nodes in the cluster to downlink the coded chunks to ground stations. Downlinking the chunks to the Server is performed by CubeSat Torrent.
6. Once the Server receives n out of the m chunks, it stitches them into the original file. As long as n out of m chunks are available, the original data can still be recovered; a toy example follows this list. Details about the performance analysis of source coding are discussed in Chapter 8.

Figure 3-6. Overview of CubeSat Cloud and its component frameworks
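To make the coding step concrete, here is a toy Python sketch. The dissertation
produces m coded chunks from n raw chunks so that any n of them suffice for recovery;
the specific erasure code is not named in the text, so this sketch uses a single XOR
parity chunk (m = n + 1), which tolerates the loss of any one chunk. A production
system would use a stronger code such as Reed-Solomon.

    # Toy stand-in for the source coding step: one XOR parity chunk (m = n + 1).
    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(chunks):
        parity = chunks[0]
        for c in chunks[1:]:
            parity = xor_blocks(parity, c)
        return chunks + [parity]          # n raw chunks plus 1 parity chunk

    def recover(received):
        rebuilt = received[0]
        for c in received[1:]:
            rebuilt = xor_blocks(rebuilt, c)
        return rebuilt                    # XOR of the n survivors is the lost chunk

    chunks = [bytes([i]) * 64 for i in range(4)]   # n = 4 equal-sized raw chunks
    coded = encode(chunks)                         # m = 5 chunks sent to the workers
    survivors = coded[:2] + coded[3:]              # the third chunk is lost in downlink
    assert recover(survivors) == chunks[2]         # any n of the m chunks suffice here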
Figure 3-7. CubeSat Cloud: Integration of CubeSat Distributed File System and CubeSat Torrent
CHAPTER 4
DISTRIBUTED STORAGE OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS
The CubeSat Distributed File System (CDFS) is built for storing large remote
sensing files on small satellite clusters in a distributed fashion. While satisfying the
goals of scalability, reliability and performance, CDFS is designed for CubeSat clusters,
which use a wireless backbone network, are partition prone and have severe power
and bandwidth constraints. CDFS successfully meets the scalability, performance
and reliability goals while adhering to the constraints posed by the harsh environment
and limited resources. It is being used as a storage layer for distributed processing
and distributed communication on CubeSat clusters. In this chapter, we present the
architecture, file system design and several optimizations.
4.1 Key Design Points
4.1.1 Need for Simple Design
A typical CubeSat has about 1 GHz of processing capability, 1 GB of RAM, 32 GB
of flash storage, a 1 Mbps inter-cluster communication link, a 9.6 kbps ground station
link and about 2 W of power generation capability [5]. For CubeSats, processing,
bandwidth and battery power are scarce resources, so the system design needs to be
simple.
4.1.2 Low Bandwidth Operation
The CubeSat network is built using long distance wireless links (10 km for
inter-cluster links and 600 km for CubeSat to ground station links). As a result, the cost
of communication is very high, and data and control traffic need to be reduced as much
as possible.
4.1.3 Network Partition Tolerant
The backbone medium of communication is wireless and the space environment is
harsh. The high velocity of satellites in LEO (relative to ground stations) makes satellite
to ground station link failures very common. The topology of a CubeSat cluster is also
very dynamic, causing inter-satellite links to break very frequently. Sometimes, nodes
go into sleep mode to conserve power. All the above factors can cause frequent
breaking of communication links. As a result, if a node is temporarily unreachable, the
system should not treat it as a node failure. The system should be tolerant to temporary
network failures or partitions.
4.1.4 Autonomous
Most of the time, individual CubeSats and the whole CubeSat cluster are
inaccessible to human operators, so the software design should take care of all failure
scenarios. A reset mechanism, at the node and network level, should be provided. If all
the fault tolerance mechanisms fail, the system undergoes a reset and starts working
again. Consequently, the distributed file system should be able to operate completely
autonomously, without human intervention.
4.1.5 Data Integrity
Memory failures are fatal for satellite missions. Even though memory chips for
satellites are radiation hardened, high energy cosmic rays can sometimes cause
trouble. For example, the Mars rover Curiosity suffered a significant setback because of
damage to the memory of its primary computer caused by a high energy particle.
Hence, data integrity must be preserved.
4.2 Shared Goals Between CDFS, GFS and HDFS
In addition to the above design points, CDFS shares several design points with
GFS and HDFS, which are highlighted below.
4.2.1 Component Failures are the Norm
Given a large number of CubeSats and communication links, failures are the norm
rather than the exception. Therefore, constant monitoring, error detection, fault
tolerance and automatic recovery must be integral to the system.
4.2.2 Small Number of Large Files
Files are huge by traditional standards. Images and remote sensing data generated
by satellites tend to be on the order of hundreds of megabytes.
4.2.3 Immutable Files and Non-existent Random Read Writes
Random writes within a file are practically non-existent. Once written, the files are
only read, and often only sequentially. These access patterns are common for imaging
and remote sensing missions and for programs like MapReduce that process this data
and generate new data.
CDFS shares the goals of availability, performance, scalability and reliability with
GFS and HDFS. Owing to its radically different operating environment, however, the
design points and constraints for CDFS are very different. GFS and HDFS were
designed for clusters of computers without power constraints, connected using high
speed wired media. CDFS is meant for distributed data storage on CubeSat clusters,
which use a wireless communication medium for exchanging data and have severe
power and bandwidth constraints. The design of CDFS should be simple, consume
very little bandwidth and operate autonomously without the need for human
intervention. It should be tolerant to network partitions, temporary link failures and node
failures, and preserve the integrity of the stored data.
4.3 Architecture of CubeSat Distributed File System
Figure 4-1 shows the architecture of CDFS. A CDFS cluster consists of Sensor
CubeSats and worker CubeSats. Sensor nodes are equipped with a sensing module
and thus perform sensing. While orchestrating a mission, a Sensor node plays the role
of the Master (M) node. Worker nodes aid the Master node in processing or downlinking
large files. Here is how CDFS stores a file on the cluster. The Administrator issues a
remote sensing command to the central server (as shown in Figure 4-1). The central
server transmits the command to a relay ground station, which uplinks it to the Master
CubeSat. Upon receiving the command, the Master CubeSat performs sensing, such as
taking an image or doing a radar scan. The sensing operation generates a large amount
of data (about 100 MB), which is stored in a local file on the Master node.
Figure 4-1. Architecture of CubeSat Distributed File System
The Master node (M) splits this file into blocks called chunks and stores them on
worker CubeSats. Each chunk is identified using a unique chunk id. For reliability, each
chunk is replicated on multiple workers. By default, CDFS creates two replicas (a
primary replica and a secondary replica), along with an implicit replica stored on the
Master node, so in effect there are three replicas. Along with the implicit replicas, the
Master CubeSat holds all metadata for the file system. The metadata includes the
mapping from files to chunks, the location of these chunks on the various workers, the
namespace and access control information. The workers store the actual data: worker
nodes store chunks as regular files on local flash memory. As shown in the figure, the
cluster is organized as a tree with the Master node as the root node.
4.3.1 File System Namespace
CDFS supports a traditional hierarchical file organization in which a user or
an application can create directories and store files inside them. The file system
namespace hierarchy is similar to that of Linux file systems [28]. The root directory
is “/” and is empty by default. One can create, rename, relocate and remove files.
CDFS supports the hidden files and directories concept in a way similar to Linux file
systems. Hidden files or directories start with “.” (period) and contain metadata, error
detection and correction information, configuration information and other miscellaneous
information required by CDFS. These hidden files are stored as regular files rather than
distributed files, since they are very small and are used by the system locally. One can
refer to a file stored on a server using the notation “cdfs://server/filepath”, where the
filepath looks like “/directory1/directory2/ . . . /filename”, as illustrated below.
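The naming convention can be illustrated with Python's standard library URL parser;
the example path is hypothetical and this is not CDFS code.

    # Illustration of the cdfs://server/filepath notation (hypothetical path).
    from urllib.parse import urlparse

    url = urlparse("cdfs://server1/images/2013/scan042.raw")
    assert url.scheme == "cdfs" and url.netloc == "server1"
    assert url.path == "/images/2013/scan042.raw"
    is_hidden = url.path.rsplit("/", 1)[-1].startswith(".")  # hidden names start with "."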
4.3.2 Heartbeats
Several problems can cause loss of data or connectivity between the Master and
worker nodes. Problems are diagnosed using Heartbeat messages. Once every 10
minutes, worker nodes send a Heartbeat message to the Master node. The Heartbeat
message contains the worker's current status and problems, if any. The Master
periodically diagnoses the received Heartbeat messages to detect any problems and
rectify them if possible. If a worker does not send any Heartbeat message within 10
minutes, the Master marks the node as a temporary failure. If there is no Heartbeat
message from a worker within 30 minutes, the Master marks the worker as a permanent
failure. When a node is marked as a temporary failure, the data chunks assigned to the
worker are not replicated on other nodes; instead, the secondary replicas are marked
as primary. After a permanent failure, the Master node marks the node as dead. When
a node contacts the Master after recovering from a permanent failure, the Master
refreshes its metadata to reflect the change. This bookkeeping is sketched below.
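The following minimal Python sketch captures this bookkeeping. The thresholds mirror
the text (10 minutes for a temporary failure, 30 minutes for a permanent failure); the
class and field names are illustrative, not the CDFS implementation.

    # Sketch of the Master's Heartbeat bookkeeping (illustrative names).
    import time

    TEMP_FAILURE_S = 10 * 60   # no Heartbeat for 10 minutes -> temporary failure
    PERM_FAILURE_S = 30 * 60   # no Heartbeat for 30 minutes -> permanent failure

    class HeartbeatMonitor:
        def __init__(self):
            self.last_seen = {}   # worker id -> time of last Heartbeat
            self.state = {}       # worker id -> "alive" | "temporary" | "dead"

        def on_heartbeat(self, worker_id, status):
            self.last_seen[worker_id] = time.time()
            self.state[worker_id] = "alive"    # recovery refreshes the metadata

        def diagnose(self):
            now = time.time()
            for wid, seen in self.last_seen.items():
                if now - seen > PERM_FAILURE_S:
                    self.state[wid] = "dead"       # node marked dead
                elif now - seen > TEMP_FAILURE_S:
                    self.state[wid] = "temporary"  # promote secondary replicas only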
4.4 File Operations
CDFS has a simple interface. It supports the file operations create, write, read and
delete. In the next sections, we describe in detail what happens when each of these
operations is performed.
4.4.1 Create a File
Once the Master performs a remote sensing operation (taking an image or doing
a radar scan), it generates a large amount of sensor data. Initially this data is stored in
a local file on the Master node; the typical size of this file is about 100 MB. The Master
stores this file on the CubeSat cluster using CDFS in order to perform distributed
processing or distributed downlinking. The following actions are performed in sequence
when a CDFS file is created by the Master node. The operation requires a filename and
a chunk size as parameters. By default, the chunk size is 64 KB and the parameter is
optional.
1. The Master calculates the number of chunks based on the file size and chunk size (number of chunks = file size / chunk size, rounded up).
2. The Master generates chunk identifiers called chunk ids and assigns one to each chunk. A chunk id is an immutable id.
3. The Master assigns the chunks to worker nodes. Each chunk is assigned to one worker node in a round robin fashion. The copy of the chunk stored at the selected node is called the primary replica of the chunk.
4. The Master stores the above metadata (filename, number of chunks, chunk to chunk id mapping and chunk id to worker node mapping) in its permanent storage and communicates the same to the backup Master. A sketch of these steps follows.
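A compact Python sketch of these four steps is shown below; the helper names and
the use of UUIDs for chunk ids are assumptions for illustration.

    # Sketch of CDFS file creation on the Master (illustrative helpers).
    import math
    import uuid

    def create_file(filename, file_size, workers, chunk_size=64 * 1024):
        n_chunks = math.ceil(file_size / chunk_size)             # step 1
        chunk_ids = [uuid.uuid4().hex for _ in range(n_chunks)]  # step 2: immutable ids
        primary = {cid: workers[i % len(workers)]                # step 3: round robin
                   for i, cid in enumerate(chunk_ids)}
        return {"filename": filename, "n_chunks": n_chunks,     # step 4: metadata to be
                "chunk_ids": chunk_ids, "primary": primary}     # persisted and backed up

    meta = create_file("scan042.raw", 100 * 1024 * 1024, ["w1", "w2", "w3"])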
4.4.2 Writing to a File
The write operation is performed by the Master node when it wants to copy a local
file (on the Master) to CDFS. Files in CDFS are immutable: they can be written only
once, after they are created. The inputs for writing a file are the source filename on the
Master and the destination filename on CDFS. The following actions happen in
sequence when the Master writes a local file to a CDFS file.
1. For each chunk, the Master performs the actions described in steps 2 through 5.
2. The Master looks up the metadata of the destination file on CDFS to find the worker node responsible for storing the chunk.
3. The Master determines the transmission path (from the Master to the worker node) using a tree based routing algorithm.
4. From the nodes on the transmission path (excluding the Master and the destination worker node), the Master randomly picks a node to be the secondary replica of the chunk and notifies it.
5. The Master transmits the chunk to the primary replica node. While the chunk is being transmitted to the primary replica node, the secondary replica node copies the chunk and stores it in its local storage.
6. After storing all the chunks on the cluster, the Master commits the metadata to its memory and communicates the same to the Server.
4.4.3 Deleting a File
The following actions are performed in sequence when a file is deleted.
1. The Administrator issues a delete file command to the Server.
2. The Server uplinks the command to the Master CubeSat through a relay ground station.
3. The Master node looks up the metadata for the file and sends the delete chunk command to all the primary and secondary replica nodes.
4. Once a worker deletes its chunks, it sends an ACK to the Master.
5. Once ACKs are received from all worker CubeSats, the Master deletes the metadata for the file.
6. The Master CubeSat sends a SUCCESS message to the Server through a relay ground station.
4.5 Enhancements and Optimizations
CDFS serves well as distributed data storage on CubeSat Clusters. However,
CubeSats have stringent energy constraints and CubeSat clusters have severe
bandwidth constraints, so there is a pressing need to reduce energy and bandwidth
consumption. Below we describe the methods we employ to reduce energy and
bandwidth consumption.
4.5.1 Bandwidth and Energy Efficient Replication
To ensure the reliability of the stored data, CDFS uses redundancy. Each chunk
has three replicas stored on three different nodes, called replica nodes. However,
creating replicas consumes both energy and bandwidth, and for a CubeSat cluster both
are precious. To reduce energy and bandwidth consumption, the Master node (the
source node) can be used as the Super Replica Node, a node which stores a replica of
every chunk. Since the Master node performs the sensing and initially has all the data,
the implicit replicas on the Master node are created without any energy or bandwidth
consumption. Using the Master node as a Super Replica Node essentially means
that CDFS needs to create only two additional replicas. It also means that the Master
node should be equipped with sufficient storage to store all chunks, but this is a small
cost compared to the energy and bandwidth saved. The data from the source node is
accessed only if the other two replicas are not available, in order to conserve the power
of the source node.
For the additional two replicas, any random selection of worker nodes will achieve
reliability. But if the replica nodes are carefully selected, energy and bandwidth
consumption can be significantly reduced. Consider the two scenarios A and B
depicted in Figure 4-2. In scenario A, the chunk is replicated on nodes M (the Master
node), A and B2. In scenario B, the chunk is replicated on nodes M, B and B1. The
cost of communication (bandwidth and energy consumed) in the first scenario is 3
times the average link communication cost (from M to A and from M to B to B2). In the
second case, the cost is only 2 times the average link communication cost (from M to
B to B1). Storing a chunk on nodes that are on the same communication path, or on
nodes located close to each other, yields the best energy and bandwidth efficiency.
Exploiting this observation, we designed a novel method for providing reliability with low
power and bandwidth consumption. This technique is called Copy-on-transmit.
Figure 4-2. Bandwidth and energy efficient replication
When the source node transmits data to a destination, it goes through multiple
hops. Selected nodes on the communication path copy the data while it is being
transmitted. This method is very convenient for replicating data in wireless networks
without incurring additional energy or bandwidth consumption. Consider the scenarios
shown in Figure 4-3. In all cases, the source node M transmits data to the destination
node Z. Below we describe how we replicate data using copy-on-transmit for different
communication path lengths, for a replication factor of 3 (1 implicit replica and 2 explicit
replicas); a sketch of the selection logic follows the three cases.
4.5.1.1 Number of nodes on communication path = replication factor
In this case, we replicate the chunk on all nodes along the path, including the
source and the destination. When the chunk is being transmitted from Node M to Node
Z through Node A, Node A makes a copy of the chunk and stores it in its memory. Now
the chunk has three replicas, one each at M, A and Z.
4.5.1.2 Number of nodes on communication path > replication factor
In this case, we replicate the chunk on the Master node M, the destination node Z
and a random node on the path. When the chunk is being transmitted from Node M to
Node Z through Nodes A, B, C, D and E, Node C makes a copy of the chunk and stores
it in its memory. Now the chunk has three replicas, one each at M, C and Z.
4.5.1.3 Number of nodes on communication path < replication factor
In this case, we replicate the data on all nodes on the communication path (Node
M and Node Z) and on some additional nodes. This scenario has two different
sub-scenarios: (a) when the destination node is not a leaf node (it has children) and
(b) when the destination is a leaf node (no children). These two sub-scenarios are
discussed as Case 3(a) and Case 3(b) below.
Case 3(a) Destination node is not a leaf node: In this case, we first replicate the data
on all nodes along the path (Node M and Node Z). To meet the replication
requirement, the communication path is extended beyond the destination node Z to
store the data on Node A. This ensures that the required number of replicas exists.
Case 3(b) Destination node is a leaf node: In this case, we first replicate the data
on all nodes along the path (Node M and Node Z). To meet the replication
requirement, one more replica of the chunk needs to be created. The Master
randomly selects another node A and stores the chunk on it, ensuring that the
required number of replicas exists.
Figure 4-3. Copy on transmit
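The selection logic for the three cases can be sketched in Python as follows. The
routing tree interface (children_of) and the pool of fallback nodes are assumptions for
illustration; the copying itself happens while the chunk is in transit and is elided here.

    # Sketch of replica node selection for copy-on-transmit (replication factor 3).
    import random

    def select_replica_nodes(path, all_nodes, children_of, replication=3):
        """path: node ids from Master to destination, e.g. ["M", "A", "Z"]."""
        replicas = {path[0], path[-1]}       # implicit replica on M, primary on Z
        interior = path[1:-1]
        if len(path) == replication:         # case 1: every node on the path copies
            replicas.update(interior)
        elif len(path) > replication:        # case 2: one random relay node copies
            replicas.add(random.choice(interior))
        else:                                # case 3: path shorter than the factor
            extra = children_of(path[-1])    # 3(a): extend the path past Z
            pool = extra or [n for n in all_nodes if n not in replicas]  # 3(b): random
            replicas.add(random.choice(pool))
        return replicas

    # Case 1 from Figure 4-3: M -> A -> Z yields replicas at M, A and Z.
    assert select_replica_nodes(["M", "A", "Z"], ["M", "A", "Z"],
                                lambda n: []) == {"M", "A", "Z"}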
4.5.2 Load Balancing
The goal of load balancing is to distribute data to the nodes in the cluster so as
to balance one or several criteria, such as storage, processing, communication and
power consumption. When a file is created, the number of chunks assigned to a worker
node is proportional to the value of LBF(node), where LBF is the load balancing
function. Below we explain how we determine the load balancing function for uniform
storage, proportional storage and several other criteria. A custom load balancing
function can be used to perform load balancing according to the user's wishes.
However, it should be noted that distributing data for uniform storage might result in
uneven load balancing for processing or communication, and vice versa. In the
following, N is the total number of worker nodes in the cluster and LBF is the load
balancing function. The following load balancing functions are available in CDFS:
• Uniform storage / processing / communication per node: LBF(node) = 1 / N
• In proportion to storage capacity of node: LBF(node) = Storage capacity of the node / Total storage capacity of the cluster
• In proportion to processing capacity of node: LBF(node) = Processing power of node / Total processing power of the cluster
• In proportion to communication capacity of node: LBF(node) = Communication speed of node / Total communication speed of the cluster
• In proportion to power generation capacity of node: LBF(node) = Power generation capability of node / Total power generation capability of the cluster
• Hybrid: LBF(node) = a * LBF(node) for storage + b * LBF(node) for processing + c * LBF(node) for communication + d * LBF(node) for power, where a, b, c and d are normalized proportion coefficients and a + b + c + d = 1.
For missions that are processing intensive, it is desirable that the number of chunks
stored on a node be proportional to the node's processing power. For communication
intensive missions, it is desirable that the number of chunks stored on a node be
proportional to the communication capabilities of the node. For missions that are both
processing and communication intensive, the hybrid function can be used; a sketch of
hybrid allocation follows. Additionally, in order not to overload nodes, a cap on the
number of chunks stored per node per file is suggested.
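The following Python sketch shows hybrid allocation; the node attributes and the
weights a, b, c, d are illustrative values, not the CDFS data model.

    # Sketch: chunks per node proportional to a hybrid LBF (illustrative values).
    def hybrid_lbf(node, totals, a=0.4, b=0.3, c=0.2, d=0.1):
        return (a * node["storage"] / totals["storage"]
                + b * node["cpu"] / totals["cpu"]
                + c * node["link"] / totals["link"]
                + d * node["power"] / totals["power"])

    def allocate_chunks(n_chunks, nodes, totals):
        """Number of chunks per node is proportional to LBF(node)."""
        return {n["id"]: round(n_chunks * hybrid_lbf(n, totals)) for n in nodes}

    nodes = [{"id": "w1", "storage": 32, "cpu": 1.0, "link": 1.0, "power": 2.0},
             {"id": "w2", "storage": 64, "cpu": 1.0, "link": 1.0, "power": 2.0}]
    totals = {"storage": 96, "cpu": 2.0, "link": 2.0, "power": 4.0}
    plan = allocate_chunks(1600, nodes, totals)   # ~1600 chunks for a 100 MB file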
4.5.3 Chunk Size and Granularity
By splitting files into a large number of chunks, granularity is improved. Small
chunks ensure better storage balancing, especially for small files. However, as the
number of chunks increases, so do the amount of metadata, the number of metadata
operations and the number of control messages, which decreases system performance.
To strike a balance between the advantages of large chunks and the advantages of
granularity, we selected the chunk size to be about 64 KB. For example, a 100 MB file
split into 64 KB chunks yields 1,600 chunks.
4.5.4 Fault Tolerance
CDFS is designed to be tolerant of temporary and permanent CubeSat failures,
and its performance degrades gracefully with component, machine or link failures. A
CubeSat cluster can contain up to about a hundred CubeSats, interconnected with
roughly the same number of high speed wireless links. Because of the large number of
components and the harsh space environment, some CubeSats or wireless links may
face intermittent problems, and some may face fatal errors from which they cannot
recover unless hard reset by a ground station. The source of a problem can be in the
application, operating system, memory, connectors or networking. So failures should be
treated as the norm rather than the exception. To avoid system downtime or corruption
of data, the system should be designed to handle failures, and its performance should
degrade gracefully with failures. Below we discuss how we handle these errors when
they come up.
4.5.5 Master Failure
The Master node stores the metadata, which consists of the mapping of files to
chunks and of chunks to worker nodes. If the Master node fails, the mission fails. To
avoid mission failure in case of Master failure, the metadata is written to the Master's
non-volatile memory, such as flash, and the same is communicated to the Server. If the
Master reboots because of a temporary failure, a new copy is started from the last
known state stored in the Master's non-volatile memory. In case of a permanent Master
failure, worker nodes wait until a new Master resumes operation.
4.5.6 Worker Failure
Worker nodes send Heartbeat messages to the Master once every 10 minutes. If a
worker reports a fatal error, the Master marks the worker node as failed. If a worker
does not send a Heartbeat message within 10 minutes, the Master marks the node as a
temporary failure. If the Master does not receive a Heartbeat message within 30
minutes, it marks the worker as failed. Once a failed node comes back online, the
Master's metadata is refreshed to account for the change.
4.5.7 Chunk Corruption
The harsh space environment and cosmic rays lead to frequent memory corruption.
One of the computer systems of the Mars rover Curiosity had a memory problem
caused by high energy particles, which resulted in a major setback for the mission.
Thus, ensuring the integrity of the data stored on CDFS is of paramount importance.
CDFS uses checksums of the data to detect bad data. Performing data integrity
operations on an entire chunk is inefficient: if a chunk is found to be corrupt, discarding
the whole chunk leads to a lot of wasted IO, and it requires significant time and memory
to read the whole chunk (64 KB) to verify its integrity. Thus, each chunk is split into
blocks of 512 bytes. CDFS stores the CRC of each block of data and performs
checksum validation at the block level. When a read operation is performed on a
chunk, it is read block by block, and each block is verified for data integrity by
comparing the stored checksum with a newly computed checksum. This way, if one
of the blocks is found to be corrupt, only that block is marked bad and can be read
from another healthy replica of the chunk. Employing the data integrity check at the
block level ensures that partial IO or downlinking done before detecting the corruption
does not go to waste. It also increases the availability of the data. A sketch of this
block-level checking follows.
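A sketch of block-level integrity checking is shown below; zlib.crc32 stands in for
whichever CRC variant CDFS actually uses, which the text does not specify.

    # Sketch: per-block CRCs over 512-byte blocks of a 64 KB chunk.
    import zlib

    BLOCK = 512

    def checksum_chunk(chunk):
        return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

    def verify_chunk(chunk, stored_crcs):
        """Return indices of corrupt blocks; healthy blocks remain usable."""
        bad = []
        for i in range(0, len(chunk), BLOCK):
            if zlib.crc32(chunk[i:i + BLOCK]) != stored_crcs[i // BLOCK]:
                bad.append(i // BLOCK)   # only this block is re-read from a replica
        return bad

    chunk = bytes(64 * 1024)                # a 64 KB chunk
    crcs = checksum_chunk(chunk)
    assert verify_chunk(chunk, crcs) == []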
4.5.8 Inter CubeSat Link Failure
Owing to the harsh space environment, communication links fail often. If a CubeSat
to CubeSat link fails, the child node in the routing tree retries connecting to its parent. If
the link re-establishment is not successful, or the link quality is bad, the child node pings
its neighbours, searches for a new parent node and rejoins the cluster.
4.5.9 Network Partitioning
Sometimes a single CubeSat or several CubeSats may get separated from the
CubeSat cluster. This phenomenon is called network partitioning. In either case, the
data stored on the separated nodes is retained and remains available for downlinking
to the ground stations. Using the downlinked metadata, the separated CubeSats can be
contacted by the Server via ground stations to downlink the data.
4.6 Simulation Results
We simulated the CubeSat Distributed File System with a CubeSat cluster
consisting of one master node and 5 - 25 worker nodes. Each CubeSat has a processor
clocked at 1 GHz, 1 GB of RAM, 32 GB of flash storage, a 1 Mbps inter-cluster
communication link and a 9.6 kbps CubeSat to ground station data rate. Our simulation
results indicate that the file storing time for a 100 MB file on a cluster of size 10 is about
12.96 minutes. Since the file storing time is only a few minutes, it is negligible compared
to the file processing and file downlinking times, which are in hours.
4.7 Summary of CubeSat Distributed File System
We built the CubeSat Distributed File System to store large files in a distributed
fashion and thus enable distributed applications like CubeSat MapMerge [4] and
CubeSat Torrent [5] on CubeSat Clusters. It treats component and system failures as
the norm rather than the exception and is optimized for satellite images and remote
sensing data, which are huge by nature. CDFS provides fault tolerance through
constant monitoring, replication of crucial data and automatic recovery.
In CubeSat Clusters, network bandwidth and power are scarce resources. A
number of optimizations in our system are therefore targeted at reducing the amount of
data and control messages sent across the network. Copy-on-transmit enables making
replicas with little or no additional bandwidth or energy consumption. Failures are
detected using the Heartbeat mechanism. CDFS has built-in load balancers for several
use cases, like CubeSat MapMerge and CubeSat Torrent, and allows the use of
user-defined custom load balancers.
CHAPTER 5
DISTRIBUTED PROCESSING OF REMOTE SENSING IMAGES ON CUBESAT CLUSTERS
The processing power of a CubeSat is about 1 GHz. The lack of available power
and of active cooling for the microprocessor further restricts the available processing
power. As a result, processing intensive remote sensing applications cannot be
performed on individual CubeSats in a meaningful amount of time. Distributed
computing offers a solution to this problem: by pooling the processing power of the
individual CubeSats in a cluster, the processing of large remote sensing files can be
sped up. CubeSat Cloud uses CubeSat MapMerge to process remote sensing data on
CubeSat Clusters.
5.1 CubeSat MapMerge
CubeSat MapMerge is inspired by MapReduce and is tailored for CubeSat clusters.
The Master node orchestrates CubeSat MapMerge. The Master node commands the
worker nodes to process the chunks stored with them. Worker nodes process the
chunks and produce intermediate results. As soon as the workers process their chunks,
they downlink the partial solutions to the Server. Once the Server has all the results, it
stitches the intermediate solutions into the full solution. The Master node takes care of
scheduling map tasks, monitoring them and re-executing failed tasks. The worker nodes
execute the subtasks as directed by the Master. Figure 5-1 shows an overview of how
an image can be processed using CubeSat MapMerge, explained briefly in the following
steps.
1. The Master node splits the image into chunks and distributes them to the worker nodes in the cluster using CDFS.
2. Worker nodes process the splits given to them to produce partial solutions and downlink the solutions to the Server.
3. The Server stitches the downlinked partial solutions into the full solution. A minimal sketch of this pattern follows.
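A minimal, self-contained Python sketch of the MapMerge pattern: workers apply a map
function to their chunks and the Server merges the partial results. The byte-inversion
map is a stand-in for real image kernels such as Sobel or de-noise.

    # Sketch of the MapMerge pattern (toy map function, illustrative names).
    def map_chunk(chunk):
        return bytes(255 - b for b in chunk)      # worker-side "processing"

    def merge(partials):
        return b"".join(partials[i] for i in range(len(partials)))  # Server-side stitch

    image = bytes(range(12))
    chunks = [image[i:i + 4] for i in range(0, len(image), 4)]  # distributed via CDFS
    partials = {i: map_chunk(c) for i, c in enumerate(chunks)}  # computed on workers
    result = merge(partials)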
Figure 5-1. Example of CubeSat MapMerge
5.2 Command and Data Flow during a CubeSat MapMerge Job
Figure 5-2 shows the flow of data and commands during a CubeSat MapMerge
operation. When the Administrator issues a process command to the Server (e.g.,
process image.jpg), the following actions occur in the sequence noted.
1. Uplinking the command: The Administrator issues a command to the Server. The Server forwards the command to a ground station, which uplinks the command to the Master CubeSat (e.g., take an image of a particular area and process it).
2. Work assignment: The Master node commands the worker nodes to process the chunks stored with them and downlink the results.
Figure 5-2. Overview of execution of CubeSat MapMerge on CubeSat cluster
3. Map phase: Worker nodes process the chunks stored with them and store the results locally.
4. Downlinking the results: As and when a worker node processes a chunk, it downlinks the solution to a ground station. The ground station forwards the solution to the Server. Downlinking of results is achieved through CubeSat Torrent.
5. Reduce phase: Once the Server receives all the partial solutions, it stitches them into the full solution.
More details about CubeSat MapMerge are presented in the paper CubeSat
MapMerge [4].
5.3 Fault Tolerance, Failures, Granularity and Load Balancing
CubeSat MapMerge is tolerant to temporary and permanent CubeSat failures. Its
performance degrades gracefully with component, machine or link failures. Metadata is
replicated to avoid mission failure in case of failure of the Master node. Worker failures
are detected using the Heartbeat mechanism. If a worker node fails, the tasks assigned
to the worker node are rescheduled on other worker nodes. Data chunks are split
into a large number of pieces to improve granularity and load balancing. The chunk size
is selected to be about 64 KB in order to balance the advantages of granularity with the
control traffic overhead.
5.3.1 Fault Tolerance
CubeSat MapMerge is designed to be tolerant to temporary and permanent
CubeSat failures, and its performance degrades gracefully with component, machine or
link failures. A CubeSat cluster can contain up to about a hundred CubeSats,
interconnected with roughly the same number of high speed wireless links. Because of
the large number of components and the harsh space environment, some CubeSats or
wireless links may face intermittent problems, and some may face fatal errors from
which they cannot recover unless hard reset by a ground station. So failures should be
treated as the norm rather than the exception. To avoid system downtime or corruption
of data, the system should be designed to handle failures, and its performance should
degrade gracefully with failures. Below we discuss how we handle these errors when
they come up.
5.3.2 Master Failure
The Master node stores the metadata, which consists of the mapping of map jobs
to worker nodes and the state of the map jobs. To avoid mission failure in case of
failure of the Master node, the metadata is periodically written to the Master's
non-volatile memory, such as flash, and the same is communicated to the Server. If the
Master reboots because of a temporary failure, a new copy is started from the last
known state stored in the Master's non-volatile memory. If the Master cannot recover
from the error, the MapMerge mission is aborted and the raw data can be downlinked to
the Server.
5.3.3 Worker Failure
Worker nodes periodically send Heartbeat messages containing their status and
problems, if any. If a worker reports a fatal error, the Master marks the worker node as
failed. The processing task assigned to the worker node is reset back to its initial idle
state and is scheduled on another worker node containing a replica of the chunk.
5.3.4 Task Granularity and Load Balancing
By splitting the data into a large number of pieces, task granularity is improved.
CubeSats with a faster processor or special hardware like a GPU, DSP or FPGA can
process an order of magnitude more map tasks than a standard CubeSat. Fine task
granularity ensures better load balancing. However, as the number of chunks increases,
so do the metadata operations and control messages, leading to a decrease in system
performance. To balance the advantages of granularity with the control traffic overhead,
the chunk size is selected to be about 64 KB.
5.4 Simulation Results
We simulated CubeSat MapMerge with a CubeSat cluster consisting of one master
node and 5 - 25 worker nodes. Each CubeSat has a processor clocked at 1 GHz, 1 GB
of RAM, 32 GB of flash storage, a 1 Mbps inter-cluster communication link and a
9.6 kbps CubeSat to ground station data rate. We processed images using de-noise,
entropy, peak detection, segmentation and Sobel edge detection algorithms. We used
the Scikit Python image processing library for processing the images. Our simulations
indicate that CubeSat MapMerge, with cluster sizes in the range of 5 - 25 CubeSats,
can process images about 4.8 - 23.4 times faster than an individual CubeSat. These
results indicate that CubeSat MapMerge can speed up processing intensive remote
sensing missions by a factor of the size of the cluster. More detailed results are
presented in Chapter 8.
5.5 Summary of CubeSat MapMerge
CubeSat MapMerge is a very simple yet efficient distributed processing framework
for processing remote sensing images on CubeSat clusters. It treats node and link
failures as the norm rather than the exception and is optimized for processing remote
sensing images. It provides fault tolerance by constant monitoring, replication of crucial
data, and fast and automatic recovery. With the Heartbeat mechanism to detect failures
and redundant execution to recover from them, the design is fault tolerant. The optimal
chunk size balances the advantages of granularity with the control traffic overhead.
Load balancing takes nodes with multi-core processors, graphics processing units,
digital signal processors and FPGAs into account and distributes the data accordingly.
CubeSat MapMerge can speed up processing intensive remote sensing missions by a
factor of the size of the cluster.
CHAPTER 6
DISTRIBUTED COMMUNICATION OF REMOTE SENSING IMAGES FROM CUBESAT CLUSTERS
Due to stringent space constraints, CubeSats typically use monopole, dipole and
turnstile antennas. As a result, a typical CubeSat to ground station link has a data rate
of 9.6 kbps. Low speed data communication is one of the major bottlenecks for remote
sensing missions that require downlinking of large amounts of data. For emerging
remote sensing missions, the communication bottleneck poses a severe threat, as the
connectivity with ground stations is very limited and intermittent and comes at a very
high price. As a result, data intensive remote sensing applications cannot be performed
using individual CubeSats in a meaningful amount of time. Distributed communication
offers a solution to this problem: by pooling the communication resources of the
individual CubeSats in a cluster, the downlinking of large remote sensing images can
be sped up.
We studied CubeSat communication protocols including AX.25 [29] and the
CubeSat Space Protocol (CSP) [30]. All these protocols are point-to-point and do not
support any form of distributed communication for faster downloading of large data files
like images or videos. Currently there are no protocols for downloading data from
CubeSat clusters in a distributed fashion. So we designed CubeSat Torrent, based on
the Torrent communication protocol, to speed up remote sensing missions requiring
downlinking of large amounts of data. CubeSat Cloud uses CubeSat Torrent for
distributed downlinking of remote sensing data from CubeSat Clusters.
6.1 CubeSat Torrent
CubeSat Torrent [5] is a distributed communications framework inspired by Torrent
[31]. CubeSat Torrent works in the following way. The Master node plays the role of the
tracker: it keeps track of all the worker nodes in the cluster and their available downlink
capacity. When the Server requests a file to be downlinked, the Master node commands
the worker nodes in the cluster to downlink the chunks or partial solutions stored with
them. Worker nodes simultaneously downlink the chunks to various ground stations.
The ground stations forward the chunks to the Server. Once the Server receives all the
chunks, it stitches them together to regenerate the original file. Figure 6-1 shows an
overview of how CubeSat Torrent works.
6.2 Command and Data Flow During a Torrent Session
1. Uplinking the command: The Server sends a file downlink command to the ground station, which uplinks it to the Master.
2. Distributing the subcommands: The Master issues subcommands to the worker nodes storing the chunks of the file, instructing them to downlink the chunks.
3. Downlinking the chunks: When a worker gets a chunk downlink command, it reads the chunk from its local file system and starts downlinking it to the connected ground station.
4. Notification: Upon successfully downlinking a chunk, the worker notifies the Master and continues with the next chunk. This process repeats until all chunks are downlinked.
5. Forwarding the chunks: Once a ground station receives a chunk, it forwards the chunk to the Server.
6. Reconstructing the original file: Once all the chunks are downlinked to the Server, the Server stitches the chunks into the original image. A sketch of the Master's scheduling step follows.
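A schematic Python sketch of the Master's tracker role is shown below: the Master
hands each worker the list of chunks it must downlink, and the Server marks the file
complete once every chunk has arrived. Network I/O is elided and the names are
illustrative.

    # Sketch: tracker-style downlink scheduling on the Master (illustrative names).
    def schedule_downlinks(chunk_owners):
        """chunk_owners: chunk id -> worker id. Returns worker id -> [chunk ids]."""
        plan = {}
        for cid, wid in chunk_owners.items():
            plan.setdefault(wid, []).append(cid)
        return plan

    def downlink_complete(received_chunks, n_chunks):
        return len(received_chunks) == n_chunks   # checked on the Server

    plan = schedule_downlinks({0: "w1", 1: "w2", 2: "w1", 3: "w3"})
    # -> {"w1": [0, 2], "w2": [1], "w3": [3]}; workers downlink in parallel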
6.3 Enhancements and Optimizations
We made several enhancements and optimizations to CubeSat Cloud to improve
its performance. Below we present the enhancements, particularly for remote sensing
missions that require only downlinking of remote sensing data.
6.3.1 Improve Storage Reliability and Decrease Storage Overhead
CubeSat Cloud uses redundancy to provide reliability. Each chunk is replicated 3
times, so that even if a CubeSat fails or loses a chunk, the chunk is still available with
two other CubeSats. Replication provides access to raw data at each worker node so
that data can be processed before it is downlinked to the ground station. It also leads
to a lot of communication and storage overhead: replicating each chunk 3 times leads
to 200% storage overhead and a 10% - 25% communication and energy consumption
overhead. More details about the overhead resulting from replication are discussed
in the paper Distributed Data Storage for CubeSat Clusters [32]. For remote sensing
missions which only need to downlink the data, there is no advantage in having access
to the raw data, as worker nodes do not process it. This can be used as an opportunity
to reduce the storage and communication overhead. Once the Master performs
sensing, it creates chunks C1, C2, C3 . . . Cn of raw data. Then, based on the required
redundancy, it creates coded chunks C1’, C2’, C3’ . . . Cm, where m > n. The Master
node then distributes these chunks to the worker nodes. As long as n out of m chunks
are downlinked, the original image can be recovered. More details about the
performance analysis of source coding are discussed in Chapter 8.

Figure 6-1. Overview of CubeSat Torrent
6.3.2 Using Source Coding to Improve Downlink Time
Some worker nodes take an unusually long time to downlink a chunk. These nodes
are called straggler nodes. There can be several reasons for this, like a bad antenna,
cache failures, scheduling of intensive background tasks or a very low speed link. If
raw data is downlinked directly, the downlink is not complete until the last chunk is
downlinked to the Server. If a straggler node takes a very long time to downlink a
chunk, no matter how fast the other nodes downlink the rest of the chunks, the file
downlink will still be slowed down by the delay in downlinking of that chunk. To
mitigate the risk of a file downlink being slowed down by stragglers, CubeSat Cloud
uses duplicate downlinking of the last few chunks, as explained in CubeSat Torrent.
However, for missions that require only downlinking of remote sensing data, the
efficiency of this mitigation mechanism can be further improved by the use of source
coding. After the Master performs sensing, it creates chunks C1, C2, C3 . . . Cn of
raw data. Then, based on the required redundancy, it creates coded chunks C1’, C2’,
C3’ . . . Cm, where m > n. The Master node then distributes these chunks to the worker
nodes. When the Master receives the DOWNLINK command from the Server, it starts
downlinking the chunks C1’, C2’, . . . in the usual way until N - W chunks are downlinked
to the Server, where N is the number of chunks and W is the number of worker nodes
in the cluster. At that point, only W chunks need to be downlinked to the Server to
complete the file download. If any of the W chunk downloads takes unusually long, the
whole file download takes more time. In order to prevent the slowdown of file downlinks
due to stragglers, the Master schedules more than W chunk downloads, as sketched
below. As a result, even if straggler nodes slow down the downlinking of some chunks,
the required number of chunks (N) will be downlinked to the Server at the highest
possible speed. Once the Server receives N chunks, it undoes the source coding to
recreate the original file from the downlinked chunks. More details about the
performance analysis of source coding are discussed in Chapter 8.
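The final-phase over-scheduling can be sketched as follows; the duplication factor is
an illustrative assumption, not a value from the text.

    # Sketch: once N - W chunks are down, duplicate the remaining chunk downloads
    # across several workers so a single straggler cannot stall the file.
    def final_phase_schedule(pending_chunks, workers, copies=2):
        schedule = {}
        for i, cid in enumerate(pending_chunks):
            schedule[cid] = [workers[(i + k) % len(workers)] for k in range(copies)]
        return schedule

    plan = final_phase_schedule(["c97", "c98", "c99"], ["w1", "w2", "w3"])
    # -> each remaining chunk is downlinked by two different workers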
6.3.3 Improving the Quality of Service for Real-time Traffic Applications Like VoIP
Real-time traffic applications like VoIP need a high quality of service. Traditional
methods provide better quality of service through the use of forward error correction
and/or retransmission. Given that bandwidth is at a premium for CubeSat
communications, large amounts of forward error correction data mean high overhead
and thus less bandwidth for actual data. Retransmissions lead to increased downlink
times. Other methods for providing quality of service include the use of multiple
channels to send copies of packets, creating redundant transmissions. Although these
methods are computationally less intensive, they do not ensure resilience to losses and
they reduce the overall throughput of the system.
Consider a scenario where CubeSat Torrent is used for streaming data from the
Master (Sensor) node. As explained before, the Master node splits the raw data into
chunks. Let's suppose that the data frame (D) for time ti is split into chunks C1, C2,
C3 . . . Cn. The Master uses these chunks to create linearly coded chunks C1’, C2’,
C3’, . . . Cm, where m > n. The Master forwards the coded packets to the worker nodes,
which downlink them to the ground stations. In the process of this downlinking, some
packets are lost. The rest of the packets reach the Server, which then stitches them
back into D, the original data frame. The Server can recover D from the coded packets
as long as it receives at least n of them, that is, if at most m - n packets are lost in
transmission. If the Master node notices that more than r packets are being lost on their
way to the Server, it increases the redundancy by increasing m and thus increasing r; a
sketch of this adaptation follows. More details and results about our source coding
technique are presented in the paper Robust Communications for CubeSat Cluster
using Network Coding [33].
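The adaptation rule in the last step can be sketched as follows; the step size is an
illustrative choice, since the text only states that m is increased when more than r
packets are lost.

    # Sketch: widen the coding margin m - n when observed losses exceed it.
    def adapt_redundancy(n, m, losses_observed):
        if losses_observed > m - n:
            m = n + losses_observed + 1   # margin now exceeds the observed losses
        return m

    m = adapt_redundancy(n=10, m=12, losses_observed=4)   # -> 15 coded chunks/frame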
6.4 Fault Tolerance, Failures, Granularity and Load Balancing
CubeSat Torrent incorporates several mechanisms to make it tolerant to
temporary and permanent CubeSat failures. Its performance degrades gracefully
with communication link failures. Worker node failures are detected using the Heartbeat
mechanism. If a worker node fails, the downlink tasks assigned to the worker node
are rescheduled on other worker nodes. The chunk size is selected to be about 64 KB
in order to improve granularity and load balancing while balancing the advantages of
granularity with the metadata and control traffic overhead.
6.4.1 Fault Tolerance
CubeSat Torrent is designed to be tolerant to temporary and permanent CubeSat
failures, and its performance degrades gracefully with machine or link failures.
Failures are the norm rather than the exception. A cluster can contain up to a hundred
nodes and is connected to roughly the same number of ground stations through long
distance wireless links. The quantity and quality of the links virtually guarantee that
some links break intermittently and are not functional at any given time, and some will
not recover from their failures. Problems can be caused by human errors, CubeSat
mobility, bad antennas, communication system bugs, memory failures, connectors and
other networking hardware. Such failures can result in unavailable communication links
or can lead to data corruption. Therefore, constant monitoring, error detection, fault
tolerance and automatic recovery must be part of the system.
Below we discuss how we meet these challenges and how we resolve the problems
when they occur.
6.4.2 Master Failure
The Master writes periodic checkpoints of all the Master data structures. If the
Master task dies, a new copy is started from the last checkpointed state. The Master
node represents the single point of failure for CubeSat Torrent. To avoid mission failure
in case of failure of the Master node, the metadata is periodically written to the Master's
non-volatile memory, such as flash, and the same is communicated to the Server. If the
Master reboots because of a temporary failure, a new copy is started from the last
known state stored in the Master's non-volatile memory. In case of a permanent Master
failure, the data can still be downlinked from the worker nodes to the Server.
6.4.3 Worker Failure
Workers periodically send a Heartbeat message to the Master node. The Heartbeat
message contains the status of the worker and its problems, if any. If the Master does
not receive a Heartbeat message from a worker within 30 minutes, it marks the worker
as failed. The downlink task assigned to the worker is reset back to its initial idle state
and scheduled on another worker node. If a worker loses its connection with a ground
station, it retries with the same or a different ground station. If it cannot connect to any
ground station within a certain amount of time, it signals failure to the Master, and the
Master marks the worker as a temporarily failed node. If the worker still cannot connect
to a ground station, the Master marks the worker as a failed node and reschedules the
downlinking job assigned to the failed worker to another worker.
6.4.4 Task Granularity
The Master divides the file to be downloaded into C chunks. Ideally, C should be
much larger than the number of worker machines. Having each worker download many
different chunks improves dynamic load balancing. However, as C increases, so do
the amount of control traffic and the delays resulting from the exchange of control
information. In order to balance the advantages of granularity with the overhead
incurred due to control traffic, the chunk size is chosen to be about 64 KB.
6.4.5 Tail Effect and Backup Downloads
Some nodes take an unusually long time to downlink a chunk. These nodes are
called stragglers. The reasons behind them could be a bad antenna or a very low speed
link. To mitigate the risk of the downlinking or uplinking of a file being slowed down by
stragglers, CubeSat Torrent uses backup downloads. When a file downlink or uplink
operation is close to completion, the Master schedules backup downlinking tasks for the
remaining in-progress chunks. A chunk is marked as downlinked whenever either the
primary or the backup worker finishes downlinking it. This is only a design feature and
is not implemented.
6.5 Simulation Results and Summary of CubeSat Torrent
We simulated CubeSat Torrent on a CubeSat cluster consisting of one master,
5 - 25 workers and 5 - 25 ground stations. Each CubeSat has a processing speed of
1 GHz, 1 GB of RAM, 32 GB of flash storage and a ground communication data rate of
9.6 kbps. The CubeSats in the cluster are connected to each other through 1 Mbps high
speed inter-cluster communication links. Our simulation results indicate that CubeSat
Torrent, with cluster sizes in the range of 5 - 25 CubeSats, enables 4.71 - 22.93 times
faster downlinking of remote sensing data compared to a single CubeSat. CubeSat
Torrent can thus potentially speed up CubeSat missions requiring remote sensing data
downlinking by a factor of the size of the cluster.
CubeSat Torrent demonstrates the essential qualities needed for downlinking large
remote sensing data from CubeSat clusters. It is fault tolerant and scalable. It provides
fault tolerance by constant monitoring, replication of crucial data, and fast and automatic
recovery. The optimal chunk size balances the overhead from control message traffic
against the advantages of granularity. Checksumming is used to detect data corruption.
The proposed design delivers the high aggregate throughput required for a variety
of missions. We achieve this by splitting the file into chunks and downlinking them
in parallel from the workers to the ground stations. The simplified design and minimal
metadata operations result in very low overhead.
CHAPTER 7
SIMULATOR, EMULATOR AND PERFORMANCE ANALYSIS
For simulating and measuring the performance of CubeSat Cloud, we created
a CubeSat Cloud simulator. For verifying the simulation results, we also created
a CubeSat Cloud testbed consisting of 5 CubeSats. We used the Raspberry Pi mini
single-board computer to emulate a CubeSat and desktop computers to emulate the
Server and ground stations. Below is a detailed description of the CubeSat Cloud
simulator and emulator.
7.1 Hardware and Software of Master and Worker CubeSats for Emulator
The Master and workers are emulated using Raspberry Pis. The Raspberry Pi is a
mini single-board computer developed by the Raspberry Pi Foundation. Figure 7-1
shows the various components of the Raspberry Pi. It has a Broadcom BCM2835
system on a chip (SoC), 512 MB of RAM and uses an SD card for booting and
long-term storage. Debian and Arch Linux ARM distributions are available for the
Raspberry Pi. Python is the primary advocated programming language for the platform,
although BBC BASIC, C and Perl are also supported. Below are more detailed
specifications of the Raspberry Pi Model B single-board computer. We processed
images using de-noise, entropy, peak detection, segmentation and Sobel edge
detection algorithms, using the Scikit Python image processing library.
• Processor: The Raspberry Pi runs on a Broadcom BCM2835 SoC. The Broadcom chip includes an ARM1176JZFS processor clocked at 700 MHz, a floating point unit and a VideoCore 4 GPU.
• Graphics: With the VideoCore GPU, the Raspberry Pi enables hardware-accelerated graphics capable of rendering 1 Gpixel/s.
• SDRAM: The Model B comes with 512 MB of RAM, shared with the GPU. The RAM is generally clocked at 400 MHz to 500 MHz.
• Storage: There is no onboard bootable flash disk; instead it boots from a pluggable SD card. A minimum of 2 GB is required, but more than 4 GB is suggested.
• Power ratings: The Raspberry Pi draws about 300 mA (1 W) in idle mode and about 700 mA (2.2 W) when all peripherals are active.
• Ports: The Raspberry Pi comes with 10/100 BaseT Ethernet, HDMI and 2 USB ports. It is powered through a microUSB interface. Its size is roughly 9 x 6 x 2 cm.
• Low-level peripherals: It has 8 General Purpose IO (GPIO) pins, a UART, an I2C bus, an SPI bus with two chip selects, and I2S audio.
Figure 7-1. Raspberry Pi mini computer
Image courtesy of Matthew Murray
The specifications in terms of processing power and memory are very similar to
those of a CubeSat, so we used the Raspberry Pi to emulate a CubeSat in the CubeSat
Cloud testbed.
7.2 Hardware and Software of Server and Ground Station for Emulator
The Server and ground stations are implemented using Dell Optiplex 755
desktop computers. Below are the specifications of these machines:
• Processor: It comes with an Intel Core 2 Duo E8400 CPU (two cores) clocked at 3.00 GHz.
• Memory: It comes with 4 GiB of RAM.
• Graphics: It is powered by an RV610 graphics card.
• OS: It is configured to run Ubuntu 12.04.3 LTS (Precise Pangolin), 32-bit version.
• Disk: It has a 240 GB disk for the OS and permanent storage.
We used the open source Ubuntu 12.04.3 Long Term Support (LTS) release as the
base operating system for the Server and ground stations. For running Twisted
applications, we used Python 2.7.3, as Python 3.x did not yet have full support for the
Twisted framework. We used Twisted 11.1.0, installed from the Ubuntu python-twisted
package.
7.3 Network Programming Frameworks
In order to develop the CubeSat Cloud framework, we researched the network
programming frameworks available in Python, including Twisted, Eventlet, PyEv,
asyncore, Tornado and Concurrence. Below is a brief description of each of these
frameworks.
7.3.1 Twisted
Twisted is considered one of the best reactor frameworks available in Python. It is
somewhat complex and has a steep learning curve, but it is elegant and provides all
the features necessary for developing asynchronous applications.
7.3.2 Eventlet
Eventlet was developed by Linden Lab. It is based on the Greenlet framework and
is geared towards asynchronous network applications. It is not PEP 8 compliant,
though, its logging mechanism is not fully implemented, and its API is somewhat
inconsistent.
7.3.3 PyEv
PyEv is based on the libevent framework. It needs considerably more development
before it can be considered a serious competitor to the other network programming
frameworks, and no major companies appear to be using it at present.
7.3.4 Asyncore
Asyncore is part of the Python standard library and is a very low-level framework. It
offers little support for high-level network operations, so a lot of boilerplate code must
be written just to get started with network applications.
7.3.5 Tornado
Tornado is a very simple Python server meant for developing dynamic websites.
It features an asynchronous HTTP client and a simple ioloop. It is simple, but it does
not provide the callback features required to be considered a candidate for
implementing CubeSat Cloud.
7.3.6 Concurrence
Concurrence is a networking framework for creating massively concurrent
network applications in Python. It exposes a high-level synchronous API to low-level
asynchronous IO using libevent. It runs using either Stackless Python or Greenlets.
All blocking network I/O is transparently made asynchronous through a single libevent
loop, so it is nearly as efficient as a real asynchronous server. It is similar to Eventlet in
this way. The downside is that its API is quite different from Python’s sockets/threading
modules.
7.4 Twisted Framework
Of the frameworks we researched, Twisted was best suited for our needs, since it
provides handy features such as callbacks and deferreds, along with strong community
support. Twisted is an asynchronous, event-based network programming framework. It
is implemented in the Python programming language and released under the open
source MIT license. Callbacks are the core of the Twisted framework: users write
callbacks and register them to be called when events happen (e.g., when a connection
is made, a message is received, or a connection is lost).
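To illustrate this callback style, the sketch below shows a minimal Twisted TCP server
in the spirit of our chunk transfer services. The ChunkReceiver protocol and the port
number are illustrative placeholders, not the actual CubeSat Cloud source.

    # Minimal Twisted callback sketch (ChunkReceiver and port 8007 are
    # hypothetical; they are not part of the CubeSat Cloud code base).
    from twisted.internet import protocol, reactor

    class ChunkReceiver(protocol.Protocol):
        def connectionMade(self):
            # Callback: fired when a connection is made.
            print("peer connected")

        def dataReceived(self, data):
            # Callback: fired when a message (chunk bytes) arrives.
            print("received %d bytes" % len(data))

        def connectionLost(self, reason):
            # Callback: fired when the connection is lost.
            print("peer disconnected")

    factory = protocol.Factory()
    factory.protocol = ChunkReceiver
    reactor.listenTCP(8007, factory)
    reactor.run()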
7.5 Network Configuration
The CubeSat to ground station communication link is modelled with a data rate of 9600
bps and a delay of 2 ms with 200 us of jitter following a normal distribution. We
modelled the intra-cluster communication links using the specifications of RelNav [26]:
a data rate of 1 Mbps and a link delay of 0.1 ms. The packet loss rate was set at 0.3%,
with a 25% loss correlation in order to simulate bursty packet losses. We used
Hierarchical Token Bucket (HTB) queueing and the tc networking tool on Linux to
shape the network traffic to these requirements.
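As a concrete example, the sketch below applies this shaping with HTB and netem via
tc. It is a minimal sketch assuming a Linux host with root privileges and an emulated
downlink on interface eth0 (the interface name is our assumption); the exact qdisc
layout used in the testbed is not reproduced here.

    # Shape one emulated CubeSat-to-ground link: 9600 bps rate (HTB),
    # 2 ms delay with 200 us normally distributed jitter, and 0.3% loss
    # with 25% burst correlation (netem).
    import subprocess

    TC_COMMANDS = [
        "tc qdisc add dev eth0 root handle 1: htb default 1",
        "tc class add dev eth0 parent 1: classid 1:1 htb rate 9600bit",
        "tc qdisc add dev eth0 parent 1:1 handle 10: netem "
        "delay 2ms 200us distribution normal loss 0.3% 25%",
    ]

    for cmd in TC_COMMANDS:
        subprocess.check_call(cmd.split())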
7.6 CubeSat Cloud Emulator Setup
The CubeSat Cloud emulator consists of one Server, one Master, 5 Worker nodes
and 5 ground stations, as shown in Figure 7-2. The Master and Worker CubeSats are
emulated using Raspberry Pis, since a CubeSat's footprint (processing power and
RAM) matches that of the Raspberry Pi. The Server and ground stations are emulated
using the Dell Optiplex computers. All components are connected using a Gigabit
Ethernet switch, and the CubeSat to CubeSat and CubeSat to ground station
communication links are configured as described in Section 7.5.
7.7 CubeSat Cloud Simulator Setup
The CubeSat Cloud simulator consists of one Server, one Master, and 5 - 25
Worker nodes and ground stations. The system architecture of the CubeSat Cloud
simulator is shown in Figure 7-3. The simulation is run on the Dell Optiplex computer
described in Section 7.2. The Master and Worker CubeSats are simulated using the
profiling results obtained from the emulator. Components communicate with each other
using TCP/IP sockets on the localhost interface. The CubeSat to CubeSat and CubeSat
to ground station communication links are configured as described in Section 7.5.
Simulation results are presented below.
Figure 7-2. CubeSat Cloud emulator
Figure 7-3. CubeSat Cloud simulator
7.8 CubeSat Reliability Model
Data reliability is achieved through replication. Each remote sensing image is
split into chunks and distributed to the worker nodes, and each chunk is replicated on
multiple CubeSats so that if some CubeSats fail, the data is still available on others.
The number of replicas per chunk is governed primarily by the required availability of
the data and the node failure rate. The availability of an image (A), in percent, is given
by

A = (1 - f^R)^C × 100

where f is the probability of failure of a node, R is the number of replicas of each
chunk, and C is the number of chunks in the file. To find the CubeSat failure
probability, we collected data about the lifetimes of the CubeSats launched so far.
Figure 7-4 shows a summary of the lifetimes of launched CubeSats; more details can
be obtained from "A Survey of Communication Sub-systems for Inter-satellite Linked
Systems and CubeSat Missions" [34]. Using this data, we calculated that the mean
lifetime of a CubeSat is about 1204 days. Depending on the downlink speed and the
mission data size (about 100 MB), a remote sensing mission can take about 1 day, so
the probability of failure of a CubeSat during a mission (f) is about 10^-3. The typical
number of chunks per file (C) is about 1000. With a redundancy of 1 (2 replicas of each
chunk) CDFS provides an availability of 99.98%, and with a redundancy of 2 (3
replicas) it provides an availability of 99.9999%. We targeted an availability of
99.9999%, so each chunk needs to be replicated 3 times.
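The sketch below evaluates the availability formula for the parameters above; it is a
simple numerical check, not part of the CDFS implementation.

    # Availability A = (1 - f^R)^C * 100, with f = per-mission node
    # failure probability, R = replicas per chunk, C = chunks per file.
    def availability(f, R, C):
        return (1.0 - f ** R) ** C * 100.0

    f, C = 1e-3, 1000
    print(availability(f, 2, C))  # redundancy of 1 (2 replicas)
    print(availability(f, 3, C))  # redundancy of 2 (3 replicas)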
7.9 Simulation and Emulation Results
7.9.1 Profiling Reading and Writing of Remote Sensing Data Chunks on Raspberry Pi
In order to build the simulation framework, we profiled the reading and writing of
remote sensing data chunks on the Raspberry Pi's flash storage. The profiling results
are reported in Figure 7-5. The average reading and writing times for a 64 KB chunk
are 4.91 ms and 15.66 ms, respectively.
Figure 7-4. Lifetimes of CubeSats
These results show that reading and writing a 100 MB file take about 8 and 25
seconds, respectively. Processing and downlinking the same image, by contrast, take
on the order of hours, so reading and writing times are negligible compared to the time
taken to process and downlink a remote sensing image.
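A minimal version of this profiling loop is sketched below. The file name and iteration
count are ours, and on a real SD card the os.fsync call is needed so that writes are not
merely cached; cached reads will look faster than raw flash reads.

    # Profile average 64 KB chunk write and read times on flash storage.
    import os, time

    CHUNK = os.urandom(64 * 1024)
    N = 100

    start = time.time()
    with open("chunks.bin", "wb") as f:
        for _ in range(N):
            f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())  # force each chunk onto the SD card
    write_ms = (time.time() - start) / N * 1000.0

    start = time.time()
    with open("chunks.bin", "rb") as f:
        while f.read(64 * 1024):
            pass
    read_ms = (time.time() - start) / N * 1000.0

    print("avg write %.2f ms, avg read %.2f ms" % (write_ms, read_ms))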
7.9.2 Processing, CubeSat to CubeSat and CubeSat to Ground Station Chunk Communication Time
Figure 7-5. Read and write times of a chunk
We profiled the chunk processing time, the CubeSat to CubeSat communication
time and the CubeSat to ground station communication time. We processed image
chunks using de-noise, entropy, peak detection, segmentation and Sobel edge
detection algorithms from the scikit-image Python image processing library; the
processing time is the average time taken by the Raspberry Pi to process a chunk
using these algorithms. Communication links are simulated using the parameters
specified in Section 7.5. The profiling results are reported in Figure 7-6.
The average CubeSat to CubeSat chunk communication time is about 1.19 seconds,
the average processing time is 15.62 seconds, and the average CubeSat to ground
station communication time is about 68.29 seconds. These results show that the chunk
processing time and the chunk communication time from CubeSat to ground station
are more than an order of magnitude larger than the chunk communication time
between CubeSats. As a result, distributing a file over the cluster is much faster than
processing or downlinking it. These results also indicate that processing an image of
size 100 MB on a single CubeSat will take about 7 hours and downlinking it will take
about 30 hours. Hence we need to parallelize the processing and downlinking of
remote sensing images.
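The sketch below illustrates this kind of per-chunk profiling and extrapolation. It is a
simplified stand-in: the chunk is a synthetic 256 x 256 image rather than real sensor
data, and only two of the five algorithms are shown.

    # Time two scikit-image operations on one chunk and extrapolate to
    # a 100 MB image split into 64 KB chunks (~1600 chunks).
    import time
    import numpy as np
    from skimage import filters
    from skimage.restoration import denoise_tv_chambolle

    chunk = np.random.rand(256, 256)  # synthetic stand-in for a chunk

    start = time.time()
    filters.sobel(chunk)              # Sobel edge detection
    denoise_tv_chambolle(chunk)       # de-noising
    per_chunk_s = time.time() - start

    chunks = (100 * 1024 * 1024) // (64 * 1024)
    print("single-CubeSat processing estimate: %.1f hours"
          % (per_chunk_s * chunks / 3600.0))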
7.9.3 Storing Remote Sensing Images using CubeSat Cloud
Figure 7-6. CubeSat to CubeSat and CubeSat to ground station chunk communication profiling
Figure 7-7 shows the time taken for storing an image (splitting the image into
chunks and distributing the chunks to the worker nodes) on the CubeSat
cluster for various cluster and image sizes. For a cluster size of 1 (a single CubeSat),
the file storing time is almost zero (11 seconds for a file of size 100 MB), since the file
only needs to be split into chunks and does not need to be distributed over the network.
The average storing time for a 100 MB file on a cluster of size 10 is about 12.96
minutes, roughly the time needed to push 100 MB over the 1 Mbps inter-satellite link
(about 13.3 minutes). Since the file storing time is only a few minutes, it is negligible
compared to the file processing and downlinking times, which are measured in hours.
7.9.4 Processing Remote Sensing Images using CubeSat Cloud
Figure 7-7. File distribution time for various file sizes and cluster sizes
Figure 7-8 shows the image processing times for various cluster and image sizes.
Images were processed using the same de-noise, entropy, peak detection,
segmentation and Sobel edge detection algorithms described in Section 7.9.2, and the
processing time is the average time taken by CubeSat Cloud to process a remote
sensing image using those algorithms. For a cluster size of 1 (a single CubeSat), the
file processing time is 448 minutes. The average processing times for a 100 MB file on
clusters of size 10 and 25 are about 47 and 19 minutes respectively, a savings of 401
minutes on a cluster of size 10 and 429 minutes on a cluster of size 25. CubeSat
MapMerge thus reduces the processing time from about 8 hours to less than an hour,
making it attractive for processing large remote sensing images.
Figure 7-8. File processing time for various file sizes and cluster sizes
7.9.5 Speedup and Efficiency of CubeSat MapMerge
We studied the variation of the speedup and efficiency of CubeSat MapMerge with
cluster size. Speedup is defined as the ratio of the time taken by a single CubeSat to
process an image to the time taken by the cluster to process the same image.
Efficiency is defined as the ratio of the speedup of the cluster to the cluster size,
expressed as a percentage. Figure 7-9 shows the variation of processing speedup with
cluster size for large files (>10 MB): for cluster sizes of 10 and 25, the speedup is 9.54
and 23.40 respectively. Figure 7-10 shows the variation of processing efficiency with
cluster size for large files (>10 MB): for cluster sizes of 10 and 25, the efficiency is
95.38% and 93.61% respectively.
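These definitions are straightforward to compute from the measured times; the short
sketch below reproduces the cluster-of-10 and cluster-of-25 figures from the
448-minute single-CubeSat baseline.

    # Speedup = T_single / T_cluster; efficiency = speedup / n * 100.
    def speedup(t_single, t_cluster):
        return t_single / t_cluster

    def efficiency(t_single, t_cluster, n):
        return speedup(t_single, t_cluster) / n * 100.0

    T1 = 448.0  # minutes on a single CubeSat
    print(speedup(T1, 47.0), efficiency(T1, 47.0, 10))  # ~9.5, ~95%
    print(speedup(T1, 19.0), efficiency(T1, 19.0, 25))  # ~23.6, ~94%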
Figure 7-9. Speedup of CubeSat MapMerge
7.9.6 Downlinking Remote Sensing Images Using CubeSat Cloud
Figure 7-10. Efficiency of CubeSat MapMerge
Figure 7-11 shows the image downlinking time for various cluster and file sizes.
The downlinking time is the time taken by a CubeSat or CubeSat cluster to downlink
a remote sensing image to the Server. A single CubeSat takes 1 day and 6 hours of
connectivity time to downlink a file of size 100 MB. In comparison, the average
downlinking time for a 100 MB file on a cluster of size 10 is only about 3 hours and
13 minutes of connectivity, a savings of about 27 hours. CubeSat Torrent reduces the
image downlinking time approximately by a factor of the size of the cluster.
7.9.7 Speedup and Efficiency of CubeSat Torrent
Figure 7-11. File downlinking time for various file sizes and cluster sizes
We studied the variation of the speedup and efficiency of CubeSat Torrent with
cluster size. Speedup is defined as the ratio of the time taken by a single CubeSat to
downlink an image to the time taken by the cluster to downlink the same image.
Efficiency is defined as the ratio of the total effective data rate of the cluster to the total
raw data rate of the cluster, expressed as a percentage. Figure 7-12 shows the
variation of downlinking speedup with cluster size for large files (>10 MB): for cluster
sizes of 10 and 25, the speedup is 9.35 and 22.93 respectively. Figure 7-13 shows the
variation of downlinking efficiency with cluster size for large files (>10 MB): for cluster
sizes of 10 and 25, the efficiency is 71.95% and 70.59% respectively.
7.9.8 Copy On Transmit Overhead
Figure 7-14 shows the bandwidth overhead of replication using Copy-On-Transmit
for various cluster sizes. The energy overhead is the same as the bandwidth overhead.
The bandwidth overhead of Copy-On-Transmit for cluster sizes of 10 and 25 is 35.71%
and 9.61% respectively. Copy-On-Transmit leads to a 200% storage overhead, as it
creates two explicit replicas.
7.9.9 Source Coding Overhead
Figure 7-15 shows the bandwidth overhead of Source Coding for single and
double redundancy for various cluster sizes. With single redundancy, data can be
recovered if one CubeSat fails; with double redundancy, data can be recovered even if
two CubeSats fail. The bandwidth overhead of Source Coding for cluster sizes of 10
and 25 varies from about 5 - 25%, depending on the number of redundant chunks and
the cluster size. The energy overhead is the same as the bandwidth overhead.
Figure 7-12. Speedup of CubeSat Torrent
7.9.10 Metadata and Control Traffic Overhead
Figure 7-16 shows the bandwidth overhead due to metadata and other control
information for various cluster and file sizes. The bandwidth overhead is about
0.4 - 1%. The overhead percentage is mostly independent of the file size and varies
primarily with the cluster size.
7.9.11 Comparison of CDFS with GFS and HDFS
Figure 7-13. Efficiency of CubeSat Torrent
Bandwidth and energy are very limited on a CubeSat cluster. CDFS uses several
enhancements, such as using the Master node as a super-replica node,
Copy-On-Transmit and linear block source coding, to reduce energy and bandwidth
consumption. Figure 7-17 shows the bandwidth required by CDFS and GFS (as well
as HDFS) for writing a 100 MB file to the cluster: CDFS consumes about 35 - 40% less
bandwidth than GFS and HDFS. Figure 7-18 shows the time taken by CDFS and GFS
(as well as HDFS) for writing a 100 MB file to the cluster: CDFS writes are about 50%
faster than GFS and HDFS because of the super-replica node and the reduced
bandwidth requirements. Figure 7-19 shows the energy required by CDFS and GFS
(as well as HDFS) for writing a 100 MB file to the cluster: CDFS consumes about 40%
less energy than GFS and HDFS.
7.9.12 Simulator vs Emulator
Figure 7-20 shows the time required for writing, processing and downlinking a
remote sensing image of size 100 MB. The simulator results are about 5 - 12% higher
than the emulator results. This discrepancy might be attributed to delays in the
simulation framework caused by the large number of threads running simultaneously.
Figure 7-14. Bandwidth overhead due to replication
Figure 7-15. Bandwidth overhead due to source coding
Figure 7-16. Bandwidth and energy overhead
Figure 7-17. Bandwidth consumption of CDFS vs GFS and HDFS
Figure 7-18. Write time of CDFS vs GFS and HDFS
Figure 7-19. Energy consumption of CDFS vs GFS and HDFS
Figure 7-20. Simulator vs emulator
7.10 Summary of Simulation Results
We simulated the CubeSat Cloud framework and verified it against the CubeSat
Cloud testbed. The CubeSat Cloud framework was developed using the Python
programming language. We simulated CubeSat Torrent on a CubeSat cluster
consisting of one Master, 5 - 25 Workers and 5 - 25 ground stations. Each CubeSat
has a processor running at 1 GHz, 1 GB of RAM, 32 GB of non-volatile memory, a
1 Mbps inter-cluster communication link and a 9.6 kbps ground station data rate. The
Server and ground stations are connected to each other via the Internet through
10 Mbps communication links.
We simulated CubeSat Cloud with various cluster sizes. Our simulation results
indicate that for cluster sizes in the range of 5 to 25 CubeSats, remote sensing images
can be processed and downlinked 4.75 - 23.15 times faster than on a single CubeSat.
The simulation results closely match the results from the testbed.
CHAPTER 8
SUMMARY AND FUTURE WORK
Weight, power and geometry constraints severely limit the processing and
communication capabilities of a CubeSat. A CubeSat has about 1 GHz of processing
capability, 1 GB of RAM, 32 - 64 GB of flash memory and a CubeSat to ground station
data rate of 9.6 kbps. As a result, processing- and communication-intensive remote
sensing missions, which generate about 100 MB per sensing operation, cannot be
completed in a meaningful amount of time: processing a 100 MB remote sensing
image takes about 8 hours, and downlinking it takes a day and a quarter with the
current infrastructure. We consider the possibility of using distributed storage,
processing and communication for faster execution of remote sensing missions.
We propose CubeSat Cloud, a framework for distributed storage, processing
and communication of remote sensing data on CubeSat clusters. CubeSat Cloud is
optimized for storing, processing and downlinking large remote sensing data sets, on
the order of hundreds of megabytes. CubeSat Cloud uses the CubeSat Distributed
File System (CDFS) for storing remote sensing data in a distributed fashion on the
cluster. CDFS splits the large remote sensing data into chunks and distributes them to
the worker nodes in the cluster; metadata consisting of the file-to-chunk mapping and
the chunk-to-worker mapping is stored on the Master node. For processing the
distributed data, CubeSat Cloud uses CubeSat MapMerge: worker nodes process the
chunks stored with them and store the results on their local file systems. Once the
chunks are processed, they are downlinked to the Server using CubeSat Torrent, and
the Server stitches the partial results into the full solution. Component and link failures
are treated as the norm rather than the exception: failures are detected using a
heartbeat mechanism, and the system is tolerant to both component and link failures.
CubeSat Cloud implements several enhancements, including Copy-On-Transmit and
linear block source coding, to reduce the consumption of scarce resources such as
power and bandwidth.
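As an illustration of the heartbeat mechanism, the sketch below sends periodic
heartbeats over UDP using Twisted's LoopingCall. The interval, message format and
Master address are our assumptions for illustration, not the flight design.

    # Worker-side heartbeat sender (the Master address 10.0.0.1 and the
    # 10 s interval are hypothetical values chosen for this sketch).
    from twisted.internet import protocol, reactor, task

    class HeartbeatSender(protocol.DatagramProtocol):
        def startProtocol(self):
            # Start a repeating call that fires every 10 seconds.
            self.loop = task.LoopingCall(self.beat)
            self.loop.start(10.0)

        def beat(self):
            # One heartbeat datagram to the Master node.
            self.transport.write(b"HEARTBEAT", ("10.0.0.1", 9999))

    reactor.listenUDP(0, HeartbeatSender())
    reactor.run()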
For evaluating CubeSat Cloud, we created the CubeSat Cloud testbed. CubeSats
were emulated using Raspberry Pis, and the testbed software was written using
python-twisted, an event-based asynchronous network programming framework.
Simulation results indicate that CubeSat MapMerge and CubeSat Torrent, with cluster
sizes in the range of 5 - 25 CubeSats, enable 4.75 - 23.15 times faster processing and
downlinking of large remote sensing data compared to a single CubeSat. This speedup
is achieved with almost negligible bandwidth and memory overhead (about 1%). These
results indicate that CubeSat Cloud can speed up remote sensing missions by a factor
approaching the size of the cluster.
8.1 Future Work
Below is an overview of future work extending CubeSat Cloud. Launching and
deploying CubeSats into a CubeSat cluster, and maintaining the cluster over long time
periods, need to be investigated. CubeSat Cloud was designed using the Python
programming language to support rapid prototyping; a flight-ready system could be
built using C++, with the network stack optimized for the characteristics of the CubeSat
communication channel. A link layer communication protocol could be integrated into
CubeSat Torrent to improve the efficiency of the downloads. From a CubeSat
subsystems perspective, a lightweight short-distance, high-speed laser communication
module for CubeSat to CubeSat links would significantly enhance the efficiency of the
system and reduce energy consumption.
REFERENCES
[1] H. Heidt, J. Puig-Suari, A. Moore and R. Twiggs, "CubeSat: A New Generation of Picosatellite for Education and Industry Low-Cost Space Experimentation," Proceedings of the Utah State University Small Satellite Conference, Logan, UT, Citeseer, p. 12, 2001.
[2] Andrew E. Kalman (2010, Jan 15), "CubeSat Kit: Commercial Off-the-Shelf Components for CubeSats," Retrieved July 16, 2012, from http://www.cubesatkit.com/docs/datasheet/.
[3] J. Gozalvez, "Smartphones Sent Into Space [Mobile Radio]," IEEE Vehicular Technology Magazine, vol. 8, no. 3, pp. 13-18, 2013.
[4] Obulapathi N. Challa and Janise Y. McNair, "Distributed Computing on CubeSat Clusters using MapReduce," iCubeSat, The Interplanetary CubeSat Workshop, 2012.
[5] Obulapathi N. Challa and Janise Y. McNair, "CubeSat Torrent: Torrent-like Distributed Communications for CubeSat Satellite Clusters," Military Communications Conference, pp. 1-6, 2012.
[6] D. E. Koelle and R. Janovsky, "Development and Transportation Costs of Space Launch Systems," DGLR/CEAS European Air and Space Conference, 2007.
[7] Kirk Woellert, Pascale Ehrenfreund, Antonio J. Ricco and Henry Hertzfeld, "CubeSats: Cost-effective Science and Technology Platforms for Emerging and Developing Nations," Advances in Space Research, vol. 47, no. 4, pp. 663-684, 2011.
[8] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[9] S. Lee and J. Puig-Suari, Coordination of Multiple CubeSats on the Dnepr Launch Vehicle, M.S. Thesis, California Polytechnic State University, December 2006.
[10] B. Klofas, J. Anderson and K. Leveque, "A Survey of CubeSat Communication Systems," 5th Annual CubeSat Workshop, Cal Poly, 2008.
[11] MoreDBs team at Cal Poly, "Massive Operations, Recording, and Experimentation Database System (2011, April 15)," Retrieved July 16, 2012, from http://moredbs.atl.calpoly.edu/, 2008.
[12] Norman G. Fitz-Coy, "Space Systems Group (SSG) (2008, Aug 26)," Retrieved July 16, 2012, from http://www2.mae.ufl.edu/ssg/.
[13] Janise Y. McNair, "Wireless and Mobile Systems Laboratory (WAM) (2008, Aug 26)," Retrieved July 16, 2012, from http://www.wam.ece.ufl.edu/.
[14] Tzu Yu Lin, Takashi Hiramatsu, Narendran Sivasubramanian and Norman G. Fitz-Coy, "T-C3: A Cloud Computing Architecture for Spacecraft Telemetry Collection," Retrieved July 16, 2012, from http://www.swampsat.com/tc3, 2011.
[15] GENSO Consortium, "Global Educational Network for Satellite Operations (2009, Jun 20)," Retrieved July 16, 2012, from http://www.genso.org/, 2009.
[16] R. Scrofano, P. R. Anderson, J. P. Seidel, J. D. Train, G. H. Wang, L. R. Abramowitz, J. A. Bannister and D. Borgeson, "Space-based Local Area Network," Military Communications Conference, 2009, pp. 1-7, 2009.
[17] Nestor Voronka, Tyrel Newton, Alan Chandler and Peter Gagnon, "Improving CubeSat Communications," CubeSat Developers Workshop, Cal Poly, 2013.
[18] Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung, "The Google File System," SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 29-43, 2003.
[19] K. Shvachko, Hairong Kuang, S. Radia and R. Chansler, "The Hadoop Distributed File System," IEEE Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-10, 2010.
[20] Mahadev Satyanarayanan, J. J. Kistler, P. Kumar, M. E. Okasaki, E. H. Siegel and D. C. Steere, "Coda: A Highly Available File System for a Distributed Workstation Environment," IEEE Transactions on Computers, pp. 447-459, 1990.
[21] Sun Jian, Li Zhan-huai and Zhang Xiao, "The Performance Optimization of Lustre File System," 7th International Conference on Computer Science Education (ICCSE), pp. 214-217, 2012.
[22] Apache Software Foundation, "Apache Thrift," Retrieved July 16, 2012, from http://thrift.apache.org/, January 2012.
[23] Apache Software Foundation (2012, Feb 6), "HDFS: Hadoop Distributed File System," Retrieved July 16, 2012, from http://hadoop.apache.org/, June 2012.
[24] "Florida University SATellite V (FUNSAT V) Competition," Retrieved July 16, 2012, from https://vivo.ufl.edu/display/n958538186, 2009.
[25] "Tethers SDR (2012): Software Defined Radio (SWIFT SDR) Based Communication Downlinks for CubeSats," Retrieved July 16, 2012, from http://goo.gl/Q5fut, 2012.
[26] "RelNav: Relative Navigation, Timing and Data Communications for CubeSat Clusters," Retrieved July 16, 2012, from http://www.tethers.com/SpecSheets/RelNavSheet.pdf.
[27] Paul Muri, Obulapathi N. Challa and Janise Y. McNair, "Enhancing Small Satellite Communication Through Effective Antenna System Design," Military Communications Conference, 2010, pp. 347-352, 2010.
[28] R. Russell, D. Quinlan and C. Yeoh, "Filesystem Hierarchy Standard," Retrieved July 16, 2012, from http://refspecs.linuxfoundation.org/FHS_2.3/fhs-2.3.pdf, January 2003.
[29] W. A. Beech, D. E. Nielsen and J. Taylor, "AX.25 Link Access Protocol for Amateur Packet Radio," Retrieved July 16, 2012, from http://www.tapr.org/pdf/AX25.2.2.pdf, 1998.
[30] "CubeSat Space Protocol: A Small Network-layer Delivery Protocol Designed for CubeSats," Retrieved July 16, 2012, from https://github.com/GomSpace/libcsp, April 2010.
[31] B. Cohen, "The BitTorrent Protocol Specification Standard," Retrieved July 16, 2012, from http://www.bittorrent.org/beps/bep_0003.html, January 2008.
[32] Obulapathi N. Challa and Janise Y. McNair, "Distributed Data Storage on CubeSat Clusters," Advances in Computing, pp. 36-49, 2013.
[33] Gokul Bhat, Obulapathi Challa, Paul Muri and Janise McNair, "Robust Communications for CubeSat Cluster using Network Coding," 3rd Interplanetary CubeSat Workshop, 2013.
[34] Paul Muri and Janise McNair, "A Survey of Communication Sub-systems for Intersatellite Linked Systems and CubeSat Missions," Journal of Communications, vol. 7, no. 4, pp. 290-308, 2012.
BIOGRAPHICAL SKETCH
Dr. Obulapathi N. Challa was born and brought up in India. He received a B.S. in
Information and Communication Technology from DA-IICT in India, and an M.S. in
Computer Engineering and a Ph.D. in Cloud Computing from the University of Florida.
He worked as a Research Assistant with Dr. Janise McNair and was a part of the
Wireless and Mobile Laboratory and the Small Satellite Group at the University of
Florida. His interests include Cloud Computing, Big Data, Small Satellites, Open
Source and Distributed Systems.