High Performance Computing, Cloud Computing, and Remote Sensing Big Data
Lizhe Wang
Institute of Remote Sensing and Digital Earth (RADI), Chinese Academy of Sciences (CAS)
email: [email protected]
March 2016, Canberra, Australia
Lizhe Wang HPC, Clouds, RS Big Data
Contents
1 RS Big Data: an Overview
2 How Big? a Quantitative Model
3 Software Systems for High Performance RS Big Data Processing
    Global Distributed RS Data Model in Memory
    Generic Parallel Processing Model for RS Big Data
    Parallel Storage Model for RS Big Data
    Scheduling of Large-scale RS Images Parallel Processing
4 RS Big Data Processing in Clouds: Current Work
5 Concluding Remark
International Development of Earth Observation
[Figure: number of satellites (0 to 40) by year, 1964 to 2010]
Nimbus, 1964
514 EO satellites until Dec. 2011
EO has played an important role in human society
International Development of Earth Observation
EO satellites in 1962-2012
International Development of Earth Observation
Plan of EO satellites in 2013-2035
Earth Observation in China
China Remote Sensing Satellite Ground Station, a member of the Landsat Ground Station Operations Working Group, boasts one of the world’s highest capacities for receiving, processing, and distributing satellite data. With over 3.3 million scenes of satellite data accumulated on file since 1986, it is regarded as the largest Earth observation satellite data archive in China.
Its three stations at Miyun, Kashi, and Sanya can receive data simultaneously from satellites covering the whole territory of China and 70% of Asia.
It currently receives data from 12 satellites, will receive data from 50 satellites in 2025, and will become the largest ground station for civil use in the world.
EO Data Processing flow: an Overview
[Figure: EO data processing flow. Legible labels: Earth Observation Data (SAR, BJ-1, QuickBird, HJ-1A), DEM, GCP; RS Data Processing; L2; Mosaic. The remaining labels were lost in extraction.]
Data acquisition: data volumes and data rates

Satellite     Data Rate (MB/s)   Data Amount (GB/day)   Data Amount (TB/year)
ZY-02         113.98             115                    40.99
ZY-03         304                334                    119.05
HJ-1A         120                114                    40.63
HJ-1B         60                 57                     20.32
HJ-1C         320                187.5                  66.83
LANDSAT8      440.00             241.70                 47.33
RADARSAT-2    105.00             57.68                  11.29
RADARSAT-1    105.00             57.68                  11.29
SPOT-5        100.00             54.93                  10.76
LANDSAT5      85.00              28.02                  9.99
SPOT-4        50.00              10.99                  3.92
RADARSAT-1    105.00             34.61                  12.34
ENVISAT       100.00             32.96                  11.75
IRS-P6        210.00             46.14                  16.45

Single RS data set: GB
Daily receiving RS data set: TB
National RS data archive: PB
International RS data archive: EB
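The first five rows of the table (ZY-02 through HJ-1C) are consistent with a plain unit conversion from GB/day to TB/year. A minimal sketch, assuming 365 receiving days per year and 1024 GB per TB (neither constant is stated on the slide, and the function name is illustrative):

```python
def gb_per_day_to_tb_per_year(gb_per_day, days_per_year=365, gb_per_tb=1024):
    """Convert a daily volume in GB to a yearly volume in TB.

    days_per_year and gb_per_tb are assumptions, not values from the slide.
    """
    return gb_per_day * days_per_year / gb_per_tb


# Daily volumes (GB/day) copied from the first five rows of the table.
daily_gb = {"ZY-02": 115, "ZY-03": 334, "HJ-1A": 114, "HJ-1B": 57, "HJ-1C": 187.5}
yearly_tb = {name: round(gb_per_day_to_tb_per_year(v), 2) for name, v in daily_gb.items()}
```

The remaining rows do not follow from these two constants, so they were presumably derived with a different number of receiving days per year.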
Data volumes of a single scene of L0, L1, and L2 Products
Satellite (Sensor)   L0 (MB)   L1 (MB)   L2 (MB)
BJ-1 (MS)            228.3     228.3     333.32
BJ-1 (PAN)           98.9      98.9      150.33
SPOT-5 (PAN)         572       572       45.96
SPOT-5 (MS)          144       144       252.21
LANDSAT7 (ETM)       337       337       532
LANDSAT8 (UTM)       481       481       781
Data volume of value-added processing
Satellite (Sensor)   Fine, 1 Scene (MB)   Ortho, 1 Scene (MB)   Regional Mosaic, 15 Scenes (GB)   Regional Mosaic, 512 Scenes (GB)
LANDSAT7             511.52               515.3                 7.48                              255.5
ZY-02C (PAN)         201.61               206.65                N/A                               N/A
ZY-02C (HRI)         467.3                475.15                N/A                               N/A
Data Volume Throughout the Entire Flow
[Figure: Data Amount (Log D): Log(MB), scale 1 to 6, at each processing stage (RC, RD, TR, L0, L1, L2, L3, L4, MO, CL, AR, OS). Series: CBERS-02C (HRI), LANDSAT5.]
Comparative analysis of the data throughput rate
[Figure: Speed (Mbps), scale 0 to 120, at each processing stage (RC, RD, TR, L0, L1, L2, L3, L4, MO, CL, AR, OS). Series: CBERS-02 (PAN), CBERS-02 (HRI), LANDSAT5.]
Comparative Analysis of I/O Occupation
[Figure: I/O Percentage (%), scale 10 to 60, for each processing algorithm (L1, L2, L3, L4, MO, CL). Series by data size: 500 MB, 1 GB, 5 GB, 10 GB, 22 GB.]
Yan Ma, Lizhe Wang. Towards building an empirical index for data-intensive remote sensing image processing applications. Future Generation Computer Systems.
Computational Complexity of Algorithms
Stage                     Algorithm      Complexity        Parallelism
Preprocessing             Radiometric    O(n)              Excellent
Preprocessing             Geometric      O(n^2)            Excellent
Value-added Processing    Registration   O(n^2 log(n))     Excellent
Value-added Processing    K-T Trans      O(p^2 b n^2)      Good
Value-added Processing    Convolution    O(k^2 n^2)        Good
Value-added Processing    Mosaicking     O(n^3)            Normal
Information Extraction    K-means        O(r b m n^2)      Normal
Information Extraction    Bayes          O(b m n^2)        Good
Information Extraction    BP             O(n^2 log(n))     Good
Computational Complexity of Algorithms
[Figure: Computational Complexity: logn(C), scale 0.5 to 3.5, for each processing algorithm (L0, L1, L2, L3, L4, MO, CL, AR, OS). Series: Complexity (C).]
The DIRS model
[Slide: introduction of the DI index for RS data processing. Legible fragments: Flops/IO; DI; F(C, D, N, T, ...). The slide also repeats the plot of Data Amount (Log D): Log(MB) per processing stage for CBERS-02C (HRI) and LANDSAT5. The remaining text was lost in extraction.]
The DIRS model
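\[
DIRS =
\begin{cases}
0, & \text{if } \dfrac{10^{DC}}{60^{T}} < 1 \\[6pt]
\log \dfrac{10^{DC}}{60^{T}}, & \text{if } \dfrac{10^{DC}}{60^{T}} > 1
\end{cases}
\tag{1}
\]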
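Equation (1) can be evaluated directly once 'C', 'D', and 'T' are quantized. A minimal sketch, assuming the ratio is 10^(D*C) / 60^T and the logarithm is base 10; this reading is an assumption, chosen because it reproduces the values 0 and 2.4 listed for L1 and L2 in the empirical estimation table later in the talk. The function name is illustrative.

```python
import math


def dirs(c, d, t):
    """DIRS = log10(10**(d*c) / 60**t) when that ratio exceeds 1, else 0.

    Computed in log space so that large d*c values cannot overflow.
    The base-10 logarithm is an assumption, not stated on the slide.
    """
    log_ratio = d * c - t * math.log10(60)
    return log_ratio if log_ratio > 0 else 0.0


l2_value = dirs(c=2, d=3, t=2)  # row L2 of the empirical table
l1_value = dirs(c=1, d=3, t=2)  # row L1 of the empirical table
```

Working in log space means the condition "ratio > 1" becomes "log_ratio > 0", so both branches of Equation (1) share one expression.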
Representation of data volume ‘D’
\[
D = \max \{ D_{in}, D_{m1}, D_{m2}, \ldots, D_{out} \}
\tag{2}
\]
Representation of computational complexity ‘C’
Table: Quantization of computational complexity ‘C’

Computational Complexity   ‘C’
O(n)                       1
O(n^2)                     2
O(n^2 log(n))              2.5
O(k^2 n^2)                 2.2
O(n^3)                     3
O(n^3) ∼ O(n^4)            4

Representation of time requirement ‘T’
Table: Quantization of Time Requirement ‘T’

Time Requirement   ‘T’
Second             1
Minute             2
Hour               3
Day                3.5
Week               4
>Month             5
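The two quantization tables amount to simple lookups. A minimal sketch, with the scores copied from the tables above; the dictionary keys and the function name are illustrative choices, not part of the original model:

```python
# Quantized scores copied from the two tables above.
COMPLEXITY_SCORE = {
    "O(n)": 1,
    "O(n^2)": 2,
    "O(k^2 n^2)": 2.2,
    "O(n^2 log n)": 2.5,
    "O(n^3)": 3,
    "O(n^3)~O(n^4)": 4,
}
TIME_SCORE = {"second": 1, "minute": 2, "hour": 3, "day": 3.5, "week": 4, ">month": 5}


def quantize(complexity, time_requirement):
    """Return the (C, T) pair for a complexity class and a time requirement."""
    return COMPLEXITY_SCORE[complexity], TIME_SCORE[time_requirement.lower()]


c, t = quantize("O(n^2)", "Minute")
```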
Empirical Estimation of DIRS Through the Entire RS Data Processing Flow

Algorithm   ‘C’   ‘D’   ‘T’   DIRS
L1          1     3     2     0
L2          2     3     2     2.4
L3          2     3     2     2.4
L4          2     3     2     2.4
MO          5.5   5.5   3     11.2
CL          5.5   3     3     8.4
AR          5.5   3     3     11.2
OP          4.5   3     3     3.7
Empirical Estimation of the DIRS Indicator
[Figure: DI Indicator (1 to 10), axis scale 0 to 12, for each processing algorithm (L0, L1, L2, L3, L4, MO, CL, AR, OS). Series: DI Indicator (Empirical Estimation).]
Some Findings for RS Big Data
Big Data
Compute-intensive and data-intensive
Unbalanced Data Flow
Complex Algorithm Flow
Philosophy of RS Big Data Computing
Data processing models and algorithms
    Approximate computation
    Reduced computation
    Partial computation
System and Software
    Platforms: Clusters, Data Centers, Clouds, High-speed networking
    Software system: Parallel file systems, High Performance IO, Programming models, Memory models, Runtime Systems
Paradigm of RS Data Product Service
    Precise Prediction, On-Demand Computing, Active Service
    Standardization, Serialisation and Automation of RS Data production
High Performance Processing for EO Big Data
Where: EO Big Data and its Applications
What:
    Quantitative analysis of EO Big Data and its processing algorithms
    Structured, semi-structured and unstructured EO Big Data
    High performance processing algorithms and software models
Systems and Platforms
    Clusters
    Satellite data centres
    Clouds, and Multiple data centers
Data Model
    Memory model: GDDM
    Storage model: HPGFS
    IO model: ADIOS, HA
Processing Model
    Distributed Memory Processing Model (MPI)
    Workflow Model
    Batch Processing Model
    Streaming data
Production System
    Parallel RS Image Processing System (PIPS, PIPS+)
    RS Data Production System across Multiple Satellite Datacenters
A System View
[Figure: an 8 x 8 RS image (pixels r11 to r88) divided into 16 numbered bricks. Three brick orderings are labelled: Z-Order, Diagonal, and Hilbert. One ordering is written out as 13 14 10 9 5 1 2 6 7 3 4 8 12 11 15 16. A further label reads "Bricks (Z-order)". The remaining labels were lost in extraction.]
Yan Ma, Lizhe Wang, et al. Distributed data structure templates for data-intensive remote sensing applications. Concurrency and Computation: Practice and Experience, 25(12): 1784-1797 (2013)
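The Z-order curve named on the slide maps a 2-D cell to a 1-D position by interleaving coordinate bits. A minimal sketch, not the GDDM implementation: the bit convention (x in even positions, y in odd positions, origin at the top-left cell) and the function names are assumptions made for illustration.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


def z_order(width, height):
    """List the (x, y) cells of a width x height grid in Z-order."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    return sorted(cells, key=lambda cell: morton_index(*cell))


order_4x4 = z_order(4, 4)
```

Cells that are close in the grid tend to stay close in the 1-D order, which is why such curves suit block-wise access to image data.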
Implementation
[Figure: implementation on a multi-core cluster. Legible labels: map(), map_f, MPI_Win, nodes 1 to N, /Node, /core, tiles T1 to T5 under several partition layouts, (Margin), Ti, Ai. The remaining labels were lost in extraction.]
Generic Parallel Processing Model for RS Big Data
RS-GPPS
[Figure: Generic Algorithm Skeletons (Templates) take a Job Class (RS sequential code/function); Distributed RS Data Templates take a Data Partition Strategy Function. After parameter specifying, Skeletons Instantiation and Data Templates Instantiation feed Code Generation, which yields Massive Remote Sensing Image Parallel Processing Programs. Partly legible labels: C/C++ Job; MPI+C/C++.]
How to generate parallel remote sensing programs with RS-GPPS skeletons
Parameters of Distributed RS Data Templates
    Data type of RS image
    Data partition strategies
Parameter of the Skeleton
    Job class
Yan Ma, Lizhe Wang, et al. Generic Parallel Programming for Massive Remote Sensing Data Processing. CLUSTER 2012: 420-428
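The skeleton idea (a generic driver parameterized by a partition strategy and a sequential job) can be sketched in a few lines. This is an illustration only, not the RS-GPPS API: a Python thread pool stands in for the MPI + C/C++ templates, and `run_skeleton`, `split_by_rows`, and `invert` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_skeleton(image_rows, partition, job, workers=4):
    """Apply a sequential `job` to every block produced by `partition`.

    partition(image_rows) -> list of blocks (each a list of rows)
    job(block)            -> processed block with the same shape
    Blocks are processed concurrently and re-assembled in their original order.
    """
    blocks = partition(image_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(job, blocks))  # map preserves block order
    return [row for block in results for row in block]


def split_by_rows(image_rows, rows_per_block=2):
    """Hypothetical partition strategy: fixed-height horizontal strips."""
    return [image_rows[i:i + rows_per_block]
            for i in range(0, len(image_rows), rows_per_block)]


def invert(block):
    """Hypothetical sequential job: invert 8-bit pixel values."""
    return [[255 - pixel for pixel in row] for row in block]


image = [[0, 10], [20, 30], [40, 50]]
result = run_skeleton(image, split_by_rows, invert)
```

The user supplies only the two parameters named on the slide, a partition strategy and a job; the parallel driver stays unchanged.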
Generic Parallel Processing Model for RS Big Data
RS-GPPS
Template <Class T> Class Skeleton{}
[Figure: Generic Algorithm Skeletons (Templates) and Distributed RS Data Templates running on a multicore cluster (nodes 1 to N) with an MPI runtime environment. Shared distributed spaces hold the data blocks; data distribution info records the node mapping; blocks are loaded from and written to a parallel file system through parallel I/O.]
Implement skeleton (perform computations on distributed RS data):
1) Divide the task into subtasks;
2) Load the data blocks concurrently through parallel I/O;
3) Execute sequential codes in parallel.
Implement distributed RS data:
1) Slice RS data logically into overlapped blocks;
2) Distribute blocks across nodes;
3) Converge blocks into a global RS data set via one-sided messaging of MPI.
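Slicing an image logically into overlapped blocks can be sketched as below. This is a minimal illustration under stated assumptions: blocks are horizontal strips, the overlap ("halo") is a fixed number of rows, and the function name is hypothetical. Distribution across nodes and MPI one-sided communication are not shown.

```python
def overlapped_row_blocks(n_rows, n_blocks, halo):
    """Split rows [0, n_rows) into n_blocks strips, each padded by `halo` rows.

    Returns (core_start, core_stop, load_start, load_stop) per block: the core
    range is what the block owns, the load range adds the overlap, clipped to
    the image bounds.
    """
    base, extra = divmod(n_rows, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        blocks.append((start, stop, max(0, start - halo), min(n_rows, stop + halo)))
        start = stop
    return blocks


blocks = overlapped_row_blocks(n_rows=10, n_blocks=3, halo=1)
```

A halo of one row is what a 3 x 3 neighbourhood operation needs, so each block can be processed without reading from its neighbours.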
RS Data Access Pattern
[Figure: (a) Neighbor-Based Processing (regional dependent); (b) Rectangular-Block Access Pattern. A rectangular block of the image (for example r33, r34, r43, r44) is read from or written to the file as separate runs, each addressed with MPI_IO <offset, count=2>. Panels: Image, Memory, File; Input, Output.]
RS Data Access Pattern
[Figure: (a) Band-Related Processing (band dependent); (b) Cross-File Access Pattern. The same block (for example r33, r34, r43, r44) is read from several input images with MPI_IO <offset, count> and written to one output. Panels: Image, Memory, File; Input, Output.]
RS Data Access Pattern
[Figure: (a) Irregular Processing (global or irregular dependent); (b) Diagonal Access Pattern, addressed with MPI_IO <offset, count=1> (for example r11, r22, r33, r44); (c) Polygon Access Pattern, addressed with MPI_IO <offset, count> (for example r22, r31, r32, r33, r42). Panels: Image, Memory, File; Input, Output.]
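The <offset, count> pairs in the access-pattern figures can be derived from the image geometry. A minimal sketch, assuming a single-band image stored row by row and offsets counted in pixels rather than bytes; the function names are illustrative, and the actual MPI-IO calls are not shown.

```python
def rectangular_block_requests(image_width, row0, col0, n_rows, n_cols):
    """(offset, count) pairs, in pixels, for a block of a row-major image."""
    return [((row0 + r) * image_width + col0, n_cols) for r in range(n_rows)]


def diagonal_requests(image_width, length):
    """(offset, count=1) pairs for the main diagonal of a row-major image."""
    return [(i * image_width + i, 1) for i in range(length)]


# 4 x 4 image; the block r33, r34, r43, r44 starts at zero-based row 2, column 2.
block = rectangular_block_requests(image_width=4, row0=2, col0=2, n_rows=2, n_cols=2)
diagonal = diagonal_requests(image_width=4, length=4)
```

The rectangular block needs one request per row with count=2, while the diagonal needs one request per pixel with count=1, which is why the diagonal pattern is the more expensive of the two on a row-major layout.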
Optimized Data Layout in HPGFS
Geographical Metadata: Image Info, Map Info, Projection Para, Satellite & Sensor Info, ...
Image Data: Image Band 1, Image Band 2, Image Band 3, ..., Image Band n
Basic Operations: getGeoMetadata(), setGeoMetadata(), projectionReform(), resampling(), ...
Geographical Metadata
    Image Info
    Map Info
    Projection Para (differs with projections)
        UTM: Ellipsoid, Datum, UTM Zone, ...
        Transverse Mercator: Ellipsoid, Datum, Central meridian, Coordinate origin, ...
    Satellite & Sensor Para (differs with sensors)
        ETM+: Sensor type, Orbit, Semi-major, Semi-minor, Orbital Inclination, ...
        Radarsat: Sensor type, Orbit, Wave length, Range Resolution, Azimuth, ...
[Figure: Metadata and Geodata. An image (pixels r11 to r77) is divided into 16 numbered bricks; the legible part of the brick sequence reads 3 4 7 8 9 10 13 14 11 12 15 16. The remaining labels were lost in extraction.]
Optimized Data Layout in HPGFS
Brick orders for a 4 x 4 brick array, one per access pattern:
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 (Consecutive-Line Access Pattern)
    1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 (column by column; label lost in extraction)
    13 14 10 9 5 1 2 6 7 3 4 8 12 11 15 16 (along a Hilbert curve; label lost in extraction)
    13 9 14 5 10 15 1 6 11 16 2 7 12 3 8 4 (along diagonals; label lost in extraction)
Brick groups shown for the four layouts:
    1,2,3,4,9,10,11,12 and 5,6,7,8,13,14,15,16
    1,5,9,13,3,7,11,15 and 2,6,10,14,4,8,12,16
    9,10,13,14,3,4,7,8 and 1,2,5,6,12,11,15,16
    13,9,14,1,6,11,16,3,8,4 and 5,10,15,2,7,12
[Figure: data layout with space-filling curves (brick-based, access-pattern aware). An 8 x 8 image (pixels r11 to r88) is divided into 16 bricks. Arranging bricks (example: Hilbert curve); data organizing inside bricks (example: Z-order curve); storage by bricks. Labels: Brick Array, Byte Array, Pixel Array.]
Building Data Layout with Application Aware Policies
Different Space-Filling Curves for Different Access Patterns
Lizhe Wang, Yan Ma, et al. “A Parallel File System with Application-aware Data Layout Policies in Digital Earth”, IEEE Transactions on Parallel & Distributed Systems, doi:10.1109/TPDS.2014.2322362
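The four brick sequences listed for the 4 x 4 brick array can be regenerated programmatically. A minimal sketch, not HPGFS code: bricks are assumed to be numbered 1 to 16 row by row from the top-left, the Hilbert ordering uses the standard distance-to-coordinate conversion with its origin placed at the bottom-left brick, and all function names are illustrative.

```python
def row_major(n):
    return [r * n + c + 1 for r in range(n) for c in range(n)]


def column_major(n):
    return [r * n + c + 1 for c in range(n) for r in range(n)]


def diagonal(n):
    """Diagonals from the bottom-left corner to the top-right corner."""
    return [r * n + (r + d) + 1
            for d in range(-(n - 1), n)
            for r in range(n) if 0 <= r + d < n]


def hilbert(n):
    """Hilbert curve over an n x n grid (n a power of two), starting bottom-left."""
    order = []
    for d in range(n * n):
        x = y = 0
        t, s = d, 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:  # rotate or flip the current quadrant
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x, y = x + s * rx, y + s * ry
            t //= 4
            s *= 2
        order.append((n - 1 - y) * n + x + 1)  # y counts up from the bottom row
    return order
```

Choosing among these generators per file is one way to read "different space-filling curves for different access patterns": the layout is picked so that the bricks an application touches together sit next to each other on disk.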
Large-scale RS images Parallel Processing
Individual computing task: input file; XML file; individual executable program; output file.
[Figure: 25 numbered tasks shown in their original sequence and regrouped into a task tree.]
Task tree based parallel computing
DAG-based scheduling model
[Figure: DAG-based scheduling of a task tree with seven numbered nodes (1; 2, 3; 4, 5, 6, 7) onto an LRMS. The remaining labels were lost in extraction.]
Yan Ma, Lizhe Wang, et al. Task tree based large-scale mosaicking for massive remote sensed imageries with dynamic DAG scheduling. IEEE Transactions on Parallel and Distributed Systems, 25(8): 2126-2137, Aug. 2014.
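Scheduling a task tree as a DAG comes down to releasing a task once everything it depends on has finished. A minimal sketch of a static, wave-by-wave variant, which is simpler than the dynamic DAG scheduling of the cited paper; the seven-task tree mirrors the node numbering in the figure, but its edges and the function name are assumptions made for illustration.

```python
def schedule_in_waves(dependencies):
    """Group DAG tasks into waves; tasks in one wave can run in parallel.

    dependencies maps each task to the set of tasks it must wait for.
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    waves = []
    while remaining:
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves


# Hypothetical 7-task tree: each parent merges the results of its two children.
task_tree = {1: {2, 3}, 2: {4, 5}, 3: {6, 7}, 4: set(), 5: set(), 6: set(), 7: set()}
waves = schedule_in_waves(task_tree)
```

A dynamic scheduler would instead release each parent as soon as its own children finish, without waiting for the rest of the wave.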
Large scale parallel RS Image Mosaicing
Data: more than half of China (20◦ N–47◦ N and 84◦ E–180◦ E); 352 scenes of ETM+, 15 m resolution, PAN.
Performance: on a 10-node cluster the mosaic takes 5 hours; with commercial software it is typically several days of work.
Multi-datacenter Clouds for RS Data Processing
Products provided
    40+ types of RS inversion products: vegetation structure and growth state, radiation budget, water and heat fluxes, ice-snow variation, mineral exploration
    20+ types of thematic RS products: agriculture, forestry, geology, water, sea ice, environment
    Large-scale areas: global, China, South America
    Long time series: all year round
Services provided
    Spatial metadata management for multi-level RS data
    Data-dependent knowledge base
    Replica management for distributed RS data
    Automated data production
    Production task scheduling
    Workflow fault tolerance
    Distributed resource monitoring
Lizhe Wang, et al. pipsCloud: High Performance Cloud Computing for Remote Sensing Big Data Management and Processing. Submitted to FGCS.
Software System Architecture
[Figure: layered software system architecture. The labels are only partly legible after extraction; those that can be read with reasonable confidence include: Cloud Portal (web service); Data Management (subscription and sharing); RS Data Management (metadata management, data indexing, metadata indexing, metadata service); Workflow Processing, Workflow Customization, Order Management; Programming Skeletons; Distributed RS Data Structure; Task Scheduling; Cloud Management with a Cloud Framework (OpenStack): scheduler, image repository, quota, storage; Monitoring (Ganglia); Log Analysis and Monitoring; Security and User Authentication; Virtual Machine, Hypervisor; Network, Storage, Compute.]
RS file management
RS Metadata management
[Figure: RS metadata management across several storage nodes (four boxes labelled Storage). The remaining labels were lost in extraction.]
RS Workflow Management
RS Replica and Cache Management
Concluding Remark for RS Big Data
Findings
    Big Data
    Compute-intensive and data-intensive
    Unbalanced Data Flow
    Complex Algorithm Flow
Recommendation
    Specially optimized HPC software systems for data transfer, IO, storage, processing and management
Thank you for your time!