High Performance Computing Group - Michael J. Flynn
TRANSCRIPT
Michael J. Flynn
Maxeler Technologies and Stanford University
Dataflow Supercomputers
Outline
• History
• Dataflow as a supercomputer technology
• OpenSPL: generalizing the dataflow programming model
• Optimizing the hardware for dataflow
The great parallel processor debate of 1967
• Amdahl espouses the sequential machine and posits Amdahl's law
• Daniel Slotnick (father of ILLIAC IV) posits the parallel approach while recognizing the problem of programming
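Amdahl's side of the debate reduces to a one-line formula. A quick sketch of the standard formulation (my illustration, not from the slides): the speedup from parallelizing a fraction p of a program across n processors is capped by the sequential residue.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: overall speedup when a fraction p of the work
    runs perfectly in parallel on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, 100 processors yield
# less than 17x -- the 5% sequential residue dominates.
print(amdahl_speedup(0.95, 100))
```

This cap on speedup is exactly why Amdahl favored faster sequential machines.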
Slotnick's law
"The parallel approach to computing does require that some original thinking be done about numerical analysis and data management in order to secure efficient use. In an environment which has represented the absence of the need to think as the highest virtue this is a decided disadvantage."
- Daniel Slotnick (1967)
Speedup in parallel processing is achieved by programming effort.
The (multi core) Parallel Processor Problem
• Efficient distribution of tasks
• Inter-node communication (data assembly & dispatch) reduces computational efficiency (speedup/nodes)
• Memory bottleneck limitations
• The sequential programming model: layers of abstraction hide critical sources of, and limits to, efficient parallel execution
Is there a magic compiler bullet? May's laws
May’s first law: Software efficiency halves every 18 months, exactly compensating for Moore’s Law
May’s second law: Compiler technology doubles efficiency no faster than once a decade
David May
Dataflow as a supercomputer technology
• Looking for another way: some experience from Maxeler
• Dataflow is an old technology (1970s and 80s); conceptually it creates an ideal machine to match the program, but the interconnect problem was insurmountable in its day
• Today's FPGAs have come a long way and enable an emulation of the dataflow machine
Hardware and Software Alternatives
• Hardware: A reconfigurable heterogeneous accelerator array model
• Software: A spatial (2D) dataflow programming model rather than a sequential model
Accelerator HW model
• Assumes host CPU + FPGA accelerator
• Application consists of two parts
– Essential (high usage, >99%) part (kernel(s))
– Bulk part (<1% dynamic activity)
• Essential part is executed on accelerator; Bulk part on host
• So Slotnick’s law of effort now only applies to a small portion of the application
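The 99%/1% split matters because of Amdahl's law: overall speedup is capped by the bulk part left on the host. A hedged sketch (the >99% figure is from the slide; the formula is the standard one):

```python
def accelerated_speedup(f, s):
    """Overall speedup when a fraction f of runtime (the essential
    kernel) is accelerated by a factor s; the rest stays on the host."""
    return 1.0 / ((1.0 - f) + f / s)

# With 99% of the runtime in the kernel, a 50x-faster accelerator
# gives ~33x overall; even an infinitely fast one cannot exceed 100x.
print(accelerated_speedup(0.99, 50))
print(accelerated_speedup(0.99, 10**9))
```

So concentrating programming effort on the small essential part pays off, which is the point about Slotnick's law of effort.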
FPGA accelerator hardware model: server with acceleration cards
Data flow graph (DFG) --> Data flow machine
Each (essential) program has a data flow graph (DFG). The ideal hardware to execute the DFG is a data flow machine that exactly matches the DFG. A compiler/translator transforms the DF machine so that it can be emulated by the FPGA. FPGA-based accelerators, while slow in cycle time, offer much more flexibility in matching DFGs.
Limitation 1: the DFG is limited in (static) size to O(10^4) nodes.
Limitation 2: only the control structure is matched, not the data access patterns.
Acceleration with Static, Synchronous, Streaming DFMs
Create a static DFM (unroll loops, etc.); generally the goal is throughput not latency.
Create a fully synchronous DFM synchronized to multiple memory channels. The time through the DFM is always the same.
Stream computations across the long DFM array, creating MISD or pipelined parallelism.
If silicon area and pin BW allow, create multiple copies of the DFM (as with SIMD or vector computations).
Iterate on the DFM aspect ratio to optimize speedup.
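The steps above can be modeled in miniature. Below is an illustrative tick-by-tick simulation of a static, fully synchronous, streaming pipeline; the function names and stage functions are mine, not Maxeler's API. One input enters per tick, so N inputs complete in N + depth - 1 ticks, which is why throughput rather than latency is the goal.

```python
def stream_through_pipeline(xs, stages):
    """Tick-accurate model of a fully synchronous streaming DFM.
    `stages` is a list of one-argument functions, one per pipeline stage.
    Returns (outputs, ticks); ticks == len(xs) + len(stages) - 1."""
    depth = len(stages)
    regs = [None] * depth        # one pipeline register per stage
    pending = list(xs)
    outputs, ticks = [], 0
    while len(outputs) < len(xs):
        ticks += 1
        x = pending.pop(0) if pending else None
        # all stages fire simultaneously on this tick
        new_regs = [stages[0](x) if x is not None else None]
        for i in range(1, depth):
            v = regs[i - 1]
            new_regs.append(stages[i](v) if v is not None else None)
        if new_regs[-1] is not None:
            outputs.append(new_regs[-1])
        regs = new_regs
    return outputs, ticks

# Three-stage pipeline computing ((x + 1) * 2) - 3 over a stream:
outs, ticks = stream_through_pipeline(
    [1, 2, 3, 4],
    [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3])
print(outs, ticks)   # [1, 3, 5, 7] in 6 ticks (4 inputs + depth 3 - 1)
```

Duplicating the whole pipeline, as the SIMD-style step suggests, would double throughput at the cost of silicon area and pin bandwidth.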
Acceleration with Static, Synchronous, Streaming DFMs
Create a fully synchronous data flow machine synchronized to multiple memory channels, then stream computations across a long array.
[Diagram: FPGA-based DFM on a PCIe accelerator card with memory (a Dataflow Engine, DFE). Data streams from node memory through Computation #1, intermediate results are buffered, then through Computation #2, with results written back to memory.]
[Dataflow graph: input x feeds both operands of a multiplier; the constant 30 is added; output y]
SCSVar x = io.input("x", scsInt(32));
SCSVar result = x * x + 30;
io.output("y", result, scsInt(32));
Example: x² + 30
Example: Moving Average
SCSVar x = io.input("x", scsFloat(7, 17));
SCSVar prev = stream.offset(x, -1);
SCSVar next = stream.offset(x, 1);
SCSVar sum = prev + x + next;
SCSVar result = sum / 3;
io.output("y", result, scsFloat(7, 17));

y[n] = (x[n-1] + x[n] + x[n+1]) / 3
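A plain software model of the same kernel may help. This Python sketch mirrors the -1/+1 stream offsets; the boundary handling (clamping at the stream edges) is my own assumption, since the slide's kernel leaves it unspecified.

```python
def moving_average_3(xs):
    """y[n] = (x[n-1] + x[n] + x[n+1]) / 3 via stream offsets -1 and +1.
    Clamps at the boundaries (an assumption; the kernel leaves it open)."""
    n = len(xs)
    out = []
    for i in range(n):
        prev = xs[i - 1] if i > 0 else xs[i]
        nxt = xs[i + 1] if i < n - 1 else xs[i]
        out.append((prev + xs[i] + nxt) / 3)
    return out

print(moving_average_3([3.0, 6.0, 9.0, 12.0]))   # [4.0, 6.0, 9.0, 11.0]
```

In the dataflow version the three taps exist as parallel hardware fed by offset streams, so each output costs one tick rather than one loop iteration.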
Example: Choices
[Dataflow graph: comparator tests x > 10 and selects between x + 1 and x - 1 to produce y]

SCSVar x = io.input("x", scsUInt(24));
SCSVar result = (x > 10) ? x + 1 : x - 1;
io.output("y", result, scsUInt(24));
Data flow graph as generated by compiler
4866 nodes; about 250x100
Each node represents a line of Java code with area-time parameters, so that the designer can change the aspect ratio to improve pin BW, area usage and speedup.
MPC-C500
• 1U form factor
• 4x dataflow engines
• 12 Intel Xeon cores
• 192GB DFE RAM
• 192GB CPU RAM
• PCIe Gen2 x8
• MaxRing interconnect
• 3x 3.5" hard drives
• Infiniband
MPC-X1000
• 8 dataflow engines (192-384GB RAM)
• High-speed MaxRing
• Zero-copy RDMA between CPUs and DFEs over Infiniband
• Dynamic CPU/DFE balancing
Example: Seismic Data Processing
For oil & gas exploration: distribute a grid of sensors over a large area
Sonically impulse the area and record the reflections: frequency, amplitude and delay at each sensor
Sea-based surveys use 30,000 sensors to record data (120 dB range), each sampled at more than 2kbps, with a new sonic impulse every 10 seconds
On the order of terabytes of data each day
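The quoted figures are mutually consistent if the slide's "2kbps" is read as roughly 2,000 samples per second per sensor at 24-bit resolution (a 120 dB dynamic range needs about 20 bits); that reading is my assumption. A back-of-envelope check:

```python
# Assumptions (mine): 2,000 samples/s per sensor, 24-bit (3-byte) samples.
sensors = 30_000
samples_per_sec = 2_000
bytes_per_sample = 3

rate = sensors * samples_per_sec * bytes_per_sample   # bytes per second
per_impulse = rate * 10                               # new impulse every 10 s
per_day = rate * 86_400

print(per_impulse / 1e9, "GB per 10 s")   # ~1.8 GB: matches ">1GB every 10s"
print(per_day / 1e12, "TB per day")       # ~15.6 TB: "order of terabytes"
```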
[Figure: survey/modelling volume with 1200m dimensions on each side; generates >1GB every 10s]
Modelling Results
• Up to 240x speedup for one MAX2 card compared to a single CPU core
• Speedup increases with cube size
• 1 billion point modelling domain using a single FPGA card
Achieved computational speedup for the entire application (not just the kernel) compared to an Intel server:
• RTM with Chevron: VTI 19x and TTI 25x
• Sparse matrix: 20-40x
• Seismic trace processing: 24x
• Lattice Boltzmann fluid flow: 30x
• Conjugate gradient opt: 26x
• Credit 32x and Rates 26x
So for HPC, how can emulation (FPGA) be better than high performance x86 processor(s)?
Multi core approach lacks robustness in streaming hardware (spanning area, time, power)
Multi core lacks robust parallel software methodology and tools
FPGAs emulate the ideal data flow machine. Success comes from their flexibility in matching the DFG with a synchronous DFM, from streaming data through it, and from sheer size (> 1 million cells).
Effort and support tools provide significant application speedup
Generalizing the programming model: OpenSPL
• Open Spatial Programming Language: an orderly way to expose parallelism
• 2D dataflow is the programmer's model, Java the syntax
• Could target hardware implementations beyond the DFEs:
– map onto CPUs (e.g., using OpenMP/MPI)
– GPUs
– other accelerators
Temporal Computing (1D)
• A program is a sequence of instructions
• Performance is dominated by:
– Memory latency
– ALU availability
[Figure: CPU timeline alternating Get Inst., Read data, COMP and Write Result for instructions 1..3; the actual computation time is a small slice of each iteration.]
Spatial Computing (2D)
[Figure: 2D array of ALUs with buffers and control blocks; data in at the top, data out at the bottom, with synchronous data movement. Over time: read data [1..N], computation, write results [1..N].]
Throughput dominated
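The contrast between the two models can be put in rough numbers. The cycle counts below are invented for illustration, not measurements: in the 1D model every result pays the full fetch/read/compute/write sequence, while in the 2D model the pipeline fills once and then delivers one result per tick.

```python
def temporal_ticks(n, fetch=1, read=1, comp=1, write=1):
    """1D model: every result pays fetch + read + compute + write."""
    return n * (fetch + read + comp + write)

def spatial_ticks(n, depth):
    """2D model: `depth` ticks to fill the pipeline, then one result
    per tick -- throughput is independent of pipeline depth."""
    return depth + n - 1

n = 1_000_000
print(temporal_ticks(n))        # 4,000,000 ticks
print(spatial_ticks(n, 200))    # 1,000,199 ticks
```

Even with these generous single-cycle assumptions for the CPU, the spatial version approaches one result per tick, which is what "throughput dominated" means here.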
OpenSPL Basics
• Control and data flows are decoupled
– both are fully programmable
– can run in parallel for maximum performance
• Operations exist in space and by default run in parallel
– their number is limited only by the available space
• All operations can be customized at various levels
– e.g., from the algorithm down to the number representation
• Data sets (actions) stream through the operations
• The data transport and processing can be matched
OpenSPL Models
• Memory:
– Fast Memory (FMEM): many, small in size, low latency
– Large Memory (LMEM): few, large in size, high latency
– Scalars: many, tiny, lowest latency, fixed during execution
• Execution:
– datasets + scalar settings sent as atomic "actions"
– all data flows through the system synchronously in "ticks"
• Programming:
– API allows construction of a graph computation
– meta-programming allows complex construction
Spatial Arithmetic
• Operations instantiated as separate arithmetic units
• Units along data paths use custom arithmetic and number representation
• The above may reduce individual unit sizes
– can maximize the number that fit on a given SCS
• Data rates of memory and I/O communication may also be maximized due to scaled down data sizes
[Figure: bit layout of IEEE single precision - sign, exponent (8 bits), mantissa (23 bits) - versus a custom format with a 3-bit exponent and 10-bit mantissa: a potentially optimal encoding]
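To see what the custom format buys, here is a small sketch comparing bit budgets. The bias and reserved-exponent conventions follow IEEE 754 and are my assumption for the custom format; a real SCS format would pin down its own rounding and bias details.

```python
def float_format(exp_bits, man_bits):
    """Total width and maximum exponent of a sign/exponent/mantissa
    format, using an IEEE-style bias (an assumption for custom formats)."""
    width = 1 + exp_bits + man_bits
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias   # top exponent code reserved
    return width, max_exp

ieee_single = float_format(8, 23)   # (32, 127)
custom = float_format(3, 10)        # (14, 3)
print(ieee_single, custom)
```

At 14 bits per value instead of 32, more than twice as many arithmetic units fit on the same substrate, and memory and I/O move more than twice as many values per transfer, which is the data-rate point above.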
Spatial Arithmetic at All Levels
• Arithmetic optimizations at the bit level
– e.g., minimizing the number of ’1’s in binary numbers, leading to linear savings of both space and power (the zeros are omitted in the implementation)
• Higher level arithmetic optimizations
– e.g., in matrix algebra, the location of all non-zero elements in sparse matrix computations is important
• Spatial encoding of data structures can reduce transfers between memory and computational units (boost performance and improve efficiency)
– In temporal computing, encoding and decoding take time and can eventually cancel out all of the advantages
– In spatial computing, encoding and decoding just consume a bit of additional space
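A standard illustration of the bit-level point (not Maxeler's actual optimizer): multiplying by a constant with k one-bits needs only k - 1 adders in a shift-and-add implementation, so minimizing the 1s gives the linear space and power savings described above, since the zeros cost nothing in hardware.

```python
def shift_add_cost(c):
    """Adders needed to multiply by constant c with shifts and adds:
    one per '1' bit beyond the first (zero bits are omitted entirely)."""
    return max(bin(c).count("1") - 1, 0)

def mul_by_constant(x, c):
    # shift-and-add multiply: one shifted term per set bit of c
    return sum(x << i for i in range(c.bit_length()) if (c >> i) & 1)

print(shift_add_cost(0b1000000))   # 0 adders: a pure wire shift
print(shift_add_cost(0b1010101))   # 3 adders for four set bits
print(mul_by_constant(7, 85))      # 595 == 7 * 85
```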
Benchmarking Spatial Computers
• Spatial computing systems generate one result during every tick
• SC system efficiency is strongly determined by how efficiently data can be fed from external sources
• Fair comparison metrics are needed, among others:
– computations per cubic foot of datacenter space
– computations per Watt
– operational costs per computation
Hardware: FPGA pros and cons
• The FPGA, while quite suitable for emulation, is not an ideal hardware substrate
– too fine grain, wasted area
– expensive
– place and route times are excessive and will get longer
– slow cycle time
• FPGA advantages
– commodity part with best process technology
– flexible interconnect
– transistor density scaling
– in principle, possible to quickly reduce to an ASIC
Silicon device density scaling (ITRS 10 year projections)
Net: there's either 20 billion transistors or 50 gigabytes of flash on a 1 cm² die
Hardware alternatives: DFArray
• Clearly an array structure is attractive
• The LUT is inefficient in the context of dataflow
• Dataflow operations are relatively few and well defined (arithmetic, logic, mux, FIFO and store)
• Flexibility in data sizing is important but not necessarily to the bit.
• Existing DSPs (really MACs) in FPGAs are prized and well used (2000 or so per chip)
Hardware alternatives: DFArray
• The flexible interconnect required by the dataflow will still limit the cycle time (perhaps 5x slower than CPU). This also reduces power requirements.
• The great advantage is in the increased operational density (perhaps 10x over existing FPGAs) enabling much larger dataflow machines and greater parallelism.
• Avoids very long (many hours) “place & route” time required by FPGA
Hardware alternatives: DFArray
• The big disadvantage: it’s not a commodity part (even an expensive one).
• There are many open research issues in determining the best dataflow hardware fabric.
Conclusions 1
• Parallel processing demands rethinking algorithms, programming approach and environment, and hardware.
• The success of FPGA acceleration points to the weakness of evolutionary approaches to parallel processing: hardware (multi core) and software (C++, etc.), at least for some applications.
• The automation of acceleration is still early on; still required: tools, a methodology for writing applications, an analysis methodology and (maybe) a new hardware basis.
• For FPGA success software is key: VHDL and inefficient place and route are big limitations.
Conclusions 2
• In parallel processing: to find success, start with the problem not the solution.
• There’s a lot of research ahead to effectively create parallel translation technology.
Thank you