Writing High-Performance Software by Arvid Norberg
DESCRIPTION
BitTorrent Chief Architect Arvid Norberg on writing high-performance software.
TRANSCRIPT
![Page 1: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/1.jpg)
Writing High-Performance Software
![Page 2: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/2.jpg)
Performance ⟺ Longer Battery Life (not only for when things need to run fast)
![Page 3: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/3.jpg)
BitTorrent, Inc. | Writing High-Performance Software For Internal Presentation Purposes Only, Not For External Distribution .
Memory Cache
Typical memory cache hierarchy (Core i5 Sandy Bridge):
• Core 1 to Core 4: one L1 cache (32 kiB) and one L2 cache (256 kiB) per core
• L3 (6 MiB), shared by all cores
• Main memory (16 GiB)
![Page 8: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/8.jpg)
Memory Latency
[Figure, built up over slides 4 to 8: bar chart "Memory latencies Core i5 Sandy Bridge" with one bar each for register, L1 cache, L2 cache, L3 cache and DRAM. The latency axis is rescaled on every slide, from 0 to 0.5 ns with only the register bar shown, to 0 to 90 ns once DRAM is added. The DRAM slide carries the annotation "61.8 x". Source: http://www.7-cpu.com/cpu/IvyBridge.html]
![Page 9: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/9.jpg)
Memory Latency
• When a CPU is waiting for memory, it is busy (i.e. you will see 100% CPU usage, even if your bottleneck is waiting for memory)
• Memory access patterns are a significant factor in performance
• Constant cache misses make your program run up to 2 orders of magnitude slower than constant cache hits
![Page 10: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/10.jpg)
Memory Cache
[Figure: the memory you requested is only part of the memory pulled into the cache, which is a whole cache line]
![Page 11: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/11.jpg)
Memory Latency
• CPUs prefetch memory automatically if they can recognize your access pattern (sequential is easy)
• CPUs predict branches in order to prefetch instruction memory
• Memory access pattern is not only determined by data access but also control flow (indirect jumps stall execution on a memory lookup)
![Page 12: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/12.jpg)
Memory Cache
[Figure: eight consecutive 64-byte cache lines, read in order]
For linear memory reads, the CPU will pre-fetch memory
![Page 14: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/14.jpg)
Memory Cache
[Figure: eight consecutive 64-byte cache lines, read in random order]
For random memory reads, there is no pre-fetch and most memory accesses will cause a cache miss
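The difference between linear and random reads can be explored with a small experiment. The sketch below is not from the talk: the function names and the idea of walking the same array through an index list are assumptions made for illustration. Both traversals do the same arithmetic and only the order of the memory accesses differs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sum `data` by visiting its elements in the order given by `order`.
std::uint64_t sum_in_order(std::vector<std::uint32_t> const& data,
                           std::vector<std::size_t> const& order) {
  std::uint64_t sum = 0;
  for (std::size_t i : order) {
    assert(i < data.size());
    sum += data[i];
  }
  return sum;
}

// Indices 0, 1, 2, ..., n-1: a linear walk the prefetcher can follow.
std::vector<std::size_t> sequential_order(std::size_t n) {
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  return order;
}

// The same indices shuffled: consecutive accesses usually hit different
// cache lines, so the prefetcher has no pattern to recognize.
std::vector<std::size_t> random_order(std::size_t n, unsigned seed) {
  std::vector<std::size_t> order = sequential_order(n);
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);
  return order;
}

// Wall-clock time of one call to `f`, in microseconds.
template <class F>
long long time_us(F&& f) {
  auto const start = std::chrono::steady_clock::now();
  f();
  auto const end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
      .count();
}
```

To compare the two patterns, time `sum_in_order(data, sequential_order(n))` against `sum_in_order(data, random_order(n, seed))` for an array much larger than the L3 cache. The size of the gap depends on the machine and on the array size, so it should be measured rather than assumed.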
![Page 16: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/16.jpg)
Data Structures
![Page 17: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/17.jpg)
Data Structures
• Array of pointers to objects and linked lists: more cache pressure / cache misses
• Array of objects: less cache pressure / cache hits
![Page 18: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/18.jpg)
Data Structures
• One optimization is to refactor your single list of heterogeneous objects into one list per type.
• Objects would be laid out sequentially in memory
• Virtual function dispatch could become static
![Page 22: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/22.jpg)
Data Structures
Pointers need dereferencing + vtable lookup:
std::vector<std::unique_ptr<shape>> shapes;
for (auto& s : shapes) s->draw();
Objects packed back-to-back, sequential memory access, no vtable lookup:
std::vector<rectangle> rectangles;
std::vector<circle> circles;
for (auto& s : rectangles) s.draw();
for (auto& s : circles) s.draw();
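A compilable version of the two layouts might look like the sketch below. The `shape`, `rectangle` and `circle` definitions are assumptions, since the slides do not show them, and `area()` stands in for `draw()` so that the result can be checked. Marking the classes `final` is one way to let the compiler call the function directly in the per-type loops.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical shape hierarchy; area() stands in for the slide's draw().
struct shape {
  virtual ~shape() = default;
  virtual double area() const = 0;
};

struct rectangle final : shape {
  double w, h;
  rectangle(double w_, double h_) : w(w_), h(h_) { assert(w >= 0 && h >= 0); }
  double area() const override { return w * h; }
};

struct circle final : shape {
  double r;
  explicit circle(double r_) : r(r_) { assert(r >= 0); }
  double area() const override { return 3.141592653589793 * r * r; }
};

// One heterogeneous list: every element is a pointer to a separate heap
// allocation, and every call goes through the vtable.
double total_area(std::vector<std::unique_ptr<shape>> const& shapes) {
  double sum = 0;
  for (auto const& s : shapes) sum += s->area();
  return sum;
}

// One list per type: objects sit back-to-back in each vector, and because
// the element type is known and final, the calls can be resolved statically.
double total_area(std::vector<rectangle> const& rectangles,
                  std::vector<circle> const& circles) {
  double sum = 0;
  for (auto const& s : rectangles) sum += s.area();
  for (auto const& s : circles) sum += s.area();
  return sum;
}
```

Both overloads compute the same total. The second one changes the processing order (all rectangles, then all circles), which is acceptable only when the order of the original list does not matter.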
![Page 23: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/23.jpg)
Data Structures
• For heap allocated objects, put the most commonly used (“hot”) fields in the first cache line
• Avoid unnecessary padding
![Page 24: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/24.jpg)
Data Structures
Padding
struct A {
  int a;
  void* b;
  int c;
};
struct A [24 Bytes]
  0: [int : 4] a
  --- 4 Bytes padding ---
  8: [void* : 8] b
  16: [int : 4] c
  --- 4 Bytes padding ---
struct B {
  void* b;
  int a;
  int c;
};
struct B [16 Bytes]
  0: [void* : 8] b
  8: [int : 4] a
  12: [int : 4] c
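The layouts above can be checked with `sizeof` and `offsetof`. The sketch below assumes a typical 64-bit target with 4-byte `int` and 8-byte pointers, which is what the slide's numbers correspond to. The helper `padding_bytes` is a name introduced here and only works for structs with exactly these three fields.

```cpp
#include <cassert>
#include <cstddef>

struct A {
  int a;
  void* b;
  int c;
};

struct B {
  void* b;
  int a;
  int c;
};

// Bytes of T that are not occupied by its fields. Only meaningful for the
// two structs above, which both hold two ints and one pointer.
template <class T>
constexpr std::size_t padding_bytes() {
  return sizeof(T) - (2 * sizeof(int) + sizeof(void*));
}

// Ordering the widest member first never makes this particular struct larger.
static_assert(sizeof(B) <= sizeof(A), "reordered struct should not grow");
```

On such a target `padding_bytes<A>()` is 8 and `padding_bytes<B>()` is 0. On other ABIs the numbers differ, which is why the only compile-time check in the sketch is the relative one.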
![Page 25: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/25.jpg)
Context Switching
![Page 26: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/26.jpg)
Context Switching
• One significant source of cache misses is switching context, and switching the data set being worked on
• Context switch
• Thread / process switching
• User space -> kernel space
![Page 28: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/28.jpg)
Context Switching
• Lower the cost of context switching by amortizing it over as much work as possible
• Reduce the number of system calls by passing as much work as possible per call
• Reduce thread wake-ups/sleeps by batching work
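One way to pass more work per system call on POSIX systems is a gathering write. The sketch below is an illustration, not something the slides show: `write_batched` is a hypothetical helper that hands several pending messages to the kernel with one `writev()` call instead of one `write()` per message.

```cpp
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <vector>

// Write all messages with a single writev() call.
// Returns the number of bytes written, or -1 on error (see errno).
ssize_t write_batched(int fd, std::vector<std::string> const& messages) {
  assert(fd >= 0);
  if (messages.empty()) return 0;

  std::vector<iovec> iov;
  iov.reserve(messages.size());
  for (auto const& m : messages) {
    iovec v;
    v.iov_base = const_cast<char*>(m.data());  // writev() does not modify it
    v.iov_len = m.size();
    iov.push_back(v);
  }
  return ::writev(fd, iov.data(), static_cast<int>(iov.size()));
}
```

A production version would also need to handle short writes and split batches that exceed the system's limit on the number of `iovec` entries per call. Both are left out here to keep the sketch short.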
![Page 29: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/29.jpg)
Context Switching
When a thread wakes up, do as much work as possible before going to sleep:
• Drain the socket of received bytes
• Drain the job queue
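Draining a job queue can be sketched as follows. The `job_queue` class and its member names are assumptions made for this example. The point it illustrates is that a woken thread takes every queued job in one go, under a single lock acquisition, and runs the whole batch before it can go back to sleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class job_queue {
public:
  using job = std::function<void()>;

  // Queue a job and wake up a waiting worker.
  void post(job j) {
    assert(j);  // an empty std::function cannot be run
    {
      std::lock_guard<std::mutex> l(m_mutex);
      m_jobs.push_back(std::move(j));
    }
    m_cond.notify_one();
  }

  // Sleep until at least one job is queued, then take all queued jobs at
  // once and run them outside the lock. Returns the number of jobs run.
  std::size_t wait_and_drain() {
    std::vector<job> batch;
    {
      std::unique_lock<std::mutex> l(m_mutex);
      m_cond.wait(l, [this] { return !m_jobs.empty(); });
      batch.swap(m_jobs);
    }
    for (auto& j : batch) j();
    return batch.size();
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::vector<job> m_jobs;
};
```

A worker thread would call `wait_and_drain()` in a loop. Under higher load more jobs accumulate between wake-ups, so the batches grow without any tuning parameter.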
![Page 30: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/30.jpg)
Context Switching (traffic analogy): one car at a time
![Page 36: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/36.jpg)
Context Switching (traffic analogy): the whole queue at a time
![Page 39: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/39.jpg)
Context Switching
• Every time the socket becomes readable, read and handle one request
buf = socket.read_one_request()
req = parse_request(buf)
handle_req(socket, req)
![Page 40: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/40.jpg)
Context Switching
• Drain the socket each time it becomes readable
• Parse and handle each request that was received
buf.append(socket.read_all())
req, buf = parse_request(buf)
while req != None:
    handle_req(socket, req)
    req, buf = parse_request(buf)
![Page 41: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/41.jpg)
Context Switching
• Write all responses in a single call at the end
Don’t flush buffer to socket until all messages are handled
buf.append(socket.read_all())
socket.cork()
req, buf = parse_request(buf)
while req != None:
    handle_req(socket, req)
    req, buf = parse_request(buf)
socket.uncork()
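The same pattern in C++ could look like the sketch below. Everything in it is an assumption made for illustration: the protocol is a made-up one with one request per newline-terminated line, the response format is invented, and the `sent` vector stands in for the socket so that each entry corresponds to one write call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct connection {
  std::string in;                 // received bytes that are not parsed yet
  std::string out;                // responses accumulated while "corked"
  std::vector<std::string> sent;  // stand-in socket: one entry per write call

  // Called once per readable event with everything that was read.
  // Returns the number of complete requests that were handled.
  std::size_t on_readable(std::string const& received) {
    in += received;
    std::size_t handled = 0;
    std::size_t start = 0;
    for (std::size_t end = in.find('\n', start); end != std::string::npos;
         end = in.find('\n', start)) {
      // Handle one request: queue the response instead of writing it.
      out += "ok " + in.substr(start, end - start) + "\n";
      start = end + 1;
      ++handled;
    }
    assert(start <= in.size());
    in.erase(0, start);  // keep a trailing partial request for the next event

    // "Uncork": all responses leave in a single write.
    if (!out.empty()) {
      sent.push_back(out);
      out.clear();
    }
    return handled;
  }
};
```

Feeding `"a\nb\nc"` handles two requests, produces one write containing both responses, and keeps `"c"` buffered until the rest of that request arrives.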
![Page 42: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/42.jpg)
Socket Programming
• There are two ways to read from sockets
• Wait for readable event then read (POSIX)
• Read async. then wait for completion event (Win32)
![Page 44: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/44.jpg)
Socket Programming
Wait for the socket to become readable:
struct kevent ev[100]; // "struct" is required in C++, where the kevent() function hides the type name
int events = kevent(queue, nullptr, 0, ev, 100, nullptr);
Copy data from kernel space to user space:
for (int i = 0; i < events; ++i) {
  int size = read(ev[i].ident, buffer, buffer_size);
  /* ... */
}
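The readiness model can be combined with draining the socket. The sketch below swaps `kevent()` for `poll()` so that it runs on any POSIX system, not only on the BSDs and macOS. The helper names and the 4096-byte stack buffer are arbitrary choices for this example. The descriptor has to be in non-blocking mode so that `read()` reports an empty kernel buffer instead of blocking.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Put fd in non-blocking mode, so read() fails with EAGAIN once drained.
bool set_nonblocking(int fd) {
  int const flags = fcntl(fd, F_GETFL, 0);
  return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Wait until fd is readable (or the timeout expires), then read until the
// kernel buffer is empty. Appends the data to `out`.
// Returns the number of bytes read, 0 on timeout or EOF, -1 on error.
long wait_and_drain(int fd, int timeout_ms, std::string& out) {
  assert(fd >= 0);
  pollfd p;
  p.fd = fd;
  p.events = POLLIN;
  p.revents = 0;
  int const n = poll(&p, 1, timeout_ms);
  if (n < 0) return -1;
  if (n == 0) return 0;  // timed out, nothing to read

  long total = 0;
  char buffer[4096];
  for (;;) {
    ssize_t const r = read(fd, buffer, sizeof(buffer));
    if (r > 0) {
      out.append(buffer, static_cast<std::size_t>(r));
      total += r;
      continue;
    }
    if (r == 0) break;             // peer closed the connection
    if (errno == EINTR) continue;  // interrupted, try again
    if (errno == EAGAIN || errno == EWOULDBLOCK) break;  // drained
    return -1;
  }
  return total;
}
```

Each readable event costs one `poll()` call plus as many `read()` calls as it takes to empty the kernel buffer, which is where the choice of buffer size starts to matter.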
![Page 46: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/46.jpg)
Socket Programming
Initiate async. read into buffer:
WSABUF b = { buffer_size, buffer };
DWORD transferred = 0, flags = 0;
WSAOVERLAPPED ov; // [ initialization ]
int ret = WSARecv(s, &b, 1, &transferred, &flags, &ov, nullptr);
Wait for operations to complete:
WSAOVERLAPPED* ol;
ULONG_PTR key; // the completion key is returned by value through a PULONG_PTR
BOOL r = GetQueuedCompletionStatus(port, &transferred, &key, &ol, INFINITE);
Query status:
ret = WSAGetOverlappedResult(s, &ov, &transferred, false, &flags);
![Page 47: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/47.jpg)
Socket Programming
• Passing in a buffer up-front is preferable because:
• NIC driver can in theory receive data directly into your buffer and save a copy
• If there is a memory copy, it can be done asynchronously, not blocking your thread
![Page 48: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/48.jpg)
Socket Programming
• Problem: What buffer size should be used?
• Too large will waste memory
• Too small will waste system calls (since we need multiple calls to drain the socket)
![Page 49: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/49.jpg)
Socket Programming
• Problem: What buffer size should be used?
• Start with some reasonable buffer size
• If an async read fills the whole buffer, increase size
• If an async read returns significantly less than the buffer size, decrease size
Size adjustments should be proportional to the buffer size
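The adjustment rule can be sketched as a small class. The slides give no concrete numbers, so the doubling and halving factors and the "less than a quarter of the buffer" threshold below are assumptions chosen for illustration, as is the class name. What the sketch keeps from the slides is that every adjustment is proportional to the current size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class adaptive_buffer_size {
public:
  adaptive_buffer_size(std::size_t initial, std::size_t min_size,
                       std::size_t max_size)
      : m_size(initial), m_min(min_size), m_max(max_size) {
    assert(min_size > 0 && min_size <= initial && initial <= max_size);
  }

  // Buffer size to use for the next read.
  std::size_t size() const { return m_size; }

  // Report how many bytes the last read returned.
  void on_read(std::size_t bytes) {
    if (bytes >= m_size) {
      // The read filled the whole buffer: there was probably more data.
      m_size = std::min(m_max, m_size * 2);
    } else if (bytes < m_size / 4) {
      // The read used only a small part of the buffer: shrink it.
      m_size = std::max(m_min, m_size / 2);
    }
  }

private:
  std::size_t m_size;
  std::size_t m_min;
  std::size_t m_max;
};
```

With a starting size of 1024 bytes, two reads that fill the buffer grow it to 4096 bytes, and a run of nearly empty reads shrinks it back step by step to the configured minimum.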
![Page 50: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/50.jpg)
Context Switching
Adapt batch size to the computer’s natural granularity
Higher load should lead to larger batches, fewer context switches and higher efficiency.
Use of magic numbers is a red flag
![Page 51: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/51.jpg)
Message Queues
![Page 52: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/52.jpg)
Message Queues
• Events on message queues may come in batches
• Example: we receive one message per 16 kiB block read from disk.
void conn::on_disk_read(buffer const& buf) { m_socket.write(&buf[0], buf.size()); }
![Page 53: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/53.jpg)
Message Queues
• Problem: We want to flush our sockets right before we go to sleep, i.e. when we have drained the message queue, without starvation
![Page 55: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/55.jpg)
Message Queues
Instead of writing to the socket, accumulate the buffers. If there is no outstanding flush message, post one:
void conn::on_disk_read(buffer const& buf) {
  m_buf.insert(m_buf.end(), buf);
  if (m_has_flush_msg) return;
  m_has_flush_msg = true;
  m_queue.post(std::bind(&conn::flush, this));
}
Flush all buffers when all messages have been handled:
void conn::flush() { m_socket.write(&m_buf[0], m_buf.size()); }
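A self-contained version of the flush-message idea is sketched below. The `message_queue` and `fake_socket` types are stand-ins invented for this example, and the buffer is a `std::string` rather than the talk's `buffer` type. Unlike the slide, `flush()` here also clears the accumulated bytes and resets the flag, which a complete implementation would presumably need so that later batches are flushed too.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal single-threaded FIFO message queue (stand-in).
struct message_queue {
  std::deque<std::function<void()>> messages;

  void post(std::function<void()> m) {
    assert(m);
    messages.push_back(std::move(m));
  }

  // Handle messages in FIFO order until the queue is empty.
  void run() {
    while (!messages.empty()) {
      std::function<void()> m = std::move(messages.front());
      messages.pop_front();
      m();
    }
  }
};

// Stand-in socket that records the size of every write call.
struct fake_socket {
  std::vector<std::size_t> writes;
  void write(char const*, std::size_t n) { writes.push_back(n); }
};

struct conn {
  explicit conn(message_queue& q) : m_queue(q) {}

  void on_disk_read(std::string const& buf) {
    m_buf += buf;                 // accumulate instead of writing
    if (m_has_flush_msg) return;  // a flush is already queued behind us
    m_has_flush_msg = true;
    m_queue.post([this] { flush(); });
  }

  void flush() {
    m_has_flush_msg = false;
    if (m_buf.empty()) return;
    m_socket.write(m_buf.data(), m_buf.size());
    m_buf.clear();
  }

  message_queue& m_queue;
  fake_socket m_socket;
  std::string m_buf;
  bool m_has_flush_msg = false;
};
```

Because the queue is FIFO, the flush message posted by the first disk-read handler runs after every disk-read message that was already queued, so three queued 16 kiB blocks end up in one 48 kiB write.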
![Page 56: Writing High-Performance Software by Arvid Norberg](https://reader033.vdocuments.us/reader033/viewer/2022052618/554b5725b4c905e9388b4cf9/html5/thumbnails/56.jpg)
Message Queues
[Figure: FIFO message queue, message handler, flush message]