Thủ Thuật Hướng dẫn For a linked list which is faster to find 2022
Bạn đang tìm kiếm từ khóa For a linked list which is faster to find được Cập Nhật vào lúc : 2022-12-11 10:09:05 . Với phương châm chia sẻ Mẹo về trong nội dung bài viết một cách Chi Tiết 2022. Nếu sau khi Read Post vẫn ko hiểu thì hoàn toàn có thể lại Comments ở cuối bài để Mình lý giải và hướng dẫn lại nha.
Linked lists are celebrity data structures of software development. They are celebrities because every engineer has had something to do with them in one part of their career. They are used in many places: from low-level memory management in operating systems up to data wrangling and data filtering in machine learning.
Nội dung chính
- Why are linked lists slow?
- So, how does linked list performance depends on memory layout?
- How to make linked lists faster?
- Faster alternatives to linked lists
- Linked lists with a shared allocator
- Linked lists with a per-instance allocator
- Linked list backed by std::vector
- Unrolled linked lists
- Data filtering using linked lists
- Unordered collection of objects
- Specializing your linked list
- Conclusions
- Unless you are holding pointers to the elements of the linked list in other data structures, you dont need iterator stability.
- If you are filtering data, you want to be able to remove elements any location, but you will always insert them the beginning or the end.
- If you use linked list to keep a unsorted collection of objects, you dont care what location you want to insert your object. You only want to be able to insert and remove objects, and the same time keep the memory consumption low. You can cheapily insert into vector, but when you remove from it, you will waste memory space.
- Your linked list can be special in some way, e.g. most of your linked lists can be really small. In that case you want to optimize for the common linked list size.
- It avoids calling the allocate for each insertion to the list and deallocate for each deletion from the list.
- std::vector handles memory allocation in a simple way.
- The block of memory used for nodes is very compact, which increased data cache hit rate and TLB hit rate.
- Allocating from a single block of memory effectively avoids memory fragmentation. When the linked list is destroyed, the whole block of memory is returned back to the operating system.
- It uses indexes instead of pointers to point to the next item in the list, and indexes have smaller sizes. The node itself is smaller which increases data cache hit rate.
- If you create a list by inserting elements the beginning or end, you are guaranteed to have the perfect ordering and the best performance.
- It offers compact operation which recreates the perfect ordering of nodes in the memory and can be called when the linked list becomes too fragmented. It also releases unused memory back to the operating system.
- It uses more memory. If std::vector regrows bad time, it can consume two times more memory than actually needed.
- If many nodes are deleted from the list, many nodes can reside in the không lấy phí nodes list but are actually unused. This increases memory consumption. In that case it pays off to perform the compact operation.
- The iterators are not stable. When the vector regrows, the iterators get invalidated. But the indexes into the backing vector remain stable, so you can use them instead.
- Better data cache usage. Using unrolled linked list increases the data cache hit rate, TLB cache hit rate and general performance.
- It avoids calls to the system allocator. Instead of having N calls for the simple linked list of size N, a unrolled linked list with a factor X makes N/X calls to the system allocator, which is significantly smaller.
- Depending on the implementation, they can use more memory than regular linked lists (this happens if a single node is not completely filled).
- Iteration is more complex. For a simple linked list, the iteration is just a pointer dereference. For an unrolled linked list, the iteration can be either pointer dereference (end of list), or an moving to the next value in the node (which is increment).
- Depending on the implementation, iterators might not be stable. To optimize for performance, some shuffling inside a single node might be needed when a new element is added to the list which can decrease performance.
- A program with many linked lists, 95% of the lists have 2 elements in them. In this case, the data cache misses would not come from iterating through the list, but from switching from linked list X to linked list X + 1. Here, the challenge is not packing a single list, but packing a large array of small linked lists.
- A program that filters data, but after the initial filling, 90% of the data is removed in the first pass. If we were to use data filtering using linked lists as we described earlier, the vector would be sparsely populated and we would be left with a vector that is no better than a simple linked list. One of the solutions would be, in the first filtering pass, to allocate another vector and copy only remaining elements to it. Then we get rid of the original vector. The data in the new vector is perfectly ordered, so we can expect good performance later.
- A text editor. Although it might seem like a good idea to keep the text the user is entering in a linked list of characters for easy modification, in reality it is never done like that. This would fragment the memory too much, also the next pointer would take unnecessarily too much space. Text editors employ a data structure called Gap buffer, which allow cheap insertion and removal clustered around the same location. Internally, gap buffers ressemble regular memory buffers, but the text they hold is split in two halfs: left and right. Between the halfs, there is a gap, and inserting data into the gap is very cheap. More on gap buffers on Wikipedia.
- source: Wikipedia []
- source::Wikipedia []
- source: PLF Colony web site []
They promise a lot: if the list has n elements, you can traverse it in O(n) time, and you can insert into it and remove from it in constant time. In contrast to vectors, where insertion a random place and removal from a random place has a O(n) complexity, they seem like perfect data structures when you need to filter data, reorganize data, append data in random places, etc.
But, under all the pretty promises, things dont look that shiny. Linked lists are data structures that are notorious for bad hardware resource usage, particularly bad usage of the memory subsystem. Why?
Why are linked lists slow?
We wrote several times in this block, but we are restating it again. On modern hardware, the bottleneck is often the memory subsystem. CPU is fast but the memory isnt. What this means is that between the request made to the memory subsystem and the response, the CPU can wait up to 300 cycles. For this reason, CPUs contain a component called a prefetcher that can figure out the access pattern of your program and prefetch data before the data actually becomes needed.
Linked lists suffer from slowdowns because, while traversing the linked list, your program is following the next pointer and jumps around the memory in an unpredictable fashion. In arrays, the memory access is predictable: if CPU is processing an array of integers and it is currently processing an element address X, the next element will be address X + sizeof(int). In linked lists, the address of the next element will be known only after the next pointer is dereferenced. There is no pattern to memory access, and the CPU has to pay the full price of data cache misses.
Additionally, the data from the memory is fetched typically in blocks of 64 bytes. Arrays take advantage of this, because if the block doesnt belong to the currently processed element, it belongs to the next element. On the contrary, part of the 64 bytes of data that is fetched from the memory will belong to the current node, but the other part can belong to some completely irrelevant component and the rest of the block is wasted data transfer.
Like what you are reading? Follow us on LinkedIn or and get notified as soon as new content becomes available.
Need help with software performance? Contact us!
So, how does linked list performance depends on memory layout?
When someone tries to explain the concept of a linked list to you, they do it by using the following chart:
Linked list1
A linked list consists of nodes. Each node contains a value and a pointer to the next node. The last node in the list points to a termination node, which is typically a nullptr. In this example, the linked list consists of three nodes, with corresponding values 12, 99 and 37.
From the performance perspective, however, this chart is incomplete. All nodes have their places in memory, and the memory layout is crucial for performance. Here are three different layouts of the same linked list:
Three different memory layouts of a linked list
The first layout (we call random layout)is the layout you might come up with if your system is allocating and deallocating nodes for a long period of time. It is the least efficient: the nodes are not close to one another in memory and, while iterating through the linked list, the access pattern is completely unpredictable.
The second layout (we call it compact layout) is better: the nodes are close to one another in memory. The memory access pattern is still unpredictable, but the fact that the nodes are close to one another increases the chance that the next node is in CPUs data cache and therefore increases performance.
The third layout (perfect layout)is the best. While iterating through the list, the program is accessing neighboring memory locations. The data prefetcher can figure out the memory access pattern and can preload the data so that the data is available before the CPU even needs it.
If you are using a simple linked list, e.g. std::list, you dont have any control of the memory layout. It might happen that for a small test program std::list is very fast, but in a production environment it is much slower. The reason is simple: in the small testing environment, the system allocator can return neighboring memory chunks thus creating a perfect memory layout for std::list. In a production environment, however, this is often not possible as memory needs to be reused, so the allocator will return chunks of memory that were used earlier.
Lets compare linked list performance with respect to the memory layout. For this test, we will use linked lists of doubles, and we will sum up all the values in the list. We pick three sizes of linked lists: small (128 doubles), medium (1M doubles) and large (64M doubles). We iterate each linked list enough times so that in the end we iterated through 64M values.
Here are the numbers for different linked list sizes:
Linked List SizeRandom Memory LayoutCompact Memory LayoutPerfect Memory LayoutSmall (128 doubles)
Runtime:
L3 Data Cache Request Rate:
L3 Data Cache Miss Rate:
Data Transferred from Memory:
0.08 s
~ 0%
~ 0%
0.26 GB
0.07 s
~0%
~0%
0.25 GB
0.07 s
~0%
~0%
0.25 GBMedium (1M doubles)
Runtime:
L3 Data Cache Request Rate:
L3 Data Cache Miss Rate:
Data Transferred from Memory:
8.2 s
22 %
92 %
21.7 GB
6 s
23 %
68 %
13 GB
0.12 s
1 %
89 %
1.25 GBLarge (64 M doubles)
Runtime:
L3 Data Cache Request Rate:
L3 Data Cache Miss Rate:
Data Transferred from Memory:
15 s
24 %
99 %
34.2 GB
9.7 s
24 %
99 %
25.2 GB
0.12 s
1 %
99 %
1.36 GBRuntimes and other parameters for different memory layouts of linked lists
Lets analyze the numbers in the table. For a small linked list, the memory layout is not that important. We can reasonably expect that the whole linked list fits the L1 cache. You can see that the data transferred from the memory remains constant regardless of the layout, which again confirms our hypothesis that the data set fits the data cache. The number of requests made to the L3 cache is almost zero as well.
For medium and large linked lists, the memory layout becomes much more important. Whereas the time needed to traverse the linked list with perfect memory layout remains stable around 0.12 s, the numbers for two other memory layouts explode! The random memory layout is 68 times slower for the medium data set and 125 times slower for the large data set. The linked list with a compact memory layout, even though a bit better, is still much slower than the linked list with a perfect memory layout.
We see that the worse the layout, the amount of the data transferred from the main memory becomes larger. This is expected because of the block organization of the data caches: we load 64 bytes of data from the memory, even though we are using only 16 bytes.
Note that for the large linked list, the L3 data cache miss rate is 99%. This means that the L3 cache is useless for the data set of this size.
IMPORTANT NOTE: As you have seen, the performance of the linked list depends a lot on its memory layout. And the memory layout depends very much on the environment your program is running: the underlying memory allocator, other data structures in the program that allocate a lot of memory (other linked lists, trees or heap-allocated objects) and the way your program uses its linked lists (modifying linked list a lot ruins its good memory layout). So, the numbers we presented and we will present in the following section may look a lot different for your case.
How to make linked lists faster?
The question is if we can make linked lists faster? The answer is yes, but, as you will see later, the process of optimizing linked lists is not simple and straightforward.
The first conclusion that comes up naturally, is that for optimum performance of the linked list you need to strive for the perfect memory layout. The closer the memory layout is to the perfect layout, the faster will it run.
Second, notice that linked lists are very flexible data structures. You can add and remove elements random locations with O(1) complexity. All the pointers remain where they are regardless of the insertion and removal. It is very hard to think of a data structure that has similar properties.
In reality, much of the flexibility is not needed. For example:
Most of the time you dont need the full flexibility of a linked list, only some part of it. Although we will give ideas on how to speed up regular linked lists, many times you will want to specialize your linked list for your application as this will result in the greatest speedup.
Faster alternatives to linked lists
So how do we make linked lists faster? The first thing you need to understand is that, when iterating over a linked list, you can expect a data cache miss every time you move to the next element unless the memory layout is perfect. So, if you dont need to iterate over the list, dont do it.
Alternatively, maybe you need to iterate over the list, but you can iterate only over some part of the list. You can introduce skipping pointers (or skipping iterators), to skip over the elements in the list that you dont need to access right now.
Linked list with two fields: next (used for regular iteration), and skip (used for fast iteration)
In the above example, we can use the skip field to skip over the element in the linked list we are not currently processing, but that we dont wish to delete.
Linked lists with a shared allocator
One of the ways to increase the performance of linked lists is to use a custom iterator for your linked list. We talked about custom allocators in the previous post about allocation and performance, but the summary is the following: with a custom allocator, all the memory for the linked list nodes can be allocated from a dedicated block of memory. Because of this, the nodes that make the list are more compact in memory. This increases the data cache hit rate and TLB cache hit rate.
Standard STL containers, including std::list and stl::forward_list can be configured with a custom allocator. The process is really simple, and you can find an allocator trực tuyến (e.g. moya allocator). The downside to this approach is that all the instances of the same type share the same allocator. For example, all instances of std::list<int, my_alloc> will share the same allocator. This approach is good if you have many small linked lists, but can be problematic if you have a few, but large linked lists.
Using a custom allocator typically results that the nodes of the list are more compact in memory, which corresponds to the compact layout we described earlier. But if you have many instances of the same list, they all will share the same allocator, and as a consequence, you should not expect that the consecutive nodes in the list are consecutive in memory.
Linked lists with a per-instance allocator
For larger linked lists it can pay off to have a separate instance allocator for each list. If the allocator returns consecutive chunks in memory, this will translate to consecutive nodes being consecutive memory locations, which further results in an almost perfect ordering of nodes in memory and the performance comparable to those of vectors.
Unfortunately, STL doesnt offer such data structures, so you will have to write them yourself or try to find implementations trực tuyến.
Linked list backed by std::vector
We wrote an implementation of a linked list called jsl::vector_list that uses std::vector instead of an allocator for storage. The list allocates memory for its nodes from the elements of underlying std::vector. When a node is deleted, it is returned to the list of không lấy phí nodes. This node can be later reused when new values are inserted into the list.
There are several advantages to this approach:
There are also some disadvantages:
Since we implemented the data structures, we implemented a quick performance evaluation. We compared the performance of std::forward_list<int> and jsl::vector_list<int> for various operations. Please note that in this small test program std::forward_list will probably have a near perfect memory layout initially since it is the only data structure in the program that allocates memory.
Here are the numbers:
Operationstd::forward_list runtimejsl::vector_list runtime(1) push_back 128 M integers2.9 seconds0.9 seconds(2) erase every fifth element0.45 seconds0.2 seconds(3) insert a new int every third place1.2 seconds0.7 seconds(4) traverse the whole list2.1 seconds0.4 seconds(5) traverse after compact0.27 secondsComparison between std::forward_list and jsl::vector_list
We performed altogether five operations on our lists. At step 1 we inserted 128 M integers into both lists. The reason why std::forward_list is slower is probably because of the need to call malloc for each element of the list. Our jsl::vector_list doesnt call malloc, so typically allocation is very fast.
The same applies for step 2: deleting from the std::forward_list is slower because it needs to call không lấy phí for each element of the list. In step 3 we insert a new integer every third place in the list. In step 4 we traverse both lists and compare the runtimes. Both lists suffer from slowdowns because of an imperfect memory layout, but jsl::vector_list suffers less since std::vector will keep the block of memory as compact as possible. Traversing the whole list is more than 5 times faster for the jsl::vector_list. After the layout of jsl::vector_list is made perfect again using compact, jsl::vector_list is more than 7 times faster.
Like what you are reading? Follow us on LinkedIn or and get notified as soon as new content becomes available.
Need help with software performance? Contact us!
Unrolled linked lists
Unrolled linked lists are another way of making linked lists faster. The main idea behind unrolled linked lists is to store more than one value in a single linked list node.
An unrolled linked list with up to 4 values in each node.2
In the above example, the list stores 4 elements in a single node. The advantages of such solutions are:
There are also downsides:
We compared the performance of unrolled and regular linked lists in the earlier post about ways to make your program run faster by better using the data cache. Here are the numbers from the same post:
As you can see, the higher the number of values in the node, the better the performance. This is especially true if you are holding large data types as part of your list.
An unrolled linked lists implementation is available in the PLF C++ Library. It is a drop-in replacement for std::list, so it can be used with minimum changes to your code. The authors claim that their implementation is 293% faster than std::list when it comes to insertion, 57% faster when it comes to erasure and 17% faster when it comes to iteration.
Data filtering using linked lists
When doing data filtering, first you are filling your linked list with data, and then you are removing data that is not according to a certain criterion. Linked lists are also used for this, but the process can be optimized.
In this particular case ordering of values doesnt change if we remove other values. If we use std::vector to store our data, we can get rid of the next pointer, saving some space but also increasing the data cache hit rate. We can store information about used and freed values in a bitset. Each bit in a bitset corresponds to a value: if the bit set the value is present, otherwise it is left out.
Similarly, for a linked list backed by a vector, we can perform a compact operation to get rid of the gaps and make the memory access more data cache-friendly, after we have performed a sufficient number of element removal.
We implemented jsl::filter_vector which is a data filtering list backed by a vector. It uses boost::dynamic_bitset to store information about không lấy phí and occupied nodes. Here is the comparison of jsl::filter_vector and jsl::vector_list. Here are the numbers:
Operationstd::listjsl::filter_vector(1) insert 256M integers6 s1.8 s(2) traverse the whole list0.72 s0.66 s(3) erase 4 out of 5 elements (leave every 5th)2.1 s0.92 s(4) traverse the whole list0.73 s0.38 s(5) traverse the whole list after compact0.12 sComparison between std::list and jsl::filter_vector
We see a similar pattern as with jsl::vector_list. Insertion and deletions are slower because of the need to work with dynamic memory through the system allocator. After the insertion, the times needed to traverse the whole list is similar for both std::list and jsl::filter_vector.
After erasure, although we removed 80% of the element in the list, the time needed o traverse std::list didnt change, and the time needed to traverse jsl::filter_vector was less than two times shorter. The real performance improvement happens after the compact operation on jsl::filter_vector. The runtime is now three times faster than the runtime before the compact.
The time needed to traverse jsl::filter_list of 256 M integers was originally 0.66 seconds. After we removed 80% of the elements, the runtime was 0.38 s, which is less than 2 times faster. But, after the compact operation, the runtime is 0.12 seconds, which is 5 times faster than the original. Only after restoring the perfect memory layout is the speedup (5x) as one would expect.
Unordered collection of objects
The trò chơi developers first had this problem: they have a collection of objects, but the ordering is not important. What is important is that you can quickly traverse the collection, you can quickly add an object and quickly erase it. A linked list might seem like a good solution to this problem, but requirements for trò chơi performance mean that a simple linked list doesnt cut it.
So they developed a new data structure called colony that can be used to store a collection of unordered objects, with traversal, insertion and erasure complexities which are the same as those of linked lists. But colonies are much more data cache-friendly, because they internally use vectors.
Colonies allocate memory in blocks that can hold a fixed number of objects. When a new object is added to the colony, if there is no place available in any of the blocks, the colony will allocate a new memory block to hold the new object. When the object is deleted, its place is the memory block is freed and can be reused later, but colonies typically do not release memory back to the system except when the whole block of memory is không lấy phí.
Insertion into colony3
Iterators in the colonies are stable, i.e. once an object is added to the colony, its address will not change. This is important for gaming, where objects in different colonies can reference one another using pointers.
Colonies are currently in the process of being added to STL as std::colony (or std::hive). In the meantime, you can find colony implementation as plf::colony.
According to its authors, its performance is better than any std:: library container for unordered insertion, better than any std:: library container for erasure, and better than any std:: library container except for vector and dequeue.
Specializing your linked list
We gave here several interesting alternatives for a linked list, but in the end, the data pattern and data access pattern in your particular program will define what is best for you. When doing this, you should have the following guidelines in mind: (1) keeping the block of memory from which you allocate nodes as tight as possible and (2) making sure you are running through the memory sequentially should guarantee optimal performance.
Here are some examples where none of the raw approaches we presented would work:
Like what you are reading? Follow us on LinkedIn or and get notified as soon as new content becomes available.
Need help with software performance? Contact us!
Conclusions
What are the takeaways of this post? The main takeaway is that the memory layout is tremendously important for achieving good performance. A perfect memory layout (where neighboring nodes are neighboring in elements) is almost as fast as a vector.
The second is that maintaining the perfect memory layout requires some development effort on the developers side, and also it requires some additional runtime cost on the programs side. It might happen that these costs are too high and that the performance of an ordinary list is just enough. That is up to you as a developer to measure and decide.
And finally, there is no magic wand when it comes to linked list performance. While understanding the importance of good memory layout, you will need to analyze the usage pattern and with that understanding, create a data structure that is perfect for your particular application.
Reply
8
0
Chia sẻ
Share Link Down For a linked list which is faster to find miễn phí
Bạn vừa Read nội dung bài viết Với Một số hướng dẫn một cách rõ ràng hơn về Video For a linked list which is faster to find tiên tiến và phát triển nhất và Chia SẻLink Tải For a linked list which is faster to find Free.
Giải đáp vướng mắc về For a linked list which is faster to find
Nếu sau khi đọc nội dung bài viết For a linked list which is faster to find vẫn chưa hiểu thì hoàn toàn có thể lại Comment ở cuối bài để Admin lý giải và hướng dẫn lại nha
#linked #list #faster #find