0%

redisList

列表对象

redis中的列表对象相当于链表,使用的是quicklist实现,quicklist其实是一个ziplist组成的链表。

ziplist

ziplist是一种压缩链表,将链表元素类似于数组般放在一块连续区域里,其结构如下图所示。

ziplist

1
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

头尾

  • <uint32_t zlbytes>: ziplist占用的总字节长
  • <uint32_t zltail>: 最后一个entry的偏移地址
  • <uint16_t zllen> ziplist中包含的Entey的数量
  • <uint8_t zlend>: 尾部标识,固定为255(0xFF)

创建一个ziplist的代码如下,因为刚创建,所以只有头尾。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/* The size of a ziplist header: two 32 bit integers for the total
* bytes count and last item offset. One 16 bit integer for the number
* of items field. */
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE (sizeof(uint8_t))

unsigned char *ziplistNew(void) {
/* ziplist头尾的大小 */
unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
unsigned char *zl = zmalloc(bytes);

/* 初始化ziplist
* intrev32ifbe 转小端,仅在机器为大端的时候起作用 */
ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
ZIPLIST_LENGTH(zl) = 0;
zl[bytes-1] = ZIP_END;
return zl;
}

Entry

Entry中存的就是ziplist中的数据,其结构为:

1
<prevlen> <encoding> <entry-data>
prevlen

prevlen表示前一个entry的长度,因为list是双向链表,在向前遍历的时候,需要能获取上一个entry的地址。其编码规则如下

  • 如果长度小于254字节,prevlen1个字节,并直接存储长度
  • 如果长度大于等于254字节,prevlen5个字节,第一个字节为254(0xFE),后续4个字节存储长度

为啥是254呢?

因为END是255,当对ziplist进行向后遍历的时候,到下一个entry,首先判断该entry的第一个字节。若为255,表示当前ziplist结束了;若为254,则需要跳过5个字节到encoding;否则跳一个字节到encoding。

encoding

encoding部分包括了数据部分的长度和类型。前两位为11,表示该数据为数字类型,该数字将和encoding一起存储。具体规则如下:

  • 00pppppp: 1字节。表示长度小于64(2^6)字节的字符数组,pppppp表示该字符串长度。
  • 01pppppp|pppppppp: 2字节。表示长度小于16384(2^14)字节的字符数组。PS:14位数字大端存储
  • 10000000|pppppppp|pppppppp|pppppppp|pppppppp: 5字节。表示长度小于2^32字节的字符数组。PS:第一个字节的低6位未使用设置为0,32位数字大端存储
  • 11000000: int16编码的整数,encoding和data总长度为3bytes
  • 11010000: int32编码的整数,encoding和data总长度为5bytes
  • 11100000: int64编码的整数,encoding和data总长度为9bytes
  • 11110000: int24编码的整数,encoding和data总长度为4bytes
  • 11111110: int8编码的整数,encoding和data总长度为2bytes
  • 1111xxxx: xxxx取值范围为0001-1101,表示值0-12,因为0000不能使用,所以0001表示0,相当于4位的二进制值减一
  • 11111111: ziplist的结尾

接下来举个ziplist的🌰,向一个ziplist中存入"2""5"两个字符串。PS:数据大都小端存放,小端即低字节保存在低地址中;在存储的时候,都会判断字符串是否能转成数字类型,若能,则使用数字编码。

[0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]

分别对应[zlbytes][zltail][zllen][entry: "2"][entry: "5"][zlend]

  • 0f 00 00 00: 小端存放15,表示共占15字节
  • 0c 00 00 00: 小端存放12,表示最后一个元素5的偏移地址为12
  • 02 00: 小端存放2,表示ziplist中共两个元素
  • 00 f3: 00表示前一个entry长度为0,f3表示数字2
  • 00 f6: 02表示前一个entry长度为2,f6表示数字5
  • ff: 结束标识符zlend

再加一个长点的字符串"hello world",其entry为:

[02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]

  • 02: 表示前一个entry长度为2
  • 0b: 表示数据是11字节的字符串
  • [48 65 6c 6c 6f 20 57 6f 72 6c 64]hello world的ASCII

在redis中有一个结构体存储entry的相关信息。PS:仅是为了操作方便,并非其实际编码方式。为了描述方便,本文中部分注释处会使用zlentry(p)表示获取p指向的entry的信息,在实际代码中使用的是zipEntry(unsigned char *p, zlentry *e)函数实现获取entry信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/* We use this function to receive information about a ziplist entry.
* Note that this is not how the data is actually encoded, is just what we
* get filled by a function in order to operate more easily. */
typedef struct zlentry {
/* prevrawlen编码后的长度 */
unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
/* prevrawlen的值 */
unsigned int prevrawlen; /* Previous entry len. */
/* encoding的长度 */
unsigned int lensize; /* Bytes used to encode this entry type/len.
For example strings have a 1, 2 or 5 bytes
header. Integers always use a single byte.*/
/* 数据编码后的长度 */
unsigned int len; /* Bytes used to represent the actual entry.
For strings this is just the string length
while for integers it is 1, 2, 3, 4, 8 or
0 (for 4 bit immediate) depending on the
number range. */
unsigned int headersize; /* prevrawlensize + lensize. */
unsigned char encoding; /* Set to ZIP_STR_* or ZIP_INT_* depending on
the entry encoding. However for 4 bits
immediate integers this can assume a range
of values and must be range-checked. */
unsigned char *p; /* Pointer to the very start of the entry, that
is, this points to prev-entry-len field. */
} zlentry;

部分函数

获取prevlensize的函数,即prevlen是占1个字节还是5个字节。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#define ZIP_BIG_PREVLEN 254

/* Encode the length of the previous entry and write it to "p". This only
* uses the larger encoding (required in __ziplistCascadeUpdate). */
int zipStorePrevEntryLengthLarge(unsigned char *p, unsigned int len) {
uint32_t u32;
if (p != NULL) {
p[0] = ZIP_BIG_PREVLEN;
u32 = len;
memcpy(p + 1, &u32, sizeof(u32));
memrev32ifbe(p + 1);
}
return 1 + sizeof(uint32_t);
}

/* Encode the length of the previous entry and write it to "p". Return the
* number of bytes needed to encode this length if "p" is NULL. */
unsigned int zipStorePrevEntryLength(unsigned char *p, unsigned int len) {
if (p == NULL) {
return (len < ZIP_BIG_PREVLEN) ? 1 : sizeof(uint32_t) + 1;
} else {
if (len < ZIP_BIG_PREVLEN) {
p[0] = len;
return 1;
} else {
return zipStorePrevEntryLengthLarge(p, len);
}
}
}
插入节点
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
/* Insert item at "p". */
/* zl: 被插入的ziplist对象
* p: 插入的地址
* s: 插入的值
* slen: s的长度
* 首先获取s编码成entry的的信息: prevlen、总长度reqlen
* 然后获取要将后续entry往后移动的距离,将其复制过去,并进行级联更新
* 最后填入entry的信息 */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p,
unsigned char *s, unsigned int slen) {
size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen, newlen;
unsigned int prevlensize, prevlen = 0;
size_t offset;
int nextdiff = 0;
unsigned char encoding = 0;
long long value = 123456789; /* initialized to avoid warning. Using a value
that is easy to see if for some reason
we use it uninitialized. */
zlentry tail;

/* Find out prevlen for the entry that is inserted. */
if (p[0] != ZIP_END) {
/* 获取p指向entry的prevlen */
ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
} else {
/* 获取最后一个entry的len */
unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
if (ptail[0] != ZIP_END) {
prevlen = zipRawEntryLengthSafe(zl, curlen, ptail);
}
}

/* 获取reqlen,先获取数据所需编码长度 */
/* See if the entry can be encoded
* 尝试转成数字编码,成功则value和encoding转成对应值*/
if (zipTryEncoding(s, slen, &value, &encoding)) {
/* 'encoding' is set to the appropriate integer encoding */
reqlen = zipIntSize(encoding);
} else {
/* 'encoding' is untouched, however zipStoreEntryEncoding will use the
* string length to figure out how to encode it. */
reqlen = slen;
}
/* We need space for both the length of the previous entry and
* the length of the payload. */
/* 加上prevlen的长度 */
reqlen += zipStorePrevEntryLength(NULL, prevlen);
/* 加上encoding的长度 */
reqlen += zipStoreEntryEncoding(NULL, encoding, slen);

/* When the insert position is not equal to the tail, we need to
* make sure that the next entry can hold this entry's length in
* its prevlen field. */
int forcelarge = 0;
/* zipPrevLenByteDiff 得到表示reqlen所需的prevlenszie - p.prevlenszie*/
nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p, reqlen) : 0;
if (nextdiff == -4 && reqlen < 4) {
nextdiff = 0;
forcelarge = 1;
}

/* Store offset because a realloc may change the address of zl. */
offset = p - zl;
newlen = curlen + reqlen + nextdiff;
/* 调整分配的空间 */
zl = ziplistResize(zl, newlen);
p = zl + offset;

/* Apply memory move when necessary and update tail offset. */
if (p[0] != ZIP_END) {
/* Subtract one because of the ZIP_END bytes */
/* 移动后续entry的内容 */
memmove(p + reqlen, p - nextdiff, curlen - offset - 1 + nextdiff);

/* Encode this entry's raw length in the next entry. */
if (forcelarge)
zipStorePrevEntryLengthLarge(p + reqlen, reqlen);
else
zipStorePrevEntryLength(p + reqlen, reqlen);

/* Update offset for tail */
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + reqlen);

/* When the tail contains more than one entry, we need to take
* "nextdiff" in account as well. Otherwise, a change in the
* size of prevlen doesn't have an effect on the *tail* offset. */
assert(zipEntrySafe(zl, newlen, p + reqlen, &tail, 1));
if (p[reqlen + tail.headersize + tail.len] != ZIP_END) {
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + nextdiff);
}
} else {
/* This element will be the new tail. */
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p - zl);
}

/* When nextdiff != 0, the raw length of the next entry has changed, so
* we need to cascade the update throughout the ziplist */
if (nextdiff != 0) {
offset = p - zl;
zl = __ziplistCascadeUpdate(zl, p + reqlen);
p = zl + offset;
}

/* Write the entry */
p += zipStorePrevEntryLength(p, prevlen);
p += zipStoreEntryEncoding(p, encoding, slen);
if (ZIP_IS_STR(encoding)) {
memcpy(p, s, slen);
} else {
zipSaveInteger(p, value, encoding);
}
ZIPLIST_INCR_LENGTH(zl, 1);
return zl;
}

流程图如下:

ziplist_insert
删除节点
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
/* Delete "num" entries, starting at "p". Returns pointer to the ziplist. */
/* 在zl指向的ziplist中,从p指向的entry开始删除num个entry */
unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p,
unsigned int num) {
unsigned int i, totlen, deleted = 0;
size_t offset;
int nextdiff = 0;
zlentry first, tail;
size_t zlbytes = intrev32ifbe(ZIPLIST_BYTES(zl));
/* 将p指向的entry信息填入first */
zipEntry(p, &first); /* no need for "safe" variant since the input pointer validated by the function that returned it. */
/* 将p指向删除m个entry之后的entry地址 */
for (i = 0; p[0] != ZIP_END && i < num; i++) {
p += zipRawEntryLengthSafe(zl, zlbytes, p);
deleted++;
}

assert(p >= first.p);
/* 需要删除的长度 */
totlen = p - first.p;
if (totlen > 0) {
uint32_t set_tail;
if (p[0] != ZIP_END) {
/* Storing `prevrawlen` in this entry may increase or decrease the
* number of bytes required compare to the current `prevrawlen`.
* There always is room to store this, because it was previously
* stored by an entry that is now being deleted. */
/* nextdiff = first.prevrawlensize - zlentry(p).prevrawlensize */
nextdiff = zipPrevLenByteDiff(p, first.prevrawlen);

/* Note that there is always space when p jumps backward: if
* the new previous entry is large, one of the deleted elements
* had a 5 bytes prevlen header, so there is for sure at least
* 5 bytes free and we need just 4. */
p -= nextdiff;
assert(p >= first.p && p < zl + zlbytes - 1);
/* 更新p的prevlen */
zipStorePrevEntryLength(p, first.prevrawlen);

/* Update offset for tail */
set_tail = intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) - totlen;

/* When the tail contains more than one entry, we need to take
* "nextdiff" in account as well. Otherwise, a change in the
* size of prevlen doesn't have an effect on the *tail* offset. */
assert(zipEntrySafe(zl, zlbytes, p, &tail, 1));
if (p[tail.headersize + tail.len] != ZIP_END) {
set_tail = set_tail + nextdiff;
}

/* Move tail to the front of the ziplist */
/* since we asserted that p >= first.p. we know totlen >= 0,
* so we know that p > first.p and this is guaranteed not to reach
* beyond the allocation, even if the entries lens are corrupted. */
/* 获取需要移动的字节数 */
size_t bytes_to_move = zlbytes - (p - zl) - 1;
memmove(first.p, p, bytes_to_move);
} else {
/* The entire tail was deleted. No need to move memory. */
set_tail = (first.p - zl) - first.prevrawlen;
}

/* Resize the ziplist */
offset = first.p - zl;
zlbytes -= totlen - nextdiff;
/* 调整内存大小 */
zl = ziplistResize(zl, zlbytes);
p = zl + offset;

/* Update record count */
ZIPLIST_INCR_LENGTH(zl, -deleted);

/* Set the tail offset computed above */
assert(set_tail <= zlbytes - ZIPLIST_END_SIZE);
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(set_tail);

/* When nextdiff != 0, the raw length of the next entry has changed, so
* we need to cascade the update throughout the ziplist */
if (nextdiff != 0) zl = __ziplistCascadeUpdate(zl, p);
}
return zl;
}

流程图如下:

ziplist_insert
级联更新 (__ziplistCascadeUpdate)

在插入和删除中都有级联更新这个操作,因为entry的长度超过254prevlen字段会由1个字节增到5个字节。考虑一连串长度在250-253间的entry,在前面加一个长度大于254的entry,会使得接下来的几串entry长度都突破254,也就是需要同时更新这一串的entry,这就是级联更新;删除同理,但是删除不会一直更新下去,第一个更新后,如果下一个prevlen由5个字节变成了1个字节,会使用5个字节存prevlen。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
/* The pointer "p" points to the first entry that does NOT need to be
* updated, i.e. consecutive fields MAY need an update. */
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
zlentry cur;
size_t prevlen, prevlensize, prevoffset; /* Informat of the last changed entry. */
size_t firstentrylen; /* Used to handle insert at head. */
size_t rawlen, curlen = intrev32ifbe(ZIPLIST_BYTES(zl));
size_t extra = 0, cnt = 0, offset;
size_t delta = 4; /* Extra bytes needed to update a entry's prevlen (5-1). */
unsigned char *tail = zl + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl));

/* Empty ziplist */
if (p[0] == ZIP_END) return zl;
/* 获取p指向entry的信息 */
zipEntry(p, &cur); /* no need for "safe" variant since the input pointer was
validated by the function that returned it. */
firstentrylen = prevlen = cur.headersize + cur.len;
/* 获取prevlen占的字节数 */
prevlensize = zipStorePrevEntryLength(NULL, prevlen);
prevoffset = p - zl;
/* 指向第一个需要更新的entry */
p += prevlen;

/* Iterate ziplist to find out how many extra bytes do we need to update it. */
while (p[0] != ZIP_END) {
assert(zipEntrySafe(zl, curlen, p, &cur, 0));

/* Abort when "prevlen" has not changed. */
if (cur.prevrawlen == prevlen) break;

/* Abort when entry's "prevlensize" is big enough. */
if (cur.prevrawlensize >= prevlensize) {
/* (5,5) (1,1)*/
if (cur.prevrawlensize == prevlensize) {
zipStorePrevEntryLength(p, prevlen);
} else {
/* This would result in shrinking, which we want to avoid.
* So, set "prevlen" in the available bytes. */
/* 为了防止继续更新下去,将1个字节的prevlen存成5个字节 */
/* (5,1) */
zipStorePrevEntryLengthLarge(p, prevlen);
}
break;
}

/* cur.prevrawlen means cur is the former head entry. */
/* prevrawlensize只剩下(1,5)了,也就是从1增到5导致的级联更新 */
assert(cur.prevrawlen == 0 || cur.prevrawlen + delta == prevlen);

/* Update prev entry's info and advance the cursor. */
/* 为下一轮循环更新数据 */
rawlen = cur.headersize + cur.len;
prevlen = rawlen + delta;
prevlensize = zipStorePrevEntryLength(NULL, prevlen);
prevoffset = p - zl;
p += rawlen;
/* extra存放增长的字节数 */
extra += delta;
cnt++;
}

/* Extra bytes is zero all update has been done(or no need to update). */
/* 不是由prevrawlensize增长导致的级联更新已经结束了 */
if (extra == 0) return zl;

/* Update tail offset after loop. */
if (tail == zl + prevoffset) {
/* When the the last entry we need to update is also the tail, update
* tail offset unless this is the only entry that was updated (so the
* tail offset didn't change). */
/* tail不关心最后一个entry的长度 */
if (extra - delta != 0) {
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(
intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + extra - delta);
}
} else {
/* Update the tail offset in cases where the last entry we updated is not the tail. */
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)) + extra);
}

/* Now "p" points at the first unchanged byte in original ziplist,
* move data after that to new ziplist. */
offset = p - zl;
zl = ziplistResize(zl, curlen + extra);
p = zl + offset;
memmove(p + extra, p, curlen - offset - 1);
p += extra;

/* Iterate all entries that need to be updated tail to head. */
/* 由后往前复制entry,p指向的是第一个不需要更改的entry,prevoffset当前需要修改的entry的偏移位置 */
while (cnt) {
zipEntry(zl + prevoffset, &cur); /* no need for "safe" variant since we already iterated on all these entries above. */
rawlen = cur.headersize + cur.len;
/* Move entry to tail and reset prevlen. */
/* 减cur.prevrawlensize是因为prevlen不需要复制过去,是需要修改的 */
memmove(p - (rawlen - cur.prevrawlensize),
zl + prevoffset + cur.prevrawlensize,
rawlen - cur.prevrawlensize);
p -= (rawlen + delta);
if (cur.prevrawlen == 0) {
/* "cur" is the previous head entry, update its prevlen with
* firstentrylen. */
zipStorePrevEntryLength(p, firstentrylen);
} else {
/* An entry's prevlen can only increment 4 bytes. */
zipStorePrevEntryLength(p, cur.prevrawlen + delta);
}
/* Foward to previous entry. */
prevoffset -= cur.prevrawlen;
cnt--;
}
return zl;
}

流程图如下:

ziplist_CascadeUpdate

优缺点

ziplist是紧密的堆放在一起的,所以会比较节省空间。但也因此每次更新的时候,都可能需要移动后续所有节点,还有可能触发级联更新。

quicklist

quicklistNode

先了解一下quicklistNode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist.
* We use bit fields keep the quicklistNode at 32 bytes.
* count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k).
* encoding: 2 bits, RAW=1, LZF=2.
* container: 2 bits, NONE=1, ZIPLIST=2.
* recompress: 1 bit, bool, true if node is temporary decompressed for usage.
* attempted_compress: 1 bit, boolean, used for verifying during testing.
* extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
struct quicklistNode *prev;
struct quicklistNode *next;
unsigned char *zl;
unsigned int sz; /* ziplist size in bytes */
unsigned int count : 16; /* count of items in ziplist */
unsigned int encoding : 2; /* RAW==1 or LZF==2 */
unsigned int container : 2; /* NONE==1 or ZIPLIST==2 */
unsigned int recompress : 1; /* was this node previous compressed? */
unsigned int attempted_compress : 1; /* node can't compress; too small */
unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;

能比较明显地看出来,这是一个双向链表的节点,有指向前后的指针,指向当前节点所存数据的指针,以及数据的一些控制信息。

quciklist结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#if UINTPTR_MAX == 0xffffffff
/* 32-bit */
# define QL_FILL_BITS 14
# define QL_COMP_BITS 14
# define QL_BM_BITS 4
#elif UINTPTR_MAX == 0xffffffffffffffff
/* 64-bit */
# define QL_FILL_BITS 16
# define QL_COMP_BITS 16
# define QL_BM_BITS 4 /* we can encode more, but we rather limit the user
since they cause performance degradation. */
#else
# error unknown arch bits count
#endif

/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
* 'count' is the number of total entries.
* 'len' is the number of quicklist nodes.
* 'compress' is: 0 if compression disabled, otherwise it's the number
* of quicklistNodes to leave uncompressed at ends of quicklist.
* 'fill' is the user-requested (or default) fill factor.
* 'bookmakrs are an optional feature that is used by realloc this struct,
* so that they don't consume memory when not used. */
typedef struct quicklist {
quicklistNode *head;
quicklistNode *tail;
unsigned long count; /* total count of all entries in all ziplists */
unsigned long len; /* number of quicklistNodes */
int fill : QL_FILL_BITS; /* fill factor for individual nodes */
unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
unsigned int bookmark_count: QL_BM_BITS;
quicklistBookmark bookmarks[];
} quicklist;

大部分的比较好理解,就是一个由quicklistNode组成的双向链表。比较不好理解的是fillcompress,先讲一下这两个:

fill

前面也说了,quicklist是由ziplist组成的链表,那么一个节点放多大的ziplist好呢。ziplist太大了,不利于对其进行修改;ziplist太小了,相当于一个普通的链表。fill就是用来控制ziplist的大小的参数,可通过配置list-max-ziplist-size来设置。

fill>=0时,表示ziplist中的节点数需要小于fill

fill<0时,是对ziplist的大小进行限制,其取值为[-1, -4],默认为-2

  • -1:最大不超过4KB
  • -2:最大不超过8KB,默认值
  • -3:最大不超过16KB
  • -4:最大不超过32KB
  • -5:最大不超过64KB
compress

因为quicklist一般使用到的都只是头尾两部分,中间节点较少访问,而且对长链而言,中间节点占大多数空间。所以可以考虑对中间节点进行压缩。compress就是压缩的控制参数,可通过配置list-compress-depth来设置。

  • 0:表示不压缩,默认值
  • 1:表示quicklist列表的两端各有1个节点不压缩,中间的节点压缩。
  • 2:表示quicklist列表的两端各有2个节点不压缩,中间的节点压缩。
  • 3:表示quicklist列表的两端各有3个节点不压缩,中间的节点压缩。
  • 以此类推 …

quicklistEntry

1
2
3
4
5
6
7
8
9
typedef struct quicklistEntry {
const quicklist *quicklist; // 所属quicklist
quicklistNode *node; // 所属quickNode
unsigned char *zi; // 所属ziplist指针
unsigned char *value; // 字符串值
long long longval; // 数值
unsigned int sz; // 大小
int offset; // 偏移
} quicklistEntry;

此结构体跟ziplist中的zlentry类似,保存的是一个entry的信息。

部分函数

Push

二者源码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/* Add new entry to head node of quicklist.
*
* Returns 0 if used existing head.
* Returns 1 if new head created. */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {
quicklistNode *orig_head = quicklist->head;
if (likely(
_quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
quicklist->head->zl =
ziplistPush(quicklist->head->zl, value, sz, ZIPLIST_HEAD);
quicklistNodeUpdateSz(quicklist->head);
} else {
quicklistNode *node = quicklistCreateNode();
node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_HEAD);

quicklistNodeUpdateSz(node);
_quicklistInsertNodeBefore(quicklist, quicklist->head, node);
}
quicklist->count++;
quicklist->head->count++;
return (orig_head != quicklist->head);
}

/* Add new entry to tail node of quicklist.
*
* Returns 0 if used existing tail.
* Returns 1 if new tail created. */
int quicklistPushTail(quicklist *quicklist, void *value, size_t sz) {
quicklistNode *orig_tail = quicklist->tail;
if (likely(
_quicklistNodeAllowInsert(quicklist->tail, quicklist->fill, sz))) {
quicklist->tail->zl =
ziplistPush(quicklist->tail->zl, value, sz, ZIPLIST_TAIL);
quicklistNodeUpdateSz(quicklist->tail);
} else {
quicklistNode *node = quicklistCreateNode();
node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_TAIL);

quicklistNodeUpdateSz(node);
_quicklistInsertNodeAfter(quicklist, quicklist->tail, node);
}
quicklist->count++;
quicklist->tail->count++;
return (orig_tail != quicklist->tail);
}

可以看到这两个的流程都是类似的,先判断是否可以在第一个/最后一个节点插入,若可以,则使用ziplist的push方法,然后更新节点的大小信息;否则新建一个节点,并将其插入到头/尾部。详情如下:

再看看_quicklistNodeAllowInsert这个函数,判断是否可以在该节点插入

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
/* Optimization levels for size-based filling */
static const size_t optimization_level[] = {4096, 8192, 16384, 32768, 65536};

/* Maximum size in bytes of any multi-element ziplist.
* Larger values will live in their own isolated ziplists. */
#define SIZE_SAFETY_LIMIT 8192

#define sizeMeetsSafetyLimit(sz) ((sz) <= SIZE_SAFETY_LIMIT)

REDIS_STATIC int _quicklistNodeAllowInsert(const quicklistNode *node,
const int fill, const size_t sz) {
if (unlikely(!node))
return 0;

int ziplist_overhead;
/* size of previous offset */
/* 后一个节点的prevlen */
if (sz < 254)
ziplist_overhead = 1;
else
ziplist_overhead = 5;

/* size of forward offset */
/* 加上encoding字段的长度 */
if (sz < 64)
ziplist_overhead += 1;
else if (likely(sz < 16384))
ziplist_overhead += 2;
else
ziplist_overhead += 5;

/* new_sz overestimates if 'sz' encodes to an integer type */
unsigned int new_sz = node->sz + sz + ziplist_overhead;
if (likely(_quicklistNodeSizeMeetsOptimizationRequirement(new_sz, fill)))
return 1;
/* 判断大小是否超限 */
else if (!sizeMeetsSafetyLimit(new_sz))
return 0;
/* 判断fill大于0时,节点数是否符合要求 */
else if ((int)node->count < fill)
return 1;
else
return 0;
}

/* 判断fill<0时,大小是否符合要求 */
REDIS_STATIC int
_quicklistNodeSizeMeetsOptimizationRequirement(const size_t sz,
const int fill) {
if (fill >= 0)
return 0;

size_t offset = (-fill) - 1;
/* 相当于offset<len(optimization_level) */
if (offset < (sizeof(optimization_level) / sizeof(*optimization_level))) {
/* 新的长度小于fill对应值,才返回1 */
if (sz <= optimization_level[offset]) {
return 1;
} else {
return 0;
}
} else {
return 0;
}
}

其中new_sz其实计算的是:后一个节点的prevlen + encoding + sz,而且没有考虑级联更新的结果,只是一个大致的值

Pop

弹出的操作只需要先将首/尾的值取出,然后将该entry删除即可。需要考虑的是该值是字符串类型或数值类型,可通过entryencoding字段判别

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
/* pop from quicklist and return result in 'data' ptr.  Value of 'data'
* is the return value of 'saver' function pointer if the data is NOT a number.
*
* If the quicklist element is a long long, then the return value is returned in
* 'sval'.
*
* Return value of 0 means no elements available.
* Return value of 1 means check 'data' and 'sval' for values.
* If 'data' is set, use 'data' and 'sz'. Otherwise, use 'sval'. */
int quicklistPopCustom(quicklist *quicklist, int where, unsigned char **data,
unsigned int *sz, long long *sval,
void *(*saver)(unsigned char *data, unsigned int sz)) {
unsigned char *p;
unsigned char *vstr;
unsigned int vlen;
long long vlong;
int pos = (where == QUICKLIST_HEAD) ? 0 : -1;

if (quicklist->count == 0)
return 0;

if (data)
*data = NULL;
if (sz)
*sz = 0;
if (sval)
*sval = -123456789;

/* 将node指向需要pop的Node */
quicklistNode *node;
if (where == QUICKLIST_HEAD && quicklist->head) {
node = quicklist->head;
} else if (where == QUICKLIST_TAIL && quicklist->tail) {
node = quicklist->tail;
} else {
return 0;
}

/* 根据pos去遍历ziplist,获取pos地址 */
p = ziplistIndex(node->zl, pos);
/* 根据p的地址,获取值,如果是字符串encoding,则存入vstr,并将长度存入vlen,否则将数字存入vlong */
if (ziplistGet(p, &vstr, &vlen, &vlong)) {
if (vstr) {
if (data)
*data = saver(vstr, vlen);
if (sz)
*sz = vlen;
} else {
if (data)
*data = NULL;
if (sval)
*sval = vlong;
}
/* 将该节点删除 */
quicklistDelIndex(quicklist, node, &p);
return 1;
}
return 0;
}

/* Return a malloc'd copy of data passed in */
REDIS_STATIC void *_quicklistSaver(unsigned char *data, unsigned int sz) {
unsigned char *vstr;
if (data) {
vstr = zmalloc(sz);
memcpy(vstr, data, sz);
return vstr;
}
return NULL;
}

/* Default pop function
*
* Returns malloc'd value from quicklist */
int quicklistPop(quicklist *quicklist, int where, unsigned char **data,
unsigned int *sz, long long *slong) {
unsigned char *vstr;
unsigned int vlen;
long long vlong;
if (quicklist->count == 0)
return 0;
int ret = quicklistPopCustom(quicklist, where, &vstr, &vlen, &vlong,
_quicklistSaver);
if (data)
*data = vstr;
if (slong)
*slong = vlong;
if (sz)
*sz = vlen;
return ret;
}