redis的内存淘汰机制

[TOC]

内存淘汰机制

Redis在使用内存达到某个阈值（通过maxmemory配置)的时候，就会触发内存淘汰机制，选取一些key来删除。内存淘汰有许多策略，下面分别介绍这几种不同的策略。

1 2	# maxmemory <bytes> 配置内存阈值 # maxmemory-policy noeviction

noeviction：当内存不足以容纳新写入数据时，新写入操作会报错。默认策略
allkeys-lru：当内存不足以容纳新写入数据时，在键空间（server.db[i].dict）中，移除最近最少使用的key。
allkeys-random：当内存不足以容纳新写入数据时，在键空间中，随机移除某个key。
allkeys-lfu：对所有key使用LFU算法进行删除。
volatile-lru：当内存不足以容纳新写入数据时，在设置了过期时间的键空间（server.db[i].expires）中，移除最近最少使用的key。
volatile-random：当内存不足以容纳新写入数据时，在设置了过期时间的键空间中，随机移除某个key。
volatile-lfu：对所有设置了过期时间的key使用LFU算法进行删除
volatile-ttl：当内存不足以容纳新写入数据时，在设置了过期时间的键空间中，有更早过期时间的key优先移除。

如何选择淘汰机制

在Redis中，数据有一部分访问频率较高，其余部分访问频率较低，或者无法预测数据的使用频率时，设置allkeys-lru是比较合适的。
如果所有数据访问概率大致相等时，可以选择allkeys-random。
如果研发者需要通过设置不同的ttl来判断数据过期的先后顺序，此时可以选择volatile-ttl策略。
如果希望一些数据能长期被保存，而一些数据可以被淘汰掉时，选择volatile-lru或volatile-random都是比较不错的。
由于设置expire会消耗额外的内存，如果计划避免Redis内存在此项上的浪费，可以选用allkeys-lru 策略，这样就可以不再设置过期时间，高效利用内存了。

淘汰机制分类

lru

LRU(Least recently used，最近最少使用)算法根据数据的历史访问记录来进行淘汰数据，其核心思想是“如果数据最近被访问过，那么将来被访问的几率也更高”。

lfu

LFU，最近最少使用淘汰

random

在随机淘汰的场景下获取待删除的键值对，随机找hash桶再次hash指定位置的dictEntry即可。

源码分析

内存淘汰主逻辑

int performEvictions(void) {
    // 算出需要释放多少内存 mem_tofree
    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK) {
    	return EVICT_OK;
    }
     while (mem_freed < (long long)mem_tofree) {
         // 先判断是否是 lru, lfu, ttl 策略
        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {
                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. */
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
            	// 根据策略计算出每一个样本的idle值，值越高，可以理解为匹配度越高，优先删除
            	// 倒序查， 数组后面的更适合被删除
                /* Go backward from best to worst element to evict. */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            
        }        
         /* volatile-random and allkeys-random policy */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            // random 就比较简答了
        }
         
        /* Finally remove the selected key. */
        if (bestkey) {
            // 删除
        }
         
     }
    
}

采样并计算 idle (score)

void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    // 采样，代码分析见下面
    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
    	// 在这里根据策略来计算得分 
    /* Calculate the idle time according to the policy. This is called
 		* idle just because the code initially handled LRU, but is in fact
 		* just a score where an higher score means better candidate. */
        pool[k].idle = idle;
				pool[k].dbid = dbid;
    }
    
}

采样

/* This function samples the dictionary to return a few keys from random
 * locations.
 *
 * It does not guarantee to return all the keys specified in 'count', nor
 * it does guarantee to return non-duplicated elements, however it will make
 * some effort to do both things.
 *
 * Returned pointers to hash table entries are stored into 'des' that
 * points to an array of dictEntry pointers. The array must have room for
 * at least 'count' elements, that is the argument we pass to the function
 * to tell how many random elements we need.
 */

// 数据被存放 des 中， des 是一个数组
unsigned int dictGetSomeKeys(dict *d, dictEntry **des, unsigned int count) {
    // 随机选一个 table
    unsigned long i = random() & maxsizemask;
    // j 用于判断是否处于 rehash 阶段
    dictEntry *he = d->ht[j].table[i];
    // 向数组中放样本数据
    while (he) {
        /* Collect all the elements of the buckets found non
         * empty while iterating. */
        *des = he;
        des++;
        // hash 冲突的链表
        he = he->next;
        stored++;
        if (stored == count) return stored;
    }
    // 返回采样数量
    return stored;
}

redis LRU 实现

如果基于传统 LRU 算法实现 Redis LRU 会有什么问题？

需要额外的数据结构存储，消耗内存。Redis LRU 对传统的 LRU 算法进行了改良，通过随机采样来调整算法的精度。

如果淘汰策略是 LRU，则根据配置的采样值 maxmemory_samples（默认是 5 个）,随机从数据库中选择 m 个 key, 淘汰其中热度最低的 key 对应的缓存数据。所以采样参数m配置的数值越大, 就越能精确的查找到待淘汰的缓存数据,但是也消耗更多的CPU计算,执行效率降低。

如何找出热度最低的数据？

Redis 中所有对象结构都有一个 lru 字段, 且使用了 unsigned 的低 24 位，这个字段用来记录对象的热度。对象被创建时会记录 lru 值。在被访问的时候也会更新 lru 的值。但不是获取系统当前的时间戳，而是设置为全局变量server.lruclock 的值。

// 对象在被读写时，会更新 lru 时间
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

问题：server.lruclock 的值怎么来的？

Redis中有个定时处理的函数serverCron，默认每100毫秒调用函数。 updateCachedTime 更新一次全局变量的server.lruclock的值，它记录的是当前unix时间戳。

问题：为什么不获取精确的时间而是放在全局变量中？不会有延迟的问题吗？

这样函数 lookupKey 中更新数据的 lru 热度值时,就不用每次调用系统函数 time，可以提高执行效率。

OK，当对象里面已经有了 LRU 字段的值，就可以评估对象的热度了。函数 estimateObjectIdleTime 评估指定对象的 lru 热度，思想就是对象的 lru 值和全局的 server.lruclock 的差值越大（越久没有得到更新），该对象热度越低。

为什么不用常规的哈希表+双向链表的方式实现？

需要额外的数据结构，消耗资源。而 Redis LRU 算法在 sample 为 10 的情况下，已经能接近传统 LRU 算法了。

假设 A 在 10 秒内被访问了 5 次，而 B 在 10 秒内被访问了 3 次。因为 B 最后一次被访问的时间比 A 要晚，在同等的情况下，A 反而先被回收。

redis LFU 实现

当这 24 bits 用作 LFU 时，其被分为两部分：

高 16 位用来记录访问时间（单位为分钟，ldt，last decrement time）
低 8 位用来记录访问频率，简称 counter（logc，logistic counter）

counter 是用基于概率的对数计数器实现的，8 位可以表示百万次的访问频率。对象被读写的时候，lfu 的值会被更新。

/* Update LFU when an object is accessed.
 * Firstly, decrement the counter if the decrement time is reached.
 * Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

如果计数器只会递增不会递减，也不能体现对象的热度。没有被访问的时候，计数器怎么递减呢？

减少的值由衰减因子 lfu-decay-time（分钟）来控制，如果值是 1 的话，N 分钟没有访问就要减少 N。 redis.conf 配置文件# lfu-decay-time 1

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}

参考

Redis内存回收知多少

Redis内存淘汰策略源码分析以及LFU/LRU实现