Lua Table
前言
table 是lua中是lua中唯一的复合结构, 大多数高级语言中具有的:Array/Map/Class/Struct 数据组织结构均可通过table去实现。table核心的实现主要有:
- Array part: 存储数组部分数据结构
- Hash part: 存储Map相关数据
- metatable: 实现核心func以及运算符重载逻辑,class的数据模式也是利用这个机制实现
数据结构定义
table也是可gc object,所以”继承”了CommonHeader,类似大多数的map/dictionary实现,lua hash部分每个kv都封装在一个Node(也就是entry)中,其中的key决定pos。
- flags 8bit标记了Lua是否实现了对应的元方法,Lua一共可以实现24个元方法,其中前6个lua认为是tag method,需要标记出来,避免每次都查表
- lsizenode hash部分已分配内存size的平方根(证明都是szie^2分配)
- sizearray array部分已经分配的size
- array 存储array数据的部分
- node 存储hash数据
- lastfree 指向一个free,可存储node的数据节点
- metatable 元表
/*
** Tables
*/
typedef struct Table {
CommonHeader;
lu_byte flags; /* 1<<p means tagmethod(p) is not present */
lu_byte lsizenode; /* log2 of size of 'node' array */
unsigned int sizearray; /* size of 'array' array */
TValue *array; /* array part */
Node *node;
Node *lastfree; /* any free position is before this position */
struct Table *metatable;
GCObject *gclist;
} Table;
typedef union TKey {
struct {
TValuefields;
int next; /* for chaining (offset for next node) */
} nk;
TValue tvk;
} TKey;
typedef struct Node {
TValue i_val;
TKey i_key;
} Node;
Table 存数据
类似tbl[k]=v
操作,都是在执行一条set table的指令:OP_SETTABLE,具体指令定义暂不分析,下图是指令执行流程:
luaH_newkey 向table中插入一个不存在的key。类似大多数map实现,此处也有个mainpostion概念,对于任意key,hash空间长度:sizehash, 那么mainposition就是使得tbl->node[pos] = hash(val)%sizehash
的position。hash算法必然有冲突的情况,遇到冲突则就需要把entry 链到首位,流程如下:
/*
** 向table中插入一个新key;1. 检查key的main position是否被占据。 2. 如果已经被占,检查目前占领
** 该mian position的node是否是它自己的main position,如果不是则它该node移动到其他free节点,
** 新key即可占领改main position。 3.如果main position未被占,则直接赋值即可
*/
TValue *luaH_newkey (lua_State *L, Table *t, const TValue *key) {
Node *mp;
TValue aux;
if (ttisnil(key)) luaG_runerror(L, "table index is nil");
else if (ttisfloat(key)) { //如果是可以转换为integer的number则转化一下,对流程无影响
lua_Integer k;
if (luaV_tointeger(key, &k, 0)) { /* does index fit in an integer? */
setivalue(&aux, k);
key = &aux; /* insert it as an integer */
}
else if (luai_numisnan(fltvalue(key)))
luaG_runerror(L, "table index is NaN");
}
mp = mainposition(t, key); //计算main position
if (!ttisnil(gval(mp)) || isdummy(t)) { /* main position 被占领了 */
Node *othern;
Node *f = getfreepos(t); /* get a free place */
if (f == NULL) { /* cannot find a free place? */
rehash(L, t, key); /* rehash 保证一定要能找到一个free pos*/
return luaH_set(L, t, key);
}
lua_assert(!isdummy(t));
othern = mainposition(t, gkey(mp)); //计算pos上node对应的main position
if (othern != mp) { /* 当前占领mainPos的节点并不是这个节点的main position/
/* yes; move colliding node into free position */
while (othern + gnext(othern) != mp) /* 说明other一定是f这个mainpPos的pre节点,通过next偏移找到它*/
othern += gnext(othern);
gnext(othern) = cast_int(f - othern); /* rechain to point to 'f' */
*f = *mp; /* copy colliding node into free pos. (mp->next also goes) */
if (gnext(mp) != 0) {
gnext(f) += cast_int(mp - f); /* correct 'next' 现在f是mp的深拷贝了,但是mp可能chain 某个pre,所以这里还需要处理一下偏移问题*/
gnext(mp) = 0; /* now 'mp' is free */
}
setnilvalue(gval(mp));
}
else { /*mp就是 main position,冲突的情况 */
/* new node will go into free position */
if (gnext(mp) != 0)
gnext(f) = cast_int((mp + gnext(mp)) - f); /* chain new position 让f成为mainPos的第二个节点 */
else lua_assert(gnext(f) == 0);
gnext(mp) = cast_int(f - mp); //把f chain到mp上
mp = f;
}
}
setnodekey(L, &mp->i_key, key);
luaC_barrierback(L, t, key);
lua_assert(ttisnil(gval(mp)));
return gval(mp);
}
Table rehash
当luaH_newkey过程中无法找到free pos时,就会触发 rehash。
- 计算各个bit段的key count
- 计算出array部分需要扩展的size
- resize 扩展array和hash部分size
注:先统计nums,再计算array size,主要为了避免array size低效率扩展,参考:computesizes实现即可
/*
** nums[i] = number of keys 'k' where 2^(i - 1) < k <= 2^i
*/
static void rehash (lua_State *L, Table *t, const TValue *ek) {
unsigned int asize; /* optimal size for array part */
unsigned int na; /* number of keys in the array part */
unsigned int nums[MAXABITS + 1];
int i;
int totaluse;
for (i = 0; i <= MAXABITS; i++) nums[i] = 0; /* reset counts */
na = numusearray(t, nums); /* count keys in array part */
totaluse = na; /* all those keys are integer keys */
totaluse += numusehash(t, nums, &na); /* count keys in hash part */
/* count extra key */
na += countint(ek, nums);
totaluse++;
/* compute new size for array part */
asize = computesizes(nums, &na);
/* resize the table to new computed sizes */
luaH_resize(L, t, asize, totaluse - na);
}
Table resize
resize 时rehash的中间过程,真正处理array&hash的部分内存扩展。
- array 部分比较简单,直接realloc就行,然后把新alloc出来的obj setnilval即可
- hash部分相对费一点,主要分四步: 1.保存old hash指针 2.alloc新的hash空间 3.把old hash空间调整到new hash 空间 4.释放old hash空间内存
void luaH_resize (lua_State *L, Table *t, unsigned int nasize,
unsigned int nhsize) {
unsigned int i;
int j;
AuxsetnodeT asn;
unsigned int oldasize = t->sizearray;
int oldhsize = allocsizenode(t);
Node *nold = t->node; /* save old hash ... */
if (nasize > oldasize) /* array part must grow? */
setarrayvector(L, t, nasize);
/* create new hash part with appropriate size */
asn.t = t; asn.nhsize = nhsize;
if (luaD_rawrunprotected(L, auxsetnode, &asn) != LUA_OK) { /* mem. error? */
setarrayvector(L, t, oldasize); /* array back to its original size */
luaD_throw(L, LUA_ERRMEM); /* rethrow memory error */
}
if (nasize < oldasize) { /* array part must shrink? */
t->sizearray = nasize;
/* re-insert elements from vanishing slice */
for (i=nasize; i<oldasize; i++) {
if (!ttisnil(&t->array[i]))
luaH_setint(L, t, i + 1, &t->array[i]);
}
/* shrink array */
luaM_reallocvector(L, t->array, oldasize, nasize, TValue);
}
/* re-insert elements from hash part */
for (j = oldhsize - 1; j >= 0; j--) {
Node *old = nold + j;
if (!ttisnil(gval(old))) {
/* doesn't need barrier/invalidate cache, as entry was
already present in the table */
setobjt2t(L, luaH_set(L, t, gkey(old)), gval(old));
}
}
if (oldhsize > 0) /* not the dummy node? */
luaM_freearray(L, nold, cast(size_t, oldhsize)); /* free old hash */
}
Table 使用注意事项
- #table 取array长度指令
“#” 操作符仅仅算table的array部分,并且是不是逐项遍历,用二分法找nil obj,然后算出一个pos,实现如下:
/*
** Try to find a boundary in table 't'. A 'boundary' is an integer index
** such that t[i] is non-nil and t[i+1] is nil (and 0 if t[1] is nil).
*/
lua_Unsigned luaH_getn (Table *t) {
unsigned int j = t->sizearray;
if (j > 0 && ttisnil(&t->array[j - 1])) {
/* there is a boundary in the array part: (binary) search for it */
unsigned int i = 0;
while (j - i > 1) {
unsigned int m = (i+j)/2;
if (ttisnil(&t->array[m - 1])) j = m;
else i = m;
}
return i;
}
/* else must find a boundary in hash part */
else if (isdummy(t)) /* hash part is empty? */
return j; /* that is easy... */
else return unbound_search(t, j);
}
那么在计算array size时,如果确定array中间无空洞则可以使用“#”,否则还是手搬一个逐项遍历, 如下面测试就可能出现意外结果了:
local arr = {1,nil,3,nil, 5,6, nil, nil}
print(#arr)
--output: 1
- metable 中__newindex, 如果fastset找到已存在的key, 则不会执行到__newindex,具体实现:
#define settableProtected(L,t,k,v) { const TValue *slot;
if (!luaV_fastset(L,t,k,slot,luaH_get,v))
Protect(luaV_finishset(L,t,k,v,slot)); }
测试代码:
local arr = {}
arr[1] = 5
local mt = {}
mt.__newindex = function(tbl, k, v)
print("hello", k, v)
end
setmetatable(arr, mt)
arr[1] = 4
arr[2] = 6
–output:
–hello 2 6
- ipairs 迭代器实现简单说就是遍历array部分,如果对应的val不是nil,则继续,否则就终止,所以中间不能有“空洞”
static int ipairsaux (lua_State *L) {
lua_Integer i = luaL_checkinteger(L, 2) + 1;
lua_pushinteger(L, i);
return (lua_geti(L, 1, i) == LUA_TNIL) ? 1 : 2;
}
测试用例:
local arr = {1,nil,3,nil, 5,6, nil, nil}
for i,v in ipairs(arr) do
print(v)
end
–output:
–1
<hr style="visibility: hidden;"/>
<link rel="stylesheet" href="https://unpkg.com/gitalk/dist/gitalk.css"/>
<script src="https://unpkg.com/gitalk@latest/dist/gitalk.min.js"></script>
<div id="gitalk-container"></div>
<script src="/js/md5.min.js"></script>
<script type="text/javascript">
var gitalk = new Gitalk({
clientID: 'd468c9684ec424e3e5cc',
clientSecret: 'b52faba288acb42039032014d5fd6f781ef3054a',
repo: 'lixiang-share.github.io',
owner: 'lixiang-share',
admin: ['lixiang-share'],
distractionFreeMode: true,
id: md5(location.pathname),
});
gitalk.render('gitalk-container');
</script>