Glibc Memory Management: ptmalloc2 Source Code Analysis (Part 16)

mqzhuang

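5.3 Analysis of the Core Structures

Each arena is an instance of struct malloc_state, which ptmalloc uses to manage the arena. Parameters are managed through struct malloc_par, of which there is exactly one global instance.

5.3.1 malloc_state

struct malloc_state is defined as follows: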

struct malloc_state {
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

#if THREAD_STATS
  /* Statistics for locking.  Only used if THREAD_STATS is defined.  */
  long stat_lock_direct, stat_lock_loop, stat_lock_wait;
#endif

  /* Fastbins */
  mfastbinptr      fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr        top;

  /* The remainder from the most recent split of a small request */
  mchunkptr        last_remainder;

  /* Normal bins packed as described above */
  mchunkptr        bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int     binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

#ifdef PER_THREAD
  /* Linked list for free arenas.  */
  struct malloc_state *next_free;
#endif

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

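mutex serializes access to the arena. When several threads contend for the same arena, the first thread to acquire the mutex allocates from the arena and releases the mutex when it is done, so that other threads can then use the arena.

flags records a few arena flags: bit0 indicates whether the arena contains at least one fast bin chunk, and bit1 indicates whether the arena can return contiguous virtual address space.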

/*
  FASTCHUNKS_BIT held in max_fast indicates that there are probably
  some fastbin chunks. It is set true on entering a chunk into any
  fastbin, and cleared only in malloc_consolidate.

  The truth value is inverted so that have_fastchunks will be true
  upon startup (since statics are zero-filled), simplifying
  initialization checks.
*/
#define FASTCHUNKS_BIT        (1U)
#define have_fastchunks(M)     (((M)->flags &  FASTCHUNKS_BIT) == 0)
#ifdef ATOMIC_FASTBINS
#define clear_fastchunks(M)    catomic_or (&(M)->flags, FASTCHUNKS_BIT)
#define set_fastchunks(M)      catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)
#else
#define clear_fastchunks(M)    ((M)->flags |=  FASTCHUNKS_BIT)
#define set_fastchunks(M)      ((M)->flags &= ~FASTCHUNKS_BIT)
#endif

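The macros above set or clear bit0 of flags, the fast chunk flag. The sense is inverted: bit0 equal to 0 means the arena has fast chunks, and bit0 equal to 1 means it has none. In a zero-filled malloc_state instance (the state of a C static), flags is 0, which claims that the arena has fast chunks even though it has none; any attempt to allocate a chunk from the fast bins simply returns NULL. When malloc_consolidate() is called to merge the chunks in the fast bins, it invokes the clear_fastchunks macro if max_fast is greater than 0, marking the arena as having no fast chunks, because malloc_consolidate() merges every chunk in the fast bins. clear_fastchunks is only ever called from malloc_consolidate(). Whenever a fast chunk is added to the fast bins, the set_fastchunks macro is called to record that the arena's fast bins contain fast chunks.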

/*
  NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
  regions.  Otherwise, contiguity is exploited in merging together,
  when possible, results from consecutive MORECORE calls.

  The initial value comes from MORECORE_CONTIGUOUS, but is
  changed dynamically if mmap is ever used as an sbrk substitute.
*/
#define NONCONTIGUOUS_BIT     (2U)
#define contiguous(M)          (((M)->flags &  NONCONTIGUOUS_BIT) == 0)
#define noncontiguous(M)       (((M)->flags &  NONCONTIGUOUS_BIT) != 0)
#define set_noncontiguous(M)   ((M)->flags |=  NONCONTIGUOUS_BIT)
#define set_contiguous(M)      ((M)->flags &= ~NONCONTIGUOUS_BIT)

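If bit1 of flags is 0, MORECORE returns contiguous virtual address space; if bit1 is 1, MORECORE returns non-contiguous virtual address space. For the main arena, MORECORE is actually sbrk(), which by default returns contiguous virtual address space. A non-main arena uses mmap() to obtain a large block of virtual memory and then carves it up to imitate the behavior of the main arena. Because mmap-ed regions are not guaranteed to be contiguous in the virtual address space, non-main arenas default to non-contiguous virtual address space.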

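malloc_state declares several lock statistics variables. THREAD_STATS is not defined by default, so no lock contention statistics are collected.

fastbinsY is an array of 10 (NFASTBINS) elements holding the head pointer of each fast chunk list, so the fast bins comprise at most 10 singly linked lists of fast chunks.

top is a chunk pointer that points to the arena's top chunk.

last_remainder is a chunk pointer. When the arena last served a small chunk request by splitting a small chunk off a larger chunk, the part left over after the split formed a chunk of its own, and last_remainder points to that chunk.

bins stores the list heads of the unsorted bin, the small bins, and the large bins. There are 62 small bins and 63 large bins, 125 in total. NBINS is defined as 128, but bin[0] and bin[127] do not actually exist, and bin[1] is the list head of the unsorted bin, so there are really only 126 bins. The bins array holds 254 (NBINS*2 - 2) mchunkptr pointers, yet what we actually need to store are chunk instances, and a chunk instance is normally the size of 6 mchunkptr pointers. How can 254 pointers hold 126 chunks?

A trick is used here. The conventional approach would be to allocate an array of 126 malloc_chunk pointers, allocate a head node for each list (126 nodes), and point each array element at its head node, giving 126 lists with head nodes. In a malloc_chunk used as a list "head node", however, the prev_size and size fields serve no purpose, and fd_nextsize and bk_nextsize are used only by free chunks in the large bins, never by a large bin's list head. Storing these four fields would be pure waste.

Assume SIZE_SZ is 8 bytes, so a pointer is also 8 bytes. 126 complete malloc_chunk "head nodes" would need 126*6*8 = 6048 bytes. But because prev_size, size, fd_nextsize and bk_nextsize are unused in a head node, their space can be reused (overlapped with neighboring heads), and only the fd and bk pointers of each head need real storage: 126*2*8 = 2016 bytes. The bins pointer array occupies (128*2-2)*8 = 2032 bytes, which is larger than 2016 (the last 16 bytes are in fact wasted), so the 254-element pointer array is enough to hold all 126 head nodes.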

binmap is an array of unsigned int in which ptmalloc uses one bit per bin to indicate whether the corresponding bin contains free chunks.

/*
  Binmap

    To help compensate for the large number of bins, a one-level index
    structure is used for bin-by-bin searching.  `binmap' is a
    bitvector recording whether bins are definitely empty so they can
    be skipped over during during traversals.  The bits are NOT always
    cleared as soon as bins are empty, but instead only
    when they are noticed to be empty during traversal in malloc.
*/

/* Conservatively use 32 bits per map word, even if on 64bit system */
#define BINMAPSHIFT      5
#define BITSPERMAP       (1U << BINMAPSHIFT)
#define BINMAPSIZE       (NBINS / BITSPERMAP)

#define idx2block(i)     ((i) >> BINMAPSHIFT)
#define idx2bit(i)       ((1U << ((i) & ((1U << BINMAPSHIFT)-1))))

#define mark_bin(m,i)    ((m)->binmap[idx2block(i)] |=  idx2bit(i))
#define unmark_bin(m,i)  ((m)->binmap[idx2block(i)] &= ~(idx2bit(i)))
#define get_binmap(m,i)  ((m)->binmap[idx2block(i)] &   idx2bit(i))

binmap has 128 bits in total, that is, 16 bytes or 4 ints. It is divided into 4 blocks of one int each, with 32 bits per block. Given a bin index, the idx2block macro computes which block holds that bin's bit. The idx2bit macro produces a mask in which only the bit for index i (taken modulo 32 within its block) is 1 and all other bits are 0; for example, idx2bit(3) is "0000 1000" (showing only the low 8 bits). mark_bin sets the bit for bin i in binmap to 1, unmark_bin sets it to 0, and get_binmap reads it.

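The next field links the arenas together in a singly linked list.

The next_free field links the free arenas together in a singly linked list. It is defined only when PER_THREAD is defined.

The system_mem field records the amount of memory the arena has currently obtained from the system.

max_system_mem records the largest value that system_mem has reached for the arena, that is, its high-water mark.

5.3.2 malloc_par

struct malloc_par is defined as follows: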

struct malloc_par {
  /* Tunable parameters */
  unsigned long    trim_threshold;
  INTERNAL_SIZE_T  top_pad;
  INTERNAL_SIZE_T  mmap_threshold;
#ifdef PER_THREAD
  INTERNAL_SIZE_T  arena_test;
  INTERNAL_SIZE_T  arena_max;
#endif

  /* Memory map support */
  int              n_mmaps;
  int              n_mmaps_max;
  int              max_n_mmaps;
  /* the mmap_threshold is dynamic, until the user sets
     it manually, at which point we need to disable any
     dynamic behavior. */
  int              no_dyn_threshold;

  /* Cache malloc_getpagesize */
  unsigned int     pagesize;

  /* Statistics */
  INTERNAL_SIZE_T  mmapped_mem;
  INTERNAL_SIZE_T  max_mmapped_mem;
  INTERNAL_SIZE_T  max_total_mem; /* only kept for NO_THREADS */

  /* First address handed out by MORECORE/sbrk.  */
  char*            sbrk_base;
};

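The trim_threshold field is the trim threshold, 128KB by default. When an arena's top chunk grows larger than this threshold, a call to free will, under certain conditions, shrink the heap and reduce the size of the top chunk. Because the mmap threshold is adjusted dynamically, free may change the trim threshold to twice the mmap threshold. On 64-bit systems the mmap threshold is at most 32MB, so the trim threshold is at most 64MB; on 32-bit systems the mmap threshold is at most 512KB, so the trim threshold is at most 1MB. The trim threshold can be set with mallopt().

The top_pad field is the amount of extra padding added when memory is requested from the system. It defaults to 0.

The mmap_threshold field is the mmap threshold, 128KB by default, with a maximum of 512KB on 32-bit systems and 32MB on 64-bit systems. Because dynamic adjustment of the mmap threshold is enabled by default, the value of this field changes at run time, but it never exceeds the maximum.

arena_test and arena_max serve the PER_THREAD optimization. arena_test defaults to 2 on 32-bit systems and 8 on 64-bit systems; as long as the number of arenas in the process is less than or equal to arena_test, existing arenas are not reused. To cap the total number of arenas, arena_max holds the maximum number of arenas: once the number of arenas reaches arena_max, no new arena is created and existing arenas are reused instead. Both fields can be set with mallopt().

The n_mmaps field is the number of memory blocks the process currently has allocated with mmap().

The n_mmaps_max field is the maximum number of memory blocks the process may allocate with mmap(). It defaults to 65536 and can be changed with mallopt().

The max_n_mmaps field is the highest value that n_mmaps has reached in the process, so n_mmaps <= max_n_mmaps always holds. The field exists because the statistics printed by mstats() need it.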

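The no_dyn_threshold field controls dynamic adjustment of the mmap threshold. It defaults to 0, which means dynamic adjustment is enabled by default; once the user sets the mmap threshold manually, the field becomes nonzero and the dynamic behavior is disabled.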

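The pagesize field is the system page size, typically 4KB.

mmapped_mem and max_mmapped_mem both track the amount of memory allocated with mmap; the two values are usually equal. max_mmapped_mem is used by mstats().

The max_total_mem field is used in the single-threaded case to track the total amount of memory allocated by the process.

The sbrk_base field is the start address of the heap.

5.3.3 Arena Initialization

ptmalloc defines the following global variables: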

/* There are several instances of this struct ("arenas") in this
   malloc.  If you are adapting this malloc in a way that does NOT use
   a static or mmapped malloc_state, you MUST explicitly zero-fill it
   before using. This malloc relies on the property that malloc_state
   is initialized to all zeroes (as is true of C statics).  */
static struct malloc_state main_arena;
/* There is only one instance of the malloc parameters.  */
static struct malloc_par mp_;
/* Maximum size of memory handled in fastbins.  */
static INTERNAL_SIZE_T global_max_fast;

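main_arena is the main arena; every process has exactly one global main arena. mp_ is the single global malloc_par instance, used to manage parameters and statistics. The global variable global_max_fast is the largest chunk size held in the fast bins.

The initialization function for an arena such as main_arena: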

/*
  Initialize a malloc_state struct.

  This is called only from within malloc_consolidate, which needs
  be called in the same contexts anyway.  It is never called directly
  outside of malloc_consolidate because some optimizing compilers try
  to inline it at all call points, which turns out not to be an
  optimization at all. (Inlining it in malloc_consolidate is fine though.)
*/
#if __STD_C
static void malloc_init_state(mstate av)
#else
static void malloc_init_state(av) mstate av;
#endif
{
  int     i;
  mbinptr bin;

  /* Establish circular links for normal bins */
  for (i = 1; i < NBINS; ++i) {
    bin = bin_at(av,i);
    bin->fd = bin->bk = bin;
  }

#if MORECORE_CONTIGUOUS
  if (av != &main_arena)
#endif
    set_noncontiguous(av);
  if (av == &main_arena)
    set_max_fast(DEFAULT_MXFAST);
  av->flags |= FASTCHUNKS_BIT;

  av->top            = initial_top(av);
}

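The arena initialization function assumes that the arena instance av is either a global static variable or has already had all of its fields cleared to 0. The work it does is fairly simple. It first walks all the bins and initializes each bin's free list to empty by pointing the bin's fd and bk at the bin itself. Because all fields in av default to 0, the arena defaults to allocating contiguous virtual address space; but only the main arena can allocate contiguous virtual address space, so a non-main arena has to be set to non-contiguous. If the arena being initialized is the main arena, the maximum chunk size for the fast bins is set. Since there is only one main arena and it is always the first to be initialized, the global variable global_max_fast is guaranteed to be initialized exactly once, and a nonzero value in it means that the main arena has been initialized. Finally, the top chunk is initialized.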

ptmalloc parameter initialization:

/* Set up basic state so that _int_malloc et al can work.  */
static void
ptmalloc_init_minimal (void)
{
#if DEFAULT_TOP_PAD != 0
  mp_.top_pad        = DEFAULT_TOP_PAD;
#endif
  mp_.n_mmaps_max    = DEFAULT_MMAP_MAX;
  mp_.mmap_threshold = DEFAULT_MMAP_THRESHOLD;
  mp_.trim_threshold = DEFAULT_TRIM_THRESHOLD;
  mp_.pagesize       = malloc_getpagesize;
#ifdef PER_THREAD
# define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
  mp_.arena_test     = NARENAS_FROM_NCORES (1);
  narenas = 1;
#endif
}

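This function mainly initializes the fields of the global variable mp_ to their default values. It is worth noting that when the PER_THREAD build option is defined, arena_test is set through the NARENAS_FROM_NCORES macro, which multiplies a core count by 2 on 32-bit systems (sizeof(long) == 4) and by 8 on 64-bit systems. The macro is invoked here with a core count of 1, so arena_test starts out as 2 on 32-bit systems and 8 on 64-bit systems.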

2011-05-30 17:05