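In C++ programs, besides illegal overwrites, another important and frequent memory problem is heap memory that is never freed. In a heavily loaded network application, such a leak can bring the service down quickly. The traditional way to find it is to log every allocation and release and then debug by hand, but faced with hundreds of thousands of lines of allocation/release trace, most people lose the courage to look for the problem, let alone solve it efficiently.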
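Fortunately, someone has already packaged a tracker built on top of the memory allocator that locates such problems by managing the lifetime of each memory block (see http://www.cnblogs.com/clover-toeic/p/3819636.html). However, that article keeps its records in a linked list, so removing a record on release requires a linear search, and the approach is not thread-safe. I therefore made two improvements. First, the linked list is replaced with C++11's unordered_map (a hash table), which makes the lookup on release much faster. Second, a TLS-based wrapper class makes the tracker per-thread, so it can be used in multithreaded programs.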
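The difference between the two bookkeeping structures can be sketched as follows. This is only an illustration, not the code of either article: `Block`, `eraseFromList`, and `eraseFromMap` are hypothetical names, and the record is trimmed down to two fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical minimal record; the tracker's real record carries more fields.
struct Block {
    void* addr;
    std::size_t size;
};

// Linked-list bookkeeping: releasing a block means walking the list, O(n).
bool eraseFromList(std::list<Block>& blocks, void* addr) {
    for (auto it = blocks.begin(); it != blocks.end(); ++it) {
        if (it->addr == addr) {
            blocks.erase(it);
            return true;
        }
    }
    return false;
}

// Hash-table bookkeeping: the address is the key, so the lookup is O(1) on average.
bool eraseFromMap(std::unordered_map<std::uintptr_t, Block>& blocks, void* addr) {
    return blocks.erase(reinterpret_cast<std::uintptr_t>(addr)) == 1;
}
```

With tens of thousands of live blocks, every release in the list version scans the list up to the matching record, while the map version goes straight to the bucket for that address.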
First, we define a structure that describes one memory block:
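typedef struct mem_info {
    const char* fileName;  // source file of the allocation call
    const char* funcName;  // function containing the call
    uint32_t    codeLine;  // line number of the call
    pid_t       tid;       // ID of the allocating thread
    size_t      memSize;   // requested size in bytes
    void*       memAddr;   // address returned by malloc
} mem_info_t;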
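Because a block's record must be removable directly by address, it is stored in the hash table as a {void* ptr, mem_info_t* mi} pair, which becomes the unordered_map element. The tracker also has to record the total number of bytes allocated and the total number of allocations, so a class named global_meminfo takes care of all of this: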
class global_meminfo {
public:
    global_meminfo(): allocBytes_(0), allocTimes_(0), releaseBytes_(0), releaseTimes_(0) {}

    // Record a newly allocated block, keyed by its address.
    void saveMemInfo(void* ptr, mem_info_t* mi) {
        uint64_t addr = reinterpret_cast<uint64_t>(ptr);
        tracked_meminfo_.insert({addr, mi});
        ++allocTimes_;
        allocBytes_ += mi->memSize;
    }

    // Release a tracked block together with its record.
    void removeMemInfo(void* ptr) {
        uint64_t addr = reinterpret_cast<uint64_t>(ptr);
        auto ele = tracked_meminfo_.find(addr);
        if ( ele == tracked_meminfo_.end() ) {
            printf("No valid memory block found(%p)\n", ptr);
            return;
        }
        ++releaseTimes_;
        releaseBytes_ += ele->second->memSize;
        free(ele->second->memAddr);
        delete ele->second; // the record is allocated with new in tracked_malloc
        tracked_meminfo_.erase(ele);
    }

    // Print the statistics, then reclaim every block that is still tracked.
    ~global_meminfo() {
        printf("Total memory allocated: %zu \n", allocBytes_);
        printf("Times of memory allocation: %u\n", allocTimes_);
        printf("Total memory released: %zu\n", releaseBytes_);
        printf("Times of memory releasing: %u\n", releaseTimes_);
        size_t unreleased = 0;
        for ( auto& ele: tracked_meminfo_ ) {
            unreleased += ele.second->memSize;
            free(ele.second->memAddr);
            delete ele.second; // allocated with new, so released with delete
        }
        // Prints the number of leaked blocks, then their total size in bytes.
        printf("Thread %s: Unleased memory: %zu of %zu bytes\n",
               CurrentThread::getTidString(), tracked_meminfo_.size(), unreleased);
        tracked_meminfo_.clear();
    }

private:
    typedef std::unordered_map<uint64_t, mem_info_t*> tracked_mem_info_t;
    tracked_mem_info_t tracked_meminfo_;
    size_t allocBytes_;
    uint32_t allocTimes_;
    size_t releaseBytes_;
    uint32_t releaseTimes_;
};
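Information about unreleased blocks is printed when this object is destroyed. With the tracker's management class in place, how do we make it per-thread? Linux provides the pthread key API (pthread_key_create, pthread_setspecific, pthread_getspecific) for TLS, and it can be wrapped in a class template: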
template<typename T>
class ThreadLocalStorage: public Noncopyable {
public:
    ThreadLocalStorage() {
        pthread_key_create(&pKey_, &ThreadLocalStorage::destroyer);
    }
    ~ThreadLocalStorage() {
        pthread_key_delete(pKey_);
    }
    T& value() {
        T* v = static_cast<T*>(pthread_getspecific(pKey_));
        if ( !v ) {
            T* newObj(new T);
            pthread_setspecific(pKey_, newObj);
            v = newObj;
        }
        return *v;
    }
private:
    static void destroyer(void* x) {
        T* v = static_cast<T*>(x);
        typedef char T_must_be_complete_type[sizeof(T) == 0 ? -1: 1];
        T_must_be_complete_type foo;
        (void)foo;
        delete v;
    }
    pthread_key_t pKey_;
};
The global global_meminfo object can now be stored in TLS:
ThreadLocalStorage<global_meminfo> g_meminfo;
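C++11 also has the thread_local keyword, which gives a comparable per-thread lifetime without managing pthread keys by hand. The sketch below is an alternative technique, not the wrapper used in this article; `Counter`, `localCounter`, and `countOnNewThread` are hypothetical names, and `Counter` merely stands in for global_meminfo.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for the article's global_meminfo.
struct Counter {
    std::size_t allocTimes = 0;
};

// One Counter per thread: created on first use in that thread,
// destroyed automatically when the thread exits.
Counter& localCounter() {
    thread_local Counter c;
    return c;
}

// Runs n increments on a fresh thread and returns the count that thread saw.
std::size_t countOnNewThread(std::size_t n) {
    std::size_t seen = 0;
    std::thread t([&seen, n] {
        for (std::size_t i = 0; i < n; ++i) {
            ++localCounter().allocTimes;
        }
        seen = localCounter().allocTimes;
    });
    t.join();
    return seen;
}
```

Each new thread starts from its own zeroed counter, so the calling thread's counter is unaffected by what the worker threads do.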
With the tracker wrapped this way, we redefine the memory allocation and release functions:
void* tracked_malloc(size_t size, const char* file, const char* func, uint32_t line) {
    void* ptr = malloc(size);
    if ( !ptr ) return NULL; // nothing to track when the allocation fails
    mem_info_t* mi = new mem_info_t;
    mi->fileName = file;
    mi->funcName = func;
    mi->codeLine = line;
    mi->tid = CurrentThread::getTid();
    mi->memSize = size;
    mi->memAddr = ptr;
    global_meminfo& mem_info = g_meminfo.value();
    mem_info.saveMemInfo(ptr, mi);
    return ptr;
}

void tracked_free(void* ptr) {
    global_meminfo& mem_info = g_meminfo.value();
    mem_info.removeMemInfo(ptr);
}

#define TRACKED_MALLOC(size) tracked_malloc(size, __FILE__, __FUNCTION__, __LINE__)
#define TRACKED_FREE(ptr) tracked_free(ptr)
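tracked_malloc is simple: it saves the allocated address and size, the file, line, and function name of the call site, and the current thread ID into the hash table. These two macros only cover malloc/free-style code, as in C. For C++, the default operator new is usually implemented on top of malloc/free as well, so defining your own operator new to replace it is enough.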
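Replacing the global operator new and operator delete could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not this article's implementation: `NewStats` and `g_newStats` are hypothetical, and the hooks only count calls instead of storing records. That restriction is deliberate. A tracker that itself allocates through operator new, as unordered_map does with its default allocator, would re-enter the hook, so a full version would need a re-entrancy guard or a malloc-based allocator for its own bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical per-thread counters standing in for the article's global_meminfo.
// Plain integers keep the hooks free of any allocation of their own.
struct NewStats {
    std::size_t allocTimes = 0;
    std::size_t releaseTimes = 0;
};
thread_local NewStats g_newStats;

// Replacement for the global allocation function: defer to malloc, then count.
void* operator new(std::size_t size) {
    void* ptr = std::malloc(size == 0 ? 1 : size);
    if (!ptr) throw std::bad_alloc();
    ++g_newStats.allocTimes;
    return ptr;
}

// Matching replacement for the global deallocation function.
void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    ++g_newStats.releaseTimes;
    std::free(ptr);
}

// Sized form selected by C++14 and later; forwards to the unsized one.
void operator delete(void* ptr, std::size_t) noexcept {
    operator delete(ptr);
}
```

The default array forms (operator new[] and operator delete[]) call these single-object functions, so they are covered without separate definitions.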
Finally, a simple demo program shows what this approach can do:
void func() {
    for ( int i = 0; i < 1024; ++i ) {
        void* ptr = TRACKED_MALLOC(8);
        if ( i < 512 ) TRACKED_FREE(ptr);
    }
}

int main() {
    std::thread t1(func);
    std::thread t2(func);
    t1.join();
    t2.join();
    sleep(5);
    return 0;
}
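The two threads run independently. Each allocates 8 bytes 1024 times but frees only 512 of those blocks, so each thread should leak 512 × 8 = 4096 bytes. Is that the case? Running the program shows: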
Total memory allocated: 8192
Times of memory allocation: 1024
Total memory released: 4096
Times of memory releasing: 512
Thread 1504: Unleased memory: 512 of 4096 bytes

Total memory allocated: 8192
Times of memory allocation: 1024
Total memory released: 4096
Times of memory releasing: 512
Thread 1503: Unleased memory: 512 of 4096 bytes
So the tracker does detect the memory leaks of each thread.
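The report above gives only totals, although every record also stores the file and line of its allocation. One possible extension, sketched here as an assumption rather than as part of the original tracker, groups the blocks that are still tracked by call site. `LeakRecord` and `summarizeLeaks` are hypothetical names, and `LeakRecord` is a trimmed-down version of mem_info_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical trimmed-down version of the article's mem_info_t.
struct LeakRecord {
    const char* fileName;
    std::uint32_t codeLine;
    std::size_t memSize;
};

// Sums the sizes of all still-tracked blocks per call site ("file:line").
std::map<std::string, std::size_t>
summarizeLeaks(const std::unordered_map<std::uintptr_t, LeakRecord>& tracked) {
    std::map<std::string, std::size_t> bySite;
    for (const auto& entry : tracked) {
        const LeakRecord& rec = entry.second;
        std::string site = std::string(rec.fileName) + ":" + std::to_string(rec.codeLine);
        bySite[site] += rec.memSize;
    }
    return bySite;
}
```

Printing this map when the tracker is destroyed would point straight at the lines that leak the most, instead of only showing how much leaked in total.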