Using Caffe: how to convert one-dimensional or other non-image data to LMDB

Date: 2019-12-13

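Caffe is fussy: the data has to be in LMDB or LevelDB or something similar. If the data are images, Caffe's bundled convert_imageset.cpp does the job, but if they are not images you have to write the program yourself. I am not a computer science major and cannot really read the source code, so I set about searching Baidu, with little result, and then turned to Google. As the saying goes, "for domestic matters ask Baidu, for foreign matters ask Google", and the ancients did not deceive me. In Caffe's Google group I found this URL: http://deepdish.io/2015/04/28/creating-lmdb-in-python/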

The code is as follows:

import numpy as np
import lmdb
import caffe

N = 1000

# Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)

# We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10

env = lmdb.open('mylmdb', map_size=map_size)

with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)

        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
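
A side note on the '{:08}'.format(i) keys used above: LMDB keeps its keys sorted in lexicographic byte order, so zero-padding is what makes the stored order match the numeric order of the samples. The sketch below only illustrates that ordering with plain Python 3 byte strings; the index list is made up for the demonstration and no database is involved.

```python
# Compare how zero-padded and unpadded keys sort as byte strings.
# The indices are arbitrary illustration values, not from the original post.
indices = [0, 2, 10, 100]

# Keys built the same way as in the script above.
padded = ['{:08}'.format(i).encode('ascii') for i in indices]
# Keys built without zero-padding, for comparison.
unpadded = [str(i).encode('ascii') for i in indices]

padded_sorted = sorted(padded)
unpadded_sorted = sorted(unpadded)

# Padded keys stay in numeric order after sorting.
print(padded_sorted == padded)
# Unpadded keys do not: b'10' and b'100' sort before b'2'.
print(unpadded_sorted)
```

With unpadded keys, sample 2 would come after samples 10 and 100 when the database is read back in key order, which is probably not what you want if the order of the samples matters.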

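This is Python code that converts data to LMDB. However, after I processed my data with it, Caffe failed with a std::bad_alloc error. After a hard struggle and a lot of reading, I found the problems: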

1. Caffe's data format is four-dimensional by default: (n_samples, n_channels, height, width). So you must put your own data into this format.

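2. The last line, txn.put(str_id.encode('ascii'), datum.SerializeToString()), must be included. At first I thought Python 2 did not need it, and I kept getting errors; only later did I realize that this line is required!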

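3. If you get the error mdb_put: MDB_MAP_FULL: Environment mapsize limit reached, it is because lmdb's default map_size is rather small. I changed the default value of map_size in lmdb/cffi.py to 1099511627776 (that is, 1 TiB); I do not know whether that is the right way to do it. Then I changed the line map_size = X.nbytes in the Python program above to map_size = X.nbytes * 10, and after that it worked!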
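
Points 1 and 3 can be sketched together with NumPy alone. The helper names to_caffe_4d and estimate_map_size are hypothetical, not part of Caffe or lmdb, and treating each feature as its own channel, giving the shape (N, D, 1, 1), is only one possible layout; (N, 1, 1, D) would be just as four-dimensional. This is a rough sketch under those assumptions, not the definitive way to do it.

```python
import numpy as np


def to_caffe_4d(features):
    """Reshape (N, D) feature vectors to (N, D, 1, 1).

    Hypothetical helper: maps each feature to one channel with
    height = width = 1. This layout is an assumption.
    """
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError("expected a 2-D array of shape (N, D)")
    n, d = features.shape
    return features.reshape(n, d, 1, 1)


def estimate_map_size(X, factor=10):
    """Return X.nbytes * factor, the same headroom rule as the script above."""
    return int(X.nbytes) * factor


# Four made-up samples with three features each.
raw = np.arange(12, dtype=np.uint8).reshape(4, 3)
X = to_caffe_4d(raw)
map_size = estimate_map_size(X)

print(X.shape)    # (4, 3, 1, 1)
print(map_size)   # 120
```

The resulting X can take the place of X in the script above, and map_size can be passed to lmdb.open in the same way, so editing the default inside lmdb/cffi.py should not be necessary.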

While looking for material, I also found Python programs for writing LevelDB, here: https://github.com/BVLC/caffe/issues/745 and http://stackoverflow.com/questions/32707393/whats-caffes-input-format

A Python program for writing HDF5 is here: http://stackoverflow.com/questions/31774953/test-labels-for-regression-caffe-float-not-allowed/31808324#31808324

References:

1. http://stackoverflow.com/questions/30983213/how-to-use-1-dim-vector-as-input-for-caffe/30991590#30991590

2. On the size of lmdb's map_size: https://github.com/BVLC/caffe/issues/1298 and http://stackoverflow.com/questions/31820976/lmdb-increase-map-size