One of the most frequently asked questions about ZFS is: "How do I improve ZFS performance?"
That's not to say ZFS performs poorly; any file system in existence slows down after it has been used for a while. In fact, ZFS can be a very fast file system.
With its powerful self-correcting features and the algorithms it runs behind the scenes, ZFS can help you reach better performance than most RAID controllers and RAID modules, without the need for expensive hardware controllers. That's why ZFS can be called the industry's first true RAID (Redundant Array of Inexpensive Disks) solution.
Most of the ZFS performance problems I see are rooted in wrong assumptions about the hardware's capabilities, or simply in expectations that clash with the laws of physics.
So let's look at ten easy ways to boost ZFS performance that everyone can use, without having to become a ZFS expert first.
To make this easier to read, here's a table of contents first:
Before we get to the performance topics, let's review some basics first:
It is very important to distinguish between the two basic kinds of file system operations: reads and writes.
That may sound simple, even silly, but bear with me: reads and writes take very different paths through the file system's I/O subsystems, which means the ways to improve read performance and write performance are different, too.
You can use the zpool iostat or iostat(1M) commands to check whether your system's read/write performance matches your expectations.
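For example, a quick way to watch read and write activity looks like this (a minimal sketch; "tank" is a hypothetical pool name and the 5-second interval is arbitrary):

  # Per-pool and per-vdev read/write IOPS and bandwidth, refreshed every 5 seconds
  zpool iostat -v tank 5

  # System-wide per-device statistics (extended view, idle devices skipped)
  iostat -xnz 5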
Next, we need to distinguish between two kinds of performance metrics:
Again, these different views of performance matter in different ways when optimizing, so you need to know which particular kind of problem you are dealing with. Reads and writes also come in two different patterns:
Here's some good news: through the magic of copy-on-write, ZFS automatically turns random writes into sequential writes. That's a class of performance problem that few other file systems take care of for you.
Finally, when it comes to I/O operations, you should understand the difference between the two modes they can run in:
Performance Expectations, Goals and Strategy
We're almost ready to get to today's performance tips, but before we start, we need to sort out a few concepts:
*Determine realistic expectations: ZFS is great, yes. But you still have to obey the laws of physics. A 10,000 rpm disk can't deliver more than about 166 random IOPS, because 10,000 rpm (revolutions per minute) divided by 60 seconds (per minute) is 166. That means the disk head can only position itself over a random block 166 times per second; any more than that and your reads/writes aren't really random any more. That's how the theoretical maximum number of random IOPS for a disk is derived (see the quick calculation right after this list).
Similarly, RAID-Z means that for each RAID-Z group you only get the IOPS performance of a single disk, because every file system IO hits all of the disks in a RAID-Z group in parallel.
So when you analyze your performance and set performance goals, you need to know the physical limits of your storage devices and what performance you can realistically expect from them.
*Set performance goals: What exactly is "too slow"? What performance would be acceptable? How much performance do you get today, and how much do you want?
Setting performance goals is important because they tell you when you're done. There's always a way to improve performance further, but improving performance at any cost is pointless. Know when you're done, then celebrate!
*Be systematic: We try this, then we try that, we measure with cp(1) even though our application is really a database. Then we tweak a parameter here and a parameter there, and usually, before we know it, we realize: we really don't know anything.
Being systematic means deciding how to measure the performance we actually want, establishing the system's current baseline, using a measurement method that is directly relevant to the real application we care about, and sticking to that same method throughout the whole performance analysis and optimization process.
Otherwise things get confusing, we lose the signal, and we can't tell whether we've reached our goal.
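To make the first point above concrete, here is the back-of-the-envelope IOPS calculation spelled out (a sketch, assuming at most one random seek per revolution; the drive speeds are just examples):

  # Max theoretical random IOPS for a single disk = rotational speed (rpm) / 60 seconds
  # 10,000 rpm drive: 10000 / 60 ≈ 166 random IOPS
  #  7,200 rpm drive:  7200 / 60  = 120 random IOPS
  echo $((10000 / 60))    # prints 166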
Now that we understand which kind of performance improvement we're after, know what we can realistically expect from today's hardware, have set some concrete goals, and have a methodical approach to performance optimization, let's get started with today's tips:
#1: Add Enough RAM
A small portion of the space on your disks is used to store ZFS metadata. This is the data ZFS itself needs in order to know where your actual user data is stored on disk. In other words, this metadata is the roadmap and the set of data structures ZFS uses to find your user data.
If your server doesn't have enough RAM to cache that metadata, extra metadata read IOs are needed just to figure out where each piece of data you want to read actually sits on disk. That makes user data reads slower, and you should avoid it. If the amount of RAM you can actually use is small, this can seriously hurt your disk performance.
How much RAM do you need? As a rule of thumb, divide the total capacity of your disks by 1,000, then add 1 GB reserved for the operating system. That means for every 1 TB of data you need at least 1 GB of RAM for caching ZFS metadata, plus whatever extra memory the OS and your applications require.
Having enough RAM pays off for reads whether they are random or sequential, simply because metadata cached in memory is much faster to look up than metadata fetched from disk. So make sure your system has at least n/1000 + 1 GB of RAM, where n is your pool capacity in GB.
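A worked example of that rule of thumb (hypothetical numbers: a 10 TB pool):

  # RAM needed ≈ pool capacity (GB) / 1000 + 1 GB for the OS
  # 10 TB pool = 10000 GB  ->  10000 / 1000 + 1 = 11 GB minimum
  echo $((10000 / 1000 + 1))    # prints 11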
#2: Add More RAM
ZFS uses every bit of RAM it can find to cache data. It has very sophisticated caching algorithms that try to cache the most recently used data as well as the most frequently used data, adaptively balancing both types of cache depending on how the data is actually used. ZFS also has advanced prefetching abilities that can greatly improve sequential read performance for all kinds of data.
The more RAM you give ZFS, the better all of this works. But how do you know whether more memory will bring you a big performance breakthrough or only a small improvement?
It depends on the size of your working set.
Your working set is the portion of your data you use most often: the contents of the production database behind your main product/website/e-commerce system, the clients that generate the most traffic in your hosting environment, your most frequently used files, and so on.
If your working set fits into RAM, then the bulk of your read requests can be served from memory most of the time, without generating IO operations to the slower disks.
Try to figure out how big your most frequently used data is, then add enough RAM to your ZFS server to keep it resident in memory. That will give you the maximum read performance.
If you want to do this in a more automated way, Ben Rockwood has written a great tool called arc_summary (the ARC is the ZFS Adaptive Replacement Cache). Its two "Ghost" values tell you exactly how much more RAM would have noticeably improved your ZFS performance, based on the workload of the recent past.
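A minimal sketch of how you might run it (assuming you have downloaded Ben Rockwood's arc_summary.pl script; output format differs between versions):

  # Print ARC size, hit/miss ratios and ghost list statistics
  perl arc_summary.pl

  # Or inspect the raw ARC kstats directly on Solaris
  kstat -m zfs -n arcstats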
If you want to influence the balance between user data and metadata in the ZFS ARC cache, check out the primarycache filesystem property that you can set using the zfs(1M) command. For RAM-starved servers with a lot of random reads, it may make sense to restrict the precious RAM cache to metadata and use an L2ARC, explained in tip #4 below.
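For example (a sketch; "tank/data" is a hypothetical dataset name):

  # Cache only metadata in RAM for this dataset, leaving user data to the L2ARC/disks
  zfs set primarycache=metadata tank/data

  # Check the current settings
  zfs get primarycache,secondarycache tank/data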
#3: Boost Deduplication Performance With Even More RAM
In an earlier article I wrote about the basics of ZFS deduplication. If you plan to use this feature, keep in mind that ZFS maintains a table with the location and checksum of every single block stored in the file system, so it can determine whether a given block has been written before and safely mark it as a duplicate.
Deduplication saves you storage space, and it can improve ZFS performance too, because it saves unnecessary read and write IOPS. The cost, however, is that you need more RAM to hold the dedup table (the ZFS DDT); otherwise the extra IOs to the slower disks will drag file system performance down instead.
So how big is the ZFS dedup table? Richard Elling pointed out in a recent post that the dedup table has one entry for every data block, and each entry uses roughly 250 bytes. Assuming an 8K block size, every 1 TB of user data needs about 32 GB of RAM to hold the whole table. If you mostly store large files, your average block size will be bigger, say 64K, and then roughly 4 GB of RAM is enough to hold the entire dedup table.
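Spelled out, along with a way to estimate your own dedup table before turning the feature on (a sketch; "tank" is a hypothetical pool, and the simulation can take a long time on large pools):

  # DDT size ≈ (user data / average block size) × ~250 bytes per entry
  #   1 TB at  8K blocks: 134,217,728 blocks × 250 B ≈ 32 GB
  #   1 TB at 64K blocks:  16,777,216 blocks × 250 B ≈  4 GB

  # Simulate deduplication on an existing pool and print the projected DDT statistics
  zdb -S tank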
If you don't have that much RAM to spare, don't use ZFS deduplication: the extra disk IO overhead will hurt performance rather than help it.
#4: Use SSDs to Improve Read Performance
If you can't add any more RAM to your server (or if your purchasing department won't approve the budget), the next best way to improve read performance is to add flash-based solid state disks (SSDs) as a second-level ARC cache (L2ARC).
You can set this up very easily with the zpool(1M) command; see the "Cache devices" section of its man page.
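For example (a sketch; the pool name "tank" and device name c4t2d0 are hypothetical):

  # Add an SSD as an L2ARC cache device to an existing pool
  zpool add tank cache c4t2d0

  # Watch how the cache device fills up and is used
  zpool iostat -v tank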
SSDs can deliver two orders of magnitude better IOPS than traditional harddisks, and they're much cheaper on a per-GB basis than RAM.
They form an excellent layer of cache between the ZFS RAM-based ARC and the actual disk storage.
You don't need to observe any reliability requirements when configuring L2ARC devices: If they fail, no data is lost because it can always be retrieved from disk.
This means that L2ARC devices can be cheap, but before you start putting USB sticks into your server, you should make sure they deliver a good performance benefit over your rotating disks :).
SSDs come in various sizes: From drop-in-replacements for existing SATA disks in the range of 32GB to the Oracle Sun F20 PCI card with 96GB of flash and built-in SAS controllers (which is one of the secrets behind Oracle Exadata V2's breakthrough performance), to the mighty fast Oracle Sun F5100 flash array (which is the secret behind Oracle's current TPC-C and other world records) with a whopping 1.96TB of pure flash memory and over a million IOPS. Nice!
And since the dedup table is stored in the ZFS ARC and consequently spills off into the L2ARC if available, using SSDs as cache devices will also benefit deduplication performance.
#5: Use SSDs to Improve Write Performance
Most write performance problems are related to synchronous writes. These are mostly found in file servers and database servers.
With synchronous writes, ZFS needs to wait until each particular IO is written to stable storage, and if that's your disk, then it'll need to wait until the rotating rust has spun into the right place, the harddisk's arm moved to the right position, and finally, until the block has been written. This is mechanical, it's latency-bound, it's slow.
See Roch's excellent article on ZFS NFS performance for a more detailed discussion on this.
SSDs can change the whole game for synchronous writes because they have 100x better latency: No moving parts, no waiting, instant writes, instant performance.
So if you're suffering from a high load in synchronous writes, add SSDs as ZFS log devices (aka ZIL, Logzillas) and watch your synchronous writes fly. Check out the zpool(1M) man page under the "Intent Log" section for more details.
Make sure you mirror your ZIL devices: They are there to guarantee the POSIX requirement for "stable storage" so they must function reliably, otherwise data may be lost on power or system failure.
Also, make sure you use high quality SLC Flash Memory devices, because they can give you reliable write transactions. Cheaper MLC cells can damage existing data if the power fails during write operations, something you really don't want.
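A sketch of what that looks like (pool and device names are hypothetical; note the mirrored log devices, per the point above):

  # Add a mirrored pair of SSDs as a dedicated ZFS intent log (slog)
  zpool add tank log mirror c4t3d0 c4t4d0

  # Verify that the log devices show up in the pool layout
  zpool status tank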
#6: Use Mirroring
Many people configure their storage for maximum capacity. They just look at how many TB they can get out of their system. After all, storage is expensive, isn't it?
Wrong. Storage capacity is cheap. Every 18 months or so, the same disk only costs half as much, or you can buy double the capacity for the same price, depending on how you view it.
But storage performance can be precious. So why squeeze the last GB out of your storage if capacity is cheap anyway? Wouldn't it make more sense to trade in capacity for speed?
This is what mirroring disks offer as opposed to RAID-Z or RAID-Z2:
For a more detailed discussion on this, I highly recommend Richard Elling's post on ZFS RAID recommendations: Space, performance and MTTDL.
Also, there's some more discussion on this in my earlier RAID-GREED-article.
Bottom line: If you want performance, use mirroring.
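For example, here is how the two layouts compare when building a pool out of six disks (a sketch; device names are hypothetical, and the two commands are alternatives, not a sequence):

  # Three 2-way mirrors: ~3 disks' worth of capacity, ~3 disks' worth of random IOPS
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0

  # One RAID-Z vdev: ~5 disks' worth of capacity, but only ~1 disk's worth of random IOPS
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0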
#7: Add More Disks
Our next tip was already buried inside tip #6: Add more disks. The more vdevs ZFS has to play with, the more shoulders it can place its load on and the faster your storage performance will become.
This works both for increasing IOPS and for increasing bandwidth, and it'll also add to your storage space, so there's nothing to lose by adding more disks to your pool.
But keep in mind that the performance benefit of adding more disks (and of using mirrors instead of RAID-Z(2)) only accelerates aggregate performance. The performance of every single I/O operation is still confined to that of a single disk's I/O performance.
So, adding more disks does not substitute for adding SSDs or RAM, but it'll certainly help aggregate IOPS and bandwidth for the cases where lots of concurrent IOPS and bigger overall bandwidth are needed.
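For example (a sketch; names are hypothetical):

  # Grow the pool by adding another mirrored pair; ZFS starts striping across it immediately
  zpool add tank mirror c2t0d0 c2t1d0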
#8: Leave Enough Free Space
Don't wait until your pool is full before adding new disks, though.
ZFS uses copy on write which means that it writes new data into free blocks, and only when the überblock has been updated, the new state becomes valid.
This is great for performance because it gives ZFS the opportunity to turn random writes into sequential writes - by choosing the right blocks out of the list of free blocks so they're nicely in order and thus can be written to quickly.
That is, when there are enough blocks.
Because if you don't have enough free blocks in your pool, ZFS will be limited in its choice, and that means it won't be able to choose enough blocks that are in order, and hence it won't be able to create an optimal set of sequential writes, which will impact write performance.
As a rule of thumb, don't let your pool become more full than about 80% of its capacity. Once it reaches that point, you should start adding more disks so ZFS has enough free blocks to choose from in sequential write order.
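You can keep an eye on this with zpool list (a sketch; "tank" is a hypothetical pool name):

  # The CAP column shows how full the pool is; past ~80% it's time to add more disks
  zpool list tank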
#9: Hire A ZFS Expert
There's a reason why this point comes up almost last: In the utter majority of all ZFS performance cases, one or more of #1-#8 above are almost always the solution.
And they're cheaper than hiring a ZFS performance expert who will likely tell you to add more RAM, or add SSDs or switch from RAID-Z to mirroring after looking at your configuration for a couple of minutes anyway!
But sometimes, a performance problem can be really tricky. You may think it's a storage performance problem, but instead your application may be suffering from an entirely different effect.
Or maybe there are some complex dependencies going on, or some other unusual interaction between CPUs, memory, networking, I/O and storage.
Or perhaps you're hitting a bug or some other strange phenomenon?
So, if all else fails and none of the above options seem to help, contact your favorite Oracle/Sun representative (or send me a mail) and ask for a performance workshop quote.
If your performance problem is really that hard, we want to know about it.
#10: Be An Evil Tuner - But Know What You Do
If you don't want to go for option #9 and if you know what you do, you can check out the ZFS Evil Tuning Guide.
There's a reason it's called "evil": ZFS is not supposed to be tuned. The default values are almost always the right values, and most of the time, changing them won't help, unless you really know what you're doing. So, handle with care.
Still, when people encounter a ZFS performance problem, they tend to Google "ZFS tuning", then they'll find the Evil Tuning Guide, then think that performance is just a matter of setting that magic variable in /etc/system.
This is simply not true.
Measuring performance in a standardized way, setting goals, then sticking to them helps. Adding RAM helps. Using SSDs helps. Thinking about the right number and RAID level of disks helps. Letting ZFS breathe helps.
But tuning kernel parameters is reserved for very special cases, and then you're probably much better off hiring an expert to help you do that correctly.
Bonus: Some Miscellaneous Settings
If you look through the zfs(1M) man page, you'll notice a few performance related properties you can set.
They're not general cures for all performance problems (otherwise they'd be set by default), but they can help in specific situations. Here are a few:
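The original list of properties didn't survive in this copy of the article, but as a hedged sketch, a few properties that often come up in this context look like this ("tank/data" and "tank/db" are hypothetical datasets; check zfs(1M) before changing anything):

  # Skip access-time updates, avoiding an extra write for every read
  zfs set atime=off tank/data

  # Match the record size to the application's IO size, e.g. a database with 8K pages
  zfs set recordsize=8k tank/db

  # Trade some CPU for fewer bytes on disk; can help when the disks are the bottleneck
  zfs set compression=on tank/data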
Your Turn
Sorry for the long article. I hope the table of contents at the beginning makes it more digestible, and I hope it's useful to you as a little checklist for ZFS performance planning and for dealing with ZFS performance problems.
Let me know if you want me to split up longer articles like these (though this one is really meant to remain together).
Now it's your turn: What is your experience with ZFS performance? What options from the above list did you implement for what kind of application/problem and what were your results? What helped and what didn't and what are your own ZFS performance secrets?
Share your ZFS performance expertise in the comments section and help others get the best performance out of ZFS!