ZFS
From 维库 (Weiku), the free library of knowledge and ideas
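ZFS is a file system originally developed by Sun Microsystems for the Solaris operating system. It is a lightweight file system that combines high storage capacity, an integration of the file system and volume management concepts, and a novel on-disk structure, and it also serves as a convenient storage pool manager. ZFS is an open source project licensed under the terms of the Common Development and Distribution License (CDDL).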
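
History
The design and development of ZFS was carried out by a team at Sun led by Jeff Bonwick. ZFS was first announced on September 14, 2004,[1] and was integrated into the main trunk of Solaris development on October 31, 2005.[2] It was released as part of OpenSolaris build 27 on November 16, 2005. In June 2006, one year after the opening of the OpenSolaris community, Sun included ZFS in the 6/06 update to Solaris 10.[3] The name originally stood for "Zettabyte File System", but it is now a meaningless initialism.[4]

Capacity
ZFS is a 128-bit file system, which means it can store 18 billion billion (18.4 × 10^18) times more data than current 64-bit file systems. The limits of ZFS are designed to be so large that they may never be reached in practice. Project leader Bonwick said: "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."[1]
Some theoretical limits in ZFS are: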
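To give a feel for these numbers: if 1,000 new files were created every second, it would take about 9,000 years to reach the ZFS limit on the number of files. In explaining the connection between filling ZFS and boiling the oceans, Bonwick wrote: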

Storage pools
Unlike traditional file systems, which reside on a single device and therefore need a volume manager in order to use more than one device, ZFS is built on top of virtual storage pools called "zpools". Each pool is made up of a number of virtual devices (vdevs). A vdev can be a raw disk, a RAID 1 mirror, or a multi-disk group with a non-standard RAID level. File systems on a zpool can then use the combined storage capacity of all these vdevs.
A quota can be set to limit the space that a single file system in the pool may occupy, and a reservation can be set to guarantee space for it.

Copy-on-write transactional model
ZFS uses a copy-on-write, transactional object model. All block pointers within the filesystem contain a 256-bit checksum of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, the modified data is written to it, and then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and an intent log is used when synchronous write semantics are required.

Snapshots and clones
The ZFS copy-on-write model has another powerful advantage: when ZFS writes new data, instead of releasing the blocks containing the old data, it can retain them, creating a snapshot version of the file system. ZFS snapshots are created very quickly, since all the data comprising the snapshot is already stored; they are also space efficient, since any unchanged data is shared among the file system and its snapshots.
Writable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist.

Dynamic striping
ZFS stripes data dynamically across all devices to maximize throughput: as additional devices are added to the zpool, the stripe width automatically expands to include them. All disks in a pool are therefore used, which balances the write load across them.

Variable block sizes
ZFS uses variable-sized blocks of up to 128 kilobytes.
The currently available code allows the administrator to tune the maximum block size used, as certain workloads do not perform well with large blocks. Automatic tuning to match workload characteristics is contemplated. If compression is enabled, variable block sizes are used: if a block can be compressed to fit into a smaller block size, the smaller size is used on disk, which saves storage and improves I/O throughput (at the cost of increased CPU use for the compression and decompression operations).

Lightweight filesystem creation
In ZFS, manipulating file systems within a storage pool is easier than volume management in traditional file systems. Creating a ZFS file system or changing its size is closer to managing a directory than to managing a volume in traditional technologies.

Additional capabilities
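The copy-on-write, checksummed-pointer, and snapshot behaviour described earlier can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not ZFS's actual design: the class names `BlockStore` and `ToyFileSystem` are hypothetical, SHA-256 is used simply because it yields a 256-bit checksum, and the "file system" is a single flat table of pointers rather than a tree of metadata blocks.

```python
import hashlib


class BlockStore:
    """Append-only store: a block is never overwritten in place."""

    def __init__(self):
        self.blocks = []  # list index doubles as the block address

    def write(self, data: bytes):
        """Allocate a new block and return a pointer (address, checksum)."""
        self.blocks.append(data)
        return (len(self.blocks) - 1, hashlib.sha256(data).digest())

    def read(self, ptr):
        """Read a block and verify it against the checksum held in its pointer."""
        addr, checksum = ptr
        data = self.blocks[addr]
        if hashlib.sha256(data).digest() != checksum:
            raise IOError(f"checksum mismatch at block {addr}")
        return data


class ToyFileSystem:
    """Hypothetical flat file system: a root table of name -> block pointer."""

    def __init__(self, store, root=None):
        self.store = store
        self.root = dict(root or {})

    def put(self, name, data):
        # Copy-on-write: data goes to a fresh block, then the pointer is updated.
        self.root[name] = self.store.write(data)

    def get(self, name):
        return self.store.read(self.root[name])

    def snapshot(self):
        # A snapshot keeps the old pointers; no data blocks are copied.
        return dict(self.root)

    def clone(self, snap):
        # A clone is a writable file system that starts from a snapshot's pointers.
        return ToyFileSystem(self.store, snap)


store = BlockStore()
fs = ToyFileSystem(store)
fs.put("a.txt", b"version 1")
snap = fs.snapshot()
fs.put("a.txt", b"version 2")      # the old block is retained for the snapshot
clone = fs.clone(snap)
print(fs.get("a.txt"), clone.get("a.txt"))
```

In this sketch the live file system and the clone share the unchanged block through identical pointers, which is why creating the snapshot costs almost nothing.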

Cache Management
ZFS also introduces the ARC, a new method for cache management, in place of the traditional Solaris virtual memory page cache.

Limitations
ZFS does not yet support transparent encryption (as NTFS does), but an OpenSolaris project is developing this feature.[9]
ZFS does not support per-user or per-group quotas. Instead, it is possible to create user-owned filesystems, each with its own size limit. The low overhead of ZFS filesystems makes this practical even with many users (but, as noted in the current implementation issues, it may slow system startup considerably). Intrinsically, there is no practical quota solution for file systems shared among several users (team projects, for example), where the data cannot be separated per user, although one could be implemented on top of the ZFS stack.
Capacity expansion is normally achieved by adding groups of disks as a vdev (stripe, RAID-Z, RAID-Z2, or mirrored). Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to heal itself; the heal time depends on the amount of stored information, not on the disk size. One should refrain from taking snapshots during the process, as this will cause the heal to be restarted.
It is currently not possible to reduce the number of vdevs in a pool or otherwise reduce pool capacity. However, this is expected to be implemented in the near future.[citation needed]
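To make the per-user filesystem workaround for quotas more concrete, the sketch below builds the `zfs create` and `zfs set quota=...` command lines for a set of users without executing them. The pool name `tank`, the `home/<user>` layout, the user names, and the helper `per_user_filesystem_commands` are all hypothetical, and the parent dataset `tank/home` is assumed to exist already.

```python
def per_user_filesystem_commands(pool, quotas):
    """Return zfs command lines giving each user a filesystem with its own quota.

    `quotas` maps a user name to a size string such as "10G".
    The commands are only built here, never run.
    """
    commands = []
    for user, quota in quotas.items():
        dataset = f"{pool}/home/{user}"  # assumed naming scheme
        commands.append(["zfs", "create", dataset])
        commands.append(["zfs", "set", f"quota={quota}", dataset])
    return commands


for cmd in per_user_filesystem_commands("tank", {"alice": "10G", "bob": "5G"}):
    print(" ".join(cmd))
```

Because each user ends up with a separate filesystem, the number of filesystems grows with the number of users, which is the startup cost the limitation above refers to.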
Reconfiguring storage requires copying data offline, destroying the pool, and recreating the pool with the new policy.

Current implementation issues
The current ZFS implementation (Solaris 10 11/06) has some issues that administrators should know about before deploying it. These issues are not inherent to ZFS and might be solved in future releases:

Platforms
ZFS is part of Sun's own Solaris operating system and is thus available on both SPARC and x86-based systems. Since the code for ZFS is open source, a port to other operating systems and platforms can be produced without Sun's involvement.
Nexenta OS, a complete GNU-based open source operating system built on top of the OpenSolaris kernel and runtime, includes a ZFS implementation, added in version alpha1.
Apple Computer is porting ZFS to its Mac OS X operating system, according to a post by a Sun employee on the opensolaris.org zfs-discuss mailing list and to previewed screenshots of the next version of Mac OS X.[18] As of Mac OS X 10.5 (Developer Seed 9A321), support for ZFS has been included, but it lacks the ability to act as a root partition, as noted above. Attempts to format local drives using ZFS are also unsuccessful; this is a known bug.[19]
Porting ZFS to Linux is complicated by the fact that the GNU General Public License, which governs the Linux kernel, prevents linking with code under other licenses such as the CDDL, the license ZFS is released under.[20] To work around this problem, the Google Summer of Code program is sponsoring a port of ZFS to Linux's FUSE system so that the filesystem will run in userspace instead.[21] However, running a file system outside the kernel on traditional Unix-like systems has a significant performance impact.
There are no plans to port ZFS to HP-UX or AIX.[22]
Pawel Jakub Dawidek has ported and committed ZFS to FreeBSD for inclusion in FreeBSD 7.0, due to be released in 2007.[23]

Adaptive Endianness
Pools and their associated ZFS file systems can be moved between different platform architectures, even between systems implementing different byte orders. The ZFS block pointer format allows filesystem metadata to be stored in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block.
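As a rough sketch of how an endian-adaptive format of this kind might work, the example below tags each serialized block with the writer's byte order and lets the reader byte-swap in memory only when that order differs from its own. The one-byte flag layout and the function names `write_metadata` and `read_metadata` are assumptions made for illustration; they are not the real ZFS block pointer format.

```python
import struct
import sys


def write_metadata(values, byteorder=sys.byteorder):
    """Serialize 64-bit unsigned integers in the writer's byte order.

    The first byte records that order (1 = little, 0 = big); this flag
    layout is hypothetical.
    """
    flag = b"\x01" if byteorder == "little" else b"\x00"
    fmt = ("<" if byteorder == "little" else ">") + f"{len(values)}Q"
    return flag + struct.pack(fmt, *values)


def read_metadata(blob, host_byteorder=sys.byteorder):
    """Decode in host order, swapping bytes only if the stored order differs."""
    stored = "little" if blob[0] == 1 else "big"
    payload = blob[1:]
    count = len(payload) // 8
    host_fmt = ("<" if host_byteorder == "little" else ">") + f"{count}Q"
    values = list(struct.unpack(host_fmt, payload))
    if stored != host_byteorder:
        # Reverse the eight bytes of every value in memory.
        values = [int.from_bytes(v.to_bytes(8, "little"), "big") for v in values]
    return values


# A block written on a big-endian system is read correctly on a little-endian one.
blob = write_metadata([1, 2**40 + 7], byteorder="big")
print(read_metadata(blob, host_byteorder="little"))
```

The stored bytes are never rewritten; only the in-memory copy is swapped, so the same block stays readable from systems of either byte order.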
When reading, if the stored endianness doesn't match the endianness of the system, the metadata is byte-swapped in memory. This does not affect the stored data itself: as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system's endianness.

References
See also
External links


