path: root/fs/btrfs
Age / Commit message / Author
2014-09-17Btrfs: fix loop writing of async reclaimLiu Bo
One of my tests shows that when there is really no space left to reclaim via flush_space and we also run out of space, this async reclaim work loops, re-adding itself to the workqueue, and keeps writing to disk according to iostat's results. These writes mainly come from commit_transaction, which writes the super_block. This is unacceptable as it can be bad for disks, especially memory storage. This adds a check to avoid the above situation. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: make fiemap not blow when you have lots of snapshotsJosef Bacik
We have been iterating all references for each extent we have in a file when we do fiemap to see if it is shared. This is fine when you have a few clones or a few snapshots, but when you have 5k snapshots suddenly fiemap just sits there and stares at you. So add btrfs_check_shared which will use the backref walking code but will short circuit as soon as it finds a root or inode that doesn't match the one we currently have. This makes fiemap on my testbox go from looking at me blankly for a day to spitting out actual output in a reasonable amount of time. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: add missing compression property remove in btrfs_ioctl_setflagsFilipe Manana
The behaviour of a 'chattr -c' consists of getting the current flags, clearing the FS_COMPR_FL bit and then sending the result to the set flags ioctl - this means the bit FS_NOCOMP_FL isn't set in the flags passed to the ioctl. This results in the compression property not being cleared from the inode - it was cleared only if the bit FS_NOCOMP_FL was set in the received flags.

Reproducer:

    $ mkfs.btrfs -f /dev/sdd
    $ mount /dev/sdd /mnt && cd /mnt
    $ mkdir a
    $ chattr +c a
    $ touch a/file
    $ lsattr a/file
    --------c------- a/file
    $ chattr -c a
    $ touch a/file2
    $ lsattr a/file2
    --------c------- a/file2
    $ lsattr -d a
    ---------------- a

Reported-by: Andreas Schneider <asn@cryptomilk.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: Fix a deadlock in btrfs_dev_replace_finishing()Qu Wenruo
btrfs-transaction:5657
    [stack snip]
    btrfs_bio_map()
    btrfs_bio_counter_inc_blocked()
    percpu_counter_inc(&fs_info->bio_counter) ###bio_counter > 0(A)
    __btrfs_bio_map()
    btrfs_dev_replace_lock()
    mutex_lock(dev_replace->lock) ###wait mutex(B)

btrfs:32612
    [stack snip]
    btrfs_dev_replace_start()
    btrfs_dev_replace_lock()
    mutex_lock(dev_replace->lock) ###hold mutex(B)
    btrfs_dev_replace_finishing()
    btrfs_rm_dev_replace_blocked()
    wait until percpu_counter_sum == 0 ###wait on bio_counter(A)

This bug can be triggered quite easily by the following test script: http://pastebin.com/MQmb37Cy

This patch fixes the ABBA problem by calling btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked(). The consistency of the btrfs device list and the devices' superblocks is protected by device_list_mutex, not btrfs_dev_replace_lock/unlock(). So it is safe to move btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: cleanup the same name in end_bio_extent_readpageLiu Bo
We've defined an 'offset' outside of bio_for_each_segment_all. This is just a clean rename, no functional changes. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: don't go readonly on existing qgroup itemsMark Fasheh
btrfs_drop_snapshot() leaves subvolume qgroup items on disk after completion. This can cause problems with snapshot creation. If a new snapshot tries to claim the deleted subvolume's id, btrfs will get -EEXIST from add_qgroup_item() and go read-only.

The following commands will reproduce this problem (assume btrfs is on /dev/sda and is mounted at /btrfs):

    mkfs.btrfs -f /dev/sda
    mount -t btrfs /dev/sda /btrfs/
    btrfs quota enable /btrfs/
    btrfs su sna /btrfs/ /btrfs/snap
    btrfs su de /btrfs/snap
    sleep 45
    umount /btrfs/
    mount -t btrfs /dev/sda /btrfs/

We can fix this by catching -EEXIST in add_qgroup_item() and initializing the existing items. We have the problem of orphaned relation items being on disk from an old snapshot but that is outside the scope of this patch.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: shrink further sizeof(struct extent_buffer)Filipe Manana
The map_start and map_len fields aren't used anywhere, so just remove them. On an x86_64 system, this reduced sizeof(struct extent_buffer) from 296 bytes to 280 bytes, and therefore 14 extent_buffer structs can now fit into a page instead of 13. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
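The per-page counts quoted above follow from integer division. A minimal sketch, assuming a 4096-byte page (the commit does not state the page size) and ignoring any slab allocator overhead; `objs_per_page` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole objects of obj_size bytes that fit in one page. */
static size_t objs_per_page(size_t page_size, size_t obj_size)
{
    assert(obj_size > 0);
    return page_size / obj_size;
}
```

Under that assumption, calling `objs_per_page(4096, 296)` and `objs_per_page(4096, 280)` reproduces the 13 and 14 structs per page mentioned in the message.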
2014-09-17Btrfs: send, lower mem requirements for processing xattrsFilipe Manana
Maximum xattr size can be up to nearly the leaf size. For an fs with a leaf size larger than the page size, using kmalloc requires allocating multiple pages that are contiguous, which might not be possible if there's heavy memory fragmentation. Therefore fall back to vmalloc if we fail to allocate with kmalloc. Also start with a smaller buffer size, since xattr values typically are smaller than a page. Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: remove stale define after removing ordered operationsDavid Sterba
Last user removed in commit "btrfs: disable strict file flushes for renames and truncates" (8d875f95da43c6a8f18f77869f2ef26e9594fecc). Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: improve free space cache management and space allocationFilipe Manana
While under random IO, a block group's free space cache eventually reaches a state where it has a mix of extent entries and bitmap entries representing free space regions.

As free space regions are later returned to the cache, some of them are merged with existing extent entries if they are contiguous with them. But others are not merged: despite the existence of adjacent free space regions in the cache, the merging doesn't happen because the existing free space regions are represented in bitmap entries. Even when new free space regions are merged with existing extent entries (enlarging the free space range they represent), we may end up with an enlarged region that is contiguous with some other region represented in a bitmap entry.

Both clustered and non-clustered space allocation work by iterating over our extent and bitmap entries and skipping any that represents a region smaller than the allocation request (and giving preference to extent entries before bitmap entries). By having a contiguous free space region that is represented by 2 (or more) entries (a mix of extent and bitmap entries), we end up not satisfying an allocation request with a size larger than the size of any of the entries but no larger than the sum of their sizes. This makes the caller assume we're under an ENOSPC condition, or forces it to allocate multiple smaller space regions (as we do for file data writes), which adds extra overhead and more chances of causing fragmentation due to the smaller regions being all spread apart from each other (more likely when under concurrency).

For example, if we have the following in the cache:

* extent entry representing free space range: [128Mb - 256Kb, 128Mb[
* bitmap entry covering the range [128Mb, 256Mb[, but only with the bits representing the range [128Mb, 128Mb + 768Kb[ set - that is, only that space in this 128Mb area is marked as free

An allocation request for 1Mb, starting at offset not greater than 128Mb - 256Kb, would fail before, despite the existence of such a contiguous free space area in the cache. The caller could only allocate up to 768Kb of space at once and later another 256Kb (or vice-versa). In between each smaller allocation request, another task working on a different file/inode might come in and take that space, preventing the former task from getting a contiguous 1Mb region of free space.

Therefore this change implements the ability to move free space from bitmap entries into existing and new free space regions represented with extent entries. This is done when a space region is added to the cache.

A test was added to the sanity tests that explains the issue in detail too.

Some performance test results with compilebench on a 4 cores machine, with 32Gb of ram and using an HDD follow.

Test: compilebench -D /mnt -i 30 -r 1000 --makej

Before this change:

    intial create total runs 30 avg 69.02 MB/s (user 0.28s sys 0.57s)
    compile total runs 30 avg 314.96 MB/s (user 0.12s sys 0.25s)
    read compiled tree total runs 3 avg 27.14 MB/s (user 1.52s sys 0.90s)
    delete compiled tree total runs 30 avg 3.14 seconds (user 0.15s sys 0.66s)

After this change:

    intial create total runs 30 avg 68.37 MB/s (user 0.29s sys 0.55s)
    compile total runs 30 avg 382.83 MB/s (user 0.12s sys 0.24s)
    read compiled tree total runs 3 avg 27.82 MB/s (user 1.45s sys 0.97s)
    delete compiled tree total runs 30 avg 3.18 seconds (user 0.17s sys 0.65s)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
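The 1Mb example above can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct free_range` and both helpers are hypothetical and flatten extent and bitmap entries into plain byte ranges. It is not how the btrfs free space cache is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy type: one free range [start, start + len), in bytes. */
struct free_range {
    uint64_t start;
    uint64_t len;
};

/* Largest request satisfiable when each entry is considered on its own. */
static uint64_t max_alloc_per_entry(const struct free_range *r, size_t n)
{
    uint64_t best = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (r[i].len > best)
            best = r[i].len;
    return best;
}

/*
 * Largest request satisfiable once adjacent ranges are merged.
 * Assumes the array is sorted by start and the ranges do not overlap.
 */
static uint64_t max_alloc_merged(const struct free_range *r, size_t n)
{
    uint64_t best = 0, run = 0, run_end = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && r[i].start == run_end)
            run += r[i].len;    /* contiguous with the previous range */
        else
            run = r[i].len;     /* gap: start a new run */
        run_end = r[i].start + r[i].len;
        if (run > best)
            best = run;
    }
    return best;
}
```

With the two ranges from the example (256Kb ending at 128Mb, then 768Kb starting at 128Mb), the per-entry view tops out at 768Kb, while the merged view yields a full 1Mb.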
2014-09-17btrfs: rename total_bytes to avoid confusionAnand Jain
We are assigning number_devices to total_bytes, which is very confusing at first glance. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: fix typo in the log messageAnand Jain
There is no matching open parenthesis for the closing parenthesis. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: rw_devices shouldn't be incremented for seed fs in btrfs_rm_dev_replace_srcdev()Anand Jain
Seed fs devices don't participate as rw_device, so don't increment rw_devices when the device being handled belongs to a seed fs. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: fix memory leak when there is no more seed deviceAnand Jain
When we replace all the seed devices in the system, there is no point in keeping the btrfs_fs_devices without any device. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: update sprout seed pointer when seed fs is relinquishedAnand Jain
We are not updating the sprout fs seed pointer when all seed devices are replaced. This patch checks whether all seed devices have been replaced and then updates the sprout pointer accordingly. The same reproducer as in the previous patch applies here. Note that without this patch btrfs_close_devices() checks whether a seed fs is present and spits out the error.

    int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
    {
    ::
        seed_devices = fs_devices->seed;
    ::
        while (seed_devices) {
            fs_devices = seed_devices;
            seed_devices = fs_devices->seed;
            __btrfs_close_devices(fs_devices);
            free_fs_devices(fs_devices);
        }

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: fix rw_devices mismatch after seed replaceAnand Jain
reproducer:

    mount /dev/sdb /btrfs
    btrfs dev add /dev/sdc /btrfs
    btrfs rep start -B /dev/sdb /dev/sdd /btrfs
    umount /btrfs

    WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()

which is WARN_ON(fs_devices->rw_devices);

The problem here is that we did not add one to rw_devices when we replaced the seed device with a writable device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: replace seed device followed by unmount causes kernel WARNINGAnand Jain
reproducer:

    mount /dev/sdb /btrfs
    btrfs dev add /dev/sdc /btrfs
    btrfs rep start -B /dev/sdb /dev/sdd /btrfs
    umount /btrfs

    WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
    ::
    __btrfs_close_devices()
    ::
    WARN_ON(fs_devices->open_devices);

After the seed device has been replaced, the new target device is no longer a seed device. So we need to update the device numbers in the fs_devices as pointed to by the fs_info.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed awareAnand Jain
There is no logical change in this patch, just a preparatory patch, so that the following changes can be easily reasoned about. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: Drop stray check of fixup_workers creationAndrey Utkin
The issue was introduced in a79b7d4b3e8118f265dcb4bdf9a572c392f02708, adding allocation of extent_workers, so this stray check is surely not meant to be a check of something else. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82021 Reported-by: Maks Naumov <maksqwe1@ukr.net> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: make btrfs_search_forward return with nodes unlockedFilipe Manana
None of the uses of btrfs_search_forward() need to have the path nodes (level >= 1) read locked, only the leaf needs to be locked while the caller processes it. Therefore make it return a path with all nodes unlocked, except for the leaf. This change is motivated by the observation that during a file fsync we repeatedly call btrfs_search_forward() and process the returned leaf while upper nodes of the returned path (level >= 1) are read locked, which unnecessarily blocks other tasks that want to write to the same fs/subvol btree. Therefore instead of modifying the fsync code to unlock all nodes with level >= 1 immediately after calling btrfs_search_forward(), change btrfs_search_forward() to do it, so that it benefits all callers. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: sysfs label interface should check for read only FSAnand Jain
Not sure how this escaped many eyes so far Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: code optimize: BTRFS_ATTR_RW could set the modeAnand Jain
BTRFS_ATTR_RW could set the mode and be in line with BTRFS_ATTR. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: code optimize: BTRFS_ATTR could handle the modeAnand Jain
All users of BTRFS_ATTR want the mode to be set to 0444, so just do it in the define. Also fix a few spacing alignments. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: use BTRFS_ATTR instead of btrfs_no_store()Anand Jain
We have the BTRFS_ATTR define to create a read-only sysfs file, so use that. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: avoid unnecessary switch of path locks to blocking modeFilipe Manana
If we need to cow a node, increase the write lock level and retry the tree search, there's no point in changing the node locks in our path to blocking mode, as we only waste time and unnecessarily wake up other tasks waiting on the spinning locks (just to block them again shortly after) because we release our path before repeating the tree search. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: unlock nodes earlier when inserting items in a btreeFilipe Manana
In ctree.c:setup_items_for_insert(), we can unlock all nodes in our path before we process the leaf (shift items and data, adjust data offsets, etc). This allows for better btree concurrency, as we're often holding a write lock on at least the node at level 1. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: use IS_ALIGNED() for assertion in btrfs_lookup_csums_range() for simplicitySatoru Takeuchi
btrfs_lookup_csums_range() uses ALIGN() to check if "start" and "end + 1" are aligned to "root->sectorsize". It's better to replace these with IS_ALIGNED() for simplicity. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
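Both styles of alignment check give the same answer, as the sketch below illustrates. The three functions are userspace stand-ins written for this illustration, assuming a power-of-two alignment as the kernel macros do; they are not the kernel's ALIGN() and IS_ALIGNED() definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Direct test: no low bits set below the alignment. */
static bool is_aligned(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x & (a - 1)) == 0;
}

/* Older style: x is aligned iff rounding it up leaves it unchanged. */
static bool is_aligned_via_align(uint64_t x, uint64_t a)
{
    return align_up(x, a) == x;
}
```

The direct test states the intent in one step, which is the simplification the commit describes.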
2014-09-17btrfs: add trace for qgroup accountingMark Fasheh
We want this to debug qgroup changes on live systems. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: cleanup unused latest_devid and latest_trans in fs_devicesMiao Xie
The member variables latest_devid and latest_trans of the fs_devices structure are set, but nothing uses them, so remove them. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_deviceMiao Xie
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: Fix the problem that the dirty flag of dev stats is clearedMiao Xie
An IO error might happen while writing out the device stats, in which case the device stats information and the dirty flag are updated at that time. The current code didn't consider this case and just cleared the dirty flag, so we would forget to write out the new device stats information. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: make the device lock and its protected data in the same cachelineMiao Xie
The lock in the btrfs_device structure was far away from the data it protects, which made the CPU load the cache line twice when we accessed them. Move them together. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong generation check of super block on a seed deviceMiao Xie
The super block generation of the seed devices is not the same as that of the filesystem which sprouted from them, because we don't update the super block on the seed devices when we change the new filesystem. So we should not use the generation of the new filesystem to check the super block generation on the seed devices. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong fsid check of scrubMiao Xie
All the metadata in the seed devices has the same fsid as the fsid of the seed filesystem which is on the seed device, so we should check them by the current filesystem. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: wake up transaction thread from SYNC_FS ioctlDavid Sterba
The transaction thread may want to do more work, namely it pokes the cleaner kthread that will start processing uncleaned subvols. This can be triggered by the user via the 'btrfs fi sync' command, otherwise there was a delay of up to 30 seconds before the cleaner started to clean old snapshots. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong max inline data size limitWang Shilong
Inline data is stored starting at the offset of @disk_bytenr in struct btrfs_file_extent_item, so subtracting the total size of struct btrfs_file_extent_item is wrong. Fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
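The difference between the two subtractions can be seen with `offsetof` versus `sizeof`. A rough sketch, assuming a GCC or Clang compiler for the packed attribute: `struct file_extent_item` is a hypothetical userspace mirror whose field names follow struct btrfs_file_extent_item, and `item_space` is an illustrative parameter, not a kernel value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the on-disk item, for illustration only. */
struct file_extent_item {
    uint64_t generation;
    uint64_t ram_bytes;
    uint8_t  compression;
    uint8_t  encryption;
    uint16_t other_encoding;
    uint8_t  type;
    /* Inline file data starts here, overlaying the fields below. */
    uint64_t disk_bytenr;
    uint64_t disk_num_bytes;
    uint64_t offset;
    uint64_t num_bytes;
} __attribute__((packed));

/* Header bytes that precede inline data. */
static size_t inline_header_size(void)
{
    return offsetof(struct file_extent_item, disk_bytenr);
}

/* Inline payload that fits in item_space bytes of item data. */
static size_t max_inline_data(size_t item_space)
{
    assert(item_space >= inline_header_size());
    return item_space - inline_header_size();
}
```

Subtracting `sizeof(struct file_extent_item)` instead would also give up the bytes occupied by the four trailing fields, which inline data is allowed to reuse.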
2014-09-17Btrfs: fix off-by-one in cow_file_range_inline()Wang Shilong
Btrfs could still inline file data if its size is the same as the page size, so don't skip the max value here. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fall into nocompression codes quickly if possibleWang Shilong
If the NOCOMPRESS flag is set, which means a bad compression ratio, we can avoid calling cow_file_range_async() for this case earlier. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix wrong skipping compression for an inodeWang Shilong
If a file's compression ratio is bad, we will set the NOCOMPRESS flag for it, and compression will be skipped for that inode next time. However, if we remount the fs with COMPRESS_FORCE, we should still try to compress pages for that inode. This patch fixes the wrong check for this problem. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix sparse warningFabian Frederick
Fix the following sparse warning:

    fs/btrfs/send.c:518:51: warning: incorrect type in argument 2 (different address spaces)
    fs/btrfs/send.c:518:51: expected char const [noderef] <asn:1>*<noident>
    fs/btrfs/send.c:518:51: got char *

We can safely use (const char __user *) with set_fs(KERNEL_DS).

__force added to avoid sparse-all warning:

    fs/btrfs/send.c:518:40: warning: cast adds address space to expression (<asn:1>)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: use BUG_ONHIMANGI SARAOGI
Use BUG_ON(x) rather than if(x) BUG();

The semantic patch that fixes this problem is as follows:

    // <smpl>
    @@
    identifier x;
    @@
    -if (x) BUG();
    +BUG_ON(x);
    // </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs compression: merge inflate and deflate z_streamsSergey Senozhatsky
`struct workspace' used for zlib compression contains two zlib z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm' used in zlib_decompress/zlib_decompress_biovec(). None of these functions use `inf_strm' and `def_strm' simultaneously, meaning that for every compress/decompress operation we need only one z_stream (out of two available). `inf_strm' and `def_strm' are different in size of ->workspace. For inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, for deflate stream - zlib_deflate_workspacesize() bytes. On my system zlib returns the following workspace sizes, correspondingly: 42312 and 268104 (+ guard pages). Keep only one `z_stream' in `struct workspace' and use it for both compression and decompression. Hence, instead of vmalloc() of two z_stream->workspace-s, allocate only one of size: max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize()) Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
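The sizing rule at the end of the message amounts to taking the larger of the two workspace sizes. A small sketch with hypothetical helper names; the sizes are passed in as plain numbers rather than obtained from zlib, and guard pages are ignored:

```c
#include <assert.h>
#include <stddef.h>

/* Size of one buffer that can serve as either workspace. */
static size_t shared_workspace_size(size_t deflate_size, size_t inflate_size)
{
    assert(deflate_size > 0 && inflate_size > 0);
    return deflate_size > inflate_size ? deflate_size : inflate_size;
}

/* Bytes saved compared with allocating both workspaces separately. */
static size_t workspace_savings(size_t deflate_size, size_t inflate_size)
{
    return deflate_size + inflate_size
           - shared_workspace_size(deflate_size, inflate_size);
}
```

With the sizes reported in the message (42312 for inflate and 268104 for deflate), the shared buffer is 268104 bytes and each workspace no longer carries the extra 42312-byte allocation.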
2014-09-17Btrfs: set error return value in btrfs_get_blocks_directFilipe Manana
We were returning with 0 (success) because we weren't extracting the error code from em (PTR_ERR(em)). Fix it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: reduce size of struct extent_stateFilipe Manana
The tree field of struct extent_state was only used to figure out if an extent state was connected to an inode's io tree or not. For this we can just use the rb_node field itself. On an x86_64 system with this change the sizeof(struct extent_state) is reduced from 96 bytes down to 88 bytes, meaning that with a page size of 4096 bytes we can now store 46 extent states per page instead of 42. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: use PTR_ERR_OR_ZEROFabian Frederick
replace IS_ERR/PTR_ERR Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Chris Mason <clm@fb.com>
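For readers unfamiliar with the error-pointer idiom being replaced, here is a userspace sketch. All functions below are re-creations written for illustration, assuming the usual convention that the top 4095 address values encode negative errno codes; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if ptr lies in the range reserved for encoded errors. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value carried by an error pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Open-coded pattern of the kind the commit replaces. */
static long open_coded(const void *ptr)
{
    if (is_err(ptr))
        return ptr_err(ptr);
    return 0;
}

/* Single-helper form with the same result. */
static long ptr_err_or_zero(const void *ptr)
{
    return is_err(ptr) ? ptr_err(ptr) : 0;
}
```

A valid pointer maps to 0 and an encoded error maps back to its negative errno, so callers can return the helper's result directly.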
2014-09-17Btrfs: print btrfs specific info for some fatal error casesWang Shilong
Marc argued that when several btrfs filesystems are mounted, users don't even know which filesystem hit a corruption error such as a generation verification failure. Since the @extent_buffer structure has a member @fs_info, let's output btrfs device info. Reported-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: fix writing data into the seed filesystemMiao Xie
If we mounted a seed filesystem with the degraded option and then added a new device to it, adding the device failed because of an IO failure.

Steps to reproduce:

    # mkfs.btrfs -d raid1 -m raid1 <dev0> <dev1>
    # btrfstune -S 1 <dev0>
    # mount <dev0> -o degraded <mnt>
    # btrfs device add -f <dev2> <mnt>

It is because the original code didn't set the chunk on the seed device to be read-only if the degraded flag was set. This was introduced by patch f48b90756, which fixed the problem that a raid1 filesystem became read-only after one of its devices went missing. But that fix was not right: we should set the read-only flag according to the number of missing devices, not the degraded mount option. If the number of missing devices is less than the max error number that the profile of the chunk tolerates, we don't set it to be read-only.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17Btrfs: make defragment work with nodatacow optionWang Shilong
Btrfs defragment utilizes the COW feature, which means it did not work with the nodatacow option. This problem was detected by xfstests generic/018 with the nodatacow mount option. Fix this problem by forcing COW for an extent with the @EXTENT_DEFRAG state set. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: label should not contain return charSatoru Takeuchi
Rediffed remaining parts of original patch from Anand Jain. This makes sure to avoid trailing newlines in the btrfs label output.

reproducer.sh:
===============================================================================
TEST_DEV=/dev/vdb
TEST_DIR=/home/sat/mnt

umount /home/sat/mnt
mkfs.btrfs -f $TEST_DEV
UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: \([-0-9a-z]*\)$/\1/')
mount $TEST_DEV $TEST_DIR
LABELFILE=/sys/fs/btrfs/$UUID/label

echo "Test for empty label..." >&2
LINES="$(cat $LABELFILE | wc -l | awk '{print $1}')"

RET=0

if [ $LINES -eq 0 ] ; then
    echo '[PASS] Trailing \n is removed correctly.' >&2
else
    echo '[FAIL] Trailing \n still exists.' >&2
    RET=1
fi

echo "Test for non-empty label..." >&2

echo testlabel >$LABELFILE
LINES="$(cat $LABELFILE | wc -l | awk '{print $1}')"

if [ $LINES -eq 1 ] ; then
    echo '[PASS] Trailing \n is removed correctly.' >&2
else
    echo '[FAIL] Trailing \n still exists.' >&2
    RET=1
fi

exit $RET
===============================================================================

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17btrfs: device delete must be sysloggedAnand Jain
As in the disk add patch, a disk detached from the volume must be recorded in the syslog as well, for the same reason. Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>