Skip to content
  1. May 09, 2013
    • Tyler Hicks's avatar
      eCryptfs: Use the ablkcipher crypto API · 4dfea4f0
      Tyler Hicks authored
      
      
      Make the switch from the blkcipher kernel crypto interface to the
      ablkcipher interface.
      
      encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
      interface but, from the eCryptfs standpoint, still treat the crypto
      operation as a synchronous operation. They submit the async request and
      then wait until the operation is finished before they return. Most of
      the changes are contained inside those two functions.
      
      Despite waiting for the completion of the crypto operation, the
      ablkcipher interface provides performance increases in most cases when
      used on AES-NI capable hardware.
      
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Acked-by: default avatarColin King <colin.king@canonical.com>
      Reviewed-by: default avatarZeev Zilberman <zeev@annapurnaLabs.com>
      Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Cc: Thieu Le <thieule@google.com>
      Cc: Li Wang <dragonylffly@163.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
      4dfea4f0
  2. Apr 26, 2013
  3. Apr 18, 2013
  4. Apr 17, 2013
  5. Apr 13, 2013
    • Suleiman Souhlal's avatar
      vfs: Revert spurious fix to spinning prevention in prune_icache_sb · 5b55d708
      Suleiman Souhlal authored
      
      
      Revert commit 62a3ddef ("vfs: fix spinning prevention in prune_icache_sb").
      
      This commit doesn't look right: since we are looking at the tail of the
      list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
      it back at the head of the list instead of the tail, otherwise we will
      keep spinning on it.
      
      Discovered when investigating why prune_icache_sb came top in perf
      reports of a swapping load.
      
      Signed-off-by: default avatarSuleiman Souhlal <suleiman@google.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b55d708
    • Josef Bacik's avatar
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik authored
      
      
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      4bc4bee4
  6. Apr 12, 2013
    • Thomas Gleixner's avatar
      kthread: Prevent unpark race which puts threads on the wrong cpu · f2530dc7
      Thomas Gleixner authored
      
      
      The smpboot threads rely on the park/unpark mechanism which binds per
      cpu threads on a particular core. Though the functionality is racy:
      
      CPU0	       	 	CPU1  	     	    CPU2
      unpark(T)				    wake_up_process(T)
        clear(SHOULD_PARK)	T runs
      			leave parkme() due to !SHOULD_PARK  
        bind_to(CPU2)		BUG_ON(wrong CPU)						    
      
      We cannot let the tasks move themself to the target CPU as one of
      those tasks is actually the migration thread itself, which requires
      that it starts running on the target cpu right away.
      
      The solution to this problem is to prevent wakeups in park mode which
      are not from unpark(). That way we can guarantee that the association
      of the task to the target cpu is working correctly.
      
      Add a new task state (TASK_PARKED) which prevents other wakeups and
      use this state explicitly for the unpark wakeup.
      
      Peter noticed: Also, since the task state is visible to userspace and
      all the parked tasks are still in the PID space, its a good hint in ps
      and friends that these tasks aren't really there for the moment.
      
      The migration thread has another related issue.
      
      CPU0	      	     	 CPU1
      Bring up CPU2
      create_thread(T)
      park(T)
       wait_for_completion()
      			 parkme()
      			 complete()
      sched_set_stop_task()
      			 schedule(TASK_PARKED)
      
      The sched_set_stop_task() call is issued while the task is on the
      runqueue of CPU1 and that confuses the hell out of the stop_task class
      on that cpu. So we need the same synchronizaion before
      sched_set_stop_task().
      
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-and-tested-by: default avatarDave Hansen <dave@sr71.net>
      Reported-and-tested-by: default avatarBorislav Petkov <bp@alien8.de>
      Acked-by: default avatarPeter Ziljstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: dhillf@gmail.com
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      f2530dc7
  7. Apr 10, 2013
  8. Apr 09, 2013
  9. Apr 05, 2013
    • Trond Myklebust's avatar
      NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list · 7b1f1fd1
      Trond Myklebust authored
      
      
      It is unsafe to use list_for_each_entry_safe() here, because
      when we drop the nn->nfs_client_lock, we pin the _current_ list
      entry and ensure that it stays in the list, but we don't do the
      same for the _next_ list entry. Use of list_for_each_entry() is
      therefore the correct thing to do.
      
      Also fix the refcounting in nfs41_walk_client_list().
      
      Finally, ensure that the nfs_client has finished being initialised
      and, in the case of NFSv4.1, that the session is set up.
      
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: stable@vger.kernel.org [>= 3.7]
      7b1f1fd1
    • Trond Myklebust's avatar
      NFSv4: Fix a memory leak in nfs4_discover_server_trunking · b193d59a
      Trond Myklebust authored
      
      
      When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
      the old one.
      
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org [>=3.7]
      b193d59a
    • Bob Peterson's avatar
      GFS2: Issue discards in 512b sectors · b2c87cae
      Bob Peterson authored
      
      
      This patch changes GFS2's discard issuing code so that it calls
      function sb_issue_discard rather than blkdev_issue_discard. The
      code was calling blkdev_issue_discard and specifying the correct
      sector offset and sector size, but blkdev_issue_discard expects
      these values to be in terms of 512 byte sectors, even if the native
      sector size for the device is different. Calling sb_issue_discard
      with the BLOCK size instead ensures the correct block-to-512b-sector
      translation. I verified that "minlen" is specified in blocks, so
      comparing it to a number of blocks is correct.
      
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      b2c87cae
  10. Apr 04, 2013
  11. Apr 03, 2013
    • Zheng Liu's avatar
      ext4: fix big-endian bugs which could cause fs corruptions · 8cde7ad1
      Zheng Liu authored
      
      
      When an extent was zeroed out, we forgot to do convert from cpu to le16.
      It could make us hit a BUG_ON when we try to write dirty pages out.  So
      fix it.
      
      [ Also fix a bug found by Dmitry Monakhov where we were missing
        le32_to_cpu() calls in the new indirect punch hole code.
      
        There are a number of other big endian warnings found by static code
        analyzers, but we'll wait for the next merge window to fix them all
        up.  These fixes are designed to be Obviously Correct by code
        inspection, and easy to demonstrate that it won't make any
        difference (and hence, won't introduce any bugs) on little endian
        architectures such as x86.  --tytso ]
      
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: default avatarCAI Qian <caiqian@redhat.com>
      Reported-by: default avatarChristian Kujau <lists@nerdbynature.de>
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      8cde7ad1
  12. Apr 01, 2013
    • Anatol Pomozov's avatar
      loop: prevent bdev freeing while device in use · c1681bf8
      Anatol Pomozov authored
      
      
      struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
      block_device allocated first time we access /dev/loopXX and deallocated on
      bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
      we want that block_device stay alive until we destroy the loop device
      with "losetup -d".
      
      But because we do not hold /dev/loopXX inode its counter goes 0, and
      inode/bdev can be destroyed at any moment. Usually it happens at memory
      pressure or when user drops inode cache (like in the test below). When later in
      loop_clr_fd() we want to use bdev we have use-after-free error with following
      stack:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
        bd_set_size+0x10/0xa0
        loop_clr_fd+0x1f8/0x420 [loop]
        lo_ioctl+0x200/0x7e0 [loop]
        lo_compat_ioctl+0x47/0xe0 [loop]
        compat_blkdev_ioctl+0x341/0x1290
        do_filp_open+0x42/0xa0
        compat_sys_ioctl+0xc1/0xf20
        do_sys_open+0x16e/0x1d0
        sysenter_dispatch+0x7/0x1a
      
      To prevent use-after-free we need to grab the device in loop_set_fd()
      and put it later in loop_clr_fd().
      
      The issue is reprodusible on current Linus head and v3.3. Here is the test:
      
        dd if=/dev/zero of=loop.file bs=1M count=1
        while [ true ]; do
          losetup /dev/loop0 loop.file
          echo 2 > /proc/sys/vm/drop_caches
          losetup -d /dev/loop0
        done
      
      [ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
        time we call loop_set_fd() we check that loop_device->lo_state is
        Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
        it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
        device we'll get ENXIO.
      
        loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
        loop_device->lo_ctl_mutex. ]
      
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c1681bf8
  13. Mar 29, 2013
  14. Mar 28, 2013
  15. Mar 27, 2013
    • Al Viro's avatar
      vfs/splice: Fix missed checks in new __kernel_write() helper · 3e84f48e
      Al Viro authored
      
      
      Commit 06ae43f3 ("Don't bother with redoing rw_verify_area() from
      default_file_splice_from()") lost the checks to test existence of the
      write/aio_write methods.  My apologies ;-/
      
      Eventually, we want that in fs/splice.c side of things (no point
      repeating it for every buffer, after all), but for now this is the
      obvious minimal fix.
      
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3e84f48e
    • Eric W. Biederman's avatar
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman authored
      
      
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
    • Eric W. Biederman's avatar
      vfs: Carefully propogate mounts across user namespaces · 132c94e3
      Eric W. Biederman authored
      
      
      As a matter of policy MNT_READONLY should not be changable if the
      original mounter had more privileges than creator of the mount
      namespace.
      
      Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
      a mount namespace that requires more privileges to a mount namespace
      that requires fewer privileges.
      
      When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
      if any of the mnt flags that should never be changed are set.
      
      This protects both mount propagation and the initial creation of a less
      privileged mount namespace.
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      132c94e3
    • Eric W. Biederman's avatar
      vfs: Add a mount flag to lock read only bind mounts · 90563b19
      Eric W. Biederman authored
      
      
      When a read-only bind mount is copied from mount namespace in a higher
      privileged user namespace to a mount namespace in a lesser privileged
      user namespace, it should not be possible to remove the the read-only
      restriction.
      
      Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
      remain read-only.
      
      CC: stable@vger.kernel.org
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      90563b19
    • Eric W. Biederman's avatar
      userns: Don't allow creation if the user is chrooted · 3151527e
      Eric W. Biederman authored
      
      
      Guarantee that the policy of which files may be access that is
      established by setting the root directory will not be violated
      by user namespaces by verifying that the root directory points
      to the root of the mount namespace at the time of user namespace
      creation.
      
      Changing the root is a privileged operation, and as a matter of policy
      it serves to limit unprivileged processes to files below the current
      root directory.
      
      For reasons of simplicity and comprehensibility the privilege to
      change the root directory is gated solely on the CAP_SYS_CHROOT
      capability in the user namespace.  Therefore when creating a user
      namespace we must ensure that the policy of which files may be access
      can not be violated by changing the root directory.
      
      Anyone who runs a processes in a chroot and would like to use user
      namespace can setup the same view of filesystems with a mount
      namespace instead.  With this result that this is not a practical
      limitation for using user namespaces.
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      3151527e
  16. Mar 26, 2013
    • Al Viro's avatar
      Nest rename_lock inside vfsmount_lock · 7ea600b5
      Al Viro authored
      
      
      ... lest we get livelocks between path_is_under() and d_path() and friends.
      
      The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
      it is possible to have thread B spin on attempt to take lock shared while thread
      A is already holding it shared, if B is on lower-numbered CPU than A and there's
      a thread C spinning on attempt to take the same lock exclusive.
      
      As the result, we need consistent ordering between vfsmount_lock (lglock) and
      rename_lock (seq_lock), even though everything that takes both is going to take
      vfsmount_lock only shared.
      
      Spotted-by: default avatarBrad Spengler <spender@grsecurity.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      7ea600b5
Loading