  1. Aug 26, 2021
    • netfilter: nft_exthdr: fix endianness of tcp option cast · 4bf19415
      Sergey Marinkevich authored
      
      
      [ Upstream commit 2e34328b ]
      
      I hit a problem on a big-endian MIPS system: every time netfilter
      tried to change the TCP MSS, it returned early because new.v16 was
      greater than old.v16. But the real MSS was 1460 and my rule was:
      
      	add rule table chain tcp option maxseg size set 1400
      
      And 1400 is less than 1460, not greater.
      
      I later found that the root cause is the cast from u32 to __be16.
      
      Debugging:
      
      In this example MSS = 1400 (hex: 0x578). Each byte is shown as it
      lies in memory, with addresses increasing from left to right (e.g.
      [0x0 0x1 0x2 0x3]). LE is a little-endian system, BE is big-endian,
      and the left column is the type.
      
      	     LE               BE
      	u32: [78 05 00 00]    [00 00 05 78]
      
      As you can see, a cast to u16 reads a different half of the 4-byte
      range depending on endianness. But nf_tables uses registers to store
      data of various sizes, and the TCP MSS occupies only 2 bytes. The
      registers are still defined as u32:
      
      	struct nft_regs {
      		union {
      			u32			data[20];
      			struct nft_verdict	verdict;
      		};
      	};
      
      So an access like regs->data[priv->sreg] is always a u32. Given the
      table above, the per-byte representation of the TCP MSS stored in a
      register is:
      
      	                     LE               BE
      	(u32)regs->data[]:   [78 05 00 00]    [05 78 00 00]
      	                                       ^^ ^^
      
      We see that the register uses just half of the u32, and the other 2
      bytes may hold some other data. But nft_exthdr_tcp_set_eval() casts it
      directly from u32 to __be16:
      
      	new.v16 = src
      
      A u32 does not fit in a __be16, so the cast keeps the 2 low-order
      bytes. For clarity, here is one more table (<xx xx> marks the bytes
      the cast uses).
      
      	                     LE                 BE
      	u32:                 [<78 05> 00 00]    [00 00 <05 78>]
      	(u32)regs->data[]:   [<78 05> 00 00]    [05 78 <00 00>]
      
      As you can see, nothing changes for little-endian, but on big-endian
      we take the wrong half. In my case it held other data instead of
      zeros, so the new MSS wrongly compared as greater.
      
      To fix this bug I reused the solution already applied to port
      ranges. This patch does not affect little-endian systems.
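
      The byte tables above can be reproduced in user space. The sketch
      below is a minimal illustration, not the kernel's code: reg32,
      reg_store16(), read_by_cast() and read_by_address() are hypothetical
      names, and the junk argument stands in for the "other data" that may
      share the register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 32-bit nf_tables register. */
typedef uint32_t reg32;

/* Place a 16-bit value in the first two bytes of the register and
 * unrelated data in the last two, as in the tables above. */
static void reg_store16(reg32 *reg, uint16_t val, uint16_t junk)
{
	unsigned char bytes[4];

	assert(reg != NULL);
	memcpy(bytes, &val, 2);
	memcpy(bytes + 2, &junk, 2);
	memcpy(reg, bytes, 4);
}

/* Integer conversion: keeps the numerically low 16 bits of the u32. */
static uint16_t read_by_cast(const reg32 *reg)
{
	return (uint16_t)*reg;
}

/* Address-based read: takes the first two bytes in memory. */
static uint16_t read_by_address(const reg32 *reg)
{
	uint16_t v;

	memcpy(&v, reg, 2);
	return v;
}

/* Returns 1 when the lowest-addressed byte holds the low-order bits. */
static int host_is_little_endian(void)
{
	uint16_t one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}
```

      On a little-endian host both reads return the stored value. On a
      big-endian host only read_by_address() does, while read_by_cast()
      returns the unrelated trailing bytes.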
      
      Signed-off-by: Sergey Marinkevich <sergey.marinkevich@eltex-co.ru>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fs: warn about impending deprecation of mandatory locks · e4fd994f
      Jeff Layton authored
      
      
      [ Upstream commit fdd92b64 ]
      
      We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
      have disabled it. Warn the stragglers that still use "-o mand" that
      we'll be dropping support for that mount option.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim · 41c7f46c
      Johannes Weiner authored
      [ Upstream commit f56ce412 ]
      
      We've noticed occasional OOM killing when memory.low settings are in
      effect for cgroups.  This is unexpected and undesirable as memory.low is
      supposed to express non-OOMing memory priorities between cgroups.
      
      The reason for this is proportional memory.low reclaim.  When cgroups
      are below their memory.low threshold, reclaim passes them over in the
      first round, and then retries if it couldn't find pages anywhere else.
      But when cgroups are slightly above their memory.low setting, page scan
      force is scaled down and diminished in proportion to the overage, to the
      point where it can cause reclaim to fail as well - only in that case we
      currently don't retry, and instead trigger OOM.
      
      To fix this, hook proportional reclaim into the same retry logic we have
      in place for when cgroups are skipped entirely.  This way if reclaim
      fails and some cgroups were scanned with diminished pressure, we'll try
      another full-force cycle before giving up and OOMing.
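
      The retry described above can be modelled with a toy loop. This is a
      heavily simplified sketch under stated assumptions, not the kernel's
      reclaim code: toy_cgroup, toy_scan_budget() and toy_reclaim() are
      hypothetical names, the budget formula is a stand-in for the real
      scan scaling, and the model treats "skipped" and "scanned with
      diminished pressure" as the same case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cgroup: page counts only; not a kernel structure. */
struct toy_cgroup {
	unsigned long usage;
	unsigned long low;
};

/* Pages one pass may take from a cgroup. Sets *reduced whenever the
 * memory.low setting held the pass back. */
static unsigned long toy_scan_budget(const struct toy_cgroup *cg,
				     bool full_force, bool *reduced)
{
	unsigned long prot;

	if (full_force || cg->low == 0)
		return cg->usage;
	prot = cg->low < cg->usage ? cg->low : cg->usage;
	*reduced = true;
	return cg->usage - prot;
}

/* Try to free `need` pages. With retry_on_reduced set, a failed pass
 * that ran with diminished pressure is followed by one full-force pass. */
static bool toy_reclaim(struct toy_cgroup *cgs, size_t n,
			unsigned long need, bool retry_on_reduced)
{
	bool full_force = false;

	assert(cgs != NULL || n == 0);
	for (;;) {
		bool reduced = false;
		size_t i;

		for (i = 0; i < n && need > 0; i++) {
			unsigned long take =
				toy_scan_budget(&cgs[i], full_force, &reduced);

			if (take > need)
				take = need;
			cgs[i].usage -= take;
			need -= take;
		}
		if (need == 0)
			return true;
		if (!retry_on_reduced || full_force || !reduced)
			return false;	/* the caller would OOM here */
		full_force = true;
	}
}
```

      With a cgroup at usage 1000 and low 990, a request for 50 pages
      fails without the retry and succeeds with it.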
      
      [akpm@linux-foundation.org: coding-style fixes]
      
      Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
      
      
      Fixes: 9783aa99 ("mm, memcg: proportional memory.{low,min} reclaim")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Leon Yang <lnyng@fb.com>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>		[5.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm, memcg: avoid stale protection values when cgroup is above protection · 1a3aa814
      Yafang Shao authored
      
      
      [ Upstream commit 22f7496f ]
      
      Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4.
      
      This series contains a fix for an edge case in my earlier protection
      calculation patches, and a patch to make the area overall a little more
      robust to hopefully help avoid this in the future.
      
      This patch (of 2):
      
      A cgroup can have both memory protection and a memory limit to isolate it
      from its siblings in both directions - for example, to prevent it from
      being shrunk below 2G under high pressure from outside, but also from
      growing beyond 4G under low pressure.
      
      Commit 9783aa99 ("mm, memcg: proportional memory.{low,min} reclaim")
      implemented proportional scan pressure so that multiple siblings in excess
      of their protection settings don't get reclaimed equally but instead in
      accordance to their unprotected portion.
      
      During limit reclaim, this proportionality shouldn't apply of course:
      there is no competition, all pressure is from within the cgroup and should
      be applied as such.  Reclaim should operate at full efficiency.
      
      However, mem_cgroup_protected() never expected anybody to look at the
      effective protection values when it indicated that the cgroup is above its
      protection.  As a result, a query during limit reclaim may return stale
      protection values that were calculated by a previous reclaim cycle in
      which the cgroup did have siblings.
      
      When this happens, reclaim is unnecessarily hesitant and potentially slow
      to meet the desired limit.  In theory this could lead to premature OOM
      kills, although it's not obvious this has occurred in practice.
      
      Work around the problem by special-casing reclaim roots in
      mem_cgroup_protection.  These memcgs never participate in the
      reclaim protection because the reclaim is internal.
      
      We have to ignore effective protection values for reclaim roots because
      mem_cgroup_protected might be called from racing reclaim contexts with
      different roots.  Calculation is relying on root -> leaf tree traversal
      therefore top-down reclaim protection invariants should hold.  The only
      exception is the reclaim root which should have effective protection set
      to 0 but that would be problematic for the following setup:
      
       Let's have global and A's reclaim in parallel:
        |
        A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G)
        |\
        | C (low = 1G, usage = 2.5G)
        B (low = 1G, usage = 0.5G)
      
       for A reclaim we have
       B.elow = B.low
       C.elow = C.low
      
       For the global reclaim
       A.elow = A.low
       B.elow = min(B.usage, B.low) because children_low_usage <= A.elow
       C.elow = min(C.usage, C.low)
      
       With the effective values resetting we have A reclaim
       A.elow = 0
       B.elow = B.low
       C.elow = C.low
      
       and global reclaim could see the above and then
       B.elow = C.elow = 0 because children_low_usage > A.elow
      
      Which means that protected memcgs would get reclaimed.
      
      In the future we would like to make mem_cgroup_protected more robust
      against racing reclaim contexts, but that is likely a more complex
      solution than this simple workaround.
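
      The special case for reclaim roots can be pictured as follows. This
      is a minimal user-space sketch, assuming a lookup that receives both
      the reclaim root and the cgroup being scanned. toy_memcg and
      toy_protection() are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cgroup carrying only the effective protection values. */
struct toy_memcg {
	unsigned long emin;
	unsigned long elow;
};

/* Protection to honour while scanning `memcg` on behalf of `root`. */
static unsigned long toy_protection(const struct toy_memcg *root,
				    const struct toy_memcg *memcg,
				    bool in_low_reclaim)
{
	assert(memcg != NULL);
	/* The reclaim root is reclaimed from within, so any stored
	 * effective values are ignored rather than reset. */
	if (memcg == root)
		return 0;
	if (in_low_reclaim)
		return memcg->emin;
	return memcg->emin > memcg->elow ? memcg->emin : memcg->elow;
}
```

      Because the stored values are only ignored and never overwritten, a
      concurrent reclaim with a different root still sees them unchanged.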
      
      [hannes@cmpxchg.org - large part of the changelog]
      [mhocko@suse.com - workaround explanation]
      [chris@chrisdown.name - retitle]
      
      Fixes: 9783aa99 ("mm, memcg: proportional memory.{low,min} reclaim")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: http://lkml.kernel.org/r/cover.1594638158.git.chris@chrisdown.name
      Link: http://lkml.kernel.org/r/044fb8ecffd001c7905d27c0c2ad998069fdc396.1594638158.git.chris@chrisdown.name
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: intel: atom: Fix breakage for PCM buffer address setup · 9c1c449d
      Takashi Iwai authored
      
      
      [ Upstream commit 65ca89c2 ]
      
      The commit 2e6b8363 ("ASoC: intel: atom: Fix reference to PCM
      buffer address") changed the reference of PCM buffer address to
      substream->runtime->dma_addr as the buffer address may change
      dynamically.  However, I forgot that the dma_addr field is still not
      set up for the CONTINUOUS buffer type (that this driver uses) yet in
      5.14 and earlier kernels, and it resulted in garbage I/O.  The problem
      will be fixed in 5.15, but we need to address it quickly for now.
      
      The fix is to deduce the address again from the DMA pointer with
      virt_to_phys(), but from the right one, substream->runtime->dma_area.
      
      Fixes: 2e6b8363 ("ASoC: intel: atom: Fix reference to PCM buffer address")
      Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
      Cc: <stable@vger.kernel.org>
      Acked-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/2048c6aa-2187-46bd-6772-36a4fb3c5aeb@redhat.com
      Link: https://lore.kernel.org/r/20210819152945.8510-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI · 846ba58a
      Marcin Bachry authored
      [ Upstream commit e0bff432 ]
      
      The Renoir XHCI controller apparently doesn't resume reliably with the
      standard D3hot-to-D0 delay.  Increase it to 20ms.
      
      [Alex: I talked to the AMD USB hardware team and the AMD Windows team and
      they are not aware of any HW errata or specific issues.  The HW works fine
      in Windows.  I was told Windows uses a rather generous default delay of
      100ms for PCI state transitions.]
      
      Link: https://lore.kernel.org/r/20210722025858.220064-1-alexander.deucher@amd.com
      
      
      Signed-off-by: Marcin Bachry <hegel666@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Cc: Mario Limonciello <mario.limonciello@amd.com>
      Cc: Prike Liang <prike.liang@amd.com>
      Cc: Shyam Sundar S K <shyam-sundar.s-k@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: prevent rename2 from exchanging a subvol with a directory from different parents · 548b75f4
      NeilBrown authored
      
      
      [ Upstream commit 3f79f6f6 ]
      
      Cross-rename lacks a check that would prevent exchanging a
      directory and a subvolume from different parent subvolumes. This causes
      data inconsistencies and is caught before commit by tree-checker,
      turning the filesystem read-only.
      
      Calling the renameat2 with RENAME_EXCHANGE flags like
      
        renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))
      
      on two paths:
      
        namesrc = dir1/subvol1/dir2
       namedest = subvol2/subvol3
      
      will cause a key order problem with the following write-time
      tree-checker report:
      
        [1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
        [1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
        [1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
        [1194842.338772]        item 0 key (256 1 0) itemoff 16123 itemsize 160
        [1194842.338793]                inode generation 3 size 16 mode 40755
        [1194842.338801]        item 1 key (256 12 256) itemoff 16111 itemsize 12
        [1194842.338809]        item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
        [1194842.338817]                dir oid 258 type 2
        [1194842.338823]        item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
        [1194842.338830]                dir oid 257 type 2
        [1194842.338836]        item 4 key (256 96 2) itemoff 16009 itemsize 34
        [1194842.338843]        item 5 key (256 96 3) itemoff 15975 itemsize 34
        [1194842.338852]        item 6 key (257 1 0) itemoff 15815 itemsize 160
        [1194842.338863]                inode generation 6 size 8 mode 40755
        [1194842.338869]        item 7 key (257 12 256) itemoff 15801 itemsize 14
        [1194842.338876]        item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
        [1194842.338883]                dir oid 256 type 2
        [1194842.338888]        item 9 key (257 96 2) itemoff 15733 itemsize 34
        [1194842.338895]        item 10 key (258 12 256) itemoff 15719 itemsize 14
        [1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
        [1194842.339245] ------------[ cut here ]------------
        [1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
        [1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
        [1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
        [1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
        [1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
        [1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
        [1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
        [1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
        [1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
        [1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
        [1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
        [1194842.512092] FS:  0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
        [1194842.512095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
        [1194842.512103] Call Trace:
        [1194842.512128]  ? run_one_async_free+0x10/0x10 [btrfs]
        [1194842.631729]  btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
        [1194842.631837]  run_one_async_start+0x18/0x30 [btrfs]
        [1194842.631938]  btrfs_work_helper+0xd5/0x1d0 [btrfs]
        [1194842.647482]  process_one_work+0x262/0x5e0
        [1194842.647520]  worker_thread+0x4c/0x320
        [1194842.655935]  ? process_one_work+0x5e0/0x5e0
        [1194842.655946]  kthread+0x135/0x160
        [1194842.655953]  ? set_kthread_struct+0x40/0x40
        [1194842.655965]  ret_from_fork+0x1f/0x30
        [1194842.672465] irq event stamp: 1729
        [1194842.672469] hardirqs last  enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
        [1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
        [1194842.672482] softirqs last  enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
        [1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0
      
      The corrupted data will not be written, and filesystem can be unmounted
      and mounted again (all changes since the last commit will be lost).
      
      Add the missing check for new_ino so that all non-subvolumes must reside
      under the same parent subvolume. There's an exception allowing to
      exchange two subvolumes from any parents as the directory representing a
      subvolume is only a logical link and does not have any other structures
      related to the parent subvolume, unlike files, directories etc, that
      are always in the inode namespace of the parent subvolume.
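
      For reference, the (1 << 1) flag in the call above is
      RENAME_EXCHANGE. The sketch below shows one way to issue the same
      request from user space through the raw syscall. exchange_paths() is
      a hypothetical helper name, and the fallback define is only there for
      headers that do not provide the constant.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>		/* AT_FDCWD */
#include <stddef.h>
#include <sys/syscall.h>	/* SYS_renameat2 */
#include <unistd.h>		/* syscall() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Atomically swap two existing paths, resolved relative to the current
 * working directory. Returns 0 on success, -1 with errno set on failure. */
static int exchange_paths(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return (int)syscall(SYS_renameat2, AT_FDCWD, a, AT_FDCWD, b,
			    RENAME_EXCHANGE);
}
```

      Both paths must already exist for an exchange. With this patch
      applied, btrfs is expected to reject the directory/subvolume
      combination described above instead of corrupting the leaf.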
      
      Fixes: cdd1fedf ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
      CC: stable@vger.kernel.org # 4.7+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipack: tpci200: fix memory leak in the tpci200_register · 0fc6a9c2
      Dongliang Mu authored
      
      
      [ Upstream commit 50f05bd1 ]
      
      The error handling code in tpci200_register does not free interface_regs
      allocated by ioremap and the current version of error handling code is
      problematic.
      
      Fix this by refactoring the error handling code and free interface_regs
      when necessary.
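
      The usual shape of such a refactoring is a chain of unwind labels
      that release resources in reverse order of acquisition. The sketch
      below is a generic user-space illustration, not the driver's code:
      toy_map(), toy_unmap(), toy_dev, the cfg_regs field and the fail_step
      parameter are all hypothetical, with malloc() standing in for
      ioremap.

```c
#include <assert.h>
#include <stdlib.h>

static int live_mappings;	/* counts outstanding toy mappings */

static void *toy_map(void)
{
	void *p = malloc(16);

	if (p)
		live_mappings++;
	return p;
}

static void toy_unmap(void *p)
{
	if (p) {
		free(p);
		live_mappings--;
	}
}

struct toy_dev {
	void *interface_regs;
	void *cfg_regs;
};

/* fail_step injects a failure after the first (1) or second (2) mapping. */
static int toy_register(struct toy_dev *d, int fail_step)
{
	assert(d != NULL);
	d->interface_regs = toy_map();
	if (!d->interface_regs)
		goto err_out;
	if (fail_step == 1)
		goto err_unmap_interface;
	d->cfg_regs = toy_map();
	if (!d->cfg_regs)
		goto err_unmap_interface;
	if (fail_step == 2)
		goto err_unmap_cfg;
	return 0;

err_unmap_cfg:
	toy_unmap(d->cfg_regs);
	d->cfg_regs = NULL;
err_unmap_interface:
	toy_unmap(d->interface_regs);
	d->interface_regs = NULL;
err_out:
	return -1;
}

/* Undoes exactly what toy_register() set up, nothing more. */
static void toy_unregister(struct toy_dev *d)
{
	toy_unmap(d->cfg_regs);
	d->cfg_regs = NULL;
	toy_unmap(d->interface_regs);
	d->interface_regs = NULL;
}
```

      Each failure path leaves no mapping behind, and clearing the
      pointers keeps a later unregister call from freeing them twice.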
      
      Fixes: 43986798 ("ipack: add error handling for ioremap_nocache")
      Cc: stable@vger.kernel.org
      Reported-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Link: https://lore.kernel.org/r/20210810100323.3938492-2-mudongliangabcd@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipack: tpci200: fix many double free issues in tpci200_pci_probe · 280d66b3
      Dongliang Mu authored
      
      
      [ Upstream commit 57a16810 ]
      
      The function tpci200_register, called by tpci200_install, and
      tpci200_unregister, called by tpci200_uninstall, form a pair. However,
      tpci200_unregister performs some cleanup operations that have no
      counterpart in tpci200_register. As a result, the error handling code
      of tpci200_pci_probe has several different double free issues.
      
      Fix this problem by moving those cleanup operations out of
      tpci200_unregister, into tpci200_pci_remove and reverting
      the previous commit 9272e5d0 ("ipack/carriers/tpci200:
      Fix a double free in tpci200_pci_probe").
      
      Fixes: 9272e5d0 ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe")
      Cc: stable@vger.kernel.org
      Reported-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Link: https://lore.kernel.org/r/20210810100323.3938492-1-mudongliangabcd@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • slimbus: ngd: reset dma setup during runtime pm · cb7aa510
      Srinivas Kandagatla authored
      
      
      [ Upstream commit d7777253 ]
      
      During suspend/resume, the NGD remote instance is power cycled along
      with the remotely controlled BAM DMA engine.
      So reset the DMA configuration in the suspend/resume path
      so that we are not dealing with any stale DMA setup.
      
      Without this, transactions time out after the first suspend/resume cycle.
      
      Fixes: 917809e2 ("slimbus: ngd: Add qcom SLIMBus NGD driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20210809082428.11236-5-srinivas.kandagatla@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • slimbus: messaging: check for valid transaction id · abce32d0
      Srinivas Kandagatla authored
      
      
      [ Upstream commit a263c1ff ]
      
      In some use cases transaction ids are dynamically allocated inside
      the controller driver after sending the messages which have generic
      acknowledge responses. So check for this before refcounting pm_runtime.
      
      Without this we would end up unbalancing the runtime PM count by
      calling pm_runtime_put() in both slim_do_transfer() and slim_msg_response()
      for a single pm_runtime_get() in slim_do_transfer().
      
      Fixes: d3062a21 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20210809082428.11236-3-srinivas.kandagatla@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • slimbus: messaging: start transaction ids from 1 instead of zero · 0786d315
      Srinivas Kandagatla authored
      
      
      [ Upstream commit 9659281c ]
      
      As tid is unsigned, it's hard to tell whether a tid is valid or
      invalid. So start the transaction ids from 1 instead of zero
      so that we can differentiate between valid and invalid tids.
      
      This is useful in cases where the controller adds a tid for
      controller-specific transfers.
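
      The idea of reserving zero as "no tid" can be shown with a small
      table allocator. This is an illustrative sketch, not the slimbus
      implementation: toy_tid_table, toy_alloc_tid(), toy_tid_valid() and
      TOY_MAX_TIDS are hypothetical names, and the linear search stands in
      for whatever id allocator the driver uses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TIDS 256	/* assumed table size for this sketch */

struct toy_tid_table {
	const void *txn[TOY_MAX_TIDS];	/* slot 0 is never handed out */
};

/* Returns a tid in [1, TOY_MAX_TIDS) or 0 when the table is full. */
static unsigned int toy_alloc_tid(struct toy_tid_table *t, const void *txn)
{
	unsigned int tid;

	assert(t != NULL && txn != NULL);
	for (tid = 1; tid < TOY_MAX_TIDS; tid++) {
		if (t->txn[tid] == NULL) {
			t->txn[tid] = txn;
			return tid;
		}
	}
	return 0;
}

/* Because allocation starts at 1, zero can mean "no tid assigned". */
static int toy_tid_valid(unsigned int tid)
{
	return tid != 0 && tid < TOY_MAX_TIDS;
}
```

      A caller can then test a stored tid with toy_tid_valid() instead of
      needing a separate "has tid" flag next to an unsigned field.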
      
      Fixes: d3062a21 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20210809082428.11236-2-srinivas.kandagatla@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name · 20c2f141
      Steven Rostedt (VMware) authored
      [ Upstream commit 5acce0bf ]
      
      The following commands:
      
       # echo 'read_max u64 size;' > synthetic_events
       # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger
      
      Causes:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:strcmp+0xc/0x20
       Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
       RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
       R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
       FS:  00007f1fa6785740(0000) GS:ffff9714da400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
       Call Trace:
        __find_event_file+0x4e/0x80
        action_create+0x6b7/0xeb0
        ? kstrdup+0x44/0x60
        event_hist_trigger_func+0x1a07/0x2130
        trigger_process_regex+0xbd/0x110
        event_trigger_write+0x71/0xd0
        vfs_write+0xe9/0x310
        ksys_write+0x68/0xe0
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f1fa6879e87
      
      The problem was the "trace(read_max,count)" where the "count" should be
      "$count" as "onmax()" only handles variables (although it really should be
      able to figure out that "count" is a field of sys_enter_read). But there's
      a path that does not find the variable and ends up passing a NULL for the
      event, which ends up getting passed to "strcmp()".
      
      Add a check for NULL to return an error on the command with:
      
       # cat error_log
        hist:syscalls:sys_enter_read: error: Couldn't create or find variable
        Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
                                      ^
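
      The class of bug is easy to reproduce outside the kernel, since
      passing NULL to strcmp() is undefined behaviour in C. The sketch
      below shows the guard in a generic lookup. toy_event and
      toy_find_event() are hypothetical names and this is not the tracing
      code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_event {
	const char *name;
};

/* Look an event up by name. A NULL name is reported as "not found"
 * instead of being handed to strcmp(). */
static const struct toy_event *toy_find_event(const struct toy_event *events,
					      size_t n, const char *name)
{
	size_t i;

	assert(events != NULL || n == 0);
	if (!name)
		return NULL;
	for (i = 0; i < n; i++) {
		if (events[i].name && strcmp(events[i].name, name) == 0)
			return &events[i];
	}
	return NULL;
}
```

      The caller can turn the NULL result into a user-visible error, as
      the error_log output above does.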
      Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home
      
      
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 50450603 tracing: Add 'onmax' hist trigger action support
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: hda - fix the 'Capture Switch' value change notifications · 8fbfebe1
      Jaroslav Kysela authored
      [ Upstream commit a2befe93 ]
      
      The original code in the cap_put_caller() function does not
      correctly handle positive values returned from the passed
      function across multiple iterations. As a result, change
      notifications may be lost.
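
      One way a positive "value changed" result can get lost in a loop is
      when only the last iteration's return value survives. The sketch
      below contrasts that with accumulating the result. It is an assumed
      illustration of the pattern, not the actual cap_put_caller() code,
      and toy_put_fn, toy_put_last(), toy_put_any() and
      toy_only_first_changes() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Callback convention: <0 error, 0 unchanged, >0 value changed. */
typedef int (*toy_put_fn)(int idx);

/* Lossy variant: a change reported early is overwritten by a later 0. */
static int toy_put_last(toy_put_fn fn, int n)
{
	int err = 0;
	int i;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		err = fn(i);
		if (err < 0)
			break;
	}
	return err;
}

/* Accumulating variant: any positive result is remembered. */
static int toy_put_any(toy_put_fn fn, int n)
{
	int changed = 0;
	int i;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		int err = fn(i);

		if (err < 0)
			return err;
		if (err > 0)
			changed = 1;
	}
	return changed;
}

/* Example callback: only the first item reports a change. */
static int toy_only_first_changes(int idx)
{
	return idx == 0;
}
```

      With three items where only the first changes, the lossy variant
      reports "unchanged" while the accumulating one reports a change.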
      
      Fixes: 352f7f91 ("ALSA: hda - Merge Realtek parser code to generic parser")
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213851
      
      
      Cc: <stable@kernel.org>
      Signed-off-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20210811161441.1325250-1-perex@perex.cz
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mmc: dw_mmc: Fix hang on data CRC error · 85e60614
      Vincent Whitchurch authored
      
      
      [ Upstream commit 25f8203b ]
      
      When a Data CRC interrupt is received, the driver disables the DMA, then
      sends the stop/abort command and then waits for Data Transfer Over.
      
      However, sometimes, when a data CRC error is received in the middle of a
      multi-block write transfer, the Data Transfer Over interrupt is never
      received, and the driver hangs and never completes the request.
      
      The driver sets the BMOD.SWR bit (SDMMC_IDMAC_SWRESET) when stopping the
      DMA, but according to the manual CMD.STOP_ABORT_CMD should be programmed
      "before assertion of SWR".  Do these operations in the recommended
      order.  With this change the Data Transfer Over is always received
      correctly in my tests.
      
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210630102232.16011-1-vincent.whitchurch@axis.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ovl: add splice file read write helper · 4f6c9caf
      Murphy Zhou authored
      
      
      [ Upstream commit 1a980b8c ]
      
      Overlayfs currently falls back to the default file splice read
      and write, which is not compatible with overlayfs and returns
      EFAULT. xfstests generic/591 can reproduce part of this.
      
      Tested this patch with xfstests auto group tests.
      
      Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iavf: Fix ping is lost after untrusted VF had tried to change MAC · 85813f1f
      Sylwester Dziedziuch authored
      
      
      [ Upstream commit 8da80c9d ]
      
      Make changes to the MAC address dependent on the response of the PF.
      Disallow changes to the HW MAC address and MAC filter from an untrusted
      VF, so that ping is not lost if the VF tries to change its MAC.
      Add a new field in iavf_mac_filter to indicate whether the PF has
      responded for a given filter. Based on this field, pass or discard
      the filter.
      Previously, if an untrusted VF tried to change its address, the address
      was not changed but the filter still was, so ping could not go through.
      
      Fixes: c5c922b3 ("iavf: fix MAC address setting for VFs when filter is rejected")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i40e: Fix ATR queue selection · a498115d
      Arkadiusz Kubalewski authored
      
      
      [ Upstream commit a222be59 ]
      
      Without this patch, ATR does not work. Receive/transmit uses queue
      selection based on SW DCB hashing method.
      
      If traffic classes are not configured for PF, then use
      netdev_pick_tx function for selecting queue for packet transmission.
      Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
      which ensures that packet is transmitted/received from CPU that is
      running the application.
      
      Reproduction steps:
      1. Load i40e driver
      2. Map each MSI interrupt of i40e port for each CPU
      3. Disable ntuple, enable ATR, e.g.:
         ethtool -K $interface ntuple off
         ethtool --set-priv-flags $interface flow-director-atr
      4. Run an application that generates traffic and is bound to a
         single CPU, e.g.:
         taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
      5. Observe behavior:
         Application's traffic should be restricted to the CPU provided in
         taskset.
      
      Fixes: 89ec1f08 ("i40e: Fix queue-to-TC mapping on Tx")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Dave Switzer <david.switzer@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a498115d
    • ovs: clear skb->tstamp in forwarding path · 1b8a8fba
      kaixi.fan authored
      
      
      [ Upstream commit 01634047 ]
      
      The fq qdisc requires tstamp to be cleared in the forwarding path, but
      ovs currently does not clear skb->tstamp. We encountered a problem with
      Linux version 5.4.56 and ovs version 2.14.1: packets failed to
      dequeue from the qdisc when an fq qdisc was attached to an ovs port.
      
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
      Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1b8a8fba
    • net: mdio-mux: Handle -EPROBE_DEFER correctly · 84dbbf54
      Saravana Kannan authored
      [ Upstream commit 7bd0cef5 ]
      
      When registering mdiobus children, if we get an -EPROBE_DEFER, we shouldn't
      ignore it and continue registering the rest of the mdiobus children. This
      would permanently prevent the deferring child mdiobus from working instead
      of reattempting it in the future. So, if a child mdiobus needs to be
      reattempted in the future, defer the entire mdio-mux initialization.
      
      This fixes the issue where PHYs sitting under the mdio-mux aren't
      initialized correctly if the PHY's interrupt controller is not yet ready
      when the mdio-mux is being probed. Additional context in the link below.
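
      As a rough illustration only (a Python model, not the driver code;
      register_children() and its register callback are hypothetical names
      introduced here), the intended error handling can be sketched as:

```python
EPROBE_DEFER = 517  # errno value Linux uses for deferred probing


def register_children(children, register):
    """Try to register every child bus.

    `register` is a hypothetical callback returning 0 on success or a
    negative errno. Returns (status, registered_children).
    """
    registered = []
    for child in children:
        err = register(child)
        if err == -EPROBE_DEFER:
            # Do not skip the child: give up on the whole mux so that
            # initialization can be retried later as a unit.
            registered.clear()
            return -EPROBE_DEFER, registered
        if err:
            # Any other error: skip this child and keep going.
            continue
        registered.append(child)
    return 0, registered
```

      With this shape, a deferring child makes the caller see -EPROBE_DEFER
      instead of a mux that silently lacks one of its buses.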
      
      Fixes: 0ca2997d ("netdev/of/phy: Add MDIO bus multiplexer support.")
      Link: https://lore.kernel.org/lkml/CAGETcx95kHrv8wA-O+-JtfH7H9biJEGJtijuPVN0V5dUKUAB3A@mail.gmail.com/#t
      
      
      Signed-off-by: Saravana Kannan <saravanak@google.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Kevin Hilman <khilman@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      84dbbf54
    • net: mdio-mux: Don't ignore memory allocation errors · 453486e7
      Saravana Kannan authored
      
      
      [ Upstream commit 99d81e94 ]
      
      If we are seeing memory allocation errors, don't try to continue
      registering child mdiobus devices. It's unlikely they'll succeed.
      
      Fixes: 342fa196 ("mdio: mux: make child bus walking more permissive and errors more verbose")
      Signed-off-by: Saravana Kannan <saravanak@google.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Kevin Hilman <khilman@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      453486e7
    • net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32 · 6b70c678
      Dinghao Liu authored
      
      
      [ Upstream commit 0a298d13 ]
      
      qlcnic_83xx_unlock_flash() is called on all paths after we call
      qlcnic_83xx_lock_flash(), except for one error path on failure
      of QLCRD32(), which may cause a deadlock. This bug was flagged
      by a static analysis tool; please advise.
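
      The pattern can be sketched as follows. This is a Python model under
      stated assumptions, not the driver code: flash_lock, flash_read32()
      and the read_register callback are hypothetical stand-ins for the
      flash lock, the read helper and the register read.

```python
import threading

flash_lock = threading.Lock()  # hypothetical stand-in for the flash lock


def flash_read32(read_register):
    """Return the value read, or None on failure.

    The lock is released on every path, including the error path.
    """
    if not flash_lock.acquire(blocking=False):
        return None
    try:
        return read_register()
    except IOError:
        # Error path: fall through to `finally` so the lock is dropped.
        return None
    finally:
        flash_lock.release()
```

      Routing every exit through a single release point is what keeps a
      failed register read from leaving the lock held.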
      
      Fixes: 81d0aeb0 ("qlcnic: flash template based firmware reset recovery")
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6b70c678
    • virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO · da92ce36
      Jason Wang authored
      
      
      [ Upstream commit dbcf24d1 ]
      
      Commit a02e8964 ("virtio-net: ethtool configurable LRO")
      maps LRO to virtio guest offloading features and allows the
      administrator to enable and disable those features via ethtool.
      
      This leads to several issues:
      
      - For a device that doesn't support control guest offloads, the "LRO"
        can't be disabled triggering WARN in dev_disable_lro() when turning
        off LRO or when enabling forwarding bridging etc.
      
      - For a device that supports control guest offloads, the guest
        offloads are disabled in cases of bridging, forwarding etc slowing
        down the traffic.
      
      Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
      guarantee packets to be re-segmented as the original ones,
      we can add that to the spec, possibly with a flag for devices to
      differentiate between GRO and LRO.
      
      Further, we never advertised LRO historically before a02e8964
      ("virtio-net: ethtool configurable LRO") and so bridged/forwarded
      configs effectively always relied on virtio receive offloads behaving
      like GRO - thus even if this breaks any configs it is at least not
      a regression.
      
      Fixes: a02e8964 ("virtio-net: ethtool configurable LRO")
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reported-by: Ivan <ivan@prestigetransportation.com>
      Tested-by: Ivan <ivan@prestigetransportation.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      da92ce36
    • virtio-net: support XDP when not more queues · 9aeadce8
      Xuan Zhuo authored
      
      
      [ Upstream commit 97c2c69e ]
      
      The number of queues implemented by many virtio backends is limited,
      especially on machines with a large number of CPUs. In this case, it
      is often impossible to allocate a separate queue for
      XDP_TX/XDP_REDIRECT, so xdp cannot be loaded at all, even if the
      program does not use XDP_TX/XDP_REDIRECT.

      This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ
      with __netif_tx_lock() held when there are not enough queues.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9aeadce8
    • vrf: Reset skb conntrack connection on VRF rcv · 3ed7cf83
      Lahav Schlesinger authored
      
      
      [ Upstream commit 09e856d5 ]
      
      To fix the "reverse-NAT" for replies.
      
      When a packet is sent over a VRF, the POST_ROUTING hooks are called
      twice: Once from the VRF interface, and once from the "actual"
      interface the packet will be sent from:
      1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
           This causes the POST_ROUTING hooks to run.
      2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
      
      Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
      second vrf_l3_rcv() calls them again.
      
      As an example, consider the following SNAT rule:
      > iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
      
      In this case sending over a VRF will create 2 conntrack entries.
      The first is from the VRF interface, which performs the IP SNAT.
      The second will run the SNAT, but since the "expected reply" will remain
      the same, conntrack randomizes the source port of the packet:
      E.g., with a socket bound to 1.1.1.1:10000 sending to 3.3.3.3:53, the conntrack
      entries are:
      udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
      udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
      
      i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
      SNAT-ed from 10000 --> 61033.
      
      But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
      conntrack entry is matched:
      udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
      udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
      
      And a "port 61033 unreachable" ICMP packet is sent back.
      
      The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
      the skb already has a conntrack flow attached to it, which means
      nf_conntrack_in() will not resolve the flow again.
      
      This means only the dest port is "reverse-NATed" (61033 -> 10000) but
      the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
      not received.
      This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
      
      The fix is then to reset the flow when skb is received on a VRF, to let
      conntrack resolve the flow again (which now will hit the earlier flow).
      
      To reproduce (without the fix, "Got pkt_to_nat_port" will not be printed when
        running 'bash ./repro.sh'):
        $ cat run_in_A1.py
        import logging
        logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
        from scapy.all import *
        import argparse
      
        def get_packet_to_send(udp_dst_port, msg_name):
            return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
                IP(src='3.3.3.3', dst='2.2.2.2')/ \
                UDP(sport=53, dport=udp_dst_port)/ \
                Raw(f'{msg_name}\x0012345678901234567890')
      
        parser = argparse.ArgumentParser()
        parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
                            help="From run_in_A3.py")
        parser.add_argument('-socket_port', dest="socket_port", type=str,
                            required=True, help="From run_in_A3.py")
        parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
                            help="From script")
      
        args, _ = parser.parse_known_args()
        iface_mac = args.iface_mac
        socket_port = int(args.socket_port)
        v1_mac = args.v1_mac
      
        print(f'Source port before NAT: {socket_port}')
      
        while True:
            pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
            if 0 == len(pkts):
                print('Something failed, rerun the script :(', flush=True)
                break
            pkt = pkts[0]
            if not pkt.haslayer('UDP'):
                continue
      
            pkt_sport = pkt.getlayer('UDP').sport
            print(f'Source port after NAT: {pkt_sport}', flush=True)
      
            pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
            sendp(pkt_to_send, iface='_v0', verbose=False) # Will not be received

            pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
            sendp(pkt_to_send, iface='_v0', verbose=False)
            break
      
        $ cat run_in_A2.py
        import socket
        import netifaces
      
        print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
              flush=True)
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                     str('vrf_1' + '\0').encode('utf-8'))
        s.connect(('3.3.3.3', 53))
        print(f'{s.getsockname()[1]}', flush=True)
        s.settimeout(5)
      
        while True:
            try:
                # Periodically send in order to keep the conntrack entry alive.
                s.send(b'a'*40)
                resp = s.recvfrom(1024)
                msg_name = resp[0].decode('utf-8').split('\0')[0]
                print(f"Got {msg_name}", flush=True)
            except Exception as e:
                pass
      
        $ cat repro.sh
        ip netns del A1 2> /dev/null
        ip netns del A2 2> /dev/null
        ip netns add A1
        ip netns add A2
      
        ip -n A1 link add _v0 type veth peer name _v1 netns A2
        ip -n A1 link set _v0 up
      
        ip -n A2 link add e00000 type bond
        ip -n A2 link add lo0 type dummy
        ip -n A2 link add vrf_1 type vrf table 10001
        ip -n A2 link set vrf_1 up
        ip -n A2 link set e00000 master vrf_1
      
        ip -n A2 addr add 1.1.1.1/24 dev e00000
        ip -n A2 link set e00000 up
        ip -n A2 link set _v1 master e00000
        ip -n A2 link set _v1 up
        ip -n A2 link set lo0 up
        ip -n A2 addr add 2.2.2.2/32 dev lo0
      
        ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
        ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
      
        ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
      	SNAT --to-source 2.2.2.2 -o vrf_1
      
        sleep 5
        ip netns exec A2 python3 run_in_A2.py > x &
        XPID=$!
        sleep 5
      
        IFACE_MAC=`sed -n 1p x`
        SOCKET_PORT=`sed -n 2p x`
        V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2}'`
        ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
                ${SOCKET_PORT} -v1_mac ${V1_MAC}
        sleep 5
      
        kill -9 $XPID
        wait $XPID 2> /dev/null
        ip netns del A1
        ip netns del A2
        tail x -n 2
        rm x
        set +x
      
      Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
      Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3ed7cf83
    • bnxt_en: Add missing DMA memory barriers · 447b1602
      Michael Chan authored
      
      
      [ Upstream commit 828affc2 ]
      
      Each completion ring entry has a valid bit to indicate that the entry
      contains a valid completion event.  The driver's main poll loop
      __bnxt_poll_work() has the proper dma_rmb() to make sure the valid
      bit of the next entry has been checked before proceeding further.
      But when we call bnxt_rx_pkt() to process the RX event, the RX
      completion event consists of two completion entries and only the
      first entry has been checked to be valid.  We need the same barrier
      after checking the next completion entry.  Add missing dma_rmb()
      barriers in bnxt_rx_pkt() and other similar locations.
      
      Fixes: 67a95e20 ("bnxt_en: Need memory barrier when processing the completion ring.")
      Reported-by: Lance Richardson <lance.richardson@broadcom.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Reviewed-by: Lance Richardson <lance.richardson@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      447b1602
    • ptp_pch: Restore dependency on PCI · c9566df3
      Andy Shevchenko authored
      
      
      [ Upstream commit 55c8fca1 ]
      
      When the dependency on PCH_GBE was swapped for a selection of
      PTP_1588_CLOCK_PCH, the implicit dependency on PCI was accidentally
      dropped. Restore it.
      
      Fixes: 18d359ce ("pch_gbe, ptp_pch: Fix the dependency direction between these drivers")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c9566df3
    • net: 6pack: fix slab-out-of-bounds in decode_data · a73b9aa1
      Pavel Skripkin authored
      
      
      [ Upstream commit 19d1532a ]
      
      Syzbot reported a slab-out-of-bounds write in decode_data().
      The problem was missing validation checks.

      Syzbot's reproducer generated malicious input, which caused
      decode_data() to be called many times in sixpack_decode(). Since
      rx_count_cooked is only 400 bytes and no one has reported before
      that 400 bytes is not enough, let's just check whether the input is
      malicious and complain about a buffer overrun.
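
      A simplified model of such a check is sketched below. It is written
      in Python for illustration and is not the driver code: the Decoder
      class and its overruns counter are hypothetical, and the real bit
      unpacking is replaced by copying three bytes per call. Only the
      400-byte size comes from the description above.

```python
COOKED_BUF_SIZE = 400  # size quoted in the commit message


class Decoder:
    def __init__(self):
        self.cooked_buf = bytearray(COOKED_BUF_SIZE)
        self.rx_count_cooked = 0
        self.overruns = 0  # hypothetical counter, stands in for a log message

    def decode_data(self, three_bytes):
        """Append three decoded bytes, refusing to write past the buffer."""
        if self.rx_count_cooked + 2 >= len(self.cooked_buf):
            # Not enough room for three more bytes: drop them and complain.
            self.overruns += 1
            return False
        for b in three_bytes:
            self.cooked_buf[self.rx_count_cooked] = b
            self.rx_count_cooked += 1
        return True
```

      However many times the decoder is driven, the write index in this
      model never goes past the end of the buffer.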
      
      Fail log:
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in drivers/net/hamradio/6pack.c:843
      Write of size 1 at addr ffff888087c5544e by task kworker/u4:0/7
      
      CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.6.0-rc3-syzkaller #0
      ...
      Workqueue: events_unbound flush_to_ldisc
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
       __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506
       kasan_report+0x12/0x20 mm/kasan/common.c:641
       __asan_report_store1_noabort+0x17/0x20 mm/kasan/generic_report.c:137
       decode_data.part.0+0x23b/0x270 drivers/net/hamradio/6pack.c:843
       decode_data drivers/net/hamradio/6pack.c:965 [inline]
       sixpack_decode drivers/net/hamradio/6pack.c:968 [inline]
      
      Reported-and-tested-by: <syzbot+fc8cd9a673d4577fb2e4@syzkaller.appspotmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a73b9aa1
    • bnxt: disable napi before canceling DIM · 2bc75713
      Jakub Kicinski authored
      
      
      [ Upstream commit 01cca6b9 ]
      
      NAPI schedules DIM, so NAPI has to be disabled first
      and DIM canceled afterwards.
      
      Noticed while reading the code.
      
      Fixes: 0bc0b97f ("bnxt_en: cleanup DIM work on device shutdown")
      Fixes: 6a8788f2 ("bnxt_en: add support for software dynamic interrupt moderation")
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2bc75713
    • bnxt: don't lock the tx queue from napi poll · a9fb0f15
      Jakub Kicinski authored
      
      
      [ Upstream commit 3c603136 ]
      
      We can't take the tx lock from the napi poll routine, because
      netpoll can poll napi at any moment, including with the tx lock
      already held.
      
      The tx lock is protecting against two paths - the disable
      path, and (as Michael points out) the NETDEV_TX_BUSY case
      which may occur if NAPI completions race with start_xmit
      and both decide to re-enable the queue.
      
      For the disable/ifdown path use synchronize_net() to make sure
      closing the device does not race with restarting the queues.
      Annotate accesses to dev_state against data races.
      
      For the NAPI cleanup vs start_xmit path - appropriate barriers
      are already in place in the main spot where Tx queue is stopped
      but we need to do the same careful dance in the TX_BUSY case.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a9fb0f15
    • bpf: Clear zext_dst of dead insns · 1fe03803
      Ilya Leoshkevich authored
      
      
      [ Upstream commit 45c709f8 ]
      
      "access skb fields ok" verifier test fails on s390 with the "verifier
      bug. zext_dst is set, but no reg is defined" message. The first insns
      of the test prog are ...
      
         0:	61 01 00 00 00 00 00 00 	ldxw %r0,[%r1+0]
         8:	35 00 00 01 00 00 00 00 	jge %r0,0,1
        10:	61 01 00 08 00 00 00 00 	ldxw %r0,[%r1+8]
      
      ... and the 3rd one is dead (this does not look intentional to me, but
      this is a separate topic).
      
      sanitize_dead_code() converts dead insns into "ja -1", but keeps
      zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
      an insn, it sees this discrepancy and bails. This problem can be seen
      only with JITs whose bpf_jit_needs_zext() returns true.
      
      Fix by clearing dead insns' zext_dst.
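
      The idea can be modelled roughly as follows. This is a Python sketch,
      not the verifier code: InsnAux and the string form of the
      instructions are hypothetical simplifications of the verifier's
      per-insn auxiliary data.

```python
from dataclasses import dataclass


@dataclass
class InsnAux:
    seen: bool      # False means the verifier never reached this insn
    zext_dst: bool  # destination register needs explicit zero-extension


def sanitize_dead_code(insns, aux):
    """Replace unreachable insns with a trap and clear their zext_dst."""
    for i, a in enumerate(aux):
        if a.seen:
            continue
        insns[i] = "ja -1"
        # The trap defines no register, so a stale flag would be inconsistent.
        a.zext_dst = False
```

      After this pass no "ja -1" trap carries zext_dst, so a later pass that
      expects every flagged insn to define a register has nothing to
      complain about.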
      
      The commits that contributed to this problem are:
      
      1. 5aa5bd14 ("bpf: add initial suite for selftests"), which
         introduced the test with the dead code.
      2. 5327ed3d ("bpf: verifier: mark verified-insn with
         sub-register zext flag"), which introduced the zext_dst flag.
      3. 83a28819 ("bpf: Account for BPF_FETCH in
         insn_has_def32()"), which introduced the sanity check.
      4. 9183671a ("bpf: Fix leakage under speculation on
         mispredicted branches"), which bisect points to.
      
      It's best to fix this on stable branches that contain the second one,
      since that's the point where the inconsistency was introduced.
      
      Fixes: 5327ed3d ("bpf: verifier: mark verified-insn with sub-register zext flag")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1fe03803
    • vhost: Fix the calculation in vhost_overflow() · 73a45f75
      Xie Yongji authored
      
      
      [ Upstream commit f7ad318e ]
      
      This fixes the incorrect calculation for integer overflow
      when the last address of iova range is 0xffffffff.
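
      To make the boundary case concrete, here is a small Python model. It
      assumes a 32-bit unsigned long to match the 0xffffffff example, and
      both functions are illustrative reconstructions of an off-by-one
      check and its correction, not code copied from vhost.

```python
ULONG_MAX = 0xFFFFFFFF  # assumes a 32-bit unsigned long, as in the example


def overflows_off_by_one(uaddr, size):
    # Also rejects ranges whose last byte sits exactly at ULONG_MAX.
    return uaddr > ULONG_MAX or size > ULONG_MAX or uaddr > ULONG_MAX - size


def overflows_fixed(uaddr, size):
    if uaddr > ULONG_MAX or size > ULONG_MAX:
        return True
    if size == 0:
        return False
    # The last byte is uaddr + size - 1, which must not exceed ULONG_MAX.
    return uaddr > ULONG_MAX - size + 1
```

      For uaddr = 0xfffff000 and size = 0x1000 the last address is
      0xffffffff: the first check reports an overflow, while the second
      accepts the range and still rejects anything one byte longer.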
      
      Fixes: ec33d031 ("vhost: detect 32 bit integer wrap around")
      Reported-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      73a45f75
    • virtio: Protect vqs list access · b9a59636
      Parav Pandit authored
      
      
      [ Upstream commit 0e566c8f ]
      
      VQs may be accessed to mark the device broken while they are
      created/destroyed. Hence protect the access to the vqs list.
      
      Fixes: e2dcdfe9 ("virtio: virtio_break_device() to mark all virtqueues broken.")
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Link: https://lore.kernel.org/r/20210721142648.1525924-4-parav@nvidia.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b9a59636
    • dccp: add do-while-0 stubs for dccp_pr_debug macros · b264e37b
      Randy Dunlap authored
      
      
      [ Upstream commit 86aab09a ]
      
      GCC complains about empty macros in an 'if' statement, so convert
      them to 'do {} while (0)' macros.
      
      Fixes these build warnings:
      
      net/dccp/output.c: In function 'dccp_xmit_packet':
      ../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
        283 |                 dccp_pr_debug("transmit_skb() returned err=%d\n", err);
      net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
      ../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
        163 |                                               (unsigned long long)seqno, state);
      
      Fixes: dc841e30 ("dccp: Extend CCID packet dequeueing interface")
      Fixes: 38024086 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: dccp@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b264e37b
    • cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant · 9112ebc2
      Marek Behún authored
      
      
      [ Upstream commit 484f2b7c ]
      
      The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
      the SOC boots, the WTMI firmware sets clocks and AVS values that work
      correctly with 1.2 GHz CPU frequency, but random crashes occur once
      cpufreq driver starts scaling.
      
      We currently do not know the reason:
      - it may be that the voltage value for L0 for 1.2 GHz variant provided
        by the vendor in the OTP is simply incorrect when scaling is used,
      - it may be that some delay is needed somewhere,
      - it may be something else.
      
      The most sane solution now seems to be to simply forbid the cpufreq
      driver on 1.2 GHz variant.
      
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Fixes: 92ce45fb ("cpufreq: Add DVFS support for Armada 37xx")
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9112ebc2
    • iommu: Check if group is NULL before remove device · cb9a9d5f
      Frank Wunderlich authored
      
      
      [ Upstream commit 5aa95d88 ]
      
      If probe_device fails, iommu_group is not initialized because
      iommu_group_add_device is not reached, so freeing it will result
      in a NULL pointer access.
      
      iommu_bus_init
        ->bus_iommu_probe
            ->probe_iommu_group in for each:/* return -22 in fail case */
                ->iommu_probe_device
                    ->__iommu_probe_device       /* return -22 here.*/
                        -> ops->probe_device          /* return -22 here.*/
                        -> iommu_group_get_for_dev
                              -> ops->device_group
                              -> iommu_group_add_device //good case
        ->remove_iommu_group  //in fail case, it will remove group
           ->iommu_release_device
               ->iommu_group_remove_device // here we don't have group
      
      In my case ops->probe_device (mtk_iommu_probe_device from
      mtk_iommu_v1.c) is failing due to an fwspec->ops mismatch.
      
      Fixes: d72e31c9 ("iommu: IOMMU Groups")
      Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
      Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cb9a9d5f
    • Bluetooth: hidp: use correct wait queue when removing ctrl_wait · 911a8141
      Ole Bjørn Midtbø authored
      
      
      [ Upstream commit cca342d9 ]
      
      A different wait queue was used when removing ctrl_wait than when adding
      it. This effectively left the remove operation unlocked relative to
      other operations on the wait queue that ctrl_wait was part of. This
      caused issues like the ones below, where dead000000000100 is
      LIST_POISON1 and dead000000000200 is LIST_POISON2.
      
       list_add corruption. next->prev should be prev (ffffffc1b0a33a08), \
      	but was dead000000000200. (next=ffffffc03ac77de0).
       ------------[ cut here ]------------
       CPU: 3 PID: 2138 Comm: bluetoothd Tainted: G           O    4.4.238+ #9
       ...
       ---[ end trace 0adc2158f0646eac ]---
       Call trace:
       [<ffffffc000443f78>] __list_add+0x38/0xb0
       [<ffffffc0000f0d04>] add_wait_queue+0x4c/0x68
       [<ffffffc00020eecc>] __pollwait+0xec/0x100
       [<ffffffc000d1556c>] bt_sock_poll+0x74/0x200
       [<ffffffc000bdb8a8>] sock_poll+0x110/0x128
       [<ffffffc000210378>] do_sys_poll+0x220/0x480
       [<ffffffc0002106f0>] SyS_poll+0x80/0x138
       [<ffffffc00008510c>] __sys_trace_return+0x0/0x4
      
       Unable to handle kernel paging request at virtual address dead000000000100
       ...
       CPU: 4 PID: 5387 Comm: kworker/u15:3 Tainted: G        W  O    4.4.238+ #9
       ...
       Call trace:
        [<ffffffc0000f079c>] __wake_up_common+0x7c/0xa8
        [<ffffffc0000f0818>] __wake_up+0x50/0x70
        [<ffffffc000be11b0>] sock_def_wakeup+0x58/0x60
        [<ffffffc000de5e10>] l2cap_sock_teardown_cb+0x200/0x224
        [<ffffffc000d3f2ac>] l2cap_chan_del+0xa4/0x298
        [<ffffffc000d45ea0>] l2cap_conn_del+0x118/0x198
        [<ffffffc000d45f8c>] l2cap_disconn_cfm+0x6c/0x78
        [<ffffffc000d29934>] hci_event_packet+0x564/0x2e30
        [<ffffffc000d19b0c>] hci_rx_work+0x10c/0x360
        [<ffffffc0000c2218>] process_one_work+0x268/0x460
        [<ffffffc0000c2678>] worker_thread+0x268/0x480
        [<ffffffc0000c94e0>] kthread+0x118/0x128
        [<ffffffc000085070>] ret_from_fork+0x10/0x20
        ---[ end trace 0adc2158f0646ead ]---
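
      The locking problem described above can be illustrated with a small
      userspace model. This is only a sketch under stated assumptions: the
      names wq_head, wq_entry, wq_add and wq_remove are hypothetical and are
      not the kernel API. Each queue head carries its own lock, so an entry
      must be removed through the same head it was added to; removing it
      through another head would take the wrong lock and leave the list
      manipulation unprotected.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model: an entry linked into a circular list. */
struct wq_entry {
    struct wq_entry *prev, *next;
};

/* Hypothetical model of a wait queue head: one lock guarding one list. */
struct wq_head {
    pthread_mutex_t lock;
    struct wq_entry list; /* sentinel node of the circular list */
    int len;
};

static void wq_init(struct wq_head *h)
{
    pthread_mutex_init(&h->lock, NULL);
    h->list.prev = h->list.next = &h->list;
    h->len = 0;
}

/* Link the entry at the front of the list while holding this head's lock. */
static void wq_add(struct wq_head *h, struct wq_entry *e)
{
    pthread_mutex_lock(&h->lock);
    e->next = h->list.next;
    e->prev = &h->list;
    h->list.next->prev = e;
    h->list.next = e;
    h->len++;
    pthread_mutex_unlock(&h->lock);
}

/*
 * Unlink the entry. The caller must pass the same head that was used in
 * wq_add(); otherwise the lock taken here does not protect the list that
 * the entry actually lives on.
 */
static void wq_remove(struct wq_head *h, struct wq_entry *e)
{
    pthread_mutex_lock(&h->lock);
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
    h->len--;
    pthread_mutex_unlock(&h->lock);
}
```

      In this model the correct pairing is wq_add(&q, &e) followed by
      wq_remove(&q, &e) on the same q.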
      
      Signed-off-by: Ole Bjørn Midtbø <omidtbo@cisco.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X · 5b14c1f1
      Bing Guo authored
      
      
      [ Upstream commit 06050a0f ]
      
      Why:
      In DCN2x, HW doesn't automatically divide MASTER_UPDATE_LOCK_DB_X
      by the number of pipes ODM Combined.
      
      How:
      Set MASTER_UPDATE_LOCK_DB_X to the value that is adjusted by the
      number of pipes ODM Combined.
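
      As a rough illustration of the adjustment described above, the sketch
      below divides a lock value by the pipe count in software. It assumes
      that "adjusted" means a plain integer division, which is inferred from
      the "Why" paragraph and not confirmed here. The helper name
      lock_db_x_for_odm and its parameters are hypothetical and do not come
      from the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the X value to program when the hardware does
 * not divide it by the number of ODM-combined pipes itself.
 * Assumption: the adjustment is an integer division by the pipe count.
 */
static unsigned int lock_db_x_for_odm(unsigned int db_x, unsigned int odm_pipes)
{
    /* Treat a pipe count of 0 as "no ODM combine" to avoid dividing by zero. */
    if (odm_pipes == 0)
        odm_pipes = 1;
    return db_x / odm_pipes;
}
```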
      
      Reviewed-by: Martin Leung <martin.leung@amd.com>
      Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Signed-off-by: Bing Guo <bing.guo@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: usb: lan78xx: don't modify phy_device state concurrently · f92dc3a8
      Ivan T. Ivanov authored
      
      
      [ Upstream commit 6b67d4d6 ]
      
      Currently the phy_device state can be left inconsistent, as shown by the
      following alert message [1]. This is because phy_read_status can be
      called concurrently from lan78xx_delayedwork, phy_state_machine and
      __ethtool_get_link. Fix this by making sure that the phy_device state is
      updated atomically.
      
      [1] lan78xx 1-1.1.1:1.0 eth0: No phy led trigger registered for speed(-1)
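
      The idea of updating the state atomically can be sketched in userspace
      with a mutex. This is a minimal model under stated assumptions, not the
      driver code: struct phy_model and the phy_model_* functions are
      hypothetical names, and a pthread mutex stands in for whatever lock the
      real fix uses. Writers update every field under the lock and readers
      take a snapshot under the same lock, so no caller can observe a
      half-updated state such as a link that is up with a speed of -1.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the PHY fields refreshed by a status read. */
struct phy_model {
    pthread_mutex_t lock;
    int link;  /* 1 = up, 0 = down */
    int speed; /* -1 = unknown */
};

static void phy_model_init(struct phy_model *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->link = 0;
    p->speed = -1;
}

/* Refresh all fields as one critical section. */
static void phy_model_read_status(struct phy_model *p, int link, int speed)
{
    pthread_mutex_lock(&p->lock);
    p->speed = -1; /* transient value, never visible to locked readers */
    p->link = link;
    p->speed = speed;
    pthread_mutex_unlock(&p->lock);
}

/* Copy out a consistent view of the state. */
static void phy_model_snapshot(struct phy_model *p, int *link, int *speed)
{
    pthread_mutex_lock(&p->lock);
    *link = p->link;
    *speed = p->speed;
    pthread_mutex_unlock(&p->lock);
}
```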
      
      Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>