  1. Mar 10, 2019
    • Dietmar Eggemann's avatar
      arm, arm64: Enable kernel config options required for EAS testing · caac29e5
      Dietmar Eggemann authored
      
      
      arm and arm64:
      
        Add    Function, Function Graph, Irqsoff, Preempt, Sched Tracer
        Add    Prove Locking
        Add    Prove RCU
      
      for arm64:
      
        Add    USB Net RTL8152
        Add    USB Net
        Add    USB Net AX8817X
        Remove Mouse PS2
      
      for arm:
      
        Add    kernel .config support and /proc/config.gz
        Add    ARM Big.Little cpufreq driver
        Add    ARM Big.Little cpuidle driver
        Add    Sensor Vexpress
      
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Dietmar Eggemann's avatar
      arm, arm64: Enable kernel config options required for EAS · e86e6946
      Dietmar Eggemann authored
      
      
      arm and arm64:
      
        Add    Cgroups support
        Add    Energy Model
        Add    CpuFreq governors and make schedutil default
        Add    Uclamp support for tasks and taskgroups
      
      for arm:
      
        Add    Cpuset support
        Add    Scheduler autogroups
        Add    DIE sched domain level
      
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Update CPU's refcount on TG's clamp changes · b1f99962
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      On updates of task group (TG) clamp values, ensure that these new values
      are enforced on all RUNNABLE tasks of the task group, i.e. all RUNNABLE
      tasks are immediately boosted and/or clamped as requested.
      
      Do that by slightly refactoring uclamp_bucket_inc(). An additional
      parameter, *cgroup_subsys_state (css), is used to walk the list of tasks
      in the TG and update the RUNNABLE ones. Do that by taking the rq
      lock for each task, the same mechanism used for CPU affinity mask
      updates.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Use TG's clamps to restrict TASK's clamps · c9a0a399
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      When a task specific clamp value is configured via sched_setattr(2),
      this value is accounted in the corresponding clamp bucket every time the
      task is {en,de}queued. However, when cgroups are also in use, the task
      specific clamp values could be restricted by the task_group (TG)
      clamp values.
      
      Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every
      time a task is enqueued, it's accounted in the clamp_bucket defining the
      smaller clamp between the task specific value and its TG effective
      value. This makes it possible to:

      1. ensure cgroup clamps are always used to restrict task specific
         requests, i.e. boosted only up to the effective granted value or
         clamped at least to a certain value

      2. implement a "nice-like" policy, where tasks are still allowed to
         request less than what is enforced by their current TG
      
      This mimics what already happens for a task's CPU affinity mask when the
      task is also in a cpuset, i.e. cgroup attributes are always used to
      restrict per-task attributes.
      
      Do this by exploiting the concept of "effective" clamp, which is already
      used by a TG to track parent enforced restrictions.
      
      Apply task group clamp restrictions only to tasks belonging to a child
      group. For tasks in the root group or in an autogroup, only
      system defaults are enforced.
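
The aggregation rule described in this commit message can be sketched in user space as follows. This is only an illustrative model, not the kernel code: `task_effective_clamp()`, its parameters, and the `in_child_group` flag are hypothetical names, and only the "smaller of the two values" rule is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Smaller of two clamp values. */
static unsigned int clamp_min2(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: the clamp value a task is accounted with at enqueue.
 *  task_value      value requested via sched_setattr(2)
 *  tg_effective    effective value of the task's group
 *  in_child_group  false for the root group or an autogroup
 *  sys_default     system-wide default for this clamp index
 */
unsigned int task_effective_clamp(unsigned int task_value,
				  unsigned int tg_effective,
				  bool in_child_group,
				  unsigned int sys_default)
{
	assert(task_value <= SCHED_CAPACITY_SCALE);

	/* Root group and autogroups: only system defaults restrict the task. */
	if (!in_child_group)
		return clamp_min2(task_value, sys_default);

	/* Child groups: the group's effective value restricts the request. */
	return clamp_min2(task_value, tg_effective);
}
```

With a group effective value of 512, a task requesting 800 is accounted at 512, while a task requesting 300 keeps its own, smaller value, which is the "nice-like" behaviour.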
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Propagate system defaults to root group · 682d2b49
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      The clamp values are not tunable at the level of the root task group.
      That's for two main reasons:
      
       - the root group represents "system resources" which are always
         entirely available from the cgroup standpoint.
      
       - when tuning/restricting "system resources" makes sense, tuning must
         be done using a system wide API which should also be available when
         control groups are not.
      
      When a system wide restriction is available, cgroups should be aware of
      its value in order to know exactly how much "system resources" are
      available for the subgroups.
      
      Utilization clamping already supports the concepts of:

       - system defaults: which define the maximum possible clamp values
         usable by tasks.

       - effective clamps: which allow a parent cgroup to constrain (maybe
         temporarily) its descendants without losing the information related
         to the values "requested" by them.

      Exploit these two concepts and bind them together in such a way that,
      whenever system defaults are tuned, the new values are propagated to
      (possibly) restrict or relax the "effective" value of nested cgroups.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Propagate parent clamps · 45bd522f
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      In order to properly support hierarchical resource control, the cgroup
      delegation model requires that attribute writes from a child group never
      fail but still are (potentially) constrained based on the parent's assigned
      resources. This requires properly propagating and aggregating parent
      attributes down to its descendants.
      
      Let's implement this mechanism by adding a new "effective" clamp value
      for each task group. The effective clamp value is defined as the smaller
      value between the clamp value of a group and the effective clamp value
      of its parent. This is the actual clamp value enforced on tasks in a
      task group.
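
The definition of the effective clamp can be written down as a small user-space model. The sketch below is an assumption-laden illustration, not the patch itself: `struct tg_node` and `tg_update_effective()` are hypothetical names, and the hierarchy is reduced to a parent pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task group node; parent == NULL marks the root group. */
struct tg_node {
	const struct tg_node *parent;
	unsigned int requested;	/* value written by the group's owner */
	unsigned int effective;	/* value actually enforced on its tasks */
};

/*
 * Recompute the effective clamp of one group as the smaller value between
 * its own clamp and its parent's effective clamp.
 * Parents must be updated before their children.
 */
void tg_update_effective(struct tg_node *tg)
{
	unsigned int limit;

	assert(tg != NULL);
	if (!tg->parent) {
		tg->effective = tg->requested;
		return;
	}
	limit = tg->parent->effective;
	tg->effective = tg->requested < limit ? tg->requested : limit;
}
```

Because the requested value is stored separately, a write to a child never has to fail: the child keeps what it asked for, and its effective value follows automatically when the parent is later relaxed.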
      
      Since it can be interesting for userspace, e.g. system management
      software, to know exactly what the currently propagated/enforced
      configuration is, the effective clamp values are exposed to user-space
      by means of a new pair of read-only attributes
      cpu.util.{min,max}.effective.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      
      ---
      Changes in v7:
       Others:
       - ensure clamp values are not tunable at root cgroup level
    • Patrick Bellasi's avatar
      sched/core: uclamp: Extend CPU's cgroup controller · 1b0d9f7b
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      The cgroup CPU bandwidth controller allows assigning a specified
      (maximum) bandwidth to the tasks of a group. However, this bandwidth is
      defined and enforced only on a temporal basis, without considering the
      actual frequency a CPU is running at. Thus, the amount of computation
      completed by a task within an allocated bandwidth can be very different
      depending on the actual frequency at which the CPU runs that task.
      The amount of computation can also be affected by the specific CPU a
      task is running on, especially when running on asymmetric capacity
      systems like Arm's big.LITTLE.
      
      With the availability of schedutil, the scheduler is now able
      to drive frequency selections based on actual task utilization.
      Moreover, the utilization clamping support provides a mechanism to
      bias the frequency selection operated by schedutil depending on
      constraints assigned to the tasks currently RUNNABLE on a CPU.
      
      Given the mechanisms described above, it is now possible to extend the
      cpu controller to specify the minimum (or maximum) utilization which
      should be considered for tasks RUNNABLE on a CPU.
      This makes it possible to better define the actual computational
      power assigned to task groups, thus improving the cgroup CPU bandwidth
      controller which is currently based just on time constraints.
      
      Extend the CPU controller with a couple of new attributes, util.{min,max},
      which allow enforcing utilization boosting and capping for all the
      tasks in a group. Specifically:

      - util.min: defines the minimum utilization which should be considered,
                  i.e. the RUNNABLE tasks of this group will run at least at a
                  minimum frequency which corresponds to the min_util
                  utilization

      - util.max: defines the maximum utilization which should be considered,
                  i.e. the RUNNABLE tasks of this group will run up to a
                  maximum frequency which corresponds to the max_util
                  utilization
      
      These attributes:
      
      a) are available only for non-root nodes, both on default and legacy
         hierarchies, while system wide clamps are defined by a generic
         interface which does not depend on cgroups
      
      b) do not enforce any constraints and/or dependencies between the parent
         and its child nodes, thus relying:
         - on permission settings defined by the system management software,
           to define if subgroups can configure their clamp values
         - on the delegation model, to ensure that effective clamps are
           updated to consider both subgroup requests and parent group
           constraints
      
      c) have higher priority than task-specific clamps, defined via
         sched_setattr(), thus allowing to control and restrict task requests
      
      This patch provides the basic support to expose the two new attributes
      and to validate their run-time updates, while we do not (yet) actually
      allocate clamp buckets.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
    • Patrick Bellasi's avatar
      sched/fair: uclamp: Add uclamp support to energy_compute() · 37afee69
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      The Energy Aware Scheduler (EAS) estimates the energy impact of waking
      up a task on a given CPU. This estimation is based on:
       a) the (active) power consumption defined for each CPU frequency
       b) an estimation of which frequency will be used on each CPU
       c) an estimation of the busy time (utilization) of each CPU
      
      Utilization clamping can affect both b) and c) estimations. A CPU is
      expected to run:
       - at a higher than required frequency, but for a shorter time, in case
         its estimated utilization is smaller than the minimum utilization
         enforced by uclamp
       - at a lower than required frequency, but for a longer time, in case
         its estimated utilization is bigger than the maximum utilization
         enforced by uclamp
      
      While effects on busy time for both boosted/capped tasks are already
      considered by compute_energy(), clamping effects on frequency selection
      are currently ignored by that function.
      
      Fix it by considering how CPU clamp values will be affected by a
      task waking up and being RUNNABLE on that CPU.
      
      Do that by refactoring schedutil_freq_util() to take an additional
      task_struct* which allows EAS to evaluate the impact on clamp values of
      a task possibly being enqueued on a CPU. Clamp values are applied to the
      RT+CFS utilization only when a FREQUENCY_UTIL is required by
      compute_energy().
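
To make the split between frequency estimation and busy-time estimation concrete, here is a deliberately simplified user-space sketch. It is not compute_energy(): `struct cap_state`, `estimate_energy()`, and the power numbers used with it are hypothetical, and the model assumes energy is proportional to power times busy_util / capacity without modelling saturation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity state: capacity reached at this OPP and its power. */
struct cap_state {
	unsigned int capacity;	/* 0..1024, ascending within a table */
	unsigned int power;	/* active power, arbitrary units */
};

/* Plain clamp of util into [util_min, util_max]. */
static unsigned int clamp_util(unsigned int util, unsigned int util_min,
			       unsigned int util_max)
{
	if (util < util_min)
		return util_min;
	if (util > util_max)
		return util_max;
	return util;
}

/*
 * Energy estimate for one CPU: the clamped utilization selects the
 * capacity state (frequency), the unclamped one gives the busy time.
 */
unsigned int estimate_energy(const struct cap_state *cs, size_t nr_states,
			     unsigned int busy_util,
			     unsigned int util_min, unsigned int util_max)
{
	unsigned int freq_util = clamp_util(busy_util, util_min, util_max);
	const struct cap_state *sel;

	assert(nr_states > 0);
	sel = &cs[nr_states - 1];	/* fall back to the highest state */
	for (size_t i = 0; i < nr_states; i++) {
		if (cs[i].capacity >= freq_util) {
			sel = &cs[i];
			break;
		}
	}
	/* Busy time scales as busy_util / capacity of the selected state. */
	return sel->power * busy_util / sel->capacity;
}
```

In this model a task with utilization 200 boosted to util_min = 512 is estimated at a higher capacity state than the same task without clamps, so ignoring the clamp would underestimate its energy.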
      
      Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
      computation of the cpu_util signal implies that we are more likely to
      estimate the highest OPP when an RT task is running on another CPU of
      the same performance domain. This can have an impact on energy
      estimation but:
       - it's not easy to say which approach is better, since it quite likely
         depends on the use case
       - the original approach could still be obtained by setting a smaller
         task-specific util_min whenever required
      
      While at it:
       - rename schedutil_freq_util() into schedutil_cpu_util(),
         since it's not only used for frequency selection.
       - use "unsigned int" instead of "unsigned long" whenever the tracked
         utilization value is not expected to overflow 32bit.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      
      ---
      Changes in v7:
       Message-ID: <20190122151404.5rtosic6puixado3@queper01-lin>
       - add a note on side-effects due to the usage of FREQUENCY_UTIL for
         performance domain frequency estimation
       - add a similar note to this changelog
    • Patrick Bellasi's avatar
      sched/core: uclamp: Add uclamp_util_with() · 44fdbf08
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      Currently uclamp_util() allows clamping a specified utilization
      considering the clamp values requested by RUNNABLE tasks on a CPU.
      Sometimes however, it could be interesting to verify how clamp values
      will change when a task is going to be running on a given CPU.
      For example, the Energy Aware Scheduler (EAS) is interested in
      evaluating and comparing the energy impact of different scheduling
      decisions.
      
      Add uclamp_util_with(), which allows clamping a given utilization by
      considering the possible impact on CPU clamp values of a specified task.
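
A possible shape for such a helper is sketched below as a user-space model. The struct and function names are hypothetical, and the sketch assumes that a task's clamps are max-aggregated into the CPU's current clamps before the utilization is clamped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of clamp values, used for both a CPU and a task. */
struct clamp_pair {
	unsigned int min_util;
	unsigned int max_util;
};

static unsigned int max2(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Clamp util using the CPU's clamps as they would look if task p were
 * RUNNABLE there; pass p == NULL to use the CPU's current clamps only.
 */
unsigned int util_with(unsigned int util, const struct clamp_pair *cpu,
		       const struct clamp_pair *p)
{
	unsigned int min_util = cpu->min_util;
	unsigned int max_util = cpu->max_util;

	if (p) {
		min_util = max2(min_util, p->min_util);
		max_util = max2(max_util, p->max_util);
	}
	/* If the boost exceeds the cap, the boost wins. */
	if (min_util >= max_util)
		return min_util;
	if (util < min_util)
		return min_util;
	if (util > max_util)
		return max_util;
	return util;
}
```

This lets a caller such as EAS compare "CPU as it is now" against "CPU with this task added" without enqueuing the task.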
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • Patrick Bellasi's avatar
      sched/cpufreq: uclamp: Add clamps for FAIR and RT tasks · 0fe75e34
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      Each time a frequency update is required via schedutil, a frequency is
      selected to (possibly) satisfy the utilization reported by each
      scheduling class. However, when utilization clamping is in use, the
      frequency selection should consider userspace utilization clamping
      hints.  This will allow, for example, to:
      
       - boost tasks which are directly affecting the user experience
         by running them at least at a minimum "requested" frequency
      
       - cap low priority tasks not directly affecting the user experience
         by running them only up to a maximum "allowed" frequency
      
      These constraints are meant to support a per-task based tuning of the
      frequency selection thus supporting a fine grained definition of
      performance boosting vs energy saving strategies in kernel space.
      
      Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
      within the boundaries defined by their aggregated utilization clamp
      constraints.
      
      Do that by considering the max(min_util, max_util) to give boosted tasks
      the performance they need even when they happen to be co-scheduled with
      other capped tasks.
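
The effect on frequency selection can be illustrated with a small user-space sketch. It assumes the usual schedutil mapping of roughly 1.25 * max_freq * util / SCHED_CAPACITY_SCALE; the function names are hypothetical and the real governor has additional inputs (rate limiting, iowait boost, DL bandwidth) that are left out.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Clamp util; when min_util >= max_util the boost takes precedence. */
static unsigned int clamped_util(unsigned int util, unsigned int min_util,
				 unsigned int max_util)
{
	if (min_util >= max_util)
		return min_util;
	if (util < min_util)
		return min_util;
	if (util > max_util)
		return max_util;
	return util;
}

/* Frequency (kHz) requested for a CPU, given its aggregated clamps. */
unsigned int next_freq_khz(unsigned int max_freq_khz, unsigned int util,
			   unsigned int min_util, unsigned int max_util)
{
	unsigned long long freq = max_freq_khz + (max_freq_khz >> 2);

	freq = freq * clamped_util(util, min_util, max_util) /
	       SCHED_CAPACITY_SCALE;
	return freq > max_freq_khz ? max_freq_khz : (unsigned int)freq;
}
```

With a 1 GHz maximum, a CPU at utilization 100 boosted to min_util = 512 requests 625000 kHz instead of about 122 MHz, and a CPU at utilization 900 capped at max_util = 256 requests 312500 kHz.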
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      
      ---
      Changes in v7:
       Message-ID: <CAJZ5v0j2NQY_gKJOAy=rP5_1Dk9TODKNhW0vuvsynTN3BUmYaQ@mail.gmail.com>
       - merged FAIR and RT integration patches in this one
       Message-ID: <20190123142455.454u4w253xaxzar3@e110439-lin>
       - dropped clamping for IOWait boost
       Message-ID: <20190122123704.6rb3xemvxbp5yfjq@e110439-lin>
       - fixed go to max for RT tasks on !CONFIG_UCLAMP_TASK
    • Patrick Bellasi's avatar
      sched/core: uclamp: Set default clamps for RT tasks · 55f66ab6
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      By default FAIR tasks start without clamps, i.e. neither boosted nor
      capped, and they run at the best frequency matching their utilization
      demand.  This default behavior does not fit RT tasks which instead are
      expected to run at the maximum available frequency, if not otherwise
      required by explicitly capping them.
      
      Enforce the correct behavior for RT tasks by setting util_min to max
      whenever:
      
       1. a task is switched to the RT class and it does not already have a
          user-defined clamp value assigned.
      
       2. a task is forked from a parent with RESET_ON_FORK set.
      
      NOTE: utilization clamp values are cross scheduling class attributes and
      thus they are never changed/reset once a value has been explicitly
      defined from user-space.
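
The first rule and the NOTE can be modelled in a few lines. This is a hypothetical user-space sketch (`struct task_model` and `switch_to_rt()` are not kernel names), and it leaves out the fork path entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical per-task state relevant to the default RT boost. */
struct task_model {
	bool rt;		/* task is in the RT class */
	bool user_defined_min;	/* util_min was set explicitly by user-space */
	unsigned int util_min;
};

/* Switch a task to the RT class, applying the default boost if allowed. */
void switch_to_rt(struct task_model *p)
{
	assert(p->util_min <= SCHED_CAPACITY_SCALE);
	p->rt = true;
	/* A user-defined value is a cross-class attribute: never reset it. */
	if (!p->user_defined_min)
		p->util_min = SCHED_CAPACITY_SCALE;
}
```

A task that never set a clamp ends up with util_min at the maximum when it becomes RT, while a task that explicitly asked for util_min = 200 keeps that value across the class change.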
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Reset uclamp values on RESET_ON_FORK · 370c1ca3
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      A forked task gets the same clamp values as its parent. However, when
      the RESET_ON_FORK flag is set on the parent, e.g. via:
      
         sys_sched_setattr()
            sched_setattr()
               __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)
      
      the new forked task is expected to start with all attributes reset to
      default values.
      
      Do that for utilization clamp values too by caching the reset request
      and propagating it into the existing uclamp_fork() call which already
      provides the required initialization for other uclamp related bits.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Extend sched_setattr() to support utilization clamping · 731fd9b0
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      The SCHED_DEADLINE scheduling class provides an advanced and formal
      model to define task requirements that can translate into proper
      decisions for both task placement and frequency selection. Other
      classes have a simpler model based on the POSIX concept of
      priorities.

      Such a simple priority based model however does not allow exploiting
      the most advanced features of the Linux scheduler like, for example, driving
      frequency selection via the schedutil cpufreq governor. However, also
      for non SCHED_DEADLINE tasks, it's still interesting to define task
      properties to support scheduler decisions.
      
      Utilization clamping exposes to user-space a new set of per-task
      attributes the scheduler can use as hints about the expected/required
      utilization for a task. This allows implementing a "proactive" per-task
      frequency control policy, a more advanced policy than the current one
      based just on "passive" measured task utilization. For example, it's
      possible to boost interactive tasks (e.g. to get better performance) or
      cap background tasks (e.g. to be more energy/thermal efficient).

      Introduce a new API to set utilization clamping values for a specified
      task by extending sched_setattr(), a syscall which already allows
      defining task specific properties for different scheduling classes. A new
      pair of attributes allows specifying a minimum and maximum utilization
      the scheduler can consider for a task.
      
      Do that by first checking and validating the required clamp values and
      then applying the required changes using _the_ same pattern already in
      use for __setscheduler(). This ensures that the task is re-enqueued with
      the new clamp values.

      Do not allow changing sched class specific params and non class
      specific params (i.e. clamp values) at the same time.  This keeps things
      simple and still works for the most common cases since we are usually
      interested in just one of the two actions.
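
From user space, a call using the extended syscall might look like the sketch below. The struct layout and flag values are assumptions taken from what mainline Linux later exposed in include/uapi/linux/sched.h and may differ from this v7 series; the `uc_` prefixed names are hypothetical, and glibc provides no wrapper, so the raw syscall is used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed flag values (mainline uapi); the v7 series may differ. */
#define UC_FLAG_KEEP_POLICY	0x08
#define UC_FLAG_KEEP_PARAMS	0x10
#define UC_FLAG_UTIL_CLAMP_MIN	0x20
#define UC_FLAG_UTIL_CLAMP_MAX	0x40

/* Local copy of the assumed sched_attr layout with the two clamp fields. */
struct uc_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Prepare a request that changes only the clamps, not policy or params. */
void uc_fill_attr(struct uc_sched_attr *attr, uint32_t util_min,
		  uint32_t util_max)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_flags = UC_FLAG_KEEP_POLICY | UC_FLAG_KEEP_PARAMS |
			    UC_FLAG_UTIL_CLAMP_MIN | UC_FLAG_UTIL_CLAMP_MAX;
	attr->sched_util_min = util_min;
	attr->sched_util_max = util_max;
}

/* Returns 0 on success, -1 otherwise (errno set by the kernel). */
int uc_set_clamps(pid_t pid, uint32_t util_min, uint32_t util_max)
{
#ifdef SYS_sched_setattr
	struct uc_sched_attr attr;

	uc_fill_attr(&attr, util_min, util_max);
	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
#else
	(void)pid; (void)util_min; (void)util_max;
	return -1;	/* syscall number unknown on this platform */
#endif
}
```

For example, `uc_set_clamps(0, 256, 512)` would ask for the calling task to be boosted to at least 256 and capped at 512; on a kernel without utilization clamping the call is expected to fail rather than change anything.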
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      ---
      Changes in v7:
       Message-ID: <20190124123814.GM13777@hirez.programming.kicks-ass.net>
       - split validation code from actual state changing code
       - for state changing code, use _the_ same pattern __setscheduler() and
         other code already use, i.e. dequeue-change-enqueue
       - add SCHED_FLAG_KEEP_PARAMS and use it to skip __setscheduler() when
         policy and params are not specified
    • Patrick Bellasi's avatar
      sched/core: Allow sched_setattr() to use the current policy · 501f2535
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      The sched_setattr() syscall mandates that a policy is always specified.
      This requires always knowing which policy a task will have when
      attributes are configured and it makes it impossible to add more generic
      task attributes valid across different scheduling policies.
      Reading the policy before setting generic task attributes is racy since
      we cannot be sure it is not changed concurrently.
      
      Introduce the required support to change generic task attributes without
      affecting the current task policy. This is done by adding an attribute flag
      (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy.
      
      This is done by extending the sched_setattr() non-POSIX syscall with
      the SETPARAM_POLICY policy already used by the sched_setparam() POSIX
      syscall.
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      ---
      Changes in v7:
       Message-ID: <20190125135646.j4j2onitam4mwvcw@google.com>
       - fix definition of SCHED_POLICY_MAX
    • Patrick Bellasi's avatar
      sched/core: uclamp: Add system default clamps · 71c4bbaf
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      Tasks without a user-defined clamp value are considered not clamped
      and by default their utilization can have any value in the
      [0..SCHED_CAPACITY_SCALE] range.
      
      Tasks with a user-defined clamp value are allowed to request any value
      in that range, and we unconditionally enforce the required clamps.
      However, a "System Management Software" could be interested in limiting
      the range of clamp values allowed for all tasks.
      
      Add a privileged interface to define a system default configuration via:
      
        /proc/sys/kernel/sched_uclamp_util_{min,max}
      
      which works as an unconditional clamp range restriction for all tasks.
      
      The default configuration allows the full range of SCHED_CAPACITY_SCALE
      values for each clamp index. If otherwise configured, a task specific
      clamp is always capped by the corresponding system default value.
      
      Do that by tracking, for each task, the "effective" clamp value and
      bucket the task has actually been refcounted in at enqueue time. This
      allows lazily aggregating "requested" and "system default" values at
      enqueue time and simplifies refcounting updates at dequeue time.
      
      The cached bucket ids are used to avoid (relatively) more expensive
      integer divisions every time a task is enqueued.
      
      An active flag is used to report when the "effective" value is valid and
      thus the task is actually refcounted in the corresponding rq's bucket.
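
The lazy aggregation and the role of the active flag can be sketched as follows. This is a hypothetical user-space model for a single clamp index: `struct clamp_se`, `clamp_enqueue()`, and `clamp_dequeue()` are illustrative names, and bucket handling is reduced to returning the value that was accounted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task clamp state for one clamp index. */
struct clamp_se {
	unsigned int requested;	/* value asked for by the task */
	unsigned int effective;	/* value actually accounted at enqueue */
	bool active;		/* effective is valid, task is refcounted */
};

/* At enqueue: aggregate the request with the current system default. */
void clamp_enqueue(struct clamp_se *se, unsigned int sys_default)
{
	se->effective = se->requested < sys_default ?
			se->requested : sys_default;
	se->active = true;
}

/*
 * At dequeue: release exactly what was accounted at enqueue, even if the
 * system default changed in between. Returns the value to un-account.
 */
unsigned int clamp_dequeue(struct clamp_se *se)
{
	assert(se->active);
	se->active = false;
	return se->effective;
}
```

Caching the effective value means a change to the system default while the task is enqueued cannot unbalance the refcounts; the new default simply takes effect at the next enqueue.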
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      ---
      Changes in v7:
       Message-ID: <20190124123009.2yulcf25ld66popd@e110439-lin>
       - make system defaults support a "nice" policy where a task, for
         each clamp index, can get only "up to" what is allowed by the system
         default setting, i.e. tasks are always allowed to request less
    • Patrick Bellasi's avatar
      sched/core: uclamp: Enforce last task UCLAMP_MAX · 093c7cd8
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      When the task sleeps, it removes its max utilization clamp from its CPU.
      However, the blocked utilization on that CPU can be higher than the max
      clamp value enforced while the task was running. This allows undesired
      CPU frequency increases while a CPU is idle, for example, when another
      CPU in the same frequency domain triggers a frequency update, since
      schedutil can now see the full, unclamped blocked utilization of the
      idle CPU.
      
      Fix this by using
        uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
          uclamp_rq_update(rq, UCLAMP_MAX, clamp_value)
      to detect when a CPU has no more RUNNABLE clamped tasks and to flag this
      condition.
      
      Don't track any minimum utilization clamps since an idle CPU never
      requires a minimum frequency. The decay of the blocked utilization is
      good enough to reduce the CPU frequency.
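
The intended behaviour can be sketched with a reduced user-space model. All names here are hypothetical, only the dequeue of the last task is modelled, and the assumption that the first task enqueued after idle replaces the retained cap is mine rather than something stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for this sketch. */
struct idle_rq_model {
	unsigned int nr_running;
	unsigned int util_min;
	unsigned int util_max;
	bool idle_flag;		/* no RUNNABLE clamped task is left */
};

void model_enqueue(struct idle_rq_model *rq, unsigned int task_min,
		   unsigned int task_max)
{
	if (rq->nr_running == 0) {
		/* Assumed: first task after idle replaces the retained cap. */
		rq->idle_flag = false;
		rq->util_min = task_min;
		rq->util_max = task_max;
	} else {
		if (task_min > rq->util_min)
			rq->util_min = task_min;
		if (task_max > rq->util_max)
			rq->util_max = task_max;
	}
	rq->nr_running++;
}

/* Dequeue of the last RUNNABLE task: keep its max clamp, drop the min. */
void model_dequeue_last(struct idle_rq_model *rq)
{
	assert(rq->nr_running == 1);
	rq->nr_running = 0;
	rq->idle_flag = true;
	rq->util_min = 0;
	/* util_max is intentionally retained while the CPU is idle */
}

/* Utilization as seen by frequency selection for this CPU. */
unsigned int model_capped_util(const struct idle_rq_model *rq,
			       unsigned int util)
{
	return util > rq->util_max ? rq->util_max : util;
}
```

With a task capped at 300 that goes to sleep leaving 600 of blocked utilization behind, the idle CPU still reports at most 300 to the frequency domain.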
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • Patrick Bellasi's avatar
      sched/core: uclamp: Add CPU's clamp buckets refcounting · 3b8cdaed
      Patrick Bellasi authored and Dietmar Eggemann committed
      
      
      Utilization clamping allows clamping the CPU's utilization within a
      [util_min, util_max] range, depending on the set of RUNNABLE tasks on
      that CPU. Each task references two "clamp buckets" defining its minimum
      and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
      bucket is active if there is at least one RUNNABLE task enqueued on
      that CPU and refcounting that bucket.
      
      When a task is {en,de}queued {on,from} a rq, the set of active clamp
      buckets on that CPU can change. Since each clamp bucket enforces a
      different utilization clamp value, when the set of active clamp buckets
      changes, a new "aggregated" clamp value is computed for that CPU.
      
      Clamp values are always MAX aggregated for both util_min and util_max.
      This ensures that no tasks can affect the performance of other
      co-scheduled tasks which are more boosted (i.e. with higher util_min
      clamp) or less capped (i.e. with higher util_max clamp).
      
      Each task has a:
         task_struct::uclamp[clamp_id]::bucket_id
      to track the "bucket index" of the CPU's clamp bucket it refcounts while
      enqueued, for each clamp index (clamp_id).
      
      Each CPU's rq has a:
         rq::uclamp[clamp_id]::bucket[bucket_id].tasks
      to track how many tasks, currently RUNNABLE on that CPU, refcount each
      clamp bucket (bucket_id) of a clamp index (clamp_id).
      
      Each CPU's rq also has a:
         rq::uclamp[clamp_id]::bucket[bucket_id].value
      to track the clamp value of each clamp bucket (bucket_id) of a clamp
      index (clamp_id).
      
      The rq::uclamp[clamp_id]::bucket[] array is scanned every time we need
      to find a new MAX aggregated clamp value for a clamp_id. This operation
      is required only when we dequeue the last task of a clamp bucket
      tracking the current MAX aggregated clamp value. In these cases, the CPU
      is either entering IDLE or going to schedule a less boosted or more
      clamped task.
      The expected number of different clamp values, configured at build time,
      is small enough to fit the full unordered array into a single cache
      line.
      
      Add the basic data structures required to refcount, in each CPU's rq,
      the number of RUNNABLE tasks for each clamp bucket. Add also the max
      aggregation required to update the rq's clamp value at each
      enqueue/dequeue event.
      
      Use a simple linear mapping of clamp values into clamp buckets.
      Pre-compute and cache bucket_id to avoid integer divisions at
      enqueue/dequeue time.
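
The data structures and the max aggregation described above can be modelled in user space as shown below, for a single clamp index. The type and function names are hypothetical, the bucket count of 5 is an assumed build-time setting, and each bucket is assumed to enforce the lowest clamp value that maps to it.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE	1024u
#define UCLAMP_BUCKETS		5u	/* assumed build-time bucket count */
#define UCLAMP_BUCKET_DELTA \
	((SCHED_CAPACITY_SCALE + UCLAMP_BUCKETS / 2) / UCLAMP_BUCKETS)

struct uclamp_bucket_model {
	unsigned int value;	/* clamp value enforced by this bucket */
	unsigned int tasks;	/* RUNNABLE tasks refcounting it */
};

struct uclamp_rq_model {
	unsigned int value;	/* current MAX aggregated clamp value */
	struct uclamp_bucket_model bucket[UCLAMP_BUCKETS];
};

/* Linear mapping of a clamp value to a bucket index; done once and cached. */
unsigned int bucket_id_of(unsigned int clamp_value)
{
	unsigned int id = clamp_value / UCLAMP_BUCKET_DELTA;

	return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
}

void uclamp_rq_init(struct uclamp_rq_model *rq)
{
	for (unsigned int id = 0; id < UCLAMP_BUCKETS; id++) {
		rq->bucket[id].value = id * UCLAMP_BUCKET_DELTA;
		rq->bucket[id].tasks = 0;
	}
	rq->value = 0;
}

/* Enqueue: take a reference and raise the aggregated value if needed. */
void uclamp_rq_inc(struct uclamp_rq_model *rq, unsigned int bucket_id)
{
	struct uclamp_bucket_model *b = &rq->bucket[bucket_id];

	b->tasks++;
	if (b->value > rq->value)
		rq->value = b->value;
}

/* Dequeue: rescan only when the bucket holding the max becomes empty. */
void uclamp_rq_dec(struct uclamp_rq_model *rq, unsigned int bucket_id)
{
	struct uclamp_bucket_model *b = &rq->bucket[bucket_id];

	assert(b->tasks > 0);
	if (--b->tasks > 0 || b->value < rq->value)
		return;

	rq->value = 0;
	for (unsigned int id = UCLAMP_BUCKETS; id-- > 0; ) {
		if (rq->bucket[id].tasks) {
			rq->value = rq->bucket[id].value;
			break;
		}
	}
}
```

The scan runs from the top bucket down and stops at the first non-empty one, so the common enqueue and dequeue paths touch a single bucket and only the "last task of the max bucket" case walks the array.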
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      ---
      Changes in v7:
       Message-ID: <20190123191007.GG17749@hirez.programming.kicks-ass.net>
       - removed buckets mapping code
       - use a simpler linear mapping of clamp values into buckets
       Message-ID: <20190124161443.lv2pw5fsspyelckq@e110439-lin>
       - move this patch to the beginning of the series,
         in an attempt to make the overall series easier to digest by moving
         the core bits and main data structures to the very beginning
       Others:
       - update the mapping logic to use exactly and only
         UCLAMP_BUCKETS_COUNT buckets, i.e. no more "special" bucket
       - update uclamp_rq_update() to do top-bottom max search
    • Quentin Perret's avatar
    • Quentin Perret's avatar
      24ae4bab
    • Quentin Perret's avatar
      PM / EM: Expose the Energy Model in debugfs · 8a3618d4
      Quentin Perret authored and Dietmar Eggemann committed
      
      
      The recently introduced Energy Model (EM) framework manages power cost
      tables of CPUs. These tables are currently only visible from kernel
      space. However, in order to debug the behaviour of subsystems that use
      the EM (EAS for example), it is often required to know what the power
      costs are from userspace.
      
      For this reason, introduce under /sys/kernel/debug/energy_model a set of
      directories representing the performance domains of the system. Each
      performance domain contains a set of sub-directories representing the
      different capacity states (cs) and their attributes, as well as a file
      exposing the related CPUs.
      
      The resulting hierarchy is as follows on Arm Juno r0 for example:
      
          /sys/kernel/debug/energy_model
          ├── pd0
          │   ├── cpus
          │   ├── cs:450000
          │   │   ├── cost
          │   │   ├── frequency
          │   │   └── power
          │   ├── cs:575000
          │   │   ├── cost
          │   │   ├── frequency
          │   │   └── power
          │   ├── cs:700000
          │   │   ├── cost
          │   │   ├── frequency
          │   │   └── power
          │   ├── cs:775000
          │   │   ├── cost
          │   │   ├── frequency
          │   │   └── power
          │   └── cs:850000
          │       ├── cost
          │       ├── frequency
          │       └── power
          └── pd1
              ├── cpus
              ├── cs:1100000
              │   ├── cost
              │   ├── frequency
              │   └── power
              ├── cs:450000
              │   ├── cost
              │   ├── frequency
              │   └── power
              ├── cs:625000
              │   ├── cost
              │   ├── frequency
              │   └── power
              ├── cs:800000
              │   ├── cost
              │   ├── frequency
              │   └── power
              └── cs:950000
                  ├── cost
                  ├── frequency
                  └── power
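
A hierarchy like the one above could be collected from userspace with a short script. The sketch below is not part of the patch: the function name `read_energy_model` is hypothetical, and it assumes each file holds a single value that can be read as text. It relies only on the directory layout shown in the tree.

```python
from pathlib import Path

# Per-capacity-state files shown in the tree above.
ATTRS = ("frequency", "power", "cost")


def read_energy_model(root="/sys/kernel/debug/energy_model"):
    """Return {pd_name: {"cpus": str, "states": [dict, ...]}}.

    Capacity states are ordered by the number after "cs:" in the
    directory name, so cs:450000 comes before cs:1100000.
    File contents are returned as stripped strings (assumed format).
    """
    model = {}
    for pd in sorted(Path(root).glob("pd*")):
        cs_dirs = sorted(
            pd.glob("cs:*"),
            key=lambda d: int(d.name.split(":", 1)[1]),
        )
        model[pd.name] = {
            "cpus": (pd / "cpus").read_text().strip(),
            "states": [
                {attr: (cs / attr).read_text().strip() for attr in ATTRS}
                for cs in cs_dirs
            ],
        }
    return model
```

Reading debugfs normally requires root and a mounted debugfs. The `root` parameter lets the function be pointed at a copy of the tree instead.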
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8a3618d4
    • Paul E. McKenney's avatar
      arm: Use common outgoing-CPU-notification code · 72754198
      Paul E. McKenney authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This commit replaces the open-coded CPU-offline notification with new
      common code.  In particular, this change avoids calling scheduler code
      using RCU from an offline CPU that RCU is ignoring.  This is a minimal
      change.  A more intrusive change might invoke the cpu_check_up_prepare()
      and cpu_set_state_online() functions at CPU-online time, which would
      allow onlining to throw an error if the CPU did not go offline properly.
      
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Russell King <linux@arm.linux.org.uk>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    • Dietmar Eggemann's avatar
      arm: fix a migrating irq bug when hotplug cpu · bafafc36
      Dietmar Eggemann authored
      
      
      Arm TC2 fails cpu hotplug stress test.
      
      This issue was tracked down to a missing copy of the new affinity
      cpumask for the vexpress-spc interrupt into struct
      irq_common_data.affinity when the interrupt is migrated in
      migrate_one_irq().
      
      Fix it by replacing the arm specific hotplug cpu migration with the
      generic irq code.
      
      This is the counterpart implementation to commit 217d453d ("arm64:
      fix a migrating irq bug when hotplug cpu").
      
      Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus
      CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y).
      The vexpress-spc interrupt (irq=22) on this board is affine to CPU0.
      Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when
      CPU0 is hotplugged out.
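
One way to check an affinity change like this from userspace is to read the interrupt's CPU list (for example /proc/irq/22/smp_affinity_list) before and after the hotplug operation and compare the two sets. The helper below is a sketch for that purpose and is not part of the patch. The name `parse_cpulist` is hypothetical, and it handles only the plain list format with comma-separated CPUs and ranges such as "0" or "1-4".

```python
def parse_cpulist(text):
    """Parse a CPU list such as "0", "1-4" or "0,2-3" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a trailing comma
        first, _, last = part.partition("-")
        # A single CPU ("3") is treated as the range 3-3.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus
```

For the case described above, the set would be expected to go from {0} to {1, 2, 3, 4} once CPU0 has been hotplugged out.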
      
      Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      bafafc36
    • Valentin Schneider's avatar
      arm64: defconfig: Update UFSHCD for Hi3660 soc · 9818a925
      Valentin Schneider authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Commit 7ee7ef24 ("scsi: arm64: defconfig: enable configs for
      Hisilicon ufs") enabled the Hisilicon UFS, which means that we can
      flash a rootfs to the on-board flash. However, as it stands, the
      kernel gets stuck on:
      
      [    3.360733] Waiting for root device /dev/sdd10...
      
      That seems to be because even though we have SCSI_UFS_HISI=y,
      SCSI_UFSHCD and SCSI_UFSHCD_PLATFORM are set to 'm', which means the
      required drivers won't be built-in.
      
      We need those to load the rootfs and then load the modules, so set
      them as built-ins.
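
A quick way to spot this kind of mismatch is to scan a kernel .config for options that the root device depends on but that are not built in. The following is a minimal sketch, not part of the patch. The function name `not_builtin` is hypothetical, and the option list passed to it is simply the three symbols named in the message above.

```python
def not_builtin(config_text, options):
    """Return {option: state} for every option that is not set to 'y'.

    config_text is the content of a kernel .config file and options are
    symbol names without the CONFIG_ prefix. Options that are commented
    out or absent are reported as "unset".
    """
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            values[key[len("CONFIG_"):]] = value
    return {
        opt: values.get(opt, "unset")
        for opt in options
        if values.get(opt) != "y"
    }
```

Run against the defconfig result described above, this would report SCSI_UFSHCD and SCSI_UFSHCD_PLATFORM as 'm', which is the situation the patch corrects.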
      
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    • Chen Feng's avatar
      reset: hisi-reboot: adb reboot bootloader · d0067fba
      Chen Feng authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
      d0067fba
    • Daniel Lezcano's avatar
      cpuidle: Fix NULL driver checking · 4c26dbe8
      Daniel Lezcano authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      4c26dbe8
    • John Stultz's avatar
      dts: hikey960: Fix bootwarning on mapping reboot reason syscon · a4491afb
      John Stultz authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      a4491afb
    • Chen Jun's avatar
      arm64: dts: hi3660: adb reboot node · fa24c24e
      Chen Jun authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Add "hisilicon,hi3660-reboot" node for hi3660.
      
      Eventually, when we've transitioned to UEFI, this can be dropped,
      as we can then use syscon-reboot-mode.
      
      Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
      Signed-off-by: Chen Jun <chenjun14@huawei.com>
      fa24c24e
    • Valentin Schneider's avatar
      HACK: usb: dwc3: Tie USB_ROLE_SWITCH to USB_DWC3_DUAL_ROLE · 5b55ebad
      Valentin Schneider authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This or something cleaner will land in v3, but in the meantime we need
      either this or a defconfig tweak, so I'd rather go for this.
      
      Not-signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      5b55ebad
    • Valentin Schneider's avatar
      arm64: defconfig: Add USB support for HiKey960 · 2d8014f0
      Valentin Schneider authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      2d8014f0
    • Yu Chen's avatar
      dts: hi3660: Add support for usb on Hikey960 · a9430f1b
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This patch adds support for usb on Hikey960.
      
      Cc: Wei Xu <xuwei5@hisilicon.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      a9430f1b
    • Yu Chen's avatar
      usb: gadget: Add configfs attribute for controlling match_existing_only · d71a473d
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      Currently the "match_existing_only" field of usb_gadget_driver in
      configfs is hard-coded to one, which is not flexible.
      The dwc3 UDC is removed when the USB core switches to host mode. This
      causes writing the name of the dwc3 UDC to the configfs UDC attribute
      to fail.
      To fix this, we need a way to change the "match_existing_only" setting.
      Some systems, such as Android, do not support udev, so adding a
      "match_existing_only" attribute that the user can configure costs little.
      This patch adds a configfs attribute for controlling match_existing_only,
      which allows the user to configure "match_existing_only".
      
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      d71a473d
    • Yu Chen's avatar
      hikey960: Support usb functionality of Hikey960 · 842ccae6
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This driver handles USB hub power-on and Type-C port events on the HiKey960 board:
      1) Switch DP&DM between the USB hub and the Type-C port based on the
         Type-C port state
      2) Control power of the USB hub on HiKey960
      3) Control VBUS of the Type-C port
      
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      842ccae6
    • Yu Chen's avatar
      usb: dwc3: Registering a role switch in the DRD code. · 4b585169
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      The Type-C drivers use the USB role switch API to inform the
      system about the negotiated data role, so register a role
      switch in the DRD code in order to support platforms with
      USB Type-C connectors.
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      4b585169
    • Yu Chen's avatar
      usb: roles: Add usb role switch notifier. · 8b2824dd
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This patch adds a notifier for drivers that want to be informed of the USB role switch.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      8b2824dd
    • Yu Chen's avatar
      phy: Add usb phy support for hi3660 Soc of Hisilicon · 12fdd56f
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This driver handles usb phy power on and shutdown for hi3660 Soc of
      Hisilicon.
      
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Kishon Vijay Abraham I <kishon@ti.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Pengcheng Li <lpc.li@hisilicon.com>
      Cc: Jianguo Sun <sunjianguo1@huawei.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Jiancheng Xue <xuejiancheng@hisilicon.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      12fdd56f
    • Yu Chen's avatar
      usb: dwc3: Add two quirks for Hisilicon Kirin Soc Platform · d5c17c60
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      There are two quirks for the DesignWare USB3 DRD Core of the Hisilicon Kirin SoC:
      1) SPLIT_BOUNDARY_DISABLE should be set for host mode
      2) A GCTL soft reset should be executed when switching modes
      
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      d5c17c60
    • Yu Chen's avatar
      usb: dwc3: dwc3-of-simple: Add support for dwc3 of Hisilicon Soc Platform · 01a875cd
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      This patch adds support for the poweron and shutdown of dwc3 core
      on Hisilicon Soc Platform.
      01a875cd
    • Yu Chen's avatar
      dt-bindings: misc: Add bindings for HiSilicon usb hub and data role switch... · 18d4d25e
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      dt-bindings: misc: Add bindings for HiSilicon usb hub and data role switch functionality on HiKey960
      
      This patch adds binding documentation to support usb hub and usb
      data role switch of Hisilicon HiKey960 Board.
      
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      18d4d25e
    • Yu Chen's avatar
      dt-bindings: phy: Add support for HiSilicon's hi3660 USB PHY · e94ba02f
      Yu Chen authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This patch adds binding documentation for supporting the hi3660 usb
      phy on boards like the HiKey960.
      
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      e94ba02f
    • Heikki Krogerus's avatar
      device connection: Add fwnode member to struct device_connection · 772331be
      Heikki Krogerus authored and Dietmar Eggemann's avatar Dietmar Eggemann committed
      
      
      This will prepare the device connection API for connections
      described in firmware.
      
      Acked-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Reviewed-by: Jun Li <jun.li@nxp.com>
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit 09aa11cf)
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      772331be