Commits · f245e12b82da6141cfc62b1581deaf6ba4dc32eb · Beata Michalska / lisa

May 14, 2020
- README: Update to remove reference to trappy · f245e12b
  Douglas RAILLARD authored May 14, 2020 and Douglas Raillard committed May 14, 2020
  
  f245e12b
May 13, 2020

lisa.trace: use category for func col of print/bprint/bputs df · 61379240
Douglas RAILLARD authored May 13, 2020 and Douglas Raillard committed May 13, 2020
```
Save some memory and speed up selection based on function for the
print/bprint/bputs dataframes.
```
61379240

lisa.datautils: Speed up the conversion from strings to bytes · d39d6169

Douglas RAILLARD authored May 13, 2020 and Douglas Raillard committed May 13, 2020

Use pandas categorical dtype as an intermediate type to deduplicate the
strings to decode. Since decoding is expensive, this is faster, and the
result is also less memory hungry since strings are deduplicated.

d39d6169
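
The deduplication idea described in d39d6169 can be sketched as follows. This is only an illustration under assumptions: `encode_unique` is a hypothetical helper name, not the function in `lisa.datautils`, and it shows the string-to-bytes direction from the commit title. Each distinct string is converted once, and the per-row category codes are reused as-is.

```python
import pandas as pd


def encode_unique(series, encoding="utf-8"):
    """Hypothetical helper: encode each distinct string only once."""
    # Going through the category dtype deduplicates the strings:
    # every distinct value is stored once in .cat.categories.
    cat = series.astype("category")
    # Run the expensive conversion on the distinct values only.
    mapping = {s: s.encode(encoding) for s in cat.cat.categories}
    # Swap the categories for their encoded form; row codes are untouched.
    return cat.cat.rename_categories(mapping)


comms = pd.Series(["rt-app", "sshd", "rt-app", "rt-app"])
encoded = encode_unique(comms)
```

The result stays categorical, so the encoded values are themselves stored only once.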

Merge pull request #1399 from douglas-raillard-arm/_home_pr50 · 15c881c8
Douglas Raillard authored May 13, 2020
```
Speed up trace parsing
```
15c881c8

lisa.datautils: Improve SignalDesc.from_event() heuristic · a4b74411

Douglas RAILLARD authored May 13, 2020

* Accept non-strict superset of checked fields, i.e. an event with only
  one "cpu" field will be accepted.
* Accept a simple "pid" field in addition to both "pid" and "comm"
* Add "cpu_id" field as alternative to "cpu"

a4b74411

lisa.analysis.frequency: Use real names of clock_set_rate fields · 81138894
Douglas RAILLARD authored May 12, 2020

81138894
lisa.datautils: Fix cpu_frequency signal field · b5e2a3cc
Douglas RAILLARD authored May 12, 2020
```
Use the actual field name "cpu_id" rather than the renamed version.
```
b5e2a3cc
lisa.analysis.load_tracking: Add "capacity" signal · 83a032e2
Douglas RAILLARD authored May 12, 2020
```
Add capacity CPU signal and stop using raw cpu_capacity in favor of the
function with renamed columns.
```
83a032e2
lisa.analysis.load_tracking: Add df_cpus_signals(cpus=None) option · 8e0a8f8e
Douglas RAILLARD authored May 12, 2020
```
Allow filtering on a list of CPUs.
```
8e0a8f8e

lisa.platforms.platinfo: Add PlatformInfo.add_trace_src() · d862de44

Douglas RAILLARD authored May 11, 2020

Similarly to add_target_src(), add a source by inferring data from a
Trace object. This allows:

* Using heuristics based on the content of the trace if only_reliable==False
* Using metadata extracted from the trace file and made available by the
  parser, since this can include information recorded about the platform.

d862de44

lisa.datautils: Add signals for sched_pelt_* signals · fed77412
Douglas RAILLARD authored May 07, 2020

fed77412
lisa.trace: Remove "success" field of sched_wakeup · f0d02c5c
Douglas RAILLARD authored May 07, 2020
```
Since it's hardcoded to 1 in the kernel, avoid parsing it to save
memory.
```
f0d02c5c
lisa.trace: Add compat stub for Trace.df_events() · 2eb509a0
Douglas RAILLARD authored May 06, 2020
```
Since df_events() has been renamed to df_event(), add a compatibility
stub under the old name.
```
2eb509a0

lisa.trace: Rename Trace.df_events() into df_event() · 826b553f

Douglas RAILLARD authored May 06, 2020

Since that function only gives the dataframe of one event at a time,
remove the "s" from its name.

Note: this commit was 100% generated using the following command:
sed --follow-symlinks -i 's/df_events/df_event/g' {lisa,doc}/**.py doc/**.rst ipynb/**ipynb tools/*.py

826b553f

lisa.trace: Remove barely used Trace.df_events(rename_cols=True) option · 48d761f1

Douglas RAILLARD authored May 04, 2020

BREAKING CHANGE

Since the scope of sanitization functions has been reduced to
formatting data so that it matches what is used inside the kernel,
remove column renaming.

This was already done in analysis methods anyway, which are much better
suited to having ad-hoc parameters tailored to a specific event.

48d761f1

lisa.trace: Remove unused event sanitization · 930a0d5f

Douglas RAILLARD authored May 06, 2020

If needed, this processing will be reintroduced as part of an analysis,
as the sanitization is only there to provide values similar to what the
kernel manipulates.

Column name modifications or arbitrary augmentation of the data need
to be done in separate functions, with custom parameters driving their
behavior, and with the ability to deprecate them if necessary.

930a0d5f

lisa.trace: Bunch deprecated sanitization together · 990029b8
Douglas RAILLARD authored May 04, 2020
```
Isolate event sanitization that is likely to be removed soon.
```
990029b8
lisa.analysis.load_tracking: Move all column renaming logic in analysis · 6287c6a0
Douglas RAILLARD authored May 04, 2020
```
Remove the renaming logic from Trace sanitization and remove the
renaming that turned out to be a no-op.
```
6287c6a0

lisa.analysis.functions: Move ksym resolution from event sanitization to analysis · 551ec432

Douglas RAILLARD authored May 04, 2020

BREAKING CHANGE

Move more advanced data processing to trace analysis for the following
events:

    * funcgraph_entry: trace.analysis.functions.df_funcgraph('entry')
    * funcgraph_exit: trace.analysis.functions.df_funcgraph('exit')

551ec432

lisa.analysis.functions: Rename FunctionsAnalysis · 9789de3d

Douglas RAILLARD authored May 04, 2020

BREAKING CHANGE

Rename to JSONStatsFunctionsAnalysis to free up the name for an
ftrace-based analysis.

9789de3d

lisa.trace: Remove sanitization for cpu_frequency event · e7cb1cce

Douglas RAILLARD authored May 04, 2020

BREAKING CHANGE

Consolidate the cleanup logic in FrequencyAnalysis.df_cpus_frequency().
If you want the cleaned-up version, please use:
  trace.analysis.frequency.df_cpus_frequency()
instead of:
  trace.df_events('cpu_frequency')

since the latter is the raw version.

e7cb1cce

lisa.analysis.idle: Add df_cpu_idle() · 8582bf5f

Douglas RAILLARD authored May 04, 2020

Return a cleaned up dataframe with:

* 4294967295 replaced with -1
* cpu_id column replaced by the more widely used "cpu"

8582bf5f
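
A minimal sketch of the two cleanup steps listed in 8582bf5f, assuming the value to replace lives in a column named `state` (an assumption for illustration) and using a hypothetical `clean_cpu_idle` helper rather than the real `df_cpu_idle()`. 4294967295 is 2**32 - 1, i.e. -1 read back as an unsigned 32-bit integer.

```python
import pandas as pd

# Toy stand-in for a raw cpu_idle event dataframe (column names assumed).
raw = pd.DataFrame({
    "state": [0, 4294967295, 1],
    "cpu_id": [0, 0, 1],
})


def clean_cpu_idle(df):
    """Hypothetical helper mirroring the two cleanup steps."""
    # Work on a copy so the raw dataframe is left untouched.
    df = df.copy()
    # 4294967295 == 2**32 - 1, which is -1 stored as an unsigned 32-bit value.
    df["state"] = df["state"].replace(4294967295, -1)
    # Expose the CPU number under the more common "cpu" name.
    return df.rename(columns={"cpu_id": "cpu"})


cleaned = clean_cpu_idle(raw)
```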

lisa.trace: Avoid using GroupBy.first() · 720aeec2

Douglas RAILLARD authored May 01, 2020

GroupBy.first() raises some unexpected exceptions when used on
categorical data, so use head(1) instead.

Since head() resets the index, some of the surrounding code has to be updated.

720aeec2
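
As a rough illustration of the replacement described in 720aeec2, the snippet below selects the first row of each group with `head(1)` on a categorical key. The dataframe and column names are made up for the example. Unlike `first()`, the result of `head(1)` is not indexed by the group key but keeps the original row index, which is presumably the index difference the message refers to.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "comm": pd.Categorical(["a", "b", "a", "b"]),
        "value": [1, 2, 3, 4],
    },
    index=[0.1, 0.2, 0.3, 0.4],
)

# First row of each group; rows keep their original index labels.
first_rows = df.groupby("comm", observed=True).head(1)

# If the group key is wanted as the index, it has to be set explicitly.
by_comm = first_rows.set_index("comm")
```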

lisa.trace: Use category dtype for some columns · 3f51f8c6

Douglas RAILLARD authored May 01, 2020

Use category dtype for "__comm" and "comm" columns, since they are
heavily used strings. This saves a lot of memory, at the
expense of groupby() speed (x10 slowdown). Filtering speed is slightly
improved (x1.1 speed-up).

sched_switch dataframe also uses category for prev_comm and next_comm
columns, achieving a reduction of 5x in memory consumption.

3f51f8c6
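
The memory effect described in 3f51f8c6 can be observed on synthetic data with a sketch like the one below. The task names are invented, and the actual ratio depends on the data and the pandas version, so no figure is claimed here.

```python
import pandas as pd

# A column with many rows but only a few distinct task names.
comms = pd.Series(["kworker/0:1", "rt-app", "sshd"] * 10_000)
as_category = comms.astype("category")

# deep=True accounts for the memory of the string objects themselves.
string_bytes = comms.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# Filtering works the same way on the categorical column.
mask = as_category == "rt-app"
```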

lisa: Use pandas.DataFrame.groupby(observed=True) · 5b4b1c67
Douglas RAILLARD authored May 01, 2020
```
Prepare for using category dtype for which the expected behavior needs
observed=True to be passed.
```
5b4b1c67
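
To show why 5b4b1c67 passes `observed=True`, here is a small sketch on made-up data. When the grouping key is categorical, `observed=False` produces one group per declared category, including categories that never appear in the data, whereas `observed=True` only keeps the groups actually present.

```python
import pandas as pd

df = pd.DataFrame({
    # "c" is a declared category that never occurs in the rows.
    "comm": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "runtime": [1, 2, 3],
})

# Only the categories that actually occur form groups.
observed = df.groupby("comm", observed=True)["runtime"].sum()

# Every declared category forms a group, including the unused "c".
all_categories = df.groupby("comm", observed=False)["runtime"].sum()
```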

external: trappy: remove dependency · 2016192a

Douglas RAILLARD authored Apr 27, 2020

LISA is now able to parse traces in linear time and memory, with lower
peak memory usage and a shorter parse time.

2016192a

lisa.trace: Filter-out meta-events in FtraceCollector · ee1cb5ab

Douglas RAILLARD authored May 07, 2020

Since meta events cannot be queried directly in "trace-cmd start",
replace such events with the source events that will allow getting them.

ee1cb5ab

lisa.analysis.frequency: Rename devlib_cpu_frequency to userspace@... · cd0e22dd
Douglas RAILLARD authored Apr 29, 2020
```
Since it's a userspace event generated by writing to trace_marker,
prefix the event with "userspace@"
```
cd0e22dd

lisa: Rename rtapp events to userspace@... · 45eabace

Douglas RAILLARD authored Apr 29, 2020

Userspace events provided through trace_marker need to be prefixed with
"userspace@" in order to trigger the meta event parser.

45eabace

lisa.trace: Add text trace parser · aa61fcf2

Douglas RAILLARD authored Apr 27, 2020

BREAKING CHANGE

Parse text traces based on the output of `trace-cmd report -R`.
Also add a convenience parser for the human-readable format.

Quick & dirty events based on calling trace_printk() in the kernel now
need to be prefixed with "trace_printk@" when parsed in lisa:

 kernel: trace_printk("foo: field1=42 field2=hello world")
 lisa  : trace.df_events('trace_printk@foo')

Matching on calling function name can be achieved with:

 kernel: (called from function foo()) trace_printk("field1=42 field2=hello world")
 lisa  : trace.df_events('trace_printk@func@foo')

Similarly, events generated from userspace by writing to
/sys/kernel/debug/tracing/trace_marker need to be prefixed:

 user: echo "foo: field1=42 field2=hello world" > /sys/kernel/debug/tracing/trace_marker
 lisa: trace.df_events('userspace@foo')

Note: The format is expected to follow the raw format of
`trace-cmd report -R`, such as in the example.

Also, event field names now reflect the names they have in the
kernel. Since trappy has non-optional renaming of the fields, client
code might need updating. The recommended way is to use the high-level
df_* functions from the analysis where applicable, since they
give friendlier names and can usually cope with multiple formats of the
event across kernel versions if necessary.

aa61fcf2

lisa.trace: rename private methods of Trace · 5bc4732c
douglas-raillard-arm authored Apr 16, 2020 and Douglas RAILLARD committed May 13, 2020
```
Align the names of some private methods on a common naming convention.
```
5bc4732c
lisa.trace: remove dead code · 925b7ae0
douglas-raillard-arm authored Apr 16, 2020 and Douglas RAILLARD committed May 13, 2020
```
Since potential parallel parsing would be achieved at the parser level,
having it in the Trace class is useless.
```
925b7ae0
lisa.trace: Do not modify input in sanitization functions · 4d941716
douglas-raillard-arm authored Apr 15, 2020 and Douglas RAILLARD committed May 13, 2020
```
The input being the cached raw dataframe, it must not be modified in any
way by the sanitization functions.
```
4d941716
lisa.trace: Sanitize sched_switch prev_state if it's a string · 3db5b5cb
douglas-raillard-arm authored Apr 15, 2020 and Douglas RAILLARD committed May 13, 2020
```
If prev_state is a string, use the newly introduced
TaskState.from_sched_switch_str() factory to convert it to an integer.
```
3db5b5cb

lisa.analysis.rta: Remove reference to __line · 1e3f2382

Douglas RAILLARD authored Apr 23, 2020

Since that column is not guaranteed to be available with all trace
parsers, do not refer to it in an analysis.

1e3f2382

lisa.analysis.frequency: Avoid needing devlib_cpu_frequency in df_cpu_frequency() · 438e2e06
Douglas RAILLARD authored Apr 29, 2020
```
Just return the kernel event dataframe if there is no devlib event.
```
438e2e06
lisa.analysis.frequency: Use devlib_cpu_frequency event · 66629872
Douglas RAILLARD authored Apr 29, 2020
```
Fix a typo that prevented the devlib event from being used.
```
66629872
lisa.datautils: Add df_combine_duplicates(prune=True) · e0a54e65
douglas-raillard-arm authored Apr 15, 2020 and Douglas RAILLARD committed May 13, 2020
```
When prune=False, do not remove the rows of the duplicate group, but
replace the values in output_col with the series returned by the
callback instead.
```
e0a54e65
lisa.datautils: Add series_convert() · 6e83c688
Douglas RAILLARD authored May 01, 2020
```
Convert a pandas Series to a given dtype, with a best effort strategy.
```
6e83c688

lisa.utils: Add take() and consume() · 6dfc7e5d

Douglas RAILLARD authored Apr 16, 2020

take(): Lazily take the first/last N items of an iterable.
consume(): Consume the first N items of an iterable, or all of it.

6dfc7e5d
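
The behaviour summarized in 6dfc7e5d can be approximated with the standard library. The signatures below (argument order, the `from_end` flag) are assumptions made for this sketch and are not taken from `lisa.utils`.

```python
from collections import deque
from itertools import islice


def take(n, iterable, from_end=False):
    """Hypothetical sketch: iterate over the first (or last) n items."""
    if from_end:
        # A bounded deque keeps only the last n items while scanning.
        return iter(deque(iterable, maxlen=n))
    # islice pulls items on demand and stops after n of them.
    return islice(iterable, n)


def consume(iterator, n=None):
    """Hypothetical sketch: advance an iterator by n items, or exhaust it."""
    if n is None:
        # A zero-length deque drains the iterator without storing anything.
        deque(iterator, maxlen=0)
    else:
        # Jump to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


it = iter(range(10))
head = list(take(3, it))        # uses up 0, 1, 2
consume(it, 2)                  # skips 3, 4
after_skip = next(it)
tail = list(take(2, range(10), from_end=True))
consume(it)                     # drains whatever is left
```

Note that taking the last N items necessarily walks the whole iterable, so only the first-N case avoids reading past what is needed.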