- May 14, 2020
- May 13, 2020
-
Save some memory and speed up selection by calling function for the print/bprint/bputs dataframes.
-
Use the pandas categorical dtype as an intermediate type to deduplicate the strings to decode. Since decoding is expensive, decoding each distinct string only once is faster, and the result also uses less memory since the strings are deduplicated.
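A minimal sketch of the idea, assuming the raw column holds bytes objects; decode_bytes_series() is a hypothetical helper, not the function used in LISA:

```python
import pandas as pd


def decode_bytes_series(ser: pd.Series, encoding: str = "utf-8") -> pd.Series:
    """Decode a Series of bytes, decoding each distinct value only once."""
    # Converting to category stores each distinct bytes value once in
    # .cat.categories, with small integer codes pointing into it.
    cat = ser.astype("category")
    # Decode only the categories (the unique values), not every row.
    return cat.cat.rename_categories(
        [b.decode(encoding) for b in cat.cat.categories]
    )
```

The result stays categorical, so the memory saving from deduplication is kept after decoding.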
-
Douglas Raillard authored
Speed up trace parsing
-
Douglas RAILLARD authored
* Accept a non-strict superset of the checked fields, i.e. an event with only one "cpu" field will be accepted.
* Accept a lone "pid" field in addition to the "pid" and "comm" pair.
* Add a "cpu_id" field as an alternative to "cpu".
-
Douglas RAILLARD authored
-
Douglas RAILLARD authored
Use the actual field name "cpu_id" rather than the renamed version.
-
Douglas RAILLARD authored
Add a CPU capacity signal and stop using the raw cpu_capacity event in favor of the function with renamed columns.
-
Douglas RAILLARD authored
Allow filtering on a list of CPUs.
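A sketch of what filtering on a list of CPUs can look like with pandas, assuming a dataframe with a "cpu" column; filter_cpus() is a hypothetical name:

```python
import pandas as pd


def filter_cpus(df: pd.DataFrame, cpus) -> pd.DataFrame:
    """Keep rows whose "cpu" column is one of `cpus` (a single int or a list)."""
    # Accept a single CPU for backward compatibility with int-only callers.
    if isinstance(cpus, int):
        cpus = [cpus]
    # isin() builds a boolean mask matching any of the requested CPUs.
    return df[df["cpu"].isin(cpus)]
```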
-
Douglas RAILLARD authored
Similarly to add_target_src(), add a source by inferring data from a Trace object. This allows:
* Using heuristics based on the content of the trace if only_reliable==False
* Using metadata extracted from the trace file and made available by the parser, since this can include information recorded about the platform.
-
Douglas RAILLARD authored
-
Douglas RAILLARD authored
Since it's hardcoded to 1 in the kernel, avoid parsing it to save memory.
-
Douglas RAILLARD authored
Since df_events() has been renamed to df_event(), add a compatibility stub under the old name.
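One common way to write such a stub is a thin wrapper that warns and forwards to the new method. This is a sketch only; deprecated_alias() and the placeholder Trace class are hypothetical stand-ins, not LISA's code:

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Expose new_func under old_name, emitting a DeprecationWarning on use."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated, use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


class Trace:
    def df_event(self, event):
        # Placeholder body standing in for the real dataframe builder.
        return f"dataframe of {event}"

    # Old plural name kept as a deprecated alias of the new method.
    df_events = deprecated_alias(df_event, "df_events")
```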
-
Douglas RAILLARD authored
Since that function only gives the dataframe of one event at a time, remove the "s" from its name. Note: this commit was 100% generated using the following command: `sed --follow-symlinks -i 's/df_events/df_event/g' {lisa,doc}/**.py doc/**.rst ipynb/**ipynb tools/*.py`
-
Douglas RAILLARD authored
BREAKING CHANGE Since the scope of sanitization functions has been reduced to formatting data so that it matches what is used inside the kernel, remove column renaming. This was already done in analysis methods anyway, which are much better suited to having ad-hoc parameters tailored to a specific event.
-
Douglas RAILLARD authored
If needed, this processing will be reintroduced as part of an analysis, as the sanitization is only there to provide values similar to what the kernel manipulates. Column name modifications or arbitrary augmentation of the data need to be done in separate functions, with custom parameters driving their behavior, and with the ability to deprecate them if necessary.
-
Douglas RAILLARD authored
Isolate event sanitization that is likely to be removed soon.
-
Douglas RAILLARD authored
Remove the renaming logic from Trace sanitization and remove the renaming that turned out to be a no-op.
-
Douglas RAILLARD authored
BREAKING CHANGE Move more advanced data processing to trace analysis for the following events:
* funcgraph_entry: trace.analysis.functions.df_funcgraph('entry')
* funcgraph_exit: trace.analysis.functions.df_funcgraph('exit')
-
Douglas RAILLARD authored
BREAKING CHANGE Rename into JSONStatsFunctionsAnalysis to free up the name for an ftrace-based analysis.
-
Douglas RAILLARD authored
BREAKING CHANGE Consolidate the cleanup logic in FrequencyAnalysis.df_cpus_frequency(). To get the cleaned-up version, use trace.analysis.frequency.df_cpus_frequency() instead of trace.df_events('cpu_frequency'), since the latter is the raw version.
-
Douglas RAILLARD authored
Return a cleaned-up dataframe with:
* 4294967295 replaced with -1
* the cpu_id column replaced by the more widely used "cpu"
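A sketch of this cleanup in pandas. 4294967295 is 2**32 - 1, presumably a -1 stored in an unsigned 32-bit field; clean_cpu_frequency() is a hypothetical name, and for simplicity the sketch replaces the value in every column:

```python
import pandas as pd


def clean_cpu_frequency(df: pd.DataFrame) -> pd.DataFrame:
    # replace() and rename() both return new dataframes, so the input
    # (possibly a cached raw dataframe) is left untouched.
    df = df.replace(4294967295, -1)
    return df.rename(columns={"cpu_id": "cpu"})
```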
-
Douglas RAILLARD authored
GroupBy.first() raises some unexpected exceptions when used on categorical data, so use head(1) instead. Since head() keeps the original index rather than indexing the result by group key, some of the surrounding code has to be updated.
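The difference between the two approaches on a categorical column, shown on made-up data with a timestamp-like index:

```python
import pandas as pd

df = pd.DataFrame(
    {"comm": pd.Categorical(["sh", "bash", "sh", "bash"]), "pid": [10, 20, 11, 21]},
    index=[0.1, 0.2, 0.3, 0.4],  # timestamps
)

# head(1) keeps the first row of each group with its original (timestamp)
# index, whereas first() would return one row per group indexed by "comm".
first_rows = df.groupby("comm", observed=True).head(1)
```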
-
Douglas RAILLARD authored
Use the category dtype for the "__comm" and "comm" columns, since they are heavily used strings. This saves a lot of memory at the expense of groupby() speed (10x slowdown), while filtering gets slightly faster (1.1x speedup). The sched_switch dataframe also uses category for the prev_comm and next_comm columns, achieving a 5x reduction in memory consumption.
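Why the saving happens: a category column stores each distinct string once plus a small integer code per row. The data below is made up, and actual ratios depend on how repetitive the column is:

```python
import pandas as pd

# Many rows but only a handful of distinct task names, as in a "comm" column.
comms = pd.Series(["kworker/0:1", "swapper/0", "sh"] * 10_000)

# deep=True accounts for the memory of the Python string objects themselves.
as_object = comms.memory_usage(deep=True)
as_category = comms.astype("category").memory_usage(deep=True)
```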
-
Douglas RAILLARD authored
Prepare for using the category dtype, for which observed=True must be passed to get the expected behavior.
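With a categorical grouper, pandas groupby() also emits groups for categories that never appear in the data unless observed=True is passed. A small example with made-up data:

```python
import pandas as pd

comm = pd.Series(["sh", "bash", "sh"], dtype=pd.CategoricalDtype(["sh", "bash", "init"]))
df = pd.DataFrame({"comm": comm, "runtime": [1, 2, 3]})

# observed=False also emits a row for the unused "init" category.
all_cats = df.groupby("comm", observed=False)["runtime"].sum()
# observed=True only keeps categories that actually appear in the data,
# matching the behavior of grouping on plain strings.
seen = df.groupby("comm", observed=True)["runtime"].sum()
```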
-
Douglas RAILLARD authored
LISA is now able to parse traces in linear time and memory, with lower peak memory usage and shorter parse time.
-
Douglas RAILLARD authored
Since meta events cannot be queried directly in "trace-cmd start", replace such events with the source events that will allow getting them.
-
Douglas RAILLARD authored
Since it's a userspace event generated by writing to trace_marker, prefix the event name with "userspace@".
-
Douglas RAILLARD authored
Userspace events provided through trace_marker need to be prefixed with "userspace@" in order to trigger the meta event parser.
-
Douglas RAILLARD authored
BREAKING CHANGE Parse text traces based on the output of `trace-cmd report -R`. Also add some convenience parsers for the human-readable format.

Quick & dirty events based on calling trace_printk() in the kernel now need to be prefixed with "trace_printk@" when parsed in lisa:
  kernel: trace_printk("foo: field1=42 field2=hello world")
  lisa  : trace.df_events('trace_printk@foo')

Matching on the calling function name can be achieved with:
  kernel: (called from function foo()) trace_printk("field1=42 field2=hello world")
  lisa  : trace.df_events('trace_printk@func@foo')

Similarly, events generated from userspace by writing to /sys/kernel/debug/tracing/trace_marker need to be prefixed:
  user: echo "foo: field1=42 field2=hello world" > /sys/kernel/debug/tracing/trace_marker
  lisa: trace.df_events('userspace@foo')

Note: The format is expected to follow the raw format of `trace-cmd report -R`, as in the example.

Also, event field names now reflect the names they have in the kernel. Since trappy renames fields unconditionally, client code may need updating. The recommended way is to use the high-level df_* functions from the analyses where applicable, since they give friendlier names and can usually cope with multiple formats of an event across kernel versions if necessary.
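The payload format in these examples ("foo: field1=42 field2=hello world") lets values contain spaces, so a naive split on whitespace is not enough. A hypothetical sketch of splitting such a payload, not LISA's parser; it assumes values never contain a "word=" sequence themselves:

```python
import re

# A field starts at "<identifier>=" and its value runs until the next one.
_FIELD_RE = re.compile(r"(\w+)=")


def parse_meta_event(line: str):
    """Split "name: key=value key=value" into (name, {key: value})."""
    name, _, payload = line.partition(": ")
    matches = list(_FIELD_RE.finditer(payload))
    fields = {}
    # Pair each match with the next one to find where its value ends.
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(payload)
        fields[m.group(1)] = payload[m.end():end].strip()
    return name, fields
```

The event name returned here would then be combined with a prefix such as "userspace@" or "trace_printk@" to form the name passed to trace.df_events().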
-
Align some private method names on a common naming convention.
-
Since potential parallel parsing would be achieved at the parser level, having it in the Trace class is useless.
-
Since the input is the cached raw dataframe, it must not be modified in any way by the sanitization functions.
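In pandas terms this usually means working on a copy before assigning columns. A sketch with a hypothetical sanitization step (scaling a made-up "frequency" column):

```python
import pandas as pd


def sanitize_example(df: pd.DataFrame) -> pd.DataFrame:
    # The input may be a cached raw dataframe shared with other callers,
    # so build the result on a copy instead of mutating it in place.
    df = df.copy()
    df["frequency"] = df["frequency"] * 1000
    return df
```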
-
If prev_state is a string, use the newly introduced TaskState.from_sched_switch_str() factory to convert it to an integer.
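A hypothetical sketch of such a conversion, not the actual TaskState.from_sched_switch_str() implementation. The bit values are assumed from the Linux task state flags (TASK_INTERRUPTIBLE=0x1, TASK_UNINTERRUPTIBLE=0x2, ...) and may differ between kernel versions:

```python
# Assumed mapping from sched_switch state letters to task state bits.
_STATE_BITS = {
    "R": 0x0,   # TASK_RUNNING
    "S": 0x1,   # TASK_INTERRUPTIBLE
    "D": 0x2,   # TASK_UNINTERRUPTIBLE
    "T": 0x4,   # __TASK_STOPPED
    "t": 0x8,   # __TASK_TRACED
    "X": 0x10,  # EXIT_DEAD
    "Z": 0x20,  # EXIT_ZOMBIE
}


def state_str_to_int(state: str) -> int:
    """OR together the bits of each known letter, e.g. "S|D" -> 0x3."""
    value = 0
    for letter in state:
        # Separators such as "|" and markers such as "+" in "R+" are ignored.
        value |= _STATE_BITS.get(letter, 0)
    return value
```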
-
Douglas RAILLARD authored
Since that column is not guaranteed to be available with all trace parsers, do not refer to it in an analysis.
-
Douglas RAILLARD authored
Just return the kernel event dataframe if there is no devlib event.
-
Douglas RAILLARD authored
Fix a typo that prevented the devlib event from being used.
-
When prune=False, do not remove the rows of the duplicate group, but replace the values in output_col with the series returned by the callback instead.
-
Douglas RAILLARD authored
Convert a pandas Series to a given dtype, with a best effort strategy.
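One possible best-effort strategy, sketched with a hypothetical helper: try a direct astype(), then fall back to numeric coercion, and keep the coerced result if the target dtype still cannot hold it:

```python
import pandas as pd


def series_convert_best_effort(ser: pd.Series, dtype) -> pd.Series:
    """Hypothetical helper: convert to dtype if possible, degrade gracefully otherwise."""
    try:
        return ser.astype(dtype)
    except (TypeError, ValueError):
        pass
    # Numeric fallback: unparsable values become NaN instead of raising.
    converted = pd.to_numeric(ser, errors="coerce")
    try:
        return converted.astype(dtype)
    except (TypeError, ValueError):
        # e.g. NaN cannot be stored in a plain integer dtype: keep the float result.
        return converted
```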
-
Douglas RAILLARD authored
take(): Lazily take the first/last N items of an iterable.
consume(): Consume the first N items of an iterable, or all of it.
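A sketch of both helpers built on the standard itertools recipes; the `last` parameter is a hypothetical way of selecting the last N items. Note that taking the last N items has to exhaust the iterable, so only the first-N case is truly lazy:

```python
import collections
import itertools


def take(n, iterable, last=False):
    """Return an iterator over the first n items, or over the last n items."""
    if last:
        # The last items are only known once the iterable is exhausted;
        # a bounded deque keeps memory at O(n).
        return iter(collections.deque(iterable, maxlen=n))
    return itertools.islice(iterable, n)


def consume(iterator, n=None):
    """Advance the iterator by n items, or exhaust it if n is None."""
    if n is None:
        # A zero-length deque discards everything at C speed.
        collections.deque(iterator, maxlen=0)
    else:
        next(itertools.islice(iterator, n, n), None)
```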
-