Optimize ctags calls (d4141967) · Commits · Tooling / qa-tools

Verified Commit d4141967 authored Jun 11, 2025 by Sean Fitzgibbon
Optimize ctags calls

It's expensive to call ctags once per source file identified in the
DWARF data.  This patch:

- Calls ctags on a batch of source files at a time.  The '-L -'
  argument makes ctags read the file list from stdin, which avoids
  the risk of a huge command line hitting OS limits.  Similarly, we
  read a line at a time from the stdout pipe to avoid holding a huge
  amount of output in memory at once.

- Recovers the file paths from the lines output by ctags.  (It always
  prints them in '-x' mode, even for a single input file.)  This
  exposes us to spurious differences between the pathnames from ctags
  and objdump (e.g. a redundant '/./'), so we normalize them with
  os.path.relpath().  (This inaccuracy happens elsewhere in
  intermediate_layer.py, and will be addressed in a future patch.)

- Similarly, changes FunctionLineNumbers to handle a bunch of
  functions at once.  Even with the cache of ctags results, repeated
  calls to get_line_number() can be expensive.  For example, simply
  checking that the files are already cached takes linear time each
  call and can dominate the run time for some workloads.  This patch
  reduces the number of calls to two - one for functions with source
  information in the DWARF, one for functions without.

- Renames get_line_number() to get_definition_map() to reflect these
  changes.

- Renames 'workspace' to 'local_workspace' in FunctionLineNumbers for
  consistency with the rest of the program.  This is cosmetic, and has
  nothing to do with the rest of the patch.
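
The batching and path recovery described above can be sketched
roughly as follows.  This is a minimal illustration, not the code in
this patch: the function names (parse_ctags_x_line, group_definitions,
run_ctags_batch) are hypothetical, the only ctags flags used are the
'-x' and '-L -' mentioned above, and filtering on the "function" kind
is an assumption.  The sketch feeds the file list through a temporary
file attached to stdin so that writing the list cannot block while
ctags is already producing output.

```python
import os
import subprocess
import tempfile
from collections import defaultdict


def parse_ctags_x_line(line):
    """Split one 'ctags -x' line into (name, kind, line_number, path).

    Assumes neither the tag name nor the path contains whitespace
    (see LIMITATIONS).  Returns None for lines that do not fit.
    """
    fields = line.split(None, 4)  # name, kind, line, path, source text
    if len(fields) < 4 or not fields[2].isdigit():
        return None
    name, kind, lineno, path = fields[:4]
    return name, kind, int(lineno), path


def group_definitions(lines, local_workspace):
    """Map normalized path -> {function name: line number}."""
    definitions = defaultdict(dict)
    for line in lines:
        parsed = parse_ctags_x_line(line)
        if parsed is None:
            continue
        name, kind, lineno, path = parsed
        if kind != "function":  # assumption: only functions matter
            continue
        # relpath() also collapses redundant components such as '/./'.
        rel = os.path.relpath(path, local_workspace)
        definitions[rel][name] = lineno
    return dict(definitions)


def run_ctags_batch(paths, local_workspace):
    """Run ctags once for all paths, consuming stdout line by line."""
    with tempfile.TemporaryFile("w+") as file_list:
        file_list.write("".join(p + "\n" for p in paths))
        file_list.seek(0)
        with subprocess.Popen(["ctags", "-x", "-L", "-"],
                              stdin=file_list,
                              stdout=subprocess.PIPE,
                              text=True) as proc:
            definitions = group_definitions(proc.stdout, local_workspace)
    if proc.returncode != 0:
        raise RuntimeError("ctags exited with status %d" % proc.returncode)
    return definitions
```

Keeping the parsing in group_definitions() separate from the
subprocess call means it can be exercised on canned '-x' lines
without ctags installed.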

LIMITATIONS:

- Parsing the file name from ctags output will break for a path with
  embedded whitespace.  This risk is already present in the original
  code (because of shell=True), so it is assumed to be acceptable
  here.  Consider ctags JSON output if it becomes a problem.

- The error message for ctags failure appears to lose specificity
  because it no longer outputs each file name.  The likely intent was
  to report problems with particular missing, corrupt or unreadable
  files.  In practice, Exuberant ctags returns success in all these
  cases, even for a single file on the command line.  The only way I
  found to provoke a nonzero exit code was to pass a bad command line
  argument.  The point is that ctags will either fail for every file
  or none of them, so testing each exit code is not effective.  If
  file-specific errors need to be reported, consider capturing and
  logging lines from the stderr pipe.
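
If file-specific errors do need reporting, one possible shape for
capturing stderr is sketched below.  This is an assumption-laden
illustration rather than part of this patch: stream_with_stderr and
the logger name are hypothetical, and the command is passed in as an
argument list so the helper does not depend on any particular ctags
flags.  A background thread drains stderr so that neither pipe can
fill up and stall the child while stdout is being read.

```python
import logging
import subprocess
import threading

log = logging.getLogger("ctags")  # hypothetical logger name


def stream_with_stderr(argv, on_stdout_line):
    """Run argv, pass each stdout line to a callback, log stderr lines.

    Returns (exit_code, stderr_lines).
    """
    stderr_lines = []

    def drain(pipe):
        # Read stderr concurrently so the child never blocks on it.
        for line in pipe:
            line = line.rstrip("\n")
            stderr_lines.append(line)
            log.warning("%s", line)

    with subprocess.Popen(argv,
                          stdin=subprocess.DEVNULL,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          text=True) as proc:
        reader = threading.Thread(target=drain, args=(proc.stderr,))
        reader.start()
        for line in proc.stdout:
            on_stdout_line(line.rstrip("\n"))
        reader.join()
    return proc.returncode, stderr_lines
```

The caller can then decide whether a zero exit code with non-empty
stderr_lines should be treated as a warning or as a failure.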

Change-Id: I89c80a8ac181d6bc01351c0f4922941829a1ed5c
parent 2cd8ff5b