#!/usr/bin/perl -w

=head1 NAME

egypt - create call graph from gcc RTL dump

=head1 SYNOPSIS

 egypt [--omit function,function,...] [--include-external] <rtl-file>... | dotty -
 egypt [--omit function,function,...] [--include-external] <rtl-file>... | dot <dot-options>

=head1 DESCRIPTION

Egypt is a simple tool for creating call graphs of C programs.

=head1 OPTIONS

=over 8

=item --omit function,function,...

Omit the given functions from the call graph.  Multiple function names
may be given separated by commas.

=item --include-external

Include calls to external functions in the call graph.  A function is
considered external if it is not defined in any of the input files.
For example, functions in the standard C library are external.  Only
direct function calls will be displayed; there is no way to display
the action of taking the address of an external function.

=back

=head1 HOW IT WORKS

The two major tasks in creating a call graph are analyzing the syntax
of the source code to find the function calls and laying out the
graph, but Egypt actually does neither.  Instead, it delegates the
source code analysis to GCC and the graph layout to Graphviz, both of
which are better at their respective jobs than egypt could ever hope
to be.  Egypt itself is just a small Perl script that acts as
glue between these existing tools.

Egypt takes advantage of GCC's capability to dump an intermediate
representation of the program being compiled into a file (a I<RTL
file>); this file is much easier to extract information from than a C
source file.  Egypt extracts information about function calls from the
RTL file and massages it into the format used by Graphviz.
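
As a rough illustration of that extraction step, the following
self-contained Perl sketch turns a few RTL-like lines into Graphviz
input.  The sample lines are simplified stand-ins written for this
example, not verbatim GCC output; the helper name C<rtl_to_dot> is
made up; and only direct calls are handled, without the filtering of
undefined symbols that egypt performs:

    use strict;
    use warnings;

    # Turn RTL-like lines into dot source (simplified sketch).
    sub rtl_to_dot {
        my @lines = @_;
        my (%edges, $cur);
        foreach my $line (@lines) {
            if ($line =~ /^;; Function \S+ \((\S+?)[,)]/) {
                $cur = $1;                    # start of a function body
            } elsif (defined $cur and $line =~ /\(call.*"(.*)"/) {
                $edges{$cur}{$1} = 'solid';   # direct call
            }
        }
        my $dot = "digraph callgraph {\n";
        foreach my $caller (sort keys %edges) {
            foreach my $callee (sort keys %{ $edges{$caller} }) {
                my $style = $edges{$caller}{$callee};
                $dot .= "\"$caller\" -> \"$callee\" [style=$style];\n";
            }
        }
        return $dot . "}\n";
    }

    # Hypothetical, heavily simplified RTL excerpt.
    print rtl_to_dot(
        ';; Function main (main)',
        '(call (mem:QI (symbol_ref:DI ("helper"))))',
        ';; Function helper (helper)',
    );

For this input the sketch prints a graph with a single solid edge
from C<main> to C<helper>.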

=head1 GENERATING THE RTL FILE

Compile the program or source file you want to create a call graph for
with gcc, adding the option "-fdump-rtl-expand" to CFLAGS.  This
option causes gcc to dump its intermediate code representation of each
file it compiles into a file.  In old versions of GCC this option
was called "-dr", but GCC 4.4.0 and newer accept only the
"-fdump-rtl-expand" form.

For example, the following works for many programs:

   make clean
   make CFLAGS=-fdump-rtl-expand

Depending on the GCC version, the RTL file for a source file F<foo.c>
may be called something like F<foo.c.rtl>, F<foo.c.00.rtl>, or
F<foo.c.00.expand>.
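
Because the suffix varies, it can be convenient to collect the dump
files by pattern rather than by exact name.  A minimal sketch,
assuming the dumps sit in a single directory and end in F<.rtl> or
F<.expand> as in the names above (the helper name C<rtl_files> is
made up for this illustration):

    use strict;
    use warnings;

    # Return the files in $dir whose names look like RTL dumps.
    sub rtl_files {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "cannot open $dir: $!";
        my @found = sort grep { /\.(?:rtl|expand)$/ } readdir($dh);
        closedir($dh);
        return map { "$dir/$_" } @found;
    }

    # List the candidate dump files in the current directory.
    print "$_\n" foreach rtl_files('.');

The resulting file names can then be passed to egypt as command line
arguments.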

=head1 VIEWING THE CALL GRAPH


To view the call graph in an X11 window, run egypt with one or
more RTL files as command line arguments and pipe its output to the
B<dotty> program from the Graphviz package.  For example, if you
compiled F<foo.c> with C<gcc -fdump-rtl-expand> to
generate F<foo.c.00.expand>, use

    egypt foo.c.00.expand | dotty -

=head1 PRINTING THE CALL GRAPH

To generate a PostScript version of the call graph for printing, use
the B<dot> program from the Graphviz package.  For example, to generate
a call graph in the file F<callgraph.ps>, fitting everything on a US
letter size page in landscape mode, try

   egypt foo.c.00.rtl | dot -Grotate=90 -Gsize=11,8.5 -Tps -o callgraph.ps

Sometimes, the graph will fit better if function calls go from left to
right instead of top to bottom.  The B<dot> option B<-Grankdir=LR>
will do that:

   egypt foo.c.00.rtl | dot -Gsize=8.5,11 -Grankdir=LR -Tps -o callgraph.ps

For nontrivial programs, the graph may end up too small
to comfortably read.  If that happens, try N-up printing:

   egypt foo.c.00.rtl | dot -Gpage=8.5,11 -Tps -o callgraph.ps

You can also try playing with other B<dot> options such as B<-Gratio>,
or for a different style of graph, try using B<neato> instead of
B<dot>.  See the Graphviz documentation for more information about the
various options available for customizing the style of the graph.

=head1 READING THE CALL GRAPH

Function calls are displayed as solid arrows.  A dotted arrow means
that the function the arrow points from takes the address of the
function the arrow points to.

=head1 INDIRECT FUNCTION CALLS

Egypt does not display indirect function calls.  Doing that is
impossible in the general case: determining which functions will call
each other indirectly at runtime would require solving the halting
problem.

The dotted arrows generated by egypt are sometimes misunderstood to
represent indirect calls, but that's not the case; they represent
taking the address of a function, resulting in a function pointer.
That function pointer will typically be used to make an indirect
function call at some later time, but the call is not necessarily made
from the same function where the address was taken, and it is
generally not possible to determine where or even whether that call
will take place.

The dotted arrows may or may not be useful for understanding the
program structure depending on the particular style of programming
used.  One case where they are often useful is with event-driven
programs where a sequence of events is handled by a chain of callback
functions, each one registering the address of the next with the event
handling framework before returning to the event loop.  In such a
program, the dotted arrows will indicate which callbacks cause which
other callbacks to be invoked; such a graph may be more useful than
a graph of the actual indirect calls, which would just show the event
loop calling every callback.
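
The callback-chain pattern can be sketched as follows.  Egypt
analyzes C programs, not Perl; this Perl version only mirrors the
structure, with code references standing in for C function pointers,
and all names in it are hypothetical:

    use strict;
    use warnings;

    my @callbacks;    # stands in for the event handling framework

    sub on_done { return "done" }

    sub on_start {
        # The address of on_done is taken here; in the equivalent C
        # program, egypt would draw a dotted arrow on_start -> on_done.
        push @callbacks, \&on_done;
        return "started";
    }

    sub event_loop {
        my @log;
        while (my $cb = shift @callbacks) {
            # The indirect call happens here; egypt draws no arrow
            # for it.
            push @log, $cb->();
        }
        return @log;
    }

    push @callbacks, \&on_start;
    print join(",", event_loop()), "\n";

The dotted arrow from C<on_start> to C<on_done> records which callback
schedules which, even though every actual call is made from
C<event_loop>.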

=head1 C++ SUPPORT

Egypt provides limited support for C++.  When used with GCC version
4 or newer, egypt will automatically demangle C++ member function
names and display them in the native C++ syntax, e.g., C<C::f()>.
Egypt will not display virtual function calls, because there is no
easy way to determine which virtual function is being called
based on the RTL.

=head1 WHY IS IT CALLED EGYPT?

Egypt was going to be called B<rtlcg>, short for I<RTL Call Graph>,
but it turned out to be one of those rare cases where ROT13'ing the
name made it easier to remember and pronounce.

=head1 SEE ALSO

L<gcc>, L<dotty>, L<dot>, L<neato>

=head1 COPYRIGHT

Copyright 1994-2011 Andreas Gustafsson

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=head1 AUTHOR

Andreas Gustafsson

=cut

use strict;
use Getopt::Long;

use vars qw($VERSION);

$VERSION = "1.10";

# A data structure containing information about potential function
# calls.  This is a reference to a hash table where the key is
# the name of a function (the caller) and the value is a reference
# to another hash table indexed by the name of a symbol referenced
# by the caller (the potential callee) and a value of "call"
# (if the reference is a direct function call) or "ref"
# (if the reference is a non-function-call symbol reference;
# if the referenced symbol itself turns out to be a function,
# the caller is treated as taking the address of that function,
# which is drawn as a dotted arrow).

my $calls = { };

# A map from mangled C++ names to the corresponding demangled ones
my $demangle = { };

# The current function
my $curfunc;

# Functions to omit
my @omit = ();
# Whether to include direct calls to external functions
my $include_external = 0;

# Mapping from symbol reference types to dot styles
my $styles = {
    call => 'solid',
    ref => 'dotted'
};

sub demangle {
    my ($name) = @_;
    $name = $demangle->{$name} || $name;
    # Escape embedded quotes
    $name =~ s/\"/\\\"/g;
    return $name;
}

GetOptions('omit=s' => \@omit,
	   'include-external' => \$include_external);

@omit = split(/,/, join(',', @omit));

sub enter_func {
    my ($funcname) = @_;
    $curfunc = $funcname;
    $calls->{$curfunc} = { } if ! exists($calls->{$curfunc});
}

while (<>) {
    chomp;
    if (/^;; Function (\S+)\s*$/) {
	# pre-gcc4 style
	enter_func($1);
    } elsif (/^;; Function (.*)\s+\((\S+)(,.*)?\).*$/) {
	# gcc4 style
	# When compiling for ARM, the line can look like
	# ";; Function foo (foo)[0:3]".
	enter_func($2);
	$demangle->{$curfunc} = $1;
    }
    if (/^.*\(call.*"(.*)".*$/) {
	$calls->{$curfunc}->{$1} = 'call';
    } elsif (/^.*\(symbol_ref.*"(.*)".*$/) {
	$calls->{$curfunc}->{$1} = 'ref';
    }
}

delete @$calls{@omit};

my %omit_map;
@omit_map{@omit} = ();

my %unconnected = map { ($_, undef) } keys %{$calls};

print "digraph callgraph {\n";

foreach my $caller (keys %{$calls}) {
    my $caller_d = demangle($caller);
    foreach my $callee (keys %{$calls->{$caller}}) {
	my $reftype = $calls->{$caller}->{$callee};
	# ARM short calls are flagged with a caret prefix; ignore it
	$callee =~ s/^\^+//;
	# If the referenced symbol is not a defined function
	# or a direct call to an external function, ignore it.
	next unless exists($calls->{$callee}) or
	    ($include_external and $reftype eq 'call'
	        and ! exists $omit_map{$callee});
	my $style = $styles->{$reftype};
	my $callee_d = demangle($callee);
	print "\"$caller_d\" -> \"$callee_d\" [style=$style];\n";
	delete $unconnected{$caller};
	delete $unconnected{$callee};
    }
}

foreach my $f (keys %unconnected) {
    my $f_d = demangle($f);
    print "\"$f_d\";\n";
}

print "}\n";
