NetGrep: fast network schema searches in interactomes

E Banks, E Nabieva, R Peterson, M Singh. Genome Biology, 2008, 9:R138

This software was created by the Singh Lab at Princeton University.

NetGrep is a tool used to find instances of schemas within interaction networks. A schema is a subgraph connected by edges in a specific manner with nodes annotated by specific descriptions. In protein interaction networks the nodes represent proteins and the edges represent the interactions among them. NetGrep searches a given interaction network to find all possible matches to the provided schema.
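
To make the idea of a schema match concrete, here is a minimal brute-force sketch in Python. It is not NetGrep's search algorithm, only an illustration of what a match is; the function name `find_matches`, the protein names, and the annotation labels are all hypothetical.

```python
from itertools import permutations

def find_matches(schema_nodes, schema_edges, annotations, interactions):
    """Return every assignment of distinct proteins to schema nodes that
    satisfies all node annotations and all schema edges.

    schema_nodes: dict mapping schema node name -> required annotation
    schema_edges: list of (node, node) pairs that must interact
    annotations:  dict mapping protein -> set of annotations
    interactions: set of frozenset({protein, protein}) (undirected)
    """
    names = list(schema_nodes)
    matches = []
    # Brute force: try every ordered choice of distinct proteins.
    for proteins in permutations(annotations, len(names)):
        assignment = dict(zip(names, proteins))
        if any(schema_nodes[n] not in annotations[p]
               for n, p in assignment.items()):
            continue  # some protein lacks the required annotation
        if all(frozenset((assignment[a], assignment[b])) in interactions
               for a, b in schema_edges):
            matches.append(assignment)
    return matches

# Toy network (made-up data for illustration only).
annotations = {
    "P1": {"kinase"},
    "P2": {"kinase"},
    "P3": {"transcription factor"},
}
interactions = {frozenset(("P1", "P3")), frozenset(("P1", "P2"))}

# Schema: a kinase X that interacts with a transcription factor Y.
schema_nodes = {"X": "kinase", "Y": "transcription factor"}
schema_edges = [("X", "Y")]

result = find_matches(schema_nodes, schema_edges, annotations, interactions)
```

In this toy network only P1 and P3 satisfy both the annotations and the required interaction, so `result` holds a single match.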


Creating Schemas
The "Schema Description Area" (with black background) is used to describe the target schema. Actions can be performed by right-clicking on various areas in the Schema Description Area (i.e. on an empty area to add a new node or on nodes and edges to modify them). Alternatively, one can use the first section under the Graph Operations menu for the same functionality. The "Quick Start" menu can be used to add commonly used (sub)schemas to the Schema Description Area. Nodes can be moved around to convenient places by (left-)clicking and dragging. Sample schema queries can be found under the Examples menu.

Annotating Nodes
One can click on a node to bring up its "Annotation Panel" (the middle lower panel). From this panel, one can annotate the selected node or create/modify edges between this node and others (see the "About Edges" section). Clicking on the "Annotate this node" link opens a menu of annotation choices. Many individual annotation sections include a search box to help find a specific annotation among the many options available. Right-clicking on a specific annotation will pop up an internet browser with a description of that annotation; left-clicking will annotate the selected node with the given annotation. Once a node is annotated, that description can be modified by either clearing the annotation or adding to it - either by a logical OR ('expanding' the description) or AND ('restricting' the description). Note that multiple expressions are evaluated strictly from left to right, with no operator precedence, so 'X OR Y AND Z' is read as '(X OR Y) AND Z'.
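
The left-to-right rule matters when OR and AND are mixed. The following Python sketch shows one way such an expression could be evaluated; the function `matches_annotation` and the list-based expression format are assumptions made for illustration, not NetGrep's internal representation.

```python
def matches_annotation(protein_labels, expression):
    """Evaluate an annotation expression strictly left to right.

    expression: a first annotation followed by (operator, annotation)
    pairs, e.g. ["kinase", ("OR", "phosphatase"), ("AND", "nuclear")].
    """
    first, *rest = expression
    result = first in protein_labels
    for operator, annotation in rest:
        present = annotation in protein_labels
        if operator == "OR":      # 'expanding' the description
            result = result or present
        elif operator == "AND":   # 'restricting' the description
            result = result and present
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result

# 'kinase OR phosphatase AND nuclear' is read as
# '(kinase OR phosphatase) AND nuclear'.
expression = ["kinase", ("OR", "phosphatase"), ("AND", "nuclear")]

cytoplasmic_kinase = matches_annotation({"kinase"}, expression)
nuclear_phosphatase = matches_annotation({"phosphatase", "nuclear"}, expression)
```

Here a protein annotated only as a kinase does not match, because the trailing AND restricts everything to its left. Under conventional precedence (AND before OR) the same protein would match, which is why the order in which annotations are added matters.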

About Edges
Edges can be added to connect any two nodes in the graph. The edges can be specified to be of one particular type or of multiple types - in which case the types can be inclusive ("any of these types") or exclusive ("all of these types"). Note that some edge types are directed, so that it matters which node is used to create the edge. Self-edges are allowed for directed edges. Note that the whole schema must be connected (i.e. there cannot be two or more unconnected parts) or the program will return an error when searching for matches.
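
The two rules in this section, matching on edge types and requiring a connected schema, can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names `edge_matches` and `is_connected`, the mode strings "any" and "all", and the edge type labels are hypothetical, and edge direction is ignored when checking connectivity.

```python
def edge_matches(observed_types, required_types, mode):
    """Check an interaction's types against a schema edge.

    mode "any": at least one required type is present.
    mode "all": every required type is present.
    """
    observed, required = set(observed_types), set(required_types)
    if mode == "any":
        return bool(observed & required)
    if mode == "all":
        return required <= observed
    raise ValueError(f"unknown mode: {mode}")

def is_connected(nodes, edges):
    """Return True if the schema forms a single connected piece."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first traversal from an arbitrary node.
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency[node] - seen)
    return seen == nodes

any_ok = edge_matches({"physical"}, {"physical", "phosphorylation"}, "any")
all_ok = edge_matches({"physical"}, {"physical", "phosphorylation"}, "all")
connected = is_connected(["A", "B", "C"], [("A", "B"), ("B", "C")])
split = is_connected(["A", "B", "C", "D"], [("A", "B"), ("C", "D")])
```

The last schema has two separate parts (A-B and C-D), which is the situation that would cause an error when searching.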

Optional Nodes/Edges
When a node is marked as optional, NetGrep will find matches both with and without the given node. When searching for matches without the given optional node, the graph is considered fully connected so that interactions continue through the optional node. For example, if the following nodes are connected in order: 'A-B-C', with B being optional, then NetGrep will search for paths that match 'A-B-C' and that match 'A-C'. The interaction type between nodes A and C will be the logical OR of the interactions between A-B and between B-C. Having optional nodes allows flexibility in describing schemas, so that one could easily request, for example, 'a succession of between two and four kinases'. Note that nodes with more than two interactions cannot be optional.
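
The 'A-B-C' example can be sketched in Python by expanding a path schema into every variant obtained by keeping or dropping its optional nodes. This is only an illustration of the rule described above for simple paths; the function `expand_optional` and the edge type labels are hypothetical and do not reflect how NetGrep is implemented.

```python
from itertools import product

def expand_optional(nodes, edge_types, optional):
    """Expand a path schema into all variants with optional nodes kept or dropped.

    nodes:      node names in path order, e.g. ["A", "B", "C"]
    edge_types: edge_types[i] is the set of types between nodes[i] and nodes[i+1]
    optional:   set of node names that may be skipped
    Returns a list of (kept_nodes, kept_edge_types) pairs.
    """
    optional_in_order = [n for n in nodes if n in optional]
    variants = []
    for keep_flags in product([True, False], repeat=len(optional_in_order)):
        dropped = {n for n, keep in zip(optional_in_order, keep_flags) if not keep}
        kept_nodes, kept_edges = [], []
        pending = None  # edge types accumulated since the last kept node
        for i, node in enumerate(nodes):
            if i > 0 and pending is not None:
                # Merge (logical OR) the types of every edge that is bridged.
                pending = pending | edge_types[i - 1]
            if node not in dropped:
                if kept_nodes:
                    kept_edges.append(pending)
                kept_nodes.append(node)
                pending = set()
        variants.append((kept_nodes, kept_edges))
    return variants

# 'A-B-C' with B optional.
abc = expand_optional(["A", "B", "C"],
                      [{"physical"}, {"phosphorylation"}],
                      optional={"B"})

# 'A succession of between two and four kinases': four nodes, two optional.
kinases = expand_optional(["K1", "K2", "K3", "K4"],
                          [{"phosphorylation"}] * 3,
                          optional={"K2", "K3"})
chain_lengths = sorted(len(kept) for kept, _ in kinases)
```

The first call yields the full path 'A-B-C' and the shortened path 'A-C', whose single edge accepts either of the two original interaction types. The second call yields chains of two, three, and four kinases.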

Optional edges are a similar concept applied to connections between nodes. Because edge data might be incomplete for some of the underlying interaction networks, a user might want to specify that an edge be optional. This in essence allows the schema subgraph to be unconnected.

Finding Matches
The "Info Panel" (bottom left panel) allows the user to toggle between various interaction networks in which to find matches to the target schema. Also, the user can set the maximum number of matching instances to be returned; if more than the maximum number is found then the search exits and displays what it has found to that point. Finally, the "Find Matches" button initiates the search for matches.
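
The effect of the maximum-matches setting can be sketched in Python as a search that stops as soon as the limit is exceeded. The function `capped_search` is a hypothetical helper written for illustration; it assumes matches are produced one at a time by some iterable.

```python
from itertools import count, islice

def capped_search(candidates, max_matches):
    """Collect at most max_matches results and report whether the cap was hit.

    One extra item is requested so that 'more than the maximum was found'
    can be detected without exhausting the whole search.
    """
    found = list(islice(iter(candidates), max_matches + 1))
    hit_limit = len(found) > max_matches
    return found[:max_matches], hit_limit

# An endless stream of stand-in matches; the search still terminates.
matches, hit_limit = capped_search(count(), 3)

# A search that finds fewer matches than the cap runs to completion.
few, few_hit_limit = capped_search(["m1", "m2"], 3)
```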

When the search is complete, the results are displayed in the "Results Panel" (the bottom right panel). Results are ranked by a score computed from the confidence values of the interactions they contain. Note that only the top 300 matches are displayed; to view or save all of the results, click the "Download All Results" button. Moving the cursor over a specific protein shows which node in the schema it corresponds to; clicking a specific protein will pop up an internet browser with information about that protein. Finally, results can be "clustered" to discern common structures.
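
The ranking and display limit can be sketched in Python as follows. The exact scoring function is not described on this page, so the sketch assumes, purely for illustration, that a match's score is the product of its interaction confidences; the function `rank_matches` and the sample data are hypothetical.

```python
from math import prod

DISPLAY_LIMIT = 300  # only the top matches are shown in the Results Panel

def rank_matches(matches, display_limit=DISPLAY_LIMIT):
    """Sort matches by score (highest first) and split off the displayed ones.

    Each match is a dict with a "confidences" list, one value per interaction.
    ASSUMPTION: score = product of confidences (not confirmed by the source).
    Returns (displayed, all_ranked).
    """
    ranked = sorted(matches, key=lambda m: prod(m["confidences"]), reverse=True)
    return ranked[:display_limit], ranked

sample = [
    {"id": "a", "confidences": [0.9, 0.8]},   # score 0.72
    {"id": "b", "confidences": [0.5, 0.5]},   # score 0.25
    {"id": "c", "confidences": [0.99]},       # score 0.99
]
displayed, all_ranked = rank_matches(sample, display_limit=2)
displayed_ids = [m["id"] for m in displayed]
```

The full ranked list plays the role of the downloadable results, while the truncated list corresponds to what is shown on screen.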

Node Colors
The meaning of various protein colors is as follows:

Walk through a sample NetGrep query

This research has been supported by NSF CCF-0542187, NIH GM076275 and the NIH Center of Excellence Grant P50 GM071508.