read.graph {igraph} | R Documentation |
The read.graph function is able to read graphs in various representations from a file, or from an HTTP connection. Currently some simple formats are supported.
read.graph(file, format = c("edgelist", "pajek", "ncol", "lgl", "graphml", "dimacs", "graphdb", "gml"), ...)
file | The connection to read from. This can be a local file, or an HTTP connection. |
format | Character constant giving the file format. Right now edgelist, pajek, ncol, lgl, graphml, dimacs, graphdb and gml are supported. |
... | Additional arguments, see below. |
The read.graph function may have additional arguments depending on the file format (the format argument).
edgelist
This format is a simple text file with numeric vertex ids defining the edges. There is no need to have newline characters between the edges; a simple space will also do.
Additional arguments:
n: The number of vertices in the graph. If it is smaller than or equal to the largest integer in the file, then it is ignored; so it is safe to set it to zero (the default).
directed: Logical scalar, whether to create a directed graph. The default value is TRUE.
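To make the layout concrete, here is a minimal base-R sketch of how an edgelist file can be interpreted. It assumes, as the description of n suggests, that vertex ids start at 0. The helper parse_edgelist() is hypothetical and is not igraph's parser.

```r
# Hypothetical helper (not igraph's parser): read whitespace-separated
# numeric vertex ids and pair them up as edges. Ids are assumed 0-based.
parse_edgelist <- function(text, n = 0) {
  ids <- scan(text = text, what = integer(), quiet = TRUE)
  if (length(ids) %% 2 != 0) stop("odd number of vertex ids")
  edges <- matrix(ids, ncol = 2, byrow = TRUE)
  # 'n' only matters when it exceeds the largest id + 1
  list(edges = edges, n = max(n, max(ids) + 1))
}

# Newlines and single spaces are interchangeable separators
g1 <- parse_edgelist("0 1\n1 2\n2 0")
g2 <- parse_edgelist("0 1 1 2 2 0")
identical(g1, g2)
```

With igraph loaded, the same data written to a file f would be read with read.graph(f, format = "edgelist", directed = FALSE).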
pajek
Pajek is a popular network analysis program for Windows. (See the Pajek homepage at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.) It has a quite flexible but not very well documented file format; see the Pajek manual on the Pajek homepage for some information about the file format.
igraph implements only a subset of the Pajek format:
Only .net files are supported; Pajek project files (which can contain many graphs and also other types of data) are not. Project files might be supported in a forthcoming igraph release if they turn out to be needed.
Time events networks are not supported.
Hypergraphs (graphs with non-binary edges) are not supported as igraph cannot handle them.
Graphs containing both directed and undirected edges are not supported as igraph cannot represent them.
Bipartite (also called affiliation) networks are not supported. The current igraph version imports the network structure correctly but vertex type information is omitted.
Graphs with multiple edge sets are not supported.
Vertex and edge attributes defined in the Pajek file will also be read and assigned to the graph object to be created. These are mainly parameters for graph visualization, but not exclusively; e.g., the file might contain edge weights as well.
The following vertex attributes might be added:
igraph name | description, Pajek attribute |
id | Vertex id |
x, y, z | The ‘x’, ‘y’ and ‘z’ coordinates of the vertex. |
vertexsize | The size of the vertex when plotted (size in Pajek). |
shape | The shape of the vertex when plotted. |
color | Vertex color (ic in Pajek) if given with symbolic name |
color-red, color-green, color-blue | Vertex color (ic in Pajek) if given in RGB notation |
framecolor | Border color (bc in Pajek) if given with symbolic name |
framecolor-red, framecolor-green, framecolor-blue | Border color (bc in Pajek) if given in RGB notation |
labelcolor | Label color (lc in Pajek) if given with symbolic name |
labelcolor-red, labelcolor-green, labelcolor-blue | Label color (lc in Pajek) if given in RGB notation |
xfact, yfact | The x_fact and y_fact Pajek attributes. |
labeldist | The distance of the label from the vertex (lr in Pajek). |
labeldegree, labeldegree2 | The la and lphi Pajek attributes. |
framewidth | The width of the border (bw in Pajek). |
fontsize | Size of the label font (fos in Pajek). |
rotation | The rotation of the vertex (phi in Pajek). |
radius | Radius, for some vertex shapes (r in Pajek). |
diamondratio | For the diamond shape (q in Pajek). |
These igraph attributes are only created if there is at least one vertex in the Pajek file which has the corresponding associated information. For example, if there are vertex coordinates for at least one vertex then the ‘x’, ‘y’ and possibly also ‘z’ vertex attributes will be created. For those vertices for which the attribute is not defined, NaN is assigned.
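The NaN rule above can be illustrated with a small base-R sketch. The helper pajek_vertex_coords() is hypothetical, handles only vertex lines of the simplified form id "label" [x y] (no z coordinate, no further Pajek parameters), and is not how igraph itself parses the file.

```r
# Hypothetical helper illustrating the rule above: coordinate attributes
# exist only if at least one vertex line carries coordinates, and vertices
# without them get NaN. Handles only lines of the form: id "label" [x y]
pajek_vertex_coords <- function(vertex_lines) {
  # scan() splits on whitespace and keeps a quoted label as one field
  fields <- lapply(vertex_lines, function(l) scan(text = l, what = "", quiet = TRUE))
  has_xy <- vapply(fields, function(f) length(f) >= 4, logical(1))
  if (!any(has_xy)) return(NULL)  # no coordinates anywhere: no x/y attributes
  x <- rep(NaN, length(fields))
  y <- rep(NaN, length(fields))
  x[has_xy] <- vapply(fields[has_xy], function(f) as.numeric(f[3]), numeric(1))
  y[has_xy] <- vapply(fields[has_xy], function(f) as.numeric(f[4]), numeric(1))
  list(x = x, y = y)
}

coords <- pajek_vertex_coords(c('1 "a" 0.1 0.2', '2 "b"', '3 "c c" 0.5 0.9'))
coords$x
```

Vertex 2 has no coordinates, so it receives NaN in both x and y, while a file in which no vertex has coordinates produces no coordinate attributes at all.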
The following edge attributes might be added:
igraph name | description, Pajek attribute |
weight | Edge weights. |
label | l in Pajek. |
color | Edge color, if the color is given with a symbolic name, c in Pajek. |
color-red, color-green, color-blue | Edge color if it was given in RGB notation, c in Pajek. |
edgewidth | w in Pajek. |
arrowsize | s in Pajek. |
hook1, hook2 | h1 and h2 in Pajek. |
angle1, angle2 | a1 and a2 in Pajek, Bezier curve parameters. |
velocity1, velocity2 | k1 and k2 in Pajek, Bezier curve parameters. |
arrowpos | ap in Pajek. |
labelpos | lp in Pajek. |
labelangle, labelangle2 | lr and lphi in Pajek. |
labeldegree | la in Pajek. |
fontsize | fos in Pajek. |
arrowtype | a in Pajek. |
linepattern | p in Pajek. |
labelcolor | lc in Pajek. |
There are no additional arguments for this format.
graphml
GraphML is an XML-based file format (an XML application in the XML terminology) to describe graphs. It is a modern format, and can store graphs with an extensible set of vertex and edge attributes, as well as generalized graphs which igraph cannot handle. Thus igraph supports only a subset of the GraphML language:
Hypergraphs are not supported.
Nested graphs are not supported.
Mixed graphs, i.e. graphs with both directed and undirected edges, are not supported. read.graph() makes the graph directed if this is the default in the GraphML file, even if all the edges are in fact undirected.
See the GraphML homepage at http://graphml.graphdrawing.org for more information about the GraphML format.
Additional arguments:
index: If the GraphML file contains more than one graph, this argument can be used to select the graph to read. By default the first graph is read (index 0).
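The sketch below builds a small GraphML document with two graphs as an R string and lists each graph's edgedefault in document order. It uses a rough regular expression rather than a real XML parser, so treat it only as an illustration of the document structure; with index counted from 0, index = 1 refers to the second graph, G1.

```r
# A small GraphML document holding two graphs (illustrative data only)
graphml <- '<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G0" edgedefault="directed">
    <node id="a"/><node id="b"/>
    <edge source="a" target="b"/>
  </graph>
  <graph id="G1" edgedefault="undirected">
    <node id="x"/><node id="y"/><node id="z"/>
    <edge source="x" target="y"/><edge source="y" target="z"/>
  </graph>
</graphml>'

# Rough regex-based inspection (a sketch; use a real XML parser in practice):
# collect the opening <graph ...> tags and pull out their 'edgedefault'.
graph_tags <- regmatches(graphml, gregexpr("<graph [^>]*>", graphml))[[1]]
edgedefaults <- sub('.*edgedefault="([a-z]+)".*', "\\1", graph_tags)
edgedefaults
```

Written to a file f, the second graph could then be read with read.graph(f, format = "graphml", index = 1); following the note on mixed graphs, the first graph would be read as directed.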
GML
GML is a simple textual format, see http://www.infosun.fim.uni-passau.de/Graphlet/GML/ for details.
Although all syntactically correct GML can be parsed, we implement only a subset of this format and some attributes might be ignored. Here is a list of all the differences:
Only node and edge attributes are used, and only if they have a simple type: integer, real or string. So if an attribute is an array or a record, then it is ignored. This is also true if only some values of the attribute are complex.
Top level attributes except for Version and the first graph attribute are completely ignored.
Graph attributes except for node and edge are completely ignored.
There is no maximum line length.
There is no maximum keyword length.
Character entities in strings are not interpreted.
We allow inf (infinity) and nan (not a number) as a real number. This is case insensitive, so nan, NaN and NAN are equal.
Please contact us if you cannot live with these limitations of the GML parser.
There are no additional arguments for this format.
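The case-insensitive handling of inf and nan can be sketched as a small token converter. gml_real() is a hypothetical helper, not part of igraph, and signed forms such as -inf are deliberately left out because the text above does not describe them.

```r
# Hypothetical converter for GML real-number tokens, mirroring the rule above:
# 'inf' and 'nan' are accepted in any letter case. Signed forms such as
# '-inf' are deliberately not handled in this sketch.
gml_real <- function(token) {
  t <- tolower(token)
  if (t == "inf") return(Inf)
  if (t == "nan") return(NaN)
  value <- suppressWarnings(as.numeric(token))
  if (is.na(value)) stop("not a GML real: ", token)
  value
}

vapply(c("nan", "NaN", "NAN", "Inf", "2.5e3"), gml_real, numeric(1))
```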
ncol
This format is used by the Large Graph Layout program (http://bioinformatics.icmb.utexas.edu/lgl), and it is simply a symbolic weighted edge list. It is a simple text file with one edge per line. An edge is defined by two symbolic vertex names separated by whitespace. The symbolic vertex names themselves cannot contain whitespace. They might be followed by an optional number, which will be the weight of the edge; the number can be negative and can be in scientific notation. If there is no weight specified for an edge, it is assumed to be zero.
By default the resulting graph is undirected. LGL cannot deal with files which contain multiple or loop edges; this is however not checked here, as igraph is happy with these.
Additional arguments:
names: Logical constant, whether to add the symbolic names as vertex attributes to the graph. If TRUE the name of the vertex attribute will be ‘name’.
weights: Logical constant, whether to add the weights of the edges as edge attribute ‘weight’.
directed: Logical constant, whether to create a directed graph. The default is undirected.
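A minimal base-R sketch of the ncol layout follows. read_ncol_sketch() is hypothetical and not igraph's reader; skipping blank lines and numbering vertices in order of first appearance are choices of this sketch, while the zero default for a missing weight follows the description above.

```r
# Hypothetical reader for the ncol layout described above (not igraph's code):
# "name1 name2 [weight]" per line; a missing weight becomes 0.
read_ncol_sketch <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]               # skip blank lines
  parts <- strsplit(lines, "[[:space:]]+")
  from <- vapply(parts, function(p) p[1], character(1))
  to <- vapply(parts, function(p) p[2], character(1))
  weight <- vapply(parts, function(p) if (length(p) >= 3) as.numeric(p[3]) else 0,
                   numeric(1))
  # Symbolic names become vertices, numbered in order of first appearance
  vertex_names <- unique(as.vector(rbind(from, to)))
  list(names = vertex_names,
       from = match(from, vertex_names), to = match(to, vertex_names),
       weight = weight)
}

g <- read_ncol_sketch(c("alice bob 1.5", "bob carol", "carol alice -2e-1"))
g$weight
```

With igraph, read.graph(f, format = "ncol", names = TRUE, weights = TRUE) would keep the names and weights as the ‘name’ and ‘weight’ attributes.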
lgl
The lgl format is used by the Large Graph Layout visualization software (http://bioinformatics.icmb.utexas.edu/lgl); it can describe undirected, optionally weighted graphs. From the LGL manual:
The second format is the LGL file format (.lgl file suffix). This is yet another graph file format that tries to be as stingy as possible with space, yet keeping the edge file in a human readable (not binary) format. The format itself is like the following:
    # vertex1name
    vertex2name [optionalWeight]
    vertex3name [optionalWeight]
Here, the first vertex of an edge is preceded with a pound sign ‘#’. Then each vertex that shares an edge with that vertex is listed one per line on subsequent lines.
LGL cannot handle loop and multiple edges or directed graphs, but in igraph it is not an error to have multiple and loop edges.
Additional arguments:
names: Logical constant, whether to add the symbolic names as vertex attributes to the graph. If TRUE the name of the vertex attribute will be ‘name’.
weights: Logical constant, whether to add the weights of the edges as edge attribute ‘weight’.
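The quoted .lgl layout can be read with a short loop that remembers the most recent ‘#’ vertex. read_lgl_sketch() is hypothetical and not igraph's reader; recording NA for a missing weight, skipping blank lines and ignoring isolated ‘#’ vertices are choices of this sketch, not documented igraph behavior.

```r
# Hypothetical reader for the .lgl layout quoted above (not igraph's code).
# A '#' line names the current vertex; each following line lists one
# neighbour, optionally followed by a weight. Missing weights become NA here,
# which is a choice of this sketch.
read_lgl_sketch <- function(lines) {
  from <- character(0); to <- character(0); weight <- numeric(0)
  current <- NA_character_
  for (line in trimws(lines)) {
    if (!nzchar(line)) next                      # skip blank lines
    if (startsWith(line, "#")) {
      current <- trimws(substring(line, 2))      # new "first" vertex
    } else {
      if (is.na(current)) stop("neighbour listed before any '#' line")
      p <- strsplit(line, "[[:space:]]+")[[1]]
      from <- c(from, current)
      to <- c(to, p[1])
      weight <- c(weight, if (length(p) >= 2) as.numeric(p[2]) else NA_real_)
    }
  }
  data.frame(from = from, to = to, weight = weight, stringsAsFactors = FALSE)
}

read_lgl_sketch(c("# a", "b 2.5", "c", "# b", "c 1"))
```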
dimacs
The DIMACS file format, more specifically the version for network flow problems; see the files at ftp://dimacs.rutgers.edu/pub/netflow/general-info/
This is a line-oriented text file (ASCII) format. The first character of each line defines the type of the line. If the first character is c the line is a comment line and it is ignored. There is one problem line (p) in the file; it must appear before any node and arc descriptor lines. The problem line has three fields separated by spaces: the problem type (min, max or asn), the number of vertices and number of edges in the graph. Exactly two node identification lines are expected (n), one for the source, one for the target vertex. These have two fields: the id of the vertex and the type of the vertex, either s (=source) or t (=target). Arc lines start with a and have three fields: the source vertex, the target vertex and the edge capacity.
Vertex ids are numbered from 1.
The source vertex is assigned to the source, the target vertex to the target graph attribute. The edge capacities are assigned to the capacity edge attribute.
Additional arguments:
directed: Logical scalar, whether to create a directed graph. By default a directed graph is created.
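The line types described above map naturally onto a switch() over the first field. read_dimacs_sketch() is a hypothetical base-R reader for illustration only, not igraph's parser; it keeps vertex ids exactly as they appear in the file (1-based) and returns the source, target and capacities as plain list elements.

```r
# Hypothetical reader for the DIMACS max-flow layout described above
# (not igraph's parser). Vertex ids are kept as in the file (1-based).
read_dimacs_sketch <- function(lines) {
  n_vertices <- NA_integer_; src <- NA_integer_; tgt <- NA_integer_
  arcs <- list()
  for (line in lines) {
    f <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    if (length(f) == 0 || !nzchar(f[1])) next
    if (f[1] %in% c("n", "a") && is.na(n_vertices))
      stop("the problem line must come before node and arc lines")
    switch(f[1],
      c = NULL,                                         # comment: ignored
      p = { n_vertices <- as.integer(f[3]) },           # p <type> <#vertices> <#edges>
      n = { if (f[3] == "s") src <- as.integer(f[2]) else tgt <- as.integer(f[2]) },
      a = { arcs[[length(arcs) + 1]] <- as.numeric(f[2:4]) },  # a <from> <to> <cap>
      stop("unknown line type: ", f[1]))
  }
  m <- do.call(rbind, arcs)
  list(n = n_vertices, source = src, target = tgt,
       from = as.integer(m[, 1]), to = as.integer(m[, 2]), capacity = m[, 3])
}

flow <- read_dimacs_sketch(c("c tiny max-flow instance", "p max 4 5",
                             "n 1 s", "n 4 t",
                             "a 1 2 4", "a 1 3 2", "a 2 3 1", "a 2 4 3", "a 3 4 5"))
```

Read with read.graph(f, format = "dimacs"), the same file would yield a graph whose source and target graph attributes hold 1 and 4, with the capacities in the capacity edge attribute.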
graphdb
This is a binary format, used in the graph database for isomorphism testing (http://amalfi.dis.unina.it/graph/). From the graph database homepage (http://amalfi.dis.unina.it/graph/db/doc/graphdbat-2.html):
The graphs are stored in a compact binary format, one graph per file. The file is composed of 16 bit words, which are represented using the so-called little-endian convention, i.e. the least significant byte of the word is stored first.
Then, for each node, the file contains the list of edges coming out of the node itself. The list is represented by a word encoding its length, followed by a word for each edge, representing the destination node of the edge. Node numeration is 0-based, so the first node of the graph has index 0.
See also graph.graphdb.
Only unlabelled graphs are implemented.
Additional arguments:
directed: Logical scalar, whether to create a directed graph.
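The word layout can be tried out with readBin() and writeBin() on a raw vector. This is a hypothetical round-trip sketch, not igraph code. Beyond the quoted text, it assumes that the file starts with one word holding the number of nodes, which is how the graph database documentation describes the header; since only unlabelled graphs are implemented, it covers the structure only.

```r
# Hypothetical encoder/decoder (not igraph's code). Layout assumed here:
# word 1 = number of nodes; then, for each node, one word with its
# out-degree followed by one word per destination (0-based node ids).
# Every word is a 16-bit unsigned integer stored little-endian.
encode_graphdb <- function(adj) {
  words <- length(adj)
  for (dests in adj) words <- c(words, length(dests), dests)
  writeBin(as.integer(words), raw(), size = 2, endian = "little")
}

decode_graphdb <- function(bytes) {
  words <- readBin(bytes, what = "integer", n = length(bytes) %/% 2,
                   size = 2, signed = FALSE, endian = "little")
  n <- words[1]
  pos <- 2
  from <- integer(0)
  to <- integer(0)
  for (v in seq_len(n) - 1L) {
    k <- words[pos]                                  # out-degree of node v
    dests <- if (k > 0) words[(pos + 1):(pos + k)] else integer(0)
    from <- c(from, rep(v, k))
    to <- c(to, dests)
    pos <- pos + k + 1
  }
  list(n = n, from = from, to = to)
}

adj <- list(c(1L, 2L), 2L, integer(0))  # edges 0->1, 0->2, 1->2
bytes <- encode_graphdb(adj)
decode_graphdb(bytes)
```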
A graph object.
Gabor Csardi csardi@rmki.kfki.hu