pyg_lib.sampler
- neighbor_sample(rowptr: Tensor, col: Tensor, seed: Tensor, num_neighbors: List[int], node_time: Optional[Tensor] = None, edge_time: Optional[Tensor] = None, seed_time: Optional[Tensor] = None, edge_weight: Optional[Tensor] = None, csc: bool = False, replace: bool = False, directed: bool = True, disjoint: bool = False, temporal_strategy: str = 'uniform', return_edge_id: bool = True) -> Tuple[Tensor, Tensor, Tensor, Optional[Tensor], List[int], List[int]]
Recursively samples neighbors from all node indices in seed in the graph given by (rowptr, col).
Note: For temporal sampling, the col vector needs to be sorted according to time within individual neighborhoods since we use binary search to find neighbors that fulfill temporal constraints.
- Parameters:
rowptr (Tensor) – Compressed source node indices.
col (Tensor) – Target node indices.
seed (Tensor) – The seed node indices.
num_neighbors (List[int]) – The number of neighbors to sample for each node in each iteration. If an entry is set to -1, all neighbors will be included.
node_time (Optional[Tensor], default: None) – Timestamps for the nodes in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. sampled nodes have a timestamp earlier than or equal to that of the seed node. If used, the col vector needs to be sorted according to time within individual neighborhoods. Requires disjoint=True. Only one of node_time or edge_time can be specified.
edge_time (Optional[Tensor], default: None) – Timestamps for the edges in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. sampled edges have a timestamp earlier than or equal to that of the seed node. If used, the col vector needs to be sorted according to time within individual neighborhoods. Requires disjoint=True. Only one of node_time or edge_time can be specified.
seed_time (Optional[Tensor], default: None) – Optional values to override the timestamp for seed nodes. If not set, will use timestamps in node_time as default for seed nodes. Needs to be specified in case edge-level sampling is used via edge_time.
edge_weight (Optional[Tensor], default: None) – If given, will perform biased sampling based on the weight of each edge.
csc (bool, default: False) – If set to True, assumes that the graph is given in CSC format (colptr, row).
replace (bool, default: False) – If set to True, will sample with replacement.
directed (bool, default: True) – If set to False, will include all edges between all sampled nodes.
disjoint (bool, default: False) – If set to True, will create disjoint subgraphs for every seed node.
temporal_strategy (str, default: 'uniform') – The sampling strategy when using temporal sampling ("uniform", "last").
return_edge_id (bool, default: True) – If set to False, will not return the indices of edges of the original graph.
- Returns:
Tuple[Tensor, Tensor, Tensor, Optional[Tensor], List[int], List[int]] – Row indices and col indices of the returned subtree/subgraph, as well as the original node indices of all sampled nodes. In addition, may return the indices of edges of the original graph. Lastly, returns the number of sampled nodes and edges per hop.
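The (rowptr, col) layout and the time-sorting requirement from the note above can be illustrated in plain Python. The sketch below is not pyg_lib's implementation and does not call the library; build_csr and temporal_neighbors are hypothetical helper names, and the edge list and timestamps are made-up example data. It builds CSR arrays from an edge list, orders each neighborhood by edge time, and then uses binary search to select the neighbors whose edge timestamp is earlier than or equal to a seed time.

```python
from bisect import bisect_right


def build_csr(num_nodes, edges, edge_time):
    """Build (rowptr, col, time) with each neighborhood sorted by time.

    edges is a list of (source, target) pairs and edge_time holds one
    timestamp per edge. Hypothetical helper for illustration only.
    """
    # Sort edges by source node first, then by timestamp within each source.
    order = sorted(range(len(edges)), key=lambda e: (edges[e][0], edge_time[e]))
    # Count outgoing edges per source, then take the prefix sum.
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]
    col = [edges[e][1] for e in order]
    time = [edge_time[e] for e in order]
    return rowptr, col, time


def temporal_neighbors(rowptr, col, time, node, seed_time):
    """Return neighbors of node reached via edges with time <= seed_time."""
    start, end = rowptr[node], rowptr[node + 1]
    # Binary search is valid only because time[start:end] is sorted.
    cut = bisect_right(time, seed_time, start, end)
    return col[start:cut]


# Made-up example graph with 4 nodes and 5 timestamped edges.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (2, 0)]
edge_time = [5, 1, 7, 3, 2]
rowptr, col, time = build_csr(4, edges, edge_time)
early = temporal_neighbors(rowptr, col, time, node=0, seed_time=3)
```

If a neighborhood were not sorted by time, the binary search would cut at an arbitrary position, which is why the sort order is a precondition rather than an optimization.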
- hetero_neighbor_sample(rowptr_dict: Dict[Tuple[str, str, str], Tensor], col_dict: Dict[Tuple[str, str, str], Tensor], seed_dict: Dict[str, Tensor], num_neighbors_dict: Dict[Tuple[str, str, str], List[int]], node_time_dict: Optional[Dict[str, Tensor]] = None, edge_time_dict: Optional[Dict[Tuple[str, str, str], Tensor]] = None, seed_time_dict: Optional[Dict[str, Tensor]] = None, edge_weight_dict: Optional[Dict[Tuple[str, str, str], Tensor]] = None, csc: bool = False, replace: bool = False, directed: bool = True, disjoint: bool = False, temporal_strategy: str = 'uniform', return_edge_id: bool = True) -> Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[str, Tensor], Optional[Dict[Tuple[str, str, str], Tensor]], Dict[str, List[int]], Dict[Tuple[str, str, str], List[int]]]
Recursively samples neighbors from all node indices in seed_dict in the heterogeneous graph given by (rowptr_dict, col_dict).
Note: Similar to neighbor_sample(), but expects a dictionary of node types (str) and edge types (Tuple[str, str, str]) for each non-boolean argument. See neighbor_sample() for more details.
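The dictionary layout described in the note can be sketched as follows. This only illustrates how the keys are shaped, using plain Python lists where the function expects tensors, and it does not call the library. The node types, edge types, edge lists, fan-out values, and the to_csr helper are all hypothetical.

```python
def to_csr(num_src, edges):
    """Convert (source, target) pairs into (rowptr, col) lists."""
    edges = sorted(edges)
    # Count outgoing edges per source node, then take the prefix sum.
    rowptr = [0] * (num_src + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1
    for i in range(num_src):
        rowptr[i + 1] += rowptr[i]
    return rowptr, [dst for _, dst in edges]


# Hypothetical heterogeneous graph: 2 authors and 3 papers.
num_nodes = {"author": 2, "paper": 3}
edge_lists = {
    ("author", "writes", "paper"): [(0, 0), (0, 2), (1, 1)],
    ("paper", "cites", "paper"): [(2, 0), (1, 0)],
}

# Edge-level arguments are keyed by (source type, relation, target type).
rowptr_dict, col_dict = {}, {}
for edge_type, edges in edge_lists.items():
    src_type = edge_type[0]
    rowptr_dict[edge_type], col_dict[edge_type] = to_csr(num_nodes[src_type], edges)

# Node-level arguments are keyed by node type.
seed_dict = {"paper": [0, 1]}
# One fan-out list per edge type, here two hops with up to 2 neighbors each.
num_neighbors_dict = {edge_type: [2, 2] for edge_type in edge_lists}
```

Each rowptr list has one entry per source node of its edge type plus one, so its length depends on the source node type rather than on the total number of nodes.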
- subgraph(rowptr: Tensor, col: Tensor, nodes: Tensor, return_edge_id: bool = True) -> Tuple[Tensor, Tensor, Optional[Tensor]]
Returns the induced subgraph of the graph given by (rowptr, col), containing only the nodes in nodes.
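What "induced subgraph" means on a (rowptr, col) graph can be shown with a small pure-Python reference. This is a sketch of the concept, not the library's code: induced_subgraph is a hypothetical name, the example graph is made up, and relabeling each kept node by its position in nodes is an assumption about the output convention.

```python
def induced_subgraph(rowptr, col, nodes):
    """Return (sub_rowptr, sub_col, edge_id) for the subgraph induced by nodes."""
    # Map original node index -> new index (its position in nodes).
    relabel = {n: i for i, n in enumerate(nodes)}
    sub_rowptr, sub_col, edge_id = [0], [], []
    for n in nodes:
        for e in range(rowptr[n], rowptr[n + 1]):
            # Keep an edge only if its target is also a selected node.
            if col[e] in relabel:
                sub_col.append(relabel[col[e]])
                edge_id.append(e)
        sub_rowptr.append(len(sub_col))
    return sub_rowptr, sub_col, edge_id


# Made-up graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0, 3}, 3 -> {}.
rowptr = [0, 2, 3, 5, 5]
col = [1, 2, 2, 0, 3]
sub_rowptr, sub_col, edge_id = induced_subgraph(rowptr, col, [0, 2])
```

Here edge_id records the position of each kept edge in the original col list, which corresponds to the optional third return value when return_edge_id is True.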
- random_walk(rowptr: Tensor, col: Tensor, seed: Tensor, walk_length: int, p: float = 1.0, q: float = 1.0) -> Tensor
Samples random walks of length walk_length from all node indices in seed in the graph given by (rowptr, col), as described in the “node2vec: Scalable Feature Learning for Networks” paper.
- Parameters:
rowptr (Tensor) – Compressed source node indices.
col (Tensor) – Target node indices.
seed (Tensor) – Seed node indices from where random walks start.
walk_length (int) – The walk length of a random walk.
p (float, default: 1.0) – Likelihood of immediately revisiting a node in the walk.
q (float, default: 1.0) – Control parameter to interpolate between breadth-first strategy and depth-first strategy.
- Returns:
Tensor – A tensor of shape [seed.size(0), walk_length + 1], holding the node indices of each walk for each seed node.
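The output shape can be made concrete with a pure-Python reference for the unbiased case only (p = q = 1). The sketch below does not call the library and does not implement the node2vec bias; uniform_random_walk is a hypothetical name, and keeping a walk in place at a node without outgoing edges is an assumption, not documented behavior.

```python
import random


def uniform_random_walk(rowptr, col, seed, walk_length, rng=None):
    """Uniform walks (the p = q = 1 case), one row per seed node."""
    rng = rng or random.Random()
    walks = []
    for start in seed:
        walk = [start]
        for _ in range(walk_length):
            cur = walk[-1]
            begin, end = rowptr[cur], rowptr[cur + 1]
            if begin == end:
                # Assumption: a node without outgoing edges stays in place.
                walk.append(cur)
            else:
                # Pick one outgoing edge uniformly at random.
                walk.append(col[rng.randrange(begin, end)])
        walks.append(walk)
    return walks


# Directed 3-cycle 0 -> 1 -> 2 -> 0, so every walk is fully determined.
rowptr = [0, 1, 2, 3]
col = [1, 2, 0]
walks = uniform_random_walk(rowptr, col, seed=[0, 2], walk_length=4)
```

Each row starts with its seed node and is followed by walk_length sampled steps, which is why the second dimension is walk_length + 1.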