pyg_lib.sampler

neighbor_sample(rowptr: Tensor, col: Tensor, seed: Tensor, num_neighbors: List[int], node_time: Optional[Tensor] = None, edge_time: Optional[Tensor] = None, seed_time: Optional[Tensor] = None, edge_weight: Optional[Tensor] = None, csc: bool = False, replace: bool = False, directed: bool = True, disjoint: bool = False, temporal_strategy: str = 'uniform', return_edge_id: bool = True) → Tuple[Tensor, Tensor, Tensor, Optional[Tensor], List[int], List[int]]

Recursively samples neighbors from all node indices in seed in the graph given by (rowptr, col).
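To make the (rowptr, col) layout concrete, the following is a minimal pure-Python sketch of a compressed sparse row (CSR) adjacency built from an edge list. The helpers edges_to_csr and neighbors are hypothetical names introduced for illustration and are not part of pyg_lib; plain lists stand in for tensors.

```python
from typing import List, Tuple


def edges_to_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build (rowptr, col) from (source, target) pairs. Hypothetical helper."""
    edges = sorted(edges)  # group edges by source node
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1  # count outgoing edges per source node
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]  # prefix sum turns counts into offsets
    col = [dst for _, dst in edges]
    return rowptr, col


def neighbors(rowptr: List[int], col: List[int], v: int) -> List[int]:
    """Targets of node v are stored in col[rowptr[v]:rowptr[v + 1]]."""
    return col[rowptr[v]:rowptr[v + 1]]


rowptr, col = edges_to_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(rowptr)                     # [0, 2, 3, 3, 4]
print(col)                        # [1, 2, 2, 0]
print(neighbors(rowptr, col, 0))  # [1, 2]
```

With csc=True the same layout would be read column-wise as (colptr, row).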

Note

For temporal sampling, the col vector needs to be sorted according to time within individual neighborhoods since we use binary search to find neighbors that fulfill temporal constraints.
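The reason for the sorting requirement can be sketched in a few lines of pure Python. The example assumes edge-level timestamps sorted in ascending order within each neighborhood; the helper temporal_candidates is hypothetical and only illustrates the binary-search idea, not pyg_lib's actual implementation.

```python
from bisect import bisect_right
from typing import List


def temporal_candidates(rowptr: List[int], col: List[int], edge_time: List[int],
                        v: int, seed_time: int) -> List[int]:
    """Neighbors of v reachable via edges with timestamp <= seed_time.

    Assumes edge_time is sorted ascending inside each neighborhood
    (an assumption of this sketch).
    """
    start, end = rowptr[v], rowptr[v + 1]
    # Binary search restricted to the neighborhood slice [start, end).
    cutoff = bisect_right(edge_time, seed_time, start, end)
    return col[start:cutoff]


rowptr = [0, 3, 4, 4, 4]
col = [1, 2, 3, 0]
edge_time = [10, 20, 30, 5]  # sorted within each neighborhood

print(temporal_candidates(rowptr, col, edge_time, v=0, seed_time=20))  # [1, 2]
print(temporal_candidates(rowptr, col, edge_time, v=0, seed_time=5))   # []
```

If the neighborhood were not sorted by time, the slice returned by the binary search could contain neighbors that violate the temporal constraint.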

Parameters:

rowptr (Tensor) – Compressed source node indices.
col (Tensor) – Target node indices.
seed (Tensor) – The seed node indices.
num_neighbors (List[int]) – The number of neighbors to sample for each node in each iteration. If an entry is set to -1, all neighbors will be included.
node_time (Optional[Tensor], default: None) – Timestamps for the nodes in the graph. If set, temporal sampling is used so that sampled neighbors are guaranteed to fulfill temporal constraints, i.e. sampled nodes have a timestamp earlier than or equal to that of the seed node. If used, the col vector needs to be sorted according to time within individual neighborhoods. Requires disjoint=True. Only one of node_time and edge_time can be specified.
edge_time (Optional[Tensor], default: None) – Timestamps for the edges in the graph. If set, temporal sampling is used so that sampled neighbors are guaranteed to fulfill temporal constraints, i.e. sampled edges have a timestamp earlier than or equal to that of the seed node. If used, the col vector needs to be sorted according to time within individual neighborhoods. Requires disjoint=True. Only one of node_time and edge_time can be specified.
seed_time (Optional[Tensor], default: None) – Optional values to override the timestamps of the seed nodes. If not set, the timestamps in node_time are used for the seed nodes. Must be specified when edge-level temporal sampling is used via edge_time.
edge_weight (Optional[Tensor], default: None) – If given, will perform biased sampling based on the weight of each edge.
csc (bool, default: False) – If set to True, assumes that the graph is given in CSC format (colptr, row).
replace (bool, default: False) – If set to True, will sample with replacement.
directed (bool, default: True) – If set to False, will include all edges between all sampled nodes.
disjoint (bool, default: False) – If set to True, will create disjoint subgraphs for every seed node.
temporal_strategy (str, default: 'uniform') – The sampling strategy when using temporal sampling ("uniform", "last").
return_edge_id (bool, default: True) – If set to False, will not return the indices of edges of the original graph.

Returns:

Tuple[Tensor, Tensor, Tensor, Optional[Tensor], List[int], List[int]] – Row indices and col indices of the returned subtree/subgraph, as well as the original node indices of all sampled nodes. In addition, may return the indices of the sampled edges in the original graph. Lastly, returns the number of sampled nodes and the number of sampled edges per hop.
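As a rough illustration of how these six return values relate to each other, here is a toy multi-hop sampler in pure Python. It is a sketch under several assumptions, not pyg_lib's implementation: sample_hops is a hypothetical name, seeds are assumed to be unique, sampling is without replacement and unweighted, the per-hop node counts start with the number of seeds, and the orientation of the real row/col outputs may differ.

```python
import random
from typing import List, Tuple


def sample_hops(rowptr: List[int], col: List[int], seed: List[int],
                num_neighbors: List[int], rng: random.Random
                ) -> Tuple[List[int], List[int], List[int], List[int], List[int], List[int]]:
    """Toy sampler returning (row, col, node, edge_id, nodes_per_hop, edges_per_hop)."""
    node = list(seed)                            # original ids, in sampling order
    local = {v: i for i, v in enumerate(node)}   # original id -> local index
    rows, cols, edge_ids = [], [], []
    nodes_per_hop, edges_per_hop = [len(node)], []
    frontier_start = 0
    for k in num_neighbors:
        frontier_end = len(node)
        edges_before = len(edge_ids)
        for i in range(frontier_start, frontier_end):
            v = node[i]
            candidates = list(range(rowptr[v], rowptr[v + 1]))
            if k >= 0 and len(candidates) > k:   # k == -1 keeps all neighbors
                candidates = rng.sample(candidates, k)
            for e in candidates:
                w = col[e]
                if w not in local:               # first time this node is seen
                    local[w] = len(node)
                    node.append(w)
                rows.append(i)
                cols.append(local[w])
                edge_ids.append(e)
        nodes_per_hop.append(len(node) - frontier_end)
        edges_per_hop.append(len(edge_ids) - edges_before)
        frontier_start = frontier_end
    return rows, cols, node, edge_ids, nodes_per_hop, edges_per_hop


rowptr = [0, 2, 3, 3, 4]
col = [1, 2, 2, 0]
result = sample_hops(rowptr, col, seed=[0], num_neighbors=[-1, -1], rng=random.Random(0))
row_out, col_out, node, edge_id, nodes_per_hop, edges_per_hop = result
print(node)           # [0, 1, 2]
print(edge_id)        # [0, 1, 2]
print(nodes_per_hop)  # [1, 2, 0]
print(edges_per_hop)  # [2, 1]
```

The returned row and col indices are local positions into node, while edge_id refers back to positions in the original col vector.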

hetero_neighbor_sample(rowptr_dict: Dict[Tuple[str, str, str], Tensor], col_dict: Dict[Tuple[str, str, str], Tensor], seed_dict: Dict[str, Tensor], num_neighbors_dict: Dict[Tuple[str, str, str], List[int]], node_time_dict: Optional[Dict[str, Tensor]] = None, edge_time_dict: Optional[Dict[Tuple[str, str, str], Tensor]] = None, seed_time_dict: Optional[Dict[str, Tensor]] = None, edge_weight_dict: Optional[Dict[Tuple[str, str, str], Tensor]] = None, csc: bool = False, replace: bool = False, directed: bool = True, disjoint: bool = False, temporal_strategy: str = 'uniform', return_edge_id: bool = True) → Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[str, Tensor], Optional[Dict[Tuple[str, str, str], Tensor]], Dict[str, List[int]], Dict[Tuple[str, str, str], List[int]]]

Recursively samples neighbors from all node indices in seed_dict in the heterogeneous graph given by (rowptr_dict, col_dict).

Note

Similar to neighbor_sample(), but expects a dictionary of node types (str) and edge types (Tuple[str, str, str]) for each non-boolean argument. See neighbor_sample() for more details.

Return type: Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[str, Tensor], Optional[Dict[Tuple[str, str, str], Tensor]], Dict[str, List[int]], Dict[Tuple[str, str, str], List[int]]]
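The dictionary-based inputs can be pictured with a small pure-Python sketch. The node and edge type names below ("author", "paper", "writes", "cites") are made up for illustration, one_hop is a hypothetical helper, and the sketch assumes that for an edge type (src, rel, dst) the rowptr is indexed by src-type nodes while col holds dst-type node indices.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]

# One (rowptr, col) pair per edge type; plain lists stand in for tensors.
rowptr_dict: Dict[EdgeType, List[int]] = {
    ("author", "writes", "paper"): [0, 2, 3],   # 2 authors
    ("paper", "cites", "paper"): [0, 1, 1, 2],  # 3 papers
}
col_dict: Dict[EdgeType, List[int]] = {
    ("author", "writes", "paper"): [0, 1, 2],
    ("paper", "cites", "paper"): [1, 0],
}


def one_hop(rowptr_dict: Dict[EdgeType, List[int]],
            col_dict: Dict[EdgeType, List[int]],
            seed_dict: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Collect all neighbors of the seeds, grouped by destination node type."""
    out: Dict[str, List[int]] = {}
    for edge_type, rowptr in rowptr_dict.items():
        src_type, _, dst_type = edge_type
        col = col_dict[edge_type]
        for v in seed_dict.get(src_type, []):
            for w in col[rowptr[v]:rowptr[v + 1]]:
                if w not in out.setdefault(dst_type, []):
                    out[dst_type].append(w)
    return out


print(one_hop(rowptr_dict, col_dict, {"author": [0]}))    # {'paper': [0, 1]}
print(one_hop(rowptr_dict, col_dict, {"paper": [0, 2]}))  # {'paper': [1, 0]}
```

Node-level inputs such as seed_dict are keyed by node type, whereas edge-level inputs such as rowptr_dict, col_dict and num_neighbors_dict are keyed by edge type.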

subgraph(rowptr: Tensor, col: Tensor, nodes: Tensor, return_edge_id: bool = True) → Tuple[Tensor, Tensor, Optional[Tensor]]

Returns the induced subgraph of the graph given by (rowptr, col), containing only the nodes in nodes.

Parameters:

rowptr (Tensor) – Compressed source node indices.
col (Tensor) – Target node indices.
nodes (Tensor) – Node indices of the induced subgraph.
return_edge_id (bool, default: True) – If set to False, will not return the indices of edges of the original graph contained in the induced subgraph.

Returns:

Tuple[Tensor, Tensor, Optional[Tensor]] – Compressed source node indices and target node indices of the induced subgraph. In addition, may return the indices of edges of the original graph.
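What an induced subgraph in CSR form looks like can be sketched in pure Python as follows. The helper induced_subgraph is hypothetical, and the sketch assumes that nodes are relabeled to 0..len(nodes)-1 in the order given by nodes; the relabeling used by the real function may differ.

```python
from typing import List, Tuple


def induced_subgraph(rowptr: List[int], col: List[int], nodes: List[int]
                     ) -> Tuple[List[int], List[int], List[int]]:
    """Return (sub_rowptr, sub_col, edge_id) for the subgraph induced by nodes."""
    local = {v: i for i, v in enumerate(nodes)}  # original id -> new id
    sub_rowptr, sub_col, edge_id = [0], [], []
    for v in nodes:
        for e in range(rowptr[v], rowptr[v + 1]):
            w = col[e]
            if w in local:                # keep edges with both endpoints in nodes
                sub_col.append(local[w])
                edge_id.append(e)         # position of the edge in the original col
        sub_rowptr.append(len(sub_col))
    return sub_rowptr, sub_col, edge_id


rowptr = [0, 2, 3, 3, 4]
col = [1, 2, 2, 0]
sub_rowptr, sub_col, edge_id = induced_subgraph(rowptr, col, nodes=[0, 2, 3])
print(sub_rowptr)  # [0, 1, 1, 2]
print(sub_col)     # [1, 0]
print(edge_id)     # [1, 3]
```

Here the edge 0 → 1 is dropped because node 1 is not part of nodes, while 0 → 2 and 3 → 0 survive with relabeled endpoints.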

random_walk(rowptr: Tensor, col: Tensor, seed: Tensor, walk_length: int, p: float = 1.0, q: float = 1.0) → Tensor

Samples random walks of length walk_length from all node indices in seed in the graph given by (rowptr, col), as described in the “node2vec: Scalable Feature Learning for Networks” paper.

Parameters:

rowptr (Tensor) – Compressed source node indices.
col (Tensor) – Target node indices.
seed (Tensor) – Seed node indices from where random walks start.
walk_length (int) – The walk length of a random walk.
p (float, default: 1.0) – Return parameter of node2vec, controlling the likelihood of immediately revisiting a node in the walk.
q (float, default: 1.0) – In-out parameter of node2vec, interpolating between a breadth-first and a depth-first exploration strategy.

Returns:

Tensor – A tensor of shape [seed.size(0), walk_length + 1], holding the node indices of each walk for each seed node.
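The role of p and q can be illustrated with a pure-Python sketch of a single second-order walk following the transition weights from the node2vec paper (1/p for returning to the previous node, 1 for a common neighbor of the previous node, 1/q otherwise). The function node2vec_walk is hypothetical, and staying in place at a node without outgoing edges is an assumption of this sketch rather than documented behavior.

```python
import random
from typing import List


def node2vec_walk(rowptr: List[int], col: List[int], start: int,
                  walk_length: int, p: float, q: float,
                  rng: random.Random) -> List[int]:
    """Sample one walk of walk_length steps, returning walk_length + 1 nodes."""
    walk = [start]
    prev = None
    for _ in range(walk_length):
        cur = walk[-1]
        nbrs = col[rowptr[cur]:rowptr[cur + 1]]
        if not nbrs:
            walk.append(cur)  # dead end: stay in place (assumption of this sketch)
            continue
        if prev is None:
            nxt = rng.choice(nbrs)  # first step is uniform
        else:
            prev_nbrs = set(col[rowptr[prev]:rowptr[prev + 1]])
            weights = []
            for w in nbrs:
                if w == prev:
                    weights.append(1.0 / p)  # return to the previous node
                elif w in prev_nbrs:
                    weights.append(1.0)      # stay close to the previous node
                else:
                    weights.append(1.0 / q)  # move further away
            nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk


# Undirected path graph 0 - 1 - 2 stored as CSR.
rowptr = [0, 1, 3, 4]
col = [1, 0, 2, 1]
walk = node2vec_walk(rowptr, col, start=0, walk_length=4, p=1.0, q=1.0,
                     rng=random.Random(0))
print(len(walk))  # 5
```

Running this for every entry in seed and stacking the results row by row would give a structure of shape [seed.size(0), walk_length + 1].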