

```python
from multiprocessing import Queue
from torch_geometric.datasets import Reddit
from quiver import AutoBatch, ServingSampler, ServerInference

# Define dataset and input stream
dataset = Reddit(...)
stream_input_queue = Queue()

# Instantiate the auto batch component
request_batcher = RequestBatcher(stream_input_queue, ...)
batched_queue_list = request_batcher.batched_request_queue_list()

# Instantiate the sampler component
hybrid_sampler = HybridSampler(dataset, batched_queue_list, ...)
sampled_queue_list = hybrid_sampler.sampled_request_queue_list()

# Instantiate the inference server component
server = InferenceServer(model_path, dataset, sampled_queue_list, ...)
result_queue_list = server.result_queue_list()
```

A full example using Quiver to serve a GNN model with the Reddit dataset on a single machine can be found here.
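For illustration only, a client process might interact with this pipeline as sketched below; the request payload (a tensor of seed node IDs) and the reply routing are assumptions about the queue protocol, not documented Quiver behaviour:

```python
import torch

# Hypothetical client: enqueue a request (assumed to be a tensor of seed
# node IDs) and block on the first result queue for the model's output.
# See the full serving example linked above for the actual protocol.
request = torch.tensor([101, 102, 103])
stream_input_queue.put(request)
result = result_queue_list[0].get()
```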

Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and to govern the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the *probabilistic sampled graph size*, a metric that predicts the degree of parallelism in graph sampling; Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling. (2) For feature aggregation, Quiver relies on the *feature access probability* to decide which features to partition and replicate across a distributed GPU NUMA topology.
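As a rough illustration of the first metric, the expected size of a sampled computation graph can be estimated from the batch size, the per-layer fan-outs, and the average degree, and the request routed accordingly. The estimator and threshold below are simplified assumptions for exposition, not Quiver's actual implementation:

```python
def expected_sampled_size(batch_size, fanouts, avg_degree):
    """Crude estimate of the number of nodes a sampling request touches."""
    total, frontier = 0.0, float(batch_size)
    for fanout in fanouts:
        # Each frontier node draws at most `fanout` neighbours, and cannot
        # draw more than the (average) number of neighbours it has.
        frontier *= min(fanout, avg_degree)
        total += frontier
    return total

# Route to the GPU only when the predicted parallelism justifies it.
GPU_THRESHOLD = 100_000  # illustrative constant, not taken from Quiver
use_gpu = expected_sampled_size(1024, [25, 10], avg_degree=50.0) > GPU_THRESHOLD
```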
Below is a figure that describes a benchmark evaluating the serving performance of Quiver, PyG (2.0.3) and DGL (1.0.2) on a 2-GPU server running GraphSage on the Reddit dataset. Quiver achieves up to 35$\times$ lower latency with an 8$\times$ higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).

The primary motivation for this project is to make it easy to take a PyG program and scale it across many GPUs and CPUs. A typical scenario is: users develop graph learning programs with the easy-to-use APIs of PyG, and rely on Quiver to run these PyG programs at large scale. To make such scaling effective, Quiver has several novel features:

- **High performance**: Quiver enables GPUs to be used effectively in accelerating performance-critical graph learning tasks: graph sampling, feature collection and data-parallel training. Quiver thus often significantly out-performs PyG and DGL even with a single GPU (see the benchmark results below), especially when processing large-scale datasets and models.
- **High scalability**: Quiver can achieve (super-)linear scalability in distributed graph learning. This is contributed by Quiver's novel adaptive data/feature/processor management techniques and its effective usage of fast networking technologies (e.g., NVLink and RDMA).
- **Easy to use**: To use Quiver, developers only need to add a few lines of code to existing PyG programs. Quiver is thus easy for PyG users to adopt and to deploy in production clusters.

Feature aggregation is one of the main performance bottlenecks of GNN systems. Quiver enables faster feature aggregation with the following techniques:

- Quiver uses the feature access probability metric to place popular features strategically on GPUs. A primary objective of feature placement is to enable GPUs to take advantage of low-latency connectivity, such as NVLink and InfiniBand, to their peer GPUs. This allows GPUs to achieve low-latency access to features even when those features are not available in local GPU memory.
- Quiver uses GPU kernels that can leverage efficient one-sided reads to access remote features over NVLink/InfiniBand. More details of our feature aggregation techniques can be found in our repo quiver-feature.
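Concretely, this placement is configured through `quiver.Feature`. The sketch below follows the API used in Quiver's quick-start examples; the cache size and policy values are illustrative assumptions, and `data` is assumed to be a PyG graph object:

```python
import quiver

# Rank hot features by access probability (derived from the graph topology)
# and cache them in GPU memory; "device_replicate" keeps a replica of the
# hottest features on every listed GPU.
csr_topo = quiver.CSRTopo(data.edge_index)
quiver_feature = quiver.Feature(
    rank=0,                           # GPU that owns this Feature instance
    device_list=[0, 1],               # GPUs participating in the cache
    device_cache_size="110M",         # per-GPU cache budget (illustrative)
    cache_policy="device_replicate",  # replicate hot features on each device
    csr_topo=csr_topo,
)
quiver_feature.from_cpu_tensor(data.x)  # hand the CPU feature tensor to Quiver
```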

Below is a chart that describes a benchmark evaluating the performance of Quiver, PyG (2.0.1) and DGL (0.7.0) on a 4-GPU server running the Open Graph Benchmark.

For system design details, see Quiver's design overview (Chinese version: 设计简介).

Adopting Quiver in an existing PyG program takes three steps:

```python
import torch
import quiver

# Step 1: Replace PyG graph sampler
# train_loader = NeighborSampler(data.edge_index, ...)  # Comment out PyG sampler
train_loader = torch.utils.data.DataLoader(train_idx)   # Quiver: PyTorch Dataloader
quiver_sampler = quiver.pyg.GraphSageSampler(
    quiver.CSRTopo(data.edge_index), sizes=[...])       # Quiver: Graph sampler

# Step 2: Replace PyG feature collectors
# feature = data.x.to(device)                           # Comment out PyG feature collector
quiver_feature = quiver.Feature(...)                    # Quiver: Feature collector (args elided)
quiver_feature.from_cpu_tensor(data.x)

# Step 3: Train PyG models with Quiver
# for batch_size, n_id, adjs in train_loader:           # Comment out PyG training loop
for seeds in train_loader:                              # Use PyTorch training loop in Quiver
    n_id, batch_size, adjs = quiver_sampler.sample(seeds)  # Use Quiver graph sampler
```
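For context, a complete training step with these replacements might look like the sketch below; the `model`, `optimizer`, labels `y`, and the `(x, adjs)` forward signature are assumptions carried over from PyG's standard NeighborSampler examples, not Quiver-specific APIs:

```python
import torch
import torch.nn.functional as F

# Assumes `model`, `optimizer`, `y` (labels) and the Quiver objects above are
# already defined; the forward signature follows PyG's GraphSAGE examples.
device = torch.device("cuda:0")
model.train()
for seeds in train_loader:
    n_id, batch_size, adjs = quiver_sampler.sample(seeds)  # Quiver: GPU sampling
    x = quiver_feature[n_id]                               # Quiver: feature gathering
    adjs = [adj.to(device) for adj in adjs]
    optimizer.zero_grad()
    out = model(x, adjs)
    loss = F.nll_loss(out, y[n_id[:batch_size]].to(device))
    loss.backward()
    optimizer.step()
```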
