Amp-powered subgraphs: Move Amp configs from ENV to TOML#6383
Amp-powered subgraphs: Move Amp configs from ENV to TOML#6383
Conversation
Add AmpChainConfig runtime struct in graph/src/components/network_provider/mod.rs with address as parsed Uri, plus token, context_dataset, context_table, and network fields. Add Config::amp_chain_configs() method that produces a HashMap<String, AmpChainConfig> for all chains with an [amp] TOML section. URI parsing is defensive (returns error, not panic) even though ChainSection::validate already rejects invalid URIs. Three unit tests cover constructability, mixed configs, and error handling for invalid addresses.
Replace the single global Amp Flight client with per-chain instances stored in a new AmpClients<AC> wrapper type. Each chain with an [amp] section in the TOML config now gets its own FlightClient, looked up by chain name at the point of use. Key changes: - Add AmpClients<AC> in graph/src/components/network_provider/mod.rs with new(), get(), is_empty() methods - launcher.rs and run.rs: iterate amp_chain_configs() to create per-chain clients; log errors per-chain and continue startup - registrar.rs, instance_manager.rs, amp_subgraph/manager.rs: accept AmpClients instead of Option<Arc<AC>>; look up client by network name from the raw manifest - server/index-node: thread AmpClients through server, service, resolver - Add 3 unit tests for AmpClients wrapper
Improved Amp status logging in launcher.rs and run.rs to provide three distinct log levels based on config state: info when no chains have [amp] configuration (skip connection loop entirely), info with chain names when connections succeed, and warn when all configured chains fail to connect. Added two unit tests verifying AmpClients is_empty() drives Amp manager registration decisions.
…p manifest resolution
Add resolve_amp_start_block function that queries the Amp Flight service for a block hash at a given block number, using auto_block_hash_decoder to detect the block hash column from the returned RecordBatch. In create_subgraph_version, detect Amp subgraphs and use this new path instead of RPC-based block_pointer_from_number when start_block > 0, falling back to RPC on failure.
Remove the GRAPH_AMP_FLIGHT_SERVICE_ADDRESS CLI flag from Opt structs in opt.rs and manager.rs, remove GRAPH_AMP_FLIGHT_SERVICE_TOKEN env var from AmpEnv and Inner envconfig struct. These are now fully superseded by per-chain TOML config. Add unit test verifying AmpEnv constructs correctly without the removed fields.
Verify that full_config.toml parses successfully with the [chains.mainnet.amp] table form and that all five Amp field values (address, token, context_dataset, context_table, network) are correctly deserialized.
…nction Consolidates the duplicated YAML traversal pattern (dataSources[0].network) into a single pub fn in graph::data::subgraph, replacing four identical copies across registrar, instance_manager, amp_subgraph/manager, and index-node resolver.
…e context idents at config load time
|
I feel this would be more logical to configure as a new kind of provider so that in the [chains.mainnet]
shard = "blocks"
provider = [
{ label = "mainnet-amp", details = { type = "amp", url = "https://..", token = "..", pointers = "ethereum.blocks", name = "ethereum-mainnet" },
{ label = "mainnet-rpc", details = { type = "web3", url = "http://..", features = [..] },
..
]For now, we would only allow one |
| /// (e.g., AMP uses `"ethereum-mainnet"` while graph-node uses `"mainnet"`). | ||
| /// This struct is the *runtime* counterpart of the TOML-level `AmpConfig` | ||
| /// (which stores the address as a plain `String`). The `Config::amp_chain_configs()` | ||
| /// method bridges the two by parsing each address string into a `Uri`. |
There was a problem hiding this comment.
You could avoid this almost identical struct by deserializing the Url in the config (have a look at annotations like #[serde(with = ..)] and #[serde(deserialize_with = ..)] in config.rs
core/src/subgraph/registrar.rs
Outdated
| /// This wrapper is used to pass per-chain Amp clients through the system | ||
| /// instead of a single global `Option<Arc<AC>>`. Use `get(chain_name)` to | ||
| /// retrieve the client for a specific chain. | ||
| pub struct AmpClients<AC> { |
There was a problem hiding this comment.
Isn't this pretty much a HashMap? Just a curried type like type AmpClients<V> = HashMap<String, V>;
node/src/manager/commands/run.rs
Outdated
There was a problem hiding this comment.
I believe this is the point where the different config syntax we discussed doesn't matter - Config accepts a different format, but amp_chain_configs can return the same structs as it does now
node/src/manager/commands/run.rs
Outdated
There was a problem hiding this comment.
Since the new config syntax has a label, we should only log the label anywhere where we need to identify the amp server we are talking to. That completely obscures any operational details, like, e.g., host addresses or names
| raw: serde_yaml::Mapping, | ||
| resolver: &Arc<dyn LinkResolver>, | ||
| amp_client: Option<Arc<AC>>, | ||
| amp_context: Option<(String, String)>, |
There was a problem hiding this comment.
Since we seem to be passing amp_context around with amp_client a lot, could we carry the context in the client? It would mostly require an additional method on the Client trait.
|
|
||
| /// Resolves a block pointer for an Amp subgraph by querying the Amp Flight | ||
| /// service for the block hash at the given `block_number`. | ||
| async fn resolve_amp_start_block<AC: amp::Client>( |
There was a problem hiding this comment.
This would be a good candidate for a method on amp::Client
| // Non-Amp subgraph — use RPC. | ||
| _ => resolve_start_block(&manifest, &*chain, &logger).await?, | ||
| } | ||
| } |
There was a problem hiding this comment.
The overall create_subgraph_version function is already not great, but it would be great if these 30 lines of fairly straightforward code lived somewhere in a helper. I am always worried about the readability of code that mixes control logic with basic "get me this thing" code.
| context_dataset: amp.context_dataset.clone(), | ||
| context_table: amp.context_table.clone(), | ||
| context_dataset: normalize_sql_ident(&.context_dataset), | ||
| context_table: normalize_sql_ident(&.context_table), |
There was a problem hiding this comment.
It's definitely good to do the normalization early - I am just not sure if we should do this, since SQL identifiers are case-sensitive, it's just that dealing with identifiers that are not all-lowercase is highly annoying since, at least in Postgres, you need to write select * from "MyTable" since it would interpret select * from MyTable as the same as select * from mytable. An alternative would be to use what the user gave us verbatim but enclose each component in double quotes (if it is not already in double-quotes, it gets complicated)
Either way, fine to leave this as-is for now, just something to think about.
| (0, _, _) => None, | ||
| // Amp subgraph with start_block > 0 — try Amp-based resolution. | ||
| (min, Some(client), Some((dataset, table))) => { | ||
| (min, Some(client), Some((dataset, table))) if is_amp_subgraph => { |
There was a problem hiding this comment.
Isn't there some bigger issue if is_amp_subgraph is true and client or context are None (and vice versa)? I think we should produce some sort of internal error for those cases rather than silently do something else (like talk to RPC)
Summary
Amp Flight service configuration moves from a single global endpoint (via
GRAPH_AMP_FLIGHT_SERVICE_ADDRESSenv var or--amp-flight-service-addressCLI flag) to per-chain TOML configuration under[chains.<name>.amp]. This allows each chain to connect to its own Amp Flight service with independent credentials and context tables, which is necessary for multi-chain deployments where different chains are served by different Amp instances.The per-chain config also makes the context table (used to join block hashes and timestamps into query results) explicit rather than discovered at runtime by probing source tables. Previously, manifest resolution would iterate over
source.tablesand try each one as a potential context source, silently skipping failures. Now the operator declarescontext_datasetandcontext_tablein the TOML config, and resolution fails clearly if the context table is missing or lacks required columns.A secondary change adds Amp-based start block resolution: when an Amp subgraph has
start_block > 0, the registrar queries the Amp Flight service for the block hash atstart_block - 1instead of making an RPC call to the chain. This falls back to RPC on failure.Breaking changes
GRAPH_AMP_FLIGHT_SERVICE_ADDRESSenv var and--amp-flight-service-addressCLI flagGRAPH_AMP_FLIGHT_SERVICE_TOKENenv var (nowtokenfield in TOML)ampfield on a chain section changes from a plain string (network alias) to a TOML table withaddress,context_dataset,context_table, and optionaltokenandnetworkfieldsNew config format