Skip to content

Refactor networking#66

Open
mpretty-cyro wants to merge 130 commits intosession-foundation:devfrom
mpretty-cyro:feature/lokinet-testing
Open

Refactor networking#66
mpretty-cyro wants to merge 130 commits intosession-foundation:devfrom
mpretty-cyro:feature/lokinet-testing

Conversation

@mpretty-cyro
Copy link
Collaborator

This PR refactors the session_network to be far more configurable and easier to extend, as well as fix a number of bugs which existed in the original implementation. The main interface has now been genericised with the routing and transport mechanisms abstracted from the client; the updated Request structure also makes it easier to pre-construct requests which should allow for abstracting more of the network requests in the future.

It also includes a number of new configuration options, some particularly useful ones include:

  • The ability to route requests through either Onion Requests, Lokinet or Direct to their destination
  • The ability to use a custom devnet environment

Note: This contains a breaking change for clients which don't currently use networking - ENABLE_ONIONREQ has been renamed to ENABLE_NETWORKING to be more accurate (this defaults to on so any clients which have it disabled will need to update their build flag)

jagerman and others added 30 commits June 29, 2025 18:36
Ports under 1024 are priveledged and were failing (at least on Linux)
when running as a normal user.
We already have oxen-libquic, oxen-logging, and nlohmann via lokinet, so
get them via that nested submodule rather than having a duplicated
submodule in libsession-util itself.

Also removes the macos workaround call to `oxen_logging_add_source_dir`
because that directive no longer does anything.
- LOKINET_EMBEDDED=ON is replaced with LOKINET_FULL=OFF
- LOKINET_BOOTSTRAP was removed
- LOKINET_DAEMON=OFF is not strictly needed (it should be default) but
  makes it clear what we're doing.
- Load libquic before oxenc/oxen-logging so that libquic has a chance to
  set up its oxen-logging, etc. targets before libsession tries.
- Remove unneeded settings to disable tests/docs (these are [now] the
  dependency defaults when not doing a top-level project build).
- Update to depend on proper lokinet::liblokinet target.
• Added missing config options
• Added exponential backoffs for retries (and a retry limit for path building)
• Fixed a couple of issues with the logic to finish refreshing the snode pool
• Fixed a use-after-move issue
• Fixed an issue where the OnionRequestRouter would start trying to make requests before the SnodePool bootstrap was completed
• Added a missing import
• Updated the OnionRequestRouter to wait for the SnodePool to be populated before allowing any requests to be sent
• Updated the SnodePool to make ephemeral connections to refresh it's cache (that way we won't always use seed node connections for subsequent requests on new accounts)
• Fixed some use-after-move issues
• Fixed an issue were the SnodePool bootstrap request response wasn't being handled
• Fixed an infinite loop with the OnionRequestRouter refreshing the SnodePool while a refresh was already running
• Fixed an edge-case where the SnodePool wouldn't trigger a refresh when all nodes are marked as failed
• Added parse and expose the general network settings the clients use (network_time_offset, hardfork_version, softfork_version)
• Added error handling from old logic
• Added 421 retry handling
• Fixed an issue where retrying the snode refresh would cause a deadlock
• Added initial LokinetRouter wrapper
• Added changes that were missing from previous commit
• Updated QuicTransport to be able to send requests to RemoteAddress directly
• Added factory functions for the FileServer endpoints the clients use
• Ran the formatter
• Fixed a linker error
• Fixed a bug where we were incorrectly reporting successful responses as failures
• Added a log when succeeding after a 421 retry (old code had it)
• Added logic to mark a node as failed after a QUIC handshake timeout
• Added a connection status hook and logic to track the connection status
• Added a function to retrieve the current active paths (TODO for the LokinetRouter)
• Added logic so the OnionRequestRouter can observe connection failures to it's guard nodes and trigger path rebuilds when they happen
• Fixed an issue where paths in 'single_path_mode' wouldn't get rebuild
jagerman and others added 13 commits February 24, 2026 19:30
Conflicts in how oxen::logging was being set up was causing an infinite
loop in target library recursion when doing a non-static build.
- We can magically get ==, !=, and inequality via a defaulted operator<=>
- fork_versions is getting put into an `std::atomic<fork_versions>`, and
  clang seems to be unhappy enough about that to add a dependency on
  libatomic (which we aren't currently linking, and so fails).  This
  changes it to 4-byte alignment so that clang knows it can safely do
  a 32-bit atomic exchange on modern CPUs.
Adds a ENABLE_NETWORKING_SROUTER option that controls whether
session-router support gets built when ENABLE_NETWORKING is also
enabled.

This option is disabled for now for mingw windoes CI builds because
session-router isn't currently building on windows (even in embedded
mode) and needs some fixes before it can be turned on, but it is nice to
ensure that the networking code (less session-router) still builds.

Also fixes a win32 compilation error (no such include sys/resource.h).
- Uses the `sys_ms` typedef where appropriate
- Adds epoch_count/_ms/_seconds and duration_count/_ms/_seconds and uses
  them everywhere to extract integers, so that the overly verbose:

      std::chrono::duration_cast<std::chrono::milliseconds>(
          some_timestamp.time_since_epoch()).count()

  found in oh so many places is now just:

      epoch_ms(some_timestamp)
- Instead of a shared_ptr<CancellationToken> we can just hold a
  shared_ptr<atomic<bool>> because that's all the CancellationToken was.
- Slight simplification to add a helper `cancel()` method on the request
- file_metadata uploaded/expiry times are now sys_seconds instead of
  full precision time points because we never have more than second
  precision for them.
- UploadRequest::ttl is now std::chrono::seconds, to keep the duration
  in a type-safe value longer.
- Default initialize the shared cancelled atomic bool
// Trigger the callback based on the response we got
if (resp.timed_out) {
log::debug(cat, "[Request {}] Timed out.", req_id);
return cb(false, true, 408, {content_type_plain_text}, "Request timed out");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to give these status values named constants somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone through and tried to defined error variables for most of these cases (there are a couple of generic failures which have been left as -1 for now)

jagerman and others added 13 commits February 25, 2026 18:20
- the `Response` struct was never acting as a struct but rather just as
  a namespace for a couple of static methods, so I made it a namespace
  instead.
- Made those methods take a string_view instead of string to be able to
  avoid some string copying back at the call site.
- (unrelated) minor cleanup to use modern sys_seconds construction
  rather than older time_t approach.
• Cleaned up `exponential` delay function
• Cleaned up some duration usage
• Cleaned up a lookup-then-erase case
• Cleaned up the "reserved" stream logic and error handling when getting/creating streams
• Removed unnecessary "explicit" from opt constructors
• Removed the concept of an "ephemeral connection" (not really used in the way it used to be, and not used in the way we will eventually want to use it)
• Fixed up risky RequestQueue constructor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants