<fix>[expon]: fix vhost installPath overwrite and test cleanup (#3360)
ZStack-Robot wants to merge 51 commits into 5.5.6 from
Conversation
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
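The timeout-plus-tryNext behavior described above can be sketched generically: attempt each MDS endpoint in order, bound the wait per node with a short timeout, and fall through to the next endpoint on failure. All names here (`MdsCall`, `tryEachMds`) are illustrative stand-ins, not the real ZBS SDK API.

```java
import java.util.List;

// Hypothetical sketch of the "try next MDS" idea: each call carries its own
// short timeout, and a failure on one endpoint immediately moves on to the
// next instead of blocking for minutes on a disconnected node.
public class MdsFailover {
    public interface MdsCall<R> {
        R invoke(String mdsUrl) throws Exception; // real code would use a 30s HTTP timeout
    }

    public static <R> R tryEachMds(List<String> mdsUrls, MdsCall<R> call) {
        Exception last = null;
        for (String url : mdsUrls) {
            try {
                return call.invoke(url); // the per-call timeout bounds the wait here
            } catch (Exception e) {
                last = e;                // remember the failure and try the next MDS
            }
        }
        throw new IllegalStateException("all MDS nodes failed", last);
    }
}
```

With a 30-second per-call timeout, the worst case for N nodes is bounded at roughly 30·N seconds rather than an unbounded hang on the first disconnected node.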
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity Resolves: ZSTAC-79709 Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
Resolves: ZSTAC-78989 Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback Resolves: ZSTAC-81741 Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182 Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
Walkthrough: This PR adds scheme configuration and extensive test support for two external primary storage backends (Expon, Xinfini), adjusts robustness and concurrency edge-case handling across storage, KVM, scheduling, core, and management-node code, and adds null/error handling and logging improvements in many places (including a few subtle interface behavior changes). Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ast-grep (0.41.0):
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
- plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
- utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy (1)
23-24: ⚠️ Potential issue | 🟡 Minor — The comment contains Chinese text, violating the coding guidelines.
The comment on line 23 contains the Chinese time notation 下午2:28 (2:28 PM); per the coding guidelines, code (including comments) must not contain Chinese. Suggested fix:

```diff
-/**
- * @author Xingwei Yu
- * @date 2024/4/19 下午2:28
- */
+/**
+ * @author Xingwei Yu
+ * @date 2024/4/19 14:28
+ */
```

As per coding guidelines:
**/*.*: Code must not contain Chinese; error messages, comments, and the like should all be written in correct, typo-free English.
🧹 Nitpick comments (3)
testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy (1)
1010-1020: Consider consolidating the duplicated null checks. In the /:flatten and /:rollback handlers, the vol == null check can be moved up front, avoiding the current pattern of mutating state before checking for null. Suggested optimization (using /:flatten as an example):

```diff
 // 14. POST /afa/v1/bs-volumes/{id}/:flatten - FlattenVolumeRequest
 simulator("/afa/v1/bs-volumes/\\d+/:flatten") { HttpServletRequest req, HttpEntity<String> e, EnvSpec spec ->
     int volId = extractIdFromUri(req.getRequestURI())
     def vol = volumes.get(volId)
-    if (vol != null) {
-        vol.spec.flattened = true
-    }
     if (vol == null) {
         return makeNotFoundResponse()
     }
+    vol.spec.flattened = true
     return makeItemResponse(vol)
 }
```

Also applies to: 1023-1030
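The reordering suggested above generalizes to a simple rule: validate the lookup result before mutating any state. A minimal plain-Java sketch of the same handler shape (the `Volume` class and map are hypothetical stand-ins for the Groovy simulator's state):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Look up, null-check, and only then mutate — never mutate first and check
// afterwards. Mirrors the /:flatten handler fix in plain Java.
public class FlattenHandler {
    public static class Volume {
        public boolean flattened;
    }

    private final Map<Integer, Volume> volumes = new ConcurrentHashMap<>();

    public String flatten(int volId) {
        Volume vol = volumes.get(volId);
        if (vol == null) {
            return "404";        // reject before touching any state
        }
        vol.flattened = true;     // mutate only after the null check passed
        return "200";
    }

    public Map<Integer, Volume> volumes() {
        return volumes;
    }
}
```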
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/xinfini/XinfiniPrimaryStorageCase.groovy (1)
372-394: Consider using retryInSecs instead of the fixed sleep(1000). The current sleep(1000) waits for an async operation to finish, which can make the test flaky (too short on slow environments, wasted time on fast ones). Consider the retryInSecs pattern already used at lines 396-399. Suggested optimization:

```diff
-    sleep(1000)
-    // vm in running, not deactivate volume
-    bdev = controller.apiHelper.queryBdcBdevByVolumeIdAndBdcId(volId, bdc.spec.id)
-    assert bdev != null
+    retryInSecs {
+        // vm in running, not deactivate volume
+        bdev = controller.apiHelper.queryBdcBdevByVolumeIdAndBdcId(volId, bdc.spec.id)
+        assert bdev != null
+    }
```
plugin/expon/src/main/java/org/zstack/expon/sdk/ExponConnectConfig.java (1)
8-46: Add validation and normalization to the setScheme method. The scheme field is passed directly to OkHttp's HttpUrl.Builder.scheme(), which only accepts "http" or "https". While OkHttp normalizes these case-insensitively, it is better to validate and normalize the input here to prevent potential issues and ensure consistency with RFC 3986. Also trim user input to handle accidental whitespace.
♻️ Recommended implementation (validation + normalization)

```diff
-import java.util.concurrent.TimeUnit;
+import java.util.Locale;
+import java.util.concurrent.TimeUnit;
@@
 public Builder setScheme(String scheme) {
-    config.scheme = scheme;
+    if (scheme == null || scheme.trim().isEmpty()) {
+        config.scheme = "https";
+        return this;
+    }
+    String normalized = scheme.trim().toLowerCase(Locale.ROOT);
+    if (!"http".equals(normalized) && !"https".equals(normalized)) {
+        throw new IllegalArgumentException("scheme must be http or https");
+    }
+    config.scheme = normalized;
     return this;
 }
```
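The normalization rule suggested above can be isolated into a standalone helper so it is unit-testable on its own; this is a sketch of the same logic outside the builder, not the real ExponConnectConfig code:

```java
import java.util.Locale;

// Trim, default empty input to "https", lower-case, and reject anything other
// than http/https — the only schemes OkHttp's HttpUrl.Builder.scheme() accepts.
public class SchemeNormalizer {
    public static String normalize(String scheme) {
        if (scheme == null || scheme.trim().isEmpty()) {
            return "https"; // sensible default for storage endpoints
        }
        String n = scheme.trim().toLowerCase(Locale.ROOT);
        if (!"http".equals(n) && !"https".equals(n)) {
            throw new IllegalArgumentException("scheme must be http or https");
        }
        return n;
    }
}
```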
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
…talling In dual management node scenarios, concurrent modifications to the consistent hash ring from heartbeat reconciliation and canonical event callbacks can cause NodeHash/Nodes inconsistency, leading to message routing failures and task timeouts. Fix: (1) synchronized all ResourceDestinationMakerImpl methods to ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in ManagementNodeManagerImpl to serialize heartbeat reconciliation with event callbacks, (3) added two-round delayed confirmation before removing nodes from hash ring to avoid race with NodeJoin events. Resolves: ZSTAC-77711 Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
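The atomic nodeHash+nodes property described in fix (1) above can be sketched with a TreeMap-based consistent-hash ring whose mutators and lookup share one monitor. This is a toy illustration of the synchronization idea only, not the real ResourceDestinationMakerImpl:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring where add/remove/lookup all synchronize on the
// same monitor, so a lookup can never observe a half-applied membership
// change — the invariant the synchronized methods restore for nodeHash+nodes.
public class SyncHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy hash; real rings use stronger hashes
    }

    public synchronized void addNode(String nodeUuid) {
        ring.put(hash(nodeUuid), nodeUuid);
    }

    public synchronized void removeNode(String nodeUuid) {
        ring.remove(hash(nodeUuid));
    }

    public synchronized String destinationFor(String resourceUuid) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no management nodes in ring");
        }
        // walk clockwise: first node at or after the resource's hash, wrapping around
        SortedMap<Integer, String> tail = ring.tailMap(hash(resourceUuid));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```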
<fix>[network]: filter reserved IPs in getFreeIp See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor See merge request zstackio/zstack!9169
The mdsUrls field in ExternalPrimaryStorage config contains user:password@host format credentials. Add desensitization to mask credentials as ***@host in API/CLI output. Resolves: ZSTAC-80664 Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
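The masking rule described above can be demonstrated with a small regex helper. This is only a sketch of the desensitization idea (the actual field layout of the ExternalPrimaryStorage config is not shown here), collapsing each user:password@ prefix to ***@:

```java
// Collapse the user:password@ part of each mds URL to ***@, applied per
// comma-separated URL. The character classes exclude '/', '@', ',' and
// whitespace so matching stays inside a single URL's authority section.
public class MdsUrlMasker {
    public static String mask(String mdsUrls) {
        return mdsUrls.replaceAll("[^/@,\\s]+:[^/@,\\s]+@", "***@");
    }
}
```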
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to CloudOperationsErrorCode for use in premium volumebackup module. Resolves: ZSTAC-82195 Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally, triggering ServiceLoader for ContextStorageProvider even when telemetry is disabled. If sentry-opentelemetry-bootstrap jar is on classpath, ServiceLoader fails with "not a subtype" due to ClassLoader isolation in Tomcat, throwing ServiceConfigurationError (extends Error) that escapes all catch(Exception) blocks.
1. Why is this change necessary? MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because Context.current() triggers ServiceLoader unconditionally in SyncTaskFuture constructor, even when telemetry is disabled.
2. How does it address the problem? Only call Context.current() when isTelemetryEnabled() returns true, matching the existing guard pattern used in other DispatchQueueImpl code paths (lines 351, 1069).
3. Are there any side effects? None. When telemetry is disabled, parentContext was never used.
# Summary of changes (by module):
- core/thread: conditionalize Context.current() in SyncTaskFuture
Related: ZSTAC-82275 Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
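The guard pattern in the commit above — only resolve a fragile context when telemetry is enabled — can be sketched generically. The names here are hypothetical; the real code guards io.opentelemetry.context.Context.current():

```java
import java.util.function.Supplier;

// Capture a parent context only when telemetry is enabled, so a disabled
// deployment never invokes the supplier (and never triggers the ServiceLoader
// lookup hidden inside it, which is what crashed MN startup).
public class GuardedContextCapture {
    public static <C> C captureParent(boolean telemetryEnabled, Supplier<C> contextCurrent) {
        if (!telemetryEnabled) {
            return null;              // parentContext is never used when disabled
        }
        return contextCurrent.get();  // e.g. Context.current() in the real code
    }
}
```

The key property: with telemetry disabled, the supplier is never evaluated, so any Error it would throw (such as a ServiceConfigurationError) cannot escape.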
<fix>[thread]: guard Context.current() with telemetry check See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195 Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields See merge request zstackio/zstack!9171
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
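The orphaned-set-with-TTL idea described above can be sketched as a timestamped map: entries record when they were orphaned and are dropped only once the TTL elapses, rather than being removed the moment the MN departs. Names are illustrative; the real TTL is 10 minutes and configurable:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// VMs moved here stay skip-traced until the TTL expires, preventing a false
// "Stopped" detection (and HA trigger) while kvmagent is still starting them.
public class OrphanedSkipSet {
    private final Map<String, Long> orphaned = new ConcurrentHashMap<>();
    private final long ttlMs;

    public OrphanedSkipSet(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Record the moment a VM's skip-trace entry was orphaned by a departing MN. */
    public void orphan(String vmUuid, long nowMs) {
        orphaned.put(vmUuid, nowMs);
    }

    /** Still skip-traced iff present and not yet expired; expired entries are pruned. */
    public boolean isSkipTraced(String vmUuid, long nowMs) {
        Long at = orphaned.get(vmUuid);
        if (at == null) {
            return false;
        }
        if (nowMs - at >= ttlMs) {
            orphaned.remove(vmUuid, at); // TTL elapsed: resume tracing
            return false;
        }
        return true;
    }
}
```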
…anup Resolves: ZSTAC-80821 Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-77544 Change-Id: I1f711bff9c1e87a8cbf6a2eb310ca6086f0f99ba
Resolves: ZSTAC-80821 Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
…V bond check add new error code constant for SR-IOV bond slave NIC count validation. Resolves: ZSTAC-81163 Change-Id: Ie2a74411129a98c3c03a4a085e94a3bd45922da5
Resolves: ZSTAC-81354 Change-Id: Iff2131b3a878444fa27641f24dd727fe4fa176fb
- UpdateQueryImpl: guard val.getClass() NPE
- LogSafeGson: return JsonNull when input is null
- HostAllocatorChain: null check completion
- VmInstanceVO: use Objects.equals to avoid NPE
- SessionManagerImpl: guard null session
- VmCapabilitiesJudger: guard null PS type result
- CephPSMonBase: guard null self when evicted
- CephPSBase: guard null mon.getSelf()
- HostBase: guard null self when HostVO deleted
- ExternalPSFactory: guard null URI protocol
- LocalStorageBase: guard null errorCode.getCause()
Resolves: ZSTAC-69300, ZSTAC-69957, ZSTAC-71973, ZSTAC-81294, ZSTAC-70180, ZSTAC-70181, ZSTAC-78309, ZSTAC-78310, ZSTAC-70668, ZSTAC-71909, ZSTAC-80555, ZSTAC-81270, ZSTAC-70101, ZSTAC-72034, ZSTAC-73197, ZSTAC-79921, ZSTAC-81160, ZSTAC-81224, ZSTAC-81805, ZSTAC-72304, ZSTAC-81804, ZSTAC-74898, ZSTAC-69215, ZSTAC-70151, ZSTAC-68933 Change-Id: I910e9b542ecd254fdf7e956f943316988a56a1f9
- LdapUtil: CRE to OperationFailureException
- QueryFacadeImpl: CRE to OperationFailureException
- HostAllocatorManagerImpl: CRE to warn + clamp
- CloudOperationsErrorCode: add LDAP/PROMETHEUS codes
Resolves: ZSTAC-81334 Change-Id: Iab947b0476e9174d5a61baa095847b521b1f59fa
<fix>[pciDevice]: add error code ORG_ZSTACK_PCIDEVICE_10077 See merge request zstackio/zstack!9205
Add ORG_ZSTACK_AI_10134 for ModelCenter disconnected check and ORG_ZSTACK_PCIDEVICE_10077 for SR-IOV bond validation. Resolves: ZSTAC-72783 Change-Id: I504a415a6e822513df955be600188ae88e2e1058
fix(ai): add error codes for AI and PCI modules [ZSTAC-72783] See merge request zstackio/zstack!9215
<fix>[multi]: batch fix CRE quality issues See merge request zstackio/zstack!9214
<fix>[volumebackup]: add backup cancel timeout error code See merge request zstackio/zstack!9204
Companion DB migration for premium MR !13025. Adds normalizedModelName column and index to GpuDeviceSpecVO for GPU spec dedup by normalized model name. Resolves: ZSTAC-75319 Change-Id: If15e615bcbda955cc1d6c58527bae27d4af4b497
<fix>[compute]: respect vm.migrationQuantity during host maintenance See merge request zstackio/zstack!9209
…ttachedPSMountPath NPE Resolves: ZSTAC-69300 Change-Id: I1b39a9e7b76751e8a4ef4cc53e9ac2028386e334
Resolves: ZSTAC-61988 Change-Id: Id3d5a48801fda21e2a51d96949c743bac254b2e6
<fix>[kvm]: configurable orphan skip timeout See merge request zstackio/zstack!9203
Resolves: ZSTAC-61988 Change-Id: I0908fce97128904f9954198645290f4e5709252e
Comment from yaohua.wu: ⚡ Updated Review — this review supersedes the previous reviews listed below.
Review: MR !9194 — ZSTAC-82153
MR scope: 23 files changed, covering the CBD path-overwrite fix, configurable Expon/Xinfini scheme, a LongJob suspend race fix, query pagination stability, quota null-pointer guards, VM device info sync, DHCP/Userdata null guards, an ApplianceVm device-name correction, a SQL data-cleanup migration, and test refactoring.
Jira context: ZSTAC-82153 — refactor the Expon/Xinfini cases so they no longer depend on a real storage environment. The MR actually contains several production fixes well beyond the Jira description.
🟡 Warning
🟢 Suggestion
✅ Highlights
Upstream Freshness
Verdict: APPROVED with warnings. The core fixes (CBD path overwrite, LongJob suspended state, pagination stability, quota NPE) are on the right track, and the test refactoring is high quality. The warning items should be evaluated one by one, especially the GROUP BY + ORDER BY ordering issue and the LongJob TOCTOU window. 🤖 Robot Reviewer
<fix>[multi]: batch guard NPE quality issues See merge request zstackio/zstack!9213
<fix>[telemetry]: fix Sentry transaction loss and add debug logging See merge request zstackio/zstack!9220
<fix>[gpu]: add normalizedModelName migration SQL See merge request zstackio/zstack!9218
<fix>[zbs]: sync MDS node statuses to DB when reconnect fails See merge request zstackio/zstack!9161
Add missing error codes ORG_ZSTACK_STORAGE_PRIMARY_10048 and ORG_ZSTACK_STORAGE_PRIMARY_10051 to all 10 language files. Replace zh_CN mistranslations with the correct terms. Fix garbled characters in zh_TW error messages. Resolves: ZSTAC-72656 Change-Id: I5f08109d1c415b751ec130285b9d92522f1e0a34
fix(ZSTAC-72656): improve i18n error messages for PS UUID conflicts See merge request zstackio/zstack!9224
Fix CBD KvmCbdNodeServer overwriting vhost volume installPath and add Expon/Xinfini storage API simulators for integration test.
1. Why is this change necessary? KvmCbdNodeServer.convertPathIfNeeded unconditionally returned the original installPath for non-CBD protocols, causing convertAndSetPathIfNeeded to overwrite the /var/run/vhost path that KvmVhostNodeServer had already set. This made the ExponPrimaryStorageCase test fail on installPath assertion. Additionally, vol2 was not cleaned up in testClean, causing a ConstraintViolationException during env.delete.
2. How does it address the problem?
- Changed convertPathIfNeeded to return null for non-CBD protocols so the setter is skipped
- Added vol2 cleanup in testClean to prevent FK violation
- Added Expon API simulators in ExternalPrimaryStorageSpec
- Updated ExponStorageController and SDK for test support
- Added ExternalPrimaryStorageFactory null host guard
- Added XinfiniPrimaryStorageCase integration test
3. Are there any side effects? None. The CBD fix only affects non-CBD protocol volumes which should not have been modified by CBD code.
# Summary of changes (by module):
- cbd: fix convertPathIfNeeded to skip non-CBD protocols
- expon: add scheme support for ExponStorageController
- expon/sdk: update ExponClient and ExponConnectConfig
- storage: add null host check in ExternalPSFactory
- test: fix vol2 cleanup and update test assertions
- test: add XinfiniPrimaryStorageCase
- testlib: add Expon/Xinfini API simulators
Related: ZSTAC-82153 Change-Id: I14f8d5c9155dccf2803566d133813ca5675feb76
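The convert-then-set flow fixed above can be modeled in a few lines: the converter returns null for protocols it does not own, and the caller only overwrites the path on a non-null result. These are hypothetical simplified shapes, not the actual KvmCbdNodeServer API:

```java
// A protocol-specific converter returns null for paths it does not own, and
// the caller skips the setter on null — so CBD code can no longer clobber a
// vhost path another node server already set. Names are illustrative.
public class PathConversion {
    /** Returns a converted path for cbd:// URLs, null for everything else. */
    public static String convertPathIfNeeded(String installPath) {
        if (installPath != null && installPath.startsWith("cbd://")) {
            return "/dev/cbd/" + installPath.substring("cbd://".length());
        }
        return null; // not a CBD path: leave it alone
    }

    /** Mirrors convertAndSetPathIfNeeded: only overwrite on a non-null result. */
    public static String convertAndSet(String currentPath) {
        String converted = convertPathIfNeeded(currentPath);
        return converted != null ? converted : currentPath;
    }
}
```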
da1858b to b217085
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)
79-93: ⚠️ Potential issue | 🔴 Critical — Bug: Map.put() returns the previous value (null), so getNodeInfo always returns null on a cache miss.
Map.put(key, value) returns the old value previously associated with the key. Since the key is absent from nodes when this branch is entered, put returns null, info is assigned null, and line 92 returns null. Callers receiving null are very likely to hit an NPE. 🐛 Fix: separate the put from the return value.

```diff
 public synchronized NodeInfo getNodeInfo(String nodeUuid) {
     NodeInfo info = nodes.get(nodeUuid);
     if (info == null) {
         ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
         if (vo == null) {
             throw new ManagementNodeNotFoundException(nodeUuid);
         }
         nodeHash.add(nodeUuid);
-        info = nodes.put(nodeUuid, new NodeInfo(vo));
+        info = new NodeInfo(vo);
+        nodes.put(nodeUuid, info);
     }
     return info;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java` around lines 79 - 93, The getNodeInfo method incorrectly assigns the return of nodes.put(...) (which is the previous value, null on cache miss) to info; change the flow in getNodeInfo so that after loading ManagementNodeVO via dbf.findByUuid(...) and creating a new NodeInfo(vo) you store that new instance into a local variable (e.g. newInfo), add nodeUuid to nodeHash, put newInfo into the nodes map, and then return newInfo; keep the ManagementNodeNotFoundException behavior and the synchronized signature intact (references: getNodeInfo, NodeInfo, nodes, nodeHash, ManagementNodeVO, dbf.findByUuid).
compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java (1)
56-64: ⚠️ Potential issue | 🟡 Minor — The UsedIpVO update happens before the null check, which can leave a dangling foreign-key reference.
In afterAddIpAddress, the SQL on line 58 unconditionally sets UsedIpVO.vmNicUuid to vmNicUUid, and this runs before the null check on line 61. If the VmNicVO has already been deleted (i.e. nic == null), the UsedIpVO ends up holding a foreign key to a nonexistent NIC UUID, orphaning the data. By contrast, in afterDelIpAddress the null check is correctly placed at the very start of the method (line 95), so the two handlers use inconsistent strategies. Consider moving the SQL update on line 58 to after the null check (or likewise skipping it when nic is null):
🐛 Suggested ordering fix

```diff
 @Override
 public void afterAddIpAddress(String vmNicUUid, String usedIpUuid) {
-    /* update UsedIpVO */
-    SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update();
-
     VmNicVO nic = Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find();
     if (nic == null) {
         logger.debug(String.format("VmNic[uuid:%s] not found, skip afterAddIpAddress", vmNicUUid));
         return;
     }
+
+    /* update UsedIpVO */
+    SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java` around lines 56 - 64, The SQL update that sets UsedIpVO.vmNicUuid is executed before verifying the VmNicVO exists, risking a dangling FK; in the afterAddIpAddress method move the SQL.New(UsedIpVO.class)...update() call to after the VmNicVO nic null-check (the Q.New(VmNicVO.class).eq(...).find() block) or guard it with an early return so that when nic == null you skip updating UsedIpVO; update references: afterAddIpAddress, UsedIpVO, VmNicVO, vmNicUUid, usedIpUuid.
header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java (2)
53-60: ⚠️ Potential issue | 🔴 Critical — VmRunningFromUnknownStateHostChanged's match condition contradicts its name (likely a copy-paste error).
VmRunningFromUnknownStateHostChanged (lines 53-60) and VmRunningFromUnknownStateHostNotChanged (lines 45-52) have identical match() conditions: both assert currentHostUuid == originalHostUuid (host unchanged). Because getVmAbnormalLifeCycleOperationFromStruct uses findFirst(), VmRunningFromUnknownStateHostChanged can never be returned: when the host is unchanged, VmRunningFromUnknownStateHostNotChanged matches first; when the host actually changed, neither matches, leaving this enum value as dead code. Per the naming semantics, the equality check should be negated:
🐛 Suggested fix

```diff
 VmRunningFromUnknownStateHostChanged {
     @Override
     boolean match(VmAbnormalLifeCycleStruct struct) {
         return struct.getOriginalState() == VmInstanceState.Unknown
                 && struct.getCurrentState() == VmInstanceState.Running
-                && struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid());
+                && !Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid());
     }
 },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java` around lines 53 - 60, The enum value VmRunningFromUnknownStateHostChanged in VmAbnormalLifeCycleStruct has the same match() predicate as VmRunningFromUnknownStateHostNotChanged, causing it to never match; update VmRunningFromUnknownStateHostChanged.match() to invert the host comparison so it returns true when struct.getCurrentHostUuid() does NOT equal struct.getOriginalHostUuid(), leaving other checks (original state == Unknown and current state == Running) unchanged so getVmAbnormalLifeCycleOperationFromStruct can correctly find the "host changed" case.
42-42: ⚠️ Potential issue | 🟠 Major — This null-safety fix is incomplete: the remaining 10 .equals() calls still risk NPE.
This PR only fixes VmStoppedFromPausedStateHostNotChanged (line 75), but the following 10 match() methods still call .equals() directly where currentHostUuid may be null, which would throw a NullPointerException:

| Enum constant | Line | Risky call |
| --- | --- | --- |
| VmPausedFromUnknownStateHostNotChanged | 42 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmRunningFromUnknownStateHostNotChanged | 50 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmRunningFromUnknownStateHostChanged | 58 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmStoppedFromUnknownStateHostNotChanged | 67 | struct.getCurrentHostUuid().equals(struct.getVmLastHostUuid()) |
| VmPausedFromRunningStateHostNotChanged | 107 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmRunningFromPausedStateHostNotChanged | 115 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmPausedFromMigratingStateHostNotChanged | 123 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmCrashedFromRunningStateHostNotChanged | 131 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmRunningFromCrashedStateHostNotChanged | 139 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |
| VmStoppedFromCrashedStateHostNotChanged | 155 | struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) |

Please replace all of the above calls with Objects.equals(), consistent with the already-fixed location.
🛡️ Suggested fix (using VmPausedFromRunningStateHostNotChanged as an example; the rest follow the same pattern)

```diff
-                && struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid());
+                && Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid());
```
Also applies to: 50-50, 58-58, 67-67, 107-107, 115-115, 123-123, 131-131, 139-139, 155-155
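The motivation for the blanket Objects.equals replacement is visible in a one-liner: it is null-safe on both sides, whereas a.equals(b) throws an NPE when a is null — exactly the currentHostUuid == null case listed above. Plain Java, no project code:

```java
import java.util.Objects;

// Objects.equals(a, b) returns true when both are null, false when exactly
// one is null, and falls back to a.equals(b) otherwise — no NPE possible.
public class NullSafeEquals {
    public static boolean sameHost(String currentHostUuid, String originalHostUuid) {
        return Objects.equals(currentHostUuid, originalHostUuid);
    }
}
```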
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java` at line 42, The fix is incomplete: multiple enum constants' match() implementations call struct.getCurrentHostUuid().equals(...) and can NPE when currentHostUuid is null; update each listed enum constant (VmPausedFromUnknownStateHostNotChanged, VmRunningFromUnknownStateHostNotChanged, VmRunningFromUnknownStateHostChanged, VmStoppedFromUnknownStateHostNotChanged, VmPausedFromRunningStateHostNotChanged, VmRunningFromPausedStateHostNotChanged, VmPausedFromMigratingStateHostNotChanged, VmCrashedFromRunningStateHostNotChanged, VmRunningFromCrashedStateHostNotChanged, VmStoppedFromCrashedStateHostNotChanged) to use Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid()) or Objects.equals(struct.getCurrentHostUuid(), struct.getVmLastHostUuid()) as appropriate (same pattern used in the already-fixed VmStoppedFromPausedStateHostNotChanged); import java.util.Objects if not present.
identity/src/main/java/org/zstack/identity/AuthorizationManager.java (1)
112-122: ⚠️ Potential issue | 🟡 Minor — The exception type thrown by DebugUtils.Assert on line 114 is inconsistent with the newly added check.
DebugUtils.Assert(session != null, ...) throws a RuntimeException when session == null, while the check added on lines 119-122 throws an ApiMessageInterceptionException. Both code paths validate a missing session, but the differing exception types lead to inconsistent error handling in upstream callers.
Per ZStack's design conventions, parameter validation in the API interception layer should uniformly use ApiMessageInterceptionException. Consider replacing the DebugUtils.Assert on line 114 with the same ApiMessageInterceptionException throw, or catching the RuntimeException when the assert fails and converting it, so both paths use a single exception type.
Additionally, lines 66 and 120 both reuse error code ORG_ZSTACK_IDENTITY_10012, but with slightly different semantics (the former: the original session is null; the latter: renewSession returned null). Consider using distinct error codes to aid diagnosis.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@identity/src/main/java/org/zstack/identity/AuthorizationManager.java` around lines 112 - 122, Replace the DebugUtils.Assert in AuthorizationManager where IdentityByPassCheck.NoSessionEvaluation is handled so that when session == null it throws an ApiMessageInterceptionException (using the same err(...) helper) instead of a RuntimeException; locate the block referencing IdentityByPassCheck.NoSessionEvaluation and msg.getSession(), and change the assertion to throw new ApiMessageInterceptionException(err(ORG_ZSTACK_IDENTITY_10012, IdentityErrors.INVALID_SESSION, "NoSessionEvaluation set but no session set in the message: %s", CloudBusGson.toJson(msg))) or alternatively catch the RuntimeException from DebugUtils.Assert and rethrow an ApiMessageInterceptionException to ensure both the NoSessionEvaluation path and the evaluateSession path use the same exception type (and consider using distinct error codes if you want to differentiate original-null vs renewed-null cases).
🧹 Nitpick comments (11)
core/src/main/java/org/zstack/core/log/LogSafeGson.java (1)
211-215: LGTM — the null guard is correct.
Returning JsonNull.INSTANCE is the idiomatic Gson way to represent JSON null, and it prevents NPEs when collection elements are null (L191, L202); the fix is sound. However, the adjacent toJson(Object o) (L218) has the same latent issue: if o is null, o.getClass() will still throw an NPE. Its only current call site, desensitize (L246), already null-checks at the outer layer, so there is no immediate risk, but for consistency consider adding the same guard:
♻️ Suggested companion fix for toJson

```diff
 public static String toJson(Object o) {
+    if (o == null) {
+        return logSafeGson.toJson(JsonNull.INSTANCE);
+    }
     return logSafeGson.toJson(o, getGsonType(o.getClass()));
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/log/LogSafeGson.java` around lines 211 - 215, The toJson(Object o) method in LogSafeGson can NPE when o is null because it calls o.getClass(); add a null guard at the start of LogSafeGson.toJson(Object o) (similar to the existing toJsonTree(Object o) check) and return the appropriate JSON null representation (e.g. the JSON string "null" or use logSafeGson.toJson(JsonNull.INSTANCE)) instead of calling getGsonType(o.getClass()); ensure calls like desensitize continue to work unchanged.
core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java (1)
57-58: LGTM — the NPE fix is correct.
When val is null, null instanceof Collection returns false in Java, so the exception branch is indeed reachable; this change safely avoids the val.getClass() NPE via a ternary expression, and the logic is correct. Optional improvement: in the non-null branch, val.getClass() goes through Class.toString() and produces the redundant "class fully.qualified.Name" format; using val.getClass().getName() yields cleaner output. This is pre-existing behavior rather than something introduced here, so change it as desired.
💡 Optional improvement (non-blocking)

```diff
-    throw new CloudRuntimeException(String.format("for operation IN or NOT IN, a Collection value is expected, but %s got",
-            val == null ? "null" : val.getClass()));
+    throw new CloudRuntimeException(String.format("for operation IN or NOT IN, a Collection value is expected, but %s got",
+            val == null ? "null" : val.getClass().getName()));
```
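The formatting difference the nitpick above is about, demonstrated standalone in plain Java:

```java
import java.util.ArrayList;

// Class.toString() prefixes "class ", while getName() yields just the fully
// qualified name — the cleaner choice for error messages.
public class ClassNameDemo {
    public static String describe(Object val) {
        return val == null ? "null" : val.getClass().getName();
    }

    public static void main(String[] args) {
        Object v = new ArrayList<String>();
        System.out.println(v.getClass().toString()); // class java.util.ArrayList
        System.out.println(describe(v));             // java.util.ArrayList
    }
}
```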
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java` around lines 57 - 58, The exception message in UpdateQueryImpl's throw currently uses val.getClass() which prints the redundant "class ..." prefix; change it to use val.getClass().getName() so the message shows the simple fully-qualified class name (and keep the existing null-safe ternary check that avoids NPE when val is null).
search/src/main/java/org/zstack/query/QueryFacadeImpl.java (1)
381-381: Preserve the original exception context to avoid losing troubleshooting information.
Line 381 currently only passes through e.getMessage(); if upstream code does not log it, the stack trace is lost, making production query failures hard to diagnose. Consider logging here with the exception object first, then throwing the OperationFailureException.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@search/src/main/java/org/zstack/query/QueryFacadeImpl.java` at line 381, In QueryFacadeImpl replace the current throw of OperationFailureException that only uses e.getMessage() by first logging the failure with the exception object (e.g., logger.error("failed to query", e)) and then rethrowing an OperationFailureException that preserves the original cause (attach e as the cause or include the exception in the constructor) so the stacktrace is retained for debugging; update the throw site where OperationFailureException is created to include the original exception.
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)
80-93: Running a database query inside a synchronized block may cause lock contention.
getNodeInfo calls dbf.findByUuid while holding the this monitor lock, which blocks all other synchronized methods (including hot paths such as makeDestination and isManagedByUs) until the DB query completes. If call frequency grows in the future, consider querying the DB outside the lock, then double-checking inside the lock before writing to the cache. With the limited number of management nodes today, the practical impact is small.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java` around lines 80 - 93, getNodeInfo currently holds the synchronized lock while calling dbf.findByUuid, causing unnecessary lock contention; change it to first check nodes.get(nodeUuid) and if missing call dbf.findByUuid outside the synchronized block, then enter synchronized(this) and double-check nodes.get(nodeUuid) again; if still null validate vo (throw ManagementNodeNotFoundException if vo is null), create the new NodeInfo from ManagementNodeVO, add nodeUuid to nodeHash and put the new NodeInfo into nodes (use the newly created NodeInfo as the value to insert and return it), otherwise return the cached NodeInfo—refer to getNodeInfo, nodes, nodeHash, dbf.findByUuid, ManagementNodeVO, NodeInfo, and ManagementNodeNotFoundException.
compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java (1)
149-149: The t != null check is dead code.
In a catch (Throwable t) block, the JVM specification guarantees the caught exception reference t is never null, so the "unknown error" branch of the ternary can never execute and only misleads readers.
♻️ Suggested fix

```diff
-    String errMsg = t != null ? t.toString() : "unknown error";
+    String errMsg = t.toString();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java` at line 149, In HostAllocatorChain's catch (Throwable t) block replace the redundant ternary that checks "t != null ? t.toString() : 'unknown error'" with a direct use of the caught throwable (e.g., t.toString() or t.getMessage()) to form errMsg; remove the dead null branch so the assignment to errMsg uses the guaranteed-non-null t (reference: variable errMsg and the catch (Throwable t) in HostAllocatorChain).
core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java (1)
58-58: Consider extracting 1000 into a named constant to avoid a magic value. This expresses intent and lowers the cost of future adjustments.
♻️ Suggested refactor

```diff
 public final class SentryInitHelper {
     private static final CLogger logger = Utils.getLogger(SentryInitHelper.class);
     private static final Object INIT_LOCK = new Object();
+    private static final int DEFAULT_SENTRY_MAX_QUEUE_SIZE = 1000;
@@
-    options.setMaxQueueSize(1000);
+    options.setMaxQueueSize(DEFAULT_SENTRY_MAX_QUEUE_SIZE);
```
As per coding guidelines: "Avoid magic values: raw, undefined numbers or strings should be replaced with constants or enums."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` at line 58, Replace the magic literal 1000 used in options.setMaxQueueSize(1000) with a well-named constant (e.g., MAX_QUEUE_SIZE) declared near the top of SentryInitHelper (static final int MAX_QUEUE_SIZE = 1000) and then use options.setMaxQueueSize(MAX_QUEUE_SIZE) so the intent is clear and the value is easy to adjust.
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java (3)
513-525: getOrphanTtlMs() is called repeatedly inside the loop; hoist it out.
Same issue as in nodeLeft(): getOrphanTtlMs() (line 520) reads global configuration on every iteration of the for loop; cache it once before the loop to avoid redundant calls.
♻️ Suggested change

```diff
     long now = System.currentTimeMillis();
+    long ttlMs = getOrphanTtlMs();
     for (Map.Entry<String, Long> entry : orphanedSkipVms.entrySet()) {
-        if (now - entry.getValue() >= getOrphanTtlMs()) {
+        if (now - entry.getValue() >= ttlMs) {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines 513 - 525, The loop in cleanupExpiredOrphanedSkipVms() repeatedly calls getOrphanTtlMs(), causing redundant global config reads; cache the TTL once before the for-loop (e.g., long ttl = getOrphanTtlMs()) and use ttl inside the loop when comparing (referencing orphanedSkipVms, getOrphanTtlMs(), and cleanupExpiredOrphanedSkipVms()) so you avoid repeated calls as done similarly in nodeLeft().
469-477: Hoist the getOrphanTtlMs() call out of the loop; eliminate the magic value 60000.
getOrphanTtlMs() reads global configuration on every iteration of the for loop (line 475); cache it once before the loop starts. 60000 (milliseconds per minute) is a magic value and should be extracted into a named constant (the same issue also appears in isVmDoNotNeedToTrace, lines 504-505), which violates the coding guideline "avoid magic values".
♻️ Suggested change
Add a constant to the class (or share it with line 504):

```diff
+    private static final long MILLIS_PER_MINUTE = 60_000L;
```

In the loop:

```diff
-    long now = System.currentTimeMillis();
-    for (String vmUuid : skippedVms) {
-        orphanedSkipVms.put(vmUuid, now);
-        logger.info(String.format("moved VM[uuid:%s] from departed MN[uuid:%s] skip list to orphaned set" +
-                " (will expire in %d minutes)", vmUuid, inv.getUuid(), getOrphanTtlMs() / 60000));
-    }
+    long now = System.currentTimeMillis();
+    long ttlMinutes = getOrphanTtlMs() / MILLIS_PER_MINUTE;
+    for (String vmUuid : skippedVms) {
+        orphanedSkipVms.put(vmUuid, now);
+        logger.info(String.format("moved VM[uuid:%s] from departed MN[uuid:%s] skip list to orphaned set" +
+                " (will expire in %d minutes)", vmUuid, inv.getUuid(), ttlMinutes));
+    }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines 469 - 477, Cache the result of getOrphanTtlMs() before iterating skippedVms and compute the minutes value once to avoid repeated global config reads, and replace the hardcoded 60000 with a named constant (e.g., MS_PER_MINUTE) defined in KvmVmSyncPingTask (or a shared constant used by isVmDoNotNeedToTrace) so the logger uses the cached minutes value; apply the same replacement of the magic 60000 in isVmDoNotNeedToTrace (lines ~504-505) to use the named constant.
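The hoist-the-config-read advice above can be sketched as a small, self-contained example. Note that OrphanTtlSketch, its 30-minute TTL, and cleanupExpired are hypothetical stand-ins for the real KvmVmSyncPingTask members, not the actual ZStack code:

```java
import java.util.HashMap;
import java.util.Map;

public class OrphanTtlSketch {
    private static final long MILLIS_PER_MINUTE = 60_000L;

    // Stand-in for the global-config read the review wants hoisted.
    static long getOrphanTtlMs() {
        return 30 * MILLIS_PER_MINUTE;
    }

    // Removes entries older than the TTL; returns how many were removed.
    static int cleanupExpired(Map<String, Long> orphanedSkipVms, long now) {
        long ttlMs = getOrphanTtlMs(); // read the config once, not per iteration
        int removed = 0;
        var it = orphanedSkipVms.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() >= ttlMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> vms = new HashMap<>();
        vms.put("expired-vm", 0L);
        vms.put("fresh-vm", 100 * MILLIS_PER_MINUTE);
        System.out.println(cleanupExpired(vms, 100 * MILLIS_PER_MINUTE));
    }
}
```

Beyond avoiding redundant reads, the hoist also makes one pass use a single consistent TTL even if the global config changes mid-iteration.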
501-506: The log wording "expired after %d minutes" is semantically inaccurate.
getOrphanTtlMs() / 60000 yields the configured TTL duration, not the actually elapsed time. The phrase "expired after %d minutes" is easily misread as "this many minutes have passed"; reword it to describe the TTL itself, e.g. "TTL of %d minutes". The 60000 here should also be replaced with the named constant suggested above.

♻️ Suggested change

```diff
- logger.info(String.format("orphaned skip entry for VM[uuid:%s] expired after %d minutes, resuming trace",
-         vmUuid, getOrphanTtlMs() / 60000));
+ logger.info(String.format("orphaned skip entry for VM[uuid:%s] expired (TTL %d minutes), resuming trace",
+         vmUuid, getOrphanTtlMs() / MILLIS_PER_MINUTE));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines 501 - 506, The log message in KvmVmSyncPingTask uses getOrphanTtlMs() / 60000 and "expired after %d minutes" which is misleading; update the logger.info call that references vmUuid and getOrphanTtlMs() so it reports the configured TTL (e.g., "TTL of %d minutes") rather than elapsed time, and replace the magic number 60000 with a named constant (e.g., MILLIS_PER_MINUTE) defined near the class or file; keep the orphanedSkipVms.remove(vmUuid, orphanedAt) behavior unchanged.

testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy (1)
298-301: Call clear() at the start of registerSimulators() to reset static state.
The state containers are static; if multiple test cases run in the same JVM back to back, data from earlier cases can leak into later ones, hurting stability and reproducibility.

♻️ Suggested change

```diff
 @Override
 void registerSimulators(EnvSpec espec) {
+    clear()
     def simulator = { arg1, arg2 -> espec.simulator(arg1, arg2) }
```

Also applies to: 902-905
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy` around lines 298 - 301, call clear() on the static state containers at the beginning of ExternalPrimaryStorageSpec's registerSimulators() method to reset global/static state, so that history from earlier runs in the same JVM cannot pollute later cases; locate registerSimulators(...) and add the clear() call on its first line (and make the same change at the other occurrence of this pattern, the registerSimulators implementation around lines 902-905 of the file).

portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java (1)
893-896: In the exception branch, log the stack trace and set an explicit fallback hostname.
Lines 893-896 currently only log a text message, and hostName may remain null. Pass the exception object in the catch branch and write a recognizable fallback value, so downstream listeners never hit a hidden null-handling branch.

A minimal change for reference:

```diff
- try {
-     inv.setHostName(destinationMaker.getNodeInfo(nodeUuid).getNodeIP());
- } catch (Exception e) {
-     logger.warn(String.format("cannot get node info for node[uuid:%s], use empty hostname", nodeUuid));
- }
+ try {
+     if (destinationMaker.getNodeInfo(nodeUuid) != null) {
+         inv.setHostName(destinationMaker.getNodeInfo(nodeUuid).getNodeIP());
+     } else {
+         inv.setHostName("unknown");
+     }
+ } catch (Exception e) {
+     inv.setHostName("unknown");
+     logger.warn(String.format("cannot get node info for node[uuid:%s], use fallback hostname", nodeUuid), e);
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java` around lines 893 - 896, In ManagementNodeManagerImpl inside the catch for destinationMaker.getNodeInfo(nodeUuid), update the exception handling so the logger.warn call includes the caught Exception object (e) and set a clear fallback hostname via inv.setHostName (e.g., "unknown-host" or ""), so hostName is never left null; reference the existing inv.setHostName, destinationMaker.getNodeInfo, and logger.warn identifiers when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java`:
- Around line 150-156: The code in HostAllocatorChain handling errors silently
swallows Throwables when isDryRun is true and dryRunCompletion==null (or when
completion==null), causing callers to hang; change the logic in the error path
to mirror done(): remove the null guards and always call
dryRunCompletion.fail(...) or completion.fail(...) (or if you prefer to keep
guards, add an error-level log and an assertion/fail-fast when the completion is
null before returning) so that Throwable is not silently dropped; update the
error branch around isDryRun/dryRunCompletion/completion to ensure a fail(...)
is invoked (or a clear error+assert) using the same identifiers used in the
diff.
In `@compute/src/main/java/org/zstack/compute/host/HostBase.java`:
- Line 1445: The deduplicate key currently returns a fixed
"connect-host-unknown" when self is null, causing different hosts to share the
same key; change the key generation in HostBase to prefer msg.getHostUuid() as
the fallback (i.e., use self.getUuid() if self != null, else msg.getHostUuid()),
and only fall back to a literal "unknown" if both are null so each host's
connect-host-{uuid} key remains unique.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java`:
- Around line 67-69: In SentryInitHelper, do not downgrade Sentry's ERROR level
to WARN; in the Sentry log-level mapping (the switch branches handling case
ERROR / WARNING / default, e.g. where ERROR and WARNING are currently both
handled via logger.warn("[Sentry] " + formatted)), map ERROR to
logger.error("[Sentry] " + formatted), map WARNING to logger.warn("[Sentry] " +
formatted), and keep default (or INFO) mapped to logger.info("[Sentry] " +
formatted); fix the other identical mapping block in the file (the switch
branch around lines 76-79 you mentioned) the same way for consistency.
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java`:
- Around line 1081-1082: The removal logic for freeIpInventorys calls
NetworkUtils.isInRange against every reservedIpRanges entry regardless of IP
family, causing exceptions in dual-stack scenarios; update the predicate used in
freeIpInventorys.removeIf (the code that iterates freeIp ->
reservedIpRanges.stream().anyMatch(...)) to first compare IP versions (e.g.,
derive family from freeIp.getIp() and the reserved range start/end or use an
existing IP version field) and only call NetworkUtils.isInRange when the
versions match, ensuring compatibility for both IPv4 and IPv6 and preventing
APIGetFreeIp runtime errors.
- Around line 1079-1085: The current implementation first aggregates each
range's freeIpInventorys up to the limit and only then filters
reservedIpRanges in one pass (using NetworkUtils.isInRange), so reserved
entries may be removed only after the limit was already reached, returning
fewer results than msg.getLimit(); fix it where each range's results are added
to freeIpInventorys (the code using the freeIpInventorys collection and its
add logic) by first filtering each freeIp with
reservedIpRanges.stream().anyMatch(...) or equivalent (using
NetworkUtils.isInRange and ReservedIpRangeVO's getStartIp/getEndIp) before
adding a range's IPs to the collection, then checking whether msg.getLimit()
has been reached to decide whether to break; ensure the collection passed to
the final reply.setInventories(freeIpInventorys) call is already the filtered
result.
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageAllocatorFactory.java`:
- Around line 388-393: The code in LocalStorageAllocatorFactory reuses error
code ORG_ZSTACK_STORAGE_PRIMARY_LOCAL_10023 for both URI parse failures and
unsupported protocol cases; update the unsupported-protocol branch (where parser
== null and the OperationFailureException is thrown using argerr) to use a new
distinct error code (e.g., ORG_ZSTACK_STORAGE_PRIMARY_LOCAL_10024) and add that
new error code constant to CloudOperationsErrorCode, keeping the existing
message text and leaving the parseUri/URISyntaxException path using
ORG_ZSTACK_STORAGE_PRIMARY_LOCAL_10023 so callers can distinguish malformed-URI
vs unsupported-protocol errors.
In `@plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java`:
- Around line 1365-1381: syncMdsStatuses can NPE when newAddonInfo.getMdsInfos()
or addonInfo.getMdsInfos() is null; add null/empty guards before iterating: in
ZbsStorageController.syncMdsStatuses check newAddonInfo != null &&
newAddonInfo.getMdsInfos() != null && !newAddonInfo.getMdsInfos().isEmpty() and
also ensure addonInfo != null && addonInfo.getMdsInfos() != null (or treat
missing lists as Collections.emptyList()) before the nested loops that reference
getMdsInfos(); keep the existing existMds.getAddr().equals(newMds.getAddr())
logic and still perform the SQL.New(...).set(...).update() call only after safe
iteration to avoid a secondary NPE masking the original error.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`:
- Around line 917-923: Inside ManagementNodeManagerImpl, the block that iterates
nodesInDb under synchronized(lifecycleLock) must not call new Runnable(){
`@AsyncThread` public void run(){ nodeLifeCycle.nodeJoin(...) }}.run() because
`@AsyncThread` on an anonymous method is ignored and .run() is synchronous;
instead, inside the synchronized(lifecycleLock) collect the
ManagementNodeInventory objects that need processing into a temporary list
(e.g., pendingJoins), then release the lock and for each item in pendingJoins
submit an asynchronous task to the proper executor (use the existing
ThreadFacade/async API the project uses or create a new Runnable and submit to a
thread pool) that calls
nodeLifeCycle.nodeJoin(ManagementNodeInventory.valueOf(n)); this ensures
nodeJoin runs asynchronously and the lifecycleLock is not held during nodeJoin.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/expon/ExponPrimaryStorageCase.groovy`:
- Line 689: The cleanup currently calls deleteVolume(vol2.uuid) unconditionally
which can throw an NPE if vol2 was never created; update the cleanup in
ExponPrimaryStorageCase (reference variable vol2) to check that vol2 is not null
(and optionally that vol2.uuid is not null/empty) before calling deleteVolume,
and skip or log the deletion when vol2 is absent so the original failure isn't
masked.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/xinfini/XinfiniPrimaryStorageCase.groovy`:
- Line 574: In testClean(), guard the possibly-null vol2 before calling
deleteVolume; either use if (vol2 != null) deleteVolume(vol2.uuid) before the
deletion or wrap it in a safe-delete helper (e.g. safeDeleteVolume(uuid)) to
avoid a null pointer when vol2 is null after an earlier failure; locate the
vol2 variable in testClean() and add the null check or safe-delete call at the
deleteVolume(...) call site.
In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy`:
- Around line 455-457: The current else-if "remove" branch calls
iscsiClientSnapshots.getOrDefault(clientId, Collections.emptySet()).remove(...),
which may invoke remove() on an immutable emptySet and throw
UnsupportedOperationException; change the logic in
ExternalPrimaryStorageSpec.groovy to fetch the set via
iscsiClientSnapshots.get(clientId) (or check containsKey) and only call remove
on a non-null, mutable set (mirroring the computeIfAbsent() approach used in the
"add" branch) so removals operate on the actual stored collection for clientId
instead of on Collections.emptySet().
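The add/remove asymmetry flagged for ExternalPrimaryStorageSpec can be illustrated with a minimal Java sketch; SnapshotMapSketch and its method names are hypothetical stand-ins for the Groovy test-spec state, showing only the safe pattern (computeIfAbsent on add, get-then-null-check on remove):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SnapshotMapSketch {
    // Hypothetical stand-in for the static map in the test spec.
    static final Map<String, Set<String>> iscsiClientSnapshots = new HashMap<>();

    static void add(String clientId, String snapshot) {
        // "add" branch: lazily create a mutable set per client
        iscsiClientSnapshots.computeIfAbsent(clientId, k -> new HashSet<>()).add(snapshot);
    }

    static void remove(String clientId, String snapshot) {
        // "remove" branch: operate only on the actually stored, mutable set
        Set<String> snaps = iscsiClientSnapshots.get(clientId);
        if (snaps != null) {
            snaps.remove(snapshot);
        }
    }

    public static void main(String[] args) {
        add("client-1", "snap-a");
        remove("client-1", "snap-a");
        remove("client-2", "snap-b"); // unknown client: safe no-op
        System.out.println(iscsiClientSnapshots.get("client-1"));
    }
}
```

This keeps both branches symmetric: mutations always target the collection actually stored for the clientId, never a shared default value.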
In
`@utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java`:
- Line 6281: The constant ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT in
CloudOperationsErrorCode breaks naming conventions by lacking the numeric
suffix; rename it to follow the existing pattern (e.g.
ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT_XXXXX) and update every reference to
this symbol across the codebase (tests, serializations, config, logs) to use the
new name; ensure CloudOperationsErrorCode's string value is changed to match the
new identifier and run a global search/replace for
ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT to avoid dangling references.
---
Outside diff comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java`:
- Around line 56-64: The SQL update that sets UsedIpVO.vmNicUuid is executed
before verifying the VmNicVO exists, risking a dangling FK; in the
afterAddIpAddress method move the SQL.New(UsedIpVO.class)...update() call to
after the VmNicVO nic null-check (the Q.New(VmNicVO.class).eq(...).find() block)
or guard it with an early return so that when nic == null you skip updating
UsedIpVO; update references: afterAddIpAddress, UsedIpVO, VmNicVO, vmNicUUid,
usedIpUuid.
In
`@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java`:
- Around line 79-93: The getNodeInfo method incorrectly assigns the return of
nodes.put(...) (which is the previous value, null on cache miss) to info; change
the flow in getNodeInfo so that after loading ManagementNodeVO via
dbf.findByUuid(...) and creating a new NodeInfo(vo) you store that new instance
into a local variable (e.g. newInfo), add nodeUuid to nodeHash, put newInfo into
the nodes map, and then return newInfo; keep the ManagementNodeNotFoundException
behavior and the synchronized signature intact (references: getNodeInfo,
NodeInfo, nodes, nodeHash, ManagementNodeVO, dbf.findByUuid).
In `@header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java`:
- Around line 53-60: The enum value VmRunningFromUnknownStateHostChanged in
VmAbnormalLifeCycleStruct has the same match() predicate as
VmRunningFromUnknownStateHostNotChanged, causing it to never match; update
VmRunningFromUnknownStateHostChanged.match() to invert the host comparison so it
returns true when struct.getCurrentHostUuid() does NOT equal
struct.getOriginalHostUuid(), leaving other checks (original state == Unknown
and current state == Running) unchanged so
getVmAbnormalLifeCycleOperationFromStruct can correctly find the "host changed"
case.
- Line 42: The fix is incomplete: multiple enum constants' match()
implementations call struct.getCurrentHostUuid().equals(...) and can NPE when
currentHostUuid is null; update each listed enum constant
(VmPausedFromUnknownStateHostNotChanged,
VmRunningFromUnknownStateHostNotChanged, VmRunningFromUnknownStateHostChanged,
VmStoppedFromUnknownStateHostNotChanged, VmPausedFromRunningStateHostNotChanged,
VmRunningFromPausedStateHostNotChanged,
VmPausedFromMigratingStateHostNotChanged,
VmCrashedFromRunningStateHostNotChanged,
VmRunningFromCrashedStateHostNotChanged,
VmStoppedFromCrashedStateHostNotChanged) to use
Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid()) or
Objects.equals(struct.getCurrentHostUuid(), struct.getVmLastHostUuid()) as
appropriate (same pattern used in the already-fixed
VmStoppedFromPausedStateHostNotChanged); import java.util.Objects if not
present.
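The Objects.equals(...) pattern recommended above is null-safe on both sides, unlike a direct a.equals(b) call that NPEs when a is null. A tiny illustrative sketch (HostUuidCompareSketch is a made-up name, not the real enum):

```java
import java.util.Objects;

public class HostUuidCompareSketch {
    // Null-safe "host not changed" predicate, as the review suggests.
    static boolean hostNotChanged(String currentHostUuid, String originalHostUuid) {
        return Objects.equals(currentHostUuid, originalHostUuid);
    }

    public static void main(String[] args) {
        // currentHostUuid can be null while a VM is in flux; no NPE here.
        System.out.println(hostNotChanged(null, "host-1"));
        System.out.println(hostNotChanged("host-1", "host-1"));
    }
}
```

Objects.equals returns true when both arguments are null, which matches the "host not changed" semantics when neither side reports a host.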
In `@identity/src/main/java/org/zstack/identity/AuthorizationManager.java`:
- Around line 112-122: Replace the DebugUtils.Assert in AuthorizationManager
where IdentityByPassCheck.NoSessionEvaluation is handled so that when session ==
null it throws an ApiMessageInterceptionException (using the same err(...)
helper) instead of a RuntimeException; locate the block referencing
IdentityByPassCheck.NoSessionEvaluation and msg.getSession(), and change the
assertion to throw new
ApiMessageInterceptionException(err(ORG_ZSTACK_IDENTITY_10012,
IdentityErrors.INVALID_SESSION, "NoSessionEvaluation set but no session set in
the message: %s", CloudBusGson.toJson(msg))) or alternatively catch the
RuntimeException from DebugUtils.Assert and rethrow an
ApiMessageInterceptionException to ensure both the NoSessionEvaluation path and
the evaluateSession path use the same exception type (and consider using
distinct error codes if you want to differentiate original-null vs renewed-null
cases).
---
Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java`:
- Line 149: In HostAllocatorChain's catch (Throwable t) block replace the
redundant ternary that checks "t != null ? t.toString() : 'unknown error'" with
a direct use of the caught throwable (e.g., t.toString() or t.getMessage()) to
form errMsg; remove the dead null branch so the assignment to errMsg uses the
guaranteed-non-null t (reference: variable errMsg and the catch (Throwable t) in
HostAllocatorChain).
In
`@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java`:
- Around line 80-93: getNodeInfo currently holds the synchronized lock while
calling dbf.findByUuid, causing unnecessary lock contention; change it to first
check nodes.get(nodeUuid) and if missing call dbf.findByUuid outside the
synchronized block, then enter synchronized(this) and double-check
nodes.get(nodeUuid) again; if still null validate vo (throw
ManagementNodeNotFoundException if vo is null), create the new NodeInfo from
ManagementNodeVO, add nodeUuid to nodeHash and put the new NodeInfo into nodes
(use the newly created NodeInfo as the value to insert and return it), otherwise
return the cached NodeInfo—refer to getNodeInfo, nodes, nodeHash,
dbf.findByUuid, ManagementNodeVO, NodeInfo, and ManagementNodeNotFoundException.
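The lock-narrowing suggestion above can be sketched as a double-checked cache lookup. This is an assumption-laden illustration, not the real ResourceDestinationMakerImpl: NodeInfoCacheSketch, loadFromDb, and the record type are invented, and a ConcurrentHashMap is used so the unlocked fast-path read is safe (the real field may be a plain map, which would need this change too):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NodeInfoCacheSketch {
    record NodeInfo(String uuid, String ip) {}

    // ConcurrentHashMap makes the unlocked fast-path read safe.
    private final ConcurrentMap<String, NodeInfo> nodes = new ConcurrentHashMap<>();
    int dbLoads = 0; // instrumentation for the sketch only

    NodeInfo loadFromDb(String uuid) { // stand-in for dbf.findByUuid(...)
        dbLoads++;
        return new NodeInfo(uuid, "10.0.0.1");
    }

    NodeInfo getNodeInfo(String nodeUuid) {
        NodeInfo cached = nodes.get(nodeUuid); // fast path: no lock held
        if (cached != null) {
            return cached;
        }
        NodeInfo loaded = loadFromDb(nodeUuid); // slow DB load outside the lock
        synchronized (this) {
            // double-check: another thread may have populated it meanwhile
            return nodes.computeIfAbsent(nodeUuid, k -> loaded);
        }
    }

    public static void main(String[] args) {
        NodeInfoCacheSketch cache = new NodeInfoCacheSketch();
        cache.getNodeInfo("n1");
        cache.getNodeInfo("n1"); // second call hits the cache
        System.out.println(cache.dbLoads);
    }
}
```

With ConcurrentHashMap, computeIfAbsent alone would suffice; the synchronized block is kept only to mirror the structure the nitpick describes for the existing code.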
In `@core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java`:
- Around line 57-58: The exception message in UpdateQueryImpl's throw currently
uses val.getClass() which prints the redundant "class ..." prefix; change it to
use val.getClass().getName() so the message shows the simple fully-qualified
class name (and keep the existing null-safe ternary check that avoids NPE when
val is null).
In `@core/src/main/java/org/zstack/core/log/LogSafeGson.java`:
- Around line 211-215: The toJson(Object o) method in LogSafeGson can NPE when o
is null because it calls o.getClass(); add a null guard at the start of
LogSafeGson.toJson(Object o) (similar to the existing toJsonTree(Object o)
check) and return the appropriate JSON null representation (e.g. the JSON string
"null" or use logSafeGson.toJson(JsonNull.INSTANCE)) instead of calling
getGsonType(o.getClass()); ensure calls like desensitize continue to work
unchanged.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java`:
- Line 58: Replace the magic literal 1000 used in options.setMaxQueueSize(1000)
with a well-named constant (e.g., MAX_QUEUE_SIZE) declared near the top of
SentryInitHelper (static final int MAX_QUEUE_SIZE = 1000) and then use
options.setMaxQueueSize(MAX_QUEUE_SIZE) so the intent is clear and the value is
easy to adjust.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 513-525: The loop in cleanupExpiredOrphanedSkipVms() repeatedly
calls getOrphanTtlMs(), causing redundant global config reads; cache the TTL
once before the for-loop (e.g., long ttl = getOrphanTtlMs()) and use ttl inside
the loop when comparing (referencing orphanedSkipVms, getOrphanTtlMs(), and
cleanupExpiredOrphanedSkipVms()) so you avoid repeated calls as done similarly
in nodeLeft().
- Around line 469-477: Cache the result of getOrphanTtlMs() before iterating
skippedVms and compute the minutes value once to avoid repeated global config
reads, and replace the hardcoded 60000 with a named constant (e.g.,
MS_PER_MINUTE) defined in KvmVmSyncPingTask (or a shared constant used by
isVmDoNotNeedToTrace) so the logger uses the cached minutes value; apply the
same replacement of the magic 60000 in isVmDoNotNeedToTrace (lines ~504-505) to
use the named constant.
- Around line 501-506: The log message in KvmVmSyncPingTask uses
getOrphanTtlMs() / 60000 and "expired after %d minutes" which is misleading;
update the logger.info call that references vmUuid and getOrphanTtlMs() so it
reports the configured TTL (e.g., "TTL of %d minutes") rather than elapsed time,
and replace the magic number 60000 with a named constant (e.g.,
MILLIS_PER_MINUTE) defined near the class or file; keep the
orphanedSkipVms.remove(vmUuid, orphanedAt) behavior unchanged.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`:
- Around line 893-896: In ManagementNodeManagerImpl inside the catch for
destinationMaker.getNodeInfo(nodeUuid), update the exception handling so the
logger.warn call includes the caught Exception object (e) and set a clear
fallback hostname via inv.setHostName (e.g., "unknown-host" or ""), so hostName
is never left null; reference the existing inv.setHostName,
destinationMaker.getNodeInfo, and logger.warn identifiers when making this
change.
In `@search/src/main/java/org/zstack/query/QueryFacadeImpl.java`:
- Line 381: In QueryFacadeImpl replace the current throw of
OperationFailureException that only uses e.getMessage() by first logging the
failure with the exception object (e.g., logger.error("failed to query", e)) and
then rethrowing an OperationFailureException that preserves the original cause
(attach e as the cause or include the exception in the constructor) so the
stacktrace is retained for debugging; update the throw site where
OperationFailureException is created to include the original exception.
In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy`:
- Around line 298-301: Call clear() on the static state containers at the
beginning of ExternalPrimaryStorageSpec's registerSimulators() method to reset
global/static state, preventing history from earlier runs in the same JVM from
polluting later cases; locate registerSimulators(...) and add the clear() call
on its first line (and make the same change at the other occurrence of this
pattern, the registerSimulators implementation around lines 902-905 of the
file).
ℹ️ Review info
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (13)
- build/pom.xml is excluded by !**/*.xml
- conf/i18n/globalErrorCodeMapping/global-error-de-DE.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-en_US.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-fr-FR.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-id-ID.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-ja-JP.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-ko-KR.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-ru-RU.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-th-TH.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-zh_CN.json is excluded by !**/*.json
- conf/i18n/globalErrorCodeMapping/global-error-zh_TW.json is excluded by !**/*.json
- core/pom.xml is excluded by !**/*.xml
- pom.xml is excluded by !**/*.xml
📒 Files selected for processing (40)
- compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java
- compute/src/main/java/org/zstack/compute/allocator/HostAllocatorManagerImpl.java
- compute/src/main/java/org/zstack/compute/host/HostBase.java
- compute/src/main/java/org/zstack/compute/vm/VmCapabilitiesJudger.java
- compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java
- conf/db/upgrade/V5.5.12__schema.sql
- core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java
- core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java
- core/src/main/java/org/zstack/core/log/LogSafeGson.java
- core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java
- core/src/main/java/org/zstack/core/telemetry/TelemetryGlobalProperty.java
- header/src/main/java/org/zstack/header/storage/addon/primary/ExternalPrimaryStorageInventory.java
- header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java
- header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java
- header/src/main/java/org/zstack/header/vm/VmInstanceState.java
- identity/src/main/java/org/zstack/identity/AuthorizationManager.java
- network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java
- plugin/cbd/src/main/java/org/zstack/cbd/kvm/KvmCbdNodeServer.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageMonBase.java
- plugin/expon/src/main/java/org/zstack/expon/ExponStorageController.java
- plugin/expon/src/main/java/org/zstack/expon/sdk/ExponClient.java
- plugin/expon/src/main/java/org/zstack/expon/sdk/ExponConnectConfig.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.java
- plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
- plugin/ldap/src/main/java/org/zstack/ldap/LdapUtil.java
- plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java
- plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageAllocatorFactory.java
- plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
- plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java
- portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java
- search/src/main/java/org/zstack/query/QueryFacadeImpl.java
- storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
- test/src/test/groovy/org/zstack/test/integration/storage/StorageTest.groovy
- test/src/test/groovy/org/zstack/test/integration/storage/primary/PrimaryStorageTest.groovy
- test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/expon/ExponPrimaryStorageCase.groovy
- test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/xinfini/XinfiniPrimaryStorageCase.groovy
- testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy
- testlib/src/main/java/org/zstack/testlib/SpringSpec.groovy
- utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
✅ Files skipped from review due to trivial changes (1)
- conf/db/upgrade/V5.5.12__schema.sql
🚧 Files skipped from review as they are similar to previous changes (5)
- plugin/expon/src/main/java/org/zstack/expon/ExponStorageController.java
- plugin/cbd/src/main/java/org/zstack/cbd/kvm/KvmCbdNodeServer.java
- plugin/expon/src/main/java/org/zstack/expon/sdk/ExponConnectConfig.java
- test/src/test/groovy/org/zstack/test/integration/storage/StorageTest.groovy
- plugin/expon/src/main/java/org/zstack/expon/sdk/ExponClient.java
```java
if (isDryRun) {
    if (dryRunCompletion != null) {
        dryRunCompletion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
    }
} else if (completion != null) {
    completion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
}
```
Silently swallowing the Throwable can leave callers hanging forever.
When isDryRun=true and dryRunCompletion==null (or isDryRun=false and completion==null), the Throwable is dropped entirely and the caller's ReturnValueCompletion callback never fires, causing an indefinite hang.
By contrast, the done() method (lines 88-119) calls dryRunCompletion.fail(...) / completion.fail(...) directly with no null checks; if completion were null there, done() would throw an immediate NPE and expose the problem. The null guards here instead hide the error, turning the failure mode from "immediately visible crash" into "hard-to-debug silent hang".
Judging from the allocate(ReturnValueCompletion) and dryRun(ReturnValueCompletion) implementations, completion/dryRunCompletion are assigned before start() is called and should not be null on the normal path. Remove the null guards, or at minimum add an error-level log plus an assertion when they are null, to stay consistent with done().
🐛 Suggested fix: remove the silent swallow, consistent with done()

```diff
 } catch (Throwable t) {
     logger.warn("unhandled throwable", t);
-    String errMsg = t != null ? t.toString() : "unknown error";
-    if (isDryRun) {
-        if (dryRunCompletion != null) {
-            dryRunCompletion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
-        }
-    } else if (completion != null) {
-        completion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
+    String errMsg = t.toString();
+    if (isDryRun) {
+        dryRunCompletion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
+    } else {
+        completion.fail(inerr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10019, errMsg));
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java`
around lines 150 - 156, The code in HostAllocatorChain handling errors silently
swallows Throwables when isDryRun is true and dryRunCompletion==null (or when
completion==null), causing callers to hang; change the logic in the error path
to mirror done(): remove the null guards and always call
dryRunCompletion.fail(...) or completion.fail(...) (or if you prefer to keep
guards, add an error-level log and an assertion/fail-fast when the completion is
null before returning) so that Throwable is not silently dropped; update the
error branch around isDryRun/dryRunCompletion/completion to ensure a fail(...)
is invoked (or a clear error+assert) using the same identifiers used in the
diff.
```diff
 @Override
 protected String getDeduplicateString() {
-    return String.format("connect-host-%s", self.getUuid());
+    return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid());
```
Avoid collapsing different hosts onto the same deduplicate key.
Line 1445 returns the fixed key connect-host-unknown whenever self == null, so connect tasks for different hosts would share one deduplicate key, risking false deduplication and serialization crosstalk. Prefer falling back to msg.getHostUuid().

Suggested fix

```diff
- return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid());
+ String hostUuid = self != null ? self.getUuid() : msg.getHostUuid();
+ return String.format("connect-host-%s", hostUuid == null ? "unknown" : hostUuid);
```
+ return String.format("connect-host-%s", hostUuid == null ? "unknown" : hostUuid);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid());
+ String hostUuid = self != null ? self.getUuid() : msg.getHostUuid();
+ return String.format("connect-host-%s", hostUuid == null ? "unknown" : hostUuid);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/host/HostBase.java` at line 1445,
The deduplicate key currently returns a fixed "connect-host-unknown" when self
is null, causing different hosts to share the same key; change the key
generation in HostBase to prefer msg.getHostUuid() as the fallback (i.e., use
self.getUuid() if self != null, else msg.getHostUuid()), and only fall back to a
literal "unknown" if both are null so each host's connect-host-{uuid} key
remains unique.
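The fallback chain suggested above (self.getUuid(), then msg.getHostUuid(), then a literal "unknown") can be sketched as a standalone helper; DedupKeySketch and its parameters are hypothetical, standing in for the HostBase fields:

```java
public class DedupKeySketch {
    // Prefer the loaded entity's uuid, fall back to the message's uuid,
    // and only use "unknown" when both are absent.
    static String deduplicateKey(String selfUuid, String msgHostUuid) {
        String hostUuid = selfUuid != null ? selfUuid : msgHostUuid;
        return String.format("connect-host-%s", hostUuid == null ? "unknown" : hostUuid);
    }

    public static void main(String[] args) {
        System.out.println(deduplicateKey(null, "uuid-from-msg"));
        System.out.println(deduplicateKey(null, null));
    }
}
```

The point of the two-level fallback is that different hosts keep distinct keys even before self is loaded, so the deduplication queue never serializes unrelated hosts' connect tasks.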
```java
case ERROR: logger.warn("[Sentry] " + formatted); break;
case WARNING: logger.warn("[Sentry] " + formatted); break;
default: logger.info("[Sentry] " + formatted); break;
```
Please do not downgrade Sentry's ERROR level to WARN.
The current mapping weakens the error signal and may break log-level-based alerting and troubleshooting.

🔧 Suggested fix

```diff
- case ERROR: logger.warn("[Sentry] " + formatted); break;
+ case ERROR: logger.error("[Sentry] " + formatted); break;
  case WARNING: logger.warn("[Sentry] " + formatted); break;
  default: logger.info("[Sentry] " + formatted); break;
```

```diff
- case ERROR: logger.warn("[Sentry] " + message, throwable); break;
+ case ERROR: logger.error("[Sentry] " + message, throwable); break;
  case WARNING: logger.warn("[Sentry] " + message, throwable); break;
  default: logger.info("[Sentry] " + message, throwable); break;
```

Also applies to: 76-79
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` around
lines 67 - 69, do not downgrade Sentry's ERROR level to WARN in
SentryInitHelper; in the Sentry log-level mapping (the switch branches handling
case ERROR / WARNING / default, e.g. where ERROR and WARNING are currently both
handled via logger.warn("[Sentry] " + formatted)), map ERROR to
logger.error("[Sentry] " + formatted), map WARNING to logger.warn("[Sentry] " +
formatted), and keep default (or INFO) mapped to logger.info("[Sentry] " +
formatted); fix the other identical mapping block in the file (the switch
branch around lines 76-79 you mentioned) the same way for consistency.
```java
Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
    freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
            r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
}

reply.setInventories(freeIpInventorys);
```
Filtering reserved ranges after the limit is filled returns fewer results than requested.
Line 1073 breaks early once the limit is reached, and lines 1079-1083 then remove the reserved IPs, so the returned count can be smaller than msg.getLimit() even though later ranges still have available IPs. Filter each range's results right after fetching them, then decide whether the limit has been reached.

Suggested fix (move the filtering into the loop)
- List<FreeIpInventory> freeIpInventorys = new ArrayList<FreeIpInventory>();
+ List<FreeIpInventory> freeIpInventorys = new ArrayList<FreeIpInventory>();
+ Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
int limit = msg.getLimit();
String start = msg.getStart();
for ( IpRangeVO ipRangeVO : ipRangeVOs) {
@@
List<FreeIpInventory> tempFreeIpInventorys = IpRangeHelper.getFreeIp(ipRangeVO, limit,start);
+ if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
+ tempFreeIpInventorys.removeIf(freeIp -> {
+ int ipVersion = NetworkUtils.isIpv4Address(freeIp.getIp()) ? IPv6Constants.IPv4 : IPv6Constants.IPv6;
+ return reservedIpRanges.stream()
+ .filter(r -> r.getIpVersion() == ipVersion)
+ .anyMatch(r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp()));
+ });
+ }
freeIpInventorys.addAll(tempFreeIpInventorys);
if (freeIpInventorys.size() >= msg.getLimit()) {
break;
}
- limit -= freeIpInventorys.size();
+ limit = msg.getLimit() - freeIpInventorys.size();
}
-
- Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
- if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
- freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
- r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
- }
- }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java` around lines
1079 - 1085, the current implementation first aggregates freeIpInventorys across ranges
up to the limit and only afterwards filters out reservedIpRanges (via
NetworkUtils.isInRange), so reserved IPs can be removed after the limit was already
reached and fewer than msg.getLimit() entries are returned; fix it by filtering before
counting: where the loop adds each range's results to freeIpInventorys, first drop every
freeIp matching reservedIpRanges.stream().anyMatch(...) (using NetworkUtils.isInRange
with ReservedIpRangeVO's getStartIp/getEndIp), then check whether msg.getLimit() has been
reached and break; this guarantees the collection passed to
reply.setInventories(freeIpInventorys) is already filtered.
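The filter-before-limit idea can be shown with a small sketch. Everything here is hypothetical scaffolding (plain integers stand in for FreeIpInventory, the nested lists stand in for per-IpRangeVO batches): the point is only that reserved items are dropped from each batch before they count toward the limit, so the caller still gets `limit` results when enough non-reserved items exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FilterBeforeLimitSketch {
    // Collect up to `limit` items across batches, filtering reserved ones
    // out of each batch BEFORE counting it toward the limit.
    static List<Integer> collect(List<List<Integer>> batches, Set<Integer> reserved, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> batch : batches) {
            for (Integer ip : batch) {
                if (reserved.contains(ip)) continue;     // filter first
                out.add(ip);
                if (out.size() >= limit) return out;     // then enforce the limit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = List.of(List.of(1, 2, 3), List.of(4, 5, 6));
        // Reserving 2 and 3 no longer shrinks the result below the limit of 3:
        System.out.println(collect(batches, Set.of(2, 3), 3)); // [1, 4, 5]
    }
}
```

Filtering after the loop instead would have returned only [1] here, because 2 and 3 were counted toward the limit before being removed.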
freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
        r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
Filter reserved ranges by IP version, otherwise this throws a runtime exception in dual-stack scenarios
Lines 1081-1082 currently compare every freeIp against all reserved ranges with isInRange; when an IPv4 address is compared against an IPv6 reserved range (or vice versa), NetworkUtils.isInRange parses the wrong address family and throws, causing APIGetFreeIp to fail.
Suggested fix
- if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
- freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
- r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
- }
+ if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
+ freeIpInventorys.removeIf(freeIp -> {
+ int ipVersion = NetworkUtils.isIpv4Address(freeIp.getIp()) ? IPv6Constants.IPv4 : IPv6Constants.IPv6;
+ return reservedIpRanges.stream()
+ .filter(r -> r.getIpVersion() == ipVersion)
+ .anyMatch(r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp()));
+ });
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java` around lines
1081 - 1082, The removal logic for freeIpInventorys calls NetworkUtils.isInRange
against every reservedIpRanges entry regardless of IP family, causing exceptions
in dual-stack scenarios; update the predicate used in freeIpInventorys.removeIf
(the code that iterates freeIp -> reservedIpRanges.stream().anyMatch(...)) to
first compare IP versions (e.g., derive family from freeIp.getIp() and the
reserved range start/end or use an existing IP version field) and only call
NetworkUtils.isInRange when the versions match, ensuring compatibility for both
IPv4 and IPv6 and preventing APIGetFreeIp runtime errors.
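The family check the diff relies on can be illustrated without zstack's NetworkUtils. The sketch below (hypothetical class and method names) derives the address family via java.net.InetAddress and simply skips cross-family comparisons instead of letting them throw:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class FamilyAwareRangeCheck {
    static boolean isIpv4(String ip) throws UnknownHostException {
        // Literal addresses are parsed directly; no DNS lookup happens here.
        return InetAddress.getByName(ip) instanceof Inet4Address;
    }

    // Only compare an address against a range of the same family;
    // a range of the other family simply cannot contain the address.
    static boolean sameFamilyInRange(String ip, String start, String end) throws UnknownHostException {
        if (isIpv4(ip) != isIpv4(start)) {
            return false; // skip cross-family comparison instead of throwing
        }
        byte[] a = InetAddress.getByName(ip).getAddress();
        byte[] s = InetAddress.getByName(start).getAddress();
        byte[] e = InetAddress.getByName(end).getAddress();
        return compare(a, s) >= 0 && compare(a, e) <= 0;
    }

    // Unsigned lexicographic comparison of equal-length address bytes.
    static int compare(byte[] x, byte[] y) {
        for (int i = 0; i < x.length; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(sameFamilyInRange("10.0.0.5", "10.0.0.1", "10.0.0.9")); // true
        System.out.println(sameFamilyInRange("10.0.0.5", "fd00::1", "fd00::ff")); // false, no exception
    }
}
```

The same guard is what the suggested `filter(r -> r.getIpVersion() == ipVersion)` achieves in the real code: mismatched families are excluded before isInRange ever runs.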
new Runnable() {
    @Override
    @AsyncThread
    public void run() {
        nodeLifeCycle.nodeJoin(ManagementNodeInventory.valueOf(n));
    }
}.run();
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== 1) Find AsyncThread annotation definition ==="
fd -t f -i asyncthread.java
echo
echo "=== 2) Search for AsyncThread annotation class definition ==="
rg -t java 'public.*@interface.*AsyncThread|class.*AsyncThread' -A 10
echo
echo "=== 3) Find weaving/interceptor for AsyncThread ==="
rg -t java -i 'aspect|@around|@pointcut' | grep -i async | head -20
echo
echo "=== 4) Show ManagementNodeManagerImpl context around lines 917-923 ==="
if [ -f "portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java" ]; then
sed -n '900,940p' portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java
else
echo "File not found, searching for it..."
fd ManagementNodeManagerImpl.java
fi
echo
echo "=== 5) Find all @AsyncThread usage patterns in codebase ==="
rg -t java '@AsyncThread' -B 3 -A 5 | head -100
Repository: MatheMatrix/zstack
Length of output: 15657
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== 1) Find AsyncThread annotation definition with full content ==="
cat core/src/main/java/org/zstack/core/thread/AsyncThread.java
echo
echo "=== 2) Search for AspectJ aspect or weaving interceptor for AsyncThread ==="
rg -t java -i 'aspect|@pointcut|@around' core/src/main/java/org/zstack/core | grep -A 20 -B 5 -i async | head -150
echo
echo "=== 3) Get broader context around lines 910-930 in ManagementNodeManagerImpl ==="
sed -n '890,950p' portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java | cat -n
echo
echo "=== 4) Check if there is a synchronized block wrapping this code ==="
rg -t java -n 'synchronized.*lifecycleLock' portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java -A 100 | head -150
echo
echo "=== 5) Search for where the AsyncThread weaving happens ==="
fd -t f -path '*/core/*' -name '*.java' | xargs rg -l 'AsyncThread' | head -10
Repository: MatheMatrix/zstack
Length of output: 14545
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== 1) Search for ThreadFacade or thread pool that processes @AsyncThread ==="
rg -t java 'ThreadFacade|ThreadPool|AsyncThread' core/src/main/java/org/zstack/core/thread/ -A 5 -B 2 | head -200
echo
echo "=== 2) Check test files for how @AsyncThread is expected to work ==="
if [ -f "test/src/test/java/org/zstack/test/core/thread/TestThreadFacadeAnnotationFuture.java" ]; then
echo "--- TestThreadFacadeAnnotationFuture.java ---"
cat test/src/test/java/org/zstack/test/core/thread/TestThreadFacadeAnnotationFuture.java | head -150
fi
echo
echo "=== 3) Search for method interception or CGLib usage for AsyncThread ==="
rg -t java 'AsyncThread' core/ -B 5 -A 10 | grep -i -E 'proxy|intercept|aspectj|cglib|invoke|reflection' | head -50
echo
echo "=== 4) Find where @AsyncThread annotation is actually processed ==="
rg -t java 'AsyncThread' core/ | grep -v test | grep -v '.class'
Repository: MatheMatrix/zstack
Length of output: 26785
Confirmed: @AsyncThread has no effect on an anonymous Runnable#run(), so the concurrency fix does not work.
The code at lines 917-923 has the following problems:
- The @AsyncThread annotation only takes effect on regular method declarations (see the usages in ThreadFacadeImpl, CloudBusImpl, etc.); it does not support methods of anonymous inner classes.
- Even if the annotation were processed, directly invoking new Runnable(){...}.run() executes synchronously; nothing becomes asynchronous.
- The code sits inside synchronized(lifecycleLock); with the annotation ineffective, nodeJoin executes synchronously while the lock is held, extending the lock hold time.
Suggested change: only record the pending nodes inside the lock, and submit the asynchronous tasks outside it.
Current code
synchronized (lifecycleLock) {
// ...
nodesInDb.forEach(n -> {
if (n.getUuid().equals(node().getUuid()) || suspects.contains(n)) {
return;
}
new Runnable() {
@Override
@AsyncThread
public void run() {
nodeLifeCycle.nodeJoin(ManagementNodeInventory.valueOf(n));
}
}.run();
});
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`
around lines 917 - 923, Inside ManagementNodeManagerImpl, the block that
iterates nodesInDb under synchronized(lifecycleLock) must not call new
Runnable(){ `@AsyncThread` public void run(){ nodeLifeCycle.nodeJoin(...) }}.run()
because `@AsyncThread` on an anonymous method is ignored and .run() is
synchronous; instead, inside the synchronized(lifecycleLock) collect the
ManagementNodeInventory objects that need processing into a temporary list
(e.g., pendingJoins), then release the lock and for each item in pendingJoins
submit an asynchronous task to the proper executor (use the existing
ThreadFacade/async API the project uses or create a new Runnable and submit to a
thread pool) that calls
nodeLifeCycle.nodeJoin(ManagementNodeInventory.valueOf(n)); this ensures
nodeJoin runs asynchronously and the lifecycleLock is not held during nodeJoin.
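The collect-inside-the-lock, submit-outside pattern suggested above can be sketched as follows. This is a standalone illustration, not the real ManagementNodeManagerImpl: plain strings stand in for ManagementNodeInventory, and a JDK ExecutorService stands in for the project's ThreadFacade.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CollectThenSubmitSketch {
    static final Object lifecycleLock = new Object();
    static final List<String> joined = new CopyOnWriteArrayList<>();

    // Stand-in for nodeLifeCycle.nodeJoin(...).
    static void nodeJoin(String node) { joined.add(node); }

    public static void main(String[] args) throws InterruptedException {
        List<String> nodesInDb = List.of("self", "n1", "n2");
        List<String> pendingJoins = new ArrayList<>();

        // 1) Under the lock, only decide WHICH nodes need processing.
        synchronized (lifecycleLock) {
            for (String n : nodesInDb) {
                if (n.equals("self")) continue; // skip this node, as the real code does
                pendingJoins.add(n);
            }
        }

        // 2) Outside the lock, submit the actual work asynchronously.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pendingJoins.forEach(n -> pool.submit(() -> nodeJoin(n)));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println(joined.size()); // 2
    }
}
```

The lock now protects only the cheap selection step; the potentially slow nodeJoin calls run on worker threads with the lock already released.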
deleteVm(vm.uuid)

deleteVolume(vol2.uuid)
In the cleanup phase, null-check vol2 first to avoid masking the real failure.
At line 689, if an earlier step failed and vol2 was never created, this line throws a second error (an NPE), making troubleshooting harder.
💡 Suggested change
- deleteVolume(vol2.uuid)
+ if (vol2?.uuid != null) {
+ deleteVolume(vol2.uuid)
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/expon/ExponPrimaryStorageCase.groovy`
at line 689, The cleanup currently calls deleteVolume(vol2.uuid) unconditionally
which can throw an NPE if vol2 was never created; update the cleanup in
ExponPrimaryStorageCase (reference variable vol2) to check that vol2 is not null
(and optionally that vol2.uuid is not null/empty) before calling deleteVolume,
and skip or log the deletion when vol2 is absent so the original failure isn't
masked.
void testClean() {
    deleteVm(vm.uuid)

    deleteVolume(vol2.uuid)
In testClean(), null-check vol2 before deleting.
At line 574, earlier-failure scenarios can trigger a NullPointerException here, obscuring the root cause.
💡 Suggested change
- deleteVolume(vol2.uuid)
+ if (vol2?.uuid != null) {
+ deleteVolume(vol2.uuid)
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/xinfini/XinfiniPrimaryStorageCase.groovy`
at line 574, guard the possibly-null vol2 in testClean() before calling deleteVolume;
either check if (vol2 != null) deleteVolume(vol2.uuid) before deleting, or wrap a
safe-delete helper (e.g. safeDeleteVolume(uuid)) so a null vol2 from a failed earlier
step cannot cause a NullPointerException; locate the vol2 usage in testClean() and add
the null check or safe-delete call at the deleteVolume(...) call site.
} else if ("remove".equals(action)) {
    iscsiClientSnapshots.getOrDefault(clientId, Collections.emptySet()).remove(snapId.toString())
}
🧩 Analysis chain
🏁 Script executed:
# Check the exact line content of the target file
cd /tmp && find . -path "*testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy" 2>/dev/null
# If not found, try searching the repository
git ls-files | grep -i ExternalPrimaryStorageSpec
Repository: MatheMatrix/zstack
Length of output: 130
🏁 Script executed:
# Read lines 455-457 of the target file
git show HEAD:testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy | sed -n '450,460p'
Repository: MatheMatrix/zstack
Length of output: 725
🏁 Script executed:
# Expand the context (lines 440-470) to understand the iscsiClientSnapshots definition
git show HEAD:testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy | sed -n '440,470p'
Repository: MatheMatrix/zstack
Length of output: 2001
🏁 Script executed:
# Check for Chinese comments (the coding guideline says **/*.* should contain no Chinese)
git show HEAD:testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy | grep -n '[^\x00-\x7F]' | head -20
Repository: MatheMatrix/zstack
Length of output: 878
🏁 Script executed:
# Search the whole file for similar getOrDefault + Collections.emptySet() + remove patterns
git show HEAD:testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy | grep -n "getOrDefault" | head -20
Repository: MatheMatrix/zstack
Length of output: 620
🏁 Script executed:
# Verify whether Collections.emptySet() really rejects remove() in Groovy/Java
# Look for similar error-handling patterns elsewhere in the repository
rg -n "getOrDefault.*Collections\.(emptySet|emptyList|emptyMap)\(\)" testlib/src/main/java/org/zstack/testlib/ -A 2
Repository: MatheMatrix/zstack
Length of output: 2082
🏁 Script executed:
# Check the declared type of iscsiClientSnapshots
git show HEAD:testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy | grep -n "iscsiClientSnapshots" | head -10
Repository: MatheMatrix/zstack
Length of output: 896
Avoid calling remove() on the shared immutable empty set.
When clientId is absent, line 456 gets Collections.emptySet(), an unmodifiable collection. Note that removing an element that is not present happens to be a silent no-op on that set (only an actual mutation, such as add(), throws UnsupportedOperationException), so the real problems are that the removal never reaches any stored state and that the branch is inconsistent with the "add" branch at line 454, which correctly uses computeIfAbsent().
Suggested change:
def snaps = iscsiClientSnapshots.get(clientId)
if (snaps != null) {
snaps.remove(snapId.toString())
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy`
around lines 455 - 457, The current else-if "remove" branch calls
iscsiClientSnapshots.getOrDefault(clientId, Collections.emptySet()).remove(...),
which may invoke remove() on an immutable emptySet and throw
UnsupportedOperationException; change the logic in
ExternalPrimaryStorageSpec.groovy to fetch the set via
iscsiClientSnapshots.get(clientId) (or check containsKey) and only call remove
on a non-null, mutable set (mirroring the computeIfAbsent() approach used in the
"add" branch) so removals operate on the actual stored collection for clientId
instead of on Collections.emptySet().
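The safe pattern recommended above is easy to show in isolation. This sketch (shown in Java; the names `snapshots`, `c1`, `snap-1` are invented for illustration) mirrors the fix: mutate only a set actually stored in the map, using computeIfAbsent for additions and a null-guarded get for removals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SafeRemoveSketch {
    public static void main(String[] args) {
        Map<String, Set<String>> snapshots = new HashMap<>();

        // "add" branch: computeIfAbsent guarantees a mutable set in the map.
        snapshots.computeIfAbsent("c1", k -> new HashSet<>()).add("snap-1");

        // "remove" branch: only operate on the set actually stored in the map.
        Set<String> snaps = snapshots.get("c1");
        if (snaps != null) {
            snaps.remove("snap-1");
        }
        System.out.println(snapshots.get("c1").isEmpty()); // true

        // Removing for an unknown client is simply a no-op; nothing is created.
        Set<String> missing = snapshots.get("nope");
        if (missing != null) {
            missing.remove("snap-1");
        }
        System.out.println(snapshots.containsKey("nope")); // false
    }
}
```

Unlike getOrDefault(..., Collections.emptySet()), this removal visibly updates the stored collection, and there is no shared immutable set that a future mutation could trip over.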
public static final String ORG_ZSTACK_STORAGE_BACKUP_10133 = "ORG_ZSTACK_STORAGE_BACKUP_10133";

public static final String ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT = "ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT";
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java"
echo "[1] List ORG_ZSTACK constants in CloudOperationsErrorCode without a numeric suffix:"
rg -nP 'public static final String ORG_ZSTACK_[A-Z0-9_]+ = "ORG_ZSTACK_[A-Z0-9_]+";' "$FILE" | rg -nPv '_\d+";$' || true
echo
echo "[2] Search the repository for matching/parsing logic that may depend on numeric error-code suffixes:"
rg -nP --type=java '(matches\([^)]*ORG_ZSTACK[^)]*\\d|Pattern\.compile\([^)]*ORG_ZSTACK[^)]*\\d|\.split\("_"\)|parseInt|parseLong)' -A 2 -B 2 | head -100 || true
echo
echo "[3] Check the ratio of numeric-suffix to non-numeric-suffix error codes in the file:"
echo "Total error codes with numeric suffix (ending with _XXXX where X is digit):"
rg -c '_\d+";$' "$FILE" || echo "0"
echo "Total error codes with non-numeric suffix:"
rg -cP 'public static final String ORG_ZSTACK_[A-Z0-9_]+[A-Z] = "ORG_ZSTACK' "$FILE" || echo "0"
Repository: MatheMatrix/zstack
Length of output: 11140
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "[1] Search for usages of ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT:"
rg 'ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT' --type=java || echo "No usage found"
echo
echo "[2] Search for error-code utility classes (which may handle numeric suffixes):"
rg -nP --type=java '(ErrorCode|error.*code|errorCode)' -i | rg -iP '(parse|extract|suffix|numeric|digit)' | head -50 || true
echo
echo "[3] Check whether CloudOperationsErrorCode has naming notes or comments nearby:"
FILE="utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java"
sed -n '6270,6290p' "$FILE"
echo
echo "[4] Inspect the head of the error-code file to learn the naming convention:"
head -50 "$FILE"
Repository: MatheMatrix/zstack
Length of output: 6615
Add a numeric suffix to keep error-code naming consistent
The constant's name breaks the repository's existing convention: all 7940 constants in CloudOperationsErrorCode use a _NNNN numeric suffix (e.g. ORG_ZSTACK_STORAGE_BACKUP_10129), and this constant is the sole exception. Rename it to the ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT_XXXXX form to keep the convention uniform.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java`
at line 6281, The constant ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT in
CloudOperationsErrorCode breaks naming conventions by lacking the numeric
suffix; rename it to follow the existing pattern (e.g.
ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT_XXXXX) and update every reference to
this symbol across the codebase (tests, serializations, config, logs) to use the
new name; ensure CloudOperationsErrorCode's string value is changed to match the
new identifier and run a global search/replace for
ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT to avoid dangling references.
ZSTAC-82153
Changes
Test
Related: ZSTAC-82153
sync from gitlab !9194