improved the logtool, but not quite there yet?

This commit is contained in:
Thaddeus Hughes
2026-03-30 11:39:04 -05:00
parent 9eb283420a
commit 837ec18fad
24 changed files with 55223 additions and 57 deletions

68
TODO.md
View File

@@ -59,7 +59,73 @@
22. - [clauded] NVS vs custom params: NVS serves WiFi/BT internals + BDA storage; custom flash partition serves app params with CRC32 protection. Different purposes, no consolidation needed.
23. - [clauded] BUG FIX: `FSM_CMD_START` fallthrough was overwriting `this_move_dist = MIN(...)` with unconditional `DRIVE_DIST` — replaced fallthrough with goto to shared start logic so leash limit is preserved
24. - [ ] Do general bug-scan with claude. Especially think through the FSM logic.
24. - [clauded] General bug scan (FSM, power, sensors, storage, comms, RTC, peripherals)
- Ran 4 parallel deep-dive reviews across entire codebase. Findings below.
- False positives eliminated: override fallthrough (breaks present), soft idle during motor ops (FSM resets timer), JACK_DOWN_TIME uninitialized first move (jack_finish_us always set before use)
## Suspected Bugs (from item 24 scan)
28. - [ ] **BUG [CRITICAL]:** `get_is_safe()` hardcoded `return true` — safety sensor completely bypassed
- sensors.c:182 — `return true;` with `//return is_safe;` commented out below
- All FSM safety checks (STATE_JACK_UP_START, JACK_UP, DRIVE_START_DELAY, DRIVE, DRIVE_END_DELAY, calibration states) are no-ops
- Safety break will NOT trigger STATE_UNDO_JACK_START — machine runs through hazard conditions
- Debounce logic in sensors_check() still runs but output is discarded
29. - [ ] **BUG [CRITICAL]:** E-fuse INOM params allow min=0.0 → division by zero
- power_mgmt.c:380 — `float I_norm = fabsf(channel->current / I_nominal);`
- storage.h EFUSE_INOM_1/2/3 bounds: min=0.0, max=200.0
- If param=0 → I_norm=Inf → instant trip on any current (motor won't run)
- If param=NaN (flash corruption) → I_norm=NaN → all comparisons false → e-fuse NEVER trips (motor can burn)
- Fix: raise min bound to 0.1 or add explicit NaN/zero guard before division
30. - [ ] **BUG [HIGH]:** No timeout on STATE_UNDO_JACK_START
- control_fsm.c:486-493 — waits for `!efuse_get(BRIDGE_JACK)` with no max wait
- If jack efuse never cools (hardware fault, thermal runaway), FSM stuck indefinitely
- User CAN send FSM_CMD_STOP to escape, but no automatic recovery
- Fix: add timeout (e.g. 30-60s) before forcing transition to IDLE with error
31. - [ ] **BUG [HIGH]:** No e-fuse checks in calibration movement states
- control_fsm.c:495-512 — STATE_CALIBRATE_JACK_MOVE and STATE_CALIBRATE_DRIVE_MOVE
- Only check get_is_safe() and timer_done(), NOT efuse_get()
- Relay outputs (lines 625-640) drive motors regardless of efuse status
- Jack cal runs up to 3s, drive cal up to 6s without overcurrent protection
- Fix: add efuse_get() check and abort calibration on trip
32. - [ ] **BUG [HIGH]:** BLE HID scan task missing watchdog registration
- bt_hid.c — `bt_hid_scan_task()` never calls `esp_task_wdt_add(NULL)`
- Task blocks on `xSemaphoreTake(s_scan_sem, portMAX_DELAY)` — if GAP callback never signals, hangs forever
- Unlike rf_433 task (which registers WDT), BT task has no WDT coverage
- Fix: add `esp_task_wdt_add(NULL)` and periodic `esp_task_wdt_reset()` (or use timeout on semaphore)
33. - [ ] **BUG [HIGH]:** ISR sensor queue full → events silently dropped
- sensors.c:57 — queue size 16, `xQueueSendFromISR()` return value not checked
- If sensors_check() consumer falls behind (4 sensors firing edges), events lost
- Encoder counts become inaccurate → drive distance wrong
- Fix: check return value, optionally increment a dropped-event counter for diagnostics
34. - [ ] **BUG [HIGH]:** Params not validated on set, only on commit — FSM reads unvalidated values
- storage.c:268-273 — `set_param_value_t()` writes directly to `parameter_table[]` with no bounds check
- `validate_param()` only called in `commit_params()` (before flash write)
- Between POST and commit, FSM can read out-of-range values (e.g. DRIVE_DIST=999999)
- Fix: call `validate_param()` inside `set_param_value_t()`, or at least in comms.c after setting
35. - [ ] **BUG [MEDIUM]:** Solar FSM timer uninitialized
- solar.c:17 — `RTC_DATA_ATTR int64_t timer;` has no initializer
- RTC memory may contain garbage on first cold boot before `solar_reset_fsm()` sets it to -1
- `solar_run_fsm()` is called (main.c:253) before `solar_reset_fsm()` has run on first boot path
- Fix: initialize to -1 in declaration: `RTC_DATA_ATTR int64_t timer = -1;`
36. - [ ] **BUG [MEDIUM]:** E-fuse param bounds too loose
- EFUSE_HEAT_THRESH min=0.0 — allows instant trip on any current draw (storage.h)
- EFUSE_INRUSH_US max=10000000 (10s) — allows 10s of unlimited current with no e-fuse protection
- Fix: tighten bounds (e.g. HEAT_THRESH min=1.0, INRUSH_US max=2000000)
37. - [ ] **BUG [MEDIUM]:** No mutex on parameter_table[] — concurrent access from HTTP/UART/FSM tasks
- storage.c — `parameter_table[]` read/written by HTTP POST handlers, UART handlers, and FSM task
- 32-bit aligned reads/writes are atomic on ESP32, so u16/u32/i16/i32/f32 are safe
- f64 (8 bytes) and str16 (16 bytes) could be torn reads — but no f64 or str params are read by FSM in hot path
- Severity is low in practice but architecturally unsound
25. - [ ] Extract pure logic (e-fuse thermal model, param serialization, sensor debounce) into host-testable modules with Unity/CMock?
26. - [ ] UART integration test framework: Python runner + ESP-side test commands
27. - [ ] Bug: WiFi won't want to connect to STA except at first boot