improved the logtool, but not quite there yet?
This commit is contained in:
68
TODO.md
68
TODO.md
@@ -59,7 +59,73 @@
|
||||
22. - [clauded] NVS vs custom params: NVS serves WiFi/BT internals + BDA storage; custom flash partition serves app params with CRC32 protection. Different purposes, no consolidation needed.
|
||||
23. - [clauded] BUG FIX: `FSM_CMD_START` fallthrough was overwriting `this_move_dist = MIN(...)` with unconditional `DRIVE_DIST` — replaced fallthrough with goto to shared start logic so leash limit is preserved
|
||||
|
||||
24. - [ ] Do general bug-scan with claude. Especially think through the FSM logic.
|
||||
24. - [clauded] General bug scan (FSM, power, sensors, storage, comms, RTC, peripherals)
|
||||
- Ran 4 parallel deep-dive reviews across entire codebase. Findings below.
|
||||
- False positives eliminated: override fallthrough (breaks present), soft idle during motor ops (FSM resets timer), JACK_DOWN_TIME uninitialized first move (jack_finish_us always set before use)
|
||||
|
||||
## Suspected Bugs (from item 24 scan)
|
||||
|
||||
28. - [ ] **BUG [CRITICAL]:** `get_is_safe()` hardcoded `return true` — safety sensor completely bypassed
|
||||
- sensors.c:182 — `return true;` with `//return is_safe;` commented out below
|
||||
- All FSM safety checks (STATE_JACK_UP_START, JACK_UP, DRIVE_START_DELAY, DRIVE, DRIVE_END_DELAY, calibration states) are no-ops
|
||||
- Safety break will NOT trigger STATE_UNDO_JACK_START — machine runs through hazard conditions
|
||||
- Debounce logic in sensors_check() still runs but output is discarded
|
||||
|
||||
29. - [ ] **BUG [CRITICAL]:** E-fuse INOM params allow min=0.0 → division by zero
|
||||
- power_mgmt.c:380 — `float I_norm = fabsf(channel->current / I_nominal);`
|
||||
- storage.h EFUSE_INOM_1/2/3 bounds: min=0.0, max=200.0
|
||||
- If param=0 → I_norm=Inf → instant trip on any current (motor won't run)
|
||||
- If param=NaN (flash corruption) → I_norm=NaN → all comparisons false → e-fuse NEVER trips (motor can burn)
|
||||
- Fix: raise min bound to 0.1 or add explicit NaN/zero guard before division
|
||||
|
||||
30. - [ ] **BUG [HIGH]:** No timeout on STATE_UNDO_JACK_START
|
||||
- control_fsm.c:486-493 — waits for `!efuse_get(BRIDGE_JACK)` with no max wait
|
||||
- If jack efuse never cools (hardware fault, thermal runaway), FSM stuck indefinitely
|
||||
- User CAN send FSM_CMD_STOP to escape, but no automatic recovery
|
||||
- Fix: add timeout (e.g. 30-60s) before forcing transition to IDLE with error
|
||||
|
||||
31. - [ ] **BUG [HIGH]:** No e-fuse checks in calibration movement states
|
||||
- control_fsm.c:495-512 — STATE_CALIBRATE_JACK_MOVE and STATE_CALIBRATE_DRIVE_MOVE
|
||||
- Only check get_is_safe() and timer_done(), NOT efuse_get()
|
||||
- Relay outputs (lines 625-640) drive motors regardless of efuse status
|
||||
- Jack cal runs up to 3s, drive cal up to 6s without overcurrent protection
|
||||
- Fix: add efuse_get() check and abort calibration on trip
|
||||
|
||||
32. - [ ] **BUG [HIGH]:** BLE HID scan task missing watchdog registration
|
||||
- bt_hid.c — `bt_hid_scan_task()` never calls `esp_task_wdt_add(NULL)`
|
||||
- Task blocks on `xSemaphoreTake(s_scan_sem, portMAX_DELAY)` — if GAP callback never signals, hangs forever
|
||||
- Unlike rf_433 task (which registers WDT), BT task has no WDT coverage
|
||||
- Fix: add `esp_task_wdt_add(NULL)` and periodic `esp_task_wdt_reset()` (or use timeout on semaphore)
|
||||
|
||||
33. - [ ] **BUG [HIGH]:** ISR sensor queue full → events silently dropped
|
||||
- sensors.c:57 — queue size 16, `xQueueSendFromISR()` return value not checked
|
||||
- If sensors_check() consumer falls behind (4 sensors firing edges), events lost
|
||||
- Encoder counts become inaccurate → drive distance wrong
|
||||
- Fix: check return value, optionally increment a dropped-event counter for diagnostics
|
||||
|
||||
34. - [ ] **BUG [HIGH]:** Params not validated on set, only on commit — FSM reads unvalidated values
|
||||
- storage.c:268-273 — `set_param_value_t()` writes directly to `parameter_table[]` with no bounds check
|
||||
- `validate_param()` only called in `commit_params()` (before flash write)
|
||||
- Between POST and commit, FSM can read out-of-range values (e.g. DRIVE_DIST=999999)
|
||||
- Fix: call `validate_param()` inside `set_param_value_t()`, or at least in comms.c after setting
|
||||
|
||||
35. - [ ] **BUG [MEDIUM]:** Solar FSM timer uninitialized
|
||||
- solar.c:17 — `RTC_DATA_ATTR int64_t timer;` has no initializer
|
||||
- RTC memory may contain garbage on first cold boot before `solar_reset_fsm()` sets it to -1
|
||||
- `solar_run_fsm()` is called (main.c:253) before `solar_reset_fsm()` has run on first boot path
|
||||
- Fix: initialize to -1 in declaration: `RTC_DATA_ATTR int64_t timer = -1;`
|
||||
|
||||
36. - [ ] **BUG [MEDIUM]:** E-fuse param bounds too loose
|
||||
- EFUSE_HEAT_THRESH min=0.0 — allows instant trip on any current draw (storage.h)
|
||||
- EFUSE_INRUSH_US max=10000000 (10s) — allows 10s of unlimited current with no e-fuse protection
|
||||
- Fix: tighten bounds (e.g. HEAT_THRESH min=1.0, INRUSH_US max=2000000)
|
||||
|
||||
37. - [ ] **BUG [MEDIUM]:** No mutex on parameter_table[] — concurrent access from HTTP/UART/FSM tasks
|
||||
- storage.c — `parameter_table[]` read/written by HTTP POST handlers, UART handlers, and FSM task
|
||||
- 32-bit aligned reads/writes are atomic on ESP32, so u16/u32/i16/i32/f32 are safe
|
||||
- f64 (8 bytes) and str16 (16 bytes) could be torn reads — but no f64 or str params are read by FSM in hot path
|
||||
- Severity is low in practice but architecturally unsound
|
||||
|
||||
25. - [ ] Extract pure logic (e-fuse thermal model, param serialization, sensor debounce) into host-testable modules with Unity/CMock?
|
||||
26. - [ ] UART integration test framework: Python runner + ESP-side test commands
|
||||
27. - [ ] Bug: WiFi won't want to connect to STA except at first boot
|
||||
Reference in New Issue
Block a user