SC-F001 Firmware — TODO
-
- [clauded] sdkconfig audit
  - [clauded] Enable `CONFIG_ESP_TASK_WDT_PANIC=y`: added to sdkconfig.defaults and sdkconfig
  - [clauded] Verify `CONFIG_FREERTOS_CHECK_STACKOVERFLOW=2`: confirmed canary method active
  - [clauded] Verify `CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT`: confirmed active
  - [clauded] Confirm brownout detector level: ~2.43 V is correct (ESP32 rail protection; battery low-voltage cutoff is handled by the FSM's `LOW_PROTECTION_V`)
  - [clauded] Research sdkconfig management best practices: documented in the "sdkconfig Management" section of CLAUDE.md
-
- [clauded] Fix managed_components: removed unused `littlefs` and `tca95x5` deps, pinned `mdns` to `~1.9.1`, bumped IDF minimum to `>=5.0`; documented in CLAUDE.md
-
- [clauded] OTA rollback via consecutive-reset counter
  - [clauded] Add `RTC_DATA_ATTR uint8_t ota_reset_counter`: incremented on panic/WDT resets, cleared on power-on/external reset
  - [clauded] On counter ≥ 5, call `esp_ota_mark_app_invalid_rollback_and_reboot()`
  - [clauded] After POST passes and the FSM starts, call `esp_ota_mark_app_valid_cancel_rollback()` and clear the counter
  - [clauded] Health check = POST passes + all critical inits succeed + FSM task started + non-critical inits attempted
-
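The counter policy in the OTA rollback item above is pure logic, so it can be modeled and tested on a host. This is a minimal sketch under assumptions: `reset_kind_t` is a hypothetical stand-in for `esp_reset_reason_t`, other reset causes (software restart, brownout) are assumed to leave the counter unchanged, and the sketch clears the counter when it requests a rollback so the restored image does not inherit the count (the item above does not say whether the firmware does this).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for esp_reset_reason_t. */
typedef enum {
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_PANIC,
    RESET_WDT,
    RESET_OTHER,
} reset_kind_t;

#define OTA_RESET_LIMIT 5u

/* Updates the consecutive-crash counter for one boot. Returns true when
 * the caller should invoke esp_ota_mark_app_invalid_rollback_and_reboot(). */
bool ota_counter_on_boot(uint8_t *counter, reset_kind_t why)
{
    switch (why) {
    case RESET_POWER_ON:
    case RESET_EXTERNAL:
        *counter = 0;                 /* clean start: forget earlier crashes */
        break;
    case RESET_PANIC:
    case RESET_WDT:
        if (*counter < UINT8_MAX)
            (*counter)++;             /* saturating increment */
        break;
    default:
        break;                        /* assumed: other resets leave it as is */
    }
    if (*counter >= OTA_RESET_LIMIT) {
        *counter = 0;                 /* assumption: restored image starts fresh */
        return true;
    }
    return false;
}

/* Called once the health check passes, next to
 * esp_ota_mark_app_valid_cancel_rollback(). */
void ota_counter_on_healthy(uint8_t *counter)
{
    *counter = 0;
}
```

On target, `*counter` would point at the `RTC_DATA_ATTR` variable and `why` would be derived from `esp_reset_reason()`.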
- [clauded] Critical init failures (ADC, storage, log, I2C, FSM, UART) → `init_critical()` retries 3×, then calls `esp_restart()`
-
- [clauded] Non-critical init failures (RF, BT, webserver) → log error, continue booting
- [clauded] WiFi/BT/RF retry once on init failure at boot (200ms delay for RF/BT, 500ms for WiFi), then log and continue
-
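Both init policies above (critical: retry then restart; non-critical: retry once with a delay, then log and continue) can share one retry helper. This is a sketch, not the project's `init_critical()`: `init_with_retry`, the `delay_fn_t` hook (which on target would wrap `vTaskDelay()`), and the stub are hypothetical, and "retries 3×" is assumed to mean three attempts in total.

```c
#include <stdbool.h>
#include <stdio.h>

/* Init functions return 0 on success (ESP_OK is 0 in ESP-IDF); a plain
 * int keeps the sketch buildable on a host. */
typedef int (*init_fn_t)(void);

/* Hypothetical delay hook; on target this would wrap vTaskDelay(). */
typedef void (*delay_fn_t)(unsigned ms);

/* Calls fn up to `attempts` times, waiting `delay_ms` between tries.
 * Returns true on the first success, false once all attempts fail. */
bool init_with_retry(const char *name, init_fn_t fn, int attempts,
                     unsigned delay_ms, delay_fn_t delay)
{
    for (int i = 1; i <= attempts; i++) {
        if (fn() == 0)
            return true;
        fprintf(stderr, "%s: init failed (attempt %d/%d)\n", name, i, attempts);
        if (i < attempts && delay != NULL)
            delay(delay_ms);
    }
    return false;
}

/* Illustrative stub: fails the first `stub_fail_count` calls. */
static int stub_calls;
static int stub_fail_count;

static int stub_init(void)
{
    return ++stub_calls <= stub_fail_count ? -1 : 0;
}
```

A critical subsystem would call `init_with_retry(name, fn, 3, 0, NULL)` and `esp_restart()` on `false`; WiFi would use two attempts with a 500 ms delay and only log on failure.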
- [clauded] Power-on self-test (POST): `init_critical()` wrapper + dedicated POST checks after init
  - [clauded] ADC: `adc_post()` reads all 4 channels twice with a 5 ms delay, warns if frozen
  - [clauded] I2C: `i2c_post()` verifies the TCA9555 responds (read port 0)
  - [clauded] Flash: `storage_post()` does a write-read-verify on the last sector of the storage partition
-
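The ADC "frozen" check in the POST item above reduces to comparing two snapshots per channel. This sketch assumes "frozen" means a bit-identical raw reading on both samples; the actual criterion in `adc_post()` is not stated, and `adc_frozen_mask` is a hypothetical name.

```c
#include <stdint.h>

#define ADC_POST_CHANNELS 4

/* Compares two raw readings per channel taken a few ms apart and returns
 * a bitmask with bit n set when channel n read exactly the same twice.
 * A stable input can legitimately repeat, which is why POST only warns. */
uint8_t adc_frozen_mask(const uint16_t first[ADC_POST_CHANNELS],
                        const uint16_t second[ADC_POST_CHANNELS])
{
    uint8_t mask = 0;
    for (int ch = 0; ch < ADC_POST_CHANNELS; ch++) {
        if (first[ch] == second[ch])
            mask |= (uint8_t)(1u << ch);
    }
    return mask;
}
```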
- [clauded] Parameter validation
  - [clauded] Add per-param bounds (min, max) to the `PARAM_LIST` macro: extended `PARAM_DEF` to a 6-arg macro
  - [clauded] NaN/Inf → reset to default; out-of-range → clamp to min/max: `validate_param()` in storage.c
  - [clauded] Enforce validation in `storage_init()` (after flash load) and `commit_params()` (before flash write)
  - [clauded] Audit `set_param_value_t` calls outside comms.c: deleted dead code `rf_433_set_keycode()`, the `FSM_CMD_CALIBRATE_*_FINISH` handlers + FSM cases, and `fsm_set_cal_val()` (web JS does the cal math client-side and commits via the standard param POST)
  - [clauded] Audit abandoned parameters: `JACK_IS_DOWN` marked deprecated (may duplicate `JACK_I_DOWN`); `BOOT_TIME` is informational only
-
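The bounds-in-the-X-macro approach and the NaN/Inf/clamp rule described above can be sketched for float parameters as follows. This is a simplified illustration, not the project's `PARAM_DEF`: the real macro takes 6 arguments, `validate_param_f32` is a hypothetical name, and the defaults and bounds below are made-up example values.

```c
#include <math.h>

/* Hypothetical, simplified parameter list: X(name, default, min, max).
 * Values are illustrative only, not the project's real bounds. */
#define PARAM_LIST(X)                          \
    X(DRIVE_DIST,   1000.0f, 0.0f, 50000.0f)   \
    X(EFUSE_INOM_1,   10.0f, 0.1f,   200.0f)

typedef enum {
#define X_ENUM(name, def, lo, hi) name,
    PARAM_LIST(X_ENUM)
#undef X_ENUM
    PARAM_COUNT
} param_id_t;

typedef struct {
    float def, min, max;
} param_bounds_t;

/* Bounds table generated from the same list, so it cannot drift. */
static const param_bounds_t param_bounds[PARAM_COUNT] = {
#define X_BOUNDS(name, def, lo, hi) [name] = { def, lo, hi },
    PARAM_LIST(X_BOUNDS)
#undef X_BOUNDS
};

/* NaN/Inf -> default; out of range -> clamp to [min, max]. */
float validate_param_f32(param_id_t id, float v)
{
    const param_bounds_t *b = &param_bounds[id];
    if (!isfinite(v))
        return b->def;
    if (v < b->min)
        return b->min;
    if (v > b->max)
        return b->max;
    return v;
}
```

Generating both the enum and the bounds table from one list is the usual reason for an X-macro: adding a parameter in one place updates every table derived from it.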
- [clauded] Factory reset: erases params + log + post_test partitions, requires 10s button hold on cold boot, LEDs flash during hold → solid when triggered
-
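The factory-reset hold gesture in the item above can be modeled as a poll function that maps button state and hold time to an LED mode. This is a hypothetical sketch: `factory_reset_poll`, `led_mode_t`, and the millisecond timebase are assumptions, and the erase itself (params, log, post_test partitions) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define FACTORY_RESET_HOLD_MS 10000u

typedef enum { LED_OFF, LED_FLASH, LED_SOLID } led_mode_t;

/* Hypothetical poll-based model of the cold-boot hold check.
 * press_start_ms is when the button was first seen pressed; *trigger is
 * set when the caller should erase the partitions. */
led_mode_t factory_reset_poll(bool pressed, uint32_t press_start_ms,
                              uint32_t now_ms, bool *trigger)
{
    *trigger = false;
    if (!pressed)
        return LED_OFF;                       /* released early: no reset */
    if ((uint32_t)(now_ms - press_start_ms) >= FACTORY_RESET_HOLD_MS) {
        *trigger = true;
        return LED_SOLID;                     /* hold complete */
    }
    return LED_FLASH;                         /* still holding */
}
```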
- [clauded] Ensure `RTC_DATA_ATTR` variables survive panics/WDT resets
  - [clauded] Verified `sync_unix_us`, `sync_rtc_us`, `rtc_set`: no init path zeroes them; `rtc_restore_time()` recovers via the RTC HW counter
  - [clauded] Verified `remaining_distance`, `fsm_error`: `fsm_init()` does not touch them; they are only cleared by explicit user action
  - [clauded] Verified `log_head_offset`, `log_tail_offset`: `log_init()` always recovers them from a flash scan; their `RTC_DATA_ATTR` is historical/harmless
-
- [clauded] Measure flash log write duration: `test_log_write_timing()` in log_test.c runs 200 iterations of 39-byte writes, reports min/max/avg/sector-crossing times, and compares them to the 5 s WDT
-
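The statistics that the timing test above reports can be sketched as a small accumulator plus a sector-crossing predicate. Assumptions: a 4096-byte flash sector (the ESP32 SPI flash erase size), and "sector-crossing" meaning a write whose bytes span two sectors; the helper names are hypothetical and not taken from log_test.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 4096u   /* ESP32 SPI flash erase sector */

typedef struct {
    uint32_t min_us, max_us, count;
    uint64_t total_us;
} timing_stats_t;

void timing_stats_init(timing_stats_t *s)
{
    s->min_us = UINT32_MAX;
    s->max_us = 0;
    s->count = 0;
    s->total_us = 0;
}

void timing_stats_add(timing_stats_t *s, uint32_t us)
{
    if (us < s->min_us) s->min_us = us;
    if (us > s->max_us) s->max_us = us;
    s->total_us += us;
    s->count++;
}

uint32_t timing_stats_avg(const timing_stats_t *s)
{
    return s->count ? (uint32_t)(s->total_us / s->count) : 0;
}

/* True if a write of len bytes at offset touches two sectors. */
bool write_crosses_sector(uint32_t offset, uint32_t len)
{
    if (len == 0)
        return false;
    return offset / FLASH_SECTOR_SIZE != (offset + len - 1) / FLASH_SECTOR_SIZE;
}
```

Tracking crossing writes separately matters because they are the likely worst case; the WDT comparison is then just `max_us < 5000000u`.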
- [clauded] WiFi STA mode with event-group signaling
  - [clauded] STA-first with softAP fallback was already implemented in `start_wifi()`
  - [clauded] Added `EventGroupHandle_t comms_event_group` in `comms_events.h` with `WIFI_READY_BIT`/`BT_READY_BIT`
  - [clauded] Replaced the blind 500 ms `vTaskDelay` on alarm wake with `xEventGroupWaitBits(COMMS_ALL_BITS, 5s timeout)`
  - [clauded] `soft_idle_exit()` → `webserver_restart_wifi()`/`bt_hid_resume()` set bits; `webserver_stop()`/`bt_hid_stop()` clear bits
  - [clauded] Bits are set even on permanent init failure so alarm wake never blocks forever
-
- [clauded] Verify `sensors_init()` placement and ISR safety
  - [clauded] Moved `sensors_init()` to main.c as `init_critical("SENSORS", sensors_init)`; it runs before the FSM
  - [clauded] Removed dead commented-out `sensors_init()`/`sensors_stop()` from sensors.c
  - [clauded] Audited ISR: `sensor_isr_handler` is `IRAM_ATTR` and uses only `esp_timer_get_time()` (IRAM-safe), `gpio_get_level()`, and `xQueueSendFromISR()`; no logging/malloc/flash
  - [clauded] `sensors_init()` failure is now critical (→ reboot via `init_critical`)
-
- [clauded] External 32 kHz crystal not needed (deep sleep disabled, soft idle used instead): removed crystal config from sdkconfig.defaults; `rtc_xtal_init()` is already a no-op; the crystal remains on the PCB but unused
-
- [clauded] Removed `rtc_wakeup_cause()`: it was unused (informational only, never called)
-
- [clauded] Confirmed `rtc_check_shutdown_timer()` uses unsigned `TickType_t` subtraction, which wraps correctly; removed the esp_timer overflow TODO comment from main.c
-
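Why the unsigned subtraction in the item above survives a tick-counter wrap: unsigned arithmetic is modulo 2^32, so `now - start` is the true elapsed count as long as less than one full wrap period has passed. A sketch, assuming the 32-bit `TickType_t` used on ESP32 (a typedef stands in for it here) and a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for FreeRTOS TickType_t, which is 32-bit unsigned on ESP32. */
typedef uint32_t tick_t;

/* Correct across a wrap of the tick counter, provided less than 2^32
 * ticks have elapsed between start and now. */
bool ticks_elapsed(tick_t start, tick_t now, tick_t timeout)
{
    return (tick_t)(now - start) >= timeout;
}
```

The tempting alternative `now >= start + timeout` breaks exactly at the wrap: `start + timeout` overflows to a small value and the timer fires immediately.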
- [test] Logtool GUI output (matplotlib)
-
- [test] Verify naming convention adherence across codebase
-
- [test] Verify WiFi SSID rename triggers comms reboot
-
- [clauded] Documentation restructure
  - [clauded] Move project/hardware documentation from CLAUDE.md → README.md; keep CLAUDE.md for AI-specific instructions and conventions only
  - [clauded] Document all FreeRTOS tasks and priorities in README.md
  - [clauded] Add terse comments to FSM state transitions in `control_fsm.c` (focus on "why", not "what")
-
- [clauded] Fix compile warnings — unused vars (uart_comms.c, rf_433.c), const-correctness (log_write signatures), fallthrough annotation (control_fsm.c)
-
- [clauded] NVS is required: WiFi blob stores RF cal data (CONFIG_ESP_WIFI_NVS_ENABLED), Bluedroid stores bonding/GATT cache unconditionally, bt_hid.c stores last-connected BDA. Cannot remove nvs_flash_init().
-
- [clauded] NVS vs custom params: NVS serves WiFi/BT internals + BDA storage; custom flash partition serves app params with CRC32 protection. Different purposes, no consolidation needed.
-
- [clauded] BUG FIX: the `FSM_CMD_START` fallthrough was overwriting `this_move_dist = MIN(...)` with an unconditional `DRIVE_DIST`; replaced the fallthrough with a goto to shared start logic so the leash limit is preserved
-
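The fallthrough-to-goto fix above follows a common C pattern: each case sets its own distance, then jumps to a shared label instead of falling into the next case's assignments. The command set and struct below are hypothetical (`FSM_CMD_START_LEASHED` stands in for whichever case previously fell through, and `leash_remaining` is an assumed field); only the control-flow shape is the point.

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical command set; FSM_CMD_START_LEASHED stands in for the case
 * that used to fall through into FSM_CMD_START. */
typedef enum { FSM_CMD_START, FSM_CMD_START_LEASHED } fsm_cmd_t;

typedef struct {
    int32_t drive_dist;        /* DRIVE_DIST parameter */
    int32_t leash_remaining;   /* hypothetical leash budget */
    int32_t this_move_dist;
    int started;
} fsm_t;

void fsm_handle(fsm_t *f, fsm_cmd_t cmd)
{
    switch (cmd) {
    case FSM_CMD_START_LEASHED:
        f->this_move_dist = MIN(f->drive_dist, f->leash_remaining);
        goto start_common;     /* skip the unconditional assignment below */
    case FSM_CMD_START:
        f->this_move_dist = f->drive_dist;
    start_common:
        f->started = 1;        /* shared start logic; leaves the distance alone */
        break;
    }
}
```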
- [clauded] General bug scan (FSM, power, sensors, storage, comms, RTC, peripherals)
  - Ran 4 parallel deep-dive reviews across the entire codebase. Findings below.
  - False positives eliminated: override fallthrough (breaks present), soft idle during motor ops (FSM resets timer), JACK_DOWN_TIME uninitialized on first move (jack_finish_us always set before use)
Suspected Bugs (from the general bug scan above)
-
- BUG [CRITICAL]: `get_is_safe()` hardcoded to `return true`; safety sensor completely bypassed
  - sensors.c:182: `return true;` with `//return is_safe;` commented out below
  - All FSM safety checks (STATE_JACK_UP_START, JACK_UP, DRIVE_START_DELAY, DRIVE, DRIVE_END_DELAY, calibration states) are no-ops
  - A safety break will NOT trigger STATE_UNDO_JACK_START; the machine runs through hazard conditions
  - Debounce logic in `sensors_check()` still runs but its output is discarded
-
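Once `get_is_safe()` returns the debounced state again, the debounce itself decides how quickly a safety break reaches the FSM. The actual algorithm in `sensors_check()` is not described here, so this is a hypothetical counter-based debounce: the sample threshold, the `debounce_t` struct, and starting in the unsafe state (fail safe) are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAFE_DEBOUNCE_SAMPLES 3u   /* assumed threshold */

typedef struct {
    bool stable;        /* debounced state returned to the FSM */
    bool last_raw;
    uint8_t same_count;
} debounce_t;

/* The stable state follows the raw input only after it has read the same
 * value SAFE_DEBOUNCE_SAMPLES times in a row. */
void debounce_update(debounce_t *d, bool raw)
{
    if (raw != d->last_raw) {
        d->last_raw = raw;
        d->same_count = 1;
    } else if (d->same_count < SAFE_DEBOUNCE_SAMPLES) {
        d->same_count++;
    }
    if (d->same_count >= SAFE_DEBOUNCE_SAMPLES)
        d->stable = raw;
}

/* Starts unsafe so nothing moves before the sensor has been sampled. */
static debounce_t safety = { .stable = false, .last_raw = false, .same_count = 0 };

/* The fix: return the debounced value instead of a hardcoded true. */
bool get_is_safe(void)
{
    return safety.stable;
}
```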
- BUG [CRITICAL]: E-fuse INOM params allow min=0.0 → division by zero
  - power_mgmt.c:380: `float I_norm = fabsf(channel->current / I_nominal);`
  - storage.h `EFUSE_INOM_1/2/3` bounds: min=0.0, max=200.0
  - If param=0 → I_norm=Inf → instant trip on any current (motor won't run)
  - If param=NaN (flash corruption) → I_norm=NaN → all comparisons false → e-fuse NEVER trips (motor can burn)
  - Fix: raise the min bound to 0.1 or add an explicit NaN/zero guard before the division
-
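The NaN/zero guard proposed above should fail safe: since every comparison against NaN is false, the guard has to turn an invalid ratio into a trip rather than silently skipping the check. A sketch with a hypothetical helper name; the 0.1 floor is the bound proposed in the fix, and the absolute value is computed by hand only so the example needs no libm.

```c
#include <math.h>
#include <stdbool.h>

#define EFUSE_INOM_MIN 0.1f   /* proposed lower bound for EFUSE_INOM_x */

/* Computes I_norm = |current / i_nominal| into *i_norm. Returns false
 * when the inputs cannot give a finite ratio; the caller should then
 * trip the channel instead of comparing against NaN. */
bool efuse_i_norm(float current, float i_nominal, float *i_norm)
{
    if (!isfinite(current) || !isfinite(i_nominal) || i_nominal < EFUSE_INOM_MIN)
        return false;
    float r = current / i_nominal;
    *i_norm = r < 0.0f ? -r : r;   /* same result as fabsf() */
    return true;
}
```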
- BUG [HIGH]: No timeout on STATE_UNDO_JACK_START
  - control_fsm.c:486-493: waits for `!efuse_get(BRIDGE_JACK)` with no max wait
  - If the jack e-fuse never cools (hardware fault, thermal runaway), the FSM is stuck indefinitely
  - User CAN send FSM_CMD_STOP to escape, but there is no automatic recovery
  - Fix: add a timeout (e.g. 30-60 s) before forcing a transition to IDLE with an error
-
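The timeout fix above can be expressed as a per-tick step function for the waiting state. Everything named here is hypothetical: the state and error enums are a minimal subset, `ST_UNDO_JACK` stands in for whatever state follows once the e-fuse clears, and 45 s is just one value inside the suggested 30-60 s range. Timestamps are assumed to be microseconds, as returned by `esp_timer_get_time()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNDO_JACK_TIMEOUT_US (45LL * 1000 * 1000)  /* assumed, within 30-60 s */

typedef enum { ST_UNDO_JACK_START, ST_UNDO_JACK, ST_IDLE } state_t;  /* hypothetical */
typedef enum { ERR_NONE, ERR_UNDO_JACK_TIMEOUT } fsm_err_t;

/* One FSM tick while waiting for the jack e-fuse to clear. entered_us is
 * the timestamp at which the state was entered. */
state_t undo_jack_start_step(bool efuse_tripped, int64_t now_us,
                             int64_t entered_us, fsm_err_t *err)
{
    if (!efuse_tripped)
        return ST_UNDO_JACK;                 /* e-fuse cooled: proceed */
    if (now_us - entered_us >= UNDO_JACK_TIMEOUT_US) {
        *err = ERR_UNDO_JACK_TIMEOUT;        /* give up and surface an error */
        return ST_IDLE;
    }
    return ST_UNDO_JACK_START;               /* keep waiting */
}
```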
- BUG [HIGH]: No e-fuse checks in calibration movement states
  - control_fsm.c:495-512: STATE_CALIBRATE_JACK_MOVE and STATE_CALIBRATE_DRIVE_MOVE
  - They only check `get_is_safe()` and `timer_done()`, NOT `efuse_get()`
  - Relay outputs (lines 625-640) drive the motors regardless of e-fuse status
  - Jack cal runs up to 3 s and drive cal up to 6 s without overcurrent protection
  - Fix: add an `efuse_get()` check and abort calibration on trip
-
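The calibration fix above amounts to checking the abort conditions before the completion condition on every tick. A sketch with hypothetical names (`cal_step_t`, `cal_move_step`); the inputs mirror `get_is_safe()`, `efuse_get()`, and `timer_done()` from the bug description.

```c
#include <stdbool.h>

typedef enum { CAL_CONTINUE, CAL_DONE, CAL_ABORT } cal_step_t;  /* hypothetical */

/* One tick of a calibration move. Abort conditions are checked before the
 * timer, so an overcurrent trip stops the motor even mid-move. */
cal_step_t cal_move_step(bool is_safe, bool efuse_tripped, bool timer_done)
{
    if (!is_safe || efuse_tripped)
        return CAL_ABORT;     /* caller de-energizes relays and reports error */
    if (timer_done)
        return CAL_DONE;
    return CAL_CONTINUE;
}
```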
- BUG [HIGH]: BLE HID scan task missing watchdog registration
  - bt_hid.c: `bt_hid_scan_task()` never calls `esp_task_wdt_add(NULL)`
  - The task blocks on `xSemaphoreTake(s_scan_sem, portMAX_DELAY)`; if the GAP callback never signals, it hangs forever
  - Unlike the rf_433 task (which registers with the WDT), the BT task has no WDT coverage
  - Fix: add `esp_task_wdt_add(NULL)` and periodic `esp_task_wdt_reset()` (or use a timeout on the semaphore)
-
- BUG [HIGH]: ISR sensor queue full → events silently dropped
  - sensors.c:57: queue size 16, `xQueueSendFromISR()` return value not checked
  - If the `sensors_check()` consumer falls behind (4 sensors firing edges), events are lost
  - Encoder counts become inaccurate → drive distance wrong
  - Fix: check the return value; optionally increment a dropped-event counter for diagnostics
-
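The dropped-event counter suggested in the fix above can be modeled on a host with a fixed-size ring buffer standing in for the FreeRTOS queue. This is a hypothetical single-producer/single-consumer model, not ISR-safe code: on target the push would be `xQueueSendFromISR()`, a full queue shows up as a return value other than `pdTRUE`, and the counter shared with the ISR would be declared `volatile`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SENSOR_QUEUE_LEN 16u   /* matches the queue size noted above */

typedef struct {
    uint8_t sensor;
    int64_t t_us;
} sensor_evt_t;

/* Hypothetical host stand-in for the FreeRTOS queue. */
typedef struct {
    sensor_evt_t buf[SENSOR_QUEUE_LEN];
    uint32_t head, tail;   /* free-running indices; 16 divides 2^32 */
    uint32_t dropped;      /* diagnostics: events lost to a full queue */
} sensor_queue_t;

bool sensor_queue_push(sensor_queue_t *q, sensor_evt_t e)
{
    if (q->head - q->tail == SENSOR_QUEUE_LEN) {
        q->dropped++;      /* the check the current ISR is missing */
        return false;
    }
    q->buf[q->head % SENSOR_QUEUE_LEN] = e;
    q->head++;
    return true;
}

bool sensor_queue_pop(sensor_queue_t *q, sensor_evt_t *out)
{
    if (q->head == q->tail)
        return false;
    *out = q->buf[q->tail % SENSOR_QUEUE_LEN];
    q->tail++;
    return true;
}
```

Exposing `dropped` over UART or the web UI would show whether the queue length or consumer priority needs raising.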
- BUG [HIGH]: Params not validated on set, only on commit; the FSM reads unvalidated values
  - storage.c:268-273: `set_param_value_t()` writes directly to `parameter_table[]` with no bounds check
  - Apart from the load-time check in `storage_init()`, `validate_param()` is only called in `commit_params()` (before flash write)
  - Between a set and the next commit, the FSM can read out-of-range values (e.g. DRIVE_DIST=999999)
  - Fix: call `validate_param()` inside `set_param_value_t()`, or at least in comms.c after setting
-
- BUG [MEDIUM]: Solar FSM timer uninitialized
  - solar.c:17: `RTC_DATA_ATTR int64_t timer;` has no initializer
  - RTC memory may contain garbage on first cold boot before `solar_reset_fsm()` sets it to -1
  - `solar_run_fsm()` is called (main.c:253) before `solar_reset_fsm()` has run on the first-boot path
  - Fix: initialize to -1 in the declaration: `RTC_DATA_ATTR int64_t timer = -1;`
-
- BUG [MEDIUM]: E-fuse param bounds too loose
  - EFUSE_HEAT_THRESH min=0.0 allows an instant trip on any current draw (storage.h)
  - EFUSE_INRUSH_US max=10000000 (10 s) allows 10 s of unlimited current with no e-fuse protection
  - Fix: tighten bounds (e.g. HEAT_THRESH min=1.0, INRUSH_US max=2000000)
-
- BUG [MEDIUM]: No mutex on `parameter_table[]`; concurrent access from HTTP/UART/FSM tasks
  - storage.c: `parameter_table[]` is read/written by HTTP POST handlers, UART handlers, and the FSM task
  - 32-bit aligned reads/writes are atomic on ESP32, so u16/u32/i16/i32/f32 are safe
  - f64 (8 bytes) and str16 (16 bytes) could see torn reads, but no f64 or str params are read by the FSM in a hot path
  - Severity is low in practice, but the design is architecturally unsound
-
- Extract pure logic (e-fuse thermal model, param serialization, sensor debounce) into host-testable modules with Unity/CMock?
-
- UART integration test framework: Python runner + ESP-side test commands
-
- Bug: WiFi won't connect in STA mode except on first boot