docs & sdkconfig

Thaddeus Hughes
2026-03-12 08:38:39 -05:00
parent fff1295862
commit 35b7074e81
6 changed files with 417 additions and 392 deletions

TODO.md (106 lines changed)

```diff
@@ -1,55 +1,55 @@
 # SC-F001 Firmware — TODO
-- [ ] sdkconfig audit
-  - [ ] Enable `CONFIG_ESP_TASK_WDT_PANIC=y` (required for OTA rollback reset counter to work on WDT hangs)
-  - [ ] Verify `CONFIG_FREERTOS_CHECK_STACKOVERFLOW=2` is set (currently canary — confirmed)
-  - [ ] Verify `CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT` is set (currently set — confirmed)
-  - [ ] Confirm brownout detector level (~2.43V) is appropriate for 12V battery system with regulator
-  - [ ] Research sdkconfig management best practices; document in CLAUDE.md
-- [ ] Fix managed_components: remove unused deps, pin versions in `idf_component.yml`; document in CLAUDE.md
-- [ ] OTA rollback via consecutive-reset counter
-  - [ ] Add `RTC_DATA_ATTR uint8_t reset_counter` — increment on boot, clear after successful health check
-  - [ ] On counter ≥ 5, call `esp_ota_mark_app_invalid_rollback_and_reboot()`
-  - [ ] After POST passes and FSM starts, call `esp_ota_mark_app_valid_cancel_rollback()`
-  - [ ] Decide what "health check passes" means (POST passes? 30s uptime? first successful FSM cycle?)
-- [ ] Critical init failures (ADC, storage, log, I2C, FSM, sensors) should `esp_restart()` — this feeds the OTA rollback reset counter
-- [ ] Non-critical init failures (wifi, webserver, RF, BT) should log a `LOG_TYPE_ERROR` entry and attempt retry
-  - [ ] WiFi/BT already have restart paths (`webserver_restart_wifi()`, `bt_hid_resume()`) — wire these into a retry-on-failure path at boot, not just soft idle exit
-- [ ] Power-on self-test (POST) — run after all inits, before FSM starts; log results; feed OTA health check
-  - [ ] ADC: read all 4 channels twice with short delay, flag if frozen or out of range (battery 5–25V, currents 0–150A)
-  - [ ] I2C: verify TCA9555 responds (read port 0)
-  - [ ] Flash: write-read-verify test on last sector of storage partition
-- [ ] Parameter validation
-  - [ ] Add per-param bounds to `PARAM_LIST` macro (min, max, flags)
-  - [ ] NaN/Inf → reset to default; out-of-range → clamp to min/max
-  - [ ] Enforce validation inside `commit_params()` (covers both `storage_init()` load and `/set` POST)
-  - [ ] Audit for anywhere params are set without an immediate `commit_params()` call
-  - [ ] Audit abandoned parameters (e.g. jack current) — add comments marking them deprecated
-- [ ] Factory reset: erase entire storage partition (not just params), require 10s button hold, LED indication (flash all → hold solid once triggered)
-- [ ] Ensure RTC_DATA_ATTR variables survive panics/WDT resets
-  - [ ] Verify `sync_unix_us`, `sync_rtc_us`, `rtc_set` (time) are not corrupted by any init path
-  - [ ] Verify `remaining_distance`, `fsm_error` (FSM state) are not zeroed except by intentional reset
-  - [ ] Verify `log_head_offset`, `log_tail_offset` stay consistent after crash (no partial writes)
-- [ ] Measure flash log write duration (bracket with `esp_timer_get_time()`, compare to WDT timeout of 5s)
-- [ ] WiFi STA mode with event-group signaling
-  - [ ] Try connecting to saved STA network first, fall back to softAP on failure/timeout
-  - [ ] Add `EventGroupHandle_t` with `WIFI_READY_BIT` (set when STA connected or softAP up) and `BT_READY_BIT` (set when BT scan task starts)
-  - [ ] Replace blind 500ms `vTaskDelay` on alarm wake with `xEventGroupWaitBits()` + timeout
-  - [ ] Use same event group in `soft_idle_exit()` path
-- [ ] Verify `sensors_init()` placement and ISR safety
-  - [ ] Confirm `sensors_init()` is safe to call from `app_main()` (research says yes — creates queue + installs ISR service, no task-context dependency)
-  - [ ] Decide: move to main.c (simpler) or keep in `control_task()` (current) — either way, remove the dead commented-out call in main.c and add a clarifying comment
-  - [ ] Audit all ISRs are IRAM-safe: no `ESP_LOGx`, `printf`, `malloc`, or flash access — only `xQueueSendFromISR()`
-  - [ ] Handle `sensors_init()` failure as critical (→ reboot)
-- [ ] Confirm whether external RTC crystal can be dropped (device never enters deep sleep now) — if yes, remove `rtc_xtal_init()` and related sdkconfig entries; if no, document why it must stay
-- [ ] Remove `rtc_wakeup_cause()` call (informational only, no longer needed)
-- [ ] Confirm `rtc_check_shutdown_timer()` uses signed subtraction — then remove the esp_timer overflow TODO comment (int64_t overflows after 292K years)
-- [ ] Extract pure logic (e-fuse thermal model, param serialization, sensor debounce) into host-testable modules with Unity/CMock
-- [ ] UART integration test framework: Python runner + ESP-side test commands
-- [test] Logtool GUI output (matplotlib)
-- [test] Verify naming convention adherence across codebase
-- [test] Verify WiFi SSID rename triggers comms reboot
-- [ ] Documentation restructure
-  - [ ] Move project/hardware documentation from CLAUDE.md → README.md; keep CLAUDE.md for AI-specific instructions and conventions only
-  - [ ] Document all FreeRTOS tasks and priorities in README.md
-  - [ ] Add terse comments to FSM state transitions in `control_fsm.c` (focus on "why", not "what")
+1. - [clauded] sdkconfig audit
+   - [clauded] Enable `CONFIG_ESP_TASK_WDT_PANIC=y` — added to sdkconfig.defaults and sdkconfig
+   - [clauded] Verify `CONFIG_FREERTOS_CHECK_STACKOVERFLOW=2` — confirmed canary method active
+   - [clauded] Verify `CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT` — confirmed active
+   - [clauded] Confirm brownout detector level ~2.43V is correct (ESP32 rail protection; battery low-V handled by FSM's `LOW_PROTECTION_V`)
+   - [clauded] Research sdkconfig management best practices documented in CLAUDE.md "sdkconfig Management" section
+2. - [ ] Fix managed_components: remove unused deps, pin versions in `idf_component.yml`; document in CLAUDE.md
+3. - [ ] OTA rollback via consecutive-reset counter
+   - [ ] Add `RTC_DATA_ATTR uint8_t reset_counter` — increment on boot, clear after successful health check
+   - [ ] On counter ≥ 5, call `esp_ota_mark_app_invalid_rollback_and_reboot()`
+   - [ ] After POST passes and FSM starts, call `esp_ota_mark_app_valid_cancel_rollback()`
+   - [ ] Decide what "health check passes" means (POST passes? 30s uptime? first successful FSM cycle?)
+4. - [ ] Critical init failures (ADC, storage, log, I2C, FSM, sensors) should `esp_restart()` — this feeds the OTA rollback reset counter
+5. - [ ] Non-critical init failures (wifi, webserver, RF, BT) should log a `LOG_TYPE_ERROR` entry and attempt retry
+   - [ ] WiFi/BT already have restart paths (`webserver_restart_wifi()`, `bt_hid_resume()`) — wire these into a retry-on-failure path at boot, not just soft idle exit
+6. - [ ] Power-on self-test (POST) — run after all inits, before FSM starts; log results; feed OTA health check
+   - [ ] ADC: read all 4 channels twice with short delay, flag if frozen or out of range (battery 5–25V, currents 0–150A)
+   - [ ] I2C: verify TCA9555 responds (read port 0)
+   - [ ] Flash: write-read-verify test on last sector of storage partition
+7. - [ ] Parameter validation
+   - [ ] Add per-param bounds to `PARAM_LIST` macro (min, max, flags)
+   - [ ] NaN/Inf → reset to default; out-of-range → clamp to min/max
+   - [ ] Enforce validation inside `commit_params()` (covers both `storage_init()` load and `/set` POST)
+   - [ ] Audit for anywhere params are set without an immediate `commit_params()` call
+   - [ ] Audit abandoned parameters (e.g. jack current) — add comments marking them deprecated
+8. - [ ] Factory reset: erase entire storage partition (not just params), require 10s button hold, LED indication (flash all → hold solid once triggered)
+9. - [ ] Ensure RTC_DATA_ATTR variables survive panics/WDT resets
+   - [ ] Verify `sync_unix_us`, `sync_rtc_us`, `rtc_set` (time) are not corrupted by any init path
+   - [ ] Verify `remaining_distance`, `fsm_error` (FSM state) are not zeroed except by intentional reset
+   - [ ] Verify `log_head_offset`, `log_tail_offset` stay consistent after crash (no partial writes)
+10. - [ ] Measure flash log write duration (bracket with `esp_timer_get_time()`, compare to WDT timeout of 5s)
+11. - [ ] WiFi STA mode with event-group signaling
+    - [ ] Try connecting to saved STA network first, fall back to softAP on failure/timeout
+    - [ ] Add `EventGroupHandle_t` with `WIFI_READY_BIT` (set when STA connected or softAP up) and `BT_READY_BIT` (set when BT scan task starts)
+    - [ ] Replace blind 500ms `vTaskDelay` on alarm wake with `xEventGroupWaitBits()` + timeout
+    - [ ] Use same event group in `soft_idle_exit()` path
+12. - [ ] Verify `sensors_init()` placement and ISR safety
+    - [ ] Confirm `sensors_init()` is safe to call from `app_main()` (research says yes — creates queue + installs ISR service, no task-context dependency)
+    - [ ] Decide: move to main.c (simpler) or keep in `control_task()` (current) — either way, remove the dead commented-out call in main.c and add a clarifying comment
+    - [ ] Audit all ISRs are IRAM-safe: no `ESP_LOGx`, `printf`, `malloc`, or flash access — only `xQueueSendFromISR()`
+    - [ ] Handle `sensors_init()` failure as critical (→ reboot)
+13. - [ ] Confirm whether external RTC crystal can be dropped (device never enters deep sleep now) — if yes, remove `rtc_xtal_init()` and related sdkconfig entries; if no, document why it must stay
+14. - [ ] Remove `rtc_wakeup_cause()` call (informational only, no longer needed)
+15. - [ ] Confirm `rtc_check_shutdown_timer()` uses signed subtraction — then remove the esp_timer overflow TODO comment (int64_t overflows after 292K years)
+16. - [ ] Extract pure logic (e-fuse thermal model, param serialization, sensor debounce) into host-testable modules with Unity/CMock
+17. - [ ] UART integration test framework: Python runner + ESP-side test commands
+18. - [test] Logtool GUI output (matplotlib)
+19. - [test] Verify naming convention adherence across codebase
+20. - [test] Verify WiFi SSID rename triggers comms reboot
+21. - [clauded] Documentation restructure
+    - [clauded] Move project/hardware documentation from CLAUDE.md → README.md; keep CLAUDE.md for AI-specific instructions and conventions only
+    - [clauded] Document all FreeRTOS tasks and priorities in README.md
+    - [clauded] Add terse comments to FSM state transitions in `control_fsm.c` (focus on "why", not "what")
```
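Item 3 (OTA rollback via a consecutive-reset counter) can be prototyped as pure logic before it is wired to the OTA API. The following is a minimal host-testable sketch under stated assumptions. The names `reset_state_t`, `boot_action_t`, `reset_counter_on_boot()` and `reset_counter_on_health_ok()` are hypothetical, and the counter is a plain `static` rather than RTC memory. The caller is assumed to invoke `esp_ota_mark_app_invalid_rollback_and_reboot()` on `BOOT_ROLLBACK` and `esp_ota_mark_app_valid_cancel_rollback()` after the health check. The magic word guards against uninitialised memory. On target, ESP-IDF documents `RTC_NOINIT_ATTR` (rather than `RTC_DATA_ATTR`) as the attribute for data meant to survive software resets, so that choice is worth confirming alongside item 9.

```c
#include <assert.h>   /* for host-side unit tests */
#include <stdint.h>

/* Hypothetical sketch: models the reset counter as plain logic.
 * On target this struct would live in reset-surviving RTC memory. */
#define RESET_COUNTER_MAGIC      0xC0FFEE01u /* hypothetical "initialised" marker */
#define RESET_ROLLBACK_THRESHOLD 5u          /* "counter >= 5" from the TODO item */

typedef struct {
    uint32_t magic;
    uint8_t  count;
} reset_state_t;

static reset_state_t reset_state;

typedef enum {
    BOOT_CONTINUE, /* keep running this image */
    BOOT_ROLLBACK  /* caller would call esp_ota_mark_app_invalid_rollback_and_reboot() */
} boot_action_t;

/* Call once, early in app_main(). */
boot_action_t reset_counter_on_boot(void)
{
    if (reset_state.magic != RESET_COUNTER_MAGIC) {
        /* Garbage or zeroed memory (e.g. first power-on): start from zero. */
        reset_state.magic = RESET_COUNTER_MAGIC;
        reset_state.count = 0;
    }
    reset_state.count++;
    if (reset_state.count >= RESET_ROLLBACK_THRESHOLD) {
        /* Clear before rolling back so the previous image does not
         * boot with the counter already at the threshold. */
        reset_state.count = 0;
        return BOOT_ROLLBACK;
    }
    return BOOT_CONTINUE;
}

/* Call once the health check passes; the caller would also call
 * esp_ota_mark_app_valid_cancel_rollback(). */
void reset_counter_on_health_ok(void)
{
    reset_state.count = 0;
}
```

Clearing the counter before requesting rollback is a deliberate choice. Without it, an older image running the same logic could see five boots already recorded and immediately try to roll back again.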
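The ADC step of item 6 (read all channels twice, flag frozen or out-of-range values) reduces to a small pure function. The following sketch assumes the readings are already converted to volts and amps and that each channel has a min/max window such as 5–25V or 0–150A. The names `adc_range_t`, `adc_post_check()` and the `ADC_POST_*` flags are hypothetical. "Frozen" is interpreted here as every channel returning bit-identical samples across both reads. That assumes normal readings carry some noise, and comparing raw ADC codes instead of converted values may be more robust.

```c
#include <assert.h>   /* for host-side unit tests */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-channel plausibility window, in engineering units. */
typedef struct {
    float min;
    float max;
} adc_range_t;

enum {
    ADC_POST_OK           = 0,
    ADC_POST_OUT_OF_RANGE = 1 << 0, /* a sample is outside its window, or NaN */
    ADC_POST_FROZEN       = 1 << 1  /* every channel repeated its first sample exactly */
};

static bool in_range(float v, adc_range_t r)
{
    /* Written so that a NaN sample fails the check. */
    return v >= r.min && v <= r.max;
}

/* `first` and `second` hold one reading per channel, taken a short
 * delay apart. Returns a bitmask of ADC_POST_* flags. */
int adc_post_check(const float first[], const float second[],
                   const adc_range_t range[], size_t n)
{
    int flags = ADC_POST_OK;
    bool all_identical = (n > 0);

    for (size_t i = 0; i < n; i++) {
        if (!in_range(first[i], range[i]) || !in_range(second[i], range[i])) {
            flags |= ADC_POST_OUT_OF_RANGE;
        }
        if (first[i] != second[i]) {
            all_identical = false;
        }
    }
    if (all_identical) {
        flags |= ADC_POST_FROZEN;
    }
    return flags;
}
```

Returning a bitmask instead of a single pass/fail lets the POST log record which condition tripped. The OTA health check can still treat any non-zero result as a failure.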
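Item 7 asks for per-parameter bounds in `PARAM_LIST`, with NaN/Inf reset to default and out-of-range values clamped. The following sketch assumes `PARAM_LIST` is an X-macro. Its real shape and fields are not shown in the TODO, so the two entries, `params_t`, `validate_one()` and `validate_params()` are all hypothetical, and the `flags` column is omitted. A function like `validate_params()` could be called at the top of `commit_params()` so that both the `storage_init()` load and the `/set` POST path go through it.

```c
#include <assert.h>   /* for host-side unit tests */
#include <math.h>

/* Hypothetical X-macro: X(name, default, min, max). */
#define PARAM_LIST(X)                           \
    X(example_voltage_v, 12.0f, 5.0f, 25.0f)    \
    X(example_current_a, 40.0f, 0.0f, 150.0f)

typedef struct {
#define X_FIELD(name, def, lo, hi) float name;
    PARAM_LIST(X_FIELD)
#undef X_FIELD
} params_t;

/* Fixes one value in place; returns 1 if it was changed, 0 otherwise. */
static int validate_one(float *v, float def, float lo, float hi)
{
    if (!isfinite(*v)) { *v = def; return 1; } /* NaN/Inf -> default */
    if (*v < lo)       { *v = lo;  return 1; } /* clamp to min */
    if (*v > hi)       { *v = hi;  return 1; } /* clamp to max */
    return 0;
}

/* Validates every parameter; returns how many were corrected. */
int validate_params(params_t *p)
{
    int fixed = 0;
#define X_CHECK(name, def, lo, hi) fixed += validate_one(&p->name, def, lo, hi);
    PARAM_LIST(X_CHECK)
#undef X_CHECK
    return fixed;
}
```

Generating both the struct and the checks from the same list keeps the bounds next to each parameter's declaration. A new parameter cannot then be added without also choosing its limits.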
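The 10s hold requirement in item 8 can be modelled as a three-state machine fed by the button level and a monotonic timestamp, such as `esp_timer_get_time()` on target. This is a sketch of one reading of the LED requirement: all LEDs flash while the button is held, and they go solid once the reset triggers. The names `fr_state_t`, `factory_reset_t` and `factory_reset_poll()` are hypothetical. The partition erase and restart are left to the caller.

```c
#include <assert.h>   /* for host-side unit tests */
#include <stdbool.h>
#include <stdint.h>

#define FACTORY_RESET_HOLD_US (10LL * 1000 * 1000) /* 10 s, from the TODO item */

typedef enum {
    FR_IDLE,      /* button not held */
    FR_HOLDING,   /* held, timer running: caller flashes all LEDs */
    FR_TRIGGERED  /* held for 10 s: caller shows solid LEDs, erases, restarts */
} fr_state_t;

typedef struct {
    fr_state_t state;
    int64_t    pressed_at_us;
} factory_reset_t;

/* Call periodically with the current button level and timestamp (us). */
fr_state_t factory_reset_poll(factory_reset_t *fr, bool pressed, int64_t now_us)
{
    switch (fr->state) {
    case FR_IDLE:
        if (pressed) {
            fr->state = FR_HOLDING;
            fr->pressed_at_us = now_us;
        }
        break;
    case FR_HOLDING:
        if (!pressed) {
            fr->state = FR_IDLE;      /* released early: abort */
        } else if (now_us - fr->pressed_at_us >= FACTORY_RESET_HOLD_US) {
            fr->state = FR_TRIGGERED; /* signed subtraction on int64_t */
        }
        break;
    case FR_TRIGGERED:
        break;                        /* latched until the device restarts */
    }
    return fr->state;
}
```

Latching `FR_TRIGGERED` means that releasing the button during the erase cannot cancel a reset that has already started.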
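Item 15 relies on `int64_t` microsecond timestamps lasting about 292K years. The arithmetic is INT64_MAX / (10^6 × 86400 × 365.25) ≈ 9.22×10^18 / 3.16×10^13 ≈ 292,000 years. The following sketch checks that figure and shows the signed elapsed-time comparison the item refers to. `timer_expired()` and `int64_us_overflow_years()` are hypothetical stand-ins, not the actual body of `rtc_check_shutdown_timer()`.

```c
#include <assert.h>   /* for host-side unit tests */
#include <stdbool.h>
#include <stdint.h>

/* Signed elapsed-time check. Both timestamps are non-negative
 * microseconds since boot, so `now_us - start_us` cannot overflow, and a
 * start time in the "future" yields a negative elapsed value (not expired)
 * instead of wrapping to a huge unsigned number. */
bool timer_expired(int64_t now_us, int64_t start_us, int64_t timeout_us)
{
    return (now_us - start_us) >= timeout_us;
}

/* Years until an int64_t microsecond counter starting at 0 overflows. */
double int64_us_overflow_years(void)
{
    const double us_per_year = 1e6 * 60.0 * 60.0 * 24.0 * 365.25;
    return (double)INT64_MAX / us_per_year;
}
```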
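Item 16 lists sensor debounce as pure logic to extract into a host-testable module. The following sketch shows what such a module might look like, assuming a consecutive-sample debouncer. The TODO does not say how the existing debounce works; it could instead be time-based, given that the sensors are ISR-driven. The names `debounce_t`, `debounce_init()` and `debounce_update()` are hypothetical. The debounced output changes only after `threshold` consecutive raw samples disagree with it, and any agreeing sample restarts the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consecutive-sample debouncer. */
typedef struct {
    bool    stable;    /* debounced output */
    uint8_t count;     /* consecutive raw samples that differ from `stable` */
    uint8_t threshold; /* samples required to accept a change */
} debounce_t;

void debounce_init(debounce_t *d, bool initial, uint8_t threshold)
{
    assert(threshold > 0);
    d->stable = initial;
    d->count = 0;
    d->threshold = threshold;
}

/* Feed one raw sample; returns true if the debounced state changed. */
bool debounce_update(debounce_t *d, bool raw)
{
    if (raw == d->stable) {
        d->count = 0;  /* glitch ended: restart the count */
        return false;
    }
    if (++d->count >= d->threshold) {
        d->stable = raw;
        d->count = 0;
        return true;
    }
    return false;
}
```

Because the module holds no FreeRTOS or driver state, it can be exercised with Unity on the host, and the ISR/queue plumbing can stay in the firmware.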