 
On 7/3/25 2:49 AM, Arnaud POULIQUEN wrote:
On 7/2/25 19:00, Tanmay Shah wrote:
On 7/2/25 10:47 AM, Arnaud POULIQUEN wrote:
On 7/2/25 17:23, Tanmay Shah wrote:
On 7/2/25 2:18 AM, Arnaud POULIQUEN wrote:
On 7/1/25 23:19, Tanmay Shah wrote:
On 7/1/25 1:06 PM, Tanmay Shah wrote:
>
>
> On 7/1/25 12:56 PM, Tanmay Shah wrote:
>>
>>
>> On 7/1/25 12:18 PM, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 7/1/25 17:16, Tanmay Shah wrote:
>>>>
>>>>
>>>> On 7/1/25 3:07 AM, Arnaud POULIQUEN wrote:
>>>>> Hi Tanmay,
>>>>>
>>>>> On 6/27/25 23:29, Tanmay Shah wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am implementing remoteproc recovery on attach-detach use case.
>>>>>> I have implemented the feature in the platform driver, and it works for
>>>>>> boot recovery.
>>>>>
>>>>> Few questions to better understand your use case.
>>>>>
>>>>> 1) The linux remoteproc firmware attach to a remote processor, and you
>>>>> generate a crash of the remote processor, right?
>>>>>
>>>>
>>>> Yes correct.
>>>>
>>>>> 1) How does the remoteprocessor reboot? On a remoteproc request or it is an
>>>>> autoreboot independent from the Linux core?
>>>>>
>>>>
>>>> It is auto-reboot independent from the linux core.
>>>>
>>>>> 2) In case of auto reboot, when does the remoteprocessor send an event to the
>>>>> Linux remoteproc driver? before or after the reset?
>>>>>
>>>>
>>>> Right now, when Remote reboots, it sends crash event to remoteproc driver
>>>> after reboot.
>>>>
>>>>> 3) Do you expect to get core dump on crash?
>>>>>
>>>>
>>>> No coredump expected as of now, but only recovery. Eventually will implement
>>>> coredump functionality as well.
>>>>
>>>>>>
>>>>>> However, I am stuck at the testing phase.
>>>>>>
>>>>>> When should firmware report the crash ? After reboot ? or during some
>>>>>> kind of crash handler ?
>>>>>>
>>>>>> So far, I am reporting crash after rebooting remote processor, but it
>>>>>> doesn't seem to work i.e. I don't see rpmsg devices created after recovery.
>>>>>>
>>>>>> What should be the correct process to test this feature ? How other
>>>>>> platforms are testing this?
>>>>>
>>>>> I have never tested it on ST board. As a first analysis, in case of
>>>>> autoreboot of the remote processor, it looks like you should detach and
>>>>> reattach to recover.
>>>>
>>>> That is what's done from the remoteproc framework.
>>>>
>>>>> - On detach the rpmsg devices should be unbound
>>>>> - On attach the remote processor should request RPmsg channels using the NS
>>>>>   announcement mechanism
>>>>>
>>>>
>>>> Main issue is, Remote firmware needs to wait till all above happens. Then
>>>> only initialize virtio devices. Currently we don't have any way to notify
>>>> recovery progress from linux to remote fw in the remoteproc framework. So I
>>>> might have to introduce some platform specific mechanism in remote firmware
>>>> to wait for recovery to complete successfully.
>>>
>>> I guess the rproc->clean_table contains a copy of the resource table that is
>>> reapplied on attach, and the virtio devices should be re-probed, right?
>>>
>>> During the virtio device probe, the vdev status in the resource table is
>>> updated to 7 when virtio is ready to communicate. Virtio should then call
>>> rproc_virtio_notify() to inform the remote processor of the status update.
>>> At this stage, your remoteproc driver should be able to send a mailbox
>>> message to inform the remote side about the recovery completion.
>>>
>>
>> I think I spot the problem now.
>>
>> Linux side: file: remoteproc_core.c
>> rproc_attach_recovery
>>     __rproc_detach
>>         cleans up the resource table and re-loads it
>>     __rproc_attach
>>         stops and re-starts subdevices
>>
>>
>> Remote side:
>> Remote re-boots after crash
>> Detects crash happened previously
>> notify crash to Linux
>> (Linux is executing above flow meanwhile)
>> starts creating virtio devices
>> **rproc_virtio_create_vdev - parse vring & create vdev device**
>> **rproc_virtio_wait_remote_ready - wait for remote ready** [1]
>>
>> I think Remote should wait on DRIVER_OK bit, before creating virtio devices.
>> The temporary solution I implemented was to make sure vrings addresses are
>> not 0xffffffff like following:
>>
>> while(rsc->rpmsg_vring0.da == FW_RSC_U32_ADDR_ANY ||
>>       rsc->rpmsg_vring1.da == FW_RSC_U32_ADDR_ANY) {
>>       usleep(100);
>>       metal_cache_invalidate(rsc, rproc->rsc_len);
>> }
>>
>> Above works, but I think better solution is to change sequence where remote
>> waits before creating virtio devices.
>
> I am sorry, I should have said, remote should wait before parsing and
> assigning vrings to virtio device.
>
>>
>>
>> [1] https://github.com/OpenAMP/open-amp/blob/391671ba24840833d882c1a75c5d7307703b1cf1/lib/remoteproc/remoteproc.c#L994
>>
Actually, upon further checking, I think the above code is okay. I see that wait_remote_ready is called before the vrings are set up on the remote fw side.
However, during recovery on the remote side, I still have to implement a platform-specific wait for the vrings to be set up correctly.
From the Linux side, the DRIVER_OK bit is set before the vrings are set up correctly. Because of that, the remote firmware sets up wrong vring addresses, and then the rpmsg channels are not created.
I am investigating this further.
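A minimal sketch of the readiness check described above, assuming the firmware can read the vdev status byte and both vring device addresses from the shared resource table. The struct `demo_rsc_view` and the function `demo_host_ready()` are hypothetical names used only for illustration; they are not part of OpenAMP or the Linux kernel. The two constants carry the values used by the virtio spec (DRIVER_OK = 4) and by the resource table (`FW_RSC_U32_ADDR_ANY`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DRIVER_OK bit of the virtio device status field. */
#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04u
/* "Not yet assigned" marker for addresses in the resource table. */
#define FW_RSC_U32_ADDR_ANY            0xFFFFFFFFu

/* Hypothetical, simplified view of the fields the firmware polls. */
struct demo_rsc_view {
    uint8_t  vdev_status; /* vdev status byte written by Linux */
    uint32_t vring0_da;   /* device address of vring0 */
    uint32_t vring1_da;   /* device address of vring1 */
};

/*
 * Returns true only when Linux has set DRIVER_OK *and* both vring
 * addresses have been filled in. Checking DRIVER_OK alone is not
 * enough in the scenario discussed in this thread.
 */
bool demo_host_ready(const struct demo_rsc_view *rsc)
{
    assert(rsc != NULL);

    if (!(rsc->vdev_status & VIRTIO_CONFIG_STATUS_DRIVER_OK))
        return false;

    return rsc->vring0_da != FW_RSC_U32_ADDR_ANY &&
           rsc->vring1_da != FW_RSC_U32_ADDR_ANY;
}
```

On real hardware the caller would presumably invalidate the cache over the resource table before each poll, as in the `metal_cache_invalidate()` loop quoted earlier, and only then parse the vrings.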
Do you reset the vdev status as requested by the virtio spec? https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#...
Regards, Arnaud
Yes, I do. I am actually restoring the default resource table on the firmware side, which sets the rpmsg_vdev status to 0.
However, when printing the vrings right before wait_remote_ready, I see that the vrings are not set correctly from the Linux side:
`vring0 = 0xFFFFFFFF, vring1 = 0xFFFFFFFF`
That makes sense if these values correspond to the initial values of the resource table. rproc->clean_table should contain a copy of these initial values.
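A minimal user-space sketch of the "clean copy" idea, assuming a pristine copy of the table is saved before Linux fills in the vring addresses and is written back on detach. `struct demo_table`, `demo_save_clean()` and `demo_restore_clean()` are hypothetical stand-ins, not the kernel's data structures or functions. It shows why the firmware reads 0xFFFFFFFF after a restore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ADDR_ANY 0xFFFFFFFFu /* same value as FW_RSC_U32_ADDR_ANY */

/* Hypothetical two-field stand-in for the vring entries of a resource table. */
struct demo_table {
    uint32_t vring0_da;
    uint32_t vring1_da;
};

/* Save a pristine copy before the live table is patched (attach time). */
struct demo_table *demo_save_clean(const struct demo_table *live)
{
    assert(live != NULL);

    struct demo_table *clean = malloc(sizeof(*clean));
    if (clean)
        memcpy(clean, live, sizeof(*clean));
    return clean; /* caller frees; NULL on allocation failure */
}

/* Write the pristine values back into the live table (detach time). */
void demo_restore_clean(struct demo_table *live, const struct demo_table *clean)
{
    assert(live != NULL && clean != NULL);
    memcpy(live, clean, sizeof(*live));
}
```

After `demo_restore_clean()`, any addresses Linux had written are gone, so a firmware that parses the table at that moment sees the initial "any address" markers.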
However, the rproc state still moved to attached when checked from remoteproc sysfs.
Is rproc_handle_resources() called before going back to the attached state?
You are right. I think __rproc_attach() isn't calling rproc_handle_resources().
But recovery is supported by other platforms so I think recovery should work without calling rproc_handle_resources().
Right. Having taken a deeper look at the code, it seems that there is an issue. In rproc_reset_rsc_table_on_detach(), we clean the resource table without calling rproc_resource_cleanup().
It seems to me that rproc_reset_rsc_table_on_detach() should not be called in __rproc_detach() but rather in rproc_detach() after calling rproc_resource_cleanup().
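The proposed ordering can be sketched with a small mock. This is not kernel code: the `demo_*` functions are hypothetical stand-ins named after the functions discussed above, and they only record the order in which they run.

```c
#include <assert.h>
#include <stddef.h>

/* Steps of the detach path, recorded in execution order. */
enum demo_step { DEMO_DETACH, DEMO_RESOURCE_CLEANUP, DEMO_RESET_TABLE };

struct demo_trace {
    enum demo_step steps[3];
    size_t count;
};

static void demo_record(struct demo_trace *t, enum demo_step s)
{
    assert(t != NULL && t->count < 3);
    t->steps[t->count++] = s;
}

/* Stand-in for __rproc_detach(): no longer resets the table itself. */
static void demo_inner_detach(struct demo_trace *t)
{
    demo_record(t, DEMO_DETACH);
}

/* Stand-in for rproc_resource_cleanup(). */
static void demo_resource_cleanup(struct demo_trace *t)
{
    demo_record(t, DEMO_RESOURCE_CLEANUP);
}

/* Stand-in for rproc_reset_rsc_table_on_detach(). */
static void demo_reset_rsc_table(struct demo_trace *t)
{
    demo_record(t, DEMO_RESET_TABLE);
}

/* Stand-in for rproc_detach() with the table reset moved to the end. */
void demo_rproc_detach(struct demo_trace *t)
{
    demo_inner_detach(t);
    demo_resource_cleanup(t);
    demo_reset_rsc_table(t); /* only after resources are cleaned up */
}
```

Whether this ordering also suits the recovery path, which calls __rproc_detach() directly, is left open here.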
Yes, that sounds correct. It's a long weekend here in the US, so I will try this next week and update.
Thanks, Tanmay
Maybe restoring the resource table from the firmware side after reboot isn't a good idea. I will try without it.
`cat /sys/class/remoteproc/remoteproc0/state` attached
Somehow the synchronization between the remote fw and Linux isn't right.
>>
>> Thanks,
>> Tanmay
>>> Regards
>>> Arnaud
>>>
>>>
>>>>
>>>>> Regards,
>>>>> Arnaud
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Tanmay
>>>>
>>
>
openamp-rp@lists.openampproject.org
