Cite and discuss previous work more properly

Reviewer 2 (part 1 of 6)

The authors propose a method to use DeepLabCut to extract whisker position and then provide a TTL pulse for real-time manipulations. This latter facet may be the more innovative side, although several approaches already exist. The manuscript in its current version falls below the bar, but improvements could be made to greatly improve the product. As this is not my field, upon conclusion of my review I did a PubMed search for real-time whisker tracking. In ~30 minutes, my eyes were opened. Perkon et al. 2011 and the several subsequent papers that use it (Ma 2017, Rahmati 2014, Romano 2018, 2020) achieve greater temporal resolution, in freely moving mice, and of all whiskers. This BIOTACT tool has been used many times and is freely available on GitHub. Ma demonstrates 2 ms resolution with the same camera used here. There is no mention of it. Others like Petersen et al., PLOS Comp Bio 2020, are also not mentioned. While an interesting project, the manuscript is reporting on a new type of horse carriage after the car has been invented.

My eyes were opened to the arrogant way in which this referee writes, but it cannot be helped.

I am trying to take a look at each of these publications. Does anyone know any of them?

The reviewer also makes this related point:

OpenPose - while most often used with humans, should be cited when referring to 'behavioral monitoring algorithms'. More importantly, JAABA and MOTR are missing - these constitute a vital pillar of supervised algorithms in rodent behavioral research.

Another part:

Is the 20 ms or so reported that PixyCam provides that much worse than 11.3 ms? In the example shown (1D), the frames are separated by 11.5 ms and 12.8 ms? I do not see how this detection, delayed by potentially less than 8 ms (though likely more), makes a difference. I am open and eager to be proven wrong, but it has to be in the paper. Perhaps part of the missing quantification of performance can compare Pixy to DLC, showing this method to have a greatly reduced failure rate. A new tool must in some way supplant the old one. Similarly, I don't see it in any way rivaling the excellent job done with 2 ms resolution by Sehara 2019. I don't particularly think that the camera used here is not specialized: it is not something most people own, and the overall cost, including a high-quality GPU, is in the same ballpark as a neuromorphic camera (DVXplorer Lite is 2200 Euro). Again, I'm eager to find out why I would choose this approach over an existing one.

"DNN-based image-processing approaches are more versatile, more flexible, and allows researchers to track multiple body parts simultaneously." It is unclear to me how this general quality of DNNs benefits this method. How does flexibility make this method better than previous methods? How does it make this an advancement over the Sehara paper, with which the authors are well familiar? I could see a similar method being used for lick or reach detection, but there is no evidence that it would work and it is not mentioned in the paper. Much like Nashaat et al., this demonstration could greatly increase the impact of this research. Alternatively, DLC works regardless of head fixation. This would be a fantastic tool if the flexibility were extended to a head-free mouse, even if the arena was quite small. It's questions like this that DLC was designed to address, rather than a new way to do something old.

Said more simply: make it clear to your reader what the advance is. As is, it reads as "we found yet another way to do something that is already possible". What are the benefits of this approach over the others?

Also discuss why achieving 11.3 ms rather than the "20 ms" of Pixy matters (although I doubt this figure).
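
One way to frame the latency question is to express each latency as a phase lag of the whisking cycle. The sketch below is only an illustration and is not taken from the manuscript or the review: the 15 Hz whisking frequency is an assumed value, and `phase_lag_deg` is a hypothetical helper. The 11.3 ms and 20 ms figures are the ones quoted above.

```python
def phase_lag_deg(latency_ms: float, whisk_freq_hz: float) -> float:
    """Return the phase lag, in degrees of one whisk cycle, that
    accumulates while a feedback signal is delayed by latency_ms."""
    if latency_ms < 0 or whisk_freq_hz <= 0:
        raise ValueError("latency must be >= 0 and frequency must be > 0")
    # One cycle lasts 1000 / whisk_freq_hz milliseconds and spans 360 degrees.
    return 360.0 * whisk_freq_hz * latency_ms / 1000.0


# ASSUMPTION: illustrative whisking frequency, not a value from the paper.
ASSUMED_WHISK_FREQ_HZ = 15.0

# Latencies as quoted in the review discussion above.
LATENCIES_MS = {"DLC-based": 11.3, "Pixy": 20.0}

if __name__ == "__main__":
    for name, latency in LATENCIES_MS.items():
        lag = phase_lag_deg(latency, ASSUMED_WHISK_FREQ_HZ)
        print(f"{name}: {latency} ms -> {lag:.1f} deg of a whisk cycle")
```

If the argument in the paper is made along these lines, the whisking frequency should come from the actual data rather than from the assumed value used here.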


Reference	URL	Type	Notes
Voigts et al., 2008	https://journals.physiology.org/doi/full/10.1152/jn.00012.2008		Earlier attempt
Perkon et al., 2011	https://journals.physiology.org/doi/pdf/10.1152/jn.00764.2010	ViSA/BIOTACT	Probably the original paper (from Diamond lab)
Clack et al., 2012	https://doi.org/10.1371/journal.pcbi.1002591	Janelia Whisk	35 fps
Kabra, Robie et al., 2012	https://doi.org/10.1038/nmeth.2281	JAABA	generic automatic annotation tool, http://jaaba.sourceforge.net/
Ohayon et al., 2013	https://doi.org/10.1016/j.jneumeth.2013.05.013	Motr	Multi-day tracking of identified animals, http://motr.janelia.org/
Ma et al., 2017	https://doi.org/10.1109/SAMOS.2017.8344621	ViSA/BIOTACT	From De Zeeuw lab; use of grid computing, up to 1 ms processing latency; computation latency only, specialized to whisker detection
Romano et al., 2018	https://doi.org/10.7554/eLife.38852.001	ViSA/BIOTACT	from Jochen K Spanke lab (Italy)
Cao et al., 2019	https://arxiv.org/abs/1812.08008	OpenPose	2D real-time tracking for humans; makes use of the CNN (VGG-19)
Betting et al., 2020	https://doi.org/10.3389/fncel.2020.588445	WhiskEras	From De Zeeuw lab, offline, claims to be faster than Janelia Whisk
Petersen et al., 2020	https://doi.org/10.1371/journal.pcbi.1007402	WhiskerMan	Michaella Loft lab; 3D tracking with curvature

Keisuke Sehara referenced this issue from a commit 3 years ago

Update Discussion section...

Keisuke Sehara referenced this issue from a commit 3 years ago

update Introduction...

seharak closed 3 years ago

inserted reference by the time this commit is generated

#5 Cite and discuss previous work more properly