NB — The numbers might not always work out here: there are missing data in the analyses due to conflicts.
This blog post covers reviewing activity during the first round of reviewing for the CHI 2024 Papers track. A post has already been published on the outcomes of those reviews; this one instead focuses on the reviewing activity itself — the distribution of reviewing load, the relationship between authors and reviewers, the length of reviews. Those kinds of things.
Reviews, authors, reviewers, and ACs
What is the overlap between authors and reviewers? Where there is overlap, how many reviews do people contribute compared with the load created by their submissions? We discount desk rejected submissions, and focus on submissions that completed the first round in a regular fashion (i.e., with an RR or reject recommendation).
Submissions had 1-36 authors (M=4.8, SD=2.4). There were 71 single-author submissions (37% to RR) and 137 submissions with ten or more co-authors (58% to RR). (A logistic regression shows that author count weakly predicts decision, with more authors increasing the chance of an RR decision, p < .001, 95% CI [.05, .11].) The load created by a given author on a given submission is four (the number of reviews a submission requires) divided by the number of authors on the submission. A paper with one author generates a load of four for that author. A paper with four authors creates a review load of one per author. A paper with 36 authors generates a load of ~0.1 per author.
Computing this load allows us to understand the review load implied by each author, controlling for the fact that their co-authors should also be reviewing. We can use this to produce a histogram of load per author, shown in Figure 1. Load created ranged from 0.11 of a review to 32.7 reviews (M=1.24, SD=1.27).
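For concreteness, here is a minimal sketch of the load calculation in Python. The input table, column names, and bin edges are ours for illustration, not necessarily those of the real analysis pipeline:

```python
import pandas as pd

# Hypothetical input: one row per (submission, author) pair.
pairs = pd.DataFrame({
    "submission_id": [1, 1, 1, 1, 2, 3, 3],
    "author_id":     ["a", "b", "c", "d", "e", "a", "f"],
})

# Each submission requires four reviews; an author's share of that
# load is 4 divided by the number of authors on the submission.
n_authors = pairs.groupby("submission_id")["author_id"].transform("count")
pairs["load"] = 4 / n_authors

# Total load created by each author, summed across their submissions.
load_per_author = pairs.groupby("author_id")["load"].sum()

# Bin as in the Figure 1 data table at the end of this post.
bins = [0.1, 0.25, 0.5, 0.75, 1, 2, 3, 4, 10, 30]
print(pd.cut(load_per_author, bins=bins).value_counts().sort_index())
```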
As an author, if your load-created is four, you need to complete four reviews to have provided as many reviews to the pool as your submissions have incurred. Is this what happens? No. Of the 14,461 people who were ‘participants’ in the first round of the Papers track (as an author, reviewer, or AC), 9,673 were authors who did not participate in reviewing; 4,098 were external reviewers (of whom 2,051, 50%, were also authors) and 690 were ACs (of whom 472, 68%, were also authors). There were 2,037 reviewers for CHI 2024 who did not make a submission.
Associate Chairs undertake half of the reviews on each paper – one internal review and one metareview to go with the two external ones. This means that an AC produces a mean of 10.7 reviews (SD=2.1), compared with 1.8 (SD=1.2) for an external reviewer. Figure 2 illustrates the review ‘balance’ for ACs, reviewers, and non-reviewing authors.
Associate chairs have a balance of +5,900: they produced 5,900 more reviews than they consumed. Authors who also reviewed have a balance of +3,633. Authors who did not review have a balance of -9,932. Ultimately, we rely on ACs producing a large surplus of reviews in order to have a conference programme.
One final thing to consider here is whether the individual authors on a given submission balance out for the submission. While one author might be in ‘debit’, their co-author might be in ‘credit’, and the net result is that a lot of submissions ‘cover their costs’ with reviews. This does not really work out, though. Without treating this as a constraint optimisation problem, which we’re not going to do, a rough-and-ready indication of whether a given submission was ‘net zero’ on reviews is the sum of the balances of its individual authors. If there was one author with a significant deficit (say, the leader of a lab) but the other authors had picked up that slack, then we’d expect to see that in the data. Figure 3 shows the distribution of these per-submission sums. The aggregate balances for a submission range between -48.3 and +40.2, with a mean submission balance of -3.5 (SD=8.5). In other words, most submissions do not cover their own reviewing ‘costs’. (Though given that half of all reviews have to be written by ACs, this deficit is effectively ‘designed in’ to the process.)
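A minimal sketch of that rough-and-ready calculation, again with hypothetical inputs (the `balances` values here are made up for illustration; each is an author's reviews produced minus load created):

```python
import pandas as pd

# Hypothetical inputs: per-author balances and the author list
# for each submission.
balances = pd.Series({"a": -3.5, "b": 2.0, "c": 0.5, "d": -1.0})
pairs = pd.DataFrame({
    "submission_id": [1, 1, 2, 2],
    "author_id":     ["a", "b", "c", "d"],
})

# Sum the balances of each submission's authors; >= 0 means the
# submission 'covered its costs' in reviews.
pairs["balance"] = pairs["author_id"].map(balances)
per_submission = pairs.groupby("submission_id")["balance"].sum()
print(per_submission)                # the distribution plotted in Figure 3
print((per_submission >= 0).mean())  # share of 'net zero' (or better) submissions
```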
If you’ve done a lot of reviewing as an AC or an external reviewer, looking across these plots, it might feel like there’s a bit of a tragedy of the commons happening – nearly 10,000 authors who didn’t contribute any reviews. Recent analysis has shown that individual authors are submitting more and more work to CHI, and everyone in peer review will have noticed that it has got more difficult to find willing reviewers over the last few years. But caveats abound here. Calculating a “deficit” in this manner ignores the many other types of contribution that have to be made for the conference to happen. Such a calculus cannot incorporate the efforts of the conference organizing committee, the SIGCHI Executive Committee, or the CHI Steering Committee. These contributions are all essential, and often leave colleagues with less capacity to commit to reviewing service. There is no satisfactory way to capture this in our analysis. Similarly, authors are encouraged to ‘pay back’ their contributions across SIGCHI conferences and HCI journals. It might be that authors reviewed for CSCW, or were an AC there, and ran up ‘surpluses’ elsewhere. These data are also too difficult to capture and bring to bear in an analysis of this kind.
It’s also worth remembering that just because an author didn’t provide a review, it doesn’t mean they weren’t willing to. Reviewing relies on networks, and with so many first-time authors every year, there are always going to be prospective reviewers who aren’t called on to review. There will also be many authors who wouldn’t make appropriate reviewers: undergraduates, perhaps, or folks from other disciplines who have been brought into multidisciplinary papers. The main takeaway from all of this is that the conference ACs are doing really sterling work. Chapeau!
Review lengths and quality
There were 14,883 reviews for submissions that went through the complete Round 1 review process (i.e., not desk rejects, withdrawn papers, etc.). Of these reviews, 4,256 were completed by (self-identified) Experts, 8,614 by Knowledgeable reviewers, 1,872 by reviewers with Passing Knowledge, and one by a reviewer with No Knowledge. A breakdown of expertise by recommendation is given below for all but “No Knowledge” (which would not tell us much). These data seem to imply that Expert reviewers are more likely to recommend rejection than other reviewers.
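The breakdown itself is a straightforward cross-tabulation; a sketch, assuming hypothetical `expertise` and `recommendation` columns on a reviews table:

```python
import pandas as pd

# Hypothetical reviews table: self-identified expertise and the
# review's recommendation.
reviews = pd.DataFrame({
    "expertise": ["Expert", "Knowledgeable", "Passing Knowledge", "Expert"],
    "recommendation": ["Reject", "RR", "RR", "Reject"],
})

# Row-normalised crosstab: within each expertise level, what share
# of reviews carry each recommendation?
print(pd.crosstab(reviews["expertise"], reviews["recommendation"],
                  normalize="index"))
```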
Of the reviews, 1,575 (11%) were recognised as excellent reviews by ACs. These excellent reviews were produced by 1,273 individual reviewers, each contributing 1-5 of them (M=1.2, SD=0.55). Of the 1,575 excellent reviews, 1,166 (74%) were produced by externals. This probably just represents a difference in the propensity to give special recognition to reviews (only seven 1AC reviews were recognised as excellent), rather than a meaningful difference in the rate at which different roles produce excellent reviews.
Reviews comprised 8,733,697 words in total. There were twelve reviews with a length of zero – most of these were the result of a reviewer or AC pasting their review into the wrong field (e.g., confidential comments, award nominations, etc.). We discarded these. The remaining reviews varied in length between 9 and 6,903 words (M=593, SD=378). There are 1,731 reviews over 1,000 words in length (12%), with 407 of these over 1,500 words (3%).
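A sketch of how such length statistics could be computed, assuming a hypothetical reviews table; whitespace tokenisation is our assumption, and the real pipeline may count words differently:

```python
import pandas as pd

# Hypothetical table of review texts and reviewer roles.
reviews = pd.DataFrame({
    "role": ["1AC", "2AC", "Reviewer"],
    "text": ["The reviewers agree ...", "This paper presents ...", ""],
})

# Naive word count: split on whitespace.
reviews["words"] = reviews["text"].str.split().str.len()

# Discard zero-length reviews (text pasted into the wrong field),
# then bin by length and role as in the Figure 5 data table.
reviews = reviews[reviews["words"] > 0]
bins = [0, 300, 600, 1000, 2000, 4000, 10000]
print(reviews.groupby([pd.cut(reviews["words"], bins=bins), "role"],
                      observed=True).size())
```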
As you might expect, 1AC metareviews are shorter (M=360, SD=206) than 2ACs’ (M=611, SD=334) and reviewers’ (M=699, SD=412) ‘full’ reviews. Figure 5 shows a stacked histogram of review lengths by reviewer role. There is a long tail! Ignoring 1AC reviews, which are a qualitatively different kind of review, Figure 4 shows that reviews recognised as excellent by ACs (M=976, SD=486) tend to be longer than regular reviews (M=620, SD=346).
Bonus Chartjunk
No bonus Chartjunk for this blog, with many apologies. Suggestions gratefully received at analytics@chi2024.acm.org.
Datatables
Figure 1 is a histogram. We can’t share the raw data for that, but we can share binned data:
| Author-created review load, range | n |
| --- | --- |
| (0.1,0.25] | 227 |
| (0.25,0.5] | 1929 |
| (0.5,0.75] | 2431 |
| (0.75,1] | 3459 |
| (1,2] | 2756 |
| (2,3] | 651 |
| (3,4] | 383 |
| (4,10] | 348 |
| (10,30] | 31 |
Figure 2, which shows the ‘balance’ of each author, likewise uses individual data, so we can instead offer some binned data:
Figure 3’s data looks something like this:
Figure 4’s data:
Figure 5’s data:
| Review length, range | Role | n |
| --- | --- | --- |
| (0,300] | 1AC | 1656 |
| (0,300] | 2AC | 482 |
| (0,300] | Reviewer | 715 |
| (300,600] | 1AC | 1638 |
| (300,600] | 2AC | 1657 |
| (300,600] | Reviewer | 2950 |
| (600,1000] | 1AC | 328 |
| (600,1000] | 2AC | 1134 |
| (600,1000] | Reviewer | 2440 |
| (1000,2000] | 1AC | 46 |
| (1000,2000] | 2AC | 392 |
| (1000,2000] | Reviewer | 1181 |
| (2000,4000] | 1AC | 4 |
| (2000,4000] | 2AC | 20 |
| (2000,4000] | Reviewer | 84 |
| (4000,10000] | Reviewer | 4 |