NB — The numbers might not always work out here: there are missing data in the analyses due to conflicts.
This blog post covers reviewing activity during the first round of reviewing for the CHI 2024 Papers track. A post has already been published on the outcomes of those reviews; this one instead focuses on the reviewing activity itself — the distribution of reviewing load, the relationship between authors and reviewers, the length of reviews. Those kinds of things.
Reviews, authors, reviewers, and ACs
What is the overlap between authors and reviewers? Where there is overlap, how many reviews do people contribute compared with the load created by their submissions? We discount desk rejected submissions, and focus on submissions that completed the first round in a regular fashion (i.e., with an RR or reject recommendation).
Submissions had 1-36 authors (M=4.8, SD=2.4). There were 71 single-author submissions (37% to RR) and 137 submissions with ten or more co-authors (58% to RR). (A logistic regression shows that author count weakly predicts decision, with more authors increasing the chance of an RR decision, p < .001, 95% CI [.05, .11].) The load created by a given author on a given submission is four (the number of reviews a submission requires) divided by the number of authors on the submission. A paper with one author generates a load of four for that author. A paper with four authors creates a review load of one per author. A paper with 36 authors generates a load of ~0.1 per author.
Computing this load allows us to understand the review load implied by each author, controlling for the fact that their co-authors should also be reviewing. We can use this to produce a histogram of load per author, shown in Figure 1. Load created ranged from 0.11 of a review to 32.7 reviews (M=1.24, SD=1.27).
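For concreteness, here is a minimal sketch of the load calculation in Python. The input table, column names, and bin edges are ours for illustration, not necessarily those of the real analysis pipeline:

```python
import pandas as pd

# Hypothetical input: one row per (submission, author) pair.
pairs = pd.DataFrame({
    "submission_id": [1, 1, 1, 1, 2, 3, 3],
    "author_id":     ["a", "b", "c", "d", "e", "a", "f"],
})

# Each submission requires four reviews; an author's share of that
# load is 4 divided by the number of authors on the submission.
n_authors = pairs.groupby("submission_id")["author_id"].transform("count")
pairs["load"] = 4 / n_authors

# Total load created by each author, summed across their submissions.
load_per_author = pairs.groupby("author_id")["load"].sum()

# Bin as in the Figure 1 data table at the end of this post.
bins = [0.1, 0.25, 0.5, 0.75, 1, 2, 3, 4, 10, 30]
print(pd.cut(load_per_author, bins=bins).value_counts().sort_index())
```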
As an author, if your load-created is four, you need to complete four reviews to have provided as many reviews to the pool as your submissions have incurred. Is this what happens? No. Of the 14,461 people who were ‘participants’ in the first round of the Papers track (as an author, reviewer, or AC), 9,673 were authors who did not participate in reviewing; 4,098 were external reviewers (of whom 2,051, 50%, were also authors) and 690 were ACs (of whom 472, 68%, were also authors). There were 2,037 reviewers for CHI 2024 who did not make a submission.
Associate Chairs undertake half of the reviews on each paper – one internal review and one metareview to go with the two external ones. This means that an AC produces a mean of 10.7 reviews (SD=2.1), compared with 1.8 (SD=1.2) for an external reviewer. Figure 2 illustrates the review ‘balance’ for ACs, reviewers, and non-reviewing authors.
Associate chairs have a balance of +5,900: they produced 5,900 more reviews than they consumed. Authors who also reviewed have a balance of +3,633. Authors who did not review have a balance of -9,932. Ultimately, we rely on ACs producing a large surplus of reviews in order to have a conference programme.
One final thing to consider here is whether the individual authors on a given submission balance out for the submission. While one author might be in ‘debit’, their co-author might be in ‘credit’, and the net result is that a lot of submissions ‘cover their costs’ with reviews. This does not really work out, though. Without treating this as a constraint optimisation problem, which we’re not going to do, a rough-and-ready indication of whether a given submission was ‘net zero’ on reviews is the sum of the balances of its individual authors. If there was one author with a significant deficit (say, the leader of a lab) but the other authors had picked up that slack, then we’d expect to see that in the data. Figure 3 shows the distribution of these per-submission sums. The aggregate balances for a submission range between -48.3 and +40.2, with a mean submission balance of -3.5 (SD=8.5). In other words, most submissions do not cover their own reviewing ‘costs’. (Though given that half of all reviews have to be written by ACs, this deficit is effectively ‘designed in’ to the process.)
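A minimal sketch of that rough-and-ready calculation, again with hypothetical inputs (the `balances` values here are made up for illustration; each is an author's reviews produced minus load created):

```python
import pandas as pd

# Hypothetical inputs: per-author balances and the author list
# for each submission.
balances = pd.Series({"a": -3.5, "b": 2.0, "c": 0.5, "d": -1.0})
pairs = pd.DataFrame({
    "submission_id": [1, 1, 2, 2],
    "author_id":     ["a", "b", "c", "d"],
})

# Sum the balances of each submission's authors; >= 0 means the
# submission 'covered its costs' in reviews.
pairs["balance"] = pairs["author_id"].map(balances)
per_submission = pairs.groupby("submission_id")["balance"].sum()
print(per_submission)                # the distribution plotted in Figure 3
print((per_submission >= 0).mean())  # share of 'net zero' (or better) submissions
```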
If you’ve done a lot of reviewing as an AC or an external reviewer, looking across these plots, it might feel like there’s a bit of a tragedy of the commons happening – nearly 10,000 authors who didn’t contribute any reviews. Recent analysis has shown that individual authors are submitting more and more work to CHI, and everyone in peer review will have noticed that it has got more difficult to find willing reviewers over the last few years. But caveats abound here. Calculating a “deficit” in this manner ignores the many other types of contribution that have to be made for the conference to happen. Such a calculus cannot incorporate the efforts of the conference organizing committee, the SIGCHI Executive Committee, or the CHI Steering Committee. These contributions are all essential, and often leave colleagues with less capacity to commit to reviewing service. There is no satisfactory way to capture this in our analysis. Similarly, authors are encouraged to ‘pay back’ their contributions across SIGCHI conferences and HCI journals. It might be that authors reviewed for CSCW, or were an AC there, and ran up ‘surpluses’ elsewhere. These data are also too difficult to capture and bring to bear in an analysis of this kind.
It’s also worth remembering that just because an author didn’t provide a review, it doesn’t mean they weren’t willing to. Reviewing relies on networks, and with so many first-time authors every year, there are always going to be prospective reviewers who aren’t called on to review. There will also be many authors who wouldn’t make appropriate reviewers: undergraduates, perhaps, or folks from other disciplines who have been brought into multidisciplinary papers. The main takeaway from all of this is that the conference ACs are doing really sterling work. Chapeau!
Review lengths and quality
There were 14,883 reviews for submissions that went through the complete Round 1 review process (i.e., not desk rejects, withdrawn papers, etc.). Of these reviews, 4,256 were completed by (self-identified) Experts, 8,614 by Knowledgeable reviewers, 1,872 by reviewers with Passing Knowledge, and one by a reviewer with No Knowledge. A breakdown of expertise by recommendation is given below for all but “No Knowledge” (which would not tell us much). These data seem to imply that Expert reviewers are more likely to recommend rejection than other reviewers.
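The breakdown itself is a straightforward cross-tabulation; a sketch, assuming hypothetical `expertise` and `recommendation` columns on a reviews table:

```python
import pandas as pd

# Hypothetical reviews table: self-identified expertise and the
# review's recommendation.
reviews = pd.DataFrame({
    "expertise": ["Expert", "Knowledgeable", "Passing Knowledge", "Expert"],
    "recommendation": ["Reject", "RR", "RR", "Reject"],
})

# Row-normalised crosstab: within each expertise level, what share
# of reviews carry each recommendation?
print(pd.crosstab(reviews["expertise"], reviews["recommendation"],
                  normalize="index"))
```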
Of the reviews, 1,575 (11%) were recognised as excellent reviews by ACs. These excellent reviews were produced by 1,273 individual reviewers, each contributing 1-5 of them (M=1.2, SD=0.55). Of the 1,575 excellent reviews, 1,166 (74%) were produced by externals. This probably just represents a difference in the propensity to give special recognition to reviews (only seven 1AC reviews were recognised as excellent), rather than a meaningful difference in the rate at which different roles produce excellent reviews.
Reviews comprised 8,733,697 words in total. There were twelve reviews with a length of zero – most of these were the result of a reviewer or AC pasting their review into the wrong field (e.g., confidential comments, award nominations, etc.). We discarded these. The remaining reviews varied in length between 9 and 6,903 words (M=593, SD=378). There are 1,731 reviews over 1,000 words in length (12%), with 407 of these over 1,500 words (3%).
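A sketch of how such length statistics could be computed, assuming a hypothetical reviews table; whitespace tokenisation is our assumption, and the real pipeline may count words differently:

```python
import pandas as pd

# Hypothetical table of review texts and reviewer roles.
reviews = pd.DataFrame({
    "role": ["1AC", "2AC", "Reviewer"],
    "text": ["The reviewers agree ...", "This paper presents ...", ""],
})

# Naive word count: split on whitespace.
reviews["words"] = reviews["text"].str.split().str.len()

# Discard zero-length reviews (text pasted into the wrong field),
# then bin by length and role as in the Figure 5 data table.
reviews = reviews[reviews["words"] > 0]
bins = [0, 300, 600, 1000, 2000, 4000, 10000]
print(reviews.groupby([pd.cut(reviews["words"], bins=bins), "role"],
                      observed=True).size())
```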
As you might expect, 1AC metareviews are shorter (M=360, SD=206) than 2ACs’ (M=611, SD=334) and reviewers’ (M=699, SD=412) ‘full’ reviews. Figure 5 shows a stacked histogram of review lengths by reviewer role. There is a long tail! Ignoring 1AC reviews, which are a qualitatively different kind of review, Figure 4 shows that reviews recognised as excellent by ACs (M=976, SD=486) tend to be longer than regular reviews (M=620, SD=346).
Bonus Chartjunk
No bonus Chartjunk for this blog, with many apologies. Suggestions gratefully received at analytics@chi2024.acm.org.
Datatables
Figure 1 is a histogram. We can’t share the raw data for that, but we can share binned data:
| Author-created review load, range | n |
| --- | --- |
| (0.1,0.25] | 227 |
| (0.25,0.5] | 1929 |
| (0.5,0.75] | 2431 |
| (0.75,1] | 3459 |
| (1,2] | 2756 |
| (2,3] | 651 |
| (3,4] | 383 |
| (4,10] | 348 |
| (10,30] | 31 |
Figure 2, which shows the ‘balance’ of each author, likewise uses individual data, so we can instead offer some binned data:
Figure 3’s data looks something like this:
Figure 4’s data:
Figure 5’s data:
| Review length, range | Role | n |
| --- | --- | --- |
| (0,300] | 1AC | 1656 |
| (0,300] | 2AC | 482 |
| (0,300] | Reviewer | 715 |
| (300,600] | 1AC | 1638 |
| (300,600] | 2AC | 1657 |
| (300,600] | Reviewer | 2950 |
| (600,1000] | 1AC | 328 |
| (600,1000] | 2AC | 1134 |
| (600,1000] | Reviewer | 2440 |
| (1000,2000] | 1AC | 46 |
| (1000,2000] | 2AC | 392 |
| (1000,2000] | Reviewer | 1181 |
| (2000,4000] | 1AC | 4 |
| (2000,4000] | 2AC | 20 |
| (2000,4000] | Reviewer | 84 |
| (4000,10000] | Reviewer | 4 |