The present study addresses whether and how the way paired sentences are presented to participants affects the results of acceptability judgment testing. Because a single choice in experimental design can occasionally change the outcome dramatically, empirical science requires that the latent effects of experimental parameters be examined in advance. The present study runs acceptability judgment tests under three designs, namely the between-group, the within-group, and the Latin Square designs, using the same test items. The experiments were administered on a large scale with two different experimental tasks: the Likert scale task and the binary Yes/No task. The comparison across the designs and the tasks demonstrates that different designs do not necessarily yield different results. This indicates that there are no grounds for regarding the Latin Square design as an indispensable procedure in language experiments.