CYP-LEX - A large-scale lexical database of books read by British children aged 7-16

The CYP-LEX project is a collaboration with Kathy Rastle, Marc Brysbaert, and Marco Marelli. The aim of this work was to create a lexical database of the words that British children aged 7-16 encounter when they read for pleasure. To this end, we built a corpus of 1,200 books popular with children and young people in the UK and analysed the properties of the words used in these books. The associated research article was published in March 2024 and is publicly available and free to download. Kathy and I also wrote an accessible blog post explaining the key insights from our analysis.

Within a month of its publication, our paper attracted considerable attention from teachers, teacher educators, and literacy charities (see below). It has been highlighted that the “implications [of this research] can be profound for the choices we make to help pupils learn language”.

  • Research article viewed and downloaded more than 1,100 times; associated blog post viewed more than 1,400 times
  • In the top 5% of research outputs ever tracked by Altmetric
  • Featured in publications aimed at education professionals, e.g., Times Educational Supplement (TES), 3Rs newsletter