Computer Science Department Contributes to Major NSF Study on the Evolution of the Domesticated Tomato

Professor Sofia Visa and two Wooster students computationally analyzed the DNA of cultivated tomatoes in Latin America

February 25, 2020

WOOSTER, Ohio – The College of Wooster’s Department of Computer Science played an integral role in a National Science Foundation (NSF) study that helped uncover the evolution of the tomato. Students Erika Goetz and Angelo Williams worked with Associate Professor of Computer Science Sofia Visa to examine the DNA of 245 tomato varieties. The team provided data analysis to biologists and geneticists Esther van der Knaap of the University of Georgia and Hamid Razifard of the University of Massachusetts Amherst, among other faculty collaborators, and the work resulted in a paper published in the journal Molecular Biology and Evolution.

Goetz, a senior computer science major, and Williams, a junior mathematics and computer science double major, wrote computer programs in Python to compare structural variants, such as DNA insertions, deletions, and inversions, across the varieties. They then used these structural variants to cluster the varieties into a phylogenetic tree with 10 geographical groupings. A primary finding of the study is that the modern tomato is most closely related to a Mexican variant, a weed-like tomato group, rather than to the semi-domesticated intermediate types found in South America.
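To illustrate the general idea of grouping varieties by the structural variants they share, here is a minimal sketch. It is not the Wooster team's actual code or the study's method. It assumes each variety is represented by the set of structural variant IDs it carries, measures how different two varieties are with Jaccard distance, and builds a tree by UPGMA (average-linkage) clustering. The variety names, variant IDs, and the choice of Jaccard distance and UPGMA are all hypothetical.

```python
from itertools import combinations

# Illustrative sketch only; not the study's actual pipeline.
# Hypothetical toy data: each variety maps to the set of structural
# variant (SV) IDs it carries.
VARIETIES = {
    "SLL_1": {"del_12", "ins_40", "inv_7"},
    "SLL_2": {"del_12", "ins_40", "inv_7", "del_3"},
    "SLC_1": {"del_12", "ins_40", "ins_99"},
    "SP_1": {"ins_99", "del_55", "inv_21"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; 0.0 means identical SV sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(samples):
    """Build a nested-tuple tree by UPGMA (average-linkage) clustering."""
    # Each cluster maps a key to (subtree, list of member variety names).
    clusters = {name: (name, [name]) for name in samples}
    # Precompute pairwise distances between individual varieties.
    leaf_dist = {
        frozenset((x, y)): jaccard_distance(samples[x], samples[y])
        for x, y in combinations(samples, 2)
    }

    def cluster_distance(members1, members2):
        # Average distance over all cross-cluster pairs of varieties.
        total = sum(leaf_dist[frozenset((p, q))]
                    for p in members1 for q in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        k1, k2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(clusters[pair[0]][1],
                                              clusters[pair[1]][1]),
        )
        tree1, members1 = clusters.pop(k1)
        tree2, members2 = clusters.pop(k2)
        clusters[f"({k1},{k2})"] = ((tree1, tree2), members1 + members2)

    (tree, _), = clusters.values()
    return tree


if __name__ == "__main__":
    print(upgma(VARIETIES))
    # prints ('SP_1', ('SLC_1', ('SLL_1', 'SLL_2')))
```

With this toy data, the two domesticated (SLL) varieties pair first, the semi-domesticated (SLC) variety joins them next, and the wild (SP) variety stays on its own branch. A real analysis would involve thousands of variants and would typically rely on established phylogenetics software rather than a hand-written loop like this one.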

Wooster’s contribution to the project, funded by a $400,000 portion of the total NSF grant, required modeling a very large data set: approximately 1 terabyte of genomic data divided into three categories:

  • SLL – domesticated tomato (Solanum lycopersicum var. lycopersicum)
  • SLC – semi-domesticated tomato (Solanum lycopersicum var. cerasiforme)
  • SP – wild tomato (Solanum pimpinellifolium)

The algorithms developed by the Wooster team helped build the phylogenetic tree from the 34,980 structural variants identified, which allowed the biologists and geneticists to reconstruct the history of tomato domestication. These structural variants will also be used to identify traits in wild tomatoes that have been lost through domestication but might still be desirable in the domesticated tomato. Another research avenue is using these structural variants to identify genes associated with tomato taste, which could, for instance, help breeders develop sweeter tomatoes.

For the students, the project provided valuable experience in collaborating across disciplines (computer science and biology, in this case), the kind of work they are likely to do after graduation. In addition to being credited as co-authors of the journal paper, Visa, Goetz, and Williams were also co-authors of a conference paper, “Detecting Structural Variants in Tomato Fruits Using Illumina Reads,” at the International Conference on Applied Informatics Imagination, Creativity, Design, Development.

“This work is rewarding because the students gained so much interdisciplinary knowledge and they [got] to apply their programming skills to a concrete problem. Also, their work is published in a journal paper and a conference paper, [which carries] as much weight in computer science as a scientific journal,” remarked Visa.

Students Erika Goetz (left) and Angelo Williams (right) gained valuable experience writing computer programs in Python for an NSF-funded study.