This summer break I worked with Professor Cati Fortin on her ongoing research project in linguistics. Our research focused on comparative deletion in Indonesian. Comparative deletion is a type of ellipsis, a syntactic phenomenon in which a repeated segment of a sentence is assumed to be deleted. An example is the sentence “Mary ate more cookies than Sam did.” Some linguists claim that this sentence is underlyingly “Mary ate more cookies than Sam ate cookies.” However, there is plenty of debate about whether this analysis is correct, and the data are inconsistent across many languages. This led Cati to look at Indonesian ellipsis to see whether she could find patterns in the data that could help further the discussion of this topic in the linguistics community.
Because Cati had already done a substantial amount of work on the project, my role was to familiarize myself with the existing literature and help gather data. This mostly consisted of reading influential works in the field and learning the basics of ellipsis, and of comparative deletion in particular. Another main part of my work with Cati was bouncing ideas off each other. It was helpful for both of us to have a second opinion on some of the things we found, as well as to discuss how we should approach the research.
One issue that we came across while working was the method of gathering data. Relying on a native speaker to provide all of our data raised concerns about consistency and replicability, so we turned to corpora. Corpora are large databases of naturally occurring sentences, collected as a resource for linguists to pull data from. However, looking for ellipsis in corpora is difficult, because there is no easy way to search for a “blank segment” or the assumed deletion of a segment. Because of this problem, a large portion of my work was reading other published papers on similar ellipsis-related topics and seeing how their authors searched through corpora and conducted their research. I was also looking for corpora that had the data we needed. Not all corpora contain Indonesian data, and not all of them are accessible, so part of the challenge was finding databases that fit our requirements.
This summer was enlightening in terms of how to conduct research. At first I thought that it would be rather straightforward, and that all we had to do was answer a few questions. However, it turned out to be much more complicated than that. Each step had unexpected challenges that I had to overcome effectively and in a timely manner, so that Cati still had material to work with every week. As I read papers, I had to ensure that the material I summarized for Cati was not only concise, but also relevant to what we were researching. Learning how to separate what was important from what was not was critical in the early stages of this project. I also learned how to use corpora effectively, as well as how to organize the data in a way that is useful for our research questions. This was an entirely new skill set for me, and I am excited to use these skills in future endeavors. While we haven’t been able to fully process the data yet, this summer has been critical in setting up future research. Overall, it was an incredible experience, and the skills I learned will definitely help me in whatever I decide to do next.