From NetsBlox to Spark (and Back Again)
Snap!Con 2021

Presented by:

Clifford Anderson

from Vanderbilt University

Clifford B. Anderson is Associate University Librarian for Research and Digital Strategy at the Vanderbilt University Library. He holds a secondary appointment as Professor of Religious Studies in the College of Arts & Science at Vanderbilt University and is affiliated faculty in the Comparative Media Analysis and Practice Joint-Ph.D. program. He was also an Adjunct Professor of Computer Science in the Department of Electrical Engineering and Computer Science in the Vanderbilt Universi...

Brian Broll

from Vanderbilt University

Brian Broll is a Research Scientist at the Institute for Software Integrated Systems at Vanderbilt University. He holds a Ph.D. from Vanderbilt University in Computer Science and a B.Sc. from Buena Vista University, majoring in mathematics education. His research interests include computer science education and model integrated computing.

Mark Schoenfield

from Vanderbilt Department of English

Specializing in Romantic literature and law-and-literature, Mark Schoenfield is interested in how text mining can assist and improve cultural analysis.

Corey Brady

from Vanderbilt University

I'm an assistant professor of the Learning Sciences at Vanderbilt, where I also co-direct the Computational Thinking and Learning Initiative. I work to create learning environments that allow me and the teachers I collaborate with to support and study students' scientific, mathematical, and computational thinking.

Volunteer Hosts
Thanks for helping with Snap!Con 2021!

Vedansh Malhotra

from UC Berkeley

This paper presents a case study of an ongoing experiment at Vanderbilt University to teach the fundamental concepts of text mining to undergraduates in an accessible and equitable way. During the Fall 2020 and Spring 2021 semesters, we taught a variety of text-mining techniques to two cohorts of students with backgrounds ranging from computer science to English. After giving students a whirlwind introduction to NetsBlox, we introduced concepts such as transformation pipelines and natural language processing within a block-based environment. We then translated those procedures into ‘big data’ contexts using code notebooks with Apache Spark. With each new concept, we returned to NetsBlox to introduce the main ideas before applying them in a production setting.

A key strategy of this project has been to introduce text-mining concepts and techniques in NetsBlox using a much smaller sample of the full corpus that featured the same data structures. Once learners had worked through a focal technique, availing themselves of the visual feedback and direct manipulation characteristic of Snap!, we transitioned them to a text-based notebook environment where they could apply their knowledge to the entire corpus of texts.
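The sample-first strategy can be illustrated with a minimal Python sketch. It is not the course's actual material: the record shape (a dictionary with `title` and `text` keys), the two cleaning steps, and the helper names `run_pipeline` and `draw_sample` are all assumptions made for illustration. The point is only that a pipeline developed against a small sample runs unchanged on the full corpus when both share the same data structure.

```python
import random
from typing import Callable, Dict, List

# Hypothetical record shape: every document is a dict with "title" and "text".
Document = Dict[str, str]
Step = Callable[[Document], Document]


def lowercase(doc: Document) -> Document:
    """Return a copy of the document with its text lowercased."""
    return {**doc, "text": doc["text"].lower()}


def strip_punctuation(doc: Document) -> Document:
    """Return a copy of the document keeping only letters, digits, and whitespace."""
    cleaned = "".join(ch for ch in doc["text"] if ch.isalnum() or ch.isspace())
    return {**doc, "text": cleaned}


def run_pipeline(docs: List[Document], steps: List[Step]) -> List[Document]:
    """Apply each step, in order, to every document."""
    result = docs
    for step in steps:
        result = [step(doc) for doc in result]
    return result


def draw_sample(docs: List[Document], k: int, seed: int = 0) -> List[Document]:
    """Draw a reproducible random sample of at most k documents."""
    rng = random.Random(seed)
    return rng.sample(docs, min(k, len(docs)))


# A toy stand-in for the full corpus (invented data).
corpus = [
    {"title": "A", "text": "Hello, World!"},
    {"title": "B", "text": "Text mining: a primer."},
    {"title": "C", "text": "Big Data?"},
]
steps = [lowercase, strip_punctuation]

# Develop and inspect the pipeline on a small sample first ...
sample_result = run_pipeline(draw_sample(corpus, 2), steps)
# ... then run the identical steps over the whole corpus.
full_result = run_pipeline(corpus, steps)
```

Because the steps only depend on the record shape, nothing in `steps` has to change between the sample run and the full run.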

NetsBlox provided an ideal ‘test bed’ for these experiments for several reasons. NetsBlox leveled the playing field for students with different levels of prior programming experience. The block-based environment enabled students with little to no background in programming to learn effectively alongside peers with significant expertise in text-based languages. NetsBlox’s remote procedure calls also made it possible to integrate industrial tools like CoreNLP into our block-based pipelines, permitting us to mimic production environments more closely. And, finally, the functional programming operators in Snap! allowed us to model ‘big data’ operations like map/reduce in a straightforward manner.
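To make the map/reduce idea concrete, here is a small word-count sketch in plain Python. It is an illustration under stated assumptions, not code from the course: the three sample strings are invented, and `count_words` and `merge_counts` are hypothetical helper names. The map step turns each document into its own word tally, and the reduce step folds those tallies into one, which is roughly the shape of computation that Snap!'s `map` and `combine` blocks express.

```python
from collections import Counter
from functools import reduce


def count_words(text: str) -> Counter:
    """Map step: tally the whitespace-separated words of one document."""
    return Counter(text.lower().split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two tallies into one."""
    return left + right


# Invented sample documents standing in for a corpus.
documents = ["the cat sat", "the dog sat", "the end"]

# Map every document to its own tally, then reduce the tallies to a total.
per_document = map(count_words, documents)
totals = reduce(merge_counts, per_document, Counter())
```

Because `merge_counts` is associative, the partial tallies could be combined in any grouping, which is the property that lets a platform like Spark distribute this kind of reduction across machines.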

As we look to develop best practices for teaching large-scale textual analysis with NetsBlox, we invite the broad Snap! community to provide feedback about our next steps. For instance, does it make sense to connect NetsBlox directly with Spark, creating a block-based interface to a big data platform? How might our methodology foster a CS+X (or X+CS) approach, especially for students of the digital humanities?

Finally, we will also showcase “hidden gems” of NetsBlox as we proceed through our presentation, including the use of cloud variables, integration with a BaseX database, and the implementation of an auto-grader.

Date: 2021 July 29 - 12:20 PDT
Duration: 20 min
Room: Room 1
Conference: Snap!Con 2021
Type: Talk