Analyzing Words and Networks with ConText

Jana Diesner (PhD) - iSchool, University of Illinois Urbana-Champaign (UIUC)

1. What is Covered in the Workshop?
What will you learn? The functioning and dynamics of real-world networks involve the continuous production, processing and flow of knowledge and information. This knowledge and information is often available as unstructured, natural language text. In this workshop, participants learn how to a) construct network data from text data and the associated metadata, and b) jointly analyze text data and network data. This makes it possible to consider two types of behavioral information together: social interactions and language use.
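To make the idea of jointly using metadata and text concrete, here is a minimal Python sketch. It is not ConText's implementation or API. It builds a directed sender-to-recipient network from message metadata (social interactions) and attaches term frequencies from the message text to each edge (language use). The sample messages, the stopword list and the function name `build_network` are hypothetical; a real pipeline would use proper tokenization, stemming and stopword handling.

```python
from collections import Counter, defaultdict

# Hypothetical sample messages: metadata (sender, recipients) plus free text.
MESSAGES = [
    {"sender": "ana", "recipients": ["ben"], "text": "Budget review for the clinic project"},
    {"sender": "ben", "recipients": ["ana", "cho"], "text": "Clinic staffing and budget update"},
    {"sender": "ana", "recipients": ["ben"], "text": "Final budget numbers"},
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "for", "and", "a", "of"}


def build_network(messages):
    """Return two dicts keyed by (sender, recipient):
    edge weights (number of messages) and edge topics (term frequencies)."""
    weights = Counter()
    topics = defaultdict(Counter)
    for msg in messages:
        # Crude whitespace tokenization; lowercase and drop stopwords.
        terms = [t for t in msg["text"].lower().split() if t not in STOPWORDS]
        for recipient in msg["recipients"]:
            edge = (msg["sender"], recipient)
            weights[edge] += 1          # who talks to whom (from metadata)
            topics[edge].update(terms)  # about what (from the text)
    return weights, topics


if __name__ == "__main__":
    weights, topics = build_network(MESSAGES)
    for (src, dst), w in sorted(weights.items()):
        top = [term for term, _ in topics[(src, dst)].most_common(2)]
        print(f"{src} -> {dst}: {w} message(s), top terms {top}")
```

Keeping weights and topics as separate structures keyed by the same edge mirrors the two data types: the edge list can be analyzed as a social network, while the per-edge term counts answer "about what".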

Workshop participants are introduced to fundamental theories, concepts and methods for these purposes. Using text analysis for network analysis has been useful in answering questions such as: Who is talking to whom, and about what? What perceptions or mental models do social agents have of certain themes? How do opinions evolve and diffuse in society and online? Throughout this workshop, we discuss practical applications of these techniques from various domains.
The focus of this workshop is on teaching practical, hands-on skills for using text analysis methods in an informed, systematic and efficient fashion. We use the ConText software. Our goal is to equip participants with the skills and tools needed to apply the covered techniques to their own research questions and text data sets. Participants will apply automated text mining and natural language processing techniques, including:

Collecting various types of text data, including social media data, news wire data, legal documents and corporate information.
Creating curated corpora from the collected data, deduplicating documents, and managing text data and associated metadata in automatically populated databases that support search and retrieval as well as data mining.
Summarization techniques such as topic modeling, term weighting techniques and corpus statistics.
Sentiment analysis, also known as opinion mining.
Visualization of text mining results.
Pre-processing techniques such as stemming and part-of-speech tagging.
Entity detection, i.e. detecting and categorizing terms and term sequences that represent instances of relevant node classes in one-mode and multi-mode network analysis, e.g. agents, organizations, locations and knowledge.
Relation extraction, i.e. linking identified entities into edges based on various criteria, including proximity, syntax and semantics. The extracted networks can be imported into standard SNA tools, including Gephi, ORA, Pajek, UCINET and visone.
Extracting semantic networks and fusing them with social networks.
Analyzing the extracted networks, i.e. conducting (social) network analysis, including network visualization, computing metrics on the node, group and graph level, and clustering techniques. This last step will be kept to a minimum, since other workshops at Sunbelt cover this topic in great detail. For this step, we focus on interpreting the results in the context of using text data as a source for constructing or enhancing network data.
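The proximity criterion for linking terms into edges, and the export step for SNA tools, can be illustrated with a short plain-Python sketch. This is one assumption-laden way to do it, not how ConText implements relation extraction: every non-stopword token is treated as a node, distinct terms that co-occur inside a sliding window of `window` consecutive tokens are linked, and the weighted edge list is written as CSV. The helper names, stopword list, sample sentence and window size are hypothetical. The Source, Target, Type and Weight headers follow the column names Gephi's spreadsheet import is designed to recognize, but check the import dialog of your Gephi version.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would also stem or lemmatize.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "with", "on"}


def tokenize(text):
    """Lowercase, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_edges(text, window=3):
    """Link every pair of distinct terms that fall inside a sliding window
    of `window` consecutive tokens; the weight counts such co-occurrences."""
    tokens = tokenize(text)
    edges = Counter()
    for i in range(len(tokens)):
        # Positions i+1 .. i+window-1 share a window with position i.
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = sorted((tokens[i], tokens[j]))  # undirected: canonical order
            if a != b:
                edges[(a, b)] += 1
    return edges


def to_gephi_csv(edges):
    """Serialize edges as CSV with Source, Target, Type and Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(edges.items()):
        writer.writerow([a, b, "Undirected", w])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "Clinic staff discussed the budget. The budget covers clinic staff salaries."
    edges = cooccurrence_edges(sample, window=3)
    print(to_gephi_csv(edges))
```

With `window=3`, each token is linked to the next two tokens, so in the sample text "clinic" and "staff" end up connected twice. Smaller windows yield sparser, more local networks; larger windows connect more distant terms. The workshop also covers syntax- and semantics-based linking, which this sketch does not attempt.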

Going from texts to networks involves principles and strategies from computer science that apply not only to the task at hand but to a wide range of problems. These principles and strategies are referred to as “Computational Thinking”, a basic skill like reading, writing and arithmetic that is crucial for solving problems and understanding human behavior across fields (Wing 2006). In this workshop, participants are introduced to Computational Thinking and practice applying this way of thinking.

2. Who Should Attend?
This is an interdisciplinary and interactive workshop designed to benefit from the participation of attendees from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery focuses on building an understanding of fundamental concepts and on gaining hands-on experience with text analysis and network analysis methods and tools.

3. What to Bring to the Workshop?
Software: We will use ConText and Gephi for this workshop. Prior to the workshop, I will send confirmed participants an email with links and installation instructions for these tools. You are invited to bring a laptop to the workshop. Participants who cannot bring a laptop will still fully benefit from the workshop, as I will project all live walk-through exercises on screen. At the workshop, I will provide a tutorial document and further learning resources.
Data: Participants can work with the sample data that we provide and/or bring their own data.
Readings: Prior to the workshop, I recommend reading the following overviews of the concepts and methods covered in the workshop:

All further readings are optional:

Introduction to information extraction/text mining: McCallum, A. (2005). Information extraction: Distilling structured data from unstructured text. ACM Queue, 3(9), 48-57.
Introduction to social network analysis: Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California, Riverside.
Introduction to Computational Thinking: Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33-35.

4. Information About the Instructor
Jana Diesner is an Assistant Professor at the iSchool (a.k.a. Graduate School of Library and Information Science) at the University of Illinois Urbana-Champaign (UIUC), and an affiliate of the Department of Computer Science (CS). She received her PhD from the School of Computer Science at Carnegie Mellon University. Jana’s work lies at the nexus of social network analysis, natural language processing and machine learning. With her team, Jana develops and advances computational methods and technologies that help people measure and understand the interplay and co-evolution of information and socio-technical networks. She applies these computational solutions in various application contexts, currently mainly in medical informatics and media impact assessment. For more information about Jana’s work, see http://people.lis.illinois.edu/~jdiesner/.

5. Questions?
Contact Jana with any questions about the workshop at jdiesner@illinois.edu.

6. Slides
Download the slides from the workshop.