CrossMigration is currently putting the finishing touches to the Migration Research Hub, the new online place to find all migration research under one roof, where the open-source migration research taxonomy will come to life.
Researchers have been working hard to refine this taxonomy, to make it easier to use and relevant to all researchers in the field. Since the Migration Research Hub seeks to foster collaboration across disciplines, it should always try to lead by example. And nowhere is this drive towards collaboration more visible than in translating the taxonomy from an academic exercise into a growing database, a success made possible only through close cooperation between researchers and developers.
Youngminds, the bureau that is developing and refining the database, talks about the challenges and innovations that the project, at the intersection of research and technology, has brought with it, and what all participants have learned along the way.
Defining the problem
Bogdan Taut: To understand why this project is so innovative, we first have to understand what it is. The Migration Research Hub is a content project at the core, and the challenge is that we have to manage large amounts of content and data. But not only that, the data needs to be collected automatically and from many sources that have different ways of presenting the information to users. As it were, we needed an octopus with tentacles everywhere.
Dragos Ionescu: For that, we needed to get many different types of Application Programming Interfaces (APIs) to speak to one another, so that we could look through different types of sources. And this is where we needed a new approach. What we have achieved is a common format that takes data from pages, finds a structure and then brings it to our database in a coherent manner that can be analyzed and consumed.
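To make the idea of a common format concrete, here is a minimal sketch in Python. It is not the Hub's actual code: the Record fields, the two source layouts and the function names are all hypothetical, chosen only to show how payloads that arrive in different shapes can be mapped onto one structure before they reach a database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical common format shared by every source."""
    title: str
    authors: Tuple[str, ...]
    year: Optional[int]
    source: str


def from_source_a(payload: dict) -> Record:
    # Assumed layout A: flat keys, authors in one ";"-separated string,
    # year delivered as text.
    authors = tuple(a.strip() for a in payload["authors"].split(";") if a.strip())
    year = int(payload["year"]) if payload.get("year") else None
    return Record(payload["title"].strip(), authors, year, "source_a")


def from_source_b(payload: dict) -> Record:
    # Assumed layout B: nested metadata, authors as a list of objects,
    # year already numeric.
    meta = payload["metadata"]
    authors = tuple(p["full_name"] for p in meta.get("contributors", []))
    year = meta.get("published", {}).get("year")
    return Record(meta["name"].strip(), authors, year, "source_b")
```

Once both sources produce the same Record type, everything downstream (storage, search, categorization) only needs to understand one shape, however many sources are added later.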
What is an API?
An Application Programming Interface (API) allows separate systems to communicate in a structured manner, so that data coming from different programs can be interpreted in the same way.
How did you define the problem at the start?
DI: While platforms that collect data in this manner already exist, they are mostly niche enterprises for, say, programmers. We’d never really worked with or come across platforms specifically for research that had this large and growing definition.
BT: When we started the project, we didn’t know that it was going to grow as it did. We understood the general idea behind how the researchers wanted to use the taxonomy: as a ‘living’ thing that could grow and be modified in real time.
DI: But when we started, we didn’t really know how it was going to translate to the finer technical points. We understood that we needed to get the data, and that we had to systematize and filter it, so that the taxonomy could become a reality. The product is our technical interpretation of all of these wishes.
From database to taxonomy
DI: First we made sure that we could extract all the data, and collect it in one place. Then the real fun started: the technical expression of the taxonomy looks into each piece of data and extracts the most relevant information to categorize it. We also created an advanced search system, the infrastructure that expresses each category in the taxonomy as a search for a specific topic.
BT: We needed to build in the logic necessary for all of these searches to intersect, because a piece of research can belong to different topics in the taxonomy. And it needed to be automated, because it would be physically impossible for anyone to manually tag the almost one million items that we have against the 450 topics that are currently in the database. Especially since the database will continue to grow as it is used and improved!
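The interview does not say how items are matched to topics, so the following Python sketch rests on a stated assumption: each topic is defined by a small set of keywords. The topic names, keywords and function names are invented for illustration. It shows the two properties described above: one item can land in several topics, and topic searches can be intersected.

```python
import re

# Hypothetical topic rules: topic name -> keywords that signal it.
TOPIC_KEYWORDS = {
    "labour migration": {"labour", "workers", "employment"},
    "migration policy": {"policy", "legislation", "governance"},
    "integration": {"integration", "inclusion"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_item(text, rules=TOPIC_KEYWORDS):
    """Return every topic whose keywords overlap with the text."""
    words = tokenize(text)
    return {topic for topic, keywords in rules.items() if words & keywords}


def build_index(items, rules=TOPIC_KEYWORDS):
    """Map each topic to the ids of the items tagged with it."""
    index = {topic: set() for topic in rules}
    for item_id, text in items.items():
        for topic in tag_item(text, rules):
            index[topic].add(item_id)
    return index


def search(index, *topics):
    """Return the ids of items that belong to all of the given topics."""
    sets = [index.get(topic, set()) for topic in topics]
    return set.intersection(*sets) if sets else set()
```

With an index built this way, asking for two topics at once is a set intersection, and adding a new topic means adding one rule rather than re-tagging items by hand.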
DI: To do this we used crawlers, which are little pieces of code that extract content from a source. But sources have different formats, so we needed to create, test and run hundreds of them. Since scientists working on refining the taxonomy continue to identify sources and topics that need to be included, this is a work in progress. Again, the system needs to be able to accommodate new sources and different types of data, and we are confident that it can.
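As a rough illustration of why each source format needs its own crawler, here is a small Python sketch using only the standard library. The two source kinds, the choice of h1 elements as titles and all names are assumptions made for the example. The real crawlers fetch pages over the network and are certainly more involved, while this sketch only parses text it is handed.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside <h1> elements of a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.titles[-1] += data


def crawl_html(page):
    # Assumed source kind: an HTML page with one <h1> per item.
    parser = TitleExtractor()
    parser.feed(page)
    parser.close()
    return [t.strip() for t in parser.titles if t.strip()]


def crawl_lines(page):
    # Assumed source kind: plain text with one title per line.
    return [line.strip() for line in page.splitlines() if line.strip()]


# Hypothetical registry: supporting a new source format means adding an entry.
CRAWLERS = {"html_listing": crawl_html, "plain_lines": crawl_lines}


def crawl(source_kind, page):
    """Run the crawler registered for this kind of source."""
    return CRAWLERS[source_kind](page)
```

Keeping the crawlers behind one registry is one way to let the system accommodate new sources without touching the code that consumes their output.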
Research and development
BT: Working with researchers was great. It gave us a different way to understand the problem, and together we reached this solution, a new way of thinking about this project and the technical expression of a research taxonomy. We were very lucky to work with both young researchers who understood our language and our way of thinking, and more senior researchers who really trusted our expertise and that of their own teams.
DI: And as it stands, the project provides the infrastructure for their research. The researchers are still refining the categorization, but our scaffolding is secure enough already. This is great, considering that when we first started the project all we had was a very abstract idea of how things were supposed to work!
Reaching a wider audience
BT: We truly believe that it makes sense to have a specialized database for this research community, and for others. This idea goes beyond a technical solution for putting data online. What the CrossMigration team has in mind is a true community where new connections are made and reinforced through data. Migration studies has the critical mass of researchers, and the Migration Research Hub brings a powerful way to connect to the table. We are of course open to improvement and refinement but, as we said before, we have a solid scaffolding to build from.