Visitors to a tourist destination often record their opinions and experiences in online and social media. Other users express an intention or desire to visit a place for leisure. Such posts may recommend a destination or a tourism activity, or they may be negative and dissuasive. The content collected and analyzed may concern a geographical area, a tourism business, or both. Detecting and analyzing such views as soon as they appear provides valuable knowledge to a destination management entity that has undertaken the promotion of a particular area. However, collecting unstructured data and extracting knowledge from it on a large scale is expensive and close to impossible with the currently available search and processing tools, which rely on machine learning and statistical natural language processing. These tools do not integrate linguistic analysis capabilities specialized for tourism, and they cannot scale up to huge volumes of public internet data. Consequently, existing emotion and opinion analysis mechanisms do not extract accurate knowledge, do not evolve, and do not readily adapt to thematic requirements, to the concepts and particularities of individual languages, or to geographic data.
The proposed project develops natural language analysis and processing algorithms tailored to the needs of tourism, in particular the need to analyze and explain the views that visitors express online. To this end, the project incorporates modern deep learning methods into a word and term detection platform in order to identify meanings and concepts and, ultimately, to achieve a higher degree of accuracy in the analysis of views and their causes in tourism-related texts and articles.
In addition to more accurate analysis of views, it is equally important to be able to process large volumes of such data. Therefore, the project aims to adapt algorithm execution to advanced scale-out mechanisms based on Spark’s architecture in all phases of processing: data transformation, training, and use of the computational model. The architecture will use distributed cluster memory to load the data and apply a variety of operators to extract the attributes that the computational model will use. For model training, the project will utilize optimized implementations of deep learning algorithms. The platform also includes the user applications, which focus on visualizing the analysis and on a practical notification mechanism that serves the usage scenarios of tourism professionals.
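The three processing phases named above (data transformation, training, and use of the model) can be illustrated with a minimal single-machine sketch. It is not the project's implementation. Plain Python lists stand in for Spark's distributed partitions, and a simple smoothed log-odds word-weight model is swapped in for the deep learning model. The sample posts and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from functools import reduce

# Hypothetical labeled posts (1 = positive, 0 = negative); not project data.
LABELED = [
    ("The beach was clean and the staff were friendly", 1),
    ("Wonderful old town, friendly people, great food", 1),
    ("Great views and a clean, quiet hotel", 1),
    ("Dirty room and rude staff, terrible service", 0),
    ("The ferry was late and the port was dirty", 0),
    ("Terrible food, rude waiter, overpriced hotel", 0),
]


def partition(items, n):
    """Split items round-robin into n lists, standing in for cluster partitions."""
    return [items[i::n] for i in range(n)]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def transform(part):
    """Phase 1: turn each (text, label) pair into (token counts, label)."""
    return [(Counter(tokenize(text)), label) for text, label in part]


def partial_counts(rows):
    """Accumulate per-class token counts for one partition."""
    pos, neg = Counter(), Counter()
    for counts, label in rows:
        (pos if label == 1 else neg).update(counts)
    return pos, neg


def merge(a, b):
    """Combine the partial counts of two partitions."""
    return a[0] + b[0], a[1] + b[1]


def train(parts):
    """Phase 2: merge partial counts, then derive add-one smoothed log-odds weights."""
    pos, neg = reduce(merge, (partial_counts(transform(p)) for p in parts))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }


def predict(weights, text):
    """Phase 3: score a new text; tokens unseen in training are ignored."""
    score = sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return "positive" if score > 0 else "negative"


weights = train(partition(LABELED, 3))
print(predict(weights, "friendly staff and great food"))
print(predict(weights, "rude staff and dirty room"))
```

The per-partition counting followed by a merge step mirrors the map-and-reduce pattern that a scale-out engine applies across cluster nodes. In a real deployment, the transformation and counting functions would be expressed as distributed operators rather than list comprehensions.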
Pilot implementation and demonstration of the integrated platform at selected destinations will allow the platform to be evaluated on real-world data and under real-world conditions, against performance criteria and user and operational indicators for usability and accuracy of the analysis. This will demonstrate the benefits to tourism and its stakeholders in a convincing way. The proposed system will be valuable to the tourism destinations that use it, because the analysis of views can be used proactively: the immediate discovery of negative opinions signals the need for timely corrective actions, while knowledge of positive opinions can lead to effective and targeted promotional actions. Systematic analysis of these data is necessary to provide a more complete picture of visitors' perceptions and trends, so that destinations can take immediate decisions and react to events and views expressed in social media.