Machine learning in Madrid
Monday, 31 May 2021, 12:00-13:00
Speaker: David Arroyo (CSIC)
Title: Trustworthy, Reliable and Engaging Scientific Communication Approaches (TRESCA)
On the one hand, decentralised systems that do not rely on the authority of a Trusted Third Party pose the challenge of determining whether a piece of information is authentic. On the other hand, people consume more news and information from decentralised sources, such as social networks or messaging apps, than from centralised media such as newspapers or national television channels. The decentralisation and multiplication of types and sources of information erode our ability to distinguish accurate from inaccurate information. Traditionally, information quality and reliability were established based on the credibility and reputation of the source. On social media platforms and messaging apps such as WhatsApp or Telegram, attribution cannot be properly established. As a result, curating news data along the entire data life cycle becomes a difficult task. Clearly categorising news on the continuum from unintentionally inaccurate to intentionally misleading information remains problematic. Poor identification of non-genuine information is a serious issue that prevents the effective containment of false information.
In this seminar we will explore the design implications of building a misinformation widget that guides users in assessing the trustworthiness of various sources of information. A critical aspect of the widget's design is the identification of the best news classification tools and methodologies. To achieve this objective, one option is to rely on fact-checking platforms and human experts to obtain feedback, which can be extended by leveraging the so-called wisdom of crowds, so that news curation results from a collaborative effort among users and experts. Expert-based systems are accurate but costly and not scalable, while crowd-based systems can be biased by herding behaviour. To overcome these limitations, we can consider developing automatic detection techniques based on Natural Language Processing (NLP) and more advanced Machine Learning (ML) techniques. Nonetheless, selecting adequate models, and datasets for tuning and training them, is itself a challenge. Thus, we explore the option of adopting a so-called “human on the loop” approach, which integrates expert knowledge of fact checking with the automatic detection of fake news and misinformation. Specifically, we propose a methodology that leverages fact-checking platforms to label datasets and to validate the performance of NLP and ML tools for the automatic classification of information.