Showing 1–35 of 35 results for author: Bizarro, P

Searching in archive cs.
  1. arXiv:2403.06906

    cs.LG cs.AI

    Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints

    Authors: Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Javier Liébana, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors have different costs; ii) requiring c…

    Submitted 21 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  2. arXiv:2401.08534

    cs.LG cs.AI cs.HC

    DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

    Authors: Ricardo Moreira, Jacopo Bono, Mário Cardoso, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predi…

    Submitted 26 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at Conference on Causal Learning and Reasoning (CLeaR 2024, https://www.cclear.cc/2024). To be published in the Proceedings of Machine Learning Research (PMLR)

  3. arXiv:2312.13218

    cs.LG cs.AI

    FiFAR: A Fraud Detection Dataset for Learning to Defer

    Authors: Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud det…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset

  4. arXiv:2307.15677

    cs.LG cs.CR

    Adversarial training for tabular data with attack propagation

    Authors: Tiago Leon Melo, João Bravo, Marco O. P. Sampaio, Paolo Romano, Hugo Ferreira, João Tiago Ascensão, Pedro Bizarro

    Abstract: Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the workload of system maintai…

    Submitted 28 July, 2023; originally announced July 2023.

  5. arXiv:2307.13787

    cs.LG cs.CR

    The GANfather: Controllable generation of malicious activity to improve defence systems

    Authors: Ricardo Ribeiro Pereira, Jacopo Bono, João Tiago Ascensão, David Aparício, Pedro Ribeiro, Pedro Bizarro

    Abstract: Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice, this can lead to low detection rates and high false positive rates, which characterise, for example, anti-money laundering systems. In fact, it is estimated that 1.7–4 trillion euros are laundered annually and…

    Submitted 25 July, 2023; originally announced July 2023.

  6. arXiv:2307.08433

    cs.LG

    From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs

    Authors: Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro, Pedro Bizarro

    Abstract: Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin…

    Submitted 16 February, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: 9 pages, 5 figures, 7 tables

  7. arXiv:2303.16963

    cs.LG cs.CY

    Fairness-Aware Data Valuation for Supervised Learning

    Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework tha…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 Workshop Trustworthy ML

  8. arXiv:2211.13358

    cs.LG

    Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

    Authors: Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro

    Abstract: Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase in publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data, which is prevalent in many high-stakes domains, has been lagging behind. To bridge this…

    Submitted 28 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022. https://openreview.net/forum?id=UrAYT2QwOX8

  9. arXiv:2210.14360

    cs.LG cs.AI

    LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering

    Authors: Mário Cardoso, Pedro Saleiro, Pedro Bizarro

    Abstract: Anti-money laundering (AML) regulations mandate financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these syste…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM International Conference on AI in Finance 2022 (ICAIF'22)

  10. arXiv:2209.07850

    cs.LG cs.AI cs.CY

    FairGBM: Gradient Boosting with Fairness Constraints

    Authors: André F Cruz, Catarina Belém, Sérgio Jesus, João Bravo, Pedro Saleiro, Pedro Bizarro

    Abstract: Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT or incur significant performance loss…

    Submitted 3 March, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: Published as a conference paper at ICLR 2023

  11. arXiv:2207.08640

    cs.LG

    Lightweight Automated Feature Monitoring for Data Streams

    Authors: João Conde, Ricardo Moreira, João Torres, Pedro Cardoso, Hugo R. C. Ferreira, Marco O. P. Sampaio, João Tiago Ascensão, Pedro Bizarro

    Abstract: Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real-world applications. Such systems have grown in complexity, relying heavily on high-dimensional input data and data-hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and co…

    Submitted 19 July, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: 10 pages, 5 figures. AutoML, KDD22, August 14-17, 2022, Washington, DC, US

  12. arXiv:2207.06273

    cs.LG cs.CY q-fin.ST

    Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions

    Authors: José Pombal, André F. Cruz, João Bravo, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society, limiting their access to financial…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: KDD'22 Workshop on Machine Learning in Finance

  13. arXiv:2206.13503

    cs.LG cs.HC

    On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

    Authors: Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani

    Abstract: Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in their design, resulting in limited conclusions of methods' real-world utility. In this work, we seek to bridge this gap by conducting a study that evaluates thre…

    Submitted 21 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

  14. arXiv:2206.13202

    cs.LG cs.AI cs.HC

    Human-AI Collaboration in Decision-Making: Beyond Learning to Defer

    Authors: Diogo Leitão, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements…

    Submitted 13 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: ICML 2022 Workshop on Human-Machine Collaboration and Teaming

  15. arXiv:2206.13183

    cs.LG

    Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction

    Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases o…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: ICML 2022 Workshop on Responsible Decision Making in Dynamic Environments

  16. arXiv:2205.03601

    cs.LG cs.AI

    ConceptDistil: Model-Agnostic Distillation of Concept Explanations

    Authors: João Bento Sousa, Ricardo Moreira, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

    Abstract: Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop. Previous work has focused on providing concepts for specific models (e.g., neural networks) or data types (e.g., images), either by trying to extract concepts from an already trained network or by training self-explainable models through multi-task learning. In this work, we propose ConceptDis…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: ICLR 2022 PAIR2Struct Workshop

  17. arXiv:2204.14025

    cs.LG cs.HC

    Data+Shift: Supporting visual investigation of data distribution shifts by data scientists

    Authors: João Palmeiro, Beatriz Malveiro, Rita Costa, David Polido, Ricardo Moreira, Pedro Bizarro

    Abstract: Machine learning on data streams is increasingly present in multiple domains. However, data distribution shift often occurs and can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+S…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: 5 pages, 3 figures, short paper accepted at EuroVis 2022

  18. arXiv:2112.07508

    cs.LG

    Anti-Money Laundering Alert Optimization Using Machine Learning with Graphs

    Authors: Ahmad Naser Eddin, Jacopo Bono, David Aparício, David Polido, João Tiago Ascensão, Pedro Bizarro, Pedro Ribeiro

    Abstract: Money laundering is a global problem that concerns legitimizing proceeds from serious felonies (1.7–4 trillion euros annually) such as drug dealing, human trafficking, or corruption. The anti-money laundering systems deployed by financial institutions typically comprise rules aligned with regulatory frameworks. Human investigators review the alerts and report suspicious cases. Such systems suffer…

    Submitted 17 June, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: 8 pages, 5 figures

    MSC Class: I.2.1; J.4

  19. arXiv:2108.09200

    cs.SI

    GUDIE: a flexible, user-defined method to extract subgraphs of interest from large graphs

    Authors: Maria Inês Silva, David Aparício, Beatriz Malveiro, João Tiago Ascensão, Pedro Bizarro

    Abstract: Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: 16 pages, 8 figures, accepted at GEM2021

  20. arXiv:2108.04494

    cs.SI

    Finding NeMo: Fishing in banking networks using network motifs

    Authors: Xavier Fontes, David Aparício, Maria Inês Silva, Beatriz Malveiro, João Tiago Ascensão, Pedro Bizarro

    Abstract: Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discover new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a no…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 6 pages, 6 figures, accepted at SEAData 2021

  21. arXiv:2107.07724

    cs.LG stat.ML

    Active learning for imbalanced data under cold start

    Authors: Ricardo Barata, Miguel Leite, Ricardo Pacheco, Marco O. P. Sampaio, João Tiago Ascensão, Pedro Bizarro

    Abstract: Modern systems that rely on Machine Learning (ML) for predictive modelling may suffer from the cold-start problem: supervised models work well but, initially, there are no labels, which are costly or slow to obtain. This problem is even worse in imbalanced data scenarios, where labels of the positive class take longer to accumulate. We propose an Active Learning (AL) system for datasets with orde…

    Submitted 22 October, 2021; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 9 pages, 6 figures, 2 tables

    Journal ref: ACM International Conference on AI in Finance, Nov 2021

  22. arXiv:2106.12626

    cs.DC

    Railgun: managing large streaming windows under MAD requirements

    Authors: Ana Sofia Gomes, João Oliveirinha, Pedro Cardoso, Pedro Bizarro

    Abstract: Some mission-critical systems, e.g., fraud detection, require accurate, real-time metrics over long sliding time windows on applications that demand high throughput and low latencies. As these applications need to run 'forever' and cope with large, spiky data loads, they further need to be run in a distributed setting. We are unaware of any streaming system that provides all those properties. I…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:2009.00361

  23. arXiv:2104.12459

    cs.LG cs.AI

    Weakly Supervised Multi-task Learning for Concept-based Explainability

    Authors: Catarina Belém, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

    Abstract: In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations based on model features. To obtain faithful concept-based explanations, we leverage multi-task learning to train a neural network that jointly learns to predict…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2021 Workshop on Weakly Supervised Learning (WeaSuL)

  24. Promoting Fairness through Hyperparameter Optimization

    Authors: André F. Cruz, Pedro Saleiro, Catarina Belém, Carlos Soares, Pedro Bizarro

    Abstract: Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development or deployment costs. This work explores the unfairness that emerges when optimizing ML models solely for predictive p…

    Submitted 11 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2010.03665

    Journal ref: 2021 IEEE International Conference on Data Mining (ICDM)

  25. arXiv:2102.05373

    cs.LG cs.SI

    GuiltyWalker: Distance to illicit nodes in the Bitcoin network

    Authors: Catarina Oliveira, João Torres, Maria Inês Silva, David Aparício, João Tiago Ascensão, Pedro Bizarro

    Abstract: Money laundering is a global phenomenon with wide-reaching social and economic consequences. Cryptocurrencies are particularly susceptible due to the lack of control by authorities and their anonymity. Thus, it is important to develop new techniques to detect and prevent illicit cryptocurrency transactions. In our work, we propose new features based on the structure of the graph and past labels to…

    Submitted 21 July, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: 5 pages, 3 figures

  26. How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations

    Authors: Sérgio Jesus, Catarina Belém, Vladimir Balayan, João Bento, Pedro Saleiro, Pedro Bizarro, João Gama

    Abstract: There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hur…

    Submitted 22 January, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at FAccT'21, the ACM Conference on Fairness, Accountability, and Transparency

  27. arXiv:2012.01932

    cs.LG cs.AI

    Teaching the Machine to Explain Itself using Domain Knowledge

    Authors: Vladimir Balayan, Pedro Saleiro, Catarina Belém, Ludwig Krippahl, Pedro Bizarro

    Abstract: Machine Learning (ML) has been increasingly used to help humans make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods but there is sti…

    Submitted 27 November, 2020; originally announced December 2020.

    ACM Class: I.2

  28. TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

    Authors: João Bento, Pedro Saleiro, André F. Cruz, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may b…

    Submitted 26 June, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: Accepted at KDD 2021

  29. arXiv:2010.03665

    cs.LG cs.AI

    A Bandit-Based Algorithm for Fairness-Aware Hyperparameter Optimization

    Authors: André F. Cruz, Pedro Saleiro, Catarina Belém, Carlos Soares, Pedro Bizarro

    Abstract: Considerable research effort has been guided towards algorithmic fairness but there is still no major breakthrough. In practice, an exhaustive search over all possible techniques and hyperparameters is needed to find optimal fairness-accuracy trade-offs. Hence, coupled with the lack of tools for ML practitioners, real-world adoption of bias reduction methods is still scarce. To address this, we pr…

    Submitted 22 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

  30. arXiv:2009.11751

    cs.CR cs.LG stat.ML

    BreachRadar: Automatic Detection of Points-of-Compromise

    Authors: Miguel Araujo, Miguel Almeida, Jaime Ferreira, Luis Silva, Pedro Bizarro

    Abstract: Bank transaction fraud results in over $13B annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure.…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: 9 pages, 10 figures, published in SIAM's 2017 International Conference on Data Mining (SDM17)

  31. arXiv:2009.00361

    cs.DC cs.DB

    Railgun: streaming windows for mission critical systems

    Authors: João Oliveirinha, Ana Sofia Gomes, Pedro Cardoso, Pedro Bizarro

    Abstract: Some mission-critical systems, such as fraud detection, require accurate, real-time metrics over long time windows on applications that demand high throughputs and low latencies. As these applications need to run "forever" and cope with large and spiky data loads, they further need to be run in a distributed setting. Unsurprisingly, we are unaware of any distributed streaming system that provides…

    Submitted 10 November, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: Previously submitted to CIDR 2021

    ACM Class: C.3; H.3.4; H.2.4

  32. arXiv:2005.14635

    cs.LG stat.ML

    Machine learning methods to detect money laundering in the Bitcoin blockchain in the presence of label scarcity

    Authors: Joana Lorenz, Maria Inês Silva, David Aparício, João Tiago Ascensão, Pedro Bizarro

    Abstract: Every year, criminals launder billions of dollars acquired from serious felonies (e.g., terrorism, drug smuggling, or human trafficking) harming countless people and economies. Cryptocurrencies, in particular, have developed as a haven for money laundering activity. Machine Learning can be used to detect these illicit patterns. However, labels are so scarce that traditional supervised algorithms a…

    Submitted 5 October, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: 8 pages, 7 figures

  33. arXiv:2002.06075

    cs.LG cs.AI cs.DB stat.ML

    ARMS: Automated rules management system for fraud detection

    Authors: David Aparício, Ricardo Barata, João Bravo, João Tiago Ascensão, Pedro Bizarro

    Abstract: Fraud detection is essential in financial services, with the potential of greatly reducing criminal activities and saving considerable resources for businesses and customers. We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time. Modern fraud detection systems consist of a machine learning model and rules defined by h…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 11 pages, 12 figures, submitted to KDD '20 Applied Data Science Track

  34. arXiv:2002.05988

    cs.LG cs.CR stat.ML

    Interleaved Sequence RNNs for Fraud Detection

    Authors: Bernardo Branco, Pedro Abreu, Ana Sofia Gomes, Mariana S. C. Almeida, João Tiago Ascensão, Pedro Bizarro

    Abstract: Payment card fraud causes multibillion-dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies. In this work, we do not require thos…

    Submitted 17 June, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

    Comments: 9 pages, 4 figures, to appear in SIGKDD'20 Industry Track

  35. arXiv:1908.04240

    cs.LG stat.ML

    Automatic Model Monitoring for Data Streams

    Authors: Fábio Pinto, Marco O. P. Sampaio, Pedro Bizarro

    Abstract: Detecting concept drift is a well-known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a…

    Submitted 12 August, 2019; originally announced August 2019.

    Comments: 9 pages, 9 figures, 2 tables

    Journal ref: KDD-ADF-2019