Tandem Inference: An Out-of-Core Streaming Algorithm for Very Large-Scale Relational Inference
Statistical relational learning (SRL) frameworks allow users to create large, complex graphical models using a compact, rule-based representation. However, these models can quickly become prohibitively large and not fit into machine memory. In this work we address this issue by introducing a novel technique called tandem inference (ti). The primary idea of ti is to combine grounding and inference such that both processes happen in tandem. ti uses an out-of-core streaming approach to overcome memory limitations. Even when memory is not an issue, we show that our proposed approach is able to do inference faster while using less memory than existing approaches. To show the effectiveness of ti, we use a popular SRL framework called Probabilistic Soft Logic (PSL). We implement ti for PSL by proposing a gradient-based inference engine and a streaming approach to grounding. We show that we are able to run an SRL model with over 1B cliques in under nine hours and using only 10 GB of RAM; previous approaches required more than 800 GB for this model and are infeasible on common hardware. To the best of our knowledge, this is the largest SRL model ever run.