Avoiding Plagiarism in Markov Sequence Generation

Authors

  • Alexandre Papadopoulos Sony CSL and Sorbonne Universites, UPMC Univ Paris 06
  • Pierre Roy Sony CSL
  • François Pachet Sony CSL and Sorbonne Universites, UPMC Univ Paris 06

DOI:

https://doi.org/10.1609/aaai.v28i1.9126

Keywords:

markov chains, constraint programming, music generation, automatic content generation, plagiarism, max order

Abstract

Markov processes are widely used to generate sequences that imitate a given style, using random walk. Random walk generates sequences by iteratively concatenating states to prefixes of length equal or less than the given Markov order}. However, at higher orders, Markov chains tend to replicate chunks of the corpus with a size possibly higher than the order, a primary form of plagiarism. The Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction (CSP), we introduce MaxOrder. This global constraint ensures that generated sequences do not include chunks larger than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MaxOrder constraint, holding on a sequence of size n, in O(n.T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism.

Downloads

Published

2014-06-21

How to Cite

Papadopoulos, A., Roy, P., & Pachet, F. (2014). Avoiding Plagiarism in Markov Sequence Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.9126

Issue

Section

Main Track: Search and Constraint Satisfaction