Predictive Coding in a Regulatory Investigation

Scope of Services

Clutch was called upon to help our client, a large financial services institution, comply with investigation requests from regulators in the US and Europe. Data from the matter spanned three jurisdictions.

The investigation focused on the custodians responsible for developing an index fund’s strategy and its subsequent modifications for other jurisdictions (the product was developed for two European countries and later modified for the US). It also involved analyzing data from custodians responsible for marketing and selling the fund, as well as trading behavior linked to the index’s fluctuations. This required analysis of a wide range of custodians’ data, amounting to more than 5,000 GB. The initial volume to be analyzed was over 10 million documents hosted on multiple client databases. Clutch’s mandate was to use technology to identify relevant information efficiently while making every effort to limit costs.

Context and Description

The bank had established an index fund managing a number of assets based on an internally developed trading strategy. This fund had then been marketed and sold as shares to clients. The regulatory investigation emerged from concerns that the methods of calculating value created by the fund’s developers were inconsistent with marketing material such as offering circulars. There were additional concerns pertaining to specific misconduct related to traders who had foreknowledge of the fund’s strategy.

Clutch was asked by the client to develop key facts from the matter as well as to assist the client in meeting the regulator’s deadlines for production.

Particular issues

  • Work had to be completed within strict deadlines and to a high degree of accuracy to avoid regulatory penalties.
  • The total amount of data, consisting of over 10 million documents, required a significant upfront processing and analysis effort.
  • The data resided in multiple client databases. Clutch had to identify, map, and centralize the data before reducing and prioritising it.
  • The data consisted of different media types (email, structured data, chats). The various activities had to be reconstructed and joined into a single chronology irrespective of type.


We determined that the best way to identify relevant information efficiently while reducing costs was to develop a defensible predictive model, using technology-assisted review (TAR), to identify key documents for investigative and production purposes with minimal linear review.

Our subject matter experts reviewed sample and seed sets drawn from the data population to build the predictive model. The model was built by combining targeted Boolean searches, relationship analysers, and concept and lexicon searches under a unified strategy: to rapidly identify critical documents while defensibly excluding non-relevant documents. In addition to these targeted searches, we layered further analytics tools, such as near-duplicate detection, over the predictive model.

We ran the model over the entire dataset. We then began by analysing the results in the 90-100 tier, the tier with the highest predicted probability of relevance among those created by the predictive model. This yielded approximately 150,000 documents to use as the base review population.
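The tiering step described above can be illustrated with a minimal sketch. It assumes each document carries a relevance score between 0 and 100 and that tiers are 10-point bands; the function names, band width, and document IDs are hypothetical and are not taken from the tooling actually used on the matter.

```python
from collections import defaultdict


def tier_label(score: float) -> str:
    """Map a 0-100 relevance score to a 10-point tier label such as '90-100'."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # A score of exactly 100 belongs to the top tier, so cap the lower bound at 90.
    low = min(int(score // 10) * 10, 90)
    return f"{low}-{low + 10}"


def group_by_tier(scores: dict[str, float]) -> dict[str, list[str]]:
    """Group document IDs by the tier their score falls into."""
    tiers: dict[str, list[str]] = defaultdict(list)
    for doc_id, score in scores.items():
        tiers[tier_label(score)].append(doc_id)
    return dict(tiers)


# Hypothetical scores for illustration only.
scores = {
    "DOC-001": 97.2,
    "DOC-002": 41.0,
    "DOC-003": 90.0,
    "DOC-004": 100.0,
    "DOC-005": 89.9,
}

tiers = group_by_tier(scores)
# The highest-probability tier becomes the base review population.
top_tier = tiers.get("90-100", [])
print(top_tier)
```

In this sketch, only the documents in the top band are routed to reviewers first; lower bands remain available for sampling or later review.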

Outcome in cost savings and time

  • As a result of leveraging the predictive model, the review lasted only 4 weeks and used only a fraction of the resources that would have been required without predictive analysis.
  • The client was able to meet all regulator requests. The regulators repeatedly expanded the scope of the inquiry, but Clutch was able to seamlessly incorporate the additional custodians into the investigation. The material delivered to the client was assessed by the client and outside counsel prior to production to the regulator and deemed to be of the highest quality.
  • Had the review been conducted on the basis of search terms, assuming a 25% search term hit rate, the population for review would have been 3.4 million documents. Instead, by leveraging the predictive model, 150,000 documents were reviewed, roughly 95% fewer.
  • As a direct result of the deployment of predictive coding and other analytical tools, we were able to achieve cost savings of up to $3m.
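The reduction in review volume quoted above can be checked with a short calculation. This is a sketch using only the figures stated in the case study (3.4 million documents under search terms, 150,000 under the predictive model, and an assumed 25% hit rate); the function and variable names are hypothetical, and the implied corpus size is derived from that assumption rather than reported by the source.

```python
def review_reduction(baseline_docs: int, reviewed_docs: int) -> float:
    """Return the fraction of the baseline review population that was avoided."""
    if baseline_docs <= 0:
        raise ValueError("baseline_docs must be positive")
    return 1 - reviewed_docs / baseline_docs


search_term_population = 3_400_000  # stated search-term review population
predictive_population = 150_000     # stated predictive-model review population
assumed_hit_rate = 0.25             # stated assumption for the search-term scenario

reduction = review_reduction(search_term_population, predictive_population)
# Corpus size implied by the assumed hit rate (a derived figure, not a reported one).
implied_corpus = search_term_population / assumed_hit_rate

print(f"Reduction in documents reviewed: {reduction:.1%}")
print(f"Implied corpus size: {implied_corpus:,.0f} documents")
```

The result is a reduction of a little under 96%, in line with the approximate 95% figure, and an implied corpus that is consistent with the "over 10 million documents" stated earlier.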

Long term benefits

  • The decision to drive the review through a predictive model and advanced analytical tools was validated not only by the drastically shortened timeline and the substantial cost savings, but also by our two-stage Quality Control process and a comparison of results against documents tagged by outside counsel.
  • The predictive model not only streamlined review but also proved more effective than standard review. We found that the relevancy rate using the predictive model was 3 to 4 times higher than in similar projects where review was based solely on search terms.
  • Key information not previously known to counsel was discovered over the course of the review, strengthening counsel’s ability to make informed decisions during the investigation. One example was the undisclosed fees associated with the fund.
  • Furthermore, over the course of the review, we were able to bring to counsel’s attention the fact that there was a marked difference in the index calculation methods in practice compared to those stated in public offering representations.
  • Finally, we identified instances where bank staff were potentially experiencing conflicts of interest.
  • All these critical discoveries as well as insights from across 30 issues within the matter were memorialized by Clutch team members in fact development communications. Over the course of the review, our team issued 9 in-depth investigative reports. Furthermore, the team highlighted 11,000 Priority and Relevant documents.
  • These findings made a firm impression on the client, who was happy to present the results to the relevant regulators and to support the use of predictive analytics in future.