About the project
The Central Forensic Laboratory of the Police (CFLP) leads a consortium of top Polish universities (Medical University of Warsaw, Jagiellonian University in Kraków, Pomeranian Medical University in Szczecin) and a Polish bioinformatics company (ARDIGEN). The consortium received funding from the National Center for Research and Development to carry out the SMAFT project (Soil Microbiome Analysis Forensic Tool), which aims to investigate the potential of using the soil microbiome in forensics (DOB-BIO10 / 03/01/2019).
Because it is easily transferred on shoes, clothes, tools or vehicles, soil is particularly valuable evidence material that makes it possible to connect a suspect or object with a specific geographical location. Unfortunately, the information obtained through routine soil analyses, which rely mainly on physical and chemical characteristics, often does not allow investigative hypotheses to be verified. For this reason, there is a need to develop soil identification methods based on DNA analysis of the microorganisms that inhabit it. It is estimated that 1 gram of dry soil contains on average 10^10 viruses, 10^10 bacteria and archaea (including 10^8 actinomycetes), 10^6 each of fungi and algae, 10^5 protozoa and 10^2 nematodes. The soil may also contain plant fragments, invertebrates other than nematodes, and extracellular nucleic acids. Such a huge 'library' of DNA, combined with modern sequencing technologies, may in the future prove just as useful for comparing soil traces or determining their place of origin as human DNA profiles are for establishing the relationship between biological traces and a suspect. At present, soil microbiome DNA evidence is not used routinely in investigations or in courts. To change this, it is necessary to show that statistical inference based on DNA analysis of the soil microbiome is rigorous enough for courts to regard it as evidence with an adequate scientific basis. According to key opinion leaders in forensic science, the credibility and effectiveness of microbiome analyses can be confirmed by conducting research on a sufficient sample size, creating databases of microbiomes from various environments based on clearly defined and well-documented procedures, improving bioinformatics tools, and learning how the microbiome changes over time and between locations.
The assumptions of the SMAFT project, which aims to develop an innovative tool for forensic analysis of the soil microbiome, take into account the guidelines presented above. The data obtained from sequencing nearly 1000 DNA samples of soil microbiomes will be the basis for the design of the SMAFT system, which will enable the identification of a soil sample and the determination of its place of origin. Information obtained with the SMAFT system will be able to direct, and thus accelerate, investigations in criminal cases and those related to terrorist attacks. The system can also be used in biodiversity research.
The research plan of the SMAFT project is organized into 8 work packages:
Collection of 960 soil samples, 240 in each season: autumn, winter, spring and summer from 80 different locations in Poland (3 samples in each location).
Soil sampling sites were selected on the basis of an analysis of hydrological and meteorological measurements collected over the last 20 years at all weather stations located in Poland. This stage also includes the development of a detailed methodology for soil sampling (covering how samples are secured, described and transported to the laboratory under appropriate conditions).
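The sampling design described above can be checked with a short enumeration. This is only an illustrative sketch: the location identifiers (1 to 80) and the tuple layout are assumptions made for the example, not the project's actual sample-labelling scheme.

```python
from itertools import product

SEASONS = ("autumn", "winter", "spring", "summer")
N_LOCATIONS = 80             # sampling locations in Poland (from the plan)
SAMPLES_PER_LOCATION = 3     # samples taken at each location per season

def build_sampling_plan():
    """Return one (season, location_id, sample_no) tuple per planned sample.

    location_id and sample_no are hypothetical labels used only here.
    """
    return list(
        product(
            SEASONS,
            range(1, N_LOCATIONS + 1),
            range(1, SAMPLES_PER_LOCATION + 1),
        )
    )

plan = build_sampling_plan()
per_season = {s: sum(1 for season, _, _ in plan if season == s) for s in SEASONS}

print(len(plan))             # 960
print(per_season["winter"])  # 240
```

The totals agree with the figures stated in the plan: 80 locations x 3 samples = 240 samples per season, and 240 x 4 seasons = 960 samples overall.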
Isolation of microbiome DNA from soil.
Before starting the task, several tests were carried out to select the most effective method of DNA isolation from soil in terms of the quantity, purity and quality of the obtained isolate.
Preparation of NGS libraries from the isolates.
In order to obtain the best quality libraries with the desired fragment lengths, the library preparation procedure was optimized, both at the stage of DNA fragmentation and amplification. The library fragment lengths are verified by capillary electrophoresis, and the DNA concentration in the libraries is determined using the fluorometric method.
Whole genome sequencing (WGS) using Illumina® SBS technology on the latest-generation Illumina® NovaSeq 6000 sequencer.
The result of this task will be high-quality raw data from deep sequencing. To achieve this goal, we plan to obtain 80 to 100 million pairs of 150 bp reads per single soil sample.
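As a rough guide to what this sequencing depth means in raw data volume, the arithmetic can be sketched as follows. The calculation assumes that every one of the 960 planned samples is sequenced at the stated depth and counts raw bases only, before any quality filtering; both are assumptions of this example rather than figures given by the project.

```python
READ_LENGTH_BP = 150     # read length stated in the plan
READS_PER_PAIR = 2       # paired-end sequencing: two reads per pair
PLANNED_SAMPLES = 960    # assumption: all planned samples sequenced at this depth

def bases_per_sample(read_pairs: int) -> int:
    """Raw bases produced for one sample at the given number of read pairs."""
    return read_pairs * READS_PER_PAIR * READ_LENGTH_BP

low = bases_per_sample(80_000_000)
high = bases_per_sample(100_000_000)

print(f"per sample: {low / 1e9:.0f} to {high / 1e9:.0f} Gbp")
print(
    f"all samples: {low * PLANNED_SAMPLES / 1e12:.2f} "
    f"to {high * PLANNED_SAMPLES / 1e12:.2f} Tbp"
)
# per sample: 24 to 30 Gbp
# all samples: 23.04 to 28.80 Tbp
```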
Data analysis.
The data obtained in step four will be analyzed using various bioinformatics tools. The aim of the analysis is to select the optimal set of markers (an identification panel) that will make it possible to assess the microbiome composition of a sample of unknown origin and then link the sample to a specific location. The selected genetic markers will consist of highly informative genomic sequences that allow qualitative and quantitative identification of the microbiome present in the analyzed sample.
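The project description does not say how markers will be ranked. One simple way to think about "highly informative" markers is to prefer those whose abundance differs strongly between locations but is stable among samples from the same location. The sketch below illustrates that idea only; the scoring rule, the function names and the toy data (marker_A, site_1 and so on) are all hypothetical.

```python
from statistics import mean, pvariance

def discrimination_score(abundance_by_location):
    """Variance of the location means divided by the mean within-location variance.

    abundance_by_location: {location: [abundance in each sample from that location]}
    """
    location_means = [mean(values) for values in abundance_by_location.values()]
    between = pvariance(location_means)
    within = mean(pvariance(values) for values in abundance_by_location.values())
    return between / (within + 1e-9)  # small constant avoids division by zero

def select_markers(profiles, panel_size):
    """Return the panel_size markers with the highest discrimination score.

    profiles: {marker: {location: [abundances]}}
    """
    ranked = sorted(
        profiles,
        key=lambda marker: discrimination_score(profiles[marker]),
        reverse=True,
    )
    return ranked[:panel_size]

# Toy data: made-up relative abundances for three markers at two sites.
profiles = {
    "marker_A": {"site_1": [0.10, 0.11, 0.09], "site_2": [0.50, 0.52, 0.48]},
    "marker_B": {"site_1": [0.30, 0.10, 0.50], "site_2": [0.32, 0.12, 0.49]},
    "marker_C": {"site_1": [0.20, 0.21, 0.19], "site_2": [0.05, 0.06, 0.04]},
}

panel = select_markers(profiles, panel_size=2)
print(panel)  # ['marker_A', 'marker_C']
```

In this toy data, marker_B is noisy within each site and barely differs between sites, so it is ranked last. A real panel would be selected with far more locations, seasons and markers, and most likely with more robust statistics.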
Development, optimization and validation of the targeted NGS method for soil microbiome analysis.
Samples will be tested using the selected genomic sequences defined in step five. The sequencing results obtained for soil samples in the fourth step (deep sequencing) will be checked for concordance with the results of the test developed at this stage (targeted sequencing), performed with several medium-throughput sequencing methods. The optimal sequencing technology will then be selected and validated, with particular emphasis on the specific requirements and limitations of forensic analyses.
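One possible way to quantify agreement between deep and targeted sequencing of the same sample is to correlate the relative abundances that each method reports for the same markers. The sketch below uses the Pearson correlation coefficient as an example measure; the choice of statistic and the numbers are assumptions made for illustration, not the project's validation protocol.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up relative abundances of four markers in one sample.
deep_sequencing = [0.40, 0.30, 0.20, 0.10]
targeted_sequencing = [0.38, 0.33, 0.19, 0.10]

r = pearson(deep_sequencing, targeted_sequencing)
print(f"concordance (Pearson r): {r:.3f}")
```

A value close to 1 indicates that the targeted test reproduces the abundance pattern seen in deep sequencing; what counts as acceptable agreement would have to be defined during validation.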
Creation of an IT system for the analysis and interpretation of the results obtained by the genetic test and selected NGS technology.
Data obtained from DNA sequencing of soil microbiomes will be stored in the project database, an element of the system being created that constitutes a 'map' of the soil microbiome in Poland. An efficient search engine will also be developed to allow effective searching of the database and interpretation of analysis results. The DNA sequencing results from a soil sample of unknown origin will be compared with the database using the developed software and then linked to the most probable location on the map of Poland. Ultimately, a complete predictive system will be created that allows identification of bacterial communities/groups and interpretation of the data.
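The comparison of an unknown sample with the database can be illustrated with a nearest-profile search. The sketch below ranks reference locations by Bray-Curtis dissimilarity, a measure commonly used to compare community composition. It is not a description of the SMAFT software, whose algorithm is not specified here, and the site names, taxa and abundances are invented for the example.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    0.0 means identical composition, 1.0 means no shared taxa.
    """
    taxa = set(profile_a) | set(profile_b)
    difference = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.get(t, 0.0) + profile_b.get(t, 0.0) for t in taxa)
    return difference / total if total else 0.0

def rank_locations(query, database):
    """Return (dissimilarity, location) pairs, most similar location first."""
    return sorted((bray_curtis(query, profile), location)
                  for location, profile in database.items())

# Hypothetical reference profiles (relative abundances) for two locations.
database = {
    "site_A": {"taxon_1": 0.60, "taxon_2": 0.30, "taxon_3": 0.10},
    "site_B": {"taxon_1": 0.10, "taxon_2": 0.20, "taxon_3": 0.70},
}
unknown_sample = {"taxon_1": 0.55, "taxon_2": 0.35, "taxon_3": 0.10}

ranking = rank_locations(unknown_sample, database)
best_distance, best_location = ranking[0]
print(best_location)  # site_A
```

A production system would need to go further than a single nearest match, for example by reporting a likelihood or confidence for each candidate location and by accounting for seasonal variation.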
Testing the effectiveness of the created system in conditions similar to real ones and preparation of the standard operating procedures (SOPs) describing the “ways of working” for the developed system.
The correctness of the parameters of the developed program/system will be assessed. Additionally, the guidelines, procedures and instructions required to conduct predictive tests and then comparative analyses of DNA samples isolated from soil will be developed.
The end product of the SMAFT project will be a complete predictive system, based on the microbiome composition of soil samples, that allows a tested sample to be identified and its origin traced.