Enumeration, measurement and identification of net zooplankton samples using the ZOOSCAN digital imaging system
Philippe Grosjean, Caroline Warembourg, Marc Picheral & Gabriel Gorsky
Université de Paris VI / Observatoire Océanologique de Villefranche-sur-Mer
Why use image analysis for zooplankton?
Manual analysis is time-consuming.
Particle counters measure general abundance and size spectra, but they give little information on the nature of the particles (taxa).
Digital images are easier to manipulate and share, possibly through the internet, than fixed biological samples.
Image analysis provides both the size and the identification of all particles in a sample.
The ZOOSCAN system is:
Specifically designed for high-resolution, high-speed digitization of net mesozooplankton and micronekton, including historical series.
Supplied with software for automatic object analysis and measurement.
Supplemented with an innovative and powerful automatic recognition system.
Able to produce outputs as spreadsheets, graphs and databases (standard formats).
Easy 3-step data acquisition
1) Pour and distribute the sample in the cell
2) Digitize and analyze the picture in one mouse click
Image digitization
[Figure: 20% of the whole image; scale bar 200 µm]
Each picture is 17,500 x 7,000 pixels and contains hundreds of animals.
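To give a sense of scale for the picture dimensions quoted above, the short calculation below works out the pixel count and the uncompressed size. The bit depth is an assumption (one byte per pixel, i.e. 8-bit grayscale); the slides mention a grayscale image but do not state its depth.

```python
# Dimensions quoted on the slide.
WIDTH_PX = 17_500
HEIGHT_PX = 7_000

# Assumption: 8-bit grayscale, one byte per pixel (not stated in the source).
BYTES_PER_PIXEL = 1


def image_size(width, height, bytes_per_pixel):
    """Return (number of pixels, uncompressed size in bytes)."""
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


pixels, size_bytes = image_size(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
print(f"{pixels:,} pixels, about {size_bytes / 1e6:.1f} MB uncompressed")
```

A deeper grayscale format would scale the size proportionally, for example doubling it at two bytes per pixel.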
Image analysis
[Figure: analyzed image; scale bar 1 mm]
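The slides do not describe how the ZOOSCAN software isolates and measures objects. As a generic illustration only, the sketch below thresholds a grayscale image and labels 4-connected regions, returning an area and bounding box per object. The function name `label_objects`, the threshold value and the assumption that animals are darker than the background are all hypothetical.

```python
from collections import deque


def label_objects(image, threshold):
    """Find connected groups of pixels darker than `threshold`.

    `image` is a list of rows of gray levels. Returns one dict per object
    with its area in pixels and bounding box (row_min, col_min, row_max, col_max).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            rmin = rmax = r
            cmin = cmax = c
            while queue:
                y, x = queue.popleft()
                area += 1
                rmin, rmax = min(rmin, y), max(rmax, y)
                cmin, cmax = min(cmin, x), max(cmax, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append({"area": area, "bbox": (rmin, cmin, rmax, cmax)})
    return objects


# Toy 5 x 6 image: white background (255) with two dark objects.
toy = [
    [255, 255, 255, 255, 255, 255],
    [255,  40,  50, 255, 255, 255],
    [255,  45, 255, 255,  30, 255],
    [255, 255, 255, 255,  35, 255],
    [255, 255, 255, 255, 255, 255],
]
objects = label_objects(toy, threshold=128)
for obj in objects:
    print(obj)
```

Measurements such as area are in pixels here; converting them to physical units would require the scanner's pixel size, which the slides do not give.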
Easy 3-step data acquisition
1) Pour and distribute the sample in the cell
2) Digitize and analyze the picture in one mouse click
3) Collect the sample back without loss or degradation
<15 min
Flowchart of the system
Digitization: direct scan of zooplankton (Zooscan) -> raw grayscale image
Image processing: processed images, measures
Results storage: database; external systems: biodiversity & marine life databases (OBIS)
Object recognition: manual recognition, or automatic recognition system -> automatically identified objects
Learning process: selective manual recognition -> training set (small fraction); recognition improvement (optional)
Statistical analysis: graphical representation, space-time series analysis, size-class analysis, biomass calculation, etc.
Machine learning
Dataset (series) -> training set + manual identification (1 per series) + learning
Dataset (series) -> test set -> automatic classification
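The scheme above (learn from a small manually identified training set, then classify the remaining objects automatically) can be sketched with any supervised classifier. The example below uses a plain k-nearest-neighbours vote purely because it is short; it is not the authors' method. The two numeric features per object and the helper name `knn_predict` are hypothetical, and the group names are borrowed from the training set only as labels.

```python
import math
from collections import Counter


def knn_predict(training_set, features, k=3):
    """Predict a group by majority vote among the k nearest training items."""
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Training set: (features, manually identified group). Feature values are made up.
training_set = [
    ((1.0, 1.1), "AcartiaSp"),
    ((1.2, 0.9), "AcartiaSp"),
    ((0.9, 1.0), "AcartiaSp"),
    ((5.0, 5.2), "Salp"),
    ((5.3, 4.8), "Salp"),
    ((4.9, 5.1), "Salp"),
]

# Test set: measured objects that have not been identified by hand.
test_set = [(1.1, 1.0), (5.1, 5.0)]

predictions = [knn_predict(training_set, features) for features in test_set]
print(predictions)
```

In practice the features would come from the measurements made on each object during image analysis, and the classifier would be swapped for whichever method performs best on independent samples.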
Application to zooplankton
1 year of weekly samples; >400,000 objects.
Training set of 1,127 objects manually classified into 29 groups.
Test of different classification algorithms.
[Figure: bar chart of the number of items per group in the training set L (total = 1127); axis from 0 to 80]
Groups: AcartiaSp, CalyptopisSp, CavoliniaInflexa, CentropagesSp, Chaetognathes, CladoceransOthers, ClausoParacalanus, Copepodit, Coryceidae, CreseisSp, EggOthers, EuterpinaSp, FishEgg, FishLarva, FritillariaSp, Hydrozoa, MolluskEggs, MolluskOthers, Mysis, OikopleuraSp, OithonaSp, PeniliaSp, Pluteus, reject, Salp, SiphonoCali, SiphonoPhyso, TemoraSp, Zoe
Results
[Figure: method accuracies and timings (8 external samples); axes: accuracy (%), from 50 to 90, and time to predict 10,000 items (sec), from 0 to 25]
Methods: lda, qda, mda, fda, knn, lvq, tree, rpar, bagg, db.l, db.k, rfor, svm, nnet, dvf
15 methods tested and compared.
Recognition accuracies higher than 80% are difficult to obtain.
Our new method (dvf, discriminant vector forest) is both efficient and reasonably fast.
Complementary semi-automatic identification
Suspect items are automatically tagged (~10-15% of items).
An optional interactive session allows manual verification of these suspect items.
The accuracy of dvf increases to 85-90% when the suspect items are manually verified.
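The slides do not say how an item is judged suspect. One common approach, assumed here, is to flag predictions whose confidence score falls below a cutoff and send only those for manual checking. The sketch below uses toy data to show why accuracy rises after verification; the scores, the 0.5 cutoff and the function names are hypothetical, and the resulting numbers are not the authors' results.

```python
def tag_suspects(predictions, min_confidence):
    """Return True for each prediction whose confidence is below the cutoff."""
    return [confidence < min_confidence for _, confidence in predictions]


def verify_suspects(predictions, suspects, truth):
    """Simulate manual verification: suspect items receive the expert's label."""
    return [
        true_label if is_suspect else label
        for (label, _), is_suspect, true_label in zip(predictions, suspects, truth)
    ]


def accuracy(labels, truth):
    """Fraction of labels that match the reference identification."""
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


# Toy reference identifications and automatic predictions (label, confidence).
truth = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"]
predictions = [
    ("A", 0.90), ("A", 0.80), ("B", 0.95), ("A", 0.40), ("A", 0.85),
    ("B", 0.90), ("B", 0.45), ("B", 0.70), ("A", 0.90), ("A", 0.80),
]

suspects = tag_suspects(predictions, min_confidence=0.5)
before = accuracy([label for label, _ in predictions], truth)
after = accuracy(verify_suspects(predictions, suspects, truth), truth)
print(f"suspect items: {sum(suspects)} of {len(truth)}")
print(f"accuracy before: {before:.0%}, after verification: {after:.0%}")
```

Errors made with high confidence (the last item in this toy set) are not flagged, which is why verification of suspect items improves accuracy without bringing it to 100%.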
Analysis of a series
[Figure: time series over one year (January to January), one panel for the total and one panel per group of the training set]
[Figure: % Centropages sp + Oithona sp (black) versus % Chaetognathes (red), January to January]
[Figure: % Penilia sp and % Pluteus larvae, January to January]
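Percentage series of this kind can be derived from the classified objects by counting items per group within each sample. The sketch below assumes the classification output is available as (sample, group) pairs; the sample labels and the function name `percent_by_group` are hypothetical.

```python
from collections import Counter, defaultdict


def percent_by_group(classified):
    """Return {sample: {group: percentage of that sample's objects}}."""
    counts = defaultdict(Counter)
    for sample, group in classified:
        counts[sample][group] += 1
    return {
        sample: {
            group: 100.0 * n / sum(group_counts.values())
            for group, n in group_counts.items()
        }
        for sample, group_counts in counts.items()
    }


# Toy classification output: one (sample, group) pair per object.
classified = [
    ("week01", "OithonaSp"),
    ("week01", "OithonaSp"),
    ("week01", "OithonaSp"),
    ("week01", "Chaetognathes"),
    ("week02", "OithonaSp"),
    ("week02", "PeniliaSp"),
]

series = percent_by_group(classified)
for sample, percentages in series.items():
    print(sample, percentages)
```

Plotting one group's percentage against the sample date then gives a single panel of the series.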
Conclusions
The resolution and speed of the ZOOSCAN are well suited to zooplankton image analysis.
A new machine learning algorithm (discriminant vector forest) achieves high recognition levels (>75% in our test series, and up to 85-90% with complementary semi-automatic identification).
Large-scale practical applications are now possible, such as the analysis of historical or long-term series, including online expert consultation and image sharing.
For more information, visit the Zooscan web site at: http://www.zooscan.com