NoSQL: The End of Relational?
Michael Bailly, RMLL 2011

Reason #1: RDBMS are hard to scale

But also...
- Denormalization
- Caching
- Indexing engines (Solr, Sphinx)
- Message queues (Gearman, ActiveMQ)

Reason #2: Schema smells

Reason #3: Ephemeral data

Summary

RDBMS
- Small records with well-defined, normalized relations
- Queries that may evolve over time
- Long-lived data (low update frequency)
- No need for exceptional read performance
- Data integrity matters more than performance and scalability

NoSQL
- Very large data sets
- Data that can be modeled as trees accessed through the root node
- Highly dynamic data structures
- Object-relational mapping complexity costs productivity

Trade-off #1: ACID vs. BASE
ACID = Atomicity, Consistency, Isolation, Durability
BASE = Basically Available, Soft state, Eventual consistency

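To make "eventual consistency" concrete, here is a minimal sketch in plain Python. It assumes a last-write-wins rule based on timestamps; the `Replica` class, the `sync` function and the `cart:42` key are hypothetical names chosen for illustration, not part of any product discussed in this talk.

```python
class Replica:
    """One copy of the data: key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Keep the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions; the newest timestamp wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (ts, value) in list(src.data.items()):
            dst.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart:42", ["book"], ts=1)
r2.write("cart:42", ["book", "pen"], ts=2)

stale = r1.read("cart:42")   # r1 has not seen the newer write yet
sync(r1, r2)                 # replicas exchange state
converged = r1.read("cart:42") == r2.read("cart:42")
```

Before `sync`, a read on `r1` may return an older value (the "soft state"); after the exchange both replicas agree.
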
Trade-off #2: CAP
Consistency, Availability, Partition tolerance: pick two!
Eric A. Brewer (2000)

Key-value stores
http://www.flickr.com/photos/nshepard/230901

Key-value stores

    // create a user
    INCR global:next_user_id            => 1234
    SET uid:1234:username jdoe
    SET uid:1234:password p4s5w0rd
    SET username:jdoe:uid 1234

    // authenticate the user
    GET username:jdoe:uid               => 1234
    GET uid:1234:password               => p4s5w0rd
    SET uid:1234:auth fea5e81ac8ca77622bed1c2132a021f9
    SET auth:fea5e81ac8ca77622bed1c2132a021f9 1234

    // fetch the user's docs (a list, so read it with LRANGE)
    LRANGE uid:1234:docs 0 -1           => [7890, 4567, 2345]
    GET doc:7890                        => "<owner_id> <ts> Awesome!"

    // add a doc
    INCR global:next_doc_id             => 7891
    SET doc:7891 "<owner_id> <ts> Enorme!"
    LPUSH uid:1234:docs 7891

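The key-naming pattern above can be tried without a server. The following is a rough sketch in plain Python: `TinyKV` is a hypothetical in-memory stand-in that mimics only the few commands used on the slide (INCR, SET, GET, LPUSH, LRANGE), and the helper functions are illustrative names, not a real client API.

```python
class TinyKV:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def lpush(self, key, value):
        # Push at the head of the list, as LPUSH does.
        self._data.setdefault(key, []).insert(0, value)

    def lrange(self, key):
        return list(self._data.get(key, []))


def create_user(kv, username, password):
    uid = kv.incr("global:next_user_id")
    kv.set("uid:%d:username" % uid, username)
    # Plain-text password only to mirror the slide; a real system would store a hash.
    kv.set("uid:%d:password" % uid, password)
    kv.set("username:%s:uid" % username, uid)   # reverse lookup key
    return uid


def check_password(kv, username, password):
    uid = kv.get("username:%s:uid" % username)
    if uid is None or kv.get("uid:%d:password" % uid) != password:
        return None
    return uid


def add_doc(kv, uid, body):
    doc_id = kv.incr("global:next_doc_id")
    kv.set("doc:%d" % doc_id, body)
    kv.lpush("uid:%d:docs" % uid, doc_id)
    return doc_id


kv = TinyKV()
uid = create_user(kv, "jdoe", "p4s5w0rd")
first = add_doc(kv, uid, "Awesome!")
second = add_doc(kv, uid, "Enorme!")
docs = kv.lrange("uid:%d:docs" % uid)   # newest document first
```

The point of the sketch is that every "query" is a lookup on a key whose name was designed in advance, including the reverse key from username to uid.
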
Redis
- Written in C, BSD license
- In-memory storage
- Persistence via asynchronous snapshotting or AOF (append-only file)
- Data types: STRING, LIST, SET, ZSET
- Not distributed
- No fault tolerance
- But very fast!

Redis: use cases
- "Memcached on steroids"
- Tag clouds
- Statistics, logs
- Sessions
- Indexing engines
- Job queues (Resque, Celery)

Redis: example
Tags: using a SET

    // tag article 1234
    SADD article:1234 ruby
    SADD article:1234 python
    SADD article:1234 php

    // tag article 6789
    SADD article:6789 django
    SADD article:6789 python

    // fetch the tags the two articles have in common
    SINTER article:1234 article:6789    => python

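The same set logic can be sketched with Python's built-in `set` type. This is an illustration only: `store`, `sadd`, `sinter` and `tag_cloud` are hypothetical helpers that imitate the commands above in memory, and the tag-cloud count is an added example of the "tag clouds" use case.

```python
from collections import Counter

store = {}  # key -> set of members, standing in for Redis SET values


def sadd(key, member):
    store.setdefault(key, set()).add(member)


def sinter(*keys):
    # Intersection of all named sets; a missing key counts as an empty set.
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


def tag_cloud():
    # How many articles carry each tag.
    return Counter(tag for members in store.values() for tag in members)


for tag in ("ruby", "python", "php"):
    sadd("article:1234", tag)
for tag in ("django", "python"):
    sadd("article:6789", tag)

common = sinter("article:1234", "article:6789")
cloud = tag_cloud()
```
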
Redis: libraries
PHP: Predis, Rediska, PHP-Redis
Python: redis-py, txredis
Ruby: redis-rb, em-ruby, ohm

Ohm

    require "ohm"

    class Event < Ohm::Model
      attribute :name
      reference :venue, Venue
      set :participants, Person
      counter :votes

      index :name

      # Reject events that have no name
      def validate
        assert_present :name
      end
    end

    class Venue < Ohm::Model
      attribute :name
      collection :events, Event
    end

    class Person < Ohm::Model
      attribute :name
    end

Rediska

    $options = array(
        'namespace' => 'Application_',
        'servers'   => array(
            array('host' => '127.0.0.1', 'port' => 6379)
        )
    );

    // Send Zend_Log messages to the Redis key 'keyname'
    $writer = new Rediska_Zend_Log_Writer_Redis('keyname', $options);
    $log = new Zend_Log($writer);

Project Voldemort
- Created at LinkedIn
- Written in Java, Apache 2.0 license
- Distributed!
- Automatic data replication across multiple servers
- Automatic data partitioning
- Fault tolerant
- Data versioning
- Pluggable storage backends (BDB, MySQL)

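Automatic partitioning and replication are commonly built on a consistent-hashing ring. The sketch below shows the general technique under stated assumptions: the `Ring` class, the node names, the number of virtual nodes and the replica count are all hypothetical, and this is not Voldemort's actual code or configuration.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64, replicas=2):
        self.replicas = replicas
        self._node_count = len(set(nodes))
        # Each node is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (_hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def preference_list(self, key):
        """Distinct nodes responsible for `key`, walking clockwise."""
        i = bisect.bisect(self._hashes, _hash(key))
        wanted = min(self.replicas, self._node_count)
        owners = []
        while len(owners) < wanted:
            node = self._points[i % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


nodes = ["node-a", "node-b", "node-c"]
ring = Ring(nodes)
owners = ring.preference_list("uid:1234")   # primary first, then the replica
```

With this layout, if the primary node of a key disappears, the node that held the replica becomes the first owner of that key, which is one way fault tolerance falls out of the design.
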
Column-oriented stores
http://www.flickr.com/photos/stuckincustoms/536710395

Cassandra
- Created at Facebook
- Written in Java, Apache 2.0 license
- Distributed!
- Automatic data replication across multiple servers (and even datacenters)
- Automatic data partitioning
- Fault tolerant, decentralized (no SPOF)
- Tunable availability via ConsistencyLevel
- Excellent write performance (storage

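One way to reason about ConsistencyLevel is the quorum rule: with N replicas, a read that waits for R replicas is guaranteed to see the latest write acknowledged by W replicas when R + W > N. The sketch below only does that arithmetic for the ONE, QUORUM and ALL levels; the function names are hypothetical and it does not talk to a cluster.

```python
# Number of replicas that must answer for each level, given N replicas.
LEVELS = {
    "ONE": lambda n: 1,
    "QUORUM": lambda n: n // 2 + 1,
    "ALL": lambda n: n,
}


def replicas_required(level, n):
    return LEVELS[level](n)


def reads_see_latest_write(n, read_level, write_level):
    """True when every read set must overlap every write set (R + W > N)."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n


n = 3
quorum_both = reads_see_latest_write(n, "QUORUM", "QUORUM")  # 2 + 2 > 3
one_both = reads_see_latest_write(n, "ONE", "ONE")           # 1 + 1 > 3 is false
read_one_write_all = reads_see_latest_write(n, "ONE", "ALL")  # 1 + 3 > 3
```

Lower levels answer faster and stay available with fewer live replicas, at the price of possibly stale reads; that is the availability knob the slide refers to.
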
Data model

    Column : {
        name : "emailaddress",
        value : "jdoe@example.com",
        timestamp : 123456789
    }

    SuperColumn : {
        name : "physicaladdress",
        value : {
            street : { name: "street", value: "xxx", ts: 123 },
            city : { name: "city", value: "Paris", ts: 123 },
            zip : { name: "zip", value: "75017", ts: 123 }
        }
    }

Data model

    ColumnFamily : Users = {
        jdoe : {
            username : { name: "username", value: "jdoe", ts: 123 },
            email : { name: "email", value: "jdoe@example.com", ts: 123 }
        },
        jane : {
            username : { name: "username", value: "jane", ts: 123 },
            email : { name: "email", value: "jane@example.com", ts: 123 },
            gender : { name: "gender", value: "female", ts: 123 },
            age : { name: "age", value: "25", ts: 123 }
        }
    }

Data model

    SuperColumnFamily : AddressBooks = {
        jdoe : {
            bob : {
                name : "physicaladdress",
                value : {
                    street : { name: "street", value: "1 rue de la paix", ts: 123 },
                    city : { name: "city", value: "Paris", ts: 123 },
                    zip : { name: "zip", value: "75017", ts: 123 }
                }
            },
            karen : {
                name : "physicaladdress",
                value : {
                    street : { name: "street", value: "2 rue de la paix", ts: 123 },
                    city : { name: "city", value: "Paris", ts: 123 },
                    zip : { name: "zip", value: "75017", ts: 123 }
                }
            }
        }
    }

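The structures above map naturally onto nested dictionaries. Here is a minimal in-memory sketch, assuming that a column write with an older timestamp loses against a newer one; the `ColumnFamily` class and its methods are hypothetical names for illustration, not a Cassandra client API.

```python
import time


class ColumnFamily:
    """Rows of named columns; each row may hold a different set of columns."""

    def __init__(self, name):
        self.name = name
        self.rows = {}   # row key -> {column name -> column dict}

    def insert(self, row_key, column, value, ts=None):
        ts = time.time() if ts is None else ts
        row = self.rows.setdefault(row_key, {})
        current = row.get(column)
        # The timestamp decides which version of a column survives.
        if current is None or ts >= current["ts"]:
            row[column] = {"name": column, "value": value, "ts": ts}

    def get(self, row_key, column=None):
        row = self.rows.get(row_key, {})
        if column is None:
            return {name: col["value"] for name, col in row.items()}
        col = row.get(column)
        return col["value"] if col else None


users = ColumnFamily("Users")
users.insert("jdoe", "username", "jdoe", ts=123)
users.insert("jdoe", "email", "jdoe@example.com", ts=123)
users.insert("jane", "username", "jane", ts=123)
users.insert("jane", "gender", "female", ts=123)

# A late-arriving write with an older timestamp is ignored.
users.insert("jdoe", "email", "old@example.com", ts=100)
jdoe = users.get("jdoe")
```

Note that `jane` has a `gender` column while `jdoe` does not: there is no table-wide schema to alter.
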
Cassandra Object

    class Customer < CassandraObject::Base
      attribute :first_name,    :type => :string
      attribute :last_name,     :type => :string
      attribute :date_of_birth, :type => :date
      attribute :signed_up_at,  :type => :time_with_zone

      validate :should_be_cool

      key :uuid

      index :date_of_birth

      association :invoices, :unique => false, :inverse_of => :customer
    end

Document stores
http://www.flickr.com/photos/60849961@n00/238658480

CouchDB
- Written in Erlang, Apache 2.0 license
- Document = JSON structure
- REST API: HTTP + JSON
- Easy conflict resolution (ID + revision number)
- Incremental replication (highly scalable!)
- Robust
- Incremental MapReduce
- Comet-style change notification service
- Automatic partitioning with CouchDB Lounge

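The "ID + revision number" idea can be illustrated with a small optimistic-concurrency sketch: an update must quote the revision it was based on, otherwise it is rejected. This is a simplification under stated assumptions: `DocStore` and `Conflict` are hypothetical names, and revisions are plain integers here rather than CouchDB's real revision strings.

```python
class Conflict(Exception):
    """Raised when an update is based on an outdated revision."""


class DocStore:
    def __init__(self):
        self._docs = {}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if rev != current_rev:
            raise Conflict("%s: expected rev %r, got %r" % (doc_id, current_rev, rev))
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = dict(body, _id=doc_id, _rev=new_rev)
        return new_rev

    def get(self, doc_id):
        return dict(self._docs[doc_id])


store = DocStore()
rev1 = store.put("contact:jdoe", {"firstname": "John"})
rev2 = store.put("contact:jdoe", {"firstname": "John", "lastname": "Doe"}, rev=rev1)

try:
    # A second writer still holding rev1 tries to overwrite the document.
    store.put("contact:jdoe", {"firstname": "Johnny"}, rev=rev1)
    rejected = False
except Conflict:
    rejected = True
```

The losing writer has to fetch the current document, merge its change and retry, so no update is silently overwritten.
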
MongoDB
- Written in C++, AGPL v3.0 license
- Document = JSON++ structure, stored as BSON
- Fairly rich query API
- Indexes
- Partitioning (sharding)
- MapReduce
- Replication
- Commercial support available
- In-place updates

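Both document stores above list MapReduce. As a rough idea of the programming model only, here is a tiny single-process sketch in Python; `map_reduce`, `emit_tags` and `count` are hypothetical names, the sample posts are made up for the example, and real servers run the map and reduce functions inside the database.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):       # map: emit (key, value) pairs
            groups[key].append(value)
    # reduce: collapse all values emitted for a key into one result
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_tags(doc):
    for tag in doc.get("tags", []):
        yield tag, 1


def count(key, values):
    return sum(values)


posts = [
    {"title": "First post", "tags": ["rails", "django", "symfony"]},
    {"title": "Second post", "tags": ["python", "django", "mongodb"]},
    {"title": "Untagged post"},
]
tag_counts = map_reduce(posts, emit_tags, count)
```
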
MongoDB: libraries (PHP, Python, Ruby)
Mongo, Pymongo (driver), Mango, MongoMapper, Mongoid, MongoRecord, MongoModel, MongoDoc

MongoDB: interactive shell

    root@lenny:/opt/mongodb 1.3.2/bin# ./mongo
    MongoDB shell version: 1.3.2
    url: test
    connecting to: test
    type "help" for help
    > show dbs
    admin
    local
    test
    > db.people.save({"firstname": "John"})
    ObjectId("4b8cccc622131491059056cd")
    > person = db.people.findOne({firstname: "John"})
    { "_id" : ObjectId("4b8cccc622131491059056cd"), "firstname" : "John" }
    > person.lastname = "Doe"
    Doe
    > db.people.save(person)
    > db.people.findOne({firstname: "John"})
    { "_id" : ObjectId("4b8cccc622131491059056cd"), "firstname" : "John", "lastname" : "Doe" }

Pymongo

    >>> import pymongo
    >>> from pymongo import Connection
    >>> conn = Connection('localhost', 27017)
    >>> db = conn.blog
    >>> posts = db.posts
    >>> import datetime
    >>> post = {"author": "Raphael",
    ...         "title": "La guerre des frameworks",
    ...         "tags": ["rails", "django", "symfony"],
    ...         "date": datetime.datetime.utcnow()}
    >>> posts.insert(post)
    ObjectId('4b8ce378a5835f0dda000000')
    >>> post = {"author": "Raphael",
    ...         "title": "Django + MongoDB = Mango",
    ...         "tags": ["python", "django", "mongodb"],
    ...         "date": datetime.datetime.utcnow()}
    >>> posts.insert(post)
    ObjectId('4b8ce531a5835f0dda000001')
    >>> # all posts by one author
    >>> for post in posts.find({"author": "Raphael"}):
    ...     print(post)
    ...
    >>> posts.count()
    2
    >>> # posts published after 1 March 2010
    >>> d = datetime.datetime(2010, 3, 1)
    >>> for post in posts.find({"date": {"$gt": d}}):
    ...     print(post)

Pymongo

    >>> contacts = db.contacts
    >>> contact = {"firstname": "John",
    ...            "lastname": "Doe",
    ...            "address": {"street": "1 rue de la paix",
    ...                        "city": "Paris"},
    ...            "phones": [{"type": "home", "number": "+331234567"},
    ...                       {"type": "mobile", "number": "+33678901234"}]}
    >>> contacts.insert(contact)
    ObjectId('4b8ce9f8a5835f0dda000002')
    >>> contact = {"firstname": "Jane",
    ...            "lastname": "Doe",
    ...            "address": {"street": "1 rue de la paix",
    ...                        "city": "Paris"},
    ...            "phones": [{"type": "home", "number": "+331234567"},
    ...                       {"type": "mobile", "number": "+3361234567"}]}
    >>> contacts.insert(contact)
    ObjectId('4b8ceb90a5835f0dda000003')
    >>> # query on a field of the embedded "address" document
    >>> for contact in contacts.find({"address.city": "Paris"}).sort("lastname"):
    ...     print(contact)
    ...

Pymongo

    >>> # without an index: full collection scan
    >>> contacts.find({"address.city": "Paris"}).sort("lastname").explain()["cursor"]
    u'BasicCursor'
    >>> contacts.find({"address.city": "Paris"}).sort("lastname").explain()["nscanned"]
    3.0
    >>> from pymongo import ASCENDING, DESCENDING
    >>> contacts.create_index([("address.city", ASCENDING), ("lastname", ASCENDING)])
    u'address.city_1_lastname_1'
    >>> # with the compound index: only matching entries are scanned
    >>> contacts.find({"address.city": "Paris"}).sort("lastname").explain()["cursor"]
    u'BtreeCursor address.city_1_lastname_1'
    >>> contacts.find({"address.city": "Paris"}).sort("lastname").explain()["nscanned"]
    2.0

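Why the number of scanned documents drops once an index exists can be sketched without a database. The code below is a simplified model only: it uses a Python dict instead of a B-tree, and the helper names and the third contact (living in Lyon) are hypothetical additions for the example.

```python
def get_path(doc, dotted):
    """Resolve a dotted path such as "address.city" in nested dicts."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def full_scan(docs, path, value):
    scanned, hits = 0, []
    for doc in docs:
        scanned += 1                          # every document is examined
        if get_path(doc, path) == value:
            hits.append(doc)
    return hits, scanned


def build_index(docs, path):
    index = {}
    for position, doc in enumerate(docs):
        index.setdefault(get_path(doc, path), []).append(position)
    return index


def indexed_lookup(docs, index, value):
    positions = index.get(value, [])          # jump straight to the matches
    return [docs[p] for p in positions], len(positions)


contacts = [
    {"firstname": "John", "lastname": "Doe", "address": {"city": "Paris"}},
    {"firstname": "Jane", "lastname": "Doe", "address": {"city": "Paris"}},
    {"firstname": "Marc", "lastname": "Martin", "address": {"city": "Lyon"}},
]

hits_scan, scanned_scan = full_scan(contacts, "address.city", "Paris")
city_index = build_index(contacts, "address.city")
hits_index, scanned_index = indexed_lookup(contacts, city_index, "Paris")
```

Both approaches return the same two contacts, but the scan touches all three documents while the index lookup touches only the two matches.
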
MongooseJS

    var mongoose = require("mongoose");
    var Schema = mongoose.Schema;

    // Embedded document: one phone number
    var Phone = new Schema({
      type: {type: String},
      number: {type: String}
    });

    var Person = new Schema({
      firstname: {type: String},
      lastname: {type: String, required: true},
      address: {
        street: {type: String},
        city: {type: String}
      },
      phones: [Phone]
    });

    mongoose.model("person", Person);

MongooseJS

    // mongoose.model() returns the model class; instantiate it to get a document
    var Person = mongoose.model("person");
    var newperson = new Person();
    newperson.firstname = "John";
    newperson.lastname = "Doe";
    newperson.address = {street: "1 rue de la Paix", city: "Paris"};
    newperson.phones.push({type: "home", number: "+331234567"});
    newperson.phones.push({type: "mobile", number: "+33067891234"});
    newperson.save(function (err) {
      if (!err) console.log("saved!");
    });

    Person.find({"address.city": "Paris"}, function (err, docs) {
      if (err) return console.log("aie!", err);
      console.log("found " + docs.length + " persons living in Paris");
    });

A "planet NoSQL"
http://nosql.mypopescu.com/

Thank you for your attention
Questions?