Specifications: eZDBNFSClusterHandler ##################################### .. contents:: Table of Contents Multiple issues keep showing when using most common flavours of NFS. These are mostly related to attribute caching & delay added by NFS to file create / delete operations. This document describes most of them, and clearly shows that most workarounds will have a severe impact on NFS performances: http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html Current eZFS & eZFS2 issues =========================== Generation handing (stalecache) depends on real-time system operations (file creation). NFS totally prevents this, and makes PHP return unreliable results when performing file operations (creation failing with a wrong return, possibilities of multiple openings of the same file). Cache items expiry can also be misinterpreted by mount points as NFS can add a delay when checking for a file expiry or existence. Solution: mix eZDB and eZFS =========================== It has been proved that the eZDB approach is more reliable and controllable than eZFS. The database allows these realtime operations in a satisfactory way. On the other hand, the eZDB approach is blamed for the storage mechanism used (files stored in BLOBs in the database, leading to a HUGE database and possible speed issues). It should be possible to mix these approaches by using both eZDB and NFS: - eZDB is reliable, and can be used to store files metadatas - NFS is not reliable, but can still be used to store the actual data eZDBNFS will use a custom database, much like the standard ezdbfile, to ensure data integrity. Cache processing (stalecache) will be performed in this database, while real files can be stored on NFS itself. The handler will take care of creating files on NFS. Exactly like eZDB currently does, eZDBNFS will also copy files locally to each eZ publish instance upon request, in order to avoid querying NFS when data have not changed. Implementation ============== This handler's implementation is mostly based on eZDB. All the parts of eZDB that interact with file data will be replaced with file operations from/to the DFS. Architecture ------------ * X eZ publish web servers - local var folder: `/var/www/var/` - local moint point to the NFS share: `/data/nfs` - uses `eZNFSDB` * 1 NFS server - locally mounted on each frontend * 2 databases - the standard eZ publish relational DB - a cluster DB, with the single ezdfsfile table Current eZDB API interaction ---------------------------- The following table indicates for each method wether it interacts with local data (fs), db metadata (ezdfsfile) or db filedata (ezdfsfile_data). +-----------------------------+------------+-----------+-------+ | Method | FS | Metadata | Data | +=============================+============+===========+=======+ | ezdfsfileHandler | N | N | N | +-----------------------------+------------+-----------+-------+ | loadMetaData | N | R | N | +-----------------------------+------------+-----------+-------+ | fileStore | R | W | W | +-----------------------------+------------+-----------+-------+ | fileStoreContents | R | W | W | +-----------------------------+------------+-----------+-------+ | storeContents | R | W | W | +-----------------------------+------------+-----------+-------+ | fileFetch | W | W | R | +-----------------------------+------------+-----------+-------+ | processCache | R/W | R/W | R/W | +-----------------------------+------------+-----------+-------+ | isFileExpired | N | R | N | +-----------------------------+------------+-----------+-------+ | isLocalFileExpired | R | N | N | +-----------------------------+------------+-----------+-------+ | isDBFileExpired | N | R | N | +-----------------------------+------------+-----------+-------+ | fetchUnique | W | R | R | +-----------------------------+------------+-----------+-------+ | fileFetchContents | R | R | R | +-----------------------------+------------+-----------+-------+ | stat | N | R | R | +-----------------------------+------------+-----------+-------+ | size | N | R | R | +-----------------------------+------------+-----------+-------+ | name | N | N | N | +-----------------------------+------------+-----------+-------+ | fileDeleteByRegex | N | W | W | +-----------------------------+------------+-----------+-------+ | fileDeleteByWildcard | N | W | W | +-----------------------------+------------+-----------+-------+ | fileDeleteByDirList | N | W | W | +-----------------------------+------------+-----------+-------+ | fileDelete | N | W | W | +-----------------------------+------------+-----------+-------+ | delete | N | W | W | +-----------------------------+------------+-----------+-------+ | fileDeleteLocal | W | N | N | +-----------------------------+------------+-----------+-------+ | deleteLocal | W | N | N | +-----------------------------+------------+-----------+-------+ | deleteLocal | W | N | N | +-----------------------------+------------+-----------+-------+ | purge | R/W | R/W | R/W | +-----------------------------+------------+-----------+-------+ | fileExists | N | R | R | +-----------------------------+------------+-----------+-------+ | exists | N | R | R | +-----------------------------+------------+-----------+-------+ | passthrough | N | R | R | +-----------------------------+------------+-----------+-------+ | copy | N | R/W | R/W | +-----------------------------+------------+-----------+-------+ | fileLinkCopy | N | R/W | R/W | +-----------------------------+------------+-----------+-------+ | fileMove | N | R/W | R/W | +-----------------------------+------------+-----------+-------+ | move | N | R/W | R/W | +-----------------------------+------------+-----------+-------+ | getFileList | N | R | R | +-----------------------------+------------+-----------+-------+ | cleanPath | N | R | N | +-----------------------------+------------+-----------+-------+ | startCacheGeneration | N | W | N | +-----------------------------+------------+-----------+-------+ | endCacheGeneration | N | W | N | +-----------------------------+------------+-----------+-------+ | abortCacheGeneration | N | W | N | +-----------------------------+------------+-----------+-------+ | checkCacheGenerationTimeout | N | R | N | +-----------------------------+------------+-----------+-------+ | _cacheType | N | N | N | +-----------------------------+------------+-----------+-------+ | _get | N | N | N | +-----------------------------+------------+-----------+-------+ Handing atomicity ----------------- Atomicity of all file operations is critical. Since this handler would be using 2 storage mediums (DB for metadata, NFS for actual data, we need to make sure all operations are totally secured. No process should be able to access a file during write operations. For instance, when a new file is added to NFS, we have to: * lock this file for writing (stalecache in DB) * write the metadata to the database (stalecache before rename) * write the data to NFS (using the "stale" name) * make the file available for reading by other processes, in an order that will totally prevent readings before the operation is complete. Possible write algorithm of a new file -------------------------------------- 1) start generation * create the database entry of the .generating file * further processes requesting to read this file will be locked it a wait loop since no stale file exists 2) write file metadata to DB 2) write file to NFS * we can safely use the real filename (without .generating here) since the file will not be accessed by any other process (blocked by 1) 3) end generation * delete the .generating entry Possible read algorithm for a file not found locally ---------------------------------------------------- We assume that the file is remotely valid, but doesn't exist / is expired locally. 1) Check file validity in database 2) Copy the file from the local NFS mountpoint to the local folder. * NFS seems to ensure data safety when reading a file: if a file has an open handle and is modified / deleted from NFS during this time, reading can be completed safely since NFS keeps a local copy of opened files. Database -------- Structure ''''''''' This is the SQL CREATE for the database table required by eZDFS:: CREATE TABLE ezdfsfile ( datatype VARCHAR(60) NOT NULL DEFAULT 'application/octet-stream', name TEXT NOT NULL, name_trunk TEXT NOT NULL, name_hash VARCHAR(34) NOT NULL DEFAULT '', scope VARCHAR(20) NOT NULL DEFAULT '', size BIGINT(20) UNSIGNED NOT NULL, mtime INT(11) NOT NULL DEFAULT '0', expired BOOL NOT NULL DEFAULT '0', PRIMARY KEY (name_hash), INDEX ezdfsfile_name (name(250)), INDEX ezdfsfile_name_trunk (name_trunk(250)), INDEX ezdfsfile_mtime (mtime), INDEX ezdfsfile_expired_name (expired, name(250)) ) ENGINE=InnoDB; Fields details '''''''''''''' * ezdfsfile.datatype: File datatype * ezdfsfile.name File path * ezdfsfile.name_trunk File's name trunk. Contains for some types of files (viewcache for instance) the common part that will be used to perform multiple removal operations faster. * ezdfsfile.name_hash MD5 transformed ezdfsfile.name. Used for quick access to a file (faster than ezdfsfile.name * ezdfsfile.scope File's scope * ezdfsfile.size File size in bytes * ezdfsfile.mtime File's mtime, as a unix timestamp * ezdfsfile.expired Will contain 1 if the file is considered as expired (e.g. deleted) Might be deprecated by ezdfsfile.status Ini Settings ------------ New settings are introduced by eZDFS in file.ini:: [ClusteringSettings] # Cluster file handler. # Since 4.1 name of the filehandlers have changed # you may choose between : # - eZFSFileHandler # - eZFS2FileHandler (requires linux or Windows + PHP >= 5.3) # - ezdfsfileHandler # - eZDFSFileHandler: handles NFS mount based architectures using # and it is case sensitive FileHandler=eZFSFileHandler [eZDFSClusteringSettings] # Path to the NFS mount point # Can be relative to the eZ publish root, or absolute MountPointPath= # Database backend DBBackend=eZDFSFileHandlerMySQLBackend DBHost=dbhost DBPort=3306 DBSocket= DBName=cluster DBUser=root DBPassword= DBConnectRetries=3 DBExecuteRetries=20 Misc ==== Issues ------ Class name tests ---------------- In a few places, we test the class returned by eZClusterFileHandler. This is hardcoded and won't work with the new handler. To fix this, an interface, eZClusterFileHandlerInterface has to be created (we can't use eZClusterFileHandler as a root class, as this name is already used for the singleton method eZClusterFileHandler::instance(). Ideas ----- metadata DB check delay ''''''''''''''''''''''' Add a configurable delay that would prevent mtime checks from being performed everytime a file is requested, like 3 or 5 seconds. This would save TONS of DB calls on high traffic site. The possible drawback is that a mix of valid & expired files could be used. Reverse proxies could also cache temporarily invalid pages => HTTP headers ? This will not be implemented for the first version. TODO ==== N/A Testing ======= Note: tests have been implemented in tests/tests/kernel/classes/clusterfilehandlers It is critical that this handler's developement is properly tested, in a unitary way. The first critical consideration is the testing structure itself. This particular handler requires a complex architecture: * NFS server * two local NFS mount points, NOT sharing the same cache * two local eZ publish instances * one local database (of course) * one local cluster database Glossary -------- F1: Frontend 1 F2: Frontend 2 DFS-DB: Cluster database DFS-F1: local mount point on F1 to the NFS server DFS-F2: local mount point on F2 to the NFS server Test examples ------------- Test if a *new* file (does not exist on DB/DFS) is correctly written. The file is created from F1. * Create random file on F1 in var * call eZDFSFileHandler::store() on this file * check if file exists: * Using eZDFSFileHandler::exists() on a new instance * By checking for the file's existence DFS-DB (raw SQL) * By checking for the file's existence on DFS-F1 (system call) Note: a common method can be implemented to test these 3 criterias