New concepts ```````````` In 3.7 or earlier the system kept all languages inside one version and would allow only one person to modify their data. This made it impossible for multiple persons to edit each language separately and at the same time, instead they would have to wait for one language to be updated and published. Languages are now more self-contained and independent. This allows languages to be created and edited separately by multiple users, it also simplifies things like approval since you can consider one language only. The major changes to content model are: - A user only edits one version and language at a time. - Objects can be created in any language on any site-access, the first language of the object is recorded. - Objects can be translated to as many languages as necessary. - Available languages can be controlled per site-access, allowing languages to be filtered. Global translations ------------------- A new table (*ezcontent_language*) is introduced which replaces the old table (*ezcontent_translation*). Each entry in the table gets a new *id* which is a representation of the bitnumber as a value. ie. 2^bitnum. Id 1 (2^0) is reserved by the system. The language is stored with a locale code (e.g. nor-NO) and a name, in addition it also has the field *disabled* which can be used to disable a language on the site. The maximum number of languages is 30. The global table looks like the following:: ezcontent_language +----+----------+--------+--------------------------+ | id | disabled | locale | name | +----+----------+--------+--------------------------+ | 2 | 0 | eng-GB | English (United Kingdom) | | 4 | 0 | rus-RU | Russian | | 8 | 0 | slk-SK | Slovak | | 16 | 0 | eng-US | English (American) | | 32 | 0 | nor-NO | Norwegian (Bokmal) | +----+----------+--------+--------------------------+ Initial language ---------------- Whenever an object is created for the first time it will store the language the object was created in. This allows the system to find one language on the object when none of the others can be used (e.g. filtered away). The value will be stored in the field *initial_language_id* using the *language_id* from the *ezcontent_language* table. The new field looks like:: ezcontentobject +-----+----------------------+ | id | initial_language_id | +-----+----------------------+ | 1 | 2 | (eng-GB) | 2 | 2 | (eng-GB) | 5 | 2 | (eng-GB) | 10 | 2 | (eng-GB) | 42 | 16 | (eng-US) | 44 | 4 | (rus-RU) +-----+----------------------+ Available languages ------------------- To make it easier to figure out which languages are available on an object a new attribute is added which returns the languages as an array of locale codes. The new attribute is called *available_languages* and will use the *language_mask* field combined with the global language table to figure out the correct codes. The result of this attribute is shown below:: Object 1: array( 'eng-GB' ) Object 2: array( 'eng-GB', 'slk-SK' ) Object 5: array( 'eng-GB', 'rus-RU' ) Object 10: array( 'eng-GB' ) Object 42: array( 'eng-US' ) Object 44: array( 'rus-RU' ) Language priority and filtering ------------------------------- With the new concepts is no longer directly known which languages to use per object, instead this must be calculated based on the requested languages in prioritized order and the available languages (in object). The requested languages might be different per site-access so the solution must be fast and scalable. A language priority list is basically a list of languages to show and the first language will get higher priority than the second and so on. Filtering languages ~~~~~~~~~~~~~~~~~~~ To perform proper language filtering and prioritization a new bit-field algorithm is introduced. All languages used in an eZ Publish instance are identified by powers of 2 to be able to use bitmap *and* for selecting rows containing information on objects or attributes in prioritized languages. Id 1 (2^0) is reserved for marking objects which are always available. To show how the bit-field algorithm works, let's have a look at the ezcontentobject table. A new field is introduced to identify languages in which last published version of an object exist. This field contains the sum of id values of these languages. Because id values are powers of 2 it is possible to identify the languages and/or to select objects published in any of prioritized languages:: ezcontentobject +-----+---------------+ | id | language_mask | +-----+---------------+ | 1 | 3 | %00011 | 2 | 10 | %01010 | 5 | 6 | %00110 | 10 | 2 | %00010 | 42 | 16 | %10000 | 44 | 4 | %00100 +-----+---------------+ Note that language_mask 3 in the first row denotes that the language is English (id 2) and the object is always available (id 1), even if the prioritized list does not contain English (eng-GB). For example, to filter objects which exist in Slovak (id 8) or Russian (id 4), we might run the following SQL:: SELECT id FROM ezcontentobject WHERE language_mask & 12 > 0 Whenever the available languages change for the object (i.e. new version published) it the bit-field is updated. Another bit-field is used in tables containing the language_code attribute. This bit-field represents the same information as the language_code but uses the id from ezcontent_language table to make easy to select correct rows with respect to the list of prioritized languages. Consider the following example:: ezcontentobject_attribute +----+---------+------------------+---------------+-------------+ | id | version | contentobject_id | language_code | language_id | +----+---------+------------------+---------------+-------------+ | 10 | 1 | 1 | eng-GB | 3 | | 31 | 1 | 2 | eng-GB | 2 | | 32 | 1 | 2 | slk-SK | 8 | | 46 | 1 | 5 | eng-GB | 2 | | 47 | 1 | 5 | rus-RU | 4 | +----+---------+------------------+---------------+-------------+ To select content object attributes in prioritized languages, we might use a single SQL containing a condition using bit-fields, bitmap *and* operation and simple arithmetic operations (multiplication, division and sum). Consider the following prioritized language list: slk-SK, eng-GB. To check if the attribute is in slk-SK we can check if language_id & 8 is not zero etc. If we perform the following operation:: ( language_id & 8 ) / 2 + language_id & 2 + language_id & 1 we will get the bit-field value having 1 on 0th bit if the attribute is always available (i.e. the object containing it have to be shown even if it is not in any of prioritized languages), having 1 on 1st bit if the attribute is available in eng-GB and having 1 on 2nd bit if the attribute is available in slk-SK. To find out if the attribute is in one of the prioritized languages and that there is no translation of this attribute in more prioritized language, we select those which hold the following condition:: ( ( language_mask - language_id ) & 8 ) / 2 + ( language_mask - language_id ) & 2 + ( language_mask - language_id ) & 1 < ( language_id & 8 ) / 2 + language_id & 2 + language_id & 1 where *language_mask* is the ezcontentobject attribute containing the bit-field of available languages. All languages ~~~~~~~~~~~~~ To solve the issues with cronjobs and admin interfaces which must always list all available languages we introduce a special flag. When used the SQLs will include all objects even if it does not have any of the other languages. Translated attributes --------------------- With the new database concepts it is possible for multiple users editing the same object but in different languages (and versions). This means that storage of attributes is changed since one version does not reflect all languages anymore. In short it means that for one version of an object there will only be attribute data for one language when editing. When the version is published the system will copy the attribute for all other languages from the last published version to this one. The table will then look like:: ezcontentobject_attribute +----+------------------+---------+---------------+-------------+ | id | contentobject_id | version | language_code | language_id | +----+------------------+---------+---------------+-------------+ | 7 | 5 | 1 | eng-GB | 2 | | 8 | 5 | 1 | eng-GB | 2 | | 9 | 5 | 2 | rus-RU | 4 | | 10 | 5 | 2 | rus-RU | 4 | | 7 | 5 | 2 | eng-GB | 2 | | 8 | 5 | 2 | eng-GB | 2 | | 7 | 5 | 3 | eng-GB | 2 | | 8 | 5 | 3 | eng-GB | 2 | | 9 | 5 | 3 | rus-RU | 4 | | 10 | 5 | 3 | rus-RU | 4 | | 7 | 5 | 4 | slk-SK | 8 | | 8 | 5 | 4 | slk-SK | 8 | | 7 | 5 | 4 | eng-GB | 2 | | 8 | 5 | 4 | eng-GB | 2 | | 9 | 5 | 4 | rus-RU | 4 | | 10 | 5 | 4 | rus-RU | 4 | | 22 | 44 | 1 | eng-GB | 2 | | 23 | 44 | 1 | eng-GB | 2 | +----+------------------+---------+---------------+-------------+ Node assignment --------------- As with Translated attributes there are now issues with the node assignments as well. For instance if two users edits the same object in different languages at the same time they will each get a copy of the published node assignment, if they both modify it there will be conflicts when publishing. The avoid this conflict the system will no longer allowed locations to be added, removed or moved from the admin interface (except first version). Any changes to locations will have to be done from the admin interface (locations tab). The system will track the operations in the the node-assignment table and merge the result with the "live" tree structure to properly add, remove or move nodes. Note: It will still be possible for PHP scripts to add, remove and move locations. Object relations ---------------- Currently all relations are now stored only per version and not per language. This means that there will be possible conflicts when two languages are edited at the same time. When publishing a version/language it makes sure the relation list is merged together with the previous published data. This means that removed relations must be marked as removed and not just deleted from the database. Ignoring translation -------------------- Some objects will need to always be available no matter which site-access is used. For instance user objects will need to be fetched even if they are not translated to the current language. The current solution already has this on class attributes but is extended further. Class ~~~~~ A new field called *always_available* (default 0) is added to *class* table which defines if objects of this class are always available. If this is set to 1 any object created from it will always be available. One of the major uses for this is for classes which needs to be always available, for instance users or user groups. When creating objects the objects will copy the current setting in the class, any changes to the setting in the class will not affect existing objects. This allows this setting to be switched per object at a later time. By default eZ Publish will come with this setting turned on for Folders, Users and User Groups. Image datatype ++++++++++++++ The ezimage datatype got some changes to the XML storage due to the new multi-language features. Now it always stores the tag with attribute ID, version and language and it is only updated when a new image is uploaded. An example:: This was required since the system will now copy over attribute data when translation is disabled by using pure SQL commands. Upgrade ------- The system will still work with old entries having empty data in but will fill in the information if it is edited and published. Also old non-translatable attributes have the tag set correctly and will still work. Objects ~~~~~~~ When an object is always available the first bit (bit 0) will be set to 1. This means it will be fetched no matter which priority list is used. It also means that the object can still be translated. For objects which has none of the available languages it will pick the initial language. Workflows ````````` Each version of an object can now only contain language, this means that there is no need to change the workflow system or calls to the 'publish' process. Instead the workflow events *approval* and *multiplexer* will get support for filtering on language, they will store a language mask in the *data_int2* DB field to choose which language to match on, a value of 0 means any language. Permissions ``````````` Being able to restrict access based on language is very useful since a translator might not be allowed to modify the original content but only create a new translation. The following access functions have been given language support: - content/create : specify if user is allowed to create new object in specified language. - content/edit : specify if user has got access to edit the specified language(s) and create new translation for specified language(s). To translate you need *content/read* and *content/edit* for the language to translate into, this means users can translate objects without having *content/create* rights. View-cache `````````` With the new multi-language feature a view-cache is no longer made for one specific language but for a prioritized list of languages, this means that the directory storage must change. We replace the singular language code and site-designs with the name of the site-access e.g. if we have the languages *nor-NO*, *eng-GB* and *ger-DE* in the site-access *no* we get:: var/cache/no/full/ We also optimize the cleaning code to use the *glob* call with *GLOB_BRACE* expansion, e.g:: glob( "var/cache/{admin,mysite}/{full,line}/1/5/15-*.cache", GLOB_BRACE ); This change means that it now uses *AvailableSiteAccessList* and *RelatedSiteAccessList* (*site.ini*) when cleaning up caches. Trash ````` Currently objects are restored from trash by going to content/edit and then getting a new version. This will not work very well with the new system where there is only one language per version. A new view in the content module handles restoration of trash objects (content/restore). It will allow the user to restore to old location or browse for a new location. Cronjobs ```````` Cronjobs are run with all languages enabled to ensure that they can reach any object. The cronjob system enables the global flag for the language class which makes all subsequent operations fetch any object in the system regardless of the language. Configurability ``````````````` Currently the system will use the site-access settings to determine which languages to fetch. However sometimes it is crucial to able to override this per session or function call. This means that most fetches should have an ability to choose the priority list. In addition PHP code can change the current prioritized languages used by all operations by doing:: // Only Norwegian and English are shown, no matter what INI settings are // configured to $languages = array( 'nor-NO', 'eng-GB' ); eZContentLanguage::setPrioritizedLanguages( $languages ); Fetching node/object lists -------------------------- When fetching list (template fetch function) of nodes or objects it is possible to override the default language of the site-access. This can be used to narrow down the languages or to fetch objects in languages normally not accessible. The fetch function has a parameter called *language* which is an array of languages to act as the priority list (instead of the one defined in site-access). The existing parameters *only_translated* and *language* to the fetch functions *content/list*, *content/tree*, *content/list_count* and *content/tree_count* are deprecated since they are not valid anymore. If they are used the system will give a warning about it. The *treemenu()* operator also has the *language* parameter. API ``` Handling languages on the site is done trough the new class eZContentLanguage, it has methods to fetch the current priority list, full language list or to manipulate the available languages for the current request. Site preferences ---------------- To control which languages are used on the site (or site-access) some new INI settings are available, they are placed under the *RegionalSettings* group in *site.ini*. To control the available languages and their priority the setting *SiteLanguageList* is used, it is an array with language codes (from locale) which are used, the ones appearing at the top will get higher priority than the next one. e.g. to setup a site with two languages, English and German you would do:: [RegionalSettings] SiteLanguageList[] SiteLanguageList[]=eng-GB SiteLanguageList[]=ger-DE Changing the order of languages (e.g. German first) is simply rearranging the setting like this:: [RegionalSettings] SiteLanguageList[] SiteLanguageList[]=ger-DE SiteLanguageList[]=eng-GB If this setting is not added the system will use the old *ContentObjectLocale* setting which means only one language is shown. Also since it useful to show all languages another setting is available which is called *ShowUntranslatedObjects*. It is very often used for the administration page but also be used for any site-access you choose. If this setting is enabled the system will not filter away languages but will still use *SiteLanguageList* for priority. To enable this setting you would do:: [RegionalSettings] ShowUntranslatedObjects=enabled eZContentObject::defaultLanguage() ---------------------------------- This static method now behaves a bit different than earlier. It will now return the top-most language in the language list or the value of *ContentObjectLocale* if it is set. If you have been using this in your code you should see if it still is the correct method to use, using the attribute *initial_language_code* on the object might be of more use since it returns the first language for the object. User interface `````````````` The user interface (admin) has been updated because of the enhanced multi-language features. The main idea is to make it easier for the users to manage translations. List languages in draft list ---------------------------- Each language of an object now shows up as a separate entry in the draft list. If a user edits two languages on the same objects it will show two entries. Creating and editing -------------------- When creating a new object it is now possible to choose which language it should be made in using a drop-down list. The available languages are filtered based on which class is chosen. If JavaScript is not enabled no drop-down is shown but instead the user gets an extra screen asking the user to choose the language. When editing an object the system will give the user a language choice if the language is not set in the URL. The user can edit an existing language or create a new one (if possible). Editing multiple languages -------------------------- To ease the translation process it will still be able to edit two or more translations (UI wise) at the same time, however the process is changed from earlier. Internally the system actually edits two versions of the same object. Context menus ------------- The JS popup menus must has an extra sub-menu, it contains the languages the object can be edited in or the choice to create a new translation (Another). Changes to existing functionality ````````````````````````````````` While the multi-language features for 3.8 tried very hard to not change any setting or interface which already present (only adding), there were some points which needed changes, they are explained here: content/action -------------- Action AddAssignment/SelectAssignmentLocation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This action no longer creates a new version of the object for adding a location but instead creates the node immediately. Action RemoveAssignment ~~~~~~~~~~~~~~~~~~~~~~~ This action no longer accepts the POST variable *AssignmentIDSelection* which contains list of node-assignment ids. Instead use the variable *LocationIDSelection* and fill it with Node IDs. The template *location.tpl* (admin) has also been changed to use this new variable and fetching locations from the node table and not node-assignments. Outtakes ```````` Due to technical constraints the following items were not able to make it into 3.8. URL aliases ----------- For instance if you have a French site where you only show french translations of the URLs it is desirable to show the french URL alias for the users. The translated URL will point to the same internal URL but will use language priority to choose what to show. Instead of introducing a new table for this or changing the subtree table, the existing URL alias table is extended. This table will now have a new field called *language_code* which tells which language the url is made for. The system will then create/update these entries for all languages of an object when it is published. The table will look like:: ezurlalias +---------------------------+-------------+--------------------------------------+ | destination_url | language_id | path_id_string | +---------------------------+-------------+--------------------------------------+ | content/view/full/2 | 2 | | | content/view/full/13 | 32 | folder_1/pingvinen_har_en_megafon | | content/view/full/13 | 2 | folder_1/the_penguin_has_a_megaphone | +---------------------------+-------------+--------------------------------------+ The PHP class eZContentObjectTreeNode will have this translated url-alias available as a function attribute (*localized_path*), if it is null in the object it will fetch it from the DB. To avoid having to perform lost of repeated SQL calls for each node all fetches nodes should be associated with some collection, then we can go over the nodes in the collection and collect multiple node IDs and use that for the SQL. (The existing cache system might also suffice). The fetch calls should get the possibility to fetch the name and path immediately, this means that if you know you will use the name and/or path it can be fetched in one go (after the main SQL). A new parameter is added to the fetch functions for this. Search engine ------------- Add *language_code* on word table, this will separate words per language and solve the issue when one word exists in two or more languages but with different meaning. Another issues it solves is when you search in specific language and the word you are looking for does not exist in this language but in other ones, then there is no need to include that in the search (it could even tell the user that). An example on how it could look:: ezsearch_word +------+--------------+-------+-------------+ | id | object_count | word | language_id | +------+--------------+-------+-------------+ | 1 | 5 | to | 2 | | 2 | 2 | to | 32 | | 3 | 1 | kaker | 32 | | 4 | 1 | go | 2 | +------+--------------+-------+-------------+ Search UI --------- When searching it must be possible to override the default language of the site-access. This can be used to narrow down the languages or to fetch objects in languages normally not accessible. The language must be able to be specified in two forms: - Using a parameter to the search template fetch function. - Using a GET/POST parameter from an HTML form. The languages which can be chosen must be one of the languages defined in the site-access. Choosing translation per node ----------------------------- The user should be able to choose which translation to use per placement. This means that the priority list would change depending on which node is chosen, e.g. the Norwegian article is only shown in the Norwegian sub-tree while the English is shown in the English sub-tree. This is a highly complicated technical issue which cannot be solved without major changes to the tree structure. .. Local Variables: mode: rst fill-column: 79 End: vim: et syn=rst tw=79