Clarida Data Warehouse


The Clarida Data Warehouse™ is a high-performance data storage and indexing solution. The system combines high availability, horizontal scalability and flexible data formats to meet the needs of data-intensive organisations. Moreover, full indexing of the database contents is included out of the box, allowing not only traditional queries on structured data fields but also real-time full-text search.

The Clarida Data Warehouse™ solution contains advanced security features such as transport encryption, role-based access control and Kerberos authentication, making the system capable of securely handling sensitive data. The system was designed to be compatible with any software environment in use at the client’s organisation, making data migration a relatively straightforward process.

The infographic below shows a schematic overview of the Clarida Data Warehouse™ architecture.



Features


The Clarida Data Warehouse™ offers a wide range of features ready for use by organisations of any size. Here are some of them.

High Availability

Computing capacity can be added to the Clarida Data Warehouse™ dynamically: additional replication nodes are instantiated automatically, based on the query and data redundancy requirements of a specific SLA.

Horizontal Scalability

The Clarida Data Warehouse™ can be sharded over multiple servers. When the data volumes presented to the Clarida Data Warehouse™ grow significantly, new storage servers can be created automatically.
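
As a rough illustration of what sharding means in practice, the sketch below assigns each document to one of several storage servers by hashing its identifier. The source does not say how documents are assigned to shards, so hash-based partitioning is an assumption made here, and the server names and the `shard_for` and `route` functions are hypothetical.

```python
import hashlib

# Hypothetical storage server names, used only for illustration.
SHARDS = ["storage-1", "storage-2", "storage-3"]


def shard_for(document_id: str, shard_count: int) -> int:
    """Map a document ID to a shard index using a stable hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a shard index.
    return int.from_bytes(digest[:8], "big") % shard_count


def route(document_id: str, shards=SHARDS) -> str:
    """Return the storage server that would hold the given document."""
    return shards[shard_for(document_id, len(shards))]
```

The same identifier always maps to the same server, so reads can find what writes stored. Note that with this simple modulo scheme, adding a server changes the mapping for most identifiers; production systems typically use a scheme that limits such data movement.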

Fully Searchable Database

Any data field in the database can be searched and queried using the Clarida Data Warehouse™ web service interface. Advanced Natural Language Processing techniques such as stemming, stop word removal and sentence detection are available.
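
To make the three techniques named above concrete, here is a deliberately naive sketch of sentence detection, stop word removal and stemming. It is not the product's analysis pipeline: the stop word list, the suffix rules and the function names are all assumptions chosen for brevity, and real stemmers are considerably more careful.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}
# Naive suffix stripping, checked in this order; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyse(text):
    """Lowercase and tokenise the text, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]
```

With this sketch, "indexing" and "indexed" both reduce to "index", which is why a search for one form can match documents containing the other.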

High-level Data Security

State-of-the-art data encryption and user authentication methods are supported to prevent unauthorised access to sensitive data. These techniques also protect data integrity by preventing tampering by unauthorised individuals.

Real-time Performance

All instances of the Clarida Data Warehouse™ can be replicated automatically to handle increasing data loads. This maintains real-time performance even for big data.

Cross Platform Availability

Run Clarida Technologies software on the operating system of your choice. There is no need for expensive hardware: install our software on standard hardware and start using it immediately.

Flexible Data Model

Any type of data can be stored in the Clarida Data Warehouse™ system. Among others, we support standard formats such as XML and JSON.
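
One way to picture a flexible data model is that JSON and XML inputs both end up as the same kind of nested document. The sketch below shows this with the Python standard library only. It is an assumption about how such a conversion could look, not a description of the product's ingestion code; the function names are hypothetical, and XML attributes are ignored to keep it short.

```python
import json
import xml.etree.ElementTree as ET


def from_json(payload: str) -> dict:
    """Parse a JSON object into a nested dictionary."""
    document = json.loads(payload)
    if not isinstance(document, dict):
        raise ValueError("expected a JSON object at the top level")
    return document


def element_to_dict(element):
    """Convert an XML element to text (leaf) or a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def from_xml(payload: str) -> dict:
    """Parse an XML document into a nested dictionary keyed by the root tag."""
    root = ET.fromstring(payload)
    return {root.tag: element_to_dict(root)}
```

For example, `from_xml("<customer><name>Ada</name></customer>")` and `from_json('{"customer": {"name": "Ada"}}')` produce the same dictionary.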

Open-ended Architecture

The Clarida Data Warehouse™ was designed as a modular system and can be expanded to handle the changing data needs of the client’s organisation.

Deployment

The Clarida Data Warehouse™ can be deployed on the premises of the client’s organisation in a matter of hours. It can be deployed on any hardware; both legacy and modern hardware are supported.

Data Warehouse Architecture

The Clarida Data Warehouse™ is a modular system consisting of several components. The actual data storage is performed using MongoDB. All structured and unstructured data in the MongoDB database is indexed using Apache Solr, making the database fully searchable. The load balancers distribute incoming query requests across the available database and indexer instances.

Other applications can communicate with the Clarida Data Warehouse™ using REST API web services. These web services pass incoming requests to the request handler, which interfaces with the core of the system, thereby shielding end users from the complexities of its inner workings.

The Clarida Data Warehouse™ uses MongoDB as its core data storage solution. MongoDB is an open source NoSQL database which provides high-performance data persistence. For document-oriented workloads, both storage and retrieval can be significantly faster than in a traditional SQL database solution. MongoDB is used to store any type of document, such as JSON and XML. The MongoDB database can be sharded to cope with dynamically growing data volumes without losing performance. Moreover, the database can be fully replicated over multiple nodes to provide high availability, data redundancy and automatic failover. All communications with the database are encrypted using SSL; user authentication can be performed using X.509 certificates or Kerberos.

The Clarida Data Warehouse™ uses Apache Solr for indexing the entire contents of the database. Solr is an open source indexing engine which can handle both structured data, such as dates, times, numbers and geospatial codes, and unstructured data in the form of free text. The index built by Solr allows rapid document retrieval and query execution on any data field available in the database. Indexing can be performed in real time, making documents newly added to the database immediately searchable. To cope with growing data volumes, instances of Solr can be automatically replicated or distributed to maintain optimal indexing performance and query execution speeds. All communications with the Solr indexing engine are fully encrypted using SSL, protecting the privacy of sensitive data. The Natural Language Processing features of Solr allow complicated linguistic constructs to be parsed and the free text contents of the database to be queried in advanced ways. Widely spoken languages are supported out of the box, including Dutch, German, French, Chinese and English.
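
Solr is built on Apache Lucene, whose core data structure is the inverted index: a mapping from each term to the documents that contain it. The toy version below is not Solr's implementation and omits analysis, ranking and persistence; the `InvertedIndex` class is a hypothetical name used only to show why term lookups stay fast and why a newly added document is searchable as soon as it has been indexed.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index mapping each term to a set of document IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index a document; it is searchable as soon as this returns."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the IDs of documents containing every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

A query is answered by intersecting a few sets rather than scanning every stored document, which is what makes retrieval fast on large collections.
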

Customers of the Clarida Data Warehouse™ solution have the option to replicate the database contents over several data servers. This gives the organisation data redundancy, preventing data loss caused by hardware malfunction, and also provides additional computing resources to handle incoming requests. The Core Load Balancers distribute incoming traffic intelligently over the available storage nodes to make optimal use of computing power and minimise response times. The load balancers also keep the system functioning correctly if one or more storage nodes fail due to hardware malfunction.
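
The behaviour described above, spreading requests over nodes and skipping failed ones, can be sketched as follows. The source does not name the balancing strategy, so plain round-robin is an assumption here, and the `RoundRobinBalancer` class and its method names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out nodes in turn, skipping any node marked as down."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._cycle = itertools.cycle(self._nodes)

    def mark_down(self, node):
        """Record that a node has failed and must not receive traffic."""
        self._healthy.discard(node)

    def mark_up(self, node):
        """Return a known node to service."""
        if node in self._nodes:
            self._healthy.add(node)

    def next_node(self):
        """Return the next healthy node, or raise if none is available."""
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

In a real deployment the health information would come from periodic health checks rather than explicit `mark_down` calls.
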

The Request Handler deals with all incoming web service requests and interfaces with the core of the system to fulfil each request in real time. The Request Handler hides the complexities of the system's inner workings, so that new web services can be developed and added easily over time. The functionality of the Clarida Data Warehouse™ can therefore be expanded without knock-on effects on the core of the system.
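
A minimal sketch of this idea is a handler that keeps a registry of services: adding a feature means registering one more function, and the dispatch logic stays untouched. The class, its method names and the response shape are assumptions made for illustration, not the product's interface.

```python
class RequestHandler:
    """Dispatch named requests to registered service functions."""

    def __init__(self):
        self._services = {}

    def register(self, name, func):
        """Add a new service without changing the dispatch logic."""
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = func

    def handle(self, name, payload):
        """Run the named service and wrap its result in a response dict."""
        service = self._services.get(name)
        if service is None:
            return {"status": 404, "error": f"unknown service: {name}"}
        return {"status": 200, "result": service(payload)}


# Usage: register a hypothetical service, then route a request to it.
handler = RequestHandler()
handler.register("count_fields", lambda document: len(document))
response = handler.handle("count_fields", {"name": "Ada", "city": "London"})
```
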

Multiple instances of the Request Handler can be active simultaneously to provide high availability and reduce potential bottlenecks. The Request Load Balancers distribute incoming traffic intelligently over the available request handlers to make optimal use of computing power and minimise response times. As an additional benefit, the Request Load Balancers provide automatic failover in case of hardware malfunction.

Multiple web services are offered out of the box to handle incoming client requests, such as document storage, document retrieval, database queries and aggregate data statistics. These web services can be extended to offer additional features to the end user without modifying the core of the system. They are accessed through a REST API.
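
The source does not document the REST endpoints, so the sketch below only illustrates the general shape a client call could take. The base URL, the paths and both function names are hypothetical; the requests are built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; the real service address is not given in the source.
BASE_URL = "https://warehouse.example.com/api"


def build_store_request(collection, document):
    """Build a POST request that would store a JSON document."""
    body = json.dumps(document).encode("utf-8")
    url = f"{BASE_URL}/{urllib.parse.quote(collection)}/documents"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_query_request(collection, **filters):
    """Build a GET request that would query documents by field values."""
    query = urllib.parse.urlencode(filters)
    url = f"{BASE_URL}/{urllib.parse.quote(collection)}/documents?{query}"
    return urllib.request.Request(url, method="GET")
```

Passing either request to `urllib.request.urlopen` would send it; against a real deployment the URL, authentication and TLS settings would need to match that installation.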

Have a question?

We'd love to hear from you. Get in touch in one of the following ways:
EMAIL: info@claridatech.com
PHONE: +44 203 5144 631
ADDRESS: Regent Street 207, London, United Kingdom
