Magus web content engineers


 


Our technology

Our technology is designed around a single goal: the precise delivery of content from the web.

Web content is our raw material, and Magus is dedicated to harnessing its full potential for our clients through our suite of state-of-the-art managed applications and custom-engineered solutions.

Our powerful web content technology platform underpins all our solutions. It is robust, scalable and versatile, and there is strong demand for bespoke solutions that use its specialist functionality.

Our platform

Trawling

Trawling is the process of downloading all or part of a target website. Several spider applications are available to the system, each with particular characteristics and strengths. For example, our in-house spider has been developed to traverse JavaScript navigation systems and areas of websites accessed through forms-based interfaces, such as database-driven pages.
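The page does not describe how the Magus spider works internally. As a rough illustration of the basic trawling loop only, here is a minimal breadth-first crawler in Python that stays on the start URL's host. The `fetch` callable, the `trawl` and `LinkParser` names and the `max_pages` limit are assumptions made for this sketch; it follows plain `<a href>` links only and does not attempt JavaScript navigation or form submission.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def trawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of pages on the same host as start_url.

    fetch(url) is a caller-supplied (hypothetical) function that returns the
    page's HTML as a string, or None if the page cannot be retrieved.
    Returns a dict mapping each downloaded URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the target site and never queue a URL twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with an in-memory stand-in for the web.
site = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.org/">Other</a>',
    "http://example.com/a": '<a href="/">Home</a> <a href="b#top">B</a>',
    "http://example.com/b": "<p>Leaf page</p>",
}
pages = trawl("http://example.com/", site.get)
print(sorted(pages))  # prints ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same loop could sit on top of an HTTP client, an FTP client or a test fixture like the dictionary used here.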

Few, if any, other spiders have the complete range of capabilities that we have developed. As well as retrieving content from the free web, we can access subscription, RSS and FTP sites. We can also trawl proprietary content from syndicated content providers such as Factiva and Lexis-Nexis.
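Of the sources listed above, RSS is the simplest to illustrate. The following is a minimal sketch, assuming an RSS 2.0 feed, that uses Python's standard `xml.etree.ElementTree` module to pull the title, link and publication date from each item. The `parse_rss` name and the sample feed are hypothetical; subscription logins, FTP access and proprietary feeds are not covered.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return (title, link, pubDate) tuples for every <item> in an RSS 2.0 feed.

    Missing child elements come back as empty strings.
    """
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        )
        for item in root.iter("item")
    ]


feed = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First story</title><link>http://example.com/1</link>
<pubDate>Mon, 06 Sep 2004 16:45:00 GMT</pubDate></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

for title, link, published in parse_rss(feed):
    print(title, link, published or "(no date)")
```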

Harvesting

Downloaded pages are 'cleaned' by applying a content pattern that removes unwanted elements such as HTML markup, navigation sidebars and repeated content. This produces an optimised index that is clean, fast and accurate.
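The page does not say what form a content pattern takes. One simple interpretation, sketched here under that assumption, is a list of page regions to discard: the parser below drops everything inside the tags named in the hypothetical `SKIP_TAGS` set, strips the remaining markup and normalises whitespace. Real pages usually need site-specific rules (for example matching on `class` or `id` attributes) rather than tag names alone.

```python
from html.parser import HTMLParser

# Hypothetical content pattern: elements whose entire contents are unwanted.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer"}


class Cleaner(HTMLParser):
    """Keeps visible text, discarding markup and anything inside SKIP_TAGS."""

    def __init__(self, skip_tags=SKIP_TAGS):
        super().__init__()
        self.skip_tags = skip_tags
        self.skip_depth = 0  # greater than 0 while inside an unwanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def clean(html):
    cleaner = Cleaner()
    cleaner.feed(html)
    cleaner.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(cleaner.chunks).split())


page = """<html><body>
<nav><a href="/">Home</a> | <a href="/news">News</a></nav>
<h1>Rates held</h1>
<p>The bank left rates <b>unchanged</b>.</p>
<footer>Copyright 2004</footer>
</body></html>"""
print(clean(page))  # prints: Rates held The bank left rates unchanged.
```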

In addition, content elements such as the news headline and publication date can be extracted as required by the target application.
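Field extraction can be sketched in the same spirit. The example below assumes a hypothetical site layout in which the headline sits in an `<h1>` element and the date in a `<span class="date">` written like `6 September 2004`; the regular expressions and the `extract` function are illustrative only, not Magus's actual patterns.

```python
import re
from datetime import datetime

# Hypothetical site-specific patterns for one target site's page layout.
HEADLINE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
DATE_RE = re.compile(r'<span class="date">(.*?)</span>', re.DOTALL)


def extract(html):
    """Return the headline and publication date, or None for a missing field."""
    headline = HEADLINE_RE.search(html)
    published = DATE_RE.search(html)
    return {
        "headline": headline.group(1).strip() if headline else None,
        # %B expects English month names under the default C locale.
        "date": (
            datetime.strptime(published.group(1).strip(), "%d %B %Y").date()
            if published
            else None
        ),
    }


page = '<h1 class="title">Rates held</h1> <span class="date">6 September 2004</span>'
record = extract(page)
print(record["headline"], record["date"].isoformat())  # prints: Rates held 2004-09-06
```

Because the patterns are tied to one site's markup, a redesign that moves the headline or date out of the expected elements makes the corresponding field come back as `None`, and a changed date format raises `ValueError`; either is a signal that the pattern needs updating.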

The harvesting process is fully automated and requires human input only when the target site's design or structure changes.

Indexing

Magus employs the highly scalable and customisable text search engine Lucene (part of the Apache Jakarta project) to handle all indexing duties.

An index is built for each target website and then these are merged, as required, to provide larger indices covering tens to thousands of websites.
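Lucene provides its own index-merging support (for example the `addIndexes` methods on `IndexWriter`), so the following is not how Magus or Lucene implements merging. It is a toy inverted index in Python, mapping each term to the set of documents that contain it, intended only to show why per-site indices can be combined cheaply: merging is a union of posting sets, term by term. The function names and the `site:doc` ID scheme are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(site, docs):
    """Build a toy inverted index for one site: term -> set of document IDs.

    IDs are prefixed with the site name so they stay unique after merging.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(f"{site}:{doc_id}")
    return index


def merge_indices(*indices):
    """Combine per-site indices by taking the union of each term's postings."""
    merged = defaultdict(set)
    for index in indices:
        for term, postings in index.items():
            merged[term] |= postings
    return merged


site_a = build_index("siteA", {"1": "Rates held", "2": "Oil prices rise"})
site_b = build_index("siteB", {"1": "Rates cut again"})
combined = merge_indices(site_a, site_b)
print(sorted(combined["rates"]))  # prints ['siteA:1', 'siteB:1']
```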

Searching

Our query engine is fast and highly customisable. A particular strength is the speed with which it returns results for long, complex query strings.
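The page does not document the query syntax. As a hedged sketch, the function below evaluates queries over a term-to-documents index using `+term` (required) and `-term` (excluded) prefixes, loosely modelled on Lucene's classic query syntax. It is not Lucene's parser, and its ranking (a count of matching optional terms) is a deliberate simplification of real relevance scoring; the `search` name and sample index are hypothetical.

```python
def search(index, query):
    """Evaluate a query such as "+rates -oil cut held" against an inverted index.

    index maps each term to the set of IDs of documents containing it.
    +term  the document must contain the term
    -term  the document must not contain the term
    term   optional; if no term is required, at least one optional term must match
    Hits are ranked by how many optional terms they contain, then by ID.
    """
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)

    def postings(term):
        return index.get(term, set())

    # Both branches build a new set, so the index itself is never modified.
    if required:
        hits = set.intersection(*(postings(t) for t in required))
    else:
        hits = set().union(*(postings(t) for t in optional))
    for term in excluded:
        hits -= postings(term)

    def score(doc_id):
        return sum(doc_id in postings(t) for t in optional)

    return sorted(hits, key=lambda doc_id: (-score(doc_id), doc_id))


index = {
    "rates": {"d1", "d2", "d3"},
    "held": {"d1"},
    "cut": {"d2"},
    "oil": {"d3"},
}
print(search(index, "+rates -oil cut"))  # prints ['d2', 'd1']
print(search(index, "held oil"))         # prints ['d1', 'd3']
```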

Hosting

Our servers are located at one of Europe's premium dedicated co-location facilities, guaranteeing sub-second response times, unparalleled uptime and industrial-strength reliability.

Our standard SLA (Service Level Agreement) considerably exceeds the baseline level of service that most other companies guarantee. Our service includes:

  • Minimum uptime guarantee of 99.9%
  • High speed, load-managed servers with multiple redundancy
  • Dedicated switched Ethernet connectivity direct to the internet over the COLT Telecom backbone
  • Fault-tolerant architecture for network infrastructure, redundant power supply and environmental control
  • 24x7x365 availability and support
  • Full disaster recovery procedures
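To put the 99.9% uptime figure in concrete terms, the downtime it allows is the measurement period multiplied by 0.1%. The page does not say what period the SLA is measured over, so the sketch below takes it as a parameter; the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent, period_days):
    """Maximum downtime, in minutes, that an uptime guarantee permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


print(round(allowed_downtime_minutes(99.9, 365), 1))  # prints 525.6
print(round(allowed_downtime_minutes(99.9, 30), 1))   # prints 43.2
```

Over a 365-day year this comes to roughly 525.6 minutes, or about 8.76 hours; over a 30-day month, roughly 43.2 minutes.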

Bespoke solutions

We undertake bespoke projects that harness our specialist framework.




Copyright Magus Research Ltd