Methods for Web Content Analysis and Context Detection

This project was part of Portland State University’s senior capstone program. It is the work of seven students over the course of six months. For the duration of the project we worked with a Mozilla adviser, Dietrich Ayala, to keep on track with the project’s original requirements. The team was composed of the following students:

Overview and Goals of the Project

This project was a research-intensive proof of concept for a feature that would expand the reader mode to content beyond articles, for one of many Mozilla projects. We set out to solve the problem of how to “put the internet back in the hands of the user”, as web pages are often bloated with unnecessary content that degrades the user experience.

In developing nations, where low-powered smartphones and slow internet connections are common, this bloat raises the computing cost of browsing and drains battery life. In our research we divided the problem into four main areas: the quality of the user’s internet connection, the user’s target device, what content is important to the user, and whether the data is accessible to people with disabilities.

For example, the graph below shows that, for one of the web pages tested, the difference in data usage with and without reader mode was nearly 6 MB.

Data Usage

By understanding which parts of a web page are content and which are not, we can limit data usage by downloading only the relevant content. In addition, grabbing only what’s necessary from a site opens the possibility of the user’s device optimizing the view of this data.

This transformation of the data for contextual presentation can be used to improve accessibility, or enable alternate browser models. We outline several possible efficient methods of content analysis. Ultimately, we found that currently available tools solve only a subset of the problems identified. However, by utilizing several of these tools and the concepts explored in our research paper, we believe it is possible to implement such a feature.

What does this mean for an everyday web developer? Imagine smarter tooling for content analysis, detection, and optimization that could be built as advanced features of the browser in the near future. Imagine developer tools that would make building website accessibility and platform-specific features far easier and less costly than it is today.

Read on to learn more about our findings and the research we designed to test our ideas.

Installation & Usage

The process outlined in our paper is referred to as “Minimum Contextualization”, or contextualization for short. This process is split into three main phases: Content analysis, content filtration and content transformation. Each of these phases has several steps.

Phoenix-node is a command line application written in Node.js that we developed to analyze HTML document structure. It relies on Node.js 4.0+, the npm package manager, and the jsdom npm package and its dependencies.

  1. Install Node.js 4.0+ following the instructions for your environment:
  2. Clone the Phoenix-node repository from
  3. Install jsdom into the source directory with ‘npm install jsdom’. A node_modules folder will be made.
  4. Run phoenix-node parsing with ‘node alt.js’. This will print the DOM structure to the terminal.
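The kind of structural printout described in step 4 can be approximated with a short recursive walk. This is a minimal sketch, not the code in alt.js: the name `outlineDom` is hypothetical, and the function assumes only that each node exposes `tagName` and `children`, which holds for elements produced by jsdom as well as for the plain objects used here as a stand-in.

```javascript
'use strict';

// Hypothetical helper: render a DOM-like tree as an indented outline.
// Assumes every node has a `tagName` string and an iterable `children`.
function outlineDom(node, depth) {
  depth = depth || 0;
  const line = '  '.repeat(depth) + node.tagName.toLowerCase();
  const childLines = Array.from(node.children || []).map(function (child) {
    return outlineDom(child, depth + 1);
  });
  return [line].concat(childLines).join('\n');
}

// Plain-object stand-in for a parsed document, so the sketch runs
// without installing jsdom.
const sampleTree = {
  tagName: 'HTML',
  children: [
    { tagName: 'HEAD', children: [{ tagName: 'TITLE', children: [] }] },
    {
      tagName: 'BODY',
      children: [
        { tagName: 'ARTICLE', children: [] },
        { tagName: 'ASIDE', children: [] },
      ],
    },
  ],
};

console.log(outlineDom(sampleTree));
```

With jsdom installed, the same function could be given the root element of a parsed page in place of `sampleTree`.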

Phoenix Output


Research Findings

Our research identified three major phases in the contextualization process: content analysis, content filtration and content transformation. Our findings focus on content analysis. Content filtration and content transformation are not covered in our research.



For content analysis, we recommend two distinct steps: The first step should identify which “Structure Group” a site falls into by utilizing cluster analysis of document structures. In the second step, one of several methods can be used to parse through the site to determine which content is essential for the user to understand its meaning. For example, if a site is placed into a cluster which is text-heavy and has little to no other content, then basic reader mode features are sufficient for this, such as shallow-text methods. Otherwise a more advanced method must be used, such as semantic segment detection (discussed further in our paper).
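For the text-heavy case, a shallow-text method can decide block by block using cheap features such as word count and link density. The sketch below is an illustration under stated assumptions, not the method from our paper: the block shape (`text`, `linkText`), the function names, and both thresholds are hypothetical.

```javascript
'use strict';

// Hypothetical thresholds, chosen for illustration only.
const MIN_WORDS = 15;
const MAX_LINK_DENSITY = 0.33;

// Count whitespace-separated words in a string.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// A block is { text, linkText }, where linkText is the part of the text
// that sits inside anchor elements. Long blocks with few links are
// treated as content; short or link-heavy blocks as page chrome.
function isLikelyContent(block) {
  const words = countWords(block.text);
  if (words === 0) {
    return false;
  }
  const linkDensity = countWords(block.linkText) / words;
  return words >= MIN_WORDS && linkDensity <= MAX_LINK_DENSITY;
}

const blocks = [
  {
    text: 'Home News Sports Weather Contact',
    linkText: 'Home News Sports Weather Contact',
  },
  {
    text: 'Web pages are often bloated with unnecessary content that ' +
      'degrades the user experience, especially on low-powered phones ' +
      'with slow connections.',
    linkText: '',
  },
];

console.log(blocks.map(isLikelyContent)); // [ false, true ]
```

The navigation bar is rejected because it is short and consists entirely of link text, while the paragraph passes both checks.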

Through our research we learned about the limitations inherent in modern reader mode techniques and the status of similar research. Our team’s recommended method for content analysis and context detection is to use cluster analysis to group sites with similar structures together and to learn the archetypal structure of each cluster.
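As a rough illustration of grouping pages by structure, the sketch below represents each page as a vector of tag counts and groups pages with leader clustering, a deliberately simple stand-in for whichever clustering algorithm is ultimately chosen. The tag list, the similarity threshold, the sample pages, and all function names are assumptions made for this example.

```javascript
'use strict';

// Feature order for the structure vectors (hypothetical choice of tags).
const TAGS = ['p', 'a', 'img', 'div'];

// Turn a list of tag names into a count vector over TAGS.
function structureVector(tagNames) {
  return TAGS.map(function (tag) {
    return tagNames.filter(function (name) { return name === tag; }).length;
  });
}

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  const dot = a.reduce(function (sum, value, i) { return sum + value * b[i]; }, 0);
  const norm = function (v) {
    return Math.sqrt(v.reduce(function (sum, value) { return sum + value * value; }, 0));
  };
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Leader clustering: each page joins the first cluster whose leader is
// similar enough; otherwise it starts a new cluster and becomes its leader.
function clusterPages(pages, threshold) {
  const clusters = [];
  pages.forEach(function (page) {
    const vector = structureVector(page.tags);
    const home = clusters.find(function (cluster) {
      return cosineSimilarity(cluster.leader, vector) >= threshold;
    });
    if (home) {
      home.members.push(page.name);
    } else {
      clusters.push({ leader: vector, members: [page.name] });
    }
  });
  return clusters;
}

// Made-up pages, each reduced to the tags it contains.
const pages = [
  { name: 'article-a', tags: ['p', 'p', 'p', 'p', 'a', 'div'] },
  { name: 'gallery', tags: ['img', 'img', 'img', 'img', 'a', 'div'] },
  { name: 'article-b', tags: ['p', 'p', 'p', 'a', 'div', 'div'] },
];

console.log(clusterPages(pages, 0.8).map(function (c) { return c.members; }));
```

Here the two paragraph-heavy pages land in one cluster and the image-heavy page in another. The leader of each cluster plays the role of a crude archetype; a fuller implementation would more likely average its members.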

Read the full paper here:

Methods for Web Content Analysis and Context Detection

View full post on Mozilla Hacks – the Web developer blog

CSS source map support, network performance analysis & more – Firefox Developer Tools Episode 29

Firefox 29 was just uplifted to the Aurora release channel. This means that it is time to report some of the major changes that you can expect to see inside of the Developer Tools for this release.

Better Looking Tools

In addition to new features, we have been updating the look and feel of our dark and light themes. The light theme has been completely overhauled, and both themes feature a more consistent design throughout the toolbox. Your current theme can be changed from the Toolbox settings. (development notes)

Network Monitor

The Network Monitor now shows you how long it takes the browser to load different parts of your page. This will help measure the network performance of applications, both on first-run and with a primed cache. (development notes)

To open the performance analysis tool, click the stopwatch icon in the network panel. For more information, watch the screencast below or read more on MDN.

You can now copy an image request as a Data URI. Just right click on the image request, select the item from the context menu, and the Data URI will be on your clipboard. (development notes)
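For readers unfamiliar with the format, a Data URI is the resource’s MIME type followed by its bytes, usually base64-encoded. The sketch below builds one in Node.js; it illustrates the format only and is not how the Network Monitor produces its output. The helper name `toDataUri` is hypothetical.

```javascript
'use strict';

// Hypothetical helper: build a base64 data: URI from raw bytes and a MIME type.
function toDataUri(bytes, mimeType) {
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}

// A two-byte text payload keeps the example small.
console.log(toDataUri(Buffer.from('hi'), 'text/plain'));
// data:text/plain;base64,aGk=
```

For an image, the bytes would come from the image file and the MIME type would be, for example, `image/png`.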


Inspector

We’ve updated the inspector highlighter behavior to bring the highlighting functionality more in line with other tools. (development notes)

CSS transform preview tooltips have been added to the CSS rule view. Now, if you hover over a CSS transform, you will get a tooltip with a visualization of the transform. Grab a download of Firefox Nightly or Aurora and try it out on some live CSS transform examples. (development notes)

The CSS rule view now supports pasting multiple CSS declarations at once, such as “background: #ccc; color: red”. (development notes)
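To make concrete what accepting several declarations in one paste involves, here is a naive sketch that splits such a string into property/value pairs. It is not the rule view’s implementation, the name `parseDeclarations` is hypothetical, and it does not handle semicolons or colons inside strings or `url()` values.

```javascript
'use strict';

// Hypothetical, deliberately naive parser for a pasted declaration list.
function parseDeclarations(cssText) {
  return cssText
    .split(';')
    .map(function (part) { return part.trim(); })
    // Skip empty pieces and anything without a property/value separator.
    .filter(function (part) { return part.indexOf(':') !== -1; })
    .map(function (part) {
      const colon = part.indexOf(':');
      return {
        property: part.slice(0, colon).trim(),
        value: part.slice(colon + 1).trim(),
      };
    });
}

console.log(parseDeclarations('background: #ccc; color: red'));
```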

Just like in the network panel, you can now copy <img> elements as Data URIs. (development notes)

Style Editor

CSS source map support has been added to the Style Editor (development notes), and CSS properties and values are now autocompleted there (development notes).

Keep an eye out for a post on Hacks in the very near future with more information about how to use the source maps feature.


Debugger

We have added a classic call stack list in the debugger next to the list of sources. (development notes)

There is a new ‘enable/disable all breakpoints’ button in the debugger. This will toggle the active state of all existing breakpoints at once, to allow switching between normal usage and debugging quickly. (development notes)

You can now highlight and inspect DOM nodes from the debugger. If you hover over a DOM node in the variables listing, it will be highlighted on the page, and if you click on the inspect icon, the node will be opened in the inspector tab. This feature is also available in the console output. (development notes)

Pretty printing now preserves code comments. We are using the open source pretty-fast pretty printer, so it should be pretty fast. If it isn’t, be sure to let us know. (development notes)


Console

console.trace has been improved: the call stack is now shown inline with other output and includes links to access each line in the debugger. (development notes)
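As a reminder of what console.trace does, the short example below logs a message together with the call stack that led to the call. The function names are made up for illustration, and the exact formatting of the stack differs between Firefox and Node.js.

```javascript
'use strict';

function loadSettings() {
  // Logs the message along with the stack of calls that led here
  // (startApp, then loadSettings).
  console.trace('loading settings');
  return { theme: 'dark' };
}

function startApp() {
  return loadSettings();
}

const settings = startApp();
```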

We’ve also improved console object output to show additional information based on the object type. (development notes)

Code Editor

The code editor can be seen throughout the tools in places like Scratchpad, Style Editor, and Debugger. Here are some of the updates you will see in this release:

  • Code folding in the editor. (development notes)
  • Emacs and Vim keybindings are now available in the code editor. To enable them, open about:config, set “devtools.editor.keymap” to either “vim” or “emacs”, and then restart DevTools. (development notes)
  • ES6 syntax highlighting support (development notes)

Big thanks to all of our DevTools contributors this release (43 people)! Here is a list of all DevTools bugs resolved for Firefox 29.

Do you have feedback, bug reports, feature requests, or questions? As always, you can comment here or get in touch with the team at @FirefoxDevTools.

Soil Analysis Manager/Geotechnical

AL-Mobile, Aerotek Scientific is looking for a Soil Analysis Manager in Baldwin County with 5-10 years geotechnical soils testing and laboratory management experience. This person would oversee staff of 10 performing laboratory analysis on a variety of soil samples and: – Maintain proper and complete laboratory data; – Obtain laboratory certifications; – Perform equipment maintenance and calibrations; – Lead View full post on Monster Job Search Results (mobile)

Financial/Business Analyst with Build vs. Buy Analysis Exp.

Matrix Resources Santa Ana, CA
Job description: …and reporting. Perform configuration and testing of cost estimator tool in collaboration with internal business systems specialists and vendor. Participates as part of project team to drive implementation and achieve project success. … View full post on – Search Specialist

Web Trends/KPI analysis

Ambrosia Infotech Ltd. Bloomington, IL
Job description: …Location: Bloomington, IL. Duration: 6 Months. Role: Web Trends/KPI analysis. Description: Responsibilities will…ensure new and existing analytic code integrates with the application and technical architectures. - Assisting in acquisition, licensing, installations… View full post on – Web Application
