roytam1 4b8580917e import changes from `dev' branch of rmottola/Arctic-Fox:
- some pref. cleanup (21e17660e7)
- add some font names and aliases (cb38962246)
- remove unused dom.max_child_script_run_time (d214b353d4)
- align strange layout.css.scroll-snap.enabled overwrite (f2562a5cc1)
- reshuffle some preferences, remove unused (41f586186b)
- more reshuffle and cleanup of preferences (0208aa32a3)
- Bug 1168891 Part 1 - Refine two functions related to caret positioning. r=mats (86d718d60e)
- Bug 1168891 Part 2 - Allow one caret to be dragged across the other caret. r=mats (9276eb7728)
- part of Bug 1252802 - Web page scrolls when dragging caret in editable, r=snorp (31dade8b77)
- Bug 1235508 - Re-implement fast Phone number selection on long-press, r=TYLin (59b6371d17)
- Bug 1249201 Part 1 - Add "scroll" reason to CaretStateChangedEvent. r=smaug (b92ff6cbfc)
- Bug 1249201 Part 2 - Show carets continuously when panning or zooming. r=mats,sebastian (ca5c51c479)
- Bug 1245246: Add null check for mDocViewerPrint in nsPrintEngine::FirePrintingErrorEvent. r=roc (e9d5b49a3f)
- Bug 1025267 - Make some -moz- prefixed pseudo-classes chrome-only. r=bz (238f7a85d4)
- Bug 1259889 Part 1 - Add @supports -moz-bool-pref for internal-only style sheets. r=heycam (d716a7b884)
- Bug 1237633 - Part 1: Percentages are not allowed in a <source-size-value>. r=jdm (52ccffbf86)
- Bug 1081362 - Change nsStyleBasicShape pointer to an nsRefPtr, to avoid leak in unexpected case. r=dholbert (2a5cb8ffdd)
- Bug 1264317 - Make the basic shape clip-path clipping use nsCSSValue::Array instead of nsCSSValueList. r=dholbert (7aaf39f2d7)
- Bug 1247150 - Consistently use StyleSheetHandle::RefPtr* for outparams in nsLayoutStylesheetCache. r=dholbert (ddc85f29f8)
- Bug 1251848: Check StyleSheetHandles for being null-flavored before derefing them, in assertions within nsLayoutStylesheetCache::InvalidateSheet. r=bholley (edb3924075)
- Bug 1245260 - Add crashtest; r=hiro (6347e37750)
- Bug 460209 - Add crashtest. (97b4786de2)
- Bug 474377 - Add crashtest. (516b4e8164)
- Bug 1264396 - Don't allow animation of 'display' property; r=heycam (6e94bcb26a)
- missing bit of  759568 - Part 1 (fc954f075b)
- part of Bug 1037483 replace microdata with microformats (4ff01e11d6)
- Bug 1245334 - Make PromiseMessage.jsm ids more meaningful. r=baku (913ac1b9a5)
- Bug 1094201 - Implement an Integration.jsm module for low-overhead registration of overrides. r=mak (9982624b90)
- Bug 1167663 - Mark nsCSSKeyframeStyleDeclaration/nsCSSPageStyleDeclaration::mRule as MOZ_NON_OWNING_REF. r=dbaron (6d4e9751a1)
- Bug 1244992 - Avoid double-counting in various refcounted types related to nsCSSValue. r=heycam. (c830949dd9)
- Bug 1262646 - Change the outparams passed to nsStyleUtil::AppendEscapedCSSString from nsString to nsAutoString. r=dholbert (2b0caadf9d)
- Bug 1247336 - De-dupe changes in ActiveLayerTracker before treating property as animated. r=roc (c44ed5aee6)
- space fix (5e79d245ea)
- Bug 1266288 - Track changes to all margin properties for scroll-linked effects. r=mstange (fed6994e4d)
- Bug 1259641 - Do not force reflow for all tabs when size mode changed. r=smaug (70847cc6d2)
- Bug 1261265 - Fix nsStyleContext::MoveTo flag assertions to allow mismatch on parents if bit is set on child. r=dholbert (3e6b08372e)
- Bug 1264837 Part 43 - Remove SVGFEUnstyledLeafFrameBase. r=dholbert (bb55feda77)
- Remove mention of old SVG text pref in comment; no bug. (DONTBUILD) (3a618aca18)
- Bug 752638, part 1 - Move SVGTextFrame::SetupContextPaint to nsSVGUtils. r=heycam (c125c2903f)
- Bug 1258843 - Don't build SVG display items if their visibility is hidden. r=dholbert (150c3b0059)
- Bug 1258650. Properly use aExtraMasksTransform when combining masks. r=Bas,a=kwierso (ba5ea1928b)
- Bug 1263789 - Stop nsSVGMaskFrameNEON.h from polluting the global namespace. r=dholbert (e2c8544d35)
- Bug 1162418 - Try to find a suitable non-zero dimension to use when containing block's inline-size depends on an SVG element which is specified as a percentage of its container. r=jwatt (3eab79c8a4)
- Bug 1250143. Account for border/padding on outer <svg> elements in GeometryUtils. r=mats (f307820b75)
- Bug 1243623. Don't skip unregistering a table part if we have a split table. r=mats (35bb0821c1)
- Bug 1203417. Propagate error result from PaintTableFrame. r=seth (866e47b3e4)
- Bug 1209780. Propagate the use of MOZ_MUST_USE DrawResult in nsTablePainter. r=seth (851618d06c)
- var-const (29d5e9f859)
- Bug 1209780. Propagate the use of MOZ_MUST_USE DrawResult in nsTreeBodyFrame::PaintText. r=seth (1ce563ea18)
- Bug 1203626 - remove the unused argument from nsTreeBodyFrame::GetTwistyRect. r=mattwoodrow (03293f52b5)
- Bug 1218041, part 1: Give nsTreeBodyFrame::PaintImage a fallback codepath for painting SVG images with no explicit height or width. r=seth (b6fd3a39f7)
- Bug 1218041, part 2: add reftests for <treecell> SVG-image rendering. (no review) (90231e0bfa)
- Bug 1224736: When image size lookup fails in nsTreeBodyFrame::PaintImage, only fall back to use the full destRect if we've got a VectorImage. r=tn (dd7d7667ca)
- Bug 1156108 - Make nsTreeColumns::mFirstColumn an nsRefPtr; r=roc (f6888480bc)
- Bug 1255069 - use UniquePtr for storage in nsTreeContentView; r=dholbert (598256735f)
- Bug 1181560 - ensure previous menus get closed when opening new ones, r=Enn (2c88f3452a)
- Bug 1192655 - Make menubar not react to events when it is not visible. r=enn (2bbcbc81a2)
- Bug 1197913 - Keep the last hovered item highlighted after moving the cursor outside the <select> drop-down list on Windows. r=neil (abd3240473)
- Bug 1228029 - Fix the usage of gtest assertion macros in TestJobScheduler.cpp. r=kats (0fcc9aa6fe)
- Bug 1244234 - Simplify joining jobs with the gfx job scheduler. r=jrmuizel (f4b6bbf418)
- Bug 1239288 - Add a shutdown test to the gfx job scheduler. r=jrmuizel (fd2432d108)
- Bug 1239288 - Fix a race in the win32 job scheduler's shutdown. r=jrmuizel (4e509b4bf3)
- Bug 1241161 - make Matrix4x4::ProjectTo2D normalize out perpective where possible. r=mattwoodrow (5a68e396a3)
- bits of  Bug 1135138 - Remove UNICODE from DEFINES (1eb51a0a79)
- Bug 1249640: Part 4 Android to use new blocking. r=snorp (855e5c0dda)
- Bug 1234875 - Remove alwaysAcceptSessionCookies pref. r=mak (8bed323449)
- Bug 1247912 - convert left side expression to int64_t when assigning to mCookiesLifetimeSec in order to avoid overflow. r=jdm (0cedb68c83)
- code and comment style (9215d74a8f)
- code and comment style (1d4cda31af)
- Bug 1219928 - Skip misspelled words in style blocks. r=enndeakin. (91dd0bcedf)
- Bug 1236968 - autodial telemetry r=mayhemer (3844b9c19e)
- Bug 1254310 - Add a hidden pref to temporarily disable Safe Browsing on given hostnames. r=gcp (4955fc88f8)
- Bug 772528 - Remove nsFileInputStream::Seek() from nsPartialFileInputStream::Init(). r=baku (15db900fb5)
- Bug 1150921 - Add telemetry for response codes to SafeBrowsing requests. r=francois f=bsmedberg (215d50e4ad)
- Bug 1164518 - Better logging of completions. r=gcp (95b4fe3731)
- Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, r=bsmedberg (b94a2b38a7)
- Bug 1266184 - Implement nsIMIMEInputStream.data getter. r=mcmanus (8c9159c030)
- Bug 1239955 - Let DNSService rely on IOService::Offline, r=bagder (336f161d21)
- Bug 1260407 - added logging for proxy/pac to aid debugging, r=mcmanus (a179275ca6)
- Bug 1259089 - Set TCP socket to non-blocking in sts again, just to be sure. r=mcmanus (bf0656bf07)
- Bug 1256473 - Cast values to avoid C4838 on VS2015; r=mayhemer (d4b138dba8)
- Bug 1260764 - Creation of PollableEvent needs a lock r=dragana a=kwierso (01c9d5e477)
- Bug 652186 - Implement URL Standard's backslash replacement r=mcmanus (6485fa7e8c)
- Bug 1042347 - %2e entered in URL bar not normalized leading to denormalized request r=mcmanus (3fc1ff92cd)
- Bug 377052 - nsBaseURLParser::ParseURL doesn't handle spaces embedded in the scheme properly r=mcmanus (1f54055b9d)
- fix editor format (444d6a62c4)
- Bug 1154124 - Prevent recursion when calling HTTP cache entry's callbacks. r=michal (7bdfbf603d)
- Bug 1247644 - Don't do any I/O on doomed and unused HTTP cache entries, r=michal (7668d29a36)
2024-08-07 16:47:10 +08:00

/* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */
"use strict";
this.EXPORTED_SYMBOLS = ["PageMetadata"];
const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;
Cu.import("resource://gre/modules/Services.jsm");
Cu.import("resource://gre/modules/XPCOMUtils.jsm");
Cu.import("resource://gre/modules/microformat-shiv.js");
XPCOMUtils.defineLazyServiceGetter(this, "UnescapeService",
"@mozilla.org/feed-unescapehtml;1",
"nsIScriptableUnescapeHTML");
/**
* Maximum number of images to discover in the document, when no preview images
* are explicitly specified by the metadata.
* @type {Number}
*/
const DISCOVER_IMAGES_MAX = 5;
/**
* Extract metadata and microformats from an HTML document.
* @type {Object}
*/
this.PageMetadata = {
/**
* Get all metadata from an HTML document. This includes:
* - URL
* - title
* - Metadata specified in <meta> tags, including OpenGraph data
* - Links specified in <link> tags (short, canonical, preview images, alternative)
* - Content that can be found in the page content that we consider useful metadata
* - Microformats
*
* @param {Document} document - Document to extract data from.
* @param {Element} [target] - Optional element to restrict microformats lookup to.
* @returns {Object} Object containing the various metadata, normalized to
* merge some common alternative names for metadata.
*/
getData(document, target = null) {
let result = {
url: this._validateURL(document, document.documentURI),
title: document.title,
previews: [],
};
this._getMetaData(document, result);
this._getLinkData(document, result);
this._getPageData(document, result);
result.microformats = this.getMicroformats(document, target);
return result;
},
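/**
* Get microformats from a document. When a target element is given, look up
* the microformat that contains that element instead of scanning the whole
* document.
*
* @param {Document} document - Document to extract data from.
* @param {Element} [target] - Optional element to restrict microformats lookup to.
* @returns {Object} Microformats data as returned by the Microformats parser.
*/
getMicroformats(document, target = null) {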
if (target) {
return Microformats.getParent(target, {node: document});
}
return Microformats.get({node: document});
},
/**
* Get metadata as defined in <meta> tags.
* This adds properties to an existing result object.
*
* @param {Document} document - Document to extract data from.
* @param {Object} result - Existing result object to add properties to.
*/
_getMetaData(document, result) {
// Query for standardized meta data.
let elements = document.querySelectorAll("head > meta[property], head > meta[name]");
if (elements.length < 1) {
return;
}
for (let element of elements) {
let value = element.getAttribute("content");
if (!value) {
continue;
}
value = UnescapeService.unescape(value.trim());
let key = element.getAttribute("property") || element.getAttribute("name");
if (!key) {
continue;
}
// There is a wide array of possible meta tags, expressing articles,
// products, etc., so all meta tags are passed through, but we touch up the
// most common attributes.
result[key] = value;
switch (key) {
case "title":
case "og:title": {
result.title = value;
break;
}
case "description":
case "og:description": {
result.description = value;
break;
}
case "og:site_name": {
result.siteName = value;
break;
}
case "medium":
case "og:type": {
result.medium = value;
break;
}
case "og:video": {
let url = this._validateURL(document, value);
if (url) {
result.source = url;
}
break;
}
case "og:url": {
let url = this._validateURL(document, value);
if (url) {
result.url = url;
}
break;
}
case "og:image": {
let url = this._validateURL(document, value);
if (url) {
result.previews.push(url);
}
break;
}
}
}
},
/**
* Get metadata as defined in <link> tags.
* This adds properties to an existing result object.
*
* @param {Document} document - Document to extract data from.
* @param {Object} result - Existing result object to add properties to.
*/
_getLinkData(document, result) {
let elements = document.querySelectorAll("head > link[rel], head > link[id]");
for (let element of elements) {
let url = element.getAttribute("href");
if (!url) {
continue;
}
url = this._validateURL(document, UnescapeService.unescape(url.trim()));
let key = element.getAttribute("rel") || element.getAttribute("id");
if (!key) {
continue;
}
switch (key) {
case "shorturl":
case "shortlink": {
result.shortUrl = url;
break;
}
case "canonicalurl":
case "canonical": {
result.url = url;
break;
}
case "image_src": {
result.previews.push(url);
break;
}
case "alternate": {
// Expressly for oembed support but we're liberal here and will let
// other alternate links through. oembed defines an href, supplied by
// the site, where you can fetch additional meta data about a page.
// We'll let the client fetch the oembed data themselves, but they
// need the data from this link.
if (!result.alternate) {
result.alternate = [];
}
result.alternate.push({
type: element.getAttribute("type"),
href: element.getAttribute("href"),
title: element.getAttribute("title")
});
break;
}
}
}
},
/**
* Scrape through the page content for additional content that may be used to
* supplement explicitly defined metadata. This includes:
* - First few images, when no preview image metadata is explicitly defined.
*
* This adds properties to an existing result object.
*
* @param {Document} document - Document to extract data from.
* @param {Object} result - Existing result object to add properties to.
*/
_getPageData(document, result) {
if (result.previews.length < 1) {
result.previews = this._getImageUrls(document);
}
},
/**
* Find the first few images in a document, for use as preview images.
* Will return up to DISCOVER_IMAGES_MAX images.
*
* @note This is not very clever. It does not (yet) check if any of the
* images may be appropriate as a preview image.
*
* @param {Document} document - Document to extract data from.
* @return {[string]} Array of URLs.
*/
_getImageUrls(document) {
let result = [];
let elements = document.querySelectorAll("img");
for (let element of elements) {
let src = element.getAttribute("src");
if (src) {
let url = this._validateURL(document, UnescapeService.unescape(src));
// Skip images whose URL fails validation (e.g. a non-http(s) scheme).
if (url) {
result.push(url);
}
// We don't want a billion images: stop at DISCOVER_IMAGES_MAX.
if (result.length >= DISCOVER_IMAGES_MAX) {
break;
}
}
}
return result;
},
/**
* Validate a URL. This involves resolving the URL if it's relative to the
* document location, ensuring it's using an expected scheme, and stripping
* the userPass portion of the URL.
*
* @param {Document} document - Document to use as the root location for a relative URL.
* @param {string} url - URL to validate.
* @return {string|null} Resolved URL, or null if the scheme is not http or https.
*/
_validateURL(document, url) {
let docURI = Services.io.newURI(document.documentURI, null, null);
let uri = Services.io.newURI(docURI.resolve(url), null, null);
if (["http", "https"].indexOf(uri.scheme) < 0) {
return null;
}
uri.userPass = "";
return uri.spec;
},
};
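
The URL handling in _validateURL (resolve against the document location, accept only http and https, strip the userPass portion) can be tried outside the browser with a rough standalone sketch. Everything in it is an assumption made for illustration: the function name validateURL is hypothetical, and it uses the standard WHATWG URL class rather than Services.io, so details such as error handling differ from the module above (this sketch returns null on malformed input instead of throwing).

```javascript
// Hypothetical standalone sketch of the checks performed by
// PageMetadata._validateURL. It uses the WHATWG URL class, NOT Services.io,
// so it is an approximation rather than the module's actual code path.
function validateURL(documentURI, url) {
  let uri;
  try {
    // Resolve a possibly relative URL against the document location.
    uri = new URL(url, documentURI);
  } catch (e) {
    // Assumption: treat unparsable input as invalid instead of throwing.
    return null;
  }

  // Only http and https are accepted; URL.protocol includes the colon.
  if (!["http:", "https:"].includes(uri.protocol)) {
    return null;
  }

  // Strip the user:password portion, like `uri.userPass = ""` in the module.
  uri.username = "";
  uri.password = "";
  return uri.href;
}

// Example calls against a hypothetical document location.
const base = "https://example.com/a/b.html";
console.log(validateURL(base, "../img.png"));
console.log(validateURL(base, "https://user:pw@example.com/x"));
console.log(validateURL(base, "javascript:alert(1)"));
```

A relative path resolves against the document location, credentials are dropped from the result, and non-http(s) schemes produce null, which is why callers such as _getMetaData check the return value before storing it.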