????? ??????? on 2012-05-06 09:34:48
If I might, perhaps you should consider adding a few images. I don't mean to disrespect what you've said; it's very enlightening, indeed. However, I think readers would respond to it more positively if there were something tangible to accompany your ideas.
pedro on 2009-08-20 11:10:45
Thank you very much for this function; it is what I needed to resolve my problem. Thanks.
Kevin van Zonneveld on 2009-06-18 13:03:04
@ Brett Zamir: Yeah, I already have: class DATABASE_CONFIG { var $default = array( 'driver' => 'mysql', '....', 'encoding' => 'utf8', ); in my cake datasource, which should execute that statement every time. I'm kind of puzzled about what else I need to make utf8-aware to avoid these question marks.
Brett Zamir on 2009-06-10 23:15:30
Nope, still not working, as indicated by my test characters...
Brett Zamir (test: ????? ) on 2009-06-10 23:14:34
@Kevin, do you have the "SET NAMES 'UTF8'" going too? (trying a few characters out) ?????
Kevin van Zonneveld on 2009-06-10 14:46:15
@ Brett Zamir: Good job man! I'm thinking the only place left that could screw us with unicode is mysql. I've changed the table collation to utf8_unicode_ci. Let's see if things improve.
Brett Zamir on 2009-06-04 01:48:09
Hello ?ukasz (Kevin, a Unicode bug?--otherwise, I can't credit this person for "input by"), I did modify get_html_translation_table() to keep the order of what PHP returns for that function (and as a result removed the hack within this and other functions for adding &amp; at the end). One catch is that although get_html_translation_table() returns &#39;, the functions we use, like htmlspecialchars, return &#039;. But we cannot modify get_html_translation_table() to add &#039; since that histogram (correctly) is keyed with an apostrophe, leading necessarily to only one value (&#39;). So, we have to modify the functions to work with &#039; as well (which is not a problem really since this is the only numeric character reference in the list (&apos; is XML-only, so it couldn't be used)). So, I've fixed htmlspecialchars_decode() and html_entity_decode() to work with both &#039; and &#39; and also "fixed" htmlspecialchars() and htmlentities() to use &#039; for output as they do in PHP (without modifying get_html_translation_table() which uses &#39;). I think that should address all the issues.
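The apostrophe handling described in this comment can be sketched roughly as follows. This is only an illustration, not the php.js source: the names `decodeApostrophes` and `encodeApostrophes` and the string-valued flag are assumptions made for the example.

```javascript
// Hypothetical helper (not part of php.js): decodes both numeric spellings
// of the apostrophe, &#039; and &#39;, when single quotes are in scope.
function decodeApostrophes(str, quoteStyle) {
  // Only ENT_QUOTES covers single quotes; other styles leave them alone.
  if (quoteStyle !== 'ENT_QUOTES') {
    return String(str);
  }
  // "0?" makes the leading zero optional, so one pattern covers both forms.
  return String(str).replace(/&#0?39;/g, "'");
}

// Hypothetical counterpart: encoding emits the zero-padded form, which is
// what PHP's htmlspecialchars() outputs for an apostrophe.
function encodeApostrophes(str) {
  return String(str).replace(/'/g, '&#039;');
}
```

Keeping the translation table untouched and widening only the decoder means the table can still hold a single value per character.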
?ukasz Czerwi?ski on 2009-06-03 22:22:14
I have noticed that &#39; is decoded by html_entity_decode() as ' (apostrophe), but &#039; isn't!!! (Of course, when using 'ENT_QUOTES'.) The same problem exists with htmlspecialchars_decode(). I have checked that PHP decodes both &#039; and &#39;. I tried to find the code in the PHP sources, but they seem to be very complicated. I have only found a structure that stores several entities, those decoded by htmlspecialchars_decode: php-5.2.9.tar.bz2/ext/standard/html.c, lines 454-466

static const struct {
	unsigned short charcode;
	char *entity;
	int entitylen;
	int flags;
} basic_entities[] = {
	{ '"',	"&quot;",	6,	ENT_HTML_QUOTE_DOUBLE },
	{ '\'',	"&#039;",	6,	ENT_HTML_QUOTE_SINGLE },
	{ '\'',	"&#39;",	5,	ENT_HTML_QUOTE_SINGLE },
	{ '<',	"&lt;",		4,	0 },
	{ '>',	"&gt;",		4,	0 },
	{ 0, NULL, 0, 0 }
};
As you can see, both &#039; and &#39; are listed. As for the JS code of these two functions (in fact, I think we should modify get_html_translation_table), the modification is quite complicated...
Kevin van Zonneveld on 2008-12-31 12:35:26
@ Azriel Fasten: Yes, but that would also make it harder for people to just copy one function: http://trac.plutonia.nl/projects/phpjs/wiki/DeveloperGuidelines#DependencyvsRedundancy The fewer dependencies the better, but of course we are not about to duplicate the histogram from get_html_translation_table four times, so dependencies already exist in this function family. I think we should probably first come up with the fastest str_replace possible, and base our decision (Dependency vs Redundancy) on the final algorithm used.
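One regex-free way to replace every occurrence of a literal string is split/join. The sketch below is only an illustration; `replaceAll` is a hypothetical name, not the php.js str_replace, and whether it beats a regex would have to be measured in the target browsers.

```javascript
// Hypothetical helper: replaces every occurrence of the literal `search`
// in `subject`. No regex is built, so `search` needs no escaping.
// Note: an empty `search` would split the subject into single characters.
function replaceAll(search, replacement, subject) {
  return String(subject).split(search).join(replacement);
}
```

For example, `replaceAll('a.b', 'X', 'a.b-axb')` only touches the literal `a.b`, whereas a naive regex built from the same string would also match `axb`.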
Azriel Fasten on 2008-12-30 18:05:39
I think that perhaps the replace should be relegated to str_replace, and that function should be highly optimized. Many other parts of the library all use different ways of replacing. These should all use str_replace.
Kevin van Zonneveld on 2008-12-30 16:56:14
@ Azriel Fasten: You reported a bug by mail that is exactly the same as one the real PHP encountered at one point: http://bugs.php.net/bug.php?id=25707 I've read the bug report more thoroughly, and applied the same fix as was proposed there. I put the &amp; entity at the bottom of the histogram. Faster ways to replace (without using regex) can still be explored.
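Why the position of the &amp; entry matters can be seen with a small sequential decoder. This is a sketch under assumptions: `DECODE_ORDER` and `decodeSequentially` are hypothetical names, and the table is a tiny subset chosen for illustration.

```javascript
// Illustrative table only; the point is the order of the entries.
var DECODE_ORDER = [
  ['&lt;', '<'],
  ['&gt;', '>'],
  ['&quot;', '"'],
  ['&amp;', '&'] // last, so a freshly decoded "&" is never re-read as an entity
];

// Applies each replacement in turn over the whole string.
function decodeSequentially(str, table) {
  var out = String(str);
  for (var i = 0; i < table.length; i++) {
    out = out.split(table[i][0]).join(table[i][1]);
  }
  return out;
}
```

With &amp; last, the input `&amp;lt;` decodes once to `&lt;`. If the &amp; entry ran first, the intermediate `&lt;` would be decoded a second time to `<`.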
Kevin van Zonneveld on 2008-10-20 18:36:56
@ marc andreu: I've revised all of the functions like get_html_translation_table, htmlentities & htmlspecialchars and their decoding counterparts; they now also support your second argument. Thank you!
marc andreu on 2008-10-15 15:15:33
Hi, I needed to deal with the second parameter of the html_entity_decode() function, and I added it as follows. I hope I got it right; it's only a suggestion. That's all folks.

// {{{ html_entity_decode
function html_entity_decode(string, quote_style ) {
    // Convert all HTML entities to their applicable characters
    //
    // + discuss at: http://kevin.vanzonneveld.net/techblog/article/javascript_equivalent_for_phps_html_entity_decode/
    // + version: 810.621
    // + original by: john (http://www.jd-tech.net)
    // + input by: ger
    // + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + bugfixed by: Onno Marsman
    // % note: table from http://www.the-art-of-web.com/html/character-codes/
    // * example 1: html_entity_decode('Kevin &amp; van Zonneveld');
    // * returns 1: 'Kevin & van Zonneveld'

    var histogram = {}, histogram_r = {}, code = 0;
    var entity = '', chr = '';

    histogram['34'] = 'quot';
    histogram['38'] = 'amp';
    histogram['60'] = 'lt';
    histogram['62'] = 'gt';
    histogram['160'] = 'nbsp';
    histogram['161'] = 'iexcl';
    histogram['162'] = 'cent';
    histogram['163'] = 'pound';
    histogram['164'] = 'curren';
    histogram['165'] = 'yen';
    histogram['166'] = 'brvbar';
    histogram['167'] = 'sect';
    histogram['168'] = 'uml';
    histogram['169'] = 'copy';
    histogram['170'] = 'ordf';
    histogram['171'] = 'laquo';
    histogram['172'] = 'not';
    histogram['173'] = 'shy';
    histogram['174'] = 'reg';
    histogram['175'] = 'macr';
    histogram['176'] = 'deg';
    histogram['177'] = 'plusmn';
    histogram['178'] = 'sup2';
    histogram['179'] = 'sup3';
    histogram['180'] = 'acute';
    histogram['181'] = 'micro';
    histogram['182'] = 'para';
    histogram['183'] = 'middot';
    histogram['184'] = 'cedil';
    histogram['185'] = 'sup1';
    histogram['186'] = 'ordm';
    histogram['187'] = 'raquo';
    histogram['188'] = 'frac14';
    histogram['189'] = 'frac12';
    histogram['190'] = 'frac34';
    histogram['191'] = 'iquest';
    histogram['192'] = 'Agrave';
    histogram['193'] = 'Aacute';
    histogram['194'] = 'Acirc';
    histogram['195'] = 'Atilde';
    histogram['196'] = 'Auml';
    histogram['197'] = 'Aring';
    histogram['198'] = 'AElig';
    histogram['199'] = 'Ccedil';
    histogram['200'] = 'Egrave';
    histogram['201'] = 'Eacute';
    histogram['202'] = 'Ecirc';
    histogram['203'] = 'Euml';
    histogram['204'] = 'Igrave';
    histogram['205'] = 'Iacute';
    histogram['206'] = 'Icirc';
    histogram['207'] = 'Iuml';
    histogram['208'] = 'ETH';
    histogram['209'] = 'Ntilde';
    histogram['210'] = 'Ograve';
    histogram['211'] = 'Oacute';
    histogram['212'] = 'Ocirc';
    histogram['213'] = 'Otilde';
    histogram['214'] = 'Ouml';
    histogram['215'] = 'times';
    histogram['216'] = 'Oslash';
    histogram['217'] = 'Ugrave';
    histogram['218'] = 'Uacute';
    histogram['219'] = 'Ucirc';
    histogram['220'] = 'Uuml';
    histogram['221'] = 'Yacute';
    histogram['222'] = 'THORN';
    histogram['223'] = 'szlig';
    histogram['224'] = 'agrave';
    histogram['225'] = 'aacute';
    histogram['226'] = 'acirc';
    histogram['227'] = 'atilde';
    histogram['228'] = 'auml';
    histogram['229'] = 'aring';
    histogram['230'] = 'aelig';
    histogram['231'] = 'ccedil';
    histogram['232'] = 'egrave';
    histogram['233'] = 'eacute';
    histogram['234'] = 'ecirc';
    histogram['235'] = 'euml';
    histogram['236'] = 'igrave';
    histogram['237'] = 'iacute';
    histogram['238'] = 'icirc';
    histogram['239'] = 'iuml';
    histogram['240'] = 'eth';
    histogram['241'] = 'ntilde';
    histogram['242'] = 'ograve';
    histogram['243'] = 'oacute';
    histogram['244'] = 'ocirc';
    histogram['245'] = 'otilde';
    histogram['246'] = 'ouml';
    histogram['247'] = 'divide';
    histogram['248'] = 'oslash';
    histogram['249'] = 'ugrave';
    histogram['250'] = 'uacute';
    histogram['251'] = 'ucirc';
    histogram['252'] = 'uuml';
    histogram['253'] = 'yacute';
    histogram['254'] = 'thorn';
    histogram['255'] = 'yuml';

    // Reverse table. Cause for maintainability purposes, the histogram is
    // identical to the one in htmlentities.
    for (code in histogram) {
        entity = histogram[code];
        histogram_r[entity] = code;
    }
    
    var retTemp = (string+'').replace(/(\&([a-zA-Z]+)\;)/g, function(full, m1, m2){
        if (m2 in histogram_r) {
            return String.fromCharCode(histogram_r[m2]);
        } else {
            return m2;
        }
    });
    
    //Add for Marc Andreu Fernadnez. To decode quotes.
    // Encode depending on quote_style
    if (quote_style == 'ENT_QUOTES') {
        // Regex with the g flag so every occurrence is replaced, not just the first
        retTemp = retTemp.replace(/&quot;/g, '"');
        retTemp = retTemp.replace(/&#039;/g, "'");
    } else if (quote_style != 'ENT_NOQUOTES') {
        // All other cases (ENT_COMPAT, default, but not ENT_NOQUOTES)
        retTemp = retTemp.replace(/&quot;/g, '"');
    } 
    
    return retTemp;
}// }}}
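
The reverse-table idea in the code above can be shown in a compact form. This is a sketch with assumptions, not a drop-in replacement: `ENTITY_NAMES`, `ENTITY_CODES`, and `decodeNamedEntities` are hypothetical names, and the table is a five-entry subset. It also differs from the posted code in two ways: unknown entity names are returned unchanged rather than stripped of their `&` and `;`, and the lookup uses hasOwnProperty so inherited names such as `constructor` are not treated as entities.

```javascript
// Small illustrative subset of a code -> name table.
var ENTITY_NAMES = { 38: 'amp', 60: 'lt', 62: 'gt', 169: 'copy', 233: 'eacute' };

// Build the reverse (name -> code) lookup once.
var ENTITY_CODES = {};
for (var code in ENTITY_NAMES) {
  if (Object.prototype.hasOwnProperty.call(ENTITY_NAMES, code)) {
    ENTITY_CODES[ENTITY_NAMES[code]] = Number(code);
  }
}

// Decodes named entities in a single pass over the input.
function decodeNamedEntities(str) {
  return String(str).replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, function (full, name) {
    // Unknown names are returned untouched.
    return Object.prototype.hasOwnProperty.call(ENTITY_CODES, name)
      ? String.fromCharCode(ENTITY_CODES[name])
      : full;
  });
}
```

Because the replacement runs in one pass, the `&` produced from `&amp;` is not scanned again, so `&amp;lt;` becomes `&lt;` rather than `<`.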

rekcor on 2008-06-23 11:58:42
Thanks for the code! But shouldn't you destroy tarea? (Otherwise we will end up with n textareas floating around in the DOM's hyperspace.)
Kevin van Zonneveld on 2008-03-20 15:06:39
@lubber: You sure did! And as I said, as soon as php.js supports optional components, I will include them. Thanks again!
lubber on 2008-03-20 06:43:58
@Kevin: I use these functions to shrink my GET parameters in cases where POST wasn't possible (imagine an img tag that generates a custom picture, where the parameters exceed the 2048-character URL limit in IE; that was the case for me). Anyway, I just wanted to contribute my 2 cents to this project :)
Kevin van Zonneveld on 2008-03-19 16:29:52
@ lubber: Wow, that is some awesome code and I will definitely save the links. However, the 2 functions are probably rarely used in JavaScript. That hasn't stopped me before, but in this case the 2 functions alone (72kB) will increase the total project size by 52%. That's a bit too much for now. However, when php.js gets a page for component customization, I will include the functions and just leave them unchecked by default. Sounds good?
lubber on 2008-03-19 09:14:28
You can find the JavaScript equivalents for gz_inflate and gz_deflate here: http://www.onicos.com/staff/iz/amuse/javascript/expert/inflate.txt http://www.onicos.com/staff/iz/amuse/javascript/expert/deflate.txt
john on 2008-03-18 01:47:53
ha, sry about that!
Kevin van Zonneveld on 2008-03-15 23:48:52
@ ger: Aha that was ugly. Thanks for helping us!
ger on 2008-03-15 22:25:26
heh... I'm almost sure I can see some js code after the return...; in the function source listed on this page.