<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Derivante &#187; Clay vanSchalkwijk</title>
	<atom:link href="http://www.derivante.com/author/admin/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.derivante.com</link>
	<description>to obtain or receive from a source</description>
	<lastBuildDate>Mon, 26 Apr 2010 18:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>ActiveRecord and Zend_Paginator_Adapter_Interface</title>
		<link>http://www.derivante.com/2009/10/29/activerecord-and-zend_paginator_adapter_interface/</link>
		<comments>http://www.derivante.com/2009/10/29/activerecord-and-zend_paginator_adapter_interface/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 14:29:07 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=757</guid>
		<description><![CDATA[Zend has a lot of tools to help speed up the application development process. One tool I found useful is Paginator. I am using php-activerecord in a project with Zend_Framework as the backend, and tying the two together is very simple. A Paginator adapter requires two methods: it needs to be able to pull a count [...]]]></description>
			<content:encoded><![CDATA[<p>Zend has a lot of tools to help speed up the application development process. One tool I found useful is Paginator. I am using <a href="http://www.phpactiverecord.org" target="_blank">php-activerecord</a> in a project with Zend_Framework as the backend, and tying the two together is very simple. A Paginator adapter requires two methods: it needs to be able to pull a count to get the total, and it needs to be able to pull in a subset of the data for the current page. Take a look at the following example:</p>
<pre class="php">&nbsp;
<span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000000; font-weight: bold;">class</span> My_Paginator implements Zend_Paginator_Adapter_Interface <span style="color: #66cc66;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$table</span>,<span style="color: #0000ff;">$conditions</span> = <a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
	<span style="color: #66cc66;">&#123;</span>
		<span style="color: #b1b100;">if</span><span style="color: #66cc66;">&#40;</span>!<a style="text-decoration: none;" href="http://www.php.net/is_array"><span style="color: #000066;">is_array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$conditions</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
			<span style="color: #0000ff;">$conditions</span> = <a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span> <span style="color: #0000ff;">$conditions</span> <span style="color: #66cc66;">&#41;</span>;
&nbsp;
		<span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">conditions</span> = <span style="color: #0000ff;">$conditions</span>;
&nbsp;
		<span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">table</span>	  = <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #0000ff;">$table</span>;
&nbsp;
	<span style="color: #66cc66;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> getItems<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$offset</span>, <span style="color: #0000ff;">$itemCountPerPage</span><span style="color: #66cc66;">&#41;</span>
	<span style="color: #66cc66;">&#123;</span>
			<span style="color: #b1b100;">return</span> <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">table</span>-&gt;<span style="color: #006600;">find</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'all'</span>, <a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'limit'</span> =&gt; <span style="color: #0000ff;">$itemCountPerPage</span>, <span style="color: #ff0000;">'offset'</span> =&gt; <span style="color: #0000ff;">$offset</span>, <span style="color: #ff0000;">'conditions'</span> =&gt; <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">conditions</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
	<span style="color: #66cc66;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> <a style="text-decoration: none;" href="http://www.php.net/count"><span style="color: #000066;">count</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
	<span style="color: #66cc66;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">table</span>-&gt;<span style="color: #006600;">count</span><span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'conditions'</span> =&gt; <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">conditions</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
	<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span>
&nbsp;</pre>
<p>The two methods that Zend_Paginator_Adapter_Interface expects are count() and getItems(). The above example is a little "raw", but it should serve as a guide when extending the Paginator with its own adapter, regardless of what your database layer is. $conditions holds the conditions that are passed through to the SQL query:</p>
<pre class="php">&nbsp;
<span style="color: #0000ff;">$paginator</span> = <span style="color: #000000; font-weight: bold;">new</span> Zend_Paginator<span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> My_Paginator<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'User'</span>,<span style="color: #ff0000;">' active = &quot;Y&quot; '</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>We want to access the User model and only pull out users who are active. You can certainly put in more complicated SQL here, but for general use this covers 99% of what I want from ActiveRecord and paging: pass a model name to the adapter, along with some basic conditions for the listing.</p>
<pre class="php">&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> pageUserAction<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
    <span style="color: #66cc66;">&#123;</span>
 		<span style="color: #0000ff;">$paginator</span> = <span style="color: #000000; font-weight: bold;">new</span> Zend_Paginator<span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> My_Paginator<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'User'</span>,<span style="color: #ff0000;">' active = &quot;Y&quot; '</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
		<span style="color: #0000ff;">$paginator</span>-&gt;<span style="color: #006600;">setCurrentPageNumber</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$this</span>-&gt;_getParam<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'page'</span>, <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
		<span style="color: #0000ff;">$paginator</span>-&gt;<span style="color: #006600;">setItemCountPerPage</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'25'</span><span style="color: #66cc66;">&#41;</span>;
		<span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">view</span>-&gt;<span style="color: #006600;">paginator</span> = <span style="color: #0000ff;">$paginator</span>;
    <span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>This is basic usage of Zend_Paginator: you pass in the current page number and the number of results per page, then push the paginator out to the view. On the view side:</p>
<pre class="php">&nbsp;
&lt;div id=<span style="color: #ff0000;">&quot;userlist&quot;</span>&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/count"><span style="color: #000066;">count</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">paginator</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#123;</span>
	<span style="color: #b1b100;">foreach</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">paginator</span> <span style="color: #b1b100;">as</span> <span style="color: #0000ff;">$user</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#123;</span>
		<a style="text-decoration: none;" href="http://www.php.net/echo"><span style="color: #000066;">echo</span></a> <span style="color: #0000ff;">$user</span>-&gt;<span style="color: #006600;">username</span>.<span style="color: #ff0000;">&quot;&lt;br&gt;&quot;</span>;
	<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;/div&gt;
&nbsp;
<span style="color: #000000; font-weight: bold;">&lt;?</span>= <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">paginationControl</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">paginator</span>, <span style="color: #ff0000;">'Elastic'</span>, <span style="color: #ff0000;">'/common/paginator.phtml'</span><span style="color: #66cc66;">&#41;</span>; <span style="color: #000000; font-weight: bold;">?&gt;</span>
&nbsp;</pre>
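<p>To see the adapter contract in isolation, here is a minimal sketch that pages over a plain PHP array instead of a database. The names Sketch_Adapter_Interface, Sketch_Array_Adapter and sketchPage() are made up for this illustration, and the interface is only a stand-in so the code runs without Zend Framework (which ships its own array adapter). The offset arithmetic assumes page numbers start at 1.</p>
<pre class="php">&lt;?php
// Stand-in for Zend_Paginator_Adapter_Interface so the sketch runs without
// Zend Framework; the real interface requires the same two methods.
interface Sketch_Adapter_Interface
{
    public function count();
    public function getItems($offset, $itemCountPerPage);
}

// Hypothetical adapter that pages over an in-memory array.
class Sketch_Array_Adapter implements Sketch_Adapter_Interface
{
    private $rows;

    public function __construct(array $rows)
    {
        $this-&gt;rows = $rows;
    }

    // Return one page worth of rows, starting at $offset.
    public function getItems($offset, $itemCountPerPage)
    {
        return array_slice($this-&gt;rows, $offset, $itemCountPerPage);
    }

    // Total number of rows across all pages.
    public function count()
    {
        return count($this-&gt;rows);
    }
}

// Hypothetical helper showing the arithmetic a paginator performs:
// page N begins at offset (N - 1) * $perPage.
function sketchPage(Sketch_Adapter_Interface $adapter, $page, $perPage)
{
    $offset = ($page - 1) * $perPage;
    return $adapter-&gt;getItems($offset, $perPage);
}

$adapter = new Sketch_Array_Adapter(range(1, 60));

print_r(sketchPage($adapter, 3, 25));          // the last, partial page
echo ceil($adapter-&gt;count() / 25) . "\n";      // number of pages
</pre>
<p>Whatever the data source, the paginator only ever asks these two questions: how many rows exist in total, and which rows fall between an offset and a limit. The ActiveRecord adapter above answers them with count() and find('all', ...).</p>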
<p>If anyone has problems or questions getting the two to work together, let me know in the comments and I will do my best to answer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/10/29/activerecord-and-zend_paginator_adapter_interface/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>PHP Content Rating / Confidence</title>
		<link>http://www.derivante.com/2009/09/01/php-content-rating-confidence/</link>
		<comments>http://www.derivante.com/2009/09/01/php-content-rating-confidence/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 21:15:51 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[content analysis]]></category>
		<category><![CDATA[rating]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=717</guid>
		<description><![CDATA[For webmasters who collect user feedback and want to weight content by it, finding the right algorithm can be challenging. From experience, there is no out-of-the-box solution, since each site and its requirements are unique. Laying a solid foundation is the first step and, of course, [...]]]></description>
			<content:encoded><![CDATA[<p>For webmasters who collect user feedback and want to weight content by it, finding the right algorithm can be challenging. From experience, there is no out-of-the-box solution, since each site and its requirements are unique. Laying a solid foundation is the first step and, of course, refining it over time gets you to just the right recipe. The following is a PHP implementation of a binomial proportion confidence interval: it uses the lower bound of the Wilson score interval to weight the feedback, so that items with only a few votes are not ranked as confidently as items with many.</p>
<p><span id="more-717"></span></p>
<pre class="php"><span style="color: #000000; font-weight: bold;">class</span> Rating
<span style="color: #66cc66;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> ratingAverage<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$power</span> = <span style="color: #ff0000;">'0.05'</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$total</span> == <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0</span>;
&nbsp;
    <span style="color: #0000ff;">$z</span> = Rating::<span style="color: #006600;">pnormaldist</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>-<span style="color: #0000ff;">$power</span>/<span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #0000ff;">$p</span> = <span style="color: #cc66cc;">1.0</span> * <span style="color: #0000ff;">$positive</span> / <span style="color: #0000ff;">$total</span>;
    <span style="color: #0000ff;">$s</span> = <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span> + <span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>*<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span> - <span style="color: #0000ff;">$z</span> * <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>*<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>-<span style="color: #0000ff;">$p</span><span style="color: #66cc66;">&#41;</span>+<span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">4</span>*<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>+<span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #b1b100;">return</span> <span style="color: #0000ff;">$s</span>;
  <span style="color: #66cc66;">&#125;</span> 
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> pnormaldist<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #0000ff;">$b</span> = <a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span>
      <span style="color: #cc66cc;">1.570796288</span>, <span style="color: #cc66cc;">0.03706987906</span>, <span style="color: #cc66cc;">-0</span>.8364353589e<span style="color: #cc66cc;">-3</span>,
      <span style="color: #cc66cc;">-0</span>.2250947176e<span style="color: #cc66cc;">-3</span>, <span style="color: #cc66cc;">0</span>.6841218299e<span style="color: #cc66cc;">-5</span>, <span style="color: #cc66cc;">0</span>.5824238515e<span style="color: #cc66cc;">-5</span>,
      <span style="color: #cc66cc;">-0</span>.104527497e<span style="color: #cc66cc;">-5</span>, <span style="color: #cc66cc;">0</span>.8360937017e<span style="color: #cc66cc;">-7</span>, <span style="color: #cc66cc;">-0</span>.3231081277e<span style="color: #cc66cc;">-8</span>,
      <span style="color: #cc66cc;">0</span>.3657763036e<span style="color: #cc66cc;">-10</span>, <span style="color: #cc66cc;">0</span>.6936233982e<span style="color: #cc66cc;">-12</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &lt; <span style="color: #cc66cc;">0.0</span> || <span style="color: #cc66cc;">1.0</span> &lt; <span style="color: #0000ff;">$qn</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0.0</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> == <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0.0</span>;
&nbsp;
    <span style="color: #0000ff;">$w1</span> = <span style="color: #0000ff;">$qn</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &gt; <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #0000ff;">$w1</span> = <span style="color: #cc66cc;">1.0</span> - <span style="color: #0000ff;">$w1</span>;
&nbsp;
    <span style="color: #0000ff;">$w3</span> = - <a style="text-decoration: none;" href="http://www.php.net/log"><span style="color: #000066;">log</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">4.0</span> * <span style="color: #0000ff;">$w1</span> * <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1.0</span> - <span style="color: #0000ff;">$w1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #0000ff;">$w1</span> = <span style="color: #0000ff;">$b</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#93;</span>;
&nbsp;
    <span style="color: #b1b100;">for</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$i</span> = <span style="color: #cc66cc;">1</span>;<span style="color: #0000ff;">$i</span> &lt;= <span style="color: #cc66cc;">10</span>; <span style="color: #0000ff;">$i</span>++<span style="color: #66cc66;">&#41;</span>
      <span style="color: #0000ff;">$w1</span> += <span style="color: #0000ff;">$b</span><span style="color: #66cc66;">&#91;</span><span style="color: #0000ff;">$i</span><span style="color: #66cc66;">&#93;</span> * <a style="text-decoration: none;" href="http://www.php.net/pow"><span style="color: #000066;">pow</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w3</span>,<span style="color: #0000ff;">$i</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &gt; <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w1</span> * <span style="color: #0000ff;">$w3</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">return</span> - <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w1</span> * <span style="color: #0000ff;">$w3</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>The function takes three parameters: the number of positive votes, the total number of votes, and the power. The power can be adjusted: use 0.10 to have a 95% chance that your lower bound is correct, 0.05 to have a 97.5% chance, and so on. Sample usage:</p>
<pre class="php">sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,<span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">250</span>,<span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1000</span>,<span style="color: #cc66cc;">500</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #000000; font-weight: bold;">function</span> sample<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$n</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
  <a style="text-decoration: none;" href="http://www.php.net/echo"><span style="color: #000066;">echo</span></a> Rating::<span style="color: #006600;">ratingAverage</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$p</span>+<span style="color: #0000ff;">$n</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>Output:</p>
<table border="0" cellpadding="2" cellspacing="0">
<tbody>
<tr>
<th>Positive</th>
<th>Negative</th>
<th>Score</th>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0.20654931654388</td>
</tr>
<tr>
<td>100</td>
<td>50</td>
<td>0.58789756740385</td>
</tr>
<tr>
<td>250</td>
<td>100</td>
<td>0.6648317184611</td>
</tr>
<tr>
<td>1000</td>
<td>500</td>
<td>0.6424116916199</td>
</tr>
</tbody>
</table>
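<p>A score like this is mostly useful for ordering content. The sketch below is a compact, standalone version of the same lower-bound formula, used to rank three made-up items. It assumes a fixed z of 1.96 (roughly what the class above derives for a power of 0.05), so it does not need the quantile function. The name wilsonLowerBound() and the sample items are illustrative only.</p>
<pre class="php">&lt;?php
// Lower bound of the Wilson score interval for $positive out of $total
// votes. $z = 1.96 is an assumption made to keep the sketch short.
function wilsonLowerBound($positive, $total, $z = 1.96)
{
    if ($total == 0) {
        return 0.0;
    }
    $p  = $positive / $total;
    $z2 = $z * $z;
    $centre = $p + $z2 / (2 * $total);
    $margin = $z * sqrt(($p * (1 - $p) + $z2 / (4 * $total)) / $total);
    return ($centre - $margin) / (1 + $z2 / $total);
}

// Hypothetical items: name =&gt; array(positive votes, total votes).
$items = array(
    'one lucky vote' =&gt; array(1, 1),
    'steady'         =&gt; array(100, 150),
    'popular'        =&gt; array(250, 350),
);

$scores = array();
foreach ($items as $name =&gt; $votes) {
    $scores[$name] = wilsonLowerBound($votes[0], $votes[1]);
}
arsort($scores); // highest lower bound first, keys preserved

foreach ($scores as $name =&gt; $score) {
    printf("%-15s %.4f\n", $name, $score);
}
</pre>
<p>Sorting by the raw ratio of positive to total votes would put the item with a single vote first, at 100%. Sorting by the lower bound pushes it to the bottom until it has collected enough votes to back up its rating.</p>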
<p>On sites like Reddit, Digg, and the like, there is also a "freshness" element. The above solution might be a working model for content across the entire span of the site, but for a front page you will need to implement some form of "gravity". This can be done by taking the raw score and decaying it over time, like so:</p>
<pre class="php">&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> Rating
<span style="color: #66cc66;">&#123;</span>
  ...
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> gravityRating<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$time</span>, <span style="color: #0000ff;">$power</span> = <span style="color: #ff0000;">'0.05'</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$total</span> == <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0</span>;
    <span style="color: #b1b100;">return</span> <span style="color: #66cc66;">&#40;</span>Rating::<span style="color: #006600;">ratingAverage</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$power</span><span style="color: #66cc66;">&#41;</span> / <a style="text-decoration: none;" href="http://www.php.net/pow"><span style="color: #000066;">pow</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$time</span>,<span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span>
  ...
<span style="color: #66cc66;">&#125;</span>
&nbsp;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #ff0000;">'0.5'</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #ff0000;">'1'</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #ff0000;">'4'</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #ff0000;">'8'</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #ff0000;">'24'</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #000000; font-weight: bold;">function</span> sample<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$n</span>,<span style="color: #0000ff;">$time</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
  <a style="text-decoration: none;" href="http://www.php.net/echo"><span style="color: #000066;">echo</span></a> Rating::<span style="color: #006600;">gravityRating</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$p</span>+<span style="color: #0000ff;">$n</span>,<span style="color: #0000ff;">$time</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>In the example above, $time represents the age (in hours) and you can see the decay in the output:</p>
<p>0.83141271310867<br />
0.58789756740385<br />
0.29394878370192<br />
0.20785317827717<br />
0.12000408843024</p>
<p>My recommendation would be to "cap" the time so that decay stops after a fixed period, such as 12 or 24 hours. That limits the initial boost fresh content receives and lets scores normalize quickly. The rate of decay can, of course, be made as fast or as slow as you want, and the weighting you apply will vary from site to site. Depending on the volatility of your content, a front page whose "freshness" window spans a week would merit a week-long decay rather than a 12-hour one. Hopefully the above code is enough to get started with content rating, to make better use of user feedback, and to help webmasters rank their content more intelligently than the traditional "5-Star Rating" allows.</p>
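<p>One way to express such a cap is to clamp the age before applying the decay. The following is only a sketch: cappedGravity() is a hypothetical helper, and the 24-hour cap and half-hour floor are assumptions picked for illustration. The floor is my own addition to avoid dividing by zero for brand-new items.</p>
<pre class="php">&lt;?php
// Hypothetical variant of gravityRating(): $score is any raw rating and
// $ageHours is the age of the item. The default cap and floor values are
// assumptions, not taken from the Rating class.
function cappedGravity($score, $ageHours, $capHours = 24.0, $floorHours = 0.5)
{
    // Clamp the age: the floor avoids dividing by zero for new items,
    // the cap stops the decay once an item is older than $capHours.
    $age = max($floorHours, min($ageHours, $capHours));
    return $score / pow($age, 0.5);
}

foreach (array(0, 1, 4, 24, 72) as $hours) {
    printf("%3d h  %.4f\n", $hours, cappedGravity(0.5879, $hours));
}
</pre>
<p>With these defaults, an item that is 72 hours old scores the same as one that is 24 hours old, so older content is ordered by its rating alone.</p>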
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/09/01/php-content-rating-confidence/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>PHP Google Analytics API</title>
		<link>http://www.derivante.com/2009/07/08/php-google-analytics-api/</link>
		<comments>http://www.derivante.com/2009/07/08/php-google-analytics-api/#comments</comments>
		<pubDate>Wed, 08 Jul 2009 15:19:50 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=707</guid>
		<description><![CDATA[GAPI 1.3 was released this past month. First, read the Google Analytics Data API Reference and then read up on dimensions, metrics, and valid combinations of the two. The quotas apply to a single web property, so each analytics profile (site1, site2, etc.) is subject to its own individual quota. There is no per-user account [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.google.com/p/gapi-google-analytics-php-interface/">GAPI 1.3</a> was released this past month. First, read the <a href="http://code.google.com/apis/analytics/docs/gdata/gdataReference.html">Google Analytics Data API Reference</a> and then read up on <a href="http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDimensionsMetrics.html">dimensions, metrics, and valid combinations</a> of the two. The quotas apply to a single web property, so each analytics profile (site1, site2, etc.) is subject to its own individual quota. There is no per-user account limit for accessing the API. If you plan on building any kind of on-demand application, I suggest you query in bulk and cache the results locally (Analytics is not real time anyway).</p>
<p><span id="more-707"></span></p>
<p><strong>API Features</strong></p>
<ul>
<li>Supports CURL and fopen HTTP access methods, with autodetection </li>
<li>PHP arrays for Google Analytics metrics and dimensions </li>
<li>Account data object mapping - get methods for parameters </li>
<li>Report data object mapping - get methods for metrics and parameters </li>
<li>Easy filtering, use a GAPI query language for Google Analytics filters </li>
<li>Full PHP5 Object Oriented code, ready for use in your PHP application </li>
</ul>
<p><br></p>
<p><strong>Google Quotas</strong></p>
<ul>
<li>The quota applies to a <em>single web property</em></li>
<li>10,000 requests per 24 hours</li>
<li>100 requests in any given 10-second period</li>
<li>A query is also subject to pagination limits: at most 10,000 entries per feed, with a default response of 1,000 entries</li>
</ul>
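<p>To stay under the 10-second limit when querying in bulk, the requests can be spaced out on the client side.  What follows is only a minimal sketch of a sliding-window throttle: the RequestThrottle class and its method names are hypothetical and not part of GAPI, and the window arithmetic is my own reading of the quota.</p>
<pre class="php">// Hypothetical helper, not part of GAPI: allows at most $limit requests
// in any $window-second period (defaults mirror the 100-per-10-seconds quota).
class RequestThrottle
{
    private $limit;
    private $window;
    private $stamps = array(); // timestamps of recent requests, oldest first

    public function __construct($limit = 100, $window = 10.0)
    {
        $this-&gt;limit  = $limit;
        $this-&gt;window = $window;
    }

    // Seconds to wait before a request at time $now fits under the limit
    // (0.0 means it can be sent right away).
    public function delayFor($now)
    {
        // Forget requests that have aged out of the window.
        while ($this-&gt;stamps &amp;&amp; $now - $this-&gt;stamps[0] &gt;= $this-&gt;window) {
            array_shift($this-&gt;stamps);
        }
        if (count($this-&gt;stamps) &lt; $this-&gt;limit) {
            return 0.0;
        }
        return $this-&gt;stamps[0] + $this-&gt;window - $now;
    }

    // Remember that a request was sent at time $now.
    public function record($now)
    {
        $this-&gt;stamps[] = $now;
    }
}

$throttle = new RequestThrottle();
$wait = $throttle-&gt;delayFor(microtime(true));
if ($wait &gt; 0) {
    usleep((int) ceil($wait * 1000000));
}
$throttle-&gt;record(microtime(true));
// ... send the API request here ...</pre>
<p>Passing the clock in as an argument keeps the class easy to test.  The daily limit of 10,000 requests would need a separate counter that survives between runs, which this sketch does not attempt.</p>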
<p><br></p>
<p>While the broad set of data available is extremely useful to pull down for your own processing, I find the custom visitor segments exposed through the API a lot more interesting.    If you have a user-based site with registration and collect information on your users, you can pull this data back out to learn a lot about them.   Analytics already has many reporting tools built in, but the ability to mash this data up into custom reports for your users (customers), without giving them access to Analytics itself, is key to delivering metrics and performance data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/07/08/php-google-analytics-api/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>PHP Rapid Application Development with ZF1.8 and AR</title>
		<link>http://www.derivante.com/2009/05/21/php-rapid-application-development-with-zf18-and-ar/</link>
		<comments>http://www.derivante.com/2009/05/21/php-rapid-application-development-with-zf18-and-ar/#comments</comments>
		<pubDate>Thu, 21 May 2009 19:02:42 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Framework]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[ORM]]></category>
		<category><![CDATA[php 5.3]]></category>
		<category><![CDATA[phpactiverecord]]></category>
		<category><![CDATA[ZendFramework]]></category>
		<category><![CDATA[Zend_Tool]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=627</guid>
		<description><![CDATA[In the days since I began programming in PHP the web has come a long way. With the 5.3 release of PHP, the OOP side of things is finally getting a much-needed polish. In the past year there's been a steady rise in the usage of Ruby on Rails for web development as programmers [...]]]></description>
			<content:encoded><![CDATA[<p>In the days since I began programming in PHP the web has come a long way.  With the 5.3 release of PHP, the OOP side of things is finally getting a much-needed polish.  In the past year there's been a steady rise in the usage of Ruby on Rails for web development as programmers are discovering that coming up with an idea for a site is far more enjoyable than programming one.  Rails offers a very rapid application development environment to get from concept to code in a minimal number of steps.  Unfortunately, PHP has been lacking in this regard. PHP needs a standard framework and easy database interaction. Currently, the steps required to go from concept to code bring with them a huge overhead in scaffolding development.</p>
<p><span id="more-627"></span></p>
<p>It is precisely this overhead that needs to be eliminated.  Time is money, and if you are starting a new application on a weekly or monthly basis the scaffolding overhead eats up a great deal of that time.  On top of that, most web applications today spend much of their time going back and forth to a database, and writing the code that fetches and stores data makes up a significant chunk of overall development.  2009 is a big step forward for PHP and there are tools out there that tackle this problem.</p>
<p>
Enter ZendFramework 1.8. Just recently released, it represents Zend's commitment to make PHP a competitive environment for Rapid Application Development (totally RAD).  There are several new tools included within this release, primarily Zend_Tool, Zend_Application, and Zend_Navigation. Zend_Tool_Project allows for the automated creation of new projects complete with a ready-to-go application shell.  You can liken this shell to a new Hello World! project with the core structure and scaffolding set up.  Rather than going into the details of everything included within the release here, I'll point you to the 1.8 release notes for a thorough overview.</p>
<p>This brings me to what I'm really excited about. Recently we pushed out <a href="http://www.derivante.com/2009/05/14/php-activerecord-with-php-53/">PHP ActiveRecord</a> for everyone!  This tutorial goes over the steps necessary to build your environment.  You'll be able to create new projects within the ZendFramework with <a href="http://www.derivante.com/2009/05/14/php-activerecord-with-php-53/">PHP ActiveRecord</a>. I believe developers should spend their time dealing with application logic and not be burdened with configuration, bootstrapping or custom SQL.  For this tutorial I am using the following environment:</p>
<p>- Ubuntu 9.04<br />
- PHP 5.3-RC1<br />
- MySQL 5.4<br />
- Apache 2.2.11<br />
- ZendFramework 1.8 min<br />
- php-activerecord 0.9</p>
<p>To keep this tutorial short, I won't cover compiling PHP 5.3; there are already several resources out there that deal with it.  I'll assume you've already set up your server with MySQL, PHP and Apache ready to go.  You can use nginx or even ZendServer CE if you prefer, although I personally prefer Apache. It doesn't matter what you're serving PHP with, and the choice will probably depend on what your existing environment is using.</p>
<p><strong>Installing ZendFramework 1.8 and ActiveRecord</strong></p>
<p>We're going to want both libraries in our PHP include directory.  First you should download the latest ZF release, 1.8.1 at the time of this post.   When you have it ready to go on your server it is important to make sure your environment is working.  If you installed PHP 5.3 correctly you should be able to get the following output from CLI:</p>
<pre># php -v
PHP 5.3.0RC1 (cli) (built: May  3 2009 15:31:21)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2009 Zend Technologies
# php -i | grep include_path
include_path => .:/usr/local/php/lib => .:/usr/local/php/lib</pre>
<p>Don't worry if your include path is elsewhere, just make a note of where it is since we'll be dumping the Zend stuff there. Now to extract and install Zend:</p>
<pre># tar xzfv ZendFramework-1.8.1-minimal.tar.gz
# mv ZendFramework-1.8.1-minimal/library/Zend /usr/local/php/lib/
# mv ZendFramework-1.8.1-minimal/bin/zf.* /usr/local/php/bin/
# ln -s /usr/local/php/bin/zf.sh /usr/bin/zf
# zf show version
Zend Framework Version: 1.8.1</pre>
<p>I opted to put zf.sh into my PHP bin directory and symlink it into /usr/bin.  The next step is to download the php-activerecord source from <a href="http://github.com/kla/php-activerecord/">github</a> and unpack it into your include path in a directory called php-activerecord. Our dependencies are there and we can move on to our first project.</p>
<p><strong>Create Project</strong></p>
<p>Now that everything is installed, we want to test out zf create project.  The following example is a bare bones project ready for us to extend:</p>
<pre># zf create project /var/www/newproject
Creating project at /var/www/newproject
# ls /var/www/newproject/
application  library  public  tests</pre>
<p>That is it, you are ready for the web.  Configure your web server to start serving out pages from the location you set for your new project.  The docroot path should be /var/www/newproject/public.  Load up your web browser and you should see your page.</p>
<p><strong>Loading up ActiveRecord</strong></p>
<p>Inside your application/Bootstrap.php you might notice the code is rather bare.  When the application bootstraps, Zend runs every method in the Bootstrap class whose name starts with _init.  To tack on ActiveRecord, we simply add the following:</p>
<pre class="php"><span style="color: #000000; font-weight: bold;">class</span> Bootstrap <span style="color: #000000; font-weight: bold;">extends</span> Zend_Application_Bootstrap_Bootstrap
<span style="color: #66cc66;">&#123;</span>
    protected <span style="color: #000000; font-weight: bold;">function</span> _initActiveRecord<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #b1b100;">require_once</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;php-activerecord/ActiveRecord.php&quot;</span><span style="color: #66cc66;">&#41;</span>;
        ActiveRecord\Config::<span style="color: #006600;">initialize</span><span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$cfg</span><span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #0000ff;">$cfg</span>-&gt;<span style="color: #006600;">set_model_directory</span><span style="color: #66cc66;">&#40;</span>APPLICATION_PATH . <span style="color: #ff0000;">'/models'</span> <span style="color: #66cc66;">&#41;</span>;
            <span style="color: #0000ff;">$cfg</span>-&gt;<span style="color: #006600;">set_connections</span><span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span>
                <span style="color: #ff0000;">'development'</span> =&gt; <span style="color: #ff0000;">'mysql://user:password@localhost/newproject'</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre>
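<p>The connect strings in the bootstrap above do not have to be hard-coded.  As a minimal sketch, PHP's built-in parse_ini_string (available from 5.3) can turn INI text into the array that set_connections expects.  The [activerecord] section, the connections.* key names and the connections_from_ini helper are all assumptions made up for this example; they are not a Zend or php-activerecord convention.</p>
<pre class="php">// Hypothetical INI layout; in a real project this text would live in a file.
$ini = &lt;&lt;&lt;INI
[activerecord]
connections.development = "mysql://user:password@localhost/newproject"
connections.production = "mysql://user:password@db.example.com/newproject"
INI;

// Collects every "connections.NAME" key of one section into an array of
// NAME =&gt; connect string.
function connections_from_ini($iniText, $section)
{
    $parsed = parse_ini_string($iniText, true); // true = keep sections
    $prefix = 'connections.';
    $connections = array();
    foreach ($parsed[$section] as $key =&gt; $value) {
        if (strpos($key, $prefix) === 0) {
            $connections[substr($key, strlen($prefix))] = $value;
        }
    }
    return $connections;
}

$connections = connections_from_ini($ini, 'activerecord');
// $connections['development'] now holds the development connect string.</pre>
<p>Inside _initActiveRecord the resulting array would be handed to $cfg-&gt;set_connections(); the closure needs use ($connections) to see the variable.</p>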
<p>You will need to adjust your SQL connect strings, and at this point if you want to restructure your model directory differently you may do so.  There are only two configuration points, both of which you could pass from an XML or INI config.  Even though ActiveRecord is loaded, we still need to test out that everything is working. In order to do this we will need to create some test data:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">DATABASE</span> newproject;
<span style="color: #993333; font-weight: bold;">USE</span> newproject;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> books<span style="color: #66cc66;">&#40;</span>
    id int <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>,
    name varchar<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>,
    author varchar<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> books<span style="color: #66cc66;">&#40;</span>id,name,author<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">VALUES</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,<span style="color: #ff0000;">'How to be Angry'</span>,<span style="color: #ff0000;">'Jax'</span><span style="color: #66cc66;">&#41;</span>;</pre>
<p>The next step is to create a model for the books table to be used from within your application. In your model directory that you set within the configuration above, create a Book.php with the following:</p>
<pre class="php"><span style="color: #000000; font-weight: bold;">class</span> Book <span style="color: #000000; font-weight: bold;">extends</span> ActiveRecord\Model <span style="color: #66cc66;">&#123;</span> <span style="color: #66cc66;">&#125;</span></pre>
<p>And finally to test out that everything is working, within your IndexController.php for indexAction add the following:</p>
<pre class="php"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> indexAction<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">view</span>-&gt;<span style="color: #006600;">book</span> = Book::<span style="color: #006600;">find</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span></pre>
<p>And so we can actually see the output, edit your /views/scripts/index/index.phtml with the following:</p>
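<pre class="php">Today I read <span style="color: #000000; font-weight: bold;">&lt;?</span>= <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">book</span>-&gt;<span style="color: #006600;">name</span> ?&gt; by <span style="color: #000000; font-weight: bold;">&lt;?</span>= <span style="color: #0000ff;">$this</span>-&gt;<span style="color: #006600;">book</span>-&gt;<span style="color: #006600;">author</span> ?&gt; and it was terrible.</pre>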
<p>As you can see, you can read the variable that the action assigned to the view, or you can query right from the view itself if you please.  If you get the following output, everything is good to go:</p>
<pre>Today I read How to be Angry by Jax and it was terrible.</pre>
<p><strong>Next Steps</strong></p>
<p>Going forward, there is plenty of room to extend this setup.  Unfortunately, the Zend_Tool_Project abstraction documents are still being completed but there are examples out there on how to add new commands to it.  One new command I am working on is the automated creation of AR models by passing a database connect string to simplify the process even further.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/05/21/php-rapid-application-development-with-zf18-and-ar/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>PHP/KML Polyline Simplification with Douglas-Peucker</title>
		<link>http://www.derivante.com/2009/04/20/phpkml-polyline-simplification-with-douglas-peucker/</link>
		<comments>http://www.derivante.com/2009/04/20/phpkml-polyline-simplification-with-douglas-peucker/#comments</comments>
		<pubDate>Tue, 21 Apr 2009 02:41:16 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Technology]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=331</guid>
		<description><![CDATA[Quality GIS data sometimes comes with a lot more precision than what is usable for Google Maps (or other mapping software). The problem lies in the number of points representing a polygon that you want to overlay. A county representation for a state might include 100,000 points, which is not usable without some form of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-425" style="margin: 15px;" title="php-med-trans-light" src="http://www.derivante.com/wp-content/uploads/2009/05/php-med-trans-light.gif" alt="php-med-trans-light" width="95" height="51" />Quality GIS data sometimes comes with a lot more precision than what is usable for Google Maps (or other mapping software). The problem lies in the number of points representing a polygon that you want to overlay. A county representation for a state might include 100,000 points, which is not usable without some form of reduction. Luckily there is an algorithm that solves that problem: Douglas-Peucker.</p>
<p>The algorithm simplifies a polyline by removing vertices that do not contribute (sufficiently) to the overall shape. It is a recursive process which finds the most important vertices for every given reduction. First, the most basic reduction is assumed: a single segment connecting the beginning and end of the original polyline. Then the recursion starts. The most significant vertex (the most distant one) for this segment is found and, when the distance from this vertex to the segment exceeds the reduction tolerance, the segment is split into two sub-segments, each inheriting a subset of the original vertex list. Each segment continues to subdivide until none of the vertices in its local list is further away than the tolerance value.</p>
<p>There is a PHP class that does just this: <a href="http://www.fonant.com/demos/douglas_peucker/algorithm">Douglas-Peucker Polyline Simplification in PHP</a> by <a href="http://www.fonant.com/">Anthony Cartmell</a>. Depending on the original quality of the data and the tolerance level, I was able to achieve a 90-93% reduction in size. This reduction allows me to represent significantly more data at a reasonable performance level to clients. Keep in mind that this reduction removes data from the coordinate array, so the quality of your representation goes down as the tolerance, and with it the reduction, goes up. I highly suggest that you play around with the tolerance until you find a good balance between data size and image quality.</p>
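<p>To make the recursion described above concrete, here is a minimal sketch of it.  It is not the code from the class linked above: the function names are hypothetical, points are plain array(x, y) pairs, coordinates are treated as planar, and distance is measured to the line through the two endpoints.</p>
<pre class="php">// Perpendicular distance from point $p to the line through $a and $b.
function perpendicular_distance($p, $a, $b)
{
    $dx = $b[0] - $a[0];
    $dy = $b[1] - $a[1];
    $len = sqrt($dx * $dx + $dy * $dy);
    if ($len == 0.0) {
        // Degenerate segment: fall back to point-to-point distance.
        return sqrt(pow($p[0] - $a[0], 2) + pow($p[1] - $a[1], 2));
    }
    return abs($dy * ($p[0] - $a[0]) - $dx * ($p[1] - $a[1])) / $len;
}

function douglas_peucker($points, $tolerance)
{
    $n = count($points);
    if ($n &lt; 3) {
        return $points;
    }
    // Find the vertex farthest from the segment joining the endpoints.
    $maxDist = 0.0;
    $index = 0;
    for ($i = 1; $i &lt; $n - 1; $i++) {
        $d = perpendicular_distance($points[$i], $points[0], $points[$n - 1]);
        if ($d &gt; $maxDist) {
            $maxDist = $d;
            $index = $i;
        }
    }
    if ($maxDist &lt;= $tolerance) {
        // Nothing sticks out far enough: keep only the endpoints.
        return array($points[0], $points[$n - 1]);
    }
    // Split at the farthest vertex and simplify both halves.
    $left  = douglas_peucker(array_slice($points, 0, $index + 1), $tolerance);
    $right = douglas_peucker(array_slice($points, $index), $tolerance);
    // The split vertex ends $left and starts $right; drop one copy.
    array_pop($left);
    return array_merge($left, $right);
}

$line = array(array(0, 0), array(1, 0.05), array(2, 0), array(3, 3), array(4, 0));
$simplified = douglas_peucker($line, 0.1);
// The small bump at (1, 0.05) is dropped; the spike at (3, 3) survives.</pre>
<p>Because the recursion copies sub-arrays with array_slice, a very large polyline may be better served by an index-based or iterative variant.</p>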
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/04/20/phpkml-polyline-simplification-with-douglas-peucker/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>PHP GIS Functions</title>
		<link>http://www.derivante.com/2009/04/14/php-gis-functions/</link>
		<comments>http://www.derivante.com/2009/04/14/php-gis-functions/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 17:35:53 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=313</guid>
		<description><![CDATA[I have been working a lot with PHP and GIS consulting for CitySquares and the History Engine. I found searching for everything I needed to do basic processing &#38; Google Integration tedious and painful. So here is a collection of common functions that helped me get through the massaging of the data and get it ready [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-425" style="margin: 15px;" title="php-med-trans-light" src="http://www.derivante.com/wp-content/uploads/2009/05/php-med-trans-light.gif" alt="php-med-trans-light" width="95" height="51" />I have been working a lot with PHP and GIS consulting for <a href="http://citysquares.com">CitySquares </a>and the <a href="http://historyengine.richmond.edu/">History Engine</a>.  I found searching for everything I needed to do basic processing &amp; Google Integration tedious and painful.  So here is a collection of common functions that helped me get through the <em>massaging</em> of the data and get it ready for integration.</p>
<ul style="padding-left: 20px;padding-bottom: 10px;"> <strong><a href="http://www.derivante.com/files/phpgis.txt">pnPoly</a></strong> - Used to determine if a coordinate falls inside a polygon.<br />
<strong><a href="http://www.derivante.com/files/phpgis.txt">Centroid </a></strong>- Find the center of a polygon.<br />
<strong><a href="http://www.derivante.com/files/phpgis.txt">Area</a></strong> - Calculate the area of a polygon.<br />
<strong><a href="http://www.derivante.com/files/geocoder.txt">googleGeoCoder </a></strong> - Looks up GIS information for an address through Google Maps.<br />
<strong><a href="http://www.derivante.com/files/polyline.txt">PolylineEncoder</a></strong> - Takes a set of coordinates and encodes it for Google Maps.</ul>
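<p>To give an idea of what the first and third functions in the list do, here is a rough sketch of the usual techniques: ray casting for the point-in-polygon test and the shoelace formula for the area.  This is not the code from the linked files; the function names are hypothetical and the coordinates are treated as planar, so an area computed from lat/lng pairs comes out in square degrees.</p>
<pre class="php">// Ray-casting point-in-polygon test. $polygon is a list of array(x, y)
// vertices; the closing edge back to the first vertex is implied.
function point_in_polygon($x, $y, $polygon)
{
    $inside = false;
    $n = count($polygon);
    for ($i = 0, $j = $n - 1; $i &lt; $n; $j = $i++) {
        list($xi, $yi) = $polygon[$i];
        list($xj, $yj) = $polygon[$j];
        // Does a horizontal ray from ($x, $y) cross the edge from j to i?
        if ((($yi &gt; $y) != ($yj &gt; $y))
            &amp;&amp; ($x &lt; ($xj - $xi) * ($y - $yi) / ($yj - $yi) + $xi)) {
            $inside = !$inside;
        }
    }
    return $inside;
}

// Shoelace formula; abs() makes the vertex order irrelevant.
function polygon_area($polygon)
{
    $sum = 0.0;
    $n = count($polygon);
    for ($i = 0, $j = $n - 1; $i &lt; $n; $j = $i++) {
        $sum += $polygon[$j][0] * $polygon[$i][1]
              - $polygon[$i][0] * $polygon[$j][1];
    }
    return abs($sum) / 2.0;
}

$square = array(array(0, 0), array(4, 0), array(4, 4), array(0, 4));
$isInside = point_in_polygon(2, 2, $square); // a point in the middle
$area = polygon_area($square);</pre>
<p>Points that fall exactly on an edge can land on either side with this test, so treat boundary cases with care.</p>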
<p>If you ran into the problem I did, which is that a lot of the data comes in the form of shp/dbf files and needs to be parsed into something friendlier, either KML or CSV, there are a couple of solutions.  If your source coordinates are already in lat/lng, you can parse out the data with <a href="http://www.obviously.com/gis/shp2text/">shp2text</a>. If you have a different coordinate system and use ArcGIS, you can try the plugin <a href="http://arcscripts.esri.com/details.asp?dbid=14273">Export to KML 2.5.3</a> to help export the data with the ESRI suite of products.</p>
<p>Once your data is in SQL, the following query is an example of distance sorting with SQL. You can grab a copy of the zip_codes database <a href="http://www.derivante.com/files/zip_codes.sql.gz">here</a> and play around with it.</p>
<pre style="padding-left: 20px;padding-bottom: 10px;" lang="sql">SELECT *,
sqrt((69.1 * (37.6 - latitude)) * (69.1 * (37.6 - latitude)) +
(53.0 * (-77.6 - longitude)) * (53.0 * (-77.6 - longitude)))
AS distance
FROM `zip_codes`
HAVING distance &lt; 10
ORDER BY distance ASC</pre>
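<p>The query treats the area around the search point as flat.  The factor 69.1 is roughly the number of miles in one degree of latitude, and 53.0 appears to play the same role for longitude (it is close to 69.1 times the cosine of 40 degrees), so the distance comes out in miles.  Here is the same arithmetic as a PHP sketch; the function names are hypothetical, and the cosine scaling in the second function is a common refinement rather than something the query itself does.</p>
<pre class="php">// Same flat estimate as the query: degrees are scaled to miles with fixed
// factors and combined with Pythagoras.
function approx_miles($lat1, $lng1, $lat2, $lng2, $milesPerLngDegree = 53.0)
{
    $dy = 69.1 * ($lat2 - $lat1);
    $dx = $milesPerLngDegree * ($lng2 - $lng1);
    return sqrt($dx * $dx + $dy * $dy);
}

// A degree of longitude shrinks as you move away from the equator;
// 69.1 * cos(latitude) estimates its length in miles.
function miles_per_lng_degree($lat)
{
    return 69.1 * cos(deg2rad($lat));
}

// One degree east of the search point used in the query.
$fixed    = approx_miles(37.6, -77.6, 37.6, -76.6);
$adjusted = approx_miles(37.6, -77.6, 37.6, -76.6, miles_per_lng_degree(37.6));</pre>
<p>For short distances like the 10 mile radius in the query the flat estimate is usually good enough; over long distances a great-circle formula such as haversine is the safer choice.</p>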
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/04/14/php-gis-functions/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Bayesian filter training with N-gram</title>
		<link>http://www.derivante.com/2009/03/31/bayesian-filter-training-with-n-gram/</link>
		<comments>http://www.derivante.com/2009/03/31/bayesian-filter-training-with-n-gram/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 04:09:19 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=274</guid>
		<description><![CDATA[Bayesian filtering is based on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event (link). A probability value is then assigned to each word or token; the probability is based on calculations that take into account [...]]]></description>
			<content:encoded><![CDATA[<p>Bayesian filtering is based on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event (<a href="http://support.gfi.com/manuals/en/me12/me12manual.1.13.html">link</a>).  A probability value is then assigned to each word or token; the probability is based on calculations that take into account how often that word occurs in one category or another.  The most common application of the filter is identifying words that appear in spam versus legitimate emails. A word by itself is often useless without the context it was used in.</p>
<p>There is a whole suite of tools that can break down content to help improve the filter by supplementing it not only with a database of words to categories, but also with sets of <a href="http://en.wikipedia.org/wiki/N-gram">N-grams</a> derived from the text.   There are several scripts out there that will help with this extraction, and it offers a few more layers of depth for Bayesian filtering.  One such tool is the <a href="http://ngram.sourceforge.net/">Ngram Statistics Package (NSP)</a>, which is easy to install and run.</p>
<p><span id="more-274"></span><br />
I ran a very basic test against an older <a href="http://www.derivante.com/2009/01/26/there-and-back-again-an-ec2-mysql-cluster/">post</a> to see how it does with bigram extraction.</p>
<p># perl bin/count.pl --ngram 2 test.cnt test.txt<br />
# perl statistic.pl --ngram 2 dice test.res test.cnt</p>
<p>Sample bigrams found:</p>
<p>cloud computing, master slave, groups online, Back Again, made absolutely, very costly, extensive development, hefty bill, start ups, distribution awareness</p>
<p>Rather than only scoring the probability that the individual words fit into one category (in this case, "Technology"), we can now compound the score with the probability that the bigrams above fall into that category as well.  For another layer of scoring, trigrams, 4-grams, etc. can be extracted.  In the financial sector the terminology is thick and analysis will be almost impossible without N-gram extraction.  "Filed for bankruptcy" and "avoided bankruptcy" could not be further apart.  With traditional filtering, the word "bankruptcy" would be meaningless: without context it is not an indicator of whether the article is favorable or not.  By extracting the phrases, the filter can understand and appropriately score the difference between the two terms.</p>
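<p>For a feel of what bigram extraction involves, here is a minimal sketch in PHP.  It is not how NSP works internally: the function names are hypothetical, tokens are simply runs of lowercase letters, digits and apostrophes, and pairs are allowed to span sentence boundaries.</p>
<pre class="php">// Splits text into lowercase word tokens and returns every adjacent pair.
function bigrams($text)
{
    preg_match_all('/[a-z0-9\']+/', strtolower($text), $matches);
    $words = $matches[0];
    $pairs = array();
    for ($i = 0; $i + 1 &lt; count($words); $i++) {
        $pairs[] = $words[$i] . ' ' . $words[$i + 1];
    }
    return $pairs;
}

// Counts how often each bigram occurs in the text.
function bigram_counts($text)
{
    return array_count_values(bigrams($text));
}

$pairs  = bigrams('Filed for bankruptcy');
$counts = bigram_counts('avoided bankruptcy and avoided bankruptcy again');</pre>
<p>The resulting phrases can be fed to the filter as extra tokens next to the single words, which is what lets it tell "filed for bankruptcy" apart from "avoided bankruptcy".</p>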
<p>Paul Graham has been working on <a href="http://www.bgl.nu/bogofilter/graham.html">improving </a>the Bayesian filter to deal with spam by splitting the data into categories.  Text is classified as legitimate or spam based not only on the context of the message, but also on the likelihood of tokens appearing in various parts of the message.  N-gram filtering would not work as well for spam, as the number of grammar mistakes, misspellings, and word-order changes would make any benefit worthless.   Spammers are adjusting their content to beat such filters all the time.  When the source data is reliable, the N-gram addition to the filter will boost categorization accuracy.</p>
<p>Integration with traditional Bayesian filtering is very easy.  <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">Google</a> has been using text processing for a while now.  This is a huge area of study in linguistics, language processing and machine learning.  With so much data out there and more being collected on a daily basis, deriving context from text will allow applications to behave smarter.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/03/31/bayesian-filter-training-with-n-gram/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SEO: Taking control of search</title>
		<link>http://www.derivante.com/2009/03/30/seo-taking-control-of-search/</link>
		<comments>http://www.derivante.com/2009/03/30/seo-taking-control-of-search/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 20:56:05 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=265</guid>
		<description><![CDATA[In my experience the majority of web agencies and developers still do not take search seriously enough. Most businesses have very simple requests, "How do I show up for keyword for people in the area", "How do I show up higher than my competitor on searches", and "How do people find my site". The web [...]]]></description>
			<content:encoded><![CDATA[<p>In my experience the majority of web agencies and developers still do not take search seriously enough.  Most businesses have very simple requests, "How do I show up for <em>keyword </em>for people in the area", "How do I show up higher than my competitor on searches", and "How do people find my site".  The web is an economy and driving consumers to business on the internet is a highly desired skill set.  Consistently controlling the results of Google will be impossible and there is always room for improvement for every site.</p>
<p>Every developer will grow their own set of tools, but the core components are available for free.  Google offers <a href="https://www.google.com/analytics/">analytics</a> to take control of your traffic performance, sources, and patterns.  There is also <a href="https://www.google.com/analytics/">Adwords Keyword Tool</a>, which will help you target search phrases, volume, and competition.   Based on these factors and a list of similar keywords you will be able to identify good opportunities to compete for relevant traffic.  There is also the <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35769">Webmaster guidelines</a> published by Google that will give you a general best practice for search engines.</p>
<p>This process requires a lot of patience.  It takes time for changes to take shape and for results to be delivered.  When making changes to any site, or even designing a new site with SEO built in, user traffic is not going to happen right away.  Seeing the results come in will trigger a compulsion to check Analytics, forever make improvements, and identify new markets and opportunities.   The vast majority of web sites are there for user consumption.  SEO became big business when a lot of people all at once figured out that users translated to consumers.</p>
<p>Google is the search leader, therefore they offer the highest return.  They control the flow of traffic on the internet.  Luckily, they also published a <a href="http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf">search engine optimization starter guide</a> in pdf format! This is the 101 of SEO, and it is pointless to chase down every obscure reference and tip on the countless SEO sites out there when the components of their content analysis are available all in one place.  The document is a general overview but offers some very important best-practice rules that are easy to implement:</p>
<p><strong>Title Tags</strong></p>
<p>- Choose a title that effectively communicates the topic of the page's content.<br />
- Create unique title tags for each page<br />
- Use brief, but descriptive titles (limit of 66 characters or 12 keywords)</p>
<p><strong>Description Tags</strong></p>
<p>- Accurately summarize the page's content<br />
- Use unique descriptions for each page<br />
- Avoid filling the description with only keywords<br />
- Avoid copy and pasting the entire content of the document into the description meta tag</p>
<p><strong>URL structure</strong></p>
<p>- Use words in URLs<br />
- Create a simple directory structure<br />
- Provide one version of a URL to reach a document<br />
- Use lower-case URLs (many users expect them and remember them better)</p>
<p><strong>Site Navigation</strong></p>
<p>- Create a naturally flowing hierarchy<br />
- Use mostly text for navigation<br />
- Use "breadcrumb" navigation<br />
- Put an HTML sitemap page on your site, and use an XML Sitemap file<br />
- Consider what happens when a user removes part of your URL<br />
- Have a useful 404 page</p>
<p><strong>Anchor Text (Links)</strong></p>
<p>- Choose descriptive text<br />
- Write concise text<br />
- Format links so they're easy to spot</p>
<p><strong>Heading Text</strong></p>
<p>- There are six sizes of heading tags, beginning with &lt;h1&gt;, the most important, and ending with &lt;h6&gt;, the  least important.<br />
- Imagine you're writing an outline<br />
- Use headings sparingly across the page<br />
- Avoid using heading tags only for styling text and not presenting structure<br />
- Avoid excessively using heading tags throughout the page</p>
<p><strong>Other Confirmed Ranking Factors</strong></p>
<p>- Keyword in URL<br />
- Keyword in Domain name<br />
- Freshness of Pages<br />
- Freshness - Amount of Content Change<br />
- Freshness of Links<br />
- Site Age<br />
- Anchor text of inbound link<br />
- <a href="ftp://ftp.cs.toronto.edu/pub/reports/csri/405/hilltop.html">Hilltop Algorithm</a><br />
- Domain Registration Time</p>
<p>There is a lot of helpful content in the document, but it does not go deep into the inner mechanics like other sites attempt to do.   There are several sites out there that try to go beyond what has been published and into the details of generating traffic; you would just need to google "<a href="http://www.google.com/search?q=google+seo+rules">Google Ranking Factors</a>".  A lot of this information came out when Google published US Patent Application <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PG01&amp;s1=20050071741&amp;OS=20050071741&amp;RS=20050071741">#20050071741</a>.</p>
<p>Use the above as a baseline of the steps to get your site more traffic. This is a topic that is constantly being updated as search improves, and it requires a lot of time and research to do efficiently.  Overhauling existing projects to meet the standards of today's crawlers is tedious, boring, and offers no immediate results.  It has been something I avoided in the past, but for a web site to stay competitive and, more importantly, be seen, it has to be found.  I find that having some good rules in place for how to deal with SEO makes new projects much easier to deal with going forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/03/30/seo-taking-control-of-search/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bayesian Filtering &amp; Financial Applications</title>
		<link>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/</link>
		<comments>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/#comments</comments>
		<pubDate>Fri, 27 Mar 2009 17:08:17 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bayesian]]></category>
		<category><![CDATA[content analysis]]></category>
		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=257</guid>
		<description><![CDATA[A friend of mine and I recently started a new project. After kicking around several ideas we finally reached a consensus on applying software prediction to financial data. This has been pursued pretty heavily but from a home brew stand point, we wanted to make software that could compete by mashing up existing data and [...]]]></description>
			<content:encoded><![CDATA[<p>A friend of mine and I recently started a new project.  After kicking around several ideas we finally reached a consensus on applying software prediction to financial data.  This has been pursued pretty heavily, but from a home-brew standpoint we wanted to build software that could compete by mashing up existing data and technology available on the internet.</p>
<p>We intend to predict the movement of stocks based on real-time content analysis.  This requires a good deal of machine learning and historical data, and even good content analysis is not enough on its own.  Using Bayesian filtering with noise word reduction, we plan to process historical data and assign each piece of content to one of three categories: moveup, movedown, nomove.  To train the filters, past press releases will be fed into the filter together with the matching stock data to track how the markets reacted to the content.  Over time, the software will be able to recognize keywords that trigger positive versus negative sentiment in the market and drive the price one way or the other.  A score can be applied much like spam scores are applied, and this number can be used as part of a greater overall algorithm to determine an action.</p>
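<p>As a rough sketch of how a press release could be assigned to one of the three categories for training, the snippet below labels it by the relative price change that followed.  The function name and the 1% threshold are assumptions made for illustration, not something we have settled on.</p>

```python
def label_reaction(price_before, price_after, threshold=0.01):
    """Label a press release by the relative price move that followed it.

    threshold is a hypothetical cut-off (1% by default) separating a real
    move from noise.
    """
    change = (price_after - price_before) / price_before
    if change > threshold:
        return "moveup"
    if change < -threshold:
        return "movedown"
    return "nomove"


# Example: a stock that goes from 100 to 103 after the release
print(label_reaction(100.0, 103.0))
```

<p>Each labeled release can then be fed to the filter as a training example for its category.</p>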
<p>Just to bring a few readers up to speed on exactly how this will be applied, take the following formula:</p>
<p><img class="aligncenter size-full wp-image-263" src="http://www.derivante.com/wp-content/uploads/2009/03/b307149835ea31ced4ae23af2ab89b05.png" alt="" width="437" height="46" /></p>
<p>Rather than training it to recognize the probability of spam we train it to recognize the probability that the word will trigger positive stock movement:</p>
<ul>
<li><span class="texhtml"><em>p</em></span> is the probability that the content will result in positive movement.</li>
<li><span class="texhtml"><em>p</em>1</span> is the probability <span class="texhtml"><em>p</em>(<em>S</em> | <em>W</em>1)</span> that it is positive knowing it contains a first word (for example "capital");</li>
<li><span class="texhtml"><em>p</em>2</span> is the probability <span class="texhtml"><em>p</em>(<em>S</em> | <em>W</em>2)</span> that it is positive knowing it contains a second word (for example "boosted");</li>
<li><em>etc...</em></li>
</ul>
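<p>Here is a minimal sketch of how the per-word probabilities could be combined, assuming the formula pictured above is the standard combining rule from naive Bayes spam filtering: the product of the p values divided by that product plus the product of the (1 - p) values.  The function name is hypothetical.</p>

```python
from math import prod


def combined_probability(word_probs):
    """Combine per-word probabilities p1..pN into one overall probability.

    Assumes the usual naive Bayes combining rule:
        p = (p1*...*pN) / (p1*...*pN + (1-p1)*...*(1-pN))
    """
    positive = prod(word_probs)
    negative = prod(1.0 - p for p in word_probs)
    if positive + negative == 0:
        # Happens when the list contains both a 0.0 and a 1.0;
        # the evidence is contradictory, so refuse to score it.
        raise ValueError("contradictory certainties in word_probs")
    return positive / (positive + negative)


# Two words that each lean positive reinforce one another
print(combined_probability([0.8, 0.7]))
```

<p>In practice the per-word values would come from the trained word database, and a long document would probably need the products computed in log space to avoid underflow.</p>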
<p>The entire body of the content will be processed against a known database of words and the market reaction to the presence of those words.  The basic Bayesian filtering will need to be extended to deal with phrase recognition, but overall it is a solid, proven machine learning technology to build from.</p>
<p>This information by itself is nothing revolutionary, but combined with strong pattern analysis such as candlestick pattern recognition and other market indicators, it can be used to create an accurate trading platform for marginal gains, which over time can offer pretty high returns.  There is certainly a lot of potential if it works, but that depends heavily on accuracy, and there will be a lot of trial and error in the process.</p>
<p>For more reading on the concepts and components behind this idea, check out:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive Bayes Classifier</a></li>
<li><a href="http://www.leavittbrothers.com/education/candlestick_patterns/">Candlestick Patterns</a></li>
<li><a href="http://en.wikipedia.org/wiki/Candlestick_chart">Candlestick Charting</a></li>
<li><a href="http://www.paulgraham.com/better.html">Better Bayesian Filtering</a></li>
<li><a href="http://www.tdameritrade.com/tradingtools/partnertools/api_dev.html">TD Ameritrade API</a></li>
</ul>
<p>The nice part is that all the historical data is available around the internet, which makes back-testing and scoring very easy to do, and there will need to be a lot of testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>That went well, what now?</title>
		<link>http://www.derivante.com/2009/03/18/that-went-well-what-now/</link>
		<comments>http://www.derivante.com/2009/03/18/that-went-well-what-now/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 17:07:46 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=241</guid>
		<description><![CDATA[It has been about a month now since the roll out and you can see the traffic trends rising since we started this process back in January. At the rate Google is crawling the data, the projection is that traffic will continue to rise well into the fall as everything is indexed. With that said, [...]]]></description>
			<content:encoded><![CDATA[<p>It has been about a month now since the roll out and you can see the traffic trends rising since we started this process back in January.  At the rate Google is crawling the data, the projection is that traffic will continue to rise well into the fall as everything is indexed.</p>
<p>With that said, we are about to surpass several sites in terms of traffic, including <a href="http://www.quantcast.com/reddit.com">reddit.com</a>, <a href="http://www.quantcast.com/fark.com">fark.com</a>, <a href="http://www.quantcast.com/mcdonalds.com">mcdonalds.com</a>, and <a href="http://www.quantcast.com/ibm.com">ibm.com</a> to name a few.  As a developer, seeing the metrics come back helps motivate me and validates the work that I've done.  Even now we are still dealing with speed bumps along the way.  None of them are noticeable as far as traffic is concerned, but maintaining scalability is certainly a huge task.  Using Drupal as a back end has presented several challenges for how we proceed going forward.  We've decided to scrap the MySQL Master/Master replication due to Drupal's sequences tables and duplicate key problems.  The issue would be easily fixed if only auto increment were used... but alas, without rewriting a good chunk of the code base, we must adapt to Master/Slave Read/Write splitting.  It seems a week does not go by without encountering a scaling/replication pitfall.  Drupal's general-compatibility attitude towards its framework makes it very difficult to leverage any particular technology like MySQL to its maximum, because the database layer is written with several database backends in mind.  A word of caution for other developers planning a high-traffic web site: there is a point where an up-front investment in the infrastructure and backend will pay off hugely.  I believe we're reaching that point.</p>
<p>The unfortunate part of rapid growth is the question of whether the team is capable of adjusting at the same pace.  While there is only so much that can be planned ahead, now more than ever it is important that issues are identified long before they become customer facing, because the stakes are so much higher.  Despite a successful launch, there is still a lot more ahead.  How much time do we invest into new features, maintenance, and re-writes?  What takes a higher priority, growth or consumer experience?  Do we have the resources to invest in research and development?</p>
<p>At the end of every milestone, I find it necessary that everyone pats themselves on the back, takes a deep breath, and regroups as a team before the cycle begins all over again.  The gap between the end of one project and the start of another is the most important time for management and development to be in step with each other so everyone can move forward rowing in the same direction.  Revisit company values and mission statements, and have meaningful follow-up discussions on what went well and what didn't.  If the team allocates no time for dialogue, then despite accomplishing the task at hand, the same problems will occur over and over again.  Not all problems in development are technical: process and communication are consistent issues that always seem to manifest one way or another when working in a collaborative environment, and it's important to determine what works well in the current situation.  What may have worked in the past on a project, at a previous job, or for one person might not work now.</p>
<p>Congratulations on a job well done,  let's open the dialogue and relish in the reflection time... that went well, what now?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/03/18/that-went-well-what-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drupal Scaling, not so much</title>
		<link>http://www.derivante.com/2009/02/25/drupal-scaling-not-so-much/</link>
		<comments>http://www.derivante.com/2009/02/25/drupal-scaling-not-so-much/#comments</comments>
		<pubDate>Wed, 25 Feb 2009 16:23:45 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Framework]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=237</guid>
		<description><![CDATA[After working on the migration this week and moving CitySquares.com over to a new environment we ended the week with a load test from Soasta. Starting at several hundred concurrent users to a cap of 2500 simulated users on at once we were able to really pick apart the infrastructure. Couple this with our own [...]]]></description>
			<content:encoded><![CDATA[<p>After working on the migration this week and moving CitySquares.com over to a new environment we ended the week with a load test from <a href="http://www.soasta.com/" target="_blank">Soasta</a>.  Starting at several hundred concurrent users and ramping up to a cap of 2500 simulated users at once, we were able to really pick apart the infrastructure.  Coupling this with our own internal testing and tweaking with ab over the past few days, I've come to the conclusion that Drupal does not scale so well when you're talking about traffic in the tens of millions of page views a month.  It's just not possible to get requests in and out fast enough without continually adding more boxes to the front end.  I tried every setup from mod_php to php-fcgi, worker, prefork, eaccelerator, xcache, and every configuration therein.</p>
<p>I realized that I was over-thinking the problem, so I did a little test to see just how much overhead Drupal is adding.  I can execute ten times as many queries natively through PHP and mysql_connect as I can from within Drupal with just the bootstrap included.  Apache answered requests and processed these several million round trips to the database much faster.  I understand that as a framework there is a lot being loaded, but a full order of magnitude in requests/sec is a huge overhead.</p>
<p>The better part of the last few days was spent gutting the inner workings of Drupal and removing as much of it as possible to lower overhead, reduce response time, and hopefully let us scale a little further on the hardware that we have.  As it is, traffic has been growing a steady 40-50% month over month for a while now and we are already dealing with several million unique visitors a month.  That number is growing rapidly with no ceiling in the near future, and it's unfortunate that at this point the only way to make this work is to break the upgrade path of Drupal.</p>
<p>Going through the code base, it's easy to see where so much of the bloat comes from.  With comments like "TODO: remove this when we require at least PHP 4.4.0", there is a lot of backwards compatibility that is required to support such a large community.  I am seeing more and more frameworks that provide a bare-bones set of tools replace legacy CMS systems.  Building a site from the ground up on a framework like Symfony or Rails brings agility, performance, and maintainability at the price of longer development cycles and increased costs, but there is a point where you stop putting a round peg in a square hole and realize that the cost of maintenance and development down the road will far exceed the up-front investment.</p>
<p>I'm glad Drupal has been able to take us this far and I'm very grateful for the community behind it.  It's a pleasant surprise that everything is still working and hopefully we can get a few more miles out of it before we are ready to do the next iteration of the site.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/02/25/drupal-scaling-not-so-much/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL script to grab the worst performing indexes</title>
		<link>http://www.derivante.com/2009/02/11/sql-script-to-grab-the-worst-performing-indexes/</link>
		<comments>http://www.derivante.com/2009/02/11/sql-script-to-grab-the-worst-performing-indexes/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 17:52:23 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=231</guid>
		<description><![CDATA[I have been doing a lot of auditing and clean up of database performance the last few days. We are currently in the middle of a migration and with hardware infrastructure in place it is time to go back and see what changes we can do on the code and database side of things to [...]]]></description>
			<content:encoded><![CDATA[<p>I have been doing a lot of auditing and clean up of database performance the last few days.  We are currently in the middle of a migration, and with the hardware infrastructure in place it is time to go back and see what changes we can make on the code and database side of things to help bring the site up to optimal performance and lower query times.  I found this gem on the MySQL Forge site, which turned out to be a great resource for MySQL tidbits.</p>
<pre class="sql"><span style="color: #808080; font-style: italic;">/*
SQL script to grab the worst performing indexes
in the whole server
*/</span>
<span style="color: #993333; font-weight: bold;">SELECT</span>
t.TABLE_SCHEMA <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`db`</span>
, t.TABLE_NAME <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`table`</span>
, s.INDEX_NAME <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`index name`</span>
, s.COLUMN_NAME <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`field name`</span>
, s.SEQ_IN_INDEX <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`seq in index`</span>
, s2.max_columns <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`# cols`</span>
, s.CARDINALITY <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`card`</span>
, t.TABLE_ROWS <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`est rows`</span>
, ROUND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span>s.CARDINALITY / IFNULL<span style="color: #66cc66;">&#40;</span>t.TABLE_ROWS, <span style="color: #cc66cc;">0.01</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> * <span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`sel %`</span>
<span style="color: #993333; font-weight: bold;">FROM</span> INFORMATION_SCHEMA.STATISTICS s
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> INFORMATION_SCHEMA.<span style="color: #993333; font-weight: bold;">TABLES</span> t
<span style="color: #993333; font-weight: bold;">ON</span> s.TABLE_SCHEMA = t.TABLE_SCHEMA
<span style="color: #993333; font-weight: bold;">AND</span> s.TABLE_NAME = t.TABLE_NAME
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> <span style="color: #66cc66;">&#40;</span>
<span style="color: #993333; font-weight: bold;">SELECT</span>
TABLE_SCHEMA
, TABLE_NAME
, INDEX_NAME
, MAX<span style="color: #66cc66;">&#40;</span>SEQ_IN_INDEX<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> max_columns
<span style="color: #993333; font-weight: bold;">FROM</span> INFORMATION_SCHEMA.STATISTICS
<span style="color: #993333; font-weight: bold;">WHERE</span> TABLE_SCHEMA != <span style="color: #ff0000;">'mysql'</span>
<span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> s2
<span style="color: #993333; font-weight: bold;">ON</span> s.TABLE_SCHEMA = s2.TABLE_SCHEMA
<span style="color: #993333; font-weight: bold;">AND</span> s.TABLE_NAME = s2.TABLE_NAME
<span style="color: #993333; font-weight: bold;">AND</span> s.INDEX_NAME = s2.INDEX_NAME
<span style="color: #993333; font-weight: bold;">WHERE</span> t.TABLE_SCHEMA != <span style="color: #ff0000;">'mysql'</span>                         <span style="color: #808080; font-style: italic;">/* Filter out the mysql system DB */</span>
<span style="color: #993333; font-weight: bold;">AND</span> t.TABLE_ROWS &gt; <span style="color: #cc66cc;">10</span>                                   <span style="color: #808080; font-style: italic;">/* Only tables with some rows */</span>
<span style="color: #993333; font-weight: bold;">AND</span> s.CARDINALITY <span style="color: #993333; font-weight: bold;">IS</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>                           <span style="color: #808080; font-style: italic;">/* Need at least one non-NULL value in the field */</span>
<span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #66cc66;">&#40;</span>s.CARDINALITY / IFNULL<span style="color: #66cc66;">&#40;</span>t.TABLE_ROWS, <span style="color: #cc66cc;">0.01</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> &lt; <span style="color: #cc66cc;">1.00</span> <span style="color: #808080; font-style: italic;">/* Selectivity &lt; 1.0 b/c unique indexes are perfect anyway */</span>
<span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #ff0000;">`sel %`</span>, s.TABLE_SCHEMA, s.TABLE_NAME          <span style="color: #808080; font-style: italic;">/* Switch to `sel %` DESC for best non-unique indexes */</span>
<span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">10</span>;</pre>
<p>To audit just one database on a server that hosts several, adjust the WHERE clause to WHERE t.TABLE_SCHEMA = 'mydatabase' (TABLE_SCHEMA holds the database name, not a table name).  This would have been very useful when working with the cluster to recover memory/space from indexes that aren't being used and to optimize queries to hit indexes that are meaningful.</p>
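<p>To make the `sel %` column easier to read, here is a small sketch of the same selectivity arithmetic outside of SQL.  The function name is hypothetical; the 0.01 fallback mirrors the IFNULL in the query above.</p>

```python
def selectivity_pct(cardinality, table_rows):
    """Index selectivity as a percentage, mirroring the `sel %` column.

    A value near 100 means almost every row has a distinct key; a value
    near 0 means the index barely narrows a lookup down.
    """
    rows = 0.01 if table_rows is None else table_rows  # same fallback as IFNULL
    return round(cardinality / rows * 100, 2)


# An index with 50 distinct values over 1000 rows
print(selectivity_pct(50, 1000))
```

<p>The query sorts by this value ascending, so the least selective indexes come first.</p>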
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/02/11/sql-script-to-grab-the-worst-performing-indexes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Proxy Quick Start</title>
		<link>http://www.derivante.com/2009/02/10/mysql-proxy-quick-start/</link>
		<comments>http://www.derivante.com/2009/02/10/mysql-proxy-quick-start/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 16:40:39 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=229</guid>
		<description><![CDATA[There isn't too much out there for easy reliable mysql load balancing. I recently built a master/master mysqld setup and needed automatic roll over on the software side. After going through the options I ended up going with MySQL Proxy 0.60 for low overhead, ease of setup, and ease of maintenance. I installed the proxy [...]]]></description>
			<content:encoded><![CDATA[<p>There isn't too much out there for easy, reliable MySQL load balancing.  I recently built a master/master mysqld setup and needed automatic failover on the software side.  After going through the options I ended up going with MySQL Proxy 0.60 for low overhead, ease of setup, and ease of maintenance.  I installed the proxy on each of my web servers (already load balanced through hardware) and configured the code to just use the local proxy.  On RHEL systems, you can get it up and running in just a few simple steps:</p>
<ol>
<li>Download the latest version at <a href="http://dev.mysql.com/downloads/mysql-proxy/">http://dev.mysql.com/downloads/mysql-proxy/</a>.</li>
<li><strong>yum install</strong> <strong>glib2-devel</strong>.x86_64 <strong>ncurses-devel</strong>.x86_64 <strong>libevent-devel</strong>.x86_64 <strong>mysql-devel</strong>.x86_64</li>
<li><strong>./configure --without-lua</strong> (for a straight proxy you will not need Lua, and this saves you from installing additional dependencies).</li>
<li><strong>make &amp;&amp; make install</strong></li>
</ol>
<p>It is easy enough to get installed on the system, but I noticed that the default parameters, as with most software, are rather wide open, so be sure to lock down the IP addresses you are listening on.</p>
<p>mysql-proxy --proxy-backend-addresses=192.168.1.101:3306 --proxy-backend-addresses=192.168.1.102:3306 --admin-address=127.0.0.1:4041 --proxy-address=127.0.0.1:4040 --proxy-skip-profiling --daemon</p>
<p>The admin address and proxy address will be accessible to anyone who can reach your server unless you lock them down to localhost.  Now configure your web servers to use 127.0.0.1:4040 to connect to MySQL and all should be working fine.  You can find additional configuration parameters, admin commands, and Lua scripts at <a href="http://forge.mysql.com/wiki/MySQL_Proxy">http://forge.mysql.com/wiki/MySQL_Proxy</a>.</p>
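<p>If you deploy the proxy to several web servers, the command line above can be assembled from a list of backends.  This is only a sketch: the helper name is hypothetical, and it uses nothing beyond the flags already shown in this post.</p>

```python
def build_proxy_command(backends,
                        proxy_address="127.0.0.1:4040",
                        admin_address="127.0.0.1:4041"):
    """Return the mysql-proxy argument list for the given backend addresses.

    Both listening addresses default to localhost so the proxy is not
    exposed to the outside world.
    """
    args = ["mysql-proxy"]
    for backend in backends:
        args.append(f"--proxy-backend-addresses={backend}")
    args.append(f"--admin-address={admin_address}")
    args.append(f"--proxy-address={proxy_address}")
    args.append("--proxy-skip-profiling")
    args.append("--daemon")
    return args


# Rebuild the exact command used above for the two masters
print(" ".join(build_proxy_command(["192.168.1.101:3306", "192.168.1.102:3306"])))
```

<p>The resulting list could be handed to a process launcher or written into an init script.</p>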
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/02/10/mysql-proxy-quick-start/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>There and Back Again, an EC2 MySQL Cluster</title>
		<link>http://www.derivante.com/2009/01/26/there-and-back-again-an-ec2-mysql-cluster/</link>
		<comments>http://www.derivante.com/2009/01/26/there-and-back-again-an-ec2-mysql-cluster/#comments</comments>
		<pubDate>Tue, 27 Jan 2009 04:40:20 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[SQL]]></category>
		<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Technology]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=223</guid>
		<description><![CDATA[Limitations of EC2 as a web platform: Price- An m1.xlarge instance will run you ~$600 with data transfer costs. Managed hosting solutions run cheaper especially if you plan on purchasing in bulk. The grid is designed for on-demand computation and not as a cost efficient web services. Configuration- There are a limited number of options [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Limitations of EC2 as a web platform:</strong></p>
<ul style="padding-left: 20px;">
<li> <strong>Price</strong>-  An m1.xlarge instance will run you ~$600 a month with data transfer costs. Managed hosting solutions run cheaper, especially if you plan on purchasing in bulk. The grid is designed for on-demand computation and not as a cost-efficient web service.</li>
<li><strong>Configuration</strong>- There are a limited number of options and you will not be able to tailor the hardware to your application. Databases over 10GB in size will have performance issues since that is the memory cap.</li>
<li><strong>Network storage</strong>- The primary disks offer limited storage; additional volumes need to be attached across the network at an additional cost.</li>
<li><strong>Software</strong> - No hardware-based solutions for load balancing or custom application servers, and no dedicated switches, routers, firewalls, or load balancers.  The model is software driven, so all needs must be met with a software solution.  In a managed hosting or colocation solution you will at least have the option of adding additional hardware and having a private network.</li>
</ul>
<p><br><br />
<strong>EC2 might be right for you if:</strong></p>
<ul style="padding-left: 20px;">
<li><strong>Distribution Awareness</strong>-  Your application was designed to scale horizontally from the get-go and you can take advantage of grid computing.</li>
<li><strong>Research and Development</strong>- EC2 &amp; Rightscale will allow for you to bring up new servers, test configuration, and scale quickly.  If you are not sure what your hardware demands will be or the scope of the project it will allow for some flexibility to get this right before committing to rather lengthy contracts with other hosting options.</li>
<li><strong>Disaster Recovery</strong>- If you need an off-site mirror for your site that you can keep dormant and activate as needed.</li>
</ul>
<p><br><br />
Over the past several months I have been doing extensive development using Amazon's EC2 as my hardware infrastructure.  I was tasked with taking CitySquares.com from a New England area hyper-local search and business directory to a national site in a few months.  Because of the memory limitations of EC2 instances (m1.xlarge provides only 15GB), the jump from a 15GB database to anything larger becomes very costly.  When everything could be contained in two servers in a master/slave environment, we were able to provide redundancy, performance, and easy management when working with the database.  The final estimates for the national roll out put our core data at 50GB, far too large for any one EC2 instance.  Going to disk was not an option, as everything works off of EBS attached storage and any disk write means traveling over the network, an additional overhead which degrades performance even more when switching off RAM.  Then there was the nature of the data: on any page load, any piece of data could be requested.</p>
<p>With OS overhead, index storage, and ndb overhead, each m1.xlarge instance gave about 12GB of usable storage.  Include replication, and it comes down to 6GB of storage per node.  Storing the ~50GB database I had planned on requires 8 storage nodes, two management servers, and two mysqld API servers.  This is where it became important to understand the advantages of vertical scaling versus horizontal scaling.  EC2 provided fast horizontal scaling and configuration.  Servers can be launched on demand and their configurations scripted.  While I appreciated that aspect of cloud computing and being able to bring that many servers up and configure each one relatively quickly, I really just needed two decent boxes with 64GB of RAM in each and a master/slave setup.  The operating cost for the cluster was $6000/month, a hefty bill considering I could buy all the hardware needed to run the database with just a few months of EC2 payments.</p>
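<p>The sizing arithmetic above can be sketched in a few lines.  The figures (12GB usable per instance, two replicas, two management servers, two mysqld API servers) come from this post; the function names are hypothetical.</p>

```python
def usable_cluster_gb(data_nodes, usable_gb_per_node=12, replicas=2):
    """Total unique data the cluster can hold once replication is accounted for."""
    return data_nodes * usable_gb_per_node / replicas


def total_servers(data_nodes, management_nodes=2, sql_nodes=2):
    """Every instance that has to be paid for: storage, management, and API nodes."""
    return data_nodes + management_nodes + sql_nodes


# Eight storage nodes hold roughly the ~50GB target and make twelve servers in all
print(usable_cluster_gb(8), total_servers(8))
```

<p>Every extra 12GB of data means two more instances, which is why the bill grows so quickly compared with adding RAM to a single box.</p>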
<p>Twelve servers later, with a working cluster, we were able to successfully roll out our MySQL Cluster with minimal performance loss.  A lot of that was due to tweaking every query top to bottom.  The site runs on a Drupal core, which meant a lot of the queries, both in our in-house code and in the core, were not designed with distribution awareness.  This was another growing pain, since the network overhead of running 12 servers on shared resources, with mediocre latency and throughput limitations, amplified flaws in the database design, and every poorly designed query and join degraded performance significantly.</p>
<p>To give credit where credit is due, EC2 did allow us to scale the site up rather quickly.  We were able to test server configurations, new applications, and have easy management.  It would not have been possible for us to push out the data, handle the influx of new traffic, and expand as fast as we did without it.  Long term, however, it made absolutely no sense to stay on EC2 once we were finished scaling up.  It is a great platform for start-ups to be able to configure and launch servers for their service or application and grow rapidly.</p>
<p>As a database platform, until EC2 offers more configuration options in its hardware and the ability to increase memory, the cap of 15GB will make EC2 problematic for any database that plans on growing past that.  It is important to understand your application and database needs before considering EC2.  It is no surprise to me that even with the RightScale interface and the easy management of EC2, web sites are reluctant to switch off their own hardware or managed hosting.</p>
<p>Both Sun and Continuent are pursuing MySQL Clustering on cloud computing.  As of this post, Continuent for EC2 was still in closed beta and testing and Sun is doing their own research into offering more support for database clusters on compute clouds.    Maybe in the future this can be something to revisit, but it would require more ndb configuration options on the network layer to cope with shared bandwidth and additional hardware configurations by Amazon (someday).</p>
<p>The 6.4 release, which is in beta now, offers new features that would make MySQL Clustering more attractive on cloud computing architecture:</p>
<ul>
<li><em>Ability to add nodes and node groups online.</em> This will allow the database to scale up without taking the cluster down.</li>
<li><em>Data node multithreading support. </em> The m1.xlarge instance comes with 4 cores.</li>
</ul>
<p>If you are setting up a MySQL cluster, the following resources will help get you up and running quickly:</p>
<ul>
<li><a href="http://www.severalnines.com/" target="_blank">http://www.severalnines.com/</a></li>
<li><a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-configuration.html" target="_blank">http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-configuration.html</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/01/26/there-and-back-again-an-ec2-mysql-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>