<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Derivante &#187; content analysis</title>
	<atom:link href="http://www.derivante.com/tag/content-analysis/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.derivante.com</link>
	<description>to obtain or receive from a source</description>
	<lastBuildDate>Mon, 26 Apr 2010 18:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>PHP Content Rating / Confidence</title>
		<link>http://www.derivante.com/2009/09/01/php-content-rating-confidence/</link>
		<comments>http://www.derivante.com/2009/09/01/php-content-rating-confidence/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 21:15:51 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[content analysis]]></category>
		<category><![CDATA[rating]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=717</guid>
		<description><![CDATA[For webmasters who want to weight content by user feedback, finding the right algorithm can be challenging. In my experience there is no out-of-the-box solution, since every site and its requirements are unique. Getting started and putting a solid foundation in place is the first step, followed of course by [...]]]></description>
			<content:encoded><![CDATA[<p>For webmasters who want to weight content by user feedback, finding the right algorithm can be challenging. In my experience there is no out-of-the-box solution, since every site and its requirements are unique. Getting started and putting a solid foundation in place is the first step, followed of course by refining it over time to get just the right recipe. What follows is a binomial proportion confidence interval (what?): a PHP implementation that uses the lower bound of the Wilson score interval to weight the feedback.</p>
<p><span id="more-717"></span></p>
<pre class="php"><span style="color: #000000; font-weight: bold;">class</span> Rating
<span style="color: #66cc66;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> ratingAverage<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$power</span> = <span style="color: #ff0000;">'0.05'</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$total</span> == <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0</span>;
&nbsp;
    <span style="color: #0000ff;">$z</span> = Rating::<span style="color: #006600;">pnormaldist</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>-<span style="color: #0000ff;">$power</span>/<span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">0</span>,<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #0000ff;">$p</span> = <span style="color: #cc66cc;">1.0</span> * <span style="color: #0000ff;">$positive</span> / <span style="color: #0000ff;">$total</span>;
    <span style="color: #0000ff;">$s</span> = <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span> + <span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>*<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span> - <span style="color: #0000ff;">$z</span> * <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>*<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>-<span style="color: #0000ff;">$p</span><span style="color: #66cc66;">&#41;</span>+<span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">4</span>*<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>/<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>+<span style="color: #0000ff;">$z</span>*<span style="color: #0000ff;">$z</span>/<span style="color: #0000ff;">$total</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #b1b100;">return</span> <span style="color: #0000ff;">$s</span>;
  <span style="color: #66cc66;">&#125;</span> 
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> pnormaldist<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #0000ff;">$b</span> = <a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span>
      <span style="color: #cc66cc;">1.570796288</span>, <span style="color: #cc66cc;">0.03706987906</span>, <span style="color: #cc66cc;">-0</span>.8364353589e<span style="color: #cc66cc;">-3</span>,
      <span style="color: #cc66cc;">-0</span>.2250947176e<span style="color: #cc66cc;">-3</span>, <span style="color: #cc66cc;">0</span>.6841218299e<span style="color: #cc66cc;">-5</span>, <span style="color: #cc66cc;">0</span>.5824238515e<span style="color: #cc66cc;">-5</span>,
      <span style="color: #cc66cc;">-0</span>.104527497e<span style="color: #cc66cc;">-5</span>, <span style="color: #cc66cc;">0</span>.8360937017e<span style="color: #cc66cc;">-7</span>, <span style="color: #cc66cc;">-0</span>.3231081277e<span style="color: #cc66cc;">-8</span>,
      <span style="color: #cc66cc;">0</span>.3657763036e<span style="color: #cc66cc;">-10</span>, <span style="color: #cc66cc;">0</span>.6936233982e<span style="color: #cc66cc;">-12</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &lt; <span style="color: #cc66cc;">0.0</span> || <span style="color: #cc66cc;">1.0</span> &lt; <span style="color: #0000ff;">$qn</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0.0</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> == <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0.0</span>;
&nbsp;
    <span style="color: #0000ff;">$w1</span> = <span style="color: #0000ff;">$qn</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &gt; <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #0000ff;">$w1</span> = <span style="color: #cc66cc;">1.0</span> - <span style="color: #0000ff;">$w1</span>;
&nbsp;
    <span style="color: #0000ff;">$w3</span> = - <a style="text-decoration: none;" href="http://www.php.net/log"><span style="color: #000066;">log</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">4.0</span> * <span style="color: #0000ff;">$w1</span> * <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1.0</span> - <span style="color: #0000ff;">$w1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
    <span style="color: #0000ff;">$w1</span> = <span style="color: #0000ff;">$b</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#93;</span>;
&nbsp;
    <span style="color: #b1b100;">for</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$i</span> = <span style="color: #cc66cc;">1</span>;<span style="color: #0000ff;">$i</span> &lt;= <span style="color: #cc66cc;">10</span>; <span style="color: #0000ff;">$i</span>++<span style="color: #66cc66;">&#41;</span>
      <span style="color: #0000ff;">$w1</span> += <span style="color: #0000ff;">$b</span><span style="color: #66cc66;">&#91;</span><span style="color: #0000ff;">$i</span><span style="color: #66cc66;">&#93;</span> * <a style="text-decoration: none;" href="http://www.php.net/pow"><span style="color: #000066;">pow</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w3</span>,<span style="color: #0000ff;">$i</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$qn</span> &gt; <span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w1</span> * <span style="color: #0000ff;">$w3</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
    <span style="color: #b1b100;">return</span> - <a style="text-decoration: none;" href="http://www.php.net/sqrt"><span style="color: #000066;">sqrt</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$w1</span> * <span style="color: #0000ff;">$w3</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
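<p>The function takes three parameters: the number of positive votes, the total number of votes, and the "power", which is really the significance level of the interval. Smaller values produce a more conservative score: 0.10 gives a 95% chance that the true proportion of positive votes is at least the returned lower bound, 0.05 gives a 97.5% chance, and so on. Sample usage:</p>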
<pre class="php">sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,<span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">250</span>,<span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span>;
sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1000</span>,<span style="color: #cc66cc;">500</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #000000; font-weight: bold;">function</span> sample<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$n</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
  <a style="text-decoration: none;" href="http://www.php.net/echo"><span style="color: #000066;">echo</span></a> Rating::<span style="color: #006600;">ratingAverage</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$p</span>+<span style="color: #0000ff;">$n</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>Output:</p>
<table border="0" cellpadding="2" cellspacing="0" border="1">
<tbody>
<tr>
<td>Positive</td>
<td>Negative</td>
<td>Score</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0.20654931654388</td>
</tr>
<tr>
<td>100</td>
<td>50</td>
<td>0.58789756740385</td>
</tr>
<tr>
<td>250</td>
<td>100</td>
<td>0.6648317184611</td>
</tr>
<tr>
<td>1000</td>
<td>500</td>
<td>0.6424116916199</td>
</tr>
</tbody>
</table>
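<p>To see why this matters for ranking, the sketch below sorts a few items by the lower bound instead of by the raw fraction of positive votes. It is a self-contained approximation rather than the Rating class itself: the helper name wilsonLowerBound and the item data are hypothetical, and it hard-codes z = 1.96, which is roughly what pnormaldist returns for a power of 0.05.</p>
<pre class="php">&lt;?php
// Hypothetical helper: lower bound of the Wilson score interval with a fixed
// z of 1.96 (an assumed stand-in for Rating::pnormaldist(0.975)).
function wilsonLowerBound($positive, $total, $z = 1.96)
{
    if ($total == 0) {
        return 0.0;
    }
    $p = $positive / $total;
    $z2 = $z * $z;
    $center = $p + $z2 / (2 * $total);
    $margin = $z * sqrt(($p * (1 - $p) + $z2 / (4 * $total)) / $total);
    return ($center - $margin) / (1 + $z2 / $total);
}

// Hypothetical items: array(id, positive votes, negative votes).
$items = array(
    array('a', 1, 0),
    array('b', 100, 50),
    array('c', 250, 100),
);

// Highest lower bound first (the spaceship operator needs PHP 7 or later).
usort($items, function ($x, $y) {
    return wilsonLowerBound($y[1], $y[1] + $y[2])
        &lt;=&gt; wilsonLowerBound($x[1], $x[1] + $x[2]);
});

foreach ($items as $item) {
    echo $item[0], "\n";
}
</pre>
<p>Item a has 100% positive votes but only a single vote, so it drops to the bottom; with the scores from the table above, the printed order is c, b, a.</p>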
<p>Sites like Reddit and Digg also have a "freshness" element. The solution above may work well as a model across the whole site, but for the front page you will need some form of "gravity". One way to add it is to take the raw score and decay it over time, like so:</p>
<pre class="php">&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> Rating
<span style="color: #66cc66;">&#123;</span>
  ...
  <span style="color: #000000; font-weight: bold;">public</span> <a style="text-decoration: none;" href="http://www.php.net/static"><span style="color: #000066;">static</span></a> <span style="color: #000000; font-weight: bold;">function</span> gravityRating<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$time</span>, <span style="color: #0000ff;">$power</span> = <span style="color: #ff0000;">'0.05'</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$total</span> == <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
      <span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0</span>;
    <span style="color: #b1b100;">return</span> <span style="color: #66cc66;">&#40;</span>Rating::<span style="color: #006600;">ratingAverage</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$positive</span>, <span style="color: #0000ff;">$total</span>, <span style="color: #0000ff;">$power</span><span style="color: #66cc66;">&#41;</span> / <a style="text-decoration: none;" href="http://www.php.net/pow"><span style="color: #000066;">pow</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$time</span>,<span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span>
  ...
<span style="color: #66cc66;">&#125;</span>
&nbsp;
gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #cc66cc;">0.5</span><span style="color: #66cc66;">&#41;</span>;
gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;
gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span>;
gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #cc66cc;">8</span><span style="color: #66cc66;">&#41;</span>;
gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span>,<span style="color: #cc66cc;">50</span>,<span style="color: #cc66cc;">24</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #808080; font-style: italic;">// Named differently from sample() above so both snippets can share one file</span>
<span style="color: #000000; font-weight: bold;">function</span> gravitySample<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$n</span>,<span style="color: #0000ff;">$time</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
  <a style="text-decoration: none;" href="http://www.php.net/echo"><span style="color: #000066;">echo</span></a> Rating::<span style="color: #006600;">gravityRating</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$p</span>,<span style="color: #0000ff;">$p</span>+<span style="color: #0000ff;">$n</span>,<span style="color: #0000ff;">$time</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;</pre>
<p>In the example above, $time is the item's age in hours. The output for ages of 0.5, 1, 4, 8, and 24 hours shows the score being divided by the square root of the age:</p>
<p>0.83141271310867<br />
0.58789756740385<br />
0.29394878370192<br />
0.20785317827717<br />
0.12000408843024</p>
<p>My recommendation would be to "cap" the time at a fixed period, such as 12 or 24 hours, after which decay stops; the boost given to fresh content then lasts only for that window, and older content settles at a stable score quickly. The rate of decay can, of course, be made as fast or as slow as you want, and again the weighting you apply will vary from site to site. Depending on the volatility of your content, a front page whose "freshness" spans a week would call for a week-long decay rather than a 12-hour one. Hopefully the code above is enough to get started with content rating, make better use of user feedback, and help webmasters calculate something more intelligent than the traditional "5-Star Rating".</p>
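<p>Capping the age, as recommended above, could look like the sketch below. It takes an already computed base score (for example the output of Rating::ratingAverage) so that it stands on its own; the function name cappedGravity, the 24-hour default cap, and the 1-hour floor that keeps brand-new items from dividing by zero are all assumptions for illustration.</p>
<pre class="php">&lt;?php
// Hypothetical helper: same sqrt(age) decay as gravityRating, but the age is
// clamped to [1, $capHours], so decay stops once an item is $capHours old.
// The 1-hour floor (an assumption) avoids division by zero at age 0.
function cappedGravity($baseScore, $ageHours, $capHours = 24.0)
{
    $age = min(max($ageHours, 1.0), $capHours);
    return $baseScore / sqrt($age);
}

// Base score for 100 positive / 50 negative votes, from the table above.
$base = 0.58789756740385;

foreach (array(0, 4, 24, 72) as $hours) {
    printf("%2d hours: %.4f\n", $hours, cappedGravity($base, $hours));
}
</pre>
<p>Items older than the cap keep their 24-hour score (about 0.12 here) instead of decaying further, so the 72-hour line prints the same value as the 24-hour line. The floor also removes the extra boost the original formula gives items younger than an hour (0.83 at half an hour); lower it if you want to keep that boost.</p>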
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/09/01/php-content-rating-confidence/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Bayesian Filtering &amp; Financial Applications</title>
		<link>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/</link>
		<comments>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/#comments</comments>
		<pubDate>Fri, 27 Mar 2009 17:08:17 +0000</pubDate>
		<dc:creator>Clay vanSchalkwijk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bayesian]]></category>
		<category><![CDATA[content analysis]]></category>
		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=257</guid>
		<description><![CDATA[A friend of mine and I recently started a new project. After kicking around several ideas, we finally agreed on applying software prediction to financial data. This has been pursued heavily already, but from a home-brew standpoint we wanted to build competitive, working software by mashing up existing data and [...]]]></description>
			<content:encoded><![CDATA[<p>A friend of mine and I recently started a new project. After kicking around several ideas, we finally agreed on applying software prediction to financial data. This has been pursued heavily already, but from a home-brew standpoint we wanted to build competitive, working software by mashing up existing data and technology available on the internet.</p>
<p>We intend to predict the movement of stocks based on real-time content analysis. This requires a good deal of machine learning and historical data, but even good content analysis is not enough on its own. Using Bayesian filtering with noise-word reduction, we plan to process historical data and assign each piece of content to one of three categories: moveup, movedown, or nomove. To train the filters, past press releases will be fed into the filter together with the matching stock data, so that it learns how the market reacted to the content of each release. Over time, the software should be able to recognize keywords that trigger positive versus negative sentiment in the market and drive the price one way or the other. A score can be assigned much as spam filters assign spam scores, and that number can be used as one input to a larger algorithm that decides on an action.</p>
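<p>As a rough illustration of the training and classification step described above, here is a minimal multinomial naive Bayes sketch over the three categories moveup, movedown, and nomove. The class name MovementFilter, the short noise-word list, the sample headlines, and the add-one (Laplace) smoothing are assumptions for the example; a real filter would be trained on historical press releases labeled with the actual price reaction.</p>
<pre class="php">&lt;?php
// Hypothetical three-category naive Bayes filter (multinomial, add-one smoothing).
class MovementFilter
{
    private $wordCounts = array(); // category =&gt; array(word =&gt; count)
    private $docCounts = array();  // category =&gt; number of training documents
    private $vocabulary = array(); // word =&gt; true, across all categories

    // Illustrative noise-word list; a real list would be much longer.
    private static $noiseWords = array('the', 'a', 'an', 'of', 'to', 'and', 'in', 'for', 'on');

    public static function tokenize($text)
    {
        $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // Noise word reduction: drop common words that carry no signal.
        return array_values(array_diff($words, self::$noiseWords));
    }

    public function train($text, $category)
    {
        if (!isset($this-&gt;docCounts[$category])) {
            $this-&gt;docCounts[$category] = 0;
            $this-&gt;wordCounts[$category] = array();
        }
        $this-&gt;docCounts[$category]++;
        foreach (self::tokenize($text) as $word) {
            if (!isset($this-&gt;wordCounts[$category][$word])) {
                $this-&gt;wordCounts[$category][$word] = 0;
            }
            $this-&gt;wordCounts[$category][$word]++;
            $this-&gt;vocabulary[$word] = true;
        }
    }

    // Returns the category with the highest log posterior score.
    public function classify($text)
    {
        $totalDocs = array_sum($this-&gt;docCounts);
        $vocabSize = count($this-&gt;vocabulary);
        $tokens = self::tokenize($text);
        $best = null;
        $bestScore = -INF;
        foreach ($this-&gt;docCounts as $category =&gt; $docs) {
            $score = log($docs / $totalDocs); // log prior
            $wordsInCategory = array_sum($this-&gt;wordCounts[$category]);
            foreach ($tokens as $word) {
                $count = isset($this-&gt;wordCounts[$category][$word])
                    ? $this-&gt;wordCounts[$category][$word] : 0;
                // Add-one smoothing keeps unseen words from zeroing the score.
                $score += log(($count + 1) / ($wordsInCategory + $vocabSize));
            }
            if ($score &gt; $bestScore) {
                $bestScore = $score;
                $best = $category;
            }
        }
        return $best;
    }
}

$filter = new MovementFilter();
$filter-&gt;train('Company raises guidance after record quarterly earnings', 'moveup');
$filter-&gt;train('Profit beats estimates, dividend boosted', 'moveup');
$filter-&gt;train('Company cuts guidance amid weak demand', 'movedown');
$filter-&gt;train('Quarterly loss widens, CEO resigns', 'movedown');
$filter-&gt;train('Company schedules annual shareholder meeting', 'nomove');
$filter-&gt;train('Board announces date of quarterly conference call', 'nomove');

echo $filter-&gt;classify('Earnings beat estimates and guidance raised'), "\n"; // moveup
</pre>
<p>Working with summed logarithms rather than multiplied probabilities keeps long documents from underflowing to zero.</p>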
<p>To bring readers up to speed on exactly how this will be applied, take the following formula:</p>
<p><img class="aligncenter size-full wp-image-263" src="http://www.derivante.com/wp-content/uploads/2009/03/b307149835ea31ced4ae23af2ab89b05.png" alt="" width="437" height="46" /></p>
<p>Rather than training the filter to recognize the probability of spam, we train it to recognize the probability that the content will trigger positive stock movement:</p>
<ul>
<li><span class="texhtml"><em>p</em></span> is the probability that the content will result in positive movement.</li>
<li><span class="texhtml"><em>p</em>1</span> is the probability <span class="texhtml"><em>p</em>(<em>S</em> | <em>W</em>1)</span> that it is positive knowing it contains a first word (for example "capital");</li>
<li><span class="texhtml"><em>p</em>2</span> is the probability <span class="texhtml"><em>p</em>(<em>S</em> | <em>W</em>2)</span> that it is positive knowing it contains a second word (for example "boosted");</li>
<li><em>etc...</em></li>
</ul>
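<p>Assuming the pictured formula is the standard rule for combining per-word probabilities in naive Bayesian spam filtering, p = p<sub>1</sub>p<sub>2</sub>...p<sub>N</sub> / (p<sub>1</sub>p<sub>2</sub>...p<sub>N</sub> + (1 - p<sub>1</sub>)(1 - p<sub>2</sub>)...(1 - p<sub>N</sub>)), it can be computed as in this sketch. The function name and the example probabilities are hypothetical, and the 0.01/0.99 clamping bounds are an assumption.</p>
<pre class="php">&lt;?php
// Combines per-word probabilities p(S|Wi) into one probability p:
//   p = (p1 * ... * pN) / (p1 * ... * pN + (1 - p1) * ... * (1 - pN))
// Logs are summed instead of multiplying raw values, which would underflow
// to zero for long documents; 1 / (1 + exp(logNeg - logPos)) equals the ratio.
function combineProbabilities(array $wordProbabilities)
{
    $logPos = 0.0;
    $logNeg = 0.0;
    foreach ($wordProbabilities as $pi) {
        // Clamp to [0.01, 0.99] (an assumption) so log(0) never occurs.
        $pi = min(max($pi, 0.01), 0.99);
        $logPos += log($pi);
        $logNeg += log(1.0 - $pi);
    }
    return 1.0 / (1.0 + exp($logNeg - $logPos));
}

// Hypothetical inputs: p(S | "capital") = 0.7 and p(S | "boosted") = 0.8.
printf("%.4f\n", combineProbabilities(array(0.7, 0.8))); // 0.9032
</pre>
<p>With those two words, 0.56 / (0.56 + 0.06) gives roughly 0.90, so two moderately positive words combine into a strongly positive score.</p>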
<p>The entire body of the content will be processed against a known database of words and the market's reaction to the presence of those words. Basic Bayesian filtering will need to be extended to handle phrase recognition, but overall it is a solid, proven machine-learning technology to build on.</p>
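<p>One simple way to extend a word-based filter toward phrase recognition is to feed it adjacent word pairs (bigrams) as extra tokens, so that a phrase like "not raised" is scored separately from "raised". This is one possible approach rather than a worked-out design; the function name and the underscore used to join pairs are assumptions.</p>
<pre class="php">&lt;?php
// Returns single words plus adjacent word pairs ("bigrams") as tokens.
function tokensWithBigrams($text)
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $tokens = $words;
    for ($i = 0; $i + 1 &lt; count($words); $i++) {
        $tokens[] = $words[$i] . '_' . $words[$i + 1];
    }
    return $tokens;
}

print_r(tokensWithBigrams('Guidance not raised'));
// guidance, not, raised, guidance_not, not_raised
</pre>
<p>Bigrams multiply the number of distinct tokens, so they need more training data before their probabilities become reliable.</p>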
<p>By itself this information is nothing revolutionary, but combined with strong pattern analysis, such as candlestick pattern recognition and other market indicators, it could be used to create an accurate trading platform for marginal gains, which over time can add up to pretty high returns. There is certainly a lot of potential here if it works, but it depends heavily on accuracy, and there will be a lot of trial and error in the process.</p>
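<p>Candlestick patterns are simple rules over open and close prices. As one hypothetical example, a bullish engulfing pattern is commonly described as a bearish candle followed by a bullish candle whose body covers the previous body; the check below encodes that description. The function name and the array layout of a candle are assumptions, and the exact definition varies between sources.</p>
<pre class="php">&lt;?php
// Hypothetical check for a bullish engulfing pattern. Each candle is an
// array with 'open' and 'close' prices (an assumed layout).
function isBullishEngulfing(array $previous, array $current)
{
    $previousBearish = $previous['close'] &lt; $previous['open'];
    $currentBullish = $current['close'] &gt; $current['open'];
    // The current body must span the previous body.
    $engulfs = $current['open'] &lt;= $previous['close']
        &amp;&amp; $current['close'] &gt;= $previous['open'];
    return $previousBearish &amp;&amp; $currentBullish &amp;&amp; $engulfs;
}

$yesterday = array('open' =&gt; 10.40, 'close' =&gt; 10.10);
$today = array('open' =&gt; 10.05, 'close' =&gt; 10.55);
var_dump(isBullishEngulfing($yesterday, $today)); // bool(true)
</pre>
<p>A boolean signal like this could sit alongside the Bayesian score as one of the "other market indicators", rather than replacing it.</p>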
<p>For more reading on the concepts and components behind this idea, check out:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive Bayes Classifier</a></li>
<li><a href="http://www.leavittbrothers.com/education/candlestick_patterns/">Candlestick Patterns</a></li>
<li><a href="http://en.wikipedia.org/wiki/Candlestick_chart">Candlestick Charting</a></li>
<li><a href="http://www.paulgraham.com/better.html">Better Bayesian Filtering</a></li>
<li><a href="http://www.tdameritrade.com/tradingtools/partnertools/api_dev.html">TD Ameritrade API</a></li>
</ul>
<p>The nice part is that all the historical data is available around the internet, which makes back-testing and scoring easy, and there will need to be a lot of testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/03/27/bayesian-filtering-financial-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>