<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some stuff &#187; query</title>
	<atom:link href="http://blog.yhuang.org/?feed=rss2&#038;tag=query" rel="self" type="application/rss+xml" />
	<link>https://blog.yhuang.org</link>
	<description>here.</description>
	<lastBuildDate>Wed, 27 Aug 2025 08:50:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>data structure problem</title>
		<link>https://blog.yhuang.org/?p=854</link>
		<comments>https://blog.yhuang.org/?p=854#comments</comments>
		<pubDate>Fri, 09 Mar 2012 04:50:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[fraction]]></category>
		<category><![CDATA[mathbb]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[structure]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=854</guid>
		<description><![CDATA[Another problem by fakalin. A data structure has the entropy bound if all queries have amortized time , where is the fraction of the time that key is queried. It has the working-set property if the time to search for an element is , where is the number of elements queried since the last access [...]]]></description>
			<content:encoded><![CDATA[<p>Another problem by fakalin.</p>
<blockquote><p>A data structure has the entropy bound if all queries have amortized time \(O(\sum_k p_k \log 1/p_k)\), where \(p_k\) is the fraction of the time that key \(k\) is queried. It has the working-set property if the time to search for an element \(x_i\) is \(O(\log t_i)\), where \(t_i\) is the number of elements queried since the last access to \(x_i\). Prove that the working-set property implies the entropy bound.</p></blockquote>
<p>This isn&#8217;t really a data structure problem, per se.<br />
<span id="more-854"></span><br />
The general intuition here is that, if the waiting time between two queries to a key \(k\) is \(t(k)\), then key \(k\) ends up taking up about a \(p_k = 1/t(k)\) fraction of the queries, and therefore the average query time is about \(\sum_{k\in K} 1/t(k) (\log t(k))\).</p>
<p>While this is exactly true for evenly spaced-out queries, the general case only requires a slight modification using any of the rudimentary convex inequalities such as:</p>
<p><strong>Jensen&#8217;s inequality:</strong> If \(f\) is a convex function and \(X\) is a random variable, then \(\mathbb{E}f(X)\ge f(\mathbb{E}X)\).</p>
<p>(The proof just uses the definition of what a convex function is. Furthermore, if we recognize that \(-\log x\) is a convex function, then we get \(\mathbb{E}\log(X)\le \log(\mathbb{E}X)\), which restates the well-known fact that the geometric mean is less than or equal to the arithmetic mean of a collection of real numbers.)</p>
<p>Now, let \(N\) be the total number of queries. Let \(n(k)\) be the number of queries on key \(k\). Let \(t_i(k)\) be the time between the \(i\)th query on key \(k\) and the previous query on the same key. Let \(\bar{a}(k)< C \sum_{i=1}^{n(k)} \log t_i(k) / n(k)\) (for some \(C>0\)) be the <em>average</em> query time looking up key \(k\), as guaranteed by the working-set property (*). Let \(\bar{t}(k) = \sum_{i=1}^{n(k)} t_i(k) / n(k)\) be the <em>average</em> time between queries for key \(k\).</p>
<p>Note that we must have \(\bar{t}(k) \le N/n(k)\). Furthermore, \(p_k = n(k)/N\) by definition, hence \(\bar{t}(k) \le 1/p_k\) (**). The average query time over all keys is therefore:</p>
<p>\(\sum_k \bar{a}(k) n(k) / N\)<br />
\(< \sum_k [C \sum_{i=1}^{n(k)} \log t_i(k) / n(k)] [n(k) / N]\), by (*)<br />
\(\le C \sum_k [\log \sum_{i=1}^{n(k)} t_i(k) / n(k)] [n(k) / N]\), by Jensen&#8217;s inequality<br />
\(= C \sum_k [\log \bar{t}(k)] p_k\)<br />
\(\le C \sum_k p_k \log 1/p_k\), by (**)</p>
<p>∎</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=854</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
