Some people ask me about an "issue" with the MapReduce job history page (/jobhistory.jsp): on high-traffic clusters the page takes a while to load. Here is the mechanism behind it. The history files are kept for 30 days (hardcoded in releases before Hadoop 0.21). That produces a lot of log files and also wastes space on the JobTracker node. I have seen installations holding 20 GB of history logs, and as a consequence an audit of long-running jobs isn't really usable.

Beginning with 0.21 the cleanup is configurable:

Key: mapreduce.jobtracker.jobhistory.maxage
Default: 7 * 24 * 60 * 60 * 1000L (one week, in milliseconds)

To shrink the retention to a 3-day period, set:

mapreduce.jobtracker.jobhistory.maxage = 3 * 24 * 60 * 60 * 1000L

That is 3 days × 24 hours × 60 minutes × 60 seconds × 1000, i.e. the maximum age in milliseconds — the 1000 converts seconds to milliseconds, it is not a cache size.

Another way, though more of a hack, is a cron job:

find /var/log/hadoop-0.20/history/done/ -type f -mtime +1 | xargs rm -f
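As a sketch, the setting would go into mapred-site.xml like this (note that the config file needs the computed number of milliseconds as a literal value, not the arithmetic expression; 3 * 24 * 60 * 60 * 1000 = 259200000):

```xml
<!-- mapred-site.xml: keep job history for 3 days.
     Value is the maximum age in milliseconds:
     3 days * 24 h * 60 min * 60 s * 1000 ms = 259200000 -->
<property>
  <name>mapreduce.jobtracker.jobhistory.maxage</name>
  <value>259200000</value>
</property>
```

The JobTracker reads this on startup, so a restart is needed for the new retention period to take effect.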
Hey, I'm Alex. I founded X-Warp, Infinimesh, Infinite Devices, and Scalytics, worked with Cloudera, E.On, Google, and Evariant, and had the incredible luck to build products with outstanding people across the globe.