<div dir="ltr">Hi Paul,<div><br></div><div>The video from PyCon AU last year is available, but you might as well wait for the content from LCA, which will be somewhat updated with the latest information. Code samples and slides from last year are on GitHub: <a href="https://github.com/tleeuwenburg/pyconau15">https://github.com/tleeuwenburg/pyconau15</a>. I can think of a few ways to improve the slides and the code examples, and I hope to find the time for that. If nothing else, I have the first few days of the conference :) </div><div><br></div><div>For me, computer science is the field whose subject domain is computation. It includes investigations into memory, disk, algorithm efficiency, design methodology, etc. A fun definition I heard for data science is "What they call statistics in Silicon Valley". I think that data science focuses on the identification of trends, features and relationships in both structured and unstructured data. Typically, this involves both algorithmic processing and analytical understanding from a human analyst. Machine learning is the exploration of effective algorithms for the prediction of future states based on complex inputs and complex (and potentially hidden) rules and relationships. Happy to provide more exposition here. I might not include it in the presentation, because I'm not sure how into 'computer science' / how academic the general audience is likely to be. Maybe in 2017 I'll re-run this as a kind of 'masterclass' rather than a tutorial and get into some more challenging territory.</div><div><br></div><div>I struggle with mathematics but have made the effort to learn some fundamentals. It is possible to use many of the techniques of machine learning as a "black box", but it's not really possible to do data science that way. 
I think it would be worth learning:</div><div> -- Standard deviation function</div><div> -- Normal distribution</div><div> -- How to draw a good graph</div><div> -- How to draw a scatter plot</div><div> -- How to draw a Venn diagram of the state space. Almost all complex probability theory can be more easily understood when you start with a square representing "everything", then start bisecting it into smaller sub-populations. It's easier to understand Bayes' rule, correlation, causation, likelihoods...</div><div><br></div><div>Visual methods are highly effective.</div><div><br></div><div>Regarding trending and anomaly detection ... I am straying outside of my main knowledge here, but in general you will be looking at data normalisation, hypothesis testing, bias elimination/identification and significance thresholding. Some methods are more robust to this than others. If you have a use case, I'd be happy to hear about it! :)</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 13 January 2016 at 20:59, Paul Gear <span dir="ltr">&lt;<a href="mailto:paul@gear.dyndns.org" target="_blank">paul@gear.dyndns.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="">
On 13/01/16 18:30, Tennessee Leeuwenburg wrote:<br>
</span><div><div class="h5"><blockquote type="cite">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I am running a tutorial on data science at LCA. The chief
language used will be Python, but users of other technologies
will still find the concepts relevant. In preparation for
this, I will be dusting off my slide deck, re-running the
code, and updating the content with a small number of new
findings from the last 6 months. This is also an opportunity
to focus the content on what the LCA audience might be most
interested in.</div>
<div><br>
</div>
<div>Does anyone on this list have any particular questions
around data science / machine learning / AI which they would
like to see answered? </div>
<div><br>
</div>
<div>The session is practical, with supplied code and data, and
audience members should be able to re-create the results while
the session is being presented. Are there any particular
problems that people are confronted with? I might not be able
to re-work a major case study, but I should be able to
incorporate some relevant examples...</div>
<div><br>
</div>
<div>Cheers,</div>
<div>-T<br>
</div>
</div>
</blockquote>
<br></div></div>
Hi Tennessee,<br>
<br>
I won't be at LCA this year, but would love to see the slides/code
samples. Here are my suggestions, not having any real background in
data science:<br>
<ul>
<li>What's the difference between data science and computer
science? i.e. What are the important characteristics which
distinguish it as a field (or sub-field) in its own right?</li>
<li>My eyes tend to glaze over at the first sign of grade 12 or
higher maths (even though I did pretty well at it in grade 12).
What are the main mathematical concepts that non-data scientists
need to brush up on to understand what data scientists are
telling them?</li>
<li>Keen to hear anything you can teach about the theory behind
trending &amp; anomaly detection, especially as it relates to
modern monitoring systems.<br>
</li>
</ul>
<p>Regards,<br>
Paul<br>
</p>
</div>
<br>_______________________________________________<br>
linux-aus mailing list<br>
<a href="mailto:linux-aus@lists.linux.org.au">linux-aus@lists.linux.org.au</a><br>
<a href="http://lists.linux.org.au/mailman/listinfo/linux-aus" rel="noreferrer" target="_blank">http://lists.linux.org.au/mailman/listinfo/linux-aus</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">--------------------------------------------------<br>Tennessee Leeuwenburg<br><a href="http://myownhat.blogspot.com/" target="_blank">http://myownhat.blogspot.com/</a><br>"Don't believe everything you think"</div>
</div>