From 972016a3f5b61c37e387a282404fda1b5aefa34e Mon Sep 17 00:00:00 2001 From: jhkimon Date: Sat, 13 Jun 2026 14:12:13 +0900 Subject: [PATCH 1/3] =?UTF-8?q?docs:=20IPW=20=EC=84=A4=EB=AA=85=20?= =?UTF-8?q?=EC=88=98=EC=A0=95=20(=EC=84=B1=EB=B3=84=20->=20=EC=A4=91?= =?UTF-8?q?=EC=A6=9D=EB=8F=84)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- book/ipw_basics/what_is_ipw_en.ipynb | 82 ++++++++++++++------------- book/ipw_basics/what_is_ipw_ko.ipynb | 84 ++++++++++++++++------------ 2 files changed, 91 insertions(+), 75 deletions(-) diff --git a/book/ipw_basics/what_is_ipw_en.ipynb b/book/ipw_basics/what_is_ipw_en.ipynb index c114873..5afb72b 100644 --- a/book/ipw_basics/what_is_ipw_en.ipynb +++ b/book/ipw_basics/what_is_ipw_en.ipynb @@ -31,11 +31,13 @@ "\n", "> **Does medication actually help you recover faster?**\n", "\n", - "This is a question deeply connected to our everyday lives. Whether the drug I take when I'm sick can shorten my hospital stay — or which treatments a hospital should adopt as standard care — these decisions all hinge on getting the answer right. A wrong answer could mean missing an effective treatment, or trusting one that doesn't work.\n", + "This question is closely connected to our everyday lives. It directly affects decisions such as whether the medicine I take when I am sick actually shortens my hospital stay, or which treatment a hospital should adopt as the standard of care. If we reach the wrong conclusion, we may overlook an effective treatment or continue to trust and use a treatment that does not actually work.\n", "\n", - "Let's consider a specific scenario. Suppose men tend to get sicker more often, leading to longer hospital stays and higher medication use. 
Women, on the other hand, tend to get less sick, so they have shorter stays and take less medication.\n", + "Now, let us consider a simple situation.\n", "\n", - "In this setting, can the data we observe truly reflect the effect of the treatment? How can we use the data to estimate the treatment effect without distortion or bias?" + "Patients with severe illness are usually sicker to begin with, so they tend to stay in the hospital longer. At the same time, they are also more likely to take the medication. In contrast, patients with mild illness are relatively less sick, so they tend to have shorter hospital stays and are less likely to take the medication.\n", + "\n", + "In this situation, does the data we observe really show the “effect of the medication”? How can we use data to estimate the effect of the medication without distortion?" ] }, { @@ -59,15 +61,15 @@ "id": "f18b57da", "metadata": {}, "source": [ - "The simplest approach is to compare the average hospitalization days between those who took the drug and those who didn't. This naive comparison is valid when treatment is randomly assigned (RCT) — random assignment makes treatment independent of confounders like sex, so the two groups can be compared directly.\n", + "The simplest approach is to compare the average hospitalization days between those who took the drug and those who didn't. This naive comparison is valid when treatment is randomly assigned (RCT) — random assignment makes treatment independent of confounders like severity, so the two groups can be compared directly.\n", "\n", - "But our situation is different. Most people who took the drug are male, and most who didn't are female. Since men tend to be sicker to begin with, their longer hospital stays may have nothing to do with the drug — **it's sex, not the drug, that's driving the difference**.\n", + "But our situation is different. Most people who took the drug are severely ill, and most who didn't are only mildly ill. 
Since severely ill patients are sicker to begin with, their longer hospital stays may have nothing to do with the drug: **it's severity, not the drug, that's driving the difference**.\n", "\n", "```mermaid\n", "graph LR\n", " Treatment[Drug T] --> Outcome[Hospitalization Days Y]\n", - " Sex[Sex X] --> Treatment\n", - " Sex --> Outcome\n", + " Severity[Severity X] --> Treatment\n", + " Severity --> Outcome\n", "```\n", "\n", "A variable that affects both treatment assignment and the outcome is called a **confounder**. Confounders are what cause our estimate of the ATE to be distorted." @@ -89,9 +91,9 @@ "outputs": [], "source": [ "drug_example = pd.DataFrame(dict(\n", - " sex= [\"M\",\"M\",\"M\",\"M\",\"M\",\"M\", \"W\",\"W\",\"W\",\"W\"],\n", + " severity= [\"M\",\"M\",\"M\",\"M\",\"M\",\"M\", \"W\",\"W\",\"W\",\"W\"],\n", " drug=[1,1,1,1,1,0, 1,0,1,0],\n", - " days=[5,5,5,5,5,8, 2,4,2,4]\n", + " days=[7,7,7,7,7,8, 2,3,2,3]\n", "))\n", "drug_example" ] @@ -101,15 +103,15 @@ "id": "4cbd6140", "metadata": {}, "source": [ - "| Group | Treated (T=1) | Untreated (T=0) |\n", - "|---|---|---|\n", - "| Male | 5 | 1 |\n", - "| Female | 2 | 2 |\n", + "| Severity | Treated (T=1) | Untreated (T=0) |\n", + "| -------- | ------------: | --------------: |\n", + "| Severe | 5 | 1 |\n", + "| Mild | 2 | 2 |\n", "\n", - "In reality we can't observe both $Y_0$ and $Y_1$ for the same person, but for the purposes of this explanation, let's assume we can.\n", + "In reality, we cannot observe both $Y_0$ and $Y_1$ for the same person, but for the purposes of this explanation, let’s assume we can.\n", "\n", - "- Male: $Y_1 = 5, Y_0 = 8 \\Rightarrow Y_1 - Y_0 = -3$\n", - "- Female: $Y_1 = 2, Y_0 = 4 \\Rightarrow Y_1 - Y_0 = -2$" + "- Severe patients: $Y_1 = 7, Y_0 = 8 \\Rightarrow Y_1 - Y_0 = -1$\n", + "- Mild patients: $Y_1 = 2, Y_0 = 3 \\Rightarrow Y_1 - Y_0 = -1$\n" ] }, { @@ -119,7 +121,7 @@ "source": [ "## The Problem with Naive Comparison\n", "\n", - "Let's compute the naive 
comparison. The treated group average is $(5 \\times 5 + 2 \\times 2)/7 = 29/7$, and the untreated group average is $(1 \\times 8 + 2 \\times 4)/3 = 16/3$." + "Let's compute the naive comparison. The treated group average is $(5 \\times 7 + 2 \\times 2)/7 = 39/7$, and the untreated group average is $(1 \\times 8 + 2 \\times 3)/3 = 14/3$." ] }, { @@ -139,22 +141,24 @@ "metadata": {}, "source": [ "$$\n", - "\\hat{ATE}_{naive} = 29/7 - 16/3 \\approx -1.19\n", + "\\hat{ATE}_{naive} = 39/7 - 14/3 \\approx +0.90\n", "$$\n", "\n", - "This result doesn't look right. The drug reduces hospitalization by 3 days for men and 2 days for women — so in every case, the effect should be at least $-2$. Yet the naive comparison gives only $-1.19$, which is smaller in magnitude than either group's true effect.\n", + "This result doesn't look right. The drug reduces hospitalization by 1 day for **both** severe and mild patients, so the true effect is $-1$ in every subgroup. Yet the naive comparison gives $+0.90$: the sign has flipped, making a helpful drug look **harmful**.\n", + "\n", + "This distortion isn't from the drug — it comes from the **difference in group composition** (the unequal severity distribution between treated and untreated groups).\n", "\n", - "This distortion isn't from the drug — it comes from the **difference in group composition** (the unequal sex distribution between treated and untreated groups).\n", + "This is a textbook case of **Simpson's paradox**: a trend that holds within every subgroup can reverse once the subgroups are pooled. 
The drug shortens hospitalization in both the mild and severe groups, yet the combined comparison makes it look harmful — simply because most treated patients are severe patients and they stay longer regardless of the drug.\n", "\n", "### True ATE\n", "\n", - "Since 6 males each have a −3 day effect and 4 females each have a −2 day effect:\n", + "Since 6 severe patients each have a −1 day effect and 4 mild patients each have a −1 day effect:\n", "\n", "$$\n", - "ATE = \\frac{-3 \\times 6 + (-2) \\times 4}{10} = -2.6\n", + "ATE = \\frac{-1 \\times 6 + (-1) \\times 4}{10} = -1.0\n", "$$\n", "\n", - "The naive comparison substantially **underestimates** the true effect of the drug." + "The naive comparison doesn't just shrink the effect — it **reverses its sign**, making an effective drug appear harmful." ] }, { @@ -164,11 +168,11 @@ "source": [ "## Solution: Inverse Probability Weighting (IPW)\n", "\n", - "The root cause of the problem is that **the sex composition of the treated and untreated groups was different**. What if we could reweight the data so that the two groups appear to have the same sex composition?\n", + "The root cause of the problem is that **the severity composition of the treated and untreated groups was different**. What if we could reweight the data so that the two groups appear to have the same severity composition?\n", "\n", - "The core idea is simple. **Assign each individual a weight that makes the data look as if both groups had the same composition.** If a certain sex is overrepresented in one group, reduce their influence; if underrepresented, increase it. When sex is the source of confounding, correcting for this imbalance through reweighting makes a simple mean comparison valid.\n", + "The core idea is simple. **Assign each individual a weight that makes the data look as if both groups had the same composition.** If patients of a given severity are overrepresented in one group, reduce their influence; if underrepresented, increase it. 
When severity is the source of confounding, correcting for this imbalance through reweighting makes a simple mean comparison valid.\n", "\n", - "What we want is for the sex distribution within each treatment arm to be balanced — treated and untreated groups should look the same within each sex. Since men tend to take the drug more often, there are too few untreated men. We need to \"inflate\" that underrepresented group by multiplying by the inverse of its probability.\n", + "What we want is for the severity distribution within each treatment arm to be balanced — treated and untreated groups should look the same within each severity level. Since severe patients tend to take the drug more often, there are too few untreated severe patients. We need to \"inflate\" that underrepresented group by multiplying by the inverse of its probability.\n", "\n", "This is the idea behind **Inverse Probability Weighting (IPW)**. Each individual receives the following weight:\n", "\n", @@ -198,27 +202,27 @@ "source": [ "### Why Does This Weight Create Balance?\n", "\n", - "The treatment probabilities by sex are:\n", + "The treatment probabilities by severity are:\n", "\n", "$$\n", - "\\text{Male: } P(T=1 \\mid X=M)=\\frac{5}{6}, \\quad P(T=0 \\mid X=M)=\\frac{1}{6}\n", + "\\text{Severe: } P(T=1 \\mid X=M)=\\frac{5}{6}, \\quad P(T=0 \\mid X=M)=\\frac{1}{6}\n", "$$\n", "\n", "$$\n", - "\\text{Female: } P(T=1 \\mid X=W)=\\frac{1}{2}, \\quad P(T=0 \\mid X=W)=\\frac{1}{2}\n", + "\\text{Mild: } P(T=1 \\mid X=W)=\\frac{1}{2}, \\quad P(T=0 \\mid X=W)=\\frac{1}{2}\n", "$$\n", "\n", "The weights are the reciprocals of these values:\n", "\n", - "- Male + treated: $6/5$,   Male + untreated: $6$\n", - "- Female + treated: $2$,   Female + untreated: $2$\n", + "- Severe + treated: $6/5$,   Severe + untreated: $6$\n", + "- Mild + treated: $2$,   Mild + untreated: $2$\n", "\n", - "There is originally only 1 untreated male, but with a weight of 6, he is **treated as if there were 6 of him**. Conversely, the 5 treated males each get a weight of $6/5$, so their total also sums to 6.\n", + "There is originally only 1 untreated patient in the severe group, but with a weight of 6, that patient is **treated as if there were 6 of them**. 
Conversely, the 5 treated males each get a weight of $6/5$, so their total also sums to 6.\n", + "There is originally only 1 untreated for severe group, but with a weight of 6, he is **treated as if there were 6 of him**. Conversely, the 5 treated severe group each get a weight of $6/5$, so their total also sums to 6.\n", "\n", - "As a result, within the male group, the total weight is 6 for both treated and untreated — a perfect balance. The same holds for females. With this balanced dataset, a simple mean comparison recovers the true ATE:\n", + "As a result, within the severe group, the total weight is 6 for both treated and untreated — a perfect balance. The same holds for mild patients. With this balanced dataset, a simple mean comparison recovers the true ATE:\n", "\n", "$$\n", - "\\hat{ATE}_{IPW} = \\frac{5 \\times 6 + 2 \\times 4}{6+4} - \\frac{8 \\times 6 + 4 \\times 4}{6+4} = -2.6\n", + "\\hat{ATE}_{IPW} = \\frac{7 \\times 6 + 2 \\times 4}{6+4} - \\frac{8 \\times 6 + 3 \\times 4}{6+4} = -1.0\n", "$$" ] }, @@ -229,8 +233,8 @@ "metadata": {}, "outputs": [], "source": [ - "ps = drug_example.groupby(\"sex\")[\"drug\"].mean()\n", - "drug_example[\"ps\"] = drug_example[\"sex\"].map(ps)\n", + "ps = drug_example.groupby(\"severity\")[\"drug\"].mean()\n", + "drug_example[\"ps\"] = drug_example[\"severity\"].map(ps)\n", "drug_example[\"w\"] = (\n", " drug_example[\"drug\"] / drug_example[\"ps\"]\n", " + (1 - drug_example[\"drug\"]) / (1 - drug_example[\"ps\"])\n", @@ -250,7 +254,7 @@ "id": "ca99e148", "metadata": {}, "source": [ - "The reweighted dataset is not the original observed data. It can be interpreted as a **pseudo-population** — an artificial construct in which treatment and sex appear independent. In this pseudo-population, no particular sex group is systematically over- or under-treated, recreating a situation equivalent to random assignment. As a result, a simple mean comparison is sufficient to estimate the causal effect." 
+ "The reweighted dataset is not the original observed data. It can be interpreted as a **pseudo-population** — an artificial construct in which treatment and severity appear independent. In this pseudo-population, no particular severity group is systematically over- or under-treated, recreating a situation equivalent to random assignment. As a result, a simple mean comparison is sufficient to estimate the causal effect." ] }, { @@ -262,7 +266,7 @@ "\n", "The treatment probability $e(X) = P(T=1 \\mid X)$ used to construct the weights is called the **propensity score**. It represents each individual's probability of receiving the treatment given their covariates $X$.\n", "\n", - "In our example, since $X$ was a single binary variable, we could compute $P(T \\mid X)$ by directly counting the treatment rate within each sex group.\n", + "In our example, since $X$ was a single binary variable, we could compute $P(T \\mid X)$ by directly counting the treatment rate within each severity group.\n", "\n", "In practice, however, this probability is rarely known. When there are multiple covariates or continuous variables involved, simple counting won't work. This is why we **estimate** the propensity score using a model — most commonly logistic regression.\n", "\n", @@ -273,7 +277,7 @@ "\\frac{T_i}{\\hat{e}(X_i)} + \\frac{1 - T_i}{1 - \\hat{e}(X_i)}\n", "$$\n", "\n", - "That said, this approach depends heavily on how well we estimate the propensity score. If the model fails to capture important nonlinearities or interactions — say, between age and sex — then $\\hat{e}(X)$ may not reflect the true treatment probability. The resulting weights would be inaccurate, covariate balance may not be achieved even after reweighting, and confounding may not be fully removed." + "That said, this approach depends heavily on how well we estimate the propensity score. 
If the model fails to capture important nonlinearities or interactions — say, between age and severity — then $\\hat{e}(X)$ may not reflect the true treatment probability. The resulting weights would be inaccurate, covariate balance may not be achieved even after reweighting, and confounding may not be fully removed." ] }, { diff --git a/book/ipw_basics/what_is_ipw_ko.ipynb b/book/ipw_basics/what_is_ipw_ko.ipynb index 91876b4..4218fd1 100644 --- a/book/ipw_basics/what_is_ipw_ko.ipynb +++ b/book/ipw_basics/what_is_ipw_ko.ipynb @@ -33,17 +33,27 @@ "\n", "이 질문은 우리의 삶과 아주 밀접한 문제입니다. 내가 아플 때 먹는 이 약이 입원 일수를 줄이는지, 병원에서 어떤 치료를 표준으로 삼을지와 같은 결정에 직접 영향을 미칩니다. 잘못된 답을 내리면 효과 있는 치료를 놓치거나, 효과 없는 치료를 믿고 사용하는 상황이 생길 수 있습니다.\n", "\n", - "이제 한 가지 상황을 떠올려 보겠습니다. 남성은 원래 더 아픈 경우가 많아 입원 일수가 길고, 그만큼 약도 더 많이 복용하는 경향이 있습니다. 반대로 여성은 상대적으로 덜 아파서 입원 일수가 짧고 약도 덜 복용합니다.\n", + "이제 한 가지 상황을 떠올려 보겠습니다.\n", + "\n", + "중증 환자는 원래 더 아픈 경우가 많아 입원 일수가 길고, 그만큼 약도 더 많이 복용하는 경향이 있습니다. 반대로 경증 환자는 상대적으로 덜 아파서 입원 일수가 짧고 약도 덜 복용합니다.\n", "\n", "이런 상황에서 우리가 관찰하는 데이터는 과연 \"약의 효과\"를 보여주고 있을까요? 이런 상황에서 어떻게 하면 데이터를 활용해서 약의 효과를 왜곡되지 않게 계산할 수 있을까요?" ] }, + { + "cell_type": "markdown", + "id": "0bba377f", + "metadata": {}, + "source": [ + "### 인과추론 다시 살펴보기" + ] + }, { "cell_type": "markdown", "id": "cb92ecce", "metadata": {}, "source": [ - "관심 있는 질문을 인과추론의 언어로 표현해 보겠습니다. 우리는 단순히 \"먹은 사람 vs 안 먹은 사람\"이 아니라, \"같은 사람이 약을 먹었을 때 vs 안 먹었을 때\"를 비교하고 싶습니다. 하지만 두 값을 같은 사람에게서 동시에 관찰할 수 없으므로, 전체 집단에서의 평균 효과를 목표로 합니다.\n", + "우선 관심 있는 질문을 인과추론의 언어로 표현해 보겠습니다. 우리는 단순히 \"먹은 사람 vs 안 먹은 사람\"이 아니라, \"같은 사람이 약을 먹었을 때 vs 안 먹었을 때\"를 비교하고 싶습니다. 하지만 두 값을 같은 사람에게서 동시에 관찰할 수 없으므로, 전체 집단에서의 평균 효과를 목표로 합니다.\n", "\n", "즉, \"전체 집단에서 모든 사람이 약을 복용했을 때의 평균 입원 일수와, 아무도 복용하지 않았을 때의 평균 입원 일수가 얼마나 차이 나는가?\"를 묻는 것입니다.\n", "\n", @@ -59,15 +69,15 @@ "id": "f18b57da", "metadata": {}, "source": [ - "가장 단순한 방법은 약을 먹은 사람과 먹지 않은 사람의 평균 입원 일수를 그냥 비교하는 것입니다. 이 단순 평균 비교는 약이 무작위로 배정된 경우(RCT)에는 타당합니다. 
치료 할당이 무작위이면 약 복용 여부가 성별 같은 교란변수와 독립이 되어, 두 집단을 직접 비교해도 인과효과를 추정할 수 있기 때문입니다.\n", + "가장 단순한 방법은 약을 먹은 사람과 먹지 않은 사람의 평균 입원 일수를 그냥 비교하는 것입니다. 이 단순 평균 비교는 약이 무작위로 배정된 경우(RCT)에는 타당합니다. 치료 할당이 무작위이면 약 복용 여부가 중증도 같은 교란변수와 독립이 되어, 두 집단을 직접 비교해도 인과효과를 추정할 수 있기 때문입니다.\n", "\n", - "하지만 지금 상황은 다릅니다. 약을 먹은 사람은 대부분 남성이고, 안 먹은 사람은 대부분 여성입니다. 남성은 원래 더 아프기 때문에 입원 일수가 긴데, 이 때문에 **약 때문이 아니라 성별 때문에 입원 일수가 달라 보이는 것**입니다.\n", + "하지만 지금 상황은 다릅니다. 약을 먹은 사람은 대부분 중증 환자들이고, 안 먹은 사람은 대부분 경증 환자들입니다. 중증 환자들은 원래 더 아프기 때문에 입원 일수가 긴데, 이 때문에 **약 때문이 아니라 중증도 때문에 입원 일수가 달라 보이는 것**입니다.\n", "\n", "```mermaid\n", "graph LR\n", " Treatment[약 복용 T] --> Outcome[입원 일수 Y]\n", - " Sex[성별 X] --> Treatment\n", - " Sex --> Outcome\n", + " Severity[중증도 X] --> Treatment\n", + " Severity --> Outcome\n", "```\n", "\n", "이처럼 약 복용 여부와 입원 일수 모두에 영향을 주는 변수를 **교란변수(confounder)** 라고 합니다. 교란변수는 우리가 알고 싶은 ATE를 왜곡시키는 원인입니다." @@ -89,9 +99,9 @@ "outputs": [], "source": [ "drug_example = pd.DataFrame(dict(\n", - " sex= [\"M\",\"M\",\"M\",\"M\",\"M\",\"M\", \"W\",\"W\",\"W\",\"W\"],\n", + " severity= [\"M\",\"M\",\"M\",\"M\",\"M\",\"M\", \"W\",\"W\",\"W\",\"W\"],\n", " drug=[1,1,1,1,1,0, 1,0,1,0],\n", - " days=[5,5,5,5,5,8, 2,4,2,4]\n", + " days=[7,7,7,7,7,8, 2,3,2,3]\n", "))\n", "drug_example" ] @@ -103,13 +113,13 @@ "source": [ "| 그룹 | 약 복용 (T=1) | 약 미복용 (T=0) |\n", "|---|---|---|\n", - "| 남성 | 5명 | 1명 |\n", - "| 여성 | 2명 | 2명 |\n", + "| 중증 | 5명 | 1명 |\n", + "| 경증 | 2명 | 2명 |\n", "\n", "현실에서는 $Y_0$와 $Y_1$을 동시에 관측할 수 없지만, 설명을 위해 두 값을 모두 알고 있다고 가정하겠습니다.\n", "\n", - "- 남성: $Y_1 = 5, Y_0 = 8 \\Rightarrow Y_1 - Y_0 = -3$\n", - "- 여성: $Y_1 = 2, Y_0 = 4 \\Rightarrow Y_1 - Y_0 = -2$" + "- 중증: $Y_1 = 7, Y_0 = 8 \\Rightarrow Y_1 - Y_0 = -1$\n", + "- 경증: $Y_1 = 2, Y_0 = 3 \\Rightarrow Y_1 - Y_0 = -1$" ] }, { @@ -119,7 +129,7 @@ "source": [ "## 단순 평균 비교의 문제\n", "\n", - "단순 평균 비교를 해보겠습니다. 치료 집단의 평균은 $(5 \\times 5 + 2 \\times 2)/7 = 29/7$, 비치료 집단은 $(1 \\times 8 + 2 \\times 4)/3 = 16/3$입니다." + "단순 평균 비교를 해보겠습니다. 
약 복용 집단의 평균은 $(5 \\times 7 + 2 \\times 2)/7 = 39/7$, 약 비복용 집단은 $(1 \\times 8 + 2 \\times 3)/3 = 14/3$입니다." ] }, { @@ -139,22 +149,24 @@ "metadata": {}, "source": [ "$$\n", - "\\hat{ATE}_{naive} = 29/7 - 16/3 \\approx -1.19\n", + "\\hat{ATE}_{naive} = 39/7 - 14/3 \\approx +0.90\n", "$$\n", "\n", - "이 값은 이상합니다. 남성은 약을 먹으면 입원 일수가 3일, 여성은 2일 줄어듭니다. 어떤 경우에도 효과가 최소 $-2$ 이상이어야 하는데, 단순 평균 비교는 $-1.19$로 훨씬 작게 나옵니다.\n", + "이 값은 이상합니다. 경증 환자도 중증 환자도 약을 먹으면 입원 일수가 1일씩 줄어듭니다. 모든 집단에서 효과가 $-1$인데, 단순 평균 비교는 $+0.90$으로 부호가 정반대로 뒤집혀 마치 약이 해로운 것처럼 나옵니다.\n", + "\n", + "이는 약의 효과가 아니라 **집단 구성의 차이(중증도 분포 차이)** 가 섞여 왜곡된 결과입니다.\n", "\n", - "이는 약의 효과가 아니라 **집단 구성의 차이(성별 분포 차이)** 가 섞여 왜곡된 결과입니다.\n", + "이것을 우리는 **심슨의 역설(Simpson's paradox)** 이라고 부릅니다. 각 부분집단에서는 일관되게 나타나던 경향이, 집단을 합치면 정반대로 뒤집히는 현상이죠. 여기서도 약은 경증 집단과 중증 집단 모두에서 입원 일수를 줄이지만, 둘을 합쳐 비교하면 오히려 해로워 보입니다. 약을 먹은 사람 대부분이 중증 환자이고, 중증 환자는 약과 무관하게 원래 입원 일수가 길기 때문입니다.\n", "\n", "### 실제 ATE\n", "\n", - "입원 일수가 3일 줄어드는 남성이 6명, 2일 줄어드는 여성이 4명이므로:\n", + "입원 일수가 1일 줄어드는 중증 환자가 6명, 1일 줄어드는 경증 환자가 4명이므로:\n", "\n", "$$\n", - "ATE = \\frac{-3 \\times 6 + (-2) \\times 4}{10} = -2.6\n", + "ATE = \\frac{-1 \\times 6 + (-1) \\times 4}{10} = -1.0\n", "$$\n", "\n", - "단순 평균 비교는 실제 효과보다 약의 감소 효과를 **훨씬 작게** 보이게 합니다." + "단순 평균 비교는 약의 효과를 작게 보이게 할 뿐 아니라, **부호까지 뒤집어** 효과 있는 약을 해로운 것처럼 보이게 만듭니다." ] }, { @@ -164,11 +176,11 @@ "source": [ "## 해결책: Inverse Probability Weighting (IPW)\n", "\n", - "문제의 근본 원인은 **약을 먹은 집단과 먹지 않은 집단의 성별 구성이 달랐기 때문**입니다. 두 집단이 같은 성별 구성을 가진 것처럼 보이도록 데이터를 재구성하면 어떨까요?\n", + "문제의 근본 원인은 **약을 먹은 집단과 먹지 않은 집단의 중증도 구성이 달랐기 때문**입니다. 두 집단이 같은 중증도 구성을 가진 것처럼 보이도록 데이터를 재구성하면 어떨까요?\n", "\n", - "핵심 아이디어는 단순합니다. **각 개인에게 가중치를 부여하여, 마치 두 집단의 성별 구성이 동일했던 것처럼 데이터를 다시 만들어주는 것**입니다. 특정 성별이 어떤 집단에 과도하게 많으면 그 영향력을 줄이고, 덜 포함되어 있으면 늘려주는 방식입니다.\n", + "핵심 아이디어는 단순합니다. **각 개인에게 가중치를 부여하여, 마치 두 집단의 중증도 구성이 동일했던 것처럼 데이터를 다시 만들어주는 것**입니다. 특정 중증도의 환자가 어떤 집단에 과도하게 많으면 그 영향력을 줄이고, 덜 포함되어 있으면 늘려주는 방식입니다.\n", "\n", - "우리가 원하는 건, 각 성별 안에서 \"약 먹은 사람 = 안 먹은 사람\"이 되도록 만드는 것입니다. 
남성은 원래 약을 많이 먹기 때문에 약 안 먹은 남성이 너무 적습니다. 이 부족한 그룹을 \"부풀려야\" 합니다. 그래서 확률의 역수를 곱합니다.\n", + "우리가 원하는 건, 각 중증도 집단 안에서 \"약 먹은 사람 = 안 먹은 사람\"이 되도록 만드는 것입니다. 중증 환자는 원래 약을 많이 먹기 때문에 약 안 먹은 중증 환자가 너무 적습니다. 이 부족한 그룹을 \"부풀려야\" 합니다. 그래서 확률의 역수를 곱합니다.\n", "\n", "이를 구현한 대표적인 방법이 **Inverse Probability Weighting (IPW)** 입니다. 각 개인에게 다음과 같은 가중치를 부여합니다.\n", "\n", @@ -180,7 +192,7 @@ "\\end{cases}\n", "$$\n", "\n", - "이는 \"해당 성별에서 그 사람이 실제로 받은 처치를 받을 확률의 역수\"로, **드물게 관찰된 경우일수록 더 큰 가중치를 부여하는 방식**입니다.\n", + "이는 \"해당 중증도에서 그 사람이 실제로 받은 처치를 받을 확률의 역수\"로, **드물게 관찰된 경우일수록 더 큰 가중치를 부여하는 방식**입니다.\n", "\n", "두 가지 상태를 합쳐 보통 다음과 같이 씁니다.\n", "\n", @@ -198,27 +210,27 @@ "source": [ "### 왜 이 가중치가 균형을 만들어주는가?\n", "\n", - "성별별 치료 확률을 계산해 보면:\n", + "중증도별 치료 확률을 계산해 보면:\n", "\n", "$$\n", - "\\text{남성: } P(T=1 \\mid X=M)=\\frac{5}{6}, \\quad P(T=0 \\mid X=M)=\\frac{1}{6}\n", + "\\text{중증: } P(T=1 \\mid X=M)=\\frac{5}{6}, \\quad P(T=0 \\mid X=M)=\\frac{1}{6}\n", "$$\n", "\n", "$$\n", - "\\text{여성: } P(T=1 \\mid X=W)=\\frac{1}{2}, \\quad P(T=0 \\mid X=W)=\\frac{1}{2}\n", + "\\text{경증: } P(T=1 \\mid X=W)=\\frac{1}{2}, \\quad P(T=0 \\mid X=W)=\\frac{1}{2}\n", "$$\n", "\n", "가중치는 이 값의 역수입니다.\n", "\n", - "- 남성 + 약 복용: $6/5$, 남성 + 미복용: $6$\n", - "- 여성 + 약 복용: $2$, 여성 + 미복용: $2$\n", + "- 중증 + 약 복용: $6/5$, 중증 + 미복용: $6$\n", + "- 경증 + 약 복용: $2$, 경증 + 미복용: $2$\n", "\n", - "남성 중 약을 복용하지 않은 사람은 원래 1명뿐이지만, 가중치가 6이므로 **마치 6명 있는 것처럼** 취급됩니다. 반대로 약을 복용한 남성 5명은 각각 $6/5$의 가중치를 가져 총합이 6으로 맞춰집니다.\n", + "중증 환자 중 약을 복용하지 않은 사람은 원래 1명뿐이지만, 가중치가 6이므로 **마치 6명 있는 것처럼** 취급됩니다. 반대로 약을 복용한 중증 환자 5명은 각각 $6/5$의 가중치를 가져 총합이 6으로 맞춰집니다.\n", "\n", - "결과적으로 남성 집단에서는 약 복용/미복용 그룹의 총 weight가 모두 6으로 균형을 이루고, 여성도 마찬가지입니다. 이제 단순 평균 비교만으로 원하는 ATE를 구할 수 있습니다.\n", + "결과적으로 중증 환자 집단에서는 약 복용/미복용 그룹의 총 weight가 모두 6으로 균형을 이루고, 경증 환자도 마찬가지입니다. 
이제 단순 평균 비교만으로 원하는 ATE를 구할 수 있습니다.\n", "\n", "$$\n", - "\\hat{ATE}_{IPW} = \\frac{5 \\times 6 + 2 \\times 4}{6+4} - \\frac{8 \\times 6 + 4 \\times 4}{6+4} = -2.6\n", + "\\hat{ATE}_{IPW} = \\frac{7 \\times 6 + 2 \\times 4}{6+4} - \\frac{8 \\times 6 + 3 \\times 4}{6+4} = -1.0\n", "$$" ] }, @@ -229,8 +241,8 @@ "metadata": {}, "outputs": [], "source": [ - "ps = drug_example.groupby(\"sex\")[\"drug\"].mean()\n", - "drug_example[\"ps\"] = drug_example[\"sex\"].map(ps)\n", + "ps = drug_example.groupby(\"severity\")[\"drug\"].mean()\n", + "drug_example[\"ps\"] = drug_example[\"severity\"].map(ps)\n", "drug_example[\"w\"] = (\n", " drug_example[\"drug\"] / drug_example[\"ps\"]\n", " + (1 - drug_example[\"drug\"]) / (1 - drug_example[\"ps\"])\n", @@ -250,7 +262,7 @@ "id": "ca99e148", "metadata": {}, "source": [ - "이렇게 재구성된 데이터는 실제로 관측된 데이터가 아닙니다. 성별과 치료가 마치 독립인 것처럼 보이도록 만든 **가상의 모집단(pseudo-population)** 으로 해석할 수 있습니다. 이 가상 모집단에서는 더 이상 특정 성별이 특정 치료를 더 많이 받는 구조가 없기 때문에, 무작위로 약 복용이 배정된 것과 같은 상황이 됩니다." + "이렇게 재구성된 데이터는 실제로 관측된 데이터가 아닙니다. 중증도와 치료가 마치 독립인 것처럼 보이도록 만든 **가상의 모집단(pseudo-population)** 으로 해석할 수 있습니다. 이 가상 모집단에서는 더 이상 특정 중증도 집단이 특정 치료를 더 많이 받는 구조가 없기 때문에, 무작위로 약 복용이 배정된 것과 같은 상황이 됩니다." ] }, { @@ -262,7 +274,7 @@ "\n", "가중치를 만들 때 사용한 치료 확률 $e(X) = P(T=1 \\mid X)$를 **성향점수(propensity score)** 라고 부릅니다. 각 개인이 자신의 공변량 $X$를 바탕으로 특정 처치를 받을 확률을 의미합니다.\n", "\n", - "지금 예시에서는 $X$가 하나뿐이라 성별 내 치료 비율을 단순히 세어서 $P(T \\mid X)$를 직접 계산할 수 있었습니다.\n", + "지금 예시에서는 $X$가 하나뿐이라 집단 내 치료 비율을 단순히 세어서 $P(T \\mid X)$를 직접 계산할 수 있었습니다.\n", "\n", "하지만 실제 분석에서는 이 확률을 정확히 알 수 없는 경우가 대부분입니다. 공변량이 여러 개이거나 연속형 변수가 포함된 경우, 단순 비율 계산만으로는 성향점수를 구할 수 없습니다. 그래서 실제로는 로지스틱 회귀 같은 모델로 성향점수를 **추정**합니다.\n", "\n", @@ -273,7 +285,7 @@ "\\frac{T_i}{\\hat{e}(X_i)} + \\frac{1 - T_i}{1 - \\hat{e}(X_i)}\n", "$$\n", "\n", - "다만 이 방법도 성향점수를 얼마나 잘 추정하느냐에 크게 의존합니다. 나이와 성별의 상호작용이나 비선형 관계를 단순한 모형으로 추정하면 $\\hat{e}(X)$가 실제 확률을 제대로 반영하지 못할 수 있습니다. 그러면 가중치도 부정확해지고, 재가중된 데이터에서도 균형이 충분히 맞지 않아 교란이 완전히 제거되지 않을 수 있습니다." 
+ "다만 이 방법도 성향점수를 얼마나 잘 추정하느냐에 크게 의존합니다. 나이와 중증도의 상호작용이나 비선형 관계를 단순한 모형으로 추정하면 $\\hat{e}(X)$가 실제 확률을 제대로 반영하지 못할 수 있습니다. 그러면 가중치도 부정확해지고, 재가중된 데이터에서도 균형이 충분히 맞지 않아 교란이 완전히 제거되지 않을 수 있습니다." ] }, { From 30996e38132ae8b066ad09866bba2a0c21ed47b3 Mon Sep 17 00:00:00 2001 From: jhkimon Date: Sat, 13 Jun 2026 14:12:15 +0900 Subject: [PATCH 2/3] =?UTF-8?q?chore:=20manim-video-pipeline=20=EC=98=A4?= =?UTF-8?q?=EB=94=94=EC=98=A4=20=EC=8A=A4=ED=81=AC=EB=A6=BD=ED=8A=B8=20?= =?UTF-8?q?=EC=B6=94=EA=B0=80=20=EB=B0=8F=20=EB=B9=84=EB=94=94=EC=98=A4=20?= =?UTF-8?q?=EC=86=8C=EC=8A=A4=20=EC=A0=95=EB=A6=AC?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../manim-video-pipeline/package-lock.json | 15 +- .../skills/manim-video-pipeline/package.json | 3 +- .../scripts/gen_en_slow.mjs | 108 ++++++ .../scripts/generate_elevenlabs_audio.mjs | 3 +- videos/what_is_ipw/en/src/00_intro.py | 37 ++ videos/what_is_ipw/en/src/01_what_is_ipw.py | 162 ++++++++ .../what_is_ipw/en/src/02_ipw_application.py | 331 ++++++++++++++++ videos/what_is_ipw/en/src/03_ipw_formula.py | 114 ++++++ .../what_is_ipw/en/src/04_propensity_score.py | 161 ++++++++ videos/what_is_ipw/en/src/05_summary.py | 136 +++++++ .../what_is_ipw/en/src/scripts/00_intro.txt | 3 + .../en/src/scripts/01_medicine_question.txt | 33 ++ .../en/src/scripts/02_ipw_application.txt | 71 ++++ .../en/src/scripts/03_ipw_formula.txt | 25 ++ .../en/src/scripts/04_propensity_score.txt | 37 ++ .../what_is_ipw/en/src/scripts/05_summary.txt | 31 ++ videos/what_is_ipw/en/src/thumbnail.py | 53 +++ videos/what_is_ipw/ko/src/00_intro.py | 41 ++ videos/what_is_ipw/ko/src/01_what_is_ipw.py | 196 ++++++++++ .../what_is_ipw/ko/src/02_ipw_application.py | 367 ++++++++++++++++++ videos/what_is_ipw/ko/src/03_ipw_formula.py | 137 +++++++ .../what_is_ipw/ko/src/04_propensity_score.py | 191 +++++++++ videos/what_is_ipw/ko/src/05_summary.py | 165 ++++++++ .../what_is_ipw/ko/src/scripts/00_intro.txt | 3 + 
.../ko/src/scripts/01_medicine_question.txt | 33 ++ .../ko/src/scripts/02_ipw_application.txt | 71 ++++ .../ko/src/scripts/03_ipw_formula.txt | 25 ++ .../ko/src/scripts/04_propensity_score.txt | 39 ++ .../what_is_ipw/ko/src/scripts/05_summary.txt | 31 ++ videos/what_is_ipw/ko/src/thumbnail.py | 53 +++ 30 files changed, 2672 insertions(+), 3 deletions(-) create mode 100644 .claude/skills/manim-video-pipeline/scripts/gen_en_slow.mjs create mode 100644 videos/what_is_ipw/en/src/00_intro.py create mode 100644 videos/what_is_ipw/en/src/01_what_is_ipw.py create mode 100644 videos/what_is_ipw/en/src/02_ipw_application.py create mode 100644 videos/what_is_ipw/en/src/03_ipw_formula.py create mode 100644 videos/what_is_ipw/en/src/04_propensity_score.py create mode 100644 videos/what_is_ipw/en/src/05_summary.py create mode 100644 videos/what_is_ipw/en/src/scripts/00_intro.txt create mode 100644 videos/what_is_ipw/en/src/scripts/01_medicine_question.txt create mode 100644 videos/what_is_ipw/en/src/scripts/02_ipw_application.txt create mode 100644 videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt create mode 100644 videos/what_is_ipw/en/src/scripts/04_propensity_score.txt create mode 100644 videos/what_is_ipw/en/src/scripts/05_summary.txt create mode 100644 videos/what_is_ipw/en/src/thumbnail.py create mode 100644 videos/what_is_ipw/ko/src/00_intro.py create mode 100644 videos/what_is_ipw/ko/src/01_what_is_ipw.py create mode 100644 videos/what_is_ipw/ko/src/02_ipw_application.py create mode 100644 videos/what_is_ipw/ko/src/03_ipw_formula.py create mode 100644 videos/what_is_ipw/ko/src/04_propensity_score.py create mode 100644 videos/what_is_ipw/ko/src/05_summary.py create mode 100644 videos/what_is_ipw/ko/src/scripts/00_intro.txt create mode 100644 videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt create mode 100644 videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt create mode 100644 videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt create mode 100644 
videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt create mode 100644 videos/what_is_ipw/ko/src/scripts/05_summary.txt create mode 100644 videos/what_is_ipw/ko/src/thumbnail.py diff --git a/.claude/skills/manim-video-pipeline/package-lock.json b/.claude/skills/manim-video-pipeline/package-lock.json index 32aabc7..e5bdbda 100644 --- a/.claude/skills/manim-video-pipeline/package-lock.json +++ b/.claude/skills/manim-video-pipeline/package-lock.json @@ -9,7 +9,8 @@ "version": "1.0.0", "license": "ISC", "dependencies": { - "@elevenlabs/elevenlabs-js": "^2.38.1" + "@elevenlabs/elevenlabs-js": "^2.38.1", + "dotenv": "^16.3.1" } }, "node_modules/@elevenlabs/elevenlabs-js": { @@ -32,6 +33,18 @@ "integrity": "sha512-LTQ/SGc+s0Xc0Fu5WaKnR0YiygZkm9eKFvyS+fRsU7/ZWFF8ykFM6Pc9aCVf1+xasOOZpO3BAVgVrKvsqKHV7w==", "license": "MIT" }, + "node_modules/dotenv": { + "version": "16.6.1", + "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz", + "integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==", + "license": "BSD-2-Clause", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://dotenvx.com" + } + }, "node_modules/node-fetch": { "version": "2.7.0", "resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz", diff --git a/.claude/skills/manim-video-pipeline/package.json b/.claude/skills/manim-video-pipeline/package.json index 95af029..2ce0428 100644 --- a/.claude/skills/manim-video-pipeline/package.json +++ b/.claude/skills/manim-video-pipeline/package.json @@ -11,6 +11,7 @@ "license": "ISC", "type": "commonjs", "dependencies": { - "@elevenlabs/elevenlabs-js": "^2.38.1" + "@elevenlabs/elevenlabs-js": "^2.38.1", + "dotenv": "^16.3.1" } } diff --git a/.claude/skills/manim-video-pipeline/scripts/gen_en_slow.mjs b/.claude/skills/manim-video-pipeline/scripts/gen_en_slow.mjs new file mode 100644 index 0000000..5bfbf95 --- /dev/null +++ 
b/.claude/skills/manim-video-pipeline/scripts/gen_en_slow.mjs
@@ -0,0 +1,108 @@
+// One-off: regenerate English narration with slower speed + inter-chunk pauses.
+// Outputs build/audio/{NN}_{name}.mp3 + .timings.json (same names as before).
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import { spawn } from "node:child_process";
+import dotenv from "dotenv";
+import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
+
+const VOICE_ID = "7Nah3cbXKVmGX7gQUuwz";
+const MODEL_ID = "eleven_multilingual_v2";
+const OUTPUT_FORMAT = "mp3_44100_128";
+const SPEED = 0.9;
+const GAP = 0.25; // seconds of silence between chunks
+
+const repoRoot = path.resolve(path.dirname(new URL(import.meta.url).pathname), "../../../..");
+const topicDir = path.join(repoRoot, "videos", "what_is_ipw");
+dotenv.config({ path: path.join(repoRoot, ".env") });
+const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });
+
+const SCENES = [
+  ["02", "ipw_application_en"],
+  ["03", "ipw_formula_en"],
+  ["04", "propensity_score_en"],
+  ["05", "summary_en"],
+];
+
+function splitIntoChunks(text, maxChars = 650) {
+  const paras = text.split(/\n\s*\n/g).map((p) => p.replace(/\s+/g, " ").trim()).filter(Boolean);
+  const chunks = [];
+  for (const p of paras) {
+    if (p.length <= maxChars) { chunks.push(p); continue; }
+    let cur = "";
+    for (const s of p.split(/(?<=[.!?])\s+/).map((x) => x.trim()).filter(Boolean)) {
+      if ((cur + " " + s).trim().length > maxChars) { if (cur) chunks.push(cur); cur = s; }
+      else cur = (cur + " " + s).trim();
+    }
+    if (cur) chunks.push(cur);
+  }
+  return chunks;
+}
+
+async function streamToBuffer(stream) {
+  const reader = stream.getReader();
+  const out = [];
+  while (true) { const { done, value } = await reader.read(); if (done) break; out.push(Buffer.from(value)); }
+  return Buffer.concat(out);
+}
+
+function ff(args) {
+  return new Promise((res, rej) => {
+    const p = spawn("ffmpeg", args, { stdio: "ignore" });
+    p.on("close", (c) => (c === 0 ? res() : rej(new Error("ffmpeg " + c))));
+  });
+}
+function probe(file) {
+  return new Promise((res, rej) => {
+    const p = spawn("ffprobe", ["-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", file]);
+    let o = ""; p.stdout.on("data", (d) => (o += d)); p.on("close", (c) => (c === 0 ? res(parseFloat(o)) : rej(new Error("ffprobe"))));
+  });
+}
+
+for (const [scene, name] of SCENES) {
+  const scriptPath = path.join(topicDir, "src", "scripts", `${scene}_${name}.txt`);
+  const outMp3 = path.join(topicDir, "build", "audio", `${scene}_${name}.mp3`);
+  const outTim = path.join(topicDir, "build", "audio", `${scene}_${name}.timings.json`);
+  const text = (await fs.readFile(scriptPath, "utf8")).trim();
+  const chunks = splitIntoChunks(text);
+  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "en-slow-"));
+
+  const silence = path.join(tmp, "sil.mp3");
+  await ff(["-y", "-f", "lavfi", "-i", `anullsrc=channel_layout=mono:sample_rate=44100`, "-t", String(GAP), "-c:a", "libmp3lame", "-b:a", "128k", silence]);
+
+  const chunkPaths = [];
+  for (let i = 0; i < chunks.length; i++) {
+    const stream = await client.textToSpeech.convert(VOICE_ID, {
+      text: chunks[i], modelId: MODEL_ID, outputFormat: OUTPUT_FORMAT,
+      voiceSettings: { speed: SPEED },
+    });
+    const buf = await streamToBuffer(stream);
+    const cp = path.join(tmp, `c${String(i).padStart(2, "0")}.mp3`);
+    await fs.writeFile(cp, buf);
+    chunkPaths.push(cp);
+  }
+
+  // interleave silence between chunks
+  const listItems = [];
+  for (let i = 0; i < chunkPaths.length; i++) {
+    listItems.push(`file '${chunkPaths[i]}'`);
+    if (i < chunkPaths.length - 1) listItems.push(`file '${silence}'`);
+  }
+  const listPath = path.join(tmp, "list.txt");
+  await fs.writeFile(listPath, listItems.join("\n"));
+  await ff(["-y", "-f", "concat", "-safe", "0", "-i", listPath, "-c:a", "libmp3lame", "-b:a", "128k", outMp3]);
+
+  // timings: chunk i speech start = sum(prev durations) + i*GAP
+  const durs = [];
+  for (const cp of chunkPaths) durs.push(await probe(cp));
+  let acc = 0; const tchunks = [];
+  for (let i = 0; i < durs.length; i++) {
+    const start = acc + i * GAP;
+    tchunks.push({ index: i + 1, text: chunks[i], duration: durs[i], start, end: start + durs[i] });
+    acc += durs[i];
+  }
+  const total = await probe(outMp3);
+  await fs.writeFile(outTim, JSON.stringify({ topic: "what_is_ipw", scene, sceneName: name, speed: SPEED, gap: GAP, totalDuration: total, chunkCount: chunks.length, chunks: tchunks }, null, 2));
+  console.log(`${scene}_${name}: ${chunks.length} chunks, total ${total.toFixed(2)}s`);
+}
diff --git a/.claude/skills/manim-video-pipeline/scripts/generate_elevenlabs_audio.mjs b/.claude/skills/manim-video-pipeline/scripts/generate_elevenlabs_audio.mjs
index ad7badd..bb9211a 100644
--- a/.claude/skills/manim-video-pipeline/scripts/generate_elevenlabs_audio.mjs
+++ b/.claude/skills/manim-video-pipeline/scripts/generate_elevenlabs_audio.mjs
@@ -5,6 +5,7 @@ import os from "node:os";
 import path from "node:path";
 import process from "node:process";
 import { spawn } from "node:child_process";
+import dotenv from "dotenv";
 import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
 
 
@@ -94,7 +95,7 @@ function splitIntoChunks(text, maxChars = 650) {
 
 async function ensureEnv(repoRoot) {
   const envPath = path.join(repoRoot, ".env");
-  process.loadEnvFile(envPath);
+  dotenv.config({ path: envPath });
   const apiKey = process.env.ELEVENLABS_API_KEY;
   if (!apiKey) {
     throw new Error(`ELEVENLABS_API_KEY not found in ${envPath}`);
diff --git a/videos/what_is_ipw/en/src/00_intro.py b/videos/what_is_ipw/en/src/00_intro.py
new file mode 100644
index 0000000..13d51ac
--- /dev/null
+++ b/videos/what_is_ipw/en/src/00_intro.py
@@ -0,0 +1,37 @@
+from manim import *
+
+# Scene 00 (EN) — intro title card (~7s).
+# timings: build/audio/00_intro_en.timings.json (total 6.92s)
+#   chunk1 0.0~3.1, chunk2 3.1~6.92
+
+
+class Intro(Scene):
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        topic = Text("Today's topic", font_size=36, color=GRAY_A, weight=BOLD)
+        ipw = Text("IPW", font_size=150, color=ORANGE, weight=BOLD)
+        full = Text("Inverse Probability Weighting", font_size=40, color=WHITE, weight=BOLD)
+        title = VGroup(topic, ipw, full).arrange(DOWN, buff=0.4).move_to(UP * 0.2)
+        rule = Line(LEFT * 2.9, RIGHT * 2.9, color=ORANGE, stroke_width=3).next_to(ipw, DOWN, buff=0.18)
+        sub = Text("Finding the true cause from observational data",
+                   font_size=30, color=GRAY_B, weight=BOLD).next_to(title, DOWN, buff=0.7)
+
+        play_at(0.3, FadeIn(topic, shift=DOWN * 0.1), run_time=0.4)
+        play_at(0.85, Write(ipw), run_time=0.7)
+        play_at(1.75, GrowFromCenter(rule), FadeIn(full, shift=UP * 0.1), run_time=0.5)
+        play_at(3.7, FadeIn(sub, shift=UP * 0.1), run_time=0.5)  # chunk2
+
+        go_to(7.6)
diff --git a/videos/what_is_ipw/en/src/01_what_is_ipw.py b/videos/what_is_ipw/en/src/01_what_is_ipw.py
new file mode 100644
index 0000000..60f6e2f
--- /dev/null
+++ b/videos/what_is_ipw/en/src/01_what_is_ipw.py
@@ -0,0 +1,162 @@
+from manim import *
+import numpy as np
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+
+class MedicineQuestionSynced(Scene):
+    """Scene 01 (EN) — Simpson's paradox intro. Same animations, English text.
+    timings: build/audio/01_medicine_question_en.timings.json (total 70.25s, 17 chunks)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def icon(name, color, height=1.1):
+            m = SVGMobject(f"{ICON}/{name}.svg")
+            m.set_stroke(color, width=3)
+            m.set_fill(color, opacity=0)
+            m.scale_to_fit_height(height)
+            return m
+
+        UNIT = 0.62
+
+        def make_bar(days, color, y, x0, unit=UNIT):
+            bar = RoundedRectangle(width=days * unit, height=0.42, corner_radius=0.07,
+                                   stroke_color=color, stroke_width=2.5, fill_color=color, fill_opacity=0.5)
+            bar.move_to([x0 + days * unit / 2, y, 0])
+            return bar
+
+        def bar_row(label, days, color, y, x0, fs=24, unit=UNIT):
+            lab = Text(label, font_size=fs, color=color, weight=BOLD)
+            lab.move_to([x0 - 0.3 - lab.width / 2, y, 0])
+            bar = make_bar(days, color, y, x0, unit)
+            val = Text(f"{days:g}d", font_size=fs + 2, color=color, weight=BOLD).next_to(bar, RIGHT, buff=0.18)
+            return VGroup(lab, bar, val)
+
+        # Beat A — sick → medicine → recover, really the medicine? (chunk2~3)
+        sick = icon("mood-sick", RED_B, height=1.5).move_to(LEFT * 3.4 + UP * 0.2)
+        happy = icon("mood-happy", GREEN_B, height=1.5).move_to(RIGHT * 3.4 + UP * 0.2)
+        rec_arrow = Arrow(sick.get_right(), happy.get_left(), color=WHITE, stroke_width=6, buff=0.5)
+        pill = icon("pill", BLUE_B, height=0.95).move_to(UP * 1.45)
+        pill_lbl = Text("Med", font_size=26, color=BLUE_B, weight=BOLD).next_to(pill, RIGHT, buff=0.15)
+        qbig = Text("?", font_size=120, color=RED, weight=BOLD).move_to(DOWN * 1.7)
+
+        play_at(2, FadeIn(sick, shift=UP * 0.15), run_time=0.4)  # chunk2
+        play_at(3.5, GrowArrow(rec_arrow), FadeIn(happy, shift=UP * 0.15), run_time=0.5)
+        play_at(4.9, FadeIn(pill, shift=DOWN * 0.15), FadeIn(pill_lbl), run_time=0.4)
+        play_at(6.6, FadeIn(qbig, scale=1.2), run_time=0.4)  # chunk3
+        beatA = VGroup(sick, happy, rec_arrow, pill, pill_lbl, qbig)
+
+        # Beat B — title (chunk4)
+        title = Text("How to find the true cause?", font_size=40, color=YELLOW, weight=BOLD).move_to(UP * 0.3)
+        play_at(9, FadeOut(beatA), FadeIn(title), run_time=0.5)  # chunk4
+
+        # Beat C — what to compare? difference in recovery time (chunk5~6)
+        hint = Text("Medicine should help you heal faster, right?", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 0.4)
+        play_at(14.1, title.animate.scale(0.85).to_edge(UP, buff=0.6), FadeIn(hint, shift=UP * 0.1), run_time=0.5)  # chunk5
+
+        g_treat = icon("users", BLUE_B, height=1.2).move_to(LEFT * 3.4 + UP * 0.8)
+        t_lbl = Text("Took medicine", font_size=26, color=BLUE_B, weight=BOLD).next_to(g_treat, DOWN, buff=0.25)
+        g_ctrl = icon("users", RED_B, height=1.2).move_to(RIGHT * 3.4 + UP * 0.8)
+        c_lbl = Text("No medicine", font_size=26, color=RED_B, weight=BOLD).next_to(g_ctrl, DOWN, buff=0.25)
+        metric = Text("Difference in recovery time?", font_size=30, color=YELLOW, weight=BOLD).move_to(DOWN * 1.55)
+        diff_arrow = DoubleArrow(LEFT * 1.7 + DOWN * 2.15, RIGHT * 1.7 + DOWN * 2.15, color=YELLOW,
+                                 stroke_width=5, tip_length=0.22, buff=0.1)
+
+        play_at(17.5, FadeOut(hint),
+                FadeIn(g_treat, t_lbl), FadeIn(g_ctrl, c_lbl), run_time=0.5)  # chunk6
+        play_at(19.8, FadeIn(metric, shift=UP * 0.08), GrowFromCenter(diff_arrow), run_time=0.5)
+        beatC = VGroup(g_treat, t_lbl, g_ctrl, c_lbl, metric, diff_arrow)
+
+        # Beat D — all patients bars: medicine 5.6 > none 4.7 (chunk7~9)
+        c_title = Text("All patients", font_size=34, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.6)
+        c_x0 = -1.6
+        c_treat = bar_row("Took medicine", 5.6, BLUE, y=0.8, x0=c_x0, fs=26)
+        c_ctrl = bar_row("No medicine", 4.7, RED, y=-0.5, x0=c_x0, fs=26)
+        harm = Text("Medicine made it worse?", font_size=32, color=RED_B, weight=BOLD).move_to(DOWN * 2.0)
+        harm_face = icon("mood-confuzed", RED_B, height=0.7).next_to(harm, LEFT, buff=0.25)
+
+        play_at(23.6, FadeOut(beatC), ReplacementTransform(title, c_title), run_time=0.5)  # chunk7
+        play_at(25.9, GrowFromEdge(c_treat[1], LEFT), FadeIn(c_treat[0], c_treat[2]), run_time=0.6)  # chunk8
+        play_at(28.8, GrowFromEdge(c_ctrl[1], LEFT), FadeIn(c_ctrl[0], c_ctrl[2]), run_time=0.6)
+        play_at(33.3, FadeIn(harm, shift=UP * 0.1), FadeIn(harm_face, shift=UP * 0.1), run_time=0.5)  # chunk9
+
+        # Beat E — trap + split severe/mild (chunk10~11)
+        trap = Text("But is that true?", font_size=28, color=YELLOW, weight=BOLD).move_to(DOWN * 2.0)
+        play_at(40.3, FadeOut(harm, harm_face), FadeIn(trap, shift=UP * 0.08), run_time=0.45)  # chunk10
+
+        split_title = Text("Now split by severity", font_size=34, color=YELLOW, weight=BOLD).to_edge(UP, buff=0.55)
+        play_at(43.8, FadeOut(c_treat, c_ctrl, trap),
+                ReplacementTransform(c_title, split_title), run_time=0.5)  # chunk11
+
+        # Beat F — severe (7/8), mild (2/3): medicine shorter in each (chunk12~14)
+        d_x0 = -1.9
+        sev_head = Text("Severe", font_size=28, color=BLUE_D, weight=BOLD).move_to(LEFT * 5.1 + UP * 1.55)
+        sev_t = bar_row("Med", 7, BLUE, y=1.95, x0=d_x0, fs=22)
+        sev_c = bar_row("No med", 8, RED, y=1.15, x0=d_x0, fs=22)
+        mild_head = Text("Mild", font_size=28, color=PINK, weight=BOLD).move_to(LEFT * 5.1 + DOWN * 1.15)
+        mild_t = bar_row("Med", 2, BLUE, y=-0.75, x0=d_x0, fs=22)
+        mild_c = bar_row("No med", 3, RED, y=-1.55, x0=d_x0, fs=22)
+        sev_ok = Text("−1 day ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(sev_c, RIGHT, buff=0.5)
+        mild_ok = Text("−1 day ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(mild_c, RIGHT, buff=0.5)
+
+        play_at(47.5, FadeIn(sev_head), GrowFromEdge(sev_t[1], LEFT), FadeIn(sev_t[0], sev_t[2]), run_time=0.55)  # chunk12
+        play_at(50.3, GrowFromEdge(sev_c[1], LEFT), FadeIn(sev_c[0], sev_c[2]), run_time=0.55)
+        play_at(52.8, FadeIn(sev_ok, shift=LEFT * 0.1), run_time=0.4)
+        play_at(56.1, FadeIn(mild_head), GrowFromEdge(mild_t[1], LEFT), FadeIn(mild_t[0], mild_t[2]), run_time=0.55)  # chunk13
+        play_at(58.8, GrowFromEdge(mild_c[1], LEFT), FadeIn(mild_c[0], mild_c[2]), run_time=0.55)
+        play_at(61.2, FadeIn(mild_ok, shift=LEFT * 0.1), run_time=0.4)
+        both_ok = Text("Within each group, medicine clearly helps ✓", font_size=28, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.45)
+        play_at(63.5, FadeIn(both_ok, shift=UP * 0.1), run_time=0.5)  # chunk14
+
+        # Beat G — reversal: groups (medicine shorter) → overall (medicine longer) (chunk15)
+        rev_title = Text("But combine them all…", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.55)
+        ov_x0 = -1.6
+        ov_treat = make_bar(5.6, BLUE, y=0.7, x0=ov_x0)
+        ov_ctrl = make_bar(4.7, RED, y=-0.6, x0=ov_x0)
+        ov_t_lbl = Text("Took medicine", font_size=26, color=BLUE_B, weight=BOLD)
+        ov_t_lbl.move_to([ov_x0 - 0.3 - ov_t_lbl.width / 2, 0.7, 0])
+        ov_c_lbl = Text("No medicine", font_size=26, color=RED_B, weight=BOLD)
+        ov_c_lbl.move_to([ov_x0 - 0.3 - ov_c_lbl.width / 2, -0.6, 0])
+        ov_t_val = Text("5.6d", font_size=28, color=BLUE_B, weight=BOLD).next_to(ov_treat, RIGHT, buff=0.18)
+        ov_c_val = Text("4.7d", font_size=28, color=RED_B, weight=BOLD).next_to(ov_ctrl, RIGHT, buff=0.18)
+        flip = Text("Now medicine looks harmful ✗", font_size=30, color=RED_B, weight=BOLD).to_edge(DOWN, buff=0.5)
+
+        play_at(68.7,
+                FadeOut(sev_head, mild_head, sev_ok, mild_ok, both_ok,
+                        sev_t[0], sev_t[2], sev_c[0], sev_c[2], mild_t[0], mild_t[2], mild_c[0], mild_c[2]),
+                ReplacementTransform(split_title, rev_title), run_time=0.45)  # chunk15
+        play_at(69.7,
+                ReplacementTransform(VGroup(sev_t[1], mild_t[1]), ov_treat),
+                ReplacementTransform(VGroup(sev_c[1], mild_c[1]), ov_ctrl),
+                FadeIn(ov_t_lbl, ov_c_lbl, ov_t_val, ov_c_val), run_time=1.3)
+        play_at(71.8, FadeIn(flip, shift=UP * 0.1), run_time=0.5)
+
+        # Beat H — why? → IPW (chunk16~17)
+        why = Text("Why does this happen?", font_size=36, color=YELLOW, weight=BOLD).move_to(DOWN * 2.6)
+        play_at(74.9, FadeIn(why, scale=1.1), run_time=0.4)  # chunk16
+
+        ipw = Text("IPW", font_size=96, color=ORANGE, weight=BOLD)
+        ipw_sub = Text("Inverse Probability Weighting", font_size=34, color=WHITE, weight=BOLD)
+        ipw_group = VGroup(ipw, ipw_sub).arrange(DOWN, buff=0.35).move_to(ORIGIN)
+
+        play_at(76.6,
+                FadeOut(rev_title, ov_treat, ov_ctrl, ov_t_lbl, ov_c_lbl, ov_t_val, ov_c_val, flip, why),
+                run_time=0.35)  # chunk17
+        play_at(77.05, Write(ipw), run_time=0.6)
+        play_at(78.3, FadeIn(ipw_sub, shift=UP * 0.1), run_time=0.4)
+
+        go_to(80)
diff --git a/videos/what_is_ipw/en/src/02_ipw_application.py b/videos/what_is_ipw/en/src/02_ipw_application.py
new file mode 100644
index 0000000..517384c
--- /dev/null
+++ b/videos/what_is_ipw/en/src/02_ipw_application.py
@@ -0,0 +1,331 @@
+from manim import *
+import numpy as np
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+SEVERE = "#b07cff"  # severe
+MILD = "#39d98a"  # mild
+
+
+class IPWApplication(Scene):
+    """Scene 02 (EN). Same animations as KO, English text, resynced.
+    timings: build/audio/02_ipw_application_en.timings.json (total 148.81s, 36 chunks)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def dot(color, r=0.16, fill=0.9):
+            return Circle(radius=r, stroke_color=color, stroke_width=2).set_fill(color, opacity=fill)
+
+        def ring(color, r=0.16):
+            return Circle(radius=r, stroke_color=color, stroke_width=2.5).set_fill(color, opacity=0.0)
+
+        def people_block(n_sev, n_mild, cols=4, r=0.16, buff=0.16):
+            g = VGroup(*[dot(SEVERE, r) for _ in range(n_sev)],
+                       *[dot(MILD, r) for _ in range(n_mild)])
+            rows = int(np.ceil(len(g) / cols))
+            g.arrange_in_grid(rows=rows, cols=cols, buff=buff)
+            return g
+
+        def icon(name, color, height=1.0):
+            m = SVGMobject(f"{ICON}/{name}.svg")
+            m.set_stroke(color, width=3)
+            m.set_fill(color, opacity=0)
+            m.scale_to_fit_height(height)
+            return m
+
+        UNIT = 0.62
+
+        def make_bar(days, color, y, x0, unit=UNIT):
+            bar = RoundedRectangle(width=days * unit, height=0.46, corner_radius=0.07,
+                                   stroke_color=color, stroke_width=2.5, fill_color=color, fill_opacity=0.5)
+            bar.move_to([x0 + days * unit / 2, y, 0])
+            return bar
+
+        # Beat 1 — recap of Simpson's paradox (chunk1~5)
+        recap = Text("Let's recap", font_size=30, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.7)
+        row_group = VGroup(
+            Text("Within groups", font_size=30, color=GRAY_A, weight=BOLD),
+            Text("medicine → faster", font_size=32, color=GREEN_B, weight=BOLD),
+            Text("✓", font_size=40, color=GREEN_B, weight=BOLD),
+        ).arrange(RIGHT, buff=0.5).move_to(UP * 1.55)
+        row_total = VGroup(
+            Text("Overall", font_size=30, color=GRAY_A, weight=BOLD),
+            Text("medicine → slower", font_size=32, color=RED_B, weight=BOLD),
+            Text("✗", font_size=40, color=RED_B, weight=BOLD),
+        ).arrange(RIGHT, buff=0.5).move_to(UP * 0.45)
+        paradox = Text("Simpson's Paradox", font_size=40, color=ORANGE, weight=BOLD).move_to(DOWN * 1.6)
+        why = Text("Why does this happen?", font_size=34, color=YELLOW, weight=BOLD).move_to(DOWN * 2.9)
+
+        play_at(0.3, FadeIn(recap), run_time=0.35)
+        play_at(2.4, FadeIn(row_group, shift=UP * 0.1), run_time=0.45)  # chunk2
+        play_at(7.5, FadeIn(row_total, shift=UP * 0.1), run_time=0.45)  # chunk3
+        play_at(18.9, FadeIn(paradox, shift=UP * 0.1), run_time=0.5)  # chunk4
+        play_at(21.5, FadeIn(why), run_time=0.4)  # chunk5
+
+        # Beat 2 — group makeup with dots + emphasis (chunk6~10)
+        title2 = Text("Simpson's Paradox", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.45)
+        legend = VGroup(
+            dot(SEVERE, 0.13), Text("Severe", font_size=22, color=SEVERE, weight=BOLD),
+            dot(MILD, 0.13), Text("Mild", font_size=22, color=MILD, weight=BOLD),
+        ).arrange(RIGHT, buff=0.22).next_to(title2, DOWN, buff=0.25)
+        treat_block = people_block(5, 2, cols=4).scale(1.05).move_to(LEFT * 3.4 + DOWN * 0.3)
+        treat_lbl = Text("Took medicine", font_size=27, color=BLUE_B, weight=BOLD).next_to(treat_block, UP, buff=0.4)
+        ctrl_block = people_block(1, 2, cols=3).scale(1.05).move_to(RIGHT * 3.4 + DOWN * 0.3)
+        ctrl_lbl = Text("No medicine", font_size=27, color=RED_B, weight=BOLD).next_to(ctrl_block, UP, buff=0.4)
+        diff_note = Text("The two groups differed from the start", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 2.9)
+        treat_box = SurroundingRectangle(VGroup(treat_lbl, treat_block), color=BLUE_B, buff=0.25, corner_radius=0.15)
+        ctrl_box = SurroundingRectangle(VGroup(ctrl_lbl, ctrl_block), color=RED_B, buff=0.25, corner_radius=0.15)
+
+        play_at(23.1, FadeOut(recap, row_group, row_total, paradox, why),
+                FadeIn(title2, legend), run_time=0.4)  # chunk6
+        play_at(25.3, FadeIn(treat_lbl), LaggedStartMap(FadeIn, treat_block, lag_ratio=0.12), run_time=1.0)  # chunk7
+        play_at(26.7, Create(treat_box),
+                *[Indicate(treat_block[i], color=SEVERE, scale_factor=1.4) for i in range(5)], run_time=1.3)
+        play_at(28.4, FadeOut(treat_box), run_time=0.3)
+        play_at(30.3, FadeIn(ctrl_lbl), LaggedStartMap(FadeIn, ctrl_block, lag_ratio=0.15), run_time=0.9)  # chunk8
+        play_at(31.6, Create(ctrl_box),
+                *[Indicate(ctrl_block[i], color=MILD, scale_factor=1.4) for i in range(1, 3)], run_time=1.3)
+        play_at(33.4, FadeOut(ctrl_box), run_time=0.3)
+        play_at(36, FadeIn(diff_note, shift=UP * 0.1), run_time=0.5)  # chunk9
+        play_at(45.3, FadeOut(title2, legend, treat_block, treat_lbl, ctrl_block, ctrl_lbl, diff_note), run_time=0.3)
+
+        # Beat 3 — confounder (+ treatment→outcome arrow) (chunk11~12)
+        conf = VGroup(
+            Text("Confounder", font_size=36, color=PURPLE_A, weight=BOLD),
+            Text("Severe / Mild", font_size=26, color=GRAY_B, weight=BOLD),
+        ).arrange(DOWN, buff=0.15).move_to(UP * 1.9)
+        node_t = Text("Took medicine?", font_size=30, color=BLUE_B, weight=BOLD).move_to(LEFT * 3.3 + DOWN * 1.4)
+        node_y = Text("Recovery time", font_size=30, color=GREEN_B, weight=BOLD).move_to(RIGHT * 3.3 + DOWN * 1.4)
+        a1 = Arrow(conf.get_bottom(), node_t.get_top(), color=PURPLE_A, stroke_width=5, buff=0.25)
+        a2 = Arrow(conf.get_bottom(), node_y.get_top(), color=PURPLE_A, stroke_width=5, buff=0.25)
+        a3 = Arrow(node_t.get_right(), node_y.get_left(), color=BLUE_B, stroke_width=5, buff=0.3)
+
+        play_at(46.5, FadeIn(conf, shift=DOWN * 0.1), run_time=0.5)  # chunk11
+        play_at(48.3, GrowArrow(a1), GrowArrow(a2), FadeIn(node_t, node_y), run_time=0.7)
+        play_at(49.8, GrowArrow(a3), run_time=0.6)
+        play_at(51.9, Indicate(conf[1], color=PURPLE_A, scale_factor=1.2), run_time=0.8)  # chunk12
+
+        # Beat 4 — idea + per-individual weight (chunk15~17)
+        idea = Text("What if both groups had the same mix?", font_size=38, color=YELLOW, weight=BOLD).move_to(UP * 1.6)
+        keep = Text("Keep the data as is", font_size=30, color=GRAY_A, weight=BOLD).move_to(UP * 0.4)
+        ppl = VGroup(dot(SEVERE), dot(SEVERE), dot(MILD), dot(SEVERE), dot(MILD)).arrange(RIGHT, buff=0.9).move_to(DOWN * 0.9)
+        badges = VGroup(*[Text("×?", font_size=22, color=ORANGE, weight=BOLD).next_to(d, UP, buff=0.12) for d in ppl])
+        wgt_note = Text("Give each person a weight", font_size=34, color=ORANGE, weight=BOLD).move_to(DOWN * 2.1)
+
+        play_at(58.3, FadeOut(conf, node_t, node_y, a1, a2, a3),
+                FadeIn(idea, shift=UP * 0.1), run_time=0.5)  # chunk15
+        play_at(64.1, FadeIn(keep), run_time=0.4)  # chunk16
+        play_at(65.4, LaggedStartMap(FadeIn, ppl, lag_ratio=0.1),
+                LaggedStartMap(FadeIn, badges, shift=DOWN * 0.1, lag_ratio=0.1),
+                FadeIn(wgt_note), run_time=1.3)
+
+        # Beat 5 — probability of taking medicine (chunk17~20)
+        prob_box = RoundedRectangle(width=11.5, height=4.6, corner_radius=0.25,
+                                    stroke_color=GRAY_B, stroke_width=2, fill_opacity=0).move_to(UP * 0.1)
+        prob_title = Text("Probability of taking medicine", font_size=34, color=WHITE, weight=BOLD).move_to(prob_box.get_top() + DOWN * 0.5)
+        sev_lbl = Text("Severe", font_size=26, color=SEVERE, weight=BOLD).move_to(LEFT * 4.85 + UP * 0.55)
+        mild_lbl = Text("Mild", font_size=26, color=MILD, weight=BOLD).move_to(LEFT * 4.85 + DOWN * 1.2)
+        LEFTX = -3.0
+        sev_circ = VGroup(*[ring(SEVERE, 0.24) for _ in range(6)]).arrange(RIGHT, buff=0.26)
+        sev_circ.move_to([0, 0.55, 0]).shift(RIGHT * (LEFTX - sev_circ.get_left()[0]))
+        mild_circ = VGroup(*[ring(MILD, 0.24) for _ in range(4)]).arrange(RIGHT, buff=0.26)
+        mild_circ.move_to([0, -1.2, 0]).shift(RIGHT * (LEFTX - mild_circ.get_left()[0]))
+        sev_cnt = Text("5 of 6", font_size=22, color=SEVERE, weight=BOLD).move_to(RIGHT * 4.1 + UP * 0.9)
+        sev_frac = MathTex(r"\tfrac{5}{6}\approx 83\%", color=SEVERE).scale(0.8).move_to(RIGHT * 4.1 + UP * 0.25)
+        mild_cnt = Text("2 of 4", font_size=22, color=MILD, weight=BOLD).move_to(RIGHT * 4.1 + DOWN * 0.85)
+        mild_frac = MathTex(r"\tfrac{2}{4}=50\%", color=MILD).scale(0.8).move_to(RIGHT * 4.1 + DOWN * 1.5)
+        legend2 = Text("Colored = took medicine", font_size=22, color=GRAY_B, weight=BOLD).move_to(prob_box.get_bottom() + UP * 0.4)
+
+        play_at(70.2, FadeOut(idea, keep, ppl, badges, wgt_note),
+                Create(prob_box), FadeIn(prob_title), run_time=0.6)  # chunk17
+        play_at(71.9, FadeIn(sev_lbl, mild_lbl),
+                LaggedStartMap(FadeIn, VGroup(*sev_circ, *mild_circ), lag_ratio=0.1),
+                FadeIn(legend2), run_time=1.4)  # chunk18
+        play_at(76.4, *[sev_circ[i].animate.set_fill(SEVERE, opacity=0.9) for i in range(5)],
+                FadeIn(sev_cnt), run_time=1.3)  # chunk19
+        play_at(81.3, Write(sev_frac), run_time=0.7)
+        play_at(84.7, *[mild_circ[i].animate.set_fill(MILD, opacity=0.9) for i in range(2)],
+                FadeIn(mild_cnt), run_time=1.1)  # chunk20
+        play_at(87.5, Write(mild_frac), run_time=0.7)
+        beat5 = VGroup(prob_box, prob_title, sev_lbl, sev_circ, sev_cnt, sev_frac,
+                       mild_lbl, mild_circ, mild_cnt, mild_frac, legend2)
+
+        # Beat 6 — weight = 1/prob, pull the people out of the probability box (chunk21~24)
+        play_at(90.2, beat5.animate.scale(0.5).to_edge(LEFT, buff=0.3), run_time=0.7)  # chunk21
+        whead = Text("Weight = 1 ÷ probability", font_size=32, color=ORANGE, weight=BOLD).move_to(RIGHT * 2.7 + UP * 2.55)
+        link = Arrow(beat5.get_right() + UP * 0.5, whead.get_left() + DOWN * 0.2,
+                     color=ORANGE, stroke_width=4, buff=0.2)
+
+        def dd_cluster(flags, color, y):
+            g = VGroup(*[(dot(color, 0.11) if f else ring(color, 0.11)) for f in flags]).arrange(RIGHT, buff=0.08)
+            g.move_to([1.2, y, 0])
+            return g
+
+        dd1 = dd_cluster([True] * 5, SEVERE, 1.35)
+        dd2 = dd_cluster([False], SEVERE, 0.25)
+        dd3 = dd_cluster([True] * 2, MILD, -0.85)
+        dd4 = dd_cluster([False] * 2, MILD, -1.95)
+
+        def wbody(label, color, tex, anchor):
+            lab = Text(label, font_size=19, color=color, weight=BOLD)
+            m = MathTex(tex, color=color).scale(0.66)
+            body = VGroup(lab, m).arrange(RIGHT, buff=0.26)
+            body.next_to(anchor, RIGHT, buff=0.3)
+            return body
+
+        b1 = wbody("Severe · med", SEVERE, r"1 \div \tfrac{5}{6} = 1.2", dd1)
+        b2 = wbody("Severe · no med", SEVERE, r"1 \div \tfrac{1}{6} = 6", dd2)
+        b3 = wbody("Mild · med", MILD, r"1 \div \tfrac{2}{4} = 2", dd3)
+        b4 = wbody("Mild · no med", MILD, r"1 \div \tfrac{2}{4} = 2", dd4)
+        rare = Text("Rarer cases count more", font_size=23, color=YELLOW, weight=BOLD).move_to([2.6, -2.95, 0])
+
+        play_at(91.2, GrowArrow(link), FadeIn(whead), run_time=0.5)  # chunk21
+        play_at(93.6, FadeIn(rare, shift=UP * 0.08), run_time=0.4)  # chunk22
+        play_at(97.4, TransformFromCopy(VGroup(*[sev_circ[i] for i in range(5)]), dd1),
+                FadeIn(b1, shift=RIGHT * 0.1), run_time=0.9)  # chunk23
+        play_at(100.2, TransformFromCopy(VGroup(sev_circ[5]), dd2),
+                FadeIn(b2, shift=RIGHT * 0.1), run_time=0.9)
+        play_at(102.1, Indicate(b2[1], color=YELLOW, scale_factor=1.2), run_time=0.7)
+        play_at(104.2, TransformFromCopy(VGroup(*[mild_circ[i] for i in range(2)]), dd3),
+                FadeIn(b3, shift=RIGHT * 0.1), run_time=0.8)  # chunk24
+        play_at(106, TransformFromCopy(VGroup(*[mild_circ[i] for i in range(2, 4)]), dd4),
+                FadeIn(b4, shift=RIGHT * 0.1), run_time=0.8)
+        play_at(107.1, FadeOut(beat5, link, whead, dd1, dd2, dd3, dd4, b1, b2, b3, b4, rare), run_time=0.35)
+
+        # Beat 7 — reweight to balance (chunk25~28)
+        btitle = Text("After reweighting", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.4)
+        bt_t = people_block(5, 2, cols=4, r=0.14, buff=0.2).move_to(LEFT * 3.0 + UP * 1.15)
+        bt_t_l = Text("Med", font_size=22, color=BLUE_B, weight=BOLD).next_to(bt_t, UP, buff=0.2)
+        bt_c = people_block(1, 2, cols=3, r=0.14, buff=0.2).move_to(RIGHT * 3.0 + UP * 1.15)
+        bt_c_l = Text("No med", font_size=22, color=RED_B, weight=BOLD).next_to(bt_c, UP, buff=0.2)
+        orig_l = Text("Before", font_size=22, color=GRAY_B, weight=BOLD).move_to(LEFT * 5.4 + UP * 1.15)
+        bt_t_badges = VGroup(
+            *[Text("×1.2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_t[i], DOWN, buff=0.05) for i in range(5)],
+            *[Text("×2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_t[i], DOWN, buff=0.05) for i in range(5, 7)],
+        )
+        bt_c_badges = VGroup(
+            Text("×6", font_size=15, color=YELLOW, weight=BOLD).next_to(bt_c[0], DOWN, buff=0.05),
+            *[Text("×2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_c[i], DOWN, buff=0.05) for i in range(1, 3)],
+        )
+        af_t = people_block(6, 4, cols=5, r=0.14, buff=0.2).move_to(LEFT * 3.0 + DOWN * 1.7)
+        af_c = people_block(6, 4, cols=5, r=0.14, buff=0.2).move_to(RIGHT * 3.0 + DOWN * 1.7)
+        after_l = Text("After", font_size=22, color=GRAY_B, weight=BOLD).move_to(LEFT * 5.4 + DOWN * 1.7)
+        approx = MathTex(r"\approx", color=GREEN_B).scale(1.6).move_to(DOWN * 1.7)
+        arr_t = Arrow(bt_t.get_bottom() + DOWN * 0.15, af_t.get_top(), color=ORANGE, stroke_width=4, buff=0.15)
+        arr_c = Arrow(bt_c.get_bottom() + DOWN * 0.15, af_c.get_top(), color=ORANGE, stroke_width=4, buff=0.15)
+        balance_note = Text("Same severe/mild mix in both groups now", font_size=26, color=GRAY_A, weight=BOLD).to_edge(DOWN, buff=0.35)
+        ipw_name = Text("Inverse Probability Weighting (IPW)", font_size=30, color=ORANGE, weight=BOLD).next_to(btitle, DOWN, buff=0.12)
+
+        play_at(108.4, FadeIn(btitle, bt_t, bt_t_l, orig_l, bt_c, bt_c_l), run_time=0.5)  # chunk25
+        play_at(109.4, LaggedStartMap(FadeIn, bt_t_badges, shift=DOWN * 0.05, lag_ratio=0.05),
+                LaggedStartMap(FadeIn, bt_c_badges, shift=DOWN * 0.05, lag_ratio=0.05), run_time=1.0)
+        play_at(112.8, GrowArrow(arr_t), GrowArrow(arr_c), run_time=0.5)  # chunk26
+        play_at(113.6, Indicate(bt_c_badges[0], color=YELLOW, scale_factor=1.6),
+                LaggedStartMap(FadeIn, af_t, lag_ratio=0.06),
+                LaggedStartMap(FadeIn, af_c, lag_ratio=0.06), FadeIn(after_l), run_time=2.0)
+        play_at(119.7, FadeIn(approx, scale=1.2), FadeIn(balance_note), run_time=0.6)  # chunk27
+        play_at(125.1, FadeIn(ipw_name, shift=DOWN * 0.1), run_time=0.5)  # chunk28
+        play_at(127.6, FadeOut(btitle, ipw_name, bt_t, bt_t_l, bt_c, bt_c_l, orig_l, approx,
+                               bt_t_badges, bt_c_badges, af_t, af_c, after_l, arr_t, arr_c, balance_note), run_time=0.4)
+
+        # Beat 8 — weighted average: general formula first → groups (chunk29~34)
+        dtitle = Text("Weighted average comparison", font_size=34, color=WHITE, weight=BOLD).to_edge(UP, buff=0.35)
+        gnum = Text("Sum of (days × weight)", font_size=24, color=WHITE, weight=BOLD)
+        gbar = Line(LEFT, RIGHT, color=GRAY_A, stroke_width=2.5).set_width(gnum.width + 0.5)
+        gden = Text("Sum of weights", font_size=24, color=GRAY_A, weight=BOLD)
+        gfrac = VGroup(gnum, gbar, gden).arrange(DOWN, buff=0.12)
+        glbl = Text("Weighted avg =", font_size=26, color=ORANGE, weight=BOLD)
+        general = VGroup(glbl, gfrac).arrange(RIGHT, buff=0.3).move_to(UP * 1.7)
+
+        def calc_block(days_sev, days_mild, result, color_lbl, label, y):
+            def term(grp_label, expr, color):
+                lbl = Text(grp_label, font_size=18, color=color, weight=BOLD)
+                m = MathTex(expr, color=color).scale(0.72)
+                return VGroup(lbl, m).arrange(DOWN, buff=0.08)
+            sev_chip = term("Sev", rf"{days_sev}\times 6", SEVERE)
+            mild_chip = term("Mild", rf"{days_mild}\times 4", MILD)
+            plus = MathTex("+", color=WHITE).scale(0.8)
+            numer = VGroup(sev_chip, plus, mild_chip).arrange(RIGHT, buff=0.28)
+            bar = Line(LEFT, RIGHT, color=GRAY_A, stroke_width=2.5)
+            bar.set_width(numer.width + 0.4).next_to(numer, DOWN, buff=0.12)
+            denom = MathTex("10", color=GRAY_A).scale(0.8).next_to(bar, DOWN, buff=0.08)
+            frac = VGroup(numer, bar, denom)
+            eq = MathTex(rf"= {result}", color=color_lbl).scale(1.0).next_to(frac, RIGHT, buff=0.35)
+            lab = Text(label, font_size=24, color=color_lbl, weight=BOLD).next_to(frac, LEFT, buff=0.45)
+            grp = VGroup(lab, frac, eq).scale(0.9)
+            grp.move_to([0.5, y, 0])
+            return grp
+
+        treat_calc = calc_block(7, 2, "5", BLUE_B, "Took med", 0.1)
+        ctrl_calc = calc_block(8, 3, "6", RED_B, "No med", -1.45)
+        meaning = Text("6 = severe weight · 4 = mild weight · 10 = total",
+                       font_size=20, color=GRAY_B, weight=BOLD).to_edge(DOWN, buff=0.4)
+
+        play_at(129.4, FadeIn(dtitle), run_time=0.35)  # chunk29
+        play_at(132.6, FadeIn(general, shift=UP * 0.08), run_time=0.6)  # chunk30
+        play_at(138.2, FadeIn(treat_calc, shift=UP * 0.08), run_time=0.6)  # chunk31
+        play_at(142.9, FadeIn(ctrl_calc, shift=UP * 0.08), FadeIn(meaning), run_time=0.6)  # chunk32
+
+        bx0 = -1.3
+        b_treat = make_bar(5.6, BLUE, y=1.0, x0=bx0)
+        b_ctrl = make_bar(4.7, RED, y=-0.3, x0=bx0)
+        b_t_lbl = Text("Took medicine", font_size=24, color=BLUE_B, weight=BOLD)
+        b_t_lbl.move_to([bx0 - 0.3 - b_t_lbl.width / 2, 1.0, 0])
+        b_c_lbl = Text("No medicine", font_size=24, color=RED_B, weight=BOLD)
+        b_c_lbl.move_to([bx0 - 0.3 - b_c_lbl.width / 2, -0.3, 0])
+        b_t_val = Text("5.6d", font_size=24, color=BLUE_B, weight=BOLD).next_to(b_treat, RIGHT, buff=0.15)
+        b_c_val = Text("4.7d", font_size=24, color=RED_B, weight=BOLD).next_to(b_ctrl, RIGHT, buff=0.15)
+        state_lbl = Text("Before reweighting", font_size=26, color=GRAY_A, weight=BOLD).to_edge(UP, buff=1.15)
+
+        play_at(146.8, FadeOut(dtitle, general, treat_calc, ctrl_calc, meaning),
+                FadeIn(state_lbl, b_t_lbl, b_c_lbl),
+                GrowFromEdge(b_treat, LEFT), GrowFromEdge(b_ctrl, LEFT),
+                FadeIn(b_t_val, b_c_val), run_time=0.8)  # chunk33
+        nt_treat = make_bar(5.0, BLUE, y=1.0, x0=bx0)
+        nt_ctrl = make_bar(6.0, RED, y=-0.3, x0=bx0)
+        state2 = Text("After IPW", font_size=26, color=ORANGE, weight=BOLD).to_edge(UP, buff=1.15)
+        eff = Text("Medicine: 1 day faster ✓", font_size=32, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.7)
+        play_at(149.5,
+                Transform(b_treat, nt_treat), Transform(b_ctrl, nt_ctrl),
+                ReplacementTransform(state_lbl, state2),
+                b_t_val.animate.become(Text("5d", font_size=24, color=BLUE_B, weight=BOLD).next_to(nt_treat, RIGHT, buff=0.15)),
+                b_c_val.animate.become(Text("6d", font_size=24, color=RED_B, weight=BOLD).next_to(nt_ctrl, RIGHT, buff=0.15)),
+                run_time=1.6)
+        play_at(151.7, FadeIn(eff, shift=UP * 0.1), run_time=0.5)
+        play_at(154, Indicate(eff, color=GREEN_B, scale_factor=1.1), run_time=0.8)  # chunk34
+        play_at(158.2, FadeOut(state2, b_treat, b_ctrl, b_t_lbl, b_c_lbl, b_t_val, b_c_val, eff), run_time=0.4)
+
+        # Beat 9 — conclusion: observational → IPW → like a randomized trial (chunk35~36)
+        obs_t = people_block(4, 1, cols=3, r=0.17, buff=0.14)
+        obs_c = people_block(1, 3, cols=2, r=0.17, buff=0.14)
+        obs = VGroup(obs_t, obs_c).arrange(RIGHT, buff=0.7).move_to(LEFT * 4.2 + UP * 0.2)
+        obs_lbl = Text("Observational data", font_size=28, color=GRAY_A, weight=BOLD).next_to(obs, DOWN, buff=0.5)
+        shuffle = icon("arrows-shuffle", ORANGE, height=1.3).move_to(UP * 0.2)
+        shuffle_lbl = Text("IPW", font_size=34, color=ORANGE, weight=BOLD).next_to(shuffle, DOWN, buff=0.25)
+        arr = Arrow(LEFT * 1.7, RIGHT * 1.7, color=GRAY_B, stroke_width=5).move_to(UP * 0.2)
+        bal_t = people_block(3, 2, cols=3, r=0.17, buff=0.14)
+        bal_c = people_block(3, 2, cols=3, r=0.17, buff=0.14)
+        bal = VGroup(bal_t, bal_c).arrange(RIGHT, buff=0.7).move_to(RIGHT * 4.2 + UP * 0.2)
+        bal_lbl = Text("Like a randomized trial", font_size=28, color=GREEN_B, weight=BOLD).next_to(bal, DOWN, buff=0.5)
+        concl = Text("Now we can compare it like a randomized trial", font_size=30, color=WHITE, weight=BOLD).to_edge(DOWN, buff=0.6)
+
+        play_at(159.4, FadeIn(obs, shift=RIGHT * 0.1), FadeIn(obs_lbl), run_time=0.7)  # chunk35
+        play_at(161.6, GrowArrow(arr), FadeIn(shuffle, shuffle_lbl), run_time=0.7)
+        play_at(163.6, FadeIn(bal, shift=RIGHT * 0.1), FadeIn(bal_lbl), run_time=0.8)
+        play_at(165.1, FadeIn(concl, shift=UP * 0.1), run_time=0.6)  # chunk36
+
+        go_to(171.1)
diff --git a/videos/what_is_ipw/en/src/03_ipw_formula.py b/videos/what_is_ipw/en/src/03_ipw_formula.py
new file mode 100644
index 0000000..b53134c
--- /dev/null
+++ b/videos/what_is_ipw/en/src/03_ipw_formula.py
@@ -0,0 +1,114 @@
+from manim import *
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+SEVERE = "#b07cff"
+MILD = "#39d98a"
+
+
+class IPWFormula(Scene):
+    """Scene 03 (EN). Same animations, English text, resynced.
+    timings: build/audio/03_ipw_formula_en.timings.json (total 53.28s, 13 chunks)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def icon(name, color, height=1.0):
+            m = SVGMobject(f"{ICON}/{name}.svg")
+            m.set_stroke(color, width=3)
+            m.set_fill(color, opacity=0)
+            m.scale_to_fit_height(height)
+            return m
+
+        # Beat 1~2 — symbols on one person (chunk1~5)
+        person = icon("user", WHITE, height=1.5).move_to(LEFT * 1.6 + UP * 2.0)
+        LEFT_ANCHOR = -4.9
+
+        def define(sym, gloss, color, y):
+            m = MathTex(sym, color=color).scale(1.05)
+            colon = Text(":", font_size=30, color=GRAY_B)
+            g = Text(gloss, font_size=24, color=GRAY_A, weight=BOLD)
+            row = VGroup(m, colon, g).arrange(RIGHT, buff=0.28)
+            row.move_to([0, y, 0])
+            row.shift(RIGHT * (LEFT_ANCHOR - m.get_left()[0]))
+            return row, m
+
+        T_def, T_sym = define("T", "took medicine ( 1 / 0 )", BLUE_B, 0.55)
+        X_def, X_sym = define("X", "patient state ( severe / mild )", SEVERE, -0.55)
+        e_def, e_sym = define("e(X)=P(T=1|X)", "prob. 
of taking medicine", ORANGE, -1.65) + + play_at(0.5, FadeIn(person, shift=DOWN * 0.1), run_time=0.5) + play_at(4.9, FadeIn(T_def, shift=RIGHT * 0.1), run_time=0.5) # chunk3 + play_at(9.8, FadeIn(X_def, shift=RIGHT * 0.1), run_time=0.5) # chunk4 + play_at(13.7, FadeIn(e_def, shift=RIGHT * 0.1), run_time=0.5) # chunk5 + + # Beat 3 — weight = reciprocal of probability, two cases (chunk6~9) + play_at(17.3, FadeOut(person, T_def, X_def), run_time=0.4) # chunk6 + play_at(18.3, e_def.animate.scale(0.85).to_edge(UP, buff=0.5), run_time=0.5) + rule = Text("Rule: weight = 1 / probability", font_size=30, color=ORANGE, weight=BOLD).move_to(UP * 1.55) + play_at(19.6, FadeIn(rule), run_time=0.5) + + case_t = VGroup( + Text("Took medicine", font_size=28, color=BLUE_B, weight=BOLD), + MathTex(r"(T=1)", color=BLUE_B).scale(0.65), + MathTex(r"w = \frac{1}{e(X)}", color=BLUE_B).scale(1.2), + ).arrange(RIGHT, buff=0.4).move_to(UP * 0.55) + case_c = VGroup( + Text("No medicine", font_size=28, color=RED_B, weight=BOLD), + MathTex(r"(T=0)", color=RED_B).scale(0.65), + MathTex(r"w = \frac{1}{1 - e(X)}", color=RED_B).scale(1.2), + ).arrange(RIGHT, buff=0.4).move_to(DOWN * 1.1) + rare = Text("The rarer the event, the larger the weight", font_size=26, color=YELLOW, weight=BOLD).to_edge(DOWN, buff=0.55) + + play_at(23.9, FadeIn(case_t, shift=UP * 0.08), run_time=0.6) # chunk7 + play_at(28.9, FadeIn(case_c, shift=UP * 0.08), run_time=0.5) # chunk8 + play_at(36.3, FadeIn(rare), run_time=0.45) # chunk9 + play_at(39.6, FadeOut(e_def, rule, case_t, case_c, rare), run_time=0.35) + + # Beat 4 — combine into one formula (chunk10~13) + combined = MathTex("w", "=", r"\frac{T}{e(X)}", "+", r"\frac{1-T}{1-e(X)}").scale(1.25) + combined.move_to(UP * 0.9) + combined[2].set_color(BLUE_B) + combined[4].set_color(RED_B) + + brace_t = Brace(combined[2], DOWN, color=BLUE_B) + brace_c = Brace(combined[4], DOWN, color=RED_B) + lbl_t = Text("Took medicine", font_size=24, color=BLUE_B, weight=BOLD).next_to(brace_t, 
DOWN, buff=0.15).shift(LEFT * 0.25) + lbl_c = Text("No medicine", font_size=24, color=RED_B, weight=BOLD).next_to(brace_c, DOWN, buff=0.15).shift(RIGHT * 0.25) + sub1 = MathTex(r"T=1:", r"\ w =", r"\frac{1}{e(X)}", r"+", r"\frac{0}{1-e(X)}").scale(0.82) + sub1[0].set_color(BLUE_B) + sub1[4].set_color(GRAY_D) + sub0 = MathTex(r"T=0:", r"\ w =", r"\frac{0}{e(X)}", r"+", r"\frac{1}{1-e(X)}").scale(0.82) + sub0[0].set_color(RED_B) + sub0[2].set_color(GRAY_D) + subs = VGroup(sub1, sub0).arrange(DOWN, buff=0.45).move_to(DOWN * 0.7) + one = icon("user", WHITE, height=0.55) + one_w = MathTex("w", color=ORANGE).scale(0.8) + one_arrow = Arrow(LEFT * 0.4, RIGHT * 0.4, color=GRAY_B, stroke_width=4, buff=0.05) + closing = VGroup(one, one_arrow, one_w, + Text(" Everyone gets their own weight", font_size=26, color=ORANGE, weight=BOLD) + ).arrange(RIGHT, buff=0.2).to_edge(DOWN, buff=0.5) + + play_at(42.9, Write(combined), run_time=0.8) # chunk10 + play_at(46, GrowFromCenter(brace_t), FadeIn(lbl_t), + GrowFromCenter(brace_c), FadeIn(lbl_c), run_time=0.6) # chunk11 + play_at(50.7, FadeOut(brace_t, brace_c, lbl_t, lbl_c), + combined.animate.to_edge(UP, buff=1.4).scale(0.9), run_time=0.5) # chunk12 + play_at(51.4, FadeIn(sub1, shift=UP * 0.08), run_time=0.45) + play_at(52.5, FadeIn(sub0, shift=UP * 0.08), run_time=0.45) + play_at(56, FadeIn(closing, shift=UP * 0.1), run_time=0.5) # chunk13 + + go_to(61.1) diff --git a/videos/what_is_ipw/en/src/04_propensity_score.py b/videos/what_is_ipw/en/src/04_propensity_score.py new file mode 100644 index 0000000..6e6ad1d --- /dev/null +++ b/videos/what_is_ipw/en/src/04_propensity_score.py @@ -0,0 +1,161 @@ +from manim import * +import numpy as np + +ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline" + +SEVERE = "#b07cff" +MILD = "#39d98a" + + +class PropensityScore(Scene): + """Scene 04 (EN). Same animations, English text, resynced. 
+ timings: build/audio/04_propensity_score_en.timings.json (total 88.18s, 19 chunks) + """ + + def construct(self): + self.camera.background_color = "#0a0a0a" + self.t = 0.0 + + def go_to(target_time): + dt = target_time - self.t + if dt > 0: + self.wait(dt) + self.t += dt + + def play_at(target_time, *anims, run_time=0.5): + go_to(target_time) + self.play(*anims, run_time=run_time) + self.t += run_time + + def icon(name, color, height=1.0): + m = SVGMobject(f"{ICON}/{name}.svg") + m.set_stroke(color, width=3) + m.set_fill(color, opacity=0) + m.scale_to_fit_height(height) + return m + + # Beat 1 — limits of counting by hand → problem (chunk1~2) + recap = VGroup( + VGroup(Text("Severe", font_size=28, color=SEVERE, weight=BOLD), + MathTex(r"\tfrac{5}{6}", color=SEVERE).scale(0.95)).arrange(RIGHT, buff=0.25), + VGroup(Text("Mild", font_size=28, color=MILD, weight=BOLD), + MathTex(r"\tfrac{2}{4}", color=MILD).scale(0.95)).arrange(RIGHT, buff=0.25), + ).arrange(RIGHT, buff=1.2).move_to(UP * 0.3) + recap_lbl = Text("Probabilities counted by hand", font_size=26, color=GRAY_B, weight=BOLD).next_to(recap, UP, buff=0.5) + problem = Text("But there's a problem.", font_size=40, color=YELLOW, weight=BOLD).move_to(DOWN * 1.6) + + play_at(0.4, FadeIn(recap_lbl), FadeIn(recap), run_time=0.5) # chunk1 + play_at(6.1, FadeIn(problem, shift=UP * 0.1), run_time=0.5) # chunk2 + play_at(7.3, FadeOut(recap_lbl, recap, problem), run_time=0.3) + + # Beat 2 — many variables in reality (chunk3~5) + def var_icon(name, label, color): + ic = icon(name, color, height=0.95) + lb = Text(label, font_size=24, color=color, weight=BOLD) + return VGroup(ic, lb).arrange(DOWN, buff=0.25) + + v1 = var_icon("gender-bigender", "Sex", BLUE_B) + v2 = var_icon("calendar", "Age", TEAL_B) + v3 = var_icon("weight", "Weight", GOLD_B) + v4 = var_icon("stethoscope", "Conditions", RED_B) + vars_row = VGroup(v1, v2, v3, v4).arrange(RIGHT, buff=1.0).move_to(UP * 0.3) + hard = Text("Too many confounders to count by 
hand", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 2.2) + + play_at(7.9, LaggedStartMap(FadeIn, vars_row, lag_ratio=0.25), run_time=1.6) # chunk3 + play_at(13.1, FadeIn(hard, shift=UP * 0.1), run_time=0.5) # chunk4 + play_at(17.4, FadeOut(vars_row, hard), run_time=0.3) # chunk5 + + # Beat 3 — propensity score: e(X) first, name after (chunk6~7) + ps_title = Text("Propensity score", font_size=44, color=ORANGE, weight=BOLD).move_to(UP * 1.3) + ps_eq = MathTex(r"e(X) = P(\,T=1 \mid X\,)", color=WHITE).scale(1.0).move_to(UP * 0.1) + ps_gloss = Text("Probability of taking medicine, given patient info", font_size=28, color=GRAY_A, weight=BOLD).move_to(DOWN * 1.1) + + play_at(20.3, Write(ps_eq), run_time=0.7) # chunk6 + play_at(22, FadeIn(ps_gloss), run_time=0.5) + play_at(24, FadeIn(ps_title, shift=DOWN * 0.1), run_time=0.5) + play_at(26.5, Indicate(ps_eq, color=ORANGE, scale_factor=1.1), run_time=0.7) # chunk7 + play_at(29.1, FadeOut(ps_title, ps_eq, ps_gloss), run_time=0.3) + + # Beat 4 — model flow + sigmoid (chunk8~13) + in_node = Text("Patient info", font_size=26, color=GRAY_A, weight=BOLD).move_to(LEFT * 4.3) + box_rect = RoundedRectangle(width=3.4, height=1.3, corner_radius=0.2, stroke_color=ORANGE, stroke_width=2.5) + out_node = Text("Prob. 
of medicine", font_size=26, color=WHITE, weight=BOLD).move_to(RIGHT * 4.3) + flow = VGroup(in_node, box_rect, out_node).arrange(RIGHT, buff=0.9).move_to(UP * 2.2) + ar1 = Arrow(in_node.get_right(), box_rect.get_left(), color=GRAY_B, stroke_width=4, buff=0.2) + ar2 = Arrow(box_rect.get_right(), out_node.get_left(), color=GRAY_B, stroke_width=4, buff=0.2) + box_q = Text("?", font_size=40, color=GRAY_B, weight=BOLD).move_to(box_rect.get_center()) + box_lbl = Text("Logistic regression", font_size=24, color=ORANGE, weight=BOLD).move_to(box_rect.get_center()) + + play_at(29.6, FadeIn(in_node, out_node), Create(box_rect), FadeIn(box_q), + GrowArrow(ar1), GrowArrow(ar2), run_time=0.7) # chunk8 + play_at(34.9, ReplacementTransform(box_q, box_lbl), run_time=0.6) # chunk9 + + ax = Axes(x_range=[-6, 6, 3], y_range=[0, 1, 0.5], x_length=5.2, y_length=2.3, + axis_config={"include_tip": False, "stroke_color": GRAY_B}, + y_axis_config={"include_numbers": True, "font_size": 20}).move_to(DOWN * 1.0) + sig = ax.plot(lambda x: 1 / (1 + np.exp(-x)), x_range=[-6, 6], color=ORANGE, stroke_width=4) + line0 = DashedLine(ax.c2p(-6, 0), ax.c2p(6, 0), color=GRAY_D, stroke_width=1.5) + line1 = DashedLine(ax.c2p(-6, 1), ax.c2p(6, 1), color=GRAY_D, stroke_width=1.5) + sig_name = Text("sigmoid", font_size=26, color=ORANGE, weight=BOLD, slant=ITALIC).next_to(ax, UP, buff=0.12) + + play_at(37.2, Create(ax), FadeIn(line0, line1, sig_name), run_time=0.8) # chunk10 + play_at(38.4, Create(sig), run_time=2.0) + + lo_dot = Dot(ax.c2p(-5, 1 / (1 + np.exp(5))), color=BLUE_B, radius=0.09) + hi_dot = Dot(ax.c2p(5, 1 / (1 + np.exp(-5))), color=GREEN_B, radius=0.09) + lo_lbl = Text("small input → 0", font_size=16, color=BLUE_B, weight=BOLD).move_to(ax.c2p(-3.0, 0)).shift(DOWN * 0.50) + hi_lbl = Text("large input → 1", font_size=16, color=GREEN_B, weight=BOLD).move_to(ax.c2p(3.0, 1)).shift(UP * 0.40) + play_at(43.9, FadeIn(lo_dot), Flash(lo_dot, color=BLUE_B), FadeIn(lo_lbl, shift=UP * 0.08), run_time=0.7) # 
chunk11 + play_at(45.9, FadeIn(hi_dot), Flash(hi_dot, color=GREEN_B), FadeIn(hi_lbl, shift=DOWN * 0.08), run_time=0.7) + + out_lbl = Text("Always between 0 and 1 → a probability", font_size=26, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.4) + play_at(49.4, FadeIn(out_lbl, shift=UP * 0.1), run_time=0.5) # chunk12 + ehat = MathTex(r"\hat{e}(X)", color=ORANGE).scale(1.0).next_to(out_node, DOWN, buff=0.25) + ehat_lbl = Text("Estimated propensity score", font_size=22, color=GRAY_A, weight=BOLD).next_to(ehat, DOWN, buff=0.12) + play_at(55.7, FadeIn(ehat), FadeIn(ehat_lbl), run_time=0.6) # chunk13 + play_at(59.4, FadeOut(in_node, out_node, box_rect, box_lbl, ar1, ar2, ax, sig, line0, line1, + sig_name, lo_dot, hi_dot, lo_lbl, hi_lbl, out_lbl, ehat, ehat_lbl), run_time=0.4) + + # Beat 5 — IPW with estimated score (chunk14) + big_box = RoundedRectangle(width=9.0, height=2.6, corner_radius=0.25, + stroke_color=GRAY_B, stroke_width=2, fill_opacity=0).move_to(UP * 0.1) + ipw_eq = MathTex(r"\hat{w} = \frac{T}{\hat{e}(X)} + \frac{1-T}{1-\hat{e}(X)}", color=WHITE).scale(1.15).move_to(UP * 0.1) + same = Text("The rest is just like before", font_size=28, color=GRAY_A, weight=BOLD).next_to(big_box, UP, buff=0.4) + + play_at(60.5, FadeIn(same), Create(big_box), run_time=0.6) # chunk14 + play_at(62.4, Write(ipw_eq), run_time=1.0) + play_at(66.6, FadeOut(same, big_box, ipw_eq), run_time=0.35) + + # Beat 6 — when the model is wrong: missing age (chunk15~18) + warn = Text("Only works if the score is estimated well", font_size=30, color=YELLOW, weight=BOLD).to_edge(UP, buff=0.55) + reg = MathTex(r"\text{logit}\,\hat{e}(X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2", color=WHITE).scale(0.88).move_to(UP * 1.6) + reg_legend = Text("x₁ : sex x₂ : symptoms", font_size=22, color=GRAY_A, weight=BOLD).next_to(reg, DOWN, buff=0.25) + miss_expr = VGroup( + MathTex(r"x_3", color=RED_B).scale(0.76), + Text("= age", font_size=22, color=RED_B, weight=BOLD), + ).arrange(RIGHT, buff=0.08) + miss = 
VGroup( + Text("Missing variable", font_size=22, color=RED_B, weight=BOLD), + miss_expr, + ).arrange(RIGHT, buff=0.25).next_to(reg_legend, DOWN, buff=0.28) + miss_cross = Cross(miss_expr, stroke_color=RED, stroke_width=4) + + ax2 = Axes(x_range=[0, 6, 2], y_range=[0, 6, 2], x_length=4.6, y_length=2.55, + axis_config={"include_tip": False, "stroke_color": GRAY_B, "include_numbers": False}).move_to(DOWN * 1.55) + pts_x = [0.5, 1.3, 2.1, 3.0, 3.9, 4.7, 5.4] + pts_y = [1.0, 3.0, 5.0, 5.3, 4.6, 2.4, 1.0] + dots = VGroup(*[Dot(ax2.c2p(x, y), color=TEAL_B, radius=0.08) for x, y in zip(pts_x, pts_y)]) + bad_line = Line(ax2.c2p(0, 1.3), ax2.c2p(6, 4.6), color=RED_B, stroke_width=3) + graph_lbl = Text("Misses the true pattern", font_size=24, color=RED_B, weight=BOLD).next_to(ax2, RIGHT, buff=0.3) + + play_at(67.8, FadeIn(warn), run_time=0.4) # chunk15~16 + play_at(74.1, FadeIn(reg), FadeIn(reg_legend), run_time=0.5) # chunk17 + play_at(77.5, FadeIn(miss), Create(miss_cross), run_time=0.7) + play_at(80.5, Create(ax2), LaggedStartMap(FadeIn, dots, lag_ratio=0.1), run_time=0.9) + play_at(86.1, Create(bad_line), FadeIn(graph_lbl), run_time=0.7) # chunk18 + + # Beat 7 — closing: keep the screen, just emphasize (chunk19) + play_at(91.8, Indicate(VGroup(ax2, dots, bad_line, graph_lbl), color=RED_B, scale_factor=1.05), run_time=1.0) + play_at(93.8, Indicate(miss, color=RED_B, scale_factor=1.25), run_time=0.9) + + go_to(96.5) diff --git a/videos/what_is_ipw/en/src/05_summary.py b/videos/what_is_ipw/en/src/05_summary.py new file mode 100644 index 0000000..38706a7 --- /dev/null +++ b/videos/what_is_ipw/en/src/05_summary.py @@ -0,0 +1,136 @@ +from manim import * +import numpy as np + +SEVERE = "#b07cff" +MILD = "#39d98a" + + +class IPWSummary(Scene): + """Scene 05 (EN). Same animations, English text, resynced. 
+ timings: build/audio/05_summary_en.timings.json (total 67.25s, 16 chunks) + """ + + def construct(self): + self.camera.background_color = "#0a0a0a" + self.t = 0.0 + + def go_to(target_time): + dt = target_time - self.t + if dt > 0: + self.wait(dt) + self.t += dt + + def play_at(target_time, *anims, run_time=0.5): + go_to(target_time) + self.play(*anims, run_time=run_time) + self.t += run_time + + def dot(color, r=0.13): + return Circle(radius=r, stroke_color=color, stroke_width=2).set_fill(color, opacity=0.9) + + def people_block(n_sev, n_mild, cols=5, r=0.13, buff=0.12): + g = VGroup(*[dot(SEVERE, r) for _ in range(n_sev)], *[dot(MILD, r) for _ in range(n_mild)]) + rows = int(np.ceil(len(g) / cols)) + g.arrange_in_grid(rows=rows, cols=cols, buff=buff) + return g + + def badge(m, txt, color=ORANGE, fs=14): + return Text(txt, font_size=fs, color=color, weight=BOLD).next_to(m, DOWN, buff=0.04) + + # Beat 1 — problem recap: confounding (chunk1~5) + head = Text("To recap", font_size=32, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.55) + legend = VGroup( + dot(SEVERE), Text("Severe", font_size=24, color=SEVERE, weight=BOLD), + dot(MILD), Text("Mild", font_size=24, color=MILD, weight=BOLD), + ).arrange(RIGHT, buff=0.25).next_to(head, DOWN, buff=0.3) + ques = Text("Does medicine really speed up recovery?", font_size=32, color=YELLOW, weight=BOLD).move_to(UP * 0.9) + treat = people_block(5, 2, cols=4).move_to(LEFT * 3.3 + DOWN * 0.3) + treat_l = Text("Took medicine", font_size=26, color=BLUE_B, weight=BOLD).next_to(treat, UP, buff=0.3) + ctrl = people_block(1, 2, cols=3).move_to(RIGHT * 3.3 + DOWN * 0.3) + ctrl_l = Text("No medicine", font_size=26, color=RED_B, weight=BOLD).next_to(ctrl, UP, buff=0.3) + confound = Text("The confounder (severe/mild) hides the true effect", font_size=28, color=PURPLE_A, weight=BOLD).move_to(DOWN * 2.5) + + play_at(0.4, FadeIn(head), run_time=0.4) + play_at(2, FadeIn(legend, shift=DOWN * 0.05), FadeIn(ques), run_time=0.5) # chunk2 + 
play_at(5.5, FadeOut(ques), FadeIn(treat_l, treat), FadeIn(ctrl_l, ctrl), run_time=0.6) # chunk3 + treat_box = SurroundingRectangle(VGroup(treat_l, treat), color=BLUE_B, buff=0.2, corner_radius=0.12) + ctrl_box = SurroundingRectangle(VGroup(ctrl_l, ctrl), color=RED_B, buff=0.2, corner_radius=0.12) + play_at(10.2, Create(treat_box), + *[Indicate(treat[i], color=SEVERE, scale_factor=1.4) for i in range(5)], run_time=1.2) # chunk4 + play_at(12.5, ReplacementTransform(treat_box, ctrl_box), + *[Indicate(ctrl[i], color=MILD, scale_factor=1.4) for i in range(1, 3)], run_time=1.2) + play_at(14.3, FadeOut(ctrl_box), run_time=0.3) + play_at(14.7, FadeIn(confound, shift=UP * 0.1), run_time=0.5) # chunk5 + play_at(19, FadeOut(head, legend, treat, treat_l, ctrl, ctrl_l, confound), run_time=0.35) + + # Beat 2 — IPW: weights make the two groups equal + true effect (chunk6~8) + ipw_head = Text("IPW: weights make the two groups equal", font_size=32, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.5) + bt0 = people_block(5, 2, cols=4, buff=0.32).move_to(LEFT * 3.7 + UP * 0.75) + bc0 = people_block(1, 2, cols=3, buff=0.32).move_to(RIGHT * 3.7 + UP * 0.75) + bt_l = Text("Med", font_size=24, color=BLUE_B, weight=BOLD).next_to(bt0, UP, buff=0.25) + bc_l = Text("No med", font_size=24, color=RED_B, weight=BOLD).next_to(bc0, UP, buff=0.25) + bc_l.match_y(bt_l) + bt_badges = VGroup(*[badge(bt0[i], "×1.2") for i in range(5)], *[badge(bt0[i], "×2") for i in range(5, 7)]) + bc_badges = VGroup(badge(bc0[0], "×6", YELLOW, 15), *[badge(bc0[i], "×2") for i in range(1, 3)]) + mul_note = Text("× weight", font_size=26, color=ORANGE, weight=BOLD).move_to(UP * 0.75) + bt1 = people_block(6, 4, cols=5).move_to(LEFT * 3.7 + UP * 0.75) + bc1 = people_block(6, 4, cols=5).move_to(RIGHT * 3.7 + UP * 0.75) + approx = MathTex(r"\approx", color=GREEN_B).scale(1.6).move_to(UP * 0.75) + UNIT = 0.5 + eff_t = VGroup(Text("Med", font_size=22, color=BLUE_B, weight=BOLD), + RoundedRectangle(width=5 * UNIT, height=0.36, 
corner_radius=0.06, stroke_color=BLUE, + stroke_width=2, fill_color=BLUE, fill_opacity=0.5), + Text("5d", font_size=22, color=BLUE_B, weight=BOLD)).arrange(RIGHT, buff=0.2).move_to(DOWN * 1.5 + LEFT * 0.4) + eff_c = VGroup(Text("No med", font_size=22, color=RED_B, weight=BOLD), + RoundedRectangle(width=6 * UNIT, height=0.36, corner_radius=0.06, stroke_color=RED, + stroke_width=2, fill_color=RED, fill_opacity=0.5), + Text("6d", font_size=22, color=RED_B, weight=BOLD)).arrange(RIGHT, buff=0.2).move_to(DOWN * 2.3 + LEFT * 0.4) + eff_lbl = Text("True effect: 1 day faster ✓", font_size=26, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.4) + + play_at(19.3, FadeIn(ipw_head), FadeIn(bt_l, bt0, bc_l, bc0), run_time=0.5) # chunk6 + play_at(21.7, LaggedStartMap(FadeIn, bt_badges, shift=DOWN * 0.05, lag_ratio=0.05), + LaggedStartMap(FadeIn, bc_badges, shift=DOWN * 0.05, lag_ratio=0.05), FadeIn(mul_note), run_time=1.0) # chunk7 + play_at(23.5, FadeOut(bt_badges, bc_badges, mul_note), + ReplacementTransform(bt0, bt1), ReplacementTransform(bc0, bc1), FadeIn(approx), run_time=1.0) + play_at(28.2, FadeIn(eff_t, eff_c), run_time=0.5) # chunk8 + play_at(30, FadeIn(eff_lbl, shift=UP * 0.1), run_time=0.5) + play_at(32.55, FadeOut(ipw_head, bt1, bt_l, bc1, bc_l, approx, eff_t, eff_c, eff_lbl), run_time=0.35) + + # Beat 3 — real data: sigmoid figure + per-individual weight (chunk9~13) + real_head = Text("In real data", font_size=30, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.5) + ax = Axes(x_range=[-6, 6, 3], y_range=[0, 1, 1], x_length=4.4, y_length=2.1, + axis_config={"include_tip": False, "stroke_color": GRAY_B, "include_numbers": False}).move_to(LEFT * 3.7 + UP * 0.25) + sig = ax.plot(lambda x: 1 / (1 + np.exp(-x)), x_range=[-6, 6], color=ORANGE, stroke_width=4) + sig_name = Text("Logistic regression", font_size=24, color=ORANGE, weight=BOLD).next_to(ax, UP, buff=0.15) + ehat = MathTex(r"\hat{e}(X)", color=ORANGE).scale(0.95).next_to(ax, DOWN, buff=0.3) + ehat_lbl = 
Text("Estimated propensity score", font_size=20, color=GRAY_A, weight=BOLD).next_to(ehat, DOWN, buff=0.12) + + play_at(33.45, FadeIn(real_head), Create(ax), Create(sig), FadeIn(sig_name), run_time=0.9) # chunk9 + play_at(39.1, Write(ehat), FadeIn(ehat_lbl), run_time=0.6) # chunk10 + ppl = VGroup(dot(SEVERE), dot(SEVERE), dot(MILD), dot(SEVERE), dot(MILD)).arrange(RIGHT, buff=0.45).move_to(RIGHT * 3.3 + UP * 0.6) + ppl_badges = VGroup(*[badge(ppl[i], "×w", ORANGE, 16) for i in range(len(ppl))]) + indiv_lbl = Text("A weight for each person", font_size=24, color=ORANGE, weight=BOLD).next_to(ppl, DOWN, buff=0.45) + play_at(42.9, FadeIn(ppl), LaggedStartMap(FadeIn, ppl_badges, shift=DOWN * 0.05, lag_ratio=0.08), + FadeIn(indiv_lbl), run_time=1.0) # chunk11 + balanced = Text("Now the groups match ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(indiv_lbl, DOWN, buff=0.3) + play_at(47.2, FadeIn(balanced, shift=UP * 0.08), run_time=0.5) # chunk12 + caution = Text("If the model is wrong, so is IPW → choose inputs carefully", font_size=26, color=YELLOW, weight=BOLD).to_edge(DOWN, buff=0.5) + play_at(50.3, FadeIn(caution, shift=UP * 0.1), run_time=0.5) # chunk13 + play_at(55.3, FadeOut(real_head, ax, sig, sig_name, ehat, ehat_lbl, ppl, ppl_badges, indiv_lbl, balanced, caution), run_time=0.4) + + # Beat 4 — why IPW matters: merit / limit (chunk14~16) + sig_title = Text("Why IPW matters", font_size=44, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.8) + merit = VGroup( + Text("✓", font_size=34, color=GREEN_B, weight=BOLD), + Text("Causal effects from observational data alone", font_size=30, color=GREEN_B, weight=BOLD), + ).arrange(RIGHT, buff=0.35).move_to(UP * 0.4) + limit = VGroup( + Text("△", font_size=32, color=GRAY_B, weight=BOLD), + Text("Limited by unmeasured confounders", font_size=30, color=GRAY_A, weight=BOLD), + ).arrange(RIGHT, buff=0.35).move_to(DOWN * 0.8) + + play_at(56.8, FadeIn(sig_title, shift=DOWN * 0.1), FadeIn(merit, shift=UP * 0.1), run_time=0.6) # 
chunk14 + play_at(64.6, FadeIn(limit, shift=UP * 0.1), run_time=0.5) # chunk15 + play_at(67.9, Indicate(sig_title, color=ORANGE, scale_factor=1.12), run_time=1.0) # chunk16 + + go_to(74.6) diff --git a/videos/what_is_ipw/en/src/scripts/00_intro.txt b/videos/what_is_ipw/en/src/scripts/00_intro.txt new file mode 100644 index 0000000..ddba178 --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/00_intro.txt @@ -0,0 +1,3 @@ +Today's topic is Inverse Probability Weighting, or IPW. + +Let's explore how to uncover the true cause using only observational data. diff --git a/videos/what_is_ipw/en/src/scripts/01_medicine_question.txt b/videos/what_is_ipw/en/src/scripts/01_medicine_question.txt new file mode 100644 index 0000000..47b83a7 --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/01_medicine_question.txt @@ -0,0 +1,33 @@ +Let's look at an example. + +If you take a cold medicine and recover in a few days, you naturally think it was thanks to the medicine. + +But was it really because of the medicine? + +Today, we'll talk about how to tell whether a difference we see is truly caused by that factor. + +It does feel obvious that taking medicine helps you recover faster. + +And checking it seems simple. We just compare the recovery time of people who took the medicine with those who didn't. + +Let's look at data from one hospital. + +People who took the medicine recovered in 5.6 days on average. But people who didn't take it recovered in just 4.7 days. + +So the group that took the medicine was actually sick 0.9 days longer. At this rate, the medicine looks harmful. + +But is that really true? This is exactly where the trap is. + +This time, let's split the patients into severe and mild cases. + +First, the severe patients. With the medicine, recovery took 7 days on average; without it, 8 days. So the medicine sped up recovery by one day. + +What about the mild patients? With the medicine, recovery took 2 days; without it, 3 days. 
Here too, the medicine cut recovery by a day. + +Isn't that strange? In both severe and mild cases, the medicine clearly sped up recovery. + +But when we combine them and look at the whole, the result flips, and the medicine looks harmful instead. + +Why on earth does this happen? + +To solve this puzzle, today we'll explore IPW. diff --git a/videos/what_is_ipw/en/src/scripts/02_ipw_application.txt b/videos/what_is_ipw/en/src/scripts/02_ipw_application.txt new file mode 100644 index 0000000..98933d6 --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/02_ipw_application.txt @@ -0,0 +1,71 @@ +Let's revisit the problem we just saw. + +In both severe and mild patients, the medicine clearly sped up recovery by one day each. + +But looking at the whole, the group that took the medicine actually takes 0.9 days longer to recover. The effect is completely reversed. + +When we split the data the medicine clearly helps, but when we combine it the medicine looks worse. This phenomenon is called Simpson's Paradox. + +So why does this happen? + +Let's take a look at the makeup of each group. + +The group that took the medicine was originally packed with severe patients, who recover slowly. + +On the other hand, the group that didn't take it had many mild patients, who recover quickly. + +From the start, the makeup of the two groups was completely different. + +So the group that took the medicine looked slower to recover overall, and the medicine appeared harmful, flipping the result. + +A variable that affects both whether you take the medicine and your recovery time is called a confounder. + +In other words, being severe or mild is the confounder here. + +So what should we do? + +The key idea is simple. + +We imagine what it would look like if the two groups had the same ratio of severe and mild patients from the start. + +We don't change the data itself. Instead, we give each person a weight. + +First, let's find each patient's probability of taking the medicine. 
+ +Let's mark each patient with a circle, and color in only those who took the medicine. + +Among the six severe patients, five took the medicine. So the probability that a severe person takes it is five out of six, about 83 percent. + +Among the mild patients, two of four took it, so that probability is two out of four, that is, 50 percent. + +Now we use the reciprocal of this probability as the weight. + +Common cases get a small weight, and rare cases get a large weight. + +For example, a severe person who took the medicine gets a weight of 1.2, while one who didn't take it gets a weight of 6. + +For mild patients, half took the medicine and half didn't, so both sides get a weight of 2. + +Once we apply these weights, the ratio of severe to mild becomes the same in both groups. + +A rare person with a large weight comes to represent several people, while a common person with a small weight has less influence. + +Then, as if the patients had been fairly divided from the start, the data becomes balanced. + +This method is called Inverse Probability Weighting, or IPW. + +Now we just compare the averages again on the balanced data. + +We multiply each person's recovery days by their weight, add them all up, and divide by the sum of the weights. + +Doing this, the weighted average for the group that took the medicine is 5 days. + +And the weighted average for the group that didn't take it is 6 days. + +Now the group that took the medicine recovers one day faster. The medicine that looked harmful was actually speeding up recovery by a day. + +Only after removing the effect of severity does the medicine's true, hidden effect appear. + +In this way, IPW reweights the data as if the medicine had been randomly assigned from the start. + +Now we can compare the data in a balanced way, just like a randomized experiment. 
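The arithmetic in this narration can be checked with a short standalone script. The sketch below is not part of the patch: the names `GROUPS`, `propensity_by_severity`, `ipw_weight`, and `mean_days` are hypothetical, and the only inputs are the toy counts and recovery days quoted in the scripts (severe: 5 treated at 7 days, 1 untreated at 8 days; mild: 2 treated at 2 days, 2 untreated at 3 days). It assumes every severity level has both treated and untreated patients, since otherwise a weight would divide by zero.

```python
from collections import defaultdict

# (severity, took_medicine, recovery_days, head_count): toy numbers from the narration.
GROUPS = [
    ("severe", 1, 7, 5),
    ("severe", 0, 8, 1),
    ("mild", 1, 2, 2),
    ("mild", 0, 3, 2),
]


def propensity_by_severity(groups):
    """Estimate e(X) = P(T=1 | X) by counting, one value per severity level."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for severity, took, _days, count in groups:
        total[severity] += count
        treated[severity] += count * took
    return {severity: treated[severity] / total[severity] for severity in total}


def ipw_weight(took, e):
    """w = T / e(X) + (1 - T) / (1 - e(X)); assumes 0 < e < 1."""
    return took / e + (1 - took) / (1 - e)


def mean_days(groups, took, propensity=None):
    """Average recovery days for one arm; IPW-weighted when propensity is given."""
    numerator = 0.0
    denominator = 0.0
    for severity, t, days, count in groups:
        if t != took:
            continue
        w = 1.0 if propensity is None else ipw_weight(t, propensity[severity])
        numerator += days * w * count   # sum of (days x weight)
        denominator += w * count        # sum of weights
    return numerator / denominator


e = propensity_by_severity(GROUPS)
naive_treated = mean_days(GROUPS, 1)
naive_control = mean_days(GROUPS, 0)
ipw_treated = mean_days(GROUPS, 1, e)
ipw_control = mean_days(GROUPS, 0, e)

print(f"naive: treated {naive_treated:.1f}d, control {naive_control:.1f}d")
print(f"IPW:   treated {ipw_treated:.1f}d, control {ipw_control:.1f}d")
```

With these inputs the unweighted means round to the 5.6 and 4.7 days quoted earlier, and the weighted means come out at 5 and 6 days, because each arm is reweighted to the same pseudo-population of 6 severe and 4 mild patients.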
diff --git a/videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt b/videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt new file mode 100644 index 0000000..52a8108 --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt @@ -0,0 +1,25 @@ +Now let's generalize what we just did with a formula. + +Let's check just a few symbols together. + +T indicates whether the medicine was taken: 1 if taken, 0 if not. + +X represents the patient's state, that is, whether they are severe or mild. + +And we'll write the probability that a person in state X takes the medicine as follows. + +Now let's define the weight. We give the reciprocal of the probability of the event that actually happened. + +For a person who took the medicine, we give the reciprocal of the probability of taking it. + +For a person who didn't take it, we give the reciprocal of the probability of not taking it, that is, one minus the probability of taking it. + +The rarer the event, the smaller its probability, so its reciprocal, the weight, becomes larger. + +Shall we combine these two cases into a single formula? + +The left term is for people who took the medicine, and the right term is for people who didn't. + +If T is 1, only the left term remains; if T is 0, only the right term remains. + +This way, each person is given exactly one weight that fits their own case. diff --git a/videos/what_is_ipw/en/src/scripts/04_propensity_score.txt b/videos/what_is_ipw/en/src/scripts/04_propensity_score.txt new file mode 100644 index 0000000..d39eded --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/04_propensity_score.txt @@ -0,0 +1,37 @@ +Until now, we could count the probabilities by hand, because there were only two cases: severe and mild. + +But here, a problem arises. + +In reality, there's much more to consider: sex, age, weight, even pre-existing conditions. + +When there are this many confounders, counting probabilities one by one by hand becomes impractical. + +So we change our approach. 
+ +The probability that a person takes the medicine, given their information. This is called the propensity score. + +It's just a name for the same probability we used before. + +So how do we find this probability? We train a statistical model on the data to estimate it. + +A common method is logistic regression. + +When we feed in patient information like sex and symptoms, a curve called the sigmoid squeezes the value to between 0 and 1. + +This curve sticks close to 0 when the input is very small, and approaches 1 when the input is very large. + +So no matter what information comes in, the output is always between 0 and 1, in other words, a probability. + +The probability the model estimates this way is called the estimated propensity score. + +Once we have the estimated propensity score, the rest is the same. Just as we learned, we make the weights from the reciprocal of the score. + +But there's one thing to watch out for. + +All of this only holds when the propensity score is estimated well. + +For example, imagine a formula with only sex and symptoms. Suppose age strongly affects whether someone takes the medicine, but the model has no variable for age. In that case, we miss the real effect of age on taking the medicine. + +As a result, the estimated probability differs from reality, and the confounding isn't cleanly removed. + +So in real analysis, which information you put into the model matters more than anything. diff --git a/videos/what_is_ipw/en/src/scripts/05_summary.txt b/videos/what_is_ipw/en/src/scripts/05_summary.txt new file mode 100644 index 0000000..707d705 --- /dev/null +++ b/videos/what_is_ipw/en/src/scripts/05_summary.txt @@ -0,0 +1,31 @@ +Let's wrap up what we've learned. + +We wanted to know whether the medicine really speeds up recovery. + +But we couldn't just compare those who took the medicine with those who didn't. + +Because the group that took it was crowded with severe patients, and the group that didn't with mild patients. 
+ +When the two groups start from different baselines like this, the confounder hides the true effect. + +What solved this problem was IPW. + +By giving each person a weight equal to the reciprocal of the probability, we make the makeup of the two groups the same. + +Then the medicine that looked harmful revealed its true effect: speeding up recovery by one day. + +In real data with many variables, we estimate this probability with a model like logistic regression. + +The probability estimated this way is the estimated propensity score. + +Using this estimated propensity score, we assign a weight to each individual. + +Then, overall, we can make the makeup of the two groups the same. + +Of course, if the model is inaccurate, IPW wavers too, so we must be careful about what information we include. + +The greatest appeal of IPW is that, even when experiments are hard, it can estimate causal effects from observational data alone. + +If there are confounders we couldn't measure, its limits are clear, + +but used carefully, knowing those limits, IPW gives answers that a simple average comparison never could. diff --git a/videos/what_is_ipw/en/src/thumbnail.py b/videos/what_is_ipw/en/src/thumbnail.py new file mode 100644 index 0000000..d206e19 --- /dev/null +++ b/videos/what_is_ipw/en/src/thumbnail.py @@ -0,0 +1,53 @@ +from manim import * + +# what_is_ipw YouTube thumbnail (1920×1080) +# Concept: use dots (patients) to emphasize only that 'two groups that differed are balanced by IPW'. +# Design: tilt the dot grid 45 degrees into a diamond layout for a more stylish look. 
+ +SEVERE = "#b07cff" # severe +MILD = "#39d98a" # mild +ORG = "#F5A623" +GOOD = "#7CE38B" +GOLD = "#F4C95D" + + +class Thumbnail(Scene): + def construct(self): + self.camera.background_color = "#000000" + + def cluster(n_p, n_g, cols, r=0.16, buff=0.14): + g = VGroup(*[Dot(color=SEVERE, radius=r) for _ in range(n_p)], + *[Dot(color=MILD, radius=r) for _ in range(n_g)]) + g.arrange_in_grid(cols=cols, buff=buff) + g.rotate(45 * DEGREES) # diamond layout, rotated 45 degrees + return g + + # ===== Top title ===== + ipw = Tex(r"\textbf{IPW}", color=ORG).scale(2.4) + ipw_en = Text("Inverse Probability Weighting", font_size=42, color=WHITE, weight=BOLD) + title = VGroup(ipw, ipw_en).arrange(RIGHT, buff=0.5).to_edge(UP, buff=0.55) + + # ===== Top: two differing groups (small, dimmed) ===== + b0 = cluster(5, 2, 4, r=0.1, buff=0.1) + c0 = cluster(1, 2, 3, r=0.1, buff=0.1) + neq = Tex(r"$\neq$", color=GREY_B).scale(1.4) + before = VGroup(b0, neq, c0).arrange(RIGHT, buff=0.55).set_opacity(0.55) + before_grp = VGroup(before).arrange(DOWN, buff=0.3).move_to(UP * 1.35) + + # ===== IPW arrow ===== + arrow = Arrow(UP * 0.55, DOWN * 0.55, color=ORG, stroke_width=10, + max_tip_length_to_length_ratio=0.35).move_to(UP * 0.05) + arrow_lbl = Text("IPW", font_size=34, color=ORG, weight=BOLD).next_to(arrow, RIGHT, buff=0.25) + step = VGroup(arrow, arrow_lbl) + + # ===== Bottom: two balanced groups (large, emphasized) ===== + b1 = cluster(6, 4, 5) + c1 = cluster(6, 4, 5) + eq = Tex(r"$=$", color=GOOD).scale(2.6) + after = VGroup(b1, eq, c1).arrange(RIGHT, buff=0.9) + glow = SurroundingRectangle(after, color=GOOD, stroke_width=3, corner_radius=0.25, buff=0.4) + after_lbl = Text("Balance", font_size=40, color=GOOD, weight=BOLD) + after_grp = VGroup(VGroup(after, glow), after_lbl).arrange(DOWN, buff=0.35).move_to(DOWN * 1.7) + + self.add(title, before_grp, step, after_grp) + diff --git a/videos/what_is_ipw/ko/src/00_intro.py b/videos/what_is_ipw/ko/src/00_intro.py new file mode 100644 index 0000000..27538f1 --- /dev/null +++ 
b/videos/what_is_ipw/ko/src/00_intro.py @@ -0,0 +1,41 @@ +from manim import * + +# 씬 00 — 시작 표지 (약 5초). 오늘의 주제 IPW 소개. +# 타이밍 기준: build/audio/00_intro.timings.json (총 7.89s) +# chunk0 0.00~3.30 "오늘의 주제는, 역확률 가중치. 아이피더블유입니다." +# chunk1 3.30~7.85 "관찰 데이터만으로 진짜 원인을 가려내는 방법을..." + + +class Intro(Scene): + def construct(self): + self.camera.background_color = "#0a0a0a" + self.t = 0.0 + + def go_to(target_time): + dt = target_time - self.t + if dt > 0: + self.wait(dt) + self.t += dt + + def play_at(target_time, *anims, run_time=0.5): + go_to(target_time) + self.play(*anims, run_time=run_time) + self.t += run_time + + topic = Text("오늘의 주제", font_size=34, color=GRAY_A, weight=BOLD) + ipw = Text("IPW", font_size=150, color=ORANGE, weight=BOLD) + kr = Text("역확률 가중치", font_size=46, color=WHITE, weight=BOLD) + en = Text("Inverse Probability Weighting", font_size=30, color=GRAY_B, weight=BOLD) + title = VGroup(topic, ipw, kr, en).arrange(DOWN, buff=0.38).move_to(UP * 0.1) + + # 밑줄 악센트 + rule = Line(LEFT * 2.7, RIGHT * 2.7, color=ORANGE, stroke_width=3).next_to(ipw, DOWN, buff=0.18) + + # chunk0 — 주제 + IPW + play_at(0.30, FadeIn(topic, shift=DOWN * 0.1), run_time=0.4) + play_at(0.90, Write(ipw), run_time=0.7) + play_at(1.90, GrowFromCenter(rule), FadeIn(kr, shift=UP * 0.1), run_time=0.5) + # chunk1 — 영문 병기 + play_at(3.60, FadeIn(en), run_time=0.5) + + go_to(8.10) diff --git a/videos/what_is_ipw/ko/src/01_what_is_ipw.py b/videos/what_is_ipw/ko/src/01_what_is_ipw.py new file mode 100644 index 0000000..15eb9b8 --- /dev/null +++ b/videos/what_is_ipw/ko/src/01_what_is_ipw.py @@ -0,0 +1,196 @@ +from manim import * +import numpy as np + +# 공용 tabler 아이콘 경로 (outline 세트) +ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline" + + +class MedicineQuestionSynced(Scene): + """ + 씬 01 — 감기약은 정말 회복을 앞당기는가? (심슨의 역설 도입) + + 핵심: 전체만 보면 약이 해로워 보이지만, 중증/경증으로 나누면 약은 항상 도움이 된다. + 이번 개정 포인트: + - 비교 대상(회복 기간의 '차이')을 명시적으로 보여 준다. 
+ - "더 오래 아팠다?"에 표정 아이콘을 같이 띄운다. + - 그룹(복용이 짧음) → 전체(복용이 김)로 막대가 '직접 뒤집히는' 애니메이션. + - 장면 사이 블랙아웃 없이 겹쳐서 전환. + - IPW 영문(Inverse Probability Weighting) 병기. + 타이밍 기준: build/audio/01_medicine_question.timings.json (총 93.45s) + """ + + def construct(self): + self.camera.background_color = "#0a0a0a" + self.t = 0.0 + + def go_to(target_time): + dt = target_time - self.t + if dt > 0: + self.wait(dt) + self.t += dt + + def play_at(target_time, *anims, run_time=0.5): + go_to(target_time) + self.play(*anims, run_time=run_time) + self.t += run_time + + def icon(name, color, height=1.1): + m = SVGMobject(f"{ICON}/{name}.svg") + m.set_stroke(color, width=3) + m.set_fill(color, opacity=0) + m.scale_to_fit_height(height) + return m + + UNIT = 0.62 # 1일당 막대 길이 + + def make_bar(days, color, y, x0, unit=UNIT): + bar = RoundedRectangle(width=days * unit, height=0.42, corner_radius=0.07, + stroke_color=color, stroke_width=2.5, fill_color=color, fill_opacity=0.5) + bar.move_to([x0 + days * unit / 2, y, 0]) + return bar + + def bar_row(label, days, color, y, x0, fs=24, unit=UNIT): + lab = Text(label, font_size=fs, color=color, weight=BOLD) + lab.move_to([x0 - 0.3 - lab.width / 2, y, 0]) + bar = make_bar(days, color, y, x0, unit) + val = Text(f"{days:g}일", font_size=fs + 2, color=color, weight=BOLD).next_to(bar, RIGHT, buff=0.18) + return VGroup(lab, bar, val) + + # ============================================================ + # Beat A — 아픈 사람 → 약 → 회복, 정말 약 때문? 
(chunk1~2, 0.00~8.03) + # ============================================================ + sick = icon("mood-sick", RED_B, height=1.5).move_to(LEFT * 3.4 + UP * 0.2) + happy = icon("mood-happy", GREEN_B, height=1.5).move_to(RIGHT * 3.4 + UP * 0.2) + rec_arrow = Arrow(sick.get_right(), happy.get_left(), color=WHITE, stroke_width=6, buff=0.5) + pill = icon("pill", BLUE_B, height=0.95).move_to(UP * 1.45) + pill_lbl = Text("약", font_size=26, color=BLUE_B, weight=BOLD).next_to(pill, RIGHT, buff=0.15) + qbig = Text("?", font_size=120, color=RED, weight=BOLD).move_to(DOWN * 1.7) + + play_at(2.00, FadeIn(sick, shift=UP * 0.15), run_time=0.4) # chunk2 + play_at(3.80, GrowArrow(rec_arrow), FadeIn(happy, shift=UP * 0.15), run_time=0.5) + play_at(5.20, FadeIn(pill, shift=DOWN * 0.15), FadeIn(pill_lbl), run_time=0.4) + play_at(7.90, FadeIn(qbig, scale=1.2), run_time=0.4) # chunk3 "정말 약 때문?" + beatA = VGroup(sick, happy, rec_arrow, pill, pill_lbl, qbig) + + # ============================================================ + # Beat B — 제목 (chunk4) — 블랙아웃 없이 겹쳐 전환 + # ============================================================ + title = Text("진짜 원인을 가려내려면?", font_size=40, color=YELLOW, weight=BOLD).move_to(UP * 0.3) + play_at(10.50, FadeOut(beatA), FadeIn(title), run_time=0.5) # chunk4 + + # ============================================================ + # Beat C — 무엇을 비교? 
회복 기간의 '차이' (chunk5~6) + # ============================================================ + hint = Text("약을 먹으면 더 빨리 나을 것 같습니다.", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 0.4) + play_at(15.85, title.animate.scale(0.85).to_edge(UP, buff=0.6), FadeIn(hint, shift=UP * 0.1), run_time=0.5) # chunk5 + + g_treat = icon("users", BLUE_B, height=1.2).move_to(LEFT * 3.4 + UP * 0.35) + t_lbl = Text("약 복용", font_size=26, color=BLUE_B, weight=BOLD).next_to(g_treat, DOWN, buff=0.25) + g_ctrl = icon("users", RED_B, height=1.2).move_to(RIGHT * 3.4 + UP * 0.35) + c_lbl = Text("약 미복용", font_size=26, color=RED_B, weight=BOLD).next_to(g_ctrl, DOWN, buff=0.25) + # 비교 대상 = 회복 기간, 그 '차이'를 본다 + metric = Text("회복 기간 차이?", font_size=30, color=YELLOW, weight=BOLD).move_to(DOWN * 1.0) + diff_arrow = DoubleArrow(LEFT * 1.7 + DOWN * 1.65, RIGHT * 1.7 + DOWN * 1.65, color=YELLOW, + stroke_width=5, tip_length=0.22, buff=0.1) + + play_at(18.70, FadeOut(hint), + FadeIn(g_treat, t_lbl), FadeIn(g_ctrl, c_lbl), run_time=0.5) # chunk6 + play_at(21.50, FadeIn(metric, shift=UP * 0.08), GrowFromCenter(diff_arrow), run_time=0.5) + beatC = VGroup(g_treat, t_lbl, g_ctrl, c_lbl, metric, diff_arrow) + + # ============================================================ + # Beat D — 전체 환자 막대: 복용 5.6 > 미복용 4.7 (chunk6~8, 24.71~41.89) + # ============================================================ + c_title = Text("전체 환자", font_size=34, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.6) + c_x0 = -1.6 + c_treat = bar_row("약 복용", 5.6, BLUE, y=0.8, x0=c_x0, fs=26) + c_ctrl = bar_row("약 미복용", 4.7, RED, y=-0.5, x0=c_x0, fs=26) + harm = Text("약 먹은 쪽이 더 오래 아팠다?", font_size=32, color=RED_B, weight=BOLD).move_to(DOWN * 2.0) + harm_face = icon("mood-confuzed", RED_B, height=0.7).next_to(harm, LEFT, buff=0.25) + + play_at(24.70, FadeOut(beatC), ReplacementTransform(title, c_title), run_time=0.5) # chunk7 + play_at(26.60, GrowFromEdge(c_treat[1], LEFT), FadeIn(c_treat[0], c_treat[2]), run_time=0.6) # chunk8 + 
play_at(30.20, GrowFromEdge(c_ctrl[1], LEFT), FadeIn(c_ctrl[0], c_ctrl[2]), run_time=0.6) + play_at(34.90, FadeIn(harm, shift=UP * 0.1), FadeIn(harm_face, shift=UP * 0.1), run_time=0.5) # chunk9 + + # ============================================================ + # Beat E — 함정 + 중증/경증으로 나눠서 (chunk9~10, 41.89~51.08) + # ============================================================ + trap = Text("정말 그럴까요?", font_size=28, color=YELLOW, weight=BOLD).move_to(DOWN * 2.0) + play_at(41.72, FadeOut(harm, harm_face), FadeIn(trap, shift=UP * 0.08), run_time=0.45) # chunk10 + + split_title = Text("중증 / 경증으로 나눠 보면?", font_size=34, color=YELLOW, weight=BOLD).to_edge(UP, buff=0.55) + # 전체 막대를 정리하고(겹쳐서) 그룹 레이아웃으로 전환 — 블랙아웃 없음 + play_at(45.43, FadeOut(c_treat, c_ctrl, trap), + ReplacementTransform(c_title, split_title), run_time=0.5) # chunk11 + + # ============================================================ + # Beat F — 중증(7/8), 경증(2/3): 각 그룹에서 복용이 더 짧다 (chunk11~13, 51.08~80.39) + # ============================================================ + d_x0 = -1.9 + sev_head = Text("중증", font_size=28, color=BLUE_D, weight=BOLD).move_to(LEFT * 5.0 + UP * 1.55) + sev_t = bar_row("복용", 7, BLUE, y=1.95, x0=d_x0, fs=22) + sev_c = bar_row("미복용", 8, RED, y=1.15, x0=d_x0, fs=22) + mild_head = Text("경증", font_size=28, color=PINK, weight=BOLD).move_to(LEFT * 5.0 + DOWN * 1.15) + mild_t = bar_row("복용", 2, BLUE, y=-0.75, x0=d_x0, fs=22) + mild_c = bar_row("미복용", 3, RED, y=-1.55, x0=d_x0, fs=22) + sev_ok = Text("약이 1일 단축 ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(sev_c, RIGHT, buff=0.5) + mild_ok = Text("약이 1일 단축 ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(mild_c, RIGHT, buff=0.5) + + # chunk12 중증 + play_at(49.40, FadeIn(sev_head), GrowFromEdge(sev_t[1], LEFT), FadeIn(sev_t[0], sev_t[2]), run_time=0.55) + play_at(52.80, GrowFromEdge(sev_c[1], LEFT), FadeIn(sev_c[0], sev_c[2]), run_time=0.55) + play_at(55.80, FadeIn(sev_ok, shift=LEFT * 0.1), run_time=0.4) + # chunk13 경증 + 
play_at(60.30, FadeIn(mild_head), GrowFromEdge(mild_t[1], LEFT), FadeIn(mild_t[0], mild_t[2]), run_time=0.55) + play_at(63.60, GrowFromEdge(mild_c[1], LEFT), FadeIn(mild_c[0], mild_c[2]), run_time=0.55) + play_at(66.60, FadeIn(mild_ok, shift=LEFT * 0.1), run_time=0.4) + # chunk14 — 그룹 안에선 약이 분명히 좋다 + both_ok = Text("그룹 안에서는 약이 분명히 좋다 ✓", font_size=28, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.45) + play_at(71.70, FadeIn(both_ok, shift=UP * 0.1), run_time=0.5) + + # ============================================================ + # Beat G — 역전: 그룹(복용 짧음) → 전체(복용 김)로 막대가 직접 뒤집힌다 (chunk14, 80.39~86.52) + # ============================================================ + rev_title = Text("그런데 전체로 합치면…", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.55) + ov_x0 = -1.6 + ov_treat = make_bar(5.6, BLUE, y=0.7, x0=ov_x0) + ov_ctrl = make_bar(4.7, RED, y=-0.6, x0=ov_x0) + ov_t_lbl = Text("약 복용", font_size=26, color=BLUE_B, weight=BOLD) + ov_t_lbl.move_to([ov_x0 - 0.3 - ov_t_lbl.width / 2, 0.7, 0]) + ov_c_lbl = Text("약 미복용", font_size=26, color=RED_B, weight=BOLD) + ov_c_lbl.move_to([ov_x0 - 0.3 - ov_c_lbl.width / 2, -0.6, 0]) + ov_t_val = Text("5.6일", font_size=28, color=BLUE_B, weight=BOLD).next_to(ov_treat, RIGHT, buff=0.18) + ov_c_val = Text("4.7일", font_size=28, color=RED_B, weight=BOLD).next_to(ov_ctrl, RIGHT, buff=0.18) + flip = Text("약 복용이 오히려 해롭습니다 ✗", font_size=30, color=RED_B, weight=BOLD).to_edge(DOWN, buff=0.5) + + # 두 복용(파랑) 막대가 하나의 긴 파랑 막대로, 두 미복용(빨강)이 짧은 빨강으로 합쳐진다 + play_at(77.90, + FadeOut(sev_head, mild_head, sev_ok, mild_ok, both_ok, + sev_t[0], sev_t[2], sev_c[0], sev_c[2], mild_t[0], mild_t[2], mild_c[0], mild_c[2]), + ReplacementTransform(split_title, rev_title), run_time=0.45) # chunk15 + play_at(78.90, + ReplacementTransform(VGroup(sev_t[1], mild_t[1]), ov_treat), + ReplacementTransform(VGroup(sev_c[1], mild_c[1]), ov_ctrl), + FadeIn(ov_t_lbl, ov_c_lbl, ov_t_val, ov_c_val), run_time=1.3) + play_at(81.10, FadeIn(flip, shift=UP * 0.1), 
run_time=0.5) + + # ============================================================ + # Beat H — 왜? → IPW (영문 병기) (chunk16~17) + # ============================================================ + why = Text("왜 이런 일이 벌어질까요?", font_size=36, color=YELLOW, weight=BOLD).move_to(DOWN * 2.6) + play_at(83.50, FadeIn(why, scale=1.1), run_time=0.4) # chunk16 + + ipw = Text("IPW", font_size=96, color=ORANGE, weight=BOLD) + ipw_kr = Text("역확률 가중치", font_size=40, color=WHITE, weight=BOLD) + ipw_en = Text("Inverse Probability Weighting", font_size=28, color=GRAY_B, weight=BOLD) + ipw_group = VGroup(ipw, ipw_kr, ipw_en).arrange(DOWN, buff=0.3).move_to(ORIGIN) + + play_at(85.40, + FadeOut(rev_title, ov_treat, ov_ctrl, ov_t_lbl, ov_c_lbl, ov_t_val, ov_c_val, flip, why), + run_time=0.35) # chunk17 + play_at(85.85, Write(ipw), run_time=0.6) + play_at(87.10, FadeIn(ipw_kr, shift=UP * 0.1), run_time=0.4) + play_at(88.00, FadeIn(ipw_en, shift=UP * 0.1), run_time=0.4) + + go_to(91.30) diff --git a/videos/what_is_ipw/ko/src/02_ipw_application.py b/videos/what_is_ipw/ko/src/02_ipw_application.py new file mode 100644 index 0000000..30cccb8 --- /dev/null +++ b/videos/what_is_ipw/ko/src/02_ipw_application.py @@ -0,0 +1,367 @@ +from manim import * +import numpy as np + +ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline" + +# 증상(교란변수) 색: 중증=밝은 보라, 경증=밝은 초록. 처치는 위치+BLUE/RED. +SEVERE = "#b07cff" # 중증 +MILD = "#39d98a" # 경증 + + +class IPWApplication(Scene): + """ + 씬 02 — 심슨의 역설을 IPW로 푼다. (3차 개정) + + 이번 개정: + - 가중치(1.2/6/2)가 나올 때마다 '약을 먹을 확률' 박스의 동그라미를 끌어와 연결. + - '사람마다 가중치를 부여' 슬라이드를 조금 더 머무르게. + - '재가중 평균 비교'에서 일반식을 먼저 보이고, 약 복용/미복용을 목소리에 맞춰. + - 마지막 결론 워딩을 스크립트에 맞춤. 
+ 타이밍 기준: build/audio/02_ipw_application.timings.json (총 189.69s, 36 chunks) + """ + + def construct(self): + self.camera.background_color = "#0a0a0a" + self.t = 0.0 + + def go_to(target_time): + dt = target_time - self.t + if dt > 0: + self.wait(dt) + self.t += dt + + def play_at(target_time, *anims, run_time=0.5): + go_to(target_time) + self.play(*anims, run_time=run_time) + self.t += run_time + + def dot(color, r=0.16, fill=0.9): + return Circle(radius=r, stroke_color=color, stroke_width=2).set_fill(color, opacity=fill) + + def ring(color, r=0.16): + return Circle(radius=r, stroke_color=color, stroke_width=2.5).set_fill(color, opacity=0.0) + + def people_block(n_sev, n_mild, cols=4, r=0.16, buff=0.16): + g = VGroup(*[dot(SEVERE, r) for _ in range(n_sev)], + *[dot(MILD, r) for _ in range(n_mild)]) + rows = int(np.ceil(len(g) / cols)) + g.arrange_in_grid(rows=rows, cols=cols, buff=buff) + return g + + def icon(name, color, height=1.0): + m = SVGMobject(f"{ICON}/{name}.svg") + m.set_stroke(color, width=3) + m.set_fill(color, opacity=0) + m.scale_to_fit_height(height) + return m + + UNIT = 0.62 + + def make_bar(days, color, y, x0, unit=UNIT): + bar = RoundedRectangle(width=days * unit, height=0.46, corner_radius=0.07, + stroke_color=color, stroke_width=2.5, fill_color=color, fill_opacity=0.5) + bar.move_to([x0 + days * unit / 2, y, 0]) + return bar + + # ============================================================ + # Beat 1 — 심슨의 역설 정리 (chunk1~5, 0.00~27.91) + # ============================================================ + recap = Text("다시 정리해 보면", font_size=30, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.7) + row_group = VGroup( + Text("그룹 안에서는", font_size=30, color=GRAY_A, weight=BOLD), + Text("약 → 회복 빨라짐", font_size=32, color=GREEN_B, weight=BOLD), + Text("✓", font_size=40, color=GREEN_B, weight=BOLD), + ).arrange(RIGHT, buff=0.5).move_to(UP * 1.55) + row_total = VGroup( + Text("전체로 보면", font_size=30, color=GRAY_A, weight=BOLD), + Text("약 → 회복 느려짐", 
font_size=32, color=RED_B, weight=BOLD), + Text("✗", font_size=40, color=RED_B, weight=BOLD), + ).arrange(RIGHT, buff=0.5).move_to(UP * 0.45) + paradox = Text("심슨의 역설 (Simpson's Paradox)", font_size=40, color=ORANGE, weight=BOLD).move_to(DOWN * 1.6) + why = Text("왜 이런 일이 벌어질까요?", font_size=34, color=YELLOW, weight=BOLD).move_to(DOWN * 2.9) + + play_at(0.30, FadeIn(recap), run_time=0.35) + play_at(2.40, FadeIn(row_group, shift=UP * 0.1), run_time=0.45) # chunk2 + play_at(8.40, FadeIn(row_total, shift=UP * 0.1), run_time=0.45) # chunk3 + play_at(22.00, FadeIn(paradox, shift=UP * 0.1), run_time=0.5) # chunk4: '심슨의 역설' 들릴 때 + play_at(25.69, FadeIn(why), run_time=0.4) # chunk5 + + # ============================================================ + # Beat 2 — 사람 dots로 그룹 구성 차이 + 양쪽 강조 (chunk6~10, 27.91~50.85) + # ============================================================ + title2 = Text("심슨의 역설", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.45) + legend = VGroup( + dot(SEVERE, 0.13), Text("중증", font_size=22, color=SEVERE, weight=BOLD), + dot(MILD, 0.13), Text("경증", font_size=22, color=MILD, weight=BOLD), + ).arrange(RIGHT, buff=0.22).next_to(title2, DOWN, buff=0.25) + treat_block = people_block(5, 2, cols=4).scale(1.05).move_to(LEFT * 3.4 + DOWN * 0.3) + treat_lbl = Text("약 복용", font_size=28, color=BLUE_B, weight=BOLD).next_to(treat_block, UP, buff=0.4) + ctrl_block = people_block(1, 2, cols=3).scale(1.05).move_to(RIGHT * 3.4 + DOWN * 0.3) + ctrl_lbl = Text("약 미복용", font_size=28, color=RED_B, weight=BOLD).next_to(ctrl_block, UP, buff=0.4) + diff_note = Text("두 그룹의 환자 구성이 처음부터 달랐다", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 2.9) + treat_box = SurroundingRectangle(VGroup(treat_lbl, treat_block), color=BLUE_B, buff=0.25, corner_radius=0.15) + ctrl_box = SurroundingRectangle(VGroup(ctrl_lbl, ctrl_block), color=RED_B, buff=0.25, corner_radius=0.15) + + play_at(28.10, FadeOut(recap, row_group, row_total, paradox, why), + FadeIn(title2, legend), 
run_time=0.4) + # chunk7 (30.46): 약 먹은 그룹엔 중증 잔뜩 → 등장 + 중증 강조 + play_at(30.66, FadeIn(treat_lbl), LaggedStartMap(FadeIn, treat_block, lag_ratio=0.12), run_time=1.0) + play_at(32.20, Create(treat_box), + *[Indicate(treat_block[i], color=SEVERE, scale_factor=1.4) for i in range(5)], run_time=1.3) + play_at(34.30, FadeOut(treat_box), run_time=0.3) + # chunk8 (35.67): 안 먹은 그룹엔 경증 많음 → 등장 + 경증 강조 (대칭) + play_at(35.87, FadeIn(ctrl_lbl), LaggedStartMap(FadeIn, ctrl_block, lag_ratio=0.15), run_time=0.9) + play_at(37.30, Create(ctrl_box), + *[Indicate(ctrl_block[i], color=MILD, scale_factor=1.4) for i in range(1, 3)], run_time=1.3) + play_at(39.40, FadeOut(ctrl_box), run_time=0.3) + play_at(40.37, FadeIn(diff_note, shift=UP * 0.1), run_time=0.5) # chunk9 + play_at(50.55, FadeOut(title2, legend, treat_block, treat_lbl, ctrl_block, ctrl_lbl, diff_note), run_time=0.3) + + # ============================================================ + # Beat 3 — 교란 변수 (+ 약 복용 → 회복 기간 화살표) (chunk11~12, 50.85~60.09) + # ============================================================ + conf = VGroup( + Text("교란 변수 (Confounder)", font_size=36, color=PURPLE_A, weight=BOLD), + Text("중증 / 경증", font_size=26, color=GRAY_B, weight=BOLD), + ).arrange(DOWN, buff=0.15).move_to(UP * 1.9) + node_t = Text("약 복용 여부", font_size=30, color=BLUE_B, weight=BOLD).move_to(LEFT * 3.2 + DOWN * 1.4) + node_y = Text("회복 기간", font_size=30, color=GREEN_B, weight=BOLD).move_to(RIGHT * 3.2 + DOWN * 1.4) + a1 = Arrow(conf.get_bottom(), node_t.get_top(), color=PURPLE_A, stroke_width=5, buff=0.25) + a2 = Arrow(conf.get_bottom(), node_y.get_top(), color=PURPLE_A, stroke_width=5, buff=0.25) + a3 = Arrow(node_t.get_right(), node_y.get_left(), color=BLUE_B, stroke_width=5, buff=0.3) + + play_at(51.05, FadeIn(conf, shift=DOWN * 0.1), run_time=0.5) # chunk11 + play_at(53.20, GrowArrow(a1), GrowArrow(a2), FadeIn(node_t, node_y), run_time=0.7) + play_at(55.20, GrowArrow(a3), run_time=0.6) # 약 복용 여부 → 회복 기간 + play_at(57.32, 
Indicate(conf[1], color=PURPLE_A, scale_factor=1.2), run_time=0.8) # chunk12 + # chunk13~14 (60.09~63.72) '그럼 어떻게/단순합니다' 동안 교란 화면을 가만히 둔다. + + # ============================================================ + # Beat 4 — '단순합니다' 직후 전환 + 개인별 가중치(머무르게) (chunk15~17, 63.72~77.97) + # ============================================================ + idea = Text("두 그룹의 구성이 같았다면?", font_size=42, color=YELLOW, weight=BOLD).move_to(UP * 1.6) + keep = Text("데이터는 그대로", font_size=30, color=GRAY_A, weight=BOLD).move_to(UP * 0.4) + ppl = VGroup(dot(SEVERE), dot(SEVERE), dot(MILD), dot(SEVERE), dot(MILD)).arrange(RIGHT, buff=0.9).move_to(DOWN * 0.9) + badges = VGroup(*[Text("×?", font_size=22, color=ORANGE, weight=BOLD).next_to(d, UP, buff=0.12) for d in ppl]) + wgt_note = Text("사람마다 가중치를 부여", font_size=34, color=ORANGE, weight=BOLD).move_to(DOWN * 2.1) + + play_at(63.92, FadeOut(conf, node_t, node_y, a1, a2, a3), + FadeIn(idea, shift=UP * 0.1), run_time=0.5) # chunk15: '단순합니다' 직후 + play_at(69.35, FadeIn(keep), run_time=0.4) # chunk16 + play_at(71.80, LaggedStartMap(FadeIn, ppl, lag_ratio=0.1), + LaggedStartMap(FadeIn, badges, shift=DOWN * 0.1, lag_ratio=0.1), + FadeIn(wgt_note), run_time=1.3) # '사람마다 가중치' (머무르게) + + # ============================================================ + # Beat 5 — 약 먹을 확률 (원 두 줄 좌측 정렬 + '명') (chunk17~20, 74.54~103.05) + # ============================================================ + prob_box = RoundedRectangle(width=11.5, height=4.6, corner_radius=0.25, + stroke_color=GRAY_B, stroke_width=2, fill_opacity=0).move_to(UP * 0.1) + prob_title = Text("약을 먹을 확률", font_size=34, color=WHITE, weight=BOLD).move_to(prob_box.get_top() + DOWN * 0.5) + sev_lbl = Text("중증", font_size=26, color=SEVERE, weight=BOLD).move_to(LEFT * 4.9 + UP * 0.55) + mild_lbl = Text("경증", font_size=26, color=MILD, weight=BOLD).move_to(LEFT * 4.9 + DOWN * 1.2) + LEFTX = -3.1 + sev_circ = VGroup(*[ring(SEVERE, 0.24) for _ in range(6)]).arrange(RIGHT, buff=0.26) + sev_circ.move_to([0, 0.55, 
0]).shift(RIGHT * (LEFTX - sev_circ.get_left()[0])) + mild_circ = VGroup(*[ring(MILD, 0.24) for _ in range(4)]).arrange(RIGHT, buff=0.26) + mild_circ.move_to([0, -1.2, 0]).shift(RIGHT * (LEFTX - mild_circ.get_left()[0])) + sev_cnt = Text("여섯 명 중 다섯 명", font_size=22, color=SEVERE, weight=BOLD).move_to(RIGHT * 4.0 + UP * 0.9) + sev_frac = MathTex(r"\tfrac{5}{6}\approx 83\%", color=SEVERE).scale(0.8).move_to(RIGHT * 4.0 + UP * 0.25) + mild_cnt = Text("네 명 중 두 명", font_size=22, color=MILD, weight=BOLD).move_to(RIGHT * 4.0 + DOWN * 0.85) + mild_frac = MathTex(r"\tfrac{2}{4}=50\%", color=MILD).scale(0.8).move_to(RIGHT * 4.0 + DOWN * 1.5) + legend2 = Text("색칠 = 약을 먹은 사람", font_size=22, color=GRAY_B, weight=BOLD).move_to(prob_box.get_bottom() + UP * 0.4) + + # chunk17 끝~chunk18 (76.8): 가중치 슬라이드를 충분히 머문 뒤 전환 + play_at(76.80, FadeOut(idea, keep, ppl, badges, wgt_note), + Create(prob_box), FadeIn(prob_title), run_time=0.6) + play_at(78.17, FadeIn(sev_lbl, mild_lbl), + LaggedStartMap(FadeIn, VGroup(*sev_circ, *mild_circ), lag_ratio=0.1), + FadeIn(legend2), run_time=1.4) # chunk18 + play_at(83.88, *[sev_circ[i].animate.set_fill(SEVERE, opacity=0.9) for i in range(5)], + FadeIn(sev_cnt), run_time=1.3) # chunk19 + play_at(89.20, Write(sev_frac), run_time=0.7) + play_at(94.01, *[mild_circ[i].animate.set_fill(MILD, opacity=0.9) for i in range(2)], + FadeIn(mild_cnt), run_time=1.1) # chunk20 + play_at(96.90, Write(mild_frac), run_time=0.7) + beat5 = VGroup(prob_box, prob_title, sev_lbl, sev_circ, sev_cnt, sev_frac, + mild_lbl, mild_circ, mild_cnt, mild_frac, legend2) + + # ============================================================ + # Beat 6 — 가중치 = 1 ÷ 확률, 확률 박스에서 '그 사람들'을 끌어온다 (chunk21~24, 100.31~123.11) + # ============================================================ + play_at(100.51, beat5.animate.scale(0.5).to_edge(LEFT, buff=0.3), run_time=0.7) # chunk21 + whead = Text("가중치 = 1 ÷ 확률", font_size=32, color=ORANGE, weight=BOLD).move_to(RIGHT * 2.7 + UP * 2.55) + link = 
Arrow(beat5.get_right() + UP * 0.5, whead.get_left() + DOWN * 0.2, + color=ORANGE, stroke_width=4, buff=0.2) + + def dd_cluster(flags, color, y): + g = VGroup(*[(dot(color, 0.11) if f else ring(color, 0.11)) for f in flags]).arrange(RIGHT, buff=0.08) + g.move_to([1.2, y, 0]) + return g + + dd1 = dd_cluster([True] * 5, SEVERE, 1.35) + dd2 = dd_cluster([False], SEVERE, 0.25) + dd3 = dd_cluster([True] * 2, MILD, -0.85) + dd4 = dd_cluster([False] * 2, MILD, -1.95) + + def wbody(label, color, tex, anchor): + lab = Text(label, font_size=21, color=color, weight=BOLD) + m = MathTex(tex, color=color).scale(0.68) + body = VGroup(lab, m).arrange(RIGHT, buff=0.28) + body.next_to(anchor, RIGHT, buff=0.35) + return body + + b1 = wbody("중증·복용", SEVERE, r"1 \div \tfrac{5}{6} = 1.2", dd1) + b2 = wbody("중증·미복용", SEVERE, r"1 \div \tfrac{1}{6} = 6", dd2) + b3 = wbody("경증·복용", MILD, r"1 \div \tfrac{2}{4} = 2", dd3) + b4 = wbody("경증·미복용", MILD, r"1 \div \tfrac{2}{4} = 2", dd4) + rare = Text("드문 경우일수록 크게 반영", font_size=23, color=YELLOW, weight=BOLD).move_to([2.6, -2.95, 0]) + + play_at(100.71, GrowArrow(link), FadeIn(whead), run_time=0.5) # chunk21 + play_at(103.25, FadeIn(rare, shift=UP * 0.08), run_time=0.4) # chunk22 + # chunk23 (108.53): 중증 복용 1.2 / 미복용 6 — 박스에서 그 사람들을 끌어온다 + play_at(108.73, TransformFromCopy(VGroup(*[sev_circ[i] for i in range(5)]), dd1), + FadeIn(b1, shift=RIGHT * 0.1), run_time=0.9) + play_at(113.10, TransformFromCopy(VGroup(sev_circ[5]), dd2), + FadeIn(b2, shift=RIGHT * 0.1), run_time=0.9) + play_at(115.00, Indicate(b2[1], color=YELLOW, scale_factor=1.2), run_time=0.7) + # chunk24 (117.17): 경증 양쪽 2 + play_at(117.40, TransformFromCopy(VGroup(*[mild_circ[i] for i in range(2)]), dd3), + FadeIn(b3, shift=RIGHT * 0.1), run_time=0.8) + play_at(120.00, TransformFromCopy(VGroup(*[mild_circ[i] for i in range(2, 4)]), dd4), + FadeIn(b4, shift=RIGHT * 0.1), run_time=0.8) + play_at(122.90, FadeOut(beat5, link, whead, dd1, dd2, dd3, dd4, b1, b2, b3, b4, rare), run_time=0.35) 
+ + # ============================================================ + # Beat 7 — 개인별 가중치로 재가중 → 균형 (chunk25~28, 123.11~146.80) + # ============================================================ + btitle = Text("재가중 후", font_size=34, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.4) + bt_t = people_block(5, 2, cols=4, r=0.14, buff=0.2).move_to(LEFT * 3.0 + UP * 1.15) + bt_t_l = Text("복용", font_size=22, color=BLUE_B, weight=BOLD).next_to(bt_t, UP, buff=0.2) + bt_c = people_block(1, 2, cols=3, r=0.14, buff=0.2).move_to(RIGHT * 3.0 + UP * 1.15) + bt_c_l = Text("미복용", font_size=22, color=RED_B, weight=BOLD).next_to(bt_c, UP, buff=0.2) + orig_l = Text("원래", font_size=22, color=GRAY_B, weight=BOLD).move_to(LEFT * 5.4 + UP * 1.15) + bt_t_badges = VGroup( + *[Text("×1.2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_t[i], DOWN, buff=0.05) for i in range(5)], + *[Text("×2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_t[i], DOWN, buff=0.05) for i in range(5, 7)], + ) + bt_c_badges = VGroup( + Text("×6", font_size=15, color=YELLOW, weight=BOLD).next_to(bt_c[0], DOWN, buff=0.05), + *[Text("×2", font_size=14, color=ORANGE, weight=BOLD).next_to(bt_c[i], DOWN, buff=0.05) for i in range(1, 3)], + ) + af_t = people_block(6, 4, cols=5, r=0.14, buff=0.2).move_to(LEFT * 3.0 + DOWN * 1.7) + af_c = people_block(6, 4, cols=5, r=0.14, buff=0.2).move_to(RIGHT * 3.0 + DOWN * 1.7) + after_l = Text("재가중 후", font_size=22, color=GRAY_B, weight=BOLD).move_to(LEFT * 5.4 + DOWN * 1.7) + approx = MathTex(r"\approx", color=GREEN_B).scale(1.6).move_to(DOWN * 1.7) + arr_t = Arrow(bt_t.get_bottom() + DOWN * 0.15, af_t.get_top(), color=ORANGE, stroke_width=4, buff=0.15) + arr_c = Arrow(bt_c.get_bottom() + DOWN * 0.15, af_c.get_top(), color=ORANGE, stroke_width=4, buff=0.15) + balance_note = Text("두 그룹의 중증·경증 비율이 같아짐", font_size=26, color=GRAY_A, weight=BOLD).to_edge(DOWN, buff=0.35) + ipw_name = Text("역확률 가중치 (IPW)", font_size=30, color=ORANGE, weight=BOLD).next_to(btitle, DOWN, buff=0.12) + + 
play_at(123.31, FadeIn(btitle, bt_t, bt_t_l, orig_l, bt_c, bt_c_l), run_time=0.5) # chunk25 + play_at(124.30, LaggedStartMap(FadeIn, bt_t_badges, shift=DOWN * 0.05, lag_ratio=0.05), + LaggedStartMap(FadeIn, bt_c_badges, shift=DOWN * 0.05, lag_ratio=0.05), run_time=1.0) + # chunk26 (128.41): 큰 가중치는 여러 명 대표, 작은 가중치는 영향력 줄어듦 + play_at(128.61, GrowArrow(arr_t), GrowArrow(arr_c), run_time=0.5) + play_at(129.40, Indicate(bt_c_badges[0], color=YELLOW, scale_factor=1.6), + LaggedStartMap(FadeIn, af_t, lag_ratio=0.06), + LaggedStartMap(FadeIn, af_c, lag_ratio=0.06), FadeIn(after_l), run_time=2.0) + # chunk27 (136.81): 균형을 이룹니다 → ≈ + play_at(137.01, FadeIn(approx, scale=1.2), FadeIn(balance_note), run_time=0.6) + play_at(143.19, FadeIn(ipw_name, shift=DOWN * 0.1), run_time=0.5) # chunk28 + play_at(146.40, FadeOut(btitle, ipw_name, bt_t, bt_t_l, bt_c, bt_c_l, orig_l, approx, + bt_t_badges, bt_c_badges, af_t, af_c, after_l, arr_t, arr_c, balance_note), run_time=0.4) + + # ============================================================ + # Beat 8 — 재가중 평균: 일반식 먼저 → 약 복용/미복용 (chunk29~34, 146.80~177.35) + # ============================================================ + dtitle = Text("재가중 평균 비교", font_size=34, color=WHITE, weight=BOLD).to_edge(UP, buff=0.35) + # (1) 일반식 + gnum = Text("(입원일수 × 가중치) 들의 합", font_size=24, color=WHITE, weight=BOLD) + gbar = Line(LEFT, RIGHT, color=GRAY_A, stroke_width=2.5).set_width(gnum.width + 0.5) + gden = Text("가중치 들의 합", font_size=24, color=GRAY_A, weight=BOLD) + gfrac = VGroup(gnum, gbar, gden).arrange(DOWN, buff=0.12) + glbl = Text("가중 평균 =", font_size=26, color=ORANGE, weight=BOLD) + general = VGroup(glbl, gfrac).arrange(RIGHT, buff=0.3).move_to(UP * 1.7) + + def calc_block(days_sev, days_mild, result, color_lbl, label, y): + def term(grp_label, expr, color): + lbl = Text(grp_label, font_size=18, color=color, weight=BOLD) + m = MathTex(expr, color=color).scale(0.72) + return VGroup(lbl, m).arrange(DOWN, buff=0.08) + sev_chip = term("중증", 
rf"{days_sev}\times 6", SEVERE) + mild_chip = term("경증", rf"{days_mild}\times 4", MILD) + plus = MathTex("+", color=WHITE).scale(0.8) + numer = VGroup(sev_chip, plus, mild_chip).arrange(RIGHT, buff=0.28) + bar = Line(LEFT, RIGHT, color=GRAY_A, stroke_width=2.5) + bar.set_width(numer.width + 0.4).next_to(numer, DOWN, buff=0.12) + denom = MathTex("10", color=GRAY_A).scale(0.8).next_to(bar, DOWN, buff=0.08) + frac = VGroup(numer, bar, denom) + eq = MathTex(rf"= {result}", color=color_lbl).scale(1.0).next_to(frac, RIGHT, buff=0.35) + lab = Text(label, font_size=24, color=color_lbl, weight=BOLD).next_to(frac, LEFT, buff=0.45) + grp = VGroup(lab, frac, eq).scale(0.9) + grp.move_to([0.5, y, 0]) + return grp + + treat_calc = calc_block(7, 2, "5", BLUE_B, "약 복용", 0.1) + ctrl_calc = calc_block(8, 3, "6", RED_B, "약 미복용", -1.45) + meaning = Text("6 = 중증 가중치 합 · 4 = 경증 가중치 합 · 10 = 전체", + font_size=20, color=GRAY_B, weight=BOLD).to_edge(DOWN, buff=0.4) + + play_at(146.95, FadeIn(dtitle), run_time=0.35) # chunk29 + play_at(150.99, FadeIn(general, shift=UP * 0.08), run_time=0.6) # chunk30: 일반식 먼저 + play_at(156.24, FadeIn(treat_calc, shift=UP * 0.08), run_time=0.6) # chunk31: 약 복용 = 5 + play_at(160.37, FadeIn(ctrl_calc, shift=UP * 0.08), FadeIn(meaning), run_time=0.6) # chunk32: 약 미복용 = 6 + + # chunk33 (163.75): '해로워 보이던 약이…' → 막대 재가중 전 → IPW 후 + bx0 = -1.3 + b_treat = make_bar(5.6, BLUE, y=1.0, x0=bx0) + b_ctrl = make_bar(4.7, RED, y=-0.3, x0=bx0) + b_t_lbl = Text("약 복용", font_size=24, color=BLUE_B, weight=BOLD) + b_t_lbl.move_to([bx0 - 0.3 - b_t_lbl.width / 2, 1.0, 0]) + b_c_lbl = Text("약 미복용", font_size=24, color=RED_B, weight=BOLD) + b_c_lbl.move_to([bx0 - 0.3 - b_c_lbl.width / 2, -0.3, 0]) + b_t_val = Text("5.6일", font_size=24, color=BLUE_B, weight=BOLD).next_to(b_treat, RIGHT, buff=0.15) + b_c_val = Text("4.7일", font_size=24, color=RED_B, weight=BOLD).next_to(b_ctrl, RIGHT, buff=0.15) + state_lbl = Text("재가중 전", font_size=26, color=GRAY_A, weight=BOLD).to_edge(UP, buff=1.15) 
+
+        play_at(163.95, FadeOut(dtitle, general, treat_calc, ctrl_calc, meaning),
+                FadeIn(state_lbl, b_t_lbl, b_c_lbl),
+                GrowFromEdge(b_treat, LEFT), GrowFromEdge(b_ctrl, LEFT),
+                FadeIn(b_t_val, b_c_val), run_time=0.8)
+        nt_treat = make_bar(5.0, BLUE, y=1.0, x0=bx0)
+        nt_ctrl = make_bar(6.0, RED, y=-0.3, x0=bx0)
+        state2 = Text("IPW 적용 후", font_size=26, color=ORANGE, weight=BOLD).to_edge(UP, buff=1.15)
+        eff = Text("약이 하루 더 빨리 회복 ✓", font_size=32, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.7)
+        play_at(166.80,
+                Transform(b_treat, nt_treat), Transform(b_ctrl, nt_ctrl),
+                ReplacementTransform(state_lbl, state2),
+                b_t_val.animate.become(Text("5일", font_size=24, color=BLUE_B, weight=BOLD).next_to(nt_treat, RIGHT, buff=0.15)),
+                b_c_val.animate.become(Text("6일", font_size=24, color=RED_B, weight=BOLD).next_to(nt_ctrl, RIGHT, buff=0.15)),
+                run_time=1.6)
+        play_at(169.20, FadeIn(eff, shift=UP * 0.1), run_time=0.5)
+        play_at(171.84, Indicate(eff, color=GREEN_B, scale_factor=1.1), run_time=0.8)  # chunk34
+        play_at(177.15, FadeOut(state2, b_treat, b_ctrl, b_t_lbl, b_c_lbl, b_t_val, b_c_val, eff), run_time=0.4)
+
+        # ============================================================
+        # Beat 9: conclusion, observational data → IPW → like a randomized experiment (chunk35~36, 177.35~189.69)
+        # ============================================================
+        obs_t = people_block(4, 1, cols=3, r=0.17, buff=0.14)
+        obs_c = people_block(1, 3, cols=2, r=0.17, buff=0.14)
+        obs = VGroup(obs_t, obs_c).arrange(RIGHT, buff=0.7).move_to(LEFT * 4.2 + UP * 0.2)
+        obs_lbl = Text("관찰 데이터", font_size=30, color=GRAY_A, weight=BOLD).next_to(obs, DOWN, buff=0.5)
+        shuffle = icon("arrows-shuffle", ORANGE, height=1.3).move_to(UP * 0.2)
+        shuffle_lbl = Text("IPW", font_size=34, color=ORANGE, weight=BOLD).next_to(shuffle, DOWN, buff=0.25)
+        arr = Arrow(LEFT * 1.7, RIGHT * 1.7, color=GRAY_B, stroke_width=5).move_to(UP * 0.2)
+        bal_t = people_block(3, 2, cols=3, r=0.17, buff=0.14)
+        bal_c = people_block(3, 2, cols=3, r=0.17, buff=0.14)
+        bal = VGroup(bal_t, bal_c).arrange(RIGHT, buff=0.7).move_to(RIGHT * 4.2 + UP * 0.2)
+        bal_lbl = Text("무작위 실험처럼", font_size=30, color=GREEN_B, weight=BOLD).next_to(bal, DOWN, buff=0.5)
+        concl = Text("무작위 실험처럼 균형 있게 비교할 수 있다", font_size=30, color=WHITE, weight=BOLD).to_edge(DOWN, buff=0.6)
+
+        play_at(177.55, FadeIn(obs, shift=RIGHT * 0.1), FadeIn(obs_lbl), run_time=0.7)  # chunk35
+        play_at(179.90, GrowArrow(arr), FadeIn(shuffle, shuffle_lbl), run_time=0.7)
+        play_at(182.10, FadeIn(bal, shift=RIGHT * 0.1), FadeIn(bal_lbl), run_time=0.8)
+        play_at(184.19, FadeIn(concl, shift=UP * 0.1), run_time=0.6)  # chunk36
+
+        go_to(190.40)
diff --git a/videos/what_is_ipw/ko/src/03_ipw_formula.py b/videos/what_is_ipw/ko/src/03_ipw_formula.py
new file mode 100644
index 0000000..e888b61
--- /dev/null
+++ b/videos/what_is_ipw/ko/src/03_ipw_formula.py
@@ -0,0 +1,137 @@
+from manim import *
+import numpy as np
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+SEVERE = "#b07cff"
+MILD = "#39d98a"
+
+
+class IPWFormula(Scene):
+    """
+    Scene 03: generalize the IPW weights as a formula.
+
+    Flow: define the symbols (T, X, e(X)) around a single person → weight = 1/probability (two cases) → combine into one expression (underbrace).
+    Feedback: ① show the generalization as a picture ② skip the symbol-definition slide and go straight in ③ make the weight box large ⑤ underbrace ⑥ remove the balance part ⑯ minimal emphasis.
+    Data/notation: e(X)=P(T=1|X). Treated w=1/e(X), untreated w=1/(1-e(X)).
+    Previous scene (02): naming inverse probability weighting (IPW) and reweighting. Next scene (04): estimating the probability with a model (propensity score).
+    Timing reference: build/audio/03_ipw_formula.timings.json (61.77s total)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def icon(name, color, height=1.0):
+            m = SVGMobject(f"{ICON}/{name}.svg")
+            m.set_stroke(color, width=3)
+            m.set_fill(color, opacity=0)
+            m.scale_to_fit_height(height)
+            return m
+
+        # ============================================================
+        # Beat 1~2: define the symbols around a single person (chunk 0~4, 0.00~20.15)
+        # Express the generalization as a 'picture of one person' and attach T, X, e(X) to that person.
+        # ============================================================
+        person = icon("user", WHITE, height=1.5).move_to(LEFT * 1.6 + UP * 2.0)
+
+        # Shift the symbol definitions well to the left (they previously looked skewed to the right)
+        LEFT_ANCHOR = -4.6
+
+        def define(sym, gloss, color, y):
+            m = MathTex(sym, color=color).scale(1.05)
+            colon = Text(":", font_size=30, color=GRAY_B)
+            g = Text(gloss, font_size=26, color=GRAY_A, weight=BOLD)
+            row = VGroup(m, colon, g).arrange(RIGHT, buff=0.28)
+            row.move_to([0, y, 0])
+            row.shift(RIGHT * (LEFT_ANCHOR - m.get_left()[0]))
+            return row, m
+
+        T_def, T_sym = define("T", "약 복용 ( 1 / 0 )", BLUE_B, 0.55)
+        X_def, X_sym = define("X", "환자 상태 ( 중증 / 경증 )", SEVERE, -0.55)
+        e_def, e_sym = define("e(X)=P(T=1|X)", "약을 먹을 확률", ORANGE, -1.65)
+
+        play_at(0.50, FadeIn(person, shift=DOWN * 0.1), run_time=0.5)
+        play_at(6.10, FadeIn(T_def, shift=RIGHT * 0.1), run_time=0.5)  # chunk3
+        play_at(11.40, FadeIn(X_def, shift=RIGHT * 0.1), run_time=0.5)  # chunk4
+        play_at(15.60, FadeIn(e_def, shift=RIGHT * 0.1), run_time=0.5)  # chunk5
+
+        # ============================================================
+        # Beat 3: weight = inverse of the probability, two cases (chunk6~9, 20.90~44.44)
+        # (the reciprocal-flip animation was removed because of a render hang issue)
+        # ============================================================
+        play_at(20.30,
FadeOut(person, T_def, X_def), run_time=0.4)  # chunk6
+        play_at(20.90, e_def.animate.scale(0.85).to_edge(UP, buff=0.5), run_time=0.5)  # move e(X) to the top
+        rule = Text("규칙: 확률의 역수를 가중치로", font_size=30, color=ORANGE, weight=BOLD).move_to(UP * 1.55)
+        play_at(22.30, FadeIn(rule), run_time=0.5)
+
+        # Weight expressions for the two cases
+        case_t = VGroup(
+            Text("약 복용", font_size=28, color=BLUE_B, weight=BOLD),
+            MathTex(r"(T=1)", color=BLUE_B).scale(0.65),
+            MathTex(r"w = \frac{1}{e(X)}", color=BLUE_B).scale(1.2),
+        ).arrange(RIGHT, buff=0.4).move_to(UP * 0.55)
+        case_c = VGroup(
+            Text("약 미복용", font_size=28, color=RED_B, weight=BOLD),
+            MathTex(r"(T=0)", color=RED_B).scale(0.65),
+            MathTex(r"w = \frac{1}{1 - e(X)}", color=RED_B).scale(1.2),
+        ).arrange(RIGHT, buff=0.4).move_to(DOWN * 1.1)
+        rare = Text("드물게 일어난 일일수록 가중치가 커짐", font_size=26, color=YELLOW, weight=BOLD).to_edge(DOWN, buff=0.55)
+
+        play_at(27.50, FadeIn(case_t, shift=UP * 0.08), run_time=0.6)  # chunk7
+        play_at(31.60, FadeIn(case_c, shift=UP * 0.08), run_time=0.5)  # chunk8
+        play_at(39.50, FadeIn(rare), run_time=0.45)  # chunk9
+        play_at(43.10, FadeOut(e_def, rule, case_t, case_c, rare), run_time=0.35)
+
+        # ============================================================
+        # Beat 4: combine into one expression (underbrace) (chunk 9~12, 43.75~61.77)
+        # ============================================================
+        combined = MathTex("w", "=", r"\frac{T}{e(X)}", "+", r"\frac{1-T}{1-e(X)}").scale(1.25)
+        combined.move_to(UP * 0.9)
+        combined[2].set_color(BLUE_B)
+        combined[4].set_color(RED_B)
+
+        brace_t = Brace(combined[2], DOWN, color=BLUE_B)
+        brace_c = Brace(combined[4], DOWN, color=RED_B)
+        lbl_t = Text("왼쪽은 약 복용", font_size=24, color=BLUE_B, weight=BOLD).next_to(brace_t, DOWN, buff=0.15)
+        lbl_c = Text("오른쪽은 약 미복용", font_size=24, color=RED_B, weight=BOLD).next_to(brace_c, DOWN, buff=0.15)
+        # chunk12: substitute numbers for T directly (terms that become 0 are shown in gray)
+        sub1 = MathTex(r"T=1:", r"\ w =", r"\frac{1}{e(X)}", r"+", r"\frac{0}{1-e(X)}").scale(0.82)
+        sub1[0].set_color(BLUE_B)
+        sub1[4].set_color(GRAY_D)
+        sub0 = MathTex(r"T=0:", r"\ w =", r"\frac{0}{e(X)}", r"+", r"\frac{1}{1-e(X)}").scale(0.82)
+        sub0[0].set_color(RED_B)
+        sub0[2].set_color(GRAY_D)
+        subs = VGroup(sub1, sub0).arrange(DOWN, buff=0.45).move_to(DOWN * 0.7)
+        # chunk13: everyone gets 'one' weight each (per individual)
+        one = icon("user", WHITE, height=0.55)
+        one_w = MathTex("w", color=ORANGE).scale(0.8)
+        one_arrow = Arrow(LEFT * 0.4, RIGHT * 0.4, color=GRAY_B, stroke_width=4, buff=0.05)
+        closing = VGroup(one, one_arrow, one_w,
+                         Text(" 자기 몫의 가중치 하나씩", font_size=26, color=ORANGE, weight=BOLD)
+                         ).arrange(RIGHT, buff=0.2).to_edge(DOWN, buff=0.5)
+
+        play_at(44.60, Write(combined), run_time=0.8)  # chunk10
+        # chunk11 (47.14): left/right terms
+        play_at(47.34, GrowFromCenter(brace_t), FadeIn(lbl_t),
+                GrowFromCenter(brace_c), FadeIn(lbl_c), run_time=0.6)
+        # chunk12 (52.80): if T=1 only the left term remains, if T=0 only the right term → substitute numbers
+        play_at(53.00, FadeOut(brace_t, brace_c, lbl_t, lbl_c),
+                combined.animate.to_edge(UP, buff=1.4).scale(0.9), run_time=0.5)
+        play_at(53.70, FadeIn(sub1, shift=UP * 0.08), run_time=0.45)
+        play_at(54.80, FadeIn(sub0, shift=UP * 0.08), run_time=0.45)
+        # chunk13 (57.21): one weight each, matching one's own case
+        play_at(57.41, FadeIn(closing, shift=UP * 0.1), run_time=0.5)
+
+        go_to(61.75)
diff --git a/videos/what_is_ipw/ko/src/04_propensity_score.py b/videos/what_is_ipw/ko/src/04_propensity_score.py
new file mode 100644
index 0000000..332a4a0
--- /dev/null
+++ b/videos/what_is_ipw/ko/src/04_propensity_score.py
@@ -0,0 +1,191 @@
+from manim import *
+import numpy as np
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+SEVERE = "#b07cff"
+MILD = "#39d98a"
+
+
+class PropensityScore(Scene):
+    """
+    Scene 04: estimate the probability with a propensity score and logistic regression.
+
+    Flow: (limits of counting by hand) → many variables → propensity score e(X) → logistic regression + sigmoid → estimated ê(X)
+    → apply IPW the same way → if the model is wrong (age omitted) → choosing the information matters.
+    Feedback: ⑦ "there is a problem" ⑧ variable icons ⑩ ML visualization + ê(X) ⑪ sigmoid ⑫ large box ⑬ regression equation + omitted age ⑯ minimal emphasis.
+    Previous scene (03): w = T/e(X) + (1-T)/(1-e(X)). Next scene (05): summary.
+    Timing reference: build/audio/04_propensity_score.timings.json (104.03s total)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def icon(name, color, height=1.0):
+            m = SVGMobject(f"{ICON}/{name}.svg")
+            m.set_stroke(color, width=3)
+            m.set_fill(color, opacity=0)
+            m.scale_to_fit_height(height)
+            return m
+
+        # ============================================================
+        # Beat 1: limits of counting by hand → posing the problem (chunk 0~1, 0.00~8.13)
+        # ============================================================
+        recap = VGroup(
+            VGroup(Text("중증", font_size=28, color=SEVERE, weight=BOLD),
+                   MathTex(r"\tfrac{5}{6}", color=SEVERE).scale(0.95)).arrange(RIGHT, buff=0.25),
+            VGroup(Text("경증", font_size=28, color=MILD, weight=BOLD),
+                   MathTex(r"\tfrac{2}{4}", color=MILD).scale(0.95)).arrange(RIGHT, buff=0.25),
+        ).arrange(RIGHT, buff=1.2).move_to(UP * 0.3)
+        recap_lbl = Text("직접 세어 구한 확률", font_size=26, color=GRAY_B, weight=BOLD).next_to(recap, UP, buff=0.5)
+        problem = Text("그런데, 문제가 있습니다.", font_size=40, color=YELLOW, weight=BOLD).move_to(DOWN * 1.6)
+
+        play_at(0.40, FadeIn(recap_lbl), FadeIn(recap), run_time=0.5)  # chunk1
+        play_at(5.95, FadeIn(problem, shift=UP * 0.1), run_time=0.5)  # chunk2
+        play_at(8.00, FadeOut(recap_lbl, recap, problem), run_time=0.3)
+
+        # ============================================================
+        # Beat 2: in reality there are many variables (icons) (chunk 2~3, 8.13~19.50)
+        # ============================================================
+        def var_icon(name, label, color):
+            ic = icon(name, color, height=0.95)
+            lb = Text(label, font_size=24, color=color, weight=BOLD)
+            return VGroup(ic, lb).arrange(DOWN, buff=0.25)
+
+        v1 = var_icon("gender-bigender", "성별", BLUE_B)
+        v2 =
var_icon("calendar", "나이", TEAL_B)
+        v3 = var_icon("weight", "체중", GOLD_B)
+        v4 = var_icon("stethoscope", "기저질환", RED_B)
+        vars_row = VGroup(v1, v2, v3, v4).arrange(RIGHT, buff=1.0).move_to(UP * 0.3)
+        hard = Text("교란변수가 많으면 손으로 셀 수 없다", font_size=30, color=GRAY_A, weight=BOLD).move_to(DOWN * 2.2)
+
+        play_at(8.33, LaggedStartMap(FadeIn, vars_row, lag_ratio=0.25), run_time=1.6)  # chunk3
+        play_at(13.81, FadeIn(hard, shift=UP * 0.1), run_time=0.5)  # chunk4
+        play_at(18.56, FadeOut(vars_row, hard), run_time=0.3)  # chunk5
+
+        # ============================================================
+        # Beat 3: define the propensity score: e(X) first, the name 'propensity score' follows later in sync with the narration (chunk6~7)
+        # ============================================================
+        ps_title = Text("성향점수", font_size=44, color=ORANGE, weight=BOLD).move_to(UP * 1.3)
+        ps_eq = MathTex(r"e(X) = P(\,T=1 \mid X\,)", color=WHITE).scale(1.0).move_to(UP * 0.1)
+        ps_gloss = Text("환자 정보를 고려한, 약을 먹을 확률", font_size=28, color=GRAY_A, weight=BOLD).move_to(DOWN * 1.1)
+
+        play_at(21.40, Write(ps_eq), run_time=0.7)  # chunk6: e(X) first
+        play_at(23.40, FadeIn(ps_gloss), run_time=0.5)
+        play_at(25.60, FadeIn(ps_title, shift=DOWN * 0.1), run_time=0.5)  # 'this is called the propensity score'
+        play_at(27.23, Indicate(ps_eq, color=ORANGE, scale_factor=1.1), run_time=0.7)  # chunk7
+        play_at(30.16, FadeOut(ps_title, ps_eq, ps_gloss), run_time=0.3)
+
+        # ============================================================
+        # Beat 4: model flow + sigmoid (chunk 8~13, 32.60~67.06)
+        # Patient info and the probability of taking the drug come first; 'logistic regression' is filled in later.
+        # ============================================================
+        in_node = Text("환자 정보", font_size=26, color=GRAY_A, weight=BOLD).move_to(LEFT * 4.3)
+        box_rect = RoundedRectangle(width=3.4, height=1.3, corner_radius=0.2, stroke_color=ORANGE, stroke_width=2.5)
+        out_node = Text("약 먹을 확률", font_size=26, color=WHITE, weight=BOLD).move_to(RIGHT * 4.3)
+        flow = VGroup(in_node, box_rect, out_node).arrange(RIGHT, buff=0.9).move_to(UP * 2.2)
+        ar1 = Arrow(in_node.get_right(), box_rect.get_left(), color=GRAY_B, stroke_width=4, buff=0.2)
+        ar2 = Arrow(box_rect.get_right(), out_node.get_left(), color=GRAY_B, stroke_width=4, buff=0.2)
+        box_q = Text("?", font_size=40, color=GRAY_B, weight=BOLD).move_to(box_rect.get_center())
+        box_lbl = Text("로지스틱 회귀", font_size=26, color=ORANGE, weight=BOLD).move_to(box_rect.get_center())
+
+        # chunk8 (30.46): patient info → ? → probability of taking the drug (the model is left blank for now)
+        play_at(30.66, FadeIn(in_node, out_node), Create(box_rect), FadeIn(box_q),
+                GrowArrow(ar1), GrowArrow(ar2), run_time=0.7)
+        # chunk9 (35.76): that model is 'logistic regression'
+        play_at(35.96, ReplacementTransform(box_q, box_lbl), run_time=0.6)
+
+        # (b) sigmoid graph
+        ax = Axes(x_range=[-6, 6, 3], y_range=[0, 1, 0.5], x_length=5.2, y_length=2.3,
+                  axis_config={"include_tip": False, "stroke_color": GRAY_B},
+                  y_axis_config={"include_numbers": True, "font_size": 20}).move_to(DOWN * 1.0)
+        sig = ax.plot(lambda x: 1 / (1 + np.exp(-x)), x_range=[-6, 6], color=ORANGE, stroke_width=4)
+        line0 = DashedLine(ax.c2p(-6, 0), ax.c2p(6, 0), color=GRAY_D, stroke_width=1.5)
+        line1 = DashedLine(ax.c2p(-6, 1), ax.c2p(6, 1), color=GRAY_D, stroke_width=1.5)
+        sig_name = Text("sigmoid", font_size=26, color=ORANGE, weight=BOLD, slant=ITALIC).next_to(ax, UP, buff=0.12)
+
+        # chunk10 (38.41): sex and symptom info → the sigmoid maps it into 0~1
+        play_at(38.61, Create(ax), FadeIn(line0, line1, sig_name), run_time=0.8)
+        play_at(39.80, Create(sig), run_time=2.0)
+
+        # chunk11 (48.39): 0 when the input is small, 1 when it is large; emphasize both ends (labels placed outside the curve)
+        lo_dot = Dot(ax.c2p(-5, 1 / (1 + np.exp(5))),
color=BLUE_B, radius=0.09)
+        hi_dot = Dot(ax.c2p(5, 1 / (1 + np.exp(-5))), color=GREEN_B, radius=0.09)
+        lo_lbl = Text("입력 작음 → 0", font_size=16, color=BLUE_B, weight=BOLD).move_to(ax.c2p(-3.0, 0)).shift(DOWN * 0.50)
+        hi_lbl = Text("입력 큼 → 1", font_size=16, color=GREEN_B, weight=BOLD).move_to(ax.c2p(3.0, 1)).shift(UP * 0.40)
+        play_at(45.80, FadeIn(lo_dot), Flash(lo_dot, color=BLUE_B), FadeIn(lo_lbl, shift=UP * 0.08), run_time=0.7)  # chunk11
+        play_at(47.80, FadeIn(hi_dot), Flash(hi_dot, color=GREEN_B), FadeIn(hi_lbl, shift=DOWN * 0.08), run_time=0.7)
+
+        out_lbl = Text("출력은 항상 0 ~ 1 = 확률", font_size=26, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.4)
+        play_at(51.28, FadeIn(out_lbl, shift=UP * 0.1), run_time=0.5)  # chunk12
+        # chunk13 (57.21): estimated propensity score ê(X)
+        ehat = MathTex(r"\hat{e}(X)", color=ORANGE).scale(1.0).next_to(out_node, DOWN, buff=0.25)
+        ehat_lbl = Text("추정된 성향점수", font_size=22, color=GRAY_A, weight=BOLD).next_to(ehat, DOWN, buff=0.12)
+        play_at(57.41, FadeIn(ehat), FadeIn(ehat_lbl), run_time=0.6)  # chunk13
+        play_at(61.00, FadeOut(in_node, out_node, box_rect, box_lbl, ar1, ar2, ax, sig, line0, line1,
+                               sig_name, lo_dot, hi_dot, lo_lbl, hi_lbl, out_lbl, ehat, ehat_lbl), run_time=0.4)
+
+        # ============================================================
+        # Beat 5: IPW with the estimated propensity score (large box) (chunk 13, 63.95~72.26)
+        # ============================================================
+        big_box = RoundedRectangle(width=9.0, height=2.6, corner_radius=0.25,
+                                   stroke_color=GRAY_B, stroke_width=2, fill_opacity=0).move_to(UP * 0.1)
+        ipw_eq = MathTex(r"\hat{w} = \frac{T}{\hat{e}(X)} + \frac{1-T}{1-\hat{e}(X)}", color=WHITE).scale(1.15).move_to(UP * 0.1)
+        same = Text("그다음은 앞에서 본 그대로..", font_size=28, color=GRAY_A, weight=BOLD).next_to(big_box, UP, buff=0.4)
+
+        play_at(61.55, FadeIn(same), Create(big_box), run_time=0.6)  # chunk14
+        play_at(63.50, Write(ipw_eq), run_time=1.0)
+        play_at(69.00, FadeOut(same, big_box, ipw_eq), run_time=0.35)
+
+        #
============================================================
+        # Beat 6: if the model is wrong: regression equation with age omitted + graph (chunk 14~18, 72.26~98.50)
+        # ============================================================
+        warn = Text("성향점수를 잘 추정했을 때만 통한다", font_size=30, color=YELLOW, weight=BOLD).to_edge(UP, buff=0.55)
+        # Regression equation (age term missing): the fitted model equation goes on top, and the omitted information is separated below to reduce competition for attention.
+        reg = MathTex(r"\text{logit}\,\hat{e}(X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2", color=WHITE).scale(0.88).move_to(UP * 1.6)
+        reg_legend = Text("x₁ : 성별 x₂ : 증상", font_size=22, color=GRAY_A, weight=BOLD).next_to(reg, DOWN, buff=0.25)
+        miss_expr = VGroup(
+            MathTex(r"x_3", color=RED_B).scale(0.76),
+            Text("= 나이", font_size=22, color=RED_B, weight=BOLD),
+        ).arrange(RIGHT, buff=0.08)
+        miss = VGroup(
+            Text("누락된 변수", font_size=22, color=RED_B, weight=BOLD),
+            miss_expr,
+        ).arrange(RIGHT, buff=0.25).next_to(reg_legend, DOWN, buff=0.28)
+        miss_cross = Cross(miss_expr, stroke_color=RED, stroke_width=4)
+
+        # Regression graph: points + a badly fitted straight line
+        ax2 = Axes(x_range=[0, 6, 2], y_range=[0, 6, 2], x_length=4.6, y_length=2.55,
+                   axis_config={"include_tip": False, "stroke_color": GRAY_B, "include_numbers": False}).move_to(DOWN * 1.55)
+        # The real pattern is curved like a hill (age effect), and the straight line cannot fit it at all
+        pts_x = [0.5, 1.3, 2.1, 3.0, 3.9, 4.7, 5.4]
+        pts_y = [1.0, 3.0, 5.0, 5.3, 4.6, 2.4, 1.0]
+        dots = VGroup(*[Dot(ax2.c2p(x, y), color=TEAL_B, radius=0.08) for x, y in zip(pts_x, pts_y)])
+        bad_line = Line(ax2.c2p(0, 1.3), ax2.c2p(6, 4.6), color=RED_B, stroke_width=3)
+        graph_lbl = Text("실제 확률과 크게 어긋남", font_size=24, color=RED_B, weight=BOLD).next_to(ax2, RIGHT, buff=0.3)
+
+        play_at(69.67, FadeIn(warn), run_time=0.4)  # chunk15~16
+        play_at(75.71, FadeIn(reg), FadeIn(reg_legend), run_time=0.5)  # chunk17 (sex and symptoms only)
+        play_at(80.50, FadeIn(miss), Create(miss_cross), run_time=0.7)  # chunk17 (age omitted)
+        play_at(85.50, Create(ax2), LaggedStartMap(FadeIn, dots, lag_ratio=0.1), run_time=0.9)
+        play_at(91.08, Create(bad_line), FadeIn(graph_lbl), run_time=0.7)  # chunk18 (probability
mismatch)
+
+        # ============================================================
+        # Beat 7: wrap-up, keep the existing screen without adding text and only emphasize the problem (chunk19, 96.83~)
+        # ============================================================
+        play_at(97.03, Indicate(VGroup(ax2, dots, bad_line, graph_lbl), color=RED_B, scale_factor=1.05), run_time=1.0)
+        play_at(99.30, Indicate(miss, color=RED_B, scale_factor=1.25), run_time=0.9)
+
+        go_to(103.80)
diff --git a/videos/what_is_ipw/ko/src/05_summary.py b/videos/what_is_ipw/ko/src/05_summary.py
new file mode 100644
index 0000000..6ff4c96
--- /dev/null
+++ b/videos/what_is_ipw/ko/src/05_summary.py
@@ -0,0 +1,165 @@
+from manim import *
+import numpy as np
+
+ICON = "/Users/jhkim/Desktop/personal_study/causal/causal-studio/videos/assets/tabler-icons/icons/outline"
+
+SEVERE = "#b07cff"
+MILD = "#39d98a"
+
+
+class IPWSummary(Scene):
+    """
+    Scene 05: summary.
+
+    Flow: problem (confounding) → core of IPW (make the two groups equal with '=') → real data (logistic regression, ê(X)) → strengths/limitations → takeaway.
+    Feedback: ⑭ show 'make the two groups equal' with an '=' sign ⑮ mention logistic regression and ê(X) ⑯ minimal emphasis.
+    Data: true effect = one day (treated 5 < untreated 6).
+    Timing reference: build/audio/05_summary.timings.json (71.29s total)
+    """
+
+    def construct(self):
+        self.camera.background_color = "#0a0a0a"
+        self.t = 0.0
+
+        def go_to(target_time):
+            dt = target_time - self.t
+            if dt > 0:
+                self.wait(dt)
+                self.t += dt
+
+        def play_at(target_time, *anims, run_time=0.5):
+            go_to(target_time)
+            self.play(*anims, run_time=run_time)
+            self.t += run_time
+
+        def dot(color, r=0.13):
+            return Circle(radius=r, stroke_color=color, stroke_width=2).set_fill(color, opacity=0.9)
+
+        def people_block(n_sev, n_mild, cols=5, r=0.13, buff=0.12):
+            g = VGroup(*[dot(SEVERE, r) for _ in range(n_sev)], *[dot(MILD, r) for _ in range(n_mild)])
+            rows = int(np.ceil(len(g) / cols))
+            g.arrange_in_grid(rows=rows, cols=cols, buff=buff)
+            return g
+
+        def badge(m, txt, color=ORANGE, fs=14):
+            return Text(txt, font_size=fs, color=color, weight=BOLD).next_to(m, DOWN, buff=0.04)
+
+        # ============================================================
+        # Beat 1: problem recap, confounding (chunk1~5, 0.00~21.78)
+        # ============================================================
+        head = Text("정리해 보면", font_size=32, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.55)
+        legend = VGroup(
+            dot(SEVERE), Text("중증", font_size=24, color=SEVERE, weight=BOLD),
+            dot(MILD), Text("경증", font_size=24, color=MILD, weight=BOLD),
+        ).arrange(RIGHT, buff=0.25).next_to(head, DOWN, buff=0.3)
+        ques = Text("약은 정말 회복을 앞당길까?", font_size=32, color=YELLOW, weight=BOLD).move_to(UP * 0.9)
+        treat = people_block(5, 2, cols=4).move_to(LEFT * 3.3 + DOWN * 0.3)
+        treat_l = Text("약 복용", font_size=26, color=BLUE_B, weight=BOLD).next_to(treat, UP, buff=0.3)
+        ctrl = people_block(1, 2, cols=3).move_to(RIGHT * 3.3 + DOWN * 0.3)
+        ctrl_l = Text("약 미복용", font_size=26, color=RED_B, weight=BOLD).next_to(ctrl, UP, buff=0.3)
+        confound = Text("교란 변수 (중증/경증)가 진짜 효과를 가린다", font_size=28, color=PURPLE_A, weight=BOLD).move_to(DOWN * 2.5)
+
+        play_at(0.40, FadeIn(head), run_time=0.4)
+        # chunk2 (2.97): show the severe/mild markers (legend)
+        play_at(2.97,
FadeIn(legend, shift=DOWN * 0.05), FadeIn(ques), run_time=0.5)
+        # chunk3 (6.36): a plain comparison did not work → two groups
+        play_at(6.36, FadeOut(ques), FadeIn(treat_l, treat), FadeIn(ctrl_l, ctrl), run_time=0.6)
+        # chunk4 (10.36): severe patients in the treated group, mild patients in the untreated group; emphasize both sides
+        treat_box = SurroundingRectangle(VGroup(treat_l, treat), color=BLUE_B, buff=0.2, corner_radius=0.12)
+        ctrl_box = SurroundingRectangle(VGroup(ctrl_l, ctrl), color=RED_B, buff=0.2, corner_radius=0.12)
+        play_at(10.56, Create(treat_box),
+                *[Indicate(treat[i], color=SEVERE, scale_factor=1.4) for i in range(5)], run_time=1.2)
+        play_at(12.90, ReplacementTransform(treat_box, ctrl_box),
+                *[Indicate(ctrl[i], color=MILD, scale_factor=1.4) for i in range(1, 3)], run_time=1.2)
+        play_at(14.90, FadeOut(ctrl_box), run_time=0.3)
+        play_at(15.76, FadeIn(confound, shift=UP * 0.1), run_time=0.5)  # chunk5
+        play_at(20.40, FadeOut(head, legend, treat, treat_l, ctrl, ctrl_l, confound), run_time=0.35)
+
+        # ============================================================
+        # Beat 2: IPW, multiply by weights to make the two groups equal + true effect (chunk6~8, 21.78~37.15)
+        # ============================================================
+        ipw_head = Text("IPW : 가중치를 곱해 두 그룹을 같게", font_size=32, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.5)
+        # Original composition (the groups differ)
+        bt0 = people_block(5, 2, cols=4, buff=0.32).move_to(LEFT * 3.7 + UP * 0.75)
+        bc0 = people_block(1, 2, cols=3, buff=0.32).move_to(RIGHT * 3.7 + UP * 0.75)
+        bt_l = Text("복용", font_size=24, color=BLUE_B, weight=BOLD).next_to(bt0, UP, buff=0.25)
+        bc_l = Text("미복용", font_size=24, color=RED_B, weight=BOLD).next_to(bc0, UP, buff=0.25)
+        bc_l.match_y(bt_l)  # align the heights of the '복용'/'미복용' labels (same line even if the block heights differ)
+        # Per-person ×weight badges
+        bt_badges = VGroup(*[badge(bt0[i], "×1.2") for i in range(5)], *[badge(bt0[i], "×2") for i in range(5, 7)])
+        bc_badges = VGroup(badge(bc0[0], "×6", YELLOW, 15), *[badge(bc0[i], "×2") for i in range(1, 3)])
+        mul_note = Text("× 가중치", font_size=26, color=ORANGE, weight=BOLD).move_to(UP * 0.75)
+        # After applying the weights (same composition)
+        bt1 =
people_block(6, 4, cols=5).move_to(LEFT * 3.7 + UP * 0.75)
+        bc1 = people_block(6, 4, cols=5).move_to(RIGHT * 3.7 + UP * 0.75)
+        approx = MathTex(r"\approx", color=GREEN_B).scale(1.6).move_to(UP * 0.75)
+        UNIT = 0.5
+        eff_t = VGroup(Text("복용", font_size=22, color=BLUE_B, weight=BOLD),
+                       RoundedRectangle(width=5 * UNIT, height=0.36, corner_radius=0.06, stroke_color=BLUE,
+                                        stroke_width=2, fill_color=BLUE, fill_opacity=0.5),
+                       Text("5일", font_size=22, color=BLUE_B, weight=BOLD)).arrange(RIGHT, buff=0.2).move_to(DOWN * 1.5 + LEFT * 0.4)
+        eff_c = VGroup(Text("미복용", font_size=22, color=RED_B, weight=BOLD),
+                       RoundedRectangle(width=6 * UNIT, height=0.36, corner_radius=0.06, stroke_color=RED,
+                                        stroke_width=2, fill_color=RED, fill_opacity=0.5),
+                       Text("6일", font_size=22, color=RED_B, weight=BOLD)).arrange(RIGHT, buff=0.2).move_to(DOWN * 2.3 + LEFT * 0.4)
+        eff_lbl = Text("약의 진짜 효과: 하루 단축 ✓", font_size=26, color=GREEN_B, weight=BOLD).to_edge(DOWN, buff=0.4)
+
+        play_at(20.91, FadeIn(ipw_head), FadeIn(bt_l, bt0, bc_l, bc0), run_time=0.5)  # chunk6
+        # chunk7 (23.96): per-person ×weight → the two groups end up with the same composition
+        play_at(24.16, LaggedStartMap(FadeIn, bt_badges, shift=DOWN * 0.05, lag_ratio=0.05),
+                LaggedStartMap(FadeIn, bc_badges, shift=DOWN * 0.05, lag_ratio=0.05), FadeIn(mul_note), run_time=1.0)
+        play_at(26.00, FadeOut(bt_badges, bc_badges, mul_note),
+                ReplacementTransform(bt0, bt1), ReplacementTransform(bc0, bc1), FadeIn(approx), run_time=1.0)
+        # chunk8 (29.91): true effect
+        play_at(30.11, FadeIn(eff_t, eff_c), run_time=0.5)
+        play_at(32.00, FadeIn(eff_lbl, shift=UP * 0.1), run_time=0.5)
+        play_at(36.90, FadeOut(ipw_head, bt1, bt_l, bc1, bc_l, approx, eff_t, eff_c, eff_lbl), run_time=0.35)
+
+        # ============================================================
+        # Beat 3: real data, sigmoid figure + per-person weights (chunk9~13, 37.15~61.95)
+        # Reduce the text and fill the screen with figures.
+        # ============================================================
+        real_head = Text("실제 데이터에서는", font_size=30, color=GRAY_A, weight=BOLD).to_edge(UP, buff=0.5)
+        # Left: model (sigmoid) / right: per-person weights, separated left and right so they do not overlap
+        ax = Axes(x_range=[-6, 6, 3], y_range=[0, 1, 1], x_length=4.4, y_length=2.1,
+                  axis_config={"include_tip": False, "stroke_color": GRAY_B, "include_numbers": False}).move_to(LEFT * 3.7 + UP * 0.25)
+        sig = ax.plot(lambda x: 1 / (1 + np.exp(-x)), x_range=[-6, 6], color=ORANGE, stroke_width=4)
+        sig_name = Text("로지스틱 회귀", font_size=24, color=ORANGE, weight=BOLD).next_to(ax, UP, buff=0.15)
+        ehat = MathTex(r"\hat{e}(X)", color=ORANGE).scale(0.95).next_to(ax, DOWN, buff=0.3)
+        ehat_lbl = Text("추정된 성향점수", font_size=20, color=GRAY_A, weight=BOLD).next_to(ehat, DOWN, buff=0.12)
+
+        play_at(37.35, FadeIn(real_head), Create(ax), Create(sig), FadeIn(sig_name), run_time=0.9)  # chunk9
+        play_at(42.83, Write(ehat), FadeIn(ehat_lbl), run_time=0.6)  # chunk10
+        # chunk11 (46.35): assign a weight to each individual (right side)
+        ppl = VGroup(dot(SEVERE), dot(SEVERE), dot(MILD), dot(SEVERE), dot(MILD)).arrange(RIGHT, buff=0.45).move_to(RIGHT * 3.3 + UP * 0.6)
+        ppl_badges = VGroup(*[badge(ppl[i], "×w", ORANGE, 16) for i in range(len(ppl))])
+        indiv_lbl = Text("각 개인별로 가중치", font_size=24, color=ORANGE, weight=BOLD).next_to(ppl, DOWN, buff=0.45)
+        play_at(46.13, FadeIn(ppl), LaggedStartMap(FadeIn, ppl_badges, shift=DOWN * 0.05, lag_ratio=0.08),
+                FadeIn(indiv_lbl), run_time=1.0)  # chunk11
+        # chunk12 (50.81): make the composition of the two groups equal
+        balanced = Text("두 그룹 구성을 똑같이 ✓", font_size=24, color=GREEN_B, weight=BOLD).next_to(indiv_lbl, DOWN, buff=0.3)
+        play_at(51.01, FadeIn(balanced, shift=UP * 0.08), run_time=0.5)
+        # chunk13 (55.45): be careful when the model is inaccurate
+        caution = Text("모델이 부정확하면 IPW도 흔들린다 → 정보 선택이 중요", font_size=26, color=YELLOW, weight=BOLD).to_edge(DOWN, buff=0.5)
+        play_at(55.65, FadeIn(caution, shift=UP * 0.1), run_time=0.5)
+        play_at(61.45, FadeOut(real_head, ax, sig, sig_name, ehat, ehat_lbl, ppl, ppl_badges, indiv_lbl,
balanced, caution), run_time=0.4) + + # ============================================================ + # Beat 4 — 장점 / 한계 + takeaway(같은 화면에서 마무리) (chunk14~16, 61.95~81.45) + # ============================================================ + sig_title = Text("IPW의 의의", font_size=44, color=ORANGE, weight=BOLD).to_edge(UP, buff=0.8) + merit = VGroup( + Text("✓", font_size=34, color=GREEN_B, weight=BOLD), + Text("실험 없이 관찰 데이터만으로 인과효과 추정", font_size=30, color=GREEN_B, weight=BOLD), + ).arrange(RIGHT, buff=0.35).move_to(UP * 0.4) + limit = VGroup( + Text("△", font_size=32, color=GRAY_B, weight=BOLD), + Text("측정하지 못한 교란 변수가 있으면 한계", font_size=30, color=GRAY_A, weight=BOLD), + ).arrange(RIGHT, buff=0.35).move_to(DOWN * 0.8) + + play_at(61.92, FadeIn(sig_title, shift=DOWN * 0.1), FadeIn(merit, shift=UP * 0.1), run_time=0.6) # chunk14 + play_at(69.49, FadeIn(limit, shift=UP * 0.1), run_time=0.5) # chunk15 + # chunk16 (73.33): 별도 텍스트 없이 제목을 한 번 강조하며 마무리 + play_at(73.53, Indicate(sig_title, color=ORANGE, scale_factor=1.12), run_time=1.0) + + go_to(80.30) diff --git a/videos/what_is_ipw/ko/src/scripts/00_intro.txt b/videos/what_is_ipw/ko/src/scripts/00_intro.txt new file mode 100644 index 0000000..c00955c --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/00_intro.txt @@ -0,0 +1,3 @@ +오늘의 주제는, 역확률 가중치. 아이피더블유입니다. + +관찰 데이터만으로 진짜 원인을 가려내는 방법을 함께 알아보겠습니다. diff --git a/videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt b/videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt new file mode 100644 index 0000000..39cf930 --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt @@ -0,0 +1,33 @@ +예시 하나를 보겠습니다. + +감기약을 먹고 며칠 만에 나았다면, 우리는 자연스럽게 약 덕분이라고 생각합니다. + +하지만 정말 약 때문이었을까요? + +오늘은 우리가 본 차이가 진짜 그 원인 때문인지 가려내는 방법을 이야기해 보겠습니다. + +사실 약을 먹으면 더 빨리 나을 것 같죠. + +확인하는 방법도 간단해 보입니다. 약을 먹은 사람과 먹지 않은 사람의 회복 기간을 비교하면 되니까요. + +한 병원의 데이터를 보겠습니다. + +약을 먹은 사람은 평균 5쩜 6일 만에 회복했습니다. 그런데 약을 먹지 않은 사람은 평균 4쩜 7일만에 회복한거죠. + +약을 먹은 쪽이 오히려 영쩜 9일 더 오래 아팠던 겁니다. 
이대로라면, 약이 해로운 것처럼 보이죠. + +그런데 정말 그럴까요? 바로 여기에 함정이 있습니다. + +이번엔 환자 구분을 중증과 경증으로 나눠서 보겠습니다. + +먼저 중증 환자입니다. 약을 먹었을 때는 회복까지 평균 7일, 먹지 않았을 때는 회복까지 평균 8일이 걸렸습니다. 약이 회복을 하루 앞당긴 셈입니다. + +경증 환자도 볼까요. 약을 먹었을 때는 회복까지 평균 2일, 먹지 않았을 때는 회복까지 평균 3일이 걸렸습니다. 여기서도 약이 하루를 줄여 줬습니다. + +정말 이상하지 않나요? 중증에서도, 경증에서도 약은 분명히 회복을 앞당겨 줬습니다. + +그런데 둘을 합쳐 전체로 보면, 오히려 약이 해로운 것처럼 결과가 뒤집혀 버립니다. + +도대체 왜 이런 일이 벌어질까요? + +이 수수께끼를 푸는 방법으로, 오늘은 아이피더블유를 알아보겠습니다. diff --git a/videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt b/videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt new file mode 100644 index 0000000..38d2995 --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt @@ -0,0 +1,71 @@ +방금 본 문제를 다시 정리해 봅시다. + +중증 환자에서도, 경증 환자에서도 약은 분명히 회복을 하루씩 앞당겨 줬습니다. + +그런데 전체로 보면, 오히려 약을 먹은 쪽이 회복까지 영쩜 9일이 더 걸립니다. 효과가 정반대로 뒤집힌 거죠. + +나눠서 보면 약이 분명히 좋은데, 전체로 합치면 거꾸로 나빠 보이는 이런 현상을, 심슨의 역설이라고 부릅니다. + +그렇다면 왜 이런 일이 벌어질까요? + +환자 구성을 한 번 살펴보겠습니다. + +약을 먹은 그룹에는 원래 회복이 느린 중증 환자가 잔뜩 모여 있었습니다. + +반대로 약을 먹지 않은 그룹에는 회복이 빠른 경증 환자가 많았습니다. + +처음부터 두 그룹의 환자 구성이 완전히 달랐던 겁니다. + +그래서 약을 먹은 그룹이 전체적으로 회복이 더뎌 보였고, 약이 거꾸로 해로운 것처럼 뒤집힌 것입니다. + +이렇게 약을 먹을지 여부와 회복 기간에 동시에 영향을 주는 변수를, 교란 변수라고 부릅니다. + +즉, 중증 경증 여부가 교란 변수가 되는거죠. + +그럼 어떻게 해야 할까요? + +핵심 아이디어는 단순합니다. + +두 그룹의 중증과 경증 비율이 처음부터 똑같았다면 어땠을지를 보는겁니다. + +데이터 자체를 바꾸지는 않습니다. 대신 사람마다 가중치를 부여할 것입니다. + +먼저 각 환자가 약을 먹을 확률을 구해 봅시다. + +환자 한 명을 동그라미 하나로 표시하고, 약을 먹은 사람만 색을 칠해 보겠습니다. + +중증 환자 여섯 명 중 다섯 명이 약을 먹었습니다. 그러니 중증인 사람이 약을 먹을 확률은, 여섯 명 중 다섯 명, 약 83%입니다. + +경증 환자는 네 명 중 두 명이 먹었으니, 그 확률은 네 명 중 두 명, 즉 50%입니다. + +이제 이 확률의 역수를 가중치로 사용합니다. + +흔하게 보이는 경우에는 작은 가중치를, 드물게 보이는 경우에는 큰 가중치를 주는거죠. + +예를 들어, 중증이면서 약을 먹은 사람에게는 1쩜 이배, 약을 먹지 않은 사람에게는 6배의 가중치를 줍니다. + +경증은 약을 먹은 쪽과 안 먹은 쪽이 비슷하니, 양쪽 모두 2배씩 곱해주면 되겠죠. + +이렇게 가중치를 적용하면, 두 그룹의 중증과 경증 비율이 똑같아집니다. + +드물어서 큰 가중치를 받은 사람은 여러 명을 대표하게 되고, 흔해서 작은 가중치를 받은 사람은 그 영향력이 줄어드는거죠. + +그러면 마치 처음부터 환자를 공정하게 나눠 준 것처럼, 데이터가 균형을 이루게 됩니다. + +이 방법을 역확률 가중치, 즉 아이피더블유라고 부릅니다. + +이제 균형 잡힌 데이터에서 평균만 다시 비교하면 됩니다. 
+ +각 사람의 입원 일수에 가중치를 곱해 모두 더한 다음, 가중치의 합으로 나눠 줍니다. + +이렇게 구하면 약을 먹은 그룹의 가중 평균은 5일. + +약을 먹지 않은 그룹의 가중 평균은 6일이 됩니다. + +이제 약을 먹은 쪽이 하루 더 빨리 회복하게 됩니다. 해로워 보이던 약이, 사실은 회복을 하루 앞당기고 있었던 겁니다. + +중증과 경증의 영향을 걷어 내고 나서야, 가려져 있던 약의 진짜 효과가 드러난 것입니다. + +이렇게 아이피더블유는, 마치 처음부터 약을 무작위로 나눠 준 실험처럼 데이터의 비중을 바꿔 줍니다. + +이제 무작위 실험처럼 데이터를 균형있게 비교할 수 있게 된겁니다. \ No newline at end of file diff --git a/videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt b/videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt new file mode 100644 index 0000000..a3a62e3 --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt @@ -0,0 +1,25 @@ +방금 한 일을, 이번엔 수식으로 일반화해 보겠습니다. + +기호 몇 개만 같이 확인해보죠. + +T는 약을 먹었는지를 나타냅니다. 먹었으면 1, 먹지 않았으면 0입니다. + +X는 환자의 상태, 즉 중증인지 경증인지를 나타냅니다. + +그리고 X 상태인 사람이 약을 먹을 확률을 다음과 같이 쓰겠습니다. + +이제 가중치를 정해 봅시다. 실제로 관찰된 일이 일어날 확률, 그 역수를 가중치로 줍니다. + +약을 먹은 사람에게는, 약을 먹을 확률의 역수를 줍니다. + +약을 먹지 않은 사람에게는, 약을 먹지 않을 확률, 즉 1에서 약을 먹을 확률을 뺀 값의 역수를 줍니다. + +드물게 일어난 일일수록 확률이 작으니, 그 역수인 가중치는 커지는거죠. + +이 두 경우를 하나의 식으로 합쳐볼까요? + +왼쪽 항 은 약을 먹은 사람을 위한 것이고, 오른쪽 항 은 약을 먹지 않은 사람을 위한 것입니다. + +T가 1이면 왼쪽 항만 남고, T가 0이면 오른쪽 항만 남게 되는거죠. + +이렇게 자기 경우에 맞는 가중치가 하나씩 주어지는 것입니다. diff --git a/videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt b/videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt new file mode 100644 index 0000000..172fa29 --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt @@ -0,0 +1,39 @@ +지금까지는 확률을 손으로 직접 셀 수 있었습니다. 중증과 경증, 두 경우뿐이었으니까요. + +그런데 여기서 문제가 하나 생깁니다. + +현실에서는 고려할 것이 훨씬 많습니다. 성별, 나이, 체중, 기저질환까지. + +이렇게 교란변수가 많아지면, 손으로 하나하나 세어 확률을 구하기가 어려워집니다. + +그래서 접근을 바꿉니다. + +각 환자의 정보를 고려했을 때, 그 사람이 약을 먹을 확률. 이것을 성향점수라고 부릅니다. + +앞에서 썼던 그 확률에, 이름을 붙인 것이죠. + +그럼 이 확률은 어떻게 구할까요? 데이터로부터 통계적 모델을 학습시켜 추정합니다. + +대표적인 방법이 로지스틱 회귀입니다. + +성별이나 증상 같은 환자 정보를 넣으면, 시그모이드라는 곡선이 그 값을 0과 1 사이로 바꿔 줍니다. + +이 곡선은 입력이 아주 작을 때는 0에 붙고, 아주 클 때는 1에 가까워집니다. + +그래서 어떤 정보가 들어와도 출력은 항상 0과 1 사이, 그러니까 확률이 되는 겁니다. + +이렇게 모델이 추정한 확률을, 추정된 성향점수라고 부릅니다. + +추정된 성향점수를 구하고 나면, 그다음은 똑같습니다. 
앞에서 배운 그대로, 성향점수의 역수로 가중치를 만들면 됩니다. + +단, 한 가지 주의할 점이 있습니다. + +이 모든 건 성향점수를 잘 추정했을 때의 이야기입니다. + +예를 들어 성별과 증상만 들어간 식을 떠올려 봅시다. +이때 나이가 약 복용에 큰 영향을 주는데, 모델에 나이라는 변수가 없다고 합시다. +이 경우, 실제 나이가 약 복용에 주는 영향을 놓치게 됩니다. + +그러면 추정한 확률이 실제와 어긋나고, 결국 교란이 깨끗이 제거되지 않습니다. + +그래서 실제 분석에서는, 어떤 정보를 모델에 넣을지가 무엇보다 중요합니다. diff --git a/videos/what_is_ipw/ko/src/scripts/05_summary.txt b/videos/what_is_ipw/ko/src/scripts/05_summary.txt new file mode 100644 index 0000000..a466499 --- /dev/null +++ b/videos/what_is_ipw/ko/src/scripts/05_summary.txt @@ -0,0 +1,31 @@ +지금까지 배운 내용을 정리해 보겠습니다. + +우리는 약이 정말 회복을 앞당기는지 알고 싶었습니다. + +하지만 약을 먹은 사람과 안 먹은 사람을 그냥 비교하면 안 됐습니다. + +약을 먹은 그룹엔 중증 환자가, 먹지 않은 그룹엔 경증 환자가 몰려 있었기 때문입니다. + +이렇게 두 그룹의 출발선이 다르면, 교란 변수가 진짜 효과를 가려 버립니다. + +이 문제를 풀어 준 것이 바로 아이피더블유였습니다. + +각 사람에게 확률의 역수만큼 가중치를 줘서, 두 그룹의 환자 구성을 똑같이 맞춥니다. + +그러자 해로워 보이던 약이 사실은 회복을 하루 앞당긴다는, 진짜 효과를 얻을 수 있었습니다. + +변수가 많은 실제 데이터에서는, 이 확률을 로지스틱 회귀 같은 모델로 추정합니다. + +이렇게 추정한 확률이 바로 추정된 성향점수입니다. + +이 추정된 성향점수를 이용해서 각 개인별로 가중치를 부여해줍니다. + +그러면 전체적으로 두 그룹의 환자 구성을 똑같이 맞출 수 있습니다. + +물론 모델이 부정확하면 아이피더블유도 흔들리니, 어떤 정보를 넣을지 신중해야 합니다. + +아이피더블유의 가장 큰 매력은, 실험이 어려운 상황에서도 관찰 데이터만으로 인과효과를 추정할 수 있다는 점입니다. + +측정하지 못한 교란 변수가 있다면 한계는 분명하지만, + +그 한계를 알고 신중히 쓴다면, 아이피더블유는 단순한 평균 비교가 줄 수 없는 답을 줍니다. diff --git a/videos/what_is_ipw/ko/src/thumbnail.py b/videos/what_is_ipw/ko/src/thumbnail.py new file mode 100644 index 0000000..ea8923f --- /dev/null +++ b/videos/what_is_ipw/ko/src/thumbnail.py @@ -0,0 +1,53 @@ +from manim import * + +# what_is_ipw 유튜브 썸네일 (1920×1080) +# 컨셉: 동그라미(환자)로 '서로 다르던 두 그룹이 IPW로 똑같이 균형 맞춰진다'만 강조. +# 디자인: 점 격자를 45도 기울인 다이아몬드 배치로 감각 있게. 
+ +SEVERE = "#b07cff" # 중증 +MILD = "#39d98a" # 경증 +ORG = "#F5A623" +GOOD = "#7CE38B" +GOLD = "#F4C95D" + + +class Thumbnail(Scene): + def construct(self): + self.camera.background_color = "#000000" + + def cluster(n_p, n_g, cols, r=0.16, buff=0.14): + g = VGroup(*[Dot(color=SEVERE, radius=r) for _ in range(n_p)], + *[Dot(color=MILD, radius=r) for _ in range(n_g)]) + g.arrange_in_grid(cols=cols, buff=buff) + g.rotate(45 * DEGREES) # 45도 기울인 다이아몬드 배치 + return g + + # ===== 상단 타이틀 ===== + ipw = Tex(r"\textbf{IPW}", color=ORG).scale(2.4) + ipw_kr = Text("역확률 가중치", font_size=58, color=WHITE, weight=BOLD) + title = VGroup(ipw, ipw_kr).arrange(RIGHT, buff=0.5).to_edge(UP, buff=0.55) + + # ===== 위: 서로 다른 두 그룹 (작게, 흐리게) ===== + b0 = cluster(5, 2, 4, r=0.1, buff=0.1) + c0 = cluster(1, 2, 3, r=0.1, buff=0.1) + neq = Tex(r"$\neq$", color=GREY_B).scale(1.4) + before = VGroup(b0, neq, c0).arrange(RIGHT, buff=0.55).set_opacity(0.55) + before_grp = VGroup(before).arrange(DOWN, buff=0.3).move_to(UP * 1.35) + + # ===== IPW 화살표 ===== + arrow = Arrow(UP * 0.55, DOWN * 0.55, color=ORG, stroke_width=10, + max_tip_length_to_length_ratio=0.35).move_to(UP * 0.05) + arrow_lbl = Text("IPW", font_size=34, color=ORG, weight=BOLD).next_to(arrow, RIGHT, buff=0.25) + step = VGroup(arrow, arrow_lbl) + + # ===== 아래: 균형 맞춰진 두 그룹 (크게, 강조) ===== + b1 = cluster(6, 4, 5) + c1 = cluster(6, 4, 5) + eq = Tex(r"$=$", color=GOOD).scale(2.6) + after = VGroup(b1, eq, c1).arrange(RIGHT, buff=0.9) + glow = SurroundingRectangle(after, color=GOOD, stroke_width=3, corner_radius=0.25, buff=0.4) + after_lbl = Text("Balance", font_size=40, color=GOOD, weight=BOLD) + after_grp = VGroup(VGroup(after, glow), after_lbl).arrange(DOWN, buff=0.35).move_to(DOWN * 1.7) + + self.add(title, before_grp, step, after_grp) + From ac5d0cc627cc82771d442719d3aa36c18a54f211 Mon Sep 17 00:00:00 2001 From: jhkimon Date: Sat, 13 Jun 2026 14:27:53 +0900 Subject: [PATCH 3/3] =?UTF-8?q?chore:=20=ED=8F=B4=EB=8D=94=20=EA=B5=AC?= 
=?UTF-8?q?=EC=A1=B0=20=EB=B3=80=EA=B2=BD?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Pipfile | 20 +++++++++++++++++++ .../{en/src => src/code/en}/00_intro.py | 0 .../{en/src => src/code/en}/01_what_is_ipw.py | 0 .../src => src/code/en}/02_ipw_application.py | 0 .../{en/src => src/code/en}/03_ipw_formula.py | 0 .../code/en}/04_propensity_score.py | 0 .../{en/src => src/code/en}/05_summary.py | 0 .../{en/src => src/code/en}/thumbnail.py | 0 .../{ko/src => src/code/ko}/00_intro.py | 0 .../{ko/src => src/code/ko}/01_what_is_ipw.py | 0 .../src => src/code/ko}/02_ipw_application.py | 0 .../{ko/src => src/code/ko}/03_ipw_formula.py | 0 .../code/ko}/04_propensity_score.py | 0 .../{ko/src => src/code/ko}/05_summary.py | 0 .../{ko/src => src/code/ko}/thumbnail.py | 0 .../scripts => src/scripts/en}/00_intro.txt | 0 .../scripts/en}/01_medicine_question.txt | 0 .../scripts/en}/02_ipw_application.txt | 0 .../scripts/en}/03_ipw_formula.txt | 0 .../scripts/en}/04_propensity_score.txt | 0 .../scripts => src/scripts/en}/05_summary.txt | 0 .../scripts => src/scripts/ko}/00_intro.txt | 0 .../scripts/ko}/01_medicine_question.txt | 0 .../scripts/ko}/02_ipw_application.txt | 0 .../scripts/ko}/03_ipw_formula.txt | 0 .../scripts/ko}/04_propensity_score.txt | 0 .../scripts => src/scripts/ko}/05_summary.txt | 0 27 files changed, 20 insertions(+) create mode 100644 Pipfile rename videos/what_is_ipw/{en/src => src/code/en}/00_intro.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/01_what_is_ipw.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/02_ipw_application.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/03_ipw_formula.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/04_propensity_score.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/05_summary.py (100%) rename videos/what_is_ipw/{en/src => src/code/en}/thumbnail.py (100%) rename videos/what_is_ipw/{ko/src => 
src/code/ko}/00_intro.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/01_what_is_ipw.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/02_ipw_application.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/03_ipw_formula.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/04_propensity_score.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/05_summary.py (100%) rename videos/what_is_ipw/{ko/src => src/code/ko}/thumbnail.py (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/00_intro.txt (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/01_medicine_question.txt (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/02_ipw_application.txt (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/03_ipw_formula.txt (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/04_propensity_score.txt (100%) rename videos/what_is_ipw/{en/src/scripts => src/scripts/en}/05_summary.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/00_intro.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/01_medicine_question.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/02_ipw_application.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/03_ipw_formula.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/04_propensity_score.txt (100%) rename videos/what_is_ipw/{ko/src/scripts => src/scripts/ko}/05_summary.txt (100%) diff --git a/Pipfile b/Pipfile new file mode 100644 index 0000000..acdc94e --- /dev/null +++ b/Pipfile @@ -0,0 +1,20 @@ +[[source]] +url = "https://pypi.org/simple" +verify_ssl = true +name = "pypi" + +[packages] +jupyter-book = ">=2.0.0" +manim = ">=0.18.0" +numpy = "*" +pandas = "*" +matplotlib = "*" +seaborn = "*" +scikit-learn = "*" +statsmodels = "*" +graphviz = "*" + +[dev-packages] + +[requires] +python_version = "3.11" diff --git 
a/videos/what_is_ipw/en/src/00_intro.py b/videos/what_is_ipw/src/code/en/00_intro.py similarity index 100% rename from videos/what_is_ipw/en/src/00_intro.py rename to videos/what_is_ipw/src/code/en/00_intro.py diff --git a/videos/what_is_ipw/en/src/01_what_is_ipw.py b/videos/what_is_ipw/src/code/en/01_what_is_ipw.py similarity index 100% rename from videos/what_is_ipw/en/src/01_what_is_ipw.py rename to videos/what_is_ipw/src/code/en/01_what_is_ipw.py diff --git a/videos/what_is_ipw/en/src/02_ipw_application.py b/videos/what_is_ipw/src/code/en/02_ipw_application.py similarity index 100% rename from videos/what_is_ipw/en/src/02_ipw_application.py rename to videos/what_is_ipw/src/code/en/02_ipw_application.py diff --git a/videos/what_is_ipw/en/src/03_ipw_formula.py b/videos/what_is_ipw/src/code/en/03_ipw_formula.py similarity index 100% rename from videos/what_is_ipw/en/src/03_ipw_formula.py rename to videos/what_is_ipw/src/code/en/03_ipw_formula.py diff --git a/videos/what_is_ipw/en/src/04_propensity_score.py b/videos/what_is_ipw/src/code/en/04_propensity_score.py similarity index 100% rename from videos/what_is_ipw/en/src/04_propensity_score.py rename to videos/what_is_ipw/src/code/en/04_propensity_score.py diff --git a/videos/what_is_ipw/en/src/05_summary.py b/videos/what_is_ipw/src/code/en/05_summary.py similarity index 100% rename from videos/what_is_ipw/en/src/05_summary.py rename to videos/what_is_ipw/src/code/en/05_summary.py diff --git a/videos/what_is_ipw/en/src/thumbnail.py b/videos/what_is_ipw/src/code/en/thumbnail.py similarity index 100% rename from videos/what_is_ipw/en/src/thumbnail.py rename to videos/what_is_ipw/src/code/en/thumbnail.py diff --git a/videos/what_is_ipw/ko/src/00_intro.py b/videos/what_is_ipw/src/code/ko/00_intro.py similarity index 100% rename from videos/what_is_ipw/ko/src/00_intro.py rename to videos/what_is_ipw/src/code/ko/00_intro.py diff --git a/videos/what_is_ipw/ko/src/01_what_is_ipw.py 
b/videos/what_is_ipw/src/code/ko/01_what_is_ipw.py similarity index 100% rename from videos/what_is_ipw/ko/src/01_what_is_ipw.py rename to videos/what_is_ipw/src/code/ko/01_what_is_ipw.py diff --git a/videos/what_is_ipw/ko/src/02_ipw_application.py b/videos/what_is_ipw/src/code/ko/02_ipw_application.py similarity index 100% rename from videos/what_is_ipw/ko/src/02_ipw_application.py rename to videos/what_is_ipw/src/code/ko/02_ipw_application.py diff --git a/videos/what_is_ipw/ko/src/03_ipw_formula.py b/videos/what_is_ipw/src/code/ko/03_ipw_formula.py similarity index 100% rename from videos/what_is_ipw/ko/src/03_ipw_formula.py rename to videos/what_is_ipw/src/code/ko/03_ipw_formula.py diff --git a/videos/what_is_ipw/ko/src/04_propensity_score.py b/videos/what_is_ipw/src/code/ko/04_propensity_score.py similarity index 100% rename from videos/what_is_ipw/ko/src/04_propensity_score.py rename to videos/what_is_ipw/src/code/ko/04_propensity_score.py diff --git a/videos/what_is_ipw/ko/src/05_summary.py b/videos/what_is_ipw/src/code/ko/05_summary.py similarity index 100% rename from videos/what_is_ipw/ko/src/05_summary.py rename to videos/what_is_ipw/src/code/ko/05_summary.py diff --git a/videos/what_is_ipw/ko/src/thumbnail.py b/videos/what_is_ipw/src/code/ko/thumbnail.py similarity index 100% rename from videos/what_is_ipw/ko/src/thumbnail.py rename to videos/what_is_ipw/src/code/ko/thumbnail.py diff --git a/videos/what_is_ipw/en/src/scripts/00_intro.txt b/videos/what_is_ipw/src/scripts/en/00_intro.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/00_intro.txt rename to videos/what_is_ipw/src/scripts/en/00_intro.txt diff --git a/videos/what_is_ipw/en/src/scripts/01_medicine_question.txt b/videos/what_is_ipw/src/scripts/en/01_medicine_question.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/01_medicine_question.txt rename to videos/what_is_ipw/src/scripts/en/01_medicine_question.txt diff --git 
a/videos/what_is_ipw/en/src/scripts/02_ipw_application.txt b/videos/what_is_ipw/src/scripts/en/02_ipw_application.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/02_ipw_application.txt rename to videos/what_is_ipw/src/scripts/en/02_ipw_application.txt diff --git a/videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt b/videos/what_is_ipw/src/scripts/en/03_ipw_formula.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/03_ipw_formula.txt rename to videos/what_is_ipw/src/scripts/en/03_ipw_formula.txt diff --git a/videos/what_is_ipw/en/src/scripts/04_propensity_score.txt b/videos/what_is_ipw/src/scripts/en/04_propensity_score.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/04_propensity_score.txt rename to videos/what_is_ipw/src/scripts/en/04_propensity_score.txt diff --git a/videos/what_is_ipw/en/src/scripts/05_summary.txt b/videos/what_is_ipw/src/scripts/en/05_summary.txt similarity index 100% rename from videos/what_is_ipw/en/src/scripts/05_summary.txt rename to videos/what_is_ipw/src/scripts/en/05_summary.txt diff --git a/videos/what_is_ipw/ko/src/scripts/00_intro.txt b/videos/what_is_ipw/src/scripts/ko/00_intro.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/00_intro.txt rename to videos/what_is_ipw/src/scripts/ko/00_intro.txt diff --git a/videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt b/videos/what_is_ipw/src/scripts/ko/01_medicine_question.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/01_medicine_question.txt rename to videos/what_is_ipw/src/scripts/ko/01_medicine_question.txt diff --git a/videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt b/videos/what_is_ipw/src/scripts/ko/02_ipw_application.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/02_ipw_application.txt rename to videos/what_is_ipw/src/scripts/ko/02_ipw_application.txt diff --git a/videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt 
b/videos/what_is_ipw/src/scripts/ko/03_ipw_formula.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/03_ipw_formula.txt rename to videos/what_is_ipw/src/scripts/ko/03_ipw_formula.txt diff --git a/videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt b/videos/what_is_ipw/src/scripts/ko/04_propensity_score.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/04_propensity_score.txt rename to videos/what_is_ipw/src/scripts/ko/04_propensity_score.txt diff --git a/videos/what_is_ipw/ko/src/scripts/05_summary.txt b/videos/what_is_ipw/src/scripts/ko/05_summary.txt similarity index 100% rename from videos/what_is_ipw/ko/src/scripts/05_summary.txt rename to videos/what_is_ipw/src/scripts/ko/05_summary.txt
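
Review note on the numbers narrated in `02_ipw_application.txt`: the figures in the script (0.9 days naive gap, weights of 1.2, 6 and 2, weighted means of 5 and 6 days) can be checked with a small script. The following is only a sketch and not part of the patch. The names `CELLS`, `expand`, `propensity_by_stratum`, `ipw_weight` and `group_mean` are hypothetical, and it assumes every patient in a cell recovers in exactly the cell's average number of days, which the script does not state. The weight formula `T/e + (1-T)/(1-e)` is the standard IPW form that `03_ipw_formula.txt` describes in words.

```python
from collections import defaultdict

# (severity, treated, recovery_days, n_patients), counts and averages as narrated.
# Assumption for illustration: each patient has exactly the cell's average.
CELLS = [
    ("severe", 1, 7.0, 5),
    ("severe", 0, 8.0, 1),
    ("mild", 1, 2.0, 2),
    ("mild", 0, 3.0, 2),
]


def expand(cells):
    """Turn cell counts into one (severity, treated, days) row per patient."""
    return [(sev, t, days) for sev, t, days, n in cells for _ in range(n)]


def propensity_by_stratum(patients):
    """P(treated | severity), estimated by counting within each stratum."""
    treated, total = defaultdict(int), defaultdict(int)
    for sev, t, _ in patients:
        total[sev] += 1
        treated[sev] += t
    return {sev: treated[sev] / total[sev] for sev in total}


def ipw_weight(t, e):
    """Inverse probability of the arm actually observed: T/e + (1-T)/(1-e)."""
    return t / e + (1 - t) / (1 - e)


def group_mean(patients, arm, e=None):
    """Mean recovery days in one arm; IPW-weighted when propensities are given."""
    num = den = 0.0
    for sev, t, days in patients:
        if t != arm:
            continue
        w = 1.0 if e is None else ipw_weight(t, e[sev])
        num += w * days
        den += w
    return num / den


if __name__ == "__main__":
    patients = expand(CELLS)
    e = propensity_by_stratum(patients)
    naive = group_mean(patients, 1) - group_mean(patients, 0)
    adjusted = group_mean(patients, 1, e) - group_mean(patients, 0, e)
    print(f"naive difference (treated - untreated): {naive:.2f} days")
    print(f"IPW difference (treated - untreated): {adjusted:.2f} days")
```

Dividing by the sum of weights, as the narration says, makes this the normalized (Hajek-style) form of the estimator; with these counts each arm's weights sum to 10, the total number of patients.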